
We propose a new method for estimating the minimizer $\boldsymbol{x}^*$ and the minimum value $f^*$ of a smooth and strongly convex regression function $f$ from observations contaminated by random noise. Our estimator $\boldsymbol{z}_n$ of the minimizer $\boldsymbol{x}^*$ is based on a version of projected gradient descent with the gradient estimated by a regularized local polynomial algorithm. Next, we propose a two-stage procedure for estimating the minimum value $f^*$ of the regression function $f$. At the first stage, we construct a sufficiently accurate estimator of $\boldsymbol{x}^*$, which can be, for example, $\boldsymbol{z}_n$. At the second stage, we estimate the function value at the point obtained in the first stage using a rate-optimal nonparametric procedure. We derive non-asymptotic upper bounds for the quadratic risk and optimization error of $\boldsymbol{z}_n$, and for the risk of estimating $f^*$. We establish minimax lower bounds showing that, under a certain choice of parameters, the proposed algorithms achieve the minimax optimal rates of convergence on the class of smooth and strongly convex functions.
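
As a rough illustration of the first stage, the sketch below (in Python, with all constants and the quadratic test function chosen purely for illustration) runs projected gradient descent with the gradient at each iterate estimated from noisy zeroth-order evaluations by averaged central finite differences, a simple stand-in for the regularized local polynomial estimator used in the paper.

import numpy as np

def noisy_f(x, rng, sigma=0.1):
    # Illustrative smooth, strongly convex target observed with additive noise.
    return 0.5 * np.sum((x - 1.0) ** 2) + sigma * rng.standard_normal()

def fd_gradient(x, rng, h=0.05, reps=50):
    # Averaged central finite differences as a simple noisy-gradient estimate.
    d = x.size
    g = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        diffs = [(noisy_f(x + e, rng) - noisy_f(x - e, rng)) / (2 * h)
                 for _ in range(reps)]
        g[i] = np.mean(diffs)
    return g

def projected_gradient(x0, radius=10.0, steps=200, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * fd_gradient(x, rng)
        norm = np.linalg.norm(x)
        if norm > radius:                 # projection onto a Euclidean ball
            x *= radius / norm
    return x

z_n = projected_gradient(np.zeros(3))
print(z_n)                                # should land close to the minimizer (1, 1, 1)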

Related content

We study the stability of posterior predictive inferences to the specification of the likelihood model and perturbations of the data generating process. In modern big data analyses, the decision-maker may elicit useful broad structural judgements but a level of interpolation is required to arrive at a likelihood model. One model, often a computationally convenient canonical form, is chosen, when many alternatives would have been equally consistent with the elicited judgements. Equally, observational datasets often contain unforeseen heterogeneities and recording errors. Acknowledging such imprecisions, a faithful Bayesian analysis should be stable across reasonable equivalence classes for these inputs. We show that traditional Bayesian updating provides stability across a very strict class of likelihood models and DGPs, while a generalised Bayesian alternative using the beta-divergence loss function is shown to be stable across practical and interpretable neighbourhoods. We illustrate this in linear regression, binary classification, and mixture modelling examples, showing that stable updating does not compromise the ability to learn about the DGP. These stability results provide a compelling justification for using generalised Bayes to facilitate inference under simplified canonical models.
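
As a rough illustration, the sketch below implements a generalised Bayesian update with the beta-divergence loss for a Gaussian location model on a parameter grid; the prior, the learning rate w, the value of beta and the contaminated data are illustrative choices, not the paper's examples.

import numpy as np
from scipy.stats import norm

def beta_div_loss(x, mu, sigma, beta):
    # beta-divergence loss for a Gaussian likelihood with known sigma:
    #   -(1/beta) f(x)^beta + 1/(beta+1) * integral f(y)^(beta+1) dy,
    # where the integral has the closed form below for the Gaussian.
    dens = norm.pdf(x, loc=mu, scale=sigma)
    integral = (2 * np.pi * sigma**2) ** (-beta / 2) / np.sqrt(beta + 1)
    return -dens**beta / beta + integral / (beta + 1)

def generalised_posterior(data, mu_grid, sigma=1.0, beta=0.5, w=1.0):
    # Illustrative N(0, 3^2) prior on mu; posterior evaluated on a grid.
    log_prior = norm.logpdf(mu_grid, loc=0.0, scale=3.0)
    losses = beta_div_loss(data[:, None], mu_grid, sigma, beta).sum(axis=0)
    log_post = log_prior - w * losses
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()              # normalised weights on the grid

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0.0, 1.0, 95), np.full(5, 8.0)])  # 5 gross outliers
grid = np.linspace(-2, 2, 401)
post = generalised_posterior(data, grid)
print(grid[np.argmax(post)])              # stays near 0 despite the outliers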

Using techniques developed recently in the field of compressed sensing we prove new upper bounds for general (non-linear) sampling numbers of (quasi-)Banach smoothness spaces in $L^2$. In relevant cases such as mixed and isotropic weighted Wiener classes or Sobolev spaces with mixed smoothness, sampling numbers in $L^2$ can be upper bounded by best $n$-term trigonometric widths in $L^\infty$. We describe a recovery procedure based on $\ell^1$-minimization (basis pursuit denoising) using only $m$ function values. With this method, a significant gain in the rate of convergence compared to recently developed linear recovery methods is achieved. In this deterministic worst-case setting we see an additional speed-up of $n^{-1/2}$ compared to linear methods in case of weighted Wiener spaces. For their quasi-Banach counterparts even arbitrary polynomial speed-up is possible. Surprisingly, our approach allows us to recover mixed smoothness Sobolev functions belonging to $S^r_pW(\mathbb{T}^d)$ on the $d$-torus with a logarithmically better rate of convergence than any linear method can achieve when $1 < p < 2$ and $d$ is large. This effect is not present for isotropic Sobolev spaces.
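
As a rough illustration of the recovery procedure, the sketch below solves basis pursuit denoising in its Lagrangian (LASSO) form by plain iterative soft-thresholding for a sparse cosine expansion sampled at $m$ random points; the dictionary, sizes and regularisation constant are illustrative, and a dedicated solver would be used in practice.

import numpy as np

def ista(A, y, lam=0.05, iters=2000):
    # Iterative soft-thresholding for min_c 0.5*||A c - y||_2^2 + lam*||c||_1,
    # the Lagrangian form of basis pursuit denoising.
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(iters):
        z = c - A.T @ (A @ c - y) / L          # gradient step on the quadratic part
        c = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return c

rng = np.random.default_rng(0)
N, m, s = 256, 100, 5                          # dictionary size, samples, sparsity
c_true = np.zeros(N)
c_true[rng.choice(N, s, replace=False)] = rng.standard_normal(s)

t = rng.uniform(0.0, 2.0 * np.pi, m)           # m random sampling points on the torus
A = np.cos(np.outer(t, np.arange(N))) * np.sqrt(2.0 / m)   # roughly unit-norm columns
y = A @ c_true + 0.01 * rng.standard_normal(m)              # noisy function values

c_hat = ista(A, y)
print(np.linalg.norm(c_hat - c_true))          # small when recovery succeeds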

Given a dataset on actions and resulting long-term rewards, a direct estimation approach fits value functions that minimize prediction error on the training data. Temporal difference learning (TD) methods instead fit value functions by minimizing the degree of temporal inconsistency between estimates made at successive time-steps. Focusing on finite state Markov chains, we provide a crisp asymptotic theory of the statistical advantages of this approach. First, we show that an intuitive inverse trajectory pooling coefficient completely characterizes the percent reduction in mean-squared error of value estimates. Depending on problem structure, the reduction could be enormous or nonexistent. Next, we prove that there can be dramatic improvements in estimates of the difference in value-to-go for two states: TD's errors are bounded in terms of a novel measure - the problem's trajectory crossing time - which can be much smaller than the problem's time horizon.
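
As a rough illustration of the two estimators being compared, the sketch below simulates episodes from a small absorbing Markov chain (with illustrative transitions and rewards) and fits value estimates both by direct Monte Carlo averaging of returns and by batch TD(0), which instead reduces the temporal inconsistency between successive estimates.

import numpy as np

# A 4-state chain with an absorbing terminal state 3; rewards received on leaving a state.
P = np.array([[0.0, 0.7, 0.3, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])
R = np.array([1.0, 2.0, 4.0, 0.0])
TERMINAL = 3
rng = np.random.default_rng(0)

def episode(start):
    s, traj = start, []
    while s != TERMINAL:
        s_next = rng.choice(4, p=P[s])
        traj.append((s, R[s] + 0.1 * rng.standard_normal(), s_next))  # noisy reward
        s = s_next
    return traj

def monte_carlo(episodes):
    # Direct estimation: average the observed return from each visited state.
    returns = {s: [] for s in range(3)}
    for traj in episodes:
        g = 0.0
        for s, r, _ in reversed(traj):
            g += r
            returns[s].append(g)
    return np.array([np.mean(returns[s]) for s in range(3)])

def td0(episodes, alpha=0.05, sweeps=200):
    # Batch TD(0): repeatedly shrink the temporal inconsistency V[s] vs r + V[s'].
    V = np.zeros(4)
    for _ in range(sweeps):
        for traj in episodes:
            for s, r, s_next in traj:
                V[s] += alpha * (r + V[s_next] - V[s])
    return V[:3]

eps = [episode(rng.integers(0, 3)) for _ in range(200)]
print("MC :", monte_carlo(eps))
print("TD :", td0(eps))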

Biedl et al. introduced the minimum ply cover problem in CG 2021 following the seminal work of Erlebach and van Leeuwen in SODA 2008. They showed that determining the minimum ply cover number for a given set of points by a given set of axis-parallel unit squares is NP-hard, and gave a polynomial time $2$-approximation algorithm for instances in which the minimum ply cover number is bounded by a constant. Durocher et al. recently presented a polynomial time $(8 + \epsilon)$-approximation algorithm for the general case when the minimum ply cover number is $\omega(1)$, for every fixed $\epsilon > 0$. They divide the problem into subproblems by using a standard grid decomposition technique, design an involved dynamic programming scheme to solve each subproblem (one per grid cell of unit side length), and then merge the solutions of the subproblems to obtain the final ply cover. We use a horizontal slab decomposition technique to divide the problem into subproblems. Our algorithm uses a simple greedy heuristic to obtain a $(27+\epsilon)$-approximation algorithm for the general problem, for a small constant $\epsilon>0$. Our algorithm runs considerably faster than the algorithm of Durocher et al. We also give a fast $2$-approximation algorithm for the special case where the input squares are intersected by a horizontal line. The hardness of this special case is still open. Our algorithm is potentially extendable to minimum ply covering with other geometric objects such as unit disks, identical rectangles, etc.
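
For reference, the sketch below computes the quantity being optimised, the ply of a set of axis-parallel unit squares, by brute force; it relies on the fact that a maximum-depth point of axis-parallel boxes can be taken at a pair (left edge, bottom edge). It is meant only to make the objective concrete, not to reflect the approximation algorithms discussed above.

from itertools import product

def ply(squares):
    # squares: list of (x, y) lower-left corners of closed axis-parallel unit squares.
    # The maximum depth of an arrangement of axis-parallel boxes is attained at some
    # point whose x is a left edge and whose y is a bottom edge, so it suffices to
    # test those O(n^2) candidate points.
    best = 0
    for px, py in product((x for x, _ in squares), (y for _, y in squares)):
        depth = sum(1 for x, y in squares
                    if x <= px <= x + 1 and y <= py <= y + 1)
        best = max(best, depth)
    return best

cover = [(0.0, 0.0), (0.5, 0.2), (0.9, 0.9), (3.0, 3.0)]
print(ply(cover))   # 3: the first three squares share a point, the last is isolated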

We consider the estimation of two-sample integral functionals, of the type that occur naturally, for example, when the object of interest is a divergence between unknown probability densities. Our first main result is that, in wide generality, a weighted nearest neighbour estimator is efficient, in the sense of achieving the local asymptotic minimax lower bound. Moreover, we also prove a corresponding central limit theorem, which facilitates the construction of asymptotically valid confidence intervals for the functional, having asymptotically minimal width. One interesting consequence of our results is the discovery that, for certain functionals, the worst-case performance of our estimator may improve on that of the natural `oracle' estimator, which is given access to the values of the unknown densities at the observations.
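
As a rough illustration of the nearest-neighbour approach, the sketch below implements a plain fixed-$k$ nearest-neighbour estimate of the Kullback-Leibler divergence between two samples, an unweighted stand-in for the weighted estimator analysed in the paper; the sample sizes and Gaussian test case are illustrative.

import numpy as np

def knn_kl_divergence(X, Y, k=5):
    # Fixed-k nearest-neighbour estimate of KL(P || Q) from samples X ~ P (n x d)
    # and Y ~ Q (m x d), using brute-force distance computations.
    n, d = X.shape
    m = Y.shape[0]
    dXX = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    dXY = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    np.fill_diagonal(dXX, np.inf)               # exclude the point itself
    rho = np.sort(dXX, axis=1)[:, k - 1]        # k-NN distance within X
    nu = np.sort(dXY, axis=1)[:, k - 1]         # k-NN distance to Y
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(1000, 2))        # P = N(0, I)
Y = rng.normal(0.5, 1.0, size=(1000, 2))        # Q = N((0.5, 0.5), I)
print(knn_kl_divergence(X, Y))                  # true KL is ||mu||^2 / 2 = 0.25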

Likelihood-free inference methods typically make use of a distance between simulated and real data. A common example is the maximum mean discrepancy (MMD), which has previously been used for approximate Bayesian computation, minimum distance estimation, generalised Bayesian inference, and within the nonparametric learning framework. The MMD is commonly estimated at a root-$m$ rate, where $m$ is the number of simulated samples. This can lead to significant computational challenges since a large $m$ is required to obtain an accurate estimate, which is crucial for parameter estimation. In this paper, we propose a novel estimator for the MMD with significantly improved sample complexity. The estimator is particularly well suited for computationally expensive smooth simulators with low- to mid-dimensional inputs. This claim is supported through both theoretical results and an extensive simulation study on benchmark simulators.
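
For reference, the sketch below implements the standard unbiased U-statistic estimate of MMD$^2$ with a Gaussian kernel, i.e. the root-$m$-rate baseline that the paper improves on; the bandwidth, dimensions and toy "simulator" output are illustrative.

import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * bandwidth**2))

def mmd2_unbiased(X, Y, bandwidth=1.0):
    # Unbiased U-statistic estimate of MMD^2 between samples X and Y.
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 2))       # "observed" data
sim = rng.normal(0.3, 1.0, size=(500, 2))        # "simulator" output
print(mmd2_unbiased(real, sim))                  # positive for different distributions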

The optimized certainty equivalent (OCE) is a family of risk measures that cover important examples such as entropic risk, conditional value-at-risk and mean-variance models. In this paper, we propose a new episodic risk-sensitive reinforcement learning formulation based on tabular Markov decision processes with recursive OCEs. We design an efficient learning algorithm for this problem based on value iteration and upper confidence bound. We derive an upper bound on the regret of the proposed algorithm, and also establish a minimax lower bound. Our bounds show that the regret rate achieved by our proposed algorithm has optimal dependence on the number of episodes and the number of actions.
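
As a rough illustration of the risk family, the sketch below evaluates the OCE of a sample of losses in its variational form $\rho_u(L) = \inf_{\eta} \{\eta + \mathbb{E}[u(L - \eta)]\}$ by a grid search over $\eta$; with $u(t) = t_+/(1-\alpha)$ this recovers conditional value-at-risk and with $u(t) = (e^{\gamma t} - 1)/\gamma$ it recovers entropic risk. The disutility convention, grid and constants are illustrative.

import numpy as np

def oce(losses, u):
    # Optimized certainty equivalent of a loss sample: inf_eta eta + mean(u(L - eta)).
    etas = np.linspace(losses.min(), losses.max(), 401)
    return min(eta + np.mean(u(losses - eta)) for eta in etas)

rng = np.random.default_rng(0)
L = rng.normal(1.0, 2.0, size=20_000)            # sample of losses from N(1, 4)

alpha, gamma = 0.9, 0.5
cvar_u = lambda t: np.maximum(t, 0.0) / (1 - alpha)          # yields CVaR_alpha
entropic_u = lambda t: (np.exp(gamma * t) - 1.0) / gamma     # yields entropic risk

print(oce(L, cvar_u))                                # approx. CVaR_0.9 of N(1, 4)
print(oce(L, entropic_u))                            # approx. 1 + gamma * 4 / 2 = 2.0
print(np.log(np.mean(np.exp(gamma * L))) / gamma)    # direct entropic risk, for comparison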

This paper explores connections between margin-based loss functions and consistency in binary classification and regression applications. It is shown that a large class of margin-based loss functions for binary classification/regression results in estimating scores equivalent to log-likelihood scores weighted by an even function. A simple characterization for conformable (consistent) loss functions is given, which allows for straightforward comparison of different losses, including exponential loss, logistic loss, and others. The characterization is used to construct a new Huber-type loss function for the logistic model. A simple relation between the margin and standardized logistic regression residuals is derived, demonstrating that all margin-based losses can be viewed as loss functions of squared standardized logistic regression residuals. The relation provides new, straightforward interpretations for exponential and logistic loss, and aids in understanding why exponential loss is sensitive to outliers. In particular, it is shown that minimizing empirical exponential loss is equivalent to minimizing the sum of squared standardized logistic regression residuals. The relation also provides new insight into the AdaBoost algorithm.
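
The stated relation can be checked numerically: with labels $y \in \{-1, +1\}$, score $F$, fitted probability $p = 1/(1 + e^{-F})$ and 0/1 response $(y+1)/2$, the squared standardized (Pearson) logistic residual equals the exponential loss $e^{-yF}$, so summing it over a sample gives exactly the empirical exponential loss. A minimal check:

import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(0.0, 2.0, size=10)          # arbitrary scores (margins are y*F)
y = rng.choice([-1.0, 1.0], size=10)       # labels in {-1, +1}

p = 1.0 / (1.0 + np.exp(-F))               # fitted logistic probabilities
y01 = (y + 1) / 2                          # the same labels coded as 0/1

squared_std_residual = (y01 - p) ** 2 / (p * (1 - p))   # squared Pearson residual
exponential_loss = np.exp(-y * F)

print(np.allclose(squared_std_residual, exponential_loss))   # True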

In the usual Bayesian setting, a full probabilistic model is required to link the data and parameters, and the form of this model and the inference and prediction mechanisms are specified via de Finetti's representation. In general, such a formulation is not robust to model mis-specification of its component parts. An alternative approach is to draw inference based on loss functions, where the quantity of interest is defined as a minimizer of some expected loss, and to construct posterior distributions based on the loss-based formulation; this strategy underpins the construction of the Gibbs posterior. We develop a Bayesian non-parametric approach; specifically, we generalize the Bayesian bootstrap, and specify a Dirichlet process model for the distribution of the observables. We implement this using direct prior-to-posterior calculations, but also using predictive sampling. We also study the assessment of posterior validity for non-standard Bayesian calculations. We provide a computationally efficient way to calibrate the scaling parameter in the Gibbs posterior so that it can achieve the desired coverage rate. We show that the developed non-standard Bayesian updating procedures yield valid posterior distributions in terms of consistency and asymptotic normality under model mis-specification. Simulation studies show that the proposed methods can recover the true value of the parameter efficiently and achieve frequentist coverage even when the sample size is small. Finally, we apply our methods to evaluate the causal impact of speed cameras on traffic collisions in England.
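
As a rough illustration of the loss-based, nonparametric updating idea, the sketch below draws from a Bayesian bootstrap posterior for a parameter defined as the minimizer of an expected loss: each draw reweights the observations with flat Dirichlet weights and minimizes the weighted empirical loss. The Huber-type location loss and the heavy-tailed data are illustrative, and the sketch omits the calibration of the Gibbs scaling parameter discussed above.

import numpy as np
from scipy.optimize import minimize_scalar

def huber(r, delta=1.0):
    # Illustrative loss; the parameter of interest is argmin_theta E[huber(X - theta)].
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def bayesian_bootstrap(data, n_draws=2000, seed=0):
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    for b in range(n_draws):
        w = rng.dirichlet(np.ones(len(data)))          # flat Dirichlet weights
        obj = lambda theta: np.sum(w * huber(data - theta))
        draws[b] = minimize_scalar(obj, bounds=(data.min(), data.max()),
                                   method="bounded").x
    return draws

rng = np.random.default_rng(1)
data = rng.standard_t(df=3, size=200) + 0.5            # heavy-tailed observations
post = bayesian_bootstrap(data)
print(post.mean(), np.quantile(post, [0.025, 0.975]))  # posterior mean and 95% interval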

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
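
As a rough illustration, the sketch below computes the effective number of samples $(1-\beta^{n})/(1-\beta)$ per class, converts its inverse into class-balanced weights (normalised here to sum to the number of classes, a common convention), and applies them to a softmax cross-entropy loss; the counts, logits and $\beta$ are illustrative.

import numpy as np

def class_balanced_weights(counts, beta=0.999):
    # Effective number of samples per class: (1 - beta^n) / (1 - beta).
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum() * len(counts)   # normalise to sum to #classes

def class_balanced_cross_entropy(logits, labels, counts, beta=0.999):
    # Softmax cross-entropy with each sample weighted by its class weight.
    w = class_balanced_weights(counts, beta)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_sample = -log_probs[np.arange(len(labels)), labels]
    return np.mean(w[labels] * per_sample)

counts = np.array([5000, 500, 50, 5])              # a long-tailed class distribution
print(class_balanced_weights(counts))              # rare classes receive larger weights

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 4))
labels = rng.integers(0, 4, size=8)
print(class_balanced_cross_entropy(logits, labels, counts))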
