亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We propose a novel active learning strategy for regression, which is model-agnostic, robust against model mismatch, and interpretable. Assuming that a small number of initial samples are available, we derive the optimal training density that minimizes the generalization error of local polynomial smoothing (LPS) with its kernel bandwidth tuned locally: We adopt the mean integrated squared error (MISE) as a generalization criterion, and use the asymptotic behavior of the MISE as well as thelocally optimal bandwidths (LOB) -- the bandwidth function that minimizes MISE in the asymptotic limit. The asymptotic expression of our objective then reveals the dependence of the MISE on the training density, enabling analytic minimization. As a result, we obtain the optimal training density in a closed-form. The almost model-free nature of our approach should encode raw properties of the target problem, and thus provide a robust and model-agnostic active learning strategy. Furthermore, the obtained training density factorizes the influence of local function complexity, noise leveland test density in a transparent and interpretable way. We validate our theory in numerical simulations, and show that the proposed active learning method outperforms the existing state-of-the-art model-agnostic approaches.

相關內容

We consider the nonparametric estimation of an S-shaped regression function. The least squares estimator provides a very natural, tuning-free approach, but results in a non-convex optimisation problem, since the inflection point is unknown. We show that the estimator may nevertheless be regarded as a projection onto a finite union of convex cones, which allows us to propose a mixed primal-dual bases algorithm for its efficient, sequential computation. After developing a projection framework that demonstrates the consistency and robustness to misspecification of the estimator, our main theoretical results provide sharp oracle inequalities that yield worst-case and adaptive risk bounds for the estimation of the regression function, as well as a rate of convergence for the estimation of the inflection point. These results reveal not only that the estimator achieves the minimax optimal rate of convergence for both the estimation of the regression function and its inflection point (up to a logarithmic factor in the latter case), but also that it is able to achieve an almost-parametric rate when the true regression function is piecewise affine with not too many affine pieces. Simulations and a real data application to air pollution modelling also confirm the desirable finite-sample properties of the estimator, and our algorithm is implemented in the R package Sshaped.

We consider a class of semi-parametric dynamic models with strong white noise errors. This class of processes includes the standard Vector Autoregressive (VAR) model, the nonfundamental structural VAR, the mixed causal-noncausal models, as well as nonlinear dynamic models such as the (multivariate) ARCH-M model. For estimation of processes in this class, we propose the Generalized Covariance (GCov) estimator, which is obtained by minimizing a residual-based multivariate portmanteau statistic as an alternative to the Generalized Method of Moments. We derive the asymptotic properties of the GCov estimator and of the associated residual-based portmanteau statistic. Moreover, we show that the GCov estimators are semi-parametrically efficient and the residual-based portmanteau statistics are asymptotically chi-square distributed. The finite sample performance of the GCov estimator is illustrated in a simulation study. The estimator is also applied to a dynamic model of cryptocurrency prices.

Non-parametric estimation of functions as well as their derivatives by means of local-polynomial regression is a subject that was studied in the literature since the late 1970's. Given a set of noisy samples of a $\mathcal{C}^k$ smooth function, we perform a local polynomial fit, and by taking its $m$-th derivative we obtain an estimate for the $m$-th function derivative. The known optimal rates of convergence for this problem for a $k$-times smooth function $f:\mathbb{R}^d \to \mathbb{R}$ are $n^{-\frac{k-m}{2k + d}}$. However in modern applications it is often the case that we have to estimate a function operating to $\mathbb{R}^D$, for $D \gg d$ extremely large. In this work, we prove that these same rates of convergence are also achievable by local-polynomial regression in case of a high dimensional target, given some assumptions on the noise distribution. This result is an extension to Stone's seminal work from 1980 to the regime of high-dimensional target domain. In addition, we unveil a connection between the failure probability $\varepsilon$ and the number of samples required to achieve the optimal rates.

When data is collected in an adaptive manner, even simple methods like ordinary least squares can exhibit non-normal asymptotic behavior. As an undesirable consequence, hypothesis tests and confidence intervals based on asymptotic normality can lead to erroneous results. We propose an online debiasing estimator to correct these distributional anomalies in least squares estimation. Our proposed method takes advantage of the covariance structure present in the dataset and provides sharper estimates in directions for which more information has accrued. We establish an asymptotic normality property for our proposed online debiasing estimator under mild conditions on the data collection process, and provide asymptotically exact confidence intervals. We additionally prove a minimax lower bound for the adaptive linear regression problem, thereby providing a baseline by which to compare estimators. There are various conditions under which our proposed estimator achieves the minimax lower bound up to logarithmic factors. We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.

We develop a Bayesian nonparametric autoregressive model applied to flexibly estimate general transition densities exhibiting nonlinear lag dependence. Our approach is related to Bayesian density regression using Dirichlet process mixtures, with the Markovian likelihood defined through the conditional distribution obtained from the mixture. This results in a Bayesian nonparametric extension of a mixtures-of-experts model formulation. We address computational challenges to posterior sampling that arise from the Markovian structure in the likelihood. The base model is illustrated with synthetic data from a classical model for population dynamics, as well as a series of waiting times between eruptions of Old Faithful Geyser. We study inferences available through the base model before extending the methodology to include automatic relevance detection among a pre-specified set of lags. Inference for global and local lag selection is explored with additional simulation studies, and the methods are illustrated through analysis of an annual time series of pink salmon abundance in a stream in Alaska. We further explore and compare transition density estimation performance for alternative configurations of the proposed model.

In this paper we propose a dimension-reduction strategy in order to improve the performance of importance sampling in high dimension. The idea is to estimate variance terms in a small number of suitably chosen directions. We first prove that the optimal directions, i.e., the ones that minimize the Kullback--Leibler divergence with the optimal auxiliary density, are the eigenvectors associated to extreme (small or large) eigenvalues of the optimal covariance matrix. We then perform extensive numerical experiments that show that as dimension increases, these directions give estimations which are very close to optimal. Moreover, we show that the estimation remains accurate even when a simple empirical estimator of the covariance matrix is used to estimate these directions. These theoretical and numerical results open the way for different generalizations, in particular the incorporation of such ideas in adaptive importance sampling schemes.

In this work we investigate the variation of the online kernelized ridge regression algorithm in the setting of $d-$dimensional adversarial nonparametric regression. We derive the regret upper bounds on the classes of Sobolev spaces $W_{p}^{\beta}(\mathcal{X})$, $p\geq 2, \beta>\frac{d}{p}$. The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $\beta> \frac{d}{2}$ or $p=\infty$ these rates are (essentially) optimal. Finally, we compare the performance of the kernelized ridge regression forecaster to the known non-parametric forecasters in terms of the regret rates and their computational complexity as well as to the excess risk rates in the setting of statistical (i.i.d.) nonparametric regression.

Non-parametric estimation of functions as well as their derivatives by means of local-polynomial regression is a subject that was studied in the literature since the late 1970's. Given a set of noisy samples of a $\mathcal{C}^k$ smooth function, we perform a local polynomial fit, and by taking its $m$-th derivative we obtain an estimate for the $m$-th function derivative. The known optimal rates of convergence for this problem for a $k$-times smooth function $f:\mathbb{R}^d \to \mathbb{R}$ are $n^{-\frac{k-m}{2k + d}}$. However in modern applications it is often the case that we have to estimate a function operating to $\mathbb{R}^D$, for $D \gg d$ extremely large. In this work, we prove that these same rates of convergence are also achievable by local-polynomial regression in case of a high dimensional target, given some assumptions on the noise distribution. This result is an extension to Stone's seminal work from 1980 to the regime of high-dimensional target domain. In addition, we unveil a connection between the failure probability $\varepsilon$ and the number of samples required to achieve the optimal rates.

Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretisation error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces; when many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretization error. To get around this, we propose the stochastic CIR process, which removes all discretization error and we prove that samples from the stochastic CIR process are asymptotically unbiased. Use of the stochastic CIR process within a SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.

Discrete random structures are important tools in Bayesian nonparametrics and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and, then, normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop an Markov Chain Monte Carlo sampler for Bayesian inferences. A test for distributional homogeneity across groups is obtained as a by product. The results and their inferential implications are showcased on synthetic and real data.

北京阿比特科技有限公司