
The $L_p$-discrepancy is a quantitative measure for the irregularity of distribution of an $N$-element point set in the $d$-dimensional unit cube, and it is closely related to the worst-case error of quasi-Monte Carlo algorithms for numerical integration. Its inverse for dimension $d$ and error threshold $\varepsilon \in (0,1)$ is the number of points in $[0,1)^d$ required so that the minimal normalized $L_p$-discrepancy is less than or equal to $\varepsilon$. It is well known that the inverse of the $L_2$-discrepancy grows exponentially fast with the dimension $d$, i.e., we have the curse of dimensionality, whereas the inverse of the $L_{\infty}$-discrepancy depends exactly linearly on $d$. The behavior of the inverse of the $L_p$-discrepancy for general $p \not\in \{2,\infty\}$ has been an open problem for many years. In this paper we show that the $L_p$-discrepancy suffers from the curse of dimensionality for all $p$ of the form $p=2 \ell/(2 \ell -1)$ with $\ell \in \mathbb{N}$. This result follows from a more general result that we prove for the worst-case error of numerical integration in an anchored Sobolev space with anchor 0, consisting of functions that are once differentiable in each variable and whose first derivative has finite $L_q$-norm, where $q$ is an even positive integer satisfying $1/p+1/q=1$.
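For $p=2$ the star discrepancy admits a well-known closed form, Warnock's formula; the sketch below evaluates it for a point set in $[0,1)^d$ (a standard formula, not code from the paper):

```python
import numpy as np

def l2_star_discrepancy(points):
    """Warnock's closed-form formula for the L2 star discrepancy
    of an N x d point set in [0,1)^d."""
    n, d = points.shape
    term1 = 3.0 ** (-d)
    term2 = (2.0 / n) * np.sum(np.prod((1.0 - points ** 2) / 2.0, axis=1))
    # pairwise products of (1 - max(x_ik, x_jk)) over coordinates
    maxes = np.maximum(points[:, None, :], points[None, :, :])
    term3 = np.sum(np.prod(1.0 - maxes, axis=2)) / n ** 2
    return np.sqrt(term1 - term2 + term3)
```

As a sanity check, for a single point $x$ in dimension one the squared discrepancy reduces to $x^3/3 + (1-x)^3/3$, which is minimized at $x=1/2$ with value $1/12$.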

Related content

The curse of dimensionality refers to the various phenomena that arise when analyzing and organizing data in high-dimensional spaces, phenomena that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience.
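A standard numerical illustration of this effect is how quickly the inscribed unit ball becomes negligible relative to its enclosing cube as the dimension grows; a minimal sketch:

```python
import math

def unit_ball_fraction(d):
    """Fraction of the volume of the cube [-1,1]^d occupied by the
    inscribed unit ball: pi^(d/2) / Gamma(d/2 + 1) / 2^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) / 2 ** d

# the fraction decays super-exponentially with the dimension
for d in (2, 5, 10, 20):
    print(d, unit_ball_fraction(d))
```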

An important issue in many multivariate regression problems is to eliminate candidate predictors with null predictor vectors. In the large-dimensional (LD) setting, where the numbers of responses and predictors are large, model selection encounters a scalability challenge. Knock-one-out (KOO) statistics hold promise to meet this challenge. In this paper, the almost sure limits and the central limit theorem of the KOO statistics are derived under the LD setting and mild distributional assumptions on the errors (finite fourth moments). These theoretical results guarantee the strong consistency of a subset selection rule based on the KOO statistics with a general threshold. To enhance the robustness of the selection rule, we also propose a bootstrap threshold for the KOO approach. Simulation results support our conclusions and demonstrate that the selection probabilities of the KOO approach with the bootstrap threshold outperform those of methods using the Akaike information, Bayesian information, and Mallows' $C_p$ thresholds. We apply the proposed KOO approach and the information-threshold-based alternatives to a chemometrics dataset and a yeast cell-cycle dataset; the results suggest that our proposed method identifies useful models.
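As an illustrative sketch only (not the paper's exact statistic or threshold), a knock-one-out comparison can be based on the increase in the residual log-determinant when a single candidate predictor is removed from the full model; the dimensions and coefficients below are assumptions for synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, k = 400, 3, 8          # sample size, responses, candidate predictors
X = rng.standard_normal((n, k))
B = np.zeros((k, q))
B[:3] = rng.uniform(0.8, 1.2, size=(3, q))   # only first 3 predictors active
Y = X @ B + rng.standard_normal((n, q))

def resid_logdet(Xmat):
    # log-determinant of the residual covariance after OLS on Xmat
    beta, *_ = np.linalg.lstsq(Xmat, Y, rcond=None)
    R = Y - Xmat @ beta
    return np.linalg.slogdet(R.T @ R / n)[1]

full = resid_logdet(X)
# KOO-style statistic: increase in residual log-determinant when
# predictor j is knocked out of the full model
koo = np.array([resid_logdet(np.delete(X, j, axis=1)) - full for j in range(k)])
print(koo.round(3))
```

With this signal strength, the statistics for the three active predictors dominate those of the null predictors, which is the separation a threshold rule exploits.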

Gaussian process regression underpins countless academic and industrial applications of machine learning and statistics, with maximum likelihood estimation routinely used to select appropriate parameters for the covariance kernel. However, it remains an open problem to establish the circumstances in which maximum likelihood estimation is well-posed, that is, when the predictions of the regression model are insensitive to small perturbations of the data. This article identifies scenarios where the maximum likelihood estimator fails to be well-posed, in that the predictive distributions are not Lipschitz in the data with respect to the Hellinger distance. These failure cases occur in the noiseless data setting, for any Gaussian process with a stationary covariance function whose lengthscale parameter is estimated using maximum likelihood. Although the failure of maximum likelihood estimation is part of Gaussian process folklore, these rigorous theoretical results appear to be the first of their kind. The implication of these negative results is that well-posedness may need to be assessed post-hoc, on a case-by-case basis, when maximum likelihood estimation is used to train a Gaussian process model.
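A minimal sketch of the setting being analyzed, under assumed choices (squared-exponential kernel, a small stabilizing jitter, grid search in place of a continuous optimizer): maximum likelihood estimation of the lengthscale from noiseless data generated by a GP:

```python
import numpy as np

rng = np.random.default_rng(1)

def se_kernel(x, ell):
    # squared-exponential (stationary) covariance with lengthscale ell
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def log_marginal(y, K):
    # Gaussian log marginal likelihood with jitter for numerical stability
    n = len(y)
    L = np.linalg.cholesky(K + 1e-6 * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2 * np.pi))

x = np.sort(rng.uniform(0, 1, 40))
true_ell = 0.5
y = np.linalg.cholesky(se_kernel(x, true_ell) + 1e-6 * np.eye(40)) \
    @ rng.standard_normal(40)

grid = np.geomspace(0.02, 3.0, 60)
ell_hat = grid[np.argmax([log_marginal(y, se_kernel(x, e)) for e in grid])]
print("estimated lengthscale:", ell_hat)
```

The article's point is that the resulting predictive distributions, as a map from the data, need not be Lipschitz in this noiseless regime, so stability of `ell_hat` under data perturbations is not guaranteed.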

This paper proposes a new approach to identifying the effective cointegration rank in high-dimensional unit-root (HDUR) time series from a prediction perspective, using reduced-rank regression. For a HDUR process $\mathbf{x}_t\in \mathbb{R}^N$ and a stationary series $\mathbf{y}_t\in \mathbb{R}^p$ of interest, our goal is to predict future values of $\mathbf{y}_t$ using $\mathbf{x}_t$ and lagged values of $\mathbf{y}_t$. The proposed framework consists of a two-step estimation procedure. First, principal component analysis is used to identify all cointegrating vectors of $\mathbf{x}_t$. Second, the cointegrated stationary series are used as regressors, together with some lagged variables of $\mathbf{y}_t$, to predict $\mathbf{y}_t$. The estimated reduced rank is then defined as the effective cointegration rank of $\mathbf{x}_t$. When the autoregressive coefficient matrices are sparse (or of low rank), we apply the least absolute shrinkage and selection operator (or reduced-rank techniques) to estimate the autoregressive coefficients when the dimension involved is high. Theoretical properties of the estimators are established as the dimensions $p$ and $N$ and the sample size $T$ tend to infinity. Both simulated and real examples are used to illustrate the proposed framework, and the empirical application suggests that the proposed procedure fares well in predicting stock returns.
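The first step can be illustrated on a toy system: when two series share a random-walk factor, the eigenvector of the sample covariance of the levels with the smallest eigenvalue estimates the cointegrating vector (a simplified two-dimensional sketch, not the paper's full procedure):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 2000
w = np.cumsum(rng.standard_normal(T))          # shared random-walk factor
x = np.column_stack([w + 0.5 * rng.standard_normal(T),
                     w + 0.5 * rng.standard_normal(T)])

# PCA on the levels: the nonstationary factor dominates the variance, so
# the smallest-eigenvalue direction estimates the cointegrating vector
xc = x - x.mean(axis=0)
eigval, eigvec = np.linalg.eigh(xc.T @ xc / T)
b_hat = eigvec[:, 0]                            # smallest-eigenvalue direction
target = np.array([1.0, -1.0]) / np.sqrt(2.0)   # true cointegrating vector
print("alignment with (1,-1)/sqrt(2):", abs(b_hat @ target))
```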

In this paper we obtain quantitative {\it Bernstein-von Mises type} bounds on the normal approximation of the posterior distribution in exponential family models when centering either around the posterior mode or around the maximum likelihood estimator. Our bounds, obtained through a version of Stein's method, are non-asymptotic and data-dependent; they are of the correct order both in the total variation and Wasserstein distances, as well as for approximations of expectations of smooth functions of the posterior. All our results are valid for univariate and multivariate posteriors alike, and do not require a conjugate prior setting. We illustrate our findings on a variety of exponential family distributions, including the Poisson and multinomial distributions and the normal distribution with unknown mean and variance. The resulting bounds have an explicit dependence on the prior distribution and on the sufficient statistics of the data from the sample, and thus provide insight into how these factors may affect the quality of the normal approximation.
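A numerical illustration in the Poisson case, with an assumed Gamma(1,1) prior (which happens to be conjugate, although the paper's results do not require conjugacy): the total variation distance between the posterior and its normal approximation at the mode shrinks as the sample grows:

```python
import numpy as np
from math import lgamma, log, pi

rng = np.random.default_rng(3)

def tv_gamma_vs_laplace(xsum, n, a0=1.0, b0=1.0):
    """TV distance between the Gamma(a0+xsum, b0+n) posterior (Poisson
    likelihood, Gamma prior) and its normal approximation at the mode."""
    a, b = a0 + xsum, b0 + n
    mode = (a - 1) / b
    sd = np.sqrt(a - 1) / b          # 1/sqrt(observed information) at the mode
    lam = np.linspace(max(mode - 8 * sd, 1e-9), mode + 8 * sd, 20001)
    log_gamma_pdf = a * log(b) - lgamma(a) + (a - 1) * np.log(lam) - b * lam
    normal_pdf = np.exp(-0.5 * ((lam - mode) / sd) ** 2) / (sd * np.sqrt(2 * pi))
    dlam = lam[1] - lam[0]
    return 0.5 * np.sum(np.abs(np.exp(log_gamma_pdf) - normal_pdf)) * dlam

tv25 = tv_gamma_vs_laplace(rng.poisson(2.0, 25).sum(), 25)
tv400 = tv_gamma_vs_laplace(rng.poisson(2.0, 400).sum(), 400)
print(tv25, tv400)
```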

We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term ``(little) q-function''. This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a ``q-learning'' theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2022b). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms.
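One ingredient can be sketched concretely: when the q-function is quadratic in the action, the Gibbs measure it generates, $\pi(a) \propto \exp(q(x,a)/\gamma)$, is Gaussian with variance equal to the temperature $\gamma$, so it can be sampled and checked directly (the numbers below are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
gamma = 0.3                       # temperature (entropy-regularization weight)
mu = 1.2                          # assumed maximizer of q in the action

def q(a):
    # illustrative (little) q-function, quadratic in the action
    return -0.5 * (a - mu) ** 2

# Gibbs policy pi(a) ∝ exp(q(a)/gamma); for this q it is N(mu, gamma).
# Sample it on a fine grid and verify the moments.
grid = np.linspace(mu - 6, mu + 6, 4001)
w = np.exp(q(grid) / gamma)
w /= w.sum()
samples = rng.choice(grid, size=200_000, p=w)
print(samples.mean(), samples.var())
```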

We propose a novel technique for analyzing adaptive sampling called the {\em Simulator}. Our approach differs from existing methods by considering not how much information could be gathered by any fixed sampling strategy, but how difficult it is to distinguish a good sampling strategy from a bad one given the limited amount of data collected up to any given time. This change of perspective allows us to match the strength of both Fano and change-of-measure techniques, without succumbing to the limitations of either method. For concreteness, we apply our techniques to a structured multi-armed bandit problem in the fixed-confidence pure exploration setting, where we show that the constraints on the means imply a substantial gap between the moderate-confidence sample complexity and the asymptotic sample complexity as $\delta \to 0$ found in the literature. We also prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors. Moreover, our lower bounds zero in on the number of times each \emph{individual} arm needs to be pulled, uncovering new phenomena which are drowned out in the aggregate sample complexity. Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first {\em practical} algorithm of its kind for the latter problem which removes extraneous log factors, and outperforms the state of the art in experiments.
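For context, a classical baseline in the fixed-confidence setting is successive elimination with Hoeffding confidence radii; the sketch below is this textbook algorithm, not the paper's Simulator-based method:

```python
import numpy as np

def successive_elimination(means, delta, rng):
    """Textbook successive-elimination best-arm identification with
    Hoeffding confidence radii; arms are Bernoulli with the given means."""
    k = len(means)
    active = list(range(k))
    sums = np.zeros(k)
    pulls = 0
    while len(active) > 1:
        pulls += 1
        for a in active:                      # pull every surviving arm once
            sums[a] += rng.random() < means[a]
        rad = np.sqrt(np.log(4 * k * pulls ** 2 / delta) / (2 * pulls))
        best = max(sums[a] for a in active)
        # drop arms whose upper confidence bound falls below the leader's lower bound
        active = [a for a in active if sums[a] / pulls >= best / pulls - 2 * rad]
    return active[0]

rng = np.random.default_rng(5)
winner = successive_elimination([0.9, 0.6, 0.5, 0.3], delta=0.05, rng=rng)
print(winner)
```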

Regular ring lattices (RRLs) are undirected circulant graphs constructed from a cycle graph, wherein each node is additionally connected to the pairs of neighbors at increasing hop distances along the cycle, up to a range determined by the vertex degree. This kind of network topology is extensively adopted in several graph-based distributed scalable protocols, and its spectral properties often play a central role in determining the convergence rates of such algorithms. In this work, basic properties of RRL graphs and the eigenvalues of the corresponding Laplacian and Randi\'{c} matrices are investigated. A detailed characterization of the spectra of these matrices is given, and their relation with the Dirichlet kernel is illustrated. As a consequence, the Fiedler value of such a network topology is found analytically. Bounds for the spectral radius of the Laplacian matrix and the essential spectral radius of the Randi\'{c} matrix of RRLs are also provided, together with conjectures on the latter quantities.
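Because an RRL is circulant, its Laplacian eigenvalues are $\lambda_j = 2k - 2\sum_{m=1}^{k}\cos(2\pi j m/n)$, i.e., $(2k+1)$ minus a Dirichlet kernel evaluated at $2\pi j/n$; a quick numerical check of this identity:

```python
import numpy as np

n, k = 12, 3                 # ring of n nodes, k neighbors on each side

# adjacency matrix of the regular ring lattice C_n(1, ..., k)
A = np.zeros((n, n))
for i in range(n):
    for m in range(1, k + 1):
        A[i, (i + m) % n] = A[i, (i - m) % n] = 1
L = np.diag(A.sum(axis=1)) - A

# circulant structure: eigenvalues 2k - 2 * sum_{m=1}^k cos(2*pi*j*m/n),
# i.e. (2k+1) minus the Dirichlet kernel at 2*pi*j/n
j = np.arange(n)
theory = 2 * k - 2 * sum(np.cos(2 * np.pi * j * m / n) for m in range(1, k + 1))
ok = np.allclose(np.sort(np.linalg.eigvalsh(L)), np.sort(theory))
print(ok)
```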

The error exponent of fixed-length lossy source coding was established by Marton. Ahlswede showed that this exponent can be discontinuous at a rate $R$, depending on the probability distribution $P$ of the given information source and the distortion measure $d(x,y)$. The reason for the discontinuity in the error exponent is that there exists $(d,\Delta)$ such that the rate-distortion function $R(\Delta|P)$ is neither concave nor quasi-concave with respect to $P$. Arimoto's algorithm for computing the error exponent in lossy source coding is based on Blahut's parametric representation of the error exponent. However, Blahut's parametric representation is a lower convex envelope of Marton's exponent, and the two do not generally agree. The contribution of this paper is to provide a parametric representation that matches the inverse function of Marton's exponent exactly, thus avoiding the problem of the rate-distortion function being non-convex with respect to $P$. The optimal distribution for fixed parameters can be obtained using Arimoto's algorithm. Performing a nonconvex optimization over the parameters successfully yields the inverse function of Marton's exponent.
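Parametric representations of this kind are computed with Blahut-Arimoto-type fixed-point iterations. As background (not the paper's new representation), the sketch below runs the standard Blahut-Arimoto rate-distortion iteration for a binary source under Hamming distortion, where the slope parameter $s=\ln 9$ targets the point $\Delta = 0.1$ on the curve $R(\Delta) = h(p) - h(\Delta)$:

```python
import numpy as np

def blahut_arimoto_rd(p, dist, s, iters=3000):
    """Point on the rate-distortion curve with slope -s, via the standard
    Blahut-Arimoto iteration. Returns (distortion, rate in nats)."""
    A = np.exp(-s * dist)
    r = np.full(dist.shape[1], 1.0 / dist.shape[1])   # output marginal
    for _ in range(iters):
        c = r * A                                     # unnormalized test channel
        c /= c.sum(axis=1, keepdims=True)
        r = p @ c                                     # update output marginal
    D = np.sum(p[:, None] * c * dist)
    R = np.sum(p[:, None] * c * np.log(c / r))
    return D, R

p = np.array([0.7, 0.3])                # binary source with P(X=1) = 0.3
dist = 1.0 - np.eye(2)                  # Hamming distortion
D, R = blahut_arimoto_rd(p, dist, s=np.log(9.0))
print(D, R)
```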

In learning theory, a standard assumption is that the data is generated from a finite mixture model. But what happens when the number of components is not known in advance? The problem of estimating the number of components, also called model selection, is important in its own right, but there are essentially no known efficient algorithms with provable guarantees, let alone ones that can tolerate adversarial corruptions. In this work, we study the problem of robust model selection for univariate Gaussian mixture models (GMMs). Given $\textsf{poly}(k/\epsilon)$ samples from a distribution that is $\epsilon$-close in TV distance to a GMM with $k$ components, we can construct a GMM with $\widetilde{O}(k)$ components that approximates the distribution to within $\widetilde{O}(\epsilon)$ in $\textsf{poly}(k/\epsilon)$ time. Thus we are able to approximately determine the minimum number of components needed to fit the distribution within a logarithmic factor. Prior to our work, the only known algorithms for learning arbitrary univariate GMMs either output significantly more than $k$ components (e.g., $k/\epsilon^2$ components for kernel density estimates) or run in time exponential in $k$. Moreover, by adapting our techniques we obtain similar results for reconstructing Fourier-sparse signals.
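As a building block only (a generic sketch, not the paper's robust algorithm), a plain EM fit for a univariate GMM illustrates the object whose component count the selection problem concerns:

```python
import numpy as np

rng = np.random.default_rng(6)

def em_gmm_1d(x, k, iters=200):
    """Plain EM for a univariate k-component Gaussian mixture."""
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))   # spread-out initial means
    sig = np.full(k, x.std())
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities (log-space for numerical stability)
        logp = (-0.5 * ((x[:, None] - mu) / sig) ** 2
                - np.log(sig) + np.log(w))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: reweighted moment updates
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sig = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-9
    return w, mu, sig

x = np.concatenate([rng.normal(-3, 1, 600), rng.normal(3, 1, 400)])
w, mu, sig = em_gmm_1d(x, k=2)
print(np.sort(mu))
```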

Sampling from the posterior is a key technical problem in Bayesian statistics. Rigorous guarantees are difficult to obtain for Markov Chain Monte Carlo algorithms of common use. In this paper, we study an alternative class of algorithms based on diffusion processes. The diffusion is constructed in such a way that, at its final time, it approximates the target posterior distribution. The stochastic differential equation that defines this process is discretized (using an Euler scheme) to provide an efficient sampling algorithm. Our construction of the diffusion is based on the notion of observation process and the related idea of stochastic localization. Namely, the diffusion process describes a sample that is conditioned on increasing information. An overlapping family of processes was derived in the machine learning literature via time-reversal. We apply this method to posterior sampling in the high-dimensional symmetric spiked model. We observe a rank-one matrix ${\boldsymbol \theta}{\boldsymbol \theta}^{\sf T}$ corrupted by Gaussian noise, and want to sample ${\boldsymbol \theta}$ from the posterior. Our sampling algorithm makes use of an oracle that computes the posterior expectation of ${\boldsymbol \theta}$ given the data and the additional observation process. We provide an efficient implementation of this oracle using approximate message passing. We thus develop the first sampling algorithm for this problem with approximation guarantees.
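In a toy one-dimensional Gaussian case the whole pipeline is available in closed form: the posterior-mean oracle is explicit, and an Euler discretization of the observation-process SDE $\mathrm{d}y_t = \mathbb{E}[\theta \mid y_t]\,\mathrm{d}t + \mathrm{d}B_t$ reproduces the target law (a sketch with assumed parameters, far simpler than the spiked-matrix setting):

```python
import numpy as np

rng = np.random.default_rng(7)
mu0, sig0 = 1.0, 1.0          # toy target: theta ~ N(mu0, sig0^2)
T, dt, chains = 50.0, 0.02, 2000
steps = int(T / dt)

# Observation process y_t = t*theta + B_t; for a Gaussian target the
# oracle posterior mean E[theta | y_t] is available in closed form.
def oracle_mean(y, t):
    return (mu0 / sig0 ** 2 + y) / (1.0 / sig0 ** 2 + t)

# Euler discretization of dy_t = E[theta | y_t] dt + dB_t
y = np.zeros(chains)
for i in range(steps):
    t = i * dt
    y += oracle_mean(y, t) * dt + np.sqrt(dt) * rng.standard_normal(chains)

# at the final time the localized posterior mean is approximately a sample
samples = oracle_mean(y, T)
print(samples.mean(), samples.std())
```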
