We show that the error probability of reconstructing kernel matrices from Random Fourier Features for the Gaussian kernel function is at most $\mathcal{O}(R^{2/3} \exp(-D))$, where $D$ is the number of random features and $R$ is the diameter of the data domain. We also provide an information-theoretic method-independent lower bound of $\Omega((1-\exp(-R^2)) \exp(-D))$. Compared to prior work, we are the first to show that the error probability for random Fourier features is independent of the dimensionality of data points. As applications of our theory, we obtain dimension-independent bounds for kernel ridge regression and support vector machines.
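As an illustration of the construction the abstract refers to, the following minimal sketch approximates a Gaussian kernel value with $D$ random Fourier features. It is not the paper's analysis or code: the function names, the bandwidth parameter `sigma`, and the cosine-with-random-phase feature map are assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sample_rff_params(dim, num_features, sigma=1.0, seed=0):
    """Draw frequencies w ~ N(0, I / sigma^2) and phases b ~ U[0, 2 pi)."""
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return freqs, phases


def rff_map(x, freqs, phases):
    """Feature map z(x) with entries sqrt(2 / D) * cos(w . x + b)."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)]


def rff_kernel(x, y, freqs, phases):
    """Approximate kernel value as the inner product z(x) . z(y)."""
    zx = rff_map(x, freqs, phases)
    zy = rff_map(y, freqs, phases)
    return sum(a * b for a, b in zip(zx, zy))
```

Increasing `num_features` (the $D$ of the abstract) shrinks the gap between `rff_kernel` and `gaussian_kernel`. The sketch only demonstrates the approximation itself and says nothing about the error-probability rates stated above.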

Related content

Curse of dimensionality · Approximation · Networking · Independent · Model evaluation

January 11, 2022

Ranks of Tensor Networks for Eigenspace Projections and the Curse of Dimensionality

Mazen Ali

from arxiv, 19 pages, 1 figure. A more detailed exposition can be found in arXiv:1904.03507. Erratum: Example 3.3 does not satisfy Assumption 3.1 (4)

The hierarchical (multi-linear) rank of an order-$d$ tensor is key in determining the cost of representing a tensor as a (tree) Tensor Network (TN). In general, it is known that, for a fixed accuracy, a tensor with random entries cannot be expected to be efficiently approximable without the curse of dimensionality, i.e., a complexity growing exponentially with $d$. In this work, we show that the ground state projection (GSP) of a class of unbounded Hamiltonians can be approximately represented as an operator of low effective dimensionality that is independent of the (high) dimension $d$ of the GSP. This allows the GSP to be approximated without the curse of dimensionality.

Optimizer · INFORMS · Performer · Estimation/Estimator · CASES

January 9, 2022

Computing optimal experimental designs on finite sets by log-determinant gradient flow

Federico Piazzon

Optimal experimental designs are probability measures with finite support enjoying an optimality property for the computation of least squares estimators. We present an algorithm for computing optimal designs on finite sets based on the long-time asymptotics of the gradient flow of the log-determinant of the so-called information matrix. We prove the convergence of the proposed algorithm, and provide a sharp estimate of its rate of convergence. Numerical experiments are performed on a few test cases using the new MATLAB package OptimalDesignComputation.

Concatenation · Monte Carlo · Integration · Weight · Functional

January 7, 2022

A note on concatenation of quasi-Monte Carlo and plain Monte Carlo rules in high dimensions

Takashi Goda

from arxiv, 13 pages

In this note, we study a concatenation of quasi-Monte Carlo and plain Monte Carlo rules for high-dimensional numerical integration in weighted function spaces. In particular, we consider approximating the integral of periodic functions defined over the $s$-dimensional unit cube by using rank-1 lattice point sets only for the first $d\, (<s)$ coordinates and random points for the remaining $s-d$ coordinates. We prove that, by exploiting a decay of the weights of function spaces, almost the optimal order of the mean squared worst-case error is achieved by such a concatenated quadrature rule as long as $d$ scales at most linearly with the number of points. This result might be useful for numerical integration in extremely high dimensions, such as partial differential equations with random coefficients for which even the standard fast component-by-component algorithm is considered computationally expensive.

Saddle point · Lipschitz continuity · Continuity · Tractable · Lipschitz

January 7, 2022

Stochastic Saddle Point Problems with Decision-Dependent Distributions

Killian Wood,Emiliano Dall'Anese

This paper focuses on stochastic saddle point problems with decision-dependent distributions in both the static and time-varying settings. These are problems whose objective is the expected value of a stochastic payoff function, where random variables are drawn from a distribution induced by a distributional map. For general distributional maps, finding saddle points is computationally burdensome, even if the distribution is known. To enable a tractable solution approach, we introduce the notion of equilibrium points -- which are saddle points for the stationary stochastic minimax problem that they induce -- and provide conditions for their existence and uniqueness. We demonstrate that the distance between the two classes of solutions is bounded provided that the objective has a strongly-convex-strongly-concave payoff and Lipschitz continuous distributional map. We develop deterministic and stochastic primal-dual algorithms and demonstrate their convergence to the equilibrium point. In particular, by modeling errors emerging from a stochastic gradient estimator as sub-Weibull random variables, we provide error bounds in expectation and in high probability that hold for each iteration; moreover, we show convergence to a neighborhood in expectation and almost surely. Finally, we investigate a condition on the distributional map -- which we call opposing mixture dominance -- that ensures the objective is strongly-convex-strongly-concave. Under this assumption, we show that primal-dual algorithms converge to the saddle points in a similar fashion.

Estimation/Estimator · Approximation · Fourier transform · Discretization · Less

January 6, 2022

From ESPRIT to ESPIRA: Estimation of Signal Parameters by Iterative Rational Approximation

Nadiia Derevianko,Gerlind Plonka,Markus Petz

from arxiv, 36 pages, 18 figures

We introduce a new method for Estimation of Signal Parameters based on Iterative Rational Approximation (ESPIRA) for sparse exponential sums. Our algorithm uses the AAA algorithm for rational approximation of the discrete Fourier transform of the given equidistant signal values. We show that ESPIRA can be interpreted as a matrix pencil method applied to Loewner matrices. These Loewner matrices are closely connected with the Hankel matrices which are usually employed for signal recovery. Due to the construction of the Loewner matrices via an adaptive selection of index sets, the matrix pencil method is stabilized. ESPIRA achieves similar recovery results for exact data as ESPRIT and the matrix pencil method but with less computational effort. Moreover, ESPIRA strongly outperforms ESPRIT and the matrix pencil method for noisy data and for signal approximation by short exponential sums.

ETS · Minimizer · Exact · Marginalization · Precision/Accuracy

January 5, 2022

High-dimensional variable selection with heterogeneous signals: A precise asymptotic perspective

Saptarshi Roy,Ambuj Tewari,Ziwei Zhu

from arxiv, 28 pages, 3 figures

We study the problem of exact support recovery for high-dimensional sparse linear regression when the signals are weak, rare and possibly heterogeneous. Specifically, we fix the minimum signal magnitude at the information-theoretic optimal rate and investigate the asymptotic selection accuracy of best subset selection (BSS) and marginal screening (MS) procedures under independent Gaussian design. Despite the ideal setup, somewhat surprisingly, marginal screening can fail to achieve exact recovery with probability converging to one in the presence of heterogeneous signals, whereas BSS enjoys model consistency whenever the minimum signal strength is above the information-theoretic threshold. To mitigate the computational issue of BSS, we also propose a surrogate two-stage algorithm called ETS (Estimate Then Screen) based on iterative hard thresholding and gradient coordinate screening, and we show that ETS shares exactly the same asymptotic optimality in terms of exact recovery as BSS. Finally, we present a simulation study comparing ETS with LASSO and marginal screening. The numerical results echo our asymptotic theory even for realistic values of the sample size, dimension and sparsity.

Noise distribution · Optimizer · Noise · Class · Functional

January 4, 2022

Optimal design of the Barker proposal and other locally-balanced Metropolis-Hastings algorithms

Jure Vogrinc,Samuel Livingstone,Giacomo Zanella

from arxiv, 24 pages, 4 figures

We study the class of first-order locally-balanced Metropolis--Hastings algorithms introduced in Livingstone & Zanella (2021). To choose a specific algorithm within the class the user must select a balancing function $g:\mathbb{R} \to \mathbb{R}$ satisfying $g(t) = tg(1/t)$, and a noise distribution for the proposal increment. Popular choices within the class are the Metropolis-adjusted Langevin algorithm and the recently introduced Barker proposal. We first establish a universal limiting optimal acceptance rate of 57% and scaling of $n^{-1/3}$ as the dimension $n$ tends to infinity among all members of the class under mild smoothness assumptions on $g$ and when the target distribution for the algorithm is of the product form. In particular we obtain an explicit expression for the asymptotic efficiency of an arbitrary algorithm in the class, as measured by expected squared jumping distance. We then consider how to optimise this expression under various constraints. We derive an optimal choice of noise distribution for the Barker proposal, optimal choice of balancing function under a Gaussian noise distribution, and optimal choice of first-order locally-balanced algorithm among the entire class, which turns out to depend on the specific target distribution. Numerical simulations confirm our theoretical findings and in particular show that a bi-modal choice of noise distribution in the Barker proposal gives rise to a practical algorithm that is consistently more efficient than the original Gaussian version.
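The balancing condition $g(t) = t\,g(1/t)$ quoted in this abstract is easy to check numerically. The sketch below tests three textbook candidates on a few positive values of $t$. The helper names and the choice of test points are assumptions made for illustration, and the code does not implement the proposal mechanism studied in the paper.

```python
import math


def barker(t):
    """g(t) = t / (1 + t), the form associated with Barker's rule."""
    return t / (1.0 + t)


def square_root(t):
    """g(t) = sqrt(t)."""
    return math.sqrt(t)


def minimum(t):
    """g(t) = min(1, t)."""
    return min(1.0, t)


def is_balancing(g, ts=(0.1, 0.5, 1.0, 2.0, 7.5), rel_tol=1e-12):
    """Check g(t) = t * g(1 / t) on a few positive test values.
    Passing is a necessary sanity check, not a proof."""
    return all(math.isclose(g(t), t * g(1.0 / t), rel_tol=rel_tol) for t in ts)
```

For instance, `is_balancing(barker)` holds because $t \cdot \frac{1/t}{1 + 1/t} = \frac{t}{1+t}$, whereas a function such as $g(t) = t^2$ fails the check.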

Approximation · Monte Carlo · Approximation error · FAST · SimPLe

January 4, 2022

Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections

Kimia Nadjahi,Alain Durmus,Pierre E. Jacob,Roland Badeau,Umut Şimşekli

from arxiv, Published at NeurIPS 2021

The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications as an alternative to the Wasserstein distance and offers significant computational and statistical benefits. Since it is defined as an expectation over random projections, SW is commonly approximated by Monte Carlo. We adopt a new perspective to approximate SW by making use of the concentration of measure phenomenon: under mild assumptions, one-dimensional projections of a high-dimensional random vector are approximately Gaussian. Based on this observation, we develop a simple deterministic approximation for SW. Our method does not require sampling a number of random projections, and is therefore both accurate and easy to use compared to the usual Monte Carlo approximation. We derive nonasymptotic guarantees for our approach, and show that the approximation error goes to zero as the dimension increases, under a weak dependence condition on the data distribution. We validate our theoretical findings on synthetic datasets, and illustrate the proposed approximation on a generative modeling problem.
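For context, the "usual Monte Carlo approximation" that this abstract contrasts itself with can be sketched as below. This is the baseline estimator, not the deterministic method the paper proposes. The function names, the restriction to order 2, and the equal-sample-size assumption are choices made for this illustration.

```python
import math
import random


def random_direction(dim, rng):
    """Uniform direction on the unit sphere via a normalized Gaussian."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def sliced_wasserstein_sq(xs, ys, num_projections=500, seed=0):
    """Monte Carlo estimate of the squared order-2 SW distance between
    two empirical measures with the same number of points."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(num_projections):
        theta = random_direction(dim, rng)
        px = sorted(sum(t * c for t, c in zip(theta, x)) for x in xs)
        py = sorted(sum(t * c for t, c in zip(theta, y)) for y in ys)
        # In 1D, W_2^2 between equal-size samples pairs up sorted points.
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return total / num_projections
```

The estimate depends on `num_projections` and on the random seed, which is the sampling cost the abstract's deterministic approximation is designed to avoid.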

Stochastic gradient descent · SGD · Noise · Minibatch · Optimizer

January 4, 2022

Non Asymptotic Bounds for Optimization via Online Multiplicative Stochastic Gradient Descent

Riddhiman Bhattacharya

The gradient noise of Stochastic Gradient Descent (SGD) is considered to play a key role in its properties (e.g. escaping low potential points and regularization). Past research has indicated that the covariance of the error of minibatch SGD plays a critical role in determining its regularization and escape from low potential points. However, how strongly the distribution of the error influences the behavior of the algorithm has received little attention. Motivated by some new research in this area, we prove universality results by showing that noise classes that have the same mean and covariance structure as minibatch SGD have similar properties. We mainly consider the Multiplicative Stochastic Gradient Descent (M-SGD) algorithm as introduced by Wu et al., which has a much more general noise class than minibatch SGD. We establish nonasymptotic bounds for the M-SGD algorithm mainly with respect to the Stochastic Differential Equation corresponding to minibatch SGD. We also show that the M-SGD error is approximately a scaled Gaussian distribution with mean $0$ at any fixed point of the M-SGD algorithm. We also establish bounds for the convergence of the M-SGD algorithm in the strongly convex regime.

Optimizer · Lipschitz continuity · Regularization term · Continuity · Lipschitz

June 1, 2018

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Kevin Scaman,Francis Bach,Sébastien Bubeck,Yin Tat Lee,Laurent Massoulié

from arxiv, 17 pages

In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.