Many recent problems in signal processing and machine learning, such as compressed sensing, image restoration, matrix/tensor recovery, and non-negative matrix factorization, can be cast as constrained optimization. Projected gradient descent (PGD) is a simple yet efficient method for solving such constrained optimization problems. Local convergence analysis furthers our understanding of its asymptotic behavior near the solution, offering sharper bounds on the convergence rate than global convergence analysis. However, local guarantees often appear scattered across problem-specific areas of machine learning and signal processing. This manuscript presents a unified framework for the local convergence analysis of projected gradient descent in the context of constrained least squares. The proposed analysis offers insights into pivotal local convergence properties such as the conditions for linear convergence, the region of convergence, the exact asymptotic rate of convergence, and the bound on the number of iterations needed to reach a given level of accuracy. To demonstrate the applicability of the proposed approach, we present a recipe for the convergence analysis of PGD and illustrate it via an end-to-end application on four fundamental problems, namely, linearly constrained least squares, sparse recovery, least squares with a unit-norm constraint, and matrix completion.
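As an illustration of the template analyzed above, the following is a minimal sketch (not the manuscript's recipe) of projected gradient descent for a constrained least-squares problem, assuming access to a Euclidean projection onto the constraint set; the sparse-recovery projection and all variable names are illustrative.

```python
import numpy as np

def projected_gradient_descent(A, y, project, x0, step=None, n_iter=500):
    """Minimize 0.5 * ||A x - y||^2 subject to x in C, where `project`
    is the Euclidean projection onto C (illustrative sketch)."""
    if step is None:
        # 1/L with L = sigma_max(A)^2, the Lipschitz constant of the gradient
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = x0.copy()
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)          # gradient of the least-squares objective
        x = project(x - step * grad)      # gradient step followed by projection
    return x

# Example constraint set: k-sparse vectors (keep the k largest entries in magnitude)
def project_k_sparse(x, k=5):
    z = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    z[idx] = x[idx]
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = project_k_sparse(rng.standard_normal(100), k=5)
y = A @ x_true
x_hat = projected_gradient_descent(A, y, project_k_sparse, np.zeros(100))
```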
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method. At each iteration, the algorithm is given oracle access to a stochastic estimate of the Hessian matrix. The oracle model includes popular algorithms such as Subsampled Newton and Newton Sketch, which can efficiently construct stochastic Hessian estimates for many tasks. Despite using second-order information, these existing methods do not exhibit superlinear convergence unless the stochastic noise is gradually reduced to zero over the iterations, which would lead to a computational blow-up in the per-iteration cost. We address this limitation with Hessian averaging: instead of using the most recent Hessian estimate, our algorithm maintains an average of all past estimates. This reduces the stochastic noise while avoiding the computational blow-up. We show that this scheme enjoys local $Q$-superlinear convergence with a non-asymptotic rate of $(\Upsilon\sqrt{\log (t)/t}\,)^{t}$, where $\Upsilon$ is proportional to the level of stochastic noise in the Hessian oracle. A potential drawback of this (uniform averaging) approach is that the averaged estimates contain Hessian information from the global phase of the iteration, i.e., before the iterates converge to a local neighborhood. This leads to a distortion that may substantially delay the superlinear convergence until long after the local neighborhood is reached. To address this drawback, we study a number of weighted averaging schemes that assign larger weights to recent Hessians, so that the superlinear convergence arises sooner, albeit at a slightly slower rate. Remarkably, we show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage, and still enjoys a superlinear convergence rate nearly matching (up to a logarithmic factor) that of uniform Hessian averaging.
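The uniform-averaging update described above can be sketched in a few lines; the toy quadratic, the noise model, and the function names below are assumptions for illustration and not the paper's oracle construction.

```python
import numpy as np

def averaged_newton(grad, hess_oracle, x0, n_iter=50):
    """Stochastic Newton with uniform Hessian averaging:
    H_bar_t = ((t-1) H_bar_{t-1} + H_t) / t, then x <- x - H_bar_t^{-1} grad(x)."""
    x = x0.copy()
    H_bar = np.zeros((x.size, x.size))
    for t in range(1, n_iter + 1):
        H_t = hess_oracle(x)                       # noisy Hessian estimate
        H_bar = ((t - 1) * H_bar + H_t) / t        # running (uniform) average
        x = x - np.linalg.solve(H_bar, grad(x))    # Newton step with averaged Hessian
    return x

# Toy example: strongly convex quadratic f(x) = 0.5 x^T Q x with a noisy Hessian oracle
rng = np.random.default_rng(0)
Q = np.diag(np.linspace(1.0, 10.0, 5))
grad = lambda x: Q @ x

def hess_oracle(x):
    N = 0.1 * rng.standard_normal((5, 5))
    return Q + (N + N.T) / 2                       # symmetric additive noise

x_star = averaged_newton(grad, hess_oracle, rng.standard_normal(5))
```

A weighted variant would simply replace the factor $1/t$ with a weight that decays more slowly, placing more mass on recent estimates.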
Escaping saddle points and finding local minima is a central problem in nonconvex optimization. Perturbed gradient methods are perhaps the simplest approach to this problem. However, to find $(\epsilon, \sqrt{\epsilon})$-approximate local minima, the best existing stochastic gradient complexity for this type of algorithm is $\tilde O(\epsilon^{-3.5})$, which is not optimal. In this paper, we propose LENA (Last stEp shriNkAge), a faster perturbed stochastic gradient framework for finding local minima. We show that LENA with stochastic gradient estimators such as SARAH/SPIDER and STORM can find $(\epsilon, \epsilon_{H})$-approximate local minima within $\tilde O(\epsilon^{-3} + \epsilon_{H}^{-6})$ stochastic gradient evaluations (or $\tilde O(\epsilon^{-3})$ when $\epsilon_H = \sqrt{\epsilon}$). The core idea of our framework is a step-size shrinkage scheme that controls the average movement of the iterates, which leads to faster convergence to local minima.
The monotone variational inequality is a central problem in mathematical programming that unifies and generalizes many important settings such as smooth convex optimization, two-player zero-sum games, and convex-concave saddle point problems. The extragradient method of Korpelevich [1976] is one of the most popular methods for solving monotone variational inequalities. Despite its long history and intensive attention from the optimization and machine learning communities, the following major problem has remained open: what is the last-iterate convergence rate of the extragradient method for monotone and Lipschitz variational inequalities with constraints? We resolve this open problem by showing a tight $O\left(\frac{1}{\sqrt{T}}\right)$ last-iterate convergence rate for arbitrary convex feasible sets, which matches the lower bound of Golowich et al. [2020]. Our rate is measured in terms of the standard gap function. The technical core of our result is the monotonicity of a new performance measure -- the tangent residual, which can be viewed as an adaptation of the norm of the operator that takes the local constraints into account. To establish this monotonicity, we develop a new approach that combines the power of sum-of-squares programming with the low dimensionality of the update rule of the extragradient method. We believe our approach has many additional applications in the analysis of iterative methods.
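For reference, the extragradient update itself is simple; the sketch below applies it to a toy constrained bilinear saddle-point problem over a box, with the step size and problem data chosen arbitrarily for illustration.

```python
import numpy as np

def extragradient(F, project, z0, eta=0.1, n_iter=1000):
    """Korpelevich's extragradient method for the VI with operator F over set C:
    z_half = Proj_C(z - eta F(z));  z_next = Proj_C(z - eta F(z_half))."""
    z = z0.copy()
    for _ in range(n_iter):
        z_half = project(z - eta * F(z))      # extrapolation step
        z = project(z - eta * F(z_half))      # update step using the extrapolated point
    return z

# Toy monotone operator: bilinear saddle point min_x max_y x^T A y over the box [-1, 1]^n
rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))

def F(z):
    x, y = z[:n], z[n:]
    return np.concatenate([A @ y, -A.T @ x])  # (grad_x, -grad_y) of x^T A y

project = lambda z: np.clip(z, -1.0, 1.0)     # Euclidean projection onto the box
z_star = extragradient(F, project, rng.standard_normal(2 * n))
```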
Stochastic Gradient Descent (SGD) is a central tool in machine learning. We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate, in the special case of homogeneous linear classifiers with smooth monotone loss functions optimized on linearly separable data. Previous works assumed either a vanishing learning rate, iterate averaging, or loss assumptions that do not hold for the monotone loss functions used for classification, such as the logistic loss. We prove our result for a fixed dataset, for sampling both with and without replacement. Furthermore, for the logistic loss (and similar exponentially-tailed losses), we prove that with SGD the weight vector converges in direction to the $L_2$ max-margin vector at rate $O(1/\log(t))$ for almost all separable datasets, and the loss converges at rate $O(1/t)$ -- similarly to gradient descent. Lastly, we examine the case of a fixed learning rate proportional to the minibatch size. We prove that in this case the asymptotic convergence rate of SGD (with replacement) does not depend on the minibatch size in terms of epochs, provided the support vectors span the data. These results may suggest an explanation for similar behaviors observed in deep networks trained with SGD.
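The setting studied above (fixed-step SGD on the logistic loss over separable data) is easy to reproduce empirically; a minimal sketch, with hypothetical toy data and an arbitrary learning rate, is given below.

```python
import numpy as np

def sgd_logistic(X, y, lr=0.5, n_epochs=200, rng=None):
    """Fixed-step SGD (sampling with replacement) on the logistic loss for a
    homogeneous linear classifier; labels y are in {-1, +1}."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs * n):
        i = rng.integers(n)
        margin = y[i] * (X[i] @ w)
        # negative gradient of log(1 + exp(-margin)) with respect to w
        w += lr * y[i] * X[i] / (1.0 + np.exp(margin))
    return w

# Linearly separable toy data: w / ||w|| should approach the L2 max-margin direction
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((20, 2)) + 3, rng.standard_normal((20, 2)) - 3])
y = np.concatenate([np.ones(20), -np.ones(20)])
w = sgd_logistic(X, y, rng=rng)
print(w / np.linalg.norm(w))
```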
We study decentralized consensus and stochastic optimization problems with compressed communications over static directed graphs. We propose an iterative gradient-based algorithm that compresses messages according to a desired compression ratio. The proposed method provably reduces the communication overhead on the network at every communication round. In contrast to the existing literature, we allow for arbitrary compression ratios in the communicated messages. We show a linear convergence rate for the proposed method on the consensus problem. Moreover, we provide explicit convergence rates for decentralized stochastic optimization problems on smooth functions that are either (i) strongly convex, (ii) convex, or (iii) non-convex. Finally, we provide numerical experiments to illustrate convergence under arbitrary compression ratios and the communication efficiency of our algorithm.
This paper proposes a numerical method based on the Adomian decomposition approach for the time discretization, applied to the Euler equations. A recursive property is demonstrated that allows the method to be formulated in an appropriate and efficient way. To obtain a fully discrete numerical scheme, the space discretization is achieved using classical discontinuous Galerkin (DG) techniques. The efficiency of the resulting numerical scheme is demonstrated through numerical tests, by comparison with exact solutions and with results from the popular Runge-Kutta DG method.
We propose a stochastic conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms. Existing CGM variants for this template either suffer from slow convergence rates or require carefully increasing the batch size over the course of the algorithm's execution, which leads to computing full gradients. In contrast, the proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with those of more sophisticated variance reduction techniques. In our applications, we place special emphasis on problems with a large number of separable constraints. Such problems are prevalent among semidefinite programming (SDP) formulations arising in machine learning and theoretical computer science. We provide numerical experiments on matrix completion, unsupervised clustering, and sparsest-cut SDPs.
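As a rough illustration of the one-sample-per-iteration idea, here is a simplified sketch of a conditional gradient (Frank-Wolfe) step driven by a SAG-style gradient table, on a smooth toy problem over the probability simplex; it omits the non-smooth term and the separable-constraint machinery of the proposed method, and all names and parameters are illustrative.

```python
import numpy as np

def sag_conditional_gradient(grads, lmo, x0, n_iter=2000, rng=None):
    """Conditional gradient with a SAG-style estimator: one sampled component
    gradient is refreshed per iteration, and the table average drives the
    linear minimization oracle (simplified, smooth-only sketch)."""
    rng = rng or np.random.default_rng(0)
    n = len(grads)
    x = x0.copy()
    table = np.zeros((n, x0.size))            # stored component gradients (may be stale)
    avg = table.mean(axis=0)                  # running average of the table
    for t in range(1, n_iter + 1):
        i = rng.integers(n)
        new_gi = grads[i](x)                  # one fresh component gradient per iteration
        avg += (new_gi - table[i]) / n        # O(d) update of the average
        table[i] = new_gi
        s = lmo(avg)                          # linear minimization oracle over the set
        gamma = 2.0 / (t + 2)
        x = (1 - gamma) * x + gamma * s       # convex-combination (Frank-Wolfe) step
    return x

# Toy example: (1/n) sum_i (a_i^T x - b_i)^2 over the probability simplex
rng = np.random.default_rng(0)
A, b = rng.standard_normal((30, 10)), rng.standard_normal(30)
grads = [lambda x, a=A[i], bi=b[i]: 2 * (a @ x - bi) * a for i in range(30)]

def lmo_simplex(g):                           # argmin_{s in simplex} <g, s> is a vertex
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

x_hat = sag_conditional_gradient(grads, lmo_simplex, np.full(10, 0.1))
```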
We introduce a novel methodology for particle filtering in dynamical systems where the evolution of the signal of interest is described by an SDE and observations are collected instantaneously at prescribed time instants. The new approach includes the discretisation of the SDE and the design of efficient particle filters for the resulting discrete-time state-space model. The discretisation scheme converges with weak order 1 and is devised to create a sequential dependence structure along the coordinates of the discrete-time state vector. We introduce a class of space-sequential particle filters that exploits this structure to improve performance when the system dimension is large. This is illustrated numerically by a set of computer simulations for a stochastic Lorenz 96 system with additive noise. The new space-sequential particle filters attain approximately constant estimation errors as the dimension of the Lorenz 96 system is increased, with a computational cost that increases polynomially, rather than exponentially, with the system dimension. Besides the new numerical scheme and particle filters, we provide in this paper a general framework for discrete-time filtering in continuous-time dynamical systems described by an SDE with instantaneous observations. Provided that the SDE is discretised using a weakly convergent scheme, we prove that the marginal posterior laws of the resulting discrete-time state-space model converge, under a suitably defined metric, to the marginal posterior laws of the original continuous-time state-space model. This result is general and not restricted to the numerical scheme or the particle filters specifically studied in this manuscript.
The minimum energy path (MEP) describes the mechanism of a reaction, and the energy barrier along the path can be used to calculate the reaction rate in thermal systems. The nudged elastic band (NEB) method is one of the most commonly used schemes for computing MEPs numerically. It approximates an MEP by a discrete set of configuration images, where the discretization size determines both the computational cost and the accuracy of the simulations. In this paper, we consider a discrete MEP to be a stationary state of the NEB method and prove an optimal convergence rate of the discrete MEP with respect to the number of images. Numerical simulations of the transitions of several prototypical model systems are performed to support the theory.
We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admit a scaling limit which is H\"older continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.