
Nesterov's accelerated forward-backward algorithm (AFBA) is an efficient algorithm for solving a class of two-term convex optimization models consisting of a differentiable function with a Lipschitz continuous gradient plus a nondifferentiable function whose proximity operator has a closed form. It has been shown that the iterative sequence generated by AFBA with a modified Nesterov's momentum scheme converges to a minimizer of the objective function with an $o\left(\frac{1}{k^2}\right)$ convergence rate in terms of the function value (FV-convergence rate) and an $o\left(\frac{1}{k}\right)$ convergence rate in terms of the distance between consecutive iterates (DCI-convergence rate). In this paper, we propose a more general momentum scheme with an introduced power parameter $\omega\in(0,1]$ and show that AFBA with the proposed momentum scheme converges to a minimizer of the objective function with an $o\left(\frac{1}{k^{2\omega}}\right)$ FV-convergence rate and an $o\left(\frac{1}{k^{\omega}}\right)$ DCI-convergence rate. The generality of the proposed momentum scheme provides a variety of parameter selections for different scenarios, making the resulting algorithm flexible enough to achieve better performance. We then employ AFBA with the proposed momentum scheme to solve the smoothed hinge loss $\ell_1$-support vector machine model. Numerical results demonstrate that the proposed generalized momentum scheme outperforms two existing momentum schemes.
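To make the setting above concrete, here is a minimal sketch of an accelerated forward-backward iteration on a lasso problem, $\frac{1}{2}\|Ax-b\|^2+\lambda\|x\|_1$. The abstract does not state the proposed $\omega$-scheme, so this sketch assumes the well-known momentum coefficient $\frac{k-1}{k+a-1}$ instead; the problem, the function names, and the parameter `a` are illustrative assumptions, not the paper's method.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximity operator of tau * ||.||_1 (closed form)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def objective(A, b, lam, x):
    """Value of 0.5 * ||Ax - b||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def accelerated_forward_backward(A, b, lam, a=4.0, iters=500):
    """Accelerated proximal gradient with momentum (k - 1) / (k + a - 1).

    Assumption: this is a standard momentum choice, used here only as a
    stand-in for the generalized scheme described in the abstract.
    """
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x_prev = np.zeros(A.shape[1])
    x = x_prev.copy()
    for k in range(1, iters + 1):
        beta = (k - 1) / (k + a - 1)       # momentum coefficient
        y = x + beta * (x - x_prev)        # extrapolation step
        grad = A.T @ (A @ y - b)           # forward (gradient) step at y
        x_prev, x = x, soft_threshold(y - grad / L, lam / L)  # backward step
    return x
```

Swapping in a different momentum rule only changes the line that computes `beta`, which is where a scheme with an extra power parameter would plug in.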

Related content

Forward-backward algorithm


Precision/Accuracy · Linear · Black box · Optimizer · MoDELS

February 15, 2022

Finite-Bit Quantization For Distributed Algorithms With Linear Convergence

Nicolò Michelusi,Gesualdo Scutari,Chang-Shen Lee

from arxiv, Submitted to the IEEE Transactions on Information Theory

This paper studies distributed algorithms for (strongly convex) composite optimization problems over mesh networks, subject to quantized communications. Instead of focusing on a specific algorithmic design, a black-box model is proposed, casting linearly-convergent distributed algorithms in the form of fixed-point iterates. While most existing quantization rules, such as the popular compression rule, rely on some form of communication of scalar signals (in practice quantized at the machine precision), this paper considers regimes operating under limited communication budgets, where communication at machine precision is not viable. To address this challenge, the algorithmic model is coupled with a novel random or deterministic Biased Compression (BC-)rule on the quantizer design as well as with a new Adaptive range Non-uniform Quantizer (ANQ) and communication-efficient encoding scheme, which implement the BC-rule using a finite number of bits (below machine precision). A unified communication complexity analysis is developed for the black-box model, determining the average number of bits required to reach a solution of the optimization problem within a target accuracy. In particular, it is shown that the proposed BC-rule preserves linear convergence of the unquantized algorithms, and a trade-off between convergence rate and communication cost under quantization is characterized. Numerical results validate our theoretical findings and show that distributed algorithms equipped with the proposed ANQ have more favorable communication complexity than algorithms using state-of-the-art quantization rules.

Smoothing · Optimizer · Class · Lipschitz · Mean squared error

February 14, 2022

Blessings and curse of smoothness and phase transitions in nonparametric regressions: a nonasymptotic perspective

Ying Zhu

from arxiv, 3 Tables

When the regression function belongs to the standard smooth classes consisting of univariate functions with derivatives up to the $(\gamma+1)$th order bounded in absolute values by a common constant everywhere or a.e., it is well known that the minimax optimal rate of convergence in mean squared error (MSE) is $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ when $\gamma$ is finite and the sample size $n\rightarrow\infty$. From a nonasymptotic viewpoint that does not take $n$ to infinity, this paper shows that: for the standard Hölder and Sobolev classes, the minimax optimal rate is $\frac{\sigma^{2}\left(\gamma+1\right)}{n}$ ($\succsim\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$) when $\frac{n}{\sigma^{2}}\precsim\left(\gamma+1\right)^{2\gamma+3}$ and $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ ($\succsim\frac{\sigma^{2}\left(\gamma+1\right)}{n}$) when $\frac{n}{\sigma^{2}}\succsim\left(\gamma+1\right)^{2\gamma+3}$. To establish these results, we derive upper and lower bounds on the covering and packing numbers for the generalized Hölder class where the absolute value of the $k$th ($k=0,\ldots,\gamma$) derivative is bounded by a parameter $R_{k}$ and the $\gamma$th derivative is $R_{\gamma+1}$-Lipschitz (and also for the generalized ellipsoid class of smooth functions). Our bounds sharpen the classical metric entropy results for the standard classes, and give the general dependence on $\gamma$ and $R_{k}$. By deriving the minimax optimal MSE rates under various (well motivated) $R_{k}$s for the smooth classes with the help of our new entropy bounds, we show several interesting results that cannot be shown with the existing entropy bounds in the literature.
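The two regimes stated above can be checked numerically. The following sketch evaluates both rate expressions and reports which one applies; it assumes all constants hidden by $\precsim$ and $\succsim$ equal 1, and the function name `minimax_rate` is a hypothetical helper introduced only for illustration.

```python
def minimax_rate(n, sigma2, gamma):
    """Return (regime, rate) for the standard smooth classes.

    Assumption: constants hidden in the asymptotic notation are set to 1,
    so the numbers are orders of magnitude, not exact risk values.
    """
    small_sample_rate = sigma2 * (gamma + 1) / n
    classical_rate = (sigma2 / n) ** ((2 * gamma + 2) / (2 * gamma + 3))
    threshold = (gamma + 1) ** (2 * gamma + 3)   # phase transition in n / sigma^2
    if n / sigma2 <= threshold:
        return "small-sample", small_sample_rate
    return "classical", classical_rate
```

For example, with $\gamma=1$ and $\sigma^2=1$ the threshold is $2^5=32$, so `minimax_rate(10, 1.0, 1)` falls in the small-sample regime while `minimax_rate(100, 1.0, 1)` falls in the classical one. In each regime the returned rate is the larger of the two expressions, which matches the inequalities in parentheses in the abstract.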

Approximation · Identifiable · Finite difference · Functional · Homogeneous

February 14, 2022

Numerical approximations to a singularly perturbed convection-diffusion problem with a discontinuous initial condition

Jose Luis Gracia,Eugene O'Riordan

from arxiv, 24 pages; 10 figures

A singularly perturbed parabolic problem of convection-diffusion type with a discontinuous initial condition is examined. An analytic function is identified which matches the discontinuity in the initial condition and also satisfies the homogeneous parabolic differential equation associated with the problem. The difference between this analytic function and the solution of the parabolic problem is approximated numerically, using an upwind finite difference operator combined with an appropriate layer-adapted mesh. The numerical method is shown to be parameter-uniform. Numerical results are presented to illustrate the theoretical error bounds established in the paper.

Approximation · Reducible · Vectorization · Decomposed · Separated

February 13, 2022

Generalized Unrelated Machine Scheduling Problem

Shichuan Deng,Jian Li,Yuval Rabani

We study the generalized load-balancing (GLB) problem, where we are given $n$ jobs, each of which needs to be assigned to one of $m$ unrelated machines with processing times $\{p_{ij}\}$. Under a job assignment $\sigma$, the load of each machine $i$ is $\psi_i(\mathbf{p}_{i}[\sigma])$ where $\psi_i:\mathbb{R}^n\rightarrow\mathbb{R}_{\geq0}$ is a symmetric monotone norm and $\mathbf{p}_{i}[\sigma]$ is the $n$-dimensional vector $\{p_{ij}\cdot \mathbf{1}[\sigma(j)=i]\}_{j\in [n]}$. Our goal is to minimize the generalized makespan $\phi(\mathsf{load}(\sigma))$, where $\phi:\mathbb{R}^m\rightarrow\mathbb{R}_{\geq0}$ is another symmetric monotone norm and $\mathsf{load}(\sigma)$ is the $m$-dimensional machine load vector. This problem significantly generalizes many classic optimization problems, e.g., makespan minimization, set cover, minimum-norm load-balancing, etc. We obtain a polynomial time randomized algorithm that achieves an approximation factor of $O(\log n)$, matching the lower bound of set cover up to a constant factor. We achieve this by rounding a novel configuration LP relaxation with an exponential number of variables. To approximately solve the configuration LP, we design an approximate separation oracle for its dual program. In particular, the separation oracle can be reduced to the norm minimization with a linear constraint (NormLin) problem and we devise a polynomial time approximation scheme (PTAS) for it, which may be of independent interest.
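The objective defined above is easy to evaluate for a fixed assignment. The sketch below computes $\phi(\mathsf{load}(\sigma))$ from a processing-time matrix; it only evaluates the objective and is not the approximation algorithm from the paper. The function name and the example norms are assumptions chosen for illustration.

```python
import numpy as np

def generalized_makespan(p, sigma, inner_norms, outer_norm):
    """Evaluate phi(load(sigma)) for an m x n processing-time matrix p.

    sigma[j] is the machine assigned to job j; inner_norms[i] plays the
    role of psi_i and outer_norm plays the role of phi.
    """
    p = np.asarray(p, dtype=float)
    sigma = np.asarray(sigma)
    m, _ = p.shape
    loads = np.empty(m)
    for i in range(m):
        p_i = p[i] * (sigma == i)      # zero out jobs not assigned to machine i
        loads[i] = inner_norms[i](p_i)
    return outer_norm(loads)

def l1(v):
    return float(np.sum(np.abs(v)))

def linf(v):
    return float(np.max(np.abs(v)))
```

Choosing every $\psi_i$ as the $\ell_1$ norm and $\phi$ as the $\ell_\infty$ norm recovers classic makespan: with `p = [[1, 2, 3], [2, 1, 1]]` and `sigma = [0, 1, 0]`, the machine loads are 4 and 1, so the makespan is 4.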

Optimizer · Integration · Smoothing · GAN · Constraint

February 13, 2022

Near-optimal Local Convergence of Alternating Gradient Descent-Ascent for Minimax Optimization

Guodong Zhang,Yuanhao Wang,Laurent Lessard,Roger Grosse

from arxiv, AISTATS 2022

Smooth minimax games often proceed by simultaneous or alternating gradient updates. Although algorithms with alternating updates are commonly used in practice, the majority of existing theoretical analyses focus on simultaneous algorithms for convenience of analysis. In this paper, we study alternating gradient descent-ascent (Alt-GDA) in minimax games and show that Alt-GDA is superior to its simultaneous counterpart (Sim-GDA) in many settings. We prove that Alt-GDA achieves a near-optimal local convergence rate for strongly convex-strongly concave (SCSC) problems while Sim-GDA converges at a much slower rate. To our knowledge, this is the first result in any setting showing that Alt-GDA converges faster than Sim-GDA by more than a constant. We further adapt the theory of integral quadratic constraints (IQC) and show that Alt-GDA attains the same rate globally for a subclass of SCSC minimax problems. Empirically, we demonstrate that alternating updates speed up GAN training significantly and the use of optimism only helps for simultaneous algorithms.
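The difference between the two update orders can be seen on a toy problem. The sketch below runs Sim-GDA and Alt-GDA on the scalar SCSC game $f(x,y)=\frac{\mu}{2}x^2+bxy-\frac{\mu}{2}y^2$, whose saddle point is the origin. The game, the shared step size, and the function names are assumptions made for illustration; this is not the paper's analysis or its step-size tuning.

```python
import numpy as np

def grad_x(x, y, mu, b):
    return mu * x + b * y          # partial derivative of f in x

def grad_y(x, y, mu, b):
    return b * x - mu * y          # partial derivative of f in y

def sim_gda(x, y, eta, steps, mu=1.0, b=10.0):
    """Simultaneous GDA: both players use the previous iterate."""
    for _ in range(steps):
        gx, gy = grad_x(x, y, mu, b), grad_y(x, y, mu, b)
        x, y = x - eta * gx, y + eta * gy
    return x, y

def alt_gda(x, y, eta, steps, mu=1.0, b=10.0):
    """Alternating GDA: the ascent step sees the freshly updated x."""
    for _ in range(steps):
        x = x - eta * grad_x(x, y, mu, b)
        y = y + eta * grad_y(x, y, mu, b)
    return x, y

def distance_to_saddle(point):
    return float(np.hypot(*point))
```

With `eta = 0.01` and the default `mu` and `b`, the Sim-GDA iteration matrix has spectral radius $\sqrt{(1-\eta\mu)^2+\eta^2b^2}\approx 0.995$, while the Alt-GDA matrix has complex eigenvalues of modulus $1-\eta\mu=0.99$, so the alternating variant contracts faster in this example.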

Decoding · Layer · CC · MS · Reducible

February 12, 2022

Generalized Mutual Information-Maximizing Quantized Decoding of LDPC Codes with Layered Scheduling

Peng Kang,Kui Cai,Xuan He,Shuangyang Li,Jinhong Yuan

from arxiv, This paper is an extended version of the manuscript submitted to the IEEE Transactions on Vehicular Technology

In this paper, we propose a framework of the mutual information-maximizing (MIM) quantized decoding for low-density parity-check (LDPC) codes by using simple mappings and fixed-point additions. Our decoding method is generic in the sense that it can be applied to LDPC codes with arbitrary degree distributions, and can be implemented based on either the belief propagation (BP) algorithm or the min-sum (MS) algorithm. In particular, we propose the MIM density evolution (MIM-DE) to construct the lookup tables (LUTs) for the node updates. The computational complexity and memory requirements are discussed and compared to the LUT decoder variants. For applications with low-latency requirements, we consider the layered schedule to accelerate the convergence speed of decoding quasi-cyclic LDPC codes. In particular, we develop the layered MIM-DE to design the LUTs based on the MS algorithm, leading to the MIM layered quantized MS (MIM-LQMS) decoder. An optimization method is further introduced to reduce the memory requirement for storing the LUTs. Simulation results show that the MIM quantized decoders outperform the state-of-the-art LUT decoders in the waterfall region with both 3-bit and 4-bit precision over the additive white Gaussian noise (AWGN) channels. For a small number of decoding iterations, the MIM quantized decoders also achieve a faster convergence speed compared to the benchmarks. Moreover, the 4-bit MIM-LQMS decoder can approach the error performance of the floating-point layered BP decoder within 0.3 dB in the moderate-to-high SNR regions, over both the AWGN channels and the fast fading channels.
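For readers unfamiliar with the min-sum (MS) algorithm mentioned above, the following sketch shows the textbook floating-point check-node update that MS decoders are built on. It is not the MIM-LQMS decoder and involves no lookup tables or quantization; the function name is a hypothetical helper.

```python
import numpy as np

def min_sum_check_update(incoming):
    """Standard min-sum check-node update.

    For each edge, the outgoing message has the product of the signs and
    the minimum magnitude of the messages arriving on all other edges.
    """
    incoming = np.asarray(incoming, dtype=float)
    outgoing = np.empty_like(incoming)
    for j in range(incoming.size):
        others = np.delete(incoming, j)            # exclude the target edge
        outgoing[j] = np.prod(np.sign(others)) * np.min(np.abs(others))
    return outgoing
```

For incoming log-likelihood ratios `[2.0, -3.0, 1.0]`, the update returns `[-1.0, 1.0, -2.0]`: each output ignores its own input and is dominated by the least reliable of the remaining messages.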

SGD · CASE · Divergence · Objective function · Networks

February 11, 2022

Demystifying Why Local Aggregation Helps: Convergence Analysis of Hierarchical SGD

Jiayi Wang,Shiqiang Wang,Rong-Rong Chen,Mingyue Ji

from arxiv, 36 pages

Hierarchical SGD (H-SGD) has emerged as a new distributed SGD algorithm for multi-level communication networks. In H-SGD, before each global aggregation, workers send their updated local models to local servers for aggregations. Despite recent research efforts, the effect of local aggregation on global convergence still lacks theoretical understanding. In this work, we first introduce a new notion of "upward" and "downward" divergences. We then use it to conduct a novel analysis to obtain a worst-case convergence upper bound for two-level H-SGD with non-IID data, non-convex objective functions, and stochastic gradients. By extending this result to the case with random grouping, we observe that this convergence upper bound of H-SGD is between the upper bounds of two single-level local SGD settings, with the number of local iterations equal to the local and global update periods in H-SGD, respectively. We refer to this as the "sandwich behavior". Furthermore, we extend our analytical approach based on "upward" and "downward" divergences to study the convergence for the general case of H-SGD with more than two levels, where the "sandwich behavior" still holds. Our theoretical results provide key insights into why local aggregation can be beneficial in improving the convergence of H-SGD.
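The two-level schedule described above can be sketched in a few lines. The example below simulates workers grouped under local servers, with local aggregation every `local_period` steps and global aggregation every `global_period` steps. The quadratic worker objectives $\frac{1}{2}\|w-c_i\|^2$, the noise model, and all names are assumptions for illustration; the paper itself treats non-convex objectives.

```python
import numpy as np

def hierarchical_sgd(centers, local_period, global_period, steps,
                     lr=0.1, noise_std=0.01, seed=0):
    """Two-level H-SGD on toy objectives 0.5 * ||w - c_i||^2.

    centers has shape (groups, workers_per_group, dim); each worker's
    minimizer c_i differs, which mimics non-IID data.
    """
    centers = np.asarray(centers, dtype=float)
    rng = np.random.default_rng(seed)
    w = np.zeros_like(centers)                     # one model copy per worker
    for t in range(1, steps + 1):
        noise = noise_std * rng.standard_normal(w.shape)
        w = w - lr * ((w - centers) + noise)       # local stochastic gradient step
        if t % global_period == 0:
            w[:] = w.mean(axis=(0, 1), keepdims=True)   # global aggregation
        elif t % local_period == 0:
            w[:] = w.mean(axis=1, keepdims=True)        # aggregation within each group
    return w
```

Setting `local_period` equal to `global_period` removes the intermediate aggregations and reduces the loop to single-level local SGD, which is the comparison the "sandwich behavior" refers to.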

Marginalization · Logistic loss · FAST · Linear classification · Performer

July 1, 2021

Fast Margin Maximization via Dual Acceleration

Ziwei Ji,Nathan Srebro,Matus Telgarsky

from arxiv, ICML 2021

We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of $\widetilde{\mathcal{O}}(1/t^2)$. This contrasts with a rate of $\mathcal{O}(1/\log(t))$ for standard gradient descent, and $\mathcal{O}(1/t)$ for normalized gradient descent. This momentum-based method is derived via the convex dual of the maximum-margin problem, and specifically by applying Nesterov acceleration to this dual, which manages to result in a simple and intuitive method in the primal. This dual view can also be used to derive a stochastic variant, which performs adaptive non-uniform sampling via the dual variables.

Nonconvex · Interpretability · Momentum · PCA · Flow

October 1, 2018

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Tianyi Liu,Shiyang Li,Jianping Shi,Enlu Zhou,Tuo Zhao

from arxiv, arXiv admin note: text overlap with arXiv:1802.05155

Asynchronous momentum stochastic gradient descent (Async-MSGD) is one of the most popular algorithms in distributed machine learning. However, its convergence properties for complicated nonconvex problems are still largely unknown because of current technical limits. Therefore, in this paper, we propose to analyze the algorithm through a simpler but nontrivial nonconvex problem, streaming PCA, which helps us to understand Async-MSGD better even for more general problems. Specifically, we establish the asymptotic rate of convergence of Async-MSGD for streaming PCA by diffusion approximation. Our results indicate a fundamental tradeoff between asynchrony and momentum: to ensure convergence and acceleration through asynchrony, we have to reduce the momentum (compared with Sync-MSGD). To the best of our knowledge, this is the first theoretical attempt at understanding Async-MSGD for distributed nonconvex stochastic optimization. Numerical experiments on both streaming PCA and training deep neural networks are provided to support our findings for Async-MSGD.

Better · Reinforcement learning · Learned · Performer · Optimization

April 24, 2018

Accelerated Reinforcement Learning

K. Lakshmanan

from arxiv, The proof is not complete as it has to be shown the algorithm tracks the ODE

Policy gradient methods are widely used in reinforcement learning algorithms to search for better policies in the parameterized policy space. They perform gradient search in the policy space and are known to converge very slowly. Nesterov developed an accelerated gradient search algorithm for convex optimization problems. This has recently been extended to non-convex and also stochastic optimization. We use Nesterov's acceleration for policy gradient search in the well-known actor-critic algorithm and show the convergence using the ODE method. We tested this algorithm on a scheduling problem, in which an incoming job is scheduled into one of four queues based on the queue lengths. Experimental results show that the algorithm using Nesterov's acceleration performs significantly better than the algorithm that does not use acceleration. To the best of our knowledge, this is the first time Nesterov's acceleration has been used with the actor-critic algorithm.