In this paper, we propose a class of faster adaptive Gradient Descent Ascent (GDA) methods for solving nonconvex-strongly-concave minimax problems based on unified adaptive matrices, which cover almost all existing coordinate-wise and global adaptive learning rates. Specifically, we propose a fast Adaptive Gradient Descent Ascent (AdaGDA) method based on the basic momentum technique, which reaches a lower gradient complexity of $O(\kappa^4\epsilon^{-4})$ for finding an $\epsilon$-stationary point without large batches, improving the results of existing adaptive GDA methods by a factor of $O(\sqrt{\kappa})$. We also present an accelerated version of AdaGDA (VR-AdaGDA) based on the momentum-based variance-reduction technique, which achieves a lower gradient complexity of $O(\kappa^{4.5}\epsilon^{-3})$ for finding an $\epsilon$-stationary point without large batches, improving the results of existing adaptive GDA methods by a factor of $O(\epsilon^{-1})$. Moreover, we prove that VR-AdaGDA reaches the best known gradient complexity of $O(\kappa^{3}\epsilon^{-3})$ with mini-batch size $O(\kappa^3)$. In particular, we provide an effective convergence analysis framework for our adaptive GDA methods. Experimental results on policy evaluation and fair classifier tasks demonstrate the efficiency of our algorithms.
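For orientation, the plain GDA template that adaptive variants build on can be sketched as below. This is only a minimal illustration under stated assumptions: it is not AdaGDA or VR-AdaGDA (there is no momentum, adaptive matrix, or variance reduction), and the toy objective, step sizes, and the function name `gda` are hypothetical choices made here, not taken from the paper.

```python
def gda(grad_x, grad_y, x, y, step_x, step_y, num_steps):
    """Simultaneous gradient descent on x and gradient ascent on y."""
    for _ in range(num_steps):
        # Evaluate both partial gradients at the current point.
        gx, gy = grad_x(x, y), grad_y(x, y)
        x = x - step_x * gx  # descent step for the minimizing variable
        y = y + step_y * gy  # ascent step for the maximizing variable
    return x, y


# Toy objective (an assumption for illustration):
#   f(x, y) = 0.5 * x**2 + x * y - 0.5 * y**2
# It is strongly concave in y and has its unique saddle point at (0, 0).
x_final, y_final = gda(
    grad_x=lambda x, y: x + y,
    grad_y=lambda x, y: x - y,
    x=1.0,
    y=1.0,
    step_x=0.1,
    step_y=0.1,
    num_steps=500,
)
```

On this quadratic the iterates contract toward the saddle point at the origin; an adaptive method would additionally rescale the two steps using gradient statistics.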

Related content

Local minimum · Minimum value · Saddle point · Minimizer · Nonconvex

April 20, 2022

Faster Perturbed Stochastic Gradient Methods for Finding Local Minima

Zixiang Chen, Dongruo Zhou, Quanquan Gu

from arxiv, 29 pages, 1 figure, 1 table. In ALT 2022

Escaping from saddle points and finding local minima is a central problem in nonconvex optimization. Perturbed gradient methods are perhaps the simplest approach to this problem. However, to find $(\epsilon, \sqrt{\epsilon})$-approximate local minima, the best existing stochastic gradient complexity for this type of algorithm is $\tilde O(\epsilon^{-3.5})$, which is not optimal. In this paper, we propose LENA (Last stEp shriNkAge), a faster perturbed stochastic gradient framework for finding local minima. We show that LENA with stochastic gradient estimators such as SARAH/SPIDER and STORM can find $(\epsilon, \epsilon_{H})$-approximate local minima within $\tilde O(\epsilon^{-3} + \epsilon_{H}^{-6})$ stochastic gradient evaluations (or $\tilde O(\epsilon^{-3})$ when $\epsilon_H = \sqrt{\epsilon}$). The core idea of our framework is a step-size shrinkage scheme to control the average movement of the iterates, which leads to faster convergence to the local minima.
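The basic perturbation idea mentioned in this abstract can be illustrated with a small sketch. It is a generic perturbed gradient descent with exact gradients, not LENA: there is no stochastic estimator and no step-size shrinkage. The toy function, thresholds, cooldown rule, and the name `perturbed_gd` are assumptions chosen for illustration.

```python
import math
import random


def perturbed_gd(grad, point, step=0.1, radius=1e-2, grad_tol=1e-3,
                 cooldown=100, num_steps=2000, seed=0):
    """Gradient descent that adds a small random kick when the gradient is tiny."""
    rng = random.Random(seed)
    x, y = point
    last_perturb = -cooldown  # allow a perturbation as soon as it is needed
    for t in range(num_steps):
        gx, gy = grad(x, y)
        # Near a stationary point, perturb at most once per cooldown window.
        if math.hypot(gx, gy) < grad_tol and t - last_perturb >= cooldown:
            x += rng.uniform(-radius, radius)
            y += rng.uniform(-radius, radius)
            last_perturb = t
            gx, gy = grad(x, y)
        x -= step * gx
        y -= step * gy
    return x, y


def toy_grad(x, y):
    # Gradient of f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2, which has a
    # saddle point at (0, 0) and local minima at (0, 1) and (0, -1).
    return x, y ** 3 - y


# Started on the line y = 0, plain gradient descent would stay on that line
# and converge to the saddle point; the perturbation lets the iterate leave it.
x_end, y_end = perturbed_gd(toy_grad, (1.0, 0.0))
```

The final iterate sits within roughly the perturbation radius of one of the two minima, since small kicks keep being applied once the gradient is small again.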

Saddle point · Lipschitz continuity · Tractable · Continuity · Lipschitz

April 19, 2022

Stochastic Saddle Point Problems with Decision-Dependent Distributions

Killian Wood, Emiliano Dall'Anese

This paper focuses on stochastic saddle point problems with decision-dependent distributions. These are problems whose objective is the expected value of a stochastic payoff function, where random variables are drawn from a distribution induced by a distributional map. For general distributional maps, finding saddle points is computationally burdensome, even if the distribution is known. To enable a tractable solution approach, we introduce the notion of equilibrium points -- which are saddle points for the stationary stochastic minimax problem that they induce -- and provide conditions for their existence and uniqueness. We demonstrate that the distance between the two solution types is bounded provided that the objective has a strongly-convex-strongly-concave payoff and a Lipschitz continuous distributional map. We develop deterministic and stochastic primal-dual algorithms and demonstrate their convergence to the equilibrium point. In particular, by modeling errors emerging from a stochastic gradient estimator as sub-Weibull random variables, we provide error bounds in expectation and in high probability that hold for each iteration. Moreover, we show convergence to a neighborhood almost surely. Finally, we investigate a condition on the distributional map -- which we call opposing mixture dominance -- that ensures that the objective is strongly-convex-strongly-concave. We tailor the convergence results for the primal-dual algorithms to this opposing mixture dominance setup.

Estimation/Estimator · Kronecker product · Covariance matrix · Performer · Regularization term

April 18, 2022

Covariance Estimation for Matrix-valued Data

Yichi Zhang, Weining Shen, Dehan Kong

Covariance estimation for matrix-valued data has received increasing interest in applications. Unlike previous works that rely heavily on the matrix normal distribution assumption and the requirement of a fixed matrix size, we propose a class of distribution-free regularized covariance estimation methods for high-dimensional matrix data under a separability condition and a bandable covariance structure. Under these conditions, the original covariance matrix is decomposed into a Kronecker product of two small bandable covariance matrices representing the variability over the row and column directions. We formulate a unified framework for estimating bandable covariance, and introduce an efficient algorithm based on rank-one unconstrained Kronecker product approximation. The convergence rates of the proposed estimators are established, and the derived minimax lower bound shows that our proposed estimator is rate-optimal under certain divergence regimes of matrix size. We further introduce a class of robust covariance estimators and provide theoretical guarantees to deal with heavy-tailed data. We demonstrate the superior finite-sample performance of our methods using simulations and real applications from a gridded temperature anomalies dataset and an S&P 500 stock data analysis.
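To make the separable structure concrete, the sketch below assembles a large covariance matrix as the Kronecker product of two small banded matrices. It only illustrates the structure, not the estimation procedure of the paper. The helper names `banded` and `kron`, the bandwidth, the decay value, and the order of the two factors (which depends on the vectorization convention) are assumptions made here.

```python
def banded(n, bandwidth, decay):
    """n x n symmetric matrix with entries decay**|i-j| inside the band, 0 outside."""
    return [
        [decay ** abs(i - j) if abs(i - j) <= bandwidth else 0.0 for j in range(n)]
        for i in range(n)
    ]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


# Hypothetical 2 x 2 row covariance and 3 x 3 column covariance, both banded.
row_cov = banded(2, bandwidth=1, decay=0.3)
col_cov = banded(3, bandwidth=1, decay=0.3)

# Covariance of the 6 vectorized entries of a 2 x 3 matrix-valued observation.
full_cov = kron(row_cov, col_cov)
```

The two factors hold 4 + 9 entries, against 36 in the full matrix, which is why separability helps in high dimensions.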

Optimizer · Directed · Networking · Reducible · Linear

April 18, 2022

On Arbitrary Compression for Decentralized Consensus and Stochastic Optimization over Directed Networks

Mohammad Taha Toghani, César A. Uribe

We study the decentralized consensus and stochastic optimization problems with compressed communications over static directed graphs. We propose an iterative gradient-based algorithm that compresses messages according to a desired compression ratio. The proposed method provably reduces the communication overhead on the network at every communication round. Contrary to existing literature, we allow for arbitrary compression ratios in the communicated messages. We show a linear convergence rate for the proposed method on the consensus problem. Moreover, we provide explicit convergence rates for decentralized stochastic optimization problems on smooth functions that are either (i) strongly convex, (ii) convex, or (iii) non-convex. Finally, we provide numerical experiments to illustrate convergence under arbitrary compression ratios and the communication efficiency of our algorithm.

Discretization · Exact · Paper · Numerical analysis

April 17, 2022

Adomian Decomposition Based Numerical Scheme for Flow Simulations

Imanol Garcia-Beristain, Lakhdar Remaki

This paper proposes a numerical method based on the Adomian decomposition approach for time discretization, applied to the Euler equations. A recursive property is demonstrated that allows the method to be formulated in an appropriate and efficient way. To obtain a fully numerical scheme, the space discretization is achieved using classical DG techniques. The efficiency of the resulting numerical scheme is demonstrated through numerical tests by comparison with the exact solution and with results from the popular Runge-Kutta DG method.

Variance reduction · Reducible · Variance · Optimizer · Batch Size

April 16, 2022

Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization

Yuri Kinoshita, Taiji Suzuki

Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving sampling problems and non-convex optimization problems appearing in several machine learning applications. In particular, its variance-reduced versions have recently gained attention. In this paper, we study two variants of this kind, namely, the Stochastic Variance Reduced Gradient Langevin Dynamics and the Stochastic Recursive Gradient Langevin Dynamics. We prove their convergence to the objective distribution in terms of KL-divergence under the sole assumptions of smoothness and a Log-Sobolev inequality, which are weaker conditions than those used in prior works for these algorithms. With the batch size and the inner loop length set to $\sqrt{n}$, the gradient complexity to achieve an $\epsilon$-precision is $\tilde{O}((n+dn^{1/2}\epsilon^{-1})\gamma^2 L^2\alpha^{-2})$, which is an improvement over all previous analyses. We also show some essential applications of our result to non-convex optimization.
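The Langevin update underlying these methods can be sketched as a gradient step plus Gaussian noise scaled by the square root of twice the step size. The code below is a simplified illustration under stated assumptions: it uses the exact gradient rather than a mini-batch or variance-reduced estimator, fixes the temperature to one, and targets a one-dimensional standard Gaussian. The name `langevin_samples` and all parameter values are hypothetical.

```python
import math
import random


def langevin_samples(grad, theta0, step, num_steps, seed=0):
    """Unadjusted Langevin iterates for a target density proportional to exp(-f)."""
    rng = random.Random(seed)
    theta = theta0
    samples = []
    for _ in range(num_steps):
        noise = rng.gauss(0.0, 1.0)
        # Gradient step on f plus injected Gaussian noise of variance 2 * step.
        theta = theta - step * grad(theta) + math.sqrt(2.0 * step) * noise
        samples.append(theta)
    return samples


# Target (an assumption for illustration): f(theta) = 0.5 * theta**2,
# i.e. a standard Gaussian, whose gradient is simply theta.
samples = langevin_samples(grad=lambda t: t, theta0=0.0, step=0.1, num_steps=20000)
kept = samples[1000:]  # discard an initial burn-in segment
sample_mean = sum(kept) / len(kept)
sample_var = sum((s - sample_mean) ** 2 for s in kept) / len(kept)
```

With a finite step size the chain has a small discretization bias, so the sample variance is close to, but not exactly, one. In the stochastic variants, `grad` would be replaced by an estimator built from a mini-batch.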

Estimation/Estimator · Statistic · Optimizer · Robustness · Noise

April 16, 2022

Computationally Efficient and Statistically Optimal Robust Low-rank Matrix Estimation

Yinan Shen, Jingyang Li, Jian-Feng Cai, Dong Xia

Low-rank matrix estimation under heavy-tailed noise is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent have been proposed, which, unfortunately, fail to deliver a statistically consistent estimator even under sub-Gaussian noise. In this paper, we introduce a novel Riemannian sub-gradient (RsGrad) algorithm which is not only computationally efficient with linear convergence but also statistically optimal, be the noise Gaussian or heavy-tailed. Convergence theory is established for a general framework, and specific applications to absolute loss, Huber loss, and quantile loss are investigated. Compared with existing non-convex methods, ours reveals a surprising phenomenon of dual-phase convergence. In phase one, RsGrad behaves as in a typical non-smooth optimization that requires gradually decaying stepsizes. However, phase one only delivers a statistically sub-optimal estimator, as already observed in the existing literature. Interestingly, during phase two, RsGrad converges linearly as if minimizing a smooth and strongly convex objective function, and thus a constant stepsize suffices. Underlying the phase-two convergence is the smoothing effect of random noise on the non-smooth robust losses in an area close, but not too close, to the truth. Lastly, RsGrad is applicable to low-rank tensor estimation under heavy-tailed noise, where a statistically optimal rate is attainable with the same phenomenon of dual-phase convergence, and a novel shrinkage-based second-order moment method is guaranteed to deliver a warm initialization. Numerical simulations confirm our theoretical discovery and showcase the superiority of RsGrad over prior methods.
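For readers unfamiliar with the three robust losses named in this abstract, their standard scalar definitions can be written as follows, with `r` denoting a residual. The parameter names `delta` and `tau` and their defaults are conventional choices assumed here, and the paper may use a different scaling.

```python
def absolute_loss(r):
    """Absolute (L1) loss of a residual r."""
    return abs(r)


def huber_loss(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond that."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)


def quantile_loss(r, tau=0.5):
    """Quantile (check) loss: weight tau on positive residuals, 1 - tau on negative ones."""
    if r >= 0:
        return tau * r
    return (tau - 1.0) * r
```

All three grow only linearly for large residuals, which is what limits the influence of heavy-tailed noise. The absolute and quantile losses are non-smooth at zero, so they call for sub-gradients rather than gradients.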

Estimation/Estimator · FPG · PG · Estimation error · Value function

April 15, 2022

Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration

Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang

Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy. Conventional methods for off-policy PG estimation often suffer from either significant bias or exponentially large variance. In this paper, we propose the double Fitted PG estimation (FPG) algorithm. FPG can work with an arbitrary policy parameterization, assuming access to a Bellman-complete value function class. In the case of linear value function approximation, we provide a tight finite-sample upper bound on the policy gradient estimation error, which is governed by the amount of distribution mismatch measured in feature space. We also establish the asymptotic normality of the FPG estimation error with a precise covariance characterization, which is further shown to be statistically optimal with a matching Cramer-Rao lower bound. Empirically, we evaluate the performance of FPG on both policy gradient estimation and policy optimization, using either softmax tabular or ReLU policy networks. Under various metrics, our results show that FPG significantly outperforms existing off-policy PG estimation methods based on importance sampling and variance reduction techniques.

Optimizer · Reducible · Extensibility · Layer · Kernelization

April 15, 2022

Energy-Efficient Data Transfer Optimization via Decision-Tree Based Uncertainty Reduction

Hasibul Jamil, Lavone Rodolph, Jacob Goldverg, Tevfik Kosar

from arxiv, 10 pages submitted to IEEE ICCCN2022

The increase and rapid growth of data produced by scientific instruments, the Internet of Things (IoT), and social media is causing data transfer performance and resource consumption to garner much attention in the research community. The network infrastructure and end systems that enable this extensive data movement use a substantial amount of electricity, measured in terawatt-hours per year. Managing energy consumption within the core networking infrastructure is an active research area, but there is a limited amount of work on reducing power consumption at the end systems during active data transfers. This paper presents a novel two-phase dynamic throughput and energy optimization model that utilizes an offline decision-search-tree based clustering technique to encapsulate and categorize historical data transfer log information and an online search optimization algorithm to find the best application and kernel layer parameter combination to maximize the achieved data transfer throughput while minimizing the energy consumption. Our model also incorporates an ensemble method to reduce aleatoric uncertainty in finding optimal application and kernel layer parameters during the offline analysis phase. The experimental evaluation results show that our decision-tree based model outperforms the state-of-the-art solutions in this area by achieving 117% higher throughput on average and also consuming 19% less energy at the end systems during active data transfers.

Robustness · FAST · Outlier · Linear · Origin

April 15, 2022

Structured Gradient Descent for Fast Robust Low-Rank Hankel Matrix Completion

HanQin Cai, Jian-Feng Cai, Juntao You

We study the robust matrix completion problem for low-rank Hankel matrices, which detects the sparse corruptions caused by extreme outliers while recovering the original Hankel matrix from partial observations. In this paper, we exploit the convenient Hankel structure and propose a novel non-convex algorithm, coined Hankel Structured Gradient Descent (HSGD), for large-scale robust Hankel matrix completion problems. HSGD is highly computing- and sample-efficient compared to state-of-the-art methods. A recovery guarantee with a linear convergence rate has been established for HSGD under some mild assumptions. The empirical advantages of HSGD are verified on both synthetic datasets and real-world nuclear magnetic resonance signals.
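As background on the structure involved, a Hankel matrix is constant along its anti-diagonals, so it can be built from a one-dimensional signal by setting entry (i, j) to sample i + j. The sketch below shows only this construction, not HSGD itself. The helper name `hankel` and the example signal are assumptions chosen for illustration.

```python
def hankel(signal, num_rows):
    """Hankel matrix whose (i, j) entry is signal[i + j]."""
    num_cols = len(signal) - num_rows + 1
    if num_rows < 1 or num_cols < 1:
        raise ValueError("num_rows must be between 1 and len(signal)")
    return [[signal[i + j] for j in range(num_cols)] for i in range(num_rows)]


# A generic signal gives a matrix that is constant along anti-diagonals.
h = hankel([1, 2, 3, 4, 5], num_rows=3)

# A single sampled exponential (here 2**t) gives a rank-one Hankel matrix:
# every row is a multiple of the first one.
h_exp = hankel([2 ** t for t in range(5)], num_rows=3)
```

Signals made of a few exponentials or sinusoids give Hankel matrices of correspondingly low rank, which is the kind of structure that low-rank Hankel completion relies on.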