Randomized iterative algorithms have attracted much attention in recent years because they can approximately solve large-scale linear systems of equations without accessing the entire coefficient matrix. In this paper, we propose two novel pseudoinverse-free randomized block iterative algorithms for solving consistent and inconsistent linear systems. The proposed algorithms require two user-defined random matrices: one for row sampling and the other for column sampling. By choosing these random matrices appropriately, we recover the well-known doubly stochastic Gauss-Seidel, randomized Kaczmarz, randomized coordinate descent, and randomized extended Kaczmarz algorithms. Because our algorithms allow for a much wider selection of these two random matrices, a number of new specific algorithms can be obtained. We prove that our algorithms converge linearly in the mean square sense. Numerical experiments for linear systems with synthetic and real-world coefficient matrices demonstrate the efficiency of some special cases of our algorithms.
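As a point of reference for the special cases named above, here is a minimal sketch of the classical randomized Kaczmarz iteration for a consistent system. It is not the block algorithm proposed in the paper; the function name, the small test matrix, and the iteration count are illustrative assumptions.

```python
import random

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Classical randomized Kaczmarz for a consistent system A x = b.

    A is a list of rows; this is an illustrative sketch, not the paper's method.
    """
    rng = random.Random(seed)
    n = len(A[0])
    # Row i is sampled with probability proportional to its squared norm.
    weights = [sum(a * a for a in row) for row in A]
    x = [0.0] * n
    for _ in range(iters):
        i = rng.choices(range(len(A)), weights=weights)[0]
        row = A[i]
        residual = b[i] - sum(r * xj for r, xj in zip(row, x))
        step = residual / weights[i]
        # Project x onto the hyperplane {x : <row, x> = b[i]}.
        x = [xj + step * r for xj, r in zip(x, row)]
    return x

# Hypothetical 3x2 consistent system with known solution.
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, 2.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
x_hat = randomized_kaczmarz(A, b)
```

Each step touches a single row of the matrix, which is why methods of this family never need the whole coefficient matrix at once.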

Related content

Linear

Nonconvex · Reducible · Optimization · Learning · Performer

December 17, 2021

SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients

Feihu Huang,Junyi Li,Heng Huang

from arxiv, NeurIPS 2021

Adaptive gradient methods have shown excellent performance in solving many machine learning problems. Although multiple adaptive gradient methods have recently been studied, they mainly focus on either empirical or theoretical aspects, and they only work for specific problems by using specific adaptive learning rates. It is therefore desirable to design a universal framework for practical adaptive gradient algorithms with theoretical guarantees for solving general problems. To fill this gap, we propose a faster and universal framework of adaptive gradients (i.e., SUPER-ADAM) by introducing a universal adaptive matrix that includes most existing adaptive gradient forms. Moreover, our framework can flexibly integrate momentum and variance-reduced techniques. In particular, our novel framework provides convergence analysis support for adaptive gradient methods in the nonconvex setting. In the theoretical analysis, we prove that our SUPER-ADAM algorithm can achieve the best known gradient (i.e., stochastic first-order oracle (SFO)) complexity of $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point of nonconvex optimization, which matches the lower bound for stochastic smooth nonconvex optimization. In numerical experiments, we employ various deep learning tasks to validate that our algorithm consistently outperforms the existing adaptive algorithms. Code is available at //github.com/LIJUNYI95/SuperAdam

Weight · CASES · Category · Similarity · Partition

December 17, 2021

Morphisms and minimization of weighted automata

Sylvain Lombardy,Jacques Sakarovitch

This paper studies algorithms for the minimisation of weighted automata. It starts with the definition of morphisms (which generalise and unify the notion of bisimulation across the whole class of weighted automata) and the uniqueness of a minimal quotient for every automaton, obtained by partition refinement. From a general scheme for the refinement of partitions, two strategies are considered for the computation of the minimal quotient: the Domain Split and the Predecessor Class Split algorithms. They correspond respectively to the classical Moore and Hopcroft algorithms for the computation of the minimal quotient of deterministic Boolean automata. We show that these two strategies yield algorithms with the same quadratic complexity, and we study the cases in which the second one can be improved to achieve a complexity similar to that of Hopcroft's algorithm.
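To make the partition-refinement idea concrete, the following is a minimal sketch of the classical Moore algorithm for a complete deterministic Boolean automaton, the case the abstract relates to the Domain Split strategy. It does not handle weights and is not the paper's algorithm; the function name and the four-state example are assumptions made for illustration.

```python
def moore_minimize(states, alphabet, delta, accepting):
    """Moore-style partition refinement for a complete DFA.

    delta maps (state, letter) -> state. Returns a dict state -> block id,
    where two states share a block iff they are equivalent.
    """
    # Initial partition: accepting versus non-accepting states.
    block = {s: int(s in accepting) for s in states}
    while True:
        ids = {}
        new_block = {}
        for s in states:
            # A state's signature is its own block plus its successors' blocks.
            signature = (block[s],) + tuple(block[delta[s, a]] for a in alphabet)
            new_block[s] = ids.setdefault(signature, len(ids))
        # Refinement only splits blocks, so an unchanged count means stability.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block

# Hypothetical example: a 4-cycle on one letter accepting even-length words.
states = [0, 1, 2, 3]
alphabet = ["a"]
delta = {(s, "a"): (s + 1) % 4 for s in states}
classes = moore_minimize(states, alphabet, delta, accepting={0, 2})
```

In this example states 0 and 2 end up in one block and states 1 and 3 in another, giving a two-state minimal quotient.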

Reducible · Variance · Linear · Regularization term · Variance reduction

December 17, 2021

An Analysis of Stochastic Variance Reduced Gradient for Linear Inverse Problems

Bangti Jin,Zehui Zhou,Jun Zou

from arxiv, 31 pages, 2 figures, to appear at Inverse Problems

Stochastic variance reduced gradient (SVRG) is a popular variance reduction technique for accelerating stochastic gradient descent (SGD). We provide a first analysis of the method for solving a class of linear inverse problems through the lens of classical regularization theory. We prove that, for a suitable constant step size schedule, the method can achieve an optimal convergence rate in terms of the noise level (under a suitable regularity condition) and that the variance of the SVRG iterate error is smaller than that of SGD. These theoretical findings are corroborated by a set of numerical experiments.
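For readers unfamiliar with the method, here is a minimal sketch of generic SVRG with a constant step size applied to a noise-free least-squares problem. It does not reproduce the paper's inverse-problem setting (noisy data, regularization by stopping); the function name, step size, loop lengths, and test data are all illustrative assumptions.

```python
import random

def svrg_least_squares(A, b, eta=0.02, epochs=200, inner=200, seed=0):
    """Generic SVRG for f(x) = (1/2n) * sum_i (<a_i, x> - b_i)^2."""
    rng = random.Random(seed)
    n_rows, n_cols = len(A), len(A[0])

    def grad_i(i, x):
        # Gradient of the i-th term: a_i * (<a_i, x> - b_i).
        r = sum(a * xj for a, xj in zip(A[i], x)) - b[i]
        return [r * a for a in A[i]]

    x = [0.0] * n_cols
    for _ in range(epochs):
        # Outer loop: freeze a snapshot and compute its full gradient.
        snapshot = list(x)
        full = [0.0] * n_cols
        for i in range(n_rows):
            g = grad_i(i, snapshot)
            full = [f + gi / n_rows for f, gi in zip(full, g)]
        # Inner loop: stochastic steps with the variance-reduced gradient.
        for _ in range(inner):
            i = rng.randrange(n_rows)
            g_x = grad_i(i, x)
            g_s = grad_i(i, snapshot)
            x = [xj - eta * (gx - gs + f)
                 for xj, gx, gs, f in zip(x, g_x, g_s, full)]
    return x

# Hypothetical consistent data with known solution.
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, 2.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
x_svrg = svrg_least_squares(A, b)
```

The correction term `gx - gs + f` is an unbiased gradient estimate whose variance shrinks as the iterate and the snapshot approach the solution, which is the mechanism behind the variance comparison with SGD.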

Nonconvex · Statistic · Learning · Functional · Estimation/Estimator

December 16, 2021

Analysis of Generalized Bregman Surrogate Algorithms for Nonsmooth Nonconvex Statistical Learning

Yiyuan She,Zhifeng Wang,Jiuwu Jin

Modern statistical applications often involve minimizing an objective function that may be nonsmooth and/or nonconvex. This paper focuses on a broad Bregman-surrogate algorithm framework that includes local linear approximation, mirror descent, iterative thresholding, DC programming and many others as particular instances. The recharacterization via generalized Bregman functions enables us to construct suitable error measures and establish global convergence rates for nonconvex and nonsmooth objectives in possibly high dimensions. For sparse learning problems with a composite objective, under some regularity conditions, the obtained estimators, as the surrogate's fixed points, though not necessarily local minimizers, enjoy provable statistical guarantees, and the sequence of iterates can be shown to approach the statistical truth within the desired accuracy geometrically fast. The paper also studies how to design adaptive momentum-based accelerations without assuming convexity or smoothness by carefully controlling stepsize and relaxation parameters.

Estimation/Estimator · contrastive · Mixture · Optimizer · Linear

December 16, 2021

Nonparametric empirical Bayes estimation based on generalized Laguerre series

Rida Benhaddou,Matthew Connell

from arxiv, 30 pages

In this work, we delve into nonparametric empirical Bayes theory and approximate the classical Bayes estimator by a truncation of the generalized Laguerre series, and then estimate its coefficients by minimizing the prior risk of the estimator. The minimization process yields a system of linear equations whose size is equal to the truncation level. We focus on the empirical Bayes estimation problem when the mixing distribution, and therefore the prior distribution, has support on the positive real half-line or a subinterval of it. By investigating several common mixing distributions, we develop a strategy for selecting the parameter of the generalized Laguerre function basis so that our estimator possesses a finite variance. We show that our generalized Laguerre empirical Bayes approach is asymptotically optimal in the minimax sense. Finally, our convergence rate is compared and contrasted with several results from the literature.

Regularization term · Linear · Square matrix · Minimizer · MoDELS

December 16, 2021

Randomized regularized extended Kaczmarz algorithms for tensor recovery

Kui Du,Xiao-Hui Sun

from arxiv, 17 pages, 2 figures

Randomized regularized Kaczmarz algorithms have recently been proposed to solve tensor recovery models with consistent linear measurements. In this work, we propose a novel algorithm based on the randomized extended Kaczmarz algorithm (which converges linearly in expectation to the unique minimum norm least squares solution of a linear system) for tensor recovery models with inconsistent linear measurements. We prove the linear convergence in expectation of our algorithm. Numerical experiments on a tensor least squares problem and a sparse tensor recovery problem are given to illustrate the theoretical results.
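The randomized extended Kaczmarz algorithm mentioned in the abstract can be sketched for an ordinary matrix system as follows. This is the standard matrix version, not the regularized tensor algorithm the paper proposes; the function name, the inconsistent 3x2 example, and the iteration count are illustrative assumptions.

```python
import random

def randomized_extended_kaczmarz(A, b, iters=3000, seed=0):
    """Matrix randomized extended Kaczmarz for a possibly inconsistent A x = b."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    row_w = [sum(v * v for v in row) for row in A]
    col_w = [sum(v * v for v in col) for col in cols]
    x = [0.0] * n
    z = list(b)  # z tracks the part of b outside the column space of A
    for _ in range(iters):
        # Column step: remove from z its component along a random column.
        j = rng.choices(range(n), weights=col_w)[0]
        coef = sum(c * zi for c, zi in zip(cols[j], z)) / col_w[j]
        z = [zi - coef * c for zi, c in zip(z, cols[j])]
        # Row step: Kaczmarz projection against the corrected right-hand side.
        i = rng.choices(range(m), weights=row_w)[0]
        resid = b[i] - z[i] - sum(a * xj for a, xj in zip(A[i], x))
        step = resid / row_w[i]
        x = [xj + step * a for xj, a in zip(x, A[i])]
    return x

# Hypothetical inconsistent system; its least squares solution is (4/3, 7/3).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 4.0]
x_rek = randomized_extended_kaczmarz(A, b)
```

The auxiliary vector `z` is what distinguishes the extended variant: plain Kaczmarz would keep bouncing between inconsistent hyperplanes, whereas subtracting `z` steers the row projections toward the least squares solution.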

UniFormer · Statistic · Posterior distribution · Identifiable · Heuristic algorithm

December 15, 2021

Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization

Jacob Nogas,Tong Li,Fernando J. Yanez,Arghavan Modiri,Nina Deliu,Ben Prystawski,Sofia S. Villar,Anna Rafferty,Joseph J. Williams

Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments, in which maximizing reward means that data is used to progressively assign more participants to more effective arms. Such assignment strategies increase the risk that statistical hypothesis tests identify a difference between arms when there is none, or fail to detect a difference between arms when there truly is one. We present simulations for 2-arm experiments that explore two algorithms that combine the benefits of uniform randomization for statistical analysis with the benefits of reward maximization achieved by Thompson Sampling (TS). The first, Top-Two Thompson Sampling, adds a fixed amount of uniform random allocation (UR) spread evenly over time. The second is a novel heuristic algorithm called TS PostDiff (Posterior Probability of Difference). TS PostDiff takes a Bayesian approach to mixing TS and UR: the probability that a participant is assigned using UR allocation is the posterior probability that the difference between two arms is 'small' (below a certain threshold), allowing for more UR exploration when there is little or no reward to be gained. We find that the TS PostDiff method performs well across multiple effect sizes, and thus does not require tuning based on a guess for the true effect size.
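The TS PostDiff rule described above can be sketched for two Bernoulli arms as follows. This is one plausible reading of the abstract, assuming Beta(1, 1) priors and a single posterior draw to decide between UR and TS (so that UR is chosen with exactly the posterior probability of a small difference). The function name, threshold, and reward rates are hypothetical.

```python
import random

def ts_postdiff(true_rates, n_participants=500, threshold=0.1, seed=0):
    """Simulate a 2-arm Bernoulli experiment with a TS PostDiff-style rule."""
    rng = random.Random(seed)
    successes = [0, 0]
    failures = [0, 0]
    assignments = [0, 0]
    ur_count = 0

    def draw():
        # One sample per arm from its Beta posterior (Beta(1, 1) prior assumed).
        return [rng.betavariate(1 + successes[k], 1 + failures[k])
                for k in range(2)]

    for _ in range(n_participants):
        p = draw()
        if abs(p[0] - p[1]) < threshold:
            # The sampled difference is 'small': use uniform random allocation.
            arm = rng.randrange(2)
            ur_count += 1
        else:
            # Otherwise fall back to ordinary Thompson Sampling.
            q = draw()
            arm = 0 if q[0] > q[1] else 1
        assignments[arm] += 1
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return assignments, ur_count

# Hypothetical experiment in which arm 1 is clearly better.
assignments, ur_count = ts_postdiff([0.3, 0.7])
```

When the arms truly differ, the posterior probability of a small difference shrinks over time, so the rule drifts from UR toward TS without any manually tuned exploration schedule.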

Optimizer · CC · Hyperparameter · Performer · Better

December 15, 2021

Provably Faster Algorithms for Bilevel Optimization

Junjie Yang,Kaiyi Ji,Yingbin Liang

from arxiv, This paper is accepted in NeurIPS 2021

Bilevel optimization has been widely applied in many important machine learning applications such as hyperparameter optimization and meta-learning. Recently, several momentum-based algorithms have been proposed to solve bilevel optimization problems faster. However, those momentum-based algorithms do not achieve provably better computational complexity than the $\widetilde{\mathcal{O}}(\epsilon^{-2})$ of the SGD-based algorithm. In this paper, we propose two new algorithms for bilevel optimization, where the first algorithm adopts momentum-based recursive iterations, and the second algorithm adopts recursive gradient estimations in nested loops to decrease the variance. We show that both algorithms achieve a complexity of $\widetilde{\mathcal{O}}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude. Our experiments validate our theoretical results and demonstrate the superior empirical performance of our algorithms in hyperparameter applications.

Neural Networks · Optimizer · Networks · Local minimum · Networking

December 19, 2019

Optimization for deep learning: theory and algorithms

Ruoyu Sun

from arxiv, 38 pages of main body; 5 pages of appendix; 12 pages of references

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.

Coordinate descent · Optimizer · Performer · Learning · Online

July 16, 2018

Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning

Akshita Bhandari,Chandramani Singh

from arxiv, 20 pages, 4 figures, 2 tables

We propose accelerated randomized coordinate descent algorithms for stochastic optimization and online learning. Our algorithms have significantly lower per-iteration complexity than the known accelerated gradient algorithms. The proposed algorithms for online learning have better regret performance than the known randomized online coordinate descent algorithms. Furthermore, the proposed algorithms for stochastic optimization exhibit convergence rates as good as those of the best known randomized coordinate descent algorithms. We also show simulation results to demonstrate the performance of the proposed algorithms.