久草精品视频在线观看_伊人亚洲综合青草青草久热_成人高清视频在线观看_国产伦精品一区二区三区免费迷奷_暴雨入室侵犯进出肉体免费观看_国产亚洲欧美在线_国产A√无码专区亚洲Av

We introduce a streaming framework for analyzing stochastic approximation/optimization problems. This streaming framework is analogous to solving optimization problems using time-varying mini-batches that arrive sequentially. We provide non-asymptotic convergence rates of various gradient-based algorithms; this includes the famous Stochastic Gradient (SG) descent (a.k.a. Robbins-Monro algorithm), mini-batch SG and time-varying mini-batch SG algorithms, as well as their iterated averages (a.k.a. Polyak-Ruppert averaging). We show i) how to accelerate convergence by choosing the learning rate according to the time-varying mini-batches, ii) that Polyak-Ruppert averaging achieves optimal convergence in terms of attaining the Cramer-Rao lower bound, and iii) how time-varying mini-batches together with Polyak-Ruppert averaging can provide variance reduction and accelerate convergence simultaneously, which is advantageous for many learning problems, such as online, sequential, and large-scale learning. We further demonstrate these favorable effects for various time-varying mini-batches.

相關內容

流

關注 1

優化器 · Learning · 泛函 · 情景 · 替代函數 ·

2023 年 6 月 8 日

Target-based Surrogates for Stochastic Optimization

Jonathan Wilder Lavington,Sharan Vaswani,Reza Babanezhad,Mark Schmidt,Nicolas Le Roux

We consider minimizing functions for which it is expensive to compute the (possibly stochastic) gradient. Such functions are prevalent in reinforcement learning, imitation learning and adversarial training. Our target optimization framework uses the (expensive) gradient computation to construct surrogate functions in a \emph{target space} (e.g. the logits output by a linear model for classification) that can be minimized efficiently. This allows for multiple parameter updates to the model, amortizing the cost of gradient computation. In the full-batch setting, we prove that our surrogate is a global upper-bound on the loss, and can be (locally) minimized using a black-box optimization algorithm. We prove that the resulting majorization-minimization algorithm ensures convergence to a stationary point of the loss. Next, we instantiate our framework in the stochastic setting and propose the $SSO$ algorithm, which can be viewed as projected stochastic gradient descent in the target space. This connection enables us to prove theoretical guarantees for $SSO$ when minimizing convex functions. Our framework allows the use of standard stochastic optimization algorithms to construct surrogates which can be minimized by any deterministic optimization method. To evaluate our framework, we consider a suite of supervised learning and imitation learning problems. Our experiments indicate the benefits of target optimization and the effectiveness of $SSO$.

Weight · 隨機梯度下降 · 縮放 · 分解的 · SimPLe ·

2023 年 6 月 8 日

Attentional-Biased Stochastic Gradient Descent

Qi Qi,Yi Xu,Rong Jin,Wotao Yin,Tianbao Yang

from arxiv, 29 pages

In this paper, we present a simple yet effective provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning. Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch. The individual-level weight of sampled data is systematically proportional to the exponential of a scaled loss value of the data, where the scaling factor is interpreted as the regularization parameter in the framework of distributionally robust optimization (DRO). Depending on whether the scaling factor is positive or negative, ABSGD is guaranteed to converge to a stationary point of an information-regularized min-max or min-min DRO problem, respectively. Compared with existing class-level weighting schemes, our method can capture the diversity between individual examples within each class. Compared with existing individual-level weighting methods using meta-learning that require three backward propagations for computing mini-batch stochastic gradients, our method is more efficient with only one backward propagation at each iteration as in standard deep learning methods. ABSGD is flexible enough to combine with other robust losses without any additional cost. Our empirical studies on several benchmark datasets demonstrate the effectiveness of the proposed method.\footnote{Code is available at:\url{//github.com/qiqi-helloworld/ABSGD/}}

近似 · 簇 · 邊 · 圖 · 標注 ·

2023 年 6 月 8 日

Faster Approximation Algorithms for Parameterized Graph Clustering and Edge Labeling

Vedangi Bengali,Nate Veldt

from arxiv, 12 pages,4 figures

Graph clustering is a fundamental task in network analysis where the goal is to detect sets of nodes that are well-connected to each other but sparsely connected to the rest of the graph. We present faster approximation algorithms for an NP-hard parameterized clustering framework called LambdaCC, which is governed by a tunable resolution parameter and generalizes many other clustering objectives such as modularity, sparsest cut, and cluster deletion. Previous LambdaCC algorithms are either heuristics with no approximation guarantees, or computationally expensive approximation algorithms. We provide fast new approximation algorithms that can be made purely combinatorial. These rely on a new parameterized edge labeling problem we introduce that generalizes previous edge labeling problems that are based on the principle of strong triadic closure and are of independent interest in social network analysis. Our methods are orders of magnitude more scalable than previous approximation algorithms and our lower bounds allow us to obtain a posteriori approximation guarantees for previous heuristics that have no approximation guarantees of their own.

閾值 · 線性的 · 稀疏 · 貪心逐層預訓練 · Performer ·

2023 年 6 月 7 日

Stochastic Natural Thresholding Algorithms

Rachel Grotheer,Shuang Li,Anna Ma,Deanna Needell,Jing Qin

Sparse signal recovery is one of the most fundamental problems in various applications, including medical imaging and remote sensing. Many greedy algorithms based on the family of hard thresholding operators have been developed to solve the sparse signal recovery problem. More recently, Natural Thresholding (NT) has been proposed with improved computational efficiency. This paper proposes and discusses convergence guarantees for stochastic natural thresholding algorithms by extending the NT from the deterministic version with linear measurements to the stochastic version with a general objective function. We also conduct various numerical experiments on linear and nonlinear measurements to demonstrate the performance of StoNT.

端到端 · Learning · 最優化 · 優化器 · 經驗風險最小化 ·

2023 年 6 月 7 日

End-to-End Learning for Stochastic Optimization: A Bayesian Perspective

Yves Rychener,Daniel Kuhn Tobias Sutter

from arxiv, Accepted at ICML 2023

We develop a principled approach to end-to-end learning in stochastic optimization. First, we show that the standard end-to-end learning algorithm admits a Bayesian interpretation and trains a posterior Bayes action map. Building on the insights of this analysis, we then propose new end-to-end learning algorithms for training decision maps that output solutions of empirical risk minimization and distributionally robust optimization problems, two dominant modeling paradigms in optimization under uncertainty. Numerical results for a synthetic newsvendor problem illustrate the key differences between alternative training schemes. We also investigate an economic dispatch problem based on real data to showcase the impact of the neural network architecture of the decision maps on their test performance.

類別 · 優化器 · 泛函 · Performer · 最速下降法 ·

2023 年 6 月 6 日

Complexity of a Class of First-Order Objective-Function-Free Optimization Algorithms

S. Gratton,S. Jerad,Ph. L. Toint

A parametric class of trust-region algorithms for unconstrained nonconvex optimization is considered where the value of the objective function is never computed. The class contains a deterministic version of the first-order Adagrad method typically used for minimization of noisy function, but also allows the use of (possibly approximate) second-order information when available. The rate of convergence of methods in the class is analyzed and is shown to be identical to that known for first-order optimization methods using both function and gradients values, recovering existing results for purely-first order variants and improving the explicit dependence on problem dimension. This rate is shown to be essentially sharp. A new class of methods is also presented, for which a slightly worse and essentially sharp complexity result holds. Limited numerical experiments show that the new methods' performance may be comparable to that of standard steepest descent, despite using significantly less information, and that this performance is relatively insensitive to noise.

Learning · 泛化理論 · 最優化 · 近似 · 表示 ·

2023 年 6 月 6 日

A Survey of Learning on Small Data: Generalization, Optimization, and Challenge

Xiaofeng Cao,Weixin Bu,Shengjun Huang,Minling Zhang,Ivor W. Tsang,Yew Soon Ong,James T. Kwok

Learning on big data brings success for artificial intelligence (AI), but the annotation and training costs are expensive. In future, learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI, which requires machines to recognize objectives and scenarios relying on small data as humans. A series of learning topics is going on this way such as active learning and few-shot learning. However, there are few theoretical guarantees for their generalization performance. Moreover, most of their settings are passive, that is, the label distribution is explicitly controlled by finite training resources from known distributions. This survey follows the agnostic active sampling theory under a PAC (Probably Approximately Correct) framework to analyze the generalization error and label complexity of learning on small data in model-agnostic supervised and unsupervised fashion. Considering multiple learning communities could produce small data representation and related topics have been well surveyed, we thus subjoin novel geometric representation perspectives for small data: the Euclidean and non-Euclidean (hyperbolic) mean, where the optimization solutions including the Euclidean gradients, non-Euclidean gradients, and Stein gradient are presented and discussed. Later, multiple learning communities that may be improved by learning on small data are summarized, which yield data-efficient representations, such as transfer learning, contrastive learning, graph representation learning. Meanwhile, we find that the meta-learning may provide effective parameter update policies for learning on small data. Then, we explore multiple challenging scenarios for small data, such as the weak supervision and multi-label. Finally, multiple data applications that may benefit from efficient small data representation are surveyed.

非凸 · 優化器 · Lipschitz · 駐點 · 平穩的 ·

2023 年 6 月 6 日

Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization

Lesi Chen,Jing Xu,Luo Luo

from arxiv, ICML 2023

We consider the optimization problem of the form $\min_{x \in \mathbb{R}^d} f(x) \triangleq \mathbb{E}_{\xi} [F(x; \xi)]$, where the component $F(x;\xi)$ is $L$-mean-squared Lipschitz but possibly nonconvex and nonsmooth. The recently proposed gradient-free method requires at most $\mathcal{O}( L^4 d^{3/2} \epsilon^{-4} + \Delta L^3 d^{3/2} \delta^{-1} \epsilon^{-4})$ stochastic zeroth-order oracle complexity to find a $(\delta,\epsilon)$-Goldstein stationary point of objective function, where $\Delta = f(x_0) - \inf_{x \in \mathbb{R}^d} f(x)$ and $x_0$ is the initial point of the algorithm. This paper proposes a more efficient algorithm using stochastic recursive gradient estimators, which improves the complexity to $\mathcal{O}(L^3 d^{3/2} \epsilon^{-3}+ \Delta L^2 d^{3/2} \delta^{-1} \epsilon^{-3})$.

統計量 · 近似 · 后向 · AIM · Processing（編程語言） ·

2023 年 6 月 6 日

Statistics for stochastic differential equations and approximations of resolvent

Jun Ohkubo

from arxiv, 13 pages, 5 figures

The numerical evaluation of statistics plays a crucial role in statistical physics and its applied fields. It is possible to evaluate the statistics for a stochastic differential equation with Gaussian white noise via the corresponding backward Kolmogorov equation. The important notice is that there is no need to obtain the solution of the backward Kolmogorov equation on the whole domain; it is enough to evaluate a value of the solution at a certain point that corresponds to the initial coordinate for the stochastic differential equation. For this aim, an algorithm based on combinatorics has recently been developed. In this paper, we discuss a higher-order approximation of resolvent, and an algorithm based on a second-order approximation is proposed. The proposed algorithm shows a second-order convergence. Furthermore, the convergence property of the naive algorithms naturally leads to extrapolation methods; they work well to calculate a more accurate value with fewer computational costs. The proposed method is demonstrated with the Ornstein-Uhlenbeck process and the noisy van der Pol system.

Learning · 泛化理論 · 概率近似正確 · 泛化誤差 · 少試學習 ·

2022 年 7 月 29 日

A Survey of Learning on Small Data

Xiaofeng Cao,Weixin Bu,Shengjun Huang,Yingpeng Tang,Yaming Guo,Yi Chang,Ivor W. Tsang

Learning on big data brings success for artificial intelligence (AI), but the annotation and training costs are expensive. In future, learning on small data is one of the ultimate purposes of AI, which requires machines to recognize objectives and scenarios relying on small data as humans. A series of machine learning models is going on this way such as active learning, few-shot learning, deep clustering. However, there are few theoretical guarantees for their generalization performance. Moreover, most of their settings are passive, that is, the label distribution is explicitly controlled by one specified sampling scenario. This survey follows the agnostic active sampling under a PAC (Probably Approximately Correct) framework to analyze the generalization error and label complexity of learning on small data using a supervised and unsupervised fashion. With these theoretical analyses, we categorize the small data learning models from two geometric perspectives: the Euclidean and non-Euclidean (hyperbolic) mean representation, where their optimization solutions are also presented and discussed. Later, some potential learning scenarios that may benefit from small data learning are then summarized, and their potential learning scenarios are also analyzed. Finally, some challenging applications such as computer vision, natural language processing that may benefit from learning on small data are also surveyed.