
We consider a stochastic version of the proximal point algorithm for optimization problems posed on a Hilbert space. A typical application of this is supervised learning. While the method is not new, it has not been extensively analyzed in this form. Indeed, most related results are confined to the finite-dimensional setting, where error bounds could depend on the dimension of the space. On the other hand, the few existing results in the infinite-dimensional setting only prove very weak types of convergence, owing to weak assumptions on the problem. In particular, there are no results that show convergence with a rate. In this article, we bridge these two worlds by assuming more regularity of the optimization problem, which allows us to prove convergence with an (optimal) sub-linear rate also in an infinite-dimensional setting. In particular, we assume that the objective function is the expected value of a family of convex differentiable functions. While we require that the full objective function is strongly convex, we do not assume that its constituent parts are so. Further, we require that the gradient satisfies a weak local Lipschitz continuity property, where the Lipschitz constant may grow polynomially given certain guarantees on the variance and higher moments near the minimum. We illustrate these results by discretizing a concrete infinite-dimensional classification problem with varying degrees of accuracy.
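For concreteness, here is a minimal finite-dimensional sketch of a stochastic proximal point iteration on a least-squares problem, where each sampled loss f_i(x) = (1/2)(a_i^T x - b_i)^2 makes the proximal subproblem solvable in closed form. The problem instance, step-size schedule, and all names below are illustrative assumptions, not the (infinite-dimensional) setting analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)      # noisy linear model

x = np.zeros(d)
for k in range(5000):
    i = rng.integers(n)
    a, bi = A[i], b[i]
    gamma = 1.0 / (k + 1)                          # diminishing steps, typical for strongly convex objectives
    # Proximal step on the sampled loss f_i(z) = 0.5 * (a @ z - bi)**2:
    #   x <- argmin_z  f_i(z) + ||z - x||^2 / (2 * gamma),
    # which has the closed form below (via the Sherman-Morrison formula).
    x = x - gamma * (a @ x - bi) / (1.0 + gamma * (a @ a)) * a

print(np.linalg.norm(x - x_true))                  # small, up to the noise level
```

Because each subproblem is solved exactly rather than linearized, the update remains stable even for comparatively large step sizes, which is the usual practical motivation for proximal point over plain stochastic gradient steps.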

Related content

The sharpest known high probability generalization bounds for uniformly stable algorithms (Feldman, Vondr\'{a}k, 2018, 2019), (Bousquet, Klochkov, Zhivotovskiy, 2020) contain a generally inevitable sampling error term of order $\Theta(1/\sqrt{n})$. When applied to excess risk bounds, this leads to suboptimal results in several standard stochastic convex optimization problems. We show that if the so-called Bernstein condition is satisfied, the term $\Theta(1/\sqrt{n})$ can be avoided, and high probability excess risk bounds of order up to $O(1/n)$ are possible via uniform stability. Using this result, we show a high probability excess risk bound with the rate $O(\log n/n)$ for strongly convex and Lipschitz losses valid for \emph{any} empirical risk minimization method. This resolves a question of Shalev-Shwartz, Shamir, Srebro, and Sridharan (2009). We discuss how $O(\log n/n)$ high probability excess risk bounds are possible for projected gradient descent in the case of strongly convex and Lipschitz losses without the usual smoothness assumption.

In the framework of inverse linear problems on infinite-dimensional Hilbert space, we prove the convergence of the conjugate gradient iterates to an exact solution to the inverse problem in the most general case where the self-adjoint, non-negative operator is unbounded and with minimal, technically unavoidable assumptions on the initial guess of the iterative algorithm. The convergence is proved to always hold in the Hilbert space norm (error convergence), as well as at other levels of regularity (energy norm, residual, etc.) depending on the regularity of the iterates. We also discuss, both analytically and through a selection of numerical tests, the main features and differences of our convergence result as compared to the case, already available in the literature, where the operator is bounded.
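As a point of reference, the iteration under study is the textbook conjugate gradient method; a minimal matrix-free sketch is below, with an ill-conditioned diagonal operator standing in (very loosely) for the unbounded, non-negative self-adjoint operators treated in the paper. The operator, tolerances, and initial guess are illustrative assumptions.

```python
import numpy as np

def conjugate_gradient(apply_A, b, x0, tol=1e-8, max_iter=500):
    """Textbook CG for A x = b with A self-adjoint and non-negative,
    accessed only through matrix-vector products."""
    x = x0.copy()
    r = b - apply_A(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Example: a diagonal operator with widely spread eigenvalues,
# mimicking (in finite dimensions) a badly conditioned problem.
n = 200
diag = np.linspace(1e-3, 1e3, n)
b = np.ones(n)
x = conjugate_gradient(lambda v: diag * v, b, np.zeros(n))
print(np.linalg.norm(diag * x - b))    # residual norm
```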

The Stochastic Extragradient (SEG) method is one of the most popular algorithms for solving min-max optimization and variational inequality problems (VIPs) appearing in various machine learning tasks. However, several important questions regarding the convergence properties of SEG are still open, including the sampling of stochastic gradients, mini-batching, convergence guarantees for monotone finite-sum variational inequalities with possibly non-monotone terms, and others. To address these questions, in this paper, we develop a novel theoretical framework that allows us to analyze several variants of SEG in a unified manner. Besides standard setups, like Same-Sample SEG under Lipschitzness and monotonicity or Independent-Samples SEG under uniformly bounded variance, our approach allows us to analyze variants of SEG that were never explicitly considered in the literature before. Notably, we analyze SEG with arbitrary sampling, which includes importance sampling and various mini-batching strategies as special cases. Our rates for the new variants of SEG outperform the current state-of-the-art convergence guarantees and rely on less restrictive assumptions.
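A minimal sketch of the Same-Sample SEG variant on a toy strongly monotone finite-sum saddle-point problem is given below; the problem instance, step size, and uniform sampling are illustrative choices and not the paper's general setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, mu, gamma = 50, 10, 0.5, 0.05
A = rng.standard_normal((n, d, d))             # finite-sum bilinear terms

def F(i, x, y):
    """Stochastic operator of f_i(x, y) = mu/2 ||x||^2 + x^T A_i y - mu/2 ||y||^2."""
    return mu * x + A[i] @ y, mu * y - A[i].T @ x

x, y = rng.standard_normal(d), rng.standard_normal(d)
for _ in range(2000):
    i = rng.integers(n)
    gx, gy = F(i, x, y)                        # extrapolation step
    xt, yt = x - gamma * gx, y - gamma * gy
    gx, gy = F(i, xt, yt)                      # update step with the SAME sample i
    x, y = x - gamma * gx, y - gamma * gy

print(np.linalg.norm(x), np.linalg.norm(y))    # the saddle point of this instance is (0, 0)
```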

We develop an efficient algorithm for weak recovery in a robust version of the stochastic block model. The algorithm matches the statistical guarantees of the best known algorithms for the vanilla version of the stochastic block model. In this sense, our results show that there is no price of robustness in the stochastic block model. Our work is heavily inspired by recent work of Banks, Mohanty, and Raghavendra (SODA 2021) that provided an efficient algorithm for the corresponding distinguishing problem. Our algorithm and its analysis significantly depart from previous ones for robust recovery. A key challenge is the peculiar optimization landscape underlying our algorithm: The planted partition may be far from optimal in the sense that completely unrelated solutions could achieve the same objective value. This phenomenon is related to the push-out effect at the BBP phase transition for PCA. To the best of our knowledge, our algorithm is the first to achieve robust recovery in the presence of such a push-out effect in a non-asymptotic setting. Our algorithm is an instantiation of a framework based on convex optimization (related to but distinct from sum-of-squares), which may be useful for other robust matrix estimation problems. A by-product of our analysis is a general technique that boosts the probability of success (over the randomness of the input) of an arbitrary robust weak-recovery algorithm from constant (or slowly vanishing) probability to exponentially high probability.

Many functions of interest are in a high-dimensional space but exhibit low-dimensional structures. This paper studies regression of an $s$-H\"{o}lder function $f$ in $\mathbb{R}^D$ which varies along a central subspace of dimension $d$ while $d\ll D$. A direct approximation of $f$ in $\mathbb{R}^D$ with an $\varepsilon$ accuracy requires a number of samples $n$ on the order of $\varepsilon^{-(2s+D)/s}$. In this paper, we analyze the Generalized Contour Regression (GCR) algorithm for the estimation of the central subspace and use piecewise polynomials for function approximation. GCR is among the best estimators for the central subspace, but its sample complexity is an open question. We prove that GCR leads to a mean squared estimation error of $O(n^{-1})$ for the central subspace, if a variance quantity is exactly known. The estimation error of this variance quantity is also given in this paper. The mean squared regression error of $f$ is proved to be on the order of $\left(n/\log n\right)^{-\frac{2s}{2s+d}}$, where the exponent depends on the dimension $d$ of the central subspace instead of the ambient dimension $D$. This result demonstrates that GCR is effective in learning the low-dimensional central subspace. We also propose a modified GCR with improved efficiency. The convergence rate is validated through several numerical experiments.

Regula Falsi, or the method of false position, is a numerical method for finding an approximate solution to f(x) = 0 on a finite interval [a, b], where f is a real-valued continuous function on [a, b] that satisfies f(a)f(b) < 0. Previous studies proved the convergence of this method under additional assumptions on f, for example that neither the first nor the second derivative of f changes sign on the interval [a, b]. In this paper, we remove those assumptions and prove the convergence of the method for all continuous functions.
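For reference, a straightforward implementation of the method of false position is sketched below; the stopping rule and test function are illustrative, and the sketch does not reflect the paper's convergence analysis.

```python
def regula_falsi(f, a, b, tol=1e-12, max_iter=10_000):
    """Method of false position on [a, b], assuming f(a) * f(b) < 0."""
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    c = a
    for _ in range(max_iter):
        c = (a * fb - b * fa) / (fb - fa)   # x-intercept of the secant line
        fc = f(c)
        if abs(fc) < tol:
            break
        if fa * fc < 0:                     # root lies in [a, c]
            b, fb = c, fc
        else:                               # root lies in [c, b]
            a, fa = c, fc
    return c

# Example: the root of x^3 - 2x - 5 on [2, 3], a classic test cubic.
print(regula_falsi(lambda x: x**3 - 2 * x - 5, 2.0, 3.0))
```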

This study concerns the weak convergence rate of a temporal and spatial discretization scheme for the stochastic Cahn-Hilliard equation with additive noise, where the spectral Galerkin method is used in space and the backward Euler scheme is used in time. The presence of the unbounded operator in front of the nonlinear term and the lack of the associated Kolmogorov equations make the error analysis much more challenging and demanding. To overcome these difficulties, we further exploit a novel approach proposed in [7] and combine it with Malliavin calculus to obtain an improved weak rate of convergence, in comparison with the corresponding strong convergence rates. The techniques used here are quite general and hence have the potential to be applied to other non-Markovian equations. As a byproduct, the strong convergence rate can also be easily obtained.
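To fix ideas, the following is a heavily simplified 1D sketch of this kind of discretization: Fourier spectral Galerkin in space and an implicit-explicit Euler step in time (stiff linear part implicit, nonlinearity explicit, which deviates from the fully implicit backward Euler scheme analyzed in the paper), with an assumed mode-wise decaying noise spectrum. All parameters are illustrative.

```python
import numpy as np

# 1D stochastic Cahn-Hilliard sketch:  du = -Laplace(Laplace(u) - f(u)) dt + dW,
# with f(u) = u**3 - u, periodic on [0, 2*pi].
N, L = 128, 2 * np.pi
x = np.linspace(0.0, L, N, endpoint=False)
k = np.fft.rfftfreq(N, d=L / N) * 2.0 * np.pi      # wavenumbers for rfft
tau, n_steps = 1e-3, 500
sigma_k = 1.0 / (1.0 + k**2)                       # assumed decaying noise coefficients

rng = np.random.default_rng(0)
u = 0.1 * np.cos(x)                                # initial condition
uh = np.fft.rfft(u)

for _ in range(n_steps):
    fh = np.fft.rfft(u**3 - u)                     # nonlinearity in Fourier space
    dW = sigma_k * np.fft.rfft(np.sqrt(tau) * rng.standard_normal(N))
    # In Fourier space:  d u_k = (-k^4 u_k - k^2 f_k) dt + dW_k,
    # with the fourth-order linear part treated implicitly.
    uh = (uh - tau * k**2 * fh + dW) / (1.0 + tau * k**4)
    u = np.fft.irfft(uh, n=N)

print(u.min(), u.max())                            # the solution stays O(1)
```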

We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state. We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus to guarantee both optimism and convergence of the associated value iteration scheme. We prove that EB-SSP achieves the minimax regret rate $\widetilde{O}(B_{\star} \sqrt{S A K})$, where $K$ is the number of episodes, $S$ is the number of states, $A$ is the number of actions and $B_{\star}$ bounds the expected cumulative cost of the optimal policy from any state, thus closing the gap with the lower bound. Interestingly, EB-SSP obtains this result while being parameter-free, i.e., it does not require any prior knowledge of $B_{\star}$, nor of $T_{\star}$ which bounds the expected time-to-goal of the optimal policy from any state. Furthermore, we illustrate various cases (e.g., positive costs, or general costs when an order-accurate estimate of $T_{\star}$ is available) where the regret only contains a logarithmic dependence on $T_{\star}$, thus yielding the first horizon-free regret bound beyond the finite-horizon MDP setting.

Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamics of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, and that both types of variance must be mitigated to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and achieves better generalization compared to existing methods.

Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretization error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces, i.e., when many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretization error. To get around this, we propose the stochastic CIR process, which removes all discretization error, and we prove that samples from the stochastic CIR process are asymptotically unbiased. Use of the stochastic CIR process within an SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
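The appeal of the CIR process in this context is that its transition density is known in closed form (a scaled noncentral chi-squared distribution), so it can be simulated exactly over any time step, with no step-size-dependent bias. A minimal sketch of such an exact step is shown below; the parameters are illustrative and the embedding into an SGMCMC scheme is omitted.

```python
import numpy as np

def cir_exact_step(x, dt, a, b, sigma, rng):
    """Sample X_{t+dt} | X_t = x exactly for dX = a(b - X) dt + sigma * sqrt(X) dW."""
    c = sigma**2 * (1.0 - np.exp(-a * dt)) / (4.0 * a)
    df = 4.0 * a * b / sigma**2                 # degrees of freedom
    nonc = x * np.exp(-a * dt) / c              # noncentrality parameter
    return c * rng.noncentral_chisquare(df, nonc)

rng = np.random.default_rng(1)
x, dt = 0.5, 0.1
path = [x]
for _ in range(100):
    x = cir_exact_step(x, dt, a=2.0, b=1.0, sigma=1.0, rng=rng)
    path.append(x)
print(np.mean(path))
```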
