
Langevin Dynamics has been extensively employed in global non-convex optimization due to the concentration of its stationary distribution around the global minimum of the potential function at low temperatures. In this paper, we propose to utilize a more comprehensive class of stochastic processes, known as reversible diffusion, and apply the Euler-Maruyama discretization for global non-convex optimization. We design the diffusion coefficient to be larger when distant from the optimum and smaller when near, thus enabling accelerated convergence while regulating discretization error, a strategy inspired by landscape modifications. Our proposed method can also be seen as a time change of Langevin Dynamics, and we prove convergence with respect to KL divergence, investigating the trade-off between convergence speed and discretization error. The efficacy of our proposed method is demonstrated through numerical experiments.
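Here is a minimal sketch of the idea. It assumes a hypothetical diffusion coefficient a(x) = 1 + gamma * U(x) with U >= 0, which is large far from the minimum and close to 1 near it. The paper's actual coefficient and step-size choices are not reproduced here. For a scalar coefficient, the reversible diffusion dX = (-a(X) grad U(X) + T grad a(X)) dt + sqrt(2 T a(X)) dW keeps exp(-U/T) as its stationary density, and the loop below is its Euler-Maruyama discretization. The function name `reversible_diffusion_em` and all of its parameters are illustrative.

```python
import numpy as np

def reversible_diffusion_em(grad_U, U, x0, temperature, step, n_steps, gamma=1.0, seed=0):
    """Euler-Maruyama for dX = (-a grad U + T grad a) dt + sqrt(2 T a) dW.

    Hypothetical coefficient a(x) = 1 + gamma * U(x) (assumes U >= 0), so
    grad a(x) = gamma * grad U(x). The T * grad a correction keeps exp(-U/T)
    stationary for the continuous-time process.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        g = grad_U(x)
        a = 1.0 + gamma * U(x)                     # larger far from the minimum
        drift = -a * g + temperature * gamma * g   # -a grad U + T grad a
        noise = rng.standard_normal(x.shape)
        x = x + step * drift + np.sqrt(2.0 * temperature * a * step) * noise
    return x
```

For example, `reversible_diffusion_em(lambda x: x, lambda x: 0.5 * float(x @ x), [3.0, -2.0], temperature=1e-4, step=0.01, n_steps=2000)` should end close to the origin. Because of the discretization, the discrete chain's stationary law only approximates exp(-U/T). A larger a(x) speeds up travel but also enlarges the effective step `step * a(x)`, which is the speed versus discretization-error trade-off the abstract refers to.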

Related content

Discretization


UniFormer · Learning · Online · Category · Scenario

July 7, 2023

Multiclass Online Learning and Uniform Convergence

Steve Hanneke, Shay Moran, Vinod Raman, Unique Subedi, Ambuj Tewari

from arXiv, COLT Camera-Ready, 15 pages

We study multiclass classification in the agnostic adversarial online learning setting. As our main result, we prove that any multiclass concept class is agnostically learnable if and only if its Littlestone dimension is finite. This solves an open problem studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011, 2015), who handled the case when the number of classes (or labels) is bounded. We also prove a separation between online learnability and online uniform convergence by exhibiting an easy-to-learn class whose sequential Rademacher complexity is unbounded. Our learning algorithm uses the multiplicative weights algorithm, with a set of experts defined by executions of the Standard Optimal Algorithm on subsequences of size Littlestone dimension. We argue that the best expert has regret at most Littlestone dimension relative to the best concept in the class. This differs from the well-known covering technique of Ben-David, Pál, and Shalev-Shwartz (2009) for binary classification, where the best expert has regret zero.
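For readers unfamiliar with the multiplicative weights (Hedge) algorithm mentioned above, here is a generic sketch over a finite set of experts with losses in [0, 1]. It does not construct the paper's experts (runs of the Standard Optimal Algorithm on subsequences). The function name `hedge` and the loss format are assumptions made for illustration.

```python
import math

def hedge(expert_losses, eta):
    """Multiplicative weights (Hedge) over a finite set of experts.

    expert_losses: sequence of rounds, each a list of per-expert losses in [0, 1].
    Returns the learner's expected loss in each round and the final weights.
    """
    n = len(expert_losses[0])
    probs = [1.0 / n] * n  # uniform initial weights
    learner_losses = []
    for losses in expert_losses:
        # Expected loss when an expert is sampled from the current distribution.
        learner_losses.append(sum(p * l for p, l in zip(probs, losses)))
        # Multiplicative update, then renormalize to avoid underflow.
        weights = [p * math.exp(-eta * l) for p, l in zip(probs, losses)]
        total = sum(weights)
        probs = [w / total for w in weights]
    return learner_losses, probs
```

With uniform initial weights, the cumulative expected loss of this learner is at most (ln N + eta * L_i) / (1 - exp(-eta)) for every expert i. Here N is the number of experts and L_i is the cumulative loss of expert i.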

Optimizer · Optimization landscape · Performer · Discretization · Dynamical system

July 7, 2023

Accelerated Optimization Landscape of Linear-Quadratic Regulator

Lechen Feng, Yuan-Hua Ni

Linear-quadratic regulator (LQR) is a landmark problem in optimal control and is the subject of this paper. LQR is generally classified into state-feedback LQR (SLQR) and output-feedback LQR (OLQR), depending on whether the full state is available. Existing literature suggests that both SLQR and OLQR can be viewed as constrained nonconvex matrix optimization problems in which the only variable to be optimized is the feedback gain matrix. In this paper, we introduce a first-order accelerated optimization framework for the LQR problem and give its convergence analysis for SLQR and OLQR, respectively. Specifically, we present a Lipschitz Hessian property of the LQR performance criterion, which turns out to be crucial for applying modern optimization techniques. For the SLQR problem, we introduce a continuous-time hybrid dynamic system whose solution trajectory is shown to converge exponentially to the optimal feedback gain with the Nesterov-optimal order $1-\frac{1}{\sqrt{\kappa}}$, where $\kappa$ is the condition number. We then discretize the hybrid dynamic system with the symplectic Euler scheme and propose a Nesterov-type method with a restarting rule that preserves the continuous-time convergence rate, i.e., the discretized algorithm attains the Nesterov-optimal convergence order. For the OLQR problem, we propose a Hessian-free accelerated framework, a two-procedure method consisting of semiconvex function optimization and negative curvature exploitation. The method finds an $\epsilon$-stationary point of the performance criterion in time $\mathcal{O}(\epsilon^{-7/4}\log(1/\epsilon))$, which improves upon the $\mathcal{O}(\epsilon^{-2})$ complexity of vanilla gradient descent. Moreover, the method provides a second-order stationarity guarantee.
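The restarting idea can be illustrated on a plain strongly convex quadratic. The sketch below is a generic Nesterov accelerated gradient method with a function-value restart, where momentum is reset whenever the objective would increase. It is a stand-in, not the paper's symplectic-Euler scheme or its restarting rule, and it does not evaluate the LQR cost. The function name `nesterov_restart` and the step size 1/L are assumptions.

```python
import numpy as np

def nesterov_restart(f, grad, x0, L, n_iters):
    """Nesterov accelerated gradient with a function-value restart.

    Uses step size 1/L (grad assumed L-Lipschitz). Whenever a step would
    increase f, the momentum is discarded and the method restarts from the
    last accepted point.
    """
    x = np.array(x0, dtype=float)
    y = x.copy()
    f_x = f(x)
    k = 0  # iterations since the last restart
    for _ in range(n_iters):
        x_new = y - grad(y) / L
        f_new = f(x_new)
        if f_new > f_x:
            y = x.copy()  # restart: next step is a plain gradient step from x
            k = 0
            continue
        k += 1
        y = x_new + (k - 1) / (k + 2) * (x_new - x)  # momentum extrapolation
        x, f_x = x_new, f_new
    return x
```

For example, on f(x) = 0.5 x^T A x - b^T x with A = diag(1, 100) and L = 100, the iterates approach A^{-1} b.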

Discretization · Lipschitz · Moments · Noise · Category

July 7, 2023

Higher order time discretization method for a class of semilinear stochastic partial differential equations with multiplicative noise

Yukun Li, Liet Vo, Guanqian Wang

from arXiv, 28 pages, 8 figures. arXiv admin note: text overlap with arXiv:1811.05028

In this paper, we consider a new approach to semi-discretization in time and to spatial discretization of a class of semilinear stochastic partial differential equations (SPDEs) with multiplicative noise. The drift term of the SPDEs is only assumed to satisfy a one-sided Lipschitz condition, and the diffusion term is assumed to be globally Lipschitz continuous. Our new time-discretization strategy is based on the Milstein method for stochastic differential equations. We use the energy method for its error analysis and show a strong convergence order of nearly $1$ for the approximate solution. The proof relies on new Hölder continuity estimates of the SPDE solution and the nonlinear term. For a general polynomial-type drift term, even the stability of the numerical solutions is difficult to derive. To overcome this, we propose an interpolation-based finite element method for spatial discretization. We then obtain $H^1$ stability, higher-moment $H^1$ stability, $L^2$ stability, and higher-moment $L^2$ stability results using numerical and stochastic techniques. Coupling all of these results yields nearly optimal convergence orders in time and space. Numerical experiments implement the proposed scheme and validate the theoretical results.
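Consider a scalar SDE dX = f(X) dt + g(X) dW. The Milstein step adds the correction 0.5 g(X) g'(X) (dW^2 - h) to Euler-Maruyama, which raises the strong order from 1/2 to 1. The sketch below applies this to geometric Brownian motion, an illustrative choice because its exact solution is known. It does not cover the SPDE or finite element setting of the paper, and the function name `milstein_gbm` is hypothetical.

```python
import numpy as np

def milstein_gbm(x0, mu, sigma, T, n_steps, seed=0):
    """Milstein scheme for dX = mu X dt + sigma X dW (geometric Brownian motion).

    Returns the Milstein endpoint and the exact endpoint driven by the same
    Brownian increments, so the strong (pathwise) error can be measured.
    """
    rng = np.random.default_rng(seed)
    h = T / n_steps
    dW = rng.normal(0.0, np.sqrt(h), size=n_steps)
    x = x0
    for dw in dW:
        # f(x) = mu x, g(x) = sigma x, so g(x) g'(x) = sigma^2 x.
        x = x + mu * x * h + sigma * x * dw + 0.5 * sigma**2 * x * (dw**2 - h)
    exact = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * dW.sum())
    return x, exact
```

Because both endpoints share the same increments `dW`, `abs(approx - exact)` is a direct sample of the strong error at time T.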

Model evaluation · Normalized · Smoothing · Estimation/Estimator · Analysis

July 7, 2023

Improving the accuracy of Raviart-Thomas mixed elements in two-dimensional smooth domains with straight-edged triangles

Fleurianne Bertrand, Vitoriano Ruas

from arXiv, 36 pages, two figures, one table

Several physical problems modeled by second-order partial differential equations can be solved efficiently with mixed finite elements of the Raviart-Thomas family for N-simplexes, introduced in the 1970s. When Neumann conditions are prescribed on a curvilinear boundary, the normal component of the flux variable should preferably not take values at nodes shifted to the boundary of the approximating polytope in the corresponding normal direction, because doing so degrades the method's accuracy, as shown in \cite{FBRT}. That work studied an order-preserving technique based on a parametric version of these elements with curved simplexes. In this paper, an alternative using straight-edged triangles is proposed for two-dimensional problems. The key point of this method is a Petrov-Galerkin formulation of the mixed problem, in which the test-flux space differs slightly from the shape-flux space. After a well-posedness and stability analysis, error estimates of optimal order are proven.
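As background on the flux degrees of freedom discussed above, here is a sketch of the lowest-order Raviart-Thomas element (RT0) on a single straight-edged triangle. Basis function i is (x - p_i) / (2|T|), where p_i is the vertex opposite edge i. Its outward normal flux is therefore 1 through edge i and 0 through the other two edges. The sketch shows only the standard element, not the paper's Petrov-Galerkin modification near curved boundaries. The helper names `rt0_basis` and `edge_flux` are hypothetical.

```python
import numpy as np

def rt0_basis(vertices):
    """RT0 basis on one triangle: phi_i(x) = (x - p_i) / (2 |T|).

    Edge i is the edge opposite vertex p_i.
    """
    p = np.asarray(vertices, dtype=float)
    e1, e2 = p[1] - p[0], p[2] - p[0]
    area = 0.5 * abs(e1[0] * e2[1] - e1[1] * e2[0])
    return [lambda x, pi=p[i]: (np.asarray(x, dtype=float) - pi) / (2.0 * area)
            for i in range(3)]

def edge_flux(phi, a, b, centroid):
    """Flux of phi through segment [a, b] with the outward normal of the triangle.

    For RT0, phi . n is constant along each edge, so the midpoint rule is exact.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    t = b - a
    n = np.array([t[1], -t[0]]) / np.linalg.norm(t)
    mid = 0.5 * (a + b)
    if np.dot(n, mid - centroid) < 0:  # flip so the normal points outward
        n = -n
    return np.dot(phi(mid), n) * np.linalg.norm(t)
```

Evaluating `edge_flux(phi_i, ...)` over the three edges of any non-degenerate triangle should give the identity pattern: flux 1 through edge i and 0 elsewhere.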

Optimizer · Scaling · Performer · Noise · Knowledge

July 7, 2023

Adaptive Strategies in Non-convex Optimization

Zhenxun Zhuang

from arXiv. arXiv admin note: text overlap with arXiv:2208.11195

An algorithm is said to be adaptive to a certain parameter (of the problem) if it does not need a priori knowledge of that parameter but performs competitively with algorithms that know it. This dissertation presents our work on adaptive algorithms in the following scenarios:
1. In the stochastic optimization setting, we receive only stochastic gradients, and the noise level in evaluating them greatly affects the convergence rate. Without prior knowledge of the noise scale, tuning is typically required to achieve the optimal rate. We therefore designed and analyzed noise-adaptive algorithms that automatically ensure (near-)optimal rates under different noise scales without knowing the noise scale.
2. In training deep neural networks, the gradient magnitudes of different coordinates can span a very wide range unless normalization techniques such as BatchNorm are employed. In such situations, algorithms that do not address this gradient-scale problem can behave very poorly. To mitigate this, we formally established the advantage of scale-free algorithms that adapt to the gradient scales and demonstrated their benefits in experiments.
3. Traditional analyses in non-convex optimization typically rely on the smoothness assumption. Yet this condition does not capture the properties of some deep learning objective functions, including those involving Long Short-Term Memory networks and Transformers. Instead, these functions satisfy a much more relaxed condition with potentially unbounded smoothness. Under this condition, we show that a generalized SignSGD algorithm can theoretically match the best-known convergence rates of SGD with gradient clipping without needing explicit clipping at all, and that it can empirically match the performance of Adam and outperform other methods. Moreover, it can be made to adapt automatically to the unknown relaxed smoothness.
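The scale-free behavior of sign-based updates can be seen in a few lines. Below is a generic sketch of sign descent with optional momentum and a decaying step size lr0 / sqrt(t + 1). The dissertation's generalized SignSGD may differ in its momentum, step-size and noise handling, and the function name `sign_sgd` and the schedule are assumptions. Each coordinate moves by the same step regardless of its gradient magnitude, so coordinates with gradient scales of 1e3 and 1e-3 converge at the same pace.

```python
import numpy as np

def sign_sgd(grad, x0, lr0, n_steps, beta=0.0):
    """Sign descent with optional momentum and step size lr0 / sqrt(t + 1).

    grad may return exact or stochastic gradients. Only the sign of the
    (momentum-averaged) gradient is used, so per-coordinate gradient scales
    do not affect the step length.
    """
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)
    for t in range(n_steps):
        g = grad(x)
        m = beta * m + (1.0 - beta) * g   # beta = 0 gives plain sign descent
        x = x - lr0 / np.sqrt(t + 1) * np.sign(m)
    return x
```

For example, with `grad = lambda x: 2 * np.array([1e3, 1e-3]) * x`, starting from `[5.0, -3.0]` with `lr0 = 0.5` and `beta = 0`, both coordinates end within one step size of zero after 2000 steps.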

Extensibility · Continuity · Analysis · Robustness · Medium

July 6, 2023

Recovery of Multiple Parameters in Subdiffusion from One Lateral Boundary Measurement

Siyu Cen, Bangti Jin, Yikan Liu, Zhi Zhou

from arXiv, 28 pages

This work is concerned with numerically recovering multiple parameters simultaneously in the subdiffusion model from a single lateral measurement on part of the boundary, in an incompletely known medium. We prove that the boundary measurement corresponding to a fairly general boundary excitation uniquely determines the order of the fractional derivative and the polygonal support of the diffusion coefficient, without knowledge of either the initial condition or the source. The uniqueness analysis further inspires a robust numerical algorithm for recovering the fractional order and the diffusion coefficient, which combines a small-time asymptotic expansion, analytic continuation of the solution, and the level set method. We present extensive numerical experiments to illustrate the feasibility of the simultaneous recovery. In addition, we discuss the uniqueness of recovering general diffusion and potential coefficients from a single partial boundary measurement when the boundary excitation is more specialized.
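Suppose, as is common for subdiffusion, that the fractional derivative is of Caputo type with order alpha in (0, 1). The forward model can then be discretized in time with the L1 scheme sketched below. This only illustrates the time-fractional derivative. It is not the paper's recovery algorithm (asymptotic expansion, analytic continuation, level set method), and the function name `caputo_l1` is hypothetical.

```python
import math

def caputo_l1(u_values, h, alpha):
    """L1 approximation of the Caputo derivative of order alpha in (0, 1).

    u_values[j] = u(j * h) for j = 0..n; returns the approximation at t_n = n * h,
    using weights b_k = (k + 1)^(1 - alpha) - k^(1 - alpha).
    """
    n = len(u_values) - 1
    total = 0.0
    for k in range(n):
        b_k = (k + 1) ** (1 - alpha) - k ** (1 - alpha)
        total += b_k * (u_values[n - k] - u_values[n - k - 1])
    return total / (h ** alpha * math.gamma(2 - alpha))
```

The L1 scheme is exact for piecewise-linear functions. For u(t) = t it therefore reproduces the exact value t^(1 - alpha) / Gamma(2 - alpha), which makes a convenient sanity check.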

Estimation/Estimator · Monte Carlo · Samples · MCMC · Score

July 5, 2023

Monte Carlo Sampling without Isoperimetry: A Reverse Diffusion Approach

Xunpeng Huang, Hanze Dong, Yifan Hao, Yian Ma, Tong Zhang

The efficacy of modern generative models, notably diffusion models and their ability to generate high-quality data samples, commonly depends on the precision of score estimation along the diffusion path. This study explores the potential of posterior sampling through reverse diffusion. An examination of the sampling literature shows that score estimation can be transformed into a mean estimation problem via a decomposition of the transition kernel. By estimating the mean of the auxiliary distribution, the reverse diffusion process yields a novel posterior sampling algorithm that differs from traditional gradient-based Markov chain Monte Carlo (MCMC) methods. We provide a convergence analysis in total variation distance and show that the proposed algorithm depends less on isoperimetry than conventional MCMC techniques, which explains its superior performance for high-dimensional sampling with error tolerance. Our analytical framework offers fresh perspectives on the complexity of score estimation at various time points, as characterized by the properties of the auxiliary distribution.
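To make the reduction from score estimation to mean estimation concrete, assume an Ornstein-Uhlenbeck forward process x_t = e^{-t} x_0 + sqrt(1 - e^{-2t}) z. Then grad log p_t(x) = (e^{-t} E[x_0 | x_t = x] - x) / (1 - e^{-2t}), and we take the conditional law of x_0 given x_t = x to play the role of the auxiliary distribution. The sketch below estimates that conditional mean by self-normalized importance sampling. This estimator is an illustrative substitute, not the paper's, and the function name `ou_score_via_mean` is hypothetical.

```python
import numpy as np

def ou_score_via_mean(x, t, log_target, n_samples=200_000, seed=0):
    """Estimate grad log p_t(x) for x_t = e^{-t} x_0 + sqrt(1 - e^{-2t}) z.

    Uses grad log p_t(x) = (e^{-t} E[x_0 | x_t = x] - x) / (1 - e^{-2t}).
    The conditional mean is estimated by self-normalized importance sampling
    with proposal N(e^{t} x, (e^{2t} - 1) I), the Gaussian factor in x_0.
    log_target maps an (n, d) array to (n,) unnormalized log-densities.
    """
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    scale = np.sqrt(np.expm1(2 * t))                 # sqrt(e^{2t} - 1)
    x0 = np.exp(t) * x + scale * rng.standard_normal((n_samples, x.size))
    log_w = log_target(x0)
    w = np.exp(log_w - log_w.max())                  # stabilized weights
    mean = (w[:, None] * x0).sum(axis=0) / w.sum()
    return (np.exp(-t) * mean - x) / (-np.expm1(-2 * t))
```

For a standard Gaussian target the exact score is -x at every t, which gives a quick sanity check.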

Moments · Estimation/Estimator · Functional · Feasible · Knowledge

July 5, 2023

Non-Gaussian Bayesian Filtering by Density Parametrization Using Power Moments

Guangyu Wu, Anders Lindquist

from arXiv, 15 pages, 7 figures

Non-Gaussian Bayesian filtering is a core problem in stochastic filtering. Its difficulty lies in parameterizing the state estimates, which existing methods do not handle well. We propose to use power moments to obtain a parameterization. Unlike existing parametric estimation methods, our proposed algorithm does not require prior knowledge about the state to be estimated, e.g., the number of modes or the feasible classes of functions. Moreover, unlike existing nonparametric Bayesian filters such as the particle filter, it does not need to store a large number of parameters during filtering. The parameters of the proposed parametrization can also be determined by a convex optimization scheme with moment constraints, whose solution is proved to exist and be unique. A necessary and sufficient condition for all the power moments of the density estimate to exist and be finite is provided. The errors of the power moments are analyzed for both light-tailed and heavy-tailed density estimates. Error upper bounds of the density estimate for one-step prediction are proposed. Simulation results on different types of state density functions, including heavy-tailed densities, are given to validate the proposed algorithm.
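One way to make "a convex optimization scheme with moment constraints" concrete is to assume an exponential-polynomial (maximum-entropy) family p(x) proportional to exp(sum_k lambda_k x^k), which may differ from the paper's parametrization. The fit then minimizes the convex dual log Z(lambda) - lambda . m, whose gradient is the moment mismatch. The sketch below evaluates integrals on a uniform grid and takes damped Newton steps. The function name `fit_exp_poly_density` and the grid discretization are assumptions.

```python
import numpy as np

def fit_exp_poly_density(moments, grid, max_iter=100, tol=1e-10):
    """Fit p(x) proportional to exp(sum_k lam_k x^k), k = 1..K, on a uniform grid
    so that its first K power moments match `moments`.

    Minimizes the convex dual log Z(lam) - lam . m by damped Newton: the
    gradient is E_lam[x^k] - m_k and the Hessian is Cov_lam(x^j, x^k).
    """
    m = np.asarray(moments, dtype=float)
    K = m.size
    X = np.vstack([grid ** k for k in range(1, K + 1)])  # shape (K, G)
    dx = grid[1] - grid[0]

    def dual(lam):
        s = lam @ X
        return s.max() + np.log(np.exp(s - s.max()).sum() * dx) - lam @ m

    lam = np.zeros(K)
    for _ in range(max_iter):
        s = lam @ X
        p = np.exp(s - s.max())
        p /= p.sum()                                   # probabilities on the grid
        mu = X @ p                                     # current power moments
        g = mu - m
        if np.linalg.norm(g) < tol:
            break
        H = (X * p) @ X.T - np.outer(mu, mu)
        step = np.linalg.solve(H, g)
        t, f0 = 1.0, dual(lam)
        # Backtracking (Armijo) line search keeps the Newton iteration stable.
        while dual(lam - t * step) > f0 - 1e-4 * t * (g @ step) and t > 1e-10:
            t *= 0.5
        lam = lam - t * step
    s = lam @ X
    p = np.exp(s - s.max())
    return lam, p / (p.sum() * dx)                     # density values on the grid
```

With moments `[0.0, 1.0]` on a grid over [-8, 8], the fit recovers the standard Gaussian, i.e. lambda close to `[0, -0.5]`.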

Markov · SGD · Markov chain · Invariant · Stochastic gradient descent

July 4, 2023

Convergence and concentration properties of constant step-size SGD through Markov chains

Ibrahim Merad, Stéphane Gaïffas

We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD) and study its properties through the prism of Markov chains. We show that, for unbiased gradient estimates with mildly controlled variance, the iteration converges to an invariant distribution in total variation distance. We also establish this convergence in Wasserstein-2 distance in a more general setting compared to previous work. Thanks to the invariance property of the limit distribution, our analysis shows that the latter inherits sub-Gaussian or sub-exponential concentration properties when these hold true for the gradient. This allows the derivation of high-confidence bounds for the final estimate. Finally, under such conditions in the linear case, we obtain a dimension-free deviation bound for the Polyak-Ruppert average of a tail sequence. All our results are non-asymptotic and their consequences are discussed through a few applications.
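In the linear one-dimensional case the invariant distribution can be written down explicitly, which gives a simple check of the Markov-chain view. The sketch assumes f(x) = a x^2 / 2 and additive Gaussian gradient noise with variance sigma^2, a hypothetical sub-Gaussian example. The iteration is then an AR(1) chain whose invariant law has variance step * sigma^2 / (a (2 - step * a)). The function name `constant_step_sgd` is hypothetical.

```python
import numpy as np

def constant_step_sgd(a, sigma, step, n_iters, x0=0.0, burn_in=1000, seed=0):
    """Constant step-size SGD on f(x) = a x^2 / 2 with gradient noise N(0, sigma^2).

    x <- x - step * (a x + noise) is a linear Markov chain whose invariant law
    is Gaussian with variance step * sigma^2 / (a * (2 - step * a)).
    Returns the post-burn-in iterates and their Polyak-Ruppert average.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=n_iters)
    x = x0
    trace = np.empty(n_iters)
    for k in range(n_iters):
        x = x - step * (a * x + noise[k])   # unbiased noisy gradient step
        trace[k] = x
    tail = trace[burn_in:]
    return tail, tail.mean()
```

With a = 1, sigma = 1 and step = 0.1, the empirical variance of the iterates should be close to 0.1 / 1.9 (about 0.053), and the Polyak-Ruppert average close to the minimizer 0.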

Approximation · Optimizer · Noise · Statistic · Variance

July 4, 2023

Accelerated stochastic approximation with state-dependent noise

Sasila Ilandarideva, Anatoli Juditsky, Guanghui Lan, Tianjiao Li

We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting in which the variance of noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines, stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE), which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, the corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of the SGE to statistical estimation problems under heavy-tailed noise and discontinuous score functions. We also discuss the application of the SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on simulation experiments that illustrate the numerical performance of our proposed algorithms in high-dimensional settings.
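Below is a Euclidean sketch of an accelerated stochastic approximation loop, written in a common three-sequence form with weights a_t = 2/(t + 1) and step t/(2L). The paper's SAGD and SGE are non-Euclidean and come with their own step-size and mini-batch rules, which are not reproduced here. All names are hypothetical. In consistent least squares (b = A x*), mini-batch gradient noise vanishes at the solution and grows with the distance to it, which is a simple instance of state-dependent noise.

```python
import numpy as np

def accelerated_sa(grad_batch, n, x0, L, n_iters, batch_size, seed=0):
    """Three-sequence accelerated gradient with mini-batch gradients:
        y_t    = (1 - a_t) xbar_{t-1} + a_t z_{t-1}
        z_t    = z_{t-1} - (t / (2 L)) * G(y_t)
        xbar_t = (1 - a_t) xbar_{t-1} + a_t z_t,   a_t = 2 / (t + 1).
    grad_batch(x, idx) returns the mini-batch gradient on sample indices idx.
    With full batches this is a standard accelerated method for L-smooth
    convex objectives; with small batches the steps may need to be reduced.
    """
    rng = np.random.default_rng(seed)
    z = np.array(x0, dtype=float)
    xbar = z.copy()
    for t in range(1, n_iters + 1):
        a = 2.0 / (t + 1)
        y = (1 - a) * xbar + a * z
        idx = rng.choice(n, size=batch_size, replace=False)
        z = z - t / (2.0 * L) * grad_batch(y, idx)
        xbar = (1 - a) * xbar + a * z
    return xbar
```

With full batches (`batch_size = n`) this reduces to a deterministic accelerated method. With small batches the step sizes may need to be reduced to remain stable.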