
from arXiv, a short version of this work appeared at the ICML 2022 Workshop on Continuous Time Methods for Machine Learning under the title "The Gap Between Continuous and Discrete Gradient Descent"

When training neural networks, it has been widely observed that a large step size is essential in stochastic gradient descent (SGD) for obtaining superior models. However, the effect of large step sizes on the success of SGD is not well understood theoretically. Several previous works have attributed this success to the stochastic noise present in SGD. However, we show through a novel set of experiments that the stochastic noise is not sufficient to explain good non-convex training, and that instead the effect of a large learning rate itself is essential for obtaining the best performance. We demonstrate the same effects in the noiseless case as well, i.e., for full-batch GD. We formally prove that GD with a large step size, on certain non-convex function classes, follows a different trajectory than GD with a small step size, which can lead to convergence to a global minimum instead of a local one. Our settings provide a framework for future analysis that allows comparing algorithms based on behaviors that cannot be observed in the traditional settings.
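The step-size effect described in this abstract can be illustrated with a small sketch. The objective below is a hypothetical toy function chosen for this illustration (a wide quadratic bowl plus a narrow Gaussian well); it is not the function class analyzed in the paper, and the step sizes 0.01 and 0.1 are likewise assumptions.

```python
import math

# Toy objective (an assumption for illustration, not taken from the paper):
# a wide quadratic bowl with its global minimum near x = 0, plus a narrow
# Gaussian well that creates a sharp local minimum near x = 2.
def f(x):
    return 0.5 * x * x - math.exp(-((x - 2.0) ** 2) / 0.02)

def grad_f(x):
    # Derivative of f: the bowl contributes x, the well contributes
    # 100 * (x - 2) * exp(-(x - 2)^2 / 0.02).
    return x + 100.0 * (x - 2.0) * math.exp(-((x - 2.0) ** 2) / 0.02)

def gradient_descent(x0, step_size, n_steps=500):
    """Plain full-batch gradient descent with a constant step size."""
    x = x0
    for _ in range(n_steps):
        x -= step_size * grad_f(x)
    return x

# Both runs start inside the narrow well.
x_small = gradient_descent(2.0, step_size=0.01)  # stays in the sharp local minimum
x_large = gradient_descent(2.0, step_size=0.1)   # too large for the well, so it leaves

print("small step:", x_small, "f =", f(x_small))
print("large step:", x_large, "f =", f(x_large))
```

In this toy setting the curvature inside the well is roughly 100, so a step size of 0.1 is unstable there while remaining stable in the wide bowl. That is one simple mechanism by which a large step can end at a different, lower minimum than a small step.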

Related content


MCMC · Deviation · Large deviation principle · Markov chain · Markov

April 5, 2023

A large deviation principle for the empirical measures of Metropolis-Hastings chains

Federica Milinanni, Pierre Nyquist

from arXiv, 31 pages

To sample from a given target distribution, Markov chain Monte Carlo (MCMC) sampling relies on constructing an ergodic Markov chain with the target distribution as its invariant measure. For any MCMC method, an important question is how to evaluate its efficiency. One approach is to consider the associated empirical measure and how fast it converges to the stationary distribution of the underlying Markov process. Recently, this question has been considered from the perspective of large deviation theory, for different types of MCMC methods, including, e.g., non-reversible Metropolis-Hastings on a finite state space, non-reversible Langevin samplers, the zig-zag sampler, and parallel tempering. This approach, based on large deviations, has proven successful in analysing existing methods and designing new, efficient ones. However, for the Metropolis-Hastings algorithm on more general state spaces, the workhorse of MCMC sampling, the same techniques have not been available for analysing performance, as the underlying Markov chain dynamics violate the conditions used to prove existing large deviation results for empirical measures of a Markov chain. This also extends to methods built on the same idea as Metropolis-Hastings, such as the Metropolis-Adjusted Langevin Method or ABC-MCMC. In this paper, we take the first steps towards such a large-deviations-based analysis of Metropolis-Hastings-like methods, by proving a large deviation principle for the empirical measures of Metropolis-Hastings chains. In addition, we characterize the rate function and its properties in terms of the acceptance and rejection parts of the Metropolis-Hastings dynamics.
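For readers unfamiliar with the objects named in this abstract, the following is a minimal sketch of a random-walk Metropolis-Hastings chain and of its empirical measure evaluated on a set. The standard normal target, the Gaussian proposal, and the helper names are assumptions made for this illustration; nothing here reproduces the paper's large deviation analysis.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_steps, proposal_scale=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, proposal_scale)
        log_ratio = log_target(proposal) - log_target(x)
        # Acceptance part: move to the proposal with probability min(1, ratio).
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = proposal
            accepted += 1
        # Rejection part: otherwise the chain stays where it is.
        samples.append(x)
    return samples, accepted / n_steps

def empirical_measure(samples, lower, upper):
    """Mass that the empirical measure of the chain assigns to [lower, upper]."""
    return sum(1 for s in samples if lower <= s <= upper) / len(samples)

# Assumed target: standard normal, up to an additive constant in the log density.
def log_std_normal(x):
    return -0.5 * x * x

samples, acceptance_rate = metropolis_hastings(log_std_normal, x0=0.0, n_steps=20000)
mass = empirical_measure(samples, -1.0, 1.0)
print("acceptance rate:", acceptance_rate)
print("empirical mass of [-1, 1]:", mass)
```

Under the standard normal target, the interval [-1, 1] has probability of about 0.683, so the empirical mass should approach that value as the chain grows. How fast empirical measures of such chains concentrate is the kind of question the large deviation principle addresses.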

Proximal gradient descent · Denoising · Gradient descent · Inverse problem · Regularization

April 5, 2023

A relaxed proximal gradient descent algorithm for convergent plug-and-play with proximal denoiser

Samuel Hurault, Antonin Chambolle, Arthur Leclaire, Nicolas Papadakis

This paper presents a new convergent Plug-and-Play (PnP) algorithm. PnP methods are efficient iterative algorithms for solving image inverse problems formulated as the minimization of the sum of a data-fidelity term and a regularization term. PnP methods perform regularization by plugging a pre-trained denoiser in a proximal algorithm, such as Proximal Gradient Descent (PGD). To ensure convergence of PnP schemes, many works study specific parametrizations of deep denoisers. However, existing results require either unverifiable or suboptimal hypotheses on the denoiser, or assume restrictive conditions on the parameters of the inverse problem. Observing that these limitations can be due to the proximal algorithm in use, we study a relaxed version of the PGD algorithm for minimizing the sum of a convex function and a weakly convex one. When plugged with a relaxed proximal denoiser, we show that the proposed PnP-$\alpha$PGD algorithm converges for a wider range of regularization parameters, thus allowing more accurate image restoration.

Step size · Mapping · Machine translation · Lipschitz continuity · Lyapunov

April 3, 2023

On the Dynamics of First and Second Order GeCo and gBBKS Schemes

Thomas Izgin, Stefan Kopecz, Angela Martiradonna, Andreas Meister

from arXiv, 31 pages, 6 figures

In this paper we investigate the stability properties of the so-called gBBKS and GeCo methods, which belong to the class of nonstandard schemes and preserve the positivity as well as all linear invariants of the underlying system of ordinary differential equations for any step size. A stability investigation for these methods, which are outside the class of general linear methods, is challenging since the iterates are always generated by a nonlinear map even for linear problems. Recently, a stability theorem was derived presenting criteria for understanding such schemes. For the analysis, the schemes are applied to general linear equations and proven to be generated by $\mathcal C^1$-maps with locally Lipschitz continuous first derivatives. As a result, the above-mentioned stability theorem can be applied to investigate the Lyapunov stability of non-hyperbolic fixed points of the numerical method by analyzing the spectrum of the corresponding Jacobian of the generating map. In addition, if a fixed point is proven to be stable, the theorem guarantees the local convergence of the iterates towards it. In the case of first- and second-order gBBKS schemes, the stability domain coincides with that of the underlying Runge--Kutta method. Furthermore, while the first-order GeCo scheme converts steady states to stable fixed points for all step sizes and all linear test problems of finite size, the second-order GeCo scheme has a bounded stability region for the considered test problems. Finally, all theoretical predictions from the stability analysis are validated numerically.

Sparsity · Sparse coding · Representation · Supervision · Discrete

April 3, 2023

Learning Sparsity of Representations with Discrete Latent Variables

Zhao Xu, Daniel Onoro Rubio, Giuseppe Serra, Mathias Niepert

Deep latent generative models have attracted increasing attention owing to their capacity to combine the strengths of deep learning and probabilistic models in an elegant way. The data representations learned with these models are often continuous and dense. However, in many applications, sparse representations are expected, such as learning a sparse high-dimensional embedding of data in an unsupervised setting, and learning multi-labels from thousands of candidate tags in a supervised setting. In some scenarios, there could be a further restriction on the degree of sparsity: the number of non-zero features of a representation cannot be larger than a pre-defined threshold $L_0$. In this paper we propose a sparse deep latent generative model, SDLGM, to explicitly model the degree of sparsity and thus enable learning the sparse structure of the data under the quantified sparsity constraint. The resulting sparsity of a representation is not fixed, but adapts to the observation itself under the pre-defined restriction. In particular, we introduce for each observation $i$ an auxiliary random variable $L_i$, which models the sparsity of its representation. The sparse representations are then generated with a two-step sampling process via two Gumbel-Softmax distributions. For inference and learning, we develop an amortized variational method based on a Monte Carlo (MC) gradient estimator. The resulting sparse representations are differentiable with backpropagation. The experimental evaluation on multiple datasets for unsupervised and supervised learning problems shows the benefits of the proposed method.
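The Gumbel-Softmax distribution mentioned in this abstract can be sketched as follows. This shows only a single relaxed categorical draw in plain Python; the function name, logits, and temperature are assumptions for illustration, and the two-step SDLGM sampling process itself is not reproduced here.

```python
import math
import random

def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed one-hot vector from a Gumbel-Softmax distribution."""
    perturbed = []
    for logit in logits:
        u = max(rng.random(), 1e-12)        # keep u strictly positive
        gumbel = -math.log(-math.log(u))    # standard Gumbel(0, 1) noise
        perturbed.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(perturbed)
    exps = [math.exp(p - m) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
# Assumed class probabilities 0.7, 0.2, 0.1, expressed as logits.
logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
print(gumbel_softmax_sample(logits, temperature=0.5, rng=rng))
```

Lower temperatures push each sample closer to a one-hot vector, while the index of the largest entry is distributed according to the softmax of the logits. In an automatic-differentiation framework the same computation is differentiable with respect to the logits, which is what makes the relaxation useful for gradient-based learning.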

Metric space · Metric · Empirical risk · Probability · Estimation error

April 3, 2023

On the Concentration of the Minimizers of Empirical Risks

Paul Escande

Obtaining guarantees on the convergence of the minimizers of empirical risks to those of the true risk is a fundamental matter in statistical learning. Instead of deriving guarantees on the usual estimation error, the goal of this paper is to provide concentration inequalities on the distance between the sets of minimizers of the risks for a broad spectrum of estimation problems. In particular, the risks are defined on metric spaces through probability measures that are also supported on metric spaces. Particular attention is therefore given to including unbounded spaces and non-convex cost functions that might also be unbounded. This work identifies a set of assumptions describing a regime that seems to govern the concentration in many estimation problems, where the empirical minimizers are stable. This stability can then be leveraged to prove parametric concentration rates in probability and in expectation. The assumptions are verified, and the bounds showcased, on a selection of estimation problems such as barycenters on metric spaces with positive or negative curvature, subspaces of covariance matrices, regression problems and entropic-Wasserstein barycenters.

Stochastic approximation · Convergence · Noise · Value function · Algorithm

April 3, 2023

Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning

Rajeeva L. Karandikar, M. Vidyasagar

from arXiv, 28 pages

The stochastic approximation (SA) algorithm is a widely used probabilistic method for finding a zero or a fixed point of a vector-valued function, when only noisy measurements of the function are available. In the literature to date, one makes a distinction between "synchronous" updating, whereby every component of the current guess is updated at each time, and "asynchronous" updating, whereby only one component is updated. In this paper, we study an intermediate situation that we call "batch asynchronous stochastic approximation" (BASA), in which, at each time instant, some but not all components of the current estimated solution are updated. BASA allows the user to trade off memory requirements against time complexity. We develop a general methodology for proving that such algorithms converge to the fixed point of the map under study. These convergence proofs make use of weaker hypotheses than existing results. Specifically, existing convergence proofs require that the measurement noise is a zero-mean i.i.d. sequence or a martingale difference sequence. In the present paper, we permit biased measurements, that is, measurement noises that have nonzero conditional mean. Also, all convergence results to date assume that the stochastic step sizes satisfy a probabilistic analog of the well-known Robbins-Monro conditions. We replace this assumption by a purely deterministic condition on the irreducibility of the underlying Markov processes. As specific applications to Reinforcement Learning, we analyze the temporal difference algorithm $TD(\lambda)$ for value iteration, and the $Q$-learning algorithm for finding the optimal action-value function. In both cases, we establish the convergence of these algorithms, under milder conditions than in the existing literature.
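The batch asynchronous update pattern described in this abstract can be sketched on a toy fixed-point problem. Everything specific below is an assumption made for illustration: the contraction map g(theta)_i = 0.5 * mean(theta) + b_i, the zero-mean Gaussian noise, the uniformly random batches, and the per-component step sizes. The paper's hypotheses (including biased noise) and its proofs are not reproduced.

```python
import random

def basa_fixed_point(b, n_steps=20000, batch_size=2, noise_std=0.1, seed=0):
    """Batch asynchronous stochastic approximation on a toy contraction map.

    Assumed map: g(theta)_i = 0.5 * mean(theta) + b_i, observed with noise.
    At each time instant only `batch_size` of the components are updated.
    """
    rng = random.Random(seed)
    d = len(b)
    theta = [0.0] * d
    counts = [0] * d  # number of updates each component has received so far
    for _ in range(n_steps):
        batch = rng.sample(range(d), batch_size)  # some but not all components
        mean_theta = sum(theta) / d
        new_theta = list(theta)
        for i in batch:
            counts[i] += 1
            step = 1.0 / counts[i] ** 0.7  # assumed decaying step size
            noisy_g = 0.5 * mean_theta + b[i] + rng.gauss(0.0, noise_std)
            new_theta[i] = theta[i] + step * (noisy_g - theta[i])
        theta = new_theta
    return theta

b = [1.0, 2.0, 3.0, 4.0]
estimate = basa_fixed_point(b)
# For this map the fixed point is theta_i = mean(b) + b_i.
fixed_point = [sum(b) / len(b) + bi for bi in b]
print("estimate:   ", estimate)
print("fixed point:", fixed_point)
```

Setting `batch_size` to the full dimension recovers synchronous updating, and setting it to 1 recovers the classical asynchronous case, which is the trade-off the abstract refers to.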

Monte Carlo · Monte Carlo method · Monte Carlo estimation · Error estimation · Statistic

April 2, 2023

Convergence analysis of the Monte Carlo method for random Navier--Stokes--Fourier system

Maria Lukacova-Medvidova, Bangwei She, Yuhuan Yuan

from arXiv, 24 pages, 5 figures

In the present paper we consider the initial data, external force, viscosity coefficients, and heat conductivity coefficient as random data for the compressible Navier--Stokes--Fourier system. The Monte Carlo method, which is frequently used for the approximation of statistical moments, is combined with a suitable deterministic discretisation method in physical space and time. Under the assumption that numerical densities and temperatures are bounded in probability, we prove the convergence of random finite volume solutions to a statistical strong solution by applying genuine stochastic compactness arguments. Further, we show the convergence and error estimates for the Monte Carlo estimators of the expectation and deviation. We present several numerical results to illustrate the theoretical results.

Attack · Online optimization · Online · Algorithm · Optimality

April 1, 2023

Coordinated Defense Allocation in Reach-Avoid Scenarios with Efficient Online Optimization

Junwei Liu, Zikai Ouyang, Jiahui Yang, Hua Chen, Haibo Lu, Wei Zhang

Deriving strategies for multiple agents under adversarial scenarios poses a significant challenge in attaining both optimality and efficiency. In this paper, we propose an efficient defense strategy for cooperative defense against a group of attackers in a convex environment. The defenders aim to minimize the total number of attackers that successfully enter the target set without prior knowledge of the attackers' strategy. Our approach involves a two-scale method that decomposes the problem into coordination against a single attacker and assignment of defenders to attackers. We first develop a coordination strategy for multiple defenders against a single attacker, implementing online convex programming. This results in the maximum defense-winning region of initial joint states from which the defenders can successfully defend against a single attacker. We then propose an allocation algorithm that significantly reduces the computational effort required to solve the induced integer linear programming problem. The allocation guarantees defense performance enhancement as the game progresses. We perform various simulations to verify the efficiency of our algorithm compared to state-of-the-art approaches, including one using the Gazebo platform with the Robot Operating System.

Networking · Residual network · Scaling · Weight · Smoothing

May 25, 2021

Scaling Properties of Deep Residual Networks

Alain-Sam Cohen, Rama Cont, Alain Rossier, Renyuan Xu

from arXiv, Published at ICML 2021

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.

Estimation/Estimator · Estimation error · MoDELS · Learned · Unbiased

December 17, 2020

The Causal Learning of Retail Delinquency

Yiyan Huang, Cheuk Hang Leung, Xing Yan, Qi Wu, Nanbo Peng, Dongdong Wang, Zhixiang Huang

from arXiv, This paper was accepted and will be published in the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects, and hence the estimation error can be very large. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.