
Wasserstein gradient flows on probability measures have found a host of applications in various optimization problems. They typically arise as the continuum limit of exchangeable particle systems evolving by some mean-field interaction involving a gradient-type potential. However, in many problems, such as in multi-layer neural networks, the so-called particles are edge weights on large graphs whose nodes are exchangeable. Such large graphs are known to converge to continuum limits called graphons as their size grows to infinity. We show that the Euclidean gradient flow of a suitable function of the edge weights converges to a novel continuum limit given by a curve on the space of graphons that can be appropriately described as a gradient flow or, more technically, a curve of maximal slope. Several natural functions on graphons, such as homomorphism functions and the scalar entropy, are covered by our set-up, and these examples are worked out in detail.
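
Before passing to the limit, the pre-limit dynamics is concrete enough to sketch. The following toy (all names, the $n^2$ rate factor, and the step size are illustrative assumptions, not the paper's construction) runs Euclidean gradient descent on the edge weights of an $n$-node weighted graph so as to decrease the triangle homomorphism density $t(K_3, A) = \operatorname{tr}(A^3)/n^3$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Euclidean gradient descent on the edge weights of an n-node weighted
# graph, decreasing the triangle homomorphism density
#   t(K3, A) = trace(A^3) / n^3.
# Toy pre-limit dynamics; the paper concerns the n -> infinity graphon
# limit of flows of this kind.
n, steps = 100, 200
eta = 0.05 * n**2        # assumed n^2 rate: each edge's gradient is O(1/n^2)
A = rng.random((n, n))
A = (A + A.T) / 2        # symmetric edge weights in [0, 1]
np.fill_diagonal(A, 0.0)

for _ in range(steps):
    grad = 3 * (A @ A) / n**3              # d/dA_ij of trace(A^3)/n^3
    A = np.clip(A - eta * grad, 0.0, 1.0)  # stay in the space of edge weights
    np.fill_diagonal(A, 0.0)

print("triangle density:", np.trace(A @ A @ A) / n**3)
```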

Related content

This paper proposes a regularization of the Monge-Amp\`ere equation in planar convex domains through uniformly elliptic Hamilton-Jacobi-Bellman equations. The regularized problem possesses a unique strong solution $u_\varepsilon$ and is amenable to discretization by finite elements. This work establishes locally uniform convergence of $u_\varepsilon$ to the convex Alexandrov solution $u$ to the Monge-Amp\`ere equation as the regularization parameter $\varepsilon$ approaches $0$. A mixed finite element method for the approximation of $u_\varepsilon$ is proposed, and the regularized finite element scheme is shown to be locally uniformly convergent. Numerical experiments provide empirical evidence for the efficient approximation of singular solutions $u$.
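
The classical identity behind such Hamilton-Jacobi-Bellman reformulations is worth recording here; it goes back to Krylov, though the paper's specific regularization may choose its control set differently. For $u$ convex, the arithmetic-geometric mean inequality applied to the eigenvalues of $A D^2 u$ yields

$$\big(\det D^2 u\big)^{1/n} \;=\; \frac{1}{n}\,\inf_{A \succ 0,\ \det A = 1} \operatorname{tr}\!\big(A\, D^2 u\big),$$

with the infimum attained at $A = (\det D^2 u)^{1/n} (D^2 u)^{-1}$. The Monge-Amp\`ere equation $\det D^2 u = f$ thus becomes the Bellman equation $\inf_{\det A = 1}\{\operatorname{tr}(A D^2 u) - n f^{1/n}\} = 0$, and restricting the controls $A$ to a uniformly elliptic subset produces uniformly elliptic HJB approximations of the type described above.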

We consider the evolution of curve networks in two dimensions (2d) and surface clusters in three dimensions (3d). The motion of the interfaces is described by surface diffusion, with boundary conditions at the triple junction points/lines, where three interfaces meet, and at the boundary points/lines, where an interface meets a fixed planar boundary. We propose a parametric finite element method based on a suitable variational formulation. The constructed method is semi-implicit and can be shown to satisfy volume conservation of each enclosed bubble as well as unconditional energy stability, thus preserving the two fundamental geometric structures of the flow. Moreover, the method has very good properties with respect to the distribution of mesh points, so that no mesh smoothing or regularization technique is required. A generalization of the introduced scheme to the case of anisotropic surface energies and non-neutral external boundaries is also considered. Numerical results are presented for the evolution of two-dimensional curve networks and three-dimensional surface clusters in the cases of both isotropic and anisotropic surface energies.
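
For contrast with the structure-preserving scheme just described, below is a naive explicit finite-difference sketch of the isotropic law $V = -\kappa_{ss}$ for a single closed curve without triple junctions; the discrete formulas and the tiny time step are illustrative assumptions, and such a scheme has none of the conservation, stability, or mesh-distribution properties of the proposed method.

```python
import numpy as np

def surface_diffusion_step(x, dt):
    """One explicit step of V = -kappa_ss for a closed polygon x of shape (n, 2).
    Naive finite-difference toy; explicit stepping needs dt = O(h^4)."""
    xp = np.roll(x, -1, axis=0)                    # x_{i+1}
    xm = np.roll(x, 1, axis=0)                     # x_{i-1}
    hp = np.linalg.norm(xp - x, axis=1)            # forward edge lengths
    hm = np.linalg.norm(x - xm, axis=1)            # backward edge lengths
    # curvature vector: second arclength difference of position
    kvec = 2 * ((xp - x) / hp[:, None] - (x - xm) / hm[:, None]) / (hp + hm)[:, None]
    tang = xp - xm
    tang /= np.linalg.norm(tang, axis=1, keepdims=True)
    normal = np.stack([-tang[:, 1], tang[:, 0]], axis=1)
    kappa = np.sum(kvec * normal, axis=1)          # signed scalar curvature
    kp, km = np.roll(kappa, -1), np.roll(kappa, 1)
    kappa_ss = 2 * ((kp - kappa) / hp - (kappa - km) / hm) / (hp + hm)
    return x - dt * kappa_ss[:, None] * normal     # normal velocity -kappa_ss

# usage: an ellipse slowly relaxes toward the area-preserving circle
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
x = np.stack([1.5 * np.cos(theta), 0.7 * np.sin(theta)], axis=1)
for _ in range(10_000):
    x = surface_diffusion_step(x, dt=1e-8)
```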

In this paper, we study smooth stochastic multi-level composition optimization problems, where the objective function is a nested composition of $T$ functions. We assume access to noisy evaluations of the functions and their gradients, through a stochastic first-order oracle. For solving this class of problems, we propose two algorithms using moving-average stochastic estimates, and analyze their convergence to an $\epsilon$-stationary point of the problem. We show that the first algorithm, which is a generalization of \cite{GhaRuswan20} to the $T$-level case, can achieve a sample complexity of $\mathcal{O}(1/\epsilon^6)$ by using mini-batches of samples in each iteration. By modifying this algorithm using linearized stochastic estimates of the function values, we improve the sample complexity to $\mathcal{O}(1/\epsilon^4)$. This modification not only removes the requirement of having a mini-batch of samples in each iteration, but also makes the algorithm parameter-free and easy to implement. To the best of our knowledge, this is the first time that such an online algorithm designed for the (un)constrained multi-level setting obtains the same sample complexity as the smooth single-level setting, under standard assumptions (unbiasedness and boundedness of the second moments) on the stochastic first-order oracle.
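
A minimal sketch of the moving-average mechanism for the two-level case $T = 2$ may help fix ideas: minimize $f_1(f_2(x))$ given noisy oracles, tracking the inner value with an exponential average. All constants and the quadratic test functions are illustrative assumptions; this shows the tracking idea only, not the paper's algorithms (no mini-batches, no linearization).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-level problem: f2(x) = A x (inner map), f1(y) = 0.5 ||y - b||^2.
# Stochastic oracles return noisy values, Jacobians, and gradients.
A = rng.standard_normal((5, 10))
b = rng.standard_normal(5)

def inner(x):        # noisy evaluation of f2
    return A @ x + 0.1 * rng.standard_normal(5)

def inner_jac(x):    # noisy Jacobian of f2
    return A + 0.1 * rng.standard_normal(A.shape)

def outer_grad(y):   # noisy gradient of f1
    return (y - b) + 0.1 * rng.standard_normal(5)

x = np.zeros(10)
u = inner(x)              # moving-average tracker of f2(x)
beta, tau = 0.02, 0.2     # step size and averaging weight (illustrative)

for _ in range(3000):
    u = (1 - tau) * u + tau * inner(x)   # track the inner function value
    g = inner_jac(x).T @ outer_grad(u)   # composed gradient estimate
    x -= beta * g

print("final objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```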

In this paper, we study the convergence properties of off-policy policy improvement algorithms with state-action density ratio correction in the function approximation setting, where the objective function is formulated as a max-max-min optimization problem. We characterize the bias of the learning objective and present two strategies with finite-time convergence guarantees. In our first strategy, we present algorithm P-SREDA with convergence rate $O(\epsilon^{-3})$, whose dependency on $\epsilon$ is optimal. In our second strategy, we propose a new off-policy actor-critic style algorithm named O-SPIM. We prove that O-SPIM converges to a stationary point with total complexity $O(\epsilon^{-4})$, which matches the convergence rate of some recent actor-critic algorithms in the on-policy setting.

In micro-fluidics, not only does capillarity dominate but thermal fluctuations also become important. On the level of the lubrication approximation, this leads to a quasi-linear fourth-order parabolic equation for the film height $h$ driven by space-time white noise. The gradient flow structure of its deterministic counterpart, the thin-film equation, which encodes the balance between driving capillary and limiting viscous forces, provides the guidance for the thermodynamically consistent introduction of fluctuations. We follow this route on the level of a spatial discretization of the gradient flow structure. Starting from an energetically conformal finite-element (FE) discretization, we point out that the numerical mobility function introduced by Gr\"un and Rumpf can be interpreted as a discretization of the metric tensor in the sense of a mixed FE method with lumping. While this discretization was devised in order to preserve the so-called entropy estimate, we use this to show that the resulting high-dimensional stochastic differential equation (SDE) preserves pathwise and pointwise strict positivity, at least in the case of the physically relevant mobility function arising from the no-slip boundary condition. As a consequence, this discretization gives rise to a consistent invariant measure, namely a discretization of the Brownian excursion (up to the volume constraint), and thus features an entropic repulsion. The price to pay over more naive discretizations is that when writing the SDE in It\^o's form, which is the basis for the Euler-Maruyama time discretization, a correction term appears. To conclude, we perform various numerical experiments to compare the behavior of our discretization to that of a more naive finite-difference discretization of the equation.
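
For reference, the time discretization mentioned at the end is the generic Euler-Maruyama scheme for an SDE in It\^o form, $dX = b(X)\,dt + \sigma(X)\,dW$, sketched below with diagonal noise; the drift of the paper's discretized thin-film SDE carries the It\^o correction term discussed above, which is not reproduced here.

```python
import numpy as np

def euler_maruyama(drift, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama for dX = drift(X) dt + sigma(X) dW (diagonal noise).
    Generic sketch, not the thin-film SDE of the paper."""
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        dw = np.sqrt(dt) * rng.standard_normal(x.shape)  # Brownian increment
        x = x + drift(x) * dt + sigma(x) * dw
        path.append(x.copy())
    return np.array(path)

# usage: an Ornstein-Uhlenbeck process as a smoke test
rng = np.random.default_rng(0)
path = euler_maruyama(lambda x: -x, lambda x: 0.5, np.ones(3), 1e-3, 5000, rng)
```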

In this paper, we study the long-time convergence and uniform strong propagation of chaos for a class of nonlinear Markov chains for Markov chain Monte Carlo (MCMC). Our approach is quite simple, making use of recent contraction estimates for linear Markov kernels and basic tools from Markov chain theory and analysis. Moreover, the same proof strategy applies to both the long-time convergence and the propagation of chaos. We also show, via some experiments, that these nonlinear MCMC techniques are viable for use in real-world high-dimensional inference, such as Bayesian neural networks.
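
As a toy illustration of what a nonlinear Markov chain and its particle approximation look like (purely generic and hypothetical; not the kernels or contraction arguments of the paper), one can let the proposal scale of a random-walk Metropolis step depend on the law of the chain, approximated by the empirical spread of $N$ interacting particles:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(x):          # toy target: standard Gaussian
    return -0.5 * x**2

# The kernel is nonlinear: the proposal scale depends on the chain's law,
# here replaced by the empirical standard deviation of the particle cloud.
N, n_iter = 500, 1000
X = 5.0 * rng.standard_normal(N)                # over-dispersed initialization

for _ in range(n_iter):
    scale = 2.38 * X.std()                      # law-dependent proposal scale
    prop = X + scale * rng.standard_normal(N)   # symmetric random-walk proposal
    accept = np.log(rng.random(N)) < log_target(prop) - log_target(X)
    X = np.where(accept, prop, X)

print("empirical mean/std:", X.mean(), X.std())  # approx. 0 and 1
```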

Physical systems are usually modeled by differential equations, but solving these differential equations analytically is often intractable. Instead, the differential equations can be solved numerically by discretization in a finite computational domain. The discretized equation is reduced to a large linear system, whose solution is typically found using an iterative solver. We start with an initial guess, $x_0$, and iterate the algorithm to obtain a sequence of solution vectors, $x_m$. The iterative algorithm is said to converge to the solution $x$ if and only if $x_m$ converges to $x$. Accuracy of the numerical solutions is important, especially in the design of safety-critical systems such as airplanes, cars, or nuclear power plants. It is therefore important to formally guarantee that the iterative solvers converge to the "true" solution of the original differential equation. In this paper, we first formalize the necessary and sufficient conditions for iterative convergence in the Coq proof assistant. We then extend this result to two classical iterative methods: Gauss-Seidel iteration and Jacobi iteration. We formalize conditions for the convergence of the Gauss-Seidel method, based on positive definiteness of the iteration matrix. We then formally state conditions for the convergence of Jacobi iteration and instantiate them with an example to demonstrate convergence of the iterative solutions to the direct solution of the linear system. We leverage recent developments in linear algebra in the Coq mathcomp library for our formalization.
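
A plain numerical counterpart of the formalized statements is sketched below (matrix, tolerance, and names are illustrative; the paper's contribution is the machine-checked Coq proof, not the computation). Jacobi iteration converges if and only if the spectral radius of its iteration matrix is below one, which holds in particular for strictly diagonally dominant systems:

```python
import numpy as np

def jacobi(A, b, x0, tol=1e-10, max_iter=10_000):
    """Jacobi iteration x_{m+1} = D^{-1} (b - (A - D) x_m)."""
    D = np.diag(A)              # diagonal part of A
    R = A - np.diag(D)          # off-diagonal remainder
    x = np.array(x0, dtype=float)
    for m in range(max_iter):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x) < tol:
            return x_new, m
        x = x_new
    return x, max_iter

# strictly diagonally dominant, so convergence is guaranteed
A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x, iters = jacobi(A, b, np.zeros(3))
print(iters, np.linalg.norm(A @ x - b))
```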

In this paper, we analyze the convergence properties of projected non-stationary block iterative methods (P-BIM) aimed at finding a constrained solution to large linear, usually both noisy and ill-conditioned, systems of equations. We split the error of the $k$th iterate into a noise error and an iteration error, and consider each error separately. The iteration error is treated for a more general algorithm, also suited for solving split feasibility problems in Hilbert space; the results for P-BIM come out as a special case. The algorithmic step involves projecting onto closed convex sets. When these sets are polyhedral and of finite dimension, it is shown that the algorithm converges linearly. We further derive an upper bound for the noise error of P-BIM. Based on this bound, we suggest a new strategy for choosing relaxation parameters, which assists in speeding up the reconstruction process and improving the quality of the obtained images. The relaxation parameters may depend on the noise. The performance of the suggested strategy is demonstrated by examples taken from the field of image reconstruction from projections.
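
A minimal member of this projected-iterative family is sketched below: the projected Landweber iteration $x_{k+1} = P_C\big(x_k + \lambda_k A^T(b - Ax_k)\big)$ with a nonnegativity projection, as is common in image reconstruction. The problem data and relaxation choice are illustrative assumptions; P-BIM's non-stationary block sweeps and the paper's noise-adapted relaxation strategy are not reproduced.

```python
import numpy as np

def projected_landweber(A, b, lam, n_iter, project):
    # x_{k+1} = P_C(x_k + lam(k) * A^T (b - A x_k))
    x = np.zeros(A.shape[1])
    for k in range(n_iter):
        x = project(x + lam(k) * A.T @ (b - A @ x))
    return x

# nonnegativity constraint P_C, common in image reconstruction
nonneg = lambda v: np.maximum(v, 0.0)

rng = np.random.default_rng(0)
A = rng.random((80, 50))
x_true = np.abs(rng.standard_normal(50))
b = A @ x_true + 0.01 * rng.standard_normal(80)   # noisy data

# constant relaxation below 2 / ||A||^2 keeps the iteration convergent
lam0 = 1.0 / np.linalg.norm(A, 2) ** 2
x = projected_landweber(A, b, lambda k: lam0, 500, nonneg)
print("residual:", np.linalg.norm(A @ x - b))
```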

Stochastic Gradient Descent (SGD) is the workhorse algorithm of deep learning technology. At each step of the training phase, a mini-batch of samples is drawn from the training dataset and the weights of the neural network are adjusted according to the performance on this specific subset of examples. The mini-batch sampling procedure introduces stochastic dynamics into the gradient descent, with a non-trivial state-dependent noise. We characterize the stochasticity of SGD and of a recently introduced variant, \emph{persistent} SGD, in a prototypical neural network model. In the under-parametrized regime, where the final training error is positive, the SGD dynamics reaches a stationary state and we define an effective temperature from the fluctuation-dissipation theorem, computed from dynamical mean-field theory. We use the effective temperature to quantify the magnitude of the SGD noise as a function of the problem parameters. In the over-parametrized regime, where the training error vanishes, we measure the noise magnitude of SGD by computing the average distance between two replicas of the system with the same initialization and two different realizations of the SGD noise. We find that the two noise measures behave similarly as functions of the problem parameters. Moreover, we observe that noisier algorithms lead to wider decision boundaries of the corresponding constraint satisfaction problem.
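
The replica-based noise measure admits a small illustration of the protocol (on a toy least-squares model, an assumption; not the prototypical network or the DMFT computation of the paper): train two copies from the same initialization with independent mini-batch sequences and record their distance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Teacher-generated least-squares data (toy stand-in for the model).
n, d, bs, lr, steps = 1000, 50, 10, 0.01, 2000
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)

def sgd_run(seed):
    r = np.random.default_rng(seed)       # replica-specific batch sequence
    w = np.zeros(d)                       # identical initialization
    for _ in range(steps):
        idx = r.integers(0, n, bs)        # fresh mini-batch
        w -= lr * X[idx].T @ (X[idx] @ w - y[idx]) / bs
    return w

w1, w2 = sgd_run(1), sgd_run(2)           # same init, different SGD noise
print("replica distance:", np.linalg.norm(w1 - w2))
```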

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
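
The scaling dichotomy can be illustrated even with untrained iid Gaussian weights (an assumption; the paper measures trained weights): residual increments of size $L^{-1}$ mimic an ODE limit, while iid increments of size $L^{-1/2}$ accumulate diffusively, as in an SDE limit.

```python
import numpy as np

rng = np.random.default_rng(0)

def deep_limit(L, alpha, d=32):
    """Forward pass h_{l+1} = h_l + L^{-alpha} tanh(W_l h_l) with iid W_l.
    alpha = 1.0 mimics an ODE scaling; alpha = 0.5 a diffusive (SDE) one."""
    h = np.ones(d) / np.sqrt(d)
    for _ in range(L):
        W = rng.standard_normal((d, d)) / np.sqrt(d)
        h = h + L**(-alpha) * np.tanh(W @ h)
    return np.linalg.norm(h)

for L in (8, 64, 512):
    print(L, deep_limit(L, 1.0), deep_limit(L, 0.5))
```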
