In this paper we prove convergence rates for time discretisation schemes for semi-linear stochastic evolution equations with additive or multiplicative Gaussian noise, where the leading operator $A$ is the generator of a strongly continuous semigroup $S$ on a Hilbert space $X$, and the focus is on non-parabolic problems. The main results are optimal bounds for the uniform strong error $$\mathrm{E}_{k}^{\infty} := \Big(\mathbb{E} \sup_{j\in \{0, \ldots, N_k\}} \|U(t_j) - U^j\|^p\Big)^{1/p},$$ where $p \in [2,\infty)$, $U$ is the mild solution, $U^j$ is obtained from a time discretisation scheme, $k$ is the step size, and $N_k = T/k$. Standard schemes such as the splitting/exponential Euler, implicit Euler, and Crank-Nicolson schemes are included as special cases. Under conditions on the nonlinearity and the noise we show:

- $\mathrm{E}_{k}^{\infty}\lesssim k \log(T/k)$ (linear equation, additive noise, general $S$);
- $\mathrm{E}_{k}^{\infty}\lesssim \sqrt{k} \log(T/k)$ (nonlinear equation, multiplicative noise, contractive $S$);
- $\mathrm{E}_{k}^{\infty}\lesssim k \log(T/k)$ (nonlinear wave equation, multiplicative noise).

The logarithmic factor can be removed if the splitting scheme is used with a (quasi-)contractive $S$. The obtained bounds coincide with the optimal bounds for SDEs. Most of the existing literature is concerned with bounds for the simpler pointwise strong error $$\mathrm{E}_k:=\bigg(\sup_{j\in \{0,\ldots,N_k\}}\mathbb{E} \|U(t_j) - U^{j}\|^p\bigg)^{1/p}.$$ Applications to Maxwell equations, Schr\"odinger equations, and wave equations are included. For these equations our results improve and reprove several existing results with a unified method.
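As a rough illustration of the two error notions and of the splitting/exponential Euler scheme, the following sketch estimates both errors by Monte Carlo for a scalar linear equation with additive noise; the scalar setting, the parameter values, and the use of a fine-grid run of the same scheme as reference solution are assumptions made for the example, not part of the paper.

```python
import numpy as np

# Minimal sketch (a scalar stand-in, not the infinite-dimensional setting):
# exponential/splitting Euler for dU = A U dt + dW with A < 0, used to contrast a
# Monte Carlo estimate of the pointwise strong error E_k with the uniform strong
# error E_k^infty.
rng = np.random.default_rng(0)
A, T, p = -1.0, 1.0, 2
N_coarse, ratio, M = 100, 50, 200           # coarse steps, refinement factor, MC samples
k = T / N_coarse
k_fine = k / ratio
N_fine = N_coarse * ratio

sup_of_moment = np.zeros(N_coarse + 1)      # accumulates E |U(t_j) - U^j|^p for each j
moment_of_sup = 0.0                         # accumulates E sup_j |U(t_j) - U^j|^p
for _ in range(M):
    dW = rng.normal(0.0, np.sqrt(k_fine), N_fine)
    U_ref = np.zeros(N_fine + 1)
    for n in range(N_fine):                 # reference: exponential Euler on the fine grid
        U_ref[n + 1] = np.exp(A * k_fine) * (U_ref[n] + dW[n])
    U = np.zeros(N_coarse + 1)
    dW_coarse = dW.reshape(N_coarse, ratio).sum(axis=1)
    for j in range(N_coarse):               # splitting/exponential Euler on the coarse grid
        U[j + 1] = np.exp(A * k) * (U[j] + dW_coarse[j])
    diff = np.abs(U_ref[::ratio] - U)
    sup_of_moment += diff**p / M
    moment_of_sup += diff.max()**p / M

E_k = np.max(sup_of_moment) ** (1 / p)      # pointwise strong error
E_k_inf = moment_of_sup ** (1 / p)          # uniform strong error
print(E_k, E_k_inf)
```

Since $\sup_j \mathbb{E} \le \mathbb{E}\sup_j$, the printed estimate of $\mathrm{E}_k$ should not exceed that of $\mathrm{E}_{k}^{\infty}$.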

Related content

We study the convergence of message passing graph neural networks on random graph models to their continuous counterpart as the number of nodes tends to infinity. Until now, this convergence was only known for architectures with aggregation functions in the form of degree-normalized means. We extend such results to a very large class of aggregation functions that encompasses all classically used message passing graph neural networks, such as attention-based message passing or max convolutional message passing on top of (degree-normalized) convolutional message passing. Under mild assumptions, we give non-asymptotic high-probability bounds to quantify this convergence. Our main result is based on the McDiarmid inequality. Interestingly, we treat the case where the aggregation is a coordinate-wise maximum separately, as it necessitates a very different proof technique and yields a qualitatively different convergence rate.
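The following minimal sketch (the node count, feature dimension, and Erdős–Rényi-style random graph are illustrative choices, not the paper's model) contrasts the degree-normalized mean aggregation covered by earlier results with the coordinate-wise max aggregation that requires a separate treatment.

```python
import numpy as np

# One message passing layer on a random graph: degree-normalized mean aggregation
# versus coordinate-wise max aggregation over the neighbourhood.
rng = np.random.default_rng(0)
n, d = 200, 8
X = rng.normal(size=(n, d))                     # node features
A = (rng.random((n, n)) < 0.1).astype(float)    # Erdos-Renyi-style random graph
A = np.triu(A, 1); A = A + A.T                  # symmetric adjacency, no self-loops

deg = A.sum(axis=1, keepdims=True).clip(min=1.0)
h_mean = (A @ X) / deg                          # degree-normalized mean aggregation

neigh = [np.flatnonzero(A[i]) for i in range(n)]
h_max = np.stack([X[idx].max(axis=0) if idx.size else np.zeros(d) for idx in neigh])
```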

We study the convergence in total variation and $V$-norm of discretization schemes of the underdamped Langevin dynamics. Such algorithms are very popular and commonly used in molecular dynamics and computational statistics to approximately sample from a target distribution of interest. We show first that, for a very large class of schemes, a minorization condition uniform in the stepsize holds. This class encompasses popular methods such as the Euler-Maruyama scheme and the schemes based on splitting strategies. Second, we provide mild conditions ensuring that the class of schemes that we consider satisfies a geometric Foster--Lyapunov drift condition, again uniform in the stepsize. This allows us to derive geometric convergence bounds, with a convergence rate scaling linearly with the stepsize. This kind of result is of prime interest for obtaining estimates on norms of solutions to Poisson equations associated with a given numerical method.
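As a hedged sketch of two schemes from the class discussed above, here are one Euler-Maruyama step and one simple splitting-type step for the underdamped Langevin dynamics $\mathrm{d}X = V\,\mathrm{d}t$, $\mathrm{d}V = (-\gamma V - \nabla U(X))\,\mathrm{d}t + \sqrt{2\gamma}\,\mathrm{d}W$; unit mass and temperature, the standard Gaussian target, and the particular splitting order are illustrative assumptions.

```python
import numpy as np

# Assumptions: unit mass and temperature, standard Gaussian target so grad_U(x) = x.
rng = np.random.default_rng(0)
gamma, h, d = 1.0, 0.05, 2
grad_U = lambda x: x

def euler_maruyama_step(x, v):
    x_new = x + h * v
    v_new = v + h * (-gamma * v - grad_U(x)) + np.sqrt(2.0 * gamma * h) * rng.normal(size=d)
    return x_new, v_new

def splitting_step(x, v):
    eta = np.exp(-gamma * h)
    v = eta * v + np.sqrt(1.0 - eta**2) * rng.normal(size=d)  # exact Ornstein-Uhlenbeck flow
    v = v - h * grad_U(x)                                     # gradient kick
    x = x + h * v                                             # free transport in position
    return x, v

x, v = np.zeros(d), np.zeros(d)
for _ in range(1000):
    x, v = splitting_step(x, v)
```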

We introduce a positivity-preserving numerical scheme for a class of nonlinear stochastic heat equations driven by a purely time-dependent Brownian motion. The construction is inspired by a recent preprint by the authors in which one-dimensional equations driven by space-time white noise are considered. The objective of this paper is to illustrate the properties of the proposed integrators in this different framework, both through numerical experiments and through convergence results.

The paper analyses properties of a large class of "path-based" Data Envelopment Analysis models through a unifying general scheme. The scheme includes the well-known oriented radial models, the hyperbolic distance function model, the directional distance function models, and even permits their generalisations. The modelling is not constrained to non-negative data and is flexible enough to accommodate variants of standard models over arbitrary data. Mathematical tools developed in the paper allow systematic analysis of the models from the point of view of ten desirable properties. It is shown that some of the properties are satisfied (resp., fail) for all models in the general scheme, while others have a more nuanced behaviour and must be assessed individually in each model. Our results can help researchers and practitioners navigate among the different models and apply the models to mixed data.
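As a concrete illustration of one instance of the path-based scheme (a standard textbook formulation, not taken from the paper), the directional distance function model for a unit $(x_o, y_o)$ with direction $(g_x, g_y)$ and, say, a constant-returns-to-scale technology $T = \{(x,y) : x \ge X\lambda,\ y \le Y\lambda,\ \lambda \ge 0\}$ reads $$\max_{\beta \ge 0,\ \lambda \ge 0} \ \beta \quad \text{s.t.} \quad (x_o - \beta g_x,\ y_o + \beta g_y) \in T,$$ where $X$ and $Y$ collect the observed inputs and outputs. The oriented radial models correspond to the direction choices $(x_o, 0)$ and $(0, y_o)$, while the hyperbolic distance function model follows a nonlinear path through $(x_o, y_o)$.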

We present two integral-equation-based methods, a decoupled one and a coupled one, for the Morse-Ingard equations subject to Neumann boundary conditions on the exterior domain. Both methods are based on second-kind integral equation (SKIE) formulations. The coupled method is well-conditioned and can achieve high accuracy. The decoupled method has lower computational cost and more flexibility in dealing with the boundary layer; however, it is prone to the ill-conditioning of the decoupling transform and cannot achieve accuracy as high as that of the coupled method. We show numerical examples using a Nystr\"om method based on quadrature-by-expansion (QBX) with fast-multipole acceleration. We demonstrate the accuracy and efficiency of the solvers in both two and three dimensions with complex geometry.

This work deals with the numerical solution of systems of oscillatory second-order differential equations which often arise from the semi-discretization in space of partial differential equations. Since these differential equations exhibit pronounced (or even highly) oscillatory behavior, standard numerical methods are known to perform poorly. Our approach consists in directly discretizing the problem by means of Gautschi-type integrators based on $\operatorname{sinc}$ matrix functions. The novelty here is the use of a suitable rational approximation formula for the $\operatorname{sinc}$ matrix function, which allows a rational Krylov-like approximation method with suitable choices of poles to be applied. In particular, we discuss the application of the whole strategy to a finite element discretization of the wave equation.
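To make the type of integrator concrete, here is a minimal sketch of one step of a Gautschi-type two-step scheme for $y'' = -\Omega^2 y + g(y)$; the function name and parameter values are illustrative, the matrix functions are evaluated by a dense eigendecomposition rather than by the rational Krylov approximation proposed in the paper, and the toy usage replaces the finite element discretization with finite differences.

```python
import numpy as np

# Gautschi-type two-step scheme:
#   y_{n+1} - 2 cos(h Omega) y_n + y_{n-1} = h^2 sinc^2(h Omega / 2) g(y_n),
# with Omega^2 symmetric positive semi-definite.
def gautschi_step(Omega2, y_prev, y_curr, g, h):
    lam, Q = np.linalg.eigh(Omega2)                           # Omega^2 = Q diag(lam) Q^T
    w = np.sqrt(np.clip(lam, 0.0, None))                      # eigenvalues of Omega
    cos_hO = (Q * np.cos(h * w)) @ Q.T
    sinc2 = (Q * np.sinc(h * w / (2 * np.pi)) ** 2) @ Q.T     # np.sinc(x) = sin(pi x)/(pi x)
    return 2 * cos_hO @ y_curr - y_prev + h**2 * sinc2 @ g(y_curr)

# Toy usage on a finite-difference semi-discretization of the 1D wave equation.
N, h = 50, 1e-3
dx = 1.0 / (N + 1)
Omega2 = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / dx**2
y0 = np.sin(np.pi * dx * np.arange(1, N + 1))
y1 = y0.copy()                                                # zero initial velocity (crude start)
g = lambda y: np.zeros_like(y)                                # no forcing in this toy example
y2 = gautschi_step(Omega2, y0, y1, g, h)
```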

We study the training dynamics of shallow neural networks in a two-timescale regime in which the stepsizes for the inner layer are much smaller than those for the outer layer. In this regime, we prove convergence of the gradient flow to a global optimum of the non-convex optimization problem in a simple univariate setting. The number of neurons need not be asymptotically large for this result to hold, which distinguishes it from popular recent approaches such as the neural tangent kernel or mean-field regimes. Experimental illustration is provided, showing that stochastic gradient descent behaves according to our description of the gradient flow and thus converges to a global optimum in the two-timescale regime, but can fail outside of this regime.
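A minimal sketch of the two-timescale regime follows; the ReLU architecture, the toy target, the data distribution, and the stepsize values are illustrative assumptions, not the paper's precise setting.

```python
import numpy as np

# Two-timescale SGD for a univariate shallow network f(x) = sum_i a_i * relu(w_i * x + b_i),
# with the inner-layer stepsize eta_in much smaller than the outer-layer stepsize eta_out.
rng = np.random.default_rng(0)
m = 20                                        # number of neurons (need not be large)
w, b = rng.normal(size=m), rng.normal(size=m)
a = rng.normal(size=m) / m
target = lambda x: np.maximum(x - 0.3, 0.0) - np.maximum(-x - 0.3, 0.0)  # toy target
eta_out, eta_in = 1e-2, 1e-4                  # two-timescale regime: eta_in << eta_out

for step in range(20000):
    x = rng.uniform(-1.0, 1.0)                # one-sample stochastic gradient step
    pre = w * x + b
    act = np.maximum(pre, 0.0)
    err = a @ act - target(x)                 # residual of the squared loss
    grad_a = err * act
    grad_w = err * a * (pre > 0) * x
    grad_b = err * a * (pre > 0)
    a -= eta_out * grad_a                     # fast outer layer
    w -= eta_in * grad_w                      # slow inner layer
    b -= eta_in * grad_b
```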

This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.

Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high-variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, and that mitigating both types of variance is necessary to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and better generalization than the existing methods.
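The following sketch is not the proposed algorithm; it only illustrates the classical variance-reduction idea behind adaptive sampling, namely drawing nodes with probabilities proportional to (approximate) per-node gradient norms, which minimizes the variance of the resulting unbiased estimator of the full gradient. The per-node gradients here are random stand-ins.

```python
import numpy as np

# Importance sampling of nodes proportional to gradient norms versus uniform sampling.
rng = np.random.default_rng(0)
n, d, batch = 1000, 16, 64
G = rng.normal(size=(n, d)) * rng.gamma(2.0, size=(n, 1))    # stand-in per-node gradients

norms = np.linalg.norm(G, axis=1)
p = norms / norms.sum()                                      # adaptive sampling distribution
idx = rng.choice(n, size=batch, p=p)
est_adaptive = (G[idx] / (batch * p[idx, None])).sum(axis=0) # unbiased estimate of G.sum(0)

idx_u = rng.choice(n, size=batch)
est_uniform = n * G[idx_u].mean(axis=0)                      # uniform-sampling baseline
```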

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
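A minimal sketch of the class-balanced weighting described above; the function name and the normalization that makes the weights sum to the number of classes are illustrative choices.

```python
import numpy as np

# Per-class weights inversely proportional to the effective number of samples
# (1 - beta^n) / (1 - beta).
def class_balanced_weights(samples_per_class, beta=0.999):
    n = np.asarray(samples_per_class, dtype=float)
    effective_num = (1.0 - beta**n) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights * len(n) / weights.sum()

# Example: a long-tailed class distribution; rare classes receive larger weights.
print(class_balanced_weights([5000, 500, 50, 5]))
```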
