In this paper we prove convergence rates for time discretisation schemes for semi-linear stochastic evolution equations with additive or multiplicative Gaussian noise, where the leading operator $A$ is the generator of a strongly continuous semigroup $S$ on a Hilbert space $X$, and the focus is on non-parabolic problems. The main results are optimal bounds for the uniform strong error $$\mathrm{E}_{k}^{\infty} := \Big(\mathbb{E} \sup_{j\in \{0, \ldots, N_k\}} \|U(t_j) - U^j\|^p\Big)^{1/p},$$ where $p \in [2,\infty)$, $U$ is the mild solution, $U^j$ is obtained from a time discretisation scheme, $k$ is the step size, and $N_k = T/k$. Standard schemes such as the splitting/exponential Euler, implicit Euler, and Crank-Nicolson schemes are included as special cases. Under conditions on the nonlinearity and the noise we show:

- $\mathrm{E}_{k}^{\infty}\lesssim k \log(T/k)$ (linear equation, additive noise, general $S$);
- $\mathrm{E}_{k}^{\infty}\lesssim \sqrt{k} \log(T/k)$ (nonlinear equation, multiplicative noise, contractive $S$);
- $\mathrm{E}_{k}^{\infty}\lesssim k \log(T/k)$ (nonlinear wave equation, multiplicative noise).

The logarithmic factor can be removed if the splitting scheme is used with a (quasi-)contractive $S$. The obtained bounds coincide with the optimal bounds for SDEs. Most of the existing literature is concerned with bounds for the simpler pointwise strong error $$\mathrm{E}_k:=\bigg(\sup_{j\in \{0,\ldots,N_k\}}\mathbb{E} \|U(t_j) - U^{j}\|^p\bigg)^{1/p}.$$ Applications to Maxwell equations, Schr\"odinger equations, and wave equations are included. For these equations our results improve and reprove several existing results with a unified method.
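As a rough illustration of the two error notions and of the splitting/exponential Euler scheme, the following sketch estimates both errors by Monte Carlo for a scalar linear equation with additive noise; the scalar setting, the parameter values, and the use of a fine-grid run of the same scheme as reference solution are assumptions made for the example, not part of the paper.

```python
import numpy as np

# Minimal sketch (a scalar stand-in, not the infinite-dimensional setting):
# exponential/splitting Euler for dU = A U dt + dW with A < 0, used to contrast a
# Monte Carlo estimate of the pointwise strong error E_k with the uniform strong
# error E_k^infty.
rng = np.random.default_rng(0)
A, T, p = -1.0, 1.0, 2
N_coarse, ratio, M = 100, 50, 200           # coarse steps, refinement factor, MC samples
k = T / N_coarse
k_fine = k / ratio
N_fine = N_coarse * ratio

sup_of_moment = np.zeros(N_coarse + 1)      # accumulates E |U(t_j) - U^j|^p for each j
moment_of_sup = 0.0                         # accumulates E sup_j |U(t_j) - U^j|^p
for _ in range(M):
    dW = rng.normal(0.0, np.sqrt(k_fine), N_fine)
    U_ref = np.zeros(N_fine + 1)
    for n in range(N_fine):                 # reference: exponential Euler on the fine grid
        U_ref[n + 1] = np.exp(A * k_fine) * (U_ref[n] + dW[n])
    U = np.zeros(N_coarse + 1)
    dW_coarse = dW.reshape(N_coarse, ratio).sum(axis=1)
    for j in range(N_coarse):               # splitting/exponential Euler on the coarse grid
        U[j + 1] = np.exp(A * k) * (U[j] + dW_coarse[j])
    diff = np.abs(U_ref[::ratio] - U)
    sup_of_moment += diff**p / M
    moment_of_sup += diff.max()**p / M

E_k = np.max(sup_of_moment) ** (1 / p)      # pointwise strong error
E_k_inf = moment_of_sup ** (1 / p)          # uniform strong error
print(E_k, E_k_inf)
```

Since $\sup_j \mathbb{E} \le \mathbb{E}\sup_j$, the printed estimate of $\mathrm{E}_k$ should not exceed that of $\mathrm{E}_{k}^{\infty}$.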

Related content

We study the convergence of message passing graph neural networks on random graph models to their continuous counterpart as the number of nodes tends to infinity. Until now, this convergence was only known for architectures with aggregation functions in the form of degree-normalized means. We extend such results to a very large class of aggregation functions that encompasses all classically used message passing graph neural networks, such as attention-based message passing or max convolutional message passing on top of (degree-normalized) convolutional message passing. Under mild assumptions, we give non-asymptotic high-probability bounds to quantify this convergence. Our main result is based on the McDiarmid inequality. Interestingly, we treat the case where the aggregation is a coordinate-wise maximum separately, as it necessitates a very different proof technique and yields a qualitatively different convergence rate.
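The following minimal sketch (the node count, feature dimension, and Erdős–Rényi-style random graph are illustrative choices, not the paper's model) contrasts the degree-normalized mean aggregation covered by earlier results with the coordinate-wise max aggregation that requires a separate treatment.

```python
import numpy as np

# One message passing layer on a random graph: degree-normalized mean aggregation
# versus coordinate-wise max aggregation over the neighbourhood.
rng = np.random.default_rng(0)
n, d = 200, 8
X = rng.normal(size=(n, d))                     # node features
A = (rng.random((n, n)) < 0.1).astype(float)    # Erdos-Renyi-style random graph
A = np.triu(A, 1); A = A + A.T                  # symmetric adjacency, no self-loops

deg = A.sum(axis=1, keepdims=True).clip(min=1.0)
h_mean = (A @ X) / deg                          # degree-normalized mean aggregation

neigh = [np.flatnonzero(A[i]) for i in range(n)]
h_max = np.stack([X[idx].max(axis=0) if idx.size else np.zeros(d) for idx in neigh])
```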

We study the convergence in total variation and $V$-norm of discretization schemes of the underdamped Langevin dynamics. Such algorithms are very popular and commonly used in molecular dynamics and computational statistics to approximately sample from a target distribution of interest. We show first that, for a very large class of schemes, a minorization condition uniform in the stepsize holds. This class encompasses popular methods such as the Euler-Maruyama scheme and the schemes based on splitting strategies. Second, we provide mild conditions ensuring that the class of schemes that we consider satisfies a geometric Foster--Lyapunov drift condition, again uniform in the stepsize. This allows us to derive geometric convergence bounds, with a convergence rate scaling linearly with the stepsize. This kind of result is of prime interest for obtaining estimates on norms of solutions to Poisson equations associated with a given numerical method.
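As a hedged sketch of two schemes from the class discussed above, here are one Euler-Maruyama step and one simple splitting-type step for the underdamped Langevin dynamics $\mathrm{d}X = V\,\mathrm{d}t$, $\mathrm{d}V = (-\gamma V - \nabla U(X))\,\mathrm{d}t + \sqrt{2\gamma}\,\mathrm{d}W$; unit mass and temperature, the standard Gaussian target, and the particular splitting order are illustrative assumptions.

```python
import numpy as np

# Assumptions: unit mass and temperature, standard Gaussian target so grad_U(x) = x.
rng = np.random.default_rng(0)
gamma, h, d = 1.0, 0.05, 2
grad_U = lambda x: x

def euler_maruyama_step(x, v):
    x_new = x + h * v
    v_new = v + h * (-gamma * v - grad_U(x)) + np.sqrt(2.0 * gamma * h) * rng.normal(size=d)
    return x_new, v_new

def splitting_step(x, v):
    eta = np.exp(-gamma * h)
    v = eta * v + np.sqrt(1.0 - eta**2) * rng.normal(size=d)  # exact Ornstein-Uhlenbeck flow
    v = v - h * grad_U(x)                                     # gradient kick
    x = x + h * v                                             # free transport in position
    return x, v

x, v = np.zeros(d), np.zeros(d)
for _ in range(1000):
    x, v = splitting_step(x, v)
```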

We introduce a positivity-preserving numerical scheme for a class of nonlinear stochastic heat equations driven by a purely time-dependent Brownian motion. The construction is inspired by a recent preprint by the authors in which one-dimensional equations driven by space-time white noise are considered. The objective of this paper is to illustrate the properties of the proposed integrators in this different framework, both through numerical experiments and through convergence results.

The paper analyses properties of a large class of "path-based" Data Envelopment Analysis models through a unifying general scheme. The scheme includes the well-known oriented radial models, the hyperbolic distance function model, the directional distance function models, and even permits their generalisations. The modelling is not constrained to non-negative data and is flexible enough to accommodate variants of standard models over arbitrary data. Mathematical tools developed in the paper allow systematic analysis of the models from the point of view of ten desirable properties. It is shown that some of the properties are satisfied (resp., fail) for all models in the general scheme, while others have a more nuanced behaviour and must be assessed individually in each model. Our results can help researchers and practitioners navigate among the different models and apply the models to mixed data.
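As a concrete illustration of one instance of the path-based scheme (a standard textbook formulation, not taken from the paper), the directional distance function model for a unit $(x_o, y_o)$ with direction $(g_x, g_y)$ and, say, a constant-returns-to-scale technology $T = \{(x,y) : x \ge X\lambda,\ y \le Y\lambda,\ \lambda \ge 0\}$ reads $$\max_{\beta \ge 0,\ \lambda \ge 0} \ \beta \quad \text{s.t.} \quad (x_o - \beta g_x,\ y_o + \beta g_y) \in T,$$ where $X$ and $Y$ collect the observed inputs and outputs. The oriented radial models correspond to the direction choices $(x_o, 0)$ and $(0, y_o)$, while the hyperbolic distance function model follows a nonlinear path through $(x_o, y_o)$.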

We present two integral-equation-based methods, a decoupled one and a coupled one, for the Morse-Ingard equations subject to Neumann boundary conditions on the exterior domain. Both methods are based on second-kind integral equation (SKIE) formulations. The coupled method is well-conditioned and can achieve high accuracy. The decoupled method has lower computational cost and more flexibility in dealing with the boundary layer; however, it is prone to the ill-conditioning of the decoupling transform and cannot achieve accuracy as high as that of the coupled method. We show numerical examples using a Nystr\"om method based on quadrature-by-expansion (QBX) with fast-multipole acceleration. We demonstrate the accuracy and efficiency of the solvers in both two and three dimensions with complex geometry.

This work deals with the numerical solution of systems of oscillatory second-order differential equations which often arise from the semi-discretization in space of partial differential equations. Since these differential equations exhibit pronounced (or even highly) oscillatory behavior, standard numerical methods are known to perform poorly. Our approach consists in directly discretizing the problem by means of Gautschi-type integrators based on $\operatorname{sinc}$ matrix functions. The novelty here is the use of a suitable rational approximation formula for the $\operatorname{sinc}$ matrix function, which allows a rational Krylov-like approximation method with suitable choices of poles to be applied. In particular, we discuss the application of the whole strategy to a finite element discretization of the wave equation.
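To make the type of integrator concrete, here is a minimal sketch of one step of a Gautschi-type two-step scheme for $y'' = -\Omega^2 y + g(y)$; the function name and parameter values are illustrative, the matrix functions are evaluated by a dense eigendecomposition rather than by the rational Krylov approximation proposed in the paper, and the toy usage replaces the finite element discretization with finite differences.

```python
import numpy as np

# Gautschi-type two-step scheme:
#   y_{n+1} - 2 cos(h Omega) y_n + y_{n-1} = h^2 sinc^2(h Omega / 2) g(y_n),
# with Omega^2 symmetric positive semi-definite.
def gautschi_step(Omega2, y_prev, y_curr, g, h):
    lam, Q = np.linalg.eigh(Omega2)                           # Omega^2 = Q diag(lam) Q^T
    w = np.sqrt(np.clip(lam, 0.0, None))                      # eigenvalues of Omega
    cos_hO = (Q * np.cos(h * w)) @ Q.T
    sinc2 = (Q * np.sinc(h * w / (2 * np.pi)) ** 2) @ Q.T     # np.sinc(x) = sin(pi x)/(pi x)
    return 2 * cos_hO @ y_curr - y_prev + h**2 * sinc2 @ g(y_curr)

# Toy usage on a finite-difference semi-discretization of the 1D wave equation.
N, h = 50, 1e-3
dx = 1.0 / (N + 1)
Omega2 = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / dx**2
y0 = np.sin(np.pi * dx * np.arange(1, N + 1))
y1 = y0.copy()                                                # zero initial velocity (crude start)
g = lambda y: np.zeros_like(y)                                # no forcing in this toy example
y2 = gautschi_step(Omega2, y0, y1, g, h)
```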

We study the training dynamics of shallow neural networks in a two-timescale regime in which the stepsizes for the inner layer are much smaller than those for the outer layer. In this regime, we prove convergence of the gradient flow to a global optimum of the non-convex optimization problem in a simple univariate setting. The number of neurons need not be asymptotically large for this result to hold, which distinguishes it from popular recent approaches such as the neural tangent kernel or mean-field regimes. Experimental illustration is provided, showing that stochastic gradient descent behaves according to our description of the gradient flow and thus converges to a global optimum in the two-timescale regime, but can fail outside of this regime.
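A minimal sketch of the two-timescale regime follows; the ReLU architecture, the toy target, the data distribution, and the stepsize values are illustrative assumptions, not the paper's precise setting.

```python
import numpy as np

# Two-timescale SGD for a univariate shallow network f(x) = sum_i a_i * relu(w_i * x + b_i),
# with the inner-layer stepsize eta_in much smaller than the outer-layer stepsize eta_out.
rng = np.random.default_rng(0)
m = 20                                        # number of neurons (need not be large)
w, b = rng.normal(size=m), rng.normal(size=m)
a = rng.normal(size=m) / m
target = lambda x: np.maximum(x - 0.3, 0.0) - np.maximum(-x - 0.3, 0.0)  # toy target
eta_out, eta_in = 1e-2, 1e-4                  # two-timescale regime: eta_in << eta_out

for step in range(20000):
    x = rng.uniform(-1.0, 1.0)                # one-sample stochastic gradient step
    pre = w * x + b
    act = np.maximum(pre, 0.0)
    err = a @ act - target(x)                 # residual of the squared loss
    grad_a = err * act
    grad_w = err * a * (pre > 0) * x
    grad_b = err * a * (pre > 0)
    a -= eta_out * grad_a                     # fast outer layer
    w -= eta_in * grad_w                      # slow inner layer
    b -= eta_in * grad_b
```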

This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.

Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high-variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, and that mitigating both types of variance is necessary to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and better generalization than the existing methods.
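The following sketch is not the proposed algorithm; it only illustrates the classical variance-reduction idea behind adaptive sampling, namely drawing nodes with probabilities proportional to (approximate) per-node gradient norms, which minimizes the variance of the resulting unbiased estimator of the full gradient. The per-node gradients here are random stand-ins.

```python
import numpy as np

# Importance sampling of nodes proportional to gradient norms versus uniform sampling.
rng = np.random.default_rng(0)
n, d, batch = 1000, 16, 64
G = rng.normal(size=(n, d)) * rng.gamma(2.0, size=(n, 1))    # stand-in per-node gradients

norms = np.linalg.norm(G, axis=1)
p = norms / norms.sum()                                      # adaptive sampling distribution
idx = rng.choice(n, size=batch, p=p)
est_adaptive = (G[idx] / (batch * p[idx, None])).sum(axis=0) # unbiased estimate of G.sum(0)

idx_u = rng.choice(n, size=batch)
est_uniform = n * G[idx_u].mean(axis=0)                      # uniform-sampling baseline
```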

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
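A minimal sketch of the class-balanced weighting described above; the function name and the normalization that makes the weights sum to the number of classes are illustrative choices.

```python
import numpy as np

# Per-class weights inversely proportional to the effective number of samples
# (1 - beta^n) / (1 - beta).
def class_balanced_weights(samples_per_class, beta=0.999):
    n = np.asarray(samples_per_class, dtype=float)
    effective_num = (1.0 - beta**n) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights * len(n) / weights.sum()

# Example: a long-tailed class distribution; rare classes receive larger weights.
print(class_balanced_weights([5000, 500, 50, 5]))
```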
