
Recent studies have shown that heavy tails can emerge in stochastic optimization and that the heaviness of the tails has links to the generalization error. While these studies have shed light on interesting aspects of the generalization behavior in modern settings, they relied on strong topological and statistical regularity assumptions, which are hard to verify in practice. Furthermore, it has been empirically illustrated that the relation between heavy tails and generalization might not always be monotonic in practice, contrary to the conclusions of existing theory. In this study, we establish novel links between the tail behavior and generalization properties of stochastic gradient descent (SGD), through the lens of algorithmic stability. We consider a quadratic optimization problem and use a heavy-tailed stochastic differential equation as a proxy for modeling the heavy-tailed behavior emerging in SGD. We then prove uniform stability bounds, which reveal the following outcomes: (i) Without making any exotic assumptions, we show that SGD will not be stable if stability is measured with the squared loss $x\mapsto x^2$, whereas it becomes stable if stability is instead measured with a surrogate loss $x\mapsto |x|^p$ for some $p<2$. (ii) Depending on the variance of the data, there exists a \emph{`threshold of heavy-tailedness'} such that the generalization error decreases as the tails become heavier, as long as the tails are lighter than this threshold. This suggests that the relation between heavy tails and generalization is not globally monotonic. (iii) We prove matching lower bounds on uniform stability, implying that our bounds are tight in terms of the heaviness of the tails. We support our theory with synthetic and real neural network experiments.
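As a rough, purely illustrative probe of the stability notion above, the sketch below runs one-dimensional SGD on a quadratic objective with heavy-tailed gradient perturbations (Student-t noise standing in for the paper's heavy-tailed SDE proxy) on two datasets that differ in a single sample, and reports the resulting parameter gap under $x\mapsto x^2$ versus a surrogate $x\mapsto|x|^p$ with $p<2$; every constant and function name here is a hypothetical choice, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_sgd(X, y, eta=0.01, steps=5000, tail_df=1.5, noise_scale=0.1):
    """SGD on f(w) = mean_i (x_i w - y_i)^2 / 2 with heavy-tailed
    perturbations (Student-t with small df as a crude stand-in)."""
    w = 0.0
    n = len(X)
    for _ in range(steps):
        i = rng.integers(n)
        grad = (X[i] * w - y[i]) * X[i]
        w -= eta * (grad + noise_scale * rng.standard_t(df=tail_df))
    return w

# neighbouring datasets: identical except for one resampled point
n = 200
X, y = rng.normal(size=n), rng.normal(size=n)
X2, y2 = X.copy(), y.copy()
X2[0], y2[0] = rng.normal(), rng.normal()

gap = abs(run_sgd(X, y) - run_sgd(X2, y2))
for p in (2.0, 1.5, 1.0):
    print(f"p = {p}: gap^p = {gap ** p:.4g}")  # surrogate-loss view of the gap
```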

Related Content

This article focuses on the multi-objective optimization of stochastic simulators with high output variance, where the input space is finite and the objective functions are expensive to evaluate. We rely on Bayesian optimization algorithms, which use probabilistic models to make predictions about the functions to be optimized. The proposed approach extends the Pareto Active Learning (PAL) algorithm for the estimation of Pareto-optimal solutions to make it suitable for the stochastic setting; we name it Pareto Active Learning for Stochastic Simulators (PALS). The performance of PALS is assessed through numerical experiments on a set of two-dimensional, bi-objective test problems. PALS exhibits superior performance when compared to other scalarization-based and random-search approaches.
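The core bookkeeping behind PAL-style methods is classifying design points as (probably) Pareto-optimal or not; the minimal sketch below shows only that deterministic non-domination step on a finite set of noisy bi-objective evaluations. The GP modelling, confidence regions, and the stochastic extension that define PALS are not shown.

```python
import numpy as np

def is_dominated(f, g):
    """True if objective vector f is dominated by g (minimization)."""
    return np.all(g <= f) and np.any(g < f)

def pareto_front(F):
    """Indices of non-dominated rows of the objective matrix F."""
    return [i for i, f in enumerate(F)
            if not any(is_dominated(f, g) for j, g in enumerate(F) if j != i)]

# toy bi-objective values over a finite input space of 20 candidates
rng = np.random.default_rng(1)
F = rng.normal(size=(20, 2))
print(pareto_front(F))
```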

We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard-margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
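A minimal numerical illustration of the claimed behaviour (not the paper's proof technique) is easy to set up: run full-batch gradient descent on the logistic loss over a small separable dataset and watch the normalized iterate approach the max-margin direction while the norm grows slowly. By symmetry of the toy data below, the hard-margin direction is $(1,1)/\sqrt{2}\approx(0.707, 0.707)$.

```python
import numpy as np

# linearly separable toy data with labels in {-1, +1}
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w, eta = np.zeros(2), 0.1
for t in range(1, 200001):
    margins = y * (X @ w)
    sig = np.exp(-np.logaddexp(0.0, margins))      # stable 1 / (1 + e^{margins})
    w -= eta * (-(X.T @ (y * sig)) / len(y))       # gradient of mean logistic loss
    if t in (1_000, 10_000, 100_000, 200_000):
        print(t, np.round(w / np.linalg.norm(w), 4), round(np.linalg.norm(w), 2))
# the direction w/||w|| drifts toward the max-margin solution while ||w||
# grows only logarithmically in t
```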

In this paper a time-fractional Black-Scholes model (TFBSM) is considered to study the price change of the underlying fractal transmission system. We develop and analyze a numerical method to solve the TFBSM governing European options. The numerical method combines exponential B-spline collocation to discretize in space and a finite difference method to discretize in time. The method is shown to be unconditionally stable using von Neumann analysis. It is also proved to be convergent of order two in space and of order $2-\mu$ in time, where $\mu$ is the order of the fractional derivative. We implement the method on various numerical examples to illustrate its accuracy and to validate the theoretical findings. In addition, as an application, the method is used to price several different European options, such as the European call option, European put option, and European double barrier knock-out call option.
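For orientation, a commonly studied form of the time-fractional Black-Scholes equation uses a Caputo time derivative of order $\mu$ (the paper's exact formulation, variables, and sign conventions may differ):
\[
\frac{\partial^{\mu} V(S,t)}{\partial t^{\mu}} + \frac{1}{2}\,\sigma^{2} S^{2}\,\frac{\partial^{2} V}{\partial S^{2}} + r S\,\frac{\partial V}{\partial S} - r V = 0, \qquad 0 < \mu \le 1,
\]
and the reported temporal accuracy of order $2-\mu$ is characteristic of L1-type finite difference discretizations of the Caputo derivative.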

Since proposed in [X. Zhang and C.-W. Shu, J. Comput. Phys., 229: 3091--3120, 2010], the Zhang--Shu framework has attracted extensive attention and motivated many bound-preserving (BP) high-order discontinuous Galerkin and finite volume schemes for various hyperbolic equations. A key ingredient in the framework is the decomposition of the cell averages of the numerical solution into a convex combination of the solution values at certain quadrature points, which helps to rewrite high-order schemes as convex combinations of formally first-order schemes. The classic convex decomposition originally proposed by Zhang and Shu has been widely used over the past decade. It was verified, only for the 1D quadratic and cubic polynomial spaces, that the classic decomposition is optimal in the sense of achieving the mildest BP CFL condition. Yet, it remained unclear whether the classic decomposition is optimal in multiple dimensions. In this paper, we find that the classic multidimensional decomposition based on the tensor product of Gauss--Lobatto and Gauss quadratures is generally not optimal, and we discover a novel alternative decomposition for the 2D and 3D polynomial spaces of total degree up to 2 and 3, respectively, on Cartesian meshes. Our new decomposition allows a larger BP time step size than the classic one and, moreover, is rigorously proved to be optimal in attaining the mildest BP CFL condition, while requiring far fewer nodes. The discovery of such an optimal convex decomposition is highly nontrivial yet meaningful, as it may lead to an improvement of high-order BP schemes for a large class of hyperbolic or convection-dominated equations, at the cost of only a slight and local modification to the implementation code. Several numerical examples are provided to further validate the advantages of using our optimal decomposition over the classic one in terms of efficiency.
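For context, the classic one-dimensional Zhang--Shu decomposition (shown here schematically; the multidimensional constructions compared in the paper are more involved) writes the cell average of the polynomial $u_j$ as a convex combination of point values at the $N$-point Gauss--Lobatto nodes $\hat{x}_j^{\alpha}$ of cell $j$,
\[
\bar{u}_j = \sum_{\alpha=1}^{N} \hat{w}_{\alpha}\, u_j\big(\hat{x}_j^{\alpha}\big), \qquad \hat{w}_{\alpha} > 0, \quad \sum_{\alpha=1}^{N} \hat{w}_{\alpha} = 1,
\]
and since the Gauss--Lobatto nodes include the cell endpoints, the BP CFL condition is governed by the endpoint weight; a decomposition is optimal in the above sense when it makes that critical weight as large as possible.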

This paper is concerned with the design and analysis of least squares solvers for ill-posed PDEs that are conditionally stable. The norms and the regularization term used in the least squares functional are determined by the ingredients of the conditional stability assumption. We are then able to establish a general error bound that, in view of the conditional stability assumption, is qualitatively the best possible, without assuming consistent data. The price for these advantages is the need to handle dual norms, which reduces to verifying suitable inf-sup stability. This, in turn, is done by constructing appropriate Fortin projectors for all sample scenarios. The theoretical findings are illustrated by numerical experiments.
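In generic terms (this is a template consistent with the description above, not the paper's specific functional), a conditional stability estimate of the form $\|u\|_{Z} \le C\,\|Bu\|_{V'}$ on a bounded subset of $X$ suggests a regularized least squares formulation such as
\[
u_{\varepsilon} \in \operatorname*{arg\,min}_{u \in X} \; \tfrac12\,\|Bu - f\|_{V'}^{2} + \tfrac{\varepsilon}{2}\,\|u\|_{X}^{2},
\]
where the appearance of the dual norm $\|\cdot\|_{V'}$ is what makes inf-sup stability, and hence the construction of Fortin projectors, enter the discrete analysis.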

Drawing causal inferences in network settings necessitates careful consideration of the often complex dependency between outcomes for actors. Of particular importance are treatment spillover or outcome interference effects. We consider causal inference when the actors are connected via an underlying network structure. Our key contribution is a model for causality when the underlying network is unobserved and the actor covariates evolve stochastically over time. We develop a joint model for the relational and covariate generating process that avoids restrictive separability assumptions and deterministic network assumptions that do not hold in the majority of social network settings of interest. Our framework utilizes the highly general class of Exponential-family Random Network models (ERNM), of which Markov Random Fields (MRF) and Exponential-family Random Graph models (ERGM) are special cases. We present potential-outcome-based inference within a Bayesian framework, and propose a simple modification to the exchange algorithm to allow for sampling from ERNM posteriors. We present results of a simulation study demonstrating the validity of the approach. Finally, we demonstrate the value of the framework in a case study of smoking over time in the context of adolescent friendship networks.
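For readers unfamiliar with doubly-intractable posteriors, the sketch below shows a generic exchange-algorithm update (an auxiliary-variable Metropolis step); it is not the paper's ERNM-specific modification, and the callables `log_unnorm`, `sample_from_model`, and `log_prior` are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

def exchange_step(theta, y, log_unnorm, sample_from_model, log_prior, prop_sd=0.1):
    """One exchange-algorithm update for a posterior of the form
    p(theta | y) proportional to prior(theta) * q(y | theta) / Z(theta),
    where Z(theta) is an intractable normalizing constant."""
    theta_new = theta + prop_sd * rng.normal(size=np.shape(theta))
    y_aux = sample_from_model(theta_new)      # auxiliary draw at the proposal
    log_alpha = (log_prior(theta_new) - log_prior(theta)
                 + log_unnorm(y, theta_new) - log_unnorm(y, theta)
                 + log_unnorm(y_aux, theta) - log_unnorm(y_aux, theta_new))
    # the intractable Z(theta) terms cancel in log_alpha
    return theta_new if np.log(rng.uniform()) < log_alpha else theta
```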

In this work, we adapt the {\em micro-macro} methodology to stochastic differential equations for the purpose of numerically solving oscillatory evolution equations. The models we consider are addressed in a wide spectrum of regimes where oscillations may be slow or fast. We show that, through an ad hoc transformation (the micro-macro decomposition), it is possible to retain the usual orders of convergence of the Euler-Maruyama method, that is to say, uniform weak order one and uniform strong order one-half. We also show that the same orders of uniform accuracy can be achieved by a simple integral scheme. The advantage of the micro-macro scheme is that, in contrast to the integral scheme, it can be generalized to higher-order methods.
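As a baseline reference (the micro-macro change of variables itself is not reproduced here), the following is a plain Euler-Maruyama integrator of the kind whose uniform weak order one and strong order one-half the micro-macro decomposition is designed to preserve across oscillatory regimes; the example drift with stiffness parameter `eps` is purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def euler_maruyama(x0, drift, diffusion, T=1.0, n_steps=1_000):
    """Euler-Maruyama for dX_t = drift(t, X_t) dt + diffusion(t, X_t) dW_t."""
    dt = T / n_steps
    x, t = x0, 0.0
    for _ in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt))
        x = x + drift(t, x) * dt + diffusion(t, x) * dw
        t += dt
    return x

# toy highly oscillatory drift; eps controls the oscillation frequency
eps = 1e-2
print(euler_maruyama(1.0,
                     drift=lambda t, x: np.cos(t / eps) * x,
                     diffusion=lambda t, x: 0.5))
```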

This paper investigates the mathematical properties of a stochastic version of the balanced 2D thermal quasigeostrophic (TQG) model of potential vorticity dynamics. This stochastic TQG model is intended as a basis for parametrisation of the dynamical creation of unresolved degrees of freedom in computational simulations of upper ocean dynamics when horizontal buoyancy gradients and bathymetry affect the dynamics, particularly at the submesoscale (250m-10km). Specifically, we have chosen the SALT (Stochastic Advection by Lie Transport) algorithm introduced in [25] and applied in [11,12] as our modelling approach. The SALT approach preserves the Kelvin circulation theorem and an infinite family of integral conservation laws for TQG. The goal of the SALT algorithm is to quantify the uncertainty in the process of up-scaling, or coarse-graining, of either observed or synthetic data at fine scales, for use in computational simulations at coarser scales. The present work provides a rigorous mathematical analysis of the solution properties of the stochastic TQG equations with SALT [27,28].
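Schematically, and suppressing the TQG-specific buoyancy and bathymetry terms, the SALT modelling idea replaces deterministic transport of an advected quantity $q$ by transport along a stochastic flow,
\[
\mathrm{d}q + \Big( u\,\mathrm{d}t + \sum_{i} \xi_i(\mathbf{x}) \circ \mathrm{d}W_t^{i} \Big) \cdot \nabla q = 0,
\]
where the spatial modes $\xi_i$ are calibrated from fine-scale data and $\circ$ denotes Stratonovich integration; this is the generic SALT structure rather than the full stochastic TQG system analysed in the paper.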

Some of the most relevant prospective applications of multi-agent systems, such as autonomous driving or factories as a service, involve mixed-motive scenarios in which agents may have conflicting goals. In these settings, agents trained with independent learning are likely to arrive at outcomes undesirable in terms of cooperation, such as overly greedy behavior. Motivated by real-world societies, in this work we propose to utilize market forces to provide incentives for agents to become cooperative. As demonstrated in an iterated version of the Prisoner's Dilemma, the proposed market formulation can change the dynamics of the game so that cooperative policies are consistently learned. Further, we evaluate our approach in spatially and temporally extended settings for varying numbers of agents. We empirically find that the presence of markets can improve both the overall result and individual agent returns via their trading activities.
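To make the mixed-motive setting concrete, the sketch below sets up the standard iterated Prisoner's Dilemma payoffs and adds a toy market-style side payment from defectors to cooperators; this incentive scheme and its `price` parameter are illustrative stand-ins, not the paper's actual market formulation.

```python
import numpy as np

# standard Prisoner's Dilemma payoffs (row player, column player);
# action 0 = cooperate, 1 = defect
PAYOFF = np.array([[(3, 3), (0, 5)],
                   [(5, 0), (1, 1)]])

def stage_game(a1, a2, price=1.5):
    """One stage of the game with a hypothetical market-style transfer:
    a defector facing a cooperator pays `price` to the cooperator."""
    r1, r2 = PAYOFF[a1, a2]
    if a1 == 1 and a2 == 0:
        r1, r2 = r1 - price, r2 + price
    if a2 == 1 and a1 == 0:
        r1, r2 = r1 + price, r2 - price
    return r1, r2

print(stage_game(1, 0), stage_game(0, 0), stage_game(1, 1))
```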

Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy for speeding up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on graph structural information and ignore the dynamics of optimization, which leads to high variance in estimating the stochastic gradients. The high-variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, and that mitigating both types of variance is necessary to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and achieves better generalization compared to existing methods.
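The decoupling described above suggests replacing purely structure-based node sampling with a gradient-aware distribution; the sketch below shows only that selection step, with per-node gradient-norm estimates passed in as given, which is a simplification of the adaptive sampler proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

def adaptive_node_sample(grad_norm_estimates, k):
    """Draw k distinct nodes with probability proportional to their
    estimated gradient norms (gradient-aware rather than purely
    structural sampling)."""
    p = grad_norm_estimates / grad_norm_estimates.sum()
    return rng.choice(len(p), size=k, replace=False, p=p)

# hypothetical per-node gradient-norm estimates for a 10-node graph
g = rng.random(10) + 0.1
print(adaptive_node_sample(g, k=4))
```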
