
Stochastic differential equations of Langevin-diffusion form have received significant attention, thanks to their foundational role in both Bayesian sampling algorithms and optimization in machine learning. In the latter, they serve as a conceptual model of the stochastic gradient flow in the training of over-parametrized models. However, the literature typically assumes a smooth potential, whose gradient is the drift term. There are nevertheless many problems for which the potential function is not continuously differentiable, and hence the drift is not Lipschitz continuous everywhere; robust losses and Rectified Linear Units in regression problems are prominent examples. In this paper, we establish foundational results on the flow and asymptotic properties of Langevin-type Stochastic Differential Inclusions under assumptions appropriate to machine-learning settings. In particular, we show strong existence of the solution, as well as asymptotic minimization of the canonical free-energy functional.
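
To make the setting concrete, here is a minimal numerical sketch (not taken from the paper) of Langevin dynamics driven by a non-smooth potential: the set-valued drift of the inclusion is replaced by a measurable subgradient selection, and the step size, the potential U(x) = |x|, and the selection at the kink are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def subgrad_U(x):
    # A measurable selection from the subdifferential of the non-smooth
    # potential U(x) = |x| (a robust, ReLU-like loss): sign(x) for x != 0,
    # and 0 (a valid subgradient) at the kink x = 0.
    return np.sign(x)

def langevin_inclusion_path(x0, h=1e-2, n_steps=50_000):
    """Euler-Maruyama discretization of dX_t in -dU(X_t) dt + sqrt(2) dW_t,
    with the set-valued drift replaced by a subgradient selection."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        noise = rng.standard_normal()
        x[k + 1] = x[k] - h * subgrad_U(x[k]) + np.sqrt(2.0 * h) * noise
    return x

# The empirical law of the late iterates should approximate the Gibbs
# measure proportional to exp(-|x|), i.e. a Laplace distribution.
path = langevin_inclusion_path(x0=5.0)
print("late-iterate mean:", path[25_000:].mean())          # ~0 for this target
print("late-iterate E|X|:", np.abs(path[25_000:]).mean())  # ~1 for Laplace(1)
```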

Related Content

Mathematical models are essential for understanding and making predictions about systems arising in nature and engineering. Yet mathematical models are a simplification of true phenomena, making their predictions subject to uncertainty. Hence, the ability to quantify uncertainties is essential to any modelling framework: it enables the user to assess the influence of individual parameters on quantities of interest and to control the quality of the model output through a rigorous understanding of uncertainty. Peridynamic models are a particular class of mathematical models that have proven to be remarkably accurate and robust for a large class of material failure problems. However, the high computational expense of peridynamic models remains a major limitation, hindering outer-loop applications that require a large number of simulations, such as uncertainty quantification. This contribution provides a framework to make such computations feasible. By employing a Multilevel Monte Carlo (MLMC) framework, in which the majority of simulations are performed on a coarse mesh and relatively few on a fine mesh, a significant reduction in computational cost can be realised while statistics of structural failure are estimated. The results show a speed-up factor of 16x over a standard Monte Carlo estimator, enabling the forward propagation of uncertain parameters in a computationally expensive peridynamic model. Furthermore, the multilevel method provides estimates of both the discretisation error and the sampling error, thus improving confidence in numerical predictions. The performance of the approach is demonstrated through an examination of the statistical size effect in quasi-brittle materials.
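
A minimal sketch of the MLMC telescoping estimator described above, with a cheap analytic stand-in for the peridynamic solver; the function names, the toy quantity of interest, and the per-level sample counts are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(theta, level):
    """Toy stand-in for an expensive solver (e.g. a peridynamic model):
    a quantity of interest with O(2**-level) discretization bias.
    `theta` is the uncertain input parameter."""
    bias = 2.0 ** (-level)
    return np.sin(theta) + bias * np.cos(3.0 * theta)

def mlmc_estimate(n_samples_per_level, max_level, sample_theta):
    """Telescoping MLMC estimator:
    E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}],
    with coupled coarse/fine evaluations sharing the same theta draws."""
    total = 0.0
    for level, n in zip(range(max_level + 1), n_samples_per_level):
        thetas = sample_theta(n)
        fine = simulate(thetas, level)
        if level == 0:
            total += fine.mean()
        else:
            coarse = simulate(thetas, level - 1)  # same random inputs
            total += (fine - coarse).mean()
    return total

# Many cheap coarse samples, few expensive fine ones.
estimate = mlmc_estimate(
    n_samples_per_level=[100_000, 10_000, 1_000],
    max_level=2,
    sample_theta=lambda n: rng.normal(0.0, 1.0, size=n),
)
print("MLMC estimate of E[P]:", estimate)
```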

Noise Contrastive Estimation (NCE) is a popular approach for learning probability density functions parameterized up to a constant of proportionality. The main idea is to design a classification problem for distinguishing training data from samples from an easy-to-sample noise distribution $q$, in a manner that avoids having to calculate a partition function. It is well-known that the choice of $q$ can severely impact the computational and statistical efficiency of NCE. In practice, a common choice for $q$ is a Gaussian which matches the mean and covariance of the data. In this paper, we show that such a choice can result in an exponentially bad (in the ambient dimension) conditioning of the Hessian of the loss, even for very simple data distributions. As a consequence, both the statistical and algorithmic complexity for such a choice of $q$ will be problematic in practice, suggesting that more complex noise distributions are essential to the success of NCE.
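
For reference, a minimal sketch of the binary-classification form of NCE with a Gaussian noise distribution q, assuming equal data and noise sample sizes; the one-dimensional unnormalized model and its parameterization are illustrative choices, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(2)

def nce_loss(theta, x_data, x_noise, log_q):
    """NCE as logistic regression between data and noise samples.
    theta = (w, c): the unnormalized model has
    log p_theta(x) = -0.5 * w * x**2 + c, with c a learned log-partition proxy."""
    w, c = theta
    log_p = lambda x: -0.5 * w * x**2 + c
    # Logit of "x came from the data rather than the noise".
    logit_data = log_p(x_data) - log_q(x_data)
    logit_noise = log_p(x_noise) - log_q(x_noise)
    # Numerically stable: -log sigma(z) = log(1 + exp(-z)) = logaddexp(0, -z).
    loss_data = np.logaddexp(0.0, -logit_data).mean()
    loss_noise = np.logaddexp(0.0, logit_noise).mean()
    return loss_data + loss_noise

# Data from N(0, 0.5^2); noise q is a standard Gaussian.
x_data = rng.normal(0.0, 0.5, size=10_000)
x_noise = rng.normal(0.0, 1.0, size=10_000)
log_q = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
print(nce_loss((4.0, -0.2), x_data, x_noise, log_q))  # near-optimal: w = 1/0.5^2
```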

We develop fast and scalable methods for computing reduced-order nonlinear solutions (RONS). RONS was recently proposed as a framework for reduced-order modeling of time-dependent partial differential equations (PDEs), in which the modes depend nonlinearly on a set of time-varying parameters. RONS uses a set of ordinary differential equations (ODEs) for the parameters to optimally evolve the shape of the modes to adapt to the PDE's solution. This method has already proven extremely effective in tackling challenging problems such as advection-dominated flows and high-dimensional PDEs. However, as the number of parameters grows, integrating the RONS equations, and even forming them, becomes computationally prohibitive. Here, we develop three separate methods to address these computational bottlenecks: symbolic RONS, collocation RONS, and regularized RONS. We demonstrate the efficacy of these methods on two examples: the Fokker-Planck equation in high dimensions and the Kuramoto-Sivashinsky equation. In both cases, we observe improvements of several orders of magnitude in both speed and accuracy. Our proposed methods extend the applicability of RONS beyond reduced-order modeling by making it possible to use RONS for accurate numerical solution of linear and nonlinear PDEs. Finally, as a special case of RONS, we discuss its application to problems where the PDE's solution is approximated by a neural network, with the time-dependent parameters being the weights and biases of the network. The RONS equations dictate the optimal evolution of the network's parameters without requiring any training.
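
A minimal collocation-style sketch of the RONS idea, not the authors' implementation: the parameter velocities are obtained from a least-squares (Gram-matrix) solve that best matches the PDE's right-hand side at collocation points. The advection equation and Gaussian ansatz below are illustrative assumptions.

```python
import numpy as np

def rons_rhs(theta, x, c=1.0):
    """Collocation-style RONS step for the advection equation u_t = -c u_x
    with the Gaussian ansatz u(x; a, b) = a * exp(-(x - b)**2).
    Solves min ||J @ theta_dot - F|| at the collocation points x,
    i.e. the Gram system M theta_dot = f in normal-equation form."""
    a, b = theta
    g = np.exp(-(x - b) ** 2)
    # Jacobian of the ansatz w.r.t. the parameters (columns: d/da, d/db).
    J = np.column_stack([g, 2.0 * a * (x - b) * g])
    # PDE right-hand side evaluated on the ansatz: -c * u_x.
    F = 2.0 * c * a * (x - b) * g
    theta_dot, *_ = np.linalg.lstsq(J, F, rcond=None)
    return theta_dot

# Forward-Euler integration of the parameter ODEs; the exact dynamics here
# are a_dot = 0, b_dot = c (the Gaussian translates at speed c, undeformed).
theta = np.array([1.0, 0.0])
x = np.linspace(-5.0, 5.0, 200)
dt = 1e-2
for _ in range(100):
    theta = theta + dt * rons_rhs(theta, x)
print(theta)  # expect approximately [1.0, 1.0] at t = 1
```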

A lower bound is an important tool for predicting the performance that an estimator can achieve under a particular statistical model. Bayesian bounds are one such family of bounds, which utilize not only the observation statistics but also the prior model information. In reality, however, the true model generating the data is either unknown or simplified when deriving estimators, which motivates the derivation of estimation bounds under model mismatch. This paper provides a derivation of a Bayesian Cramér-Rao bound under model misspecification, defining important concepts, such as the pseudotrue parameter, that were not clearly identified in previous works. The general result is particularized to linear and Gaussian problems, where closed-form expressions are available and are used to validate the general result.
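
For concreteness, one standard way to define the pseudotrue parameter in a misspecified setting; the notation (p for the true density, f_theta for the assumed model) is ours, and the paper's Bayesian variant may additionally average over the prior.

```latex
% The pseudotrue parameter is the model parameter closest to the true
% data-generating density in Kullback-Leibler divergence:
\begin{equation*}
  \theta_{\mathrm{pt}}
  \;=\; \operatorname*{arg\,min}_{\theta}\,
  \mathrm{KL}\bigl(p \,\|\, f_{\theta}\bigr)
  \;=\; \operatorname*{arg\,min}_{\theta}\,
  \mathbb{E}_{p}\!\left[\log \frac{p(x)}{f_{\theta}(x)}\right].
\end{equation*}
```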

Tennenbaum's theorem states that the only countable model of Peano arithmetic (PA) with computable arithmetical operations is the standard model of natural numbers. In this paper, we use constructive type theory as a framework to revisit, analyze and generalize this result. The chosen framework allows for a synthetic approach to computability theory, exploiting that, externally, all functions definable in constructive type theory can be shown computable. We then build on this viewpoint and furthermore internalize it by assuming a version of Church's thesis, which expresses that any function on natural numbers is representable by a formula in PA. This assumption provides for a conveniently abstract setup to carry out rigorous computability arguments, even in the theorem's mechanization. Concretely, we constructivize several classical proofs and present one inherently constructive rendering of Tennenbaum's theorem, all following arguments from the literature. Concerning the classical proofs in particular, the constructive setting allows us to highlight differences in their assumptions and conclusions which are not visible classically. All versions are accompanied by a unified mechanization in the Coq proof assistant.
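
For reference, a common formal statement of the theorem; the notation is assumed, not taken from the paper.

```latex
\begin{theorem}[Tennenbaum]
  Let $\mathcal{M} = (M, \oplus, \otimes)$ be a countable model of
  $\mathrm{PA}$ whose domain is identified with $\mathbb{N}$ via some
  enumeration. If $\oplus$ and $\otimes$ are computable functions, then
  $\mathcal{M}$ is isomorphic to the standard model $(\mathbb{N}, +, \times)$.
\end{theorem}
```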

Federated learning is one of the most appealing alternatives to the standard centralized learning paradigm, allowing a heterogeneous set of devices to train a machine learning model without sharing their raw data. However, it requires a central server to coordinate the learning process, thus introducing potential scalability and security issues. In the literature, server-less federated learning approaches such as gossip federated learning and blockchain-enabled federated learning have been proposed to mitigate these issues. In this work, we provide a complete overview of these three techniques and compare them according to an integral set of performance indicators, including model accuracy, time complexity, communication overhead, convergence time, and energy consumption. An extensive simulation campaign permits a quantitative analysis considering both feedforward and convolutional neural network models. Results show that gossip federated learning and the standard federated solution are able to reach a similar level of accuracy, and that their energy consumption is influenced by the machine learning model adopted, the software library, and the hardware used. By contrast, blockchain-enabled federated learning represents a viable solution for implementing decentralized learning with a higher level of security, at the cost of extra energy usage and data sharing. Finally, we identify open issues in the two decentralized federated learning implementations and provide insights into potential extensions and possible research directions in this new research field.
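
To illustrate the structural difference between the centralized and server-less paradigms compared above, here is a toy parameter-averaging sketch; it deliberately ignores local training, stragglers, and blockchain consensus, and all sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy setting: each of n clients holds a local parameter vector; both
# protocols aim to drive all clients toward the global average.
n_clients, dim = 8, 4
params = rng.normal(size=(n_clients, dim))

def fedavg_round(params):
    """Centralized federated averaging: a server collects all client
    parameters and broadcasts their mean back (one communication round)."""
    global_model = params.mean(axis=0)
    return np.tile(global_model, (len(params), 1))

def gossip_round(params, n_exchanges=8):
    """Server-less gossip averaging: random pairs of clients exchange
    parameters and average them, with no central coordinator."""
    params = params.copy()
    for _ in range(n_exchanges):
        i, j = rng.choice(len(params), size=2, replace=False)
        mean = 0.5 * (params[i] + params[j])
        params[i] = params[j] = mean
    return params

spread = lambda p: np.linalg.norm(p - p.mean(axis=0))
print("initial spread:", spread(params))
print("after FedAvg:  ", spread(fedavg_round(params)))  # exactly 0
p = params
for _ in range(20):
    p = gossip_round(p)
print("after gossip:  ", spread(p))                     # approaches 0
```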

The need to solve analytically intractable partial differential equations (PDEs) that involve at least one variable defined on an unbounded domain arises in numerous physical applications. Accurately solving unbounded domain PDEs requires efficient numerical methods that can resolve the dependence of the PDE on the unbounded variable over at least several orders of magnitude. We propose a solution to such problems by combining two classes of numerical methods: (i) adaptive spectral methods and (ii) physics-informed neural networks (PINNs). The numerical approach that we develop takes advantage of the ability of physics-informed neural networks to easily implement high-order numerical schemes to efficiently solve PDEs and to extrapolate numerical solutions at any point in space and time. We then show how recently introduced adaptive techniques for spectral methods can be integrated into PINN-based PDE solvers to obtain numerical solutions of unbounded domain problems that cannot be efficiently approximated by standard PINNs. Through a number of examples, we demonstrate the advantages of the proposed spectrally adapted PINNs in solving PDEs and estimating model parameters from noisy observations in unbounded domains.
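
A minimal PINN sketch in the spirit of the approach, shown on a bounded toy problem since the adaptive spectral treatment of the unbounded variable is the paper's contribution and is omitted here; the network size, optimizer settings, and collocation scheme are illustrative.

```python
import torch

torch.manual_seed(0)

# Minimal PINN for the bounded toy problem u''(x) = -pi^2 sin(pi x) on [0, 1]
# with u(0) = u(1) = 0 (exact solution u = sin(pi x)).
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(5000):
    x = torch.rand(128, 1, requires_grad=True)  # interior collocation points
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    pde_residual = d2u + torch.pi**2 * torch.sin(torch.pi * x)
    xb = torch.tensor([[0.0], [1.0]])           # boundary points
    loss = (pde_residual**2).mean() + (net(xb)**2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(net(torch.tensor([[0.5]])).item())  # should approach sin(pi/2) = 1
```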

Langevin dynamics are widely used in sampling high-dimensional, non-Gaussian distributions whose densities are known up to a normalizing constant. In particular, there is strong interest in unadjusted Langevin algorithms (ULA), which directly discretize Langevin dynamics to estimate expectations over the target distribution. We study the use of transport maps that approximately normalize a target distribution as a way to precondition and accelerate the convergence of Langevin dynamics. We show that in continuous time, when a transport map is applied to Langevin dynamics, the result is a Riemannian manifold Langevin dynamics (RMLD) with metric defined by the transport map. This connection suggests more systematic ways of learning metrics, and also yields alternative discretizations of the RMLD described by the map, which we study. Moreover, we show that under certain conditions, when the transport map is used in conjunction with ULA, we can improve the geometric rate of convergence of the output process in the 2-Wasserstein distance. Illustrative numerical results complement our theoretical claims.
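
A small sketch of the map-preconditioned ULA idea: run ULA on the pullback of the target through a transport map T and push the samples forward. For illustration we use a linear map built from the exact Cholesky factor of a Gaussian target's covariance, which is an idealization; in practice the map would be learned approximately.

```python
import numpy as np

rng = np.random.default_rng(4)

# Target: a poorly conditioned zero-mean Gaussian, density known up to a constant.
Sigma = np.array([[1.0, 0.0], [0.0, 100.0]])
Sigma_inv = np.linalg.inv(Sigma)
grad_log_pi = lambda x: -Sigma_inv @ x

# Linear transport map T(y) = L y; here L is the exact Cholesky factor,
# standing in for a learned approximate normalizing map.
L = np.linalg.cholesky(Sigma)

def ula(grad_log_target, x0, eps, n_steps):
    """Unadjusted Langevin algorithm: Euler-Maruyama on the Langevin SDE."""
    x = x0.copy()
    samples = []
    for _ in range(n_steps):
        x = x + eps * grad_log_target(x) \
              + np.sqrt(2 * eps) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)

# Pullback of the target through T has gradient L^T grad_log_pi(L y); run ULA
# in y-space and push the samples forward through the map.
grad_log_pullback = lambda y: L.T @ grad_log_pi(L @ y)
ys = ula(grad_log_pullback, np.zeros(2), eps=0.1, n_steps=20_000)
xs = ys @ L.T
print("sample covariance:\n", np.cov(xs[5_000:].T))  # approx Sigma
```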

The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
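
A minimal sketch of the implicit-regularization phenomenon mentioned above: in overparametrized linear regression, gradient descent initialized at zero interpolates the data and converges to the minimum-norm interpolant, because its iterates never leave the row space of the design matrix. Problem sizes and the learning rate are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)

# Overparametrized linear regression: more parameters (d) than samples (n),
# so infinitely many interpolating solutions exist.
n, d = 20, 100
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Gradient descent on the squared loss, initialized at zero.
w = np.zeros(d)
lr = 1e-2
for _ in range(50_000):
    w -= lr * X.T @ (X @ w - y) / n

# The minimum-norm interpolant, computed directly via the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y

print("training residual:", np.linalg.norm(X @ w - y))                    # ~0
print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))  # ~0
```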

Deep learning is usually described as an experiment-driven field under continual criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of literature, which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized into six groups: (1) complexity- and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns regarding ethics and security and their relationship with generalizability.
