In many numerical simulations, stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs), but to this day it remains an open research problem to provide a mathematical convergence analysis which rigorously explains the success of SGD type optimization methods in the training of DNNs. In this work we study SGD type optimization methods in the training of fully-connected feedforward DNNs with rectified linear unit (ReLU) activation. We first establish general regularity properties for the risk functions and their generalized gradient functions appearing in the training of such DNNs and, thereafter, we investigate the plain vanilla SGD optimization method in the training of such DNNs under the assumption that the target function under consideration is a constant function. Specifically, we prove that if the learning rates (the step sizes of the SGD optimization method) are sufficiently small but not $L^1$-summable, and if the target function is a constant function, then the expectation of the risk of the considered SGD process converges to zero as the number of SGD steps increases to infinity.
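As a quick numerical companion to this statement (our sketch, not a construction from the paper), the following trains a shallow ReLU network by plain vanilla SGD on a constant target with step sizes $\gamma_n = 0.1\, n^{-3/4}$, which are small but not $L^1$-summable; all sizes, seeds, and constants are illustrative.

```python
# Illustrative sketch: plain vanilla SGD on a one-hidden-layer ReLU network
# with a constant target function and non-summable step sizes gamma_n.
import numpy as np

rng = np.random.default_rng(0)
d, width, target = 2, 16, 1.0              # input dim, hidden width, constant target
W = rng.normal(scale=0.5, size=(width, d))
b = rng.normal(scale=0.5, size=width)
v = rng.normal(scale=0.5, size=width)

def forward(x):
    h = np.maximum(W @ x + b, 0.0)         # ReLU hidden layer
    return v @ h, h

for n in range(1, 100_001):
    x = rng.uniform(-1.0, 1.0, size=d)     # one fresh sample per SGD step
    y_hat, h = forward(x)
    err = y_hat - target                   # gradient of 0.5 * (y_hat - target)^2
    gamma = 0.1 / n ** 0.75                # small but not L^1-summable
    active = (h > 0.0).astype(float)       # generalized gradient through ReLU
    grad_v = err * h
    grad_W = err * np.outer(v * active, x)
    grad_b = err * v * active
    v -= gamma * grad_v
    W -= gamma * grad_W
    b -= gamma * grad_b

# Monte Carlo estimate of the risk E[(f(X) - target)^2] after training.
xs = rng.uniform(-1.0, 1.0, size=(10_000, d))
risk = np.mean([(forward(x)[0] - target) ** 2 for x in xs])
print(f"estimated risk after training: {risk:.2e}")
```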

Related content

We propose a generalization of nonlinear stability of numerical one-step integrators to Riemannian manifolds in the spirit of Butcher's notion of B-stability. Taking inspiration from Simpson-Porco and Bullo, we introduce non-expansive systems on such manifolds and define B-stability of integrators. In this first exposition, we provide concrete results for a geodesic version of the Implicit Euler (GIE) scheme. We prove that the GIE method is B-stable on Riemannian manifolds with non-positive sectional curvature. We show through numerical examples that the GIE method is expansive when applied to a certain non-expansive vector field on the 2-sphere, and that the GIE method does not necessarily possess a unique solution for large enough step sizes. Finally, we derive a new improved global error estimate for general Lie group integrators.
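For concreteness, here is a sketch of one natural reading of a geodesic Implicit Euler step on the unit 2-sphere, defining $y_{n+1}$ by $\mathrm{Log}_{y_{n+1}}(y_n) = -h\,f(y_{n+1})$ and solving it by a naive fixed-point iteration; this formulation, the rotation vector field, and the step size are our illustrative assumptions, not code from the paper.

```python
# Sketch of a geodesic Implicit Euler (GIE) step on the unit 2-sphere.
import numpy as np

def exp_map(p, v):
    """Riemannian exponential on S^2: follow the geodesic from p with velocity v."""
    nv = np.linalg.norm(v)
    if nv < 1e-14:
        return p
    return np.cos(nv) * p + np.sin(nv) * v / nv

def log_map(p, q):
    """Inverse of exp_map: tangent vector at p pointing along the geodesic to q."""
    c = np.clip(p @ q, -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-14:
        return np.zeros(3)
    w = q - c * p
    return theta * w / np.linalg.norm(w)

def f(y):
    """Rotation field f(y) = omega x y: an isometric, hence non-expansive, flow."""
    omega = np.array([0.0, 0.0, 1.0])
    return np.cross(omega, y)

def gie_step(y0, h, iters=50):
    """Solve Log_{y1}(y0) + h f(y1) = 0 by fixed-point iteration (small h)."""
    y1 = y0.copy()
    for _ in range(iters):
        y1 = exp_map(y1, log_map(y1, y0) + h * f(y1))
        y1 /= np.linalg.norm(y1)           # project back onto the sphere
    return y1

y = np.array([1.0, 0.0, 0.0])
for _ in range(10):
    y = gie_step(y, h=0.1)
print("after 10 GIE steps:", y)
```

In the Euclidean case this fixed-point map reduces exactly to the Picard iteration $y_1 \leftarrow y_0 + h f(y_1)$ for implicit Euler; as the abstract notes, for large enough step sizes such an implicit equation need not have a unique solution.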

The expected Euler characteristic (EEC) method is an integral-geometric method used to approximate the tail probability of the maximum of a random field on a manifold. Noting that the largest eigenvalue of a real-symmetric or Hermitian matrix is the maximum of the quadratic form of a unit vector, we provide EEC approximation formulas for the tail probability of the largest eigenvalue of orthogonally invariant random matrices of a large class. For this purpose, we propose a version of a skew-orthogonal polynomial by adding a side condition such that it is uniquely defined, and describe the EEC formulas in terms of the (skew-)orthogonal polynomials. In addition, for the classical random matrices (Gaussian, Wishart, and multivariate beta matrices), we analyze the limiting behavior of the EEC approximation as the matrix size goes to infinity under the so-called edge-asymptotic normalization. It is shown that the limit of the EEC formula approximates well the Tracy-Widom distributions in the upper tail area, as does the EEC formula when the matrix size is finite.
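The EEC formulas themselves are the paper's contribution; the sketch below only sets up the target quantity by Monte Carlo: the upper-tail probability of the largest GOE eigenvalue under the edge-asymptotic normalization $(\lambda_{\max} - 2\sqrt{n})\, n^{1/6}$ mentioned above. Matrix size, replication count, and thresholds are illustrative.

```python
# Monte Carlo reference for the quantity the EEC method approximates:
# the tail probability of the largest eigenvalue of a GOE matrix,
# under the edge-asymptotic (Tracy-Widom) normalization.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000

def goe_lambda_max(n):
    a = rng.normal(size=(n, n))
    h = (a + a.T) / np.sqrt(2.0)   # GOE: N(0,2) diagonal, N(0,1) off-diagonal
    return np.linalg.eigvalsh(h)[-1]

lam = np.array([goe_lambda_max(n) for _ in range(reps)])
s = (lam - 2.0 * np.sqrt(n)) * n ** (1.0 / 6.0)   # edge-asymptotic normalization

for x in (0.5, 1.0, 2.0):
    print(f"P(normalized lambda_max > {x}) ~= {np.mean(s > x):.3f}")
```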

Discrete latent space models have recently achieved performance on par with their continuous counterparts in deep variational inference. While they still face various implementation challenges, these models offer the opportunity for a better interpretation of latent spaces, as well as a more direct representation of naturally discrete phenomena. Most recent approaches propose to separately train very high-dimensional prior models on the discrete latent data, which is a challenging task in its own right. In this paper, we introduce a latent data model where the discrete state is a Markov chain, which allows fast end-to-end training. The performance of our generative model is assessed on a building management dataset and on the publicly available Electricity Transformer Dataset.
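To make the prior concrete, here is a minimal sketch (our illustration, not the paper's architecture) of a discrete-state Markov chain latent prior whose log-probability is differentiable in unconstrained logits and can therefore be trained end-to-end with the rest of a variational model; `K` and `T` are arbitrary.

```python
# Sketch: Markov chain prior over a discrete latent state sequence,
# parameterized by unconstrained logits so gradients flow end-to-end.
import torch

K, T = 8, 50                               # number of states, sequence length
init_logits = torch.zeros(K, requires_grad=True)
trans_logits = torch.zeros(K, K, requires_grad=True)

def markov_log_prob(z):
    """log p(z_1..z_T) for an integer state sequence z under the chain prior."""
    log_pi = torch.log_softmax(init_logits, dim=0)
    log_A = torch.log_softmax(trans_logits, dim=1)   # row k: p(z_t | z_{t-1}=k)
    lp = log_pi[z[0]]
    for t in range(1, len(z)):
        lp = lp + log_A[z[t - 1], z[t]]
    return lp

z = torch.randint(0, K, (T,))              # e.g. a sampled/quantized latent path
loss = -markov_log_prob(z)
loss.backward()                            # gradients reach the prior parameters
print(float(loss))
```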

Uniform error estimates of a bi-fidelity method for a kinetic-fluid coupled model with random initial inputs in the fine particle regime are proved in this paper. Such a model is a system coupling the incompressible Navier-Stokes equations to the Vlasov-Fokker-Planck equations for a mixture of flows with distinct particle sizes. The main analytic tool is the hypocoercivity analysis for the multi-phase Navier-Stokes-Vlasov-Fokker-Planck system with uncertainties, considering solutions in a perturbative setting near the global equilibrium. This allows us to obtain the error estimates in both kinetic and hydrodynamic regimes.
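While the paper's contribution is analytic, a standard bi-fidelity construction of the kind it studies can be sketched generically: select a few informative parameter points with the cheap model, run the expensive model only there, and reuse the low-fidelity interpolation coefficients. Here `solve_low` and `solve_high` stand in for cheap and expensive solvers of the coupled system and are assumptions of this illustration, not code from the paper.

```python
# Generic bi-fidelity sketch: greedy point selection on low-fidelity
# snapshots, then high-fidelity reconstruction with low-fidelity coefficients.
import numpy as np

def bifidelity_surrogate(samples, solve_low, solve_high, m):
    UL = np.stack([solve_low(z) for z in samples], axis=1)   # low-fi snapshots
    chosen, R = [], UL.copy()
    for _ in range(m):                      # greedy pivoting with deflation
        k = int(np.argmax(np.linalg.norm(R, axis=0)))
        chosen.append(k)
        q = R[:, k] / np.linalg.norm(R[:, k])
        R -= np.outer(q, q @ R)
    ULm = UL[:, chosen]
    UHm = np.stack([solve_high(samples[k]) for k in chosen], axis=1)

    def evaluate(z):
        c, *_ = np.linalg.lstsq(ULm, solve_low(z), rcond=None)
        return UHm @ c          # reuse low-fi coefficients on the high-fi basis
    return evaluate

# Toy check with analytic stand-in "solvers" on a parameter grid.
zs = np.linspace(0.0, 1.0, 40)
x = np.linspace(0.0, 1.0, 64)
low = lambda z: np.sin(np.pi * x * z)                    # cheap model
high = lambda z: np.sin(np.pi * x * z) + 0.05 * x * z    # expensive model
surrogate = bifidelity_surrogate(zs, low, high, m=5)
print("max error at z=0.37:", np.max(np.abs(surrogate(0.37) - high(0.37))))
```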

Bayesian additive regression trees (BART) is a non-parametric method to approximate functions. It is a black-box method based on a sum of many trees, where priors are used to regularize inference, mainly by restricting each tree's learning capacity so that no individual tree is able to explain the data on its own; rather, the sum of trees does. We discuss BART in the context of probabilistic programming languages (PPL), i.e., we present BART as a primitive that can be used as a component of a probabilistic model rather than as a standalone model. Specifically, we introduce the Python library PyMC-BART, which works by extending PyMC, a library for probabilistic programming. We showcase a few examples of models that can be built using PyMC-BART, discuss recommendations for the selection of hyperparameters, and finally, we close with limitations of our implementation and future directions for improvement.
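A usage sketch along the lines of the PyMC-BART documentation, with BART as one component (the mean function) of a larger PyMC model; the synthetic data and the number of trees `m` are illustrative choices.

```python
# BART as a model component inside PyMC, via the PyMC-BART extension.
import numpy as np
import pymc as pm
import pymc_bart as pmb

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.2, size=200)

with pm.Model() as model:
    mu = pmb.BART("mu", X, y, m=50)          # sum-of-trees prior on the mean
    sigma = pm.HalfNormal("sigma", 1.0)      # ordinary PyMC prior alongside BART
    pm.Normal("obs", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample()                      # BART variables get a specialized sampler
```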

We investigate non-wellfounded proof systems based on parsimonious logic, a weaker variant of linear logic where the exponential modality ! is interpreted as a constructor for streams over finite data. Logical consistency is maintained at a global level by adapting a standard progressing criterion. We present an infinitary version of cut-elimination based on finite approximations, and we prove that, in the presence of the progressing criterion, it returns well-defined non-wellfounded proofs in the limit. Furthermore, we show that cut-elimination preserves the progressing criterion and various regularity conditions internalizing degrees of proof-theoretical uniformity. Finally, we provide a denotational semantics for our systems based on the relational model.

A finite element discretization is developed for the Cai-Hu model, describing the formation of biological networks. The model consists of a nonlinear elliptic equation for the pressure $p$ and a nonlinear reaction-diffusion equation for the conductivity tensor $\mathbb{C}$. The problem requires high resolution due to the presence of multiple scales, the stiffness in all its components, and the nonlinearities. We propose a low-order finite element discretization in space coupled with a semi-implicit time-advancing scheme. The code is verified with several numerical tests performed with various choices for the parameters involved in the system. In the absence of an exact solution, we apply the Richardson extrapolation technique to estimate the order of the method.
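The order estimate via Richardson extrapolation needs only solutions on three nested grids: with spacings $h$, $h/2$, $h/4$, the observed order is $p \approx \log_2\left(|u_h - u_{h/2}| / |u_{h/2} - u_{h/4}|\right)$. A minimal sketch of that standard computation, with placeholder scalar values in lieu of actual simulation output:

```python
# Observed order of convergence from three nested grids, without an exact
# solution (standard Richardson extrapolation). The u_* values below are
# placeholders for a scalar functional of the computed solution.
import math

u_h, u_h2, u_h4 = 1.05000, 1.01300, 1.00330   # illustrative values only

p = math.log2(abs(u_h - u_h2) / abs(u_h2 - u_h4))
u_star = u_h4 + (u_h4 - u_h2) / (2 ** p - 1)  # extrapolated limit value
print(f"observed order ~= {p:.2f}, extrapolated value ~= {u_star:.5f}")
```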

Making inference with spatial extremal dependence models can be computationally burdensome since they involve intractable and/or censored likelihoods. Building on recent advances in likelihood-free inference with neural Bayes estimators, that is, neural networks that approximate Bayes estimators, we develop highly efficient estimators for censored peaks-over-threshold models that encode censoring information in the neural network architecture. Our new method provides a paradigm shift that challenges traditional censored likelihood-based inference methods for spatial extremal dependence models. Our simulation studies highlight significant gains in both computational and statistical efficiency, relative to competing likelihood-based approaches, when applying our novel estimators to make inference with popular extremal dependence models, such as max-stable, $r$-Pareto, and random scale mixture process models. We also illustrate that it is possible to train a single neural Bayes estimator for a general censoring level, obviating the need to retrain the network when the censoring level is changed. We illustrate the efficacy of our estimators by making fast inference on hundreds of thousands of high-dimensional spatial extremal dependence models to assess extreme concentrations of particulate matter with diameter 2.5 microns or less (PM2.5) over the whole of Saudi Arabia.
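Conceptually, a neural Bayes estimator is trained on simulated (parameter, data) pairs. The sketch below (far simpler than the paper's censoring-aware architecture) censors a toy dataset below a level `tau`, appends the censoring indicators as input features, and minimizes squared loss so the network approximates the posterior mean; everything here, from the toy model to the network shape, is an illustrative assumption.

```python
# Toy neural Bayes estimator with censoring information encoded in the input.
import torch

torch.manual_seed(0)

def simulate(batch, n=100):
    theta = torch.rand(batch, 1) * 2.0         # toy prior: Uniform(0, 2)
    x = torch.randn(batch, n) * theta          # toy model: N(0, theta^2) samples
    tau = 0.5                                  # censoring level
    censored = x < tau
    x = torch.where(censored, torch.full_like(x, tau), x)
    # Encode censoring information as an extra indicator channel.
    return torch.cat([x, censored.float()], dim=1), theta

net = torch.nn.Sequential(
    torch.nn.Linear(200, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    inputs, theta = simulate(256)              # fresh simulations each step
    loss = torch.mean((net(inputs) - theta) ** 2)  # Bayes estimator under L2 loss
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final training loss:", float(loss))
```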

We describe a new dependent-rounding algorithmic framework for bipartite graphs. Given a fractional assignment $y$ of values to edges of graph $G = (U \cup V, E)$, the algorithms return an integral solution $Y$ such that each right-node $v \in V$ has at most one neighboring edge $f$ with $Y_f = 1$, and where the variables $Y_e$ also satisfy broad nonpositive-correlation properties. In particular, for any edges $e_1, e_2$ sharing a left-node $u \in U$, the variables $Y_{e_1}, Y_{e_2}$ have strong negative-correlation properties, i.e., the expectation of $Y_{e_1} Y_{e_2}$ is significantly below $y_{e_1} y_{e_2}$. This algorithm is a refinement of a dependent-rounding algorithm of Im \& Shadloo (2020) based on simulation of Poisson processes. Our algorithm allows greater flexibility; in particular, it allows ``irregular'' fractional assignments and gives more refined bounds on the negative correlation. Dependent rounding schemes with negative-correlation properties have been used in approximation algorithms for job scheduling on unrelated machines to minimize weighted completion times (Bansal, Srinivasan, \& Svensson (2021), Im \& Shadloo (2020), Im \& Li (2023)). Using our new dependent-rounding algorithm, among other improvements, we obtain a $1.407$-approximation for this problem, significantly improving over the prior $1.45$-approximation ratio of Im \& Li (2023).
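The Poisson-process-based algorithm itself is involved; as a toy illustration of the negative-correlation property at stake (not the paper's scheme), the classic systematic-rounding trick below rounds the fractional values on edges sharing one left-node using a single shared uniform offset, preserving marginals exactly while driving $\mathbb{E}[Y_{e_1} Y_{e_2}]$ below $y_{e_1} y_{e_2}$.

```python
# Monte Carlo illustration of negatively correlated rounding via systematic
# sampling: one shared uniform offset rounds all edges at a left-node u.
import numpy as np

rng = np.random.default_rng(0)
y = np.array([0.4, 0.3, 0.2])            # fractional values on u's edges
cum = np.concatenate([[0.0], np.cumsum(y)])

def systematic_round(y, cum):
    u = rng.uniform()                    # one shared offset for all edges
    hits = u + np.arange(int(np.ceil(cum[-1])))   # points u, u+1, u+2, ...
    return np.array([np.any((cum[i] <= hits) & (hits < cum[i + 1]))
                     for i in range(len(y))], dtype=float)

trials = np.stack([systematic_round(y, cum) for _ in range(100_000)])
print("marginals:", trials.mean(axis=0), "(target:", y, ")")
print("E[Y0*Y1] :", (trials[:, 0] * trials[:, 1]).mean(),
      "vs y0*y1 =", y[0] * y[1])
```

Since the cumulative intervals of $e_1$ and $e_2$ are disjoint, here $Y_{e_1} Y_{e_2} = 0$ whenever $y_{e_1} + y_{e_2} \le 1$, the strongest form of negative correlation; the paper's contribution is obtaining refined bounds of this flavor in the general assignment setting.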

Kleene's computability theory based on the S1-S9 computation schemes constitutes a model for computing with objects of any finite type and extends Turing's 'machine model', which formalises computing with real numbers. A fundamental distinction in Kleene's framework is between normal and non-normal functionals, where the former compute the associated Kleene quantifier $\exists^n$ and the latter do not. Historically, the focus was on normal functionals, but recently new non-normal functionals have been studied based on well-known theorems, the weakest among which seems to be the uncountability of the reals. These new non-normal functionals are fundamentally different from historical examples like Tait's fan functional: the latter is computable from $\exists^2$, while the former are computable in $\exists^3$ but not in weaker oracles. Of course, there is a great divide or abyss separating $\exists^2$ and $\exists^3$, and we identify slight variations of our new non-normal functionals that are again computable in $\exists^2$, i.e., fall on different sides of this abyss. Our examples are based on mainstream mathematical notions, like quasi-continuity, Baire classes, bounded variation, and semi-continuity from real analysis.
