MoDELS · Flat minima · Performer · Local minima · Saddle points ·

January 31, 2023

An SDE for Modeling SAM: Theory and Insights

Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, Frank Norbert Proske, Hans Kersting, Aurelien Lucchi

We study the SAM (Sharpness-Aware Minimization) optimizer, which has recently attracted a lot of interest due to its improved performance over more classical variants of stochastic gradient descent. Our main contribution is the derivation of continuous-time models (in the form of SDEs) for SAM and two of its variants, in both the full-batch and mini-batch settings. We demonstrate that these SDEs are rigorous approximations of the real discrete-time algorithms (in a weak sense, scaling linearly with the step size). Using these models, we then offer an explanation of why SAM prefers flat minima over sharp ones: we show that it minimizes an implicitly regularized loss with a Hessian-dependent noise structure. Finally, we prove that, perhaps unexpectedly, SAM is attracted to saddle points under some realistic conditions. Our theoretical results are supported by detailed experiments.
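
For reference, the discrete full-batch SAM update that such SDEs approximate is commonly written as $w_{k+1} = w_k - \eta \nabla f\big(w_k + \rho \nabla f(w_k)/\|\nabla f(w_k)\|\big)$. The sketch below implements that update on a hypothetical two-dimensional quadratic loss with one flat and one sharp direction. The loss, the step size `lr` and the radius `rho` are illustrative assumptions. This is not the authors' SDE model or code.

```python
import math

def grad_quadratic(w, curvatures):
    # Gradient of the hypothetical toy loss f(w) = 0.5 * sum_i h_i * w_i**2
    return [h * wi for h, wi in zip(curvatures, w)]

def sam_step(w, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    """One full-batch SAM step: move to w + rho * g / ||g||, then descend
    using the gradient evaluated at that perturbed point."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + eps  # eps guards against a zero gradient
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

def gd_step(w, grad_fn, lr=0.1):
    # Plain gradient descent step, kept only for comparison
    return [wi - lr * gi for wi, gi in zip(w, grad_fn(w))]

if __name__ == "__main__":
    curv = [1.0, 10.0]  # flat direction (h = 1) and sharp direction (h = 10)
    grad = lambda w: grad_quadratic(w, curv)
    w = [1.0, 1.0]
    for _ in range(50):
        w = sam_step(w, grad)
    print(w)
```

The only difference from `gd_step` is where the gradient is evaluated: SAM uses the gradient at an adversarially perturbed point, which is what makes its dynamics sensitive to local sharpness.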

Related content

MoDELS

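The 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering and is organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience in modeling and model-driven software and systems. This year's edition will give the modeling community the opportunity to further advance the foundations of modeling. It will also feature innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability.

Flat minima · Local minimum points · Continuity · Global minimum ·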

March 23, 2023

Symmetries, flat minima, and the conserved quantities of gradient flow

Bo Zhao, Iordan Ganev, Robin Walters, Rose Yu, Nima Dehmamy

From arXiv. To appear at ICLR 2023.

Empirical studies of the loss landscape of deep networks have revealed that many local minima are connected through low-loss valleys. Yet, little is known about the theoretical origin of such valleys. We present a general framework for finding continuous symmetries in the parameter space, which carve out low-loss valleys. Our framework uses equivariances of the activation functions and can be applied to different layer architectures. To generalize this framework to nonlinear neural networks, we introduce a novel set of nonlinear, data-dependent symmetries. These symmetries can transform a trained model such that it performs similarly on new samples, which allows ensemble building that improves robustness under certain adversarial attacks. We then show that conserved quantities associated with linear symmetries can be used to define coordinates along low-loss valleys. The conserved quantities help reveal that using common initialization methods, gradient flow only explores a small part of the global minimum. By relating conserved quantities to convergence rate and sharpness of the minimum, we provide insights on how initialization impacts convergence and generalizability.
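
The simplest instance of a conserved quantity tied to a continuous symmetry is a scalar two-layer linear model $L(u, v) = \tfrac12 (uv - y)^2$. This loss is invariant under the rescaling $u \to g u,\ v \to v/g$. Under gradient flow, $\tfrac{d}{dt}(u^2 - v^2) = -2(uv-y)uv + 2(uv-y)uv = 0$, so $Q = u^2 - v^2$ is conserved. The sketch below checks this numerically with small-step gradient descent. The model, initialization and step size are illustrative assumptions; the paper treats general matrix-valued layers.

```python
def grad(u, v, y):
    """Gradient of L(u, v) = 0.5 * (u * v - y)**2 for a toy two-layer scalar linear model."""
    r = u * v - y          # residual
    return r * v, r * u    # (dL/du, dL/dv)

def run_gradient_descent(u, v, y, lr=1e-3, n_steps=20000):
    """Small-step gradient descent as a discretization of gradient flow.
    Returns the final (u, v) and Q = u**2 - v**2 at the start and at the end."""
    q_start = u * u - v * v
    for _ in range(n_steps):
        gu, gv = grad(u, v, y)
        u, v = u - lr * gu, v - lr * gv
    return u, v, q_start, u * u - v * v

if __name__ == "__main__":
    u, v, q0, q1 = run_gradient_descent(2.0, 1.0, 1.0)
    print(f"u*v = {u * v:.6f}, Q start = {q0:.6f}, Q end = {q1:.6f}")
```

With a finite step, one update multiplies $Q$ by $1 - \eta^2 r^2$, where $r = uv - y$. The drift is therefore second order in the step size and vanishes in the gradient-flow limit. Because $Q$ is fixed by the initialization, it also determines which point of the valley $uv = y$ the dynamics reach.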

Underestimation · Factorized · Approximation · Vector space · Variance ·

March 23, 2023

The Shrinkage-Delinkage Trade-off: An Analysis of Factorized Gaussian Approximations for Variational Inference

Charles C. Margossian, Lawrence K. Saul

When factorized approximations are used for variational inference (VI), they tend to underestimate the uncertainty (as measured in various ways) of the distributions they are meant to approximate. We consider two popular ways to measure the uncertainty deficit of VI: (i) the degree to which it underestimates the componentwise variance, and (ii) the degree to which it underestimates the entropy. To better understand these effects, and the relationship between them, we examine an informative setting where they can be explicitly (and elegantly) analyzed: the approximation of a Gaussian $p$ with a dense covariance matrix by a Gaussian $q$ with a diagonal covariance matrix. We prove that $q$ always underestimates both the componentwise variance and the entropy of $p$, though not necessarily to the same degree. Moreover, we demonstrate that the entropy of $q$ is determined by the trade-off between two competing forces. It is decreased by the shrinkage of its componentwise variances (our first measure of uncertainty), but it is increased by the factorized approximation, which delinks the nodes in the graphical model of $p$. We study various manifestations of this trade-off. In one notable case, as the dimension of the problem grows, the per-component entropy gap between $p$ and $q$ becomes vanishingly small even though $q$ underestimates every componentwise variance by a constant multiplicative factor. We also use the shrinkage-delinkage trade-off to bound the entropy gap in terms of the problem dimension and the condition number of the correlation matrix of $p$. Finally, we present empirical results on both Gaussian and non-Gaussian targets, the former to validate our analysis and the latter to explore its limitations.
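
A two-dimensional worked example makes the setting concrete. It assumes the usual VI objective $\mathrm{KL}(q \,\|\, p)$. For Gaussian $p = \mathcal{N}(\mu, \Sigma)$, the optimal diagonal $q$ then has variances $1/(\Sigma^{-1})_{ii}$, which never exceed $\Sigma_{ii}$. The sketch computes these variances and both entropies $\tfrac12 \log\big((2\pi e)^d \det\big)$. The function names are hypothetical.

```python
import math

def factorized_q_2d(sigma):
    """Variances of the optimal diagonal Gaussian q minimizing KL(q || p), p = N(mu, sigma), in 2-D.
    The optimum matches the diagonal of the precision matrix: var_i = 1 / (sigma^{-1})_{ii}."""
    (a, c), (c2, b) = sigma
    assert c == c2, "covariance must be symmetric"
    det = a * b - c * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    # (sigma^{-1})_{11} = b / det and (sigma^{-1})_{22} = a / det
    return [det / b, det / a]

def gaussian_entropy(det, d):
    # Entropy of a d-dimensional Gaussian whose covariance has determinant det
    return 0.5 * d * math.log(2.0 * math.pi * math.e) + 0.5 * math.log(det)

def entropies(sigma):
    """Return (H(p), H(q)) for the dense p and its optimal factorized approximation q."""
    (a, c), (_, b) = sigma
    var_q = factorized_q_2d(sigma)
    return gaussian_entropy(a * b - c * c, 2), gaussian_entropy(var_q[0] * var_q[1], 2)

if __name__ == "__main__":
    rho = 0.8
    sigma = [[1.0, rho], [rho, 1.0]]
    print("q variances:", factorized_q_2d(sigma))  # each equals 1 - rho**2
    print("H(p), H(q):", entropies(sigma))
```

With unit variances and correlation $\rho$, each component of $q$ has variance $1 - \rho^2$. The entropy gap is $\tfrac12 \log\big(1/(1-\rho^2)\big)$, which is smaller than the naive effect of shrinking both variances, $\log\big(1/(1-\rho^2)\big)$. The difference is the delinkage term, which pushes the entropy of $q$ back up.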

State space · Class · Approximation · Ensemble · MoDELS ·

March 23, 2023

Sequential discretisation schemes for a class of stochastic differential equations and their application to Bayesian filtering

Deniz Akyildiz, Dan Crisan, Joaquin Miguez

We introduce a predictor-corrector discretisation scheme for the numerical integration of a class of stochastic differential equations and prove that it converges with weak order 1.0. The key feature of the new scheme is that it builds up sequentially (and recursively) in the dimension of the state space of the solution, which makes it suitable for approximating high-dimensional state space models. Using the stochastic Lorenz 96 system as a test model, we show that the proposed method can operate with larger time steps than the standard Euler-Maruyama scheme and can therefore generate valid approximations at a smaller computational cost. We also provide a theoretical analysis of the error incurred by the new predictor-corrector scheme when it is used as a building block for discrete-time Bayesian filters for continuous-time systems. Finally, we assess the performance of several ensemble Kalman filters that incorporate either the proposed sequential predictor-corrector Euler scheme or the standard Euler-Maruyama method. The numerical experiments show that the filters employing the new sequential scheme can operate with larger time steps, smaller Monte Carlo ensembles and noisier systems.
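
The abstract does not spell out the sequential predictor-corrector scheme, so the sketch below shows only the baseline it is compared against. This is the standard Euler-Maruyama step applied to a stochastic Lorenz 96 model. It assumes additive noise, $dX_i = \big[(X_{i+1} - X_{i-2})X_{i-1} - X_i + F\big]\,dt + \sigma\, dW_i$ with cyclic indices, and the default forcing $F = 8$. The noise model and parameter values are assumptions, not taken from the paper.

```python
import math
import random

def lorenz96_drift(x, forcing=8.0):
    """Lorenz 96 drift: dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, with cyclic indices."""
    n = len(x)
    return [(x[(i + 1) % n] - x[(i - 2) % n]) * x[(i - 1) % n] - x[i] + forcing
            for i in range(n)]

def euler_maruyama_step(x, dt, sigma, rng, forcing=8.0):
    """One Euler-Maruyama step for dX = f(X) dt + sigma dW (additive noise is an assumption)."""
    f = lorenz96_drift(x, forcing)
    return [xi + fi * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for xi, fi in zip(x, f)]

def simulate(x0, dt, n_steps, sigma, seed=0, forcing=8.0):
    # Integrate from x0 over n_steps steps of size dt using a seeded generator
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_steps):
        x = euler_maruyama_step(x, dt, sigma, rng, forcing)
    return x

if __name__ == "__main__":
    x0 = [8.0] * 40
    x0[0] += 0.01  # small perturbation of the unstable equilibrium x_i = F
    print(simulate(x0, dt=0.005, n_steps=200, sigma=0.5)[:5])
```

Every component of the step uses the full previous state. The sequential structure the paper describes replaces this with a recursion over the state dimensions.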

SGD · Learning · Stochastic gradient descent · Networking · Neural Networks ·

March 23, 2023

The Probabilistic Stability of Stochastic Gradient Descent

Liu Ziyin, Botao Li, Tomer Galanti, Masahito Ueda

From arXiv. Preprint.

A fundamental open problem in deep learning theory is how to define and understand the stability of stochastic gradient descent (SGD) close to a fixed point. The conventional literature relies on the convergence of statistical moments of the parameters, especially the variance, to quantify stability. We revisit the definition of stability for SGD and use the convergence in probability condition to define the probabilistic stability of SGD. The proposed notion of stability directly answers a fundamental question in deep learning theory: how SGD selects a meaningful solution for a neural network from an enormous number of solutions that may overfit badly. To achieve this, we show that only under the lens of probabilistic stability does SGD exhibit rich and practically relevant phases of learning. These include complete loss of stability, incorrect learning, convergence to low-rank saddles, and correct learning. When applied to a neural network, these phase diagrams imply that SGD prefers low-rank saddles when the underlying gradient is noisy, thereby improving the learning performance. This result is in sharp contrast to the conventional wisdom that SGD prefers flatter minima to sharp ones, which we find insufficient to explain the experimental data. We also prove that the probabilistic stability of SGD can be quantified by the Lyapunov exponents of the SGD dynamics, which can easily be measured in practice. Our work potentially opens a new avenue for addressing the fundamental question of how the learning algorithm affects the learning outcome in deep learning.
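
A toy illustration of why the two notions of stability can disagree. Assume a single linearized direction near a fixed point, $w_{t+1} = (1 - \eta h_t)\, w_t$, with i.i.d. minibatch curvatures $h_t$. Then $\tfrac1t \log|w_t| \to \lambda = \mathbb{E}\log|1 - \eta h|$, the Lyapunov exponent, so $\lambda < 0$ gives $w_t \to 0$. The second moment, in contrast, grows whenever $\mathbb{E}[(1-\eta h)^2] > 1$. The curvature values, probabilities and learning rate below are hypothetical.

```python
import math
import random

def lyapunov_exponent(curvatures, probs, lr):
    """Exact E[log|1 - lr*h|] for w_{t+1} = (1 - lr*h_t) w_t with h_t i.i.d."""
    return sum(p * math.log(abs(1.0 - lr * h)) for h, p in zip(curvatures, probs))

def second_moment_factor(curvatures, probs, lr):
    """E[(1 - lr*h)^2]: the per-step growth factor of E[w_t^2]."""
    return sum(p * (1.0 - lr * h) ** 2 for h, p in zip(curvatures, probs))

def simulate_log_abs_w(curvatures, probs, lr, n_steps, seed=0, w0=1.0):
    """Simulate the linearized SGD recursion, tracking log|w| to avoid under/overflow."""
    rng = random.Random(seed)
    log_w = math.log(abs(w0))
    for _ in range(n_steps):
        h = rng.choices(curvatures, weights=probs)[0]
        log_w += math.log(abs(1.0 - lr * h))
    return log_w

if __name__ == "__main__":
    curv, probs, lr = [0.9, -2.0], [0.5, 0.5], 1.0  # multipliers 0.1 and 3.0
    print("Lyapunov exponent:", lyapunov_exponent(curv, probs, lr))
    print("E[(1 - lr*h)^2]:  ", second_moment_factor(curv, probs, lr))
    print("empirical rate:   ", simulate_log_abs_w(curv, probs, lr, 10000) / 10000)
```

In this example $\lambda = \tfrac12 \log 0.3 < 0$, so $w_t$ collapses to zero with probability one. At the same time, $\mathbb{E}[w_t^2]$ grows by a factor of about 4.5 per step, so a variance-based criterion would call the same fixed point unstable.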

Analysis · Scaling · Continuity · Class · Example ·

March 22, 2023

Artificial diffusion for convective and acoustic low Mach number flows I: Analysis of the modified equations, and application to Roe-type schemes

J. Hope-Collins, L. di Mare

From arXiv. 38 pages, 8 figures. Updated version including peer review. Published in the Journal of Computational Physics.

Three asymptotic limits exist for the Euler equations at low Mach number: purely convective, purely acoustic, and mixed convective-acoustic. Standard collocated density-based numerical schemes for compressible flow are known to fail at low Mach number due to the incorrect asymptotic scaling of the artificial diffusion. Previous studies of this class of schemes have shown a variety of behaviours across the different limits and have proposed guidelines for the design of low-Mach schemes. However, these studies have primarily focused on specific discretisations and/or only the convective limit. In this paper, we review the low-Mach behaviour using the modified equations (the continuous Euler equations augmented with artificial diffusion terms), which are representative of a wide range of schemes in this class. By considering both convective and acoustic effects, we show that three diffusion scalings naturally arise. Single- and multiple-scale asymptotic analysis of these scalings shows that many of the important low-Mach features of this class of schemes can be reproduced in a straightforward manner in the continuous setting. As an example, we show that many existing low-Mach Roe-type finite-volume schemes match one of these three scalings. Our analysis corroborates previous analyses of these schemes. By including both convective and acoustic effects, we are also able to refine previous guidelines on the design of low-Mach schemes. Discrete analysis and numerical examples demonstrate the behaviour of minimal Roe-type schemes with each of the three scalings for convective, acoustic, and mixed flows.
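
A back-of-the-envelope sketch of the scaling problem follows. It assumes standard 1-D Roe-type upwinding, in which each characteristic field with speed $\lambda_k \in \{u-c,\ u,\ u+c\}$ receives an artificial diffusion coefficient $|\lambda_k|\,\Delta x/2$. At Mach number $M = u/c$ with $u > 0$, the acoustic coefficient $|u+c|$ exceeds the convective one $|u|$ by the factor $(1+M)/M$, which grows without bound as $M \to 0$. This is only a heuristic for the kind of mis-scaled diffusion the abstract refers to, not the paper's modified-equation or multiscale analysis. The function names are hypothetical.

```python
def roe_diffusion_coefficients(u, c, dx):
    """Artificial diffusion |lambda_k| * dx / 2 for the 1-D Euler characteristic
    speeds lambda = u - c, u, u + c under standard Roe-type upwinding."""
    speeds = {"acoustic_minus": u - c, "convective": u, "acoustic_plus": u + c}
    return {name: abs(lam) * dx / 2.0 for name, lam in speeds.items()}

def acoustic_to_convective_ratio(mach, c=1.0, dx=1.0):
    """Ratio of the u + c diffusion coefficient to the convective one; equals (1 + M) / M."""
    u = mach * c
    coeffs = roe_diffusion_coefficients(u, c, dx)
    return coeffs["acoustic_plus"] / coeffs["convective"]

if __name__ == "__main__":
    for m in (0.5, 0.1, 0.01, 0.001):
        print(f"M = {m:<6} acoustic/convective diffusion ratio = {acoustic_to_convective_ratio(m):.1f}")
```

Because the acoustic coefficient stays of order $c$ while the convective transport it contaminates is of order $u$, the diffusion dominates more and more as the flow becomes nearly incompressible. Low-Mach fixes therefore rescale the dissipation.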

Inference · Networking · Point estimate · Underfitting · Performer ·

March 22, 2023

Bayesian stochastic blockmodeling

Tiago P. Peixoto

From arXiv. 44 pages, 16 figures. Minor typos fixed. Code is freely available as part of graph-tool at //graph-tool.skewed.de (see also the HOWTO at //graph-tool.skewed.de/static/doc/demos/inference/inference.html).

This chapter provides a self-contained introduction to the use of Bayesian inference to extract large-scale modular structures from network data, based on the stochastic blockmodel (SBM), as well as its degree-corrected and overlapping generalizations. We focus on nonparametric formulations that allow their inference in a manner that prevents overfitting, and enables model selection. We discuss aspects of the choice of priors, in particular how to avoid underfitting via increased Bayesian hierarchies, and we contrast the task of sampling network partitions from the posterior distribution with finding the single point estimate that maximizes it, while describing efficient algorithms to perform either one. We also show how inferring the SBM can be used to predict missing and spurious links, and shed light on the fundamental limitations of the detectability of modular structures in networks.
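
To make the blockmodel concrete, the sketch scores a fixed partition by the maximized (profile) log-likelihood of the Poisson SBM without degree correction, up to additive constants. That score is $\sum_{r,s} e_{rs} \log\big(e_{rs}/(n_r n_s)\big)$, where $e_{rs}$ counts edges between groups $r$ and $s$ and $e_{rr}$ counts twice the edges inside $r$. This is the parametric maximum-likelihood score; the chapter's nonparametric Bayesian formulation exists precisely because maximizing such a score over the number of groups overfits. The graph and function names are hypothetical; for real analyses the source points to graph-tool.

```python
import math
from collections import defaultdict

def sbm_profile_loglik(edges, partition):
    """Profile log-likelihood (up to constants) of the non-degree-corrected Poisson SBM.
    edges: list of undirected (i, j) pairs; partition: list mapping node -> group."""
    n = defaultdict(int)          # group sizes n_r
    for b in partition:
        n[b] += 1
    e = defaultdict(int)          # edge counts e_rs over ordered group pairs
    for i, j in edges:
        r, s = partition[i], partition[j]
        e[(r, s)] += 1
        e[(s, r)] += 1            # for r == s this adds 2, matching the e_rr convention
    return sum(ers * math.log(ers / (n[r] * n[s])) for (r, s), ers in e.items() if ers > 0)

if __name__ == "__main__":
    # Two triangles joined by a single edge (hypothetical toy network)
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    print("assortative split:", sbm_profile_loglik(edges, [0, 0, 0, 1, 1, 1]))
    print("mixed split:      ", sbm_profile_loglik(edges, [0, 1, 0, 1, 0, 1]))
```

The split that follows the two triangles scores higher. However, putting every node in its own group would score higher still, which is the overfitting that the priors discussed in the chapter are meant to prevent.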

Estimation/Estimator · Linear · Maximum likelihood · Likelihood · MoDELS ·

March 22, 2023

Exponential Consistency of M-estimators in Generalized Linear Mixed Models

Andrea M. Bratsberg, Magne Thoresen, Abhik Ghosh

From arXiv. Preprint; under review.

Generalized linear mixed models are powerful tools for analyzing clustered data, where the unknown parameters are classically (and most commonly) estimated by the maximum likelihood and restricted maximum likelihood procedures. However, since likelihood-based procedures are known to be highly sensitive to outliers, M-estimators have become popular as a means of obtaining robust estimates under possible data contamination. In this paper, we consider sufficiently smooth general loss functions defining the M-estimators in generalized linear mixed models. For these, we prove that the tail probability of the deviation between the estimated and the true regression coefficients has an exponential bound. This implies an exponential rate of consistency for these M-estimators under appropriate assumptions, generalizing the existing exponential consistency results from univariate to multivariate responses. We further illustrate this theoretical result for two special cases: the maximum likelihood estimator and the robust minimum density power divergence estimator, a popular model-based M-estimator. We consider both in the settings of linear and logistic mixed models and compare the theoretical rate with the empirical rate of convergence through simulation studies.
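
The robustness of the minimum density power divergence estimator (MDPDE) is easiest to see in the simplest model-based case. That case is an i.i.d. normal location model with known $\sigma$, not a mixed model. There the integral term of the density power divergence does not depend on $\mu$, and the estimating equation reduces to $\sum_i (x_i - \mu)\, e^{-\alpha (x_i-\mu)^2/(2\sigma^2)} = 0$. Observations far from $\mu$ are therefore exponentially down-weighted, while $\alpha = 0$ recovers the MLE, the sample mean. The data and tuning values below are hypothetical.

```python
import math
import statistics

def mdpde_normal_location(x, alpha=0.5, sigma=1.0, n_iter=200, tol=1e-12):
    """MDPDE of a normal mean with known sigma, solved by a weighted-mean fixed-point
    iteration on sum_i (x_i - mu) * exp(-alpha (x_i - mu)^2 / (2 sigma^2)) = 0,
    started at the median."""
    mu = statistics.median(x)
    for _ in range(n_iter):
        w = [math.exp(-alpha * (xi - mu) ** 2 / (2.0 * sigma ** 2)) for xi in x]
        new_mu = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

if __name__ == "__main__":
    data = [0.1, -0.2, 0.05, 0.3, -0.1, -0.15, 20.0]  # one gross outlier
    print("MLE (sample mean):", statistics.mean(data))
    print("MDPDE, alpha=0.5: ", mdpde_normal_location(data, alpha=0.5))
```

The tuning parameter $\alpha$ trades efficiency at the model for robustness: larger $\alpha$ discounts outliers more aggressively.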

Factorized · Estimation/Estimator · Maximum likelihood · MoDELS · Likelihood ·

March 21, 2023

Quasi Maximum Likelihood Estimation of High-Dimensional Factor Models

Matteo Barigozzi

We review Quasi Maximum Likelihood estimation of factor models for high-dimensional panels of time series. We consider two cases: (1) estimation when no dynamic model for the factors is specified (Bai and Li, 2016); (2) estimation based on the Kalman smoother and the Expectation Maximization algorithm, which allows the factor dynamics to be modeled explicitly (Doz et al., 2012). Our interest is in approximate factor models, i.e., models in which the idiosyncratic components may be mildly correlated both cross-sectionally and serially. Although such a setting apparently makes estimation harder, we show that factor models do not in fact suffer from the curse of dimensionality but instead enjoy a blessing of dimensionality. In particular, we show that if the cross-sectional dimension of the data, $N$, grows to infinity, then: (i) identification of the model is still possible, and (ii) the mis-specification error due to the use of an exact factor model log-likelihood vanishes. Moreover, if we also let the sample size, $T$, grow to infinity, we can consistently estimate all parameters of the model and make inference. The same is true for estimation of the latent factors, which can be carried out by weighted least-squares, linear projection, or Kalman filtering/smoothing. We also compare the approaches presented with Principal Component Analysis and with the classical, fixed-$N$, exact Maximum Likelihood approach. We conclude with a discussion of the efficiency of the considered estimators.
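
As a point of comparison, the sketch below implements the Principal Component estimator that the review compares against, not the QML or EM estimators themselves. It simulates a one-factor panel $x_{it} = \lambda_i f_t + e_{it}$ and estimates the loadings, up to sign and scale, as the leading eigenvector of the sample covariance, found by power iteration. The data-generating process, panel sizes and noise level are assumptions chosen for illustration.

```python
import math
import random

def simulate_one_factor_panel(n_series, n_periods, noise_sd=0.5, seed=0):
    """x[t][i] = lam[i] * f[t] + e[t][i] with Gaussian factor and i.i.d. noise (assumed DGP)."""
    rng = random.Random(seed)
    lam = [rng.uniform(-1.0, 2.0) for _ in range(n_series)]
    f = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
    x = [[lam[i] * f[t] + rng.gauss(0.0, noise_sd) for i in range(n_series)]
         for t in range(n_periods)]
    return x, lam

def pca_loadings(x, n_iter=100, seed=1):
    """Leading eigenvector of the N x N sample covariance via power iteration."""
    t_len, n = len(x), len(x[0])
    means = [sum(row[i] for row in x) / t_len for i in range(n)]
    xc = [[row[i] - means[i] for i in range(n)] for row in x]
    cov = [[sum(r[i] * r[j] for r in xc) / t_len for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return v

def abs_correlation(a, b):
    # Absolute Pearson correlation; loadings are identified only up to sign and scale
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return abs(cov) / math.sqrt(va * vb)

if __name__ == "__main__":
    x, lam = simulate_one_factor_panel(50, 200)
    print("|corr(estimated, true loadings)| =", abs_correlation(pca_loadings(x), lam))
```

Because the factor's contribution to the leading eigenvalue grows with $N$ while the idiosyncratic noise does not, the estimate improves as the cross-section widens. This is a small-scale view of the blessing of dimensionality discussed above.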

Processing (programming language) · MoDELS · Networking · Neural Networks · Operation ·

March 20, 2023

Solving High-Dimensional Inverse Problems with Auxiliary Uncertainty via Operator Learning with Limited Data

Joseph Hart, Mamikon Gulian, Indu Manickam, Laura Swiler

From arXiv. 29 pages, 10 figures.

In complex large-scale systems such as climate, important effects are caused by a combination of confounding processes that are not fully observable. Identifying sources from observations of the system state is vital for attribution and prediction, which inform critical policy decisions. The difficulty of these types of inverse problems lies in the inability to isolate sources and in the cost of simulating computational models. Surrogate models may enable the many-query algorithms required for source identification, but several data challenges arise. The state and source are high-dimensional, only limited ensembles of costly model simulations are available to train a surrogate model, and measurement limitations leave few, potentially noisy state observations for inversion. The influence of auxiliary processes adds a further layer of uncertainty that confounds source identification. We introduce a framework based on (1) calibrating deep neural network surrogates to the flow maps provided by an ensemble of simulations obtained by varying sources, and (2) using these surrogates in a Bayesian framework to identify sources from observations via optimization. Focusing on an atmospheric dispersion exemplar, we find that the deep neural network operator surrogates, in an appropriately reduced dimension, are expressive and computationally efficient. This allows source identification with uncertainty quantification using limited data. Introducing a variable wind field as an auxiliary process, we find that a Bayesian approximation error approach is essential for reliable source inversion when uncertainty due to wind stresses the algorithm.
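
The Bayesian approximation error (BAE) idea can be illustrated in a toy scalar, linear-Gaussian inversion. First sample the prior and estimate the mean and variance of $\varepsilon(s) = F_{\text{accurate}}(s) - F_{\text{surrogate}}(s)$. Then treat $\varepsilon$ as additional Gaussian noise in the likelihood. Both forward maps, the prior and the noise level below are hypothetical. The paper's setting uses DNN operator surrogates with a variable wind field as the auxiliary process, and the same recipe can absorb such unmodeled effects.

```python
import math
import random

def accurate_forward(s):
    # Hypothetical "expensive" forward model standing in for the full simulator
    return 2.0 * s + 0.3 * s ** 2

def surrogate_forward_gain():
    # Hypothetical cheap linear surrogate F(s) = g * s that ignores the quadratic term
    return 2.0

def approximation_error_stats(prior_mean, prior_var, n_samples=5000, seed=0):
    """Estimate mean and variance of eps(s) = accurate(s) - surrogate(s) under the prior."""
    rng = random.Random(seed)
    g = surrogate_forward_gain()
    samples = [rng.gauss(prior_mean, math.sqrt(prior_var)) for _ in range(n_samples)]
    errs = [accurate_forward(s) - g * s for s in samples]
    mean = sum(errs) / n_samples
    var = sum((e - mean) ** 2 for e in errs) / (n_samples - 1)
    return mean, var

def linear_gaussian_posterior(y, g, prior_mean, prior_var, noise_mean, noise_var):
    """Posterior (mean, variance) of s for y = g*s + noise, noise ~ N(noise_mean, noise_var)."""
    gain = prior_var * g / (g * g * prior_var + noise_var)
    post_mean = prior_mean + gain * (y - noise_mean - g * prior_mean)
    post_var = prior_var - gain * g * prior_var
    return post_mean, post_var

if __name__ == "__main__":
    s_true, prior_mean, prior_var, obs_var = 1.5, 1.0, 0.25, 0.01
    y = accurate_forward(s_true)  # noise-free observation for clarity
    g = surrogate_forward_gain()
    mu_eps, var_eps = approximation_error_stats(prior_mean, prior_var)
    print("naive:", linear_gaussian_posterior(y, g, prior_mean, prior_var, 0.0, obs_var))
    print("BAE:  ", linear_gaussian_posterior(y, g, prior_mean, prior_var, mu_eps, obs_var + var_eps))
```

Ignoring the surrogate error gives an overconfident and biased posterior. The BAE correction shifts the data by the mean error and widens the likelihood by its variance, which moves the estimate toward the true source at the price of a larger posterior variance.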

Discretization · Graph · Graphics processing unit · Neural Networks · Networking ·

March 28, 2019

Learning Discrete Structures for Graph Neural Networks

Luca Franceschi, Mathias Niepert, Massimiliano Pontil, Xiao He

From arXiv. 18 pages.

Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.
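
The bilevel program itself is beyond a short example. Its basic ingredient, a GCN evaluated on graphs sampled from per-edge Bernoulli distributions, can be sketched as follows. The sketch uses the standard GCN propagation $\tilde D^{-1/2}(A + I)\tilde D^{-1/2} X$ and averages the output over sampled graphs by Monte Carlo. The edge probabilities, the averaging, and the omission of learnable weights and nonlinearities are simplifying assumptions. The code does not learn the edge distribution or solve the outer optimization problem.

```python
import math
import random

def sample_graph(edge_probs, rng):
    """Sample a symmetric adjacency matrix with A_ij ~ Bernoulli(theta_ij) for i < j."""
    n = len(edge_probs)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_probs[i][j]:
                a[i][j] = a[j][i] = 1.0
    return a

def gcn_propagate(a, features):
    """One GCN propagation D^{-1/2} (A + I) D^{-1/2} X (weights and nonlinearity omitted)."""
    n = len(a)
    a_hat = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    n_feat = len(features[0])
    return [[sum(d_inv_sqrt[i] * a_hat[i][j] * d_inv_sqrt[j] * features[j][k] for j in range(n))
             for k in range(n_feat)] for i in range(n)]

def mc_average_output(edge_probs, features, n_samples=100, seed=0):
    """Monte Carlo estimate of the expected GCN output under the edge distribution."""
    rng = random.Random(seed)
    total = None
    for _ in range(n_samples):
        out = gcn_propagate(sample_graph(edge_probs, rng), features)
        total = out if total is None else [[t + o for t, o in zip(tr, orow)]
                                           for tr, orow in zip(total, out)]
    return [[t / n_samples for t in row] for row in total]

if __name__ == "__main__":
    probs = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.5], [0.1, 0.5, 0.0]]  # hypothetical edge probabilities
    feats = [[1.0], [2.0], [3.0]]
    print(mc_average_output(probs, feats))
```

In a full implementation, the GCN weights would be fit on sampled graphs while the edge probabilities are tuned on validation loss, which is the bilevel structure described in the abstract.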
