
Stochastic gradient descent with momentum (SGDM) has been widely used in many machine learning and statistical applications. Despite the observed empirical benefits of SGDM over traditional SGD, the theoretical understanding of the role of momentum for different learning rates in the optimization process remains largely open. We analyze the finite-sample convergence rate of SGDM in the strongly convex setting and show that, with a large batch size, mini-batch SGDM converges faster than mini-batch SGD to a neighborhood of the optimal value. Additionally, our findings, supported by theoretical analysis and numerical experiments, indicate that SGDM permits broader choices of learning rates. Furthermore, we analyze the Polyak-averaging version of the SGDM estimator, establish its asymptotic normality, and justify its asymptotic equivalence to averaged SGD. The asymptotic distribution of the averaged SGDM enables uncertainty quantification of the algorithm output and statistical inference on the model parameters.
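
To make the procedure concrete, here is a minimal NumPy sketch of mini-batch SGDM with Polyak (iterate) averaging on a strongly convex least-squares toy problem; the objective, batch size, learning rate, and momentum parameter are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Strongly convex toy objective: least squares with mini-batch gradients.
d, n, batch = 10, 5000, 64
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star + 0.1 * rng.normal(size=n)

def minibatch_grad(x):
    idx = rng.choice(n, size=batch, replace=False)
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / batch

lr, beta = 0.01, 0.9          # learning rate and momentum (illustrative)
x = np.zeros(d)
v = np.zeros(d)
avg = np.zeros(d)             # Polyak average of the iterates

for t in range(1, 2001):
    v = beta * v + minibatch_grad(x)   # heavy-ball momentum buffer
    x = x - lr * v                     # SGDM step
    avg += (x - avg) / t               # running Polyak average

print("last-iterate error:", np.linalg.norm(x - x_star))
print("averaged-iterate error:", np.linalg.norm(avg - x_star))
```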

Related content

The momentum method (Polyak, 1964) is designed to accelerate learning, particularly in the face of high curvature, small but consistent gradients, or noisy gradients. The momentum algorithm accumulates an exponentially decaying moving average of past gradients and continues to move in their direction.
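
In symbols, the textbook heavy-ball update reads (with learning rate $\epsilon$, momentum parameter $\alpha$, and objective $J(\theta)$; notation assumed):

$$v \leftarrow \alpha v - \epsilon \nabla_\theta J(\theta), \qquad \theta \leftarrow \theta + v.$$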

Generative diffusion models have achieved spectacular performance in many areas of generative modeling. While the fundamental ideas behind these models come from non-equilibrium physics, variational inference and stochastic calculus, in this paper we show that many aspects of these models can be understood using the tools of equilibrium statistical mechanics. Using this reformulation, we show that generative diffusion models undergo second-order phase transitions corresponding to symmetry breaking phenomena. We show that these phase transitions are always in a mean-field universality class, as they are the result of a self-consistency condition in the generative dynamics. We argue that the critical instability that arises from the phase transitions lies at the heart of their generative capabilities, which are characterized by a set of mean-field critical exponents. Furthermore, using the statistical physics of disordered systems, we show that memorization can be understood as a form of critical condensation corresponding to a disordered phase transition. Finally, we show that the dynamic equation of the generative process can be interpreted as a stochastic adiabatic transformation that minimizes the free energy while keeping the system in thermal equilibrium.
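
For context, the generative dynamics discussed here are commonly written as a reverse-time SDE (Anderson, 1982); in the standard notation with drift $f$, diffusion coefficient $g$, score $\nabla_x \log p_t$, and reverse-time Brownian motion $\bar{w}_t$ (this is the generic diffusion-model formulation, not a result of the paper):

$$\mathrm{d}x = \left[ f(x,t) - g(t)^2 \nabla_x \log p_t(x) \right] \mathrm{d}t + g(t)\, \mathrm{d}\bar{w}_t.$$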

We study general coordinate-wise MCMC schemes (such as Metropolis-within-Gibbs samplers), which are commonly used to fit Bayesian non-conjugate hierarchical models. We relate their convergence properties to those of the corresponding (potentially not implementable) Gibbs sampler through the notion of conditional conductance. This allows us to study the performance of popular Metropolis-within-Gibbs schemes for non-conjugate hierarchical models, in high-dimensional regimes where both the number of datapoints and the number of parameters increase. Under random data-generating assumptions, we establish dimension-free convergence results, which are in close accordance with numerical evidence. Applications to Bayesian models for binary regression with unknown hyperparameters and discretely observed diffusions are also discussed. Motivated by such statistical applications, auxiliary results of independent interest on approximate conductances and perturbation of Markov operators are provided.
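
A minimal sketch of a random-walk Metropolis-within-Gibbs sweep may help fix ideas; the target log-posterior, proposal scale, and dimension below are placeholders, not the hierarchical models studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_post(x):
    # Placeholder log-posterior; swap in a non-conjugate hierarchical
    # model's log density here (assumption for illustration).
    return -0.5 * np.sum(x**2) - 0.1 * np.sum(x**4)

def metropolis_within_gibbs(x0, n_sweeps=5000, step=0.8):
    x = x0.copy()
    lp = log_post(x)
    chain = np.empty((n_sweeps, x.size))
    for t in range(n_sweeps):
        for i in range(x.size):              # update one coordinate at a time
            prop = x.copy()
            prop[i] += step * rng.normal()
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp:   # MH accept/reject
                x, lp = prop, lp_prop
        chain[t] = x
    return chain

chain = metropolis_within_gibbs(np.zeros(3))
print("posterior mean estimate:", chain[1000:].mean(axis=0))
```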

Confidence intervals based on the central limit theorem (CLT) are a cornerstone of classical statistics. Despite being only asymptotically valid, they are ubiquitous because they permit statistical inference under weak assumptions and can often be applied to problems even when nonasymptotic inference is impossible. This paper introduces time-uniform analogues of such asymptotic confidence intervals, adding to the literature on confidence sequences (CSs) -- sequences of confidence intervals that are uniformly valid over time -- which provide valid inference at arbitrary stopping times and incur no penalties for "peeking" at the data, unlike classical confidence intervals which require the sample size to be fixed in advance. Existing CSs in the literature are nonasymptotic, enjoying finite-sample guarantees but not the aforementioned broad applicability of asymptotic confidence intervals. This work provides a definition for "asymptotic CSs" and a general recipe for deriving them. Asymptotic CSs forgo nonasymptotic validity for CLT-like versatility and (asymptotic) time-uniform guarantees. While the CLT approximates the distribution of a sample average by that of a Gaussian for a fixed sample size, we use strong invariance principles (stemming from the seminal 1960s work of Strassen) to uniformly approximate the entire sample average process by an implicit Gaussian process. As an illustration, we derive asymptotic CSs for the average treatment effect in observational studies (for which nonasymptotic bounds are essentially impossible to derive even in the fixed-time regime) as well as randomized experiments, enabling causal inference in sequential environments.
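
As a toy illustration of the fixed-time versus time-uniform contrast, the sketch below plugs a running variance estimate into Robbins' classical normal-mixture boundary; the specific boundary and tuning parameter $\rho$ are illustrative choices and not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(2)

def mixture_radius(t, alpha=0.05, rho=10.0):
    # Robbins' normal-mixture boundary, applied at level alpha/2 per tail;
    # rho is a free tuning parameter. This classical boundary is used here
    # purely for illustration and is not the paper's recipe.
    a = alpha / 2.0
    return np.sqrt((t + rho) * np.log((t + rho) / (a**2 * rho)))

x = rng.exponential(size=10_000) - 1.0        # mean-zero, non-Gaussian data
t = np.arange(1, x.size + 1)
mean_t = np.cumsum(x) / t
sd_t = np.sqrt(np.maximum(np.cumsum(x**2) / t - mean_t**2, 1e-12))

# Asymptotic CS: plug the running standard deviation estimate into the
# Gaussian boundary, in the spirit of CLT-style plug-in intervals.
radius = sd_t * mixture_radius(t) / t
print("CS at t=10000: [%.4f, %.4f] (true mean 0)"
      % (mean_t[-1] - radius[-1], mean_t[-1] + radius[-1]))

# Fixed-time CLT interval, valid only at one pre-specified sample size:
print("CLT interval at t=10000: +/- %.4f" % (1.96 * sd_t[-1] / np.sqrt(t[-1])))
```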

This work presents a comparative review and classification of several well-known thermodynamically consistent models of hydrogel behavior in a large deformation setting, specifically focusing on solvent absorption/desorption and its impact on mechanical deformation and network swelling. The proposed discussion addresses formulation aspects, general mathematical classification of the governing equations, and numerical implementation issues based on the finite element method. The theories are presented in a unified framework demonstrating that, despite not being evident in some cases, all of them follow equivalent thermodynamic arguments. A detailed numerical analysis is carried out where Taylor-Hood elements are employed in the spatial discretization to satisfy the inf-sup condition and to prevent spurious numerical oscillations. The resulting discrete problems are solved using the FEniCS platform through consistent variational formulations, employing both monolithic and staggered approaches. We conduct benchmark tests on various hydrogel structures, demonstrating that major differences arise from the chosen volumetric response of the hydrogel. The significance of this choice is frequently underestimated in the state-of-the-art literature but has been shown to have substantial implications for the resulting hydrogel behavior.
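
As a pointer to the implementation side, a minimal legacy-FEniCS sketch of the Taylor-Hood mixed space (quadratic displacements paired with a linear scalar field, here a chemical potential) might look as follows; the mesh and field names are assumptions.

```python
from dolfin import *  # legacy FEniCS

mesh = UnitSquareMesh(32, 32)

# Taylor-Hood pair: P2 for displacement, P1 for chemical potential,
# chosen to satisfy the inf-sup condition and avoid spurious oscillations.
P2 = VectorElement("Lagrange", mesh.ufl_cell(), 2)
P1 = FiniteElement("Lagrange", mesh.ufl_cell(), 1)
W = FunctionSpace(mesh, MixedElement([P2, P1]))

w = Function(W)      # monolithic unknown (u, mu)
u, mu = split(w)     # displacement and chemical potential
```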

This study introduces a reduced-order model (ROM) for analyzing the transient diffusion-deformation of hydrogels. The full-order model (FOM) describing hydrogel transient behavior consists of a coupled system of partial differential equations in which chemical potential and displacements are coupled. This system is formulated in a monolithic fashion and solved using the Finite Element Method (FEM). The ROM employs proper orthogonal decomposition as a model order reduction approach. We test the ROM performance through benchmark tests on hydrogel swelling behavior and a case study simulating co-axial printing. Finally, we embed the ROM into an optimization problem to identify the material parameters of the coupled problem using full-field data. We verify that the ROM can predict hydrogels' diffusion-deformation evolution and material properties, significantly reducing computation time compared to the FOM. The results demonstrate the ROM's accuracy and computational efficiency. This work paves the way toward advanced practical applications of ROMs, e.g., in the context of feedback error control in hydrogel 3D printing.
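
The POD step can be summarized in a few lines: collect FOM snapshots, take an SVD, and truncate by an energy criterion. The sketch below uses synthetic snapshot data and an assumed energy threshold, not the paper's hydrogel solutions.

```python
import numpy as np

def pod_basis(snapshots, energy=0.9999):
    # snapshots: (n_dof, n_snap) matrix of FOM solution vectors.
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    # Keep the smallest r capturing the requested fraction of energy.
    cum = np.cumsum(s**2) / np.sum(s**2)
    r = int(np.searchsorted(cum, energy) + 1)
    return U[:, :r], s

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 5)) @ rng.normal(size=(5, 60))  # rank-5 data
V, s = pod_basis(X)
print("POD basis size:", V.shape[1])   # expect ~5
X_rom = V @ (V.T @ X)                  # Galerkin-style projection
print("reconstruction error:", np.linalg.norm(X - X_rom) / np.linalg.norm(X))
```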

The non-linear collision-induced breakage equation has significant applications in particulate processes. Two semi-analytical techniques, namely the homotopy analysis method (HAM) and the accelerated homotopy perturbation method (AHPM), are investigated along with the well-known finite volume method (FVM) to comprehend the dynamical behavior of the non-linear system, i.e., the concentration function, the total number and the total mass of the particles in the system. The theoretical convergence analyses of the series solutions of HAM and AHPM are discussed. In addition, the error estimations of the truncated solutions of both methods provide maximum absolute error bounds. To justify the applicability and accuracy of these methods, numerical simulations are compared with the findings of FVM and analytical solutions considering three physical problems.
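
For reference, one standard form of the nonlinear collision-induced breakage equation for the number density $c(x,t)$, with collision kernel $K$ and breakage distribution function $b$ (notation assumed; the paper's exact formulation may differ in normalization), is

$$\frac{\partial c(x,t)}{\partial t} = \int_0^\infty \int_x^\infty b(x \mid y; z)\, K(y,z)\, c(y,t)\, c(z,t)\, \mathrm{d}y\, \mathrm{d}z - \int_0^\infty K(x,y)\, c(x,t)\, c(y,t)\, \mathrm{d}y,$$

from which the total number $\int_0^\infty c(x,t)\,\mathrm{d}x$ and total mass $\int_0^\infty x\, c(x,t)\,\mathrm{d}x$ tracked in the simulations are obtained as moments.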

We propose an operator learning approach to accelerate geometric Markov chain Monte Carlo (MCMC) for solving infinite-dimensional nonlinear Bayesian inverse problems. While geometric MCMC employs high-quality proposals that adapt to posterior local geometry, it requires computing local gradient and Hessian information of the log-likelihood, incurring a high cost when the parameter-to-observable (PtO) map is defined through expensive model simulations. We consider a delayed-acceptance geometric MCMC method driven by a neural operator surrogate of the PtO map, where the proposal is designed to exploit fast surrogate approximations of the log-likelihood and, simultaneously, its gradient and Hessian. To achieve a substantial speedup, the surrogate needs to be accurate in predicting both the observable and its parametric derivative (the derivative of the observable with respect to the parameter). Training such a surrogate via conventional operator learning using input--output samples often demands a prohibitively large number of model simulations. In this work, we present an extension of derivative-informed operator learning [O'Leary-Roseberry et al., J. Comput. Phys., 496 (2024)] using input--output--derivative training samples. Such a learning method leads to derivative-informed neural operator (DINO) surrogates that accurately predict the observable and its parametric derivative at a significantly lower training cost than the conventional method. Cost and error analysis for reduced basis DINO surrogates are provided. Numerical studies on PDE-constrained Bayesian inversion demonstrate that DINO-driven MCMC generates effective posterior samples 3--9 times faster than geometric MCMC and 60--97 times faster than prior geometry-based MCMC. Furthermore, the training cost of DINO surrogates breaks even after collecting merely 10--25 effective posterior samples compared to geometric MCMC.
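
Schematically, derivative-informed training augments the usual output-matching loss with a Jacobian-matching term; with surrogate $\mathcal{G}_\theta$, PtO map $\mathcal{G}$, parameter distribution $\nu$, and parametric derivative $D\mathcal{G}$ (a schematic rendering, not the paper's exact objective):

$$\min_\theta\ \mathbb{E}_{m \sim \nu}\left[ \left\| \mathcal{G}(m) - \mathcal{G}_\theta(m) \right\|_2^2 + \left\| D\mathcal{G}(m) - D\mathcal{G}_\theta(m) \right\|_F^2 \right].$$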

We establish a near-optimality guarantee for the full orthogonalization method (FOM), showing that the overall convergence of FOM is nearly as good as GMRES. In particular, we prove that at every iteration $k$, there exists an iteration $j\leq k$ for which the FOM residual norm at iteration $j$ is no more than $\sqrt{k+1}$ times larger than the GMRES residual norm at iteration $k$. This bound is sharp, and it has implications for algorithms for approximating the action of a matrix function on a vector.
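
The bound is easy to check numerically from a single Arnoldi decomposition, since FOM and GMRES draw their iterates from the same Krylov subspace; the well-conditioned test matrix and dimensions below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 200, 30
A = 2 * np.eye(n) + rng.normal(size=(n, n)) / np.sqrt(n)  # well conditioned
b = rng.normal(size=n)

# Arnoldi relation A Q_k = Q_{k+1} H_k underlying both FOM and GMRES.
beta = np.linalg.norm(b)
Q = np.zeros((n, m + 1)); Q[:, 0] = b / beta
H = np.zeros((m + 1, m))
for k in range(m):
    w = A @ Q[:, k]
    for j in range(k + 1):               # modified Gram-Schmidt
        H[j, k] = Q[:, j] @ w
        w -= H[j, k] * Q[:, j]
    H[k + 1, k] = np.linalg.norm(w)
    Q[:, k + 1] = w / H[k + 1, k]

e1 = np.zeros(m + 1); e1[0] = beta
fom_hist = []
for k in range(1, m + 1):
    # FOM: square Hessenberg solve; residual norm is h_{k+1,k} |(y)_k|.
    y = np.linalg.solve(H[:k, :k], e1[:k])
    fom_hist.append(H[k, k - 1] * abs(y[-1]))
    # GMRES: least-squares on the rectangular Hessenberg system.
    z, *_ = np.linalg.lstsq(H[:k + 1, :k], e1[:k + 1], rcond=None)
    r_gmres = np.linalg.norm(e1[:k + 1] - H[:k + 1, :k] @ z)
    # Some iteration j <= k has FOM residual <= sqrt(k+1) * GMRES residual.
    assert min(fom_hist) <= np.sqrt(k + 1) * r_gmres * (1 + 1e-10) + 1e-14
print("bound verified for k = 1..%d" % m)
```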

We consider a statistical model for symmetric matrix factorization with additive Gaussian noise in the high-dimensional regime where the rank $M$ of the signal matrix to be inferred scales with its size $N$ as $M = o(N^{1/10})$. Allowing for an $N$-dependent rank offers new challenges and requires new methods. Working in the Bayesian-optimal setting, we show that whenever the signal has i.i.d. entries the limiting mutual information between signal and data is given by a variational formula involving a rank-one replica symmetric potential. In other words, from the information-theoretic perspective, the case of a (slowly) growing rank is the same as when $M = 1$ (namely, the standard spiked Wigner model). The proof is primarily based on a novel multiscale cavity method allowing for growing rank along with some information-theoretic identities on worst noise for the Gaussian vector channel. We believe that the cavity method developed here will play a role in the analysis of a broader class of inference and spin models where the degrees of freedom are large arrays instead of vectors.
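
The data-generating model can be simulated in a few lines; the $\pm 1$ prior, SNR normalization, and sizes below are assumptions chosen to illustrate the spiked regime, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 1000, 4        # M grows slowly with N in the regime of the abstract
snr = 4.0             # illustrative SNR, above the BBP-style threshold

X = rng.choice([-1.0, 1.0], size=(N, M))                  # i.i.d. signal
Z = rng.normal(size=(N, N)); Z = (Z + Z.T) / np.sqrt(2)   # GOE-like noise

# One common normalization of the observation (an assumption here):
Y = np.sqrt(snr / N) * (X @ X.T) + Z

# Spectral evidence of the planted rank: M outlying eigenvalues
# separate from the semicircle bulk edge near 2*sqrt(N).
eigs = np.linalg.eigvalsh(Y)
print("top 6 eigenvalues:", np.round(eigs[-6:], 1))
```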

The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that, absent an explanation, humans expect the AI to make similar decisions to themselves, and that they interpret an explanation by comparison to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
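
A toy rendering of a prediction rule in this spirit: similarity between the AI's explanation and the explainee's own candidate explanations is scored with Shepard's exponential law and turned into choice probabilities (the feature vectors, decay rate, and Luce-style choice rule below are illustrative assumptions, not the paper's fitted model).

```python
import numpy as np

def shepard_similarity(x, y, c=1.0):
    # Shepard (1987): generalization decays exponentially with
    # distance in psychological similarity space.
    return np.exp(-c * np.linalg.norm(x - y))

# Toy: the explainee predicts the AI's label by comparing the AI's
# saliency map (flattened) to the maps they would expect per label.
ai_map = np.array([0.9, 0.1, 0.0, 0.4])
expected_maps = {"cat": np.array([0.8, 0.2, 0.1, 0.5]),
                 "dog": np.array([0.1, 0.9, 0.7, 0.2])}

scores = {k: shepard_similarity(ai_map, v) for k, v in expected_maps.items()}
total = sum(scores.values())
print({k: s / total for k, s in scores.items()})   # Luce-choice probabilities
```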
