
This paper is concerned with the asymptotic distribution of the largest eigenvalues of a nonlinear random matrix ensemble stemming from the study of neural networks. More precisely, we consider $M= \frac{1}{m} YY^\top$ with $Y=f(WX)$, where $W$ and $X$ are random rectangular matrices with i.i.d. centered entries. This models the data covariance matrix, or Conjugate Kernel, of a single-layer random feed-forward neural network. The function $f$ is applied entrywise and can be seen as the activation function of the network. We show that the largest eigenvalue has the same limit (in probability) as that of some well-known linear random matrix ensembles. In particular, we relate the asymptotic limit of the largest eigenvalue of the nonlinear model to that of an information-plus-noise random matrix, establishing a possible phase transition depending on the function $f$ and the distributions of $W$ and $X$. This may be of interest for applications to machine learning.
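A minimal numerical sketch of this model (not code from the paper): sample $M = \frac{1}{m} YY^\top$ with $Y = f(WX)$ for Gaussian $W$ and $X$ and report the largest eigenvalue. The dimensions, the $1/\sqrt{d}$ scaling of $W$, and the choice $f = \tanh$ are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of the nonlinear ensemble M = (1/m) Y Y^T with Y = f(WX).
# Dimensions, scaling, and the activation f are assumptions, not the paper's setup.
rng = np.random.default_rng(0)
n, d, m = 800, 800, 1600                      # widths grow proportionally in the asymptotic regime

W = rng.standard_normal((n, d)) / np.sqrt(d)  # i.i.d. centered entries, variance 1/d
X = rng.standard_normal((d, m))
f = np.tanh                                   # entrywise activation

Y = f(W @ X)
M = (Y @ Y.T) / m

print("largest eigenvalue of M:", np.linalg.eigvalsh(M)[-1])  # eigvalsh sorts ascending
```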

Related Content

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes submissions of high-quality papers that contribute to the full range of neural networks research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analysis, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This unique and broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: Cognitive Science, Neuroscience, Learning Systems, Mathematical and Computational Analysis, Engineering and Applications. Official website:

SVD (singular value decomposition) is one of the basic tools of machine learning, allowing one to optimize a basis for a given matrix. Sometimes, however, we have a set of matrices $\{A_k\}_k$ instead, and would like to optimize a single common basis for them: find orthogonal matrices $U$, $V$ such that the set of matrices $\{U^T A_k V\}_k$ is somehow simpler. For example, DCT-II is an orthonormal basis of functions commonly used in image/video compression; as discussed here, this kind of basis can be quickly and automatically optimized for a given dataset. While gradient descent optimization, also discussed, may be computationally costly, we propose CSVD (common SVD), a fast general approach based on SVD. Specifically, we choose $U$ as built from eigenvectors of $\sum_k (w_k)^q (A_k A_k^T)^p$ and $V$ from eigenvectors of $\sum_k (w_k)^q (A_k^T A_k)^p$, where the $w_k$ are weights and $p,q>0$ are chosen powers, e.g. $1/2$, optionally after a normalization such as $A \to A - rc^T$ with $r_i=\sum_j A_{ij}$, $c_j =\sum_i A_{ij}$.
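A small Python sketch of the CSVD recipe described above; the helper names, the toy matrices, and the default $p = q = 1/2$ are illustrative choices.

```python
import numpy as np

def psd_power(S, p):
    """Fractional power of a symmetric PSD matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.clip(vals, 0.0, None) ** p) @ vecs.T

def csvd(As, weights=None, p=0.5, q=0.5):
    """Common SVD sketch: one orthogonal pair (U, V) shared by all A_k."""
    if weights is None:
        weights = np.ones(len(As))
    left  = sum(w ** q * psd_power(A @ A.T, p) for w, A in zip(weights, As))
    right = sum(w ** q * psd_power(A.T @ A, p) for w, A in zip(weights, As))
    _, U = np.linalg.eigh(left)     # columns are eigenvectors (ascending eigenvalue order)
    _, V = np.linalg.eigh(right)
    return U[:, ::-1], V[:, ::-1]   # reorder so leading directions come first

# toy usage: the transformed matrices U^T A_k V should concentrate their energy
rng = np.random.default_rng(1)
As = [rng.standard_normal((6, 4)) for _ in range(5)]
U, V = csvd(As)
print(np.round(np.abs(U.T @ As[0] @ V), 2))
```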

Existing inferential methods for small area data involve a trade-off between maintaining area-level frequentist coverage rates and improving inferential precision via the incorporation of indirect information. In this article, we propose a method to obtain an area-level prediction region for a future observation that mitigates this trade-off. The proposed method takes a conformal prediction approach in which the conformity measure is the posterior predictive density of a working model that incorporates indirect information. The resulting prediction region has guaranteed frequentist coverage regardless of the working model, and, if the working model assumptions are accurate, the region has minimum expected volume among regions with the same coverage rate. When the region is constructed under a normal working model, we prove that it is an interval and construct an efficient algorithm to obtain the exact interval. We illustrate the performance of our method through simulation studies and an application to EPA radon survey data.
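As a rough illustration only (the paper's exact conformal construction and working model are more careful than this), here is a heavily simplified sketch in which the conformity score is a fixed normal posterior-predictive density and the region is read off a grid; the data, prior weight, and threshold rule are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y_area = rng.normal(10.0, 2.0, size=20)        # observations from one small area
alpha = 0.1                                     # target miscoverage rate

# Working model: normal predictive density shrunk toward indirect (prior) information.
prior_mean, prior_weight = 9.0, 0.2             # assumed indirect information
pred_mean = prior_weight * prior_mean + (1 - prior_weight) * y_area.mean()
pred_sd = y_area.std(ddof=1) * np.sqrt(1 + 1 / len(y_area))
predictive = stats.norm(pred_mean, pred_sd)

# Keep candidate values whose predictive density is not among the lowest-scoring
# alpha fraction of the observed conformity scores.
threshold = np.quantile(predictive.pdf(y_area), alpha)
grid = np.linspace(y_area.min() - 10, y_area.max() + 10, 2001)
region = grid[predictive.pdf(grid) >= threshold]
print(f"approximate prediction interval: [{region.min():.2f}, {region.max():.2f}]")
```

Because the normal predictive density is unimodal, the retained region is automatically an interval, echoing the interval result stated for the normal working model.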

Emulators that can bypass computationally expensive scientific calculations with high accuracy and speed can enable new studies of fundamental science as well as new applications. In this work we discuss solving a system of constraint equations efficiently using a self-learning emulator. A self-learning emulator is an active learning protocol that can be used with any emulator that faithfully reproduces the exact solution at selected training points. The key ingredient is a fast estimate of the emulator error that becomes progressively more accurate as the emulator is improved; the accuracy of the error estimate can itself be corrected using machine learning. We illustrate the approach with three examples. The first uses cubic spline interpolation to find the solution of a transcendental equation with variable coefficients. The second compares a spline emulator and a reduced basis method emulator for finding solutions of a parameterized differential equation. The third uses eigenvector continuation to find the eigenvectors and eigenvalues of a large Hamiltonian matrix that depends on several control parameters.
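A minimal sketch in the spirit of the first example: emulate the root $x(c)$ of a transcendental equation with a cubic spline and greedily add training points where a cheap error estimate (the residual of the constraint at the emulated solution) is largest. The specific equation $x = \cos(cx)$, the parameter range, and the loop lengths are stand-ins, not the paper's.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import brentq

def residual(x, c):
    return x - np.cos(c * x)                      # constraint equation: x = cos(c*x)

def exact_solution(c):
    return brentq(residual, 0.0, 1.5, args=(c,))  # expensive "exact" solver

candidates = np.linspace(0.5, 3.0, 401)           # parameter values of interest
train_c = list(np.linspace(0.5, 3.0, 4))          # small initial training set

for _ in range(8):                                # active-learning rounds
    train_c.sort()
    spline = CubicSpline(train_c, [exact_solution(c) for c in train_c])
    err_est = np.abs(residual(spline(candidates), candidates))   # fast error estimate
    worst = float(candidates[np.argmax(err_est)])
    if worst in train_c:
        break                                     # estimated error already peaks at a training point
    train_c.append(worst)                         # add an exact solve where the emulator is worst

true = np.array([exact_solution(c) for c in candidates])
print("max emulator error:", np.max(np.abs(spline(candidates) - true)))
```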

Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for sampling problems and for the non-convex optimization problems arising in many machine learning applications. Its variance-reduced versions in particular have recently gained attention. In this paper, we study two variants of this kind, namely the Stochastic Variance Reduced Gradient Langevin Dynamics and the Stochastic Recursive Gradient Langevin Dynamics. We prove their convergence to the target distribution in terms of KL-divergence under the sole assumptions of smoothness and a log-Sobolev inequality, which are weaker conditions than those used in prior analyses of these algorithms. With the batch size and the inner loop length set to $\sqrt{n}$, the gradient complexity to achieve an $\epsilon$-precision is $\tilde{O}((n+dn^{1/2}\epsilon^{-1})\gamma^2 L^2\alpha^{-2})$, which improves on all previous analyses. We also present some essential applications of our result to non-convex optimization.
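A minimal sketch of the SVRG-type Langevin update analysed here (Stochastic Variance Reduced Gradient Langevin Dynamics), with the batch size and inner loop length set to $\sqrt{n}$ as in the statement; the quadratic potentials, step size, and number of epochs are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 1024, 2
data = rng.normal(0.0, 1.0, size=(n, d))           # f_i(x) = 0.5 * ||x - data_i||^2

def grad_fi(x, idx):
    return x - data[idx]                            # gradients of the selected f_i

eta = 1e-2                                          # step size
batch = inner = int(np.sqrt(n))                     # batch size and inner loop length ~ sqrt(n)
x = np.zeros(d)

for epoch in range(50):
    snapshot = x.copy()
    full_grad = grad_fi(snapshot, np.arange(n)).mean(axis=0)   # full gradient at the snapshot
    for _ in range(inner):
        idx = rng.integers(0, n, size=batch)
        # variance-reduced stochastic gradient
        v = (grad_fi(x, idx) - grad_fi(snapshot, idx)).mean(axis=0) + full_grad
        # Langevin step: gradient move plus injected Gaussian noise
        x = x - eta * v + np.sqrt(2 * eta) * rng.standard_normal(d)

print("final sample:", x, " target mean:", data.mean(axis=0))
```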

The minimum energy path (MEP) describes the mechanism of a reaction, and the energy barrier along the path can be used to calculate the reaction rate in thermal systems. The nudged elastic band (NEB) method is one of the most commonly used schemes to compute MEPs numerically. It approximates an MEP by a discrete set of configuration images, where the discretization size determines both the computational cost and the accuracy of the simulations. In this paper, we consider a discrete MEP to be a stationary state of the NEB method and prove an optimal convergence rate of the discrete MEP with respect to the number of images. Numerical simulations of the transitions of several prototypical model systems are performed to support the theory.
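A toy sketch of the NEB iteration on a two-dimensional double-well potential: each interior image feels the component of the true force perpendicular to the band plus a spring force along the local tangent. The potential, spring constant, step size, and number of images are illustrative choices, and the optimizer is plain steepest descent rather than anything production-grade.

```python
import numpy as np

def V(x):
    return (x[0] ** 2 - 1.0) ** 2 + x[1] ** 2             # toy double-well potential

def grad_V(x):
    return np.array([4 * x[0] * (x[0] ** 2 - 1.0), 2 * x[1]])

n_images, k_spring, step = 12, 1.0, 0.02
a, b = np.array([-1.0, 0.0]), np.array([1.0, 0.0])         # the two minima
rng = np.random.default_rng(4)
path = np.linspace(a, b, n_images) + 0.1 * rng.standard_normal((n_images, 2))
path[0], path[-1] = a, b                                    # endpoints stay fixed

for _ in range(2000):
    for i in range(1, n_images - 1):
        tau = path[i + 1] - path[i - 1]
        tau /= np.linalg.norm(tau)                          # unit tangent estimate
        g = grad_V(path[i])
        f_perp = -(g - np.dot(g, tau) * tau)                # true force, perpendicular part only
        f_spring = k_spring * (np.linalg.norm(path[i + 1] - path[i])
                               - np.linalg.norm(path[i] - path[i - 1])) * tau
        path[i] = path[i] + step * (f_perp + f_spring)

print("estimated barrier:", max(V(x) for x in path))        # exact saddle height is 1.0
```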

The numerical solution of singular eigenvalue problems is complicated by the fact that small perturbations of the coefficients may have an arbitrarily bad effect on eigenvalue accuracy. However, it has been known for a long time that such perturbations are exceptional, and standard eigenvalue solvers, such as the QZ algorithm, tend to yield good accuracy despite the inevitable presence of roundoff error. Recently, Lotz and Noferini quantified this phenomenon by introducing the concept of $\delta$-weak eigenvalue condition numbers. In this work, we consider singular quadratic eigenvalue problems and two popular linearizations. Our results show that a correctly chosen linearization increases $\delta$-weak eigenvalue condition numbers only marginally, justifying the use of these linearizations in numerical solvers also in the singular case. We propose a very simple but often effective algorithm for computing well-conditioned eigenvalues of singular quadratic eigenvalue problems by adding small random perturbations to the coefficients. We prove that the eigenvalue condition number is, with high probability, a reliable criterion for detecting and excluding spurious eigenvalues created from the singular part.
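A sketch of the perturb-then-filter idea for a singular quadratic problem $Q(\lambda) = \lambda^2 M + \lambda C + K$: make the problem regular with a tiny random perturbation, solve a companion linearization, and discard eigenvalues whose (Wilkinson-style) condition numbers blow up. The example matrices, the perturbation size, and the filtering threshold are illustrative, not the paper's.

```python
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(5)
n, eps = 4, 1e-10

M, C, K = rng.standard_normal((3, n, n))
for coeff in (M, C, K):
    coeff[:, -1] = 0.0                          # common null vector: Q(lambda) is singular

def solve_qep(M, C, K):
    """First companion linearization A z = lambda B z of lambda^2 M + lambda C + K."""
    I, Z = np.eye(n), np.zeros((n, n))
    A = np.block([[Z, I], [-K, -C]])
    B = np.block([[I, Z], [Z, M]])
    lam, vl, vr = eig(A, B, left=True, right=True)
    cond = np.array([np.linalg.norm(vl[:, i]) * np.linalg.norm(vr[:, i])
                     / abs(vl[:, i].conj() @ B @ vr[:, i]) for i in range(2 * n)])
    return lam, cond

# regularize with a small random perturbation, then keep well-conditioned eigenvalues
Mp, Cp, Kp = [coeff + eps * rng.standard_normal((n, n)) for coeff in (M, C, K)]
lam, cond = solve_qep(Mp, Cp, Kp)
keep = cond < 1.0 / np.sqrt(eps)                # crude threshold flagging spurious eigenvalues
print("retained eigenvalues:\n", np.sort_complex(lam[keep]))
```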

The Monge-Amp\`ere equation is a fully nonlinear partial differential equation (PDE) of fundamental importance in analysis, geometry and in the applied sciences. In this paper we solve the Dirichlet problem associated with the Monge-Amp\`ere equation using neural networks and we show that an ansatz using deep input convex neural networks can be used to find the unique convex solution. As part of our analysis we study the effect of singularities, discontinuities and noise in the source function, we consider nontrivial domains, and we investigate how the method performs in higher dimensions. We also compare this method to an alternative approach in which standard feed-forward networks are used together with a loss function which penalizes lack of convexity.
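A condensed sketch of the input-convex-network ansatz (the architecture, collocation sampling, and loss weights are assumptions, not the paper's setup): nonnegative hidden-to-hidden weights and a convex nondecreasing activation keep $u$ convex, and the loss penalizes the residual of $\det D^2 u = f$ plus the boundary mismatch. The example uses $f \equiv 1$ and $g(x) = |x|^2/2$ on the unit square, whose convex solution is $u(x) = |x|^2/2$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    """Input convex neural network: u(x) is convex in x by construction."""
    def __init__(self, dim=2, width=32, depth=3):
        super().__init__()
        self.lin_x = nn.ModuleList([nn.Linear(dim, width) for _ in range(depth)])
        self.lin_z = nn.ModuleList([nn.Linear(width, width, bias=False) for _ in range(depth - 1)])
        self.out = nn.Linear(width, 1)
        self.act = nn.Softplus()                 # convex and nondecreasing

    def forward(self, x):
        z = self.act(self.lin_x[0](x))
        for lx, lz in zip(self.lin_x[1:], self.lin_z):
            z = self.act(lx(x) + F.linear(z, lz.weight.clamp(min=0)))   # nonnegative weights
        return F.linear(z, self.out.weight.clamp(min=0), self.out.bias)

def hessian_det(model, x):
    """Per-sample det(D^2 u) via two rounds of autograd."""
    x = x.requires_grad_(True)
    u = model(x).squeeze(-1)
    g = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    h0 = torch.autograd.grad(g[:, 0].sum(), x, create_graph=True)[0]
    h1 = torch.autograd.grad(g[:, 1].sum(), x, create_graph=True)[0]
    return h0[:, 0] * h1[:, 1] - h0[:, 1] * h1[:, 0]

model = ICNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1000):
    xi = torch.rand(256, 2)                      # interior collocation points
    xb = torch.rand(256, 2)                      # boundary points: pin one coordinate to 0 or 1
    xb[torch.arange(256), torch.randint(0, 2, (256,))] = torch.randint(0, 2, (256,)).float()
    loss = ((hessian_det(model, xi) - 1.0) ** 2).mean() \
           + 10.0 * ((model(xb).squeeze(-1) - 0.5 * (xb ** 2).sum(-1)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print("final residual loss:", float(loss))
```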

We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admit a scaling limit which is H\"older continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.
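For concreteness, a toy version of the setting (not the paper's experiments): a depth-$L$ residual network $h_{k+1} = h_k + \frac{1}{L}\tanh(h_k W_k)$ with constant width, trained by full-batch gradient descent with hand-written backpropagation, after which one can inspect the trained weight norms as a function of the layer index. The task, depth, width, and step size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
L, d, n, lr = 32, 8, 64, 0.1                      # depth, width, samples, step size
X = rng.standard_normal((n, d))
Y = np.sin(X)                                      # smooth toy regression target
W = [0.01 * rng.standard_normal((d, d)) for _ in range(L)]

def forward(X, W):
    hs = [X]
    for Wk in W:                                   # residual blocks h <- h + tanh(h W_k)/L
        hs.append(hs[-1] + np.tanh(hs[-1] @ Wk) / L)
    return hs

for step in range(500):                            # full-batch gradient descent
    hs = forward(X, W)
    delta = (hs[-1] - Y) / Y.size                  # d(loss)/dh_L for 0.5*mean squared error
    for k in reversed(range(L)):
        s = 1.0 - np.tanh(hs[k] @ W[k]) ** 2       # derivative of tanh at the preactivation
        gW = hs[k].T @ (s * delta) / L
        delta = delta + (s * delta) @ W[k].T / L   # backpropagate through the residual block
        W[k] -= lr * gW
    if step % 100 == 0:
        print(step, 0.5 * np.mean((hs[-1] - Y) ** 2))

print([round(float(np.linalg.norm(Wk)), 3) for Wk in W])   # weight norm vs. layer index
```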

Holonomic functions play an essential role in Computer Algebra since they allow the application of many symbolic algorithms. Among all algorithmic attempts to find formulas for power series, the holonomic property remains the most important requirement to be satisfied by the function under consideration. The functions targeted are mainly meromorphic. However, expressions like $\tan(z)$, $z/(\exp(z)-1)$, $\sec(z)$, etc., in particular reciprocals, quotients and compositions of holonomic functions, are generally not holonomic, and therefore their power series are inaccessible by the holonomic framework. From mathematical dictionaries, one can observe that most of the known closed-form formulas of non-holonomic power series involve another sequence whose evaluation depends on some finite summation. In the case of $\tan(z)$ and $\sec(z)$ the corresponding sequences are the Bernoulli and Euler numbers, respectively. Thus, providing a symbolic approach that yields complete representations when linear summations for power series coefficients of non-holonomic functions appear may be seen as a step towards the representation of non-holonomic power series. By adapting the method of ansatz with undetermined coefficients, we build an algorithm that computes least-order quadratic differential equations with polynomial coefficients for a large class of non-holonomic functions. A differential equation resulting from this procedure is converted into a recurrence equation by applying the Cauchy product formula and rewriting powers into polynomials and derivatives into shifts. Finally, using enough initial values, we are able to give normal form representations that characterize several non-holonomic power series and to prove non-trivial identities. We discuss this algorithm and its implementation for Maple 2022.
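A toy illustration of the ansatz-with-undetermined-coefficients step for $\tan(z)$, written in Python/SymPy rather than Maple (the real algorithm works with polynomial coefficients and searches orders systematically; here the coefficients are constants and the quadratic ansatz is fixed by hand): match truncated series coefficients and read the relation off the nullspace of the resulting linear system.

```python
import sympy as sp

z = sp.symbols('z')
y = sp.tan(z)
# quadratic ansatz c0 + c1*y + c2*y' + c3*y^2 + c4*y*y' = 0 with constant coefficients
basis = [sp.Integer(1), y, sp.diff(y, z), y ** 2, y * sp.diff(y, z)]

N = 12                                             # series truncation order
rows = [[sp.series(b, z, 0, N).removeO().coeff(z, k) for b in basis] for k in range(N)]
A = sp.Matrix(rows)

null = A.nullspace()                               # undetermined coefficients (c0, ..., c4)
print(null[0].T)                                   # a multiple of (-1, 0, 1, -1, 0)
```

The recovered relation is $y' = 1 + y^2$, the classical quadratic differential equation for $\tan$.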

The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations of differential equations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equation solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
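A very small sketch of one claim above, namely that residual networks are discretisations of (neural) differential equations: an explicit Euler step of $dh/dt = f_\theta(h)$ with step size $1/L$ is exactly one residual block. The vector field and dimensions below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
d, L = 4, 16
W = rng.standard_normal((d, d)) / np.sqrt(d)

def f_theta(h):
    return np.tanh(h @ W)                      # a simple "learned" vector field

def residual_net(h, L):
    for _ in range(L):                         # L residual blocks
        h = h + f_theta(h) / L
    return h

def euler_neural_ode(h, L):
    dt = 1.0 / L
    for _ in range(L):                         # explicit Euler on [0, 1]
        h = h + dt * f_theta(h)
    return h

h0 = rng.standard_normal(d)
print(np.allclose(residual_net(h0, L), euler_neural_ode(h0, L)))   # True: identical maps
```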
