We consider the approximation rates of shallow neural networks with respect to the variation norm. Upper bounds on these rates have been established for sigmoidal and ReLU activation functions, but it has remained an important open problem whether these rates are sharp. In this article, we solve this problem by proving sharp lower bounds on the approximation rates for shallow neural networks, obtained by lower bounding the $L^2$-metric entropy of the convex hull of the neural network basis functions. In addition, our methods give sharp lower bounds on the Kolmogorov $n$-widths of this convex hull, which show that the variation spaces corresponding to shallow neural networks cannot be efficiently approximated by linear methods. These lower bounds apply both to sigmoidal activation functions of bounded variation and to activation functions that are a power of the ReLU. Our results also quantify how much stronger the Barron spectral norm is than the variation norm and, combined with previous results, give the asymptotics of the $L^\infty$-metric entropy up to logarithmic factors in the case of the ReLU activation function.
Friedman's chi-square test is a non-parametric statistical test for $r\geq2$ treatments across $n\ge1$ trials to assess the null hypothesis that there is no treatment effect. We use Stein's method with an exchangeable pair coupling to derive an explicit bound on the distance between the distribution of Friedman's statistic and its limiting chi-square distribution, measured using smooth test functions. Our bound is of the optimal order $n^{-1}$, and also has an optimal dependence on the parameter $r$, in that the bound tends to zero if and only if $r/n\rightarrow0$. From this bound, we deduce a Kolmogorov distance bound that decays to zero under the weaker condition $r^{1/2}/n\rightarrow0$.
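For concreteness, the statistic itself is elementary to compute. The sketch below is ours, not the paper's; the helper name and the toy data are illustrative. It ranks the $r$ treatments within each of the $n$ trials and evaluates Friedman's chi-square statistic together with its asymptotic $\chi^2_{r-1}$ p-value.
\begin{verbatim}
import numpy as np
from scipy.stats import rankdata, chi2

def friedman_statistic(X):
    """Friedman's chi-square statistic for an (n trials) x (r treatments) array X.

    Ranks the treatments within each trial (average ranks for ties) and returns
    Q = 12 n / (r (r + 1)) * sum_j (Rbar_j - (r + 1) / 2)^2,
    which under the null of no treatment effect is asymptotically chi-square
    with r - 1 degrees of freedom.
    """
    n, r = X.shape
    ranks = np.apply_along_axis(rankdata, 1, X)   # within-trial ranks
    rbar = ranks.mean(axis=0)                     # average rank of each treatment
    q = 12.0 * n / (r * (r + 1)) * np.sum((rbar - (r + 1) / 2.0) ** 2)
    return q, chi2.sf(q, df=r - 1)

# toy example: n = 8 trials, r = 3 treatments, no true treatment effect
rng = np.random.default_rng(0)
q, p = friedman_statistic(rng.normal(size=(8, 3)))
\end{verbatim}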
We overcome two major bottlenecks in the study of low rank approximation by assuming the low rank factors themselves are sparse. Specifically, (1) for low rank approximation with spectral norm error, we show how to improve the best known $\mathsf{nnz}(\mathbf A) k / \sqrt{\varepsilon}$ running time to $\mathsf{nnz}(\mathbf A)/\sqrt{\varepsilon}$ running time plus low order terms depending on the sparsity of the low rank factors, and (2) for streaming algorithms with Frobenius norm error, we show how to bypass the known $\Omega(nk/\varepsilon)$ memory lower bound and obtain an $s k (\log n)/ \mathrm{poly}(\varepsilon)$ memory bound, where $s$ is the number of non-zeros of each low rank factor. Although this algorithm is inefficient, as it must be under standard complexity-theoretic assumptions, we also present polynomial time algorithms using $\mathrm{poly}(s,k,\log n,\varepsilon^{-1})$ memory that output rank $k$ approximations supported on an $O(sk/\varepsilon)\times O(sk/\varepsilon)$ submatrix. Both the prior $\mathsf{nnz}(\mathbf A) k / \sqrt{\varepsilon}$ running time and the $nk/\varepsilon$ memory for these problems were long-standing barriers; our results give a natural way of overcoming them assuming sparsity of the low rank factors.
We study the effect of mini-batching on the loss landscape of deep neural networks using spiked, field-dependent random matrix theory. We demonstrate that the magnitudes of the extremal values of the batch Hessian are larger than those of the empirical Hessian. We also derive similar results for the Generalised Gauss-Newton matrix approximation of the Hessian. As a consequence of our theorems, we derive analytical expressions for the maximal learning rates as a function of batch size, informing practical training regimens for both stochastic gradient descent (linear scaling) and adaptive algorithms, such as Adam (square root scaling), for smooth, non-convex deep neural networks. Whilst the linear scaling for stochastic gradient descent has been derived under more restrictive conditions, which we generalise, the square root scaling rule for adaptive optimisers is, to our knowledge, completely novel. We validate our claims on the VGG/WideResNet architectures on the CIFAR-$100$ and ImageNet datasets. Based on our investigations of the sub-sampled Hessian, we develop a stochastic-Lanczos-quadrature-based, on-the-fly learning-rate and momentum learner, which avoids the need for expensive multiple evaluations for these key hyper-parameters and shows good preliminary results on the Pre-Residual architecture for CIFAR-$100$.
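As a reading aid for the two scaling rules above, the following hypothetical helper (function and argument names are ours, not the paper's) rescales a learning rate tuned at a reference batch size: linearly for SGD and with the square root of the batch-size ratio for adaptive optimisers such as Adam.
\begin{verbatim}
def scaled_lr(base_lr, base_batch, batch, optimizer="sgd"):
    """Scale a reference learning rate (tuned at base_batch) to a new batch size.

    Illustrative only: linear scaling for SGD, square-root scaling for adaptive
    optimisers such as Adam, as discussed in the abstract above.
    """
    ratio = batch / base_batch
    if optimizer == "sgd":
        return base_lr * ratio          # linear scaling rule
    if optimizer == "adam":
        return base_lr * ratio ** 0.5   # square-root scaling rule
    raise ValueError(f"unknown optimizer: {optimizer}")

# e.g. a rate tuned at batch size 128, moved to batch size 512
lr_sgd = scaled_lr(0.1, 128, 512, "sgd")     # 0.4
lr_adam = scaled_lr(1e-3, 128, 512, "adam")  # 2e-3
\end{verbatim}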
Given access to a single long trajectory generated by an unknown irreducible Markov chain $M$, we simulate an $\alpha$-lazy version of $M$, which is ergodic. This enables us to generalize recent results on estimation and identity testing that were stated for ergodic Markov chains in a way that allows fully empirical inference. In particular, our approach shows that the pseudo spectral gap introduced by Paulin [2015] and defined for ergodic Markov chains can already be given meaning in the case of irreducible, but possibly periodic, Markov chains.
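The simulation idea admits a very short sketch (ours, under our own simplifications; it assumes the observed trajectory is a list of states and $\alpha\in(0,1)$): at each step, either emit a lazy self-loop with probability $\alpha$ or consume the next observed transition of $M$, so the output is distributed as a trajectory of the ergodic $\alpha$-lazy chain.
\begin{verbatim}
import numpy as np

def lazy_trajectory(path, alpha, rng=None):
    """Turn one trajectory of an irreducible chain M into a trajectory of
    the alpha-lazy chain (stay put w.p. alpha, otherwise move as M).

    Minimal sketch of the coupling idea; names are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    out, i = [path[0]], 0
    while i + 1 < len(path):
        if rng.random() < alpha:
            out.append(out[-1])   # lazy self-loop
        else:
            i += 1                # use the next observed transition of M
            out.append(path[i])
    return out
\end{verbatim}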
We consider approximation rates of sparsely connected deep rectified linear unit (ReLU) and rectified power unit (RePU) neural networks for functions in Besov spaces $B^\alpha_{q}(L^p)$ in arbitrary dimension $d$, on general domains. We show that deep rectifier networks with a fixed activation function attain optimal or near-optimal approximation rates for functions in the Besov space $B^\alpha_{\tau}(L^\tau)$ on the critical embedding line $1/\tau=\alpha/d+1/p$ for \emph{arbitrary} smoothness order $\alpha>0$. Using interpolation theory, this implies that the entire range of smoothness classes at or above the critical line is (near-)optimally approximated by deep ReLU/RePU networks.
In this paper we present results on asymptotic characteristics of multivariate function classes in the uniform norm. Our main interest is the approximation of functions with mixed smoothness parameter not larger than $1/2$. Our focus will be on the behavior of the best $m$-term trigonometric approximation as well as the decay of Kolmogorov and entropy numbers in the uniform norm. It turns out that these quantities share a few fundamental abstract properties, such as their behavior under real interpolation, so that they can be treated simultaneously. We start by proving estimates on finite rank convolution operators with range in a step hyperbolic cross. These results imply bounds for the corresponding function space embeddings by a well-known decomposition technique. The decay of the Kolmogorov numbers has direct implications for the problem of sampling recovery in $L_2$ in situations where recent results in the literature are not applicable because the corresponding approximation numbers are not square summable.
We consider neural network approximation spaces that classify functions according to the rate at which they can be approximated (with error measured in $L^p$) by ReLU neural networks with an increasing number of coefficients, subject to bounds on the magnitude of the coefficients and the number of hidden layers. We prove embedding theorems between these spaces for different values of $p$. Furthermore, we derive sharp embeddings of these approximation spaces into H\"older spaces. We find that, analogously to the case of classical function spaces (such as Sobolev or Besov spaces), it is possible to trade "smoothness" (i.e., approximation rate) for increased integrability. Combined with our earlier results in [arXiv:2104.02746], our embedding theorems imply a somewhat surprising fact related to "learning" functions from a given neural network space based on point samples: if accuracy is measured with respect to the uniform norm, then an optimal "learning" algorithm for reconstructing functions that are well approximable by ReLU neural networks is simply given by piecewise constant interpolation on a tensor product grid.
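To make the last point concrete, a piecewise-constant reconstruction from point samples on a tensor-product grid takes only a few lines. This is a minimal illustration with our own naming; the domain $[0,1]^d$ and the sampling at cell centres are assumptions, not the paper's exact setup.
\begin{verbatim}
import numpy as np

def piecewise_constant_interpolant(f, d, n_per_dim):
    """Sample f at the centres of a uniform tensor-product grid on [0,1]^d and
    return the piecewise-constant (cell-wise) reconstruction."""
    grid_1d = (np.arange(n_per_dim) + 0.5) / n_per_dim          # cell centres
    mesh = np.stack(np.meshgrid(*([grid_1d] * d), indexing="ij"), axis=-1)
    values = np.apply_along_axis(f, -1, mesh)                   # point samples of f

    def f_hat(x):
        idx = np.clip((np.asarray(x) * n_per_dim).astype(int), 0, n_per_dim - 1)
        return values[tuple(idx)]                               # constant on each cell
    return f_hat

# usage: reconstruct a 2-d function from 32 x 32 point samples
f_hat = piecewise_constant_interpolant(lambda x: np.sin(x[0] + x[1]), d=2, n_per_dim=32)
\end{verbatim}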
This paper derives confidence intervals (CI) and time-uniform confidence sequences (CS) for the classical problem of estimating an unknown mean from bounded observations. We present a general approach for deriving concentration bounds that can be seen as a generalization (and improvement) of the celebrated Chernoff method. At its heart, it is based on deriving a new class of composite nonnegative martingales, with strong connections to testing by betting and the method of mixtures. We show how to extend these ideas to sampling without replacement, another heavily studied problem. In all cases, our bounds are adaptive to the unknown variance, and empirically vastly outperform existing approaches based on Hoeffding or empirical Bernstein inequalities and their recent supermartingale generalizations. In short, we establish a new state-of-the-art for four fundamental problems: CSs and CIs for bounded means, when sampling with and without replacement.
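To illustrate the betting viewpoint, here is a minimal sketch under our own simplifications (a fixed, truncated bet rather than the paper's adaptive, variance-tuned strategies): for each candidate mean $m$, run a nonnegative wealth process, and, by Ville's inequality, reject $m$ once its wealth exceeds $1/\alpha$; the candidates never rejected form a time-uniform confidence sequence.
\begin{verbatim}
import numpy as np

def betting_cs(xs, alpha=0.05, grid=400, c=0.5):
    """Time-uniform confidence set for the mean of [0,1]-valued data (sketch).

    For each candidate mean m, the 'hedged' wealth
      W_t(m) = 0.5*prod(1 + lam_p*(x - m)) + 0.5*prod(1 - lam_m*(x - m))
    is a nonnegative martingale when m is the true mean, so by Ville's
    inequality it exceeds 1/alpha with probability at most alpha.
    """
    ms = np.linspace(1e-3, 1 - 1e-3, grid)
    lam_p = np.minimum(c, 0.5 / ms)        # truncated bets keep wealth positive
    lam_m = np.minimum(c, 0.5 / (1 - ms))
    logw_p = np.zeros_like(ms)
    logw_m = np.zeros_like(ms)
    alive = np.ones_like(ms, dtype=bool)
    for x in xs:
        logw_p += np.log1p(lam_p * (x - ms))
        logw_m += np.log1p(-lam_m * (x - ms))
        wealth = 0.5 * np.exp(logw_p) + 0.5 * np.exp(logw_m)
        alive &= wealth < 1.0 / alpha      # reject m once wealth crosses 1/alpha
    return ms[alive].min(), ms[alive].max()

rng = np.random.default_rng(1)
lo, hi = betting_cs(rng.beta(2, 5, size=500))   # true mean 2/7 ~ 0.286
\end{verbatim}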
This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (robustness to bounded norm adversarial perturbations, for example). Most previous work on this topic was limited in its applicability by the size of the network, the network architecture, and the complexity of the properties to be verified. In contrast, our framework applies to a general class of activation functions and specifications on neural network inputs and outputs. We formulate verification as an optimization problem (seeking to find the largest violation of the specification) and solve a Lagrangian relaxation of the optimization problem to obtain an upper bound on the worst case violation of the specification being verified. Our approach is anytime, i.e., it can be stopped at any time and a valid bound on the maximum violation can be obtained. We develop specialized verification algorithms with provable tightness guarantees under special assumptions and demonstrate the practical significance of our general verification approach on a variety of verification tasks.
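As a toy instance of the weak-duality principle behind this (our own simplified setup: a single hidden layer, a linear specification $c^\top \mathrm{relu}(Wx+b)$, and box-constrained inputs; not the paper's general framework), any choice of Lagrange multipliers already yields a valid upper bound, and optimising the multipliers only tightens it, which is what makes the approach anytime.
\begin{verbatim}
import numpy as np

def dual_upper_bound(W, b, c, l, u, lam):
    """Valid upper bound on  max_{l <= x <= u}  c^T relu(W x + b)
    via a Lagrangian relaxation of the coupling constraint z = W x + b."""
    Wp, Wm = np.maximum(W, 0.0), np.minimum(W, 0.0)
    zl = Wp @ l + Wm @ u + b          # interval bounds on the pre-activations
    zu = Wp @ u + Wm @ l + b
    # decoupled maximisation over x in the box of (lam^T W) x + lam^T b
    wx = lam @ W
    term_x = np.maximum(wx, 0.0) @ u + np.minimum(wx, 0.0) @ l + lam @ b
    # coordinate-wise max over z in [zl, zu] of c_i*relu(z_i) - lam_i*z_i,
    # a piecewise-linear function: check the endpoints and the kink at 0
    h = lambda z: c * np.maximum(z, 0.0) - lam * z
    cand = np.maximum(h(zl), h(zu))
    cand = np.where((zl < 0) & (zu > 0), np.maximum(cand, 0.0), cand)
    return term_x + cand.sum()

rng = np.random.default_rng(0)
W, b, c = rng.normal(size=(5, 3)), rng.normal(size=5), rng.normal(size=5)
l, u = -np.ones(3), np.ones(3)
bound0 = dual_upper_bound(W, b, c, l, u, np.zeros(5))  # lam = 0: interval-style bound
\end{verbatim}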
We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient, and is closely related to optimism-based and count-based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action pair and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
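The recipe in the abstract (bonus-augmented rewards, a Bellman equation, a Boltzmann policy at the risk-seeking temperature) can be sketched schematically as below. Everything here is an illustrative stand-in of ours: the specific bonus, the exact form of the Bellman equation, and the temperature schedule are in the paper, and the soft log-sum-exp backup is our own simplification.
\begin{verbatim}
import numpy as np

def k_learning_sketch(P, r, bonus, tau, horizon):
    """Schematic finite-horizon backup in the spirit of the recipe above.

    P: (S, A, S) transition probabilities; r, bonus: (S, A) arrays; tau > 0.
    Adds the exploration bonus to the reward, runs a soft Bellman backup,
    and returns a Boltzmann policy at temperature tau.
    """
    S, A = r.shape
    K = np.zeros((horizon + 1, S, A))
    for t in range(horizon - 1, -1, -1):
        # soft value of the next state: tau * log-sum-exp of its K-values
        v_next = tau * np.log(np.sum(np.exp(K[t + 1] / tau), axis=1))  # (S,)
        K[t] = r + bonus + P @ v_next
    # Boltzmann exploration policy at the first time step
    logits = K[0] / tau
    policy = np.exp(logits - logits.max(axis=1, keepdims=True))
    policy /= policy.sum(axis=1, keepdims=True)
    return K, policy
\end{verbatim}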