亚洲精品无码国产爽快A片百度-国产一区二区三区日本韩国

We study the fundamental limits to the expressive power of neural networks. Given two sets $F$, $G$ of real-valued functions, we first prove a general lower bound on how well functions in $F$ can be approximated in $L^p(\mu)$ norm by functions in $G$, for any $p \geq 1$ and any probability measure $\mu$. The lower bound depends on the packing number of $F$, the range of $F$, and the fat-shattering dimension of $G$. We then instantiate this bound to the case where $G$ corresponds to a piecewise-polynomial feed-forward neural network, and describe in details the application to two sets $F$: H{\"o}lder balls and multivariate monotonic functions. Beside matching (known or new) upper bounds up to log factors, our lower bounds shed some light on the similarities or differences between approximation in $L^p$ norm or in sup norm, solving an open question by DeVore et al. (2021). Our proof strategy differs from the sup norm case and uses a key probability result of Mendelson (2002).

相關內容

Neural Networks

關注 1648

神經網絡（Neural Networks）是世界上三個最古老的神經建模學會的檔案期刊:國際神經網絡學會(INNS)、歐洲神經網絡學會(ENNS)和日本神經網絡學會(JNNS)。神經網絡提供了一個論壇，以發展和培育一個國際社會的學者和實踐者感興趣的所有方面的神經網絡和相關方法的計算智能。神經網絡歡迎高質量論文的提交，有助于全面的神經網絡研究，從行為和大腦建模，學習算法，通過數學和計算分析，系統的工程和技術應用，大量使用神經網絡的概念和技術。這一獨特而廣泛的范圍促進了生物和技術研究之間的思想交流，并有助于促進對生物啟發的計算智能感興趣的跨學科社區的發展。因此，神經網絡編委會代表的專家領域包括心理學，神經生物學，計算機科學，工程，數學，物理。該雜志發表文章、信件和評論以及給編輯的信件、社論、時事、軟件調查和專利信息。文章發表在五個部分之一:認知科學，神經科學，學習系統，數學和計算分析、工程和應用。官網地址：

通用近似器 · 近似 · 通用近似定理 · Learning · Continuity ·

2022 年 7 月 25 日

Universal Approximation Theorems for Differentiable Geometric Deep Learning

Anastasis Kratsios,Leonie Papon

from arxiv, Keywords: Geometric Deep Learning, Symmetric Positive-Definite Matrices, Hyperbolic Neural Networks, Deep Kalman Filter, Shape Space, Riemannian Manifolds, Curse of Dimensionality. Additional Information: 33 Pages + 30 Pages Appendix + Bibliography, 2 Tables, 7 Figures;

This paper addresses the growing need to process non-Euclidean data, by introducing a geometric deep learning (GDL) framework for building universal feedforward-type models compatible with differentiable manifold geometries. We show that our GDL models can approximate any continuous target function uniformly on compact sets of a controlled maximum diameter. We obtain curvature-dependent lower-bounds on this maximum diameter and upper-bounds on the depth of our approximating GDL models. Conversely, we find that there is always a continuous function between any two non-degenerate compact manifolds that any "locally-defined" GDL model cannot uniformly approximate. Our last main result identifies data-dependent conditions guaranteeing that the GDL model implementing our approximation breaks "the curse of dimensionality." We find that any "real-world" (i.e. finite) dataset always satisfies our condition and, conversely, any dataset satisfies our requirement if the target function is smooth. As applications, we confirm the universal approximation capabilities of the following GDL models: Ganea et al. (2018)'s hyperbolic feedforward networks, the architecture implementing Krishnan et al. (2015)'s deep Kalman-Filter, and deep softmax classifiers. We build universal extensions/variants of: the SPD-matrix regressor of Meyer et al. (2011), and Fletcher (2003)'s Procrustean regressor. In the Euclidean setting, our results imply a quantitative version of Kidger and Lyons (2020)'s approximation theorem and a data-dependent version of Yarotsky and Zhevnerchuk (2019)'s uncursed approximation rates.

優化器 · Neural Networks · 情景 · Minimax · Networking ·

2022 年 7 月 25 日

Optimal Convergence Rates of Deep Neural Networks in a Classification Setting

Joseph T. Meyer

We establish optimal convergence rates up to a log-factor for a class of deep neural networks in a classification setting under a restraint sometimes referred to as the Tsybakov noise condition. We construct classifiers in a general setting where the boundary of the bayes-rule can be approximated well by neural networks. Corresponding rates of convergence are proven with respect to the misclassification error. It is then shown that these rates are optimal in the minimax sense if the boundary satisfies a smoothness condition. Non-optimal convergence rates already exist for this setting. Our main contribution lies in improving existing rates and showing optimality, which was an open problem. Furthermore, we show almost optimal rates under some additional restraints which circumvent the curse of dimensionality. For our analysis we require a condition which gives new insight on the restraint used. In a sense it acts as a requirement for the "correct noise exponent" for a class of functions.

THz · Analysis · 優化器 · Microsoft Surface · 均方誤差 ·

2022 年 7 月 25 日

Beamforming Analysis and Design for Wideband THz Reconfigurable Intelligent Surface Communications

Wencai Yan,Wanming Hao,Chongwen Huang,Gangcan Sun,Osamu Muta,Haris Gacanin

Reconfigurable intelligent surface (RIS)-aided terahertz (THz) communications have been regarded as a promising candidate for future 6G networks because of its ultra-wide bandwidth and ultra-low power consumption. However, there exists the beam split problem, especially when the base station (BS) or RIS owns the large-scale antennas, which may lead to serious array gain loss. Therefore, in this paper, we investigate the beam split and beamforming design problems in the THz RIS communications. Specifically, we first analyze the beam split effect caused by different RIS sizes, shapes and deployments. On this basis, we apply the fully connected time delayer phase shifter hybrid beamforming architecture at the BS and deploy distributed RISs to cooperatively mitigate the beam split effect. We aim to maximize the achievable sum rate by jointly optimizing the hybrid analog/digital beamforming, time delays at the BS and reflection coefficients at the RISs. To solve the formulated problem, we first design the analog beamforming and time delays based on different RISs physical directions, and then it is transformed into an optimization problem by jointly optimizing the digital beamforming and reflection coefficients. Next, we propose an alternatively iterative optimization algorithm to deal with it. Specifically, for given the reflection coefficients, we propose an iterative algorithm based on the minimum mean square error technique to obtain the digital beamforming. After, we apply LDR and MCQT methods to transform the original problem to a QCQP, which can be solved by ADMM technique to obtain the reflection coefficients. Finally, the digital beamforming and reflection coefficients are obtained via repeating the above processes until convergence. Simulation results verify that the proposed scheme can effectively alleviate the beam split effect and improve the system capacity.

估計/估計量 · 推斷 · Networking · 欠估計 · INTERACT ·

2022 年 7 月 25 日

Causal Inference with Noncompliance and Unknown Interference

Tadao Hoshino,Takahide Yanagi

We consider a causal inference model in which individuals interact in a social network and they may not comply with the assigned treatments. Estimating causal parameters is challenging in the presence of network interference of unknown form, as each individual may be influenced by both close individuals and distant ones in complex ways. Noncompliance with treatment assignment further complicates this problem, and prior methods dealing with network spillovers but disregarding the noncompliance issue may underestimate the effect of the treatment receipt on the outcome. To estimate meaningful causal parameters, we introduce a new concept of exposure mapping, which summarizes potentially complicated spillover effects into a fixed dimensional statistic of instrumental variables. We investigate identification conditions for the intention-to-treat effect and the average causal effect for compliers, while explicitly considering the possibility of misspecification of exposure mapping. Based on our identification results, we develop nonparametric estimation procedures via inverse probability weighting. Their asymptotic properties, including consistency and asymptotic normality, are investigated using an approximate neighborhood interference framework, which is convenient for dealing with unknown forms of spillovers between individuals. For an empirical illustration, we apply our method to experimental data on the anti-conflict intervention school program.

近似 · Neural Networks · Networking · 泛函 · 平滑 ·

2022 年 7 月 22 日

Simultaneous Neural Network Approximation for Smooth Functions

Sean Hon,Haizhao Yang

We establish in this work approximation results of deep neural networks for smooth functions measured in Sobolev norms, motivated by recent development of numerical solvers for partial differential equations using deep neural networks. {Our approximation results are nonasymptotic in the sense that the error bounds are explicitly characterized in terms of both the width and depth of the networks simultaneously with all involved constants explicitly determined.} Namely, for $f\in C^s([0,1]^d)$, we show that deep ReLU networks of width $\mathcal{O}(N\log{N})$ and of depth $\mathcal{O}(L\log{L})$ can achieve a nonasymptotic approximation rate of $\mathcal{O}(N^{-2(s-1)/d}L^{-2(s-1)/d})$ with respect to the $\mathcal{W}^{1,p}([0,1]^d)$ norm for $p\in[1,\infty)$. If either the ReLU function or its square is applied as activation functions to construct deep neural networks of width $\mathcal{O}(N\log{N})$ and of depth $\mathcal{O}(L\log{L})$ to approximate $f\in C^s([0,1]^d)$, the approximation rate is $\mathcal{O}(N^{-2(s-n)/d}L^{-2(s-n)/d})$ with respect to the $\mathcal{W}^{n,p}([0,1]^d)$ norm for $p\in[1,\infty)$.

Tensor · 正則化項 · Neural Networks · Networking · 卷積 ·

2022 年 7 月 22 日

Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks

Noam Razin,Asaf Maman,Nadav Cohen

from arxiv, Accepted to ICML 2022

In the pursuit of explaining implicit regularization in deep learning, prominent focus was given to matrix and tensor factorizations, which correspond to simplified neural networks. It was shown that these models exhibit an implicit tendency towards low matrix and tensor ranks, respectively. Drawing closer to practical deep learning, the current paper theoretically analyzes the implicit regularization in hierarchical tensor factorization, a model equivalent to certain deep convolutional neural networks. Through a dynamical systems lens, we overcome challenges associated with hierarchy, and establish implicit regularization towards low hierarchical tensor rank. This translates to an implicit regularization towards locality for the associated convolutional networks. Inspired by our theory, we design explicit regularization discouraging locality, and demonstrate its ability to improve the performance of modern convolutional networks on non-local tasks, in defiance of conventional wisdom by which architectural changes are needed. Our work highlights the potential of enhancing neural networks via theoretical analysis of their implicit regularization.

Boosting（一種模型訓練加速方式） · 正交 · 設計矩陣 · 貪心逐層預訓練 · Projection ·

2022 年 7 月 21 日

High-Dimensional $L_2$Boosting: Rate of Convergence

Ye Luo,Martin Spindler,Jannis Kück

from arxiv, 19 pages, 4 tables; AMS 2000 subject classifications: Primary 62J05, 62J07, 41A25; secondary 49M15, 68Q32

Boosting is one of the most significant developments in machine learning. This paper studies the rate of convergence of $L_2$Boosting, which is tailored for regression, in a high-dimensional setting. Moreover, we introduce so-called \textquotedblleft post-Boosting\textquotedblright. This is a post-selection estimator which applies ordinary least squares to the variables selected in the first stage by $L_2$Boosting. Another variant is \textquotedblleft Orthogonal Boosting\textquotedblright\ where after each step an orthogonal projection is conducted. We show that both post-$L_2$Boosting and the orthogonal boosting achieve the same rate of convergence as LASSO in a sparse, high-dimensional setting. We show that the rate of convergence of the classical $L_2$Boosting depends on the design matrix described by a sparse eigenvalue constant. To show the latter results, we derive new approximation results for the pure greedy algorithm, based on analyzing the revisiting behavior of $L_2$Boosting. We also introduce feasible rules for early stopping, which can be easily implemented and used in applied work. Our results also allow a direct comparison between LASSO and boosting which has been missing from the literature. Finally, we present simulation studies and applications to illustrate the relevance of our theoretical results and to provide insights into the practical aspects of boosting. In these simulation studies, post-$L_2$Boosting clearly outperforms LASSO.

UniFormer · Processing（編程語言） · 超參數 · Performer · 情景 ·

2022 年 7 月 20 日

Gaussian Process Uniform Error Bounds with Unknown Hyperparameters for Safety-Critical Applications

Alexandre Capone,Armin Lederer,Sandra Hirche

Gaussian processes have become a promising tool for various safety-critical settings, since the posterior variance can be used to directly estimate the model error and quantify risk. However, state-of-the-art techniques for safety-critical settings hinge on the assumption that the kernel hyperparameters are known, which does not apply in general. To mitigate this, we introduce robust Gaussian process uniform error bounds in settings with unknown hyperparameters. Our approach computes a confidence region in the space of hyperparameters, which enables us to obtain a probabilistic upper bound for the model error of a Gaussian process with arbitrary hyperparameters. We do not require to know any bounds for the hyperparameters a priori, which is an assumption commonly found in related work. Instead, we are able to derive bounds from data in an intuitive fashion. We additionally employ the proposed technique to derive performance guarantees for a class of learning-based control problems. Experiments show that the bound performs significantly better than vanilla and fully Bayesian Gaussian processes.

圖形處理器 · 層 · 圖 · Neural Networks · Networking ·

2021 年 6 月 14 日

Training Graph Neural Networks with 1000 Layers

Guohao Li,Matthias Müller,Bernard Ghanem,Vladlen Koltun

from arxiv, Accepted at ICML'2021. Code available at //www.deepgcns.org/arch/gnn1000. Work done during Guohao Li's internship at Intel Intelligent Systems Lab

Deep graph neural networks (GNNs) have achieved excellent results on various tasks on increasingly large graph datasets with millions of nodes and edges. However, memory complexity has become a major obstacle when training deep GNNs for practical applications due to the immense number of nodes, edges, and intermediate activations. To improve the scalability of GNNs, prior works propose smart graph sampling or partitioning strategies to train GNNs with a smaller set of nodes or sub-graphs. In this work, we study reversible connections, group convolutions, weight tying, and equilibrium models to advance the memory and parameter efficiency of GNNs. We find that reversible connections in combination with deep network architectures enable the training of overparameterized GNNs that significantly outperform existing methods on multiple datasets. Our models RevGNN-Deep (1001 layers with 80 channels each) and RevGNN-Wide (448 layers with 224 channels each) were both trained on a single commodity GPU and achieve an ROC-AUC of $87.74 \pm 0.13$ and $88.14 \pm 0.15$ on the ogbn-proteins dataset. To the best of our knowledge, RevGNN-Deep is the deepest GNN in the literature by one order of magnitude. Please visit our project website //www.deepgcns.org/arch/gnn1000 for more information.

Networking · 殘差網絡 · 縮放 · Weight · 平滑 ·

2021 年 5 月 25 日

Scaling Properties of Deep Residual Networks

Alain-Sam Cohen,Rama Cont,Alain Rossier,Renyuan Xu

from arxiv, Published at ICML 2021

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.