In this article we identify a general class of high-dimensional continuous functions that can be approximated by deep neural networks (DNNs) with the rectified linear unit (ReLU) activation without the curse of dimensionality. In other words, the number of DNN parameters grows at most polynomially in the input dimension and in the reciprocal of the approximation error. The functions in our class can be expressed as a potentially unbounded number of compositions of special functions, which include products, maxima, and certain parallelized Lipschitz continuous functions.
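As a small illustration of why maxima fit naturally into such a class, the identity max(x, y) = x + ReLU(y - x) represents a maximum exactly with a single ReLU unit. The sketch below is not taken from the article; the function names are hypothetical and chosen only for illustration. Products, by contrast, are not piecewise linear and can only be approximated by ReLU networks, so they are not shown.

```python
def relu(x: float) -> float:
    """Rectified linear unit."""
    return x if x > 0.0 else 0.0


def relu_max(x: float, y: float) -> float:
    """max(x, y) written with one ReLU unit: x + ReLU(y - x)."""
    return x + relu(y - x)


def relu_identity(x: float) -> float:
    """The identity map written with two ReLU units: ReLU(x) - ReLU(-x).

    This is how a ReLU network can pass the value x forward unchanged.
    """
    return relu(x) - relu(-x)


if __name__ == "__main__":
    print(relu_max(2.0, 5.0))
    print(relu_identity(-2.5))
```

Because both identities are exact, composing them any number of times introduces no approximation error from the maxima themselves.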

Related content

Deep neural networks

A deep neural network (DNN) is a framework for deep learning: a neural network with at least one hidden layer. Like shallow neural networks, deep neural networks can model complex nonlinear systems, but the additional layers provide higher levels of abstraction and thereby increase the model's capacity.

Lipschitz · Lipschitz continuity · Continuity · Networking · Functional

May 30, 2023

Some Fundamental Aspects about Lipschitz Continuity of Neural Network Functions

Grigory Khromov, Sidak Pal Singh

Lipschitz continuity is a simple yet crucial functional property of any predictive model for it lies at the core of the model's robustness, generalisation, as well as adversarial vulnerability. Our aim is to thoroughly investigate and characterise the Lipschitz behaviour of the functions realised by neural networks. Thus, we carry out an empirical investigation in a range of different settings (namely, architectures, losses, optimisers, label noise, and more) by exhausting the limits of the simplest and the most general lower and upper bounds. Although motivated primarily by computational hardness results, this choice nevertheless turns out to be rather resourceful and sheds light on several fundamental and intriguing traits of the Lipschitz continuity of neural network functions, which we also supplement with suitable theoretical arguments. As a highlight of this investigation, we identify a striking double descent trend in both upper and lower bounds to the Lipschitz constant with increasing network width -- which tightly aligns with the typical double descent trend in the test loss. Lastly, we touch upon the seeming (counter-intuitive) decline of the Lipschitz constant in the presence of label noise.
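The simplest bounds of this kind can be sketched for a one-hidden-layer ReLU network with scalar output. The following is an illustrative sketch rather than the authors' code: it assumes the upper bound is the product of the layers' spectral norms and the lower bound is the largest input-gradient norm over a set of sample points. All function names are hypothetical, and the spectral norm is only estimated by power iteration.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def spectral_norm(W, iters=200):
    """Estimate the largest singular value of W by power iteration on W^T W.

    Assumption: the all-ones start vector is not orthogonal to the top
    singular vector; the estimate approaches the true norm from below.
    """
    Wt = transpose(W)
    v = [1.0] * len(W[0])
    for _ in range(iters):
        u = matvec(Wt, matvec(W, v))
        norm = math.sqrt(sum(c * c for c in u))
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in u]
    return math.sqrt(sum(c * c for c in matvec(W, v)))


def grad_norm(W1, b1, w2, x):
    """Euclidean norm of the input gradient of f(x) = w2 . relu(W1 x + b1)."""
    pre = [p + b for p, b in zip(matvec(W1, x), b1)]
    gated = [w if p > 0 else 0.0 for w, p in zip(w2, pre)]
    g = matvec(transpose(W1), gated)
    return math.sqrt(sum(c * c for c in g))


def lipschitz_bounds(W1, b1, w2, samples):
    """Return (lower, upper) bounds on the Lipschitz constant of f."""
    lower = max(grad_norm(W1, b1, w2, x) for x in samples)
    upper = spectral_norm(W1) * math.sqrt(sum(w * w for w in w2))
    return lower, upper


if __name__ == "__main__":
    W1 = [[1.0, 0.0], [0.0, 2.0]]
    print(lipschitz_bounds(W1, [0.0, 0.0], [1.0, 1.0], [[1.0, 1.0], [-1.0, 1.0]]))
```

The lower bound only sees the sample points supplied, so the gap between the two numbers reflects both the looseness of the norm product and the coverage of the samples.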

Learning · Networking · Neural Networks · Convolution · Machine Learning

May 30, 2023

Quantum Convolutional Neural Networks for Multi-Channel Supervised Learning

Anthony M. Smaldone, Gregory W. Kyro, Victor S. Batista

As the rapidly evolving field of machine learning continues to produce incredibly useful tools and models, the potential for quantum computing to speed up machine learning algorithms is becoming increasingly attractive. In particular, quantum circuits in place of classical convolutional filters for image detection-based tasks are being investigated for the ability to exploit quantum advantage. However, these attempts, referred to as quantum convolutional neural networks (QCNNs), lack the ability to efficiently process data with multiple channels and therefore are limited to relatively simple inputs. In this work, we present a variety of hardware-adaptable quantum circuit ansatzes for use as convolutional kernels, and demonstrate that the quantum neural networks we report outperform existing QCNNs on classification tasks involving multi-channel data. We envision that the ability of these implementations to effectively learn inter-channel information will allow quantum machine learning methods to operate with more complex data. This work is available as open source at https://github.com/anthonysmaldone/QCNN-Multi-Channel-Supervised-Learning.

Scenario · Cost · Analysis · CASE · Knowledge

May 30, 2023

Tight Data Access Bounds for Private Top-$k$ Selection

Hao Wu, Olga Ohrimenko, Anthony Wirth

We study the top-$k$ selection problem under the differential privacy model: $m$ items are rated according to votes of a set of clients. We consider a setting in which algorithms can retrieve data via a sequence of accesses, each either a random access or a sorted access; the goal is to minimize the total number of data accesses. Our algorithm requires only $O(\sqrt{mk})$ expected accesses: to our knowledge, this is the first sublinear data-access upper bound for this problem. Our analysis also shows that the well-known exponential mechanism requires only $O(\sqrt{m})$ expected accesses. Accompanying this, we develop the first lower bounds for the problem, in three settings: only random accesses; only sorted accesses; a sequence of accesses of either kind. We show that, to avoid $\Omega(m)$ access cost, supporting *both* kinds of access is necessary, and that in this case our algorithm's access cost is optimal.
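For readers unfamiliar with the exponential mechanism mentioned above, here is a minimal sketch of its textbook form for selecting a single item: item i is chosen with probability proportional to exp(epsilon * score_i / (2 * sensitivity)). This direct version reads all m scores and therefore does not reproduce the sublinear access bounds described in the abstract; the function names and the default sensitivity of 1 are assumptions made for illustration.

```python
import math
import random


def exponential_mechanism_probs(scores, epsilon, sensitivity=1.0):
    """Selection probabilities of the textbook exponential mechanism."""
    top = max(scores)  # subtracting the maximum keeps exp() from overflowing
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(scores, epsilon, sensitivity=1.0, rng=None):
    """Sample one item index according to the probabilities above."""
    rng = rng or random.Random()
    probs = exponential_mechanism_probs(scores, epsilon, sensitivity)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]


if __name__ == "__main__":
    votes = [3, 10, 7]
    print(exponential_mechanism_probs(votes, epsilon=1.0))
    print(exponential_mechanism(votes, epsilon=1.0, rng=random.Random(0)))
```

Larger values of epsilon concentrate the probability mass on the highest-scoring item, at the cost of a weaker privacy guarantee.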

INTERACT · Lipschitz · Approximation · Analysis · Reducible

May 29, 2023

Convergence analysis of an explicit method and its random batch approximation for the McKean-Vlasov equations with non-globally Lipschitz conditions

Qian Guo, Jie He, Lei Li

In this paper, we present a numerical approach to solve the McKean-Vlasov equations, which are distribution-dependent stochastic differential equations, under some non-globally Lipschitz conditions for both the drift and diffusion coefficients. We establish a propagation of chaos result, based on which the McKean-Vlasov equation is approximated by an interacting particle system. A truncated Euler scheme is then proposed for the interacting particle system allowing for a Khasminskii-type condition on the coefficients. To reduce the computational cost, the random batch approximation proposed in [Jin et al., J. Comput. Phys., 400(1), 2020] is extended to the interacting particle system where the interaction could take place in the diffusion term. A convergence order of almost one half is proved in the $L^p$ sense. Numerical tests are performed to verify the theoretical results.
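The random batch idea can be sketched for a one-dimensional interacting particle system. The code below is a simplified illustration, not the scheme analysed in the paper: it uses a plain Euler-Maruyama step without truncation, assumes a constant diffusion coefficient `sigma`, and places the interaction only in the drift. The function name and parameters are hypothetical.

```python
import math
import random


def random_batch_euler_step(xs, kernel, dt, sigma, batch_size, rng):
    """One Euler-Maruyama step where particles interact only within random batches.

    xs         : current particle positions (list of floats)
    kernel     : interaction kernel K, applied to x_i - x_j
    batch_size : number of particles per random batch
    """
    n = len(xs)
    idx = list(range(n))
    rng.shuffle(idx)  # fresh random partition at every step
    new_xs = list(xs)
    for start in range(0, n, batch_size):
        batch = idx[start:start + batch_size]
        for i in batch:
            others = [j for j in batch if j != i]
            # Average interaction over the batch instead of all n - 1 particles.
            drift = (
                sum(kernel(xs[i] - xs[j]) for j in others) / len(others)
                if others
                else 0.0
            )
            noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            new_xs[i] = xs[i] + drift * dt + noise
    return new_xs


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [rng.gauss(0.0, 1.0) for _ in range(8)]
    for _ in range(10):
        particles = random_batch_euler_step(
            particles, lambda d: -d, dt=0.01, sigma=0.1, batch_size=2, rng=rng
        )
    print(particles)
```

With batches of size p, each step costs on the order of n * p kernel evaluations instead of n * n, which is the source of the computational saving.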

Robustness · Learning · Linear · Scenario · Loss function (machine learning)

May 29, 2023

Robust Methods for High-Dimensional Linear Learning

Ibrahim Merad, Stéphane Gaïffas

from arXiv, accepted version

We propose statistically robust and computationally efficient linear learning methods in the high-dimensional batch setting, where the number of features $d$ may exceed the sample size $n$. We employ, in a generic learning setting, two algorithms depending on whether the considered loss function is gradient-Lipschitz or not. Then, we instantiate our framework on several applications including vanilla sparse, group-sparse and low-rank matrix recovery. This leads, for each application, to efficient and robust learning algorithms that reach near-optimal estimation rates under heavy-tailed distributions and the presence of outliers. For vanilla $s$-sparsity, we are able to reach the $s\log (d)/n$ rate under heavy tails and $\eta$-corruption, at a computational cost comparable to that of non-robust analogs. We provide an efficient implementation of our algorithms in an open-source $\mathtt{Python}$ library called $\mathtt{linlearn}$, with which we carry out numerical experiments that confirm our theoretical findings, together with a comparison to other recent approaches proposed in the literature.
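To give a feel for what robustness to heavy tails and outliers means in practice, the sketch below implements median-of-means, a standard robust mean estimator. The abstract does not say which estimators the paper uses, so this choice is an assumption made purely for illustration, and the code does not use or mimic the `linlearn` API.

```python
import statistics


def median_of_means(values, num_blocks):
    """Median-of-means estimate of the mean of `values`.

    The data are split into `num_blocks` consecutive blocks of equal size
    (any remainder is dropped), each block is averaged, and the median of
    the block means is returned. In practice the data would be shuffled first.
    """
    if not 1 <= num_blocks <= len(values):
        raise ValueError("num_blocks must be between 1 and len(values)")
    block_size = len(values) // num_blocks
    means = [
        sum(values[b * block_size:(b + 1) * block_size]) / block_size
        for b in range(num_blocks)
    ]
    return statistics.median(means)


if __name__ == "__main__":
    data = [1.0] * 9 + [1000.0]  # one gross outlier
    print(sum(data) / len(data))     # plain mean, dominated by the outlier
    print(median_of_means(data, 5))  # robust estimate
```

A single corrupted value can spoil at most one block mean, so the median stays close to the bulk of the data as long as fewer than half of the blocks are contaminated.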

Width · Universal approximator · UniFormer · Minimum point · Networking

May 29, 2023

Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal Approximation

Li'ang Li, Yifei Duan, Guanghua Ji, Yongqiang Cai

from arXiv, ICML2023 camera ready

The study of universal approximation properties (UAP) for neural networks (NN) has a long history. When the network width is unlimited, only a single hidden layer is sufficient for UAP. In contrast, when the depth is unlimited, the width for UAP needs to be not less than the critical width $w^*_{\min}=\max(d_x,d_y)$, where $d_x$ and $d_y$ are the dimensions of the input and output, respectively. Recent work [cai2022achieve] shows that a leaky-ReLU NN with this critical width can achieve UAP for $L^p$ functions on a compact domain $K$, i.e., the UAP for $L^p(K,\mathbb{R}^{d_y})$. This paper examines a uniform UAP for the function class $C(K,\mathbb{R}^{d_y})$ and gives the exact minimum width of the leaky-ReLU NN as $w_{\min}=\max(d_x+1,d_y)+1_{d_y=d_x+1}$, which involves the effects of the output dimensions. To obtain this result, we propose a novel lift-flow-discretization approach that shows that the uniform UAP has a deep connection with topological theory.
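The two width formulas in the abstract are easy to evaluate for concrete dimensions. The helper names below are hypothetical; the formulas themselves are copied from the abstract, with $1_{d_y=d_x+1}$ read as an indicator that equals 1 when $d_y = d_x + 1$ and 0 otherwise.

```python
def critical_width_lp(d_x: int, d_y: int) -> int:
    """Critical width for L^p approximation: max(d_x, d_y)."""
    return max(d_x, d_y)


def min_width_uniform(d_x: int, d_y: int) -> int:
    """Minimum width for uniform approximation with leaky-ReLU networks:
    max(d_x + 1, d_y) + 1 if d_y == d_x + 1, else max(d_x + 1, d_y).
    """
    indicator = 1 if d_y == d_x + 1 else 0
    return max(d_x + 1, d_y) + indicator


if __name__ == "__main__":
    for d_x, d_y in [(1, 1), (1, 2), (2, 5), (3, 2)]:
        print(d_x, d_y, critical_width_lp(d_x, d_y), min_width_uniform(d_x, d_y))
```

For example, with $d_x = 1$ and $d_y = 2$ the indicator is active, so the uniform case needs width 3 while the $L^p$ critical width is 2.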

Approximation · Functional · Networking · Neural Networks · Estimation/Estimator

May 28, 2023

Efficient Parametric Approximations of Neural Network Function Space Distance

Nikita Dhawan, Sicong Huang, Juhan Bae, Roger Grosse

from arXiv, 18 pages, 5 figures, ICML 2023

It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset. As a specific case, we consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks. We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks. The key idea is to approximate the architecture as a linear network with stochastic gating. Despite requiring only one parameter per unit of the network, our approach outcompetes other parametric approximations with larger memory requirements. Applied to continual learning, our parametric approximation is competitive with state-of-the-art nonparametric approximations, which require storing many training examples. Furthermore, we show its efficacy in estimating influence functions accurately and detecting mislabeled examples without expensive iterations over the entire dataset.
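The quantity being approximated can be written down directly when the full training set is available. The sketch below computes an empirical function space distance as the average squared Euclidean difference between two models' outputs; the choice of squared Euclidean discrepancy and the function name are assumptions made for illustration, and the code does not implement the LAFTR approximation itself.

```python
def function_space_distance(f, g, inputs):
    """Average squared output difference between models f and g over `inputs`.

    f and g map an input to a sequence of output values of equal length.
    """
    total = 0.0
    for x in inputs:
        total += sum((a - b) ** 2 for a, b in zip(f(x), g(x)))
    return total / len(inputs)


if __name__ == "__main__":
    model_a = lambda x: [x, 2.0 * x]
    model_b = lambda x: [x + 1.0, 2.0 * x]
    print(function_space_distance(model_a, model_b, [0.0, 1.0, 2.0]))
```

This direct computation needs every training input at evaluation time, which is exactly the storage and iteration cost that a compact parametric approximation aims to avoid.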

Scenario · Global minimum · Minimum · Training error · Functional

May 26, 2023

Highly over-parameterized classifiers generalize since bad solutions are rare

Julius Martinetz, Thomas Martinetz

We study over-parameterized classifiers where Empirical Risk Minimization (ERM) for learning leads to zero training error. In these over-parameterized settings there are many global minima with zero training error, some of which generalize better than others. We show that under certain conditions the fraction of "bad" global minima with a true error larger than $\epsilon$ decays to zero exponentially fast with the number of training samples $n$. The bound depends on the distribution of the true error over the set of classifier functions used for the given classification problem, and does not necessarily depend on the size or complexity (e.g. the number of parameters) of the classifier function set. This might explain the unexpectedly good generalization even of highly over-parameterized Neural Networks. We validate our mathematical framework with experiments on a synthetic data set and a subset of MNIST, and also test our hypothesis with VGG19 and ResNet18 on a subset of Caltech101.

Learning · Networking · Kernelization · Layer · Neural Networks

May 25, 2023

The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks

Blake Bordelon, Cengiz Pehlevan

from arXiv, ICLR 2023 Camera Ready

It is unclear how changing the learning rule of a deep neural network alters its learning dynamics and representations. To gain insight into the relationship between learned features, function approximation, and the learning rule, we analyze infinite-width deep networks trained with gradient descent (GD) and biologically-plausible alternatives including feedback alignment (FA), direct feedback alignment (DFA), and error modulated Hebbian learning (Hebb), as well as gated linear networks (GLN). We show that, for each of these learning rules, the evolution of the output function at infinite width is governed by a time-varying effective neural tangent kernel (eNTK). In the lazy training limit, this eNTK is static and does not evolve, while in the rich mean-field regime this kernel's evolution can be determined self-consistently with dynamical mean field theory (DMFT). This DMFT enables comparisons of the feature and prediction dynamics induced by each of these learning rules. In the lazy limit, we find that DFA and Hebb can only learn using the last layer features, while full FA can utilize earlier layers with a scale determined by the initial correlation between feedforward and feedback weight matrices. In the rich regime, DFA and FA utilize a temporally evolving and depth-dependent NTK. Counterintuitively, we find that FA networks trained in the rich regime exhibit more feature learning if initialized with smaller correlation between the forward and backward pass weights. GLNs admit a very simple formula for their lazy limit kernel and preserve conditional Gaussianity of their preactivations under gating functions. Error modulated Hebb rules show very small task-relevant alignment of their kernels and perform most task-relevant learning in the last layer.
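The difference between gradient descent and feedback alignment can be made concrete on a tiny finite-width network. The sketch below assumes a two-layer ReLU network with scalar output and squared loss, which is far simpler than the infinite-width setting analysed in the paper. In GD the error is sent backwards through the forward weights `w2`; in FA it is sent through a fixed feedback vector instead. All names are hypothetical.

```python
def first_layer_update(W1, w2, x, y, feedback=None):
    """Update direction for the first-layer weights W1.

    Network: output = w2 . relu(W1 x), loss = 0.5 * (output - y) ** 2.
    feedback=None  -> gradient descent (backward pass uses w2).
    feedback=[...] -> feedback alignment (backward pass uses the fixed vector).
    The returned matrix is what would be scaled by the learning rate and
    subtracted from W1.
    """
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    hidden = [p if p > 0 else 0.0 for p in pre]
    output = sum(w * h for w, h in zip(w2, hidden))
    delta = output - y  # derivative of the loss with respect to the output
    back = w2 if feedback is None else feedback
    # Error signal at each hidden unit, gated by the ReLU derivative.
    signal = [b * delta * (1.0 if p > 0 else 0.0) for b, p in zip(back, pre)]
    return [[s * xi for xi in x] for s in signal]


if __name__ == "__main__":
    W1 = [[1.0, 0.0], [0.0, 1.0]]
    w2 = [2.0, -1.0]
    x, y = [1.0, 1.0], 0.0
    print(first_layer_update(W1, w2, x, y))                        # GD
    print(first_layer_update(W1, w2, x, y, feedback=[1.0, 1.0]))   # FA
```

When the feedback vector equals `w2` the two rules coincide, which is one way to see why the correlation between forward and feedback weights matters.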

Performer · Logistic sigmoid · Neural Networks · Networking · Activation function

September 29, 2021

A Comprehensive Survey and Performance Analysis of Activation Functions in Deep Learning

Shiv Ram Dubey, Satish Kumar Singh, Bidyut Baran Chaudhuri

from arXiv, Submitted to Springer

Neural networks have shown tremendous growth in recent years to solve numerous problems. Various types of neural networks have been introduced to deal with different types of problems. However, the main goal of any neural network is to transform the non-linearly separable input data into more linearly separable abstract features using a hierarchy of layers. These layers are combinations of linear and nonlinear functions. The most popular and common non-linearity layers are activation functions (AFs), such as Logistic Sigmoid, Tanh, ReLU, ELU, Swish and Mish. In this paper, a comprehensive overview and survey is presented for AFs in neural networks for deep learning. Different classes of AFs such as Logistic Sigmoid and Tanh based, ReLU based, ELU based, and Learning based are covered. Several characteristics of AFs such as output range, monotonicity, and smoothness are also pointed out. A performance comparison is also performed among 18 state-of-the-art AFs with different networks on different types of data. Insights into AFs are presented to help researchers pursue further work and to help practitioners choose among the available options. The code used for experimental comparison is released at: https://github.com/shivram1987/ActivationFunctions.
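For reference, the six activation functions named in the abstract can be written in a few lines each. The sketch below uses their standard scalar definitions and is not taken from the authors' repository; the default `alpha=1.0` for ELU and the numerically stable formulations of the sigmoid and softplus are choices made here.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def tanh(x: float) -> float:
    return math.tanh(x)


def relu(x: float) -> float:
    return x if x > 0 else 0.0


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def swish(x: float) -> float:
    """Swish: x * sigmoid(x)."""
    return x * sigmoid(x)


def softplus(x: float) -> float:
    """log(1 + exp(x)), computed in an overflow-safe form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    for name, fn in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu),
                     ("elu", elu), ("swish", swish), ("mish", mish)]:
        print(name, [round(fn(v), 4) for v in (-2.0, 0.0, 2.0)])
```

Printing each function at a few points makes the characteristics mentioned in the abstract visible: sigmoid and tanh are bounded, ReLU and ELU are monotone, and Swish and Mish dip slightly below zero for negative inputs.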