
Recent works demonstrated the existence of a double-descent phenomenon for the generalization error of neural networks, where highly overparameterized models escape overfitting and achieve good test performance, at odds with the standard bias-variance trade-off described by statistical learning theory. In the present work, we explore a link between this phenomenon and the increase in the complexity and sensitivity of the function represented by the network. In particular, we study the Boolean mean dimension (BMD), a metric developed in the context of Boolean function analysis. Focusing on a simple teacher-student setting for the random feature model, we derive a theoretical analysis based on the replica method that yields an interpretable expression for the BMD in the high-dimensional regime where the number of data points, the number of features, and the input size grow to infinity. We find that, as the degree of overparameterization of the network is increased, the BMD exhibits a pronounced peak at the interpolation threshold, coinciding with the generalization error peak, and then slowly decreases towards a low asymptotic value. The same phenomenology is then traced in numerical experiments with different model classes and training setups. Moreover, we find empirically that adversarially initialized models tend to show higher BMD values, and that models that are more robust to adversarial attacks exhibit a lower BMD.
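
The paper computes the BMD analytically via the replica method; as a rough empirical counterpart, the minimal Monte Carlo sketch below assumes the standard sensitivity-analysis definition of mean dimension (total influence of the function over ±1 inputs, normalized by its variance). The random-feature model and all parameter names here are illustrative placeholders, not the paper's exact setup.

```python
import numpy as np

def boolean_mean_dimension(f, d, n_samples=2000, seed=0):
    """Monte Carlo estimate of the Boolean mean dimension of f: {-1,+1}^d -> R.

    Assumes the standard sensitivity-analysis definition: total influence
    (mean squared effect of single-coordinate flips, divided by 4) normalized
    by the variance of f over the Boolean hypercube.
    """
    rng = np.random.default_rng(seed)
    X = rng.choice([-1.0, 1.0], size=(n_samples, d))
    fx = f(X)                                    # f evaluated on random corners
    total_influence = 0.0
    for i in range(d):
        X_flip = X.copy()
        X_flip[:, i] *= -1.0                     # flip coordinate i
        total_influence += 0.25 * np.mean((fx - f(X_flip)) ** 2)
    return total_influence / fx.var()

# Example: a random-feature regressor (weights are placeholders, not trained).
d, p = 50, 200
F = np.random.randn(p, d) / np.sqrt(d)           # random features
w = np.random.randn(p) / np.sqrt(p)              # readout weights
model = lambda X: np.tanh(X @ F.T) @ w
print("estimated BMD:", boolean_mean_dimension(model, d))
```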

Related content

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes submissions of high-quality papers that contribute to the full range of neural network research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analysis, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This unique and broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in fields including psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles appear in one of five sections: cognitive science, neuroscience, learning systems, mathematical and computational analysis, and engineering and applications. Official website:

As the development of formal proofs is a time-consuming task, it is important to devise ways of sharing already written proofs to prevent wasting time redoing them. One of the challenges in this domain is to translate proofs written in proof assistants based on impredicative logics to proof assistants based on predicative logics, whenever impredicativity is not used in an essential way. In this paper we present a transformation for sharing proofs with a core predicative system supporting prenex universe polymorphism (as in Agda). It consists in elaborating each term into a predicative, universe-polymorphic term that is as general as possible. The use of universe polymorphism is justified by the fact that mapping each universe to a fixed one in the target theory is not sufficient in most cases. During the elaboration, we need to solve unification problems in the equational theory of universe levels. To this end, we give a complete characterization of when a single equation admits a most general unifier. This characterization is then employed in a partial algorithm which uses a constraint-postponement strategy for trying to solve unification problems. The proposed translation is of course partial, but in practice it allows one to translate many proofs that do not use impredicativity in an essential way. Indeed, it was implemented in the tool Predicativize and then used to translate semi-automatically many non-trivial developments from Matita's library to Agda, including proofs of Bertrand's Postulate and Fermat's Little Theorem, which (as far as we know) were not previously available in Agda.
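
The abstract does not spell out the level language or give unification examples; as a hypothetical illustration (not taken from the paper), Agda-style universe levels are generated by zero, successor, maximum, and level variables, and the easiest kind of equation, with a variable alone on one side and not occurring on the other, trivially admits a most general unifier:

```latex
% Hypothetical illustration of the universe-level language and a trivially solvable equation.
\[
  \ell \;::=\; 0 \;\mid\; \ell + 1 \;\mid\; \ell \sqcup \ell \;\mid\; x
  \qquad \text{(zero, successor, maximum, level variables)}
\]
\[
  x \doteq y \sqcup 3
  \quad\leadsto\quad
  \{\, x \mapsto y \sqcup 3 \,\}
  \qquad \text{(variable on the left, not occurring on the right)}
\]
```

Equations outside this easy case are where the paper's characterization matters, since a most general unifier need not exist in the equational theory of levels.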

Current deep neural networks (DNNs) are overparameterized and use most of their neuronal connections during inference for each task. The human brain, however, developed specialized regions for different tasks and performs inference with a small fraction of its neuronal connections. We propose an iterative pruning strategy introducing a simple importance-score metric that deactivates unimportant connections, tackling overparameterization in DNNs and modulating the firing patterns. The aim is to find the smallest number of connections that is still capable of solving a given task with comparable accuracy, i.e. a simpler subnetwork. We achieve comparable performance for LeNet architectures on MNIST, and significantly higher parameter compression than state-of-the-art algorithms for VGG and ResNet architectures on CIFAR-10/100 and Tiny-ImageNet. Our approach also performs well for the two different optimizers considered -- Adam and SGD. The algorithm is not designed to minimize FLOPs when considering current hardware and software implementations, although it performs reasonably when compared to the state of the art.
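
The abstract does not give the importance-score formula; the sketch below uses a plain magnitude-based score as a stand-in and only illustrates the iterative deactivate-and-retrain loop it describes. Function and attribute names (prune_step, _masks, frac) are illustrative, not the authors' implementation.

```python
import torch

def prune_step(model, frac=0.2):
    """Deactivate the lowest-scoring fraction of the still-active weights.

    Stand-in importance score: |w| (the paper introduces its own metric).
    Active/inactive connections are tracked with per-parameter binary masks.
    """
    old_masks = getattr(model, "_masks", {})
    masks, scores = {}, []
    for name, p in model.named_parameters():
        if p.dim() > 1:                                   # prune weight matrices only
            mask = old_masks.get(name, torch.ones_like(p))
            masks[name] = mask
            scores.append(p.detach().abs()[mask.bool()])  # scores of active weights
    threshold = torch.cat(scores).quantile(frac)
    for name, p in model.named_parameters():
        if name in masks:
            masks[name] = masks[name] * (p.detach().abs() > threshold).float()
            p.data.mul_(masks[name])                      # deactivate connections
    model._masks = masks                                  # reused on the next iteration
    return masks
```

In an iterative scheme, prune_step would alternate with a few epochs of retraining, with the masks re-applied to the weights after every optimizer step so that deactivated connections stay at zero.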

We discuss the approximation capability of reservoir systems whose reservoir is a recurrent neural network (RNN). In our problem setting, a reservoir system approximates a set of functions by adjusting only its linear readout while the reservoir is kept fixed. We show what we call uniform strong universality of a family of RNN reservoir systems for a certain class of functions to be approximated. This means that, for any prescribed tolerance, we can construct a sufficiently large RNN reservoir system whose approximation error, for every function in the class, is bounded from above by that tolerance. Such RNN reservoir systems are constructed via parallel concatenation of RNN reservoirs.
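
As a concrete, much simpler illustration of a reservoir system with a fixed RNN reservoir and a trainable linear readout, the echo-state-style sketch below adjusts only the readout by ridge regression. The paper's construction via parallel concatenation of RNN reservoirs is not reproduced, and all hyperparameters are illustrative.

```python
import numpy as np

class RNNReservoir:
    """Fixed (untrained) tanh RNN reservoir with a trainable linear readout."""
    def __init__(self, n_in, n_res, rho=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(scale=0.5, size=(n_res, n_in))
        W = rng.normal(size=(n_res, n_res))
        self.W = rho * W / np.max(np.abs(np.linalg.eigvals(W)))  # set spectral radius
        self.w_out = None                                        # the only trained part

    def states(self, u):                      # u: (T, n_in) input sequence
        h, H = np.zeros(self.W.shape[0]), []
        for u_t in u:
            h = np.tanh(self.W @ h + self.W_in @ u_t)
            H.append(h)
        return np.stack(H)

    def fit(self, u, y, ridge=1e-6):          # adjust only the linear readout
        H = self.states(u)
        self.w_out = np.linalg.solve(H.T @ H + ridge * np.eye(H.shape[1]), H.T @ y)

    def predict(self, u):
        return self.states(u) @ self.w_out

# Example: fit a scalar target from a 1-D input sequence (shapes are illustrative).
res = RNNReservoir(n_in=1, n_res=100)
u = np.linspace(0, 1, 500).reshape(-1, 1)
res.fit(u, np.sin(8 * u[:, 0]))
```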

This study explores the sample complexity for two-layer neural networks to learn a generalized linear target function under Stochastic Gradient Descent (SGD), focusing on the challenging regime where many flat directions are present at initialization. It is well-established that in this scenario $n=O(d \log d)$ samples are typically needed. However, we provide precise results concerning the pre-factors in high-dimensional contexts and for varying widths. Notably, our findings suggest that overparameterization can only enhance convergence by a constant factor within this problem class. These insights are grounded in the reduction of SGD dynamics to a stochastic process in lower dimensions, where escaping mediocrity equates to calculating an exit time. Yet, we demonstrate that a deterministic approximation of this process adequately represents the escape time, implying that the role of stochasticity may be minimal in this scenario.

Residual neural networks are state-of-the-art deep learning models. Their continuous-depth analog, neural ordinary differential equations (ODEs), are also widely used. Despite their success, the link between the discrete and continuous models still lacks a solid mathematical foundation. In this article, we take a step in this direction by establishing an implicit regularization of deep residual networks towards neural ODEs, for nonlinear networks trained with gradient flow. We prove that if the network is initialized as a discretization of a neural ODE, then such a discretization holds throughout training. Our results are valid for a finite training time, and also as the training time tends to infinity provided that the network satisfies a Polyak-Lojasiewicz condition. Importantly, this condition holds for a family of residual networks where the residuals are two-layer perceptrons with an overparameterization in width that is only linear, and implies the convergence of gradient flow to a global minimum. Numerical experiments illustrate our results.
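
As a minimal sketch of the discretization the abstract refers to, the network below implements x_{k+1} = x_k + (1/L) * f(x_k; theta_k) with two-layer perceptron residuals, i.e. the explicit Euler scheme for a neural ODE on [0, 1] with step 1/L. The paper's initialization scheme, training procedure, and exact scaling are not reproduced; layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class ResidualEulerNet(nn.Module):
    """Depth-L residual network x_{k+1} = x_k + (1/L) * f(x_k; theta_k).

    With smoothly varying theta_k, this is the explicit Euler discretization of
    the neural ODE dx/dt = f(x(t); theta(t)) on [0, 1]. Residuals are two-layer
    perceptrons, as in the family of networks the abstract mentions.
    """
    def __init__(self, dim, width, depth):
        super().__init__()
        self.depth = depth
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, width), nn.Tanh(), nn.Linear(width, dim))
            for _ in range(depth)
        )

    def forward(self, x):
        for block in self.blocks:
            x = x + block(x) / self.depth     # Euler step of size 1/L
        return x

net = ResidualEulerNet(dim=4, width=64, depth=100)
print(net(torch.randn(8, 4)).shape)           # torch.Size([8, 4])
```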

Heuristic tools from statistical physics have been used in the past to locate the phase transitions and compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. In this contribution, we provide a rigorous justification of these approaches for a two-layer neural network model called the committee machine. We also introduce a version of the approximate message passing (AMP) algorithm for the committee machine that allows optimal learning to be performed in polynomial time for a large set of parameters. We find that there are regimes in which a low generalization error is information-theoretically achievable while the AMP algorithm fails to deliver it, strongly suggesting that no efficient algorithm exists for those cases, and unveiling a large computational gap.
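
For concreteness, a committee machine is commonly taken to be a two-layer network whose output is the sign of a vote over K hidden sign units with fixed second-layer weights; the sketch below generates teacher-student data under that convention. The AMP algorithm itself is not reproduced, and all sizes are illustrative.

```python
import numpy as np

def committee_machine(X, W):
    """Committee machine: sign of the sum of K hidden sign units.

    X: (n, d) inputs, W: (K, d) first-layer weights; second layer fixed to +1.
    """
    return np.sign(np.sign(X @ W.T).sum(axis=1))

# Teacher-student setup: a random teacher labels i.i.d. Gaussian inputs.
d, K, n = 200, 3, 1000
rng = np.random.default_rng(0)
W_teacher = rng.normal(size=(K, d))
X = rng.normal(size=(n, d))
y = committee_machine(X, W_teacher)           # labels for the student to learn
```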

We hypothesize that, due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate a model's dependence on each modality, we compute the gain in accuracy when the model has access to that modality in addition to another one. We refer to this gain as the conditional utilization rate. In our experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
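
A minimal sketch of the conditional utilization rate as described in the abstract: the accuracy gained by giving the model access to one modality in addition to another. How "access to a subset of modalities" is implemented (e.g., zeroing out the missing modality's input) is left to a user-supplied callable and is an assumption here, not necessarily the authors' protocol.

```python
def conditional_utilization_rate(evaluate, modalities=("m1", "m2")):
    """u(m_i | m_j) = acc({m_i, m_j}) - acc({m_j}): accuracy gained by adding m_i.

    `evaluate(available)` is assumed to return test accuracy when the model only
    has access to the modalities named in `available`.
    """
    m1, m2 = modalities
    acc_both = evaluate({m1, m2})
    return {
        f"u({m1}|{m2})": acc_both - evaluate({m2}),
        f"u({m2}|{m1})": acc_both - evaluate({m1}),
    }
```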

The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well as, if not better than, the original dense networks. Sparsity can reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever-growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial of sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation, the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.

Graph representation learning for hypergraphs can be used to extract patterns among higher-order interactions that are critically important in many real world problems. Current approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic for various learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention based graph neural network called Hyper-SAGNN applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms the state-of-the-art methods on traditional tasks while also achieving great performance on a new task called outsider identification. Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.
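
As a heavily simplified sketch of the general idea (not the authors' Hyper-SAGNN architecture), the module below scores a variable-sized candidate hyperedge by contrasting a position-wise "static" embedding of each node with a "dynamic" embedding obtained from self-attention over the candidate's node set; all layer sizes and the exact readout are illustrative.

```python
import torch
import torch.nn as nn

class HyperedgeScorer(nn.Module):
    """Score a variable-sized candidate hyperedge from its node embeddings."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.static = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(dim, 1)

    def forward(self, nodes):                 # nodes: (k, dim), k = hyperedge size
        x = nodes.unsqueeze(0)                # batch of one candidate hyperedge
        static = self.static(x)               # position-wise "static" embedding
        dynamic, _ = self.attn(x, x, x)       # self-attention over the node set
        score = torch.sigmoid(self.out((dynamic - static) ** 2))
        return score.mean()                   # probability that the tuple is a hyperedge

scorer = HyperedgeScorer(dim=16)
print(scorer(torch.randn(5, 16)))             # score for a 5-node candidate
```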
