It is frequently observed that overparameterized neural networks generalize well. Regarding such phenomena, existing theoretical work is mainly devoted to linear settings or fully-connected neural networks. This paper studies the learning ability of an important family of deep neural networks, deep convolutional neural networks (DCNNs), under both underparameterized and overparameterized settings. We establish the first learning rates of underparameterized DCNNs without the parameter or function variable structure restrictions presented in the literature. We also show that by adding well-defined layers to a non-interpolating DCNN, we can obtain some interpolating DCNNs that maintain the good learning rates of the non-interpolating DCNN. This result is achieved by a novel network deepening scheme designed for DCNNs. Our work provides theoretical verification of how overfitted DCNNs generalize well.

Related content

Neural Networks

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners who are interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes high-quality submissions that contribute to the full range of neural network research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analysis, to engineering and technological applications of systems that make significant use of neural network concepts and techniques. This uniquely broad scope facilitates the exchange of ideas between biological and technological research, and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the fields of expertise represented on the Neural Networks editorial board include psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: Cognitive Science, Neuroscience, Learning Systems, Mathematical and Computational Analysis, and Engineering and Applications. Official website:

Functional · Optimizer · Regularization term · Activation function · Networking ·

October 5, 2023

Banach Space Optimality of Neural Architectures With Multivariate Nonlinearities

Rahul Parhi,Michael Unser

We investigate the variational optimality (specifically, the Banach space optimality) of a large class of neural architectures with multivariate nonlinearities/activation functions. To that end, we construct a new family of Banach spaces defined via a regularization operator and the $k$-plane transform. We prove a representer theorem that states that the solution sets to learning problems posed over these Banach spaces are completely characterized by neural architectures with multivariate nonlinearities. These optimal architectures have skip connections and are tightly connected to orthogonal weight normalization and multi-index models, both of which have received considerable interest in the neural network community. Our framework is compatible with a number of classical nonlinearities including the rectified linear unit (ReLU) activation function, the norm activation function, and the radial basis functions found in the theory of thin-plate/polyharmonic splines. We also show that the underlying spaces are special instances of reproducing kernel Banach spaces and variation spaces. Our results shed light on the regularity of functions learned by neural networks trained on data, particularly with multivariate nonlinearities, and provide new theoretical motivation for several architectural choices found in practice.

Structured learning · Graph · Sparse · GPU · Neural Networks ·

December 13, 2021

Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

Yinhua Piao,Sangseon Lee,Dohoon Lee,Sun Kim

from arxiv, Accepted by AAAI 2022

Recently, graph neural networks (GNNs) have been widely used for document classification. However, most existing methods are based on static word co-occurrence graphs without sentence-level information, which poses three challenges: (1) word ambiguity, (2) word synonymity, and (3) dynamic contextual dependency. To address these challenges, we propose a novel GNN-based sparse structure learning model for inductive document classification. Specifically, a document-level graph is initially generated by a disjoint union of sentence-level word co-occurrence graphs. Our model collects a set of trainable edges connecting disjoint words between sentences and employs structure learning to sparsely select edges with dynamic contextual dependencies. Graphs with sparse structures can jointly exploit local and global contextual information in documents through GNNs. For inductive learning, the refined document graph is further fed into a general readout function for graph-level classification and optimization in an end-to-end manner. Extensive experiments on several real-world datasets demonstrate that the proposed model outperforms most state-of-the-art results, and reveal the need to learn sparse structures for each document.

Neural Networks · Networking · Reducible · Estimation/Estimator · Identifiable ·

July 7, 2021

A Survey of Uncertainty in Deep Neural Networks

Jakob Gawlikowski,Cedrique Rovile Njieutcheu Tassi,Mohsin Ali,Jongseok Lee,Matthias Humt,Jianxiang Feng,Anna Kruspe,Rudolph Triebel,Peter Jung,Ribana Roscher,Muhammad Shahzad,Wen Yang,Richard Bamler,Xiao Xiang Zhu

Due to their increasing spread, confidence in neural network predictions has become more and more important. However, basic neural networks do not deliver certainty estimates or suffer from over- or under-confidence. Many researchers have been working on understanding and quantifying uncertainty in a neural network's prediction. As a result, different types and sources of uncertainty have been identified and a variety of approaches to measure and quantify uncertainty in neural networks have been proposed. This work gives a comprehensive overview of uncertainty estimation in neural networks, reviews recent advances in the field, highlights current challenges, and identifies potential research opportunities. It is intended to give anyone interested in uncertainty estimation in neural networks a broad overview and introduction, without presupposing prior knowledge in this field. A comprehensive introduction to the most crucial sources of uncertainty is given and their separation into reducible model uncertainty and irreducible data uncertainty is presented. The modeling of these uncertainties based on deterministic neural networks, Bayesian neural networks, ensembles of neural networks, and test-time data augmentation approaches is introduced and different branches of these fields as well as the latest developments are discussed. For a practical application, we discuss different measures of uncertainty and approaches for the calibration of neural networks, and give an overview of existing baselines and implementations. Different examples from the wide spectrum of challenges in different fields give an idea of the needs and challenges regarding uncertainties in practical applications. Additionally, the practical limitations of current methods for mission- and safety-critical real world applications are discussed and an outlook on the next steps towards a broader usage of such methods is given.
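
The split into model and data uncertainty mentioned in this abstract can be illustrated with an ensemble. The sketch below uses one commonly used entropy-based decomposition (total predictive entropy = average member entropy + disagreement between members); it is not necessarily the formulation the survey adopts, and the function names and the member predictions are made up for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def decompose_uncertainty(member_probs):
    """Split ensemble uncertainty into (total, data, model) parts.

    member_probs: one class-probability vector per ensemble member.
    total = entropy of the averaged prediction
    data  = average entropy of the individual members
    model = total - data (disagreement between members)
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(m[c] for m in member_probs) / n_members for c in range(n_classes)]
    total = entropy(mean)
    data = sum(entropy(m) for m in member_probs) / n_members
    return total, data, total - data

# Hypothetical predictions: members that agree vs. members that disagree.
agree = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])
disagree = decompose_uncertainty([[0.9, 0.1], [0.1, 0.9]])
print("agreeing members:    total=%.3f data=%.3f model=%.3f" % agree)
print("disagreeing members: total=%.3f data=%.3f model=%.3f" % disagree)
```

In both toy cases the averaged prediction is equally uncertain, but only the disagreeing ensemble attributes part of that uncertainty to the model term, which is the part that more data or a better model could in principle reduce.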

Networking · Learning · Principle · MoDELS · Networks ·

June 18, 2021

The Principles of Deep Learning Theory

Daniel A. Roberts,Sho Yaida,Boris Hanin

from arxiv, 451 pages, to be published by Cambridge University Press

This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.

GPU · Graph · Identifiable · Neural Networks · Networking ·

May 31, 2021

On Explainability of Graph Neural Networks via Subgraph Explorations

Hao Yuan,Haiyang Yu,Jie Wang,Kang Li,Shuiwang Ji

from arxiv, Accepted by ICML 2021

We consider the problem of explaining the predictions of graph neural networks (GNNs), which otherwise are considered as black boxes. Existing methods invariably focus on explaining the importance of graph nodes or edges but ignore the substructures of graphs, which are more intuitive and human-intelligible. In this work, we propose a novel method, known as SubgraphX, to explain GNNs by identifying important subgraphs. Given a trained GNN model and an input graph, our SubgraphX explains its predictions by efficiently exploring different subgraphs with Monte Carlo tree search. To make the tree search more effective, we propose to use Shapley values as a measure of subgraph importance, which can also capture the interactions among different subgraphs. To expedite computations, we propose efficient approximation schemes to compute Shapley values for graph data. Our work represents the first attempt to explain GNNs via identifying subgraphs explicitly and directly. Experimental results show that our SubgraphX achieves significantly improved explanations, while keeping computations at a reasonable level.
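
To make the Shapley-value scoring idea concrete, here is a minimal sketch of a generic permutation-sampling estimator in which a candidate subgraph is treated as a single player and the remaining nodes as the other players. This is not the specific approximation scheme of SubgraphX, which the abstract does not spell out; `shapley_mc`, `toy_score`, and the weights are hypothetical stand-ins, with `toy_score` playing the role of the trained GNN's output on a node subset.

```python
import random

def shapley_mc(other_nodes, subgraph, value_fn, num_samples=2000, seed=0):
    """Monte Carlo estimate of the Shapley value of `subgraph` (one player).

    Each sample draws a uniformly random position for the subgraph in a
    random ordering of all players; the nodes that precede it form the
    coalition, and the marginal contribution of adding the subgraph to
    that coalition is recorded.
    """
    rng = random.Random(seed)
    others = list(other_nodes)
    total = 0.0
    for _ in range(num_samples):
        rng.shuffle(others)
        cut = rng.randint(0, len(others))  # number of predecessors, inclusive bounds
        coalition = set(others[:cut])
        total += value_fn(coalition | subgraph) - value_fn(coalition)
    return total / num_samples

# Toy scoring function (assumption): additive node weights plus an
# interaction bonus when nodes 0, 1 and 2 are all present.
WEIGHTS = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}

def toy_score(nodes):
    bonus = 1.0 if {0, 1, 2} <= nodes else 0.0
    return sum(WEIGHTS[n] for n in nodes) + bonus

estimate = shapley_mc(other_nodes={2, 3}, subgraph={0, 1}, value_fn=toy_score)
print("estimated Shapley value of subgraph {0, 1}:", round(estimate, 3))
```

For this toy game the exact Shapley value of the subgraph is 1.25 (0.75 from its own weights plus the bonus whenever node 2 precedes it, which happens half the time), so the estimate should land close to that. The interaction term is what a per-node importance score would miss.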

Networking · Residual network · Scaling · Weight · Smoothing ·

May 25, 2021

Scaling Properties of Deep Residual Networks

Alain-Sam Cohen,Rama Cont,Alain Rossier,Renyuan Xu

from arxiv, Published at ICML 2021

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.

Structured learning · Graph · Learning · Pooling · Neural Networks ·

November 14, 2019

Hierarchical Graph Pooling with Structure Learning

Zhen Zhang,Jiajun Bu,Martin Ester,Jianfeng Zhang,Chengwei Yao,Zhi Yu,Can Wang

from arxiv, Accepted to AAAI-2020; Code is available at //github.com/cszhangzhen/HGP-SL

Graph Neural Networks (GNNs), which generalize deep neural networks to graph-structured data, have drawn considerable attention and achieved state-of-the-art performance in numerous graph-related tasks. However, existing GNN models mainly focus on designing graph convolution operations. The graph pooling (or downsampling) operations, which play an important role in learning hierarchical representations, are usually overlooked. In this paper, we propose a novel graph pooling operator, called Hierarchical Graph Pooling with Structure Learning (HGP-SL), which can be integrated into various graph neural network architectures. HGP-SL incorporates graph pooling and structure learning into a unified module to generate hierarchical representations of graphs. More specifically, the graph pooling operation adaptively selects a subset of nodes to form an induced subgraph for the subsequent layers. To preserve the integrity of the graph's topological information, we further introduce a structure learning mechanism to learn a refined graph structure for the pooled graph at each layer. By combining the HGP-SL operator with graph neural networks, we perform graph-level representation learning with a focus on the graph classification task. Experimental results on six widely used benchmarks demonstrate the effectiveness of our proposed model.

Attention mechanism · Attention model · MoDELS · Taxonomy · Neural Networks ·

April 5, 2019

An Attentive Survey of Attention Models

Sneha Chaudhari,Gungor Polatkan,Rohan Ramanath,Varun Mithal

from arxiv, submitted to IJCAI 2019 Survey Track; 6 pages, 4 figures, 2 tables

The attention model has now become an important concept in neural networks and has been researched within diverse application domains. This survey provides a structured and comprehensive overview of the developments in modeling attention. In particular, we propose a taxonomy which groups existing techniques into coherent categories. We review the different neural architectures in which attention has been incorporated, and also show how attention improves interpretability of neural models. Finally, we discuss some applications in which modeling attention has a significant impact. We hope this survey will provide a succinct introduction to attention models and guide practitioners while developing approaches for their applications.

Discretization · Graph · GPU · Neural Networks · Networking ·

March 28, 2019

Learning Discrete Structures for Graph Neural Networks

Luca Franceschi,Mathias Niepert,Massimiliano Pontil,Xiao He

from arxiv, 18 pages

Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.

Graph · Learning · state-of-the-art · GNN · Representation learning ·

June 26, 2018

Hierarchical Graph Representation Learning with Differentiable Pooling

Rex Ying,Jiaxuan You,Christopher Morris,Xiang Ren,William L. Hamilton,Jure Leskovec

Recently, graph neural networks (GNNs) have revolutionized the field of graph representation learning through effectively learned node embeddings, and achieved state-of-the-art results in tasks such as node classification and link prediction. However, current GNN methods are inherently flat and do not learn hierarchical representations of graphs---a limitation that is especially problematic for the task of graph classification, where the goal is to predict the label associated with an entire graph. Here we propose DiffPool, a differentiable graph pooling module that can generate hierarchical representations of graphs and can be combined with various graph neural network architectures in an end-to-end fashion. DiffPool learns a differentiable soft cluster assignment for nodes at each layer of a deep GNN, mapping nodes to a set of clusters, which then form the coarsened input for the next GNN layer. Our experimental results show that combining existing GNN methods with DiffPool yields an average improvement of 5-10% accuracy on graph classification benchmarks, compared to all existing pooling approaches, achieving a new state-of-the-art on four out of five benchmark data sets.
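
The coarsening step described in this abstract is commonly written as X' = S^T Z and A' = S^T A S, where S is a row-wise softmax over cluster scores, Z holds the node embeddings, and A is the adjacency matrix. The sketch below applies those two products to a toy graph in plain Python. It assumes the assignment logits are given (in the method they would come from a GNN; here they are made-up numbers), and the helper names are hypothetical.

```python
import math

def softmax_rows(logits):
    """Row-wise softmax: each node gets a distribution over clusters."""
    out = []
    for row in logits:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        norm = sum(exps)
        out.append([e / norm for e in exps])
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def coarsen(adj, feats, assign_logits):
    """One soft-pooling step: returns (pooled_adj, pooled_feats, s)."""
    s = softmax_rows(assign_logits)            # n x k soft assignment
    s_t = transpose(s)                         # k x n
    pooled_feats = matmul(s_t, feats)          # k x d, X' = S^T Z
    pooled_adj = matmul(matmul(s_t, adj), s)   # k x k, A' = S^T A S
    return pooled_adj, pooled_feats, s

# Toy input: path graph 0-1-2-3 pooled into two clusters.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
assign_logits = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0]]

pooled_adj, pooled_feats, s = coarsen(adj, feats, assign_logits)
for row in pooled_adj:
    print([round(v, 3) for v in row])
```

Because every row of S sums to one, the total edge weight and the feature column sums are preserved by the pooling, and every operation is differentiable in the logits, which is what allows the assignment to be trained end to end.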