Interval analysis (or interval bound propagation, IBP) is a popular technique for verifying and training provably robust deep neural networks, a fundamental challenge in the area of reliable machine learning. However, despite substantial efforts, progress on addressing this key challenge has stagnated, calling into question whether interval arithmetic is a viable path forward. In this paper we present two fundamental results on the limitations of interval arithmetic for analyzing neural networks. Our main impossibility theorem states that for any neural network classifying just three points, there is a valid specification over these points that interval analysis cannot prove. Further, in the restricted case of one-hidden-layer neural networks we show a stronger impossibility result: given any radius $\alpha < 1$, there is a set of $O(\alpha^{-1})$ points with robust radius $\alpha$, separated by distance $2$, that no one-hidden-layer network can be proven to classify robustly via interval analysis.
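To make the technique named in the abstract concrete, here is a minimal sketch of interval bound propagation through a one-hidden-layer ReLU network. It is an illustration written for this page, not code from the paper; the function names and the toy weights are assumptions. The toy network computes ReLU(x) - ReLU(x), which is exactly 0 for every input, yet the interval bounds come out as [-1, 1]. This is the kind of precision loss that the impossibility results are about.

```python
import numpy as np

def interval_affine(lo, hi, W, b):
    """Propagate the box [lo, hi] through x -> W @ x + b."""
    center = (lo + hi) / 2.0
    radius = (hi - lo) / 2.0
    out_center = W @ center + b
    out_radius = np.abs(W) @ radius  # worst-case spread of each output
    return out_center - out_radius, out_center + out_radius

def interval_relu(lo, hi):
    """ReLU is monotone, so it can be applied to both bounds directly."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

def ibp_one_hidden_layer(x, eps, W1, b1, W2, b2):
    """Output bounds for all inputs in the L-infinity ball of radius eps around x."""
    lo, hi = x - eps, x + eps
    lo, hi = interval_relu(*interval_affine(lo, hi, W1, b1))
    return interval_affine(lo, hi, W2, b2)

# Toy network (hypothetical weights): f(x) = ReLU(x) - ReLU(x) = 0 for every x.
W1 = np.array([[1.0], [1.0]])
b1 = np.zeros(2)
W2 = np.array([[1.0, -1.0]])
b2 = np.zeros(1)

lo, hi = ibp_one_hidden_layer(np.array([0.0]), 1.0, W1, b1, W2, b2)
print(lo, hi)  # bounds are [-1.] and [1.], although the true output is always 0
```

The looseness arises because interval arithmetic treats the two hidden units as independent, even though they always carry the same value.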

Related content

Neural Networks

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners who are interested in all aspects of neural networks and related approaches to computational intelligence. The journal welcomes high-quality submissions that contribute to the full range of neural network research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analysis, to engineering and technological applications of systems that make significant use of neural network concepts and techniques. This unique and broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the editorial board of Neural Networks represents expertise in fields including psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: Cognitive Science, Neuroscience, Learning Systems, Mathematical and Computational Analysis, and Engineering and Applications. Official website:

Generalization theory · Better · Networking · Representation learning · Logistic loss

February 15, 2022

Random Feature Amplification: Feature Learning and Generalization in Neural Networks

Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

from arXiv, 41 pages

In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate, refuting the conjecture of Malach and Shalev-Shwartz that 'deeper is better only when shallow is good'. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics 'amplify' these weak, random features to strong, useful features.

Neural Networks · Storage · Networking · Precision/accuracy · Model evaluation

February 11, 2022

NITI: Training Integer Neural Networks Using Integer-only Arithmetic

Maolin Wang, Seyedramin Rasoulinezhad, Philip H. W. Leong, Hayden K. H. So

While integer arithmetic has been widely adopted for improved performance in deep quantized neural network inference, training remains a task primarily executed using floating point arithmetic. This is because both high dynamic range and numerical accuracy are central to the success of most modern training algorithms. However, due to their potential for computational, storage and energy advantages in hardware accelerators, neural network training methods that can be implemented with low precision integer-only arithmetic remain an active research challenge. In this paper, we present NITI, an efficient deep neural network training framework that stores all parameters and intermediate values as integers, and computes exclusively with integer arithmetic. A pseudo stochastic rounding scheme that eliminates the need for external random number generation is proposed to facilitate conversion from wider intermediate results to low precision storage. Furthermore, a cross-entropy loss backpropagation scheme computed with integer-only arithmetic is proposed. A proof-of-concept open-source software implementation of NITI that utilizes native 8-bit integer operations in modern GPUs to achieve end-to-end training is presented. When compared with an equivalent training setup implemented with floating point storage and arithmetic, NITI achieves negligible accuracy degradation on the MNIST and CIFAR10 datasets using 8-bit integer storage and computation. On ImageNet, 16-bit integers are needed for weight accumulation with an 8-bit datapath. This achieves training results comparable to all-floating-point implementations.
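For readers unfamiliar with the rounding step mentioned in the abstract, the sketch below shows ordinary stochastic rounding when a wide integer is narrowed by a right shift. Note the assumption: this version draws from an external random number generator, which is exactly the dependency NITI's pseudo stochastic rounding is said to eliminate. It is therefore a baseline illustration and not NITI's scheme; the function name `stochastic_round_shift` is hypothetical.

```python
import numpy as np

def stochastic_round_shift(x, shift, rng):
    """Divide integers by 2**shift, rounding up with probability equal to
    the discarded fraction, so the result is unbiased in expectation.

    Uses an external RNG (assumption for illustration; not NITI's method).
    """
    x = np.asarray(x, dtype=np.int64)
    floor = x >> shift                # arithmetic shift: floor(x / 2**shift)
    remainder = x - (floor << shift)  # discarded low bits, in [0, 2**shift)
    threshold = rng.integers(0, 1 << shift, size=x.shape)
    # P(remainder > threshold) = remainder / 2**shift
    return floor + (remainder > threshold)

rng = np.random.default_rng(0)
samples = stochastic_round_shift(np.full(100_000, 5), 2, rng)
print(samples.mean())  # close to 5 / 4 = 1.25 on average
```

Values that are exact multiples of `2**shift` have a zero remainder and are never rounded up, so they pass through without noise.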

Median · Graph · Generalization theory · Range · Decomposition

February 11, 2022

Interval Query Problem on Cube-free Median Graphs

Soh Kumabe

from arXiv, ISAAC'21, 21 pages

In this paper, we introduce the interval query problem on cube-free median graphs. Let $G$ be a cube-free median graph and $\mathcal{S}$ be a commutative semigroup. For each vertex $v$ in $G$, we are given an element $p(v)$ in $\mathcal{S}$. For each query, we are given two vertices $u,v$ in $G$ and asked to calculate the sum of $p(z)$ over all vertices $z$ belonging to a $u-v$ shortest path. This is a common generalization of range query problems on trees and grids. In this paper, we provide an algorithm to answer each interval query in $O(\log^2 n)$ time. The required data structure is constructed in $O(n\log^3 n)$ time and $O(n\log^2 n)$ space. To obtain our algorithm, we introduce a new technique, named the stairs decomposition, to decompose an interval of cube-free median graphs into simpler substructures.
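The query defined in the abstract can be pinned down with a brute-force baseline: a vertex $z$ lies on some shortest $u$-$v$ path exactly when $d(u,z) + d(z,v) = d(u,v)$. The sketch below computes this with two breadth-first searches per query. It is a reference definition only, not the paper's $O(\log^2 n)$ data structure, and the helper `grid_graph` plus the use of integer addition as the semigroup operation are assumptions made for the example.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def interval_query_bruteforce(adj, p, u, v):
    """Sum p(z) over every z on at least one shortest u-v path."""
    du = bfs_distances(adj, u)
    dv = bfs_distances(adj, v)
    total = None  # a semigroup need not have an identity element
    for z in adj:
        if z in du and z in dv and du[z] + dv[z] == du[v]:
            total = p[z] if total is None else total + p[z]
    return total

def grid_graph(rows, cols):
    """Hypothetical helper: a grid, one simple cube-free median graph."""
    adj = {(r, c): [] for r in range(rows) for c in range(cols)}
    for (r, c) in adj:
        for dr, dc in ((1, 0), (0, 1)):
            nb = (r + dr, c + dc)
            if nb in adj:
                adj[(r, c)].append(nb)
                adj[nb].append((r, c))
    return adj

adj = grid_graph(3, 3)
p = {vertex: 1 for vertex in adj}
print(interval_query_bruteforce(adj, p, (0, 0), (2, 2)))  # 9: the whole grid
```

On a grid the interval between two vertices is the axis-aligned rectangle they span, which is why the problem generalizes 2D range queries.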

Generalization theory · UniFormer · Unlabeled · TOOLS · Identifiable

October 17, 2021

Explaining generalization in deep learning: progress and fundamental limits

Vaishnavh Nagarajan

from arXiv, arXiv admin note: text overlap with arXiv:1902.04742

This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error? In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive data-dependent, uniform-convergence-based generalization bounds with improved dependencies on the parameter count. Uniform convergence has in fact been the most widely used tool in deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis, we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we will show that in some example overparameterized settings, any uniform convergence bound will provide only a vacuous generalization bound. With this realization in mind, in the last part of the thesis, we will change course and introduce an empirical technique to estimate generalization using unlabeled data. Our technique does not rely on any notion of uniform-convergence-based complexity and is remarkably precise. We will theoretically show why our technique enjoys such precision. We will conclude by discussing how future work could explore novel ways to incorporate distributional assumptions in generalization bounds (such as in the form of unlabeled data) and explore other tools to derive bounds, perhaps by modifying uniform convergence or by developing completely new tools altogether.

Networking · Residual networks · Scaling · Weight · Smoothing

May 25, 2021

Scaling Properties of Deep Residual Networks

Alain-Sam Cohen, Rama Cont, Alain Rossier, Renyuan Xu

from arXiv, Published at ICML 2021

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.

Skip connections · Neural Networks · Optimizer · Linear · Graph

May 10, 2021

Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth

Keyulu Xu, Mozhi Zhang, Stefanie Jegelka, Kenji Kawaguchi

Graph Neural Networks (GNNs) have been studied from the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect the GNNs' training speed. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.

Unlabeled · Networking · MoDELS · Sample complexity · Unsupervised

February 8, 2021

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

Colin Wei, Kendrick Shen, Yining Chen, Tengyu Ma

Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks. However, the current theoretical understanding of self-training only applies to linear models. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning. At the core of our analysis is a simple but realistic "expansion" assumption, which states that a low-probability subset of the data must expand to a neighborhood with large probability relative to the subset. We also assume that neighborhoods of examples in different classes have minimal overlap. We prove that under these assumptions, the minimizers of population objectives based on self-training and input-consistency regularization will achieve high accuracy with respect to ground-truth labels. By using off-the-shelf generalization bounds, we immediately convert this result to sample complexity guarantees for neural nets that are polynomial in the margin and Lipschitzness. Our results help explain the empirical successes of recently proposed self-training algorithms which use input consistency regularization.

Optimizer · Graph · GPU · Neural Networks · Kernelization

January 28, 2021

Interpreting and Unifying Graph Neural Networks with An Optimization Framework

Meiqi Zhu, Xiao Wang, Chuan Shi, Houye Ji, Peng Cui

from arXiv, WWW2021, 12 pages

Graph Neural Networks (GNNs) have received considerable attention on graph-structured data learning for a wide variety of tasks. The well-designed propagation mechanism, which has been demonstrated to be effective, is the most fundamental part of GNNs. Although most GNNs basically follow a message-passing scheme, little effort has been made to discover and analyze their essential relations. In this paper, we establish a surprising connection between different propagation mechanisms with a unified optimization problem, showing that despite the proliferation of various GNNs, in fact, their proposed propagation mechanisms are the optimal solution optimizing a feature fitting function over a wide class of graph kernels with a graph regularization term. Our proposed unified optimization framework, summarizing the commonalities between several of the most representative GNNs, not only provides a macroscopic view on surveying the relations between different GNNs, but also further opens up new opportunities for flexibly designing new GNNs. With the proposed framework, we discover that existing works usually utilize naive graph convolutional kernels for the feature fitting function, and we further develop two novel objective functions considering adjustable graph kernels showing low-pass or high-pass filtering capabilities respectively. Moreover, we provide the convergence proofs and expressive power comparisons for the proposed models. Extensive experiments on benchmark datasets clearly show that the proposed GNNs not only outperform the state-of-the-art methods but also have a good ability to alleviate over-smoothing, and further verify the feasibility of designing GNNs with our unified optimization framework.

Neural Networks · Optimizer · Networks · Local minimum · Networking

December 19, 2019

Optimization for deep learning: theory and algorithms

Ruoyu Sun

from arXiv, 38 pages of main body; 5 pages of appendix; 12 pages of references

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.

Attention mechanism · Pointer networks · Discretization · Output · Learned

January 2, 2017

Pointer Networks

Oriol Vinyals, Meire Fortunato, Navdeep Jaitly

We introduce a new neural architecture to learn the conditional probability of an output sequence with elements that are discrete tokens corresponding to positions in an input sequence. Such problems cannot be trivially addressed by existent approaches such as sequence-to-sequence and Neural Turing Machines, because the number of target classes in each step of the output depends on the length of the input, which is variable. Problems such as sorting variable sized sequences, and various combinatorial optimization problems belong to this class. Our model solves the problem of variable size output dictionaries using a recently proposed mechanism of neural attention. It differs from the previous attention attempts in that, instead of using attention to blend hidden units of an encoder to a context vector at each decoder step, it uses attention as a pointer to select a member of the input sequence as the output. We call this architecture a Pointer Net (Ptr-Net). We show Ptr-Nets can be used to learn approximate solutions to three challenging geometric problems -- finding planar convex hulls, computing Delaunay triangulations, and the planar Travelling Salesman Problem -- using training examples alone. Ptr-Nets not only improve over sequence-to-sequence with input attention, but also allow us to generalize to variable size output dictionaries. We show that the learnt models generalize beyond the maximum lengths they were trained on. We hope our results on these tasks will encourage a broader exploration of neural learning for discrete problems.
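The "attention as a pointer" idea in the abstract can be sketched in a few lines: additive attention scores over the encoder states are passed through a softmax, and the resulting distribution over input positions is used directly as the output distribution, instead of being used to blend the encoder states into a context vector. The sketch below is an illustration with random, untrained weights; the function name, array shapes, and parameter names `W1`, `W2`, `v` are assumptions, and the encoder and decoder networks that would produce the states are omitted.

```python
import numpy as np

def pointer_distribution(encoder_states, decoder_state, W1, W2, v):
    """Distribution over input positions for one decoder step.

    encoder_states: (n, h) array, one row per input element
    decoder_state:  (h,) array for the current decoder step
    Scores follow the additive form v . tanh(W1 e_j + W2 d).
    """
    scores = np.tanh(encoder_states @ W1.T + decoder_state @ W2.T) @ v
    scores = scores - scores.max()  # subtract the max for a stable softmax
    weights = np.exp(scores)
    return weights / weights.sum()

rng = np.random.default_rng(0)
h = 4
W1 = rng.normal(size=(h, h))
W2 = rng.normal(size=(h, h))
v = rng.normal(size=h)

# The same parameters handle inputs of different lengths: the "output
# dictionary" has one entry per input position.
for n in (3, 7):
    probs = pointer_distribution(rng.normal(size=(n, h)), rng.normal(size=h), W1, W2, v)
    print(n, probs.shape, int(probs.argmax()))
```

Because the output has one probability per input element, the number of target classes grows with the input length without any change to the parameters.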