This paper explores the implicit bias of overparameterized neural networks with more than two layers. Our framework considers a family of networks of varying depths that all have the same capacity but different implicitly defined representation costs. The representation cost of a function induced by a neural network architecture is the minimum sum of squared weights needed for the network to represent the function; it reflects the function-space bias associated with the architecture. Our results show that adding linear layers to a ReLU network yields a representation cost that favors functions that can be approximated by a low-rank linear operator composed with a function of low representation cost in a two-layer network. Specifically, using a neural network to fit training data with minimum representation cost yields an interpolating function that is nearly constant in directions orthogonal to a low-dimensional subspace, so the learned network is approximately a single- or multi-index model. Our experiments show that when this active-subspace structure is present in the data, adding linear layers can improve generalization and produce a network that is well aligned with the true active subspace.
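For concreteness, here is a schematic statement of the two objects discussed above, with notation assumed for illustration: a depth-$L$ network $f_\theta$ with weight matrices $W_1,\dots,W_L$ and training data $(x_i, y_i)$.

\[
R_L(f) \;=\; \min_{\theta}\Big\{\textstyle\sum_{\ell=1}^{L}\|W_\ell\|_F^2 \;:\; f_\theta = f\Big\},
\qquad
\hat f \;\in\; \operatorname*{arg\,min}_{f\,:\,f(x_i)=y_i\ \forall i}\; R_L(f).
\]

The first expression is the representation cost of $f$; the second is the minimum-representation-cost interpolant whose bias toward low-dimensional structure the paper characterizes.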
We present a simple linear-regression-based approach for learning the weights and biases of a neural network, as an alternative to standard gradient-based backpropagation. The present work is exploratory in nature, and we restrict the description and experiments to (i) simple feedforward neural networks, (ii) scalar (single-output) regression problems, and (iii) invertible activation functions. However, the approach is intended to be extensible to larger, more complex architectures. The key idea is the observation that the pre-activation input to every neuron in a neural network is linear both in the activations of the previous layer (for fixed parameters) and in the layer's weights and biases (for fixed activations). If we can compute ideal total input values for every neuron by working backwards from the output, we can formulate the learning problem as a linear least-squares problem that alternates between updating the parameters and the activation values. We present an explicit algorithm that implements this idea, and we show that (at least for small problems) the approach is more stable and faster than gradient-based methods.
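A minimal sketch of one way this alternation could look for a one-hidden-layer tanh network on scalar regression. The least-norm activation correction used to produce the "ideal" hidden activations is an illustrative choice, not necessarily the paper's exact algorithm:

```python
import numpy as np

def fit_two_layer(X, y, hidden=16, iters=50, rng=np.random.default_rng(0)):
    """Alternating least squares for y ~ w2 . tanh(W1 x + b1) + b2."""
    n, d = X.shape
    Xa = np.hstack([X, np.ones((n, 1))])              # inputs with bias column
    W1 = rng.normal(scale=0.5, size=(hidden, d + 1))
    for _ in range(iters):
        A = np.tanh(Xa @ W1.T)                        # hidden activations
        Aa = np.hstack([A, np.ones((n, 1))])
        w2 = np.linalg.lstsq(Aa, y, rcond=None)[0]    # output layer: plain least squares
        resid = y - Aa @ w2                           # per-sample output error
        v = w2[:-1]
        # least-norm correction of the activations that would zero the output error
        A_star = A + np.outer(resid, v) / (v @ v + 1e-12)
        # invert the activation to get ideal total inputs, then solve for layer 1
        Z_star = np.arctanh(np.clip(A_star, -0.999, 0.999))
        W1 = np.linalg.lstsq(Xa, Z_star, rcond=None)[0].T
    return W1, w2
```

Every update here is a closed-form linear solve, which is where the claimed stability advantage over gradient descent would come from.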
Understanding how the statistical and geometric properties of neural activity relate to performance is a key problem in theoretical neuroscience and deep learning. Here, we calculate how correlations between object representations affect the classification capacity, a measure of linear separability. We show that for spherical object manifolds, introducing correlations between centroids effectively pushes the spheres closer together, while introducing correlations between the axes effectively shrinks their radii, revealing a duality between correlations and geometry with respect to the problem of classification. We then apply our results to accurately estimate the capacity of object representations in deep networks.
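Capacity can also be probed numerically. The sketch below handles only the simplest case of zero-radius manifolds (points), where the classical transition occurs at $P = 2N$; it estimates the fraction of random dichotomies that are linearly separable via an LP feasibility check. The correlated, spherical-manifold setting treated above requires the full theory:

```python
import numpy as np
from scipy.optimize import linprog

def is_separable(X, y):
    """Exact separability check: feasibility of y_i <w, x_i> >= 1 is an LP."""
    res = linprog(c=np.zeros(X.shape[1]),
                  A_ub=-(y[:, None] * X), b_ub=-np.ones(len(y)),
                  bounds=[(None, None)] * X.shape[1], method="highs")
    return res.status == 0

def separable_fraction(n_dim=50, load=2.0, trials=40, rng=np.random.default_rng(0)):
    """Fraction of random dichotomies of P = load * N Gaussian points that are
    linearly separable; this drops through 1/2 at load = 2 (Cover's transition)."""
    p = int(load * n_dim)
    return np.mean([is_separable(rng.normal(size=(p, n_dim)),
                                 rng.choice([-1.0, 1.0], size=p))
                    for _ in range(trials)])
```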
The proliferation of automated data collection schemes and advances in sensor technology are increasing the amount of data we are able to monitor in real time. However, given the high annotation costs and the time required by quality inspections, data are often available only in unlabeled form. This is fostering the use of active learning for the development of soft sensors and predictive models. In production, instead of performing random inspections to obtain product information, labels are collected by evaluating the information content of the unlabeled data. Several query strategy frameworks for regression have been proposed in the literature, but most attention has been devoted to the static pool-based scenario. In this work, we propose a new strategy for the stream-based scenario, where instances are sequentially offered to the learner, which must instantaneously decide whether to perform the quality check to obtain the label or discard the instance. The approach is inspired by optimal experimental design theory, and the iterative aspect of the decision-making process is tackled by setting a threshold on the informativeness of the unlabeled data points. The proposed approach is evaluated using numerical simulations and the Tennessee Eastman Process simulator. The results confirm that selecting the examples suggested by the proposed algorithm allows for a faster reduction in the prediction error.
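A hedged sketch of the query-by-threshold loop for a linear (ridge) model, using a leverage-style score as the informativeness measure; the paper's exact experimental-design criterion and threshold rule may differ:

```python
import numpy as np

def stream_active_learning(stream, oracle, d, tau=1.0, ridge=1e-3):
    """`stream` yields feature vectors x of dimension d; `oracle(x)` performs
    the costly quality check and returns the label. A label is requested only
    when the candidate's predictive-variance proxy x' (X'X)^-1 x exceeds tau."""
    A = ridge * np.eye(d)                   # regularized information matrix X'X
    b = np.zeros(d)
    for x in stream:
        score = x @ np.linalg.solve(A, x)   # informativeness of the new point
        if score > tau:                     # informative enough: query the label
            y = oracle(x)
            A += np.outer(x, x)
            b += y * x
    return np.linalg.solve(A, b)            # ridge-regression weights
```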
In computational neuroscience, fixed points of recurrent neural network models are commonly used to model neural responses to static or slowly changing stimuli. These applications raise the question of how to train the weights of a recurrent neural network to minimize a loss function evaluated at fixed points. A natural approach is to use gradient descent on the Euclidean space of synaptic weights. We show that this approach can lead to poor learning performance due, in part, to singularities that arise in the loss surface. We use a re-parameterization of the recurrent network model to derive two alternative learning rules that produce more robust learning dynamics. We show that these learning rules can be interpreted as steepest descent and gradient descent, respectively, under a non-Euclidean metric on the space of recurrent weights. Our results question the common, implicit assumption that learning in the brain should necessarily follow the negative Euclidean gradient of synaptic weights.
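For reference, a sketch of the naive Euclidean rule being critiqued, for the common rate model $r = \phi(Wr + x)$ with $\phi = \tanh$, using the implicit function theorem to differentiate through the fixed point. Note the factor $(I - DW)^{-1}$, which degenerates as the fixed point approaches instability; this is one concrete way ill-behaved gradients can arise (the paper's re-parameterized rules are designed to give more robust dynamics):

```python
import numpy as np

def fixed_point(W, x, iters=200):
    """Solve r = tanh(W r + x) by fixed-point iteration (assumes a stable point)."""
    r = np.zeros(len(x))
    for _ in range(iters):
        r = np.tanh(W @ r + x)
    return r

def euclidean_grad(W, x, r_target):
    """dL/dW for L = 0.5 ||r* - r_target||^2 via implicit differentiation:
    dL/dW = D (I - D W)^{-T} (r* - r_target) r*^T, with D = diag(tanh'(z*))."""
    r = fixed_point(W, x)
    D = np.diag(1.0 - r ** 2)               # tanh' evaluated at the fixed point
    g = r - r_target                        # gradient of the loss w.r.t. r*
    adj = np.linalg.solve((np.eye(len(r)) - D @ W).T, g)
    return D @ np.outer(adj, r)
```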
Deep learning has been widely used in many fields, but the model training process usually consumes massive computational resources and time. Designing an efficient neural network training method with a provable convergence guarantee is therefore a fundamental and important research question. In this paper, we present a static half-space reporting data structure for a fully connected two-layer neural network with shifted ReLU activation, enabling the identification of activated neurons in sublinear time via geometric search. We also prove that our algorithm converges in $O(M^2/\epsilon^2)$ time, with network size quadratic in the coefficient norm upper bound $M$ and the error term $\epsilon$.
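The query the data structure accelerates is easy to state. Below is a hedged reference version (a linear scan, with the shifted-ReLU threshold convention assumed), together with the sparse forward pass it enables once the active set is known:

```python
import numpy as np

def active_neurons(W, x, b):
    """Which neurons are active at input x, i.e. which rows of W lie in the
    open half-space {w : <w, x> > b}? This linear scan is the O(m d) baseline
    that half-space reporting answers in sublinear time via geometric search.
    Here sigma(z) = max(z - b, 0) is taken as the shifted ReLU."""
    return np.nonzero(W @ x > b)[0]

def forward_sparse(W, a, x, b):
    """Two-layer forward pass touching only the active set S: the shifted ReLU
    zeroes every inactive neuron, so f(x) = sum_{i in S} a_i (<w_i, x> - b)
    costs O(|S| d) once S is known."""
    S = active_neurons(W, x, b)
    return a[S] @ (W[S] @ x - b)
```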
Deep learning techniques often perform poorly in the presence of domain shift, where the test data follow a different distribution than the training data. The most practically desirable approach to addressing this issue is Single Domain Generalization (S-DG), which aims to train robust models using data from a single source. Prior work on S-DG has primarily focused on using data augmentation techniques to generate diverse training data. In this paper, we explore an alternative approach by investigating the robustness of the linear operators commonly used in deep learning, such as convolutional and dense layers. We propose a novel operator called XCNorm that computes the normalized cross-correlation between weights and an input feature patch. This approach is invariant to both affine shifts and changes in energy within a local feature patch, and it eliminates the need for commonly used non-linear activation functions. We show that deep neural networks composed of this operator are robust to common semantic distribution shifts. Furthermore, our empirical results on single-domain generalization benchmarks demonstrate that our proposed technique performs comparably to state-of-the-art methods.
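A minimal sketch of such an operator for a dense layer (a convolutional version would apply the same computation patch-wise); the exact normalization in the paper may differ:

```python
import numpy as np

def xcnorm_dense(W, x, eps=1e-6):
    """Normalized cross-correlation between each weight row and the input:
    the cosine similarity of the mean-centered vectors. Centering gives
    invariance to additive shifts of x; dividing by the norms gives
    invariance to positive rescaling (changes in energy)."""
    Wc = W - W.mean(axis=1, keepdims=True)
    xc = x - x.mean()
    return (Wc @ xc) / (np.linalg.norm(Wc, axis=1) * np.linalg.norm(xc) + eps)
```

Since each output is bounded in $[-1, 1]$ and already nonlinear in $x$, no separate activation function is needed, consistent with the claim above.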
Single-index models provide an effective dimension reduction tool in regression, especially for high-dimensional data, by projecting a general multivariate predictor onto a direction vector. We propose a novel single-index model for regression settings in which metric space-valued random object responses are coupled with multivariate Euclidean predictors. The responses in this regression model include complex, non-Euclidean data that lie in abstract metric spaces, including covariance matrices, graph Laplacians of networks, and univariate probability distribution functions. While Fr\'echet regression has proved useful for modeling the conditional mean of such random objects given multivariate Euclidean vectors, it does not provide regression parameters such as slopes or intercepts, since the metric space-valued responses are not amenable to linear operations. As a consequence, distributional results for Fr\'echet regression have been elusive. We show here that, in the case of multivariate Euclidean predictors, the parameters that define a single index and projection vector can substitute for the inherent absence of parameters in Fr\'echet regression. Specifically, we derive the asymptotic distribution of suitable estimates of these parameters, which can then be used to test linear hypotheses about the parameters, subject to an identifiability condition. We demonstrate the finite-sample performance of estimation and inference for the proposed single-index Fr\'echet regression model through simulation studies. The method is illustrated on resting-state functional Magnetic Resonance Imaging (fMRI) data from the ADNI study.
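Schematically, writing $(\Omega, d)$ for the metric space of responses, the model posits that the conditional Fr\'echet mean of $Y$ depends on the Euclidean predictor $X$ only through a single projection $\theta_0^{\top}X$ (notation assumed for illustration):

\[
m_{\oplus}(x) \;=\; \operatorname*{arg\,min}_{\omega \in \Omega}\;
\mathbb{E}\!\left[\, d^{2}(Y, \omega) \,\middle|\, \theta_0^{\top} X = \theta_0^{\top} x \,\right],
\]

so that $\theta_0$ plays the role of the otherwise missing slope parameter, and hypotheses about it can be tested once its asymptotic distribution is available.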
We carry out an information-theoretic analysis of a two-layer neural network trained on input-output pairs generated by a teacher network with matching architecture, in overparametrized regimes. Our results come in the form of bounds relating (i) the mutual information between training data and network weights, or (ii) the Bayes-optimal generalization error, to the same quantities for a simpler (generalized) linear model for which explicit expressions are rigorously known. Our bounds, which are expressed in terms of the number of training samples, the input dimension and the number of hidden units, thus yield fundamental performance limits for any neural network (and actually any learning procedure) trained on limited data generated according to our two-layer teacher neural network model. The proof relies on rigorous tools from spin glasses and is guided by ``Gaussian equivalence principles'' lying at the core of numerous recent analyses of neural networks. With respect to the existing literature, which is either non-rigorous or restricted to the case of learning the readout weights only, our results are information-theoretic (i.e. not specific to any learning algorithm) and, importantly, cover a setting where all the network parameters are trained.
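For orientation, a generic form such a matched teacher-student setup can take, with $k$ hidden units, input dimension $d$, activation $\sigma$, and scaling conventions all assumed for illustration (the paper's precise model may differ):

\[
y^{\mu} \;=\; \frac{1}{\sqrt{k}} \sum_{i=1}^{k} a_i^{*}\,
\sigma\!\big(w_i^{*\top} x^{\mu} / \sqrt{d}\big) \;+\; \text{noise},
\qquad \mu = 1, \dots, n,
\]

with the student sharing the same architecture and all of $\{a_i, w_i\}$ trained, which is the setting the bounds above address.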
We hypothesize that, due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the others. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain in accuracy when the model has access to that modality in addition to another one. We refer to this gain as the conditional utilization rate. In our experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
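In symbols, with $A(S)$ denoting the model's accuracy when given access to the set of modalities $S$ (notation assumed for illustration), the conditional utilization rate of modality $m_1$ given $m_2$ is

\[
u(m_1 \mid m_2) \;=\; A(\{m_1, m_2\}) - A(\{m_2\}),
\]

and the imbalance reported above corresponds to $u(m_1 \mid m_2)$ and $u(m_2 \mid m_1)$ differing substantially.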
Graph convolutional networks (GCNs) have been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer either from a high computational cost that grows exponentially with the number of GCN layers, or from a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as follows: at each step, it samples a block of nodes associated with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search within this subgraph. This simple but effective strategy significantly improves memory and computational efficiency while achieving test accuracy comparable to previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M dataset with 2 million nodes and 61 million edges, more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs. 1961 seconds) while using much less memory (2.2GB vs. 11.2GB). Furthermore, for training a 4-layer GCN on this data, our algorithm finishes in around 36 minutes, while all existing GCN training algorithms fail due to out-of-memory issues. Cluster-GCN also allows us to train much deeper GCNs without significant time or memory overhead, which leads to improved prediction accuracy: using a 5-layer Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI dataset, while the previous best result was 98.71 by [16]. Our code is publicly available at //github.com/google-research/google-research/tree/master/cluster_gcn.
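A hedged sketch of the core sampling step (the paper partitions the graph with a clustering algorithm such as METIS; here `clusters` is any list of node-index arrays). Restricting aggregation to the induced subgraph is what keeps the memory footprint proportional to the cluster size rather than the full graph:

```python
import numpy as np

def normalize_adj(A_sub):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 on the induced subgraph."""
    A_hat = A_sub + np.eye(len(A_sub))
    d_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def cluster_gcn_minibatch(A, X, clusters, rng=np.random.default_rng(0)):
    """Sample one cluster and restrict all neighborhood aggregation to its
    induced subgraph; edges leaving the cluster are simply dropped."""
    nodes = clusters[rng.integers(len(clusters))]
    A_sub = normalize_adj(A[np.ix_(nodes, nodes)])
    return nodes, A_sub, X[nodes]   # then run GCN layers: H' = relu(A_sub @ H @ W)
```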