草莓视频在线观看免费完整,无码人妻一区二区三区在线不卡,国产午夜福利精品理论片在线播放

Studying the function spaces defined by neural networks helps to understand the corresponding learning models and their inductive bias. While in some limits neural networks correspond to function spaces that are reproducing kernel Hilbert spaces, these regimes do not capture the properties of the networks used in practice. In contrast, in this paper we show that deep neural networks define suitable reproducing kernel Banach spaces. These spaces are equipped with norms that enforce a form of sparsity, enabling them to adapt to potential latent structures within the input data and their representations. In particular, leveraging the theory of reproducing kernel Banach spaces, combined with variational results, we derive representer theorems that justify the finite architectures commonly employed in applications. Our study extends analogous results for shallow networks and can be seen as a step towards considering more practically plausible neural architectures.

相關內容

Networking

關注 0

Networking：IFIP International Conferences on Networking。 Explanation：國際網絡會議。 Publisher：IFIP。 SIT：

線性的 · 線性模型 · MoDELS · 廣義線性模型 · 估計/估計量 ·

2024 年 4 月 24 日

Empirical Bayes inference in sparse high-dimensional generalized linear models

Yiqi Tang,Ryan Martin

High-dimensional linear models have been widely studied, but the developments in high-dimensional generalized linear models, or GLMs, have been slower. In this paper, we propose an empirical or data-driven prior leading to an empirical Bayes posterior distribution which can be used for estimation of and inference on the coefficient vector in a high-dimensional GLM, as well as for variable selection. We prove that our proposed posterior concentrates around the true/sparse coefficient vector at the optimal rate, provide conditions under which the posterior can achieve variable selection consistency, and prove a Bernstein--von Mises theorem that implies asymptotically valid uncertainty quantification. Computation of the proposed empirical Bayes posterior is simple and efficient, and is shown to perform well in simulations compared to existing Bayesian and non-Bayesian methods in terms of estimation and variable selection.

Neural Networks · Networking · Learning · 反向傳播 · Machine Learning ·

2024 年 4 月 23 日

Training all-mechanical neural networks for task learning through in situ backpropagation

Shuaifeng Li,Xiaoming Mao

from arxiv, 25 pages, 5 figures

Recent advances unveiled physical neural networks as promising machine learning platforms, offering faster and more energy-efficient information processing. Compared with extensively-studied optical neural networks, the development of mechanical neural networks (MNNs) remains nascent and faces significant challenges, including heavy computational demands and learning with approximate gradients. Here, we introduce the mechanical analogue of in situ backpropagation to enable highly efficient training of MNNs. We demonstrate that the exact gradient can be obtained locally in MNNs, enabling learning through their immediate vicinity. With the gradient information, we showcase the successful training of MNNs for behavior learning and machine learning tasks, achieving high accuracy in regression and classification. Furthermore, we present the retrainability of MNNs involving task-switching and damage, demonstrating the resilience. Our findings, which integrate the theory for training MNNs and experimental and numerical validations, pave the way for mechanical machine learning hardware and autonomous self-learning material systems.

Microsoft Surface · Networking · Neural Networks · INFORMS · 相互獨立的 ·

2024 年 4 月 23 日

Surface profile recovery from electromagnetic field with physics--informed neural networks

Yuxuan Chen,Ce Wang,Yuan Hui,Mark Spivack

Physics--informed neural networks (PINN) have shown their potential in solving both direct and inverse problems of partial differential equations. In this paper, we introduce a PINN-based deep learning approach to reconstruct one-dimensional rough surfaces from field data illuminated by an electromagnetic incident wave. In the proposed algorithm, the rough surface is approximated by a neural network, with which the spatial derivatives of surface function can be obtained via automatic differentiation and then the scattered field can be calculated via the method of moments. The neural network is trained by minimizing the loss between the calculated and the observed field data. Furthermore, the proposed method is an unsupervised approach, independent of any surface data, rather only the field data is used. Both TE field (Dirichlet boundary condition) and TM field (Neumann boundary condition) are considered. Two types of field data are used here: full scattered field data and phaseless total field data. The performance of the method is verified by testing with Gaussian-correlated random rough surfaces. Numerical results demonstrate that the PINN-based method can recover rough surfaces with great accuracy and is robust with respect to a wide range of problem regimes.

Neural Networks · Networking · Networks · Learning · INFORMS ·

2024 年 4 月 23 日

Elucidating the theoretical underpinnings of surrogate gradient learning in spiking neural networks

Julia Gygax,Friedemann Zenke

from arxiv, 25 pages, 7 figures + 3 supplementary figures

Training spiking neural networks to approximate complex functions is essential for studying information processing in the brain and neuromorphic computing. Yet, the binary nature of spikes constitutes a challenge for direct gradient-based training. To sidestep this problem, surrogate gradients have proven empirically successful, but their theoretical foundation remains elusive. Here, we investigate the relation of surrogate gradients to two theoretically well-founded approaches. On the one hand, we consider smoothed probabilistic models, which, due to lack of support for automatic differentiation, are impractical for training deep spiking neural networks, yet provide gradients equivalent to surrogate gradients in single neurons. On the other hand, we examine stochastic automatic differentiation, which is compatible with discrete randomness but has never been applied to spiking neural network training. We find that the latter provides the missing theoretical basis for surrogate gradients in stochastic spiking neural networks. We further show that surrogate gradients in deterministic networks correspond to a particular asymptotic case and numerically confirm the effectiveness of surrogate gradients in stochastic multi-layer spiking neural networks. Finally, we illustrate that surrogate gradients are not conservative fields and, thus, not gradients of a surrogate loss. Our work provides the missing theoretical foundation for surrogate gradients and an analytically well-founded solution for end-to-end training of stochastic spiking neural networks.

Storage · 激活函數 · 泛函 · Networking · Neural Networks ·

2024 年 4 月 20 日

Solution space and storage capacity of fully connected two-layer neural networks with generic activation functions

Sota Nishiyama,Masayuki Ohzeki

from arxiv, 10+11 pages, 5 figures, 1 table

The storage capacity of a binary classification model is the maximum number of random input-output pairs per parameter that the model can learn. It is one of the indicators of the expressive power of machine learning models and is important for comparing the performance of various models. In this study, we analyze the structure of the solution space and the storage capacity of fully connected two-layer neural networks with general activation functions using the replica method from statistical physics. Our results demonstrate that the storage capacity per parameter remains finite even with infinite width and that the weights of the network exhibit negative correlations, leading to a 'division of labor'. In addition, we find that increasing the dataset size triggers a phase transition at a certain transition point where the permutation symmetry of weights is broken, resulting in the solution space splitting into disjoint regions. We identify the dependence of this transition point and the storage capacity on the choice of activation function. These findings contribute to understanding the influence of activation functions and the number of parameters on the structure of the solution space, potentially offering insights for selecting appropriate architectures based on specific objectives.

Networking · 分解的 · TEAM · 論文 · MoDELS ·

2024 年 4 月 19 日

Analysis of effects to scientific impact indicators based on the coevolution of coauthorship and citation networks

Haobai Xue

While computer modeling and simulation are crucial for understanding scientometrics, their practical use in literature remains somewhat limited. In this study, we establish a joint coauthorship and citation network using preferential attachment. As papers get published, we update the coauthorship network based on each paper's author list, representing the collaborative team behind it. This team is formed considering the number of collaborations each author has, and we introduce new authors at a fixed probability, expanding the coauthorship network. Simultaneously, as each paper cites a specific number of references, we add an equivalent number of citations to the citation network upon publication. The likelihood of a paper being cited depends on its existing citations, fitness value, and age. Then we calculate the journal impact factor and h-index, using them as examples of scientific impact indicators. After thorough validation, we conduct case studies to analyze the impact of different parameters on the journal impact factor and h-index. The findings reveal that increasing the reference number N or reducing the paper's lifetime {\theta} significantly boosts the journal impact factor and average h-index. On the other hand, enlarging the team size m without introducing new authors or decreasing the probability of newcomers p notably increases the average h-index. In conclusion, it is evident that various parameters influence scientific impact indicators, and their interpretation can be manipulated by authors. Thus, exploring the impact of these parameters and continually refining scientific impact indicators are essential. The modeling and simulation method serves as a powerful tool in this ongoing process, and the model can be easily extended to include other scientific impact indicators and scenarios.

優化器 · 塑造 · 跡 · Performer · 目標函數 ·

2024 年 4 月 18 日

Tracing Pareto-optimal points for multi-objective shape optimization applied to electric machines

Alessio Cesarano,Peter Gangl

from arxiv, 9 pages, 3 figures. SCEE2024 conference proceeding paper

In the context of the optimization of rotating electric machines, many different objective functions are of interest and considering this during the optimization is of crucial importance. While evolutionary algorithms can provide a Pareto front straightforwardly and are widely used in this context, derivative-based optimization algorithms can be computationally more efficient. In this case, a Pareto front can be obtained by performing several optimization runs with different weights. In this work, we focus on a free-form shape optimization approach allowing for arbitrary motor geometries. In particular, we propose a way to efficiently obtain Pareto-optimal points by moving along to the Pareto front exploiting a homotopy method based on second order shape derivatives.

Networking · Neural Networks · 控制器 · 線性的 · MoDELS ·

2024 年 4 月 18 日

Mapping back and forth between model predictive control and neural networks

Ross Drummond,Pablo R Baldivieso-Monasterios,Giorgio Valmorbida

from arxiv, 13 pages

Model predictive control (MPC) for linear systems with quadratic costs and linear constraints is shown to admit an exact representation as an implicit neural network. A method to "unravel" the implicit neural network of MPC into an explicit one is also introduced. As well as building links between model-based and data-driven control, these results emphasize the capability of implicit neural networks for representing solutions of optimisation problems, as such problems are themselves implicitly defined functions.

Networking · Neural Networks · Learning · MoDELS · 經驗風險 ·

2024 年 4 月 17 日

Learning time-scales in two-layers neural networks

Rapha?l Berthier,Andrea Montanari,Kangjie Zhou

from arxiv, 64 pages, 15 figures

Gradient-based learning in multi-layer neural networks displays a number of striking features. In particular, the decrease rate of empirical risk is non-monotone even after averaging over large batches. Long plateaus in which one observes barely any progress alternate with intervals of rapid decrease. These successive phases of learning often take place on very different time scales. Finally, models learnt in an early phase are typically `simpler' or `easier to learn' although in a way that is difficult to formalize. Although theoretical explanations of these phenomena have been put forward, each of them captures at best certain specific regimes. In this paper, we study the gradient flow dynamics of a wide two-layer neural network in high-dimension, when data are distributed according to a single-index model (i.e., the target function depends on a one-dimensional projection of the covariates). Based on a mixture of new rigorous results, non-rigorous mathematical derivations, and numerical simulations, we propose a scenario for the learning dynamics in this setting. In particular, the proposed evolution exhibits separation of timescales and intermittency. These behaviors arise naturally because the population gradient flow can be recast as a singularly perturbed dynamical system.

貪心 · 模態 · MoDELS · 學成 · 泛化理論 ·

2022 年 2 月 10 日

Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks

Nan Wu,Stanis?aw Jastrz?bski,Kyunghyun Cho,Krzysztof J. Geras

We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain on the accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.