
Unraveling the emergence of collective learning in systems of coupled artificial neural networks is an endeavor with broader implications for physics, machine learning, neuroscience and society. Here we introduce a minimal model that condenses several recent decentralized algorithms by considering a competition between two terms: the local learning dynamics in the parameters of each neural network unit, and a diffusive coupling among units that tends to homogenize the parameters of the ensemble. We derive the coarse-grained behavior of our model via an effective theory for linear networks, which we show is analogous to a deformed Ginzburg-Landau model with quenched disorder. This framework predicts (depth-dependent) disorder-order-disorder phase transitions in the parameters' solutions that reveal the onset of a collective learning phase, along with a depth-induced delay of the critical point and a robust shape of the microscopic learning path. We validate our theory on realistic ensembles of coupled nonlinear networks trained on the MNIST dataset under privacy constraints. Interestingly, experiments confirm that individual networks -- trained only on private data -- can fully generalize to unseen data classes once the collective learning phase emerges. Our work elucidates the physics of collective learning and contributes to the mechanistic interpretability of deep learning in decentralized settings.
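
The competition between the two terms can be sketched with a toy update rule: each unit takes a gradient step on its private loss and simultaneously relaxes toward the ensemble. This is only a minimal illustration under assumed choices (quadratic private losses, all-to-all diffusive coupling to the ensemble mean, and the constants eta and kappa), not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: N linear "networks", each with its own private data (X_i, y_i).
N, d = 8, 5
X = [rng.normal(size=(20, d)) for _ in range(N)]
theta_true = rng.normal(size=d)
y = [x @ theta_true + 0.1 * rng.normal(size=20) for x in X]

theta = [rng.normal(size=d) for _ in range(N)]   # parameters of each unit
eta, kappa = 0.01, 0.05                          # learning rate, coupling strength

def local_grad(i):
    # Gradient of the private quadratic loss of unit i.
    return X[i].T @ (X[i] @ theta[i] - y[i]) / len(y[i])

for step in range(2000):
    mean_theta = np.mean(theta, axis=0)
    theta = [
        t - eta * local_grad(i)            # local learning dynamics
        + kappa * (mean_theta - t)         # diffusive coupling toward the ensemble
        for i, t in enumerate(theta)
    ]

print("spread of parameters across units:", np.std(theta, axis=0).mean())
```

Varying kappa in this sketch moves the ensemble between a regime where units keep their own (disordered) solutions and one where parameters homogenize, which is the qualitative competition the abstract describes.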

Related content

We discuss probabilistic neural networks with a fixed internal representation as models for machine understanding. Here understanding is intended as mapping data to an already existing representation which encodes an a priori organisation of the feature space. We derive the internal representation by requiring that it satisfies the principles of maximal relevance and of maximal ignorance about how different features are combined. We show that, when hidden units are binary variables, these two principles identify a unique model -- the Hierarchical Feature Model (HFM) -- which is fully solvable and provides a natural interpretation in terms of features. We argue that learning machines with this architecture enjoy a number of interesting properties, like the continuity of the representation with respect to changes in parameters and data, the possibility to control the level of compression and the ability to support functions that go beyond generalisation. We explore the behaviour of the model with extensive numerical experiments and argue that models where the internal representation is fixed reproduce a learning modality which is qualitatively different from that of traditional models such as Restricted Boltzmann Machines.

Many real-world networks exhibit the phenomenon of edge clustering, which is typically measured by the average clustering coefficient. Recently, an alternative measure, the average closure coefficient, was proposed to quantify local clustering. It has been shown that the average closure coefficient possesses a number of useful properties and can capture complementary information missed by the classical average clustering coefficient. In this paper, we study the asymptotic distribution of the average closure coefficient of a heterogeneous Erdős-Rényi random graph. We prove that the standardized average closure coefficient converges in distribution to the standard normal distribution. In the Erdős-Rényi random graph, the variance of the average closure coefficient exhibits the same phase transition phenomenon as the average clustering coefficient.
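
The quantity being standardized can be checked numerically. The sketch below computes the average local closure coefficient of an Erdős-Rényi graph, assuming the common definition in which a node's closure coefficient is the fraction of length-2 paths headed at that node that close into triangles; the handling of nodes with no such paths may differ from the paper's convention.

```python
import networkx as nx
import numpy as np

def average_closure_coefficient(G):
    tri = nx.triangles(G)                     # number of triangles through each node
    deg = dict(G.degree())
    vals = []
    for u in G.nodes():
        # wedges headed at u: length-2 paths u - v - w, counted via neighbor degrees
        wedges = sum(deg[v] - 1 for v in G.neighbors(u))
        if wedges > 0:
            vals.append(2 * tri[u] / wedges)  # closed wedges / all wedges headed at u
    return float(np.mean(vals)) if vals else 0.0

# Erdős-Rényi example
G = nx.erdos_renyi_graph(n=2000, p=0.01, seed=0)
print(average_closure_coefficient(G))
```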

Prediction models are popular in medical research and practice. By predicting an outcome of interest for specific patients, these models may help inform difficult treatment decisions, and are often hailed as the poster children for personalized, data-driven healthcare. We show, however, that using prediction models for decision making can lead to harmful decisions, even when the predictions exhibit good discrimination after deployment. These models are harmful self-fulfilling prophecies: their deployment harms a group of patients, but the worse outcomes of these patients do not invalidate the predictive power of the model. Our main result is a formal characterization of a set of such prediction models. Next, we show that models that are well calibrated before and after deployment are useless for decision making because their deployment does not change the data distribution. These results point to the need to revise standard practices for the validation, deployment and evaluation of prediction models that are used in medical decisions.
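
The self-fulfilling-prophecy mechanism can be made concrete with a toy simulation. This is not the paper's formal characterization; the policy (withhold treatment when predicted risk exceeds 0.7) and the resulting risk increase (0.25) are arbitrary assumptions chosen only to show that deployment can harm the flagged group while the model's discrimination remains intact.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n = 100_000
severity = rng.normal(size=n)

# Pre-deployment: every patient is treated; risk grows with severity.
risk_pre = sigmoid(severity)
y_pre = rng.random(n) < risk_pre

# "Model": the true pre-deployment risk, i.e. perfectly calibrated at training time.
pred = risk_pre

# Post-deployment policy: withhold treatment when predicted risk is high ("futility").
withhold = pred > 0.7
risk_post = np.where(withhold, np.clip(risk_pre + 0.25, 0.0, 1.0), risk_pre)
y_post = rng.random(n) < risk_post

def auc(score, label):
    # Rank-based (Mann-Whitney) estimate of the area under the ROC curve.
    order = np.argsort(score)
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)
    pos = label.sum()
    return (ranks[label].sum() - pos * (pos + 1) / 2) / (pos * (n - pos))

print("excess bad outcomes in flagged group:", y_post[withhold].mean() - y_pre[withhold].mean())
print("post-deployment AUC:", auc(pred, y_post))
```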

Due to their intrinsic capability for parallel signal processing, optical neural networks (ONNs) have recently attracted extensive interest as a potential alternative to electronic artificial neural networks (ANNs), offering reduced power consumption and low latency. The parallelism of optical computing has been widely demonstrated by applying wavelength division multiplexing (WDM) to the linear transformation part of neural networks. However, inter-channel crosstalk has prevented WDM technologies from being deployed in the nonlinear activation stage of ONNs. Here, we propose a universal WDM structure called multiplexed neuron sets (MNS), which applies WDM technologies to optical neurons and enables ONNs to be further compressed. A corresponding back-propagation (BP) training algorithm is proposed to alleviate or even cancel the influence of inter-channel crosstalk on MNS-based WDM-ONNs. For simplicity, semiconductor optical amplifiers (SOAs) are employed as an example of MNS to construct a WDM-ONN trained with the new algorithm. The results show that the combination of MNS and the corresponding BP training algorithm significantly downsizes the system and improves energy efficiency by tens of times while giving performance similar to that of traditional ONNs.
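
The core idea of a crosstalk-aware BP algorithm -- training through the crosstalk so that gradients compensate for it rather than ignore it -- can be hinted at with a toy differentiable model. The mixing matrix C, the per-channel layers and all shapes below are assumptions for illustration; this is not the paper's MNS/SOA model.

```python
import torch

n_ch, d_in, d_hid, d_out = 4, 16, 32, 10

# Assumed crosstalk model: a fixed row-stochastic mixing of wavelength channels
# applied just before the shared nonlinear activation.
leak = 0.02
C = torch.full((n_ch, n_ch), leak)
C.fill_diagonal_(1.0 - leak * (n_ch - 1))

W1 = (0.1 * torch.randn(n_ch, d_in, d_hid)).requires_grad_()
W2 = (0.1 * torch.randn(n_ch, d_hid, d_out)).requires_grad_()

def forward(x):                                   # x: (batch, n_ch, d_in)
    h = torch.einsum('bci,cih->bch', x, W1)       # per-channel linear stage
    h = torch.einsum('dc,bch->bdh', C, h)         # inter-channel crosstalk
    h = torch.relu(h)                             # nonlinear activation
    return torch.einsum('bch,cho->bco', h, W2)

# One gradient step: because C sits inside the computational graph,
# back-propagation adapts W1 and W2 to compensate for the crosstalk.
x = torch.randn(8, n_ch, d_in)
target = torch.randn(8, n_ch, d_out)
loss = torch.mean((forward(x) - target) ** 2)
loss.backward()
with torch.no_grad():
    for W in (W1, W2):
        W -= 1e-2 * W.grad
        W.grad.zero_()
```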

Modelling noisy data in a network context remains an unavoidable challenge; fortunately, random matrix theory can describe such network environments comprehensively and effectively. This necessitates the probabilistic characterisation of these networks (and the accompanying noisy data) using matrix variate models. Denoising network data with a Bayesian approach is not common in the surveyed literature. This paper adopts the Bayesian viewpoint and introduces a new matrix variate t-model in a prior sense by relying on the matrix variate gamma distribution for the noise process, following the Gaussian graphical network framework for cases where the normality assumption is violated. From a statistical learning viewpoint, this theoretical development benefits the real-world understanding of the structures that generate noisy data with network-based attributes, as part of machine learning in data science. A full structural learning procedure is provided for calculating and approximating the resulting posterior of interest, and for assessing the considered model's network centrality measures. Experiments with synthetic and real-world stock price data are performed not only to validate the proposed algorithm's capabilities but also to show that this model has wider flexibility than originally implied in Billio et al. (2021).

Inspired by the success of WaveNet in multi-subject speech synthesis, we propose a novel neural network based on causal convolutions for multi-subject motion modeling and generation. The network can capture the intrinsic characteristics of the motion of different subjects, such as the influence of skeleton scale variation on motion style. Moreover, after fine-tuning the network using a small motion dataset for a novel skeleton that is not included in the training dataset, it is able to synthesize high-quality motions with a personalized style for the novel skeleton. The experimental results demonstrate that our network can model the intrinsic characteristics of motions well and can be applied to various motion modeling and synthesis tasks.
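
A minimal sketch of a causal-convolution stack of the kind described above is given below, assuming a PyTorch implementation with left padding so that each output frame depends only on past frames; the layer widths, dilations and the pose dimension are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1-D convolution that only looks at past frames (left padding)."""
    def __init__(self, c_in, c_out, k, dilation=1):
        super().__init__()
        self.pad = (k - 1) * dilation
        self.conv = nn.Conv1d(c_in, c_out, k, dilation=dilation)

    def forward(self, x):                        # x: (batch, channels, time)
        x = F.pad(x, (self.pad, 0))              # pad the past only
        return self.conv(x)

pose_dim = 63                                    # e.g. 21 joints x 3 rotations (placeholder)
net = nn.Sequential(
    CausalConv1d(pose_dim, 128, k=3, dilation=1), nn.ReLU(),
    CausalConv1d(128, 128, k=3, dilation=2), nn.ReLU(),
    CausalConv1d(128, pose_dim, k=3, dilation=4),    # predict next-frame pose
)

motion = torch.randn(2, pose_dim, 120)           # (batch, pose, frames)
pred = net(motion)                               # frame t only sees frames <= t
print(pred.shape)                                # torch.Size([2, 63, 120])
```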

The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that, absent an explanation, humans expect the AI to make decisions similar to their own, and that they interpret an explanation by comparing it to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
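
For reference, Shepard's universal law of generalization states that the probability of generalizing from one stimulus to another decays exponentially with their distance in psychological (similarity) space; how this kernel is combined with the saliency-map comparison is specific to the paper's theory and is not reproduced here:

\[
  g(x, y) = e^{-c\, d(x, y)}, \qquad c > 0,
\]

where $d(x, y)$ is the distance between stimuli $x$ and $y$ in the similarity space and $c$ sets the decay rate.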

We hypothesize that, due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain in accuracy when the model has access to that modality in addition to another one. We refer to this gain as the conditional utilization rate. In our experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
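
Taking the abstract's definition literally, the conditional utilization rate of modality m1 given m2 is the accuracy gain from adding m1 to a model that already sees m2; the paper may normalize or estimate it differently, so the sketch below, including the hypothetical accuracy numbers, is only an illustration.

```python
from typing import Callable, Sequence

def conditional_utilization_rate(
    accuracy: Callable[[Sequence[str]], float],
    m1: str,
    m2: str,
) -> float:
    """Gain in accuracy from giving the model modality m1 on top of m2.

    `accuracy(modalities)` is assumed to evaluate (or retrain) the model with
    only the listed modalities available and return its test accuracy.
    """
    return accuracy([m1, m2]) - accuracy([m2])

# Hypothetical accuracies for an RGB + depth model:
results = {("depth", "rgb"): 0.92, ("depth",): 0.85, ("rgb",): 0.90}
acc = lambda mods: results[tuple(sorted(mods))]

print(conditional_utilization_rate(acc, "rgb", "depth"))   # ~0.07: gain from adding RGB
print(conditional_utilization_rate(acc, "depth", "rgb"))   # ~0.02: gain from adding depth
```

In this made-up example the two rates are imbalanced, which is the symptom of greedy learning the abstract describes.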

Deep learning is usually described as an experiment-driven field that is under continuous criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of literature which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized into six groups: (1) complexity- and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns about ethics and security and their relationships with generalizability.

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of an undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on global issues of neural network training, including results on bad local minima, mode connectivity, the lottery ticket hypothesis and infinite-width analysis.
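
As a concrete instance of the "careful initialization" remedy mentioned above, variance-preserving schemes such as He initialization scale weights by the fan-in so that activations (and gradients) neither explode nor vanish as depth grows; the layer sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    # He initialization for ReLU networks: Var[w] = 2 / fan_in keeps the
    # variance of activations roughly constant from layer to layer.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

x = rng.normal(size=(256, 512))
for _ in range(50):                              # 50-layer ReLU chain
    x = np.maximum(x @ he_init(512, 512), 0.0)

print("activation std after 50 layers:", x.std())  # stays O(1) instead of exploding/vanishing
```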
