
The utility of a learned neural representation depends on how well its geometry supports performance in downstream tasks. This geometry depends on the structure of the inputs, the structure of the target outputs, and the architecture of the network. By studying the learning dynamics of networks with one hidden layer, we discovered that the network's activation function has an unexpectedly strong impact on the representational geometry: Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs. This difference is consistently observed across a broad class of parameterized tasks in which we modulated the degree of alignment between the geometry of the task inputs and that of the task labels. We analyze the learning dynamics in weight space and show how the differences between the networks with Tanh and ReLU nonlinearities arise from the asymmetric asymptotic behavior of ReLU, which leads feature neurons to specialize for different regions of input space. By contrast, feature neurons in Tanh networks tend to inherit the task label structure. Consequently, when the target outputs are low dimensional, Tanh networks generate neural representations that are more disentangled than those obtained with a ReLU nonlinearity. Our findings shed light on the interplay between input-output geometry, nonlinearity, and learned representations in neural networks.

Related content


Estimation/Estimator · Performer · Variance · Sample · Symmetric matrix

March 6, 2024

On randomized estimators of the Hafnian of a nonnegative matrix

Alexey Uvarov, Dmitry Vinichenko

from arxiv, 11 pages, 7 figures

Gaussian Boson Samplers aim to demonstrate quantum advantage by performing a sampling task believed to be classically hard. The probabilities of individual outcomes in the sampling experiment are determined by the Hafnian of an appropriately constructed symmetric matrix. For nonnegative matrices, there is a family of randomized estimators of the Hafnian based on generating a particular random matrix and calculating its determinant. While these estimators are unbiased (the mean of the determinant is equal to the Hafnian of interest), their variance may be so high as to prevent an efficient estimation. Here we investigate the performance of two such estimators, which we call the Barvinok and Godsil-Gutman estimators. We find that in general both estimators perform well for adjacency matrices of random graphs, demonstrating a slow growth of variance with the size of the problem. Nonetheless, there are simple examples where both estimators show high variance, requiring an exponential number of samples. In addition, we calculate the asymptotic behavior of the variance for the complete graph. Finally, we simulate the Gaussian Boson Sampling using the Godsil-Gutman estimator and show that this technique can successfully reproduce low-order correlation functions.
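The determinant-based estimation described above can be sketched on a tiny example. This is a minimal illustration, assuming the random-sign construction usually associated with Godsil and Gutman: the square root of each off-diagonal entry is multiplied by an independent random sign and placed in a skew-symmetric matrix, whose determinant then has the Hafnian as its mean. The function names and the 4×4 matrix are hypothetical and not taken from the paper.

```python
import itertools

import numpy as np


def hafnian_bruteforce(a, idx=None):
    """Exact Hafnian: sum over perfect matchings (feasible only for tiny matrices)."""
    if idx is None:
        idx = list(range(a.shape[0]))
    if not idx:
        return 1.0  # the empty matching contributes 1
    first, rest = idx[0], idx[1:]
    # Pair `first` with each remaining index j, then match what is left.
    return sum(
        a[first, j] * hafnian_bruteforce(a, [k for k in rest if k != j])
        for j in rest
    )


def skew_determinant(a, signs):
    """det(W) for skew-symmetric W with W_ij = s_ij * sqrt(A_ij) for i < j."""
    n = a.shape[0]
    upper = np.triu_indices(n, k=1)
    w = np.zeros((n, n))
    w[upper] = signs * np.sqrt(a[upper])
    return np.linalg.det(w - w.T)


def estimate_hafnian(a, n_samples, rng):
    """Monte Carlo average of the determinant over random sign patterns."""
    m = a.shape[0] * (a.shape[0] - 1) // 2
    draws = [
        skew_determinant(a, rng.choice([-1.0, 1.0], size=m))
        for _ in range(n_samples)
    ]
    return float(np.mean(draws))


def exact_mean_over_signs(a):
    """Average the determinant over every sign pattern (an exact unbiasedness check)."""
    m = a.shape[0] * (a.shape[0] - 1) // 2
    dets = [
        skew_determinant(a, np.array(s))
        for s in itertools.product([-1.0, 1.0], repeat=m)
    ]
    return float(np.mean(dets))


# Hypothetical nonnegative symmetric matrix with zero diagonal.
A = np.array([[0, 1, 2, 3],
              [1, 0, 4, 5],
              [2, 4, 0, 6],
              [3, 5, 6, 0]], dtype=float)

exact = hafnian_bruteforce(A)        # 1*6 + 2*5 + 3*4
averaged = exact_mean_over_signs(A)  # matches `exact` up to rounding
sampled = estimate_hafnian(A, 2000, np.random.default_rng(0))
print(exact, averaged, sampled)
```

Averaging over all sign patterns reproduces the Hafnian exactly here, while the sampled estimate fluctuates around it; how quickly that fluctuation grows with matrix size is the variance question the abstract addresses.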

Graph · Graphics processing unit · MoDELS · Networking · Neural Networks

March 6, 2024

Graph neural network outputs are almost surely asymptotically constant

Sam Adam-Day, Michael Benedikt, İsmail İlkan Ceylan, Ben Finkelshtein

from arxiv, 10 body pages, 23 appendix pages, 2 figures

Graph neural networks (GNNs) are the predominant architectures for a variety of learning tasks on graphs. We present a new angle on the expressive power of GNNs by studying how the predictions of a GNN probabilistic classifier evolve as we apply it on larger graphs drawn from some random graph model. We show that the output converges to a constant function, which upper-bounds what these classifiers can express uniformly. This convergence phenomenon applies to a very wide class of GNNs, including state-of-the-art models, with aggregates including mean and the attention-based mechanism of graph transformers. Our results apply to a broad class of random graph models, including the (sparse) Erdős-Rényi model and the stochastic block model. We empirically validate these findings, observing that the convergence phenomenon already manifests itself on graphs of relatively modest size.
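The convergence phenomenon can be illustrated with a toy experiment. The sketch below is not the paper's architecture or experimental setup: it assumes a single mean-aggregation layer with fixed random weights, i.i.d. Gaussian node features, mean pooling, and a dense Erdős-Rényi model with p = 0.5. All names and parameter values are hypothetical. It measures how much the classifier output varies across independently sampled graphs of a given size.

```python
import numpy as np


def sample_erdos_renyi(n, p, rng):
    """Symmetric 0/1 adjacency matrix without self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)


def toy_gnn_output(adj, feats, w_self, w_nbr, readout):
    """One mean-aggregation layer, mean pooling, then a sigmoid 'probability'."""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)  # guard isolated nodes
    nbr_mean = adj @ feats / deg
    hidden = np.tanh(feats @ w_self + nbr_mean @ w_nbr)
    logit = hidden.mean(axis=0) @ readout
    return 1.0 / (1.0 + np.exp(-logit))


def output_spread(n, n_graphs, rng, weights, p=0.5, dim=4):
    """Standard deviation of the output across independent random graphs of size n."""
    outs = []
    for _ in range(n_graphs):
        adj = sample_erdos_renyi(n, p, rng)
        feats = rng.normal(loc=0.5, size=(n, dim))  # i.i.d. node features
        outs.append(toy_gnn_output(adj, feats, *weights))
    return float(np.std(outs))


rng = np.random.default_rng(0)
dim = 4
# Fixed (untrained) random weights shared across all graph sizes.
weights = (
    rng.normal(size=(dim, dim)),
    rng.normal(size=(dim, dim)),
    rng.normal(size=dim),
)
spreads = {n: output_spread(n, 30, rng, weights) for n in (20, 80, 320)}
print(spreads)
```

In this toy setting the spread shrinks as the graphs grow, which is consistent with the output approaching a constant that does not depend on the particular graph drawn.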

Cognition · MoDELS · Linear · Sigmoid (an activation function) · Piecewise

March 6, 2024

A comparison of mixed-models for the analysis of non-linear longitudinal data: application to late-life cognitive trajectories

Maude Wagner, Donald R. Hedeker, Tianhao Wang, Graciela Muniz-Terrera, Ana W. Capuano

from arxiv, 34 pages, 7 Figures, 1 Table

Several mixed-effects models for longitudinal data have been proposed to accommodate the non-linearity of late-life cognitive trajectories and assess the putative influence of covariates on them. No prior research provides a side-by-side examination of these models to offer guidance on their proper application and interpretation. In this work, we examined five statistical approaches previously used to answer research questions related to non-linear changes in cognitive aging: the linear mixed model (LMM) with a quadratic term, LMM with splines, the functional mixed model, the piecewise linear mixed model, and the sigmoidal mixed model. We first theoretically describe the models. Next, using data from two prospective cohorts with annual cognitive testing, we compared the interpretation of the models by investigating associations of education with cognitive change before death. Lastly, we performed a simulation study to empirically evaluate the models and provide practical recommendations. Except for the LMM-quadratic, the fit of all models was generally adequate to capture non-linearity of cognitive change and models were relatively robust. Although spline-based models have no interpretable nonlinearity parameters, their convergence was easier to achieve, and they allow graphical interpretation. In contrast, piecewise and sigmoidal models, with interpretable non-linear parameters, may require more data to achieve convergence.

Networking · Neural Networks · MoDELS · Continuity · Dynamical systems

March 6, 2024

Neural Koopman prior for data assimilation

Anthony Frion, Lucas Drumetz, Mauro Dalla Mura, Guillaume Tochon, Abdeldjalil Aïssa El Bey

With the increasing availability of large-scale datasets, computational power and tools like automatic differentiation and expressive neural network architectures, sequential data are now often treated in a data-driven way, with a dynamical model trained from the observation data. While neural networks are often seen as uninterpretable black-box architectures, they can still benefit from physical priors on the data and from mathematical knowledge. In this paper, we use a neural network architecture which leverages the long-known Koopman operator theory to embed dynamical systems in latent spaces where their dynamics can be described linearly, enabling a number of appealing features. We introduce methods that make it possible to train such a model for long-term continuous reconstruction, even in difficult contexts where the data comes in irregularly sampled time series. The potential for self-supervised learning is also demonstrated, as we show the promising use of trained dynamical models as priors for variational data assimilation techniques, with applications to e.g. time series interpolation and forecasting.
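One reason linear latent dynamics suit irregularly sampled data is that a linear operator can be raised to a real-valued power, so the state can be evaluated at any time rather than only on a fixed grid. The sketch below illustrates only that idea and makes several assumptions: the latent state is used directly (the learned encoder and decoder of the paper are omitted), the operator `K` is a hand-picked rotation rather than a trained matrix, and `propagate` is a hypothetical helper name.

```python
import numpy as np


def propagate(k_matrix, z0, t):
    """Advance latent state z0 by a real-valued time t under z(t) = K^t z0.

    K^t is built from the eigendecomposition K = V diag(lam) V^{-1}, which
    assumes K is diagonalizable and has no eigenvalue at zero or on the
    negative real axis (so the principal branch of lam**t is well behaved).
    """
    lam, v = np.linalg.eig(k_matrix)
    k_t = v @ np.diag(lam.astype(complex) ** t) @ np.linalg.inv(v)
    return (k_t @ z0).real  # imaginary parts are rounding noise for real K


theta = 0.3  # rotation angle per unit time (illustrative value)
K = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
z0 = np.array([1.0, 0.0])

# Irregularly spaced query times, including non-integer ones.
irregular_times = np.array([0.0, 0.4, 1.0, 2.7])
trajectory = np.stack([propagate(K, z0, t) for t in irregular_times])
print(trajectory)
```

For this rotation the result can be checked by hand, since the state at time t is (cos(θt), sin(θt)).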

MoDELS · Language modeling · Diversity · Design · Extensibility

March 6, 2024

Diffusion on language model embeddings for protein sequence generation

Viacheslav Meshchaninov, Pavel Strashnov, Andrey Shevtsov, Fedor Nikolaev, Nikita Ivanisenko, Olga Kardymon, Dmitry Vetrov

Protein design requires a deep understanding of the inherent complexities of the protein universe. While many efforts lean towards conditional generation or focus on specific families of proteins, the foundational task of unconditional generation remains underexplored and undervalued. Here, we explore this pivotal domain, introducing DiMA, a model that leverages continuous diffusion on embeddings derived from the protein language model, ESM-2, to generate amino acid sequences. DiMA surpasses leading solutions, including autoregressive transformer-based and discrete diffusion models, and we quantitatively illustrate the impact of the design choices that lead to its superior performance. We extensively evaluate the quality, diversity, distribution similarity, and biological relevance of the generated sequences using multiple metrics across various modalities. Our approach consistently produces novel, diverse protein sequences that accurately reflect the inherent structural and functional diversity of the protein space. This work advances the field of protein design and sets the stage for conditional models by providing a robust framework for scalable and high-quality protein sequence generation.

Performer · Tolerance · Class · Less · Directed

March 6, 2024

Adaptive coordination promotes collective cooperation in repeated social dilemmas

Feipeng Zhang, Te Wu, Long Wang

Direct reciprocity based on the repeated prisoner's dilemma has been intensively studied. Most theoretical investigations have concentrated on memory-$1$ strategies, a class of elementary strategies that react only to the outcome of the previous round. Although the properties of "All-or-None" strategies ($AoN_K$) have been established, simulations have confirmed the good performance of $AoN_K$ only for very short memory lengths. It remains unclear how $AoN_K$ strategies would fare when players have access to a longer history of past rounds. We construct a theoretical model to investigate the performance of the class of $AoN_K$ strategies of varying memory length $K$. We rigorously derive the payoffs and show that $AoN_K$ strategies of intermediate memory length $K$ are most prevalent, while strategies of larger memory lengths are less competent. Larger memory lengths make it hard for $AoN_K$ strategies to coordinate, thus inhibiting their mutual reciprocity. We then propose the adaptive coordination strategy, which combines tolerance with the $AoN_K$ coordination rule. This strategy behaves like an $AoN_K$ strategy when coordination is not sufficient, and tolerates opponents' occasional deviations by still cooperating when coordination is sufficient. We find that the adaptive coordination strategy wins over other classic memory-$1$ strategies in various typical competition environments, and stabilizes the population at high levels of cooperation, suggesting the effectiveness of high-level adaptability in resolving social dilemmas. Our work may offer a theoretical framework for exploring complex strategies using history information, which are different from traditional memory-$n$ strategies.
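The coordination difficulty at larger memory lengths can be seen in a small simulation. The abstract does not define $AoN_K$, so the sketch below assumes the commonly used rule: cooperate only if, in each of the last $K$ rounds, both players chose the same action, and defect otherwise. The helper names, the single injected error, and the treatment of pre-game rounds as mutual cooperation are all assumptions made for illustration.

```python
def aon_k_move(history, k):
    """AoN_K rule (assumed definition): cooperate only if both players chose
    the same action in each of the last k rounds (k >= 1); otherwise defect.
    Rounds before the start of the game count as mutual cooperation."""
    recent = history[-k:]
    return "C" if all(a == b for a, b in recent) else "D"


def play(k, rounds, error_round):
    """Two AoN_K players; player 1's move is flipped once at `error_round`."""
    history = []
    for t in range(rounds):
        move1 = aon_k_move(history, k)
        move2 = aon_k_move(history, k)
        if t == error_round:
            move1 = "D" if move1 == "C" else "C"  # single implementation error
        history.append((move1, move2))
    return history


def rounds_to_recover(k, error_round=3):
    """Number of rounds after the error until mutual cooperation returns."""
    history = play(k, rounds=error_round + 2 * k + 5, error_round=error_round)
    for offset, pair in enumerate(history[error_round + 1:], start=1):
        if pair == ("C", "C"):
            return offset
    return None  # no recovery within the simulated horizon


for k in (1, 2, 5):
    print(k, rounds_to_recover(k))
```

Under this assumed rule, a single error is followed by $K$ rounds of mutual defection before cooperation resumes, so recovery takes $K+1$ rounds and the cost of one mistake grows with the memory length.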

University · Unification · CASES · Transformation · Lecture notes

March 5, 2024

Sharing proofs with predicative theories through universe polymorphic elaboration

Thiago Felicissimo, Frédéric Blanqui

from arxiv, Journal version of //doi.org/10.4230/LIPIcs.CSL.2023.19 to be submitted to LMCS, also supersedes arXiv:2211.05700

As the development of formal proofs is a time-consuming task, it is important to devise ways of sharing the already written proofs to prevent wasting time redoing them. One of the challenges in this domain is to translate proofs written in proof assistants based on impredicative logics to proof assistants based on predicative logics, whenever impredicativity is not used in an essential way. In this paper we present a transformation for sharing proofs with a core predicative system supporting prenex universe polymorphism (like in Agda). It consists in trying to elaborate each term into a predicative universe polymorphic term as general as possible. The use of universe polymorphism is justified by the fact that mapping each universe to a fixed one in the target theory is not sufficient in most cases. During the elaboration, we need to solve unification problems in the equational theory of universe levels. In order to do this, we give a complete characterization of when a single equation admits a most general unifier. This characterization is then employed in a partial algorithm which uses a constraint-postponement strategy for trying to solve unification problems. The proposed translation is of course partial, but in practice allows one to translate many proofs that do not use impredicativity in an essential way. Indeed, it was implemented in the tool Predicativize and then used to translate semi-automatically many non-trivial developments from Matita's library to Agda, including proofs of Bertrand's Postulate and Fermat's Little Theorem, which (as far as we know) were not available in Agda yet.

Reducible · Optimizer · Computational cost · Cost · Constraint

March 5, 2024

Reducing computational effort in topology optimization considering the deformation in additive manufacturing

Takao Miki

from arxiv, 26 pages, 15 figures

Integrating topology optimization and additive manufacturing (AM) technology can facilitate innovative product development. However, laser powder bed fusion, which is the predominant method in metal AM, can lead to issues such as residual stress and deformation. Recently, topology optimization methods considering these stresses and deformations have been proposed; however, they suffer from challenges caused by an increased computational cost. In this study, we propose a method for reducing computational cost in topology optimization considering the deformation in AM. An inherent strain method-based analytical model is presented for simulating the residual stress and deformation in the AM process. Subsequently, a constraint condition to suppress the deformation is formulated, and a method to reduce the computational cost of the adjoint analysis in deriving sensitivity is proposed. The minimum mean compliance problem considering AM deformation and self-support constraints can then be incorporated into the level set-based topology optimization framework. Finally, numerical examples are presented for validating the effectiveness of the proposed topology optimization method.

Neural Networks · Networking · MoDELS · Extensibility · Constrained optimization

March 4, 2024

A prediction rigidity formalism for low-cost uncertainties in trained neural networks

Filippo Bigi, Sanggyu Chong, Michele Ceriotti, Federico Grasselli

Regression methods are fundamental for scientific and technological applications. However, fitted models can be highly unreliable outside of their training domain, and hence the quantification of their uncertainty is crucial in many of their applications. Based on the solution of a constrained optimization problem, we propose "prediction rigidities" as a method to obtain uncertainties of arbitrary pre-trained regressors. We establish a strong connection between our framework and Bayesian inference, and we develop a last-layer approximation that allows the new method to be applied to neural networks. This extension affords cheap uncertainties without any modification to the neural network itself or its training procedure. We show the effectiveness of our method on a wide range of regression tasks, ranging from simple toy models to applications in chemistry and meteorology.

Generalized linear model · Linear · Linear model · MoDELS · Statistical theory

March 4, 2024

Permutation-based multiple testing when fitting many generalized linear models

Riccardo De Santis, Jelle J. Goeman, Samuel Davenport, Jesse Hemerik, Livio Finos

The multiple testing problem appears when fitting multivariate generalized linear models for high dimensional data. We show that the sign-flip test can be combined with permutation-based procedures for assessing the multiple testing problem