
Recent architectural developments have enabled recurrent neural networks (RNNs) to reach and even surpass the performance of Transformers on certain sequence modeling tasks. These modern RNNs feature a prominent design pattern: linear recurrent layers interconnected by feedforward paths with multiplicative gating. Here, we show how RNNs equipped with these two design elements can exactly implement (linear) self-attention, the main building block of Transformers. By reverse-engineering a set of trained RNNs, we find that gradient descent in practice discovers our construction. In particular, we examine RNNs trained to solve simple in-context learning tasks on which Transformers are known to excel and find that gradient descent instills in our RNNs the same attention-based in-context learning algorithm used by Transformers. Our findings highlight the importance of multiplicative interactions in neural networks and suggest that certain RNNs might be unexpectedly implementing attention under the hood.
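
As a rough illustration of the construction described above (a toy sketch under assumed names, not the paper's exact parameterization), the following shows how a gateless linear recurrence over outer products of keys and values, read out multiplicatively against the query, reproduces unnormalized linear self-attention. The projection matrices Wq, Wk, Wv and the input sequence X are hypothetical.

import numpy as np

# Toy sketch: a linear recurrent state accumulating v_i k_i^T, combined
# with a multiplicative readout S @ q, equals unnormalized linear attention:
# y_t = (sum_{i<=t} v_i k_i^T) q_t.
def linear_attention_as_rnn(X, Wq, Wk, Wv):
    S = np.zeros((Wv.shape[0], Wk.shape[0]))   # recurrent state
    outputs = []
    for x in X:                                # X: iterable of input vectors
        q, k, v = Wq @ x, Wk @ x, Wv @ x
        S = S + np.outer(v, k)                 # linear recurrence, no nonlinearity
        outputs.append(S @ q)                  # multiplicative interaction
    return np.stack(outputs)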

Related content

Neural Networks is the archival journal of the world's three oldest neural modelling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes submissions of high-quality papers that contribute to the full range of neural networks research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analyses, to systems engineering and technology applications that make substantial use of neural network concepts and techniques. This unique and broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of the interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: cognitive science, neuroscience, learning systems, mathematical and computational analysis, or engineering and applications. Official website:

We use fixed point theory to analyze nonnegative neural networks, which we define as neural networks that map nonnegative vectors to nonnegative vectors. We first show that nonnegative neural networks with nonnegative weights and biases can be recognized as monotonic and (weakly) scalable mappings within the framework of nonlinear Perron-Frobenius theory. This fact enables us to provide conditions for the existence of fixed points of nonnegative neural networks having inputs and outputs of the same dimension, and these conditions are weaker than those recently obtained using arguments in convex analysis. Furthermore, we prove that the shape of the fixed point set of nonnegative neural networks with nonnegative weights and biases is an interval, which under mild conditions degenerates to a point. These results are then used to obtain the existence of fixed points of more general nonnegative neural networks. From a practical perspective, our results contribute to the understanding of the behavior of autoencoders, and we also offer valuable mathematical machinery for future developments in deep equilibrium models.
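
A minimal sketch of the setting, under illustrative assumptions (a hypothetical two-layer ReLU network with randomly drawn nonnegative weights and biases): such a map sends nonnegative vectors to nonnegative vectors, and a simple Picard iteration can be used to search for a fixed point under the monotonicity and scalability conditions discussed above.

import numpy as np

# Hypothetical nonnegative network: nonnegative weights/biases plus ReLU
# map the nonnegative orthant into itself.
rng = np.random.default_rng(0)
n = 5
W1, W2 = rng.uniform(0, 0.3, (n, n)), rng.uniform(0, 0.3, (n, n))
b1, b2 = rng.uniform(0, 0.1, n), rng.uniform(0, 0.1, n)

def f(x):
    return np.maximum(W2 @ np.maximum(W1 @ x + b1, 0) + b2, 0)

# Picard iteration toward a fixed point (when one exists).
x = np.zeros(n)
for _ in range(200):
    x_new = f(x)
    if np.linalg.norm(x_new - x) < 1e-10:
        break
    x = x_new
print("fixed point:", x, "residual:", np.linalg.norm(f(x) - x))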

A fundamental aspect of statistics is the integration of data from different sources. Classically, Fisher and others were focused on how to integrate homogeneous (or only mildly heterogeneous) sets of data. More recently, as data are becoming more accessible, the question of whether data sets from different sources should be integrated is becoming more relevant. The current literature treats this as a question with only two answers: integrate or don't. Here we take a different approach, motivated by information-sharing principles coming from the shrinkage estimation literature. In particular, we deviate from the do/don't perspective and propose a dial parameter that controls the extent to which two data sources are integrated. How far this dial parameter should be turned is shown to depend, for example, on the informativeness of the different data sources as measured by Fisher information. In the context of generalized linear models, this more nuanced data integration framework leads to relatively simple parameter estimates and valid tests/confidence intervals. Moreover, we demonstrate both theoretically and empirically that setting the dial parameter according to our recommendation leads to more efficient estimation compared to other binary data integration schemes.
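
For intuition, here is a toy Gaussian-mean stand-in for the dial idea (illustrative only, not the paper's GLM estimator): the dial alpha weights the second source's contribution to a pooled estimate, with alpha = 0 recovering no integration and alpha = 1 full integration.

import numpy as np

# Dialed pooling of two samples for a common mean; alpha in [0, 1].
# With equal variances, each source's Fisher information is proportional
# to its sample size, so alpha rescales the second source's information.
def dialed_mean(y1, y2, alpha):
    n1, n2 = len(y1), len(y2)
    return (y1.sum() + alpha * y2.sum()) / (n1 + alpha * n2)

rng = np.random.default_rng(1)
y1 = rng.normal(0.0, 1.0, 50)    # primary source
y2 = rng.normal(0.2, 1.0, 500)   # larger but mildly biased source
for alpha in (0.0, 0.3, 1.0):
    print(alpha, dialed_mean(y1, y2, alpha))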

Most tensor regression models currently used for high-dimensional data are based on the Tucker decomposition, which has good properties but quickly loses its efficiency in compressing tensors as the tensor order increases, say beyond four or five. However, even the simplest tensor autoregression for time series data already has a coefficient tensor of order six. This paper revises a newly proposed tensor train (TT) decomposition and then applies it to tensor regression so that a nice statistical interpretation can be obtained. The new tensor regression matches data with hierarchical structures well, and it can even lead to a better interpretation for data with factorial structures, which are supposed to be better fitted by models with Tucker decomposition. More importantly, the new tensor regression can easily be applied to the case of higher-order tensors, since TT decomposition compresses coefficient tensors much more efficiently. The methodology is also extended to tensor autoregression for time series data, and nonasymptotic properties are derived for the ordinary least squares estimators of both tensor regression and autoregression. A new algorithm is introduced to search for estimators, and its theoretical justification is also discussed. Theoretical and computational properties of the proposed methodology are verified by simulation studies, and the advantages over existing methods are illustrated by two real examples.
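
To make the compression point concrete, the sketch below (illustrative shapes and ranks, not the paper's estimator) reconstructs a single entry of an order-6 coefficient tensor from TT cores; TT storage grows linearly in the order, whereas a full tensor grows exponentially.

import numpy as np

# TT format: cores G[k] of shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1.
# An entry is a chain of small matrix products over the cores.
def tt_entry(cores, index):
    v = np.ones((1, 1))
    for G, i in zip(cores, index):
        v = v @ G[:, i, :]            # (1, r_{k-1}) @ (r_{k-1}, r_k)
    return v.item()

rng = np.random.default_rng(2)
dims, r = [3, 4, 2, 3, 2, 4], 2       # an order-6 tensor, as in tensor AR
ranks = [1, r, r, r, r, r, 1]
cores = [rng.normal(size=(ranks[k], dims[k], ranks[k + 1])) for k in range(6)]
print(tt_entry(cores, (0, 1, 1, 2, 0, 3)))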

Within network data analysis, bipartite networks represent a particular type of network where relationships occur between two disjoint sets of nodes, formally called sending and receiving nodes. In this context, sending nodes may be organized into layers on the basis of some defined characteristics, resulting in a special case of multilayer bipartite network in which each layer includes a specific set of sending nodes. To cluster the sending nodes of a multilayer bipartite network, we extend the Mixture of Latent Trait Analyzers (MLTA), taking into account both the influence of concomitant variables on cluster formation and the multilayer structure of the data. To this aim, a multilevel approach offers a useful methodological tool to properly account for the hierarchical structure of the data and for the unobserved sources of heterogeneity at multiple levels. A simulation study is conducted to test the performance of the proposal in terms of parameter and clustering recovery. Furthermore, the model is applied to European Social Survey (ESS) data to i) cluster individuals (sending nodes) based on their digital skills (receiving nodes); ii) understand how socio-economic and demographic characteristics influence individual digitalization levels; iii) account for the multilevel structure of the data; and iv) obtain a clustering of countries in terms of the baseline attitude of their residents toward digital technologies.
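
As a very stripped-down illustration of this model family (no latent traits, covariates, or layers, so only a distant cousin of the full MLTA), an EM routine for latent class analysis of binary skill indicators might look as follows; all names and shapes are assumptions.

import numpy as np

# Latent class EM for a binary matrix Y (n individuals x p skills), G classes.
def lca_em(Y, G, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    pi = np.full(G, 1.0 / G)                 # class proportions
    theta = rng.uniform(0.3, 0.7, (G, p))    # item probabilities per class
    for _ in range(iters):
        # E-step: posterior class responsibilities.
        logp = (Y @ np.log(theta).T + (1 - Y) @ np.log(1 - theta).T
                + np.log(pi))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update proportions and item probabilities.
        pi = r.mean(axis=0)
        theta = np.clip((r.T @ Y) / r.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)
    return pi, theta, r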

We present a physics-informed neural network (PINN) approach for the discovery of slow invariant manifolds (SIMs) for the most general class of fast/slow dynamical systems of ODEs. In contrast to other machine learning (ML) approaches that construct reduced-order black-box surrogate models using simple regression and/or require a priori knowledge of the fast and slow variables, our approach simultaneously decomposes the vector field into fast and slow components and provides a functional of the underlying SIM in closed form. The decomposition is achieved by finding a transformation of the state variables to fast and slow ones, which enables the derivation of a SIM functional that is explicit in terms of the fast variables. The latter is obtained by solving a PDE corresponding to the invariance equation of Geometric Singular Perturbation Theory (GSPT) using a single-layer feedforward neural network with symbolic differentiation. The performance of the proposed physics-informed ML framework is assessed via three benchmark problems: the Michaelis-Menten model, the target-mediated drug disposition (TMDD) reaction model, and a fully competitive substrate-inhibitor (fCSI) mechanism. We also provide a comparison with other GSPT methods, namely the quasi-steady-state approximation (QSSA), the partial equilibrium approximation (PEA), and CSP with one and two iterations. We show that the proposed PINN scheme provides SIM approximations of equivalent or even higher accuracy than those provided by QSSA, PEA and CSP, especially close to the boundaries of the underlying SIMs.
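
The core computation can be sketched as follows, under a hypothetical generic slow/fast split dy/dt = f(y, z), dz/dt = g(y, z) with SIM z = h(y) (the paper's actual transformation and explicit-in-fast-variables form differ): a single-layer network parametrizes h, its Jacobian is available in closed form via the chain rule, and the PINN loss penalizes the residual of the invariance equation Dh(y) f(y, h(y)) - g(y, h(y)) at collocation points.

import numpy as np

# Single-layer network parametrizing the SIM, z = h(y).
def h(y, W1, b1, W2, b2):
    return W2 @ np.tanh(W1 @ y + b1) + b2

# Exact Jacobian dh/dy (symbolic differentiation of the one-layer net).
def h_jac(y, W1, b1, W2, b2):
    s = np.tanh(W1 @ y + b1)
    return W2 @ np.diag(1 - s**2) @ W1

# Residual of the GSPT invariance equation at a collocation point y;
# training would minimize the squared residual over many such points.
def invariance_residual(y, f, g, params):
    z = h(y, *params)
    return h_jac(y, *params) @ f(y, z) - g(y, z)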

Recent advances in neuroimaging have enabled studies of functional connectivity (FC) in the human brain, alongside investigation of the neuronal basis of cognition. One important FC topic is the representation of vision in the human brain. The release of the publicly available BOLD5000 dataset has made it possible to study brain dynamics during visual tasks in greater detail. In this paper, a comprehensive analysis of fMRI time series (TS) has been performed to explore different types of visual brain networks (VBN). The novelty of this work lies in (1) constructing VBNs with consistently significant direct connectivity using both marginal and partial correlation, which are further analyzed using graph-theoretic measures, and (2) classifying VBNs formed by image complexity-specific TS using graphical features. In image complexity-specific VBN classification, XGBoost yields average accuracy in the range of 86.5% to 91.5% for positively correlated VBNs, which is 2% greater than that using negative correlation. This result not only reflects the distinguishing graphical characteristics of each image complexity-specific VBN, but also highlights the importance of studying both positively and negatively correlated VBNs to understand how differently the brain functions while viewing different complexities of real-world images.
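
A skeleton of the network-construction step might look like this (threshold, features, and shapes are illustrative; the paper also uses partial correlations and significance testing rather than a bare threshold). The resulting per-scan feature vectors would then be fed to a classifier such as xgboost.XGBClassifier.

import numpy as np
import networkx as nx

# Build a positively-correlated network from ROI time series and extract
# simple graph features (stand-ins for the paper's graph-theoretic measures).
def vbn_features(ts, thresh=0.5):
    # ts: (n_rois, n_timepoints) fMRI time series for one scan
    corr = np.corrcoef(ts)
    adj = (corr > thresh).astype(float)
    np.fill_diagonal(adj, 0)
    G = nx.from_numpy_array(adj)
    return [
        np.mean(list(dict(G.degree()).values())),   # mean degree
        nx.average_clustering(G),                   # clustering coefficient
        nx.density(G),                              # edge density
    ]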

We investigate the stationary (late-time) training regime of single- and two-layer linear underparameterized neural networks within the continuum limit of stochastic gradient descent (SGD) for synthetic Gaussian data. In the case of a single-layer network in the weakly underparameterized regime, the spectrum of the noise covariance matrix deviates notably from the Hessian, which can be attributed to the broken detailed balance of SGD dynamics. The weight fluctuations are in this case generally anisotropic, but are subject to an isotropic loss. For a two-layer network, we obtain the stochastic dynamics of the weights in each layer and analyze the associated stationary covariances. We identify the inter-layer coupling as a new source of anisotropy for the weight fluctuations. In contrast to the single-layer case, the weight fluctuations experience an anisotropic loss, the flatness of which is inversely related to the fluctuation variance. We thereby provide an analytical derivation of the recently observed inverse variance-flatness relation in a model of a deep linear neural network.
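
For readers wanting the standard machinery behind such statements (our notation, assumed for illustration and not copied from the paper): near a minimum, the SGD continuum limit is an Ornstein-Uhlenbeck process whose stationary covariance solves a Lyapunov equation, and broken detailed balance corresponds to non-commuting drift and noise matrices.

% Linearized SGD near a minimum with Hessian $H$, noise covariance $D$,
% and learning rate $\eta$ follows the Ornstein--Uhlenbeck process
\[
  dw_t = -H\,w_t\,dt + \sqrt{\eta}\,D^{1/2}\,dW_t ,
\]
% whose stationary weight covariance $C$ solves the Lyapunov equation
\[
  H C + C H^\top = \eta\,D .
\]
% Detailed balance holds iff $H D = D H$; when they fail to commute, $C$
% is not aligned with the Hessian eigenbasis, matching the deviation
% between noise covariance and Hessian described above.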

Uncertainty is a key feature of any machine learning model and is particularly important in neural networks, which tend to be overconfident. This overconfidence is worrying under distribution shifts, where the model performance silently degrades as the data distribution diverges from the training data distribution. Uncertainty estimation offers a solution to overconfident models, communicating when the output should (not) be trusted. Although methods for uncertainty estimation have been developed, they have not been explicitly linked to the field of explainable artificial intelligence (XAI). Furthermore, literature in operations research ignores the actionability component of uncertainty estimation and does not consider distribution shifts. This work proposes a general uncertainty framework, with contributions being threefold: (i) uncertainty estimation in ML models is positioned as an XAI technique, giving local and model-specific explanations; (ii) classification with rejection is used to reduce misclassifications by bringing a human expert in the loop for uncertain observations; (iii) the framework is applied to a case study on neural networks in educational data mining subject to distribution shifts. Uncertainty as XAI improves the model's trustworthiness in downstream decision-making tasks, giving rise to more actionable and robust machine learning systems in operations research.
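
A minimal sketch of ingredient (ii), classification with rejection, with an illustrative uncertainty measure and threshold (the framework itself is agnostic to both):

import numpy as np

# Defer predictions whose predictive entropy exceeds a threshold to a
# human expert; -1 marks "rejected / defer to expert".
def predict_with_rejection(proba, threshold=0.5):
    # proba: (n_samples, n_classes) predicted class probabilities
    entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)
    labels = proba.argmax(axis=1)
    return np.where(entropy > threshold, -1, labels)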

In many experimental contexts, whether and how network interactions impact the outcome of interest for both treated and untreated individuals are key concerns. Network data are often assumed to perfectly represent these possible interactions. This paper considers the problem of estimating treatment effects when the measured connections are, instead, a noisy representation of the true spillover pathways. We show that existing methods, using the potential outcomes framework, yield biased estimators in the presence of this mismeasurement. We develop a new method, using a class of mixture models, that can account for missing connections, and discuss its estimation via the Expectation-Maximization algorithm. We assess our method's performance by simulating experiments on real network data from 43 villages in India. Finally, we use data from a previously published study to show that estimates obtained with our method are more robust to the choice of network measure.
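
A toy simulation (not the paper's mixture model) illustrates why mismeasured connections bias spillover estimates: when some true links go unrecorded, truly exposed units contaminate the "unexposed" group and attenuate a naive difference in means.

import numpy as np

rng = np.random.default_rng(3)
n = 2000
treated_friend = rng.random(n) < 0.5           # true exposure to a treated peer
y = 1.0 * treated_friend + rng.normal(size=n)  # true spillover effect = 1.0
observed = np.where(rng.random(n) < 0.3,       # 30% of links go unrecorded
                    False, treated_friend)
naive = y[observed].mean() - y[~observed].mean()
print("naive spillover estimate:", naive)      # attenuated below 1.0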

Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from the large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that first uses a 3D FCN to roughly define a candidate region, which is then used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to ~10% and allows it to focus on more detailed segmentation of the organs and vessels. We utilize training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital that includes 150 CT scans, targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5% to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve a significantly higher performance in small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download: https://github.com/holgerroth/3Dunet_abdomen_cascade.
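
Schematically, the cascade logic reduces to the following sketch (stage1 and stage2 stand for the trained coarse and fine 3D FCNs and are assumed here, as are the threshold and margin):

import numpy as np

# Stage 1 segments the full volume coarsely; its foreground defines a
# bounding box, and stage 2 runs only inside that crop (~10% of voxels).
def cascade_segment(volume, stage1, stage2, margin=8):
    coarse = stage1(volume) > 0.5                  # coarse foreground mask
    idx = np.argwhere(coarse)                      # assumes some foreground found
    lo = np.maximum(idx.min(axis=0) - margin, 0)
    hi = np.minimum(idx.max(axis=0) + margin, np.array(volume.shape))
    crop = volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    fine = np.zeros(volume.shape)
    fine[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = stage2(crop)
    return fine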
