
Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function. In this paper, we show a number of additional theoretical properties of GFlowNets. They can be used to estimate joint probability distributions and the corresponding marginal distributions where some variables are unspecified and, of particular interest, can represent distributions over composite objects like sets and graphs. GFlowNets amortize the work typically done by computationally expensive MCMC methods in a single but trained generative pass. They could also be used to estimate partition functions and free energies, conditional probabilities of supersets (supergraphs) given a subset (subgraph), as well as marginal distributions over all supersets (supergraphs) of a given set (graph). We introduce variations enabling the estimation of entropy and mutual information, sampling from a Pareto frontier, connections to reward-maximizing policies, and extensions to stochastic environments, continuous actions and modular energy functions.
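The sketch below is a minimal, self-contained illustration of the core idea, not the paper's implementation: a tabular GFlowNet is trained with the trajectory-balance objective to sample subsets of a small item set with probability roughly proportional to a reward. The item set, reward function, hyperparameters, and all variable names are assumptions made for this toy example.

```python
# Minimal toy sketch (not the paper's code): a tabular GFlowNet trained with the
# trajectory-balance objective on subsets of {0, 1, 2}, so that terminal subsets
# are sampled roughly in proportion to the reward R.
import itertools
import math
import torch

items = [0, 1, 2]
states = [frozenset(c) for r in range(len(items) + 1)
          for c in itertools.combinations(items, r)]
s2i = {s: i for i, s in enumerate(states)}

def R(s):                                   # arbitrary positive reward on terminal subsets
    return 1.0 + 2.0 * len(s)

# One forward-policy logit per "add item" action plus one for "stop"; log Z is learned too.
logits = torch.zeros(len(states), len(items) + 1, requires_grad=True)
log_Z = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([logits, log_Z], lr=0.05)

def action_mask(s):                         # only items not yet in the set may be added
    m = torch.full((len(items) + 1,), float("-inf"))
    for a in items:
        if a not in s:
            m[a] = 0.0
    m[-1] = 0.0                             # "stop" is always allowed
    return m

for step in range(3000):
    s, log_pf = frozenset(), torch.zeros(1)
    while True:
        logp = torch.log_softmax(logits[s2i[s]] + action_mask(s), dim=0)
        a = torch.distributions.Categorical(logits=logp).sample()
        log_pf = log_pf + logp[a]
        if a.item() == len(items):          # "stop": s is the terminal object
            break
        s = s | {a.item()}
    log_pb = -math.lgamma(len(s) + 1)       # uniform backward policy: P_B(tau | s) = 1 / |s|!
    # Trajectory balance: log Z + log P_F(tau) = log R(s) + log P_B(tau | s).
    loss = (log_Z + log_pf - math.log(R(s)) - log_pb) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, repeatedly rolling out the forward policy yields subsets whose empirical frequencies approximate R(s)/Z, which is the amortized sampling behaviour described above.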

Related content

We propose a simple dynamic model for estimating the relative contagiousness of two virus variants. Maximum likelihood estimation and inference are conveniently invariant to variation in the total number of cases over the sample period and can be expressed as a logistic regression. We apply the model to Danish SARS-CoV-2 variant data. We estimate the reproduction numbers of Alpha and Delta to be larger than that of the ancestral variant by factors of 1.51 [95% CI: 1.50, 1.53] and 3.28 [95% CI: 3.01, 3.58], respectively. In a predominantly vaccinated population, we estimate Omicron to be 3.15 [95% CI: 2.83, 3.50] times more infectious than Delta. Forecasting the proportion of an emerging virus variant is straightforward, and we proceed to show how the effective reproduction number for a new variant can be estimated without contemporary sequencing results. This is useful for assessing the state of the pandemic in real time, as we illustrate empirically with the inferred effective reproduction number for the Alpha variant.
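As a hedged illustration of the logistic-regression formulation, the sketch below fits a binomial GLM with a logit link to synthetic daily variant counts. The data, the 5-day generation time, and the conversion from the fitted log-odds slope to an approximate reproduction-number ratio are all assumptions made for this example, not the paper's estimates.

```python
# Illustrative sketch (not the paper's code): the share of a new variant among
# sequenced cases is modelled as binomial with a logit-linear trend in time, and
# the MLE is obtained as a logistic regression of variant counts on calendar day.
import numpy as np
import statsmodels.api as sm

days = np.arange(60)
n_sequenced = np.full_like(days, 200)                  # sequenced cases per day (synthetic)
true_slope = 0.10                                      # daily log-odds growth advantage
p_variant = 1.0 / (1.0 + np.exp(-(-4.0 + true_slope * days)))
rng = np.random.default_rng(0)
k_variant = rng.binomial(n_sequenced, p_variant)       # daily counts of the new variant

# Binomial GLM with logit link: regress (variant, non-variant) counts on time.
X = sm.add_constant(days.astype(float))
fit = sm.GLM(np.column_stack([k_variant, n_sequenced - k_variant]),
             X, family=sm.families.Binomial()).fit()
slope = fit.params[1]                                  # estimated growth advantage per day

generation_time = 5.0                                  # assumed generation time, in days
rel_R = np.exp(slope * generation_time)                # rough R ratio under exponential growth
print(f"daily log-odds advantage: {slope:.3f}, approx. R ratio: {rel_R:.2f}")
```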

Missing values in covariates due to censoring by signal interference or lack of sensitivity in the measuring devices are common in industrial problems. We propose a full Bayesian solution to the prediction problem with an efficient Markov Chain Monte Carlo (MCMC) algorithm that updates all the censored covariate values jointly in a random scan Gibbs sampler. We show that the joint updating of missing covariate values can be at least two orders of magnitude more efficient than univariate updating. This increased efficiency is shown to be crucial for quickly learning the missing covariate values and their uncertainty in a real-time decision making context, in particular when there is substantial correlation in the posterior for the missing values. The approach is evaluated on simulated data and on data from the telecom sector. Our results show that the proposed Bayesian imputation gives substantially more accurate predictions than naïve imputation, and that the use of auxiliary variables in the imputation gives additional predictive power.
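The toy sketch below illustrates why joint updating matters when the posterior over missing values is strongly correlated. It uses a bivariate normal stand-in rather than the paper's censored-covariate model, and the correlation and iteration counts are arbitrary choices for the example.

```python
# Minimal sketch of the efficiency point (our toy, not the paper's model): with a
# highly correlated posterior over two missing values, one-at-a-time Gibbs updates
# mix slowly, while a joint update produces independent draws.
import numpy as np

rng = np.random.default_rng(1)
rho, n_iter = 0.99, 5000

# Univariate (coordinate-wise) Gibbs for a bivariate standard normal with correlation rho.
x = np.zeros((n_iter, 2))
for t in range(1, n_iter):
    x[t, 0] = rng.normal(rho * x[t - 1, 1], np.sqrt(1 - rho**2))
    x[t, 1] = rng.normal(rho * x[t, 0],     np.sqrt(1 - rho**2))

# Joint update: exact draws from the bivariate normal, independent across iterations.
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
y = rng.standard_normal((n_iter, 2)) @ L.T

def lag1_autocorr(z):
    z = z - z.mean()
    return float(np.sum(z[1:] * z[:-1]) / np.sum(z * z))

print("lag-1 autocorrelation, univariate Gibbs:", round(lag1_autocorr(x[:, 0]), 3))
print("lag-1 autocorrelation, joint update:    ", round(lag1_autocorr(y[:, 0]), 3))
```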

Proteins can exhibit dynamic structural flexibility as they carry out their functions, especially in binding regions that interact with other molecules. For the key SARS-CoV-2 spike protein that facilitates COVID-19 infection, studies have previously identified several such highly flexible regions with therapeutic importance. However, protein structures available from the Protein Data Bank are presented as static snapshots that may not adequately depict this flexibility, and furthermore these cannot keep pace with new mutations and variants. In this paper we present a sequential Monte Carlo method for broadly sampling the 3-D conformational space of protein structure, according to the Boltzmann distribution of a given energy function. Our approach is distinct from previous sampling methods that focus on finding the lowest-energy conformation for predicting a single stable structure. We exemplify our method on the SARS-CoV-2 omicron variant as an application of timely interest. Our results identify sequence positions 495-508 as a key region where omicron mutations have the most impact on the space of possible conformations, which coincides with the findings of other preliminary studies on the binding properties of the omicron variant.
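The following schematic sketch conveys the sequential Monte Carlo idea of growing a conformation position by position while reweighting by Boltzmann factors. The toy energy function, uniform proposal, and particle counts are our assumptions and stand in for the paper's actual energy model.

```python
# Schematic SMC sketch (our toy energy, not the paper's force field): grow a chain
# of torsion angles one position at a time, weight particles by the Boltzmann
# factor of the newly added energy term, and resample when the ESS drops.
import numpy as np

rng = np.random.default_rng(2)
n_particles, n_positions, beta = 500, 20, 1.0

def local_energy(prev_angle, angle):
    # Toy pairwise energy favouring smooth chains; a placeholder for a real energy function.
    return np.sin(angle - prev_angle) ** 2

angles = np.zeros((n_particles, n_positions))
log_w = np.zeros(n_particles)

for j in range(1, n_positions):
    proposal = rng.uniform(-np.pi, np.pi, size=n_particles)    # uniform proposal per position
    angles[:, j] = proposal
    log_w += -beta * local_energy(angles[:, j - 1], proposal)  # incremental Boltzmann weight
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    if 1.0 / np.sum(w ** 2) < n_particles / 2:                 # ESS-triggered resampling
        idx = rng.choice(n_particles, size=n_particles, p=w)
        angles, log_w = angles[idx], np.zeros(n_particles)

# The weighted particle set now approximates the Boltzmann distribution over chains.
```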


//www.zhuanzhi.ai/paper/e6adb2c4a1bfad46e20524d5b2d10ef7


This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
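As a small numerical illustration of the criticality point mentioned above (our example, not taken from the book), the sketch below shows that ReLU layers initialized with weight variance 2/width keep the forward signal scale roughly constant with depth, whereas an off-critical variance makes it vanish.

```python
# Numerical illustration of tuning to criticality (our example): with ReLU layers,
# weight variance 2 / fan_in keeps the typical activation norm stable across depth,
# while a smaller variance makes the signal decay layer by layer.
import numpy as np

rng = np.random.default_rng(3)
width, depth = 512, 50

def final_norm(weight_var_times_width):
    x = rng.standard_normal(width)
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(weight_var_times_width / width), size=(width, width))
        x = np.maximum(W @ x, 0.0)          # ReLU layer
    return np.linalg.norm(x)

print("critical   (var = 2/width):", round(final_norm(2.0), 3))
print("sub-critical (var = 1/width):", final_norm(1.0))
```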

Graph neural networks (GNNs) are typically applied to static graphs that are assumed to be known upfront. This static input structure is often informed purely by insight of the machine learning practitioner, and might not be optimal for the actual task the GNN is solving. In absence of reliable domain expertise, one might resort to inferring the latent graph structure, which is often difficult due to the vast search space of possible graphs. Here we introduce Pointer Graph Networks (PGNs) which augment sets or graphs with additional inferred edges for improved model generalisation ability. PGNs allow each node to dynamically point to another node, followed by message passing over these pointers. The sparsity of this adaptable graph structure makes learning tractable while still being sufficiently expressive to simulate complex algorithms. Critically, the pointing mechanism is directly supervised to model long-term sequences of operations on classical data structures, incorporating useful structural inductive biases from theoretical computer science. Qualitatively, we demonstrate that PGNs can learn parallelisable variants of pointer-based data structures, namely disjoint set unions and link/cut trees. PGNs generalise out-of-distribution to 5x larger test inputs on dynamic graph connectivity tasks, outperforming unrestricted GNNs and Deep Sets.
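A heavily simplified sketch of the pointing idea follows. It reduces the architecture to a single attention-based pointer step plus one round of message passing over the inferred pointer edges, with randomly initialized toy weights, and should not be read as the paper's full PGN.

```python
# Simplified sketch (our reduction, not the paper's architecture): each node attends
# over all nodes, "points" to its argmax, and one round of message passing then runs
# over the resulting sparse pointer edges.
import numpy as np

rng = np.random.default_rng(4)
n_nodes, d = 6, 8
h = rng.standard_normal((n_nodes, d))                   # node embeddings
Wq, Wk = rng.standard_normal((d, d)), rng.standard_normal((d, d))

scores = (h @ Wq) @ (h @ Wk).T / np.sqrt(d)             # pairwise pointer scores
pointers = scores.argmax(axis=1)                        # each node points to exactly one node

# Message passing over pointer edges: each node aggregates its pointee's features.
W_msg = rng.standard_normal((d, d))
messages = h[pointers] @ W_msg
h_new = np.tanh(h + messages)
print("pointer targets per node:", pointers)
```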

Learning low-dimensional representations for entities and relations in knowledge graphs using contrastive estimation represents a scalable and effective method for inferring connectivity patterns. A crucial aspect of contrastive learning approaches is the choice of corruption distribution that generates hard negative samples, which force the embedding model to learn discriminative representations and find critical characteristics of observed data. While earlier methods either employ overly simple corruption distributions (i.e., uniform), yielding easy, uninformative negatives, or use sophisticated adversarial distributions with challenging optimization schemes, they do not explicitly incorporate known graph structure, resulting in suboptimal negatives. In this paper, we propose Structure Aware Negative Sampling (SANS), an inexpensive negative sampling strategy that utilizes the rich graph structure by selecting negative samples from a node's k-hop neighborhood. Empirically, we demonstrate that SANS finds high-quality negatives that are highly competitive with SOTA methods, and requires no additional parameters nor difficult adversarial optimization.
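The sketch below shows one straightforward way to realize structure-aware corruption as described: replace a triple's tail with an entity drawn from the head's k-hop neighbourhood. The toy adjacency, the fallback to uniform corruption, and the helper names are assumptions for illustration.

```python
# Minimal sketch of structure-aware negative sampling (our simplification of the
# SANS idea): corrupt a triple's tail with an entity from the head's k-hop
# neighbourhood instead of a uniformly random entity.
import random
from collections import deque

def k_hop_neighbourhood(adj, source, k):
    """Nodes reachable from `source` within k hops (excluding source)."""
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen - {source}

def sans_negative(triple, adj, k=2):
    head, rel, tail = triple
    candidates = k_hop_neighbourhood(adj, head, k) - {tail}
    if not candidates:                      # fall back to uniform corruption if needed
        candidates = set(adj) - {head, tail}
    return (head, rel, random.choice(sorted(candidates)))

# Toy undirected adjacency over entities a..e and one observed triple.
adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"], "e": ["d"]}
print(sans_negative(("a", "likes", "b"), adj, k=2))
```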

Social network analysis provides meaningful information about the behavior of network members that can be used for diverse applications such as classification and link prediction. However, network analysis is computationally expensive because of the feature learning required for different applications. In recent years, much research has focused on feature learning methods in social networks. Network embedding represents the network in a lower-dimensional representation space while preserving its properties, yielding a compressed representation of the network. In this paper, we introduce a novel algorithm named "CARE" for network embedding that can be used for different types of networks, including weighted, directed and complex ones. Current methods try to preserve the local neighborhood information of nodes, whereas the proposed method utilizes the local neighborhood and community information of network nodes to cover both the local and global structure of social networks. CARE builds customized paths, which consist of the local and global structure of network nodes, as a basis for network embedding and uses the Skip-gram model to learn representation vectors of nodes. Subsequently, stochastic gradient descent is applied to optimize our objective function and learn the final representation of nodes. Our method can scale when new nodes are appended to the network, without information loss. Parallelized generation of customized random walks is also used to speed up CARE. We evaluate the performance of CARE on multi-label classification and link prediction tasks. Experimental results on various networks indicate that the proposed method outperforms others in both Micro- and Macro-F1 measures for different sizes of training data.
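A minimal sketch of the customized-walk idea appears below: a walk that usually follows edges but occasionally jumps within the walker's community, so the resulting node sequences mix local and global structure before being fed to a Skip-gram model. The jump probability, toy graph, and community labels are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative sketch (our simplification): a random walk that mostly follows graph
# edges but, with probability p_comm, jumps to another node in the same community.
# The resulting walks would then be fed to a Skip-gram model to learn embeddings.
import random

def care_style_walk(adj, community, start, length=10, p_comm=0.2, seed=0):
    rng = random.Random(seed)
    walk, node = [start], start
    for _ in range(length - 1):
        same_comm = [v for v in adj if community[v] == community[node] and v != node]
        if same_comm and rng.random() < p_comm:
            node = rng.choice(same_comm)           # community jump (global structure)
        else:
            node = rng.choice(adj[node])           # ordinary neighbour step (local structure)
        walk.append(node)
    return walk

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
print(care_style_walk(adj, community, start=0))
```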

Discrete random structures are important tools in Bayesian nonparametrics and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and, then, normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov Chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
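One way to read the construction described above, written in our own notation (a schematic rendering, not necessarily the paper's exact parameterisation), is that each group's random probability measure is the normalisation of a shared completely random measure plus an independent group-specific one:

```latex
% Schematic rendering (our notation) of "adding common and group-specific
% completely random measures and then normalising".
\[
  \tilde p_\ell \;=\; \frac{\mu_0 + \mu_\ell}{\mu_0(\Theta) + \mu_\ell(\Theta)},
  \qquad \ell = 1, \dots, d,
\]
```

Here $\mu_0, \mu_1, \dots, \mu_d$ are independent completely random measures on the space $\Theta$: the shared component $\mu_0$ induces dependence across groups, while each $\mu_\ell$ captures group-specific heterogeneity.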

We introduce a new neural architecture to learn the conditional probability of an output sequence with elements that are discrete tokens corresponding to positions in an input sequence. Such problems cannot be trivially addressed by existent approaches such as sequence-to-sequence and Neural Turing Machines, because the number of target classes in each step of the output depends on the length of the input, which is variable. Problems such as sorting variable sized sequences, and various combinatorial optimization problems belong to this class. Our model solves the problem of variable size output dictionaries using a recently proposed mechanism of neural attention. It differs from the previous attention attempts in that, instead of using attention to blend hidden units of an encoder to a context vector at each decoder step, it uses attention as a pointer to select a member of the input sequence as the output. We call this architecture a Pointer Net (Ptr-Net). We show Ptr-Nets can be used to learn approximate solutions to three challenging geometric problems -- finding planar convex hulls, computing Delaunay triangulations, and the planar Travelling Salesman Problem -- using training examples alone. Ptr-Nets not only improve over sequence-to-sequence with input attention, but also allow us to generalize to variable size output dictionaries. We show that the learnt models generalize beyond the maximum lengths they were trained on. We hope our results on these tasks will encourage a broader exploration of neural learning for discrete problems.
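The attention step that distinguishes a Ptr-Net can be sketched in a few lines. The code below uses random toy weights and the additive scoring form u_j = v^T tanh(W1 e_j + W2 d_i) to produce a distribution directly over input positions; shapes and initialization are illustrative assumptions.

```python
# Minimal sketch of the pointer attention step (toy shapes, not the full Ptr-Net):
# the attention scores themselves become a distribution over input positions, and
# the selected position is the output token.
import numpy as np

rng = np.random.default_rng(5)
seq_len, d = 7, 16
enc = rng.standard_normal((seq_len, d))        # encoder hidden states e_1..e_n
dec = rng.standard_normal(d)                   # current decoder hidden state d_i

W1, W2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
v = rng.standard_normal(d)

# u_j = v^T tanh(W1 e_j + W2 d_i); softmax over j gives p(output = position j).
u = np.tanh(enc @ W1.T + dec @ W2.T) @ v
p = np.exp(u - u.max())
p /= p.sum()
print("pointer distribution over input positions:", np.round(p, 3))
print("selected position:", int(p.argmax()))
```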
