国产一区二区高清无码_日韩精品大片一区二区三区四区_经典三级一区二区三区视频_国产又爽又黄又硬视频免费观看_欧美精品国产一区二区_日韩伦理中文字幕_欧美三级片视频在线播放

Key to effective generic, or "black-box", variational inference is the selection of an approximation to the target density that balances accuracy and calibration speed. Copula models are promising options, but calibration of the approximation can be slow for some choices. Smith et al. (2020) suggest using "implicit copula" models that are formed by element-wise transformation of the target parameters. We show here why these are a tractable and scalable choice, and propose adjustments to increase their accuracy. We also show how a sub-class of elliptical copulas have a generative representation that allows easy application of the re-parameterization trick and efficient first order optimization methods. We demonstrate the estimation methodology using two statistical models as examples. The first is a mixed effects logistic regression, and the second is a regularized correlation matrix. For the latter, standard Markov chain Monte Carlo estimation methods can be slow or difficult to implement, yet our proposed variational approach provides an effective and scalable estimator. We illustrate by estimating a regularized Gaussian copula model for income inequality in U.S. states between 1917 and 2018.

相關內容

估計/估計量

關注 3

MCMC · 推斷 · Processing（編程語言） · 向量空間 · 視覺識別系統 ·

2022 年 1 月 19 日

Stein Variational Gaussian Processes

Thomas Pinder,Christopher Nemeth,David Leslie

from arxiv, 26 pages, 5 figures

We show how to use Stein variational gradient descent (SVGD) to carry out inference in Gaussian process (GP) models with non-Gaussian likelihoods and large data volumes. Markov chain Monte Carlo (MCMC) is extremely computationally intensive for these situations, but the parametric assumptions required for efficient variational inference (VI) result in incorrect inference when they encounter the multi-modal posterior distributions that are common for such models. SVGD provides a non-parametric alternative to variational inference which is substantially faster than MCMC. We prove that for GP models with Lipschitz gradients the SVGD algorithm monotonically decreases the Kullback-Leibler divergence from the sampling distribution to the true posterior. Our method is demonstrated on benchmark problems in both regression and classification, a multimodal posterior, and an air quality example with 550,134 spatiotemporal observations, showing substantial performance improvements over MCMC and VI.

推斷 · 馬爾可夫鏈蒙特卡羅 · 馬爾可夫鏈 · Performer · Better ·

2022 年 1 月 16 日

Constrained inference through posterior projections

Deborshee Sen,Sayan Patra,David Dunson

Bayesian approaches are appealing for constrained inference problems by allowing a probabilistic characterization of uncertainty, while providing a computational machinery for incorporating complex constraints in hierarchical models. However, the usual Bayesian strategy of placing a prior on the constrained space and conducting posterior computation with Markov chain Monte Carlo algorithms is often intractable. An alternative is to conduct inference for a less constrained posterior and project samples to the constrained space through a minimal distance mapping. We formalize and provide a unifying framework for such posterior projections. For theoretical tractability, we initially focus on constrained parameter spaces corresponding to closed and convex subsets of the original space. We then consider non-convex Stiefel manifolds. We provide a general formulation of projected posteriors in a Bayesian decision-theoretic framework. We show that asymptotic properties of the unconstrained posterior are transferred to the projected posterior, leading to asymptotically correct credible intervals. We demonstrate numerically that projected posteriors can have better performance that competitor approaches in real data examples.

Networking · 級聯 · 推斷 · 情景 · 極大 ·

2021 年 6 月 7 日

Network Inference and Influence Maximization from Samples

Wei Chen,Xiaoming Sun,Jialin Zhang,Zhijie Zhang

from arxiv, Accepted by ICML 2021

Influence maximization is the task of selecting a small number of seed nodes in a social network to maximize the spread of the influence from these seeds, and it has been widely investigated in the past two decades. In the canonical setting, the whole social network as well as its diffusion parameters is given as input. In this paper, we consider the more realistic sampling setting where the network is unknown and we only have a set of passively observed cascades that record the set of activated nodes at each diffusion step. We study the task of influence maximization from these cascade samples (IMS), and present constant approximation algorithms for this task under mild conditions on the seed set distribution. To achieve the optimization goal, we also provide a novel solution to the network inference problem, that is, learning diffusion parameters and the network structure from the cascade data. Comparing with prior solutions, our network inference algorithm requires weaker assumptions and does not rely on maximum-likelihood estimation and convex programming. Our IMS algorithms enhance the learning-and-then-optimization approach by allowing a constant approximation ratio even when the diffusion parameters are hard to learn, and we do not need any assumption related to the network structure or diffusion parameters.

貝葉斯推斷 · 正則化項 · 推斷 · 正則化 · 學成 ·

2020 年 12 月 3 日

Semi-Supervised Learning with Variational Bayesian Inference and Maximum Uncertainty Regularization

Kien Do,Truyen Tran,Svetha Venkatesh

from arxiv, Accepted to AAAI 2021

We propose two generic methods for improving semi-supervised learning (SSL). The first integrates weight perturbation (WP) into existing "consistency regularization" (CR) based methods. We implement WP by leveraging variational Bayesian inference (VBI). The second method proposes a novel consistency loss called "maximum uncertainty regularization" (MUR). While most consistency losses act on perturbations in the vicinity of each data point, MUR actively searches for "virtual" points situated beyond this region that cause the most uncertain class predictions. This allows MUR to impose smoothness on a wider area in the input-output manifold. Our experiments show clear improvements in classification errors of various CR based methods when they are combined with VBI or MUR or both.

環 · MAML · 優化器 · 學成 · 小樣本學習 ·

2019 年 9 月 10 日

Meta-Learning with Implicit Gradients

Aravind Rajeswaran,Chelsea Finn,Sham Kakade,Sergey Levine

from arxiv, NeurIPS 2019. First two authors contributed equally

A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient (or optimization) based meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner-loop, by using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner level optimization and not the path taken by the inner loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner loop optimizer. As a result, our approach is agnostic to the choice of inner loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that which is required to compute a single inner loop gradient and at no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks.

似然 · 估計/估計量 · 最大似然估計 · 極大似然 · MoDELS ·

2018 年 9 月 24 日

Implicit Maximum Likelihood Estimation

Ke Li,Jitendra Malik

from arxiv, 21 pages, 4 figures. In the interest of promoting discussion, we make the reviews available at //people.eecs.berkeley.edu/~ke.li/papers/imle_reviews.pdf

Implicit probabilistic models are models defined naturally in terms of a sampling procedure and often induces a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.

概率圖模型 · 圖 · 推斷 · GM · 信念傳播 ·

2018 年 5 月 25 日

Inference in Probabilistic Graphical Models by Graph Neural Networks

KiJung Yoon,Renjie Liao,Yuwen Xiong,Lisa Zhang,Ethan Fetaya,Raquel Urtasun,Richard Zemel,Xaq Pitkow

A fundamental computation for statistical inference and accurate decision-making is to compute the marginal probabilities or most probable states of task-relevant variables. Probabilistic graphical models can efficiently represent the structure of such complex data, but performing these inferences is generally difficult. Message-passing algorithms, such as belief propagation, are a natural way to disseminate evidence amongst correlated variables while exploiting the graph structure, but these algorithms can struggle when the conditional dependency graphs contain loops. Here we use Graph Neural Networks (GNNs) to learn a message-passing algorithm that solves these inference tasks. We first show that the architecture of GNNs is well-matched to inference tasks. We then demonstrate the efficacy of this inference approach by training GNNs on a collection of graphical models and showing that they substantially outperform belief propagation on loopy graphs. Our message-passing algorithms generalize out of the training set to larger graphs and graphs with different structure.

entity · 推斷 · Performer · 向量空間 · Pair ·

2018 年 3 月 17 日

Variational Knowledge Graph Reasoning

Wenhu Chen,Wenhan Xiong,Xifeng Yan,William Wang

from arxiv, Accepted to NAACL 2018

Inferring missing links in knowledge graphs (KG) has attracted a lot of attention from the research community. In this paper, we tackle a practical query answering task involving predicting the relation of a given entity pair. We frame this prediction problem as an inference problem in a probabilistic graphical model and aim at resolving it from a variational inference perspective. In order to model the relation between the query entity pair, we assume that there exist underlying latent variables (assemble of all paths connecting these two nodes) in the KG, which carries the equivalent semantics of their relation. However, due to the intractability of connections in large KGs, we propose to use variation inference to maximize the evidence lower bound. More specifically, our framework (\textsc{Diva}) is composed of three modules, i.e. a posterior approximator, a prior (path finder), and a likelihood (path reasoner). By using variational inference, we are able to incorporate them closely into a unified architecture and jointly optimize them to perform KG reasoning. With active interactions among these sub-modules, \textsc{Diva} is better at handling noise and cope with more complex reasoning scenarios. In order to evaluate our method, we conduct the experiment of the link prediction task on NELL-995 and FB15K datasets and achieve state-of-the-art performances on both datasets.

推斷 · 近似 · 近似推斷 · 自編碼器 · 變分自編碼 ·

2018 年 1 月 10 日

Inference Suboptimality in Variational Autoencoders

Chris Cremer,Xuechen Li,David Duvenaud

Amortized inference has led to efficient approximate inference for large datasets. The quality of posterior inference is largely determined by two factors: a) the ability of the variational distribution to model the true posterior and b) the capacity of the recognition network to generalize inference over all datapoints. We analyze approximate inference in variational autoencoders in terms of these factors. We find that suboptimal inference is often due to amortizing inference rather than the limited complexity of the approximating distribution. We show that this is due partly to the generator learning to accommodate the choice of approximation. Furthermore, we show that the parameters used to increase the expressiveness of the approximation play a role in generalizing inference rather than simply improving the complexity of the approximation.