
Stein discrepancies have emerged as a powerful tool for retrospective improvement of Markov chain Monte Carlo output. However, the question of how to design Markov chains that are well-suited to such post-processing has yet to be addressed. This paper studies Stein importance sampling, in which weights are assigned to the states visited by a $\Pi$-invariant Markov chain to obtain a consistent approximation of $P$, the intended target. Surprisingly, the optimal choice of $\Pi$ is not identical to the target $P$; we therefore propose an explicit construction for $\Pi$ based on a novel variational argument. Explicit conditions for convergence of Stein $\Pi$-Importance Sampling are established. For $\approx 70\%$ of tasks in the PosteriorDB benchmark, a significant improvement over the analogous post-processing of $P$-invariant Markov chains is reported.

Related content

Divergence · CASES · Sample · Learning · Identifiable

July 3, 2023

Improved sampling via learned diffusions

Lorenz Richter,Julius Berner,Guan-Horng Liu

from arxiv, Accepted at ICML 2023 Workshop on New Frontiers in Learning, Control, and Dynamical Systems

Recently, a series of papers proposed deep learning-based approaches to sample from unnormalized target densities using controlled diffusion processes. In this work, we identify these approaches as special cases of the Schrödinger bridge problem, seeking the most likely stochastic evolution between a given prior distribution and the specified target. We further generalize this framework by introducing a variational formulation based on divergences between path space measures of time-reversed diffusion processes. This abstract perspective leads to practical losses that can be optimized by gradient-based algorithms and includes previous objectives as special cases. At the same time, it allows us to consider divergences other than the reverse Kullback-Leibler divergence that is known to suffer from mode collapse. In particular, we propose the so-called log-variance loss, which exhibits favorable numerical properties and leads to significantly improved performance across all considered approaches.
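
As a rough illustration of the log-variance loss named in the abstract above, the sketch below computes the sample variance of log-density ratios over a batch. The helper name `log_variance_loss` and the toy numbers are assumptions made for illustration, not the authors' implementation; in the paper's setting the log ratios would come from path space measures of controlled diffusions rather than a hard-coded list.

```python
from statistics import fmean


def log_variance_loss(log_ratios):
    """Unbiased sample variance of a batch of log-density ratios.

    Hypothetical helper: each entry stands for log(dP/dQ) evaluated
    on one sampled trajectory.
    """
    n = len(log_ratios)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = fmean(log_ratios)
    return sum((r - mean) ** 2 for r in log_ratios) / (n - 1)


# Toy batch of log ratios (made-up numbers).
log_ratios = [0.3, -1.2, 0.8, 0.1]

# Adding a constant, such as an unknown log normalizing constant,
# does not change the variance.
shifted = [r + 5.0 for r in log_ratios]

loss = log_variance_loss(log_ratios)
loss_shifted = log_variance_loss(shifted)
```

Because a variance ignores additive constants, `loss` and `loss_shifted` agree up to floating-point error, which is one reason a loss of this form can be evaluated for unnormalized targets.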

Discretization · Approximation · Noise · Processing (programming language) · Subspace

July 3, 2023

Stochastic Transport with Lévy Noise -- Fully Discrete Numerical Approximation

Andrea Barth,Andreas Stein

Semilinear hyperbolic stochastic partial differential equations (SPDEs) find widespread applications in the natural and engineering sciences. However, the traditional Gaussian setting may prove too restrictive, as phenomena in mathematical finance, porous media, and pollution models often exhibit noise of a different nature. To capture temporal discontinuities and accommodate heavy-tailed distributions, Hilbert space-valued Lévy processes or Lévy fields are employed as driving noise terms. The numerical discretization of such SPDEs presents several challenges. The low regularity of the solution in space and time leads to slow convergence rates and instability in space/time discretization schemes. Furthermore, the Lévy process can take values in an infinite-dimensional Hilbert space, necessitating projections onto finite-dimensional subspaces at each discrete time point. Additionally, unbiased sampling from the resulting Lévy field may not be feasible. In this study, we introduce a novel fully discrete approximation scheme that tackles these difficulties. Our main contribution is a discontinuous Galerkin scheme for spatial approximation, derived naturally from the weak formulation of the SPDE. We establish optimal convergence properties for this approach and combine it with a suitable time stepping scheme to prevent numerical oscillations. Furthermore, we approximate the driving noise process using truncated Karhunen-Loève expansions. This approximation yields a sum of scaled and uncorrelated one-dimensional Lévy processes, which can be simulated with controlled bias using Fourier inversion techniques.

Weight · Episode · Learning · Continuity · CASES

July 3, 2023

Meta Adaptation using Importance Weighted Demonstrations

Kiran Lekkala,Sami Abu-El-Haija,Laurent Itti

Imitation learning has gained immense popularity because of its high sample-efficiency. However, in real-world scenarios, where the trajectory distribution of most of the tasks dynamically shifts, model fitting on continuously aggregated data alone would be futile. In some cases, the distribution shifts so much that it is difficult for an agent to infer the new task. We propose a novel algorithm to generalize on any related task by leveraging prior knowledge on a set of specific tasks, which involves assigning importance weights to each past demonstration. We show experiments where the robot is trained on a diverse set of environmental tasks and is also able to adapt to an unseen environment using few-shot learning. We also developed a prototype robot system to test our approach on the task of visual navigation, and the experimental results obtained were able to confirm these suppositions.

Likelihood · MoDELS · Correlation coefficient · Confidence · Identifiable

July 3, 2023

Profile likelihoods for parameters in Gaussian geostatistical models

Ruoyong Xu,Patrick Brown

from arxiv, 25 pages, more than 20 figures

Profile likelihoods are rarely used in geostatistical models due to the computational burden imposed by repeated decompositions of large variance matrices. Accounting for uncertainty in covariance parameters can be highly consequential in geostatistical models, as some covariance parameters are poorly identified; the problem is severe enough that the differentiability parameter of the Matérn correlation function is typically treated as fixed. The problem is compounded with anisotropic spatial models, as there are two additional parameters to consider. In this paper, we make the following contributions: (1) A methodology is created for profile likelihoods for Gaussian spatial models with the Matérn family of correlation functions, including anisotropic models. This methodology adopts a novel reparametrization for generation of representative points, and uses GPUs for parallel profile likelihood computation in the software implementation. (2) We show that the profile likelihood of the Matérn shape parameter is often quite flat but still identifiable; it can usually rule out very small values. (3) Simulation studies and applications to real data examples show that profile-based confidence intervals of covariance parameters and regression parameters have superior coverage to the traditional Wald-type confidence intervals.

Sample · Smoothing · Gibbs sampling · Rejection sampling · CASES

June 30, 2023

A Proximal Algorithm for Sampling

Jiaming Liang,Yongxin Chen

from arxiv, 25 pages

We study sampling problems associated with potentials that lack smoothness. The potentials can be either convex or non-convex. Departing from the standard smooth setting, the potentials are only assumed to be weakly smooth or non-smooth, or the summation of multiple such functions. We develop a sampling algorithm that resembles proximal algorithms in optimization for this challenging sampling task. Our algorithm is based on a special case of Gibbs sampling known as the alternating sampling framework (ASF). The key contribution of this work is a practical realization of the ASF based on rejection sampling for both non-convex and convex potentials that are not necessarily smooth. In almost all the cases of sampling considered in this work, our proximal sampling algorithm achieves better complexity than all existing methods.
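
To make the alternating sampling framework (ASF) in the abstract above concrete, here is a minimal one-dimensional sketch for the non-smooth potential f(x) = |x|. It alternates y | x ~ N(x, eta) with a draw of x | y from the density proportional to exp(-f(x) - (x - y)^2 / (2 * eta)), and realizes the second step with a naive rejection sampler. The proposal choice, the step size `eta`, and all function names are assumptions for illustration; the paper's rejection scheme is more refined than this one.

```python
import math
import random


def potential(x):
    # Non-smooth example potential f(x) = |x|; the target is proportional to exp(-|x|).
    return abs(x)


def sample_x_given_y(y, eta, rng):
    # Rejection sampling for the density proportional to
    # exp(-f(x) - (x - y)^2 / (2 * eta)).
    # Proposal: N(y, eta). Since f >= 0, exp(-f(x)) <= 1 is a valid
    # acceptance probability.
    sd = math.sqrt(eta)
    while True:
        x = rng.gauss(y, sd)
        if rng.random() < math.exp(-potential(x)):
            return x


def alternating_sampler(n_steps, eta=1.0, x0=0.0, seed=0):
    rng = random.Random(seed)
    sd = math.sqrt(eta)
    x, xs = x0, []
    for _ in range(n_steps):
        y = rng.gauss(x, sd)               # step 1: y | x ~ N(x, eta)
        x = sample_x_given_y(y, eta, rng)  # step 2: x | y by rejection sampling
        xs.append(x)
    return xs


samples = alternating_sampler(20000)
mean_x = sum(samples) / len(samples)
mean_abs = sum(abs(s) for s in samples) / len(samples)
```

The x-marginal of the joint density exp(-f(x) - (x - y)^2 / (2 * eta)) is proportional to exp(-f(x)), so for this toy potential the chain targets a standard Laplace distribution, whose mean is 0 and whose mean absolute value is 1. The naive proposal becomes inefficient when |y| is large, which is the kind of cost a more careful rejection scheme would aim to control.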

Sample · Optimizer · off-policy · Bandits · Vector space

June 30, 2023

Thompson sampling for improved exploration in GFlowNets

Jarrid Rector-Brooks,Kanika Madan,Moksh Jain,Maksym Korablyov,Cheng-Hao Liu,Sarath Chandar,Nikolay Malkin,Yoshua Bengio

from arxiv, Structured Probabilistic Inference and Generative Modeling (SPIGM) workshop @ ICML 2023

Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.
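
For readers unfamiliar with the bandit technique that the abstract above draws on, the following sketch shows classical Thompson sampling for a Bernoulli multi-armed bandit with Beta posteriors. It is not TS-GFN, which according to the abstract maintains an approximate posterior over policies; the arm probabilities and the function name are made up for illustration.

```python
import random


def thompson_bernoulli(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling; returns the pull count per arm."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [1] * k  # Beta(1, 1) prior on each arm
    failures = [1] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior.
        draws = [rng.betavariate(successes[a], failures[a]) for a in range(k)]
        # Act greedily with respect to the sampled rates.
        arm = max(range(k), key=lambda a: draws[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate posterior update for the pulled arm.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


# Hypothetical arm success probabilities; the last arm is the best one.
pulls = thompson_bernoulli([0.2, 0.5, 0.8], n_rounds=2000)
```

Sampling from the posterior rather than using its mean is what drives exploration: uncertain arms occasionally produce high draws and get tried, while well-understood poor arms are pulled less and less often.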

Weight normalization · Weight · Normalized · Robustness · Biased

June 30, 2023

Robust Implicit Regularization via Weight Normalization

Hung-Hsu Chou,Holger Rauhut,Rachel Ward

Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method towards a certain interpolating solution among the many. A by now established line of work has shown that (stochastic) gradient descent tends to have an implicit bias towards low rank and/or sparse solutions when used to train deep linear networks, explaining to some extent why overparameterized neural network models trained by gradient descent tend to have good generalization performance in practice. However, existing theory for square-loss objectives often requires very small initialization of the trainable weights, which is at odds with the larger scale at which weights are initialized in practice for faster convergence and better generalization performance. In this paper, we aim to close this gap by incorporating and analyzing gradient descent with weight normalization, where the weight vector is reparameterized in terms of polar coordinates, and gradient descent is applied to the polar coordinates. By analyzing key invariants of the gradient flow and using Łojasiewicz's Theorem, we show that weight normalization also has an implicit bias towards sparse solutions in the diagonal linear model, but that in contrast to plain gradient descent, weight normalization enables a robust bias that persists even if the weights are initialized at practically large scale. Experiments suggest that both convergence speed and robustness of the implicit bias are improved dramatically by using weight normalization in overparameterized diagonal linear network models.
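
The polar reparameterization mentioned in the abstract above can be sketched on a toy problem. Below, a weight vector is written as w = r * v / ||v||, and plain gradient descent is applied to the radius r and the direction parameter v. The quadratic loss, target vector, step size, and function names are assumptions chosen to keep the example small; this is not the diagonal linear model analyzed in the paper.

```python
import math


def loss_and_grad(w, target):
    # Toy loss 0.5 * ||w - target||^2 and its gradient with respect to w.
    diff = [wi - ti for wi, ti in zip(w, target)]
    return 0.5 * sum(d * d for d in diff), diff


def weight_norm_gd(target, r=1.0, v=(1.0, 0.0), lr=0.05, n_steps=500):
    """Gradient descent on (r, v) where w = r * v / ||v||."""
    v = list(v)
    for _ in range(n_steps):
        norm_v = math.sqrt(sum(vi * vi for vi in v))
        u = [vi / norm_v for vi in v]  # unit direction
        w = [r * ui for ui in u]
        _, g = loss_and_grad(w, target)
        g_dot_u = sum(gi * ui for gi, ui in zip(g, u))
        # Chain rule: dL/dr = g . u and dL/dv = (r / ||v||) * (g - (g . u) u).
        grad_r = g_dot_u
        grad_v = [(r / norm_v) * (gi - g_dot_u * ui) for gi, ui in zip(g, u)]
        r -= lr * grad_r
        v = [vi - lr * gvi for vi, gvi in zip(v, grad_v)]
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    return [r * vi / norm_v for vi in v]


target = [3.0, 4.0]
w_final = weight_norm_gd(target)
final_loss, _ = loss_and_grad(w_final, target)
```

The gradient with respect to v is orthogonal to v, so each update rotates the direction while ||v|| can only grow, and the radius r is learned separately. This decoupling of scale and direction is the structural feature that weight normalization introduces.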

Approximation · Estimation/Estimator · Analysis · Divergence · Sample

June 29, 2023

An Approximation Theory Framework for Measure-Transport Sampling Algorithms

Ricardo Baptista,Bamdad Hosseini,Nikola B. Kovachki,Youssef M. Marzouk,Amir Sagiv

This article presents a general approximation-theoretic framework to analyze measure transport algorithms for probabilistic modeling. A primary motivating application for such algorithms is sampling, a central task in statistical inference and generative modeling. We provide a priori error estimates in the continuum limit, i.e., when the measures (or their densities) are given, but when the transport map is discretized or approximated using a finite-dimensional function space. Our analysis relies on the regularity theory of transport maps and on classical approximation theory for high-dimensional functions. A third element of our analysis, which is of independent interest, is the development of new stability estimates that relate the distance between two maps to the distance (or divergence) between the pushforward measures they define. We present a series of applications of our framework, where quantitative convergence rates are obtained for practical problems using Wasserstein metrics, maximum mean discrepancy, and Kullback-Leibler divergence. Specialized rates for approximations of the popular triangular Knöthe-Rosenblatt maps are obtained, followed by numerical experiments that demonstrate and extend our theory.

Discretization · Performer · Representation learning · Learned · Vectorization

June 10, 2021

Cross-Modal Discrete Representation Learning

Alexander H. Liu,SouYoung Jin,Cheng-I Jeff Lai,Andrew Rouditchenko,Aude Oliva,James Glass

from arxiv, Preprint

Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector. In this work we present a self-supervised learning framework that is able to learn a representation that captures finer levels of granularity across different modalities such as concepts or events represented by visual objects or spoken words. Our framework relies on a discretized embedding space created via vector quantization that is shared across different modalities. Beyond the shared embedding space, we propose a Cross-Modal Code Matching objective that forces the representations from different views (modalities) to have a similar distribution over the discrete embedding space such that cross-modal objects/actions localization can be performed without direct supervision. In our experiments we show that the proposed discretized multi-modal fine-grained representation (e.g., pixel/word/frame) can complement high-level summary representations (e.g., video/sentence/waveform) for improved performance on cross-modal retrieval tasks. We also observe that the discretized representation uses individual clusters to represent the same semantic concept across modalities.

Estimation/Estimator · Graph · Graphics processing unit · Node · Neural Networks

May 21, 2019

Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks

Namyong Park,Andrey Kan,Xin Luna Dong,Tong Zhao,Christos Faloutsos

from arxiv, KDD 2019 Research Track

How can we estimate the importance of nodes in a knowledge graph (KG)? A KG is a multi-relational graph that has proven valuable for many tasks including question answering and semantic search. In this paper, we present GENI, a method for tackling the problem of estimating node importance in KGs, which enables several downstream applications such as item recommendation and resource allocation. While a number of approaches have been developed to address this problem for general graphs, they do not fully utilize the information available in KGs, or lack the flexibility needed to model complex relationships between entities and their importance. To address these limitations, we explore supervised machine learning algorithms. In particular, building upon recent advances in graph neural networks (GNNs), we develop GENI, a GNN-based method designed to deal with the distinctive challenges involved in predicting node importance in KGs. Our method aggregates importance scores, instead of aggregating node embeddings, via a predicate-aware attention mechanism and flexible centrality adjustment. In our evaluation of GENI and existing methods on predicting node importance in real-world KGs with different characteristics, GENI achieves 5-17% higher NDCG@100 than the state of the art.