Given a target distribution $\mu \propto e^{-\mathcal{H}}$ with Hamiltonian $\mathcal{H}$, we propose and analyze new Metropolis-Hastings sampling algorithms that target an alternative distribution $\mu^f_{1,\alpha,c} \propto e^{-\mathcal{H}^{f}_{1,\alpha,c}}$, where $\mathcal{H}^{f}_{1,\alpha,c}$ is a landscape-modified Hamiltonian which we introduce explicitly. The advantage of the Metropolis dynamics targeting $\mu^f_{1,\alpha,c}$ is a reduced critical height, governed by the threshold parameter $c$, the function $f$, and a penalty parameter $\alpha \geq 0$ that controls the state-dependent effect. First, we investigate the case of fixed $\alpha$: we propose a self-normalized estimator that corrects for the bias of sampling from the modified distribution, and prove asymptotic convergence results and a Chernoff-type bound for this estimator. Next, we consider the case of annealing the penalty parameter $\alpha$. We prove strong ergodicity and bounds on the total variation mixing time of the resulting non-homogeneous chain, subject to appropriate assumptions on the decay of $\alpha$. We illustrate the proposed algorithms by comparing their mixing times with those of the original Metropolis dynamics on statistical physics models, including the ferromagnetic Ising model on the hypercube or the complete graph and the $q$-state Potts model on the two-dimensional torus. In these cases, the mixing times of the classical Glauber dynamics are at least exponential in the system size, since the critical height grows at least linearly with the size, while the proposed annealing algorithm, with an appropriate choice of $f$, $c$, and annealing schedule on $\alpha$, mixes rapidly with at most polynomial dependence on the size. The crux of the proof is the observation that the reduced critical height can be bounded independently of the size, which gives rise to rapid mixing.
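The bias correction for fixed $\alpha$ can be illustrated with a minimal Python sketch of a self-normalized estimator. Everything concrete here is an assumption made for illustration: the nine-state cycle, the toy Hamiltonian `h`, the helper names, and in particular the flattened Hamiltonian `h_mod`, which is a simple stand-in and not the paper's $\mathcal{H}^{f}_{1,\alpha,c}$ (the abstract does not state its formula).

```python
import math
import numpy as np

def metropolis_chain(h_target, n_steps, rng):
    """Random-walk Metropolis on a cycle, targeting a law proportional to exp(-h_target)."""
    n_states = len(h_target)
    steps = rng.choice([-1, 1], size=n_steps)   # symmetric nearest-neighbour proposals
    uniforms = rng.random(n_steps)
    x = 0
    samples = np.empty(n_steps, dtype=int)
    for t in range(n_steps):
        y = (x + steps[t]) % n_states
        # Accept with probability min(1, exp(-(h(y) - h(x)))).
        if uniforms[t] < math.exp(min(0.0, h_target[x] - h_target[y])):
            x = y
        samples[t] = x
    return samples

def self_normalized_estimate(samples, g, h, h_mod):
    """Estimate E_mu[g], mu ∝ exp(-h), from samples whose law is ∝ exp(-h_mod)."""
    log_w = h_mod[samples] - h[samples]      # log of exp(-h) / exp(-h_mod)
    w = np.exp(log_w - log_w.max())          # the rescaling cancels in the ratio
    return float(np.sum(w * g[samples]) / np.sum(w))

# Toy double-well Hamiltonian on 9 states (hypothetical example data).
h = np.array([0.0, 1.5, 3.0, 1.5, 0.0, 1.5, 3.0, 1.5, 1.0])
c = 1.0
# Placeholder modification (NOT the paper's): damp the part of h above the threshold c.
h_mod = np.minimum(h, c) + 0.5 * np.maximum(h - c, 0.0)
g = (np.arange(len(h)) < 4).astype(float)    # indicator of states 0..3

rng = np.random.default_rng(0)
samples = metropolis_chain(h_mod, 200_000, rng)
estimate = self_normalized_estimate(samples, g, h, h_mod)
exact = float(np.sum(g * np.exp(-h)) / np.sum(np.exp(-h)))
print(estimate, exact)
```

The chain runs on the flatter landscape `h_mod`, and the weights $e^{-(\mathcal{H} - \mathcal{H}_{\mathrm{mod}})}$ reweight its samples back to $\mu$; the unknown normalizing constants cancel because the same weights appear in the numerator and the denominator.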

Related content

Tractable · Class · Extensibility · MoDELS · Statistic

January 31, 2022

A tractable class of Multivariate Phase-type distributions for loss modeling

Martin Bladt

Phase-type (PH) distributions are a popular tool for the analysis of univariate risks in numerous actuarial applications. Their multivariate counterparts (MPH$^\ast$), however, have not seen such a proliferation, due to lack of explicit formulas and complicated estimation procedures. A simple construction of multivariate phase-type distributions -- mPH -- is proposed for the parametric description of multivariate risks, leading to models of considerable probabilistic flexibility and statistical tractability. The main idea is to start different Markov processes at the same state, and allow them to evolve independently thereafter, leading to dependent absorption times. By dimension augmentation arguments, this construction can be cast into the umbrella of MPH$^\ast$ class, but enjoys explicit formulas which the general specification lacks, including common measures of dependence. Moreover, it is shown that the class is still rich enough to be dense on the set of multivariate risks supported on the positive orthant, and it is the smallest known sub-class to have this property. In particular, the latter result provides a new short proof of the denseness of the MPH$^\ast$ class. In practice this means that the mPH class allows for modeling of bivariate risks with any given correlation or copula. We derive an EM algorithm for its statistical estimation, and illustrate it on bivariate insurance data. Extensions to more general settings are outlined.
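The construction described in this abstract (several Markov jump processes started in one shared state, then evolving independently until absorption) can be simulated with a short Python sketch. The function names, the two-state example, and the sub-intensity matrices `T1` and `T2` are all hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def absorption_time(T, start, rng):
    """Simulate a Markov jump process with sub-intensity matrix T until absorption."""
    p = T.shape[0]
    t, i = 0.0, start
    while i < p:                                     # index p stands for the absorbing state
        rate = -T[i, i]
        t += rng.exponential(1.0 / rate)             # exponential holding time in state i
        probs = np.append(T[i], -T[i].sum()) / rate  # last entry: jump to absorption
        probs[i] = 0.0                               # no self-jumps
        i = rng.choice(p + 1, p=probs)
    return t

def sample_mph(pi, Ts, n_samples, rng):
    """Draw vectors of absorption times that share a common starting state."""
    out = np.empty((n_samples, len(Ts)))
    for k in range(n_samples):
        start = rng.choice(len(pi), p=pi)            # one shared initial state
        out[k] = [absorption_time(T, start, rng) for T in Ts]
    return out

# Hypothetical two-state example with two coordinates.
pi = np.array([0.5, 0.5])
T1 = np.array([[-1.0, 0.5], [0.0, -5.0]])
T2 = np.array([[-1.0, 0.0], [0.0, -5.0]])

rng = np.random.default_rng(0)
draws = sample_mph(pi, [T1, T2], 10_000, rng)
# Standard phase-type mean formula for one marginal: pi (-T)^{-1} 1.
mean_1 = float(pi @ np.linalg.solve(-T1, np.ones(2)))
corr = float(np.corrcoef(draws.T)[0, 1])
print(draws[:, 0].mean(), mean_1, corr)
```

In this example the shared starting state is the only source of dependence: conditionally on it, the two absorption times are independent, yet their correlation is positive because a slow start is slow for both coordinates.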

Multimodal · Augmented Lagrangian method · Unimodal · Bayesian inference · INTERACT

January 30, 2022

Multimodal Maximum Entropy Dynamic Games

Oswin So,Kyle Stachowicz,Evangelos A. Theodorou

from arxiv, Under review for RSS 2022. Supplementary Video: //youtu.be/7molN_Q38dk

Environments with multi-agent interactions often result in a rich set of modalities of behavior between agents due to the inherent suboptimality of decision making processes when agents settle for satisfactory decisions. However, existing algorithms for solving these dynamic games are strictly unimodal and fail to capture the intricate multimodal behaviors of the agents. In this paper, we propose MMELQGames (Multimodal Maximum-Entropy Linear Quadratic Games), a novel constrained multimodal maximum entropy formulation of the Differential Dynamic Programming algorithm for solving generalized Nash equilibria. By formulating the problem as a certain dynamic game with incomplete and asymmetric information where agents are uncertain about the cost and dynamics of the game itself, the proposed method is able to reason about multiple local generalized Nash equilibria, enforce constraints with the Augmented Lagrangian framework, and perform Bayesian inference on the latent mode from past observations. We assess the efficacy of the proposed algorithm on two illustrative examples: multi-agent collision avoidance and autonomous racing. In particular, we show that only MMELQGames is able to effectively block a rear vehicle when given a speed disadvantage and the rear vehicle can overtake from multiple positions.

Integration · Discretization · Identifiable · Continuity · Performer

January 28, 2022

Efficient optimization-based quadrature for variational discretization of nonlocal problems

Marco Pasetto,Zhaoxiang Shen,Marta D'Elia,Xiaochuan Tian,Nathaniel Trask,David Kamensky

from arxiv, 59 pages, 21 figures

Casting nonlocal problems in variational form and discretizing them with the finite element (FE) method facilitates the use of nonlocal vector calculus to prove well-posedness, convergence, and stability of such schemes. Employing an FE method also facilitates meshing of complicated domain geometries and coupling with FE methods for local problems. However, nonlocal weak problems involve the computation of a double integral, which is computationally expensive and presents several challenges. In particular, the inner integral of the variational form associated with the stiffness matrix is defined over the intersections of FE mesh elements with a ball of radius $\delta$, where $\delta$ is the range of nonlocal interaction. Identifying and parameterizing these intersections is a nontrivial computational geometry problem. In this work, we propose a quadrature technique where the inner integration is performed using quadrature points distributed over the full ball, without regard for how it intersects elements, and weights are computed based on the generalized moving least squares method. Thus, as opposed to all previously employed methods, our technique does not require element-by-element integration and fully circumvents the computation of element-ball intersections. This paper considers one- and two-dimensional implementations of piecewise linear continuous FE approximations, focusing on the case where the element size $h$ and the nonlocal radius $\delta$ are proportional, as is typical of practical computations. When boundary conditions are treated carefully and the outer integral of the variational form is computed accurately, the proposed method is asymptotically compatible in the limit of $h \sim \delta \to 0$, featuring at least first-order convergence in $L^2$ for all dimensions, using both uniform and nonuniform grids.

Empirical distribution · Mutually independent · Transform · Statistical theory

January 28, 2022

The limiting spectral distribution of large dimensional general information-plus-noise type matrices

Huanchao Zhou,Zhidong Bai,Jiang Hu

Let $ X_{n} $ be $ n\times N $ random complex matrices, $R_{n}$ and $T_{n}$ be non-random complex matrices with dimensions $n\times N$ and $n\times n$, respectively. We assume that the entries of $ X_{n} $ are independent and identically distributed, $ T_{n} $ are nonnegative definite Hermitian matrices and $T_{n}R_{n}R_{n}^{*}= R_{n}R_{n}^{*}T_{n} $. The general information-plus-noise type matrices are defined by $C_{n}=\frac{1}{N}T_{n}^{\frac{1}{2}} \left( R_{n} +X_{n}\right) \left(R_{n}+X_{n}\right)^{*}T_{n}^{\frac{1}{2}} $. In this paper, we establish the limiting spectral distribution of the large dimensional general information-plus-noise type matrices $C_{n}$. Specifically, we show that as $n$ and $N$ tend to infinity proportionally, the empirical distribution of the eigenvalues of $C_{n}$ converges weakly to a non-random probability distribution, which is characterized in terms of a system of equations of its Stieltjes transform.
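As a concrete reading of the definition of $C_{n}$, the following Python sketch builds one such matrix and computes the empirical distribution of its eigenvalues. The specific choices are assumptions for illustration only: complex Gaussian entries for $X_{n}$, a rectangular-diagonal $R_{n}$, and a diagonal $T_{n}$, which together satisfy the commutation condition $T_{n}R_{n}R_{n}^{*}= R_{n}R_{n}^{*}T_{n}$ trivially.

```python
import numpy as np

def info_plus_noise_matrix(n, N, rng):
    """Build C_n = (1/N) T^{1/2} (R + X)(R + X)^* T^{1/2} for one illustrative choice of R, T."""
    # i.i.d. standard complex Gaussian noise (one admissible choice of X_n).
    X = (rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))) / np.sqrt(2)
    # Rectangular-diagonal R_n, so that R_n R_n^* is diagonal.
    R = np.zeros((n, N), dtype=complex)
    k = min(n, N)
    R[np.arange(k), np.arange(k)] = np.linspace(0.5, 2.0, k)
    # Diagonal nonnegative definite T_n; it commutes with the diagonal R_n R_n^*.
    t = np.linspace(1.0, 3.0, n)
    T_half = np.diag(np.sqrt(t))
    Y = R + X
    C = T_half @ Y @ Y.conj().T @ T_half / N
    return C, R, np.diag(t)

def empirical_spectral_cdf(eigs, x):
    """F(x) = fraction of eigenvalues that are <= x."""
    return float(np.mean(eigs <= x))

rng = np.random.default_rng(0)
n, N = 200, 400                      # n / N held fixed as both grow
C, R, T = info_plus_noise_matrix(n, N, rng)
eigs = np.linalg.eigvalsh(C)         # real eigenvalues of the Hermitian matrix C_n
print(eigs.min(), eigs.max(), empirical_spectral_cdf(eigs, np.median(eigs)))
```

Repeating this for growing $n$ and $N$ with $n/N$ fixed, and plotting a histogram of `eigs`, gives a numerical picture of the convergence the abstract describes; the sketch itself does not compute the limiting distribution.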

Latent variable · MoDELS · Linear · Factorization · Decomposable

January 27, 2022

Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays

Łukasz Kidziński,Francis K. C. Hui,David I. Warton,Trevor Hastie

Unmeasured or latent variables are often the cause of correlations between multivariate measurements, which are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis or principal component analysis with a well-established theory and fast algorithms. Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses. However, current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets with thousands of observational units or responses. In this article, we propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood and then using a Newton method and Fisher scoring to learn the model parameters. Computationally, our method is noticeably faster and more stable, enabling GLLVM fits to much larger matrices than previously possible. We apply our method on a dataset of 48,000 observational units with over 2,000 observed species in each unit and find that most of the variability can be explained with a handful of factors. We publish an easy-to-use implementation of our proposed fitting algorithm.

Identifiable · MoDELS · Linear · Functional · Strongly connected graph

January 27, 2022

Input-output equations and identifiability of linear ODE models

Alexey Ovchinnikov,Gleb Pogudin,Peter Thompson

Structural identifiability is a property of a differential model with parameters that allows for the parameters to be determined from the model equations in the absence of noise. The method of input-output equations is one method for verifying structural identifiability. This method stands out in its importance because the additional insights it provides can be used to analyze and improve models. However, its complete theoretical grounds and applicability are still to be established. A subtlety and key for this method to work correctly is knowing whether the coefficients of these equations are identifiable. In this paper, to address this, we prove identifiability of the coefficients of input-output equations for types of differential models that often appear in practice, such as linear models with one output and linear compartment models in which, from each compartment, one can reach either a leak or an input. This shows that checking identifiability via input-output equations for these models is legitimate and, as we prove, that the field of identifiable functions is generated by the coefficients of the input-output equations. Finally, we exploit a connection between input-output equations and the transfer function matrix to show that, for a linear compartment model with an input and strongly connected graph, the field of all identifiable functions is generated by the coefficients of the transfer function matrix even if the initial conditions are generic.

Generalization theory · UniFormer · Unlabeled · TOOLS · Identifiable

October 17, 2021

Explaining generalization in deep learning: progress and fundamental limits

Vaishnavh Nagarajan

from arxiv, arXiv admin note: text overlap with arXiv:1902.04742

This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error? In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive {\em data-dependent} {\em uniform-convergence-based} generalization bounds with improved dependencies on the parameter count. Uniform convergence has in fact been the most widely used tool in deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis, we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we will show that in some example overparameterized settings, {\em any} uniform convergence bound will provide only a vacuous generalization bound. With this realization in mind, in the last part of the thesis, we will change course and introduce an {\em empirical} technique to estimate generalization using unlabeled data. Our technique does not rely on any notion of uniform-convergence-based complexity and is remarkably precise. We will theoretically show why our technique enjoys such precision. We will conclude by discussing how future work could explore novel ways to incorporate distributional assumptions in generalization bounds (such as in the form of unlabeled data) and explore other tools to derive bounds, perhaps by modifying uniform convergence or by developing completely new tools altogether.

Marginalization · Logistic loss · FAST · Linear classification · Performer

July 1, 2021

Fast Margin Maximization via Dual Acceleration

Ziwei Ji,Nathan Srebro,Matus Telgarsky

from arxiv, ICML 2021

We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of $\widetilde{\mathcal{O}}(1/t^2)$. This contrasts with a rate of $\mathcal{O}(1/\log(t))$ for standard gradient descent, and $\mathcal{O}(1/t)$ for normalized gradient descent. This momentum-based method is derived via the convex dual of the maximum-margin problem, and specifically by applying Nesterov acceleration to this dual, which manages to result in a simple and intuitive method in the primal. This dual view can also be used to derive a stochastic variant, which performs adaptive non-uniform sampling via the dual variables.

Discretization · MoDELS · Sample · Likelihood · Approximation

June 6, 2021

Oops I Took A Gradient: Scalable Sampling for Discrete Distributions

Will Grathwohl,Kevin Swersky,Milad Hashemi,David Duvenaud,Chris J. Maddison

from arxiv, Energy-Based Models, Deep generative models, MCMC sampling

We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Our approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler. We show empirically that this approach outperforms generic samplers in a number of difficult settings including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. We also demonstrate the use of our improved sampler for training deep energy-based models on high dimensional discrete data. This approach outperforms variational auto-encoders and existing energy-based models. Finally, we give bounds showing that our approach is near-optimal in the class of samplers which propose local updates.

Optimizer · Extensibility · Dual problem · Smooth · INTERACT

December 1, 2017

Optimal Algorithms for Distributed Optimization

César A. Uribe,Soomin Lee,Alexander Gasnikov,Angelia Nedić

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex only, (iii) smooth only, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup, such as proximal friendly functions, time-varying graphs, and improvement of the condition numbers.