国产精品亚洲综合久久_精品一区一区二区国产在线观看_国产偷亚洲高清在线观看_国产午夜理论片不卡平9_中文字幕乱偷免费视_婷婷久久久亚洲一区二区三区_久久精品视频网站

We give lower bounds on the performance of two of the most popular sampling methods in practice, the Metropolis-adjusted Langevin algorithm (MALA) and multi-step Hamiltonian Monte Carlo (HMC) with a leapfrog integrator, when applied to well-conditioned distributions. Our main result is a nearly-tight lower bound of $\widetilde{\Omega}(\kappa d)$ on the mixing time of MALA from an exponentially warm start, matching a line of algorithmic results up to logarithmic factors and answering an open question of Chewi et. al. We also show that a polynomial dependence on dimension is necessary for the relaxation time of HMC under any number of leapfrog steps, and bound the gains achievable by changing the step count. Our HMC analysis draws upon a novel connection between leapfrog integration and Chebyshev polynomials, which may be of independent interest.

相關內容

采樣法(fa)

關注 0

binary · ONCE · 情景 · 樣例 · 有向 ·

2021 年 12 月 27 日

Sharp nonparametric bounds for decomposition effects with two binary mediators

Erin E Gabriel,Michael C Sachs,Arvid Sj?lander

In randomized trials, once the total effect of the intervention has been estimated, it is often of interest to explore mechanistic effects through mediators along the causal pathway between the randomized treatment and the outcome. In the setting with two sequential mediators, there are a variety of decompositions of the total risk difference into mediation effects. We derive sharp and valid bounds for a number of mediation effects in the setting of two sequential mediators both with unmeasured confounding with the outcome. We provide five such bounds in the main text corresponding to two different decompositions of the total effect, as well as the controlled direct effect, with an additional thirty novel bounds provided in the supplementary materials corresponding to the terms of twenty-four four-way decompositions. We also show that, although it may seem that one can produce sharp bounds by adding or subtracting the limits of the sharp bounds for terms in a decomposition, this almost always produces valid, but not sharp bounds that can even be completely noninformative. We investigate the properties of the bounds by simulating random probability distributions under our causal model and illustrate how they are interpreted in a real data example.

估計/估計量 · 極大值 · 泛函 · AIM · 規范化的 ·

2021 年 12 月 27 日

Non-parametric estimator of a multivariate madogram for missing-data and extreme value framework

Alexis Boulin,Elena Di Bernardino,Thomas Lalo?,Gwladys Toulemonde

from arxiv, 29 pages, 4 figures

The modeling of dependence between maxima is an important subject in several applications in risk analysis. To this aim, the extreme value copula function, characterised via the madogram, can be used as a margin-free description of the dependence structure. From a practical point of view, the family of extreme value distributions is very rich and arise naturally as the limiting distribution of properly normalised component-wise maxima. In this paper, we investigate the nonparametric estimation of the madogram where data are completely missing at random. We provide the functional central limit theorem for the considered multivariate madrogram correctly normalized, towards a tight Gaussian process for which the covariance function depends on the probabilities of missing. Explicit formula for the asymptotic variance is also given. Our results are illustrated in a finite sample setting with a simulation study.

Better · contrastive · 納什均衡 · Processing（編程語言） · 情景 ·

2021 年 12 月 27 日

Coalitional Permutation Manipulations in the Gale-Shapley Algorithm

Weiran Shen,Yuan Deng,Pingzhong Tang

In this paper, we consider permutation manipulations by any subset of women in the men-proposing version of the Gale-Shapley algorithm. This paper is motivated by the college admissions process in China. Our results also answer an open problem on what can be achieved by permutation manipulations. We present an efficient algorithm to find a strategy profile such that the induced matching is stable and Pareto-optimal (in the set of all achievable stable matchings) while the strategy profile itself is inconspicuous. Surprisingly, we show that such a strategy profile actually forms a Nash equilibrium of the manipulation game. In the end, we show that it is NP-complete to find a manipulation that is strictly better for all members of the coalition. This result demonstrates a sharp contrast between weakly better off outcomes and strictly better-off outcomes.

直徑 · Pair · 可約的 · Sphering · 規范化的 ·

2021 年 12 月 24 日

Asymptotic Bounds on the Combinatorial Diameter of Random Polytopes

Gilles Bonnet,Daniel Dadush,Uri Grupel,Sophie Huiberts,Galyna Livshyts

from arxiv, 29 pages, 2 figures

The combinatorial diameter $\operatorname{diam}(P)$ of a polytope $P$ is the maximum shortest path distance between any pair of vertices. In this paper, we provide upper and lower bounds on the combinatorial diameter of a random "spherical" polytope, which is tight to within one factor of dimension when the number of inequalities is large compared to the dimension. More precisely, for an $n$-dimensional polytope $P$ defined by the intersection of $m$ i.i.d.\ half-spaces whose normals are chosen uniformly from the sphere, we show that $\operatorname{diam}(P)$ is $\Omega(n m^{\frac{1}{n-1}})$ and $O(n^2 m^{\frac{1}{n-1}} + n^5 4^n)$ with high probability when $m \geq 2^{\Omega(n)}$. For the upper bound, we first prove that the number of vertices in any fixed two dimensional projection sharply concentrates around its expectation when $m$ is large, where we rely on the $\Theta(n^2 m^{\frac{1}{n-1}})$ bound on the expectation due to Borgwardt [Math. Oper. Res., 1999]. To obtain the diameter upper bound, we stitch these ``shadows paths'' together over a suitable net using worst-case diameter bounds to connect vertices to the nearest shadow. For the lower bound, we first reduce to lower bounding the diameter of the dual polytope $P^\circ$, corresponding to a random convex hull, by showing the relation $\operatorname{diam}(P) \geq (n-1)(\operatorname{diam}(P^\circ)-2)$. We then prove that the shortest path between any ``nearly'' antipodal pair vertices of $P^\circ$ has length $\Omega(m^{\frac{1}{n-1}})$.

近似 · 等式約束 · 單純形 · 泛函 · 約束 ·

2021 年 12 月 23 日

Bounds-constrained polynomial approximation using the Bernstein basis

Larry Allen,Robert C. Kirby

from arxiv, 20 pages, 3 figures

A fundamental problem in numerical analysis and approximation theory is approximating smooth functions by polynomials. A much harder version under recent consideration is to enforce bounds constraints on the approximating polynomial. In this paper, we consider the problem of approximating functions by polynomials whose Bernstein coefficients with respect to a given degree satisfy such bounds, which implies such bounds on the approximant. We frame the problem as an inequality-constrained optimization problem and give an algorithm for finding the Bernstein coefficients of the exact solution. Additionally, our method can be modified slightly to include equality constraints such as mass preservation. It also extends naturally to multivariate polynomials over a simplex.

離散化 · MoDELS · 樣本 · 似然 · 近似 ·

2021 年 6 月 6 日

Oops I Took A Gradient: Scalable Sampling for Discrete Distributions

Will Grathwohl,Kevin Swersky,Milad Hashemi,David Duvenaud,Chris J. Maddison

from arxiv, Energy-Based Models, Deep generative models, MCMC sampling

We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Our approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler. We show empirically that this approach outperforms generic samplers in a number of difficult settings including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. We also demonstrate the use of our improved sampler for training deep energy-based models on high dimensional discrete data. This approach outperforms variational auto-encoders and existing energy-based models. Finally, we give bounds showing that our approach is near-optimal in the class of samplers which propose local updates.

泛化理論 · Extensibility · state-of-the-art · 測試數據 · 學成 ·

2021 年 4 月 16 日

Deep Stable Learning for Out-Of-Distribution Generalization

Xingxuan Zhang,Peng Cui,Renzhe Xu,Linjun Zhou,Yue He,Zheyan Shen

Approaches based on deep neural networks have achieved striking performance when testing data and training data share similar distribution, but can significantly fail otherwise. Therefore, eliminating the impact of distribution shifts between training and testing data is crucial for building performance-promising deep models. Conventional methods assume either the known heterogeneity of training data (e.g. domain labels) or the approximately equal capacities of different domains. In this paper, we consider a more challenging case where neither of the above assumptions holds. We propose to address this problem by removing the dependencies between features via learning weights for training samples, which helps deep models get rid of spurious correlations and, in turn, concentrate more on the true connection between discriminative features and labels. Extensive experiments clearly demonstrate the effectiveness of our method on multiple distribution generalization benchmarks compared with state-of-the-art counterparts. Through extensive experiments on distribution generalization benchmarks including PACS, VLCS, MNIST-M, and NICO, we show the effectiveness of our method compared with state-of-the-art counterparts.

contrastive · 監督 · 優化器 · 線性的 · 學成 ·

2021 年 2 月 17 日

Dissecting Supervised Constrastive Learning

Florian Graf,Christoph D. Hofer,Marc Niethammer,Roland Kwitt

Minimizing cross-entropy over the softmax scores of a linear map composed with a high-capacity encoder is arguably the most popular choice for training neural networks on supervised learning tasks. However, recent works show that one can directly optimize the encoder instead, to obtain equally (or even more) discriminative representations via a supervised variant of a contrastive objective. In this work, we address the question whether there are fundamental differences in the sought-for representation geometry in the output space of the encoder at minimal loss. Specifically, we prove, under mild assumptions, that both losses attain their minimum once the representations of each class collapse to the vertices of a regular simplex, inscribed in a hypersphere. We provide empirical evidence that this configuration is attained in practice and that reaching a close-to-optimal state typically indicates good generalization performance. Yet, the two losses show remarkably different optimization behavior. The number of iterations required to perfectly fit to data scales superlinearly with the amount of randomly flipped labels for the supervised contrastive loss. This is in contrast to the approximately linear scaling previously reported for networks trained with cross-entropy.

優化器 · Lipschitz連續 · 正則化項 · Continuity · Lipschitz ·

2018 年 6 月 1 日

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Kevin Scaman,Francis Bach,Sébastien Bubeck,Yin Tat Lee,Laurent Massoulié

from arxiv, 17 pages

In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.

優化器 · Extensibility · 對偶問題 · 平滑 · INTERACT ·

2017 年 12 月 1 日

Optimal Algorithms for Distributed Optimization

César A. Uribe,Soomin Lee,Alexander Gasnikov,Angelia Nedi?

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, improvement of the condition numbers.