We study the mixing time of the Metropolis-adjusted Langevin algorithm (MALA) for sampling from a log-smooth and strongly log-concave distribution, and we establish its minimax-optimal mixing time under a warm start. Our main contribution is two-fold. First, for a $d$-dimensional log-concave density with condition number $\kappa$, we show that MALA with a warm start mixes in $\tilde O(\kappa \sqrt{d})$ iterations, where $\tilde O(\cdot)$ hides logarithmic factors. This improves upon previous results in the dependency on either the condition number $\kappa$ or the dimension $d$. Our proof relies on comparing the leapfrog integrator with the continuous Hamiltonian dynamics, for which we establish a new concentration bound on the acceptance rate. Second, we prove a spectral-gap-based mixing time lower bound for reversible MCMC algorithms on general state spaces. We apply this lower bound to construct a hard distribution for which MALA requires at least $\tilde \Omega (\kappa \sqrt{d})$ steps to mix. The lower bound for MALA matches our upper bound in terms of both the condition number and the dimension. Finally, numerical experiments are included to validate our theoretical results.
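To make the algorithm under analysis concrete, here is a minimal Python sketch of one MALA iteration; the callables `log_prob` and `grad_log_prob`, the stepsize, and the Gaussian example are illustrative assumptions, not the paper's code.

```python
import numpy as np

def mala_step(x, log_prob, grad_log_prob, step, rng):
    """One Metropolis-adjusted Langevin step targeting exp(log_prob)."""
    # Langevin proposal: gradient drift plus Gaussian noise.
    noise = rng.standard_normal(x.shape)
    y = x + step * grad_log_prob(x) + np.sqrt(2.0 * step) * noise

    # Log-density of proposing b from a under the Gaussian proposal.
    def log_q(b, a):
        diff = b - a - step * grad_log_prob(a)
        return -np.dot(diff, diff) / (4.0 * step)

    # Metropolis-Hastings acceptance step restores exact stationarity.
    log_alpha = log_prob(y) + log_q(x, y) - log_prob(x) - log_q(y, x)
    if np.log(rng.uniform()) < log_alpha:
        return y
    return x

# Example: sample a standard Gaussian in dimension 10 (condition number 1).
rng = np.random.default_rng(0)
x = np.zeros(10)
for _ in range(1000):
    x = mala_step(x, lambda v: -0.5 * v @ v, lambda v: -v, 0.1, rng)
```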
We study the problem of estimating the size of a maximum matching and of a minimum vertex cover in sublinear time. Denoting the number of vertices by $n$ and the average degree of the graph by $\bar{d}$, we obtain the following results for both problems:
* A multiplicative $(2+\epsilon)$-approximation that takes $\tilde{O}(n/\epsilon^2)$ time using adjacency list queries.
* A multiplicative-additive $(2, \epsilon n)$-approximation in $\tilde{O}((\bar{d} + 1)/\epsilon^2)$ time using adjacency list queries.
* A multiplicative-additive $(2, \epsilon n)$-approximation in $\tilde{O}(n/\epsilon^{3})$ time using adjacency matrix queries.
All three results are provably time-optimal up to polylogarithmic factors, settling a long line of work on these problems. Our main contribution, and the key ingredient behind the bounds above, is a new and near-tight analysis of the average query complexity of the randomized greedy maximal matching algorithm, which improves upon a seminal result of Yoshida, Yamamoto, and Ito [STOC'09].
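For reference, a minimal sketch of the randomized greedy maximal matching algorithm whose average query complexity is analyzed; this full-graph version is illustrative only (the sublinear-time algorithms merely simulate its decisions through local oracle queries).

```python
import random

def randomized_greedy_matching(edges):
    """Process edges in a uniformly random order; add each edge
    whose endpoints are both still unmatched."""
    order = edges[:]
    random.shuffle(order)
    matched = set()
    matching = []
    for u, v in order:
        if u not in matched and v not in matched:
            matched.update((u, v))
            matching.append((u, v))
    # The result is maximal, so its size 2-approximates the maximum matching.
    return matching

# Example on a 4-cycle.
print(randomized_greedy_matching([(0, 1), (1, 2), (2, 3), (3, 0)]))
```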
We prove an optimal $\Omega(n^{-1})$ lower bound on the spectral gap of Glauber dynamics for anti-ferromagnetic two-spin systems with $n$ vertices in the tree uniqueness regime. This spectral gap holds for any maximum degree $\Delta$, including unbounded $\Delta$. Consequently, we obtain the following mixing time bounds for models satisfying the uniqueness condition with a slack $\delta\in(0,1)$:
$\bullet$ $C(\delta) n^2\log n$ mixing time for the hardcore model with fugacity $\lambda\le (1-\delta)\lambda_c(\Delta)= (1-\delta)\frac{(\Delta - 1)^{\Delta - 1}}{(\Delta - 2)^\Delta}$;
$\bullet$ $C(\delta) n^2$ mixing time for the Ising model with edge activity $\beta\in\left[\frac{\Delta-2+\delta}{\Delta-\delta},\frac{\Delta-\delta}{\Delta-2+\delta}\right]$;
where the maximum degree $\Delta$ may depend on the number of vertices $n$, and $C(\delta)$ depends only on $\delta$. Our proof builds upon the recently developed connections between the Glauber dynamics for spin systems and high-dimensional expander walks. In particular, we prove a stronger notion of spectral independence, called complete spectral independence, and use a novel Markov chain, called the field dynamics, to connect this stronger spectral independence to the rapid mixing of Glauber dynamics for all degrees.
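As an illustration, a minimal Python sketch of one Glauber update for the hardcore model; the adjacency-dict representation and parameter names are expository assumptions, not tied to the paper.

```python
import random

def glauber_hardcore_step(adj, occupied, lam, rng=random):
    """One Glauber update for the hardcore model with fugacity lam.

    adj: dict mapping each vertex to its list of neighbors;
    occupied: set of occupied vertices (always an independent set).
    """
    v = rng.choice(list(adj))
    occupied.discard(v)
    # Resample v's spin conditional on its neighbors: v may be occupied
    # only if all neighbors are empty, with probability lam / (1 + lam).
    if all(u not in occupied for u in adj[v]):
        if rng.random() < lam / (1.0 + lam):
            occupied.add(v)
    return occupied

# Example: path on 3 vertices, lam = 1 (uniform over independent sets).
adj = {0: [1], 1: [0, 2], 2: [1]}
s = set()
for _ in range(10000):
    s = glauber_hardcore_step(adj, s, 1.0)
```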
We show that the natural Glauber dynamics mixes rapidly and generates a random proper edge-coloring of a graph with maximum degree $\Delta$ whenever the number of colors satisfies $q\geq (\frac{10}{3} + \epsilon)\Delta$, where $\epsilon>0$ is arbitrary and the maximum degree satisfies $\Delta \geq C$ for a constant $C = C(\epsilon)$ depending only on $\epsilon$. For edge-colorings, this improves upon prior work \cite{Vig99, CDMPP19}, which shows rapid mixing when $q\geq (\frac{11}{3}-\epsilon_0 ) \Delta$, where $\epsilon_0 \approx 10^{-5}$ is a small fixed constant. At the heart of our proof, we establish a matrix trickle-down theorem, generalizing Oppenheim's influential result, as a new technique for proving that a high-dimensional simplicial complex is a local spectral expander.
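For concreteness, a minimal sketch of the edge-coloring Glauber dynamics in question; the uniform recoloring rule below is the standard heat-bath update, and all names are illustrative.

```python
import random

def glauber_edge_coloring_step(edges, coloring, q, rng=random):
    """One Glauber update on proper edge-colorings with q colors.

    edges: list of edges (u, v); coloring: dict edge -> color.
    Pick a uniformly random edge and recolor it with a uniformly
    random color not used by any edge sharing an endpoint with it.
    """
    e = rng.choice(edges)
    u, v = e
    blocked = {coloring[f] for f in edges
               if f != e and (u in f or v in f)}
    available = [c for c in range(q) if c not in blocked]
    # available is nonempty whenever q > 2 * (Delta - 1).
    coloring[e] = rng.choice(available)
    return coloring

# Example: star with 3 edges (Delta = 3) and q = 5 colors.
edges = [(0, 1), (0, 2), (0, 3)]
coloring = {e: i for i, e in enumerate(edges)}
for _ in range(1000):
    coloring = glauber_edge_coloring_step(edges, coloring, 5)
```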
We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, where $f$ is nonconvex in $\mathbf{x}$ but concave in $\mathbf{y}$, and $\mathcal{Y}$ is a convex and bounded set. One of the most popular algorithms for solving this problem is the celebrated gradient descent ascent (GDA) algorithm, which has been widely used in machine learning, control theory, and economics. Despite the extensive convergence results for the convex-concave setting, GDA with equal stepsizes can converge to limit cycles or even diverge in the general setting. In this paper, we present complexity results for two-time-scale GDA for solving nonconvex-concave minimax problems, showing that the algorithm can efficiently find a stationary point of the function $\Phi(\cdot) := \max_{\mathbf{y} \in \mathcal{Y}} f(\cdot, \mathbf{y})$. To the best of our knowledge, this is the first nonasymptotic analysis of two-time-scale GDA in this setting, shedding light on its superior practical performance in training generative adversarial networks (GANs) and in other real applications.
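A minimal sketch of the two-time-scale GDA iteration, in which the ascent stepsize is taken larger than the descent stepsize; the toy objective below is convex-concave for simplicity (the analyzed setting allows nonconvex dependence on $\mathbf{x}$), and all parameter values are illustrative.

```python
import numpy as np

def two_time_scale_gda(grad_x, grad_y, project_y, x, y,
                       eta_x, eta_y, iters):
    """Two-time-scale GDA: eta_y is chosen much larger than eta_x,
    the stepsize regime under analysis."""
    for _ in range(iters):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x = x - eta_x * gx               # gradient descent on x
        y = project_y(y + eta_y * gy)    # projected gradient ascent on y
    return x, y

# Toy instance: f(x, y) = x*y - y^2/2 on Y = [-1, 1] (concave in y);
# the minimizer of Phi(x) = max_y f(x, y) is x = 0.
x, y = two_time_scale_gda(
    grad_x=lambda x, y: y,
    grad_y=lambda x, y: x - y,
    project_y=lambda y: np.clip(y, -1.0, 1.0),
    x=1.0, y=0.5, eta_x=0.01, eta_y=0.1, iters=5000)
```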
The asymptotic behaviour of Linear Spectral Statistics (LSS) of the smoothed periodogram estimator of the spectral coherency matrix of a complex Gaussian high-dimensional time series $(\mathbf{y}_n)_{n \in \mathbb{Z}}$ with independent components is studied in the asymptotic regime where the sample size $N$ converges towards $+\infty$ while the dimension $M$ of $\mathbf{y}$ and the smoothing span of the estimator grow to infinity at the same rate, in such a way that $\frac{M}{N} \rightarrow 0$. It is established that, at each frequency, the estimated spectral coherency matrix is close to the sample covariance matrix of an independent identically $\mathcal{N}_{\mathbb{C}}(0,\mathbf{I}_M)$ distributed sequence, and that its empirical eigenvalue distribution converges towards the Marcenko-Pastur distribution. This allows us to conclude that each LSS has a deterministic behaviour that can be evaluated explicitly. Using concentration inequalities, it is shown that the supremum over the frequencies of the deviation of each LSS from its deterministic approximation is of order $\frac{1}{M} + \frac{\sqrt{M}}{N}+ \left(\frac{M}{N}\right)^{3}$. Numerical simulations support our results.
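A schematic Python sketch of the smoothed periodogram estimator of the spectral coherency matrix; the normalization conventions follow standard definitions and may differ in detail from the paper's.

```python
import numpy as np

def smoothed_coherency(Y, B):
    """Smoothed-periodogram estimate of the spectral coherency matrix.

    Y: (M, N) array, one row per component of the time series;
    B: smoothing half-span, so 2B + 1 Fourier frequencies are averaged.
    Returns an (N, M, M) array: one coherency matrix per frequency.
    """
    M, N = Y.shape
    xi = np.fft.fft(Y, axis=1) / np.sqrt(N)   # renormalized DFT, shape (M, N)
    C = np.empty((N, M, M), dtype=complex)
    for k in range(N):
        idx = (k + np.arange(-B, B + 1)) % N  # frequencies in the span
        # Smoothed periodogram at frequency k/N.
        S = xi[:, idx] @ xi[:, idx].conj().T / (2 * B + 1)
        d = 1.0 / np.sqrt(np.real(np.diag(S)))
        C[k] = d[:, None] * S * d[None, :]    # normalize to unit diagonal
    return C

# Example: M = 8 independent white-noise components, N = 512, B = 32.
rng = np.random.default_rng(0)
Y = (rng.standard_normal((8, 512)) + 1j * rng.standard_normal((8, 512))) / np.sqrt(2)
C = smoothed_coherency(Y, B=32)
```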
We investigate the equilibrium behavior of the decentralized cheap talk problem for real random variables and quadratic cost criteria, in which an encoder and a decoder have misaligned objective functions. In prior work, it has been shown that the number of bins in any equilibrium has to be countable, generalizing a classical result due to Crawford and Sobel, who considered sources with density supported on $[0,1]$. In this paper, we first refine this result in the context of log-concave sources. For sources with two-sided unbounded support, we prove that, for any finite number of bins, there exists a unique equilibrium. In contrast, for sources with semi-unbounded support, there may be a finite upper bound on the number of bins in equilibrium, depending on certain conditions stated explicitly. Moreover, we prove that for log-concave sources, the expected costs of the encoder and the decoder in equilibrium decrease as the number of bins increases. Furthermore, for strictly log-concave sources with two-sided unbounded support, we prove convergence to the unique equilibrium under best-response dynamics initialized with a given number of bins, making a connection with the classical theory of optimal quantization and with convergence results for Lloyd's method. In addition, we consider more general sources satisfying certain assumptions on the tail(s) of the distribution, and we show that there exist equilibria with infinitely many bins for sources with two-sided unbounded support. Further explicit characterizations are provided for sources with exponential, Gaussian, and compactly-supported probability distributions.
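A hypothetical sketch of the best-response dynamics in the quadratic setting with a standard Gaussian source: the decoder responds with conditional means (as in Lloyd's method), and the encoder places each boundary where the bias-shifted source point is equidistant from the two adjacent decoder actions. The bias value, initialization, and iteration count are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def best_response_dynamics(num_bins, bias, iters=200):
    """Best-response (Lloyd-type) iteration for quadratic cheap talk
    with a standard Gaussian source and encoder bias `bias`."""
    # Interior bin boundaries, initialized at Gaussian quantiles.
    b = norm.ppf(np.linspace(0, 1, num_bins + 1)[1:-1])
    for _ in range(iters):
        edges = np.concatenate(([-np.inf], b, [np.inf]))
        lo, hi = edges[:-1], edges[1:]
        # Decoder action per bin: mean of N(0,1) truncated to [lo, hi],
        # i.e. (phi(lo) - phi(hi)) / (Phi(hi) - Phi(lo)).
        mass = norm.cdf(hi) - norm.cdf(lo)
        m = (norm.pdf(lo) - norm.pdf(hi)) / mass
        # Encoder indifference: x + bias equidistant from adjacent actions.
        b = (m[:-1] + m[1:]) / 2.0 - bias
    return b, m

boundaries, actions = best_response_dynamics(num_bins=4, bias=0.1)
```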
We present a novel method for reducing the computational complexity of rigorously estimating the partition functions (normalizing constants) of Gibbs (Boltzmann) distributions, which arise ubiquitously in probabilistic graphical models. A major obstacle to practical applications of Gibbs distributions is the need to estimate their partition functions. The state of the art in addressing this problem is multi-stage algorithms consisting of a cooling schedule and a mean estimator at each step of the schedule. While the cooling schedule in these algorithms is adaptive, the mean estimation computations use MCMC as a black box to draw approximate samples. We develop a doubly adaptive approach, combining the adaptive cooling schedule with an adaptive MCMC mean estimator whose number of Markov chain steps adapts dynamically to the underlying chain. Through rigorous theoretical analysis, we prove that our method outperforms state-of-the-art algorithms in several respects: (1) the computational complexity of our method is smaller; (2) our method is less sensitive to loose bounds on mixing times, an inherent component of these algorithms; and (3) the improvement obtained by our method is particularly significant in the most challenging regime of high-precision estimation. We demonstrate the advantage of our method in experiments run on classic factor graphs, such as voting models and Ising models.
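A minimal sketch of the multi-stage product structure common to these algorithms; `sample_at` stands in for the MCMC black box, and the paper's adaptive schedule and adaptive mean estimator are not reproduced here.

```python
import numpy as np

def product_estimator(betas, sample_at, hamiltonian, num_samples, rng):
    """Multi-stage estimate of log Z(beta_K) - log Z(beta_0) for a Gibbs
    distribution mu_beta(x) proportional to exp(-beta * H(x)), using the
    telescoping identity Z(b')/Z(b) = E_{x ~ mu_b}[exp(-(b' - b) H(x))]."""
    log_ratio = 0.0
    for b, b_next in zip(betas[:-1], betas[1:]):
        xs = sample_at(b, num_samples, rng)
        weights = np.exp(-(b_next - b) * np.array([hamiltonian(x) for x in xs]))
        log_ratio += np.log(weights.mean())  # one mean estimate per stage
    return log_ratio

# Toy usage: single spin with H(x) = x, x in {0, 1}; the "MCMC sampler"
# is replaced by exact sampling purely for illustration.
def sample_at(beta, n, rng):
    p1 = np.exp(-beta) / (1.0 + np.exp(-beta))  # P(x = 1) under mu_beta
    return (rng.uniform(size=n) < p1).astype(int)

rng = np.random.default_rng(0)
est = product_estimator(np.linspace(0.0, 2.0, 21), sample_at,
                        lambda x: float(x), 200, rng)
# Exact value: log(1 + e^{-2}) - log 2.
```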
We study sparse linear regression over a network of agents, modeled as an undirected graph (with no centralized node). The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty on the consensus constraint -- the latter being instrumental to obtaining distributed solution methods. While penalty-based consensus methods have been extensively studied in the optimization literature, their statistical and computational guarantees in the high-dimensional setting remain unclear. This work provides an answer to this open problem. Our contribution is two-fold. First, we establish statistical consistency of the estimator: under a suitable choice of the penalty parameter, the optimal solution of the penalized problem achieves the near-optimal minimax rate $\mathcal{O}(s \log d/N)$ in $\ell_2$-loss, where $s$ is the sparsity level, $d$ is the ambient dimension, and $N$ is the total sample size in the network -- this matches centralized sample rates. Second, we show that the proximal-gradient algorithm applied to the penalized problem, which naturally leads to distributed implementations, converges linearly up to a tolerance of the order of the centralized statistical error -- the rate scales as $\mathcal{O}(d)$, revealing an unavoidable speed-accuracy dilemma. Numerical results demonstrate the tightness of the derived sample-rate and convergence-rate scalings.
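A minimal sketch of the proximal-gradient iteration on a consensus-penalized LASSO; the Laplacian-based penalty and all parameter choices are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Prox of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def penalized_prox_grad(A, b, L, lam, rho, step, iters):
    """Proximal gradient on the consensus-penalized LASSO (sketch).

    A[i], b[i]: local design matrix and responses of agent i;
    L: graph Laplacian of the communication network. The penalty
    (rho/2) * tr(X^T L X) couples neighbors only, so each gradient
    step needs one round of neighbor-to-neighbor communication.
    """
    m, d = len(A), A[0].shape[1]
    X = np.zeros((m, d))  # row i holds agent i's estimate
    for _ in range(iters):
        grad = np.stack([A[i].T @ (A[i] @ X[i] - b[i]) / len(b[i])
                         for i in range(m)])
        grad += rho * (L @ X)  # gradient of the consensus penalty
        X = soft_threshold(X - step * grad, step * lam)
    return X

# Example: 3 agents on a path graph, a shared 3-sparse signal.
rng = np.random.default_rng(0)
A = [rng.standard_normal((20, 50)) for _ in range(3)]
x_true = np.zeros(50); x_true[:3] = 1.0
b = [Ai @ x_true for Ai in A]
L = np.array([[1, -1, 0], [-1, 2, -1], [0, -1, 1]], dtype=float)
X = penalized_prox_grad(A, b, L, lam=0.05, rho=1.0, step=0.01, iters=500)
```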
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
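A minimal sketch of the smoothing device behind DRS, assuming a subgradient oracle: the gradient of the Gaussian-smoothed objective $f_\gamma(x) = \mathbb{E}[f(x + \gamma Z)]$ is estimated by averaging subgradients at randomly perturbed points. Function names and parameters are illustrative.

```python
import numpy as np

def smoothed_subgradient(subgrad, x, gamma, num_samples, rng):
    """Monte Carlo estimate of the gradient of the Gaussian smoothing
    f_gamma(x) = E[f(x + gamma * Z)], Z ~ N(0, I): for Lipschitz f this
    equals E[subgrad(x + gamma * Z)], averaged over random perturbations."""
    g = np.zeros_like(x)
    for _ in range(num_samples):
        g += subgrad(x + gamma * rng.standard_normal(x.shape))
    return g / num_samples

# Example: f(x) = ||x||_1, whose Gaussian smoothing is differentiable
# everywhere; np.sign is a valid subgradient oracle for f.
rng = np.random.default_rng(0)
x = np.array([0.0, 1.0, -2.0])
g = smoothed_subgradient(np.sign, x, gamma=0.1, num_samples=100, rng=rng)
```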
In this paper, we study the optimal convergence rates for distributed convex optimization problems over networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is strongly convex and smooth, strongly convex only, smooth only, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
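A hypothetical sketch of accelerated gradient ascent on the dual of a consensus-constrained problem; the gossip matrix, stepsizes, momentum, and quadratic example are illustrative assumptions, not the paper's algorithm verbatim.

```python
import numpy as np

def dual_accelerated_consensus(grad_conj, W, Lam, eta, momentum, iters):
    """Nesterov-type accelerated ascent on the dual of
        min_X sum_i f_i(x_i)  subject to  W X = 0  (consensus),
    where W is a gossip (Laplacian-like) matrix. grad_conj maps the dual
    signals to the primal minimizers x_i = grad f_i^*(.), computed locally;
    every multiplication by W costs one communication round."""
    Lam_prev = Lam.copy()
    for _ in range(iters):
        V = Lam + momentum * (Lam - Lam_prev)   # extrapolation step
        X = grad_conj(-(W @ V))                 # local conjugate computations
        Lam_prev, Lam = Lam, V + eta * (W @ X)  # dual gradient is W X (ascent)
    return X

# Example: f_i(x) = ||x - c_i||^2 / 2, so grad f_i^*(y) = y + c_i.
c = np.array([[1.0], [2.0], [3.0]])
W = np.array([[1, -1, 0], [-1, 2, -1], [0, -1, 1]], dtype=float)  # path graph
X = dual_accelerated_consensus(lambda Z: Z + c, W, np.zeros((3, 1)),
                               eta=0.1, momentum=0.3, iters=3000)
# Each row of X approaches the consensus minimizer, the mean of the c_i (2.0).
```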