In this work, we study the $k$-means cost function. Given a dataset $X \subseteq \mathbb{R}^d$ and an integer $k$, the goal of the Euclidean $k$-means problem is to find a set of $k$ centers $C \subseteq \mathbb{R}^d$ such that $\Phi(C, X) \equiv \sum_{x \in X} \min_{c \in C} ||x - c||^2$ is minimized. Let $\Delta(X,k) \equiv \min_{C \subseteq \mathbb{R}^d, |C| = k} \Phi(C, X)$ denote the cost of the optimal $k$-means solution. For any dataset $X$, $\Delta(X,k)$ decreases as $k$ increases, and our goal is to understand this behaviour more precisely. For any dataset $X \subseteq \mathbb{R}^d$, integer $k \geq 1$, and precision parameter $\varepsilon > 0$, let $L(X, k, \varepsilon)$ denote the smallest integer such that $\Delta(X, L(X, k, \varepsilon)) \leq \varepsilon \cdot \Delta(X,k)$. We show upper and lower bounds on this quantity. Our techniques generalize to the metric $k$-median problem in arbitrary metric spaces, where we give bounds in terms of the doubling dimension of the metric. Finally, we observe that for any dataset $X$, we can compute a set $S$ of size $O \left(L(X, k, \varepsilon/c) \right)$ using $D^2$-sampling, for some fixed constant $c$, such that $\Phi(S,X) \leq \varepsilon \cdot \Delta(X,k)$. We also discuss some applications of our bounds.
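As a concrete illustration of the quantities above, the following minimal sketch computes $\Phi(C,X)$ and performs $D^2$-sampling (k-means++ style seeding) on numpy arrays; the function names `kmeans_cost` and `d2_sampling` are ours, and this is only a reference sketch, not the procedure analyzed in the paper.

```python
import numpy as np

def kmeans_cost(C, X):
    """Phi(C, X) = sum_{x in X} min_{c in C} ||x - c||^2."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # squared point-center distances
    return d2.min(axis=1).sum()

def d2_sampling(X, t, rng=np.random.default_rng(0)):
    """Draw t centers from X, each picked with probability proportional to its
    current squared distance to the already-chosen set (k-means++ seeding)."""
    S = [X[rng.integers(len(X))]]  # first center chosen uniformly at random
    for _ in range(t - 1):
        d2 = ((X[:, None, :] - np.asarray(S)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        S.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.asarray(S)
```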
We study generalization properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD). In this regime, we derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings, and observe the double descent phenomenon both theoretically and empirically. Our analysis shows how to cope with multiple sources of randomness (initialization, label noise, and data sampling, as well as stochastic gradients) in the absence of a closed-form solution, and also goes beyond the commonly used Gaussian/spherical data assumption. Our theoretical results demonstrate that, with SGD training, RF regression still generalizes well in the interpolation regime, and characterize the double descent behavior via the unimodality of the variance and the monotonic decrease of the bias. In addition, we prove that constant step-size SGD incurs no loss in convergence rate compared to the exact minimum-norm interpolator, providing a theoretical justification for using SGD in practice.
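For concreteness, a minimal sketch of RF regression trained with constant step-size SGD is given below; the ReLU feature map, scaling, and hyperparameters are illustrative choices of ours and not those fixed in the paper's analysis.

```python
import numpy as np

def sgd_rf_regression(X, y, n_features=512, step=0.5, epochs=5, seed=0):
    """Random features regression trained by constant step-size SGD on the
    squared loss. The random first layer W stays frozen; only theta is trained."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_features, X.shape[1]))      # frozen random features
    Phi = np.maximum(X @ W.T, 0.0) / np.sqrt(n_features)   # illustrative ReLU feature map
    theta = np.zeros(n_features)
    for _ in range(epochs):
        for i in rng.permutation(len(y)):                  # single-sample stochastic gradient
            theta -= step * (Phi[i] @ theta - y[i]) * Phi[i]
    return W, theta
```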
In this work we advance the understanding of the fundamental limits of computation for Binary Polynomial Optimization (BPO), the problem of maximizing a given polynomial function over all binary points. Our main result identifies a novel class of BPO instances that can be solved efficiently, both from a theoretical and from a computational perspective: we give a strongly polynomial-time algorithm for instances whose corresponding hypergraph is beta-acyclic. We note that the beta-acyclicity assumption is natural in several applications, including relational database schemes and the lifted multicut problem on trees. Due to the novelty of our proof technique, we obtain an algorithm that is also interesting from a practical viewpoint: it is very simple to implement, and its running time is a polynomial of very low degree in the number of nodes and edges of the hypergraph. Our result completely settles the computational complexity of BPO over acyclic hypergraphs, since the problem is NP-hard on alpha-acyclic instances. Our algorithm can also be applied to any general BPO instance that contains beta-cycles. For such instances, the algorithm returns a smaller instance together with a rule to extend any optimal solution of the smaller instance to an optimal solution of the original one.
We present an algorithm for computing approximate $\ell_p$ Lewis weights to high precision. Given a full-rank $\mathbf{A} \in \mathbb{R}^{m \times n}$ with $m \geq n$ and a scalar $p>2$, our algorithm computes $\epsilon$-approximate $\ell_p$ Lewis weights of $\mathbf{A}$ in $\widetilde{O}_p(\log(1/\epsilon))$ iterations; the cost of each iteration is linear in the input size plus the cost of computing the leverage scores of $\mathbf{D}\mathbf{A}$ for a diagonal $\mathbf{D} \in \mathbb{R}^{m \times m}$. Prior to our work, such a computational complexity was known only for $p \in (0, 4)$ [CohenPeng2015]; combined with that result, our work yields the first polylogarithmic-depth, polynomial-work algorithm for computing $\ell_p$ Lewis weights to high precision for all constant $p > 0$. An important consequence of this result is the first polylogarithmic-depth, polynomial-work algorithm for computing a nearly optimal self-concordant barrier for a polytope.
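For reference, the simple fixed-point iteration analyzed in [CohenPeng2015] for $p \in (0, 4)$ (not the new scheme for $p > 2$ introduced here) already illustrates how each step costs about as much as a leverage-score computation for a rescaled matrix $\mathbf{D}\mathbf{A}$; the dense linear algebra below is a sketch only.

```python
import numpy as np

def lewis_weights_fixed_point(A, p, iters=100):
    """Approximate l_p Lewis weights via the fixed point
    w_i = (a_i^T (A^T W^{1-2/p} A)^{-1} a_i)^{p/2}.
    This is the simple [CohenPeng2015]-style iteration (a contraction for p in (0, 4));
    each step is comparable in cost to computing leverage scores of D A."""
    m, _ = A.shape
    w = np.ones(m)
    for _ in range(iters):
        D = np.diag(w ** (0.5 - 1.0 / p))            # D = W^{1/2 - 1/p}
        G_inv = np.linalg.inv((D @ A).T @ (D @ A))   # (A^T W^{1-2/p} A)^{-1}
        lev = np.einsum('ij,jk,ik->i', A, G_inv, A)  # quadratic forms a_i^T G_inv a_i
        w = lev ** (p / 2.0)
    return w
```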
It was observed in [1] that the expectation of the squared scalar product of two independent random unit vectors uniformly distributed on the unit sphere in $\mathbb{R}^n$ is equal to $1/n$. In this paper, it is shown that this is a characteristic property of random vectors defined on invariant probability subspaces of unit spheres in irreducible real representations of compact Lie groups.
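For completeness, the quoted $1/n$ identity follows from rotational invariance and exchangeability of coordinates alone (a standard one-line argument, not a contribution of the paper): conditioning on $v$ and rotating it to the first basis vector $e_1$,
$$ \mathbb{E}\,\langle u, v\rangle^2 = \mathbb{E}\,\langle u, e_1\rangle^2 = \mathbb{E}\,[u_1^2] = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\,[u_i^2] = \frac{1}{n}\,\mathbb{E}\,\|u\|^2 = \frac{1}{n}. $$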
The partition function and free energy of a quantum many-body system determine its physical properties in thermal equilibrium. Here we study the computational complexity of approximating these quantities for $n$-qubit local Hamiltonians. First, we report a classical algorithm with $\mathrm{poly}(n)$ runtime that approximates the free energy of a given $2$-local Hamiltonian provided it satisfies a certain denseness condition. Our algorithm combines the variational characterization of the free energy with convex relaxation methods. It contributes to a body of work on efficient approximation algorithms for dense instances of optimization problems that are hard in the general case, and can be viewed as simultaneously extending existing algorithms for (a) the ground energy of dense $2$-local Hamiltonians, and (b) the free energy of dense classical Ising models. Second, we establish polynomial-time equivalence between the problem of approximating the free energy of local Hamiltonians and three other natural quantum approximate counting problems, including the problem of approximating the number of witness states accepted by a QMA verifier. These results suggest that simulation of quantum many-body systems in thermal equilibrium may precisely capture the complexity of a broad family of computational problems that has yet to be defined or characterized in terms of known complexity classes. Finally, we summarize state-of-the-art classical and quantum algorithms for approximating the free energy and show how to improve their runtime and memory footprint.
We revisit the notion of root polynomials, thoroughly studied in [F. Dopico and V. Noferini, Root polynomials and their role in the theory of matrix polynomials, Linear Algebra Appl. 584:37--78, 2020] for general polynomial matrices, and show how they can be computed efficiently in the case of matrix pencils. The staircase algorithm implicitly computes so-called zero directions, as defined in [P. Van Dooren, Computation of zero directions of transfer functions, Proceedings IEEE 32nd CDC, 3132--3137, 1993]. However, zero directions generally do not provide the correct information on partial multiplicities and minimal indices. This information is instead provided by two special cases of zero directions, namely, root polynomials and the vectors of a minimal basis of the pencil. We show how to extract, starting from the block triangular pencil computed by the staircase algorithm, both a minimal basis and a maximal set of root polynomials in an efficient manner. Moreover, we argue that the accuracy of the computed root polynomials can be improved by making use of iterative refinement.
Let a polytope $\mathcal{P}$ be defined in one of the following ways: (i) $\mathcal{P} = \{x \in \mathbb{R}^n \colon A x \leq b\}$, where $A \in \mathbb{Z}^{(n+m) \times n}$, $b \in \mathbb{Z}^{(n+m)}$, and $\operatorname{rank}(A) = n$, or (ii) $\mathcal{P} = \{x \in \mathbb{R}_+^n \colon A x = b\}$, where $A \in \mathbb{Z}^{m \times n}$, $b \in \mathbb{Z}^{m}$, and $\operatorname{rank}(A) = m$; and let all minors of $A$ of order $\operatorname{rank}(A)$ be bounded by $\Delta$ in absolute value. We show that $|\mathcal{P} \cap \mathbb{Z}^n|$ can be computed by an algorithm with arithmetic complexity $$ O\bigl( \nu(d,m,\Delta) \cdot d^3 \cdot \Delta^4 \cdot \log(\Delta) \bigr), $$ where $d = \dim(\mathcal{P})$ and $\nu(d,m,\Delta)$ is the maximal possible number of vertices of a $d$-dimensional polytope $\mathcal{P}$ defined by one of the systems above. Using this result, we obtain the following arithmetic complexity bounds for computing $|\mathcal{P} \cap \mathbb{Z}^n|$: 1) the bound $O\bigl(\frac{d}{m}+1\bigr)^m \cdot d^3 \cdot \Delta^4 \cdot \log(\Delta)$, which is polynomial in $d$ and $\Delta$ for any fixed $m$; 2) the bound $O\bigl(\frac{m}{d}+1\bigr)^{\frac{d}{2}} \cdot d^4 \cdot \Delta^4 \cdot \log(\Delta)$, which is polynomial in $m$ and $\Delta$ for any fixed $d$; 3) the bound $O(d)^{4 + \frac{d}{2}} \cdot \Delta^{4+d} \cdot \log(\Delta)$, which is polynomial in $\Delta$ for any fixed $d$. These bounds can be used to obtain faster algorithms for the ILP feasibility problem and for counting integer points in a simplex or in an unbounded Subset-Sum polytope. Unbounded and parametric versions of the above problem are also considered.
We consider robust variants of standard optimal transport, named robust optimal transport, in which the marginal constraints are relaxed via the Kullback-Leibler divergence. We show that Sinkhorn-based algorithms can approximate the optimal cost of robust optimal transport in $\widetilde{\mathcal{O}}(\frac{n^2}{\varepsilon})$ time, where $n$ is the number of supports of the probability distributions and $\varepsilon$ is the desired error. Furthermore, we investigate a fixed-support robust barycenter problem between $m$ discrete probability distributions with at most $n$ supports each and develop an approximation algorithm based on iterative Bregman projections (IBP). For the specific case $m = 2$, we show that this algorithm can approximate the optimal barycenter value in $\widetilde{\mathcal{O}}(\frac{mn^2}{\varepsilon})$ time, improving on the previous complexity $\widetilde{\mathcal{O}}(\frac{mn^2}{\varepsilon^2})$ of the IBP algorithm for approximating the Wasserstein barycenter.
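A generic sketch of the kind of Sinkhorn-style scaling iteration used for KL-relaxed (robust/unbalanced) marginals is shown below; the entropic regularization $\eta$, penalty weight $\tau$, and fixed iteration count are illustrative and do not reproduce the exact scheme or constants analyzed in the paper.

```python
import numpy as np

def robust_sinkhorn_cost(C, a, b, eta=0.05, tau=1.0, iters=500):
    """Scaling iterations for entropic OT with KL-relaxed marginals.
    eta: entropic regularization, tau: weight of the KL marginal penalties.
    As tau -> infinity the exponent rho -> 1 and classical Sinkhorn is recovered."""
    K = np.exp(-C / eta)
    u, v = np.ones_like(a), np.ones_like(b)
    rho = tau / (tau + eta)
    for _ in range(iters):
        u = (a / (K @ v)) ** rho
        v = (b / (K.T @ u)) ** rho
    P = u[:, None] * K * v[None, :]   # (approximately) optimal relaxed transport plan
    return (P * C).sum()
```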
Quantile (and, more generally, KL) regret bounds, such as those achieved by NormalHedge (Chaudhuri, Freund, and Hsu 2009) and its variants, relax the goal of competing against the best individual expert to only competing against a majority of experts on adversarial data. More recently, the semi-adversarial paradigm (Bilodeau, Negrea, and Roy 2020) provides an alternative relaxation of adversarial online learning by considering data that may be neither fully adversarial nor stochastic (i.i.d.). We achieve the minimax optimal regret in both paradigms using FTRL with separate, novel, root-logarithmic regularizers, both of which can be interpreted as yielding variants of NormalHedge. We extend existing KL regret upper bounds, which hold uniformly over target distributions, to possibly uncountable expert classes with arbitrary priors; provide the first full-information lower bounds for quantile regret on finite expert classes (and these bounds are tight); and provide an adaptively minimax optimal algorithm for the semi-adversarial paradigm that adapts to the true, unknown constraint faster, leading to uniformly improved regret bounds over existing methods.
In this paper, we study optimal convergence rates for distributed convex optimization problems over networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and achieves the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.