
In this paper, we focus on the analysis of the regularized Wasserstein barycenter problem. We provide uniqueness and a characterization of the barycenter for two important classes of probability measures: (i) Gaussian distributions and (ii) $q$-Gaussian distributions; each regularized by a particular entropy functional. We propose an algorithm based on the gradient projection method in the space of matrices in order to compute these regularized barycenters. We also consider a general class of $\varphi$-exponential measures, for which only the non-regularized barycenter is studied. Finally, we numerically show the influence of the parameters and the stability of the algorithm under small perturbations of the data.
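As a small illustration of the Gaussian case, the sketch below computes the non-regularized Wasserstein-2 barycenter of one-dimensional Gaussians, where a closed form is available: the barycenter is Gaussian, with the weighted average of the means and the weighted average of the standard deviations. This is not the regularized gradient projection algorithm of the abstract; the function names `gaussian_barycenter_1d` and `w2_squared_1d` are hypothetical and chosen only for this example.

```python
def gaussian_barycenter_1d(means, stds, weights):
    """Non-regularized W2 barycenter of 1-D Gaussians N(m_i, s_i^2).

    Assumption for this sketch: one dimension only, where the barycenter
    is Gaussian with weighted-average mean and weighted-average std.
    """
    total = sum(weights)
    lam = [w / total for w in weights]  # normalize weights to sum to 1
    mean = sum(l * m for l, m in zip(lam, means))
    std = sum(l * s for l, s in zip(lam, stds))
    return mean, std


def w2_squared_1d(m1, s1, m2, s2):
    """Squared Wasserstein-2 distance between N(m1, s1^2) and N(m2, s2^2)."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2


# Equal-weight barycenter of N(0, 1) and N(4, 9) (standard deviations 1 and 3).
bary_mean, bary_std = gaussian_barycenter_1d([0.0, 4.0], [1.0, 3.0], [0.5, 0.5])
```

With equal weights the barycenter sits midway between the two inputs in both mean and standard deviation, so it is equidistant from them in the Wasserstein-2 sense.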

Related content

Regularization term


Hidden variable · Estimation/estimator · Identifiable · MoDELS · GM

November 29, 2021

Semiparametric Inference For Causal Effects In Graphical Models With Hidden Variables

Rohit Bhattacharya,Razieh Nabi,Ilya Shpitser

from arxiv, 75 pages

Identification theory for causal effects in causal models associated with hidden variable directed acyclic graphs (DAGs) is well studied. However, the corresponding algorithms are underused due to the complexity of estimating the identifying functionals they output. In this work, we bridge the gap between identification and estimation of population-level causal effects involving a single treatment and a single outcome. We derive influence function based estimators that exhibit double robustness for the identified effects in a large class of hidden variable DAGs where the treatment satisfies a simple graphical criterion; this class includes models yielding the adjustment and front-door functionals as special cases. We also provide necessary and sufficient conditions under which the statistical model of a hidden variable DAG is nonparametrically saturated and implies no equality constraints on the observed data distribution. Further, we derive an important class of hidden variable DAGs that imply observed data distributions observationally equivalent (up to equality constraints) to fully observed DAGs. In these classes of DAGs, we derive estimators that achieve the semiparametric efficiency bounds for the target of interest where the treatment satisfies our graphical criterion. Finally, we provide a sound and complete identification algorithm that directly yields a weight based estimation strategy for any identifiable effect in hidden variable causal models.

Example · Weight · Mutually independent · Graph · Optimizer

November 29, 2021

Bilu-Linial stability, certified algorithms and the Independent Set problem

Haris Angelidakis,Pranjal Awasthi,Avrim Blum,Vaggos Chatziafratis,Chen Dan

from arxiv, Funding and affiliation corrections. Full version of work that appeared in ESA 2019

We study the Maximum Independent Set (MIS) problem under the notion of stability introduced by Bilu and Linial (2010): a weighted instance of MIS is $\gamma$-stable if it has a unique optimal solution that remains the unique optimum under multiplicative perturbations of the weights by a factor of at most $\gamma\geq 1$. The goal then is to efficiently recover the unique optimal solution. In this work, we solve stable instances of MIS on several graph classes: we solve $\widetilde{O}(\Delta/\sqrt{\log \Delta})$-stable instances on graphs of maximum degree $\Delta$, $(k - 1)$-stable instances on $k$-colorable graphs and $(1 + \varepsilon)$-stable instances on planar graphs. For general graphs, we present a strong lower bound showing that there are no efficient algorithms for $O(n^{\frac{1}{2} - \varepsilon})$-stable instances of MIS, assuming the planted clique conjecture. We also give an algorithm for $(\varepsilon n)$-stable instances. As a by-product of our techniques, we give algorithms and lower bounds for stable instances of Node Multiway Cut. Furthermore, we prove a general result showing that the integrality gap of convex relaxations of several maximization problems reduces dramatically on stable instances. Moreover, we initiate the study of certified algorithms, a notion recently introduced by Makarychev and Makarychev (2018), which is a class of $\gamma$-approximation algorithms that satisfy one crucial property: the solution returned is optimal for a perturbation of the original instance. We obtain $\Delta$-certified algorithms for MIS on graphs of maximum degree $\Delta$, and $(1+\varepsilon)$-certified algorithms on planar graphs. Finally, we analyze the algorithm of Berman and Furer (1994) and prove that it is a $\left(\frac{\Delta + 1}{3} + \varepsilon\right)$-certified algorithm for MIS on graphs of maximum degree $\Delta$ where all weights are equal to 1.

Mixture distribution · binary · SimPLe · Stationary · Optimizer

November 29, 2021

Hypothesis Testing of Mixture Distributions using Compressed Data

Minh Thanh Vu

from arxiv, 33 pages

In this paper we revisit the binary hypothesis testing problem with one-sided compression. Specifically, we assume that the distribution in the null hypothesis is a mixture distribution of iid components. The distribution under the alternative hypothesis is a mixture of products of either iid distributions or finite order Markov distributions with stationary transition kernels. The problem is studied under the Neyman-Pearson framework in which our main interest is the maximum error exponent of the second type of error. We derive the optimal achievable error exponent and under a further sufficient condition establish the maximum $\epsilon$-achievable error exponent. It is shown that to obtain the latter, the study of the exponentially strong converse is needed. Using a simple code transfer argument we also establish new results for the Wyner-Ahlswede-Körner problem in which the source distribution is a mixture of iid components.

Estimation/estimator · Empirical entropy · Exchangeable · Subsampling · GROUP

November 28, 2021

Limit theorems for invariant distributions

Morgane Austern,Peter Orbanz

A distributional symmetry is invariance of a distribution under a group of transformations. Exchangeability and stationarity are examples. We explain that a result of ergodic theory provides a law of large numbers: If the group satisfies suitable conditions, expectations can be estimated by averaging over subsets of transformations, and these estimators are strongly consistent. We show that, if a mixing condition holds, the averages also satisfy a central limit theorem, a Berry-Esseen bound, and concentration. These are extended further to apply to triangular arrays, to randomly subsampled averages, and to a generalization of U-statistics. As applications, we obtain new results on exchangeability, random fields, network models, and a class of marked point processes. We also establish asymptotic normality of the empirical entropy for a large class of processes. Some known results are recovered as special cases, and can hence be interpreted as an outcome of symmetry. The proofs adapt Stein's method.

Cluster · Optimizer · Feature transform · Loss function (machine learning) · Performer

November 27, 2021

Efficient Clustering for Stretched Mixtures: Landscape and Optimality

Kaizheng Wang,Yuling Yan,Mateo Díaz

from arxiv, 36 pages

This paper considers a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions and aims for a classifier to estimate the labels. Many popular methods including PCA and k-means require individual components of the mixture to be somewhat spherical, and perform poorly when they are stretched. To overcome this issue, we propose a non-convex program that seeks an affine transform to turn the data into a one-dimensional point cloud concentrating around $-1$ and $1$, after which clustering becomes easy. Our theoretical contributions are two-fold: (1) we show that the non-convex loss function exhibits desirable geometric properties when the sample size exceeds some constant multiple of the dimension, and (2) we leverage this to prove that an efficient first-order algorithm achieves near-optimal statistical precision without good initialization. We also propose a general methodology for clustering with flexible choices of feature transforms and loss objectives.

Neural Networks · Networking · Marginalization · AIM · Weight

November 27, 2021

Scalable Computations of Wasserstein Barycenter via Input Convex Neural Networks

Jiaojiao Fan,Amirhossein Taghvaei,Yongxin Chen

from arxiv, 21 pages

Wasserstein Barycenter is a principled approach to represent the weighted mean of a given set of probability distributions, utilizing the geometry induced by optimal transport. In this work, we present a novel scalable algorithm to approximate the Wasserstein Barycenters aiming at high-dimensional applications in machine learning. Our proposed algorithm is based on the Kantorovich dual formulation of the Wasserstein-2 distance as well as a recent neural network architecture, input convex neural network, that is known to parametrize convex functions. The distinguishing features of our method are: i) it only requires samples from the marginal distributions; ii) unlike the existing approaches, it represents the Barycenter with a generative model and can thus generate infinite samples from the barycenter without querying the marginal distributions; iii) it works similarly to a Generative Adversarial Model in the one-marginal case. We demonstrate the efficacy of our algorithm by comparing it with the state-of-the-art methods in multiple experiments.

MoDELS · Extensibility · Tractable · INFORMS · Multimodal

November 26, 2021

Phase-type distributions for claim severity regression modeling

Martin Bladt

This paper addresses the task of modeling severity losses using segmentation when the data distribution does not fall into the usual regression frameworks. This situation is not uncommon in lines of business such as third-party liability insurance, where heavy-tails and multimodality often hamper a direct statistical analysis. We propose to use regression models based on phase-type distributions, regressing on their underlying inhomogeneous Markov intensity and using an extension of the EM algorithm. These models are interpretable and tractable in terms of multi-state processes and generalize the proportional hazards specification when the dimension of the state space is larger than one. We show that the combination of matrix parameters, inhomogeneity transforms, and covariate information provides flexible regression models that effectively capture the entire distribution of loss severities.

Differential entropy · Estimation/estimator · Continuity · INFORMS · Probability density function

November 24, 2021

On the Estimation of Information Measures of Continuous Distributions

Georg Pichler,Pablo Piantanida,Günther Koliander

from arxiv, 20 pages

The estimation of information measures of continuous distributions based on samples is a fundamental problem in statistics and machine learning. In this paper, we analyze estimates of differential entropy in $K$-dimensional Euclidean space, computed from a finite number of samples, when the probability density function belongs to a predetermined convex family $\mathcal{P}$. First, estimating differential entropy to any accuracy is shown to be infeasible if the differential entropy of densities in $\mathcal{P}$ is unbounded, clearly showing the necessity of additional assumptions. Subsequently, we investigate sufficient conditions that enable confidence bounds for the estimation of differential entropy. In particular, we provide confidence bounds for simple histogram based estimation of differential entropy from a fixed number of samples, assuming that the probability density function is Lipschitz continuous with known Lipschitz constant and known, bounded support. Our focus is on differential entropy, but we provide examples that show that similar results hold for mutual information and relative entropy as well.
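To make the histogram-based setting concrete, here is a minimal sketch of a plug-in differential entropy estimate for one-dimensional samples with known, bounded support. It illustrates only the basic estimator, not the paper's confidence bounds; the function name `histogram_entropy`, the bin count, and the uniform test distribution are assumptions made for this example.

```python
import math
import random


def histogram_entropy(samples, lo, hi, bins):
    """Plug-in differential entropy estimate (in nats) from a 1-D histogram.

    Assumes every sample lies in the known support [lo, hi].
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        # Clamp so that x == hi falls into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(samples)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / n                     # estimated bin probability
            h -= p * math.log(p / width)  # piecewise-constant density is p / width
    return h


random.seed(0)
uniform_samples = [random.random() * 2.0 for _ in range(20000)]
estimate = histogram_entropy(uniform_samples, 0.0, 2.0, bins=20)
# The true differential entropy of Uniform(0, 2) is log(2) nats.
```

For a uniform density the histogram model is exact, so the estimate should land close to $\log 2$; for other Lipschitz densities the bin width trades discretization bias against sampling noise.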

Graph · Learned · Partition · Optimizer · state-of-the-art

October 9, 2019

Scalable Gromov-Wasserstein Learning for Graph Partitioning and Matching

Hongteng Xu,Dixin Luo,Lawrence Carin

from arxiv, 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

We propose a scalable Gromov-Wasserstein learning (S-GWL) method and establish a novel and theoretically-supported paradigm for large-scale graph analysis. The proposed method is based on the fact that Gromov-Wasserstein discrepancy is a pseudometric on graphs. Given two graphs, the optimal transport associated with their Gromov-Wasserstein discrepancy provides the correspondence between their nodes and achieves graph matching. When one of the graphs has isolated but self-connected nodes (i.e., a disconnected graph), the optimal transport indicates the clustering structure of the other graph and achieves graph partitioning. Using this concept, we extend our method to multi-graph partitioning and matching by learning a Gromov-Wasserstein barycenter graph for multiple observed graphs; the barycenter graph plays the role of the disconnected graph, and since it is learned, so is the clustering. Our method combines a recursive $K$-partition mechanism with a regularized proximal gradient algorithm, whose time complexity is $\mathcal{O}(K(E+V)\log_K V)$ for graphs with $V$ nodes and $E$ edges. To our knowledge, our method is the first attempt to make Gromov-Wasserstein discrepancy applicable to large-scale graph analysis and unify graph partitioning and matching into the same framework. It outperforms state-of-the-art graph partitioning and matching methods, achieving a trade-off between accuracy and efficiency.

Optimizer · Extensibility · Dual problem · Smooth · INTERACT

December 1, 2017

Optimal Algorithms for Distributed Optimization

César A. Uribe,Soomin Lee,Alexander Gasnikov,Angelia Nedić

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is strongly convex and smooth, either strongly convex or smooth, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, and improvement of the condition numbers.