91婷婷国产精选国产色_日韩一区国产二区不卡_日韩精品中文字幕一区二区中文_欧美久久精品午夜青青大伊人_久久久久久久精品伊人亚洲_少妇人妻系列无码专区按摩_波多野亚州一区二区三区四区

Persistence diagrams (PD)s play a central role in topological data analysis. This analysis requires computing distances among such diagrams such as the 1-Wasserstein distance. Accurate computation of these PD distances for large data sets that render large diagrams may not scale appropriately with the existing methods. The main source of difficulty ensues from the size of the bipartite graph on which a matching needs to be computed for determining these PD distances. We address this problem by making several algorithmic and computational observations in order to obtain an approximation. First, taking advantage of the proximity of PD points, we condense them thereby decreasing the number of nodes in the graph for computation. The increase in point multiplicities is addressed by reducing the matching problem to a min-cost flow problem on a transshipment network. Second, we use Well Separated Pair Decomposition to sparsify the graph to a size that is linear in the number of points. Both node and arc sparsifications contribute to the approximation factor where we leverage a lower bound given by the Relaxed Word Mover's distance. Third, we eliminate bottlenecks during the sparsification procedure by introducing parallelism. Fourth, we develop an open source software called PDoptFlow based on our algorithm, exploiting parallelism by GPU and multicore. We perform extensive experiments and show that the actual empirical error is very low. We also show that we can achieve high performance at low guaranteed relative errors, improving upon the state of the arts.

相關內容

稀疏化

關注 0

線性的 · 漢明距離 · 極小點 · 分解的 · 噪聲 ·

2021 年 12 月 29 日

Edge-Isoperimetric Inequalities and Ball-Noise Stability: Linear Programming and Probabilistic Approaches

Lei Yu

from arxiv, 31 pages, no figures

Let $Q_{n}^{r}$ be the graph with vertex set $\{-1,1\}^{n}$ in which two vertices are joined if their Hamming distance is at most $r$. The edge-isoperimetric problem for $Q_{n}^{r}$ is that: For every $(n,r,M)$ such that $1\le r\le n$ and $1\le M\le2^{n}$, determine the minimum edge-boundary size of a subset of vertices of $Q_{n}^{r}$ with a given size $M$. In this paper, we apply two different approaches to prove bounds for this problem. The first approach is a linear programming approach and the second is a probabilistic approach. Our bound derived by the first approach generalizes the tight bound for $M=2^{n-1}$ derived by Kahn, Kalai, and Linial in 1989. Moreover, our bound is also tight for $M=2^{n-2}$ and $r\le\frac{n}{2}-1$. Our bounds derived by the second approach are expressed in terms of the \emph{noise stability}, and they are shown to be asymptotically tight as $n\to\infty$ when $r=2\lfloor\frac{\beta n}{2}\rfloor+1$ and $M=\lfloor\alpha2^{n}\rfloor$ for fixed $\alpha,\beta\in(0,1)$, and is tight up to a factor $2$ when $r=2\lfloor\frac{\beta n}{2}\rfloor$ and $M=\lfloor\alpha2^{n}\rfloor$. In fact, the edge-isoperimetric problem is equivalent to a ball-noise stability problem which is a variant of the traditional (i.i.d.-) noise stability problem. Our results can be interpreted as bounds for the ball-noise stability problem.

Sphering · 相互獨立的 · 線性的 · 數值分析 ·

2021 年 12 月 28 日

The minimal spherical dispersion

Joscha Prochno,Daniel Rudolf

from arxiv, 12 pages

In this paper we prove upper and lower bounds on the minimal spherical dispersion. In particular, we see that the inverse $N(\varepsilon,d)$ of the minimal spherical dispersion is, for fixed $\varepsilon>0$, up to logarithmic terms linear in the dimension $d$. We also derive upper and lower bounds on the expected dispersion for points chosen independently and uniformly at random from the Euclidean unit sphere.

Fisher信息矩陣 · 精度矩陣 · 分解的 · INFORMS · 查準率/準確率 ·

2021 年 12 月 28 日

Explicit natural gradient updates for Cholesky factor in Gaussian variational approximation

Linda S. L. Tan

from arxiv, 19 pages

Stochastic gradient methods have enabled variational inference for high-dimensional models and large data. However, the steepest ascent direction in the parameter space of a statistical model is given not by the commonly used Euclidean gradient, but the natural gradient which premultiplies the Euclidean gradient by the inverted Fisher information matrix. Use of natural gradients can improve convergence significantly, but inverting the Fisher information matrix is daunting in high-dimensions. In Gaussian variational approximation, natural gradient updates of the natural parameters (expressed in terms of the mean and precision matrix) of the Gaussian distribution can be derived analytically, but do not ensure the precision matrix remains positive definite. To tackle this issue, we consider Cholesky decomposition of the covariance or precision matrix and derive explicit natural gradient updates of the Cholesky factor by finding the inverse of the Fisher information matrix analytically. Natural gradient updates of the Cholesky factor as compared to natural parameters, depend only on the first instead of the second derivative of the log posterior density and reduces computational cost. Sparsity constraints incorporating posterior independence structure can be imposed by fixing relevant entries in the Cholesky factor to zero.

核化 · 核矩陣 · 近似 · anchor · 數據集 ·

2021 年 12 月 28 日

Fast deterministic approximation of symmetric indefinite kernel matrices with high dimensional datasets

Difeng Cai,James Nagy,Yuanzhe Xi

from arxiv, 27 pages, 16 figures

Kernel methods are used frequently in various applications of machine learning. For large-scale high dimensional applications, the success of kernel methods hinges on the ability to operate certain large dense kernel matrix K. An enormous amount of literature has been devoted to the study of symmetric positive semi-definite (SPSD) kernels, where Nystrom methods compute a low-rank approximation to the kernel matrix via choosing landmark points. In this paper, we study the Nystrom method for approximating both symmetric indefinite kernel matrices as well SPSD ones. We first develop a theoretical framework for general symmetric kernel matrices, which provides a theoretical guidance for the selection of landmark points. We then leverage discrepancy theory to propose the anchor net method for computing accurate Nystrom approximations with optimal complexity. The anchor net method operates entirely on the dataset without requiring the access to $K$ or its matrix-vector product. Results on various types of kernels (both indefinite and SPSD ones) and machine learning datasets demonstrate that the new method achieves better accuracy and stability with lower computational cost compared to the state-of-the-art Nystrom methods.

奇異值分解 · 穩健性 · 圖片分類 · Integration · 模型評估 ·

2021 年 4 月 8 日

Robust Differentiable SVD

Wei Wang,Zheng Dang,Yinlin Hu,Pascal Fua,Mathieu Salzmann

from arxiv, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) PREPRINT 2021

Eigendecomposition of symmetric matrices is at the heart of many computer vision algorithms. However, the derivatives of the eigenvectors tend to be numerically unstable, whether using the SVD to compute them analytically or using the Power Iteration (PI) method to approximate them. This instability arises in the presence of eigenvalues that are close to each other. This makes integrating eigendecomposition into deep networks difficult and often results in poor convergence, particularly when dealing with large matrices. While this can be mitigated by partitioning the data into small arbitrary groups, doing so has no theoretical basis and makes it impossible to exploit the full power of eigendecomposition. In previous work, we mitigated this using SVD during the forward pass and PI to compute the gradients during the backward pass. However, the iterative deflation procedure required to compute multiple eigenvectors using PI tends to accumulate errors and yield inaccurate gradients. Here, we show that the Taylor expansion of the SVD gradient is theoretically equivalent to the gradient obtained using PI without relying in practice on an iterative process and thus yields more accurate gradients. We demonstrate the benefits of this increased accuracy for image classification and style transfer.

采樣法 · 方差 · 圖形處理器 · INFORMS · 泛化理論 ·

2020 年 6 月 24 日

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Weilin Cong,Rana Forsati,Mahmut Kandemir,Mehrdad Mahdavi

Sampling methods (e.g., node-wise, layer-wise, or subgraph) has become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage that necessities mitigating both types of variance to obtain faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization compared to the existing methods.

判別器 · GANs · 模式崩潰 · IPM · 多樣性 ·

2018 年 6 月 27 日

Approximability of Discriminators Implies Diversity in GANs

Yu Bai,Tengyu Ma,Andrej Risteski

While Generative Adversarial Networks (GANs) have empirically produced impressive results on learning complex real-world distributions, recent work has shown that they suffer from lack of diversity or mode collapse. The theoretical work of Arora et al.~\cite{AroraGeLiMaZh17} suggests a dilemma about GANs' statistical properties: powerful discriminators cause overfitting, whereas weak discriminators cannot detect mode collapse. In contrast, we show in this paper that GANs can in principle learn distributions in Wasserstein distance (or KL-divergence in many cases) with polynomial sample complexity, if the discriminator class has strong distinguishing power against the particular generator class (instead of against all possible generators). For various generator classes such as mixture of Gaussians, exponential families, and invertible neural networks generators, we design corresponding discriminators (which are often neural nets of specific architectures) such that the Integral Probability Metric (IPM) induced by the discriminators can provably approximate the Wasserstein distance and/or KL-divergence. This implies that if the training is successful, then the learned distribution is close to the true distribution in Wasserstein distance or KL divergence, and thus cannot drop modes. Our preliminary experiments show that on synthetic datasets the test IPM is well correlated with KL divergence, indicating that the lack of diversity may be caused by the sub-optimality in optimization instead of statistical inefficiency.

優化器 · Lipschitz連續 · 正則化項 · Continuity · Lipschitz ·

2018 年 6 月 1 日

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Kevin Scaman,Francis Bach,Sébastien Bubeck,Yin Tat Lee,Laurent Massoulié

from arxiv, 17 pages

In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.

最大平均偏差 · 優化器 · Performer · CASES · tuning ·

2018 年 1 月 30 日

Stable Distribution Alignment Using the Dual of the Adversarial Distance

Ben Usman,Kate Saenko,Brian Kulis

from arxiv, ICLR 2018 Conference Invite to Workshop

Methods that align distributions by minimizing an adversarial distance between them have recently achieved impressive results. However, these approaches are difficult to optimize with gradient descent and they often do not converge well without careful hyperparameter tuning and proper initialization. We investigate whether turning the adversarial min-max problem into an optimization problem by replacing the maximization part with its dual improves the quality of the resulting alignment and explore its connections to Maximum Mean Discrepancy. Our empirical results suggest that using the dual formulation for the restricted family of linear discriminators results in a more stable convergence to a desirable solution when compared with the performance of a primal min-max GAN-like objective and an MMD objective under the same restrictions. We test our hypothesis on the problem of aligning two synthetic point clouds on a plane and on a real-image domain adaptation problem on digits. In both cases, the dual formulation yields an iterative procedure that gives more stable and monotonic improvement over time.

優化器 · Extensibility · 對偶問題 · 平滑 · INTERACT ·

2017 年 12 月 1 日

Optimal Algorithms for Distributed Optimization

César A. Uribe,Soomin Lee,Alexander Gasnikov,Angelia Nedi?

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, improvement of the condition numbers.