We introduce weak barycenters of a family of probability distributions, based on the recently developed notion of optimal weak transport of mass by Gozlan et al. (2017) and Backhoff-Veraguas et al. (2020). We provide a theoretical analysis of this object and discuss its interpretation in the light of convex ordering between probability measures. In particular, we show that, rather than averaging the input distributions in a geometric way (as the Wasserstein barycenter based on classic optimal transport does), weak barycenters extract common geometric information shared by all the input distributions, encoded as a latent random variable that underlies all of them. We also provide an iterative algorithm to compute a weak barycenter for a finite family of input distributions, and a stochastic algorithm that computes them for arbitrary populations of laws. The latter approach is particularly well suited for the streaming setting, i.e., when distributions are observed sequentially. The notion of weak barycenter and our approaches to compute it are illustrated on synthetic examples, validated on 2D real-world data and compared to standard Wasserstein barycenters.
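To make the "geometric averaging" of the standard Wasserstein barycenter concrete, the following minimal sketch computes the 2-Wasserstein barycenter of one-dimensional empirical measures, where it reduces to averaging order statistics. It does not implement the weak barycenter algorithms of the abstract; the function name `wasserstein_barycenter_1d` and the equal-sample-size restriction are assumptions made for illustration.

```python
def wasserstein_barycenter_1d(samples, weights=None):
    """W2 barycenter of 1-D empirical measures that all have the same number of atoms.

    In one dimension the barycenter's quantile function is the weighted average
    of the input quantile functions, so it suffices to average sorted samples.
    """
    k = len(samples)
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all sample sets must have the same size")
    if weights is None:
        weights = [1.0 / k] * k  # uniform weights by default
    sorted_samples = [sorted(s) for s in samples]
    # The j-th barycenter atom is the weighted mean of the j-th smallest points.
    return [sum(w * s[j] for w, s in zip(weights, sorted_samples)) for j in range(n)]


mu = [0.0, 1.0, 2.0]
nu = [12.0, 10.0, 11.0]
print(wasserstein_barycenter_1d([mu, nu]))  # [5.0, 6.0, 7.0]
```

The output keeps the shape of the inputs and places it halfway between them, which is the displacement-style averaging that the abstract contrasts with weak barycenters.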
In this paper, we present a low-diameter decomposition algorithm in the LOCAL model of distributed computing that succeeds with probability $1 - 1/\mathrm{poly}(n)$. Specifically, we show how to compute an $\left(\epsilon, O\left(\frac{\log n}{\epsilon}\right)\right)$ low-diameter decomposition in $O\left(\frac{\log^3(1/\epsilon)\log n}{\epsilon}\right)$ rounds. Further developing our techniques, we show new distributed algorithms for approximating general packing and covering integer linear programs in the LOCAL model. For packing problems, our algorithm finds a $(1-\epsilon)$-approximate solution in $O\left(\frac{\log^3 (1/\epsilon) \log n}{\epsilon}\right)$ rounds with probability $1 - 1/\mathrm{poly}(n)$. For covering problems, our algorithm finds a $(1+\epsilon)$-approximate solution in $O\left(\frac{\left(\log \log n + \log (1/\epsilon)\right)^3 \log n}{\epsilon}\right)$ rounds with probability $1 - 1/\mathrm{poly}(n)$. These results improve upon the previous $O\left(\frac{\log^3 n}{\epsilon}\right)$-round algorithm by Ghaffari, Kuhn, and Maus [STOC 2017], which is based on network decompositions. Our algorithms are near-optimal for many fundamental combinatorial graph optimization problems in the LOCAL model, such as minimum vertex cover and minimum dominating set, as their $(1\pm \epsilon)$-approximate solutions require $\Omega\left(\frac{\log n}{\epsilon}\right)$ rounds to compute.
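As a point of reference for what a low-diameter decomposition produces, here is a minimal sketch of the classic sequential ball-carving procedure. It is not the distributed LOCAL algorithm of the abstract. It assumes the edge-cut convention (at most an $\epsilon$ fraction of edges run between clusters, for $\epsilon < 1$), and the names `low_diameter_decomposition` and `count_cut_edges` are hypothetical.

```python
def low_diameter_decomposition(adj, eps):
    """Sequential ball carving on an undirected graph.

    adj maps each vertex to the set of its neighbours. A ball is grown around
    a centre until the edges leaving it number at most eps times the edges
    inside it; the ball then becomes a cluster and is removed from the graph.
    """
    remaining = set(adj)
    clusters = []
    while remaining:
        center = min(remaining)  # any unclustered vertex would do
        ball = {center}
        while True:
            # Edges with both endpoints in the ball (each seen twice, hence // 2).
            inside = sum(1 for u in ball for w in adj[u] if w in ball) // 2
            # Edges from the ball to vertices that are still unclustered.
            boundary = [(u, w) for u in ball for w in adj[u]
                        if w in remaining and w not in ball]
            if len(boundary) <= eps * max(inside, 1):
                break
            ball |= {w for _, w in boundary}  # grow the radius by one hop
        clusters.append(ball)
        remaining -= ball
    return clusters


def count_cut_edges(adj, clusters):
    """Number of edges whose endpoints lie in different clusters."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    return sum(1 for u in adj for w in adj[u] if u < w and label[u] != label[w])


n = 40
cycle = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
parts = low_diameter_decomposition(cycle, 0.25)
print(len(parts), count_cut_edges(cycle, parts))
```

Each growth step multiplies the number of edges inside the ball by more than $1+\epsilon$, which is why the radius stays within $O(\log n / \epsilon)$ in this sequential setting.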
We consider the problem of transporting resources/mass between specified terminal distributions while abiding by constraints on the flow through constrictions along the path. Constrictions, conceptualized as toll stations at specified points, limit the flow rate across them. We quantify flow-rate constraints via a bound on a sought probability density of the times at which mass elements cross toll stations, and cast the transportation scheduling in a Kantorovich-type formalism. Recent work by our team focused on the existence of Monge maps for similarly constrained transport minimizing average kinetic energy. The present formulation, besides being substantially more general, is cast as a (generalized) multi-marginal transport problem, a problem that is of considerable interest in the modern machine learning literature and has motivated extensive computational analyses. An enabling feature of our formalism is the representation of an average quadratic cost on the speed of transport as a convex constraint that involves crossing times.
In this paper, we find a sample complexity bound for learning a simplex from noisy samples. Assume a dataset of size $n$ is given which includes i.i.d. samples drawn from a uniform distribution over an unknown simplex in $\mathbb{R}^K$, where samples are assumed to be corrupted by multivariate additive Gaussian noise of arbitrary magnitude. We prove the existence of an algorithm that with high probability outputs a simplex having an $\ell_2$ distance of at most $\varepsilon$ from the true simplex (for any $\varepsilon>0$). Also, we theoretically show that in order to achieve this bound, it is sufficient to have $n\ge\left(K^2/\varepsilon^2\right)e^{\Omega\left(K/\mathrm{SNR}^2\right)}$ samples, where $\mathrm{SNR}$ stands for the signal-to-noise ratio. This result solves an important open problem and shows that, as long as $\mathrm{SNR}\ge\Omega\left(K^{1/2}\right)$, the sample complexity of the noisy regime has the same order as that of the noiseless case. Our proofs are a combination of the so-called sample compression technique in \citep{ashtiani2018nearly}, mathematical tools from high-dimensional geometry, and Fourier analysis. In particular, we propose a general Fourier-based technique for recovery of a more general class of distribution families from additive Gaussian noise, which can be further used in a variety of other related problems.
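The data model described above can be simulated in a few lines. The sketch below draws points uniformly from a simplex given by its vertices and adds Gaussian noise. It assumes isotropic noise with standard deviation `sigma`, which is our simplification, and the function name `sample_noisy_simplex` is hypothetical. It only generates data and does not implement the learning algorithm whose existence the abstract proves.

```python
import random


def sample_noisy_simplex(vertices, n, sigma, seed=0):
    """Draw n points uniformly from the simplex spanned by `vertices`,
    then add isotropic Gaussian noise with standard deviation sigma."""
    rng = random.Random(seed)
    dim = len(vertices[0])
    points = []
    for _ in range(n):
        # Normalised i.i.d. exponentials follow Dirichlet(1, ..., 1), i.e. they
        # are uniform barycentric weights on the standard simplex.
        e = [rng.expovariate(1.0) for _ in vertices]
        total = sum(e)
        w = [x / total for x in e]
        # Affine image of the uniform weights: a uniform point of the simplex.
        clean = [sum(wi * v[d] for wi, v in zip(w, vertices)) for d in range(dim)]
        points.append([c + rng.gauss(0.0, sigma) for c in clean])
    return points


triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # a simplex in R^2
noisy = sample_noisy_simplex(triangle, n=5, sigma=0.1)
print(noisy)
```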
The Kolmogorov $N$-width describes the best possible error one can achieve by elements of an $N$-dimensional linear space. Its decay has been studied extensively in Approximation Theory and for the solution of Partial Differential Equations (PDEs). Particular interest has arisen within Model Order Reduction (MOR) of parameterized PDEs, e.g.\ by the Reduced Basis Method (RBM). While it is known that the $N$-width decays exponentially fast (and thus admits efficient MOR) for certain problems, there are examples of the linear transport and the wave equation where the decay rate deteriorates to $N^{-1/2}$. On the other hand, it is widely accepted that a smooth parameter dependence admits a fast decay of the $N$-width. However, a detailed analysis of the influence of properties of the data (such as regularity or slope) on the rate of the $N$-width seems to be lacking. In this paper, we use techniques from Fourier Analysis to derive exact representations of the $N$-width in terms of initial and boundary conditions of the linear transport equation modeled by some function $g$ for half-wave symmetric data. For arbitrary functions $g$, we derive bounds and prove that these bounds are sharp. In particular, we prove that the $N$-width decays as $c_r N^{-(r+1/2)}$ for functions in the Sobolev space, $g\in H^r$. Our theoretical investigations are complemented by numerical experiments which confirm the sharpness of our bounds and give additional quantitative insight.
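For reference, the standard worst-case definition of the Kolmogorov $N$-width of a set $\mathcal{M}$ in a normed space $X$ can be written as follows. The symbols $\mathcal{M}$, $X$, $V_N$ and $d_N$ are our notation, not taken from the abstract, and the paper may work with a variant (for instance an error averaged over the parameter).

```latex
\[
  d_N(\mathcal{M})
  \;:=\;
  \inf_{\substack{V_N \subset X \\ \dim V_N = N}}
  \;\sup_{u \in \mathcal{M}}
  \;\inf_{v \in V_N}
  \| u - v \|_X .
\]
```

Read from the inside out: the innermost infimum is the best approximation of a single element $u$ from the space $V_N$, the supremum takes the worst case over $\mathcal{M}$, and the outer infimum selects the best $N$-dimensional linear space.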
In this paper, we propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs). From the perspective of physical simulation, we redefine the problem of approximating the gradient flow using the optimal transport (i.e., Wasserstein) metric. In EBMs, the learning process of stepwise sampling and estimating the data distribution performs the functional gradient of minimizing the global relative entropy between the current and the target real distribution, which can be treated as dynamic particles moving from disorder to the target manifold. Previous learning schemes mainly minimize the entropy with respect to the consecutive-time KL divergence in each learning step. However, they are prone to getting stuck in the local KL divergence by projecting non-smooth information within a smooth manifold, which is against the optimal transport principle. To solve this problem, we derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation. Compared with existing schemes, the Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities. We also derive this near-proximal scheme and provide its numerical computation equations. Our extensive experiments demonstrate the practical superiority and potential of our proposed scheme on fitting complex distributions and generating high-quality, high-dimensional data with neural EBMs.
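The "dynamic particles" picture can be illustrated with unadjusted Langevin dynamics, the standard stepwise sampler for EBMs. The law of its continuous-time limit obeys a Fokker-Planck equation, which is the Wasserstein gradient flow of the relative entropy to the target. The sketch below is not the scheme proposed in the abstract; the quadratic energy, step size and function name `langevin_sample` are assumptions chosen so that the target is a standard normal.

```python
import math
import random


def langevin_sample(grad_energy, x0, step, n_steps, rng):
    """Run unadjusted Langevin dynamics for one particle in 1-D:
    x <- x - step * E'(x) + sqrt(2 * step) * standard normal noise."""
    x = x0
    for _ in range(n_steps):
        x = x - step * grad_energy(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    return x


def grad_quadratic(x):
    """Gradient of the assumed energy E(x) = x^2 / 2 (target: standard normal)."""
    return x


rng = random.Random(0)
# Start every particle far from the target, at x = 3, and let it relax.
particles = [langevin_sample(grad_quadratic, 3.0, 0.05, 200, rng) for _ in range(2000)]
mean = sum(particles) / len(particles)
var = sum((p - mean) ** 2 for p in particles) / len(particles)
print(round(mean, 2), round(var, 2))
```

With this energy the empirical mean should end up near 0 and the variance near 1, up to a small bias caused by the finite step size.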
We present a registration method for model reduction of parametric partial differential equations with dominating advection effects and moving features. Registration refers to the use of a parameter-dependent mapping to make the set of solutions to these equations more amenable to approximation using classical reduced basis methods. The proposed approach draws on concepts from optimal transport theory: we use Monge embeddings to construct these mappings in a purely data-driven way. The method relies on one interpretable hyper-parameter. We discuss how our approach relates to existing works that combine model order reduction and optimal transport theory. Numerical results are provided to demonstrate the effect of the registration. These include a model problem where the solution is itself a probability density and one where it is not.
In optimal covariance cleaning theory, minimizing the Frobenius norm between the true population covariance matrix and a rotationally invariant estimator is a key step. This estimator can be obtained asymptotically for large covariance matrices, without knowledge of the true covariance matrix. In this study, we demonstrate that this minimization problem is equivalent to minimizing the loss of information between the true population covariance and the rotationally invariant estimator for multivariate normal variables. However, for Student's t distributions, the minimal Frobenius norm does not necessarily minimize the information loss in finite-sized matrices. Nevertheless, such deviations vanish in the asymptotic regime of large matrices, which might extend the applicability of random matrix theory results to Student's t distributions. These distributions are characterized by heavy tails and are frequently encountered in real-world applications such as finance, turbulence, or nuclear physics. Therefore, our work establishes a connection between statistical random matrix theory and estimation theory in physics, which is predominantly based on information theory.
Data valuation is a powerful framework for providing statistical insights into which data are beneficial or detrimental to model training. Many Shapley-based data valuation methods have shown promising results in various downstream tasks; however, they are well known to be computationally challenging, as they require training a large number of models. As a result, they have been recognized as infeasible to apply to large datasets. To address this issue, we propose Data-OOB, a new data valuation method for a bagging model that utilizes the out-of-bag estimate. The proposed method is computationally efficient and can scale to millions of data points by reusing trained weak learners. Specifically, Data-OOB takes less than 2.25 hours on a single CPU processor when there are $10^6$ samples to evaluate and the input dimension is 100. Furthermore, Data-OOB has solid theoretical interpretations in that it identifies the same important data point as the infinitesimal jackknife influence function when two different points are compared. We conduct comprehensive experiments using 12 classification datasets, each with thousands of samples. We demonstrate that the proposed method significantly outperforms existing state-of-the-art data valuation methods in identifying mislabeled data and finding a set of helpful (or harmful) data points, highlighting the potential for applying data values in real-world applications.
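The out-of-bag idea behind this kind of valuation can be sketched as follows: train weak learners on bootstrap samples, and score each data point only with the learners that did not see it. The sketch assumes one-dimensional decision stumps as weak learners and 0/1 correctness as the score. Both choices, and the names `fit_stump` and `data_oob`, are ours for illustration and are not taken from the paper's implementation.

```python
import random


def fit_stump(xs, ys):
    """Fit a 1-D decision stump (threshold classifier) by exhaustive search."""
    values = sorted(set(xs))
    # Candidates: one threshold below all points (constant prediction) plus midpoints.
    thresholds = [values[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    best = None
    for t in thresholds:
        for left in (0, 1):  # label predicted for x <= t
            errors = sum((left if x <= t else 1 - left) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left)
    _, t, left = best
    return lambda x: left if x <= t else 1 - left


def data_oob(xs, ys, n_learners=200, seed=0):
    """Average out-of-bag correctness of each point over a bagged ensemble."""
    rng = random.Random(seed)
    n = len(xs)
    score_sum = [0.0] * n
    oob_count = [0] * n
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        for i in range(n):
            if i not in in_bag:  # only learners that never saw point i vote on it
                score_sum[i] += float(stump(xs[i]) == ys[i])
                oob_count[i] += 1
    return [s / c if c else float("nan") for s, c in zip(score_sum, oob_count)]


xs = [float(i) for i in range(10)] + [float(i) for i in range(20, 30)]
ys = [0] * 10 + [1] * 10
ys[4] = 1  # deliberately mislabeled point
values = data_oob(xs, ys)
print(sorted(range(len(xs)), key=lambda i: values[i])[:3])  # lowest-valued indices
```

Because the ensemble is trained once and every learner is reused for all points it left out, no model has to be retrained per data point. The mislabeled point should receive a value close to zero.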
In 2022, over half of the web traffic was accessed through mobile devices. By reducing the energy consumption of mobile web apps, we can not only extend the battery life of our devices, but also make a significant contribution to energy conservation efforts. For example, if we could save only 5% of the energy used by web apps, we estimate that it would be enough to shut down one of the nuclear reactors in Fukushima. This paper presents a comprehensive overview of energy-saving experiments and related approaches for mobile web apps, relevant for researchers and practitioners. To achieve this objective, we conducted a systematic literature review and identified 44 primary studies for inclusion. Through the mapping and analysis of scientific papers, this work contributes: (1) an overview of the energy-draining aspects of mobile web apps, (2) a comprehensive description of the methodology used for the energy-saving experiments, and (3) a categorization and synthesis of various energy-saving approaches.
Graph convolution networks (GCNs) are increasingly popular in many applications, yet remain notoriously hard to train over large graph datasets. They need to compute node representations recursively from their neighbors. Current GCN training algorithms suffer from either high computational costs that grow exponentially with the number of layers, or high memory usage for loading the entire graph and node embeddings. In this paper, we propose a novel efficient layer-wise training framework for GCN (L-GCN) that disentangles feature aggregation and feature transformation during training, hence greatly reducing time and memory complexities. We present a theoretical analysis of L-GCN under the graph isomorphism framework, showing that, under mild conditions, L-GCN leads to GCNs as powerful as those produced by the more costly conventional training algorithm. We further propose L^2-GCN, which learns a controller for each layer that can automatically adjust the training epochs per layer in L-GCN. Experiments show that L-GCN is faster than state-of-the-art methods by at least an order of magnitude, with consistent memory usage that does not depend on dataset size, while maintaining comparable prediction performance. With the learned controller, L^2-GCN can further cut the training time in half. Our code is available at https://github.com/Shen-Lab/L2-GCN.