The Gromov--Hausdorff distance measures the difference in shape between compact metric spaces. While even approximating the distance up to any practical factor poses an NP-hard problem, its relaxations have proven useful for the problems in geometric data analysis, including on point clouds, manifolds, and graphs. We investigate the modified Gromov--Hausdorff distance, a relaxation of the standard distance that retains many of its theoretical properties, which includes their topological equivalence on a rich set of families of metric spaces. We show that the two distances are Lipschitz-equivalent on any family of metric spaces of uniformly bounded size, but that the equivalence does not hold in general, not even when the distances are restricted to ultrametric spaces. We additionally prove that the standard and the modified Gromov--Hausdorff distances are either equal or within a factor of 2 from each other when taken to a regular simplex, which connects the relaxation to some well-known problems in discrete geometry.
Wasserstein dictionary learning is an unsupervised approach to learning a collection of probability distributions that generate observed distributions as Wasserstein barycentric combinations. Existing methods for Wasserstein dictionary learning optimize an objective that seeks a dictionary with sufficient representation capacity via barycentric interpolation to approximate the observed training data, but without imposing additional structural properties on the coefficients associated to the dictionary. This leads to dictionaries that densely represent the observed data, which makes interpretation of the coefficients challenging and may also lead to poor empirical performance when using the learned coefficients in downstream tasks. In contrast and motivated by sparse dictionary learning in Euclidean spaces, we propose a geometrically sparse regularizer for Wasserstein space that promotes representations of a data point using only nearby dictionary elements. We show this approach leads to sparse representations in Wasserstein space and addresses the problem of non-uniqueness of barycentric representation. Moreover, when data is generated as Wasserstein barycenters of fixed distributions, this regularizer facilitates the recovery of the generating distributions in cases that are ill-posed for unregularized Wasserstein dictionary learning. Through experimentation on synthetic and real data, we show that our geometrically regularized approach yields sparser and more interpretable dictionaries in Wasserstein space, which perform better in downstream applications.
In this paper, we investigate the Gaussian graphical model inference problem in a novel setting that we call erose measurements, referring to irregularly measured or observed data. For graphs, this results in different node pairs having vastly different sample sizes which frequently arises in data integration, genomics, neuroscience, and sensor networks. Existing works characterize the graph selection performance using the minimum pairwise sample size, which provides little insights for erosely measured data, and no existing inference method is applicable. We aim to fill in this gap by proposing the first inference method that characterizes the different uncertainty levels over the graph caused by the erose measurements, named GI-JOE (Graph Inference when Joint Observations are Erose). Specifically, we develop an edge-wise inference method and an affiliated FDR control procedure, where the variance of each edge depends on the sample sizes associated with corresponding neighbors. We prove statistical validity under erose measurements, thanks to careful localized edge-wise analysis and disentangling the dependencies across the graph. Finally, through simulation studies and a real neuroscience data example, we demonstrate the advantages of our inference methods for graph selection from erosely measured data.
To investigate the structure of individual differences in performance on behavioral tasks, Haaf and Rouder (2017) developed a class of hierarchical Bayesian mixed models with varying levels of constraint on the individual effects. The models are then compared via Bayes factors, telling us which model best predicts the observed data. One common criticism of their method is that the observed data are assumed to be drawn from a normal distribution. However, for most cognitive tasks, the primary measure of performance is a response time, the distribution of which is well known to not be normal. In this paper, I investigate the assumption of normality for two datasets in numerical cognition. Specifically, I show that using a shifted lognormal model for the response times does not change the overall pattern of inference. Further, since the model-estimated effects are now on a logarithmic scale, the interpretation of the modeling becomes more difficult, particularly because the estimated effect is now multiplicative rather than additive. As a result, I recommend that even though response times are not normally distributed in general, the simplification afforded by the Haaf and Rouder (2017) approach provides a pragmatic approach to modeling individual differences in behavioral tasks.
Recently, privacy issues in web services that rely on users' personal data have raised great attention. Unlike existing privacy-preserving technologies such as federated learning and differential privacy, we explore another way to mitigate users' privacy concerns, giving them control over their own data. For this goal, we propose a privacy aware recommendation framework that gives users delicate control over their personal data, including implicit behaviors, e.g., clicks and watches. In this new framework, users can proactively control which data to disclose based on the trade-off between anticipated privacy risks and potential utilities. Then we study users' privacy decision making under different data disclosure mechanisms and recommendation models, and how their data disclosure decisions affect the recommender system's performance. To avoid the high cost of real-world experiments, we apply simulations to study the effects of our proposed framework. Specifically, we propose a reinforcement learning algorithm to simulate users' decisions (with various sensitivities) under three proposed platform mechanisms on two datasets with three representative recommendation models. The simulation results show that the platform mechanisms with finer split granularity and more unrestrained disclosure strategy can bring better results for both end users and platforms than the "all or nothing" binary mechanism adopted by most real-world applications. It also shows that our proposed framework can effectively protect users' privacy since they can obtain comparable or even better results with much less disclosed data.
We develop an algorithmic framework for solving convex optimization problems using no-regret game dynamics. By converting the problem of minimizing a convex function into an auxiliary problem of solving a min-max game in a sequential fashion, we can consider a range of strategies for each of the two-players who must select their actions one after the other. A common choice for these strategies are so-called no-regret learning algorithms, and we describe a number of such and prove bounds on their regret. We then show that many classical first-order methods for convex optimization -- including average-iterate gradient descent, the Frank-Wolfe algorithm, the Heavy Ball algorithm, and Nesterov's acceleration methods -- can be interpreted as special cases of our framework as long as each player makes the correct choice of no-regret strategy. Proving convergence rates in this framework becomes very straightforward, as they follow from plugging in the appropriate known regret bounds. Our framework also gives rise to a number of new first-order methods for special cases of convex optimization that were not previously known.
Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization to create parametric distribution shifts. Despite its prevalence in related fields, tilting has not seen widespread use in machine learning. In this work, we aim to bridge this gap by exploring the use of tilting in risk minimization. We study a simple extension to ERM -- tilted empirical risk minimization (TERM) -- which uses exponential tilting to flexibly tune the impact of individual losses. The resulting framework has several useful properties: We show that TERM can increase or decrease the influence of outliers, respectively, to enable fairness or robustness; has variance-reduction properties that can benefit generalization; and can be viewed as a smooth approximation to the tail probability of losses. Our work makes rigorous connections between TERM and related objectives, such as Value-at-Risk, Conditional Value-at-Risk, and distributionally robust optimization (DRO). We develop batch and stochastic first-order optimization methods for solving TERM, provide convergence guarantees for the solvers, and show that the framework can be efficiently solved relative to common alternatives. Finally, we demonstrate that TERM can be used for a multitude of applications in machine learning, such as enforcing fairness between subgroups, mitigating the effect of outliers, and handling class imbalance. Despite the straightforward modification TERM makes to traditional ERM objectives, we find that the framework can consistently outperform ERM and deliver competitive performance with state-of-the-art, problem-specific approaches.
Face reconstruction and tracking is a building block of numerous applications in AR/VR, human-machine interaction, as well as medical applications. Most of these applications rely on a metrically correct prediction of the shape, especially, when the reconstructed subject is put into a metrical context (i.e., when there is a reference object of known size). A metrical reconstruction is also needed for any application that measures distances and dimensions of the subject (e.g., to virtually fit a glasses frame). State-of-the-art methods for face reconstruction from a single image are trained on large 2D image datasets in a self-supervised fashion. However, due to the nature of a perspective projection they are not able to reconstruct the actual face dimensions, and even predicting the average human face outperforms some of these methods in a metrical sense. To learn the actual shape of a face, we argue for a supervised training scheme. Since there exists no large-scale 3D dataset for this task, we annotated and unified small- and medium-scale databases. The resulting unified dataset is still a medium-scale dataset with more than 2k identities and training purely on it would lead to overfitting. To this end, we take advantage of a face recognition network pretrained on a large-scale 2D image dataset, which provides distinct features for different faces and is robust to expression, illumination, and camera changes. Using these features, we train our face shape estimator in a supervised fashion, inheriting the robustness and generalization of the face recognition network. Our method, which we call MICA (MetrIC fAce), outperforms the state-of-the-art reconstruction methods by a large margin, both on current non-metric benchmarks as well as on our metric benchmarks (15% and 24% lower average error on NoW, respectively).
As interest in graph data has grown in recent years, the computation of various geometric tools has become essential. In some area such as mesh processing, they often rely on the computation of geodesics and shortest paths in discretized manifolds. A recent example of such a tool is the computation of Wasserstein barycenters (WB), a very general notion of barycenters derived from the theory of Optimal Transport, and their entropic-regularized variant. In this paper, we examine how WBs on discretized meshes relate to the geometry of the underlying manifold. We first provide a generic stability result with respect to the input cost matrices. We then apply this result to random geometric graphs on manifolds, whose shortest paths converge to geodesics, hence proving the consistency of WBs computed on discretized shapes.
We consider the following problem: for a given graph $G$ and two integers $k$ and $d$, can we apply a fixed graph operation at most $k$ times in order to reduce a given graph parameter $\pi$ by at least $d$? We show that this problem is NP-hard when the parameter is the independence number and the graph operation is vertex deletion or edge contraction, even for fixed $d=1$ and when restricted to chordal graphs. We give a polynomial time algorithm for bipartite graphs when the operation is edge contraction, the parameter is the independence number and $d$ is fixed. Further, we complete the complexity dichotomy on $H$-free graphs when the parameter is the clique number and the operation is edge contraction by showing that this problem is NP-hard in $(C_3+P_1)$-free graphs even for fixed $d=1$. When the operation is edge deletion and the parameter is the chromatic number, we determine the computational complexity of the associated problem on cographs and complete multipartite graphs. Our results answer several open questions stated in [Diner et al., Theoretical Computer Science, 746, p. 49-72 (2012)].
Generalized sliced Wasserstein distance is a variant of sliced Wasserstein distance that exploits the power of non-linear projection through a given defining function to better capture the complex structures of the probability distributions. Similar to sliced Wasserstein distance, generalized sliced Wasserstein is defined as an expectation over random projections which can be approximated by the Monte Carlo method. However, the complexity of that approximation can be expensive in high-dimensional settings. To that end, we propose to form deterministic and fast approximations of the generalized sliced Wasserstein distance by using the concentration of random projections when the defining functions are polynomial function, circular function, and neural network type function. Our approximations hinge upon an important result that one-dimensional projections of a high-dimensional random vector are approximately Gaussian.