In seminal work, Lov\'asz, Spencer, and Vesztergombi [European J. Combin., 1986] proved a lower bound for the hereditary discrepancy of a matrix $A \in \mathbb{R}^{m \times n}$ in terms of the maximum $|\det(B)|^{1/k}$ over all $k \times k$ submatrices $B$ of $A$. We show algorithmically that this determinant lower bound can be off by at most a factor of $O(\sqrt{\log (m) \cdot \log (n)})$, improving over the previous bound of $O(\log(mn) \cdot \sqrt{\log (n)})$ given by Matou\v{s}ek [Proc. of the AMS, 2013]. Our result immediately implies $\mathrm{herdisc}(\mathcal{F}_1 \cup \mathcal{F}_2) \leq O(\sqrt{\log (m) \cdot \log (n)}) \cdot \max(\mathrm{herdisc}(\mathcal{F}_1), \mathrm{herdisc}(\mathcal{F}_2))$, for any two set systems $\mathcal{F}_1, \mathcal{F}_2$ over $[n]$ satisfying $|\mathcal{F}_1 \cup \mathcal{F}_2| = m$. Our bounds are tight up to constants when $m = O(\mathrm{poly}(n))$ due to a construction of P\'alv\"olgyi [Discrete Comput. Geom., 2010] or the counterexample to Beck's three permutation conjecture by Newman, Neiman and Nikolov [FOCS, 2012].
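The determinant lower bound discussed here can be evaluated directly on toy instances. Below is a minimal brute-force sketch (exponential time, so only for very small matrices); the function name `detlb` is ours, not from the paper.

```python
from itertools import combinations
import numpy as np

def detlb(A):
    # Lovász–Spencer–Vesztergombi determinant lower bound:
    # max over k and over k x k submatrices B of |det(B)|^(1/k).
    m, n = A.shape
    best = 0.0
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                d = abs(np.linalg.det(A[np.ix_(rows, cols)]))
                best = max(best, d ** (1.0 / k))
    return best
```

For example, `detlb(np.eye(3))` is 1, while for the 2x2 matrix with rows (1, 1) and (1, -1) the bound is attained by the full determinant and equals sqrt(2).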
Georeferenced compositional data are prominent in many scientific fields and in spatial statistics. This work addresses the problem of proposing models and methods to analyze and predict, through kriging, this type of data. To this purpose, a novel class of transformations, named the Isometric $\alpha$-transformation ($\alpha$-IT), is proposed, which encompasses the traditional Isometric Log-Ratio (ILR) transformation. It is shown that the ILR is the limit case of the $\alpha$-IT as $\alpha$ tends to 0 and that $\alpha=1$ corresponds to a linear transformation of the data. Unlike the ILR, the proposed transformation accepts 0s in the compositions when $\alpha>0$. Maximum likelihood estimation of the parameter $\alpha$ is established. Prediction using kriging on $\alpha$-IT transformed data is validated on synthetic spatial compositional data, using prediction scores computed either in the geometry induced by the $\alpha$-IT or in the simplex. Application to land cover data shows that the relative superiority of the various approaches w.r.t. a prediction objective depends on whether the compositions contain any zero components. When all components are positive, the limit cases (ILR or linear transformations) are optimal for none of the considered metrics. An intermediate geometry, corresponding to the $\alpha$-IT with the maximum likelihood estimate, better describes the dataset in a geostatistical setting. When the amount of compositions with 0s is not negligible, some side effects of the transformation get amplified as $\alpha$ decreases, entailing poor kriging performance both within the $\alpha$-IT geometry and for metrics in the simplex.
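The $\alpha \to 0$ limit behind the ILR claim can be illustrated numerically with the scalar Box–Cox-type power transform $(x^\alpha - 1)/\alpha \to \log x$, which underlies $\alpha$-transformations. This is only the building block, not the full $\alpha$-IT, which additionally composes an isometry step not shown here.

```python
import numpy as np

def boxcox(x, alpha):
    # Power transform (x**alpha - 1)/alpha; tends to log(x) as alpha -> 0,
    # which is how log-ratio geometry arises as the limit of power geometry.
    return (x ** alpha - 1.0) / alpha

x = np.array([0.2, 0.5, 0.3])            # a composition with positive parts
for alpha in [1.0, 0.1, 0.01, 0.001]:
    err = np.max(np.abs(boxcox(x, alpha) - np.log(x)))
    print(alpha, err)                     # error shrinks as alpha -> 0
```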
We study a matrix recovery problem with unknown correspondence: given the observation matrix $M_o=[A,\tilde P B]$, where $\tilde P$ is an unknown permutation matrix, we aim to recover the underlying matrix $M=[A,B]$. Such problems commonly arise in applications where heterogeneous data are combined and the correspondence among them is unknown, e.g., due to privacy concerns. We show that it is possible to recover $M$ by solving a nuclear norm minimization problem under a proper low-rank condition on $M$, with a provable non-asymptotic error bound for the recovery of $M$. We propose an algorithm, $\text{M}^3\text{O}$ (Matrix recovery via Min-Max Optimization), which recasts this combinatorial problem as a continuous minimax optimization problem and solves it by proximal gradient with a Max-Oracle. $\text{M}^3\text{O}$ can also be applied to a more general scenario where we have missing entries in $M_o$ and multiple groups of data with distinct unknown correspondences. Experiments on simulated data, the MovieLens 100K dataset, and the Yale B database show that $\text{M}^3\text{O}$ achieves state-of-the-art performance over several baselines and can recover the ground-truth correspondence with high accuracy.
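The premise that nuclear norm minimization can reveal the permutation rests on a simple observation: shuffling the rows of $B$ generically destroys the low-rank structure shared with $A$. A hedged numpy sketch of that observation (toy sizes and seed are ours, and this is only the motivation, not the $\text{M}^3\text{O}$ algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
r, n, p = 2, 30, 8                        # arbitrary toy dimensions
U = rng.standard_normal((n, r))
A = U @ rng.standard_normal((r, p))       # M = [A, B] has rank r by design
B = U @ rng.standard_normal((r, p))

perm = rng.permutation(n)                 # unknown row shuffle applied to B
M_true = np.hstack([A, B])
M_obs = np.hstack([A, B[perm]])

# The shuffle typically inflates the rank, so searching for the permutation
# that minimizes the nuclear norm of the re-aligned matrix is natural.
rank_true = np.linalg.matrix_rank(M_true)
rank_obs = np.linalg.matrix_rank(M_obs)
```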
Nonlinear metrics, such as the F1-score, Matthews correlation coefficient, and Fowlkes-Mallows index, are often used to evaluate the performance of machine learning models, in particular, when facing imbalanced datasets that contain more samples of one class than the other. Recent optimal decision tree algorithms have shown remarkable progress in producing trees that are optimal with respect to linear criteria, such as accuracy, but unfortunately nonlinear metrics remain a challenge. To address this gap, we propose a novel algorithm based on bi-objective optimisation, which treats misclassifications of each binary class as a separate objective. We show that, for a large class of metrics, the optimal tree lies on the Pareto frontier. Consequently, we obtain the optimal tree by using our method to generate the set of all nondominated trees. To the best of our knowledge, this is the first method to compute provably optimal decision trees for nonlinear metrics. Our approach leads to a trade-off when compared to optimising linear metrics: the resulting trees may be more desirable according to the given nonlinear metric at the expense of higher runtimes. Nevertheless, the experiments illustrate that runtimes are reasonable for the majority of the tested datasets.
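The Pareto-frontier idea can be sketched in a few lines: summarise each candidate tree by its (false positive, false negative) pair, keep the nondominated pairs, and scan them for the best value of the nonlinear metric. The candidate counts below are invented for illustration; this is the selection step only, not the tree search itself.

```python
def pareto_front(points):
    # Keep points not dominated by any other (smaller fp AND fn is better).
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]

def f1(fp, fn, pos):
    # F1-score from error counts, given the number of positive examples.
    tp = pos - fn
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

candidates = [(0, 9), (1, 4), (3, 2), (8, 0), (5, 5)]   # (fp, fn) per tree
front = pareto_front(candidates)
best = max(front, key=lambda p: f1(*p, pos=10))          # optimal lies on front
```

Since F1 improves whenever either error count decreases with the other fixed, the maximiser over all candidates must lie on the frontier, which is the key monotonicity property the paper exploits for its class of metrics.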
Many modern applications seek to understand the relationship between an outcome variable $Y$ and a covariate $X$ in the presence of a (possibly high-dimensional) confounding variable $Z$. Although much attention has been paid to testing whether $Y$ depends on $X$ given $Z$, in this paper we seek to go beyond testing by inferring the strength of that dependence. We first define our estimand, the minimum mean squared error (mMSE) gap, which quantifies the conditional relationship between $Y$ and $X$ in a way that is deterministic, model-free, interpretable, and sensitive to nonlinearities and interactions. We then propose a new inferential approach called floodgate that can leverage any working regression function chosen by the user (allowing, e.g., it to be fitted by a state-of-the-art machine learning algorithm or be derived from qualitative domain knowledge) to construct asymptotic confidence bounds, and we apply it to the mMSE gap. In addition to proving floodgate's asymptotic validity, we rigorously quantify its accuracy (distance from confidence bound to estimand) and robustness. We then show we can apply the same floodgate principle to a different measure of variable importance when $Y$ is binary. Finally, we demonstrate floodgate's performance in a series of simulations and apply it to data from the UK Biobank to infer the strengths of dependence of platelet count on various groups of genetic mutations.
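The mMSE gap is $\mathbb{E}[(Y-\mathbb{E}[Y\mid Z])^2] - \mathbb{E}[(Y-\mathbb{E}[Y\mid X,Z])^2]$. A minimal Monte Carlo sketch on a linear Gaussian model with $X \perp Z$, where the gap equals $\beta^2\,\mathrm{Var}(X)$; note this uses oracle conditional means for illustration, not the floodgate procedure, which works with an arbitrary fitted regression function.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 200_000, 1.5
X = rng.standard_normal(n)               # covariate of interest
Z = rng.standard_normal(n)               # confounder, independent of X
Y = beta * X + Z + rng.standard_normal(n)

# Oracle conditional means for this model:
#   E[Y | Z] = Z    and    E[Y | X, Z] = beta*X + Z.
mmse_gap = np.mean((Y - Z) ** 2) - np.mean((Y - beta * X - Z) ** 2)
# True gap = beta**2 * Var(X) = 2.25 in this simulation.
```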
Bayesian Likelihood-Free Inference (LFI) approaches make it possible to obtain posterior distributions for stochastic models with intractable likelihoods by relying on model simulations. In Approximate Bayesian Computation (ABC), a popular LFI method, summary statistics are used to reduce data dimensionality. ABC algorithms adaptively tailor simulations to the observation in order to sample from an approximate posterior, whose form depends on the chosen statistics. In this work, we introduce a new way to learn ABC statistics: we first generate parameter-simulation pairs from the model independently of the observation; then, we use Score Matching to train a neural conditional exponential family to approximate the likelihood. The exponential family is the largest class of distributions with fixed-size sufficient statistics; using these statistics in ABC is therefore intuitively appealing, and it achieves state-of-the-art performance. In parallel, we insert our likelihood approximation into an MCMC sampler for doubly intractable distributions to draw posterior samples. This can be repeated for any number of observations with no additional model simulations, with performance comparable to related approaches. We validate our methods on toy models with known likelihoods and on a large-dimensional time-series model.
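Score matching fits an unnormalized density by minimizing the Hyvärinen objective $\mathbb{E}[\tfrac12 s_\theta(x)^2 + s_\theta'(x)]$ over the model score $s_\theta$. A minimal 1-D sketch for the linear score family $s(x) = a x + b$ (the score of the Gaussian exponential family with statistics $(x, x^2)$), where the minimizer has a closed form; the paper's neural conditional version is far more general.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=2.0, size=100_000)

# Minimise E[0.5*(a*x + b)**2 + a] over (a, b); the "+ a" term is s'(x).
# Setting the gradients to zero gives a = -1/Var(x), b = -a*E[x], i.e. the
# Gaussian score -(x - mu)/sigma^2, with no normalizing constant needed.
m1, m2 = x.mean(), (x ** 2).mean()
a = -1.0 / (m2 - m1 ** 2)
b = -a * m1
mu_hat, var_hat = -b / a, -1.0 / a        # recovered Gaussian parameters
```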
Given a fixed finite metric space $(V,\mu)$, the {\em minimum $0$-extension problem}, denoted as ${\tt 0\mbox{-}Ext}[\mu]$, is equivalent to the following optimization problem: minimize $\sum_i f_i(x_i) + \sum_{ij}c_{ij}\mu(x_i,x_j)$ over $x\in V^n$, where $c_{ij},c_{vi}$ are given nonnegative costs and $f_i:V\rightarrow \mathbb R$ are functions given by $f_i(x_i)=\sum_{v\in V}c_{vi}\mu(x_i,v)$. The computational complexity of ${\tt 0\mbox{-}Ext}[\mu]$ has recently been established by Karzanov and by Hirai: if the metric $\mu$ is {\em orientable modular}, then ${\tt 0\mbox{-}Ext}[\mu]$ can be solved in polynomial time; otherwise ${\tt 0\mbox{-}Ext}[\mu]$ is NP-hard. To prove the tractability part, Hirai developed a theory of discrete convex functions on orientable modular graphs generalizing several known classes of functions in discrete convex analysis, such as $L^\natural$-convex functions. We consider a more general version of the problem in which the unary functions $f_i(x_i)$ can additionally have terms of the form $c_{uv;i}\mu(x_i,\{u,v\})$ for $\{u,v\}\in F$, where the set $F\subseteq\binom{V}{2}$ is fixed. We extend the complexity classification above by providing an explicit condition on $(\mu,F)$ for the problem to be tractable. In order to prove the tractability part, we generalize Hirai's theory and define a larger class of discrete convex functions. It covers, in particular, another well-known class of functions, namely submodular functions on an integer lattice. Finally, we improve the complexity of Hirai's algorithm for solving ${\tt 0\mbox{-}Ext}[\mu]$ on orientable modular graphs.
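The objective being minimized is easy to state computationally. A brute-force sketch (exponential in $n$, so for tiny instances only; the instance below is invented for illustration and says nothing about tractability):

```python
from itertools import product

def zero_ext_bruteforce(mu, c, f):
    # Brute-force min over x in V^n of  sum_i f[i][x_i]
    # + sum_{i<j} c[i][j] * mu[x_i][x_j].
    # mu: |V| x |V| metric; c: n x n pairwise costs; f: n x |V| unary costs.
    n, V = len(f), range(len(mu))
    best = float('inf')
    for x in product(V, repeat=n):
        val = sum(f[i][x[i]] for i in range(n))
        val += sum(c[i][j] * mu[x[i]][x[j]]
                   for i in range(n) for j in range(i + 1, n))
        best = min(best, val)
    return best

# Two nodes, two labels, unit metric: node 0 prefers label 0, node 1 prefers
# label 1, and an edge of cost 1 penalizes disagreement.
mu = [[0, 1], [1, 0]]
c = [[0, 1], [1, 0]]
f = [[0, 5], [5, 0]]
opt = zero_ext_bruteforce(mu, c, f)       # labeling (0, 1) costs 0 + 0 + 1
```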
We initiate the study of property testing problems concerning relations between permutations. In such problems, the input is a tuple $(\sigma_1,\dotsc,\sigma_d)$ of permutations on $\{1,\dotsc,n\}$, and one wishes to determine whether this tuple satisfies a certain system of relations $E$, or is far from every tuple that satisfies $E$. If this computational problem can be solved by querying only a small number of entries of the given permutations, we say that $E$ is testable. For example, when $d=2$ and $E$ consists of the single relation $\mathsf{XY=YX}$, this corresponds to testing whether $\sigma_1\sigma_2=\sigma_2\sigma_1$, where $\sigma_1\sigma_2$ and $\sigma_2\sigma_1$ denote composition of permutations. We define a collection of graphs, naturally associated with the system $E$, that encodes all the information relevant to the testability of $E$. We then prove two theorems that provide criteria for testability and non-testability in terms of expansion properties of these graphs. By virtue of a deep connection with group theory, both theorems are applicable to wide classes of systems of relations. In addition, we formulate the well-studied group-theoretic notion of stability in permutations as a special case of the testability notion above, interpret all previous works on stability as testability results, survey previous results on stability from a computational perspective, and describe many directions for future research on stability and testability.
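For the commutation example $\mathsf{XY=YX}$, the natural tester queries random points and checks the relation locally. A minimal sketch (function name, sample size, and the example permutations are ours; this illustrates the query model, not the paper's expansion-based criteria):

```python
import random

def commute_violation_estimate(s1, s2, samples=2000, seed=0):
    # Sample points i and check s1[s2[i]] == s2[s1[i]]; the rejection rate
    # estimates how far the pair is from satisfying XY = YX pointwise.
    rng = random.Random(seed)
    n = len(s1)
    bad = 0
    for _ in range(samples):
        i = rng.randrange(n)
        if s1[s2[i]] != s2[s1[i]]:
            bad += 1
    return bad / samples

n = 100
shift = [(i + 1) % n for i in range(n)]       # commutes with itself
reverse = [n - 1 - i for i in range(n)]       # does not commute with shift
```

The subtlety the paper addresses is that pointwise rejection rate only bounds a local notion of violation; relating it to the distance from the nearest tuple that exactly satisfies the relations is where the graph-expansion machinery comes in.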
Optimal Transport (OT) metrics allow for defining discrepancies between two probability measures. The Wasserstein distance has long been the celebrated OT distance of choice in the literature, but it requires the two probability distributions to be supported on the $\textit{same}$ metric space. Because of its high computational complexity, several approximate Wasserstein distances have been proposed, based on entropy regularization or on slicing and one-dimensional Wasserstein computation. In this paper, we propose a novel extension of the Wasserstein distance for comparing two incomparable distributions, which hinges on the ideas of $\textit{distributional slicing}$, embeddings, and computing the closed-form Wasserstein distance between the sliced distributions. We provide a theoretical analysis of this new divergence, called the $\textit{heterogeneous Wasserstein discrepancy (HWD)}$, and we show that it preserves several interesting properties, including rotation invariance. We show that the embeddings involved in HWD can be learned efficiently. Finally, we provide a large set of experiments illustrating the behavior of HWD as a divergence in the context of generative modeling and in a query framework.
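The slicing and closed-form 1-D ingredients mentioned above can be sketched concisely. This is the standard sliced Wasserstein construction shown as background; HWD additionally learns embeddings so that distributions on different spaces become comparable, which is not shown here.

```python
import numpy as np

def w1_1d(u, v):
    # Closed-form 1-D Wasserstein-1 between equal-size empirical measures:
    # sort both samples and average the absolute differences.
    return np.mean(np.abs(np.sort(u) - np.sort(v)))

def sliced_w1(X, Y, n_proj=200, seed=0):
    # Project both point clouds onto random unit directions and average
    # the resulting closed-form 1-D distances.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
        total += w1_1d(X @ theta, Y @ theta)
    return total / n_proj

rng = np.random.default_rng(3)
X = rng.standard_normal((60, 2))
Y = X + np.array([1.0, 1.0])              # a shifted copy of X
```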
The (unweighted) tree edit distance problem for $n$ node trees asks to compute a measure of dissimilarity between two rooted trees with node labels. The current best algorithm from more than a decade ago runs in $O(n ^ 3)$ time [Demaine, Mozes, Rossman, and Weimann, ICALP 2007]. The same paper also showed that $O(n ^ 3)$ is the best possible running time for any algorithm using the so-called decomposition strategy, which underlies almost all the known algorithms for this problem. These algorithms would also work for the weighted tree edit distance problem, which cannot be solved in truly sub-cubic time under the APSP conjecture [Bringmann, Gawrychowski, Mozes, and Weimann, SODA 2018]. In this paper, we break the cubic barrier by showing an $O(n ^ {2.9546})$ time algorithm for the unweighted tree edit distance problem. We consider an equivalent maximization problem and use a dynamic programming scheme involving matrices with many special properties. By using a decomposition scheme as well as several combinatorial techniques, we reduce tree edit distance to the max-plus product of bounded-difference matrices, which can be solved in truly sub-cubic time [Bringmann, Grandoni, Saha, and Vassilevska Williams, FOCS 2016].
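The (max,+) product that the problem is reduced to is easy to state. Below is the naive cubic definition for reference; the paper's speed-up comes from the sub-cubic algorithm for bounded-difference matrices cited above, which is not reproduced here.

```python
import numpy as np

def max_plus(A, B):
    # (max,+) matrix product: C[i, j] = max_k A[i, k] + B[k, j].
    n, m, p = A.shape[0], A.shape[1], B.shape[1]
    C = np.full((n, p), -np.inf)
    for k in range(m):
        C = np.maximum(C, A[:, [k]] + B[[k], :])
    return C

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
C = max_plus(A, B)
```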
Influence maximization is the task of selecting a small number of seed nodes in a social network so as to maximize the spread of influence from these seeds, and it has been widely investigated in the past two decades. In the canonical setting, both the social network and its diffusion parameters are given as input. In this paper, we consider the more realistic sampling setting where the network is unknown and we only have a set of passively observed cascades that record the set of activated nodes at each diffusion step. We study the task of influence maximization from these cascade samples (IMS) and present constant-approximation algorithms for this task under mild conditions on the seed set distribution. To achieve the optimization goal, we also provide a novel solution to the network inference problem, that is, learning diffusion parameters and the network structure from the cascade data. Compared with prior solutions, our network inference algorithm requires weaker assumptions and does not rely on maximum-likelihood estimation or convex programming. Our IMS algorithms enhance the learning-and-then-optimization approach by allowing a constant approximation ratio even when the diffusion parameters are hard to learn, and we do not need any assumption related to the network structure or diffusion parameters.
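For orientation, the canonical setting can be sketched as greedy seed selection with Monte Carlo spread estimates under the independent cascade model. This is the classic baseline assumed known in the abstract, not the paper's sample-based IMS algorithms, and the graph, activation probability, and run counts below are illustrative choices of ours.

```python
import random

def simulate_ic(graph, seeds, p, rng):
    # One independent-cascade run: each newly activated node gets one chance
    # to activate each inactive out-neighbour with probability p.
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def greedy_im(graph, k, p=0.3, runs=200, seed=0):
    # Greedy seed selection by Monte Carlo estimates of expected spread.
    rng = random.Random(seed)
    seeds = []
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    for _ in range(k):
        best, best_gain = None, -1.0
        for u in nodes - set(seeds):
            gain = sum(simulate_ic(graph, seeds + [u], p, rng)
                       for _ in range(runs)) / runs
            if gain > best_gain:
                best, best_gain = u, gain
        seeds.append(best)
    return seeds
```

On a directed star `{0: [1, 2, 3, 4]}`, seeding the hub has expected spread 1 + 4p, while any leaf activates nobody, so greedy picks the hub.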