Given a metric space $(X,d_X)$, a $(\beta,s,\Delta)$-sparse cover is a collection of clusters $\mathcal{C}\subseteq P(X)$, each of diameter at most $\Delta$, such that for every point $x\in X$, the ball $B_X(x,\frac\Delta\beta)$ is fully contained in some cluster $C\in \mathcal{C}$, and $x$ belongs to at most $s$ clusters in $\mathcal{C}$. Our main contribution is to show that the shortest path metric of every $K_r$-minor-free graph admits an $(O(r),O(r^2),\Delta)$-sparse cover and, for every $\epsilon>0$, a $(4+\epsilon,O(\frac1\epsilon)^r,\Delta)$-sparse cover (for arbitrary $\Delta>0$). We then use these sparse covers to show that every $K_r$-minor-free graph embeds into $\ell_\infty^{\tilde{O}(\frac1\epsilon)^{r+1}\cdot\log n}$ with distortion $3+\epsilon$ (resp. into $\ell_\infty^{\tilde{O}(r^2)\cdot\log n}$ with distortion $O(r)$). Further, we provide applications of these sparse covers to padded decompositions, sparse partitions, universal TSP / Steiner tree, oblivious buy-at-bulk, name-independent routing, and path-reporting distance oracles.
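To make the definition concrete, here is a minimal sketch of a checker for the three sparse-cover conditions on a finite metric given by a distance matrix; the function and variable names are illustrative and not taken from the paper.

```python
# Hypothetical sketch: verify the (beta, s, Delta)-sparse-cover conditions
# for a candidate collection of clusters over a finite metric.
import itertools

def check_sparse_cover(dist, clusters, beta, s, Delta):
    """dist[x][y]: distance matrix on points 0..n-1; clusters: list of sets."""
    n = len(dist)
    # 1. Every cluster has diameter at most Delta.
    for C in clusters:
        if any(dist[x][y] > Delta for x, y in itertools.combinations(C, 2)):
            return False
    # 2. Every ball B(x, Delta/beta) is fully contained in some cluster.
    for x in range(n):
        ball = {y for y in range(n) if dist[x][y] <= Delta / beta}
        if not any(ball <= C for C in clusters):
            return False
    # 3. Every point belongs to at most s clusters (sparsity).
    return all(sum(x in C for C in clusters) <= s for x in range(n))
```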
Let $\mathcal{P}$ be a simple polygon with $m$ vertices and let $P$ be a set of $n$ points inside $\mathcal{P}$. We prove that there exists, for any $\varepsilon>0$, a set $\mathcal{C} \subset P$ of size $O(1/\varepsilon^2)$ such that the following holds: for any query point $q$ inside the polygon $\mathcal{P}$, the geodesic distance from $q$ to its furthest neighbor in $\mathcal{C}$ is at least $1-\varepsilon$ times the geodesic distance to its furthest neighbor in $P$. Thus the set $\mathcal{C}$ can be used for answering $\varepsilon$-approximate furthest-neighbor queries with a data structure whose storage requirement is independent of the size of $P$. The coreset can be constructed in $O\left(\frac{1}{\varepsilon} \left( n\log(1/\varepsilon) + (n+m)\log(n+m)\right) \right)$ time.
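At query time the coreset is used in the obvious way: the furthest point of $\mathcal{C}$ from $q$ is already a $(1-\varepsilon)$-approximate furthest neighbor in $P$. A minimal sketch, assuming a user-supplied geodesic-distance routine:

```python
# Hypothetical usage sketch; geodesic_dist(q, c) is assumed to return the
# geodesic distance between two points inside the polygon.
def approx_furthest_neighbor(q, coreset, geodesic_dist):
    return max(coreset, key=lambda c: geodesic_dist(q, c))
```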
We consider the Distinct Shortest Walks problem. Given two vertices $s$ and $t$ of a graph database $\mathcal{D}$ and a regular path query, enumerate all walks of minimal length from $s$ to $t$ that carry a label that conforms to the query. Usual theoretical solutions turn out to be inefficient when applied to graph models that are closer to real-life systems, in particular because edges may carry multiple labels. Indeed, known algorithms may repeat the same answer exponentially many times. We propose an efficient algorithm for multi-labelled graph databases. The preprocessing runs in $O(|\mathcal{D}|\times|\mathcal{A}|)$ and the delay between two consecutive outputs is in $O(\lambda\times|\mathcal{A}|)$, where $\mathcal{A}$ is a nondeterministic automaton representing the query and $\lambda$ is the minimal length. The algorithm can handle $\varepsilon$-transitions in $\mathcal{A}$ or queries given as regular expressions at no additional cost.
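The minimal length $\lambda$ itself can be recovered by the standard product construction between the database and the automaton; the sketch below is only that construction (not the paper's constant-delay enumeration algorithm), with illustrative data-structure names.

```python
# Minimal sketch: multi-source BFS on the product of a multi-labelled graph
# and an NFA recovers the minimal length of a conforming walk from s to t.
from collections import deque

def minimal_conforming_length(edges, s, t, delta, initial, final):
    """edges: u -> list of (labels, v); delta: (state, label) -> set of states."""
    start = {(s, q) for q in initial}
    dist, queue = {p: 0 for p in start}, deque(start)
    while queue:
        u, q = queue.popleft()
        if u == t and q in final:
            return dist[(u, q)]        # first accepting state popped is closest
        for labels, v in edges.get(u, []):
            for a in labels:
                for q2 in delta.get((q, a), ()):
                    if (v, q2) not in dist:
                        dist[(v, q2)] = dist[(u, q)] + 1
                        queue.append((v, q2))
    return None                        # no conforming walk exists
```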
We study the problem of maintaining a lightweight bounded-degree $(1+\varepsilon)$-spanner of a dynamic point set in a $d$-dimensional Euclidean space, where $\varepsilon>0$ and $d$ are arbitrary constants. In our fully-dynamic setting, points are allowed to be inserted as well as deleted, and our objective is to maintain a $(1+\varepsilon)$-spanner that has constant bounds on its maximum degree and its lightness (the ratio of its weight to that of the minimum spanning tree), while minimizing the recourse, which is the number of edges added or removed by each point insertion or deletion. We present a fully-dynamic algorithm that handles point insertion with amortized constant recourse and point deletion with amortized $O(\log\Delta)$ recourse, where $\Delta$ is the aspect ratio of the point set.
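For readers less familiar with the static objects involved, the following sketch builds the classical greedy $(1+\varepsilon)$-spanner on a static point set; its total weight can then be compared to the minimum spanning tree to measure lightness. This is only an illustration of the quantities being maintained, not the dynamic algorithm of the paper.

```python
# Illustrative static sketch (not the paper's fully-dynamic algorithm):
# classical greedy (1+eps)-spanner over a list of coordinate tuples.
import heapq, itertools, math

def greedy_spanner(points, eps):
    n = len(points)
    pairs = sorted(itertools.combinations(range(n), 2),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))
    adj = {i: {} for i in range(n)}
    for i, j in pairs:
        w = math.dist(points[i], points[j])
        # Add the edge only if the current spanner does not already
        # approximate this pair within factor (1 + eps).
        if spanner_dist(adj, i, j) > (1 + eps) * w:
            adj[i][j] = adj[j][i] = w
    return {(i, j) for i in adj for j in adj[i] if i < j}

def spanner_dist(adj, src, dst):
    # Plain Dijkstra on the current partial spanner.
    dist, heap = {src: 0.0}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return float("inf")
```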
In 1991, Brenier proved a theorem that generalizes the polar decomposition for square matrices -- factored as PSD $\times$ unitary -- to any vector field $F:\mathbb{R}^d\rightarrow \mathbb{R}^d$. The theorem, known as the polar factorization theorem, states that any field $F$ can be recovered as the composition of the gradient of a convex function $u$ with a measure-preserving map $M$, namely $F=\nabla u \circ M$. We propose a practical implementation of this far-reaching theoretical result, and explore possible uses within machine learning. The theorem is closely related to optimal transport (OT) theory, and we borrow from recent advances in the field of neural optimal transport to parameterize the potential $u$ as an input convex neural network. The map $M$ can be either evaluated pointwise using $u^*$, the convex conjugate of $u$, through the identity $M=\nabla u^* \circ F$, or learned as an auxiliary network. Because $M$ is, in general, not injective, we consider the additional task of estimating the ill-posed inverse map $M^{-1}$, approximating the pre-image measure with a stochastic generator. We illustrate possible applications of \citeauthor{Brenier1991PolarFA}'s polar factorization to non-convex optimization problems, as well as sampling of densities that are not log-concave.
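In the linear case the theorem reduces to the matrix polar decomposition, which gives a quick numerical sanity check of the identity $F=\nabla u \circ M$; the sketch below (assuming NumPy and SciPy) is purely illustrative and unrelated to the neural parameterization used in the paper.

```python
# Linear-case illustration: F(x) = Bx factors as grad(u) o M with
# u(x) = 0.5 * x^T P x (convex since P is PSD) and M(x) = Ux orthogonal,
# hence Lebesgue-measure-preserving.
import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))          # linear field F(x) = B x
U, P = polar(B, side="left")             # B = P @ U, P PSD, U orthogonal
x = rng.standard_normal(4)
assert np.allclose(B @ x, P @ (U @ x))   # F = grad(u) o M pointwise
assert np.allclose(U.T @ U, np.eye(4))   # M is measure preserving
```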
We consider approximating solutions to parameterized linear systems of the form $A(\mu_1,\mu_2) x(\mu_1,\mu_2) = b$, where $(\mu_1, \mu_2) \in \mathbb{R}^2$. Here the matrix $A(\mu_1,\mu_2) \in \mathbb{R}^{n \times n}$ is nonsingular, large, and sparse and depends nonlinearly on the parameters $\mu_1$ and $\mu_2$. Specifically, the system arises from a discretization of a partial differential equation and $x(\mu_1,\mu_2) \in \mathbb{R}^n$, $b \in \mathbb{R}^n$. This work combines companion linearization with the Krylov subspace method preconditioned bi-conjugate gradient (BiCG) and a decomposition of a tensor matrix of precomputed solutions, called snapshots. As a result, a reduced order model of $x(\mu_1,\mu_2)$ is constructed, and this model can be evaluated in a cheap way for many values of the parameters. Tensor decompositions performed on a set of snapshots can fail to reach a certain level of accuracy, and it is not known a priori if a decomposition will be successful. Moreover, the selection of snapshots can affect both the quality of the produced model and the computation time required for its construction. This new method offers a way to generate a new set of solutions on the same parameter space at little additional cost. An interpolation of the model is used to produce approximations on the entire parameter space, and this method can be used to solve a parameter estimation problem. Numerical examples of a parameterized Helmholtz equation show the competitiveness of our approach. The simulations are reproducible, and the software is available online.
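As a much-simplified illustration of the snapshot idea only (not the companion-linearization, preconditioned BiCG, or tensor-decomposition machinery of the paper), one can precompute solutions on a coarse parameter grid and interpolate them over the rest of the parameter space; `A_of_mu`, `b`, and the grids below are hypothetical user-supplied objects, with `A_of_mu` returning a SciPy sparse matrix.

```python
# Simplified snapshot-and-interpolate sketch for x(mu1, mu2).
import numpy as np
from scipy.sparse.linalg import spsolve
from scipy.interpolate import RegularGridInterpolator

def build_snapshot_model(A_of_mu, b, mu1_grid, mu2_grid):
    # Snapshots: direct solves on the coarse (mu1, mu2) grid.
    snaps = np.array([[spsolve(A_of_mu(m1, m2).tocsc(), b)
                       for m2 in mu2_grid] for m1 in mu1_grid])
    # Interpolate the stored solutions componentwise over the parameter space.
    return RegularGridInterpolator((mu1_grid, mu2_grid), snaps)

# model = build_snapshot_model(A_of_mu, b, mu1_grid, mu2_grid)
# x_approx = model((0.3, 0.7))   # cheap evaluation at a new parameter value
```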
The $(k, z)$-Clustering problem in Euclidean space $\mathbb{R}^d$ has been extensively studied. Given the scale of data involved, compression methods for the Euclidean $(k, z)$-Clustering problem, such as coresets and dimension reduction, have received significant attention in the literature. However, the space complexity of the clustering problem, specifically, the number of bits required to compress the cost function within a multiplicative error $\varepsilon$, remains unclear in existing literature. This paper initiates the study of space complexity for Euclidean $(k, z)$-Clustering and offers both upper and lower bounds. Our space bounds are nearly tight when $k$ is constant, indicating that storing a coreset, a well-known data compression approach, serves as the optimal compression scheme. Furthermore, our lower bound result for $(k, z)$-Clustering establishes a tight space bound of $\Theta( n d )$ for terminal embedding, where $n$ represents the dataset size. Our technical approach leverages new geometric insights for principal angles and discrepancy methods, which may hold independent interest.
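For reference, the quantity being compressed is the $(k,z)$-clustering cost of a candidate center set; a direct, uncompressed evaluation is shown below, and the space-complexity question asks how few bits suffice to recover this value up to a $(1\pm\varepsilon)$ factor. The code is a generic illustration, not a construction from the paper.

```python
# (k, z)-clustering cost of a center set for a dataset X in R^d.
import numpy as np

def kz_cost(X, centers, z):
    """X: (n, d) array; centers: (k, d) array; returns sum of min-distances^z."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return (dists.min(axis=1) ** z).sum()
```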
Let $D$ be a digraph. Its acyclic number $\vec{\alpha}(D)$ is the maximum order of an acyclic induced subdigraph and its dichromatic number $\vec{\chi}(D)$ is the least integer $k$ such that $V(D)$ can be partitioned into $k$ subsets inducing acyclic subdigraphs. We study $\vec{a}(n)$ and $\vec{t}(n)$, defined respectively as the minimum of $\vec{\alpha}(D)$ and the maximum of $\vec{\chi}(D)$ over all oriented triangle-free graphs $D$ of order $n$. For every $\epsilon>0$ and $n$ large enough, we show $(1/\sqrt{2} - \epsilon) \sqrt{n\log n} \leq \vec{a}(n) \leq \frac{107}{8} \sqrt n \log n$ and $\frac{8}{107} \sqrt n/\log n \leq \vec{t}(n) \leq (\sqrt 2 + \epsilon) \sqrt{n/\log n}$. We also construct an oriented triangle-free graph on 25 vertices with dichromatic number~3, and show that every oriented triangle-free graph of order at most 17 has dichromatic number at most 2.
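The two parameters are easy to state algorithmically; the brute-force sketch below (feasible only for very small digraphs, and assuming NetworkX) is just a restatement of the definitions, not a method from the paper.

```python
# Brute-force definitions of the acyclic number and the dichromatic number.
import itertools, networkx as nx

def acyclic_number(D):
    nodes = list(D.nodes)
    return max(r for r in range(1, len(nodes) + 1)
               for S in itertools.combinations(nodes, r)
               if nx.is_directed_acyclic_graph(D.subgraph(S)))

def dichromatic_number(D):
    nodes = list(D.nodes)
    for k in range(1, len(nodes) + 1):
        for colouring in itertools.product(range(k), repeat=len(nodes)):
            parts = [[v for v, c in zip(nodes, colouring) if c == i]
                     for i in range(k)]
            if all(nx.is_directed_acyclic_graph(D.subgraph(p)) for p in parts):
                return k
```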
Let $M$ be an $n\times n$ matrix of homogeneous linear forms over a field $\Bbbk$. If the ideal $\mathcal{I}_{n-2}(M)$ generated by minors of size $n-1$ is Cohen-Macaulay, then the Gulliksen-Neg{\aa}rd complex is a free resolution of $\mathcal{I}_{n-2}(M)$. It has recently been shown that by taking into account the syzygy modules for $\mathcal{I}_{n-2}(M)$ which can be obtained from this complex, one can derive a refined signature-based Gr\"obner basis algorithm DetGB which avoids reductions to zero when computing a grevlex Gr\"obner basis for $\mathcal{I}_{n-2}(M)$. In this paper, we establish sharp complexity bounds on DetGB. To accomplish this, we prove several results on the sizes of reduced grevlex Gr\"obner bases of reverse lexicographic ideals, from which we obtain two main complexity results, both relying on conjectures similar to Fr\"oberg's. The first one states that, in the zero-dimensional case, the size of the reduced grevlex Gr\"obner basis of $\mathcal{I}_{n-2}(M)$ is bounded from below by $n^{6}$ asymptotically. The second, also in the zero-dimensional case, states that the complexity of DetGB is bounded from above by $n^{2\omega+3}$ asymptotically, where $2\le\omega\le 3$ is any complexity exponent for matrix multiplication over $\Bbbk$.
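The object whose size is being bounded can be experimented with directly; the sketch below (using SymPy, not the DetGB algorithm) builds the ideal of $(n-1)$-minors of a random matrix of linear forms and computes its reduced grevlex Gröbner basis. Four variables are used because the submaximal-minor locus has generic codimension 4, so the resulting ideal is expected to be zero-dimensional; all names and parameter choices are illustrative.

```python
# Compute the reduced grevlex Groebner basis of the ideal of (n-1)-minors
# of a random n x n matrix of homogeneous linear forms in 4 variables.
import random
from sympy import symbols, Matrix, groebner

n = 3
xs = symbols("x0:4")                                 # ambient variables
rand_form = lambda: sum(random.choice([-3, -2, -1, 1, 2, 3]) * v for v in xs)
M = Matrix(n, n, lambda i, j: rand_form())           # matrix of linear forms
minors = [M.minor(i, j) for i in range(n) for j in range(n)]
G = groebner(minors, *xs, order="grevlex")
print(len(G.exprs), "elements in the reduced grevlex Groebner basis")
```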
We study the PSPACE-complete $k$-Canadian Traveller Problem, where we are given a weighted graph $G=(V,E,\omega)$ with a source $s\in V$ and a target $t\in V$. This problem also has a hidden input $E_* \subsetneq E$ of cardinality at most $k$ representing blocked edges. The objective is to travel from $s$ to $t$ over the minimum possible distance. At the beginning of the walk, the blockages $E_*$ are unknown: the traveller discovers that an edge is blocked when visiting one of its endpoints. Online algorithms, also called strategies, have been proposed for this problem and assessed with the competitive ratio, i.e. the ratio of the distance actually traversed by the traveller to the distance that would have been traversed knowing the blockages in advance. Even though the optimal competitive ratio is $2k+1$ even on unit-weighted planar graphs of treewidth 2, we design a polynomial-time strategy achieving competitive ratio $9$ on unit-weighted outerplanar graphs. This value $9$ also stands as a lower bound for this family of graphs as we prove that, for any $\varepsilon > 0$, no strategy can achieve a competitive ratio of $9-\varepsilon$. Finally, we show that it is not possible to achieve a constant competitive ratio (independent of $G$ and $k$) on weighted outerplanar graphs.
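The online setting and the competitive ratio can be simulated with the naive "replan on discovery" strategy (not the ratio-9 strategy of the paper); the sketch below assumes NetworkX, an undirected graph whose edges carry a `weight` attribute, and blockages given as a set of frozensets, and that $t$ remains reachable in $G \setminus E_*$.

```python
# Simulation sketch of the k-Canadian Traveller setting with naive replanning.
import networkx as nx

def replanning_walk(G, s, t, blocked):
    known, pos, travelled = set(), s, 0.0
    known |= {e for e in blocked if pos in e}          # blockages seen at s
    while pos != t:
        H = nx.restricted_view(G, [], [tuple(e) for e in known])
        path = nx.shortest_path(H, pos, t, weight="weight")
        for u, v in zip(path, path[1:]):
            travelled += G[u][v]["weight"]
            pos = v
            new = {e for e in blocked if pos in e} - known
            if new:                                    # new blockages: replan
                known |= new
                break
    return travelled

def competitive_ratio(G, s, t, blocked):
    # Offline optimum: shortest s-t distance knowing the blockages in advance.
    offline = nx.shortest_path_length(
        nx.restricted_view(G, [], [tuple(e) for e in blocked]),
        s, t, weight="weight")
    return replanning_walk(G, s, t, blocked) / offline
```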
For positive integers $d$ and $p$ such that $d \ge p$, we obtain complete asymptotic expansions, for large $d$, of the normalizing constants for the matrix Bingham and matrix Langevin distributions on Stiefel manifolds. The accuracy of each truncated expansion is strictly increasing in $d$; also, for sufficiently large $d$, the accuracy is strictly increasing in $m$, the number of terms in the truncated expansion. We apply these results to obtain the rate of convergence of these asymptotic expansions as both $d, p \to \infty$. Using values of $d$ and $p$ arising in various data sets, we illustrate the rate of convergence of the truncated approximations as $d$ or $m$ increases. These results extend our recent work on asymptotic expansions for the normalizing constants of the high-dimensional Bingham distributions.