We say that a Hamilton cycle $C=(x_1,\ldots,x_n)$ in a graph $G$ is $k$-symmetric, if the mapping $x_i\mapsto x_{i+n/k}$ for all $i=1,\ldots,n$, where indices are considered modulo $n$, is an automorphism of $G$. In other words, if we lay out the vertices $x_1,\ldots,x_n$ equidistantly on a circle and draw the edges of $G$ as straight lines, then the drawing of $G$ has $k$-fold rotational symmetry, i.e., all information about the graph is compressed into a $360^\circ/k$ wedge of the drawing. The maximum $k$ for which there exists a $k$-symmetric Hamilton cycle in $G$ is referred to as the Hamilton compression of $G$. We investigate the Hamilton compression of four different families of vertex-transitive graphs, namely hypercubes, Johnson graphs, permutahedra and Cayley graphs of abelian groups. In several cases we determine their Hamilton compression exactly, and in other cases we provide close lower and upper bounds. The constructed cycles have a much higher compression than several classical Gray codes known from the literature. Our constructions also yield Gray codes for bitstrings, combinations and permutations that have few tracks and/or that are balanced.
We show an area law with logarithmic correction for the maximally mixed state $\Omega$ in the (degenerate) ground space of a 1D gapped local Hamiltonian $H$, which is independent of the underlying ground space degeneracy. Formally, for $\varepsilon>0$ and a bi-partition $L\cup L^c$ of the 1D lattice, we show that $$\mathrm{I}^{\varepsilon}_{\max}(L:L^c)_{\Omega} \leq O(\log(|L|)+\log(1/\varepsilon)),$$ where $|L|$ represents the number of qudits in $L$ and $\mathrm{I}^{\epsilon}_{\max}(L:L^c)_{\Omega}$ represents the $\varepsilon$- 'smoothed maximum mutual information' with respect to the $L:L^c$ partition in $\Omega$. As a corollary, we get an area law for the mutual information of the form $\mathrm{I}(L:R)_\Omega \leq O(\log |L|)$. In addition, we show that $\Omega$ can be approximated up to an $\varepsilon$ in trace norm with a state of Schmidt rank of at most $\mathrm{poly}(|L|/\varepsilon)$.
The Weisfeiler-Leman (WL) dimension of a graph parameter $f$ is the minimum $k$ such that, if $G_1$ and $G_2$ are indistinguishable by the $k$-dimensional WL-algorithm then $f(G_1)=f(G_2)$. The WL-dimension of $f$ is $\infty$ if no such $k$ exists. We study the WL-dimension of graph parameters characterised by the number of answers from a fixed conjunctive query to the graph. Given a conjunctive query $\varphi$, we quantify the WL-dimension of the function that maps every graph $G$ to the number of answers of $\varphi$ in $G$. The works of Dvor\'ak (J. Graph Theory 2010), Dell, Grohe, and Rattan (ICALP 2018), and Neuen (ArXiv 2023) have answered this question for full conjunctive queries, which are conjunctive queries without existentially quantified variables. For such queries $\varphi$, the WL-dimension is equal to the treewidth of the Gaifman graph of $\varphi$. In this work, we give a characterisation that applies to all conjunctive qureies. Given any conjunctive query $\varphi$, we prove that its WL-dimension is equal to the semantic extension width $\mathsf{sew}(\varphi)$, a novel width measure that can be thought of as a combination of the treewidth of $\varphi$ and its quantified star size, an invariant introduced by Durand and Mengel (ICDT 2013) describing how the existentially quantified variables of $\varphi$ are connected with the free variables. Using the recently established equivalence between the WL-algorithm and higher-order Graph Neural Networks (GNNs) due to Morris et al. (AAAI 2019), we obtain as a consequence that the function counting answers to a conjunctive query $\varphi$ cannot be computed by GNNs of order smaller than $\mathsf{sew}(\varphi)$.
A set $S$ of isometric paths of a graph $G$ is "$v$-rooted", where $v$ is a vertex of $G$, if $v$ is one of the end-vertices of all the isometric paths in $S$. The isometric path complexity of a graph $G$, denoted by $ipco(G)$, is the minimum integer $k$ such that there exists a vertex $v\in V(G)$ satisfying the following property: the vertices of any isometric path $P$ of $G$ can be covered by $k$ many $v$-rooted isometric paths. First, we provide an $O(n^2 m)$-time algorithm to compute the isometric path complexity of a graph with $n$ vertices and $m$ edges. Then we show that the isometric path complexity remains bounded for graphs in three seemingly unrelated graph classes, namely, hyperbolic graphs, (theta, prism, pyramid)-free graphs, and outerstring graphs. Hyperbolic graphs are extensively studied in Metric Graph Theory. The class of (theta, prism, pyramid)-free graphs are extensively studied in Structural Graph Theory, e.g. in the context of the Strong Perfect Graph Theorem. The class of outerstring graphs is studied in Geometric Graph Theory and Computational Geometry. Our results also show that the distance functions of these (structurally) different graph classes are more similar than previously thought. There is a direct algorithmic consequence of having small isometric path complexity. Specifically, we show that if the isometric path complexity of a graph $G$ is bounded by a constant, then there exists a polynomial-time constant-factor approximation algorithm for ISOMETRIC PATH COVER, whose objective is to cover all vertices of a graph with a minimum number of isometric paths. This applies to all the above graph classes.
We construct and analyze finite element approximations of the Einstein tensor in dimension $N \ge 3$. We focus on the setting where a smooth Riemannian metric tensor $g$ on a polyhedral domain $\Omega \subset \mathbb{R}^N$ has been approximated by a piecewise polynomial metric $g_h$ on a simplicial triangulation $\mathcal{T}$ of $\Omega$ having maximum element diameter $h$. We assume that $g_h$ possesses single-valued tangential-tangential components on every codimension-1 simplex in $\mathcal{T}$. Such a metric is not classically differentiable in general, but it turns out that one can still attribute meaning to its Einstein curvature in a distributional sense. We study the convergence of the distributional Einstein curvature of $g_h$ to the Einstein curvature of $g$ under refinement of the triangulation. We show that in the $H^{-2}(\Omega)$-norm, this convergence takes place at a rate of $O(h^{r+1})$ when $g_h$ is an optimal-order interpolant of $g$ that is piecewise polynomial of degree $r \ge 1$. We provide numerical evidence to support this claim.
A collection of graphs is \textit{nearly disjoint} if every pair of them intersects in at most one vertex. We prove that if $G_1, \dots, G_m$ are nearly disjoint graphs of maximum degree at most $D$, then the following holds. For every fixed $C$, if each vertex $v \in \bigcup_{i=1}^m V(G_i)$ is contained in at most $C$ of the graphs $G_1, \dots, G_m$, then the (list) chromatic number of $\bigcup_{i=1}^m G_i$ is at most $D + o(D)$. This result confirms a special case of a conjecture of Vu and generalizes Kahn's bound on the list chromatic index of linear uniform hypergraphs of bounded maximum degree. In fact, this result holds for the correspondence (or DP) chromatic number and thus implies a recent result of Molloy, and we derive this result from a more general list coloring result in the setting of `color degrees' that also implies a result of Reed and Sudakov.
Given a vector dataset $\mathcal{X}$ and a query vector $\vec{x}_q$, graph-based Approximate Nearest Neighbor Search (ANNS) aims to build a graph index $G$ and approximately return vectors with minimum distances to $\vec{x}_q$ by searching over $G$. The main drawback of graph-based ANNS is that a graph index would be too large to fit into the memory especially for a large-scale $\mathcal{X}$. To solve this, a Product Quantization (PQ)-based hybrid method called DiskANN is proposed to store a low-dimensional PQ index in memory and retain a graph index in SSD, thus reducing memory overhead while ensuring a high search accuracy. However, it suffers from two I/O issues that significantly affect the overall efficiency: (1) long routing path from an entry vertex to the query's neighborhood that results in large number of I/O requests and (2) redundant I/O requests during the routing process. We propose an optimized DiskANN++ to overcome above issues. Specifically, for the first issue, we present a query-sensitive entry vertex selection strategy to replace DiskANN's static graph-central entry vertex by a dynamically determined entry vertex that is close to the query. For the second I/O issue, we present an isomorphic mapping on DiskANN's graph index to optimize the SSD layout and propose an asynchronously optimized Pagesearch based on the optimized SSD layout as an alternative to DiskANN's beamsearch. Comprehensive experimental studies on eight real-world datasets demonstrate our DiskANN++'s superiority on efficiency. We achieve a notable 1.5 X to 2.2 X improvement on QPS compared to DiskANN, given the same accuracy constraint.
This paper studies the prediction of a target $\mathbf{z}$ from a pair of random variables $(\mathbf{x},\mathbf{y})$, where the ground-truth predictor is additive $\mathbb{E}[\mathbf{z} \mid \mathbf{x},\mathbf{y}] = f_\star(\mathbf{x}) +g_{\star}(\mathbf{y})$. We study the performance of empirical risk minimization (ERM) over functions $f+g$, $f \in F$ and $g \in G$, fit on a given training distribution, but evaluated on a test distribution which exhibits covariate shift. We show that, when the class $F$ is "simpler" than $G$ (measured, e.g., in terms of its metric entropy), our predictor is more resilient to heterogeneous covariate shifts} in which the shift in $\mathbf{x}$ is much greater than that in $\mathbf{y}$. Our analysis proceeds by demonstrating that ERM behaves qualitatively similarly to orthogonal machine learning: the rate at which ERM recovers the $f$-component of the predictor has only a lower-order dependence on the complexity of the class $G$, adjusted for partial non-indentifiability introduced by the additive structure. These results rely on a novel H\"older style inequality for the Dudley integral which may be of independent interest. Moreover, we corroborate our theoretical findings with experiments demonstrating improved resilience to shifts in "simpler" features across numerous domains.
For a set of points in $\mathbb{R}^d$, the Euclidean $k$-means problems consists of finding $k$ centers such that the sum of distances squared from each data point to its closest center is minimized. Coresets are one the main tools developed recently to solve this problem in a big data context. They allow to compress the initial dataset while preserving its structure: running any algorithm on the coreset provides a guarantee almost equivalent to running it on the full data. In this work, we study coresets in a fully-dynamic setting: points are added and deleted with the goal to efficiently maintain a coreset with which a k-means solution can be computed. Based on an algorithm from Henzinger and Kale [ESA'20], we present an efficient and practical implementation of a fully dynamic coreset algorithm, that improves the running time by up to a factor of 20 compared to our non-optimized implementation of the algorithm by Henzinger and Kale, without sacrificing more than 7% on the quality of the k-means solution.
In the Trivially Perfect Editing problem one is given an undirected graph $G = (V,E)$ and an integer $k$ and seeks to add or delete at most $k$ edges in $G$ to obtain a trivially perfect graph. In a recent work, Dumas, Perez and Todinca [Algorithmica 2023] proved that this problem admits a kernel with $O(k^3)$ vertices. This result heavily relies on the fact that the size of trivially perfect modules can be bounded by $O(k^2)$ as shown by Drange and Pilipczuk [Algorithmica 2018]. To obtain their cubic vertex-kernel, Dumas, Perez and Todinca [Algorithmica 2023] then showed that a more intricate structure, so-called \emph{comb}, can be reduced to $O(k^2)$ vertices. In this work we show that the bound can be improved to $O(k)$ for both aforementioned structures and thus obtain a kernel with $O(k^2)$ vertices. Our approach relies on the straightforward yet powerful observation that any large enough structure contains unaffected vertices whose neighborhood remains unchanged by an editing of size $k$, implying strong structural properties.
The modular subset sum problem consists of deciding, given a modulus $m$, a multiset $S$ of $n$ integers in $0..m-1$, and a target integer $t$, whether there exists a subset of $S$ with elements summing to $t \mod m $, and to report such a set if it exists. We give a simple $O(m \log m)$-time with high probability (w.h.p.) algorithm for the modular subset sum problem. This builds on and improves on a previous $O(m \log^7 m)$ w.h.p. algorithm from Axiotis, Backurs, Jin, Tzamos, and Wu (SODA 19). Our method utilizes the ADT of the dynamic strings structure of Gawrychowski et al. (SODA~18). However, as this structure is rather complicated we present a much simpler alternative which we call the Data Dependent Tree. As an application, we consider the computational version of a fundamental theorem in zero-sum Ramsey theory. The Erd\H{o}s-Ginzburg-Ziv Theorem states that a multiset of $2n - 1$ integers always contains a subset of cardinality exactly $n$ whose values sum to a multiple of $n$. We give an algorithm for finding such a subset in time $O(n \log n)$ w.h.p. which improves on an $O(n^2)$ algorithm due to Del Lungo, Marini, and Mori (Disc. Math. 09).