If $G$ is a group, we say a subset $S$ of $G$ is product-free if the equation $xy=z$ has no solutions with $x,y,z \in S$. For $D \in \mathbb{N}$, a group $G$ is said to be $D$-quasirandom if the minimal dimension of a nontrivial complex irreducible representation of $G$ is at least $D$. Gowers showed that in a $D$-quasirandom finite group $G$, the maximal size of a product-free set is at most $|G|/D^{1/3}$. This disproved a longstanding conjecture of Babai and S\'os from 1985. For the special unitary group, $G=SU(n)$, Gowers observed that his argument yields an upper bound of $n^{-1/3}$ on the measure of a measurable product-free subset. In this paper, we improve Gowers' upper bound to $\exp(-cn^{1/3})$, where $c>0$ is an absolute constant. In fact, we establish something stronger, namely, product-mixing for measurable subsets of $SU(n)$ with measure at least $\exp(-cn^{1/3})$; for this product-mixing result, the $n^{1/3}$ in the exponent is sharp. Our approach involves introducing novel hypercontractive inequalities, which imply that the non-Abelian Fourier spectrum of the indicator function of a small set concentrates on high-dimensional irreducible representations. Our hypercontractive inequalities are obtained via methods from representation theory, harmonic analysis, random matrix theory and differential geometry. We generalize our hypercontractive inequalities from $SU(n)$ to an arbitrary $D$-quasirandom compact connected Lie group for $D$ at least an absolute constant, thereby extending our results on product-free sets to such groups. We also demonstrate various other applications of our inequalities to geometry (viz., non-Abelian Brunn-Minkowski type inequalities), mixing times, and the theory of growth in compact Lie groups.
Bayesian inference for Dirichlet-Multinomial (DM) models has a long and important history. The concentration parameter $\alpha$ is pivotal in smoothing category probabilities within the multinomial distribution and is crucial for the inference afterward. Due to the lack of a tractable form of its marginal likelihood, $\alpha$ is often chosen in an ad-hoc manner, or estimated using approximation algorithms. A constant $\alpha$ often leads to inadequate smoothing of probabilities, particularly for sparse compositional count datasets. In this paper, we introduce a novel class of prior distributions facilitating conjugate updating of the concentration parameter, allowing for full Bayesian inference for DM models. Our methodology is based on fast residue computation and admits closed-form posterior moments in specific scenarios. Additionally, our prior provides continuous shrinkage with its heavy tail and substantial mass around zero, ensuring adaptability to the sparsity or quasi-sparsity of the data. We demonstrate the usefulness of our approach on both simulated examples and on real-world applications. Finally, we conclude with directions for future research.
The Weisfeiler-Leman (WL) dimension of a graph parameter $f$ is the minimum $k$ such that, if $G_1$ and $G_2$ are indistinguishable by the $k$-dimensional WL-algorithm then $f(G_1)=f(G_2)$. The WL-dimension of $f$ is $\infty$ if no such $k$ exists. We study the WL-dimension of graph parameters characterised by the number of answers from a fixed conjunctive query to the graph. Given a conjunctive query $\varphi$, we quantify the WL-dimension of the function that maps every graph $G$ to the number of answers of $\varphi$ in $G$. The works of Dvor\'ak (J. Graph Theory 2010), Dell, Grohe, and Rattan (ICALP 2018), and Neuen (ArXiv 2023) have answered this question for full conjunctive queries, which are conjunctive queries without existentially quantified variables. For such queries $\varphi$, the WL-dimension is equal to the treewidth of the Gaifman graph of $\varphi$. In this work, we give a characterisation that applies to all conjunctive qureies. Given any conjunctive query $\varphi$, we prove that its WL-dimension is equal to the semantic extension width $\mathsf{sew}(\varphi)$, a novel width measure that can be thought of as a combination of the treewidth of $\varphi$ and its quantified star size, an invariant introduced by Durand and Mengel (ICDT 2013) describing how the existentially quantified variables of $\varphi$ are connected with the free variables. Using the recently established equivalence between the WL-algorithm and higher-order Graph Neural Networks (GNNs) due to Morris et al. (AAAI 2019), we obtain as a consequence that the function counting answers to a conjunctive query $\varphi$ cannot be computed by GNNs of order smaller than $\mathsf{sew}(\varphi)$.
We consider reachability decision problems for linear dynamical systems: Given a linear map on $\mathbb{R}^d$ , together with source and target sets, determine whether there is a point in the source set whose orbit, obtained by repeatedly applying the linear map, enters the target set. When the source and target sets are semialgebraic, this problem can be reduced to a point-to-polytope reachability question. The latter is generally believed not to be substantially harder than the well-known Skolem and Positivity Problems. The situation is markedly different for multiple reachability, i.e. the question of whether the orbit visits the target set at least m times, for some given positive integer m. In this paper, we prove that when the source set is semialgebraic and the target set consists of a hyperplane, multiple reachability is undecidable; in fact we already obtain undecidability in ambient dimension d = 10 and with fixed m = 9. Moreover, as we observe that procedures for dimensions 3 up to 9 would imply strong results pertaining to effective solutions of Diophantine equations, we mainly focus on the affine plane ($\mathbb{R}^2$). We obtain two main positive results. We show that multiple reachability is decidable for halfplane targets, and that it is also decidable for general semialgebraic targets, provided the linear map is a rotation. The latter result involves a new method, based on intersections of algebraic subgroups with subvarieties, due to Bombieri and Zannier.
A subset $S$ of vertices in a graph $G$ is a secure dominating set of $G$ if $S$ is a dominating set of $G$ and, for each vertex $u \not\in S$, there is a vertex $v \in S$ such that $uv$ is an edge and $(S \setminus \{v\}) \cup \{u\}$ is also a dominating set of $G$. The secure domination number of $G$, denoted by $\gamma_{s}(G)$, is the cardinality of a smallest secure dominating sets of $G$. In this paper, we prove that for any outerplanar graph with $n \geq 4$ vertices, $\gamma_{s}(G) \geq (n+4)/5$ and the bound is tight.
We present the general forms of piece-wise functions on partitioned domains satisfying an intrinsic $C^0$ or $C^1$ continuity across the sub-domain boundaries. These general forms are constructed based on a strategy stemming from the theory of functional connections, and we refer to partitioned domains endowed with these general forms as functionally connected elements (FCE). We further present a method, incorporating functionally connected elements and a least squares collocation approach, for solving boundary and initial value problems. This method exhibits a spectral-like accuracy, with the free functions involved in the FCE form represented by polynomial bases or by non-polynomial bases of quasi-random sinusoidal functions. The FCE method offers a unique advantage over traditional element-based methods for boundary value problems involving relative boundary conditions. A number of linear and nonlinear numerical examples in one and two dimensions are presented to demonstrate the performance of the FCE method developed herein.
The Path Contraction and Cycle Contraction problems take as input an undirected graph $G$ with $n$ vertices, $m$ edges and an integer $k$ and determine whether one can obtain a path or a cycle, respectively, by performing at most $k$ edge contractions in $G$. We revisit these NP-complete problems and prove the following results. Path Contraction admits an algorithm running in $\mathcal{O}^*(2^{k})$ time. This improves over the current algorithm known for the problem [Algorithmica 2014]. Cycle Contraction admits an algorithm running in $\mathcal{O}^*((2 + \epsilon_{\ell})^k)$ time where $0 < \epsilon_{\ell} \leq 0.5509$ is inversely proportional to $\ell = n - k$. Central to these results is an algorithm for a general variant of Path Contraction, namely, Path Contraction With Constrained Ends. We also give an $\mathcal{O}^*(2.5191^n)$-time algorithm to solve the optimization version of Cycle Contraction. Next, we turn our attention to restricted graph classes and show the following results. Path Contraction on planar graphs admits a polynomial-time algorithm. Path Contraction on chordal graphs does not admit an algorithm running in time $\mathcal{O}(n^{2-\epsilon} \cdot 2^{o(tw)})$ for any $\epsilon > 0$, unless the Orthogonal Vectors Conjecture fails. Here, $tw$ is the treewidth of the input graph. The second result complements the $\mathcal{O}(nm)$-time, i.e., $\mathcal{O}(n^2 \cdot tw)$-time, algorithm known for the problem [Discret. Appl. Math. 2014].
In the Pattern Masking for Dictionary Matching (PMDM) problem, we are given a dictionary $\mathcal{D}$ of $d$ strings, each of length $\ell$, a query string $q$ of length $\ell$, and a positive integer $z$, and we are asked to compute a smallest set $K\subseteq\{1,\ldots,\ell\}$, so that if $q[i]$, for all $i\in K$, is replaced by a wildcard, then $q$ matches at least $z$ strings from $\mathcal{D}$. The PMDM problem lies at the heart of two important applications featured in large-scale real-world systems: record linkage of databases that contain sensitive information, and query term dropping. In both applications, solving PMDM allows for providing data utility guarantees as opposed to existing approaches. We first show, through a reduction from the well-known $k$-Clique problem, that a decision version of the PMDM problem is NP-complete, even for strings over a binary alphabet. We present a data structure for PMDM that answers queries over $\mathcal{D}$ in time $\mathcal{O}(2^{\ell/2}(2^{\ell/2}+\tau)\ell)$ and requires space $\mathcal{O}(2^{\ell}d^2/\tau^2+2^{\ell/2}d)$, for any parameter $\tau\in[1,d]$. We also approach the problem from a more practical perspective. We show an $\mathcal{O}((d\ell)^{k/3}+d\ell)$-time and $\mathcal{O}(d\ell)$-space algorithm for PMDM if $k=|K|=\mathcal{O}(1)$. We generalize our exact algorithm to mask multiple query strings simultaneously. We complement our results by showing a two-way polynomial-time reduction between PMDM and the Minimum Union problem [Chlamt\'{a}\v{c} et al., SODA 2017]. This gives a polynomial-time $\mathcal{O}(d^{1/4+\epsilon})$-approximation algorithm for PMDM, which is tight under plausible complexity conjectures.
Let $\mathcal{P}$ be a simple polygon with $m$ vertices and let $P$ be a set of $n$ points inside $\mathcal{P}$. We prove that there exists, for any $\varepsilon>0$, a set $\mathcal{C} \subset P$ of size $O(1/\varepsilon^2)$ such that the following holds: for any query point $q$ inside the polygon $\mathcal{P}$, the geodesic distance from $q$ to its furthest neighbor in $\mathcal{C}$ is at least $1-\varepsilon$ times the geodesic distance to its further neighbor in $P$. Thus the set $\mathcal{C}$ can be used for answering $\varepsilon$-approximate furthest-neighbor queries with a data structure whose storage requirement is independent of the size of $P$. The coreset can be constructed in $O\left(\frac{1}{\varepsilon} \left( n\log(1/\varepsilon) + (n+m)\log(n+m)\right) \right)$ time.
We consider the Distinct Shortest Walks problem. Given two vertices $s$ and $t$ of a graph database $\mathcal{D}$ and a regular path query, enumerate all walks of minimal length from $s$ to $t$ that carry a label that conforms to the query. Usual theoretical solutions turn out to be inefficient when applied to graph models that are closer to real-life systems, in particular because edges may carry multiple labels. Indeed, known algorithms may repeat the same answer exponentially many times. We propose an efficient algorithm for multi-labelled graph databases. The preprocessing runs in $O{|\mathcal{D}|\times|\mathcal{A}|}$ and the delay between two consecutive outputs is in $O(\lambda\times|\mathcal{A}|)$, where $\mathcal{A}$ is a nondeterministic automaton representing the query and $\lambda$ is the minimal length. The algorithm can handle $\varepsilon$-transitions in $\mathcal{A}$ or queries given as regular expressions at no additional cost.
We give an isomorphism test that runs in time $n^{\operatorname{polylog}(h)}$ on all $n$-vertex graphs excluding some $h$-vertex vertex graph as a topological subgraph. Previous results state that isomorphism for such graphs can be tested in time $n^{\operatorname{polylog}(n)}$ (Babai, STOC 2016) and $n^{f(h)}$ for some function $f$ (Grohe and Marx, SIAM J. Comp., 2015). Our result also unifies and extends previous isomorphism tests for graphs of maximum degree $d$ running in time $n^{\operatorname{polylog}(d)}$ (SIAM J. Comp., 2023) and for graphs of Hadwiger number $h$ running in time $n^{\operatorname{polylog}(h)}$ (SIAM J. Comp., 2023).