A subsequence of a word $w$ is a word $u$ such that $u = w[i_1] w[i_2] \dots w[i_{k}]$, for some set of indices $1 \leq i_1 < i_2 < \dots < i_k \leq \lvert w\rvert$. A word $w$ is $k$-subsequence universal over an alphabet $\Sigma$ if every word in $\Sigma^k$ appears in $w$ as a subsequence. In this paper, we study the intersection between the set of $k$-subsequence universal words over some alphabet $\Sigma$ and regular languages over $\Sigma$. We call a regular language $L$ \emph{$k$-$\exists$-subsequence universal} if there exists a $k$-subsequence universal word in $L$, and \emph{$k$-$\forall$-subsequence universal} if every word of $L$ is $k$-subsequence universal. We give algorithms solving the problems of deciding if a given regular language, represented by a finite automaton recognising it, is \emph{$k$-$\exists$-subsequence universal} and, respectively, if it is \emph{$k$-$\forall$-subsequence universal}, for a given $k$. The algorithms are FPT w.r.t.~the size of the input alphabet, and their run-time does not depend on $k$; they run in polynomial time in the number $n$ of states of the input automaton when the size of the input alphabet is $O(\log n)$. Moreover, we show that the problem of deciding if a given regular language is \emph{$k$-$\exists$-subsequence universal} is NP-complete, when the language is over a large alphabet. Further, we provide algorithms for counting the number of $k$-subsequence universal words (paths) accepted by a given deterministic (respectively, nondeterministic) finite automaton, and ranking an input word (path) within the set of $k$-subsequence universal words accepted by a given finite automaton.
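As a minimal illustration of the definition of $k$-subsequence universality for a single word (not of the automata algorithms announced above), the following Python sketch uses the classical greedy arch factorisation: scan $w$ from left to right and close an "arch" each time every letter of $\Sigma$ has been seen; $w$ is $k$-subsequence universal exactly when at least $k$ arches are closed. The function names are hypothetical and chosen only for this example.

```python
def universality_index(w, alphabet):
    """Return the largest k such that every word of length k over
    `alphabet` is a subsequence of `w` (greedy arch factorisation)."""
    sigma = set(alphabet)
    if not sigma:
        raise ValueError("alphabet must be non-empty")
    seen = set()      # letters seen in the current, still open arch
    arches = 0        # number of completed arches so far
    for ch in w:
        if ch in sigma:
            seen.add(ch)
            if len(seen) == len(sigma):
                arches += 1
                seen.clear()
    return arches


def is_k_universal(w, alphabet, k):
    """Hypothetical helper: is `w` k-subsequence universal over `alphabet`?"""
    return universality_index(w, alphabet) >= k
```

For instance, `abba` over $\{a,b\}$ closes two arches (`ab` and `ba`), whereas `aabb` closes only one, reflecting that `ba` is not a subsequence of `aabb`.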
A minimum sum vertex cover of an $n$-vertex graph $G$ is a bijection $\phi : V(G) \to [n]$ that minimizes the cost $\sum_{\{u,v\} \in E(G)} \min \{\phi(u), \phi(v) \}$. Finding a minimum sum vertex cover of a graph (the MSVC problem) is NP-hard. MSVC is well studied in the realm of approximation algorithms. The best-known approximation factor in polynomial time for the problem is $16/9$ [Bansal, Batra, Farhadi, and Tetali, SODA 2021]. Recently, Stankovic [APPROX/RANDOM 2022] proved that achieving an approximation ratio better than $1.014$ for MSVC is NP-hard, assuming the Unique Games Conjecture. We study the MSVC problem from the perspective of parameterized algorithms. The parameters we consider are the size of a minimum vertex cover and the size of a minimum clique modulator of the input graph. We obtain the following results. 1. MSVC can be solved in $2^{2^{O(k)}} n^{O(1)}$ time, where $k$ is the size of a minimum vertex cover. 2. MSVC can be solved in $f(k)\cdot n^{O(1)}$ time for some computable function $f$, where $k$ is the size of a minimum clique modulator.
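To make the cost function of MSVC concrete, here is a small Python sketch that evaluates $\sum_{\{u,v\} \in E(G)} \min\{\phi(u), \phi(v)\}$ for a given ordering and finds an optimum by exhaustive search over all $n!$ bijections. This brute force is only an illustration for tiny graphs; it is not one of the parameterized algorithms stated above, and the function names are hypothetical.

```python
from itertools import permutations


def msvc_cost(edges, phi):
    """Cost of the ordering `phi` (a dict vertex -> position in 1..n)."""
    return sum(min(phi[u], phi[v]) for u, v in edges)


def brute_force_msvc(vertices, edges):
    """Try every bijection V -> [n]; exponential, for illustration only."""
    best_cost, best_phi = None, None
    for order in permutations(vertices):
        phi = {v: i for i, v in enumerate(order, start=1)}
        cost = msvc_cost(edges, phi)
        if best_cost is None or cost < best_cost:
            best_cost, best_phi = cost, phi
    return best_cost, best_phi
```

On a star, placing the centre first covers every edge at cost 1, so a star with three leaves has optimal cost 3.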
The $b$-symbol metric is a generalization of the Hamming metric. Linear codes, in the $b$-symbol metric, have been used in the read channel whose outputs consist of $b$ consecutive symbols. The Griesmer bound outperforms the Singleton bound for $\mathbb{F}_q$-linear codes in the Hamming metric, when $q$ is fixed and the length is large enough. This scenario is also applicable in the $b$-symbol metric. Shi, Zhu, and Helleseth recently made a conjecture on cyclic codes in the $b$-symbol metric. In this paper, we present the $b$-symbol Griesmer bound for linear codes by concatenating linear codes and simplex codes. Based on cyclic codes and extended cyclic codes, we propose two families of distance-optimal linear codes with respect to the $b$-symbol Griesmer bound.
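For readers unfamiliar with the $b$-symbol metric, the following Python sketch computes the $b$-symbol weight and distance under the usual cyclic-window convention: position $i$ of a length-$n$ word is read as the tuple $(x_i, x_{i+1}, \dots, x_{i+b-1})$ with indices taken modulo $n$, and the weight counts the nonzero tuples. The sketch assumes words are given as sequences of integers with $0$ as the zero symbol; the function names are hypothetical, and the $b$-symbol Griesmer bound itself is not reproduced here.

```python
def b_symbol_weight(x, b):
    """Number of cyclic length-b windows of `x` that are not all zero."""
    n = len(x)
    if not 1 <= b <= n:
        raise ValueError("b must satisfy 1 <= b <= len(x)")
    return sum(
        1 for i in range(n)
        if any(x[(i + j) % n] != 0 for j in range(b))
    )


def b_symbol_distance(x, y, b):
    """Number of cyclic length-b windows in which `x` and `y` differ."""
    n = len(x)
    if len(y) != n:
        raise ValueError("words must have equal length")
    if not 1 <= b <= n:
        raise ValueError("b must satisfy 1 <= b <= len(x)")
    return sum(
        1 for i in range(n)
        if any(x[(i + j) % n] != y[(i + j) % n] for j in range(b))
    )
```

With $b = 1$ both functions reduce to the Hamming weight and the Hamming distance, which is the sense in which the $b$-symbol metric generalizes the Hamming metric.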
Consider the problem of estimating a random variable $X$ from noisy observations $Y = X+ Z$, where $Z$ is standard normal, under the $L^1$ fidelity criterion. It is well known that the optimal Bayesian estimator in this setting is the conditional median. This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian. Along the way, several other results are presented. In particular, it is demonstrated that if the conditional distribution $P_{X|Y=y}$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution. Additionally, we consider other $L^p$ losses and observe the following phenomenon: for $p \in [1,2]$, Gaussian is the only prior distribution that induces a linear optimal Bayesian estimator, and for $p \in (2,\infty)$, infinitely many prior distributions on $X$ can induce linearity. Finally, extensions are provided to encompass noise models leading to conditional distributions from certain exponential families.
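The easy direction of the characterization can be checked directly. As a worked example, assume (for illustration only) that $X \sim \mathcal{N}(0,\sigma^2)$ is independent of $Z$; the symbol $\sigma^2$ is introduced here and is not part of the statement above. Then the posterior is Gaussian,
\[
  X \mid Y = y \;\sim\; \mathcal{N}\!\left( \frac{\sigma^2}{1+\sigma^2}\, y,\; \frac{\sigma^2}{1+\sigma^2} \right),
\]
and since a Gaussian distribution is symmetric about its mean, the conditional median coincides with the conditional mean:
\[
  \operatorname{med}(X \mid Y = y) \;=\; \mathbb{E}[X \mid Y = y] \;=\; \frac{\sigma^2}{1+\sigma^2}\, y .
\]
So a Gaussian prior does induce a linear conditional median; the result stated above is the converse, namely that no other prior does.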
Let $X$ be a finite set. A family $P$ of subsets of $X$ is called a convex geometry with ground set $X$ if (1) $\emptyset, X\in P$; (2) $A\cap B\in P$ whenever $A,B\in P$; and (3) if $A\in P$ and $A\neq X$, there is an element $\alpha\in X-A$ such that $A\cup\{\alpha\}\in P$. As a non-empty family of sets, a convex geometry has a well defined VC-dimension. In the literature, a second parameter, called convex dimension, has been defined expressly for these structures. Partially ordered by inclusion, a convex geometry is also a poset, and four additional dimension parameters have been defined for this larger class, called Dushnik-Miller dimension, Boolean dimension, local dimension, and fractional dimension, respectively. For each pair of these six dimension parameters, we investigate whether there is an infinite class of convex geometries on which one parameter is bounded and the other is not.
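The three defining conditions of a convex geometry can be checked mechanically for a small family. The Python sketch below tests conditions (1) to (3) by direct enumeration; it is meant only to make the definition concrete, and the function name is hypothetical.

```python
def is_convex_geometry(ground, family):
    """Check conditions (1)-(3) for `family`, a collection of subsets
    of the finite set `ground`."""
    X = frozenset(ground)
    P = {frozenset(s) for s in family}
    # Every member must be a subset of the ground set.
    if any(not s <= X for s in P):
        return False
    # (1) the empty set and the ground set belong to P
    if frozenset() not in P or X not in P:
        return False
    # (2) P is closed under pairwise intersection
    for a in P:
        for b in P:
            if a & b not in P:
                return False
    # (3) every A != X in P can be extended by one element within P
    for a in P:
        if a != X and not any(a | {x} in P for x in X - a):
            return False
    return True
```

A maximal chain such as $\emptyset \subset \{1\} \subset \{1,2\} \subset \{1,2,3\}$ passes all three checks, while the family $\{\emptyset, \{1,2\}\}$ on ground set $\{1,2\}$ fails condition (3).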
A matroid $M$ is an ordered pair $(E,I)$, where $E$ is a finite set called the ground set and $I\subset 2^{E}$ is a collection of subsets, called the independent sets, which satisfies the conditions: (i) $\emptyset \in I$, (ii) $I'\subset I \in I$ implies $I'\in I$, and (iii) $I_1,I_2 \in I$ and $|I_1| < |I_2|$ implies that there is an $e\in I_2$ such that $I_1\cup \{e\} \in I$. The rank $rank(M)$ of a matroid $M$ is the maximum size of an independent set. We say that a matroid $M=(E,I)$ is representable over the reals if there is a map $\varphi \colon E \rightarrow \mathbb{R}^{rank(M)}$ such that $I\in I$ if and only if $\varphi(I)$ forms a linearly independent set. We study the problem of matroid realizability over the reals. Given a matroid $M$, we ask whether there is a set of points in the Euclidean space representing $M$. We show that matroid realizability is $\exists \mathbb R$-complete, already for matroids of rank 3. The complexity class $\exists \mathbb R$ can be defined as the family of algorithmic problems that are polynomial-time equivalent to determining if a multivariate polynomial with integer coefficients has a real root. Our methods are similar to previous methods from the literature. Yet, the result itself was never pointed out and there is no proof readily available in the language of computer science.
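The definitions above can be made concrete for small examples. The following Python sketch checks axioms (i) to (iii) for an explicitly listed family of independent sets, and verifies whether a given candidate map $\varphi$ with rational coordinates represents the matroid. It only verifies a supplied map by exhaustive enumeration; it does not decide realizability, which is the hard problem discussed above. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def is_matroid(ground, independent):
    """Check axioms (i)-(iii) for a family of subsets of `ground`."""
    E = frozenset(ground)
    fam = {frozenset(s) for s in independent}
    if frozenset() not in fam or any(not s <= E for s in fam):
        return False
    # (ii) hereditary: checking single-element removals suffices by induction
    if any(s - {e} not in fam for s in fam for e in s):
        return False
    # (iii) exchange property
    for a in fam:
        for b in fam:
            if len(a) < len(b) and not any(a | {e} in fam for e in b - a):
                return False
    return True


def rank_of_vectors(vectors):
    """Rank of a list of vectors, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def represents(ground, independent, phi):
    """Does `phi` (dict element -> vector) satisfy: S is independent
    if and only if the vectors phi(S) are linearly independent?"""
    fam = {frozenset(s) for s in independent}
    elems = list(ground)
    for size in range(len(elems) + 1):
        for subset in combinations(elems, size):
            lin_indep = rank_of_vectors([phi[e] for e in subset]) == size
            if lin_indep != (frozenset(subset) in fam):
                return False
    return True
```

For example, the uniform matroid on $\{a,b,c\}$ whose independent sets are all subsets of size at most 2 is represented by $a \mapsto (1,0)$, $b \mapsto (0,1)$, $c \mapsto (1,1)$.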
This paper investigates the rate-distortion function, under a squared error distortion $D$, for an $n$-dimensional random vector uniformly distributed on an $(n-1)$-sphere of radius $R$. First, an expression for the rate-distortion function is derived for any values of $n$, $D$, and $R$. Second, two types of asymptotics with respect to the rate-distortion function of a Gaussian source are characterized. More specifically, these asymptotics concern the low-distortion regime (that is, $D \to 0$) and the high-dimensional regime (that is, $n \to \infty$).
The Learning With Errors ($\mathsf{LWE}$) problem asks to find $\mathbf{s}$ from an input of the form $(\mathbf{A}, \mathbf{b} = \mathbf{A}\mathbf{s}+\mathbf{e}) \in (\mathbb{Z}/q\mathbb{Z})^{m \times n} \times (\mathbb{Z}/q\mathbb{Z})^{m}$, for a vector $\mathbf{e}$ that has small-magnitude entries. In this work, we do not focus on solving $\mathsf{LWE}$ but on the task of sampling instances. As these are extremely sparse in their range, it may seem plausible that the only way to proceed is to first create $\mathbf{s}$ and $\mathbf{e}$ and then set $\mathbf{b} = \mathbf{A}\mathbf{s}+\mathbf{e}$. In particular, such an instance sampler knows the solution. This raises the question whether it is possible to obliviously sample $(\mathbf{A}, \mathbf{A}\mathbf{s}+\mathbf{e})$, namely, without knowing the underlying $\mathbf{s}$. A variant of the assumption that oblivious $\mathsf{LWE}$ sampling is hard has been used in a series of works constructing Succinct Non-interactive Arguments of Knowledge (SNARKs) in the standard model. As the assumption is related to $\mathsf{LWE}$, these SNARKs have been conjectured to be secure in the presence of quantum adversaries. Our main result is a quantum polynomial-time algorithm that samples well-distributed $\mathsf{LWE}$ instances while provably not knowing the solution, under the assumption that $\mathsf{LWE}$ is hard. Moreover, the approach works for a vast range of $\mathsf{LWE}$ parametrizations, including those used in the above-mentioned SNARKs.
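The "natural" sampler described above, which first creates $\mathbf{s}$ and $\mathbf{e}$ and then sets $\mathbf{b} = \mathbf{A}\mathbf{s}+\mathbf{e}$, can be sketched in a few lines of Python. This is the non-oblivious baseline, not the quantum algorithm of this work. The error distribution (uniform on $[-\beta, \beta]$ with a hypothetical parameter `bound`), the toy parameters, and the use of Python's non-cryptographic `random` module are all assumptions made purely for illustration.

```python
import random


def sample_lwe(n, m, q, bound, rng=None):
    """Non-oblivious sampler: returns (A, b, s, e) with b = A s + e mod q.
    The sampler necessarily knows the secret s and the error e."""
    rng = rng or random.Random()
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    s = [rng.randrange(q) for _ in range(n)]
    # Assumption: small errors drawn uniformly from [-bound, bound].
    e = [rng.randint(-bound, bound) for _ in range(m)]
    b = [
        (sum(a * x for a, x in zip(row, s)) + err) % q
        for row, err in zip(A, e)
    ]
    return A, b, s, e


def centered(x, q):
    """Representative of x mod q in a window of length q centred at 0."""
    return ((x + q // 2) % q) - q // 2
```

Given $\mathbf{s}$, the residual $\mathbf{b} - \mathbf{A}\mathbf{s} \bmod q$ recovers the small error vector, which is exactly the knowledge an oblivious sampler must not have.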
$k$-clique listing is a vital graph mining operator with diverse applications in various networks. The state-of-the-art algorithms all adopt a branch-and-bound (BB) framework with a vertex-oriented branching strategy (called VBBkC), which forms a sub-branch by expanding a partial $k$-clique with a vertex. These algorithms have a time complexity of $O(k m (\delta/2)^{k-2})$, where $m$ is the number of edges in the graph and $\delta$ is the degeneracy of the graph. In this paper, we propose a BB framework with a new edge-oriented branching strategy (called EBBkC), which forms a sub-branch by expanding a partial $k$-clique with two vertices that are connected to each other (which corresponds to an edge). We explore various edge orderings for EBBkC such that it achieves a time complexity of $O(\delta m + k m (\tau/2)^{k-2})$, where $\tau$ is an integer related to the maximum truss number of the graph and we have $\tau < \delta$. The time complexity of EBBkC is better than that of VBBkC algorithms for $k>3$ since both $O(\delta m)$ and $O(k m (\tau/2)^{k-2})$ are bounded by $O(k m (\delta/2)^{k-2})$. Furthermore, we develop specialized algorithms for sub-branches on dense graphs so that we can early-terminate them and apply the specialized algorithms. We conduct extensive experiments on 19 real graphs, and the results show that our newly developed EBBkC-based algorithms with the early termination technique consistently and largely outperform the state-of-the-art (VBBkC-based) algorithms.
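The idea of edge-oriented branching can be illustrated with a deliberately simple Python sketch: fix a total order on the edges, and let each branch extend the partial clique by an edge $(u,v)$ whose rank exceeds that of the previous branching edge, keeping as candidates only those vertices joined to both $u$ and $v$ by edges of higher rank. Each $k$-clique is then reported exactly once, in the branch of its lowest-ranked edge. The sketch uses plain input order as the edge ordering, which is an assumption made for brevity; it does not implement the truss-based orderings or the early termination technique, and it does not attain the stated time bounds. Vertices are assumed to be mutually comparable (for example, integers).

```python
from itertools import combinations


def list_k_cliques_edge_branching(edges, k):
    """Return all k-cliques (k >= 2) as sorted tuples, branching on edges."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rank = {}  # edge -> position in the (assumed) edge ordering
    for u, v in edges:
        if u != v:
            rank.setdefault(frozenset((u, v)), len(rank))
    vertices = {x for edge in rank for x in edge}

    def expand(partial, candidates, min_rank, remaining):
        if remaining == 0:
            yield tuple(sorted(partial))
            return
        if remaining == 1:
            # Every candidate is adjacent to all vertices in `partial`.
            for w in candidates:
                yield tuple(sorted(partial + [w]))
            return
        for u, v in combinations(sorted(candidates), 2):
            r = rank.get(frozenset((u, v)), -1)
            if r <= min_rank:
                continue  # not an edge, or ranked too low for this branch
            common = {
                w for w in candidates
                if w != u and w != v
                and rank.get(frozenset((u, w)), -1) > r
                and rank.get(frozenset((v, w)), -1) > r
            }
            yield from expand(partial + [u, v], common, r, remaining - 2)

    return sorted(expand([], vertices, -1, k))
```

Because each branch adds two vertices at once, the recursion depth is roughly $k/2$ rather than $k$, which is the structural difference from vertex-oriented branching.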
An infinite sequence of sets $\left\{B_{n}\right\}_{n\in\mathbb{N}}$ is said to be a heterochromatic sequence from an infinite sequence of families $\left\{ \mathcal{F}_{n} \right\}_{n \in \mathbb{N}}$, if there exists a strictly increasing sequence of natural numbers $\left\{ i_{n}\right\}_{n \in \mathbb{N}}$ such that for all $n \in \mathbb{N}$ we have $B_{n} \in \mathcal{F}_{i_{n}}$. In this paper, we have proved that if for each $n\in\mathbb{N}$, $\mathcal{F}_n$ is a family of {\em nicely shaped} convex sets in $\mathbb{R}^d$ such that each heterochromatic sequence $\left\{B_{n}\right\}_{n\in\mathbb{N}}$ from $\left\{ \mathcal{F}_{n} \right\}_{n \in \mathbb{N}}$ contains at least $k+2$ sets that can be pierced by a single $k$-flat ($k$-dimensional affine space) then all but finitely many families in $\left\{\mathcal{F}_{n}\right\}_{n\in \mathbb{N}}$ can be pierced by finitely many $k$-flats. This result can be considered as a {\em countably colorful} generalization of the $(\aleph_0, k+2)$-theorem proved by Keller and Perles (Symposium on Computational Geometry 2022). We have also established the tightness of our result by proving a number of no-go theorems.
The notion of $\alpha$-equivalence between $\lambda$-terms is commonly used to identify terms that are considered equal. However, due to the primitive treatment of free variables, this notion falls short when comparing subterms occurring within a larger context. Depending on the usage of the Barendregt convention (choosing different variable names for all involved binders), it will equate either too few or too many subterms. We introduce a formal notion of context-sensitive $\alpha$-equivalence, where two open terms can be compared within a context that resolves their free variables. We show that this equivalence coincides exactly with the notion of bisimulation equivalence. Furthermore, we present an efficient $O(n\log n)$ runtime algorithm that identifies $\lambda$-terms modulo context-sensitive $\alpha$-equivalence, improving upon a previously established $O(n\log^2 n)$ bound for hashing modulo ordinary $\alpha$-equivalence by Maziarz et al. Hashing $\lambda$-terms is useful in many applications that require common subterm elimination and structure sharing. We employ the algorithm to obtain a large-scale, densely packed, interconnected graph of mathematical knowledge from the Coq proof assistant for machine learning purposes.
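The role of the context can be illustrated with the standard de Bruijn translation, sketched below in Python. Terms are encoded as nested tuples `("var", x)`, `("lam", x, body)` and `("app", f, a)`; this encoding, the function names, and the way a context is passed as a tuple of enclosing binder names are assumptions made for this example. The sketch is a naive quadratic-time comparison of ordinary $\alpha$-equivalence plus a simple context argument; it is not the $O(n\log n)$ algorithm announced above and need not match its formal definition.

```python
def to_de_bruijn(term, bound=()):
    """Replace bound variable names by de Bruijn indices.
    `bound` lists the enclosing binder names, outermost first."""
    tag = term[0]
    if tag == "var":
        name = term[1]
        # Search the innermost binder first.
        for depth, binder in enumerate(reversed(bound)):
            if binder == name:
                return ("bvar", depth)
        return ("fvar", name)  # free variables keep their names
    if tag == "lam":
        return ("lam", to_de_bruijn(term[2], bound + (term[1],)))
    if tag == "app":
        return ("app", to_de_bruijn(term[1], bound), to_de_bruijn(term[2], bound))
    raise ValueError(f"unknown term constructor: {tag!r}")


def alpha_equivalent(s, t, ctx_s=(), ctx_t=()):
    """Compare `s` and `t`, each read inside its own binder context."""
    return to_de_bruijn(s, ctx_s) == to_de_bruijn(t, ctx_t)
```

Read as isolated open terms, the subterms `x` and `y` differ, since they are distinct free variables. Read inside the contexts $\lambda x.\,[\cdot]$ and $\lambda y.\,[\cdot]$ respectively, both become the index 0 and are identified, which is the kind of comparison that ordinary $\alpha$-equivalence on subterms cannot express.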