An $(m,n,R)$-de Bruijn covering array (dBCA) is a doubly periodic $M \times N$ array over an alphabet of size $q$ such that the set of all its $m \times n$ windows form a covering code with radius $R$. An upper bound of the smallest array area of an $(m,n,R)$-dBCA is provided using a probabilistic technique which is similar to the one that was used for an upper bound on the length of a de Bruijn covering sequence. A folding technique to construct a dBCA from a de Bruijn covering sequence or de Bruijn covering sequences code is presented. Several new constructions that yield shorter de Bruijn covering sequences and $(m,n,R)$-dBCAs with smaller areas are also provided. These constructions are mainly based on sequences derived from cyclic codes, self-dual sequences, primitive polynomials, an interleaving technique, folding, and mutual shifts of sequences with the same covering radius. Finally, constructions of de Bruijn covering sequences codes are also discussed.
Boolean function $F(x,y)$ for $x,y \in \{0,1\}^n$ is an XOR function if $F(x,y)=f(x\oplus y)$ for some function $f$ on $n$ input bits, where $\oplus$ is a bit-wise XOR. XOR functions are relevant in communication complexity, partially for allowing Fourier analytic technique. For total XOR functions it is known that deterministic communication complexity of $F$ is closely related to parity decision tree complexity of $f$. Montanaro and Osbourne (2009) observed that one-sided communication complexity $D_{cc}^{\rightarrow}(F)$ of $F$ is exactly equal to nonadaptive parity decision tree complexity $NADT^{\oplus}(f)$ of $f$. Hatami et al. (2018) showed that unrestricted communication complexity of $F$ is polynomially related to parity decision tree complexity of $f$. We initiate the studies of a similar connection for partial functions. We show that in case of one-sided communication complexity whether these measures are equal, depends on the number of undefined inputs of $f$. On the one hand, if $D_{cc}^{\rightarrow}(F)=t$ and $f$ is undefined on at most $O(\frac{2^{n-t}}{\sqrt{n-t}})$, then $NADT^{\oplus}(f)=t$. On the other hand, for a wide range of values of $D_{cc}^{\rightarrow}(F)$ and $NADT^{\oplus}(f)$ (from constant to $n-2$) we provide partial functions for which $D_{cc}^{\rightarrow}(F) < NADT^{\oplus}(f)$. In particular, we provide a function with an exponential gap between the two measures. Our separation results translate to the case of two-sided communication complexity as well, in particular showing that the result of Hatami et al. (2018) cannot be generalized to partial functions. Previous results for total functions heavily rely on Boolean Fourier analysis and the technique does not translate to partial functions. For the proofs of our results we build a linear algebraic framework instead. Separation results are proved through the reduction to covering codes.
This paper presents an $O^{*}(1.42^{n})$ time algorithm for the Maximum Cut problem on split graphs, along with a subexponential time algorithm for its decision variant.
A preference matrix $M$ has an entry for each pair of candidates in an election whose value $p_{ij}$ represents the proportion of voters that prefer candidate $i$ over candidate $j$. The matrix is rationalizable if it is consistent with a set of voters whose preferences are total orders. A celebrated open problem asks for a concise characterization of rationalizable preference matrices. In this paper, we generalize this matrix rationalizability question and study when a preference matrix is consistent with a set of voters whose preferences are partial orders of width $\alpha$. The width (the maximum cardinality of an antichain) of the partial order is a natural measure of the rationality of a voter; indeed, a partial order of width $1$ is a total order. Our primary focus concerns the rationality number, the minimum width required to rationalize a preference matrix. We present two main results. The first concerns the class of half-integral preference matrices, where we show the key parameter required in evaluating the rationality number is the chromatic number of the undirected unanimity graph associated with the preference matrix $M$. The second concerns the class of integral preference matrices, where we show the key parameter now is the dichromatic number of the directed voting graph associated with $M$.
In Path Set Packing, the input is an undirected graph $G$, a collection $\calp$ of simple paths in $G$, and a positive integer $k$. The problem is to decide whether there exist $k$ edge-disjoint paths in $\calp$. We study the parameterized complexity of Path Set Packing with respect to both natural and structural parameters. We show that the problem is $W[1]$-hard with respect to vertex cover number, and $W[1]$-hard respect to pathwidth plus maximum degree plus solution size. These results answer an open question raised in COCOON 2018. On the positive side, we present an FPT algorithm parameterized by feedback vertex number plus maximum degree, and present an FPT algorithm parameterized by treewidth plus maximum degree plus maximum length of a path in $\calp$. These positive results complement the hardness of Path Set Packing with respect to any subset of the parameters used in the FPT algorithms. We also give a $4$-approximation algorithm for maximum path set packing problem which runs in FPT time when parameterized by feedback edge number.
Reducing the `$\textit{hallucination}$' problem of Large Language Models (LLMs) is crucial for their wide applications. A comprehensive and fine-grained measurement of the hallucination is the first key step for the governance of this issue but is under-explored in the community. Thus, we present $\textbf{ANAH}$, a bilingual dataset that offers $\textbf{AN}$alytical $\textbf{A}$nnotation of $\textbf{H}$allucinations in LLMs within Generative Question Answering. Each answer sentence in our dataset undergoes rigorous annotation, involving the retrieval of a reference fragment, the judgment of the hallucination type, and the correction of hallucinated content. ANAH consists of ~12k sentence-level annotations for ~4.3k LLM responses covering over 700 topics, constructed by a human-in-the-loop pipeline. Thanks to the fine granularity of the hallucination annotations, we can quantitatively confirm that the hallucinations of LLMs progressively accumulate in the answer and use ANAH to train and evaluate hallucination annotators. We conduct extensive experiments on studying generative and discriminative annotators and show that, although current open-source LLMs have difficulties in fine-grained hallucination annotation, the generative annotator trained with ANAH can surpass all open-source LLMs and GPT-3.5, obtain performance competitive with GPT-4, and exhibits better generalization ability on unseen questions.
Oja's algorithm for streaming Principal Component Analysis (PCA) for $n$ datapoints in a $d$ dimensional space achieves the same sin-squared error $O(r_\mathsf{eff}/n)$ as the offline algorithm in $O(d)$ space and $O(nd)$ time and a single pass through the datapoints. Here $r_\mathsf{eff}$ is the effective rank (ratio of the trace and the principal eigenvalue of the population covariance matrix $\Sigma$). Under this computational budget, we consider the problem of sparse PCA, where the principal eigenvector of $\Sigma$ is $s$-sparse, and $r_\mathsf{eff}$ can be large. In this setting, to our knowledge, \textit{there are no known single-pass algorithms} that achieve the minimax error bound in $O(d)$ space and $O(nd)$ time without either requiring strong initialization conditions or assuming further structure (e.g., spiked) of the covariance matrix. We show that a simple single-pass procedure that thresholds the output of Oja's algorithm (the Oja vector) can achieve the minimax error bound under some regularity conditions in $O(d)$ space and $O(nd)$ time as long as $r_\mathsf{eff}=O(n/\log n)$. We present a nontrivial and novel analysis of the entries of the unnormalized Oja vector, which involves the projection of a product of independent random matrices on a random initial vector. This is completely different from previous analyses of Oja's algorithm and matrix products, which have been done when the $r_\mathsf{eff}$ is bounded.
We consider the feedback vertex set problem in undirected graphs (FVS). The input to FVS is an undirected graph $G=(V,E)$ with non-negative vertex costs. The goal is to find a minimum cost subset of vertices $S \subseteq V$ such that $G-S$ is acyclic. FVS is a well-known NP-hard problem and does not admit a $(2-\epsilon)$-approximation for any fixed $\epsilon > 0$ assuming the Unique Games Conjecture. There are combinatorial $2$-approximation algorithms and also primal-dual based $2$-approximations. Despite the existence of these algorithms for several decades, there is no known polynomial-time solvable LP relaxation for FVS with a provable integrality gap of at most $2$. More recent work (Chekuri and Madan, SODA '16) developed a polynomial-sized LP relaxation for a more general problem, namely Subset FVS, and showed that its integrality gap is at most $13$ for Subset FVS, and hence also for FVS. Motivated by this gap in our knowledge, we undertake a polyhedral study of FVS and related problems. In this work, we formulate new integer linear programs (ILPs) for FVS whose LP-relaxation can be solved in polynomial time, and whose integrality gap is at most $2$. The new insights in this process also enable us to prove that the formulation in (Chekuri and Madan, SODA '16) has an integrality gap of at most $2$ for FVS. Our results for FVS are inspired by new formulations and polyhedral results for the closely-related pseudoforest deletion set problem (PFDS). Our formulations for PFDS are in turn inspired by a connection to the densest subgraph problem. We also conjecture an extreme point property for a LP-relaxation for FVS, and give evidence for the conjecture via a corresponding result for PFDS.
We present two randomised approximate counting algorithms with $\widetilde{O}(n^{2-c}/\varepsilon^2)$ running time for some constant $c>0$ and accuracy $\varepsilon$: (1) for the hard-core model with fugacity $\lambda$ on graphs with maximum degree $\Delta$ when $\lambda=O(\Delta^{-1.5-c_1})$ where $c_1=c/(2-2c)$; (2) for spin systems with strong spatial mixing (SSM) on planar graphs with quadratic growth, such as $\mathbb{Z}^2$. For the hard-core model, Weitz's algorithm (STOC, 2006) achieves sub-quadratic running time when correlation decays faster than the neighbourhood growth, namely when $\lambda = o(\Delta^{-2})$. Our first algorithm does not require this property and extends the range where sub-quadratic algorithms exist. Our second algorithm appears to be the first to achieve sub-quadratic running time up to the SSM threshold, albeit on a restricted family of graphs. It also extends to (not necessarily planar) graphs with polynomial growth, such as $\mathbb{Z}^d$, but with a running time of the form $\widetilde{O}\left(n^2\varepsilon^{-2}/2^{c(\log n)^{1/d}}\right)$ where $d$ is the exponent of the polynomial growth and $c>0$ is some constant.
Given a graph $G$ and sets $\{\alpha_v~|~v \in V(G)\}$ and $\{\beta_v~|~v \in V(G)\}$ of non-negative integers, it is known that the decision problem whether $G$ contains a spanning tree $T$ such that $\alpha_v \le d_T (v) \le \beta_v $ for all $v \in V(G)$ is $NP$-complete. In this article, we relax the problem by demanding that the degree restrictions apply to vertices $v\in U$ only, where $U$ is a stable set of $G$. In this case, the problem becomes tractable. A. Frank presented a result characterizing the positive instances of that relaxed problem. Using matroid intersection developed by J. Edmonds, we give a new and short proof of Frank's result and show that if $U$ is stable and the edges of $G$ are weighted by arbitrary real numbers, then even a minimum-cost tree $T$ with $\alpha_v \le d_T (v) \le \beta_v $ for all $v \in U$ can be found in polynomial time if such a tree exists.
We show that it is decidable, given an automatic sequence $\bf s$ and a constant $c$, whether all prefixes of $\bf s$ have a string attractor of size $\leq c$. Using a decision procedure based on this result, we show that all prefixes of the period-doubling sequence of length $\geq 2$ have a string attractor of size $2$. We also prove analogous results for other sequences, including the Thue-Morse sequence and the Tribonacci sequence. We also provide general upper and lower bounds on string attractor size for different kinds of sequences. For example, if $\bf s$ has a finite appearance constant, then there is a string attractor for ${\bf s}[0..n-1]$ of size $O(\log n)$. If further $\bf s$ is linearly recurrent, then there is a string attractor for ${\bf s}[0..n-1]$ of size $O(1)$. For automatic sequences, the size of the smallest string attractor for ${\bf s}[0..n-1]$ is either $\Theta(1)$ or $\Theta(\log n)$, and it is decidable which case occurs. Finally, we close with some remarks about greedy string attractors.