We investigate the decidability of the monadic second-order (MSO) theory of the structure $\langle \mathbb{N};<,P_1, \ldots,P_k \rangle$, for various unary predicates $P_1,\ldots,P_k \subseteq \mathbb{N}$. We focus in particular on "arithmetic" predicates arising in the study of linear recurrence sequences, such as fixed-base powers $\mathsf{Pow}_k = \{k^n : n \in \mathbb{N}\}$, $k$-th powers $\mathsf{N}_k = \{n^k : n \in \mathbb{N}\}$, and the set of terms of the Fibonacci sequence $\mathsf{Fib} = \{0,1,2,3,5,8,13,\ldots\}$ (and similarly for other linear recurrence sequences having a single, non-repeated, dominant characteristic root). We obtain several new unconditional and conditional decidability results, a select sample of which are the following:
$\bullet$ The MSO theory of $\langle \mathbb{N};<,\mathsf{Pow}_2, \mathsf{Fib} \rangle$ is decidable;
$\bullet$ The MSO theory of $\langle \mathbb{N};<, \mathsf{Pow}_2, \mathsf{Pow}_3, \mathsf{Pow}_6 \rangle$ is decidable;
$\bullet$ The MSO theory of $\langle \mathbb{N};<, \mathsf{Pow}_2, \mathsf{Pow}_3, \mathsf{Pow}_5 \rangle$ is decidable assuming Schanuel's conjecture;
$\bullet$ The MSO theory of $\langle \mathbb{N};<, \mathsf{Pow}_4, \mathsf{N}_2 \rangle$ is decidable;
$\bullet$ The MSO theory of $\langle \mathbb{N};<, \mathsf{Pow}_2, \mathsf{N}_2 \rangle$ is Turing-equivalent to the MSO theory of $\langle \mathbb{N};<,S \rangle$, where $S$ is the predicate corresponding to the binary expansion of $\sqrt{2}$. (As the binary expansion of $\sqrt{2}$ is widely believed to be normal, the corresponding MSO theory is in turn expected to be decidable.)
These results are obtained by exploiting and combining techniques from dynamical systems, number theory, and automata theory.
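For intuition, here is the flavor of statement such theories can decide (an illustrative sentence, not one of the paper's results): the first-order sentence below asserts that infinitely many numbers are simultaneously powers of $2$ and Fibonacci numbers; full MSO additionally allows quantification over set variables $X \subseteq \mathbb{N}$.
\[
\forall x \, \exists y \, \big( x < y \;\wedge\; \mathsf{Pow}_2(y) \;\wedge\; \mathsf{Fib}(y) \big)
\]
Decidability of the MSO theory of $\langle \mathbb{N};<,\mathsf{Pow}_2, \mathsf{Fib} \rangle$ means an algorithm can determine the truth of every such sentence.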
We present a pseudopolynomial-time algorithm for the Knapsack problem that has running time $\widetilde{O}(n + t\sqrt{p_{\max}})$, where $n$ is the number of items, $t$ is the knapsack capacity, and $p_{\max}$ is the maximum item profit. This improves over the $\widetilde{O}(n + t \, p_{\max})$-time algorithm based on the convolution and prediction technique by Bateni et al.~(STOC 2018). Moreover, we give some evidence, based on a strengthening of the Min-Plus Convolution Hypothesis, that our running time might be optimal. Our algorithm uses two new technical tools, which might be of independent interest. First, we generalize the $\widetilde{O}(n^{1.5})$-time algorithm for bounded monotone min-plus convolution by Chi et al.~(STOC 2022) to the \emph{rectangular} case where the range of entries can be different from the sequence length. Second, we give a reduction from general knapsack instances to \emph{balanced} instances, where all items have nearly the same profit-to-weight ratio, up to a constant factor. Using these techniques, we can also obtain algorithms that run in time $\widetilde{O}(n + OPT\sqrt{w_{\max}})$, $\widetilde{O}(n + (nw_{\max}p_{\max})^{1/3}t^{2/3})$, and $\widetilde{O}(n + (nw_{\max}p_{\max})^{1/3} OPT^{2/3})$, where $OPT$ is the optimal total profit and $w_{\max}$ is the maximum item weight.
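For context, here is a minimal Python sketch of the textbook $O(nt)$ dynamic program over capacities that this line of work improves upon; it is the classical baseline, not the paper's $\widetilde{O}(n + t\sqrt{p_{\max}})$ algorithm, and all names are illustrative.

```python
def knapsack_max_profit(items, t):
    """Textbook O(n*t) DP: items is a list of (weight, profit) pairs,
    t is the knapsack capacity. Returns the maximum total profit."""
    dp = [0] * (t + 1)  # dp[c] = best profit achievable with capacity c
    for w, p in items:
        # iterate capacities downward so each item is used at most once
        for c in range(t, w - 1, -1):
            dp[c] = max(dp[c], dp[c - w] + p)
    return dp[t]

# example: OPT = 7 (take the items with profits 3 and 4)
print(knapsack_max_profit([(2, 3), (3, 4), (4, 5)], 5))
```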
We study first-order logic over unordered structures whose elements carry a finite number of data values from an infinite domain. Data values can be compared with respect to equality. As the satisfiability problem for this logic is undecidable in general, we introduce a family of local fragments. They restrict quantification to the neighbourhood of a given reference point, bounded by some radius. Our first main result establishes decidability of the satisfiability problem for the local radius-1 fragment in the presence of one "diagonal relation". On the other hand, extending the radius leads to undecidability. In a second part, we provide the precise decidability and complexity landscape of the satisfiability problem for the existential fragments of local logic, parameterized by the number of data values carried by each element and the radius of the considered neighbourhoods. Altogether, we draw a landscape of formalisms that are suitable for the specification of systems with data, and we open up new avenues for future research.
Given a graph $G=(V,E)$ and an integer $k\in \mathbb{N}$, we investigate the 2-Eigenvalue Vertex Deletion (2-EVD) problem. The objective is to remove at most $k$ vertices such that the adjacency matrix of the resulting graph has at most two eigenvalues. It is established that the adjacency matrix of a graph has at most two eigenvalues if and only if the graph is a collection of equal-sized cliques. Thus, 2-Eigenvalue Vertex Deletion amounts to removing a set of at most $k$ vertices so that the resulting graph is a collection of equal-sized cliques. The 2-Eigenvalue Edge Editing (2-EEE), 2-Eigenvalue Edge Deletion (2-EED) and 2-Eigenvalue Edge Addition (2-EEA) problems are defined analogously. We present a kernel of size $\mathcal{O}(k^{3})$ for $2$-EVD, along with an FPT algorithm with a running time of $\mathcal{O}^{*}(2^{k})$. For $2$-EEE, we provide a kernel of size $\mathcal{O}(k^{2})$. Additionally, we present linear kernels of size $5k$ and $6k$ for $2$-EEA and $2$-EED, respectively. For $2$-EED, we also give an algorithm with running time $\mathcal{O}^{*}(1.47^{k})$. These results address open questions posed by Misra et al. (ISAAC 2023) regarding the complexity of these problems when parameterized by the solution size.
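To illustrate the structural characterization with a quick numerical sanity check (not part of the paper): the adjacency matrix of a disjoint union of equal-sized cliques $K_s$ has exactly the two eigenvalues $s-1$ and $-1$.

```python
import numpy as np

def union_of_cliques(num_cliques, s):
    """Adjacency matrix of num_cliques disjoint copies of K_s."""
    block = np.ones((s, s)) - np.eye(s)  # K_s has adjacency J - I
    A = np.zeros((num_cliques * s, num_cliques * s))
    for i in range(num_cliques):
        A[i*s:(i+1)*s, i*s:(i+1)*s] = block
    return A

A = union_of_cliques(3, 4)  # three copies of K_4
eig = np.linalg.eigvalsh(A)
print(sorted({float(x) for x in np.round(eig, 8)}))  # [-1.0, 3.0]: -1 and s-1
```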
Let $R \cup B$ be a set of $n$ points in $\mathbb{R}^2$, and let $k \in \{1,\ldots,n\}$. Our goal is to compute a line that "best" separates the "red" points $R$ from the "blue" points $B$ with at most $k$ outliers. We present an efficient semi-online dynamic data structure that can maintain whether such a separator exists. Furthermore, we present efficient exact and approximation algorithms that compute a linear separator that is guaranteed to misclassify at most $k$ points and that minimizes the distance to the farthest outlier. Our exact algorithm runs in $O(nk + n \log n)$ time, and our $(1+\varepsilon)$-approximation algorithm runs in $O(\varepsilon^{-1/2}((n + k^2) \log n))$ time. Based on our $(1+\varepsilon)$-approximation algorithm, we then also obtain a semi-online data structure to maintain such a separator efficiently.
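As a minimal illustration of the feasibility predicate involved (a hypothetical helper, not the paper's data structure): for a fixed candidate line $ax + by = c$, oriented so that red points should satisfy $ax + by > c$, one can count misclassified points in $O(n)$ time.

```python
def misclassified(red, blue, a, b, c):
    """Count points on the wrong side of the line a*x + b*y = c,
    with red points expected on the strictly positive side."""
    errs = sum(1 for (x, y) in red if a*x + b*y <= c)
    errs += sum(1 for (x, y) in blue if a*x + b*y > c)
    return errs

red = [(2, 2), (3, 1)]
blue = [(0, 0), (1, 0), (2, 3)]
# the line x = 1.5 misclassifies only the blue point (2, 3)
print(misclassified(red, blue, 1, 0, 1.5) <= 1)  # True, so k = 1 suffices
```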
This paper defines and explores solutions to the problem of \emph{inversion of a finite sequence} over the binary field: finding a prefix element of the sequence that conforms to a \emph{Recurrence Relation} (RR) rule defined by a polynomial and satisfied by the sequence. The minimum number of variables (order) in a polynomial of a fixed degree defining an RR is termed the \emph{Polynomial Complexity} of the sequence at that degree, while the minimum number of variables of such polynomials at a fixed degree that also yield a unique prefix of the sequence and maximum rank of the matrix of evaluations of its monomials is called the \emph{Polynomial Complexity of Inversion} at the chosen degree. Solutions of this problem yield solutions to the problem of \emph{local inversion} of a map $F:\ftwo^n\rightarrow\ftwo^n$ at a point $y$ in $\ftwo^n$, that of solving for $x$ in $\ftwo^n$ from the equation $y=F(x)$. Local inversion of maps has important applications, which lend value to this theory. In previous work it was shown that the minimal-order \emph{Linear Recurrence Relation} (LRR) satisfied by the sequence, whose order is known as the \emph{Linear Complexity} (LC) of the sequence, gives a unique solution to the inversion when the sequence is part of a periodic sequence. This paper extends that theory to solve the inversion problem by considering \emph{non-linear recurrence relations} defined by polynomials of a fixed degree $>1$ and satisfied by the sequence. The minimal order of polynomial RRs satisfied by a sequence is well known as its non-linear complexity (defining the feedback shift register of smallest order that determines the sequence by RRs) and is called the \emph{Maximal Order Complexity} (MOC) of the sequence. However, unlike the LC, there is no unique polynomial recurrence relation at any degree.
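For the linear case that the prior work builds on, here is a compact Python sketch of the standard Berlekamp-Massey algorithm over $\mathbb{F}_2$, which computes the linear complexity (the order of a minimal LRR) of a bit sequence; this is the classical tool, not this paper's non-linear extension.

```python
def linear_complexity(s):
    """Berlekamp-Massey over GF(2): returns the linear complexity of
    the bit sequence s (a list of 0/1 integers)."""
    n = len(s)
    c = [1] + [0] * n   # current connection polynomial C(x)
    b = [1] + [0] * n   # previous connection polynomial B(x)
    L, m = 0, -1
    for i in range(n):
        # discrepancy: s[i] + sum_{j=1..L} c[j]*s[i-j]  (over GF(2))
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n + 1 - shift):  # C(x) += x^shift * B(x)
                c[j + shift] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L

print(linear_complexity([1, 0, 1, 0, 1, 0]))  # prints 2: s_n = s_{n-2}
```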
We consider the problem of sampling from the posterior distribution of a $d$-dimensional coefficient vector $\boldsymbol{\theta}$, given linear observations $\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\theta}+\boldsymbol{\varepsilon}$. In general, such posteriors are multimodal, and therefore challenging to sample from. This observation has prompted the exploration of various heuristics that aim at approximating the posterior distribution. In this paper, we study a different approach based on decomposing the posterior distribution into a log-concave mixture of simple product measures. This decomposition allows us to reduce sampling from a multimodal distribution of interest to sampling from a log-concave one, which is tractable and has been investigated in detail. We prove that, under mild conditions on the prior, for random designs, such measure decomposition is generally feasible when the number of samples per parameter $n/d$ exceeds a constant threshold. We thus obtain a provably efficient (polynomial time) sampling algorithm in a regime where this was previously not known. Numerical simulations confirm that the algorithm is practical, and reveal that it has attractive statistical properties compared to state-of-the-art methods.
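The log-concave sampling step that such a decomposition reduces to is a classically tractable primitive. As a generic illustration only (unadjusted Langevin dynamics, a standard workhorse for log-concave targets, not the paper's algorithm): to sample approximately from $\pi(\boldsymbol{\theta}) \propto e^{-f(\boldsymbol{\theta})}$ with smooth convex $f$, one iterates gradient steps with injected Gaussian noise.

```python
import numpy as np

def ula(grad_f, theta0, step, n_iters, rng):
    """Unadjusted Langevin algorithm: approximate sampling from
    pi(theta) proportional to exp(-f(theta)), given grad f."""
    theta = theta0.copy()
    for _ in range(n_iters):
        noise = rng.standard_normal(theta.shape)
        theta = theta - step * grad_f(theta) + np.sqrt(2 * step) * noise
    return theta

# example: standard Gaussian target, f(theta) = ||theta||^2 / 2
rng = np.random.default_rng(0)
print(ula(lambda th: th, np.zeros(3), step=0.01, n_iters=5000, rng=rng))
```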
Kaplan et al. and Hoffmann et al. developed influential scaling laws for the optimal model size as a function of the compute budget, but these laws yield substantially different predictions. We explain the discrepancy by reproducing the Kaplan scaling law on two datasets (OpenWebText2 and RefinedWeb) and identifying three factors causing the difference: last layer computational cost, warmup duration, and scale-dependent optimizer tuning. With these factors corrected, we obtain excellent agreement with the Hoffmann et al. (i.e., "Chinchilla") scaling law. Counter to a hypothesis of Hoffmann et al., we find that careful learning rate decay is not essential for the validity of their scaling law. As a secondary result, we derive scaling laws for the optimal learning rate and batch size, finding that tuning the AdamW $\beta_2$ parameter is essential at lower batch sizes.
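To illustrate the functional form such scaling laws take (synthetic numbers, not the paper's data): the compute-optimal model size is modeled as a power law $N^*(C) = a\,C^b$, and the exponent $b$ is recovered by a linear fit in log-log space.

```python
import numpy as np

# synthetic (compute, optimal model size) pairs following N* = a * C^b
C = np.array([1e17, 1e18, 1e19, 1e20, 1e21])
N_opt = 0.1 * C**0.5  # pretend exponent b = 0.5 for illustration

b, log_a = np.polyfit(np.log(C), np.log(N_opt), 1)  # slope, intercept
print(f"fitted exponent b = {b:.3f}, prefactor a = {np.exp(log_a):.3g}")
# fitted exponent b = 0.500, prefactor a = 0.1
```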
{\sc Vertex $(s, t)$-Cut} and {\sc Vertex Multiway Cut} are two fundamental graph separation problems in algorithmic graph theory. We study matroidal generalizations of these problems, where in addition to the usual input, we are given a representation $R \in \mathbb{F}^{r \times n}$ of a linear matroid $\mathcal{M} = (V(G), \mathcal{I})$ of rank $r$, and the goal is to determine whether there exists a vertex subset $S \subseteq V(G)$ that has the required cut properties and is independent in the matroid $\mathcal{M}$. We refer to these problems as {\sc Independent Vertex $(s, t)$-Cut} and {\sc Independent Multiway Cut}, respectively. We show that these problems are fixed-parameter tractable ({\sf FPT}) when parameterized by the solution size (which can be assumed to be equal to the rank of the matroid $\mathcal{M}$). These results are obtained by exploiting the recent technique of flow augmentation [Kim et al.~STOC '22], combined with a dynamic programming algorithm on flow-paths \`a la [Feige and Mahdian,~STOC '06] that maintains a representative family of solutions w.r.t.~the given matroid [Marx, TCS '06; Fomin et al., JACM]. As a corollary, we also obtain {\sf FPT} algorithms for the independent version of {\sc Odd Cycle Transversal}. Further, our results can be generalized to other variants of the problems, e.g., weighted versions or edge-deletion versions.
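For concreteness, a small sketch (assuming $\mathbb{F} = \mathbb{F}_2$; all names illustrative) of the basic primitive underlying such algorithms: a set $S$ is independent in the linear matroid exactly when the corresponding columns of $R$ are linearly independent, which is checkable by Gaussian elimination.

```python
def gf2_rank(cols):
    """Rank over GF(2) of a list of column vectors, each encoded as an
    integer bitmask of its entries (standard xor-basis technique)."""
    basis = []
    for v in cols:
        for b in basis:
            v = min(v, v ^ b)  # clear the leading bit of b from v if set
        if v:
            basis.append(v)
    return len(basis)

def is_independent(R_cols, S):
    """S is independent in the matroid iff its columns have full rank."""
    sub = [R_cols[i] for i in S]
    return gf2_rank(sub) == len(sub)

# columns of a rank-3 representation R: e1, e2, e1+e2, e3
R_cols = [0b001, 0b010, 0b011, 0b100]
print(is_independent(R_cols, [0, 1, 3]))  # True
print(is_independent(R_cols, [0, 1, 2]))  # False: col 2 = col 0 + col 1
```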
In the Maximum Independent Set of Hyperrectangles problem, we are given a set of $n$ (possibly overlapping) $d$-dimensional axis-aligned hyperrectangles, and the goal is to find a subset of non-overlapping hyperrectangles of maximum cardinality. For $d=1$, this corresponds to the classical Interval Scheduling problem, where a simple greedy algorithm returns an optimal solution. In the offline setting, for $d$-dimensional hyperrectangles, polynomial-time $(\log n)^{O(d)}$-approximation algorithms are known. However, the problem becomes notably challenging in the online setting, where the input objects (hyperrectangles) appear one by one in an adversarial order, and on the arrival of an object, the algorithm must make an immediate and irrevocable decision whether or not to select the object while maintaining feasibility. Even for interval scheduling, an $\Omega(n)$ lower bound on the competitive ratio is known. To circumvent these negative results, in this work we study the online maximum independent set of axis-aligned hyperrectangles in the random-order arrival model, where the adversary specifies the set of input objects, which then arrive in a uniformly random order. Starting from the prototypical secretary problem, the random-order model has received significant attention as a way to study algorithms beyond worst-case competitive analysis. Surprisingly, we show that in the random-order model the problem admits guarantees that almost match the best-known offline approximation guarantees, up to polylogarithmic factors. In particular, we give a simple $(\log n)^{O(d)}$-competitive algorithm for $d$-dimensional hyperrectangles in this model, which runs in $\tilde{O}_d(n)$ time. Our approach also yields $(\log n)^{O(d)}$-competitive algorithms in the random-order model for more general objects such as $d$-dimensional fat objects and ellipsoids. Furthermore, our guarantees hold with high probability.
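For reference, the $d=1$ offline baseline mentioned above, as a standard textbook sketch: greedily keeping intervals in order of earliest finishing time is optimal for Interval Scheduling.

```python
def max_independent_intervals(intervals):
    """Classic greedy: sort by right endpoint, keep every interval that
    does not overlap the last one kept. Optimal for d = 1."""
    chosen, last_end = [], float("-inf")
    for l, r in sorted(intervals, key=lambda iv: iv[1]):
        if l >= last_end:  # touching endpoints count as non-overlapping
            chosen.append((l, r))
            last_end = r
    return chosen

print(max_independent_intervals([(1, 4), (2, 3), (3, 5), (6, 7)]))
# [(2, 3), (3, 5), (6, 7)]
```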
In multi-turn dialog, utterances do not always take the full form of sentences \cite{Carbonell1983DiscoursePA}, which naturally makes understanding the dialog context more difficult. However, it is essential to fully grasp the dialog context to generate a reasonable response. Hence, in this paper, we propose to improve response generation performance by examining the model's ability to answer a reading comprehension question, where the question is focused on the omitted information in the dialog. Inspired by the multi-task learning scheme, we propose a joint framework that unifies these two tasks, sharing the same encoder to extract the common and task-invariant features, with different decoders to learn task-specific features. To better fuse information from the question and the dialog history during encoding, we propose to augment the Transformer architecture with a memory updater, which is designed to selectively store and update the dialog history information so as to support downstream tasks. For the experiments, we employ human annotators to write and examine a large-scale dialog reading comprehension dataset. Extensive experiments are conducted on this dataset, and the results show that the proposed model brings substantial improvements over several strong baselines on both tasks. In this way, we demonstrate that reasoning can indeed help better response generation and vice versa. We release our large-scale dataset for further research.
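A minimal sketch of the shared-encoder, two-decoder multi-task pattern described above (module and dimension names are hypothetical, and the memory updater is omitted): one encoder produces task-invariant features that feed both a response-generation head and a reading-comprehension head.

```python
import torch
import torch.nn as nn

class JointDialogModel(nn.Module):
    """Shared encoder with two task-specific heads (illustrative only)."""
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)  # shared
        self.gen_head = nn.Linear(d_model, vocab_size)  # response generation
        self.rc_head = nn.Linear(d_model, vocab_size)   # reading comprehension

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))       # task-invariant features
        return self.gen_head(h), self.rc_head(h)   # task-specific outputs

model = JointDialogModel(vocab_size=1000)
gen_logits, rc_logits = model(torch.randint(0, 1000, (2, 16)))
print(gen_logits.shape, rc_logits.shape)  # both torch.Size([2, 16, 1000])
```

In training, the two heads would receive separate cross-entropy losses whose sum is backpropagated through the shared encoder, which is the usual way such joint frameworks couple the tasks.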