A string $w$ is called a minimal absent word (MAW) for another string $T$ if $w$ does not occur in $T$ but the proper substrings of $w$ occur in $T$. For example, let $\Sigma = \{\mathtt{a, b, c}\}$ be the alphabet. Then, the set of MAWs for string $w = \mathtt{abaab}$ is $\{\mathtt{aaa, aaba, bab, bb, c}\}$. In this paper, we study combinatorial properties of MAWs in the sliding window model, namely, how the set of MAWs changes when a sliding window of fixed length $d$ is shifted over the input string $T$ of length $n$, where $1 \leq d < n$. We present \emph{tight} upper and lower bounds on the maximum number of changes in the set of MAWs for a sliding window over $T$, both in the cases of general alphabets and binary alphabets. Our bounds improve on the previously known best bounds [Crochemore et al., 2020].
This work studies an experimental design problem where {the values of a predictor variable, denoted by $x$}, are to be determined with the goal of estimating a function $m(x)$, which is observed with noise. A linear model is fitted to $m(x)$ but it is not assumed that the model is correctly specified. It follows that the quantity of interest is the best linear approximation of $m(x)$, which is denoted by $\ell(x)$. It is shown that in this framework the ordinary least squares estimator typically leads to an inconsistent estimation of $\ell(x)$, and rather weighted least squares should be considered. An asymptotic minimax criterion is formulated for this estimator, and a design that minimizes the criterion is constructed. An important feature of this problem is that the $x$'s should be random, rather than fixed. Otherwise, the minimax risk is infinite. It is shown that the optimal random minimax design is different from its deterministic counterpart, which was studied previously, and a simulation study indicates that it generally performs better when $m(x)$ is a quadratic or a cubic function. Another finding is that when the variance of the noise goes to infinity, the random and deterministic minimax designs coincide. The results are illustrated for polynomial regression models and the general case is also discussed.
Consider a variant of Tetris played on a board of width $w$ and infinite height, where the pieces are axis-aligned rectangles of arbitrary integer dimensions, the pieces can only be moved before letting them drop, and a row does not disappear once it is full. Suppose we want to follow a greedy strategy: let each rectangle fall where it will end up the lowest given the current state of the board. To do so, we want a data structure which can always suggest a greedy move. In other words, we want a data structure which maintains a set of $O(n)$ rectangles, supports queries which return where to drop the rectangle, and updates which insert a rectangle dropped at a certain position and return the height of the highest point in the updated set of rectangles. We show via a reduction to the Multiphase problem [P\u{a}tra\c{s}cu, 2010] that on a board of width $w=\Theta(n)$, if the OMv conjecture [Henzinger et al., 2015] is true, then both operations cannot be supported in time $O(n^{1/2-\epsilon})$ simultaneously. The reduction also implies polynomial bounds from the 3-SUM conjecture and the APSP conjecture. On the other hand, we show that there is a data structure supporting both operations in $O(n^{1/2}\log^{3/2}n)$ time on boards of width $n^{O(1)}$, matching the lower bound up to a $n^{o(1)}$ factor.
Let $\mathcal{G}$ be a directed graph with vertices $1,2,\ldots, 2N$. Let $\mathcal{T}=(T_{i,j})_{(i,j)\in\mathcal{G}}$ be a family of contractive similitudes. For every $1\leq i\leq N$, let $i^+:=i+N$. For $1\leq i,j\leq N$, we define $\mathcal{M}_{i,j}=\{(i,j),(i,j^+),(i^+,j),(i^+,j^+)\}\cap\mathcal{G}$. We assume that $T_{\widetilde{i},\widetilde{j}}=T_{i,j}$ for every $(\widetilde{i},\widetilde{j})\in \mathcal{M}_{i,j}$. Let $K$ denote the Mauldin-Williams fractal determined by $\mathcal{T}$. Let $\chi=(\chi_i)_{i=1}^{2N}$ be a positive probability vector and $P$ a row-stochastic matrix which serves as an incidence matrix for $\mathcal{G}$. We denote by $\nu$ the Markov-type measure associated with $\chi$ and $P$. Let $\Omega=\{1,\ldots,2N\}$ and $G_\infty=\{\sigma\in\Omega^{\mathbb{N}}:(\sigma_i,\sigma_{i+1})\in\mathcal{G}, \;i\geq 1\}$. Let $\pi$ be the natural projection from $G_\infty$ to $K$ and $\mu=\nu\circ\pi^{-1}$. We consider the following two cases: 1. $\mathcal{G}$ has two strongly connected components consisting of $N$ vertices; 2. $\mathcal{G}$ is strongly connected. With some assumptions for $\mathcal{G}$ and $\mathcal{T}$, for case 1, we determine the exact value $s_r$ of the quantization dimension $D_r(\mu)$ for $\mu$ and prove that the $s_r$-dimensional lower quantization coefficient is always positive, but the upper one can be infinite; we establish a necessary and sufficient condition for the upper quantization coefficient for $\mu$ to be finite; for case 2, we determine $D_r(\mu)$ in terms of a pressure-like function and prove that $D_r(\mu)$-dimensional upper and lower quantization coefficient are both positive and finite.
In this paper, we consider two fundamental symmetric kernels in linear algebra: the Cholesky factorization and the symmetric rank-$k$ update (SYRK), with the classical three nested loops algorithms for these kernels. In addition, we consider a machine model with a fast memory of size $S$ and an unbounded slow memory. In this model, all computations must be performed on operands in fast memory, and the goal is to minimize the amount of communication between slow and fast memories. As the set of computations is fixed by the choice of the algorithm, only the ordering of the computations (the schedule) directly influences the volume of communications.We prove lower bounds of $\frac{1}{3\sqrt{2}}\frac{N^3}{\sqrt{S}}$ for the communication volume of the Cholesky factorization of an $N\times N$ symmetric positive definite matrix, and of $\frac{1}{\sqrt{2}}\frac{N^2M}{\sqrt{S}}$ for the SYRK computation of $\mat{A}\cdot\transpose{\mat{A}}$, where $\mathbf{A}$ is an $N\times M$ matrix. Both bounds improve the best known lower bounds from the literature by a factor $\sqrt{2}$.In addition, we present two out-of-core, sequential algorithms with matching communication volume: \TBS for SYRK, with a volume of $\frac{1}{\sqrt{2}}\frac{N^2M}{\sqrt{S}} + \bigo{NM\log N}$, and \LBC for Cholesky, with a volume of $\frac{1}{3\sqrt{2}}\frac{N^3}{\sqrt{S}} + \bigo{N^{5/2}}$. Both algorithms improve over the best known algorithms from the literature by a factor $\sqrt{2}$, and prove that the leading terms in our lower bounds cannot be improved further. This work shows that the operational intensity of symmetric kernels like SYRK or Cholesky is intrinsically higher (by a factor $\sqrt{2}$) than that of corresponding non-symmetric kernels (GEMM and LU factorization).
We present a quite curious generalization of multi-step Fibonacci numbers. For any positive rational $q$, we enumerate binary words of length $n$ whose maximal factors of the form $0^a1^b$ satisfy $a = 0$ or $aq > b$. When $q$ is an integer we rediscover classical multi-step Fibonacci numbers: Fibonacci, Tribonacci, Tetranacci, etc. When $q$ is not an integer, obtained recurrence relations are connected to certain restricted integer compositions. We also discuss Gray codes for these words, and a possibly novel generalization of the golden ratio.
In the Strip Packing problem (SP), we are given a vertical half-strip $[0,W]\times[0,\infty)$ and a set of $n$ axis-aligned rectangles of width at most $W$. The goal is to find a non-overlapping packing of all rectangles into the strip such that the height of the packing is minimized. A well-studied and frequently used practical constraint is to allow only those packings that are guillotine separable, i.e., every rectangle in the packing can be obtained by recursively applying a sequence of edge-to-edge axis-parallel cuts (guillotine cuts) that do not intersect any item of the solution. In this paper, we study approximation algorithms for the Guillotine Strip Packing problem (GSP), i.e., the Strip Packing problem where we require additionally that the packing needs to be guillotine separable. This problem generalizes the classical Bin Packing problem and also makespan minimization on identical machines, and thus it is already strongly NP-hard. Moreover, due to a reduction from the Partition problem, it is NP-hard to obtain a polynomial-time $(3/2-\varepsilon)$-approximation algorithm for GSP for any $\varepsilon>0$ (exactly as Strip Packing). We provide a matching polynomial time $(3/2+\varepsilon)$-approximation algorithm for GSP. Furthermore, we present a pseudo-polynomial time $(1+\varepsilon)$-approximation algorithm for GSP. This is surprising as it is NP-hard to obtain a $(5/4-\varepsilon)$-approximation algorithm for (general) Strip Packing in pseudo-polynomial time. Thus, our results essentially settle the approximability of GSP for both the polynomial and the pseudo-polynomial settings.
We consider the following oblivious sketching problem: given $\epsilon \in (0,1/3)$ and $n \geq d/\epsilon^2$, design a distribution $\mathcal{D}$ over $\mathbb{R}^{k \times nd}$ and a function $f: \mathbb{R}^k \times \mathbb{R}^{nd} \rightarrow \mathbb{R}$, so that for any $n \times d$ matrix $A$, $$\Pr_{S \sim \mathcal{D}} [(1-\epsilon) \|A\|_{op} \leq f(S(A),S) \leq (1+\epsilon)\|A\|_{op}] \geq 2/3,$$ where $\|A\|_{op}$ is the operator norm of $A$ and $S(A)$ denotes $S \cdot A$, interpreting $A$ as a vector in $\mathbb{R}^{nd}$. We show a tight lower bound of $k = \Omega(d^2/\epsilon^2)$ for this problem. Our result considerably strengthens the result of Nelson and Nguyen (ICALP, 2014), as it (1) applies only to estimating the operator norm, which can be estimated given any OSE, and (2) applies to distributions over general linear operators $S$ which treat $A$ as a vector and compute $S(A)$, rather than the restricted class of linear operators corresponding to matrix multiplication. Our technique also implies the first tight bounds for approximating the Schatten $p$-norm for even integers $p$ via general linear sketches, improving the previous lower bound from $k = \Omega(n^{2-6/p})$ [Regev, 2014] to $k = \Omega(n^{2-4/p})$. Importantly, for sketching the operator norm up to a factor of $\alpha$, where $\alpha - 1 = \Omega(1)$, we obtain a tight $k = \Omega(n^2/\alpha^4)$ bound, matching the upper bound of Andoni and Nguyen (SODA, 2013), and improving the previous $k = \Omega(n^2/\alpha^6)$ lower bound. Finally, we also obtain the first lower bounds for approximating Ky Fan norms.
Motivated by applications to topological data analysis, we give an efficient algorithm for computing a (minimal) presentation of a bigraded $K[x,y]$-module $M$, where $K$ is a field. The algorithm takes as input a short chain complex of free modules $X\xrightarrow{f} Y \xrightarrow{g} Z$ such that $M\cong \ker{g}/\mathrm{im}{f}$. It runs in time $O(|X|^3+|Y|^3+|Z|^3)$ and requires $O(|X|^2+|Y|^2+|Z|^2)$ memory, where $|\cdot |$ denotes the rank. Given the presentation computed by our algorithm, the bigraded Betti numbers of $M$ are readily computed. Our approach is based on a simple matrix reduction algorithm, slight variants of which compute kernels of morphisms between free modules, minimal generating sets, and Gr\"obner bases. Our algorithm for computing minimal presentations has been implemented in RIVET, a software tool for the visualization and analysis of two-parameter persistent homology. In experiments on topological data analysis problems, our implementation outperforms the standard computational commutative algebra packages Singular and Macaulay2 by a wide margin.
Solving polynomial systems whose solution set is finite is usually done in two main steps: compute a Gr\"obner basis for the degree reverse lexicographic order, and perform a change of order to find the lexicographic Gr\"obner basis. The second step is generally considered as better understood, in terms of algorithms and complexity. Yet, after two decades of progress on the first step, it turns out that the change of order now takes a large part of the solving time for many instances, including those that are generic or reached after applying a random change of variables. Like the fastest known change of order algorithms, this work focuses on the latter situation, where the ideal defined by the system satisfies structural properties. First, the ideal has a shape lexicographic Gr\"obner basis. Second, the set of leading terms with respect to the degree reverse lexicographic order has a stability property; in particular, the multiplication matrix of the smallest variable is computed for free from the input Gr\"obner basis. The current fastest algorithms rely on the sparsity of this multiplication matrix to find its minimal polynomial efficiently using Wiedemann's approach. This paper starts from the observation that this sparsity is a consequence of an algebraic structure, which can be exploited to represent the matrix concisely as a univariate polynomial matrix. We show that the Hermite normal form of that matrix yields the sought lexicographic Gr\"obner basis, under assumptions which cover the shape position case. This leads to an improved complexity bound for the second step. The practical benefit is also confirmed via implementations based on the state-of-the-art software libraries msolve and PML.
The scope of this paper is the analysis and approximation of an optimal control problem related to the Allen-Cahn equation. A tracking functional is minimized subject to the Allen-Cahn equation using distributed controls that satisfy point-wise control constraints. First and second order necessary and sufficient conditions are proved. The lowest order discontinuous Galerkin - in time - scheme is considered for the approximation of the control to state and adjoint state mappings. Under a suitable restriction on maximum size of the temporal and spatial discretization parameters $k$, $h$ respectively in terms of the parameter $\epsilon$ that describes the thickness of the interface layer, a-priori estimates are proved with constants depending polynomially upon $1/ \epsilon$. Unlike to previous works for the uncontrolled Allen-Cahn problem our approach does not rely on a construction of an approximation of the spectral estimate, and as a consequence our estimates are valid under low regularity assumptions imposed by the optimal control setting. These estimates are also valid in cases where the solution and its discrete approximation do not satisfy uniform space-time bounds independent of $\epsilon$. These estimates and a suitable localization technique, via the second order condition (see \cite{Arada-Casas-Troltzsch_2002,Casas-Mateos-Troltzsch_2005,Casas-Raymond_2006,Casas-Mateos-Raymond_2007}), allows to prove error estimates for the difference between local optimal controls and their discrete approximation as well as between the associated state and adjoint state variables and their discrete approximations