The approximate uniform sampling of graphs with a given degree sequence is a well-known, extensively studied problem in theoretical computer science and has significant applications, e.g., in the analysis of social networks. In this work we study an extension of the problem, where degree intervals are specified rather than a single degree sequence. We are interested in sampling and counting graphs whose degree sequences satisfy the degree interval constraints. A natural scenario where this problem arises is in hypothesis testing on social networks that are only partially observed. We provide the first fully polynomial almost uniform sampler (FPAUS) as well as the first fully polynomial randomized approximation scheme (FPRAS) for sampling and counting, respectively, graphs with near-regular degree intervals, in which every node $i$ has a degree from an interval not too far away from a given $d \in \mathbb{N}$. In order to design our FPAUS, we rely on various state-of-the-art tools from Markov chain theory and combinatorics. In particular, we provide the first non-trivial algorithmic application of a breakthrough result of Liebenau and Wormald (2017) regarding an asymptotic formula for the number of graphs with a given near-regular degree sequence. Furthermore, we also make use of the recent breakthrough of Anari et al. (2019) on sampling a base of a matroid under a strongly log-concave probability distribution. As a more direct approach, we also study a natural Markov chain recently introduced by Rechner, Strowick and M\"uller-Hannemann (2018), based on three simple local operations: switches, hinge flips, and additions/deletions of a single edge. We obtain the first theoretical results for this Markov chain by showing it is rapidly mixing for the case of near-regular degree intervals of size at most one.
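For concreteness, the three local operations driving this chain can be sketched on an adjacency-set representation; the function names, the degree-interval dictionaries `lo`/`hi`, and the feasibility checks below are illustrative, not the authors' implementation:

```python
# Illustrative sketch of the three local moves (switch, hinge flip,
# single-edge addition/deletion) on a graph stored as a dict of
# adjacency sets; lo[v]/hi[v] are the degree-interval bounds for v.

def switch(adj, a, b, c, d):
    """Replace edges {a,b},{c,d} by {a,d},{c,b}; all degrees stay fixed."""
    if (b in adj[a] and d in adj[c] and d not in adj[a]
            and b not in adj[c] and len({a, b, c, d}) == 4):
        adj[a].remove(b); adj[b].remove(a)
        adj[c].remove(d); adj[d].remove(c)
        adj[a].add(d); adj[d].add(a)
        adj[c].add(b); adj[b].add(c)
        return True
    return False

def hinge_flip(adj, a, b, c, lo, hi):
    """Replace edge {a,b} by {a,c}: deg(b) drops by 1, deg(c) grows by 1."""
    if b in adj[a] and c not in adj[a] and c != a:
        if len(adj[b]) - 1 >= lo[b] and len(adj[c]) + 1 <= hi[c]:
            adj[a].remove(b); adj[b].remove(a)
            adj[a].add(c); adj[c].add(a)
            return True
    return False

def toggle_edge(adj, a, b, lo, hi):
    """Add or delete the single edge {a,b}, respecting the intervals."""
    if b in adj[a]:
        if len(adj[a]) - 1 >= lo[a] and len(adj[b]) - 1 >= lo[b]:
            adj[a].remove(b); adj[b].remove(a)
            return True
    elif a != b:
        if len(adj[a]) + 1 <= hi[a] and len(adj[b]) + 1 <= hi[b]:
            adj[a].add(b); adj[b].add(a)
            return True
    return False
```

Note that a switch preserves the degree sequence exactly, while hinge flips and edge toggles move between degree sequences inside the intervals, which is what makes the chain suitable for the interval-constrained state space.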
We study the tractability of conjunctive query answering for queries with unbounded arity. It is well known that tractability of the problem can be characterised in terms of the query's treewidth under the assumption of bounded arity. We extend this result to cases with unbounded arity but degree 2. To do so, we introduce hypergraph dilutions as an alternative method to primal graph minors for studying substructures of hypergraphs. Using dilutions we observe an analogue of the Excluded Grid Theorem for degree-2 hypergraphs. As a consequence, we show that the tractability of conjunctive query answering can be characterised in terms of generalised hypertree width. A similar characterisation is also shown for the corresponding counting problem. We also generalise our main structural result to arbitrary bounded degree and discuss possible paths towards a characterisation of the bounded degree case.
We study a variant of the classical $k$-median problem known as diversity-aware $k$-median (introduced by Thejaswi et al. 2021), where we are given a collection of facility subsets, and a solution must contain at least a specified number of facilities from each subset. We investigate the fixed-parameter tractability of this problem and show several negative hardness and inapproximability results, even when we afford exponential running time with respect to some parameters of the problem. Motivated by these results, we present a fixed-parameter approximation algorithm with approximation ratio $(1 + \frac{2}{e} +\epsilon)$, and argue that this ratio is essentially tight assuming the Gap Exponential Time Hypothesis (Gap-ETH). We also present a simple, practical local-search algorithm that gives a bicriteria $(2k, 3+\epsilon)$ approximation with better running time bounds.
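The single-swap local search that such algorithms build on can be sketched as follows; this toy version solves plain $k$-median only, ignoring the diversity constraints and the $(2k, 3+\epsilon)$ analysis, and all names are illustrative:

```python
# Minimal any-improvement single-swap local search for plain k-median.
# clients/facilities are hashable ids; dist[x][y] is a distance table.

def kmedian_cost(clients, centers, dist):
    """Sum over clients of the distance to the nearest open center."""
    return sum(min(dist[c][f] for f in centers) for c in clients)

def local_search(clients, facilities, k, dist):
    centers = set(list(facilities)[:k])      # arbitrary initial k centers
    improved = True
    while improved:
        improved = False
        best = kmedian_cost(clients, centers, dist)
        for out in list(centers):
            for inn in facilities:
                if inn in centers:
                    continue
                cand = (centers - {out}) | {inn}
                if kmedian_cost(clients, cand, dist) < best:
                    centers = cand           # accept any improving swap
                    improved = True
                    break
            if improved:
                break
    return centers
```

Each accepted swap strictly decreases the cost, so the search terminates at a single-swap local optimum; the bicriteria guarantee in the abstract comes from running a constrained variant of this primitive with a larger center budget.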
A $t$-spanner of a graph $G=(V,E)$ is a subgraph $H=(V,E')$ that contains a $uv$-path of length at most $t$ for every $uv\in E$. It is known that every $n$-vertex graph admits a $(2k-1)$-spanner with $O(n^{1+1/k})$ edges for $k\geq 1$. This bound is the best possible for $k\in\{1,2,3,5\}$ and is conjectured to be optimal for all $k$ by Erd\H{o}s' girth conjecture. We study $t$-spanners for $t\in \{2,3\}$ for geometric intersection graphs in the plane. These spanners are also known as \emph{$t$-hop spanners} to emphasize the use of graph-theoretic distances (as opposed to Euclidean distances between the geometric objects or their centers). We obtain the following results: (1) Every $n$-vertex unit disk graph (UDG) admits a 2-hop spanner with $O(n)$ edges, improving upon the previous bound of $O(n\log n)$. (2) The intersection graph of $n$ axis-aligned fat rectangles admits a 2-hop spanner with $O(n\log n)$ edges, and this bound is the best possible. (3) The intersection graph of $n$ fat convex bodies in the plane admits a 3-hop spanner with $O(n\log n)$ edges. (4) The intersection graph of $n$ axis-aligned rectangles admits a 3-hop spanner with $O(n\log^2 n)$ edges.
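The defining $t$-spanner condition can be verified directly with truncated breadth-first search; a minimal sketch (illustrative names, plain adjacency sets):

```python
# Check the t-spanner property: every edge uv of G must be joined by a
# path of at most t edges in the subgraph H (given as adjacency sets).
from collections import deque

def hop_distance(adj, s, target, cap):
    """Hop distance from s to target in adj, with BFS truncated at cap."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        if dist[u] == cap:          # no need to explore beyond cap hops
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist.get(target, cap + 1)

def is_t_spanner(edges_G, adj_H, t):
    """True iff adj_H contains a path of <= t hops for every edge of G."""
    return all(hop_distance(adj_H, u, v, t) <= t for u, v in edges_G)
```

For example, the path 0-1-2-3 is a 3-hop spanner of the 4-cycle on the same vertices but not a 2-hop spanner, since the deleted edge $\{3,0\}$ is only covered by a 3-hop path.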
This paper investigates scheduling policies for file retrieval in linear storage devices, such as magnetic tapes. Tapes are the technology of choice for long-term storage in data centers due to their low cost per capacity, reliability, and data security. While the scheduling problems associated with data retrieval from tapes are classical, existing works focus on simple heuristics because standard tape specifications impose tight limits on computation time. Our first contribution is a theoretical investigation of three standard policies, establishing their worst-case performance and identifying special cases of practical relevance for which they are optimal. Next, we show that the problem is polynomially solvable via two interleaved recursive models, albeit with high computational complexity. We leverage these results to develop two new scheduling policies with constant-ratio performance guarantees and low computational cost. Finally, we investigate properties of the online variant of the problem, presenting a new constant-factor competitive algorithm. Our numerical analysis on synthetic and real-world tapes from an industry partner provides insights into the dataset configurations where each policy is most effective, which is of relevance to data center managers. In particular, our new best-performing policy is practical for large datasets and significantly improves upon standard algorithms in the area.
We prove that every simple 2-connected subcubic graph on $n$ vertices with $n_2$ vertices of degree 2 has a TSP walk of length at most $\frac{5n+n_2}{4}-1$, confirming a conjecture of Dvo\v{r}\'ak, Kr\'al', and Mohar. This bound is best possible; there are infinitely many subcubic and cubic graphs whose minimum TSP walks have lengths $\frac{5n+n_2}{4}-1$ and $\frac{5n}{4} - 2$ respectively. We characterize the extremal subcubic examples meeting this bound. We also give a quadratic-time combinatorial algorithm for finding such a TSP walk. In particular, we obtain a $\frac{5}{4}$-approximation algorithm for the graphic TSP on simple cubic graphs, improving on the previously best known approximation ratio of $\frac{9}{7}$.
Core decomposition is a classic technique for discovering densely connected regions in a graph, with a wide range of applications. Formally, a $k$-core is a maximal subgraph where each vertex has at least $k$ neighbors. A natural extension of a $k$-core is a $(k, h)$-core, where each node must have at least $k$ other nodes reachable via a path of length at most $h$. The downside of using $(k, h)$-core decomposition is the significant increase in computational complexity: whereas the standard core decomposition can be done in $O(m)$ time, the generalization can require $O(n^2m)$ time, where $n$ and $m$ are the number of nodes and edges in the given graph. In this paper we propose a randomized algorithm that produces an $\epsilon$-approximation of the $(k, h)$-core decomposition with probability $1 - \delta$ in $O(\epsilon^{-2} hm (\log^2 n - \log \delta))$ time. The approximation is based on sampling the neighborhoods of nodes, and we use the Chernoff bound to prove the approximation guarantee. We demonstrate empirically that approximating the decomposition complements the exact computation: computing the approximation is significantly faster than computing the exact solution for the networks where the exact computation is slow.
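The exact baseline that the sampling scheme approximates can be sketched as a peeling procedure: repeatedly delete every node whose $h$-neighborhood is too small until the subgraph stabilises. A minimal illustrative version (not the paper's code):

```python
# Exact (k,h)-core by peeling.  Each surviving node must have at least
# k other nodes within hop distance h of it in the surviving subgraph.
from collections import deque

def h_neighborhood_size(adj, s, h):
    """Number of nodes (excluding s) within h hops of s, via truncated BFS."""
    seen = {s}
    q = deque([(s, 0)])
    while q:
        u, d = q.popleft()
        if d == h:
            continue
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                q.append((v, d + 1))
    return len(seen) - 1

def kh_core(adj, k, h):
    """Return the (k,h)-core of adj (a dict of adjacency sets)."""
    core = {u: set(nb) for u, nb in adj.items()}   # work on a copy
    while True:
        weak = [u for u in core if h_neighborhood_size(core, u, h) < k]
        if not weak:
            return core
        for u in weak:            # peel all currently weak nodes
            for v in core[u]:
                core[v].discard(u)
            del core[u]
```

Recomputing BFS neighborhoods after every peeling round is what drives the cost toward $O(n^2 m)$ in the worst case, which is the bottleneck the sampling-based approximation targets.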
Real-world data often comes in compressed form. Analyzing compressed data directly (without decompressing it) can save space and time by orders of magnitude. In this work, we focus on fundamental sequence comparison problems and try to quantify the gain in time complexity when the underlying data is highly compressible. We consider grammar compression, which unifies many practically relevant compression schemes. For two strings of total length $N$ and total compressed size $n$, it is known that the edit distance and a longest common subsequence (LCS) can be computed exactly in time $\tilde{O}(nN)$, as opposed to $O(N^2)$ for the uncompressed setting. Many applications need to align multiple sequences simultaneously, and the fastest known exact algorithms for median edit distance and LCS of $k$ strings run in $O(N^k)$ time. This naturally raises the question of whether compression can help to reduce the running time significantly for $k \geq 3$, perhaps to $O(N^{k/2}n^{k/2})$ or $O(Nn^{k-1})$. Unfortunately, assuming the Strong Exponential Time Hypothesis, we show lower bounds that rule out any improvement beyond $\Omega(N^{k-1}n)$ time for any of these problems. At the same time, we show that approximation and compression together can be surprisingly effective. We develop an $\tilde{O}(N^{k/2}n^{k/2})$-time FPTAS for the median edit distance of $k$ sequences. In comparison, no $O(N^{k-\Omega(1)})$-time PTAS is known for the median edit distance problem in the uncompressed setting. For two strings, we get an $\tilde{O}(N^{2/3}n^{4/3})$-time FPTAS for both edit distance and LCS. In contrast, for uncompressed strings, no subquadratic-time algorithm for LCS is known that achieves a sub-polynomial approximation factor. Building on the insight from our approximation algorithms, we also obtain results for many distance measures including the edit, Hamming, and shift distances.
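The $O(N^2)$ uncompressed baseline referred to above is the textbook Wagner-Fischer dynamic program for edit distance; a minimal sketch for reference:

```python
# Textbook O(|s| * |t|) edit distance DP (Wagner-Fischer), the
# uncompressed baseline that the tilde-O(nN) compressed algorithms beat.

def edit_distance(s, t):
    m, n = len(s), len(t)
    prev = list(range(n + 1))      # row for the empty prefix of s
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j] + 1,                              # delete s[i-1]
                cur[j - 1] + 1,                           # insert t[j-1]
                prev[j - 1] + (s[i - 1] != t[j - 1]),     # (mis)match
            )
        prev = cur
    return prev[n]
```

For $k$ strings the same recurrence runs over a $k$-dimensional table, which is where the $O(N^k)$ exact running time in the abstract comes from.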
We study fractional variants of the quasi-norms introduced by Brezis, Van Schaftingen, and Yung in the study of the Sobolev space $\dot W^{1,p}$. The resulting spaces are identified as a special class of real interpolation spaces of Sobolev-Slobodecki\u{\i} spaces. We establish the equivalence between Fourier analytic definitions and definitions via difference operators acting on measurable functions. We prove various new results on embeddings and non-embeddings, and give applications to harmonic and caloric extensions. For suitable wavelet bases we obtain a characterization of the approximation spaces for best $n$-term approximation from a wavelet basis via smoothness conditions on the function; this extends a classical result by DeVore, Jawerth and Popov.
We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state. We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus to guarantee both optimism and convergence of the associated value iteration scheme. We prove that EB-SSP achieves the minimax regret rate $\widetilde{O}(B_{\star} \sqrt{S A K})$, where $K$ is the number of episodes, $S$ is the number of states, $A$ is the number of actions and $B_{\star}$ bounds the expected cumulative cost of the optimal policy from any state, thus closing the gap with the lower bound. Interestingly, EB-SSP obtains this result while being parameter-free, i.e., it does not require any prior knowledge of $B_{\star}$, nor of $T_{\star}$ which bounds the expected time-to-goal of the optimal policy from any state. Furthermore, we illustrate various cases (e.g., positive costs, or general costs when an order-accurate estimate of $T_{\star}$ is available) where the regret only contains a logarithmic dependence on $T_{\star}$, thus yielding the first horizon-free regret bound beyond the finite-horizon MDP setting.
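The value iteration scheme at the heart of such model-based algorithms can be sketched for a known SSP instance; this toy version omits the transition skewing and exploration bonuses of EB-SSP, and all names are illustrative:

```python
# Value iteration for a known SSP: V(s) = min_a [ c(s,a) +
# sum_{s'} P(s'|s,a) * V(s') ], with V(goal) fixed at 0.
# P[s][a] is a list of (next_state, probability) pairs.

def ssp_value_iteration(states, actions, cost, P, goal, iters=1000):
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        for s in states:
            if s == goal:
                continue            # the goal is absorbing and free
            V[s] = min(
                cost[s][a] + sum(p * V[s2] for s2, p in P[s][a])
                for a in actions[s]
            )
    return V
```

EB-SSP runs a scheme of this shape on empirical transitions, with the costs perturbed downward by an exploration bonus; the skewing of the transitions is what keeps the iteration convergent despite the optimistic costs.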