We study the existence of polynomial kernels, for parameterized problems without a polynomial kernel on general graphs, when restricted to graphs of bounded twin-width. Our main result is that a polynomial kernel for $k$-Dominating Set on graphs of twin-width at most 4 would contradict a standard complexity-theoretic assumption. The reduction is quite involved, especially to get the twin-width upper bound down to 4, and can be tweaked to work for Connected $k$-Dominating Set and Total $k$-Dominating Set (albeit with a worse upper bound on the twin-width). The $k$-Independent Set problem admits the same lower bound by a much simpler argument, previously observed [ICALP '21], which extends to $k$-Independent Dominating Set, $k$-Path, $k$-Induced Path, $k$-Induced Matching, etc. On the positive side, we obtain a simple quadratic vertex kernel for Connected $k$-Vertex Cover and Capacitated $k$-Vertex Cover on graphs of bounded twin-width. Interestingly, the kernel applies to graphs of Vapnik-Chervonenkis density 1 and does not require a witness sequence. We also present a more intricate $O(k^{1.5})$ vertex kernel for Connected $k$-Vertex Cover. Finally, we show that deciding whether a graph has twin-width at most 1 can be done in polynomial time, and observe that most optimization/decision graph problems can be solved in polynomial time on graphs of twin-width at most 1.
We present an algorithm for strongly refuting smoothed instances of all Boolean CSPs. The smoothed model is a hybrid between worst and average-case input models, where the input is an arbitrary instance of the CSP with only the negation patterns of the literals re-randomized with some small probability. For an $n$-variable smoothed instance of a $k$-arity CSP, our algorithm runs in $n^{O(\ell)}$ time, and succeeds with high probability in bounding the optimum fraction of satisfiable constraints away from $1$, provided that the number of constraints is at least $\tilde{O}(n) (\frac{n}{\ell})^{\frac{k}{2} - 1}$. This matches, up to polylogarithmic factors in $n$, the trade-off between running time and the number of constraints of the state-of-the-art algorithms for refuting fully random instances of CSPs [RRS17]. We also make a surprising new connection between our algorithm and even covers in hypergraphs, which we use to positively resolve Feige's 2008 conjecture, an extremal combinatorics conjecture on the existence of even covers in sufficiently dense hypergraphs that generalizes the well-known Moore bound for the girth of graphs. As a corollary, we show that polynomial-size refutation witnesses exist for arbitrary smoothed CSP instances with number of constraints a polynomial factor below the "spectral threshold" of $n^{k/2}$, extending the celebrated result for random 3-SAT of Feige, Kim and Ofek [FKO06].
In this work, we study the $k$-means cost function. Given a dataset $X \subseteq \mathbb{R}^d$ and an integer $k$, the goal of the Euclidean $k$-means problem is to find a set of $k$ centers $C \subseteq \mathbb{R}^d$ such that $\Phi(C, X) \equiv \sum_{x \in X} \min_{c \in C} ||x - c||^2$ is minimized. Let $\Delta(X,k) \equiv \min_{C \subseteq \mathbb{R}^d, |C| = k} \Phi(C, X)$ denote the cost of the optimal $k$-means solution. For any dataset $X$, $\Delta(X,k)$ decreases as $k$ increases. In this work, we try to understand this behaviour more precisely. For any dataset $X \subseteq \mathbb{R}^d$, integer $k \geq 1$, and precision parameter $\varepsilon > 0$, let $L(X, k, \varepsilon)$ denote the smallest integer such that $\Delta(X, L(X, k, \varepsilon)) \leq \varepsilon \cdot \Delta(X,k)$. We show upper and lower bounds on this quantity. Our techniques generalize to the metric $k$-median problem in arbitrary metric spaces, and we give bounds in terms of the doubling dimension of the metric. Finally, we observe that for any dataset $X$, we can compute a set $S$ of size $O \left(L(X, k, \varepsilon/c) \right)$ using $D^2$-sampling such that $\Phi(S,X) \leq \varepsilon \cdot \Delta(X,k)$ for some fixed constant $c$. We also discuss some applications of our bounds.
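To make the cost function concrete, here is a minimal brute-force sketch (illustrative only, not an algorithm from this work) that evaluates $\Phi$ on a toy dataset; restricting candidate centers to the data points themselves only upper-bounds the true $\Delta(X,k)$, but suffices to show the monotone decrease in $k$:

```python
from itertools import combinations

def phi(centers, X):
    # k-means cost: each point pays its squared distance to the nearest center
    return sum(min(sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers)
               for x in X)

def delta_restricted(X, k):
    # Best cost with centers restricted to data points: an upper bound on
    # Delta(X, k), used here only to illustrate the monotone behaviour.
    return min(phi(C, X) for C in combinations(X, k))

X = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
costs = [delta_restricted(X, k) for k in range(1, 5)]
# costs are non-increasing in k and reach 0 once every point is a center
```

On this toy instance the cost drops sharply once $k$ matches the number of well-separated clusters, which is the kind of behaviour the quantity $L(X, k, \varepsilon)$ quantifies.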
The Minimum Linear Arrangement problem (MLA) consists of finding a mapping $\pi$ from the vertices of a graph to distinct integers that minimizes $\sum_{\{u,v\}\in E}|\pi(u) - \pi(v)|$. In this setting, vertices are often assumed to lie on a horizontal line and edges are drawn as semicircles above that line. For trees, various algorithms are available to solve the problem in time polynomial in $n=|V|$. There also exist variants of the MLA in which the arrangements are constrained. Iordanskii, and later Hochberg and Stallmann (HS), put forward $O(n)$-time algorithms that solve the problem when arrangements are constrained to be planar (also known as one-page book embeddings). We also consider linear arrangements of rooted trees that are constrained to be projective (planar embeddings in which the root is not covered by any edge). Gildea and Temperley (GT) sketched an algorithm for projective arrangements which they claimed runs in $O(n)$ time but provided no justification for its cost. In contrast, Park and Levy claimed that GT's algorithm runs in $O(n \log d_{max})$ time, where $d_{max}$ is the maximum degree, but did not provide sufficient detail. Here we correct an error in HS's algorithm for the planar case, show its relationship with the projective case, and derive simple algorithms for the projective and planar cases that provably run in $O(n)$ time.
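The objective is easy to check on small instances by brute force. The following sketch (for illustration only; it enumerates all $n!$ arrangements, whereas the algorithms discussed above run in $O(n)$ on trees) computes the unconstrained MLA cost of a small star:

```python
from itertools import permutations

def mla_cost(pi, edges):
    # cost of a linear arrangement pi (a dict: vertex -> position)
    return sum(abs(pi[u] - pi[v]) for u, v in edges)

def min_linear_arrangement(n, edges):
    # brute force over all n! arrangements of vertices 0..n-1 onto 1..n
    return min(mla_cost(dict(zip(range(n), perm)), edges)
               for perm in permutations(range(1, n + 1)))

# star K_{1,3}: center 0 joined to leaves 1, 2, 3
best = min_linear_arrangement(4, [(0, 1), (0, 2), (0, 3)])
```

For the star, any optimal arrangement places the center at one of the two middle positions, giving cost $1 + 1 + 2 = 4$.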
The literature contains plenty of applications and analyses of time-independent elliptic partial differential equations hinting at the benefits of overtesting, i.e., using more collocation conditions than basis functions. Overtesting not only reduces the problem size, but is also known to be necessary for the stability and convergence of the widely used unsymmetric Kansa-type strong-form collocation methods. We consider kernel-based meshfree methods, i.e., a method of lines with spatial collocation and overtesting, for solving parabolic partial differential equations on surfaces without parametrization. In this paper, we extend the time-independent convergence theories for overtesting techniques to parabolic equations on smooth, closed surfaces.
We study contests where the designer's objective is an extension of the widely studied objective of maximizing the total output: The designer gets zero marginal utility from a player's output if the output of the player is very low or very high. We model this using two objective functions: binary threshold, where a player's contribution to the designer's utility is 1 if her output is above a certain threshold, and 0 otherwise; and linear threshold, where a player's contribution is linear if her output is between a lower and an upper threshold, and becomes constant below the lower and above the upper threshold. For both of these objectives, we study (1) rank-order allocation contests that use only the ranking of the players to assign prizes and (2) general contests that may use the numerical values of the players' outputs to assign prizes. We characterize the optimal contests that maximize the designer's objective and indicate techniques to efficiently compute them. We also prove that for the linear threshold objective, a contest that distributes the prize equally among a fixed number of top-ranked players offers a factor-2 approximation to the optimal rank-order allocation contest.
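The two objectives can be sketched as follows; this is an illustration under assumed normalizations (in particular, scaling the linear part to slope 1 and value 0 at the lower threshold is our choice for the sketch, not a definition from the paper):

```python
def binary_threshold(output, t):
    # player's contribution to the designer's utility: 1 iff output reaches t
    return 1 if output >= t else 0

def linear_threshold(output, lo, hi):
    # linear between lo and hi; constant below lo (0) and above hi (hi - lo)
    return min(max(output, lo), hi) - lo

# designer's total utility from three players under the linear threshold objective
total = sum(linear_threshold(x, 2.0, 5.0) for x in [1.0, 3.0, 7.0])
```

Note that outputs below the lower threshold and increases beyond the upper threshold contribute nothing at the margin, which is exactly why maximizing raw total output is the wrong objective here.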
A dominating set in a directed graph is a set of vertices $S$ such that every vertex not in $S$ has an in-neighbour in $S$. A locating set $S$ is a set of vertices such that every vertex not in $S$ is characterized uniquely by its in-neighbours in $S$, i.e. for every two vertices $u$ and $v$ not in $S$, there exists a vertex $s\in S$ that dominates exactly one of them. The size of a smallest set that is both locating and dominating in a directed graph $D$ is denoted by $\gamma^{LD}(D)$. Foucaud, Heydarshahi and Parreau proved that any twin-free digraph $D$ satisfies $\gamma^{LD}(D)\leq \frac{4n}{5}+1$, but conjectured that this bound can be lowered to $\frac{2n}{3}$. The conjecture is still open. They also proved that if $D$ is a tournament, i.e. a directed graph with exactly one arc between every pair of vertices, then $\gamma^{LD}(D)\leq \lceil \frac{n}{2}\rceil$. The main result of this paper is the generalization of this bound to connected local tournaments, i.e. connected digraphs in which the in- and out-neighbourhoods of every vertex induce tournaments. We also prove that $\gamma^{LD}(D)\leq \frac{2n}{3}$ for every quasi-twin-free digraph $D$ that admits a supervising vertex (a vertex from which every vertex is reachable). This class of digraphs generalizes twin-free acyclic digraphs, the most general class for which this bound was previously known.
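The two defining conditions can be verified directly; the sketch below (ours, for illustration only) checks whether a given set is locating-dominating in a small digraph:

```python
def is_locating_dominating(S, vertices, arcs):
    # arcs: set of pairs (u, v) meaning u -> v, so u is an in-neighbour of v
    S = set(S)
    outside = [v for v in vertices if v not in S]
    # code of v: the set of its in-neighbours inside S
    code = {v: frozenset(u for u in S if (u, v) in arcs) for v in outside}
    dominating = all(code[v] for v in outside)          # every code is non-empty
    locating = len(set(code.values())) == len(outside)  # all codes are distinct
    return dominating and locating

# directed triangle 0 -> 1 -> 2 -> 0 (a tournament on n = 3 vertices)
arcs = {(0, 1), (1, 2), (2, 0)}
ok = is_locating_dominating({0, 1}, range(3), arcs)
```

Here $\{0,1\}$ is locating-dominating, consistent with the $\lceil n/2 \rceil$ bound for tournaments ($\lceil 3/2 \rceil = 2$).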
Modelling multivariate systems is important for many applications in engineering and operational research. The multivariate distributions under scrutiny usually have no analytic or closed form, so they are modelled with numerical techniques, typically multivariate simulations, which can have very high dimensions. Random Orthogonal Matrix (ROM) simulation is a method that has gained some popularity because it is free of certain simulation errors: it exactly matches a target mean, covariance matrix and certain higher moments with every simulation. This paper extends the ROM simulation algorithm presented by Hanke et al. (2017), hereafter referred to as HPSW, which matches the target mean, covariance matrix and Kollo skewness vector exactly. Our first contribution is to establish necessary and sufficient conditions for the HPSW algorithm to work. Our second contribution is to develop a general approach for constructing admissible values in the HPSW algorithm. Our third theoretical contribution is to analyse the effect of multivariate sample concatenation on the target Kollo skewness. Finally, we illustrate the extensions we develop here with a simulation study.
Decentralized cryptocurrency systems, known as blockchains, have shown promise as an infrastructure for mutually distrustful parties to agree on transactions safely. However, Bitcoin-derived blockchains and their variants suffer from the limitations of the CAP Trilemma, which is difficult to overcome by optimizing consensus protocols alone. Moreover, the P2P networks of blockchains suffer from efficiency and reliability problems when the matching between physical and logical topologies is not taken into account. To address the CAP Trilemma in consortium blockchains, we propose a physical topology based on the multi-dimensional hypercube, which offers excellent partition tolerance in probability. The general hypercube also has advantages in solving the mismatch problems in P2P networks. We further extend the general topology to a hierarchical recursive topology with more medium or short links, balancing reliability requirements against the cost of the physical network. We prove that the hypercube topology has better partition tolerance than the regular rooted-tree topology and the ring-lattice topology, and that it effectively fits the upper-layer protocols. As a result, blockchains built on the proposed topology can reach the CAP guarantee bound by adopting suitable transmission and consensus protocols with strong consistency and availability.
The Ising model is a celebrated example of a Markov random field, introduced in statistical physics to model ferromagnetism. This is a discrete exponential family with binary outcomes, where the sufficient statistic involves a quadratic term designed to capture correlations arising from pairwise interactions. However, in many situations the dependencies in a network arise not just from pairs, but from peer-group effects. A convenient mathematical framework for capturing higher-order dependencies is the $p$-tensor Ising model, where the sufficient statistic consists of a multilinear polynomial of degree $p$. This thesis develops a framework for statistical inference of the natural parameters in $p$-tensor Ising models. We begin with the Curie-Weiss Ising model, where we unearth various non-standard phenomena in the asymptotics of the maximum-likelihood (ML) estimates of the parameters, such as the presence of a critical curve in the interior of the parameter space on which these estimates have a limiting mixture distribution, and a surprising superefficiency phenomenon at the boundary point(s) of this curve. ML estimation fails in more general $p$-tensor Ising models due to the presence of a computationally intractable normalizing constant. To overcome this issue, we use the popular maximum pseudo-likelihood (MPL) method, which is based on conditional distributions and avoids computing the intractable normalizing constant. We derive general conditions under which the MPL estimate is $\sqrt{N}$-consistent, where $N$ is the size of the underlying network. Finally, we consider a more general Ising model, which incorporates high-dimensional covariates at the nodes of the network, and which can also be viewed as a logistic regression model with dependent observations. In this model, we show that the parameters can be estimated consistently under sparsity assumptions on the true covariate vector.
List-decodable codes have been an active topic in theoretical computer science since the seminal papers of M. Sudan and V. Guruswami in 1997-1998. There are general results about the Johnson radius and the list-decoding capacity theorem for random codes. However, few results about general constraints on the rates, list-decoding radii and list sizes of list-decodable codes have been obtained. In this paper we show that rates, list-decoding radii and list sizes are closely related to the classical topic of covering codes. We prove new simple but strong upper bounds for list-decodable codes based on various covering codes. Any good upper bound on the covering radius then implies a good upper bound on the size of list-decodable codes. Hence list-decodability is a strong constraint from the viewpoint of covering codes. Our covering-code upper bounds for $(d,1)$ list-decodable codes give highly non-trivial upper bounds on the sizes of codes with given minimum Hamming distances. Our results give exponential improvements on the recent generalized Singleton upper bound of Shangguan and Tamo (STOC 2020) when the code lengths are very large. The asymptotic forms of our covering-code bounds can partially recover the list-decoding capacity theorem, the Blinovsky bound and the combinatorial bound of Guruswami-H{\aa}stad-Sudan-Zuckerman. We also propose studying combinatorial covering list-decodable codes as a natural generalization of combinatorial list-decodable codes.
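The covering radius underlying these bounds has a simple computational definition; the following brute-force sketch (for illustration only; it is exponential in the code length $n$) evaluates it for a toy binary code:

```python
from itertools import product

def covering_radius(code, n):
    # max over all words in {0,1}^n of the Hamming distance to the nearest codeword
    def hamming(x, y):
        return sum(a != b for a, b in zip(x, y))
    return max(min(hamming(w, c) for c in code) for w in product((0, 1), repeat=n))

# length-3 repetition code {000, 111}: every word lies within distance 1 of a codeword
r = covering_radius([(0, 0, 0), (1, 1, 1)], 3)
```

A small covering radius means the codewords' Hamming balls already blanket the whole space, which is the geometric fact the upper bounds on list-decodable codes exploit.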