We study the local complexity landscape of locally checkable labeling (LCL) problems on constant-degree graphs with a focus on complexities below $\log^* n$. Our contribution is threefold. First, as our main contribution, we complete the classification of the complexity landscape of LCL problems on trees in the LOCAL model by proving that every LCL problem with local complexity $o(\log^* n)$ actually has complexity $O(1)$. This result improves upon the previous speedup result from $o(\log \log^* n)$ to $O(1)$ by [Chang, Pettie, FOCS 2017]. Second, in the related LCA and Volume models [Alon, Rubinfeld, Vardi, Xie, SODA 2012; Rubinfeld, Tamir, Vardi, Xie, 2011; Rosenbaum, Suomela, PODC 2020], we prove the same speedup from $o(\log^* n)$ to $O(1)$ for all bounded-degree graphs. Third, we complete the classification of the LOCAL complexity landscape of oriented $d$-dimensional grids by proving that any LCL problem with local complexity $o(\log^* n)$ actually has complexity $O(1)$. This improves upon the previous speedup from $o(\sqrt[d]{\log^* n})$ by Suomela in [Chang, Pettie, FOCS 2017].
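To give a sense of how slowly the $\log^* n$ threshold grows, the following minimal sketch computes the iterated logarithm in base 2. The function name `log_star` and the choice of base are illustrative assumptions, not notation taken from the work itself.

```python
import math


def log_star(n: float) -> int:
    """Count how many times log2 must be applied to n until the value is at most 1."""
    count = 0
    while n > 1:
        n = math.log2(n)  # one application of the logarithm
        count += 1
    return count
```

For instance, `log_star(65536)` evaluates to 4, since 65536 → 16 → 4 → 2 → 1 takes four applications of the base-2 logarithm.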
Spectral independence is a recently developed framework for obtaining sharp bounds on the convergence time of the classical Glauber dynamics. This new framework has yielded optimal $O(n \log n)$ sampling algorithms on bounded-degree graphs for a large class of problems throughout the so-called uniqueness regime, including, for example, the problems of sampling independent sets, matchings, and Ising-model configurations. Our main contribution is to relax the bounded-degree assumption that has so far been important in establishing and applying spectral independence. Previous methods for avoiding degree bounds rely on using $L^p$-norms to analyse contraction on graphs with bounded connective constant (Sinclair, Srivastava, Yin; FOCS'13). The non-linearity of $L^p$-norms is an obstacle to applying these results to bound spectral independence. Our solution is to capture the $L^p$-analysis recursively by amortising over the subtrees of the recurrence used to analyse contraction. Our method generalises previous analyses that applied only to bounded-degree graphs. As a main application of our techniques, we consider the random graph $G(n,d/n)$, where the previously known algorithms run in time $n^{O(\log d)}$ or apply only to large $d$. We refine these algorithmic bounds significantly, and develop fast $n^{1+o(1)}$ algorithms based on Glauber dynamics that apply to all $d$, throughout the uniqueness regime.
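As a concrete illustration of the Glauber dynamics mentioned above, here is a minimal sketch of the standard heat-bath update for weighted independent sets (the hardcore model). The function name `glauber_hardcore`, the fugacity parameter `lam`, and the fixed step count `steps` are assumptions made for this example; the step count is not a mixing-time guarantee and the sketch does not reproduce the analysis of the work.

```python
import random


def glauber_hardcore(adj, lam, steps, seed=0):
    """Run heat-bath Glauber dynamics for the hardcore model.

    adj:   dict mapping each vertex to the set of its neighbours
           (undirected graph without self-loops).
    lam:   fugacity, lam >= 0 (lam = 1 targets the uniform distribution
           over independent sets).
    steps: number of single-vertex updates (assumed, chosen by the caller).
    Returns the independent set held after the last update.
    """
    rng = random.Random(seed)
    vertices = list(adj)
    occupied = set()  # the empty set is always a valid starting state
    for _ in range(steps):
        v = rng.choice(vertices)  # pick a vertex uniformly at random
        occupied.discard(v)       # resample the state of v from scratch
        blocked = any(u in occupied for u in adj[v])
        # v may be occupied only if no neighbour is occupied
        if not blocked and rng.random() < lam / (1.0 + lam):
            occupied.add(v)
    return occupied
```

Every intermediate state is an independent set, because a vertex is only added when none of its neighbours is occupied.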
We study complexity classes of local problems on regular trees from the perspective of distributed local algorithms and descriptive combinatorics. We show that, surprisingly, some deterministic local complexity classes from the hierarchy of distributed computing exactly coincide with well-studied classes of problems in descriptive combinatorics. Namely, we show that a local problem admits a continuous solution if and only if it admits a local algorithm with local complexity $O(\log^* n)$, and a Baire measurable solution if and only if it admits a local algorithm with local complexity $O(\log n)$.
We study the problem of testing whether a function $f: \mathbb{R}^n \to \mathbb{R}$ is a polynomial of degree at most $d$ in the \emph{distribution-free} testing model. Here, the distance between functions is measured with respect to an unknown distribution $\mathcal{D}$ over $\mathbb{R}^n$ from which we can draw samples. In contrast to previous work, we do not assume that $\mathcal{D}$ has finite support. We design a tester that, given query access to $f$ and sample access to $\mathcal{D}$, makes $(d/\varepsilon)^{O(1)}$ many queries to $f$, accepts with probability $1$ if $f$ is a polynomial of degree $d$, and rejects with probability at least $2/3$ if every degree-$d$ polynomial $P$ disagrees with $f$ on a set of mass at least $\varepsilon$ with respect to $\mathcal{D}$. Our result also holds under mild assumptions when we receive only a polynomial number of bits of precision for each query to $f$, or when $f$ can only be queried on rational points representable using a logarithmic number of bits. Along the way, we prove a new stability theorem for multivariate polynomials that may be of independent interest.
We study the distributed minimum spanning tree (MST) problem, a fundamental problem in distributed computing. It is well known that distributed MST can be solved in $\tilde{O}(D+\sqrt{n})$ rounds in the standard CONGEST model (where $n$ is the network size and $D$ is the network diameter) and this is essentially the best possible round complexity (up to logarithmic factors). However, in resource-constrained networks such as ad hoc wireless and sensor networks, keeping nodes active for this many rounds can consume significant resources such as energy. Motivated by this consideration, we study distributed algorithms for MST under the \emph{sleeping model} [Chatterjee et al., PODC 2020], a model for design and analysis of resource-efficient distributed algorithms. In the sleeping model, a node can be in one of two modes in any round -- \emph{sleeping} or \emph{awake} (unlike the traditional model where nodes are always awake). Only the rounds in which a node is \emph{awake} are counted, while \emph{sleeping} rounds are ignored. A node spends resources only in the awake rounds and hence the main goal is to minimize the \emph{awake complexity} of a distributed algorithm, the worst-case number of rounds any node is awake. We present deterministic and randomized distributed MST algorithms that have an \emph{optimal} awake complexity of $O(\log n)$, together with a matching lower bound. We also show that our randomized awake-optimal algorithm has essentially the best possible round complexity by presenting a lower bound of $\tilde{\Omega}(n)$ on the product of the awake and round complexity of any distributed algorithm (including randomized) that outputs an MST, where $\tilde{\Omega}$ hides a $1/(\text{polylog } n)$ factor.
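The difference between awake complexity and round complexity in the sleeping model can be made concrete with a small sketch. It assumes a hypothetical representation in which `schedule` maps each node to the set of rounds (numbered from 1) in which it is awake, and it approximates the round complexity by the last round in which any node is awake. Both function names are illustrative.

```python
def awake_complexity(schedule):
    """Worst-case number of awake rounds over all nodes."""
    return max((len(rounds) for rounds in schedule.values()), default=0)


def round_complexity(schedule):
    """Last round in which some node is awake (assumed proxy for total rounds)."""
    return max((max(rounds) for rounds in schedule.values() if rounds), default=0)


# Hypothetical execution: node "b" is awake in three rounds, and the run lasts 8 rounds.
example_schedule = {"a": {1, 8}, "b": {1, 2, 3}, "c": {8}}
```

In this example the awake complexity is 3 while the round complexity is 8, which shows how an algorithm can run for many rounds while each node is awake for only a few of them.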
Maximal Independent Set (MIS) is one of the central and most well-studied problems in distributed computing. Even after four decades of intensive research, the best-known (randomized) MIS algorithms take $O(\log{n})$ worst-case rounds on general graphs (where $n$ is the number of nodes), while the best-known lower bound is $\Omega\left(\sqrt{\frac{\log{n}}{\log{\log{n}}}}\right)$ rounds. Breaking past the $O(\log{n})$ worst-case bound or showing stronger lower bounds have been longstanding open problems. Our main contribution is to show that MIS can be computed with (worst-case) awake complexity of $O(\log \log n)$ rounds, which is (essentially) exponentially better than the (traditional) round complexity lower bound of $\Omega\left(\sqrt{\frac{\log{n}}{\log{\log{n}}}}\right)$. Specifically, we present the following results. (1) We present a randomized distributed (Monte Carlo) algorithm for MIS that with high probability computes an MIS and has $O(\log\log{n})$-round awake complexity. This algorithm has (traditional) {\em round complexity} that is $O(\mathrm{poly}(n))$. Our bounds hold in the $\mathrm{CONGEST}(O(\mathrm{polylog}\, n))$ model, where only $O(\mathrm{polylog}\, n)$ (specifically $O(\log^3 n)$) bits are allowed to be sent per edge per round. (2) We also show that we can drastically reduce the round complexity at the cost of a slight increase in awake complexity by presenting a randomized MIS algorithm with $O(\log \log n \log^* n )$ awake complexity and $O(\log^3 n \log \log n \log^*n)$ round complexity in the $\mathrm{CONGEST}(O(\mathrm{polylog}\, n))$ model.
For a connected graph $G=(V,E)$, a matching $M\subseteq E$ is a matching cut of $G$ if $G-M$ is disconnected. It is known that for an integer $d$, the corresponding decision problem Matching Cut is polynomial-time solvable for graphs of diameter at most $d$ if $d\leq 2$ and NP-complete if $d\geq 3$. We prove the same dichotomy for graphs of bounded radius. For a graph $H$, a graph is $H$-free if it does not contain $H$ as an induced subgraph. As a consequence of our result, we can solve Matching Cut in polynomial time for $P_6$-free graphs, extending a recent result of Feghali for $P_5$-free graphs. We then extend our result to hold even for $(sP_3+P_6)$-free graphs for every $s\geq 0$ and initiate a complexity classification of Matching Cut for $H$-free graphs.
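The definition of a matching cut can be checked directly. The following minimal sketch verifies that a candidate edge set is a subset of $E$, that it is a matching, and that removing it disconnects the graph. The function name `is_matching_cut` and the input representation (a non-empty vertex list plus undirected edges as pairs, with no self-loops) are assumptions made for illustration; this is a verifier for a given $M$, not an algorithm for the decision problem.

```python
from collections import deque


def is_matching_cut(vertices, edges, m):
    """Return True if m is a matching cut of the simple graph (vertices, edges)."""
    edge_set = {frozenset(e) for e in edges}
    m_set = {frozenset(e) for e in m}
    if not m_set <= edge_set:
        return False  # M must consist of edges of G

    covered = set()
    for e in m_set:
        if covered & e:
            return False  # two edges of M share an endpoint: not a matching
        covered |= e

    # Build the adjacency lists of G - M.
    adj = {v: set() for v in vertices}
    for e in edge_set - m_set:
        u, w = tuple(e)
        adj[u].add(w)
        adj[w].add(u)

    # Breadth-first search from an arbitrary vertex of G - M.
    start = next(iter(adj))
    reached = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in reached:
                reached.add(w)
                queue.append(w)
    return len(reached) < len(adj)  # disconnected iff some vertex is unreachable
```

On the 4-cycle with edges (1,2), (2,3), (3,4), (4,1), the set {(1,2), (3,4)} is a matching cut, whereas {(1,2)} alone is not, because removing a single edge leaves a connected path.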
The lossless compression of a single source $X^n$ was recently shown to be achievable with a notion of strong locality; any $X_i$ can be decoded from a \emph{constant} number of compressed bits, with a probability of error that vanishes in $n$. In contrast with the single source setup, we show that for two separately encoded sources $(X^n,Y^n)$, lossless compression with strong locality is generally not possible. More precisely, we show that for the class of "confusable" sources strong locality cannot be achieved whenever one of the sources is compressed below its entropy. In this case, irrespective of $n$, the probability of error of decoding any $(X_i,Y_i)$ is lower bounded by $2^{-O(d_{\mathrm{loc}})}$, where $d_{\mathrm{loc}}$ denotes the number of compressed bits accessed by the local decoder. Conversely, if the source is not confusable, strong locality is possible even if one of the sources is compressed below its entropy. The results extend to any number of sources.
Selecting the most suitable algorithm and determining its hyperparameters for a given optimization problem is a challenging task. Accurately predicting how well a certain algorithm could solve the problem is hence desirable. Recent studies in single-objective numerical optimization show that supervised machine learning methods can predict algorithm performance using landscape features extracted from the problem instances. Existing approaches typically treat the algorithms as black boxes, without consideration of their characteristics. In this work, to investigate whether a selection of landscape features that depends on algorithm properties could further improve regression accuracy, we consider the modular CMA-ES framework and estimate how much each landscape feature contributes to the best algorithm performance regression models. Exploratory data analysis performed on this data indicates that the set of most relevant features does not depend on the configuration of individual modules, but the influence that these features have on regression accuracy does. In addition, we show that by using classifiers that take into account the features' relevance to model accuracy, we are able to predict the status of individual modules in the CMA-ES configurations.
It is shown that, with two sets of indicators that separately load on two distinct factors, independent of one another conditional on the past, if at least one of the factors causally affects the other, then, in many settings, the process will converge to a factor model in which a single factor suffices to capture the covariance structure among the indicators. Factor analysis with one wave of data then cannot distinguish between factor models with a single factor and those with two factors that are causally related. Therefore, unless causal relations between factors can be ruled out a priori, alleged empirical evidence from one-wave factor analysis for a single factor still leaves open the possibilities of a single factor or of two factors that causally affect one another. The implications for interpreting the factor structure of psychological scales, such as self-report scales for anxiety and depression, or for happiness and purpose, are discussed. The results are further illustrated through simulations to gain insight into the practical implications of the results in more realistic settings prior to the convergence of the processes. Some further generalizations to an arbitrary number of underlying factors are noted.
In the pooled data problem we are given a set of $n$ agents, each of which holds a hidden state bit, either $0$ or $1$. A querying procedure returns for a query set the sum of the states of the queried agents. The goal is to reconstruct the states using as few queries as possible. In this paper we consider two noise models for the pooled data problem. In the noisy channel model, the result for each agent flips with a certain probability. In the noisy query model, each query result is subject to random Gaussian noise. Our results are twofold. First, we present and analyze for both error models a simple and efficient distributed algorithm that reconstructs the initial states in a greedy fashion. Our novel analysis pins down the range of error probabilities and distributions for which our algorithm reconstructs the exact initial states with high probability. Secondly, we present simulation results of our algorithm and compare its performance with approximate message passing (AMP) algorithms that are conjectured to be optimal in a number of related problems.
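The query model and the two noise models described above can be sketched in a few lines. Everything named here is an assumption made for illustration: the function names, the reading of the noisy channel model as independent bit flips per queried agent, the assumption that the number `k` of agents in state 1 is known, and the score-based greedy rule itself, which is one plausible greedy decoder and not necessarily the algorithm analyzed in the paper.

```python
import random


def pooled_query(states, query):
    """Noiseless query: the sum of the hidden bits of the queried agents."""
    return sum(states[i] for i in query)


def noisy_channel_query(states, query, flip_prob, rng):
    """Assumed noisy channel: each queried bit flips independently with flip_prob."""
    return sum(states[i] ^ (rng.random() < flip_prob) for i in query)


def noisy_query(states, query, sigma, rng):
    """Noisy query model: Gaussian noise with standard deviation sigma is added."""
    return pooled_query(states, query) + rng.gauss(0.0, sigma)


def greedy_decode(n, k, queries, results):
    """Score each agent by the total result of the queries containing it,
    then declare the k highest-scoring agents to be in state 1.

    Assumes k is known and that agents appear in a similar number of queries.
    """
    scores = [0.0] * n
    for query, result in zip(queries, results):
        for i in query:
            scores[i] += result
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    estimate = [0] * n
    for i in top:
        estimate[i] = 1
    return estimate
```

The intuition behind this scoring rule is that an agent in state 1 raises the result of every query it takes part in, so its accumulated score tends to be larger than that of an agent in state 0.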