In the vertex connectivity problem, given an undirected $n$-vertex $m$-edge graph $G$, we need to compute the minimum number of vertices whose removal disconnects $G$. This problem is one of the most well-studied graph problems. Since 2019, a new line of work [Nanongkai et al.~STOC'19; SODA'20; STOC'21] has used randomized techniques to break the quadratic-time barrier and, very recently, culminated in an almost-linear time algorithm via the recently announced maxflow algorithm by Chen et al. In contrast, all known deterministic algorithms are much slower. The fastest algorithm [Gabow FOCS'00] takes $O(m(n+\min\{c^{5/2},cn^{3/4}\}))$ time, where $c$ is the vertex connectivity. It remains open whether there exists a subquadratic-time deterministic algorithm for any constant $c>3$. In this paper, we give the first deterministic almost-linear time vertex connectivity algorithm for all constants $c$. Our algorithm runs in $m^{1+o(1)}2^{O(c^{2})}$ time, which is almost-linear for all $c=o(\sqrt{\log n})$. This is the first deterministic algorithm that breaks the $O(n^{2})$-time bound on sparse graphs where $m=O(n)$, a bound that has stood for more than 50 years [Kleitman'69]. Towards our result, we give a new reduction framework to vertex expanders, which in turn exploits our new almost-linear time construction of mimicking networks for vertex connectivity. The previous construction by Kratsch and Wahlstr\"{o}m [FOCS'12] requires large polynomial time and is randomized.
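For readers new to the problem, the following brute-force sketch only checks the definition above; it is not the paper's algorithm (its running time is exponential), and the networkx call is used merely as a cross-check.

```python
# Brute-force illustration of the vertex connectivity definition only;
# the paper's deterministic m^{1+o(1)} 2^{O(c^2)}-time algorithm is far more involved.
from itertools import combinations
import networkx as nx

def vertex_connectivity_bruteforce(G):
    """Smallest number of vertices whose removal disconnects G."""
    n = G.number_of_nodes()
    if not nx.is_connected(G):
        return 0
    for k in range(n - 1):
        for S in combinations(G.nodes, k):
            H = G.copy()
            H.remove_nodes_from(S)
            if H.number_of_nodes() > 1 and not nx.is_connected(H):
                return k
    return n - 1  # complete graphs cannot be disconnected by vertex removal

G = nx.cycle_graph(6)
assert vertex_connectivity_bruteforce(G) == nx.node_connectivity(G) == 2
```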
We propose a monitoring strategy for efficient and robust estimation of disease prevalence and case numbers within closed and enumerated populations such as schools, workplaces, or retirement communities. The proposed design relies largely on voluntary testing, which is notoriously biased (e.g., in the case of COVID-19) due to non-representative sampling. The approach yields unbiased and comparatively precise estimates with no assumptions about the factors underlying selection of individuals for voluntary testing, building on the strength of what can be a small random sampling component. This component unlocks a previously proposed "anchor stream" estimator, a well-calibrated alternative to classical capture-recapture (CRC) estimators based on two data streams. We show here that this estimator is equivalent to a direct standardization based on "capture", i.e., selection (or not) by the voluntary testing program, made possible by a key parameter identified by design. This equivalence simultaneously allows for novel two-stream CRC-like estimation of general means (e.g., of continuous variables such as antibody levels or biomarkers). For inference, we propose adaptations of a Bayesian credible interval when estimating case counts and of the bootstrap when estimating means of continuous variables. We use simulations to demonstrate significant precision benefits relative to random sampling alone.
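To make the direct-standardization view concrete, here is a minimal sketch under simplifying assumptions of ours: the population is fully enumerated, the set of people captured by the voluntary testing program (and hence the capture proportion, the key design-identified quantity) is known, the stratum mean among the captured is estimated from the voluntary test results, and the stratum mean among the uncaptured is estimated from random-sample members who were not captured. Names and inputs are illustrative rather than the authors' notation, and the inference procedures (credible interval, bootstrap) are omitted.

```python
import numpy as np

def standardized_estimate(n_captured, n_uncaptured, y_captured, y_random_uncaptured):
    """Direct standardization over capture status in an enumerated population.

    n_captured, n_uncaptured : known stratum sizes (captured or not by voluntary testing)
    y_captured               : 0/1 test results from the voluntary stream
    y_random_uncaptured      : 0/1 results for random-sample members not captured
                               by the voluntary stream
    Works identically for general means (e.g., antibody levels) if the y's are continuous.
    """
    N = n_captured + n_uncaptured
    mean_cap = np.mean(y_captured)           # stratum mean among the captured
    mean_unc = np.mean(y_random_uncaptured)  # stratum mean among the uncaptured
    overall = (n_captured * mean_cap + n_uncaptured * mean_unc) / N
    return overall, N * overall              # prevalence (or mean) and implied case count
```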
In a mixed generalized linear model, the objective is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one. We consider the prototypical problem of estimating two statistically independent signals in a mixed generalized linear model with Gaussian covariates. Spectral methods are a popular class of estimators that output the top two eigenvectors of a suitable data-dependent matrix. However, despite their wide applicability, their design is still guided by heuristic considerations, and the number of samples $n$ needed to guarantee recovery is super-linear in the signal dimension $d$. In this paper, we develop exact asymptotics for spectral methods in the challenging proportional regime in which $n, d$ grow large and their ratio converges to a finite constant. By doing so, we are able to optimize the design of the spectral method and combine it with a simple linear estimator, in order to minimize the estimation error. Our characterization exploits a mix of tools from random matrices, free probability and the theory of approximate message passing algorithms. Numerical simulations for mixed linear regression and phase retrieval demonstrate the advantage enabled by our analysis over existing designs of spectral methods.
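As a concrete instance of the "data-dependent matrix" construction, the sketch below implements a generic spectral estimator for mixed linear regression: each sample contributes a rank-one term $T(y_i)\,x_ix_i^{\top}$, and the top two eigenvectors of the resulting matrix approximately span the two-dimensional signal subspace. The quadratic preprocessing $T(y)=y^2$ used here is a common default rather than the optimized choice; tuning this preprocessing (and combining with a linear estimator) is precisely where the paper's asymptotic analysis enters.

```python
import numpy as np

def spectral_subspace(X, y, preprocess=lambda y: y ** 2):
    """Top two eigenvectors of D = (1/n) * sum_i T(y_i) x_i x_i^T."""
    n, _ = X.shape
    D = (X * preprocess(y)[:, None]).T @ X / n
    _, eigvecs = np.linalg.eigh(D)               # eigenvalues in ascending order
    return eigvecs[:, -1], eigvecs[:, -2]        # basis for the estimated signal subspace

# toy mixed linear regression: each response is generated by one of two hidden signals
rng = np.random.default_rng(0)
n, d = 8000, 200
b1, b2 = rng.standard_normal(d), rng.standard_normal(d)
b1, b2 = b1 / np.linalg.norm(b1), b2 / np.linalg.norm(b2)
X = rng.standard_normal((n, d))
labels = rng.integers(0, 2, size=n)
y = np.where(labels == 0, X @ b1, X @ b2) + 0.1 * rng.standard_normal(n)
v1, v2 = spectral_subspace(X, y)
```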
Predicting the future development of an anatomical shape from a single baseline is an important but difficult problem. Research has shown that it should be tackled in curved shape spaces, as (e.g., disease-related) shape changes frequently exhibit nonlinear characteristics. We thus propose a novel prediction method that encodes the whole shape in a Riemannian shape space and employs a simple prediction technique founded on hierarchical statistical modelling of longitudinal training data. It is fully automatic, which makes it stand out from parameter-rich state-of-the-art methods. When applied to predicting the future development of the shape of right hippocampi under Alzheimer's disease, it outperforms deep-learning-supported variants and achieves results on par with the state of the art.
Expander graphs play a central role in graph theory and algorithms. With a number of powerful algorithmic tools developed around them, such as the Cut-Matching game, expander pruning, expander decomposition, and algorithms for decremental All-Pairs Shortest Paths (APSP) in expanders, to name just a few, the use of expanders in the design of graph algorithms has become ubiquitous. Specific applications of interest to us are fast deterministic algorithms for cut problems in static graphs, and algorithms for dynamic distance-based graph problems, such as APSP. Unfortunately, the use of expanders in these settings incurs a number of drawbacks. For example, the best currently known algorithm for decremental APSP in constant-degree expanders can only achieve a $(\log n)^{O(1/\epsilon^2)}$-approximation with $n^{1+O(\epsilon)}$ total update time for any $\epsilon$. All currently known algorithms for the Cut Player in the Cut-Matching game are either randomized, or provide rather weak guarantees. This, in turn, leads to somewhat weak algorithmic guarantees for several central cut problems: for example, the best current almost linear time deterministic algorithm for Sparsest Cut can only achieve approximation factor $(\log n)^{\omega(1)}$. Lastly, when relying on expanders in distance-based problems, such as dynamic APSP, via current methods, it seems inevitable that one has to settle for approximation factors that are at least $\Omega(\log n)$. In this paper we propose the use of well-connected graphs, and introduce a new algorithmic toolkit for such graphs that, in a sense, mirrors the above mentioned algorithmic tools for expanders. One of these new tools is the Distanced Matching game, an analogue of the Cut-Matching game for well-connected graphs. We demonstrate the power of these new tools by obtaining better results for several of the problems mentioned above.
Agent-based models (ABMs) have been widely used to study infectious disease transmission by simulating the behaviors and interactions of autonomous individuals called agents. In an ABM, agent states, for example infected or susceptible, are assigned according to a set of simple rules, and the complex dynamics of disease transmission are described by the collective states of agents over time. Despite their flexibility in real-world modeling, ABMs have received less attention from statisticians because of their intractable likelihood functions, which make it difficult to estimate parameters and to quantify uncertainty around model outputs. To overcome this limitation, we propose to treat the entire system as a hidden Markov model and to develop the ABM for infectious disease transmission within the Bayesian framework. The hidden states in the model are represented by individual agents' states over time. We estimate the hidden states and the parameters associated with the model by applying a particle Markov chain Monte Carlo algorithm. The performance of the approach for parameter recovery and prediction, along with its sensitivity to prior assumptions, is evaluated under various simulation conditions. Finally, we apply the proposed approach to the study of the COVID-19 outbreak on the Diamond Princess cruise ship and examine differences in transmission by key demographic characteristics, while considering different network structures and the limitations of COVID-19 testing on the ship.
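The inferential machinery can be illustrated with a toy particle-marginal Metropolis-Hastings (PMMH) sampler, i.e., a particle Markov chain Monte Carlo scheme. The sketch below uses an aggregate-count susceptible-infected model with a single unknown transmission parameter and a binomial testing model; this is far simpler than an individual-level ABM with network structure, and the population size, recovery probability, reporting probability, prior, and proposal are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(1)
N, gamma, report_prob = 300, 0.1, 0.5          # population, recovery prob., test sensitivity

def particle_loglik(beta, y, n_particles=200):
    """Bootstrap particle filter estimate of log p(y | beta)."""
    S = np.full(n_particles, N - 1)             # susceptible count in each particle
    I = np.ones(n_particles, dtype=int)         # infected count in each particle
    loglik = 0.0
    for y_t in y:
        # propagate the hidden states: stochastic infections and recoveries
        new_inf = rng.binomial(S, 1.0 - np.exp(-beta * I / N))
        new_rec = rng.binomial(I, gamma)
        S, I = S - new_inf, I + new_inf - new_rec
        # weight particles by the observation model y_t ~ Binomial(I_t, report_prob)
        logw = binom.logpmf(y_t, I, report_prob)
        m = logw.max()
        if not np.isfinite(m):
            return -np.inf                       # no particle is consistent with y_t
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())
        idx = rng.choice(n_particles, n_particles, p=w / w.sum())
        S, I = S[idx], I[idx]                    # multinomial resampling
    return loglik

def pmmh(y, n_iter=2000, step=0.3):
    """Random-walk PMMH on log(beta) with an assumed Exponential(1) prior."""
    beta, ll = 0.5, particle_loglik(0.5, y)
    samples = []
    for _ in range(n_iter):
        prop = beta * np.exp(step * rng.standard_normal())
        ll_prop = particle_loglik(prop, y)
        # acceptance ratio includes the Jacobian of the log-scale random walk
        log_acc = (ll_prop - prop + np.log(prop)) - (ll - beta + np.log(beta))
        if np.log(rng.uniform()) < log_acc:
            beta, ll = prop, ll_prop
        samples.append(beta)
    return np.array(samples)
```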
This paper focuses on the problem of testing the null hypothesis that the regression functions of several populations are equal under a general nonparametric homoscedastic regression model. It is well known that linear kernel regression estimators are sensitive to atypical responses. These distorted estimates influence the test statistics constructed from them, so the conclusions obtained when testing equality of several regression functions may also be affected. In recent years, testing procedures based on empirical characteristic functions have shown good practical properties. For that reason, to provide more reliable inferences, we construct a test statistic that combines characteristic functions with residuals obtained from a robust smoother under the null hypothesis. The asymptotic distribution of the test statistic is studied under the null hypothesis and under root-$n$ contiguous alternatives. A Monte Carlo study is performed to compare the finite-sample behaviour of the proposed test with that of the classical one obtained using local averages. The reported numerical experiments show the advantage of the proposed methodology over the one based on Nadaraya--Watson estimators for finite samples. An illustration on a real data set is also provided and allows us to investigate the sensitivity of the $p$-value to the bandwidth selection.
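The following sketch shows the general flavour of such a construction, not the authors' exact statistic: residuals are obtained from a single robust smoother fitted under the null hypothesis that all populations share one regression function (robust LOWESS stands in here for the robust kernel smoother), and discrepancies between the empirical characteristic functions (ECFs) of the population-wise residuals are aggregated with a Gaussian weight, for which the weighted integral has a closed form.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def ecf_distance(r1, r2):
    """Closed form of int |ecf_{r1}(t) - ecf_{r2}(t)|^2 phi(t) dt for a standard
    normal weight phi, using int cos(t u) phi(t) dt = exp(-u^2 / 2)."""
    K = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)
    return K(r1, r1).mean() + K(r2, r2).mean() - 2.0 * K(r1, r2).mean()

def ecf_test_statistic(samples, frac=0.3):
    """samples: list of (x_j, y_j) arrays, one pair per population."""
    x = np.concatenate([xj for xj, _ in samples])
    y = np.concatenate([yj for _, yj in samples])
    # robust smoother fitted under H0 (all populations pooled); the robustifying
    # iterations downweight atypical responses
    fitted = lowess(y, x, frac=frac, it=3, return_sorted=False)
    resid, residuals, start = y - fitted, [], 0
    for _, yj in samples:
        residuals.append(resid[start:start + len(yj)])
        start += len(yj)
    # aggregate pairwise ECF discrepancies between population-wise residuals
    k = len(residuals)
    return sum(ecf_distance(residuals[i], residuals[j])
               for i in range(k) for j in range(i + 1, k))
```

In practice, critical values would come from the asymptotic distribution derived in the paper or from a resampling scheme rather than from this raw statistic.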
Conformal prediction constructs a confidence set for an unobserved response of a feature vector based on previous identically distributed and exchangeable observations of responses and features. It has a coverage guarantee at any nominal level without additional assumptions on the data distribution. Unfortunately, its computation requires a refitting procedure for every replacement candidate of the target response; in regression settings, this corresponds to an infinite number of model fits. Apart from relatively simple estimators whose output can be written as a piecewise linear function of the response, efficiently computing such sets is difficult and is still considered an open problem. We exploit the fact that, \emph{often}, conformal prediction sets are intervals whose boundaries can be efficiently approximated by classical root-finding algorithms. We investigate how this approach can overcome many limitations of formerly used strategies, and we discuss its complexity and drawbacks.
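A minimal sketch of the root-finding idea, under assumptions of this illustration rather than of the paper: absolute-residual conformity scores are used, the conformal set is an interval containing the model's point prediction, and bracketing bounds are supplied by the user. The full conformal p-value of a candidate response is evaluated by refitting on the augmented data, and each interval endpoint is approximated by bisection on the level-$\alpha$ crossing.

```python
import numpy as np

def conformal_pvalue(X, y, x_new, z, fit_predict):
    """Full conformal p-value of candidate response z at x_new; fit_predict(X, y, Xq)
    refits an arbitrary regression estimator and returns predictions at Xq."""
    X_aug, y_aug = np.vstack([X, x_new]), np.append(y, z)
    scores = np.abs(y_aug - fit_predict(X_aug, y_aug, X_aug))  # absolute residuals
    return np.mean(scores >= scores[-1])

def conformal_interval(X, y, x_new, fit_predict, alpha=0.1, lo=-1e3, hi=1e3, tol=1e-3):
    """Bisection approximation of the set {z : p-value(z) > alpha}."""
    center = fit_predict(X, y, np.atleast_2d(x_new))[0]        # assumed to lie inside
    inside = lambda z: conformal_pvalue(X, y, x_new, z, fit_predict) > alpha

    def boundary(a, b):                                        # a inside, b outside the set
        while abs(b - a) > tol:
            mid = 0.5 * (a + b)
            a, b = (mid, b) if inside(mid) else (a, mid)
        return 0.5 * (a + b)

    return boundary(center, lo), boundary(center, hi)

# usage with any off-the-shelf estimator, e.g. ridge regression
from sklearn.linear_model import Ridge
fit_predict = lambda X, y, Xq: Ridge(alpha=1.0).fit(X, y).predict(Xq)
```

Each endpoint then costs on the order of $\log_2((\mathrm{hi}-\mathrm{lo})/\mathrm{tol})$ model refits rather than one refit per candidate response.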
We study the problem of designing worst-case to average-case reductions for quantum algorithms. For all linear problems, we provide an explicit and efficient transformation of quantum algorithms that are only correct on a small (even sub-constant) fraction of their inputs into ones that are correct on all inputs. This stands in contrast to the classical setting, where such results are only known for a small number of specific problems or restricted computational models. En route, we obtain a tight $\Omega(n^2)$ lower bound on the average-case quantum query complexity of the Matrix-Vector Multiplication problem. Our techniques strengthen and generalise the recently introduced additive combinatorics framework for classical worst-case to average-case reductions (STOC 2022) to the quantum setting. We rely on quantum singular value transformations to construct quantum algorithms for linear verification in superposition and learning Bogolyubov subspaces from noisy quantum oracles. We use these tools to prove a quantum local correction lemma, which lies at the heart of our reductions, based on a noise-robust probabilistic generalisation of Bogolyubov's lemma from additive combinatorics.
In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints such as computation, storage, and network communication. Scalability can be increased by implementing the system as a distributed task-allocation system, sharing tasks across many agents; however, this increases the resource cost of communication and synchronisation, which is itself difficult to scale. In this paper we present four algorithms to address these problems. Their combination enables each agent to improve its task-allocation strategy through reinforcement learning, while adjusting how much it explores the system in response to how optimal it believes its current strategy to be, given its past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource-usage limits, restricting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, which then carry out those subtasks. We also simulate real-life system effects such as network instability. Our solution is shown to solve the task-allocation problem to within 6.7% of the theoretical optimum for the system configurations considered. It provides 5x better performance recovery than approaches with no knowledge retention when system connectivity is impacted, and is tested on systems of up to 100 agents with less than a 9% impact on the algorithms' performance.
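As a minimal illustration of the learning-plus-adaptive-exploration idea (not a reproduction of the four algorithms; the class, thresholds, and reward model below are illustrative assumptions), the sketch has an agent learn, per subtask type, which neighbour to allocate to, and it raises or lowers its exploration rate depending on how far observed rewards deviate from its current estimates.

```python
import random
from collections import defaultdict

class AllocatorAgent:
    """Toy epsilon-greedy allocator with confidence-driven exploration."""

    def __init__(self, neighbours, lr=0.1, eps=0.5, eps_min=0.05, surprise=0.2):
        self.neighbours = list(neighbours)
        self.lr, self.eps, self.eps_min, self.surprise = lr, eps, eps_min, surprise
        self.value = defaultdict(float)        # (subtask_type, neighbour) -> expected reward

    def allocate(self, subtask_type):
        """Pick a neighbour to delegate this subtask to."""
        if random.random() < self.eps:
            return random.choice(self.neighbours)              # explore
        return max(self.neighbours, key=lambda n: self.value[(subtask_type, n)])

    def update(self, subtask_type, neighbour, reward):
        """Learn from the observed reward and adapt the exploration rate."""
        key = (subtask_type, neighbour)
        error = reward - self.value[key]
        self.value[key] += self.lr * error
        if abs(error) > self.surprise:          # estimates look off: explore more
            self.eps = min(1.0, self.eps + 0.05)
        else:                                   # strategy looks near-optimal: explore less
            self.eps = max(self.eps_min, self.eps * 0.99)
```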
When is heterogeneity in the composition of an autonomous robotic team beneficial and when is it detrimental? We investigate and answer this question in the context of a minimally viable model that examines the role of heterogeneous speeds in perimeter defense problems, where defenders share a total allocated speed budget. We consider two distinct problem settings and develop strategies based on dynamic programming and on local interaction rules. We present a theoretical analysis of both approaches and our results are extensively validated using simulations. Interestingly, our results demonstrate that the viability of heterogeneous teams depends on the amount of information available to the defenders. Moreover, our results suggest a universality property: across a wide range of problem parameters the optimal ratio of the speeds of the defenders remains nearly constant.