In this paper, we present efficient distributed algorithms for classical symmetry breaking problems, maximal independent sets (MIS) and ruling sets, in power graphs. We work in the standard CONGEST model of distributed message passing, where the communication network is abstracted as a graph $G$. Typically, the problem instance in CONGEST is identical to the communication network $G$, that is, we perform the symmetry breaking in $G$. In this work, we consider a setting where the problem instance corresponds to a power graph $G^k$, where each node of the communication network $G$ is connected to all of its $k$-hop neighbors. Our main contribution is a deterministic polylogarithmic time algorithm for computing $k$-ruling sets of $G^k$, which (for $k>1$) improves exponentially on the current state-of-the-art runtimes. The main technical ingredient for this result is a deterministic sparsification procedure which may be of independent interest. On top of being a natural family of problems, ruling sets (in power graphs) are well-motivated through their applications in the powerful shattering framework [BEPS JACM'16, Ghaffari SODA'19] (and others). We present randomized algorithms for computing maximal independent sets and ruling sets of $G^k$ in essentially the same time as they can be computed in $G$. We also revisit the shattering algorithm for MIS [BEPS JACM'16] and present different approaches for the post-shattering phase. Our solutions are algorithmically and analytically simpler (also in the LOCAL model) than existing solutions and obtain the same runtime as [Ghaffari SODA'16].
Sampling from Gibbs distributions $p(x) \propto \exp(-V(x)/\varepsilon)$ and computing their log-partition function are fundamental tasks in statistics, machine learning, and statistical physics. However, while efficient algorithms are known for convex potentials $V$, the situation is much more difficult in the non-convex case, where algorithms necessarily suffer from the curse of dimensionality in the worst case. For optimization, which can be seen as a low-temperature limit of sampling, it is known that smooth functions $V$ allow faster convergence rates. Specifically, for $m$-times differentiable functions in $d$ dimensions, the optimal rate for algorithms with $n$ function evaluations is known to be $O(n^{-m/d})$, where the constant can potentially depend on $m, d$ and the function to be optimized. Hence, the curse of dimensionality can be alleviated for smooth functions at least in terms of the convergence rate. Recently, it has been shown that similarly fast rates can also be achieved with polynomial runtime $O(n^{3.5})$, where the exponent $3.5$ is independent of $m$ or $d$. Hence, it is natural to ask whether similar rates for sampling and log-partition computation are possible, and whether they can be realized in polynomial time with an exponent independent of $m$ and $d$. We show that the optimal rates for sampling and log-partition computation are sometimes equal and sometimes faster than for optimization. We then analyze various polynomial-time sampling algorithms, including an extension of a recent promising optimization approach, and find that they sometimes exhibit interesting behavior but no near-optimal rates. Our results also give further insights on the relation between sampling, log-partition, and optimization problems.
We develop a distribution regression model under endogenous sample selection. This model is a semi-parametric generalization of the Heckman selection model. It accommodates much richer effects of the covariates on outcome distribution and patterns of heterogeneity in the selection process, and allows for drastic departures from the Gaussian error structure, while maintaining the same level tractability as the classical model. The model applies to continuous, discrete and mixed outcomes. We provide identification, estimation, and inference methods, and apply them to obtain wage decomposition for the UK. Here we decompose the difference between the male and female wage distributions into composition, wage structure, selection structure, and selection sorting effects. After controlling for endogenous employment selection, we still find substantial gender wage gap -- ranging from 21\% to 40\% throughout the (latent) offered wage distribution that is not explained by composition. We also uncover positive sorting for single men and negative sorting for married women that accounts for a substantive fraction of the gender wage gap at the top of the distribution.
We study the balanced $k$-way hypergraph partitioning problem, with a special focus on its practical applications to manycore scheduling. Given a hypergraph on $n$ nodes, our goal is to partition the node set into $k$ parts of size at most $(1+\epsilon)\cdot \frac{n}{k}$ each, while minimizing the cost of the partitioning, defined as the number of cut hyperedges, possibly also weighted by the number of partitions they intersect. We show that this problem cannot be approximated to within a $n^{1/\text{poly} \log\log n}$ factor of the optimal solution in polynomial time if the Exponential Time Hypothesis holds, even for hypergraphs of maximal degree 2. We also study the hardness of the partitioning problem from a parameterized complexity perspective, and in the more general case when we have multiple balance constraints. Furthermore, we consider two extensions of the partitioning problem that are motivated from practical considerations. Firstly, we introduce the concept of hyperDAGs to model precedence-constrained computations as hypergraphs, and we analyze the adaptation of the balanced partitioning problem to this case. Secondly, we study the hierarchical partitioning problem to model hierarchical NUMA (non-uniform memory access) effects in modern computer architectures, and we show that ignoring this hierarchical aspect of the communication cost can yield significantly weaker solutions.
Machine learning can benefit from causal discovery for interpretation and from causal inference for generalization. In this line of research, a few invariant learning algorithms for out-of-distribution (OOD) generalization have been proposed by using multiple training environments to find invariant relationships. Some of them are focused on causal discovery as Invariant Causal Prediction (ICP), which finds causal parents of a variable of interest, and some directly provide a causal optimal predictor that generalizes well in OOD environments as Invariant Risk Minimization (IRM). This group of algorithms works under the assumption of multiple environments that represent different interventions in the causal inference context. Those environments are not normally available when working with observational data and real-world applications. Here we propose a method to generate them in an efficient way. We assess the performance of this unsupervised learning problem by implementing ICP on simulated data. We also show how to apply ICP efficiently integrated with our method for causal discovery. Finally, we proposed an improved version of our method in combination with ICP for datasets with multiple covariates where ICP and other causal discovery methods normally degrade in performance.
In the average-case $k$-SUM problem, given $r$ integers chosen uniformly at random from $\{0,\ldots,M-1\}$, the objective is to find a set of $k$ numbers that sum to $0$ modulo $M$ (this set is called a solution). In the related $k$-XOR problem, given $k$ uniformly random Boolean vectors of length $\log{M}$, the objective is to find a set of $k$ of them whose bitwise-XOR is the all-zero vector. Both of these problems have widespread applications in the study of fine-grained complexity and cryptanalysis. The feasibility and complexity of these problems depends on the relative values of $k$, $r$, and $M$. The dense regime of $M \leq r^k$, where solutions exist with high probability, is quite well-understood and we have several non-trivial algorithms and hardness conjectures here. Much less is known about the sparse regime of $M\gg r^k$, where solutions are unlikely to exist. The best answers we have for many fundamental questions here are limited to whatever carries over from the dense or worst-case settings. We study the planted $k$-SUM and $k$-XOR problems in the sparse regime. In these problems, a random solution is planted in a randomly generated instance and has to be recovered. As $M$ increases past $r^k$, these planted solutions tend to be the only solutions with increasing probability, potentially becoming easier to find. We show several results about the complexity and applications of these problems.
Distribution testing is a fundamental statistical task with many applications, but we are interested in a variety of problems where systematic mislabelings of the sample prevent us from applying the existing theory. To apply distribution testing to these problems, we introduce distribution testing under the parity trace, where the algorithm receives an ordered sample $S$ that reveals only the least significant bit of each element. This abstraction reveals connections between the following three problems of interest, allowing new upper and lower bounds: 1. In distribution testing with a confused collector, the collector of the sample may be incapable of distinguishing between nearby elements of a domain (e.g. a machine learning classifier). We prove bounds for distribution testing with a confused collector on domains structured as a cycle or a path. 2. Recent work on the fundamental testing vs. learning question established tight lower bounds on distribution-free sample-based property testing by reduction from distribution testing, but the tightness is limited to symmetric properties. The parity trace allows a broader family of equivalences to non-symmetric properties, while recovering and strengthening many of the previous results with a different technique. 3. We give the first results for property testing in the well-studied trace reconstruction model, where the goal is to test whether an unknown string $x$ satisfies some property or is far from satisfying that property, given only independent random traces of $x$. Our main technical result is a tight bound of $\widetilde \Theta\left((n/\epsilon)^{4/5} + \sqrt n/\epsilon^2\right)$ for testing uniformity of distributions over $[n]$ under the parity trace, leading also to results for the problems above.
Message Passing Neural Networks (MPNNs) are a widely used class of Graph Neural Networks (GNNs). The limited representational power of MPNNs inspires the study of provably powerful GNN architectures. However, knowing one model is more powerful than another gives little insight about what functions they can or cannot express. It is still unclear whether these models are able to approximate specific functions such as counting certain graph substructures, which is essential for applications in biology, chemistry and social network analysis. Motivated by this, we propose to study the counting power of Subgraph MPNNs, a recent and popular class of powerful GNN models that extract rooted subgraphs for each node, assign the root node a unique identifier and encode the root node's representation within its rooted subgraph. Specifically, we prove that Subgraph MPNNs fail to count more-than-4-cycles at node level, implying that node representations cannot correctly encode the surrounding substructures like ring systems with more than four atoms. To overcome this limitation, we propose I$^2$-GNNs to extend Subgraph MPNNs by assigning different identifiers for the root node and its neighbors in each subgraph. I$^2$-GNNs' discriminative power is shown to be strictly stronger than Subgraph MPNNs and partially stronger than the 3-WL test. More importantly, I$^2$-GNNs are proven capable of counting all 3, 4, 5 and 6-cycles, covering common substructures like benzene rings in organic chemistry, while still keeping linear complexity. To the best of our knowledge, it is the first linear-time GNN model that can count 6-cycles with theoretical guarantees. We validate its counting power in cycle counting tasks and demonstrate its competitive performance in molecular prediction benchmarks.
In this work, for a given oriented graph $D$, we study its interval and hull numbers, denoted by ${in}(D)$ and ${hn}(D)$, respectively, in the geodetic, ${P_3}$ and ${P_3^*}$ convexities. This last one, we believe to be formally defined and first studied in this paper, although its undirected version is well-known in the literature. Concerning bounds, for a strongly oriented graph $D$, we prove that ${hn_g}(D)\leq m(D)-n(D)+2$ and that there is a strongly oriented graph such that ${hn_g}(D) = m(D)-n(D)$. We also determine exact values for the hull numbers in these three convexities for tournaments, which imply polynomial-time algorithms to compute them. These results allows us to deduce polynomial-time algorithms to compute ${hn_{P_3}}(D)$ when the underlying graph of $D$ is split or cobipartite. Moreover, we provide a meta-theorem by proving that if deciding whether ${in_g}(D)\leq k$ or ${hn_g}(D)\leq k$ is NP-hard or W[i]-hard parameterized by $k$, for some $i\in\mathbb{Z_+^*}$, then the same holds even if the underlying graph of $D$ is bipartite. Next, we prove that deciding whether ${hn_{P_3}}(D)\leq k$ or ${hn_{P_3^*}}(D)\leq k$ is W[2]-hard parameterized by $k$, even if the underlying graph of $D$ is bipartite; that deciding whether ${in_{P_3}}(D)\leq k$ or ${in_{P_3^*}}(D)\leq k$ is NP-complete, even if $D$ has no directed cycles and the underlying graph of $D$ is a chordal bipartite graph; and that deciding whether ${in_{P_3}}(D)\leq k$ or ${in_{P_3^*}}(D)\leq k$ is W[2]-hard parameterized by $k$, even if the underlying graph of $D$ is split. We also argue that the interval and hull numbers in the oriented $P_3$ and $P_3^*$ convexities can be computed in polynomial time for graphs of bounded tree-width by using Courcelle's theorem.
Classic algorithms and machine learning systems like neural networks are both abundant in everyday life. While classic computer science algorithms are suitable for precise execution of exactly defined tasks such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts leading to more robust, better performing, more interpretable, more computationally efficient, and more data efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm. When integrating an algorithm into a neural architecture, it is important that the algorithm is differentiable such that the architecture can be trained end-to-end and gradients can be propagated back through the algorithm in a meaningful way. To make algorithms differentiable, this thesis proposes a general method for continuously relaxing algorithms by perturbing variables and approximating the expectation value in closed form, i.e., without sampling. In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable renderers, and differentiable logic gate networks. Finally, this thesis presents alternative training strategies for learning with algorithms.
Reasoning with knowledge expressed in natural language and Knowledge Bases (KBs) is a major challenge for Artificial Intelligence, with applications in machine reading, dialogue, and question answering. General neural architectures that jointly learn representations and transformations of text are very data-inefficient, and it is hard to analyse their reasoning process. These issues are addressed by end-to-end differentiable reasoning systems such as Neural Theorem Provers (NTPs), although they can only be used with small-scale symbolic KBs. In this paper we first propose Greedy NTPs (GNTPs), an extension to NTPs addressing their complexity and scalability limitations, thus making them applicable to real-world datasets. This result is achieved by dynamically constructing the computation graph of NTPs and including only the most promising proof paths during inference, thus obtaining orders of magnitude more efficient models. Then, we propose a novel approach for jointly reasoning over KBs and textual mentions, by embedding logic facts and natural language sentences in a shared embedding space. We show that GNTPs perform on par with NTPs at a fraction of their cost while achieving competitive link prediction results on large datasets, providing explanations for predictions, and inducing interpretable models. Source code, datasets, and supplementary material are available online at //github.com/uclnlp/gntp.