Let $G$ be a graph with $n$ vertices and $m$ edges. One of several hierarchies towards the stability number of $G$ is the exact subgraph hierarchy (ESH). On the first level it computes the Lov\'{a}sz theta function $\vartheta(G)$ as a semidefinite program (SDP) with a matrix variable of order $n+1$ and $n+m+1$ constraints. On the $k$-th level it adds all exact subgraph constraints (ESCs) for subgraphs of order $k$ to the SDP. An ESC ensures that the submatrix of the matrix variable corresponding to the subgraph lies in the correct polytope. Including only some of the ESCs in the SDP makes the ESH computationally exploitable. In this paper we introduce a variant of the ESH that computes $\vartheta(G)$ through an SDP with a matrix variable of order $n$ and only $m+1$ constraints. We show that it makes sense to include ESCs in this smaller SDP, and we introduce the compressed ESH (CESH) analogously to the ESH. Computationally the CESH seems favorable, as the SDP is smaller. However, we prove that the bounds based on the ESH are always at least as good as those based on the CESH, and in computational experiments they are sometimes significantly better. We also introduce scaled ESCs (SESCs), which are a more natural way to include exactness constraints in the smaller SDP, and we prove that including an SESC is equivalent to including an ESC for every subgraph.
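For context, a standard formulation of the Lov\'{a}sz theta function with a matrix variable of order $n$ and $m+1$ constraints (one per edge plus a trace normalization) is

$$\vartheta(G) = \max\left\{ \langle J, X \rangle \;:\; \mathrm{tr}(X) = 1,\ X_{ij} = 0 \ \text{for all } \{i,j\} \in E(G),\ X \succeq 0 \right\},$$

where $J$ is the all-ones matrix. The variant described above is of this size, though its exact formulation may differ.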
We analyze the complexity of learning directed acyclic graphical models from observational data in general settings without specific distributional assumptions. Our approach is information-theoretic and uses a local Markov boundary search procedure in order to recursively construct ancestral sets in the underlying graphical model. Perhaps surprisingly, we show that for certain graph ensembles, a simple forward greedy search algorithm (i.e. without a backward pruning phase) suffices to learn the Markov boundary of each node. This substantially improves the sample complexity, which we show is at most polynomial in the number of nodes. This is then applied to learn the entire graph under a novel identifiability condition that generalizes existing conditions from the literature. As a matter of independent interest, we establish finite-sample guarantees for the problem of recovering Markov boundaries from data. Moreover, we apply our results to the special case of polytrees, for which the assumptions simplify, and provide explicit conditions under which polytrees are identifiable and learnable in polynomial time. We further illustrate the performance of the algorithm, which is easy to implement, in a simulation study. Our approach is general, works for discrete or continuous distributions without distributional assumptions, and as such sheds light on the minimal assumptions required to efficiently learn the structure of directed graphical models from data.
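As an illustration of the forward greedy idea (not necessarily the authors' exact procedure), the following Python sketch adds, at each step, the candidate variable carrying the most extra information about the target; here `cmi` is a hypothetical black-box estimator of conditional mutual information.

def forward_markov_boundary(data, target, candidates, cmi, eps):
    """Greedy forward search for the Markov boundary of `target`.
    `cmi(data, v, target, S)` estimates I(X_v ; X_target | X_S)."""
    boundary = []
    remaining = set(candidates) - {target}
    while remaining:
        scores = {v: cmi(data, v, target, boundary) for v in remaining}
        best = max(scores, key=scores.get)
        if scores[best] <= eps:
            break  # no remaining variable adds information about the target
        boundary.append(best)
        remaining.remove(best)
    return boundary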
We present semantic correctness proofs of automatic differentiation (AD). We consider a forward-mode AD method on a higher-order language with algebraic data types, and we characterise it as the unique structure-preserving macro given a choice of derivatives for basic operations. We describe a rich semantics for differentiable programming, based on diffeological spaces. We show that it interprets our language, and we phrase what it means for the AD method to be correct with respect to this semantics. We show that our characterisation of AD gives rise to an elegant semantic proof of its correctness based on a gluing construction on diffeological spaces. We explain how this is, in essence, a logical relations argument. Throughout, we show how the analysis extends to AD methods for computing higher-order derivatives using a Taylor approximation.
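The flavor of forward-mode AD being analysed can be illustrated with dual numbers; the toy Python sketch below (the paper itself works with a typed higher-order language, not Python) pairs each basic operation with a chosen derivative and propagates tangents alongside values.

import math

class Dual:
    """A value paired with its tangent (directional derivative)."""
    def __init__(self, primal, tangent=0.0):
        self.primal, self.tangent = primal, tangent
    def __add__(self, other):
        return Dual(self.primal + other.primal, self.tangent + other.tangent)
    def __mul__(self, other):  # product rule gives the tangent component
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

def sin(x):  # a "basic operation" paired with its chosen derivative
    return Dual(math.sin(x.primal), math.cos(x.primal) * x.tangent)

# d/dx [x * sin(x)] at x = 2, obtained by seeding the tangent with 1:
x = Dual(2.0, 1.0)
print((x * sin(x)).tangent)  # sin(2) + 2*cos(2)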
This paper studies time-varying bearing-based tracking of leader-follower formations. The desired constraints between agents are specified by bearing vectors, and several leaders move with a bounded reference velocity. Each follower can measure the relative positions of its neighbors and its own velocity, and can receive information from its neighbors. Under the assumptions that the desired formation is infinitesimally bearing rigid and that the local reference frames of the followers are aligned with each other, two control laws based on the sliding-mode control approach are presented. Stability analyses are given based on Lyapunov stability theory and are supported by numerical simulations.
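For intuition, a representative bearing-based law from the rigidity literature (not necessarily either of the sliding-mode laws above) steers each follower along the disagreement between measured and desired bearings:

$$\dot{p}_i = -k \sum_{j \in \mathcal{N}_i} P_{g_{ij}}\, g^*_{ij}, \qquad P_{g} := I_d - g g^{\top}, \qquad g_{ij} := \frac{p_j - p_i}{\lVert p_j - p_i \rVert},$$

where $g^*_{ij}$ is the desired bearing and $P_{g_{ij}}$ projects onto the plane orthogonal to the measured bearing $g_{ij}$, so the update vanishes exactly when the measured and desired bearings agree.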
We consider Upper Domination, the problem of finding a minimal dominating set of maximum cardinality. Very few exact algorithms have been described for solving Upper Domination. In particular, no binary programming formulations for Upper Domination have been described in the literature, although such formulations have proved quite successful for other kinds of domination problems. We introduce two such binary programming formulations and compare their performance on various kinds of graphs. We demonstrate that the first performs better in most cases, but that the second performs better on very sparse graphs. Also included is a short proof that the upper domination number of any generalized Petersen graph $P(n,k)$ is equal to $n$.
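One natural binary programming formulation of this flavor (the paper's two formulations may differ) encodes minimality through private neighbors: $x_v$ indicates $v \in S$, and $y_{vw}$ indicates that $w$ is a private neighbor of $v$, i.e. $N[w] \cap S = \{v\}$:

$$\max \sum_{v \in V} x_v \quad \text{s.t.} \quad \sum_{u \in N[v]} x_u \ge 1, \qquad x_v \le \sum_{w \in N[v]} y_{vw}, \qquad y_{vw} \le x_v,$$
$$\sum_{u \in N[w]} x_u \le 1 + |N[w]|\,(1 - y_{vw}), \qquad x_v,\, y_{vw} \in \{0,1\},$$

where the first constraint enforces domination and the remaining constraints force every chosen vertex to privately dominate some $w \in N[v]$, which characterizes minimality of the dominating set.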
Consider the computations at a node in a message passing algorithm. Assume that the node has incoming and outgoing messages $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$, respectively. In this paper, we investigate a class of structures that the node can adopt for computing $\mathbf{y}$ from $\mathbf{x}$, where each $y_j,\ j = 1, 2, \ldots, n$, is computed via a binary tree whose leaves are the entries of $\mathbf{x}$ excluding $x_j$. We make three main contributions regarding this class of structures. First, we prove that the minimum complexity of such a structure is $3n - 6$, and if a structure has this complexity, its minimum latency is $\delta + \lceil \log(n-2^{\delta}) \rceil$ with $\delta = \lfloor \log(n/2) \rfloor$, where all logarithms are base two. Second, we prove that the minimum latency of such a structure is $\lceil \log(n-1) \rceil$, and if a structure has this latency, its minimum complexity is $n \log(n-1)$ when $n-1$ is a power of two. Third, given a pair $(n, \tau)$ with $\tau \geq \lceil \log(n-1) \rceil$, we propose a construction of a structure that we conjecture to have the minimum complexity among structures with latency at most $\tau$. Our construction method runs in $O(n^3 \log^2(n))$ time, and the obtained structure has complexity at most (and generally much smaller than) $n \lceil \log(n) \rceil - 2$.
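A concrete member of this class that attains the complexity $3n-6$ is the classic prefix-suffix scheme; a Python sketch, assuming the node combines messages with an associative binary operation `op`:

def leave_one_out(xs, op):
    """Compute y_j = x_1 ∘ ... ∘ x_{j-1} ∘ x_{j+1} ∘ ... ∘ x_n for all j,
    using 3n - 6 applications of the associative operation `op` (n >= 3)."""
    n = len(xs)
    prefix = [xs[0]]               # prefix[i] = x_0 ∘ ... ∘ x_i
    for i in range(1, n - 1):      # n - 2 operations
        prefix.append(op(prefix[-1], xs[i]))
    suffix = [xs[-1]]              # built from the right, then reversed
    for i in range(n - 2, 0, -1):  # n - 2 operations
        suffix.append(op(xs[i], suffix[-1]))
    suffix.reverse()               # suffix[i] = x_{i+1} ∘ ... ∘ x_{n-1}
    ys = [suffix[0]]               # y_0 needs no extra operation
    for j in range(1, n - 1):      # n - 2 operations
        ys.append(op(prefix[j - 1], suffix[j]))
    return ys + [prefix[-1]]       # y_{n-1} needs no extra operation

print(leave_one_out([1, 2, 3, 4], lambda a, b: a * b))  # [24, 12, 8, 6]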
A directed acyclic graph $G=(V,E)$ is said to be $(e,d)$-depth-robust if for every subset $S \subseteq V$ of $|S| \leq e$ nodes the graph $G-S$ still contains a directed path of length $d$. If the graph is $(e,d)$-depth-robust for all $e,d$ with $e+d \leq (1-\epsilon)|V|$, then the graph is said to be $\epsilon$-extreme depth-robust. In cryptography, (extremely) depth-robust graphs with low indegree have found numerous applications, including the design of side-channel-resistant Memory-Hard Functions, Proofs of Space and Replication, and Computationally Relaxed Locally Correctable Codes. In these applications, it is desirable that the graphs be locally navigable, i.e., that there be an efficient algorithm $\mathsf{GetParents}$ running in time $\mathrm{polylog} |V|$ which takes as input a node $v \in V$ and returns the set of $v$'s parents. We give the first explicit construction of locally navigable $\epsilon$-extreme depth-robust graphs with indegree $O(\log |V|)$. Previous constructions of $\epsilon$-extreme depth-robust graphs either had indegree $\tilde{\omega}(\log^2 |V|)$ or were not explicit.
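The depth-robustness condition itself is easy to state computationally: for a fixed removed set $S$, the remaining depth is a longest-path computation on the DAG. A minimal Python sketch (checking a single $S$, not certifying robustness over all $S$):

def remaining_depth(n, edges, removed):
    """`edges` are pairs (u, v) with u < v (topological order assumed);
    returns the number of edges on the longest path avoiding `removed`."""
    depth = [0] * n  # depth[v] = longest path ending at v
    for u, v in sorted(edges, key=lambda e: e[1]):
        if u in removed or v in removed:
            continue
        depth[v] = max(depth[v], depth[u] + 1)
    return max(depth)

# The path 0 -> 1 -> 2 -> 3 with node 1 removed leaves a path of length 1:
print(remaining_depth(4, [(0, 1), (1, 2), (2, 3)], {1}))  # 1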
A recent literature considers causal inference using noisy proxies for unobserved confounding factors. The proxies are divided into two sets that are independent conditional on the confounders: one set of `negative control treatments' and one of `negative control outcomes'. Existing work applies to low-dimensional settings with a fixed number of proxies and confounders. In this work we consider linear models with many proxy controls and possibly many confounders. A key insight is that if each group of proxies is strictly larger than the number of confounding factors, then a matrix of nuisance parameters has low-rank structure and a vector of nuisance parameters has sparse structure. We exploit the rank restriction and sparsity to reduce the number of free parameters to be estimated. The number of unobserved confounders is not known a priori, but we show that it is identified, and we apply penalization methods to adapt to this quantity. We provide an estimator with a closed form as well as a doubly robust estimator that must be evaluated using numerical methods. We give conditions under which our doubly robust estimator is uniformly root-$n$ consistent and asymptotically centered normal, and under which our suggested confidence intervals have asymptotically correct coverage. We provide simulation evidence that our methods achieve better performance than existing approaches in high dimensions, particularly when the number of proxies is substantially larger than the number of confounders.
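As a small illustration of the key insight (a sketch under the stated rank conditions, not the paper's penalized estimator): when each proxy group is strictly larger than the number of confounders, the cross-covariance between the two groups has rank equal to that number, so it can be estimated by singular-value thresholding.

import numpy as np

def estimate_num_confounders(Z, W, threshold):
    """Z: (n, p) negative control treatments; W: (n, q) negative control
    outcomes; returns the number of singular values of the empirical
    cross-covariance above `threshold` (a hypothetical tuning parameter)."""
    Zc = Z - Z.mean(axis=0)
    Wc = W - W.mean(axis=0)
    cross_cov = Zc.T @ Wc / Z.shape[0]
    svals = np.linalg.svd(cross_cov, compute_uv=False)
    return int((svals > threshold).sum())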
We propose a learning framework for graph kernels which is theoretically grounded in regularized optimal transport. The framework provides a novel optimal transport distance metric, the Regularized Wasserstein (RW) discrepancy, which preserves both the features and the structure of graphs via Wasserstein distances on features and their local variations, local barycenters, and global connectivity. Two strongly convex regularization terms are introduced to improve the learning ability. One relaxes the optimal alignment between graphs to a cluster-to-cluster mapping between their locally connected vertices, thereby preserving the local clustering structure of graphs. The other takes node degree distributions into account in order to better preserve the global structure of graphs. We also design an efficient algorithm that enables fast approximate solution of the optimization problem. Theoretically, our framework is robust and guarantees the convergence and numerical stability of the optimization. We empirically validate our method on 12 datasets against 16 state-of-the-art baselines. The experimental results show that our method consistently outperforms all state-of-the-art methods on all benchmark datasets, for both graphs with discrete attributes and graphs with continuous attributes.
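The template such regularized-transport discrepancies build on is the entropic Sinkhorn iteration; the Python sketch below shows only this generic template, not the paper's cluster- and degree-based regularizers.

import numpy as np

def sinkhorn(cost, mu, nu, reg, n_iter=200):
    """cost: (n, m) ground-cost matrix between node features of two graphs;
    mu, nu: node weight vectors summing to 1; reg: regularization strength.
    Returns the transport cost of the entropy-regularized plan."""
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iter):          # alternating marginal scaling
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return float((plan * cost).sum())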
The planted densest subgraph detection problem refers to the task of testing whether in a given (random) graph there is a subgraph that is unusually dense. Specifically, we observe an undirected and unweighted graph on $n$ nodes. Under the null hypothesis, the graph is a realization of an Erd\H{o}s-R\'{e}nyi graph with edge probability (or, density) $q$. Under the alternative, there is a subgraph on $k$ vertices with edge probability $p>q$. The statistical as well as the computational barriers of this problem are well understood for a wide range of the edge parameters $p$ and $q$. In this paper, we consider a natural variant of the above problem in which one can observe only a small part of the graph using adaptive edge queries. For this model, we determine the number of queries necessary and sufficient for detecting the presence of the planted subgraph. Specifically, we show that any (possibly randomized) algorithm must make $\mathsf{Q} = \Omega(\frac{n^2}{k^2\chi^4(p||q)}\log^2n)$ adaptive queries (in expectation) to the adjacency matrix of the graph in order to detect the planted subgraph with probability more than $1/2$, where $\chi^2(p||q)$ is the chi-square divergence between the Bernoulli distributions with parameters $p$ and $q$. On the other hand, we devise a quasi-polynomial-time algorithm that detects the planted subgraph with high probability by making $\mathsf{Q} = O(\frac{n^2}{k^2\chi^4(p||q)}\log^2n)$ non-adaptive queries. We then propose a polynomial-time algorithm that detects the planted subgraph using $\mathsf{Q} = O(\frac{n^3}{k^3\chi^2(p||q)}\log^3 n)$ queries. We conjecture that in the remaining regime, where $\frac{n^2}{k^2}\ll\mathsf{Q}\ll \frac{n^3}{k^3}$, no polynomial-time algorithm exists. Our results resolve two questions posed in \cite{racz2020finding}, where the special case of adaptive detection and recovery of a planted clique was considered.
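For Bernoulli edge distributions the divergence appearing in these bounds has the closed form

$$\chi^2(p\,\|\,q) = \frac{(p-q)^2}{q} + \frac{(p-q)^2}{1-q} = \frac{(p-q)^2}{q(1-q)},$$

with $\chi^4(p\|q)$ read as its square.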
We consider the task of learning the parameters of a {\em single} component of a mixture model when we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than those of solving the overall original problem, in which one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and lower computational complexity than existing moment-based mixture model algorithms (e.g., tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvements in runtime and accuracy.
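A hypothetical toy illustration of the search-problem idea for a Gaussian mixture (a weighting scheme, not the paper's matrix-based algorithm): a side-information score that up-weights samples likely drawn from the target component turns a plain average into an estimate of that component's mean, without fitting the other components.

import numpy as np

def weighted_component_mean(X, weights):
    """X: (n, d) samples from the mixture; weights: (n,) nonnegative side
    information scores, larger for samples likelier from the target."""
    w = weights / weights.sum()
    return X.T @ w  # weighted empirical mean

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(5, 1, (500, 2))])
w = np.exp(-((X - 5) ** 2).sum(axis=1) / 4)  # toy side information score
print(weighted_component_mean(X, w))  # close to [5, 5], the target mean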