This paper studies the classical problem of finding all $k$ nearest neighbors to points of a query set $Q$ in another reference set $R$ within any metric space. The well-known work by Beygelzimer, Kakade, and Langford in 2006 introduced cover trees and claimed to guarantee near-linear time complexity in the size $|R|$ of the reference set for $k=1$. Our previous work defined compressed cover trees and corrected the key arguments for $k\geq 1$, covering challenging data cases that had previously gone unrecognized. In 2009 Ram, Lee, March, and Gray attempted to improve the time complexity by using pairs of cover trees on the query and reference sets. In 2015 Curtin and the above co-authors used extra parameters to finally prove a similar complexity for $k = 1$. Our work fills all previous gaps and substantially improves neighbor search based on pairs of the new compressed cover trees. A novel imbalance parameter of the paired trees allows us to prove a better time complexity for any number of neighbors $k\geq 1$.
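For context, a cover tree on $R$ (in the classical, uncompressed form of the 2006 paper) is a levelled tree whose level-$i$ point set $C_i$ satisfies three standard invariants, recalled below with $d$ the metric; the compressed variant retains the covering and separation conditions while storing each point of $R$ exactly once.
$$C_i \subseteq C_{i-1} \;\text{(nesting)}; \qquad \forall\, q \in C_{i-1}\ \exists\ \text{parent}\ p \in C_i:\ d(p,q) \leq 2^i \;\text{(covering)}; \qquad \forall\, p \neq q \in C_i:\ d(p,q) > 2^i \;\text{(separation)}.$$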
Recently, methods have been proposed that exploit the invariance of prediction models with respect to changing environments to infer subsets of the causal parents of a response variable. If the environments influence only a few of the underlying mechanisms, the subset identified by invariant causal prediction (ICP), for example, may be small or even empty. We introduce the concept of minimal invariance and propose invariant ancestry search (IAS). In its population version, IAS outputs a set that contains only ancestors of the response and is a superset of the output of ICP. When applied to data, the corresponding guarantees hold asymptotically if the underlying test for invariance has asymptotic level and power. We develop scalable algorithms and perform experiments on simulated and real data.
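One way to make the containment concrete (the notation here is ours, not necessarily the paper's): writing $\mathcal{I}$ for the collection of invariant sets of predictors and $\mathcal{MI} \subseteq \mathcal{I}$ for the minimally invariant ones, i.e., invariant sets no proper subset of which is invariant, ICP intersects while IAS takes a union over minimal elements,
$$S_{\mathrm{ICP}} = \bigcap_{S \in \mathcal{I}} S \;\subseteq\; \bigcup_{S \in \mathcal{MI}} S = S_{\mathrm{IAS}},$$
and the inclusion holds because the intersection is contained in every minimally invariant set.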
We consider problems that can be formulated as the task of finding an optimal triangulation of a graph with respect to some notion of optimality. We present algorithms for such problems, parameterized by the size of a minimum edge clique cover ($cc$). This parameterization occurs naturally in many problems in this setting, e.g., in the perfect phylogeny problem $cc$ is at most the number of taxa, in fractional hypertreewidth $cc$ is at most the number of hyperedges, and in treewidth of Bayesian networks $cc$ is at most the number of non-root nodes. We show that the number of minimal separators of a graph is at most $2^{cc}$ and the number of potential maximal cliques is at most $3^{cc}$, and that these objects can be listed in times $O^*(2^{cc})$ and $O^*(3^{cc})$, respectively, even when no edge clique cover is given as input; the $O^*(\cdot)$ notation omits factors polynomial in the input size. These enumeration algorithms imply $O^*(3^{cc})$ time algorithms for problems such as treewidth, weighted minimum fill-in, and feedback vertex set. For generalized and fractional hypertreewidth we give $O^*(4^m)$ time and $O^*(3^m)$ time algorithms, respectively, where $m$ is the number of hyperedges. When an edge clique cover of size $cc'$ is given as part of the input, we give $O^*(2^{cc'})$ time algorithms for treewidth, minimum fill-in, and chordal sandwich. This implies an $O^*(2^n)$ time algorithm for perfect phylogeny, where $n$ is the number of taxa. We also give polynomial-space algorithms with time complexities $O^*(9^{cc'})$ and $O^*(9^{cc + O(\log^2 cc)})$ for problems in this framework.
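As a reminder of the parameter (standard definition; the two example values are elementary):
$$cc(G) = \min\bigl\{|\mathcal{K}| : \mathcal{K}\ \text{a set of cliques of}\ G\ \text{whose edges together cover}\ E(G)\bigr\}, \qquad \text{e.g.}\ cc(K_n) = 1,\ cc(C_5) = 5,$$
the latter because a triangle-free graph has no cliques beyond single edges, so each edge must be covered separately.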
We propose confidence regions with asymptotically correct uniform coverage probability for parameters whose Fisher information matrix can be singular at important points of the parameter set. Our work is motivated by the need for reliable inference on scale parameters close to or equal to zero in mixed models, which is obtained as a special case. The confidence regions are constructed by inverting a continuous extension of the score test statistic standardized by expected information, which we show exists at points of singular information under regularity conditions. Similar results have previously been obtained only for scalar parameters, under conditions stronger than ours, and applications to mixed models have not been considered. In simulations our confidence regions have near-nominal coverage with as few as $n = 20$ independent observations, regardless of how close to the boundary the true parameter is. As a corollary of our main results, the proposed test statistic has an asymptotic chi-square distribution with degrees of freedom equal to the number of tested parameters, even if they lie on the boundary of the parameter set.
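For orientation, the statistic being extended is the classical score (Rao) statistic: with $U_n(\theta)$ the score, $\mathcal{I}_n(\theta)$ the expected information from $n$ observations, and $p$ the number of tested parameters,
$$R_n(\theta_0) = U_n(\theta_0)^\top \mathcal{I}_n(\theta_0)^{-1} U_n(\theta_0) \xrightarrow{d} \chi^2_p \quad \text{under}\ H_0 : \theta = \theta_0,$$
which is well defined only where $\mathcal{I}_n(\theta_0)$ is invertible; the contribution described above is a continuous extension of this statistic to points of singular information, with uniformly valid asymptotics.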
A long line of research on fixed parameter tractability of integer programming culminated in showing that integer programs with $n$ variables and a constraint matrix with dual tree-depth $d$ and largest entry $D$ are solvable in time $g(d,D)\,\mathrm{poly}(n)$ for some function $g$. However, the dual tree-depth of a constraint matrix is not preserved by row operations, i.e., a given integer program can be equivalent to another with smaller dual tree-depth, and thus the parameter does not reflect the program's geometric structure. We prove that the minimum dual tree-depth over all row-equivalent matrices is equal to the branch-depth of the matroid defined by the columns of the matrix. We design a fixed parameter algorithm for computing the branch-depth of matroids represented over a finite field and a fixed parameter algorithm for computing a row-equivalent matrix with minimum dual tree-depth. Finally, we use these results to obtain an algorithm for integer programming running in time $g(d^*,D)\,\mathrm{poly}(n)$, where $d^*$ is the branch-depth of the constraint matrix; the branch-depth cannot be replaced by the more permissive notion of branch-width.
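In the standard setup of this line of work (our paraphrase), the integer program has the form
$$\min\bigl\{c^\top x : Ax = b,\ l \leq x \leq u,\ x \in \mathbb{Z}^n\bigr\},$$
and the dual tree-depth of $A$ is the tree-depth of its dual graph, whose vertices are the rows of $A$, with two rows adjacent whenever some column is nonzero in both; row operations change this graph, which is why they can decrease the parameter.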
This paper investigates why it is beneficial, when solving a problem, to search in the neighbourhood of a current solution. The paper identifies properties of problems and neighbourhoods that support two novel proofs that neighbourhood search outperforms blind search. These are: firstly, a proof that a single neighbourhood search step is more likely to find an improving solution than a single blind search step; and secondly, a proof that local improvement, using a sequence of neighbourhood search steps, is likely to achieve a greater improvement than a sequence of blind search steps. To explore the practical impact of these properties, a range of problem sets and neighbourhoods are generated in which the properties are satisfied to different degrees. Experiments reveal that the benefits of neighbourhood search vary dramatically as a consequence. Random instances of a classical combinatorial optimisation problem are analysed, in order to demonstrate that the underlying theory is reflected in practice.
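As a toy illustration of the first property (our own sketch, not one of the paper's experiments): on the OneMax objective, which counts the ones in a bit string of length $n = 30$, a current solution with 20 ones is improved by flipping one uniformly chosen bit with probability $10/30 \approx 0.33$, but by a fresh uniform sample with probability $P(\mathrm{Bin}(30, 1/2) > 20) \approx 0.02$. The short simulation below checks both numbers.

    import java.util.Random;

    // Toy comparison (illustrative assumptions, not the paper's setup):
    // how often does one neighbourhood step vs. one blind step improve
    // a OneMax solution that already has 20 of 30 ones?
    public class NeighbourhoodVsBlind {
        public static void main(String[] args) {
            int n = 30, ones = 20, trials = 100_000;
            Random rng = new Random(1);
            int nbBetter = 0, blindBetter = 0;
            for (int t = 0; t < trials; t++) {
                // Neighbourhood step: flipping one uniformly chosen bit
                // increases the count of ones iff a zero-bit is chosen.
                if (rng.nextInt(n) >= ones) nbBetter++;
                // Blind step: a fresh uniform string improves iff it
                // contains strictly more ones than the current solution.
                int v = 0;
                for (int i = 0; i < n; i++) if (rng.nextBoolean()) v++;
                if (v > ones) blindBetter++;
            }
            System.out.printf("P(improve | neighbourhood) ~ %.3f%n", nbBetter / (double) trials);
            System.out.printf("P(improve | blind)         ~ %.3f%n", blindBetter / (double) trials);
        }
    }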
In this paper, we show several parameterized problems to be complete for the class XNLP: parameterized problems that can be solved with a non-deterministic algorithm that uses $f(k)\log n$ space and $f(k)n^c$ time, with $f$ a computable function, $n$ the input size, $k$ the parameter and $c$ a constant. The problems include Maximum Regular Induced Subgraph and Max Cut parameterized by linear clique-width, Capacitated (Red-Blue) Dominating Set parameterized by pathwidth, Odd Cycle Transversal parameterized by a new parameter we call logarithmic linear clique-width (defined as $k/\log n$ for an $n$-vertex graph of linear clique-width $k$), and Bipartite Bandwidth.
A Java parallel streams implementation of the $K$-nearest neighbor descent algorithm is presented using a natural statistical termination criterion. Input data consist of a set $S$ of $n$ objects of type V, and a Function<V, Comparator<V>>, which enables any $x \in S$ to decide which of $y, z \in S\setminus\{x\}$ is more similar to $x$. Experiments with the Kullback-Leibler divergence Comparator support the prediction that the number of rounds of $K$-nearest neighbor updates need not exceed twice the diameter of the undirected version of a random digraph with regular out-degree $K$ on $n$ vertices. Overall complexity was $O(n K^2 \log_K(n))$ in the class of examples studied. When objects are sampled uniformly from a $d$-dimensional simplex, accuracy of the $K$-nearest neighbor approximation is high up to $d = 20$, but declines in higher dimensions, as theory would predict.
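A minimal sequential sketch of one refinement round (our simplification; the implementation described above is parallel and adds the statistical termination criterion): each point pools its current friends with its friends' friends and keeps the $K$ most similar, ranked by the Comparator that the supplied Function produces for that point. The sketch assumes the friends map has an entry for every point.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.function.Function;

    // One sequential K-nearest-neighbor-descent round (illustrative sketch).
    final class NNDescentRound {
        static <V> Map<V, List<V>> refine(Set<V> points,
                                          Map<V, List<V>> friends,
                                          Function<V, Comparator<V>> cmp,
                                          int k) {
            Map<V, List<V>> next = new HashMap<>();
            for (V x : points) {
                // Candidate pool: current friends plus friends-of-friends.
                Set<V> pool = new HashSet<>(friends.get(x));
                for (V y : friends.get(x)) pool.addAll(friends.get(y));
                pool.remove(x);
                // Keep the k candidates most similar to x, as judged by
                // the Comparator supplied for x.
                List<V> best = new ArrayList<>(pool);
                best.sort(cmp.apply(x));
                next.put(x, new ArrayList<>(best.subList(0, Math.min(k, best.size()))));
            }
            return next;
        }
    }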
Long-term outcomes of experimental evaluations are necessarily observed after long delays. We develop semiparametric methods for combining the short-term outcomes of an experimental evaluation with observational measurements of the joint distribution of short-term and long-term outcomes to estimate long-term treatment effects. We characterize semiparametric efficiency bounds for estimation of the average effect of a treatment on a long-term outcome in several instances of this problem. These calculations facilitate the construction of semiparametrically efficient estimators. The finite-sample performance of these estimators is analyzed with a simulation calibrated to a randomized evaluation of the long-term effects of a poverty alleviation program.
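A hedged sketch of one identification strategy in this setting (the surrogate-index idea; the notation is ours and the exact assumptions of the paper may differ): let $W$ denote treatment, $S$ the short-term outcomes, and $Y$ the long-term outcome, observed only in the observational sample. If $Y \perp W \mid S$ holds (surrogacy) and the experimental and observational samples are comparable, then the average effect is
$$\tau = \mathbb{E}\bigl[\mu(S) \mid W = 1\bigr] - \mathbb{E}\bigl[\mu(S) \mid W = 0\bigr], \qquad \mu(s) = \mathbb{E}[Y \mid S = s]\ \text{in the observational sample},$$
with the outer expectations taken over the experimental sample; efficiency bounds of the kind described above quantify how well $\tau$ can be estimated.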
We consider the task of learning the parameters of a {\em single} component of a mixture model, for the case when we are given {\em side information} about that component; we call this the ``search problem'' in mixture models. We would like to solve this with computational and sample complexity lower than those of solving the original problem in full, in which one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and lower computational complexity than existing moment-based mixture model algorithms (e.g., tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvements in runtime and accuracy.
Discrete random structures are important tools in Bayesian nonparametrics, and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, which is inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
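In symbols, the additive construction reads as follows (a simplified sketch of the description above; the full latent nested specification places this structure at a latent level): with a common completely random measure $\mu_0$ and group-specific ones $\mu_1, \dots, \mu_g$ on a space $\mathbb{X}$,
$$\tilde p_\ell = \frac{\mu_\ell + \mu_0}{(\mu_\ell + \mu_0)(\mathbb{X})}, \qquad \ell = 1, \dots, g,$$
so the shared component $\mu_0$ induces dependence across the groups while the $\mu_\ell$ accommodate heterogeneity.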