The inverse geodesic length of a graph $G$ is the sum of the inverses of the distances between all pairs of distinct vertices of $G$. In some domains it is known as the Harary index or the global efficiency of the graph. We show that, if $G$ is planar and has $n$ vertices, then the inverse geodesic length of $G$ can be computed in roughly $O(n^{9/5})$ time. We also show that, if $G$ has $n$ vertices and treewidth at most $k$, then the inverse geodesic length of $G$ can be computed in $O(n \log^{O(k)}n)$ time. In both cases we use techniques developed for computing the sum of the distances, which does not have the "inverse" component, together with batched evaluations of rational functions.
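For orientation, the quantity defined above can be computed naively with one breadth-first search per vertex. The sketch below is only that quadratic-time baseline for unweighted graphs, not the subquadratic algorithms of the abstract. The function name, the adjacency-dictionary input, and the convention that unreachable pairs contribute zero are all assumptions made for illustration.

```python
from collections import deque


def inverse_geodesic_length(adj):
    """Sum of 1/d(u, v) over unordered pairs of distinct vertices.

    `adj` maps each vertex to a list of its neighbours (undirected,
    unweighted). Pairs in different components contribute 0 here,
    which is an assumed convention.
    """
    total = 0.0
    for source in adj:
        # Plain BFS from `source` to get hop distances.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for v, d in dist.items() if v != source)
    # Every unordered pair was counted once from each endpoint.
    return total / 2
```

On the path with four vertices the distances are 1 (three pairs), 2 (two pairs) and 3 (one pair), so the value is $3 + 1 + 1/3 = 13/3$.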
We consider a participatory budgeting problem in which each voter submits a proposal for how to divide a single divisible resource (such as money or time) among several possible alternatives (such as public projects or activities) and these proposals must be aggregated into a single aggregate division. Under $\ell_1$ preferences -- for which a voter's disutility is given by the $\ell_1$ distance between the aggregate division and the division he or she most prefers -- the social welfare-maximizing mechanism, which minimizes the average $\ell_1$ distance between the outcome and each voter's proposal, is incentive compatible (Goel et al. 2016). However, it fails to satisfy the natural fairness notion of proportionality, placing too much weight on majority preferences. Leveraging a connection between market prices and the generalized median rules of Moulin (1980), we introduce the independent markets mechanism, which is both incentive compatible and proportional. We unify the social welfare-maximizing mechanism and the independent markets mechanism by defining a broad class of moving phantom mechanisms that includes both. We show that every moving phantom mechanism is incentive compatible. Finally, we characterize the social welfare-maximizing mechanism as the unique Pareto-optimal mechanism in this class, suggesting an inherent tradeoff between Pareto optimality and proportionality.
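To make the idea of a generalized median with moving phantoms concrete, here is a minimal sketch. For each alternative it takes the median of the $n$ reported shares together with $n+1$ phantom values that depend on a parameter $t$, and picks $t$ by bisection so that the medians sum to one. The phantom placement $\min(t(n-k), 1)$, the function names, and the bisection tolerance are assumptions for illustration; the abstract does not specify them.

```python
def median(values):
    """Median of an odd-length list."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def phantom_aggregate(proposals, iters=60):
    """Aggregate budget proposals with a moving-phantom style median rule.

    `proposals` is a list of n divisions, each a list of m nonnegative
    shares summing to 1. The phantom placement below is an assumed,
    illustrative choice.
    """
    n = len(proposals)
    m = len(proposals[0])

    def outcome(t):
        # n + 1 phantoms, all at 0 when t = 0 and moving towards 1 as t grows.
        phantoms = [min(t * (n - k), 1.0) for k in range(n + 1)]
        return [median([p[j] for p in proposals] + phantoms) for j in range(m)]

    # The coordinate-wise medians are nondecreasing in t, so bisect on t
    # until the aggregate division sums to 1.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(outcome(mid)) < 1.0:
            lo = mid
        else:
            hi = mid
    return outcome(hi)
```

With three voters who each put the whole budget on a different alternative, this sketch returns approximately $(1/3, 1/3, 1/3)$, whereas the plain coordinate-wise median of the three reports would be $(0, 0, 0)$, which is not a valid division.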
Hamiltonian cycles in graphs were first studied in the 1850s. Since then, an impressive amount of research has been dedicated to identifying classes of graphs that allow Hamiltonian cycles, and to related questions. The corresponding decision problem, which asks whether a given graph is Hamiltonian (i.\,e.\ admits a Hamiltonian cycle), is one of Karp's famous NP-complete problems. In this paper we study graphs of bounded degree that are \emph{far} from being Hamiltonian, where a graph $G$ on $n$ vertices is \emph{far} from being Hamiltonian if modifying a constant fraction of $n$ edges is necessary to make $G$ Hamiltonian. We give an explicit deterministic construction of a class of graphs of bounded degree that are locally Hamiltonian, but (globally) far from being Hamiltonian. Here, \emph{locally Hamiltonian} means that every subgraph induced by the neighbourhood of a small vertex set appears in some Hamiltonian graph. More precisely, we obtain graphs which differ in $\Theta(n)$ edges from any Hamiltonian graph, but non-Hamiltonicity cannot be detected in the neighbourhood of $o(n)$ vertices. Our class of graphs yields a class of hard instances for one-sided error property testers with linear query complexity. It is known that any property tester (even with two-sided error) requires a linear number of queries to test Hamiltonicity (Yoshida, Ito, 2010). This is proved via a randomised construction of hard instances. In contrast, our construction is deterministic. So far, only very few deterministic constructions of hard instances for property testing are known. We believe that our construction may lead to future insights in graph theory and towards a characterisation of the properties that are testable in the bounded-degree model.
The randomized singular value decomposition (SVD) is a popular and effective algorithm for computing a near-best rank $k$ approximation of a matrix $A$ using matrix-vector products with standard Gaussian vectors. Here, we generalize the randomized SVD to multivariate Gaussian vectors, allowing one to incorporate prior knowledge of $A$ into the algorithm. This enables us to explore the continuous analogue of the randomized SVD for Hilbert--Schmidt (HS) operators using operator-function products with functions drawn from a Gaussian process (GP). We then construct a new covariance kernel for GPs, based on weighted Jacobi polynomials, which allows us to rapidly sample the GP and control the smoothness of the randomly generated functions. Numerical examples on matrices and HS operators demonstrate the applicability of the algorithm.
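The generalization described above can be summarized in a few lines. The following is a sketch of the usual randomized range-finder with the standard Gaussian test vectors replaced by correlated ones. The symbols $K$ (a covariance matrix encoding prior knowledge), $p$ (an oversampling parameter), $\Omega$, $Y$ and $Q$ are notation assumed here for illustration and do not appear in the abstract.

```latex
% Sketch under assumed notation: A is m x n, target rank k, oversampling p >= 0.
\begin{align*}
  &\omega_1,\dots,\omega_{k+p} \sim \mathcal{N}(0, K) \ \text{i.i.d.},
    \qquad \Omega = [\,\omega_1 \;\cdots\; \omega_{k+p}\,] \in \mathbb{R}^{n\times(k+p)},\\
  &Y = A\,\Omega, \qquad Y = QR \ \text{(thin QR factorization)},\\
  &A \approx Q\,Q^{*}A .
\end{align*}
% The standard randomized SVD is the special case K = I.
```

Choosing $K = I$ recovers standard Gaussian test vectors; a non-identity $K$ is where prior knowledge of $A$ would enter.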
For a map (function) $F(x):\ftwo^n\rightarrow\ftwo^n$ and a given $y$ in the image of $F$, the problem of \emph{local inversion} of $F$ is to find all inverse images $x$ in $\ftwo^n$ such that $y=F(x)$. In cryptology, this problem arises in the cryptanalysis of one-way functions (OWFs). The well-known TMTO attack is a probabilistic algorithm that computes one solution of local inversion using $O(\sqrt N)$ computation in both the offline and online phases, where $N=2^n$. This paper proposes a complete algorithm for solving the local inversion problem, which uses linear complexity to find the unique solution in a periodic orbit. The algorithm requires an offline computation that solves a hard problem (possibly requiring exponential computation) and an online computation, dependent on $y$, that consists of repeated forward evaluations $F(x)$ at points $x$ in $\ff_{2^n}$, each of which takes polynomial time. The forward evaluation is repeated at most as many times as the linear complexity of the sequence $\{y,F(y),\ldots\}$ to obtain one possible solution when this sequence is periodic. All other solutions are obtained in chains $\{e,F(e),\ldots\}$ for all points $e$ in the Garden of Eden (GOE) of the map $F$. Hence a solution $x$ exists iff either the former sequence is periodic or a solution occurs in a chain starting from a point in the GOE. The online computation is then polynomial time, $O(L^k)$, in the linear complexity $L$ of the sequence to compute one possible solution in a periodic orbit, or $O(l)$ in the chain length $l$, for a fixed $n$. Hence this is a complete algorithm for finding all rational solutions $x$ of the equation $F(x)=y$ for a given $y$ and a map $F$ in $\ff_{2^n}$.
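The role of the periodic orbit can be illustrated with a brute-force sketch: if the sequence $y, F(y), F(F(y)), \ldots$ returns to $y$, then the element visited just before the return is an inverse image of $y$. The code below simply walks the orbit. It does not use linear complexity and is not the algorithm of the abstract; the function name, the integer encoding of points, and the step bound are assumptions for illustration.

```python
def invert_on_periodic_orbit(F, y, max_steps):
    """Return some x with F(x) == y if y lies on a cycle of F of length
    at most `max_steps`; otherwise return None.

    Points are encoded as Python ints (an assumed encoding of n-bit
    vectors). Solutions that lie off the orbit of y, for example on
    chains starting from Garden-of-Eden points, are not found here.
    """
    prev = y
    for _ in range(max_steps):
        nxt = F(prev)
        if nxt == y:
            # The orbit closed up: `prev` is the predecessor of y on the cycle.
            return prev
        prev = nxt
    return None
```

For example, the map $x \mapsto (5x + 3) \bmod 16$ permutes the 4-bit integers in a single cycle, so calling the function with $y = 7$ and a bound of 16 steps returns $4$, since $5 \cdot 4 + 3 = 23 \equiv 7 \pmod{16}$. For the right shift $x \mapsto \lfloor x/2 \rfloor$ and $y = 4$, the orbit $4, 2, 1, 0, 0, \ldots$ never returns to $4$, so the function returns `None` even though $8$ and $9$ are inverse images.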
We study the Bahadur efficiency of several weighted $L^2$-type goodness-of-fit tests based on the empirical characteristic function. The methods considered are for normality and exponentiality testing, and for testing goodness-of-fit to the logistic distribution. Our results are helpful in deciding which specific test a potential practitioner should apply. For the celebrated BHEP and energy tests for normality we obtain novel efficiency results, with some of them in the multivariate case, while in the case of the logistic distribution this is the first time that efficiencies are computed for any composite goodness-of-fit test.
A bipartite graph $G=(A,B,E)$ is ${\cal H}$-convex, for some family of graphs ${\cal H}$, if there exists a graph $H\in {\cal H}$ with $V(H)=A$ such that the set of neighbours in $A$ of each $b\in B$ induces a connected subgraph of $H$. Many $\mathsf{NP}$-complete problems, including problems such as Dominating Set, Feedback Vertex Set, Induced Matching and List $k$-Colouring, become polynomial-time solvable for ${\mathcal H}$-convex graphs when ${\mathcal H}$ is the set of paths. In this case, the class of ${\mathcal H}$-convex graphs is known as the class of convex graphs. The underlying reason is that the class of convex graphs has bounded mim-width. We extend the latter result to families of ${\mathcal H}$-convex graphs where (i) ${\mathcal H}$ is the set of cycles, or (ii) ${\mathcal H}$ is the set of trees with bounded maximum degree and a bounded number of vertices of degree at least $3$. As a consequence, we can re-prove and strengthen a large number of results on generalized convex graphs known in the literature. To complement result (ii), we show that the mim-width of ${\mathcal H}$-convex graphs is unbounded if ${\mathcal H}$ is the set of trees with arbitrarily large maximum degree or an arbitrarily large number of vertices of degree at least $3$. In this way we are able to determine complexity dichotomies for the aforementioned graph problems. Afterwards we perform a more refined width-parameter analysis, which shows even more clearly which width parameters are bounded for classes of ${\cal H}$-convex graphs.
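When ${\mathcal H}$ is the set of paths, the definition above says that $A$ can be ordered so that every neighbourhood of a vertex of $B$ is a contiguous interval. The sketch below verifies this for one given ordering of $A$; it checks a proposed witness and does not search over all orderings. The function name, the input format, and the decision to accept empty neighbourhoods are assumptions for illustration.

```python
def is_convex_for_ordering(order, neighbours):
    """Check that every neighbourhood is an interval of `order`.

    `order` lists the vertices of A along the candidate path H, and
    `neighbours` maps each b in B to an iterable of its neighbours in A.
    Empty neighbourhoods are accepted (an assumed convention).
    """
    position = {a: i for i, a in enumerate(order)}
    for b, nbrs in neighbours.items():
        indices = sorted(position[a] for a in set(nbrs))
        if not indices:
            continue
        # A set of positions is contiguous iff its span equals its size.
        if indices[-1] - indices[0] + 1 != len(indices):
            return False
    return True
```

Returning `False` here only rules out the given ordering; another ordering of $A$ may still witness convexity.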
For an integer $k \geq 1$ and a graph $G$, let $\mathcal{K}_k(G)$ be the graph that has vertex set all proper $k$-colorings of $G$, and an edge between two vertices $\alpha$ and~$\beta$ whenever the coloring~$\beta$ can be obtained from $\alpha$ by a single Kempe change. A theorem of Meyniel from 1978 states that $\mathcal{K}_5(G)$ is connected with diameter $O(5^{|V(G)|})$ for every planar graph $G$. We significantly strengthen this result, by showing that there is a positive constant $c$ such that $\mathcal{K}_5(G)$ has diameter $O(|V(G)|^c)$ for every planar graph $G$.
Counterfactual explanations are usually generated through heuristics that are sensitive to the search's initial conditions. The absence of guarantees of performance and robustness hinders trustworthiness. In this paper, we take a disciplined approach towards counterfactual explanations for tree ensembles. We advocate for a model-based search aiming at "optimal" explanations and propose efficient mixed-integer programming approaches. We show that isolation forests can be modeled within our framework to focus the search on plausible explanations with a low outlier score. We provide comprehensive coverage of additional constraints that model important objectives, heterogeneous data types, structural constraints on the feature space, along with resource and actionability restrictions. Our experimental analyses demonstrate that the proposed search approach requires a computational effort that is orders of magnitude smaller than previous mathematical programming algorithms. It scales up to large data sets and tree ensembles, where it provides, within seconds, systematic explanations grounded on well-defined models solved to optimality.
Graph Neural Networks (GNNs) come in many flavors, but should always be either invariant (permutation of the nodes of the input graph does not affect the output) or equivariant (permutation of the input permutes the output). In this paper, we consider a specific class of invariant and equivariant networks, for which we prove new universality theorems. More precisely, we consider networks with a single hidden layer, obtained by summing channels formed by applying an equivariant linear operator, a pointwise non-linearity and either an invariant or equivariant linear operator. Recently, Maron et al. (2019) showed that by allowing higher-order tensorization inside the network, universal invariant GNNs can be obtained. As a first contribution, we propose an alternative proof of this result, which relies on the Stone-Weierstrass theorem for algebras of real-valued functions. Our main contribution is then an extension of this result to the equivariant case, which appears in many practical applications but has been less studied from a theoretical point of view. The proof relies on a new generalized Stone-Weierstrass theorem for algebras of equivariant functions, which is of independent interest. Finally, unlike many previous settings that consider a fixed number of nodes, our results show that a GNN defined by a single set of parameters can approximate uniformly well a function defined on graphs of varying size.
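The two symmetry notions can be checked numerically on a toy example. The sketch below uses a deliberately simple layer (row sums of the adjacency matrix followed by a pointwise non-linearity) and a sum readout; it is not the tensorized architecture studied in the abstract, and all function names are assumptions for illustration.

```python
import math


def permute(adj_matrix, perm):
    """Relabel vertices so that vertex i of the input becomes perm[i]."""
    n = len(adj_matrix)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[perm[i]][perm[j]] = adj_matrix[i][j]
    return out


def equivariant_layer(adj_matrix):
    """One output per node: a linear map (row sum) then a pointwise tanh."""
    return [math.tanh(sum(row)) for row in adj_matrix]


def invariant_readout(adj_matrix):
    """A single graph-level output: the sum of the per-node outputs."""
    return sum(equivariant_layer(adj_matrix))
```

Equivariance here means that `equivariant_layer(permute(A, perm))[perm[i]]` equals `equivariant_layer(A)[i]` for every node `i`, and invariance means that `invariant_readout` gives the same value (up to floating-point rounding) on `A` and on `permute(A, perm)`.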
We consider the task of learning the parameters of a {\em single} component of a mixture model when we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve this with lower computational and sample complexity than solving the overall original problem, in which one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and lower computational complexity than existing moment-based mixture model algorithms (e.g. tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvement in runtime and accuracy.