We propose quasi-stable coloring, an approximate version of stable coloring. Stable coloring, also called color refinement, is a well-studied technique in graph theory for classifying vertices, which can be used to build compact, lossless representations of graphs. However, its usefulness is limited due to its reliance on strict symmetries. Real data compresses very poorly using color refinement. We propose the first, to our knowledge, approximate color refinement scheme, which we call quasi-stable coloring. By using approximation, we alleviate the need for strict symmetry, and allow for a tradeoff between the degree of compression and the accuracy of the representation. We study three applications: Linear Programming, Max-Flow, and Betweenness Centrality, and provide theoretical evidence in each case that a quasi-stable coloring can lead to good approximations on the reduced graph. Next, we consider how to compute a maximal quasi-stable coloring: we prove that, in general, this problem is NP-hard, and propose a simple, yet effective algorithm based on heuristics. Finally, we evaluate experimentally the quasi-stable coloring technique on several real graphs and applications, comparing with prior approximation techniques. A reference implementation and the experiment code are available at //github.com/mkyl/QuasiStableColors.jl.
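To make the exact notion being relaxed concrete, here is a minimal Python sketch of classical color refinement on an adjacency-list graph. It is not the reference implementation mentioned above. The helper `max_degree_gap` is a hypothetical name, and the quantity it reports (the largest difference, over pairs of color classes, in how many neighbors two same-colored vertices have in the other class) is an assumption made for illustration about what an approximate scheme might bound; the abstract does not state the formal definition.

```python
from collections import defaultdict


def color_refinement(adj):
    """Classical color refinement on an undirected graph.

    adj maps each vertex to an iterable of its neighbors.
    Returns a dict vertex -> color id describing the coarsest stable coloring.
    """
    colors = {v: 0 for v in adj}  # start with a single color class
    while True:
        # A vertex's signature is its own color plus the multiset of
        # neighbor colors, so each round can only split classes.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: palette[signatures[v]] for v in adj}
        # No new class appeared, so the partition is stable.
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors


def max_degree_gap(adj, colors):
    """Hypothetical helper: how far a coloring is from being stable.

    For every ordered pair of color classes (P, Q), compare how many
    neighbors in Q the vertices of P have; return the largest max-min gap.
    A gap of 0 means the coloring is stable.
    """
    classes = defaultdict(list)
    for v, c in colors.items():
        classes[c].append(v)
    worst = 0
    for members in classes.values():
        for target in classes:
            degs = [sum(1 for u in adj[v] if colors[u] == target) for v in members]
            worst = max(worst, max(degs) - min(degs))
    return worst


# Path 0 - 1 - 2 - 3: endpoints and inner vertices end up in separate classes.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
stable = color_refinement(path)
trivial = {v: 0 for v in path}
```

On this path, `max_degree_gap(path, stable)` is 0, while the one-color partition `trivial` has gap 1 (degrees 1 and 2 share a class). Tolerating a nonzero gap is what lets a coarser partition stand in for the exact one.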
A proper $k$-coloring of a graph $G$ is a \emph{neighbor-locating $k$-coloring} if for each pair of vertices in the same color class, the sets of colors found in their neighborhoods are different. The neighbor-locating chromatic number $\chi_{NL}(G)$ is the minimum $k$ for which $G$ admits a neighbor-locating $k$-coloring. A proper $k$-coloring of a graph $G$ is a \emph{locating $k$-coloring} if for each pair of vertices $x$ and $y$ in the same color-class, there exists a color class $S_i$ such that $d(x,S_i)\neq d(y,S_i)$. The locating chromatic number $\chi_{L}(G)$ is the minimum $k$ for which $G$ admits a locating $k$-coloring. It follows that $\chi(G)\leq\chi_L(G)\leq\chi_{NL}(G)$ for any graph $G$, where $\chi(G)$ is the usual chromatic number of $G$. We show that for any three integers $p,q,r$ with $2\leq p\leq q\leq r$ (except when $2=p=q<r$), there exists a connected graph $G_{p,q,r}$ with $\chi(G_{p,q,r})=p$, $\chi_L(G_{p,q,r})=q$ and $\chi_{NL}(G_{p,q,r})=r$. We also show that the locating chromatic number (resp., neighbor-locating chromatic number) of an induced subgraph of a graph $G$ can be arbitrarily larger than that of $G$. Alcon \textit{et al.} showed that the number $n$ of vertices of $G$ is bounded above by $k(2^{k-1}-1)$, where $\chi_{NL}(G)=k$ and $G$ is connected (this bound is tight). When $G$ has maximum degree $\Delta$, they also showed that a smaller upper-bound on $n$ of order $k^{\Delta+1}$ holds. We generalize the latter by proving that if $G$ has order $n$ and at most $an+b$ edges, then $n$ is upper-bounded by a bound of the order of $k^{2a+1}+2b$. Moreover, we describe constructions of such graphs which are close to reaching the bound.
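The two definitions above can be checked mechanically for a given coloring. The following Python sketch is an illustration only: the function names are hypothetical, graphs are plain adjacency dictionaries, and the locating check assumes a connected graph, so every distance $d(x,S_i)$ is finite.

```python
from collections import deque


def is_proper(adj, coloring):
    """True if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for v in adj for u in adj[v])


def is_neighbor_locating(adj, coloring):
    """Proper, and same-colored vertices see different sets of neighbor colors."""
    if not is_proper(adj, coloring):
        return False
    seen = set()
    for v in adj:
        key = (coloring[v], frozenset(coloring[u] for u in adj[v]))
        if key in seen:
            return False
        seen.add(key)
    return True


def distances_to_class(adj, sources):
    """Multi-source BFS: distance from every reachable vertex to the set `sources`."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def is_locating(adj, coloring):
    """Proper, and same-colored vertices differ in distance to some color class."""
    if not is_proper(adj, coloring):
        return False
    classes = sorted(set(coloring.values()))
    dist = {
        c: distances_to_class(adj, [v for v in adj if coloring[v] == c])
        for c in classes
    }
    seen = set()
    for v in adj:
        # The vector of distances to every class; a vertex's own class
        # contributes 0, so differently colored vertices never collide.
        code = tuple(dist[c].get(v, float("inf")) for c in classes)
        if code in seen:
            return False
        seen.add(code)
    return True


# Path on five vertices with colors 2, 0, 1, 0, 1 along the path.
p5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
coloring = {0: 2, 1: 0, 2: 1, 3: 0, 4: 1}
```

For this coloring of the path, `is_locating` returns `True` but `is_neighbor_locating` returns `False`: vertices 2 and 4 share color 1 and both see only color 0 among their neighbors, yet they lie at different distances from the class containing vertex 0. This is the direction of the inequality $\chi_L(G)\leq\chi_{NL}(G)$: every neighbor-locating coloring is locating, but not conversely.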
We investigate the problem of stochastic, combinatorial multi-armed bandits where the learner only has access to bandit feedback and the reward function can be non-linear. We provide a general framework for adapting discrete offline approximation algorithms into sublinear $\alpha$-regret methods that only require bandit feedback, achieving $\mathcal{O}\left(T^\frac{2}{3}\log(T)^\frac{1}{3}\right)$ expected cumulative $\alpha$-regret dependence on the horizon $T$. The framework only requires the offline algorithms to be robust to small errors in function evaluation. The adaptation procedure does not even require explicit knowledge of the offline approximation algorithm -- the offline algorithm can be used as a black-box subroutine. To demonstrate the utility of the proposed framework, we apply it to multiple problems in submodular maximization, adapting approximation algorithms for cardinality and for knapsack constraints. The new CMAB algorithms for knapsack constraints outperform a full-bandit method developed for the adversarial setting in experiments with real-world data.
In many modern statistical problems, the limited available data must be used both to develop the hypotheses to test and to test these hypotheses, that is, both for exploratory and confirmatory data analysis. Reusing the same dataset for both exploration and testing can cause massive selection bias and hence many false discoveries. Selective inference is a framework that allows for performing valid inference even when the same data is reused for exploration and testing. In this work, we are interested in the problem of selective inference for data clustering, where a clustering procedure is used to hypothesize a separation of the data points into a collection of subgroups, and we then wish to test whether these data-dependent clusters in fact represent meaningful differences within the data. Recent work by Gao et al. [2022] provides a framework for selective inference in this setting when hierarchical clustering produces the cluster assignments; Chen and Witten [2022] then extended it to k-means clustering. Both these works rely on assuming a known covariance structure for the data, but in practice, the noise level needs to be estimated, and this is particularly challenging when the true cluster structure is unknown. In our work, we extend this framework to the setting of noise with unknown variance, and provide a selective inference method for this more general setting. Empirical results show that our new method is better able to maintain high power while controlling Type I error when the true noise level is unknown.
A compression function is a map that slims down an observational set into a subset of reduced size, while preserving its informational content. In multiple applications, the condition that one new observation makes the compressed set change is interpreted as meaning that this observation brings in extra information and, in learning theory, this corresponds to misclassification, or misprediction. In this paper, we lay the foundations of a new theory that allows one to control the probability of change of compression (called the "risk"). We identify conditions under which the cardinality of the compressed set is a consistent estimator for the risk (without any upper limit on the size of the compressed set) and prove unprecedentedly tight bounds to evaluate the risk under a generally applicable condition of preference. All results are usable in a fully agnostic setup, without requiring any a priori knowledge of the probability distribution of the observations. Not only do these results offer valid support for developing trust in observation-driven methodologies, they also play a fundamental role in learning techniques as a tool for hyper-parameter tuning.
We study the computational complexity of $c$-Colored $P_\ell$ Deletion and $c$-Colored $C_\ell$ Deletion. In these problems, one is given a $c$-edge-colored graph and wants to destroy all induced $c$-colored paths or cycles, respectively, on $\ell$ vertices by deleting at most $k$ edges. Herein, a path or cycle is $c$-colored if it contains edges of $c$ distinct colors. We show that $c$-Colored $P_\ell$ Deletion and $c$-Colored $C_\ell$ Deletion are NP-hard for each non-trivial combination of $c$ and $\ell$. We then analyze the parameterized complexity of these problems. We extend the notion of neighborhood diversity to edge-colored graphs and show that both problems are fixed-parameter tractable with respect to the colored neighborhood diversity of the input graph. We also provide hardness results to outline the limits of parameterization by the standard parameter solution size $k$. Finally, we consider bicolored input graphs and show a special case of $2$-Colored $P_4$ Deletion that can be solved in polynomial time.
In this paper, we study fast first-order algorithms that approximately solve linear programs (LPs). More specifically, we apply algorithms from online linear programming to offline LPs and derive algorithms that are free of any matrix multiplication. To further improve the applicability of the proposed methods, we propose a variable-duplication technique that achieves $\mathcal{O}(\sqrt{mn/K})$ optimality gap by copying each variable $K$ times. Moreover, we identify that online algorithms can be efficiently incorporated into a column generation framework for large-scale LPs. Finally, numerical experiments show that our proposed methods can be applied either as an approximate direct solver or as an initialization subroutine in frameworks of exact LP solving.
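To illustrate how an online method can process an offline LP without any matrix multiplication, the sketch below makes a single pass over the columns of the constraint matrix with a projected dual subgradient update. This is one well-known style of online LP update, chosen here as an assumption; the abstract does not specify the paper's algorithms, and the function name, the step size parameter, and the problem form $\max c^\top x$ subject to $Ax \leq b$, $0 \leq x \leq 1$ are all illustrative choices.

```python
def online_dual_subgradient(c, columns, b, step):
    """One pass of a simple online LP heuristic (illustrative sketch).

    Targets  max c^T x  s.t.  A x <= b,  0 <= x <= 1,
    where columns[t] is the t-th column of A (length m).
    Each iteration touches a single column, so the work per step is O(m)
    and no matrix product is ever formed. The returned x is approximate
    and is not guaranteed to be feasible.
    """
    n, m = len(c), len(b)
    avg_budget = [bi / n for bi in b]  # resource available per variable
    prices = [0.0] * m                 # dual prices, kept nonnegative
    x = [0.0] * n
    for t in range(n):
        a_t = columns[t]
        # Accept variable t only if its reward beats its priced resource use.
        reduced_reward = c[t] - sum(p * a for p, a in zip(prices, a_t))
        x[t] = 1.0 if reduced_reward > 0 else 0.0
        # Subgradient step on the dual, projected back onto prices >= 0.
        prices = [
            max(p + step * (a * x[t] - d), 0.0)
            for p, a, d in zip(prices, a_t, avg_budget)
        ]
    return x, prices


# One resource with enough budget for everything: prices never rise.
x_loose, prices_loose = online_dual_subgradient(
    c=[1.0, -1.0, 1.0], columns=[[1.0], [1.0], [1.0]], b=[3.0], step=0.1
)

# No budget and a large step: the first acceptance raises the price
# enough to reject the remaining variables.
x_tight, prices_tight = online_dual_subgradient(
    c=[1.0, 1.0, 1.0], columns=[[1.0], [1.0], [1.0]], b=[0.0], step=2.0
)
```

In the first call the budget is never binding, so the prices stay at zero and exactly the variables with positive reward are selected. In the second call the first accepted variable already overspends, which drives the price up and blocks the rest.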
Bayesian model comparison (BMC) offers a principled approach for assessing the relative merits of competing computational models and propagating uncertainty into model selection decisions. However, BMC is often intractable for the popular class of hierarchical models due to their high-dimensional nested parameter structure. To address this intractability, we propose a deep learning method for performing BMC on any set of hierarchical models which can be instantiated as probabilistic programs. Since our method enables amortized inference, it allows efficient re-estimation of posterior model probabilities and fast performance validation prior to any real-data application. In a series of extensive validation studies, we benchmark the performance of our method against the state-of-the-art bridge sampling method and demonstrate excellent amortized inference across all BMC settings. We then use our method to compare four hierarchical evidence accumulation models that have previously been deemed intractable for BMC due to partly implicit likelihoods. In this application, we corroborate evidence for the recently proposed L\'evy flight model of decision-making and show how transfer learning can be leveraged to enhance training efficiency. Reproducible code for all analyses is provided.
We present a novel method for the safety verification of nonlinear dynamical models that uses neural networks to represent abstractions of their dynamics. Neural networks have been used extensively before as approximators; in this work, we go a step further and use them for the first time as abstractions. For a given dynamical model, our method synthesises a neural network that overapproximates its dynamics by ensuring an arbitrarily tight, formally certified bound on the approximation error. For this purpose, we employ a counterexample-guided inductive synthesis procedure. We show that this produces a neural ODE with non-deterministic disturbances that constitutes a formal abstraction of the concrete model under analysis. This guarantees a fundamental property: if the abstract model is safe, i.e., free from any initialised trajectory that reaches an undesirable state, then the concrete model is also safe. By using neural ODEs with ReLU activation functions as abstractions, we cast the safety verification problem for nonlinear dynamical models into that of hybrid automata with affine dynamics, which we verify using SpaceEx. We demonstrate that our approach performs comparably to the mature tool Flow* on existing benchmark nonlinear models. We additionally demonstrate that it is effective on models that do not exhibit local Lipschitz continuity, which are out of reach of existing technologies.
Dedicated tensor accelerators demonstrate the importance of linear algebra in modern applications. Such accelerators have the potential for impressive performance gains, but require programmers to rewrite code using vendor APIs, a barrier to wider-scale adoption. Recent work overcomes this by matching and replacing patterns within code, but such approaches are fragile and fail to cope with the diversity of real-world codes. We develop ATC, a compiler that uses program synthesis to map regions of code to specific APIs. The mapping space that ATC explores is combinatorially large, requiring the development of program classification, dynamic analysis, variable constraint generation and lexical distance matching techniques to make it tractable. We apply ATC to real-world tensor and linear algebra codes and evaluate it against four state-of-the-art approaches. We accelerate between 2.6x and 7x more programs, leading to over an order of magnitude performance improvement.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.