We propose an efficient $\epsilon$-differentially private algorithm, that given a simple {\em weighted} $n$-vertex, $m$-edge graph $G$ with a \emph{maximum unweighted} degree $\Delta(G) \leq n-1$, outputs a synthetic graph which approximates the spectrum with $\widetilde{O}(\min\{\Delta(G), \sqrt{n}\})$ bound on the purely additive error. To the best of our knowledge, this is the first $\epsilon$-differentially private algorithm with a non-trivial additive error for approximating the spectrum of the graph. One of the subroutines of our algorithm also precisely simulates the exponential mechanism over a non-convex set, which could be of independent interest given the recent interest in sampling from a {\em log-concave distribution} defined over a convex set. Spectral approximation also allows us to approximate all possible $(S,T)$-cuts, but it incurs an error that depends on the maximum degree, $\Delta(G)$. We further show that using our sampler, we can also output a synthetic graph that approximates the sizes of all $(S,T)$-cuts on $n$ vertices weighted graph $G$ with $m$ edges while preserving $(\epsilon,\delta)$-differential privacy and an additive error of $\widetilde{O}(\sqrt{mn}/\epsilon)$. We also give a matching lower bound (with respect to all the parameters) on the private cut approximation for weighted graphs. This removes the gap of $\sqrt{W_{\mathsf{avg}}}$ in the upper and lower bound in Eli{\'a}{\v{s}}, Kapralov, Kulkarni, and Lee (SODA 2020), where $W_{\mathsf{avg}}$ is the average edge weight.
In this paper, we propose a randomized $\tilde{O}(\mu(G))$-round algorithm for the maximum cardinality matching problem in the CONGEST model, where $\mu(G)$ means the maximum size of a matching of the input graph $G$. The proposed algorithm substantially improves the current best worst-case running time. The key technical ingredient is a new randomized algorithm of finding an augmenting path of length $\ell$ with high probability within $\tilde{O}(\ell)$ rounds, which positively settles an open problem left in the prior work by Ahmadi and Kuhn [DISC'20]. The idea of our augmenting path algorithm is based on a recent result by Kitamura and Izumi [IEICE Trans.'22], which efficiently identifies a sparse substructure of the input graph containing an augmenting path, following a new concept called \emph{alternating base trees}. Their algorithm, however, resorts to a centralized approach of collecting the entire information of the substructure into a single vertex for constructing an augmenting path. The technical highlight of this paper is to provide a fully-decentralized counterpart of such a centralized method. To develop the algorithm, we prove several new structural properties of alternating base trees, which are of independent interest.
A matrix $\Phi \in \mathbb{R}^{Q \times N}$ satisfies the restricted isometry property if $\|\Phi x\|_2^2$ is approximately equal to $\|x\|_2^2$ for all $k$-sparse vectors $x$. We give a construction of RIP matrices with the optimal $Q = O(k \log(N/k))$ rows using $O(k\log(N/k)\log(k))$ bits of randomness. The main technical ingredient is an extension of the Hanson-Wright inequality to $\epsilon$-biased distributions.
Neural ordinary differential equations (NODEs) have been proven useful for learning non-linear dynamics of arbitrary trajectories. However, current NODE methods capture variations across trajectories only via the initial state value or by auto-regressive encoder updates. In this work, we introduce Modulated Neural ODEs (MoNODEs), a novel framework that sets apart dynamics states from underlying static factors of variation and improves the existing NODE methods. In particular, we introduce $\textit{time-invariant modulator variables}$ that are learned from the data. We incorporate our proposed framework into four existing NODE variants. We test MoNODE on oscillating systems, videos and human walking trajectories, where each trajectory has trajectory-specific modulation. Our framework consistently improves the existing model ability to generalize to new dynamic parameterizations and to perform far-horizon forecasting. In addition, we verify that the proposed modulator variables are informative of the true unknown factors of variation as measured by $R^2$ scores.
We provide a simple and natural solution to the problem of generating all $2^n \cdot n!$ signed permutations of $[n] = \{1,2,\ldots,n\}$. Our solution provides a pleasing generalization of the most famous ordering of permutations: plain changes (Steinhaus-Johnson-Trotter algorithm). In plain changes, the $n!$ permutations of $[n]$ are ordered so that successive permutations differ by swapping a pair of adjacent symbols, and the order is often visualized as a weaving pattern involving $n$ ropes. Here we model a signed permutation using $n$ ribbons with two distinct sides, and each successive configuration is created by twisting (i.e., swapping and turning over) two neighboring ribbons or a single ribbon. By greedily prioritizing $2$-twists of the largest symbol before $1$-twists of the largest symbol, we create a signed version of plain change's memorable zig-zag pattern. We provide a loopless algorithm (i.e., worst-case $\mathcal{O}(1)$-time per object) by extending the well-known mixed-radix Gray code algorithm.
We propose a novel algorithm for data augmentation in nonlinear over-parametrized regression. Our data augmentation algorithm borrows from the literature on causality and extends the recently proposed Anchor regression (AR) method for data augmentation, which is in contrast to the current state-of-the-art domain-agnostic solutions that rely on the Mixup literature. Our Anchor Data Augmentation (ADA) uses several replicas of the modified samples in AR to provide more training examples, leading to more robust regression predictions. We apply ADA to linear and nonlinear regression problems using neural networks. ADA is competitive with state-of-the-art C-Mixup solutions.
We investigate the problem of approximating an incomplete preference relation $\succsim$ on a finite set by a complete preference relation. We aim to obtain this approximation in such a way that the choices on the basis of two preferences, one incomplete, the other complete, have the smallest possible discrepancy in the aggregate. To this end, we use the top-difference metric on preferences, and define a best complete approximation of $\succsim$ as a complete preference relation nearest to $\succsim$ relative to this metric. We prove that such an approximation must be a maximal completion of $\succsim$, and that it is, in fact, any one completion of $\succsim$ with the largest index. Finally, we use these results to provide a sufficient condition for the best complete approximation of a preference to be its canonical completion. This leads to closed-form solutions to the best approximation problem in the case of several incomplete preference relations of interest.
The matrix semigroup membership problem asks, given square matrices $M,M_1,\ldots,M_k$ of the same dimension, whether $M$ lies in the semigroup generated by $M_1,\ldots,M_k$. It is classical that this problem is undecidable in general but decidable in case $M_1,\ldots,M_k$ commute. In this paper we consider the problem of whether, given $M_1,\ldots,M_k$, the semigroup generated by $M_1,\ldots,M_k$ contains a non-negative matrix. We show that in case $M_1,\ldots,M_k$ commute, this problem is decidable subject to Schanuel's Conjecture. We show also that the problem is undecidable if the commutativity assumption is dropped. A key lemma in our decidability result is a procedure to determine, given a matrix $M$, whether the sequence of matrices $(M^n)_{n\geq 0}$ is ultimately nonnegative. This answers a problem posed by S. Akshay (arXiv:2205.09190). The latter result is in stark contrast to the notorious fact that it is not known how to determine effectively whether for any specific matrix index $(i,j)$ the sequence $(M^n)_{i,j}$ is ultimately nonnegative (which is a formulation of the Ultimate Positivity Problem for linear recurrence sequences).
We define a new class of predicates called equilevel predicates on a distributive lattice which eases the analysis of parallel algorithms. Many combinatorial problems such as the vertex cover problem, the bipartite matching problem, and the minimum spanning tree problem can be modeled as detecting an equilevel predicate. The problem of detecting an equilevel problem is NP-complete, but equilevel predicates with the helpful property can be detected in polynomial time in an online manner. An equilevel predicate has the helpful property with a polynomial time algorithm if the algorithm can return a nonempty set of indices such that advancing on any of them can be used to detect the predicate. Furthermore, the refined independently helpful property allows online parallel detection of such predicates in NC. When the independently helpful property holds, advancing on all the specified indices in parallel can be used to detect the predicate in polylogarithmic time. We also define a special class of equilevel predicates called solitary predicates. Unless NP = RP, this class of predicate also does not admit efficient algorithms. Earlier work has shown that solitary predicates with the efficient advancement can be detected in polynomial time. We introduce two properties called the antimonotone advancement and the efficient rejection which yield the detection of solitary predicates in NC. Finally, we identify the minimum spanning tree, the shortest path, and the conjunctive predicate detection as problems satisfying such properties, giving alternative certifications of their NC memberships as a result.
In this paper, we investigate the problem of deciding whether two random databases $\mathsf{X}\in\mathcal{X}^{n\times d}$ and $\mathsf{Y}\in\mathcal{Y}^{n\times d}$ are statistically dependent or not. This is formulated as a hypothesis testing problem, where under the null hypothesis, these two databases are statistically independent, while under the alternative, there exists an unknown row permutation $\sigma$, such that $\mathsf{X}$ and $\mathsf{Y}^\sigma$, a permuted version of $\mathsf{Y}$, are statistically dependent with some known joint distribution, but have the same marginal distributions as the null. We characterize the thresholds at which optimal testing is information-theoretically impossible and possible, as a function of $n$, $d$, and some spectral properties of the generative distributions of the datasets. For example, we prove that if a certain function of the eigenvalues of the likelihood function and $d$, is below a certain threshold, as $d\to\infty$, then weak detection (performing slightly better than random guessing) is statistically impossible, no matter what the value of $n$ is. This mimics the performance of an efficient test that thresholds a centered version of the log-likelihood function of the observed matrices. We also analyze the case where $d$ is fixed, for which we derive strong (vanishing error) and weak detection lower and upper bounds.
Graph-based kNN algorithms have garnered widespread popularity for machine learning tasks due to their simplicity and effectiveness. However, as factual data often inherit complex distributions, the conventional kNN graph's reliance on a unified k-value can hinder its performance. A crucial factor behind this challenge is the presence of ambiguous samples along decision boundaries that are inevitably more prone to incorrect classifications. To address the situation, we propose the Distribution-Informed adaptive kNN Graph (DaNNG), which combines adaptive kNN with distribution-aware graph construction. By incorporating an approximation of the distribution with customized k-adaption criteria, DaNNG can significantly improve performance on ambiguous samples, and hence enhance overall accuracy and generalization capability. Through rigorous evaluations on diverse benchmark datasets, DaNNG outperforms state-of-the-art algorithms, showcasing its adaptability and efficacy across various real-world scenarios.