The integer complexity $f(n)$ of a positive integer $n$ is defined as the minimum number of 1's needed to represent $n$ using additions, multiplications, and parentheses. We present two simple and fast algorithms for computing integer complexity: 1) a near-optimal $O(N\mathop{\mathrm{polylog}} N)$-time algorithm for computing the integer complexity of all $n\leq N$, improving on the previous $O(N^{1.223})$-time algorithm [Cordwell et al., 2017]; and 2) the first sublinear-time algorithm for computing the integer complexity of a single $n$, with running time $O(n^{0.6154})$. Previous algorithms for computing a single value $f(n)$ require computing all of $f(1),\dots,f(n)$.
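For concreteness, the following is a minimal sketch of the classical quadratic-time dynamic program for $f(1),\dots,f(N)$; it is not the near-optimal algorithm of the abstract, but it is convenient for checking small values such as $f(6)=5$.

def integer_complexity_up_to(N):
    # Straightforward DP: f(1) = 1 and f(n) = min over n = a + b and n = a * b of f(a) + f(b).
    # Runs in O(N^2) time -- far slower than the near-optimal algorithm above.
    INF = float("inf")
    f = [INF] * (N + 1)
    f[1] = 1
    for n in range(2, N + 1):
        best = INF
        for a in range(1, n // 2 + 1):          # additions: n = a + (n - a)
            best = min(best, f[a] + f[n - a])
        d = 2
        while d * d <= n:                        # multiplications: n = d * (n // d)
            if n % d == 0:
                best = min(best, f[d] + f[n // d])
            d += 1
        f[n] = best
    return f

print(integer_complexity_up_to(12)[1:])  # [1, 2, 3, 4, 5, 5, 6, 6, 6, 7, 8, 7]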
The Weisfeiler-Leman (WL) dimension of a graph parameter $f$ is the minimum $k$ such that, if $G_1$ and $G_2$ are indistinguishable by the $k$-dimensional WL-algorithm, then $f(G_1)=f(G_2)$. The WL-dimension of $f$ is $\infty$ if no such $k$ exists. We study the WL-dimension of graph parameters characterised by the number of answers of a fixed conjunctive query in the graph. Given a conjunctive query $\varphi$, we quantify the WL-dimension of the function that maps every graph $G$ to the number of answers of $\varphi$ in $G$. The works of Dvor\'ak (J. Graph Theory 2010), Dell, Grohe, and Rattan (ICALP 2018), and Neuen (ArXiv 2023) have answered this question for full conjunctive queries, which are conjunctive queries without existentially quantified variables. For such queries $\varphi$, the WL-dimension is equal to the treewidth of the Gaifman graph of $\varphi$. In this work, we give a characterisation that applies to all conjunctive queries. Given any conjunctive query $\varphi$, we prove that its WL-dimension is equal to the semantic extension width $\mathsf{sew}(\varphi)$, a novel width measure that can be thought of as a combination of the treewidth of $\varphi$ and its quantified star size, an invariant introduced by Durand and Mengel (ICDT 2013) describing how the existentially quantified variables of $\varphi$ are connected with the free variables. Using the recently established equivalence between the WL-algorithm and higher-order Graph Neural Networks (GNNs) due to Morris et al. (AAAI 2019), we obtain as a consequence that the function counting answers to a conjunctive query $\varphi$ cannot be computed by GNNs of order smaller than $\mathsf{sew}(\varphi)$.
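To make the counted quantity concrete, the brute-force sketch below counts the answers of a small conjunctive query with one existentially quantified variable; the graph, variable names, and query are illustrative only.

from itertools import product

def count_answers(adj, free_vars, exist_vars, atoms):
    # Count answers of phi(free_vars) = exists exist_vars : AND of edge atoms E(u, v),
    # by brute force over all assignments. adj maps a vertex to its set of neighbours;
    # atoms is a list of variable pairs (u, v) meaning "E(u, v) holds".
    nodes = list(adj)
    def holds(env):
        return all(env[v] in adj[env[u]] for (u, v) in atoms)
    count = 0
    for fa in product(nodes, repeat=len(free_vars)):
        env = dict(zip(free_vars, fa))
        if any(holds({**env, **dict(zip(exist_vars, ea))})
               for ea in product(nodes, repeat=len(exist_vars))):
            count += 1
    return count

# phi(x, y) = exists z : E(x, z) and E(z, y), evaluated on a 4-cycle
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(count_answers(cycle, ["x", "y"], ["z"], [("x", "z"), ("z", "y")]))  # 8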
Existing contrastive learning methods rely on a pairwise sample contrast $z_x^\top z_{x'}$ to learn data representations, but the learned features often lack clear interpretability from a human perspective. Theoretically, such methods lack feature identifiability, and different initializations may lead to entirely different learned features. In this paper, we study a new method named tri-factor contrastive learning (triCL) that involves a 3-factor contrast of the form $z_x^\top S z_{x'}$, where $S=\text{diag}(s_1,\dots,s_k)$ is a learnable diagonal matrix that automatically captures the importance of each feature. We show that with this simple extension, triCL not only obtains identifiable features that eliminate this randomness, but also yields more interpretable features that are ordered according to the importance matrix $S$. We show that features with high importance are readily interpretable, capturing common class-wise features, and achieve superior performance when evaluated for image retrieval using only a few features. The proposed triCL objective is general and can be applied to different contrastive learning methods such as SimCLR and CLIP. We believe that it is a better alternative to existing 2-factor contrastive learning, improving identifiability and interpretability with minimal overhead. Code is available at //github.com/PKU-ML/Tri-factor-Contrastive-Learning.
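A minimal sketch of what a 3-factor contrast can look like when plugged into an InfoNCE-style objective is given below; the module name, temperature, and positive parameterisation of $S$ are illustrative choices and not necessarily those of the implementation in the repository linked above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TriFactorContrast(nn.Module):
    # Sketch of a 3-factor contrast z_x^T S z_{x'} with a learnable diagonal S,
    # used inside an InfoNCE-style loss. Illustrative only.
    def __init__(self, dim, temperature=0.5):
        super().__init__()
        self.log_s = nn.Parameter(torch.zeros(dim))  # S = diag(exp(log_s)), kept positive
        self.temperature = temperature

    def forward(self, z1, z2):
        # z1, z2: (batch, dim) embeddings of two augmented views of the same batch
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        s = torch.exp(self.log_s)                     # feature-importance weights
        sim = (z1 * s) @ z2.t() / self.temperature    # sim[i, j] = z1_i^T S z2_j
        labels = torch.arange(z1.size(0), device=z1.device)
        return F.cross_entropy(sim, labels)           # positive pairs sit on the diagonal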
This paper studies the prediction of a target $\mathbf{z}$ from a pair of random variables $(\mathbf{x},\mathbf{y})$, where the ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x},\mathbf{y}] = f_\star(\mathbf{x}) + g_{\star}(\mathbf{y})$. We study the performance of empirical risk minimization (ERM) over functions $f+g$, $f \in F$ and $g \in G$, fit on a given training distribution but evaluated on a test distribution which exhibits covariate shift. We show that, when the class $F$ is "simpler" than $G$ (measured, e.g., in terms of its metric entropy), our predictor is more resilient to heterogeneous covariate shifts in which the shift in $\mathbf{x}$ is much greater than that in $\mathbf{y}$. Our analysis proceeds by demonstrating that ERM behaves qualitatively similarly to orthogonal machine learning: the rate at which ERM recovers the $f$-component of the predictor has only a lower-order dependence on the complexity of the class $G$, adjusted for the partial non-identifiability introduced by the additive structure. These results rely on a novel H\"older-style inequality for the Dudley integral which may be of independent interest. Moreover, we corroborate our theoretical findings with experiments demonstrating improved resilience to shifts in "simpler" features across numerous domains.
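As a toy illustration of the setup (not the estimator analysed in the abstract), one can fit an additive predictor by least squares over concatenated feature maps, with a "simple" class for $\mathbf{x}$ and a richer class for $\mathbf{y}$; all feature maps and constants below are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 1))
y = rng.normal(size=(n, 1))
z = 2.0 * x[:, 0] + np.sin(3 * y[:, 0]) + 0.1 * rng.normal(size=n)  # additive ground truth

Phi_f = x                                          # simple class F: linear features of x
Phi_g = np.hstack([y, y**2, y**3, y**4, y**5])     # richer class G: polynomial features of y
Phi = np.hstack([np.ones((n, 1)), Phi_f, Phi_g])   # joint design; intercept absorbs the additive non-identifiability
w, *_ = np.linalg.lstsq(Phi, z, rcond=None)        # ERM over f + g = ordinary least squares here

f_hat = Phi_f @ w[1:2]          # recovered f-component (up to an additive constant)
g_hat = Phi_g @ w[2:]           # recovered g-component
print("slope of f_hat:", w[1])  # should be close to the true slope 2.0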
For a set of points in $\mathbb{R}^d$, the Euclidean $k$-means problem consists of finding $k$ centers such that the sum of squared distances from each data point to its closest center is minimized. Coresets are one of the main tools developed recently to solve this problem in a big-data context. They make it possible to compress the initial dataset while preserving its structure: running any algorithm on the coreset provides a guarantee almost equivalent to running it on the full data. In this work, we study coresets in a fully dynamic setting: points are added and deleted, with the goal of efficiently maintaining a coreset from which a $k$-means solution can be computed. Based on an algorithm of Henzinger and Kale [ESA'20], we present an efficient and practical implementation of a fully dynamic coreset algorithm that improves the running time by up to a factor of 20 compared to our non-optimized implementation of the algorithm by Henzinger and Kale, without sacrificing more than 7% of the quality of the $k$-means solution.
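The coreset guarantee can be phrased via the weighted cost: a weighted set $(S,w)$ is an $\epsilon$-coreset of $P$ if, for every set $C$ of $k$ centers, the weighted cost on $(S,w)$ is within a $(1\pm\epsilon)$ factor of the cost on $P$. The helper below (an illustrative sketch, not part of the implementation discussed above) simply evaluates this weighted cost.

import numpy as np

def kmeans_cost(points, centers, weights=None):
    # Weighted k-means cost: sum_i w_i * min_{c in centers} ||p_i - c||^2.
    # A coreset (S, w) guarantees that this value, computed on (S, w), is within a
    # (1 +/- eps) factor of the cost on the full dataset for EVERY choice of centers.
    if weights is None:
        weights = np.ones(len(points))
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # pairwise squared distances
    return float((weights * d2.min(axis=1)).sum())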
We consider the problem of testing and learning quantum $k$-juntas: $n$-qubit unitary matrices which act non-trivially on just $k$ of the $n$ qubits and as the identity on the rest. As our main algorithmic results, we give (a) an $\widetilde{O}(\sqrt{k})$-query quantum algorithm that can distinguish quantum $k$-juntas from unitary matrices that are "far" from every quantum $k$-junta, and (b) an $O(4^k)$-query algorithm to learn quantum $k$-juntas. We complement our upper bounds for testing and learning quantum $k$-juntas with near-matching lower bounds of $\Omega(\sqrt{k})$ and $\Omega(\frac{4^k}{k})$, respectively. Our techniques are Fourier-analytic and make use of a notion of influence of qubits on unitaries.
We propose a novel and practical privacy notion called $f$-Membership Inference Privacy ($f$-MIP), which explicitly considers the capabilities of realistic adversaries under the membership inference attack threat model. Consequently, $f$-MIP offers interpretable privacy guarantees and improved utility (e.g., better classification accuracy). In particular, we derive a parametric family of $f$-MIP guarantees that we refer to as $\mu$-Gaussian Membership Inference Privacy ($\mu$-GMIP) by theoretically analyzing likelihood ratio-based membership inference attacks on stochastic gradient descent (SGD). Our analysis highlights that models trained with standard SGD already offer an elementary level of MIP. Additionally, we show how $f$-MIP can be amplified by adding noise to gradient updates. Our analysis further yields an analytical membership inference attack that offers two distinct advantages over previous approaches. First, unlike existing state-of-the-art attacks that require training hundreds of shadow models, our attack does not require any shadow model. Second, our analytical attack enables straightforward auditing of our privacy notion $f$-MIP. Finally, we quantify how various hyperparameters (e.g., batch size, number of model parameters) and specific data characteristics determine an attacker's ability to accurately infer a point's membership in the training set. We demonstrate the effectiveness of our method on models trained on vision and tabular datasets.
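The noise amplification referred to above is the usual clip-and-perturb gradient mechanism; the sketch below is illustrative, and the calibration of the noise scale required for a target $\mu$-GMIP level is derived in the paper, not here. Parameter names and defaults are hypothetical.

import numpy as np

def noisy_gradient_step(theta, per_example_grads, lr=0.1, clip=1.0, sigma=0.5, rng=None):
    # Clip each per-example gradient to norm `clip`, average over the batch,
    # and add Gaussian noise to the averaged update.
    rng = rng or np.random.default_rng()
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    b = per_example_grads.shape[0]
    noisy_mean = clipped.mean(axis=0) + rng.normal(scale=sigma * clip / b, size=theta.shape)
    return theta - lr * noisy_mean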
The modular subset sum problem consists of deciding, given a modulus $m$, a multiset $S$ of $n$ integers in $\{0,\dots,m-1\}$, and a target integer $t$, whether there exists a subset of $S$ with elements summing to $t \bmod m$, and of reporting such a subset if it exists. We give a simple algorithm for the modular subset sum problem that runs in $O(m \log m)$ time with high probability (w.h.p.). This builds on and improves a previous algorithm running in $O(m \log^7 m)$ time w.h.p. due to Axiotis, Backurs, Jin, Tzamos, and Wu (SODA 2019). Our method utilizes the abstract data type (ADT) of the dynamic strings structure of Gawrychowski et al. (SODA 2018). However, as this structure is rather complicated, we present a much simpler alternative which we call the Data Dependent Tree. As an application, we consider the computational version of a fundamental theorem in zero-sum Ramsey theory. The Erd\H{o}s-Ginzburg-Ziv Theorem states that a multiset of $2n - 1$ integers always contains a subset of cardinality exactly $n$ whose values sum to a multiple of $n$. We give an algorithm for finding such a subset in $O(n \log n)$ time w.h.p., which improves on an $O(n^2)$-time algorithm due to Del Lungo, Marini, and Mori (Discrete Mathematics 2009).
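For reference, the textbook $O(nm)$ dynamic program (well short of the $O(m \log m)$ bound above, but a convenient baseline and correctness check) is sketched below.

def modular_subset_sum(S, m, t):
    # Textbook O(n * m) dynamic program: parent[r] records how residue r was first
    # reached, so a witness subset can be reconstructed. Returns None if no subset works.
    parent = {0: None}
    for x in S:
        new = {}
        for r in list(parent):
            nr = (r + x) % m
            if nr not in parent and nr not in new:
                new[nr] = (r, x)
        parent.update(new)
    if t % m not in parent:
        return None
    subset, r = [], t % m
    while parent[r] is not None:
        prev, x = parent[r]
        subset.append(x)
        r = prev
    return subset

print(modular_subset_sum([3, 5, 6], 7, 4))  # e.g. [6, 5], since 6 + 5 = 11 = 4 (mod 7)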
Given some binary matrix $M$, suppose we are presented with the collection of its rows and columns in independent arbitrary orderings. From this information, are we able to recover the unique original orderings and matrix? We present an algorithm that identifies whether there is a unique ordering associated with a set of rows and columns, and outputs either the unique correct orderings for the rows and columns or the full collection of all valid orderings and valid matrices. We show that there is a constant $c > 0$ such that the algorithm terminates in $O(n^2)$ time with high probability and in expectation for random $n \times n$ binary matrices with i.i.d.\ Bernoulli$(p)$ entries $(m_{ij})_{i,j=1}^n$ such that $\frac{c\log^2(n)}{n(\log\log(n))^2} \leq p \leq \frac{1}{2}$.
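Under one natural reading of the setup (each row and column is given intact, but the order of the rows and of the columns is lost), a brute-force check for small instances can enumerate row orderings and keep those whose column multiset matches the given columns; this is exponential in $n$ and purely for intuition, not the algorithm of the abstract.

from itertools import permutations

def consistent_matrices(rows, cols):
    # Brute force: try every ordering of the given rows and keep the resulting matrices
    # whose multiset of columns matches the given columns.
    n = len(rows)
    target = sorted(map(tuple, cols))
    found = []
    for perm in permutations(range(n)):
        M = [tuple(rows[i]) for i in perm]
        if sorted(tuple(M[i][j] for i in range(n)) for j in range(n)) == target:
            if M not in found:
                found.append(M)
    return found

rows = [(1, 0), (0, 1)]
cols = [(1, 0), (0, 1)]
print(consistent_matrices(rows, cols))  # two valid matrices here, so the ordering is not unique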
We study the problem of testing whether a symmetric $d \times d$ input matrix $A$ is symmetric positive semidefinite (PSD), or is $\epsilon$-far from the PSD cone, meaning that $\lambda_{\min}(A) \leq -\epsilon \|A\|_p$, where $\|A\|_p$ is the Schatten-$p$ norm of $A$. In applications one often needs to quickly tell if an input matrix is PSD, and a small distance from the PSD cone may be tolerable. We consider two well-studied query models for measuring efficiency, namely, the matrix-vector and vector-matrix-vector query models. We first consider one-sided testers, which are testers that correctly classify any PSD input but may fail on a non-PSD input with a tiny failure probability. Up to logarithmic factors, in the matrix-vector query model we show a tight $\widetilde{\Theta}(1/\epsilon^{p/(2p+1)})$ bound, while in the vector-matrix-vector query model we show a tight $\widetilde{\Theta}(d^{1-1/p}/\epsilon)$ bound, for every $p \geq 1$. We also show a strong separation between one-sided and two-sided testers in the vector-matrix-vector model, where a two-sided tester may fail on both PSD and non-PSD inputs with a tiny failure probability. In particular, for the important case of the Frobenius norm, we show that any one-sided tester requires $\widetilde{\Omega}(\sqrt{d}/\epsilon)$ queries. However, we introduce a bilinear sketch for two-sided testing from which we construct a Frobenius norm tester achieving the optimal $\widetilde{O}(1/\epsilon^2)$ queries. We also give a number of additional separations between adaptive and non-adaptive testers. Our techniques have implications beyond testing, providing new methods to approximate the spectrum of a matrix with Frobenius norm error using dimensionality reduction in a way that preserves the signs of eigenvalues.
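To illustrate why one-sided testing is natural in the matrix-vector model: rejecting only when a certificate vector $x$ with $x^\top A x < 0$ has been found can never misclassify a PSD input. The sketch below runs power iteration on the shifted matrix $\|A\|_F I - A$ to hunt for such a certificate; it assumes $\|A\|_F$ is known and is an illustration, not the query-optimal tester of the abstract.

import numpy as np

def one_sided_psd_test(matvec, d, fro_norm, num_iters=50, rng=None):
    # Power iteration on B = fro_norm * I - A (one matrix-vector query to A per step).
    # Since fro_norm >= ||A||_2, the top eigenvector of B corresponds to the smallest
    # eigenvalue of A. Reject only if the certificate x^T A x < 0 is found, so a PSD
    # matrix is never rejected.
    rng = rng or np.random.default_rng()
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        x = fro_norm * x - matvec(x)
        x /= np.linalg.norm(x)
    return "PSD" if x @ matvec(x) >= 0 else "not PSD"  # final query checks the certificate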
For some $\epsilon > 10^{-36}$, we give a randomized $(3/2-\epsilon)$-approximation algorithm for metric TSP.