We provide a framework to analyze the convergence of discretized kinetic Langevin dynamics for $M$-$\nabla$Lipschitz, $m$-convex potentials. Our approach gives convergence rates of $\mathcal{O}(m/M)$, with explicit stepsize restrictions, which are of the same order as the stability threshold for Gaussian targets and are valid for a large interval of the friction parameter. We apply this methodology to various integration schemes which are popular in the molecular dynamics and machine learning communities. Further, we introduce the property ``$\gamma$-limit convergent" (GLC) to characterize underdamped Langevin schemes that converge to overdamped dynamics in the high-friction limit and which have stepsize restrictions that are independent of the friction parameter; we show that this property is not generic by exhibiting methods from both the class and its complement. Finally, we provide asymptotic bias estimates for the BAOAB scheme, which remain accurate in the high-friction limit by comparison to a modified stochastic dynamics which preserves the invariant measure.
The rise of powerful AI models, more formally $\textit{General-Purpose AI Systems}$ (GPAIS), has led to impressive leaps in performance across a wide range of tasks. At the same time, researchers and practitioners alike have raised a number of privacy concerns, resulting in a wealth of literature covering various privacy risks and vulnerabilities of AI models. Works surveying such risks provide differing focuses, leading to disparate sets of privacy risks with no clear unifying taxonomy. We conduct a systematic review of these survey papers to provide a concise and usable overview of privacy risks in GPAIS, as well as proposed mitigation strategies. The developed privacy framework strives to unify the identified privacy risks and mitigations at a technical level that is accessible to non-experts. This serves as the basis for a practitioner-focused interview study to assess technical stakeholder perceptions of privacy risks and mitigations in GPAIS.
We devise a deterministic algorithm for minimum Steiner cut, which uses $(\log n)^{O(1)}$ maximum flow calls and additional near-linear time. This algorithm improves on Li and Panigrahi's (FOCS 2020) algorithm, which uses $(\log n)^{O(1/\epsilon^4)}$ maximum flow calls and additional $O(m^{1+\epsilon})$ time, for $\epsilon > 0$. Our algorithm thus shows that deterministic minimum Steiner cut can be solved in maximum flow time up to polylogarithmic factors, given any black-box deterministic maximum flow algorithm. Our main technical contribution is a novel deterministic graph decomposition method for terminal vertices that generalizes all existing $s$-strong partitioning methods, which we believe may have future applications.
We prove a lower bound on the communication complexity of computing the $n$-fold xor of an arbitrary function $f$, in terms of the communication complexity and rank of $f$. We prove that $D(f^{\oplus n}) \geq n \cdot \Big(\frac{\Omega(D(f))}{\log \mathsf{rk}(f)} -\log \mathsf{rk}(f)\Big )$, where here $D(f), D(f^{\oplus n})$ represent the deterministic communication complexity, and $\mathsf{rk}(f)$ is the rank of $f$. Our methods involve a new way to use information theory to reason about deterministic communication complexity.
In traditional models of supervised learning, the goal of a learner -- given examples from an arbitrary joint distribution on $\mathbb{R}^d \times \{\pm 1\}$ -- is to output a hypothesis that is competitive (to within $\epsilon$) of the best fitting concept from some class. In order to escape strong hardness results for learning even simple concept classes, we introduce a smoothed-analysis framework that requires a learner to compete only with the best classifier that is robust to small random Gaussian perturbation. This subtle change allows us to give a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (aka multi-index model) and (2) has a bounded Gaussian surface area. This class includes functions of halfspaces and (low-dimensional) convex sets, cases that are only known to be learnable in non-smoothed settings with respect to highly structured distributions such as Gaussians. Surprisingly, our analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, we obtain the first algorithm for agnostically learning intersections of $k$-halfspaces in time $k^{poly(\frac{\log k}{\epsilon \gamma}) }$ where $\gamma$ is the margin parameter. Before our work, the best-known runtime was exponential in $k$ (Arriaga and Vempala, 1999).
Given a graph $G=(V,E)$ and an integer $k\in \mathbb{N}$, we investigate the 2-Eigenvalue Vertex Deletion (2-EVD) problem. The objective is to remove at most $k$ vertices such that the adjacency matrix of the resulting graph has at most two eigenvalues. It is established that the adjacency matrix of a graph has at most two eigenvalues if and only if the graph is a collection of equal-sized cliques. Thus, the 2-Eigenvalue Vertex Deletion amounts to removing a set of at most $k$ vertices to transform the graph into a collection of equal-sized cliques. The 2-Eigenvalue Edge Editing (2-EEE), 2-Eigenvalue Edge Deletion (2-EED) and 2-Eigenvalue Edge Addition (2-EEA) problems are defined analogously. We present a kernel of size $\mathcal{O}(k^{3})$ for $2$-EVD, along with an FPT algorithm with a running time of $\mathcal{O}^{*}(2^{k})$. For the problem $2$-EEE, we provide a kernel of size $\mathcal{O}(k^{2})$. Additionally, we present linear kernels of size $5k$ and $6k$ for $2$-EEA and $2$-EED respectively. For the $2$-EED, we also construct an algorithm with running time $\mathcal{O}^{*}(1.47^{k})$ . These results address open questions posed by Misra et al. (ISAAC 2023) regarding the complexity of these problems when parameterized by the solution size.
Given a set $P$ of $n$ points and a set $S$ of $m$ disks in the plane, the disk hitting set problem asks for a smallest subset of $P$ such that every disk of $S$ contains at least one point in the subset. The problem is NP-hard. In this paper, we consider a line-constrained version in which all disks have their centers on a line. We present an $O(m\log^2n+(n+m)\log(n+m))$ time algorithm for the problem. This improves the previously best result of $O(m^2\log m+(n+m)\log(n+m))$ time for the weighted case of the problem where every point of $P$ has a weight and the objective is to minimize the total weight of the hitting set. Our algorithm actually solves a more general line-separable problem with a single intersection property: The points of $P$ and the disk centers are separated by a line $\ell$ and the boundary of every two disks intersect at most once on the side of $\ell$ containing $P$.
The difficulty of factoring large integers into primes is the basis for cryptosystems such as RSA. Due to the widespread popularity of RSA, there have been many proposed attacks on the factorization problem such as side-channel attacks where some bits of the prime factors are available. When enough bits of the prime factors are known, two methods that are effective at solving the factorization problem are satisfiability (SAT) solvers and Coppersmith's method. The SAT approach reduces the factorization problem to a Boolean satisfiability problem, while Coppersmith's approach uses lattice basis reduction. Both methods have their advantages, but they also have their limitations: Coppersmith's method does not apply when the known bit positions are randomized, while SAT-based methods can take advantage of known bits in arbitrary locations, but have no knowledge of the algebraic structure exploited by Coppersmith's method. In this paper we describe a new hybrid SAT and computer algebra approach to efficiently solve random leaked-bit factorization problems. Specifically, Coppersmith's method is invoked by a SAT solver to determine whether a partial bit assignment can be extended to a complete assignment. Our hybrid implementation solves random leaked-bit factorization problems significantly faster than either a pure SAT or pure computer algebra approach.
Sorting has a natural generalization where the input consists of: (1) a ground set $X$ of size $n$, (2) a partial oracle $O_P$ specifying some fixed partial order $P$ on $X$ and (3) a linear oracle $O_L$ specifying a linear order $L$ that extends $P$. The goal is to recover the linear order $L$ on $X$ using the fewest number of linear oracle queries. In this problem, we measure algorithmic complexity through three metrics: oracle queries to $O_L$, oracle queries to $O_P$, and the time spent. Any algorithm requires worst-case $\log_2 e(P)$ linear oracle queries to recover the linear order on $X$. Kahn and Saks presented the first algorithm that uses $\Theta(\log e(P))$ linear oracle queries (using $O(n^2)$ partial oracle queries and exponential time). The state-of-the-art for the general problem is by Cardinal, Fiorini, Joret, Jungers and Munro who at STOC'10 manage to separate the linear and partial oracle queries into a preprocessing and query phase. They can preprocess $P$ using $O(n^2)$ partial oracle queries and $O(n^{2.5})$ time. Then, given $O_L$, they uncover the linear order on $X$ in $\Theta(\log e(P))$ linear oracle queries and $O(n + \log e(P))$ time -- which is worst-case optimal in the number of linear oracle queries but not in the time spent. For $c \geq 1$, our algorithm can preprocess $O_P$ using $O(n^{1 + \frac{1}{c}})$ queries and time. Given $O_L$, we uncover $L$ using $\Theta(c \log e(P))$ queries and time. We show a matching lower bound, as there exist positive constants $(\alpha, \beta)$ where for any constant $c \geq 1$, any algorithm that uses at most $\alpha \cdot n^{1 + \frac{1}{c}}$ preprocessing must use worst-case at least $\beta \cdot c \log e(P)$ linear oracle queries. Thus, we solve the problem of sorting under partial information through an algorithm that is asymptotically tight across all three metrics.
We introduce networked communication to the mean-field game framework, in particular to oracle-free settings where $N$ decentralised agents learn along a single, non-episodic run of the empirical system. We prove that our architecture, with only a few reasonable assumptions about network structure, has sample guarantees bounded between those of the centralised- and independent-learning cases. We discuss how the sample guarantees of the three theoretical algorithms do not actually result in practical convergence. We therefore show that in practical settings where the theoretical parameters are not observed (leading to poor estimation of the Q-function), our communication scheme significantly accelerates convergence over the independent case (and often even the centralised case), without relying on the assumption of a centralised learner. We contribute further practical enhancements to all three theoretical algorithms, allowing us to present their first empirical demonstrations. Our experiments confirm that we can remove several of the theoretical assumptions of the algorithms, and display the empirical convergence benefits brought by our new networked communication. We additionally show that the networked approach has significant advantages, over both the centralised and independent alternatives, in terms of robustness to unexpected learning failures and to changes in population size.
We consider the problem of sampling from the posterior distribution of a $d$-dimensional coefficient vector $\boldsymbol{\theta}$, given linear observations $\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\theta}+\boldsymbol{\varepsilon}$. In general, such posteriors are multimodal, and therefore challenging to sample from. This observation has prompted the exploration of various heuristics that aim at approximating the posterior distribution. In this paper, we study a different approach based on decomposing the posterior distribution into a log-concave mixture of simple product measures. This decomposition allows us to reduce sampling from a multimodal distribution of interest to sampling from a log-concave one, which is tractable and has been investigated in detail. We prove that, under mild conditions on the prior, for random designs, such measure decomposition is generally feasible when the number of samples per parameter $n/d$ exceeds a constant threshold. We thus obtain a provably efficient (polynomial time) sampling algorithm in a regime where this was previously not known. Numerical simulations confirm that the algorithm is practical, and reveal that it has attractive statistical properties compared to state-of-the-art methods.