Given an $n$-vertex $m$-edge digraph $G = (V,E)$ and a set $S \subseteq V$, $|S| = n^{\sigma}$ (for some $0 < \sigma \le 1$) of designated sources, the $S \times V$-direcahability problem is to compute for every $s \in S$, the set of all the vertices reachable from $s$ in $G$. Known naive algorithms for this problem either run a BFS/DFS separately from every source, and as a result require $O(m \cdot n^{\sigma})$ time, or compute the transitive closure of $G$ in $\tilde O(n^{\omega})$ time, where $\omega < 2.371552\ldots$ is the matrix multiplication exponent. Hence, the current state-of-the-art bound for the problem on graphs with $m = \Theta(n^{\mu})$ edges in $\tilde O(n^{\min \{\mu + \sigma, \omega \}})$. Our first contribution is an algorithm with running time $\tilde O(n^{1 + \tiny{\frac{2}{3}} \omega(\sigma)})$ for this problem, where $\omega(\sigma)$ is the rectangular matrix multiplication exponent. Using current state-of-the-art estimates on $\omega(\sigma)$, our exponent is better than $\min \{2 + \sigma, \omega \}$ for $\tilde \sigma \le \sigma \le 0.53$, where $1/3 < \tilde \sigma < 0.3336$ is a universal constant. Our second contribution is a sequence of algorithms $\mathcal A_0, \mathcal A_1, \mathcal A_2, \ldots$ for the $S \times V$-direachability problem. We argue that under a certain assumption that we introduce, for every $\tilde \sigma \le \sigma < 1$, there exists a sufficiently large index $k = k(\sigma)$ so that $\mathcal A_k$ improves upon the current state-of-the-art bounds for $S \times V$-direachability with $|S| = n^{\sigma}$, in the densest regime $\mu =2$. We show that to prove this assumption, it is sufficient to devise an algorithm that computes a rectangular max-min matrix product roughly as efficiently as ordinary $(+, \cdot)$ matrix product. Our algorithms heavily exploit recent constructions of directed shortcuts by Kogan and Parter.
On the reference tetrahedron $K$, we construct, for each $k \in \mathbb{N}_0$, a right inverse for the trace operator $u \mapsto (u, \partial_{n} u, \ldots, \partial_{n}^k u)|_{\partial K}$. The operator is stable as a mapping from the trace space of $W^{s, p}(K)$ to $W^{s, p}(K)$ for all $p \in (1, \infty)$ and $s \in (k+1/p, \infty)$. Moreover, if the data is the trace of a polynomial of degree $N \in \mathbb{N}_0$, then the resulting lifting is a polynomial of degree $N$. One consequence of the analysis is a novel characterization for the range of the trace operator.
We study contextual bandits with low-rank structure where, in each round, if the (context, arm) pair $(i,j)\in [m]\times [n]$ is selected, the learner observes a noisy sample of the $(i,j)$-th entry of an unknown low-rank reward matrix. Successive contexts are generated randomly in an i.i.d. manner and are revealed to the learner. For such bandits, we present efficient algorithms for policy evaluation, best policy identification and regret minimization. For policy evaluation and best policy identification, we show that our algorithms are nearly minimax optimal. For instance, the number of samples required to return an $\varepsilon$-optimal policy with probability at least $1-\delta$ typically scales as ${m+n\over \varepsilon^2}\log(1/\delta)$. Our regret minimization algorithm enjoys minimax guarantees scaling as $r^{7/4}(m+n)^{3/4}\sqrt{T}$, which improves over existing algorithms. All the proposed algorithms consist of two phases: they first leverage spectral methods to estimate the left and right singular subspaces of the low-rank reward matrix. We show that these estimates enjoy tight error guarantees in the two-to-infinity norm. This in turn allows us to reformulate our problems as a misspecified linear bandit problem with dimension roughly $r(m+n)$ and misspecification controlled by the subspace recovery error, as well as to design the second phase of our algorithms efficiently.
We consider the multivariate max-linear regression problem where the model parameters $\boldsymbol{\beta}_{1},\dotsc,\boldsymbol{\beta}_{k}\in\mathbb{R}^{p}$ need to be estimated from $n$ independent samples of the (noisy) observations $y = \max_{1\leq j \leq k} \boldsymbol{\beta}_{j}^{\mathsf{T}} \boldsymbol{x} + \mathrm{noise}$. The max-linear model vastly generalizes the conventional linear model, and it can approximate any convex function to an arbitrary accuracy when the number of linear models $k$ is large enough. However, the inherent nonlinearity of the max-linear model renders the estimation of the regression parameters computationally challenging. Particularly, no estimator based on convex programming is known in the literature. We formulate and analyze a scalable convex program given by anchored regression (AR) as the estimator for the max-linear regression problem. Under the standard Gaussian observation setting, we present a non-asymptotic performance guarantee showing that the convex program recovers the parameters with high probability. When the $k$ linear components are equally likely to achieve the maximum, our result shows a sufficient number of noise-free observations for exact recovery scales as {$k^{4}p$} up to a logarithmic factor. { This sample complexity coincides with that by alternating minimization (Ghosh et al., {2021}). Moreover, the same sample complexity applies when the observations are corrupted with arbitrary deterministic noise. We provide empirical results that show that our method performs as our theoretical result predicts, and is competitive with the alternating minimization algorithm particularly in presence of multiplicative Bernoulli noise. Furthermore, we also show empirically that a recursive application of AR can significantly improve the estimation accuracy.}
We present a parallel algorithm for the $(1-\epsilon)$-approximate maximum flow problem in capacitated, undirected graphs with $n$ vertices and $m$ edges, achieving $O(\epsilon^{-3}\text{polylog} n)$ depth and $O(m \epsilon^{-3} \text{polylog} n)$ work in the PRAM model. Although near-linear time sequential algorithms for this problem have been known for almost a decade, no parallel algorithms that simultaneously achieved polylogarithmic depth and near-linear work were known. At the heart of our result is a polylogarithmic depth, near-linear work recursive algorithm for computing congestion approximators. Our algorithm involves a recursive step to obtain a low-quality congestion approximator followed by a "boosting" step to improve its quality which prevents a multiplicative blow-up in error. Similar to Peng [SODA'16], our boosting step builds upon the hierarchical decomposition scheme of R\"acke, Shah, and T\"aubig [SODA'14]. A direct implementation of this approach, however, leads only to an algorithm with $n^{o(1)}$ depth and $m^{1+o(1)}$ work. To get around this, we introduce a new hierarchical decomposition scheme, in which we only need to solve maximum flows on subgraphs obtained by contracting vertices, as opposed to vertex-induced subgraphs used in R\"acke, Shah, and T\"aubig [SODA'14]. In particular, we are able to directly extract congestion approximators for the subgraphs from a congestion approximator for the entire graph, thereby avoiding additional recursion on those subgraphs. Along the way, we also develop a parallel flow-decomposition algorithm that is crucial to achieving polylogarithmic depth and may be of independent interest.
An unbiased $m$-sparsification of a vector $p\in \mathbb{R}^n$ is a random vector $Q\in \mathbb{R}^n$ with mean $p$ that has at most $m<n$ nonzero coordinates. Unbiased sparsification compresses the original vector without introducing bias; it arises in various contexts, such as in federated learning and sampling sparse probability distributions. Ideally, unbiased sparsification should also minimize the expected value of a divergence function $\mathsf{Div}(Q,p)$ that measures how far away $Q$ is from the original $p$. If $Q$ is optimal in this sense, then we call it efficient. Our main results describe efficient unbiased sparsifications for divergences that are either permutation-invariant or additively separable. Surprisingly, the characterization for permutation-invariant divergences is robust to the choice of divergence function, in the sense that our class of optimal $Q$ for squared Euclidean distance coincides with our class of optimal $Q$ for Kullback-Leibler divergence, or indeed any of a wide variety of divergences.
We present Clifford-Steerable Convolutional Neural Networks (CS-CNNs), a novel class of $\mathrm{E}(p, q)$-equivariant CNNs. CS-CNNs process multivector fields on pseudo-Euclidean spaces $\mathbb{R}^{p,q}$. They cover, for instance, $\mathrm{E}(3)$-equivariance on $\mathbb{R}^3$ and Poincar\'e-equivariance on Minkowski spacetime $\mathbb{R}^{1,3}$. Our approach is based on an implicit parametrization of $\mathrm{O}(p,q)$-steerable kernels via Clifford group equivariant neural networks. We significantly and consistently outperform baseline methods on fluid dynamics as well as relativistic electrodynamics forecasting tasks.
Sparse linear regression (SLR) is a well-studied problem in statistics where one is given a design matrix $X\in\mathbb{R}^{m\times n}$ and a response vector $y=X\theta^*+w$ for a $k$-sparse vector $\theta^*$ (that is, $\|\theta^*\|_0\leq k$) and small, arbitrary noise $w$, and the goal is to find a $k$-sparse $\widehat{\theta} \in \mathbb{R}^n$ that minimizes the mean squared prediction error $\frac{1}{m}\|X\widehat{\theta}-X\theta^*\|^2_2$. While $\ell_1$-relaxation methods such as basis pursuit, Lasso, and the Dantzig selector solve SLR when the design matrix is well-conditioned, no general algorithm is known, nor is there any formal evidence of hardness in an average-case setting with respect to all efficient algorithms. We give evidence of average-case hardness of SLR w.r.t. all efficient algorithms assuming the worst-case hardness of lattice problems. Specifically, we give an instance-by-instance reduction from a variant of the bounded distance decoding (BDD) problem on lattices to SLR, where the condition number of the lattice basis that defines the BDD instance is directly related to the restricted eigenvalue condition of the design matrix, which characterizes some of the classical statistical-computational gaps for sparse linear regression. Also, by appealing to worst-case to average-case reductions from the world of lattices, this shows hardness for a distribution of SLR instances; while the design matrices are ill-conditioned, the resulting SLR instances are in the identifiable regime. Furthermore, for well-conditioned (essentially) isotropic Gaussian design matrices, where Lasso is known to behave well in the identifiable regime, we show hardness of outputting any good solution in the unidentifiable regime where there are many solutions, assuming the worst-case hardness of standard and well-studied lattice problems.
We consider the problem of finding ``dissimilar'' $k$ shortest paths from $s$ to $t$ in an edge-weighted directed graph $D$, where the dissimilarity is measured by the minimum pairwise Hamming distances between these paths. More formally, given an edge-weighted directed graph $D = (V, A)$, two specified vertices $s, t \in V$, and integers $d, k$, the goal of Dissimilar Shortest Paths is to decide whether $D$ has $k$ shortest paths $P_1, \dots, P_k$ from $s$ to $t$ such that $|A(P_i) \mathbin{\triangle} A(P_j)| \ge d$ for distinct $P_i$ and $P_j$. We design a deterministic algorithm to solve Dissimilar Shortest Paths with running time $2^{O(3^kdk^2)}n^{O(1)}$, that is, Dissimilar Shortest Paths is fixed-parameter tractable parameterized by $k + d$. To complement this positive result, we show that Dissimilar Shortest Paths is W[1]-hard when parameterized by only $k$ and paraNP-hard parameterized by $d$.
We show that there exist infinitely many $n \in \mathbb{Z}^+$ such that for any constant $\epsilon > 0$, any deterministic algorithm to solve $k$-\textsf{SAT} for $k \geq 3$ must perform at least $(2^{k-\frac{3}{2}-\epsilon})^{\frac{n}{k+1}}$ operations, where $n$ is the number of variables in the $k$\textsf{-SAT} instance.
In the Maximum Independent Set of Hyperrectangles problem, we are given a set of $n$ (possibly overlapping) $d$-dimensional axis-aligned hyperrectangles, and the goal is to find a subset of non-overlapping hyperrectangles of maximum cardinality. For $d=1$, this corresponds to the classical Interval Scheduling problem, where a simple greedy algorithm returns an optimal solution. In the offline setting, for $d$-dimensional hyperrectangles, polynomial time $(\log n)^{O(d)}$-approximation algorithms are known. However, the problem becomes notably challenging in the online setting, where the input objects (hyperrectangles) appear one by one in an adversarial order, and on the arrival of an object, the algorithm needs to make an immediate and irrevocable decision whether or not to select the object while maintaining the feasibility. Even for interval scheduling, an $\Omega(n)$ lower bound is known on the competitive ratio. To circumvent these negative results, in this work, we study the online maximum independent set of axis-aligned hyperrectangles in the random-order arrival model, where the adversary specifies the set of input objects which then arrive in a uniformly random order. Starting from the prototypical secretary problem, the random-order model has received significant attention to study algorithms beyond the worst-case competitive analysis. Surprisingly, we show that the problem in the random-order model almost matches the best-known offline approximation guarantees, up to polylogarithmic factors. In particular, we give a simple $(\log n)^{O(d)}$-competitive algorithm for $d$-dimensional hyperrectangles in this model, which runs in $\tilde{O_d}(n)$ time. Our approach also yields $(\log n)^{O(d)}$-competitive algorithms in the random-order model for more general objects such as $d$-dimensional fat objects and ellipsoids.