We introduce Clifford Group Equivariant Neural Networks: a novel approach for constructing $\mathrm{O}(n)$- and $\mathrm{E}(n)$-equivariant models. We identify and study the $\textit{Clifford group}$, a subgroup inside the Clifford algebra whose definition we adjust to achieve several favorable properties. Primarily, the group's action forms an orthogonal automorphism that extends beyond the typical vector space to the entire Clifford algebra while respecting the multivector grading. This leads to several non-equivalent subrepresentations corresponding to the multivector decomposition. Furthermore, we prove that the action respects not just the vector space structure of the Clifford algebra but also its multiplicative structure, i.e., the geometric product. These findings imply that every polynomial in multivectors constitutes an equivariant map with respect to the Clifford group, which allows us to parameterize equivariant neural network layers. An advantage worth mentioning is that we obtain expressive layers that can elegantly generalize to inner-product spaces of any dimension. We demonstrate, notably from a single core implementation, state-of-the-art performance on several distinct tasks, including a three-dimensional $n$-body experiment, a four-dimensional Lorentz-equivariant high-energy physics experiment, and a five-dimensional convex hull experiment.
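To make the equivariance claim concrete, here is a minimal sketch (not the paper's implementation) in the two-dimensional Clifford algebra $\mathrm{Cl}(2,0)$: it implements the geometric product from a small multiplication table, extends an orthogonal map to the whole algebra grade by grade, and checks numerically that transforming two multivectors and then multiplying agrees with multiplying first and transforming afterwards. The function names `gp` and `act` and the choice of dimension are illustrative assumptions.

```python
import numpy as np

# Basis of Cl(2,0): [1, e1, e2, e12]; multivectors are length-4 coefficient arrays.
# Geometric-product table: PROD[i][j] = (sign, index of the blade e_i * e_j).
PROD = [
    [( 1, 0), ( 1, 1), ( 1, 2), ( 1, 3)],
    [( 1, 1), ( 1, 0), ( 1, 3), ( 1, 2)],
    [( 1, 2), (-1, 3), ( 1, 0), (-1, 1)],
    [( 1, 3), (-1, 2), ( 1, 1), (-1, 0)],
]

def gp(x, y):
    """Geometric product of two multivectors in Cl(2,0)."""
    out = np.zeros(4)
    for i in range(4):
        for j in range(4):
            s, k = PROD[i][j]
            out[k] += s * x[i] * y[j]
    return out

def act(W, x):
    """Orthogonal action extended to the algebra: grade 0 fixed,
    grade 1 transformed by W, grade 2 scaled by det(W)."""
    return np.concatenate(([x[0]], W @ x[1:3], [np.linalg.det(W) * x[3]]))

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi)
W = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a rotation in O(2)

x, y = rng.normal(size=4), rng.normal(size=4)
lhs = gp(act(W, x), act(W, y))   # transform, then multiply
rhs = act(W, gp(x, y))           # multiply, then transform
print(np.allclose(lhs, rhs))     # True: the action respects the geometric product
```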
We present new large-scale algorithms for fitting a subgradient regularized multivariate convex regression function to $n$ samples in $d$ dimensions -- a key problem in shape constrained nonparametric regression with applications in statistics, engineering and the applied sciences. The infinite-dimensional learning task can be expressed via a convex quadratic program (QP) with $O(nd)$ decision variables and $O(n^2)$ constraints. While instances with $n$ in the lower thousands can be addressed with current algorithms within reasonable runtimes, solving larger problems (e.g., $n\approx 10^4$ or $10^5$) is computationally challenging. To this end, we present an active-set-type algorithm on the dual QP. For computational scalability, we allow for approximate optimization of the reduced sub-problems and propose randomized augmentation rules for expanding the active set. We derive novel computational guarantees for our algorithms. We demonstrate that our framework can approximately solve instances of the subgradient regularized convex regression problem with $n=10^5$ and $d=10$ within minutes, and that it shows strong computational performance compared to earlier approaches.
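For small instances, the QP described above can be written down directly. The sketch below uses `cvxpy` with a squared-norm subgradient penalty as one plausible form of the regularization; that penalty, the toy data, and the weight `rho` are assumptions, and the paper's exact formulation and its large-scale dual active-set algorithm are not reproduced here.

```python
import numpy as np
import cvxpy as cp

# Toy instance: n samples in d dimensions with a convex ground truth plus noise.
rng = np.random.default_rng(0)
n, d, rho = 50, 3, 0.1                      # rho: subgradient penalty weight (assumed form)
X = rng.normal(size=(n, d))
y = np.sum(X**2, axis=1) + 0.1 * rng.normal(size=n)

f = cp.Variable(n)                          # fitted function values f(x_i)
xi = cp.Variable((n, d))                    # subgradients xi_i at each x_i

# Convexity constraints: f(x_j) >= f(x_i) + xi_i^T (x_j - x_i) for all i != j.
constraints = [
    f[j] >= f[i] + xi[i] @ (X[j] - X[i])
    for i in range(n) for j in range(n) if i != j
]

# Least-squares fit plus a squared-norm penalty on the subgradients.
objective = cp.Minimize(cp.sum_squares(y - f) + rho * cp.sum_squares(xi))
cp.Problem(objective, constraints).solve()
print("training MSE:", np.mean((y - f.value) ** 2))
```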
In the literature on algorithms for performing the multi-term addition $s_n=\sum_{i=1}^n x_i$ in floating-point arithmetic, it is often shown that a hardware unit with a single normalization and rounding step improves precision, area, latency, and power consumption compared with the use of standard add or fused multiply-add units. However, non-monotonicity can appear when computing sums with a subclass of multi-term addition units, an issue that is currently unexplored in the literature. We demonstrate that common techniques for performing multi-term addition with $n\geq 4$, without normalization of intermediate quantities, can result in non-monotonicity -- increasing one of the addends $x_i$ decreases the sum $s_n$. Summation is required in dot-product and matrix-multiplication operations, which increasingly appear in supercomputer hardware, so knowing where monotonicity is preserved can be of interest to the users of these machines. Our results suggest that non-monotonicity of summation, in some of the commercial hardware devices that implement a specific class of multi-term adders, is a feature that may have appeared unintentionally as a consequence of design choices that reduce circuit area and other metrics. To demonstrate our findings, we use formal proofs as well as a numerical simulation of non-monotonic multi-term adders in MATLAB.
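The mechanism can be illustrated with a deliberately simplified software model of a multi-term adder that aligns all addends to the largest exponent, truncates the shifted-out bits, and sums without normalizing intermediate results. The model below (toy precision `P = 4`, small integer inputs) is an assumption for illustration, not a faithful description of any particular hardware unit; the brute-force search looks for cases where increasing one addend decreases the returned sum.

```python
import itertools

P = 4          # significand precision in bits (toy value)

def decompose(v):
    """Return (m, e) with v = m * 2**e and m a normalized P-bit integer, for v > 0.
    Inputs here never exceed 2**P, so the conversion is exact."""
    m, e = v, 0
    while m >= 2 ** P:          # one halving at most for v = 2**P
        m, e = m >> 1, e + 1
    while m < 2 ** (P - 1):     # normalize so the leading bit of m is set
        m, e = m << 1, e - 1
    return m, e

def multi_term_add(values):
    """Toy multi-term adder: align all significands to the largest exponent,
    truncating the bits shifted out, then sum the aligned significands.
    No normalization or rounding of intermediate quantities."""
    parts = [decompose(v) for v in values if v > 0]
    emax = max(e for _, e in parts)
    total = sum(m >> (emax - e) for m, e in parts)   # right shift drops low bits
    return total * 2 ** emax

# Brute-force search for a monotonicity violation with 4 addends: increasing one
# addend by 1 can bump the maximum exponent, so every *other* addend loses an
# extra low-order bit during alignment and the returned sum decreases.
for xs in itertools.product(range(1, 2 ** P), repeat=4):
    for i in range(4):
        bumped = list(xs)
        bumped[i] += 1
        if multi_term_add(bumped) < multi_term_add(list(xs)):
            print("non-monotonic:", xs, "-> increasing x%d by 1 decreases the sum" % i)
            break
    else:
        continue
    break
```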
Let $P$ be a set of at most $n$ points and let $R$ be a set of at most $n$ geometric ranges, such as disks or rectangles, where each $p \in P$ has an associated supply $s_{p} > 0$, and each $r \in R$ has an associated demand $d_{r} > 0$. A (many-to-many) matching is a set $\mathcal{A}$ of ordered triples $(p,r,a_{pr}) \in P \times R \times \mathbb{R}_{>0}$ such that $p \in r$ and the $a_{pr}$'s satisfy the constraints given by the supplies and demands. We show how to compute a maximum matching, that is, a matching maximizing $\sum_{(p,r,a_{pr}) \in \mathcal{A}} a_{pr}$. Using our techniques, we can also solve minimum bottleneck problems, such as computing a perfect matching between a set of $n$ red points $P$ and a set of $n$ blue points $Q$ that minimizes the length of the longest edge. For the $L_\infty$-metric, we can do this in time $O(n^{1+\varepsilon})$ in any fixed dimension, and for the $L_2$-metric in the plane in time $O(n^{4/3 + \varepsilon})$, for any $\varepsilon > 0$.
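Ignoring efficiency, the maximum matching itself is a transportation/flow problem. The sketch below sets it up with `networkx`: source-to-point edges carry the supplies, range-to-sink edges carry the demands, and a point is connected to every range containing it. The instance data and the quadratic-size containment test are assumptions; the paper's contribution is achieving the stated runtimes with geometric data structures rather than an explicit flow network.

```python
import networkx as nx

# Toy instance: points with supplies, axis-aligned rectangles with demands.
points = {"p1": ((1, 1), 3.0), "p2": ((2, 4), 2.0), "p3": ((5, 2), 4.0)}   # name -> (coords, supply)
ranges = {"r1": ((0, 0, 3, 5), 4.0), "r2": ((1, 1, 6, 3), 3.0)}            # name -> (x1, y1, x2, y2), demand

def contains(rect, pt):
    (x1, y1, x2, y2), (x, y) = rect, pt
    return x1 <= x <= x2 and y1 <= y <= y2

# Flow network: source -> point (capacity = supply), point -> range if the point
# lies in the range (unbounded capacity), range -> sink (capacity = demand).
# A maximum s-t flow is a maximum (many-to-many) matching.
G = nx.DiGraph()
for p, (pt, s) in points.items():
    G.add_edge("s", p, capacity=s)
    for r, (rect, _) in ranges.items():
        if contains(rect, pt):
            G.add_edge(p, r)                  # no capacity attribute = unbounded
for r, (_, dem) in ranges.items():
    G.add_edge(r, "t", capacity=dem)

value, flow = nx.maximum_flow(G, "s", "t")
print("maximum matching value:", value)
print({(p, r): a for p, out in flow.items() for r, a in out.items()
       if p in points and r in ranges and a > 0})
```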
Quantum Markov chains generalize classical Markov chains for random variables to the quantum realm and exhibit unique inherent properties, making them an important concept in quantum information theory. In this work, we propose the concept of virtual quantum Markov chains (VQMCs), focusing on scenarios where subsystems retain classical information about global systems from measurement statistics. As a generalization of quantum Markov chains, VQMCs characterize states where arbitrary global shadow information can be recovered from subsystems through local quantum operations and measurements. We present an algebraic characterization for virtual quantum Markov chains and show that the virtual quantum recovery is fully determined by the block matrices of a quantum state on its subsystems. Notably, we find a distinction between two classes of tripartite entanglement by showing that the W state is a VQMC while the GHZ state is not. Furthermore, we establish semidefinite programs to determine the optimal sampling overhead and the robustness of virtual quantum Markov chains. We demonstrate that the optimal sampling overhead is additive, indicating that there is no free lunch in further reducing the sampling cost of recovery by parallel calls of the VQMC states. Our findings elucidate distinctions between quantum Markov chains and virtual quantum Markov chains, extending our understanding of quantum recovery to scenarios prioritizing classical information from measurement statistics.
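For context, exact quantum Markov chains $A$-$B$-$C$ are precisely the tripartite states with vanishing conditional mutual information $I(A{:}C|B)$. The `numpy` sketch below computes this quantity for the GHZ and W states; both values are nonzero, so neither state is an exact quantum Markov chain, which is standard background rather than the paper's result. The paper's algebraic characterization and semidefinite programs are not reproduced here.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

def reduced(psi, keep):
    """Reduced density matrix of a 3-qubit pure state on the subsystems in `keep` (A=0, B=1, C=2)."""
    t = psi.reshape(2, 2, 2)
    trace_out = [i for i in range(3) if i not in keep]
    rho = np.tensordot(t, t.conj(), axes=(trace_out, trace_out))
    d = 2 ** len(keep)
    return rho.reshape(d, d)

def cmi(psi):
    """I(A:C|B) = S(AB) + S(BC) - S(B) - S(ABC), with S(ABC) = 0 for a pure state."""
    return (entropy(reduced(psi, [0, 1])) + entropy(reduced(psi, [1, 2]))
            - entropy(reduced(psi, [1])))

ghz = np.zeros(8); ghz[0] = ghz[7] = 1 / np.sqrt(2)   # (|000> + |111>)/sqrt(2)
w = np.zeros(8);   w[1] = w[2] = w[4] = 1 / np.sqrt(3)  # (|001> + |010> + |100>)/sqrt(3)

# I(A:C|B) = 0 characterizes exact quantum Markov chains A-B-C; both states fail it.
print("GHZ:", cmi(ghz))   # 1 bit
print("W:  ", cmi(w))     # ~0.918 bits
```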
In the Euclidean Bottleneck Steiner Tree problem, the input consists of a set of $n$ points in $\mathbb{R}^2$ called terminals and a parameter $k$, and the goal is to compute a Steiner tree that spans all the terminals and contains at most $k$ points of $\mathbb{R}^2$ as Steiner points such that the maximum edge-length of the Steiner tree is minimized, where the length of a tree edge is the Euclidean distance between its two endpoints. The problem is well-studied and is known to be NP-hard. In this paper, we give a $k^{O(k)} n^{O(1)}$-time algorithm for Euclidean Bottleneck Steiner Tree, which implies that the problem is fixed-parameter tractable (FPT). This settles an open question explicitly asked by Bae et al. [Algorithmica, 2011], who showed that the $\ell_1$ and $\ell_{\infty}$ variants of the problem are FPT. Our approach can be generalized to the problem with $\ell_p$ metric for any rational $1 \le p \le \infty$, or even other metrics on $\mathbb{R}^2$.
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data. To do this, we define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources such as semantic segmentations and depth reconstructions without needing any meta-knowledge beyond the pixels. Once this wide variety of visual data (comprising 420 billion tokens) is represented as sequences, the model can be trained to minimize a cross-entropy loss for next token prediction. By training across various scales of model architecture and data diversity, we provide empirical evidence that our models scale effectively. Many different vision tasks can be solved by designing suitable visual prompts at test time.
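The training objective is ordinary next-token prediction over discrete visual tokens. The PyTorch sketch below shows that loop on random token sequences with a tiny placeholder Transformer; the vocabulary size, sequence length, and architecture are illustrative assumptions and not the LVM configuration.

```python
import torch
import torch.nn as nn

# Toy decoder trained with next-token cross-entropy on "visual sentences"
# (sequences of discrete visual tokens).  All sizes below are placeholders.
vocab, seq_len, dim = 1024, 32, 128
tok_emb = nn.Embedding(vocab, dim)
pos_emb = nn.Embedding(seq_len, dim)
layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(dim, vocab)
params = (list(tok_emb.parameters()) + list(pos_emb.parameters())
          + list(backbone.parameters()) + list(head.parameters()))
opt = torch.optim.AdamW(params, lr=3e-4)

tokens = torch.randint(0, vocab, (8, seq_len))          # a batch of visual sentences
causal = nn.Transformer.generate_square_subsequent_mask(seq_len - 1)

x = tok_emb(tokens[:, :-1]) + pos_emb(torch.arange(seq_len - 1))
logits = head(backbone(x, mask=causal))                 # predict token t+1 from tokens <= t
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
loss.backward()
opt.step()
print(float(loss))
```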
Markov chain Monte Carlo (MCMC) methods explore complex statistical distributions by local, stochastic moves. While this bypasses the cumbersome requirement of a specific analytical expression for the target, the stochastic exploration of an uncertain parameter space comes at the expense of a large number of samples, and the computational cost increases with parameter dimensionality. Although methods such as tempering, Hamiltonian Monte Carlo, Rao-Blackwellization, and scalable variants have been proposed to accelerate convergence at the exploration level, they cannot avoid the stochastic nature of this exploration. We instead view the target distribution as a mapping in which the infinite-dimensional Eulerian space of the parameters consists of a number of deterministic submanifolds, and we propose a generalized energy metric, termed the weighted Riesz energy, under which a set of points is generated through pairwise interactions to discretize rectifiable submanifolds. We study the properties of these points, called Riesz particles, and embed them into sequential MCMC, obtaining higher acceptance rates with fewer evaluations. We validate this through a comparative experimental analysis on a linear Gaussian state-space model with synthetic data and a non-linear stochastic volatility model with real-world data.
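As a toy illustration of generating points through pairwise interactions, the sketch below runs projected gradient descent on the (unweighted) Riesz $s$-energy of $n$ points constrained to the unit sphere. The weighting, the submanifolds induced by the target distribution, and the embedding into sequential MCMC are not reproduced, so everything beyond the pairwise energy itself is an assumption.

```python
import numpy as np

def riesz_energy_points(n, s=2.0, steps=2000, lr=1e-3, seed=0):
    """Generate n points on the unit sphere S^2 by gradient descent on the
    (unweighted) Riesz s-energy  sum_{i != j} 1 / ||x_i - x_j||^s,
    projecting back onto the sphere after every step."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 3))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    for _ in range(steps):
        diff = x[:, None, :] - x[None, :, :]                # pairwise differences
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, np.inf)                      # ignore self-interactions
        # gradient of sum_{i != j} dist_ij^{-s} with respect to x_i
        grad = -2 * s * np.sum(diff / dist[..., None] ** (s + 2), axis=1)
        x -= lr * grad
        x /= np.linalg.norm(x, axis=1, keepdims=True)       # project back onto the sphere
    return x

pts = riesz_energy_points(20)
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
np.fill_diagonal(d, np.inf)
print("min pairwise distance:", d.min())   # the points spread out roughly uniformly
```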
In the matroid partitioning problem, we are given $k$ matroids $\mathcal{M}_1 = (V, \mathcal{I}_1), \dots , \mathcal{M}_k = (V, \mathcal{I}_k)$ defined over a common ground set $V$ of $n$ elements, and we need to find a partitionable set $S \subseteq V$ of largest possible cardinality, denoted by $p$. Here, a set $S \subseteq V$ is called partitionable if there exists a partition $(S_1, \dots , S_k)$ of $S$ with $S_i \in \mathcal{I}_i$ for $i = 1, \ldots, k$. In 1986, Cunningham [SICOMP 1986] presented a matroid partition algorithm that uses $O(n p^{3/2} + k n)$ independence oracle queries, which was previously the best known algorithm. This query complexity is $O(n^{5/2})$ when $k \leq n$. Our main result is a matroid partition algorithm that uses $\tilde{O}(k'^{1/3} n p + k n)$ independence oracle queries, where $k' = \min\{k, p\}$. This query complexity is $\tilde{O}(n^{7/3})$ when $k \leq n$, improving upon that of Cunningham's algorithm. To obtain this, we present a new approach, \emph{edge recycling augmentation}, which builds on two new ideas: an efficient utilization of the binary search technique of Nguyen [2019] and Chakrabarty-Lee-Sidford-Singla-Wong [FOCS 2019], and a careful analysis of the independence oracle query complexity. Our analysis differs significantly from those for matroid intersection algorithms because of the parameter $k$. We also present a matroid partition algorithm that uses $\tilde{O}((n + k) \sqrt{p})$ rank oracle queries.
We consider the optimization problem of cardinality-constrained maximization of a monotone submodular set function $f:2^U\to\mathbb{R}_{\geq 0}$ (SM) with noisy evaluations of $f$. In particular, it is assumed that we do not have value oracle access to $f$, but instead, for any $X\subseteq U$ and $u\in U$, we can take samples from a noisy distribution with expected value $f(X\cup\{u\})-f(X)$. Our goal is to develop algorithms in this setting that take as few samples as possible and return a solution with an approximation guarantee relative to the optimal solution, with high probability. We propose the algorithm Confident Threshold Greedy (CTG), which is based on the threshold greedy algorithm of Badanidiyuru and Vondrak [1] and samples adaptively in order to produce an approximate solution with high probability. We prove that CTG achieves an approximation ratio arbitrarily close to $1-1/e$, depending on input parameters. We provide an experimental evaluation on real instances of SM and demonstrate the sample efficiency of CTG.
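The sketch below conveys the flavor of a threshold-greedy procedure driven by noisy marginal-gain samples: each candidate's gain is estimated by averaging a fixed number of samples, and the threshold is lowered geometrically. The fixed sample size, the toy coverage function, and the Gaussian noise are assumptions for illustration; CTG's adaptive, confidence-based sampling is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_threshold_greedy(ground, sample_gain, k, eps=0.2, m=200):
    """Threshold-greedy-style sketch under noisy marginal gains.
    `sample_gain(S, u)` returns one noisy sample whose mean is f(S + {u}) - f(S).
    Each gain is estimated by averaging a fixed number m of samples; this fixed
    sample size is a simplification of CTG's adaptive, confidence-based sampling.
    Returns a set of at most k elements."""
    S = set()
    # start the threshold at a crude estimate of the largest singleton value
    tau = max(np.mean([sample_gain(set(), u) for _ in range(m)]) for u in ground)
    while tau > 1e-6 and len(S) < k:
        for u in ground - S:
            if len(S) >= k:
                break
            if np.mean([sample_gain(S, u) for _ in range(m)]) >= tau:
                S.add(u)                      # estimated gain clears the current threshold
        tau *= (1 - eps)                      # geometrically lower the threshold
    return S

# Toy monotone submodular function: set coverage, observed through Gaussian noise.
sets = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 4, 5}, 4: {5, 6}}
f = lambda S: len(set().union(*(sets[u] for u in S))) if S else 0
sample_gain = lambda S, u: f(S | {u}) - f(S) + rng.normal(scale=0.5)

print(noisy_threshold_greedy(set(sets), sample_gain, k=3))
```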
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.
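A minimal sketch of the lattice idea, under simplifying assumptions (a single unbatched sentence, ad-hoc gating and normalization, placeholder dimensions), is given below: a character LSTM whose cell state at a word's end position also receives gated contributions from the cell states of lexicon words ending there. It is a toy illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SimplifiedLatticeLSTM(nn.Module):
    """Character LSTM with extra word cells injected at word end positions;
    gating and normalization are simplified relative to the lattice LSTM."""
    def __init__(self, char_vocab, word_vocab, dim):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, dim)
        self.word_emb = nn.Embedding(word_vocab, dim)
        self.char_cell = nn.LSTMCell(dim, dim)
        self.word_cell = nn.LSTMCell(dim, dim)   # produces one cell state per matched word
        self.gate = nn.Linear(2 * dim, dim)      # input gate for each word cell

    def forward(self, chars, lexicon_matches):
        # chars: (T,) character ids; lexicon_matches: list of (start, end, word_id)
        T, dim = chars.shape[0], self.char_emb.embedding_dim
        h, c = torch.zeros(dim), torch.zeros(dim)
        states = []
        for t in range(T):
            x = self.char_emb(chars[t:t + 1])                       # (1, dim)
            h1, c1 = self.char_cell(x, (h.unsqueeze(0), c.unsqueeze(0)))
            h, c = h1.squeeze(0), c1.squeeze(0)
            word_cells, gates = [], []
            for (b, e, w) in lexicon_matches:                       # words ending at position t
                if e == t:
                    hb = states[b - 1][0] if b > 0 else torch.zeros(dim)
                    cb = states[b - 1][1] if b > 0 else torch.zeros(dim)
                    _, cw = self.word_cell(self.word_emb(torch.tensor([w])),
                                           (hb.unsqueeze(0), cb.unsqueeze(0)))
                    cw = cw.squeeze(0)
                    word_cells.append(cw)
                    gates.append(torch.sigmoid(self.gate(torch.cat([x.squeeze(0), cw]))))
            if word_cells:
                g = torch.softmax(torch.stack(gates), dim=0)        # compete among word cells
                c = (c + sum(gi * ci for gi, ci in zip(g, word_cells))) / (1 + len(word_cells))
            states.append((h, c))
        return torch.stack([s[0] for s in states])                  # per-character features for a tagger

model = SimplifiedLatticeLSTM(char_vocab=100, word_vocab=50, dim=16)
chars = torch.randint(0, 100, (6,))                 # a 6-character sentence
matches = [(0, 1, 3), (2, 4, 7)]                    # (start, end, word_id) lexicon hits
print(model(chars, matches).shape)                  # torch.Size([6, 16])
```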