This work introduces E3x, a software package for building neural networks that are equivariant with respect to the Euclidean group $\mathrm{E}(3)$, consisting of translations, rotations, and reflections of three-dimensional space. Compared to ordinary neural networks, $\mathrm{E}(3)$-equivariant models promise benefits whenever input and/or output data are quantities associated with three-dimensional objects. This is because the numeric values of such quantities (e.g., positions) typically depend on the chosen coordinate system. Under transformations of the reference frame, the values change predictably, but the underlying rules can be difficult to learn for ordinary machine learning models. With built-in $\mathrm{E}(3)$-equivariance, neural networks are guaranteed to satisfy the relevant transformation rules exactly, resulting in superior data efficiency and accuracy. The code for E3x is available at https://github.com/google-research/e3x; detailed documentation and usage examples can be found at https://e3x.readthedocs.io.
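To make the equivariance guarantee concrete, here is a minimal numpy sketch written for illustration (it deliberately does not use the E3x API): a toy map whose output vectors are distance-weighted sums of relative positions, together with a numerical check that transforming the input and transforming the output commute. E3x's building blocks satisfy identities of this kind by construction.

```python
import numpy as np

def equivariant_layer(x):
    """Toy E(3)-equivariant map: each point receives a distance-weighted
    sum of relative position vectors. The weights depend only on distances
    (invariant), so the output is translation-invariant and
    rotation/reflection-equivariant."""
    rel = x[:, None, :] - x[None, :, :]     # (N, N, 3) relative positions
    w = np.exp(-np.sum(rel ** 2, axis=-1))  # (N, N) invariant weights
    return np.einsum('ij,ijk->ik', w, rel)  # (N, 3) equivariant output

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                 # 5 points in 3D

# Random rigid transformation: rotation R (det +1) plus translation t.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = q * np.sign(np.linalg.det(q))           # flip sign if det = -1 (odd dim)
t = rng.normal(size=3)

# Transforming the input and transforming the output give the same result.
print(np.allclose(equivariant_layer(x @ R.T + t),
                  equivariant_layer(x) @ R.T))  # True
```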
We present a new distance oracle in the fully dynamic setting: given a weighted undirected graph $G=(V,E)$ with $n$ vertices undergoing both edge insertions and deletions, and an arbitrary parameter $\epsilon$ where $1/\log^{c} n<\epsilon<1$ and $c>0$ is a small constant, we can deterministically maintain a data structure with $n^{\epsilon}$ worst-case update time that, given any pair of vertices $(u,v)$, returns a $2^{{\rm poly}(1/\epsilon)}$-approximate distance between $u$ and $v$ in ${\rm poly}(1/\epsilon)\log\log n$ query time. Our algorithm significantly advances the state of the art in two respects, for fully dynamic algorithms and even for decremental algorithms. First, no existing algorithm with worst-case update time guarantees an $o(n)$-approximation together with $n^{2-\Omega(1)}$ update and $n^{o(1)}$ query time, whereas our algorithm offers a constant $O_{\epsilon}(1)$-approximation with $n^{\epsilon}$ update time and $O_{\epsilon}(\log \log n)$ query time. Second, even if amortized update time is allowed, ours is the first deterministic constant-approximation algorithm with $n^{1-\Omega(1)}$ update and query time. The best previous result in this direction is the recent deterministic distance oracle by Chuzhoy and Zhang [STOC 2023], which achieves an approximation of $(\log\log n)^{2^{O(1/\epsilon^{3})}}$ with amortized update time of $n^{\epsilon}$ and query time of $2^{{\rm poly}(1/\epsilon)}\log n\log\log n$. We obtain our result by dynamizing tools related to length-constrained expanders [Haeupler-R\"acke-Ghaffari, STOC 2022; Haeupler-Hershkowitz-Tan, 2023; Haeupler-Huebotter-Ghaffari, 2022]. Our technique completely bypasses the 40-year-old Even-Shiloach tree, which has remained the most pervasive tool in the area but is inherently amortized.
We study the approximability of the MaxCut problem in the presence of predictions. Specifically, we consider two models: in the noisy predictions model, for each vertex we are given its correct label in $\{-1,+1\}$ with some unknown probability $1/2 + \epsilon$, and the other (incorrect) label otherwise. In the more informative partial predictions model, for each vertex we are given its correct label with probability $\epsilon$ and no label otherwise. In both models, we assume only pairwise independence between vertices. We show how these predictions can be used to improve on the worst-case approximation ratios for this problem. Specifically, we give an algorithm that achieves an $\alpha + \widetilde{\Omega}(\epsilon^4)$-approximation for the noisy predictions model, where $\alpha \approx 0.878$ is the MaxCut threshold. This result also holds for the partial predictions model, for which we additionally give a $\beta + \Omega(\epsilon)$-approximation, where $\beta \approx 0.858$ is the approximation ratio for MaxBisection given by Raghavendra and Tan. This answers a question posed by Ola Svensson in his plenary talk at SODA'23.
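The two prediction models are easy to state in code. Below is a minimal sketch (the function names are ours; for brevity it samples labels fully independently, which is stronger than the pairwise independence the paper assumes):

```python
import numpy as np

def noisy_predictions(labels, eps, rng):
    """Report each vertex's true {-1,+1} label with probability 1/2 + eps,
    and the flipped (incorrect) label otherwise."""
    flip = rng.random(len(labels)) >= 0.5 + eps
    return np.where(flip, -labels, labels)

def partial_predictions(labels, eps, rng):
    """Reveal each vertex's true label with probability eps; otherwise
    give no label (encoded here as 0)."""
    revealed = rng.random(len(labels)) < eps
    return np.where(revealed, labels, 0)

rng = np.random.default_rng(1)
truth = rng.choice([-1, 1], size=10)   # a hypothetical optimal cut
print(noisy_predictions(truth, eps=0.1, rng=rng))
print(partial_predictions(truth, eps=0.3, rng=rng))
```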
We consider the reinforcement learning problem for the constrained Markov decision process (CMDP), which plays a central role in satisfying safety or resource constraints in sequential learning and decision-making. In this problem, we are given finite resources and an MDP with unknown transition probabilities. At each stage, we take an action, collecting a reward and consuming some resources, all of which are unknown and must be learned over time. In this work, we take the first step towards deriving optimal problem-dependent guarantees for CMDP problems. We derive a logarithmic regret bound, which translates into an $O(\frac{\kappa}{\epsilon}\cdot\log^2(1/\epsilon))$ sample complexity bound, with $\kappa$ being a problem-dependent parameter that is independent of $\epsilon$. Our sample complexity bound improves upon the state-of-the-art $O(1/\epsilon^2)$ sample complexity for CMDP problems established in the previous literature, in terms of the dependence on $\epsilon$. To achieve this advance, we develop a new framework for analyzing CMDP problems. Specifically, our algorithm operates in the primal space, and we resolve the primal LP for the CMDP problem at each period in an online manner, with \textit{adaptive} remaining resource capacities. The key elements of our algorithm are: (i) an elimination procedure that characterizes one optimal basis of the primal LP, and (ii) a resolving procedure that adapts to the remaining resources and sticks to the characterized optimal basis.
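To illustrate the resolving idea in isolation, here is a schematic numpy/scipy sketch (our construction, with toy known rewards and consumptions; the paper's algorithm additionally learns these quantities online and includes the basis-elimination step). The primal LP is re-solved each period with per-period capacities set adaptively from the remaining budget:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_actions, n_resources, T = 4, 2, 50
# Toy data; the last action is a zero-reward, zero-consumption "do nothing"
# action that keeps the per-period LP feasible.
r = np.append(rng.random(n_actions - 1), 0.0)
C = np.hstack([rng.random((n_resources, n_actions - 1)),
               np.zeros((n_resources, 1))])
remaining = np.full(n_resources, 0.5 * T)   # total resource budget

for t in range(T):
    # Resolve the primal LP with *adaptive* capacities:
    #   max r@x  s.t.  C@x <= remaining/(T-t),  sum(x) = 1,  x >= 0.
    res = linprog(-r, A_ub=C, b_ub=remaining / (T - t),
                  A_eq=np.ones((1, n_actions)), b_eq=[1.0],
                  bounds=[(0, 1)] * n_actions, method='highs')
    x = np.clip(res.x, 0, None)
    a = rng.choice(n_actions, p=x / x.sum())   # play an action from the LP
    remaining = np.maximum(remaining - C[:, a], 0.0)

print('leftover resources:', remaining)
```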
Building on recent constructions of Quantum Cross Subspace Alignment (QCSA) codes, this work develops a coding scheme for QEBXSTPIR, i.e., classical private information retrieval with $X$-secure storage and $T$-private queries over a quantum multiple-access channel, that is resilient to any set of up to $E$ erased servers (also known as unresponsive servers, or stragglers) together with any set of up to $B$ Byzantine servers. The scheme is accordingly labeled QEBCSA, with the `E' and `B' indicating resilience to erased and Byzantine servers, respectively. The QEBCSA code structure may be broadly useful for problems such as quantum coded secure distributed computation, where security, straggler resilience, and distributed superdense coding gains are required simultaneously. The $X$-security property is further exploited to improve the communication rate when $\epsilon$-error decoding is allowed.
Geometric matching is an important topic in computational geometry and has been extensively studied for decades. In this paper, we study a geometric-matching problem known as geometric many-to-many matching. In this problem, the input is a set $S$ of $n$ colored points in $\mathbb{R}^d$, which implicitly defines a graph $G = (S,E(S))$ where $E(S) = \{(p,q): p,q \in S \text{ have different colors}\}$, and the goal is to compute a minimum-cost subset $E^* \subseteq E(S)$ of edges that covers all points in $S$. Here the cost of $E^*$ is the sum of the costs of all edges in $E^*$, where the cost of a single edge $e$ is the Euclidean distance (or more generally, the $L_p$-distance) between the two endpoints of $e$. Our main result is a $(1+\varepsilon)$-approximation algorithm with an optimal running time $O_\varepsilon(n \log n)$ for geometric many-to-many matching in any fixed dimension, which works under any $L_p$-norm. This is the first near-linear approximation scheme for the problem in any dimension $d \geq 2$. Prior to this work, only the bipartite case of geometric many-to-many matching had been considered, in $\mathbb{R}^1$ and $\mathbb{R}^2$, and the best known approximation scheme in $\mathbb{R}^2$ takes $O_\varepsilon(n^{1.5} \cdot \mathsf{poly}(\log n))$ time.
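The problem statement is compact enough to pin down with a tiny brute-force solver (exponential time, usable only on a handful of points; the paper's contribution is the near-linear approximation scheme, which this sketch does not attempt):

```python
import itertools, math

# A few colored points; edges exist only between points of different colors.
points = [((0.0, 0.0), 'red'), ((1.0, 0.0), 'blue'),
          ((0.0, 1.0), 'blue'), ((2.0, 2.0), 'red')]
edges = [(i, j) for i, j in itertools.combinations(range(len(points)), 2)
         if points[i][1] != points[j][1]]

def cost(subset):
    """Sum of Euclidean edge lengths (the L_2 case; the paper handles
    general L_p)."""
    return sum(math.dist(points[i][0], points[j][0]) for i, j in subset)

# Minimum-cost edge subset that covers every point.
best = min((s for k in range(1, len(edges) + 1)
            for s in itertools.combinations(edges, k)
            if {v for e in s for v in e} == set(range(len(points)))),
           key=cost)
print(best, cost(best))
```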
On current computer architectures, the performance of GMRES can be limited by the communication cost of generating orthonormal basis vectors of the Krylov subspace. To address this performance bottleneck, the $s$-step variant orthogonalizes a block of $s$ basis vectors at a time, potentially reducing the communication cost by a factor of $s$. Unfortunately, for a large step size $s$, the solver can generate extremely ill-conditioned basis vectors, so to maintain stability in practice, a conservatively small step size is used, which limits the performance of the $s$-step solver. To improve performance while using a small step size, in this paper we introduce a two-stage block orthogonalization scheme. Like the original scheme, the first stage of the proposed method operates on a block of $s$ basis vectors at a time, but its objective is to maintain the well-conditioning of the generated basis vectors at a lower cost. Full orthogonalization of the basis vectors is delayed until the second stage, when enough basis vectors have been generated to obtain higher performance. Our analysis shows the stability of the proposed two-stage scheme. Performance improves because, while the same amount of computation as in the original scheme is required, most of the communication is performed in the second stage, reducing the overall communication requirements. Our performance results with up to 192 NVIDIA V100 GPUs on the Summit supercomputer demonstrate that, when solving a 2D Laplace problem, the two-stage approach can reduce the orthogonalization time and the total time-to-solution by factors of up to $2.6\times$ and $1.6\times$, respectively, over the original $s$-step GMRES, which had already obtained speedups of $2.1\times$ and $1.8\times$, respectively, over standard GMRES. Similar speedups were obtained for 3D problems and for matrices from the SuiteSparse Matrix Collection.
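The following numpy sketch conveys only the high-level two-stage concept under simplifying assumptions (independent random blocks, Cholesky QR as the cheap intra-block step, one delayed block Gram-Schmidt sweep); it is not the authors' $s$-step GMRES implementation, in which the blocks are Krylov basis vectors generated on the fly:

```python
import numpy as np

def cholesky_qr(V):
    """Cheap intra-block orthonormalization: one Gram-matrix reduction."""
    R = np.linalg.cholesky(V.T @ V).T        # V^T V = R^T R
    return V @ np.linalg.inv(R)              # explicit inverse: fine for a sketch

def two_stage(blocks):
    # Stage 1: keep each block of s vectors well conditioned, cheaply.
    blocks = [cholesky_qr(V) for V in blocks]
    # Stage 2 (delayed): orthogonalize the blocks against each other,
    # aggregating the communication-heavy work.
    Q = blocks[0]
    for V in blocks[1:]:
        V = V - Q @ (Q.T @ V)                # block classical Gram-Schmidt
        Q = np.hstack([Q, cholesky_qr(V)])
    return Q

rng = np.random.default_rng(0)
Q = two_stage([rng.normal(size=(1000, 5)) for _ in range(4)])
print(np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1])))  # ~ machine precision
```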
We present an ongoing implementation of a KE-tableau-based reasoner for a decidable fragment of stratified elementary set theory expressing the description logic $\mathcal{DL}\langle \mathsf{4LQS^{R,\!\times}}\rangle(\mathbf{D})$ ($\mathcal{DL}_{\mathbf{D}}^{4,\!\times}$, for short). The reasoner checks the consistency of $\mathcal{DL}_{\mathbf{D}}^{4,\!\times}$-knowledge bases (KBs) represented in set-theoretic terms. It is implemented in \textsf{C++} and supports $\mathcal{DL}_{\mathbf{D}}^{4,\!\times}$-KBs serialized in the OWL/XML format. To the best of our knowledge, this is the first attempt to implement a reasoner for the consistency checking of a description logic represented via a fragment of set theory that is also able to classify standard OWL ontologies.
We present a KE-tableau-based implementation of a reasoner for a decidable fragment of (stratified) set theory expressing the description logic $\mathcal{DL}\langle \mathsf{4LQS^{R,\!\times}}\rangle(\mathbf{D})$ ($\mathcal{DL}_{\mathbf{D}}^{4,\!\times}$, for short). Our application solves the main TBox and ABox reasoning problems for $\mathcal{DL}_{\mathbf{D}}^{4,\!\times}$. In particular, it solves the consistency problem for $\mathcal{DL}_{\mathbf{D}}^{4,\!\times}$-knowledge bases represented in set-theoretic terms, and a generalization of the \emph{Conjunctive Query Answering} problem in which conjunctive queries with variables of three sorts are admitted. The reasoner, which extends and optimizes a previous prototype for the consistency checking of $\mathcal{DL}_{\mathbf{D}}^{4,\!\times}$-knowledge bases (see \cite{cilc17}), is implemented in \textsf{C++}. It supports $\mathcal{DL}_{\mathbf{D}}^{4,\!\times}$-knowledge bases serialized in the OWL/XML format, and it also admits rules expressed in SWRL (Semantic Web Rule Language).
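To give a flavor of the underlying calculus, here is a toy propositional KE tableau in Python (a generic illustration written by us, far simpler than the stratified set-theoretic fragment the actual reasoner handles). KE replaces the branching rules of Smullyan-style tableaux with linear elimination rules plus a single branching cut rule, the principle of bivalence (PB), applied analytically:

```python
# Formulas: atoms are strings; compounds are ('not', f), ('and', f, g),
# ('or', f, g).
def neg(f):
    return f[1] if isinstance(f, tuple) and f[0] == 'not' else ('not', f)

def expand(branch):
    """Saturate a branch under KE's non-branching elimination rules."""
    while True:
        new = set()
        for f in branch:
            if not isinstance(f, tuple):
                continue
            if f[0] == 'and':                         # A&B |- A, B
                new |= {f[1], f[2]}
            elif f[0] == 'or':                        # A|B, ~A |- B
                if neg(f[1]) in branch: new.add(f[2])
                if neg(f[2]) in branch: new.add(f[1])
            elif isinstance(f[1], tuple):             # f = ('not', g)
                g = f[1]
                if g[0] == 'not':                     # ~~A |- A
                    new.add(g[1])
                elif g[0] == 'or':                    # ~(A|B) |- ~A, ~B
                    new |= {neg(g[1]), neg(g[2])}
                elif g[0] == 'and':                   # ~(A&B), A |- ~B
                    if g[1] in branch: new.add(neg(g[2]))
                    if g[2] in branch: new.add(neg(g[1]))
        if new <= branch:
            return branch
        branch |= new

def satisfiable(branch):
    branch = expand(set(branch))
    if any(neg(f) in branch for f in branch):         # closed branch
        return False
    for f in branch:                                  # cut rule (PB)
        if isinstance(f, tuple) and f[0] == 'or' and \
           f[1] not in branch and f[2] not in branch:
            return (satisfiable(branch | {f[1]}) or
                    satisfiable(branch | {neg(f[1])}))
        if isinstance(f, tuple) and f[0] == 'not' and \
           isinstance(f[1], tuple) and f[1][0] == 'and' and \
           neg(f[1][1]) not in branch and neg(f[1][2]) not in branch:
            return (satisfiable(branch | {f[1][1]}) or
                    satisfiable(branch | {neg(f[1][1])}))
    return True                                       # open and fully analysed

# (p|q) & ~p is satisfiable; adding ~q closes every branch.
print(satisfiable([('and', ('or', 'p', 'q'), ('not', 'p'))]))                # True
print(satisfiable([('and', ('or', 'p', 'q'), ('not', 'p')), ('not', 'q')]))  # False
```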
Click-through rate (CTR) prediction plays a critical role in recommender systems and online advertising. The data used in these applications are multi-field categorical data, where each feature belongs to one field. Field information has been shown to be important, and several works incorporate fields into their models. In this paper, we propose a novel approach to model the field information effectively and efficiently. The proposed approach is a direct improvement of FwFM and is named Field-matrixed Factorization Machines (FmFM, or $FM^2$). We also propose a new interpretation of FM and FwFM within the FmFM framework and compare it with FFM. Besides pruning the cross terms, our model supports field-specific variable dimensions of embedding vectors, which acts as soft pruning. We also propose an efficient way to minimize these dimensions while preserving model performance. The FmFM model can be further optimized by caching the intermediate vectors, after which a prediction takes only thousands of floating-point operations (FLOPs). Our experimental results show that it can outperform FFM, which is more complex. The FmFM model's performance is also comparable to that of DNN models, which require far more FLOPs at runtime.
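As we read the model from its name and lineage (a hedged sketch; the exact formulation is given in the paper), FmFM scores a feature pair by passing one field's embedding through a learned matrix specific to that field pair before the dot product. Rectangular matrices are what permit field-specific embedding dimensions, and since the matrices and embeddings are fixed after training, the transformed vectors can be cached:

```python
import numpy as np

rng = np.random.default_rng(0)
dims = [4, 2, 3]                        # field-specific embedding dimensions
v = [rng.normal(size=d) for d in dims]  # one active feature per field (toy)
# One learned matrix per field pair; rectangular shapes are what allow
# the variable per-field dimensions mentioned in the abstract.
M = {(f, g): rng.normal(size=(dims[f], dims[g]))
     for f in range(len(dims)) for g in range(f + 1, len(dims))}

def fmfm_interactions(v, M):
    """Sum of field-matrixed pairwise interactions <v_f M_{f,g}, v_g>."""
    return sum(v[f] @ M[f, g] @ v[g] for (f, g) in M)

# Caching the intermediate vectors v_f M_{f,g} (fixed after training)
# turns each prediction into a handful of dot products.
cache = {(f, g): v[f] @ M[f, g] for (f, g) in M}
assert np.isclose(fmfm_interactions(v, M),
                  sum(cache[f, g] @ v[g] for (f, g) in M))
```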
Graph convolutional networks (GCNs) are increasingly popular in many applications, yet remain notoriously hard to train over large graph datasets, because they must compute node representations recursively from their neighbors. Current GCN training algorithms suffer either from high computational costs that grow exponentially with the number of layers, or from high memory usage due to loading the entire graph and node embeddings. In this paper, we propose a novel efficient layer-wise training framework for GCN (L-GCN) that disentangles feature aggregation and feature transformation during training, greatly reducing time and memory complexity. We present a theoretical analysis of L-GCN under the graph isomorphism framework, showing that under mild conditions L-GCN leads to GCNs as powerful as those produced by the more costly conventional training algorithm. We further propose L^2-GCN, which learns a controller for each layer that can automatically adjust the number of training epochs per layer in L-GCN. Experiments show that L-GCN is faster than the state of the art by at least an order of magnitude, with consistent memory usage independent of dataset size, while maintaining comparable prediction performance. With the learned controller, L^2-GCN can further cut the training time in half. Our code is available at https://github.com/Shen-Lab/L2-GCN.
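A schematic of the layer-wise idea under simplifying assumptions (dense numpy, and the per-layer transformation collapsed into a single softmax classifier for brevity; the actual implementation at the linked repository differs): each layer first aggregates features with the normalized adjacency, with no gradients flowing through the aggregation, and then trains its transformation on those fixed aggregated features in isolation.

```python
import numpy as np

def normalize_adj(A):
    """Symmetrically normalized adjacency with self-loops:
    D^{-1/2} (A + I) D^{-1/2}."""
    A = A + np.eye(len(A))
    d = A.sum(1) ** -0.5
    return A * d[:, None] * d[None, :]

def train_layer(H, Y, lr=0.5, epochs=200):
    """Train one layer's weights on *fixed* aggregated features H
    (a linear softmax classifier fit by gradient descent)."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(H.shape[1], Y.shape[1]))
    for _ in range(epochs):
        Z = H @ W
        P = np.exp(Z - Z.max(1, keepdims=True))
        P /= P.sum(1, keepdims=True)
        W -= lr * H.T @ (P - Y) / len(H)   # cross-entropy gradient
    return W

rng = np.random.default_rng(1)
A = np.triu((rng.random((6, 6)) < 0.4).astype(float), 1)
A = A + A.T                                # toy undirected graph, 6 nodes
X = rng.normal(size=(6, 8))                # node features
Y = np.eye(2)[[0, 0, 0, 1, 1, 1]]          # one-hot labels, 2 classes

A_hat = normalize_adj(A)
H = X
for layer in range(2):
    H = A_hat @ H                 # feature aggregation: no gradients needed
    W = train_layer(H, Y)         # feature transformation: trained per layer
    H = np.maximum(H @ W, 0.0)    # this layer's output feeds the next layer
```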