Transformers with linear attention (i.e., linear transformers) and state-space models have recently been suggested as a viable linear-time alternative to transformers with softmax attention. However, these models still underperform transformers especially on tasks that require in-context retrieval. While more expressive variants of linear transformers which replace the additive update in linear transformers with the delta rule (DeltaNet) have been found to be more effective at associative recall, existing algorithms for training such models do not parallelize over sequence length and are thus inefficient to train on modern hardware. This work describes a hardware-efficient algorithm for training linear transformers with the delta rule, which exploits a memory-efficient representation for computing products of Householder matrices. This algorithm allows us to scale up DeltaNet to standard language modeling settings. We train a 1.3B model for 100B tokens and find that it outperforms recent linear-time baselines such as Mamba and GLA in terms of perplexity and zero-shot performance on downstream tasks. We also experiment with two hybrid models which combine DeltaNet layers with (1) sliding-window attention layers every other layer or (2) two global attention layers, and find that these hybrids outperform strong transformer baselines.
The multiple scattering theory (MST) is a Green's function method that has been widely used in electronic structure calculations for crystalline disordered systems. The key property of the MST method is the scattering path matrix (SPM) that characterizes the Green's function within a local solution representation. This paper studies various approximations of the SPM, under the condition that an appropriate reference is used for perturbation. In particular, we justify the convergence of the SPM approximations with respect to the size of scattering region and the length of scattering path, which are the central numerical parameters to achieve a linear-scaling MST method. We present numerical experiments on several typical systems to support the theory.
This paper presents a distributed inverse dynamics controller (DIDC) for quadruped robots that addresses the limitations of existing reactive controllers: simplified dynamical models, the inability to handle exact friction cone constraints, and the high computational requirements of whole-body controllers. Current methods either ignore friction constraints entirely or use linear approximations, leading to potential slip and instability, while comprehensive whole-body controllers demand significant computational resources. Our approach uses full rigid-body dynamics and enforces exact friction cone constraints through a novel geometric optimization-based solver. DIDC combines the required generalized forces corresponding to the actuated and unactuated spaces by projecting them onto the actuated space while satisfying the physical constraints and maintaining orthogonality between the base and joint tracking objectives. Experimental validation shows that our approach reduces foot slippage, improves orientation tracking, and converges at least two times faster than existing reactive controllers with generic QP-based implementations. The controller enables stable omnidirectional trotting at various speeds and consumes less power than comparable methods while running efficiently on embedded processors.
Kernel methods map data into high-dimensional spaces, enabling linear algorithms to learn nonlinear functions without explicitly storing the feature vectors. Quantum kernel methods promise efficient learning by encoding feature maps into exponentially large Hilbert spaces inherent in quantum systems. In this work we implement quantum kernels on a 10-qubit star-topology register in a nuclear magnetic resonance (NMR) platform. We experimentally encode classical data in the evolution of multiple quantum coherence orders using data-dependent unitary transformations and then demonstrate one-dimensional regression and two-dimensional classification tasks. By extending the register to a double-layered star configuration, we propose an extended quantum kernel to handle non-parametrized operator inputs. By numerically simulating the extended quantum kernel, we show classification of entangling and nonentangling unitaries. These results confirm that quantum kernels exhibit strong capabilities in classical as well as quantum machine learning tasks.
This work focuses on model preparation for electrostatic simulations of CAD designs to realize a rapid virtual prototyping concept. We present a boundary element method (BEM) allowing discontinuous fields between surfaces. The corresponding edges of the CAD model are enhanced with the data required to integrate over non-conforming elements. Finally, we generate a mesh for each CAD surface. The approach is verified via numerical experiments and shows excellent agreement with conforming BEM results.
A multichannel extension to the RVQGAN neural coding method is proposed, and realized for data-driven compression of third-order Ambisonics audio. The input- and output layers of the generator and discriminator models are modified to accept multiple (16) channels without increasing the model bitrate. We also propose a loss function for accounting for spatial perception in immersive reproduction, and transfer learning from single-channel models. Listening test results with 7.1.4 immersive playback show that the proposed extension is suitable for coding scene-based, 16-channel Ambisonics content with good quality at 16 kbps when trained and tested on the EigenScape database. The model has potential applications for learning other types of content and multichannel formats.
Finding unambiguous diagrammatic representations for first-order logical formulas and relational queries with arbitrarily nested disjunctions has been a surprisingly long-standing unsolved problem. We refer to this problem as the disjunction problem (of diagrammatic query representations). This work solves the disjunction problem. Our solution unifies, generalizes, and overcomes the shortcomings of prior approaches for disjunctions. It extends the recently proposed Relational Diagrams and is identical for disjunction-free queries. However, it can preserve the relational patterns and the safety for all well-formed Tuple Relational Calculus (TRC) queries, even with arbitrary disjunctions. Additionally, its size is proportional to the original TRC query and can thus be exponentially more succinct than Relational Diagrams.
While the situation has improved for text-only models, it again seems to be the case currently that multimodal (text and image) models develop faster than ways to evaluate them. In this paper, we bring a recently developed evaluation paradigm from text models to multimodal models, namely evaluation through the goal-oriented game (self) play, complementing reference-based and preference-based evaluation. Specifically, we define games that challenge a model's capability to represent a situation from visual information and align such representations through dialogue. We find that the largest closed models perform rather well on the games that we define, while even the best open-weight models struggle with them. On further analysis, we find that the exceptional deep captioning capabilities of the largest models drive some of the performance. There is still room to grow for both kinds of models, ensuring the continued relevance of the benchmark.
The Knaster-Tarski theorem, also known as Tarski's theorem, guarantees that every monotone function defined on a complete lattice has a fixed point. We analyze the query complexity of finding such a fixed point on the $k$-dimensional grid of side length $n$ under the $\leq$ relation. Specifically, there is an unknown monotone function $f: \{0,1,\ldots, n-1\}^k \to \{0,1,\ldots, n-1\}^k$ and an algorithm must query a vertex $v$ to learn $f(v)$. A key special case of interest is the Boolean hypercube $\{0,1\}^k$, which is isomorphic to the power set lattice -- the original setting of the Knaster-Tarski theorem. Our lower bound characterizes the randomized and deterministic query complexity of the Tarski search problem on the Boolean hypercube as $\Theta(k)$. More generally, we prove a randomized lower bound of $\Omega\left( k + \frac{k \cdot \log{n}}{\log{k}} \right)$ for the $k$-dimensional grid of side length $n$, which is asymptotically tight in high dimensions when $k$ is large relative to $n$.
The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.
Incompleteness is a common problem for existing knowledge graphs (KGs), and the completion of KG which aims to predict links between entities is challenging. Most existing KG completion methods only consider the direct relation between nodes and ignore the relation paths which contain useful information for link prediction. Recently, a few methods take relation paths into consideration but pay less attention to the order of relations in paths which is important for reasoning. In addition, these path-based models always ignore nonlinear contributions of path features for link prediction. To solve these problems, we propose a novel KG completion method named OPTransE. Instead of embedding both entities of a relation into the same latent space as in previous methods, we project the head entity and the tail entity of each relation into different spaces to guarantee the order of relations in the path. Meanwhile, we adopt a pooling strategy to extract nonlinear and complex features of different paths to further improve the performance of link prediction. Experimental results on two benchmark datasets show that the proposed model OPTransE performs better than state-of-the-art methods.