We propose a supplement matrix method for computing eigenvalues of a dual Hermitian matrix, and discuss its application in multi-agent formation control. Suppose we have a ring, which can be the real field, the complex field, or the quaternion ring. We study dual number symmetric matrices, dual complex Hermitian matrices and dual quaternion Hermitian matrices in a unified frame of dual Hermitian matrices. An $n \times n$ dual Hermitian matrix has $n$ dual number eigenvalues. We define determinant, characteristic polynomial and supplement matrices for a dual Hermitian matrix. Supplement matrices are Hermitian matrices in the original ring. The standard parts of the eigenvalues of that dual Hermitian matrix are the eigenvalues of the standard part Hermitian matrix in the original ring, while the dual parts of the eigenvalues of that dual Hermitian matrix are the eigenvalues of those supplement matrices. Hence, by applying any practical method for computing eigenvalues of Hermitian matrices in the original ring, we have a practical method for computing eigenvalues of a dual Hermitian matrix. We call this method the supplement matrix method. In multi-agent formation control, a desired relative configuration scheme may be given. People need to know if this scheme is reasonable such that a feasible solution of configurations of these multi-agents exists. By exploring the eigenvalue problem of dual Hermitian matrices, and its link with the unit gain graph theory, we open a cross-disciplinary approach to solve the relative configuration problem. Numerical experiments are reported.
A proof of quantumness is an efficiently verifiable interactive test that an efficient quantum computer can pass, but all efficient classical computers cannot (under some cryptographic assumption). Such protocols play a crucial role in the certification of quantum devices. Existing single-round protocols (like asking the quantum computer to factor a large number) require large quantum circuits, whereas multi-round ones use smaller circuits but require experimentally challenging mid-circuit measurements. As such, current proofs of quantumness are out of reach for near-term devices. In this work, we construct efficient single-round proofs of quantumness based on existing knowledge assumptions. While knowledge assumptions have not been previously considered in this context, we show that they provide a natural basis for separating classical and quantum computation. Specifically, we show that multi-round protocols based on Decisional Diffie-Hellman (DDH) or Learning With Errors (LWE) can be "compiled" into single-round protocols using a knowledge-of-exponent assumption or knowledge-of-lattice-point assumption, respectively. We also prove an adaptive hardcore-bit statement for a family of claw-free functions based on DDH, which might be of independent interest. Previous approaches to constructing single-round protocols relied on the random oracle model and thus incurred the overhead associated with instantiating the oracle with a cryptographic hash function. In contrast, our protocols have the same resource requirements as their multi-round counterparts without necessitating mid-circuit measurements, making them, arguably, the most efficient single-round proofs of quantumness to date. Our work also helps in understanding the interplay between black-box/white-box reductions and cryptographic assumptions in the design of proofs of quantumness.
We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory. We investigate the mechanisms through which different components of Transformer, such as the dot-product self-attention, positional encoding and feed-forward layer, affect its expressive power, and we study their combined effects through establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads. These theoretical insights are validated experimentally and offer natural suggestions for alternative architectures.
A white noise signal can access any possible configuration of values, though statistically over many samples tends to a uniform spectral distribution, and is highly unlikely to produce intelligible sound. But how unlikely? The probability that white noise generates a music-like signal over different durations is analyzed, based on some necessary features observed in real music audio signals such as mostly proximate movement and zero crossing rate. Given the mathematical results, the rarity of music as a signal is considered overall. The applicability of this study is not just to show that music has a precious rarity value, but that examination of the size of music relative to the overall size of audio signal space provides information to inform new generations of algorithmic music system (which are now often founded on audio signal generation directly, and may relate to white noise via such machine learning processes as diffusion). Estimated upper bounds on the rarity of music to the size of various physical and musical spaces are compared, to better understand the magnitude of the results (pun intended). Underlying the research are the questions `how much music is still out there?' and `how much music could a machine learning process actually reach?'.
We consider limit probabilities of first order properties in random graphs with a given degree sequence. Under mild conditions on the degree sequence, we show that the closure set of limit probabilities is a finite union of closed intervals. Moreover, we characterize the degree sequences for which this closure set is the interval $[0,1]$, a property that is intimately related with the probability that the random graph is acyclic. As a side result, we compile a full description of the cycle distribution of random graphs and study their fragment (disjoint union of unicyclic components) in the subcritical regime. Finally, we amend the proof of the existence of limit probabilities for first order properties in random graphs with a given degree sequence; this result was already claimed by Lynch~[IEEE LICS 2003] but his proof contained some inaccuracies.
Instructing the model to generate a sequence of intermediate steps, a.k.a., a chain of thought (CoT), is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetics and symbolic reasoning tasks. However, the mechanism behind CoT remains unclear. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness. Conceptually, CoT empowers the model with the ability to perform inherently serial computation, which is otherwise lacking in transformers, especially when depth is low. Given input length $n$, previous works have shown that constant-depth transformers with finite precision $\mathsf{poly}(n)$ embedding size can only solve problems in $\mathsf{TC}^0$ without CoT. We first show an even tighter expressiveness upper bound for constant-depth transformers with constant-bit precision, which can only solve problems in $\mathsf{AC}^0$, a proper subset of $ \mathsf{TC}^0$. However, with $T$ steps of CoT, constant-depth transformers using constant-bit precision and $O(\log n)$ embedding size can solve any problem solvable by boolean circuits of size $T$. Empirically, enabling CoT dramatically improves the accuracy for tasks that are hard for parallel computation, including the composition of permutation groups, iterated squaring, and circuit value problems, especially for low-depth transformers.
We study stochastic approximation algorithms with Markovian noise and constant step-size $\alpha$. We develop a method based on infinitesimal generator comparisons to study the bias of the algorithm, which is the expected difference between $\theta_n$ -- the value at iteration $n$ -- and $\theta^*$ -- the unique equilibrium of the corresponding ODE. We show that, under some smoothness conditions, this bias is of order $O(\alpha)$. Furthermore, we show that the time-averaged bias is equal to $\alpha V + O(\alpha^2)$, where $V$ is a constant characterized by a Lyapunov equation, showing that $\esp{\bar{\theta}_n} \approx \theta^*+V\alpha + O(\alpha^2)$, where $\bar{\theta}_n=(1/n)\sum_{k=1}^n\theta_k$ is the Polyak-Ruppert average. We also show that $\bar{\theta}_n$ converges with high probability around $\theta^*+\alpha V$. We illustrate how to combine this with Richardson-Romberg extrapolation to derive an iterative scheme with a bias of order $O(\alpha^2)$.
The analysis of formal models that include quantitative aspects such as timing or probabilistic choices is performed by quantitative verification tools. Broad and mature tool support is available for computing basic properties such as expected rewards on basic models such as Markov chains. Previous editions of QComp, the comparison of tools for the analysis of quantitative formal models, focused on this setting. Many application scenarios, however, require more advanced property types such as LTL and parameter synthesis queries as well as advanced models like stochastic games and partially observable MDPs. For these, tool support is in its infancy today. This paper presents the outcomes of QComp 2023: a survey of the state of the art in quantitative verification tool support for advanced property types and models. With tools ranging from first research prototypes to well-supported integrations into established toolsets, this report highlights today's active areas and tomorrow's challenges in tool-focused research for quantitative verification.
The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches, and eventually to increase the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy for scenarios where objects with varied sizes appear in high resolution images. Detection progresses in a coarse-to-fine manner, first on a down-sampled version of the image and then on a sequence of higher resolution regions identified as likely to improve the detection accuracy. Built upon reinforcement learning, our approach consists of a model (R-net) that uses coarse detection results to predict the potential accuracy gain for analyzing a region at a higher resolution and another model (Q-net) that sequentially selects regions to zoom in. Experiments on the Caltech Pedestrians dataset show that our approach reduces the number of processed pixels by over 50% without a drop in detection accuracy. The merits of our approach become more significant on a high resolution test set collected from YFCC100M dataset, where our approach maintains high detection performance while reducing the number of processed pixels by about 70% and the detection time by over 50%.