Selinger gave a superoperator model of a first-order quantum programming language and proved that it is fully definable and hence fully abstract. This paper proposes an extension of the superoperator model to higher-order programs based on modules over superoperators or, equivalently, enriched presheaves over the category of superoperators. The enriched presheaf category can be easily proved to be a model of intuitionistic linear logic with cofree exponential, from which one can cave out a model of classical linear logic by a kind of bi-orthogonality construction. Although the structures of an enriched presheaf category are usually rather complex, a morphism in the classical model can be expressed simply as a matrix of completely positive maps. The model inherits many desirable properties from the superoperator model. A conceptually interesting property is that our model has only a state whose "total probability" is bounded by 1, i.e. does not have a state where true and false each occur with probability 2/3. Another convenient property inherited from the superoperator model is a $\omega$CPO-enrichment. Remarkably, our model has a sufficient structure to interpret arbitrary recursive types by the standard domain theoretic technique. We introduce Quantum FPC, a quantum $\lambda$-calculus with recursive types, and prove that our model is a fully abstract model of Quantum FPC.
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI.
This work considers the asymptotic behavior of the distance between two sample covariance matrices (SCM). A general result is provided for a class of functionals that can be expressed as sums of traces of functions that are separately applied to each covariance matrix. In particular, this class includes very conventional metrics, such as the Euclidean distance or Jeffrery's divergence, as well as a number of other more sophisticated distances recently derived from Riemannian geometry considerations, such as the log-Euclidean metric. In particular, we analyze the asymptotic behavior of this class of functionals by establishing a central limit theorem that allows us to describe their asymptotic statistical law. In order to account for the fact that the sample sizes of two SCMs are of the same order of magnitude as their observation dimension, results are provided by assuming that these parameters grow to infinity while their quotients converge to fixed quantities. Numerical results illustrate how this type of result can be used in order to predict the performance of these metrics in practical machine learning algorithms, such as clustering of SCMs.
This article presents a general solution to the problem of computational complexity. First, it gives a historical introduction to the problem since the revival of the foundational problems of mathematics at the end of the 19th century. Second, building on the theory of functional relations in mathematics, it provides a theoretical framework where we can rigorously distinguish two pairs of concepts: Between solving a problem and verifying the solution to a problem. Between a deterministic and a non-deterministic model of computation. Third, it presents the theory of computational complexity and the difficulties in solving the P versus NP problem. Finally, it gives a complete proof that a certain decision problem in NP has an algorithmic exponential lower bound thus establishing firmly that P is different from NP. The proof presents a new way of approaching the subject: neither by entering into the unmanageable difficulties of proving this type of lower bound for the known NP-complete problems nor by entering into the difficulties regarding the properties of the many complexity classes established since the mid-1970s.
Large Language Models (LLMs) have shown excellent generalization capabilities that have led to the development of numerous models. These models propose various new architectures, tweaking existing architectures with refined training strategies, increasing context length, using high-quality training data, and increasing training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey paper comprehensively analyses the LLMs architectures and their categorization, training strategies, training datasets, and performance evaluations and discusses future research directions. Moreover, the paper also discusses the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to regularly update this paper by incorporating new sections and featuring the latest LLM models.
As artificial intelligence (AI) models continue to scale up, they are becoming more capable and integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMA), interpretability provides a way to trust and understand the agent's internal reasoning mechanisms for effective use and error correction. In this paper, we provide an overview of this rapidly-evolving sub-field of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI) and recommend an MLI for various types of agents, to aid their safe deployment in real-world settings.
Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training a SSL method involves a dizzying set of choices from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier to entry into SSL research by laying the foundations and latest SSL recipes in the style of a cookbook. We hope to empower the curious researcher to navigate the terrain of methods, understand the role of the various knobs, and gain the know-how required to explore how delicious SSL can be.
Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in various fields, including science, engineering, finance, and everyday life. The development of artificial intelligence (AI) systems capable of solving math problems and proving theorems has garnered significant interest in the fields of machine learning and natural language processing. For example, mathematics serves as a testbed for aspects of reasoning that are challenging for powerful deep learning models, driving new algorithmic and modeling advances. On the other hand, recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning. In this survey paper, we review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade. We also evaluate existing benchmarks and methods, and discuss future research directions in this domain.
Transformers have achieved great success in many artificial intelligence fields, such as natural language processing, computer vision, and audio processing. Therefore, it is natural to attract lots of interest from academic and industry researchers. Up to the present, a great variety of Transformer variants (a.k.a. X-formers) have been proposed, however, a systematic and comprehensive literature review on these Transformer variants is still missing. In this survey, we provide a comprehensive review of various X-formers. We first briefly introduce the vanilla Transformer and then propose a new taxonomy of X-formers. Next, we introduce the various X-formers from three perspectives: architectural modification, pre-training, and applications. Finally, we outline some potential directions for future research.
Behaviors of the synthetic characters in current military simulations are limited since they are generally generated by rule-based and reactive computational models with minimal intelligence. Such computational models cannot adapt to reflect the experience of the characters, resulting in brittle intelligence for even the most effective behavior models devised via costly and labor-intensive processes. Observation-based behavior model adaptation that leverages machine learning and the experience of synthetic entities in combination with appropriate prior knowledge can address the issues in the existing computational behavior models to create a better training experience in military training simulations. In this paper, we introduce a framework that aims to create autonomous synthetic characters that can perform coherent sequences of believable behavior while being aware of human trainees and their needs within a training simulation. This framework brings together three mutually complementary components. The first component is a Unity-based simulation environment - Rapid Integration and Development Environment (RIDE) - supporting One World Terrain (OWT) models and capable of running and supporting machine learning experiments. The second is Shiva, a novel multi-agent reinforcement and imitation learning framework that can interface with a variety of simulation environments, and that can additionally utilize a variety of learning algorithms. The final component is the Sigma Cognitive Architecture that will augment the behavior models with symbolic and probabilistic reasoning capabilities. We have successfully created proof-of-concept behavior models leveraging this framework on realistic terrain as an essential step towards bringing machine learning into military simulations.
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.