We analyse the typical structure of games in terms of the connectivity properties of their best-response graphs. In particular, we show that almost every 'large' generic game that has a pure Nash equilibrium is connected, meaning that every non-equilibrium action profile can reach every pure Nash equilibrium via best-response paths. This has implications for dynamics in games: many adaptive dynamics, such as the best-response dynamic with inertia, lead to equilibrium in connected games. It follows that there are simple, uncoupled, adaptive dynamics for which period-by-period play converges almost surely to a pure Nash equilibrium in almost every 'large' generic game that has one. We build on recent results in probabilistic combinatorics for our characterisation of game connectivity.
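As a rough illustration of the best-response dynamic with inertia mentioned in the abstract above, the sketch below runs it on a small two-player game. The payoff table `U`, the inertia probability of 0.5, and all function names are hypothetical choices for this example and are not taken from the paper. The game was chosen so that it has a unique pure Nash equilibrium at profile (2, 2), reachable by best-response paths from every other profile.

```python
import random

# Hypothetical 2-player, 3-action game: U[i][a0][a1] is player i's payoff
# when player 0 plays a0 and player 1 plays a1. Values are illustrative only.
U = [
    [[3, 0, 1], [1, 4, 2], [0, 6, 5]],  # player 0 (chooses a0)
    [[2, 4, 0], [5, 3, 1], [1, 0, 6]],  # player 1 (chooses a1)
]


def best_response(player, profile):
    """Action maximising `player`'s payoff against the opponent's current action."""
    def payoff(action):
        trial = list(profile)
        trial[player] = action
        return U[player][trial[0]][trial[1]]
    return max(range(3), key=payoff)


def is_pure_nash(profile):
    """A profile is a pure Nash equilibrium if every action is a best response."""
    return all(best_response(i, profile) == profile[i] for i in range(2))


def run_with_inertia(start, inertia=0.5, max_steps=10_000, seed=0):
    """Each period, every player independently keeps its current action with
    probability `inertia`; otherwise it switches to a best response to the
    profile played in the previous period."""
    rng = random.Random(seed)
    profile = tuple(start)
    for step in range(max_steps):
        if is_pure_nash(profile):
            return profile, step
        profile = tuple(
            a if rng.random() < inertia else best_response(i, profile)
            for i, a in enumerate(profile)
        )
    return profile, max_steps


if __name__ == "__main__":
    final, steps = run_with_inertia((0, 0))
    print(final, steps)
```

The dynamic is uncoupled in the sense that each player consults only its own payoffs. The inertia step is what prevents simultaneous updates from cycling indefinitely in this example.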
We introduce Group Spike-and-slab Variational Bayes (GSVB), a scalable method for group sparse regression. A fast coordinate ascent variational inference (CAVI) algorithm is developed for several common model families including Gaussian, Binomial and Poisson. Theoretical guarantees for our proposed approach are provided by deriving contraction rates for the variational posterior in grouped linear regression. Through extensive numerical studies, we demonstrate that GSVB provides state-of-the-art performance, offering a computationally inexpensive alternative to MCMC, whilst performing comparably to or better than existing MAP methods. Additionally, we analyze three real-world datasets, highlighting the practical utility of our method and demonstrating that GSVB provides parsimonious models with excellent predictive performance, variable selection and uncertainty quantification.
We combine dependent types with linear type systems that soundly and completely capture polynomial time computation. We explore two systems for capturing polynomial time: one system that disallows construction of iterable data, and one, based on the LFPL system of Martin Hofmann, that controls construction via a payment method. Both of these are extended to full dependent types via Quantitative Type Theory, allowing for arbitrary computation in types alongside guaranteed polynomial time computation in terms. We prove the soundness of the systems using a realisability technique due to Dal Lago and Hofmann. Our long-term goal is to combine the extensional reasoning of type theory with intensional reasoning about the resources intrinsically consumed by programs. This paper is a step along this path, which we hope will lead both to practical systems for reasoning about programs' resource usage, and to theoretical use as a form of synthetic computational complexity theory.
The X-ray transform is one of the most fundamental integral operators in image processing and reconstruction. In this article, we revisit its mathematical formalism, and propose an innovative approach making use of Reproducing Kernel Hilbert Spaces (RKHS). Within this framework, the X-ray transform can be considered as a natural analogue of Euclidean projections. The RKHS framework considerably simplifies projection image interpolation, and leads to an analogue of the celebrated representer theorem for the problem of tomographic reconstruction. It leads to methodology that is dimension-free and stands apart from conventional filtered back-projection techniques, as it does not hinge on the Fourier transform. It also allows us to establish sharp stability results at a genuinely functional level, but in the realistic setting where the data are discrete and noisy. The RKHS framework is amenable to any reproducing kernel on a unit ball, affording a high level of generality. When the kernel is chosen to be rotation-invariant, one can obtain explicit spectral representations which elucidate the regularity structure of the associated Hilbert spaces, and one can also solve the reconstruction problem at the same computational cost as filtered back-projection.
Data depth is a powerful nonparametric tool originally proposed to rank multivariate data from the center outward. In this context, one of the most archetypical depth notions is Tukey's halfspace depth. In the last few decades, notions of depth have also been proposed for functional data. However, Tukey's depth cannot be extended to handle functional data because of its degeneracy. Here, we propose a new halfspace depth for functional data which avoids degeneracy by regularization. The halfspace projection directions are constrained to have a small reproducing kernel Hilbert space norm. Desirable theoretical properties of the proposed depth, such as isometry invariance, maximality at center, monotonicity relative to a deepest point, upper semi-continuity, and consistency, are established. Moreover, the regularized halfspace depth can rank functional data with varying emphasis on shape or magnitude, depending on the regularization. A new outlier detection approach is also proposed, which is capable of detecting both shape and magnitude outliers. It is applicable to trajectories in L^2, a very general space of functions that includes non-smooth trajectories. Based on extensive numerical studies, our methods are shown to perform well in terms of detecting outliers of different types. Three real data examples showcase the proposed depth notion.
The emerging field of quantum machine learning has the potential to revolutionize our perspectives on quantum computing and artificial intelligence. In the predominantly empirical realm of quantum machine learning, a theoretical void persists. This paper addresses the gap by highlighting the quantum cross entropy, a pivotal counterpart to the classical cross entropy. We establish quantum cross entropy's role in quantum data compression, a fundamental machine learning task, by demonstrating that it acts as the compression rate for sub-optimal quantum source coding. Our approach involves a novel, universal quantum data compression protocol based on the quantum generalization of variable-length coding and the principle of quantum strong typicality. This reveals that quantum cross entropy can effectively serve as a loss function in quantum machine learning algorithms. Furthermore, we show that the minimum of quantum cross entropy coincides with the von Neumann entropy, reinforcing its role as the optimal compression rate and underscoring its significance in advancing our understanding of quantum machine learning's theoretical framework.
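The claim that the minimum of the quantum cross entropy is the von Neumann entropy can be seen in one line. The abstract does not state the definitions, so the derivation below assumes the standard ones: quantum cross entropy $S(\rho, \sigma) = -\operatorname{Tr}(\rho \log \sigma)$, von Neumann entropy $S(\rho) = -\operatorname{Tr}(\rho \log \rho)$, and quantum relative entropy $D(\rho \,\|\, \sigma) = \operatorname{Tr}(\rho \log \rho) - \operatorname{Tr}(\rho \log \sigma)$. The notation is an assumption of this sketch, not taken from the paper.

```latex
% Assumed standard definitions; the notation is not taken from the abstract.
\begin{align*}
S(\rho, \sigma)
  &= -\operatorname{Tr}(\rho \log \sigma) \\
  &= -\operatorname{Tr}(\rho \log \rho)
     + \bigl[\operatorname{Tr}(\rho \log \rho) - \operatorname{Tr}(\rho \log \sigma)\bigr] \\
  &= S(\rho) + D(\rho \,\|\, \sigma) \;\ge\; S(\rho).
\end{align*}
```

The inequality is Klein's inequality, $D(\rho \,\|\, \sigma) \ge 0$, with equality exactly when $\sigma = \rho$. Under these definitions, minimising the cross entropy over $\sigma$ therefore recovers $S(\rho)$.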
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm. Researchers have made considerable progress in constructing BMs and applying them in many fields. At present, however, there is a lack of work that surveys the overall progress of BMs and guides follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and the applications of BMs, dividing the review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics across those four parts: Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory & Interpretability, Commonsense Reasoning, Reliability & Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. For each topic, we summarize the current studies and propose some future research directions. At the end of the paper, we discuss the further development of BMs from a more general perspective.
In 1954, Alston S. Householder published Principles of Numerical Analysis, one of the first modern treatments of matrix decomposition, which favored a (block) LU decomposition, that is, the factorization of a matrix into the product of lower and upper triangular matrices. Today, matrix decomposition has become a core technology in machine learning, largely due to the development of the back propagation algorithm for fitting neural networks. The sole aim of this survey is to give a self-contained introduction to concepts and mathematical tools in numerical linear algebra and matrix analysis in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. However, we cannot cover all the useful and interesting results concerning matrix decomposition, and given the limited scope of this discussion we omit topics such as the separate analysis of Euclidean space, Hermitian space, Hilbert space, and the complex domain. We refer the reader to the linear algebra literature for a more detailed introduction to these related fields.
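To make the LU decomposition described above concrete, here is a minimal sketch of the textbook Doolittle scheme. It is not a block variant and performs no pivoting, so it assumes a square matrix whose pivots are all nonzero. The function names `lu_decompose` and `matmul` are chosen for this example only.

```python
def lu_decompose(A):
    """Doolittle LU factorisation without pivoting.

    Returns (L, U) with L unit lower triangular and U upper triangular
    such that A = L * U. Assumes A is square with nonzero pivots.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[float(x) for x in row] for row in A]  # working copy, reduced in place
    for k in range(n):
        if U[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; a pivoting strategy would be needed")
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]        # multiplier that eliminates U[i][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]   # row_i <- row_i - multiplier * row_k
    return L, U


def matmul(X, Y):
    """Plain triple-loop matrix product, used only to check the factorisation."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


if __name__ == "__main__":
    A = [[4, 3], [6, 3]]
    L, U = lu_decompose(A)
    print(L)             # lower triangular factor
    print(U)             # upper triangular factor
    print(matmul(L, U))  # should reproduce A
```

In practice a pivoted routine from a numerical library would be preferable, since this version fails on matrices such as `[[0, 1], [1, 0]]` that are perfectly invertible.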
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality ('late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer-based architecture that uses 'fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and only share what is necessary. We find that such a strategy improves fusion performance while reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including AudioSet, Epic-Kitchens and VGGSound. All code and models will be released.
Modern neural network training relies heavily on data augmentation for improved generalization. After the initial success of label-preserving augmentations, there has been a recent surge of interest in label-perturbing approaches, which combine features and labels across training samples to smooth the learned decision surface. In this paper, we propose a new augmentation method that leverages the first and second moments extracted and re-injected by feature normalization. We replace the moments of the learned features of one training image by those of another, and also interpolate the target labels. As our approach is fast, operates entirely in feature space, and mixes different signals than prior methods, one can effectively combine it with existing augmentation methods. We demonstrate its efficacy across benchmark data sets in computer vision, speech, and natural language processing, where it consistently improves the generalization performance of highly competitive baseline networks.
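The moment replacement and label interpolation described above can be sketched on a single flattened feature vector. The abstract does not specify the normalization layer, the mixing weight, or any names, so `moment_exchange`, `lam`, and `eps` below are hypothetical, and the per-vector mean and standard deviation stand in for whatever moments the actual feature normalization computes.

```python
import math


def moments(x, eps=1e-5):
    """First and second moments of a feature vector: mean and standard deviation.
    `eps` is a small stabiliser, as is common in normalization layers."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return mu, math.sqrt(var + eps)


def moment_exchange(h_a, h_b, y_a, y_b, lam=0.9):
    """Normalise the features of sample A, re-inject the moments of sample B,
    and interpolate the two (one-hot or soft) label vectors with weight `lam`."""
    mu_a, sigma_a = moments(h_a)
    mu_b, sigma_b = moments(h_b)
    # Remove A's moments, then scale and shift with B's moments.
    h_mix = [sigma_b * (v - mu_a) / sigma_a + mu_b for v in h_a]
    # Label interpolation: weight lam on A's label, 1 - lam on B's label.
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return h_mix, y_mix


if __name__ == "__main__":
    h_a, y_a = [1.0, 2.0, 3.0, 4.0], [1.0, 0.0]
    h_b, y_b = [10.0, 20.0, 30.0, 40.0], [0.0, 1.0]
    h_mix, y_mix = moment_exchange(h_a, h_b, y_a, y_b)
    print(h_mix, y_mix)
```

The mixed features keep the normalized pattern of sample A but carry the mean and scale of sample B, which is why the target is a blend of both labels.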
Advanced methods of applying deep learning to structured data such as graphs have been proposed in recent years. In particular, studies have focused on generalizing convolutional neural networks to graph data, which includes redefining the convolution and the downsampling (pooling) operations for graphs. The method of generalizing the convolution operation to graphs has been proven to improve performance and is widely used. However, the method of applying downsampling to graphs is still difficult to perform and has room for improvement. In this paper, we propose a graph pooling method based on self-attention. Self-attention using graph convolution allows our pooling method to consider both node features and graph topology. To ensure a fair comparison, the same training procedures and model architectures were used for the existing pooling methods and our method. The experimental results demonstrate that our method achieves superior graph classification performance on the benchmark datasets using a reasonable number of parameters.
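One way to read "self-attention using graph convolution" is sketched below: a single graph-convolution step produces one attention score per node, and only the highest-scoring nodes are kept. The abstract gives no formulas, so every detail here is an assumption: the symmetric normalization with self-loops, the `tanh` activation, the pooling `ratio`, the gating of kept features by their scores, and the names `sag_pool` and `theta`.

```python
import math


def sag_pool(adj, X, theta, ratio=0.5):
    """Sketch of attention-based top-k graph pooling (assumed form, see lead-in).

    adj   : n x n symmetric 0/1 adjacency matrix (list of lists)
    X     : n x f node feature matrix
    theta : length-f projection vector (would be learned in a real model)
    """
    n = len(adj)
    # Add self-loops so each node's own features contribute to its score.
    A = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A]
    # Project each node's features to a scalar: X @ theta.
    proj = [sum(x * t for x, t in zip(X[i], theta)) for i in range(n)]
    # One graph-convolution step with symmetric normalization, then tanh.
    scores = [
        math.tanh(sum(A[i][j] / math.sqrt(deg[i] * deg[j]) * proj[j] for j in range(n)))
        for i in range(n)
    ]
    # Keep the ceil(ratio * n) highest-scoring nodes, in original index order.
    k = max(1, math.ceil(ratio * n))
    keep = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    # Gate the kept features by their scores and take the induced subgraph.
    X_out = [[v * scores[i] for v in X[i]] for i in keep]
    adj_out = [[adj[i][j] for j in keep] for i in keep]
    return keep, X_out, adj_out


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3 with one feature per node.
    adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    X = [[1.0], [2.0], [-1.0], [0.5]]
    print(sag_pool(adj, X, theta=[1.0]))
```

Because the score of a node depends on its neighbours' projected features as well as its own, the selection reflects both node features and graph topology, which is the property the abstract emphasises.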