Stochastic memoization is a higher-order construct of probabilistic programming languages that is key in Bayesian nonparametrics, a modular approach that allows us to extend models beyond their parametric limitations and compose them in an elegant and principled manner. Stochastic memoization is simple and useful in practice, but semantically elusive, particularly regarding dataflow transformations. As the naive implementation resorts to the state monad, which is not commutative, it is not clear if stochastic memoization preserves the dataflow property -- i.e., whether we can reorder the lines of a program without changing its semantics, provided the dataflow graph is preserved. In this paper, we give an operational and categorical semantics to stochastic memoization and name generation in the context of a minimal probabilistic programming language, for a restricted class of functions. Our contribution is a first model of stochastic memoization of constant Bernoulli functions with a non-enumerable type, which validates data flow transformations, bridging the gap between traditional probability theory and higher-order probability models. Our model uses a presheaf category and a novel probability monad on it.
High-dimensional data are routinely collected in many areas. We are particularly interested in Bayesian classification models in which one or more variables are imbalanced. Current Markov chain Monte Carlo algorithms for posterior computation are inefficient as $n$ and/or $p$ increase due to worsening time per step and mixing rates. One strategy is to use a gradient-based sampler to improve mixing while using data sub-samples to reduce per-step computational complexity. However, usual sub-sampling breaks down when applied to imbalanced data. Instead, we generalize piece-wise deterministic Markov chain Monte Carlo algorithms to include importance-weighted and mini-batch sub-sampling. These approaches maintain the correct stationary distribution with arbitrarily small sub-samples, and substantially outperform current competitors. We provide theoretical support and illustrate gains in simulated and real data applications.
Covariance matrices of random vectors contain information that is crucial for modelling. Certain structures and patterns of the covariances (or correlations) may be used to justify parametric models, e.g., autoregressive models. Until now, there have been only few approaches for testing such covariance structures systematically and in a unified way. In the present paper, we propose such a unified testing procedure, and we will exemplify the approach with a large variety of covariance structure models. This includes common structures such as diagonal matrices, Toeplitz matrices, and compound symmetry but also the more involved autoregressive matrices. We propose hypothesis tests for these structures, and we use bootstrap techniques for better small-sample approximation. The structures of the proposed tests invite for adaptations to other covariance patterns by choosing the hypothesis matrix appropriately. We prove their correctness for large sample sizes. The proposed methods require only weak assumptions. With the help of a simulation study, we assess the small sample properties of the tests. We also analyze a real data set to illustrate the application of the procedure.
We consider scalar semilinear elliptic PDEs, where the nonlinearity is strongly monotone, but only locally Lipschitz continuous. To linearize the arising discrete nonlinear problem, we employ a damped Zarantonello iteration, which leads to a linear Poisson-type equation that is symmetric and positive definite. The resulting system is solved by a contractive algebraic solver such as a multigrid method with local smoothing. We formulate a fully adaptive algorithm that equibalances the various error components coming from mesh refinement, iterative linearization, and algebraic solver. We prove that the proposed adaptive iteratively linearized finite element method (AILFEM) guarantees convergence with optimal complexity, where the rates are understood with respect to the overall computational cost (i.e., the computational time). Numerical experiments investigate the involved adaptivity parameters.
The variational quantum algorithms are crucial for the application of NISQ computers. Such algorithms require short quantum circuits, which are more amenable to implementation on near-term hardware, and many such methods have been developed. One of particular interest is the so-called variational quantum state diagonalization method, which constitutes an important algorithmic subroutine and can be used directly to work with data encoded in quantum states. In particular, it can be applied to discern the features of quantum states, such as entanglement properties of a system, or in quantum machine learning algorithms. In this work, we tackle the problem of designing a very shallow quantum circuit, required in the quantum state diagonalization task, by utilizing reinforcement learning (RL). We use a novel encoding method for the RL-state, a dense reward function, and an $\epsilon$-greedy policy to achieve this. We demonstrate that the circuits proposed by the reinforcement learning methods are shallower than the standard variational quantum state diagonalization algorithm and thus can be used in situations where hardware capabilities limit the depth of quantum circuits. The methods we propose in the paper can be readily adapted to address a wide range of variational quantum algorithms.
Adversarial generative models, such as Generative Adversarial Networks (GANs), are widely applied for generating various types of data, i.e., images, text, and audio. Accordingly, its promising performance has led to the GAN-based adversarial attack methods in the white-box and black-box attack scenarios. The importance of transferable black-box attacks lies in their ability to be effective across different models and settings, more closely aligning with real-world applications. However, it remains challenging to retain the performance in terms of transferable adversarial examples for such methods. Meanwhile, we observe that some enhanced gradient-based transferable adversarial attack algorithms require prolonged time for adversarial sample generation. Thus, in this work, we propose a novel algorithm named GE-AdvGAN to enhance the transferability of adversarial samples whilst improving the algorithm's efficiency. The main approach is via optimising the training process of the generator parameters. With the functional and characteristic similarity analysis, we introduce a novel gradient editing (GE) mechanism and verify its feasibility in generating transferable samples on various models. Moreover, by exploring the frequency domain information to determine the gradient editing direction, GE-AdvGAN can generate highly transferable adversarial samples while minimizing the execution time in comparison to the state-of-the-art transferable adversarial attack algorithms. The performance of GE-AdvGAN is comprehensively evaluated by large-scale experiments on different datasets, which results demonstrate the superiority of our algorithm. The code for our algorithm is available at: //github.com/LMBTough/GE-advGAN
Spatial regression models are central to the field of spatial statistics. Nevertheless, their estimation in case of large and irregular gridded spatial datasets presents considerable computational challenges. To tackle these computational problems, Arbia \citep{arbia_2014_pairwise} introduced a pseudo-likelihood approach (called pairwise likelihood, say PL) which required the identification of pairs of observations that are internally correlated, but mutually conditionally uncorrelated. However, while the PL estimators enjoy optimal theoretical properties, their practical implementation when dealing with data observed on irregular grids suffers from dramatic computational issues (connected with the identification of the pairs of observations) that, in most empirical cases, negatively counter-balance its advantages. In this paper we introduce an algorithm specifically designed to streamline the computation of the PL in large and irregularly gridded spatial datasets, dramatically simplifying the estimation phase. In particular, we focus on the estimation of Spatial Error models (SEM). Our proposed approach, efficiently pairs spatial couples exploiting the KD tree data structure and exploits it to derive the closed-form expressions for fast parameter approximation. To showcase the efficiency of our method, we provide an illustrative example using simulated data, demonstrating the computational advantages if compared to a full likelihood inference are not at the expenses of accuracy.
The frontier of quantum computing (QC) simulation on classical hardware is quickly reaching the hard scalability limits for computational feasibility. Nonetheless, there is still a need to simulate large quantum systems classically, as the Noisy Intermediate Scale Quantum (NISQ) devices are yet to be considered fault tolerant and performant enough in terms of operations per second. Each of the two main exact simulation techniques, state vector and tensor network simulators, boasts specific limitations. The exponential memory requirement of state vector simulation, when compared to the qubit register sizes of currently available quantum computers, quickly saturates the capacity of the top HPC machines currently available. Tensor network contraction approaches, which encode quantum circuits into tensor networks and then contract them over an output bit string to obtain its probability amplitude, still fall short of the inherent complexity of finding an optimal contraction path, which maps to a max-cut problem on a dense mesh, a notably NP-hard problem. This article aims at investigating the limits of current state-of-the-art simulation techniques on a test bench made of eight widely used quantum subroutines, each in 31 different configurations, with special emphasis on performance. We then correlate the performance measures of the simulators with the metrics that characterise the benchmark circuits, identifying the main reasons behind the observed performance trend. From our observations, given the structure of a quantum circuit and the number of qubits, we highlight how to select the best simulation strategy, obtaining a speedup of up to an order of magnitude.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
Graph representation learning for hypergraphs can be used to extract patterns among higher-order interactions that are critically important in many real world problems. Current approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic for various learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention based graph neural network called Hyper-SAGNN applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms the state-of-the-art methods on traditional tasks while also achieving great performance on a new task called outsider identification. Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.