The simulation of geological facies in an unobservable volume is essential in various geoscience applications. Given the complexity of the problem, deep generative learning is a promising approach to overcome the limitations of traditional geostatistical simulation models, in particular their lack of physical realism. This research investigates the application of generative adversarial networks and deep variational inference for conditionally simulating meandering channels in underground volumes. In this paper, we review generative deep learning approaches, in particular the adversarial ones and the stabilization techniques that aim to facilitate their training. The proposed approach is tested on 2D and 3D simulations generated by the stochastic process-based model Flumy. Morphological metrics are used to compare our proposed method with earlier iterations of generative adversarial networks. The results indicate that, with recent stabilization techniques, generative adversarial networks can efficiently sample from target data distributions. Moreover, we demonstrate the ability to generate conditional simulations through the latent variable model property of the proposed approach.
Covariance matrices of random vectors contain information that is crucial for modelling. Certain structures and patterns of the covariances (or correlations) may be used to justify parametric models, e.g., autoregressive models. Until now, there have been only a few approaches for testing such covariance structures systematically and in a unified way. In the present paper, we propose such a unified testing procedure, and we exemplify the approach with a large variety of covariance structure models. This includes common structures such as diagonal matrices, Toeplitz matrices, and compound symmetry but also the more involved autoregressive matrices. We propose hypothesis tests for these structures, and we use bootstrap techniques for better small-sample approximation. The structure of the proposed tests invites adaptation to other covariance patterns by choosing the hypothesis matrix appropriately. We prove their correctness for large sample sizes. The proposed methods require only weak assumptions. With the help of a simulation study, we assess the small-sample properties of the tests. We also analyze a real data set to illustrate the application of the procedure.
Computer simulations (a.k.a. white-box models) are more indispensable than ever to model intricate engineering systems. However, computational models alone often fail to fully capture the complexities of reality. When physical experiments are accessible, though, it is of interest to enhance the incomplete information offered by computational models. Gray-box modeling is concerned with the problem of merging information from data-driven (a.k.a. black-box) models and white-box (i.e., physics-based) models. In this paper, we propose to perform this task by using multi-fidelity surrogate models (MFSMs). An MFSM integrates information from models with varying computational fidelity into a new surrogate model. The multi-fidelity surrogate modeling framework we propose handles noise-contaminated data and is able to estimate the underlying noise-free high-fidelity function. Our methodology emphasizes delivering precise estimates of the uncertainty in its predictions in the form of confidence and prediction intervals, by quantitatively incorporating the different types of uncertainty that affect the problem, arising from measurement noise and from lack of knowledge due to the limited experimental design budget on both the high- and low-fidelity models. Applied to gray-box modeling, our MFSM framework treats noisy experimental data as the high-fidelity model and the white-box computational models as their low-fidelity counterparts. The effectiveness of our methodology is showcased through synthetic examples and a wind turbine application.
Nurmuhammad et al. developed the Sinc-Nystr\"{o}m methods for initial value problems in which the solutions exhibit exponential decay end behavior. In these methods, the Single-Exponential (SE) transformation or the Double-Exponential (DE) transformation is combined with the Sinc approximation. Hara and Okayama improved on these transformations to attain a better convergence rate, which was later supported by theoretical error analyses. However, these methods have a computational drawback owing to the inclusion of a special function in the basis functions. To address this issue, Okayama and Hara proposed Sinc-collocation methods, which do not include any special function in the basis functions. This study conducts error analyses of these methods.
Variational quantum algorithms are crucial for the application of NISQ computers. Such algorithms require short quantum circuits, which are more amenable to implementation on near-term hardware, and many such methods have been developed. One of particular interest is the so-called variational quantum state diagonalization method, which constitutes an important algorithmic subroutine and can be used directly to work with data encoded in quantum states. In particular, it can be applied to discern the features of quantum states, such as entanglement properties of a system, or in quantum machine learning algorithms. In this work, we tackle the problem of designing a very shallow quantum circuit, required in the quantum state diagonalization task, by utilizing reinforcement learning (RL). We use a novel encoding method for the RL-state, a dense reward function, and an $\epsilon$-greedy policy to achieve this. We demonstrate that the circuits proposed by the reinforcement learning methods are shallower than those of the standard variational quantum state diagonalization algorithm and thus can be used in situations where hardware capabilities limit the depth of quantum circuits. The methods we propose in the paper can be readily adapted to address a wide range of variational quantum algorithms.
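As an illustration of the $\epsilon$-greedy policy mentioned above, here is a minimal sketch. The abstract does not specify the action set or the value estimates, so `q_values` (one estimated value per candidate action, e.g. per candidate gate) and the function name are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index chosen by an epsilon-greedy rule.

    q_values: hypothetical list of estimated action values.
    epsilon:  probability of picking a uniformly random action (exploration).
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon = 0` the rule is purely greedy; larger values trade exploitation for exploration of alternative circuit choices.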
A countable structure is indivisible if for every coloring with finite range there is a monochromatic isomorphic subcopy of the structure. Each indivisible structure $\mathcal{S}$ naturally corresponds to an indivisibility problem $\mathsf{Ind}\ \mathcal{S}$, which outputs such a subcopy given a presentation and coloring. We investigate the Weihrauch complexity of the indivisibility problems for two structures: the rational numbers $\mathbb{Q}$ as a linear order, and the equivalence relation $\mathscr{E}$ with countably many equivalence classes each having countably many members. We separate the Weihrauch degrees of both $\mathsf{Ind}\ \mathbb{Q}$ and $\mathsf{Ind}\ \mathscr{E}$ from several benchmark problems, showing in particular that $\mathsf{C}_\mathbb{N} \vert_\mathrm{W} \mathsf{Ind}\ \mathbb{Q}$ and hence $\mathsf{Ind}\ \mathbb{Q}$ is strictly weaker than the problem of finding an interval in which some color is dense for a given coloring of $\mathbb{Q}$; and that the Weihrauch degree of $\mathsf{Ind}\ \mathscr{E}_k$ is strictly between those of $\mathsf{SRT}^2_k$ and $\mathsf{RT}^2_k$, where $\mathsf{Ind}\ \mathcal{S}_k$ is the restriction of $\mathsf{Ind}\ \mathcal{S}$ to $k$-colorings.
The frontier of quantum computing (QC) simulation on classical hardware is quickly reaching the hard scalability limits for computational feasibility. Nonetheless, there is still a need to simulate large quantum systems classically, as the Noisy Intermediate Scale Quantum (NISQ) devices are yet to be considered fault tolerant and performant enough in terms of operations per second. Each of the two main exact simulation techniques, state vector and tensor network simulators, has specific limitations. The exponential memory requirement of state vector simulation, when compared to the qubit register sizes of currently available quantum computers, quickly saturates the capacity of the top HPC machines currently available. Tensor network contraction approaches, which encode quantum circuits into tensor networks and then contract them over an output bit string to obtain its probability amplitude, are still limited by the inherent complexity of finding an optimal contraction path, which maps to a max-cut problem on a dense mesh, a known NP-hard problem. This article investigates the limits of current state-of-the-art simulation techniques on a test bench made of eight widely used quantum subroutines, each in 31 different configurations, with special emphasis on performance. We then correlate the performance measures of the simulators with the metrics that characterise the benchmark circuits, identifying the main reasons behind the observed performance trend. From our observations, given the structure of a quantum circuit and the number of qubits, we highlight how to select the best simulation strategy, obtaining a speedup of up to an order of magnitude.
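The exponential memory requirement of state vector simulation can be made concrete with a short calculation. The sketch below assumes one double-precision complex amplitude (16 bytes) per basis state; the function name and the default precision are illustrative assumptions, not taken from the article.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed to store a full state vector of n_qubits qubits.

    Assumes 2**n_qubits amplitudes, each stored as a complex number of
    bytes_per_amplitude bytes (16 for double-precision complex).
    """
    return (2 ** n_qubits) * bytes_per_amplitude


GIB = 1024 ** 3
for n in (30, 40, 50):
    print(f"{n} qubits: {statevector_bytes(n) / GIB:,.0f} GiB")
```

Under this assumption, 30 qubits already require 16 GiB, and every additional qubit doubles the requirement.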
We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain in accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
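The conditional utilization rate described above can be sketched as a simple accuracy difference. This is only an illustration under an assumption: the abstract states that the rate is a gain in accuracy but not how it is normalized, so the plain difference below, the function name, and the example accuracies are hypothetical.

```python
def conditional_utilization_rate(acc_both, acc_other_only):
    """Accuracy gained by adding one modality on top of the other.

    acc_both:       accuracy of the model using both modalities.
    acc_other_only: accuracy when only the other modality is available.
    Assumption: the gain is taken as a plain difference; the paper may
    normalize it differently.
    """
    return acc_both - acc_other_only


# Hypothetical accuracies for a two-modality model (modalities A and B).
acc_both, acc_a_only, acc_b_only = 0.90, 0.88, 0.55
rate_b_given_a = conditional_utilization_rate(acc_both, acc_a_only)
rate_a_given_b = conditional_utilization_rate(acc_both, acc_b_only)
print(rate_b_given_a, rate_a_given_b)
```

In this made-up example, adding modality B to A gains little while adding A to B gains much more, which is the kind of imbalance the abstract attributes to greedy learning.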
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
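A toy case of the implicit regularization mentioned above: for an underdetermined linear problem, gradient descent started at zero converges to the interpolating solution of minimal Euclidean norm. The sketch below uses a single training example with two weights; the setup, step size, and function names are illustrative assumptions rather than the survey's own examples.

```python
def gradient_descent_min_norm(x, y, lr=0.1, steps=200):
    """Minimize 0.5 * (x.w - y)**2 by gradient descent, starting from w = 0."""
    w = [0.0] * len(x)
    for _ in range(steps):
        residual = sum(xi * wi for xi, wi in zip(x, w)) - y
        # The gradient is residual * x, so w never leaves the span of x.
        w = [wi - lr * residual * xi for wi, xi in zip(w, x)]
    return w


def min_norm_solution(x, y):
    """Closed-form minimum-norm solution of x.w = y: w = y * x / ||x||^2."""
    sq_norm = sum(xi * xi for xi in x)
    return [y * xi / sq_norm for xi in x]


x, y = [1.0, 2.0], 5.0
w_gd = gradient_descent_min_norm(x, y)
w_star = min_norm_solution(x, y)
print(w_gd, w_star)
```

Other weights, such as `[5.0, 0.0]`, also fit the single example exactly but have a larger norm; gradient descent from zero selects the minimum-norm interpolant without any explicit penalty.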
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.