We present the deep neural network multigrid solver (DNN-MG) that we develop for the instationary Navier-Stokes equations. DNN-MG improves computational efficiency using a judicious combination of a geometric multigrid solver and a recurrent neural network with memory. DNN-MG uses the multigrid method to classically solve on coarse levels while the neural network corrects interpolated solutions on fine ones, thus avoiding the increasingly expensive computations that would have to be performed there. This results in a reduction in computation time through DNN-MG's highly compact neural network. The compactness results from its design for local patches and the available coarse multigrid solutions that provide a "guide" for the corrections. A compact neural network with a small number of parameters also reduces training time and the amount of training data required. Furthermore, the network's locality facilitates generalizability and allows one to use DNN-MG trained on one mesh domain also on different ones. We demonstrate the efficacy of DNN-MG for variations of the 2D laminar flow around an obstacle. For these, our method significantly improves the solutions as well as lift and drag functionals while requiring only about half the computation time of a full multigrid solution. We also show that DNN-MG trained for the configuration with one obstacle can be generalized to other time-dependent problems that can be solved efficiently using a geometric multigrid method.
Applying the concept of S-convergence, based on averaging in the spirit of the Strong Law of Large Numbers, the vanishing viscosity solutions of the Euler system are studied. We show how to efficiently compute a viscosity solution of the Euler system as the S-limit of numerical solutions obtained by the Viscosity Finite Volume method. Theoretical results are illustrated by numerical simulations of the Kelvin--Helmholtz instability problem.
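The averaging idea behind this abstract can be illustrated with plain Cesàro (running) means. The abstract does not give the precise definition of S-convergence, so the sketch below only shows the general mechanism: a sequence that oscillates and has no limit can still have convergent averages. The function name `cesaro_means` and the toy sequence are assumptions made for illustration, not part of the cited method.

```python
import numpy as np

def cesaro_means(seq):
    """Running averages (1/N) * sum_{n<=N} seq[n] along the first axis.

    Illustrative helper only: each entry of `seq` stands in for one
    numerical solution (a scalar or an array of cell values).
    """
    seq = np.asarray(seq, dtype=float)
    counts = np.arange(1, seq.shape[0] + 1, dtype=float)
    # Reshape the counts so they broadcast over any trailing axes.
    counts = counts.reshape((-1,) + (1,) * (seq.ndim - 1))
    return np.cumsum(seq, axis=0) / counts

# Toy oscillating sequence 1, -1, 1, -1, ... which has no limit itself.
oscillating = (-1.0) ** np.arange(1000)
averages = cesaro_means(oscillating)
```

Here `averages` tends to zero even though `oscillating` never settles, which is the kind of behaviour that averaging-based notions of convergence are designed to capture.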
Classical interior penalty discontinuous Galerkin (IPDG) methods for diffusion problems require a number of assumptions on the local variation of mesh-size, polynomial degree, and of the diffusion coefficient to determine the values of the so-called discontinuity-penalization parameter and/or to perform error analysis. Variants of IPDG methods involving weighted averages of the gradient of the approximate solution have been proposed in the context of high-contrast diffusion coefficients to mitigate the dependence on the contrast in the stability and in the error analysis. Here, we present a new IPDG method, involving carefully constructed weighted averages of the gradient of the approximate solution, which is shown to be robust even for the most extreme simultaneous local mesh, polynomial degree and diffusion coefficient variation scenarios, without resulting in unreasonably large penalization. The new method, henceforth termed \emph{robust IPDG} (RIPDG), typically offers significantly better conditioning than the standard IPDG method when applied to scenarios with strong mesh/polynomial degree/diffusion local variation. On the other hand, when using uniform meshes, constant polynomial degree and for problems with constant diffusion coefficients, the RIPDG method is identical to the classical IPDG. Numerical experiments indicate the favourable performance of the new RIPDG method over the classical version in terms of conditioning and error.
Anomalous behavior is ubiquitous in subsurface solute transport due to the presence of high degrees of heterogeneity at different scales in the media. Although fractional models have been extensively used to describe the anomalous transport in various subsurface applications, their application is hindered by computational challenges. Simpler nonlocal models characterized by integrable kernels and finite interaction length represent a computationally feasible alternative to fractional models; yet, the informed choice of their kernel functions still remains an open problem. We propose a general data-driven framework for the discovery of optimal kernels on the basis of very small and sparse data sets in the context of anomalous subsurface transport. Using spatially sparse breakthrough curves recovered from fine-scale particle-density simulations, we learn the best coarse-scale nonlocal model using a nonlocal operator regression technique. Predictions of the breakthrough curves obtained using the optimal nonlocal model show good agreement with fine-scale simulation results even at locations and time intervals different from the ones used to train the kernel, confirming the excellent generalization properties of the proposed algorithm. A comparison with trained classical models and with black-box deep neural networks confirms the superiority of the predictive capability of the proposed model.
Learning mappings between two function spaces has attracted considerable research attention. However, learning the solution operator of partial differential equations (PDEs) remains a challenge in scientific computing. Therefore, in this study, we propose a novel pseudo-differential integral operator (PDIO) inspired by a pseudo-differential operator, which is a generalization of a differential operator and is characterized by a certain symbol. We parameterize the symbol by using a neural network and show that the neural-network-based symbol is contained in a smooth symbol class. Subsequently, we prove that the PDIO is a bounded linear operator, and thus is continuous in the Sobolev space. We combine the PDIO with the neural operator to develop a pseudo-differential neural operator (PDNO) to learn the nonlinear solution operator of PDEs. We experimentally validate the effectiveness of the proposed model by using Burgers' equation, Darcy flow, and the Navier-Stokes equation. The results reveal that the proposed PDNO outperforms the existing neural operator approaches in most experiments.
High-order entropy-stable discontinuous Galerkin methods for the compressible Euler and Navier-Stokes equations require the positivity of thermodynamic quantities in order to guarantee their well-posedness. In this work, we introduce a positivity limiting strategy for entropy-stable discontinuous Galerkin discretizations based on convex limiting. The key ingredient in the limiting procedure is a low order positivity-preserving discretization based on graph viscosity terms. The proposed limiting strategy is both positivity preserving and discretely entropy-stable for the compressible Euler and Navier-Stokes equations. Numerical experiments confirm the high order accuracy and robustness of the proposed strategy.
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
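The mapping from continuous real values onto a fixed discrete set described above can be sketched with uniform (affine) quantization. This is a minimal illustration of one common scheme, not the survey's own taxonomy or any particular library's API; the function names, the min/max calibration, and the choice of an unsigned 4-bit grid stored in `uint8` containers are all assumptions.

```python
import numpy as np

def quantize_uniform(x, num_bits=4):
    """Map a float array onto the integer grid {0, ..., 2**num_bits - 1}.

    Assumed scheme: asymmetric uniform quantization calibrated from the
    minimum and maximum of x. Values are stored unpacked in uint8.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    # Step size between neighbouring grid points; guard constant inputs.
    scale = (x_max - x_min) / (qmax - qmin) if x_max > x_min else 1.0
    # Integer that represents the real value 0 after the shift.
    zero_point = int(round(-x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return q.astype(np.uint8), scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate real values from the integer codes."""
    return scale * (q.astype(np.float32) - zero_point)

rng = np.random.default_rng(0)
weights = rng.normal(size=1000).astype(np.float32)
codes, scale, zero_point = quantize_uniform(weights, num_bits=4)
recovered = dequantize(codes, scale, zero_point)
max_error = float(np.max(np.abs(recovered - weights)))
```

With this scheme the reconstruction error per value is governed by the step size `scale`, which makes the trade-off explicit: fewer bits mean a coarser grid and a larger `scale`.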
We present an end-to-end CNN architecture for fine-grained visual recognition called Collaborative Convolutional Network (CoCoNet). The network uses a collaborative filter after the convolutional layers to represent an image as an optimal weighted collaboration of features learned from training samples as a whole rather than one at a time. This gives CoCoNet more power to encode the fine-grained nature of the data with limited samples in an end-to-end fashion. We perform a detailed study of the performance with 1-stage and 2-stage transfer learning and different configurations with benchmark architectures like AlexNet and VggNet. The ablation study shows that the proposed method outperforms its constituent parts considerably and consistently. CoCoNet also outperforms the popular baseline deep-learning-based fine-grained recognition method, namely Bilinear-CNN (BCNN), with statistical significance. Experiments have been performed on the fine-grained species recognition problem, but the method is general enough to be applied to other similar tasks. Lastly, we also introduce a new public dataset for fine-grained species recognition, that of Indian endemic birds, and report initial results on it. The training metadata and new dataset are available through the corresponding author.
We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
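The idea of parameterizing the derivative of the hidden state and handing it to an ODE solver can be sketched as follows. This is a minimal illustration under stated assumptions: a fixed-step forward Euler loop stands in for the black-box solver, the two-layer `mlp_dynamics` network and its parameter shapes are hypothetical, and the gradient computation through the solver mentioned in the abstract is not shown.

```python
import numpy as np

def mlp_dynamics(h, t, params):
    """Hypothetical small network f(h, t; params) returning dh/dt."""
    W1, b1, W2, b2 = params
    inputs = np.concatenate([h, [t]])  # feed time in as an extra feature
    return W2 @ np.tanh(W1 @ inputs + b1) + b2

def odeint_euler(f, h0, t0, t1, params, steps=100):
    """Integrate dh/dt = f(h, t, params) from t0 to t1 with forward Euler.

    Stand-in for a black-box solver; an adaptive solver would choose its
    own step sizes per input instead of using a fixed number of steps.
    """
    h = np.array(h0, dtype=float)
    t = float(t0)
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h, t, params)
        t += dt
    return h

rng = np.random.default_rng(0)
dim, hidden = 2, 8
params = (
    rng.normal(scale=0.5, size=(hidden, dim + 1)),
    np.zeros(hidden),
    rng.normal(scale=0.5, size=(dim, hidden)),
    np.zeros(dim),
)
# The "output of the network" is the hidden state at the final time.
h_final = odeint_euler(mlp_dynamics, np.array([1.0, -1.0]), 0.0, 1.0, params)
```

Changing `steps` trades accuracy for compute without changing `params`, which is a simple analogue of the precision-for-speed trade-off described above.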
Why deep neural networks (DNNs) capable of overfitting often generalize well in practice is a mystery in deep learning. Existing works indicate that this observation holds for both complicated real datasets and simple datasets of one-dimensional (1-d) functions. In this work, for natural images and low-frequency dominant 1-d functions, we empirically found that a DNN with common settings first quickly captures the dominant low-frequency components, and then relatively slowly captures high-frequency ones. We call this phenomenon the Frequency Principle (F-Principle). The F-Principle can be observed over various DNN setups of different activation functions, layer structures and training algorithms in our experiments. The F-Principle can be used to understand (i) the behavior of DNN training in the information plane and (ii) why DNNs often generalize well despite their ability to overfit. The F-Principle can potentially provide insights into the general principle underlying DNN optimization and generalization for real datasets.
For neural networks (NNs) with rectified linear unit (ReLU) or binary activation functions, we show that their training can be accomplished in a reduced parameter space. Specifically, the weights in each neuron can be trained on the unit sphere, as opposed to the entire space, and the threshold can be trained in a bounded interval, as opposed to the real line. We show that the NNs in the reduced parameter space are mathematically equivalent to the standard NNs with parameters in the whole space. The reduced parameter space should facilitate the optimization procedure for the network training, as the search space becomes (much) smaller. We demonstrate the improved training performance using numerical examples.
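The unit-sphere part of this claim follows from the positive homogeneity of ReLU: for any scale s > 0, relu(s z) = s relu(z). A minimal sketch of that rescaling is given below, assuming a single hidden neuron of the form c * relu(w . x - b) whose output weight c can absorb the norm of w. The function names are hypothetical, and the restriction of the threshold to a bounded interval (which depends on the input domain) is not shown.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def neuron(x, w, b, c):
    """Single hidden neuron with output weight: c * relu(x . w - b)."""
    return c * relu(x @ w - b)

def to_unit_sphere(w, b, c):
    """Rescale (w, b, c) so that w has unit norm and the output is unchanged.

    Uses relu(s * z) = s * relu(z) for s > 0 with s = ||w||.
    Assumes w is not the zero vector.
    """
    norm = np.linalg.norm(w)
    return w / norm, b / norm, c * norm

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))   # five sample inputs in R^3
w = rng.normal(size=3) * 4.0  # weights anywhere in R^3
b, c = 0.7, -1.3

w_hat, b_hat, c_hat = to_unit_sphere(w, b, c)
original = neuron(x, w, b, c)
reduced = neuron(x, w_hat, b_hat, c_hat)
```

The two outputs agree up to floating-point error, so the direction `w_hat` on the unit sphere together with the rescaled threshold and output weight represents the same function as the original parameters.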