In the present work, advanced spatial and temporal discretization techniques are tailored to hyperelastic physics-augmented neural networks, i.e., neural network based constitutive models which fulfill all relevant mechanical conditions of hyperelasticity by construction. The framework takes into account the structure of neural network-based constitutive models, in particular, that their derivatives are more complex compared to analytical models. The proposed framework allows for convenient mixed Hu-Washizu like finite element formulations applicable to nearly incompressible material behavior. The key feature of this work is a tailored energy-momentum scheme for time discretization, which allows for energy and momentum preserving dynamical simulations. Both the mixed formulation and the energy-momentum discretization are applied in finite element analysis. For this, a hyperelastic physics-augmented neural network model is calibrated to data generated with an analytical potential. In all finite element simulations, the proposed discretization techniques show excellent performance. All of this demonstrates that, from a formal point of view, neural networks are essentially mathematical functions. As such, they can be applied in numerical methods as straightforwardly as analytical constitutive models. Nevertheless, their special structure suggests to tailor advanced discretization methods, to arrive at compact mathematical formulations and convenient implementations.
Neural networks have revolutionized the field of machine learning with increased predictive capability. In addition to improving the predictions of neural networks, there is a simultaneous demand for reliable uncertainty quantification on estimates made by machine learning methods such as neural networks. Bayesian neural networks (BNNs) are an important type of neural network with built-in capability for quantifying uncertainty. This paper discusses aleatoric and epistemic uncertainty in BNNs and how they can be calculated. With an example dataset of images where the goal is to identify the amplitude of an event in the image, it is shown that epistemic uncertainty tends to be lower in images which are well-represented in the training dataset and tends to be high in images which are not well-represented. An algorithm for out-of-distribution (OoD) detection with BNN epistemic uncertainty is introduced along with various experiments demonstrating factors influencing the OoD detection capability in a BNN. The OoD detection capability with epistemic uncertainty is shown to be comparable to the OoD detection in the discriminator network of a generative adversarial network (GAN) with comparable network architecture.
This work presents the multiharmonic analysis and derivation of functional type a posteriori estimates of a distributed eddy current optimal control problem and its state equation in a time-periodic setting. The existence and uniqueness of the solution of a weak space-time variational formulation for the optimality system and the forward problem are proved by deriving inf-sup and sup-sup conditions. Using the inf-sup and sup-sup conditions, we derive guaranteed, sharp, and fully computable bounds of the approximation error for the optimal control problem and the forward problem in the functional type a posteriori estimation framework. We present here the first computational results on the derived estimates.
This article presents the openCFS submodule scattered data reader for coupling multi-physical simulations performed in different simulation programs. For instance, by considering a forward-coupling of a surface vibration simulation (mechanical system) to an acoustic propagation simulation using time-dependent acoustic absorbing material as a noise mitigation measure. The nearest-neighbor search of the target and source points from the interpolation is performed using the FLANN or the CGAL library. In doing so, the coupled field (e.g., surface velocity) is interpolated from a source representation consisting of field values physically stored and organized in a file directory to a target representation being the quadrature points in the case of the finite element method. A test case of the functionality is presented in the "testsuite" module of the openCFS software called "Abc2dcsvt". This scattered data reader module was successfully applied in numerous studies on flow-induced sound generation. Within this short article, the functionality, and usability of this module are described.
Particle methods based on evolving the spatial derivatives of the solution were originally introduced to simulate reaction-diffusion processes, inspired by vortex methods for the Navier--Stokes equations. Such methods, referred to as gradient random walk methods, were extensively studied in the '90s and have several interesting features, such as being grid free, automatically adapting to the solution by concentrating elements where the gradient is large and significantly reducing the variance of the standard random walk approach. In this work, we revive these ideas by showing how to generalize the approach to a larger class of partial differential equations, including hyperbolic systems of conservation laws. To achieve this goal, we first extend the classical Monte Carlo method to relaxation approximation of systems of conservation laws, and subsequently consider a novel particle dynamics based on the spatial derivatives of the solution. The methodology, combined with asymptotic-preserving splitting discretization, yields a way to construct a new class of gradient-based Monte Carlo methods for hyperbolic systems of conservation laws. Several results in one spatial dimension for scalar equations and systems of conservation laws show that the new methods are very promising and yield remarkable improvements compared to standard Monte Carlo approaches, either in terms of variance reduction as well as in describing the shock structure.
At least two, different approaches to define and solve statistical models for the analysis of economic systems exist: the typical, econometric one, interpreting the Gravity Model specification as the expected link weight of an arbitrary probability distribution, and the one rooted into statistical physics, constructing maximum-entropy distributions constrained to satisfy certain network properties. In a couple of recent, companion papers they have been successfully integrated within the framework induced by the constrained minimisation of the Kullback-Leibler divergence: specifically, two, broad classes of models have been devised, i.e. the integrated and the conditional ones, defined by different, probabilistic rules to place links, load them with weights and turn them into proper, econometric prescriptions. Still, the recipes adopted by the two approaches to estimate the parameters entering into the definition of each model differ. In econometrics, a likelihood that decouples the binary and weighted parts of a model, treating a network as deterministic, is typically maximised; to restore its random character, two alternatives exist: either solving the likelihood maximisation on each configuration of the ensemble and taking the average of the parameters afterwards or taking the average of the likelihood function and maximising the latter one. The difference between these approaches lies in the order in which the operations of averaging and maximisation are taken - a difference that is reminiscent of the quenched and annealed ways of averaging out the disorder in spin glasses. The results of the present contribution, devoted to comparing these recipes in the case of continuous, conditional network models, indicate that the annealed estimation recipe represents the best alternative to the deterministic one.
Kinetic approaches are generally accurate in dealing with microscale plasma physics problems but are computationally expensive for large-scale or multiscale systems. One of the long-standing problems in plasma physics is the integration of kinetic physics into fluid models, which is often achieved through sophisticated analytical closure terms. In this paper, we successfully construct a multi-moment fluid model with an implicit fluid closure included in the neural network using machine learning. The multi-moment fluid model is trained with a small fraction of sparsely sampled data from kinetic simulations of Landau damping, using the physics-informed neural network (PINN) and the gradient-enhanced physics-informed neural network (gPINN). The multi-moment fluid model constructed using either PINN or gPINN reproduces the time evolution of the electric field energy, including its damping rate, and the plasma dynamics from the kinetic simulations. In addition, we introduce a variant of the gPINN architecture, namely, gPINN$p$ to capture the Landau damping process. Instead of including the gradients of all the equation residuals, gPINN$p$ only adds the gradient of the pressure equation residual as one additional constraint. Among the three approaches, the gPINN$p$-constructed multi-moment fluid model offers the most accurate results. This work sheds light on the accurate and efficient modeling of large-scale systems, which can be extended to complex multiscale laboratory, space, and astrophysical plasma physics problems.
We discuss probabilistic neural networks for unsupervised learning with a fixed internal representation as models for machine understanding. Here understanding is intended as mapping data to an already existing representation which encodes an {\em a priori} organisation of the feature space. We derive the internal representation by requiring that it satisfies the principles of maximal relevance and of maximal ignorance about how different features are combined. We show that, when hidden units are binary variables, these two principles identify a unique model -- the Hierarchical Feature Model (HFM) -- which is fully solvable and provides a natural interpretation in terms of features. We argue that learning machines with this architecture enjoy a number of interesting properties, like the continuity of the representation with respect to changes in parameters and data, the possibility to control the level of compression and the ability to support functions that go beyond generalisation. We explore the behaviour of the model with extensive numerical experiments and argue that models where the internal representation is fixed reproduce a learning modality which is qualitatively different from that of more traditional models such as Restricted Boltzmann Machines.
We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain on the accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
Graph representation learning for hypergraphs can be used to extract patterns among higher-order interactions that are critically important in many real world problems. Current approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic for various learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention based graph neural network called Hyper-SAGNN applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms the state-of-the-art methods on traditional tasks while also achieving great performance on a new task called outsider identification. Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.
Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from the large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that will first use a 3D FCN to roughly define a candidate region, which will then be used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to ~10% and allows it to focus on more detailed segmentation of the organs and vessels. We utilize training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital that includes 150 CT scans, targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5 to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve a significantly higher performance in small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download: //github.com/holgerroth/3Dunet_abdomen_cascade.