We extend the finite element interpolated neural network (FEINN) framework from partial differential equations (PDEs) with weak solutions in $H^1$ to PDEs with weak solutions in $H(\textbf{curl})$ or $H(\textbf{div})$. To this end, we consider interpolation trial spaces that satisfy the de Rham Hilbert subcomplex, providing stable and structure-preserving neural network discretisations for a wide variety of PDEs. This approach, coined compatible FEINNs, has been used to accurately approximate the $H(\textbf{curl})$ inner product. We numerically observe that the trained network outperforms finite element solutions in accuracy by several orders of magnitude for smooth analytical solutions. Furthermore, to showcase the versatility of the method, we demonstrate that compatible FEINNs achieve high accuracy in solving surface PDEs such as the Darcy equation on a sphere. Additionally, the framework can integrate adaptive mesh refinement to effectively solve problems with localised features. We use an adaptive training strategy to train the network on a sequence of progressively adapted meshes. Finally, we compare compatible FEINNs with the adjoint neural network method for solving inverse problems. We consider a one-loop algorithm that trains the neural networks for the unknowns and missing parameters using a loss function that includes PDE residual and data misfit terms. The algorithm is applied to identify space-varying physical parameters for the $H(\textbf{curl})$ model problem from partial or noisy observations. We find that compatible FEINNs achieve accuracy and robustness comparable to, if not exceeding, that of the adjoint method in these scenarios.
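Schematically (illustrative notation, not necessarily the paper's), the compatible FEINN construction interpolates the network output onto a compatible finite element space and minimises a discrete residual,

$$
\min_{\theta}\ \mathcal{L}(\theta) = \big\| \mathcal{R}_h\big( \pi_h u_{\theta} \big) \big\|, \qquad \pi_h : H(\textbf{curl}) \to V_h \subset H(\textbf{curl}),
$$

where $u_\theta$ is the neural network, $\pi_h$ the interpolation onto an edge (Nédélec) or face (Raviart-Thomas) element space drawn from a discrete de Rham subcomplex, $\mathcal{R}_h$ the discrete weak residual, and $\|\cdot\|$ a suitable (e.g. discrete dual) norm whose exact choice is that of the FEINN literature.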
Machine learning interatomic potentials (MLIPs) often neglect long-range interactions, such as electrostatic and dispersion forces. In this work, we introduce a straightforward and efficient method to account for long-range interactions by learning a latent variable from local atomic descriptors and applying an Ewald summation to this variable. We demonstrate that in systems including charged and polar molecular dimers, bulk water, and the water-vapor interface, standard short-range MLIPs can lead to unphysical predictions even when employing message passing. The long-range models effectively eliminate these artifacts, at only about twice the computational cost of short-range MLIPs.
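As a sketch of how a learned latent variable can enter a long-range term (standard reciprocal-space Ewald notation in Gaussian units; the paper's exact parameterisation may differ), each atom $i$ receives a latent value $q_i$ predicted from its local descriptor $\mathcal{D}_i$, and the long-range energy is

$$
q_i = q_{\theta}(\mathcal{D}_i), \qquad E_{\mathrm{lr}} = \frac{2\pi}{V} \sum_{\mathbf{k} \neq \mathbf{0}} \frac{e^{-k^{2}/(4\alpha^{2})}}{k^{2}} \Big| \sum_{i} q_i\, e^{\mathrm{i}\,\mathbf{k}\cdot\mathbf{r}_i} \Big|^{2},
$$

which is added to the short-range MLIP energy and can be trained jointly with it.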
This work studies the parameter-dependent diffusion equation in a two-dimensional domain consisting of locally mirror-symmetric layers. It is assumed that the diffusion coefficient is constant in each layer. The goal is to find approximate parameter-to-solution maps that have a small number of terms. It is shown that in the case of two layers one can find a solution formula consisting of three terms with explicit dependencies on the diffusion coefficient. The formula is based on decomposing the solution into orthogonal parts related to the two layers and the interface between them. This formula is then extended to an approximate one for the multi-layer case. We give an analytical formula for square layers and use a finite element formulation for more general layers. The results are illustrated with numerical examples and have applications to reduced basis methods through analysis of the Kolmogorov n-width.
We propose a method utilizing physics-informed neural networks (PINNs) to solve Poisson equations that serve as control variates in the computation of transport coefficients via fluctuation formulas, such as the Green--Kubo and generalized Einstein-like formulas. By leveraging approximate solutions to the Poisson equation constructed through neural networks, our approach significantly reduces the variance of the estimator at hand. We provide an extensive numerical analysis of the estimators and detail a methodology for training neural networks to solve these Poisson equations. The approximate solutions are then incorporated into Monte Carlo simulations as effective control variates, demonstrating the suitability of the method for moderately high-dimensional problems where fully deterministic solutions are computationally infeasible.
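In outline (illustrative notation), if $\phi$ is the observable entering the fluctuation formula, $\mathcal{L}$ the generator of the sampled dynamics with invariant measure $\mu$, and $\psi$ solves the Poisson equation, then a PINN approximation $\psi_\theta \approx \psi$ yields a control-variate estimator

$$
-\mathcal{L}\psi = \phi - \mathbb{E}_{\mu}[\phi], \qquad \widehat{\phi}_N = \frac{1}{N}\sum_{n=1}^{N}\Big(\phi(X_n) + \mathcal{L}\psi_{\theta}(X_n)\Big),
$$

whose correction term has zero mean under $\mu$, so it introduces no asymptotic bias, while the asymptotic variance shrinks as $\psi_\theta$ approaches the exact solution.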
We present neural network-based constitutive models for hyperelastic geometrically exact beams. The proposed models are physics-augmented, i.e., formulated to fulfill important mechanical conditions by construction, which improves accuracy and generalization. Strains and curvatures of the beam are used as inputs to feed-forward neural networks that represent the effective hyperelastic beam potential. Forces and moments are obtained as the gradients of the beam potential, ensuring thermodynamic consistency. Normalization conditions are considered via additional projection terms. Symmetry conditions are implemented by an invariant-based approach for transverse isotropy and by a more flexible point symmetry constraint, which is implied by transverse isotropy but poses fewer restrictions on the constitutive response. Furthermore, a data augmentation approach is proposed to improve the scaling behavior of the models for varying cross-section radii. Additionally, we introduce a parameterization with a scalar parameter to represent ring-shaped cross-sections with different ratios between the inner and outer radii. Formulating the beam potential as a neural network provides a highly flexible model. This enables efficient constitutive surrogate modeling for geometrically exact beams with nonlinear material behavior and cross-sectional deformation, which would otherwise require computationally much more expensive methods. The models are calibrated and tested with data generated for beams with circular and ring-shaped hyperelastic deformable cross-sections at varying inner and outer radii, showing excellent accuracy and generalization. The applicability of the proposed point symmetric model is further demonstrated by applying it in beam simulations. In all studied cases, the proposed model shows excellent performance.
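Schematically (illustrative notation), with strains $\boldsymbol{\varepsilon}$ and curvatures $\boldsymbol{\kappa}$ as network inputs, the hyperelastic structure amounts to

$$
W = W_{\theta}(\boldsymbol{\varepsilon}, \boldsymbol{\kappa}), \qquad \mathbf{n} = \frac{\partial W_{\theta}}{\partial \boldsymbol{\varepsilon}}, \qquad \mathbf{m} = \frac{\partial W_{\theta}}{\partial \boldsymbol{\kappa}},
$$

so that force and moment resultants derive from a single potential, which is what enforces thermodynamic consistency by construction.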
A statistical network model with overlapping communities can be generated as a superposition of mutually independent random graphs of varying size. The model is parameterized by the number of nodes, the number of communities, and the joint distribution of the community size and the edge probability. This model admits sparse parameter regimes with power-law limiting degree distributions and non-vanishing clustering coefficients. This article presents large-scale approximations of clique and cycle frequencies for graph samples generated by the model, which are valid for regimes with unbounded numbers of overlapping communities. Our results reveal the growth rates of these subgraph frequencies and show that their theoretical densities can be reliably estimated from data.
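A minimal generator for a model of this type is sketched below; the particular joint law of community size and edge probability is an arbitrary illustrative choice, not the one analysed in the article.

```python
# Sketch: superposition of independent Bernoulli random graphs ("communities")
# placed on uniformly random node subsets. The (size, edge-probability) law used
# here is illustrative only.
import numpy as np

def sample_superposition(n_nodes, n_communities, rng):
    adj = np.zeros((n_nodes, n_nodes), dtype=bool)
    for _ in range(n_communities):
        size = min(n_nodes, 2 + rng.zipf(2.5))      # heavy-tailed community size
        p = rng.uniform(0.2, 0.8)                   # within-community edge probability
        members = rng.choice(n_nodes, size=size, replace=False)
        for a in range(size):
            for b in range(a + 1, size):
                if rng.random() < p:                # independent G(size, p) layer
                    i, j = members[a], members[b]
                    adj[i, j] = adj[j, i] = True
    return adj

rng = np.random.default_rng(0)
G = sample_superposition(n_nodes=1000, n_communities=300, rng=rng).astype(int)
triangles = np.trace(np.linalg.matrix_power(G, 3)) // 6   # a clique frequency
print("edges:", G.sum() // 2, "triangles:", triangles)
```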
Conjugate heat transfer (CHT) analyses are vital for the design of many energy systems. However, high-fidelity CHT numerical simulations are computationally intensive, which limits their use in applications such as design optimization, where hundreds to thousands of evaluations are required. In this work, we develop a modular deep encoder-decoder hierarchical (DeepEDH) convolutional neural network, a novel deep-learning-based surrogate modeling methodology for computationally intensive CHT analyses. Leveraging convective temperature dependencies, we propose a two-stage temperature prediction architecture that couples the velocity and temperature fields. The proposed DeepEDH methodology is demonstrated by modeling the pressure, velocity, and temperature fields for a liquid-cooled cold-plate-based battery thermal management system with variable channel geometry. A computational mesh and CHT formulation of the cold plate are created and solved using the finite element method (FEM), generating a dataset of 1,500 simulations. Our performance analysis covers the impact of the novel architecture, separate DeepEDH models for each field, output geometry masks, multi-stage temperature field predictions, and optimization of the hyperparameters and architecture. Furthermore, we quantify the influence of the thermal boundary conditions of the CHT analysis on surrogate model performance, highlighting improved temperature model performance at higher heat fluxes. Compared to other deep learning surrogate models, such as U-Net and DenseED, the proposed DeepEDH architecture for CHT analyses exhibits up to a 65% improvement in the coefficient of determination $R^{2}$. (Abstract shortened to meet arXiv's 1,920-character limit; see the article for the full abstract.)
Composite materials often exhibit mechanical anisotropy owing to the material properties or geometrical configurations of the microstructure. This makes their inverse design a two-fold problem. First, we must learn the type and orientation of anisotropy, and then find the optimal design parameters that achieve the desired mechanical response. In our work, we address this challenge by first training a forward surrogate model based on the macroscopic stress-strain data obtained via computational homogenization for a given multiscale material. To this end, we use partially Input Convex Neural Networks (pICNNs) to obtain a polyconvex representation of the strain energy in terms of the invariants of the Cauchy-Green deformation tensor. The network architecture and the strain energy function are modified to incorporate, by construction, physics and mechanistic assumptions into the framework. While training the neural network, we identify the type of anisotropy, if any, along with the preferred directions. Once the model is trained, we solve the inverse problem using an evolution strategy to obtain the design parameters that give a desired mechanical response. We test the framework on both synthetic macroscale data and homogenized data. For cases where polyconvexity might be violated during the homogenization process, we present viable alternative formulations. The trained model is also integrated into a finite element framework to identify design parameters that result in a desired macroscopic response. We show that the invariant-based model is able to solve the inverse problem for a stress-strain dataset with a preferred direction different from the one it was trained on, and that it not only learns the polyconvex potentials of hyperelastic materials but also recovers the correct parameters for the inverse design problem.
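Schematically, and taking the transversely isotropic case as an example (one possible parameterisation, not necessarily the paper's exact one), the surrogate has the form

$$
\Psi_{\theta}(\mathbf{C}) = \mathrm{pICNN}_{\theta}\big(I_1, I_2, J, I_4\big), \qquad I_1 = \operatorname{tr}\mathbf{C}, \quad I_2 = \operatorname{tr}(\operatorname{cof}\mathbf{C}), \quad J = \det\mathbf{F}, \quad I_4 = \mathbf{a}\cdot\mathbf{C}\mathbf{a},
$$

where the network is convex in its arguments and non-decreasing in $I_1$, $I_2$, and $I_4$, which are themselves convex in $(\mathbf{F}, \operatorname{cof}\mathbf{F})$, so that polyconvexity follows by composition; the preferred direction $\mathbf{a}$ is among the quantities identified during training.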
The choice of architecture of a neural network influences which functions will be realizable by that network and, as a result, studying the expressiveness of a chosen architecture has received much attention. In ReLU neural networks, the presence of stably unactivated neurons can reduce the network's expressiveness. In this work, we investigate the probability of a neuron in the second hidden layer of such neural networks being stably unactivated when the weights and biases are initialized from symmetric probability distributions. For networks with input dimension $n_0$, we prove that if the first hidden layer has $n_0+1$ neurons, then this probability is exactly $\frac{2^{n_0}+1}{4^{n_0+1}}$; if the first hidden layer has $n_1$ neurons with $n_1 \le n_0$, then the probability is $\frac{1}{2^{n_1+1}}$. Finally, for the case where the first hidden layer has more than $n_0+1$ neurons, a conjecture is proposed along with its rationale, and computational evidence is presented to support it.
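A quick numerical sanity check of the $n_0 = 1$ cases is sketched below; it assumes (our reading) that "stably unactivated" means the second-layer neuron's pre-activation is non-positive for every input $x \in \mathbb{R}$, and uses the standard normal as the symmetric initialisation distribution.

```python
# Heuristic verification sketch for n0 = 1: a second-layer neuron with
# pre-activation z(x) = w2 . relu(W1*x + b1) + b2 is counted as stably
# unactivated when z(x) <= 0 for all x in R (an assumption on our part).
import numpy as np

rng = np.random.default_rng(0)

def stably_unactivated_1d(W1, b1, w2, b2):
    """Exact test for scalar input: is z(x) <= 0 for every x?"""
    def z(x):
        return w2 @ np.maximum(W1 * x + b1, 0.0) + b2
    kinks = [-bi / ai for ai, bi in zip(W1, b1) if ai != 0.0]
    if z(0.0) > 0.0 or any(z(x) > 0.0 for x in kinks):
        return False
    # z is piecewise linear, so also require it not to grow towards +/- infinity.
    slope_right = sum(wi * ai for wi, ai in zip(w2, W1) if ai > 0.0)  # x -> +inf
    slope_left = sum(wi * ai for wi, ai in zip(w2, W1) if ai < 0.0)   # x -> -inf
    return slope_right <= 0.0 and slope_left >= 0.0

def estimate(n1, trials=100_000):
    count = 0
    for _ in range(trials):
        W1, b1 = rng.standard_normal(n1), rng.standard_normal(n1)
        w2, b2 = rng.standard_normal(n1), rng.standard_normal()
        count += stably_unactivated_1d(W1, b1, w2, b2)
    return count / trials

print("n1 = n0 + 1 = 2:", estimate(2), "vs (2^1 + 1)/4^2 =", 3 / 16)
print("n1 = n0     = 1:", estimate(1), "vs 1/2^2         =", 1 / 4)
```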
Two common methods for solving absolute value equations (AVE) are the SOR-like iteration method and the fixed point iteration (FPI) method. In this paper, we give new convergence analyses of the SOR-like iteration and the FPI that yield wider convergence ranges. Based on the new analyses, a new optimal iterative parameter with an analytical form is obtained for the SOR-like iteration; an optimal iterative parameter with an analytical form is also obtained for the FPI. Surprisingly, the SOR-like iteration and the FPI coincide whenever they are equipped with our optimal iterative parameters. As a by-product, we give two new constructive proofs of a well-known sufficient condition under which the AVE has a unique solution for any right-hand side. Numerical results support our claims.
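For concreteness, a relaxed fixed-point iteration for the AVE $Ax - |x| = b$ can be sketched as follows (a minimal illustration with an ad hoc parameter $\omega$, not the paper's exact schemes or optimal parameters; for $\omega = 1$ it reduces to the Picard iteration $x^{k+1} = A^{-1}(|x^{k}| + b)$).

```python
# Minimal sketch of a relaxed fixed-point iteration for the AVE  A x - |x| = b.
# The relaxation parameter omega and the update form are illustrative only.
import numpy as np

def ave_fixed_point(A, b, omega=0.9, tol=1e-10, max_iter=500):
    n = len(b)
    x, y = np.zeros(n), np.zeros(n)      # y plays the role of |x|
    A_inv = np.linalg.inv(A)             # small dense example; factorize once
    for k in range(max_iter):
        x = (1 - omega) * x + omega * (A_inv @ (y + b))
        y = (1 - omega) * y + omega * np.abs(x)
        if np.linalg.norm(A @ x - np.abs(x) - b) < tol:
            return x, k + 1
    return x, max_iter

# ||A^{-1}|| < 1 is the classical sufficient condition for unique solvability.
rng = np.random.default_rng(1)
n = 50
A = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
x_true = rng.standard_normal(n)
b = A @ x_true - np.abs(x_true)
x, iters = ave_fixed_point(A, b)
print(iters, np.linalg.norm(x - x_true))
```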
Powerful deep neural networks are vulnerable to adversarial attacks. To obtain adversarially robust models, researchers have separately developed adversarial training and Jacobian regularization techniques. There are abundant theoretical and empirical studies of adversarial training, but the theoretical foundations of Jacobian regularization are still lacking. In this study, we show that Jacobian regularization is closely related to adversarial training in that the $\ell_{2}$ or $\ell_{1}$ Jacobian regularized loss serves as an approximate upper bound on the adversarially robust loss under $\ell_{2}$ or $\ell_{\infty}$ adversarial attacks, respectively. Further, we establish the robust generalization gap for the Jacobian regularized risk minimizer by bounding the Rademacher complexity of both the standard loss function class and the Jacobian regularization function class. Our theoretical results indicate that the norms of the Jacobian are related to both standard and robust generalization. We also perform experiments on MNIST data classification to demonstrate that Jacobian regularized risk minimization indeed serves as a surrogate for adversarially robust risk minimization, and that reducing the norms of the Jacobian can improve both standard and robust generalization. This study advances both the theoretical and empirical understanding of adversarially robust generalization via Jacobian regularization.
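A first-order heuristic (not the paper's argument, but a way to see the connection): for an $\ell_\infty$ perturbation of radius $\epsilon$,

$$
\max_{\|\delta\|_{\infty}\le\epsilon} \ell\big(f(x+\delta), y\big) \;\approx\; \ell\big(f(x), y\big) + \max_{\|\delta\|_{\infty}\le\epsilon} \delta^{\top}\nabla_{x}\ell \;=\; \ell\big(f(x), y\big) + \epsilon\,\big\|\nabla_{x}\ell\big\|_{1},
$$

and since $\nabla_x \ell = J_f(x)^{\top}\nabla_{f}\ell$ by the chain rule, penalising the $\ell_1$ norm of the input-output Jacobian $J_f(x)$ controls this first-order adversarial surplus; the $\ell_2$ attack pairs with the $\ell_2$ Jacobian norm in the same way.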