Graph-based neural networks and, specifically, message-passing neural networks (MPNNs) have shown great potential in predicting physical properties of solids. In this work, we train an MPNN on density functional theory data from the AFLOW database to first classify materials as metallic or semiconducting/insulating. We then perform a neural-architecture search to explore the model architecture and hyperparameter space of MPNNs to predict the band gaps of the materials identified as non-metals. The parameters in the search include the number of message-passing steps, the latent size, and the activation function, among others. The top-performing models from the search are pooled into an ensemble that significantly outperforms existing models from the literature. Uncertainty quantification is evaluated with Monte Carlo dropout and ensembling, with the ensemble method proving superior. The domain of applicability of the ensemble model is analyzed with respect to the crystal systems, the inclusion of a Hubbard parameter in the density functional calculations, and the atomic species building up the materials.
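As an illustration of the two uncertainty-quantification strategies compared above, the following is a minimal sketch, assuming a hypothetical `predict` interface for the trained MPNNs (any graph-network library could take its place); the ensemble spread and the Monte Carlo dropout spread both serve as per-material uncertainty estimates.

```python
import numpy as np

def ensemble_predict(models, graphs):
    """Predictive mean and spread of band-gap predictions over an ensemble."""
    preds = np.stack([m.predict(graphs) for m in models])  # (n_models, n_materials)
    return preds.mean(axis=0), preds.std(axis=0)

def mc_dropout_predict(model, graphs, n_samples=50):
    """Monte Carlo dropout: keep dropout layers active at inference and sample."""
    # `training=True` (a Keras-style flag, assumed here) keeps dropout stochastic.
    preds = np.stack([model.predict(graphs, training=True) for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)
```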
The paper's goal is to provide a simple, unified approach to sensitivity analysis using physics-informed neural networks (PINNs). The main idea is to add a new term to the loss function that regularizes the solution in a small neighborhood of the nominal value of the parameter of interest. The added term represents the derivative of the loss function with respect to the parameter of interest. The result of this modification is a solution to the problem along with the derivative of the solution with respect to the parameter of interest (the sensitivity). We call the new technique for performing sensitivity analysis in this context SA-PINN. We show the effectiveness of the technique on three examples: the first is a simple 1D advection-diffusion problem that illustrates the methodology, the second is a 2D Poisson problem with 9 parameters of interest, and the last is a transient two-phase flow in porous media problem.
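A minimal sketch of the added sensitivity term for a 1D steady advection-diffusion residual, assuming (as an illustration, not the paper's exact formulation) a network u(x, v) that takes the parameter of interest v as an extra input; the diffusion coefficient D and the nominal value v0 are placeholder names.

```python
import torch

def sa_pinn_loss(net, x, v0, D=0.1):
    """PDE-residual loss plus a penalty on its derivative w.r.t. the parameter v."""
    v = torch.full_like(x, v0, requires_grad=True)   # parameter of interest at its nominal value
    x = x.clone().requires_grad_(True)
    u = net(torch.cat([x, v], dim=1))                # network takes (x, v) as inputs
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    residual_loss = ((v * u_x - D * u_xx) ** 2).mean()
    # Added term: derivative of the loss with respect to the parameter of interest,
    # regularizing the solution in a neighborhood of v0.
    dloss_dv = torch.autograd.grad(residual_loss, v, create_graph=True)[0]
    return residual_loss + (dloss_dv ** 2).mean()
```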
Perturbed by natural hazards, community-level infrastructure networks operate like many-body systems, with behaviors emerging from coupling individual component dynamics with group correlations and interactions. It follows that we can borrow methods from statistical physics to study the response of infrastructure systems to natural disasters. This study aims to construct a joint probability distribution model to describe the post-hazard state of infrastructure networks and propose an efficient surrogate model of the joint distribution for large-scale systems. Specifically, we present maximum entropy modeling of the regional impact of natural hazards on civil infrastructures. Provided with the current state of knowledge, the principle of maximum entropy yields the "most unbiased" joint distribution model for the performances of infrastructures. In the general form, the model can handle multivariate performance states and higher-order correlations. In a particular yet typical scenario of binary performance state variables with knowledge of their mean and pairwise correlation, the joint distribution reduces to the Ising model in statistical physics. In this context, we propose using a dichotomized Gaussian model as an efficient surrogate for the maximum entropy model, facilitating the application to large systems. Using the proposed method, we investigate the seismic collective behavior of a large-scale road network (with 8,694 nodes and 26,964 links) in San Francisco, showcasing the non-trivial collective behaviors of infrastructure systems.
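As a sketch of the dichotomized Gaussian surrogate mentioned above, binary component states are obtained by thresholding a correlated latent Gaussian so that the marginals match target probabilities; calibrating the latent correlation matrix `lam` so that the binary pairwise correlations also match their targets is a separate fitting step not shown here (all names are illustrative).

```python
import numpy as np
from scipy.stats import norm

def dichotomized_gaussian_sample(p, lam, n_samples, rng=None):
    """Sample binary states with marginals p via a latent Gaussian with correlation lam."""
    rng = np.random.default_rng() if rng is None else rng
    gamma = norm.ppf(p)                                        # thresholds reproducing the marginals
    z = rng.multivariate_normal(np.zeros(len(p)), lam, size=n_samples)
    return (z < gamma).astype(int)                             # e.g. 1 = component failed, 0 = functional
```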
Recent advances in deep learning have given us some very promising results on the generalization ability of deep neural networks; however, the literature still lacks a comprehensive theory explaining why heavily over-parametrized models are able to generalize well while fitting the training data. In this paper we propose a PAC-type bound on the generalization error of feedforward ReLU networks by estimating the Rademacher complexity of the set of networks reachable from an initial parameter vector via gradient descent. The key idea is to bound the sensitivity of the network's gradient to perturbations of the input data along the optimization trajectory. The obtained bound does not explicitly depend on the depth of the network. Our results are experimentally verified on the MNIST and CIFAR-10 datasets.
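For context, a generic PAC-style template that Rademacher-complexity bounds of this kind instantiate (recalled here as background, not the paper's exact statement): for a class G of [0,1]-valued loss functions and n i.i.d. samples, with probability at least 1 - delta, simultaneously for all g in G,

```latex
\mathbb{E}[g(Z)] \;\le\; \frac{1}{n}\sum_{i=1}^{n} g(Z_i)
  \;+\; 2\,\mathfrak{R}_n(G) \;+\; \sqrt{\frac{\ln(1/\delta)}{2n}},
```

where \mathfrak{R}_n(G) denotes the Rademacher complexity of G; the paper's contribution lies in bounding this complexity for the set of ReLU networks reachable from a fixed initialization by gradient descent.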
A physics-informed neural network (PINN) offers a distinct advantage by combining the capabilities of neural networks with the governing physical laws of the problem. In this study, we introduce an approach for solving seepage problems with a PINN, harnessing deep neural networks (DNNs) to approximate hydraulic head distributions in seepage analysis. To train the PINN model effectively, we introduce a comprehensive loss function comprising three components: one evaluating the differential operators, another enforcing the boundary conditions, and a third enforcing the initial conditions. The PINN is validated on four benchmark seepage problems. The results demonstrate the high accuracy of the PINN in solving seepage problems, surpassing the accuracy of the finite element method (FEM) for both steady-state and free-surface seepage problems. The presented approach thus highlights the robustness of the PINN and underscores its precision across a spectrum of seepage challenges, while overcoming limitations inherent in conventional methods, such as mesh generation and limited adaptability to complex geometries.
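A minimal sketch of the three-part loss described above, written for an illustrative transient diffusion-type seepage equation h_t = k (h_xx + h_yy) rather than the paper's exact governing equations; `x_pde`, `x_bc`/`h_bc`, and `x_ic`/`h_ic` stand for collocation, boundary, and initial-condition points with their target hydraulic heads.

```python
import torch

def pinn_loss(net, x_pde, x_bc, h_bc, x_ic, h_ic, k=1.0):
    """Sum of PDE-residual, boundary-condition, and initial-condition terms."""
    x = x_pde.clone().requires_grad_(True)                      # columns: (x, y, t)
    h = net(x)
    grads = torch.autograd.grad(h, x, torch.ones_like(h), create_graph=True)[0]
    h_x, h_y, h_t = grads[:, 0:1], grads[:, 1:2], grads[:, 2:3]
    h_xx = torch.autograd.grad(h_x, x, torch.ones_like(h_x), create_graph=True)[0][:, 0:1]
    h_yy = torch.autograd.grad(h_y, x, torch.ones_like(h_y), create_graph=True)[0][:, 1:2]
    loss_pde = ((h_t - k * (h_xx + h_yy)) ** 2).mean()          # differential operator
    loss_bc = ((net(x_bc) - h_bc) ** 2).mean()                  # boundary conditions
    loss_ic = ((net(x_ic) - h_ic) ** 2).mean()                  # initial conditions
    return loss_pde + loss_bc + loss_ic
```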
The success of over-parameterized neural networks trained to near-zero training error has caused great interest in the phenomenon of benign overfitting, where estimators are statistically consistent even though they interpolate noisy training data. While benign overfitting in fixed dimension has been established for some learning methods, current literature suggests that for regression with typical kernel methods and wide neural networks, benign overfitting requires a high-dimensional setting where the dimension grows with the sample size. In this paper, we show that the smoothness of the estimators, and not the dimension, is the key: benign overfitting is possible if and only if the estimator's derivatives are large enough. We generalize existing inconsistency results to non-interpolating models and more kernels to show that benign overfitting with moderate derivatives is impossible in fixed dimension. Conversely, we show that rate-optimal benign overfitting is possible for regression with a sequence of spiky-smooth kernels with large derivatives. Using neural tangent kernels, we translate our results to wide neural networks. We prove that while infinite-width networks do not overfit benignly with the ReLU activation, this can be fixed by adding small high-frequency fluctuations to the activation function. Our experiments verify that such neural networks, while overfitting, can indeed generalize well even on low-dimensional data sets.
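To make the activation modification concrete, one such activation is a ReLU with a small-amplitude, high-frequency fluctuation added to it; the amplitude and frequency below are illustrative knobs, not the values analyzed in the paper.

```python
import numpy as np

def spiky_relu(x, eps=1e-2, omega=1e3):
    """ReLU plus a small high-frequency oscillation."""
    return np.maximum(x, 0.0) + eps * np.sin(omega * x)
```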
Neural networks (NNs) are primarily developed within the frequentist statistical framework. However, frequentist NNs lack the capability to provide uncertainties in their predictions, and hence their robustness cannot be adequately assessed. Conversely, Bayesian neural networks (BNNs) naturally offer predictive uncertainty by applying Bayes' theorem, but their computational requirements pose significant challenges. Moreover, both frequentist NNs and BNNs suffer from overfitting when dealing with noisy and sparse data, which renders their predictions unreliable away from the available data. To address both problems simultaneously, we leverage a hierarchical setting in which the parameter priors are conditional on hyperparameters, and construct a BNN using a semi-analytical framework known as nonlinear sparse Bayesian learning (NSBL). We call the resulting network a sparse Bayesian neural network (SBNN); it aims to address the practical and computational issues associated with BNNs. At the same time, imposing a sparsity-inducing prior encourages the automatic pruning of redundant parameters based on the automatic relevance determination (ARD) concept. This process removes redundant parameters by optimally selecting the precisions of the parameters' prior probability density functions (pdfs), resulting in a tractable treatment of overfitting. To demonstrate the benefits of the SBNN algorithm, the study presents an illustrative regression problem and compares the results of a BNN using standard Bayesian inference, hierarchical Bayesian inference, and a BNN equipped with the proposed algorithm. Subsequently, we demonstrate the importance of considering the full parameter posterior by comparing the results with those obtained using the Laplace approximation with and without NSBL.
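A minimal sketch of the ARD construction behind the sparsity-inducing prior: each weight receives an independent zero-mean Gaussian prior with its own precision, and precisions that grow large during evidence optimization effectively prune the corresponding weights (this is the generic ARD prior, not the NSBL algorithm itself; the threshold below is illustrative).

```python
import numpy as np

def ard_log_prior(w, alpha):
    """log prod_i N(w_i | 0, 1/alpha_i), up to the 2*pi constant."""
    return 0.5 * np.sum(np.log(alpha) - alpha * w ** 2)

def prune(w, alpha, threshold=1e6):
    """Drop weights whose prior precision has diverged (ARD-style pruning)."""
    return w[alpha < threshold]
```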
We study the training dynamics of shallow neural networks, in a two-timescale regime in which the stepsizes for the inner layer are much smaller than those for the outer layer. In this regime, we prove convergence of the gradient flow to a global optimum of the non-convex optimization problem in a simple univariate setting. The number of neurons need not be asymptotically large for our result to hold, distinguishing our result from popular recent approaches such as the neural tangent kernel or mean-field regimes. Experimental illustration is provided, showing that stochastic gradient descent behaves according to our description of the gradient flow and thus converges to a global optimum in the two-timescale regime, but can fail outside of this regime.
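A minimal sketch of the two-timescale setup, assuming an illustrative one-dimensional shallow architecture: the inner (hidden) layer is assigned a stepsize smaller by a factor eps than that of the outer layer.

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 50), torch.nn.ReLU(), torch.nn.Linear(50, 1))
inner, outer = net[0], net[2]
eta, eps = 1e-2, 1e-3                                  # outer stepsize and timescale separation
opt = torch.optim.SGD([
    {"params": inner.parameters(), "lr": eps * eta},   # slow inner layer
    {"params": outer.parameters(), "lr": eta},         # fast outer layer
])
```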
Numerous applications in the field of molecular communications (MC), such as healthcare systems, are often event-driven. The conventional Shannon capacity may not be the appropriate metric for assessing performance in such cases. We propose the identification (ID) capacity as an alternative metric. In particular, we consider randomized identification (RI) over the discrete-time Poisson channel (DTPC), which is typically used as a model for MC systems that utilize molecule-counting receivers. In the ID paradigm, the receiver does not aim to decode the transmitted message; rather, it wants to determine whether a message of particular significance to it has been sent. In contrast to Shannon transmission codes, the size of ID codes for a discrete memoryless channel (DMC) grows doubly exponentially fast with the blocklength if randomized encoding is used. In this paper, we derive the capacity formula for RI over the DTPC subject to peak and average power constraints. Furthermore, we analyze the case of the state-dependent DTPC.
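For background, the classical identification result for a DMC with Shannon capacity C (Ahlswede and Dueck) states that, with randomized encoding, the maximal number N(n) of reliably identifiable messages at blocklength n satisfies

```latex
\lim_{n \to \infty} \frac{1}{n} \log \log N(n) \;=\; C ,
```

i.e. roughly 2^{2^{nC}} messages can be identified, compared with 2^{nC} messages for Shannon transmission; the paper derives the corresponding capacity formula for RI over the DTPC under peak and average power constraints.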
We hypothesize that, due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain in accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
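As a sketch of the conditional utilization rate as defined above, assuming a hypothetical `evaluate` helper that returns validation accuracy when only the listed modalities are made available to the model (the others, e.g., replaced by defaults):

```python
def conditional_utilization_rate(evaluate, m1, m2):
    """Accuracy gain from adding modality m1 on top of modality m2."""
    return evaluate([m1, m2]) - evaluate([m2])
```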
Recent advances in 3D fully convolutional networks (FCNs) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that first uses a 3D FCN to roughly define a candidate region, which is then used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to ~10% and allows it to focus on more detailed segmentation of the organs and vessels. We utilize training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital that includes 150 CT scans, targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5% to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve significantly higher performance in small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN-based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download: https://github.com/holgerroth/3Dunet_abdomen_cascade.
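A minimal sketch of the coarse-to-fine cascade at inference time: a first FCN proposes a candidate region, whose padded bounding box is cropped from the volume and passed to the second FCN for detailed segmentation. `fcn_coarse` and `fcn_fine` are hypothetical callables returning channel-first per-voxel class probabilities; in practice the first stage would typically run on a downsampled volume.

```python
import numpy as np

def cascade_segment(volume, fcn_coarse, fcn_fine, margin=16):
    """Two-stage segmentation: coarse candidate region, then fine labels inside its crop."""
    coarse = fcn_coarse(volume).argmax(axis=0)                  # coarse label map
    zs, ys, xs = np.nonzero(coarse > 0)                         # candidate foreground voxels
    lo = np.maximum(np.array([zs.min(), ys.min(), xs.min()]) - margin, 0)
    hi = np.minimum(np.array([zs.max(), ys.max(), xs.max()]) + margin + 1, volume.shape)
    crop = volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    return lo, fcn_fine(crop).argmax(axis=0)                    # crop offset and fine labels within it
```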