Inspired by the relation between deep neural network (DNN) and partial differential equations (PDEs), we study the general form of the PDE models of deep neural networks. To achieve this goal, we formulate DNN as an evolution operator from a simple base model. Based on several reasonable assumptions, we prove that the evolution operator is actually determined by convection-diffusion equation. This convection-diffusion equation model gives mathematical explanation for several effective networks. Moreover, we show that the convection-diffusion model improves the robustness and reduces the Rademacher complexity. Based on the convection-diffusion equation, we design a new training method for ResNets. Experiments validate the performance of the proposed method.
Neural network (NN) designed for challenging machine learning tasks is in general a highly nonlinear mapping that contains massive variational parameters. High complexity of NN, if unbounded or unconstrained, might unpredictably cause severe issues including over-fitting, loss of generalization power, and unbearable cost of hardware. In this work, we propose a general compression scheme that significantly reduces the variational parameters of NN by encoding them to deep automatically-differentiable tensor network (ADTN) that contains exponentially-fewer free parameters. Superior compression performance of our scheme is demonstrated on several widely-recognized NN's (FC-2, LeNet-5, AlextNet, ZFNet and VGG-16) and datasets (MNIST, CIFAR-10 and CIFAR-100). For instance, we compress two linear layers in VGG-16 with approximately $10^{7}$ parameters to two ADTN's with just 424 parameters, where the testing accuracy on CIFAR-10 is improved from $90.17 \%$ to $91.74\%$. Our work suggests TN as an exceptionally efficient mathematical structure for representing the variational parameters of NN's, which exhibits superior compressibility over the commonly-used matrices and multi-way arrays.
Rectified Linear Units (ReLU) have become the main model for the neural units in current deep learning systems. This choice has been originally suggested as a way to compensate for the so called vanishing gradient problem which can undercut stochastic gradient descent (SGD) learning in networks composed of multiple layers. Here we provide analytical results on the effects of ReLUs on the capacity and on the geometrical landscape of the solution space in two-layer neural networks with either binary or real-valued weights. We study the problem of storing an extensive number of random patterns and find that, quite unexpectedly, the capacity of the network remains finite as the number of neurons in the hidden layer increases, at odds with the case of threshold units in which the capacity diverges. Possibly more important, a large deviation approach allows us to find that the geometrical landscape of the solution space has a peculiar structure: while the majority of solutions are close in distance but still isolated, there exist rare regions of solutions which are much more dense than the similar ones in the case of threshold units. These solutions are robust to perturbations of the weights and can tolerate large perturbations of the inputs. The analytical results are corroborated by numerical findings.
In this paper, we develop a general theory for adaptive nonparametric estimation of the mean function of a non-stationary and nonlinear time series model using deep neural networks (DNNs). We first consider two types of DNN estimators, non-penalized and sparse-penalized DNN estimators, and establish their generalization error bounds for general non-stationary time series. We then derive minimax lower bounds for estimating mean functions belonging to a wide class of nonlinear autoregressive (AR) models that include nonlinear generalized additive AR, single index, and threshold AR models. Building upon the results, we show that the sparse-penalized DNN estimator is adaptive and attains the minimax optimal rates up to a poly-logarithmic factor for many nonlinear AR models. Through numerical simulations, we demonstrate the usefulness of the DNN methods for estimating nonlinear AR models with intrinsic low-dimensional structures and discontinuous or rough mean functions, which is consistent with our theory.
In this study, we developed an inverse analysis framework that proposes a microstructure for dual-phase (DP) steel that exhibits high strength and ductility. The inverse analysis method proposed in this study involves repeated random searches on a model that combines a generative adversarial network (GAN), which generates microstructures, and a convolutional neural network (CNN), which predicts the maximum stress and working limit strain from DP steel microstructures. GAN was trained using images of DP steel microstructures generated by the phase-field method. CNN was trained using images of DP steel microstructures, the maximum stress and the working limit strain calculated by the dislocation-crystal plasticity finite element method. The constructed framework made an efficient search for microstructures possible because of a low-dimensional search space by a latent variable of GAN. The multiple deformation modes were considered in this framework, which allowed the required microstructures to be explored under complex deformation modes. A microstructure with a fine grain size was proposed by using the developed framework.
We present extended Galerkin neural networks (xGNN), a variational framework for approximating general boundary value problems (BVPs) with error control. The main contributions of this work are (1) a rigorous theory guiding the construction of new weighted least squares variational formulations suitable for use in neural network approximation of general BVPs (2) an ``extended'' feedforward network architecture which incorporates and is even capable of learning singular solution structures, thus greatly improving approximability of singular solutions. Numerical results are presented for several problems including steady Stokes flow around re-entrant corners and in convex corners with Moffatt eddies in order to demonstrate efficacy of the method.
This study investigates the robustness of graph embedding methods for community detection in the face of network perturbations, specifically edge deletions. Graph embedding techniques, which represent nodes as low-dimensional vectors, are widely used for various graph machine learning tasks due to their ability to capture structural properties of networks effectively. However, the impact of perturbations on the performance of these methods remains relatively understudied. The research considers state-of-the-art graph embedding methods from two families: matrix factorization (e.g., LE, LLE, HOPE, M-NMF) and random walk-based (e.g., DeepWalk, LINE, node2vec). Through experiments conducted on both synthetic and real-world networks, the study reveals varying degrees of robustness within each family of graph embedding methods. The robustness is found to be influenced by factors such as network size, initial community partition strength, and the type of perturbation. Notably, node2vec and LLE consistently demonstrate higher robustness for community detection across different scenarios, including networks with degree and community size heterogeneity. These findings highlight the importance of selecting an appropriate graph embedding method based on the specific characteristics of the network and the task at hand, particularly in scenarios where robustness to perturbations is crucial.
In this chapter, we address the challenge of exploring the posterior distributions of Bayesian inverse problems with computationally intensive forward models. We consider various multivariate proposal distributions, and compare them with single-site Metropolis updates. We show how fast, approximate models can be leveraged to improve the MCMC sampling efficiency.
In this paper, to the best of our knowledge, we make the first attempt at studying the parametric semilinear elliptic eigenvalue problems with the parametric coefficient and some power-type nonlinearities. The parametric coefficient is assumed to have an affine dependence on the countably many parameters with an appropriate class of sequences of functions. In this paper, we obtain the upper bound estimation for the mixed derivatives of the ground eigenpairs that has the same form obtained recently for the linear eigenvalue problem. The three most essential ingredients for this estimation are the parametric analyticity of the ground eigenpairs, the uniform boundedness of the ground eigenpairs, and the uniform positive differences between ground eigenvalues of linear operators. All these three ingredients need new techniques and a careful investigation of the nonlinear eigenvalue problem that will be presented in this paper. As an application, considering each parameter as a uniformly distributed random variable, we estimate the expectation of the eigenpairs using a randomly shifted quasi-Monte Carlo lattice rule and show the dimension-independent error bound.
We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain on the accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
Graph representation learning for hypergraphs can be used to extract patterns among higher-order interactions that are critically important in many real world problems. Current approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic for various learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention based graph neural network called Hyper-SAGNN applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms the state-of-the-art methods on traditional tasks while also achieving great performance on a new task called outsider identification. Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.