We present VPVnet, a deep neural network method for the Stokes equations under reduced regularity. Unlike recently proposed deep learning methods [40,51], which are based on the original form of the PDEs, VPVnet uses the least-squares functional of the first-order velocity-pressure-vorticity (VPV) formulation ([30]) as the loss function. As such, only first-order derivatives are required in the loss function, and the method therefore applies to a much larger class of problems, e.g., problems with non-smooth solutions. Although several methods have recently been proposed to reduce the regularity requirement by transforming the original problem into a corresponding variational form, for the Stokes equations the approximating spaces for the velocity and the pressure must additionally satisfy the LBB condition. By making use of the VPV formulation, the lower regularity requirement is achieved without the need to consider the LBB condition. Convergence and error estimates have been established for the proposed method. It is worth emphasizing that the VPVnet method is divergence-free and pressure-robust, whereas classical inf-sup stable mixed finite elements for the Stokes equations are not pressure-robust. Various numerical experiments, including 2D and 3D lid-driven cavity test cases, are conducted to demonstrate its efficiency and accuracy.
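To illustrate why only first-order derivatives enter the loss, the following toy sketch (our own, not the authors' code) assembles the least-squares residual of the 2D VPV system for a small multilayer perceptron, using central finite differences in place of automatic differentiation; all network sizes, names, and the collocation set are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a VPV-type least-squares loss for the 2D Stokes system
#   omega - (dv/dx - du/dy) = 0        (vorticity definition)
#   nu * curl(omega) + grad(p) = f     (momentum, curl(omega) = (d_y w, -d_x w))
#   du/dx + dv/dy = 0                  (incompressibility)
# evaluated for a small MLP mapping (x, y) -> (u, v, p, omega).
# Only first derivatives of the network output are needed.

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 4)), np.zeros(4)

def net(xy):                            # xy: (N, 2) -> (u, v, p, omega): (N, 4)
    return np.tanh(xy @ W1 + b1) @ W2 + b2

def grad_fd(xy, h=1e-5):                # central differences in x and y
    ex, ey = np.array([h, 0.0]), np.array([0.0, h])
    dx = (net(xy + ex) - net(xy - ex)) / (2 * h)
    dy = (net(xy + ey) - net(xy - ey)) / (2 * h)
    return dx, dy

def vpv_loss(xy, f=(0.0, 0.0), nu=1.0):
    out = net(xy)
    dx, dy = grad_fd(xy)
    r_vort = out[:, 3] - (dx[:, 1] - dy[:, 0])   # vorticity definition residual
    r_mom_x = nu * dy[:, 3] + dx[:, 2] - f[0]    # x-momentum residual
    r_mom_y = -nu * dx[:, 3] + dy[:, 2] - f[1]   # y-momentum residual
    r_div = dx[:, 0] + dy[:, 1]                  # divergence residual
    return np.mean(r_vort**2 + r_mom_x**2 + r_mom_y**2 + r_div**2)

pts = rng.uniform(size=(64, 2))          # random collocation points in [0,1]^2
loss = vpv_loss(pts)
```

In a real implementation the derivatives would be obtained by automatic differentiation and the loss minimized over the network parameters; the sketch only shows how the first-order residuals are assembled.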
We present two strategies for designing passivity-preserving higher-order discretization methods for Maxwell's equations in nonlinear Kerr-type media. Both approaches are based on variational approximation schemes in space and time. This allows us to rigorously prove energy conservation or dissipation, and thus passivity, on the fully discrete level. For linear media, the proposed methods coincide with certain combinations of mixed finite element and implicit Runge-Kutta schemes. The order-optimal convergence rates, which can thus be expected for linear problems, are also observed for nonlinear problems in the numerical tests.
The training of deep neural networks (DNNs) is currently predominantly done using first-order methods. Some of these methods (e.g., Adam, AdaGrad, RMSprop, and their variants) incorporate a small amount of curvature information by using a diagonal matrix to precondition the stochastic gradient. Recently, effective second-order methods, such as KFAC, K-BFGS, Shampoo, and TNT, have been developed for training DNNs by preconditioning the stochastic gradient with layer-wise block-diagonal matrices. Here we propose, and analyze the convergence of, an approximate natural gradient method, mini-block Fisher (MBF), that lies between these two classes of methods. Specifically, our method uses a block-diagonal approximation to the Fisher matrix in which, for each layer of the DNN, whether convolutional or fully connected, the associated diagonal block is itself block-diagonal and is composed of a large number of mini-blocks of modest size. Our approach exploits the parallelism of GPUs to efficiently perform the computations on the large number of small matrices in each layer. Consequently, MBF's per-iteration computational cost is only slightly higher than that of first-order methods. Finally, the performance of the proposed method is compared to that of several baseline methods, on both autoencoder and CNN problems, to validate its effectiveness in terms of both time efficiency and generalization power.
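The mini-block idea can be sketched as follows. This is our own illustrative toy, not the paper's implementation: a layer's gradient is partitioned into small mini-blocks, each mini-block keeps its own tiny damped Fisher approximation, and all the small solves are done in one batched call; the block size, damping value, and rank-1 Fisher estimate are assumptions for illustration.

```python
import numpy as np

# Illustrative mini-block preconditioning: split a flattened layer gradient
# into mini-blocks of size `block`, form a damped per-block Fisher estimate
# F_i = g_i g_i^T + damping * I, and return (F_i)^{-1} g_i for every block
# via a single batched small linear solve (this is where GPU parallelism
# over many small matrices would pay off).

def mini_block_precondition(grad, block=4, damping=1e-3):
    n = grad.size
    assert n % block == 0, "gradient length must be divisible by block size"
    g = grad.reshape(n // block, block, 1)        # (num_blocks, block, 1)
    fisher = g @ g.transpose(0, 2, 1)             # rank-1 per-block estimate
    fisher += damping * np.eye(block)             # damping keeps blocks invertible
    return np.linalg.solve(fisher, g).reshape(n)  # batched (block x block) solves

rng = np.random.default_rng(1)
grad = rng.normal(size=32)                        # stand-in for one layer's gradient
update = mini_block_precondition(grad)
```

Because each damped mini-block Fisher is positive definite, the preconditioned update always has a positive inner product with the gradient, i.e., it remains a descent direction.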
This paper considers coding for so-called partially stuck (defective) memory cells. Such cells can only store partial information, as some of their levels cannot be used fully due to, e.g., wearout. First, we present new constructions that are able to mask $u$ partially stuck cells while simultaneously correcting $t$ random errors. The process of "masking" determines a word whose entries coincide with writable levels at the (partially) stuck cells. For $u>1$ and alphabet size $q>2$, our new constructions improve upon the required redundancy of known constructions for $t=0$, and require less redundancy for masking partially stuck cells than former works required for masking fully stuck cells (which cannot store any information). Second, we show that treating some of the partially stuck cells as erroneous can decrease the required redundancy for some parameters. Lastly, we derive Singleton-like, sphere-packing-like, and Gilbert--Varshamov-like bounds. Numerical comparisons show that our constructions match the Gilbert--Varshamov-like bounds for several code parameters, e.g., for BCH codes that contain the all-one word, by our first construction.
We consider the evolution of curve networks in two dimensions (2d) and surface clusters in three dimensions (3d). The motion of the interfaces is described by surface diffusion, with boundary conditions at the triple junction points/lines, where three interfaces meet, and at the boundary points/lines, where an interface meets a fixed planar boundary. We propose a parametric finite element method based on a suitable variational formulation. The constructed method is semi-implicit and can be shown to satisfy volume conservation of each enclosed bubble and unconditional energy stability, thus preserving the two fundamental geometric structures of the flow. In addition, the method has very good properties with respect to the distribution of mesh points, so that no mesh smoothing or regularization technique is required. A generalization of the introduced scheme to the case of anisotropic surface energies and non-neutral external boundaries is also considered. Numerical results are presented for the evolution of two-dimensional curve networks and three-dimensional surface clusters in the cases of both isotropic and anisotropic surface energies.
Modern neural networks often have great expressive power and can be trained to overfit the training data while still achieving good test performance. This phenomenon is referred to as "benign overfitting". Recently, a line of work has emerged that studies benign overfitting from a theoretical perspective. However, these works are limited to linear models or kernel/random-feature models, and there is still a lack of theoretical understanding of when and how benign overfitting occurs in neural networks. In this paper, we study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN). We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent achieves arbitrarily small training and test loss. On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve constant-level test loss. Together, these results demonstrate a sharp phase transition between benign and harmful overfitting, driven by the signal-to-noise ratio. To the best of our knowledge, this is the first work to precisely characterize the conditions under which benign overfitting can occur in training convolutional neural networks.
Within the framework of parameter-dependent PDEs, we develop a constructive approach based on deep neural networks for the efficient approximation of the parameter-to-solution map. The research is motivated by the limitations and drawbacks of state-of-the-art algorithms, such as the reduced basis method, when addressing problems that show a slow decay in the Kolmogorov n-width. Our work is based on the use of deep autoencoders, which we employ for encoding and decoding a high-fidelity approximation of the solution manifold. To provide guidelines for the design of deep autoencoders, we consider a nonlinear version of the Kolmogorov n-width, on which we base the concept of a minimal latent dimension. We show that the latter is intimately related to the topological properties of the solution manifold, and we provide theoretical results, with particular emphasis on second-order elliptic PDEs, characterizing the minimal dimension and the approximation errors of the proposed approach. The theory is further supported by numerical experiments, in which we compare the proposed approach with classical POD-Galerkin reduced order models. In particular, we consider parametrized advection-diffusion PDEs, and we test the methodology in the presence of strong transport fields, singular terms and stochastic coefficients.
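The encode/decode pipeline can be sketched in a few lines. This is a toy of our own, not the paper's architecture: the weights are random and untrained, and the dimensions are illustrative; the point is only the shape of the computation, with a high-fidelity snapshot in $\mathbb{R}^n$ compressed to a latent code whose dimension $k$ plays the role of the minimal latent dimension discussed in the text.

```python
import numpy as np

# Toy autoencoder pipeline for a solution manifold: a snapshot u(mu) in R^n
# is encoded to a latent code z in R^k and decoded back to an approximation
# u_hat in R^n. Weights are random (untrained) and serve only to show shapes.

rng = np.random.default_rng(5)
n, k = 50, 3                                  # full vs latent dimension (k << n)
We = rng.normal(size=(n, k)) / np.sqrt(n)     # encoder weights (illustrative)
Wd = rng.normal(size=(k, n)) / np.sqrt(k)     # decoder weights (illustrative)

def encode(u):                                # u in R^n -> z in R^k
    return np.tanh(u @ We)

def decode(z):                                # z in R^k -> u_hat in R^n
    return z @ Wd

u = rng.normal(size=n)                        # stand-in for a snapshot u(mu)
u_hat = decode(encode(u))
```

In practice both maps are deep networks trained to minimize the reconstruction error over snapshots of the solution manifold, and the smallest $k$ achieving a target accuracy is the quantity of interest.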
Benign overfitting demonstrates that overparameterized models can perform well on test data while fitting noisy training data. However, it considers only the final min-norm solution in linear regression, ignoring the algorithm and the corresponding training procedure. In this paper, we generalize the idea of benign overfitting to the whole training trajectory instead of the min-norm solution, and derive a time-variant bound based on trajectory analysis. Starting from this time-variant bound, we further derive a time interval that suffices to guarantee a consistent generalization error for a given feature covariance. Unlike existing approaches, the newly proposed generalization bound is characterized by a time-variant effective dimension of the feature covariance. By introducing the time factor, we relax the strict assumption on the feature covariance matrix required by previous analyses of benign overfitting in the regime of overparameterized linear regression with gradient descent. This paper extends the scope of benign overfitting, and experimental results indicate that the proposed bound accords better with empirical evidence.
In this paper, we propose GT-GDA, a distributed optimization method to solve saddle point problems of the form: $\min_{\mathbf{x}} \max_{\mathbf{y}} \{F(\mathbf{x},\mathbf{y}) := G(\mathbf{x}) + \langle \mathbf{y}, \overline{P} \mathbf{x} \rangle - H(\mathbf{y})\}$, where the functions $G(\cdot)$ and $H(\cdot)$, and the coupling matrix $\overline{P}$, are distributed over a strongly connected network of nodes. GT-GDA is a first-order method that uses gradient tracking to eliminate the dissimilarity caused by heterogeneous data distribution among the nodes. In its most general form, GT-GDA includes a consensus over the local coupling matrices to achieve the optimal (unique) saddle point, albeit at the expense of increased communication. To avoid this, we propose a more efficient variant, GT-GDA-Lite, that does not incur the additional communication, and analyze its convergence in various scenarios. We show that GT-GDA converges linearly to the unique saddle point when $G(\cdot)$ is smooth and convex, $H(\cdot)$ is smooth and strongly convex, and the global coupling matrix $\overline{P}$ has full column rank. We further characterize the regime in which GT-GDA exhibits network topology-independent convergence behavior. We then show linear convergence of GT-GDA to an error neighborhood of the unique saddle point, which shrinks to zero when the coupling cost ${\langle \mathbf y, \overline{P} \mathbf x \rangle}$ is common to all nodes, or when $G(\cdot)$ and $H(\cdot)$ are quadratic. Numerical experiments illustrate the convergence properties and importance of GT-GDA and GT-GDA-Lite for several applications.
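To make the saddle-point structure concrete, here is a minimal single-node sketch of plain gradient descent-ascent on the problem above with quadratic $G(\mathbf{x}) = \frac{1}{2}\|\mathbf{x}\|^2$ and $H(\mathbf{y}) = \frac{1}{2}\|\mathbf{y}\|^2$; it deliberately omits the gradient tracking and network consensus that distinguish GT-GDA, and the step size, dimension, and iteration count are illustrative assumptions.

```python
import numpy as np

# Single-node gradient descent-ascent (GDA) on
#   min_x max_y  0.5||x||^2 + y^T P x - 0.5||y||^2 .
# The first-order conditions x + P^T y = 0 and P x - y = 0 force the
# unique saddle point (x*, y*) = (0, 0), so the iterates should shrink.

rng = np.random.default_rng(2)
P = rng.normal(size=(3, 3))        # stand-in for the global coupling matrix
x, y = np.ones(3), np.ones(3)      # arbitrary initialization
alpha = 0.05                       # step size (small enough for contraction)
for _ in range(5000):
    gx = x + P.T @ y               # gradient in x of G(x) + <y, P x>
    gy = P @ x - y                 # gradient in y of <y, P x> - H(y)
    x, y = x - alpha * gx, y + alpha * gy   # descent in x, ascent in y
```

Because $G$ and $H$ are strongly convex here, this plain GDA already contracts linearly to the saddle point; the distributed setting adds per-node copies of $(\mathbf{x}, \mathbf{y})$, local gradients, and gradient-tracking corrections on top of the same descent-ascent step.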
We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
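The forward pass described above can be sketched as follows. This toy (ours, not the paper's code) uses a fixed-step RK4 integrator in place of a black-box adaptive solver and a random tanh dynamics network, so it shows only the shape of the computation: the "depth" of the model is replaced by integration of the hidden state over time.

```python
import numpy as np

# Toy neural-ODE forward pass: the hidden state evolves by
#   dh/dt = f(h, t; theta),
# where f is a small network, and the model output is h(t1) obtained from
# an ODE solver (here: classical fixed-step RK4). The paper instead uses
# black-box adaptive solvers with adjoint-based gradients.

rng = np.random.default_rng(3)
W = 0.1 * rng.normal(size=(4, 4))   # illustrative dynamics parameters theta

def f(h, t):                        # dynamics network f(h, t; theta)
    return np.tanh(h @ W)

def odeint_rk4(f, h0, t0=0.0, t1=1.0, steps=20):
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):          # one RK4 stage evaluation per sub-step
        k1 = f(h, t)
        k2 = f(h + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = f(h + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = f(h + dt * k3, t + dt)
        h = h + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return h

h1 = odeint_rk4(f, np.ones(4))      # "output layer" state h(t1)
```

With an adaptive solver, the number of function evaluations, and hence the effective depth, adapts to each input, which is the property the abstract highlights.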
We present a simple and general framework for feature learning from point clouds. The key to the success of CNNs is the convolution operator, which is capable of leveraging spatially local correlation in data represented densely in grids (e.g., images). However, point clouds are irregular and unordered, so directly convolving kernels against the features associated with the points discards shape information and is variant to point ordering. To address these problems, we propose to learn an X-transformation from the input points and then use it to simultaneously weight the input features associated with the points and permute them into a latent, potentially canonical, order, before the element-wise product and sum operations are applied. The proposed method is a generalization of typical CNNs to feature learning from point clouds, thus we call it PointCNN. Experiments show that PointCNN achieves performance on par with or better than state-of-the-art methods on multiple challenging benchmark datasets and tasks.
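The core operation can be sketched as follows. This is our own simplified illustration, not PointCNN's actual implementation: a small learned map predicts a $K \times K$ matrix $X$ from the coordinates of $K$ neighboring points, and $X$ both weights the point features and softly permutes them before aggregation; all shapes, names, and the single-matrix "network" are assumptions.

```python
import numpy as np

# Simplified X-transformation: from the coordinates of K neighbor points,
# predict a K x K matrix X, apply it to the K x C feature matrix (weighting
# and soft permutation in one step), then aggregate. Order-dependence of the
# result is pushed into X, which is itself a function of the point coordinates.

rng = np.random.default_rng(4)
K, C = 4, 8                                   # neighborhood size, feature channels
Wx = 0.1 * rng.normal(size=(3 * K, K * K))    # maps flattened coords -> X entries

def x_conv(coords, feats):
    # coords: (K, 3) local point coordinates; feats: (K, C) point features
    X = np.tanh(coords.reshape(-1) @ Wx).reshape(K, K)
    transformed = X @ feats                   # weight + softly permute the K rows
    return transformed.sum(axis=0)            # aggregate into one C-vector

coords = rng.normal(size=(K, 3))
feats = rng.normal(size=(K, C))
out = x_conv(coords, feats)
```

In the full method the transformed features feed a standard convolution rather than a plain sum, and the X-predicting network is an MLP trained jointly with the rest of the model.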