In this paper, we establish a sharp upper bound on the number of fixed points that a certain class of neural networks can have. The networks under study (autoencoders) can be viewed as discrete dynamical systems whose nonlinearities are given by the choice of activation functions. To this end, we introduce a new class $\mathcal{F}$ of $C^1$ activation functions that is closed under composition and contains, e.g., the logistic sigmoid function. We use this class to show that any one-dimensional neural network of arbitrary depth with activation functions in $\mathcal{F}$ has at most three fixed points. Because such networks are simple, we can completely characterize their fixed points, which provides a foundation for the much-needed connection between the application and the theory of deep neural networks.
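The fixed points of a one-dimensional network g are the solutions of g(x) = x. The sketch below counts them numerically for a depth-L network whose layers are x ↦ sigmoid(w·x + b). The layer form, the grid, and the function names are illustrative assumptions, not the paper's construction. Counting sign changes of g(x) − x is only a numerical check and not a proof: a root that lands exactly on a grid point or touches zero tangentially would be missed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def network(x, layers):
    """Apply a hypothetical 1-D network: each (w, b) layer maps x -> sigmoid(w * x + b)."""
    for w, b in layers:
        x = sigmoid(w * x + b)
    return x

def count_fixed_points(layers, lo=-1.0, hi=2.0, n=10_000):
    """Count sign changes of g(x) - x on a uniform grid over [lo, hi].

    The last layer is a sigmoid, so g maps into (0, 1) and every fixed point lies
    in (0, 1); [lo, hi] only needs to contain that interval.
    """
    xs = np.linspace(lo, hi, n)
    signs = np.sign(network(xs, layers) - xs)
    return int(np.count_nonzero(signs[:-1] * signs[1:] < 0))

print(count_fixed_points([(1.0, 0.0)]))    # plain sigmoid: a single fixed point
print(count_fixed_points([(10.0, -5.0)]))  # steep sigmoid: fixed points near 0, at 0.5, and near 1
```

The steep single layer already attains the bound of three, which is consistent with the bound being sharp.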
Kinetic approaches are generally accurate for microscale plasma physics problems but are computationally expensive for large-scale or multiscale systems. A long-standing problem in plasma physics is the integration of kinetic physics into fluid models, which is often achieved through sophisticated analytical closure terms. In this paper, we use machine learning to construct a multi-moment fluid model whose closure is implicitly contained in a neural network. The multi-moment fluid model is trained on a small fraction of sparsely sampled data from kinetic simulations of Landau damping, using the physics-informed neural network (PINN) and the gradient-enhanced physics-informed neural network (gPINN). The multi-moment fluid model constructed with either PINN or gPINN reproduces the time evolution of the electric field energy, including its damping rate, as well as the plasma dynamics from the kinetic simulations. In addition, we introduce a variant of the gPINN architecture, gPINN$p$, to capture the Landau damping process. Instead of including the gradients of all the equation residuals, gPINN$p$ adds only the gradient of the pressure-equation residual as one additional constraint. Among the three approaches, the gPINN$p$-constructed multi-moment fluid model gives the most accurate results. This work points toward accurate and efficient modeling of large-scale systems and can be extended to complex multiscale laboratory, space, and astrophysical plasma physics problems.
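The three loss variants differ only in which residual gradients they penalize. The following is a minimal sketch under stated assumptions: each equation residual is a 2-D array sampled on a uniform (x, t) grid, the penalty weight is a free hyperparameter, and the residual gradients are approximated with central finite differences (`np.gradient`) so the example is self-contained. A network-based implementation would instead differentiate the residuals with automatic differentiation. The function names are hypothetical.

```python
import numpy as np

def pinn_loss(residuals):
    """Plain PINN physics loss: sum over equations of the mean squared residual."""
    return sum(float(np.mean(r ** 2)) for r in residuals)

def _grad_penalty(r, dx, dt):
    # Axis 0 is x and axis 1 is t; np.gradient returns [dr/dx, dr/dt].
    dr_dx, dr_dt = np.gradient(r, dx, dt)
    return float(np.mean(dr_dx ** 2) + np.mean(dr_dt ** 2))

def gpinn_loss(residuals, dx, dt, weight=1.0):
    """gPINN: also penalize the gradients of every equation residual."""
    return pinn_loss(residuals) + weight * sum(_grad_penalty(r, dx, dt) for r in residuals)

def gpinn_p_loss(residuals, pressure_index, dx, dt, weight=1.0):
    """gPINNp: penalize the gradient of the pressure-equation residual only."""
    return pinn_loss(residuals) + weight * _grad_penalty(residuals[pressure_index], dx, dt)
```

With several equations, `gpinn_p_loss` adds one gradient constraint where `gpinn_loss` adds one per equation, which is the distinction the abstract draws.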
This paper introduces a first implementation of a novel likelihood-ratio-based approach for constructing confidence intervals for neural networks. Our method, called DeepLR, offers several qualitative advantages, most notably the ability to construct asymmetric intervals that expand in regions with limited data, and the inherent incorporation of factors such as training time, network architecture, and regularization techniques. Although the current implementation of the method is prohibitively expensive for many deep-learning applications, the high cost may already be justified in fields such as medical prediction or astrophysics, where a reliable uncertainty estimate for a single prediction is essential. This work highlights the significant potential of likelihood-ratio-based uncertainty estimates and establishes a promising avenue for future research.
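The likelihood-ratio principle behind such intervals can be shown on a one-parameter model. The confidence set is {θ : 2(ℓ(θ̂) − ℓ(θ)) ≤ χ²₁,₀.₉₅}. The sketch below applies it to a Bernoulli probability. It is the textbook likelihood-ratio interval, not DeepLR's construction for networks, and the function names are hypothetical. Unlike a Wald interval θ̂ ± 1.96·SE, the resulting interval need not be symmetric around the estimate.

```python
import math

# 95% quantile of the chi-squared distribution with 1 degree of freedom (1.959964**2).
CHI2_1_95 = 3.841458820694124

def bernoulli_loglik(p, k, n):
    """Log-likelihood of k successes in n Bernoulli trials with success probability p."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def _bisect(f, a, b, iters=200):
    """Root of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    fa = f(a)
    for _ in range(iters):
        m = 0.5 * (a + b)
        fm = f(m)
        if (fm > 0) == (fa > 0):
            a, fa = m, fm
        else:
            b = m
    return 0.5 * (a + b)

def lr_interval(k, n, crit=CHI2_1_95, eps=1e-12):
    """Likelihood-ratio confidence interval for a Bernoulli probability (needs 0 < k < n)."""
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    p_hat = k / n
    l_max = bernoulli_loglik(p_hat, k, n)

    def excess(p):
        # Positive outside the interval, negative inside, zero at the endpoints.
        return 2.0 * (l_max - bernoulli_loglik(p, k, n)) - crit

    return _bisect(excess, eps, p_hat), _bisect(excess, p_hat, 1.0 - eps)

lo, hi = lr_interval(5, 50)  # p_hat = 0.1: the upper side of the interval is longer
```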
Hyperauthorship, the phenomenon whereby a single paper has a disproportionately large number of authors, is increasingly common in several scientific disciplines, but its consequences for the network metrics used to study scientific collaboration are unknown. It also affects the validity of co-authorship as a proxy for scientific collaboration. Using bibliometric data from publications in the field of genomics, we examine the impact of hyperauthorship on metrics of scientific collaboration, propose a method to determine a suitable cutoff threshold for hyperauthored papers, and compare co-authorship networks with and without hyperauthored works. Our analysis reveals that including hyperauthored papers dramatically affects the structural positions of central authors and the topological characteristics of the network, while having only a small influence on whole-network cohesion measures. We present two solutions to minimize the impact of hyperauthorship: excluding hyperauthored papers using a mathematically grounded and reproducible threshold cutoff, or using fractional counting to weight network results. Our findings affirm the structural influence of hyperauthored papers and suggest that scholars should be mindful when using co-authorship networks to study scientific collaboration.
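Both remedies can be applied when building the weighted co-authorship edge list. The sketch below assumes one common fractional-counting scheme: each pair of co-authors on an n-author paper receives weight 1/(n − 1), so a large author list contributes little to any single tie. The cutoff is a plain parameter here. The abstract's threshold calculation is not reproduced, and `max_authors` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations

def coauthorship_edges(papers, max_authors=None, fractional=True):
    """Weighted co-authorship edges from per-paper author lists.

    max_authors: papers with more authors are treated as hyperauthored and skipped
        (None keeps every paper).
    fractional: if True, each pair on an n-author paper gets 1 / (n - 1); otherwise 1.
    """
    weights = defaultdict(float)
    for authors in papers:
        authors = sorted(set(authors))  # drop duplicate names, fix pair order
        n = len(authors)
        if n < 2 or (max_authors is not None and n > max_authors):
            continue
        w = 1.0 / (n - 1) if fractional else 1.0
        for a, b in combinations(authors, 2):
            weights[(a, b)] += w
    return dict(weights)

papers = [["A", "B"], ["A", "B", "C"]]
print(coauthorship_edges(papers))                 # ('A', 'B') gets 1.0 + 0.5
print(coauthorship_edges(papers, max_authors=2))  # the 3-author paper is excluded
```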
In this paper, we make the first attempt to apply boundary integrated neural networks (BINNs) to the numerical solution of two-dimensional (2D) elastostatic and piezoelectric problems. BINNs combine artificial neural networks with the well-established boundary integral equations (BIEs) to solve partial differential equations (PDEs) effectively. The BIEs are used to map all the unknowns onto the boundary, after which these unknowns are approximated by artificial neural networks and resolved through a training process. Compared with traditional neural network-based methods, BINNs offer several distinct advantages. First, because BIEs are embedded into the learning procedure, BINNs only need to discretize the boundary of the solution domain, which can lead to a faster and more stable learning process (only the boundary conditions need to be fitted during training). Second, the differential operator of the PDEs is replaced by an integral operator, which eliminates the need for additional differentiation of the neural networks (high-order derivatives of neural networks may destabilize learning). Third, the loss function of BINNs contains only the residuals of the BIEs, as all the boundary conditions are inherently incorporated into the formulation. Therefore, there is no need for the weighting functions that traditional methods commonly use to balance the gradients among different objective functions. Moreover, BINNs can tackle PDEs in unbounded domains, since the integral representation remains valid for both bounded and unbounded domains. Extensive numerical experiments show that BINNs are much easier to train and usually give more accurate solutions than traditional neural network-based methods.
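A sketch can show why boundary data suffice once the BIE is satisfied. It deliberately swaps in the simplest case, the 2-D Laplace equation on the unit disk, instead of elastostatics, which would need the Kelvin fundamental solution. It uses Green's representation formula u(x) = ∮ [G(x,y) ∂u/∂n(y) − u(y) ∂G/∂n_y(x,y)] ds_y with G(x,y) = −ln|x − y|/(2π). In a BINN the unknown half of the boundary data would be a trained network. Here both halves are supplied analytically, and the function name is hypothetical. No derivative of any network appears, only a boundary quadrature.

```python
import numpy as np

def interior_value(x, u_b, q_b, n_points=200):
    """u(x) inside the unit disk from boundary values u_b and normal derivatives q_b.

    u_b(theta), q_b(theta): u and du/dn at y = (cos theta, sin theta).
    The trapezoidal rule converges very fast for smooth periodic integrands.
    """
    theta = 2.0 * np.pi * np.arange(n_points) / n_points
    y = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # on the unit circle, y is also the outward normal
    d = y - np.asarray(x, dtype=float)
    r2 = np.sum(d ** 2, axis=1)
    G = -np.log(r2) / (4.0 * np.pi)                      # -(1/2pi) ln r, written with r^2
    dG_dn = -np.sum(d * y, axis=1) / (2.0 * np.pi * r2)  # normal derivative of G with respect to y
    integrand = G * q_b(theta) - u_b(theta) * dG_dn
    return float(np.sum(integrand) * (2.0 * np.pi / n_points))

# Harmonic test field u = x^2 - y^2: on the circle u = cos(2t) and du/dn = 2 cos(2t).
val = interior_value((0.3, 0.2), lambda t: np.cos(2 * t), lambda t: 2 * np.cos(2 * t))  # close to 0.05
```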
Physics-informed neural networks (PINNs) have emerged as a powerful tool for solving inverse problems, especially when the system is only partially known and only scattered measurements are available. This is particularly useful in hemodynamics, since boundary information is often difficult to model and high-quality blood flow measurements are generally hard to obtain. In this work, we use the PINN methodology to estimate reduced-order model parameters and the full velocity field from scattered, noisy 2D measurements in the ascending aorta. With simulated data, the method yields stable and accurate parameter estimates, whereas the quality of the velocity reconstruction depends on the measurement quality and the complexity of the flow pattern. The method enables the solution of clinically relevant inverse problems in hemodynamics and in complex coupled physical systems.
Plasticity, the ability of a neural network to quickly change its predictions in response to new information, is essential for the adaptability and robustness of deep reinforcement learning systems. Deep neural networks are known to lose plasticity over the course of training, even in relatively simple learning problems, but the mechanisms driving this phenomenon are still poorly understood. This paper presents a systematic empirical analysis of plasticity loss, with the goal of understanding the phenomenon mechanistically in order to guide the future development of targeted solutions. We find that loss of plasticity is deeply connected to changes in the curvature of the loss landscape, but that it often occurs in the absence of saturated units. Based on this insight, we identify a number of parameterization and optimization design choices that enable networks to better preserve plasticity over the course of training. We validate the utility of these findings on larger-scale RL benchmarks in the Arcade Learning Environment.
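The abstract does not say which curvature measures are tracked. One common proxy is the largest eigenvalue of the loss Hessian, often called sharpness, and it can be monitored across training checkpoints. The sketch below estimates it by power iteration. It uses Hessian-vector products approximated by central differences of a user-supplied gradient function, so no explicit Hessian is formed. The function name and the finite-difference step are assumptions.

```python
import numpy as np

def top_hessian_eigenvalue(grad_fn, params, iters=100, eps=1e-4, seed=0):
    """Estimate the largest-magnitude Hessian eigenvalue of a loss by power iteration.

    grad_fn(params) returns the loss gradient as a 1-D array. Hessian-vector products
    are approximated as (grad(params + eps v) - grad(params - eps v)) / (2 eps).
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(params.shape)
    v /= np.linalg.norm(v)
    eigenvalue = 0.0
    for _ in range(iters):
        hv = (grad_fn(params + eps * v) - grad_fn(params - eps * v)) / (2.0 * eps)
        eigenvalue = float(v @ hv)  # Rayleigh quotient, since v has unit norm
        v = hv / np.linalg.norm(hv)
    return eigenvalue

# Quadratic loss 0.5 * p^T A p has Hessian A, so the estimate should approach 3.
A = np.diag([3.0, 1.0])
sharpness = top_hessian_eigenvalue(lambda p: A @ p, np.zeros(2))
```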
In this paper, we develop a unified regression approach to modeling unconditional quantiles, M-quantiles and expectiles of multivariate dependent variables by exploiting the multidimensional Huber function. To assess the impact of changes in the covariates across the entire unconditional distribution of the responses, we extend the work of Firpo et al. (2009) by running a mean regression of the recentered influence function on the explanatory variables. We discuss the estimation procedure and establish the asymptotic properties of the derived estimators. A data-driven procedure for selecting the tuning constant of the Huber function is also presented. The validity of the proposed methodology is explored through simulation studies and an application to the Survey of Household Income and Wealth 2016 conducted by the Bank of Italy.
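The Huber tuning constant c is what lets one estimator cover both quantiles and expectiles. The sketch below shows only the univariate special case, while the paper works with multivariate responses and the multidimensional Huber function. The M-quantile of order τ solves Σᵢ |τ − 1{yᵢ < θ}| ψ_c(yᵢ − θ) = 0, where ψ_c clips its argument to [−c, c]. A large c gives the τ-expectile, and c → 0 approaches the τ-quantile. The data-driven choice of c is not shown, and the function names are hypothetical.

```python
import numpy as np

def huber_psi(u, c):
    """Huber's psi function: identity on [-c, c], clipped to +/- c outside."""
    return np.clip(u, -c, c)

def m_quantile(y, tau, c, iters=200):
    """Univariate M-quantile of order tau with Huber tuning constant c, by bisection.

    The score is non-increasing in theta, non-negative at min(y) and non-positive
    at max(y), so a root lies in [min(y), max(y)].
    """
    y = np.asarray(y, dtype=float)

    def score(theta):
        u = y - theta
        weights = np.where(u < 0, 1.0 - tau, tau)  # |tau - 1{u < 0}|
        return float(np.sum(weights * huber_psi(u, c)))

    lo, hi = float(y.min()), float(y.max())
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y = [1.0, 2.0, 3.0, 10.0]
print(m_quantile(y, 0.5, c=100.0))  # large c with tau = 0.5 gives the mean, 4
```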
We hypothesize that, due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain in accuracy when the model has access to that modality in addition to another one. We refer to this gain as the conditional utilization rate. In our experiments, we consistently observe an imbalance in conditional utilization rates between modalities across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
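The conditional utilization rates for two modalities m0 and m1 reduce to accuracy arithmetic. The sketch below uses the plain difference in accuracy, which is one direct reading of the definition above; the paper's exact normalization may differ. It also assumes you already have predictions with both modalities and with each single modality available. How those single-modality predictions are obtained is left open, and the function names are hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)

def conditional_utilization_rates(labels, pred_both, pred_m0, pred_m1):
    """Return (u(m1 | m0), u(m0 | m1)) as accuracy gains from adding the other modality."""
    acc_both = accuracy(pred_both, labels)
    u_m1_given_m0 = acc_both - accuracy(pred_m0, labels)  # gain from adding m1 to m0
    u_m0_given_m1 = acc_both - accuracy(pred_m1, labels)  # gain from adding m0 to m1
    return u_m1_given_m0, u_m0_given_m1

labels = [0, 1, 1, 0]
rates = conditional_utilization_rates(labels, [0, 1, 1, 0], [0, 1, 1, 1], [1, 0, 1, 0])
# rates == (0.25, 0.5): adding m0 helps more than adding m1, an imbalance toward m0
```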
In this paper, we develop a novel neural network model for predicting the implied volatility surface that takes prior financial domain knowledge into account. We propose a new activation function that incorporates the volatility smile and is used in the hidden nodes that process the underlying asset price. In addition, financial conditions such as the absence of arbitrage, the boundaries, and the asymptotic slope are embedded into the loss function. This is one of the very first studies to discuss a methodological framework that incorporates prior financial domain knowledge into neural network architecture design and model training. The proposed model outperforms the benchmark models on S&P 500 index option data spanning 20 years. More importantly, the domain knowledge is satisfied empirically, showing that the model is consistent with existing financial theory and with the conditions related to the implied volatility surface.
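The abstract does not list the exact no-arbitrage terms. Two standard conditions are a natural assumption. First, total implied variance w(k, T) = σ(k, T)² T must be non-decreasing in maturity T at fixed log-moneyness k (no calendar arbitrage). Second, call prices must be convex in strike (no butterfly arbitrage). The sketch below turns each condition into a hinge-style penalty on a grid of model outputs. The grid layouts, equal strike spacing, and function names are hypothetical.

```python
import numpy as np

def calendar_penalty(total_variance):
    """Mean positive part of decreases in total variance between consecutive maturities.

    total_variance[i, j]: sigma(k_i, T_j)**2 * T_j, rows = log-moneyness, columns = increasing T.
    """
    decreases = total_variance[:, :-1] - total_variance[:, 1:]
    return float(np.mean(np.maximum(decreases, 0.0)))

def butterfly_penalty(call_prices):
    """Mean violation of convexity for call prices at equally spaced strikes.

    Convexity requires C[i-1] - 2 C[i] + C[i+1] >= 0; negative second differences are penalized.
    """
    second_diff = call_prices[:-2] - 2.0 * call_prices[1:-1] + call_prices[2:]
    return float(np.mean(np.maximum(-second_diff, 0.0)))
```

A training loss could then take the form `mse + lam_cal * calendar_penalty(w) + lam_bf * butterfly_penalty(c)`, with the weights `lam_cal` and `lam_bf` as hypothetical hyperparameters. Both penalties are zero exactly when the corresponding condition holds on the grid.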
This paper does not describe a working system. Instead, it presents a single idea about representation that allows advances made by several different groups to be combined into an imaginary system called GLOM. The advances include transformers, neural fields, contrastive representation learning, distillation and capsules. GLOM answers the question: how can a neural network with a fixed architecture parse an image into a part-whole hierarchy that has a different structure for each image? The idea is simply to use islands of identical vectors to represent the nodes in the parse tree. If GLOM can be made to work, it should significantly improve the interpretability of the representations produced by transformer-like systems when applied to vision or language.
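GLOM itself is not implemented, so the following only illustrates the representational idea. Suppose one level of the hierarchy has produced a vector at every image location. Parse-tree nodes at that level can then be read off as connected islands of (near-)identical vectors. The 2-D grid layout, the 4-neighbourhood, the tolerance, and the function name are all assumptions. With a nonzero tolerance, islands are formed by chaining neighbour-to-neighbour matches, so slowly drifting vectors can end up in one island.

```python
from collections import deque

import numpy as np

def find_islands(embeddings, tol=1e-6):
    """Label connected islands of near-identical vectors on a (height, width, dim) grid.

    Two 4-neighbouring locations join the same island when their vectors differ by at
    most tol in Euclidean norm. Returns (labels, n_islands).
    """
    h, w, _ = embeddings.shape
    labels = -np.ones((h, w), dtype=int)
    n_islands = 0
    for i in range(h):
        for j in range(w):
            if labels[i, j] >= 0:
                continue
            labels[i, j] = n_islands  # start a new island and flood-fill it
            queue = deque([(i, j)])
            while queue:
                a, b = queue.popleft()
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    na, nb = a + da, b + db
                    if 0 <= na < h and 0 <= nb < w and labels[na, nb] < 0:
                        if np.linalg.norm(embeddings[na, nb] - embeddings[a, b]) <= tol:
                            labels[na, nb] = n_islands
                            queue.append((na, nb))
            n_islands += 1
    return labels, n_islands
```

Identical vectors in two spatially separate regions yield two islands, i.e., two distinct nodes, in this sketch.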