We present extended Galerkin neural networks (xGNN), a variational framework for approximating general boundary value problems (BVPs) with error control. The main contributions of this work are (1) a rigorous theory guiding the construction of new weighted least squares variational formulations suitable for use in neural network approximation of general BVPs, and (2) an ``extended'' feedforward network architecture which incorporates, and is even capable of learning, singular solution structures, thus greatly improving the approximability of singular solutions. Numerical results are presented for several problems, including steady Stokes flow around re-entrant corners and in convex corners with Moffatt eddies, to demonstrate the efficacy of the method.
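As a rough illustration of the weighted least-squares idea above, the following minimal sketch trains a small feedforward network on a 1D Poisson problem with a weighted boundary penalty. This is not the authors' xGNN architecture; the network size, the boundary weight `w_bdry`, and the manufactured source term are our own illustrative choices.

```python
# Minimal sketch (not the authors' xGNN code): a weighted least-squares
# loss for -u'' = f on (0,1) with u(0) = u(1) = 0, approximated by a
# small feedforward network. `w_bdry` weights the boundary residual.
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
f = lambda x: (torch.pi ** 2) * torch.sin(torch.pi * x)  # manufactured source

def ls_loss(w_bdry=100.0):
    x = torch.rand(256, 1, requires_grad=True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    interior = ((-d2u - f(x)) ** 2).mean()                       # PDE residual
    boundary = (net(torch.tensor([[0.0], [1.0]])) ** 2).mean()   # Dirichlet residual
    return interior + w_bdry * boundary

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    ls_loss().backward()
    opt.step()
```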
Sepsis is a life-threatening condition and a serious global health issue. This study combines clinical knowledge with available hospital data to investigate the potential causes of Sepsis that can be affected by policy decisions. We investigate the underlying causal structure of this problem by combining clinical expertise with score-based, constraint-based, and hybrid structure learning algorithms. A novel approach to model averaging and knowledge-based constraints was implemented to arrive at a consensus structure for causal inference. The structure learning process highlighted the importance of exploring data-driven approaches alongside clinical expertise, including the discovery of unexpected, although clinically plausible, relationships. Hypothetical interventions on Chronic Obstructive Pulmonary Disease, Alcohol dependence, and Diabetes suggest that the presence of any of these risk factors increases the likelihood of Sepsis. This finding, alongside the measured effect of these risk factors on Sepsis, has potential policy implications. Recognising the importance of prediction in improving Sepsis-related health outcomes, the consensus model is also assessed for its ability to predict Sepsis. Its predictions were evaluated for accuracy, sensitivity, and specificity; all three indicators were around 70%, and the AUC was 80%, suggesting that the causal structure of the model is reasonably accurate given that the models were trained on data available for commissioning purposes only.
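For readers who want to experiment with this kind of structure learning pipeline, the sketch below combines a score-based and a constraint-based algorithm with pgmpy and takes a crude edge intersection as a stand-in for the paper's more elaborate model averaging. The file name and columns are hypothetical, and the consensus rule is ours, not the authors'.

```python
# Illustrative sketch only: score-based and constraint-based structure
# learning with pgmpy, followed by a naive edge-intersection "consensus".
import pandas as pd
from pgmpy.estimators import BicScore, HillClimbSearch, PC

data = pd.read_csv("patients.csv")  # hypothetical discretized patient data

hc = HillClimbSearch(data)                             # score-based search
dag_score = hc.estimate(scoring_method=BicScore(data))

pc = PC(data)                                          # constraint-based search
dag_constraint = pc.estimate(significance_level=0.05)

# Keep only edges found by both algorithms; the paper instead uses model
# averaging together with knowledge-based constraints.
consensus_edges = set(dag_score.edges()) & set(dag_constraint.edges())
print(sorted(consensus_edges))
```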
The study of explicit regularization and optimality of deep neural network estimators from independent data has made considerable progress recently. Establishing such properties for dependent data remains a challenge. In this paper, we carry out deep learning from strongly mixing observations and deal with the squared loss as well as a broad class of other loss functions. We consider sparse-penalized regularization for the deep neural network predictor. For a general framework that includes regression estimation, classification, and time series prediction, an oracle inequality for the expected excess risk is established, and a bound on the class of H\"older smooth functions is provided. For nonparametric regression from strongly mixing data with sub-exponential errors, we provide an oracle inequality for the $L_2$ error and investigate an upper bound of this error on a class of H\"older composition functions. For the specific case of nonparametric autoregression with Gaussian and Laplace errors, a lower bound of the $L_2$ error on this H\"older composition class is established. Up to a logarithmic factor, this lower bound matches the upper bound; thus, the deep neural network estimator attains the minimax optimal rate.
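Schematically, and in our own notation, the sparse-penalized deep neural network estimator studied in such settings minimizes an empirical risk plus a sparsity penalty on the network parameters $\theta(f)$,
$$\hat{f}_n \in \operatorname{argmin}_{f \in \mathcal{F}_{\mathrm{DNN}}} \Big\{ \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(X_i), Y_i\big) + \lambda_n J\big(\theta(f)\big) \Big\},$$
where $\ell$ is the loss, $\lambda_n$ a tuning parameter, and $J$ a sparsity-inducing penalty (for instance, a clipped $\ell_1$ norm as a surrogate for the number of nonzero weights); the oracle inequalities bound the expected excess risk of $\hat{f}_n$.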
We introduce SPUD (Semantically Perturbed Universal Dependencies), a framework for creating nonce treebanks for the multilingual Universal Dependencies (UD) corpora. SPUD data satisfies syntactic argument structure, provides syntactic annotations, and ensures grammaticality via language-specific rules. We create nonce data in Arabic, English, French, German, and Russian, and demonstrate two use cases of SPUD treebanks. First, we investigate the effect of nonce data on word co-occurrence statistics, as measured by perplexity scores of autoregressive (ALM) and masked language models (MLM). We find that ALM scores are significantly more affected by nonce data than MLM scores. Second, we show how nonce data affects the performance of syntactic dependency probes. We replicate the findings of M\"uller-Eberstein et al. (2022) on nonce test data and show that performance declines for both MLMs and ALMs relative to the original test data. However, most of the performance is retained, suggesting that the probe indeed learns syntax independently of semantics.
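The perplexity comparison above can be approximated with off-the-shelf tooling; the sketch below scores a single sentence with an autoregressive model via its mean token negative log-likelihood. The model name and example sentence are placeholders, not the paper's exact setup.

```python
# Minimal sketch: ALM perplexity of one sentence, as one would compute
# when comparing nonce (SPUD) and original UD text. Placeholders throughout.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(sentence: str) -> float:
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token negative log-likelihood
    return torch.exp(loss).item()

print(perplexity("The quiet theorem devoured a purple chair."))  # nonce-like input
```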
The ability to simulate realistic networks based on empirical data is an important task across scientific disciplines, from epidemiology to computer science. Simulation approaches often involve selecting a suitable network generative model such as Erd\"os-R\'enyi or small-world. However, few tools are available to quantify whether a particular generative model is suitable for capturing a given network structure or organization. We utilize advances in interpretable machine learning to classify simulated networks by their generative models based on various network attributes, using both primary features and their interactions. Our study underscores the significance of specific network features and their interactions in distinguishing between generative models, understanding complex network structures, and characterizing the formation of real-world networks.
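A minimal version of this classification setup, with a feature set and hyperparameters of our own choosing rather than the authors', might look as follows:

```python
# Illustrative sketch: classify graphs by generative model from simple
# structural features (density, clustering, assortativity).
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def features(G):
    return [nx.density(G),
            nx.average_clustering(G),
            nx.degree_assortativity_coefficient(G)]

X, y = [], []
for _ in range(200):
    X.append(features(nx.erdos_renyi_graph(100, 0.05))); y.append("ER")
    X.append(features(nx.watts_strogatz_graph(100, 4, 0.1))); y.append("small-world")

clf = RandomForestClassifier(n_estimators=200).fit(np.array(X), y)
```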
This paper presents an analysis of the properties of two hybrid discretization methods for Gaussian derivatives, based on convolution with either the normalized sampled Gaussian kernel or the integrated Gaussian kernel, followed by central differences. The motivation for studying these discretization methods is that, when multiple spatial derivatives of different orders are needed at the same scale level, they can be computed significantly more efficiently than with more direct derivative approximations based on explicit convolutions with either sampled Gaussian kernels or integrated Gaussian kernels. These computational benefits also hold for the genuinely discrete approach of computing discrete analogues of Gaussian derivatives, based on convolution with the discrete analogue of the Gaussian kernel followed by central differences. However, the underlying mathematical primitives for the discrete analogue of the Gaussian kernel, in terms of modified Bessel functions of integer order, may not be available in certain frameworks for image processing, such as when performing deep learning based on scale-parameterized filters in terms of Gaussian derivatives, with learning of the scale levels. In this paper, we characterize the properties of these hybrid discretization methods in terms of quantitative performance measures concerning the amount of spatial smoothing that they imply, as well as the relative consistency of scale estimates obtained from scale-invariant feature detectors with automatic scale selection. We place particular emphasis on the behaviour for very small values of the scale parameter, which may differ significantly from the corresponding results obtained from the fully continuous scale-space theory, as well as between different types of discretization methods.
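The first of these hybrid methods is straightforward to express in code. The sketch below smooths a 1D signal with a normalized sampled Gaussian kernel and then applies central differences; the truncation radius is our own choice.

```python
# Sketch of the hybrid discretization: sampled Gaussian smoothing
# followed by central differences for first- and second-order derivatives.
import numpy as np
from scipy.ndimage import convolve1d

def sampled_gaussian(sigma, radius=None):
    radius = radius or int(np.ceil(4 * sigma))  # illustrative truncation
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()  # normalized sampled Gaussian kernel

signal = np.random.randn(256)
L = convolve1d(signal, sampled_gaussian(1.5))        # smoothed signal
Lx = np.convolve(L, [0.5, 0.0, -0.5], mode="same")   # central difference, d/dx
Lxx = np.convolve(L, [1.0, -2.0, 1.0], mode="same")  # central difference, d2/dx2
```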
The Spatial AutoRegressive (SAR) model is commonly used in studies involving spatial and network data to estimate the spatial or network peer influence and the effects of covariates on the response, taking into account the spatial or network dependence. While the model can be efficiently estimated with a quasi-maximum likelihood approach (QMLE), the detrimental effect of covariate measurement error on the QMLE, and how to remedy it, is currently unknown. If covariates are measured with error, then the QMLE may fail to achieve $\sqrt{n}$ convergence and may even be inconsistent, even when a node is influenced by only a limited number of other nodes or spatial units. We develop a measurement error-corrected quasi-maximum likelihood estimator (ME-QMLE) for the parameters of the SAR model when covariates are measured with error. The ME-QMLE is consistent and asymptotically normal. We consider two types of applications: first, when the true covariate cannot be measured directly and a proxy is observed instead; and second, when latent homophily factors, estimated with error from the network, are included to estimate peer influence. Our numerical results verify the bias-correction property of the estimator and the accuracy of the standard error estimates in finite samples. We illustrate the method on a real dataset of county-level death rates from the COVID-19 pandemic.
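To make the setting concrete, the toy simulation below generates data from the SAR model with an error-contaminated covariate; the notation and parameter values are ours.

```python
# Toy SAR data-generating process with covariate measurement error:
# y = rho * W y + X * beta + eps, but only X_obs = X + u is observed.
import numpy as np

rng = np.random.default_rng(0)
n, rho, beta = 200, 0.4, 1.5

A = rng.random((n, n)) < 0.03                        # random adjacency matrix
np.fill_diagonal(A, False)
W = A / np.maximum(A.sum(axis=1, keepdims=True), 1)  # row-normalized weights

X = rng.normal(size=(n, 1))
eps = rng.normal(size=(n, 1))
y = np.linalg.solve(np.eye(n) - rho * W, X * beta + eps)  # reduced form

X_obs = X + rng.normal(scale=0.5, size=(n, 1))       # noisy proxy for X
# A naive QMLE fit to (y, X_obs, W) is biased; the ME-QMLE corrects for this.
```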
Training spiking neural networks to approximate complex functions is essential for studying information processing in the brain and for neuromorphic computing. Yet the binary nature of spikes poses a challenge for direct gradient-based training. To sidestep this problem, surrogate gradients have proven empirically successful, but their theoretical foundation remains elusive. Here, we investigate the relation of surrogate gradients to two theoretically well-founded approaches. On the one hand, we consider smoothed probabilistic models, which, due to their lack of support for automatic differentiation, are impractical for training deep spiking neural networks, yet provide gradients equivalent to surrogate gradients in single neurons. On the other hand, we examine stochastic automatic differentiation, which is compatible with discrete randomness but has never been applied to spiking neural network training. We find that the latter provides the missing theoretical basis for surrogate gradients in stochastic spiking neural networks. We further show that surrogate gradients in deterministic networks correspond to a particular asymptotic case, and we numerically confirm the effectiveness of surrogate gradients in stochastic multi-layer spiking neural networks. Finally, we illustrate that surrogate gradients are not conservative fields and, thus, not gradients of a surrogate loss. Our work provides the missing theoretical foundation for surrogate gradients and an analytically well-founded solution for end-to-end training of stochastic spiking neural networks.
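A common concrete realization of surrogate gradients, consistent with the setting above though not taken from this paper, replaces the undefined derivative of the hard threshold with a smooth function on the backward pass:

```python
# Standard surrogate-gradient construction in PyTorch: hard spike
# threshold on the forward pass, fast-sigmoid derivative on the backward.
import torch

class SurrGradSpike(torch.autograd.Function):
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()              # binary spike nonlinearity

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        surrogate = 1.0 / (1.0 + 10.0 * v.abs()) ** 2  # smooth stand-in derivative
        return grad_output * surrogate

spike = SurrGradSpike.apply  # drop-in spiking nonlinearity for membrane potentials
```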
We consider a fictitious domain formulation for fluid-structure interaction problems based on a distributed Lagrange multiplier to couple the fluid and solid behaviors. How the coupling term is dealt with is crucial, since the construction of the associated finite element matrix requires the integration of functions defined over non-matching grids: the exact computation can be performed by intersecting the involved meshes, whereas an approximate coupling matrix can be evaluated on the original meshes at the cost of introducing a quadrature error. The purpose of this paper is twofold: we prove that the discrete problem is well-posed also when the coupling term is constructed in an approximate way, and we discuss quadrature error estimates over non-matching grids.
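Schematically, and in our own simplified notation, the coupling matrix in question has entries of the form
$$C_{ij} = \int_{B} \phi_i \, \psi_j \, dx \approx \sum_{q} w_q \, \phi_i(x_q) \, \psi_j(x_q),$$
where the basis functions $\phi_i$ and $\psi_j$ live on the two non-matching meshes: evaluating the exact integral requires intersecting the meshes, whereas placing the quadrature points $x_q$ on only one of the original meshes yields the approximate coupling matrix and the quadrature error analyzed in the paper.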
We present a dimension-incremental method for function approximation in bounded orthonormal product bases to learn the solutions of various differential equations. To this end, we decompose the source function of the differential equation into parameters such as Fourier or spline coefficients and treat the solution of the differential equation as a high-dimensional function with respect to the spatial variables, these parameters, and possibly further parameters from the differential equation itself. We then learn this function in the sense of sparse approximation in a suitable function space by detecting the coefficients of the basis expansion with the largest absolute values. Investigating the corresponding indices of the basis coefficients yields further insights into the structure of the solution as well as its dependence on the parameters and their interactions, and allows for a reasonable generalization to even higher dimensions and thereby better resolutions of the decomposed source function.
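The coefficient-detection step can be illustrated in a plain Fourier basis; the dimension-incremental method itself is considerably more involved, so the toy below only shows the "largest absolute coefficients" idea.

```python
# Toy sketch: detect the dominant basis coefficients of a sparse signal.
import numpy as np

x = np.linspace(0, 1, 512, endpoint=False)
u = 2.0 * np.sin(2 * np.pi * 3 * x) + 0.5 * np.cos(2 * np.pi * 7 * x)

coeffs = np.fft.rfft(u) / len(x)
top = np.argsort(np.abs(coeffs))[::-1][:2]  # indices of largest coefficients
print(np.sort(top))  # recovers the active frequencies: [3 7]
```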
Recent advances in 3D fully convolutional networks (FCNs) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafted features or class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that first uses a 3D FCN to roughly define a candidate region, which is then used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to approximately 10% and allows it to focus on a more detailed segmentation of the organs and vessels. We utilize training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital, comprising 150 CT scans and targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5% to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve significantly higher performance on small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN-based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download: https://github.com/holgerroth/3Dunet_abdomen_cascade.
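A schematic of the two-stage cascade, with placeholder shapes and model internals rather than the released implementation, is given below:

```python
# Schematic coarse-to-fine cascade: stage 1 proposes a candidate region,
# stage 2 segments only the cropped region (roughly 10% of the voxels).
import torch

def cascade(volume, fcn_stage1, fcn_stage2, threshold=0.5):
    # Stage 1: coarse segmentation over the full volume.
    coarse = torch.sigmoid(fcn_stage1(volume))
    mask = coarse > threshold                    # candidate-region mask

    # Bounding box of the candidate region over the three spatial dims.
    idx = mask.nonzero()
    lo = idx.min(dim=0).values.tolist()
    hi = (idx.max(dim=0).values + 1).tolist()
    crop = volume[..., lo[-3]:hi[-3], lo[-2]:hi[-2], lo[-1]:hi[-1]]

    # Stage 2: fine segmentation restricted to the cropped region.
    return torch.sigmoid(fcn_stage2(crop))
```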