Hawkes processes are popular mathematical tools for modelling phenomena that exhibit \textit{self-exciting} or \textit{self-correcting} behaviour. Typical examples are earthquake occurrences, wildfires, droughts, capture-recapture studies, violent crime, trade exchanges, and social network activity. The widespread use of Hawkes processes in different fields calls for fast, reproducible, reliable, and easy-to-code techniques to implement such models. We offer a technique to perform approximate Bayesian inference of Hawkes process parameters based on the R-package \inlabru, which in turn relies on the INLA methodology to approximate the posterior of the parameters. Our Hawkes process approximation is based on a decomposition of the log-likelihood into three parts, each of which is linearly approximated separately. The linearization is performed with respect to the mode of the parameters' posterior distribution, which is determined with an iterative gradient-based method. The approximation of the posterior is therefore deterministic, ensuring full reproducibility of the results. The proposed technique only requires the user to provide the functions that calculate the different parts of the decomposed likelihood, which are internally linearized by \inlabru. We provide a comparison with the \bayesianETAS R-package, which is based on an MCMC method. The two techniques provide similar results, but our approach requires two to ten times less computational time to converge, depending on the amount of data.
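For context, a linear Hawkes process on $[0,T]$ is commonly specified through its conditional intensity, and its log-likelihood takes the standard point-process form below (a sketch of the standard quantities only, with a generic triggering kernel $g$ and background rate $\mu$; the paper's three-part decomposition splits this expression further):
\[
\lambda(t \mid \mathcal{H}_t) = \mu + \sum_{t_i < t} g(t - t_i),
\qquad
\log L(\theta) = \sum_{i} \log \lambda(t_i \mid \mathcal{H}_{t_i}) - \int_0^T \lambda(s \mid \mathcal{H}_s)\,\mathrm{d}s .
\]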
In this paper, we construct a derivative-free multi-step iterative scheme based on Steffensen's method. To avoid an excessive increase in the number of functional evaluations while still increasing the order of convergence, we freeze the divided difference used from the second step onward and apply a weight function to operators that have already been evaluated. In this way, we define a derivative-free family of multi-step methods with convergence order 2m, where m is the number of steps, depending on several parameters and with dynamical behaviour that is, in some cases, similar to that of Steffensen's method. In addition, we study how to increase the convergence order of the proposed family by introducing memory in two different ways: using the usual divided differences and using the Kurchatov divided differences. We perform numerical experiments to examine the behaviour of the proposed family, suggest different weight functions, and, in some cases, visualize the dynamical behaviour by means of dynamical planes.
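As a reference point for the construction, Steffensen's derivative-free method replaces $f'(x_n)$ in Newton's iteration with a first-order divided difference,
\[
x_{n+1} = x_n - \frac{f(x_n)}{[x_n,\, x_n + f(x_n);\, f]},
\qquad
[x, y; f] = \frac{f(y) - f(x)}{y - x},
\]
which converges quadratically and uses only two functional evaluations per step; the multi-step family described above reuses such divided differences across steps rather than recomputing them.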
Gaussian process regression in its simplest form assumes normal homoscedastic noise and exploits the analytically tractable mean and covariance functions of the predictive posterior distribution obtained by Gaussian conditioning. Its hyperparameters are estimated by maximizing the evidence, commonly known as type II maximum likelihood estimation. Unfortunately, Bayesian inference based on a Gaussian likelihood is not robust to outliers, which are often present in observational training data sets. To overcome this problem, we propose a robust process model in the Gaussian process framework in which the likelihood of the observed data is expressed through the Huber probability distribution. The proposed model employs weights based on projection statistics to scale the residuals and bound the influence of vertical outliers and bad leverage points on the latent function estimates, while exhibiting high statistical efficiency under Gaussian and thick-tailed noise distributions. The proposed method is demonstrated on two real-world problems and two numerical examples using data sets with additive errors following thick-tailed distributions such as the Student's t, Laplace, and Cauchy distributions.
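For reference, the Huber likelihood referred to here is proportional to $\exp\{-\rho_\delta(r)\}$ for a (possibly weighted) residual $r$, with the usual Huber function (the threshold $\delta$ and the projection-statistics weights are tuning choices of the method):
\[
\rho_\delta(r) =
\begin{cases}
\tfrac{1}{2}\, r^2, & |r| \le \delta,\\[2pt]
\delta\, |r| - \tfrac{1}{2}\,\delta^2, & |r| > \delta,
\end{cases}
\]
which is quadratic near the origin (retaining efficiency under Gaussian noise) and linear in the tails (bounding the influence of outliers).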
Bayesian inference for high-dimensional inverse problems is computationally costly and requires selecting a suitable prior distribution. Amortized variational inference addresses these challenges via a neural network that approximates the posterior distribution not only for one instance of data, but for a distribution of data pertaining to a specific inverse problem. During inference, the neural network -- in our case a conditional normalizing flow -- provides posterior samples at virtually no cost. However, the accuracy of amortized variational inference relies on the availability of high-fidelity training data, which seldom exist in geophysical inverse problems due to the Earth's heterogeneity. In addition, the network is prone to errors if evaluated on out-of-distribution data. We therefore propose to increase the resilience of amortized variational inference in the presence of moderate data distribution shifts. We achieve this via a correction to the latent distribution that improves the approximation of the posterior distribution for the data at hand. The correction involves relaxing the standard Gaussian assumption on the latent distribution and parameterizing it via a Gaussian distribution with an unknown mean and (diagonal) covariance. These unknowns are then estimated by minimizing the Kullback-Leibler divergence between the corrected and the (physics-based) true posterior distributions. While our approach is generic and applicable to other inverse problems, we show, by means of a linearized seismic imaging example, that the correction step improves the robustness of amortized variational inference with respect to changes in the number of seismic sources, noise variance, and shifts in the prior distribution. This approach provides a seismic image with limited artifacts and an assessment of its uncertainty at approximately the same cost as five reverse-time migrations.
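Schematically, if $f_\theta(\,\cdot\,;y)$ denotes the trained conditional normalizing flow mapping latent variables to model parameters given data $y$, the correction replaces the standard Gaussian latent distribution by $\mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$ and fits $(\mu, \sigma)$ variationally; a hedged sketch of the resulting objective (notation ours, not necessarily that of the paper) is
\[
\min_{\mu,\,\sigma}\; \mathrm{KL}\!\left( \big(f_\theta(\,\cdot\,;y)\big)_{\#}\,\mathcal{N}\!\big(\mu, \operatorname{diag}(\sigma^2)\big) \;\big\|\; p(x \mid y) \right),
\]
where $(\cdot)_{\#}$ denotes the pushforward of the corrected latent distribution through the flow and $p(x \mid y)$ is the physics-based true posterior.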
The method of Chernoff approximation is a powerful and flexible tool of functional analysis that in many cases allows one to express exp(tL) in terms of the variable coefficients of a linear differential operator L. In this paper, we prove a theorem that allows us to apply this method to find the resolvent of the operator L. We demonstrate this for a second-order differential operator. As a corollary, we obtain a new representation of the solution of an inhomogeneous second-order linear ordinary differential equation in terms of the coefficient functions of the equation, which play the role of parameters of the problem.
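The connection exploited here rests on two standard facts: the Laplace-transform representation of the resolvent in terms of the semigroup, and Chernoff's theorem (stated generically below; the paper's specific Chernoff function is built from the coefficients of $L$). For $\operatorname{Re}\lambda$ sufficiently large,
\[
(\lambda I - L)^{-1} f = \int_0^\infty e^{-\lambda t}\, e^{tL} f \,\mathrm{d}t,
\qquad
e^{tL} f = \lim_{n \to \infty} \big(S(t/n)\big)^{n} f,
\]
where $S(t)$ is a Chernoff function for $L$, i.e. $S(0) = I$, $\|S(t)\| \le e^{wt}$ for some $w$, and $S'(0) = L$ on a core of $L$.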
Motivated by recent findings that within-subject (WS) variability of longitudinal biomarkers is a risk factor for many health outcomes, this paper introduces and studies a new joint model of a longitudinal biomarker with heterogeneous WS variability and a competing risks time-to-event outcome. Specifically, our joint model consists of a linear mixed-effects multiple location-scale submodel for the individual mean trajectory and WS variability of the longitudinal biomarker, and a semiparametric cause-specific Cox proportional hazards submodel for the competing risks survival outcome. The submodels are linked together via shared random effects. We derive an expectation-maximization (EM) algorithm for semiparametric maximum likelihood estimation and a profile-likelihood method for standard error estimation. We implement efficient computational algorithms that scale to biobank data with tens of thousands of subjects. Our simulation results demonstrate that the proposed method has superior performance and that classical joint models with homogeneous WS variability can suffer from estimation bias, invalid inference, and poor prediction accuracy in the presence of heterogeneous WS variability. An application of the developed method to the large Multi-Ethnic Study of Atherosclerosis (MESA) data not only revealed that subject-specific WS variability in systolic blood pressure (SBP) is highly predictive of heart failure and death, but also yielded more accurate dynamic prediction of heart failure or death by accounting for both the mean trajectory and the WS variability of SBP. Our user-friendly R package \textbf{JMH} is publicly available at \url{https://github.com/shanpengli/JMH}.
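In schematic form (notation ours, intended only to illustrate the structure described above), the location-scale submodel and the cause-specific hazards might be written as
\[
Y_{ij} = X_{ij}^{\top}\beta + Z_{ij}^{\top} b_i + e_{ij}, \qquad
e_{ij} \sim N\!\big(0, \sigma_{ij}^2\big), \qquad
\log \sigma_{ij} = W_{ij}^{\top}\tau + V_{ij}^{\top} \omega_i,
\]
\[
\lambda_k(t \mid b_i, \omega_i) = \lambda_{0k}(t)\, \exp\!\big( X_i^{\top}\gamma_k + \alpha_k^{\top} b_i + \nu_k^{\top} \omega_i \big), \qquad k = 1, \dots, K,
\]
where the shared random effects $(b_i, \omega_i)$ link both the mean trajectory and the WS variability of the biomarker to the competing risks.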
Energy system modellers typically choose a low spatial resolution for their models, based on administrative boundaries such as countries, which eases data collection and reduces computation times. However, a low spatial resolution can lead to sub-optimal investment decisions for wind and solar generation. Ignoring power grid bottlenecks within regions tends to underestimate system costs, while combining locations with different wind and solar capacity factors in the same resource class tends to overestimate costs. We investigate these two competing effects in a capacity expansion model for Europe's power system with a high share of renewables, taking advantage of newly available high-resolution datasets as well as computational advances. We vary the number of nodes, interpolating between a 37-node model based on country and synchronous-zone boundaries and a 1024-node model based on the locations of electricity substations. If we focus on the effect of renewable resource resolution and ignore network restrictions, we find that a higher resolution allows the optimal solution to concentrate wind and solar capacity at sites with better capacity factors, reducing system costs by up to 10% compared to a low-resolution model. This results in a substantial shift from offshore to onshore wind investment. However, if we introduce grid bottlenecks by raising the network resolution, costs increase by up to 23%, as generation has to be sourced more locally at sites with worse capacity factors. These effects are most pronounced in scenarios where grid expansion is limited, for example by low local acceptance. We show that allowing grid expansion mitigates some of the effects of low grid resolution and lowers overall costs by around 16%.
Artificial neural networks (ANNs) are typically highly nonlinear systems that are finely tuned via the optimization of their associated, non-convex loss functions. In many cases, the gradient of such a loss function exhibits superlinear growth, making the use of the widely accepted (stochastic) gradient descent methods, which are based on Euler numerical schemes, problematic. We offer a new learning algorithm based on an appropriately constructed variant of the popular stochastic gradient Langevin dynamics (SGLD), which we call the tamed unadjusted stochastic Langevin algorithm (TUSLA). We also provide a nonasymptotic analysis of the new algorithm's convergence properties in the context of non-convex learning problems with ANNs. In particular, we provide finite-time guarantees for TUSLA to find approximate minimizers of both empirical and population risks. The roots of the TUSLA algorithm lie in the taming technology for diffusion processes with superlinear coefficients developed in \citet{tamed-euler, SabanisAoAP} and for MCMC algorithms in \citet{tula}. Numerical experiments are presented which confirm the theoretical findings and illustrate the need for the new algorithm, in comparison to vanilla SGLD, within the framework of ANNs.
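For orientation, a taming step of the kind referred to controls superlinearly growing stochastic gradients by normalizing them before the Langevin update; a hedged sketch of such an update (the exact taming factor and regularization used by TUSLA may differ) is
\[
\theta_{n+1} = \theta_n - \lambda\, \frac{H(\theta_n, X_{n+1})}{1 + \sqrt{\lambda}\,\|\theta_n\|^{2r}} + \sqrt{\tfrac{2\lambda}{\beta}}\; \xi_{n+1},
\]
where $H$ is a stochastic gradient of the (regularized) risk, $\lambda$ is the step size, $\beta$ the inverse temperature, $r$ a growth parameter, and $\xi_{n+1}$ a standard Gaussian vector; the taming factor prevents the update from blowing up when $\|\theta_n\|$ is large.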
This paper aims to provide practitioners of causal mediation analysis with a better understanding of estimation options. We take as inputs two familiar strategies (weighting and model-based prediction) and a simple way of combining them (weighted models), and show how a range of estimators can be generated, with different modeling requirements and robustness properties. The primary goal is to help build an intuitive appreciation for robust estimation that is conducive to sound practice. A second goal is to provide a "menu" of estimators that practitioners can choose from for the estimation of marginal natural (in)direct effects. The estimators generated by this exercise include some that coincide with, or are similar to, existing estimators and others that have not previously appeared in the literature. We note several different ways to estimate the weights for cross-world weighting, based on three expressions of the weighting function, including one that is novel, and show how to check the resulting covariate and mediator balance. We use a bootstrap with random continuous weights to obtain confidence intervals, and also derive general asymptotic variance formulas for the estimators. The estimators are illustrated using data from an adolescent alcohol use prevention study.
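For concreteness, with treatment $A$, mediator $M$, outcome $Y$, and covariates $C$, the marginal natural (in)direct effects targeted by these estimators are typically defined on the mean scale as
\[
\mathrm{NDE} = \mathrm{E}\big[Y(1, M(0))\big] - \mathrm{E}\big[Y(0, M(0))\big],
\qquad
\mathrm{NIE} = \mathrm{E}\big[Y(1, M(1))\big] - \mathrm{E}\big[Y(1, M(0))\big],
\]
where $Y(a, M(a'))$ denotes the potential outcome under treatment $a$ with the mediator set to its potential value under treatment $a'$. One common expression of the cross-world weight is the mediator density ratio $f(M \mid A=0, C)/f(M \mid A=1, C)$ applied to treated units, so that their mediator distribution matches the one that would arise under control; the paper's three expressions of the weighting function are variations on this idea.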
We study the properties of nonparametric least squares regression using deep neural networks. We derive non-asymptotic upper bounds for the prediction error of the empirical risk minimizer of feedforward deep neural regression. Our error bounds achieve the minimax optimal rate and significantly improve over existing ones in the sense that they depend polynomially, rather than exponentially, on the dimension of the predictor. We show that the neural regression estimator can circumvent the curse of dimensionality under the assumption that the predictor is supported on an approximate low-dimensional manifold or a set with low Minkowski dimension. We also establish the optimal convergence rate under the exact manifold support assumption. We investigate how the prediction error of the neural regression estimator depends on the structure of the neural networks and propose a notion of network relative efficiency between two types of neural networks, which provides a quantitative measure for evaluating the relative merits of different network structures. To establish these results, we derive a novel approximation error bound for H\"older smooth functions with a positive smoothness index using ReLU-activated neural networks, which may be of independent interest. Our results are derived under weaker assumptions on the data distribution and the neural network structure than those in the existing literature.
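Concretely (in our notation), the estimator analyzed is the least squares empirical risk minimizer over a class $\mathcal{F}_n$ of feedforward ReLU networks,
\[
\hat{f}_n \in \arg\min_{f \in \mathcal{F}_n} \frac{1}{n} \sum_{i=1}^{n} \big( Y_i - f(X_i) \big)^2 ,
\]
with prediction error measured by the excess risk $\mathrm{E}\big[ ( \hat{f}_n(X) - f_0(X) )^2 \big]$, where $f_0$ is the H\"older smooth regression function.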
The adaptive processing of structured data is a long-standing research topic in machine learning that investigates how to automatically learn a mapping from a structured input to outputs of various nature. Recently, there has been increasing interest in the adaptive processing of graphs, which has led to the development of different neural network-based methodologies. In this thesis, we take a different route and develop a Bayesian Deep Learning framework for graph learning. The dissertation begins with a review of the principles on which most of the methods in the field are built, followed by a study of reproducibility issues in graph classification. We then bridge the basic ideas of deep learning for graphs with the Bayesian world by building our deep architectures in an incremental fashion. This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks. Our approach is also amenable to a Bayesian nonparametric extension that automates the choice of almost all of the model's hyper-parameters. Two real-world applications demonstrate the efficacy of deep learning for graphs. The first concerns the prediction of information-theoretic quantities for molecular simulations with supervised neural models. The second exploits our Bayesian models to solve a malware-classification task while remaining robust to intra-procedural code obfuscation techniques. We conclude the dissertation with an attempt to blend the best of the neural and Bayesian worlds. The resulting hybrid model is able to predict multimodal distributions conditioned on input graphs, with the consequent ability to model stochasticity and uncertainty better than most existing works. Overall, we aim to provide a Bayesian perspective on the rich research field of deep learning for graphs.