The hazard function represents one of the main quantities of interest in the analysis of survival data. We propose a general approach for parametrically modelling the dynamics of the hazard function using systems of autonomous ordinary differential equations (ODEs). This modelling approach can be used to provide qualitative and quantitative analyses of the evolution of the hazard function over time. Our proposal capitalises on the extensive literature of ODEs which, in particular, allow for establishing basic rules or laws on the dynamics of the hazard function via the use of autonomous ODEs. We show how to implement the proposed modelling framework in cases where there is an analytic solution to the system of ODEs or where an ODE solver is required to obtain a numerical solution. We focus on the use of a Bayesian modelling approach, but the proposed methodology can also be coupled with maximum likelihood estimation. A simulation study is presented to illustrate the performance of these models and the interplay of sample size and censoring. Two case studies using real data are presented to illustrate the use of the proposed approach and to highlight the interpretability of the corresponding models. We conclude with a discussion on potential extensions of our work and strategies to include covariates into our framework. Although we focus on examples on Medical Statistics, the proposed framework is applicable in any context where the interest lies on estimating and interpreting the dynamics hazard function.
Predicting quantum operator matrices such as Hamiltonian, overlap, and density matrices in the density functional theory (DFT) framework is crucial for understanding material properties. Current methods often focus on individual operators and struggle with efficiency and scalability for large systems. Here we introduce a novel deep learning model, SLEM (Strictly Localized Equivariant Message-passing) for predicting multiple quantum operators, that achieves state-of-the-art accuracy while dramatically improving computational efficiency. SLEM's key innovation is its strict locality-based design, constructing local, equivariant representations for quantum tensors while preserving physical symmetries. This enables complex many-body dependence without expanding the effective receptive field, leading to superior data efficiency and transferability. Using an innovative SO(2) convolution technique, SLEM reduces the computational complexity of high-order tensor products and is therefore capable of handling systems requiring the $f$ and $g$ orbitals in their basis sets. We demonstrate SLEM's capabilities across diverse 2D and 3D materials, achieving high accuracy even with limited training data. SLEM's design facilitates efficient parallelization, potentially extending DFT simulations to systems with device-level sizes, opening new possibilities for large-scale quantum simulations and high-throughput materials discovery.
A non-intrusive model order reduction method for bilinear stochastic differential equations with additive noise is proposed. A reduced order model (ROM) is designed in order to approximate the statistical properties of high-dimensional systems. The drift and diffusion coefficients of the ROM are inferred from state observations by solving appropriate least-squares problems. The closeness of the ROM obtained by the presented approach to the intrusive ROM obtained by the proper orthogonal decomposition (POD) method is investigated. Two generalisations of the snapshot-based dominant subspace construction to the stochastic case are presented. Numerical experiments are provided to compare the developed approach to POD.
We present a new method for constructing valid covariance functions of Gaussian processes for spatial analysis in irregular, non-convex domains such as bodies of water. Standard covariance functions based on geodesic distances are not guaranteed to be positive definite on such domains, while existing non-Euclidean approaches fail to respect the partially Euclidean nature of these domains where the geodesic distance agrees with the Euclidean distances for some pairs of points. Using a visibility graph on the domain, we propose a class of covariance functions that preserve Euclidean-based covariances between points that are connected in the domain while incorporating the non-convex geometry of the domain via conditional independence relationships. We show that the proposed method preserves the partially Euclidean nature of the intrinsic geometry on the domain while maintaining validity (positive definiteness) and marginal stationarity of the covariance function over the entire parameter space, properties which are not always fulfilled by existing approaches to construct covariance functions on non-convex domains. We provide useful approximations to improve computational efficiency, resulting in a scalable algorithm. We compare the performance of our method with those of competing state-of-the-art methods using simulation studies on synthetic non-convex domains. The method is applied to data regarding acidity levels in the Chesapeake Bay, showing its potential for ecological monitoring in real-world spatial applications on irregular domains.
We present a new algorithm for solving linear-quadratic regulator (LQR) problems with linear equality constraints. This is the first such exact algorithm that is guaranteed to have a runtime that is linear in the number of stages, as well as linear in the number of both state-only constraints as well as mixed state-and-control constraints, without imposing any restrictions on the problem instances. We also show how to easily parallelize this algorithm to run in parallel runtime logarithmic in the number of stages of the problem.
We investigate the set of invariant idempotent probabilities for countable idempotent iterated function systems (IFS) defined in compact metric spaces. We demonstrate that, with constant weights, there exists a unique invariant idempotent probability. Utilizing Secelean's approach to countable IFSs, we introduce partially finite idempotent IFSs and prove that the sequence of invariant idempotent measures for these systems converges to the invariant measure of the original countable IFS. We then apply these results to approximate such measures with discrete systems, producing, in the one-dimensional case, data series whose Higuchi fractal dimension can be calculated. Finally, we provide numerical approximations for two-dimensional cases and discuss the application of generalized Higuchi dimensions in these scenarios.
Predictive modeling involving simulation and sensor data at the same time, is a growing challenge in computational science. Even with large-scale finite element models, a mismatch to the sensor data often remains, which can be attributed to different sources of uncertainty. For such a scenario, the statistical finite element method (statFEM) can be used to condition a simulated field on given sensor data. This yields a posterior solution which resembles the data much better and additionally provides consistent estimates of uncertainty, including model misspecification. For frequency or parameter dependent problems, occurring, e.g. in acoustics or electromagnetism, solving the full order model at the frequency grid and conditioning it on data quickly results in a prohibitive computational cost. In this case, the introduction of a surrogate in form of a reduced order model yields much smaller systems of equations. In this paper, we propose a reduced order statFEM framework relying on Krylov-based moment matching. We introduce a data model which explicitly includes the bias induced by the reduced approximation, which is estimated by an inexpensive error indicator. The results of the new statistical reduced order method are compared to the standard statFEM procedure applied to a ROM prior, i.e. without explicitly accounting for the reduced order bias. The proposed method yields better accuracy and faster convergence throughout a given frequency range for different numerical examples.
Reduced basis methods for approximating the solutions of parameter-dependant partial differential equations (PDEs) are based on learning the structure of the set of solutions - seen as a manifold ${\mathcal S}$ in some functional space - when the parameters vary. This involves investigating the manifold and, in particular, understanding whether it is close to a low-dimensional affine space. This leads to the notion of Kolmogorov $N$-width that consists of evaluating to which extent the best choice of a vectorial space of dimension $N$ approximates ${\mathcal S}$ well enough. If a good approximation of elements in ${\mathcal S}$ can be done with some well-chosen vectorial space of dimension $N$ -- provided $N$ is not too large -- then a ``reduced'' basis can be proposed that leads to a Galerkin type method for the approximation of any element in ${\mathcal S}$. In many cases, however, the Kolmogorov $N$-width is not so small, even if the parameter set lies in a space of small dimension yielding a manifold with small dimension. In terms of complexity reduction, this gap between the small dimension of the manifold and the large Kolmogorov $N$-width can be explained by the fact that the Kolmogorov $N$-width is linear while, in contrast, the dependency in the parameter is, most often, non-linear. There have been many contributions aiming at reconciling these two statements, either based on deterministic or AI approaches. We investigate here further a new paradigm that, in some sense, merges these two aspects: the nonlinear compressive reduced basisapproximation. We focus on a simple multiparameter problem and illustrate rigorously that the complexity associated with the approximation of the solution to the parameter dependant PDE is directly related to the number of parameters rather than the Kolmogorov $N$-width.
In this work, we use the monolithic convex limiting (MCL) methodology to enforce relevant inequality constraints in implicit finite element discretizations of the compressible Euler equations. In this context, preservation of invariant domains follows from positivity preservation for intermediate states of the density and internal energy. To avoid spurious oscillations, we additionally impose local maximum principles on intermediate states of the density, velocity components, and specific total energy. For the backward Euler time stepping, we show the invariant domain preserving (IDP) property of the fully discrete MCL scheme by constructing a fixed-point iteration that is IDP and converges under a strong time step restriction. Our iterative solver for the nonlinear discrete problem employs a more efficient fixed-point iteration. The matrix of the associated linear system is a robust low-order Jacobian approximation that exploits the homogeneity property of the flux function. The limited antidiffusive terms are treated explicitly. We use positivity preservation as a stopping criterion for nonlinear iterations. The first iteration yields the solution of a linearized semi-implicit problem. This solution possesses the discrete conservation property but is generally not IDP. Further iterations are performed if any non-IDP states are detected. The existence of an IDP limit is guaranteed by our analysis. To facilitate convergence to steady-state solutions, we perform adaptive explicit underrelaxation at the end of each time step. The calculation of appropriate relaxation factors is based on an approximate minimization of nodal entropy residuals. The performance of proposed algorithms and alternative solution strategies is illustrated by the convergence history for standard two-dimensional test problems.
This work explores multi-modal inference in a high-dimensional simplified model, analytically quantifying the performance gain of multi-modal inference over that of analyzing modalities in isolation. We present the Bayes-optimal performance and weak recovery thresholds in a model where the objective is to recover the latent structures from two noisy data matrices with correlated spikes. The paper derives the approximate message passing (AMP) algorithm for this model and characterizes its performance in the high-dimensional limit via the associated state evolution. The analysis holds for a broad range of priors and noise channels, which can differ across modalities. The linearization of AMP is compared numerically to the widely used partial least squares (PLS) and canonical correlation analysis (CCA) methods, which are both observed to suffer from a sub-optimal recovery threshold.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.