Mediation analysis aims to decipher the underlying causal mechanisms between an exposure, an outcome, and intermediate variables called mediators. Initially developed for fixed-time mediator and outcome, it has been extended to the framework of longitudinal data by discretizing the assessment times of mediator and outcome. Yet, processes in play in longitudinal studies are usually defined in continuous time and measured at irregular and subject-specific visits. This is the case in dementia research when cerebral and cognitive changes measured at planned visits in cohorts are of interest. We thus propose a methodology to estimate the causal mechanisms between a time-fixed exposure ($X$), a mediator process ($\mathcal{M}_t$) and an outcome process ($\mathcal{Y}_t$) both measured repeatedly over time in the presence of a time-dependent confounding process ($\mathcal{L}_t$). We consider three types of causal estimands, the natural effects, path-specific effects and randomized interventional analogues to natural effects, and provide identifiability assumptions. We employ a dynamic multivariate model based on differential equations for their estimation. The performance of the methods are explored in simulations, and we illustrate the method in two real-world examples motivated by the 3C cerebral aging study to assess: (1) the effect of educational level on functional dependency through depressive symptomatology and cognitive functioning, and (2) the effect of a genetic factor on cognitive functioning potentially mediated by vascular brain lesions and confounded by neurodegeneration.
This research conducts a thorough reevaluation of seismic fragility curves by utilizing ordinal regression models, moving away from the commonly used log-normal distribution function known for its simplicity. It explores the nuanced differences and interrelations among various ordinal regression approaches, including Cumulative, Sequential, and Adjacent Category models, alongside their enhanced versions that incorporate category-specific effects and variance heterogeneity. The study applies these methodologies to empirical bridge damage data from the 2008 Wenchuan earthquake, using both frequentist and Bayesian inference methods, and conducts model diagnostics using surrogate residuals. The analysis covers eleven models, from basic to those with heteroscedastic extensions and category-specific effects. Through rigorous leave-one-out cross-validation, the Sequential model with category-specific effects emerges as the most effective. The findings underscore a notable divergence in damage probability predictions between this model and conventional Cumulative probit models, advocating for a substantial transition towards more adaptable fragility curve modeling techniques that enhance the precision of seismic risk assessments. In conclusion, this research not only readdresses the challenge of fitting seismic fragility curves but also advances methodological standards and expands the scope of seismic fragility analysis. It advocates for ongoing innovation and critical reevaluation of conventional methods to advance the predictive accuracy and applicability of seismic fragility models within the performance-based earthquake engineering domain.
Learning to Optimize (L2O) stands at the intersection of traditional optimization and machine learning, utilizing the capabilities of machine learning to enhance conventional optimization techniques. As real-world optimization problems frequently share common structures, L2O provides a tool to exploit these structures for better or faster solutions. This tutorial dives deep into L2O techniques, introducing how to accelerate optimization algorithms, promptly estimate the solutions, or even reshape the optimization problem itself, making it more adaptive to real-world applications. By considering the prerequisites for successful applications of L2O and the structure of the optimization problems at hand, this tutorial provides a comprehensive guide for practitioners and researchers alike.
This work proposes a novel variational approximation of partial differential equations on moving geometries determined by explicit boundary representations. The benefits of the proposed formulation are the ability to handle large displacements of explicitly represented domain boundaries without generating body-fitted meshes and remeshing techniques. For the space discretization, we use a background mesh and an unfitted method that relies on integration on cut cells only. We perform this intersection by using clipping algorithms. To deal with the mesh movement, we pullback the equations to a reference configuration (the spatial mesh at the initial time slab times the time interval) that is constant in time. This way, the geometrical intersection algorithm is only required in 3D, another key property of the proposed scheme. At the end of the time slab, we compute the deformed mesh, intersect the deformed boundary with the background mesh, and consider an exact transfer operator between meshes to compute jump terms in the time discontinuous Galerkin integration. The transfer is also computed using geometrical intersection algorithms. We demonstrate the applicability of the method to fluid problems around rotating (2D and 3D) geometries described by oriented boundary meshes. We also provide a set of numerical experiments that show the optimal convergence of the method.
Existing approaches for device placement ignore the topological features of computation graphs and rely mostly on heuristic methods for graph partitioning. At the same time, they either follow a grouper-placer or an encoder-placer architecture, which requires understanding the interaction structure between code operations. To bridge the gap between encoder-placer and grouper-placer techniques, we propose a novel framework for the task of device placement, relying on smaller computation graphs extracted from the OpenVINO toolkit using reinforcement learning. The framework consists of five steps, including graph coarsening, node representation learning and policy optimization. It facilitates end-to-end training and takes into consideration the directed and acyclic nature of the computation graphs. We also propose a model variant, inspired by graph parsing networks and complex network analysis, enabling graph representation learning and personalized graph partitioning jointly, using an unspecified number of groups. To train the entire framework, we utilize reinforcement learning techniques by employing the execution time of the suggested device placements to formulate the reward. We demonstrate the flexibility and effectiveness of our approach through multiple experiments with three benchmark models, namely Inception-V3, ResNet, and BERT. The robustness of the proposed framework is also highlighted through an ablation study. The suggested placements improve the inference speed for the benchmark models by up to $58.2\%$ over CPU execution and by up to $60.24\%$ compared to other commonly used baselines.
We propose an actor-critic algorithm for a family of complex problems arising in algebraic statistics and discrete optimization. The core task is to produce a sample from a finite subset of the non-negative integer lattice defined by a high-dimensional polytope. We translate the problem into a Markov decision process and devise an actor-critic reinforcement learning (RL) algorithm to learn a set of good moves that can be used for sampling. We prove that the actor-critic algorithm converges to an approximately optimal sampling policy. To tackle complexity issues that typically arise in these sampling problems, and to allow the RL to function at scale, our solution strategy takes three steps: decomposing the starting point of the sample, using RL on each induced subproblem, and reconstructing to obtain a sample in the original polytope. In this setup, the proof of convergence applies to each subproblem in the decomposition. We test the method in two regimes. In statistical applications, a high-dimensional polytope arises as the support set for the reference distribution in a model/data fit test for a broad family of statistical models for categorical data. We demonstrate how RL can be used for model fit testing problems for data sets for which traditional MCMC samplers converge too slowly due to problem size and sparsity structure. To test the robustness of the algorithm and explore its generalization properties, we apply it to synthetically generated data of various sizes and sparsity levels.
This paper presents GMASK, a general algorithm for distributed approximate similarity search that accepts any arbitrary distance function. GMASK requires a clustering algorithm that induces Voronoi regions in a dataset and returns a representative element for each region. Then, it creates a multilevel indexing structure suitable for large datasets with high dimensionality and sparsity, usually stored in distributed systems. Many similarity search algorithms rely on $k$-means, typically associated with the Euclidean distance, which is inappropriate for specific problems. Instead, in this work we implement GMASK using $k$-medoids to make it compatible with any distance and a wider range of problems. Experimental results verify the applicability of this method with real datasets, improving the performance of alternative algorithms for approximate similarity search. In addition, results confirm existing intuitions regarding the advantages of using certain instances of the Minkowski distance in high-dimensional datasets.
Characteristic formulae give a complete logical description of the behaviour of processes modulo some chosen notion of behavioural semantics. They allow one to reduce equivalence or preorder checking to model checking, and are exactly the formulae in the modal logics characterizing classic behavioural equivalences and preorders for which model checking can be reduced to equivalence or preorder checking. This paper studies the complexity of determining whether a formula is characteristic for some finite, loop-free process in each of the logics providing modal characterizations of the simulation-based semantics in van Glabbeek's branching-time spectrum. Since characteristic formulae in each of those logics are exactly the consistent and prime ones, it presents complexity results for the satisfiability and primality problems, and investigates the boundary between modal logics for which those problems can be solved in polynomial time and those for which they become computationally hard. Amongst other contributions, this article also studies the complexity of constructing characteristic formulae in the modal logics characterizing simulation-based semantics, both when such formulae are presented in explicit form and via systems of equations.
We introduce a novel exploratory technique, termed biarchetype analysis, which extends archetype analysis to simultaneously identify archetypes of both observations and features. This innovative unsupervised machine learning tool aims to represent observations and features through instances of pure types, or biarchetypes, which are easily interpretable as they embody mixtures of observations and features. Furthermore, the observations and features are expressed as mixtures of the biarchetypes, which makes the structure of the data easier to understand. We propose an algorithm to solve biarchetype analysis. Although clustering is not the primary aim of this technique, biarchetype analysis is demonstrated to offer significant advantages over biclustering methods, particularly in terms of interpretability. This is attributed to biarchetypes being extreme instances, in contrast to the centroids produced by biclustering, which inherently enhances human comprehension. The application of biarchetype analysis across various machine learning challenges underscores its value, and both the source code and examples are readily accessible in R and Python at //github.com/aleixalcacer/JA-BIAA.
Multivariate probabilistic verification is concerned with the evaluation of joint probability distributions of vector quantities such as a weather variable at multiple locations or a wind vector for instance. The logarithmic score is a proper score that is useful in this context. In order to apply this score to ensemble forecasts, a choice for the density is required. Here, we are interested in the specific case when the density is multivariate normal with mean and covariance given by the ensemble mean and ensemble covariance, respectively. Under the assumptions of multivariate normality and exchangeability of the ensemble members, a relationship is derived which describes how the logarithmic score depends on ensemble size. It permits to estimate the score in the limit of infinite ensemble size from a small ensemble and thus produces a fair logarithmic score for multivariate ensemble forecasts under the assumption of normality. This generalises a study from 2018 which derived the ensemble size adjustment of the logarithmic score in the univariate case. An application to medium-range forecasts examines the usefulness of the ensemble size adjustments when multivariate normality is only an approximation. Predictions of vectors consisting of several different combinations of upper air variables are considered. Logarithmic scores are calculated for these vectors using ECMWF's daily extended-range forecasts which consist of a 100-member ensemble. The probabilistic forecasts of these vectors are verified against operational ECMWF analyses in the Northern mid-latitudes in autumn 2023. Scores are computed for ensemble sizes from 8 to 100. The fair logarithmic scores of ensembles with different cardinalities are very close, in contrast to the unadjusted scores which decrease considerably with ensemble size. This provides evidence for the practical usefulness of the derived relationships.
The Fisher-Rao distance between two probability distributions of a statistical model is defined as the Riemannian geodesic distance induced by the Fisher information metric. In order to calculate the Fisher-Rao distance in closed-form, we need (1) to elicit a formula for the Fisher-Rao geodesics, and (2) to integrate the Fisher length element along those geodesics. We consider several numerically robust approximation and bounding techniques for the Fisher-Rao distances: First, we report generic upper bounds on Fisher-Rao distances based on closed-form 1D Fisher-Rao distances of submodels. Second, we describe several generic approximation schemes depending on whether the Fisher-Rao geodesics or pregeodesics are available in closed-form or not. In particular, we obtain a generic method to guarantee an arbitrarily small additive error on the approximation provided that Fisher-Rao pregeodesics and tight lower and upper bounds are available. Third, we consider the case of Fisher metrics being Hessian metrics, and report generic tight upper bounds on the Fisher-Rao distances using techniques of information geometry. Uniparametric and biparametric statistical models always have Fisher Hessian metrics, and in general a simple test allows to check whether the Fisher information matrix yields a Hessian metric or not. Fourth, we consider elliptical distribution families and show how to apply the above techniques to these models. We also propose two new distances based either on the Fisher-Rao lengths of curves serving as proxies of Fisher-Rao geodesics, or based on the Birkhoff/Hilbert projective cone distance. Last, we consider an alternative group-theoretic approach for statistical transformation models based on the notion of maximal invariant which yields insights on the structures of the Fisher-Rao distance formula which may be used fruitfully in applications.