We introduce a causal regularisation extension to anchor regression (AR) for improved out-of-distribution (OOD) generalisation. We present anchor-compatible losses that align with the anchor framework to ensure robustness against distribution shifts. Various multivariate analysis (MVA) algorithms, such as (Orthonormalized) PLS, RRR, and MLR, fall within the anchor framework. We observe that simple regularisation enhances robustness in OOD settings. Estimators for selected algorithms are provided, showcasing consistency and efficacy in synthetic and real-world climate science problems. The empirical validation highlights the versatility of anchor regularisation, emphasising its compatibility with MVA approaches and its role in enhancing replicability while guarding against distribution shifts. The extended AR framework advances causal inference methodologies, addressing the need for reliable OOD generalisation.
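For context, here is a minimal sketch of plain anchor regression, the base estimator that the extension above builds on: the data are transformed with the projection onto the anchor variables and ordinary least squares is run on the transformed data. The MVA extensions (PLS, RRR, MLR) and anchor-compatible losses described in the abstract are not implemented here; variable names and the toy data-generating process are illustrative.

```python
import numpy as np

def anchor_regression(X, Y, A, gamma):
    """Anchor regression: OLS on anchor-transformed data.
    gamma = 1 recovers ordinary least squares; larger gamma protects against
    distribution shifts generated through the anchor variables."""
    P = A @ np.linalg.pinv(A)                       # projection onto the anchor space
    W = np.eye(len(Y)) + (np.sqrt(gamma) - 1.0) * P
    Xt, Yt = W @ X, W @ Y                           # transformed covariates and response
    return np.linalg.lstsq(Xt, Yt, rcond=None)[0]

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, 1))                     # anchor (exogenous) variable
H = rng.standard_normal(n)                          # hidden confounder
X = A[:, 0] + H + rng.standard_normal(n)
Y = 2.0 * X + H + 0.5 * A[:, 0] + rng.standard_normal(n)
print(anchor_regression(X[:, None], Y, A, gamma=10.0))
```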
Functional data analysis almost always involves smoothing discrete observations into curves, because they are never observed in continuous time and rarely without error. Although smoothing parameters affect the subsequent inference, data-driven methods for selecting these parameters are not well developed, frustrated by the difficulty of using all the information shared by the curves while remaining computationally efficient. On the one hand, smoothing individual curves in an isolated, albeit sophisticated, way ignores useful signals present in other curves. On the other hand, bandwidth selection by automatic procedures such as cross-validation after pooling all the curves together quickly becomes computationally infeasible due to the large number of data points. In this paper we propose a new data-driven, adaptive kernel smoothing approach, specifically tailored to functional principal components analysis, through the derivation of sharp, explicit risk bounds for the eigen-elements. The minimization of these quadratic risk bounds provides refined, yet computationally efficient, bandwidth rules for each eigen-element separately. Both common and independent design cases are allowed. Rates of convergence for the estimators are derived. An extensive simulation study, designed in a versatile manner to closely mimic the characteristics of real data sets, supports our methodological contribution. An illustration on a real data application is provided.
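A minimal sketch of the generic smooth-then-decompose pipeline this abstract refers to: each discretely observed curve is smoothed onto a common grid with a kernel estimator, and functional principal components are taken from the empirical covariance. The paper's actual contribution, the eigen-element-specific bandwidth rules derived from the risk bounds, is not reproduced; the fixed bandwidth h below is a placeholder.

```python
import numpy as np

def nw_smooth(t_obs, y_obs, grid, h):
    """Gaussian-kernel Nadaraya-Watson estimate of one curve on a common grid."""
    w = np.exp(-0.5 * ((grid[:, None] - t_obs[None, :]) / h) ** 2)
    return (w @ y_obs) / w.sum(axis=1)

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 101)
curves = []
for _ in range(50):                                  # independent design, noisy observations
    t = np.sort(rng.uniform(0, 1, 30))
    y = np.sin(2 * np.pi * t) * rng.normal(1, 0.3) + 0.1 * rng.standard_normal(30)
    curves.append(nw_smooth(t, y, grid, h=0.08))     # h would be chosen per eigen-element

X = np.array(curves)
C = np.cov(X, rowvar=False)                          # empirical covariance on the grid
eigvals, eigvecs = np.linalg.eigh(C)
phi1 = eigvecs[:, -1]                                # leading estimated eigenfunction
```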
We develop a theory for the representation of opaque solids as volumes. Starting from a stochastic representation of opaque solids as random indicator functions, we prove the conditions under which such solids can be modeled using exponential volumetric transport. We also derive expressions for the volumetric attenuation coefficient as a functional of the probability distributions of the underlying indicator functions. We generalize our theory to account for isotropic and anisotropic scattering at different parts of the solid, and for representations of opaque solids as stochastic implicit surfaces. We derive our volumetric representation from first principles, which ensures that it satisfies physical constraints such as reciprocity and reversibility. We use our theory to explain, compare, and correct previous volumetric representations, as well as propose meaningful extensions that lead to improved performance in 3D reconstruction tasks.
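For reference, the exponential volumetric transport model referred to above expresses the transmittance along a ray with origin $x$ and direction $\omega$ in the standard form
\[
T(x,\omega,t) \;=\; \exp\!\left(-\int_0^{t} \sigma\bigl(x + s\,\omega\bigr)\,\mathrm{d}s\right),
\]
where $\sigma$ is the volumetric attenuation coefficient; the paper's contribution includes characterizing when an opaque solid admits this representation and expressing $\sigma$ as a functional of the distribution of the underlying indicator functions.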
One of the key elements of probabilistic seismic risk assessment studies is the fragility curve, which represents the conditional probability of failure of a mechanical structure for a given scalar measure derived from seismic ground motion. For many structures of interest, estimating these curves is a daunting task because of the limited amount of data available; data which is only binary in our framework, i.e., only describing the structure as being in a failure or non-failure state. A large number of methods described in the literature tackle this challenging framework through parametric log-normal models. Bayesian approaches, on the other hand, allow model parameters to be learned more efficiently. However, the impact of the choice of the prior distribution on the posterior distribution cannot be readily neglected and, consequently, neither can its impact on any resulting estimation. This paper proposes a comprehensive study of this parametric Bayesian estimation problem for limited and binary data. Using the reference prior theory as a cornerstone, this study develops an objective approach to choosing the prior. This approach leads to the Jeffreys prior, which is derived for this problem for the first time. The posterior distribution is proven to be proper with the Jeffreys prior but improper with some traditional priors found in the literature. With the Jeffreys prior, the posterior distribution is also shown to vanish at the boundaries of the parameters' domain, which means that sampling the posterior distribution of the parameters does not result in anomalously small or large values. Therefore, the use of the Jeffreys prior does not result in degenerate fragility curves such as unit-step functions, and leads to more robust credibility intervals. The numerical results obtained from different case studies, including an industrial example, illustrate the theoretical predictions.
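A minimal sketch of the parametric log-normal fragility model mentioned above, fitted to binary failure data by maximum likelihood: the failure probability at intensity measure $a$ is $\Phi(\ln(a/\alpha)/\beta)$. This illustrates the data and model setting only; the paper's Bayesian procedure with the Jeffreys prior is not implemented here, and the toy data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_likelihood(params, im, failed):
    """Bernoulli negative log-likelihood for the log-normal fragility curve."""
    alpha, beta = params
    p = norm.cdf(np.log(im / alpha) / beta)        # P(failure | intensity measure)
    p = np.clip(p, 1e-12, 1 - 1e-12)               # numerical safety
    return -np.sum(failed * np.log(p) + (1 - failed) * np.log(1 - p))

# toy data: ground-motion intensity measures and binary failure indicators
rng = np.random.default_rng(0)
im = rng.lognormal(mean=0.0, sigma=0.5, size=50)
failed = (rng.random(50) < norm.cdf(np.log(im / 1.0) / 0.4)).astype(float)

mle = minimize(neg_log_likelihood, x0=[1.0, 0.5], args=(im, failed),
               bounds=[(1e-3, None), (1e-3, None)])
print(mle.x)                                       # estimated (alpha, beta)
```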
Perturbation analysis has emerged as a significant concern across multiple disciplines, with notable advancements being achieved, particularly in the realm of matrices. This study centers on specific aspects of tensor T-eigenvalues within the context of tensor-tensor multiplication. Initially, an analytical perturbation analysis is introduced to explore the sensitivity of T-eigenvalues. In the case of third-order tensors featuring square frontal slices, we extend the classical Gershgorin disc theorem and show that all T-eigenvalues are located inside a union of Gershgorin discs. Additionally, we extend the Bauer-Fike theorem to encompass F-diagonalizable tensors and present two modified versions applicable to more general scenarios. The tensor case of the Kahan theorem, which accounts for general perturbations of Hermitian tensors, is also investigated. Furthermore, we propose the concept of pseudospectra for third-order tensors based on tensor-tensor multiplication. We develop four definitions that are equivalent under the spectral norm to characterize tensor $\varepsilon$-pseudospectra, and we present several pseudospectral properties. Several numerical examples are provided to visualize the $\varepsilon$-pseudospectra of specific tensors at different levels.
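A minimal sketch, assuming the usual t-product construction: the T-eigenvalues of a third-order tensor with square frontal slices can be obtained from the ordinary eigenvalues of its frontal slices after a discrete Fourier transform along the third mode. This only illustrates the objects that the Gershgorin-type and Bauer-Fike-type results above concern; it is not the paper's perturbation analysis.

```python
import numpy as np

def t_eigenvalues(A):
    """A: array of shape (n, n, p). Returns the eigenvalues of each
    Fourier-domain frontal slice, i.e., the T-eigenvalues under the t-product."""
    A_hat = np.fft.fft(A, axis=2)                  # DFT along the third mode
    return np.stack([np.linalg.eigvals(A_hat[:, :, k])
                     for k in range(A.shape[2])])

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4, 3))
print(t_eigenvalues(A))
```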
We exploit the similarities between Tikhonov regularization and Bayesian hierarchical models to propose a regularization scheme that acts like a distributed Tikhonov regularization in which the amount of regularization varies from component to component. In the standard formulation, Tikhonov regularization compensates for the inherent ill-conditioning of linear inverse problems by augmenting the data fidelity term, which measures the mismatch between the data and the model output, with a scaled penalty functional. The selection of the scaling is the core problem in Tikhonov regularization. If an estimate of the amount of noise in the data is available, a popular choice is the Morozov discrepancy principle, which states that the scaling parameter should be chosen so that the norm of the data fitting error is approximately equal to the norm of the noise in the data. Too small a value of the regularization parameter yields a solution that fits the noise, while too large a value leads to excessive penalization of the solution. In many applications, it would be preferable to apply distributed regularization, replacing the regularization scalar by a vector-valued parameter, allowing different regularization for different components of the unknown, or for groups of them. A distributed Tikhonov-inspired regularization is particularly well suited when the data have significantly different sensitivity to different components, or to promote sparsity of the solution. The numerical scheme that we propose, while exploiting the Bayesian interpretation of the inverse problem and identifying Tikhonov regularization with maximum a posteriori (MAP) estimation, requires no statistical tools. A combination of numerical linear algebra and optimization tools makes the scheme computationally efficient and suitable for problems where the matrix is not explicitly available.
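A minimal sketch contrasting scalar and componentwise ("distributed") Tikhonov regularization for a linear inverse problem $Ax = b$: the single regularization parameter is replaced by a vector of per-component parameters. The vector of parameters below is illustrative; the paper's Bayesian-inspired scheme for choosing it is not reproduced here.

```python
import numpy as np

def tikhonov(A, b, lam):
    """Solve min_x ||A x - b||^2 + sum_j lam_j * x_j^2.
    lam may be a scalar or a vector of per-component regularization parameters."""
    n = A.shape[1]
    lam_vec = np.broadcast_to(np.asarray(lam, dtype=float), (n,))
    # normal equations: (A^T A + diag(lam)) x = A^T b
    return np.linalg.solve(A.T @ A + np.diag(lam_vec), A.T @ b)

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10); x_true[[2, 7]] = 1.0           # sparse ground truth
b = A @ x_true + 0.05 * rng.standard_normal(30)

x_scalar = tikhonov(A, b, 1.0)                        # one global parameter
x_dist = tikhonov(A, b, np.linspace(0.1, 5.0, 10))    # componentwise parameters
```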
Bayesian factor analysis is routinely used for dimensionality reduction in modeling high-dimensional covariance matrices. Factor analytic decompositions express the covariance as the sum of a low-rank matrix and a diagonal matrix. In practice, Gibbs sampling algorithms are typically used for posterior computation, alternating between updating the latent factors, loadings, and residual variances. In this article, we exploit a blessing of dimensionality to develop a provably accurate pseudo-posterior for the covariance matrix that bypasses the need for Gibbs or other variants of Markov chain Monte Carlo sampling. Our proposed Factor Analysis with BLEssing of dimensionality (FABLE) approach relies on a first-stage singular value decomposition (SVD) to estimate the latent factors, and then defines a jointly conjugate prior for the loadings and residual variances. The accuracy of the resulting pseudo-posterior for the covariance improves with increasing dimensionality. We show, both theoretically and through simulation experiments, that FABLE has excellent performance in high-dimensional covariance matrix estimation, including producing well-calibrated credible intervals. We also demonstrate the strength of our approach in terms of accurate inference and computational efficiency by applying it to a gene expression data set.
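A minimal sketch of the first-stage idea described above: estimate latent factors from an SVD of the data matrix and form a plug-in covariance estimate of the form (loadings)(loadings)$^\top$ + diagonal. This only illustrates the SVD-plus-factor-model ingredient; the FABLE pseudo-posterior itself (the jointly conjugate prior on loadings and residual variances) is not implemented, and the function name and toy data are ours.

```python
import numpy as np

def svd_factor_covariance(Y, k):
    """Y: (n, p) data matrix with n samples, p variables; k: number of factors."""
    Yc = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    F = np.sqrt(Y.shape[0]) * U[:, :k]                   # estimated latent factors
    Lambda = np.linalg.lstsq(F, Yc, rcond=None)[0].T     # estimated loadings, shape (p, k)
    resid = Yc - F @ Lambda.T
    sigma2 = resid.var(axis=0)                           # residual variances
    return Lambda @ Lambda.T + np.diag(sigma2)

rng = np.random.default_rng(0)
Y = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))   # low-rank signal
Sigma_hat = svd_factor_covariance(Y + 0.1 * rng.standard_normal((200, 100)), k=5)
```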
Signal cancellation provides a radically new and efficient approach to exploratory factor analysis, without matrix decomposition or presetting the required number of factors. Its current implementation requires that each factor has at least two unique indicators. Its principle is that it is always possible to combine two indicator variables exclusive to the same factor with weights that cancel their common factor information. Successful combinations, consisting of noise only, are recognized by their null correlations with all remaining variables. The optimal combinations of multifactorial indicators, though, typically retain correlations with some other variables. Their signal, however, can be cancelled through combinations with unifactorial indicators of their contributing factors. The loadings are estimated from the relative signal cancellation weights of the variables involved, along with their observed correlations. The factor correlations are obtained from those of their unifactorial indicators, corrected by their factor loadings. The method is illustrated with synthetic data from a complex six-factor structure that even includes two doublet factors. Another example, using actual data, documents that signal cancellation can rival confirmatory factor analysis.
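A minimal synthetic illustration of the cancellation principle stated above: for two indicators $x_1 = \lambda_1 f + e_1$ and $x_2 = \lambda_2 f + e_2$ of the same factor, the combination $\lambda_2 x_1 - \lambda_1 x_2$ removes $f$, so in population it is uncorrelated with any other variable loading on $f$. The weight here is found by a crude grid search on correlations with a third indicator; this is only a demonstration of the principle, not the paper's estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
f = rng.standard_normal(n)                        # common factor
x1 = 0.8 * f + 0.6 * rng.standard_normal(n)       # unique indicator 1
x2 = 0.5 * f + 0.8 * rng.standard_normal(n)       # unique indicator 2
x3 = 0.7 * f + 0.7 * rng.standard_normal(n)       # a third variable on the same factor

# search for the weight w such that x1 + w*x2 is (nearly) uncorrelated with x3
ws = np.linspace(-5, 5, 2001)
corrs = [abs(np.corrcoef(x1 + w * x2, x3)[0, 1]) for w in ws]
w_star = ws[int(np.argmin(corrs))]
print(w_star)     # close to -0.8/0.5 = -1.6, i.e., minus the ratio of the loadings
```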
Bilevel optimization, with broad applications in machine learning, has an intricate hierarchical structure. Gradient-based methods have emerged as a common approach to large-scale bilevel problems. However, the computation of the hyper-gradient, which involves a Hessian-inverse vector product, limits their efficiency and is regarded as a bottleneck. To circumvent the inverse, we construct a sequence of low-dimensional approximate Krylov subspaces with the aid of the Lanczos process. As a result, the constructed subspace is able to dynamically and incrementally approximate the Hessian-inverse vector product with less effort, and thus leads to a favorable estimate of the hyper-gradient. Moreover, we propose a provable subspace-based framework for bilevel problems in which one central step is to solve a small tridiagonal linear system. To the best of our knowledge, this is the first time that subspace techniques have been incorporated into bilevel optimization. This successful trial not only enjoys an $\mathcal{O}(\epsilon^{-1})$ convergence rate but also demonstrates efficiency on a synthetic problem and two deep learning tasks.
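A minimal sketch of the Krylov-subspace ingredient referenced above: approximate a Hessian-inverse vector product $H^{-1}v$ using only Hessian-vector products by running a few steps of the Lanczos process and solving the resulting small tridiagonal system. This illustrates the generic technique under the assumption of a positive-definite Hessian; it is not the paper's full bilevel algorithm, and the function names are ours.

```python
import numpy as np

def lanczos_solve(hvp, v, m=30):
    """Approximate H^{-1} v given a Hessian-vector-product callable hvp(x)."""
    n = v.shape[0]
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m)
    q, q_prev = v / np.linalg.norm(v), np.zeros(n)
    for j in range(m):
        Q[:, j] = q
        w = hvp(q)
        alpha[j] = q @ w
        w = w - alpha[j] * q - (beta[j - 1] * q_prev if j > 0 else 0.0)
        beta[j] = np.linalg.norm(w)
        q_prev, q = q, w / (beta[j] + 1e-12)
    # small tridiagonal system built from the Lanczos coefficients
    T = np.diag(alpha) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)
    e1 = np.zeros(m); e1[0] = np.linalg.norm(v)
    y = np.linalg.solve(T, e1)          # m-by-m tridiagonal solve
    return Q @ y                        # approximate H^{-1} v

# toy check against an explicit positive-definite Hessian
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50)); H = M @ M.T + 50 * np.eye(50)
v = rng.standard_normal(50)
print(np.linalg.norm(lanczos_solve(lambda x: H @ x, v) - np.linalg.solve(H, v)))
```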
There has been increasing interest in summary-free versions of approximate Bayesian computation (ABC), which replace distances among summaries with discrepancies between the empirical distributions of the observed data and the synthetic samples generated under the proposed parameter values. The success of these solutions has motivated theoretical studies on the limiting properties of the induced posteriors. However, current results (i) are often tailored to a specific discrepancy, (ii) require, either explicitly or implicitly, regularity conditions on the data generating process and the assumed statistical model, and (iii) yield bounds depending on sequences of control functions that are not made explicit. As such, there is a lack of a theoretical framework that (i) is unified, (ii) facilitates the derivation of limiting properties that hold uniformly, and (iii) relies on verifiable assumptions that provide concentration bounds clarifying which factors govern the limiting behavior of the ABC posterior. We address this gap via a novel theoretical framework that introduces the concept of Rademacher complexity in the analysis of the limiting properties of discrepancy-based ABC posteriors. This yields a unified theory that relies on constructive arguments and provides more informative asymptotic results and uniform concentration bounds, even in settings not covered by current studies. These advancements are obtained by relating the properties of summary-free ABC posteriors to the behavior of the Rademacher complexity associated with the chosen discrepancy within the family of integral probability semimetrics. This family extends summary-based ABC, and includes the Wasserstein distance and maximum mean discrepancy (MMD), among others. As clarified through a focus on the MMD case and via illustrative simulations, this perspective yields an improved understanding of summary-free ABC.
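A minimal sketch of summary-free rejection ABC with the maximum mean discrepancy (MMD), the discrepancy highlighted above: propose parameters from the prior, simulate synthetic data, and keep proposals whose MMD to the observed data falls below a tolerance. The Gaussian toy model, RBF kernel, bandwidth, and tolerance are illustrative choices only.

```python
import numpy as np

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD with an RBF kernel, for 1-d samples."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
y_obs = rng.normal(loc=2.0, scale=1.0, size=200)      # observed data (true mean 2)

accepted = []
for _ in range(5000):
    theta = rng.normal(0.0, 5.0)                      # draw from the prior
    y_sim = rng.normal(theta, 1.0, size=200)          # simulate under theta
    if mmd2(y_obs, y_sim) < 0.05:                     # tolerance epsilon
        accepted.append(theta)

print(np.mean(accepted), len(accepted))               # ABC posterior mean, sample size
```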
Causal inference with observational studies often suffers from unmeasured confounding, which biases estimators that rely on the unconfoundedness assumption. Sensitivity analysis assesses how the causal conclusions change with respect to different degrees of unmeasured confounding. Most existing sensitivity analysis methods work well for specific types of statistical estimation or testing strategies. We propose a flexible sensitivity analysis framework that can deal with commonly used inverse probability weighting, outcome regression, and doubly robust estimators simultaneously. It is based on the well-known parametrization of the selection bias as comparisons of the observed and counterfactual outcomes conditional on observed covariates. It is attractive for practical use because it only requires simple modifications of the standard estimators. Moreover, it naturally extends to many other causal inference settings, including the causal risk ratio or odds ratio, the average causal effect on the treated units, and studies with survival outcomes. We also develop an R package, saci, to implement our sensitivity analysis estimators.
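One common way to write such a selection-bias parametrization (notation is ours for illustration and may differ from the paper's) uses confounding functions that compare counterfactual outcome means across treatment groups given covariates:
\[
\varepsilon_1(x) = E\{Y(1)\mid Z=1, X=x\} - E\{Y(1)\mid Z=0, X=x\}, \qquad
\varepsilon_0(x) = E\{Y(0)\mid Z=1, X=x\} - E\{Y(0)\mid Z=0, X=x\},
\]
where $Z$ is the treatment indicator and $Y(z)$ the potential outcome under treatment level $z$. Setting $\varepsilon_1 \equiv \varepsilon_0 \equiv 0$ recovers unconfoundedness, while nonzero values index the degree of unmeasured confounding fed into the modified estimators.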