Multivariate imputation by chained equations (MICE) is one of the most popular approaches to address missing values in a data set. This approach requires specifying a univariate imputation model for every variable under imputation. The specification of which predictors should be included in these univariate imputation models can be a daunting task. Principal component analysis (PCA) can simplify this process by replacing all of the potential imputation model predictors with a few components summarizing their variance. In this article, we extend the use of PCA with MICE to include a supervised aspect whereby information from the variables under imputation is incorporated into the principal component estimation. We conducted an extensive simulation study to assess the statistical properties of MICE with different versions of supervised dimensionality reduction and we compared them with the use of classical unsupervised PCA as a simpler dimensionality reduction technique.
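To make the predictor-reduction idea concrete, here is a minimal Python sketch of a single imputation step for one incomplete variable. It assumes the remaining predictors are fully observed. The unsupervised version regresses the variable on the first k principal component scores of the predictors. The optional `min_abs_corr` screen keeps only predictors correlated with the observed values before running PCA. That screen is one simple supervised variant in the spirit of supervised PCA; it is an assumption here, not necessarily the article's exact procedure. The function names, the threshold, and the simulated data are all hypothetical. A full MICE run would cycle such steps over every incomplete variable and would also draw the regression coefficients from their posterior rather than fixing them at least-squares values.

```python
import numpy as np


def pca_scores(X, k):
    """Return the first k principal component scores of the columns of X."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the PCA loadings.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


def impute_with_pcs(y, X, k, rng, min_abs_corr=None):
    """One stochastic-regression imputation of y from k PCs of X (hypothetical helper).

    If min_abs_corr is given, only predictors whose absolute correlation with
    the observed y reaches that threshold enter the PCA (a supervised screen).
    """
    obs = ~np.isnan(y)
    if min_abs_corr is not None:
        corr = np.array([abs(np.corrcoef(X[obs, j], y[obs])[0, 1])
                         for j in range(X.shape[1])])
        X = X[:, corr >= min_abs_corr]
    Z = pca_scores(X, k)
    design = np.column_stack([np.ones(len(y)), Z])
    # Fit the univariate imputation model on rows where y is observed.
    beta, *_ = np.linalg.lstsq(design[obs], y[obs], rcond=None)
    resid = y[obs] - design[obs] @ beta
    sigma = np.sqrt(resid @ resid / (obs.sum() - design.shape[1]))
    y_imp = y.copy()
    n_mis = int((~obs).sum())
    # Predicted mean plus residual noise, as in stochastic regression imputation.
    y_imp[~obs] = design[~obs] @ beta + rng.normal(0.0, sigma, n_mis)
    return y_imp


rng = np.random.default_rng(0)
n = 300
f = rng.normal(size=n)                      # latent factor shared by all predictors
X = f[:, None] + 0.3 * rng.normal(size=(n, 10))
y_true = 2.0 * f + 0.5 * rng.normal(size=n)
y = y_true.copy()
missing = rng.random(n) < 0.3
y[missing] = np.nan
y_imp = impute_with_pcs(y, X, k=2, rng=rng, min_abs_corr=0.3)
```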
Symmetry is a cornerstone of much of mathematics, and many probability distributions possess symmetries characterized by their invariance to a collection of group actions. Thus, many mathematical and statistical methods rely on such symmetry holding and ostensibly fail if symmetry is broken. This work considers under what conditions a sequence of probability measures asymptotically gains such symmetry or invariance to a collection of group actions. Considering the many symmetries of the Gaussian distribution, this work effectively proposes a non-parametric type of central limit theorem. That is, a Lipschitz function of a high dimensional random vector will be asymptotically invariant to the actions of certain compact topological groups. Applications of this include a partial law of the iterated logarithm for uniformly random points in an $\ell_p^n$-ball and an asymptotic equivalence between classical parametric statistical tests and their randomization counterparts even when invariance assumptions are violated.
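The last claim can be illustrated with a small Python sketch. It compares a sign-flip randomization test with the large-sample z-test for a zero mean. In the sign-flip test, the group $\{\pm 1\}^n$ acts on the observations. The data are skewed, so the symmetry that the sign-flip test assumes is violated. This is a hypothetical example chosen for illustration, not the construction used in this work. With a few hundred observations the two p-values are typically close.

```python
import math

import numpy as np


def sign_flip_pvalue(x, n_draws, rng):
    """Two-sided randomization p-value for H0: mean 0 under random sign flips."""
    observed = abs(x.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_draws, len(x)))
    flipped = np.abs((signs * x).mean(axis=1))
    # Count the observed statistic as one of the draws (standard +1 correction).
    return (1 + np.sum(flipped >= observed)) / (n_draws + 1)


def z_test_pvalue(x):
    """Two-sided large-sample z-test p-value for H0: mean 0."""
    z = x.mean() / (x.std(ddof=1) / math.sqrt(len(x)))
    return math.erfc(abs(z) / math.sqrt(2.0))


rng = np.random.default_rng(1)
# Skewed, mean-zero data: the sign-flip symmetry assumption does not hold.
x = rng.exponential(1.0, size=400) - 1.0
p_rand = sign_flip_pvalue(x, 4000, rng)
p_z = z_test_pvalue(x)
```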
A numerical method is proposed for simulating composite open quantum systems. It is based on Lindblad master equations and adiabatic elimination. Each subsystem is assumed to converge exponentially towards a stationary subspace, to be slightly impacted by some decoherence channels, and to be weakly coupled to the other subsystems. The method relies on a perturbation analysis with an asymptotic expansion. It exploits a reduced-dimension formulation of the slow dynamics and uses the invariant operators of the local, nominal dissipative dynamics attached to each subsystem. The second-order expansion can be computed using only local numerical calculations, which avoids computations on the tensor-product Hilbert space of the full system. The method is particularly well suited to autonomous quantum error correction schemes. For typical gates acting on one and two cat-qubits (Z, ZZ and CNOT), simulations of the reduced models agree with full-model simulations when the mean photon number of each cat-qubit is less than 8. For larger mean photon numbers and for gates with three cat-qubits (ZZZ and CCNOT), full-model simulations are practically infeasible, whereas reduced-model simulations remain tractable. In particular, they capture both the dominant phase-flip error rate and the very small bit-flip error rate, including its exponential suppression with the mean photon number.
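For reference, the reduced models approximate the Lindblad master equation $\dot\rho = -i[H,\rho] + \sum_k \left(L_k \rho L_k^\dagger - \tfrac12\{L_k^\dagger L_k, \rho\}\right)$, with $\hbar = 1$. The Python sketch below integrates this equation directly for a single decaying qubit using a fixed-step RK4 scheme. It is a brute-force full-model integration shown only to fix notation, not the adiabatic-elimination method described above. The qubit Hamiltonian, decay rate, and step size are illustrative assumptions. In this toy case the excited-state population should follow $e^{-\gamma t}$.

```python
import numpy as np


def lindblad_rhs(rho, H, jump_ops):
    """Right-hand side of the Lindblad master equation (hbar = 1)."""
    drho = -1j * (H @ rho - rho @ H)
    for L in jump_ops:
        Ld = L.conj().T
        LdL = Ld @ L
        drho += L @ rho @ Ld - 0.5 * (LdL @ rho + rho @ LdL)
    return drho


def evolve_rk4(rho, H, jump_ops, dt, n_steps):
    """Integrate the master equation with fixed-step fourth-order Runge-Kutta."""
    for _ in range(n_steps):
        k1 = lindblad_rhs(rho, H, jump_ops)
        k2 = lindblad_rhs(rho + 0.5 * dt * k1, H, jump_ops)
        k3 = lindblad_rhs(rho + 0.5 * dt * k2, H, jump_ops)
        k4 = lindblad_rhs(rho + dt * k3, H, jump_ops)
        rho = rho + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return rho


gamma = 0.5
# Basis ordering (|g>, |e>); sigma_minus maps |e> to |g>.
sigma_minus = np.array([[0, 1], [0, 0]], dtype=complex)
H = 0.5 * np.diag([-1.0, 1.0]).astype(complex)   # qubit splitting with omega = 1
rho0 = np.array([[0, 0], [0, 1]], dtype=complex)  # start in the excited state
rho_t = evolve_rk4(rho0, H, [np.sqrt(gamma) * sigma_minus], dt=0.01, n_steps=200)
```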
Models of complex technological systems inherently contain interactions and dependencies among their input variables that affect their joint influence on the output. Such models are often computationally expensive, and few sensitivity analysis methods can handle these complexities effectively. Moreover, the sensitivity analysis field as a whole pays limited attention to the nature of interaction effects, whose understanding can be critical for the design of safe and reliable systems. In this paper, we introduce and extensively test a simple binning approach for computing sensitivity indices. We also demonstrate how complementing it with the visualization method simulation decomposition (SimDec) can yield important insights into the behavior of complex engineering models. The simple binning approach computes first- and second-order effects as well as a combined sensitivity index, and it is considerably more computationally efficient than estimating Sobol' indices. Taken together, the resulting sensitivity analysis framework provides an efficient and intuitive way to analyze the behavior of complex systems containing interactions and dependencies.
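A first-order index can be estimated by binning. Sort one input into equally populated bins, average the output within each bin, and take the variance of those bin means relative to the total output variance, $S_i \approx \mathrm{Var}(E[Y \mid X_i])/\mathrm{Var}(Y)$. The Python sketch below assumes independent inputs and uses 50 quantile bins. The bin count and the test function are illustrative choices. The paper's second-order and combined indices, which would bin pairs of inputs, are not reproduced here. For $Y = X_1 + 0.5X_2$ with independent uniform inputs, the exact values are $S_1 = 0.8$ and $S_2 = 0.2$.

```python
import numpy as np


def first_order_index_binned(x, y, n_bins=50):
    """Estimate S_i = Var(E[Y | X_i]) / Var(Y) by averaging Y within quantile bins of X_i."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    bin_means = np.bincount(idx, weights=y, minlength=n_bins) / np.maximum(counts, 1)
    # Variance of the binned conditional means, weighted by bin occupancy.
    var_cond_mean = np.sum(counts * (bin_means - y.mean()) ** 2) / len(y)
    return var_cond_mean / y.var()


rng = np.random.default_rng(4)
n = 100_000
x1, x2 = rng.random(n), rng.random(n)
y = x1 + 0.5 * x2   # analytic first-order indices: S1 = 0.8, S2 = 0.2
s1 = first_order_index_binned(x1, y)
s2 = first_order_index_binned(x2, y)
```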
This paper proposes a new mechanical model for noncircular shallow tunnelling that accounts for the initial stress field. The far-field ground surface is constrained to eliminate the displacement singularity at infinity. As a result, the originally unbalanced tunnel excavation problem in existing solutions becomes an equilibrium problem with mixed boundary conditions. By analytic continuation, the mixed boundary conditions are transformed into a homogeneous Riemann-Hilbert problem. This problem is then solved by an efficient and accurate iterative method subject to the conditions of static equilibrium, single-valued displacement, and traction along the tunnel periphery. Lanczos filtering is applied to the final stress and displacement solutions to reduce the Gibbs phenomenon caused by the constrained far-field ground surface, giving more accurate results. Several numerical cases verify the proposed solution by checking the boundary conditions and comparing against existing solutions, and all results are in good agreement. Further numerical cases investigate the stress and deformation distributions along the ground surface and the tunnel periphery, and several engineering recommendations are given. Limitations of the proposed solution are also discussed.
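Lanczos filtering multiplies the $k$-th term of a truncated Fourier series by the sigma factor $\sigma_k = \operatorname{sinc}(k/m)$. This damps the high harmonics responsible for the Gibbs overshoot. The Python sketch below applies the filter to the Fourier series of a square wave rather than to the tunnelling solution, so the test function and the truncation level $m = 64$ are illustrative assumptions. The unfiltered partial sum overshoots the jump by roughly 9% of its size, while the filtered one stays much closer to 1.

```python
import numpy as np


def square_wave_series(x, m, lanczos=False):
    """Fourier partial sum of sign(sin x) using harmonics k = 1, ..., m - 1.

    With lanczos=True each harmonic is multiplied by the sigma factor sinc(k / m),
    where np.sinc(t) = sin(pi t) / (pi t).
    """
    total = np.zeros_like(x)
    for k in range(1, m, 2):          # only odd harmonics are non-zero
        coef = 4.0 / (np.pi * k)
        if lanczos:
            coef *= np.sinc(k / m)
        total += coef * np.sin(k * x)
    return total


x = np.linspace(1e-4, np.pi - 1e-4, 20001)
raw = square_wave_series(x, 64)
smoothed = square_wave_series(x, 64, lanczos=True)
```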
Differential geometric approaches are ubiquitous in several fields of mathematics, physics and engineering, and their discretizations enable network-based mathematical and computational frameworks that are essential for large-scale data science. The Forman-Ricci curvature (FRC), a statistical measure based on Riemannian geometry and designed for networks, is known for its high capacity for extracting geometric information from complex networks. However, extracting information from dense networks is still challenging due to the combinatorial explosion of high-order network structures. Motivated by this challenge, we develop a set-theoretic representation theory for high-order network cells and FRC, together with their associated concepts and properties. Together these provide an alternative and efficient formulation for computing high-order FRC in complex networks. We provide pseudocode, a software implementation named FastForman, and a benchmark comparison with alternative implementations. Crucially, our representation theory reveals previous computational bottlenecks and accelerates the computation of FRC. As a consequence, our findings open new research possibilities in complex systems where higher-order geometric computations are required.
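Consider an unweighted, undirected graph with triangles taken as 2-cells. A commonly used form of the (augmented) Forman-Ricci curvature of an edge $(u, v)$ is $F(u,v) = 4 - \deg(u) - \deg(v) + 3\,t_{uv}$, where $t_{uv}$ is the number of triangles containing the edge. The Python sketch below computes this curvature with neighbor sets, so that $t_{uv} = |N(u) \cap N(v)|$ becomes a set intersection. It illustrates a set-based formulation restricted to triangles and assumes there are no duplicate edges or self-loops. It is not the FastForman implementation, which handles higher-order cells.

```python
from itertools import combinations


def augmented_forman_curvature(edges):
    """Forman-Ricci curvature of each edge of an unweighted, undirected graph,
    with triangles as 2-cells: F(u, v) = 4 - deg(u) - deg(v) + 3 * #triangles(u, v).
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    curvature = {}
    for u, v in edges:
        # Triangles through the edge are exactly the common neighbors of u and v.
        triangles = len(neighbors[u] & neighbors[v])
        curvature[(u, v)] = 4 - len(neighbors[u]) - len(neighbors[v]) + 3 * triangles
    return curvature


k4_curvature = augmented_forman_curvature(list(combinations(range(4), 2)))
path_curvature = augmented_forman_curvature([(0, 1), (1, 2)])
triangle_curvature = augmented_forman_curvature([(0, 1), (1, 2), (0, 2)])
```

In the complete graph $K_4$, every edge has $4 - 3 - 3 + 3 \cdot 2 = 4$.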
We present a semi-Lagrangian characteristic mapping method for the incompressible Euler equations on a rotating sphere. The numerical method uses a spatio-temporal discretization of the inverse flow map generated by the Eulerian velocity as a composition of sub-interval flows formed by $C^1$ spherical spline interpolants. This approximation technique can resolve sub-grid scales generated over time without increasing the spatial resolution of the computational grid. The method is analyzed and validated on standard test cases, showing third-order accuracy in the supremum norm. Numerical experiments illustrate the method's unique resolution properties and show that upsampling the numerical solution reproduces the forward energy cascade at sub-grid scales.
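A characteristic mapping method evolves the backward (inverse) flow map $X$ rather than the solution itself. It then recovers a transported quantity, such as vorticity, as $u(x,t) = u_0(X(x,t))$. The Python sketch below is a one-dimensional periodic analogue under simplifying assumptions. Each sub-interval uses a first-order backward step. Sub-interval maps are composed through periodic linear interpolation of the map's displacement. The velocity is constant, so the result can be checked exactly. The sketch does not implement the spherical $C^1$ spline interpolants or the third-order scheme of the paper. It does show that an initial field with more oscillations than the grid can resolve is still evaluated exactly through the map.

```python
import numpy as np


def compose_backward_map(disp, x, velocity, dt, length):
    """Advance the inverse flow map X(x) = x + disp(x) by one sub-interval.

    The sub-interval map is a first-order backward step x -> x - dt * v(x);
    composing with the previous map needs disp at those departure points,
    obtained here by periodic linear interpolation.
    """
    departure = x - dt * velocity(x)
    disp_at_departure = np.interp(departure, x, disp, period=length)
    return (departure - x) + disp_at_departure


length, n = 1.0, 16
x = np.linspace(0.0, length, n, endpoint=False)
c = 0.3


def velocity(s):
    # Hypothetical constant velocity field, chosen so the exact map is x - c * t.
    return c * np.ones_like(s)


disp = np.zeros(n)
for _ in range(100):   # 100 sub-intervals of dt = 0.01, final time T = 1
    disp = compose_backward_map(disp, x, velocity, 0.01, length)


def u0(s):
    # Initial field with 20 periods on a 16-point grid, i.e. finer than the grid.
    return np.sin(2 * np.pi * 20 * s / length)


u_T = u0(x + disp)     # solution is the initial field pulled back by the map
```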
Rational function approximations provide a simple but flexible alternative to polynomial approximation, capturing complex nonlinearities without oscillatory artifacts. However, there have been few attempts to use rational functions on noisy data because of the risk of creating spurious singularities. To avoid creating singularities, we express the denominator in the Bernstein polynomial basis and impose conditions on its coefficients that force it to be strictly positive. Although this restricts the set of representable rational functions, it keeps the benefits of rational approximation while retaining the robustness of polynomial approximation on noisy data. Our numerical experiments on noisy data show that existing rational approximation methods consistently produce spurious poles inside the approximation domain. In contrast, our method cannot create poles in the approximation domain, and it provides better fits than polynomial approximation and even penalized splines on multivariate functions. Moreover, guaranteeing that the approximation is pole-free on an interval is critical for estimating non-constant coefficients when numerically solving differential equations with spectral methods. This yields a compact representation of the original differential equation, allowing numerical solvers to reach high accuracy quickly, as our experiments show.
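Positivity of a Bernstein-form denominator follows from a one-line argument. The Bernstein basis polynomials are nonnegative on $[0,1]$ and sum to 1, so $q(x) = \sum_j q_j b_{j,n}(x) \ge \min_j q_j > 0$ whenever every coefficient $q_j$ is positive. The Python sketch below uses that fact. For a fixed positive denominator, it fits the numerator by linear least squares, since $p(x)/q(x)$ is linear in the numerator coefficients. Holding the denominator fixed is a simplifying assumption for illustration; the paper's method also determines the denominator. The coefficient values are hypothetical.

```python
from math import comb

import numpy as np


def bernstein_basis(degree, x):
    """Matrix whose column j is the Bernstein polynomial b_{j,degree} on [0, 1]."""
    x = np.asarray(x, dtype=float)
    return np.stack([comb(degree, j) * x**j * (1 - x) ** (degree - j)
                     for j in range(degree + 1)], axis=1)


def denominator(q_coefs, x):
    # Positive coefficients with a nonnegative basis summing to 1 give q(x) >= min(q_coefs) > 0.
    return bernstein_basis(len(q_coefs) - 1, x) @ np.asarray(q_coefs, dtype=float)


def fit_numerator(x, y, q_coefs, degree):
    """Least-squares numerator coefficients for a fixed, strictly positive denominator."""
    q = denominator(q_coefs, x)
    design = bernstein_basis(degree, x) / q[:, None]
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


x = np.linspace(0.0, 1.0, 200)
q_coefs = [1.0, 0.2, 3.0]
p_true = np.array([0.5, -1.0, 2.0, 1.0])
y = (bernstein_basis(3, x) @ p_true) / denominator(q_coefs, x)
p_hat = fit_numerator(x, y, q_coefs, degree=3)
```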
In this paper, we view the statistical inverse problems of partial differential equations (PDEs) as PDE-constrained regression and focus on learning the prediction function of the prior probability measures. From this perspective, we derive general generalization bounds for learning prior measures defined on infinite-dimensional spaces, in the style of probably approximately correct (PAC)-Bayesian learning theory. The theoretical framework is rigorously defined on an infinite-dimensional separable function space, which connects it closely to the usual infinite-dimensional Bayesian inverse approach. Inspired by the concept of $\alpha$-differential privacy, we propose a generalized condition that allows the learned prior measures to depend on the measured data. This condition covers the Gaussian measures widely employed in statistical inverse problems of PDEs. It also allows us to introduce a prediction function that takes the measured data as input and returns the prior measure as output. After presenting the general theory, we give specific settings for linear and nonlinear problems that can easily be cast into it to obtain concrete generalization bounds. Based on these bounds, we formulate practical algorithms that are well-defined in infinite dimensions. Finally, numerical examples of the backward diffusion and Darcy flow problems demonstrate the potential of the proposed approach for learning the prediction function of the prior probability measures.
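For orientation, consider a standard finite-dimensional PAC-Bayesian bound: McAllester's bound in Maurer's form. Assume losses in $[0,1]$, $n \ge 8$ samples, and a prior $P$ fixed before seeing the data. Then with probability at least $1-\delta$, every posterior $Q$ satisfies $R(Q) \le \hat R(Q) + \sqrt{(\mathrm{KL}(Q\|P) + \ln(2\sqrt{n}/\delta))/(2n)}$. The Python sketch below evaluates this bound with a diagonal Gaussian prior and posterior. It is a generic textbook bound used only to illustrate the ingredients (empirical risk, KL term, confidence level). It is not the infinite-dimensional, data-dependent bound derived in this paper, and the numbers are made up.

```python
import math

import numpy as np


def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(Q || P) between Gaussians with diagonal covariances."""
    mu_q, var_q, mu_p, var_p = (np.asarray(a, dtype=float) for a in (mu_q, var_q, mu_p, var_p))
    return 0.5 * float(np.sum(np.log(var_p / var_q)
                              + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0))


def pac_bayes_bound(empirical_risk, kl, n, delta):
    """McAllester-type bound (Maurer's form) for losses in [0, 1] and n >= 8:
    risk <= empirical_risk + sqrt((KL + ln(2 sqrt(n) / delta)) / (2 n)),
    holding with probability >= 1 - delta simultaneously for all posteriors Q."""
    return empirical_risk + math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))


# Illustrative numbers only: a posterior slightly shifted and narrowed from a standard prior.
kl = kl_diag_gaussians([0.1, -0.2], [0.5, 0.5], [0.0, 0.0], [1.0, 1.0])
bound = pac_bayes_bound(0.1, kl, n=1000, delta=0.05)
```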
Factor models are widely used for dimension reduction in the analysis of multivariate data. This is achieved by decomposing a $p \times p$ covariance matrix into the sum of two components. Through a latent factor representation, the two components can be interpreted as a diagonal matrix of idiosyncratic variances and a shared variation matrix, that is, the product of a $p \times k$ factor loadings matrix and its transpose. If $k \ll p$, this defines a parsimonious factorisation of the covariance matrix. Historically, little attention has been paid to incorporating prior information in Bayesian analyses using factor models where, at best, the prior for the factor loadings is order invariant. In this work, a class of structured priors is developed that can encode ideas of dependence structure about the shared variation matrix. The construction allows data-informed shrinkage towards sensible parametric structures while also facilitating inference over the number of factors. Using an unconstrained reparameterisation of stationary vector autoregressions, the methodology is extended to stationary dynamic factor models. For computational inference, parameter-expanded Markov chain Monte Carlo samplers are proposed, including an efficient adaptive Gibbs sampler. Two substantive applications showcase the scope of the methodology and its inferential benefits.
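In symbols, the decomposition above is $\Sigma = \Lambda\Lambda^\top + \Psi$, with $\Lambda$ the $p \times k$ loadings matrix and $\Psi$ diagonal. The Python sketch below builds such a covariance from random loadings to show why it is parsimonious. The shared part has rank $k$, and the sum is positive definite whenever $\Psi$ has a positive diagonal. The parameter count grows like $pk + p$ rather than $p(p+1)/2$, before accounting for the rotational non-identifiability of $\Lambda$. The dimensions and random values are illustrative only and say nothing about the structured priors developed here.

```python
import numpy as np


def factor_covariance(loadings, idiosyncratic_var):
    """Sigma = Lambda Lambda^T + Psi for a p x k loadings matrix and diagonal Psi."""
    return loadings @ loadings.T + np.diag(idiosyncratic_var)


rng = np.random.default_rng(2)
p, k = 10, 2
Lam = rng.normal(size=(p, k))              # illustrative random loadings
psi = rng.uniform(0.5, 1.0, size=p)        # positive idiosyncratic variances
Sigma = factor_covariance(Lam, psi)
shared = Lam @ Lam.T                       # shared variation matrix, rank k

n_params_factor = p * k + p                # loadings plus idiosyncratic variances
n_params_full = p * (p + 1) // 2           # unrestricted covariance matrix
```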
In this paper, we consider an inverse space-dependent source problem for a time-fractional diffusion equation. To deal with the ill-posedness of the problem, we transform it into an optimal control problem with total variation (TV) regularization. In contrast to the classical Tikhonov model with $L^2$ penalty terms, including a TV term is advantageous for reconstructing solutions that are discontinuous or piecewise constant. The control problem is approximated by a fully discrete scheme, for which convergence results are provided. Furthermore, a linearized primal-dual iterative algorithm based on an equivalent saddle-point reformulation is proposed to solve the discrete control model, and several numerical experiments demonstrate the efficiency of the algorithm.
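The saddle-point idea can be illustrated on the simplest TV problem, one-dimensional denoising, $\min_u \tfrac12\|u-f\|^2 + \lambda\|Du\|_1$, where the forward model of the source problem is replaced by the identity. The Python sketch below runs the basic Chambolle-Pock primal-dual iteration on that problem. Each iteration takes a projected dual ascent step, a proximal primal step, and an extrapolation step. The identity forward operator, step sizes, $\lambda$, and test signal are assumptions for illustration. This is not the linearized algorithm of the paper, which must also handle the time-fractional forward operator.

```python
import numpy as np


def tv_denoise_primal_dual(f, lam, n_iter=500, tau=0.45, sigma=0.45):
    """Chambolle-Pock iterations for min_u 0.5*||u - f||^2 + lam * ||D u||_1 in 1D.

    Saddle-point form: min_u max_{|p_i| <= lam} <D u, p> + 0.5*||u - f||^2,
    with D the forward difference; tau * sigma * ||D||^2 <= 0.81 < 1 since ||D||^2 <= 4.
    """
    u = f.copy()
    u_bar = f.copy()
    p = np.zeros(len(f) - 1)
    for _ in range(n_iter):
        # Dual ascent step, then projection onto the box |p_i| <= lam.
        p = np.clip(p + sigma * np.diff(u_bar), -lam, lam)
        # Adjoint of the forward difference applied to p.
        dtp = np.zeros_like(f)
        dtp[:-1] -= p
        dtp[1:] += p
        u_old = u
        # Proximal step for the quadratic data term.
        u = (u - tau * dtp + tau * f) / (1.0 + tau)
        # Extrapolation step.
        u_bar = 2.0 * u - u_old
    return u


rng = np.random.default_rng(3)
truth = np.concatenate([np.zeros(70), np.ones(70), 0.3 * np.ones(60)])
noisy = truth + 0.3 * rng.normal(size=truth.size)
denoised = tv_denoise_primal_dual(noisy, lam=1.0)
err_noisy = np.sqrt(np.mean((noisy - truth) ** 2))
err_denoised = np.sqrt(np.mean((denoised - truth) ** 2))
```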