Multivariate functional data that are cross-sectionally compositional data are attracting increasing interest in the statistical modeling literature, a major example being trajectories over time of compositions derived from cause-specific mortality rates. In this work, we develop a novel functional concurrent regression model in which independent variables are functional compositions. This allows us to investigate the relationship over time between life expectancy at birth and compositions derived from cause-specific mortality rates of four distinct age classes, namely 0--4, 5--39, 40--64 and 65+. A penalized approach is developed to estimate the regression coefficients and select the relevant variables. Then an efficient computational strategy based on an augmented Lagrangian algorithm is derived to solve the resulting optimization problem. The good performance of the model in predicting the response function and estimating the unknown functional coefficients is demonstrated in a simulation study. The results on real data confirm the important role of neoplasms and cardiovascular diseases in determining life expectancy that emerged in other studies, and reveal several other contributions not yet observed.
A general a posteriori error analysis applies to five lowest-order finite element methods for two fourth-order semi-linear problems with trilinear non-linearity and a general source. A quasi-optimal smoother extends the source term to the discrete trial space, and more importantly, modifies the trilinear term in the stream-function vorticity formulation of the incompressible 2D Navier-Stokes and the von K\'{a}rm\'{a}n equations. This enables the first efficient and reliable a posteriori error estimates for the 2D Navier-Stokes equations in the stream-function vorticity formulation for Morley, two discontinuous Galerkin, $C^0$ interior penalty, and WOPSIP discretizations with piecewise quadratic polynomials.
Mediation analysis is widely used in health science research to evaluate the extent to which an intermediate variable explains an observed exposure-outcome relationship. However, the validity of the analysis can be compromised when the exposure is measured with error. Motivated by the Health Professionals Follow-up Study (HPFS), we investigate the impact of exposure measurement error on assessing mediation with a survival outcome, based on the Cox proportional hazards outcome model. When the outcome is rare and there is no exposure-mediator interaction, we show that the uncorrected estimators of the natural indirect and direct effects can be biased in either direction, but the uncorrected estimator of the mediation proportion is approximately unbiased as long as the measurement error is not large or the mediator-exposure association is not strong. We develop ordinary regression calibration and risk set regression calibration approaches to correct the exposure measurement error-induced bias when estimating mediation effects, while allowing for an exposure-mediator interaction in the Cox outcome model. The proposed approaches require a validation study to characterize the measurement error process. We apply the proposed approaches to the HPFS (1986-2016) to evaluate the extent to which reduced body mass index mediates the protective effect of vigorous physical activity on the risk of cardiovascular diseases, and compare the finite-sample properties of the proposed estimators via simulations.
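The ordinary regression calibration idea can be illustrated with a minimal toy sketch (all variable names and parameter values here are illustrative, not from the study): a validation study is used to fit a calibration model for the true exposure given its mismeasured surrogate, and the fitted value replaces the surrogate in the main analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulated data (illustrative; classical additive measurement error) ---
n_main, n_val = 5000, 500
x_val = rng.normal(0.0, 1.0, n_val)              # true exposure, validation study
w_val = x_val + rng.normal(0.0, 0.5, n_val)      # mismeasured exposure
w_main = rng.normal(0.0, np.sqrt(1.25), n_main)  # main study: only W observed

# --- Step 1: calibration model E[X | W], fit on the validation study ---
A = np.column_stack([np.ones(n_val), w_val])
coef, *_ = np.linalg.lstsq(A, x_val, rcond=None)
intercept, slope = coef

# --- Step 2: replace W by its calibrated value in the main study,
#     then proceed with the (mediation) outcome model as usual ---
x_hat_main = intercept + slope * w_main
```

Under classical measurement error the calibration slope estimates the attenuation factor var(X)/(var(X)+var(U)), here 1/1.25 = 0.8, which is exactly the shrinkage the uncorrected analysis would silently inherit.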
We examine the tension between academic impact - the volume of citations received by publications - and scientific disruption. Intuitively, one would expect disruptive scientific work to be rewarded by high volumes of citations and, symmetrically, impactful work to also be disruptive. A number of recent studies have instead shown that such intuition is often at odds with reality. In this paper, we break down the relationship between impact and disruption with a detailed correlation analysis in two large data sets of publications in Computer Science and Physics. We find that highly disruptive papers tend to be cited at higher rates than average. By contrast, the converse does not hold, as we do not find highly impactful papers to be particularly disruptive. Notably, these results qualitatively hold even within individual scientific careers, as we find that - on average - an author's most disruptive work tends to be well cited, whereas their most cited work does not tend to be disruptive. We discuss the implications of our findings in the context of academic evaluation systems, and show how they can contribute to reconciling seemingly contradictory results in the literature.
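The abstract does not specify which disruption measure is used; a common operationalization in this literature is the CD-style disruption index of Funk and Owen-Smith, which a short sketch makes concrete (the function and toy data below are illustrative):

```python
def disruption_index(citers_of_focal, focal_references, citers_of_refs):
    """CD-style disruption index.

    citers_of_focal  : set of papers citing the focal paper
    focal_references : set of papers the focal paper cites (unused in the
                       index itself, kept for interface clarity)
    citers_of_refs   : set of papers citing at least one focal reference
    """
    n_i = len(citers_of_focal - citers_of_refs)   # cite focal only (disrupting)
    n_j = len(citers_of_focal & citers_of_refs)   # cite focal and its refs (consolidating)
    n_k = len(citers_of_refs - citers_of_focal)   # cite the refs but not the focal paper
    denom = n_i + n_j + n_k
    return (n_i - n_j) / denom if denom else 0.0

# Toy example: 3 later papers cite only the focal work, 1 cites both the
# focal work and its references, 1 cites only the references
d = disruption_index({"a", "b", "c", "d"}, {"r1"}, {"d", "e"})
```

The index ranges from -1 (purely consolidating) to +1 (purely disruptive); in the toy example above it evaluates to (3 - 1) / 5 = 0.4.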
Transition amplitudes and transition probabilities are relevant to many areas of physics simulation, including the calculation of response properties and correlation functions. These quantities can also be related to solving linear systems of equations. Here we present three related algorithms for calculating transition probabilities. First, we extend a previously published short-depth algorithm, allowing for the two input states to be non-orthogonal. Building on this first procedure, we then derive a higher-depth algorithm based on Trotterization and Richardson extrapolation that requires fewer circuit evaluations. Third, we introduce a tunable algorithm that allows for trading off circuit depth and measurement complexity, yielding an algorithm that can be tailored to specific hardware characteristics. Finally, we implement proof-of-principle numerics for models in physics and chemistry and for a subroutine in variational quantum linear solving (VQLS). The primary benefits of our approaches are that (a) arbitrary non-orthogonal states may now be used with small increases in quantum resources, (b) we (like another recently proposed method) entirely avoid subroutines such as the Hadamard test that may require three-qubit gates to be decomposed, and (c) in some cases fewer quantum circuit evaluations are required as compared to the previous state-of-the-art in NISQ algorithms for transition probabilities.
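For readers unfamiliar with the target quantity, a classical reference computation of the transition probability the quantum algorithms estimate may help (a minimal sketch with NumPy; the states and unitary are random stand-ins for the physically motivated ones):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_state(dim, rng):
    """Haar-random normalized state vector."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

def random_unitary(dim, rng):
    """Haar-random unitary via QR of a complex Ginibre matrix."""
    m = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(m)
    d = np.diag(r)
    return q * (d / np.abs(d))  # fix column phases

dim = 8                        # e.g. a three-qubit register
psi = random_state(dim, rng)   # initial state
phi = random_state(dim, rng)   # final state; need not be orthogonal to psi
U = random_unitary(dim, rng)   # time-evolution (or other) operator

# Transition probability |<phi| U |psi>|^2
p = np.abs(np.vdot(phi, U @ psi)) ** 2
```

The quantum algorithms in the abstract estimate this same scalar on hardware, where the states are prepared by circuits and U is, e.g., a Trotterized evolution; the classical version above is only tractable at small dimension.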
In cluster randomized experiments, units are often recruited after the random cluster assignment, and data are only available for the recruited sample. Post-randomization recruitment can lead to selection bias, inducing systematic differences between the overall and the recruited populations, and between the recruited intervention and control arms. In this setting, we define causal estimands for the overall and the recruited populations. We first show that if units select their cluster independently of the treatment assignment, cluster randomization implies individual randomization in the overall population. We then prove that under the assumption of ignorable recruitment, the average treatment effect on the recruited population can be consistently estimated from the recruited sample using inverse probability weighting. In general, we cannot identify the average treatment effect on the overall population. Nonetheless, we show, via a principal stratification formulation, that one can use weighting of the recruited sample to identify treatment effects on two meaningful subpopulations of the overall population: units who would be recruited into the study regardless of the assignment, and units who would be recruited into the study under treatment but not under control. We develop a corresponding estimation strategy and a sensitivity analysis method for checking the ignorable recruitment assumption.
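A toy numerical sketch of the weighting mechanics (not the paper's estimator, and with illustrative names and parameter values): recruitment depends on a covariate, the treatment effect is heterogeneous, and inverse-probability weights correct the selection that the naive recruited-sample contrast absorbs.

```python
import numpy as np

rng = np.random.default_rng(2)

# --- Simulated data (illustrative) ---
n = 20000
x = rng.normal(size=n)                        # covariate driving recruitment
a = rng.integers(0, 2, size=n)                # randomized treatment assignment
p_recruit = 1.0 / (1.0 + np.exp(-0.5 * x))    # recruitment propensity (assumed known)
r = rng.random(n) < p_recruit                 # recruitment indicator
y = (1.0 + x) * a + 0.5 * x + rng.normal(size=n)  # heterogeneous effect; ATE = 1

# --- Naive contrast on the recruited sample: biased upward, because
#     recruitment over-represents large x, where the effect is larger ---
naive = y[r & (a == 1)].mean() - y[r & (a == 0)].mean()

# --- Inverse-probability-weighted (Hajek-type) contrast ---
w = 1.0 / p_recruit
t1, t0 = r & (a == 1), r & (a == 0)
ate_ipw = np.average(y[t1], weights=w[t1]) - np.average(y[t0], weights=w[t0])
```

Here the weights reweight the recruited sample toward the overall population and recover the true ATE of 1, while the naive contrast sits near 1.25; in the paper's setting the target estimand and the weights are defined with respect to the recruited population and the recruitment model, under the ignorable recruitment assumption.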
Combinatorial optimization - a field of research addressing problems that feature strongly in a wealth of scientific and industrial contexts - has been identified as one of the core potential fields of applicability of quantum computers. It is still unclear, however, to what extent quantum algorithms can actually outperform classical algorithms for this type of problem. In this work, by resorting to computational learning theory and cryptographic notions, we prove that quantum computers feature an in-principle super-polynomial advantage over classical computers in approximating solutions to combinatorial optimization problems. Specifically, building on seminal work by Kearns and Valiant and introducing a new reduction, we identify special types of problems that are hard for classical computers to approximate up to polynomial factors. At the same time, we give a quantum algorithm that can efficiently approximate the optimal solution within a polynomial factor. The core of the quantum advantage discovered in this work is ultimately borrowed from Shor's quantum algorithm for factoring. Concretely, we prove a super-polynomial advantage for approximating special instances of the so-called integer programming problem. In doing so, we provide an explicit end-to-end construction for advantage-bearing instances. This result shows that quantum devices have, in principle, the power to approximate combinatorial optimization solutions beyond the reach of classical efficient algorithms. Our results also give clear guidance on how to construct such advantage-bearing problem instances.
The recently introduced Genetic Column Generation (GenCol) algorithm has been numerically observed to efficiently and accurately compute high-dimensional optimal transport plans for general multi-marginal problems, but theoretical results on the algorithm have hitherto been lacking. The algorithm solves the OT linear program on a dynamically updated low-dimensional submanifold consisting of sparse plans. The submanifold dimension exceeds the sparse support of optimal plans only by a fixed factor $\beta$. Here we prove that for $\beta \geq 2$ and in the two-marginal case, GenCol always converges to an exact solution, for arbitrary costs and marginals. The proof relies on the concept of c-cyclical monotonicity. As an offshoot, GenCol rigorously reduces the data complexity of numerically solving two-marginal OT problems from $O(\ell^2)$ to $O(\ell)$ without any loss in accuracy, where $\ell$ is the number of discretization points for a single marginal. At the end of the paper we also present some insights into the convergence behavior in the multi-marginal case.
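The optimality criterion underlying the proof can be made concrete in the two-marginal case: a plan supported on a c-cyclically monotone set admits no cost-reducing swap of targets. A minimal sketch of the simplest (2-cycle) check, with an illustrative quadratic cost on the line:

```python
import itertools
import numpy as np

def violates_two_cycle(cost, support, tol=1e-12):
    """Return a pair of support points violating 2-cycle c-monotonicity,
    i.e. a pair whose targets could be swapped to strictly lower the cost;
    return None if no such pair exists."""
    for (i1, j1), (i2, j2) in itertools.combinations(support, 2):
        if cost[i1, j1] + cost[i2, j2] > cost[i1, j2] + cost[i2, j1] + tol:
            return (i1, j1), (i2, j2)
    return None

# Quadratic cost c(x, y) = (x - y)^2 on three points of a line;
# the monotone (sorted) coupling is the optimal plan
pts = np.array([0.0, 1.0, 2.0])
cost = (pts[:, None] - pts[None, :]) ** 2

opt = violates_two_cycle(cost, [(0, 0), (1, 1), (2, 2)])  # monotone plan
bad = violates_two_cycle(cost, [(0, 2), (1, 1), (2, 0)])  # crossing plan
```

The monotone plan passes the check (`opt is None`), while the crossing plan exposes a cost-reducing swap; GenCol-style column generation can use such violations to propose new sparse plans, which is the mechanism the convergence proof formalizes via full c-cyclical monotonicity.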
We are interested in numerical algorithms for computing the electrical field generated by a charge distribution localized on scale $l$ in an infinite heterogeneous correlated random medium, in a situation where the medium is only known in a box of diameter $L\gg l$ around the support of the charge. We show that the algorithm of Lu, Otto and Wang, suggesting optimal Dirichlet boundary conditions motivated by the multipole expansion of Bella, Giunti and Otto, still performs well in correlated media. With overwhelming probability, we obtain a convergence rate in terms of $l$, $L$ and the size of the correlations for which optimality is supported with numerical simulations. These estimates are provided for ensembles which satisfy a multi-scale logarithmic Sobolev inequality, where our main tool is an extension of the semi-group estimates established by the first author. As part of our strategy, we construct sub-linear second-order correctors in this correlated setting, a construction of independent interest.
When students and users of statistical methods first learn about regression analysis there is an emphasis on the technical details of models and estimation methods that invariably runs ahead of the purposes for which these models might be used. More broadly, statistics is widely understood to provide a body of techniques for "modelling data", underpinned by what we describe as the "true model myth", according to which the task of the statistician/data analyst is to build a model that closely approximates the true data generating process. By way of our own historical examples and a brief review of mainstream clinical research journals, we describe how this perspective leads to a range of problems in the application of regression methods, including misguided "adjustment" for covariates, misinterpretation of regression coefficients and the widespread fitting of regression models without a clear purpose. We then outline an alternative approach to the teaching and application of regression methods, which begins by focussing on clear definition of the substantive research question within one of three distinct types: descriptive, predictive, or causal. The simple univariable regression model may be introduced as a tool for description, while the development and application of multivariable regression models should proceed differently according to the type of question. Regression methods will no doubt remain central to statistical practice as they provide a powerful tool for representing variation in a response or outcome variable as a function of "input" variables, but their conceptualisation and usage should follow from the purpose at hand.
When modelling discontinuities (interfaces) using the finite element method, the standard approach is to use a conforming finite-element mesh in which the mesh matches the interfaces. However, this approach can prove cumbersome if the geometry is complex, in particular in 3D. In this work, we develop an efficient technique for a non-conforming finite-element treatment of weak discontinuities by using laminated microstructures. The approach is inspired by the so-called composite voxel technique that has been developed for FFT-based spectral solvers in computational homogenization. The idea behind the method is rather simple. Each finite element that is cut by an interface is treated as a simple laminate, with the volume fraction of the phases and the lamination orientation determined in terms of the actual geometrical arrangement of the interface within the element. The approach is illustrated by several computational examples relevant to the micromechanics of heterogeneous materials. Elastic and elastic-plastic materials at small and finite strain are considered in the examples. The performance of the proposed method is compared to two alternative, simple methods, showing that the new approach is in most cases superior while maintaining simplicity.
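The per-element data the laminate treatment needs, a phase volume fraction and a lamination direction, can be sketched for a 2D element cut by a level-set interface (a minimal illustrative implementation, assuming an axis-aligned element and a smooth level set; the function name and sampling rule are our own, not the paper's):

```python
import numpy as np

def laminate_data(phi, corners, n_sub=50):
    """Volume fraction of phase 1 (where phi > 0) and lamination normal
    for an axis-aligned 2D element cut by the zero level set of `phi`.

    phi     : level-set function phi(x, y)
    corners : (x0, x1, y0, y1) bounds of the element
    """
    x0, x1, y0, y1 = corners
    xs = np.linspace(x0, x1, n_sub)
    ys = np.linspace(y0, y1, n_sub)
    X, Y = np.meshgrid(xs, ys)
    vol_frac = np.mean(phi(X, Y) > 0)   # subcell sampling, as in composite voxels

    # Lamination normal taken from the level-set gradient at the element centre
    xc, yc = 0.5 * (x0 + x1), 0.5 * (y0 + y1)
    h = 1e-6
    g = np.array([(phi(xc + h, yc) - phi(xc - h, yc)) / (2 * h),
                  (phi(xc, yc + h) - phi(xc, yc - h)) / (2 * h)])
    normal = g / np.linalg.norm(g)
    return vol_frac, normal

# Unit element cut vertically in half: phase 1 occupies x > 0.5
vf, n = laminate_data(lambda x, y: x - 0.5, (0.0, 1.0, 0.0, 1.0))
```

With these two quantities in hand, each cut element is assigned an effective laminate response (e.g. rank-one laminate mixing of the two phase stiffnesses along `normal`) instead of being remeshed to conform to the interface.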