Ongoing advances in microbiome profiling have allowed unprecedented insights into the molecular activities of microbial communities. This has fueled strong scientific interest in understanding the critical role the microbiome plays in governing human health by identifying microbial features associated with clinical outcomes of interest. Several aspects of microbiome data limit the applicability of existing variable selection approaches. In particular, microbiome data are high-dimensional, extremely sparse, and compositional. Importantly, many of the observed features, although categorized as different taxa, may play related functional roles. To address these challenges, we propose a novel compositional regression approach that leverages the data-adaptive clustering and variable selection properties of the spiked Dirichlet process to identify taxa that exhibit similar functional roles. Our proposed method, Bayesian Regression with Agglomerated Compositional Effects using a dirichLET process (BRACElet), enables the identification of a sparse set of features with shared impacts on the outcome, facilitating dimension reduction and model interpretation. We demonstrate that BRACElet outperforms existing approaches for microbiome variable selection through simulation studies and an application elucidating the impact of oral microbiome composition on insulin resistance.
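To fix ideas, the agglomeration mechanism can be sketched via a standard log-contrast compositional regression with a spiked Dirichlet process prior; the specification below is illustrative, not necessarily the paper's exact formulation:
\[
y_i = \sum_{j=1}^{p} \beta_j \log x_{ij} + \varepsilon_i, \qquad \sum_{j=1}^{p} \beta_j = 0, \qquad
\beta_j \mid G \overset{\text{iid}}{\sim} G, \quad G \sim \mathrm{DP}\big(\alpha,\; \pi_0\,\delta_0 + (1-\pi_0)\,G_0\big),
\]
where the spike $\delta_0$ at zero induces sparsity, while the discreteness of the Dirichlet process ties coefficients together ($\beta_j = \beta_{j'}$), agglomerating taxa with shared effects on the outcome.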
Learning causal relationships between pairs of complex traits from observational studies is of great interest across various scientific domains. However, most existing methods assume the absence of unmeasured confounding and restrict the causal relationship between two traits to be uni-directional, assumptions that may be violated in real-world systems. In this paper, we address the challenge of causal discovery and effect inference for two traits while accounting for unmeasured confounding and potential feedback loops. By leveraging possibly invalid instrumental variables, we provide identification conditions for the causal parameters in a model that allows for bi-directional relationships, and we also establish identifiability of the causal direction under the introduced conditions. We then propose a data-driven procedure to detect the causal direction and provide inference results about the causal effect along the identified direction. We show that our method consistently recovers the true direction and produces valid confidence intervals for the causal effect. Extensive simulation studies show that our proposal outperforms existing methods. Finally, we apply our method to analyze real data sets from the UK Biobank.
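Schematically, and with notation introduced here only for illustration, the model class can be written as a pair of simultaneous equations with candidate instruments $Z$ that may have direct (invalid) effects:
\[
Y_1 = \beta_{12}\, Y_2 + Z^\top \pi_1 + u_1, \qquad
Y_2 = \beta_{21}\, Y_1 + Z^\top \pi_2 + u_2,
\]
where the errors $u_1, u_2$ are correlated through unmeasured confounders and nonzero entries of $\pi_1, \pi_2$ encode invalid instruments; identification then asks under what conditions $(\beta_{12}, \beta_{21})$, or at least the active causal direction, can be recovered from observational data.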
The cumulative residual extropy (CRJ) is a measure of uncertainty that serves as an alternative to extropy; it replaces the probability density function with the survival function in the expression of extropy. This work introduces a new concept called normalized dynamic survival extropy (NDSE), a dynamic variation of CRJ. We observe that NDSE is equivalent to the CRJ of the random variable of interest $X_{[t]}$ in the age replacement model at a fixed time $t$. We also show that NDSE is constant over time if and only if the distribution is exponential. Based on increasing and decreasing NDSE, we define two classes, INDSE and DNDSE. Next, we present a non-parametric test of exponentiality against INDSE and derive the exact and asymptotic distributions of the test statistic $\widehat{\Delta}^*$. A test accommodating right-censored data is also presented. Finally, we determine the critical values and power of our exact test through simulation. The simulation demonstrates that the suggested test is easy to compute and has significant statistical power, even with small sample sizes. A power comparison with competing tests shows better power for the proposed test against the alternatives considered in this paper. Numerical real-life examples validating the test are also included.
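For orientation, the baseline quantities are the CRJ of a nonnegative random variable $X$ with survival function $\bar F$ and its dynamic version conditioning on survival past $t$:
\[
\mathcal{J}(X) = -\frac{1}{2}\int_0^\infty \bar F^2(x)\,dx, \qquad
\mathcal{J}(X;t) = -\frac{1}{2\,\bar F^2(t)}\int_t^\infty \bar F^2(x)\,dx.
\]
One natural normalization, shown here purely for illustration (the paper's exact definition of NDSE may differ), divides $\mathcal{J}(X;t)$ by the mean residual life $m(t) = \mathbb{E}[X - t \mid X > t]$, yielding a scale-free quantity; for $X \sim \mathrm{Exp}(\lambda)$ this ratio equals $-1/4$ for all $t$, consistent with the characterization of the exponential law by constancy of NDSE.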
The class of subweibull distributions has recently been shown to generalize the important properties of subexponential and subgaussian random variables. We describe alternative characterizations of subweibull distributions and detail the conditions under which their tail behavior is preserved after exponential tilting.
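For reference, the standard equivalent characterizations of a subweibull random variable with tail parameter $\theta > 0$ are, up to universal constants,
\[
\mathbb{P}(|X| \ge t) \le 2\exp\!\big(-(t/K_1)^{1/\theta}\big) \ \forall t \ge 0
\;\Longleftrightarrow\;
\|X\|_p \le K_2\, p^{\theta} \ \forall p \ge 1
\;\Longleftrightarrow\;
\mathbb{E}\exp\!\big((|X|/K_3)^{1/\theta}\big) \le 2,
\]
with $\theta = 1/2$ and $\theta = 1$ recovering the subgaussian and subexponential cases, respectively.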
The pressure-correction method is a well-established approach for simulating unsteady, incompressible fluids. It is well known that implicit discretization of the time derivative in the momentum equation, e.g., using a backward differentiation formula, combined with explicit handling of the nonlinear term, results in a conditionally stable method. In certain scenarios, employing explicit time integration in the momentum equation can be advantageous, as it avoids the need to solve a linear system involving each differential operator. Additionally, we will demonstrate that the fully discrete method can be expressed in the form of simple matrix-vector multiplications, allowing for efficient implementation on modern, highly parallel acceleration hardware. Although this practice is common in various commercial codes, no error analysis for this scenario is currently available in the literature. In this work, we conduct a theoretical analysis of both the implicit variant and two explicit variants of the pressure-correction method in a fully discrete setting. We demonstrate to what extent the presented implicit and explicit methods exhibit conditional stability. Furthermore, we establish a Courant-Friedrichs-Lewy (CFL) type condition for the explicit scheme and show that the explicit variants exhibit the same asymptotic behavior as the implicit variant when the CFL condition is satisfied.
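As a concrete reference point, a first-order incremental pressure-correction step with fully explicit momentum treatment reads (a generic sketch, not necessarily the exact variants analyzed here):
\[
\frac{\tilde u^{n+1} - u^n}{\Delta t} = -(u^n \cdot \nabla)\, u^n + \nu\, \Delta u^n - \nabla p^n, \qquad
\Delta \phi = \frac{1}{\Delta t}\, \nabla \cdot \tilde u^{n+1}, \qquad
u^{n+1} = \tilde u^{n+1} - \Delta t\, \nabla \phi, \quad p^{n+1} = p^n + \phi.
\]
Only the pressure Poisson problem requires a linear solve; the momentum update reduces to matrix-vector products, at the price of a CFL-type restriction coupling $\Delta t$ to the mesh size.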
We develop a nonparametric test for deciding whether volatility of an asset follows a standard semimartingale process, with paths of finite quadratic variation, or a rough process with paths of infinite quadratic variation. The test utilizes the fact that volatility is rough if and only if volatility increments are negatively autocorrelated at high frequencies. It is based on the sample autocovariance of increments of spot volatility estimates computed from high-frequency asset return data. By establishing a feasible central limit theorem (CLT) for this statistic under the null hypothesis of semimartingale volatility paths, we construct a test with fixed asymptotic size and an asymptotic power equal to one. The test is derived under very general conditions on the data-generating process. In particular, it is robust to jumps with arbitrary activity and to the presence of market microstructure noise. In an application of the test to SPY high-frequency data, we find evidence for rough volatility.
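A minimal sketch of the statistic follows; it is illustrative only, since the window choices, debiasing, and jump/noise robustifications used in the paper are more involved.

```python
import numpy as np

def rough_vol_statistic(returns, k):
    """Sample autocovariance of spot-volatility increments.

    returns : 1-D array of high-frequency log returns
    k       : local window length for spot variance estimation
    A markedly negative value is the signature of rough volatility.
    """
    returns = np.asarray(returns)
    # Local realized variance over non-overlapping windows as a spot variance proxy
    n = len(returns) // k
    spot_var = np.array([np.sum(returns[i*k:(i+1)*k]**2) for i in range(n)]) / k
    # Increments of the spot variance estimates
    dv = np.diff(spot_var)
    # First-order sample autocovariance of the increments (increments have ~zero mean)
    return np.mean(dv[1:] * dv[:-1])
```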
We investigate the strong convergence properties of a proximal-gradient inertial algorithm with two Tikhonov regularization terms in connection with the problem of minimizing the sum of a convex lower semicontinuous function $f$ and a smooth convex function $g$. For an appropriate setting of the parameters we establish strong convergence of the generated sequence $(x_k)$ to the minimum-norm minimizer of the objective function $f+g$. Further, we obtain fast convergence to zero of the objective function values along the generated sequence, as well as of the discrete velocity and a subgradient of the objective function. We also show that for other settings of the parameters the optimal rate of order $\mathcal{O}(k^{-2})$ for the potential energy $(f+g)(x_k)-\min(f+g)$ can be obtained.
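A generic single-Tikhonov template for schemes of this type (the paper's exact coupling of the two regularization terms and parameters differs) is, with step size $\lambda > 0$ and vanishing regularization parameters $\epsilon_k \downarrow 0$,
\[
y_k = x_k + \alpha_k\,(x_k - x_{k-1}), \qquad
x_{k+1} = \operatorname{prox}_{\lambda f}\!\big( y_k - \lambda\,(\nabla g(y_k) + \epsilon_k\, y_k) \big),
\]
where the Tikhonov term $\epsilon_k\, y_k$ is what steers the iterates toward the minimum-norm minimizer rather than an arbitrary one, while the inertial extrapolation $\alpha_k\,(x_k - x_{k-1})$ drives the fast rates.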
We introduce a new observational setting for Positive Unlabeled (PU) data in which the observations at prediction time are also labeled. This occurs commonly in practice -- we argue that the additional information is important for prediction, and call this task "augmented PU prediction". We allow the labeling to be feature dependent. In this scenario, we establish the Bayes classifier and its risk and compare it with the risk of a classifier that, for unlabeled data, is based only on the predictors. We introduce several variants of the empirical Bayes rule for this scenario and investigate their performance. We emphasize the dangers (and the ease) of applying the classical classification rule in the augmented PU scenario -- because no prior studies exist, an unaware researcher is prone to obtaining skewed predictions. We conclude that the variant based on a recently proposed variational autoencoder designed for the PU scenario performs on par with or better than the other considered variants and yields an advantage over feature-only methods in terms of accuracy for unlabeled samples.
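Under the standard propensity formulation of feature-dependent labeling, with $e(x) = \mathbb{P}(S=1 \mid Y=1, X=x)$ and only positives eligible for labeling, the corrected posterior for an unlabeled test point follows from the identity
\[
\mathbb{P}(Y=1 \mid X=x, S=0) \;=\; \frac{\big(1-e(x)\big)\,\eta(x)}{1 - e(x)\,\eta(x)}, \qquad \eta(x) = \mathbb{P}(Y=1 \mid X=x),
\]
so a labeled test observation is classified as positive outright, while an unlabeled one is classified by thresholding this corrected posterior. This is the generic identity rather than the paper's exact rule, but it shows why ignoring the labeling information at prediction time skews the predictions.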
We prove an abstract convergence result for a family of dual-mesh based quadrature rules on tensor products of simplicial meshes. In the context of the multilinear tensor-product finite element discretization of reaction-drift-diffusion equations, our quadrature rule generalizes the mass-lumping rule, retaining its most useful properties: for a nonnegative reaction coefficient, it gives an $O(h^2)$-accurate, nonnegative diagonalization of the reaction operator. The major advantage of our scheme over standard mass lumping is that, under mild conditions, it produces an $O(h^2)$ consistency error even when the integrand has a jump discontinuity. The finite-volume-type quadrature rule has previously been stated in a less general form and applied to systems of reaction-diffusion equations arising in particle-based stochastic reaction-diffusion (PBSRD) simulations; in this context, the reaction operator is \textit{required} to be an $M$-matrix, and a standard model for bimolecular reactions has a discontinuous reaction coefficient. We apply our convergence results to a finite element discretization of a scalar drift-diffusion-reaction model problem related to PBSRD systems, and provide new numerical convergence studies confirming the theory.
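Schematically (with notation introduced here for illustration), the rule replaces the consistent reaction matrix for a coefficient $k$ and nodal basis functions $\phi_i$ by a diagonal one obtained by integrating $k$ over the dual cell $V_i$ attached to each vertex:
\[
\int_\Omega k(x)\, \phi_i(x)\, \phi_j(x)\, dx \;\approx\; \delta_{ij} \int_{V_i} k(x)\, dx,
\]
which is nonnegative whenever $k \ge 0$ and, unlike nodal quadrature based on the point values $k(x_i)$, does not lose accuracy when $k$ jumps across the dual cell.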
The discretization of fluid-poromechanics systems is typically highly demanding in terms of computational effort. This is particularly true for models of multiphysics flows in the brain, due to the geometrical complexity of the cerebral anatomy, which requires a very fine computational mesh for finite element discretization, and to the high number of variables involved. Indeed, this kind of problem can be modeled by a coupled system encompassing the Stokes equations for the cerebrospinal fluid in the brain ventricles and Multiple-network Poro-Elasticity (MPE) equations describing the brain tissue, the interstitial fluid, and the blood vascular networks at different spatial scales. The present work aims to rigorously derive a posteriori error estimates for the coupled Stokes-MPE problem, as a first step towards the design of adaptive refinement strategies or reduced order models to decrease the computational demand of the problem. Through numerical experiments, we verify the reliability and optimal efficiency of the proposed a posteriori estimator and identify the role of the different solution variables in its composition.
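The estimator has the usual residual-based structure, shown here only generically (the actual terms combine Stokes, MPE, and interface residuals for the full set of solution variables):
\[
\eta^2 = \sum_{K \in \mathcal{T}_h} \Big( h_K^2\, \|R_K(u_h, p_h)\|_{0,K}^2 + \sum_{F \subset \partial K} h_F\, \|J_F(u_h, p_h)\|_{0,F}^2 \Big),
\]
with element residuals $R_K$ and face jumps $J_F$; reliability and efficiency mean that $\eta$ bounds the true error from above and below up to constants, which is what licenses its use as a local refinement indicator.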
The rapid pace of development in quantum computing technology has sparked a proliferation of benchmarks for assessing the performance of quantum computing hardware and software. Good benchmarks empower scientists, engineers, programmers, and users to understand a computing system's power, but bad benchmarks can misdirect research and inhibit progress. In this Perspective, we survey the science of quantum computer benchmarking. We discuss the role of benchmarks and benchmarking, and how good benchmarks can drive and measure progress towards the long-term goal of useful quantum computations, i.e., "quantum utility". We explain how different kinds of benchmarks quantify the performance of different parts of a quantum computer, survey existing benchmarks, critically discuss recent trends in benchmarking, and highlight important open research questions in this field.