In this paper, we propose a novel approach to testing the equality of high-dimensional mean vectors across several populations via a weighted $L_2$-norm. We establish the asymptotic normality of the test statistics under the null hypothesis. We also explain theoretically why our test statistics can be highly useful in weakly dense cases, in which a nonzero signal is present in the mean vectors. Furthermore, we compare the proposed test with existing tests using simulation results, demonstrating that the weighted $L_2$-norm-based test statistic exhibits favorable properties in terms of both size and power.
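As a rough illustration of the weighted $L_2$-norm idea, the sketch below forms a weighted sum of squared differences between two group sample means and calibrates it by permutation. The weights and the permutation calibration are hypothetical choices for illustration only; the statistic, centering, and asymptotic calibration proposed in the paper are not reproduced here.

```python
# Minimal sketch of a weighted L2-norm comparison of two sample mean vectors.
# Hypothetical: the weights and calibration below are illustrative, not the
# statistic studied in the paper.
import numpy as np

def weighted_l2_stat(X, Y, w):
    """Weighted squared distance between the sample means of X and Y."""
    diff = X.mean(axis=0) - Y.mean(axis=0)   # p-dimensional mean difference
    return float(np.sum(w * diff ** 2))      # sum_j w_j * diff_j^2

def permutation_pvalue(X, Y, w, n_perm=2000, seed=None):
    """Calibrate the statistic by permuting group labels (illustrative only)."""
    rng = np.random.default_rng(seed)
    obs = weighted_l2_stat(X, Y, w)
    Z, n1 = np.vstack([X, Y]), X.shape[0]
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(Z.shape[0])
        if weighted_l2_stat(Z[idx[:n1]], Z[idx[n1:]], w) >= obs:
            count += 1
    return (count + 1) / (n_perm + 1)

# Example: inverse-variance weights, two groups with p = 200 coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))
Y = rng.normal(size=(60, 200))
w = 1.0 / np.vstack([X, Y]).var(axis=0)
print(permutation_pvalue(X, Y, w, seed=1))
```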
Our knowledge of the organisation of the human brain at the population level is yet to translate into power to predict functional differences at the individual level, limiting clinical applications and casting doubt on the generalisability of inferred mechanisms. It remains unknown whether the difficulty arises from the absence of individuating biological patterns within the brain, or from limited power to access them with the models and compute at our disposal. Here we comprehensively investigate the resolvability of such patterns with data and compute at unprecedented scale. Across 23 810 unique participants from UK Biobank, we systematically evaluate the predictability of 25 individual biological characteristics from all available combinations of structural and functional neuroimaging data. Over 4526 GPU hours of computation, we train, optimize, and evaluate out-of-sample 700 individual predictive models, including fully-connected feed-forward neural networks of demographic, psychological, serological, chronic disease, and functional connectivity characteristics, and both uni- and multi-modal 3D convolutional neural network models of macro- and micro-structural brain imaging. We find a marked discrepancy between the high predictability of sex (balanced accuracy 99.7%), age (mean absolute error 2.048 years, $R^2$ 0.859), and weight (mean absolute error 2.609 kg, $R^2$ 0.625), for which we set new state-of-the-art performance, and the surprisingly low predictability of other characteristics. Neither structural nor functional imaging predicted psychology better than the coincidence of chronic disease (p<0.05). Serology predicted chronic disease (p<0.05) and was best predicted by it (p<0.001), followed by structural neuroimaging (p<0.05). Our findings suggest either more informative imaging or more powerful models are needed to decipher individual-level characteristics from the human brain.
In this paper, we consider a new nonlocal approximation to the linear Stokes system with periodic boundary conditions in two- and three-dimensional spaces. A relaxation term is added to the nonlocal divergence-free equation, reminiscent of the relaxation of the local Stokes equations with a small artificial compressibility. Our analysis shows that the well-posedness of the nonlocal system can be established under mild assumptions on the kernel of the nonlocal interactions. Furthermore, the new nonlocal system converges to the conventional, local Stokes system at second order as the horizon parameter of the nonlocal interaction goes to zero. The study provides further theoretical understanding of numerical methods, such as smoothed particle hydrodynamics, for simulating incompressible viscous flows.
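For orientation, the local construction that the relaxation mimics is the steady Stokes system with a small artificial compressibility parameter $\varepsilon > 0$; the display below records this standard local relaxation (our paraphrase of the classical construction, not the nonlocal system studied in the paper):
\[
-\mu \Delta u + \nabla p = f, \qquad \nabla \cdot u + \varepsilon p = 0,
\]
with the incompressibility constraint $\nabla \cdot u = 0$ recovered in the limit $\varepsilon \to 0$.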
In this paper, we develop a novel fast iterative moment method for the steady-state simulation of near-continuum flows, which are modeled by the high-order moment system derived from the Boltzmann-BGK equation. The fast convergence of the present method is achieved mainly by alternately solving the moment system and the hydrodynamic equations with compatible constitutive relations and boundary conditions. Specifically, the compatible hydrodynamic equations are solved in each iteration to obtain improved predictions of the macroscopic quantities, which are subsequently used to expedite the evolution of the moment system. Additionally, a semi-implicit scheme treating the collision term implicitly is introduced for the moment system. With a cell-by-cell sweeping strategy, the resulting alternating iteration can be further accelerated for steady-state computation. It is also worth mentioning that the alternating iteration works well with the nonlinear multigrid method. Numerical experiments on planar Couette flow, shock structure, and lid-driven cavity flow are carried out to investigate the performance of the proposed fast iterative moment method, and all results demonstrate its efficiency and robustness.
In this paper, we investigate hypothesis testing for linear combinations of mean vectors across multiple populations through the method of random integration. We establish the asymptotic distributions of the test statistics under both the null and alternative hypotheses. Additionally, we provide a theoretical explanation for why our test statistics are particularly useful when the nonzero signal in the linear combination of the true mean vectors is weakly dense. Moreover, Monte Carlo simulations are presented to compare the proposed test with existing high-dimensional tests. The findings from these simulations reveal that our test not only matches the performance of other tests in terms of size but also exhibits superior power.
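One plausible reading of "random integration" is as a Monte Carlo integral of the squared projection of the estimated linear combination of mean vectors over random directions; the sketch below follows that reading and is illustrative only, not the statistic, projection law, or normalization studied in the paper.

```python
# Illustrative Monte Carlo "random integration" of a projected mean contrast.
# Hypothetical sketch: the projection distribution, centering, and scaling used
# in the paper are not reproduced here.
import numpy as np

def random_integration_stat(samples, coeffs, n_dirs=500, seed=None):
    """Average squared projection of sum_k c_k * mean(X_k) over random unit directions."""
    rng = np.random.default_rng(seed)
    p = samples[0].shape[1]
    contrast = sum(c * X.mean(axis=0) for c, X in zip(coeffs, samples))
    dirs = rng.normal(size=(n_dirs, p))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # unit-norm directions
    return float(np.mean((dirs @ contrast) ** 2))

# Example: three populations, contrast c = (1, -2, 1), dimension p = 300.
rng = np.random.default_rng(0)
samples = [rng.normal(size=(40, 300)) for _ in range(3)]
print(random_integration_stat(samples, coeffs=[1.0, -2.0, 1.0], seed=1))
```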
In this paper, without requiring any constraint qualifications, we establish tight error bounds for the log-determinant cone, which is the closure of the hypograph of the perspective function of the log-determinant function. These error bounds are obtained using the recently developed framework based on one-step facial residual functions.
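For concreteness, the cone in question admits the standard description as the closure of the hypograph of the perspective of $\log\det$ over the positive definite matrices; the display below is this common description (our paraphrase for the reader's reference, not a statement taken from the paper):
\[
\mathcal{K}_{\log\det} = \operatorname{cl}\left\{ (t, s, X) \in \mathbb{R} \times \mathbb{R}_{++} \times \mathbb{S}^{n}_{++} \,:\, t \le s \log\det\!\left(X/s\right) \right\}.
\]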
In decision-making, maxitive functions are used for worst-case and best-case evaluations. Maxitivity gives rise to a rich structure that is well-studied in the context of the pointwise order. In this article, we investigate maxitivity with respect to general preorders and provide a representation theorem for such functionals. The results are illustrated for different stochastic orders in the literature, including the usual stochastic order, the increasing convex/concave order, and the dispersive order.
Charts, figures, and text derived from data play an important role in decision making, from data-driven policy development to day-to-day choices informed by online articles. Making sense of, or fact-checking, such outputs means understanding how they relate to the underlying data. Even for domain experts with access to the source code and data sets, this poses a significant challenge. In this paper, we introduce a new program analysis framework which supports interactive exploration of fine-grained I/O relationships directly through computed outputs, making use of dynamic dependence graphs. Our main contribution is a novel notion in data provenance which we call related inputs, a relation of mutual relevance or "cognacy" which arises between inputs when they contribute to common features of the output. Queries of this form allow readers to ask questions like "What outputs use this data element, and what other data elements are used along with it?". We show how Jónsson and Tarski's concept of conjugate operators on Boolean algebras appropriately characterises the notion of cognacy in a dependence graph, and give a procedure for computing related inputs over such a graph.
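To make the "related inputs" query concrete, here is a minimal sketch over a dependence graph represented as a map from inputs to the output features they affect: two inputs are cognate when they contribute to at least one common output feature. The encoding and function names are ours for illustration; the paper's framework, which characterises cognacy via conjugate operators on dynamic dependence graphs, is not reproduced here.

```python
# Minimal sketch of "related inputs" over a dependence graph.
# Hypothetical encoding: edges map each input to the set of output features it
# contributes to.

def related_inputs(edges, query_inputs):
    """Inputs sharing at least one dependent output feature with the query set."""
    # Output features reachable from the query inputs.
    hit_outputs = set()
    for i in query_inputs:
        hit_outputs |= edges.get(i, set())
    # Any input contributing to one of those outputs is "related" (cognate).
    related = {i for i, outs in edges.items() if outs & hit_outputs}
    return related - set(query_inputs), hit_outputs

# Example: three data cells feeding two chart features.
edges = {
    "cell_A": {"bar_1"},
    "cell_B": {"bar_1", "bar_2"},
    "cell_C": {"bar_2"},
}
print(related_inputs(edges, {"cell_A"}))  # ({'cell_B'}, {'bar_1'})
```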
Generative models for multimodal data permit the identification of latent factors that may be associated with important determinants of observed data heterogeneity. Common or shared factors could be important for explaining variation across modalities, whereas other factors may be private and important only for the explanation of a single modality. Multimodal Variational Autoencoders, such as MVAE and MMVAE, are a natural choice for inferring those underlying latent factors and separating shared variation from private variation. In this work, we investigate their capability to reliably perform this disentanglement. In particular, we highlight a challenging problem setting where modality-specific variation dominates the shared signal. Taking a cross-modal prediction perspective, we demonstrate limitations of existing models and propose a modification that makes them more robust to modality-specific variation. Our findings are supported by experiments on synthetic as well as various real-world multi-omics data sets.
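As background for the shared-versus-private question, MVAE-style models typically fuse per-modality Gaussian posteriors over the shared latent with a product of experts. The snippet below shows only that precision-weighted fusion step, a small self-contained fact about products of diagonal Gaussians; the encoders, decoders, training objectives, and the modification proposed in the paper are not reproduced here.

```python
# Product-of-experts fusion of diagonal Gaussian posteriors over a shared
# latent, as used in MVAE-style models. Fusion step only; illustrative sketch.
import numpy as np

def product_of_gaussian_experts(mus, logvars):
    """Precision-weighted fusion of Gaussian experts, including a N(0, I) prior expert."""
    mus = np.vstack([np.zeros_like(mus[0])] + list(mus))            # prior mean 0
    logvars = np.vstack([np.zeros_like(logvars[0])] + list(logvars))  # prior var 1
    precisions = np.exp(-logvars)
    var = 1.0 / precisions.sum(axis=0)
    mu = var * (precisions * mus).sum(axis=0)
    return mu, np.log(var)

# Example: fuse two modality posteriors over a 4-dimensional shared latent.
mu, logvar = product_of_gaussian_experts(
    [np.array([1.0, 0.0, -1.0, 2.0]), np.array([0.5, 0.5, -0.5, 1.5])],
    [np.array([0.0, 0.0, 0.0, 0.0]), np.array([-1.0, 0.0, 1.0, 0.0])],
)
print(mu, logvar)
```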
In this paper, we propose a new modified likelihood ratio test (LRT) for simultaneously testing the mean vectors and covariance matrices of two populations in high-dimensional settings. By employing tools from Random Matrix Theory (RMT), we derive the limiting null distribution of the modified LRT for generally distributed populations. Furthermore, we compare the proposed test with existing tests using simulation results, demonstrating that the modified LRT exhibits favorable properties in terms of both size and power.
Off-policy evaluation (OPE) is the problem of estimating the value of a target policy using historical data collected under a different logging policy. OPE methods typically assume overlap between the target and logging policies, enabling solutions based on importance weighting and/or imputation. In this work, we approach OPE without assuming either overlap or a well-specified model by considering a strategy based on partial identification under non-parametric assumptions on the conditional mean function, focusing especially on Lipschitz smoothness. Under such smoothness assumptions, we formulate a pair of linear programs whose optimal values upper- and lower-bound the contributions of the no-overlap region to the off-policy value. We show that these linear programs have a concise closed-form solution that can be computed efficiently and that their solutions converge, under the Lipschitz assumption, to the sharp partial identification bounds on the off-policy value. Furthermore, we show that the rate of convergence is minimax optimal, up to log factors. We deploy our methods on two semi-synthetic examples and obtain informative and valid bounds that are tighter than those possible without smoothness assumptions.
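For intuition on how Lipschitz smoothness yields bounds in the no-overlap region, the following sketch computes pointwise Lipschitz envelopes (McShane-Whitney type) of an unknown function observed only on the overlap region. It is a simplified, noise-free illustration under our own assumptions, not the paper's linear programs, their closed-form solution, or the sharp bounds on the off-policy value.

```python
# Pointwise Lipschitz envelopes for an unknown L-Lipschitz function f observed
# at points x_obs with values y_obs: for any query x,
#   max_i (y_i - L * d(x, x_i))  <=  f(x)  <=  min_i (y_i + L * d(x, x_i)).
# Simplified illustration only.
import numpy as np

def lipschitz_envelope(x_obs, y_obs, x_query, L):
    """Lower/upper bounds on f at each query point given f is L-Lipschitz."""
    d = np.linalg.norm(x_query[:, None, :] - x_obs[None, :, :], axis=-1)
    lower = np.max(y_obs[None, :] - L * d, axis=1)
    upper = np.min(y_obs[None, :] + L * d, axis=1)
    return lower, upper

# Example: observations on [0, 1], queries in a "no-overlap" region beyond 1.
x_obs = np.linspace(0.0, 1.0, 20)[:, None]
y_obs = np.sin(2 * np.pi * x_obs[:, 0])     # true f has Lipschitz constant 2*pi
x_query = np.array([[1.2], [1.5], [1.8]])
lo, hi = lipschitz_envelope(x_obs, y_obs, x_query, L=2 * np.pi)
print(np.c_[lo, hi])                         # bounds widen with distance from the data
```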