亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Principal stratification is a popular framework for causal inference in the presence of an intermediate outcome. While the principal average treatment effects have traditionally been the default target of inference, it may not be sufficient when the interest lies in the relative favorability of one potential outcome over the other within the principal stratum. We thus introduce the principal generalized causal effect estimands, which extend the principal average causal effects to accommodate nonlinear contrast functions. Under principal ignorability, we expand the theoretical results in Jiang et. al. (2022) to a much wider class of causal estimands in the presence of a binary intermediate variable. We develop identification formulas and derive the efficient influence functions of the generalized estimands for principal stratification analyses. These efficient influence functions motivate a set of multiply robust estimators and lay the ground for obtaining efficient debiased machine learning estimators via cross-fitting based on $U$-statistics. The proposed methods are illustrated through simulations and the analysis of a data example.

相關內容

The e-BH procedure is an e-value-based multiple testing procedure that provably controls the false discovery rate (FDR) under any dependence structure between the e-values. Despite this appealing theoretical FDR control guarantee, the e-BH procedure often suffers from low power in practice. In this paper, we propose a general framework that boosts the power of e-BH without sacrificing its FDR control under arbitrary dependence. This is achieved by the technique of conditional calibration, where we take as input the e-values and calibrate them to be a set of "boosted e-values" that are guaranteed to be no less -- and are often more -- powerful than the original ones. Our general framework is explicitly instantiated in three classes of multiple testing problems: (1) testing under parametric models, (2) conditional independence testing under the model-X setting, and (3) model-free conformalized selection. Extensive numerical experiments show that our proposed method significantly improves the power of e-BH while continuing to control the FDR. We also demonstrate the effectiveness of our method through an application to an observational study dataset for identifying individuals whose counterfactuals satisfy certain properties.

We investigate the bounding problem of causal effects in experimental studies in which the outcome is truncated by death, meaning that the subject dies before the outcome can be measured. Causal effects cannot be point identified without instruments and/or tight parametric assumptions but can be bounded under mild restrictions. Previous work on partial identification under the principal stratification framework has primarily focused on the `always-survivor' subpopulation. In this paper, we present a novel nonparametric unified framework to provide sharp bounds on causal effects on discrete and continuous square-integrable outcomes. These bounds are derived on the `always-survivor', `protected', and `harmed' subpopulations and on the entire population with/without assumptions of monotonicity and stochastic dominance. The main idea depends on rewriting the optimization problem in terms of the integrated tail probability expectation formula using a set of conditional probability distributions. The proposed procedure allows for settings with any type and number of covariates, and can be extended to incorporate average causal effects and complier average causal effects. Furthermore, we present several simulation studies conducted under various assumptions as well as the application of the proposed approach to a real dataset from the National Supported Work Demonstration.

Our research delves into the balance between maintaining privacy and preserving statistical accuracy when dealing with multivariate data that is subject to \textit{componentwise local differential privacy} (CLDP). With CLDP, each component of the private data is made public through a separate privacy channel. This allows for varying levels of privacy protection for different components or for the privatization of each component by different entities, each with their own distinct privacy policies. We develop general techniques for establishing minimax bounds that shed light on the statistical cost of privacy in this context, as a function of the privacy levels $\alpha_1, ... , \alpha_d$ of the $d$ components. We demonstrate the versatility and efficiency of these techniques by presenting various statistical applications. Specifically, we examine nonparametric density and covariance estimation under CLDP, providing upper and lower bounds that match up to constant factors, as well as an associated data-driven adaptive procedure. Furthermore, we quantify the probability of extracting sensitive information from one component by exploiting the fact that, on another component which may be correlated with the first, a smaller degree of privacy protection is guaranteed.

We study the problem of bivariate discrete or continuous probability density estimation under low-rank constraints.For discrete distributions, we assume that the two-dimensional array to estimate is a low-rank probability matrix.In the continuous case, we assume that the density with respect to the Lebesgue measure satisfies a generalized multi-view model, meaning that it is $\beta$-H{\"o}lder and can be decomposed as a sum of $K$ components, each of which is a product of one-dimensional functions.In both settings, we propose estimators that achieve, up to logarithmic factors, the minimax optimal convergence rates under such low-rank constraints.In the discrete case, the proposed estimator is adaptive to the rank $K$. In the continuous case, our estimator converges with the $L_1$ rate $\min((K/n)^{\beta/(2\beta+1)}, n^{-\beta/(2\beta+2)})$ up to logarithmic factors, and it is adaptive to the unknown support as well as to the smoothness $\beta$ and to the unknown number of separable components $K$. We present efficient algorithms for computing our estimators.

Benchmarks have emerged as the central approach for evaluating Large Language Models (LLMs). The research community often relies on a model's average performance across the test prompts of a benchmark to evaluate the model's performance. This is consistent with the assumption that the test prompts within a benchmark represent a random sample from a real-world distribution of interest. We note that this is generally not the case; instead, we hold that the distribution of interest varies according to the specific use case. We find that (1) the correlation in model performance across test prompts is non-random, (2) accounting for correlations across test prompts can change model rankings on major benchmarks, (3) explanatory factors for these correlations include semantic similarity and common LLM failure points.

Compositional data arise in many areas of research in the natural and biomedical sciences. One prominent example is in the study of the human gut microbiome, where one can measure the relative abundance of many distinct microorganisms in a subject's gut. Often, practitioners are interested in learning how the dependencies between microbes vary across distinct populations or experimental conditions. In statistical terms, the goal is to estimate a covariance matrix for the (latent) log-abundances of the microbes in each of the populations. However, the compositional nature of the data prevents the use of standard estimators for these covariance matrices. In this article, we propose an estimator of multiple covariance matrices which allows for information sharing across distinct populations of samples. Compared to some existing estimators, which estimate the covariance matrices of interest indirectly, our estimator is direct, ensures positive definiteness, and is the solution to a convex optimization problem. We compute our estimator using a proximal-proximal gradient descent algorithm. Asymptotic properties of our estimator reveal that it can perform well in high-dimensional settings. Through simulation studies, we demonstrate that our estimator can outperform existing estimators. We show that our method provides more reliable estimates than competitors in an analysis of microbiome data from subjects with chronic fatigue syndrome.

We describe two families of statistical tests to detect partial correlation in vectorial timeseries. The tests measure whether an observed timeseries Y can be predicted from a second series X, even after accounting for a third series Z which may correlate with X. They do not make any assumptions on the nature of these timeseries, such as stationarity or linearity, but they do require that multiple statistically independent recordings of the 3 series are available. Intuitively, the tests work by asking if the series Y recorded on one experiment can be better predicted from X recorded on the same experiment than on a different experiment, after accounting for the prediction from Z recorded on both experiments.

The implication problem for conditional independence (CI) asks whether the fact that a probability distribution obeys a given finite set of CI relations implies that a further CI statement also holds in this distribution. This problem has a long and fascinating history, cumulating in positive results about implications now known as the semigraphoid axioms as well as impossibility results about a general finite characterization of CI implications. Motivated by violation of faithfulness assumptions in causal discovery, we study the implication problem in the special setting where the CI relations are obtained from a directed acyclic graphical (DAG) model along with one additional CI statement. Focusing on the Gaussian case, we give a complete characterization of when such an implication is graphical by using algebraic techniques. Moreover, prompted by the relevance of strong faithfulness in statistical guarantees for causal discovery algorithms, we give a graphical solution for an approximate CI implication problem, in which we ask whether small values of one additional partial correlation entail small values for yet a further partial correlation.

Replication studies are increasingly conducted to assess the credibility of scientific findings. Most of these replication attempts target studies with a superiority design, but there is a lack of methodology regarding the analysis of replication studies with alternative types of designs, such as equivalence. In order to fill this gap, we propose two approaches, the two-trials rule and the sceptical TOST procedure, adapted from methods used in superiority settings. Both methods have the same overall Type-I error rate, but the sceptical TOST procedure allows replication success even for non-significant original or replication studies. This leads to a larger project power and other differences in relevant operating characteristics. Both methods can be used for sample size calculation of the replication study, based on the results from the original one. The two methods are applied to data from the Reproducibility Project: Cancer Biology.

In contemporary problems involving genetic or neuroimaging data, thousands of hypotheses need to be tested. Due to their high power, and finite sample guarantees on type-1 error under weak assumptions, Monte-Carlo permutation tests are often considered as gold standard for these settings. However, the enormous computational effort required for (thousands of) permutation tests is a major burden. Recently, Fischer and Ramdas (2024) constructed a permutation test for a single hypothesis in which the permutations are drawn sequentially one-by-one and the testing process can be stopped at any point without inflating the type I error. They showed that the number of permutations can be substantially reduced (under null and alternative) while the power remains similar. We show how their approach can be modified to make it suitable for a broad class of multiple testing procedures. In particular, we discuss its use with the Benjamini-Hochberg procedure and illustrate the application on a large dataset.

北京阿比特科技有限公司