国产一国产一级毛片A久久久,亚洲国产精品成人综合一区

The effects of a treatment may differ between patients with different characteristics. Addressing such treatment heterogeneity is crucial to identify which patients benefit from a treatment, but can be complex in the context of multiple correlated binary outcomes. The current paper presents a novel Bayesian method for estimation and inference for heterogeneous treatment effects in a multivariate binary setting. The framework is suitable for prediction of heterogeneous treatment effects and superiority/inferiority decision-making within subpopulations, while taking advantage of the size of the entire study sample. We introduce a decision-making framework based on Bayesian multivariate logistic regression analysis with a P\'olya-Gamma expansion. The obtained regression coefficients are transformed into differences between success probabilities of treatments to allow for treatment comparison in terms of point estimation and superiority and/or inferiority decisions for different (sub)populations. Procedures for a priori sample size estimation under a non-informative prior distribution are included in the framework. A numerical evaluation demonstrated that a) average and conditional treatment effect parameters could be estimated unbiasedly when the sample is large enough; b) decisions based on a priori sample size estimation resulted in anticipated error rates. Application to the International Stroke Trial dataset revealed a heterogeneous treatment effect: The model showed conditional treatment effects in opposite directions for patients with different levels of blood pressure, while the average treatment effect among the trial population was close to zero.

相關內容

估(gu)(gu)計/估(gu)(gu)計量(liang)

關注 3

估計/估計量 · 推斷 · Networking · 欠估計 · INTERACT ·

2022 年 7 月 25 日

Causal Inference with Noncompliance and Unknown Interference

Tadao Hoshino,Takahide Yanagi

We consider a causal inference model in which individuals interact in a social network and they may not comply with the assigned treatments. Estimating causal parameters is challenging in the presence of network interference of unknown form, as each individual may be influenced by both close individuals and distant ones in complex ways. Noncompliance with treatment assignment further complicates this problem, and prior methods dealing with network spillovers but disregarding the noncompliance issue may underestimate the effect of the treatment receipt on the outcome. To estimate meaningful causal parameters, we introduce a new concept of exposure mapping, which summarizes potentially complicated spillover effects into a fixed dimensional statistic of instrumental variables. We investigate identification conditions for the intention-to-treat effect and the average causal effect for compliers, while explicitly considering the possibility of misspecification of exposure mapping. Based on our identification results, we develop nonparametric estimation procedures via inverse probability weighting. Their asymptotic properties, including consistency and asymptotic normality, are investigated using an approximate neighborhood interference framework, which is convenient for dealing with unknown forms of spillovers between individuals. For an empirical illustration, we apply our method to experimental data on the anti-conflict intervention school program.

簇 · Performer · AIM · 統計量 · Laplace分布 ·

2022 年 7 月 24 日

Clustering of bivariate satellite time series: a quantile approach

Victor Muthama Musau,Carlo Gaetan,Paolo Girardi

Clustering has received much attention in Statistics and Machine learning with the aim of developing statistical models and autonomous algorithms which are capable of acquiring information from raw data in order to perform exploratory analysis.Several techniques have been developed to cluster sampled univariate vectors only considering the average value over the whole period and as such they have not been able to explore fully the underlying distribution as well as other features of the data, especially in presence of structured time series. We propose a model-based clustering technique that is based on quantile regression permitting us to cluster bivariate time series at different quantile levels. We model the within cluster density using asymmetric Laplace distribution allowing us to take into account asymmetry in the distribution of the data. We evaluate the performance of the proposed technique through a simulation study. The method is then applied to cluster time series observed from Glob-colour satellite data related to trophic status indices with aim of evaluating their temporal dynamics in order to identify homogeneous areas, in terms of trophic status, in the Gulf of Gabes.

估計/估計量 · 無偏 · Extensibility · FAST · 無偏估計 ·

2022 年 7 月 22 日

Local search for efficient causal effect estimation

Debo Cheng,Jiuyong Li,Lin Liu,Jiji Zhang,Jixue Liu,Thuc Duy Le

from arxiv, 14 pages, 8 figures and 2 tables

Causal effect estimation from observational data is a challenging problem, especially with high dimensional data and in the presence of unobserved variables. The available data-driven methods for tackling the problem either provide an estimation of the bounds of a causal effect (i.e. nonunique estimation) or have low efficiency. The major hurdle for achieving high efficiency while trying to obtain unique and unbiased causal effect estimation is how to find a proper adjustment set for confounding control in a fast way, given the huge covariate space and considering unobserved variables. In this paper, we approach the problem as a local search task for finding valid adjustment sets in data. We establish the theorems to support the local search for adjustment sets, and we show that unique and unbiased estimation can be achieved from observational data even when there exist unobserved variables. We then propose a data-driven algorithm that is fast and consistent under mild assumptions. We also make use of a frequent pattern mining method to further speed up the search of minimal adjustment sets for causal effect estimation. Experiments conducted on extensive synthetic and real-world datasets demonstrate that the proposed algorithm outperforms the state-of-the-art criteria/estimators in both accuracy and time-efficiency.

Learning · 統計量 · Analysis · INFORMS · 假設檢驗 ·

2022 年 7 月 22 日

Statistical Hypothesis Testing Based on Machine Learning: Large Deviations Analysis

Paolo Braca,Leonardo M. Millefiori,Augusto Aubry,Stefano Marano,Antonio De Maio,Peter Willett

We study the performance -- and specifically the rate at which the error probability converges to zero -- of Machine Learning (ML) classification techniques. Leveraging the theory of large deviations, we provide the mathematical conditions for a ML classifier to exhibit error probabilities that vanish exponentially, say $\sim \exp\left(-n\,I + o(n) \right)$, where $n$ is the number of informative observations available for testing (or another relevant parameter, such as the size of the target in an image) and $I$ is the error rate. Such conditions depend on the Fenchel-Legendre transform of the cumulant-generating function of the Data-Driven Decision Function (D3F, i.e., what is thresholded before the final binary decision is made) learned in the training phase. As such, the D3F and, consequently, the related error rate $I$, depend on the given training set, which is assumed of finite size. Interestingly, these conditions can be verified and tested numerically exploiting the available dataset, or a synthetic dataset, generated according to the available information on the underlying statistical model. In other words, the classification error probability convergence to zero and its rate can be computed on a portion of the dataset available for training. Coherently with the large deviations theory, we can also establish the convergence, for $n$ large enough, of the normalized D3F statistic to a Gaussian distribution. This property is exploited to set a desired asymptotic false alarm probability, which empirically turns out to be accurate even for quite realistic values of $n$. Furthermore, approximate error probability curves $\sim \zeta_n \exp\left(-n\,I \right)$ are provided, thanks to the refined asymptotic derivation (often referred to as exact asymptotics), where $\zeta_n$ represents the most representative sub-exponential terms of the error probabilities.

最近鄰圖 · 近鄰 · 圖 · 近似 · Extensibility ·

2022 年 7 月 22 日

Graph-Based Tests for Multivariate Covariate Balance Under Multi-Valued Treatments

Eric A. Dunipace

We propose the use of non-parametric, graph-based tests to assess the distributional balance of covariates in observational studies with multi-valued treatments. Our tests utilize graph structures ranging from Hamiltonian paths that connect all of the data to nearest neighbor graphs that maximally separates data into pairs. We consider algorithms that form minimal distance graphs, such as optimal Hamiltonian paths or non-bipartite matching, or approximate alternatives, such as greedy Hamiltonian paths or greedy nearest neighbor graphs. Extensive simulation studies demonstrate that the proposed tests are able to detect the misspecification of matching models that other methods miss. Contrary to intuition, we also find that tests ran on well-formed approximate graphs do better in most cases than tests run on optimally formed graphs, and that a properly formed test on an approximate nearest neighbor graph performs best, on average. In a multi-valued treatment setting with breast cancer data, these graph-based tests can also detect imbalances otherwise missed by common matching diagnostics. We provide a new R package graphTest to implement these methods and reproduce our results.

協變量偏移 · 推斷 · 包裹式方法 · 可交換的 · 估計/估計量 ·

2022 年 7 月 21 日

JAWS: Predictive Inference Under Covariate Shift

Drew Prinster,Anqi Liu,Suchi Saria

We propose \textbf{JAWS}, a series of wrapper methods for distribution-free uncertainty quantification tasks under covariate shift, centered on our core method \textbf{JAW}, the \textbf{JA}ckknife+ \textbf{W}eighted with likelihood-ratio weights. JAWS also includes computationally efficient \textbf{A}pproximations of JAW using higher-order influence functions: \textbf{JAWA}. Theoretically, we show that JAW relaxes the jackknife+'s assumption of data exchangeability to achieve the same finite-sample coverage guarantee even under covariate shift. JAWA further approaches the JAW guarantee in the limit of either the sample size or the influence function order under mild assumptions. Moreover, we propose a general approach to repurposing any distribution-free uncertainty quantification method and its guarantees to the task of risk assessment: a task that generates the estimated probability that the true label lies within a user-specified interval. We then propose \textbf{JAW-R} and \textbf{JAWA-R} as the repurposed versions of proposed methods for \textbf{R}isk assessment. Practically, JAWS outperform the state-of-the-art predictive inference baselines in a variety of biased real world data sets for both interval-generation and risk-assessment auditing tasks.

估計/估計量 · 方差 · Networking · 推斷 · 查準率/準確率 ·

2022 年 7 月 20 日

Experimental Design under Network Interference

Davide Viviano

This paper studies the design of two-wave experiments in the presence of spillover effects when the researcher aims to conduct precise inference on treatment effects. We consider units connected through a single network, local dependence among individuals, and a general class of estimands encompassing average treatment and average spillover effects. We introduce a statistical framework for designing two-wave experiments with networks, where the researcher optimizes over participants and treatment assignments to minimize the variance of the estimators of interest, using a first-wave (pilot) experiment to estimate the variance. We derive guarantees for inference on treatment effects and regret guarantees on the variance obtained from the proposed design mechanism. Our results illustrate the existence of a trade-off in the choice of the pilot study and formally characterize the pilot's size relative to the main experiment. Simulations using simulated and real-world networks illustrate the advantages of the method.

簇 · Learning · 噪聲 · Extensibility · 可辨認的 ·

2022 年 7 月 20 日

Clustering with Queries under Semi-Random Noise

Alberto Del Pia,Mingchen Ma,Christos Tzamos

from arxiv, Accepted for presentation at the Conference on Learning Theory (COLT) 2022

The seminal paper by Mazumdar and Saha \cite{MS17a} introduced an extensive line of work on clustering with noisy queries. Yet, despite significant progress on the problem, the proposed methods depend crucially on knowing the exact probabilities of errors of the underlying fully-random oracle. In this work, we develop robust learning methods that tolerate general semi-random noise obtaining qualitatively the same guarantees as the best possible methods in the fully-random model. More specifically, given a set of $n$ points with an unknown underlying partition, we are allowed to query pairs of points $u,v$ to check if they are in the same cluster, but with probability $p$, the answer may be adversarially chosen. We show that information theoretically $O\left(\frac{nk \log n} {(1-2p)^2}\right)$ queries suffice to learn any cluster of sufficiently large size. Our main result is a computationally efficient algorithm that can identify large clusters with $O\left(\frac{nk \log n} {(1-2p)^2}\right) + \text{poly}\left(\log n, k, \frac{1}{1-2p} \right)$ queries, matching the guarantees of the best known algorithms in the fully-random model. As a corollary of our approach, we develop the first parameter-free algorithm for the fully-random model, answering an open question by \cite{MS17a}.

INFORMS · 估計/估計量 · 點估計 · 推斷 · 統計量 ·

2021 年 8 月 10 日

Federated Causal Inference in Heterogeneous Observational Data

Ruoxuan Xiong,Allison Koenecke,Michael Powell,Zhu Shen,Joshua T. Vogelstein,Susan Athey

Analyzing observational data from multiple sources can be useful for increasing statistical power to detect a treatment effect; however, practical constraints such as privacy considerations may restrict individual-level information sharing across data sets. This paper develops federated methods that only utilize summary-level information from heterogeneous data sets. Our federated methods provide doubly-robust point estimates of treatment effects as well as variance estimates. We derive the asymptotic distributions of our federated estimators, which are shown to be asymptotically equivalent to the corresponding estimators from the combined, individual-level data. We show that to achieve these properties, federated methods should be adjusted based on conditions such as whether models are correctly specified and stable across heterogeneous data sets.

Networking · Extensibility · MoDELS · Neural Networks · 模型復雜度 ·

2018 年 9 月 6 日

A Memory-Network Based Solution for Multivariate Time-Series Forecasting

Yen-Yu Chang,Fan-Yun Sun,Yueh-Hua Wu,Shou-De Lin

from arxiv, 8 pages, 4 figures, submitted to AAAI 2019

Multivariate time series forecasting is extensively studied throughout the years with ubiquitous applications in areas such as finance, traffic, environment, etc. Still, concerns have been raised on traditional methods for incapable of modeling complex patterns or dependencies lying in real word data. To address such concerns, various deep learning models, mainly Recurrent Neural Network (RNN) based methods, are proposed. Nevertheless, capturing extremely long-term patterns while effectively incorporating information from other variables remains a challenge for time-series forecasting. Furthermore, lack-of-explainability remains one serious drawback for deep neural network models. Inspired by Memory Network proposed for solving the question-answering task, we propose a deep learning based model named Memory Time-series network (MTNet) for time series forecasting. MTNet consists of a large memory component, three separate encoders, and an autoregressive component to train jointly. Additionally, the attention mechanism designed enable MTNet to be highly interpretable. We can easily tell which part of the historic data is referenced the most.