We consider the conditional treatment effect for competing risks data in observational studies. While it is described as a constant difference between the hazard functions given the covariates, we do not assume specific functional forms for the covariates. We derive the efficient score for the treatment effect using modern semiparametric theory, as well as two doubly robust scores with respect to 1) the assumed propensity score for treatment and the censoring model, and 2) the outcome models for the competing risks. An important asymptotic result regarding the estimators is rate double robustness, in addition to the classical model double robustness. Rate double robustness enables the use of machine learning and nonparametric methods to estimate the nuisance parameters while preserving the root-$n$ asymptotic normality of the estimators for inferential purposes. We study the performance of the estimators using simulation. The estimators are applied to data from a cohort of Japanese men in Hawaii followed since the 1960s to study the effect of mid-life drinking behavior on late-life cognitive outcomes.

Related content

Estimation/Estimator

Estimation/Estimator · Score matching · Weight · Functional · Normalized

April 20, 2022

Estimating Density Models with Truncation Boundaries using Score Matching

Song Liu, Takafumi Kanamori, Daniel J. Williams

From arXiv; to be published in the Journal of Machine Learning Research

Truncated densities are probability density functions defined on truncated domains. They share the same parametric form with their non-truncated counterparts up to a normalizing constant. Since the computation of their normalizing constants is usually infeasible, Maximum Likelihood Estimation cannot be easily applied to estimate truncated density models. Score Matching (SM) is a powerful tool for fitting parameters using only unnormalized models. However, it cannot be directly applied here because the boundary conditions used to derive a tractable SM objective are not satisfied by truncated densities. In this paper, we study parameter estimation for truncated probability densities using SM. The estimator minimizes a weighted Fisher divergence. The weight function is simply the shortest distance from a data point to the boundary of the domain. We show that this choice of weight function arises naturally from minimizing the Stein discrepancy as well as from upper-bounding the finite-sample estimation error. The usefulness of our method is demonstrated by numerical experiments and a study on the Chicago crime data set. We also show that the proposed density estimation can correct the outlier-trimming bias caused by aggressive outlier detection methods.
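As a rough illustration of the objective described in this abstract, a weighted Fisher divergence can be written as below. The notation is assumed here and is not taken from the abstract: $q$ is the data density, $p_\theta$ the unnormalized model, $V$ the truncated domain with boundary $\partial V$, and $g$ the weight function.

$$
J_g(\theta) = \mathbb{E}_{x \sim q}\left[\, g(x)\, \bigl\lVert \nabla_x \log p_\theta(x) - \nabla_x \log q(x) \bigr\rVert^2 \right],
\qquad
g(x) = \min_{x' \in \partial V} \lVert x - x' \rVert .
$$

Because $\nabla_x \log p_\theta$ does not depend on the normalizing constant, the objective involves only the unnormalized model. A weight that vanishes on $\partial V$ is the usual device for making the boundary term disappear when integrating by parts; the exact form used in the paper may differ from this sketch.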

Estimation/Estimator · Discretization · Robustness · Knowledge neurons · Decomposable

April 20, 2022

Robust Estimation of Discrete Distributions under Local Differential Privacy

Julien Chhor, Flore Sentenac

Although robust learning and local differential privacy are both widely studied fields of research, combining the two settings is just starting to be explored. We consider the problem of estimating a discrete distribution in total variation from $n$ contaminated data batches under a local differential privacy constraint. A fraction $1-\epsilon$ of the batches contain $k$ i.i.d. samples drawn from a discrete distribution $p$ over $d$ elements. To protect the users' privacy, each of the samples is privatized using an $\alpha$-locally differentially private mechanism. The remaining $\epsilon n$ batches are an adversarial contamination. The minimax rate of estimation under contamination alone, with no privacy, is known to be $\epsilon/\sqrt{k}+\sqrt{d/kn}$, up to a $\sqrt{\log(1/\epsilon)}$ factor. Under the privacy constraint alone, the minimax rate of estimation is $\sqrt{d^2/\alpha^2 kn}$. We show that combining the two constraints leads to a minimax estimation rate of $\epsilon\sqrt{d/\alpha^2 k}+\sqrt{d^2/\alpha^2 kn}$ up to a $\sqrt{\log(1/\epsilon)}$ factor, larger than the sum of the two separate rates. We provide a polynomial-time algorithm achieving this bound, as well as a matching information-theoretic lower bound.
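To compare the three rates quoted in this abstract, it may help to write them with the square roots expanded. The labels $r_{\text{robust}}$, $r_{\text{private}}$ and $r_{\text{both}}$ are introduced here for illustration only, and constants and the $\sqrt{\log(1/\epsilon)}$ factor are ignored.

$$
\begin{aligned}
r_{\text{robust}} &= \frac{\epsilon}{\sqrt{k}} + \sqrt{\frac{d}{kn}}, \\
r_{\text{private}} &= \sqrt{\frac{d^2}{\alpha^2 kn}} = \frac{d}{\alpha\sqrt{kn}}, \\
r_{\text{both}} &= \epsilon\sqrt{\frac{d}{\alpha^2 k}} + \sqrt{\frac{d^2}{\alpha^2 kn}} = \frac{\sqrt{d}}{\alpha}\cdot\frac{\epsilon}{\sqrt{k}} + \frac{d}{\alpha\sqrt{kn}} .
\end{aligned}
$$

Written this way, the second term of $r_{\text{both}}$ is exactly $r_{\text{private}}$, while the contamination term $\epsilon/\sqrt{k}$ of $r_{\text{robust}}$ is multiplied by $\sqrt{d}/\alpha$.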

White-box · Stream · Estimation/Estimator · MoDELS · Reducible

April 19, 2022

The White-Box Adversarial Data Stream Model

Miklos Ajtai, Vladimir Braverman, T. S. Jayram, Sandeep Silwal, Alec Sun, David P. Woodruff, Samson Zhou

From arXiv; PODS 2022

We study streaming algorithms in the white-box adversarial model, where the stream is chosen adaptively by an adversary who observes the entire internal state of the algorithm at each time step. We show that nontrivial algorithms are still possible. We first give a randomized algorithm for the $L_1$-heavy hitters problem that outperforms the optimal deterministic Misra-Gries algorithm on long streams. If the white-box adversary is computationally bounded, we use cryptographic techniques to reduce the memory of our $L_1$-heavy hitters algorithm even further and to design a number of additional algorithms for graph, string, and linear algebra problems. The existence of such algorithms is surprising, as the streaming algorithm does not even have a secret key in this model, i.e., its state is entirely known to the adversary. One algorithm we design is for estimating the number of distinct elements in a stream with insertions and deletions achieving a multiplicative approximation and sublinear space; such an algorithm is impossible for deterministic algorithms. We also give a general technique that translates any two-player deterministic communication lower bound to a lower bound for randomized algorithms robust to a white-box adversary. In particular, our results show that for all $p\ge 0$, there exists a constant $C_p>1$ such that any $C_p$-approximation algorithm for $F_p$ moment estimation in insertion-only streams with a white-box adversary requires $\Omega(n)$ space for a universe of size $n$. Similarly, there is a constant $C>1$ such that any $C$-approximation algorithm in an insertion-only stream for matrix rank requires $\Omega(n)$ space with a white-box adversary. Our algorithmic results based on cryptography thus show a separation between computationally bounded and unbounded adversaries. (Abstract shortened to meet arXiv limits.)
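For readers unfamiliar with the deterministic baseline mentioned in this abstract, the following is a minimal Python sketch of the classical Misra-Gries summary. It is not the randomized white-box algorithm of the paper; the function name `misra_gries` and the toy stream are illustrative assumptions.

```python
def misra_gries(stream, k):
    """Deterministic Misra-Gries summary using at most k - 1 counters.

    Any item occurring more than len(stream) / k times is guaranteed to
    survive in the returned dict, and each stored count underestimates the
    true frequency by at most len(stream) / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical toy stream: "a" occurs 6 times out of 12 items.
stream = ["a"] * 6 + ["b"] * 3 + ["c", "d", "e"]
summary = misra_gries(stream, k=3)
```

Because the procedure is deterministic, its state is fully predictable from the stream, which is why it serves as the natural comparison point in a model where the adversary sees the internal state anyway.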

Recall · Estimation/Estimator · False positive · False positive rate · MoDELS

April 18, 2022

AB/BA analysis: A framework for estimating keyword spotting recall improvement while maintaining audio privacy

Raphael Petegrosso, Vasistakrishna Baderdinni, Thibaud Senechal, Benjamin L. Bullough

From arXiv; accepted to NAACL 2022 Industry Track

Evaluation of keyword spotting (KWS) systems that detect keywords in speech is a challenging task under realistic privacy constraints. The KWS is designed to only collect data when the keyword is present, limiting the availability of hard samples that may contain false negatives, and preventing direct estimation of model recall from production data. Alternatively, complementary data collected from other sources may not be fully representative of the real application. In this work, we propose an evaluation technique which we call AB/BA analysis. Our framework evaluates a candidate KWS model B against a baseline model A, using cross-dataset offline decoding for relative recall estimation, without requiring negative examples. Moreover, we propose a formulation with assumptions that allow estimation of relative false positive rate between models with low variance even when the number of false positives is small. Finally, we propose to leverage machine-generated soft labels, in a technique we call Semi-Supervised AB/BA analysis, that improves the analysis time, privacy, and cost. Experiments with both simulation and real data show that AB/BA analysis is successful at measuring recall improvement in conjunction with the trade-off in relative false positive rate.

April 17, 2022

Impact of Phase-Noise and Spatial Correlation on Double-RIS-Assisted Multiuser MISO Networks

Zaid Abdullah, Anastasios Papazafeiropoulos, Steven Kisseleff, Symeon Chatzinotas, Bjorn Ottersten

We study the performance of a phase-noise impaired double reconfigurable intelligent surface (RIS)-aided multiuser (MU) multiple-input single-output (MISO) system under spatial correlation at both RISs and base-station (BS). The downlink achievable rate is derived in closed-form under maximum ratio transmission (MRT) precoding. In addition, we obtain the optimal phase-shift design at both RISs in closed-form for the considered channel and phase-noise models. Numerical results validate the analytical expressions, and highlight the effects of different system parameters on the achievable rate. Our analysis shows that phase-noise can severely degrade the performance when users do not have direct links to both RISs, and can only be served via the double-reflection link. Also, we show that high spatial correlation at RISs is essential for high achievable rates.

Estimation/Estimator · Reducible · Directed acyclic graph · Identifiable · Graph

April 17, 2022

Variable elimination, graph reduction and efficient g-formula

F. Richard Guo, Emilija Perković, Andrea Rotnitzky

From arXiv; clarified the notion of marginal model; added simulations

We study efficient estimation of an interventional mean associated with a point exposure treatment under a causal graphical model represented by a directed acyclic graph without hidden variables. Under such a model, it may happen that a subset of the variables are uninformative in that failure to measure them neither precludes identification of the interventional mean nor changes the semiparametric variance bound for regular estimators of it. We develop a set of graphical criteria that are sound and complete for eliminating all the uninformative variables so that the cost of measuring them can be saved without sacrificing estimation efficiency, which could be useful when designing a planned observational or randomized study. Further, we construct a reduced directed acyclic graph on the set of informative variables only. We show that the interventional mean is identified from the marginal law by the g-formula (Robins, 1986) associated with the reduced graph, and the semiparametric variance bounds for estimating the interventional mean under the original and the reduced graphical model agree. This g-formula is an irreducible, efficient identifying formula in the sense that the nonparametric estimator of the formula, under regularity conditions, is asymptotically efficient under the original causal graphical model, and no formula with such property exists that only depends on a strict subset of the variables.

Estimation/Estimator · Weight · binary · Normalized · Notability

April 15, 2022

Abadie's Kappa and Weighting Estimators of the Local Average Treatment Effect

Tymon Słoczyński, S. Derya Uysal, Jeffrey M. Wooldridge

In this paper we study the finite sample and asymptotic properties of various weighting estimators of the local average treatment effect (LATE), several of which are based on Abadie (2003)'s kappa theorem. Our framework presumes a binary endogenous explanatory variable ("treatment") and a binary instrumental variable, which may only be valid after conditioning on additional covariates. We argue that one of the Abadie estimators, which we show is weight normalized, is likely to dominate the others in many contexts. A notable exception is in settings with one-sided noncompliance, where certain unnormalized estimators have the advantage of being based on a denominator that is bounded away from zero. We use a simulation study and three empirical applications to illustrate our findings. In applications to causal effects of college education using the college proximity instrument (Card, 1995) and causal effects of childbearing using the sibling sex composition instrument (Angrist and Evans, 1998), the unnormalized estimates are clearly unreasonable, with "incorrect" signs, magnitudes, or both. Overall, our results suggest that (i) the relative performance of different kappa weighting estimators varies with features of the data-generating process; and that (ii) the normalized version of Tan (2006)'s estimator may be an attractive alternative in many contexts. Applied researchers with access to a binary instrumental variable should also consider covariate balancing or doubly robust estimators of the LATE.
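The distinction between unnormalized and normalized weighting that this abstract discusses can be illustrated with a generic Wald-type ratio of inverse-probability-weighted contrasts. This is a sketch under stated assumptions, not a reproduction of any specific estimator from the paper: the function names, the simulated data, and the use of the true instrument propensity `p` are all hypothetical.

```python
import numpy as np


def late_unnormalized(y, d, z, p):
    """Ratio of IPW contrasts; weights are not rescaled to sum to one."""
    num = np.mean(z * y / p) - np.mean((1 - z) * y / (1 - p))
    den = np.mean(z * d / p) - np.mean((1 - z) * d / (1 - p))
    return num / den


def late_normalized(y, d, z, p):
    """Same contrasts, but each weighted mean divides by its weight sum."""
    w1 = z / p
    w0 = (1 - z) / (1 - p)
    num = np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
    den = np.sum(w1 * d) / np.sum(w1) - np.sum(w0 * d) / np.sum(w0)
    return num / den


# Simulated data (assumption): binary instrument z valid given x,
# 60% compliers, no always-takers, and a complier effect of 1.5.
rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-0.5 * x))   # instrument propensity P(z = 1 | x)
z = rng.binomial(1, p)
complier = rng.binomial(1, 0.6, size=n)
d = z * complier                      # treatment taken only if encouraged
y = 1.5 * d + x + rng.normal(size=n)

late_u = late_unnormalized(y, d, z, p)
late_n = late_normalized(y, d, z, p)
```

In this well-behaved simulation both versions land close to the true value of 1.5. The two can diverge when the instrument propensity is estimated and some weights become large, which is the kind of setting the abstract is concerned with.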

Estimation/Estimator · FPG · PG · Estimation error · Value function

April 15, 2022

Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration

Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang

Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy. Conventional methods for off-policy PG estimation often suffer from either significant bias or exponentially large variance. In this paper, we propose the double Fitted PG estimation (FPG) algorithm. FPG can work with an arbitrary policy parameterization, assuming access to a Bellman-complete value function class. In the case of linear value function approximation, we provide a tight finite-sample upper bound on policy gradient estimation error, that is governed by the amount of distribution mismatch measured in feature space. We also establish the asymptotic normality of FPG estimation error with a precise covariance characterization, which is further shown to be statistically optimal with a matching Cramer-Rao lower bound. Empirically, we evaluate the performance of FPG on both policy gradient estimation and policy optimization, using either softmax tabular or ReLU policy networks. Under various metrics, our results show that FPG significantly outperforms existing off-policy PG estimation methods based on importance sampling and variance reduction techniques.

Estimation/Estimator · Outlier · Extensibility · Robustness · Median

April 15, 2022

Outlier-Resistant Estimators for Average Treatment Effect in Causal Inference

Kazuharu Harada, Hironori Fujisawa

The inverse probability weighted (IPW) and doubly robust (DR) estimators are often used to estimate the average causal effect (ATE), but are vulnerable to outliers. The IPW/DR median can be used for outlier-resistant estimation of the ATE, but the outlier resistance of the median is limited and it is not resistant enough for heavy contamination. We propose extensions of the IPW/DR estimators with density power weighting, which can eliminate the influence of outliers almost completely. The outlier resistance of the proposed estimators is evaluated through the unbiasedness of the estimating equations. Unlike the median-based methods, our estimators are resistant to outliers even under heavy contamination. Interestingly, the naive extension of the DR estimator requires bias correction to keep the double robustness even under the most tractable form of contamination. In addition, the proposed estimators are found to be highly resistant to outliers in more difficult settings where the contamination ratio depends on the covariates. The outlier resistance of our estimators from the viewpoint of the influence function is also favorable. Our theoretical results are verified via Monte Carlo simulations and real data analysis. The proposed methods were found to have more outlier resistance than the median-based methods and estimated the potential mean with a smaller error than the median-based methods.
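The vulnerability that motivates this abstract can be seen with the standard (non-robust) IPW and DR estimators. The sketch below implements only those textbook baselines, not the density-power-weighted extensions proposed in the paper; the function names, the simulated data, and the use of the true propensity score and outcome models are assumptions made for illustration.

```python
import numpy as np


def ipw_ate(y, t, e):
    """Standard IPW estimate of E[Y(1)] - E[Y(0)] given propensities e."""
    return np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))


def dr_ate(y, t, e, m1, m0):
    """Standard doubly robust (augmented IPW) estimate of the ATE.

    m1 and m0 are outcome-model predictions under treatment and control.
    """
    mu1 = np.mean(m1 + t * (y - m1) / e)
    mu0 = np.mean(m0 + (1 - t) * (y - m0) / (1 - e))
    return mu1 - mu0


# Simulated data (assumption): true ATE is 2, confounded through x.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-x))          # true propensity score
t = rng.binomial(1, e)
y = 2.0 * t + x + rng.normal(size=n)

ate_ipw = ipw_ate(y, t, e)
ate_dr = dr_ate(y, t, e, m1=2.0 + x, m0=x)

# Contaminate 200 treated outcomes (1% of the sample) with a gross outlier.
y_bad = y.copy()
treated_idx = np.flatnonzero(t == 1)[:200]
y_bad[treated_idx] = 1000.0
ate_ipw_bad = ipw_ate(y_bad, t, e)
```

On clean data both estimators recover a value near 2, whereas the contaminated IPW estimate is pulled far away from it, since every outlying outcome enters the weighted mean linearly.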

Divergence · Rank · Robustness · Lecture notes · Performer

April 15, 2022

On Variants of Root Normalised Order-aware Divergence and a Divergence based on Kendall's Tau

Tetsuya Sakai

This paper reports on a follow-up study of the work reported in Sakai, which explored suitable evaluation measures for ordinal quantification tasks. More specifically, the present study defines and evaluates, in addition to the quantification measures considered earlier, a few variants of an ordinal quantification measure called Root Normalised Order-aware Divergence (RNOD), as well as a measure which we call Divergence based on Kendall's $\tau$ (DNKT). The RNOD variants represent alternative design choices based on the idea of Sakai's Distance-Weighted sum of squares (DW), while DNKT is designed to ensure that the system's estimated distribution over classes is faithful to the target priorities over classes. As this Priority Preserving Property (PPP) of DNKT may be useful in some applications, we also consider combining some of the existing quantification measures with DNKT. Our experiments with eight ordinal quantification data sets suggest that the variants of RNOD do not offer any benefit over the original RNOD at least in terms of system ranking consistency, i.e., robustness of the system ranking to the choice of test data. Of all ordinal quantification measures considered in this study (including Normalised Match Distance, a.k.a. Earth Mover's Distance), RNOD is the most robust measure overall. Hence the design choice of RNOD is a good one from this viewpoint. Also, DNKT is the worst performer in terms of system ranking consistency. Hence, if DNKT seems appropriate for a task, sample size design should take its statistical instability into account.
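Since the measure named in this abstract builds on Kendall's $\tau$, a minimal Python sketch of the rank correlation itself may be useful. It computes the tau-a variant (ties contribute zero) between a target distribution over classes and an estimated one. The abstract does not give the definition of the paper's divergence, so this code does not implement it; the function name and the example vectors are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical class distributions: the estimate swaps the order of the
# second and third classes relative to the target.
target = [0.1, 0.2, 0.3, 0.4]
estimate = [0.1, 0.3, 0.2, 0.4]
tau = kendall_tau_a(target, estimate)
```

Here five of the six class pairs are ordered the same way in both distributions and one is reversed, so `tau` equals 4/6. A value of 1 would mean the estimated distribution preserves the target ordering over classes exactly.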