
Quantitative empirical inquiry in international relations often relies on dyadic data. Standard analytic techniques do not account for the fact that dyads are not generally independent of one another. That is, when dyads share a constituent member (e.g., a common country), they may be statistically dependent, or "clustered." Recent work has developed dyadic clustering robust standard errors (DCRSEs) that account for this dependence. Using these DCRSEs, we reanalyzed all empirical articles published in International Organization between January 2014 and January 2020 that feature dyadic data. We find that published standard errors for key explanatory variables are, on average, approximately half as large as DCRSEs, suggesting that dyadic clustering is leading researchers to severely underestimate uncertainty. However, most (67% of) statistically significant findings remain statistically significant when using DCRSEs. We conclude that accounting for dyadic clustering is both important and feasible, and offer software in R and Stata to facilitate use of DCRSEs in future research.
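The authors provide R and Stata software, which is not reproduced here. Purely to illustrate the idea behind a dyadic-cluster-robust variance (treat two observations as potentially correlated whenever their dyads share a member, and let all such score cross-products enter the "meat" of a sandwich estimator), here is a minimal numpy sketch for an OLS fit. The function name, inputs, and the O(n^2) construction are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dyadic_cluster_se(X, y, dyads):
    """Illustrative dyadic-cluster-robust sandwich estimator for an OLS fit.

    X     : (n, p) design matrix (include a column of ones for an intercept)
    y     : (n,) outcome vector
    dyads : (n, 2) array of member identifiers for each dyadic observation

    Two observations are treated as potentially correlated whenever their
    dyads share at least one member; all such score cross-products enter the
    "meat" of the sandwich. O(n^2) and for illustration only.
    """
    n, p = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    scores = X * resid[:, None]              # per-observation score contributions

    # linked[i, j] is True when observations i and j share a dyad member
    linked = np.zeros((n, n), dtype=bool)
    for a in range(2):
        for b in range(2):
            linked |= dyads[:, a][:, None] == dyads[:, b][None, :]
    np.fill_diagonal(linked, True)

    meat = scores.T @ linked.astype(float) @ scores
    V = bread @ meat @ bread
    return beta, np.sqrt(np.diag(V))
```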

Related Content

Measurements are generally collected as unilateral or bilateral data in clinical trials or observational studies. For example, in ophthalmology studies, the primary outcome is often obtained from one eye or both eyes of an individual. In medical studies, the relative risk is usually the parameter of interest. In this article, we develop three confidence intervals for the relative risk for combined unilateral and bilateral correlated data under the equal dependence assumption. The proposed confidence intervals are based on maximum likelihood estimates of the parameters derived using the Fisher scoring method. Simulation studies are conducted to evaluate the performance of the proposed confidence intervals with respect to the empirical coverage probability, the mean interval width, and the ratio of the mesial non-coverage probability to the distal non-coverage probability. We also compare the proposed methods with the confidence interval based on the method of variance estimates recovery and the confidence interval obtained from the modified Poisson regression model for correlated binary data. We recommend the score confidence interval for general applications because it best controls coverage probabilities at the 95% level with a reasonable mean interval width. We illustrate the methods with a real-world example.
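For orientation, the classical large-sample interval for a relative risk from two independent samples, which ignores the within-subject (fellow-eye) correlation that the proposed intervals account for, is

$$\log\widehat{RR} \;\pm\; z_{1-\alpha/2}\sqrt{\frac{1}{x_1}-\frac{1}{n_1}+\frac{1}{x_2}-\frac{1}{n_2}},\qquad \widehat{RR}=\frac{x_1/n_1}{x_2/n_2},$$

where $x_g$ events are observed among $n_g$ independent units in group $g$; exponentiating the endpoints gives the interval for the relative risk.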

We study the problem of constructing coresets for clustering problems with time series data. This problem has gained importance across many fields including biology, medicine, and economics due to the proliferation of sensors that facilitate real-time measurement and to a rapid drop in storage costs. In particular, we consider the setting where the time series data on $N$ entities is generated from a Gaussian mixture model with autocorrelations over $k$ clusters in $\mathbb{R}^d$. Our main contribution is an algorithm to construct coresets for the maximum likelihood objective for this mixture model. Our algorithm is efficient, and under a mild boundedness assumption on the covariance matrices of the underlying Gaussians, the size of the coreset is independent of the number of entities $N$ and the number of observations for each entity, and depends only polynomially on $k$, $d$ and $1/\varepsilon$, where $\varepsilon$ is the error parameter. We empirically assess the performance of our coreset with synthetic data.
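The paper's construction is specific to autocorrelated Gaussian mixtures. Purely to illustrate the generic coreset recipe (draw points with non-uniform probabilities and attach inverse-probability weights so that weighted costs approximate full-data costs), here is a hedged numpy sketch; the function name and the crude sensitivity proxy are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def importance_sampling_coreset(X, m, seed=0):
    """Generic sensitivity-sampling coreset sketch (not the paper's algorithm).

    Points are drawn with probability proportional to a crude sensitivity
    proxy (a uniform term plus squared distance to the data mean), and each
    sampled point carries an inverse-probability weight so that weighted
    costs are unbiased estimates of full-data costs.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    dist = np.linalg.norm(X - X.mean(axis=0), axis=1) ** 2
    sens = 0.5 / n + 0.5 * dist / dist.sum()     # crude sensitivity proxy
    prob = sens / sens.sum()
    idx = rng.choice(n, size=m, replace=True, p=prob)
    weights = 1.0 / (m * prob[idx])              # importance weights
    return X[idx], weights
```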

The semiparametric estimation approach, which includes inverse-probability-weighted and doubly robust estimation using propensity scores, is a standard tool for marginal structural models in causal inference and is rapidly being extended and generalized in various directions. Although model selection is indispensable in statistical analysis, information criteria for selecting an appropriate marginal structure have only recently begun to be developed. In this paper, following the original idea behind the information criterion, we derive an AIC-type criterion. We define a risk function based on the Kullback-Leibler divergence as the cornerstone of the information criterion, and treat a general causal inference model that is not necessarily representable as a linear model. The causal effects to be estimated are those in the general population, such as the average treatment effect on the treated or the average treatment effect on the untreated. Because doubly robust estimation, which remains valid when either the model for the assignment variable or the model for the outcome variable is misspecified, is regarded as important in this field, we make the information criterion itself doubly robust, so that it remains a mathematically valid criterion even when one of the two models is wrong.
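Schematically, an AIC-type criterion of the kind described combines a weighted empirical log-likelihood with a bias correction; the display below is only the generic template (the specific doubly robust correction is what the paper derives):

$$\mathrm{IC} \;=\; -2\sum_{i=1}^{n}\hat{w}_i\,\log f\!\left(y_i \mid \mathbf{x}_i;\hat{\boldsymbol{\theta}}\right)\;+\;2\,\hat{b}_n,$$

where the $\hat{w}_i$ are estimated weights (e.g., inverse-probability or doubly robust) and $\hat{b}_n$ estimates the optimism of the weighted log-likelihood as an estimator of the Kullback-Leibler-based risk; in the classical AIC, $\hat{b}_n$ reduces to the number of parameters.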

The general linear model (GLM) is a widely popular and convenient tool for estimating the functional brain response and identifying areas of significant activation during a task or stimulus. However, the classical GLM is based on a massive univariate approach that does not explicitly leverage the similarity of activation patterns among neighboring brain locations. As a result, it tends to produce noisy estimates and be underpowered to detect significant activations, particularly in individual subjects and small groups. A recent alternative, a cortical surface-based spatial Bayesian GLM, leverages spatial dependencies among neighboring cortical vertices to produce more accurate estimates and areas of functional activation. The spatial Bayesian GLM can be applied to both individual- and group-level analyses. In this study, we assess the reliability and power of individual and group-average measures of task activation produced via the surface-based spatial Bayesian GLM. We analyze motor task data from 45 subjects in the Human Connectome Project (HCP) and HCP Retest datasets. We also extend the model to multi-run analysis and employ subject-specific cortical surfaces rather than surfaces inflated to a sphere for more accurate distance-based modeling. Results show that the surface-based spatial Bayesian GLM produces highly reliable activations in individual subjects and is powerful enough to detect trait-like functional topologies. Additionally, spatial Bayesian modeling enhances the reliability of group-level analyses even in moderately sized samples (n=45). The power of the spatial Bayesian GLM to detect activations above a scientifically meaningful effect size is nearly invariant to sample size, exhibiting high power even in small samples (n=10). The spatial Bayesian GLM is computationally efficient in individuals and groups and is convenient to implement with the open-source BayesfMRI R package.
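Schematically, and not in the exact parameterization used by the BayesfMRI package, the surface-based model regresses each vertex's BOLD time series on the task design while smoothing the coefficients across neighboring vertices:

$$\mathbf{y}_v \;=\; \mathbf{X}\,\boldsymbol{\beta}_v + \boldsymbol{\varepsilon}_v,\qquad \beta_k(v)\ \text{given a spatial prior that borrows strength across neighboring vertices},$$

where $\mathbf{y}_v$ is the time series at cortical vertex $v$, $\mathbf{X}$ contains the task regressors, and $\beta_k(v)$ is the activation amplitude for task $k$ at vertex $v$.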

We study piece-wise constant signals corrupted by additive Gaussian noise over a $d$-dimensional lattice. Data of this form naturally arise in a host of applications, and the tasks of signal detection or testing, de-noising and estimation have been studied extensively in the statistical and signal processing literature. In this paper we consider instead the problem of partition recovery, i.e.~of estimating the partition of the lattice induced by the constancy regions of the unknown signal, using the computationally efficient dyadic classification and regression tree (DCART) methodology proposed by \citet{donoho1997cart}. We prove that, under appropriate regularity conditions on the shape of the partition elements, a DCART-based procedure consistently estimates the underlying partition at a rate of order $\sigma^2 k^* \log (N)/\kappa^2$, where $k^*$ is the minimal number of rectangular sub-graphs obtained using recursive dyadic partitions supporting the signal partition, $\sigma^2$ is the noise variance, $\kappa$ is the minimal magnitude of the signal difference among contiguous elements of the partition and $N$ is the size of the lattice. Furthermore, under stronger assumptions, our method attains a sharper estimation error of order $\sigma^2\log(N)/\kappa^2$, independent of $k^*$, which we show to be minimax rate optimal. Our theoretical guarantees further extend to the partition estimator based on the optimal regression tree estimator (ORT) of \citet{chatterjee2019adaptive} and to the one obtained through an NP-hard exhaustive search method. We corroborate our theoretical findings and the effectiveness of DCART for partition recovery in simulations.
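As a concrete picture of a recursive dyadic partition, the following is a minimal one-dimensional sketch (not the cited DCART code, which operates on $d$-dimensional lattices): an interval is split at its midpoint whenever the split lowers a penalized sum of squared errors. The function name and penalty handling are illustrative assumptions.

```python
import numpy as np

def dyadic_partition_1d(y, lam):
    """Minimal 1-D sketch of recursive dyadic partitioning.

    An interval is split at its midpoint whenever splitting lowers the
    penalized cost SSE + lam * (#pieces). Returns the selected intervals as
    (start, end) index pairs. Illustrative only.
    """
    def fit(lo, hi):
        seg = y[lo:hi]
        sse = float(np.sum((seg - seg.mean()) ** 2))
        no_split = ([(lo, hi)], sse + lam)        # cost of keeping one piece
        if hi - lo < 2:
            return no_split
        mid = (lo + hi) // 2
        l_parts, l_cost = fit(lo, mid)
        r_parts, r_cost = fit(mid, hi)
        if l_cost + r_cost < no_split[1]:
            return l_parts + r_parts, l_cost + r_cost
        return no_split

    parts, _ = fit(0, len(y))
    return parts
```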

Recent advances in center-based clustering continue to improve upon the drawbacks of Lloyd's celebrated $k$-means algorithm over $60$ years after its introduction. Various methods seek to address poor local minima, sensitivity to outliers, and data that are not well-suited to Euclidean measures of fit, but many are supported largely empirically. Moreover, combining such approaches in a piecemeal manner can result in ad hoc methods, and the limited theoretical results supporting each individual contribution may no longer hold. Toward addressing these issues in a principled way, this paper proposes a cohesive robust framework for center-based clustering under a general class of dissimilarity measures. In particular, we present a rigorous theoretical treatment within a Median-of-Means (MoM) estimation framework, showing that it subsumes several popular $k$-means variants. In addition to unifying existing methods, we derive uniform concentration bounds that complete their analyses, and bridge these results to the MoM framework via Dudley's chaining arguments. Importantly, we require no assumptions on the distribution of the outlying observations or on the number of observations $n$ relative to the number of features $p$. We establish strong consistency and an error rate of $O(n^{-1/2})$ under mild conditions, surpassing the best-known results in the literature. The methods are thoroughly validated empirically on real and synthetic datasets.
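As one concrete instantiation of the MoM idea (illustrative only; the paper's framework is far more general and covers a broad class of dissimilarity measures), a Lloyd-style iteration can replace each cluster mean with the coordinate-wise median of block means, which blunts the influence of outliers. The function and parameter names below are hypothetical.

```python
import numpy as np

def mom_kmeans(X, k, n_blocks=10, n_iter=50, seed=0):
    """A simple Median-of-Means flavored Lloyd iteration (illustrative only).

    Each center is updated as the coordinate-wise median of block means of
    its assigned points, dampening the influence of outlying observations.
    This is one instantiation of the MoM idea, not the paper's framework.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # assign each point to its nearest current center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts) == 0:
                continue
            B = min(n_blocks, len(pts))
            blocks = np.array_split(rng.permutation(len(pts)), B)
            block_means = np.stack([pts[b].mean(axis=0) for b in blocks])
            centers[j] = np.median(block_means, axis=0)
    return centers, labels
```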

With the availability of granular geographical data, social scientists are increasingly interested in examining how residential neighborhoods are formed and how they influence attitudes and behavior. To facilitate such studies, we develop an easy-to-use online survey instrument that allows respondents to draw their neighborhoods on a map. We then propose a statistical model to analyze how the characteristics of respondents, relevant local areas, and their interactions shape subjective neighborhoods. The model also generates out-of-sample predictions of one's neighborhood given these observed characteristics. We illustrate the proposed methodology by conducting a survey among registered voters in Miami, New York City, and Phoenix. We find that across these cities voters are more likely to include same-race and co-partisan census blocks in their neighborhoods. Net of other factors, White respondents are 6.1 to 16.9 percentage points more likely to include in their neighborhoods a census block composed entirely of White residents than one with no White residents. Co-partisanship exhibits a similar, independent influence: Democratic and Republican respondents are 8.6 to 19.2 percentage points more likely to include an entirely co-partisan census block than one consisting entirely of out-partisans. We also show that our model provides more accurate out-of-sample predictions than standard distance-based measures of neighborhoods. Open-source software is available for implementing the proposed methodology.
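As a schematic of the kind of model described (illustrative only; the paper's specification is richer and also accounts for the spatial structure of the drawn neighborhoods), the probability that respondent $i$ includes census block $b$ can be written in terms of respondent characteristics, block characteristics, and their interactions:

$$\Pr(\text{respondent } i \text{ includes block } b)\;=\;\operatorname{logit}^{-1}\!\big(\alpha+\mathbf{x}_i^{\top}\boldsymbol{\beta}+\mathbf{z}_b^{\top}\boldsymbol{\gamma}+(\mathbf{x}_i\otimes\mathbf{z}_b)^{\top}\boldsymbol{\delta}\big),$$

where $\mathbf{x}_i$ collects respondent characteristics and $\mathbf{z}_b$ collects block characteristics such as racial and partisan composition.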

This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook confounding effects, and hence the estimation error can be substantial. We therefore propose an alternative way to construct the estimators so that the error is greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of the classical and proposed estimators in estimating the causal quantities. The comparison is conducted across a wide range of models, including linear regression models, tree-based models, and neural-network-based models, under simulated datasets that exhibit different levels of causality, degrees of nonlinearity, and distributional properties. Most importantly, we apply our approach to a large observational dataset provided by a global technology firm that operates in both the e-commerce and lending businesses. We find that the relative reduction in estimation error is striking when the causal effects are correctly accounted for.

We present a new clustering method in the form of a single clustering equation that is able to directly discover groupings in the data. The main proposition is that the first neighbor of each sample is all one needs to discover large chains and find the groups in the data. In contrast to most existing clustering algorithms, our method does not require any hyperparameters, distance thresholds, or a pre-specified number of clusters. The proposed algorithm belongs to the family of hierarchical agglomerative methods. The technique has very low computational overhead, is easily scalable, and is applicable to large practical problems. Evaluation on well-known datasets from different domains, ranging between 1,077 and 8.1 million samples, shows substantial performance gains when compared to existing clustering techniques.
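A minimal sketch of the first-neighbor linking step described above, assuming numpy, scipy, and scikit-learn are available: link every sample to its nearest neighbor and take connected components of the resulting graph as one level of the hierarchy. The function name is illustrative, and the full method applies this step recursively to build successive partitions.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import NearestNeighbors

def first_neighbor_partition(X):
    """One level of first-neighbor clustering: link each sample to its
    nearest neighbor and take connected components of the 1-NN graph as
    clusters. Minimal sketch of the linking step, not the full algorithm."""
    n = X.shape[0]
    nn = NearestNeighbors(n_neighbors=2).fit(X)
    _, idx = nn.kneighbors(X)
    first = idx[:, 1]                       # column 0 is the point itself
    rows = np.arange(n)
    adj = coo_matrix((np.ones(n), (rows, first)), shape=(n, n))
    adj = adj + adj.T                       # symmetrize the 1-NN graph
    n_clusters, labels = connected_components(adj, directed=False)
    return n_clusters, labels
```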

The polypharmacy side effect prediction problem considers cases in which two drugs taken individually do not result in a particular side effect; however, when the two drugs are taken in combination, the side effect manifests. In this work, we demonstrate that multi-relational knowledge graph completion achieves state-of-the-art results on the polypharmacy side effect prediction problem. Empirical results show that our approach is particularly effective when the protein targets of the drugs are well-characterized. In contrast to prior work, our approach provides more interpretable predictions and hypotheses for wet lab validation.
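The abstract does not specify which completion model is used; one widely used multi-relational scorer is DistMult, shown below purely to illustrate how a (drug, side-effect relation, drug) triple is scored from learned embeddings. It is not necessarily the model used in the paper, and the embedding inputs are assumed to have been trained elsewhere.

```python
import numpy as np

def distmult_score(e_drug1, w_relation, e_drug2):
    """DistMult score for a (head, relation, tail) triple:
    sum_i e_h[i] * w_r[i] * e_t[i]. A higher score means the model finds the
    triple, e.g. (drug A, causes_side_effect_r, drug B), more plausible."""
    return float(np.sum(e_drug1 * w_relation * e_drug2))
```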
