While methods for measuring and correcting differential performance in risk prediction models have proliferated in recent years, most existing techniques can only be used to assess fairness across relatively large subgroups. The purpose of algorithmic fairness efforts is often to redress discrimination against groups that are both marginalized and small, so this sample size limitation often prevents existing techniques from accomplishing their main aim. We take a three-pronged approach to address the problem of quantifying fairness with small subgroups. First, we propose new estimands built on the "counterfactual fairness" framework that leverage information across groups. Second, we estimate these quantities using a larger volume of data than existing techniques. Finally, we propose a novel data borrowing approach to incorporate "external data" that lacks outcomes and predictions but contains covariate and group membership information. This less stringent requirement on the external data allows for more possibilities for external data sources. We demonstrate practical application of our estimators to a risk prediction model used by a major Midwestern health system during the COVID-19 pandemic.

Related content

December 19, 2023

Conditional autoregressive models fused with random forests to improve small-area spatial prediction

Cara MacBride, Vinny Davies, Duncan Lee

In areal unit data with missing or suppressed data, it is desirable to create models that are able to predict observations that are not available. Traditional statistical methods achieve this through Bayesian hierarchical models that can capture the unexplained residual spatial autocorrelation through conditional autoregressive (CAR) priors, such that they can make predictions at geographically related spatial locations. In contrast, typical machine learning approaches such as random forests ignore this residual autocorrelation, and instead base predictions on complex non-linear feature-target relationships. In this paper, we propose CAR-Forest, a novel spatial prediction algorithm that combines the best features of both approaches by fusing them together. By iteratively refitting a random forest combined with a Bayesian CAR model in one algorithm, CAR-Forest can incorporate flexible feature-target relationships while still accounting for the residual spatial autocorrelation. Our results, based on a Scottish housing price data set, show that CAR-Forest outperforms Bayesian CAR models, random forests, and the state-of-the-art hybrid approach, geographically weighted random forest, providing a state-of-the-art framework for small-area spatial prediction.

December 16, 2023

One step closer to unbiased aleatoric uncertainty estimation

Wang Zhang, Ziwen Ma, Subhro Das, Tsui-Wei Weng, Alexandre Megretski, Luca Daniel, Lam M. Nguyen

Neural networks are powerful tools in various applications, and quantifying their uncertainty is crucial for reliable decision-making. In the deep learning field, the uncertainties are usually categorized into aleatoric (data) and epistemic (model) uncertainty. In this paper, we point out that the existing popular variance attenuation method highly overestimates aleatoric uncertainty. To address this issue, we propose a new estimation method by actively de-noising the observed data (source code available at //github.com/wz16/DVA). By conducting a broad range of experiments, we demonstrate that our proposed approach provides a much closer approximation to the actual data uncertainty than the standard method.
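For orientation, the abstract does not define the "variance attenuation" objective. A common reading, assumed here, is the heteroscedastic Gaussian negative log-likelihood, in which a network outputs both a mean $\mu_\theta(x)$ and a variance $\sigma_\theta^2(x)$. The symbols below are illustrative and are not taken from the paper:

```latex
\mathcal{L}(\theta)
  = \frac{1}{N} \sum_{i=1}^{N}
    \left[
      \frac{\bigl(y_i - \mu_\theta(x_i)\bigr)^2}{2\,\sigma_\theta^2(x_i)}
      + \frac{1}{2} \log \sigma_\theta^2(x_i)
    \right]
```

Under this reading, $\sigma_\theta^2(x)$ serves as the aleatoric uncertainty estimate, and any error in the fitted mean $\mu_\theta$ is absorbed into the residual term.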

December 16, 2023

survivalContour: Visualizing predicted survival via colored contour plots

Yushu Shi, Liangliang Zhang, Kim-Anh Do, Robert J. Jenq, Christine B. Peterson

Advances in survival analysis have facilitated unprecedented flexibility in data modeling, yet there remains a lack of tools for graphically illustrating the influence of continuous covariates on predicted survival outcomes. We propose the utilization of a colored contour plot to depict the predicted survival probabilities over time, and provide a Shiny app and R package as implementations of this tool. Our approach is capable of supporting conventional models, including the Cox and Fine-Gray models. However, its capability shines when coupled with cutting-edge machine learning models such as random survival forests and deep neural networks.

December 15, 2023

Extrapolated cross-validation for randomized ensembles

Jin-Hong Du, Pratik Patil, Kathryn Roeder, Arun Kumar Kuchibhotla

from arxiv, Accepted by the Journal of Computational and Graphical Statistics

Ensemble methods such as bagging and random forests are ubiquitous in various fields, from finance to genomics. Despite their prevalence, the question of the efficient tuning of ensemble parameters has received relatively little attention. This paper introduces a cross-validation method, ECV (Extrapolated Cross-Validation), for tuning the ensemble and subsample sizes in randomized ensembles. Our method builds on two primary ingredients: initial estimators for small ensemble sizes using out-of-bag errors and a novel risk extrapolation technique that leverages the structure of prediction risk decomposition. By establishing uniform consistency of our risk extrapolation technique over ensemble and subsample sizes, we show that ECV yields $\delta$-optimal (with respect to the oracle-tuned risk) ensembles for squared prediction risk. Our theory accommodates general ensemble predictors, only requires mild moment assumptions, and allows for high-dimensional regimes where the feature dimension grows with the sample size. As a practical case study, we employ ECV to predict surface protein abundances from gene expressions in single-cell multiomics using random forests. In comparison to sample-split cross-validation and $K$-fold cross-validation, ECV achieves higher accuracy by avoiding sample splitting. At the same time, its computational cost is considerably lower owing to the use of the risk extrapolation technique. Additional numerical results validate the finite-sample accuracy of ECV for several common ensemble predictors under a computational constraint on the maximum ensemble size.
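The risk extrapolation idea can be illustrated with a minimal sketch. It assumes the squared prediction risk of an average of M exchangeable randomized predictors has the form R_M = a + b / M, so that risk estimates at ensemble sizes 1 and 2 determine the whole curve. The function names, the tolerance rule, and the input values are hypothetical, and the out-of-bag estimation step described in the abstract is not shown.

```python
def extrapolate_risk(risk_1: float, risk_2: float, ensemble_size: int) -> float:
    """Extrapolate squared risk to a given ensemble size.

    Assumes R_M = a + b / M, so that a = 2 * R_2 - R_1 and b = 2 * (R_1 - R_2).
    risk_1 and risk_2 are risk estimates for ensembles of size 1 and 2.
    """
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    limit_risk = 2.0 * risk_2 - risk_1   # a: risk of the infinite ensemble
    excess = 2.0 * (risk_1 - risk_2)     # b: excess risk that decays as 1 / M
    return limit_risk + excess / ensemble_size


def smallest_size_within(risk_1: float, risk_2: float,
                         delta: float, max_size: int) -> int:
    """Smallest M <= max_size whose extrapolated risk is within delta of the limit.

    Falls back to max_size if no size meets the tolerance (a hypothetical rule).
    """
    limit_risk = 2.0 * risk_2 - risk_1
    for size in range(1, max_size + 1):
        if extrapolate_risk(risk_1, risk_2, size) <= limit_risk + delta:
            return size
    return max_size


if __name__ == "__main__":
    # Illustrative inputs only, not results from the paper.
    print(extrapolate_risk(1.0, 0.75, 4))
    print(smallest_size_within(1.0, 0.75, delta=0.06, max_size=100))
```

Because only small ensembles need to be fitted before extrapolating, a sketch like this avoids training the largest candidate ensemble, which is consistent with the lower computational cost the abstract reports.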

December 15, 2023

Spectral estimation for spatial point processes and random fields

Jake P. Grainger, Tuomas A. Rajala, David J. Murrell, Sofia C. Olhede

Spatial data can come in a variety of different forms, but two of the most common generating models for such observations are random fields and point processes. Whilst it is known that spectral analysis can unify these two different data forms, specific methodology for the related estimation is yet to be developed. In this paper, we solve this problem by extending multitaper estimation to estimate the spectral density matrix function for multivariate spatial data, where processes can be any combination of either point processes or random fields. We discuss finite sample and asymptotic theory for the proposed estimators, as well as specific details on the implementation, including how to perform estimation on non-rectangular domains and the correct implementation of multitapering for processes sampled in different ways, e.g. continuously vs on a regular grid.

December 15, 2023

Debiased inference for a covariate-adjusted regression function

Kenta Takatsu, Ted Westling

In this article, we study nonparametric inference for a covariate-adjusted regression function. This parameter captures the average association between a continuous exposure and an outcome after adjusting for other covariates. In particular, under certain causal conditions, this parameter corresponds to the average outcome had all units been assigned to a specific exposure level, known as the causal dose-response curve. We propose a debiased local linear estimator of the covariate-adjusted regression function, and demonstrate that our estimator converges pointwise to a mean-zero normal limit distribution. We use this result to construct asymptotically valid confidence intervals for function values and differences thereof. In addition, we use approximation results for the distribution of the supremum of an empirical process to construct asymptotically valid uniform confidence bands. Our methods do not require undersmoothing, permit the use of data-adaptive estimators of nuisance functions, and our estimator attains the optimal rate of convergence for a twice differentiable function. We illustrate the practical performance of our estimator using numerical studies and an analysis of the effect of air pollution exposure on cardiovascular mortality.
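As background for the estimator named above, here is a minimal sketch of a plain local linear smoother with a Gaussian kernel. It is not the authors' debiased, covariate-adjusted estimator: there is no covariate adjustment, no nuisance estimation, and no bias correction. The function name, the kernel choice, and the bandwidth are assumptions made for illustration.

```python
import math
from typing import Sequence


def local_linear(x0: float, xs: Sequence[float], ys: Sequence[float],
                 bandwidth: float) -> float:
    """Local linear estimate of E[Y | X = x0] using a Gaussian kernel.

    Fits a kernel-weighted least-squares line in (x - x0) and returns
    its intercept, which is the fitted value at x0.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = math.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel weight
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    det = s0 * s2 - s1 * s1
    if det == 0.0:
        raise ValueError("degenerate design: need at least two distinct x values")
    # Intercept of the weighted least-squares fit (Cramer's rule on the 2x2 system).
    return (s2 * t0 - s1 * t1) / det


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2.0 * x + 1.0 for x in xs]  # exactly linear toy data
    print(local_linear(1.5, xs, ys, bandwidth=1.0))
```

A local linear fit reproduces exactly linear data regardless of bandwidth, which makes the toy example a convenient sanity check.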

December 15, 2023

Stochastic interpolants with data-dependent couplings

Michael S. Albergo, Mark Goldstein, Nicholas M. Boffi, Rajesh Ranganath, Eric Vanden-Eijnden

Generative models inspired by dynamical transport of measure -- such as flows and diffusions -- construct a continuous-time map between two probability densities. Conventionally, one of these is the target density, only accessible through samples, while the other is taken as a simple base density that is data-agnostic. In this work, using the framework of stochastic interpolants, we formalize how to couple the base and the target densities, whereby samples from the base are computed conditionally given samples from the target in a way that is different from (but does not preclude) incorporating information about class labels or continuous embeddings. This enables us to construct dynamical transport maps that serve as conditional generative models. We show that these transport maps can be learned by solving a simple square loss regression problem analogous to the standard independent setting. We demonstrate the usefulness of constructing dependent couplings in practice through experiments in super-resolution and in-painting.
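The square loss regression mentioned above can be written out in a simplified form. The notation below is an assumption for illustration and omits any latent noise term: $x_1$ is a target sample, $x_0$ is a base sample drawn conditionally on $x_1$ (the data-dependent coupling), and $\alpha, \beta$ are differentiable schedules on $[0, 1]$.

```latex
% Coupling: draw the target first, then the base conditionally on it
(x_0, x_1) \sim \rho(x_0, x_1) = \rho_1(x_1)\, \rho_0(x_0 \mid x_1)

% Interpolant between base and target
x_t = \alpha(t)\, x_0 + \beta(t)\, x_1,
\qquad \alpha(0) = \beta(1) = 1, \quad \alpha(1) = \beta(0) = 0

% Square loss regression for the velocity field b
\mathcal{L}(b) = \int_0^1 \mathbb{E}_{(x_0, x_1) \sim \rho}
  \left\| b_t(x_t) - \bigl( \dot{\alpha}(t)\, x_0 + \dot{\beta}(t)\, x_1 \bigr) \right\|^2 \, dt
```

In the independent setting, $\rho_0(x_0 \mid x_1)$ reduces to a fixed base density $\rho_0(x_0)$ and the objective keeps the same form, which is the sense in which the two problems are analogous.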

December 15, 2023

Extreme value methods for estimating rare events in Utopia

L. M. André, R. Campbell, E. D'Arcy, A. Farrell, D. Healy, L. Kakampakou, C. Murphy, C. J. R. Murphy-Barltrop, M. Speers

from arxiv, 37 pages

To capture the extremal behaviour of complex environmental phenomena in practice, flexible techniques for modelling tail behaviour are required. In this paper, we introduce a variety of such methods, which were used by the Lancopula Utopiversity team to tackle the data challenge of the 2023 Extreme Value Analysis Conference. This data challenge was split into four sections, labelled C1-C4. Challenges C1 and C2 comprise univariate problems, where the goal is to estimate extreme quantiles for a non-stationary time series exhibiting several complex features. We propose a flexible modelling technique, based on generalised additive models, with diagnostics indicating generally good performance for the observed data. Challenges C3 and C4 concern multivariate problems where the focus is on estimating joint extremal probabilities. For challenge C3, we propose an extension of available models in the multivariate literature and use this framework to estimate extreme probabilities in the presence of non-stationary dependence. Finally, for challenge C4, which concerns a 50-dimensional random vector, we employ a clustering technique to achieve dimension reduction and use a conditional modelling approach to estimate extremal probabilities across independent groups of variables.

December 15, 2023

The cost of artificial latency in the PBS context

Umberto Natale, Michael Moser

We present a comprehensive analysis of the implications of artificial latency in the Proposer-Builder Separation framework on the Ethereum network. Focusing on the MEV-Boost auction system, we analyze how strategic latency manipulation affects Maximum Extractable Value yields and network integrity. Our findings reveal both increased profitability for node operators and significant systemic challenges, including heightened network inefficiencies and centralization risks. We empirically validate these insights with a pilot that Chorus One has been operating on Ethereum mainnet. We demonstrate the nuanced effects of latency on bid selection and validator dynamics. Ultimately, this research underscores the need for balanced strategies that optimize Maximum Extractable Value capture while preserving the Ethereum network's decentralization ethos.

December 14, 2023

Stein estimation in a multivariate setting

Adrian Fischer, Robert E. Gaunt, Yvik Swan

from arxiv, 19 pages

We use Stein characterisations to derive new moment-type estimators for the parameters of several multivariate distributions in the i.i.d. case; we also derive the asymptotic properties of these estimators. Our examples include the multivariate truncated normal distribution and several spherical distributions. The estimators are explicit and therefore provide an interesting alternative to the maximum-likelihood estimator. The quality of these estimators is assessed through competitive simulation studies in which we compare their behaviour to the performance of other estimators available in the literature.