While methods for measuring and correcting differential performance in risk prediction models have proliferated in recent years, most existing techniques can only be used to assess fairness across relatively large subgroups. The purpose of algorithmic fairness efforts is often to redress discrimination against groups that are both marginalized and small, so this sample-size limitation often prevents existing techniques from accomplishing their main aim. We take a three-pronged approach to the problem of quantifying fairness with small subgroups. First, we propose new estimands built on the "counterfactual fairness" framework that leverage information across groups. Second, we estimate these quantities using a larger volume of data than existing techniques. Finally, we propose a novel data-borrowing approach that incorporates "external data" lacking outcomes and predictions but containing covariate and group-membership information. This less stringent requirement on the external data opens up more possibilities for external data sources. We demonstrate the practical application of our estimators on a risk prediction model used by a major Midwestern health system during the COVID-19 pandemic.
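To illustrate just the borrowing-strength idea on which small-subgroup estimation relies (not the counterfactual-fairness estimands or the external-data scheme described above), here is a minimal Python sketch that shrinks per-group event-rate estimates toward the pooled rate; the function name and the `prior_strength` pseudo-count are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def shrunken_group_rates(df, group_col, event_col, prior_strength=20.0):
    """Per-group event rates shrunk toward the pooled rate.

    Illustrative only: small groups are pulled strongly toward the pooled
    estimate, large groups barely move. `prior_strength` is a hypothetical
    pseudo-count controlling how much information is borrowed across groups.
    """
    pooled = df[event_col].mean()
    rates = {}
    for g, sub in df.groupby(group_col):
        rates[g] = (sub[event_col].sum() + prior_strength * pooled) / (len(sub) + prior_strength)
    return pd.Series(rates, name="shrunken_rate")

# toy usage with one deliberately small subgroup
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": rng.choice(["A", "B", "C"], size=500, p=[0.80, 0.15, 0.05]),
    "flagged": rng.integers(0, 2, size=500),
})
print(shrunken_group_rates(df, "group", "flagged"))
```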

Related Content

In the present study, the efficiency of preconditioners for solving linear systems associated with the discretized variable-density incompressible Navier-Stokes equations with semi-implicit second-order accuracy in time and spectral accuracy in space is investigated. The method, in which the inverse operator for the constant-density flow system acts as the preconditioner, is implemented for three iterative solvers: the Generalized Minimal Residual, the Conjugate Gradient, and the Richardson Minimal Residual methods. We discuss the method first in the context of the one-dimensional flow case, where a top-hat-like profile for the density is used. Numerical evidence shows that convergence is significantly improved due to the notable decrease in the condition number of the operators. Most importantly, we then validate the robustness and convergence properties of the method on two more realistic problems: the two-dimensional Rayleigh-Taylor instability problem and the three-dimensional variable-density swirling jet.
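As a toy illustration of the preconditioning idea (a second-order finite-difference stand-in for a 1D variable-coefficient pressure problem, not the spectral discretization described above), the SciPy sketch below uses an exact factorization of the constant-density operator as a preconditioner for GMRES on the variable-density operator; the grid size, density profile, and function names are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def var_coeff_laplacian(beta, h):
    """Assemble -(d/dx)(beta dp/dx) with homogeneous Dirichlet BCs on n interior nodes."""
    n = len(beta)
    bf = np.empty(n + 1)
    bf[1:-1] = 0.5 * (beta[:-1] + beta[1:])      # coefficient on interior faces
    bf[0], bf[-1] = beta[0], beta[-1]            # boundary faces
    main = (bf[:-1] + bf[1:]) / h**2
    off = -bf[1:-1] / h**2
    return sp.diags([off, main, off], [-1, 0, 1], format="csc")

n = 200
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
rho = np.where(np.abs(x - 0.5) < 0.2, 3.0, 1.0)  # top-hat-like density profile

A_var = var_coeff_laplacian(1.0 / rho, h)        # variable-density operator
A_const = var_coeff_laplacian(np.ones(n), h)     # constant-density operator
lu = spla.splu(A_const)                          # its exact inverse, used as preconditioner
M = spla.LinearOperator(A_var.shape, matvec=lu.solve)

b = np.sin(np.pi * x)
iters = {"plain": 0, "preconditioned": 0}

def make_counter(key):
    def cb(_):
        iters[key] += 1
    return cb

spla.gmres(A_var, b, callback=make_counter("plain"), callback_type="pr_norm")
spla.gmres(A_var, b, M=M, callback=make_counter("preconditioned"), callback_type="pr_norm")
print(iters)   # the preconditioned solve should need far fewer iterations
```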

In decision-making, maxitive functions are used for worst-case and best-case evaluations. Maxitivity gives rise to a rich structure that is well-studied in the context of the pointwise order. In this article, we investigate maxitivity with respect to general preorders and provide a representation theorem for such functionals. The results are illustrated for different stochastic orders in the literature, including the usual stochastic order, the increasing convex/concave order, and the dispersive order.
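For orientation, maxitivity with respect to the classical pointwise order, together with the usual stochastic order mentioned above, can be written as
$$\phi(f \vee g) \;=\; \max\{\phi(f),\,\phi(g)\} \quad \text{for all } f, g, \qquad (f \vee g)(x) := \max\{f(x), g(x)\},$$
$$X \le_{\mathrm{st}} Y \quad\Longleftrightarrow\quad \mathbb{P}(X > t) \le \mathbb{P}(Y > t) \ \text{for all } t \in \mathbb{R};$$
the article studies the analogous property when the pointwise order is replaced by a general preorder, which is not reproduced here.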

Generative models for multimodal data permit the identification of latent factors that may be associated with important determinants of observed data heterogeneity. Common or shared factors could be important for explaining variation across modalities, whereas other factors may be private and important only for the explanation of a single modality. Multimodal Variational Autoencoders, such as MVAE and MMVAE, are a natural choice for inferring those underlying latent factors and separating shared variation from private variation. In this work, we investigate their capability to reliably perform this disentanglement. In particular, we highlight a challenging problem setting where modality-specific variation dominates the shared signal. Taking a cross-modal prediction perspective, we demonstrate limitations of existing models and propose a modification that makes them more robust to modality-specific variation. Our findings are supported by experiments on synthetic data as well as various real-world multi-omics data sets.
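As a point of reference, the shared-posterior fusion used by MVAE-style models is a product of Gaussian experts; the PyTorch sketch below shows that fusion step only (the robustness modification proposed above is not reproduced), and the tensor shapes are illustrative.

```python
import torch

def product_of_experts(mus, logvars):
    """Fuse per-modality Gaussian posteriors q(z|x_m) into a joint Gaussian.

    mus, logvars: lists of (batch, latent_dim) tensors, one entry per observed
    modality. A standard-normal "prior expert" is prepended, so any subset of
    modalities can be fused (useful for cross-modal prediction).
    """
    mus = torch.stack([torch.zeros_like(mus[0])] + list(mus))
    logvars = torch.stack([torch.zeros_like(logvars[0])] + list(logvars))
    precisions = torch.exp(-logvars)
    joint_var = 1.0 / precisions.sum(dim=0)          # precision-weighted fusion
    joint_mu = joint_var * (precisions * mus).sum(dim=0)
    return joint_mu, torch.log(joint_var)

# toy usage: two modalities, 4 samples, 8 latent dimensions
mu1, lv1 = torch.randn(4, 8), torch.zeros(4, 8)
mu2, lv2 = torch.randn(4, 8), torch.zeros(4, 8)
joint_mu, joint_logvar = product_of_experts([mu1, mu2], [lv1, lv2])
print(joint_mu.shape, joint_logvar.shape)   # torch.Size([4, 8]) twice
```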

Treatment-covariate interaction tests are commonly applied by researchers to examine whether the treatment effect varies across patient subgroups defined by baseline characteristics. The objective of this study is to explore treatment-covariate interaction tests involving covariate-adaptive randomization. Without assuming a parametric data-generating model, we investigate usual interaction tests and observe that they tend to be conservative: specifically, their limiting rejection probabilities under the null hypothesis do not exceed the nominal level and are typically strictly below it. To address this problem, we propose modifications to the usual tests to obtain corresponding valid tests. Moreover, we introduce a novel class of stratified-adjusted interaction tests that are simple, more powerful than the usual and modified tests, and broadly applicable to most covariate-adaptive randomization methods. The results are general, encompassing two types of interaction tests: one involving stratification covariates and the other involving additional covariates that are not used for randomization. Our study clarifies the application of interaction tests in clinical trials and offers valuable tools for revealing treatment heterogeneity, which is crucial for advancing personalized medicine.
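For concreteness, the "usual" interaction test can be run as a Wald test on the interaction coefficient of a regression model, as in the hedged statsmodels sketch below; it uses simple randomization and heteroskedasticity-robust standard errors, not a covariate-adaptive scheme and not the modified or stratified-adjusted tests proposed above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "treat": rng.integers(0, 2, size=n),      # simple (not covariate-adaptive) randomization
    "x": rng.normal(size=n),                  # baseline covariate
})
# outcome with a treatment effect but no true treatment-by-covariate interaction
df["y"] = 1.0 + 0.5 * df["treat"] + 0.3 * df["x"] + rng.normal(size=n)

# the "usual" interaction test: Wald test of the treat:x coefficient
fit = smf.ols("y ~ treat * x", data=df).fit(cov_type="HC3")
print("interaction estimate:", fit.params["treat:x"])
print("interaction p-value: ", fit.pvalues["treat:x"])
```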

The maximum likelihood estimator for mixtures of elliptically symmetric distributions is shown to be consistent for its population version, where the underlying distribution $P$ is nonparametric and does not necessarily belong to the class of mixtures on which the estimator is based. In the situation where $P$ is a mixture of sufficiently well-separated but nonparametric distributions, it is shown that the components of the population version of the estimator correspond to the well-separated components of $P$. This provides some theoretical justification for the use of such estimators for cluster analysis in the case that $P$ has well-separated subpopulations, even if these subpopulations differ from what the mixture model assumes.
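A small numerical illustration of the practical message (using Gaussian components via scikit-learn rather than general elliptically symmetric ones): a misspecified mixture fit to two well-separated non-Gaussian subpopulations still assigns one component to each.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# two well-separated subpopulations, neither of which is Gaussian (uniform boxes)
X = np.vstack([
    rng.uniform(-1.0, 1.0, size=(300, 2)),
    rng.uniform(9.0, 11.0, size=(300, 2)),
])

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gm.predict(X)

# despite the model misspecification, each fitted component captures one subpopulation
print(np.bincount(labels[:300], minlength=2))
print(np.bincount(labels[300:], minlength=2))
```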

Electrical circuits are present in a variety of technologies, making their design an important part of computer-aided engineering. The growing number of parameters that affect the final design leads to a need for new approaches to quantify their impact. Machine learning may play a key role in this regard; however, current approaches often make suboptimal use of existing knowledge about the system at hand. For circuits, the description via modified nodal analysis is well understood. This particular formulation leads to systems of differential-algebraic equations (DAEs), which bring with them a number of peculiarities, e.g., hidden constraints that the solution needs to fulfill. We use the recently introduced dissection index, which can decouple a given system of DAEs into ordinary differential equations depending only on the differential variables and purely algebraic equations describing the relations between the differential and algebraic variables. The idea is then to learn only the differential variables and to reconstruct the algebraic ones using the relations from the decoupling. This approach guarantees that the algebraic constraints are fulfilled up to the accuracy of the nonlinear system solver, and it may also reduce the learning effort, as only the differential variables need to be learned.
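The learn-then-reconstruct idea can be mimicked on a toy semi-explicit DAE: a hypothetical learned surrogate supplies the differential variable, and the algebraic variable is recovered by a nonlinear solve of the algebraic relation, so the constraint holds up to the solver tolerance. The dissection-index machinery itself and the circuit-level MNA equations are not reproduced here.

```python
import numpy as np
from scipy.optimize import fsolve

# Toy semi-explicit DAE:  x' = -x + z,   0 = z - tanh(x).
# Only the differential variable x is "learned"; the algebraic variable z is
# reconstructed from the algebraic relation, as in the decoupled formulation.

def algebraic_constraint(z, x):
    return z - np.tanh(x)                     # g(x, z) = 0

def x_surrogate(t):
    # hypothetical learned approximation of the differential variable
    return np.exp(-0.5 * t)

t_grid = np.linspace(0.0, 5.0, 11)
x_hat = x_surrogate(t_grid)
z_hat = np.array([fsolve(algebraic_constraint, x0=0.0, args=(xv,))[0] for xv in x_hat])

# the constraint is satisfied up to the accuracy of the nonlinear solver
print(np.max(np.abs(algebraic_constraint(z_hat, x_hat))))
```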

This article is concerned with the multilevel Monte Carlo (MLMC) methods for approximating expectations of some functions of the solution to the Heston 3/2-model from mathematical finance, which takes values in $(0, \infty)$ and possesses superlinearly growing drift and diffusion coefficients. To discretize the SDE model, a new Milstein-type scheme is proposed to produce independent sample paths. The proposed scheme can be explicitly solved and is positivity-preserving unconditionally, i.e., for any time step-size $h>0$. This positivity-preserving property for large discretization time steps is particularly desirable in the MLMC setting. Furthermore, a mean-square convergence rate of order one is proved in the non-globally Lipschitz regime, which is not trivial, as the diffusion coefficient grows superlinearly. The obtained order-one convergence in turn ensures the desired variance decay of the multilevel estimator and justifies the optimal complexity $\mathcal{O}(\epsilon^{-2})$ of the MLMC approach, where $\epsilon > 0$ is the required target accuracy. Numerical experiments are finally reported to confirm the theoretical findings.
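To make the multilevel structure concrete, the sketch below implements a fixed-level MLMC estimator for the 3/2-model, commonly written $dV_t = \kappa V_t(\theta - V_t)\,dt + \sigma V_t^{3/2}\,dW_t$; the level sampler uses a crude clipped Euler step purely as a placeholder, not the positivity-preserving Milstein-type scheme proposed above (which is what delivers the order-one coupling underlying the $\mathcal{O}(\epsilon^{-2})$ complexity), and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
kappa, theta, sigma, v0, T = 2.0, 1.5, 0.2, 1.0, 1.0

def level_estimator(l, n_samples, M=2, h0=0.25):
    """Mean of the coupled fine/coarse difference f(V_T^fine) - f(V_T^coarse) on level l.

    The one-step update is a clipped Euler placeholder, NOT the
    positivity-preserving Milstein-type scheme referred to above.
    """
    hf = h0 * M ** (-l)
    n_steps = round(T / hf)
    vf = np.full(n_samples, v0)
    vc = np.full(n_samples, v0)
    dw_coarse = np.zeros(n_samples)
    for i in range(n_steps):
        dw = rng.normal(scale=np.sqrt(hf), size=n_samples)
        vf = np.maximum(vf + kappa * vf * (theta - vf) * hf + sigma * vf**1.5 * dw, 1e-8)
        dw_coarse += dw                              # reuse Brownian increments on the coarse path
        if l > 0 and (i + 1) % M == 0:
            hc = M * hf
            vc = np.maximum(vc + kappa * vc * (theta - vc) * hc + sigma * vc**1.5 * dw_coarse, 1e-8)
            dw_coarse = np.zeros(n_samples)
    payoff_f = np.minimum(vf, 2.0)                   # a bounded payoff f(V_T)
    payoff_c = np.minimum(vc, 2.0) if l > 0 else 0.0
    return np.mean(payoff_f - payoff_c)

# fixed levels and sample sizes for illustration; a full MLMC implementation
# chooses both adaptively from estimated level variances
estimate = sum(level_estimator(l, 20000) for l in range(5))
print("MLMC estimate of E[f(V_T)]:", estimate)
```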

Charts, figures, and text derived from data play an important role in decision making, from data-driven policy development to day-to-day choices informed by online articles. Making sense of, or fact-checking, outputs means understanding how they relate to the underlying data. Even for domain experts with access to the source code and data sets, this poses a significant challenge. In this paper we introduce a new program analysis framework which supports interactive exploration of fine-grained I/O relationships directly through computed outputs, making use of dynamic dependence graphs. Our main contribution is a novel notion in data provenance which we call related inputs, a relation of mutual relevance or "cognacy" which arises between inputs when they contribute to common features of the output. Queries of this form allow readers to ask questions like "What outputs use this data element, and what other data elements are used along with it?". We show how Jonsson and Tarski's concept of conjugate operators on Boolean algebras appropriately characterises the notion of cognacy in a dependence graph, and give a procedure for computing related inputs over such a graph.
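A drastically simplified rendering of a related-inputs query, treating "cognacy" as plain reachability to a common output in a dependence graph rather than via the conjugate-operator characterisation described above, might look like this:

```python
from collections import deque

def reachable(graph, starts):
    """All nodes reachable from `starts` in a dependence graph given as an adjacency dict."""
    seen, frontier = set(starts), deque(starts)
    while frontier:
        for nxt in graph.get(frontier.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def related_inputs(graph, inputs, query):
    """Inputs (other than `query`) that feed into some output also influenced by `query`."""
    hit_outputs = {n for n in reachable(graph, query) if not graph.get(n)}   # sinks reached
    return {i for i in inputs - query if reachable(graph, {i}) & hit_outputs}

# toy dependence graph: data cells -> intermediate values -> chart features
g = {"a": ["sum"], "b": ["sum"], "c": ["max"], "sum": ["bar1"], "max": ["bar2"]}
print(related_inputs(g, inputs={"a", "b", "c"}, query={"a"}))   # expected: {'b'}
```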

In many application settings, the data have missing entries, which makes analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and testing data. We show the consistency of two approaches to prediction. A striking result is that the widely used method of imputing with a constant, such as the mean, prior to learning is consistent when missing values are not informative. This contrasts with inferential settings, where mean imputation is criticized for distorting the distribution of the data. That such a simple approach can be consistent is important in practice. We also show that a predictor suited for complete observations can predict optimally on incomplete data through multiple imputation. Finally, to compare imputation with learning directly with a model that accounts for missing values, we further analyze decision trees. These can naturally tackle empirical risk minimization with missing values, due to their ability to handle the half-discrete nature of incomplete variables. After comparing different missing-values strategies in trees theoretically and empirically, we recommend using the "missing incorporated in attribute" method, as it can handle both non-informative and informative missing values.
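As a concrete point of comparison (a scikit-learn sketch under missingness that is completely at random, not the paper's experiments): constant (mean) imputation before learning versus a histogram gradient-boosting model that routes missing values inside its splits, in the spirit of "missing incorporated in attribute".

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
X, y = make_regression(n_samples=2000, n_features=10, noise=5.0, random_state=0)
X[rng.random(X.shape) < 0.2] = np.nan          # 20% of entries missing completely at random
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# (a) constant (mean) imputation before learning, consistent here since the
#     missingness is not informative
mean_imputed = make_pipeline(SimpleImputer(strategy="mean"), HistGradientBoostingRegressor())
# (b) trees handling NaN natively: scikit-learn's histogram GBDT decides per split
#     which side missing values go to, an MIA-style strategy
native_nan = HistGradientBoostingRegressor()

for name, model in [("mean imputation", mean_imputed), ("native NaN handling", native_nan)]:
    model.fit(Xtr, ytr)
    print(name, r2_score(yte, model.predict(Xte)))
```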

One of the main challenges in interpreting black-box models is uniquely decomposing square-integrable functions of non-independent random inputs into a sum of functions of every possible subset of variables. However, dealing with dependencies among inputs can be complicated. We propose a novel framework to study this problem, linking three domains of mathematics: probability theory, functional analysis, and combinatorics. We show that, under two reasonable assumptions on the inputs (non-perfect functional dependence and non-degenerate stochastic dependence), it is always possible to decompose such a function uniquely. This generalizes the well-known Hoeffding decomposition. The elements of this decomposition can be expressed using oblique projections and yield novel interpretability indices for evaluation and variance-decomposition purposes. The properties of these novel indices are studied and discussed. This generalization offers a path towards more precise uncertainty quantification, which can benefit sensitivity analysis and interpretability studies whenever the inputs are dependent. The decomposition is illustrated analytically, and the challenges of adopting these results in practice are discussed.
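For orientation, the classical Hoeffding (functional ANOVA) decomposition, stated here for independent inputs, is
$$f(X_1,\dots,X_d) \;=\; \sum_{A \subseteq \{1,\dots,d\}} f_A(X_A), \qquad f_A(X_A) \;=\; \sum_{B \subseteq A} (-1)^{|A \setminus B|}\, \mathbb{E}\!\left[f(X) \mid X_B\right],$$
with pairwise orthogonal summands, so that $\operatorname{Var} f(X) = \sum_{A \neq \emptyset} \operatorname{Var} f_A(X_A)$ (the basis of Sobol' indices); the work above establishes existence and uniqueness of an analogous decomposition for dependent inputs, with summands obtained via oblique rather than orthogonal projections.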
