Classification, a heavily studied data-driven machine learning task, drives an increasing number of prediction systems involving critical human decisions such as loan approval and criminal risk assessment. However, classifiers often demonstrate discriminatory behavior, especially when presented with biased data. Consequently, fairness in classification has emerged as a high-priority research area. Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness, including fair classification. The interdisciplinary efforts in fair classification, with machine learning research having the largest presence, have resulted in a large number of fairness notions and a wide range of approaches that have not been systematically evaluated and compared. In this paper, we contribute a broad analysis of 13 fair classification approaches and additional variants, covering their correctness, fairness, efficiency, scalability, robustness to data errors, sensitivity to the underlying ML model, data efficiency, and stability, using a variety of metrics and real-world datasets. Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance. We also discuss general principles for choosing approaches suitable for different practical settings, and identify areas where data-management-centric solutions are likely to have the most impact.
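As a concrete illustration of the kind of group fairness metric such an evaluation relies on, the sketch below computes the demographic (statistical) parity difference between two protected groups; it is a minimal example with toy data, not the paper's evaluation code.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Absolute difference in positive-prediction rates between two groups.

    y_pred : binary predictions (0/1), shape (n,)
    group  : binary protected-group membership (0/1), shape (n,)
    """
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return abs(rate_a - rate_b)

# Toy usage: a classifier whose positive rate differs across groups shows a gap.
y_pred = np.array([1, 0, 0, 1, 1, 1, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_difference(y_pred, group))  # 0.25
```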
In 5G and beyond, newly emerging services, such as edge computing/intelligence services, may demand the provision of heterogeneous communications, computing, and storage (CCS) resources on and across network entities that are multiple hops apart. In such cases, traditional resource-oriented auction schemes, where buyers place bids on resources, may not be effective in providing end-to-end (E2E) quality-of-service (QoS) guarantees. To overcome these limitations, in this article we coin the concept of the E2E service auction, where the auction commodities are E2E services rather than specific resources. Under this framework, buyers simply bid for services with E2E QoS requirements without having to know the inner workings (i.e., which resources stand behind them). To guarantee E2E QoS for winning bids while ensuring essential economic properties, E2E service auctions require addressing the joint problem of network optimization and auction design under both economic and QoS constraints. To substantiate the mechanism design, we illustrate how to devise E2E service auctions for edge computing systems under various scenarios. We also identify research opportunities in E2E service auction mechanism design for other critical use cases, including edge intelligence.
Policy evaluation based on A/B testing has attracted considerable interest in digital marketing, but such evaluation in ride-sourcing platforms (e.g., Uber and Didi) is not well studied, primarily due to the complex structure of their temporally and/or spatially dependent experiments. Motivated by policy evaluation in ride-sourcing platforms, this paper aims to establish the causal relationship between a platform's policies and outcomes of interest under a switchback design. We propose a novel potential-outcome framework based on a temporal varying coefficient decision process (VCDP) model to capture dynamic treatment effects in temporally dependent experiments. We further characterize the average treatment effect by decomposing it into the sum of a direct effect (DE) and an indirect effect (IE), and we develop estimation and inference procedures for both. Furthermore, we propose a spatio-temporal VCDP to handle spatio-temporally dependent experiments. For both VCDP models, we establish the statistical properties (e.g., weak convergence and asymptotic power) of our estimation and inference procedures. We conduct extensive simulations to investigate the finite-sample performance of these procedures, and we examine how our VCDP models can help improve policy evaluation for various dispatching and dispositioning policies at Didi.
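In schematic form, and using generic potential-outcome notation that need not match the paper's exact definitions, the decomposition reads:

```latex
% Schematic DE + IE decomposition; generic notation, not necessarily the paper's exact definitions.
\[
\mathrm{ATE}
\;=\;
\underbrace{\mathbb{E}\bigl[Y_t(1, S_t(0)) - Y_t(0, S_t(0))\bigr]}_{\text{direct effect (DE)}}
\;+\;
\underbrace{\mathbb{E}\bigl[Y_t(1, S_t(1)) - Y_t(1, S_t(0))\bigr]}_{\text{indirect effect (IE)}},
\]
where $Y_t(a, s)$ denotes the potential outcome at time $t$ under current treatment $a$
and treatment-dependent system state $s$, so that the two terms sum to
$\mathbb{E}[Y_t(1, S_t(1)) - Y_t(0, S_t(0))]$, the average treatment effect.
```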
Bayesian nonparametric methods are a popular choice for analysing survival data due to their ability to flexibly model the distribution of survival times. These methods typically employ a nonparametric prior on the survival function that is conjugate with respect to right-censored data. Eliciting these priors, particularly in the presence of covariates, can be challenging, and inference typically relies on computationally intensive Markov chain Monte Carlo schemes. In this paper, we build on recent work that recasts Bayesian inference as assigning a predictive distribution to the unseen values of a population conditional on the observed samples, thus avoiding the need to specify a complex prior. We describe a copula-based predictive update that admits a scalable sequential importance sampling algorithm for inference that properly accounts for right-censoring. We provide theoretical justification through an extension of Doob's consistency theorem and illustrate the method on a number of simulated and real data sets, including an example with covariates. Our approach enables analysts to perform Bayesian nonparametric inference through the specification of a predictive distribution alone.
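For orientation, one generic form that a copula-based predictive update can take (written here for fully observed data; handling right-censoring is the contribution described above) is:

```latex
% Generic (uncensored) copula predictive update; illustrative notation only.
\[
p_i(y) \;=\; c_{\rho_i}\!\bigl(P_{i-1}(y),\, P_{i-1}(y_i)\bigr)\, p_{i-1}(y),
\qquad
P_i(y) \;=\; \int_{-\infty}^{y} p_i(u)\,\mathrm{d}u,
\]
where $p_{i-1}$ and $P_{i-1}$ are the predictive density and CDF after $i-1$ observations,
$y_i$ is the new observation, and $c_{\rho_i}$ is a bivariate copula density (e.g., Gaussian)
whose dependence parameter $\rho_i$ decays with $i$ so that the sequence of predictives settles down.
```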
Deep learning has the potential to augment several clinically useful aspects of the radiologist's workflow, such as medical imaging interpretation. However, the translation of deep learning algorithms into clinical practice has been hindered by the relative lack of transparency in these algorithms compared to more traditional statistical methods. Specifically, common deep learning models lack intuitive and rigorous methods of conveying prediction confidence in a calibrated manner, which ultimately restricts widespread use of these "black box" systems for critical decision-making. Furthermore, numerous demonstrations of algorithmic bias in clinical machine learning have caused considerable hesitancy towards the deployment of these models in clinical applications. To this end, we explore how conformal prediction can complement existing deep learning approaches by providing an intuitive way of expressing model uncertainty, facilitating greater transparency for clinical users. In this paper, we conduct field interviews with radiologists to assess potential use cases for conformal predictors. Using insights collected from these interviews, we devise two use cases and empirically evaluate several conformal methods on a dermatology photography dataset for skin lesion classification. Additionally, we show how group conformal predictors adapt better to differences across patient skin tones for malignant skin lesions. We find our conformal predictors to be a promising and generally applicable approach to increasing clinical usability and trustworthiness -- hopefully facilitating better modes of collaboration between medical AI tools and their clinical users.
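As a minimal illustration of the underlying machinery, the sketch below implements split (inductive) conformal prediction sets for classification from held-out calibration probabilities; the interface and variable names are assumptions, not the paper's code.

```python
import numpy as np

def conformal_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split (inductive) conformal prediction for classification.

    cal_probs  : (n_cal, n_classes) predicted probabilities on a held-out calibration set
    cal_labels : (n_cal,) integer true labels for the calibration set
    test_probs : (n_test, n_classes) predicted probabilities for new images
    alpha      : target miscoverage rate (e.g., 0.1 for ~90% coverage)
    Returns a boolean (n_test, n_classes) matrix; True marks labels kept in the prediction set.
    """
    n = len(cal_labels)
    # Nonconformity score: one minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Conformal quantile with the finite-sample correction.
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    qhat = np.quantile(scores, min(q_level, 1.0), method="higher")
    # A label enters the set if its nonconformity score does not exceed the threshold.
    return (1.0 - test_probs) <= qhat
```

Group conformal predictors of the kind mentioned above can be obtained by running the same calibration separately within patient subgroups (e.g., skin-tone groups), so that each group receives its own threshold.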
The increasing interaction of industrial control systems (ICSs) with public networks and digital devices introduces new cyber threats to power systems and other critical infrastructure. Recent cyber-physical attacks such as Stuxnet and Irongate revealed unexpected ICS vulnerabilities and a need for improved security measures. Intrusion detection systems constitute a key security technology and typically monitor network data for malicious activities. However, a central characteristic of modern ICSs is the increasing interdependency of physical and cyber network processes. The integration of network and physical process data is therefore seen as a promising approach to improve the predictive capability of intrusion detection for ICSs by accounting for physical constraints and underlying process patterns. This work systematically assesses real-time cyber-physical intrusion detection and multiclass classification by comparing it with its purely network-based counterpart and by evaluating misclassifications and detection delay. Multiple supervised machine learning models are applied to a recent cyber-physical dataset describing various cyber attacks and physical faults on a generic ICS. A key finding is that integrating physical process data improves the detection and classification of all attack types. In addition, it enables the simultaneous handling of attacks and faults, paving the way for holistic cross-domain cause analysis.
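A minimal sketch of the evaluated setup, using synthetic data and hypothetical feature names rather than the actual dataset, is to train the same supervised model on network features alone and on fused cyber-physical features, then compare multiclass performance.

```python
# Illustrative sketch (synthetic data, hypothetical features): compare a purely
# network-based detector with one that fuses network and physical process data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 2000
X_network = rng.normal(size=(n, 3))    # e.g., packet rate, flow size, flag counts
X_physical = rng.normal(size=(n, 3))   # e.g., pressure, valve state, motor speed
# Synthetic labels: 0 = normal, 1 = cyber attack, 2 = physical fault; the fault
# class depends on the physical features, so feature fusion should help.
y = (X_network[:, 0] > 1).astype(int) + 2 * (X_physical[:, 0] > 1).astype(int)
y = np.clip(y, 0, 2)

for name, X in [("network-only", X_network),
                ("cyber-physical", np.hstack([X_network, X_physical]))]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))
```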
Explicit knowledge of total community-level immune seroprevalence is critical to developing policies that mitigate the social and clinical impact of SARS-CoV-2. Publicly available vaccination data are frequently cited as a proxy for population immunity, but this metric ignores the effects of naturally acquired immunity, which varies broadly throughout the country and world. Without broad or random sampling of the population, accurate measurement of persistent immunity after natural infection is generally unavailable. To enable tracking of both naturally acquired and vaccine-induced immunity, we set up a synthetic random proxy based on routine hospital testing for estimating total Immunoglobulin G (IgG) prevalence in the sampled community. Our approach analyzes viral IgG testing data from asymptomatic patients who present for elective procedures within a hospital system. We apply multilevel regression and poststratification to adjust for demographic and geographic discrepancies between the sample and the community population. We then apply state-based vaccination data to categorize immune status as driven by natural infection or by vaccination. We validated the model using verified clinical metrics of viral and symptomatic disease incidence, showing the expected biological correlation of these quantities with the timing, rate, and magnitude of seroprevalence. In mid-July 2021, the estimated immunity level was 74%, compared with an administered vaccination rate of 45%, in the two counties studied. The metric improves real-time understanding of immunity to COVID-19 as it evolves and supports the coordination of policy responses to the disease, pointing toward an inexpensive and easily operated surveillance system that transcends the limits of vaccination datasets alone.
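The poststratification step can be illustrated with a minimal sketch: reweight cell-level estimates by their census population shares. The cells, numbers, and column names below are hypothetical, not the study's data.

```python
# Minimal poststratification sketch (illustrative only, not the study's MRP model).
import pandas as pd

# Hypothetical cell-level estimates from the hospital sample, e.g. produced by a
# multilevel (partially pooled) regression over age x sex cells.
cells = pd.DataFrame({
    "cell":      ["18-39_F", "18-39_M", "40-64_F", "40-64_M", "65+_F", "65+_M"],
    "p_igg":     [0.62, 0.58, 0.51, 0.49, 0.44, 0.41],   # estimated IgG positivity per cell
    "pop_share": [0.18, 0.17, 0.20, 0.19, 0.14, 0.12],   # census share of the community
})

# Poststratified community estimate: population-weighted average of cell estimates.
seroprevalence = (cells["p_igg"] * cells["pop_share"]).sum() / cells["pop_share"].sum()
print(round(seroprevalence, 3))
```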
Ensemble methods based on subsampling, such as random forests, are popular in applications due to their high predictive accuracy. The existing literature views a random forest prediction as an infinite-order incomplete U-statistic in order to quantify its uncertainty. However, these methods assume a small subsample size for each tree, which is theoretically valid but practically limiting. This paper develops an unbiased variance estimator based on incomplete U-statistics that allows the subsample size of each tree to be comparable with the overall sample size, making statistical inference possible in a broader range of real applications. Simulation results demonstrate that our estimators enjoy lower bias and more accurate confidence-interval coverage without additional computational cost. We also propose a local smoothing procedure to reduce the variation of our estimator, which shows improved numerical performance when the number of trees is relatively small. Further, we investigate the ratio consistency of the proposed variance estimator under specific scenarios. In particular, we develop a new "double U-statistic" formulation to analyze the Hoeffding decomposition of the estimator's variance.
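For context, the classical approximation underlying the small-subsample analyses, written here in generic notation, is roughly:

```latex
% Classical small-subsample variance approximation (generic notation), which this work moves beyond.
\[
\operatorname{Var}\bigl(\hat{U}_{n,k,B}\bigr)
\;\approx\;
\frac{k^{2}}{n}\,\zeta_{1,k} \;+\; \frac{1}{B}\,\zeta_{k,k},
\]
where $k$ is the subsample size per tree, $n$ the overall sample size, $B$ the number of trees,
and $\zeta_{1,k}$, $\zeta_{k,k}$ are Hoeffding-type (co)variance terms; its validity requires $k$
to grow much more slowly than $n$, which is the restriction relaxed here.
```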
Experimental reproducibility and replicability are critical topics in machine learning, and authors have often raised concerns about their lack in scientific publications in an effort to improve the quality of the field. Recently, the field of graph representation learning has attracted the attention of a wide research community, resulting in a large stream of work. In particular, several Graph Neural Network (GNN) models have been developed to tackle graph classification effectively. However, experimental procedures often lack rigor and are hardly reproducible. Motivated by this, we provide an overview of common practices that should be avoided when comparing fairly against the state of the art. To counter this troubling trend, we ran more than 47,000 experiments in a controlled and uniform framework to re-evaluate five popular models across nine common benchmarks. Moreover, by comparing GNNs with structure-agnostic baselines, we provide convincing evidence that, on some datasets, structural information has not yet been exploited. We believe that this work can contribute to the development of the graph learning field by providing a much-needed grounding for rigorous evaluations of graph classification models.
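A structure-agnostic baseline of the kind referred to above can be as simple as the following sketch (an illustrative example, not the exact baseline used in the paper): discard the edges entirely and classify each graph from a permutation-invariant aggregate of its node features.

```python
# Illustrative structure-agnostic baseline: mean-pool node features, then apply an MLP.
import torch
import torch.nn as nn

class StructureAgnosticBaseline(nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, node_features):
        # node_features: (num_nodes, in_dim) for a single graph; edges are never used.
        graph_repr = node_features.mean(dim=0)      # permutation-invariant readout
        return self.mlp(graph_repr)

# Toy usage: a random 10-node graph with 7-dimensional node features, 3 classes.
model = StructureAgnosticBaseline(in_dim=7, hidden_dim=32, num_classes=3)
logits = model(torch.randn(10, 7))
print(logits.shape)  # torch.Size([3])
```

If a GNN cannot outperform such a baseline on a benchmark, that benchmark's structural information is arguably not being exploited.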
Algorithmic fairness has recently attracted considerable interest in the data mining and machine learning communities. So far, existing research has mostly focused on developing quantitative metrics that measure algorithmic disparities across different protected groups, and on approaches for adjusting algorithm outputs to reduce such disparities. In this paper, we study the problem of identifying the sources of model disparities. Unlike existing interpretation methods, which typically learn feature importance, we consider the causal relationships among feature variables and propose a novel framework that decomposes the disparity into a sum of contributions from fairness-aware causal paths, i.e., paths on the causal graph linking the sensitive attribute to the final prediction. We also consider the scenario in which the directions of certain edges within those paths cannot be determined. Our framework is model-agnostic and applicable to a variety of quantitative disparity measures. Empirical evaluations on both synthetic and real-world datasets show that our method provides precise and comprehensive explanations of model disparities.
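Schematically, and in generic notation rather than the paper's exact formulation, the decomposition has the form:

```latex
% Schematic path-based disparity decomposition (generic notation).
\[
\Delta\bigl(\hat{Y} \mid A\bigr) \;=\; \sum_{\pi \,\in\, \Pi_{A \to \hat{Y}}} \phi(\pi),
\]
where $\Delta(\hat{Y}\mid A)$ is a chosen quantitative disparity measure between the protected
groups defined by the sensitive attribute $A$, $\Pi_{A \to \hat{Y}}$ is the set of fairness-aware
causal paths on the graph linking $A$ to the prediction $\hat{Y}$, and $\phi(\pi)$ is the
contribution attributed to path $\pi$.
```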
Background: Social media can provide the healthcare industry with valuable feedback from patients who reveal and express their medical decision-making process, as well as self-reported quality-of-life indicators both during and after treatment. In prior work [Crannell et al.], we studied an active cancer patient population on Twitter and compiled a set of tweets describing their experience with this disease. We refer to these online public testimonies as "Invisible Patient Reported Outcomes" (iPROs) because they carry relevant indicators yet are difficult to capture by conventional means of self-report. Methods: Our present study aims to identify tweets related to the patient experience as an additional informative tool for monitoring public health. Using Twitter's public streaming API, we compiled over 5.3 million "breast cancer"-related tweets spanning September 2016 to mid-December 2017. We combined supervised machine learning methods with natural language processing to sift out tweets relevant to breast cancer patient experiences. We analyzed a sample of 845 breast cancer patient and survivor accounts, responsible for over 48,000 posts, and investigated tweet content with a hedonometric sentiment analysis to quantitatively extract emotionally charged topics. Results: We found that positive experiences were shared regarding patient treatment, raising support, and spreading awareness. Further discussions related to healthcare were prevalent and largely negative, focusing on fear of political legislation that could result in loss of coverage. Conclusions: Social media can provide a positive outlet for patients to discuss their needs and concerns regarding their healthcare coverage and treatment. Capturing iPROs from online communication can help inform healthcare professionals and lead to more connected and personalized treatment regimens.
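The hedonometric analysis referred to above is conventionally based on a frequency-weighted average of crowd-rated word happiness scores; in the notation assumed here,

```latex
% Hedonometric score of a text T (standard formulation; notation assumed here).
\[
h_{\text{avg}}(T) \;=\; \frac{\sum_{w} h(w)\, f_w(T)}{\sum_{w} f_w(T)},
\]
where $f_w(T)$ is the frequency of word $w$ in $T$ and $h(w)$ is its crowd-rated happiness score;
in practice, near-neutral words (scores close to the middle of the scale) are excluded to sharpen
the emotional signal.
```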