The traditional variable control charts, such as the X-bar chart, are widely used to monitor variation in a process. They have been shown to perform well for monitoring processes under the general assumptions that the observations are normally distributed without data contamination and that the sample sizes from the process are all equal. However, these two assumptions may not be met and satisfied in many practical applications and thus make them potentially limited for widespread application especially in production processes. In this paper, we alleviate this limitation by providing a novel method for constructing the robust X-bar control charts, which can simultaneously deal with both data contamination and unequal sample sizes. The proposed method for the process parameters is optimal in a sense of the best linear unbiased estimation. Numerical results from extensive Monte Carlo simulations and a real data analysis reveal that traditional control charts seriously underperform for monitoring process in the presence of data contamination and are extremely sensitive to even a single contaminated value, while the proposed robust control charts outperform in a manner that is comparable with the traditional ones, whereas they are far superior when the data are contaminated by outliers.
Systems Theoretic Process Analysis (STPA) is a systematic approach for hazard analysis that has been effective in the safety analysis of systems across industrial sectors from transportation, energy, to national defence. The unstoppable trend of using Machine Learning (ML) in safety-critical systems has led to the pressing need of extending STPA to Learning-Enabled Systems (LESs). However, while work has been carried out over different example systems, without a systematic review, it is unclear how effective and generalisable the extended STPA methods are and, more importantly, if further improvements can be made. To this end, we present our survey on 29 papers selected through a systematic literature search. We summarise and compare relevant research from five perspectives (attributes of concern, object under study, modifications to STPA, derivatives of the analysis, and process modelled as a control loop) to conclude insights. Furthermore, based on the survey results, we identify room for improvement and accordingly introduce a new method named DeepSTPA, which enhances STPA from two aspects that are missing from the state-of-the-art: (i) it explicitly models how the control loop structures are extended to identify hazards from the data-driven development process at every stage of the ML lifecycle; (ii) it models fine-grained functionalities deep into the layer-levels of ML models to detect root causes. We demonstrate DeepSTPA through a case study on an autonomous underwater vehicle (AUV).
Theoretically, the conditional expectation of a square-integrable random variable $Y$ given a $d$-dimensional random vector $X$ can be obtained by minimizing the mean squared distance between $Y$ and $f(X)$ over all Borel measurable functions $f \colon \mathbb{R}^d \to \mathbb{R}$. However, in many applications this minimization problem cannot be solved exactly, and instead, a numerical method which computes an approximate minimum over a suitable subfamily of Borel functions has to be used. The quality of the result depends on the adequacy of the subfamily and the performance of the numerical method. In this paper, we derive an expected value representation of the minimal mean squared distance which in many applications can efficiently be approximated with a standard Monte Carlo average. This enables us to provide guarantees for the accuracy of any numerical approximation of a given conditional expectation. We illustrate the method by assessing the quality of approximate conditional expectations obtained by linear, polynomial and neural network regression in different concrete examples.
Augmenting the control arm of a randomized controlled trial (RCT) with external data may increase power at the risk of introducing bias. Existing data fusion estimators generally rely on stringent assumptions or may have decreased coverage or power in the presence of bias. Framing the problem as one of data-adaptive experiment selection, potential experiments include the RCT only or the RCT combined with different candidate real-world datasets. To select and analyze the experiment with the optimal bias-variance tradeoff, we develop a novel experiment-selector cross-validated targeted maximum likelihood estimator (ES-CVTMLE). The ES-CVTMLE uses two bias estimates: 1) a function of the difference in conditional mean outcome under control between the RCT and combined experiments and 2) an estimate of the average treatment effect on a negative control outcome (NCO). We define the asymptotic distribution of the ES-CVTMLE under varying magnitudes of bias and construct confidence intervals by Monte Carlo simulation. In simulations involving violations of identification assumptions, the ES-CVTMLE had better coverage than test-then-pool approaches and an NCO-based bias adjustment approach and higher power than one implementation of a Bayesian dynamic borrowing approach. We further demonstrate the ability of the ES-CVTMLE to distinguish biased from unbiased external controls through a re-analysis of the effect of liraglutide on glycemic control from the LEADER trial. The ES-CVTMLE has the potential to improve power while providing relatively robust inference for future hybrid RCT-RWD studies.
Gibbs sampling methods are standard tools to perform posterior inference for mixture models. These have been broadly classified into two categories: marginal and conditional methods. While conditional samplers are more widely applicable than marginal ones, they may suffer from slow mixing in infinite mixtures, where some form of truncation, either deterministic or random, is required. In mixtures with random number of components, the exploration of parameter spaces of different dimensions can also be challenging. We tackle these issues by expressing the mixture components in the random order of appearance in an exchangeable sequence directed by the mixing distribution. We derive a sampler that is straightforward to implement for mixing distributions with tractable size-biased ordered weights, and that can be readily adapted to mixture models for which marginal samplers are not available. In infinite mixtures, no form of truncation is necessary. As for finite mixtures with random dimension, a simple updating of the number of components is obtained by a blocking argument, thus, easing challenges found in trans-dimensional moves via Metropolis-Hastings steps. Additionally, sampling occurs in the space of ordered partitions with blocks labelled in the least element order, which endows the sampler with good mixing properties. The performance of the proposed algorithm is evaluated in a simulation study.
Stance detection refers to the task of extracting the standpoint (Favor, Against or Neither) towards a target in given texts. Such research gains increasing attention with the proliferation of social media contents. The conventional framework of handling stance detection is converting it into text classification tasks. Deep learning models have already replaced rule-based models and traditional machine learning models in solving such problems. Current deep neural networks are facing two main challenges which are insufficient labeled data and information in social media posts and the unexplainable nature of deep learning models. A new pre-trained language model chatGPT was launched on Nov 30, 2022. For the stance detection tasks, our experiments show that ChatGPT can achieve SOTA or similar performance for commonly used datasets including SemEval-2016 and P-Stance. At the same time, ChatGPT can provide explanation for its own prediction, which is beyond the capability of any existing model. The explanations for the cases it cannot provide classification results are especially useful. ChatGPT has the potential to be the best AI model for stance detection tasks in NLP, or at least change the research paradigm of this field. ChatGPT also opens up the possibility of building explanatory AI for stance detection.
Fine-grained information on translation errors is helpful for the translation evaluation community. Existing approaches can not synchronously consider error position and type, failing to integrate the error information of both. In this paper, we propose Fine-Grained Translation Error Detection (FG-TED) task, aiming at identifying both the position and the type of translation errors on given source-hypothesis sentence pairs. Besides, we build an FG-TED model to predict the \textbf{addition} and \textbf{omission} errors -- two typical translation accuracy errors. First, we use a word-level classification paradigm to form our model and use the shortcut learning reduction to relieve the influence of monolingual features. Besides, we construct synthetic datasets for model training, and relieve the disagreement of data labeling in authoritative datasets, making the experimental benchmark concordant. Experiments show that our model can identify both error type and position concurrently, and gives state-of-the-art results on the restored dataset. Our model also delivers more reliable predictions on low-resource and transfer scenarios than existing baselines. The related datasets and the source code will be released in the future.
Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. Marginal contribution feature importance (MCI) was developed to break this trend by providing a useful framework for quantifying the relationships in data. In this work, we aim to improve upon the theoretical properties, performance, and runtime of MCI by introducing ultra-marginal feature importance (UMFI), which uses dependence removal techniques from the AI fairness literature as its foundation. We first propose axioms for feature importance methods that seek to explain the causal and associative relationships in data, and we prove that UMFI satisfies these axioms under basic assumptions. We then show on real and simulated data that UMFI performs better than MCI, especially in the presence of correlated interactions and unrelated features, while partially learning the structure of the causal graph and reducing the exponential runtime of MCI to super-linear.
Bayesian Optimization (BO) links Gaussian Process (GP) surrogates with sequential design toward optimizing expensive-to-evaluate black-box functions. Example design heuristics, or so-called acquisition functions, like expected improvement (EI), balance exploration and exploitation to furnish global solutions under stringent evaluation budgets. However, they fall short when solving for robust optima, meaning a preference for solutions in a wider domain of attraction. Robust solutions are useful when inputs are imprecisely specified, or where a series of solutions is desired. A common mathematical programming technique in such settings involves an adversarial objective, biasing a local solver away from ``sharp'' troughs. Here we propose a surrogate modeling and active learning technique called robust expected improvement (REI) that ports adversarial methodology into the BO/GP framework. After describing the methods, we illustrate and draw comparisons to several competitors on benchmark synthetic and real problems of varying complexity.
Classic machine learning methods are built on the $i.i.d.$ assumption that training and testing data are independent and identically distributed. However, in real scenarios, the $i.i.d.$ assumption can hardly be satisfied, rendering the sharp drop of classic machine learning algorithms' performances under distributional shifts, which indicates the significance of investigating the Out-of-Distribution generalization problem. Out-of-Distribution (OOD) generalization problem addresses the challenging setting where the testing distribution is unknown and different from the training. This paper serves as the first effort to systematically and comprehensively discuss the OOD generalization problem, from the definition, methodology, evaluation to the implications and future directions. Firstly, we provide the formal definition of the OOD generalization problem. Secondly, existing methods are categorized into three parts based on their positions in the whole learning pipeline, namely unsupervised representation learning, supervised model learning and optimization, and typical methods for each category are discussed in detail. We then demonstrate the theoretical connections of different categories, and introduce the commonly used datasets and evaluation metrics. Finally, we summarize the whole literature and raise some future directions for OOD generalization problem. The summary of OOD generalization methods reviewed in this survey can be found at //out-of-distribution-generalization.com.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.