Labeling data is an important step in the supervised machine learning lifecycle. It is a laborious human activity comprised of repeated decision making: the human labeler decides which of several potential labels to apply to each example. Prior work has shown that providing AI assistance can improve the accuracy of binary decision tasks. However, the role of AI assistance in more complex data-labeling scenarios with a larger set of labels has not yet been explored. We designed an AI labeling assistant that uses a semi-supervised learning algorithm to predict the most probable labels for each example. We leverage these predictions to provide assistance in two ways: (i) providing a label recommendation and (ii) reducing the labeler's decision space by focusing their attention on only the most probable labels. We conducted a user study (n=54) to evaluate an AI-assisted interface for data labeling in this context. Our results highlight that the AI assistance improves both labeler accuracy and speed, especially when the labeler finds the correct label in the reduced label space. We discuss findings related to the presentation of AI assistance and design implications for intelligent labeling interfaces.
Growing individualization of products up to lot-size-1 and high volatility of product mixes lead to new challenges in the manufacturing domain, including the need for frequent reconfiguration of the system and reacting to changing orders. Thus, apart from functional aspects, safety aspects of the production system as well as product quality assurance aspects must be addressed for flexible and reconfigurable manufacturing systems at runtime. To cope with the mentioned challenges, we present an integrated model-based approach SQUADfps (machine Safety and product QUAlity for flexible proDuction systems) to support the automatic conduct of the risk assessment of flexible production scenarios in terms of safety as well as the process-FMEA to ensure that the requirements w.r.t. the quality of the production process and the resulting product are met. Our approach is based on a meta-model which captures all information needed to conduct both risk assessment and process-FMEA dynamically during the runtime, and thus enables flexible manufacturing scenarios with frequent changes of the production system and orders up to a lot-size of one while guaranteeing safety and product quality requirements. The automatically generated results will assist human in making further decisions. To demonstrate the feasibility of our approach, we apply it to a case study.
Despite the success of machine learning applications in science, industry, and society in general, many approaches are known to be non-robust, often relying on spurious correlations to make predictions. Spuriousness occurs when some features correlate with labels but are not causal; relying on such features prevents models from generalizing to unseen environments where such correlations break. In this work, we focus on image classification and propose two data generation processes to reduce spuriousness. Given human annotations of the subset of the features responsible (causal) for the labels (e.g. bounding boxes), we modify this causal set to generate a surrogate image that no longer has the same label (i.e. a counterfactual image). We also alter non-causal features to generate images still recognized as the original labels, which helps to learn a model invariant to these features. In several challenging datasets, our data generations outperform state-of-the-art methods in accuracy when spurious correlations break, and increase the saliency focus on causal features providing better explanations.
Language modeling (LM) for automatic speech recognition (ASR) does not usually incorporate utterance level contextual information. For some domains like voice assistants, however, additional context, such as the time at which an utterance was spoken, provides a rich input signal. We introduce an attention mechanism for training neural speech recognition language models on both text and non-linguistic contextual data. When applied to a large de-identified dataset of utterances collected by a popular voice assistant platform, our method reduces perplexity by 7.0% relative over a standard LM that does not incorporate contextual information. When evaluated on utterances extracted from the long tail of the dataset, our method improves perplexity by 9.0% relative over a standard LM and by over 2.8% relative when compared to a state-of-the-art model for contextual LM.
Interpolation and prediction have been useful approaches in modeling data in many areas of applications. The aim of this paper is the prediction of the next value of a time series (time series forecasting) using the techniques in interpolation of the spatial data, for the tow approaches kernel interpolation and kriging. We are interested in finding some sufficient conditions for the kernels and provide a detailed analyse of the prediction using kernel interpolation. Finally, we provide a natural idea to select a good kernel among a given family of kernels using only the data. We illustrate our results by application to the data set on the mean annual temperature of France and Morocco recorded for a period of 115 years (1901 to 2015).
Automatic techniques in the context of motor speech disorders (MSDs) are typically two-class techniques aiming to discriminate between dysarthria and neurotypical speech or between dysarthria and apraxia of speech (AoS). Further, although such techniques are proposed to support the perceptual assessment of clinicians, the automatic and perceptual classification accuracy has never been compared. In this paper, we investigate a three-class automatic technique and a set of handcrafted features for the discrimination of dysarthria, AoS and neurotypical speech. Instead of following the commonly used One-versus-One or One-versus-Rest approaches for multi-class classification, a hierarchical approach is proposed. Further, a perceptual study is conducted where speech and language pathologists are asked to listen to recordings of dysarthria, AoS, and neurotypical speech and decide which class the recordings belong to. The proposed automatic technique is evaluated on the same recordings and the automatic and perceptual classification performance are compared. The presented results show that the hierarchical classification approach yields a higher classification accuracy than baseline One-versus-One and One-versus-Rest approaches. Further, the presented results show that the automatic approach yields a higher classification accuracy than the perceptual assessment of speech and language pathologists, demonstrating the potential advantages of integrating automatic tools in clinical practice.
We examine counterfactual explanations for explaining the decisions made by model-based AI systems. The counterfactual approach we consider defines an explanation as a set of the system's data inputs that causally drives the decision (i.e., changing the inputs in the set changes the decision) and is irreducible (i.e., changing any subset of the inputs does not change the decision). We (1) demonstrate how this framework may be used to provide explanations for decisions made by general, data-driven AI systems that may incorporate features with arbitrary data types and multiple predictive models, and (2) propose a heuristic procedure to find the most useful explanations depending on the context. We then contrast counterfactual explanations with methods that explain model predictions by weighting features according to their importance (e.g., SHAP, LIME) and present two fundamental reasons why we should carefully consider whether importance-weight explanations are well-suited to explain system decisions. Specifically, we show that (i) features that have a large importance weight for a model prediction may not affect the corresponding decision, and (ii) importance weights are insufficient to communicate whether and how features influence decisions. We demonstrate this with several concise examples and three detailed case studies that compare the counterfactual approach with SHAP to illustrate various conditions under which counterfactual explanations explain data-driven decisions better than importance weights.
The pervasive use of smart speakers has raised numerous privacy concerns. While work to date provides an understanding of user perceptions of these threats, limited research focuses on how we can mitigate these concerns, either through redesigning the smart speaker or through dedicated privacy-preserving interventions. In this paper, we present the design and prototyping of two privacy-preserving interventions: `Obfuscator' targeted at disabling recording at the microphones, and `BlackOut' targeted at disabling power to the smart speaker. We present our findings from a technology probe study involving 24 households that interacted with our prototypes; the primary objective was to gain a better understanding of the design space for technological interventions that might address these concerns. Our data and findings reveal complex trade-offs among utility, privacy, and usability and stresses the importance of multi-functionality, aesthetics, ease-of-use, and form factor. We discuss the implications of our findings for the development of subsequent interventions and the future design of smart speakers.
Existing Collaborative Filtering (CF) methods are mostly designed based on the idea of matching, i.e., by learning user and item embeddings from data using shallow or deep models, they try to capture the associative relevance patterns in data, so that a user embedding can be matched with relevant item embeddings using designed or learned similarity functions. However, as a cognition rather than a perception intelligent task, recommendation requires not only the ability of pattern recognition and matching from data, but also the ability of cognitive reasoning in data. In this paper, we propose to advance Collaborative Filtering (CF) to Collaborative Reasoning (CR), which means that each user knows part of the reasoning space, and they collaborate for reasoning in the space to estimate preferences for each other. Technically, we propose a Neural Collaborative Reasoning (NCR) framework to bridge learning and reasoning. Specifically, we integrate the power of representation learning and logical reasoning, where representations capture similarity patterns in data from perceptual perspectives, and logic facilitates cognitive reasoning for informed decision making. An important challenge, however, is to bridge differentiable neural networks and symbolic reasoning in a shared architecture for optimization and inference. To solve the problem, we propose a modularized reasoning architecture, which learns logical operations such as AND ($\wedge$), OR ($\vee$) and NOT ($\neg$) as neural modules for implication reasoning ($\rightarrow$). In this way, logical expressions can be equivalently organized as neural networks, so that logical reasoning and prediction can be conducted in a continuous space. Experiments on real-world datasets verified the advantages of our framework compared with both shallow, deep and reasoning models.
It is becoming increasingly easy to automatically replace a face of one person in a video with the face of another person by using a pre-trained generative adversarial network (GAN). Recent public scandals, e.g., the faces of celebrities being swapped onto pornographic videos, call for automated ways to detect these Deepfake videos. To help developing such methods, in this paper, we present the first publicly available set of Deepfake videos generated from videos of VidTIMIT database. We used open source software based on GANs to create the Deepfakes, and we emphasize that training and blending parameters can significantly impact the quality of the resulted videos. To demonstrate this impact, we generated videos with low and high visual quality (320 videos each) using differently tuned parameter sets. We showed that the state of the art face recognition systems based on VGG and Facenet neural networks are vulnerable to Deepfake videos, with 85.62% and 95.00% false acceptance rates respectively, which means methods for detecting Deepfake videos are necessary. By considering several baseline approaches, we found that audio-visual approach based on lip-sync inconsistency detection was not able to distinguish Deepfake videos. The best performing method, which is based on visual quality metrics and is often used in presentation attack detection domain, resulted in 8.97% equal error rate on high quality Deepfakes. Our experiments demonstrate that GAN-generated Deepfake videos are challenging for both face recognition systems and existing detection methods, and the further development of face swapping technology will make it even more so.
Recent advances in the field of network embedding have shown the low-dimensional network representation is playing a critical role in network analysis. However, most of the existing principles of network embedding do not incorporate auxiliary information such as content and labels of nodes flexibly. In this paper, we take a matrix factorization perspective of network embedding, and incorporate structure, content and label information of the network simultaneously. For structure, we validate that the matrix we construct preserves high-order proximities of the network. Label information can be further integrated into the matrix via the process of random walk sampling to enhance the quality of embedding in an unsupervised manner, i.e., without leveraging downstream classifiers. In addition, we generalize the Skip-Gram Negative Sampling model to integrate the content of the network in a matrix factorization framework. As a consequence, network embedding can be learned in a unified framework integrating network structure and node content as well as label information simultaneously. We demonstrate the efficacy of the proposed model with the tasks of semi-supervised node classification and link prediction on a variety of real-world benchmark network datasets.