Demographic biases in source datasets have been shown to be one of the causes of unfairness and discrimination in the predictions of Machine Learning models. One of the most prominent types of demographic bias is statistical imbalance in the representation of demographic groups in the datasets. In this paper, we study the measurement of these biases by reviewing the existing metrics, including those that can be borrowed from other disciplines. We develop a taxonomy for the classification of these metrics, providing a practical guide for selecting appropriate ones. To illustrate the utility of our framework, and to further understand the practical characteristics of the metrics, we conduct a case study of 20 datasets used in Facial Emotion Recognition (FER), analyzing the biases present in them. Our experimental results show that many metrics are redundant and that a reduced subset of metrics may be sufficient to measure the amount of demographic bias. The paper provides valuable insights for researchers in AI and related fields to mitigate dataset bias and improve the fairness and accuracy of AI models. The code is available at //github.com/irisdominguez/dataset_bias_metrics.
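As an illustration of the kind of measures surveyed, the sketch below computes two common representational-imbalance statistics over per-group sample counts; the function and metric names are illustrative and are not taken from the paper's own taxonomy.

```python
import numpy as np

def representational_bias_metrics(group_counts):
    """Simple representational-bias measures over demographic group counts.

    `group_counts` is a 1-D array of sample counts per demographic group.
    Returns the imbalance ratio and the normalized Shannon evenness
    (1.0 = perfectly balanced, values near 0 = highly imbalanced).
    """
    counts = np.asarray(group_counts, dtype=float)
    p = counts / counts.sum()                      # group proportions
    imbalance_ratio = counts.max() / counts.min()  # >= 1; 1 means balanced
    entropy = -(p * np.log(p, where=p > 0, out=np.zeros_like(p))).sum()
    evenness = entropy / np.log(len(counts))       # normalized Shannon evenness
    return {"imbalance_ratio": imbalance_ratio, "shannon_evenness": evenness}

# Example: apparent-gender counts in a hypothetical FER dataset
print(representational_bias_metrics([5200, 4100, 700]))
```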
We develop the theory of Yoneda Ext groups over a ring in homotopy type theory (HoTT) and describe their interpretation into an $\infty$-topos. This is an abstract approach to Ext groups which does not require projective or injective resolutions. While it produces group objects that are a priori large, we show that the $\operatorname{Ext}^1$ groups are equivalent to small groups, leaving open the question of whether the higher Ext groups are essentially small as well. We also show that the $\operatorname{Ext}^1$ groups take on the usual form as a product of cyclic groups whenever the input modules are finitely presented and the ring is a PID (in the constructive sense). When interpreted into an $\infty$-topos of sheaves on a 1-category, our Ext groups recover (and give a resolution-free approach to) sheaf Ext groups, which arise in algebraic geometry. (These are also called "local" Ext groups.) We may therefore interpret results about Ext from HoTT and apply them to sheaf Ext. To show this, we prove that injectivity of modules in HoTT interprets to internal injectivity in these models. It follows, for example, that sheaf Ext can be computed using resolutions which are projective or injective in the sense of HoTT, when they exist, and we give an example of this in the projective case. We also discuss the relation between internal $\mathbb{Z} G$-modules (for a $0$-truncated group object $G$) and abelian groups in the slice over $BG$, and study the interpretation of our Ext groups in both settings.
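For orientation, a classical (non-HoTT) instance of the statement about finitely presented modules over a PID: over $\mathbb{Z}$, the $\operatorname{Ext}^1$ groups are the familiar products of cyclic groups. This is standard homological algebra, stated here only for reference, not a claim about the formalization itself:

$$\operatorname{Ext}^1_{\mathbb{Z}}(\mathbb{Z}/m,\ \mathbb{Z}/n) \;\cong\; \mathbb{Z}/\gcd(m,n), \qquad \operatorname{Ext}^1_{\mathbb{Z}}(\mathbb{Z}/m,\ \mathbb{Z}) \;\cong\; \mathbb{Z}/m.$$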
Understanding the causal relationships among the variables of a system is paramount to explaining and controlling its behaviour. Inferring the causal graph from observational data without interventions, however, requires strong assumptions that are not always realistic. Even for domain experts it can be challenging to specify the causal graph. Metrics that quantitatively assess the quality of a causal graph therefore provide helpful checks before using it in downstream tasks. Existing metrics report an absolute number of inconsistencies between the graph and the observed data, and without a baseline, practitioners are left to answer the hard question of how many such inconsistencies are acceptable or expected. Here, we propose a novel consistency metric that constructs a surrogate baseline through node permutations. By comparing the number of inconsistencies with those on the surrogate baseline, we derive an interpretable metric that captures whether the DAG fits significantly better than random. Evaluating on both simulated and real datasets from various domains, including biology and cloud monitoring, we demonstrate that the true DAG is not falsified by our metric, whereas wrong graphs given by a hypothetical user are likely to be falsified.
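A minimal sketch of the permutation-baseline idea, assuming (for simplicity) a Gaussian partial-correlation proxy for counting graph-data inconsistencies; the actual test used in the paper may differ, and all function names here are illustrative.

```python
import numpy as np

def inconsistencies(adj, data, thresh=0.1):
    """Count non-adjacent variable pairs whose partial correlation (given all
    other variables) exceeds `thresh` -- a simple Gaussian proxy for
    independencies that the graph implies but the data contradicts."""
    prec = np.linalg.pinv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(d, d)  # off-diagonal entries are partial correlations
    n = adj.shape[0]
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if not (adj[i, j] or adj[j, i]) and abs(pcorr[i, j]) > thresh:
                count += 1
    return count

def permutation_baseline(adj, data, n_perm=100, seed=0):
    """Compare the DAG's inconsistency count with counts obtained after
    randomly permuting the node labels; return the fraction of permutations
    doing at least as well (an empirical p-value-style score)."""
    rng = np.random.default_rng(seed)
    observed = inconsistencies(adj, data)
    n = adj.shape[0]
    baseline = []
    for _ in range(n_perm):
        perm = rng.permutation(n)
        baseline.append(inconsistencies(adj[np.ix_(perm, perm)], data))
    better_or_equal = sum(b <= observed for b in baseline)
    return observed, better_or_equal / n_perm
```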
This work is motivated by goal-oriented sensitivity analysis of the inputs and outputs of complex simulators. More precisely, we are interested in ranking the uncertain input variables that most impact a feasible design domain. Most sensitivity analysis methods deal with scalar outputs. In this paper, we propose a way to perform sensitivity analysis when dealing with set-valued outputs. Our new methodology is driven by sensitivity analysis on excursion sets, but it can also be applied to set-valued simulators, as in the viability field, or when dealing with maps such as pollutant concentration maps or flooding zone maps. We propose a method based on the Hilbert-Schmidt Independence Criterion (HSIC) with a kernel tailored to sets as outputs. A first contribution is the proof that this kernel is characteristic (i.e., the embedding into the associated Reproducing Kernel Hilbert Space is injective), a property required for the HSIC interpretation in a sensitivity analysis context. We then propose to compute the HSIC-ANOVA indices, which allow a decomposition of the input contributions. Using these indices, we can identify which inputs should be neglected (screening) and rank the others by influence (ranking). The estimation of these indices is also adapted to set-valued outputs. Finally, we test the proposed method on two test cases of excursion sets.
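A minimal numerical sketch of HSIC between a scalar input and set-valued outputs encoded as binary masks; the symmetric-difference kernel below is a placeholder assumption, not the characteristic kernel constructed in the paper.

```python
import numpy as np

def gram_gaussian(x, sigma=1.0):
    """Gaussian kernel Gram matrix for scalar inputs."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def gram_sets(masks, sigma=1.0):
    """Kernel on sets encoded as flat binary masks, based on the normalised
    measure of the symmetric difference -- a stand-in for the set kernel
    studied in the paper."""
    m = np.asarray(masks, dtype=float)
    sym_diff = np.abs(m[:, None, :] - m[None, :, :]).mean(axis=2)
    return np.exp(-sym_diff / sigma)

def hsic(K, L):
    """Biased HSIC estimator: tr(K H L H) / (n - 1)^2 with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy usage: one scalar input against excursion sets given as binary masks
rng = np.random.default_rng(0)
x = rng.normal(size=50)
masks = (rng.random((50, 100)) < 0.5).astype(float)
print(hsic(gram_gaussian(x), gram_sets(masks)))
```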
Chatbots based on Large Language Models (LLMs) have shown strong capabilities in language understanding. In this study, we explore the potential of LLMs to assist corpus-based linguistic studies through automatic annotation of texts with specific categories of linguistic information. Specifically, we examined to what extent LLMs understand the functional elements constituting the speech act of apology from a local grammar perspective, by comparing the performance of ChatGPT (powered by GPT-3.5), Bing chatbot (powered by GPT-4), and a human coder in the annotation task. The results demonstrate that Bing chatbot significantly outperformed ChatGPT in the task. Compared to the human annotator, the overall performance of Bing chatbot was slightly less satisfactory, but it already achieved high F1 scores: 99.95% for the tag of APOLOGISING, 91.91% for REASON, 95.35% for APOLOGISER, 89.74% for APOLOGISEE, and 96.47% for INTENSIFIER. We therefore propose that LLM-assisted annotation is a promising automated approach for corpus studies.
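A small sketch of how per-tag F1 scores of the kind reported above can be computed when comparing chatbot annotations against a human gold standard; the exact-match span criterion and the data format are assumptions made for illustration.

```python
def per_tag_f1(gold, pred):
    """Per-tag precision/recall/F1 for span-level annotations, where each
    annotation is a (tag, start, end) tuple.  A prediction counts as correct
    only if it matches a gold annotation exactly."""
    scores = {}
    tags = {t for t, _, _ in gold} | {t for t, _, _ in pred}
    gold_set, pred_set = set(gold), set(pred)
    for tag in tags:
        g = {a for a in gold_set if a[0] == tag}
        p = {a for a in pred_set if a[0] == tag}
        tp = len(g & p)
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[tag] = {"precision": prec, "recall": rec, "f1": f1}
    return scores

# Toy example with tag names from the study
gold = [("APOLOGISING", 0, 9), ("INTENSIFIER", 10, 14), ("REASON", 15, 40)]
pred = [("APOLOGISING", 0, 9), ("INTENSIFIER", 10, 14), ("REASON", 15, 38)]
print(per_tag_f1(gold, pred))
```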
The Shannon entropy of a random variable $X$ behaves in many ways like a signed measure. Previous work has made this connection concrete by defining a signed measure $\mu$ on an abstract information space $\tilde{X}$, which is taken to represent the information that $X$ contains. This construction suffices to derive many measure-theoretic counterparts of information quantities, such as the mutual information $I(X; Y) = \mu(\tilde{X} \cap \tilde{Y})$, the joint entropy $H(X,Y) = \mu(\tilde{X} \cup \tilde{Y})$, and the conditional entropy $H(X|Y) = \mu(\tilde{X}\, \setminus \, \tilde{Y})$. We demonstrate that there exists a much finer decomposition with intuitive properties, which we call the logarithmic decomposition (LD). We show that this signed measure space has the useful property that its logarithmic atoms are easily characterised as having negative or positive entropy, while remaining coherent with Yeung's $I$-measure. We demonstrate the usability of our approach by re-examining the G\'acs-K\"orner common information from this new geometric perspective and characterising it in terms of our logarithmic atoms. We then highlight that our geometric refinement can account for an entire class of information quantities, which we call logarithmically decomposable quantities.
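The correspondence quoted above rests on the usual inclusion-exclusion identity; written out for reference:

$$I(X;Y) \;=\; H(X) + H(Y) - H(X,Y) \quad\Longleftrightarrow\quad \mu(\tilde{X} \cap \tilde{Y}) \;=\; \mu(\tilde{X}) + \mu(\tilde{Y}) - \mu(\tilde{X} \cup \tilde{Y}).$$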
Nowadays, it is common for people to take photographs of every beverage, snack, or meal they eat and post them on social media platforms. Leveraging these social trends, real-time food recognition and reliable classification of captured food images can potentially help replace some of the tedious recording and coding of food diaries and enable personalized dietary interventions. Although Central Asian cuisine is culturally and historically distinct, there has been little published data on the food and dietary habits of people in this region. To fill this gap, we aim to create a reliable dataset of regional foods that is easily accessible to both public consumers and researchers. To the best of our knowledge, this is the first work on creating a Central Asian Food Dataset (CAFD). The final dataset contains 42 food categories and over 16,000 images of national dishes unique to this region. Using the ResNet152 neural network model, we achieved a classification accuracy of 88.70\% across the 42 classes on the CAFD. The food recognition models trained on the CAFD demonstrate the effectiveness and high accuracy of computer vision for dietary assessment.
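A minimal fine-tuning sketch of the kind of ResNet152 classifier reported above, using torchvision; the directory layout, hyperparameters, and single training pass are assumptions for illustration, not the authors' training setup.

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Assumed directory layout: CAFD/train/<class_name>/*.jpg (one folder per dish)
tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("CAFD/train", transform=tfm)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# ImageNet-pretrained ResNet152 with a new 42-way classification head
model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 42)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:                 # one pass, for illustration only
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```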
Most current audio-visual emotion recognition models lack the flexibility needed for deployment in practical applications. We envision a multimodal system that works even when only one modality is available and that can be used interchangeably for predicting emotional attributes or recognizing categorical emotions. Achieving such flexibility in a multimodal emotion recognition system is difficult due to the inherent challenges in accurately interpreting and integrating varied data sources. It is also challenging to robustly handle missing or partial information while allowing a direct switch between regression and classification tasks. This study proposes a \emph{versatile audio-visual learning} (VAVL) framework that handles unimodal and multimodal systems for emotion regression and emotion classification tasks. We implement an audio-visual framework that can be trained even when paired audio-visual data are not available for part of the training set (i.e., only audio or only video is present). We achieve this effective representation learning with audio-visual shared layers, residual connections over the shared layers, and a unimodal reconstruction task. Our experimental results reveal that our architecture significantly outperforms strong baselines on both the CREMA-D and MSP-IMPROV corpora. Notably, VAVL attains a new state-of-the-art performance on the emotional attribute prediction task on the MSP-IMPROV corpus. Code is available at: //github.com/ilucasgoncalves/VAVL
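A minimal PyTorch sketch of the general design described above (modality-specific encoders, shared layers with a residual connection, separate regression and classification heads, tolerance of a missing modality); it is not the authors' VAVL implementation, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SharedAVModel(nn.Module):
    """Minimal sketch: audio/visual encoders feed shared layers with a residual
    connection; separate heads support attribute regression and categorical
    classification; a missing modality is simply skipped at inference."""
    def __init__(self, audio_dim=128, visual_dim=512, hidden=256, n_classes=6):
        super().__init__()
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.visual_enc = nn.Sequential(nn.Linear(visual_dim, hidden), nn.ReLU())
        self.shared = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.reg_head = nn.Linear(hidden, 3)        # e.g. arousal/valence/dominance
        self.cls_head = nn.Linear(hidden, n_classes)

    def forward(self, audio=None, visual=None):
        feats = []
        if audio is not None:
            feats.append(self.audio_enc(audio))
        if visual is not None:
            feats.append(self.visual_enc(visual))
        h = torch.stack(feats).mean(dim=0)          # average available modalities
        h = h + self.shared(h)                      # residual over shared layers
        return self.reg_head(h), self.cls_head(h)

model = SharedAVModel()
reg, cls = model(audio=torch.randn(4, 128))         # audio-only inference
```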
Emotion recognition in conversation (ERC) aims to detect the emotion label of each utterance. Motivated by recent studies showing that feeding training examples in a meaningful order, rather than randomly, can boost model performance, we propose an ERC-oriented hybrid curriculum learning framework. Our framework consists of two curricula: (1) a conversation-level curriculum (CC) and (2) an utterance-level curriculum (UC). In CC, we construct a difficulty measurer based on the frequency of "emotion shifts" within a conversation, and the conversations are then scheduled in an "easy-to-hard" order according to the difficulty score returned by the measurer. UC is implemented from an emotion-similarity perspective and progressively strengthens the model's ability to identify confusing emotions. With the proposed model-agnostic hybrid curriculum learning strategy, we observe significant performance boosts over a wide range of existing ERC models, and we achieve new state-of-the-art results on four public ERC datasets.
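A small sketch of the conversation-level curriculum: score each conversation by its emotion-shift frequency and schedule from easy to hard. The data format below is an assumption for illustration.

```python
def emotion_shift_frequency(labels):
    """Difficulty score for one conversation: fraction of adjacent utterance
    pairs whose emotion labels differ ("emotion shifts")."""
    if len(labels) < 2:
        return 0.0
    shifts = sum(a != b for a, b in zip(labels, labels[1:]))
    return shifts / (len(labels) - 1)

def conversation_curriculum(conversations):
    """Schedule conversations from easy (few emotion shifts) to hard (many)."""
    return sorted(conversations, key=lambda c: emotion_shift_frequency(c["labels"]))

# Toy example: each conversation is a dict with its utterance-level labels
convs = [
    {"id": 1, "labels": ["happy", "happy", "sad", "angry"]},
    {"id": 2, "labels": ["neutral", "neutral", "neutral"]},
]
print([c["id"] for c in conversation_curriculum(convs)])  # [2, 1]
```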
Search in social networks such as Facebook poses different challenges than classical web search: besides the query text, it is important to take the searcher's context into account to provide relevant results. The searcher's social graph is an integral part of this context and is a unique aspect of Facebook search. While embedding-based retrieval (EBR) has been applied in web search engines for years, Facebook search was still based mainly on a Boolean matching model. In this paper, we discuss the techniques for applying EBR to a Facebook Search system. We introduce the unified embedding framework developed to model semantic embeddings for personalized search, and the system built to serve embedding-based retrieval in a typical search system based on an inverted index. We discuss various tricks and experiences in end-to-end optimization of the whole system, including ANN parameter tuning and full-stack optimization. Finally, we present our progress on two selected advanced modeling topics. We evaluated EBR on verticals of Facebook Search, with significant metric gains observed in online A/B experiments. We believe this paper will provide useful insights and experiences to help people develop embedding-based retrieval systems in search engines.
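A toy sketch of the retrieval step in embedding-based retrieval: embed the query together with the searcher's context and rank candidates by cosine similarity. The `encode` function is a stand-in for a trained unified embedding model, and a production system would serve this through an ANN index rather than brute force.

```python
import numpy as np

def encode(text):
    """Stand-in for the unified embedding model: map text (query plus searcher
    context, or a candidate document) to a fixed-size unit vector.  Replace
    with a trained two-tower encoder in practice."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

def retrieve(query, searcher_context, doc_texts, k=3):
    """Embedding-based retrieval: nearest neighbours by cosine similarity."""
    q = encode(query + " " + searcher_context)
    docs = np.stack([encode(d) for d in doc_texts])
    scores = docs @ q                      # unit vectors: dot product = cosine
    top = np.argsort(-scores)[:k]
    return [(doc_texts[i], float(scores[i])) for i in top]

print(retrieve("john smith", "searcher: Menlo Park, friends of friends",
               ["John Smith (Menlo Park)", "John Smith (London)", "Jon Smyth"]))
```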
Named entity recognition (NER) is the task of identifying text spans that mention named entities and classifying them into predefined categories such as person, location, and organization. NER serves as the basis for a variety of natural language applications such as question answering, text summarization, and machine translation. Although early NER systems were successful in producing decent recognition accuracy, they often required considerable human effort to carefully design rules or features. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding state-of-the-art performance. In this paper, we provide a comprehensive review of existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods for applying recent deep learning techniques to new NER problem settings and applications. Finally, we present the challenges faced by NER systems and outline future directions in this area.
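A toy model reflecting the survey's three-axis taxonomy (distributed input representation, context encoder, tag decoder); this is a didactic sketch, not a system from the survey.

```python
import torch
import torch.nn as nn

class SimpleNERTagger(nn.Module):
    """Toy tagger mirroring the taxonomy: an embedding layer (distributed
    representation for input), a BiLSTM (context encoder), and a per-token
    linear classifier (tag decoder).  Real systems often replace the decoder
    with a CRF and the encoder with a Transformer."""
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128, n_tags=9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)               # input representation
        self.encoder = nn.LSTM(emb_dim, hidden, bidirectional=True,  # context encoder
                               batch_first=True)
        self.decoder = nn.Linear(2 * hidden, n_tags)                 # tag decoder

    def forward(self, token_ids):
        x = self.embed(token_ids)
        h, _ = self.encoder(x)
        return self.decoder(h)            # (batch, seq_len, n_tags) tag scores

tagger = SimpleNERTagger()
scores = tagger(torch.randint(0, 10000, (2, 12)))   # batch of 2 sentences, length 12
print(scores.shape)                                  # torch.Size([2, 12, 9])
```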