Translation of perceptual descriptors such as the perceived affective quality attributes in the soundscape standard (ISO/TS 12913-2:2018) is an inherently intricate task, especially if the target language is used in multiple countries. Despite geographical proximity and a shared language of Bahasa Melayu (Standard Malay), differences in culture and language education policies between Singapore and Malaysia could invoke peculiarities in the affective appraisal of sounds. To generate provisional translations of the eight perceived affective attributes -- eventful, vibrant, pleasant, calm, uneventful, monotonous, annoying, and chaotic -- into Bahasa Melayu that is applicable in both Singapore and Malaysia, a binational expert-led approach supplemented by a quantitative evaluation framework was adopted. A set of preliminary translation candidates were developed via a four-stage process, firstly by a qualified translator, which was then vetted by linguistics experts, followed by examination via an experiential evaluation, and finally reviewed by the core research team. A total of 66 participants were then recruited cross-nationally to quantitatively evaluate the preliminary translation candidates. Of the eight attributes, cross-national differences were observed only in the translation of annoying. For instance, "menjengkelkan" was found to be significantly less understood in Singapore than in Malaysia, as well as less understandable than "membingitkan" within Singapore. Results of the quantitative evaluation also revealed the imperfect nature of foreign language translations for perceptual descriptors, which suggests a possibility for exploring corrective measures.
Subsequence anomaly detection in long sequences is an important problem with applications in a wide range of domains. However, the approaches proposed so far in the literature have severe limitations: they either require prior domain knowledge used to design the anomaly discovery algorithms, or become cumbersome and expensive to use in situations with recurrent anomalies of the same type. In this work, we address these problems, and propose an unsupervised method suitable for domain agnostic subsequence anomaly detection. Our method, Series2Graph, is based on a graph representation of a novel low-dimensionality embedding of subsequences. Series2Graph needs neither labeled instances (like supervised techniques) nor anomaly-free data (like zero-positive learning techniques), and identifies anomalies of varying lengths. The experimental results, on the largest set of synthetic and real datasets used to date, demonstrate that the proposed approach correctly identifies single and recurrent anomalies without any prior knowledge of their characteristics, outperforming by a large margin several competing approaches in accuracy, while being up to orders of magnitude faster. This paper has appeared in VLDB 2020.
Clustering has received much attention in Statistics and Machine learning with the aim of developing statistical models and autonomous algorithms which are capable of acquiring information from raw data in order to perform exploratory analysis.Several techniques have been developed to cluster sampled univariate vectors only considering the average value over the whole period and as such they have not been able to explore fully the underlying distribution as well as other features of the data, especially in presence of structured time series. We propose a model-based clustering technique that is based on quantile regression permitting us to cluster bivariate time series at different quantile levels. We model the within cluster density using asymmetric Laplace distribution allowing us to take into account asymmetry in the distribution of the data. We evaluate the performance of the proposed technique through a simulation study. The method is then applied to cluster time series observed from Glob-colour satellite data related to trophic status indices with aim of evaluating their temporal dynamics in order to identify homogeneous areas, in terms of trophic status, in the Gulf of Gabes.
Falls, highly common in the constantly increasing global aging population, can have a variety of negative effects on their health, well-being, and quality of life, including restricting their capabilities to conduct Activities of Daily Living (ADLs), which are crucial for one's sustenance. Timely assistance during falls is highly necessary, which involves tracking the indoor location of the elderly during their diverse navigational patterns associated with ADLs to detect the precise location of a fall. With the decreasing caregiver population on a global scale, it is important that the future of intelligent living environments can detect falls during ADLs while being able to track the indoor location of the elderly in the real world. To address these challenges, this work proposes a cost-effective and simplistic design paradigm for an Ambient Assisted Living system that can capture multimodal components of user behaviors during ADLs that are necessary for performing fall detection and indoor localization in a simultaneous manner in the real world. Proof of concept results from real-world experiments are presented to uphold the effective working of the system. The findings from two comparison studies with prior works in this field are also presented to uphold the novelty of this work. The first comparison study shows how the proposed system outperforms prior works in the areas of indoor localization and fall detection in terms of the effectiveness of its software design and hardware design. The second comparison study shows that the cost for the development of this system is the least as compared to prior works in these fields, which involved real-world development of the underlining systems, thereby upholding its cost-effective nature.
The idea that social media platforms like Twitter are inhabited by vast numbers of social bots has become widely accepted in recent years. Social bots are assumed to be automated social media accounts operated by malicious actors with the goal of manipulating public opinion. They are credited with the ability to produce content autonomously and to interact with human users. Social bot activity has been reported in many different political contexts, including the U.S. presidential elections, discussions about migration, climate change, and COVID-19. However, the relevant publications either use crude and questionable heuristics to discriminate between supposed social bots and humans or -- in the vast majority of the cases -- fully rely on the output of automatic bot detection tools, most commonly Botometer. In this paper, we point out a fundamental theoretical flaw in the widely-used study design for estimating the prevalence of social bots. Furthermore, we empirically investigate the validity of peer-reviewed Botometer-based studies by closely and systematically inspecting hundreds of accounts that had been counted as social bots. We were unable to find a single social bot. Instead, we found mostly accounts undoubtedly operated by human users, the vast majority of them using Twitter in an inconspicuous and unremarkable fashion without the slightest traces of automation. We conclude that studies claiming to investigate the prevalence, properties, or influence of social bots based on Botometer have, in reality, just investigated false positives and artifacts of this approach.
Lighting is a determining factor in photography that affects the style, expression of emotion, and even quality of images. Creating or finding satisfying lighting conditions, in reality, is laborious and time-consuming, so it is of great value to develop a technology to manipulate illumination in an image as post-processing. Although previous works have explored techniques based on the physical viewpoint for relighting images, extensive supervisions and prior knowledge are necessary to generate reasonable images, restricting the generalization ability of these works. In contrast, we take the viewpoint of image-to-image translation and implicitly merge ideas of the conventional physical viewpoint. In this paper, we present an Illumination-Aware Network (IAN) which follows the guidance from hierarchical sampling to progressively relight a scene from a single image with high efficiency. In addition, an Illumination-Aware Residual Block (IARB) is designed to approximate the physical rendering process and to extract precise descriptors of light sources for further manipulations. We also introduce a depth-guided geometry encoder for acquiring valuable geometry- and structure-related representations once the depth information is available. Experimental results show that our proposed method produces better quantitative and qualitative relighting results than previous state-of-the-art methods. The code and models are publicly available on //github.com/NK-CS-ZZL/IAN.
The COVID19 pandemic has demonstrated a need for remote learning and virtual learning applications such as virtual reality (VR) and tablet-based solutions. Creating complex learning scenarios by developers is highly time-consuming and can take over a year. It is also costly to employ teams of system analysts, developers and 3D artists. There is a requirement to provide a simple method to enable lecturers to create their own content for their laboratory tutorials. Research has been undertaken into developing generic models to enable the semi-automatic creation of a virtual learning tools for subjects that require practical interactions with the lab resources. In addition to the system for creating digital twins, a case study describing the creation of a virtual learning application for an electrical laboratory tutorial has been presented.
Reducing the representational discrepancy between source and target domains is a key component to maximize the model generalization. In this work, we advocate for leveraging natural language supervision for the domain generalization task. We introduce two modules to ground visual representations with texts containing typical reasoning of humans: (1) Visual and Textual Joint Embedder and (2) Textual Explanation Generator. The former learns the image-text joint embedding space where we can ground high-level class-discriminative information into the model. The latter leverages an explainable model and generates explanations justifying the rationale behind its decision. To the best of our knowledge, this is the first work to leverage the vision-and-language cross-modality approach for the domain generalization task. Our experiments with a newly created CUB-DG benchmark dataset demonstrate that cross-modality supervision can be successfully used to ground domain-invariant visual representations and improve the model generalization. Furthermore, in the large-scale DomainBed benchmark, our proposed method achieves state-of-the-art results and ranks 1st in average performance for five multi-domain datasets. The dataset and codes are available at //github.com/mswzeus/GVRT.
Formal XAI (explainable AI) is a growing area that focuses on computing explanations with mathematical guarantees for the decisions made by ML models. Inside formal XAI, one of the most studied cases is that of explaining the choices taken by decision trees, as they are traditionally deemed as one of the most interpretable classes of models. Recent work has focused on studying the computation of "sufficient reasons", a kind of explanation in which given a decision tree $T$ and an instance $x$, one explains the decision $T(x)$ by providing a subset $y$ of the features of $x$ such that for any other instance $z$ compatible with $y$, it holds that $T(z) = T(x)$, intuitively meaning that the features in $y$ are already enough to fully justify the classification of $x$ by $T$. It has been argued, however, that sufficient reasons constitute a restrictive notion of explanation, and thus the community has started to study their probabilistic counterpart, in which one requires that the probability of $T(z) = T(x)$ must be at least some value $\delta \in (0, 1]$, where $z$ is a random instance that is compatible with $y$. Our paper settles the computational complexity of $\delta$-sufficient-reasons over decision trees, showing that both (1) finding $\delta$-sufficient-reasons that are minimal in size, and (2) finding $\delta$-sufficient-reasons that are minimal inclusion-wise, do not admit polynomial-time algorithms (unless P=NP). This is in stark contrast with the deterministic case ($\delta = 1$) where inclusion-wise minimal sufficient-reasons are easy to compute. By doing this, we answer two open problems originally raised by Izza et al. On the positive side, we identify structural restrictions of decision trees that make the problem tractable, and show how SAT solvers might be able to tackle these problems in practical settings.
Connecting Vision and Language plays an essential role in Generative Intelligence. For this reason, in the last few years, a large research effort has been devoted to image captioning, i.e. the task of describing images with syntactically and semantically meaningful sentences. Starting from 2015 the task has generally been addressed with pipelines composed of a visual encoding step and a language model for text generation. During these years, both components have evolved considerably through the exploitation of object regions, attributes, and relationships and the introduction of multi-modal connections, fully-attentive approaches, and BERT-like early-fusion strategies. However, regardless of the impressive results obtained, research in image captioning has not reached a conclusive answer yet. This work aims at providing a comprehensive overview and categorization of image captioning approaches, from visual encoding and text generation to training strategies, used datasets, and evaluation metrics. In this respect, we quantitatively compare many relevant state-of-the-art approaches to identify the most impactful technical innovations in image captioning architectures and training strategies. Moreover, many variants of the problem and its open challenges are analyzed and discussed. The final goal of this work is to serve as a tool for understanding the existing state-of-the-art and highlighting the future directions for an area of research where Computer Vision and Natural Language Processing can find an optimal synergy.
Recent years have seen important advances in the quality of state-of-the-art models, but this has come at the expense of models becoming less interpretable. This survey presents an overview of the current state of Explainable AI (XAI), considered within the domain of Natural Language Processing (NLP). We discuss the main categorization of explanations, as well as the various ways explanations can be arrived at and visualized. We detail the operations and explainability techniques currently available for generating explanations for NLP model predictions, to serve as a resource for model developers in the community. Finally, we point out the current gaps and encourage directions for future work in this important research area.