Detection of hate speech has been formulated as a standalone application of NLP, and different approaches have been adopted for identifying the target groups, obtaining raw data, defining the labeling process, choosing the detection algorithm, and evaluating performance in the desired setting. However, unlike other downstream tasks, hate speech detection suffers from a lack of large, carefully curated, generalizable datasets, owing to the highly subjective nature of the task. In this paper, we first analyze the issues surrounding hate speech detection through a data-centric lens. We then outline a holistic framework that encapsulates the data creation pipeline across seven broad dimensions, taking the specific example of hate speech towards sexual minorities. We posit that practitioners would benefit from following this framework as a form of best practice when creating hate speech datasets in the future.
Kleene Algebra (KA) is a useful tool for proving that two programs are equivalent by reasoning equationally. Because it abstracts from the meaning of primitive programs, KA's equational theory is decidable, so it integrates well with interactive theorem provers. This raises the question: which equations can we (not) prove using the laws of KA? Moreover, which models of KA are complete, in the sense that they satisfy exactly the provable equations? Kozen (1994) answered these questions by characterizing KA in terms of its language model. Concretely, equivalences provable in KA are exactly those that hold for regular expressions. Pratt (1980) observed that KA is complete w.r.t. relational models, i.e., that its provable equations are those that hold for any relational interpretation. A lesser-known result due to Palka (2005) says that finite models are complete for KA, i.e., that provable equivalences coincide with equations satisfied by all finite KAs. Phrased contrapositively, the latter is a finite model property (FMP): any unprovable equation is falsified by some finite KA. These results can be argued using Kozen's theorem, but the implication is mutual: given that KA is complete w.r.t. finite (resp. relational) models, Palka's (resp. Pratt's) arguments show that it is complete w.r.t. the language model. We embark on a study of the different complete models of KA, and the connections between them. This yields a fourth result subsuming those of Palka and Pratt, namely that KA is complete w.r.t. finite relational models. Next, we put an algebraic spin on Palka's techniques, which yields an elementary proof of the finite model property, and by extension, of Kozen's and Pratt's theorems. In contrast with earlier approaches, this proof relies not on minimality or bisimilarity of automata, but rather on representing the regular expressions involved in terms of transformation automata.
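As a concrete illustration of the kind of equivalence involved (a standard example, not one taken from the results above), the denesting law
\[
  (p + q)^{*} \;=\; p^{*}\,(q\,p^{*})^{*}
\]
is a valid identity of regular languages and hence, by Kozen's completeness theorem, provable from the axioms of KA for all programs $p$ and $q$.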
Binary classification is a fundamental task in machine learning, with applications spanning various scientific domains. Whether scientists are conducting fundamental research or refining practical applications, they typically assess and rank classification techniques based on performance metrics such as accuracy, sensitivity, and specificity. However, reported performance scores may not always serve as a reliable basis for ranking research. This can be attributed to undisclosed or unconventional practices related to cross-validation, typographical errors, and other factors. In a given experimental setup, with a specific number of positive and negative test items, most performance scores can assume only specific, interrelated values. In this paper, we introduce numerical techniques to assess the consistency of reported performance scores with the assumed experimental setup. Importantly, the proposed approach does not rely on statistical inference but uses numerical methods to identify inconsistencies with certainty. Through three different applications related to medicine, we demonstrate how the proposed techniques can effectively detect inconsistencies, thereby safeguarding the integrity of research fields. To benefit the scientific community, we have made the consistency tests available in an open-source Python package.
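The following is a minimal sketch of the kind of consistency test described above, assuming each reported score was rounded to a fixed number of decimals; the function name and interface are illustrative, not those of the released package. With p positive and n negative test items, sensitivity must equal tp/p and specificity tn/n for integer counts tp and tn, and accuracy must equal (tp+tn)/(p+n); if no such pair exists within rounding tolerance, the reported scores are provably inconsistent.

```python
def scores_consistent(p, n, acc, sens, spec, decimals=4):
    """Check whether reported accuracy, sensitivity, and specificity can be
    realized simultaneously with p positive and n negative test items,
    assuming each score was rounded to `decimals` decimal places."""
    eps = 0.5 * 10 ** (-decimals)      # maximal rounding error per score
    for tp in range(p + 1):            # candidate true-positive counts
        if abs(tp / p - sens) > eps:
            continue
        for tn in range(n + 1):        # candidate true-negative counts
            if abs(tn / n - spec) > eps:
                continue
            if abs((tp + tn) / (p + n) - acc) <= eps:
                return True            # a feasible confusion matrix exists
    return False                       # no confusion matrix matches all scores

# Example: with 100 positives and 200 negatives, this score triple is infeasible
# (no integer tp satisfies tp/100 = 0.8911 within rounding tolerance).
print(scores_consistent(100, 200, acc=0.9234, sens=0.8911, spec=0.9300))  # False
```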
Multidimensional constellation shaping with up to 32 dimensions and different spectral efficiencies is compared through AWGN and fiber-optic simulations. The results show that no single constellation is universal and that the balance between the required and effective SNRs should be considered jointly for the specific optical transmission scenario.
Explainable recommender systems (RS) have traditionally followed a one-size-fits-all approach, delivering the same level of explanation detail to every user without considering their individual needs and goals. Furthermore, explanations in RS have so far been presented mostly in a static, non-interactive manner. To fill these research gaps, in this paper we aim to adopt a user-centered, interactive explanation model that provides explanations at different levels of detail and empowers users to interact with, control, and personalize the explanations based on their needs and preferences. We followed a user-centered approach to design interactive explanations with three levels of detail (basic, intermediate, and advanced) and implemented them in the transparent Recommendation and Interest Modeling Application (RIMA). We conducted a qualitative user study (N=14) to investigate the impact of providing interactive explanations with varying levels of detail on users' perception of the explainable RS. Our study provides qualitative evidence that fostering interaction and giving users control over which explanations they see can meet the demands of users with different needs, preferences, and goals, and can consequently have positive effects on crucial aspects of explainable recommendation, including transparency, trust, satisfaction, and user experience.
An ever-increasing number of applications can employ unmanned aerial vehicles, or so-called drones, to perform various sensing and possibly also actuation tasks from the air. In some cases, the data captured at a given point has to be processed before moving to the next one. Drones can exploit nearby edge servers to offload the computation instead of performing it locally. However, doing this naively can be suboptimal if servers have limited computing resources and drones have limited energy resources. In this paper, we propose a protocol and resource reservation scheme with which each drone and edge server decides, in a dynamic and fully decentralized way, whether to offload the computation and whether to accept such offloading requests, respectively, with the objective of evenly reducing the drones' mission times. We evaluate our approach through extensive simulation experiments, showing that it can significantly reduce mission times compared to a no-offloading scenario by up to 26.2%, while outperforming an offloading schedule computed offline by up to 7.4% and a purely opportunistic approach by up to 23.9%.
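The abstract does not spell out the decision rule, so the sketch below is purely hypothetical: a drone-side and a server-side predicate of the general flavor described above, with all names, thresholds, and the latency/battery model being assumptions for illustration rather than the proposed protocol.

```python
def drone_should_offload(local_time, remote_time, transfer_time,
                         battery_level, min_battery=0.2):
    """Hypothetical drone-side rule: offload only if it is expected to be
    faster than local processing and enough battery remains for the transfer."""
    return (remote_time + transfer_time < local_time) and (battery_level > min_battery)

def server_accepts(queued_jobs, capacity):
    """Hypothetical server-side rule: accept a request only if a slot can be
    reserved without exceeding the server's computing capacity."""
    return queued_jobs < capacity

# Example: a 40 s local job vs. 12 s remote + 6 s transfer, with 55% battery left.
print(drone_should_offload(local_time=40.0, remote_time=12.0, transfer_time=6.0,
                           battery_level=0.55))   # True
print(server_accepts(queued_jobs=3, capacity=4))  # True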
The study of the generalized Hamming weight of linear codes is a significant research topic in coding theory, as it conveys structural information about the codes and determines their performance in various applications. However, determining the generalized Hamming weights of linear codes, especially the full weight hierarchy, is generally challenging. In this paper, we investigate the generalized Hamming weights of a class of linear codes $\C$ over $\bF_q$ constructed from defining sets. These defining sets are either special simplicial complexes or their complements in $\bF_q^m$. We determine the complete weight hierarchies of these codes by analyzing the maximum or minimum intersections of certain simplicial complexes with all $r$-dimensional subspaces of $\bF_q^m$, where $1\leq r\leq {\rm dim}_{\bF_q}(\C)$.
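For reference (the standard definition, stated here in the notation of the abstract), the $r$-th generalized Hamming weight of $\C$ is
\[
  d_r(\C) \;=\; \min\bigl\{\, |\mathrm{supp}(D)| \;:\; D \text{ is an } r\text{-dimensional subcode of } \C \,\bigr\}, \qquad 1\leq r\leq {\rm dim}_{\bF_q}(\C),
\]
where $\mathrm{supp}(D)$ is the set of coordinate positions at which some codeword of $D$ is nonzero. The weight hierarchy of $\C$ is the sequence $\bigl(d_1(\C), d_2(\C), \ldots, d_k(\C)\bigr)$ with $k={\rm dim}_{\bF_q}(\C)$, and $d_1(\C)$ coincides with the usual minimum Hamming weight.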
Clippy lints are considered essential tools for Rust developers, as they can be configured as gate-keeping rules for a Rust project during continuous integration. Despite their availability, little is known about the practical application and cost-effectiveness of these lints in reducing code quality issues. In this study, we embark on a comprehensive analysis to unveil the true impact of Clippy lints in the Rust development landscape. The study is structured around three interrelated components, each contributing to the overall effectiveness of Clippy. First, we conduct a comprehensive analysis of Clippy lints across idiomatic crates-io Rust projects, which exhibit an average warning density of 21 warnings/KLOC. The analysis identifies the most cost-effective lint fixes, offering valuable opportunities for improving code quality. Second, we engage Rust developers through a user survey to gather feedback on their experiences with Clippy. Their insights shed light on two crucial concerns: the prevalence of false positives in warnings and the need for auto-fix support for most warnings. Third, building upon these findings, we engineer three automated refactoring techniques that fix the four most frequent Clippy lints. As a result, the warning density in the Rosetta benchmarks decreased significantly, from 195 to 18 warnings/KLOC, already below the average density of the crates-io Rust projects. These results demonstrate the tangible benefit and impact of our efforts in enhancing overall code quality and maintainability for Rust developers.
Despite the significant progress made in practical applications of aligned language models (LMs), they tend to be more overconfident in their output answers than the corresponding pre-trained LMs. In this work, we systematically evaluate the impact of the alignment process on the logit-based uncertainty calibration of LMs in the multiple-choice setting. We first conduct a careful empirical study of how aligned LMs differ in calibration from their pre-trained counterparts. Experimental results reveal that there are two distinct uncertainties in LMs in the multiple-choice setting, responsible for the answer decision and the format preference of the LMs, respectively. We then investigate the role of these two uncertainties in aligned LMs' calibration through fine-tuning in simple synthetic alignment schemes, and conclude that one reason for aligned LMs' overconfidence is the conflation of these two types of uncertainty. Furthermore, we examine the utility of common post-hoc calibration methods for aligned LMs and propose an easy-to-implement, sample-efficient method to calibrate aligned LMs. We hope our findings can provide insights into the design of more reliable alignment processes for LMs.
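As one example of the common post-hoc calibration methods examined above (temperature scaling, not the method proposed in this work), a minimal sketch for the multiple-choice setting could look as follows; the option logits and temperature value are illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert answer-option logits into a probability distribution;
    temperatures > 1 soften overly sharp (overconfident) predictions."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

# Logits an aligned LM might assign to options A-D of a multiple-choice question.
option_logits = [6.2, 2.1, 1.8, 0.4]
print(softmax(option_logits))                   # sharply peaked, likely overconfident
print(softmax(option_logits, temperature=2.5))  # softened with a fitted temperature
```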
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven particularly relevant for natural language processing (NLP), experiencing rapid spread and wide adoption in recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. To help close this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
The dominant NLP paradigm of training a strong neural predictor to perform a single task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction-based question answering, or machine translation). However, it builds on the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and at test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-suited to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and to propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we show that these approaches yield more robust models on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule that alleviates catastrophic forgetting during adaptation.
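For concreteness, the distributionally robust optimization framework mentioned above is built around the generic objective (stated here in its standard form; the thesis's specific parametric reformulations are not reproduced)
\[
  \min_{\theta} \; \sup_{Q \in \mathcal{Q}} \; \mathbb{E}_{x \sim Q}\bigl[\ell(x;\theta)\bigr],
\]
where $\mathcal{Q}$ is an uncertainty set of distributions around the training distribution and $\ell(x;\theta)$ is the per-example loss of a model with parameters $\theta$.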