Context: Modern software systems (e.g., Apache Spark) are usually written in multiple programming languages (PLs). There is little understanding on the phenomenon of multi-programming-language commits (MPLCs), which involve modified source files written in multiple PLs. Objective: This work aims to explore MPLCs and their impacts on development difficulty and software quality. Methods: We performed an empirical study on eighteen non-trivial Apache projects with 197,566 commits. Results: (1) the most commonly used PL combination consists of all the four PLs, i.e., C/C++, Java, JavaScript, and Python; (2) 9% of the commits from all the projects are MPLCs, and the proportion of MPLCs in 83% of the projects goes to a relatively stable level; (3) more than 90% of the MPLCs from all the projects involve source files in two PLs; (4) the change complexity of MPLCs is significantly higher than that of non-MPLCs; (5) issues fixed in MPLCs take significantly longer to be resolved than issues fixed in non-MPLCs in 89% of the projects; (6) MPLCs do not show significant effects on issue reopen; (7) source files undergoing MPLCs tend to be more bug-prone; and (8) MPLCs introduce more bugs than non-MPLCs. Conclusions: MPLCs are related to increased development difficulty and decreased software quality.
We present DIALIGHT, a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems which facilitates systematic evaluations and comparisons between ToD systems using fine-tuning of Pretrained Language Models (PLMs) and those utilising the zero-shot and in-context learning capabilities of Large Language Models (LLMs). In addition to automatic evaluation, this toolkit features (i) a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level, and (ii) a microservice-based backend, improving efficiency and scalability. Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses. However, we also identify significant challenges of LLMs in adherence to task-specific instructions and generating outputs in multiple languages, highlighting areas for future research. We hope this open-sourced toolkit will serve as a valuable resource for researchers aiming to develop and properly evaluate multilingual ToD systems and will lower, currently still high, entry barriers in the field.
The system of Virtual Try-ON (VTON) allows a user to try a product virtually. In general, a VTON system takes a clothing source and a person's image to predict the try-on output of the person in the given clothing. Although existing methods perform well for simple poses, in case of bent or crossed arms posture or when there is a significant difference between the alignment of the source clothing and the pose of the target person, these methods fail by generating inaccurate clothing deformations. In the VTON methods that employ Thin Plate Spline (TPS) based clothing transformations, this mainly occurs for two reasons - (1)~the second-order smoothness constraint of TPS that restricts the bending of the object plane. (2)~Overlaps among different clothing parts (e.g., sleeves and torso) can not be modeled by a single TPS transformation, as it assumes the clothing as a single planar object; therefore, disregards the independence of movement of different clothing parts. To this end, we make two major contributions. Concerning the bending limitations of TPS, we propose a human AnaTomy-Aware Geometric (ATAG) transformation. Regarding the overlap issue, we propose a part-based warping approach that divides the clothing into independently warpable parts to warp them separately and later combine them. Extensive analysis shows the efficacy of this approach.
Python has become one of the most popular programming languages for software development due to its simplicity, readability, and versatility. As the Python ecosystem grows, developers face increasing challenges in avoiding module conflicts, which occur when different packages have the same namespace modules. Unfortunately, existing work has neither investigated the module conflict comprehensively nor provided tools to detect the conflict. Therefore, this paper systematically investigates the module conflict problem and its impact on the Python ecosystem. We propose a novel technique called InstSimulator, which leverages semantics and installation simulation to achieve accurate and efficient module extraction. Based on this, we implement a tool called ModuleGuard to detect module conflicts for the Python ecosystem. For the study, we first collect 97 MC issues, classify the characteristics and causes of these MC issues, summarize three different conflict patterns, and analyze their potential threats. Then, we conducted a large-scale analysis of the whole PyPI ecosystem (4.2 million packages) and GitHub popular projects (3,711 projects) to detect each MC pattern and analyze their potential impact. We discovered that module conflicts still impact numerous TPLs and GitHub projects. This is primarily due to developers' lack of understanding of the modules within their direct dependencies, not to mention the modules of the transitive dependencies. Our work reveals Python's shortcomings in handling naming conflicts and provides a tool and guidelines for developers to detect conflicts.
Mobile edge computing (MEC) is powerful to alleviate the heavy computing tasks in integrated sensing and communication (ISAC) systems. In this paper, we investigate joint beamforming and offloading design in a three-tier integrated sensing, communication and computation (ISCC) framework comprising one cloud server, multiple mobile edge servers, and multiple terminals. While executing sensing tasks, the user terminals can optionally offload sensing data to either MEC server or cloud servers. To minimize the execution latency, we jointly optimize the transmit beamforming matrices and offloading decision variables under the constraint of sensing performance. An alternating optimization algorithm based on multidimensional fractional programming is proposed to tackle the non-convex problem. Simulation results demonstrates the superiority of the proposed mechanism in terms of convergence and task execution latency reduction, compared with the state-of-the-art two-tier ISCC framework.
We introduce a new challenge to the software development community: 1) leveraging AI to accurately detect and flag up secrets in code and on popular document sharing platforms that frequently used by developers, such as Confluence and 2) automatically remediating the detections (e.g. by suggesting password vault functionality). This is a challenging, and mostly unaddressed task. Existing methods leverage heuristics and regular expressions, that can be very noisy, and therefore increase toil on developers. The next step - modifying code itself - to automatically remediate a detection, is a complex task. We introduce two baseline AI models that have good detection performance and propose an automatic mechanism for remediating secrets found in code, opening up the study of this task to the wider community.
We introduce EQ-Bench, a novel benchmark designed to evaluate aspects of emotional intelligence in Large Language Models (LLMs). We assess the ability of LLMs to understand complex emotions and social interactions by asking them to predict the intensity of emotional states of characters in a dialogue. The benchmark is able to discriminate effectively between a wide range of models. We find that EQ-Bench correlates strongly with comprehensive multi-domain benchmarks like MMLU (Hendrycks et al., 2020) (r=0.97), indicating that we may be capturing similar aspects of broad intelligence. Our benchmark produces highly repeatable results using a set of 60 English-language questions. We also provide open-source code for an automated benchmarking pipeline at //github.com/EQ-bench/EQ-Bench and a leaderboard at //eqbench.com
Existing knowledge graph (KG) embedding models have primarily focused on static KGs. However, real-world KGs do not remain static, but rather evolve and grow in tandem with the development of KG applications. Consequently, new facts and previously unseen entities and relations continually emerge, necessitating an embedding model that can quickly learn and transfer new knowledge through growth. Motivated by this, we delve into an expanding field of KG embedding in this paper, i.e., lifelong KG embedding. We consider knowledge transfer and retention of the learning on growing snapshots of a KG without having to learn embeddings from scratch. The proposed model includes a masked KG autoencoder for embedding learning and update, with an embedding transfer strategy to inject the learned knowledge into the new entity and relation embeddings, and an embedding regularization method to avoid catastrophic forgetting. To investigate the impacts of different aspects of KG growth, we construct four datasets to evaluate the performance of lifelong KG embedding. Experimental results show that the proposed model outperforms the state-of-the-art inductive and lifelong embedding baselines.
Deployment of Internet of Things (IoT) devices and Data Fusion techniques have gained popularity in public and government domains. This usually requires capturing and consolidating data from multiple sources. As datasets do not necessarily originate from identical sensors, fused data typically results in a complex data problem. Because military is investigating how heterogeneous IoT devices can aid processes and tasks, we investigate a multi-sensor approach. Moreover, we propose a signal to image encoding approach to transform information (signal) to integrate (fuse) data from IoT wearable devices to an image which is invertible and easier to visualize supporting decision making. Furthermore, we investigate the challenge of enabling an intelligent identification and detection operation and demonstrate the feasibility of the proposed Deep Learning and Anomaly Detection models that can support future application that utilizes hand gesture data from wearable devices.
Deep Learning (DL) is vulnerable to out-of-distribution and adversarial examples resulting in incorrect outputs. To make DL more robust, several posthoc anomaly detection techniques to detect (and discard) these anomalous samples have been proposed in the recent past. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection for DL based applications. We provide a taxonomy for existing techniques based on their underlying assumptions and adopted approaches. We discuss various techniques in each of the categories and provide the relative strengths and weaknesses of the approaches. Our goal in this survey is to provide an easier yet better understanding of the techniques belonging to different categories in which research has been done on this topic. Finally, we highlight the unsolved research challenges while applying anomaly detection techniques in DL systems and present some high-impact future research directions.
Reasoning with knowledge expressed in natural language and Knowledge Bases (KBs) is a major challenge for Artificial Intelligence, with applications in machine reading, dialogue, and question answering. General neural architectures that jointly learn representations and transformations of text are very data-inefficient, and it is hard to analyse their reasoning process. These issues are addressed by end-to-end differentiable reasoning systems such as Neural Theorem Provers (NTPs), although they can only be used with small-scale symbolic KBs. In this paper we first propose Greedy NTPs (GNTPs), an extension to NTPs addressing their complexity and scalability limitations, thus making them applicable to real-world datasets. This result is achieved by dynamically constructing the computation graph of NTPs and including only the most promising proof paths during inference, thus obtaining orders of magnitude more efficient models. Then, we propose a novel approach for jointly reasoning over KBs and textual mentions, by embedding logic facts and natural language sentences in a shared embedding space. We show that GNTPs perform on par with NTPs at a fraction of their cost while achieving competitive link prediction results on large datasets, providing explanations for predictions, and inducing interpretable models. Source code, datasets, and supplementary material are available online at //github.com/uclnlp/gntp.