Unit testing is a commonly-used approach in software engineering to test the correctness and robustness of written code. Unit tests are tests designed to test small components of a codebase in isolation, such as an individual function or method. Although unit tests have historically been written by human programmers, recent advancements in AI, particularly LLMs, have shown corresponding advances in automatic unit test generation. In this study, we explore the effect of different prompts on the quality of unit tests generated by Code Interpreter, a GPT-4-based LLM, on Python functions provided by the Quixbugs dataset, and we focus on prompting due to the ease with which users can make use of our findings and observations. We find that the quality of the generated unit tests is not sensitive to changes in minor details in the prompts provided. However, we observe that Code Interpreter is often able to effectively identify and correct mistakes in code that it writes, suggesting that providing it runnable code to check the correctness of its outputs would be beneficial, even though we find that it is already often able to generate correctly-formatted unit tests. Our findings suggest that, when prompting models similar to Code Interpreter, it is important to include the basic information necessary to generate unit tests, but minor details are not as important.
Deep Reinforcement Learning (DRL) is vital in various AI applications. DRL algorithms comprise diverse compute kernels, which may not be simultaneously optimized using a homogeneous architecture. However, even with available heterogeneous architectures, optimizing DRL performance remains a challenge due to the complexity of hardware and programming models employed in modern data centers. To address this, we introduce PEARL, a toolkit for composing parallel DRL systems on heterogeneous platforms consisting of general-purpose processors (CPUs) and accelerators (GPUs, FPGAs). Our innovations include: 1. A general training protocol agnostic of the underlying hardware, enabling portable implementations across various processors and accelerators. 2. Incorporation of DRL-specific scheduling optimizations within the protocol, facilitating parallelized training and enhancing the overall system performance. 3. High-level API for productive development using the toolkit. 4. Automatic optimization of DRL task-to-device assignments through performance estimation, supporting various optimization metrics including throughput and power efficiency. We showcase our toolkit through experimentation with two widely used DRL algorithms, DQN and DDPG, on two diverse heterogeneous platforms. The generated implementations outperform state-of-the-art libraries for CPU-GPU platforms by throughput improvements of up to 2.1$\times$ and power efficiency improvements of up to 3.4$\times$.
In power systems, the incorporation of capacitors offers a wide range of established advantages. These benefits encompass the enhancement of the systems power factor, optimization of voltage profiles, increased capacity for current flow through cables and transformers, and the mitigation of losses attributed to the compensation of reactive power components. Different techniques have been applied to enhance the performance of the distribution system by reducing line losses. This paper focuses on reducing line losses through the optimal placement and sizing of capacitors. Optimal capacitor placement is analysed using load flow analysis with the Newton Raphson method. The placement of capacitor optimization is related to the sensitivity of the buses, which depends on the loss sensitivity factor. The optimal capacitor size is determined using Particle Swarm Optimization (PSO). The analysis is conducted using the IEEE 14 bus system in MATLAB. The results reveal that placing capacitors at the most sensitive bus locations leads to a significant reduction in line losses. Additionally, the optimal capacitor size has a substantial impact on improving the voltage profile and the power loss is reduced by 21.02 percent through the proposed method.
GUI testing checks if a software system behaves as expected when users interact with its graphical interface, e.g., testing specific functionality or validating relevant use case scenarios. Currently, deciding what to test at this high level is a manual task since automated GUI testing tools target lower level adequacy metrics such as structural code coverage or activity coverage. We propose DroidAgent, an autonomous GUI testing agent for Android, for semantic, intent-driven automation of GUI testing. It is based on Large Language Models and support mechanisms such as long- and short-term memory. Given an Android app, DroidAgent sets relevant task goals and subsequently tries to achieve them by interacting with the app. Our empirical evaluation of DroidAgent using 15 apps from the Themis benchmark shows that it can set up and perform realistic tasks, with a higher level of autonomy. For example, when testing a messaging app, DroidAgent created a second account and added a first account as a friend, testing a realistic use case, without human intervention. On average, DroidAgent achieved 61% activity coverage, compared to 51% for current state-of-the-art GUI testing techniques. Further, manual analysis shows that 317 out of the 374 autonomously created tasks are realistic and relevant to app functionalities, and also that DroidAgent interacts deeply with the apps and covers more features.
Designing privacy-preserving deep learning models is a major challenge within the deep learning community. Homomorphic Encryption (HE) has emerged as one of the most promising approaches in this realm, enabling the decoupling of knowledge between the model owner and the data owner. Despite extensive research and application of this technology, primarily in convolutional neural networks, incorporating HE into transformer models has been challenging because of the difficulties in converting these models into a polynomial form. We break new ground by introducing the first polynomial transformer, providing the first demonstration of secure inference over HE with transformers. This includes a transformer architecture tailored for HE, alongside a novel method for converting operators to their polynomial equivalent. This innovation enables us to perform secure inference on LMs with WikiText-103. It also allows us to perform image classification with CIFAR-100 and Tiny-ImageNet. Our models yield results comparable to traditional methods, bridging the performance gap with transformers of similar scale and underscoring the viability of HE for state-of-the-art applications. Finally, we assess the stability of our models and conduct a series of ablations to quantify the contribution of each model component.
Compositional generalisation (CG), in NLP and in machine learning more generally, has been assessed mostly using artificial datasets. It is important to develop benchmarks to assess CG also in real-world natural language tasks in order to understand the abilities and limitations of systems deployed in the wild. To this end, our GenBench Collaborative Benchmarking Task submission utilises the distribution-based compositionality assessment (DBCA) framework to split the Europarl translation corpus into a training and a test set in such a way that the test set requires compositional generalisation capacity. Specifically, the training and test sets have divergent distributions of dependency relations, testing NMT systems' capability of translating dependencies that they have not been trained on. This is a fully-automated procedure to create natural language compositionality benchmarks, making it simple and inexpensive to apply it further to other datasets and languages. The code and data for the experiments is available at //github.com/aalto-speech/dbca.
Nowadays, many companies design and develop their software systems as a set of loosely coupled microservices that communicate via their Application Programming Interfaces (APIs). While the loose coupling improves maintainability, scalability, and fault tolerance, it poses new challenges to the API evolution process. Related works identified communication and integration as major API evolution challenges but did not provide the underlying reasons and research directions to mitigate them. In this paper, we aim to identify microservice API evolution strategies and challenges in practice and gain a broader perspective of their relationships. We conducted 17 semi-structured interviews with developers, architects, and managers in 11 companies and analyzed the interviews with open coding used in grounded theory. In total, we identified six strategies and six challenges for REpresentational State Transfer (REST) and event-driven communication via message brokers. The strategies mainly focus on API backward compatibility, versioning, and close collaboration between teams. The challenges include change impact analysis efforts, ineffective communication of changes, and consumer reliance on outdated versions, leading to API design degradation. We defined two important problems in microservice API evolution resulting from the challenges and their coping strategies: tight organizational coupling and consumer lock-in. To mitigate these two problems, we propose automating the change impact analysis and investigating effective communication of changes as open research directions.
Quantum copy protection, introduced by Aaronson, enables giving out a quantum program-description that cannot be meaningfully duplicated. Despite over a decade of study, copy protection is only known to be possible for a very limited class of programs. As our first contribution, we show how to achieve "best-possible" copy protection for all programs. We do this by introducing quantum state indistinguishability obfuscation (qsiO), a notion of obfuscation for quantum descriptions of classical programs. We show that applying qsiO to a program immediately achieves best-possible copy protection. Our second contribution is to show that, assuming injective one-way functions exist, qsiO is concrete copy protection for a large family of puncturable programs -- significantly expanding the class of copy-protectable programs. A key tool in our proof is a new variant of unclonable encryption (UE) that we call coupled unclonable encryption (cUE). While constructing UE in the standard model remains an important open problem, we are able to build cUE from one-way functions. If we additionally assume the existence of UE, then we can further expand the class of puncturable programs for which qsiO is copy protection. Finally, we construct qsiO relative to an efficient quantum oracle.
Deep learning models often suffer from forgetting previously learned information when trained on new data. This problem is exacerbated in federated learning (FL), where the data is distributed and can change independently for each user. Many solutions are proposed to resolve this catastrophic forgetting in a centralized setting. However, they do not apply directly to FL because of its unique complexities, such as privacy concerns and resource limitations. To overcome these challenges, this paper presents a framework for \textbf{federated class incremental learning} that utilizes a generative model to synthesize samples from past distributions. This data can be later exploited alongside the training data to mitigate catastrophic forgetting. To preserve privacy, the generative model is trained on the server using data-free methods at the end of each task without requesting data from clients. Moreover, our solution does not demand the users to store old data or models, which gives them the freedom to join/leave the training at any time. Additionally, we introduce SuperImageNet, a new regrouping of the ImageNet dataset specifically tailored for federated continual learning. We demonstrate significant improvements compared to existing baselines through extensive experiments on multiple datasets.
Context: Modern software systems (e.g., Apache Spark) are usually written in multiple programming languages (PLs). There is little understanding on the phenomenon of multi-programming-language commits (MPLCs), which involve modified source files written in multiple PLs. Objective: This work aims to explore MPLCs and their impacts on development difficulty and software quality. Methods: We performed an empirical study on eighteen non-trivial Apache projects with 197,566 commits. Results: (1) the most commonly used PL combination consists of all the four PLs, i.e., C/C++, Java, JavaScript, and Python; (2) 9% of the commits from all the projects are MPLCs, and the proportion of MPLCs in 83% of the projects goes to a relatively stable level; (3) more than 90% of the MPLCs from all the projects involve source files in two PLs; (4) the change complexity of MPLCs is significantly higher than that of non-MPLCs; (5) issues fixed in MPLCs take significantly longer to be resolved than issues fixed in non-MPLCs in 89% of the projects; (6) MPLCs do not show significant effects on issue reopen; (7) source files undergoing MPLCs tend to be more bug-prone; and (8) MPLCs introduce more bugs than non-MPLCs. Conclusions: MPLCs are related to increased development difficulty and decreased software quality.
Textual entailment is a fundamental task in natural language processing. Most approaches for solving the problem use only the textual content present in training data. A few approaches have shown that information from external knowledge sources like knowledge graphs (KGs) can add value, in addition to the textual content, by providing background knowledge that may be critical for a task. However, the proposed models do not fully exploit the information in the usually large and noisy KGs, and it is not clear how it can be effectively encoded to be useful for entailment. We present an approach that complements text-based entailment models with information from KGs by (1) using Personalized PageR- ank to generate contextual subgraphs with reduced noise and (2) encoding these subgraphs using graph convolutional networks to capture KG structure. Our technique extends the capability of text models exploiting structural and semantic information found in KGs. We evaluate our approach on multiple textual entailment datasets and show that the use of external knowledge helps improve prediction accuracy. This is particularly evident in the challenging BreakingNLI dataset, where we see an absolute improvement of 5-20% over multiple text-based entailment models.