The correctness of software systems is vital for their effective operation, which makes discovering and fixing software bugs an important development task. The increasing use of Artificial Intelligence (AI) techniques in Software Engineering has led to a number of techniques that can assist software developers in identifying potential bugs in code. In this paper, we present a comprehensive comparison and analysis of the efficacy of two AI-based approaches, namely single AI models and ensemble AI models, for predicting the probability of a Java class being buggy. We used two open-source Java components from the Apache Commons Project for training and evaluating the models. Our experimental findings indicate that an ensemble of AI models can outperform individual AI models. We also offer insight into the factors that contribute to the enhanced performance of the ensemble AI model. The presented results demonstrate the potential of using ensemble AI models to enhance bug prediction, which could ultimately result in more reliable software systems.
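As a rough illustration of the single-versus-ensemble comparison, the sketch below contrasts a single classifier with a soft-voting ensemble in scikit-learn; the feature matrix X (e.g., per-class code metrics) and labels y (1 = buggy, 0 = clean) are assumed to be prepared elsewhere, and this is not the paper's actual pipeline.

# Minimal sketch (not the paper's exact pipeline): comparing a single model
# with a soft-voting ensemble for class-level bug prediction using scikit-learn.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def compare_models(X, y):
    single = DecisionTreeClassifier(random_state=0)
    ensemble = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("dt", DecisionTreeClassifier(random_state=0)),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ],
        voting="soft",  # average the predicted probabilities of being buggy
    )
    for name, model in [("single", single), ("ensemble", ensemble)]:
        scores = cross_val_score(model, X, y, cv=5, scoring="f1")
        print(f"{name}: mean F1 = {scores.mean():.3f}")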
Domain experts often possess valuable physical insights that are overlooked in fully automated decision-making processes such as Bayesian optimisation. In this article we apply high-throughput (batch) Bayesian optimisation alongside anthropological decision theory to enable domain experts to influence the selection of optimal experiments. Our methodology exploits the hypothesis that humans are better at making discrete choices than continuous ones and enables experts to influence critical early decisions. At each iteration we solve an augmented multi-objective optimisation problem across a number of alternate solutions, maximising both the sum of their utility function values and the determinant of their covariance matrix, equivalent to their total variability. By taking the solution at the knee point of the Pareto front, we return a set of alternate solutions at each iteration that have both high utility values and are reasonably distinct, from which the expert selects one for evaluation. We demonstrate that even in the case of an uninformed practitioner, our algorithm recovers the regret of standard Bayesian optimisation.
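A minimal sketch of the scoring step for one candidate set of alternate solutions is given below, assuming a utility (acquisition) function and a Gaussian-process posterior covariance are available elsewhere; the full augmented multi-objective solver and knee-point selection are not shown.

# Score one candidate set of alternate solutions on the two objectives:
# the sum of their utility values and the (log-)determinant of their
# posterior covariance matrix, i.e., their total variability.
import numpy as np

def score_alternatives(points, utility_fn, posterior_cov_fn):
    """points: (q, d) array of q alternate solutions in a d-dimensional design space."""
    total_utility = sum(utility_fn(x) for x in points)
    cov = posterior_cov_fn(points)           # (q, q) posterior covariance matrix
    sign, logdet = np.linalg.slogdet(cov)    # log-determinant of the covariance
    return total_utility, logdet

A Pareto front over many candidate sets can then be built from these objective pairs, and the set at its knee point is the one presented to the expert for the final discrete choice.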
The TCP congestion control protocol serves as the cornerstone of reliable internet communication. However, as new applications require more specific guarantees regarding data rate and delay, network management must adapt. Thus, service providers are shifting from decentralized to centralized control of the network using a software-defined networking (SDN) controller. The SDN controller classifies applications and allocates logically separate resources, called slices, over the physical network. We propose TCP Slice, a congestion control algorithm that meets specific delay and bandwidth guarantees. Obtaining closed-form delay bounds for a client is challenging due to dependencies on other clients and the stochasticity of their traffic. We use network calculus to derive the client's delay bound and incorporate it as a constraint in the Network Utility Maximization problem. We solve the resulting optimization using dual decomposition and obtain a semi-distributed TCP protocol that can be implemented with the help of the SDN controller and an Explicit Congestion Notification (ECN) bit. Additionally, we propose a proactive approach to congestion control using a digital twin. TCP Slice represents a significant step towards accommodating evolving internet traffic patterns and the need for better network management in the face of increasing application diversity.
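To illustrate the dual-decomposition step in isolation, the sketch below solves a toy Network Utility Maximization problem with log utilities on a single bottleneck link; the network-calculus delay constraint and the actual TCP Slice protocol details are omitted.

# Illustrative dual decomposition for a toy NUM problem (not the exact TCP
# Slice derivation). With log utilities U_i(x) = w_i * log(x), each client's
# best response to the link price lam is x_i = w_i / lam; the price is then
# updated by a subgradient step on the capacity constraint, the part an SDN
# controller (or an ECN-style signal) could perform centrally.
def num_dual_decomposition(weights, capacity, step=0.01, iters=5000):
    lam = 1.0                                   # dual variable (link price)
    for _ in range(iters):
        rates = [w / lam for w in weights]      # distributed per-client updates
        lam = max(1e-6, lam + step * (sum(rates) - capacity))  # price update
    return rates, lam

rates, price = num_dual_decomposition(weights=[1.0, 2.0, 1.0], capacity=10.0)
print(rates, price)  # rates approach the weighted proportional-fair allocation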
In the face of growing vulnerabilities found in open-source software, the need to identify discreet security patches has become paramount. The lack of consistency in how software providers handle maintenance often leads to the release of security patches without comprehensive advisories, leaving users vulnerable to unaddressed security risks. To address this pressing issue, we introduce a novel security patch detection system, LLMDA, which capitalizes on Large Language Models (LLMs) and code-text alignment methodologies for patch review, data enhancement, and feature combination. Within LLMDA, we initially utilize LLMs to examine patches and expand the data of PatchDB and SPI-DB, two security patch datasets from recent literature. We then use labeled instructions to direct our LLMDA, differentiating patches based on security relevance. Following this, we apply a PTFormer to merge patches with code, formulating hybrid attributes that encompass both the innate details and the interconnections between the patches and the code. This distinctive combination method allows our system to capture more insights from the combined context of patches and code, hence improving detection precision. Finally, we devise a probabilistic batch contrastive learning mechanism within batches to augment the capability of our LLMDA in discerning security patches. The results reveal that LLMDA significantly surpasses state-of-the-art techniques in detecting security patches, underscoring its promise in fortifying software maintenance.
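As a rough stand-in for the batch contrastive component, the sketch below computes a standard in-batch InfoNCE-style loss over paired patch and code embeddings; LLMDA's probabilistic formulation may differ, and the embeddings are assumed to be produced by earlier stages of the pipeline.

# Generic in-batch contrastive (InfoNCE-style) loss over paired patch and code
# embeddings, illustrating the batch contrastive idea rather than LLMDA's
# exact probabilistic mechanism.
import numpy as np

def info_nce(patch_emb, code_emb, temperature=0.07):
    """patch_emb, code_emb: (B, d) arrays where row i of each forms a positive pair."""
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    c = code_emb / np.linalg.norm(code_emb, axis=1, keepdims=True)
    logits = p @ c.T / temperature               # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positive pairs lie on the diagonal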
Evaluating quantum circuits on current hardware is very noisy. Therefore, developing classical bootstraps that help minimize the number of times quantum circuits have to be executed on noisy quantum devices is a powerful technique for improving the practicality of Variational Quantum Algorithms (VQAs). CAFQA is a previously proposed classical bootstrap for VQAs that uses an initial ansatz that reduces to Clifford operators. CAFQA has been shown to produce fairly accurate initializations for VQAs applied to molecular chemistry Hamiltonians. Motivated by this result, in this paper we analyze the Clifford states that optimize the cost function for a new class of Hamiltonians, namely Transverse Field Ising Hamiltonians. Our primary result connects the problem of finding the optimal CAFQA initialization to a submodular minimization problem, which in turn can be solved in polynomial time.
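For reference, a commonly used one-dimensional form of the Transverse Field Ising Hamiltonian on n qubits is shown below; the exact couplings and topology studied in the paper may differ.

% Standard 1D Transverse Field Ising Hamiltonian; J is the coupling strength,
% h is the transverse field, and Z_i, X_i are Pauli operators on qubit i.
H \;=\; -J \sum_{i=1}^{n-1} Z_i Z_{i+1} \;-\; h \sum_{i=1}^{n} X_i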
Background: The company-internal reuse of software components owned by organizational units in different countries is taxable. To comply with international taxation standards, multinational enterprises need to consider a geographical perspective on their software architecture. However, there is no viewpoint that frames the concerns of tax authorities as stakeholders towards a globally distributed software architecture. Objective: In this article, we introduce the reader to the concerns of tax authorities as stakeholders, and we investigate how software companies can describe their globally distributed software architectures to tax authorities. Method: In an in-virtuo experiment, we (1) develop a viewpoint that frames the concerns of tax authorities, (2) create a view of a large-scale, globally distributed microservice architecture from a multinational enterprise, and (3) evaluate the resulting software architecture description with a panel of four tax experts. Results: The panel of tax experts found that our proposed architectural viewpoint properly and sufficiently frames the concerns of taxation stakeholders. However, the resulting view falls short: although the architecture description reveals that almost 70% of all reuse relationships between the 2560 microservices in our case are cross-border and therefore taxable, unclear jurisdictions of owners and a potentially insufficient definition of ownership introduce significant noise into the view, limiting the usefulness and explanatory power of our software architecture description. Conclusion: Although our software architecture description provides a solid foundation for the subsequent step, namely valuing software components, we stumbled over several theoretical and practical problems when identifying and defining ownership in distributed teams, which requires further interdisciplinary research.
Bug reports are an essential aspect of software development, and it is crucial to identify and resolve them quickly to ensure the consistent functioning of software systems. Retrieving similar bug reports from an existing database can help reduce the time and effort required to resolve bugs. In this paper, we compared the effectiveness of semantic textual similarity methods for retrieving similar bug reports based on a similarity score. We explored several embedding models, namely TF-IDF (baseline), FastText, Gensim, BERT, and ADA. We used the Software Defects Data, which contains bug reports for various software projects, to evaluate the performance of these models. Our experimental results showed that BERT generally outperformed the rest of the models in terms of recall, followed by ADA, Gensim, FastText, and TF-IDF. Our study provides insights into the effectiveness of different embedding methods for retrieving similar bug reports and highlights the importance of selecting an appropriate model for this task. Our code is available on GitHub.
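A minimal sketch of similarity-based retrieval with the TF-IDF baseline is shown below, using scikit-learn; the neural embedding models (FastText, BERT, ADA) would replace the vectorizer while keeping the same cosine-similarity ranking, and the report texts are assumed to be plain strings.

# Minimal sketch: rank existing bug reports by cosine similarity to a query
# report using TF-IDF features (the baseline from the study).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_similar(query, reports, top_k=5):
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(reports + [query])
    query_vec = matrix[len(reports)]                 # last row is the query
    scores = cosine_similarity(query_vec, matrix[:len(reports)]).ravel()
    ranked = scores.argsort()[::-1][:top_k]
    return [(reports[i], float(scores[i])) for i in ranked]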
To ensure the usefulness of Reinforcement Learning (RL) in real systems, it is crucial to ensure that RL agents are robust to noise and adversarial attacks. In adversarial RL, an external attacker has the power to manipulate the victim agent's interaction with the environment. We study the full class of online manipulation attacks, which include (i) state attacks, (ii) observation attacks (which are a generalization of perceived-state attacks), (iii) action attacks, and (iv) reward attacks. We show that the attacker's problem of designing a stealthy attack that maximizes its own expected reward, which often corresponds to minimizing the victim's value, is captured by a Markov Decision Process (MDP) that we call a meta-MDP, since it is not the true environment but a higher-level environment induced by the attacked interaction. We show that the attacker can derive optimal attacks by planning in polynomial time or learning with polynomial sample complexity using standard RL techniques. We argue that the optimal defense policy for the victim can be computed as the solution to a stochastic Stackelberg game, which can be further simplified into a partially-observable turn-based stochastic game (POTBSG). Neither the attacker nor the victim would benefit from deviating from their respective optimal policies, so such solutions are truly robust. Although the defense problem is NP-hard, we show that optimal Markovian defenses can be computed (learned) in polynomial time (sample complexity) in many scenarios.
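The following sketch illustrates how an observation attack induces the meta-MDP: a wrapper intercepts the interaction between a gym-like environment and a fixed victim policy, and the attacker's action is the observation shown to the victim. All names and the interface are illustrative, not the paper's implementation.

# Illustrative sketch of the meta-MDP induced by an observation attack. The
# wrapped env is assumed to expose a gym-like reset()/step(action) interface.
class ObservationAttackEnv:
    def __init__(self, env, victim_policy):
        self.env = env                      # true underlying environment
        self.victim_policy = victim_policy  # fixed victim the attacker exploits

    def reset(self):
        self.true_obs = self.env.reset()
        return self.true_obs                # attacker's state: the true observation

    def step(self, perturbed_obs):
        # The attacker's "action" is the observation shown to the victim.
        victim_action = self.victim_policy(perturbed_obs)
        self.true_obs, victim_reward, done, info = self.env.step(victim_action)
        attacker_reward = -victim_reward    # e.g., minimize the victim's value
        return self.true_obs, attacker_reward, done, info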
Autonomic computing investigates how systems can achieve (user-)specified control outcomes on their own, without the intervention of a human operator. Autonomic computing fundamentals have been substantially influenced by those of control theory for closed- and open-loop systems. In practice, complex systems may exhibit a number of concurrent and inter-dependent control loops. Although autonomic models have been researched for managing computer resources, ranging from individual resources (e.g., web servers) to resource ensembles (e.g., multiple resources within a data center), integrating Artificial Intelligence (AI) and Machine Learning (ML) to improve resource autonomy and performance at scale remains a fundamental challenge. The integration of AI/ML to achieve such autonomic and self-managing systems can take place at different levels of granularity, from full to human-in-the-loop automation. In this article, leading academics, researchers, practitioners, engineers, and scientists in the fields of cloud computing, AI/ML, and quantum computing join to discuss current research and potential future directions for these fields. Further, we discuss challenges and opportunities for leveraging AI and ML in next-generation computing for emerging computing paradigms, including cloud, fog, edge, serverless and quantum computing environments.
The new era of technology has brought us to the point where it is convenient for people to share their opinions over an abundance of platforms. These platforms allow users to express themselves in multiple forms of representation, including text, images, videos, and audio. This, however, makes it difficult for users to obtain all the key information about a topic, making the task of automatic multi-modal summarization (MMS) essential. In this paper, we present a comprehensive survey of the existing research in the area of MMS.
Recommender systems play a crucial role in mitigating the problem of information overload by suggesting personalized items or services to users. The vast majority of traditional recommender systems treat the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during its interactions with users. We model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies by recommending items through trial and error and receiving reinforcement signals from user feedback on these items. In particular, we introduce an online user-agent interaction environment simulator, which can pre-train and evaluate model parameters offline before applying the model online. Moreover, we validate the importance of list-wise recommendations during the interactions between users and the agent, and develop a novel approach to incorporate them into the proposed framework, LIRD, for list-wise recommendations. The experimental results based on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.
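The schematic below shows the simulator-agent interaction loop described above, with a list-wise action at each step; the simulator, agent, and reward signal are placeholders for illustration rather than the LIRD implementation.

# Schematic of the user-simulator / agent interaction loop: the state is the
# user's recent interaction history, the action is a recommended list of items,
# and the reward comes from simulated user feedback.
def run_episode(simulator, agent, list_size=5, max_steps=20):
    state = simulator.reset()                             # e.g., recent interaction history
    total_reward = 0.0
    for _ in range(max_steps):
        item_list = agent.recommend(state, k=list_size)   # list-wise action
        next_state, reward, done = simulator.step(item_list)  # simulated feedback
        agent.update(state, item_list, reward, next_state)    # trial-and-error learning
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward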