Question answering over source code provides software engineers and project managers with helpful information about the implemented features of a software product. This paper presents work on using large language models for question answering over Python source code. The proposed method for implementing a source code question answering system involves fine-tuning a large language model on a unified dataset of questions and answers for Python code. To achieve the highest answer quality, we tested models trained on datasets preprocessed in different ways: a dataset without grammar correction, a dataset with grammar correction, and a dataset augmented with generated summaries. We also manually analyzed the model answers for errors. We report BLEU-4, BERTScore F1, BLEURT, and Exact Match metric values, along with conclusions from the manual error analysis. The experimental results highlight current problems of the research area, such as the poor quality of publicly available genuine question-answering datasets. In addition, the findings include a positive effect of grammar correction of the training data on the test metric values. These findings and issues could be important for other researchers who attempt to improve the quality of source code question answering solutions. The training and evaluation code is publicly available at https://github.com/IU-AES-AI4Code/CodeQuestionAnswering.
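As a quick illustration of two of the reported metrics, the following sketch computes sentence-level BLEU-4 with NLTK and a simple normalized Exact Match for one reference/candidate answer pair. It is a hedged example, not the paper's released evaluation code, and the sample strings are invented.

# Hedged illustration: sentence-level BLEU-4 (NLTK) and a simple Exact Match check.
# Not the paper's evaluation code; the reference/candidate strings are invented.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu4(reference: str, candidate: str) -> float:
    # BLEU-4: uniform weights over 1- to 4-gram precisions, with smoothing
    smooth = SmoothingFunction().method1
    return sentence_bleu([reference.split()], candidate.split(),
                         weights=(0.25, 0.25, 0.25, 0.25),
                         smoothing_function=smooth)

def exact_match(reference: str, candidate: str) -> bool:
    # Exact Match after whitespace and case normalization
    return reference.strip().lower() == candidate.strip().lower()

ref = "the function returns the list of active users"
hyp = "the function returns a list of active users"
print(f"BLEU-4: {bleu4(ref, hyp):.3f}, Exact Match: {exact_match(ref, hyp)}")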
SocialED is a comprehensive, open-source Python library designed to support social event detection (SED) tasks, integrating 19 detection algorithms and 14 diverse datasets. It provides a unified API with detailed documentation, offering researchers and practitioners a complete solution for event detection in social media. The library is designed with modularity in mind, allowing users to easily adapt and extend components for various use cases. SocialED supports a wide range of preprocessing techniques, such as graph construction and tokenization, and includes standardized interfaces for training models and making predictions. By integrating popular deep learning frameworks, SocialED ensures high efficiency and scalability across both CPU and GPU environments. The library adheres to high code quality standards, including unit testing, continuous integration, and code coverage, ensuring robust, maintainable software. SocialED is publicly available at \url{https://github.com/RingBDStack/SocialED} and can be installed via PyPI.
This paper presents HyperGraphOS, a significant innovation in the domain of operating systems, specifically designed to address the needs of scientific and engineering domains. The platform combines model-based engineering, graph modeling, data containers, and documents, along with tools for handling computational elements. HyperGraphOS functions as an operating system offering users an infinite workspace for creating and managing complex models represented as graphs with customizable semantics. By leveraging a web-based architecture, it requires only a modern web browser for access, allowing knowledge, documents, and content to be organized into models represented in a network of workspaces. Elements of the workspace are defined in terms of domain-specific languages (DSLs). These DSLs are pivotal for navigating workspaces, generating code, triggering AI components, and organizing information and processes. The models' dual nature as both visual drawings and data structures allows dynamic modifications and inspections, both interactively and programmatically. We evaluated HyperGraphOS's efficiency and applicability across a large set of diverse domains, including the design and development of a virtual avatar dialog system, a robotic task planner based on large language models (LLMs), a new meta-model for feature-based code development, and many others. Our findings show that HyperGraphOS offers substantial benefits in interacting with a computer as an information system, as a platform for experiments and data analysis, and for streamlined engineering processes, demonstrating enhanced flexibility in managing data, computation, and documents, and an innovative approach to persistent desktop environments.
Code retrieval, which retrieves code snippets based on users' natural language descriptions, is widely used by developers and plays a pivotal role in real-world software development. The advent of deep learning has shifted the retrieval paradigm from lexical-based matching towards leveraging deep learning models to encode source code and queries into vector representations, facilitating code retrieval according to vector similarity. Despite the effectiveness of these models, managing a large-scale code database presents significant challenges. Previous research proposes deep hashing-based methods, which generate hash codes for queries and code snippets and use Hamming distance for rapid recall of code candidates. However, this approach's reliance on linear scanning of the entire code base limits its scalability. To further improve the efficiency of large-scale code retrieval, we propose a novel approach, SECRET (Scalable and Efficient Code Retrieval via SegmEnTed deep hashing). SECRET converts the long hash codes calculated by existing deep hashing approaches into several short hash code segments through an iterative training strategy. After training, SECRET recalls code candidates by looking up the hash tables for each segment, which greatly reduces the time complexity of recall. Extensive experimental results demonstrate that SECRET can drastically reduce retrieval time by at least 95% while achieving comparable or even higher performance than existing deep hashing approaches. In addition, SECRET exhibits superior performance and efficiency compared to the classical hash table-based approach known as LSH under the same number of hash tables.
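To make the segmented recall step concrete, the following minimal sketch illustrates the general idea of splitting a long binary hash code into short segments, indexing each segment in its own hash table, and recalling every snippet that matches the query exactly on at least one segment. It is a hedged illustration of segment-based lookup in general, not the released SECRET implementation, and the identifiers and toy codes are invented.

# Hedged sketch of segmented hash-table recall (not the released SECRET code):
# a long binary code is split into short segments, each segment indexes its own
# hash table, and a query recalls every snippet that matches it exactly on at
# least one segment, avoiding a linear Hamming-distance scan of the code base.
from collections import defaultdict

SEGMENTS = 4  # e.g. a 64-bit code split into four 16-bit segments

def split(code_bits, segments=SEGMENTS):
    step = len(code_bits) // segments
    return [code_bits[i * step:(i + 1) * step] for i in range(segments)]

def build_tables(code_db):
    """code_db: {snippet_id: '0101...'} -> one hash table per segment."""
    tables = [defaultdict(set) for _ in range(SEGMENTS)]
    for snippet_id, code_bits in code_db.items():
        for seg_idx, seg in enumerate(split(code_bits)):
            tables[seg_idx][seg].add(snippet_id)
    return tables

def recall(query_bits, tables):
    """Union of candidates matching the query exactly on at least one segment."""
    candidates = set()
    for seg_idx, seg in enumerate(split(query_bits)):
        candidates |= tables[seg_idx].get(seg, set())
    return candidates

# Toy usage with 8-bit codes split into four 2-bit segments
db = {"snippet_a": "01011100", "snippet_b": "11011000", "snippet_c": "00100111"}
tables = build_tables(db)
print(recall("01011000", tables))  # recalls snippet_a and snippet_b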
LLM-powered coding and development assistants have become prevalent in programmers' workflows. However, concerns about the trustworthiness of LLMs for code persist despite their widespread use. Much of the existing research has focused on either training or evaluation, raising questions about whether stakeholders in training and evaluation align in their understanding of model trustworthiness and whether they can move toward a unified direction. In this paper, we propose a vision for a unified trustworthiness auditing framework, DataTrust, which adopts a data-centric approach that synergistically emphasizes both training and evaluation data and their correlations. DataTrust aims to connect model trustworthiness indicators in evaluation with data quality indicators in training. It autonomously inspects training data and evaluates model trustworthiness using synthesized data, attributing potential causes from specific evaluation data to corresponding training data and refining indicator connections. Additionally, a trustworthiness arena powered by DataTrust will engage crowdsourced input and deliver quantitative outcomes. We outline the benefits that various stakeholders can gain from DataTrust and discuss the challenges and opportunities it presents.
The frequent discovery of security vulnerabilities in both open-source and proprietary software underscores the urgent need for earlier detection during the development lifecycle. Initiatives such as DARPA's Artificial Intelligence Cyber Challenge (AIxCC) aim to accelerate Automated Vulnerability Detection (AVD), seeking to address this challenge by autonomously analyzing source code to identify vulnerabilities. This paper addresses two primary research questions: (RQ1) How is current AVD research distributed across its core components? (RQ2) What key areas should future research target to bridge the gap in the practical applicability of AVD throughout software development? To answer these questions, we conduct a systematization over 79 AVD articles and 17 empirical studies, analyzing them across five core components: task formulation and granularity, input programming languages and representations, detection approaches and key solutions, evaluation metrics and datasets, and reported performance. Our systematization reveals that the narrow focus of AVD research, mainly on specific tasks and programming languages, limits its practical impact and overlooks broader areas crucial for effective, real-world vulnerability detection. We identify significant challenges, including the need for diversified problem formulations, varied detection granularities, broader language support, better dataset quality, enhanced reproducibility, and increased practical impact. Based on these findings, we identify research directions that will enhance the effectiveness and applicability of AVD solutions in software security.
Sustainable software development involves creating software in a manner that meets present goals without undermining our ability to meet future goals. In a software engineering context, sustainability has at least four dimensions: ecological, economic, social, and technical. No interventions for improving social sustainability in software engineering have been tested in rigorous lab-based experiments, and little evidence-based guidance is available. The purpose of this study is to evaluate the effectiveness of two interventions, stakeholder maps and persona models, for improving social sustainability through software feature prioritization. We conducted a randomized controlled factorial experiment with 79 undergraduate computer science students. Participants were randomly assigned to one of four groups and asked to prioritize a backlog of prosocial, neutral, and antisocial user stories for a shopping mall's digital screen display and facial recognition software. Participants received either persona models, a stakeholder map, both, or neither. We compared the differences in prioritization levels assigned to prosocial and antisocial user stories using Cumulative Link Mixed Model regression. Participants who received persona models gave significantly lower priorities to antisocial user stories, but no significant difference was evident for prosocial user stories. Neither the effects of the stakeholder map nor the interaction effects were significant. Providing aspiring software professionals with well-crafted persona models causes them to de-prioritize antisocial software features. The impact of persona modelling on sustainable software development therefore warrants further study with more experienced professionals. Moreover, the novel methodological strategy of assessing social sustainability behavior through backlog prioritization appears feasible in lab-based settings.
With the growing use of Transformer models hosted on cloud platforms to offer inference services, privacy concerns are escalating, especially concerning sensitive data like investment plans and bank account details. Secure Multi-Party Computation (SMPC) emerges as a promising solution to protect the privacy of inference data and model parameters. However, the application of SMPC in Privacy-Preserving Inference (PPI) for Transformer models often leads to considerable slowdowns or declines in performance. This is largely due to the multitude of nonlinear operations in the Transformer architecture, which are not well-suited to SMPC and difficult to circumvent or optimize effectively. To address this concern, we introduce a comprehensive PPI framework called SecFormer to achieve fast and accurate PPI for Transformer models. We eliminate the high-cost exponential and maximum operations in PPI without sacrificing model performance, and develop a suite of efficient SMPC protocols that employ suitable numerical computation methods to speed up other complex nonlinear functions in PPI, including GeLU, LayerNorm, and a redesigned Softmax. Our extensive experiments reveal that SecFormer outperforms MPCFormer, showing improvements of $3.4\%$ and $24.7\%$ for BERT$_{\text{BASE}}$ and BERT$_{\text{LARGE}}$, respectively. In terms of efficiency, SecFormer is 3.57 and 3.58 times faster than PUMA for BERT$_{\text{BASE}}$ and BERT$_{\text{LARGE}}$, demonstrating its effectiveness and speed.
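As background for why such numerical substitutions help, nonlinearities like GeLU are expensive under SMPC because they involve erf or exp, whereas additions and multiplications by public constants map directly onto cheap secret-sharing operations. The sketch below illustrates the general idea of replacing GeLU with a low-degree polynomial over a bounded input range; it is a hedged, generic illustration, not SecFormer's actual protocol or approximation.

# Hedged illustration of an MPC-friendly nonlinearity: fit a low-degree polynomial
# to GeLU on a bounded range so that evaluation needs only additions and
# multiplications (cheap under secret sharing). Not SecFormer's actual protocol.
import numpy as np
from math import erf, sqrt

def gelu(x: float) -> float:
    # Exact GeLU: x * Phi(x), with Phi the standard normal CDF
    return 0.5 * x * (1.0 + erf(x / sqrt(2.0)))

xs = np.linspace(-4.0, 4.0, 2001)
ys = np.array([gelu(v) for v in xs])

# Degree-4 polynomial fit; the coefficients are public, only x is secret-shared
coeffs = np.polyfit(xs, ys, deg=4)
poly_gelu = np.poly1d(coeffs)

max_err = float(np.max(np.abs(poly_gelu(xs) - ys)))
print(f"max |poly - GeLU| on [-4, 4]: {max_err:.4f}")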
This paper presents the development of a software tool that enables the translation of first-order predicate logic with at most three variables into relation algebra. The tool was developed using the Z3 theorem prover, leveraging its capabilities to enhance reliability, generate code, and expedite development. The resulting standalone Python program allows users to translate first-order logic formulas into relation algebra, eliminating the need to work with relation algebra explicitly. This paper outlines the theoretical background of first-order logic, relation algebra, and the translation process. It also describes the implementation details, including validation of the software tool using Z3 for testing correctness. By demonstrating the feasibility of utilizing first-order logic as an alternative language for expressing relation algebra, this tool paves the way for integrating first-order logic into tools traditionally relying on relation algebra as input.
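The kind of correctness check described above can be sketched with Z3's Python API: encode a relation-algebra term and its intended three-variable first-order reading, then ask Z3 to prove the two are equivalent over an uninterpreted domain. The example below does this for the identity that the converse of a composition equals the composition of the converses; it is a hedged, minimal illustration (assuming the z3-solver package), not the paper's actual tool.

# Hedged sketch (not the paper's tool): use Z3 to check that a relation-algebra
# identity and its three-variable first-order reading agree. Relations are modelled
# as Boolean-valued functions over an uninterpreted domain D.
from z3 import DeclareSort, Function, BoolSort, Consts, Exists, ForAll, And, prove

D = DeclareSort('D')                      # uninterpreted universe
R = Function('R', D, D, BoolSort())       # binary relation R
S = Function('S', D, D, BoolSort())       # binary relation S
x, y, z = Consts('x y z', D)

# First-order reading of (R ; S)^ at (x, z): exists y with R(z, y) and S(y, x)
conv_of_comp = Exists(y, And(R(z, y), S(y, x)))
# First-order reading of S^ ; R^ at (x, z): exists y with S(y, x) and R(z, y)
comp_of_conv = Exists(y, And(S(y, x), R(z, y)))

# Z3 proves the two readings coincide for all x and z
prove(ForAll([x, z], conv_of_comp == comp_of_conv))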
Recommender systems exploit interaction history to estimate user preference and have been heavily used in a wide range of industry applications. However, static recommendation models struggle to answer two important questions well due to inherent shortcomings: (a) What exactly does a user like? (b) Why does a user like an item? The shortcomings are due to the way that static models learn user preference, i.e., without explicit instructions and active feedback from users. The recent rise of conversational recommender systems (CRSs) changes this situation fundamentally. In a CRS, users and the system can dynamically communicate through natural language interactions, which provide unprecedented opportunities to explicitly obtain the exact preference of users. Considerable efforts, spread across disparate settings and applications, have been put into developing CRSs. However, existing models, technologies, and evaluation methods for CRSs are far from mature. In this paper, we provide a systematic review of the techniques used in current CRSs. We summarize the key challenges of developing CRSs into five directions: (1) Question-based user preference elicitation. (2) Multi-turn conversational recommendation strategies. (3) Dialogue understanding and generation. (4) Exploitation-exploration trade-offs. (5) Evaluation and user simulation. These research directions involve multiple research fields such as information retrieval (IR), natural language processing (NLP), and human-computer interaction (HCI). Based on these research directions, we discuss some future challenges and opportunities, and we provide a road map for researchers from multiple communities to get started in this area. We hope this survey helps to identify and address challenges in CRSs and inspire future research.