The success of software projects depends on the quality of their requirements, and creating high-quality requirements is a crucial step toward successful software development. Effective support in this area can significantly reduce development costs and enhance software quality. In this paper, we introduce and assess the capabilities of a Large Language Model (LLM) to evaluate the quality characteristics of software requirements according to the ISO 29148 standard. We aim to further improve the support of stakeholders engaged in requirements engineering (RE). We show how an LLM can assess requirements, explain its decision-making process, and examine its capacity to propose improved versions of requirements. We conduct a study with software engineers to validate our approach. Our findings emphasize the potential of LLMs for improving the quality of software requirements.
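A minimal sketch of what such an ISO 29148 assessment prompt could look like; the characteristic list, the `build_assessment_prompt` helper, and the `ask_llm` wrapper are illustrative assumptions, not the prompts used in the paper.

```python
# Illustrative sketch (not the paper's actual prompt): ask an LLM to assess one
# requirement against ISO 29148 quality characteristics, explain its reasoning,
# and propose an improved version.
from typing import Callable

ISO29148_CHARACTERISTICS = [
    "unambiguous", "complete", "singular", "feasible",
    "verifiable", "correct", "conforming",
]

def build_assessment_prompt(requirement: str) -> str:
    criteria = ", ".join(ISO29148_CHARACTERISTICS)
    return (
        "You are a requirements engineering assistant.\n"
        f"Evaluate the following requirement against the ISO 29148 quality "
        f"characteristics ({criteria}).\n"
        "For each characteristic, state pass/fail with a one-sentence "
        "justification, then propose an improved version of the requirement.\n\n"
        f"Requirement: {requirement}"
    )

def assess(requirement: str, ask_llm: Callable[[str], str]) -> str:
    # ask_llm is a hypothetical wrapper around whichever LLM API is used.
    return ask_llm(build_assessment_prompt(requirement))
```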
Generative AI (GAI) holds great potential to improve software engineering productivity, but its untrustworthy outputs, particularly in code synthesis, pose significant challenges. The need for extensive verification and validation (V&V) of GAI-generated artifacts may undermine the potential productivity gains. This paper proposes a way of mitigating these risks by exploiting GAI's ability to generate multiple versions of code and tests to facilitate comparative analysis across versions. Rather than relying on the quality of a single test or code module, this "differential GAI" (D-GAI) approach promotes more reliable quality evaluation through version diversity. We introduce the Large-Scale Software Observatorium (LASSO), a platform that supports D-GAI by executing and analyzing large sets of code versions and tests. We discuss how LASSO enables rigorous evaluation of GAI-generated artifacts and propose its application in both software development and GAI research.
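A conceptual sketch of the differential idea behind D-GAI, not LASSO's actual API: every generated code version is executed against every generated test version, and agreement across the resulting matrix serves as the reliability signal instead of any single artifact.

```python
# Conceptual sketch of differential evaluation across generated versions
# (illustrative; LASSO's real interfaces are not shown here).
from typing import Callable, Dict, List, Tuple

def differential_matrix(
    code_versions: Dict[str, Callable],   # name -> candidate implementation
    test_versions: Dict[str, Callable],   # name -> test that exercises an implementation
) -> Dict[Tuple[str, str], bool]:
    results = {}
    for c_name, impl in code_versions.items():
        for t_name, test in test_versions.items():
            try:
                test(impl)                # a test raises AssertionError on failure
                results[(c_name, t_name)] = True
            except Exception:
                results[(c_name, t_name)] = False
    return results

def rank_code_versions(results: Dict[Tuple[str, str], bool]) -> List[Tuple[str, float]]:
    # Score each code version by the fraction of test versions it satisfies.
    passed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for (c_name, _), ok in results.items():
        total[c_name] = total.get(c_name, 0) + 1
        passed[c_name] = passed.get(c_name, 0) + int(ok)
    return sorted(((c, passed[c] / total[c]) for c in total), key=lambda item: -item[1])
```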
While open-source software has enabled significant levels of reuse to speed up software development, it has also given rise to the dreadful dependency hell that all software practitioners face on a regular basis. This article provides a catalogue of dependency-related challenges that come with relying on OSS packages or libraries. The catalogue is based on a review of the abundant scientific literature on empirical research that has been conducted to understand, quantify, and overcome these challenges. Our results can serve as a starting point for junior and senior researchers, as well as practitioners, who would like to learn more about research advances in dealing with the challenges that come with the dependency networks of large OSS package registries.
Static analysis is a classical technique for improving software security and software quality in general. Fairly recently, a new static analyzer was implemented in the GNU Compiler Collection (GCC). The present paper uses GCC's analyzer to empirically examine popular Linux packages. The dataset is based on those packages in the Gentoo Linux distribution that are either written in C or contain C code; in total, 3,538 such packages are covered. According to the results, uninitialized variables and NULL pointer dereferences are the most common problems flagged by the analyzer, whereas classical memory management issues are relatively rare. The warnings also follow a long-tailed probability distribution across the packages: a few packages are highly warning-prone, whereas no warnings are present for as much as 89% of the packages. Furthermore, the warnings do not vary across different application domains. With these results, the paper contributes to the domain of large-scale empirical research on software quality and security. In addition, the practical implications of the results are discussed.
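A minimal sketch of how such per-package warning counts could be collected, assuming a GCC build with the -fanalyzer option available on the path; the parsing regex and aggregation shown here are illustrative rather than the study's actual tooling.

```python
# Illustrative sketch: compile C files with GCC's static analyzer and count
# the emitted -Wanalyzer-* warning categories per package.
import re
import subprocess
from collections import Counter

WARNING_TAG = re.compile(r"\[-Wanalyzer-[\w-]+\]")

def analyze_file(c_file: str) -> Counter:
    """Run the analyzer on a single C file and tally its warning categories."""
    proc = subprocess.run(
        ["gcc", "-fanalyzer", "-c", c_file, "-o", "/dev/null"],
        capture_output=True, text=True,
    )
    return Counter(WARNING_TAG.findall(proc.stderr))

def analyze_package(c_files: list[str]) -> Counter:
    """Aggregate warning counts over all C files of one package."""
    total: Counter = Counter()
    for path in c_files:
        total += analyze_file(path)
    return total
```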
ABCDE is a sophisticated technique for evaluating differences between very large clusterings. Its main metric that characterizes the magnitude of the difference between two clusterings is the JaccardDistance, which is a true distance metric in the space of all clusterings of a fixed set of (weighted) items. The JaccardIndex is the complementary metric that characterizes the similarity of two clusterings. Its relationship with the JaccardDistance is simple: JaccardDistance + JaccardIndex = 1. This paper decomposes the JaccardDistance and the JaccardIndex further. In each case, the decomposition yields Impact and Quality metrics. The Impact metrics measure aspects of the magnitude of the clustering diff, while the Quality metrics use human judgements to measure how much the clustering diff improves the quality of the clustering. The decompositions of this paper offer more and deeper insight into a clustering change. They also unlock new techniques for debugging and exploring the nature of the clustering diff. The new metrics are mathematically well-behaved and they are interrelated via simple equations. While the work can be seen as an alternative formal framework for ABCDE, we prefer to view it as complementary. It certainly offers a different perspective on the magnitude and the quality of a clustering change, and users can draw on whichever parts of each approach give them more insight into a change.
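The complementarity stated above can be written down directly; the per-item aggregation below is a hedged sketch of how a weighted Jaccard index over two clusterings is typically computed, and is not necessarily ABCDE's exact definition.

```latex
% Complementarity stated in the abstract:
\[
  \mathrm{JaccardDistance} + \mathrm{JaccardIndex} = 1 .
\]
% Hedged sketch of a per-item weighted aggregation (an assumption, not
% necessarily ABCDE's definition): item i has weight w_i, baseline cluster
% A(i), experiment cluster B(i), and w(.) sums the weights of a set of items.
\[
  \mathrm{JaccardIndex} = \frac{\sum_i w_i \, J_i}{\sum_i w_i},
  \qquad
  J_i = \frac{w\bigl(A(i) \cap B(i)\bigr)}{w\bigl(A(i) \cup B(i)\bigr)} .
\]
```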
The reconfigurable intelligent surface is an emerging technology for wireless communications. We model it as an inhomogeneous boundary of surface impedance and consider various optimization problems that offer different tradeoffs in terms of performance and implementation complexity. The considered non-convex optimization problems are reformulated as a sequence of approximating linear quadratically constrained or semidefinite programs, which are proved to have polynomial complexity and to converge monotonically in the objective value.
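As a hedged illustration of why such successive approximations can converge monotonically in the objective, the generic minorize-maximize argument is sketched below; the surrogate and iterate notation are assumptions for exposition and do not reproduce the paper's specific reformulations.

```latex
% Generic successive-approximation argument (illustrative assumption):
% maximize f over a feasible set X using convex surrogates \tilde{f}(.; x^{(t)})
% that minorize f and are tight at the current iterate x^{(t)}.
\[
  \tilde{f}(x; x^{(t)}) \le f(x) \;\; \forall x \in X,
  \qquad
  \tilde{f}(x^{(t)}; x^{(t)}) = f(x^{(t)}),
  \qquad
  x^{(t+1)} \in \arg\max_{x \in X} \tilde{f}(x; x^{(t)}).
\]
% Monotonicity of the objective value then follows:
\[
  f(x^{(t+1)}) \;\ge\; \tilde{f}(x^{(t+1)}; x^{(t)}) \;\ge\; \tilde{f}(x^{(t)}; x^{(t)}) \;=\; f(x^{(t)}).
\]
```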
Unit testing is an essential activity in software development for verifying the correctness of software components. However, manually writing unit tests is challenging and time-consuming. The emergence of Large Language Models (LLMs) offers a new direction for automating unit test generation. Existing research primarily focuses on closed-source LLMs (e.g., ChatGPT and Codex) with fixed prompting strategies, leaving the capabilities of advanced open-source LLMs with various prompting settings unexplored. In particular, open-source LLMs offer advantages in data privacy protection and have demonstrated superior performance in some tasks. Moreover, effective prompting is crucial for maximizing LLMs' capabilities. In this paper, we conduct the first empirical study to fill this gap, based on 17 Java projects, five widely-used open-source LLMs with different structures and parameter sizes, and comprehensive evaluation metrics. Our findings highlight the significant influence of various prompt factors, show how open-source LLMs perform compared to the commercial GPT-4 and the traditional tool EvoSuite, and identify limitations in LLM-based unit test generation. We then derive a series of implications from our study to guide future research and practical use of LLM-based unit test generation.
While large language models (LLMs) have shown promise for medical question answering, there is limited work focused on tropical and infectious disease-specific exploration. We build on an open-source tropical and infectious diseases (TRINDs) dataset, expanding it with demographic and semantic clinical and consumer augmentations to yield 11,000+ prompts. We evaluate LLM performance on these prompts, comparing generalist and medical LLMs, as well as comparing LLM outcomes to those of human experts. Through systematic experimentation, we demonstrate the benefit of contextual information such as demographics, location, gender, and risk factors for optimal LLM responses. Finally, we develop a prototype of TRINDs-LM, a research tool that provides a playground to explore how context impacts LLM outputs for health.
Large Language Models (LLMs) have demonstrated impressive performance in software engineering tasks. However, improving their accuracy in generating correct and reliable code remains challenging. Numerous prompt engineering techniques (PETs) have been developed to address this, but no single approach is universally optimal. Selecting the right PET for each query is difficult for two primary reasons: (1) interactive prompting techniques may not consistently deliver the expected benefits, especially for simpler queries, and (2) current automated prompt engineering methods lack adaptability and fail to fully utilize multi-stage responses. To overcome these challenges, we propose PET-Select, a PET-agnostic selection model that uses code complexity as a proxy to classify queries and select the most appropriate PET. By incorporating contrastive learning, PET-Select effectively distinguishes between simple and complex problems, allowing it to choose PETs that are best suited for each query's complexity level. Our evaluations on the MBPP and HumanEval benchmarks using GPT-3.5 Turbo and GPT-4o show up to a 1.9% improvement in pass@1 accuracy, along with a 74.8% reduction in token usage. Additionally, we provide both quantitative and qualitative results to demonstrate how PET-Select effectively selects the most appropriate techniques for each code generation query, further showcasing its efficiency in optimizing PET selection.
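A conceptual sketch of complexity-aware PET routing of the kind described above; the scorer, threshold, and the two example techniques are illustrative assumptions and do not reproduce PET-Select's implementation.

```python
# Illustrative sketch of complexity-aware prompt-technique selection
# (the scorer, threshold, and example PETs are assumptions, not PET-Select's code).
from typing import Callable

def select_pet(
    query: str,
    complexity_score: Callable[[str], float],  # e.g. a contrastively trained scorer
    threshold: float = 0.5,
) -> str:
    """Route simple queries to a cheap single-turn PET, complex ones to a multi-stage PET."""
    if complexity_score(query) < threshold:
        return "zero-shot"
    return "chain-of-thought"

def build_prompt(query: str, pet: str) -> str:
    if pet == "zero-shot":
        return f"Write a Python function for the following task:\n{query}"
    return (
        "Let's reason about this step by step before writing any code.\n"
        f"Task: {query}\n"
        "First outline the approach, then implement the function."
    )
```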
Requirements Elicitation (RE) is a crucial activity, especially in the early stages of software development. GUI prototyping has widely been adopted as one of the most effective RE techniques for user-facing software systems. However, GUI prototyping (i) requires the availability of experienced requirements analysts, (ii) typically necessitates multiple joint sessions with customers, and (iii) creates considerable manual effort. In this work, we propose SERGUI, a novel approach enabling the Self-Elicitation of Requirements (SER) based on an automated GUI prototyping assistant. SERGUI exploits the vast prototyping knowledge embodied in a large-scale GUI repository through Natural Language Requirements (NLR) based GUI retrieval and facilitates fast feedback through GUI prototypes. The GUI retrieval approach is closely integrated with a Large Language Model (LLM) driving the prompting-based recommendation of GUI features for the current GUI prototyping context, thus stimulating the elicitation of additional requirements. We envision SERGUI being employed in the initial RE phase, creating an initial GUI prototype specification that the analyst can use as a means for communicating the requirements. To measure the effectiveness of our approach, we conducted a preliminary evaluation. Video presentation of SERGUI: https://youtu.be/pzAAB9Uht80
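A hedged sketch of the retrieval-plus-recommendation flow described above; the embedding function, repository index format, and prompt wording are illustrative assumptions rather than SERGUI's actual implementation.

```python
# Illustrative sketch: retrieve GUI screens similar to a natural language
# requirement, then prompt an LLM to recommend further GUI features
# (embedder, index format, and prompt are assumptions, not SERGUI's code).
import numpy as np

def retrieve_guis(nlr: str, embed, gui_index: list[tuple[str, np.ndarray]], k: int = 5):
    """Return the k GUI screens whose embeddings are most similar to the requirement."""
    q = embed(nlr)
    scored = [
        (gui_id, float(q @ vec / (np.linalg.norm(q) * np.linalg.norm(vec))))
        for gui_id, vec in gui_index
    ]
    return sorted(scored, key=lambda item: -item[1])[:k]

def feature_recommendation_prompt(nlr: str, retrieved_ids: list[str]) -> str:
    """Prompt an LLM to suggest additional features given the retrieved screens."""
    return (
        f"Current requirement: {nlr}\n"
        f"Similar GUI screens from the repository: {', '.join(retrieved_ids)}\n"
        "Suggest additional GUI features the customer may want, each phrased "
        "as a candidate requirement for the analyst to confirm."
    )
```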
Recent years have seen important advances in the quality of state-of-the-art models, but this has come at the expense of models becoming less interpretable. This survey presents an overview of the current state of Explainable AI (XAI), considered within the domain of Natural Language Processing (NLP). We discuss the main categorization of explanations, as well as the various ways explanations can be arrived at and visualized. We detail the operations and explainability techniques currently available for generating explanations for NLP model predictions, to serve as a resource for model developers in the community. Finally, we point out current gaps and suggest directions for future work in this important research area.