99视频在线播放喷射,亚洲日韩网站在线观看

Modern software development extensively depends on existing libraries written by other developer teams from the same or a different organization. When a developer executes the software, the execution trace may go across the boundaries of multiple software products and create cross-project failures (CPFs). Existing studies show that a stand-alone executable failure report may enable the most effective communication, but creating such a report is often challenging due to the complicated files and dependencies interactions in the software ecosystems. In this paper, to solve the CPF report trilemma, we developed PExReport, which automatically creates stand-alone executable CPF reports. PExReport leverages build tools to prune source code and dependencies, and further analyzes the build process to create a pruned build environment for reproducing the CPF. We performed an evaluation on 74 software project issues with 198 CPFs, and the evaluation results show that PExReport can create executable CPF reports for 184 out of 198 test failures in our dataset, with an average reduction of 72.97% on source classes and the classes in internal JARs.

相關內容

剪枝

關注 2

多峰值 · 語言模型化 · MoDELS · 邊界框 · Performer ·

2023 年 6 月 27 日

Kosmos-2: Grounding Multimodal Large Language Models to the World

Zhiliang Peng,Wenhui Wang,Li Dong,Yaru Hao,Shaohan Huang,Shuming Ma,Furu Wei

from arxiv, 20 pages

We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual world. Specifically, we represent refer expressions as links in Markdown, i.e., ``[text span](bounding boxes)'', where object descriptions are sequences of location tokens. Together with multimodal corpora, we construct large-scale data of grounded image-text pairs (called GrIT) to train the model. In addition to the existing capabilities of MLLMs (e.g., perceiving general modalities, following instructions, and performing in-context learning), Kosmos-2 integrates the grounding capability into downstream applications. We evaluate Kosmos-2 on a wide range of tasks, including (i) multimodal grounding, such as referring expression comprehension, and phrase grounding, (ii) multimodal referring, such as referring expression generation, (iii) perception-language tasks, and (iv) language understanding and generation. This work lays out the foundation for the development of Embodiment AI and sheds light on the big convergence of language, multimodal perception, action, and world modeling, which is a key step toward artificial general intelligence. Data, demo, and pretrained models are available at //aka.ms/kosmos-2.

INTERACT · 代碼 · 回合 · Extensibility · 網絡爬蟲 ·

2023 年 6 月 27 日

InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback

John Yang,Akshara Prabhakar,Karthik Narasimhan,Shunyu Yao

from arxiv, Project site with code and data: //intercode-benchmark.github.io

Humans write code in a fundamentally interactive manner and rely on constant execution feedback to correct errors, resolve ambiguities, and decompose tasks. While LLMs have recently exhibited promising coding capabilities, current coding benchmarks mostly consider a static instruction-to-code sequence transduction process, which has the potential for error propagation and a disconnect between the generated code and its final execution environment. To address this gap, we introduce InterCode, a lightweight, flexible, and easy-to-use framework of interactive coding as a standard reinforcement learning (RL) environment, with code as actions and execution feedback as observations. Our framework is language and platform agnostic, uses self-contained Docker environments to provide safe and reproducible execution, and is compatible out-of-the-box with traditional seq2seq coding methods, while enabling the development of new methods for interactive code generation. We use InterCode to create two interactive code environments with Bash and SQL as action spaces, leveraging data from the static Spider and NL2Bash datasets. We demonstrate InterCode's viability as a testbed by evaluating multiple state-of-the-art LLMs configured with different prompting strategies such as ReAct and Plan & Solve. Our results showcase the benefits of interactive code generation and demonstrate that InterCode can serve as a challenging benchmark for advancing code understanding and generation capabilities. InterCode is designed to be easily extensible and can even be used to incorporate new tasks such as Capture the Flag, a popular coding puzzle that is inherently multi-step and involves multiple programming languages. Project site with code and data: //intercode-benchmark.github.io

AI · SimPLe · Agent · 人工智能 ·

2023 年 6 月 26 日

Experiments with Detecting and Mitigating AI Deception

Ismail Sahbane,Francis Rhys Ward,C Henrik ?slund

from arxiv, 4 pages, 2 figures, 3 algorithms, 1 table

How to detect and mitigate deceptive AI systems is an open problem for the field of safe and trustworthy AI. We analyse two algorithms for mitigating deception: The first is based on the path-specific objectives framework where paths in the game that incentivise deception are removed. The second is based on shielding, i.e., monitoring for unsafe policies and replacing them with a safe reference policy. We construct two simple games and evaluate our algorithms empirically. We find that both methods ensure that our agent is not deceptive, however, shielding tends to achieve higher reward.

Apache Storm · ForCES · 相互獨立的 · AI · TOOLS ·

2023 年 6 月 26 日

AI could create a perfect storm of climate misinformation

Victor Galaz,Hannah Metzler,Stefan Daume,Andreas Olsson,Bj?rn Lindstr?m,Arvid Marklund

from arxiv, 26 pages, 1 table, 3 figures

We are in the midst of a transformation of the digital news ecosystem. The expansion of online social networks, the influence of recommender systems, increased automation, and new generative artificial intelligence tools are rapidly changing the speed and the way misinformation about climate change and sustainability issues moves around the world. Policymakers, researchers and the public need to combine forces to address the dangerous combination of opaque social media algorithms, polarizing social bots, and a new generation of AI-generated content. This synthesis brief is the result of a collaboration between Stockholm Resilience Centre at Stockholm University, the Beijer Institute of Ecological Economics at the Royal Swedish Academy of Sciences, the Complexity Science Hub Vienna, and Karolinska Institutet. It has been put together as an independent contribution to the Nobel Prize Summit 2023, Truth, Trust and Hope, Washington D.C., 24th to 26th of May 2023.

MoDELS · Performer · 穩健性 · Processing（編程語言） · 數據集 ·

2023 年 6 月 24 日

MeWEHV: Mel and Wave Embeddings for Human Voice Tasks

Andrés Carofilis,Laura Fernández-Robles,Enrique Alegre,Eduardo Fidalgo

from arxiv, Submitted to IEEE Access

A recent trend in speech processing is the use of embeddings created through machine learning models trained on a specific task with large datasets. By leveraging the knowledge already acquired, these models can be reused in new tasks where the amount of available data is small. This paper proposes a pipeline to create a new model, called Mel and Wave Embeddings for Human Voice Tasks (MeWEHV), capable of generating robust embeddings for speech processing. MeWEHV combines the embeddings generated by a pre-trained raw audio waveform encoder model, and deep features extracted from Mel Frequency Cepstral Coefficients (MFCCs) using Convolutional Neural Networks (CNNs). We evaluate the performance of MeWEHV on three tasks: speaker, language, and accent identification. For the first one, we use the VoxCeleb1 dataset and present YouSpeakers204, a new and publicly available dataset for English speaker identification that contains 19607 audio clips from 204 persons speaking in six different accents, allowing other researchers to work with a very balanced dataset, and to create new models that are robust to multiple accents. For evaluating the language identification task, we use the VoxForge and Common Language datasets. Finally, for accent identification, we use the Latin American Spanish Corpora (LASC) and Common Voice datasets. Our approach allows a significant increase in the performance of state-of-the-art models on all the tested datasets, with a low additional computational cost.

貝葉斯網/貝葉斯網絡 · Networking · MoDELS · Networks · Analysis ·

2023 年 6 月 23 日

Availability Analysis of Redundant and Replicated Cloud Services with Bayesian Networks

Otto Bibartiu,Frank Dürr,Kurt Rothermel,Beate Ottenw?lder,Andreas Grau

from arxiv, 16 pages, 12 figures, journal

Due to the growing complexity of modern data centers, failures are not uncommon any more. Therefore, fault tolerance mechanisms play a vital role in fulfilling the availability requirements. Multiple availability models have been proposed to assess compute systems, among which Bayesian network models have gained popularity in industry and research due to its powerful modeling formalism. In particular, this work focuses on assessing the availability of redundant and replicated cloud computing services with Bayesian networks. So far, research on availability has only focused on modeling either infrastructure or communication failures in Bayesian networks, but have not considered both simultaneously. This work addresses practical modeling challenges of assessing the availability of large-scale redundant and replicated services with Bayesian networks, including cascading and common-cause failures from the surrounding infrastructure and communication network. In order to ease the modeling task, this paper introduces a high-level modeling formalism to build such a Bayesian network automatically. Performance evaluations demonstrate the feasibility of the presented Bayesian network approach to assess the availability of large-scale redundant and replicated services. This model is not only applicable in the domain of cloud computing it can also be applied for general cases of local and geo-distributed systems.

MoDELS · Performer · Projection · 轉錄 · 語言模型化 ·

2023 年 6 月 22 日

The Challenges of HTR Model Training: Feedback from the Project Donner le gout de l'archive a l'ere numerique

Couture Beatrice,Verret Farah,Gohier Maxime,Deslandres Dominique

The arrival of handwriting recognition technologies offers new possibilities for research in heritage studies. However, it is now necessary to reflect on the experiences and the practices developed by research teams. Our use of the Transkribus platform since 2018 has led us to search for the most significant ways to improve the performance of our handwritten text recognition (HTR) models which are made to transcribe French handwriting dating from the 17th century. This article therefore reports on the impacts of creating transcribing protocols, using the language model at full scale and determining the best way to use base models in order to help increase the performance of HTR models. Combining all of these elements can indeed increase the performance of a single model by more than 20% (reaching a Character Error Rate below 5%). This article also discusses some challenges regarding the collaborative nature of HTR platforms such as Transkribus and the way researchers can share their data generated in the process of creating or training handwritten text recognition models.

TOOLS · 情景 · 廣義函數 · Performer · 語言模型化 ·

2023 年 6 月 21 日

Testing of Detection Tools for AI-Generated Text

Debora Weber-Wulff,Alla Anohina-Naumeca,Sonja Bjelobaba,Tomá? Foltynek,Jean Guerrero-Dib,Olumide Popoola,Petr ?igut,Lorna Waddington

from arxiv, 38 pages, 13 figures and 10 tables, with appendix. Submitted to the International Journal of Educational Technology in Higher Education

Recent advances in generative pre-trained transformer large language models have emphasised the potential risks of unfair use of artificial intelligence (AI) generated content in an academic environment and intensified efforts in searching for solutions to detect such content. The paper examines the general functionality of detection tools for artificial intelligence generated text and evaluates them based on accuracy and error type analysis. Specifically, the study seeks to answer research questions about whether existing detection tools can reliably differentiate between human-written text and ChatGPT-generated text, and whether machine translation and content obfuscation techniques affect the detection of AIgenerated text. The research covers 12 publicly available tools and two commercial systems (Turnitin and PlagiarismCheck) that are widely used in the academic setting. The researchers conclude that the available detection tools are neither accurate nor reliable and have a main bias towards classifying the output as human-written rather than detecting AIgenerated text. Furthermore, content obfuscation techniques significantly worsen the performance of tools. The study makes several significant contributions. First, it summarises up-to-date similar scientific and non-scientific efforts in the field. Second, it presents the result of one of the most comprehensive tests conducted so far, based on a rigorous research methodology, an original document set, and a broad coverage of tools. Third, it discusses the implications and drawbacks of using detection tools for AI-generated text in academic settings.

語言模型化 · 自動問答 · MoDELS · 可約的 · entity ·

2021 年 9 月 22 日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Fu Sun,Feng-Lin Li,Ruize Wang,Qianglong Chen,Xingyi Cheng,Ji Zhang

from arxiv, CIKM 2021

Knowledge enhanced pre-trained language models (K-PLMs) are shown to be effective for many public tasks in the literature but few of them have been successfully applied in practice. To address this problem, we propose K-AID, a systematic approach that includes a low-cost knowledge acquisition process for acquiring domain knowledge, an effective knowledge infusion module for improving model performance, and a knowledge distillation component for reducing the model size and deploying K-PLMs on resource-restricted devices (e.g., CPU) for real-world application. Importantly, instead of capturing entity knowledge like the majority of existing K-PLMs, our approach captures relational knowledge, which contributes to better-improving sentence-level text classification and text matching tasks that play a key role in question answering (QA). We conducted a set of experiments on five text classification tasks and three text matching tasks from three domains, namely E-commerce, Government, and Film&TV, and performed online A/B tests in E-commerce. Experimental results show that our approach is able to achieve substantial improvement on sentence-level question answering tasks and bring beneficial business value in industrial settings.

Re-ID · 行人重識別 · Extensibility · 判別器 · INFORMS ·

2018 年 1 月 16 日

Re-ID done right: towards good practices for person re-identification

Jon Almazan,Bojana Gajic,Naila Murray,Diane Larlus

Training a deep architecture using a ranking loss has become standard for the person re-identification task. Increasingly, these deep architectures include additional components that leverage part detections, attribute predictions, pose estimators and other auxiliary information, in order to more effectively localize and align discriminative image regions. In this paper we adopt a different approach and carefully design each component of a simple deep architecture and, critically, the strategy for training it effectively for person re-identification. We extensively evaluate each design choice, leading to a list of good practices for person re-identification. By following these practices, our approach outperforms the state of the art, including more complex methods with auxiliary components, by large margins on four benchmark datasets. We also provide a qualitative analysis of our trained representation which indicates that, while compact, it is able to capture information from localized and discriminative regions, in a manner akin to an implicit attention mechanism.