With the rise of AI in SE, researchers have shown how AI can assist software developers in a wide variety of activities. However, this rise has not been accompanied by a complementary increase in labelled datasets, which many supervised learning methods require. In recent years, several studies have used crowdsourcing platforms to collect labelled training data. However, research has shown that the quality of labelled data is unstable due to participant bias, knowledge variance, and task difficulty. Thus, we present CodeLabeller, a web-based tool that aims to make labelling Java source files at scale more efficient by streamlining the data collection process and by improving the reliability of responses: each labeller must attach a confidence rating to each of their responses. We test CodeLabeller by constructing a corpus of over a thousand source files obtained from a large collection of open-source Java projects, and labelling each Java source file with its respective design patterns and a summary. Apart from assisting researchers in crowdsourcing a labelled dataset, the tool has practical applicability in software engineering education and assists in building expert ratings for software artefacts. This paper discusses the motivation behind the creation of CodeLabeller, the intended users, a tool demonstration and its UI, its implementation, its benefits, and lastly, its evaluation through a user study and in-practice usage.
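
A minimal sketch of how a labelling response might pair a design-pattern label with a self-reported confidence rating, so that low-confidence answers can be down-weighted when aggregating across labellers. The record fields and weighting scheme here are illustrative assumptions, not CodeLabeller's actual schema:

```python
# Hypothetical data model for a confidence-rated labelling response.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class LabelResponse:
    file_path: str   # Java source file being labelled
    pattern: str     # e.g. "Singleton", "Observer"
    confidence: int  # self-reported rating, e.g. 1 (unsure) to 5 (certain)

def aggregate(responses):
    """Confidence-weighted vote per file: higher-confidence labels count more."""
    scores = defaultdict(lambda: defaultdict(int))
    for r in responses:
        scores[r.file_path][r.pattern] += r.confidence
    return {f: max(votes, key=votes.get) for f, votes in scores.items()}

responses = [
    LabelResponse("src/Logger.java", "Singleton", 5),
    LabelResponse("src/Logger.java", "Factory", 2),
]
print(aggregate(responses))  # {'src/Logger.java': 'Singleton'}
```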

Related content

The objectives of this work are cross-modal text-audio and audio-text retrieval, in which the goal is to retrieve the audio content from a pool of candidates that best matches a given written description and vice versa. Text-audio retrieval enables users to search large databases through an intuitive interface: they simply issue free-form natural language descriptions of the sound they would like to hear. To study the tasks of text-audio and audio-text retrieval, which have received limited attention in the existing literature, we introduce three challenging new benchmarks. We first construct text-audio and audio-text retrieval benchmarks from the AudioCaps and Clotho audio captioning datasets. Additionally, we introduce the SoundDescs benchmark, which consists of paired audio and natural language descriptions for a diverse collection of sounds that are complementary to those found in AudioCaps and Clotho. We employ these three benchmarks to establish baselines for cross-modal text-audio and audio-text retrieval, where we demonstrate the benefits of pre-training on diverse audio tasks. We hope that our benchmarks will inspire further research into audio retrieval with free-form text queries. Code, audio features for all datasets used, and the SoundDescs dataset are publicly available at //github.com/akoepke/audio-retrieval-benchmark.
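
A minimal sketch of the cross-modal retrieval step, assuming text and audio have already been embedded into a shared space by separate encoders. The embedding dimension and random vectors below are illustrative stand-ins, not the paper's actual architecture:

```python
# Rank candidate audio clips by cosine similarity to a text query embedding.
import numpy as np

def retrieve(query_emb, candidate_embs, k=5):
    """Return indices of the top-k candidates by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = c @ q
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(0)
text_query = rng.normal(size=512)          # embedding of a free-form description
audio_pool = rng.normal(size=(1000, 512))  # embeddings of candidate clips
print(retrieve(text_query, audio_pool))    # audio-text retrieval swaps the roles
```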

In the world of Information Technology, new computing paradigms, driven by the requirements of different classes of problems and applications, emerge rapidly. These new computing paradigms pose many new research challenges, and researchers from different disciplines are working together to develop innovative solutions to address them. In newer research areas with many unknowns, creating roadmaps, enabling tools, and inspiring technological and application demonstrators offers confidence and proves the feasibility and effectiveness of a new paradigm. Drawing on our experience, we share a strategy for advancing the field and building community in new and emerging computing research areas. We discuss how the development of simulators can be a cost-effective way to accelerate the design of real systems. We highlight the strategic role played by different types of publications, conferences, and educational programs. We illustrate the effectiveness of the elements of our strategy with a case study on the progression of the cloud computing paradigm.

APIs (Application Programming Interfaces) are reusable software libraries and building blocks for modern rapid software development. Previous research shows that programmers frequently share and search for reviews of APIs on mainstream software question and answer (Q&A) platforms like Stack Overflow, which motivates researchers to design tasks and approaches for processing API reviews automatically. Among these tasks, classifying API reviews into different aspects (e.g., performance or security), called aspect-based API review classification, is of great importance. The current state-of-the-art (SOTA) solution to this task is based on a traditional machine learning algorithm. Inspired by the great success achieved by pre-trained models on many software engineering tasks, this study fine-tunes six pre-trained models for the aspect-based API review classification task and compares them with the current SOTA solution on an API review benchmark collected by Uddin et al. The investigated models include four models (BERT, RoBERTa, ALBERT and XLNet) that are pre-trained on natural language, BERTOverflow, which is pre-trained on a text corpus extracted from posts on Stack Overflow, and CosSensBERT, which is designed for handling imbalanced data. The results show that all six fine-tuned models outperform the traditional machine learning-based tool. More specifically, the improvement in F1-score ranges from 21.0% to 30.2%. We also find that BERTOverflow, a model pre-trained on the corpus from Stack Overflow, does not show better performance than BERT. The results also suggest that CosSensBERT does not exhibit better performance than BERT in terms of F1, but it is still worth considering as it achieves better performance on MCC and AUC.
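
A hedged sketch of fine-tuning one such pre-trained model for aspect classification, using the Hugging Face `transformers` library. The aspect labels, example text, and single training step below are illustrative; the study's exact hyperparameters and data splits are not reproduced here:

```python
# One fine-tuning step of BERT for aspect-based API review classification.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

aspects = ["performance", "security", "usability"]  # illustrative subset
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(aspects)
)

texts = ["This API is fast but the docs are thin."]
labels = torch.tensor([0])  # "performance"

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
out = model(**batch, labels=labels)  # cross-entropy loss computed internally
out.loss.backward()
optimizer.step()
print(float(out.loss))
```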

Federated Learning (FL) is a distributed machine learning technique in which each device contributes to the learning model by independently computing gradients based on its local training data. It has recently become a hot research topic, as it promises several benefits related to data privacy and scalability. However, implementing FL at the network edge is challenging due to system and data heterogeneity and resource constraints. In this article, we examine the existing challenges and trade-offs in Federated Edge Learning (FEEL). The design of FEEL algorithms for resource-efficient learning raises several challenges, which are essentially related to the multidisciplinary nature of the problem. As data is the key component of learning, this article advocates a new set of considerations for data characteristics in wireless scheduling algorithms in FEEL. Hence, we propose a general framework for data-aware scheduling as a guideline for future research directions. We also discuss the main axes and requirements for data evaluation, along with some exploitable techniques and metrics.
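
A minimal sketch of the kind of data-aware scheduling the article argues for: rank edge devices by a score that combines data quantity with label diversity, then select the top-k per round. The scoring function and device records are illustrative assumptions, not a scheme proposed by the article:

```python
# Data-aware client selection for a federated learning round.
import math

def entropy(label_counts):
    """Shannon entropy of a device's label distribution (diversity proxy)."""
    total = sum(label_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in label_counts.values() if c)

def schedule(devices, k=2, alpha=0.5):
    """Select k devices maximizing alpha*data-size + (1-alpha)*diversity."""
    def score(d):
        return (alpha * math.log(1 + d["num_samples"])
                + (1 - alpha) * entropy(d["labels"]))
    return sorted(devices, key=score, reverse=True)[:k]

devices = [
    {"id": "phone-1", "num_samples": 800, "labels": {"cat": 790, "dog": 10}},
    {"id": "phone-2", "num_samples": 300, "labels": {"cat": 150, "dog": 150}},
    {"id": "phone-3", "num_samples": 50,  "labels": {"cat": 25,  "dog": 25}},
]
print([d["id"] for d in schedule(devices)])  # balanced data beats skewed data
```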

Modern models for event causality identification (ECI) are mainly based on supervised learning and are therefore prone to data scarcity. Unfortunately, existing NLP data augmentation methods cannot directly produce the data required for this task. To address this data scarcity, we introduce a new approach that augments training data for event causality identification by iteratively generating new examples and classifying event causality in a dual learning framework. On the one hand, our approach is knowledge-guided: it can leverage existing knowledge bases to generate well-formed new sentences. On the other hand, our approach employs a dual mechanism, a learnable augmentation framework that can interactively adjust the generation process to produce task-related sentences. Experimental results on two benchmarks, EventStoryLine and Causal-TimeBank, show that 1) our method can augment suitable task-related training data for ECI; 2) our method outperforms previous methods on EventStoryLine and Causal-TimeBank (+2.5 and +2.1 points in F1 score, respectively).
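
A schematic sketch of the generate-and-classify loop described above. The knowledge base, template generator, and classifier here are stand-in stubs; the paper's actual components are neural models trained jointly in the dual framework:

```python
# Toy augment-and-filter loop for ECI training data.
import random

knowledge_base = [("earthquake", "tsunami"), ("rain", "flood")]  # causal pairs

def generate(kb):
    """Produce a well-formed candidate sentence from a known causal pair."""
    cause, effect = random.choice(kb)
    return f"The {cause} triggered the {effect}.", 1  # sentence, causal label

def classify(sentence):
    """Stub P(causal) estimator; a real classifier would be trained."""
    return 0.9 if "triggered" in sentence else 0.1

train_data = []
for _ in range(5):  # each iteration: generate, then let the classifier filter
    sent, label = generate(knowledge_base)
    if classify(sent) > 0.5:  # dual feedback keeps only task-relevant examples
        train_data.append((sent, label))
print(train_data)
```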

The notion of "in-domain data" in NLP is often over-simplistic and vague, as textual data varies in many nuanced linguistic aspects such as topic, style, or level of formality. In addition, domain labels are often unavailable, making it challenging to build domain-specific systems. We show that massive pre-trained language models implicitly learn sentence representations that cluster by domain without supervision -- suggesting a simple data-driven definition of domains in textual data. We harness this property and propose domain data selection methods based on such models, which require only a small set of in-domain monolingual data. We evaluate our data selection methods for neural machine translation across five diverse domains, where they outperform an established approach as measured both by BLEU and by the precision and recall of sentence selection with respect to an oracle.
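
A minimal sketch of the core observation: embed sentences with a pre-trained model and cluster the embeddings to recover "domains" without any labels. The specific embedding model below is an assumption for illustration, not the model used in the paper:

```python
# Cluster sentence embeddings to discover domains without supervision.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

sentences = [
    "The patient was administered 50mg of ibuprofen.",        # medical
    "The court dismissed the appeal on procedural grounds.",  # legal
    "Treatment reduced inflammation within two days.",        # medical
    "The defendant waived the right to a jury trial.",        # legal
]
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(sentences)
domains = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
print(domains)  # sentences from the same domain should share a cluster id
```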

Cold-start problems are long-standing challenges for practical recommendation. Most existing recommendation algorithms rely on extensive observed data and are brittle in recommendation scenarios with few interactions. This paper addresses such problems using few-shot learning and meta-learning. Our approach is based on the insight that generalizing well from a few examples relies on both a generic model initialization and an effective strategy for adapting this model to newly arising tasks. To accomplish this, we combine scenario-specific learning with model-agnostic sequential meta-learning and unify them into an integrated end-to-end framework, namely the Scenario-specific Sequential Meta learner (or s^2 meta). Our meta-learner produces a generic initial model by aggregating contextual information from a variety of prediction tasks, while effectively adapting to specific tasks by leveraging learning-to-learn knowledge. Extensive experiments on various real-world datasets demonstrate that our proposed model achieves significant gains over state-of-the-art methods on cold-start problems in online recommendation. The model has been deployed in the Guess You Like section on the front page of Mobile Taobao.
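
A minimal MAML-style sketch of the "generic initialization plus fast adaptation" idea behind s^2 meta, using a toy regression model in PyTorch. The model, tasks, and step sizes are stand-ins; the paper's sequential, scenario-specific architecture is not reproduced here:

```python
# Meta-learn an initialization that adapts well after one gradient step.
import torch

theta = torch.zeros(2, requires_grad=True)  # meta-learned initialization

def task_loss(params, x, y):
    return ((x @ params - y) ** 2).mean()

meta_opt = torch.optim.SGD([theta], lr=0.1)
for step in range(100):
    x = torch.randn(8, 2)
    w = torch.randn(2)  # each "scenario" has its own ground-truth weights
    y = x @ w
    # Inner loop: adapt from theta with one gradient step on support data.
    grad, = torch.autograd.grad(task_loss(theta, x[:4], y[:4]),
                                theta, create_graph=True)
    adapted = theta - 0.1 * grad
    # Outer loop: update theta so the adapted params do well on query data.
    meta_opt.zero_grad()
    task_loss(adapted, x[4:], y[4:]).backward()
    meta_opt.step()
```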

Most existing knowledge graphs (KGs) in academic domains suffer from insufficient multi-relational information, name ambiguity, and data formats ill-suited to large-scale machine processing. In this paper, we present AceKG, a new large-scale KG in the academic domain. AceKG not only provides clean academic information, but also offers a large-scale benchmark dataset for researchers to conduct challenging data mining projects, including link prediction, community detection, and scholar classification. Specifically, AceKG describes 3.13 billion triples of academic facts based on a consistent ontology, including necessary properties of papers, authors, fields of study, venues, and institutes, as well as the relations among them. To enrich the proposed knowledge graph, we also perform entity alignment with existing databases and rule-based inference. Based on AceKG, we conduct experiments on three typical academic data mining tasks and evaluate several state-of-the-art knowledge embedding and network representation learning approaches on the benchmark datasets built from AceKG. Finally, we discuss several promising research directions that benefit from AceKG.
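
A hedged sketch of TransE-style link prediction, one family of knowledge-embedding approaches of the kind evaluated on AceKG. The entities, relations, and random embeddings below are toy examples; the real benchmark uses billions of academic triples and trained embeddings:

```python
# TransE scoring: a true triple (h, r, t) should satisfy h + r ~ t.
import numpy as np

rng = np.random.default_rng(0)
entities = {"paper_1": 0, "author_a": 1, "venue_x": 2}
relations = {"written_by": 0, "published_in": 1}
E = rng.normal(size=(len(entities), 50))   # entity embeddings (untrained toy)
R = rng.normal(size=(len(relations), 50))  # relation embeddings

def transe_score(head, rel, tail):
    """Higher is better: negative distance between h + r and t."""
    return -np.linalg.norm(E[entities[head]] + R[relations[rel]]
                           - E[entities[tail]])

# Link prediction: rank candidate tails for (paper_1, written_by, ?).
candidates = ["author_a", "venue_x"]
print(max(candidates, key=lambda t: transe_score("paper_1", "written_by", t)))
```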

A recent research trend has emerged to identify developers' emotions by applying sentiment analysis to the content of communication traces left in collaborative development environments. To overcome the limitations of off-the-shelf sentiment analysis tools, researchers have recently started developing their own tools for the software engineering domain. In this paper, we report a benchmark study assessing the performance and reliability of three sentiment analysis tools specifically customized for software engineering. Furthermore, we offer a reflection on the open challenges as they emerge from a qualitative analysis of misclassified texts.

Querying graph-structured data is a fundamental operation that enables important applications including knowledge graph search, social network analysis, and cyber-network security. However, the growing size of real-world data graphs poses severe challenges for graph databases to meet the response-time requirements of these applications. Planning the computational steps of query processing - Query Planning - is central to addressing these challenges. In this paper, we study the problem of learning to speed up query planning in graph databases, with the goal of improving the computational efficiency of query processing via training queries. We present a Learning to Plan (L2P) framework that is applicable to a large class of query reasoners that follow the Threshold Algorithm (TA) approach. First, we define a generic search space over candidate query plans and identify target search trajectories (query plans) corresponding to the training queries by performing an expensive search. Subsequently, we learn greedy search control knowledge to imitate the search behavior of the target query plans. We provide a concrete instantiation of our L2P framework for STAR, a state-of-the-art graph query reasoner. Our experiments on benchmark knowledge graphs including DBpedia, YAGO, and Freebase show that, using the query plans generated by the learned search control knowledge, we can significantly improve the speed of STAR with negligible loss in accuracy.
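
A schematic sketch of the imitation-learning idea in L2P: fit a scorer on (state, expert action) pairs extracted from the expensive target search, then apply it greedily at query time. The features, actions, and toy state transition are illustrative stand-ins, not the framework's actual state encoding:

```python
# Learn greedy search control knowledge by imitating expert query plans.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Expert trajectories: (state features, chosen action) from expensive search.
X = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]])
y = np.array([0, 1, 0, 1])  # action 0: expand cheap list; 1: probe index

policy = LogisticRegression().fit(X, y)  # greedy search-control knowledge

def greedy_plan(state_features, steps=3):
    """At each planning step, take the action the learned policy prefers."""
    plan = []
    for _ in range(steps):
        action = int(policy.predict([state_features])[0])
        plan.append(action)
        state_features = np.roll(state_features, 1)  # toy state transition
    return plan

print(greedy_plan([0.8, 0.2]))
```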
