Rajdeep Mukherjee,Abhinav Bohra,Akash Banerjee,Soumya Sharma,Manjunath Hegde,Afreen Shaikh,Shivani Shrivastava,Koustuv Dasgupta,Niloy Ganguly,Saptarshi Ghosh,Pawan Goyal

from arxiv, 14 pages; Accepted as a Long Paper in EMNLP 2022 (Main Conference); Code: https://github.com/rajdeep345/ECTSum

Despite tremendous progress in automatic summarization, state-of-the-art methods are predominantly trained to excel at summarizing short newswire articles, or documents with strong layout biases such as scientific articles or government reports. Efficient techniques for summarizing financial documents, including their facts and figures, remain largely unexplored, mainly because suitable datasets are unavailable. In this work, we present ECTSum, a new dataset with transcripts of earnings calls (ECTs), hosted by publicly traded companies, as documents, and short expert-written, telegram-style bullet-point summaries derived from corresponding Reuters articles. ECTs are long unstructured documents without any prescribed length limit or format. We benchmark our dataset with state-of-the-art summarizers across various metrics evaluating the content quality and factual consistency of the generated summaries. Finally, we present a simple yet effective approach, ECT-BPS, to generate a set of bullet points that precisely capture the important facts discussed in the calls.

Related content

Pegasus · MoDELS · seq2seq · CASE · Inference

December 12, 2022

Implementing Deep Learning-Based Approaches for Article Summarization in Indian Languages

Rahul Tangsali,Aabha Pingle,Aditya Vyawahare,Isha Joshi,Raviraj Joshi

from arxiv, Accepted at ILSUM at FIRE 2022

Research on text summarization for low-resource Indian languages has been limited by the scarcity of relevant datasets. This paper presents a summary of various deep-learning approaches used for the ILSUM 2022 Indic language summarization datasets. The ILSUM 2022 dataset consists of news articles written in Indian English, Hindi, and Gujarati, along with their ground-truth summaries. In our work, we explore different pre-trained seq2seq models and fine-tune them on the ILSUM 2022 datasets. In our case, the fine-tuned SoTA PEGASUS model worked best for English, the fine-tuned IndicBART model with augmented data worked best for Hindi, and a fine-tuned PEGASUS model combined with a translation mapping-based approach worked best for Gujarati. The generated summaries were evaluated using ROUGE-1, ROUGE-2, and ROUGE-4 as the evaluation metrics.
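The ROUGE-N metrics named above score a generated summary by its n-gram overlap with a reference summary. The following is a minimal sketch of that idea, assuming lowercased whitespace tokenization and no stemming; it is not the official ROUGE toolkit, and the function names `ngrams` and `rouge_n` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: clipped n-gram overlap between two strings.

    Assumption: tokens are lowercased whitespace-separated words.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

Calling `rouge_n(candidate, reference, n=2)` gives the bigram variant; larger `n` rewards longer exact matches and is therefore stricter.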

Task-oriented dialogue systems · Comprehensibility · Voiceprint recognition · Extensibility · INFORMS

December 12, 2022

A Benchmark for Understanding and Generating Dialogue between Characters in Stories

Jianzhu Yao,Ziqi Liu,Jian Guan,Minlie Huang

Many classical fairy tales, works of fiction, and screenplays leverage dialogue to advance story plots and establish characters. We present the first study to explore whether machines can understand and generate dialogue in stories, which requires capturing the traits of different characters and the relationships between them. To this end, we propose two new tasks, Masked Dialogue Generation and Dialogue Speaker Recognition, i.e., generating missing dialogue turns and predicting speakers for specified dialogue turns, respectively. We build a new dataset, DialStory, which consists of 105k Chinese stories with a large amount of dialogue woven into the plots to support the evaluation. We show the difficulty of the proposed tasks by testing existing models with automatic and manual evaluation on DialStory. Furthermore, we propose to learn explicit character representations to improve performance on these tasks. Extensive experiments and case studies show that our approach can generate more coherent and informative dialogue, and achieve higher speaker recognition accuracy than strong baselines.

Performer · Range · Approximation · Unstructured · Model evaluation

December 11, 2022

Benchmarking the face-centred finite volume method for compressible laminar flows

Jordi Vila-Pérez,Matteo Giacomini,Antonio Huerta

from arxiv, 39 pages, 18 figures, 12 tables

Purpose: This study aims to assess the robustness and accuracy of the face-centred finite volume (FCFV) method for the simulation of compressible laminar flows in different regimes, using numerical benchmarks. Design/methodology/approach: The work presents a detailed comparison with reference solutions published in the literature, when available, and with numerical results computed using a commercial cell-centred finite volume software. Findings: The FCFV scheme provides first-order accurate approximations of the viscous stress tensor and the heat flux, regardless of cell distortion or stretching. The strategy demonstrates its efficiency in inviscid and viscous flows, for a wide range of Mach numbers, including the incompressible limit. In purely inviscid flows, non-oscillatory approximations are obtained in the presence of shock waves. In the incompressible limit, accurate solutions are computed without pressure correction algorithms. The method shows superior performance for viscous high Mach number flows, achieving physically admissible solutions without carbuncle effect and predictions of quantities of interest with errors below 5%. Originality/value: The FCFV method accurately evaluates, for a wide range of compressible laminar flows, quantities of engineering interest, such as drag, lift and heat transfer coefficients, on unstructured meshes featuring distorted and highly stretched cells, with an aspect ratio up to ten thousand. The method is suitable for simulating industrial flows on complex geometries, relaxing the requirements on mesh quality introduced by existing finite volume solvers and alleviating the need for time-consuming manual mesh generation procedures performed by specialised technicians.

INFORMS · Information extraction · Knowledge · Lecture notes · entity

December 11, 2022

MORTY: Structured Summarization for Targeted Information Extraction from Scholarly Articles

Mohamad Yaser Jaradeh,Markus Stocker,Sören Auer

from arxiv, Published as a short paper in ICADL 2022

Information extraction from scholarly articles is a challenging task due to the sizable document length and implicit information hidden in text, figures, and citations. Scholarly information extraction has various applications in exploration, archival, and curation services for digital libraries and knowledge management systems. We present MORTY, an information extraction technique that creates structured summaries of text from scholarly articles. Our approach condenses the article's full-text to property-value pairs as a segmented text snippet called structured summary. We also present a sizable scholarly dataset combining structured summaries retrieved from a scholarly knowledge graph and corresponding publicly available scientific articles, which we openly publish as a resource for the research community. Our results show that structured summarization is a suitable approach for targeted information extraction that complements other commonly used methods such as question answering and named entity recognition.

Correlation coefficient · Scopus · Rank · Linear · Excel

December 11, 2022

In which fields are citations indicators of research quality?

Mike Thelwall,Kayvan Kousha,Mahshid Abdoli,Emma Stuart,Meiko Makita,Paul Wilson,Jonathan Levitt

Citation counts are widely used as indicators of research quality to support or replace human peer review and for lists of top cited papers, researchers, and institutions. Nevertheless, the extent to which citation counts reflect research quality is not well understood. We report the largest-scale evaluation of the relationship between research quality and citation counts, correlating them for 87,739 journal articles in 34 field-based Units of Assessment (UoAs) from the UK. We show that the two correlate positively in all academic fields examined, from very weak (0.1) to strong (0.5). The highest correlations are in health, life sciences and physical sciences and the lowest are in the arts and humanities. The patterns are similar for the field classification schemes of Scopus and Dimensions.ai. We also show that there is no citation threshold in any field beyond which all articles are excellent quality, so lists of top cited articles are not definitive collections of excellence. Moreover, log transformed citation counts have a close to linear relationship with UK research quality ranked scores that is shallow in some fields but steep in others. In conclusion, whilst appropriately field normalised citations associate positively with research quality in all fields, they never perfectly reflect it, even at very high values.
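The relationship described above, between log-transformed citation counts and quality scores, can be illustrated with a short calculation. This is a minimal sketch assuming a Pearson correlation on log(1 + citations); the abstract does not state which correlation coefficient or transform the authors used, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def log_citation_correlation(citations, quality_scores):
    """Correlate log(1 + citations) with quality scores.

    Assumption: log(1 + c) is used so that uncited articles (c = 0) are defined.
    """
    logged = [math.log(1 + c) for c in citations]
    return pearson(logged, quality_scores)
```

Because citation counts are heavily skewed, the log transform compresses the long tail before the correlation is computed, which is why a near-linear relationship can appear only after transformation.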

SimPLe · Graph · Class · Scenario · Sphering

December 10, 2022

Improved enumeration of simple topological graphs

Jan Kynčl

from arxiv, 41 pages, 19 figures; removed an incorrect remark after Proposition 6

A simple topological graph $T = (V(T), E(T))$ is a drawing of a graph in the plane where every two edges have at most one common point (an endpoint or a crossing) and no three edges pass through a single crossing. Topological graphs $G$ and $H$ are isomorphic if $H$ can be obtained from $G$ by a homeomorphism of the sphere, and weakly isomorphic if $G$ and $H$ have the same set of pairs of crossing edges. We generalize results of Pach and Tóth and the author's previous results on counting different drawings of a graph under both notions of isomorphism. We prove that for every graph $G$ with $n$ vertices, $m$ edges and no isolated vertices the number of weak isomorphism classes of simple topological graphs that realize $G$ is at most $2^{O(n^2 \log(m/n))}$, and at most $2^{O(mn^{1/2} \log n)}$ if $m < n^{3/2}$. As a consequence we obtain a new upper bound $2^{O(n^{3/2} \log n)}$ on the number of intersection graphs of $n$ pseudosegments. We improve the upper bound on the number of weak isomorphism classes of simple complete topological graphs with $n$ vertices to $2^{n^2 \alpha(n)^{O(1)}}$, using an upper bound on the size of a set of permutations with bounded VC-dimension recently proved by Cibulka and the author. We show that the number of isomorphism classes of simple topological graphs that realize $G$ is at most $2^{m^2+O(mn)}$ and at least $2^{\Omega(m^2)}$ for graphs with $m > (6+\varepsilon)n$.

MoDELS · Code · INFORMS · Sample · Language modeling

December 9, 2022

Fault-Aware Neural Code Rankers

Jeevana Priya Inala,Chenglong Wang,Mei Yang,Andres Codas,Mark Encarnación,Shuvendu K Lahiri,Madanlal Musuvathi,Jianfeng Gao

from arxiv, In the proceedings of Advances in Neural Information Processing Systems, 2022

Large language models (LLMs) have demonstrated an impressive ability to generate code for various programming tasks. In many instances, LLMs can generate a correct program for a task when given numerous trials. Consequently, a recent trend is to sample programs from a model at large scale and then filter or rank them based on their execution on a small number of known unit tests to select one candidate solution. However, these approaches assume that the unit tests are given and that the generated programs can be executed safely (even though they can perform arbitrary dangerous operations such as file manipulations). Both assumptions are impractical in real-world software development. In this paper, we propose CodeRanker, a neural ranker that can predict the correctness of a sampled program without executing it. Our CodeRanker is fault-aware, i.e., it is trained to predict different kinds of execution information, such as the exact compile/runtime error type (e.g., an IndexError or a TypeError). We show that CodeRanker can significantly increase the pass@1 accuracy of various code generation models (including Codex, GPT-Neo, GPT-J) on APPS, HumanEval and MBPP datasets.
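The selection step described above can be sketched as follows: each sampled program receives a score from a ranker, the highest-scoring program is kept, and pass@1 is the fraction of tasks whose kept program is correct. This is a minimal illustration, not CodeRanker itself; the `score` and `correct` fields and both function names are hypothetical, and the scores stand in for the output of a trained neural ranker.

```python
def pick_top_ranked(candidates):
    """Return the candidate program with the highest ranker score."""
    return max(candidates, key=lambda c: c["score"])


def ranked_pass_at_1(tasks):
    """Fraction of tasks whose top-ranked candidate is correct.

    `tasks` maps a task id to a non-empty list of candidates, each a dict with
    a ranker `score` (hypothetical) and a `correct` flag. The flag is used only
    for evaluation; the ranker never executes the program.
    """
    if not tasks:
        return 0.0
    hits = sum(
        1 for candidates in tasks.values()
        if pick_top_ranked(candidates)["correct"]
    )
    return hits / len(tasks)
```

A ranker helps only when it assigns its highest score to a correct program more often than random selection among the samples would.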

Feature selection · Performer · Machine Learning · Analysis · Learning

December 9, 2022

A Comparative Performance Analysis of Explainable Machine Learning Models With And Without RFECV Feature Selection Technique Towards Ransomware Classification

Rawshan Ara Mowri,Madhuri Siddula,Kaushik Roy

from arxiv, arXiv admin note: text overlap with arXiv:2210.11235

Ransomware has recently emerged as one of the major global threats. The alarming rise in ransomware attacks and new ransomware variants pushes researchers in this domain to constantly examine the distinguishing traits of ransomware and refine their detection or classification strategies. Among the broad range of behavioral characteristics, Application Programming Interface (API) calls and network behaviors have been widely used as differentiating factors for ransomware detection or classification. Although many prior approaches have shown promising results in detecting and classifying ransomware families using these features without applying any feature selection technique, feature selection is a potentially valuable step toward an efficient detection or classification Machine Learning model: it reduces the probability of overfitting by removing redundant data, improves the model's accuracy by eliminating irrelevant features, and thereby reduces training time. A good number of feature selection techniques are in use in different security scenarios to optimize the performance of Machine Learning models. Hence, the aim of this study is to present a comparative performance analysis of widely used Supervised Machine Learning models with and without the RFECV feature selection technique for ransomware classification using API call and network traffic features. The study thereby provides insight into the efficiency of the RFECV feature selection technique for ransomware classification, which peers can use as a reference when choosing a feature selection technique in this domain.

Pegasus · Performer · state-of-the-art · MoDELS · ROUGE

June 2, 2020

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Jingqing Zhang,Yao Zhao,Mohammad Saleh,Peter J. Liu

from arxiv, Added Human Evaluation results; Code link added; Accepted for ICML 2020

Recent work pre-training Transformers with self-supervised objectives on large text corpora has shown great success when fine-tuned on downstream NLP tasks including text summarization. However, pre-training objectives tailored for abstractive text summarization have not been explored. Furthermore, there is a lack of systematic evaluation across diverse domains. In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective. In PEGASUS, important sentences are removed/masked from an input document and are generated together as one output sequence from the remaining sentences, similar to an extractive summary. We evaluated our best PEGASUS model on 12 downstream summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills. Experiments demonstrate that it achieves state-of-the-art performance on all 12 downstream datasets as measured by ROUGE scores. Our model also shows surprising performance on low-resource summarization, surpassing previous state-of-the-art results on 6 datasets with only 1000 examples. Finally, we validated our results using human evaluation and show that our model summaries achieve human performance on multiple datasets.
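The gap-sentence objective described above can be sketched as a data-preparation step: pick the sentences judged most important, replace them with a mask in the input, and use them as the generation target. The sketch below is an illustration under stated assumptions, not the PEGASUS implementation: importance is approximated by unigram-overlap F1 between a sentence and the rest of the document, `MASK_TOKEN` is a placeholder string, and the document is assumed to be already split into sentences.

```python
from collections import Counter

MASK_TOKEN = "<mask>"  # placeholder chosen for this sketch, not the real vocabulary token


def overlap_f1(sentence, other_sentences):
    """Unigram-overlap F1 between one sentence and the rest of the document."""
    a = Counter(sentence.lower().split())
    b = Counter(" ".join(other_sentences).lower().split())
    common = sum((a & b).values())
    if common == 0:
        return 0.0
    precision = common / sum(a.values())
    recall = common / sum(b.values())
    return 2 * precision * recall / (precision + recall)


def make_gap_sentence_example(sentences, num_masked=1):
    """Build a (source, target) pair by masking the highest-scoring sentences."""
    scores = [
        overlap_f1(s, sentences[:i] + sentences[i + 1:])
        for i, s in enumerate(sentences)
    ]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    masked = sorted(ranked[:num_masked])  # keep the original document order
    masked_set = set(masked)
    source = " ".join(
        MASK_TOKEN if i in masked_set else s for i, s in enumerate(sentences)
    )
    target = " ".join(sentences[i] for i in masked)
    return source, target
```

A sequence-to-sequence model trained on such pairs must reconstruct the masked sentences from the remaining context, which resembles producing a summary of the document.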