Long sentences have been a persistent issue in written communication because they make it difficult for readers to grasp the main points or follow the writer's original intent. This survey, conducted following the PRISMA guidelines, systematically reviews two main strategies for addressing long sentences: a) sentence compression and b) sentence splitting. Interest in this area has grown steadily since 2005, with marked acceleration after 2017. Current research is dominated by supervised approaches for both sentence compression and splitting. Yet there is a considerable gap in weakly and self-supervised techniques, suggesting an opportunity for further research, especially in domains with limited data. In this survey, we categorize and group the most representative methods into a comprehensive taxonomy. We also conduct a comparative evaluation of these methods on common sentence compression and splitting datasets. Finally, we discuss the challenges and limitations of current methods, providing valuable insights for future research directions. This survey is meant to serve as a comprehensive resource for addressing the complexities of long sentences. We aim to enable researchers to make further advances in the field until long sentences are no longer a barrier to effective communication.
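
As a toy illustration of the two strategies (not of any specific method surveyed here), the following sketch compresses a sentence by dropping a parenthetical aside and splits it on a coordinating conjunction:

```python
import re

# Toy compression: drop a parenthetical aside if one is present.
def compress(sentence: str) -> str:
    return re.sub(r"\s*\([^)]*\)", "", sentence)

# Toy splitting: break one long sentence on a coordinating conjunction.
def split_sentence(sentence: str) -> list[str]:
    left, sep, right = sentence.partition(", and ")
    if not sep:
        return [sentence]
    return [left.rstrip(".") + ".", right[0].upper() + right[1:]]

long_sentence = ("The committee reviewed the proposal (submitted last week), "
                 "and it approved the budget for next year.")
print(compress(long_sentence))
print(split_sentence(compress(long_sentence)))
```

Real systems replace these string heuristics with learned models, but the input/output contract is the same: one long sentence in, a shorter sentence or several simpler sentences out.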

Related Content

Taxonomy is the practice and science of classification. Wikipedia categories illustrate one kind of taxonomy, and a full taxonomy of Wikipedia categories can be extracted by automatic means. As of 2009, it had been shown that manually constructed taxonomies, such as those of computational lexicons like WordNet, can be used to improve and restructure the Wikipedia category taxonomy.

In a broader sense, taxonomy also applies to relationship schemes other than parent-child hierarchies, such as network structures. A taxonomy may then include single children with multiple parents: for example, "car" might appear under both "vehicle" and "steel structure"; to some, however, this merely means that "car" belongs to several different taxonomies. A taxonomy might also simply organize things into groups, or be an alphabetical list; here, however, the term "vocabulary" is more appropriate. In current knowledge-management usage, taxonomies are considered narrower than ontologies, since ontologies employ a wider variety of relation types.

Mathematically, a hierarchical taxonomy is a tree structure of classifications for a given set of objects. At the top of this structure is a single classification that applies to all objects: the root node. Nodes below the root are more specific classifications that apply to subsets of the full set of classified objects. Reasoning proceeds from the general to the more specific.
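
The hierarchical structure described above is straightforward to express in code. Here is a minimal sketch (with purely illustrative labels) of a taxonomy whose nodes may have several parents, which makes it a directed acyclic graph rather than a strict tree:

```python
class TaxonomyNode:
    def __init__(self, label: str):
        self.label = label
        self.parents: list["TaxonomyNode"] = []
        self.children: list["TaxonomyNode"] = []

    def add_child(self, child: "TaxonomyNode") -> None:
        self.children.append(child)
        child.parents.append(self)

root = TaxonomyNode("thing")            # the single root classification
vehicle = TaxonomyNode("vehicle")
steel = TaxonomyNode("steel structure")
car = TaxonomyNode("car")

root.add_child(vehicle)
root.add_child(steel)
vehicle.add_child(car)                  # "car" has two parents, so the
steel.add_child(car)                    # structure is a DAG, not a strict tree

print([p.label for p in car.parents])   # ['vehicle', 'steel structure']
```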

Unequal technology access for Black and Latine communities has been a persistent economic, social justice, and human rights issue, despite increased accessibility driven by advances in consumer electronics such as phones, tablets, and computers. We contextualize socio-technical access inequalities for Black and Latine urban communities and find that many students are hesitant to engage with available technologies due to a lack of engaging support systems. We present a holistic, student-led STEM engagement model through AVELA - A Vision for Engineering Literacy and Access - leveraging culturally responsive lessons, mentors' embodied community representation, and service learning. To evaluate the model's impact after 4 years of mentoring 200+ university student instructors who taught 2,500+ secondary school students in 100+ classrooms, we conducted 24 semi-structured interviews with college AnonymizedOrganization members. We identify access barriers and provide principled recommendations for designing future STEM education programs.

Despite the remarkable performance of generative large language models (LLMs) on abstractive summarization, they face two significant challenges: their considerable size and their tendency to hallucinate. Hallucinations are concerning because they erode reliability and raise safety issues. Pruning is a technique that reduces model size by removing redundant weights, enabling more efficient sparse inference. Pruned models achieve downstream task performance comparable to the original, making them ideal alternatives when operating on a limited budget. However, the effect of pruning on hallucination in abstractive summarization with LLMs has yet to be explored. In this paper, we provide an extensive empirical study across five summarization datasets, two state-of-the-art pruning methods, and five instruction-tuned LLMs. Surprisingly, we find that hallucinations are less prevalent in pruned LLMs than in the original models. Our analysis suggests that pruned models tend to depend more on the source document for summary generation. This leads to higher lexical overlap between the generated summary and the source document, which may explain the reduced hallucination risk.
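
As a minimal sketch of the general idea (illustrative only, not one of the two state-of-the-art methods studied in the paper), unstructured magnitude pruning simply zeroes the smallest-magnitude weights:

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the `sparsity` fraction of weights with smallest |w|."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold     # keep only the larger weights
    return weight * mask

w = torch.randn(4, 8)
w_sparse = magnitude_prune(w, sparsity=0.5)
print((w_sparse == 0).float().mean())   # roughly 0.5
```

The resulting zeros are what sparse inference kernels exploit to reduce compute and memory.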

Digital circuits, despite having been studied for nearly a century and used at scale for about half that time, have until recently evaded a fully compositional theoretical understanding, in which arbitrary circuits may be freely composed together without consulting their internals. Recent work remedied this theoretical shortcoming by showing how digital circuits can be presented compositionally as morphisms in a freely generated symmetric traced category. However, this was done informally; in this paper we refine and expand the previous work in several ways, culminating in the presentation of three sound and complete semantics for digital circuits: denotational, operational, and algebraic. For the denotational semantics, we establish a correspondence between stream functions with certain properties and circuits constructed syntactically. For the operational semantics, we present the reductions required to model how a circuit processes a value, including a new reduction for eliminating non-delay-guarded feedback; this leads to an adequate notion of observational equivalence for digital circuits. Finally, we define a new family of equations for translating circuits into bisimilar circuits in a 'normal form', leading to a complete algebraic semantics for sequential circuits.
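
As an informal illustration of the stream-function view (not the paper's categorical construction), a sequential circuit with delay-guarded feedback can be simulated as a state machine mapping input streams to output streams:

```python
def run_circuit(inputs):
    """XOR-accumulator: output_t = input_t XOR state, with the feedback
    wire passing through a one-tick delay element."""
    state = 0                      # the delay element's initial value
    outputs = []
    for x in inputs:
        out = x ^ state            # combinational logic
        outputs.append(out)
        state = out                # feedback value available next tick
    return outputs

print(run_circuit([1, 0, 1, 1]))   # [1, 1, 0, 1]
```

The delay on the feedback wire is what makes the loop well-defined at every tick; the non-delay-guarded case is exactly what the paper's new reduction must eliminate.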

We provide an overview of three query languages whose objective is to specify properties of the highly popular formalisms of fault trees (FTs) and attack trees (ATs): BFL, a Boolean logic for FTs; PFL, a probabilistic extension of BFL; and ATM, a logic for security metrics on ATs. We validate the framework composed of these three logics by applying them to the case study of a water distribution network. We extend the FT for this network - found in the literature - and propose to model the system under analysis with the fault tree/attack tree (FT/AT) formalism, combining FTs and ATs in a single model. Furthermore, we propose a novel combination of the showcased logics to answer queries that jointly consider both the FT and the AT of the model, integrating the influence of attacks on the failure probabilities of different components. Finally, we extend the domain-specific language for PFL with novel constructs to capture the interplay between attack metrics - e.g., cost and success probability - and failure probabilities in the system.
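
As an illustration of the kind of quantity PFL-style queries reason about (not the actual syntax of BFL, PFL, or ATM), the following sketch computes the top-event failure probability of a small fault tree under the usual assumption of independent basic events; the water-network fragment shown is hypothetical:

```python
def ft_probability(node, basic_probs):
    """Failure probability of a fault tree node, assuming independence."""
    kind = node[0]
    if kind == "basic":
        return basic_probs[node[1]]
    child_ps = [ft_probability(c, basic_probs) for c in node[1]]
    if kind == "and":                       # all children must fail
        p = 1.0
        for cp in child_ps:
            p *= cp
        return p
    if kind == "or":                        # at least one child fails
        q = 1.0
        for cp in child_ps:
            q *= (1.0 - cp)
        return 1.0 - q
    raise ValueError(kind)

# Hypothetical fragment: supply fails if the pump fails OR both
# redundant valves fail.
tree = ("or", [("basic", "pump"),
               ("and", [("basic", "valve_a"), ("basic", "valve_b")])])
probs = {"pump": 0.01, "valve_a": 0.05, "valve_b": 0.05}
print(ft_probability(tree, probs))          # ~0.0125
```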

In recent times, a plethora of Large Code Generation Models (LCGMs) have been proposed, showing significant potential for assisting developers with complex programming tasks. Benchmarking LCGMs requires a set of diverse programming problems, each comprising a prompt (including the task description), a canonical solution, and test inputs. Existing methods for constructing such problem sets fall into two main types: manual methods and perturbation-based methods. Manual methods demand high effort, lack scalability, and risk data contamination, since the problems may already appear in LCGMs' training data; perturbation-based approaches mainly generate semantically homogeneous problems with the same canonical solutions and introduce typos that an IDE can easily auto-correct, making them ineffective and unrealistic. In this work, we propose the idea of programming problem merging (PPM) and provide two implementations of this idea. We apply our tool to two widely used datasets and compare it against nine baseline methods using eight code generation models. The results demonstrate the effectiveness of our tool in generating programming problems that are more challenging, diverse, and natural than those of the baselines.
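
The abstract does not spell out the two PPM implementations, so the following is only a hypothetical sketch of the core idea: merging two benchmark problems by composing their prompts and canonical solutions into a new problem that neither original solution answers on its own:

```python
# Hypothetical seed problems; real PPM operates on benchmark datasets.
problem_a = {
    "prompt": "Return the sum of a list of integers.",
    "solution": lambda xs: sum(xs),
}
problem_b = {
    "prompt": "Return the count of even integers in a list.",
    "solution": lambda xs: sum(1 for x in xs if x % 2 == 0),
}

# Merged problem: a new prompt with a composed canonical solution, so the
# expected output differs semantically from both originals.
merged = {
    "prompt": ("Return the sum of a list of integers "
               "plus the count of its even elements."),
    "solution": lambda xs: problem_a["solution"](xs) + problem_b["solution"](xs),
}

print(merged["solution"]([1, 2, 3, 4]))  # 10 + 2 = 12
```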

Non-stationary signals are ubiquitous in real life. Many techniques proposed in recent decades allow decomposing multi-component signals into simple oscillatory mono-components, like the groundbreaking Empirical Mode Decomposition technique and the Iterative Filtering method. When a signal contains mono-components with rapidly varying instantaneous frequencies, such as chirps or whistles, it becomes particularly hard for most techniques to properly factor out these components. The Adaptive Local Iterative Filtering technique has recently gained interest in many applied fields of research for its ability to handle non-stationary signals exhibiting amplitude and frequency modulation. In this work, we address the open question of how to guarantee a priori convergence of this technique, and we propose two new algorithms. The first, called Stable Adaptive Local Iterative Filtering, is a stabilized version of Adaptive Local Iterative Filtering that we prove to be always convergent. The stability, however, comes at the cost of more complex calculations. The second, called Resampled Iterative Filtering, is a new generalization of the Iterative Filtering method. We prove that Resampled Iterative Filtering is guaranteed to converge a priori for any kind of signal. Furthermore, in the discrete setting, by leveraging the mathematical properties of the matrices involved, we show that its calculations can be accelerated drastically. Finally, we present artificial and real-life examples to show the power and performance of the proposed methods.
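
As a simplified sketch of the inner loop shared by this family of methods (with a fixed averaging window, whereas real Iterative Filtering adapts the filter length to the signal), repeatedly subtracting a local moving average isolates the fastest oscillatory component:

```python
import numpy as np

def if_step(signal: np.ndarray, window: int, n_iter: int = 50) -> np.ndarray:
    """One simplified Iterative Filtering step: subtract a moving average
    repeatedly so only the fastest fluctuating part survives."""
    kernel = np.ones(window) / window           # simple averaging mask
    component = signal.copy()
    for _ in range(n_iter):
        moving_avg = np.convolve(component, kernel, mode="same")
        component = component - moving_avg      # keep the fluctuating part
    return component

t = np.linspace(0, 1, 500)
signal = np.sin(2 * np.pi * 40 * t) + np.sin(2 * np.pi * 3 * t)  # fast + slow
imf1 = if_step(signal, window=25)
print(np.corrcoef(imf1, np.sin(2 * np.pi * 40 * t))[0, 1])  # close to 1
```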

In the social sciences, studies are often based on questionnaires asking participants to express ordered responses several times over a study period. We present a model-based clustering algorithm for such longitudinal ordinal data. Assuming that an ordinal variable is the discretization of an underlying latent continuous variable, the model relies on a mixture of matrix-variate normal distributions, accounting simultaneously for within-time and between-time dependence structures. The model is thus able to concurrently capture heterogeneity, the association among responses, and the temporal dependence structure. An EM algorithm is developed for parameter estimation. An evaluation of the model on synthetic data shows its estimation abilities and its advantages over competitors. A real-world application concerning changes in eating behaviour during the Covid-19 pandemic in France is presented.
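
The paper's model involves matrix-variate normals and an ordinal discretization layer; as a much simpler skeleton of the E-step/M-step alternation it relies on, here is EM for a univariate Gaussian mixture:

```python
import numpy as np

rng = np.random.default_rng(0)

def em_gmm(x, k=2, n_iter=100):
    mu = rng.choice(x, size=k, replace=False)    # initial means
    sigma = np.full(k, x.std())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation
        dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
    return mu, sigma, pi

# Two well-separated groups; EM should recover means near 0 and 5.
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])
print(em_gmm(x)[0])
```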

Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems, ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that are dynamic in nature (e.g., with features or connectivity evolving over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs significantly outperform previous approaches while being more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of the framework's components and identify the best configuration, which achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.
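
As a minimal sketch of the memory-module idea (simplified from the full TGN architecture), each node can keep a memory vector that a recurrent cell updates whenever the node participates in a timed event:

```python
import torch
import torch.nn as nn

class NodeMemory(nn.Module):
    def __init__(self, num_nodes: int, mem_dim: int, msg_dim: int):
        super().__init__()
        self.memory = torch.zeros(num_nodes, mem_dim)  # one vector per node
        self.updater = nn.GRUCell(msg_dim, mem_dim)

    def update(self, node_ids: torch.Tensor, messages: torch.Tensor) -> None:
        """Fold a batch of event messages into the affected nodes' memories."""
        new_mem = self.updater(messages, self.memory[node_ids])
        self.memory[node_ids] = new_mem.detach()

mem = NodeMemory(num_nodes=100, mem_dim=32, msg_dim=16)
events = torch.tensor([3, 7])            # nodes touched by two events
mem.update(events, torch.randn(2, 16))   # e.g. encoded edge features + time
print(mem.memory[3].norm() > 0)          # node 3's memory has changed
```

In the full framework, graph-based operators then aggregate these memories over a node's temporal neighbourhood to produce embeddings for prediction.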

Machine reading comprehension (MRC) aims to teach machines to read and comprehend human language, a long-standing goal of natural language processing (NLP). With the rise of deep neural networks and the evolution of contextualized language models (CLMs), MRC research has experienced two significant breakthroughs. MRC and CLMs, as phenomena, have had a great impact on the NLP community. In this survey, we provide a comprehensive and comparative review of MRC, covering 1) the origin and development of MRC and CLMs, with a particular focus on the role of CLMs; 2) the impact of MRC and CLMs on the NLP community; 3) the definition, datasets, and evaluation of MRC; 4) general MRC architecture and technical methods, viewed as a two-stage Encoder-Decoder solving architecture informed by insights into the human cognitive process; 5) previous highlights, emerging topics, and our empirical analysis, with particular attention to what worked in different periods of MRC research. We propose a full-view categorization and new taxonomies for these topics. Our primary conclusions are that 1) MRC boosts the progress from language processing to understanding; 2) the rapid improvement of MRC systems greatly benefits from the development of CLMs; 3) the theme of MRC is gradually moving from shallow text matching to cognitive reasoning.
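
As a toy rendering of the two-stage Encoder-Decoder view described above (not any specific surveyed system), an encoder builds contextual vectors for the concatenated question and passage, and a lightweight span head reads the answer off those vectors:

```python
import torch
import torch.nn as nn

class ToySpanMRC(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.span_head = nn.Linear(dim, 2)      # start / end logits per token

    def forward(self, token_ids):
        hidden = self.encoder(self.embed(token_ids))   # stage 1: understand
        start_logits, end_logits = self.span_head(hidden).unbind(-1)
        return start_logits, end_logits                # stage 2: answer

model = ToySpanMRC()
ids = torch.randint(0, 1000, (1, 30))    # question and passage concatenated
start, end = model(ids)
print(start.argmax(-1), end.argmax(-1))  # predicted answer span boundaries
```

Real systems replace the small encoder here with a pretrained CLM, which is exactly the dependence on CLMs the survey highlights.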

A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain words annotated with sememes, have been successfully applied to many NLP tasks. However, existing sememe KBs cover only a few languages, which hinders their widespread use. To address this issue, we propose to build a unified sememe KB for multiple languages based on BabelNet, a multilingual encyclopedic dictionary. We first build a dataset serving as the seed of the multilingual sememe KB, manually annotating sememes for over $15$ thousand synsets (the entries of BabelNet). Then, we present the novel task of automatic sememe prediction for synsets, aiming to expand the seed dataset into a usable KB. We also propose two simple and effective models, which exploit different kinds of synset information. Finally, we conduct quantitative and qualitative analyses to explore important factors and difficulties in the task. All source code and data for this work can be obtained at //github.com/thunlp/BabelNet-Sememe-Prediction.
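
The paper's two models are not detailed in this abstract, so the following is a hypothetical sketch of the simplest flavour of sememe prediction: borrowing sememes from the nearest annotated synsets in embedding space; all names and data here are fabricated for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake synset embeddings and a tiny seed of sememe annotations.
emb = {s: rng.normal(size=8) for s in ["syn_dog", "syn_wolf", "syn_bank"]}
annotations = {"syn_dog": {"animal", "pet"}, "syn_bank": {"institution"}}

def predict_sememes(target, k=1):
    """Score candidate sememes for an unannotated synset by voting among
    its k nearest annotated neighbours."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    neighbours = sorted(annotations, key=lambda s: cos(emb[target], emb[s]),
                        reverse=True)
    votes = {}
    for n in neighbours[:k]:
        for sem in annotations[n]:
            votes[sem] = votes.get(sem, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)

print(predict_sememes("syn_wolf"))  # sememes borrowed from the closest synset
```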
