五月丁香四月婷婷激情综合,精品激情在线观看视频,国产精品一区二区色,一区二区不卡不卡高清在线

This paper introduces PRobELM (Plausibility Ranking Evaluation for Language Models), a benchmark designed to assess language models' ability to discern more plausible from less plausible scenarios through their parametric knowledge. While benchmarks such as TruthfulQA emphasise factual accuracy or truthfulness, and others such as COPA explore plausible scenarios without explicitly incorporating world knowledge, PRobELM seeks to bridge this gap by evaluating models' capabilities to prioritise plausible scenarios that leverage world knowledge over less plausible alternatives. This design allows us to assess the potential of language models for downstream use cases such as literature-based discovery where the focus is on identifying information that is likely but not yet known. Our benchmark is constructed from a dataset curated from Wikidata edit histories, tailored to align the temporal bounds of the training data for the evaluated models. PRobELM facilitates the evaluation of language models across multiple prompting types, including statement, text completion, and question-answering. Experiments with 10 models of various sizes and architectures on the relationship between model scales, training recency, and plausibility performance, reveal that factual accuracy does not directly correlate with plausibility performance and that up-to-date training data enhances plausibility assessment across different model architectures.

相關內容

MoDELS

關注 43

ACM/IEEE第23屆模型驅動工程語言和系統國際會議，是模型驅動軟件和系統工程的首要會議系列，由ACM-SIGSOFT和IEEE-TCSE支持組織。自1998年以來，模型涵蓋了建模的各個方面，從語言和方法到工具和應用程序。模特的參加者來自不同的背景，包括研究人員、學者、工程師和工業專業人士。MODELS 2019是一個論壇，參與者可以圍繞建模和模型驅動的軟件和系統交流前沿研究成果和創新實踐經驗。今年的版本將為建模社區提供進一步推進建模基礎的機會，并在網絡物理系統、嵌入式系統、社會技術系統、云計算、大數據、機器學習、安全、開源等新興領域提出建模的創新應用以及可持續性。官網鏈接： · 多樣性 · 模型評估 · 語音識別 · 評論員 ·

2024 年 10 月 2 日

VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka

Li-Wei Chen,Hung-Shin Lee,Chen-Chi Chang

from arxiv, Accepted to O-COCOSDA 2024

This paper introduces VoxHakka, a text-to-speech (TTS) system designed for Taiwanese Hakka, a critically under-resourced language spoken in Taiwan. Leveraging the YourTTS framework, VoxHakka achieves high naturalness and accuracy and low real-time factor in speech synthesis while supporting six distinct Hakka dialects. This is achieved by training the model with dialect-specific data, allowing for the generation of speaker-aware Hakka speech. To address the scarcity of publicly available Hakka speech corpora, we employed a cost-effective approach utilizing a web scraping pipeline coupled with automatic speech recognition (ASR)-based data cleaning techniques. This process ensured the acquisition of a high-quality, multi-speaker, multi-dialect dataset suitable for TTS training. Subjective listening tests conducted using comparative mean opinion scores (CMOS) demonstrate that VoxHakka significantly outperforms existing publicly available Hakka TTS systems in terms of pronunciation accuracy, tone correctness, and overall naturalness. This work represents a significant advancement in Hakka language technology and provides a valuable resource for language preservation and revitalization efforts.

語言模型化 · 數據集 · MoDELS · 知識 (knowledge) · 大語言模型 ·

2024 年 9 月 30 日

LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models

Haitao Li,You Chen,Qingyao Ai,Yueyue Wu,Ruizhe Zhang,Yiqun Liu

from arxiv, NeurIPs 2024

Large language models (LLMs) have made significant progress in natural language processing tasks and demonstrate considerable potential in the legal domain. However, legal applications demand high standards of accuracy, reliability, and fairness. Applying existing LLMs to legal systems without careful evaluation of their potential and limitations could pose significant risks in legal practice. To this end, we introduce a standardized comprehensive Chinese legal benchmark LexEval. This benchmark is notable in the following three aspects: (1) Ability Modeling: We propose a new taxonomy of legal cognitive abilities to organize different tasks. (2) Scale: To our knowledge, LexEval is currently the largest Chinese legal evaluation dataset, comprising 23 tasks and 14,150 questions. (3) Data: we utilize formatted existing datasets, exam datasets and newly annotated datasets by legal experts to comprehensively evaluate the various capabilities of LLMs. LexEval not only focuses on the ability of LLMs to apply fundamental legal knowledge but also dedicates efforts to examining the ethical issues involved in their application. We evaluated 38 open-source and commercial LLMs and obtained some interesting findings. The experiments and findings offer valuable insights into the challenges and potential solutions for developing Chinese legal systems and LLM evaluation pipelines. The LexEval dataset and leaderboard are publicly available at \url{//github.com/CSHaitao/LexEval} and will be continuously updated.

TRUSTLLM · 語言模型化 · MoDELS · Principle · 大語言模型 ·

2024 年 9 月 30 日

TrustLLM: Trustworthiness in Large Language Models

Yue Huang,Lichao Sun,Haoran Wang,Siyuan Wu,Qihui Zhang,Yuan Li,Chujie Gao,Yixin Huang,Wenhan Lyu,Yixuan Zhang,Xiner Li,Zhengliang Liu,Yixin Liu,Yijue Wang,Zhikun Zhang,Bertie Vidgen,Bhavya Kailkhura,Caiming Xiong,Chaowei Xiao,Chunyuan Li,Eric Xing,Furong Huang,Hao Liu,Heng Ji,Hongyi Wang,Huan Zhang,Huaxiu Yao,Manolis Kellis,Marinka Zitnik,Meng Jiang,Mohit Bansal,James Zou,Jian Pei,Jian Liu,Jianfeng Gao,Jiawei Han,Jieyu Zhao,Jiliang Tang,Jindong Wang,Joaquin Vanschoren,John Mitchell,Kai Shu,Kaidi Xu,Kai-Wei Chang,Lifang He,Lifu Huang,Michael Backes,Neil Zhenqiang Gong,Philip S. Yu,Pin-Yu Chen,Quanquan Gu,Ran Xu,Rex Ying,Shuiwang Ji,Suman Jana,Tianlong Chen,Tianming Liu,Tianyi Zhou,William Wang,Xiang Li,Xiangliang Zhang,Xiao Wang,Xing Xie,Xun Chen,Xuyu Wang,Yan Liu,Yanfang Ye,Yinzhi Cao,Yong Chen,Yue Zhao

from arxiv, This work is still under work and we welcome your contribution

Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.

估計/估計量 · SLAM · INFORMS · entity · Continuity ·

2024 年 9 月 30 日

DynORecon: Dynamic Object Reconstruction for Navigation

Yiduo Wang,Jesse Morris,Lan Wu,Teresa Vidal-Calleja,Viorela Ila

from arxiv, 7 pages, 6 figures, submitted to ICRA 2025

This paper presents DynORecon, a Dynamic Object Reconstruction system that leverages the information provided by Dynamic SLAM to simultaneously generate a volumetric map of observed moving entities while estimating free space to support navigation. By capitalising on the motion estimations provided by Dynamic SLAM, DynORecon continuously refines the representation of dynamic objects to eliminate residual artefacts from past observations and incrementally reconstructs each object, seamlessly integrating new observations to capture previously unseen structures. Our system is highly efficient (~20 FPS) and produces accurate (~10 cm) reconstructions of dynamic objects using simulated and real-world outdoor datasets.

容差 · Performer · FAST · Networking · 代碼 ·

2024 年 9 月 29 日

Hamster: A Fast Synchronous Byzantine Fault Tolerance Protocol

Ximing Fu,Mo Li,Qingming Zeng,Tianyang Li,Shenghao Yang,Yonghui Guan,Chuanyi Liu

This paper introduces Hamster, a novel synchronous Byzantine Fault Tolerance protocol that achieves better performance and has weaker dependency on synchrony. Specifically, Hamster employs coding techniques to significantly decrease communication complexity and addresses coding related security issues. Consequently, Hamster achieves a throughput gain that increases linearly with the number of nodes, compared to Sync HotStuff. By adjusting the block size, Hamster outperforms Sync HotStuff in terms of both throughput and latency. Moreover, With minor modifications, Hamster can also function effectively in mobile sluggish environments, further reducing its dependency on strict synchrony. We implement Hamster and the experimental results demonstrate its performance advantages. Specifically, Hamster's throughput in a network of $9$ nodes is $2.5\times$ that of Sync HotStuff, and this gain increases to $10$ as the network scales to $65$ nodes.

MoDELS · 語言模型化 · Learning · ML · Machine Learning ·

2024 年 9 月 27 日

LML: Language Model Learning a Dataset for Data-Augmented Prediction

Praneeth Vadlapati

from arxiv, First version

This paper introduces a new approach to using Large Language Models (LLMs) for classification tasks, which are typically handled using Machine Learning (ML) models. Unlike ML models that rely heavily on data cleaning and feature engineering, this method streamlines the process using LLMs. This paper proposes a new concept called "Language Model Learning (LML)" powered by a new method called "Data-Augmented Prediction (DAP)". The classification is performed by LLMs using a method similar to humans manually exploring and understanding the data and deciding classifications using data as a reference. Training data is summarized and evaluated to determine the features that lead to the classification of each label the most. In the process of DAP, the system uses the data summary to automatically create a query, which is used to retrieve relevant rows from the dataset. A classification is generated by the LLM using data summary and relevant rows, ensuring satisfactory accuracy even with complex data. Usage of data summary and similar data in DAP ensures context-aware decision-making. The proposed method uses the words "Act as an Explainable Machine Learning Model" in the prompt to enhance the interpretability of the predictions by allowing users to review the logic behind each prediction. In some test cases, the system scored an accuracy above 90%, proving the effectiveness of the system and its potential to outperform conventional ML models in various scenarios. The code is available at //github.com/Pro-GenAI/LML-DAP

語音增強 · CC · Boosting（一種模型訓練加速方式） · 設計 · MoDELS ·

2024 年 9 月 27 日

Speech Boosting: Low-Latency Live Speech Enhancement for TWS Earbuds

Hanbin Bae,Pavel Andreev,Azat Saginbaev,Nicholas Babaev,Won-Jun Lee,Hosang Sung,Hoon-Young Cho

from arxiv, Accepted by Interspeech 2024

This paper introduces a speech enhancement solution tailored for true wireless stereo (TWS) earbuds on-device usage. The solution was specifically designed to support conversations in noisy environments, with active noise cancellation (ANC) activated. The primary challenges for speech enhancement models in this context arise from computational complexity that limits on-device usage and latency that must be less than 3 ms to preserve a live conversation. To address these issues, we evaluated several crucial design elements, including the network architecture and domain, design of loss functions, pruning method, and hardware-specific optimization. Consequently, we demonstrated substantial improvements in speech enhancement quality compared with that in baseline models, while simultaneously reducing the computational complexity and algorithmic latency.

文本分類 · 圖 · Neural Networks · 圖形處理器 · Networking ·

2023 年 4 月 27 日

Graph Neural Networks for Text Classification: A Survey

Kunze Wang,Yihao Ding,Soyeon Caren Han

from arxiv, 28 pages

Text Classification is the most essential and fundamental problem in Natural Language Processing. While numerous recent text classification models applied the sequential deep learning technique, graph neural network-based models can directly deal with complex structured text data and exploit global information. Many real text classification applications can be naturally cast into a graph, which captures words, documents, and corpus global features. In this survey, we bring the coverage of methods up to 2023, including corpus-level and document-level graph neural networks. We discuss each of these methods in detail, dealing with the graph construction mechanisms and the graph-based learning process. As well as the technological survey, we look at issues behind and future directions addressed in text classification using graph neural networks. We also cover datasets, evaluation metrics, and experiment design and present a summary of published performance on the publicly available benchmarks. Note that we present a comprehensive comparison between different techniques and identify the pros and cons of various evaluation metrics in this survey.

圖 · Processing（編程語言） · NLP · Neural Networks · 圖形處理器 ·

2021 年 6 月 10 日

Graph Neural Networks for Natural Language Processing: A Survey

Lingfei Wu,Yu Chen,Kai Shen,Xiaojie Guo,Hanning Gao,Shucheng Li,Jian Pei,Bo Long

from arxiv, 127 pages

Deep learning has become the dominant approach in coping with various tasks in Natural LanguageProcessing (NLP). Although text inputs are typically represented as a sequence of tokens, there isa rich variety of NLP problems that can be best expressed with a graph structure. As a result, thereis a surge of interests in developing new deep learning techniques on graphs for a large numberof NLP tasks. In this survey, we present a comprehensive overview onGraph Neural Networks(GNNs) for Natural Language Processing. We propose a new taxonomy of GNNs for NLP, whichsystematically organizes existing research of GNNs for NLP along three axes: graph construction,graph representation learning, and graph based encoder-decoder models. We further introducea large number of NLP applications that are exploiting the power of GNNs and summarize thecorresponding benchmark datasets, evaluation metrics, and open-source codes. Finally, we discussvarious outstanding challenges for making the full use of GNNs for NLP as well as future researchdirections. To the best of our knowledge, this is the first comprehensive overview of Graph NeuralNetworks for Natural Language Processing.

圖 · 學成 · 知識圖譜 · FreeBASIC · 強化學習 ·

2018 年 1 月 8 日

DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning

Wenhan Xiong,Thien Hoang,William Yang Wang

We study the problem of learning to reason in large scale knowledge graphs (KGs). More specifically, we describe a novel reinforcement learning framework for learning multi-hop relational paths: we use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector space by sampling the most promising relation to extend its path. In contrast to prior work, our approach includes a reward function that takes the accuracy, diversity, and efficiency into consideration. Experimentally, we show that our proposed method outperforms a path-ranking based algorithm and knowledge graph embedding methods on Freebase and Never-Ending Language Learning datasets.