秋霞网一区二区三区,东京热无码一区二区三区无码,AAA国产黄色大片

Speech is the most natural way of expressing ourselves as humans. Identifying emotion from speech is a nontrivial task due to the ambiguous definition of emotion itself. Speaker Emotion Recognition (SER) is essential for understanding human emotional behavior. The SER task is challenging due to the variety of speakers, background noise, complexity of emotions, and speaking styles. It has many applications in education, healthcare, customer service, and Human-Computer Interaction (HCI). Previously, conventional machine learning methods such as SVM, HMM, and KNN have been used for the SER task. In recent years, deep learning methods have become popular, with convolutional neural networks and recurrent neural networks being used for SER tasks. The input of these methods is mostly spectrograms and hand-crafted features. In this work, we study the use of self-supervised transformer-based models, Wav2Vec2 and HuBERT, to determine the emotion of speakers from their voice. The models automatically extract features from raw audio signals, which are then used for the classification task. The proposed solution is evaluated on reputable datasets, including RAVDESS, SHEMO, SAVEE, AESDD, and Emo-DB. The results show the effectiveness of the proposed method on different datasets. Moreover, the model has been used for real-world applications like call center conversations, and the results demonstrate that the model accurately predicts emotions.

相關內容

MoDELS

關注 43

ACM/IEEE第23屆模型驅動工程語言和系統國際會議，是模型驅動軟件和系統工程的首要會議系列，由ACM-SIGSOFT和IEEE-TCSE支持組織。自1998年以來，模型涵蓋了建模的各個方面，從語言和方法到工具和應用程序。模特的參加者來自不同的背景，包括研究人員、學者、工程師和工業專業人士。MODELS 2019是一個論壇，參與者可以圍繞建模和模型驅動的軟件和系統交流前沿研究成果和創新實踐經驗。今年的版本將為建模社區提供進一步推進建模基礎的機會，并在網絡物理系統、嵌入式系統、社會技術系統、云計算、大數據、機器學習、安全、開源等新興領域提出建模的創新應用以及可持續性。官網鏈接： · Integration · Machine Learning · 可辨認的 · 優化器 ·

2024 年 12 月 18 日

Enhancing Rhetorical Figure Annotation: An Ontology-Based Web Application with RAG Integration

Ramona Kühn,Jelena Mitrovi?,Michael Granitzer

from arxiv, The 31st International Conference on Computational Linguistics (COLING 2025)

Rhetorical figures play an important role in our communication. They are used to convey subtle, implicit meaning, or to emphasize statements. We notice them in hate speech, fake news, and propaganda. By improving the systems for computational detection of rhetorical figures, we can also improve tasks such as hate speech and fake news detection, sentiment analysis, opinion mining, or argument mining. Unfortunately, there is a lack of annotated data, as well as qualified annotators that would help us build large corpora to train machine learning models for the detection of rhetorical figures. The situation is particularly difficult in languages other than English, and for rhetorical figures other than metaphor, sarcasm, and irony. To overcome this issue, we develop a web application called "Find your Figure" that facilitates the identification and annotation of German rhetorical figures. The application is based on the German Rhetorical ontology GRhOOT which we have specially adapted for this purpose. In addition, we improve the user experience with Retrieval Augmented Generation (RAG). In this paper, we present the restructuring of the ontology, the development of the web application, and the built-in RAG pipeline. We also identify the optimal RAG settings for our application. Our approach is one of the first to practically use rhetorical ontologies in combination with RAG and shows promising results.

Agent · 約束 · Integration · Processing（編程語言） · 可辨認的 ·

2024 年 12 月 18 日

ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning

Jie-Jing Shao,Xiao-Wen Yang,Bo-Wen Zhang,Baizhi Chen,Wen-Da Wei,Lan-Zhe Guo,Yu-feng Li

from arxiv, Webpage: //www.lamda.nju.edu.cn/shaojj/chinatravel

Recent advances in LLMs, particularly in language reasoning and tool integration, have rapidly sparked the real-world development of Language Agents. Among these, travel planning represents a prominent domain, combining academic challenges with practical value due to its complexity and market demand. However, existing benchmarks fail to reflect the diverse, real-world requirements crucial for deployment. To address this gap, we introduce ChinaTravel, a benchmark specifically designed for authentic Chinese travel planning scenarios. We collect the travel requirements from questionnaires and propose a compositionally generalizable domain-specific language that enables a scalable evaluation process, covering feasibility, constraint satisfaction, and preference comparison. Empirical studies reveal the potential of neuro-symbolic agents in travel planning, achieving a constraint satisfaction rate of 27.9%, significantly surpassing purely neural models at 2.6%. Moreover, we identify key challenges in real-world travel planning deployments, including open language reasoning and unseen concept composition. These findings highlight the significance of ChinaTravel as a pivotal milestone for advancing language agents in complex, real-world planning scenarios.

語言模型化 · 層 · MoDELS · 解碼 · 自助法/自舉法 ·

2024 年 12 月 17 日

MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning

Ningyuan Xi,Xiaoyu Wang,Yetao Wu,Teng Chen,Qingqing Gu,Yue Zhao,Jinxian Qu,Zhonglin Jiang,Yong Chen,Luo Ji

from arxiv, 19 pages, 7 figures

Large Language Model can reasonably understand and generate human expressions but may lack of thorough thinking and reasoning mechanisms. Recently there have been several studies which enhance the thinking ability of language models but most of them are not data-driven or training-based. In this paper, we are motivated by the cognitive mechanism in the natural world, and design a novel model architecture called TaS which allows it to first consider the thoughts and then express the response based upon the query. We design several pipelines to annotate or generate the thought contents from prompt-response samples, then add language heads in a middle layer which behaves as the thinking layer. We train the language model by the thoughts-augmented data and successfully let the thinking layer automatically generate reasonable thoughts and finally output more reasonable responses. Both qualitative examples and quantitative results validate the effectiveness and performance of TaS. Our code is available at //anonymous.4open.science/r/TadE.

多峰值 · Learning · 均值 · Networking · MoDELS ·

2024 年 12 月 17 日

RCLMuFN: Relational Context Learning and Multiplex Fusion Network for Multimodal Sarcasm Detection

Tongguan Wang,Junkai Li,Guixin Su,Yongcheng Zhang,Dongyu Su,Yuxue Hu,Ying Sha

Sarcasm typically conveys emotions of contempt or criticism by expressing a meaning that is contrary to the speaker's true intent. Accurate detection of sarcasm aids in identifying and filtering undesirable information on the Internet, thereby reducing malicious defamation and rumor-mongering. Nonetheless, the task of automatic sarcasm detection remains highly challenging for machines, as it critically depends on intricate factors such as relational context. Most existing multimodal sarcasm detection methods focus on introducing graph structures to establish entity relationships between text and images while neglecting to learn the relational context between text and images, which is crucial evidence for understanding the meaning of sarcasm. In addition, the meaning of sarcasm changes with the evolution of different contexts, but existing methods may not be accurate in modeling such dynamic changes, limiting the generalization ability of the models. To address the above issues, we propose a relational context learning and multiplex fusion network (RCLMuFN) for multimodal sarcasm detection. Firstly, we employ four feature extractors to comprehensively extract features from raw text and images, aiming to excavate potential features that may have been previously overlooked. Secondly, we utilize the relational context learning module to learn the contextual information of text and images and capture the dynamic properties through shallow and deep interactions. Finally, we employ a multiplex feature fusion module to enhance the generalization of the model by penetratingly integrating multimodal features derived from various interaction contexts. Extensive experiments on two multimodal sarcasm detection datasets show that our proposed method achieves state-of-the-art performance.

INTERACT · 極大 · 黑盒 · MoDELS · 可理解性 ·

2024 年 12 月 16 日

PBI-Attack: Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization

Ruoxi Cheng,Yizhong Ding,Shuirong Cao,Ranjie Duan,Xiaoshuang Jia,Shaowei Yuan,Zhiqiang Wang,Xiaojun Jia

from arxiv, Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization

Understanding the vulnerabilities of Large Vision Language Models (LVLMs) to jailbreak attacks is essential for their responsible real-world deployment. Most previous work requires access to model gradients, or is based on human knowledge (prompt engineering) to complete jailbreak, and they hardly consider the interaction of images and text, resulting in inability to jailbreak in black box scenarios or poor performance. To overcome these limitations, we propose a Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for toxicity maximization, referred to as PBI-Attack. Our method begins by extracting malicious features from a harmful corpus using an alternative LVLM and embedding these features into a benign image as prior information. Subsequently, we enhance these features through bidirectional cross-modal interaction optimization, which iteratively optimizes the bimodal perturbations in an alternating manner through greedy search, aiming to maximize the toxicity of the generated response. The toxicity level is quantified using a well-trained evaluation model. Experiments demonstrate that PBI-Attack outperforms previous state-of-the-art jailbreak methods, achieving an average attack success rate of 92.5% across three open-source LVLMs and around 67.3% on three closed-source LVLMs. Disclaimer: This paper contains potentially disturbing and offensive content.

自動問答 · 多樣性 · 基準 · Processing（編程語言） · 相同 ·

2024 年 12 月 16 日

SCITAT: A Question Answering Benchmark for Scientific Tables and Text Covering Diverse Reasoning Types

Xuanliang Zhang,Dingzirui Wang,Baoxin Wang,Longxu Dou,Xinyuan Lu,Keyan Xu,Dayong Wu,Qingfu Zhu,Wanxiang Che

Scientific question answering (SQA) is an important task aimed at answering questions based on papers. However, current SQA datasets have limited reasoning types and neglect the relevance between tables and text, creating a significant gap with real scenarios. To address these challenges, we propose a QA benchmark for scientific tables and text with diverse reasoning types (SciTaT). To cover more reasoning types, we summarize various reasoning types from real-world questions. To involve both tables and text, we require the questions to incorporate tables and text as much as possible. Based on SciTaT, we propose a strong baseline (CaR), which combines various reasoning methods to address different reasoning types and process tables and text at the same time. CaR brings average improvements of 12.9% over other baselines on SciTaT, validating its effectiveness. Error analysis reveals the challenges of SciTaT, such as complex numerical calculations and domain knowledge.

語言模型化 · MoDELS · CRAFT · 大語言模型 · INFORMS ·

2024 年 12 月 15 日

SpearBot: Leveraging Large Language Models in a Generative-Critique Framework for Spear-Phishing Email Generation

Qinglin Qi,Yun Luo,Yijia Xu,Wenbo Guo,Yong Fang

Large Language Models (LLMs) are increasingly capable, aiding in tasks such as content generation, yet they also pose risks, particularly in generating harmful spear-phishing emails. These emails, crafted to entice clicks on malicious URLs, threaten personal information security. This paper proposes an adversarial framework, SpearBot, which utilizes LLMs to generate spear-phishing emails with various phishing strategies. Through specifically crafted jailbreak prompts, SpearBot circumvents security policies and introduces other LLM instances as critics. When a phishing email is identified by the critic, SpearBot refines the generated email based on the critique feedback until it can no longer be recognized as phishing, thereby enhancing its deceptive quality. To evaluate the effectiveness of SpearBot, we implement various machine-based defenders and assess how well the phishing emails generated could deceive them. Results show these emails often evade detection to a large extent, underscoring their deceptive quality. Additionally, human evaluations of the emails' readability and deception are conducted through questionnaires, confirming their convincing nature and the significant potential harm of the generated phishing emails.

控制器 · 語言模型化 · MoDELS · 樣本 · 多峰值 ·

2024 年 12 月 14 日

Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models

Qingni Wang,Tiantian Geng,Zhiyuan Wang,Teng Wang,Bo Fu,Feng Zheng

Multimodal Large Language Models (MLLMs) exhibit promising advancements across various tasks, yet they still encounter significant trustworthiness issues. Prior studies apply Split Conformal Prediction (SCP) in language modeling to construct prediction sets with statistical guarantees. However, these methods typically rely on internal model logits or are restricted to multiple-choice settings, which hampers their generalizability and adaptability in dynamic, open-ended environments. In this paper, we introduce TRON, a two-step framework for risk control and assessment, applicable to any MLLM that supports sampling in both open-ended and closed-ended scenarios. TRON comprises two main components: (1) a novel conformal score to sample response sets of minimum size, and (2) a nonconformity score to identify high-quality responses based on self-consistency theory, controlling the error rates by two specific risk levels. Furthermore, we investigate semantic redundancy in prediction sets within open-ended contexts for the first time, leading to a promising evaluation metric for MLLMs based on average set size. Our comprehensive experiments across four Video Question-Answering (VideoQA) datasets utilizing eight MLLMs show that TRON achieves desired error rates bounded by two user-specified risk levels. Additionally, deduplicated prediction sets maintain adaptiveness while being more efficient and stable for risk assessment under different risk levels.

設計 · Sphering · TOOLS · 論文 · motivation ·

2024 年 12 月 13 日

Cultivating a Supportive Sphere: Designing Technology to Increase Social Support for Foster-Involved Youth

Ila Kumar,Craig Ferguson,Jiayi Wu,Rosalind W Picard

from arxiv, Accepted for publication in the 28th ACM SIGCHI Conference on Computer-Supported Cooperative Work & Social Computing (CSCW 2025)

Approximately 400,000 youth in the US are living in foster care due to experiences with abuse or neglect at home. For multiple reasons, these youth often don't receive adequate social support from those around them. Despite technology's potential, very little work has explored how these tools can provide more support to foster-involved youth. To begin to fill this gap, we worked with current and former foster-involved youth to develop the first digital tool that aims to increase social support for this population, creating a novel system in which users complete reflective check-ins in an online community setting. We then conducted a pilot study with 15 current and former foster-involved youth, comparing the effect of using the app for two weeks to two weeks of no intervention. We collected qualitative and quantitative data, which demonstrated that this type of interface can provide youth with types of social support that are often not provided by foster care services and other digital interventions. The paper details the motivation behind the app, the trauma-informed design process, and insights gained from this initial evaluation study. Finally, the paper concludes with recommendations for designing digital tools that effectively provide social support to foster-involved youth.

XAI · 有向 · CARS · Taxonomy · state-of-the-art ·

2021 年 12 月 21 日

Explainable Artificial Intelligence for Autonomous Driving: A Comprehensive Overview and Field Guide for Future Research Directions

Shahin Atakishiyev,Mohammad Salameh,Hengshuai Yao,Randy Goebel

from arxiv, 18 pages

Autonomous driving has achieved a significant milestone in research and development over the last decade. There is increasing interest in the field as the deployment of self-operating vehicles on roads promises safer and more ecologically friendly transportation systems. With the rise of computationally powerful artificial intelligence (AI) techniques, autonomous vehicles can sense their environment with high precision, make safe real-time decisions, and operate more reliably without human interventions. However, intelligent decision-making in autonomous cars is not generally understandable by humans in the current state of the art, and such deficiency hinders this technology from being socially acceptable. Hence, aside from making safe real-time decisions, the AI systems of autonomous vehicles also need to explain how these decisions are constructed in order to be regulatory compliant across many jurisdictions. Our study sheds a comprehensive light on developing explainable artificial intelligence (XAI) approaches for autonomous vehicles. In particular, we make the following contributions. First, we provide a thorough overview of the present gaps with respect to explanations in the state-of-the-art autonomous vehicle industry. We then show the taxonomy of explanations and explanation receivers in this field. Thirdly, we propose a framework for an architecture of end-to-end autonomous driving systems and justify the role of XAI in both debugging and regulating such systems. Finally, as future research directions, we provide a field guide on XAI approaches for autonomous driving that can improve operational safety and transparency towards achieving public approval by regulators, manufacturers, and all engaged stakeholders.