在线亚洲91SE亚洲综合在线_欧美成人亚洲国产中文精品_精品人妻中文字幕无码二区三区_亚洲精品国产精品国在线_女高中生被强奷到抽搐的激情视频_久久免费久久精品久久免费精品_精品一区二区三区蜜臀牛牛

Emotion recognition from speech and music shares similarities due to their acoustic overlap, which has led to interest in transferring knowledge between these domains. However, the shared acoustic cues between speech and music, particularly those encoded by Self-Supervised Learning (SSL) models, remain largely unexplored, given the fact that SSL models for speech and music have rarely been applied in cross-domain research. In this work, we revisit the acoustic similarity between emotion speech and music, starting with an analysis of the layerwise behavior of SSL models for Speech Emotion Recognition (SER) and Music Emotion Recognition (MER). Furthermore, we perform cross-domain adaptation by comparing several approaches in a two-stage fine-tuning process, examining effective ways to utilize music for SER and speech for MER. Lastly, we explore the acoustic similarities between emotional speech and music using Frechet audio distance for individual emotions, uncovering the issue of emotion bias in both speech and music SSL models. Our findings reveal that while speech and music SSL models do capture shared acoustic features, their behaviors can vary depending on different emotions due to their training strategies and domain-specificities. Additionally, parameter-efficient fine-tuning can enhance SER and MER performance by leveraging knowledge from each other. This study provides new insights into the acoustic similarity between emotional speech and music, and highlights the potential for cross-domain generalization to improve SER and MER systems.

相關內容

相似度

關注 2

可約的 · 數據縮減 · Networking · Performer · INFORMS ·

2024 年 11 月 4 日

Reducing Communication Overhead in the IoT-Edge-Cloud Continuum: A Survey on Protocols and Data Reduction Strategies

Dora Krekovi?,Petar Krivi?,Ivana Podnar ?arko,Mario Ku?ek,Danh Le-Phuoc

The adoption of the Internet of Things (IoT) deployments has led to a sharp increase in network traffic as a vast number of IoT devices communicate with each other and IoT services through the IoT-edge-cloud continuum. This network traffic increase poses a major challenge to the global communications infrastructure since it hinders communication performance and also puts significant strain on the energy consumption of IoT devices. To address these issues, efficient and collaborative IoT solutions which enable information exchange while reducing the transmitted data and associated network traffic are crucial. This survey provides a comprehensive overview of the communication technologies and protocols as well as data reduction strategies that contribute to this goal. First, we present a comparative analysis of prevalent communication technologies in the IoT domain, highlighting their unique characteristics and exploring the potential for protocol composition and joint usage to enhance overall communication efficiency within the IoT-edge-cloud continuum. Next, we investigate various data traffic reduction techniques tailored to the IoT-edge-cloud context and evaluate their applicability and effectiveness on resource-constrained and devices. Finally, we investigate the emerging concepts that have the potential to further reduce the communication overhead in the IoT-edge-cloud continuum, including cross-layer optimization strategies and Edge AI techniques for IoT data reduction. The paper offers a comprehensive roadmap for developing efficient and scalable solutions across the layers of the IoT-edge-cloud continuum that are beneficial for real-time processing to alleviate network congestion in complex IoT environments.

通道 · 估計/估計量 · Neural Networks · 正交 · 優化器 ·

2024 年 11 月 4 日

RSRP Measurement Based Channel Autocorrelation Estimation for IRS-Aided Wideband Communication

He Sun,Lipeng Zhu,Weidong Mei,Rui Zhang

The passive and frequency-flat reflection of IRS, as well as the high-dimensional IRS-reflected channels, have posed significant challenges for efficient IRS channel estimation, especially in wideband communication systems with significant multi-path channel delay spread. To address these challenges, we propose a novel neural network (NN)-empowered framework for IRS channel autocorrelation matrix estimation in wideband orthogonal frequency division multiplexing (OFDM) systems. This framework relies only on the easily accessible reference signal received power (RSRP) measurements at users in existing wideband communication systems, without requiring additional pilot transmission. Based on the estimates of channel autocorrelation matrix, the passive reflection of IRS is optimized to maximize the average user received signal-to-noise ratio (SNR) over all subcarriers in the OFDM system. Numerical results verify that the proposed algorithm significantly outperforms existing powermeasurement-based IRS reflection designs in wideband channels.

語音識別 · 可理解性 · MoDELS · 大語言模型 · Extensibility ·

2024 年 11 月 4 日

Internalizing ASR with Implicit Chain of Thought for Efficient Speech-to-Speech Conversational LLM

Robin Shing-Hei Yuen,Timothy Tin-Long Tse,Jian Zhu

from arxiv, Updated for reviewer comments

Current speech-based LLMs are predominantly trained on extensive ASR and TTS datasets, excelling in tasks related to these domains. However, their ability to handle direct speech-to-speech conversations remains notably constrained. These models often rely on an ASR-to-TTS chain-of-thought pipeline, converting speech into text for processing before generating audio responses, which introduces latency and loses audio features. We propose a method that implicitly internalizes ASR chain of thought into a speech LLM, enhancing its native speech understanding capabilities. Our approach reduces latency and improves the model's native understanding of speech, paving the way for more efficient and natural real-time audio interactions. We also release a large-scale synthetic conversational dataset to facilitate further research.

語音識別 · MoDELS · 自動語音識別 · Performance · 錯誤率 ·

2024 年 11 月 3 日

Leveraging Self-Supervised Models for Automatic Whispered Speech Recognition

Aref Farhadipour,Homa Asadi,Volker Dellwo

from arxiv, This paper was accepted at the ICCKE2024 conference

In automatic speech recognition, any factor that alters the acoustic properties of speech can pose a challenge to the system's performance. This paper presents a novel approach for automatic whispered speech recognition in the Irish dialect using the self-supervised WavLM model. Conventional automatic speech recognition systems often fail to accurately recognise whispered speech due to its distinct acoustic properties and the scarcity of relevant training data. To address this challenge, we utilized a pre-trained WavLM model, fine-tuned with a combination of whispered and normal speech data from the wTIMIT and CHAINS datasets, which include the English language in Singaporean and Irish dialects, respectively. Our baseline evaluation with the OpenAI Whisper model highlighted its limitations, achieving a Word Error Rate (WER) of 18.8% and a Character Error Rate (CER) of 4.24% on whispered speech. In contrast, the proposed WavLM-based system significantly improved performance, achieving a WER of 9.22% and a CER of 2.59%. These results demonstrate the efficacy of our approach in recognising whispered speech and underscore the importance of tailored acoustic modeling for robust automatic speech recognition systems. This study provides valuable insights into developing effective automatic speech recognition solutions for challenging speech affected by whisper and dialect. The source codes for this paper are freely available.

Learning · Networking · MoDELS · 邊 · 聯邦學習 ·

2024 年 11 月 2 日

False Data Injection Attack Detection in Edge-based Smart Metering Networks with Federated Learning

Md Raihan Uddin,Ratun Rahman,Dinh C. Nguyen

Smart metering networks are increasingly susceptible to cyber threats, where false data injection (FDI) appears as a critical attack. Data-driven-based machine learning (ML) methods have shown immense benefits in detecting FDI attacks via data learning and prediction abilities. Literature works have mostly focused on centralized learning and deploying FDI attack detection models at the control center, which requires data collection from local utilities like meters and transformers. However, this data sharing may raise privacy concerns due to the potential disclosure of household information like energy usage patterns. This paper proposes a new privacy-preserved FDI attack detection by developing an efficient federated learning (FL) framework in the smart meter network with edge computing. Distributed edge servers located at the network edge run an ML-based FDI attack detection model and share the trained model with the grid operator, aiming to build a strong FDI attack detection model without data sharing. Simulation results demonstrate the efficiency of our proposed FL method over the conventional method without collaboration.

Learning · Agent · 深度強化學習 · 強化學習 · 代價 ·

2024 年 11 月 1 日

Enhancing Adaptive Mixed-Criticality Scheduling with Deep Reinforcement Learning

Bruno Mendes,Pedro F. Souto,Pedro C. Diniz

from arxiv, Version submitted to RTNS 2024, on 17/08/2024 (with some typos fixed)

Adaptive Mixed-Criticality (AMC) is a fixed-priority preemptive scheduling algorithm for mixed-criticality hard real-time systems. It dominates many other scheduling algorithms for mixed-criticality systems, but does so at the cost of occasionally dropping jobs of less important/critical tasks, when low-priority jobs overrun their time budgets. In this paper we enhance AMC with a deep reinforcement learning (DRL) approach based on a Deep-Q Network. The DRL agent is trained off-line, and at run-time adjusts the low-criticality budgets of tasks to avoid budget overruns, while ensuring that no job misses its deadline if it does not overrun its budget. We have implemented and evaluated this approach by simulating realistic workloads from the automotive domain. The results show that the agent is able to reduce budget overruns by at least up to 50%, even when the budget of each task is chosen based on sampling the distribution of its execution time. To the best of our knowledge, this is the first use of DRL in AMC reported in the literature.

Weight · 控制器 · 估計/估計量 · 情景 · MoDELS ·

2024 年 11 月 1 日

Deep-Learning Estimation of Weight Distribution Using Joint Kinematics for Lower-Limb Exoskeleton Control

Clément Lhoste,Emek Bar?? Kü?üktabak,Lorenzo Vianello,Lorenzo Amato,Matthew R. Short,Kevin Lynch,Jose L. Pons

In the control of lower-limb exoskeletons with feet, the phase in the gait cycle can be identified by monitoring the weight distribution at the feet. This phase information can be used in the exoskeleton's controller to compensate the dynamics of the exoskeleton and to assign impedance parameters. Typically the weight distribution is calculated using data from sensors such as treadmill force plates or insole force sensors. However, these solutions increase both the setup complexity and cost. For this reason, we propose a deep-learning approach that uses a short time window of joint kinematics to predict the weight distribution of an exoskeleton in real time. The model was trained on treadmill walking data from six users wearing a four-degree-of-freedom exoskeleton and tested in real time on three different users wearing the same device. This test set includes two users not present in the training set to demonstrate the model's ability to generalize across individuals. Results show that the proposed method is able to fit the actual weight distribution with R2=0.9 and is suitable for real-time control with prediction times less than 1 ms. Experiments in closed-loop exoskeleton control show that deep-learning-based weight distribution estimation can be used to replace force sensors in overground and treadmill walking.

穩健性 · DeepFakes · MoDELS · 數據集 · 多樣性 ·

2024 年 10 月 31 日

I Can Hear You: Selective Robust Training for Deepfake Audio Detection

Zirui Zhang,Wei Hao,Aroon Sankoh,William Lin,Emanuel Mendiola-Ortiz,Junfeng Yang,Chengzhi Mao

Recent advances in AI-generated voices have intensified the challenge of detecting deepfake audio, posing risks for scams and the spread of disinformation. To tackle this issue, we establish the largest public voice dataset to date, named DeepFakeVox-HQ, comprising 1.3 million samples, including 270,000 high-quality deepfake samples from 14 diverse sources. Despite previously reported high accuracy, existing deepfake voice detectors struggle with our diversely collected dataset, and their detection success rates drop even further under realistic corruptions and adversarial attacks. We conduct a holistic investigation into factors that enhance model robustness and show that incorporating a diversified set of voice augmentations is beneficial. Moreover, we find that the best detection models often rely on high-frequency features, which are imperceptible to humans and can be easily manipulated by an attacker. To address this, we propose the F-SAT: Frequency-Selective Adversarial Training method focusing on high-frequency components. Empirical results demonstrate that using our training dataset boosts baseline model performance (without robust training) by 33%, and our robust training further improves accuracy by 7.7% on clean samples and by 29.3% on corrupted and attacked samples, over the state-of-the-art RawNet3 model.

設計 · Processing（編程語言） · 變分自編碼 · 自編碼器 · MoDELS ·

2024 年 10 月 31 日

Univariate Conditional Variational Autoencoder for Morphogenic Patterns Design in Frontal Polymerization-Based Manufacturing

Qibang Liu,Pengfei Cai,Diab Abueidda,Sagar Vyas,Seid Koric,Rafael Gomez-Bombarelli,Philippe Geubelle

Under some initial and boundary conditions, the rapid reaction-thermal diffusion process taking place during frontal polymerization (FP) destabilizes the planar mode of front propagation, leading to spatially varying, complex hierarchical patterns in thermoset polymeric materials. Although modern reaction-diffusion models can predict the patterns resulting from unstable FP, the inverse design of patterns, which aims to retrieve process conditions that produce a desired pattern, remains an open challenge due to the non-unique and non-intuitive mapping between process conditions and manufactured patterns. In this work, we propose a probabilistic generative model named univariate conditional variational autoencoder (UcVAE) for the inverse design of hierarchical patterns in FP-based manufacturing. Unlike the cVAE, which encodes both the design space and the design target, the UcVAE encodes only the design space. In the encoder of the UcVAE, the number of training parameters is significantly reduced compared to the cVAE, resulting in a shorter training time while maintaining comparable performance. Given desired pattern images, the trained UcVAE can generate multiple process condition solutions that produce high-fidelity hierarchical patterns.

圖 · Networking · Processing（編程語言） · 圖卷積 · 圖卷積神經網絡/圖卷積網絡 ·

2021 年 12 月 27 日

Powerful Graph Convolutioal Networks with Adaptive Propagation Mechanism for Homophily and Heterophily

Tao Wang,Rui Wang,Di Jin,Dongxiao He,Yuxiao Huang

Graph Convolutional Networks (GCNs) have been widely applied in various fields due to their significant power on processing graph-structured data. Typical GCN and its variants work under a homophily assumption (i.e., nodes with same class are prone to connect to each other), while ignoring the heterophily which exists in many real-world networks (i.e., nodes with different classes tend to form edges). Existing methods deal with heterophily by mainly aggregating higher-order neighborhoods or combing the immediate representations, which leads to noise and irrelevant information in the result. But these methods did not change the propagation mechanism which works under homophily assumption (that is a fundamental part of GCNs). This makes it difficult to distinguish the representation of nodes from different classes. To address this problem, in this paper we design a novel propagation mechanism, which can automatically change the propagation and aggregation process according to homophily or heterophily between node pairs. To adaptively learn the propagation process, we introduce two measurements of homophily degree between node pairs, which is learned based on topological and attribute information, respectively. Then we incorporate the learnable homophily degree into the graph convolution framework, which is trained in an end-to-end schema, enabling it to go beyond the assumption of homophily. More importantly, we theoretically prove that our model can constrain the similarity of representations between nodes according to their homophily degree. Experiments on seven real-world datasets demonstrate that this new approach outperforms the state-of-the-art methods under heterophily or low homophily, and gains competitive performance under homophily.