亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Emotion recognition from speech and music shares similarities due to their acoustic overlap, which has led to interest in transferring knowledge between these domains. However, the shared acoustic cues between speech and music, particularly those encoded by Self-Supervised Learning (SSL) models, remain largely unexplored, given the fact that SSL models for speech and music have rarely been applied in cross-domain research. In this work, we revisit the acoustic similarity between emotion speech and music, starting with an analysis of the layerwise behavior of SSL models for Speech Emotion Recognition (SER) and Music Emotion Recognition (MER). Furthermore, we perform cross-domain adaptation by comparing several approaches in a two-stage fine-tuning process, examining effective ways to utilize music for SER and speech for MER. Lastly, we explore the acoustic similarities between emotional speech and music using Frechet audio distance for individual emotions, uncovering the issue of emotion bias in both speech and music SSL models. Our findings reveal that while speech and music SSL models do capture shared acoustic features, their behaviors can vary depending on different emotions due to their training strategies and domain-specificities. Additionally, parameter-efficient fine-tuning can enhance SER and MER performance by leveraging knowledge from each other. This study provides new insights into the acoustic similarity between emotional speech and music, and highlights the potential for cross-domain generalization to improve SER and MER systems.

相關內容

The adoption of the Internet of Things (IoT) deployments has led to a sharp increase in network traffic as a vast number of IoT devices communicate with each other and IoT services through the IoT-edge-cloud continuum. This network traffic increase poses a major challenge to the global communications infrastructure since it hinders communication performance and also puts significant strain on the energy consumption of IoT devices. To address these issues, efficient and collaborative IoT solutions which enable information exchange while reducing the transmitted data and associated network traffic are crucial. This survey provides a comprehensive overview of the communication technologies and protocols as well as data reduction strategies that contribute to this goal. First, we present a comparative analysis of prevalent communication technologies in the IoT domain, highlighting their unique characteristics and exploring the potential for protocol composition and joint usage to enhance overall communication efficiency within the IoT-edge-cloud continuum. Next, we investigate various data traffic reduction techniques tailored to the IoT-edge-cloud context and evaluate their applicability and effectiveness on resource-constrained and devices. Finally, we investigate the emerging concepts that have the potential to further reduce the communication overhead in the IoT-edge-cloud continuum, including cross-layer optimization strategies and Edge AI techniques for IoT data reduction. The paper offers a comprehensive roadmap for developing efficient and scalable solutions across the layers of the IoT-edge-cloud continuum that are beneficial for real-time processing to alleviate network congestion in complex IoT environments.

The passive and frequency-flat reflection of IRS, as well as the high-dimensional IRS-reflected channels, have posed significant challenges for efficient IRS channel estimation, especially in wideband communication systems with significant multi-path channel delay spread. To address these challenges, we propose a novel neural network (NN)-empowered framework for IRS channel autocorrelation matrix estimation in wideband orthogonal frequency division multiplexing (OFDM) systems. This framework relies only on the easily accessible reference signal received power (RSRP) measurements at users in existing wideband communication systems, without requiring additional pilot transmission. Based on the estimates of channel autocorrelation matrix, the passive reflection of IRS is optimized to maximize the average user received signal-to-noise ratio (SNR) over all subcarriers in the OFDM system. Numerical results verify that the proposed algorithm significantly outperforms existing powermeasurement-based IRS reflection designs in wideband channels.

Current speech-based LLMs are predominantly trained on extensive ASR and TTS datasets, excelling in tasks related to these domains. However, their ability to handle direct speech-to-speech conversations remains notably constrained. These models often rely on an ASR-to-TTS chain-of-thought pipeline, converting speech into text for processing before generating audio responses, which introduces latency and loses audio features. We propose a method that implicitly internalizes ASR chain of thought into a speech LLM, enhancing its native speech understanding capabilities. Our approach reduces latency and improves the model's native understanding of speech, paving the way for more efficient and natural real-time audio interactions. We also release a large-scale synthetic conversational dataset to facilitate further research.

In automatic speech recognition, any factor that alters the acoustic properties of speech can pose a challenge to the system's performance. This paper presents a novel approach for automatic whispered speech recognition in the Irish dialect using the self-supervised WavLM model. Conventional automatic speech recognition systems often fail to accurately recognise whispered speech due to its distinct acoustic properties and the scarcity of relevant training data. To address this challenge, we utilized a pre-trained WavLM model, fine-tuned with a combination of whispered and normal speech data from the wTIMIT and CHAINS datasets, which include the English language in Singaporean and Irish dialects, respectively. Our baseline evaluation with the OpenAI Whisper model highlighted its limitations, achieving a Word Error Rate (WER) of 18.8% and a Character Error Rate (CER) of 4.24% on whispered speech. In contrast, the proposed WavLM-based system significantly improved performance, achieving a WER of 9.22% and a CER of 2.59%. These results demonstrate the efficacy of our approach in recognising whispered speech and underscore the importance of tailored acoustic modeling for robust automatic speech recognition systems. This study provides valuable insights into developing effective automatic speech recognition solutions for challenging speech affected by whisper and dialect. The source codes for this paper are freely available.

Smart metering networks are increasingly susceptible to cyber threats, where false data injection (FDI) appears as a critical attack. Data-driven-based machine learning (ML) methods have shown immense benefits in detecting FDI attacks via data learning and prediction abilities. Literature works have mostly focused on centralized learning and deploying FDI attack detection models at the control center, which requires data collection from local utilities like meters and transformers. However, this data sharing may raise privacy concerns due to the potential disclosure of household information like energy usage patterns. This paper proposes a new privacy-preserved FDI attack detection by developing an efficient federated learning (FL) framework in the smart meter network with edge computing. Distributed edge servers located at the network edge run an ML-based FDI attack detection model and share the trained model with the grid operator, aiming to build a strong FDI attack detection model without data sharing. Simulation results demonstrate the efficiency of our proposed FL method over the conventional method without collaboration.

Adaptive Mixed-Criticality (AMC) is a fixed-priority preemptive scheduling algorithm for mixed-criticality hard real-time systems. It dominates many other scheduling algorithms for mixed-criticality systems, but does so at the cost of occasionally dropping jobs of less important/critical tasks, when low-priority jobs overrun their time budgets. In this paper we enhance AMC with a deep reinforcement learning (DRL) approach based on a Deep-Q Network. The DRL agent is trained off-line, and at run-time adjusts the low-criticality budgets of tasks to avoid budget overruns, while ensuring that no job misses its deadline if it does not overrun its budget. We have implemented and evaluated this approach by simulating realistic workloads from the automotive domain. The results show that the agent is able to reduce budget overruns by at least up to 50%, even when the budget of each task is chosen based on sampling the distribution of its execution time. To the best of our knowledge, this is the first use of DRL in AMC reported in the literature.

In the control of lower-limb exoskeletons with feet, the phase in the gait cycle can be identified by monitoring the weight distribution at the feet. This phase information can be used in the exoskeleton's controller to compensate the dynamics of the exoskeleton and to assign impedance parameters. Typically the weight distribution is calculated using data from sensors such as treadmill force plates or insole force sensors. However, these solutions increase both the setup complexity and cost. For this reason, we propose a deep-learning approach that uses a short time window of joint kinematics to predict the weight distribution of an exoskeleton in real time. The model was trained on treadmill walking data from six users wearing a four-degree-of-freedom exoskeleton and tested in real time on three different users wearing the same device. This test set includes two users not present in the training set to demonstrate the model's ability to generalize across individuals. Results show that the proposed method is able to fit the actual weight distribution with R2=0.9 and is suitable for real-time control with prediction times less than 1 ms. Experiments in closed-loop exoskeleton control show that deep-learning-based weight distribution estimation can be used to replace force sensors in overground and treadmill walking.

Recent advances in AI-generated voices have intensified the challenge of detecting deepfake audio, posing risks for scams and the spread of disinformation. To tackle this issue, we establish the largest public voice dataset to date, named DeepFakeVox-HQ, comprising 1.3 million samples, including 270,000 high-quality deepfake samples from 14 diverse sources. Despite previously reported high accuracy, existing deepfake voice detectors struggle with our diversely collected dataset, and their detection success rates drop even further under realistic corruptions and adversarial attacks. We conduct a holistic investigation into factors that enhance model robustness and show that incorporating a diversified set of voice augmentations is beneficial. Moreover, we find that the best detection models often rely on high-frequency features, which are imperceptible to humans and can be easily manipulated by an attacker. To address this, we propose the F-SAT: Frequency-Selective Adversarial Training method focusing on high-frequency components. Empirical results demonstrate that using our training dataset boosts baseline model performance (without robust training) by 33%, and our robust training further improves accuracy by 7.7% on clean samples and by 29.3% on corrupted and attacked samples, over the state-of-the-art RawNet3 model.

Under some initial and boundary conditions, the rapid reaction-thermal diffusion process taking place during frontal polymerization (FP) destabilizes the planar mode of front propagation, leading to spatially varying, complex hierarchical patterns in thermoset polymeric materials. Although modern reaction-diffusion models can predict the patterns resulting from unstable FP, the inverse design of patterns, which aims to retrieve process conditions that produce a desired pattern, remains an open challenge due to the non-unique and non-intuitive mapping between process conditions and manufactured patterns. In this work, we propose a probabilistic generative model named univariate conditional variational autoencoder (UcVAE) for the inverse design of hierarchical patterns in FP-based manufacturing. Unlike the cVAE, which encodes both the design space and the design target, the UcVAE encodes only the design space. In the encoder of the UcVAE, the number of training parameters is significantly reduced compared to the cVAE, resulting in a shorter training time while maintaining comparable performance. Given desired pattern images, the trained UcVAE can generate multiple process condition solutions that produce high-fidelity hierarchical patterns.

Graph Convolutional Networks (GCNs) have been widely applied in various fields due to their significant power on processing graph-structured data. Typical GCN and its variants work under a homophily assumption (i.e., nodes with same class are prone to connect to each other), while ignoring the heterophily which exists in many real-world networks (i.e., nodes with different classes tend to form edges). Existing methods deal with heterophily by mainly aggregating higher-order neighborhoods or combing the immediate representations, which leads to noise and irrelevant information in the result. But these methods did not change the propagation mechanism which works under homophily assumption (that is a fundamental part of GCNs). This makes it difficult to distinguish the representation of nodes from different classes. To address this problem, in this paper we design a novel propagation mechanism, which can automatically change the propagation and aggregation process according to homophily or heterophily between node pairs. To adaptively learn the propagation process, we introduce two measurements of homophily degree between node pairs, which is learned based on topological and attribute information, respectively. Then we incorporate the learnable homophily degree into the graph convolution framework, which is trained in an end-to-end schema, enabling it to go beyond the assumption of homophily. More importantly, we theoretically prove that our model can constrain the similarity of representations between nodes according to their homophily degree. Experiments on seven real-world datasets demonstrate that this new approach outperforms the state-of-the-art methods under heterophily or low homophily, and gains competitive performance under homophily.

北京阿比特科技有限公司