from arxiv, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Deep learning has the potential to enhance speech signals and increase their intelligibility for users of hearing aids. Deep models suited for real-world application should feature a low computational complexity and low processing delay of only a few milliseconds. In this paper, we explore deep speech enhancement that matches these requirements and contrast monaural and binaural processing algorithms in two complex acoustic scenes. Both algorithms are evaluated with objective metrics and in experiments with hearing-impaired listeners performing a speech-in-noise test. Results are compared to two traditional enhancement strategies, i.e., adaptive differential microphone processing and binaural beamforming. While in diffuse noise, all algorithms perform similarly, the binaural deep learning approach performs best in the presence of spatial interferers. Through a post-analysis, this can be attributed to improvements at low SNRs and to precise spatial filtering.

Related content

June 12, 2024

The impact of deep learning aid on the workload and interpretation accuracy of radiologists on chest computed tomography: a cross-over reader study

Anvar Kurmukov,Valeria Chernina,Regina Gareeva,Maria Dugova,Ekaterina Petrash,Olga Aleshina,Maxim Pisov,Boris Shirokikh,Valentin Samokhin,Vladislav Proskurov,Stanislav Shimovolos,Maria Basova,Mikhail Goncahrov,Eugenia Soboleva,Maria Donskova,Farukh Yaushev,Alexey Shevtsov,Alexey Zakharov,Talgat Saparov,Victor Gombolevskiy,Mikhail Belyaev

from arxiv, 17 pages, 6 figures, 8 tables

Interpretation of chest computed tomography (CT) is time-consuming. Previous studies have measured the time-saving effect of using a deep-learning-based aid (DLA) for CT interpretation. We evaluated the joint impact of a multi-pathology DLA on the time and accuracy of radiologists' reading. Forty radiologists were randomly split into three experimental arms: a control group (10), who interpreted studies without assistance; an informed group (10), who were briefed about DLA pathologies but performed readings without it; and an experimental group (20), who interpreted half of the studies with DLA and half without. Every arm used the same 200 CT studies retrospectively collected from the BIMCV-COVID19 dataset; each radiologist provided readings for 20 CT studies. We compared interpretation time and the accuracy of participants' diagnostic reports with respect to 12 pathological findings. Mean reading time per study was 15.6 minutes [SD 8.5] in the control arm, 13.2 minutes [SD 8.7] in the informed arm, 14.4 minutes [SD 10.3] in the experimental arm without DLA, and 11.4 minutes [SD 7.8] in the experimental arm with DLA. Mean sensitivity and specificity were 41.5 [SD 30.4] and 86.8 [SD 28.3] in the control arm; 53.5 [SD 22.7] and 92.3 [SD 9.4] in the informed non-assisted arm; 63.2 [SD 16.4] and 92.3 [SD 8.2] in the experimental arm without DLA; and 91.6 [SD 7.2] and 89.9 [SD 6.0] in the experimental arm with DLA. DLA sped up interpretation per study by 2.9 minutes (CI95 [1.7, 4.3], p<0.0005), increased sensitivity by 28.4 (CI95 [23.4, 33.4], p<0.0005), and decreased specificity by 2.4 (CI95 [0.6, 4.3], p=0.13). Of the 20 radiologists in the experimental arm, 16 improved both reading time and sensitivity, two improved their time with a marginal drop in sensitivity, and two improved sensitivity with increased time. Overall, DLA introduction decreased reading time by 20.6%.
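For readers less familiar with the two accuracy metrics reported above, the following is a minimal sketch of how sensitivity and specificity are computed for one binary finding. The function name and the toy labels are illustrative assumptions and are not taken from the study; the sketch also assumes at least one positive and one negative case.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) in percent for binary labels.

    Assumes at least one positive and one negative case in `truth`.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # finding present, reported
    fn = sum(1 for t, p in pairs if t and not p)      # finding present, missed
    tn = sum(1 for t, p in pairs if not t and not p)  # finding absent, not reported
    fp = sum(1 for t, p in pairs if not t and p)      # finding absent, reported
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical toy labels for one finding across eight studies (not study data).
truth = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
print(sensitivity_specificity(truth, predicted))
```

In a multi-finding setting such as the one described, these values would presumably be computed per finding and then averaged, but the abstract does not specify the aggregation.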

June 11, 2024

RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention

Mingshuai Liu,Zhuangqi Chen,Xiaopeng Yan,Yuanjun Lv,Xianjun Xia,Chuanzeng Huang,Yijian Xiao,Lei Xie

from arxiv, Accepted by Interspeech 2024

In real-time speech communication systems, speech signals are often degraded by multiple distortions. Recently, a two-stage Repair-and-Denoising network (RaD-Net) was proposed and achieved superior speech quality improvement in the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. However, its inability to use future information and the constrained receptive field of its convolution layers limit the system's performance. To mitigate these problems, we extend RaD-Net to an upgraded version, RaD-Net 2. Specifically, causality-based knowledge distillation is introduced in the first stage to use future information in a causal way: the non-causal repairing network serves as the teacher to improve the performance of the causal repairing network. In addition, in the second stage, complex axial self-attention is applied in the denoising network's complex feature encoder/decoder. Experimental results on the ICASSP 2024 SSI Challenge blind test set show that RaD-Net 2 brings a 0.10 OVRL DNSMOS improvement over RaD-Net.
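The teacher-student idea described above can be sketched as a combined loss in which a causal student is pulled both toward the clean target and toward the output of a non-causal teacher. This is only an illustration: the abstract does not state the loss used in RaD-Net 2, so the mean-squared-error terms, the weight `alpha`, and all names below are assumptions.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_out, teacher_out, target, alpha=0.5):
    """Weighted sum of a task loss and a teacher-matching loss.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    task = mse(student_out, target)          # student vs. clean reference
    distill = mse(student_out, teacher_out)  # student vs. non-causal teacher
    return alpha * task + (1.0 - alpha) * distill


# Toy frames standing in for network outputs (illustrative values only).
student = [0.0, 1.0]
teacher = [0.0, 2.0]
clean = [1.0, 1.0]
print(distillation_loss(student, teacher, clean))
```

Because the teacher sees future frames during training only, the student can stay causal at inference time while still benefiting from that information.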

June 11, 2024

ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks

Xin Jing,Andreas Triantafyllopoulos,Björn Schuller

from arxiv, Accepted by Interspeech 2024

Contrastive language-audio pretraining (CLAP) has recently emerged as a method for making audio analysis more generalisable. Specifically, CLAP-style models are able to 'answer' a diverse set of language queries, extending the capabilities of audio models beyond a closed set of labels. However, CLAP relies on a large set of (audio, query) pairs for pretraining. While such sets are available for general audio tasks, like captioning or sound event detection, there are no datasets with matched audio and text queries for computational paralinguistic (CP) tasks. As a result, the community relies on generic CLAP models trained for general audio with limited success. In the present study, we explore training considerations for ParaCLAP, a CLAP-style model suited to CP, including a novel process for creating audio-language queries. We demonstrate its effectiveness on a set of computational paralinguistic tasks, where it is shown to surpass the performance of open-source state-of-the-art models.
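The contrastive objective behind CLAP-style pretraining can be sketched as a symmetric cross-entropy over audio-text similarity scores, where the i-th audio clip should match the i-th text query. The following is a minimal pure-Python sketch under stated assumptions: the temperature value, function names, and toy embeddings are illustrative and are not taken from ParaCLAP.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def clap_style_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss; temperature is an assumed default."""
    a = [normalize(v) for v in audio_emb]
    t = [normalize(v) for v in text_emb]
    logits = [[sum(x * y for x, y in zip(ai, tj)) / temperature for tj in t]
              for ai in a]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-audio direction
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(logits_t))


# Toy embeddings: aligned pairs give a low loss, swapped pairs a high one.
audio = [[1.0, 0.0], [0.0, 1.0]]
matched = clap_style_loss(audio, [[1.0, 0.0], [0.0, 1.0]])
swapped = clap_style_loss(audio, [[0.0, 1.0], [1.0, 0.0]])
print(matched < swapped)
```

In practice the embeddings would come from trained audio and text encoders; the hand-written vectors here only make the behaviour of the loss visible.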

June 11, 2024

The statistical thermodynamics of generative diffusion models: Phase transitions, symmetry breaking and critical instability

Luca Ambrogioni

Generative diffusion models have achieved spectacular performance in many areas of machine learning and generative modeling. While the fundamental ideas behind these models come from non-equilibrium physics, variational inference and stochastic calculus, in this paper we show that many aspects of these models can be understood using the tools of equilibrium statistical mechanics. Using this reformulation, we show that generative diffusion models undergo second-order phase transitions corresponding to symmetry breaking phenomena. We show that these phase-transitions are always in a mean-field universality class, as they are the result of a self-consistency condition in the generative dynamics. We argue that the critical instability that arises from the phase transitions lies at the heart of their generative capabilities, which are characterized by a set of mean-field critical exponents. Finally, we show that the dynamic equation of the generative process can be interpreted as a stochastic adiabatic transformation that minimizes the free energy while keeping the system in thermal equilibrium.

June 11, 2024

A mesh-constrained discrete point method for incompressible flows with moving boundaries

Takeharu Matsuda,Satoshi Ii

Particle-based methods are a practical tool in computational fluid dynamics, and novel types of methods have been proposed. However, widely developed Lagrangian-type formulations suffer from a nonuniform distribution of particles, which worsens over time and results in problems with computational efficiency and parallel computation. To mitigate these problems, a mesh-constrained discrete point (MCD) method was developed for stationary boundary problems (Matsuda et al., 2022). Although the MCD method is a meshless method that uses moving least-squares approximation, the arrangement of particles (or discrete points (DPs)) is specialized so that their positions are constrained in background meshes to obtain a nearly uniform distribution. This achieves a reasonable approximation of spatial derivatives with compact stencils without encountering any ill-posed condition and leads to good computational efficiency. In this study, a novel meshless method based on the MCD method is proposed for incompressible flows with moving boundaries. To ensure the mesh constraint of each DP in moving boundary problems, a novel updating algorithm for the DP arrangement is developed so that the DPs are not only repositioned but also reassigned the role of being on the boundary or not. The proposed method achieved reasonable results in numerical experiments for well-known moving boundary problems.
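As a rough illustration of the moving least-squares approximation mentioned above, the sketch below estimates a first derivative in one dimension by fitting a weighted local line around the evaluation point. It is a simplified stand-in, not the MCD method: the Gaussian weight, the linear basis, the support width `h`, and the function name are all assumptions, and the actual method works on multidimensional compact stencils.

```python
import math


def mls_derivative(xs, fs, x0, h):
    """Estimate f'(x0) from scattered samples via a weighted linear fit.

    Fits f(x) ~ a + b * (x - x0) with Gaussian weights of width h
    and returns the slope b.
    """
    sw = swd = swdd = swf = swdf = 0.0
    for x, f in zip(xs, fs):
        d = x - x0
        w = math.exp(-(d / h) ** 2)  # nearby points count more
        sw += w
        swd += w * d
        swdd += w * d * d
        swf += w * f
        swdf += w * d * f
    det = sw * swdd - swd * swd  # determinant of the 2x2 normal equations
    return (sw * swdf - swd * swf) / det


# Evenly spaced points, loosely mimicking one point per background cell.
xs = [i * 0.1 for i in range(11)]
linear = [3.0 * x + 1.0 for x in xs]
print(mls_derivative(xs, linear, x0=0.5, h=0.2))
```

A nearly uniform point arrangement keeps the determinant `det` well away from zero, which is one way to read the abstract's remark about avoiding ill-posed conditions.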

June 10, 2024

An automatic analysis of ultrasound vocalisations for the prediction of interaction context in captive Egyptian fruit bats

Andreas Triantafyllopoulos,Alexander Gebhard,Manuel Milling,Simon Rampp,Björn Schuller

from arxiv, Accepted at EUSIPCO 2024

Prior work in computational bioacoustics has mostly focused on the detection of animal presence in a particular habitat. However, animal sounds contain much richer information than mere presence; among others, they encapsulate the interactions of those animals with other members of their species. Studying these interactions is almost impossible in a naturalistic setting, as the ground truth is often lacking. The use of animals in captivity instead offers a viable alternative pathway. However, most prior works follow a traditional, statistics-based approach to analysing interactions. In the present work, we go beyond this standard framework by attempting to predict the underlying context in interactions between captive Rousettus aegyptiacus using deep neural networks. We reach an unweighted average recall of over 30%, more than thrice the chance level, and show error patterns that differ from our statistical analysis. This work thus represents an important step towards the automatic analysis of states in animals from sound.
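Unweighted average recall (UAR), the metric quoted above, is the mean of per-class recalls, so every class counts equally regardless of how many samples it has. The sketch below shows the computation; the context labels and predictions are hypothetical and are not taken from the bat dataset.

```python
from collections import defaultdict


def unweighted_average_recall(truth, predicted):
    """Mean of per-class recalls over the classes present in `truth`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(truth, predicted):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical imbalanced toy data: a model that always predicts the
# majority class reaches 80% accuracy but only 50% UAR.
truth = ["feeding"] * 8 + ["mating"] * 2
predicted = ["feeding"] * 10
print(unweighted_average_recall(truth, predicted))
```

For a balanced chance-level classifier over K classes, UAR is 1/K, which is why the metric is usually reported relative to chance.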

June 7, 2024

Signal processing algorithm effective for sound quality of hearing loss simulators

Toshio Irino,Shintaro Doan,Minami Ishikawa

from arxiv, This paper has been accepted for publication in Interspeech 2024

Hearing loss (HL) simulators, which allow normal hearing (NH) listeners to experience HL, have been used in speech intelligibility experiments, but not in sound quality experiments due to perceptible distortion. If they produced less distortion, they might be useful for NH listeners to evaluate the sound quality of, for example, hearing aids. We conducted perceptual sound quality experiments to compare the Cambridge version of the HL simulator (CamHLS) and the Wakayama version of the HL simulator (WHIS), which offers two algorithms: filterbank analysis synthesis (FBAS) and direct time-varying filter (DTVF). The experimental results showed that WHIS with DTVF produces less perceptible distortion in speech sounds than CamHLS and WHIS with FBAS, even when the nonlinear process is working. This advantage is mainly due to the use of the DTVF algorithm, which could be applied to various signal synthesis applications with filterbank analysis.

June 7, 2024

An optimization-based equilibrium measure describes non-equilibrium steady state dynamics: application to edge of chaos

Junbin Qiu,Haiping Huang

from arxiv, 21 pages, 9 figures, revised version 2

Understanding neural dynamics is a central topic in machine learning, non-linear physics and neuroscience. However, the dynamics is non-linear, stochastic and, in particular, non-gradient, i.e., the driving force cannot be written as the gradient of a potential. These features make analytic studies very challenging. The common tool is the path integral approach or dynamical mean-field theory, but the drawback is that one has to solve the integro-differential or dynamical mean-field equations, which is computationally expensive and has no closed-form solutions in general. From the perspective of the associated Fokker-Planck equation, the steady-state solution is generally unknown. Here, we treat searching for the steady states as an optimization problem, construct an approximate potential related to the speed of the dynamics, and find that searching for the ground state of this potential is equivalent to running an approximate stochastic gradient dynamics or Langevin dynamics. Only in the zero-temperature limit can the distribution of the original steady states be recovered. The resultant stationary state of the dynamics follows exactly the canonical Boltzmann measure. Within this framework, the quenched disorder intrinsic to the neural networks can be averaged out by applying the replica method, which leads naturally to order parameters for the non-equilibrium steady states. Our theory reproduces the well-known edge-of-chaos result; furthermore, the order parameters characterizing the continuous transition are derived and interpreted as fluctuations and responses of the steady states. Our method thus opens the door to the analytical study of the steady-state landscape of deterministic or stochastic high-dimensional dynamics.
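The link between Langevin dynamics and the Boltzmann measure stated above can be illustrated on a toy problem. The sketch below runs overdamped Langevin dynamics on a one-dimensional quadratic potential E(x) = x²/2, whose Boltzmann distribution at temperature T is a Gaussian with variance T. The potential, step size, step count, and seed are assumptions chosen for illustration and are unrelated to the potential constructed in the paper.

```python
import math
import random


def langevin_samples(grad_energy, temperature, step=0.05,
                     n_steps=200_000, seed=0):
    """Euler-Maruyama discretization of dx = -E'(x) dt + sqrt(2 T dt) * xi."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * temperature * step)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        x += -step * grad_energy(x) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


# Toy potential E(x) = x^2 / 2, so E'(x) = x (an illustrative assumption).
samples = langevin_samples(lambda x: x, temperature=0.5)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(variance, 2))  # should land close to the temperature, 0.5
```

The finite step size introduces a small bias in the sampled variance, so the match to T is approximate and improves as `step` shrinks.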

June 6, 2024

The syntax-semantics interface in a child's path: A study of 3- to 11-year-olds' elicited production of Mandarin recursive relative clauses

Caimei Yang,Qihang Yang,Xingzhi Su,Chenxi Fu,Xiaoyi Wang,Ying Yan,Zaijiang Man

There have been apparently conflicting claims about the syntax-semantics relationship in child acquisition. However, few of them have assessed the child's path toward the acquisition of recursive relative clauses (RRCs). The authors conducted experiments to investigate 3- to 11-year-olds' most-structured elicited production of eight Mandarin RRCs in a 4 (syntactic types) × 2 (semantic conditions) design. The four syntactic types were RRCs with a subject-gapped RC embedded in an object-gapped RC (SORRCs), RRCs with an object-gapped RC embedded in another object-gapped RC (OORRCs), RRCs with an object-gapped RC embedded in a subject-gapped RC (OSRRCs), and RRCs with a subject-gapped RC embedded in another subject-gapped RC (SSRRCs). Each syntactic type was put in two conditions differing in internal semantics: irreversible internal semantics (IIS) and reversible internal semantics (RIS). For example, "the balloon that [the girl that _ eats the banana] holds _" is an SORRC in the IIS condition; "the monkey that [the dog that _ bites the pig] hits _" is an SORRC in the RIS condition. For each target, the participants were provided with a speech-visual stimulus constructing a condition of irreversible external semantics (IES). The results showed that SSRRCs, OSRRCs and SORRCs in the IIS-IES condition were produced two years earlier than their counterparts in the RIS-IES condition. Thus, a two-stage development path is proposed: the language acquisition device starts with the interface between (irreversible) syntax and IIS, and ends with the interface between syntax and IES, both abiding by the syntax-semantics interface principle.

February 21, 2020

Few-shot acoustic event detection via meta-learning

Bowen Shi,Ming Sun,Krishna C. Puvvada,Chieh-Chi Kao,Spyros Matsoukas,Chao Wang

from arxiv, ICASSP 2020

We study few-shot acoustic event detection (AED) in this paper. Few-shot learning enables detection of new events with very limited labeled data. Compared to other research areas like computer vision, few-shot learning for audio recognition has been under-studied. We formulate the few-shot AED problem and explore different ways of utilizing traditional supervised methods for this setting, as well as a variety of meta-learning approaches, which are conventionally used to solve few-shot classification problems. Compared to supervised baselines, meta-learning models achieve superior performance, thus showing their effectiveness at generalizing to new audio events. Our analysis, including the impact of initialization and domain discrepancy, further validates the advantage of meta-learning approaches in few-shot AED.
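To make the few-shot setting concrete, here is a minimal sketch of nearest-prototype classification, the decision rule used by prototypical networks, one well-known metric-based meta-learning approach. The abstract does not say which meta-learning methods were evaluated, so this choice is an assumption, as are the event labels and the hand-written two-dimensional embeddings, which in practice would come from a trained audio encoder.

```python
import math


def prototypes(support):
    """Average the few labeled embeddings of each class into one prototype.

    support: dict mapping label -> list of equal-length embedding vectors.
    """
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors)
                         for i in range(dim)]
    return protos


def classify(query, protos):
    """Assign the query to the class with the nearest prototype."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))


# Hypothetical 2-way, 2-shot episode with toy embeddings.
support = {
    "dog_bark": [[0.0, 0.0], [0.0, 2.0]],
    "glass_break": [[4.0, 4.0], [6.0, 4.0]],
}
protos = prototypes(support)
print(classify([1.0, 1.0], protos), classify([4.0, 5.0], protos))
```

New event classes can be added at test time simply by supplying a few labeled examples, without retraining the encoder.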