
Machine Listening focuses on developing technologies to extract relevant information from audio signals. A critical aspect of these projects is the acquisition and labeling of contextualized data, which is inherently complex and requires specific resources and strategies. Despite the availability of some audio datasets, many are unsuitable for commercial applications. The paper emphasizes the importance of Active Learning (AL) with expert labelers over crowdsourcing, whose contributors often lack detailed insight into the structure of the dataset. AL is an iterative process that combines human labelers and AI models to optimize the labeling budget by intelligently selecting which samples to send for human review. This approach addresses the challenge of handling large, constantly growing datasets that exceed the available computational resources and memory. The paper presents a comprehensive data-centric framework for Machine Listening projects, detailing the configuration of recording nodes, the database structure, and labeling budget optimization in resource-constrained scenarios. Applied to an industrial port in Valencia, Spain, the framework successfully labeled 6540 ten-second audio samples over five months with a small team, demonstrating its effectiveness and adaptability to varying resource availability.
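
To make the AL loop concrete, the sketch below shows one generic uncertainty-sampling round in Python. The logistic-regression classifier, pool sizes, per-round budget, and simulated oracle are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of an uncertainty-sampling active-learning loop.
# Classifier, pool handling, and budget are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stand-ins: features for labeled seeds and an unlabeled pool.
X_labeled = rng.normal(size=(20, 16))
y_labeled = rng.integers(0, 2, size=20)
X_pool = rng.normal(size=(1000, 16))

budget_per_round, n_rounds = 10, 5
for _ in range(n_rounds):
    model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
    # Least-confidence scoring: prioritize samples the model is unsure about.
    conf = model.predict_proba(X_pool).max(axis=1)
    query_idx = np.argsort(conf)[:budget_per_round]
    # In practice these samples go to expert labelers;
    # here we simulate the oracle with random labels.
    y_new = rng.integers(0, 2, size=budget_per_round)
    X_labeled = np.vstack([X_labeled, X_pool[query_idx]])
    y_labeled = np.concatenate([y_labeled, y_new])
    X_pool = np.delete(X_pool, query_idx, axis=0)
```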

Related Content

We propose Deep-Motion-Net: an end-to-end graph neural network (GNN) architecture that enables 3D (volumetric) organ shape reconstruction from a single in-treatment kV planar X-ray image acquired at any arbitrary projection angle. Estimating and compensating for true anatomical motion during radiotherapy is essential for improving the delivery of the planned radiation dose to target volumes while sparing organs-at-risk, thereby improving the therapeutic ratio. Achieving this using only the limited imaging available during irradiation, without surrogate signals or invasive fiducial markers, is attractive. The proposed model learns the mesh regression from a patient-specific template and deep features extracted from kV images at arbitrary projection angles. A 2D-CNN encoder extracts image features, and four feature pooling networks fuse these features onto the 3D template organ mesh. A ResNet-based graph attention network then deforms the feature-encoded mesh. The model is trained on synthetically generated organ motion instances and corresponding kV images. The latter are generated by deforming a reference CT volume aligned with the template mesh, creating digitally reconstructed radiographs (DRRs) at the required projection angles, and applying DRR-to-kV style transfer with a conditional CycleGAN model. The overall framework was tested quantitatively on synthetic respiratory motion scenarios and qualitatively on in-treatment images acquired over full scan series for liver cancer patients. Overall mean prediction errors for the synthetic motion test datasets were 0.16$\pm$0.13 mm, 0.18$\pm$0.19 mm, 0.22$\pm$0.34 mm, and 0.12$\pm$0.11 mm; mean peak prediction errors were 1.39 mm, 1.99 mm, 3.29 mm, and 1.16 mm.
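
The feature-pooling step can be pictured with a short sketch: projected template vertices sample the 2D-CNN feature map (Pixel2Mesh-style perceptual pooling), and a linear regression head stands in for the graph attention network. The `pool_image_features` helper, shapes, and head are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch: sample CNN features at projected mesh-vertex locations,
# then regress a per-vertex 3D displacement. Shapes are assumptions.
import torch
import torch.nn.functional as F

def pool_image_features(feat_map, verts_2d):
    """feat_map: (1, C, H, W) CNN features; verts_2d: (V, 2) in [-1, 1]."""
    grid = verts_2d.view(1, -1, 1, 2)             # (1, V, 1, 2) sampling grid
    sampled = F.grid_sample(feat_map, grid, align_corners=True)
    return sampled.squeeze(-1).squeeze(0).t()     # (V, C) per-vertex features

# Toy example: 64-channel feature map, 100-vertex template mesh.
feat_map = torch.randn(1, 64, 32, 32)
verts_2d = torch.rand(100, 2) * 2 - 1             # projected vertex coordinates
vertex_feats = pool_image_features(feat_map, verts_2d)

# A linear head (standing in for the graph attention network) predicts
# a per-vertex 3D displacement from the pooled features.
head = torch.nn.Linear(64, 3)
displacement = head(vertex_feats)                 # (100, 3) mesh deformation
```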

This paper presents an efficient feature-based approach to initialize nonlinear image registration. Today, nonlinear image registration is dominated by methods relying on intensity-based similarity measures, and a good estimate of the initial transformation is essential, both for traditional iterative algorithms and for recent one-shot deep learning (DL)-based alternatives. The established way to estimate this starting point is affine registration, but this may be insufficient due to its parsimonious, global, and non-bending nature. We propose an improved initialization method that takes advantage of recent advances in DL-based segmentation techniques, which can instantly estimate fine-grained regional delineations with state-of-the-art accuracy. These segmentations are used to produce local, anatomically grounded, feature-based affine matchings using iteration-free closed-form expressions. The estimated local affine transformations are then fused, with the log-Euclidean polyaffine framework, into an overall dense diffeomorphic transformation. We show that, compared to its affine counterpart, the proposed initialization leads to significantly better alignment for both traditional and DL-based nonlinear registration algorithms. The proposed approach is also more robust and significantly faster than commonly used affine registration algorithms such as FSL FLIRT.
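
A minimal sketch of the two ingredients, assuming paired segmentation-derived points are already available: a closed-form least-squares affine fit per region, followed by log-Euclidean fusion of the local transforms. Real polyaffine fusion uses spatially varying weights; the fixed scalars below are a simplification.

```python
# Hedged sketch: closed-form affine fit between matched point sets, then
# log-Euclidean fusion of two local affines. Points and weights are toy data.
import numpy as np
from scipy.linalg import expm, logm

def fit_affine(src, dst):
    """Least-squares affine A (4x4, homogeneous) mapping src -> dst."""
    n = len(src)
    S = np.hstack([src, np.ones((n, 1))])        # (n, 4) homogeneous sources
    M, *_ = np.linalg.lstsq(S, dst, rcond=None)  # (4, 3) closed-form solution
    A = np.eye(4)
    A[:3, :] = M.T
    return A

rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 3))
A1 = fit_affine(pts, pts @ np.diag([1.1, 0.9, 1.0]) + 0.5)  # region 1
A2 = fit_affine(pts, pts + np.array([0.0, 1.0, 0.0]))        # region 2

# Log-Euclidean fusion: weighted sum of matrix logarithms, mapped back
# with the matrix exponential (weights would normally vary over space).
w1, w2 = 0.6, 0.4
fused = expm(w1 * logm(A1) + w2 * logm(A2))
```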

There has been growing interest recently in facial video-based remote photoplethysmography (rPPG) measurement, with a focus on assessing vital signs such as heart rate and heart rate variability. Although previous methods perform well on static datasets, they are hindered by inaccurate region of interest (ROI) localization and motion artifacts, and show limited generalization in real-world scenarios. To address these challenges, we propose a novel masked attention regularization (MAR-rPPG) framework that mitigates the impact of ROI localization errors and complex motion artifacts. Specifically, our approach first integrates a masked attention regularization mechanism into the rPPG field to capture the visual semantic consistency of facial clips, while also employing a masking technique to prevent the model from overfitting on inaccurate ROIs and subsequently degrading its performance. Furthermore, we propose an enhanced rPPG expert aggregation (EREA) network as the backbone to obtain rPPG signals and attention maps simultaneously. Our EREA network can discriminate divergent attentions from different facial areas while retaining the consistency of spatiotemporal attention maps. For motion robustness, the simple open-source detector MediaPipe is sufficient for data preprocessing, thanks to our framework's strong rPPG signal extraction and attention regularization. Exhaustive experiments on three benchmark datasets (UBFC-rPPG, PURE, and MMPD) substantiate the superiority of the proposed method, which outperforms recent state-of-the-art works by a considerable margin.
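
The sketch below illustrates the general masked-attention-regularization idea on toy attention maps: random masking discourages reliance on a few (possibly wrong) ROI pixels, and a temporal term encourages spatiotemporal consistency. The specific loss terms, mask ratio, and shapes are assumptions and do not reproduce the paper's MAR formulation.

```python
# Hedged sketch of masked attention regularization on toy attention maps.
# Loss terms and weights are illustrative assumptions.
import torch

def masked_attention_loss(attn, mask_ratio=0.3):
    """attn: (T, H, W) spatiotemporal attention maps for one facial clip."""
    mask = (torch.rand_like(attn) > mask_ratio).float()
    attn_masked = attn * mask
    # Temporal-consistency term: attention should vary smoothly over frames.
    temporal = (attn[1:] - attn[:-1]).abs().mean()
    # Masking term: penalize attention mass concentrated on masked pixels,
    # so the map cannot depend on a handful of locations.
    robustness = (attn - attn_masked).pow(2).mean()
    return temporal + robustness

attn = torch.rand(30, 64, 64, requires_grad=True)   # toy attention maps
loss = masked_attention_loss(attn)
loss.backward()
```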

Based on audio recordings made once a month during the first 12 months of a child's life, we propose a new method for clustering the resulting set of vocalizations. We use a topologically augmented representation of the vocalizations, employing two persistence diagrams for each vocalization: one computed on the surface of its spectrogram and one on the Takens embedding of the vocalization. A synthetic persistent variable is derived for each diagram and added to the MFCCs (Mel-frequency cepstral coefficients). Using this representation, we fit a non-parametric Bayesian mixture model with a Dirichlet process prior in order to model the number of components. This procedure leads to a novel data-driven categorization of vocal productions. Our findings reveal the presence of 8 clusters of vocalizations, allowing us to compare their temporal distributions and acoustic profiles over the first 12 months of life.
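
A minimal sketch of the clustering step, assuming the feature vectors (MFCCs plus one persistence summary per diagram) are already computed; sklearn's truncated variational approximation to a Dirichlet-process mixture stands in for the paper's exact model, and the features below are random placeholders.

```python
# Hedged sketch: Dirichlet-process mixture over MFCC + persistence features.
# Feature values are random stand-ins for the real representation.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
mfcc = rng.normal(size=(500, 13))      # 13 MFCCs per vocalization
pers = rng.gamma(2.0, size=(500, 2))   # one persistence summary per diagram
X = np.hstack([mfcc, pers])

dpgmm = BayesianGaussianMixture(
    n_components=20,                                   # truncation level
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
# The DP prior shrinks unused components toward zero weight, so the number
# of effectively occupied clusters is learned from the data.
print("occupied clusters:", np.unique(labels).size)
```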

In recent years, text-to-audio models have revolutionized the field of automatic audio generation. This paper investigates their application in generating synthetic datasets for training data-driven models. Specifically, this study analyzes the performance of two environmental sound classification systems trained with data generated by text-to-audio models. We considered three scenarios: a) augmenting the training dataset with data generated by text-to-audio models; b) using a mixed training dataset combining real and synthetic text-driven generated data; and c) using a training dataset composed entirely of synthetic audio. In all cases, the performance of the classification models was tested on real data. Results indicate that text-to-audio models are effective for dataset augmentation, and that performance remains consistent when synthetic data replaces a subset of the recorded dataset. However, the performance of the audio recognition models drops when relying entirely on generated audio.
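
The three scenarios amount to different ways of composing the training set. The sketch below shows them with placeholder file lists; the generation model and the classifier itself are out of scope, and all names are illustrative.

```python
# Hedged sketch of the three training-set compositions, with placeholder
# file lists for the recorded set R and the text-to-audio generated set S.
import random

R = [f"real_{i}.wav" for i in range(1000)]    # recorded clips (placeholder)
S = [f"synth_{i}.wav" for i in range(1000)]   # generated clips (placeholder)
random.seed(0)

# a) augmentation: all real data plus generated data on top
train_aug = R + S

# b) replacement: a fraction of the real data is swapped for synthetic data
frac = 0.5
k = int(frac * len(R))
train_mixed = random.sample(R, len(R) - k) + random.sample(S, k)

# c) fully synthetic training set
train_synth = list(S)

# In all three cases the classifier is evaluated on held-out real data.
```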

This paper deals with a novel nonlinear coupled nonlocal reaction-diffusion system proposed for image restoration, characterized by its ability to preserve low gray level features and textures. The gray level indicator in the proposed model is regularized using a new method based on porous media type equations, which is suitable for recovering noisy, blurred images. The well-posedness, regularity, and other properties of the model are investigated, addressing the lack of theoretical analysis in existing models of this type. Numerical experiments on texture and satellite images demonstrate the effectiveness of the proposed model in denoising and deblurring tasks.
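
The abstract does not state the system explicitly; the display below only illustrates the general shape of such a model, with $u$ the restored image, $f$ the degraded input, $v$ a gray level indicator evolving by a porous-medium-type equation, and $g$ a $v$-dependent diffusivity. The exponent $m>1$ and the coupling terms are assumptions, not the paper's equations.

```latex
% Illustrative shape of a coupled reaction-diffusion restoration model with
% a porous-medium-regularized gray level indicator; exponent m > 1 and the
% couplings are assumptions.
\[
\begin{aligned}
u_t &= \operatorname{div}\!\bigl( g(v)\,\nabla u \bigr) - \lambda\,(u - f),\\
v_t &= \Delta\!\left(v^{m}\right) + u - v, \qquad m > 1.
\end{aligned}
\]
```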

We present a novel approach for finding multiple noisily embedded template graphs in a very large background graph. Our method builds upon the graph-matching matched-filter technique proposed by Sussman et al., with the discovery of multiple diverse matchings achieved by iteratively penalizing a suitable node-pair similarity matrix in the matched filter algorithm. In addition, we propose algorithmic speed-ups that greatly enhance the scalability of our matched-filter approach. We present theoretical justification of our methodology in the setting of correlated Erd\H{o}s-R\'enyi graphs, showing its ability to sequentially discover multiple templates under mild model conditions. We additionally demonstrate our method's utility via extensive experiments on both simulated models and real-world datasets, including human brain connectomes and a large transactional knowledge base.
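
The diversification mechanism can be sketched compactly: match, then penalize the node pairs used by that match so the next pass finds a different embedding. For brevity the sketch solves a plain linear assignment over a node-pair similarity matrix rather than the full FAQ-style graph matching the matched filter uses; the penalty value and sizes are assumptions.

```python
# Hedged sketch of diverse matching via iterative similarity penalization.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_template, n_background = 10, 200
sim = rng.random((n_template, n_background))  # node-pair similarity matrix

penalty, matches = 1e3, []
for _ in range(3):                             # recover three diverse matchings
    rows, cols = linear_sum_assignment(-sim)   # maximize total similarity
    matches.append(cols.copy())
    sim[rows, cols] -= penalty                 # penalize used node pairs
print(matches)
```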

This paper proposes an unsupervised DNN-based speech enhancement approach founded on deep priors (DPs). Here, DP signifies that DNNs are more inclined to produce clean speech signals than noise. Conventional DP-based methods typically train on a noisy speech signal using a random noise feature as input, stopping training only once a clean speech signal has been generated. However, such approaches encounter challenges in determining the optimal stopping time, suffer performance degradation due to environmental background noise, and face a trade-off between distortion of the clean speech signal and noise reduction performance. To address these challenges, we utilize two DNNs: one to generate a clean speech signal and the other to generate noise. The combined output of these networks closely approximates the noisy speech signal, with a loss term based on spectral kurtosis used to separate the noisy speech signal into a clean speech signal and noise. The key advantage of this method lies in its ability to circumvent the trade-off and the early stopping problem, since the signal is fully decomposed given enough steps. Through evaluation experiments, we demonstrate that the proposed method outperforms conventional methods on white Gaussian and environmental noise while effectively mitigating the early stopping problem.
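
A toy version of the decomposition objective, with free tensors standing in for the two DNN outputs; the kurtosis weighting, STFT settings, and sign conventions are assumptions meant only to show how the reconstruction and spectral-kurtosis terms can combine.

```python
# Hedged sketch: speech + noise must reconstruct the noisy input, and a
# kurtosis term pushes spiky (speech-like) spectra into the speech branch.
import torch

def spectral_kurtosis(x, n_fft=256):
    spec = torch.stft(x, n_fft=n_fft, window=torch.hann_window(n_fft),
                      return_complex=True).abs()
    mu, sigma = spec.mean(), spec.std()
    return ((spec - mu) ** 4).mean() / (sigma ** 4 + 1e-8)

noisy = torch.randn(16000)                       # placeholder noisy speech
speech = torch.randn(16000, requires_grad=True)  # stands in for DNN #1 output
noise = torch.randn(16000, requires_grad=True)   # stands in for DNN #2 output

recon = ((speech + noise - noisy) ** 2).mean()
# Encourage high spectral kurtosis in the speech estimate, low in the noise.
loss = recon - 0.01 * spectral_kurtosis(speech) + 0.01 * spectral_kurtosis(noise)
loss.backward()
```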

We present a sequential version of the multilinear Nystr\"om algorithm that is suitable for the low-rank Tucker approximation of tensors given in a streaming format. Accessing the tensor $\mathcal{A}$ exclusively through random sketches of the original data, the algorithm effectively leverages structure in $\mathcal{A}$, such as low-rankness and linear combinations. We present a deterministic analysis of the algorithm and demonstrate its superior speed and efficiency in numerical experiments, including an application in video processing.
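
The key streaming property is that random sketches are linear in $\mathcal{A}$, so they can be accumulated chunk by chunk without ever storing the full tensor. The sketch below shows this for mode-wise sketches of a third-order tensor; the sketch sizes are illustrative and the actual Nyström-type recovery of the Tucker core is omitted.

```python
# Hedged sketch of streaming mode-wise sketching for Tucker approximation.
import numpy as np

rng = np.random.default_rng(0)
shape, rank = (60, 60, 60), 5
Omegas = [rng.normal(size=(np.prod(shape) // n, rank + 5)) for n in shape]
sketches = [np.zeros((n, rank + 5)) for n in shape]

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Stream the tensor as a sum of chunks (here: random rank-one updates);
# linearity lets each chunk update the sketches independently.
for _ in range(10):
    u, v, w = (rng.normal(size=n) for n in shape)
    chunk = np.einsum("i,j,k->ijk", u, v, w)
    for mode in range(3):
        sketches[mode] += unfold(chunk, mode) @ Omegas[mode]

# Orthonormal factor estimates from the accumulated mode sketches.
factors = [np.linalg.qr(S)[0] for S in sketches]
```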

Degradation of image quality due to the presence of haze is a very common phenomenon. Existing methods such as DehazeNet [3] and MSCNN [11] tackled the drawbacks of hand-crafted haze-relevant features. However, these methods suffer from color distortion in gloomy (poorly illuminated) environments. In this paper, a cardinal (red, green, and blue) color fusion network for single image haze removal is proposed. In the first stage, the network fuses the color information present in hazy images and generates multi-channel depth maps. The second stage estimates the scene transmission map from the generated dark channels using a multi-channel multi-scale convolutional neural network (McMs-CNN) to recover the original scene. To train the proposed network, we used two standard datasets, namely ImageNet [5] and D-HAZY [1]. Performance evaluation of the proposed approach has been carried out using the structural similarity index (SSIM), mean square error (MSE), and peak signal-to-noise ratio (PSNR). The analysis shows that the proposed approach outperforms existing state-of-the-art methods for single image dehazing.
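
Transmission-based dehazing methods like this one end with the standard scene-recovery step from the atmospheric scattering model $I = J\,t + A\,(1 - t)$. The sketch below shows that final step with placeholder transmission and atmospheric-light estimates standing in for the McMs-CNN outputs.

```python
# Hedged sketch of scene recovery from the atmospheric scattering model.
# Transmission t and atmospheric light A are placeholders, not the
# proposed network's estimates.
import numpy as np

rng = np.random.default_rng(0)
I = rng.random((64, 64, 3))                       # hazy image (placeholder)
t = np.clip(rng.random((64, 64, 1)), 0.1, 1.0)    # transmission (placeholder)
A = np.array([0.9, 0.9, 0.9])                     # atmospheric light (placeholder)

J = (I - A) / np.maximum(t, 0.1) + A              # recovered scene radiance
J = np.clip(J, 0.0, 1.0)                          # keep values in valid range
```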
