Controllable neural audio synthesis of sound effects is a challenging task due to the potential scarcity and spectro-temporal variance of the data. Differentiable digital signal processing (DDSP) synthesisers have been successfully employed to model and control musical and harmonic signals using relatively limited data and computational resources. Here we propose NoiseBandNet, an architecture capable of synthesising and controlling sound effects by filtering white noise through a filterbank, thus going further than previous systems that make assumptions about the harmonic nature of sounds. We evaluate our approach via a series of experiments, modelling footsteps, thunderstorm, pottery, knocking, and metal sound effects. When comparing audio reconstruction capabilities against four variants of the DDSP filtered-noise synthesiser, NoiseBandNet scores higher in nine out of ten evaluation categories, establishing a flexible DDSP method for generating time-varying, inharmonic sound effects of arbitrary length with good resolution in both time and frequency. Finally, we illustrate some potential creative uses of NoiseBandNet, by generating variations, performing loudness transfer, and by training it on user-defined control curves.
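To make the filterbank idea concrete, here is a minimal sketch in Python, assuming an FIR filterbank with log-spaced bands and hand-made amplitude envelopes standing in for the network-predicted control signals; it illustrates the synthesis principle, not the NoiseBandNet implementation.

```python
# Minimal filterbank noise-synthesis sketch (illustrative, not the
# NoiseBandNet implementation): white noise is split into frequency
# bands, and each band is scaled by a time-varying amplitude envelope.
import numpy as np
from scipy.signal import firwin, lfilter

sr = 44100                      # sample rate (assumed)
n_samples = sr * 2              # two seconds of audio
n_bands = 16                    # number of filterbank bands (assumed)

noise = np.random.randn(n_samples)

# Log-spaced band edges between 20 Hz and just below Nyquist.
edges = np.geomspace(20.0, sr / 2 - 1, n_bands + 1)

# Time-varying per-band amplitudes; in a NoiseBandNet-style system these
# would be predicted by a neural network, here they are arbitrary curves.
t = np.linspace(0, 1, n_samples)
amps = np.abs(np.sin(2 * np.pi * np.outer(np.arange(1, n_bands + 1), t)))

out = np.zeros(n_samples)
for b in range(n_bands):
    taps = firwin(513, [edges[b], edges[b + 1]], pass_zero=False, fs=sr)
    band = lfilter(taps, 1.0, noise)     # bandpass-filtered noise
    out += amps[b] * band                # apply time-varying gain

out /= np.max(np.abs(out))               # normalize to [-1, 1]
```

In a full system along these lines, a neural network would predict the per-band amplitude envelopes from control inputs such as loudness curves.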
Magnetic recording devices are still competitive in the storage density race thanks to new technologies such as two-dimensional magnetic recording (TDMR). Error-prone patterns in which a bit is surrounded by complementary bits at all four positions at Manhattan distance $1$ on the TDMR grid are called plus isolation (PIS) patterns. Recently, we introduced optimal plus LOCO (OP-LOCO) codes that prevent these patterns from being written. However, as the device ages, error-prone patterns in which a bit is surrounded by complementary bits at only three positions at Manhattan distance $1$ emerge, and we call these incomplete PIS (IPIS) patterns. In this paper, we present capacity-achieving codes that forbid both PIS and IPIS patterns in TDMR systems with wide read heads. We collectively call these patterns rotated T isolation (RTIS) patterns, and we call the new codes optimal T LOCO (OT-LOCO) codes. We analyze OT-LOCO codes and derive their encoding-decoding rule. Simulation results demonstrate that OT-LOCO codes entirely eliminate media noise at practical TDMR densities. We suggest using OP-LOCO codes early in the device lifetime, then reconfiguring to OT-LOCO codes later on. Moreover, we introduce another coding scheme, based on track separation, that removes RTIS patterns with lower complexity and lower error propagation.
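To make the pattern definitions concrete, the sketch below scans a binary grid for PIS and IPIS centres; the function is a hypothetical illustration of the patterns themselves, not the OT-LOCO encoding-decoding rule.

```python
# Illustrative detector for PIS/IPIS (RTIS) patterns on a binary grid.
# A PIS pattern: a bit whose four Manhattan-distance-1 neighbours are
# all complementary. An IPIS pattern: exactly three complementary
# neighbours. This sketches the pattern definition only, not OT-LOCO.
import numpy as np

def rtis_patterns(grid):
    """Return lists of (row, col) centres of PIS and IPIS patterns."""
    rows, cols = grid.shape
    pis, ipis = [], []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = grid[r, c]
            neighbours = [grid[r - 1, c], grid[r + 1, c],
                          grid[r, c - 1], grid[r, c + 1]]
            complementary = sum(n != centre for n in neighbours)
            if complementary == 4:
                pis.append((r, c))
            elif complementary == 3:
                ipis.append((r, c))
    return pis, ipis

grid = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
print(rtis_patterns(grid))   # centre bit at (1, 1) is a PIS pattern
```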
The problem of phase retrieval (PR) involves recovering an unknown image from limited amplitude measurement data and is a challenging nonlinear inverse problem in computational imaging and image processing. However, many PR methods are based either on black-box network models that lack interpretability or on plug-and-play (PnP) frameworks that are computationally complex and require careful parameter tuning. To address this, we have developed PRISTA-Net, a deep unfolding network (DUN) based on the first-order iterative shrinkage-thresholding algorithm (ISTA). This network uses a learnable nonlinear transformation to address the proximal-point mapping sub-problem associated with the sparse priors, and an attention mechanism to focus on phase information containing image edges, textures, and structures. Additionally, the fast Fourier transform (FFT) is used to learn global features to enhance local information, and the designed logarithm-based loss function leads to significant improvements when the noise level is low. All parameters in the proposed PRISTA-Net framework, including the nonlinear transformation, threshold parameters, and step size, are learned end-to-end instead of being manually set. This method combines the interpretability of traditional methods with the fast inference ability of deep learning, and it handles noise at each iteration during the unfolding stage, thus improving recovery quality. Experiments on coded diffraction pattern (CDP) measurements demonstrate that our approach outperforms existing state-of-the-art methods in both qualitative and quantitative evaluations. Our source code is available at \emph{//github.com/liuaxou/PRISTA-Net}.
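For readers unfamiliar with the underlying iteration, here is a plain (non-learned) ISTA sketch for amplitude-only measurements, with a single-mask masked-FFT forward model and a fixed step size as illustrative assumptions; PRISTA-Net unfolds iterations of this general form and learns the transformation, thresholds, and step sizes instead.

```python
# Plain ISTA for phase retrieval: a gradient step on the amplitude
# data-fidelity term followed by soft thresholding. Forward model and
# constants below are illustrative assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
n = 64
x_true = rng.random(n)                      # unknown signal
mask = np.exp(2j * np.pi * rng.random(n))   # one CDP-style random mask

A = lambda x: np.fft.fft(mask * x) / np.sqrt(n)        # forward operator
AH = lambda z: np.conj(mask) * np.fft.ifft(z) * np.sqrt(n)  # its adjoint

y = np.abs(A(x_true))                       # amplitude-only measurements

def soft(v, tau):                           # soft-thresholding operator
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

x = rng.random(n)                           # random initialization
step, tau = 0.9, 1e-3
for _ in range(500):
    Ax = A(x)
    grad = AH((np.abs(Ax) - y) * np.exp(1j * np.angle(Ax)))
    x = soft((x - step * grad).real, tau)   # gradient step + shrinkage

print("amplitude residual:", np.linalg.norm(np.abs(A(x)) - y))
```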
Flow-based propagation and spatiotemporal Transformers are two mainstream mechanisms in video inpainting (VI). Despite their effectiveness, these components still suffer from limitations that affect their performance. Previous propagation-based approaches operate separately in either the image or the feature domain. Global image propagation isolated from learning may cause spatial misalignment due to inaccurate optical flow. Moreover, memory and computational constraints limit the temporal range of feature propagation and video Transformers, preventing exploration of correspondence information from distant frames. To address these issues, we propose an improved framework, called ProPainter, which involves enhanced ProPagation and an efficient Transformer. Specifically, we introduce dual-domain propagation that combines the advantages of image and feature warping, exploiting global correspondences reliably. We also propose a mask-guided sparse video Transformer, which achieves high efficiency by discarding unnecessary and redundant tokens. With these components, ProPainter outperforms prior art by a large margin of 1.46 dB in PSNR while maintaining appealing efficiency.
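As a hedged illustration of one building block, the sketch below performs flow-based backward warping of a reference frame with PyTorch's grid_sample; it shows image-domain propagation in isolation, not ProPainter's dual-domain scheme.

```python
# Minimal flow-based warping sketch: backward-warp a reference frame
# into the current frame using per-pixel optical flow.
import torch
import torch.nn.functional as F

def flow_warp(ref, flow):
    """ref: (B, C, H, W) reference frame; flow: (B, 2, H, W) in pixels."""
    b, _, h, w = ref.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)  # (1,2,H,W)
    coords = base + flow                                      # sample positions
    # Normalize to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)          # (B,H,W,2)
    return F.grid_sample(ref, grid, align_corners=True)

ref = torch.rand(1, 3, 64, 64)
flow = torch.zeros(1, 2, 64, 64)          # zero flow: identity warp
assert torch.allclose(flow_warp(ref, flow), ref, atol=1e-5)
```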
Neural fields have achieved impressive advancements in view synthesis and scene reconstruction. However, editing these neural fields remains challenging due to the implicit encoding of geometry and texture information. In this paper, we propose DreamEditor, a novel framework that enables users to perform controlled editing of neural fields using text prompts. By representing scenes as mesh-based neural fields, DreamEditor allows localized editing within specific regions. DreamEditor utilizes the text encoder of a pretrained text-to-image diffusion model to automatically identify the regions to be edited based on the semantics of the text prompts. Subsequently, DreamEditor optimizes the editing region and aligns its geometry and texture with the text prompts through score distillation sampling [29]. Extensive experiments demonstrate that DreamEditor can accurately edit neural fields of real-world scenes according to the given text prompts while ensuring consistency in irrelevant areas. DreamEditor generates highly realistic textures and geometry, significantly surpassing previous works in both quantitative and qualitative evaluations.
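As a rough sketch of score distillation sampling under stated assumptions (a stand-in noise predictor and noise schedule in place of the pretrained diffusion model), the snippet below shows how an SDS gradient is typically formed and injected into the optimized image.

```python
# Hedged SDS sketch: noise the rendered image, ask the diffusion model
# to predict the noise, and use the residual as a gradient signal.
# `unet`, the schedule, and the weighting are illustrative stand-ins.
import torch

def sds_grad(unet, rendered, text_emb, alphas_cumprod):
    """One SDS gradient estimate for a rendered image (B, C, H, W)."""
    b = rendered.shape[0]
    t = torch.randint(50, 950, (b,))                 # random timestep
    a = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(rendered)
    noisy = a.sqrt() * rendered + (1 - a).sqrt() * noise  # forward diffusion
    pred = unet(noisy, t, text_emb)                  # predicted noise
    w = 1 - a                                        # common weighting choice
    return w * (pred - noise)                        # no backprop through unet

# Toy stand-ins so the sketch runs end-to-end.
unet = lambda x, t, emb: torch.zeros_like(x)         # dummy noise predictor
alphas_cumprod = torch.linspace(0.999, 0.01, 1000)
img = torch.rand(1, 3, 64, 64, requires_grad=True)   # "rendered" image
g = sds_grad(unet, img, None, alphas_cumprod)
loss = (g.detach() * img).sum()
loss.backward()        # img.grad now holds the SDS gradient estimate
```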
As deeper and more complex models are developed for the task of sound event localization and detection (SELD), the demand for annotated spatial audio data continues to increase. Annotating field recordings with 360$^{\circ}$ video takes many hours from trained annotators, while recording events in motion-tracked laboratories is bounded by cost and expertise. Because of this, localization models rely on a relatively limited amount of spatial audio data in the form of spatial room impulse response (SRIR) datasets, which limits the progress of increasingly deep neural-network-based approaches. In this work, we demonstrate that simulated geometrical acoustics can provide an appealing solution to this problem. We use simulated geometrical acoustics to generate a novel SRIR dataset that can train a SELD model to performance similar to that achieved with a real SRIR dataset. Furthermore, we demonstrate that simulated data can augment existing datasets, improving on benchmarks set by state-of-the-art SELD models. We explore the potential and limitations of geometric acoustic simulation for localization and event detection. We also propose further studies to verify the limitations of this method, as well as further methods to generate synthetic data for SELD tasks without the need to record more data.
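As an illustration of the simulation approach, the sketch below generates a room impulse response with the image-source method via the pyroomacoustics library; the room geometry, absorption, and positions are arbitrary assumptions, and a real SELD pipeline would use a spatial microphone array rather than a single capsule.

```python
# Simulate a room impulse response with the image-source method (a
# geometrical acoustics technique). All scene parameters are assumed.
import numpy as np
import pyroomacoustics as pra

fs = 16000
room = pra.ShoeBox([6.0, 5.0, 3.0], fs=fs,
                   materials=pra.Material(0.3),  # wall energy absorption
                   max_order=10)                 # image-source reflections
room.add_source([2.0, 3.0, 1.5])
mics = np.array([[3.5, 2.0, 1.2]]).T             # one microphone, shape (3, M)
room.add_microphone_array(pra.MicrophoneArray(mics, fs))
room.compute_rir()
rir = room.rir[0][0]                             # mic 0, source 0
print(len(rir) / fs, "seconds of impulse response")
```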
Due to the increasing availability of cheap off-the-shelf radio hardware, spoofing and replay attacks on satellite ground systems have become more accessible than ever. This is particularly a problem for legacy systems, many of which do not offer cryptographic security and cannot be patched to support novel security measures. In this paper we explore radio transmitter fingerprinting in satellite systems. We introduce the SatIQ system, proposing novel techniques for authenticating transmissions using characteristics of the transmitter hardware expressed as impairments on the downlinked signal. We look in particular at high-sample-rate fingerprinting, which makes fingerprints difficult to forge without similarly high-sample-rate transmitting hardware, thus raising the budget for attacks. We also examine the difficulty of this approach under high levels of atmospheric noise and multipath scattering, and analyze potential solutions to this problem. We focus on the Iridium satellite constellation, for which we collected 1,705,202 messages at a sample rate of 25 MS/s. We use this data to train a fingerprinting model consisting of an autoencoder combined with a Siamese neural network, enabling the model to learn an efficient encoding of message headers that preserves identifying information. We demonstrate the system's robustness under attack by replaying messages with a software-defined radio, achieving an equal error rate of 0.120 and an ROC AUC of 0.946. Finally, we analyze its stability over time by introducing a time gap between training and testing data, and its extensibility by introducing new transmitters that have not been seen before. We conclude that our techniques are useful for building systems that are stable over time, can be used immediately with new transmitters without retraining, and provide robustness against spoofing and replay by raising the required budget for attacks.
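To illustrate the Siamese fingerprinting idea, here is a minimal PyTorch sketch with an assumed 1-D convolutional encoder over I/Q samples and a contrastive loss; the actual SatIQ architecture (including its autoencoder branch) differs.

```python
# Minimal Siamese embedding sketch: pairs of signal snippets from the
# same transmitter should embed close together, different transmitters
# far apart. Architecture and loss are assumptions, not SatIQ.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, 32, 9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(32, 64, 9, stride=2, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, dim))
    def forward(self, x):            # x: (B, 2, N) I/Q samples
        return self.net(x)

def contrastive_loss(za, zb, same, margin=1.0):
    """same=1: embeddings of the same transmitter should be close."""
    d = torch.norm(za - zb, dim=1)
    return (same * d**2 + (1 - same) * torch.clamp(margin - d, min=0)**2).mean()

enc = Encoder()
a, b = torch.randn(8, 2, 1024), torch.randn(8, 2, 1024)
same = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(enc(a), enc(b), same)
loss.backward()
```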
Partially spoofed audio detection is a challenging task that requires accurately locating the authenticity of audio at the frame level. To address this issue, we propose a fine-grained partially spoofed audio detection method, namely Temporal Deepfake Location (TDL), which effectively captures both feature and positional information. Specifically, our approach involves two novel parts: an embedding similarity module and a temporal convolution operation. To better discriminate between real and fake features, the embedding similarity module is designed to generate an embedding space that separates real frames from fake frames. To concentrate on positional information, the temporal convolution operation calculates frame-specific similarities among neighboring frames and dynamically selects informative neighbors for convolution. Extensive experiments show that our method outperforms baseline models on the ASVspoof2019 Partial Spoof dataset and demonstrates superior performance even in the cross-dataset scenario. The code is released online.
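A hedged sketch of the frame-similarity intuition: compute cosine similarities between each frame embedding and its neighbours, so boundaries between real and fake segments tend to appear as similarity drops. The embeddings and window size are stand-ins, not the TDL modules.

```python
# Cosine similarity of each frame embedding to its neighbours;
# illustrative only, not the TDL embedding similarity module.
import torch
import torch.nn.functional as F

def neighbour_similarity(frames, k=2):
    """frames: (T, D) per-frame embeddings -> (T,) mean cosine
    similarity to the k frames on each side."""
    T = frames.shape[0]
    normed = F.normalize(frames, dim=1)
    sims = torch.zeros(T)
    for t in range(T):
        lo, hi = max(0, t - k), min(T, t + k + 1)
        neigh = torch.cat([normed[lo:t], normed[t + 1:hi]])
        sims[t] = (normed[t] @ neigh.T).mean()
    return sims

emb = torch.randn(100, 32)               # stand-in frame embeddings
emb[40:60] += 5.0                        # a "fake" segment with shifted stats
print(neighbour_similarity(emb)[38:43])  # similarity dips near frame 40
```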
Large numbers of radiographic images available in knee radiology practices could be used to train deep learning models for diagnosing knee abnormalities. However, these images typically lack readily available labels due to the limitations of human annotation. The purpose of our study was to develop an automated labeling approach that improves an image classification model's ability to distinguish normal knee images from those with abnormalities or prior arthroplasty. The automated labeler was trained on a small set of labeled data to automatically label a much larger set of unlabeled data, further improving image classification performance for knee radiographic diagnosis. We developed our approach using 7,382 patients and validated it on a separate set of 637 patients. The final image classification model, trained using both manually labeled and pseudo-labeled data, achieved a higher weighted average AUC (WAUC: 0.903) and higher per-class AUC-ROC values (normal: 0.894; abnormal: 0.896; arthroplasty: 0.990) than the baseline model trained using only manually labeled data (WAUC: 0.857; normal AUC-ROC: 0.842; abnormal AUC-ROC: 0.848; arthroplasty AUC-ROC: 0.987). DeLong tests show that the improvement is significant for normal (p-value < 0.002) and abnormal (p-value < 0.001) images. Our findings demonstrate that the proposed automated labeling approach significantly improves image classification performance for radiographic knee diagnosis, facilitating patient care and the curation of large knee datasets.
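The general pseudo-labeling recipe can be sketched as follows, using toy data and an assumed confidence threshold; this illustrates the overall idea rather than the paper's specific labeler or radiographic models.

```python
# Generic pseudo-labeling sketch on toy data: train on the small
# labeled set, label the unlabeled pool, keep confident predictions,
# then retrain on the union.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab = rng.normal(size=(200, 16))
y_lab = (X_lab[:, 0] > 0).astype(int)        # toy "normal vs abnormal"
X_unlab = rng.normal(size=(2000, 16))        # unlabeled pool

labeler = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
proba = labeler.predict_proba(X_unlab)
confident = proba.max(axis=1) > 0.9          # confidence threshold (assumed)
pseudo_y = proba.argmax(axis=1)

X_all = np.vstack([X_lab, X_unlab[confident]])
y_all = np.concatenate([y_lab, pseudo_y[confident]])
final = LogisticRegression(max_iter=1000).fit(X_all, y_all)
print(f"added {confident.sum()} pseudo-labeled examples")
```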
Visual dialogue is a challenging task that requires extracting implicit information from both visual (image) and textual (dialogue history) contexts. Classical approaches pay more attention to the integration of the current question, vision knowledge, and text knowledge, overlooking the heterogeneous semantic gaps between the cross-modal information. Meanwhile, concatenation has become the de facto standard for cross-modal information fusion, despite its limited ability to retrieve relevant information. In this paper, we propose a novel Knowledge-Bridge Graph Network (KBGN) model that uses a graph to bridge the cross-modal semantic relations between vision and text knowledge at fine granularity, and retrieves the required knowledge via an adaptive information selection mode. Moreover, the reasoning clues for visual dialogue can be clearly drawn from intra-modal entities and inter-modal bridges. Experimental results on the VisDial v1.0 and VisDial-Q datasets demonstrate that our model outperforms existing models with state-of-the-art results.
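As a toy illustration of bridging two modalities, the sketch below performs one cross-modal attention step in which text nodes gather information from vision nodes; the dimensions and module are hypothetical, not the KBGN architecture.

```python
# One cross-modal message-passing step: each text node attends over
# vision nodes and receives a residual update. Illustrative only.
import torch
import torch.nn as nn

class CrossModalBridge(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, text_nodes, vision_nodes):
        """text_nodes: (B, T, D); vision_nodes: (B, V, D)."""
        attn = torch.softmax(
            self.q(text_nodes) @ self.k(vision_nodes).transpose(1, 2)
            / text_nodes.shape[-1] ** 0.5, dim=-1)        # (B, T, V)
        return text_nodes + attn @ self.v(vision_nodes)   # residual update

bridge = CrossModalBridge()
out = bridge(torch.randn(2, 10, 128), torch.randn(2, 36, 128))
print(out.shape)   # torch.Size([2, 10, 128])
```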
We construct targeted audio adversarial examples for automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar but transcribes as any phrase we choose (at a rate of up to 50 characters per second). We apply our iterative, optimization-based attack to Mozilla's implementation of the end-to-end DeepSpeech model and show that it has a 100% success rate. The feasibility of this attack introduces a new domain for the study of adversarial examples.
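Conceptually, the attack can be sketched as gradient descent on a perturbation that drives the recognizer's loss toward a chosen target while staying small; the tiny classifier below is a stand-in for DeepSpeech (the real attack optimizes a CTC loss over character sequences).

```python
# Conceptual iterative optimization-based targeted attack: minimize the
# model's loss for a chosen target plus a perturbation-size penalty.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16000, 64), nn.ReLU(),
                      nn.Linear(64, 10))       # stand-in recognizer
audio = torch.randn(1, 16000)                  # original waveform
target = torch.tensor([3])                     # chosen target "phrase"

delta = torch.zeros_like(audio, requires_grad=True)
opt = torch.optim.Adam([delta], lr=1e-2)
for _ in range(200):
    logits = model(audio + delta)
    loss = nn.functional.cross_entropy(logits, target) \
         + 1e-3 * delta.norm()                 # keep the perturbation small
    opt.zero_grad()
    loss.backward()
    opt.step()

print(model(audio + delta).argmax().item())    # should now predict the target
```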