Neural networks and deep learning are often deployed to make music generation as comprehensive as possible, with as little involvement as possible from the human musician. Implementations that aid, or serve as tools for, music practitioners are sparse. This paper proposes the integration of generative stacked autoencoder structures for rhythm generation within a conventional melodic step-sequencer. It further aims to make this implementation accessible to the average electronic music practitioner. Several model architectures have been trained and tested for their creative potential. While the current implementations display limitations, they represent viable creative solutions for music practitioners.
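To make the idea concrete, the sketch below shows a minimal stacked autoencoder over binary drum-step patterns; the 16-step, 4-voice pattern shape, layer widths, and latent-sampling generation scheme are illustrative assumptions rather than the paper's exact architecture.

```python
# Minimal sketch of a stacked autoencoder for 16-step binary drum
# patterns. Layer sizes and the latent-sampling scheme are illustrative
# assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class StackedAE(nn.Module):
    def __init__(self, steps=16, voices=4, hidden=(64, 16)):
        super().__init__()
        d = steps * voices
        self.encoder = nn.Sequential(
            nn.Linear(d, hidden[0]), nn.ReLU(),
            nn.Linear(hidden[0], hidden[1]), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(hidden[1], hidden[0]), nn.ReLU(),
            nn.Linear(hidden[0], d), nn.Sigmoid(),  # per-step hit probabilities
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = StackedAE()
# Generate: sample a latent code and threshold the decoded probabilities
# to obtain a new 4-voice, 16-step rhythm pattern.
with torch.no_grad():
    z = torch.randn(1, 16)
    pattern = (model.decoder(z) > 0.5).float().view(4, 16)
```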
Multilingual automatic lyrics transcription (ALT) is a challenging task due to the limited availability of labelled data and the difficulties that singing introduces compared to multilingual automatic speech recognition. Although some multilingual singing datasets have been released recently, English continues to dominate these collections. Multilingual ALT remains underexplored due to limited data scale and annotation quality. In this paper, we aim to create a multilingual ALT system with available datasets. Inspired by architectures that have proven effective for English ALT, we adapt these techniques to the multilingual scenario by expanding the target vocabulary set. We then evaluate the performance of the multilingual model in comparison to its monolingual counterparts. Additionally, we explore various conditioning methods to incorporate language information into the model. We analyse performance by language and relate it to the model's language classification performance. Our findings reveal that the multilingual model performs consistently better than monolingual models trained on the language subsets. Furthermore, we demonstrate that incorporating language information significantly enhances performance.
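One simple conditioning method of the kind explored here is prepending a language token to the target sequence over an expanded multilingual vocabulary; the sketch below illustrates this, with the character set and tag names as assumptions.

```python
# Hedged sketch of one conditioning strategy: prepend a language token
# to the target character sequence of a multilingual ALT model. The
# merged character set and tag names are illustrative assumptions.
LANG_TAGS = ["<en>", "<fr>", "<de>", "<es>"]
chars = sorted(set("abcdefghijklmnopqrstuvwxyz 'äöüéèñ"))  # merged charset
vocab = {tok: i for i, tok in enumerate(["<blank>"] + LANG_TAGS + chars)}

def encode_target(lyrics: str, lang: str) -> list[int]:
    """Map a lyric line to token ids, conditioned on its language tag."""
    return [vocab[f"<{lang}>"]] + [vocab[c] for c in lyrics.lower() if c in vocab]

print(encode_target("Hello world", "en")[:5])
```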
Recent years have seen increasing interest in applying deep learning methods to the modeling of guitar amplifiers and effect pedals. Existing methods are mainly based on the supervised approach, requiring temporally-aligned data pairs of unprocessed and rendered audio. However, this approach does not scale well, due to the complicated process involved in creating the data pairs. Recent work by Wright et al. explored the potential of leveraging unpaired data for training, using a generative adversarial network (GAN)-based framework. This paper extends their work by using more advanced discriminators in the GAN and by using more unpaired data for training. Specifically, drawing inspiration from recent advancements in neural vocoders, we employ two sets of discriminators in our GAN-based guitar amplifier model: a multi-scale discriminator (MSD) and a multi-period discriminator (MPD). Moreover, we experiment with adding to the training data unprocessed audio signals that lack corresponding rendered audio of a target tone, to see how much the GAN model benefits from the unpaired data. Our experiments show that both proposed extensions contribute to the modeling of low-gain and high-gain guitar amplifiers.
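For readers unfamiliar with the MPD, the sketch below shows one period-p sub-discriminator in the style of HiFi-GAN, which folds the waveform into a 2D tensor so convolutions can attend to periodic structure; channel widths and kernel sizes are illustrative, not the configuration used in the paper.

```python
# Minimal sketch of one period-p sub-discriminator in the style of
# HiFi-GAN's multi-period discriminator (MPD); channel widths and kernel
# sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PeriodDiscriminator(nn.Module):
    def __init__(self, period: int):
        super().__init__()
        self.period = period
        self.convs = nn.ModuleList([
            nn.Conv2d(1, 32, (5, 1), stride=(3, 1), padding=(2, 0)),
            nn.Conv2d(32, 128, (5, 1), stride=(3, 1), padding=(2, 0)),
            nn.Conv2d(128, 256, (5, 1), stride=(3, 1), padding=(2, 0)),
        ])
        self.post = nn.Conv2d(256, 1, (3, 1), padding=(1, 0))

    def forward(self, x):  # x: (batch, 1, samples)
        b, c, t = x.shape
        if t % self.period:  # pad so the length is divisible by the period
            x = F.pad(x, (0, self.period - t % self.period), "reflect")
        x = x.view(b, c, -1, self.period)  # fold waveform into 2D by period
        for conv in self.convs:
            x = F.leaky_relu(conv(x), 0.1)
        return self.post(x)

score = PeriodDiscriminator(period=3)(torch.randn(2, 1, 16000))
```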
In recent years, generated content in music has gained significant popularity, with large language models being effectively utilized to produce human-like lyrics in various styles, themes, and linguistic structures. This technological advancement supports artists in their creative processes but also raises issues of authorship infringement, consumer satisfaction, and content spamming. To address these challenges, methods for detecting generated lyrics are necessary. However, existing machine-generated content detection methods and datasets have not yet focused on this modality, or on creative text in general. In response, we have curated the first dataset of high-quality synthetic lyrics and conducted a comprehensive quantitative evaluation of various few-shot content detection approaches, testing their generalization capabilities and complementing this with a human evaluation. Our best few-shot detector, based on LLM2Vec, surpasses stylistic and statistical methods, which have been shown to be competitive in other domains at distinguishing human-written from machine-generated content. It also generalizes well to new artists and models, and effectively detects post-generation paraphrasing. This study emphasizes the need for further research on creative content detection, particularly in terms of generalization and scalability with larger song catalogs. All datasets, pre-processing scripts, and code are available publicly on GitHub and Hugging Face under the Apache 2.0 license.
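As a sketch of the few-shot setup, a light classifier can be fit on a handful of labeled lyric embeddings (e.g., precomputed LLM2Vec vectors); the random arrays and label convention below are placeholders for illustration.

```python
# Hedged sketch of few-shot detection: fit a light classifier on a
# handful of labeled embeddings to separate human-written from
# machine-generated lyrics. The embedding arrays here stand in for
# precomputed vectors (e.g., from LLM2Vec) and are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_few = rng.normal(size=(16, 768))        # 16 labeled lyric embeddings
y_few = np.array([0, 1] * 8)              # 0 = human, 1 = generated

detector = LogisticRegression(max_iter=1000).fit(X_few, y_few)
X_new = rng.normal(size=(4, 768))         # embeddings of unseen lyrics
print(detector.predict_proba(X_new)[:, 1])  # P(machine-generated)
```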
We study the performance of sequential contention resolution and matching algorithms on random graphs with vanishing edge probabilities. When the edges of the graph are processed in an adversarially-chosen order, we derive a new online contention resolution scheme (OCRS) that is $0.382$-selectable, attaining the "independence benchmark" from the literature under the vanishing edge probabilities assumption. Complementary to this positive result, we show that no OCRS can be more than $0.390$-selectable, significantly improving upon the upper bound of $0.428$ from the literature. We also derive negative results that are specialized to bipartite graphs or subfamilies of OCRSs. Meanwhile, when the edges of the graph are processed in a uniformly random order, we show that the simple greedy contention resolution scheme which accepts all active and feasible edges is $1/2$-selectable. This result is tight due to a known upper bound. Finally, when the algorithm can choose the processing order, we show that a slight tweak to the random order -- give each vertex a random priority and process edges in lexicographic order -- results in a strictly better contention resolution scheme that is $1-\ln(2-1/e)\approx0.510$-selectable. Our positive results also apply to online matching on $1$-uniform random graphs with vanishing (non-identical) edge probabilities, extending and unifying some results from the random graphs literature.
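The random-order greedy scheme is simple enough to simulate directly; the Monte-Carlo sketch below estimates per-edge selectability P(selected | active) on a toy 4-cycle with constant (non-vanishing) x_e, so the estimate is only indicative of the asymptotic $1/2$ guarantee.

```python
# Illustrative simulation of the greedy contention resolution scheme in
# random order: each edge is "active" independently with probability
# x_e, edges arrive in uniformly random order, and greedy accepts every
# active edge whose endpoints are still unmatched. The graph and x_e
# values are toy assumptions.
import random

def greedy_crs(edges, x, trials=100_000):
    selected = {e: 0 for e in edges}
    active_count = {e: 0 for e in edges}
    for _ in range(trials):
        matched, order = set(), random.sample(edges, len(edges))
        for e in order:
            if random.random() < x[e]:
                active_count[e] += 1
                u, v = e
                if u not in matched and v not in matched:
                    matched |= {u, v}
                    selected[e] += 1
    # selectability estimate: P(selected | active), per edge
    return {e: selected[e] / max(active_count[e], 1) for e in edges}

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
x = {e: 0.3 for e in edges}  # fractional matching: incident x sums to 0.6 <= 1
print(greedy_crs(edges, x))
```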
Histopathological analysis of Whole Slide Images (WSIs) has seen a surge in the utilization of deep learning methods, particularly Convolutional Neural Networks (CNNs). However, CNNs often fall short in capturing the intricate spatial dependencies inherent in WSIs. Graph Neural Networks (GNNs) present a promising alternative, adept at directly modeling pairwise interactions and effectively discerning the topological tissue and cellular structures within WSIs. Recognizing the pressing need for deep learning techniques that harness the topological structure of WSIs, the application of GNNs in histopathology has experienced rapid growth. In this comprehensive review, we survey GNNs in histopathology, discuss their applications, and explore emerging trends that pave the way for future advancements in the field. We begin by elucidating the fundamentals of GNNs and their potential applications in histopathology. Leveraging quantitative literature analysis, we identify four emerging trends: Hierarchical GNNs, Adaptive Graph Structure Learning, Multimodal GNNs, and Higher-order GNNs. Through an in-depth exploration of these trends, we offer insights into the evolving landscape of GNNs in histopathological analysis. Based on our findings, we propose future directions to propel the field forward. Our analysis serves to guide researchers and practitioners towards innovative approaches and methodologies, fostering advancements in histopathological analysis through the lens of graph neural networks.
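As background for the fundamentals discussed in the review, the sketch below shows one round of message passing over a toy cell graph built from a WSI, where each node mean-aggregates its neighbours' features before a learned update; the graph and feature sizes are assumptions.

```python
# A minimal sketch of one round of GNN message passing over a cell
# graph: nodes are cell/patch features, edges connect spatial
# neighbours, and each node averages its neighbours' features before a
# learned update. The toy chain graph and shapes are assumptions.
import torch
import torch.nn as nn

def message_pass(h, adj, linear):
    """h: (nodes, feat); adj: (nodes, nodes) 0/1 adjacency with self-loops."""
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
    return torch.relu(linear(adj @ h / deg))  # mean-aggregate, then update

h = torch.randn(5, 16)                      # 5 nuclei, 16-dim features
adj = torch.eye(5) + torch.diag(torch.ones(4), 1) + torch.diag(torch.ones(4), -1)
h = message_pass(h, adj, nn.Linear(16, 16))
```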
Analyses for biodiversity monitoring based on passive acoustic monitoring (PAM) recordings are time-consuming and challenged by the presence of background noise in the recordings. Existing models for sound event detection (SED) work only on certain avian species, and the development of further models requires labeled data. The framework developed here automatically extracts labeled data from available platforms for selected avian species. The labeled data are embedded into recordings, including environmental sounds and noise, and are used to train convolutional recurrent neural network (CRNN) models. The models were evaluated on unprocessed real-world data recorded in urban KwaZulu-Natal habitats. The Adapted SED-CRNN model reached an F1 score of 0.73, demonstrating its efficiency under noisy, real-world conditions. The proposed approach to automatically extracting labeled data for chosen avian species enables easy adaptation of PAM to other species and habitats for future conservation projects.
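The sketch below outlines a generic CRNN for frame-level SED on mel spectrograms, in the spirit of the models trained here; layer sizes are illustrative and not the Adapted SED-CRNN configuration.

```python
# Hedged sketch of a CRNN for frame-level sound event detection on mel
# spectrograms: convolutional blocks capture local spectro-temporal
# patterns, a GRU adds temporal context, and per-frame sigmoid outputs
# give event probabilities. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_mels=64, n_classes=1):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.gru = nn.GRU(64 * (n_mels // 4), 64, batch_first=True,
                          bidirectional=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):            # x: (batch, 1, mels, frames)
        z = self.conv(x)             # (batch, 64, mels/4, frames)
        z = z.permute(0, 3, 1, 2).flatten(2)  # (batch, frames, feat)
        z, _ = self.gru(z)
        return torch.sigmoid(self.head(z))    # per-frame event probability

probs = CRNN()(torch.randn(2, 1, 64, 100))    # -> (2, 100, 1)
```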
Seams are areas of overlapping fabric formed by stitching two or more pieces of fabric together in the cut-and-sew apparel manufacturing process. In SeamPose, we repurposed seams as capacitive sensors in a shirt for continuous upper-body pose estimation. Compared to previous all-textile motion-capture garments that place electrodes on the surface of clothing, our solution leverages existing seams inside a shirt by machine-sewing insulated conductive threads over them. The invisibility and placement of the seams allow the sensing shirt to look and wear like a conventional shirt while providing exciting pose-tracking capabilities. To validate this approach, we implemented a proof-of-concept untethered shirt. With eight capacitive sensing seams, our customized deep-learning pipeline accurately estimates the upper-body 3D joint positions relative to the pelvis. In a 12-participant user study, we demonstrated promising cross-user and cross-session tracking performance. SeamPose represents a step towards unobtrusive integration of smart clothing for everyday pose estimation.
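A minimal sketch of the kind of regression model in such a pipeline maps a short window of the eight capacitive seam readings to pelvis-relative 3D joint positions; the window length, joint count, and layer sizes below are assumptions, not SeamPose's exact model.

```python
# Hedged sketch: regress a short window of 8 capacitive seam readings
# to upper-body 3D joint positions relative to the pelvis. Window
# length, joint count, and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

N_SEAMS, WINDOW, N_JOINTS = 8, 50, 6   # e.g., wrists, elbows, shoulders

model = nn.Sequential(
    nn.Flatten(),                      # (batch, 8 * 50)
    nn.Linear(N_SEAMS * WINDOW, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, N_JOINTS * 3),      # xyz per joint, pelvis-relative
)

window = torch.randn(1, N_SEAMS, WINDOW)   # one window of capacitance values
joints = model(window).view(1, N_JOINTS, 3)
```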
The rapid development of deep learning has driven great progress in segmentation, one of the fundamental tasks of computer vision. However, current segmentation algorithms mostly rely on the availability of pixel-level annotations, which are often expensive, tedious, and laborious to obtain. To alleviate this burden, recent years have witnessed increasing attention to building label-efficient, deep-learning-based segmentation algorithms. This paper offers a comprehensive review of label-efficient segmentation methods. To this end, we first develop a taxonomy that organizes these methods according to the supervision provided by different types of weak labels (including no supervision, coarse supervision, incomplete supervision, and noisy supervision), supplemented by the type of segmentation problem (including semantic segmentation, instance segmentation, and panoptic segmentation). Next, we summarize the existing label-efficient segmentation methods from a unified perspective that discusses an important question: how to bridge the gap between weak supervision and dense prediction -- current methods are mostly based on heuristic priors, such as cross-pixel similarity, cross-label constraint, cross-view consistency, and cross-image relation. Finally, we share our opinions on future research directions for label-efficient deep segmentation.
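To illustrate one of these priors, the sketch below implements a simple cross-view consistency loss: the same unlabeled image under two perturbations should yield consistent pixel-wise predictions; the stand-in model, perturbation, and loss choice are assumptions.

```python
# Hedged sketch of the cross-view consistency prior: two perturbed
# views of the same unlabeled image should produce consistent
# pixel-wise predictions, giving a training signal without dense
# labels. The stand-in model, noise perturbation, and KL loss are
# illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Conv2d(3, 21, 1)            # stand-in for a segmentation network
x = torch.randn(2, 3, 64, 64)          # unlabeled images

view_a = x + 0.05 * torch.randn_like(x)         # two photometric "views"
view_b = x + 0.05 * torch.randn_like(x)

p_a = F.log_softmax(model(view_a), dim=1)
p_b = F.softmax(model(view_b), dim=1).detach()  # stop-gradient target
consistency_loss = F.kl_div(p_a, p_b, reduction="batchmean")
```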
Pre-trained deep neural network language models such as ELMo, GPT, BERT and XLNet have recently achieved state-of-the-art performance on a variety of language understanding tasks. However, their size makes them impractical for a number of scenarios, especially on mobile and edge devices. In particular, the input word embedding matrix accounts for a significant proportion of the model's memory footprint, due to the large input vocabulary and embedding dimensions. Knowledge distillation techniques have had success at compressing large neural network models, but they are ineffective at yielding student models with vocabularies different from those of the original teacher models. We introduce a novel knowledge distillation technique for training a student model with a significantly smaller vocabulary as well as lower embedding and hidden state dimensions. Specifically, we employ a dual-training mechanism that trains the teacher and student models simultaneously to obtain optimal word embeddings for the student vocabulary. We combine this approach with learning shared projection matrices that transfer layer-wise knowledge from the teacher model to the student model. Our method is able to compress the BERT_BASE model by more than 60x, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7MB. Experimental results also demonstrate higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques.
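A hedged sketch of layer-wise distillation through a shared projection is shown below: teacher hidden states are projected down to the student width and matched with an MSE loss; the dimensions and the single shared matrix are illustrative assumptions.

```python
# Hedged sketch of layer-wise distillation through a learned shared
# projection: teacher hidden states (e.g., 768-d) are projected down to
# the student width (e.g., 192-d) and matched with an MSE loss. The
# dimensions and the single shared matrix are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

D_TEACHER, D_STUDENT = 768, 192
proj = nn.Linear(D_TEACHER, D_STUDENT, bias=False)  # shared across layers

def layer_distill_loss(teacher_states, student_states):
    """Both args: lists of (batch, seq, dim) tensors, one per matched layer."""
    return sum(
        F.mse_loss(proj(t), s) for t, s in zip(teacher_states, student_states)
    ) / len(student_states)

t_states = [torch.randn(2, 16, D_TEACHER) for _ in range(3)]
s_states = [torch.randn(2, 16, D_STUDENT) for _ in range(3)]
loss = layer_distill_loss(t_states, s_states)
```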
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions that fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on the ImageNet classification task have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new Full Reference Image Quality Assessment (FR-IQA) dataset of perceptual human judgments, orders of magnitude larger than previous datasets. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
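In the spirit of the paper's finding, the sketch below computes a simple deep-feature perceptual distance from unit-normalized VGG activations with uniform (unlearned) layer weights; the layer choice and weighting are simplifying assumptions rather than the paper's calibrated metric.

```python
# Minimal sketch of a deep-feature perceptual distance: compare
# unit-normalized activations from several VGG layers and average the
# squared differences. The layer choice and uniform (unlearned) weights
# are simplifying assumptions; requires a recent torchvision.
import torch
import torchvision.models as models

vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
layers = {3, 8, 15, 22}  # a few ReLU outputs

def deep_feature_distance(x0, x1):
    d, h0, h1 = 0.0, x0, x1
    for i, layer in enumerate(vgg):
        h0, h1 = layer(h0), layer(h1)
        if i in layers:
            n0 = h0 / (h0.norm(dim=1, keepdim=True) + 1e-8)  # unit-normalize channels
            n1 = h1 / (h1.norm(dim=1, keepdim=True) + 1e-8)
            d = d + (n0 - n1).pow(2).mean()
    return d

with torch.no_grad():
    print(deep_feature_distance(torch.rand(1, 3, 64, 64),
                                torch.rand(1, 3, 64, 64)))
```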