
This paper presents CQT-Diff, a data-driven generative audio model that, once trained, can be used to solve a variety of audio inverse problems in a problem-agnostic setting. CQT-Diff is a neural diffusion model with an architecture that is carefully constructed to exploit pitch-equivariant symmetries in music. This is achieved by preconditioning the model with an invertible Constant-Q Transform (CQT), whose logarithmically-spaced frequency axis represents pitch equivariance as translation equivariance. The proposed method is evaluated with objective and subjective metrics in three varied tasks: audio bandwidth extension, inpainting, and declipping. The results show that CQT-Diff outperforms the compared baselines and ablations in audio bandwidth extension and, without retraining, delivers competitive performance against modern baselines in audio inpainting and declipping. This work represents the first diffusion-based general framework for solving inverse problems in audio processing.
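To make the pitch-equivariance argument concrete: on a CQT-style axis, bin k has centre frequency f_min · 2^(k/B), where B is the number of bins per octave. Multiplying a frequency by a fixed ratio (a pitch shift) therefore adds the same constant to its bin position, whatever the starting frequency, which is a translation along the axis. The minimal Python sketch below assumes f_min = 32.7 Hz and B = 12; these values are illustrative, not CQT-Diff's actual configuration.

```python
import math

# Illustrative values (assumptions, not CQT-Diff's settings).
F_MIN_HZ = 32.7        # lowest analysed frequency
BINS_PER_OCTAVE = 12   # B: one bin per semitone


def cqt_center_frequency(k: int, f_min: float = F_MIN_HZ, b: int = BINS_PER_OCTAVE) -> float:
    """Centre frequency of bin k on a logarithmically spaced CQT axis."""
    return f_min * 2.0 ** (k / b)


def fractional_bin(freq_hz: float, f_min: float = F_MIN_HZ, b: int = BINS_PER_OCTAVE) -> float:
    """Position of freq_hz on the CQT axis, measured in (possibly fractional) bins."""
    return b * math.log2(freq_hz / f_min)


def pitch_shift(freq_hz: float, semitones: float) -> float:
    """Multiply a frequency by the equal-tempered ratio for `semitones`."""
    return freq_hz * 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    for f in (110.0, 440.0, 1000.0):
        shift = fractional_bin(pitch_shift(f, 7)) - fractional_bin(f)
        print(f"{f:7.1f} Hz: a 7-semitone shift moves {shift:.3f} bins")
```

Every line reports a shift of 7.000 bins. On a linear-frequency STFT axis the same pitch shift would instead scale the bin index, so the offset would depend on the starting frequency.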

Related content

MoDELS


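MODELS, the 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, is the premier conference series for model-driven software and systems engineering and is organized with the support of ACM-SIGSOFT and IEEE-TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. Its attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community the opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, and open source, as well as sustainability. Official website:

MoDELS · Generative models · Networking · Decoding ·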

May 9, 2023

AudioSlots: A slot-centric generative model for audio separation

Pradyumna Reddy, Scott Wisdom, Klaus Greff, John R. Hershey, Thomas Kipf

From arXiv. Accepted at the Self-supervision in Audio, Speech and Beyond (SASB) Workshop at ICASSP 2023

In a range of recent works, object-centric architectures have been shown to be suitable for unsupervised scene decomposition in the vision domain. Inspired by these methods we present AudioSlots, a slot-centric generative model for blind source separation in the audio domain. AudioSlots is built using permutation-equivariant encoder and decoder networks. The encoder network based on the Transformer architecture learns to map a mixed audio spectrogram to an unordered set of independent source embeddings. The spatial broadcast decoder network learns to generate the source spectrograms from the source embeddings. We train the model in an end-to-end manner using a permutation invariant loss function. Our results on Libri2Mix speech separation constitute a proof of concept that this approach shows promise. We discuss the results and limitations of our approach in detail, and further outline potential ways to overcome the limitations and directions for future work.
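A permutation-invariant loss scores the set of estimated sources against the set of reference sources under the best matching, so the model is not penalised for emitting the sources in a different order. Below is a minimal brute-force sketch in Python with NumPy. It assumes mean squared error on spectrograms as the per-source loss, which the abstract does not specify. Practical systems often replace the enumeration of all n! orderings with Hungarian matching.

```python
import itertools

import numpy as np


def permutation_invariant_mse(estimates: np.ndarray, targets: np.ndarray):
    """Return (loss, perm) with the lowest MSE over all source orderings.

    estimates, targets: arrays of shape (n_sources, ...), e.g. spectrograms.
    perm[i] is the index of the estimate matched to target i.
    """
    if estimates.shape != targets.shape:
        raise ValueError("estimates and targets must have the same shape")
    n_sources = targets.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(n_sources)):
        # Reorder the estimates according to this candidate matching.
        loss = float(np.mean((estimates[list(perm)] - targets) ** 2))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    targets = rng.random((2, 4, 5))   # 2 sources, toy 4x5 "spectrograms"
    swapped = targets[[1, 0]]         # same sources, opposite order
    print(permutation_invariant_mse(swapped, targets))
```

Because the swapped estimates match the targets exactly under the ordering (1, 0), this prints `(0.0, (1, 0))`: the loss is zero even though the slots came out in the "wrong" order.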

Estimation/estimator · Maximum a posteriori estimation · Maximum a posteriori · MASS · Peak ·

May 8, 2023

An order-theoretic perspective on modes and maximum a posteriori estimation in Bayesian inverse problems

Hefin Lambley, T. J. Sullivan

From arXiv. 38 pages

It is often desirable to summarise a probability measure on a space $X$ in terms of a mode, or MAP estimator, i.e. a point of maximum probability. Such points can be rigorously defined using masses of metric balls in the small-radius limit. However, the theory is not entirely straightforward: the literature contains multiple notions of mode and various examples of pathological measures that have no mode in any sense. Since the masses of balls induce natural orderings on the points of $X$, this article aims to shed light on some of the problems in non-parametric MAP estimation by taking an order-theoretic perspective, which appears to be a new one in the inverse problems community. This point of view opens up attractive proof strategies based upon the Cantor and Kuratowski intersection theorems; it also reveals that many of the pathologies arise from the distinction between greatest and maximal elements of an order, and from the existence of incomparable elements of $X$, which we show can be dense in $X$, even for an absolutely continuous measure on $X = \mathbb{R}$.
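The small-ball construction can be tried numerically. For a measure μ on ℝ, write M(x, r) = μ([x - r, x + r]). Comparing two points then means examining the ratio M(x, r)/M(y, r) as r → 0. The Python sketch below does this for the standard normal distribution using only the standard library. The choice of measure is an assumption for illustration, and a fixed small radius stands in for the true limit. For this smooth density the ratio tends to p(x)/p(y), so the origin dominates every other point. The pathological measures discussed in the paper are exactly those where such limits fail to behave this way.

```python
import math


def normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ball_mass(x: float, r: float) -> float:
    """M(x, r): mass of the closed ball [x - r, x + r] under N(0, 1)."""
    return normal_cdf(x + r) - normal_cdf(x - r)


def mass_ratio(x: float, y: float, r: float) -> float:
    """M(x, r) / M(y, r); values above 1 mean x carries more nearby mass than y."""
    return ball_mass(x, r) / ball_mass(y, r)


if __name__ == "__main__":
    for r in (1.0, 0.1, 0.001):
        print(f"r = {r:<6} M(1, r) / M(0, r) = {mass_ratio(1.0, 0.0, r):.6f}")
    print(f"limit exp(-1/2)            = {math.exp(-0.5):.6f}")
```

As r shrinks, the printed ratios approach exp(-1/2) ≈ 0.6065, the ratio of the densities at 1 and 0.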

Linear · MIMO · CASES · Performer · CASE ·

May 8, 2023

Solving Linear Inverse Problems using Higher-Order Annealed Langevin Diffusion

Nicolas Zilberstein, Ashutosh Sabharwal, Santiago Segarra

We propose a solution for linear inverse problems based on higher-order Langevin diffusion. More precisely, we propose pre-conditioned second-order and third-order Langevin dynamics that provably sample from the posterior distribution of our unknown variables of interest while being computationally more efficient than their first-order counterpart and the non-conditioned versions of both dynamics. Moreover, we prove that both pre-conditioned dynamics are well-defined and have the same unique invariant distributions as the non-conditioned cases. We also incorporate an annealing procedure that has the double benefit of further accelerating the convergence of the algorithm and allowing us to accommodate the case where the unknown variables are discrete. Numerical experiments in two different tasks (MIMO symbol detection and channel estimation) showcase the generality of our method and illustrate the high performance achieved relative to competing approaches (including learning-based ones) while having comparable or lower computational complexity.
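For intuition about what "second-order" means here: instead of moving the unknowns x directly along the score, the dynamics add a velocity variable v with friction γ, and x follows v. The Python/NumPy sketch below runs plain second-order (underdamped) Langevin dynamics on a toy linear-Gaussian problem y = Hx + n. It uses a simple semi-implicit Euler step and assumes a prior x ~ N(0, I) and noise n ~ N(0, σ²I), so the exact posterior mean is known. It omits the paper's preconditioning, third-order variant, and annealing. The step size, friction, chain count, and problem values are all assumptions.

```python
import numpy as np


def log_posterior_grad(x, H, y, sigma):
    """Score of p(x | y) for prior N(0, I) and y = Hx + n, n ~ N(0, sigma^2 I).

    x has shape (n_chains, d); each row is one independent chain.
    """
    residual = y - x @ H.T                 # (n_chains, m)
    return -x + residual @ H / sigma**2    # (n_chains, d)


def underdamped_langevin(H, y, sigma, n_chains=2000, n_steps=3000,
                         step=0.01, friction=1.0, seed=0):
    """Run second-order Langevin dynamics (unit mass); return the final x of each chain."""
    rng = np.random.default_rng(seed)
    d = H.shape[1]
    x = np.zeros((n_chains, d))
    v = np.zeros((n_chains, d))            # velocity (momentum) variable
    noise_scale = np.sqrt(2.0 * friction * step)
    for _ in range(n_steps):
        kick = noise_scale * rng.standard_normal((n_chains, d))
        # Velocity: driven by the posterior score, damped by friction, perturbed by noise.
        v = v + step * (log_posterior_grad(x, H, y, sigma) - friction * v) + kick
        # Position: moved by the updated velocity (semi-implicit Euler).
        x = x + step * v
    return x


if __name__ == "__main__":
    H = np.array([[1.0, 0.5], [0.0, 1.0]])
    y = np.array([1.0, -0.5])
    sigma = 0.5
    samples = underdamped_langevin(H, y, sigma)
    precision = np.eye(2) + H.T @ H / sigma**2
    exact_mean = np.linalg.solve(precision, H.T @ y / sigma**2)
    print("sample mean:", samples.mean(axis=0))
    print("exact mean: ", exact_mean)
```

The chains are run in parallel as rows of one array, so the final states act as roughly independent posterior samples. Their average should sit close to the closed-form Gaussian posterior mean.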

Streaming · Separation · CASE · Infinite · Algorithmica ·

May 8, 2023

Oblivious algorithms for the Max-$k$AND Problem

Noah G. Singer

From arXiv. 29 pages, 1 table. In submission

Motivated by recent works on streaming algorithms for constraint satisfaction problems (CSPs), we define and analyze oblivious algorithms for the Max-$k$AND problem. This generalizes the definition by Feige and Jozeph (Algorithmica '15) of oblivious algorithms for Max-DICUT, a special case of Max-$2$AND. Oblivious algorithms round each variable with probability depending only on a quantity called the variable's bias. For each oblivious algorithm, we design a so-called "factor-revealing linear program" (LP) which captures its worst-case instance, generalizing one of Feige and Jozeph for Max-DICUT. Then, departing from their work, we perform a fully explicit analysis of these (infinitely many!) LPs. In particular, we show that for all $k$, oblivious algorithms for Max-$k$AND provably outperform a special subclass of algorithms we call "superoblivious" algorithms. Our result has implications for streaming algorithms: Generalizing the result for Max-DICUT of Saxena, Singer, Sudan, and Velusamy (SODA'23), we prove that certain separation results hold between streaming models for infinitely many CSPs: for every $k$, $O(\log n)$-space sketching algorithms for Max-$k$AND known to be optimal in $o(\sqrt n)$-space can be beaten in (a) $O(\log n)$-space under a random-ordering assumption, and (b) $O(n^{1-1/k} D^{1/k})$ space under a maximum-degree-$D$ assumption. Even in the previously-known case of Max-DICUT, our analytic proof gives a fuller, computer-free picture of these separation results.
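Here is a small sketch of what "oblivious" means: each variable is rounded independently, with a probability that depends only on its bias. The Python code below assumes one natural definition of bias for Max-kAND, (weight of positive occurrences - weight of negative occurrences) / (total weight of occurrences). This generalises the out/in-degree bias used for Max-DICUT, but the paper's exact normalisation may differ. The code computes the exact expected satisfied weight of such a rounding. The paper's factor-revealing LPs are designed to capture the worst case of this kind of quantity.

```python
from typing import Callable, List, Sequence, Tuple

# A literal is (variable index, is_positive); a constraint is (weight, literals)
# and is satisfied when every literal in it is true (an AND of k literals).
Literal = Tuple[int, bool]
Constraint = Tuple[float, Sequence[Literal]]


def biases(n_vars: int, constraints: Sequence[Constraint]) -> List[float]:
    """Assumed bias: (positive weight - negative weight) / total weight, in [-1, 1]."""
    pos = [0.0] * n_vars
    neg = [0.0] * n_vars
    for weight, literals in constraints:
        for var, positive in literals:
            if positive:
                pos[var] += weight
            else:
                neg[var] += weight
    return [(p - q) / (p + q) if p + q > 0 else 0.0 for p, q in zip(pos, neg)]


def expected_value(n_vars: int, constraints: Sequence[Constraint],
                   rounding: Callable[[float], float]) -> float:
    """Expected satisfied weight when x_i is True independently with prob rounding(bias_i)."""
    prob_true = [rounding(b) for b in biases(n_vars, constraints)]
    total = 0.0
    for weight, literals in constraints:
        p_sat = weight
        for var, positive in literals:
            p_sat *= prob_true[var] if positive else 1.0 - prob_true[var]
        total += p_sat
    return total


def uniform_rounding(bias: float) -> float:
    return 0.5


def linear_rounding(bias: float) -> float:
    return (1.0 + bias) / 2.0


if __name__ == "__main__":
    # Max-DICUT as Max-2AND: a directed edge u -> v is the constraint (x_u AND NOT x_v).
    edges = [(1.0, [(0, True), (1, False)]), (1.0, [(1, True), (2, False)])]
    print(expected_value(3, edges, uniform_rounding), expected_value(3, edges, linear_rounding))
```

For the path 0 → 1 → 2 this prints `0.5 1.0`: uniform rounding satisfies each edge with probability 1/4, while the bias-aware rounding reaches the optimum of 1 on this tiny instance.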

Extensibility · AIM · HTTPS · Feasible · Edge ·

May 8, 2023

A-ePA*SE: Anytime Edge-Based Parallel A* for Slow Evaluations

Hanlan Yang, Shohin Mukherjee, Maxim Likhachev

From arXiv. Proceedings of the International Symposium on Combinatorial Search (SoCS) 2023. arXiv admin note: text overlap with arXiv:2301.10347

Anytime search algorithms are useful for planning problems where a solution is desired under a limited time budget. Anytime algorithms first aim to provide a feasible solution quickly and then attempt to improve it until the time budget expires. On the other hand, parallel search algorithms utilize the multithreading capability of modern processors to speed up the search. One such algorithm, ePA*SE (Edge-Based Parallel A* for Slow Evaluations), parallelizes edge evaluations to achieve faster planning and is especially useful in domains with expensive-to-compute edges. In this work, we propose an extension that brings the anytime property to ePA*SE, resulting in A-ePA*SE. We evaluate A-ePA*SE experimentally and show that it is significantly more efficient than other anytime search methods. The open-source code for A-ePA*SE, along with the baselines, is available here: //github.com/shohinm/parallel_search
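The anytime pattern (find a feasible plan quickly, then keep improving it until the budget expires) can be sketched independently of ePA*SE's parallel edge evaluation. The Python code below is a sequential stand-in, not A-ePA*SE. It reruns weighted A* with a decreasing inflation factor ε and keeps each strictly better solution. The ε schedule, the time budget, and the grid example are assumptions. With a consistent heuristic, weighted A* without node re-expansions returns a path costing at most ε times the optimum.

```python
import heapq
import itertools
import math
import time


def weighted_astar(start, goal, neighbors, heuristic, eps):
    """Weighted A* without re-expansions; returns (cost, path) or (inf, [])."""
    g = {start: 0.0}
    parent = {start: None}
    tie = itertools.count()   # tie-breaker so nodes themselves are never compared
    open_heap = [(eps * heuristic(start), next(tie), start)]
    closed = set()
    while open_heap:
        _, _, node = heapq.heappop(open_heap)
        if node in closed:
            continue          # stale heap entry
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return g[goal], path[::-1]
        closed.add(node)
        for nxt, cost in neighbors(node):
            if nxt in closed:
                continue      # expanded nodes are never reopened
            new_g = g[node] + cost
            if new_g < g.get(nxt, math.inf):
                g[nxt] = new_g
                parent[nxt] = node
                heapq.heappush(open_heap, (new_g + eps * heuristic(nxt), next(tie), nxt))
    return math.inf, []


def anytime_search(start, goal, neighbors, heuristic,
                   eps_schedule=(3.0, 2.0, 1.5, 1.0), time_budget_s=1.0):
    """Return the (eps, cost, path) improvements found before the time budget runs out."""
    deadline = time.monotonic() + time_budget_s
    best_cost, improvements = math.inf, []
    for eps in eps_schedule:
        if time.monotonic() >= deadline:
            break             # budget is only checked between runs in this sketch
        cost, path = weighted_astar(start, goal, neighbors, heuristic, eps)
        if cost < best_cost:
            best_cost = cost
            improvements.append((eps, cost, path))
    return improvements


if __name__ == "__main__":
    size, goal = 6, (5, 0)
    walls = {(2, y) for y in range(5)}   # wall with a single gap at the top

    def grid_neighbors(cell):
        x, y = cell
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < size and 0 <= ny < size and (nx, ny) not in walls:
                yield (nx, ny), 1.0

    def manhattan(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    for eps, cost, _ in anytime_search((0, 0), goal, grid_neighbors, manhattan):
        print(f"eps={eps}: cost {cost}")
```

The last reported solution comes from the ε = 1 run (or an earlier run that already matched it), so with the consistent Manhattan heuristic it is optimal: 15 moves around the wall in this example.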

MoDELS · Denoising · Processing (programming language) · Image restoration · state-of-the-art ·

May 7, 2023

A Variational Perspective on Solving Inverse Problems with Diffusion Models

Morteza Mardani, Jiaming Song, Jan Kautz, Arash Vahdat

Diffusion models have emerged as a key pillar of foundation models in visual domains. One of their critical applications is to universally solve different downstream inverse tasks via a single diffusion prior without re-training for each task. Most inverse tasks can be formulated as inferring a posterior distribution over data (e.g., a full image) given a measurement (e.g., a masked image). This is however challenging in diffusion models since the nonlinear and iterative nature of the diffusion process renders the posterior intractable. To cope with this challenge, we propose a variational approach that by design seeks to approximate the true posterior distribution. We show that our approach naturally leads to regularization by denoising diffusion process (RED-Diff) where denoisers at different timesteps concurrently impose different structural constraints over the image. To gauge the contribution of denoisers from different timesteps, we propose a weighting mechanism based on signal-to-noise ratio (SNR). Our approach provides a new variational perspective for solving inverse problems with diffusion models, allowing us to formulate sampling as stochastic optimization, where one can simply apply off-the-shelf solvers with lightweight iterates. Our experiments for image restoration tasks such as inpainting and superresolution demonstrate the strengths of our method compared with state-of-the-art sampling-based diffusion models.
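The idea of "sampling as stochastic optimization" can be shown in miniature. The Python/NumPy sketch below fits the mean μ of a variational distribution for a linear problem y = Aμ + noise. At each step it draws a random diffusion time t and noise ε and forms x_t = α_t μ + σ_t ε. It then adds a stop-gradient regularization term λ·w(t)·(ε̂(x_t, t) - ε) to the data-fit gradient. Everything here is an assumption for illustration: the cosine schedule, the weight w(t) = σ_t/α_t (one SNR-based choice, 1/√SNR), plain SGD with iterate averaging, and above all the denoiser. The denoiser is the exact analytic one for a standard-normal prior, not a trained diffusion model. For this toy prior the expected regularization gradient works out to λ·E[σ_t²]·μ = (λ/2)μ, so the averaged iterate should approach a ridge-regression solution, which gives a simple correctness check.

```python
import numpy as np


def toy_noise_predictor(x_t, sigma_t):
    """Stand-in for a trained denoiser eps_hat(x_t, t).

    Exact E[eps | x_t] when the data prior is N(0, I) and alpha_t^2 + sigma_t^2 = 1.
    """
    return sigma_t * x_t


def red_diff_toy(A, y, sigma_y, lam=1.0, n_iters=10000, lr=0.01, seed=0):
    """SGD on mu with a data-fit term plus a denoiser-based regularizer; returns the averaged mu."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(A.shape[1])
    mu_avg = np.zeros_like(mu)
    for k in range(1, n_iters + 1):
        t = rng.uniform(0.01, 0.99)                       # random diffusion time
        alpha_t, sigma_t = np.cos(np.pi * t / 2), np.sin(np.pi * t / 2)
        eps = rng.standard_normal(mu.shape)
        x_t = alpha_t * mu + sigma_t * eps                # diffuse the current estimate
        weight = sigma_t / alpha_t                        # assumed 1/sqrt(SNR) weighting
        grad_fit = -A.T @ (y - A @ mu) / sigma_y**2       # data-consistency gradient
        # Stop-gradient regularizer: the denoiser residual is treated as a constant.
        grad_reg = lam * weight * (toy_noise_predictor(x_t, sigma_t) - eps)
        mu = mu - lr * (grad_fit + grad_reg)
        mu_avg += (mu - mu_avg) / k                       # running (Polyak) average
    return mu_avg


if __name__ == "__main__":
    A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
    y = np.array([1.0, 2.0, 0.5])
    sigma_y, lam = 1.0, 1.0
    estimate = red_diff_toy(A, y, sigma_y, lam)
    ridge = np.linalg.solve(A.T @ A / sigma_y**2 + 0.5 * lam * np.eye(2), A.T @ y / sigma_y**2)
    print("toy estimate:   ", estimate)
    print("ridge reference:", ridge)
```

With a real diffusion model the denoiser would no longer be linear, and the regularizer would impose learned image structure rather than a simple shrinkage toward zero.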

MoDELS · Multimodal · Performer · state-of-the-art · Regularization term ·

May 7, 2023

Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning

Shengfang Zhai, Yinpeng Dong, Qingni Shen, Shi Pu, Yuejian Fang, Hang Su

With the help of conditioning mechanisms, the state-of-the-art diffusion models have achieved tremendous success in guided image generation, particularly in text-to-image synthesis. To gain a better understanding of the training process and potential risks of text-to-image synthesis, we perform a systematic investigation of backdoor attacks on text-to-image diffusion models and propose BadT2I, a general multimodal backdoor attack framework that tampers with image synthesis at diverse semantic levels. Specifically, we perform backdoor attacks on three levels of the vision semantics: Pixel-Backdoor, Object-Backdoor and Style-Backdoor. By utilizing a regularization loss, our methods efficiently inject backdoors into a large-scale text-to-image diffusion model while preserving its utility with benign inputs. We conduct empirical experiments on Stable Diffusion, the widely used text-to-image diffusion model, demonstrating that the large-scale diffusion model can be easily backdoored within a few fine-tuning steps. We conduct additional experiments to explore the impact of different types of textual triggers. In addition, we discuss backdoor persistence during further training, the findings of which provide insights for the development of backdoor defense methods.

Width · Optimizer · Scaling · Examples · Constraints ·

May 7, 2023

DPMS: An ADD-Based Symbolic Approach for Generalized MaxSAT Solving

Anastasios Kyrillidis, Moshe Y. Vardi, Zhiwei Zhang

Boolean MaxSAT, as well as generalized formulations such as Min-MaxSAT and Max-hybrid-SAT, are fundamental optimization problems in Boolean reasoning. Existing methods for MaxSAT have been successful in solving benchmarks in CNF format. They lack, however, the ability to handle (1) non-CNF hybrid constraints, such as XORs, and (2) generalized MaxSAT problems natively. To address this issue, we propose a novel dynamic-programming approach for solving generalized MaxSAT problems with hybrid constraints, called Dynamic-Programming-MaxSAT or DPMS for short, based on Algebraic Decision Diagrams (ADDs). With the power of ADDs and the (graded) project-join-tree builder, our versatile framework admits many generalizations of CNF-MaxSAT, such as MaxSAT, Min-MaxSAT, and MinSAT with hybrid constraints. Moreover, DPMS scales provably well on instances with low width. Empirical results indicate that DPMS is able to solve certain problems quickly, where other algorithms based on various techniques all fail. Hence, DPMS is a promising framework and opens a new line of research that invites more investigation in the future.
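The dynamic-programming idea behind this kind of solver can be illustrated without ADDs. Each weighted constraint becomes a table mapping assignments of its variables to the weight it earns. The solver then repeatedly sums the tables that mention a variable (join) and maximises that variable out (project). The Python sketch below is a bucket-elimination illustration in that spirit, not DPMS itself. It uses plain dictionaries as tables, which grow exponentially in the number of variables per table where ADDs would share structure, and it takes a caller-supplied elimination order instead of a project-join tree. It also shows why a hybrid constraint such as XOR needs no CNF encoding: it is just another table.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# A factor maps each assignment of its variables to the weight it earns.
Factor = Tuple[Tuple[str, ...], Dict[Tuple[bool, ...], float]]


def constraint_factor(variables: Sequence[str], predicate: Callable[..., bool],
                      weight: float) -> Factor:
    """Weight if predicate(*assignment) holds, else 0; any Boolean predicate works (XOR, ...)."""
    table = {a: (weight if predicate(*a) else 0.0)
             for a in product((False, True), repeat=len(variables))}
    return tuple(variables), table


def join(f: Factor, g: Factor) -> Factor:
    """Pointwise sum of two factors over the union of their variables."""
    (fv, ft), (gv, gt) = f, g
    variables = tuple(dict.fromkeys(fv + gv))   # ordered union
    table = {}
    for a in product((False, True), repeat=len(variables)):
        env = dict(zip(variables, a))
        table[a] = ft[tuple(env[v] for v in fv)] + gt[tuple(env[v] for v in gv)]
    return variables, table


def max_out(f: Factor, var: str) -> Factor:
    """Project `var` away, keeping the better of its two settings."""
    fv, ft = f
    i = fv.index(var)
    table: Dict[Tuple[bool, ...], float] = {}
    for a, value in ft.items():
        key = a[:i] + a[i + 1:]
        table[key] = max(table.get(key, float("-inf")), value)
    return fv[:i] + fv[i + 1:], table


def max_weight(factors: Iterable[Factor], order: Sequence[str]) -> float:
    """Maximum total weight of satisfied constraints; `order` must list every variable."""
    pool: List[Factor] = list(factors)
    for var in order:
        bucket = [f for f in pool if var in f[0]]
        if not bucket:
            continue
        pool = [f for f in pool if var not in f[0]]
        combined = bucket[0]
        for f in bucket[1:]:
            combined = join(combined, f)
        pool.append(max_out(combined, var))
    return sum(table[()] for _, table in pool)   # only variable-free factors remain


if __name__ == "__main__":
    constraints = [
        constraint_factor(("a", "b"), lambda a, b: a != b, 2.0),           # XOR(a, b)
        constraint_factor(("a",), lambda a: a, 1.0),                       # a
        constraint_factor(("b",), lambda b: b, 1.0),                       # b
        constraint_factor(("b", "c"), lambda b, c: (not b) or c, 1.5),     # b -> c
    ]
    print(max_weight(constraints, order=["c", "b", "a"]))
```

This prints `4.5`, for example with a = True, b = False, which satisfies the XOR, the unit constraint on a, and the implication. The elimination order only changes the sizes of the intermediate tables, which is where width-based bounds come from, not the optimum.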

Decomposed · Proportional · Numerical analysis ·

May 6, 2023

A new approach to shooting methods for terminal value problems of fractional differential equations

Kai Diethelm, Frank Uhlig

From arXiv. 26 pages, 1 figure. The software described in this paper can be downloaded from //doi.org/10.5281/zenodo.7678311

For terminal value problems of fractional differential equations of order $\alpha \in (0,1)$ that use Caputo derivatives, shooting methods are a well developed and investigated approach. Based on recently established analytic properties of such problems, we develop a new technique to select the required initial values that solves such shooting problems quickly and accurately. Numerical experiments indicate that this new proportional secting technique converges very quickly and accurately to the solution. Run time measurements indicate a speedup factor of between 4 and 10 when compared to the standard bisection method.
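For readers unfamiliar with shooting for terminal value problems: one guesses the initial value y(0), integrates the Caputo problem D^α y = f(t, y) forward to t = T, and adjusts the guess until y(T) matches the prescribed terminal value. The Python/NumPy sketch below implements the standard bisection baseline that the abstract compares against, not the proposed proportional secting technique. The forward integrator is an explicit product-rectangle ("fractional Euler") scheme chosen here for simplicity. The solver, step count, and test problem are assumptions.

```python
import math

import numpy as np


def caputo_forward(f, y0, alpha, T, n_steps=200):
    """Explicit product-rectangle scheme for D^alpha y = f(t, y), y(0) = y0, 0 < alpha <= 1.

    Discretises y(t) = y0 + (1/Gamma(alpha)) * int_0^t (t - s)^(alpha - 1) f(s, y(s)) ds
    with f frozen on each step. Returns the grid t and the approximations y.
    """
    h = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    k = np.arange(n_steps, dtype=float)
    weights = (k + 1.0) ** alpha - k ** alpha     # weights[m] multiplies f at lag m + 1
    scale = h ** alpha / math.gamma(alpha + 1.0)
    y = np.empty(n_steps + 1)
    fvals = np.empty(n_steps + 1)
    y[0] = y0
    fvals[0] = f(t[0], y[0])
    for n in range(1, n_steps + 1):
        # y_n = y0 + scale * sum_{j < n} weights[n - 1 - j] * f(t_j, y_j)
        y[n] = y0 + scale * np.dot(weights[:n][::-1], fvals[:n])
        fvals[n] = f(t[n], y[n])
    return t, y


def shoot_bisection(f, y_T, alpha, T, lo, hi, tol=1e-10, max_iter=200):
    """Find y0 with y(T) = y_T by bisection; [lo, hi] must bracket the answer."""
    def mismatch(y0):
        return caputo_forward(f, y0, alpha, T)[1][-1] - y_T

    m_lo = mismatch(lo)
    if m_lo * mismatch(hi) > 0:
        raise ValueError("lo and hi do not bracket the terminal value")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        m_mid = mismatch(mid)
        if abs(m_mid) < tol or hi - lo < tol:
            return mid
        if (m_mid < 0) == (m_lo < 0):
            lo, m_lo = mid, m_mid             # root lies in [mid, hi]
        else:
            hi = mid                          # root lies in [lo, mid]
    return 0.5 * (lo + hi)


def decay(t, y):
    """Right-hand side of the test problem D^alpha y = -y."""
    return -y


if __name__ == "__main__":
    y0 = shoot_bisection(decay, y_T=0.5, alpha=0.5, T=1.0, lo=0.0, hi=10.0)
    print("initial value:", y0, "-> y(T) =", caputo_forward(decay, y0, 0.5, 1.0)[1][-1])
```

With α = 1 the weights are all 1 and the integrator reduces to forward Euler, which is a convenient sanity check. Each bisection step costs a full forward solve, which is why a faster rule for choosing the next initial value, like the one the paper proposes, pays off.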

Image captioning · Data curation · MoDELS · Generative models · state-of-the-art ·

May 5, 2023

Data Curation for Image Captioning with Text-to-Image Generative Models

Wenyan Li, Jonas F. Lotz, Chen Qiu, Desmond Elliott

Recent advances in image captioning are mainly driven by large-scale vision-language pretraining, relying heavily on computational resources and increasingly large multimodal datasets. Instead of scaling up pretraining data, we ask whether it is possible to improve performance by improving the quality of the samples in existing datasets. We pursue this question through two approaches to data curation: one that assumes that some examples should be avoided due to mismatches between the image and caption, and one that assumes that the mismatch can be addressed by replacing the image, for which we use the state-of-the-art Stable Diffusion model. These approaches are evaluated using the BLIP model on MS COCO and Flickr30K in both finetuning and few-shot learning settings. Our simple yet effective approaches consistently outperform baselines, indicating that better image captioning models can be trained by curating existing resources. Finally, we conduct a human study to understand the errors made by the Stable Diffusion model and highlight directions for future work in text-to-image generation.
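The two curation strategies can be sketched as a small filtering step. The Python code below assumes that each image-caption pair already carries an alignment score, for example from an image-text similarity model. The abstract does not say how mismatches are detected, so the `score` field and the fixed `fraction` are hypothetical. In "remove" mode the lowest-scoring pairs are dropped. In "replace" mode every caption is kept, and the lowest-scoring pairs are marked so that their images can be regenerated from their captions with a text-to-image model such as Stable Diffusion. The model itself is not called here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Pair:
    image_id: str
    caption: str
    score: float  # hypothetical image-caption alignment score; higher = better match


def curate(pairs: Sequence[Pair], fraction: float, mode: str) -> Tuple[List[Pair], List[Pair]]:
    """Return (kept_pairs, pairs_to_regenerate) for mode 'remove' or 'replace'."""
    if mode not in ("remove", "replace"):
        raise ValueError("mode must be 'remove' or 'replace'")
    n_flagged = int(len(pairs) * fraction)
    # Indices of the worst-aligned pairs, lowest score first.
    ranked = sorted(range(len(pairs)), key=lambda i: pairs[i].score)
    flagged = set(ranked[:n_flagged])
    if mode == "remove":
        return [p for i, p in enumerate(pairs) if i not in flagged], []
    # 'replace': keep every caption; flagged images are to be re-synthesised from their captions.
    return list(pairs), [pairs[i] for i in sorted(flagged)]


if __name__ == "__main__":
    data = [Pair("img0", "a dog on a beach", 0.9), Pair("img1", "a red bus", 0.1),
            Pair("img2", "two cats", 0.5), Pair("img3", "a bowl of fruit", 0.3)]
    kept, _ = curate(data, 0.5, "remove")
    _, to_regenerate = curate(data, 0.5, "replace")
    print([p.image_id for p in kept], [p.caption for p in to_regenerate])
```

Both modes flag the same low-scoring pairs. They differ only in whether those captions are discarded or kept as prompts for new images.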