亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

<tfoot id='tfqip'></tfoot>

<legend id='tfqip'><style id='tfqip'><dir id='tfqip'><q id='tfqip'></q></dir></style></legend>

<i id='tfqip'><tr id='tfqip'><dt id='tfqip'><q id='tfqip'><span id='tfqip'><b id='tfqip'><form id='tfqip'><ins id='tfqip'></ins><ul id='tfqip'></ul><sub id='tfqip'></sub></form><legend id='tfqip'></legend><bdo id='tfqip'><pre id='tfqip'><center id='tfqip'></center></pre></bdo></b><th id='tfqip'></th></span></q></dt></tr></i><div id='tfqip'><tfoot id='tfqip'></tfoot><dl id='tfqip'><fieldset id='tfqip'></fieldset></dl></div>

<li id='tfqip'><abbr id='tfqip'></abbr></li>

·

控制器 · Performer · 分離的 · 語言模型化 · 多峰值 ·

2024 年 2 月 6 日

Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience

Xilin Jiang,Cong Han,Yinghao Aaron Li,Nima Mesgarani

from arxiv, preprint

In daily life, we encounter a variety of sounds, both desirable and undesirable, with limited control over their presence and volume. Our work introduces "Listen, Chat, and Edit" (LCE), a novel multimodal sound mixture editor that modifies each sound source in a mixture based on user-provided text instructions. LCE distinguishes itself with a user-friendly chat interface and its unique ability to edit multiple sound sources simultaneously within a mixture, without needing to separate them. Users input open-vocabulary text prompts, which are interpreted by a large language model to create a semantic filter for editing the sound mixture. The system then decomposes the mixture into its components, applies the semantic filter, and reassembles it into the desired output. We developed a 160-hour dataset with over 100k mixtures, including speech and various audio sources, along with text prompts for diverse editing tasks like extraction, removal, and volume control. Our experiments demonstrate significant improvements in signal quality across all editing tasks and robust performance in zero-shot scenarios with varying numbers and types of sound sources.

相關內容

控制器

控制器 · 值域 · Prompt · MoDELS · AIM ·

2024 年 3 月 18 日

Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt

Yongqi Wang,Ruofan Hu,Rongjie Huang,Zhiqing Hong,Ruiqi Li,Wenrui Liu,Fuming You,Tao Jin,Zhou Zhao

from arxiv, Accepted by NAACL 2024 (main conference)

Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly. We propose Prompt-Singer, the first SVS method that enables attribute controlling on singer gender, vocal range and volume with natural language. We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation that enables text-conditioned vocal range control while keeping melodic accuracy. Furthermore, we explore various experiment settings, including different types of text representations, text encoder fine-tuning, and introducing speech data to alleviate data scarcity, aiming to facilitate further research. Experiments show that our model achieves favorable controlling ability and audio quality. Audio samples are available at //prompt-singer.github.io .

分離的 · MoDELS · 推斷 · Dirac · 樣例 ·

2024 年 3 月 18 日

Multi-Source Diffusion Models for Simultaneous Music Generation and Separation

Giorgio Mariani,Irene Tallini,Emilian Postolache,Michele Mancusi,Luca Cosmo,Emanuele Rodolà

from arxiv, ICLR 2024 oral presentation. Demo page: //gladia-research-group.github.io/multi-source-diffusion-models/

In this work, we define a diffusion-based generative model capable of both music synthesis and source separation by learning the score of the joint probability density of sources sharing a context. Alongside the classic total inference tasks (i.e., generating a mixture, separating the sources), we also introduce and experiment on the partial generation task of source imputation, where we generate a subset of the sources given the others (e.g., play a piano track that goes well with the drums). Additionally, we introduce a novel inference method for the separation task based on Dirac likelihood functions. We train our model on Slakh2100, a standard dataset for musical source separation, provide qualitative results in the generation settings, and showcase competitive quantitative results in the source separation setting. Our method is the first example of a single model that can handle both generation and separation tasks, thus representing a step toward general audio models.

變換 · state-of-the-art · 3D · 歸納偏好 · HTTPS ·

2024 年 3 月 18 日

VIHE: Virtual In-Hand Eye Transformer for 3D Robotic Manipulation

Weiyao Wang,Yutian Lei,Gregory D. Hage,Liangjun Zhang

In this work, we introduce the Virtual In-Hand Eye Transformer (VIHE), a novel method designed to enhance 3D manipulation capabilities through action-aware view rendering. VIHE autoregressively refines actions in multiple stages by conditioning on rendered views posed from action predictions in the earlier stages. These virtual in-hand views provide a strong inductive bias for effectively recognizing the correct pose for the hand, especially for challenging high-precision tasks such as peg insertion. On 18 manipulation tasks in RLBench simulated environments, VIHE achieves a new state-of-the-art, with a 12% absolute improvement, increasing from 65% to 77% over the existing state-of-the-art model using 100 demonstrations per task. In real-world scenarios, VIHE can learn manipulation tasks with just a handful of demonstrations, highlighting its practical utility. Videos and code implementation can be found at our project site: //vihe-3d.github.io.

MoDELS · 生成方法 · 潛在 · 3D · 數據集 ·

2024 年 3 月 17 日

FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models

Shivangi Aneja,Justus Thies,Angela Dai,Matthias Nie?ner

from arxiv, Paper Video: //youtu.be/7Jf0kawrA3Q Project Page: //shivangi-aneja.github.io/projects/facetalk/

We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from input audio signal. To capture the expressive, detailed nature of human heads, including hair, ears, and finer-scale eye movements, we propose to couple speech signal with the latent space of neural parametric head models to create high-fidelity, temporally coherent motion sequences. We propose a new latent diffusion model for this task, operating in the expression space of neural parametric head models, to synthesize audio-driven realistic head sequences. In the absence of a dataset with corresponding NPHM expressions to audio, we optimize for these correspondences to produce a dataset of temporally-optimized NPHM expressions fit to audio-video recordings of people talking. To the best of our knowledge, this is the first work to propose a generative approach for realistic and high-quality motion synthesis of volumetric human heads, representing a significant advancement in the field of audio-driven 3D animation. Notably, our approach stands out in its ability to generate plausible motion sequences that can produce high-fidelity head animation coupled with the NPHM shape space. Our experimental results substantiate the effectiveness of FaceTalk, consistently achieving superior and visually natural motion, encompassing diverse facial expressions and styles, outperforming existing methods by 75% in perceptual user study evaluation.

可辨認的 · 可約的 · 全 · Processing（編程語言） · 情景 ·

2024 年 3 月 17 日

PodReels: Human-AI Co-Creation of Video Podcast Teasers

Sitong Wang,Zheng Ning,Anh Truong,Mira Dontcheva,Dingzeyu Li,Lydia B. Chilton

Video podcast teasers are short videos that can be shared on social media platforms to capture interest in the full episodes of a video podcast. These teasers enable long-form podcasters to reach new audiences and gain new followers. However, creating a compelling teaser from an hour-long episode is challenging. Selecting interesting clips requires significant mental effort; editing the chosen clips into a cohesive, well-produced teaser is time-consuming. To support the creation of video podcast teasers, we first investigate what makes a good teaser. We combine insights from both audience comments and creator interviews to determine a set of essential ingredients. We also identify a common workflow shared by creators during the process. Based on these findings, we introduce a human-AI co-creative tool called PodReels to assist video podcasters in creating teasers. Our user study shows that PodReels significantly reduces creators' mental demand and improves their efficiency in producing video podcast teasers.

Facebook AI Research · 模型評估 · Neural Networks · Networking · Automator ·

2024 年 3 月 17 日

JustQ: Automated Deployment of Fair and Accurate Quantum Neural Networks

Ruhan Wang,Fahiz Baba-Yara,Fan Chen

Despite the success of Quantum Neural Networks (QNNs) in decision-making systems, their fairness remains unexplored, as the focus primarily lies on accuracy. This work conducts a design space exploration, unveiling QNN unfairness, and highlighting the significant influence of QNN deployment and quantum noise on accuracy and fairness. To effectively navigate the vast QNN deployment design space, we propose JustQ, a framework for deploying fair and accurate QNNs on NISQ computers. It includes a complete NISQ error model, reinforcement learning-based deployment, and a flexible optimization objective incorporating both fairness and accuracy. Experimental results show JustQ outperforms previous methods, achieving superior accuracy and fairness. This work pioneers fair QNN design on NISQ computers, paving the way for future investigations.

優化器 · Learning · Performer · PIN · 約束 ·

2024 年 3 月 15 日

Multi-Objective Reinforcement Learning-based Approach for Pressurized Water Reactor Optimization

Paul Seurin,Koroush Shirvan

A novel method, the Pareto Envelope Augmented with Reinforcement Learning (PEARL), has been developed to address the challenges posed by multi-objective problems, particularly in the field of engineering where the evaluation of candidate solutions can be time-consuming. PEARL distinguishes itself from traditional policy-based multi-objective Reinforcement Learning methods by learning a single policy, eliminating the need for multiple neural networks to independently solve simpler sub-problems. Several versions inspired from deep learning and evolutionary techniques have been crafted, catering to both unconstrained and constrained problem domains. Curriculum Learning is harnessed to effectively manage constraints in these versions. PEARL's performance is first evaluated on classical multi-objective benchmarks. Additionally, it is tested on two practical PWR core Loading Pattern optimization problems to showcase its real-world applicability. The first problem involves optimizing the Cycle length and the rod-integrated peaking factor as the primary objectives, while the second problem incorporates the mean average enrichment as an additional objective. Furthermore, PEARL addresses three types of constraints related to boron concentration, peak pin burnup, and peak pin power. The results are systematically compared against conventional approaches. Notably, PEARL, specifically the PEARL-NdS variant, efficiently uncovers a Pareto front without necessitating additional efforts from the algorithm designer, as opposed to a single optimization with scaled objectives. It also outperforms the classical approach across multiple performance metrics, including the Hyper-volume.

INFORMS · 代碼 · Networking · 數據集 · Notability ·

2024 年 3 月 15 日

CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement

Qiang Zhu,Jinhua Hao,Yukang Ding,Yu Liu,Qiao Mo,Ming Sun,Chao Zhou,Shuyuan Zhu

Recently, numerous approaches have achieved notable success in compressed video quality enhancement (VQE). However, these methods usually ignore the utilization of valuable coding priors inherently embedded in compressed videos, such as motion vectors and residual frames, which carry abundant temporal and spatial information. To remedy this problem, we propose the Coding Priors-Guided Aggregation (CPGA) network to utilize temporal and spatial information from coding priors. The CPGA mainly consists of an inter-frame temporal aggregation (ITA) module and a multi-scale non-local aggregation (MNA) module. Specifically, the ITA module aggregates temporal information from consecutive frames and coding priors, while the MNA module globally captures spatial information guided by residual frames. In addition, to facilitate research in VQE task, we newly construct the Video Coding Priors (VCP) dataset, comprising 300 videos with various coding priors extracted from corresponding bitstreams. It remedies the shortage of previous datasets on the lack of coding information. Experimental results demonstrate the superiority of our method compared to existing state-of-the-art methods. The code and dataset will be released at //github.com/CPGA/CPGA.git.

Networking · CNN · MoDELS · Performer · 數學 ·

2023 年 3 月 5 日

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

Xuan Shen,Yaohua Wang,Ming Lin,Yilun Huang,Hao Tang,Xiuyu Sun,Yanzhi Wang

from arxiv, Accepted by CVPR 2023

The rapid advances in Vision Transformer (ViT) refresh the state-of-the-art performances in various vision tasks, overshadowing the conventional CNN-based models. This ignites a few recent striking-back research in the CNN world showing that pure CNN models can achieve as good performance as ViT models when carefully tuned. While encouraging, designing such high-performance CNN models is challenging, requiring non-trivial prior knowledge of network design. To this end, a novel framework termed Mathematical Architecture Design for Deep CNN (DeepMAD) is proposed to design high-performance CNN models in a principled way. In DeepMAD, a CNN network is modeled as an information processing system whose expressiveness and effectiveness can be analytically formulated by their structural parameters. Then a constrained mathematical programming (MP) problem is proposed to optimize these structural parameters. The MP problem can be easily solved by off-the-shelf MP solvers on CPUs with a small memory footprint. In addition, DeepMAD is a pure mathematical framework: no GPU or training data is required during network design. The superiority of DeepMAD is validated on multiple large-scale computer vision benchmark datasets. Notably on ImageNet-1k, only using conventional convolutional layers, DeepMAD achieves 0.7% and 1.5% higher top-1 accuracy than ConvNeXt and Swin on Tiny level, and 0.8% and 0.9% higher on Small level.

XAI · Responsible AI · Taxonomy · AI · MoDELS ·

2019 年 10 月 22 日

Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI

Alejandro Barredo Arrieta,Natalia Díaz-Rodríguez,Javier Del Ser,Adrien Bennetot,Siham Tabik,Alberto Barbado,Salvador García,Sergio Gil-López,Daniel Molina,Richard Benjamins,Raja Chatila,Francisco Herrera

from arxiv, 67 pages, 13 figures, under review in the Information Fusion journal

In the last years, Artificial Intelligence (AI) has achieved a notable momentum that may deliver the best of expectations over many application sectors across the field. For this to occur, the entire community stands in front of the barrier of explainability, an inherent problem of AI techniques brought by sub-symbolism (e.g. ensembles or Deep Neural Networks) that were not present in the last hype of AI. Paradigms underlying this problem fall within the so-called eXplainable AI (XAI) field, which is acknowledged as a crucial feature for the practical deployment of AI models. This overview examines the existing literature in the field of XAI, including a prospect toward what is yet to be reached. We summarize previous efforts to define explainability in Machine Learning, establishing a novel definition that covers prior conceptual propositions with a major focus on the audience for which explainability is sought. We then propose and discuss about a taxonomy of recent contributions related to the explainability of different Machine Learning models, including those aimed at Deep Learning methods for which a second taxonomy is built. This literature analysis serves as the background for a series of challenges faced by XAI, such as the crossroads between data fusion and explainability. Our prospects lead toward the concept of Responsible Artificial Intelligence, namely, a methodology for the large-scale implementation of AI methods in real organizations with fairness, model explainability and accountability at its core. Our ultimate goal is to provide newcomers to XAI with a reference material in order to stimulate future research advances, but also to encourage experts and professionals from other disciplines to embrace the benefits of AI in their activity sectors, without any prior bias for its lack of interpretability.

閱讀: 0 點贊: 0

小貼士

登錄享

相關主題

語言模型化

北京阿比特科技有限公司

注冊地址：北京市海淀區羊坊店路18號2幢3層301-191

<tfoot id='tfqip'></tfoot>

<legend id='tfqip'><style id='tfqip'><dir id='tfqip'><q id='tfqip'></q></dir></style></legend>

<i id='tfqip'><tr id='tfqip'><dt id='tfqip'><q id='tfqip'><span id='tfqip'><b id='tfqip'><form id='tfqip'><ins id='tfqip'></ins><ul id='tfqip'></ul><sub id='tfqip'></sub></form><legend id='tfqip'></legend><bdo id='tfqip'><pre id='tfqip'><center id='tfqip'></center></pre></bdo></b><th id='tfqip'></th></span></q></dt></tr></i><div id='tfqip'><tfoot id='tfqip'></tfoot><dl id='tfqip'><fieldset id='tfqip'></fieldset></dl></div>