夏娃韩剧电视剧在剧免费韩剧TV,一区二区三区人妻美穴又白,国产精品秘密入口导航,亚洲色资源站资源网

We propose SMPLitex, a method for estimating and manipulating the complete 3D appearance of humans captured from a single image. SMPLitex builds upon the recently proposed generative models for 2D images, and extends their use to the 3D domain through pixel-to-surface correspondences computed on the input image. To this end, we first train a generative model for complete 3D human appearance, and then fit it into the input image by conditioning the generative model to the visible parts of the subject. Furthermore, we propose a new dataset of high-quality human textures built by sampling SMPLitex conditioned on subject descriptions and images. We quantitatively and qualitatively evaluate our method in 3 publicly available datasets, demonstrating that SMPLitex significantly outperforms existing methods for human texture estimation while allowing for a wider variety of tasks such as editing, synthesis, and manipulation

相關內容

估計/估計量

關注 3

去噪 · MoDELS · Performer · 模型評估 · FAST ·

2023 年 10 月 20 日

Music Augmentation and Denoising For Peak-Based Audio Fingerprinting

Kamil Akesbi,Dorian Desblancs,Benjamin Martin

Audio fingerprinting is a well-established solution for song identification from short recording excerpts. Popular methods rely on the extraction of sparse representations, generally spectral peaks, and have proven to be accurate, fast, and scalable to large collections. However, real-world applications of audio identification often happen in noisy environments, which can cause these systems to fail. In this work, we tackle this problem by introducing and releasing a new audio augmentation pipeline that adds noise to music snippets in a realistic way, by stochastically mimicking real-world scenarios. We then propose and release a deep learning model that removes noisy components from spectrograms in order to improve peak-based fingerprinting systems' accuracy. We show that the addition of our model improves the identification performance of commonly used audio fingerprinting systems, even under noisy conditions.

線性的 · 塑造 · RGB-D · 分離的 · 相同 ·

2023 年 10 月 20 日

Simultaneous Shape Tracking of Multiple Deformable Linear Objects with Global-Local Topology Preservation

Jingyi Xiang,Holly Dinkel

This work presents an algorithm for tracking the shape of multiple entangling Deformable Linear Objects (DLOs) from a sequence of RGB-D images. This algorithm runs in real-time and improves on previous single-DLO tracking approaches by enabling tracking of multiple objects. This is achieved using Global-Local Topology Preservation (GLTP). This work uses the geodesic distance in GLTP to define the distance between separate objects and the distance between different parts of the same object. Tracking multiple entangling DLOs is demonstrated experimentally. The source code is publicly released.

Copilot · 環 · MoDELS · 集成 · AI ·

2023 年 10 月 19 日

Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing

Yixiao Zhang,Akira Maezawa,Gus Xia,Kazuhiko Yamamoto,Simon Dixon

from arxiv, Source code and demo video are available at \url{//sites.google.com/view/loop-copilot}

Creating music is iterative, requiring varied methods at each stage. However, existing AI music systems fall short in orchestrating multiple subsystems for diverse needs. To address this gap, we introduce Loop Copilot, a novel system that enables users to generate and iteratively refine music through an interactive, multi-round dialogue interface. The system uses a large language model to interpret user intentions and select appropriate AI models for task execution. Each backend model is specialized for a specific task, and their outputs are aggregated to meet the user's requirements. To ensure musical coherence, essential attributes are maintained in a centralized table. We evaluate the effectiveness of the proposed system through semi-structured interviews and questionnaires, highlighting its utility not only in facilitating music creation but also its potential for broader applications.

離散化 · HTTPS · SimPLe · Pair · FAST ·

2023 年 10 月 18 日

Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units

Gallil Maimon,Yossi Adi

from arxiv, Accepted at EMNLP 2023

We introduce DISSC, a novel, lightweight method that converts the rhythm, pitch contour and timbre of a recording to a target speaker in a textless manner. Unlike DISSC, most voice conversion (VC) methods focus primarily on timbre, and ignore people's unique speaking style (prosody). The proposed approach uses a pretrained, self-supervised model for encoding speech to discrete units, which makes it simple, effective, and fast to train. All conversion modules are only trained on reconstruction like tasks, thus suitable for any-to-many VC with no paired data. We introduce a suite of quantitative and qualitative evaluation metrics for this setup, and empirically demonstrate that DISSC significantly outperforms the evaluated baselines. Code and samples are available at //pages.cs.huji.ac.il/adiyoss-lab/dissc/.

Machine Translation · 有偏 · INFORMS · MoDELS · 機器翻譯 ·

2023 年 10 月 18 日

A Tale of Pronouns: Interpretability Informs Gender Bias Mitigation for Fairer Instruction-Tuned Machine Translation

Giuseppe Attanasio,Flor Miriam Plaza-del-Arco,Debora Nozza,Anne Lauscher

from arxiv, Accepted at EMNLP 2023. Code and data at //github.com/MilaNLProc/interpretability-mt-gender-bias

Recent instruction fine-tuned models can solve multiple NLP tasks when prompted to do so, with machine translation (MT) being a prominent use case. However, current research often focuses on standard performance benchmarks, leaving compelling fairness and ethical considerations behind. In MT, this might lead to misgendered translations, resulting, among other harms, in the perpetuation of stereotypes and prejudices. In this work, we address this gap by investigating whether and to what extent such models exhibit gender bias in machine translation and how we can mitigate it. Concretely, we compute established gender bias metrics on the WinoMT corpus from English to German and Spanish. We discover that IFT models default to male-inflected translations, even disregarding female occupational stereotypes. Next, using interpretability methods, we unveil that models systematically overlook the pronoun indicating the gender of a target occupation in misgendered translations. Finally, based on this finding, we propose an easy-to-implement and effective bias mitigation solution based on few-shot learning that leads to significantly fairer translations.

回合 · MoDELS · 損失函數（機器學習） · HTTPS · 數據集 ·

2023 年 10 月 18 日

SegmATRon: Embodied Adaptive Semantic Segmentation for Indoor Environment

Tatiana Zemskova,Margarita Kichik,Dmitry Yudin,Aleksei Staroverov,Aleksandr Panov

from arxiv, 14 pages, 6 figures

This paper presents an adaptive transformer model named SegmATRon for embodied image semantic segmentation. Its distinctive feature is the adaptation of model weights during inference on several images using a hybrid multicomponent loss function. We studied this model on datasets collected in the photorealistic Habitat and the synthetic AI2-THOR Simulators. We showed that obtaining additional images using the agent's actions in an indoor environment can improve the quality of semantic segmentation. The code of the proposed approach and datasets are publicly available at //github.com/wingrune/SegmATRon.

TOOLS · 語言模型化 · AI Agent · MoDELS · 可理解性 ·

2023 年 10 月 18 日

MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models

Dingyao Yu,Kaitao Song,Peiling Lu,Tianyu He,Xu Tan,Wei Ye,Shikun Zhang,Jiang Bian

AI-empowered music processing is a diverse field that encompasses dozens of tasks, ranging from generation tasks (e.g., timbre synthesis) to comprehension tasks (e.g., music classification). For developers and amateurs, it is very difficult to grasp all of these task to satisfy their requirements in music processing, especially considering the huge differences in the representations of music data and the model applicability across platforms among various tasks. Consequently, it is necessary to build a system to organize and integrate these tasks, and thus help practitioners to automatically analyze their demand and call suitable tools as solutions to fulfill their requirements. Inspired by the recent success of large language models (LLMs) in task automation, we develop a system, named MusicAgent, which integrates numerous music-related tools and an autonomous workflow to address user requirements. More specifically, we build 1) toolset that collects tools from diverse sources, including Hugging Face, GitHub, and Web API, etc. 2) an autonomous workflow empowered by LLMs (e.g., ChatGPT) to organize these tools and automatically decompose user requests into multiple sub-tasks and invoke corresponding music tools. The primary goal of this system is to free users from the intricacies of AI-music tools, enabling them to concentrate on the creative aspect. By granting users the freedom to effortlessly combine tools, the system offers a seamless and enriching music experience.

Prompt · 3D · 查準率/準確率 · 優化器 · Extensibility ·

2023 年 10 月 18 日

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts

Xinhua Cheng,Tianyu Yang,Jianan Wang,Yu Li,Lei Zhang,Jian Zhang,Li Yuan

from arxiv, Project Page: //cxh0519.github.io/projects/Progressive3D/

Recent text-to-3D generation methods achieve impressive 3D content creation capacity thanks to the advances in image diffusion models and optimizing strategies. However, current methods struggle to generate correct 3D content for a complex prompt in semantics, i.e., a prompt describing multiple interacted objects binding with different attributes. In this work, we propose a general framework named Progressive3D, which decomposes the entire generation into a series of locally progressive editing steps to create precise 3D content for complex prompts, and we constrain the content change to only occur in regions determined by user-defined region prompts in each editing step. Furthermore, we propose an overlapped semantic component suppression technique to encourage the optimization process to focus more on the semantic differences between prompts. Extensive experiments demonstrate that the proposed Progressive3D framework generates precise 3D content for prompts with complex semantics and is general for various text-to-3D methods driven by different 3D representations.

CoT · 掩碼語言模型化 · 語言模型化 · tuning · NLU ·

2023 年 10 月 18 日

Chain-of-Thought Tuning: Masked Language Models can also Think Step By Step in Natural Language Understanding

Caoyun Fan,Jidong Tian,Yitian Li,Wenqing Chen,Hao He,Yaohui Jin

from arxiv, EMNLP2023 Main Conference

Chain-of-Thought (CoT) is a technique that guides Large Language Models (LLMs) to decompose complex tasks into multi-step reasoning through intermediate steps in natural language form. Briefly, CoT enables LLMs to think step by step. However, although many Natural Language Understanding (NLU) tasks also require thinking step by step, LLMs perform less well than small-scale Masked Language Models (MLMs). To migrate CoT from LLMs to MLMs, we propose Chain-of-Thought Tuning (CoTT), a two-step reasoning framework based on prompt tuning, to implement step-by-step thinking for MLMs on NLU tasks. From the perspective of CoT, CoTT's two-step framework enables MLMs to implement task decomposition; CoTT's prompt tuning allows intermediate steps to be used in natural language form. Thereby, the success of CoT can be extended to NLU tasks through MLMs. To verify the effectiveness of CoTT, we conduct experiments on two NLU tasks: hierarchical classification and relation extraction, and the results show that CoTT outperforms baselines and achieves state-of-the-art performance.

Processing（編程語言） · 可辨認的 · 可約的 · INTERACT · TOOLS ·

2023 年 10 月 18 日

CrossData: Leveraging Text-Data Connections for Authoring Data Documents

Chen Zhu-Tian,Haijun Xia

Data documents play a central role in recording, presenting, and disseminating data. Despite the proliferation of applications and systems designed to support the analysis, visualization, and communication of data, writing data documents remains a laborious process, requiring a constant back-and-forth between data processing and writing tools. Interviews with eight professionals revealed that their workflows contained numerous tedious, repetitive, and error-prone operations. The key issue that we identified is the lack of persistent connection between text and data. Thus, we developed CrossData, a prototype that treats text-data connections as persistent, interactive, first-class objects. By automatically identifying, establishing, and leveraging text-data connections, CrossData enables rich interactions to assist in the authoring of data documents. An expert evaluation with eight users demonstrated the usefulness of CrossData, showing that it not only reduced the manual effort in writing data documents but also opened new possibilities to bridge the gap between data exploration and writing.