在线点播亚洲日韩国产欧美_久久久久精品一区二区三区_亚洲看片一区二区三区激情_国产精品1204永久免费视频_久久久久精品国产免费AV_日韩精品成人片在线观看_免费无遮挡无码视频色

This paper introduces DanceFusion, a novel framework for reconstructing and generating dance movements synchronized to music, utilizing a Spatio-Temporal Skeleton Diffusion Transformer. The framework adeptly handles incomplete and noisy skeletal data common in short-form dance videos on social media platforms like TikTok. DanceFusion incorporates a hierarchical Transformer-based Variational Autoencoder (VAE) integrated with a diffusion model, significantly enhancing motion realism and accuracy. Our approach introduces sophisticated masking techniques and a unique iterative diffusion process that refines the motion sequences, ensuring high fidelity in both motion generation and synchronization with accompanying audio cues. Comprehensive evaluations demonstrate that DanceFusion surpasses existing methods, providing state-of-the-art performance in generating dynamic, realistic, and stylistically diverse dance motions. Potential applications of this framework extend to content creation, virtual reality, and interactive entertainment, promising substantial advancements in automated dance generation. Visit our project page at //th-mlab.github.io/DanceFusion/.

相關內容

變(bian)分自(zi)編碼

關注 51

Analysis · 值域 · 優化器 · 可約的 · Google DeepMind ·

2024 年 12 月 19 日

Cirbo: A New Tool for Boolean Circuit Analysis and Synthesis

Daniil Averkov,Tatiana Belova,Gregory Emdin,Mikhail Goncharov,Viktoriia Krivogornitsyna,Alexander S. Kulikov,Fedor Kurmazov,Daniil Levtsov,Georgie Levtsov,Vsevolod Vaskin,Aleksey Vorobiev

from arxiv, To appear in AAAI 2025

We present an open-source tool for manipulating Boolean circuits. It implements efficient algorithms, both existing and novel, for a rich variety of frequently used circuit tasks such as satisfiability, synthesis, and minimization. We tested the tool on a wide range of practically relevant circuits (computing, in particular, symmetric and arithmetic functions) that have been optimized intensively by the community for the last three years. The tool helped us to win the IWLS 2024 Programming Contest. In 2023, it was Google DeepMind who took the first place in the competition. We were able to reduce the size of the best circuits from 2023 by 12\% on average, whereas for some individual circuits, our size reduction was as large as 83\%.

詞元分析器 · 優化器 · MoDELS · 語言模型化 · Performer ·

2024 年 12 月 19 日

When Every Token Counts: Optimal Segmentation for Low-Resource Language Models

Bharath Raj S,Garvit Suri,Vikrant Dewangan,Raghav Sonavane

from arxiv, LoResLM @ COLING 2025

Traditional greedy tokenization methods have been a critical step in Natural Language Processing (NLP), influencing how text is converted into tokens and directly impacting model performance. While subword tokenizers like Byte-Pair Encoding (BPE) are widely used, questions remain about their optimality across model scales and languages. In this work, we demonstrate through extensive experiments that an optimal BPE configuration significantly reduces token count compared to greedy segmentation, yielding improvements in token-saving percentages and performance benefits, particularly for smaller models. We evaluate tokenization performance across various intrinsic and extrinsic tasks, including generation and classification. Our findings suggest that compression-optimized tokenization strategies could provide substantial advantages for multilingual and low-resource language applications, highlighting a promising direction for further research and inclusive NLP.

秩 · 有向 · 優化器 · MoDELS · 監督 ·

2024 年 12 月 18 日

ChainRank-DPO: Chain Rank Direct Preference Optimization for LLM Rankers

Haowei Liu,Xuyang Wu,Guohao Sun,Zhiqiang Tao,Yi Fang

Large language models (LLMs) have demonstrated remarkable effectiveness in text reranking through works like RankGPT, leveraging their human-like reasoning about relevance. However, supervised fine-tuning for ranking often diminishes these models' general-purpose capabilities, including the crucial reasoning abilities that make them valuable for ranking. We introduce a novel approach integrating Chain-of-Thought prompting with an SFT-DPO (Supervised Fine-Tuning followed by Direct Preference Optimization) pipeline to preserve these capabilities while improving ranking performance. Our experiments on TREC 2019 and 2020 Deep Learning datasets show that our approach outperforms the state-of-the-art RankZephyr while maintaining strong performance on the Massive Multitask Language Understanding (MMLU) benchmark, demonstrating effective preservation of general-purpose capabilities through thoughtful fine-tuning strategies. Our code and data will be publicly released upon the acceptance of the paper.

控制器 · 3D · Microsoft Surface · 回合 · ARM ·

2024 年 12 月 18 日

Immersive Human-in-the-Loop Control: Real-Time 3D Surface Meshing and Physics Simulation

Sait Akturk,Justin Valentine,Junaid Ahmad,Martin Jagersand

from arxiv, IROS 2024

This paper introduces the TactiMesh Teleoperator Interface (TTI), a novel predictive visual and haptic system designed explicitly for human-in-the-loop robot control using a head-mounted display (HMD). By employing simultaneous localization and mapping (SLAM)in tandem with a space carving method (CARV), TTI creates a real time 3D surface mesh of remote environments from an RGB camera mounted on a Barrett WAM arm. The generated mesh is integrated into a physics simulator, featuring a digital twin of the WAM robot arm to create a virtual environment. In this virtual environment, TTI provides haptic feedback directly in response to the operator's movements, eliminating the problem with delayed response from the haptic follower robot. Furthermore, texturing the 3D mesh with keyframes from SLAM allows the operator to control the viewpoint of their Head Mounted Display (HMD) independently of the arm-mounted robot camera, giving a better visual immersion and improving manipulation speed. Incorporating predictive visual and haptic feedback significantly improves teleoperation in applications such as search and rescue, inspection, and remote maintenance.

Integration · Python · 代碼 · 設計 · 多樣性 ·

2024 年 12 月 18 日

SocialED: A Python Library for Social Event Detection

Kun Zhang,Xiaoyan Yu,Pu Li,Hao Peng,Philip S. Yu

from arxiv, 8 pages, 1 figure, Python library

SocialED is a comprehensive, open-source Python library designed to support social event detection (SED) tasks, integrating 19 detection algorithms and 14 diverse datasets. It provides a unified API with detailed documentation, offering researchers and practitioners a complete solution for event detection in social media. The library is designed with modularity in mind, allowing users to easily adapt and extend components for various use cases. SocialED supports a wide range of preprocessing techniques, such as graph construction and tokenization, and includes standardized interfaces for training models and making predictions. By integrating popular deep learning frameworks, SocialED ensures high efficiency and scalability across both CPU and GPU environments. The library is built adhering to high code quality standards, including unit testing, continuous integration, and code coverage, ensuring that SocialED delivers robust, maintainable software. SocialED is publicly available at \url{//github.com/RingBDStack/SocialED} and can be installed via PyPI.

語言模型化 · 樣例 · MoDELS · Performer · 小樣本學習 ·

2024 年 12 月 17 日

RISCORE: Enhancing In-Context Riddle Solving in Language Models through Context-Reconstructed Example Augmentation

Ioannis Panagiotopoulos,Giorgos Filandrianos,Maria Lymperaiou,Giorgos Stamou

from arxiv, Accepted at COLING 2025

Riddle-solving requires advanced reasoning skills, pushing LLMs to engage in abstract thinking and creative problem-solving, often revealing limitations in their cognitive abilities. In this paper, we examine the riddle-solving capabilities of LLMs using a multiple-choice format, exploring how different prompting techniques impact performance on riddles that demand diverse reasoning skills. To enhance results, we introduce RISCORE (RIddle Solving with COntext REcontruciton) a novel fully automated prompting method that generates and utilizes contextually reconstructed sentence-based puzzles in conjunction with the original examples to create few-shot exemplars. Our experiments demonstrate that RISCORE significantly improves the performance of language models in both vertical and lateral thinking tasks, surpassing traditional exemplar selection strategies across a variety of few-shot settings.

binary · 表示 · 潛在 · MoDELS · Performer ·

2024 年 12 月 17 日

BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities

Shaozhe Hao,Xuantong Liu,Xianbiao Qi,Shihao Zhao,Bojia Zi,Rong Xiao,Kai Han,Kwan-Yee K. Wong

from arxiv, Updated with additional T2I results; Project page: //haoosz.github.io/BiGR

We introduce BiGR, a novel conditional image generation model using compact binary latent codes for generative training, focusing on enhancing both generation and representation capabilities. BiGR is the first conditional generative model that unifies generation and discrimination within the same framework. BiGR features a binary tokenizer, a masked modeling mechanism, and a binary transcoder for binary code prediction. Additionally, we introduce a novel entropy-ordered sampling method to enable efficient image generation. Extensive experiments validate BiGR's superior performance in generation quality, as measured by FID-50k, and representation capabilities, as evidenced by linear-probe accuracy. Moreover, BiGR showcases zero-shot generalization across various vision tasks, enabling applications such as image inpainting, outpainting, editing, interpolation, and enrichment, without the need for structural modifications. Our findings suggest that BiGR unifies generative and discriminative tasks effectively, paving the way for further advancements in the field. We further enable BiGR to perform text-to-image generation, showcasing its potential for broader applications.

控制器 · 優化器 · Learning · Networking · Neural Networks ·

2024 年 12 月 17 日

Task-Parameter Nexus: Task-Specific Parameter Learning for Model-Based Control

Sheng Cheng,Ran Tao,Yuliang Gu,Shenlong Wang,Xiaofeng Wang,Naira Hovakimyan

This paper presents the Task-Parameter Nexus (TPN), a learning-based approach for online determination of the (near-)optimal control parameters of model-based controllers (MBCs) for tracking tasks. In TPN, a deep neural network is introduced to predict the control parameters for any given tracking task at runtime, especially when optimal parameters for new tasks are not immediately available. To train this network, we constructed a trajectory bank with various speeds and curvatures that represent different motion characteristics. Then, for each trajectory in the bank, we auto-tune the optimal control parameters offline and use them as the corresponding ground truth. With this dataset, the TPN is trained by supervised learning. We evaluated the TPN on the quadrotor platform. In simulation experiments, it is shown that the TPN can predict near-optimal control parameters for a spectrum of tracking tasks, demonstrating its robust generalization capabilities to unseen tasks.

MoDELS · 3D · 正則的 · 數據集 · 語言模型化 ·

2024 年 12 月 16 日

Query3D: LLM-Powered Open-Vocabulary Scene Segmentation with Language Embedded 3D Gaussian

Amirhosein Chahe,Lifeng Zhou

This paper introduces a novel method for open-vocabulary 3D scene querying in autonomous driving by combining Language Embedded 3D Gaussians with Large Language Models (LLMs). We propose utilizing LLMs to generate both contextually canonical phrases and helping positive words for enhanced segmentation and scene interpretation. Our method leverages GPT-3.5 Turbo as an expert model to create a high-quality text dataset, which we then use to fine-tune smaller, more efficient LLMs for on-device deployment. Our comprehensive evaluation on the WayveScenes101 dataset demonstrates that LLM-guided segmentation significantly outperforms traditional approaches based on predefined canonical phrases. Notably, our fine-tuned smaller models achieve performance comparable to larger expert models while maintaining faster inference times. Through ablation studies, we discover that the effectiveness of helping positive words correlates with model scale, with larger models better equipped to leverage additional semantic information. This work represents a significant advancement towards more efficient, context-aware autonomous driving systems, effectively bridging 3D scene representation with high-level semantic querying while maintaining practical deployment considerations.

tuning · MoDELS · 評論員 · 語言模型化 · 數據集 ·

2023 年 8 月 21 日

Instruction Tuning for Large Language Models: A Survey

Shengyu Zhang,Linfeng Dong,Xiaoya Li,Sen Zhang,Xiaofei Sun,Shuhe Wang,Jiwei Li,Runyi Hu,Tianwei Zhang,Fei Wu,Guoyin Wang

from arxiv, A Survey paper, Pre-print

This paper surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of \textsc{(instruction, output)} pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains and applications, along with an analysis on aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset, etc). We also review the potential pitfalls of IT along with criticism against it, along with efforts pointing out current deficiencies of existing strategies and suggest some avenues for fruitful research.