在线点播亚洲日韩国产欧美,亚洲AV永久无码精品九之,亚洲美女在线视频专区,久超成人免费视频

Texturing is a crucial step in the 3D asset production workflow, which enhances the visual appeal and diversity of 3D assets. Despite recent advancements in Text-to-Texture (T2T) generation, existing methods often yield subpar results, primarily due to local discontinuities, inconsistencies across multiple views, and their heavy dependence on UV unwrapping outcomes. To tackle these challenges, we propose a novel generation-refinement 3D texturing framework called MVPaint, which can generate high-resolution, seamless textures while emphasizing multi-view consistency. MVPaint mainly consists of three key modules. 1) Synchronized Multi-view Generation (SMG). Given a 3D mesh model, MVPaint first simultaneously generates multi-view images by employing an SMG model, which leads to coarse texturing results with unpainted parts due to missing observations. 2) Spatial-aware 3D Inpainting (S3I). To ensure complete 3D texturing, we introduce the S3I method, specifically designed to effectively texture previously unobserved areas. 3) UV Refinement (UVR). Furthermore, MVPaint employs a UVR module to improve the texture quality in the UV space, which first performs a UV-space Super-Resolution, followed by a Spatial-aware Seam-Smoothing algorithm for revising spatial texturing discontinuities caused by UV unwrapping. Moreover, we establish two T2T evaluation benchmarks: the Objaverse T2T benchmark and the GSO T2T benchmark, based on selected high-quality 3D meshes from the Objaverse dataset and the entire GSO dataset, respectively. Extensive experimental results demonstrate that MVPaint surpasses existing state-of-the-art methods. Notably, MVPaint could generate high-fidelity textures with minimal Janus issues and highly enhanced cross-view consistency.

相關內容

關注 36

3D是英文“Three Dimensions”的簡稱，中文是指三維、三個維度、三個坐標，即有長、有寬、有高，換句話說，就是立體的，是相對于只有長和寬的平面（2D）而言。

類別 · Markov · Processing（編程語言） · SimPLe · INFORMS ·

2024 年 12 月 16 日

Revelations: A Decidable Class of POMDPs with Omega-Regular Objectives

Marius Belly,Nathana?l Fijalkow,Hugo Gimbert,Florian Horn,Guillermo A. Pérez,Pierre Vandenhove

from arxiv, Extended version of paper accepted to AAAI 2025. 26 pages, 10 figures

Partially observable Markov decision processes (POMDPs) form a prominent model for uncertainty in sequential decision making. We are interested in constructing algorithms with theoretical guarantees to determine whether the agent has a strategy ensuring a given specification with probability 1. This well-studied problem is known to be undecidable already for very simple omega-regular objectives, because of the difficulty of reasoning on uncertain events. We introduce a revelation mechanism which restricts information loss by requiring that almost surely the agent has eventually full information of the current state. Our main technical results are to construct exact algorithms for two classes of POMDPs called weakly and strongly revealing. Importantly, the decidable cases reduce to the analysis of a finite belief-support Markov decision process. This yields a conceptually simple and exact algorithm for a large class of POMDPs.

秩 · MoDELS · Excel · 評論員 · 數據集 ·

2024 年 12 月 16 日

IRR: Image Review Ranking Framework for Evaluating Vision-Language Models

Kazuki Hayashi,Kazuma Onishi,Toma Suzuki,Yusuke Ide,Seiji Gobara,Shigeki Saito,Yusuke Sakai,Hidetaka Kamigaito,Katsuhiko Hayashi,Taro Watanabe

from arxiv, 18pages, Accepted at COLING25

Large-scale Vision-Language Models (LVLMs) process both images and text, excelling in multimodal tasks such as image captioning and description generation. However, while these models excel at generating factual content, their ability to generate and evaluate texts reflecting perspectives on the same image, depending on the context, has not been sufficiently explored. To address this, we propose IRR: Image Review Rank, a novel evaluation framework designed to assess critic review texts from multiple perspectives. IRR evaluates LVLMs by measuring how closely their judgments align with human interpretations. We validate it using a dataset of images from 15 categories, each with five critic review texts and annotated rankings in both English and Japanese, totaling over 2,000 data instances. The datasets are available at //hf.co/datasets/naist-nlp/Wiki-ImageReview1.0. Our results indicate that, although LVLMs exhibited consistent performance across languages, their correlation with human annotations was insufficient, highlighting the need for further advancements. These findings highlight the limitations of current evaluation methods and the need for approaches that better capture human reasoning in Vision & Language tasks.

語音合成 · 有偏 · MoDELS · 穩健性 · 樣例 ·

2024 年 12 月 15 日

Rethinking MUSHRA: Addressing Modern Challenges in Text-to-Speech Evaluation

Praveen Srinivasa Varadhan,Amogh Gulati,Ashwin Sankar,Srija Anand,Anirudh Gupta,Anirudh Mukherjee,Shiva Kumar Marepally,Ankur Bhatia,Saloni Jaju,Suvrat Bhooshan,Mitesh M. Khapra

from arxiv, 21 pages, 13 Figures

Despite rapid advancements in TTS models, a consistent and robust human evaluation framework is still lacking. For example, MOS tests fail to differentiate between similar models, and CMOS's pairwise comparisons are time-intensive. The MUSHRA test is a promising alternative for evaluating multiple TTS systems simultaneously, but in this work we show that its reliance on matching human reference speech unduly penalises the scores of modern TTS systems that can exceed human speech quality. More specifically, we conduct a comprehensive assessment of the MUSHRA test, focusing on its sensitivity to factors such as rater variability, listener fatigue, and reference bias. Based on our extensive evaluation involving 492 human listeners across Hindi and Tamil we identify two primary shortcomings: (i) reference-matching bias, where raters are unduly influenced by the human reference, and (ii) judgement ambiguity, arising from a lack of clear fine-grained guidelines. To address these issues, we propose two refined variants of the MUSHRA test. The first variant enables fairer ratings for synthesized samples that surpass human reference quality. The second variant reduces ambiguity, as indicated by the relatively lower variance across raters. By combining these approaches, we achieve both more reliable and more fine-grained assessments. We also release MANGO, a massive dataset of 246,000 human ratings, the first-of-its-kind collection for Indian languages, aiding in analyzing human preferences and developing automatic metrics for evaluating TTS systems.

計算機科學 · 講稿 · 張成子空間 · 可辨認的 · 優化器 ·

2024 年 12 月 13 日

EmpireDB: Data System to Accelerate Computational Sciences

Daniel Alabi,Eugene Wu

The emerging discipline of Computational Science is concerned with using computers to simulate or solve scientific problems. These problems span the natural, political, and social sciences. The discipline has exploded over the past decade due to the emergence of larger amounts of observational data and large-scale simulations that were previously unavailable or unfeasible. However, there are still significant challenges with managing the large amounts of data and simulations. The database management systems community has always been at the forefront of the development of the theory and practice of techniques for formalizing and actualizing systems that access or query large datasets. In this paper, we present EmpireDB, a vision for a data management system to accelerate computational sciences. In addition, we identify challenges and opportunities for the database community to further the fledgling field of computational sciences. Finally, we present preliminary evidence showing that the optimized components in EmpireDB could lead to improvements in performance compared to contemporary implementations.

cache · MoDELS · 模型評估 · 語言模型化 · Processing（編程語言） ·

2024 年 12 月 13 日

DroidSpeak: KV Cache Sharing for Efficient Multi-LLM Serving

Yuhan Liu,Yuyang Huang,Jiayi Yao,Zhuohan Gu,Kuntai Du,Hanchen Li,Yihua Cheng,Junchen Jiang,Shan Lu,Madan Musuvathi,Esha Choukse

Large Language Models (LLMs) are increasingly employed in complex workflows, where different LLMs and fine-tuned variants collaboratively address complex tasks. However, these systems face significant inefficiencies due to redundant context processing of the shared context. We propose DroidSpeak, a framework that optimizes context sharing between fine-tuned LLMs derived from the same foundational model. DroidSpeak identifies critical layers in the KV cache and selectively recomputes them, enabling effective reuse of intermediate data while maintaining high accuracy. Our approach balances computational efficiency and task fidelity, significantly reducing inference latency and throughput bottlenecks. Experiments on diverse datasets and model pairs demonstrate that DroidSpeak achieves up to 3x higher throughputs and 2.6x faster prefill times with negligible accuracy loss compared to full recomputation.

穩健性 · Markov · MoDELS · Processing（編程語言） · 轉移概率 ·

2024 年 12 月 13 日

Solving Robust Markov Decision Processes: Generic, Reliable, Efficient

Tobias Meggendorfer,Maximilian Weininger,Patrick Wienh?ft

from arxiv, Accepted for publication at AAAI'25. Extended version with full appendix, 26 pages

Markov decision processes (MDP) are a well-established model for sequential decision-making in the presence of probabilities. In robust MDP (RMDP), every action is associated with an uncertainty set of probability distributions, modelling that transition probabilities are not known precisely. Based on the known theoretical connection to stochastic games, we provide a framework for solving RMDPs that is generic, reliable, and efficient. It is *generic* both with respect to the model, allowing for a wide range of uncertainty sets, including but not limited to intervals, $L^1$- or $L^2$-balls, and polytopes; and with respect to the objective, including long-run average reward, undiscounted total reward, and stochastic shortest path. It is *reliable*, as our approach not only converges in the limit, but provides precision guarantees at any time during the computation. It is *efficient* because -- in contrast to state-of-the-art approaches -- it avoids explicitly constructing the underlying stochastic game. Consequently, our prototype implementation outperforms existing tools by several orders of magnitude and can solve RMDPs with a million states in under a minute.

Apache · 同態加密 · 層 · 設計 · Performer ·

2024 年 12 月 13 日

APACHE: A Processing-Near-Memory Architecture for Multi-Scheme Fully Homomorphic Encryption

Lin Ding,Song Bian,Penggao He,Yan Xu,Gang Qu,Jiliang Zhang

Fully Homomorphic Encryption (FHE) is known to be extremely computationally-intensive, application-specific accelerators emerged as a powerful solution to narrow the performance gap. Nonetheless, due to the increasing complexities in FHE schemes per se and multi-scheme FHE algorithm designs in end-to-end privacy-preserving tasks, existing FHE accelerators often face the challenges of low hardware utilization rates and insufficient memory bandwidth. In this work, we present \NAME, a layered near-memory computing hierarchy tailored for multi-scheme FHE acceleration. By closely inspecting the data flow across different FHE schemes, we propose a layered near-memory computing architecture with fine-grained functional unit design to significantly enhance the utilization rates of computational resources and memory bandwidth. The experimental results illustrate that APACHE outperforms state-of-the-art ASIC FHE accelerators by 10.63x to 35.47x over a variety of application benchmarks, e.g., Lola MNIST, HELR, VSP, and HE$^{3}$DB.

MoDELS · 縮放 · 語言模型化 · 模型評估 · 評論員 ·

2024 年 12 月 13 日

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Yukang Chen,Fuzhao Xue,Dacheng Li,Qinghao Hu,Ligeng Zhu,Xiuyu Li,Yunhao Fang,Haotian Tang,Shang Yang,Zhijian Liu,Ethan He,Hongxu Yin,Pavlo Molchanov,Jan Kautz,Linxi Fan,Yuke Zhu,Yao Lu,Song Han

from arxiv, Code and models are available at //github.com/NVlabs/VILA/tree/main/longvila

Long-context capability is critical for multi-modal foundation models, especially for long video understanding. We introduce LongVILA, a full-stack solution for long-context visual-language models by co-designing the algorithm and system. For model training, we upgrade existing VLMs to support long video understanding by incorporating two additional stages, i.e., long context extension and long video supervised fine-tuning. However, training on long video is computationally and memory intensive. We introduce the long-context Multi-Modal Sequence Parallelism (MM-SP) system that efficiently parallelizes long video training and inference, enabling 2M context length training on 256 GPUs without any gradient checkpointing. LongVILA efficiently extends the number of video frames of VILA from 8 to 2048, achieving 99.8% accuracy in 6,000-frame (more than 1 million tokens) video needle-in-a-haystack. LongVILA-7B demonstrates strong accuracy on 9 popular video benchmarks, e.g. 65.1% VideoMME with subtitle. Besides, MM-SP is 2.1x - 5.7x faster than ring style sequence parallelism and 1.1x - 1.4x faster than Megatron with a hybrid context and tensor parallelism. Moreover, it seamlessly integrates with Hugging Face Transformers.

專利 · Processing（編程語言） · Agent · MoDELS · 語言模型化 ·

2024 年 12 月 13 日

AutoPatent: A Multi-Agent Framework for Automatic Patent Generation

Qiyao Wang,Shiwen Ni,Huaren Liu,Shule Lu,Guhong Chen,Xi Feng,Chi Wei,Qiang Qu,Hamid Alinejad-Rokny,Yuan Lin,Min Yang

from arxiv, 19 pages, 7 figures

As the capabilities of Large Language Models (LLMs) continue to advance, the field of patent processing has garnered increased attention within the natural language processing community. However, the majority of research has been concentrated on classification tasks, such as patent categorization and examination, or on short text generation tasks like patent summarization and patent quizzes. In this paper, we introduce a novel and practical task known as Draft2Patent, along with its corresponding D2P benchmark, which challenges LLMs to generate full-length patents averaging 17K tokens based on initial drafts. Patents present a significant challenge to LLMs due to their specialized nature, standardized terminology, and extensive length. We propose a multi-agent framework called AutoPatent which leverages the LLM-based planner agent, writer agents, and examiner agent with PGTree and RRAG to generate lengthy, intricate, and high-quality complete patent documents. The experimental results demonstrate that our AutoPatent framework significantly enhances the ability to generate comprehensive patents across various LLMs. Furthermore, we have discovered that patents generated solely with the AutoPatent framework based on the Qwen2.5-7B model outperform those produced by larger and more powerful LLMs, such as GPT-4o, Qwen2.5-72B, and LLAMA3.1-70B, in both objective metrics and human evaluations. We will make the data and code available upon acceptance at \url{//github.com/QiYao-Wang/AutoPatent}.

MoDELS · Learning · MOS · Continuity · 知識 (knowledge) ·

2024 年 12 月 12 日

MOS: Model Surgery for Pre-Trained Model-Based Class-Incremental Learning

Hai-Long Sun,Da-Wei Zhou,Hanbin Zhao,Le Gan,De-Chuan Zhan,Han-Jia Ye

from arxiv, Accepted to AAAI 2025. Code is available at: //github.com/sun-hailong/AAAI25-MOS

Class-Incremental Learning (CIL) requires models to continually acquire knowledge of new classes without forgetting old ones. Despite Pre-trained Models (PTMs) have shown excellent performance in CIL, catastrophic forgetting still occurs as the model learns new concepts. Existing work seeks to utilize lightweight components to adjust the PTM, while the forgetting phenomenon still comes from {\em parameter and retrieval} levels. Specifically, iterative updates of the model result in parameter drift, while mistakenly retrieving irrelevant modules leads to the mismatch during inference. To this end, we propose MOdel Surgery (MOS) to rescue the model from forgetting previous knowledge. By training task-specific adapters, we continually adjust the PTM to downstream tasks. To mitigate parameter-level forgetting, we present an adapter merging approach to learn task-specific adapters, which aims to bridge the gap between different components while reserve task-specific information. Besides, to address retrieval-level forgetting, we introduce a training-free self-refined adapter retrieval mechanism during inference, which leverages the model's inherent ability for better adapter retrieval. By jointly rectifying the model with those steps, MOS can robustly resist catastrophic forgetting in the learning process. Extensive experiments on seven benchmark datasets validate MOS's state-of-the-art performance. Code is available at: //github.com/sun-hailong/AAAI25-MOS