日韩在线精品小视频,久久免费观看黄色视频,亚洲已满18点击进入在线观看,全免费免A级毛片费观看

Large Language Models (LLMs) have demonstrated remarkable generation capabilities but often struggle to access up-to-date information, which can lead to hallucinations. Retrieval-Augmented Generation (RAG) addresses this issue by incorporating knowledge from external databases, enabling more accurate and relevant responses. Due to the context window constraints of LLMs, it is impractical to input the entire external database context directly into the model. Instead, only the most relevant information, referred to as chunks, is selectively retrieved. However, current RAG research faces three key challenges. First, existing solutions often select each chunk independently, overlooking potential correlations among them. Second, in practice the utility of chunks is non-monotonic, meaning that adding more chunks can decrease overall utility. Traditional methods emphasize maximizing the number of included chunks, which can inadvertently compromise performance. Third, each type of user query possesses unique characteristics that require tailored handling, an aspect that current approaches do not fully consider. To overcome these challenges, we propose a cost constrained retrieval optimization system CORAG for retrieval-augmented generation. We employ a Monte Carlo Tree Search (MCTS) based policy framework to find optimal chunk combinations sequentially, allowing for a comprehensive consideration of correlations among chunks. Additionally, rather than viewing budget exhaustion as a termination condition, we integrate budget constraints into the optimization of chunk combinations, effectively addressing the non-monotonicity of chunk utility.

相關內容

優化器

關注 4

代碼 · 原點 · 相似度 · MoDELS · 語言模型化 ·

2024 年 12 月 16 日

Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting

Tong Ye,Yangkai Du,Tengfei Ma,Lingfei Wu,Xuhong Zhang,Shouling Ji,Wenhai Wang

from arxiv, Accepted by AAAI 2025; previously submitted to EMNLP 2023

Large Language Models (LLMs) have demonstrated remarkable proficiency in generating code. However, the misuse of LLM-generated (synthetic) code has raised concerns in both educational and industrial contexts, underscoring the urgent need for synthetic code detectors. Existing methods for detecting synthetic content are primarily designed for general text and struggle with code due to the unique grammatical structure of programming languages and the presence of numerous ''low-entropy'' tokens. Building on this, our work proposes a novel zero-shot synthetic code detector based on the similarity between the original code and its LLM-rewritten variants. Our method is based on the observation that differences between LLM-rewritten and original code tend to be smaller when the original code is synthetic. We utilize self-supervised contrastive learning to train a code similarity model and evaluate our approach on two synthetic code detection benchmarks. Our results demonstrate a significant improvement over existing SOTA synthetic content detectors, with AUROC scores increasing by 20.5% on the APPS benchmark and 29.1% on the MBPP benchmark.

知識 (knowledge) · 估計/估計量 · MoDELS · 語言模型化 · 置信度 ·

2024 年 12 月 16 日

UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models

Boyang Xue,Fei Mi,Qi Zhu,Hongru Wang,Rui Wang,Sheng Wang,Erxin Yu,Xuming Hu,Kam-Fai Wong

Despite demonstrating impressive capabilities, Large Language Models (LLMs) still often struggle to accurately express the factual knowledge they possess, especially in cases where the LLMs' knowledge boundaries are ambiguous. To improve LLMs' factual expressions, we propose the UAlign framework, which leverages Uncertainty estimations to represent knowledge boundaries, and then explicitly incorporates these representations as input features into prompts for LLMs to Align with factual knowledge. First, we prepare the dataset on knowledge question-answering (QA) samples by calculating two uncertainty estimations, including confidence score and semantic entropy, to represent the knowledge boundaries for LLMs. Subsequently, using the prepared dataset, we train a reward model that incorporates uncertainty estimations and then employ the Proximal Policy Optimization (PPO) algorithm for factuality alignment on LLMs. Experimental results indicate that, by integrating uncertainty representations in LLM alignment, the proposed UAlign can significantly enhance the LLMs' capacities to confidently answer known questions and refuse unknown questions on both in-domain and out-of-domain tasks, showing reliability improvements and good generalizability over various prompt- and training-based baselines.

Hive · 控制器 · Agent · MoDELS · 語言模型化 ·

2024 年 12 月 16 日

Harnessing Language for Coordination: A Framework and Benchmark for LLM-Driven Multi-Agent Control

Timothée Anne,Noah Syrkis,Meriem Elhosni,Florian Turati,Franck Legendre,Alain Jaquier,Sebastian Risi

Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. A promising but largely under-explored area is their potential to facilitate human coordination with many agents. Such capabilities would be useful in domains including disaster response, urban planning, and real-time strategy scenarios. In this work, we introduce (1) a real-time strategy game benchmark designed to evaluate these abilities and (2) a novel framework we term HIVE. HIVE empowers a single human to coordinate swarms of up to 2,000 agents using natural language dialog with an LLM. We present promising results on this multi-agent benchmark, with our hybrid approach solving tasks such as coordinating agent movements, exploiting unit weaknesses, leveraging human annotations, and understanding terrain and strategic points. However, our findings also highlight critical limitations of current models, including difficulties in processing spatial visual information and challenges in formulating long-term strategic plans. This work sheds light on the potential and limitations of LLMs in human-swarm coordination, paving the way for future research in this area. The HIVE project page, which includes videos of the system in action, can be found here: hive.syrkis.com.

相同 · 穩健性 · 設計 · 數據集 · Processing（編程語言） ·

2024 年 12 月 15 日

VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping

Hao Shao,Shulun Wang,Yang Zhou,Guanglu Song,Dailan He,Shuo Qin,Zhuofan Zong,Bingqi Ma,Yu Liu,Hongsheng Li

from arxiv, project page: //hao-shao.com/projects/vividface.html

Video face swapping is becoming increasingly popular across various applications, yet existing methods primarily focus on static images and struggle with video face swapping because of temporal consistency and complex scenarios. In this paper, we present the first diffusion-based framework specifically designed for video face swapping. Our approach introduces a novel image-video hybrid training framework that leverages both abundant static image data and temporal video sequences, addressing the inherent limitations of video-only training. The framework incorporates a specially designed diffusion model coupled with a VidFaceVAE that effectively processes both types of data to better maintain temporal coherence of the generated videos. To further disentangle identity and pose features, we construct the Attribute-Identity Disentanglement Triplet (AIDT) Dataset, where each triplet has three face images, with two images sharing the same pose and two sharing the same identity. Enhanced with a comprehensive occlusion augmentation, this dataset also improves robustness against occlusions. Additionally, we integrate 3D reconstruction techniques as input conditioning to our network for handling large pose variations. Extensive experiments demonstrate that our framework achieves superior performance in identity preservation, temporal consistency, and visual quality compared to existing methods, while requiring fewer inference steps. Our approach effectively mitigates key challenges in video face swapping, including temporal flickering, identity preservation, and robustness to occlusions and pose variations.

cache · MoDELS · 模型評估 · 語言模型化 · Processing（編程語言） ·

2024 年 12 月 13 日

DroidSpeak: KV Cache Sharing for Efficient Multi-LLM Serving

Yuhan Liu,Yuyang Huang,Jiayi Yao,Zhuohan Gu,Kuntai Du,Hanchen Li,Yihua Cheng,Junchen Jiang,Shan Lu,Madan Musuvathi,Esha Choukse

Large Language Models (LLMs) are increasingly employed in complex workflows, where different LLMs and fine-tuned variants collaboratively address complex tasks. However, these systems face significant inefficiencies due to redundant context processing of the shared context. We propose DroidSpeak, a framework that optimizes context sharing between fine-tuned LLMs derived from the same foundational model. DroidSpeak identifies critical layers in the KV cache and selectively recomputes them, enabling effective reuse of intermediate data while maintaining high accuracy. Our approach balances computational efficiency and task fidelity, significantly reducing inference latency and throughput bottlenecks. Experiments on diverse datasets and model pairs demonstrate that DroidSpeak achieves up to 3x higher throughputs and 2.6x faster prefill times with negligible accuracy loss compared to full recomputation.

MoDELS · Learning · 生成模型 · TOS · 圖 ·

2024 年 12 月 13 日

A Fixed-Point Approach for Causal Generative Modeling

Meyer Scetbon,Joel Jennings,Agrin Hilmkil,Cheng Zhang,Chao Ma

We propose a novel formalism for describing Structural Causal Models (SCMs) as fixed-point problems on causally ordered variables, eliminating the need for Directed Acyclic Graphs (DAGs), and establish the weakest known conditions for their unique recovery given the topological ordering (TO). Based on this, we design a two-stage causal generative model that first infers in a zero-shot manner a valid TO from observations, and then learns the generative SCM on the ordered variables. To infer TOs, we propose to amortize the learning of TOs on synthetically generated datasets by sequentially predicting the leaves of graphs seen during training. To learn SCMs, we design a transformer-based architecture that exploits a new attention mechanism enabling the modeling of causal structures, and show that this parameterization is consistent with our formalism. Finally, we conduct an extensive evaluation of each method individually, and show that when combined, our model outperforms various baselines on generated out-of-distribution problems. The code is available on \href{//github.com/microsoft/causica/tree/main/research_experiments/fip}{Github}.

MoDELS · 多峰值 · Performer · 數據集 · Processing（編程語言） ·

2024 年 12 月 13 日

DriveMM: All-in-One Large Multimodal Model for Autonomous Driving

Zhijian Huang,Chengjian Feng,Feng Yan,Baihui Xiao,Zequn Jie,Yujie Zhong,Xiaodan Liang,Lin Ma

Large Multimodal Models (LMMs) have demonstrated exceptional comprehension and interpretation capabilities in Autonomous Driving (AD) by incorporating large language models. Despite the advancements, current data-driven AD approaches tend to concentrate on a single dataset and specific tasks, neglecting their overall capabilities and ability to generalize. To bridge these gaps, we propose DriveMM, a general large multimodal model designed to process diverse data inputs, such as images and multi-view videos, while performing a broad spectrum of AD tasks, including perception, prediction, and planning. Initially, the model undergoes curriculum pre-training to process varied visual signals and perform basic visual comprehension and perception tasks. Subsequently, we augment and standardize various AD-related datasets to fine-tune the model, resulting in an all-in-one LMM for autonomous driving. To assess the general capabilities and generalization ability, we conduct evaluations on six public benchmarks and undertake zero-shot transfer on an unseen dataset, where DriveMM achieves state-of-the-art performance across all tasks. We hope DriveMM as a promising solution for future end-to-end autonomous driving applications in the real world. Project page with code: //github.com/zhijian11/DriveMM.

MoDELS · 可理解性 · GM · 解碼 · 圖像字幕 ·

2024 年 12 月 12 日

Causal Graphical Models for Vision-Language Compositional Understanding

Fiorenzo Parascandolo,Nicholas Moratelli,Enver Sangineto,Lorenzo Baraldi,Rita Cucchiara

Recent work has empirically shown that Vision-Language Models (VLMs) struggle to fully understand the compositional properties of the human language, usually modeling an image caption as a "bag of words". As a result, they perform poorly on compositional tasks, which require a deeper understanding of the different entities of a sentence (subject, verb, etc.) jointly with their mutual relationships in order to be solved. In this paper, we model the dependency relations among textual and visual tokens using a Causal Graphical Model (CGM), built using a dependency parser, and we train a decoder conditioned by the VLM visual encoder. Differently from standard autoregressive or parallel predictions, our decoder's generative process is partially-ordered following the CGM structure. This structure encourages the decoder to learn only the main causal dependencies in a sentence discarding spurious correlations. Using extensive experiments on five compositional benchmarks, we show that our method significantly outperforms all the state-of-the-art compositional approaches by a large margin, and it also improves over methods trained using much larger datasets.

縮放 · 模型評估 · 大語言模型 · 語言模型化 · CoT ·

2024 年 12 月 12 日

Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning

Zhenni Bi,Kai Han,Chuanjian Liu,Yehui Tang,Yunhe Wang

from arxiv, Preprint

Large Language Models (LLMs) have shown remarkable abilities across various language tasks, but solving complex reasoning problems remains a challenge. While existing methods like Chain-of-Thought (CoT) and Tree-of-Thought (ToT) enhance reasoning by decomposing problems or structuring prompts, they typically perform a single pass of reasoning and may fail to revisit flawed paths, compromising accuracy. To address this, we propose a novel reasoning framework called Forest-of-Thought (FoT), which integrates multiple reasoning trees to leverage collective decision-making for solving complex logical problems. FoT utilizes sparse activation strategies to select the most relevant reasoning paths, improving both efficiency and accuracy. Additionally, we introduce a dynamic self-correction strategy that enables real-time error correction and learning from past mistakes, as well as consensus-guided decision making strategies to optimize correctness and computational resources. Experimental results demonstrate that the FoT framework, combined with these strategies, significantly enhances the reasoning capabilities of LLMs, enabling them to solve complex tasks with greater precision and efficiency.

向量化 · MoDELS · 潛在 · Continuity · Learning ·

2024 年 12 月 11 日

SVGFusion: Scalable Text-to-SVG Generation via Vector Space Diffusion

Ximing Xing,Juncheng Hu,Jing Zhang,Dong Xu,Qian Yu

from arxiv, project page: \href{//ximinng.github.io/SVGFusionProject/}{//ximinng.github.io/SVGFusionProject/}

The generation of Scalable Vector Graphics (SVG) assets from textual data remains a significant challenge, largely due to the scarcity of high-quality vector datasets and the limitations in scalable vector representations required for modeling intricate graphic distributions. This work introduces SVGFusion, a Text-to-SVG model capable of scaling to real-world SVG data without reliance on a text-based discrete language model or prolonged SDS optimization. The essence of SVGFusion is to learn a continuous latent space for vector graphics with a popular Text-to-Image framework. Specifically, SVGFusion consists of two modules: a Vector-Pixel Fusion Variational Autoencoder (VP-VAE) and a Vector Space Diffusion Transformer (VS-DiT). VP-VAE takes both the SVGs and corresponding rasterizations as inputs and learns a continuous latent space, whereas VS-DiT learns to generate a latent code within this space based on the text prompt. Based on VP-VAE, a novel rendering sequence modeling strategy is proposed to enable the latent space to embed the knowledge of construction logics in SVGs. This empowers the model to achieve human-like design capabilities in vector graphics, while systematically preventing occlusion in complex graphic compositions. Moreover, our SVGFusion's ability can be continuously improved by leveraging the scalability of the VS-DiT by adding more VS-DiT blocks. A large-scale SVG dataset is collected to evaluate the effectiveness of our proposed method. Extensive experimentation has confirmed the superiority of our SVGFusion over existing SVG generation methods, achieving enhanced quality and generalizability, thereby establishing a novel framework for SVG content creation. Code, model, and data will be released at: \href{//ximinng.github.io/SVGFusionProject/}{//ximinng.github.io/SVGFusionProject/}