蜜芽亚洲精品国产品国语在线试看-亚洲AV永久少妇精品一区在线

We propose a novel framework that leverages Visual Question Answering (VQA) models to automate the evaluation of LLM-generated data visualizations. Traditional evaluation methods often rely on human judgment, which is costly and unscalable, or focus solely on data accuracy, neglecting the effectiveness of visual communication. By employing VQA models, we assess data representation quality and the general communicative clarity of charts. Experiments were conducted using two leading VQA benchmark datasets, ChartQA and PlotQA, with visualizations generated by OpenAI's GPT-3.5 Turbo and Meta's Llama 3.1 70B-Instruct models. Our results indicate that LLM-generated charts do not match the accuracy of the original non-LLM-generated charts based on VQA performance measures. Moreover, while our results demonstrate that few-shot prompting significantly boosts the accuracy of chart generation, considerable progress remains to be made before LLMs can fully match the precision of human-generated graphs. This underscores the importance of our work, which expedites the research process by enabling rapid iteration without the need for human annotation, thus accelerating advancements in this field.

知識薈萃

精品入門和進階教程、論文和代碼整理等

查看相關VIP內容、論文、資訊等

AI · Integration · Prompt · CRAFT · 語言模型化 ·

2024 年 11 月 5 日

From Pen to Prompt: How Creative Writers Integrate AI into their Writing Practice

Alicia Guo,Shreya Sathyanarayanan,Leijie Wang,Jeffrey Heer,Amy Zhang

Creative writers have a love for their craft, yet AI systems using large language models (LLMs) offer the automation of significant parts of the writing process. So why do some creative writers choose to integrate AI into their workflows? To explore this, we interview and observe a writing session with 18 creative writers who already use AI regularly in their writing practice. Our findings reveal that creative writers are intentional about how they incorporate AI, making many deliberate decisions about when and how to engage AI based on the core values they hold about writing. These values, such as authenticity and craftsmanship, alongside writers' relationships with and use of AI influence the parts of writing over which they wish to maintain control. Through our analysis, we contribute a taxonomy of writer values, writer relationships with AI, and integration strategies, and discuss how these three elements interrelate.

Learning · 稀疏 · 控制器 · 可約的 · 超參數 ·

2024 年 11 月 4 日

RobotKeyframing: Learning Locomotion with High-Level Objectives via Mixture of Dense and Sparse Rewards

Fatemeh Zargarbashi,Jin Cheng,Dongho Kang,Robert Sumner,Stelian Coros

from arxiv, This paper has been accepted to 8th Conference on Robot Learning (CoRL 2024). Project website: //sites.google.com/view/robot-keyframing

This paper presents a novel learning-based control framework that uses keyframing to incorporate high-level objectives in natural locomotion for legged robots. These high-level objectives are specified as a variable number of partial or complete pose targets that are spaced arbitrarily in time. Our proposed framework utilizes a multi-critic reinforcement learning algorithm to effectively handle the mixture of dense and sparse rewards. Additionally, it employs a transformer-based encoder to accommodate a variable number of input targets, each associated with specific time-to-arrivals. Throughout simulation and hardware experiments, we demonstrate that our framework can effectively satisfy the target keyframe sequence at the required times. In the experiments, the multi-critic method significantly reduces the effort of hyperparameter tuning compared to the standard single-critic alternative. Moreover, the proposed transformer-based architecture enables robots to anticipate future goals, which results in quantitative improvements in their ability to reach their targets.

命名實體識別 · entity · 語言模型化 · 樣例 · 標注 ·

2024 年 11 月 4 日

ReverseNER: A Self-Generated Example-Driven Framework for Zero-Shot Named Entity Recognition with Large Language Models

Anbang Wang

This paper presents ReverseNER, a framework aimed at overcoming the limitations of large language models (LLMs) in zero-shot Named Entity Recognition (NER) tasks, particularly in cases where certain entity types have ambiguous boundaries. ReverseNER tackles this challenge by constructing a reliable example library with the reversed process of NER. Rather than beginning with sentences, this method uses an LLM to generate entities based on their definitions and then expands them into full sentences. During sentence generation, the LLM is guided to replicate the structure of a specific 'feature sentence', extracted from the task sentences by clustering. This results in well-annotated sentences with clearly labeled entities, while preserving semantic and structural similarity to the task sentences. Once the example library is constructed, the method selects the most semantically similar example labels for each task sentence to support the LLM's inference. We also propose an entity-level self-consistency scoring mechanism to improve NER performance with LLMs. Experiments show that ReverseNER significantly outperforms traditional zero-shot NER with LLMs and surpasses several few-shot methods, marking a notable improvement in NER for domains with limited labeled data.

推斷 · CoT · Performer · 可約的 · Better ·

2024 年 11 月 4 日

Nash CoT: Multi-Path Inference with Preference Equilibrium

Ziqi Zhang,Cunxiang Wang,Xiong Xiao,Yue Zhang,Donglin Wang

Chain of thought (CoT) is a reasoning framework that can enhance the performance of Large Language Models (LLMs) on complex inference tasks. In particular, among various studies related to CoT, multi-path inference stands out as a simple yet effective improvement. However, there is no optimal setting for the number of inference paths. Therefore, we have to increase the number of inference paths to obtain better results, which in turn increases the inference cost. To address this limitation, we can utilize question-related role templates to guide LLMs into relevant roles, thereby increasing the possibility of correct inferences for each path and further reducing dependence on the number of inference paths while improving reasoning accuracy. However, placing LLMs into specific roles may reduce their reasoning diversity and performance on a few tasks where role dependence is low. To alleviate the excessive immersion of the LLM into a specific role, we propose Nash CoT by constructing a competitive system on each path that balances the generation from role-specific LLMs' and the general LLMs' generation, thereby ensuring both effective role adoption and diversity in LLM generation further maintaining the performance of multi-path inference while reducing the requirement of the number of inference paths. We evaluate Nash CoT across various inference tasks, including Arabic Reasoning, Commonsense Question Answering, and Symbolic Inference, achieving results that are comparable to or better than those of multi-path CoT with the equal number of inference paths.

MoDELS · 全 · 語言模型化 · tuning · Prompt ·

2024 年 11 月 4 日

Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study

André Storhaug,Jingyue Li

from arxiv, 12 pages, 3 figures, 4 tables, 1 listing

The advent of large language models (LLMs) like GitHub Copilot has significantly enhanced programmers' productivity, particularly in code generation. However, these models often struggle with real-world tasks without fine-tuning. As LLMs grow larger and more performant, fine-tuning for specialized tasks becomes increasingly expensive. Parameter-efficient fine-tuning (PEFT) methods, which fine-tune only a subset of model parameters, offer a promising solution by reducing the computational costs of tuning LLMs while maintaining their performance. Existing studies have explored using PEFT and LLMs for various code-related tasks and found that the effectiveness of PEFT techniques is task-dependent. The application of PEFT techniques in unit test generation remains underexplored. The state-of-the-art is limited to using LLMs with full fine-tuning to generate unit tests. This paper investigates both full fine-tuning and various PEFT methods, including LoRA, (IA)^3, and prompt tuning, across different model architectures and sizes. We use well-established benchmark datasets to evaluate their effectiveness in unit test generation. Our findings show that PEFT methods can deliver performance comparable to full fine-tuning for unit test generation, making specialized fine-tuning more accessible and cost-effective. Notably, prompt tuning is the most effective in terms of cost and resource utilization, while LoRA approaches the effectiveness of full fine-tuning in several cases.

數學 · 值域 · HTTPS · 話題 · 講稿 ·

2024 年 11 月 3 日

PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

George Tsoukalas,Jasper Lee,John Jennings,Jimmy Xin,Michelle Ding,Michael Jennings,Amitayush Thakur,Swarat Chaudhuri

from arxiv, Accepted at NeurIPS 2024 Datasets & Benchmarks Track

We present PutnamBench, a new multi-language benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1692 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the problems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. PutnamBench requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available at //github.com/trishullab/PutnamBench.

MoDELS · 成比例 · 估計/估計量 · 模型選擇 · 錯誤率 ·

2024 年 11 月 3 日

DSDE: Using Proportion Estimation to Improve Model Selection for Out-of-Distribution Detection

Jingyao Geng,Yuan Zhang,Jiaqi Huang,Feng Xue,Falong Tan,Chuanlong Xie,Shumei Zhang

from arxiv, 16 pages, 2 figures

Model library is an effective tool for improving the performance of single-model Out-of-Distribution (OoD) detector, mainly through model selection and detector fusion. However, existing methods in the literature do not provide uncertainty quantification for model selection results. Additionally, the model ensemble process primarily focuses on controlling the True Positive Rate (TPR) while neglecting the False Positive Rate (FPR). In this paper, we emphasize the significance of the proportion of models in the library that identify the test sample as an OoD sample. This proportion holds crucial information and directly influences the error rate of OoD detection.To address this, we propose inverting the commonly-used sequential p-value strategies. We define the rejection region initially and then estimate the error rate. Furthermore, we introduce a novel perspective from change-point detection and propose an approach for proportion estimation with automatic hyperparameter selection. We name the proposed approach as DOS-Storey-based Detector Ensemble (DSDE). Experimental results on CIFAR10 and CIFAR100 demonstrate the effectiveness of our approach in tackling OoD detection challenges. Specifically, the CIFAR10 experiments show that DSDE reduces the FPR from 11.07% to 3.31% compared to the top-performing single-model detector.

解碼 · 錯誤率 · 比特 · Microsoft Surface · Integration ·

2024 年 11 月 3 日

Study of Iterative Detection and Decoding for RIS-Aided Multiuser Multi-Antenna Systems

R. Porto,R. C. de Lamare

from arxiv, 5 figures, 6 pages

We present a novel iterative detection and decoding (IDD) scheme for Reconfigurable Intelligent Surface (RIS)-assisted multiuser multiple-antenna systems. The proposed approach introduces a joint iterative detection strategy that integrates Low-Density Parity-Check (LDPC) codes, RIS processing and iterative detection and decoding. In particular, we employ a minimum mean square error receive filter that performs truncation at the RIS and soft interference cancelation at the receiver. Simulation results evaluate the system's overall capacity and bit error rate, and demonstrate substantial improvements in bit error rate across block-fading channels.

估計/估計量 · 穩健性 · Integration · Extensibility · MoDELS ·

2024 年 11 月 1 日

RopeTP: Global Human Motion Recovery via Integrating Robust Pose Estimation with Diffusion Trajectory Prior

Mingjiang Liang,Yongkang Cheng,Hualin Liang,Shaoli Huang,Wei Liu

from arxiv, Accepted by WACV 2025 (Round 1)

We present RopeTP, a novel framework that combines Robust pose estimation with a diffusion Trajectory Prior to reconstruct global human motion from videos. At the heart of RopeTP is a hierarchical attention mechanism that significantly improves context awareness, which is essential for accurately inferring the posture of occluded body parts. This is achieved by exploiting the relationships with visible anatomical structures, enhancing the accuracy of local pose estimations. The improved robustness of these local estimations allows for the reconstruction of precise and stable global trajectories. Additionally, RopeTP incorporates a diffusion trajectory model that predicts realistic human motion from local pose sequences. This model ensures that the generated trajectories are not only consistent with observed local actions but also unfold naturally over time, thereby improving the realism and stability of 3D human motion reconstruction. Extensive experimental validation shows that RopeTP surpasses current methods on two benchmark datasets, particularly excelling in scenarios with occlusions. It also outperforms methods that rely on SLAM for initial camera estimates and extensive optimization, delivering more accurate and realistic trajectories.

核化 · 稀疏 · 近似 · 模態 · 知識 (knowledge) ·

2024 年 10 月 31 日

Blind Time-of-Flight Imaging: Sparse Deconvolution on the Continuum with Unknown Kernels

Ruiming Guo,Ayush Bhandari

from arxiv, 27 pages

In recent years, computational Time-of-Flight (ToF) imaging has emerged as an exciting and a novel imaging modality that offers new and powerful interpretations of natural scenes, with applications extending to 3D, light-in-flight, and non-line-of-sight imaging. Mathematically, ToF imaging relies on algorithmic super-resolution, as the back-scattered sparse light echoes lie on a finer time resolution than what digital devices can capture. Traditional methods necessitate knowledge of the emitted light pulses or kernels and employ sparse deconvolution to recover scenes. Unlike previous approaches, this paper introduces a novel, blind ToF imaging technique that does not require kernel calibration and recovers sparse spikes on a continuum, rather than a discrete grid. By studying the shared characteristics of various ToF modalities, we capitalize on the fact that most physical pulses approximately satisfy the Strang-Fix conditions from approximation theory. This leads to a new mathematical formulation for sparse super-resolution. Our recovery approach uses an optimization method that is pivoted on an alternating minimization strategy. We benchmark our blind ToF method against traditional kernel calibration methods, which serve as the baseline. Extensive hardware experiments across different ToF modalities demonstrate the algorithmic advantages, flexibility and empirical robustness of our approach. We show that our work facilitates super-resolution in scenarios where distinguishing between closely spaced objects is challenging, while maintaining performance comparable to known kernel situations. Examples of light-in-flight imaging and light-sweep videos highlight the practical benefits of our blind super-resolution method in enhancing the understanding of natural scenes.