秋霞网一区二区三区_91日韩专区综合第一页_久久久毛片无码免费收看_国产欧美日韩高清乱码_在线看国产黄AV免费_91自拍与自偷拍精品_在线精品亚洲欧美日韩国产

Questions combine our mastery of language with our remarkable facility for reasoning about uncertainty. How do people navigate vast hypothesis spaces to pose informative questions given limited cognitive resources? We study these tradeoffs in a classic grounded question-asking task based on the board game Battleship. Our language-informed program sampling (LIPS) model uses large language models (LLMs) to generate natural language questions, translate them into symbolic programs, and evaluate their expected information gain. We find that with a surprisingly modest resource budget, this simple Monte Carlo optimization strategy yields informative questions that mirror human performance across varied Battleship board scenarios. In contrast, LLM-only baselines struggle to ground questions in the board state; notably, GPT-4V provides no improvement over non-visual baselines. Our results illustrate how Bayesian models of question-asking can leverage the statistics of language to capture human priors, while highlighting some shortcomings of pure LLMs as grounded reasoners.

相關內容

INFORMS

關注 10

《計算機信息》雜志發表高質量的論文，擴大了運籌學和計算的范圍，尋求有關理論、方法、實驗、系統和應用方面的原創研究論文、新穎的調查和教程論文，以及描述新的和有用的軟件工具的論文。官網鏈接： · 語言模型化 · MoDELS · 大語言模型 · TOOLS ·

2024 年 6 月 12 日

LLM-Craft: Robotic Crafting of Elasto-Plastic Objects with Large Language Models

Alison Bartsch,Amir Barati Farimani

When humans create sculptures, we are able to reason about how geometrically we need to alter the clay state to reach our target goal. We are not computing point-wise similarity metrics, or reasoning about low-level positioning of our tools, but instead determining the higher-level changes that need to be made. In this work, we propose LLM-Craft, a novel pipeline that leverages large language models (LLMs) to iteratively reason about and generate deformation-based crafting action sequences. We simplify and couple the state and action representations to further encourage shape-based reasoning. To the best of our knowledge, LLM-Craft is the first system successfully leveraging LLMs for complex deformable object interactions. Through our experiments, we demonstrate that with the LLM-Craft framework, LLMs are able to successfully reason about the deformation behavior of elasto-plastic objects. Furthermore, we find that LLM-Craft is able to successfully create a set of simple letter shapes. Finally, we explore extending the framework to reaching more ambiguous semantic goals, such as "thinner" or "bumpy". For videos please see our website: //sites.google.com/andrew.cmu.edu/llmcraft.

CASES · Chatbot · Prompt · MoDELS · 大語言模型 ·

2024 年 6 月 12 日

RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots

Philip Feldman,James R. Foulds,Shimei Pan

from arxiv, 7 Pages, 1 Figure, 1 Table

Large language models (LLMs) like ChatGPT demonstrate the remarkable progress of artificial intelligence. However, their tendency to hallucinate -- generate plausible but false information -- poses a significant challenge. This issue is critical, as seen in recent court cases where ChatGPT's use led to citations of non-existent legal rulings. This paper explores how Retrieval-Augmented Generation (RAG) can counter hallucinations by integrating external knowledge with prompts. We empirically evaluate RAG against standard LLMs using prompts designed to induce hallucinations. Our results show that RAG increases accuracy in some cases, but can still be misled when prompts directly contradict the model's pre-trained understanding. These findings highlight the complex nature of hallucinations and the need for more robust solutions to ensure LLM reliability in real-world applications. We offer practical recommendations for RAG deployment and discuss implications for the development of more trustworthy LLMs.

線性的 · Attention · 解碼 · MoDELS · 語言模型化 ·

2024 年 6 月 11 日

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

Haoran You,Yichao Fu,Zheng Wang,Amir Yazdanbakhsh, Yingyan, Lin

from arxiv, Accepted by ICML 2024; 17 pages; 10 figures; 16 tables

Autoregressive Large Language Models (LLMs) have achieved impressive performance in language tasks but face two significant bottlenecks: (1) quadratic complexity in the attention module as the number of tokens increases, and (2) limited efficiency due to the sequential processing nature of autoregressive LLMs during generation. While linear attention and speculative decoding offer potential solutions, their applicability and synergistic potential for enhancing autoregressive LLMs remain uncertain. We conduct the first comprehensive study on the efficacy of existing linear attention methods for autoregressive LLMs, integrating them with speculative decoding. We introduce an augmentation technique for linear attention that ensures compatibility with speculative decoding, enabling more efficient training and serving of LLMs. Extensive experiments and ablation studies involving seven existing linear attention models and five encoder/decoder-based LLMs consistently validate the effectiveness of our augmented linearized LLMs. Notably, our approach achieves up to a 6.67 reduction in perplexity on the LLaMA model and up to a 2$\times$ speedup during generation compared to prior linear attention methods. Codes and models are available at //github.com/GATECH-EIC/Linearized-LLM.

Agent · 模型評估 · Analysis · 論文 · 自然語言處理 ·

2024 年 6 月 11 日

Speaking Your Language: Spatial Relationships in Interpretable Emergent Communication

Olaf Lipinski,Adam J. Sobey,Federico Cerutti,Timothy J. Norman

from arxiv, 16 pages, 3 figures

Effective communication requires the ability to refer to specific parts of an observation in relation to others. While emergent communication literature shows success in developing various language properties, no research has shown the emergence of such positional references. This paper demonstrates how agents can communicate about spatial relationships within their observations. The results indicate that agents can develop a language capable of expressing the relationships between parts of their observation, achieving over 90% accuracy when trained in a referential game which requires such communication. Using a collocation measure, we demonstrate how the agents create such references. This analysis suggests that agents use a mixture of non-compositional and compositional messages to convey spatial relationships. We also show that the emergent language is interpretable by humans. The translation accuracy is tested by communicating with the receiver agent, where the receiver achieves over 78% accuracy using parts of this lexicon, confirming that the interpretation of the emergent language was successful.

Analysis · 可理解性 · AI · MoDELS · INTERACT ·

2024 年 6 月 11 日

Deception Analysis with Artificial Intelligence: An Interdisciplinary Perspective

Stefan Sarkadi

from arxiv, Work in progress

Humans and machines interact more frequently than ever and our societies are becoming increasingly hybrid. A consequence of this hybridisation is the degradation of societal trust due to the prevalence of AI-enabled deception. Yet, despite our understanding of the role of trust in AI in the recent years, we still do not have a computational theory to be able to fully understand and explain the role deception plays in this context. This is a problem because while our ability to explain deception in hybrid societies is delayed, the design of AI agents may keep advancing towards fully autonomous deceptive machines, which would pose new challenges to dealing with deception. In this paper we build a timely and meaningful interdisciplinary perspective on deceptive AI and reinforce a 20 year old socio-cognitive perspective on trust and deception, by proposing the development of DAMAS -- a holistic Multi-Agent Systems (MAS) framework for the socio-cognitive modelling and analysis of deception. In a nutshell this paper covers the topic of modelling and explaining deception using AI approaches from the perspectives of Computer Science, Philosophy, Psychology, Ethics, and Intelligence Analysis.

虛擬現實（VR） · INTERACT · VR · 可辨認的 · 可約的 ·

2024 年 6 月 7 日

Mind Mansion: Exploring Metaphorical Interactions to Engage with Negative Thoughts in Virtual Reality

Julian Rasch,Michelle Johanna Zender,Sophia Sakel,Nadine Wagener

from arxiv, To appear in Proceedings of the Designing Interactive Systems Conference (DIS '24), July 1-5, 2024, IT University of Copenhagen, Denmark

Recurrent negative thoughts can significantly disrupt daily life and contribute to negative emotional states. Facing, confronting, and noticing such thoughts without support can be challenging. To provide a playful setting and leverage the technical maturation of Virtual Reality (VR), our VR experience, Mind Mansion, places the user in an initially cluttered virtual apartment. Here we utilize established concepts from traditional therapy and metaphors identified in prior works to let users engage metaphorically with representations of thoughts, gradually sorting the space, fostering awareness of thoughts, and supporting mental self-care. The results of our user study (n = 30) reveal that Mind Mansion encourages the exploration of alternative perspectives, fosters acceptance, and potentially offers new coping mechanisms. Our findings suggest that this VR intervention can reduce negative affect and improve overall emotional awareness.

MoDELS · INFORMS · Machine Learning · 機器學習模型 · 得分 ·

2024 年 6 月 7 日

MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models

Sanjoy Chowdhury,Sayan Nag,K J Joseph,Balaji Vasan Srinivasan,Dinesh Manocha

from arxiv, Accepted at CVPR 2024 as Highlight paper. Webpage: //schowdhury671.github.io/melfusion_cvpr2024/

Music is a universal language that can communicate emotions and feelings. It forms an essential part of the whole spectrum of creative media, ranging from movies to social media posts. Machine learning models that can synthesize music are predominantly conditioned on textual descriptions of it. Inspired by how musicians compose music not just from a movie script, but also through visualizations, we propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music. MeLFusion is a text-to-music diffusion model with a novel "visual synapse", which effectively infuses the semantics from the visual modality into the generated music. To facilitate research in this area, we introduce a new dataset MeLBench, and propose a new evaluation metric IMSM. Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music, measured both objectively and subjectively, with a relative gain of up to 67.98% on the FAD score. We hope that our work will gather attention to this pragmatic, yet relatively under-explored research area.

MoDELS · 語言模型化 · 大語言模型 · Processing（編程語言） · Notability ·

2024 年 6 月 7 日

CLOMO: Counterfactual Logical Modification with Large Language Models

Yinya Huang,Ruixin Hong,Hongming Zhang,Wei Shao,Zhicheng Yang,Dong Yu,Changshui Zhang,Xiaodan Liang,Linqi Song

In this study, we delve into the realm of counterfactual reasoning capabilities of large language models (LLMs). Our primary objective is to cultivate the counterfactual thought processes within LLMs and rigorously assess these processes for their validity. Specifically, we introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark. In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship. To effectively evaluate a generation model's counterfactual capabilities, we propose an innovative evaluation metric, the decomposed Self-Evaluation Score (SES) to directly evaluate the natural language output of LLMs instead of modeling the task as a multiple-choice problem. Analysis shows that the proposed automatic metric aligns well with human preference. Our experimental results show that while LLMs demonstrate a notable capacity for logical counterfactual thinking, there remains a discernible gap between their current abilities and human performance. Code and data are available at //github.com/Eleanor-H/CLOMO.

數據集 · Integration · 地球 · TOOLS · Processing（編程語言） ·

2024 年 6 月 6 日

M3LEO: A Multi-Modal, Multi-Label Earth Observation Dataset Integrating Interferometric SAR and RGB Data

Matthew J Allen,Francisco Dorr,Joseph Alejandro Gallego Mejia,Laura Martínez-Ferrer,Anna Jungbluth,Freddie Kalaitzis,Raúl Ramos-Pollán

from arxiv, 9 pages, 2 figures

Satellite-based remote sensing has revolutionised the way we address global challenges in a rapidly evolving world. Huge quantities of Earth Observation (EO) data are generated by satellite sensors daily, but processing these large datasets for use in ML pipelines is technically and computationally challenging. Specifically, different types of EO data are often hosted on a variety of platforms, with differing availability for Python preprocessing tools. In addition, spatial alignment across data sources and data tiling can present significant technical hurdles for novice users. While some preprocessed EO datasets exist, their content is often limited to optical or near-optical wavelength data, which is ineffective at night or in adverse weather conditions. Synthetic Aperture Radar (SAR), an active sensing technique based on microwave length radiation, offers a viable alternative. However, the application of machine learning to SAR has been limited due to a lack of ML-ready data and pipelines, particularly for the full diversity of SAR data, including polarimetry, coherence and interferometry. We introduce M3LEO, a multi-modal, multi-label EO dataset that includes polarimetric, interferometric, and coherence SAR data derived from Sentinel-1, alongside Sentinel-2 RGB imagery and a suite of labelled tasks for model evaluation. M3LEO spans 17.5TB and contains approximately 10M data chips across six geographic regions. The dataset is complemented by a flexible PyTorch Lightning framework, with configuration management using Hydra. We provide tools to process any dataset available on popular platforms such as Google Earth Engine for integration with our framework. Initial experiments validate the utility of our data and framework, showing that SAR imagery contains information additional to that extractable from RGB data. Data at huggingface.co/M3LEO, and code at github.com/spaceml-org/M3LEO.

代碼 · Prompt · Python · 查全率/召回率 · 模型評估 ·

2024 年 6 月 6 日

Refactoring to Pythonic Idioms: A Hybrid Knowledge-Driven Approach Leveraging Large Language Models

Zejun Zhang,Zhenchang Xing,Xiaoxue Ren,Qinghua Lu,Xiwei Xu

from arxiv, Accepted by FSE 2024,22 pages

Pythonic idioms are highly valued and widely used in the Python programming community. However, many Python users find it challenging to use Pythonic idioms. Adopting a rule-based approach or LLM-only approach is not sufficient to overcome three persistent challenges of code idiomatization including code miss, wrong detection and wrong refactoring. Motivated by the determinism of rules and adaptability of LLMs, we propose a hybrid approach consisting of three modules. We not only write prompts to instruct LLMs to complete tasks, but we also invoke Analytic Rule Interfaces (ARIs) to accomplish tasks. The ARIs are Python code generated by prompting LLMs to generate code. We first construct a knowledge module with three elements including ASTscenario, ASTcomponent and Condition, and prompt LLMs to generate Python code for incorporation into an ARI library for subsequent use. After that, for any syntax-error-free Python code, we invoke ARIs from the ARI library to extract ASTcomponent from the ASTscenario, and then filter out ASTcomponent that does not meet the condition. Finally, we design prompts to instruct LLMs to abstract and idiomatize code, and then invoke ARIs from the ARI library to rewrite non-idiomatic code into the idiomatic code. Next, we conduct a comprehensive evaluation of our approach, RIdiom, and Prompt-LLM on nine established Pythonic idioms in RIdiom. Our approach exhibits superior accuracy, F1-score, and recall, while maintaining precision levels comparable to RIdiom, all of which consistently exceed or come close to 90% for each metric of each idiom. Lastly, we extend our evaluation to encompass four new Pythonic idioms. Our approach consistently outperforms Prompt-LLM, achieving metrics with values consistently exceeding 90% for accuracy, F1-score, precision, and recall.