亚洲精品无码国产爽快A片百度,女人让男人桶爽在线观看,欧美精品日韩精品国内精品

Xiang Li,Cristina Mata,Jongwoo Park,Kumara Kahatapitiya,Yoo Sung Jang,Jinghuan Shang,Kanchana Ranasinghe,Ryan Burgert,Mu Cai,Yong Jae Lee,Michael S. Ryoo

LLMs with visual inputs, i.e., Vision Language Models (VLMs), have the capacity to process state information as visual-textual prompts and respond with policy decisions in text. We propose LLaRA: Large Language and Robotics Assistant, a framework that formulates robot action policy as conversations and provides improved action outputs when trained with auxiliary data that complements policy learning. We first introduce an automated pipeline to generate conversation-style instruction tuning data from existing behavior cloning data. Then we enrich the dataset in a self-supervised fashion by formulating six auxiliary tasks. A VLM finetuned with the resulting collection of datasets can generate meaningful robot action policy decisions. Our experiments across multiple simulated and real-world environments demonstrate the state-of-the-art performance of the proposed LLaRA framework. The code, datasets, and pretrained models are available at //github.com/LostXine/LLaRA.

相關內容

LLaRA

關注 0

圖 · 縮放 · 不變 · 尺度不變性 · Learning ·

2024 年 11 月 13 日

ScaleNet: Scale Invariance Learning in Directed Graphs

Qin Jiang,Chengjia Wang,Michael Lones,Wei Pang

from arxiv, Scale invariance in node classification is demonstrated and applied in graph transformation to develop ScaleNet, which achieves state-of-the-art performance on both homophilic and heterophilic directed graphs

Graph Neural Networks (GNNs) have advanced relational data analysis but lack invariance learning techniques common in image classification. In node classification with GNNs, it is actually the ego-graph of the center node that is classified. This research extends the scale invariance concept to node classification by drawing an analogy to image processing: just as scale invariance being used in image classification to capture multi-scale features, we propose the concept of ``scaled ego-graphs''. Scaled ego-graphs generalize traditional ego-graphs by replacing undirected single-edges with ``scaled-edges'', which are ordered sequences of multiple directed edges. We empirically assess the performance of the proposed scale invariance in graphs on seven benchmark datasets, across both homophilic and heterophilic structures. Our scale-invariance-based graph learning outperforms inception models derived from random walks by being simpler, faster, and more accurate. The scale invariance explains inception models' success on homophilic graphs and limitations on heterophilic graphs. To ensure applicability of inception model to heterophilic graphs as well, we further present ScaleNet, an architecture that leverages multi-scaled features. ScaleNet achieves state-of-the-art results on five out of seven datasets (four homophilic and one heterophilic) and matches top performance on the remaining two, demonstrating its excellent applicability. This represents a significant advance in graph learning, offering a unified framework that enhances node classification across various graph types. Our code is available at //github.com/Qin87/ScaleNet/tree/July25.

Learning · AI · 數據集 · 可理解性 · 確切的 ·

2024 年 11 月 13 日

V-LoL: A Diagnostic Dataset for Visual Logical Learning

Lukas Helff,Wolfgang Stammer,Hikaru Shindo,Devendra Singh Dhami,Kristian Kersting

Despite the successes of recent developments in visual AI, different shortcomings still exist; from missing exact logical reasoning, to abstract generalization abilities, to understanding complex and noisy scenes. Unfortunately, existing benchmarks, were not designed to capture more than a few of these aspects. Whereas deep learning datasets focus on visually complex data but simple visual reasoning tasks, inductive logic datasets involve complex logical learning tasks, however, lack the visual component. To address this, we propose the diagnostic visual logical learning dataset, V-LoL, that seamlessly combines visual and logical challenges. Notably, we introduce the first instantiation of V-LoL, V-LoL-Train, - a visual rendition of a classic benchmark in symbolic AI, the Michalski train problem. By incorporating intricate visual scenes and flexible logical reasoning tasks within a versatile framework, V-LoL-Train provides a platform for investigating a wide range of visual logical learning challenges. We evaluate a variety of AI systems including traditional symbolic AI, neural AI, as well as neuro-symbolic AI. Our evaluations demonstrate that even SOTA AI faces difficulties in dealing with visual logical learning challenges, highlighting unique advantages and limitations of each methodology. Overall, V-LoL opens up new avenues for understanding and enhancing current abilities in visual logical learning for AI systems.

收縮 · 查準率/準確率 · TOOLS · Analysis · 代碼 ·

2024 年 11 月 12 日

SoliDiffy: AST Differencing for Solidity Smart Contracts

Mojtaba Eshghie,Viktor ?ryd,Martin Monperrus,Cyrille Artho

Smart contracts, primarily written in Solidity, are integral to blockchain software applications, yet precise analysis and maintenance are hindered by the limitations of existing differencing tools. We introduce SoliDiffy, a novel Abstract Syntax Tree (AST) differencing tool specifically designed for Solidity. SoliDiffy enables fine-grained analysis by generating accurate and concise edit scripts of smart contracts, making it ideal for downstream tasks such as vulnerability detection, automated code repair, and code reviews. Our comprehensive evaluation on a large dataset of real-world Solidity contracts demonstrates that SoliDiffy delivers shorter and more precise edit scripts compared to state-of-the-art tools, while performing consistently in complex contract modifications. SoliDiffy is made publicly available at //github.com/mojtaba-eshghie/SoliDiffy.

代碼 · Analysis · Integration · 語言模型化 · MoDELS ·

2024 年 11 月 11 日

ROCODE: Integrating Backtracking Mechanism and Program Analysis in Large Language Models for Code Generation

Xue Jiang,Yihong Dong,Yongding Tao,Huanyu Liu,Zhi Jin,Wenpin Jiao,Ge Li

from arxiv, ICSE 2025

Large language models (LLMs) have achieved impressive performance in code generation recently, offering programmers revolutionary assistance in software development. However, due to the auto-regressive nature of LLMs, they are susceptible to error accumulation during code generation. Once an error is produced, LLMs can merely continue to generate the subsequent code conditioned on it, given their inability to adjust previous outputs. Existing LLM-based approaches typically consider post-revising after code generation, leading to the challenging resolution of accumulated errors and the significant wastage of resources. Ideally, LLMs should rollback and resolve the occurred error in time during code generation, rather than proceed on the basis of the error and wait for post-revising after generation. In this paper, we propose ROCODE, which integrates the backtracking mechanism and program analysis into LLMs for code generation. Specifically, we employ program analysis to perform incremental error detection during the generation process. When an error is detected, the backtracking mechanism is triggered to priming rollback strategies and constraint regeneration, thereby eliminating the error early and ensuring continued generation on the correct basis. Experiments on multiple code generation benchmarks show that ROCODE can significantly reduce the errors generated by LLMs, with a compilation pass rate of 99.1%. The test pass rate is improved by up to 23.8% compared to the best baseline approach. Compared to the post-revising baseline, the token cost is reduced by 19.3%. Moreover, our approach is model-agnostic and achieves consistent improvements across nine representative LLMs.

機器人 · Learning · 設計 · Principle · Integration ·

2024 年 11 月 9 日

TiniScript: A Simplified Language for Educational Robotics

Gabriel Gonzalo Guzman Ramos,Pedro Jesus Guzman Ramos

from arxiv, 10 pages, 5 figures. For associated resources and block-based programming interface, see tinibot.pe. This work explores TiniScripts design for simplified, real-time robotics programming aimed at educational environments, emphasizing accessibility and creative engagement in STEM learning

TiniScript is an intermediate programming language designed for educational robotics, aligned with STEM principles to foster integrative learning experiences. With its minimalist single-line syntax, such as F(2, 80) , TiniScript simplifies robotic programming, allowing users to bypass complex code uploading processes and enabling realtime direct instruction transmission. Thanks to its preloaded interpreter, TiniScript decouples programming from hardware, significantly reducing wait times. Instructions can be sent wirelessly from any Bluetooth enabled device, making TiniScript adaptable to various robots. This adaptability optimizes iterative and collaborative learning, allowing students to focus on the creative aspects of robotics. This paper explores TiniScripts design principles, syntax, and practical applications, highlighting its potential to make robotics programming more accessible and effective in developing critical thinking skills.

話題 · MoDELS · 話題模型 · 噪聲 · 評論員 ·

2024 年 11 月 8 日

BERTrend: Neural Topic Modeling for Emerging Trends Detection

Allaa Boutaleb,Jerome Picault,Guillaume Grosjean

from arxiv, 17 pages, 12 figures, FuturED 2024: Workshop on Future of Event Detection (CoLocated with EMNLP 2024)

Detecting and tracking emerging trends and weak signals in large, evolving text corpora is vital for applications such as monitoring scientific literature, managing brand reputation, surveilling critical infrastructure and more generally to any kind of text-based event detection. Existing solutions often fail to capture the nuanced context or dynamically track evolving patterns over time. BERTrend, a novel method, addresses these limitations using neural topic modeling in an online setting. It introduces a new metric to quantify topic popularity over time by considering both the number of documents and update frequency. This metric classifies topics as noise, weak, or strong signals, flagging emerging, rapidly growing topics for further investigation. Experimentation on two large real-world datasets demonstrates BERTrend's ability to accurately detect and track meaningful weak signals while filtering out noise, offering a comprehensive solution for monitoring emerging trends in large-scale, evolving text corpora. The method can also be used for retrospective analysis of past events. In addition, the use of Large Language Models together with BERTrend offers efficient means for the interpretability of trends of events.

MoDELS · 語言模型化 · 大語言模型 · DATE · Performer ·

2024 年 11 月 8 日

LLM4Mat-Bench: Benchmarking Large Language Models for Materials Property Prediction

Andre Niyongabo Rubungo,Kangming Li,Jason Hattrick-Simpers,Adji Bousso Dieng

from arxiv, Accepted at NeurIPS 2024-AI4Mat Workshop. The Benchmark and code can be found at: //github.com/vertaix/LLM4Mat-Bench

Large language models (LLMs) are increasingly being used in materials science. However, little attention has been given to benchmarking and standardized evaluation for LLM-based materials property prediction, which hinders progress. We present LLM4Mat-Bench, the largest benchmark to date for evaluating the performance of LLMs in predicting the properties of crystalline materials. LLM4Mat-Bench contains about 1.9M crystal structures in total, collected from 10 publicly available materials data sources, and 45 distinct properties. LLM4Mat-Bench features different input modalities: crystal composition, CIF, and crystal text description, with 4.7M, 615.5M, and 3.1B tokens in total for each modality, respectively. We use LLM4Mat-Bench to fine-tune models with different sizes, including LLM-Prop and MatBERT, and provide zero-shot and few-shot prompts to evaluate the property prediction capabilities of LLM-chat-like models, including Llama, Gemma, and Mistral. The results highlight the challenges of general-purpose LLMs in materials science and the need for task-specific predictive models and task-specific instruction-tuned LLMs in materials property prediction.

MoDELS · Taxonomy · 語言模型化 · 可理解性 · Performance ·

2023 年 9 月 2 日

Explainability for Large Language Models: A Survey

Haiyan Zhao,Hanjie Chen,Fan Yang,Ninghao Liu,Huiqi Deng,Hengyi Cai,Shuaiqiang Wang,Dawei Yin,Mengnan Du

Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear and this lack of transparency poses unwanted risks for downstream applications. Therefore, understanding and explaining these models is crucial for elucidating their behaviors, limitations, and social impacts. In this paper, we introduce a taxonomy of explainability techniques and provide a structured overview of methods for explaining Transformer-based language models. We categorize techniques based on the training paradigms of LLMs: traditional fine-tuning-based paradigm and prompting-based paradigm. For each paradigm, we summarize the goals and dominant approaches for generating local explanations of individual predictions and global explanations of overall model knowledge. We also discuss metrics for evaluating generated explanations, and discuss how explanations can be leveraged to debug models and improve performance. Lastly, we examine key challenges and emerging opportunities for explanation techniques in the era of LLMs in comparison to conventional machine learning models.

INFORMS · 語言模型化 · IR · MoDELS · Integration ·

2023 年 8 月 15 日

Large Language Models for Information Retrieval: A Survey

Yutao Zhu,Huaying Yuan,Shuting Wang,Jiongnan Liu,Wenhan Liu,Chenlong Deng,Zhicheng Dou,Ji-Rong Wen

As a primary means of information acquisition, information retrieval (IR) systems, such as search engines, have integrated themselves into our daily lives. These systems also serve as components of dialogue, question-answering, and recommender systems. The trajectory of IR has evolved dynamically from its origins in term-based methods to its integration with advanced neural models. While the neural models excel at capturing complex contextual signals and semantic nuances, thereby reshaping the IR landscape, they still face challenges such as data scarcity, interpretability, and the generation of contextually plausible yet potentially inaccurate responses. This evolution requires a combination of both traditional methods (such as term-based sparse retrieval methods with rapid response) and modern neural architectures (such as language models with powerful language understanding capacity). Meanwhile, the emergence of large language models (LLMs), typified by ChatGPT and GPT-4, has revolutionized natural language processing due to their remarkable language understanding, generation, generalization, and reasoning abilities. Consequently, recent research has sought to leverage LLMs to improve IR systems. Given the rapid evolution of this research trajectory, it is necessary to consolidate existing methodologies and provide nuanced insights through a comprehensive overview. In this survey, we delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers. Additionally, we explore promising directions within this expanding field.

任務對話系統 · INFORMS · 圖 · Networking · entity ·

2020 年 8 月 11 日

KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue

Xiaoze Jiang,Siyi Du,Zengchang Qin,Yajing Sun,Jing Yu

from arxiv, Accepted by the 28th ACM International Conference on Multimedia (ACM MM 2020)

Visual dialogue is a challenging task that needs to extract implicit information from both visual (image) and textual (dialogue history) contexts. Classical approaches pay more attention to the integration of the current question, vision knowledge and text knowledge, despising the heterogeneous semantic gaps between the cross-modal information. In the meantime, the concatenation operation has become de-facto standard to the cross-modal information fusion, which has a limited ability in information retrieval. In this paper, we propose a novel Knowledge-Bridge Graph Network (KBGN) model by using graph to bridge the cross-modal semantic relations between vision and text knowledge in fine granularity, as well as retrieving required knowledge via an adaptive information selection mode. Moreover, the reasoning clues for visual dialogue can be clearly drawn from intra-modal entities and inter-modal bridges. Experimental results on VisDial v1.0 and VisDial-Q datasets demonstrate that our model outperforms exiting models with state-of-the-art results.