
Recent advancements in Generative AI, particularly in Large Language Models (LLMs) and Large Vision-Language Models (LVLMs), offer new possibilities for integrating cognitive planning into robotic systems. In this work, we present a novel framework for solving the object goal navigation problem that generates efficient exploration strategies. Our approach enables a robot to navigate unfamiliar environments by leveraging LLMs and LVLMs to understand the semantic structure of the scene. To address the challenge of representing complex environments without overwhelming the system, we propose a 3D modular scene representation, enriched with semantic descriptions. This representation is dynamically pruned using an LLM-based mechanism, which filters irrelevant information and focuses on task-specific data. By combining these elements, our system generates high-level sub-goals that guide the exploration of the robot toward the target object. We validate our approach in simulated environments, demonstrating its ability to enhance object search efficiency while maintaining scalability in complex settings.

Related content

MoDELS

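The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems is the premier conference series for model-driven software and systems engineering, organized with the support of ACM-SIGSOFT and IEEE-TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. Its participants come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community an opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website:

Transformation · Learning · Scaling · Denoising ·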

December 17, 2024

Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning

Moritz Reuss,Jyothish Pari,Pulkit Agrawal,Rudolf Lioutikov

Diffusion Policies have become widely used in Imitation Learning, offering several appealing properties, such as generating multimodal and discontinuous behavior. As models are becoming larger to capture more complex capabilities, their computational demands increase, as shown by recent scaling laws. Therefore, continuing with the current architectures will present a computational roadblock. To address this gap, we propose Mixture-of-Denoising Experts (MoDE) as a novel policy for Imitation Learning. MoDE surpasses current state-of-the-art Transformer-based Diffusion Policies while enabling parameter-efficient scaling through sparse experts and noise-conditioned routing, reducing active parameters by 40% and inference costs by 90% via expert caching. Our architecture combines this efficient scaling with a noise-conditioned self-attention mechanism, enabling more effective denoising across different noise levels. MoDE achieves state-of-the-art performance on 134 tasks in four established imitation learning benchmarks (CALVIN and LIBERO). Notably, by pretraining MoDE on diverse robotics data, we achieve 4.01 on CALVIN ABC and 0.95 on LIBERO-90. It surpasses both CNN-based and Transformer Diffusion Policies by an average of 57% across 4 benchmarks, while using 90% fewer FLOPs and fewer active parameters compared to default Diffusion Transformer architectures. Furthermore, we conduct comprehensive ablations on MoDE's components, providing insights for designing efficient and scalable Transformer architectures for Diffusion Policies. Code and demonstrations are available at //mbreuss.github.io/MoDE_Diffusion_Policy/.
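To make the idea of noise-conditioned routing with expert caching more concrete, here is a minimal sketch in plain Python. It is not the MoDE implementation: the linear router, the scalar "experts", and the names `route` and `moe_layer` are all hypothetical, and the sketch assumes that expert selection depends only on the noise level, so the selection can be computed once per noise level and reused.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(noise_level, router_weights, top_k=2):
    """Pick top_k experts from the noise level alone (hypothetical linear router).

    router_weights holds one (weight, bias) pair per expert.
    Returns {expert_index: gate}, with gates renormalised to sum to 1.
    """
    scores = [w * noise_level + b for (w, b) in router_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_layer(x, noise_level, experts, router_weights, cache, top_k=2):
    """Sparse mixture layer: only the selected experts are evaluated.

    Because routing here ignores x, the gates for a given noise level
    are stored in `cache` and reused on later calls.
    """
    if noise_level not in cache:
        cache[noise_level] = route(noise_level, router_weights, top_k)
    gates = cache[noise_level]
    return sum(g * experts[i](x) for i, g in gates.items())


# Toy setup: four scalar "experts" and made-up router parameters.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: -x]
router_weights = [(1.0, 0.0), (-1.0, 0.0), (0.5, 0.0), (0.0, 0.0)]
```

In this toy setup a high noise level selects experts 0 and 2 while a low (negative) one selects experts 1 and 3, and a denoising loop that revisits the same noise levels would pay for routing only once per level.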

Annotation · SOFT · Dataset · Language modeling · Comprehensibility ·

December 17, 2024

Can Large Language Models Understand You Better? An MBTI Personality Detection Dataset Aligned with Population Traits

Bohan Li,Jiannan Guan,Longxu Dou,Yunlong Feng,Dingzirui Wang,Yang Xu,Enbo Wang,Qiguang Chen,Bichen Wang,Xiao Xu,Yimeng Zhang,Libo Qin,Yanyan Zhao,Qingfu Zhu,Wanxiang Che

from arxiv, Accepted by COLING 2025. 28 pages, 20 figures, 10 tables

The Myers-Briggs Type Indicator (MBTI) is one of the most influential personality theories reflecting individual differences in thinking, feeling, and behaving. MBTI personality detection has garnered considerable research interest and has evolved significantly over the years. However, this task tends to be overly optimistic, as it currently does not align well with the natural distribution of population personality traits. Specifically, (1) the self-reported labels in existing datasets result in incorrect labeling issues, and (2) the hard labels fail to capture the full range of population personality distributions. In this paper, we optimize the task by constructing MBTIBench, the first manually annotated high-quality MBTI personality detection dataset with soft labels, under the guidance of psychologists. As for the first challenge, MBTIBench effectively solves the incorrect labeling issues, which account for 29.58% of the data. As for the second challenge, we estimate soft labels by deriving the polarity tendency of samples. The obtained soft labels confirm that there are more people with non-extreme personality traits. Experimental results not only highlight the polarized predictions and biases in LLMs as key directions for future research, but also confirm that soft labels can provide more benefits to other psychological tasks than hard labels. The code and data are available at //github.com/Personality-NLP/MbtiBench.

Scaling · Factorized · Correlation coefficient · Noise · ML ·

December 17, 2024

Scaling up the Banded Matrix Factorization Mechanism for Differentially Private ML

Ryan McKenna

Correlated noise mechanisms such as DP Matrix Factorization (DP-MF) have proven to be effective alternatives to DP-SGD in large-epsilon few-epoch training regimes. Significant work has been done to find the best correlated noise strategies, and the current state-of-the-art approach is DP-BandMF, which optimally balances the benefits of privacy amplification and noise correlation. Despite its utility advantages, severe scalability limitations prevent this mechanism from handling large-scale training scenarios where the number of training iterations may exceed $10^4$ and the number of model parameters may exceed $10^7$. In this work, we present techniques to scale up DP-BandMF along these two dimensions, significantly extending its reach and enabling it to handle settings with virtually any number of model parameters and training iterations, with negligible utility degradation.
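As a rough illustration of why a banded structure helps with correlated noise, the sketch below generates noise y that satisfies C y = z for a lower-triangular banded Toeplitz matrix C, using forward substitution. This is an assumption-laden toy, not the paper's scaling technique: the function name `solve_banded_toeplitz` and the band coefficients are hypothetical, and the noise is scalar rather than one vector per model parameter. The point is only that each step needs the previous b-1 outputs, where b is the number of bands, rather than the full history.

```python
import random
from collections import deque


def solve_banded_toeplitz(coeffs, z):
    """Solve C y = z by forward substitution.

    C is lower-triangular Toeplitz with bands coeffs[0] (diagonal),
    coeffs[1] (first sub-diagonal), and so on. Memory use is
    O(len(coeffs)), independent of the number of steps.
    """
    history = deque(maxlen=len(coeffs) - 1)  # most recent output first
    out = []
    for z_t in z:
        # Contribution of the previous outputs that fall inside the band.
        acc = sum(c * y_prev for c, y_prev in zip(coeffs[1:], history))
        y_t = (z_t - acc) / coeffs[0]
        history.appendleft(y_t)
        out.append(y_t)
    return out


def correlated_noise(coeffs, num_steps, sigma=1.0, seed=0):
    """Draw i.i.d. Gaussian noise z and return the correlated noise y."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, sigma) for _ in range(num_steps)]
    return solve_banded_toeplitz(coeffs, z)
```

With a single band (`coeffs = [1.0]`) the output is just the independent noise, which corresponds to the uncorrelated case; adding bands makes successive noise values partially cancel.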

Feasible · Robot · Constraint · Lecture notes · Operation ·

December 16, 2024

A Feasible Workflow for Retinal Vein Cannulation in Ex Vivo Porcine Eyes with Robotic Assistance

Peiyao Zhang,Peter Gehlbach,Marin Kobilarov,Iulian Iordachita

A potential Retinal Vein Occlusion (RVO) treatment involves Retinal Vein Cannulation (RVC), which requires the surgeon to insert a microneedle into the affected retinal vein and administer a clot-dissolving drug. This procedure presents significant challenges due to human physiological limitations, such as hand tremors, prolonged tool-holding periods, and constraints in depth perception using a microscope. This study proposes a robot-assisted workflow for RVC to overcome these limitations. The test robot is operated through a keyboard. An intraoperative Optical Coherence Tomography (iOCT) system is used to verify successful venous puncture before infusion. The workflow is validated using 12 ex vivo porcine eyes. These early results demonstrate a success rate of 10 out of 12 cannulations (83.33%), affirming the feasibility of the proposed workflow.

Edge · Edge computing · Neural Networks · Integration · Performer ·

December 15, 2024

Deployment Pipeline from Rockpool to Xylo for Edge Computing

Peng Zhou,Dylan R. Muir

Deploying Spiking Neural Networks (SNNs) on the Xylo neuromorphic chip via the Rockpool framework represents a significant advancement in achieving ultra-low-power consumption and high computational efficiency for edge applications. This paper details a novel deployment pipeline, emphasizing the integration of Rockpool's capabilities with Xylo's architecture, and evaluates the system's performance in terms of energy efficiency and accuracy. The unique advantages of the Xylo chip, including its digital spiking architecture and event-driven processing model, are highlighted to demonstrate its suitability for real-time, power-sensitive applications.

Knowledge · Graph · Knowledge graph · Link prediction · MoDELS ·

December 13, 2024

A Survey on Knowledge Graph Structure and Knowledge Graph Embeddings

Jeffrey Sardina,John D. Kelleher,Declan O'Sullivan

Knowledge Graphs (KGs) and their machine learning counterpart, Knowledge Graph Embedding Models (KGEMs), have seen ever-increasing use in a wide variety of academic and applied settings. In particular, KGEMs are typically applied to KGs to solve the link prediction task, i.e., to predict new facts in the domain of a KG based on existing, observed facts. While this approach has shown substantial power in many end-use cases, it remains incompletely characterised in terms of how KGEMs react differently to KG structure. This is of particular concern in light of recent studies showing that KG structure can be a significant source of bias as well as a partial determinant of overall KGEM performance. This paper seeks to address this gap in the state-of-the-art. This paper provides, to the authors' knowledge, the first comprehensive survey exploring established relationships of Knowledge Graph Embedding Models and Graph structure in the literature. It is the hope of the authors that this work will inspire further studies in this area, and contribute to a more holistic understanding of KGs, KGEMs, and the link prediction task.
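For readers unfamiliar with link prediction, the following minimal sketch shows how an embedding model can rank candidate tail entities for a query (head, relation, ?). It uses a TransE-style translation score as one well-known example of a KGEM; the abstract does not single out this model, and the entities, the relation, and the 2-D embedding values are all made up for illustration.

```python
import math

# Toy 2-D embeddings; every value here is hypothetical.
ENTITIES = {
    "dublin": [0.0, 0.0],
    "ireland": [1.0, 0.0],
    "paris": [0.0, 1.0],
    "france": [1.0, 1.0],
}
RELATIONS = {"capital_of": [1.0, 0.0]}


def score(head, relation, tail):
    """TransE-style score: negative distance between head + relation and tail."""
    h, r, t = ENTITIES[head], RELATIONS[relation], ENTITIES[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def predict_tails(head, relation):
    """Rank every other entity as a candidate tail, best first."""
    candidates = [e for e in ENTITIES if e != head]
    return sorted(candidates, key=lambda t: score(head, relation, t), reverse=True)
```

In a real KGEM the embeddings are learned from the observed triples, so the graph's structure (for instance, how often an entity appears) shapes which candidates end up ranked highly.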

MoDELS · Language modeling · Performer · Large language models · Optimizer ·

December 13, 2024

Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators

Dingkang Yang,Dongling Xiao,Jinjie Wei,Mingcheng Li,Zhaoyu Chen,Ke Li,Lihua Zhang

from arxiv, Accepted by AAAI 2025

Despite their remarkable capabilities, Large Language Models (LLMs) are prone to generating responses that contradict verifiable facts, i.e., unfaithful hallucination content. Existing efforts generally focus on optimizing model parameters or editing semantic representations, which compromise the internal factual knowledge of target LLMs. In addition, hallucinations typically exhibit multifaceted patterns in downstream tasks, limiting the model's holistic performance across tasks. In this paper, we propose a Comparator-driven Decoding-Time (CDT) framework to alleviate response hallucination. First, we construct hallucinatory and truthful comparators with multi-task fine-tuning samples. We then present an instruction prototype-guided mixture of experts strategy to enhance the ability of the corresponding comparators to capture different hallucination or truthfulness patterns in distinct task instructions. CDT constrains next-token predictions to factuality-robust distributions by contrasting the logit differences between the target LLMs and these comparators. Systematic experiments on multiple downstream tasks show that our framework can significantly improve the model performance and response factuality.
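The general idea of contrasting logits at decoding time can be sketched as follows. The abstract does not give CDT's exact formula, so the combination rule below (target logits plus a weighted difference between the truthful and hallucinatory comparator logits), the weight `alpha`, and the function names are assumptions made purely for illustration.

```python
import math


def contrast_logits(target, truthful, hallucinatory, alpha=1.0):
    """Shift target logits toward tokens the truthful comparator prefers
    and away from tokens the hallucinatory comparator prefers.

    This additive rule is an assumed stand-in, not the CDT formula.
    """
    return [
        t + alpha * (tr - h)
        for t, tr, h in zip(target, truthful, hallucinatory)
    ]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def next_token(target, truthful, hallucinatory, alpha=1.0):
    """Greedy pick from the adjusted next-token distribution."""
    probs = softmax(contrast_logits(target, truthful, hallucinatory, alpha))
    return max(range(len(probs)), key=lambda i: probs[i])
```

For example, when the target model is undecided between two tokens, a truthful comparator that favours the first and a hallucinatory comparator that favours the second break the tie in favour of the first.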

Estimation/Estimator · Factorized · Biased · Full · Sensor ·

December 12, 2024

Full Magnetometer and Gyroscope Bias Estimation using Angular Rates: Theory and Experimental Evaluation of a Factor Graph-Based Approach

Sebastián Rodríguez-Martínez,Giancarlo Troni

from arxiv, 10 pages, 9 figures, submitted to IEEE Journal of Ocean Engineering. arXiv admin note: substantial text overlap with arXiv:2410.13827

Despite their widespread use in determining system attitude, Micro-Electro-Mechanical Systems (MEMS) Attitude and Heading Reference Systems (AHRS) are limited by sensor measurement biases. This paper introduces a method called MAgnetometer and GYroscope Calibration (MAGYC), leveraging three-axis angular rate measurements from an angular rate gyroscope to estimate both the hard- and soft-iron biases of magnetometers as well as the bias of gyroscopes. We present two implementation methods of this approach based on batch and online incremental factor graphs. Our method imposes fewer restrictions on instrument movements required for calibration, eliminates the need for knowledge of the local magnetic field magnitude or instrument's attitude, and facilitates integration into factor graph algorithms for Smoothing and Mapping frameworks. We validate the proposed methods through numerical simulations and in-field experimental evaluations with a sensor onboard an underwater vehicle. By implementing the proposed method in field data of a seafloor mapping dive, the dead reckoning-based position estimation error of the underwater vehicle was reduced from 10% to 0.5% of the distance traveled.

May 31, 2023

A Survey on Large Language Models for Recommendation

Likang Wu,Zhi Zheng,Zhaopeng Qiu,Hao Wang,Hongchao Gu,Tingjia Shen,Chuan Qin,Chen Zhu,Hengshu Zhu,Qi Liu,Hui Xiong,Enhong Chen

from arxiv, 10 pages, 3 figures

Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP) and have recently gained significant attention in the domain of Recommendation Systems (RS). These models, trained on massive amounts of data using self-supervised learning, have demonstrated remarkable success in learning universal representations and have the potential to enhance various aspects of recommendation systems through effective transfer techniques such as fine-tuning and prompt tuning. The crucial aspect of harnessing the power of language models in enhancing recommendation quality is the utilization of their high-quality representations of textual features and their extensive coverage of external knowledge to establish correlations between items and users. To provide a comprehensive understanding of the existing LLM-based recommendation systems, this survey presents a taxonomy that categorizes these models into two major paradigms, Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec), with the latter being systematically organized for the first time. Furthermore, we systematically review and analyze existing LLM-based recommendation systems within each paradigm, providing insights into their methodologies, techniques, and performance. Additionally, we identify key challenges and several valuable findings to provide researchers and practitioners with inspiration.

Visual question answering · Question answering · MoDELS · Identifiable · Attention mechanism ·

February 15, 2018

Learning to Count Objects in Natural Images for Visual Question Answering

Yan Zhang,Jonathon Hare,Adam Prügel-Bennett

from arxiv, Published in ICLR 2018

Visual Question Answering (VQA) models have struggled with counting objects in natural images so far. We identify a fundamental problem due to soft attention in these models as a cause. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component gives a substantial improvement in counting over a strong baseline by 6.6%.