This paper presents MULAN-WC, a novel multi-robot 3D reconstruction framework that leverages wireless signal-based coordination between robots and Neural Radiance Fields (NeRF). Our approach addresses key challenges in multi-robot 3D reconstruction, including inter-robot pose estimation, localization uncertainty quantification, and active next-best-view selection. We introduce a method that uses wireless Angle-of-Arrival (AoA) and ranging measurements to estimate relative poses between robots. We also quantify the uncertainty embedded in these wireless pose estimates and incorporate it into the NeRF training loss to mitigate the impact of inaccurate camera poses. Furthermore, we propose an active view selection approach that accounts for robot pose uncertainty when determining the next-best views for improving the 3D reconstruction, enabling faster convergence. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of our framework in theory and in practice. By leveraging wireless coordination and localization uncertainty-aware training, MULAN-WC achieves high-quality 3D reconstruction close to that obtained with ground-truth camera poses. In addition, quantifying the information gain of a novel view yields consistent improvements in rendering quality from incrementally captured images, obtained by commanding the robot to the novel view position. Our hardware experiments showcase the practicality of deploying MULAN-WC on real robotic systems.
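
As a rough illustration of the two ideas in this abstract, the sketch below converts a range and an AoA measurement into a relative position and down-weights image errors by pose uncertainty. The function names, the frame convention (azimuth measured from the x-axis, elevation from the horizontal plane), and the `1 / (1 + variance)` weighting are assumptions made for illustration; the abstract does not give the actual formulas used by MULAN-WC.

```python
import math

def relative_position(range_m, azimuth_rad, elevation_rad=0.0):
    """Convert a range and an AoA (azimuth, elevation) into a 3D offset.

    Assumed convention: offset is expressed in the receiving robot's frame,
    azimuth is measured from the x-axis, elevation from the x-y plane.
    """
    horizontal = range_m * math.cos(elevation_rad)
    x = horizontal * math.cos(azimuth_rad)
    y = horizontal * math.sin(azimuth_rad)
    z = range_m * math.sin(elevation_rad)
    return (x, y, z)

def uncertainty_weighted_loss(per_image_errors, pose_variances):
    """Weighted mean of per-image errors; uncertain poses count for less.

    The 1 / (1 + variance) weight is a hypothetical choice, not the paper's.
    """
    weights = [1.0 / (1.0 + v) for v in pose_variances]
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, per_image_errors)) / total
```

With equal variances the weighted loss reduces to the plain mean, so the weighting only changes training when some camera poses are less trustworthy than others.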

Related content

3D Reconstruction

In computer vision, 3D reconstruction is the process of recovering three-dimensional information from single-view or multi-view images. Because a single view carries incomplete information, single-view reconstruction must rely on prior knowledge. Multi-view 3D reconstruction (similar to human binocular localization) is comparatively easy: the cameras are first calibrated, that is, the relationship between each camera's image coordinate system and the world coordinate system is computed, and the information in multiple 2D images is then used to reconstruct the 3D information. 3D object reconstruction is a common scientific problem and core technology in fields such as computer-aided geometric design (CAGD), computer graphics (CG), computer animation, computer vision, medical image processing, scientific computing, virtual reality, and digital media creation. There are two main ways to generate a 3D representation of an object in a computer. One uses geometric modeling software to produce a 3D geometric model under human control through human-computer interaction; the other acquires the geometric shape of a real object by some means. The technology behind the former is already mature and is supported by software such as 3DMAX, Maya, AutoCAD, and UG, which generally represent geometry with curves and surfaces that have mathematical expressions. The latter is generally called the 3D reconstruction process: the mathematical process and computer techniques for recovering an object's 3D information (shape and so on) from 2D projections, including data acquisition, preprocessing, point cloud registration, and feature analysis.

Learning · Processing (programming language) · Integration · Knowledge · Agent

May 2, 2024

Attention-Driven Multi-Agent Reinforcement Learning: Enhancing Decisions with Expertise-Informed Tasks

Andre R Kuroswiski,Annie S Wu,Angelo Passaro

In this paper, we introduce an alternative approach to enhancing Multi-Agent Reinforcement Learning (MARL) through the integration of domain knowledge and attention-based policy mechanisms. Our methodology focuses on incorporating domain-specific expertise into the learning process, which simplifies the development of collaborative behaviors. This approach aims to reduce the complexity and learning overhead typically associated with MARL by enabling agents to concentrate on essential aspects of complex tasks, thus optimizing the learning curve. Attention mechanisms play a key role in our model: they allow effective processing of dynamic context data and nuanced agent interactions, leading to more refined decision-making. Applied in standard MARL scenarios, such as the Stanford Intelligent Systems Laboratory (SISL) Pursuit and Multi-Particle Environments (MPE) Simple Spread, our method has been shown to improve both learning efficiency and the effectiveness of collaborative behaviors. The results indicate that our attention-based approach can be a viable way to improve the efficiency of the MARL training process by integrating domain-specific knowledge at the action level.
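
To make the attention idea concrete, here is a minimal sketch of standard scaled dot-product attention, in which one agent's query vector is compared against feature vectors describing other entities. It is a generic illustration, not the architecture from this paper; the names `attend`, `query`, `keys`, and `values` are chosen for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Returns the attention weights over the entities and the weighted
    combination of their value vectors (the context vector).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(dim)]
    return weights, context
```

Entities whose keys align with the query receive larger weights, which is how an attention-based policy can focus on the most relevant parts of a changing context.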

MoDELS · Discriminator · Scenario · Separate · Directed

May 2, 2024

D2PO: Discriminator-Guided DPO with Response Evaluation Models

Prasann Singhal,Nathan Lambert,Scott Niekum,Tanya Goyal,Greg Durrett

from arxiv, 20 pages, 12 figures

Varied approaches for aligning language models have been proposed, including supervised fine-tuning, RLHF, and direct optimization methods such as DPO. Although DPO has rapidly gained popularity due to its straightforward training process and competitive results, it remains an open question whether there are practical advantages to using a discriminator, such as a reward model, to evaluate responses. We propose D2PO, discriminator-guided DPO, an approach for the online setting where preferences are collected throughout learning. As we collect gold preferences, we use them not only to train our policy but also to train a discriminative response evaluation model that silver-labels additional synthetic data for policy training. We explore this approach across a set of diverse tasks, including a realistic chat setting, and find that it leads to higher-quality outputs than DPO with the same data budget, as well as greater efficiency in terms of preference data requirements. Furthermore, we show the conditions under which silver labeling is most helpful: it is most effective when training the policy with DPO, where it outperforms traditional PPO, and it benefits from maintaining a discriminator separate from the policy model.
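
For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities under the policy and a frozen reference model. It illustrates only the base DPO objective, not the D2PO silver-labeling pipeline; the function and argument names and the default `beta=0.1` are choices made for this example.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    loss = -log(sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l))))
    where each term is a sequence log-probability.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss shrinks as the policy assigns relatively more probability to the chosen response than the reference model does, so the same function applies whether the pair was labeled by a human (gold) or by a discriminator (silver).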

Understandability · MoDELS · Annotation · Dataset · Predicted value

May 2, 2024

V-FLUTE: Visual Figurative Language Understanding with Textual Explanations

Arkadiy Saakyan,Shreyas Kulkarni,Tuhin Chakrabarty,Smaranda Muresan

Large Vision-Language models (VLMs) have demonstrated strong reasoning capabilities in tasks requiring a fine-grained understanding of literal images and text, such as visual question-answering or visual entailment. However, there has been little exploration of these models' capabilities when presented with images and captions containing figurative phenomena such as metaphors or humor, the meaning of which is often implicit. To close this gap, we propose a new task and a high-quality dataset: Visual Figurative Language Understanding with Textual Explanations (V-FLUTE). We frame the visual figurative language understanding problem as an explainable visual entailment task, where the model has to predict whether the image (premise) entails a claim (hypothesis) and justify the predicted label with a textual explanation. Using a human-AI collaboration framework, we build a high-quality dataset, V-FLUTE, that contains 6,027 <image, claim, label, explanation> instances spanning five diverse multimodal figurative phenomena: metaphors, similes, idioms, sarcasm, and humor. The figurative phenomena can be present either in the image, the caption, or both. We further conduct both automatic and human evaluations to assess current VLMs' capabilities in understanding figurative phenomena.

TransAct · Performer · Graph · state-of-the-art · Same

May 2, 2024

GTX: A Write-Optimized Latch-free Graph Data System with Transactional Support

Libin Zhou,Yeasir Rayhan,Lu Xing,Walid G. Aref

from arxiv, 12 pages, 13 figures, submitted to VLDB 2025

This paper introduces GTX, a standalone main-memory write-optimized graph system that specializes in structural and graph property updates while maintaining concurrent reads and graph analytics with snapshot isolation-level transactional concurrency. Recent graph libraries target efficient concurrent read and write support while guaranteeing transactional consistency. However, their performance suffers for updates with strong temporal locality over the same vertices and edges due to vertex-centric lock contention. GTX introduces a new delta-chain-centric concurrency-control protocol that eliminates traditional mutually exclusive latches. GTX resolves the conflicts caused by vertex-level locking and adapts to real-life workloads while maintaining sequential access to the graph's adjacency-list storage. This combination of features has been demonstrated to provide good performance in graph analytical queries. GTX's transactions support fast group commit, novel write-write conflict prevention, and lazy garbage collection. Based on extensive experimental and comparative studies, in addition to maintaining competitive concurrent read and analytical performance, GTX demonstrates higher throughput than state-of-the-art techniques when handling concurrent transaction+analytics workloads. For write-heavy transactional workloads, GTX performs up to 11x better than the best-performing state-of-the-art systems in transaction throughput. At the same time, GTX does not sacrifice the performance of read-heavy analytical workloads, where it remains competitive with state-of-the-art systems.

Shaping · Latent · state-of-the-art · 3D · Extensibility

May 2, 2024

Part-aware Shape Generation with Latent 3D Diffusion of Neural Voxel Fields

Yuhang Huang,Shilong Zou,Xinwang Liu,Kai Xu

This paper presents a novel latent 3D diffusion model for the generation of neural voxel fields, aiming to achieve accurate part-aware structures. Compared to existing methods, there are two key designs to ensure high-quality and accurate part-aware generation. On one hand, we introduce a latent 3D diffusion process for neural voxel fields, enabling generation at significantly higher resolutions that can accurately capture rich textural and geometric details. On the other hand, a part-aware shape decoder is introduced to integrate the part codes into the neural voxel fields, guiding the accurate part decomposition and producing high-quality rendering results. Through extensive experimentation and comparisons with state-of-the-art methods, we evaluate our approach across four different classes of data. The results demonstrate the superior generative capabilities of our proposed method in part-aware shape generation, outperforming existing state-of-the-art methods.

Episode · Language modeling · MoDELS · Learning · Integration

May 1, 2024

Sim-Grasp: Learning 6-DOF Grasp Policies for Cluttered Environments Using a Synthetic Benchmark

Juncheng Li,David J. Cappelleri

In this paper, we present Sim-Grasp, a robust 6-DOF two-finger grasping system that integrates advanced language models for enhanced object manipulation in cluttered environments. We introduce the Sim-Grasp-Dataset, which includes 1,550 objects across 500 scenarios with 7.9 million annotated labels, and develop Sim-GraspNet to generate grasp poses from point clouds. The Sim-Grasp-Polices achieve grasping success rates of 97.14% for single objects and 87.43% and 83.33% for mixed clutter scenarios of Levels 1-2 and Levels 3-4 objects, respectively. By incorporating language models for target identification through text and box prompts, Sim-Grasp enables both object-agnostic and target picking, pushing the boundaries of intelligent robotic systems.

search engine · Engineering · Language modeling · Rank · MoDELS

April 30, 2024

Towards a Search Engine for Machines: Unified Ranking for Multiple Retrieval-Augmented Large Language Models

Alireza Salemi,Hamed Zamani

This paper introduces uRAG, a framework with a unified retrieval engine that serves multiple downstream retrieval-augmented generation (RAG) systems. Each RAG system consumes the retrieval results for a unique purpose, such as open-domain question answering, fact verification, entity linking, or relation extraction. We introduce a generic training guideline that standardizes the communication between the search engine and the downstream RAG systems that engage in optimizing the retrieval model. This lays the groundwork for us to build a large-scale experimentation ecosystem consisting of 18 RAG systems that engage in training and 18 unknown RAG systems that use uRAG as new users of the search engine. Using this experimentation ecosystem, we answer a number of fundamental research questions that improve our understanding of the promises and challenges in developing search engines for machines.

Scenario · Maximal · Controller · Learning · Identifiable

April 30, 2024

Data-Driven Permissible Safe Control with Barrier Certificates

Rayan Mazouz,John Skovbekk,Frederik Baymler Mathiesen,Eric Frew,Luca Laurenti,Morteza Lahijanian

This paper introduces a method of identifying a maximal set of safe strategies from data for stochastic systems with unknown dynamics using barrier certificates. The first step is learning the dynamics of the system via Gaussian process (GP) regression and obtaining probabilistic errors for this estimate. Then, we develop an algorithm for constructing piecewise stochastic barrier functions to find a maximal permissible strategy set using the learned GP model, which is based on sequentially pruning the worst controls until a maximal set is identified. The permissible strategies are guaranteed to maintain probabilistic safety for the true system. This is especially important for learning-enabled systems, because a rich strategy space enables additional data collection and complex behaviors while remaining safe. Case studies on linear and nonlinear systems demonstrate that increasing the size of the dataset for learning the system grows the permissible strategy set.

Single-Shot · Branch · Object detection · Inference · MS

April 8, 2018

Single-Shot Object Detection with Enriched Semantics

Zhishuai Zhang,Siyuan Qiao,Cihang Xie,Wei Shen,Bo Wang,Alan L. Yuille

We propose a novel single-shot object detection network named Detection with Enriched Semantics (DES). Our motivation is to enrich the semantics of object detection features within a typical deep detector by means of a semantic segmentation branch and a global activation module. The segmentation branch is supervised by weak segmentation ground truth, i.e., no extra annotation is required. In conjunction with that, we employ a global activation module that learns the relationship between channels and object classes in a self-supervised manner. Comprehensive experimental results on both the PASCAL VOC and MS COCO detection datasets demonstrate the effectiveness of the proposed method. In particular, with a VGG16-based DES, we achieve an mAP of 81.7 on VOC2007 test and an mAP of 32.8 on COCO test-dev with an inference speed of 31.5 milliseconds per image on a Titan Xp GPU. With a lower-resolution version, we achieve an mAP of 79.7 on VOC2007 with an inference speed of 13.0 milliseconds per image.

MoDELS · Attention mechanism · RNN · Annotation · Networking

December 20, 2017

Order-Free RNN with Visual Attention for Multi-Label Classification

Shang-Fu Chen,Yi-Chen Chen,Chih-Kuan Yeh,Yu-Chiang Frank Wang

from arxiv, Accepted at 32nd AAAI Conference on Artificial Intelligence (AAAI-18)

In this paper, we propose jointly learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that prediction errors do not propagate and degrade performance. Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, our proposed network model can efficiently predict multiple labels.