The availability of a reliable map and a robust localization system is critical for the operation of an autonomous vehicle. In a modern system, both mapping and localization solutions generally employ convolutional neural network (CNN)-based perception. Hence, any algorithm should account for potential perception errors to function safely and robustly. In this work, we present uncertainty-aware Panoptic Localization and Mapping (uPLAM), which employs perception uncertainty as a bridge to fuse perception information with classical localization and mapping approaches. We introduce an uncertainty-based map aggregation technique to create a long-term panoptic bird's-eye-view map and provide an associated mapping uncertainty. Our map consists of surface semantics and landmarks with unique IDs. Moreover, we present panoptic uncertainty-aware particle filter-based localization. To this end, we propose an uncertainty-based particle importance weight calculation that adaptively incorporates perception information into localization. We also present a new dataset for evaluating long-term panoptic mapping and map-based localization. Extensive evaluations showcase that our proposed uncertainty incorporation leads to better mapping with reliable uncertainty estimates and accurate localization. We make our dataset and code available at //uplam.cs.uni-freiburg.de
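The abstract does not spell out the weight calculation; as a minimal sketch of how per-measurement perception uncertainty could temper particle importance weights (the function name and the confidence-tempering scheme are assumptions, not the paper's formulation):

```python
import numpy as np

def particle_weights(log_likelihoods, perception_uncertainty):
    """Temper per-measurement log-likelihoods by perception confidence.

    log_likelihoods: (P, M) log-likelihood of each of M semantic
        measurements under each of P candidate particle poses.
    perception_uncertainty: (M,) per-measurement uncertainty in [0, 1],
        e.g. derived from the panoptic network's softmax confidence.
    """
    confidence = 1.0 - perception_uncertainty           # (M,)
    log_w = (log_likelihoods * confidence).sum(axis=1)  # uncertain terms count less
    log_w -= log_w.max()                                 # log-space normalization
    w = np.exp(log_w)
    return w / w.sum()

# Toy usage: 100 particles, 50 semantic measurements.
rng = np.random.default_rng(0)
ll = rng.normal(-1.0, 0.5, size=(100, 50))
unc = rng.uniform(0.0, 1.0, size=50)
weights = particle_weights(ll, unc)
assert abs(weights.sum() - 1.0) < 1e-9
```

The intended effect is that measurements the panoptic network is unsure about pull the particle weights toward uniform instead of dominating the resampling step.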
In the image acquisition process, various forms of degradation, including noise, haze, and rain, are frequently introduced. These degradations typically arise from the inherent limitations of cameras or unfavorable ambient conditions. To recover clean images from degraded versions, numerous specialized restoration methods have been developed, each targeting a specific type of degradation. Recently, all-in-one algorithms have garnered significant attention by addressing different types of degradations within a single model without requiring prior information about the input degradation type. However, these methods operate purely in the spatial domain and do not delve into the distinct frequency variations inherent to different degradation types. To address this gap, we propose an adaptive all-in-one image restoration network based on frequency mining and modulation. Our approach is motivated by the observation that different degradation types impact the image content on different frequency subbands, thereby requiring different treatments for each restoration task. Specifically, we first mine low- and high-frequency information from the input features, guided by the adaptively decoupled spectra of the degraded image. The extracted features are then modulated by a bidirectional operator to facilitate interactions between different frequency components. Finally, the modulated features are merged into the original input for a progressively guided restoration. With this approach, the model achieves adaptive reconstruction by accentuating the informative frequency subbands according to different input degradations. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance on different image restoration tasks, including denoising, dehazing, deraining, motion deblurring, and low-light image enhancement. Our code is available at //github.com/c-yn/AdaIR.
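As an illustration of the kind of frequency decoupling described above, the sketch below splits a feature map into low- and high-frequency components with a radial FFT mask; the fixed cutoff and function name are assumptions, whereas AdaIR learns the decoupling adaptively from the degraded input's spectrum.

```python
import numpy as np

def split_frequencies(feat, cutoff_ratio=0.25):
    """Split a 2D feature map into low- and high-frequency components.

    feat: (H, W) array.
    cutoff_ratio: fraction of the spectrum radius kept as 'low frequency';
        a fixed hyperparameter here, learned adaptively in the paper.
    """
    H, W = feat.shape
    spec = np.fft.fftshift(np.fft.fft2(feat))
    # Radial low-pass mask centered on the DC component.
    yy, xx = np.mgrid[:H, :W]
    cy, cx = H / 2.0, W / 2.0
    radius = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2)
    mask = (radius <= cutoff_ratio * min(H, W) / 2.0).astype(float)
    low = np.real(np.fft.ifft2(np.fft.ifftshift(spec * mask)))
    high = feat - low
    return low, high

feat = np.random.rand(64, 64)
low, high = split_frequencies(feat)
assert np.allclose(low + high, feat)  # the two branches reconstruct the input
```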
Speech recognition and translation systems perform poorly on noisy inputs, which are frequent in realistic environments. Augmenting these systems with visual signals has the potential to improve robustness to noise. However, audio-visual (AV) data is only available in limited amounts and for fewer languages than audio-only resources. To address this gap, we present XLAVS-R, a cross-lingual audio-visual speech representation model for noise-robust speech recognition and translation in over 100 languages. It is designed to maximize the benefits of limited multilingual AV pre-training data, by building on top of audio-only multilingual pre-training and simplifying existing pre-training schemes. Extensive evaluation on the MuAViC benchmark shows the strength of XLAVS-R on downstream audio-visual speech recognition and translation tasks, where it outperforms the previous state of the art by up to 18.5% WER and 4.7 BLEU given noisy AV inputs, and enables strong zero-shot audio-visual ability with audio-only fine-tuning.
Self-supervised skill learning aims to acquire useful behaviors that leverage the underlying dynamics of the environment. Latent variable models based on mutual information maximization have been successful in this task but still struggle in the context of robotic manipulation. Because manipulation requires influencing a possibly large set of degrees of freedom in the environment, mutual information maximization alone fails to produce useful and safe manipulation behaviors. Furthermore, augmenting skill discovery rewards with additional rewards through a naive combination may also fail to produce the desired behaviors. To address this limitation, we introduce SLIM, a multi-critic learning approach for skill discovery with a particular focus on robotic manipulation. Our main insight is that using multiple critics in an actor-critic framework to gracefully combine multiple reward functions leads to a significant improvement in latent-variable skill discovery for robotic manipulation, while overcoming the interference among rewards that hinders convergence to useful skills. Furthermore, in the context of tabletop manipulation, we demonstrate the applicability of our novel skill discovery approach to acquire safe and efficient motor primitives in a hierarchical reinforcement learning fashion and leverage them through planning, significantly surpassing baseline approaches for skill discovery.
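A minimal sketch of the multi-critic idea follows: one critic per reward stream, with the policy evaluated against a mixture of their values rather than a single critic trained on a naively summed reward. The network sizes, the number of reward streams, and the uniform weighting are assumptions for illustration, not SLIM's actual architecture.

```python
import torch
import torch.nn as nn

class MultiCriticActor(nn.Module):
    """One critic per reward stream (e.g., skill-discovery, safety, task),
    combined only when evaluating the policy."""

    def __init__(self, obs_dim, act_dim, n_rewards=3, hidden=64):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, act_dim), nn.Tanh())
        self.critics = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
                          nn.Linear(hidden, 1))
            for _ in range(n_rewards))

    def combined_value(self, obs, act, weights=None):
        xs = torch.cat([obs, act], dim=-1)
        values = torch.cat([c(xs) for c in self.critics], dim=-1)  # (B, n_rewards)
        if weights is None:
            weights = torch.ones(values.shape[-1]) / values.shape[-1]
        return values @ weights  # the policy is improved against this mixture

model = MultiCriticActor(obs_dim=10, act_dim=4)
obs = torch.randn(2, 10)
act = model.actor(obs)
print(model.combined_value(obs, act).shape)  # torch.Size([2])
```

Keeping the critics separate lets each one fit its own reward scale and dynamics, which is the ingredient the abstract credits for avoiding interference between rewards.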
Transmission line detection technology is crucial for automatic monitoring and ensuring the safety of electrical facilities. The YOLOv5 series is currently one of the most advanced and widely used methods for object detection. However, it faces inherent challenges, such as a high computational load on devices and insufficient detection accuracy. To address these concerns, this paper presents an enhanced lightweight YOLOv5 technique customized for mobile devices, specifically intended for identifying objects associated with transmission lines. The C3Ghost module is integrated into the convolutional network of YOLOv5 to reduce floating point operations (FLOPs) in the feature channel fusion process and improve feature expression performance. In addition, a FasterNet module is introduced to replace the C3 module in the YOLOv5 backbone. The FasterNet module uses partial convolutions to process only a portion of the input channels, improving feature extraction efficiency and reducing computational overhead. To address the imbalance between simple and challenging samples in the dataset and the diversity of bounding-box aspect ratios, the WIoU v3 loss is adopted as the loss function. To validate the performance of the proposed approach, experiments are conducted on a custom dataset of transmission line poles. The results show that the proposed model achieves a 1% increase in detection accuracy, a 13% reduction in FLOPs, and a 26% decrease in model parameters compared to the existing YOLOv5. The ablation study also shows that while the FasterNet and C3Ghost modules improved the precision of the original YOLOv5 baseline model, they caused a decrease in the mAP@0.5 metric; introducing the WIoU v3 loss function significantly mitigated this decline.
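For reference, a partial convolution in the FasterNet style might look like the sketch below: it convolves only a fraction of the channels and passes the rest through unchanged, cutting FLOPs and memory access. The split ratio and the way it would be wired into YOLOv5's C3 block are assumptions here.

```python
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """Convolve a subset of channels; forward the remaining channels as-is."""

    def __init__(self, channels, split_ratio=0.25, kernel_size=3):
        super().__init__()
        self.conv_channels = max(1, int(channels * split_ratio))
        self.untouched = channels - self.conv_channels
        self.conv = nn.Conv2d(self.conv_channels, self.conv_channels,
                              kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        x_conv, x_id = torch.split(x, [self.conv_channels, self.untouched], dim=1)
        return torch.cat([self.conv(x_conv), x_id], dim=1)

x = torch.randn(1, 64, 32, 32)
y = PartialConv(64)(x)
assert y.shape == x.shape  # spatial size and channel count are preserved
```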
The capability of intelligent models to extrapolate and comprehend changes in object states is a crucial yet demanding aspect of AI research, particularly through the lens of human interaction in real-world settings. This task involves describing complex visual environments, identifying active objects, and interpreting their changes as conveyed through language. Traditional methods, which isolate object captioning and state change detection, offer a limited view of dynamic environments. Moreover, relying on a small set of symbolic words to represent changes has restricted the expressiveness of the language. To address these challenges, in this paper, we introduce the Object State Captioning and State Change Representation (OSCaR) dataset and benchmark. OSCaR consists of 14,084 annotated video segments with nearly 1,000 unique objects from various egocentric video collections. It sets a new testbed for evaluating multimodal large language models (MLLMs). Our experiments demonstrate that while MLLMs show some skill, they lack a full understanding of object state changes. The benchmark includes a fine-tuned model that, despite initial capabilities, requires significant improvements in accuracy and generalization ability for effective understanding of these changes. Our code and dataset are available at //github.com/nguyennm1024/OSCaR.
Depth-based 3D hand pose estimation is an important but challenging research task in the human-machine interaction community. Recently, dense regression methods have attracted increasing attention in 3D hand pose estimation; they offer low computational burden and high accuracy by densely regressing hand joint offset maps. However, large-scale regression offset values are often affected by noise and outliers, leading to a significant drop in accuracy. To tackle this, we re-formulate 3D hand pose estimation as a dense ordinal regression problem and propose a novel Dense Ordinal Regression 3D Pose Network (DOR3D-Net). Specifically, we first decompose offset value regression into sub-tasks of binary classification with ordinal constraints. Each binary classifier then predicts the probability of a binary spatial relationship relative to a joint, which is easier to train and yields a much lower level of noise. The estimated hand joint positions are inferred by aggregating the ordinal regression results at local positions with a weighted sum. Furthermore, both a joint regression loss and an ordinal regression loss are used to train our DOR3D-Net in an end-to-end manner. Extensive experiments on public datasets (ICVL, MSRA, NYU and HANDS2017) show that our design provides significant improvements over SOTA methods.
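As a rough illustration of the ordinal decomposition, the sketch below decodes K "offset exceeds threshold k" probabilities back into a continuous offset via an expectation over evenly spaced thresholds; the paper's exact aggregation weights and per-joint details are not reproduced, so treat the function and its parameters as assumptions.

```python
import numpy as np

def decode_ordinal(exceed_probs, offset_min, offset_max):
    """Decode ordinal binary classifications into a continuous offset.

    exceed_probs: (K,) probabilities that the true offset exceeds each of K
        evenly spaced thresholds between offset_min and offset_max.
    The expected offset is offset_min plus the expected number of exceeded
    thresholds times the bin width (a standard ordinal decoding, used here
    as a stand-in for the paper's weighted aggregation).
    """
    bin_width = (offset_max - offset_min) / len(exceed_probs)
    return offset_min + bin_width * exceed_probs.sum()

# Toy example: true offset 0.3 in range [-1, 1] with 20 thresholds.
thresholds = np.linspace(-1, 1, 20, endpoint=False)
probs = (0.3 > thresholds).astype(float)   # ideal, noise-free binary classifiers
print(decode_ordinal(probs, -1.0, 1.0))    # ~0.3
```

Because each binary decision only contributes one bin width to the result, a single noisy classifier perturbs the decoded offset far less than an outlier in direct offset regression would.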
Modern distributed systems are highly dynamic and scalable, requiring monitoring solutions that can adapt to rapid changes. Monitoring systems that rely on external probes can only achieve adaptation through expensive operations such as deployment, undeployment, and reconfiguration. This poster paper introduces ReProbes, a class of adaptive monitoring probes that can handle rapid changes in data collection strategies. ReProbes offer controllable and configurable self-adaptive capabilities for their data transmission, collection, and analysis methods. The resulting architecture can effectively enhance probe adaptability when qualitatively compared to state-of-the-art monitoring solutions.
In environments where many automated guided vehicles (AGVs) operate, planning efficient, collision-free paths is essential. Related research has mainly focused on environments with static passages, resulting in space inefficiency. We define multi-agent and multi-rack path finding (MARPF) as the problem of planning paths for AGVs to convey target racks to their designated locations in environments without passages. In such environments, an AGV without a rack can pass under racks, whereas an AGV carrying a rack cannot, to avoid collisions. MARPF entails conveying the target racks without collisions while the other, obstacle racks are positioned without any specific arrangement; the AGVs must therefore relocate obstacle racks so that they do not interfere with the target racks. We formulated MARPF as an integer linear programming problem on a network flow. To distinguish situations in which an AGV is or is not loading a rack, the proposed method introduces two virtual layers into the network. We optimized the AGVs' movements to relocate obstacle racks and convey the target racks. The formulation and applicability of the algorithm were validated through numerical experiments, and the results indicate that the proposed algorithm handles environments with densely placed racks.
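The sketch below illustrates the two-virtual-layer idea on a small grid: every cell is duplicated into an "unloaded" and a "loaded" copy, unloaded AGVs may pass under racks, loaded AGVs may not enter cells occupied by other racks, and load/unload arcs connect the layers at the same cell. The arc construction is an assumed simplification of the paper's time-expanded network-flow ILP, not its exact formulation.

```python
from itertools import product

def build_two_layer_arcs(width, height, rack_cells):
    """Build movement arcs over a grid duplicated into two layers."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1), (0, 0)]  # 4-neighborhood + wait
    arcs = []
    for x, y in product(range(width), range(height)):
        for dx, dy in moves:
            nx, ny = x + dx, y + dy
            if not (0 <= nx < width and 0 <= ny < height):
                continue
            # Unloaded layer: free movement, racks can be passed under.
            arcs.append((("unloaded", x, y), ("unloaded", nx, ny)))
            # Loaded layer: cannot enter a cell occupied by another rack,
            # but waiting in place is always allowed.
            if (nx, ny) not in rack_cells or (nx, ny) == (x, y):
                arcs.append((("loaded", x, y), ("loaded", nx, ny)))
        # Load / unload transitions between the layers at the same cell.
        arcs.append((("unloaded", x, y), ("loaded", x, y)))
        arcs.append((("loaded", x, y), ("unloaded", x, y)))
    return arcs

arcs = build_two_layer_arcs(3, 3, rack_cells={(1, 1)})
print(len(arcs), "arcs in the two-layer movement graph")
```

In the actual ILP, flow variables would be defined on these arcs per time step, with capacity and collision constraints coupling the AGV and rack flows.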
Generalization to out-of-distribution (OOD) data is a capability natural to humans yet challenging for machines to reproduce. This is because most learning algorithms strongly rely on the i.i.d. assumption on source/target data, which is often violated in practice due to domain shift. Domain generalization (DG) aims to achieve OOD generalization by using only source data for model learning. Since it was first introduced in 2011, research in DG has made great progress. In particular, intensive research on this topic has led to a broad spectrum of methodologies, e.g., those based on domain alignment, meta-learning, data augmentation, or ensemble learning, to name a few, and has covered various vision applications such as object recognition, segmentation, action recognition, and person re-identification. In this paper, for the first time, a comprehensive literature review is provided to summarize the developments in DG for computer vision over the past decade. Specifically, we first cover the background by formally defining DG and relating it to other research fields like domain adaptation and transfer learning. Second, we conduct a thorough review of existing methods and present a categorization based on their methodologies and motivations. Finally, we conclude this survey with insights and discussions on future research directions.
Distant supervision can effectively label data for relation extraction, but it suffers from noisy labels. Recent works mainly apply soft, bag-level noise reduction strategies to find the relatively better samples within a sentence bag, which is suboptimal compared with making a hard decision about false positive samples at the sentence level. In this paper, we introduce an adversarial learning framework, named DSGAN, to learn a sentence-level true-positive generator. Inspired by Generative Adversarial Networks, we regard the positive samples selected by the generator as negative samples to train the discriminator. The optimal generator is obtained when the discrimination ability of the discriminator shows the greatest decline. We then use the generator to filter the distant supervision training dataset and redistribute the false positive instances into the negative set, thereby providing a cleaned dataset for relation classification. The experimental results show that the proposed strategy significantly improves the performance of distant supervision relation extraction compared to state-of-the-art systems.
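A toy sketch of the adversarial setup described above: the generator scores how likely each distantly labeled sentence is a true positive, and the sentences it selects are presented to the discriminator as negatives, so the generator is pushed toward exactly the samples that most degrade the discriminator. Encoder dimensions, network sizes, and the sampling scheme are assumptions.

```python
import torch
import torch.nn as nn

encoder_dim = 128
generator = nn.Sequential(nn.Linear(encoder_dim, 64), nn.ReLU(),
                          nn.Linear(64, 1), nn.Sigmoid())
discriminator = nn.Sequential(nn.Linear(encoder_dim, 64), nn.ReLU(),
                              nn.Linear(64, 1), nn.Sigmoid())

sentences = torch.randn(32, encoder_dim)              # encoded sentences of one bag
p_true = generator(sentences).squeeze(-1)             # P(sentence is a true positive)
selected = sentences[torch.bernoulli(p_true).bool()]  # generator's picks

# The discriminator is trained to label the generator's picks as "negative";
# training stops when its discrimination ability declines the most.
if len(selected) > 0:
    d_loss = nn.functional.binary_cross_entropy(
        discriminator(selected).squeeze(-1), torch.zeros(len(selected)))
    print(float(d_loss))
```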