国产裸体美女永久免费无遮挡久久_精品国产V一区二区三区_欧美丰满熟妇肥老太牲交视频_国产色一区二区免费在线观看_性色AV一区二区三区观看_免费国免费国产在线观看_欧美精品国产综合一区二区三区

Semantic scene segmentation from a bird's-eye-view (BEV) perspective plays a crucial role in facilitating planning and decision-making for mobile robots. Although recent vision-only methods have demonstrated notable advancements in performance, they often struggle under adverse illumination conditions such as rain or nighttime. While active sensors offer a solution to this challenge, the prohibitively high cost of LiDARs remains a limiting factor. Fusing camera data with automotive radars poses a more inexpensive alternative but has received less attention in prior research. In this work, we aim to advance this promising avenue by introducing BEVCar, a novel approach for joint BEV object and map segmentation. The core novelty of our approach lies in first learning a point-based encoding of raw radar data, which is then leveraged to efficiently initialize the lifting of image features into the BEV space. We perform extensive experiments on the nuScenes dataset and demonstrate that BEVCar outperforms the current state of the art. Moreover, we show that incorporating radar information significantly enhances robustness in challenging environmental conditions and improves segmentation performance for distant objects. To foster future research, we provide the weather split of the nuScenes dataset used in our experiments, along with our code and trained models at //bevcar.cs.uni-freiburg.de.

相關內容

Performer

關注 10

massive MIMO · MIMO · 通道 · 查準率/準確率 · INFORMS ·

2024 年 4 月 30 日

Pilot Contamination in Massive MIMO Systems: Challenges and Future Prospects

Muhammad Kamran Saeed,Ashfaq Khokhar,Shakil Ahmed

from arxiv, Accepted At IWCMC 2024 Comm & SP Symposium

Massive multiple input multiple output (M-MIMO) technology plays a pivotal role in fifth-generation (5G) and beyond communication systems, offering a wide range of benefits, from increased spectral efficiency (SE) to enhanced energy efficiency and higher reliability. However, these advantages are contingent upon precise channel state information (CSI) availability at the base station (BS). Ensuring precise CSI is challenging due to the constrained size of the coherence interval and the resulting limitations on pilot sequence length. Therefore, reusing pilot sequences in adjacent cells introduces pilot contamination, hindering SE enhancement. This paper reviews recent advancements and addresses research challenges in mitigating pilot contamination and improving channel estimation, categorizing the existing research into three broader categories: pilot assignment schemes, advanced signal processing methods, and advanced channel estimation techniques. Salient representative pilot mitigation/assignment techniques are analyzed and compared in each category. Lastly, possible future research directions are discussed.

MINE · Learning · Integration · 過擬合 · state-of-the-art ·

2024 年 4 月 29 日

CVTN: Cross Variable and Temporal Integration for Time Series Forecasting

Han Zhou,Yuntian Chen

In multivariate time series forecasting, the Transformer architecture encounters two significant challenges: effectively mining features from historical sequences and avoiding overfitting during the learning of temporal dependencies. To tackle these challenges, this paper deconstructs time series forecasting into the learning of historical sequences and prediction sequences, introducing the Cross-Variable and Time Network (CVTN). This unique method divides multivariate time series forecasting into two phases: cross-variable learning for effectively mining fea tures from historical sequences, and cross-time learning to capture the temporal dependencies of prediction sequences. Separating these two phases helps avoid the impact of overfitting in cross-time learning on cross-variable learning. Exten sive experiments on various real-world datasets have confirmed its state-of-the-art (SOTA) performance. CVTN emphasizes three key dimensions in time series fore casting: the short-term and long-term nature of time series (locality and longevity), feature mining from both historical and prediction sequences, and the integration of cross-variable and cross-time learning. This approach not only advances the current state of time series forecasting but also provides a more comprehensive framework for future research in this field.

回合 · 可辨認的 · Vision · MoDELS · Prompt ·

2024 年 4 月 29 日

A Multi-Modal Foundation Model to Assist People with Blindness and Low Vision in Environmental Interaction

Yu Hao,Fan Yang,Hao Huang,Shuaihang Yuan,Sundeep Rangan,John-Ross Rizzo,Yao Wang,Yi Fang

People with blindness and low vision (pBLV) encounter substantial challenges when it comes to comprehensive scene recognition and precise object identification in unfamiliar environments. Additionally, due to the vision loss, pBLV have difficulty in accessing and identifying potential tripping hazards on their own. In this paper, we present a pioneering approach that leverages a large vision-language model to enhance visual perception for pBLV, offering detailed and comprehensive descriptions of the surrounding environments and providing warnings about the potential risks. Our method begins by leveraging a large image tagging model (i.e., Recognize Anything (RAM)) to identify all common objects present in the captured images. The recognition results and user query are then integrated into a prompt, tailored specifically for pBLV using prompt engineering. By combining the prompt and input image, a large vision-language model (i.e., InstructBLIP) generates detailed and comprehensive descriptions of the environment and identifies potential risks in the environment by analyzing the environmental objects and scenes, relevant to the prompt. We evaluate our approach through experiments conducted on both indoor and outdoor datasets. Our results demonstrate that our method is able to recognize objects accurately and provide insightful descriptions and analysis of the environment for pBLV.

知識 (knowledge) · CTR · MoDELS · Performer · 基 ·

2024 年 4 月 28 日

Retrieval-Oriented Knowledge for Click-Through Rate Prediction

Huanshuo Liu,Bo Chen,Menghui Zhu,Jianghao Lin,Jiarui Qin,Yang Yang,Hao Zhang,Ruiming Tang

Click-through rate (CTR) prediction plays an important role in personalized recommendations. Recently, sample-level retrieval-based models (e.g., RIM) have achieved remarkable performance by retrieving and aggregating relevant samples. However, their inefficiency at the inference stage makes them impractical for industrial applications. To overcome this issue, this paper proposes a universal plug-and-play Retrieval-Oriented Knowledge (ROK) framework. Specifically, a knowledge base, consisting of a retrieval-oriented embedding layer and a knowledge encoder, is designed to preserve and imitate the retrieved & aggregated representations in a decomposition-reconstruction paradigm. Knowledge distillation and contrastive learning methods are utilized to optimize the knowledge base, and the learned retrieval-enhanced representations can be integrated with arbitrary CTR models in both instance-wise and feature-wise manners. Extensive experiments on three large-scale datasets show that ROK achieves competitive performance with the retrieval-based CTR models while reserving superior inference efficiency and model compatibility.

Branch · Boosting（一種模型訓練加速方式） · Learning · 知識 (knowledge) · anchor ·

2024 年 4 月 28 日

Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching

Haiwen Diao,Ying Zhang,Shang Gao,Xiang Ruan,Huchuan Lu

from arxiv, 12 pages, 9 figures, Accepted by TIP2024

Image-text matching remains a challenging task due to heterogeneous semantic diversity across modalities and insufficient distance separability within triplets. Different from previous approaches focusing on enhancing multi-modal representations or exploiting cross-modal correspondence for more accurate retrieval, in this paper we aim to leverage the knowledge transfer between peer branches in a boosting manner to seek a more powerful matching model. Specifically, we propose a brand-new Deep Boosting Learning (DBL) algorithm, where an anchor branch is first trained to provide insights into the data properties, with a target branch gaining more advanced knowledge to develop optimal features and distance metrics. Concretely, an anchor branch initially learns the absolute or relative distance between positive and negative pairs, providing a foundational understanding of the particular network and data distribution. Building upon this knowledge, a target branch is concurrently tasked with more adaptive margin constraints to further enlarge the relative distance between matched and unmatched samples. Extensive experiments validate that our DBL can achieve impressive and consistent improvements based on various recent state-of-the-art models in the image-text matching field, and outperform related popular cooperative strategies, e.g., Conventional Distillation, Mutual Learning, and Contrastive Learning. Beyond the above, we confirm that DBL can be seamlessly integrated into their training scenarios and achieve superior performance under the same computational costs, demonstrating the flexibility and broad applicability of our proposed method. Our code is publicly available at: //github.com/Paranioar/DBL.

知識 (knowledge) · MoDELS · 機器人 · 控制器 · 可交換的 ·

2024 年 4 月 27 日

HIPer: A Human-Inspired Scene Perception Model for Multifunctional Mobile Robots

Florenz Graf,Jochen Lindermayr,Birgit Graf,Werner Kraus,Marco F. Huber

Taking over arbitrary tasks like humans do with a mobile service robot in open-world settings requires a holistic scene perception for decision-making and high-level control. This paper presents a human-inspired scene perception model to minimize the gap between human and robotic capabilities. The approach takes over fundamental neuroscience concepts, such as a triplet perception split into recognition, knowledge representation, and knowledge interpretation. A recognition system splits the background and foreground to integrate exchangeable image-based object detectors and SLAM, a multi-layer knowledge base represents scene information in a hierarchical structure and offers interfaces for high-level control, and knowledge interpretation methods deploy spatio-temporal scene analysis and perceptual learning for self-adjustment. A single-setting ablation study is used to evaluate the impact of each component on the overall performance for a fetch-and-carry scenario in two simulated and one real-world environment.

Automator · 傳感器 · CASES · 稀疏自編碼 · 設計 ·

2024 年 4 月 26 日

CoCar NextGen: a Multi-Purpose Platform for Connected Autonomous Driving Research

Marc Heinrich,Maximilian Zipfl,Marc Uecker,Sven Ochs,Martin Gontscharow,Tobias Fleck,Jens Doll,Philip Sch?rner,Christian Hubschneider,Marc René Zofka,Alexander Viehl,J. Marius Z?llner

Real world testing is of vital importance to the success of automated driving. While many players in the business design purpose build testing vehicles, we designed and build a modular platform that offers high flexibility for any kind of scenario. CoCar NextGen is equipped with next generation hardware that addresses all future use cases. Its extensive, redundant sensor setup allows to develop cross-domain data driven approaches that manage the transfer to other sensor setups. Together with the possibility of being deployed on public roads, this creates a unique research platform that supports the road to automated driving on SAE Level 5.

3D · Performer · MoDELS · state-of-the-art · 點云 ·

2024 年 4 月 25 日

ODIN: A Single Model for 2D and 3D Segmentation

Ayush Jain,Pushkal Katara,Nikolaos Gkanatsios,Adam W. Harley,Gabriel Sarch,Kriti Aggarwal,Vishrav Chaudhary,Katerina Fragkiadaki

from arxiv, Camera Ready (CVPR 2024, Highlight)

State-of-the-art models on contemporary 3D segmentation benchmarks like ScanNet consume and label dataset-provided 3D point clouds, obtained through post processing of sensed multiview RGB-D images. They are typically trained in-domain, forego large-scale 2D pre-training and outperform alternatives that featurize the posed RGB-D multiview images instead. The gap in performance between methods that consume posed images versus post-processed 3D point clouds has fueled the belief that 2D and 3D perception require distinct model architectures. In this paper, we challenge this view and propose ODIN (Omni-Dimensional INstance segmentation), a model that can segment and label both 2D RGB images and 3D point clouds, using a transformer architecture that alternates between 2D within-view and 3D cross-view information fusion. Our model differentiates 2D and 3D feature operations through the positional encodings of the tokens involved, which capture pixel coordinates for 2D patch tokens and 3D coordinates for 3D feature tokens. ODIN achieves state-of-the-art performance on ScanNet200, Matterport3D and AI2THOR 3D instance segmentation benchmarks, and competitive performance on ScanNet, S3DIS and COCO. It outperforms all previous works by a wide margin when the sensed 3D point cloud is used in place of the point cloud sampled from 3D mesh. When used as the 3D perception engine in an instructable embodied agent architecture, it sets a new state-of-the-art on the TEACh action-from-dialogue benchmark. Our code and checkpoints can be found at the project website (//odin-seg.github.io).

可辨認的 · Extensibility · TEAM · 估計/估計量 · 納什均衡 ·

2021 年 9 月 15 日

Decentralized and Communication-Free Multi-Robot Navigation through Distributed Games

Brian Reily,Terran Mott,Hao Zhang

Effective multi-robot teams require the ability to move to goals in complex environments in order to address real-world applications such as search and rescue. Multi-robot teams should be able to operate in a completely decentralized manner, with individual robot team members being capable of acting without explicit communication between neighbors. In this paper, we propose a novel game theoretic model that enables decentralized and communication-free navigation to a goal position. Robots each play their own distributed game by estimating the behavior of their local teammates in order to identify behaviors that move them in the direction of the goal, while also avoiding obstacles and maintaining team cohesion without collisions. We prove theoretically that generated actions approach a Nash equilibrium, which also corresponds to an optimal strategy identified for each robot. We show through extensive simulations that our approach enables decentralized and communication-free navigation by a multi-robot system to a goal position, and is able to avoid obstacles and collisions, maintain connectivity, and respond robustly to sensor noise.

Extensibility · GM · MoDELS · 類別 · 多代理人模型 ·

2021 年 2 月 9 日

Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice

Lewis Hammond,James Fox,Tom Everitt,Alessandro Abate,Michael Wooldridge

from arxiv, Accepted to the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-21)

Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations. In this paper, we extend previous work on MAIDs by introducing the concept of a MAID subgame, as well as subgame perfect and trembling hand perfect equilibrium refinements. We then prove several equivalence results between MAIDs and EFGs. Finally, we describe an open source implementation for reasoning about MAIDs and computing their equilibria.