
Prompt and precise traffic forecasting is critical to optimizing traffic flow management in Intelligent Transportation Systems (ITS) and has drawn substantial scholarly attention. Spatio-Temporal Graph Neural Networks (STGNNs) have been lauded for their adaptability to road graph structures. Yet, current research on STGNN architectures often prioritizes complex designs, leading to elevated computational burdens with only minor gains in accuracy. To address this issue, we propose ST-MLP, a concise spatio-temporal model based solely on cascaded Multi-Layer Perceptron (MLP) modules and linear layers. Specifically, we incorporate temporal information, spatial information, and a predefined graph structure, together with a successful implementation of the channel-independence strategy, an effective technique in time series forecasting. Empirical results demonstrate that ST-MLP outperforms state-of-the-art STGNNs and other models in terms of accuracy and computational efficiency. Our findings encourage further exploration of more concise and effective neural network architectures for traffic forecasting.
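To make the channel-independence strategy concrete, here is a minimal sketch in which one shared MLP forecasts every sensor's series independently, mixing information only along the time axis. The shapes, layer sizes, and omission of ST-MLP's spatial and graph components are simplifying assumptions, not the paper's implementation.

```python
# Minimal sketch of channel-independent MLP forecasting (assumed details).
import torch
import torch.nn as nn

class ChannelIndependentMLP(nn.Module):
    """Maps a length-L history to a length-H forecast, one channel at a time.

    The same weights are shared across all N channels (road sensors), and
    no layer mixes information between channels.
    """
    def __init__(self, lookback: int, horizon: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(lookback, hidden),
            nn.ReLU(),
            nn.Linear(hidden, horizon),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, lookback) -> each channel treated as its own series
        b, n, l = x.shape
        y = self.net(x.reshape(b * n, l))       # (b*n, horizon)
        return y.reshape(b, n, -1)              # (batch, channels, horizon)

x = torch.randn(8, 207, 12)                     # e.g. 207 sensors, 12 past steps
print(ChannelIndependentMLP(12, 12)(x).shape)   # torch.Size([8, 207, 12])
```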

Related Content

Simultaneous localization and mapping (SLAM) is critical to the implementation of autonomous driving. Most LiDAR-inertial SLAM algorithms assume a static environment, leading to unreliable localization in dynamic environments. Moreover, accurate tracking of moving objects is of great significance for the control and planning of autonomous vehicles. This study proposes LIMOT, a tightly-coupled multi-object tracking and LiDAR-inertial odometry system capable of accurately estimating the poses of both the ego-vehicle and surrounding objects. We propose a trajectory-based dynamic feature filtering method that filters out features belonging to moving objects by leveraging tracking results before scan-matching. Factor graph-based optimization is then conducted to optimize the IMU bias and the poses of both the ego-vehicle and surrounding objects in a sliding window. Experiments conducted on the KITTI tracking dataset and a self-collected dataset show that our method achieves better pose estimation and tracking accuracy than our previous work DL-SLOT and other baseline methods. Our open-source implementation is available at //github.com/tiev-tongji/LIMOT.
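As a hedged illustration of the trajectory-based dynamic feature filtering step, the sketch below drops LiDAR points that fall inside the predicted bounding box of each tracked object before scan-matching. The data layout, box format, and use of plain NumPy are illustrative assumptions, not LIMOT's actual implementation.

```python
# Sketch: remove points inside tracked objects' predicted boxes (assumed format).
import numpy as np

def filter_dynamic_points(points, tracks):
    """points: (N, 3) LiDAR points in the ego frame.
    tracks: list of (T_obj, extent), where T_obj is a 4x4 object pose
    predicted from its trajectory and extent = (length, width, height)."""
    keep = np.ones(len(points), dtype=bool)
    homo = np.hstack([points, np.ones((len(points), 1))])       # homogeneous coords
    for T_obj, (l, w, h) in tracks:
        local = (np.linalg.inv(T_obj) @ homo.T).T[:, :3]        # points in object frame
        inside = (np.abs(local[:, 0]) < l / 2) \
               & (np.abs(local[:, 1]) < w / 2) \
               & (np.abs(local[:, 2]) < h / 2)
        keep &= ~inside                                         # discard dynamic points
    return points[keep]
```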

This work introduces a framework harnessing the capabilities of Large Language Models (LLMs) to generate primitive task conditions for generalizable long-horizon manipulation with novel objects and unseen tasks. These task conditions serve as guides for generating and adjusting Dynamic Movement Primitive (DMP) trajectories for long-horizon task execution. We further create a challenging robotic manipulation task suite based on PyBullet for long-horizon task evaluation. Extensive experiments in both simulated and real-world environments demonstrate the effectiveness of our framework on both familiar tasks involving new objects and novel but related tasks, highlighting the potential of LLMs to enhance the versatility and adaptability of robotic systems. Project website: //object814.github.io/Task-Condition-With-LLM/
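For context, a Dynamic Movement Primitive is a damped spring model pulled toward a goal and shaped by a learned forcing term. Below is a minimal one-dimensional DMP rollout of the kind such task conditions could parameterize; the gains, basis-function layout, and absence of any LLM conditioning are simplifying assumptions.

```python
# Minimal 1-D discrete DMP rollout (assumed gains and basis-function layout).
import numpy as np

def dmp_rollout(x0, g, weights, tau=1.0, dt=0.01, alpha=25.0, beta=6.25, alpha_s=4.0):
    n = len(weights)
    centers = np.exp(-alpha_s * np.linspace(0, 1, n))   # basis centers along the phase
    widths = n ** 1.5 / centers
    x, v, s, traj = x0, 0.0, 1.0, []
    for _ in range(int(tau / dt)):
        psi = np.exp(-widths * (s - centers) ** 2)
        f = s * (g - x0) * (psi @ weights) / (psi.sum() + 1e-10)  # forcing term
        v += dt / tau * (alpha * (beta * (g - x) - v) + f)        # spring-damper pull
        x += dt / tau * v
        s += dt / tau * (-alpha_s * s)                            # canonical system
        traj.append(x)
    return np.array(traj)

traj = dmp_rollout(x0=0.0, g=1.0, weights=np.zeros(10))  # zero weights: smooth reach
```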

This paper presents a classification framework based on learnable data augmentation to tackle the One-Shot Unsupervised Domain Adaptation (OS-UDA) problem. OS-UDA is the most challenging setting in Domain Adaptation, as only a single unlabeled target sample is assumed to be available for model adaptation. Driven by this single sample, our method, LearnAug-UDA, learns how to augment source data so that it becomes perceptually similar to the target. As a result, a classifier trained on such augmented data generalizes well to the target domain. To achieve this, we design an encoder-decoder architecture that exploits a perceptual loss and style-transfer strategies to augment the source data. Our method achieves state-of-the-art performance on two well-known Domain Adaptation benchmarks, DomainNet and VisDA. The project code is available at //github.com/IIT-PAVIS/LearnAug-UDA
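One classic style-transfer ingredient such an encoder-decoder augmenter can use is Adaptive Instance Normalization (AdaIN), which re-styles source features with the statistics of the single target sample. The sketch below shows this generic building block only; it is not claimed to be the paper's exact architecture.

```python
# Generic AdaIN feature re-styling (illustrative, not LearnAug-UDA's exact design).
import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor, eps: float = 1e-5):
    """content_feat: source-image features; style_feat: target-sample features;
    both (B, C, H, W) maps from an encoder."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    # strip the source statistics, then impose the target-sample statistics
    return s_std * (content_feat - c_mean) / c_std + s_mean
```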

Presentation Attack Detection (PAD) is a crucial stage in facial recognition systems, preventing identity spoofing and the leakage of personal information. Recently, pulse detection based on remote photoplethysmography (rPPG) has been shown to be effective for face presentation attack detection. This work presents three different approaches to rPPG-based presentation attack detection: (i) the physiological domain, which uses rPPG-based models; (ii) the Deepfakes domain, in which models from the physiological domain are retrained for specific Deepfake detection tasks; and (iii) a new Presentation Attack domain, trained by applying transfer learning from the two previous domains to improve the ability to differentiate between bona fides and attacks. The results show the efficiency of the rPPG-based models for presentation attack detection, evidencing a 21.70% decrease in average classification error rate (ACER) (from 41.03% to 19.32%) when the presentation attack domain is compared to the physiological and Deepfakes domains. Our experiments highlight the effectiveness of transfer learning in rPPG-based models, which perform well against presentation attack instruments that cannot replicate this physiological feature.
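The transfer-learning recipe between domains can be pictured with a generic sketch: reuse a network trained on one domain, freeze its feature extractor, and retrain a fresh bona-fide-vs-attack head on the next domain. The torchvision ResNet-18 here is an illustrative stand-in for the paper's rPPG backbones.

```python
# Generic domain-to-domain transfer learning (stand-in backbone, assumed setup).
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1")       # stand-in for a source-domain model
for p in model.parameters():
    p.requires_grad = False                     # keep source-domain features fixed
model.fc = nn.Linear(model.fc.in_features, 2)   # new head: bona fide vs. attack
# During fine-tuning, only model.fc.parameters() are handed to the optimizer.
```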

Large Language Models (LLMs) have shown promise in the autonomous driving sector, particularly in generalization and interpretability. We introduce a unique object-level multimodal LLM architecture that merges vectorized numeric modalities with a pre-trained LLM to improve context understanding in driving situations. We also present a new dataset of 160k QA pairs derived from 10k driving scenarios, paired with high-quality control commands collected with an RL agent and question-answer pairs generated by a teacher LLM (GPT-3.5). A distinct pretraining strategy is devised to align numeric vector modalities with static LLM representations using vector-captioning language data. We also introduce an evaluation metric for Driving QA and demonstrate our LLM-driver's proficiency in interpreting driving scenarios, answering questions, and decision-making. Our findings highlight the potential of LLM-based driving action generation in comparison to traditional behavioral cloning. We make our benchmark, datasets, and model available for further exploration.
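A common way to merge a vectorized numeric modality with a pre-trained LLM is to project each object's state vector into the LLM's token-embedding space and prepend the result to the text tokens. The sketch below shows that generic pattern; the dimensions and the simple linear projector are assumptions, not necessarily the paper's adapter.

```python
# Sketch: projecting object state vectors into an LLM's embedding space (assumed dims).
import torch
import torch.nn as nn

class VectorToTokens(nn.Module):
    def __init__(self, vec_dim: int = 16, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(vec_dim, llm_dim)   # numeric vector -> pseudo-token

    def forward(self, objects: torch.Tensor, text_embeds: torch.Tensor):
        # objects: (B, num_objects, vec_dim); text_embeds: (B, seq_len, llm_dim)
        obj_tokens = self.proj(objects)                      # (B, num_objects, llm_dim)
        return torch.cat([obj_tokens, text_embeds], dim=1)   # fed to the frozen LLM
```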

In Ultrasound Localization Microscopy (ULM), achieving high-resolution images relies on the precise localization of contrast agent particles across consecutive beamformed frames. However, our study uncovers enormous untapped potential: delay-and-sum beamforming irreversibly reduces the Radio-Frequency (RF) data, yet its implications for localization remain largely unexplored. The rich contextual information embedded within RF wavefronts, including their hyperbolic shape and phase, offers great promise for guiding Deep Neural Networks (DNNs) in challenging localization scenarios. To fully exploit this data, we propose to localize scatterers directly in RF signals. Our approach involves a custom super-resolution DNN using learned feature channel shuffling and a novel semi-global convolutional sampling block tailored for reliable and accurate localization in RF input data. Additionally, we introduce a geometric point transformation that facilitates seamless mapping between B-mode and RF spaces. To validate the effectiveness of our method and understand the impact of beamforming, we conduct an extensive comparison with State-Of-The-Art (SOTA) techniques in ULM. We present the inaugural in vivo results from an RF-trained DNN, highlighting its real-world practicality. Our findings show that RF-ULM bridges the domain gap between synthetic and real datasets, offering a considerable advantage in terms of precision and complexity. To enable the broader research community to benefit from our findings, our code and the associated SOTA methods are made available at //github.com/hahnec/rf-ulm.
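For intuition about the feature-channel-shuffling component, the sketch below shows the standard (non-learned) channel-shuffle operation, which interleaves channel groups so later layers mix information across them. The learned variant in the paper presumably parameterizes this mixing, so treat this fixed version as illustrative only.

```python
# Standard channel shuffle (fixed permutation; the paper's learned variant differs).
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    b, c, h, w = x.shape
    return (x.view(b, groups, c // groups, h, w)   # split channels into groups
             .transpose(1, 2)                      # interleave the groups
             .reshape(b, c, h, w))

x = torch.arange(8.0).view(1, 8, 1, 1)
print(channel_shuffle(x, 2).flatten())  # tensor([0., 4., 1., 5., 2., 6., 3., 7.])
```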

Human hands possess remarkable dexterity and have long served as a source of inspiration for robotic manipulation. In this work, we propose a human $\textbf{H}$and$\textbf{-In}$formed visual representation learning framework to solve difficult $\textbf{Dex}$terous manipulation tasks ($\textbf{H-InDex}$) with reinforcement learning. Our framework consists of three stages: (i) pre-training representations with 3D human hand pose estimation, (ii) offline adapting representations with self-supervised keypoint detection, and (iii) reinforcement learning with exponential moving average BatchNorm. The last two stages modify only $0.36\%$ of the pre-trained representation's parameters in total, ensuring that the knowledge from pre-training is preserved to the fullest extent. We empirically study 12 challenging dexterous manipulation tasks and find that H-InDex largely surpasses strong baseline methods and recent visual foundation models for motor control. Code is available at //yanjieze.com/H-InDex.
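Exponential moving average BatchNorm can be pictured as follows: every learned weight stays frozen, while the BatchNorm running statistics keep adapting to the new observation distribution via an EMA, which is what train-mode BatchNorm with a small momentum does. The backbone choice and momentum value below are assumptions for illustration.

```python
# Sketch: frozen weights, EMA-updated BatchNorm statistics (assumed momentum).
import torch.nn as nn
from torchvision.models import resnet18

encoder = resnet18()                      # stand-in for the pre-trained encoder
for p in encoder.parameters():
    p.requires_grad = False               # freeze every learned weight
encoder.eval()                            # inference behavior everywhere else...
for m in encoder.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.momentum = 0.01                 # ...but keep a slow EMA of mean/var
        m.train()                         # update running stats during rollouts
```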

The rapid evolution of the Ethereum network necessitates sophisticated techniques to ensure its robustness against potential threats and to maintain transparency. While Graph Neural Networks (GNNs) have pioneered anomaly detection on such platforms, capturing the intricacies of both spatial and temporal transactional patterns has remained a challenge. This study presents a fusion of Graph Convolutional Networks (GCNs) with Temporal Random Walks (TRW) enhanced by probabilistic sampling to bridge this gap. Unlike traditional GCNs, our approach leverages the strengths of TRW to discern complex temporal sequences in Ethereum transactions, thereby providing a more nuanced transaction anomaly detection mechanism. Preliminary evaluations demonstrate that our TRW-GCN framework substantially improves performance over conventional GCNs in detecting anomalies and transaction bursts. This research not only underscores the potential of temporal cues in Ethereum transactional data but also offers a scalable and effective methodology for ensuring the security and transparency of decentralized platforms. By harnessing both spatial relationships and time-based transactional sequences as node features, our model introduces an additional layer of granularity, making the detection process more robust and less prone to false positives. This work lays the foundation for future research on optimizing and enhancing the transparency of blockchain technologies, and attests to the significance of considering both the time and space dimensions in the ever-evolving landscape of decentralized platforms.
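A temporal random walk with probabilistic sampling can be sketched simply: from the current node, the next transaction edge is chosen among strictly later ones, weighted so that temporally closer edges are more likely. The exponential weighting and data layout below are illustrative assumptions.

```python
# Minimal temporal random walk with time-decayed probabilistic sampling.
import math
import random

def temporal_random_walk(adj, start, walk_len, start_time=0.0, decay=1.0):
    """adj: {node: [(neighbor, timestamp), ...]} of transactions."""
    walk, node, t = [start], start, start_time
    for _ in range(walk_len - 1):
        later = [(v, ts) for v, ts in adj.get(node, []) if ts > t]  # respect time order
        if not later:
            break
        weights = [math.exp(-decay * (ts - t)) for _, ts in later]  # closer = likelier
        node, t = random.choices(later, weights=weights, k=1)[0]
        walk.append(node)
    return walk

adj = {"a": [("b", 1.0), ("c", 2.0)], "b": [("c", 3.0)], "c": []}
print(temporal_random_walk(adj, "a", walk_len=3))  # e.g. ['a', 'b', 'c']
```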

Memory Gym introduces a unique benchmark designed to test Deep Reinforcement Learning agents, specifically comparing the Gated Recurrent Unit (GRU) against Transformer-XL (TrXL), on their ability to memorize long sequences, withstand noise, and generalize. It features partially observable 2D environments with discrete controls, namely Mortar Mayhem, Mystery Path, and Searing Spotlights. These originally finite environments are extrapolated to novel endless tasks that act as an automatic curriculum, drawing inspiration from the car game "I packed my bag". These endless tasks are not only useful for evaluating efficiency but also valuable for assessing the effectiveness of memory-based approaches. Given the scarcity of publicly available memory baselines, we contribute an implementation built on TrXL and Proximal Policy Optimization. This implementation uses TrXL as episodic memory via a sliding-window approach. In our experiments on the finite environments, TrXL demonstrates superior sample efficiency in Mystery Path and outperforms GRU in Mortar Mayhem. However, GRU is more efficient on Searing Spotlights. Most notably, in all endless tasks, GRU makes a remarkable resurgence, consistently outperforming TrXL by significant margins.
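The sliding-window episodic memory can be sketched as a bounded buffer of past hidden states that the current step attends over. The single attention layer and buffer handling below are simplifying assumptions; the actual TrXL baseline keeps per-layer memories and uses relative positional encodings.

```python
# Sketch: sliding-window episodic memory for a transformer agent (assumed details).
from collections import deque
import torch
import torch.nn as nn

class SlidingWindowMemory:
    def __init__(self, window: int):
        self.states = deque(maxlen=window)       # oldest states fall off automatically

    def attend(self, attn: nn.MultiheadAttention, query: torch.Tensor):
        # query: (1, batch, dim); memory grows up to (window, batch, dim)
        self.states.append(query.detach())
        memory = torch.cat(list(self.states), dim=0)
        out, _ = attn(query, memory, memory)     # attend over the recent window only
        return out

attn = nn.MultiheadAttention(embed_dim=32, num_heads=4)
mem = SlidingWindowMemory(window=8)
step_out = mem.attend(attn, torch.randn(1, 2, 32))   # one agent step, batch of 2
```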

Depth estimation provides an alternative approach for perceiving 3D information in autonomous driving. Monocular depth estimation, whether with single-frame or multi-frame inputs, has achieved significant success by learning various types of cues and specializing in either static or dynamic scenes. Recently, fusing these cues has become an attractive topic, aiming to enable the combined cues to perform well in both types of scenes. However, adaptive cue fusion relies on attention mechanisms, whose quadratic complexity limits the granularity of cue representation. Additionally, explicit cue fusion depends on precise segmentation, which imposes a heavy burden on mask prediction. To address these issues, we propose the GSDC Transformer, an efficient and effective component for cue fusion in monocular multi-frame depth estimation. We utilize deformable attention to learn cue relationships at a fine scale, while sparse attention reduces computational requirements as granularity increases. To compensate for the precision drop in dynamic scenes, we represent scene attributes in the form of super tokens without relying on precise shapes. Within each super token attributed to dynamic scenes, we gather its relevant cues and learn local dense relationships to enhance cue fusion. Our method achieves state-of-the-art performance on the KITTI dataset with efficient fusion speed.
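To illustrate why sparse attention keeps fine-grained fusion affordable, the sketch below restricts each query to its top-k highest-scoring keys instead of all of them. This generic top-k attention conveys the complexity reduction only; it is not the GSDC Transformer's exact deformable/sparse design.

```python
# Generic top-k sparse attention (illustrative of the cost reduction only).
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, topk: int = 16):
    """q: (B, Nq, d); k, v: (B, Nk, d). Each query attends to only topk keys."""
    topk = min(topk, k.shape[1])
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, Nq, Nk)
    idx = scores.topk(topk, dim=-1).indices
    mask = torch.full_like(scores, float("-inf")).scatter(-1, idx, 0.0)
    attn = F.softmax(scores + mask, dim=-1)                 # zero weight off the top-k
    return attn @ v
```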
