Reinforcement Learning (RL) is a promising solution, allowing Unmanned Underwater Vehicles (UUVs) to learn optimal behaviors through trial and error. However, existing simulators lack efficient integration with RL methods, limiting training scalability and performance. This paper introduces MarineGym, a novel simulation framework designed to enhance RL training efficiency for UUVs by utilizing GPU acceleration. MarineGym offers a 10,000-fold performance improvement over real-time simulation on a single GPU, enabling rapid training of RL algorithms across multiple underwater tasks. Key features include realistic dynamic modeling of UUVs, parallel environment execution, and compatibility with popular RL frameworks like PyTorch and TorchRL. The framework is validated through four distinct tasks: station-keeping, circle tracking, helical tracking, and lemniscate tracking. This framework sets the stage for advancing RL in underwater robotics and facilitating efficient training in complex, dynamic environments.

Related content

Learning

Learning · Sensors · Estimation/Estimator · Flow · Convergence ·

December 9, 2024

Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving

Xin Fei,Wenzhao Zheng,Yueqi Duan,Wei Zhan,Masayoshi Tomizuka,Kurt Keutzer,Jiwen Lu

from arxiv, Code is available at: //github.com/Barrybarry-Smith/Driv3R

Real-time 4D reconstruction for dynamic scenes remains a crucial challenge for autonomous driving perception. Most existing methods rely on depth estimation through self-supervision or multi-modality sensor fusion. In this paper, we propose Driv3R, a DUSt3R-based framework that directly regresses per-frame point maps from multi-view image sequences. To achieve streaming dense reconstruction, we maintain a memory pool to reason about both spatial relationships across sensors and dynamic temporal contexts, enhancing multi-view 3D consistency and temporal integration. Furthermore, we employ a 4D flow predictor to identify moving objects within the scene, directing our network to focus more on reconstructing these dynamic regions. Finally, we align all per-frame point maps consistently to the world coordinate system in an optimization-free manner. We conduct extensive experiments on the large-scale nuScenes dataset to evaluate the effectiveness of our method. Driv3R outperforms previous frameworks in 4D dynamic scene reconstruction, achieving 15x faster inference speed compared to methods requiring global alignment. Code: //github.com/Barrybarry-Smith/Driv3R.

MoDELS · Language Modeling · INFORMS · Learning · Large Language Models ·

December 9, 2024

SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation

Yi-Chia Chen,Wei-Hua Li,Cheng Sun,Yu-Chiang Frank Wang,Chu-Song Chen

We introduce SAM4MLLM, an innovative approach that integrates the Segment Anything Model (SAM) with Multi-Modal Large Language Models (MLLMs) for pixel-aware tasks. Our method enables MLLMs to learn pixel-level location information without requiring excessive modifications to the existing model architecture or adding specialized tokens. We introduce an inquiry-based approach that can effectively find prompt points for SAM to perform segmentation based on MLLM. It combines detailed visual information with the powerful expressive capabilities of large language models in a unified language-based manner without additional computational overhead in learning. Experimental results on public benchmarks demonstrate the effectiveness of our approach.

3D · Paper · Directed · AIM · Performer ·

December 9, 2024

Advancing Extended Reality with 3D Gaussian Splatting: Innovations and Prospects

Shi Qiu,Binzhu Xie,Qixuan Liu,Pheng-Ann Heng

from arxiv, IEEE AIxVR 2025

3D Gaussian Splatting (3DGS) has attracted significant attention for its potential to revolutionize 3D representation, rendering, and interaction. Despite the rapid growth of 3DGS research, its direct application to Extended Reality (XR) remains underexplored. Although many studies recognize the potential of 3DGS for XR, few have explicitly focused on or demonstrated its effectiveness within XR environments. In this paper, we aim to synthesize innovations in 3DGS that show specific potential for advancing XR research and development. We conduct a comprehensive review of publicly available 3DGS papers, with a focus on those referencing XR-related concepts. Additionally, we perform an in-depth analysis of innovations explicitly relevant to XR and propose a taxonomy to highlight their significance. Building on these insights, we propose several prospective XR research areas where 3DGS can make promising contributions, yet remain rarely touched. By investigating the intersection of 3DGS and XR, this paper provides a roadmap to push the boundaries of XR using cutting-edge 3DGS techniques.

Multimodal · Continuity · MoDELS · Optimizer · Diversity ·

December 8, 2024

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation

Leigang Qu,Haochuan Li,Wenjie Wang,Xiang Liu,Juncheng Li,Liqiang Nie,Tat-Seng Chua

from arxiv, project page: //silmm.github.io/

Large Multimodal Models (LMMs) have demonstrated impressive capabilities in multimodal understanding and generation, pushing forward advancements in text-to-image generation. However, achieving accurate text-image alignment for LMMs, particularly in compositional scenarios, remains challenging. Existing approaches, such as layout planning for multi-step generation and learning from human feedback or AI feedback, depend heavily on prompt engineering, costly human annotations, and continual upgrading, limiting flexibility and scalability. In this work, we introduce a model-agnostic iterative self-improvement framework (SILMM) that can enable LMMs to provide helpful and scalable self-feedback and optimize text-image alignment via Direct Preference Optimization (DPO). DPO can be readily applied to LMMs that use discrete visual tokens as intermediate image representations, while it is less suitable for LMMs with continuous visual features, as obtaining generation probabilities is challenging. To adapt SILMM to LMMs with continuous features, we propose a diversity mechanism to obtain diverse representations and a kernel-based continuous DPO for alignment. Extensive experiments on three compositional text-to-image generation benchmarks validate the effectiveness and superiority of SILMM, showing improvements exceeding 30% on T2I-CompBench++ and around 20% on DPG-Bench.
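
For readers unfamiliar with DPO, the generic per-pair objective can be sketched as below. This is a minimal illustration of the standard DPO loss on sequence log-probabilities, not SILMM's implementation; the function name `dpo_loss`, the default `beta`, and the argument names are assumptions made for this example, and the kernel-based continuous variant mentioned in the abstract is not shown.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of log-probabilities.

    All names and the beta default are illustrative assumptions.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # sample over the rejected one, relative to the frozen reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)), computed in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

The loss equals log 2 when the policy matches the reference and shrinks as the policy assigns relatively more probability to the preferred sample, which is why the method needs tractable generation probabilities.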

Multimodal · Language Modeling · MoDELS · Large Language Models · Notability ·

December 6, 2024

Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models

Zehao Wang,Xinpeng Liu,Xiaoqian Wu,Yudonglin Zhang,Zhou Fang,Yifan Fang,Junfu Pu,Cewu Lu,Yong-Lu Li

Multimodal Large Language Models (MLLMs) have garnered significant attention recently and demonstrate outstanding capabilities in various tasks such as OCR, VQA, captioning, etc. However, hallucination remains a persistent issue. While numerous methods have been proposed to mitigate hallucinations, achieving notable improvements, these methods primarily focus on mitigating hallucinations about object/noun-related concepts. Verb concepts, crucial for understanding human actions, have been largely overlooked. In this paper, to the best of our knowledge, we are the first to investigate the verb hallucination phenomenon of MLLMs from various perspectives. Our findings reveal that most state-of-the-art MLLMs suffer from severe verb hallucination. To assess the effectiveness of existing mitigation methods for object concept hallucination on verb hallucination, we evaluated these methods and found that they do not effectively address verb hallucination. To address this issue, we propose a novel rich verb knowledge-based tuning method to mitigate verb hallucination. The experiment results demonstrate that our method significantly reduces hallucinations related to verbs. Our code and data will be made publicly available.

MoDELS · Scenario · Language Modeling · INFORMS · Extensibility ·

December 6, 2024

ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

Renqiu Xia,Bo Zhang,Hancheng Ye,Xiangchao Yan,Qi Liu,Hongbin Zhou,Zijun Chen,Min Dou,Botian Shi,Junchi Yan,Yu Qiao

from arxiv, Code and dataset are available for downloading at: //github.com/UniModal4Reasoning/ChartVLM , 25 pages, 15 figures

Recently, many versatile Multi-modal Large Language Models (MLLMs) have emerged continuously. However, their capacity to query information depicted in visual charts and engage in reasoning based on the queried contents remains under-explored. In this paper, to comprehensively and rigorously benchmark the ability of the off-the-shelf MLLMs in the chart domain, we construct ChartX, a multi-modal evaluation set covering 18 chart types, 7 chart tasks, 22 disciplinary topics, and high-quality chart data. Besides, we develop ChartVLM to offer a new perspective on handling multi-modal tasks that strongly depend on interpretable patterns, such as reasoning tasks in the field of charts or geometric images. We evaluate the chart-related ability of mainstream MLLMs and our ChartVLM on the proposed ChartX evaluation set. Extensive experiments demonstrate that ChartVLM surpasses both versatile and chart-related large models, achieving results comparable to GPT-4V. We believe that our study can pave the way for further exploration in creating a more comprehensive chart evaluation set and developing more interpretable multi-modal models. Both ChartX and ChartVLM are available at: //github.com/UniModal4Reasoning/ChartVLM

Path · Knowledge · Graph · Link Prediction · MoDELS ·

December 6, 2024

eXpath: Explaining Knowledge Graph Link Prediction with Ontological Closed Path Rules

Ye Sun,Lei Shi,Yongxin Tong

from arxiv, 13 pages, 5 figures. Submitted to PVLDB volume 18 on 20241201

Link prediction (LP) is crucial for Knowledge Graph (KG) completion but commonly suffers from interpretability issues. While several methods have been proposed to explain embedding-based LP models, they are generally limited to local explanations on the KG and are deficient in providing human-interpretable semantics. Based on real-world observations of the characteristics of KGs from multiple domains, we propose to explain LP models in KGs with path-based explanations. An integrated framework, namely eXpath, is introduced, which incorporates the concept of relation paths with ontological closed path rules to enhance both the efficiency and effectiveness of LP interpretation. Notably, the eXpath explanations can be fused with other single-link explanation approaches to achieve a better overall solution. Extensive experiments across benchmark datasets and LP models demonstrate that introducing eXpath can boost the quality of resulting explanations by about 20% on two key metrics and reduce the required explanation time by 61.4%, in comparison to the best existing method. Case studies further highlight eXpath's ability to provide more semantically meaningful explanations through path-based evidence.
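
To make the idea of a closed path concrete: a rule such as born_in(x, z) ∧ city_of(z, y) → nationality(x, y) has a body that forms a relation path from the head entity to the tail entity of the predicted link. The sketch below only enumerates such candidate relation paths in a toy graph. It is not eXpath's algorithm: the function name `relation_paths`, the toy triples, and the two-hop limit are assumptions for illustration, and inverse relations, ontology constraints, and rule scoring are all omitted.

```python
from collections import defaultdict

def relation_paths(triples, head, tail, max_len=2):
    """Enumerate relation paths of up to max_len hops from head to tail."""
    out_edges = defaultdict(list)
    for h, r, t in triples:
        out_edges[h].append((r, t))
    paths = []
    # Depth-first walk; each stack item is (current entity, relations so far).
    stack = [(head, [])]
    while stack:
        node, rels = stack.pop()
        if len(rels) == max_len:
            continue  # do not extend paths beyond the hop limit
        for r, nxt in out_edges[node]:
            new_rels = rels + [r]
            if nxt == tail:
                paths.append(tuple(new_rels))
            stack.append((nxt, new_rels))
    return sorted(set(paths))

# Hypothetical toy knowledge graph (the predicted link itself is excluded).
kg = [
    ("alice", "born_in", "paris"),
    ("paris", "city_of", "france"),
    ("alice", "works_in", "berlin"),
    ("berlin", "city_of", "germany"),
]
# Candidate evidence for the predicted link (alice, nationality, france).
evidence = relation_paths(kg, "alice", "france")
```

Each returned tuple is a candidate rule body; a path-based explainer would then decide which of these paths actually support the prediction.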

Multimodal · Comprehensibility · Language Modeling · Integration · MoDELS ·

December 6, 2024

CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model

Dongyoung Go,Taesun Whang,Chanhee Lee,Hwa-Yeon Kim,Sunghoon Park,Seunghwan Ji,Jinho Kim,Dongchan Kim,Young-Bum Kim

from arxiv, Preprint. Under review

The integration of Retrieval-Augmented Generation (RAG) with Multimodal Large Language Models (MLLMs) has revolutionized information retrieval and expanded the practical applications of AI. However, current systems struggle to accurately interpret user intent, employ diverse retrieval strategies, and effectively filter unintended or inappropriate responses, limiting their effectiveness. This paper introduces Contextual Understanding and Enhanced Search with MLLM (CUE-M), a novel multimodal search framework that addresses these challenges through a multi-stage pipeline comprising image context enrichment, intent refinement, contextual query generation, external API integration, and relevance-based filtering. CUE-M incorporates a robust filtering pipeline combining image-based, text-based, and multimodal classifiers, dynamically adapting to instance- and category-specific concerns defined by organizational policies. Evaluations on a multimodal Q&A dataset and a public safety benchmark demonstrate that CUE-M outperforms baselines in accuracy, knowledge integration, and safety, advancing the capabilities of multimodal retrieval systems.

INFORMS · Integration · Learning · data integrity · Processing (programming language) ·

December 5, 2024

Game-Theoretic Foundations for Cyber Resilience Against Deceptive Information Attacks in Intelligent Transportation Systems

Ya-Ting Yang,Quanyan Zhu

The growing complexity and interconnectivity of Intelligent Transportation Systems (ITS) make them increasingly vulnerable to advanced cyber threats, particularly deceptive information attacks. These sophisticated threats exploit vulnerabilities to manipulate data integrity and decision-making processes through techniques such as data poisoning, spoofing, and phishing. They target multiple ITS domains, including intra-vehicle systems, inter-vehicle communications, transportation infrastructure, and human interactions, creating cascading effects across the ecosystem. This chapter introduces a game-theoretic framework, enhanced by control and learning theories, to systematically analyze and mitigate these risks. By modeling the strategic interactions among attackers, users, and system operators, the framework facilitates comprehensive risk assessment and the design of adaptive, scalable resilience mechanisms. A prime example of this approach is the Proactive Risk Assessment and Mitigation of Misinformed Demand Attacks (PRADA) system, which integrates trust mechanisms, dynamic learning processes, and multi-layered defense strategies to counteract deceptive attacks on navigational recommendation systems. In addition, the chapter explores the broader applicability of these methodologies to address various ITS threats, including spoofing, Advanced Persistent Threats (APTs), and denial-of-service attacks. It highlights cross-domain resilience strategies, offering actionable insights to bolster the security, reliability, and adaptability of ITS. By providing a robust game-theoretic foundation, this work advances the development of comprehensive solutions to the evolving challenges in ITS cybersecurity.

Similarity · INFORMS · Estimation/Estimator · Extensibility · Unsupervised ·

March 10, 2021

SDD-FIQA: Unsupervised Face Image Quality Assessment with Similarity Distribution Distance

Fu-Zhao Ou,Xingyu Chen,Ruixin Zhang,Yuge Huang,Shaoxin Li,Jilin Li,Yong Li,Liujuan Cao,Yuan-Gen Wang

In recent years, Face Image Quality Assessment (FIQA) has become an indispensable part of the face recognition system to guarantee the stability and reliability of recognition performance in unconstrained scenarios. For this purpose, the FIQA method should consider both the intrinsic property and the recognizability of the face image. Most previous works aim to estimate the sample-wise embedding uncertainty or pair-wise similarity as the quality score, which only considers partial intra-class information. However, these methods ignore the valuable inter-class information, which is useful for estimating the recognizability of a face image. In this work, we argue that a high-quality face image should be similar to its intra-class samples and dissimilar to its inter-class samples. Thus, we propose a novel unsupervised FIQA method that incorporates Similarity Distribution Distance for Face Image Quality Assessment (SDD-FIQA). Our method generates quality pseudo-labels by calculating the Wasserstein Distance (WD) between the intra-class similarity distributions and inter-class similarity distributions. With these quality pseudo-labels, we are capable of training a regression network for quality prediction. Extensive experiments on benchmark datasets demonstrate that the proposed SDD-FIQA surpasses state-of-the-art methods by an impressive margin. Meanwhile, our method shows good generalization across different recognition systems.
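
The pseudo-label idea described above can be illustrated for a single image as follows. This is a simplified sketch under stated assumptions, not the authors' code: it assumes the same number of intra-class and inter-class similarity scores, in which case the 1D Wasserstein-1 distance reduces to the mean absolute difference between sorted samples. The function names and the example similarity scores are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1D empirical samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes non-empty samples of equal size")
    # In 1D with equal sample sizes, optimal transport pairs up sorted values.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def quality_pseudo_label(intra_sims, inter_sims):
    """Hypothetical quality score: distance between the two similarity distributions."""
    return wasserstein_1d(intra_sims, inter_sims)

# Hypothetical cosine similarities for two face images.
sharp = quality_pseudo_label([0.9, 0.8, 0.85], [0.1, 0.2, 0.15])
blurry = quality_pseudo_label([0.5, 0.4, 0.45], [0.3, 0.35, 0.4])
```

An image whose intra-class similarities sit far from its inter-class similarities receives a larger score, matching the abstract's argument that a high-quality face should resemble its own class and differ from others.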