97SE亚洲国产综合在线_中文字幕无码乱人伦漫画_99热这里只有精品在线_日韩成人精品一卡2卡3卡4卡新区乱码_亚洲超清丝袜无码网站_先锋影音最新资源网_果冻传媒在线看免费视频观看

We present DisCo, a distributed algorithm for contact-rich, multi-robot tasks. DisCo is a distributed contact-implicit trajectory optimization algorithm, which allows a group of robots to optimize a time sequence of forces to objects and to their environment to accomplish tasks such as collaborative manipulation, robot team sports, and modular robot locomotion. We build our algorithm on a variant of the Alternating Direction Method of Multipliers (ADMM), where each robot computes its own contact forces and contact-switching events from a smaller single-robot, contact-implicit trajectory optimization problem, while cooperating with other robots through dual variables, enforcing constraints between robots. Each robot iterates between solving its local problem, and communicating over a wireless mesh network to enforce these consistency constraints with its neighbors, ultimately converging to a coordinated plan for the group. The local problems solved by each robot are significantly less challenging than a centralized problem with all robots' contact forces and switching events, improving the computational efficiency, while also preserving the privacy of some aspects of each robot's operation. We demonstrate the effectiveness of our algorithm in simulations of collaborative manipulation, multi-robot team sports scenarios, and in modular robot locomotion, where DisCo achieves $3$x higher success rates with a 2.5x to 5x faster computation time. Further, we provide results of hardware experiments on a modular truss robot, with three collaborating truss nodes planning individually while working together to produce a punctuated rolling-gate motion of the composite structure. Videos are available on the project page: //disco-opt.github.io.

相關內容

優化器

關注 4

Automator · Attention · Learning · MoDELS · 跳躍連接 ·

2024 年 12 月 10 日

SKIPNet: Spatial Attention Skip Connections for Enhanced Brain Tumor Classification

Khush Mendiratta,Shweta Singh,Pratik Chattopadhyay

Early detection of brain tumors through magnetic resonance imaging (MRI) is essential for timely treatment, yet access to diagnostic facilities remains limited in remote areas. Gliomas, the most common primary brain tumors, arise from the carcinogenesis of glial cells in the brain and spinal cord, with glioblastoma patients having a median survival time of less than 14 months. MRI serves as a non-invasive and effective method for tumor detection, but manual segmentation of brain MRI scans has traditionally been a labor-intensive task for neuroradiologists. Recent advancements in computer-aided design (CAD), machine learning (ML), and deep learning (DL) offer promising solutions for automating this process. This study proposes an automated deep learning model for brain tumor detection and classification using MRI data. The model, incorporating spatial attention, achieved 96.90% accuracy, enhancing the aggregation of contextual information for better pattern recognition. Experimental results demonstrate that the proposed approach outperforms baseline models, highlighting its robustness and potential for advancing automated MRI-based brain tumor analysis.

MoDELS · 解碼 · Performer · 值域 · 全 ·

2024 年 12 月 10 日

CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding

Jiquan Wang,Sha Zhao,Zhiling Luo,Yangxuan Zhou,Haiteng Jiang,Shijian Li,Tao Li,Gang Pan

Electroencephalography (EEG) is a non-invasive technique to measure and record brain electrical activity, widely used in various BCI and healthcare applications. Early EEG decoding methods rely on supervised learning, limited by specific tasks and datasets, hindering model performance and generalizability. With the success of large language models, there is a growing body of studies focusing on EEG foundation models. However, these studies still leave challenges: Firstly, most of existing EEG foundation models employ full EEG modeling strategy. It models the spatial and temporal dependencies between all EEG patches together, but ignores that the spatial and temporal dependencies are heterogeneous due to the unique structural characteristics of EEG signals. Secondly, existing EEG foundation models have limited generalizability on a wide range of downstream BCI tasks due to varying formats of EEG data, making it challenging to adapt to. To address these challenges, we propose a novel foundation model called CBraMod. Specifically, we devise a criss-cross transformer as the backbone to thoroughly leverage the structural characteristics of EEG signals, which can model spatial and temporal dependencies separately through two parallel attention mechanisms. And we utilize an asymmetric conditional positional encoding scheme which can encode positional information of EEG patches and be easily adapted to the EEG with diverse formats. CBraMod is pre-trained on a very large corpus of EEG through patch-based masked EEG reconstruction. We evaluate CBraMod on up to 10 downstream BCI tasks (12 public datasets). CBraMod achieves the state-of-the-art performance across the wide range of tasks, proving its strong capability and generalizability. The source code is publicly available at \url{//github.com/wjq-learning/CBraMod}.

MoDELS · 語言模型化 · Integration · Performer · INFORMS ·

2024 年 12 月 9 日

RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models

Hieu Tran,Zonghai Yao,Junda Wang,Yifan Zhang,Zhichao Yang,Hong Yu

from arxiv, 24 pages, 8 figures

This work introduces RARE (Retrieval-Augmented Reasoning Enhancement), a versatile extension to the mutual reasoning framework (rStar), aimed at enhancing reasoning accuracy and factual integrity across large language models (LLMs) for complex, knowledge-intensive tasks such as commonsense and medical reasoning. RARE incorporates two innovative actions within the Monte Carlo Tree Search (MCTS) framework: A6, which generates search queries based on the initial problem statement, performs information retrieval using those queries, and augments reasoning with the retrieved data to formulate the final answer; and A7, which leverages information retrieval specifically for generated sub-questions and re-answers these sub-questions with the relevant contextual information. Additionally, a Retrieval-Augmented Factuality Scorer is proposed to replace the original discriminator, prioritizing reasoning paths that meet high standards of factuality. Experimental results with LLaMA 3.1 show that RARE enables open-source LLMs to achieve competitive performance with top open-source models like GPT-4 and GPT-4o. This research establishes RARE as a scalable solution for improving LLMs in domains where logical coherence and factual integrity are critical.

MoDELS · 語言模型化 · INFORMS · Learning · 大語言模型 ·

2024 年 12 月 9 日

SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation

Yi-Chia Chen,Wei-Hua Li,Cheng Sun,Yu-Chiang Frank Wang,Chu-Song Chen

We introduce SAM4MLLM, an innovative approach which integrates the Segment Anything Model (SAM) with Multi-Modal Large Language Models (MLLMs) for pixel-aware tasks. Our method enables MLLMs to learn pixel-level location information without requiring excessive modifications to the existing model architecture or adding specialized tokens. We introduce an inquiry-based approach that can effectively find prompt points for SAM to perform segmentation based on MLLM. It combines detailed visual information with the powerful expressive capabilities of large language models in a unified language-based manner without additional computational overhead in learning. Experimental results on pubic benchmarks demonstrate the effectiveness of our approach.

Agent · 傳感器 · 相關系數 · 回合 · 數據集 ·

2024 年 12 月 9 日

AgentAlign: Misalignment-Adapted Multi-Agent Perception for Resilient Inter-Agent Sensor Correlations

Zonglin Meng,Yun Zhang,Zhaoliang Zheng,Zhihao Zhao,Jiaqi Ma

Cooperative perception has attracted wide attention given its capability to leverage shared information across connected automated vehicles (CAVs) and smart infrastructures to address sensing occlusion and range limitation issues. However, existing research overlooks the fragile multi-sensor correlations in multi-agent settings, as the heterogeneous agent sensor measurements are highly susceptible to environmental factors, leading to weakened inter-agent sensor interactions. The varying operational conditions and other real-world factors inevitably introduce multifactorial noise and consequentially lead to multi-sensor misalignment, making the deployment of multi-agent multi-modality perception particularly challenging in the real world. In this paper, we propose AgentAlign, a real-world heterogeneous agent cross-modality feature alignment framework, to effectively address these multi-modality misalignment issues. Our method introduces a cross-modality feature alignment space (CFAS) and heterogeneous agent feature alignment (HAFA) mechanism to harmonize multi-modality features across various agents dynamically. Additionally, we present a novel V2XSet-noise dataset that simulates realistic sensor imperfections under diverse environmental conditions, facilitating a systematic evaluation of our approach's robustness. Extensive experiments on the V2X-Real and V2XSet-Noise benchmarks demonstrate that our framework achieves state-of-the-art performance, underscoring its potential for real-world applications in cooperative autonomous driving. The controllable V2XSet-Noise dataset and generation pipeline will be released in the future.

多峰值 · Continuity · MoDELS · 優化器 · 多樣性 ·

2024 年 12 月 8 日

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation

Leigang Qu,Haochuan Li,Wenjie Wang,Xiang Liu,Juncheng Li,Liqiang Nie,Tat-Seng Chua

from arxiv, project page: //silmm.github.io/

Large Multimodal Models (LMMs) have demonstrated impressive capabilities in multimodal understanding and generation, pushing forward advancements in text-to-image generation. However, achieving accurate text-image alignment for LMMs, particularly in compositional scenarios, remains challenging. Existing approaches, such as layout planning for multi-step generation and learning from human feedback or AI feedback, depend heavily on prompt engineering, costly human annotations, and continual upgrading, limiting flexibility and scalability. In this work, we introduce a model-agnostic iterative self-improvement framework (SILMM) that can enable LMMs to provide helpful and scalable self-feedback and optimize text-image alignment via Direct Preference Optimization (DPO). DPO can readily applied to LMMs that use discrete visual tokens as intermediate image representations; while it is less suitable for LMMs with continuous visual features, as obtaining generation probabilities is challenging. To adapt SILMM to LMMs with continuous features, we propose a diversity mechanism to obtain diverse representations and a kernel-based continuous DPO for alignment. Extensive experiments on three compositional text-to-image generation benchmarks validate the effectiveness and superiority of SILMM, showing improvements exceeding 30% on T2I-CompBench++ and around 20% on DPG-Bench.

INTERACT · ASSETS · 機器人 · 回合 · 知識 (knowledge) ·

2024 年 12 月 8 日

InfiniteWorld: A Unified Scalable Simulation Framework for General Visual-Language Robot Interaction

Pengzhen Ren,Min Li,Zhen Luo,Xinshuai Song,Ziwei Chen,Weijia Liufu,Yixuan Yang,Hao Zheng,Rongtao Xu,Zitong Huang,Tongsheng Ding,Luyang Xie,Kaidong Zhang,Changfei Fu,Yang Liu,Liang Lin,Feng Zheng,Xiaodan Liang

from arxiv, 8 pages, 5 figures

Realizing scaling laws in embodied AI has become a focus. However, previous work has been scattered across diverse simulation platforms, with assets and models lacking unified interfaces, which has led to inefficiencies in research. To address this, we introduce InfiniteWorld, a unified and scalable simulator for general vision-language robot interaction built on Nvidia Isaac Sim. InfiniteWorld encompasses a comprehensive set of physics asset construction methods and generalized free robot interaction benchmarks. Specifically, we first built a unified and scalable simulation framework for embodied learning that integrates a series of improvements in generation-driven 3D asset construction, Real2Sim, automated annotation framework, and unified 3D asset processing. This framework provides a unified and scalable platform for robot interaction and learning. In addition, to simulate realistic robot interaction, we build four new general benchmarks, including scene graph collaborative exploration and open-world social mobile manipulation. The former is often overlooked as an important task for robots to explore the environment and build scene knowledge, while the latter simulates robot interaction tasks with different levels of knowledge agents based on the former. They can more comprehensively evaluate the embodied agent's capabilities in environmental understanding, task planning and execution, and intelligent interaction. We hope that this work can provide the community with a systematic asset interface, alleviate the dilemma of the lack of high-quality assets, and provide a more comprehensive evaluation of robot interactions.

示例 · 掩碼 · 可理解性 · 目標檢測 · 數據集 ·

2024 年 12 月 6 日

Beyond Boxes: Mask-Guided Spatio-Temporal Feature Aggregation for Video Object Detection

Khurram Azeem Hashmi,Talha Uddin Sheikh,Didier Stricker,Muhammad Zeshan Afzal

from arxiv, To appear in WACV 2025

The primary challenge in Video Object Detection (VOD) is effectively exploiting temporal information to enhance object representations. Traditional strategies, such as aggregating region proposals, often suffer from feature variance due to the inclusion of background information. We introduce a novel instance mask-based feature aggregation approach, significantly refining this process and deepening the understanding of object dynamics across video frames. We present FAIM, a new VOD method that enhances temporal Feature Aggregation by leveraging Instance Mask features. In particular, we propose the lightweight Instance Feature Extraction Module (IFEM) to learn instance mask features and the Temporal Instance Classification Aggregation Module (TICAM) to aggregate instance mask and classification features across video frames. Using YOLOX as a base detector, FAIM achieves 87.9% mAP on the ImageNet VID dataset at 33 FPS on a single 2080Ti GPU, setting a new benchmark for the speed-accuracy trade-off. Additional experiments on multiple datasets validate that our approach is robust, method-agnostic, and effective in multi-object tracking, demonstrating its broader applicability to video understanding tasks.

收縮 · 編譯器 · state-of-the-art · 泛函 · 圖 ·

2024 年 12 月 5 日

HCC: A Language-Independent Hardening Contract Compiler for Smart Contracts

Jens-Rene Giesen,Sebastien Andreina,Michael Rodler,Ghassan O. Karame,Lucas Davi

from arxiv, To appear at ACNS 2025

Developing secure smart contracts remains a challenging task. Existing approaches are either impractical or leave the burden to developers for fixing bugs. In this paper, we propose the first practical smart contract compiler, called HCC, which automatically inserts security hardening checks at the source-code level based on a novel and language-independent code property graph (CPG) notation. The high expressiveness of our developed CPG allows us to mitigate all of the most common smart contract vulnerabilities, namely reentrancy, integer bugs, suicidal smart contracts, improper use of tx.origin, untrusted delegate-calls, and unchecked low-level call bugs. Our large-scale evaluation on 10k real-world contracts and several sets of vulnerable contracts from related work demonstrates that HCC is highly practical, outperforms state-of-the-art contract hardening techniques, and effectively prevents all verified attack transactions without hampering functional correctness.

Pyramid · MoDELS · Extensibility · state-of-the-art · Performer ·

2022 年 12 月 1 日

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

Wan-Cyuan Fan,Yen-Chun Chen,Dongdong Chen,Yu Cheng,Lu Yuan,Yu-Chiang Frank Wang

from arxiv, AAAI 2023

Diffusion models (DMs) have shown great potential for high-quality image synthesis. However, when it comes to producing images with complex scenes, how to properly describe both image global structures and object details remains a challenging task. In this paper, we present Frido, a Feature Pyramid Diffusion model performing a multi-scale coarse-to-fine denoising process for image synthesis. Our model decomposes an input image into scale-dependent vector quantized features, followed by a coarse-to-fine gating for producing image output. During the above multi-scale representation learning stage, additional input conditions like text, scene graph, or image layout can be further exploited. Thus, Frido can be also applied for conditional or cross-modality image synthesis. We conduct extensive experiments over various unconditioned and conditional image generation tasks, ranging from text-to-image synthesis, layout-to-image, scene-graph-to-image, to label-to-image. More specifically, we achieved state-of-the-art FID scores on five benchmarks, namely layout-to-image on COCO and OpenImages, scene-graph-to-image on COCO and Visual Genome, and label-to-image on COCO. Code is available at //github.com/davidhalladay/Frido.