无码人妻一区二区三区在线不卡-人妻无码精品人妻

Neural Radiance Field (NeRF) excels in photo-realistically static scenes, inspiring numerous efforts to facilitate volumetric videos. However, rendering dynamic and long-sequence radiance fields remains challenging due to the significant data required to represent volumetric videos. In this paper, we propose a novel end-to-end joint optimization scheme of dynamic NeRF representation and compression, called JointRF, thus achieving significantly improved quality and compression efficiency against the previous methods. Specifically, JointRF employs a compact residual feature grid and a coefficient feature grid to represent the dynamic NeRF. This representation handles large motions without compromising quality while concurrently diminishing temporal redundancy. We also introduce a sequential feature compression subnetwork to further reduce spatial-temporal redundancy. Finally, the representation and compression subnetworks are end-to-end trained combined within the JointRF. Extensive experiments demonstrate that JointRF can achieve superior compression performance across various datasets.

相關內容

端到端

關注 1

TOOLS · Agent · MoDELS · Learning · Extensibility ·

2024 年 7 月 2 日

MMedAgent: Learning to Use Medical Tools with Multi-modal Agent

Binxu Li,Tiankai Yan,Yuanting Pan,Zhe Xu,Jie Luo,Ruiyang Ji,Shilong Liu,Haoyu Dong,Zihao Lin,Yixin Wang

Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited generality and often fall short when compared to specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropriate specialized models as tools based on user inputs. However, such advancements have not been extensively explored within the medical domain. To bridge this gap, this paper introduces the first agent explicitly designed for the medical field, named \textbf{M}ulti-modal \textbf{Med}ical \textbf{Agent} (MMedAgent). We curate an instruction-tuning dataset comprising six medical tools solving seven tasks, enabling the agent to choose the most suitable tools for a given task. Comprehensive experiments demonstrate that MMedAgent achieves superior performance across a variety of medical tasks compared to state-of-the-art open-source methods and even the closed-source model, GPT-4o. Furthermore, MMedAgent exhibits efficiency in updating and integrating new medical tools.

LIDAR · 穩健性 · 圖 · 目標檢測 · 3D ·

2024 年 7 月 2 日

GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection

Ziying Song,Lei Yang,Shaoqing Xu,Lin Liu,Dongyang Xu,Caiyan Jia,Feiyang Jia,Li Wang

Integrating LiDAR and camera information into Bird's-Eye-View (BEV) representation has emerged as a crucial aspect of 3D object detection in autonomous driving. However, existing methods are susceptible to the inaccurate calibration relationship between LiDAR and the camera sensor. Such inaccuracies result in errors in depth estimation for the camera branch, ultimately causing misalignment between LiDAR and camera BEV features. In this work, we propose a robust fusion framework called Graph BEV. Addressing errors caused by inaccurate point cloud projection, we introduce a Local Align module that employs neighbor-aware depth features via Graph matching. Additionally, we propose a Global Align module to rectify the misalignment between LiDAR and camera BEV features. Our Graph BEV framework achieves state-of-the-art performance, with an mAP of 70.1\%, surpassing BEV Fusion by 1.6\% on the nuscenes validation set. Importantly, our Graph BEV outperforms BEV Fusion by 8.3\% under conditions with misalignment noise.

MoDELS · 知識 (knowledge) · 圖 · 集成 · Performer ·

2024 年 7 月 2 日

DynaSemble: Dynamic Ensembling of Textual and Structure-Based Models for Knowledge Graph Completion

Ananjan Nandi,Navdeep Kaur,Parag Singla, Mausam

from arxiv, 12 pages, 2 figures, 15 tables Accepted to ACL 2024

We consider two popular approaches to Knowledge Graph Completion (KGC): textual models that rely on textual entity descriptions, and structure-based models that exploit the connectivity structure of the Knowledge Graph (KG). Preliminary experiments show that these approaches have complementary strengths: structure-based models perform exceptionally well when the gold answer is easily reachable from the query head in the KG, while textual models exploit descriptions to give good performance even when the gold answer is not easily reachable. In response, we propose DynaSemble, a novel method for learning query-dependent ensemble weights to combine these approaches by using the distributions of scores assigned by the models in the ensemble to all candidate entities. DynaSemble achieves state-of-the-art results on three standard KGC datasets, with up to 6.8 pt MRR and 8.3 pt Hits@1 gains over the best baseline model for the WN18RR dataset.

Processing（編程語言） · 優化器 · Learning · Performer · 正則化項 ·

2024 年 7 月 1 日

Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture

Xuanchen Li,Yuhao Cheng,Xingyu Ren,Haozhe Jia,Di Xu,Wenhan Zhu,Yichao Yan

4D head capture aims to generate dynamic topological meshes and corresponding texture maps from videos, which is widely utilized in movies and games for its ability to simulate facial muscle movements and recover dynamic textures in pore-squeezing. The industry often adopts the method involving multi-view stereo and non-rigid alignment. However, this approach is prone to errors and heavily reliant on time-consuming manual processing by artists. To simplify this process, we propose Topo4D, a novel framework for automatic geometry and texture generation, which optimizes densely aligned 4D heads and 8K texture maps directly from calibrated multi-view time-series images. Specifically, we first represent the time-series faces as a set of dynamic 3D Gaussians with fixed topology in which the Gaussian centers are bound to the mesh vertices. Afterward, we perform alternative geometry and texture optimization frame-by-frame for high-quality geometry and texture learning while maintaining temporal topology stability. Finally, we can extract dynamic facial meshes in regular wiring arrangement and high-fidelity textures with pore-level details from the learned Gaussians. Extensive experiments show that our method achieves superior results than the current SOTA face reconstruction methods both in the quality of meshes and textures. Project page: //xuanchenli.github.io/Topo4D/.

圖 · MoDELS · Extensibility · 團 · ESA ·

2024 年 7 月 1 日

Worst-Case to Expander-Case Reductions: Derandomized and Generalized

Amir Abboud,Nathan Wallheimer

from arxiv, Full version of a paper to appear at ESA 2024

A recent paper by Abboud and Wallheimer [ITCS 2023] presents self-reductions for various fundamental graph problems, which transform worst-case instances to expanders, thus proving that the complexity remains unchanged if the input is assumed to be an expander. An interesting corollary of their self-reductions is that if some problem admits such reduction, then the popular algorithmic paradigm based on expander-decompositions is useless against it. In this paper, we improve their core gadget, which augments a graph to make it an expander while retaining its important structure. Our new core construction has the benefit of being simple to analyze and generalize while obtaining the following results: 1. A derandomization of the self-reductions, showing that the equivalence between worst-case and expander-case holds even for deterministic algorithms, and ruling out the use of expander-decompositions as a derandomization tool. 2. An extension of the results to other models of computation, such as the Fully Dynamic model and the Congested Clique model. In the former, we either improve or provide an alternative approach to some recent hardness results for dynamic expander graphs by Henzinger, Paz, and Sricharan [ESA 2022]. In addition, we continue this line of research by designing new self-reductions for more problems, such as Max-Cut and dynamic Densest Subgraph, and demonstrating that the core gadget can be utilized to lift lower bounds based on the OMv Conjecture to expanders.

模型評估 · state-of-the-art · Networking · Neural Networks · 多樣性 ·

2024 年 6 月 30 日

HASNAS: A Hardware-Aware Spiking Neural Architecture Search Framework for Neuromorphic Compute-in-Memory Systems

Rachmad Vidya Wicaksana Putra,Muhammad Shafique

from arxiv, 9 pages, 13 figures, 2 tables

Spiking Neural Networks (SNNs) have shown capabilities for solving diverse machine learning tasks with ultra-low-power/energy computation. To further improve the performance and efficiency of SNN inference, the Compute-in-Memory (CIM) paradigm with emerging device technologies such as resistive random access memory is employed. However, most of SNN architectures are developed without considering constraints from the application and the underlying CIM hardware (e.g., memory, area, latency, and energy consumption). Moreover, most of SNN designs are derived from the Artificial Neural Networks, whose network operations are different from SNNs. These limitations hinder SNNs from reaching their full potential in accuracy and efficiency. Toward this, we propose HASNAS, a novel hardware-aware spiking neural architecture search (NAS) framework for neuromorphic CIM systems that finds an SNN that offers high accuracy under the given memory, area, latency, and energy constraints. To achieve this, HASNAS employs the following key steps: (1) optimizing SNN operations to achieve high accuracy, (2) developing an SNN architecture that facilitates an effective learning process, and (3) devising a systematic hardware-aware search algorithm to meet the constraints. The experimental results show that our HASNAS quickly finds an SNN that maintains high accuracy compared to the state-of-the-art by up to 11x speed-up, and meets the given constraints: 4x10^6 parameters of memory, 100mm^2 of area, 400ms of latency, and 120uJ energy consumption for CIFAR10 and CIFAR100; while the state-of-the-art fails to meet the constraints. In this manner, our HASNAS can enable efficient design automation for providing high-performance and energy-efficient neuromorphic CIM systems for diverse applications.

Vision · Integration · 長短期記憶網絡 · MoDELS · Extensibility ·

2024 年 6 月 29 日

VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting

Yujin Tang,Peijie Dong,Zhenheng Tang,Xiaowen Chu,Junwei Liang

from arxiv, CVPR2024 Precognition Workshop

Combining CNNs or ViTs, with RNNs for spatiotemporal forecasting, has yielded unparalleled results in predicting temporal and spatial dynamics. However, modeling extensive global information remains a formidable challenge; CNNs are limited by their narrow receptive fields, and ViTs struggle with the intensive computational demands of their attention mechanisms. The emergence of recent Mamba-based architectures has been met with enthusiasm for their exceptional long-sequence modeling capabilities, surpassing established vision models in efficiency and accuracy, which motivates us to develop an innovative architecture tailored for spatiotemporal forecasting. In this paper, we propose the VMRNN cell, a new recurrent unit that integrates the strengths of Vision Mamba blocks with LSTM. We construct a network centered on VMRNN cells to tackle spatiotemporal prediction tasks effectively. Our extensive evaluations show that our proposed approach secures competitive results on a variety of tasks while maintaining a smaller model size. Our code is available at //github.com/yyyujintang/VMRNN-PyTorch.

INFORMS · Networking · Attention · 描述符 · 可約的 ·

2024 年 6 月 28 日

Beyond First-Order: A Multi-Scale Approach to Finger Knuckle Print Biometrics

Chengrui Gao,Ziyuan Yang,Andrew Beng Jin Teoh,Min Zhu

Recently, finger knuckle prints (FKPs) have gained attention due to their rich textural patterns, positioning them as a promising biometric for identity recognition. Prior FKP recognition methods predominantly leverage first-order feature descriptors, which capture intricate texture details but fail to account for structural information. Emerging research, however, indicates that second-order textures, which describe the curves and arcs of the textures, encompass this overlooked structural information. This paper introduces a novel FKP recognition approach, the Dual-Order Texture Competition Network (DOTCNet), designed to capture texture information in FKP images comprehensively. DOTCNet incorporates three dual-order texture competitive modules (DTCMs), each targeting textures at different scales. Each DTCM employs a learnable texture descriptor, specifically a learnable Gabor filter (LGF), to extract texture features. By leveraging LGFs, the network extracts first and second order textures to describe fine textures and structural features thoroughly. Furthermore, an attention mechanism enhances relevant features in the first-order features, thereby highlighting significant texture details. For second-order features, a competitive mechanism emphasizes structural information while reducing noise from higher-order features. Extensive experimental results reveal that DOTCNet significantly outperforms several standard algorithms on the publicly available PolyU-FKP dataset.

Networking · 變換 · 圖像分割 · Attention · Dynamic Perspective ·

2024 年 6 月 28 日

PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation

Deyi Ji,Wenwei Jin,Hongtao Lu,Feng Zhao

from arxiv, IJCAI 2024

The ascension of Unmanned Aerial Vehicles (UAVs) in various fields necessitates effective UAV image segmentation, which faces challenges due to the dynamic perspectives of UAV-captured images. Traditional segmentation algorithms falter as they cannot accurately mimic the complexity of UAV perspectives, and the cost of obtaining multi-perspective labeled datasets is prohibitive. To address these issues, we introduce the PPTFormer, a novel \textbf{P}seudo Multi-\textbf{P}erspective \textbf{T}rans\textbf{former} network that revolutionizes UAV image segmentation. Our approach circumvents the need for actual multi-perspective data by creating pseudo perspectives for enhanced multi-perspective learning. The PPTFormer network boasts Perspective Decomposition, novel Perspective Prototypes, and a specialized encoder and decoder that together achieve superior segmentation results through Pseudo Multi-Perspective Attention (PMP Attention) and fusion. Our experiments demonstrate that PPTFormer achieves state-of-the-art performance across five UAV segmentation datasets, confirming its capability to effectively simulate UAV flight perspectives and significantly advance segmentation precision. This work presents a pioneering leap in UAV scene understanding and sets a new benchmark for future developments in semantic segmentation.

事件抽取 · Extensibility · 端到端 · 有向非循環圖 · state-of-the-art ·

2019 年 9 月 23 日

Doc2EDAG: An End-to-End Document-level Framework for Chinese Financial Event Extraction

Shun Zheng,Wei Cao,Wei Xu,Jiang Bian

from arxiv, Accepted by EMNLP 2019

Most existing event extraction (EE) methods merely extract event arguments within the sentence scope. However, such sentence-level EE methods struggle to handle soaring amounts of documents from emerging applications, such as finance, legislation, health, etc., where event arguments always scatter across different sentences, and even multiple such event mentions frequently co-exist in the same document. To address these challenges, we propose a novel end-to-end model, Doc2EDAG, which can generate an entity-based directed acyclic graph to fulfill the document-level EE (DEE) effectively. Moreover, we reformalize a DEE task with the no-trigger-words design to ease the document-level event labeling. To demonstrate the effectiveness of Doc2EDAG, we build a large-scale real-world dataset consisting of Chinese financial announcements with the challenges mentioned above. Extensive experiments with comprehensive analyses illustrate the superiority of Doc2EDAG over state-of-the-art methods. Data and codes can be found at //github.com/dolphin-zs/Doc2EDAG.