日本三级网站在线播放_国产亚洲一区二区三区在线_国产一区二区三区无码在线观看_国产亚洲欧美精品久久久简单_国产在线不卡人成视频_国产精品香港三级在线电影_国产成人无码区免费A片

Change detection, an essential application for high-resolution remote sensing images, aims to monitor and analyze changes in the land surface over time. Due to the rapid increase in the quantity of high-resolution remote sensing data and the complexity of texture features, several quantitative deep learning-based methods have been proposed. These methods outperform traditional change detection methods by extracting deep features and combining spatial-temporal information. However, reasonable explanations for how deep features improve detection performance are still lacking. In our investigations, we found that modern Hopfield network layers significantly enhance semantic understanding. In this paper, we propose a Deep Supervision and FEature Retrieval network (Dsfer-Net) for bitemporal change detection. Specifically, the highly representative deep features of bitemporal images are jointly extracted through a fully convolutional Siamese network. Based on the sequential geographical information of the bitemporal images, we designed a feature retrieval module to extract difference features and leverage discriminative information in a deeply supervised manner. Additionally, we observed that the deeply supervised feature retrieval module provides explainable evidence of the semantic understanding of the proposed network in its deep layers. Finally, our end-to-end network establishes a novel framework by aggregating retrieved features and feature pairs from different layers. Experiments conducted on three public datasets (LEVIR-CD, WHU-CD, and CDD) confirm the superiority of the proposed Dsfer-Net over other state-of-the-art methods.

相關內容

Networking

關注 22

Networking：IFIP International Conferences on Networking。 Explanation：國際網絡(luo)會(hui)議。 Publisher：IFIP。 SIT：

控制器 · 相互獨立的 · 多樣性 · ONCE · Extensibility ·

2024 年 7 月 16 日

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

Zhouxia Wang,Ziyang Yuan,Xintao Wang,Tianshui Chen,Menghan Xia,Ping Luo,Ying Shan

from arxiv, SIGGRAPH 2024 Conference Proceedings

Motions in a video primarily consist of camera motion, induced by camera movement, and object motion, resulting from object movement. Accurate control of both camera and object motion is essential for video generation. However, existing works either mainly focus on one type of motion or do not clearly distinguish between the two, limiting their control capabilities and diversity. Therefore, this paper presents MotionCtrl, a unified and flexible motion controller for video generation designed to effectively and independently control camera and object motion. The architecture and training strategy of MotionCtrl are carefully devised, taking into account the inherent properties of camera motion, object motion, and imperfect training data. Compared to previous methods, MotionCtrl offers three main advantages: 1) It effectively and independently controls camera motion and object motion, enabling more fine-grained motion control and facilitating flexible and diverse combinations of both types of motion. 2) Its motion conditions are determined by camera poses and trajectories, which are appearance-free and minimally impact the appearance or shape of objects in generated videos. 3) It is a relatively generalizable model that can adapt to a wide array of camera poses and trajectories once trained. Extensive qualitative and quantitative experiments have been conducted to demonstrate the superiority of MotionCtrl over existing methods. Project Page: //wzhouxiff.github.io/projects/MotionCtrl/

規范化的 · Networking · 可約的 · Performer · 網絡容量 ·

2024 年 7 月 16 日

ParamNet: A Dynamic Parameter Network for Fast Multi-to-One Stain Normalization

Hongtao Kang,Die Luo,Li Chen,Junbo Hu,Tingwei Quan,Shaoqun Zeng,Shenghua Cheng,Xiuli Liu

In practice, digital pathology images are often affected by various factors, resulting in very large differences in color and brightness. Stain normalization can effectively reduce the differences in color and brightness of digital pathology images, thus improving the performance of computer-aided diagnostic systems. Conventional stain normalization methods rely on one or several reference images, but one or several images may not adequately represent the entire dataset. Although learning-based stain normalization methods are a general approach, they use complex deep networks, which not only greatly reduce computational efficiency, but also risk introducing artifacts. Some studies use specialized network structures to enhance computational efficiency and reliability, but these methods are difficult to apply to multi-to-one stain normalization due to insufficient network capacity. In this study, we introduced dynamic-parameter network and proposed a novel method for stain normalization, called ParamNet. ParamNet addresses the challenges of limited network capacity and computational efficiency by introducing dynamic parameters (weights and biases of convolutional layers) into the network design. By effectively leveraging these parameters, ParamNet achieves superior performance in stain normalization while maintaining computational efficiency. Results show ParamNet can normalize one whole slide image (WSI) of 100,000x100,000 within 25s. The code is available at: //github.com/khtao/ParamNet.

潛在 · Processing（編程語言） · CASE · 生成方法 · MoDELS ·

2024 年 7 月 16 日

Semi-Supervised Generative Models for Disease Trajectories: A Case Study on Systemic Sclerosis

Cécile Trottet,Manuel Schürch,Ahmed Allam,Imon Barua,Liubov Petelytska,Oliver Distler,Anna-Maria Hoffmann-Vold,Michael Krauthammer,the EUSTAR collaborators

from arxiv, Accepted at Machine Learning for Healthcare 2024. arXiv admin note: substantial text overlap with arXiv:2311.08149

We propose a deep generative approach using latent temporal processes for modeling and holistically analyzing complex disease trajectories, with a particular focus on Systemic Sclerosis (SSc). We aim to learn temporal latent representations of the underlying generative process that explain the observed patient disease trajectories in an interpretable and comprehensive way. To enhance the interpretability of these latent temporal processes, we develop a semi-supervised approach for disentangling the latent space using established medical knowledge. By combining the generative approach with medical definitions of different characteristics of SSc, we facilitate the discovery of new aspects of the disease. We show that the learned temporal latent processes can be utilized for further data analysis and clinical hypothesis testing, including finding similar patients and clustering SSc patient trajectories into novel sub-types. Moreover, our method enables personalized online monitoring and prediction of multivariate time series with uncertainty quantification.

Processing（編程語言） · 優化器 · Learning · Performer · 正則化項 ·

2024 年 7 月 15 日

Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture

Xuanchen Li,Yuhao Cheng,Xingyu Ren,Haozhe Jia,Di Xu,Wenhan Zhu,Yichao Yan

4D head capture aims to generate dynamic topological meshes and corresponding texture maps from videos, which is widely utilized in movies and games for its ability to simulate facial muscle movements and recover dynamic textures in pore-squeezing. The industry often adopts the method involving multi-view stereo and non-rigid alignment. However, this approach is prone to errors and heavily reliant on time-consuming manual processing by artists. To simplify this process, we propose Topo4D, a novel framework for automatic geometry and texture generation, which optimizes densely aligned 4D heads and 8K texture maps directly from calibrated multi-view time-series images. Specifically, we first represent the time-series faces as a set of dynamic 3D Gaussians with fixed topology in which the Gaussian centers are bound to the mesh vertices. Afterward, we perform alternative geometry and texture optimization frame-by-frame for high-quality geometry and texture learning while maintaining temporal topology stability. Finally, we can extract dynamic facial meshes in regular wiring arrangement and high-fidelity textures with pore-level details from the learned Gaussians. Extensive experiments show that our method achieves superior results than the current SOTA face reconstruction methods both in the quality of meshes and textures. Project page: //xuanchenli.github.io/Topo4D/.

知識 (knowledge) · 自動問答 · 圖 · 語言模型化 · 知識圖譜 ·

2024 年 7 月 15 日

Graphusion: Leveraging Large Language Models for Scientific Knowledge Graph Fusion and Construction in NLP Education

Rui Yang,Boming Yang,Sixun Ouyang,Tianwei She,Aosong Feng,Yuang Jiang,Freddy Lecue,Jinghui Lu,Irene Li

from arxiv, 24 pages, 11 figures, 13 tables. arXiv admin note: substantial text overlap with arXiv:2402.14293

Knowledge graphs (KGs) are crucial in the field of artificial intelligence and are widely applied in downstream tasks, such as enhancing Question Answering (QA) systems. The construction of KGs typically requires significant effort from domain experts. Recently, Large Language Models (LLMs) have been used for knowledge graph construction (KGC), however, most existing approaches focus on a local perspective, extracting knowledge triplets from individual sentences or documents. In this work, we introduce Graphusion, a zero-shot KGC framework from free text. The core fusion module provides a global view of triplets, incorporating entity merging, conflict resolution, and novel triplet discovery. We showcase how Graphusion could be applied to the natural language processing (NLP) domain and validate it in the educational scenario. Specifically, we introduce TutorQA, a new expert-verified benchmark for graph reasoning and QA, comprising six tasks and a total of 1,200 QA pairs. Our evaluation demonstrates that Graphusion surpasses supervised baselines by up to 10% in accuracy on link prediction. Additionally, it achieves average scores of 2.92 and 2.37 out of 3 in human evaluations for concept entity extraction and relation recognition, respectively.

Automator · Performer · 設計 · 優化器 · 變換 ·

2024 年 7 月 13 日

INSIGHT: Universal Neural Simulator for Analog Circuits Harnessing Autoregressive Transformers

Souradip Poddar,Youngmin Oh,Yao Lai,Hanqing Zhu,Bosun Hwang,David Z. Pan

Analog front-end design heavily relies on specialized human expertise and costly trial-and-error simulations, which motivated many prior works on analog design automation. However, efficient and effective exploration of the vast and complex design space remains constrained by the time-consuming nature of SPICE simulations, making effective design automation a challenging endeavor. In this paper, we introduce INSIGHT, a GPU-powered, technology-agnostic, effective universal neural simulator in the analog front-end design automation loop. INSIGHT accurately predicts the performance metrics of analog circuits across various technologies with just a few microseconds of inference time. Notably, its autoregressive capabilities enable INSIGHT to accurately predict simulation-costly critical transient specifications leveraging less expensive performance metric information. The low cost and high fidelity feature make INSIGHT a good substitute for standard simulators in analog front-end optimization frameworks. INSIGHT is compatible with any optimization framework, facilitating enhanced design space exploration for sample efficiency through sophisticated offline learning and adaptation techniques. Our experiments demonstrate that INSIGHT-M, a model-based batch reinforcement learning sizing framework with INSIGHT as the accurate surrogate, only requires < 20 real-time simulations with 100-1000x lower simulation costs and significant speedup over existing sizing methods.

估計/估計量 · MoDELS · 數據集 · state-of-the-art · LIDAR ·

2024 年 7 月 12 日

Open-Canopy: A Country-Scale Benchmark for Canopy Height Estimation at Very High Resolution

Fajwel Fogel,Yohann Perron,Nikola Besic,Laurent Saint-André,Agnès Pellissier-Tanon,Martin Schwartz,Thomas Boudras,Ibrahim Fayad,Alexandre d'Aspremont,Loic Landrieu,Phillipe Ciais

from arxiv, 22 pages, 8 figures, Submitted to NeurIPS 2024 Datasets and Benchmarks Track

Estimating canopy height and canopy height change at meter resolution from satellite imagery has numerous applications, such as monitoring forest health, logging activities, wood resources, and carbon stocks. However, many existing forest datasets are based on commercial or closed data sources, restricting the reproducibility and evaluation of new approaches. To address this gap, we introduce Open-Canopy, the first open-access and country-scale benchmark for very high resolution (1.5 m) canopy height estimation. Covering more than 87,000 km$^2$ across France, Open-Canopy combines SPOT satellite imagery with high resolution aerial LiDAR data. We also propose Open-Canopy-$\Delta$, the first benchmark for canopy height change detection between two images taken at different years, a particularly challenging task even for recent models. To establish a robust foundation for these benchmarks, we evaluate a comprehensive list of state-of-the-art computer vision models for canopy height estimation. The dataset and associated codes can be accessed at //github.com/fajwel/Open-Canopy.

代價 · Boosting（一種模型訓練加速方式） · 推斷 · 估計/估計量 · state-of-the-art ·

2024 年 7 月 12 日

ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion

Sungmin Woo,Wonjoon Lee,Woo Jin Kim,Dogyoon Lee,Sangyoun Lee

from arxiv, Accepted by ECCV 2024. Project Page: //sungmin-woo.github.io/prodepth/

Self-supervised multi-frame monocular depth estimation relies on the geometric consistency between successive frames under the assumption of a static scene. However, the presence of moving objects in dynamic scenes introduces inevitable inconsistencies, causing misaligned multi-frame feature matching and misleading self-supervision during training. In this paper, we propose a novel framework called ProDepth, which effectively addresses the mismatch problem caused by dynamic objects using a probabilistic approach. We initially deduce the uncertainty associated with static scene assumption by adopting an auxiliary decoder. This decoder analyzes inconsistencies embedded in the cost volume, inferring the probability of areas being dynamic. We then directly rectify the erroneous cost volume for dynamic areas through a Probabilistic Cost Volume Modulation (PCVM) module. Specifically, we derive probability distributions of depth candidates from both single-frame and multi-frame cues, modulating the cost volume by adaptively fusing those distributions based on the inferred uncertainty. Additionally, we present a self-supervision loss reweighting strategy that not only masks out incorrect supervision with high uncertainty but also mitigates the risks in remaining possible dynamic areas in accordance with the probability. Our proposed method excels over state-of-the-art approaches in all metrics on both Cityscapes and KITTI datasets, and demonstrates superior generalization ability on the Waymo Open dataset.

圖 · 3D · Extensibility · Performer · state-of-the-art ·

2024 年 7 月 12 日

SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs

Yang Miao,Francis Engelmann,Olga Vysotska,Federico Tombari,Marc Pollefeys,Dániel Béla Baráth

We introduce a novel problem, i.e., the localization of an input image within a multi-modal reference map represented by a database of 3D scene graphs. These graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, offering a lightweight and efficient alternative to conventional methods that rely on extensive image databases. Given the available modalities, the proposed method SceneGraphLoc learns a fixed-sized embedding for each node (i.e., representing an object instance) in the scene graph, enabling effective matching with the objects visible in the input query image. This strategy significantly outperforms other cross-modal methods, even without incorporating images into the map embeddings. When images are leveraged, SceneGraphLoc achieves performance close to that of state-of-the-art techniques depending on large image databases, while requiring three orders-of-magnitude less storage and operating orders-of-magnitude faster. The code will be made public.

Performer · contrastive · 模態 · 數據集 · 對比學習 ·

2024 年 7 月 12 日

Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework

Haoqin Sun,Shiwan Zhao,Shaokai Li,Xiangyu Kong,Xuechen Wang,Aobo Kong,Jiaming Zhou,Yong Chen,Wenjia Zeng,Yong Qin

Multimodal emotion recognition systems rely heavily on the full availability of modalities, suffering significant performance declines when modal data is incomplete. To tackle this issue, we present the Cross-Modal Alignment, Reconstruction, and Refinement (CM-ARR) framework, an innovative approach that sequentially engages in cross-modal alignment, reconstruction, and refinement phases to handle missing modalities and enhance emotion recognition. This framework utilizes unsupervised distribution-based contrastive learning to align heterogeneous modal distributions, reducing discrepancies and modeling semantic uncertainty effectively. The reconstruction phase applies normalizing flow models to transform these aligned distributions and recover missing modalities. The refinement phase employs supervised point-based contrastive learning to disrupt semantic correlations and accentuate emotional traits, thereby enriching the affective content of the reconstructed representations. Extensive experiments on the IEMOCAP and MSP-IMPROV datasets confirm the superior performance of CM-ARR under conditions of both missing and complete modalities. Notably, averaged across six scenarios of missing modalities, CM-ARR achieves absolute improvements of 2.11% in WAR and 2.12% in UAR on the IEMOCAP dataset, and 1.71% and 1.96% in WAR and UAR, respectively, on the MSP-IMPROV dataset.