Change detection using earth observation data plays a vital role in quantifying the impact of disasters in affected areas. While data sources like Sentinel-2 provide rich optical information, they are often hindered by cloud cover, limiting their usage in disaster scenarios. However, leveraging pre-disaster optical data can offer valuable contextual information about the area such as landcover type, vegetation cover, soil types, enabling a better understanding of the disaster's impact. In this study, we develop a model to assess the contribution of pre-disaster Sentinel-2 data in change detection tasks, focusing on disaster-affected areas. The proposed Context-Aware Change Detection Network (CACDN) utilizes a combination of pre-disaster Sentinel-2 data, pre and post-disaster Sentinel-1 data and ancillary Digital Elevation Models (DEM) data. The model is validated on flood and landslide detection and evaluated using three metrics: Area Under the Precision-Recall Curve (AUPRC), Intersection over Union (IoU), and mean IoU. The preliminary results show significant improvement (4\%, AUPRC, 3-7\% IoU, 3-6\% mean IoU) in model's change detection capabilities when incorporated with pre-disaster optical data reflecting the effectiveness of using contextual information for accurate flood and landslide detection.
With a growing interest in outer space, space robots have become a focus of exploration. To coordinate them for unmanned space exploration, we propose to use the "mother-daughter structure". In this setup, the mother spacecraft orbits the planet, while daughter probes are distributed across the surface. The mother spacecraft senses the environment, computes control commands and distributes them to daughter probes to take actions. They synergistically form sensing-communication-computing-control ($\mathbf{SC^3}$) loops, which are indivisible. We thereby optimize the spacecraft-probe downlink within $\mathbf{SC^3}$ loops to minimize the sum linear quadratic regulator (LQR) cost. The optimization variables are block length and transmit power. On account of the cycle time constraint, the spacecraft-probe downlink operates in the finite block length (FBL) regime. To solve the nonlinear mixed-integer problem, we first identify the optimal block length and then transform the power allocation problem into a tractable convex one. Additionally, we derive the approximate closed-form solutions for the proposed scheme and also for the max-sum rate scheme and max-min rate scheme. On this basis, we reveal their different power allocation principles. Moreover, we find that for time-insensitive control tasks, the proposed scheme demonstrates equivalence to the max-min rate scheme. These findings are verified through simulations.
Transformer has recently gained considerable popularity in low-level vision tasks, including image super-resolution (SR). These networks utilize self-attention along different dimensions, spatial or channel, and achieve impressive performance. This inspires us to combine the two dimensions in Transformer for a more powerful representation capability. Based on the above idea, we propose a novel Transformer model, Dual Aggregation Transformer (DAT), for image SR. Our DAT aggregates features across spatial and channel dimensions, in the inter-block and intra-block dual manner. Specifically, we alternately apply spatial and channel self-attention in consecutive Transformer blocks. The alternate strategy enables DAT to capture the global context and realize inter-block feature aggregation. Furthermore, we propose the adaptive interaction module (AIM) and the spatial-gate feed-forward network (SGFN) to achieve intra-block feature aggregation. AIM complements two self-attention mechanisms from corresponding dimensions. Meanwhile, SGFN introduces additional non-linear spatial information in the feed-forward network. Extensive experiments show that our DAT surpasses current methods. Code and models are obtainable at //github.com/zhengchen1999/DAT.
In space-air-ground integrated networks (SAGIN), receivers experience diverse interference from both the satellite and terrestrial transmitters. The heterogeneous structure of SAGIN poses challenges for traditional interference management (IM) schemes to effectively mitigate interference. To address this, a novel UAV-RIS-aided IM scheme is proposed for SAGIN, where different types of channel state information (CSI) including no CSI, instantaneous CSI, and delayed CSI, are considered. According to the types of CSI, interference alignment, beamforming, and space-time precoding are designed at the satellite and terrestrial transmitter side, and meanwhile, the UAV-RIS is introduced for cooperating interference elimination process. Additionally, the degrees of freedom (DoF) obtained by the proposed IM scheme are discussed in depth when the number of antennas on the satellite side is insufficient. Simulation results show that the proposed IM scheme improves the system capacity in different CSI scenarios, and the performance is better than the existing IM benchmarks without UAV-RIS.
Recent advances in Scene Graph Generation (SGG) typically model the relationships among entities utilizing box-level features from pre-defined detectors. We argue that an overlooked problem in SGG is the coarse-grained interactions between boxes, which inadequately capture contextual semantics for relationship modeling, practically limiting the development of the field. In this paper, we take the initiative to explore and propose a generic paradigm termed Superpixel-based Interaction Learning (SIL) to remedy coarse-grained interactions at the box level. It allows us to model fine-grained interactions at the superpixel level in SGG. Specifically, (i) we treat a scene as a set of points and cluster them into superpixels representing sub-regions of the scene. (ii) We explore intra-entity and cross-entity interactions among the superpixels to enrich fine-grained interactions between entities at an earlier stage. Extensive experiments on two challenging benchmarks (Visual Genome and Open Image V6) prove that our SIL enables fine-grained interaction at the superpixel level above previous box-level methods, and significantly outperforms previous state-of-the-art methods across all metrics. More encouragingly, the proposed method can be applied to boost the performance of existing box-level approaches in a plug-and-play fashion. In particular, SIL brings an average improvement of 2.0% mR (even up to 3.4%) of baselines for the PredCls task on Visual Genome, which facilitates its integration into any existing box-level method.
Depth estimation aims to predict dense depth maps. In autonomous driving scenes, sparsity of annotations makes the task challenging. Supervised models produce concave objects due to insufficient structural information. They overfit to valid pixels and fail to restore spatial structures. Self-supervised methods are proposed for the problem. Their robustness is limited by pose estimation, leading to erroneous results in natural scenes. In this paper, we propose a supervised framework termed Diffusion-Augmented Depth Prediction (DADP). We leverage the structural characteristics of diffusion model to enforce depth structures of depth models in a plug-and-play manner. An object-guided integrality loss is also proposed to further enhance regional structure integrality by fetching objective information. We evaluate DADP on three driving benchmarks and achieve significant improvements in depth structures and robustness. Our work provides a new perspective on depth estimation with sparse annotations in autonomous driving scenes.
Wireless localization is essential for tracking objects in indoor environments. Internet of Things (IoT) enables localization through its diverse wireless communication protocols. In this paper, a hybrid section-based indoor localization method using a developed Radio Frequency Identification (RFID) tracking device and multiple IoT wireless technologies is proposed. In order to reduce the cost of the RFID tags, the tags are installed only on the borders of each section. The RFID tracking device identifies the section, and the proposed wireless hybrid method finds the location of the object inside the section. The proposed hybrid method is analytically driven by linear location estimates obtained from different IoT wireless technologies. The experimental results using developed RFID tracking device and RSSI-based localization for Bluetooth, WiFi and ZigBee technologies verifies the analytical results.
Human-centric perception plays a vital role in vision and graphics. But their data annotations are prohibitively expensive. Therefore, it is desirable to have a versatile pre-train model that serves as a foundation for data-efficient downstream tasks transfer. To this end, we propose the Human-Centric Multi-Modal Contrastive Learning framework HCMoCo that leverages the multi-modal nature of human data (e.g. RGB, depth, 2D keypoints) for effective representation learning. The objective comes with two main challenges: dense pre-train for multi-modality data, efficient usage of sparse human priors. To tackle the challenges, we design the novel Dense Intra-sample Contrastive Learning and Sparse Structure-aware Contrastive Learning targets by hierarchically learning a modal-invariant latent space featured with continuous and ordinal feature distribution and structure-aware semantic consistency. HCMoCo provides pre-train for different modalities by combining heterogeneous datasets, which allows efficient usage of existing task-specific human data. Extensive experiments on four downstream tasks of different modalities demonstrate the effectiveness of HCMoCo, especially under data-efficient settings (7.16% and 12% improvement on DensePose Estimation and Human Parsing). Moreover, we demonstrate the versatility of HCMoCo by exploring cross-modality supervision and missing-modality inference, validating its strong ability in cross-modal association and reasoning.
We study the problem of few-shot graph classification across domains with nonequivalent feature spaces by introducing three new cross-domain benchmarks constructed from publicly available datasets. We also propose an attention-based graph encoder that uses three congruent views of graphs, one contextual and two topological views, to learn representations of task-specific information for fast adaptation, and task-agnostic information for knowledge transfer. We run exhaustive experiments to evaluate the performance of contrastive and meta-learning strategies. We show that when coupled with metric-based meta-learning frameworks, the proposed encoder achieves the best average meta-test classification accuracy across all benchmarks. The source code and data will be released here: //github.com/kavehhassani/metagrl
Knowledge graph (KG) embeddings learn low-dimensional representations of entities and relations to predict missing facts. KGs often exhibit hierarchical and logical patterns which must be preserved in the embedding space. For hierarchical data, hyperbolic embedding methods have shown promise for high-fidelity and parsimonious representations. However, existing hyperbolic embedding methods do not account for the rich logical patterns in KGs. In this work, we introduce a class of hyperbolic KG embedding models that simultaneously capture hierarchical and logical patterns. Our approach combines hyperbolic reflections and rotations with attention to model complex relational patterns. Experimental results on standard KG benchmarks show that our method improves over previous Euclidean- and hyperbolic-based efforts by up to 6.1% in mean reciprocal rank (MRR) in low dimensions. Furthermore, we observe that different geometric transformations capture different types of relations while attention-based transformations generalize to multiple relations. In high dimensions, our approach yields new state-of-the-art MRRs of 49.6% on WN18RR and 57.7% on YAGO3-10.
Multivariate time series forecasting is extensively studied throughout the years with ubiquitous applications in areas such as finance, traffic, environment, etc. Still, concerns have been raised on traditional methods for incapable of modeling complex patterns or dependencies lying in real word data. To address such concerns, various deep learning models, mainly Recurrent Neural Network (RNN) based methods, are proposed. Nevertheless, capturing extremely long-term patterns while effectively incorporating information from other variables remains a challenge for time-series forecasting. Furthermore, lack-of-explainability remains one serious drawback for deep neural network models. Inspired by Memory Network proposed for solving the question-answering task, we propose a deep learning based model named Memory Time-series network (MTNet) for time series forecasting. MTNet consists of a large memory component, three separate encoders, and an autoregressive component to train jointly. Additionally, the attention mechanism designed enable MTNet to be highly interpretable. We can easily tell which part of the historic data is referenced the most.