We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets) to enhance the optimality of attention-based sequence generation, focusing on long-term outcomes. This integration proves particularly effective in solving combinatorial optimization (CO) tasks, especially the Travelling Salesman Problem (TSP), which is the focus of our study. We address this challenge by defining a Markov Decision Process (MDP) compatible with PQN, which involves iterative graph embedding followed by encoding and decoding with an LSTM-based recurrent neural network. This process generates a context vector and computes raw attention scores, which are dynamically adjusted by Q-values calculated for all available state-action pairs before applying softmax. The resulting attention vector is used as an action distribution, with actions selected according to PQN's adaptive exploration-exploitation dynamics. Our empirical results demonstrate the efficacy of this approach, and we additionally evaluate the model in unstable environments.
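To make the scoring step concrete, the following minimal sketch (in PyTorch) shows raw pointer-attention scores being adjusted by Q-values before the softmax that yields the action distribution; the tensor shapes, projection matrices, and the additive way Q-values shift the scores are illustrative assumptions rather than the exact formulation used in this work.

```python
import torch
import torch.nn.functional as F

def q_adjusted_pointer_distribution(context, encoder_states, q_values, w_ref, w_ctx, v):
    """Combine pointer-attention scores with Q-values before softmax.

    context:        (batch, hidden)          decoder context vector
    encoder_states: (batch, n_nodes, hidden) encoder output for each node
    q_values:       (batch, n_nodes)         Q(s, a) for every candidate action
    w_ref, w_ctx:   (hidden, hidden)         attention projection matrices
    v:              (hidden,)                attention scoring vector
    """
    # Raw additive (Bahdanau-style) attention scores over candidate nodes.
    ref = encoder_states @ w_ref                     # (batch, n_nodes, hidden)
    ctx = (context @ w_ctx).unsqueeze(1)             # (batch, 1, hidden)
    raw_scores = torch.tanh(ref + ctx) @ v           # (batch, n_nodes)

    # Shift the raw scores by the state-action values, then normalize.
    adjusted = raw_scores + q_values                 # illustrative combination
    return F.softmax(adjusted, dim=-1)               # action distribution

# Usage: sample the next node of a TSP tour from the adjusted distribution.
batch, n_nodes, hidden = 2, 5, 16
dist = q_adjusted_pointer_distribution(
    torch.randn(batch, hidden), torch.randn(batch, n_nodes, hidden),
    torch.randn(batch, n_nodes),
    torch.randn(hidden, hidden), torch.randn(hidden, hidden), torch.randn(hidden))
action = torch.multinomial(dist, num_samples=1)      # stochastic action selection
```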
We introduce DEIM, an innovative and efficient training framework designed to accelerate convergence in real-time object detection with Transformer-based architectures (DETR). To mitigate the sparse supervision inherent in one-to-one (O2O) matching in DETR models, DEIM employs a Dense O2O matching strategy. This approach increases the number of positive samples per image by incorporating additional targets, using standard data augmentation techniques. While Dense O2O matching speeds up convergence, it also introduces numerous low-quality matches that could affect performance. To address this, we propose the Matchability-Aware Loss (MAL), a novel loss function that optimizes matches across various quality levels, enhancing the effectiveness of Dense O2O. Extensive experiments on the COCO dataset validate the efficacy of DEIM. When integrated with RT-DETR and D-FINE, it consistently boosts performance while reducing training time by 50%. Notably, paired with RT-DETRv2, DEIM achieves 53.2% AP in a single day of training on an NVIDIA 4090 GPU. Additionally, DEIM-trained real-time models outperform leading real-time object detectors, with DEIM-D-FINE-L and DEIM-D-FINE-X achieving 54.7% and 56.5% AP at 124 and 78 FPS on an NVIDIA T4 GPU, respectively, without the need for additional data. We believe DEIM sets a new baseline for advancements in real-time object detection. Our code and pre-trained models are available at https://github.com/ShihuaHuang95/DEIM.
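As a rough illustration of what a quality-aware loss over dense matches can look like, the sketch below (PyTorch) weights a binary cross-entropy term by a per-match quality signal such as the IoU with the assigned target; this is a generic quality-weighted loss for illustration only and not the exact MAL formulation used in this work.

```python
import torch
import torch.nn.functional as F

def quality_aware_loss(logits, quality, gamma=1.5):
    """Illustrative quality-aware classification loss for dense matches.

    logits:  (N,) raw classification scores for matched predictions
    quality: (N,) match quality in [0, 1] (e.g., IoU with the assigned target);
             low-quality matches receive softer targets and smaller weights.
    """
    prob = logits.sigmoid()
    target = quality                                     # soft target tied to match quality
    weight = quality.pow(gamma) + (1 - quality) * prob.detach().pow(gamma)
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    return (weight * bce).mean()

# Usage with dummy matches: high-quality matches dominate the gradient.
logits = torch.randn(8, requires_grad=True)
quality = torch.rand(8)
quality_aware_loss(logits, quality).backward()
```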
Intelligent reflecting surface (IRS)-assisted mobile edge computing (MEC) systems have shown notable performance improvements, such as reduced latency, higher data rates, and better energy efficiency. However, resource competition among users can lead to uneven allocation, increased latency, and lower throughput. Fortunately, the rate-splitting multiple access (RSMA) technique has emerged as a promising solution for managing interference and optimizing resource allocation in MEC systems. This paper studies an IRS-assisted MEC system with RSMA, aiming to jointly optimize the passive beamforming of the IRS, the active beamforming of the base station, the task offloading allocation, the transmit power of users, the ratios of public and private information allocation, and the decoding order of the RSMA to minimize the average delay from a novel uplink transmission perspective. Since the formulated problem is non-convex and the optimization variables are highly coupled, we propose a hierarchical deep reinforcement learning-based algorithm to optimize both the continuous and discrete variables of the problem. Additionally, to better extract channel features, we design a novel network architecture within the policy and evaluation networks of the proposed algorithm, combining convolutional neural networks and densely connected convolutional networks for feature extraction. Simulation results indicate that the proposed algorithm not only exhibits excellent convergence performance but also outperforms various benchmarks.
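The channel-feature extractor described above could look roughly like the following PyTorch sketch; the layer widths, the depth of the dense block, and the real/imaginary channel layout of the input are illustrative assumptions rather than the exact architecture used in this work.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Densely connected convolutional block: each layer consumes the
    concatenation of all preceding feature maps."""
    def __init__(self, in_ch, growth=16, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, kernel_size=3, padding=1),
                nn.BatchNorm2d(growth), nn.ReLU(inplace=True)))
            ch += growth
        self.out_channels = ch

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

class ChannelFeatureExtractor(nn.Module):
    """Plain convolutions followed by a dense block, feeding the policy/evaluation heads."""
    def __init__(self, in_ch=2, feat_dim=128):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True))
        self.dense = DenseBlock(32)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(self.dense.out_channels, feat_dim))

    def forward(self, csi):
        # csi: (batch, 2, n_antennas, n_users) real/imaginary parts of the channel matrix
        return self.head(self.dense(self.stem(csi)))

# Usage: embed a batch of channel matrices into fixed-size state features.
features = ChannelFeatureExtractor()(torch.randn(4, 2, 16, 8))   # (4, 128)
```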
Human-AI cooperative classification (HAI-CC) approaches aim to develop hybrid intelligent systems that enhance decision-making in various high-stakes real-world scenarios by leveraging both human expertise and AI capabilities. Current HAI-CC methods primarily focus on learning-to-defer (L2D), where decisions are deferred to human experts, and learning-to-complement (L2C), where AI and human experts make predictions cooperatively. However, a notable research gap remains in effectively exploring both L2D and L2C under diverse expert knowledge to improve decision-making, particularly when constrained by the cooperation cost required to achieve a target probability for AI-only selection (i.e., coverage). In this paper, we address this research gap by proposing the Coverage-constrained Learning to Defer and Complement with Specific Experts (CL2DC) method. CL2DC makes final decisions through either AI prediction alone or by deferring to or complementing a specific expert, depending on the input data. Furthermore, we propose a coverage-constrained optimisation to control the cooperation cost, ensuring it approximates a target probability for AI-only selection. This approach enables an effective assessment of system performance within a specified budget. Also, CL2DC is designed to address scenarios where training sets contain multiple noisy-label annotations without any clean-label references. Comprehensive evaluations on both synthetic and real-world datasets demonstrate that CL2DC achieves superior performance compared to state-of-the-art HAI-CC methods.
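A minimal sketch of the coverage-constraint idea follows (an illustrative formulation in PyTorch, not the exact objective optimized in this work): the gating network's probability of choosing the AI-only branch is pushed toward the target coverage by a penalty added to the cooperative classification loss.

```python
import torch

def coverage_penalty(gate_logits, target_coverage=0.7):
    """Encourage the expected fraction of AI-only decisions to match a budget.

    gate_logits: (batch, 1 + n_experts) scores over {AI-only, defer/complement with expert k}.
    target_coverage: desired probability of selecting the AI-only branch.
    """
    probs = gate_logits.softmax(dim=-1)
    ai_only_rate = probs[:, 0].mean()        # empirical AI-only selection probability
    return (ai_only_rate - target_coverage).pow(2)

# Usage: total loss = cooperative classification loss + lambda * coverage penalty.
gate_logits = torch.randn(32, 4, requires_grad=True)   # AI-only plus three specific experts
coverage_penalty(gate_logits, target_coverage=0.7).backward()
```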
Building LiDAR generative models holds promise as powerful data priors for restoration, scene manipulation, and scalable simulation in autonomous mobile robots. In recent years, approaches using diffusion models have emerged, significantly improving training stability and generation quality. Despite the success of diffusion models, generating high-quality samples requires numerous iterations of running neural networks, and the increasing computational cost can pose a barrier to robotics applications. To address this challenge, this paper presents R2Flow, a fast and high-fidelity generative model for LiDAR data. Our method is based on rectified flows, which learn straight trajectories and simulate data generation with far fewer sampling steps than diffusion models. We also propose an efficient Transformer-based model architecture for processing the image representation of LiDAR range and reflectance measurements. Our experiments on unconditional generation on the KITTI-360 dataset demonstrate the effectiveness of our approach in terms of both efficiency and quality.
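The few-step sampling behind rectified flows can be sketched as follows (PyTorch; the velocity network, the two-channel range/reflectance image shape, and the step count are placeholders, not the exact configuration used here): starting from Gaussian noise, a learned velocity field is integrated along (ideally straight) trajectories with a handful of Euler steps.

```python
import torch

@torch.no_grad()
def rectified_flow_sample(velocity_model, shape, num_steps=4, device="cpu"):
    """Euler integration of a learned velocity field v(x_t, t) from t=0 (noise) to t=1 (data).

    With straight (rectified) trajectories, a few coarse steps already approximate the target.
    """
    x = torch.randn(shape, device=device)      # start from Gaussian noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((shape[0],), i * dt, device=device)
        x = x + velocity_model(x, t) * dt      # one Euler step along the flow
    return x

# Usage with a dummy velocity model standing in for the trained Transformer-based network;
# the (2, 64, 1024) range/reflectance image layout is only an assumption.
dummy_velocity = lambda x, t: -x
samples = rectified_flow_sample(dummy_velocity, shape=(2, 2, 64, 1024), num_steps=4)
```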
We propose to integrate a long-distance Long Range (LoRa) communication solution for sending data from IoT devices to the edge computing system, taking advantage of its unlicensed nature and the potential for open-source implementations that are common in edge computing. We propose a channel hopping optimization model and apply a TinyML-based channel hopping model for LoRa transmissions, and we experimentally study a fast predictive algorithm to find free channels between edge and IoT devices. In an open-source experimental setup that includes LoRa, TinyML, and the IoT-edge-cloud continuum, we integrate a novel application workflow and cloud-friendly protocol solutions in a case study of a plant recommender application that combines concepts of microfarming and urban computing. In a LoRa-optimized edge computing setup, we engineer the application workflow and apply collaborative filtering and various machine learning algorithms to the collected application data to identify and recommend the planting schedule for a specific microfarm in an urban area. In the LoRa experiments, we measure packet loss, RSSI, and SNR, using a random channel hopping scheme as a baseline for our proposed TinyML method. The results show that it is feasible to use TinyML on microcontrollers for channel hopping and demonstrate the effectiveness of TinyML in learning to predict the best channel for LoRa transmission, improving RSSI by up to 63% and SNR by up to 44% compared with a random hopping mechanism.
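To illustrate the kind of lightweight predictor that fits on a microcontroller, here is a minimal sketch (pure NumPy; the per-channel feature layout and the tiny network size are assumptions, not the exact model used in the experiments): a small scorer rates each LoRa channel from recent link statistics, and the transmitter hops to the highest-scoring one.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyChannelScorer:
    """A tiny one-hidden-layer scorer, small enough to deploy on a microcontroller
    (for example after training an equivalent model and exporting it with TinyML tooling)."""
    def __init__(self, n_features=4, hidden=8):
        self.w1 = rng.normal(0, 0.1, (n_features, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def score(self, features):
        # features: (n_channels, n_features) recent stats per channel,
        # e.g. [mean RSSI, mean SNR, packet-loss rate, time since last use]
        h = np.maximum(0.0, features @ self.w1 + self.b1)   # ReLU hidden layer
        return (h @ self.w2 + self.b2).ravel()              # one score per channel

def pick_channel(scorer, channel_stats):
    """Hop to the channel predicted to give the best link quality."""
    return int(np.argmax(scorer.score(channel_stats)))

# Usage: 8 candidate LoRa channels, each described by 4 recent link statistics.
stats = rng.normal(size=(8, 4))
best = pick_channel(TinyChannelScorer(), stats)
```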
This paper presents a unified framework for constructing Approximate Message Passing (AMP) algorithms for rotationally-invariant models. By employing a general iterative algorithm template and reducing it to long-memory Orthogonal AMP (OAMP), we systematically derive the correct Onsager terms of AMP algorithms. This approach allows us to rederive an AMP algorithm introduced by Fan and Opper et al., while shedding new light on the role of free cumulants of the spectral law. The free cumulants arise naturally from a recursive centering operation, potentially of independent interest beyond the scope of AMP. To illustrate the flexibility of our framework, we introduce two novel AMP variants and apply them to estimation in spiked models.
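For orientation, the classical AMP recursion for an i.i.d. Gaussian matrix $A \in \mathbb{R}^{m \times n}$ and denoisers $f_t$ is shown below; this is the standard textbook form, not the algorithm derived in this work. For rotationally-invariant models, the single-step Onsager correction is replaced by long-memory corrections whose coefficients involve the free cumulants of the spectral law, which is what the reduction to long-memory OAMP recovers.

```latex
\begin{aligned}
z^{t}   &= y - A x^{t}
           + \frac{n}{m}\, z^{t-1}\,
             \big\langle f_{t-1}'\big(A^{\top} z^{t-1} + x^{t-1}\big) \big\rangle, \\
x^{t+1} &= f_{t}\big(A^{\top} z^{t} + x^{t}\big),
\end{aligned}
```

where $\langle \cdot \rangle$ denotes the empirical average over the entries and the last term of the first line is the Onsager correction that keeps the effective noise approximately Gaussian.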
This letter presents a flexible rate-splitting multiple access (RSMA) framework for near-field (NF) integrated sensing and communications (ISAC). The spatial beams configured to meet the communication rate requirements of NF users are simultaneously leveraged to sense an additional NF target. A key innovation lies in its flexibility to select a subset of users for decoding the common stream, enhancing interference management and system performance. The system is designed by minimizing the Cram\'{e}r-Rao bound (CRB) for joint distance and angle estimation through optimized power allocation, common rate allocation, and user selection. This leads to a discrete, non-convex optimization problem. Remarkably, we demonstrate that the preconfigured beams are sufficient for target sensing, eliminating the need for additional probing signals. To solve the optimization problem, an iterative algorithm is proposed combining the quadratic transform and simulated annealing. Simulation results indicate that the proposed scheme significantly outperforms conventional RSMA and space division multiple access (SDMA), reducing distance and angle estimation errors by approximately 100\% and 20\%, respectively.
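The discrete user-selection subproblem can be attacked with a simulated-annealing loop like the sketch below (NumPy); the objective is a stub standing in for the CRB evaluated after the inner power and common-rate optimization, and the cooling schedule is an illustrative choice rather than the one used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def anneal_user_selection(crb_of, n_users, n_iters=500, t0=1.0, cooling=0.99):
    """Search over binary vectors s, where s[k] = 1 if user k decodes the common stream.

    crb_of(s) should return the CRB obtained after solving the inner continuous
    problem (power / common-rate allocation) for the selection s; here it is a stub.
    """
    s = rng.integers(0, 2, n_users)                 # random initial selection
    val = crb_of(s)
    best_s, best_val, temp = s.copy(), val, t0
    for _ in range(n_iters):
        cand = s.copy()
        cand[rng.integers(n_users)] ^= 1            # flip one user's membership
        cand_val = crb_of(cand)
        # Always accept improvements; accept worse moves with a temperature-dependent probability.
        if cand_val < val or rng.random() < np.exp((val - cand_val) / temp):
            s, val = cand, cand_val
            if val < best_val:
                best_s, best_val = s.copy(), val
        temp *= cooling                             # geometric cooling schedule
    return best_s, best_val

# Usage with a toy objective in place of the true CRB evaluation.
toy_crb = lambda s: 1.0 / (1.0 + s.sum()) + 0.05 * s.sum()
selection, crb = anneal_user_selection(toy_crb, n_users=6)
```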
Graph Neural Networks (GNNs) have received considerable attention for graph-structured data learning across a wide variety of tasks. The carefully designed propagation mechanism, which has been demonstrated to be effective, is the most fundamental part of GNNs. Although most GNNs follow a message-passing scheme, little effort has been made to discover and analyze their essential relations. In this paper, we establish a surprising connection between different propagation mechanisms and a unified optimization problem, showing that despite the proliferation of various GNNs, their propagation mechanisms are in fact the optimal solutions of an objective that optimizes a feature fitting function over a wide class of graph kernels with a graph regularization term. Our proposed unified optimization framework, summarizing the commonalities between several of the most representative GNNs, not only provides a macroscopic view for surveying the relations among different GNNs, but also opens up new opportunities for flexibly designing new GNNs. With the proposed framework, we discover that existing works usually utilize naive graph convolutional kernels for the feature fitting function, and we further develop two novel objective functions with adjustable graph kernels exhibiting low-pass or high-pass filtering capabilities, respectively. Moreover, we provide convergence proofs and expressive power comparisons for the proposed models. Extensive experiments on benchmark datasets clearly show that the proposed GNNs not only outperform state-of-the-art methods but also effectively alleviate over-smoothing, and further verify the feasibility of designing GNNs with our unified optimization framework.
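As a simple instance of such a unified objective (a generic illustration with a naive kernel, not the exact functional developed in this work), many propagation schemes can be read as minimizing a feature-fitting term plus a graph-Laplacian regularizer,

```latex
\min_{\mathbf{Z}} \;
\alpha \,\big\| \mathbf{Z} - \mathbf{H} \big\|_{F}^{2}
\;+\; (1-\alpha)\, \operatorname{tr}\!\big( \mathbf{Z}^{\top} \tilde{\mathbf{L}}\, \mathbf{Z} \big),
```

where $\mathbf{H}$ denotes the transformed input features and $\tilde{\mathbf{L}} = \mathbf{I} - \tilde{\mathbf{A}}$ the normalized graph Laplacian; iterating the fixed-point update $\mathbf{Z}^{(k+1)} = (1-\alpha)\tilde{\mathbf{A}}\mathbf{Z}^{(k)} + \alpha\mathbf{H}$ recovers personalized-PageRank-style propagation, and swapping the naive kernel $\tilde{\mathbf{A}}$ for adjustable kernels is the direction taken by the low-pass and high-pass objectives developed here.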
High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristics of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we propose an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we use a spectral-spatial generator and a discriminator to identify the land cover categories of hyperspectral cubes. Moreover, to take advantage of the large amount of unlabeled data, we adopt a conditional random field to refine the preliminary classification results generated by the GAN. Experimental results obtained on two commonly studied datasets demonstrate that the proposed framework achieves encouraging classification accuracy using only a small amount of labeled data for training.
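The refinement step can be sketched as a simple mean-field update of a Potts-model CRF on the image grid (NumPy; the 4-neighbourhood, the pairwise weight, and the iteration count are illustrative choices, and the conditional random field used in this work may rely on richer pairwise potentials).

```python
import numpy as np

def meanfield_refine(probs, pairwise_weight=0.5, n_iters=5):
    """Smooth per-pixel class probabilities with a 4-neighbour Potts-model mean-field update.

    probs: (H, W, C) preliminary class probabilities (e.g., the GAN-based classifier's output).
    Returns refined probabilities of the same shape.
    """
    log_unary = np.log(np.clip(probs, 1e-8, 1.0))
    q = probs.copy()
    for _ in range(n_iters):
        # Message from the 4-neighbourhood: sum of neighbours' current beliefs per class.
        msg = np.zeros_like(q)
        msg[1:, :] += q[:-1, :]
        msg[:-1, :] += q[1:, :]
        msg[:, 1:] += q[:, :-1]
        msg[:, :-1] += q[:, 1:]
        # The attractive Potts potential rewards agreeing with neighbouring beliefs.
        logits = log_unary + pairwise_weight * msg
        logits -= logits.max(axis=-1, keepdims=True)
        q = np.exp(logits)
        q /= q.sum(axis=-1, keepdims=True)
    return q

# Usage: refine noisy preliminary predictions for a 64x64 scene with 9 land-cover classes.
rng = np.random.default_rng(0)
prelim = rng.dirichlet(np.ones(9), size=(64, 64))
labels = meanfield_refine(prelim).argmax(axis=-1)
```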
In this paper, we propose joint learning of attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on either model exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that prediction errors do not propagate and thus degrade performance. Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by adapting the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.
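The beam-search-based prediction can be sketched as below; the step function (which would wrap the attention-LSTM decoder), the STOP symbol, and the beam width are placeholders, and the actual decoding procedure in this work may differ. At each step the beam keeps the most probable partial label sets, labels already emitted are masked out, and the final prediction is returned as an unordered set.

```python
import numpy as np

def beam_search_labels(step_fn, n_labels, beam_width=3, max_labels=5):
    """Beam decoding of a label sequence without a pre-defined label order.

    step_fn(prefix) -> (n_labels + 1,) log-probabilities over the next label or STOP,
    where `prefix` is the tuple of labels emitted so far and STOP has index n_labels.
    """
    stop_id = n_labels
    beams = [((), 0.0)]                                  # (label prefix, cumulative log-prob)
    finished = []
    for _ in range(max_labels):
        candidates = []
        for prefix, score in beams:
            logp = step_fn(prefix)
            for label in range(n_labels + 1):
                if label != stop_id and label in prefix:
                    continue                             # each label at most once
                if label == stop_id:
                    finished.append((prefix, score + logp[label]))
                else:
                    candidates.append((prefix + (label,), score + logp[label]))
        if not candidates:
            break
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    finished.extend(beams)
    best_prefix, _ = max(finished, key=lambda c: c[1])
    return set(best_prefix)                              # unordered multi-label prediction

# Usage with a dummy scorer standing in for the attention-LSTM decoder (10 labels + STOP).
rng = np.random.default_rng(0)
dummy_step = lambda prefix: np.log(rng.dirichlet(np.ones(11)))
predicted = beam_search_labels(dummy_step, n_labels=10)
```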