With the development of the Internet of Things (IoT), the power and data demands of IoT devices are increasing drastically. To resolve this supply-demand contradiction, simultaneous wireless information and power transfer (SWIPT) has been envisioned as an enabling technology that provides high-power energy transfer and high-rate data delivery concurrently. In this paper, we introduce a high-efficiency resonant beam (RB) charging and communication scheme. The scheme uses semiconductor material as the gain medium, which has better energy absorption capacity than the traditional solid-state one. Moreover, to match the gain medium size and reduce transmission loss, the scheme adopts a telescope internal modulator (TIM), which concentrates the beam. To evaluate the SWIPT performance of the scheme, we establish an analytical model and study the factors influencing its beam transmission, energy conversion, output power, and spectral efficiency. Numerical results show that the proposed RB system can deliver 16 W of electric power at 11% end-to-end conversion efficiency and support a spectral efficiency of 18 bit/s/Hz for communication.
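As an illustrative sanity check (a generic decomposition, not the paper's exact analytical model), the end-to-end efficiency can be read as a product of stage efficiencies, and the reported spectral efficiency maps to a receiver SNR via the Shannon bound:

\[
\eta_{\mathrm{e2e}} = \eta_{\mathrm{pump}} \cdot \eta_{\mathrm{beam}} \cdot \eta_{\mathrm{pv}}, \qquad \eta_{\mathrm{SE}} = \log_2(1 + \mathrm{SNR}) \ \text{bit/s/Hz},
\]

so 18 bit/s/Hz corresponds to an SNR of roughly $2^{18}-1$, i.e., about 54 dB at the communication receiver.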
Current research is primarily dedicated to improving the accuracy of camera-only 3D object detectors (apprentices) through knowledge transferred from LiDAR- or multi-modal-based counterparts (experts). However, the domain gap between LiDAR and camera features, coupled with the inherent incompatibility in temporal fusion, significantly hinders the effectiveness of distillation-based enhancements for apprentices. Motivated by the success of uni-modal distillation, we argue that an apprentice-friendly expert model should predominantly rely on camera features while still achieving performance comparable to multi-modal models. To this end, we introduce VCD, a framework for improving the camera-only apprentice model that comprises an apprentice-friendly multi-modal expert and temporal-fusion-friendly distillation supervision. The multi-modal expert VCD-E adopts the same structure as the camera-only apprentice to alleviate the feature disparity, and leverages LiDAR input as a depth prior to reconstruct the 3D scene, achieving performance on par with other heterogeneous multi-modal experts. Additionally, a fine-grained trajectory-based distillation module individually rectifies the motion misalignment for each object in the scene. With these improvements, our camera-only apprentice VCD-A sets a new state of the art on nuScenes with a score of 63.1% NDS.
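To make the trajectory-based distillation idea concrete, here is a minimal PyTorch sketch (our own illustration, not the released VCD code): apprentice BEV features are pulled toward the frozen expert's features at each object's per-frame center, so supervision follows the object's motion rather than a fixed location. The `(T, N, 2)` layout of `centers` and the bilinear sampling are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def trajectory_distill_loss(expert_feats, apprentice_feats, centers):
    """expert_feats, apprentice_feats: (T, C, H, W) BEV feature maps.
    centers: (T, N, 2) per-frame object centers in [-1, 1] grid coords."""
    T = expert_feats.shape[0]
    loss = expert_feats.new_zeros(())
    for t in range(T):
        # grid_sample expects (B, H_out, W_out, 2); treat N objects as a 1xN grid
        grid = centers[t].view(1, 1, -1, 2)
        e = F.grid_sample(expert_feats[t:t + 1], grid, align_corners=False)
        a = F.grid_sample(apprentice_feats[t:t + 1], grid, align_corners=False)
        loss = loss + F.mse_loss(a, e.detach())  # expert side stays frozen
    return loss / T
```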
The recently proposed generative flow networks (GFlowNets) are a method of training a policy to sample compositional discrete objects, via a sequence of actions, with probabilities proportional to a given reward. GFlowNets exploit the sequential nature of the problem, drawing parallels with reinforcement learning (RL). Our work extends the connection between RL and GFlowNets to the general case. We demonstrate how the task of learning a generative flow network can be efficiently recast as an entropy-regularized RL problem with a specific reward and regularizer structure. Furthermore, we illustrate the practical efficiency of this reformulation by applying standard soft RL algorithms to GFlowNet training across several probabilistic modeling tasks. Contrary to previously reported results, we show that entropic RL approaches can be competitive with established GFlowNet training methods. This perspective opens a direct path for integrating reinforcement-learning principles into the realm of generative flow networks.
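One way to write the correspondence (a sketch consistent with the abstract; the precise reward and regularizer structure is the paper's contribution) is as an entropy-regularized objective over trajectories $\tau = (s_0 \to \dots \to s_T)$ ending in a terminal object $x$:

\[
\max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{T-1} r(s_t \to s_{t+1}) + \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\Big], \qquad r(s_t \to s_{t+1}) = \log P_B(s_t \mid s_{t+1}) + \mathbb{1}\{t = T-1\}\,\log R(x),
\]

where $P_B$ is a fixed backward policy. The soft-optimal policy of such an MDP samples terminal objects with probability proportional to $R(x)$, which is exactly the GFlowNet sampling goal.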
Multimodal models trained on complete-modality data often exhibit a substantial performance drop when faced with imperfect data containing corruptions or missing modalities. To address this robustness challenge, prior methods have explored approaches based on augmentation, consistency, or uncertainty, but these come with drawbacks related to data complexity, representation, and learning that can diminish their overall effectiveness. In response, this study introduces a novel approach, Redundancy-Adaptive Multimodal Learning (RAML). RAML efficiently harnesses information redundancy across multiple modalities to combat the issues posed by imperfect data while remaining compatible with complete-modality data. Specifically, RAML achieves redundancy-lossless information extraction through separate unimodal discriminative tasks and enforces a proper norm constraint on each unimodal feature representation. Furthermore, RAML explicitly enhances multimodal fusion by leveraging fine-grained redundancy among unimodal features to learn correspondences between corrupted and untainted information. Extensive experiments on various benchmark datasets under diverse conditions consistently demonstrate that RAML outperforms state-of-the-art methods by a significant margin.
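As a minimal illustration of the per-modality norm constraint (our own sketch; the paper's exact constraint and weighting may differ), each unimodal embedding is trained discriminatively while its L2 norm is kept near a shared target, so no single modality's magnitude dominates fusion:

```python
import torch.nn.functional as F

def unimodal_losses(feats, heads, labels, target_norm=1.0, lam=0.1):
    """feats: dict of per-modality embeddings (B, D); heads: per-modality
    linear classifiers; lam weights the (assumed) norm penalty."""
    loss = 0.0
    for m, f in feats.items():
        loss = loss + F.cross_entropy(heads[m](f), labels)       # unimodal task
        norms = f.norm(dim=-1)                                   # (B,)
        loss = loss + lam * ((norms - target_norm) ** 2).mean()  # norm constraint
    return loss
```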
Neural network verification mainly focuses on local robustness properties, which can be checked by bounding the image (set of outputs) of a given input set. However, it is often important to know whether a given property holds globally over the input domain and, if not, for what proportion of the input it holds. Analyzing such properties requires computing preimage abstractions of neural networks. In this work, we propose an efficient anytime algorithm for generating symbolic under-approximations of the preimage of any polyhedral output set for neural networks. Our algorithm combines a novel technique for cheaply computing polytope preimage under-approximations using linear relaxation with a carefully designed refinement procedure that iteratively partitions the input region into subregions, using input and ReLU splitting, to improve the approximation. Empirically, we validate the efficacy of our method across a range of domains, including a high-dimensional MNIST classification task beyond the reach of existing preimage computation methods. Finally, as use cases, we showcase applications to quantitative verification and robustness analysis. For the former, we present a sound and complete algorithm that exploits our disjoint-union-of-polytopes representation to provide formal guarantees. For the latter, we find that our method can provide useful quantitative information even when standard verifiers cannot verify a robustness property.
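To sketch the core linear-relaxation step (an illustration under assumed CROWN-style bounds, not the paper's full algorithm): given element-wise linear bounds $W_L x + b_L \le f(x) \le W_U x + b_U$ over an input box, each output constraint can be soundly tightened into a linear constraint on the input, yielding a polytope guaranteed to lie inside the preimage.

```python
import numpy as np

def preimage_underapprox(A, b, WL, bL, WU, bU):
    """Given output constraints A f(x) <= b and element-wise linear bounds
    WL x + bL <= f(x) <= WU x + bU valid on an input box, return (C, d)
    such that every box point with C x <= d satisfies A f(x) <= b."""
    Ap, Am = np.maximum(A, 0), np.minimum(A, 0)
    # sound upper bound on A f(x):  Ap (WU x + bU) + Am (WL x + bL)
    C = Ap @ WU + Am @ WL
    d = b - Ap @ bU - Am @ bL
    return C, d  # intersect with the input box for the under-approximation
```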
With the advancement of data-driven techniques, addressing continuous control challenges has become more efficient. However, the reliance of these methods on historical data introduces the potential for unexpected decisions in novel scenarios. To enhance performance in autonomous driving and collision avoidance, we propose a symbiotic fusion of policy gradient methods with safety-based control. In this study, we employ the Deep Deterministic Policy Gradient (DDPG) algorithm to enable autonomous driving in the absence of surrounding vehicles. By training the vehicle's driving policy within a stable and familiar environment, a robust and efficient learning process is achieved. Subsequently, an artificial potential field approach is used to formulate a collision avoidance algorithm that accounts for the presence of surrounding vehicles. Furthermore, careful consideration is given to path-tracking methods. The combination of these approaches performs strongly across diverse scenarios, underscoring its potential for advancing autonomous driving while upholding safety standards.
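As a concrete reference for the safety layer, here is a textbook artificial-potential-field term (Khatib's repulsive potential), offered as an assumption about the general approach rather than this paper's exact formulation:

```python
import numpy as np

def apf_repulsive_force(ego, obstacle, d0=10.0, k=1.0):
    """Repulsion that grows as the ego vehicle approaches an obstacle
    and vanishes beyond the influence range d0."""
    diff = ego - obstacle
    d = np.linalg.norm(diff)
    if d >= d0 or d == 0.0:
        return np.zeros_like(diff)
    # negative gradient of the classic potential 0.5 * k * (1/d - 1/d0)^2
    return k * (1.0 / d - 1.0 / d0) * (1.0 / d**2) * (diff / d)
```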
Cross-Domain Recommendation (CDR) is a pivotal technology for addressing data sparsity and cold start by transferring general knowledge from a source to a target domain. However, existing CDR models suffer from limited adaptability across scenarios due to their inherent complexity. To tackle this challenge, recent advancements introduce universal CDR models that leverage shared embeddings to capture general knowledge across domains and transfer it through "Multi-task Learning" or "Pre-train, Fine-tune" paradigms. However, these models often overlook the broader structural topology that spans domains and fail to align training objectives, potentially leading to negative transfer. To address these issues, we propose a motif-based prompt learning framework, MOP, which introduces motif-based shared embeddings to encapsulate generalized domain knowledge, catering to both intra-domain and inter-domain CDR tasks. Specifically, we devise three typical motifs: butterfly, triangle, and random walk, and encode them through a Motif-based Encoder to obtain motif-based shared embeddings. Moreover, we train MOP under the "Pre-training \& Prompt Tuning" paradigm. By unifying pre-training and recommendation tasks as a common motif-based similarity learning task and integrating adaptable prompt parameters to guide the model in downstream recommendation tasks, MOP transfers domain knowledge effectively. Experimental results on four distinct CDR tasks demonstrate that MOP outperforms state-of-the-art models.
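To illustrate what a motif-based structural signal looks like (our own sketch; MOP's Motif-based Encoder may compute these differently), a triangle-motif adjacency can be built by counting, for each existing edge, the common neighbors of its endpoints; butterfly and random-walk motifs admit analogous constructions:

```python
import numpy as np

def triangle_motif_adjacency(A):
    """For an undirected 0/1 adjacency matrix A, entry (i, j) counts the
    triangles that edge (i, j) participates in: (A @ A)[i, j] counts
    length-2 paths i -> j, masked to pairs that are actually connected."""
    return (A @ A) * A
```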
Personalized recommender systems play a crucial role in capturing users' evolving preferences over time to provide accurate and effective recommendations on various online platforms. However, many recommendation models rely on a single type of behavior learning, which limits their ability to represent the complex relationships between users and items in real-life scenarios. In such situations, users interact with items in multiple ways, including clicking, tagging as favorite, reviewing, and purchasing. To address this issue, we propose the Relation-aware Contrastive Learning (RCL) framework, which effectively models dynamic interaction heterogeneity. The RCL model incorporates a multi-relational graph encoder that captures short-term preference heterogeneity while preserving the dedicated relation semantics for different types of user-item interactions. Moreover, we design a dynamic cross-relational memory network that enables the RCL model to capture users' long-term multi-behavior preferences and the underlying evolving cross-type behavior dependencies over time. To obtain robust and informative user representations with both commonality and diversity across multi-behavior interactions, we introduce a multi-relational contrastive learning paradigm with heterogeneous short- and long-term interest modeling. Our extensive experimental studies on several real-world datasets demonstrate the superiority of the RCL recommender system over various state-of-the-art baselines in terms of recommendation accuracy and effectiveness.
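As a minimal sketch of the contrastive component (a standard in-batch InfoNCE objective, used here as an assumed stand-in for RCL's exact paradigm), each user's short-term representation is pulled toward the same user's long-term representation and pushed away from those of other users:

```python
import torch
import torch.nn.functional as F

def infonce(short_term, long_term, temperature=0.2):
    """short_term, long_term: (B, D) user representations from the
    multi-relational encoder and the memory network, respectively."""
    s = F.normalize(short_term, dim=-1)
    l = F.normalize(long_term, dim=-1)
    logits = s @ l.t() / temperature                  # (B, B) similarities
    labels = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, labels)            # diagonal = positives
```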
Federated Learning (FL) is a decentralized machine-learning paradigm in which a global server iteratively averages the model parameters of local users without accessing their data. User heterogeneity imposes significant challenges on FL and can yield drifted global models that are slow to converge. Knowledge distillation has recently emerged to tackle this issue by refining the server model using aggregated knowledge from heterogeneous users, rather than directly averaging their model parameters. This approach, however, depends on a proxy dataset, making it impractical unless that prerequisite is satisfied. Moreover, the ensemble knowledge is not fully utilized to guide local model learning, which may in turn affect the quality of the aggregated model. Inspired by prior art, we propose a data-free knowledge distillation approach to heterogeneous FL, in which the server learns a lightweight generator to ensemble user information in a data-free manner; the generator is then broadcast to users, regulating local training by using the learned knowledge as an inductive bias. Empirical studies, supported by theoretical analysis, show that our approach enables FL with better generalization performance using fewer communication rounds than the state of the art.
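A minimal sketch of the server-side step, in the spirit of data-free generator ensembling (the names `generator` and `user_heads` are assumptions of this illustration, not the paper's released code): the generator learns label-conditioned latent features on which the ensemble of users' classifier heads agrees, and is then broadcast to regularize local training.

```python
import torch
import torch.nn.functional as F

def generator_step(generator, user_heads, labels, optimizer):
    """generator: maps labels (sampling noise internally) to latent features;
    user_heads: the users' uploaded classifier heads, kept frozen here."""
    z = generator(labels)                                # label-conditioned latents
    ensemble_logits = torch.stack(
        [head(z) for head in user_heads]).mean(0)        # average user predictions
    loss = F.cross_entropy(ensemble_logits, labels)      # ensemble must agree
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```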
The vast amounts of data generated by networks of sensors, wearables, and Internet of Things (IoT) devices underscore the need for advanced modeling techniques that leverage the spatio-temporal structure of decentralized data, given the need for edge computation and licensing (data access) constraints. While federated learning (FL) has emerged as a framework for model training without direct data sharing and exchange, effectively modeling complex spatio-temporal dependencies to improve forecasting remains an open problem. On the other hand, state-of-the-art spatio-temporal forecasting models assume unfettered access to the data, neglecting constraints on data sharing. To bridge this gap, we propose a federated spatio-temporal model -- Cross-Node Federated Graph Neural Network (CNFGNN) -- which explicitly encodes the underlying graph structure using a graph neural network (GNN)-based architecture under the constraint of cross-node federated learning: data in a network of nodes is generated locally on each node and remains decentralized. CNFGNN operates by disentangling temporal dynamics modeling on the devices from spatial dynamics modeling on the server, using alternating optimization to reduce communication cost and facilitate computation on edge devices. Experiments on the traffic flow forecasting task show that CNFGNN achieves the best forecasting performance in both transductive and inductive learning settings with no extra computation cost on edge devices, while incurring a modest communication cost.
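A minimal sketch of the device/server split described above (an illustration only; the module names and the concatenated decoder input are assumptions, not CNFGNN's released implementation):

```python
import torch

def cnfgnn_round(node_encoders, node_decoders, server_gnn, edge_index, node_series):
    """One forward pass: temporal encoding stays on each node, spatial
    message passing happens on the server."""
    # 1) on-device: each node encodes its own time series (raw data never leaves)
    h = torch.stack([enc(x) for enc, x in zip(node_encoders, node_series)])
    # 2) on server: a GNN propagates the uploaded hidden states along the graph
    h_spatial = server_gnn(h, edge_index)
    # 3) on-device: each node forecasts from its local and server-side state
    return [dec(torch.cat([hi, si], dim=-1))
            for dec, hi, si in zip(node_decoders, h, h_spatial)]
```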
We propose a novel single-shot object detection network named Detection with Enriched Semantics (DES). Our motivation is to enrich the semantics of object detection features within a typical deep detector via a semantic segmentation branch and a global activation module. The segmentation branch is supervised by weak segmentation ground truth, i.e., no extra annotation is required. In conjunction with that, we employ a global activation module which learns the relationship between channels and object classes in a self-supervised manner. Comprehensive experimental results on both the PASCAL VOC and MS COCO detection datasets demonstrate the effectiveness of the proposed method. In particular, with a VGG16-based DES, we achieve an mAP of 81.7 on VOC2007 test and an mAP of 32.8 on COCO test-dev, with an inference speed of 31.5 milliseconds per image on a Titan Xp GPU. With a lower-resolution version, we achieve an mAP of 79.7 on VOC2007 with an inference speed of 13.0 milliseconds per image.
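To make the channel-reweighting idea concrete, here is a squeeze-and-excitation-style sketch (an assumed analogue; DES's actual global activation module may differ in its supervision and details):

```python
import torch.nn as nn

class GlobalActivation(nn.Module):
    """Learns channel-wise weights from globally pooled features and
    rescales the feature map, amplifying class-relevant channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                      # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))        # global average pool -> (B, C)
        return x * w.view(*w.shape, 1, 1)      # channel-wise reweighting
```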