Passive millimeter-wave (PMMW) imaging is a promising technique for human security screening. Several popular object detection networks have been applied to PMMW images. However, restricted by the low resolution and high noise of PMMW images, deep-learning-based PMMW hidden object detection usually suffers from low accuracy and low classification confidence. To tackle these problems, this paper proposes a Task-Aligned Detection Transformer network, named PMMW-DETR. In the first stage, a Denoising Coarse-to-Fine Transformer (DCFT) backbone is designed to extract long- and short-range features at different scales. In the second stage, we propose the Query Selection module to introduce learned spatial features into the network as prior knowledge, which enhances the semantic perception capability of the network. In the third stage, aiming to improve the classification performance, we employ a Task-Aligned Dual-Head block to decouple the classification and regression tasks. Experimental results on our self-developed PMMW security screening dataset, including comparisons with state-of-the-art (SOTA) methods and an ablation study, demonstrate that PMMW-DETR obtains higher accuracy and classification confidence than previous works and is robust to low-quality PMMW images.
Beam-oriented digital predistortion (BO-DPD) alone is not sufficient to linearize the output of a subarray of power amplifiers (PAs) in directions other than the desired direction. Therefore, after the BO-DPD operation, we apply post-weighting (PW) processing to minimize the nonlinear radiation over a wide range of directions in the presence of crosstalk. The optimized PW coefficients are multiplied by the polynomial terms of the BO-DPD, and the resulting signals are distributed to the PAs to compensate for the nonlinear radiation. In this work, we first propose a fully-featured post-weighting (FF-PW) scheme and then derive a low-complexity post-weighting (LC-PW) scheme.
Spatial-temporal information has been proven to be of great significance for click-through rate prediction tasks in online Location-Based Services (LBS), especially in mainstream food ordering platforms such as DoorDash, Uber Eats, Meituan, and Ele.me. Modeling user spatial-temporal preferences with sequential behavior data has become a hot topic in recommendation systems and online advertising. However, most existing methods either lack a representation of rich spatial-temporal information or only handle user behavior sequences of limited length, e.g., 100. In this paper, we tackle these problems by designing a new spatial-temporal modeling paradigm named Fragment and Integrate Network (FIN). FIN consists of two networks: (i) Fragment Network (FN) extracts Multiple Sub-Sequences (MSS) from lifelong sequential behavior data and captures the specific spatial-temporal representation by modeling each MSS separately. Here both a simplified attention and a complicated attention are adopted to balance the performance gain and resource consumption. (ii) Integrate Network (IN) builds a new integrated sequence by utilizing spatial-temporal interaction on MSS and captures the comprehensive spatial-temporal representation by modeling the integrated sequence with a complicated attention. Experiments on both public and production datasets demonstrate the accuracy and scalability of FIN. Since 2022, FIN has been fully deployed in the recommendation advertising system of Ele.me, one of the most popular online food ordering platforms in China, obtaining a 5.7% improvement in Click-Through Rate (CTR) and a 7.3% increase in Revenue Per Mille (RPM).
Wireless fingerprinting refers to a device identification method leveraging hardware imperfections and wireless channel variations as signatures. Beyond physical layer characteristics, recent studies demonstrated that user behaviors could be identified through network traffic, e.g., packet length, without decryption of the payload. Inspired by these results, we propose a multi-layer fingerprinting framework that jointly considers the multi-layer signatures for improved identification performance. In contrast to previous works, by leveraging the recent multi-view machine learning paradigm, i.e., data with multiple forms, our method can cluster the device information shared among the multi-layer features without supervision. Our information-theoretic approach can be extended to supervised and semi-supervised settings with straightforward derivations. In solving the formulated problem, we obtain a tight surrogate bound using variational inference for efficient optimization. In extracting the shared device information, we develop an algorithm based on the Wyner common information method, which has reduced computational complexity compared to existing approaches. The algorithm can be applied to data distributions belonging to the exponential family. Empirically, we evaluate the algorithm on a synthetic dataset with real-world video traffic and simulated physical layer characteristics. Our empirical results show that the proposed method outperforms the state-of-the-art baselines in both supervised and unsupervised settings.
Human/user interaction with buildings is mostly restricted to interacting with building automation systems through user interfaces that mainly aim to improve the energy efficiency of buildings and ensure the comfort of occupants. This research builds on the existing theories of Human-Building Interaction (HBI) and proposes a novel conceptual framework for HBI that combines the concepts of Human-Computer Interaction (HCI) and Ambient Intelligence (AmI). The proposed framework aims to study the needs of occupants in specific-purpose buildings, which are currently understudied. Specifically, we explore the application of the proposed HBI framework to improve the learning experience of students in academic buildings. Focus groups and semi-structured interviews were conducted among students who are considered primary occupants of Goodwin Hall, a flagship smart engineering building at Virginia Tech. Qualitative coding and concept mapping were used to analyze the qualitative data and determine the impact of occupant-specific needs on the learning experience of students in academic buildings. The occupant-specific problem found to have the highest direct impact on learning experience was finding study space, and the one with the highest indirect impact was Indoor Environment Quality (IEQ). We discuss new ideas for designing Intelligent User Interfaces (IUI), e.g., Augmented Reality (AR), to increase the perceivable affordances for building occupants, and we consider a context-aware, ubiquitous, analytics-based strategy to provide services tailored to the identified needs.
Vehicular communication networks are rapidly emerging as vehicles become smarter. However, these networks are increasingly susceptible to various attacks. The rise in automated vehicles exacerbates the situation, emphasizing the need for security and authentication measures to ensure safe and effective traffic management. In this paper, we propose a novel hybrid physical layer security (PLS)-machine learning (ML) authentication scheme that exploits the position of the transmitter vehicle as a device fingerprint. We use a time-of-arrival (ToA) based localization mechanism where the ToA is estimated at roadside units (RSUs), and the coordinates of the transmitter vehicle are extracted at the base station (BS). Furthermore, to track the mobility of the moving legitimate vehicle, we use an ML model trained on several system parameters. We evaluate two ML models for this purpose: support vector regression and decision tree. To evaluate our scheme, we conduct binary hypothesis testing on the estimated positions with the help of the ground truths provided by the ML model, which classifies the transmitter node as legitimate or malicious. We consider the probability of false alarm and the probability of missed detection resulting from the binary hypothesis testing as performance metrics, and we use the mean absolute error (MAE), mean square error (MSE), and coefficient of determination $\text{R}^2$ to further evaluate the ML models. We also compare our scheme with a baseline scheme that exploits the angle of arrival at RSUs for authentication. We observe that our proposed position-based mechanism significantly outperforms the baseline scheme in terms of missed detections.
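The position-based binary hypothesis test described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean-distance test statistic, the threshold value, the function names `is_legitimate` and `error_rates`, and the sample trials are all assumptions made for illustration. The sketch accepts a transmitter when its ToA-estimated position lies close to the position predicted by the ML model, then counts false alarms and missed detections over labeled trials.

```python
import math

def is_legitimate(estimated, predicted, threshold):
    """Accept the transmitter if the estimated position is within
    `threshold` of the predicted position (assumed test statistic)."""
    return math.dist(estimated, predicted) <= threshold

def error_rates(trials, threshold):
    """Return (false alarm rate, missed detection rate).

    Each trial is (estimated_xy, predicted_xy, truly_legitimate).
    A false alarm rejects a legitimate node; a missed detection
    accepts a malicious one.
    """
    false_alarms = missed = n_legit = n_malicious = 0
    for estimated, predicted, truly_legitimate in trials:
        accepted = is_legitimate(estimated, predicted, threshold)
        if truly_legitimate:
            n_legit += 1
            false_alarms += not accepted
        else:
            n_malicious += 1
            missed += accepted
    p_fa = false_alarms / n_legit if n_legit else 0.0
    p_md = missed / n_malicious if n_malicious else 0.0
    return p_fa, p_md

# Hypothetical trials (coordinates in meters), for illustration only.
trials = [
    ((10.0, 5.0), (10.5, 5.0), True),   # close to prediction: accepted
    ((20.0, 5.0), (23.0, 9.0), True),   # noisy estimate: false alarm
    ((40.0, 2.0), (30.0, 2.0), False),  # far from prediction: rejected
    ((31.0, 2.0), (30.0, 2.0), False),  # attacker nearby: missed detection
]

p_fa, p_md = error_rates(trials, threshold=2.0)
print(f"P_fa = {p_fa:.2f}, P_md = {p_md:.2f}")
```

Raising the threshold lowers the false alarm rate at the cost of more missed detections, which is why the two probabilities are reported together.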
The industrial Internet of Things (IIoT) and network slicing (NS) paradigms have been envisioned as key enablers for flexible and intelligent manufacturing in Industry 4.0, where a myriad of interconnected machines, sensors, and devices with diverse quality of service (QoS) requirements coexist. To optimize network resource usage, stakeholders in the IIoT network are encouraged to take pragmatic steps towards resource sharing. However, resource sharing is only attractive if the entities involved are able to settle on a fair exchange of resources for remuneration in a win-win situation. In this paper, we design an economic model that analyzes the multilateral strategic trading interactions between sliced tenants in IIoT networks. We formulate the resource pricing and purchasing problem of the seller and buyer tenants as a cooperative Stackelberg game. In particular, the cooperative game enforces collaboration among the buyer tenants through coalition formation, which strengthens their position in resource price negotiations compared with acting individually, while the Stackelberg game determines the optimal policies of the seller tenants and buyer tenant coalitions. To achieve a Stackelberg equilibrium (SE), a multi-agent deep reinforcement learning (MADRL) method is developed to make flexible pricing and purchasing decisions without prior knowledge of the environment. Simulation results and analysis show that the proposed method converges and outperforms the baselines in terms of utility maximization.
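The leader-follower structure of a Stackelberg pricing game can be illustrated with a toy single-seller, single-buyer example. This is only a sketch under stated assumptions: the logarithmic buyer utility, the linear seller cost, the parameter names `a` and `cost`, and the grid search are hypothetical, and the example uses plain backward induction rather than the coalition formation or MADRL method of the paper.

```python
import math

def follower_best_response(price, a):
    """Buyer maximizes a*log(1 + q) - price*q over q >= 0.
    Setting the derivative to zero gives q = a/price - 1."""
    return max(a / price - 1.0, 0.0)

def leader_profit(price, a, cost):
    """Seller profit when the buyer responds optimally to `price`."""
    return (price - cost) * follower_best_response(price, a)

def solve_stackelberg(a, cost):
    """Backward induction: the seller searches a price grid while
    anticipating the buyer's best response at each price."""
    prices = [i / 100 for i in range(1, 1001)]  # 0.01 .. 10.00
    best_price = max(prices, key=lambda p: leader_profit(p, a, cost))
    return best_price, follower_best_response(best_price, a)

price, quantity = solve_stackelberg(a=4.0, cost=1.0)
print(price, quantity)
# For this toy utility the closed form is price = sqrt(a * cost).
print(math.sqrt(4.0 * 1.0))
```

The grid search stands in for whatever optimizer the leader uses; a learning-based method becomes attractive when the best response is not available in closed form.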
Physical-layer security (PLS) is a promising technique to complement communication security in beyond-5G wireless networks. However, PLS developments in current research are often based on the ideal assumption of infinite coding blocklengths or perfect knowledge of the wiretap link's channel state information (CSI). In this work, we study the performance of finite blocklength (FBL) transmissions using a new secrecy metric - the average information leakage (AIL). We evaluate the exact and approximate AIL with arbitrary signaling and fading channels, assuming that the eavesdropper's instantaneous CSI is unknown. We then conduct case studies that use artificial noise (AN) beamforming to thoroughly analyze the AIL in both Rayleigh and Rician fading channels. The accuracy of the analytical expressions is verified through extensive simulations, and various insights regarding the impact of key system parameters on the AIL are obtained. Particularly, our results reveal that allowing a small level of AIL can potentially lead to significant reliability improvements. To improve the system performance, we formulate and solve an average secrecy throughput (AST) optimization problem via both non-adaptive and adaptive design strategies. Our findings highlight the significance of blocklength design and AN power allocation, as well as the impact of their trade-off on the AST.
Pre-captured immersive environments using omnidirectional cameras provide a wide range of virtual reality applications. Previous research has shown that manipulating the eye height in egocentric virtual environments can significantly affect distance perception and immersion. However, the influence of eye height in pre-captured real environments has received less attention due to the difficulty of altering the perspective after finishing the capture process. To explore this influence, we first propose a pilot study that captures real environments with multiple eye heights and asks participants to judge the egocentric distances and immersion. If a significant influence is confirmed, an effective image-based approach to adapt pre-captured real-world environments to the user's eye height would be desirable. Motivated by the study, we propose a learning-based approach for synthesizing novel views for omnidirectional images with altered eye heights. This approach employs a multitask architecture that learns depth and semantic segmentation in two formats, and generates high-quality depth and semantic segmentation to facilitate the inpainting stage. With the improved omnidirectional-aware layered depth image, our approach synthesizes natural and realistic visuals for eye height adaptation. Quantitative and qualitative evaluation shows favorable results against state-of-the-art methods, and an extensive user study verifies improved perception and immersion for pre-captured real-world environments.
The military is investigating methods to improve communication and agility in its multi-domain operations (MDO). The Internet of Things (IoT) has recently gained traction in public and government domains. Its usage in MDO may revolutionize future battlefields and enable strategic advantage. While this technology offers leverage to military capabilities, it comes with challenges, one of which is uncertainty and its associated risk. A key question is how these uncertainties can be addressed. Recently published studies have proposed information camouflage to transform information from one data domain to another. As this is a comparatively new approach, we investigate the challenges of such transformations and how the associated uncertainties, specifically unknown-unknowns, can be detected and addressed to improve decision-making.
Visual Question Answering (VQA) models have struggled with counting objects in natural images so far. We identify a fundamental problem due to soft attention in these models as a cause. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component gives a substantial improvement in counting over a strong baseline by 6.6%.
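One common way to see why soft attention hinders counting is that normalized attention weights sum to one, so the attended feature is an average over proposals rather than a sum. The sketch below is an illustration of that effect and not the component proposed above; the two-dimensional `CAT` feature and the attention scores are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, scores):
    """Soft attention: weighted average of the proposal features."""
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features))
            for d in range(dim)]

CAT = [1.0, 0.0]  # hypothetical feature vector for a "cat" proposal

one_cat = attend([CAT], [3.0])
two_cats = attend([CAT, CAT], [3.0, 3.0])

# Both attended features are the same vector, so any classifier
# applied after attention cannot tell one cat from two.
print(one_cat, two_cats)
```

Because the attended vector is identical for one and two instances, the count has to be recovered from the attention weights or the proposals themselves instead of from the averaged feature.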