This study investigates a primary inaudible attack vector on Amazon Alexa voice services using near ultrasound trojans and focuses on characterizing the attack surface and examining the practical implications of issuing inaudible voice commands. The research maps each attack vector to a tactic or technique from the MITRE ATT&CK matrix, covering enterprise, mobile, and Industrial Control System (ICS) frameworks. The experiment involved generating and surveying fifty near-ultrasonic audios to assess the attacks' effectiveness, with unprocessed commands having a 100% success rate and processed ones achieving a 58% overall success rate. This systematic approach stimulates previously unaddressed attack surfaces, ensuring comprehensive detection and attack design while pairing each ATT&CK Identifier with a tested defensive method, providing attack and defense tactics for prompt-response options. The main findings reveal that the attack method employs Single Upper Sideband Amplitude Modulation (SUSBAM) to generate near-ultrasonic audio from audible sources, transforming spoken commands into a frequency range beyond human-adult hearing. By eliminating the lower sideband, the design achieves a 6 kHz minimum from 16-22 kHz while remaining inaudible after transformation. The research investigates the one-to-many attack surface where a single device simultaneously triggers multiple actions or devices. Additionally, the study demonstrates the reversibility or demodulation of the inaudible signal, suggesting potential alerting methods and the possibility of embedding secret messages like audio steganography.
We study how to mitigate the effects of energy attacks in the batteryless Internet of Things (IoT). Battery-less IoT devices live and die with ambient energy, as they use energy harvesting to power their operation. They are employed in a multitude of applications, including safety-critical ones such as biomedical implants. Due to scarce energy intakes and limited energy buffers, their executions become intermittent, alternating periods of active operation with periods of recharging their energy buffers. Experimental evidence exists that shows how controlling ambient energy allows an attacker to steer a device execution in unintended ways: energy provisioning effectively becomes an attack vector. We design, implement, and evaluate a mitigation system for energy attacks. By taking into account the specific application requirements and the output of an attack detection module, we tune task execution rates and optimize energy management. This ensures continued application execution in the event of an energy attack. When a device is under attack, our solution ensures the execution of 23.3% additional application cycles compared to the baselines we consider and increases task schedulability by at least 21%, while enabling a 34% higher peripheral availability.
This paper studies a multi-antenna networked integrated sensing and communications (ISAC) system, in which a set of multi-antenna base stations (BSs) employ the coordinated transmit beamforming to serve multiple single-antenna communication users (CUs) and perform joint target detection by exploiting the reflected signals simultaneously. To facilitate target sensing, the BSs transmit dedicated sensing signals combined with their information signals. Accordingly, we consider two types of CU receivers with and without the capability of canceling the interference from the dedicated sensing signals, respectively. In addition, we investigate two scenarios with and without time synchronization among the BSs. For the scenario with synchronization, the BSs can exploit the target-reflected signals over both the direct links (BS-to-target-to-originated BS links) and the cross-links (BS-to-target-to-other BSs links) for joint detection, while in the unsynchronized scenario, the BSs can only utilize the target-reflected signals over the direct links. For each scenario under different types of CU receivers, we optimize the coordinated transmit beamforming at the BSs to maximize the minimum detection probability over a particular targeted area, while guaranteeing the required minimum signal-to-interference-plus-noise ratio (SINR) constraints at the CUs. These SINR-constrained detection probability maximization problems are recast as non-convex quadratically constrained quadratic programs (QCQPs), which are then optimally solved via the semi-definite relaxation (SDR) technique.
It is commonly assumed that the end-to-end networking performance of edge offloading is purely dictated by that of the network connectivity between end devices and edge computing facilities, where ongoing innovation in 5G/6G networking can help. However, with the growing complexity of edge-offloaded computation and dynamic load balancing requirements, an offloaded task often goes through a multi-stage pipeline that spans across multiple compute nodes and proxies interconnected via a dedicated network fabric within a given edge computing facility. As the latest hardware-accelerated transport technologies such as RDMA and GPUDirect RDMA are adopted to build such network fabric, there is a need for good understanding of the full potential of these technologies in the context of computation offload and the effect of different factors such as GPU scheduling and characteristics of computation on the net performance gain achievable by these technologies. This paper unveils detailed insights into the latency overhead in typical machine learning (ML)-based computation pipelines and analyzes the potential benefits of adopting hardware-accelerated communication. To this end, we build a model-serving framework that supports various communication mechanisms. Using the framework, we identify performance bottlenecks in state-of-the-art model-serving pipelines and show how hardware-accelerated communication can alleviate them. For example, we show that GPUDirect RDMA can save 15--50\% of model-serving latency, which amounts to 70--160 ms.
Social media platforms such as Instagram and Twitter have emerged as critical channels for drug marketing and illegal sale. Detecting and labeling online illicit drug trafficking activities becomes important in addressing this issue. However, the effectiveness of conventional supervised learning methods in detecting drug trafficking heavily relies on having access to substantial amounts of labeled data, while data annotation is time-consuming and resource-intensive. Furthermore, these models often face challenges in accurately identifying trafficking activities when drug dealers use deceptive language and euphemisms to avoid detection. To overcome this limitation, we conduct the first systematic study on leveraging large language models (LLMs), such as ChatGPT, to detect illicit drug trafficking activities on social media. We propose an analytical framework to compose \emph{knowledge-informed prompts}, which serve as the interface that humans can interact with and use LLMs to perform the detection task. Additionally, we design a Monte Carlo dropout based prompt optimization method to further to improve performance and interpretability. Our experimental findings demonstrate that the proposed framework outperforms other baseline language models in terms of drug trafficking detection accuracy, showing a remarkable improvement of nearly 12\%. By integrating prior knowledge and the proposed prompts, ChatGPT can effectively identify and label drug trafficking activities on social networks, even in the presence of deceptive language and euphemisms used by drug dealers to evade detection. The implications of our research extend to social networks, emphasizing the importance of incorporating prior knowledge and scenario-based prompts into analytical tools to improve online security and public safety.
Semantic communication has gained significant attention from researchers as a promising technique to replace conventional communication in the next generation of communication systems, primarily due to its ability to reduce communication costs. However, little literature has studied its effectiveness in multi-user scenarios, particularly when there are variations in the model architectures used by users and their computing capacities. To address this issue, we explore a semantic communication system that caters to multiple users with different model architectures by using a multi-purpose transmitter at the base station (BS). Specifically, the BS in the proposed framework employs semantic and channel encoders to encode the image for transmission, while the receiver utilizes its local channel and semantic decoder to reconstruct the original image. Our joint source-channel encoder at the BS can effectively extract and compress semantic features for specific users by considering the signal-to-noise ratio (SNR) and computing capacity of the user. Based on the network status, the joint source-channel encoder at the BS can adaptively adjust the length of the transmitted signal. A longer signal ensures more information for high-quality image reconstruction for the user, while a shorter signal helps avoid network congestion. In addition, we propose a hybrid loss function for training, which enhances the perceptual details of reconstructed images. Finally, we conduct a series of extensive evaluations and ablation studies to validate the effectiveness of the proposed system.
User selection has become crucial for decreasing the communication costs of federated learning (FL) over wireless networks. However, centralized user selection causes additional system complexity. This study proposes a network intrinsic approach of distributed user selection that leverages the radio resource competition mechanism in random access. Taking the carrier sensing multiple access (CSMA) mechanism as an example of random access, we manipulate the contention window (CW) size to prioritize certain users for obtaining radio resources in each round of training. Training data bias is used as a target scenario for FL with user selection. Prioritization is based on the distance between the newly trained local model and the global model of the previous round. To avoid excessive contribution by certain users, a counting mechanism is used to ensure fairness. Simulations with various datasets demonstrate that this method can rapidly achieve convergence similar to that of the centralized user selection approach.
For all the successes in verifying low-level, efficient, security-critical code, little has been said or studied about the structure, architecture and engineering of such large-scale proof developments. We present the design, implementation and evaluation of a set of language-based techniques that allow the programmer to modularly write and verify code at a high level of abstraction, while retaining control over the compilation process and producing high-quality, zero-overhead, low-level code suitable for integration into mainstream software. We implement our techniques within the F* proof assistant, and specifically its shallowly-embedded Low* toolchain that compiles to C. Through our evaluation, we establish that our techniques were critical in scaling the popular HACL* library past 100,000 lines of verified source code, and brought about significant gains in proof engineer productivity. The exposition of our methodology converges on one final, novel case study: the streaming API, a finicky API that has historically caused many bugs in high-profile software. Using our approach, we manage to capture the streaming semantics in a generic way, and apply it ``for free'' to over a dozen use-cases. Six of those have made it into the reference implementation of the Python programming language, replacing the previous CVE-ridden code.
While existing work in robust deep learning has focused on small pixel-level $\ell_p$ norm-based perturbations, this may not account for perturbations encountered in several real world settings. In many such cases although test data might not be available, broad specifications about the types of perturbations (such as an unknown degree of rotation) may be known. We consider a setup where robustness is expected over an unseen test domain that is not i.i.d. but deviates from the training domain. While this deviation may not be exactly known, its broad characterization is specified a priori, in terms of attributes. We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attributes-space, without having access to the data from the test domain. Our adversarial training solves a min-max optimization problem, with the inner maximization generating adversarial perturbations, and the outer minimization finding model parameters by optimizing the loss on adversarial perturbations generated from the inner maximization. We demonstrate the applicability of our approach on three types of naturally occurring perturbations -- object-related shifts, geometric transformations, and common image corruptions. Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations. We demonstrate the usefulness of the proposed approach by showing the robustness gains of deep neural networks trained using our adversarial training on MNIST, CIFAR-10, and a new variant of the CLEVR dataset.
Since deep neural networks were developed, they have made huge contributions to everyday lives. Machine learning provides more rational advice than humans are capable of in almost every aspect of daily life. However, despite this achievement, the design and training of neural networks are still challenging and unpredictable procedures. To lower the technical thresholds for common users, automated hyper-parameter optimization (HPO) has become a popular topic in both academic and industrial areas. This paper provides a review of the most essential topics on HPO. The first section introduces the key hyper-parameters related to model training and structure, and discusses their importance and methods to define the value range. Then, the research focuses on major optimization algorithms and their applicability, covering their efficiency and accuracy especially for deep learning networks. This study next reviews major services and toolkits for HPO, comparing their support for state-of-the-art searching algorithms, feasibility with major deep learning frameworks, and extensibility for new modules designed by users. The paper concludes with problems that exist when HPO is applied to deep learning, a comparison between optimization algorithms, and prominent approaches for model evaluation with limited computational resources.
We present a monocular Simultaneous Localization and Mapping (SLAM) using high level object and plane landmarks, in addition to points. The resulting map is denser, more compact and meaningful compared to point only SLAM. We first propose a high order graphical model to jointly infer the 3D object and layout planes from single image considering occlusions and semantic constraints. The extracted cuboid object and layout planes are further optimized in a unified SLAM framework. Objects and planes can provide more semantic constraints such as Manhattan and object supporting relationships compared to points. Experiments on various public and collected datasets including ICL NUIM and TUM mono show that our algorithm can improve camera localization accuracy compared to state-of-the-art SLAM and also generate dense maps in many structured environments.