Key-value stores typically leave access control to the systems for which they act as storage engines. Unfortunately, attackers may circumvent such read access controls via timing attacks on the key-value store, which use differences in query response times to glean information about stored data. To date, key-value store timing attacks have aimed to disclose stored values and have exploited external mechanisms that can be disabled for protection. In this paper, we point out that key disclosure is also a security threat -- and demonstrate key disclosure timing attacks that exploit mechanisms of the key-value store itself. We target LSM-tree based key-value stores utilizing range filters, which have been recently proposed to optimize LSM-tree range queries. We analyze the impact of the range filters SuRF and prefix Bloom filter on LSM-trees through a security lens, and show that they enable a key disclosure timing attack, which we call prefix siphoning. Prefix siphoning successfully leverages benign queries for non-present keys to identify prefixes of actual keys -- and in some cases, full keys -- in scenarios where brute force searching for keys (via exhaustive enumeration or random guesses) is infeasible.
Internet of Things (IoT) devices sit at the intersection of unwieldy software complexity and unprecedented attacker access. This unique position comes with a daunting security challenge: how can I protect both proprietary code and confidential data on a device that the attacker has unfettered access to? Trusted Execution Environments (TEEs) promise to solve this challenge through hardware-based separation of trusted and untrusted computation and data. While TEEs do an adequate job of protecting secrets on desktop-class devices, we reveal that trade-offs made in one of the most widely-used commercial IoT devices undermine their TEE's security. This paper uncovers two fundamental weaknesses in IP Encapsulation (IPE), the TEE deployed by Texas Instruments for MSP430 and MSP432 devices. We observe that lack of call site enforcement and residual state after unexpected TEE exits enable an attacker to reveal all proprietary code and secret data within the IPE. We design and implement an attack called RIPencapsulation, which systematically executes portions of code within the IPE and uses the partial state revealed through the register file to exfiltrate secret data and to identify gadget instructions. The attack then uses gadget instructions to reveal all proprietary code within the IPE. Our evaluation with commodity devices and a production compiler and settings shows that -- even after following all manufacturer-recommended secure coding practices -- RIPencapsultaion reveals, within minutes, both the code and keys from third-party cryptographic implementations protected by the IPE.
Worldwide, storage demands and costs are increasing. As a consequence of fault tolerance, storage device heterogenity, and data center specific constraints, optimal storage capacity utilization cannot be achieved with the integrated balancing algorithm of the distributed storage cluster system Ceph. This work presents Equilibrium, a device utilization size-aware shard balancing algorithm. With extensive experiments we demonstrate that our proposed algorithm balances near optimally on real-world clusters with strong available storage capacity improvements while reducing the amount of needed data movement.
In the absence of readily available labeled data for a given sequence labeling task and language, annotation projection has been proposed as one of the possible strategies to automatically generate annotated data. Annotation projection has often been formulated as the task of transporting, on parallel corpora, the labels pertaining to a given span in the source language into its corresponding span in the target language. In this paper we present T-Projection, a novel approach for annotation projection that leverages large pretrained text-to-text language models and state-of-the-art machine translation technology. T-Projection decomposes the label projection task into two subtasks: (i) A candidate generation step, in which a set of projection candidates using a multilingual T5 model is generated and, (ii) a candidate selection step, in which the generated candidates are ranked based on translation probabilities. We conducted experiments on intrinsic and extrinsic tasks in 5 Indo-European and 8 low-resource African languages. We demostrate that T-projection outperforms previous annotation projection methods by a wide margin. We believe that T-Projection can help to automatically alleviate the lack of high-quality training data for sequence labeling tasks. Code and data are publicly available.
In the context of IoT deployments, a multitude of devices concurrently require network access to transmit data over a shared communication channel. Employing symmetric strategies can effectively facilitate the collaborative use of the communication medium among these devices. By adopting such strategies, devices collectively optimize their transmission parameters, resulting in minimized collisions and enhanced overall network throughput. Our primary focus centers on the formulation of symmetric (i.e., identical) strategies for the sensors, aiming to optimize a finite horizon team objective. The imposition of symmetric strategies introduces novel facets and complexities into the team problem. To address this, we embrace the common information approach and adapt it to accommodate the use of symmetric strategies. This adaptation yields a dynamic programming framework grounded in common information, wherein each step entails the minimization of a single function mapping from an agent's private information space to the space of probability distributions over possible actions. Our proposed policy/method incurs a reduced cumulative cost compared to other methods employing symmetric strategies, a point substantiated by our simulation results.
The community explored to build private inference frameworks for transformer-based large language models (LLMs) in a server-client setting, where the server holds the model parameters and the client inputs its private data (or prompt) for inference. However, these frameworks impose significant overhead when the private inputs are forward propagated through the original LLMs. In this paper, we show that substituting the computation- and communication-heavy operators in the transformer architecture with privacy-computing friendly approximations can greatly reduce the private inference costs while incurring very minor impact on model performance. Compared to state-of-the-art Iron (NeurIPS 2022), our privacy-computing friendly model inference pipeline achieves a $5\times$ acceleration in computation and an 80% reduction in communication overhead, while retaining nearly identical accuracy.
Robots responsible for tasks over long time scales must be able to localize consistently and scalably amid geometric, viewpoint, and appearance changes. Existing visual SLAM approaches rely on low-level feature descriptors that are not robust to such environmental changes and result in large map sizes that scale poorly over long-term deployments. In contrast, object detections are robust to environmental variations and lead to more compact representations, but most object-based SLAM systems target short-term indoor deployments with close objects. In this paper, we introduce ObVi-SLAM to overcome these challenges by leveraging the best of both approaches. ObVi-SLAM uses low-level visual features for high-quality short-term visual odometry; and to ensure global, long-term consistency, ObVi-SLAM builds an uncertainty-aware long-term map of persistent objects and updates it after every deployment. By evaluating ObVi-SLAM on data from 16 deployment sessions spanning different weather and lighting conditions, we empirically show that ObVi-SLAM generates accurate localization estimates consistent over long-time scales in spite of varying appearance conditions.
Recent trends see a move away from a fixed-resource server-centric datacenter model to a more adaptable "disaggregated" datacenter model. These disaggregated datacenters can then dynamically group resources to the specific requirements of an incoming workload, thereby improving efficiency. To properly utilize these disaggregated datacenters, workload allocation techniques must examine the current state of the datacenter and choose resources that not only optimize the current workload request, but future ones. Since disaggregated datacenters are severely bottlenecked by the available network resources, our work proposes a heuristic-based approach called RISA, which significantly reduces the network usage of workload allocations in disaggregated datacenters. Compared to the state-of-the-art, RISA reduces the power consumption for optical components by 33% and reduces the average CPU-RAM round-trip latency by 50%. Additionally, RISA significantly outperforms the state-of-the-art in terms of execution time.
On-device learning allows AI models to adapt to user data, thereby enhancing service quality on edge platforms. However, training AI on resource-limited devices poses significant challenges due to the demanding computing workload and the substantial memory consumption and data access required by deep neural networks (DNNs). To address these issues, we propose utilizing embedded dynamic random-access memory (eDRAM) as the primary storage medium for transient training data. In comparison to static random-access memory (SRAM), eDRAM provides higher storage density and lower leakage power, resulting in reduced access cost and power leakage. Nevertheless, to maintain the integrity of the stored data, periodic power-hungry refresh operations could potentially degrade system performance. To minimize the occurrence of expensive eDRAM refresh operations, it is beneficial to shorten the lifetime of stored data during the training process. To achieve this, we adopt the principles of algorithm and hardware co-design, introducing a family of reversible DNN architectures that effectively decrease data lifetime and storage costs throughout training. Additionally, we present a highly efficient on-device training engine named \textit{CAMEL}, which leverages eDRAM as the primary on-chip memory. This engine enables efficient on-device training with significantly reduced memory usage and off-chip DRAM traffic while maintaining superior training accuracy. We evaluate our CAMEL system on multiple DNNs with different datasets, demonstrating a $2.5\times$ speedup of the training process and $2.8\times$ training energy savings than the other baseline hardware platforms.
The wide-ranging applications of large language models (LLMs), especially in safety-critical domains, necessitate the proper evaluation of the LLM's adversarial robustness. This paper proposes an efficient tool to audit the LLM's adversarial robustness via a prompt-based adversarial attack (PromptAttack). PromptAttack converts adversarial textual attacks into an attack prompt that can cause the victim LLM to output the adversarial sample to fool itself. The attack prompt is composed of three important components: (1) original input (OI) including the original sample and its ground-truth label, (2) attack objective (AO) illustrating a task description of generating a new sample that can fool itself without changing the semantic meaning, and (3) attack guidance (AG) containing the perturbation instructions to guide the LLM on how to complete the task by perturbing the original sample at character, word, and sentence levels, respectively. Besides, we use a fidelity filter to ensure that PromptAttack maintains the original semantic meanings of the adversarial examples. Further, we enhance the attack power of PromptAttack by ensembling adversarial examples at different perturbation levels. Comprehensive empirical results using Llama2 and GPT-3.5 validate that PromptAttack consistently yields a much higher attack success rate compared to AdvGLUE and AdvGLUE++. Interesting findings include that a simple emoji can easily mislead GPT-3.5 to make wrong predictions.
In many visual systems, visual tracking often bases on RGB image sequences, in which some targets are invalid in low-light conditions, and tracking performance is thus affected significantly. Introducing other modalities such as depth and infrared data is an effective way to handle imaging limitations of individual sources, but multi-modal imaging platforms usually require elaborate designs and cannot be applied in many real-world applications at present. Near-infrared (NIR) imaging becomes an essential part of many surveillance cameras, whose imaging is switchable between RGB and NIR based on the light intensity. These two modalities are heterogeneous with very different visual properties and thus bring big challenges for visual tracking. However, existing works have not studied this challenging problem. In this work, we address the cross-modal object tracking problem and contribute a new video dataset, including 654 cross-modal image sequences with over 481K frames in total, and the average video length is more than 735 frames. To promote the research and development of cross-modal object tracking, we propose a new algorithm, which learns the modality-aware target representation to mitigate the appearance gap between RGB and NIR modalities in the tracking process. It is plug-and-play and could thus be flexibly embedded into different tracking frameworks. Extensive experiments on the dataset are conducted, and we demonstrate the effectiveness of the proposed algorithm in two representative tracking frameworks against 17 state-of-the-art tracking methods. We will release the dataset for free academic usage, dataset download link and code will be released soon.