The ever-rising demand for computation is forcing a move from the CPU to heterogeneous specialized hardware, which is readily available across modern datacenters through disaggregated infrastructure. Meanwhile, trusted execution environments (TEEs), one of the most promising recent developments in hardware security, can only protect code confined to the CPU, limiting TEEs' potential and applicability to a handful of applications. We observe that the TEEs' hardware trusted computing base (TCB) is fixed at design time, which in practice leads to relying on untrusted software to employ peripherals in TEEs. Based on this observation, we propose \emph{composite enclaves} with a configurable hardware and software TCB, allowing enclaves access to multiple computing and IO resources. Finally, we present two case studies of composite enclaves: i) an FPGA platform based on RISC-V Keystone connected to emulated peripherals and sensors, and ii) a large-scale accelerator. These case studies showcase a flexible yet small TCB (2.5 KLoC for IO peripherals and drivers) with low performance overhead (only around 220 additional cycles for a context switch), demonstrating the feasibility of our approach and showing that it can work with a wide range of specialized hardware.
Federated learning enables multiple data owners to jointly train a machine learning model without revealing their private datasets. However, a malicious aggregation server might use the model parameters to derive sensitive information about the training data. To address such leakage, differential privacy and cryptographic techniques have been investigated in prior work, but these often result in large communication overheads or harm model performance. To mitigate this centralization of power at the aggregation server, we propose \textsc{Scotch}, a decentralized \textit{m-party} secure-computation framework for federated aggregation that deploys MPC primitives such as \textit{secret sharing}. Our protocol is simple and efficient, and it provides strict privacy guarantees against curious aggregators or colluding data owners, with minimal communication overhead compared to existing state-of-the-art privacy-preserving federated learning frameworks. We evaluate our framework through extensive experiments on multiple datasets, with promising results. \textsc{Scotch} can train a standard MLP neural network, with the training dataset split amongst 3 participating users and 3 aggregating servers, reaching 96.57\% accuracy on MNIST and 98.40\% accuracy on the Extended MNIST (digits) dataset, while incorporating various optimizations.
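To make the aggregation primitive concrete, below is a minimal Python sketch of additive secret sharing over a prime field, the kind of MPC building block such a framework rests on. The field modulus, share count, and function names are our own illustrative choices, not \textsc{Scotch}'s actual implementation, and a real deployment additionally needs fixed-point encoding of model weights.

```python
import secrets

PRIME = 2**61 - 1  # illustrative field modulus (a Mersenne prime)

def share(value: int, n_servers: int = 3) -> list[int]:
    """Split a value into n additive shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % PRIME

# Each data owner splits its (integer-encoded) model update across the
# servers; each server only ever sees one random-looking share per owner,
# yet the sum of the per-server aggregates reveals exactly the global sum.
updates = [1234, 5678, 9012]  # toy scalar updates from 3 data owners
per_server = list(zip(*(share(u) for u in updates)))
server_sums = [sum(col) % PRIME for col in per_server]
assert reconstruct(server_sums) == sum(updates) % PRIME
```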
With advances in emerging technologies, options for operating public transit services have broadened from conventional fixed-route service through semi-flexible service to on-demand microtransit. Nevertheless, guidelines for deciding between these services remain limited in real implementations. We develop an open-source simulation sandbox that can compare state-of-the-practice methods for evaluating the different types of public transit operations. For the semi-flexible service, the Mobility Allowance Shuttle Transit (MAST) system is extended to include passenger deviations. A case study demonstrates the sandbox by evaluating an existing B63 bus route in Brooklyn, NY, and compares its performance with four other system designs spanning the three service types under three different demand scenarios.
We present the first BLS12-381 elliptic curve pairing crypto-processor for Internet-of-Things (IoT) security applications. Efficient finite field arithmetic and algorithm-architecture co-optimizations together enable two orders of magnitude energy savings. We implement several countermeasures against timing and power side-channel attacks. Our crypto-processor is programmable to provide the flexibility to accelerate various elliptic curve and pairing-based protocols such as signature aggregation and functional encryption.
If devices are physically accessible, optical fault injection attacks pose a great threat, since both the processed data and the operation flow can be manipulated. Successful physical attacks may not only leak secret information such as cryptographic private keys, but can also cause economic damage, especially if such a manipulation leads to a successful attack on critical infrastructure. Laser-based attacks exploit the sensitivity of CMOS technologies to electromagnetic radiation in the visible or infrared spectrum. It can be expected that radiation-hard designs, specially crafted for space applications, are more robust not only against high-energy particles and short electromagnetic waves but also against optical fault injection attacks. In this work, we investigate the sensitivity of radiation-hard JICG shift registers to optical fault injection attacks. In our experiments, we were able to trigger bit-set and bit-reset faults repeatedly, changing the data stored in single JICG flip-flops despite their high radiation fault tolerance.
This paper shows how the Xtensa architecture can be attacked with Return-Oriented Programming (ROP). The presented techniques cover both supported Application Binary Interfaces (ABIs). For the windowed ABI in particular, a powerful mechanism is presented that not only allows jumping to gadgets but also manipulating registers without relying on specific gadgets. This paper focuses purely on how the properties of the architecture itself can be exploited to chain gadgets, not on specific attacks or a gadget catalog.
As the scale of distributed training grows, communication becomes a bottleneck. To accelerate communication, recent works introduce In-Network Aggregation (INA), which moves gradient summation into network middle-boxes, e.g., programmable switches, to reduce the traffic volume. However, switch memory is scarce compared to the volume of gradients transmitted in distributed training. Although the literature applies methods like pool-based streaming or dynamic sharing to tackle this mismatch, switch memory remains a potential performance bottleneck. Furthermore, we observe that switch memory is under-utilized due to the synchronization requirement for aggregator deallocation in recent works. To improve switch memory utilization, we propose ESA, an $\underline{E}$fficient Switch Memory $\underline{S}$cheduler for In-Network $\underline{A}$ggregation. At its core, ESA enforces a preemptive aggregator allocation primitive and introduces priority scheduling at the data plane, improving switch memory utilization and average job completion time (JCT). Experiments show that ESA can improve the average JCT by up to $1.35\times$.
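To illustrate the scheduling idea in host-level pseudocode, the Python sketch below models a fixed pool of switch aggregator slots in which a higher-priority job may preempt a lower-priority one instead of waiting for synchronized deallocation. This is our own toy model under assumed semantics, not ESA's actual data-plane implementation; the class and parameter names are hypothetical.

```python
class AggregatorPool:
    """Toy model of preemptive aggregator allocation with priorities
    (lower numeric value = higher priority)."""

    def __init__(self, n_slots: int):
        self.free = list(range(n_slots))
        self.in_use = {}  # slot -> (priority, job_id)

    def allocate(self, job_id: str, priority: int):
        if self.free:
            slot = self.free.pop()
        else:
            # Find the lowest-priority occupant and preempt it if we outrank it.
            slot, (p, victim) = max(self.in_use.items(), key=lambda kv: kv[1][0])
            if p <= priority:
                return None  # no slot available: fall back to host aggregation
            # The victim's partial aggregate is evicted; its workers must
            # resend those gradients, which the protocol has to tolerate.
        self.in_use[slot] = (priority, job_id)
        return slot

pool = AggregatorPool(n_slots=2)
pool.allocate("job-A", priority=5)
pool.allocate("job-B", priority=5)
print(pool.allocate("job-C", priority=1))  # preempts one priority-5 aggregator
```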
We assess the costs and efficiency of state-of-the-art high-performance cloud computing compared to a traditional on-premises compute cluster. Our use case is atomistic simulations carried out with the GROMACS molecular dynamics (MD) toolkit, with a focus on alchemical protein-ligand binding free energy calculations. We set up a compute cluster in the Amazon Web Services (AWS) cloud that incorporates various instance types with Intel, AMD, and ARM CPUs, some with GPU acceleration. Using representative biomolecular simulation systems, we benchmark how GROMACS performs on individual instances and across multiple instances, assessing which instances deliver the highest performance and which are the most cost-efficient for our use case. We find that, in terms of total costs including hardware, personnel, room, energy, and cooling, producing MD trajectories in the cloud can be as cost-efficient as on an on-premises cluster, provided that optimal cloud instances are chosen. Further, we find that high-throughput ligand screening for protein-ligand binding affinity estimation can be accelerated dramatically by using global cloud resources. For a ligand screening study consisting of 19,872 independent simulations, we used all hardware that was available in the cloud at the time of the study. The computations scaled up to reach peak performance using more than 10,000 instances, 140,000 cores, and 3,000 GPUs simultaneously around the globe. Our simulation ensemble finished in about two days in the cloud, whereas weeks would be required to complete the task on a typical on-premises cluster of several hundred nodes. We demonstrate that the costs of such studies can be drastically reduced with a checkpoint-restart protocol that allows the use of cheap Spot pricing, and by choosing instance types with optimal cost-efficiency.
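As an illustration of such a checkpoint-restart protocol, the sketch below wraps GROMACS so that a run proceeds in short slices and resumes from the last checkpoint after a Spot interruption. The `-cpi`, `-maxh`, and `-deffnm` flags are standard `gmx mdrun` options, but the wrapper itself, its file names, and the slice length are our own assumptions, not the exact protocol used in the study.

```python
import pathlib
import subprocess

def run_slice(deffnm: str = "md", max_hours: float = 1.0) -> int:
    """Run one time-limited GROMACS slice; resume from a checkpoint if
    one exists, so a Spot interruption loses at most one slice."""
    cmd = ["gmx", "mdrun", "-deffnm", deffnm, "-maxh", str(max_hours)]
    checkpoint = pathlib.Path(f"{deffnm}.cpt")
    if checkpoint.exists():
        cmd += ["-cpi", str(checkpoint)]  # continue from the previous state
    return subprocess.run(cmd, check=False).returncode

# On every (re)launch of a Spot instance, calling run_slice() picks the
# simulation up where the last checkpoint left off.
```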
The CAN bus is crucial to the efficiency and safety of modern vehicle infrastructure. Electronic Control Units (ECUs) exchange data across a shared bus, dropping messages whenever errors occur. If an ECU generates enough errors, its transmitter is put into a bus-off state, turning it off. Previous work abuses this process to disable ECUs, but is trivial to detect through the multiple errors transmitted over the bus. We propose a novel attack, undetectable by prior intrusion detection systems, which disables ECUs within a single message without generating any errors on the bus. Performing this attack requires the ability to flip bits on the bus, but no particular sophistication in doing so: we show that an attacker who can only flip bits 40% of the time can execute our stealthy attack 100% of the time. This attack, like all prior CAN attacks, relies on the ability to read the bus. We therefore propose a new technique that synchronizes the bus, such that even a blind attacker, incapable of reading the bus, knows when to transmit, taking a limited attacker's chance of success from the percentage of dead bus time to 100%. Finally, we propose a small modification to the CAN error process to ensure an ECU cannot fail without being detected, no matter how advanced the attacker is. Taken together, we advance the state of the art for CAN attacks and blind attackers, while proposing a detection system against stealthy attacks and addressing the larger problem of CAN's abusable error frames.
Federated learning is the centralized training of statistical models from decentralized data on mobile devices while preserving the privacy of each device. We present a robust aggregation approach to make federated learning robust in settings where a fraction of the devices may be sending corrupted updates to the server. The approach relies on a robust aggregation oracle based on the geometric median, which returns a robust aggregate using a constant number of iterations of a regular non-robust averaging oracle. The robust aggregation oracle is privacy-preserving, similar to the non-robust secure average oracle it builds upon. We establish its convergence for least squares estimation of additive models. We provide experimental results with linear models and deep networks for three tasks in computer vision and natural language processing. The robust aggregation approach is agnostic to the level of corruption; it outperforms the classical aggregation approach in terms of robustness when the level of corruption is high, while being competitive in the regime of low corruption. Two variants, a faster one with one-step robust aggregation and another with on-device personalization, round out the paper.
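For concreteness, here is a minimal numpy sketch of the geometric-median iteration (a smoothed Weiszfeld scheme), where each step is one call to a plain weighted-averaging oracle. The smoothing constant, the iteration count, and the in-the-clear average standing in for the secure oracle are our own simplifications, not the paper's exact construction.

```python
import numpy as np

def geometric_median(points: np.ndarray, iters: int = 5, eps: float = 1e-8) -> np.ndarray:
    """Smoothed Weiszfeld iteration: a constant number of reweighted
    averages approximates the geometric median of the rows of `points`."""
    z = points.mean(axis=0)  # start from the ordinary (non-robust) average
    for _ in range(iters):
        dists = np.maximum(np.linalg.norm(points - z, axis=1), eps)
        weights = 1.0 / dists          # outlying (corrupted) updates get
        weights /= weights.sum()       # down-weighted automatically
        z = weights @ points           # one call to the averaging oracle
    return z

# 8 honest updates near the origin, 2 corrupted updates far away:
updates = np.vstack([np.random.randn(8, 4), 100.0 + np.random.randn(2, 4)])
print(geometric_median(updates))  # stays near the honest updates
```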
In the past few decades, artificial intelligence (AI) technology has developed rapidly, changing everyone's daily life and profoundly altering the course of human society. AI is developed with the intention of benefiting humans: reducing human labor, bringing everyday convenience to human lives, and promoting social good. However, recent research and AI applications show that AI can cause unintentional harm to humans, for example by making unreliable decisions in safety-critical scenarios or by undermining fairness through inadvertent discrimination against a group. Thus, trustworthy AI has recently attracted immense attention; it requires careful consideration of the adverse effects AI may bring to humans, so that humans can fully trust and live in harmony with AI technologies. Recent years have witnessed a tremendous amount of research on trustworthy AI. In this survey, we review trustworthy AI from a computational perspective, to help readers understand the latest technologies for achieving it. Trustworthy AI is a large and complex area involving various dimensions. In this work, we focus on six of the most crucial dimensions: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being. For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems. We also discuss the harmonious and conflicting interactions among the different dimensions and outline aspects of trustworthy AI to investigate in the future.