Many applications that benefit from offloading data to cloud services operate on private data. A long line of work has shown that, even when data is offloaded in encrypted form, an adversary can learn sensitive information by analyzing data access patterns. Existing techniques for oblivious data access, which protect against access-pattern attacks, require a centralized, stateful trusted proxy to orchestrate data accesses from applications to cloud services. We show that, in failure-prone deployments, such a centralized and stateful proxy results in violation of oblivious data access security guarantees and/or in system unavailability. We thus initiate the study of distributed, fault-tolerant, oblivious data access. We present SHORTSTACK, a distributed proxy architecture for oblivious data access in failure-prone deployments. SHORTSTACK achieves the classical obliviousness guarantee (the access patterns observed by the adversary are independent of the input) even under a powerful passive persistent adversary that can force failures of an arbitrary (bounded-size) subset of proxy servers at arbitrary times. We also introduce a security model that enables studying oblivious data access with distributed, failure-prone servers. We provide a formal proof that SHORTSTACK enables oblivious data access under this model, and show empirically that its performance scales near-linearly with the number of distributed proxy servers.
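For concreteness, the classical obliviousness guarantee referenced above can be stated roughly as follows. This is a minimal sketch with notation of our own choosing (Π for the data-access protocol, View_A for the adversary's observed access patterns); the paper's formal model is richer and additionally lets the adversary schedule proxy-server failures.

```latex
% Classical obliviousness (sketch; notation illustrative, not the paper's):
% for any two equal-length input request sequences x and y, the access
% patterns the adversary observes are computationally indistinguishable.
\[
  \forall\, x, y \ \text{with}\ |x| = |y|:\qquad
  \mathsf{View}_{\mathcal{A}}\bigl(\Pi(x)\bigr)
  \;\approx_c\;
  \mathsf{View}_{\mathcal{A}}\bigl(\Pi(y)\bigr)
\]
```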
The Performance Monitor Unit (PMU) is an important hardware module in current processors; it counts events raised by the processor in a set of PMU counters. Ideally, events triggered by instructions that are executed but whose results are never committed (transient execution) should not be recorded. In this study, however, we discover that some PMU events triggered by transiently executed instructions are in fact recorded by the PMU. Based on this observation, we propose the PMUSpill attack, which enables attackers to leak secret data loaded during transient execution. The main challenge is encoding the secret data into PMU events. We solve it with an instruction gadget whose execution path, identifiable through PMU counters, encodes the value of the secret data. We successfully implement the PMUSpill attack in real experiments to leak secret data stored in Intel Software Guard Extensions (SGX), a Trusted Execution Environment (TEE) in Intel processors. In addition, we locate the vulnerable PMU counters and the instructions that trigger them by iterating over all valid PMU counters and instructions. The experimental results demonstrate that up to 20 PMU counters are available to implement the PMUSpill attack. We also discuss possible hardware- and software-based countermeasures against PMUSpill that can be used to enhance the security of future processors.
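To make the encoding idea concrete, here is a minimal conceptual simulation in Python. It is illustrative only: the real attack runs a native instruction gadget transiently (e.g., inside a mispredicted branch) and reads hardware PMU counters, neither of which this toy models.

```python
# Conceptual simulation of the PMUSpill encoding idea (illustrative only):
# a gadget's execution path depends on a secret bit, and an event counter
# that ticks on one path but not the other reveals that bit. On real
# hardware, the paths contain instructions that trigger a vulnerable PMU
# event even when executed only transiently (never committed).

class ToyPMU:
    """Stand-in for a hardware counter incremented on a specific event."""
    def __init__(self):
        self.count = 0

    def read(self):
        return self.count

def gadget(secret_bit, pmu):
    # Path A triggers the counted event; path B does not.
    if secret_bit:
        pmu.count += 1   # event-triggering instruction
    else:
        pass             # event-free path

def leak_bit(secret_bit):
    pmu = ToyPMU()
    before = pmu.read()
    gadget(secret_bit, pmu)
    after = pmu.read()
    return int(after - before > 0)   # the counter delta decodes the bit

secret = [1, 0, 1, 1, 0]
recovered = [leak_bit(b) for b in secret]
assert recovered == secret
print("recovered bits:", recovered)
```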
Motivated by decentralized sensing and policy evaluation problems, we consider a particular type of distributed optimization problem that involves averaging several stochastic, online observations on a network. We design a dual-based method for this consensus problem with Polyak-Ruppert averaging and analyze its behavior. We show that this algorithm attains an accelerated deterministic error rate depending optimally on the condition number of the network, and that it has order-optimal stochastic error. This improves on the guarantees of state-of-the-art distributed optimization algorithms when specialized to this setting, and yields, among other things, corollaries for decentralized policy evaluation. Our proofs rely on explicitly studying the evolution of several relevant linear systems, and may be of independent interest. Numerical experiments validate our theoretical results and demonstrate that our approach outperforms existing methods in finite-sample scenarios on several natural network topologies.
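The setting can be illustrated with a simple baseline. The sketch below runs dynamic average consensus over a ring topology with Polyak-Ruppert iterate averaging; it is not the paper's dual-based method, and the topology, gossip weights, and step count are assumptions made for illustration.

```python
# Sketch of the stochastic averaging-on-a-network setting: each node i sees
# noisy online observations with mean mu_i, and all nodes should estimate
# the network-wide average of the mu_i. A simple dynamic-average-consensus
# baseline with Polyak-Ruppert averaging (NOT the paper's dual-based method).
import numpy as np

rng = np.random.default_rng(0)
n, T, noise = 20, 20000, 1.0
mu = rng.normal(size=n)                  # per-node means; target = mu.mean()

# Symmetric doubly stochastic gossip matrix for a ring topology.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

x = np.zeros(n)                          # current iterates, one per node
xbar = np.zeros(n)                       # Polyak-Ruppert running average
s_prev = np.zeros(n)
obs_sum = np.zeros(n)
for t in range(1, T + 1):
    obs_sum += mu + noise * rng.normal(size=n)   # fresh stochastic observation
    s = obs_sum / t                              # local running means
    x = W @ x + s - s_prev                       # dynamic average consensus step
    s_prev = s
    xbar += (x - xbar) / t                       # uniform iterate averaging

print("target:", mu.mean())
print("max node error (averaged iterates):", np.abs(xbar - mu.mean()).max())
```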
Distributed data analytics platforms (e.g., Apache Spark and Hadoop) enable cost-effective storage and processing by distributing data and computation across multiple nodes. Since these frameworks were designed primarily for performance and usability, most assume operation in non-malicious settings; hence, they allow users to execute arbitrary code to analyze the data. Worse, they neither support fine-grained access control natively nor offer any plugin mechanism to enable it, which makes them risky to use in multi-tier organizational settings. There have been attempts to build "add-on" solutions that enable fine-grained access control for distributed data analytics platforms. In this paper, we show that an attacker who knows the nature of such a solution can evade its access control by maliciously using the platform-provided APIs. Specifically, we craft several attack vectors that evade such solutions. Next, we systematically analyze the threats and the potentially risky APIs, and propose a two-layered (i.e., proactive and reactive) defense against these attacks. Our proactive security layer uses state-of-the-art program analysis to detect potentially malicious user code. The reactive security layer consists of binary integrity checking, instrumentation-based runtime checks, and sandboxed execution. Finally, using this solution, we provide a secure implementation of a new framework-agnostic, fine-grained, attribute-based access control framework named SecureDL for Apache Spark. To the best of our knowledge, this is the first work that provides secure fine-grained attribute-based access control for distributed data analytics platforms that allow arbitrary code execution. Our performance evaluation shows that the overhead of the added security is low.
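The evasion idea can be sketched in PySpark. This is a hypothetical illustration (the paper's concrete attack vectors may differ, and the dataset path and policy are invented): an add-on that mediates only the DataFrame entry points can be sidestepped by user code that reaches the same data through lower-level, platform-provided APIs.

```python
# Hypothetical sketch: an "add-on" access-control layer that enforces
# column-level policies by intercepting DataFrame reads can be evaded by
# arbitrary user code using API entry points the add-on does not mediate.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("evasion-sketch").getOrCreate()
sc = spark.sparkContext

# Mediated path: the add-on masks restricted columns on this entry point.
df = spark.read.parquet("/data/hr/employees")    # hypothetical dataset path
df.select("name").show()                          # policy-compliant access

# Unmediated paths (examples of platform-provided APIs an attacker may use):
raw = sc.binaryFiles("/data/hr/employees")        # re-read raw storage files
evil = df.rdd.mapPartitions(                      # arbitrary code on executors
    lambda rows: (open("/etc/hostname").read() for _ in rows)
)
print(raw.count(), evil.take(1))
```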
US wind power generation has grown significantly over the last decades, both in the number and in the average size of operating turbines. A lower specific power, i.e., larger rotor blades relative to wind turbine capacity, makes it possible to increase capacity factors and reduce cost. However, this development also reduces system efficiency, i.e., the share of the power in the wind flowing through rotor-swept areas that is converted to electricity. At the same time, output power density, the amount of electric energy generated per unit of rotor-swept area, may also decrease due to the decline in specific power. The precise outcome depends, however, on the interplay of wind resources and wind turbine models. In this study, we present a decomposition of historical US wind power generation data for the period 2001-2021 to study the extent to which the decrease in specific power affected system efficiency and output power density. We show that, as a result of the decrease in specific power, system efficiency fell and, therefore, output power density was reduced during the last decade. Furthermore, we show that the wind available to turbines has increased substantially since 2001 due to increases in average hub height. However, site quality has slightly decreased over the last 20 years.
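The relationship between these quantities can be made explicit. The identities below are a sketch with notation of our own choosing (the paper's exact decomposition may differ): they show why lowering specific power raises the capacity factor, yet output power density can still fall if system efficiency declines.

```latex
% Quantities involved (notation illustrative). With rated capacity P_r,
% rotor-swept area A, generation E over period T, and wind energy flux
% E_w through A:
\begin{align*}
  \text{specific power}       &= P_r / A, \\
  \text{capacity factor}      &= E / (P_r\, T), \\
  \text{system efficiency}    &= \eta = E / E_w, \\
  \text{output power density} &= \frac{E}{A\,T}
    = \underbrace{\frac{E_w}{A\,T}}_{\text{wind power density}} \cdot\, \eta
    = \text{capacity factor} \times \text{specific power}.
\end{align*}
```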
Traditional supervised bearing fault diagnosis methods rely on large amounts of labelled data, yet annotation can be time-consuming or infeasible. Fault diagnosis approaches that use limited labelled data are therefore becoming increasingly popular. In this paper, a Wavelet Transform (WT) and self-supervised learning-based bearing fault diagnosis framework is proposed to address the scarcity of supervised samples. Using the WT and cubic spline interpolation, the measured vibration signals are converted into fixed-scale time-frequency maps (TFMs) that serve as inputs. The Vision Transformer (ViT) is employed as the encoder for feature extraction, and the self-distillation with no labels (DINO) algorithm is introduced into the proposed framework for self-supervised learning with limited labelled data and sufficient unlabelled data. Two rolling bearing fault datasets are used for validation. When only 1% of the samples in each dataset are labelled, a simple K-Nearest Neighbor (KNN) classifier applied to the feature vectors extracted by the trained encoder, without fine-tuning, achieves over 90% average diagnosis accuracy. Furthermore, the superiority of the proposed method is demonstrated in comparison with other self-supervised fault diagnosis methods.
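A minimal sketch of the preprocessing step described above: a continuous wavelet transform turns a 1-D vibration signal into a time-frequency map, which cubic spline interpolation resizes to a fixed scale suitable for a ViT encoder. The wavelet choice, number of scales, and target size here are illustrative assumptions, not necessarily the paper's settings.

```python
# WT + cubic spline interpolation: vibration signal -> fixed-scale TFM.
import numpy as np
import pywt
from scipy.interpolate import RectBivariateSpline

def signal_to_tfm(signal, n_scales=64, out_size=(224, 224), wavelet="morl"):
    scales = np.arange(1, n_scales + 1)
    coeffs, _ = pywt.cwt(signal, scales, wavelet)     # (n_scales, len(signal))
    tfm = np.abs(coeffs)

    # Cubic spline interpolation (kx=ky=3) to a fixed-size map.
    rows, cols = np.arange(tfm.shape[0]), np.arange(tfm.shape[1])
    spline = RectBivariateSpline(rows, cols, tfm, kx=3, ky=3)
    new_rows = np.linspace(0, tfm.shape[0] - 1, out_size[0])
    new_cols = np.linspace(0, tfm.shape[1] - 1, out_size[1])
    return spline(new_rows, new_cols)                 # fixed-scale TFM

vibration = np.random.randn(2048)                     # placeholder signal
print(signal_to_tfm(vibration).shape)                 # (224, 224)
```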
As the central nerve of the intelligent vehicle control system, the in-vehicle network bus is crucial to the security of vehicle driving. The Controller Area Network (CAN) protocol is one of the most widely used standards for in-vehicle networks. However, the CAN bus lacks security mechanisms by design, leaving it vulnerable to various attacks. To enhance the security of in-vehicle networks and promote research in this area, this study comprehensively compares fully supervised and semi-supervised machine learning methods for CAN message anomaly detection, based on a large-scale CAN network traffic dataset with valuable extracted features. Both traditional machine learning models (including single classifiers and ensemble models) and neural-network-based deep learning models are evaluated. Furthermore, this study proposes a deep-autoencoder-based semi-supervised learning method for CAN message anomaly detection and verifies its superiority over other semi-supervised methods. Extensive experiments show that the fully supervised methods generally outperform the semi-supervised ones, since they use more information as input. In particular, the developed XGBoost-based model achieves state-of-the-art performance, with the best accuracy (98.65%), precision (0.9853), and ROC AUC (0.9585) among methods reported in the literature.
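A minimal sketch of the semi-supervised idea described above: train an autoencoder on normal CAN traffic only, then flag messages whose reconstruction error exceeds a threshold. The feature dimension, architecture, and threshold rule are illustrative assumptions, not the paper's exact setup.

```python
# Deep-autoencoder anomaly detection for CAN-message feature vectors.
import torch
import torch.nn as nn

class CANAutoencoder(nn.Module):
    def __init__(self, n_features=10, n_latent=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(),
                                     nn.Linear(16, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 16), nn.ReLU(),
                                     nn.Linear(16, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = CANAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
normal = torch.randn(4096, 10)            # placeholder normal-traffic features

for _ in range(50):                       # learn to reconstruct normal traffic
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(normal), normal)
    loss.backward()
    opt.step()

with torch.no_grad():                     # flag high-error messages as anomalies
    err = ((model(normal) - normal) ** 2).mean(dim=1)
    threshold = err.mean() + 3 * err.std()    # e.g., 3-sigma rule on normal data
    print("anomaly rate:", (err > threshold).float().mean().item())
```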
Advances in semiconductor technology have reduced the dimensions and cost of chipsets while improving their performance and capacity. In addition, advances in AI frameworks and libraries make it possible to accommodate more AI at the resource-constrained edge, on consumer IoT devices. Sensors are nowadays an integral part of our environment and provide continuous data streams for building intelligent applications. An example is a smart home with multiple interconnected devices. In such smart environments, for convenience and quick access to web-based services and personal information such as calendars, notes, emails, reminders, and banking, users link third-party skills, or skills from the Amazon store, to their smart speakers. Moreover, smart home products such as security cameras, video doorbells, smart plugs, carbon monoxide monitors, and smart door locks are interlinked with a modern smart speaker through custom skills. Since smart speakers are linked to these services and devices via the user's account, they can be used by anyone with physical access to the speaker via voice commands, compromising the user's data privacy, home security, and more. Recently launched camera-enabled smart speakers with AI functionality, such as Tensor Cam's AI Camera, Toshiba's Symbio, and Facebook's Portal, still provide no authentication scheme beyond calling out the wake-word. This paper provides an overview of the cybersecurity risks smart speaker users face due to the lack of an authentication scheme, and discusses the development of a state-of-the-art camera-enabled, microphone-array-based modern Alexa smart speaker prototype that addresses these risks.
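As an illustration of the kind of camera-based check such a prototype could add before acting on a sensitive voice command, here is a small sketch using the open-source face_recognition package. This is one possible scheme under our own assumptions, not necessarily the prototype's actual design; file paths are placeholders.

```python
# Gate sensitive voice commands on camera-based user authentication.
import face_recognition

# Enroll the owner once from a reference photo.
owner_img = face_recognition.load_image_file("owner.jpg")
owner_encoding = face_recognition.face_encodings(owner_img)[0]

def authorize_command(snapshot_path, command):
    """Execute a sensitive voice command only if an enrolled face is present."""
    frame = face_recognition.load_image_file(snapshot_path)
    for encoding in face_recognition.face_encodings(frame):
        if face_recognition.compare_faces([owner_encoding], encoding)[0]:
            return f"executing: {command}"
    return "denied: no enrolled user in view"

print(authorize_command("camera_frame.jpg", "unlock front door"))
```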
Accurate and interpretable prediction of future events in time-series data often requires capturing representative patterns (referred to as states) underpinning the observed data. To this end, most existing studies focus on the representation and recognition of states but ignore the changing transitional relations among them. In this paper, we present the evolutionary state graph, a dynamic graph structure designed to systematically represent the evolving relations (edges) among states (nodes) over time. We analyze the dynamic graphs constructed from time-series data and show that changes in graph structure (e.g., edges connecting certain state nodes) can signal the occurrence of events (i.e., time-series fluctuations). Inspired by this, we propose a novel graph neural network model, the Evolutionary State Graph Network (EvoNet), which encodes the evolutionary state graph for accurate and interpretable time-series event prediction. Specifically, EvoNet models both node-level (state-to-state) and graph-level (segment-to-segment) propagation, and captures node-graph (state-to-segment) interactions over time. Experimental results on five real-world datasets show that our approach not only achieves clear improvements over 11 baselines, but also provides more insight for explaining the results of event predictions.
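To illustrate the data structure, the sketch below constructs a simple evolutionary state graph: segments are assigned to discrete states by clustering, and an edge is added between the states of consecutive segments at each step. This is a simplified illustration under our own assumptions; EvoNet's actual state recognition and graph features are richer.

```python
# Build a sequence of state-transition graphs from a univariate series.
import numpy as np
from sklearn.cluster import KMeans

def evolutionary_state_graph(series, seg_len=24, n_states=5, seed=0):
    # Split the series into fixed-length segments (rows = segments).
    n_seg = len(series) // seg_len
    segments = series[: n_seg * seg_len].reshape(n_seg, seg_len)

    # State recognition: cluster segments into discrete states.
    states = KMeans(n_clusters=n_states, n_init=10,
                    random_state=seed).fit_predict(segments)

    # One adjacency matrix per step; a model like EvoNet consumes how
    # these transition structures evolve over time.
    graphs = []
    for t in range(n_seg - 1):
        A = np.zeros((n_states, n_states))
        A[states[t], states[t + 1]] = 1.0    # observed state transition
        graphs.append(A)
    return states, graphs

series = np.sin(np.linspace(0, 60, 2400)) + 0.1 * np.random.randn(2400)
states, graphs = evolutionary_state_graph(series)
print(len(graphs), graphs[0].shape)
```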
Modeling multivariate time series has long attracted researchers from a diverse range of fields, including economics, finance, and traffic. A basic assumption behind multivariate time series forecasting is that its variables depend on one another; upon looking closely, however, it is fair to say that existing methods fail to fully exploit the latent spatial dependencies between pairs of variables. In recent years, meanwhile, graph neural networks (GNNs) have shown high capability in handling relational dependencies. GNNs require well-defined graph structures for information propagation, which means they cannot be applied directly to multivariate time series where the dependencies are not known in advance. In this paper, we propose a general graph neural network framework designed specifically for multivariate time series data. Our approach automatically extracts the uni-directed relations among variables through a graph learning module, into which external knowledge such as variable attributes can easily be integrated. A novel mix-hop propagation layer and a dilated inception layer are further proposed to capture the spatial and temporal dependencies within the time series. The graph learning, graph convolution, and temporal convolution modules are jointly learned in an end-to-end framework. Experimental results show that our proposed model outperforms the state-of-the-art baseline methods on 3 of 4 benchmark datasets and achieves performance on par with other approaches on two traffic datasets that provide extra structural information.
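The sketch below shows one way such a graph learning module can be realized: learnable node embeddings are combined antisymmetrically so the resulting adjacency is uni-directed (if A[i, j] > 0 then A[j, i] = 0), and only each node's top-k strongest neighbors are kept. It is simplified relative to the full model; the embedding dimension, k, and the saturation factor alpha are illustrative.

```python
# Learn a sparse, uni-directed adjacency matrix from node embeddings.
import torch
import torch.nn as nn

class GraphLearner(nn.Module):
    def __init__(self, n_nodes, dim=32, k=4, alpha=3.0):
        super().__init__()
        self.E1 = nn.Embedding(n_nodes, dim)    # learnable node embeddings
        self.E2 = nn.Embedding(n_nodes, dim)
        self.lin1 = nn.Linear(dim, dim)
        self.lin2 = nn.Linear(dim, dim)
        self.k, self.alpha = k, alpha

    def forward(self, idx):
        m1 = torch.tanh(self.alpha * self.lin1(self.E1(idx)))
        m2 = torch.tanh(self.alpha * self.lin2(self.E2(idx)))
        # Antisymmetric combination makes the learned relations uni-directed.
        a = torch.relu(torch.tanh(self.alpha * (m1 @ m2.T - m2 @ m1.T)))
        # Sparsify: keep each node's top-k strongest outgoing edges.
        mask = torch.zeros_like(a)
        _, topk = a.topk(self.k, dim=1)
        mask.scatter_(1, topk, 1.0)
        return a * mask

learner = GraphLearner(n_nodes=10)
A = learner(torch.arange(10))
print(A.shape, (A > 0).sum(dim=1))   # at most k outgoing edges per node
```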
Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated consideration of optimization and statistics leads to fruitful research findings.
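A minimal sketch of the two-stage recipe described above, applied to symmetric low-rank matrix completion: spectral initialization from the rescaled observed matrix, followed by vanilla gradient descent on the nonconvex factorized objective f(X) = ||P_Omega(X X^T - M)||_F^2 / (4p). Problem sizes and step size are illustrative.

```python
# Two-stage nonconvex matrix completion: spectral init + gradient descent.
import numpy as np

rng = np.random.default_rng(0)
n, r, p = 200, 3, 0.3
Z = rng.normal(size=(n, r))
M = Z @ Z.T                               # ground-truth rank-r PSD matrix

mask = rng.random((n, n)) < p
Omega = mask | mask.T                     # symmetric observation pattern
p_eff = Omega.mean()                      # effective sampling rate

# Stage 1: spectral initialization from the rescaled observed matrix.
Y = np.where(Omega, M, 0.0) / p_eff
vals, vecs = np.linalg.eigh(Y)            # eigenvalues in ascending order
X = vecs[:, -r:] * np.sqrt(np.maximum(vals[-r:], 0.0))

# Stage 2: gradient descent on the factorized objective.
eta = 0.1 / np.linalg.norm(M, 2)
for _ in range(500):
    grad = np.where(Omega, X @ X.T - M, 0.0) @ X / p_eff
    X -= eta * grad

err = np.linalg.norm(X @ X.T - M) / np.linalg.norm(M)
print(f"relative recovery error: {err:.2e}")
```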