The proliferation of connected devices through Internet connectivity presents both opportunities for smart applications and risks to security and privacy. It is vital to proactively address these concerns to fully leverage the potential of the Internet of Things (IoT). IoT services in which one data owner serves multiple clients, such as smart city transportation, smart building management, and healthcare, can offer benefits but also bring cybersecurity and data privacy risks. For example, in healthcare, a hospital may collect data from medical devices and make it available to multiple clients such as researchers and pharmaceutical companies. This data can be used to improve medical treatments and research, but if it is not protected, it can also put patients' personal information at risk. To realize the benefits of these services, it is important to implement proper security and privacy measures. In this paper, we propose a symmetric searchable encryption scheme with dynamic updates on a database that has a single owner and multiple clients for IoT environments. Our proposed scheme supports both forward and backward privacy. Additionally, our scheme supports a decentralized storage environment in which data owners can outsource data across multiple servers or even across multiple service providers to improve security and privacy. Further, revoking a client's access to our system at any time requires minimal effort and cost. The performance and formal security analyses of the proposed scheme show that it provides better functionality and security and is more efficient in terms of computation and storage than closely related works.
Online computation is a concept to model uncertainty where not all information on a problem instance is known in advance. An online algorithm receives requests which reveal the instance piecewise and has to respond with irrevocable decisions. Often, an adversary is assumed that constructs the instance knowing the deterministic behavior of the algorithm. Thus, the adversary is able to tailor the input to any online algorithm. From a game theoretical point of view, the adversary and the online algorithm are players in an asymmetric two-player game. To overcome this asymmetry, the online algorithm is equipped with an isomorphic copy of the graph, which is referred to as an unlabeled map. By applying the game theoretical perspective to online graph problems, where the solution is a subset of the vertices, we analyze the complexity of these online vertex subset games. For this, we introduce a framework for reducing online vertex subset games from TQBF. This framework is based on gadget reductions from 3-SATISFIABILITY to the corresponding offline problem. We further identify a set of rules for extending the 3-SATISFIABILITY-reduction and provide schemes for additional gadgets which assure that these rules are fulfilled. By extending the gadget reduction of the vertex subset problem with these additional gadgets, we obtain a reduction for the corresponding online vertex subset game. Finally, we provide example reductions for online vertex subset games based on VERTEX COVER, INDEPENDENT SET, and DOMINATING SET, proving that they are PSPACE-complete. Thus, this paper establishes that the online versions with a map of NP-complete vertex subset problems form a large class of PSPACE-complete problems.
Dictionary learning is an effective tool for pattern recognition and classification of time series data. Among various dictionary learning techniques, dynamic time warping (DTW) is commonly used for dealing with temporal delays, scaling, transformation, and many other kinds of temporal misalignment issues. However, DTW suffers from overfitting or information loss due to its discrete nature in aligning time series data. To address this issue, we propose a generalized time warping invariant dictionary learning algorithm in this paper. Our approach features a generalized time warping operator, which consists of linear combinations of continuous basis functions for facilitating continuous temporal warping. The integration of the proposed operator and dictionary learning is formulated as an optimization problem, where the block coordinate descent method is employed to jointly optimize warping paths, dictionaries, and sparseness coefficients. The optimized results are then used as hyperspace distance measures to feed classification and clustering algorithms. The superiority of the proposed method in terms of dictionary learning, classification, and clustering is validated on ten public datasets in comparison with various benchmark methods.
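For context, the discrete alignment that the abstract above contrasts against can be illustrated with the textbook dynamic-programming form of DTW. The sketch below is not the generalized warping operator proposed in the paper; the function name `dtw_distance` and the choice of absolute difference as the local cost are assumptions made for illustration only.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences.

    Illustrative only: uses absolute difference as the local cost and
    allows the three standard moves (match, insertion, deletion).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned to an already-used b[j-1]
                cost[i][j - 1],      # b[j-1] aligned to an already-used a[i-1]
                cost[i - 1][j - 1],  # both indices advance together
            )
    return cost[n][m]


if __name__ == "__main__":
    # A repeated sample in the second series is absorbed by the warping path.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because every sample must be matched to a whole sample of the other series, the alignment can only move in integer steps, which is the discreteness the abstract refers to.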
Numerous studies have underscored the significant privacy risks associated with various leakage patterns in encrypted data stores. Most existing systems that conceal leakage either (1) incur substantial overheads, (2) focus on specific subsets of leakage patterns, or (3) apply the same security notion across various workloads, thereby impeding the attainment of fine-tuned privacy-efficiency trade-offs. In light of various detrimental leakage patterns, this paper starts with an investigation into which specific leakage patterns require attention in the contexts of key-value, range-query, and dynamic workloads, respectively. Subsequently, we introduce new security notions tailored to the specific privacy requirements of these workloads. Accordingly, we present SWAT, an efficient construction that progressively enables these workloads while provably mitigating system-wide leakage via a suite of algorithms with tunable privacy-efficiency trade-offs. We conducted extensive experiments and compiled a detailed result analysis, showing the efficiency of our solution. SWAT is about $10.6\times$ slower than an encryption-only data store that reveals various leakage patterns and is $31.6\times$ faster than a trivially zero-leakage solution. Meanwhile, the performance of SWAT remains highly competitive compared to other designs that mitigate specific types of leakage.
The world has been experiencing rapid urbanization over the last few decades, putting a strain on existing city infrastructure such as waste management, water supply management, public transport and electricity consumption. We are also seeing increasing pollution levels in cities, threatening the environment, natural resources and health conditions. At the same time, we must realize that real growth lies in urbanization, as it provides individuals with many opportunities for better employment, healthcare and education. It is therefore imperative to limit the ill effects of rapid urbanization through integrated action plans that enable the development of growing cities. This gave rise to the concept of a smart city, in which all available information associated with a city is utilized systematically for better city management. The proposed system architecture is divided into subsystems, each discussed in an individual chapter. The first chapter introduces the complete system architecture and gives the reader an overview of it. The second chapter discusses the data monitoring system (DMS) and the data lake system (DLS) based on the oneM2M standards. DMS employs oneM2M as a middleware layer to achieve interoperability, and DLS uses a multi-tenant architecture with multiple logical databases, enabling efficient and reliable data management. The third chapter discusses energy monitoring and electric vehicle charging systems developed to illustrate the applicability of the oneM2M standards. The fourth chapter discusses the Data Exchange System (DES) based on the Indian Urban Data Exchange (IUDX) framework. DES uses the IUDX standard data schema and open APIs to avoid data silos and enable secure data sharing. The fifth chapter discusses the 5D-IoT framework, which provides uniform data quality assessment of sensor data with meaningful data descriptions.
The current number of IoT devices and their limitations have come to serve as a motivation for malicious entities to take advantage of such devices and use them for their own gain. To protect against cyberattacks on IoT devices, Machine Learning techniques can be applied to Intrusion Detection Systems. Moreover, privacy-related issues associated with centralized approaches can be mitigated through Federated Learning. This work proposes a Host-based Intrusion Detection System that leverages Federated Learning and Multi-Layer Perceptron neural networks to detect cyberattacks on IoT devices with high accuracy while enhancing data privacy protection.
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper is designed to provide an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining two key principles of modality heterogeneity and interconnections that have driven subsequent innovations, and propose a taxonomy of 6 core technical challenges: representation, alignment, reasoning, generation, transference, and quantification covering historical and recent trends. Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy.
Deployment of Internet of Things (IoT) devices and Data Fusion techniques has gained popularity in public and government domains. This usually requires capturing and consolidating data from multiple sources. As datasets do not necessarily originate from identical sensors, fused data typically results in a complex data problem. Because the military is investigating how heterogeneous IoT devices can aid processes and tasks, we investigate a multi-sensor approach. Moreover, we propose a signal-to-image encoding approach that fuses data from IoT wearable devices into an image which is invertible and easier to visualize, supporting decision making. Furthermore, we investigate the challenge of enabling an intelligent identification and detection operation and demonstrate the feasibility of the proposed Deep Learning and Anomaly Detection models, which can support future applications that utilize hand gesture data from wearable devices.
Behaviors of the synthetic characters in current military simulations are limited since they are generally generated by rule-based and reactive computational models with minimal intelligence. Such computational models cannot adapt to reflect the experience of the characters, resulting in brittle intelligence for even the most effective behavior models devised via costly and labor-intensive processes. Observation-based behavior model adaptation that leverages machine learning and the experience of synthetic entities in combination with appropriate prior knowledge can address the issues in the existing computational behavior models to create a better training experience in military training simulations. In this paper, we introduce a framework that aims to create autonomous synthetic characters that can perform coherent sequences of believable behavior while being aware of human trainees and their needs within a training simulation. This framework brings together three mutually complementary components. The first component is a Unity-based simulation environment - Rapid Integration and Development Environment (RIDE) - supporting One World Terrain (OWT) models and capable of running and supporting machine learning experiments. The second is Shiva, a novel multi-agent reinforcement and imitation learning framework that can interface with a variety of simulation environments, and that can additionally utilize a variety of learning algorithms. The final component is the Sigma Cognitive Architecture that will augment the behavior models with symbolic and probabilistic reasoning capabilities. We have successfully created proof-of-concept behavior models leveraging this framework on realistic terrain as an essential step towards bringing machine learning into military simulations.
As data are increasingly being stored in different silos and societies become more aware of data privacy issues, the traditional centralized training of artificial intelligence (AI) models is facing efficiency and privacy challenges. Recently, federated learning (FL) has emerged as an alternative solution and continues to thrive in this new reality. Existing FL protocol designs have been shown to be vulnerable to adversaries within or outside of the system, compromising data privacy and system robustness. Besides training powerful global models, it is of paramount importance to design FL systems that have privacy guarantees and are resistant to different types of adversaries. In this paper, we conduct the first comprehensive survey on this topic. Through a concise introduction to the concept of FL and a unique taxonomy covering 1) threat models; 2) poisoning attacks on robustness and the corresponding defenses; and 3) inference attacks on privacy and the corresponding defenses, we provide an accessible review of this important topic. We highlight the intuitions, key techniques, and fundamental assumptions adopted by various attacks and defenses. Finally, we discuss promising future research directions towards robust and privacy-preserving federated learning.
Federated learning is a new distributed machine learning framework in which a set of heterogeneous clients collaboratively train a model without sharing training data. In this work, we consider a practical and ubiquitous issue in federated learning: intermittent client availability, where the set of eligible clients may change during the training process. Such intermittent client availability can significantly deteriorate the performance of the classical Federated Averaging algorithm (FedAvg for short). We propose a simple distributed non-convex optimization algorithm, called Federated Latest Averaging (FedLaAvg for short), which leverages the latest gradients of all clients, even when the clients are not available, to jointly update the global model in each iteration. Our theoretical analysis shows that FedLaAvg attains a convergence rate of $O(1/(N^{1/4} T^{1/2}))$, achieving a sublinear speedup with respect to the total number of clients. We implement and evaluate FedLaAvg with the CIFAR-10 dataset. The evaluation results demonstrate that FedLaAvg indeed reaches a sublinear speedup and achieves 4.23% higher test accuracy than FedAvg.
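The core idea stated above, averaging the latest known gradient of every client whether or not it is currently available, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `fedlaavg_round`, the callback `grad_fn`, the plain-list model representation, the fixed learning rate, and the assumption that every client already has a stored gradient (here initialized to zeros) are all hypothetical choices made for the example.

```python
def fedlaavg_round(weights, latest_grads, available, grad_fn, lr):
    """One server round in the spirit of Federated Latest Averaging.

    weights      : list of floats, the current global model
    latest_grads : dict client_id -> list of floats, the most recent gradient
                   known for each client (assumed to be pre-initialized)
    available    : iterable of client ids reachable in this round
    grad_fn      : hypothetical callback (client_id, weights) -> gradient list
    lr           : learning rate (assumed constant here)
    """
    # Only available clients can refresh their stored gradient.
    for cid in available:
        latest_grads[cid] = grad_fn(cid, weights)

    # Average over ALL clients, reusing stale gradients for absent ones.
    n = len(latest_grads)
    avg = [sum(g[k] for g in latest_grads.values()) / n
           for k in range(len(weights))]

    return [w - lr * a for w, a in zip(weights, avg)]


if __name__ == "__main__":
    # Toy problem: client i minimizes 0.5 * (w - target_i)^2.
    targets = {0: 1.0, 1: 3.0}
    grad = lambda cid, w: [w[0] - targets[cid]]
    stored = {0: [0.0], 1: [0.0]}
    w = [0.0]
    w = fedlaavg_round(w, stored, available=[0], grad_fn=grad, lr=1.0)
    w = fedlaavg_round(w, stored, available=[1], grad_fn=grad, lr=1.0)
    print(w)
```

In contrast, a FedAvg-style round would average only over the clients in `available`, so clients that are rarely reachable would contribute little to the global model.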