Dynamic replication is a widespread multi-copy routing approach for efficiently coping with the intermittent connectivity of mobile opportunistic networks. Under this approach, a node forwards a message replica to an encountered node based on a utility value that captures the latter's fitness for delivering the message to the destination. The popularity of the approach stems from its flexibility to operate effectively in networks with diverse characteristics without requiring special customization. Nonetheless, its drawback is the tendency to produce a high number of replicas that consume limited resources such as energy and storage. To tackle this problem, we make the observation that network nodes can be grouped, based on their utility values, into clusters that portray different delivery capabilities. We exploit this finding to transform the basic forwarding strategy, which moves a packet through nodes of increasing utility, so that it instead forwards the packet through clusters of increasing delivery capability. The new strategy works in synergy with the basic dynamic replication algorithms and is fully configurable, in the sense that it can be used with virtually any utility function. We also extend our approach to work with two utility functions at the same time, a feature that is especially efficient in mobile networks that exhibit social characteristics. Through experiments on a wide set of real-life networks, we empirically show that our method robustly reduces the overall number of replicas in networks with diverse connectivity characteristics without hindering delivery efficiency.
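A minimal sketch of the cluster-based forwarding rule described above, under assumed details: utilities are grouped into capability clusters by simple quantile thresholds (a stand-in for whatever clustering the method actually uses), and a carrier hands over a replica only when the encountered node sits in a strictly higher cluster. This is an illustration, not the paper's exact algorithm.

import numpy as np

def utility_clusters(utilities, n_clusters=3):
    """Group utility values into clusters of increasing delivery capability
    using quantile thresholds (a stand-in for any clustering method)."""
    thresholds = np.quantile(utilities, np.linspace(0, 1, n_clusters + 1)[1:-1])
    return np.searchsorted(thresholds, utilities, side="right")  # cluster id per node

def should_forward(carrier_id, encountered_id, cluster_of):
    """Forward a replica only when the encountered node belongs to a strictly
    higher cluster than the current carrier (instead of merely a higher utility)."""
    return cluster_of[encountered_id] > cluster_of[carrier_id]

# toy example: 10 nodes with hypothetical utility values
utilities = np.array([0.1, 0.4, 0.9, 0.2, 0.7, 0.3, 0.8, 0.05, 0.6, 0.5])
clusters = utility_clusters(utilities)
print(should_forward(carrier_id=0, encountered_id=2, cluster_of=clusters))  # True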
Deep Reinforcement Learning (DRL) solutions are becoming pervasive at the edge of the network as they enable autonomous decision-making in dynamic environments. However, to adapt to an ever-changing environment, a DRL solution implemented on an embedded device has to continue taking occasional exploratory actions even after initial convergence. In other words, the device has to occasionally take random actions and update the value function, i.e., re-train the Artificial Neural Network (ANN), to ensure its performance remains optimal. Unfortunately, embedded devices often lack the processing power and energy required to train the ANN. The energy aspect is particularly challenging when the edge device is powered solely by Energy Harvesting (EH). To overcome this problem, we propose a two-part algorithm in which the DRL process is trained at the sink, and the weights of the fully trained underlying ANN are periodically transferred to the EH-powered embedded device that takes the actions. Using an EH-powered sensor, a real-world measurement dataset, and the Age of Information (AoI) metric as the optimization objective, we demonstrate that such a DRL solution can operate without any performance degradation, with only a few ANN updates per day.
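A toy sketch of the split described above, with assumed details: tabular Q-learning and a made-up AoI-like environment stand in for the ANN-based DRL agent, the sink keeps exploring and learning, and the energy-constrained device only receives a stale copy of the learned values at a fixed sync interval and acts greedily on it.

import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 10, 2          # toy AoI levels and {idle, transmit} actions
ALPHA, GAMMA, EPS, SYNC_EVERY = 0.1, 0.9, 0.1, 500

def step(state, action):
    """Toy AoI dynamics: transmitting (action 1) resets the age, idling increases it.
    Reward is the negative age."""
    next_state = 0 if action == 1 else min(state + 1, N_STATES - 1)
    return next_state, -next_state

q_sink = np.zeros((N_STATES, N_ACTIONS))   # trained at the sink (exploration happens here)
q_device = q_sink.copy()                   # stale copy used on the EH-powered device

state = 0
for t in range(20_000):
    # sink keeps exploring and learning
    action = rng.integers(N_ACTIONS) if rng.random() < EPS else int(np.argmax(q_sink[state]))
    next_state, reward = step(state, action)
    q_sink[state, action] += ALPHA * (reward + GAMMA * q_sink[next_state].max()
                                      - q_sink[state, action])
    state = next_state
    # values are shipped to the device only occasionally, sparing its energy budget
    if t % SYNC_EVERY == 0:
        q_device = q_sink.copy()

device_action = int(np.argmax(q_device[3]))  # device acts greedily, no on-board training
print(device_action)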
The availability of genomic data is essential to progress in biomedical research, personalized medicine, etc. However, its extreme sensitivity makes it problematic, if not outright impossible, to publish or share. As a result, several initiatives have been launched to experiment with synthetic genomic data, e.g., using generative models to learn the underlying distribution of the real data and generate artificial datasets that preserve its salient characteristics without exposing it. This paper provides the first evaluation of both the utility and the privacy protection of six state-of-the-art models for generating synthetic genomic data. We assess the performance of the synthetic data on several common tasks, such as allele population statistics and linkage disequilibrium. We then measure privacy through the lens of membership inference attacks, i.e., inferring whether a record was part of the training data. Our experiments show that no single approach for generating synthetic genomic data yields both high utility and strong privacy across the board. Also, the size and nature of the training dataset matter. Moreover, while some combinations of datasets and models produce synthetic data with distributions close to the real data, there are often target data points that remain vulnerable to membership inference. Looking forward, our techniques can be used by practitioners to assess the risks of deploying synthetic genomic data in the wild and can serve as a benchmark for future work.
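To make the privacy side concrete, here is a minimal sketch of one common distance-based membership inference heuristic: records that lie unusually close to the released synthetic data are flagged as likely training members. The data, the distance, and the thresholding are all assumptions for illustration, not the paper's attack.

import numpy as np

def mia_scores(candidates, synthetic):
    """Distance-based membership signal: smaller distance to the nearest
    synthetic record means a higher membership score."""
    # Hamming-like distance for 0/1/2 genotype matrices (rows = records, cols = SNPs)
    dists = np.array([np.abs(synthetic - c).sum(axis=1).min() for c in candidates])
    return -dists  # higher score = more likely a training member

rng = np.random.default_rng(1)
train = rng.integers(0, 3, size=(50, 100))                          # hypothetical training genotypes
synthetic = np.clip(train + rng.integers(0, 2, size=train.shape), 0, 2)  # stand-in "generated" data
non_members = rng.integers(0, 3, size=(50, 100))

scores_in = mia_scores(train, synthetic)
scores_out = mia_scores(non_members, synthetic)
# a simple threshold attack: members should score higher on average
print(scores_in.mean(), scores_out.mean())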
Digital signatures are widely used to secure communications. At the same time, the security of currently deployed digital signature protocols rests on unproven computational assumptions. An efficient way to ensure unconditional (information-theoretic) security of communication is to use quantum key distribution (QKD), whose security is based on the laws of quantum mechanics. In this work, we develop an unconditionally secure signature scheme that guarantees the authenticity and transferability of arbitrary-length messages in a QKD network. In the proposed setup, the QKD network consists of two subnetworks: (i) an internal network that includes the signer and has a bound on the number of malicious nodes, and (ii) an external network with no assumptions on the number of malicious nodes. A consequence of the absence of a trust assumption in the external subnetwork is that recipients in the external subnetwork need assistance from recipients in the internal subnetwork to verify message-signature pairs. We provide a comprehensive security analysis of the developed scheme, optimize the scheme parameters with respect to secret key consumption, and demonstrate that the scheme is compatible with the capabilities of currently available QKD devices.
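As background for the kind of primitive such QKD-based schemes rely on, the following is a sketch of a one-time, information-theoretically secure tag based on polynomial-evaluation universal hashing (Wegman-Carter style), where the key would be fresh QKD-derived secret material. This is a generic building block shown under stated assumptions, not the construction of the paper.

import secrets

# A one-time, information-theoretically secure tag via polynomial evaluation
# over a prime field. The key (r, s) must be used for a single message only.
P = (1 << 127) - 1  # a Mersenne prime

def tag(message: bytes, key: tuple) -> int:
    r, s = key
    acc = 0
    for i in range(0, len(message), 8):          # split the message into 64-bit blocks
        block = int.from_bytes(message[i:i + 8], "big")
        acc = (acc * r + block) % P              # Horner evaluation of the polynomial
    return (acc + s) % P

key = (secrets.randbelow(P), secrets.randbelow(P))   # would come from QKD key material
t = tag(b"arbitrary length message", key)
print(t == tag(b"arbitrary length message", key))    # verification with the same one-time key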
In this paper, we propose a novel class of symmetric key distribution protocols that leverages basic security primitives offered by low-cost hardware chipsets containing millions of synchronized self-powered timers. The keys are derived from the temporal dynamics of a physical, micro-scale time-keeping device, which makes them immune to potential side-channel attacks, malicious tampering, or snooping. Using the behavioral model of the self-powered timers, we first show that the derived key-strings pass the randomness tests defined by the National Institute of Standards and Technology (NIST) suite. The key-strings are then used in two SPoTKD (Self-Powered Timer Key Distribution) protocols that exploit the timers' dynamics as one-way functions: (a) protocol 1 facilitates secure communication between a user and a remote server, and (b) protocol 2 facilitates secure communication between two users. We investigate the security of these protocols in the standard model and against different adversarial attacks. Using Monte Carlo simulations, we also investigate the robustness of these protocols under real-world operating conditions and propose error-correcting SPoTKD protocols to mitigate the resulting noise-related artifacts.
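For illustration of the randomness check mentioned above, here is the frequency (monobit) test from the NIST SP 800-22 suite applied to a key-string; the bit source below is a placeholder, not the timer-derived key-strings of the paper, and the full NIST suite contains many more tests.

import math
import random

def monobit_test(bits: str) -> float:
    """NIST SP 800-22 frequency (monobit) test: p-value for the hypothesis
    that the key-string has a balanced number of 0s and 1s."""
    n = len(bits)
    s = sum(1 if b == "1" else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))  # the test passes if the p-value >= 0.01

random.seed(0)
key_string = "".join(random.choice("01") for _ in range(1000))  # stand-in key-string
print(monobit_test(key_string))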
With the advent of large LEO satellite communication networks that provide global broadband Internet access, interest in providing edge computing resources within LEO networks has emerged. The LEO Edge promises low-latency, high-bandwidth access to compute and storage resources for a global base of clients and IoT devices regardless of their geographical location. Current proposals assume compute resources or service replicas at every LEO satellite, which requires high upfront investments and can lead to over-provisioning. To implement and use the LEO Edge efficiently, methods for server and service placement are required that help select an optimal subset of satellites as server or service-replica locations. In this paper, we show how existing research on resource placement on a 2D torus can be applied to this problem by leveraging the unique topology of LEO satellite networks. Further, we extend the existing discrete resource placement methods to allow placement under QoS constraints. In simulations of proposed LEO satellite communication networks, we show how QoS depends on orbital parameters and that our proposed method can take these effects into account where the existing approach cannot.
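The following is a small sketch of the placement problem on the torus-shaped inter-satellite topology, under assumed parameters: a greedy covering heuristic picks server satellites so that every satellite is within a hop-count QoS bound of some server. It illustrates the problem setting rather than the paper's discrete placement method.

import itertools

ORBITS, SATS_PER_ORBIT, MAX_HOPS = 6, 8, 2   # hypothetical constellation and QoS bound

def torus_hops(a, b):
    """Hop distance on a +Grid-style ISL topology, i.e., a 2D torus of
    (orbit index, in-orbit slot)."""
    do = min(abs(a[0] - b[0]), ORBITS - abs(a[0] - b[0]))
    ds = min(abs(a[1] - b[1]), SATS_PER_ORBIT - abs(a[1] - b[1]))
    return do + ds

nodes = list(itertools.product(range(ORBITS), range(SATS_PER_ORBIT)))
uncovered, servers = set(nodes), []
while uncovered:
    # greedy: pick the satellite that covers the most still-uncovered satellites
    best = max(nodes, key=lambda s: sum(torus_hops(s, u) <= MAX_HOPS for u in uncovered))
    servers.append(best)
    uncovered -= {u for u in uncovered if torus_hops(best, u) <= MAX_HOPS}

print(len(servers), "server locations selected:", servers)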
Probabilistic counters are well-known tools often used for space-efficient set cardinality estimation. In this paper, we investigate probabilistic counters from the perspective of preserving privacy, using the standard, rigorous notion of differential privacy. The intuition is that probabilistic counters do not reveal much information about individuals, but provide only general information about the population, and can therefore be used safely without violating the privacy of individuals. It turns out, however, that providing a precise, formal analysis of the privacy parameters of probabilistic counters is surprisingly difficult and requires advanced techniques and a very careful approach. We also demonstrate that probabilistic counters can be used as a privacy protection mechanism without any extra randomization; that is, the randomization inherent in the protocol is sufficient for protecting privacy, even if the probabilistic counter is used many times. In particular, we present a specific privacy-preserving data aggregation protocol based on a probabilistic counter. Our results can be used, for example, for performing distributed surveys.
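As a concrete example of the kind of counter in question, below is a Morris-style approximate counter: it keeps only a few bits of state and increments them probabilistically, and it is exactly this inherent randomization that the privacy analysis builds on. The sketch shows the counter itself, not the paper's aggregation protocol or its differential privacy bounds.

import random

class MorrisCounter:
    """Approximate counter: stores only c, increments it with probability 2**-c,
    and estimates the true count as 2**c - 1."""
    def __init__(self):
        self.c = 0

    def increment(self):
        if random.random() < 2.0 ** (-self.c):
            self.c += 1

    def estimate(self):
        return 2 ** self.c - 1

counter = MorrisCounter()
for _ in range(10_000):   # e.g., each event is one survey participant's contribution
    counter.increment()
print(counter.c, counter.estimate())   # a few bits of state, noisy estimate of 10,000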
The adoption of machine learning techniques in next-generation networks has increasingly attracted the attention of the research community, as they provide adaptive learning and decision-making approaches that meet the requirements of different verticals and guarantee the appropriate performance in complex mobility scenarios. In this perspective, the characterization of mobile service usage represents a fundamental step. In this vein, this paper highlights the new features and capabilities offered by the second version of the "Network Slice Planner" (NSP) [12]. It also proposes a method combining supervised and unsupervised learning techniques to analyze the behavior of a large population of mobile users in terms of service consumption. We exploit the data provided by NSP v2 to conduct our analysis. Furthermore, we evaluate both the accuracy of the predictor and the performance of the underlying MEC infrastructure.
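A minimal sketch of such a combined supervised/unsupervised pipeline, with entirely hypothetical features (per-user traffic volumes per service class and generic context descriptors, not the NSP v2 data schema): users are first grouped by consumption profile with k-means, then a classifier is trained to predict the group from context features.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# hypothetical per-user features: traffic volume per service class (video, web, IoT, ...)
usage = rng.gamma(shape=2.0, scale=1.0, size=(1000, 4))
context = rng.normal(size=(1000, 6))          # e.g., time-of-day / location descriptors

# unsupervised step: group users by service-consumption profile
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(usage)

# supervised step: predict a user's consumption group from context features
X_tr, X_te, y_tr, y_te = train_test_split(context, labels, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
# on this random toy data the accuracy is near chance; real context features carry signal
print("held-out accuracy:", clf.score(X_te, y_te))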
Graph convolutional networks (GCNs) have been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that grows exponentially with the number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as follows: at each step, it samples a block of nodes associated with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search within this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while achieving test accuracy comparable to previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M dataset with 2 million nodes and 61 million edges, which is more than 5 times larger than the previously largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs. 1961 seconds) while using much less memory (2.2GB vs. 11.2GB). Furthermore, for training a 4-layer GCN on this data, our algorithm finishes in around 36 minutes, while all existing GCN training algorithms fail due to out-of-memory issues. Cluster-GCN also allows us to train much deeper GCNs without much time and memory overhead, which leads to improved prediction accuracy: using a 5-layer Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI dataset, while the previous best result was 98.71 by [16].
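A numpy sketch of the per-step idea: pick one cluster, restrict the adjacency and features to that block, and take a gradient step on just that subgraph. A random node partition and a toy objective stand in for the METIS-style clustering and the real task loss used by Cluster-GCN; the point is only that each update touches block-local quantities.

import numpy as np

rng = np.random.default_rng(0)
N, F, H, K = 1000, 16, 8, 10                     # nodes, features, hidden dim, clusters
A = (rng.random((N, N)) < 0.01).astype(float)    # toy random graph
A = np.maximum(A, A.T)                           # make it undirected
X = rng.normal(size=(N, F))
clusters = rng.integers(K, size=N)               # stand-in for a METIS-style partition
W = rng.normal(scale=0.1, size=(F, H))           # weights of a single GCN layer

def row_normalize(A_blk):
    A_hat = A_blk + np.eye(len(A_blk))           # add self-loops
    return A_hat / A_hat.sum(1, keepdims=True)

for step in range(100):
    idx = np.where(clusters == step % K)[0]      # one cluster = one mini-batch block
    if len(idx) == 0:
        continue
    A_blk = row_normalize(A[np.ix_(idx, idx)])   # neighborhood search stays inside the block
    H_blk = np.maximum(A_blk @ X[idx] @ W, 0.0)  # one GCN layer (ReLU) on the subgraph only
    # toy objective (sum of activations), just to show an SGD step that uses only
    # block-local quantities; a real model would backpropagate a task loss
    grad_W = (A_blk @ X[idx]).T @ (H_blk > 0).astype(float)
    W -= 1e-3 * grad_W / len(idx)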
Representation learning for networks provides a new way to mine graphs. Although current research in this area can generate reliable node embeddings, it is still limited to homogeneous networks in which all nodes and edges are of the same type. Real-world graphs, however, are increasingly heterogeneous, with multiple node and edge types. Existing heterogeneous embedding methods are mostly task-specific or only able to handle a limited number of node and edge types. To tackle this challenge, we propose an edge2vec model that represents nodes in a way that incorporates edge semantics, captured as different edge types in heterogeneous networks. An edge-type transition matrix is optimized within an Expectation-Maximization (EM) framework and used as an extra criterion in a biased node random walk over the network; a biased skip-gram model then learns node embeddings from the resulting walks. edge2vec is validated and evaluated on three medical domain problems over an ensemble of complex medical networks (with more than 10 node and edge types): medical entity classification, compound-gene binding prediction, and medical information searching cost. The results show that, by considering edge semantics, edge2vec significantly outperforms other state-of-the-art models on all three tasks.
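To illustrate the biased walk step, here is a sketch in which a hand-set edge-type transition matrix biases the choice of the next edge given the type of the previous one; in edge2vec this matrix is learned via EM, and the generated walks would then be fed to a skip-gram model (e.g., gensim's Word2Vec). The toy graph and matrix values are assumptions.

import random

# toy heterogeneous graph: adjacency as {node: [(neighbor, edge_type), ...]}
graph = {
    "geneA": [("compoundX", 0), ("diseaseY", 1)],
    "compoundX": [("geneA", 0), ("diseaseY", 2)],
    "diseaseY": [("geneA", 1), ("compoundX", 2)],
}
# edge-type transition matrix M[prev_type][next_type]; edge2vec learns this via EM,
# here it is fixed by hand just to show how it biases the walk
M = [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.8],
     [0.2, 0.8, 1.0]]

def biased_walk(start, length=10):
    walk, prev_type, node = [start], None, start
    for _ in range(length):
        nbrs = graph[node]
        weights = [1.0 if prev_type is None else M[prev_type][t] for _, t in nbrs]
        node, prev_type = random.choices(nbrs, weights=weights, k=1)[0]
        walk.append(node)
    return walk

walks = [biased_walk(n) for n in graph for _ in range(5)]
# these walks would then be passed to a skip-gram model to learn node embeddings
print(walks[0])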
Existing Deep Learning frameworks exclusively use either the Parameter Server (PS) approach or MPI parallelism. In this paper, we discuss the drawbacks of such approaches and propose a generic framework that supports both the PS and MPI programming paradigms, co-existing at the same time. The key advantage of the new model is that it embeds the scaling benefits of MPI parallelism into the loosely coupled PS task model. Apart from providing a practical usage model of MPI in the cloud, such a framework allows for novel communication-avoiding algorithms that perform parameter averaging in Stochastic Gradient Descent (SGD) approaches. We show how the MPI and PS models can synergistically apply algorithms such as Elastic SGD to improve the rate of convergence compared to existing approaches. These new algorithms directly help scale SGD cluster-wide. Further, we optimize the critical component of the framework, namely global aggregation or allreduce, using a novel concept of tensor collectives, which treat a group of vectors on a node as a single object, allowing existing single-vector algorithms to be directly applicable. We back our claims with extensive empirical evidence using the large-scale ImageNet 1K data. Our framework is built on MXNet, but the design is generic and can be adapted to other popular DL infrastructures.
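A numpy sketch of the elastic-averaging update behind Elastic SGD, under simplifying assumptions: workers are simulated in one process, each minimizes a simple quadratic loss in place of a real model, and the PS/allreduce exchange with the center variable is reduced to plain array arithmetic. It shows the update rule, not the framework itself.

import numpy as np

rng = np.random.default_rng(0)
N_WORKERS, DIM, LR, RHO, STEPS = 4, 5, 0.05, 0.1, 500
targets = rng.normal(size=(N_WORKERS, DIM))        # each worker sees a different quadratic loss

workers = [np.zeros(DIM) for _ in range(N_WORKERS)]
center = np.zeros(DIM)                              # the "parameter server" variable

for _ in range(STEPS):
    for i in range(N_WORKERS):
        grad = workers[i] - targets[i]              # gradient of 0.5 * ||x - target_i||^2
        # the elastic term pulls each worker toward the center instead of hard synchronization
        workers[i] -= LR * (grad + RHO * (workers[i] - center))
    # center update: in the framework this exchange would be an allreduce / PS push-pull
    center += LR * RHO * sum(w - center for w in workers)

print(np.allclose(center, targets.mean(axis=0), atol=0.1))  # the center approaches the consensus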