In this work, we analyze the privacy guarantees of Zigbee protocol, an energy-efficient wireless IoT protocol that is increasingly being deployed in smart home settings. Specifically, we devise two passive inference techniques to demonstrate how a passive eavesdropper, located outside the smart home, can reliably identify in-home devices or events from the encrypted wireless Zigbee traffic by 1) inferring a single application layer (APL) command in the event's traffic burst, and 2) exploiting the device's periodic reporting pattern and interval. This enables an attacker to infer user's habits or determine if the smart home is vulnerable to unauthorized entry. We evaluated our techniques on 19 unique Zigbee devices across several categories and 5 popular smart hubs in three different scenarios: i) controlled shield, ii) living smart-home IoT lab, and iii) third-party Zigbee captures. Our results indicate over 85% accuracy in determining events and devices using the command inference approach, without the need of a-priori device signatures, and 99.8% accuracy in determining known devices using the periodic reporting approach. In addition, we identified APL commands in a third party capture file with 90.6% accuracy. Through this work, we highlight the trade-off between designing a low-power, low-cost wireless network and achieving privacy guarantees.
We propose a new approach to temporal inference, inspired by the Pearlian causal inference paradigm - though quite different from Pearl's approach formally. Rather than using directed acyclic graphs, we make use of factored sets, which are sets expressed as Cartesian products. We show that finite factored sets are powerful tools for inferring temporal relations. We introduce an analog of d-separation for factored sets, conditional orthogonality, and we demonstrate that this notion is equivalent to conditional independence in all probability distributions on a finite factored set.
Generative models trained using Differential Privacy (DP) are increasingly used to produce and share synthetic data in a privacy-friendly manner. In this paper, we set out to analyze the impact of DP on these models vis-a-vis underrepresented classes and subgroups of data. We do so from two angles: 1) the size of classes and subgroups in the synthetic data, and 2) classification accuracy on them. We also evaluate the effect of various levels of imbalance and privacy budgets. Our experiments, conducted using three state-of-the-art DP models (PrivBayes, DP-WGAN, and PATE-GAN), show that DP results in opposite size distributions in the generated synthetic data. More precisely, it affects the gap between the majority and minority classes and subgroups, either reducing it (a "Robin Hood" effect) or increasing it ("Matthew" effect). However, both of these size shifts lead to similar disparate impacts on a classifier's accuracy, affecting disproportionately more the underrepresented subparts of the data. As a result, we call for caution when analyzing or training a model on synthetic data, or risk treating different subpopulations unevenly, which might also lead to unreliable conclusions.
Due to the pervasive diffusion of personal mobile and IoT devices, many "smart environments" (e.g., smart cities and smart factories) will be, generators of huge amounts of data. Currently, analysis of this data is typically achieved through centralised cloud-based services. However, according to many studies, this approach may present significant issues from the standpoint of data ownership, as well as wireless network capacity. In this paper, we exploit the fog computing paradigm to move computation close to where data is produced. We exploit a well-known distributed machine learning framework (Hypothesis Transfer Learning), and perform data analytics on mobile nodes passing by IoT devices, in addition to fog gateways at the edge of the network infrastructure. We analyse the performance of different configurations of the distributed learning framework, in terms of (i) accuracy obtained in the learning task and (ii) energy spent to send data between the involved nodes. Specifically, we consider reference wireless technologies for communication between the different types of nodes we consider, e.g. LTE, Nb-IoT, 802.15.4, 802.11, etc. Our results show that collecting data through the mobile nodes and executing the distributed analytics using short-range communication technologies, such as 802.15.4 and 802.11, allows to strongly reduce the energy consumption of the system up to $94\%$ with a loss in accuracy w.r.t. a centralised cloud solution up to $2\%$.
Anti-malware agents typically communicate with their remote services to share information about suspicious files. These remote services use their up-to-date information and global context (view) to help classify the files and instruct their agents to take a predetermined action (e.g., delete or quarantine). In this study, we provide a security analysis of a specific form of communication between anti-malware agents and their services, which takes place entirely over the insecure DNS protocol. These services, which we denote DNS anti-malware list (DNSAML) services, affect the classification of files scanned by anti-malware agents, therefore potentially putting their consumers at risk due to known integrity and confidentiality flaws of the DNS protocol. By analyzing a large-scale DNS traffic dataset made available to the authors by a well-known CDN provider, we identify anti-malware solutions that seem to make use of DNSAML services. We found that these solutions, deployed on almost three million machines worldwide, exchange hundreds of millions of DNS requests daily. These requests are carrying sensitive file scan information, oftentimes - as we demonstrate - without any additional safeguards to compensate for the insecurities of the DNS protocol. As a result, these anti-malware solutions that use DNSAML are made vulnerable to DNS attacks. For instance, an attacker capable of tampering with DNS queries, gains the ability to alter the classification of scanned files, without presence on the scanning machine. We showcase three attacks applicable to at least three anti-malware solutions that could result in the disclosure of sensitive information and improper behavior of the anti-malware agent, such as ignoring detected threats. Finally, we propose and review a set of countermeasures for anti-malware solution providers to prevent the attacks stemming from the use of DNSAML services.
Strawberries are widely appreciated for their aroma, bright red color, juicy texture, and sweetness. They are, however, among the most sensitive fruits when it comes to the quality of the end product. The recent commercial trends show a rising number of farmers who directly sell their products and are interested in using smart solutions for a continuous quality control of the product. Cloud-based approaches for smart irrigation have been widely used in the recent years. However, the network traffic, security and regulatory challenges, which come hand in hand with sharing the crop data with third parties outside the edge of the network, lead strawberry farmers and data owners to rely on global clouds and potentially lose control over their data, which are usually transferred to third party data centers. In this paper, we follow a three-step methodological approach in order to design, implement and validate a solution for smart strawberry irrigation in greenhouses, while keeping the corresponding data at the edge of the network: (i) We develop a small-scale smart irrigation prototype solution with off-the-shelf hardware and software equipment, which we test and evaluate on different kinds of plants in order to gain useful insights for larger scale deployments, (ii) we introduce a reference network architecture, specifically targeting smart irrigation and edge data distribution for strawberry greenhouses, and (iii) adopting the proposed reference architecture, we implement a full-scale system in an actual strawberry greenhouse environment in Greece, and we compare its performance against that of conventional strawberries irrigation. We show that our design significantly outperforms the conventional approach, both in terms of soil moisture variation and in terms of water consumption, and conclude by critically appraising the costs and benefits of our approach in the agricultural industry.
The traditional convolution neural networks (CNN) have several drawbacks like the Picasso effect and the loss of information by the pooling layer. The Capsule network (CapsNet) was proposed to address these challenges because its architecture can encode and preserve the spatial orientation of input images. Similar to traditional CNNs, CapsNet is also vulnerable to several malicious attacks, as studied by several researchers in the literature. However, most of these studies focus on single-device-based inference, but horizontally collaborative inference in state-of-the-art systems, like intelligent edge services in self-driving cars, voice controllable systems, and drones, nullify most of these analyses. Horizontal collaboration implies partitioning the trained CNN models or CNN tasks to multiple end devices or edge nodes. Therefore, it is imperative to examine the robustness of the CapsNet against malicious attacks when deployed in horizontally collaborative environments. Towards this, we examine the robustness of the CapsNet when subjected to noise-based inference attacks in a horizontal collaborative environment. In this analysis, we perturbed the feature maps of the different layers of four DNN models, i.e., CapsNet, Mini-VGG, LeNet, and an in-house designed CNN (ConvNet) with the same number of parameters as CapsNet, using two types of noised-based attacks, i.e., Gaussian Noise Attack and FGSM noise attack. The experimental results show that similar to the traditional CNNs, depending upon the access of the attacker to the DNN layer, the classification accuracy of the CapsNet drops significantly. For example, when Gaussian Noise Attack classification is performed at the DigitCap layer of the CapsNet, the maximum classification accuracy drop is approximately 97%.
Mobile devices have become an indispensable component of modern life. Their high storage capacity gives these devices the capability to store vast amounts of sensitive personal data, which makes them a high-value target: these devices are routinely stolen by criminals for data theft, and are increasingly viewed by law enforcement agencies as a valuable source of forensic data. Over the past several years, providers have deployed a number of advanced cryptographic features intended to protect data on mobile devices, even in the strong setting where an attacker has physical access to a device. Many of these techniques draw from the research literature, but have been adapted to this entirely new problem setting. This involves a number of novel challenges, which are incompletely addressed in the literature. In this work, we outline those challenges, and systematize the known approaches to securing user data against extraction attacks. Our work proposes a methodology that researchers can use to analyze cryptographic data confidentiality for mobile devices. We evaluate the existing literature for securing devices against data extraction adversaries with powerful capabilities including access to devices and to the cloud services they rely on. We then analyze existing mobile device confidentiality measures to identify research areas that have not received proper attention from the community and represent opportunities for future research.
Car sharing is one the pillars of a smart transportation infrastructure, as it is expected to reduce traffic congestion, parking demands and pollution in our cities. From the point of view of demand modelling, car sharing is a weak signal in the city landscape: only a small percentage of the population uses it, and thus it is difficult to study reliably with traditional techniques such as households travel diaries. In this work, we depart from these traditional approaches and we leverage web-based, digital records about vehicle availability in 10 European cities for one of the major active car sharing operators. We discuss which sociodemographic and urban activity indicators are associated with variations in car sharing demand, which forecasting approach (among the most popular in the related literature) is better suited to predict pickup and drop-off events, and how the spatio-temporal information about vehicle availability can be used to infer how different zones in a city are used by customers. We conclude the paper by presenting a direct application of the analysis of the dataset, aimed at identifying where to locate maintenance facilities within the car sharing operation area.
With the recent advances of IoT (Internet of Things) new and more robust security frameworks are needed to detect and mitigate new forms of cyber-attacks, which exploit complex and heterogeneity IoT networks, as well as, the existence of many vulnerabilities in IoT devices. With the rise of blockchain technologies service providers pay considerable attention to better understand and adopt blockchain technologies in order to have better secure and trusted systems for own organisations and their customers. The present paper introduces a high level guide for the senior officials and decision makers in the organisations and technology managers for blockchain security framework by design principle for trust and adoption in IoT environments. The paper discusses Cyber-Trust project blockchain technology development as a representative case study for offered security framework. Security and privacy by design approach is introduced as an important consideration in setting up the framework.
ASR (automatic speech recognition) systems like Siri, Alexa, Google Voice or Cortana has become quite popular recently. One of the key techniques enabling the practical use of such systems in people's daily life is deep learning. Though deep learning in computer vision is known to be vulnerable to adversarial perturbations, little is known whether such perturbations are still valid on the practical speech recognition. In this paper, we not only demonstrate such attacks can happen in reality, but also show that the attacks can be systematically conducted. To minimize users' attention, we choose to embed the voice commands into a song, called CommandSong. In this way, the song carrying the command can spread through radio, TV or even any media player installed in the portable devices like smartphones, potentially impacting millions of users in long distance. In particular, we overcome two major challenges: minimizing the revision of a song in the process of embedding commands, and letting the CommandSong spread through the air without losing the voice "command". Our evaluation demonstrates that we can craft random songs to "carry" any commands and the modify is extremely difficult to be noticed. Specially, the physical attack that we play the CommandSongs over the air and record them can success with 94 percentage.