Data anonymization is an approach to privacy-preserving data release aimed at preventing participants reidentification, and it is an important alternative to differential privacy in applications that cannot tolerate noisy data. Existing algorithms for enforcing $k$-anonymity in the released data assume that the curator performing the anonymization has complete access to the original data. Reasons for limiting this access range from undesirability to complete infeasibility. This paper explores ideas -- objectives, metrics, protocols, and extensions -- for reducing the trust that must be placed in the curator, while still maintaining a statistical notion of $k$-anonymity. We suggest trust (amount of information provided to the curator) and privacy (anonymity of the participants) as the primary objectives of such a framework. We describe a class of protocols aimed at achieving these goals, proposing new metrics of privacy in the process, and proving related bounds. We conclude by discussing a natural extension of this work that completely removes the need for a central curator.
Unlike suggested during their early years of existence, Bitcoin and similar cryptocurrencies in fact offer significantly less privacy as compared to traditional banking. A myriad of privacy-enhancing extensions to those cryptocurrencies as well as several clean-slate privacy-protecting cryptocurrencies have been proposed in turn. To convey a better understanding of the protection of popular design decisions, we investigate expected anonymity set sizes in an initial simulation study. The large variation of expected transaction values yields soberingly small effective anonymity sets for protocols that leak transaction values. We hence examine the effect of preliminary, intuitive strategies for merging groups of payments into larger anonymity sets, for instance by choosing from pre-specified value classes. The results hold promise, as they indeed induce larger anonymity sets at comparatively low cost, depending on the corresponding strategy
The Internet of Things (IoT) is one of the emerging technologies that has grabbed the attention of researchers from academia and industry. The idea behind Internet of things is the interconnection of internet enabled things or devices to each other and to humans, to achieve some common goals. In near future IoT is expected to be seamlessly integrated into our environment and human will be wholly solely dependent on this technology for comfort and easy life style. Any security compromise of the system will directly affect human life. Therefore security and privacy of this technology is foremost important issue to resolve. In this paper we present a thorough study of security problems in IoT and classify possible cyberattacks on each layer of IoT architecture. We also discuss challenges to traditional security solutions such as cryptographic solutions, authentication mechanisms and key management in IoT. Device authentication and access controls is an essential area of IoT security, which is not surveyed so far. We spent our efforts to bring the state of the art device authentication and access control techniques on a single paper.
Data collection and research methodology represents a critical part of the research pipeline. On the one hand, it is important that we collect data in a way that maximises the validity of what we are measuring, which may involve the use of long scales with many items. On the other hand, collecting a large number of items across multiple scales results in participant fatigue, and expensive and time consuming data collection. It is therefore important that we use the available resources optimally. In this work, we consider how a consideration for theory and the associated causal/structural model can help us to streamline data collection procedures by not wasting time collecting data for variables which are not causally critical for subsequent analysis. This not only saves time and enables us to redirect resources to attend to other variables which are more important, but also increases research transparency and the reliability of theory testing. In order to achieve this streamlined data collection, we leverage structural models, and Markov conditional independency structures implicit in these models to identify the substructures which are critical for answering a particular research question. In this work, we review the relevant concepts and present a number of didactic examples with the hope that psychologists can use these techniques to streamline their data collection process without invalidating the subsequent analysis. We provide a number of simulation results to demonstrate the limited analytical impact of this streamlining.
We present a framework for gesture customization requiring minimal examples from users, all without degrading the performance of existing gesture sets. To achieve this, we first deployed a large-scale study (N=500+) to collect data and train an accelerometer-gyroscope recognition model with a cross-user accuracy of 95.7% and a false-positive rate of 0.6 per hour when tested on everyday non-gesture data. Next, we design a few-shot learning framework which derives a lightweight model from our pre-trained model, enabling knowledge transfer without performance degradation. We validate our approach through a user study (N=20) examining on-device customization from 12 new gestures, resulting in an average accuracy of 55.3%, 83.1%, and 87.2% on using one, three, or five shots when adding a new gesture, while maintaining the same recognition accuracy and false-positive rate from the pre-existing gesture set. We further evaluate the usability of our real-time implementation with a user experience study (N=20). Our results highlight the effectiveness, learnability, and usability of our customization framework. Our approach paves the way for a future where users are no longer bound to pre-existing gestures, freeing them to creatively introduce new gestures tailored to their preferences and abilities.
We describe a cognitive architecture intended to solve a wide range of problems based on the five identified principles of brain activity, with their implementation in three subsystems: logical-probabilistic inference, probabilistic formal concepts, and functional systems theory. Building an architecture involves the implementation of a task-driven approach that allows defining the target functions of applied applications as tasks formulated in terms of the operating environment corresponding to the task, expressed in the applied ontology. We provide a basic ontology for a number of practical applications as well as for the subject domain ontologies based upon it, describe the proposed architecture, and give possible examples of the execution of these applications in this architecture.
Materialized model query aims to find the most appropriate materialized model as the initial model for model reuse. It is the precondition of model reuse, and has recently attracted much attention. Nonetheless, the existing methods suffer from low privacy protection, limited range of applications, and inefficiency since they do not construct a suitable metric to measure the target-related knowledge of materialized models. To address this, we present MMQ, a privacy-protected, general, efficient, and effective materialized model query framework. It uses a Gaussian mixture-based metric called separation degree to rank materialized models. For each materialized model, MMQ first vectorizes the samples in the target dataset into probability vectors by directly applying this model, then utilizes Gaussian distribution to fit for each class of probability vectors, and finally uses separation degree on the Gaussian distributions to measure the target-related knowledge of the materialized model. Moreover, we propose an improved MMQ (I-MMQ), which significantly reduces the query time while retaining the query performance of MMQ. Extensive experiments on a range of practical model reuse workloads demonstrate the effectiveness and efficiency of MMQ.
Recently, numerous studies have demonstrated the presence of bias in machine learning powered decision-making systems. Although most definitions of algorithmic bias have solid mathematical foundations, the corresponding bias detection techniques often lack statistical rigor, especially for non-iid data. We fill this gap in the literature by presenting a rigorous non-parametric testing procedure for bias according to Predictive Rate Parity, a commonly considered notion of algorithmic bias. We adapt traditional asymptotic results for non-parametric estimators to test for bias in the presence of dependence commonly seen in user-level data generated by technology industry applications and illustrate how these approaches can be leveraged for mitigation. We further propose modifications of this methodology to address bias measured through marginal outcome disparities in classification settings and extend notions of predictive rate parity to multi-objective models. Experimental results on real data show the efficacy of the proposed detection and mitigation methods.
Earables (ear wearables) is rapidly emerging as a new platform encompassing a diverse range of personal applications. The traditional authentication methods hence become less applicable and inconvenient for earables due to their limited input interface. Nevertheless, earables often feature rich around-the-head sensing capability that can be leveraged to capture new types of biometrics. In this work, we proposeToothSonic which leverages the toothprint-induced sonic effect produced by users performing teeth gestures for earable authentication. In particular, we design representative teeth gestures that can produce effective sonic waves carrying the information of the toothprint. To reliably capture the acoustic toothprint, it leverages the occlusion effect of the ear canal and the inward-facing microphone of the earables. It then extracts multi-level acoustic features to reflect the intrinsic toothprint information for authentication. The key advantages of ToothSonic are that it is suitable for earables and is resistant to various spoofing attacks as the acoustic toothprint is captured via the user's private teeth-ear channel that modulates and encrypts the sonic waves. Our experiment studies with 25 participants show that ToothSonic achieves up to 95% accuracy with only one of the users' tooth gestures.
In variable selection, a selection rule that prescribes the permissible sets of selected variables (called a "selection dictionary") is desirable due to the inherent structural constraints among the candidate variables. The methods that can incorporate such restrictions can improve model interpretability and prediction accuracy. Penalized regression can integrate selection rules by assigning the coefficients to different groups and then applying penalties to the groups. However, no general framework has been proposed to formalize selection rules and their applications. In this work, we establish a framework for structured variable selection that can incorporate universal structural constraints. We develop a mathematical language for constructing arbitrary selection rules, where the selection dictionary is formally defined. We show that all selection rules can be represented as a combination of operations on constructs, which can be used to identify the related selection dictionary. One may then apply some criteria to select the best model. We show that the theoretical framework can help to identify the grouping structure in existing penalized regression methods. In addition, we formulate structured variable selection into mixed-integer optimization problems which can be solved by existing software. Finally, we discuss the significance of the framework in the context of statistics.
Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems, from overcoming a limited amount of training data over regularizing the objective to limiting the amount data used to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy for existing works (C2), this survey is concerned with data augmentation methods for textual classification and aims to achieve a concise and comprehensive overview for researchers and practitioners (C3). Derived from the taxonomy, we divided more than 100 methods into 12 different groupings and provide state-of-the-art references expounding which methods are highly promising (C4). Finally, research perspectives that may constitute a building block for future work are given (C5).