Several application-layer communication protocols have been proposed for the IoT, including MQTT, CoAP, and REST HTTP, with the latter being the protocol of choice for software developers due to its compatibility with existing systems. We present a theoretical model of the expected size of the REST HTTP client buffer in IoT devices under lossy wireless conditions, and validate the study experimentally. The results show that increasing the buffer size in IoT devices does not always improve performance in lossy environments, demonstrating the importance of benchmarking the buffer size in IoT systems that deploy REST HTTP.
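The abstract does not spell out the model's equations; as a rough, hypothetical illustration of why a larger client buffer does not automatically help under loss, the sketch below simulates a finite REST HTTP client buffer in which retransmissions caused by packet loss stall draining. All parameters (loss probability, buffer sizes, message counts) are made up for illustration and are not the paper's settings.

```python
# Hypothetical sketch (not the paper's model): a finite client buffer fed by
# REST HTTP responses over a lossy link.  Lost responses are retransmitted,
# which stalls draining; once the buffer saturates, extra capacity mainly
# adds occupancy (delay) rather than reducing drops.
import random

def simulate(buffer_size, loss_prob, n_msgs=10_000, seed=0):
    rng = random.Random(seed)
    occupancy, dropped, samples = 0, 0, []
    for _ in range(n_msgs):
        # each loss forces one retransmission round before the message arrives
        retries = 0
        while rng.random() < loss_prob:
            retries += 1
        if occupancy < buffer_size:
            occupancy += 1
        else:
            dropped += 1
        # the client drains roughly one message per round; retransmissions stall it
        if retries == 0 and occupancy > 0:
            occupancy -= 1
        samples.append(occupancy)
    return sum(samples) / len(samples), dropped

for size in (4, 16, 64):
    avg, dropped = simulate(size, loss_prob=0.3)
    print(f"buffer={size:3d}  mean occupancy={avg:5.2f}  drops={dropped}")
```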
ASBK (named after the authors' initials) is a recent blockchain protocol that tackles data availability attacks against light nodes, employing two-dimensional Reed-Solomon codes to encode the list of transactions and a random sampling phase in which adversaries are forced to reveal information. In its original formulation, only codes with rate $1/4$ are considered, and the theoretical analysis relies on computationally demanding formulas. This makes ASBK difficult to optimize in situations of practical interest. In this paper, we introduce a much simpler model of the protocol, which additionally supports codes with arbitrary rate. This makes blockchains implementing ASBK much easier to design and optimize. Furthermore, the clearer view of the protocol allows some general features and considerations to be derived (e.g., node behaviour in networks with large participation). As a concrete application of our analysis, we consider relevant blockchain parameters and find network settings that minimize the amount of data downloaded by light nodes. Our results show that the protocol benefits from the use of codes defined over large finite fields, with code rates that may even differ significantly from the originally proposed ones.
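The paper's own analysis is not reproduced here; the sketch below only illustrates the random-sampling argument behind data availability protocols of this kind: the probability that a single light node, drawing a few distinct shares of the two-dimensional extended block, misses every withheld share. The block dimension and the withheld-share count are illustrative assumptions, not the paper's parameters.

```python
# Simplified sketch (not the paper's exact analysis): probability that a light
# node sampling s distinct shares of an n x n extended block misses all of the
# m shares an adversary withholds.  Larger fields / other rates change m, but
# the sampling argument keeps this hypergeometric form.
from math import prod

def miss_probability(n_shares, withheld, samples):
    # P[all `samples` draws avoid the `withheld` set], sampling without replacement
    return prod((n_shares - withheld - i) / (n_shares - i) for i in range(samples))

n = 128                        # extended block is n x n shares (illustrative value)
total = n * n
withheld = (n // 2 + 1) ** 2   # illustrative withheld-share count for a rate-1/4 code
for s in (5, 15, 30):
    print(f"samples={s:3d}  P[undetected by one node]={miss_probability(total, withheld, s):.3e}")
```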
The core issue in cyberspace detection and mapping is to accurately identify and dynamically track devices. However, with the development of anonymization technology, devices can have multiple IP and MAC addresses, and existing detection and mapping technologies struggle to map multiple virtual attributes to the same physical device. In this paper, we propose a PUF-based detection and mapping framework that can actively detect physical resources in cyberspace, construct resource portraits based on physical fingerprints, and dynamically track devices. We present a new method to implement a rowhammer DRAM PUF on a general-purpose PC equipped with DDR4 memory. The PUF performance evaluation shows that the extracted rowhammer PUF response is unique and reliable on the PC and can be treated as the device's unique physical fingerprint. The detection and mapping experiments show that the proposed framework can accurately identify target devices. Even if a device modifies its MAC address, IP address, and operating system, matching against a physical fingerprint database yields identification accuracy close to the ideal value of 100%.
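The framework's actual fingerprint extraction and matching pipeline is not given in the abstract; the following is a hypothetical sketch of the general idea of turning rowhammer-induced bit-flip locations into a binary response and matching it against an enrolled fingerprint database via Hamming distance. The function names, region size, flip addresses, and threshold are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's implementation): map the set of
# rowhammer-induced bit-flip locations in a fixed DRAM region to a binary
# fingerprint, and match a fresh measurement against an enrolled database.
def response_to_bits(flip_addresses, region_size):
    """Map observed flip addresses to a fixed-length 0/1 response vector."""
    bits = [0] * region_size
    for addr in flip_addresses:
        bits[addr % region_size] = 1
    return bits

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def identify(measurement, database, threshold):
    """Return the enrolled device whose stored response is closest, if close enough."""
    device_id, stored = min(database.items(),
                            key=lambda kv: hamming_distance(measurement, kv[1]))
    return device_id if hamming_distance(measurement, stored) <= threshold else None

# toy usage with made-up flip locations
db = {"device-A": response_to_bits([3, 17, 42, 99], 128),
      "device-B": response_to_bits([5, 17, 63, 120], 128)}
probe = response_to_bits([3, 17, 42, 98], 128)   # noisy re-measurement of device-A
print(identify(probe, db, threshold=8))          # -> "device-A"
```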
We introduce Zef, the first Byzantine-Fault Tolerant (BFT) protocol to support payments in anonymous digital coins at arbitrary scale. Zef follows the communication and security model of FastPay: both protocols are asynchronous, low-latency, linearly scalable, and powered by partially trusted sharded authorities. In contrast with FastPay, user accounts in Zef are uniquely identified and safely removable. Zef coins are bound to an account by a digital certificate and otherwise stored off-chain by their owners. To create and redeem coins, users interact with the protocol via privacy-preserving operations: Zef uses randomized commitments and NIZK proofs to hide coin values, and created coins are made unlinkable using the blind and randomizable threshold anonymous credentials of Coconut. Besides the detailed specifications and our analysis of the protocol, we are making available an open-source implementation of Zef in Rust. Our extensive benchmarks on AWS confirm textbook linear scalability and demonstrate a confirmation time under one second at nominal capacity. Compared to existing anonymous payment systems based on a blockchain, this represents a latency speedup of three orders of magnitude, with no theoretical limit on throughput.
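Zef's actual construction uses elliptic-curve commitments, NIZK proofs, and Coconut credentials; the toy sketch below only illustrates the hiding/binding idea behind a randomized (Pedersen-style) commitment to a coin value, over a small multiplicative group with assumed toy parameters that are not suitable for production use.

```python
# Toy sketch of a randomized (Pedersen-style) commitment hiding a coin value,
# in a multiplicative group mod a Mersenne prime.  Zef itself works over
# elliptic curves with NIZK proofs; this only shows the hiding/binding idea.
import secrets

P = 2**127 - 1          # illustrative prime modulus (not a production choice)
G, H = 3, 7             # assumed independent generators (toy parameters)

def commit(value, blinding):
    """Commitment C = G^value * H^blinding mod P; `blinding` hides the value."""
    return (pow(G, value, P) * pow(H, blinding, P)) % P

def open_commitment(c, value, blinding):
    return c == commit(value, blinding)

r = secrets.randbelow(P - 1)
c = commit(25, r)                    # commit to a coin of value 25
print(open_commitment(c, 25, r))     # True: correct opening
print(open_commitment(c, 26, r))     # False: cannot open to a different value
```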
In this paper, a simple yet effective network pruning framework is proposed to simultaneously address the problems of pruning indicator, pruning ratio, and efficiency constraint. This paper argues that the pruning decision should depend on the convolutional weights, and thus proposes novel weight-dependent gates (W-Gates) to learn information from the filter weights and obtain binary gates that prune or keep filters automatically. To prune the network under efficiency constraints, a switchable Efficiency Module is constructed to predict the hardware latency or FLOPs of candidate pruned networks. Combined with the proposed Efficiency Module, W-Gates can perform filter pruning in an efficiency-aware manner and achieve a compact network with a better accuracy-efficiency trade-off. We demonstrate the effectiveness of the proposed method on ResNet34, ResNet50, and MobileNet V2, achieving up to 1.33/1.28/1.1 higher Top-1 accuracy, respectively, with lower hardware latency on ImageNet. Compared with state-of-the-art methods, W-Gates also achieves superior performance.
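The authors' implementation is not given in the abstract; the following is one hypothetical reading of a weight-dependent gate: a small head scores each convolutional filter from its own weights, and a straight-through binarization turns the scores into keep/prune decisions. The gating head, layer sizes, and threshold are illustrative assumptions, not the paper's architecture.

```python
# Hypothetical sketch of a weight-dependent gate (not the authors' code): a small
# gating head reads each conv filter's weights, produces a score, and a
# straight-through binarization decides whether the filter is kept or pruned.
import torch
import torch.nn as nn

class WeightDependentGate(nn.Module):
    def __init__(self, in_features):
        super().__init__()
        self.score = nn.Linear(in_features, 1)   # maps flattened filter weights to a score

    def forward(self, conv_weight):
        # conv_weight: (out_channels, in_channels, kH, kW)
        scores = self.score(conv_weight.flatten(1)).squeeze(-1)   # (out_channels,)
        soft = torch.sigmoid(scores)
        hard = (soft > 0.5).float()
        # straight-through estimator: forward uses hard 0/1 gates, backward uses soft grads
        return hard + soft - soft.detach()

conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
gate = WeightDependentGate(in_features=16 * 3 * 3)
x = torch.randn(2, 16, 8, 8)
g = gate(conv.weight)                      # (32,) binary-valued gates
y = conv(x) * g.view(1, -1, 1, 1)          # zero out pruned filters' feature maps
print(y.shape, int(g.sum().item()), "filters kept")
```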
The selection of essential variables in logistic regression is vital because of its extensive use in medical studies, finance, economics, and related fields. In this paper, we explore four main typologies (test-based, penalty-based, screening-based, and tree-based) of frequentist variable selection methods in the logistic regression setup. The primary objective of this work is to give a comprehensive overview of the existing literature for practitioners. Underlying assumptions and theory, along with the specifics of their implementations, are detailed as well. Next, we conduct a thorough simulation study to explore the performance of fifteen different methods in terms of variable selection, estimation of coefficients, prediction accuracy, and time complexity under various settings. We consider low-, moderate-, and high-dimensional setups and different correlation structures for the covariates. A real-life application, using high-dimensional gene expression data, is also included to further assess the efficacy and consistency of the methods. Finally, based on our findings from the simulated and real data, we provide recommendations for practitioners on the choice of variable selection methods in various contexts.
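As a small illustration of one penalty-based method from the typology above, the sketch below fits an L1-penalized logistic regression on simulated sparse data and checks which truly active covariates are recovered; the dimensions, sparsity level, and penalty strength are arbitrary choices, not the survey's experimental settings.

```python
# Illustrative sketch of one penalty-based method from the survey's typology:
# L1-penalized logistic regression on simulated data, checking which of the
# truly active covariates are recovered.  Settings (n, p, sparsity) are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p, k_true = 300, 50, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k_true] = 2.0                                # first five covariates are active
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_.ravel())
print("selected covariates:", selected)
print("true positives:", len(set(selected) & set(range(k_true))), "of", k_true)
```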
We present a functional form of the Erd\H{o}s-R\'enyi law of large numbers for L\'evy processes.
In many applications, such as recommender systems, online advertising, and product search, click-through rate (CTR) prediction is a critical task, because its accuracy has a direct impact on both platform revenue and user experience. In recent years, with the prevalence of deep learning, CTR prediction has been widely studied in both academia and industry, resulting in an abundance of deep CTR models. Unfortunately, there is still a lack of standardized benchmarks and uniform evaluation protocols for CTR prediction, which leads to non-reproducible and even inconsistent experimental results among these studies. In this paper, we present an open benchmark (namely FuxiCTR) for reproducible research and provide a rigorous comparison of different models for CTR prediction. Specifically, we ran over 4,600 experiments, for a total of more than 12,000 GPU hours, in a uniform framework to re-evaluate 24 existing models on two widely used datasets, Criteo and Avazu. Surprisingly, our experiments show that many models differ less than expected and are sometimes even inconsistent with what is reported in the literature. We believe that our benchmark will not only allow researchers to conveniently gauge the effectiveness of new models, but also share good practices for fair comparison with the state of the art. We will release all the code and benchmark settings.
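FuxiCTR's actual code is not reproduced here; the minimal sketch below only illustrates what a uniform evaluation step might look like: every model is scored on the same held-out data with the same two standard metrics (logloss and AUC) under a fixed seed. The toy labels and predictions are fabricated for illustration.

```python
# Minimal sketch of a uniform evaluation step (not FuxiCTR's actual code):
# score each model's predictions with the same metrics on the same split,
# so results are directly comparable across runs.
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score

def evaluate(y_true, y_pred_proba):
    return {"logloss": log_loss(y_true, y_pred_proba),
            "auc": roc_auc_score(y_true, y_pred_proba)}

rng = np.random.default_rng(2022)                  # fixed seed for reproducibility
y_true = rng.binomial(1, 0.2, size=10_000)         # toy click labels
y_pred = np.clip(y_true * 0.6 + rng.uniform(0, 0.5, size=10_000), 1e-6, 1 - 1e-6)
print(evaluate(y_true, y_pred))
```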
In recent years, mobile devices have developed rapidly, gaining stronger computation capabilities and larger storage. Some computation-intensive machine learning and deep learning tasks can now run on mobile devices. To take advantage of the resources available on mobile devices and preserve users' privacy, the idea of mobile distributed machine learning has been proposed. It uses local hardware resources and local data to solve machine learning sub-problems on mobile devices, and only uploads computation results, instead of the original data, to contribute to the optimization of the global model. This architecture can not only relieve the computation and storage burden on servers, but also protect users' sensitive information. Another benefit is bandwidth reduction, as various kinds of local data can now participate in the training process without being uploaded to the server. In this paper, we provide a comprehensive survey of recent studies on mobile distributed machine learning. We survey a number of widely used mobile distributed machine learning methods and present an in-depth discussion of the challenges and future directions in this area. We believe that this survey can provide a clear overview of mobile distributed machine learning and offer guidelines for applying it to real applications.
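As a small illustration of the "upload results, not data" pattern, the sketch below implements a federated-averaging-style loop, one common instantiation of mobile distributed learning: each simulated device fits a local model on its own data, and only the resulting weight vectors are averaged by the server. All data, device counts, and hyperparameters are made up.

```python
# Illustrative sketch of the "upload results, not data" pattern, in the style of
# federated averaging: each device runs local training on its private data and
# only the weight vectors travel to the server, which averages them.
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=20):
    """Plain gradient descent for least squares on one device's private data."""
    for _ in range(epochs):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
devices = []
for _ in range(5):                                 # five devices; data never leaves them
    X = rng.standard_normal((40, 2))
    devices.append((X, X @ true_w + 0.1 * rng.standard_normal(40)))

global_w = np.zeros(2)
for _ in range(10):
    updates = [local_update(global_w.copy(), X, y) for X, y in devices]
    global_w = np.mean(updates, axis=0)            # server only sees model updates
print("estimated weights:", np.round(global_w, 2))
```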
Object detection has made great progress in the past few years along with the development of deep learning. However, most current object detection methods are resource-hungry, which hinders their wide deployment in resource-restricted scenarios such as always-on devices and battery-powered low-end devices. This paper considers the resource-accuracy trade-off for such resource-restricted scenarios when designing the whole object detection framework. Based on the deeply supervised object detection (DSOD) framework, we propose Tiny-DSOD, dedicated to resource-restricted usage. Tiny-DSOD introduces two innovative and ultra-efficient architecture blocks: a depthwise dense block (DDB) based backbone and a depthwise feature-pyramid-network (D-FPN) based front-end. We conduct extensive experiments on three well-known benchmarks (PASCAL VOC 2007, KITTI, and COCO) and compare Tiny-DSOD to state-of-the-art ultra-efficient object detection solutions such as Tiny-YOLO, MobileNet-SSD (v1 & v2), SqueezeDet, and Pelee. Results show that Tiny-DSOD outperforms these solutions in all three metrics (parameter size, FLOPs, accuracy) in each comparison. For instance, Tiny-DSOD achieves 72.1% mAP with only 0.95M parameters and 1.06B FLOPs, which is by far the state-of-the-art result at such a low resource requirement.
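The released Tiny-DSOD architecture is not reproduced here; the sketch below is one plausible reading of a depthwise dense block (DDB), combining depthwise-separable convolutions with dense concatenation so features are reused at low cost. Channel counts, growth rate, and depth are illustrative assumptions, not the paper's configuration.

```python
# One plausible reading of a depthwise dense block (not the released Tiny-DSOD
# code): each layer is a depthwise-separable convolution whose output is
# concatenated with its input, so features are reused cheaply.
import torch
import torch.nn as nn

class DepthwiseDenseBlock(nn.Module):
    def __init__(self, in_channels, growth, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1, groups=ch, bias=False),  # depthwise
                nn.Conv2d(ch, growth, 1, bias=False),                    # pointwise
                nn.BatchNorm2d(growth),
                nn.ReLU(inplace=True),
            ))
            ch += growth                              # dense concatenation grows channels
        self.out_channels = ch

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)
        return x

block = DepthwiseDenseBlock(in_channels=32, growth=16, num_layers=4)
print(block(torch.randn(1, 32, 64, 64)).shape)       # torch.Size([1, 96, 64, 64])
```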
Conversation interfaces (CIs), or chatbots, are a popular form of intelligent agents that engage humans in task-oriented or informal conversation. In this position paper and demonstration, we argue that chatbots working in dynamic environments, such as those involving sensor data, can not only serve as a promising platform for researching issues at the intersection of learning, reasoning, representation, and execution for goal-directed autonomy, but can also handle non-trivial business applications. We explore the underlying issues in the context of Water Advisor, a preliminary multi-modal conversation system that can access and explain water quality data.