Efficient inference of Deep Neural Networks (DNNs) on resource-constrained edge devices is essential. Quantization and sparsity are key algorithmic techniques that translate to repetition and sparsity within tensors at the hardware-software interface. This paper introduces the concept of a repetition-sparsity trade-off that helps explain computational efficiency during inference. We propose Signed Binarization, a unified co-design framework that synergistically integrates hardware-software systems, quantization functions, and representation learning techniques to address this trade-off. Our results demonstrate that Signed Binarization is more accurate than binarization with the same number of non-zero weights. Detailed analysis indicates that Signed Binarization generates a smaller distribution of effectual (non-zero) parameters nested within a larger distribution of total parameters, both of the same type, for a DNN block. Finally, our approach achieves a 26% speedup on real hardware, doubles energy efficiency, and reduces density by 2.8x compared to binary methods for ResNet-18, presenting an alternative solution for deploying efficient models in resource-limited environments.
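To make the repetition-sparsity interplay concrete, here is a minimal sketch of one plausible signed-binarization quantizer: weights below a magnitude threshold become zero (sparsity), and the remaining weights keep only their sign (repetition). The thresholding rule and the `sparsity` knob are illustrative assumptions, not the paper's exact quantization function.

```python
import numpy as np

def signed_binarize(w: np.ndarray, sparsity: float = 0.65) -> np.ndarray:
    """Map weights to {-1, 0, +1}: small-magnitude weights become zero
    (ineffectual), the rest keep only their sign. The quantile-based
    threshold is an illustrative choice, not the paper's rule."""
    thresh = np.quantile(np.abs(w), sparsity)  # magnitude cutoff
    q = np.sign(w)                             # binarize to {-1, +1}
    q[np.abs(w) < thresh] = 0.0                # zero out ineffectual weights
    return q

w = np.random.randn(4, 4).astype(np.float32)
q = signed_binarize(w)
print(q)
print("density:", np.count_nonzero(q) / q.size)  # effectual fraction
```

The zero entries need never be fetched or multiplied, while the surviving $\pm 1$ values are maximally repetitive, which is the trade-off the hardware-software co-design exploits.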
The proliferation of mobile social networks (MSNs) has transformed information dissemination, leading to increased reliance on these platforms for news consumption. However, this shift has been accompanied by the widespread propagation of fake news, which poses significant challenges in the form of public panic, political influence, and the obscuring of truth. Traditional data processing pipelines for fake news detection in MSNs suffer from lengthy response times and poor scalability, failing to address the unique characteristics of news in MSNs: prompt propagation, large-scale quantity, and rapid evolution. This paper introduces Decaffe, a DHT tree-based online federated fake news detection system. Decaffe leverages distributed hash table (DHT)-based aggregation trees for scalability and real-time detection, and it employs two model fine-tuning methods to adapt to mobile network dynamics. The system is structured as a root, branches, and leaves for effective dissemination of a pre-trained model and ensemble-based aggregation of predictive results. Decaffe uniquely combines centralized server-based and decentralized serverless model fine-tuning with personalized model fine-tuning, addressing the challenges of real-time detection, scalability, and adaptability in the dynamic environment of MSNs.
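To illustrate the tree-shaped aggregation, here is a minimal sketch; the topology and the simple averaging rule are our assumptions, and Decaffe's actual ensemble and DHT routing are more involved.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """A node in an aggregation tree: a leaf holds a local prediction;
    a root or branch aggregates over its children."""
    children: List["Node"] = field(default_factory=list)
    local_pred: float = 0.0  # leaf's local fake-news probability

    def aggregate(self) -> float:
        """Ensemble predictions bottom-up by averaging over subtrees."""
        if not self.children:
            return self.local_pred
        return sum(c.aggregate() for c in self.children) / len(self.children)

leaves = [Node(local_pred=p) for p in (0.9, 0.7, 0.2, 0.8)]
branches = [Node(children=leaves[:2]), Node(children=leaves[2:])]
root = Node(children=branches)
print(f"ensemble fake-news score: {root.aggregate():.2f}")  # 0.65
```

In a DHT setting, each parent-child edge would be a lookup in the overlay rather than an in-memory pointer, which is what lets the tree scale with the number of participating devices.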
Most research on formal system design has focused on optimizing various measures of efficiency. However, insufficient attention has been given to the design of systems optimizing resilience, the ability of systems to adapt to unexpected changes or adversarial disruptions. In our prior work, we formalized the intuitive notion of resilience as a property of cyber-physical systems by using a multiset rewriting language with explicit time. In the present paper, we study the computational complexity of a formalization of time-bounded resilience problems for the class of $\eta$-simple progressing planning scenarios, where, intuitively, it is simple to check that a system configuration is critical, and only a finite number of actions can be carried out in a bounded time period. We show that, in the time-bounded model with $n$ (potentially adversarially chosen) updates, the corresponding time-bounded resilience problem for this class of systems is complete for the $\Sigma^P_{2n+1}$ class of the polynomial hierarchy, PH. To support the formal models and complexity results, we perform automated experiments for time-bounded verification using the rewriting logic tool Maude.
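Schematically (this is our reading of the abstract, not the paper's formal definition), the $\Sigma^P_{2n+1}$ bound reflects a game with $2n+1$ alternating quantifier blocks: the system commits to a plan, the adversary issues an update, the system re-plans, and so on for $n$ updates:

$$\exists\, P_0\ \forall\, U_1\ \exists\, P_1\ \cdots\ \forall\, U_n\ \exists\, P_n:\ \text{the configurations reached along } P_0 U_1 P_1 \cdots U_n P_n \text{ avoid criticality within the time bound.}$$

Guessing the $n{+}1$ existential plan blocks and verifying against the $n$ universal update blocks places the problem at level $2n+1$ of PH, matching the stated completeness.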
Internet of Things devices can now be found everywhere, including in our households in the form of Smart Home networks. Despite their ubiquity, their security is unsatisfactory, as demonstrated by recent attacks. The IETF's MUD standard aims to simplify and automate the secure deployment of end devices in networks. A MUD file contains a device-specific description of allowed network activities (e.g., allowed IP ports or host addresses) and can be used, for example, to configure a firewall. A major weakness of MUD is that it is not expressive enough to describe traffic patterns representing device interactions, which often occur in modern Smart Home platforms. In this article, we present a new language for describing such traffic patterns. The language allows writing device profiles that are more expressive than MUD files and take into account the interdependencies of traffic connections. We show how these profiles can be translated to efficient code for a lightweight firewall leveraging NFTables to block non-conforming traffic. We evaluate our approach on traffic generated by various Smart Home devices, and show that our system accurately blocks unwanted traffic while inducing negligible latency.
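As a sketch of the translation step, the snippet below turns a toy device profile (device address plus allowed host/port endpoints) into an NFTables ruleset; the profile schema and rule shape are illustrative assumptions, not the article's actual language or generated code.

```python
def profile_to_nft(device_ip: str, allowed: list[tuple[str, int]]) -> str:
    """Translate a toy device profile into an NFTables ruleset that
    accepts only the listed (host, TCP port) connections from the
    device and drops everything else it sends."""
    rules = "\n".join(
        f"        ip saddr {device_ip} ip daddr {host} tcp dport {port} accept"
        for host, port in allowed
    )
    return f"""table inet smarthome {{
    chain forward {{
        type filter hook forward priority 0; policy accept;
{rules}
        ip saddr {device_ip} drop
    }}
}}"""

# e.g., a smart plug allowed to reach only its cloud broker over MQTT/TLS
print(profile_to_nft("192.168.1.50", [("203.0.113.7", 8883)]))
```

Capturing interdependencies between connections, the article's key addition over MUD, would further require stateful rules, e.g., NFTables sets populated when a triggering connection is observed.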
The importance of ground Mobile Robots (MRs) and Unmanned Aerial Vehicles (UAVs) within the research community, industry, and society is growing rapidly. Many of these agents are now equipped with communication systems that are, in some cases, essential to successfully achieving certain tasks. In this context, we have begun to witness the development of a new interdisciplinary research field at the intersection of robotics and communications. This field has been boosted by the intention of integrating UAVs within 5G and 6G communication networks, and it will undoubtedly lead to many important applications in the near future. Nevertheless, one of the main obstacles to its development is that most researchers address these problems by oversimplifying either the robotics or the communications aspect, which prevents this new interdisciplinary research area from reaching its full potential. In this tutorial, we present some of the modelling tools necessary to address problems involving both robotics and communications from an interdisciplinary perspective. As an illustrative example of such problems, we focus on communication-aware trajectory planning.
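As a taste of the kind of problem the tutorial targets, one common formulation of communication-aware trajectory planning (the symbols are our illustrative assumptions, not the tutorial's notation) asks for a trajectory $p(t)$ that reaches a goal with minimal motion energy while sustaining a minimum data rate to a base station at position $q$:

$$\min_{p(\cdot)} \int_0^T \lVert \dot{p}(t) \rVert^2 \, \mathrm{d}t \quad \text{s.t.} \quad p(0)=p_0,\ \ p(T)=p_{\mathrm{goal}},\ \ B \log_2\!\Big(1 + \frac{P\, g(p(t), q)}{\sigma^2}\Big) \ge R_{\min} \ \ \forall t \in [0, T],$$

where $g(\cdot,\cdot)$ is the channel gain, $P$ the transmit power, $B$ the bandwidth, and $\sigma^2$ the noise power. Oversimplifying either side, e.g., treating $g$ as a disk-shaped coverage region or ignoring the robot's dynamics, is exactly the pitfall described above.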
Deep models, e.g., CNNs and Vision Transformers, have achieved impressive results in many vision tasks in the closed world. However, novel classes emerge from time to time in our ever-changing world, requiring a learning system to acquire new knowledge continually. For example, a robot needs to understand new instructions, and an opinion monitoring system should analyze emerging topics every day. Class-Incremental Learning (CIL) enables the learner to incorporate the knowledge of new classes incrementally and build a universal classifier over all seen classes. Correspondingly, when directly training the model on new class instances, a fatal problem occurs -- the model tends to catastrophically forget the characteristics of former classes, and its performance drastically degrades. There have been numerous efforts to tackle catastrophic forgetting in the machine learning community. In this paper, we comprehensively survey recent advances in deep class-incremental learning and summarize these methods from three aspects, i.e., data-centric, model-centric, and algorithm-centric. We also provide a rigorous and unified evaluation of 16 methods on benchmark image classification tasks to empirically characterize the different algorithms. Furthermore, we notice that the current comparison protocol ignores the influence of the memory budget in model storage, which may result in unfair comparisons and biased results. Hence, we advocate fair comparison by aligning the memory budget in evaluation, as well as several memory-agnostic performance measures. The source code to reproduce these evaluations is available at //github.com/zhoudw-zdw/CIL_Survey/
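To ground the data-centric aspect, here is a minimal exemplar-replay sketch, a generic baseline in the spirit of the surveyed methods rather than any specific algorithm; iCaRL-style herding selection is replaced by random sampling for brevity.

```python
import random
import torch
from torch import nn

class ReplayMemory:
    """Fixed-budget exemplar store (random selection; methods such as
    iCaRL use herding instead)."""
    def __init__(self, budget: int):
        self.budget, self.data = budget, []

    def add(self, xs: torch.Tensor, ys: torch.Tensor):
        self.data.extend(zip(xs, ys))
        self.data = random.sample(self.data, min(self.budget, len(self.data)))

    def sample(self, k: int):
        batch = random.sample(self.data, min(k, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)

def incremental_step(model: nn.Module, new_batches, memory: ReplayMemory, opt):
    """One class-incremental step: train on new-class batches mixed with
    replayed old-class exemplars, then bank exemplars of the new classes."""
    loss_fn = nn.CrossEntropyLoss()
    for x_new, y_new in new_batches:
        x, y = x_new, y_new
        if memory.data:  # mix in old classes once the memory is non-empty
            x_old, y_old = memory.sample(len(x_new))
            x, y = torch.cat([x, x_old]), torch.cat([y, y_old])
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    for x_new, y_new in new_batches:
        memory.add(x_new, y_new)
```

The `budget` parameter is precisely the memory knob whose alignment the survey argues for: a method with a larger exemplar store, or a larger model occupying the same storage, should not be credited for accuracy bought with extra memory.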
The vast amounts of data generated by networks of sensors, wearables, and Internet of Things (IoT) devices underscore the need for advanced modeling techniques that leverage the spatio-temporal structure of decentralized data, both because computation must happen at the edge and because of licensing (data access) issues. While federated learning (FL) has emerged as a framework for model training without requiring direct data sharing and exchange, effectively modeling the complex spatio-temporal dependencies to improve forecasting capabilities remains an open problem. On the other hand, state-of-the-art spatio-temporal forecasting models assume unfettered access to the data, neglecting constraints on data sharing. To bridge this gap, we propose a federated spatio-temporal model -- Cross-Node Federated Graph Neural Network (CNFGNN) -- which explicitly encodes the underlying graph structure using a graph neural network (GNN)-based architecture under the constraint of cross-node federated learning: data in a network of nodes is generated locally on each node and remains decentralized. CNFGNN operates by disentangling the modeling of temporal dynamics on devices from that of spatial dynamics on the server, using alternating optimization to reduce the communication cost and facilitate computation on edge devices. Experiments on the traffic flow forecasting task show that CNFGNN achieves the best forecasting performance in both transductive and inductive learning settings with no extra computation cost on edge devices, while incurring modest communication cost.
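A shape-level sketch of the split (the dimensions and the one-layer graph propagation are illustrative assumptions, not CNFGNN's exact architecture):

```python
import torch
from torch import nn

class NodeEncoder(nn.Module):
    """Runs on each device: encodes the node's own time series."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.gru = nn.GRU(1, hidden, batch_first=True)

    def forward(self, series: torch.Tensor) -> torch.Tensor:
        _, h = self.gru(series)   # series: (1, T, 1)
        return h.squeeze(0)       # hidden state: (1, hidden)

class ServerGNN(nn.Module):
    """Runs on the server: mixes node hidden states over the graph."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.mix = nn.Linear(hidden, hidden)

    def forward(self, H: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.mix(A @ H))  # one graph-propagation step

N, T = 4, 12
encoders = [NodeEncoder() for _ in range(N)]   # stay on the devices
A = torch.eye(N)                               # toy adjacency matrix
H = torch.cat([enc(torch.randn(1, T, 1)) for enc in encoders])  # uploads
print(ServerGNN()(H, A).shape)                 # torch.Size([4, 32])
```

Only the hidden states `H` cross the network; raw series never leave the devices. Alternating optimization then interleaves local encoder updates with server-side GNN updates, which is where the communication savings come from.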
We present CoDEx, a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. In terms of scope, CoDEx comprises three knowledge graphs varying in size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are plausible but verified to be false. To characterize CoDEx, we contribute thorough empirical analyses and benchmarking experiments. First, we analyze each CoDEx dataset in terms of logical relation patterns. Next, we report baseline link prediction and triple classification results on CoDEx for five extensively tuned embedding models. Finally, we differentiate CoDEx from the popular FB15K-237 knowledge graph completion dataset by showing that CoDEx covers more diverse and interpretable content, and is a more difficult link prediction benchmark. Data, code, and pretrained models are available at //bit.ly/2EPbrJs.
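For intuition, triple classification against hard negatives amounts to thresholding a model's plausibility score; a toy sketch follows, where the scorer and triples are fabricated stand-ins for a trained embedding model and CoDEx data.

```python
def triple_classification_accuracy(score_fn, positives, hard_negatives, threshold):
    """Predict true/false by thresholding the score of each
    (head, relation, tail) triple; accuracy is measured over verified
    positives and plausible-but-false hard negatives."""
    triples = positives + hard_negatives
    labels = [True] * len(positives) + [False] * len(hard_negatives)
    preds = [score_fn(*t) >= threshold for t in triples]
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

# Toy scorer standing in for a trained embedding model:
toy_scores = {("Marie_Curie", "occupation", "physicist"): 0.9,
              ("Marie_Curie", "occupation", "politician"): 0.4}
score = lambda h, r, t: toy_scores[(h, r, t)]
print(triple_classification_accuracy(
    score, [("Marie_Curie", "occupation", "physicist")],
    [("Marie_Curie", "occupation", "politician")], threshold=0.5))  # 1.0
```

Hard negatives like the second triple are type-consistent and superficially plausible, which is what makes such a classification task harder than one with random corruptions.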
We propose a knowledge-enhanced approach, ERNIE-ViL, to learn joint representations of vision and language. ERNIE-ViL constructs the detailed semantic connections (objects, attributes of objects, and relationships between objects in visual scenes) across vision and language, which are essential to vision-language cross-modal tasks. Incorporating knowledge from scene graphs, ERNIE-ViL introduces Scene Graph Prediction tasks, i.e., Object Prediction, Attribute Prediction, and Relationship Prediction, in the pre-training phase. More specifically, these prediction tasks are implemented by predicting nodes of different types in the scene graph parsed from the sentence. Thus, ERNIE-ViL can model joint representations characterizing the alignment of detailed semantics across vision and language. Pre-trained on two large image-text alignment datasets (Conceptual Captions and SBU), ERNIE-ViL learns better and more robust joint representations. After fine-tuning, it achieves state-of-the-art performance on five vision-language downstream tasks. Furthermore, it ranked first on the VCR leaderboard, with an absolute improvement of 3.7%.
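A toy sketch of the masking idea behind Scene Graph Prediction (the parse and the 30% masking rate are illustrative assumptions; ERNIE-ViL's tasks operate on a real scene graph parser and a transformer vocabulary):

```python
import random

def mask_scene_graph_nodes(tokens, scene_graph, mask_token="[MASK]", p=0.3):
    """Instead of masking random words, mask caption tokens that are
    scene-graph nodes (objects, attributes, relationships), so the
    model must recover detailed semantics from the paired image."""
    targets = {w for words in scene_graph.values() for w in words}
    return [mask_token if t in targets and random.random() < p else t
            for t in tokens]

caption = "a brown dog chases a white cat".split()
graph = {"objects": ["dog", "cat"],          # toy parse of the caption
         "attributes": ["brown", "white"],
         "relationships": ["chases"]}
print(mask_scene_graph_nodes(caption, graph))
```

Predicting `dog` or `brown` from the paired image forces vision-language alignment at the level of objects and attributes, rather than at the level of whole sentences.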
The problem of Multiple Object Tracking (MOT) consists of following the trajectories of different objects in a sequence, usually a video. In recent years, with the rise of Deep Learning, the algorithms that address this problem have benefited from the representational power of deep models. This paper provides a comprehensive survey of works that employ Deep Learning models to solve the task of MOT on single-camera videos. Four main steps in MOT algorithms are identified, and an in-depth review of how Deep Learning is employed in each of these stages is presented. A complete experimental comparison of the presented works on the three MOTChallenge datasets is also provided, identifying a number of similarities among the top-performing methods and presenting some possible future research directions.
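The abstract does not name the four steps; in the usual tracking-by-detection decomposition (our assumption) they are detection, feature extraction, affinity computation, and association. A minimal sketch of the final association step, matching tracks to detections with the Hungarian algorithm over a cosine-distance affinity matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_feats, det_feats, max_cost=0.5):
    """Build a track-vs-detection cost matrix and solve the assignment;
    pairs costlier than `max_cost` are rejected (new or lost targets).
    Deep MOT methods typically learn the affinity instead of using a
    fixed cosine distance."""
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    cost = 1.0 - t @ d.T
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]

tracks = np.random.rand(3, 16)   # appearance features of active tracks
dets = np.random.rand(4, 16)     # appearance features of new detections
print(associate(tracks, dets))   # matched (track, detection) index pairs
```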
Named entity recognition (NER) in Chinese is essential but difficult because of the lack of natural delimiters. Therefore, Chinese Word Segmentation (CWS) is usually considered the first step for Chinese NER. However, models based on word-level embeddings and lexicon features often suffer from segmentation errors and out-of-vocabulary (OOV) words. In this paper, we investigate a Convolutional Attention Network, called CAN, for Chinese NER; it consists of a character-based convolutional neural network (CNN) with a local-attention layer and a gated recurrent unit (GRU) with a global self-attention layer to capture information from adjacent characters and sentence contexts. Moreover, unlike other models, CAN depends on no external resources such as lexicons and uses small character embeddings, which makes it more practical. Extensive experimental results show that our approach, without word embeddings or external lexicon resources, outperforms state-of-the-art methods on datasets from different domains, including the Weibo, MSRA, and Chinese Resume NER datasets.
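A shape-level sketch of the described architecture (layer sizes are our assumptions, the paper's local-attention layer is approximated by a plain convolution, and the usual CRF decoding layer is omitted for brevity):

```python
import torch
from torch import nn

class CANSketch(nn.Module):
    """Character CNN for adjacent-character patterns, BiGRU plus global
    self-attention for sentence context, linear layer for per-character
    NER tag scores."""
    def __init__(self, n_chars: int, n_tags: int, emb: int = 64, hid: int = 128):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb)  # small char embeddings
        self.conv = nn.Conv1d(emb, emb, kernel_size=3, padding=1)  # local context
        self.gru = nn.GRU(emb, hid // 2, bidirectional=True, batch_first=True)
        self.attn = nn.MultiheadAttention(hid, num_heads=4, batch_first=True)
        self.out = nn.Linear(hid, n_tags)

    def forward(self, chars: torch.Tensor) -> torch.Tensor:
        x = self.emb(chars)                               # (B, L, emb)
        x = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.gru(x)                                # (B, L, hid)
        g, _ = self.attn(h, h, h)                         # global self-attention
        return self.out(g)                                # (B, L, n_tags)

model = CANSketch(n_chars=5000, n_tags=9)
print(model(torch.randint(0, 5000, (2, 20))).shape)  # torch.Size([2, 20, 9])
```

Because everything is built from character embeddings, no segmenter or lexicon enters the pipeline, which is the practicality claim above.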