This paper implements and analyzes multiple networks with the goal of understanding their suitability for edge device applications such as X-ray threat detection. In this study, we use the state-of-the-art YOLO object detection model to solve this task of detecting threats in security baggage screening images. We designed and studied three models - Tiny YOLO, QCFS Tiny YOLO, and SNN Tiny YOLO. We utilize an alternative activation function calculated to have zero expected conversion error with the activation of a spiking activation function in our Tiny YOLOv7 model. This \textit{QCFS} version of the Tiny YOLO replicates the activation function from ultra-low latency and high-efficiency SNN architecture. It achieves state-of-the-art performance on CLCXray, an open-source X-ray threat Detection dataset. In addition, we also study the behavior of a Spiking Tiny YOLO on the same X-ray threat Detection dataset.
Recent unsupervised anomaly detection methods often rely on feature extractors pretrained with auxiliary datasets or on well-crafted anomaly-simulated samples. However, this might limit their adaptability to an increasing set of anomaly detection tasks due to the priors in the selection of auxiliary datasets or the strategy of anomaly simulation. To tackle this challenge, we first introduce a prior-less anomaly generation paradigm and subsequently develop an innovative unsupervised anomaly detection framework named GRAD, grounded in this paradigm. GRAD comprises three essential components: (1) a diffusion model (PatchDiff) to generate contrastive patterns by preserving the local structures while disregarding the global structures present in normal images, (2) a self-supervised reweighting mechanism to handle the challenge of long-tailed and unlabeled contrastive patterns generated by PatchDiff, and (3) a lightweight patch-level detector to efficiently distinguish the normal patterns and reweighted contrastive patterns. The generation results of PatchDiff effectively expose various types of anomaly patterns, e.g. structural and logical anomaly patterns. In addition, extensive experiments on both MVTec AD and MVTec LOCO datasets also support the aforementioned observation and demonstrate that GRAD achieves competitive anomaly detection accuracy and superior inference speed.
In this paper, we introduce MVSparse, a novel and efficient framework for cooperative multi-person tracking across multiple synchronized cameras. The MVSparse system is comprised of a carefully orchestrated pipeline, combining edge server-based models with distributed lightweight Reinforcement Learning (RL) agents operating on individual cameras. These RL agents intelligently select informative blocks within each frame based on historical camera data and detection outcomes from neighboring cameras, significantly reducing computational load and communication overhead. The edge server aggregates multiple camera views to perform detection tasks and provides feedback to the individual agents. By projecting inputs from various perspectives onto a common ground plane and applying deep detection models, MVSparse optimally leverages temporal and spatial redundancy in multi-view videos. Notably, our contributions include an empirical analysis of multi-camera pedestrian tracking datasets, the development of a multi-camera, multi-person detection pipeline, and the implementation of MVSparse, yielding impressive results on both open datasets and real-world scenarios. Experimentally, MVSparse accelerates overall inference time by 1.88X and 1.60X compared to a baseline approach while only marginally compromising tracking accuracy by 2.27% and 3.17%, respectively, showcasing its promising potential for efficient multi-camera tracking applications.
This paper analyzes the benefits of sampling intraday returns in intrinsic time for the standard and pre-averaging realized variance (RV) estimators. We theoretically show in finite samples and asymptotically that the RV estimator is most efficient under the new concept of realized business time, which samples according to a combination of observed trades and estimated tick variance. Our asymptotic results carry over to the pre-averaging RV estimator under market microstructure noise. The analysis builds on the assumption that asset prices follow a diffusion that is time-changed with a jump process that separately models the transaction times. This provides a flexible model that separately captures the empirically varying trading intensity and tick variance processes, which are particularly relevant for disentangling the driving forces of the sampling schemes. Extensive simulations confirm our theoretical results and show that realized business time remains superior also under more general noise and process specifications. An application to stock data provides empirical evidence for the benefits of using realized business time sampling to construct more efficient RV estimators as well as for an improved forecasting performance.
Whereas distributed computing research has been very successful in exploring the solvability/impossibility border of distributed computing problems like consensus in representative classes of computing models with respect to model parameters like failure bounds, this is not the case for characterizing necessary and sufficient communication requirements. In this paper, we introduce network abstractions as a novel approach for modeling communication requirements in asynchronous distributed systems. A network abstraction of a run is a sequence of directed graphs on the set of processes, where the $i$-th graph defines the potential message chains that may arise in the $i$-th portion of the run. Formally, it is defined via associating (potential) message sending times with the corresponding message receiving times in a message schedule. Network abstractions allow to reason about the future causal cones that might arise in a run, hence also facilitate reasoning about liveness properties, and are inherently compatible with temporal epistemic reasoning frameworks. We demonstrate the utility of our approach by providing necessary and sufficient network abstractions for solving the canonical firing rebels with relay (FRR) problem, and variants thereof, in asynchronous systems with up to $f$ byzantine processes. FRR is not only a basic primitive in clock synchronization and consensus algorithms, but also integrates several distributed computing problems, namely triggering events, agreement and even stabilizing agreement, in a single problem instance.
The increasing reliance of drivers on navigation applications has made transportation networks more susceptible to data-manipulation attacks by malicious actors. Adversaries may exploit vulnerabilities in the data collection or processing of navigation services to inject false information, and to thus interfere with the drivers' route selection. Such attacks can significantly increase traffic congestions, resulting in substantial waste of time and resources, and may even disrupt essential services that rely on road networks. To assess the threat posed by such attacks, we introduce a computational framework to find worst-case data-injection attacks against transportation networks. First, we devise an adversarial model with a threat actor who can manipulate drivers by increasing the travel times that they perceive on certain roads. Then, we employ hierarchical multi-agent reinforcement learning to find an approximate optimal adversarial strategy for data manipulation. We demonstrate the applicability of our approach through simulating attacks on the Sioux Falls, ND network topology.
This research explores a novel approach in the realm of learning-based image registration, addressing the limitations inherent in weakly-supervised and unsupervised methods. Weakly-supervised techniques depend heavily on scarce labeled data, while unsupervised strategies rely on indirect measures of accuracy through image similarity. Notably, traditional supervised learning is not utilized due to the lack of precise deformation ground-truth in medical imaging. Our study introduces a unique training framework with On-the-Fly Guidance (OFG) to enhance existing models. This framework, during training, generates pseudo-ground truth a few steps ahead by refining the current deformation prediction with our custom optimizer. This pseudo-ground truth then serves to directly supervise the model in a supervised learning context. The process involves optimizing the predicted deformation with a limited number of steps, ensuring training efficiency and setting achievable goals for each training phase. OFG notably boosts the precision of existing image registration techniques while maintaining the speed of learning-based methods. We assessed our approach using various pseudo-ground truth generation strategies, including predictions and optimized outputs from established registration models. Our experiments spanned three benchmark datasets and three cutting-edge models, with OFG demonstrating significant and consistent enhancements, surpassing previous state-of-the-arts in the field. OFG offers an easily integrable plug-and-play solution to enhance the training effectiveness of learning-based image registration models. Code at //github.com/miraclefactory/on-the-fly-guidance.
Graphs are important data representations for describing objects and their relationships, which appear in a wide diversity of real-world scenarios. As one of a critical problem in this area, graph generation considers learning the distributions of given graphs and generating more novel graphs. Owing to their wide range of applications, generative models for graphs, which have a rich history, however, are traditionally hand-crafted and only capable of modeling a few statistical properties of graphs. Recent advances in deep generative models for graph generation is an important step towards improving the fidelity of generated graphs and paves the way for new kinds of applications. This article provides an extensive overview of the literature in the field of deep generative models for graph generation. Firstly, the formal definition of deep generative models for the graph generation and the preliminary knowledge are provided. Secondly, taxonomies of deep generative models for both unconditional and conditional graph generation are proposed respectively; the existing works of each are compared and analyzed. After that, an overview of the evaluation metrics in this specific domain is provided. Finally, the applications that deep graph generation enables are summarized and five promising future research directions are highlighted.
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
Named entity recognition (NER) is the task to identify text spans that mention named entities, and to classify them into predefined categories such as person, location, organization etc. NER serves as the basis for a variety of natural language applications such as question answering, text summarization, and machine translation. Although early NER systems are successful in producing decent recognition accuracy, they often require much human effort in carefully designing rules or features. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding stat-of-the-art performance. In this paper, we provide a comprehensive review on existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods for recent applied techniques of deep learning in new NER problem settings and applications. Finally, we present readers with the challenges faced by NER systems and outline future directions in this area.
Knowledge graphs capture interlinked information between entities and they represent an attractive source of structured information that can be harnessed for recommender systems. However, existing recommender engines use knowledge graphs by manually designing features, do not allow for end-to-end training, or provide poor scalability. Here we propose Knowledge Graph Convolutional Networks (KGCN), an end-to-end trainable framework that harnesses item relationships captured by the knowledge graph to provide better recommendations. Conceptually, KGCN computes user-specific item embeddings by first applying a trainable function that identifies important knowledge graph relations for a given user and then transforming the knowledge graph into a user-specific weighted graph. Then, KGCN applies a graph convolutional neural network that computes an embedding of an item node by propagating and aggregating knowledge graph neighborhood information. Moreover, to provide better inductive bias KGCN uses label smoothness (LS), which provides regularization over edge weights and we prove that it is equivalent to label propagation scheme on a graph. Finally, We unify KGCN and LS regularization, and present a scalable minibatch implementation for KGCN-LS model. Experiments show that KGCN-LS outperforms strong baselines in four datasets. KGCN-LS also achieves great performance in sparse scenarios and is highly scalable with respect to the knowledge graph size.