Bitcoin is the most common cryptocurrency involved in cyber scams. Cybercriminals often utilize pseudonymity and privacy protection mechanism associated with Bitcoin transactions to make their scams virtually untraceable. The Ponzi scheme has attracted particularly significant attention among Bitcoin fraudulent activities. This paper considers a multi-class classification problem to determine whether a transaction is involved in Ponzi schemes or other cyber scams, or is a non-scam transaction. We design a specifically designed crawler to collect data and propose a novel Attention-based Long Short-Term Memory (A-LSTM) method for the classification problem. The experimental results show that the proposed model has better efficiency and accuracy than existing approaches, including Random Forest, Extra Trees, Gradient Boosting, and classical LSTM. With correctly identified scam features, our proposed A-LSTM achieves an F1-score over 82% for the original data and outperforms the existing approaches.
A spectral approximation of a Boolean function is proposed for approximating the decision boundary of an ensemble of Deep Neural Networks (DNNs) solving two-class pattern recognition problems. The Walsh combination of relatively weak DNN classifiers is shown experimentally to be capable of detecting adversarial attacks. By observing the difference in Walsh coefficient approximation between clean and adversarial images, it appears that transferability of attack may be used for detection. Approximating the decision boundary may also aid in understanding the learning and transferability properties of DNNs. While the experiments here use images, the proposed approach of modelling two-class ensemble decision boundaries could in principle be applied to any application area. Code for this paper implementing Walsh Coefficient Examples of approximating artificial Boolean functions can be found at //doi.org/10.24433/CO.3695905.v1
The rapid development of social media provides a hotbed for the dissemination of fake news, which misleads readers and causes negative effects on society. News usually involves texts and images to be more vivid. Consequently, multi-modal fake news detection has received wide attention. Prior efforts primarily conduct multi-modal fusion by simple concatenation or co-attention mechanism, leading to sub-optimal performance. In this paper, we propose a novel mutual learning network based model MMNet, which enhances the multi-modal fusion for fake news detection via mutual learning between text- and vision-centered views towards the same classification objective. Specifically, we design two detection modules respectively based on text- and vision-centered multi-modal fusion features, and enable the mutual learning of the two modules to facilitate the multi-modal fusion, considering the latent consistency between the two modules towards the same training objective. Moreover, we also consider the influence of the image-text matching degree on news authenticity judgement by designing an image-text matching aware co-attention mechanism for multi-modal fusion. Extensive experiments are conducted on three benchmark datasets and the results demonstrate that our proposed MMNet achieves superior performance in fake news detection.
We propose a deep learning method for solving the American options model with a free boundary feature. To extract the free boundary known as the early exercise boundary from our proposed method, we introduce the Landau transformation. For efficient implementation of our proposed method, we further construct a dual solution framework consisting of a novel auxiliary function and free boundary equations. The auxiliary function is formulated to include the feed forward deep neural network (DNN) output and further mimic the far boundary behaviour, smooth pasting condition, and remaining boundary conditions due to the second-order space derivative and first-order time derivative. Because the early exercise boundary and its derivative are not a priori known, the boundary values mimicked by the auxiliary function are in approximate form. Concurrently, we then establish equations that approximate the early exercise boundary and its derivative directly from the DNN output based on some linear relationships at the left boundary. Furthermore, the option Greeks are obtained from the derivatives of this auxiliary function. We test our implementation with several examples and compare them with the existing numerical methods. All indicators show that our proposed deep learning method presents an efficient and alternative way of pricing options with early exercise features.
Real-time detection of anomalies in streaming data is receiving increasing attention as it allows us to raise alerts, predict faults, and detect intrusions or threats across industries. Yet, little attention has been given to compare the effectiveness and efficiency of anomaly detectors for streaming data (i.e., of online algorithms). In this paper, we present a qualitative, synthetic overview of major online detectors from different algorithmic families (i.e., distance, density, tree or projection-based) and highlight their main ideas for constructing, updating and testing detection models. Then, we provide a thorough analysis of the results of a quantitative experimental evaluation of online detection algorithms along with their offline counterparts. The behavior of the detectors is correlated with the characteristics of different datasets (i.e., meta-features), thereby providing a meta-level analysis of their performance. Our study addresses several missing insights from the literature such as (a) how reliable are detectors against a random classifier and what dataset characteristics make them perform randomly; (b) to what extent online detectors approximate the performance of offline counterparts; (c) which sketch strategy and update primitives of detectors are best to detect anomalies visible only within a feature subspace of a dataset; (d) what are the tradeoffs between the effectiveness and the efficiency of detectors belonging to different algorithmic families; (e) which specific characteristics of datasets yield an online algorithm to outperform all others.
Anomaly detection is important in many real-life applications. Recently, self-supervised learning has greatly helped deep anomaly detection by recognizing several geometric transformations. However these methods lack finer features, usually highly depend on the anomaly type, and do not perform well on fine-grained problems. To address these issues, we first introduce in this work three novel and efficient discriminative and generative tasks which have complementary strength: (i) a piece-wise jigsaw puzzle task focuses on structure cues; (ii) a tint rotation recognition is used within each piece, taking into account the colorimetry information; (iii) and a partial re-colorization task considers the image texture. In order to make the re-colorization task more object-oriented than background-oriented, we propose to include the contextual color information of the image border via an attention mechanism. We then present a new out-of-distribution detection function and highlight its better stability compared to existing methods. Along with it, we also experiment different score fusion functions. Finally, we evaluate our method on an extensive protocol composed of various anomaly types, from object anomalies, style anomalies with fine-grained classification to local anomalies with face anti-spoofing datasets. Our model significantly outperforms state-of-the-art with up to 36% relative error improvement on object anomalies and 40% on face anti-spoofing problems.
The essence of multivariate sequential learning is all about how to extract dependencies in data. These data sets, such as hourly medical records in intensive care units and multi-frequency phonetic time series, often time exhibit not only strong serial dependencies in the individual components (the "marginal" memory) but also non-negligible memories in the cross-sectional dependencies (the "joint" memory). Because of the multivariate complexity in the evolution of the joint distribution that underlies the data generating process, we take a data-driven approach and construct a novel recurrent network architecture, termed Memory-Gated Recurrent Networks (mGRN), with gates explicitly regulating two distinct types of memories: the marginal memory and the joint memory. Through a combination of comprehensive simulation studies and empirical experiments on a range of public datasets, we show that our proposed mGRN architecture consistently outperforms state-of-the-art architectures targeting multivariate time series.
Object detection with transformers (DETR) reaches competitive performance with Faster R-CNN via a transformer encoder-decoder architecture. Inspired by the great success of pre-training transformers in natural language processing, we propose a pretext task named random query patch detection to unsupervisedly pre-train DETR (UP-DETR) for object detection. Specifically, we randomly crop patches from the given image and then feed them as queries to the decoder. The model is pre-trained to detect these query patches from the original image. During the pre-training, we address two critical issues: multi-task learning and multi-query localization. (1) To trade-off multi-task learning of classification and localization in the pretext task, we freeze the CNN backbone and propose a patch feature reconstruction branch which is jointly optimized with patch detection. (2) To perform multi-query localization, we introduce UP-DETR from single-query patch and extend it to multi-query patches with object query shuffle and attention mask. In our experiments, UP-DETR significantly boosts the performance of DETR with faster convergence and higher precision on PASCAL VOC and COCO datasets. The code will be available soon.
The LSTM network was proposed to overcome the difficulty in learning long-term dependence, and has made significant advancements in applications. With its success and drawbacks in mind, this paper raises the question - do RNN and LSTM have long memory? We answer it partially by proving that RNN and LSTM do not have long memory from a statistical perspective. A new definition for long memory networks is further introduced, and it requires the model weights to decay at a polynomial rate. To verify our theory, we convert RNN and LSTM into long memory networks by making a minimal modification, and their superiority is illustrated in modeling long-term dependence of various datasets.
Over the past few years, we have seen fundamental breakthroughs in core problems in machine learning, largely driven by advances in deep neural networks. At the same time, the amount of data collected in a wide array of scientific domains is dramatically increasing in both size and complexity. Taken together, this suggests many exciting opportunities for deep learning applications in scientific settings. But a significant challenge to this is simply knowing where to start. The sheer breadth and diversity of different deep learning techniques makes it difficult to determine what scientific problems might be most amenable to these methods, or which specific combination of methods might offer the most promising first approach. In this survey, we focus on addressing this central issue, providing an overview of many widely used deep learning models, spanning visual, sequential and graph structured data, associated tasks and different training methods, along with techniques to use deep learning with less data and better interpret these complex models --- two central considerations for many scientific use cases. We also include overviews of the full design process, implementation tips, and links to a plethora of tutorials, research summaries and open-sourced deep learning pipelines and pretrained models, developed by the community. We hope that this survey will help accelerate the use of deep learning across different scientific domains.
Clinical Named Entity Recognition (CNER) aims to identify and classify clinical terms such as diseases, symptoms, treatments, exams, and body parts in electronic health records, which is a fundamental and crucial task for clinical and translational research. In recent years, deep neural networks have achieved significant success in named entity recognition and many other Natural Language Processing (NLP) tasks. Most of these algorithms are trained end to end, and can automatically learn features from large scale labeled datasets. However, these data-driven methods typically lack the capability of processing rare or unseen entities. Previous statistical methods and feature engineering practice have demonstrated that human knowledge can provide valuable information for handling rare and unseen cases. In this paper, we address the problem by incorporating dictionaries into deep neural networks for the Chinese CNER task. Two different architectures that extend the Bi-directional Long Short-Term Memory (Bi-LSTM) neural network and five different feature representation schemes are proposed to handle the task. Computational results on the CCKS-2017 Task 2 benchmark dataset show that the proposed method achieves the highly competitive performance compared with the state-of-the-art deep learning methods.