Price movement forecasting aims at predicting the future trends of financial assets based on the current market conditions and other relevant information. Recently, machine learning(ML) methods have become increasingly popular and achieved promising results for price movement forecasting in both academia and industry. Most existing ML solutions formulate the forecasting problem as a classification(to predict the direction) or a regression(to predict the return) problem in the entire set of training data. However, due to the extremely low signal-to-noise ratio and stochastic nature of financial data, good trading opportunities are extremely scarce. As a result, without careful selection of potentially profitable samples, such ML methods are prone to capture the patterns of noises instead of real signals. To address the above issues, we propose a novel framework-LARA(Locality-Aware Attention and Adaptive Refined Labeling), which contains the following three components: 1)Locality-aware attention automatically extracts the potentially profitable samples by attending to their label information in order to construct a more accurate classifier on these selected samples. 2)Adaptive refined labeling further iteratively refines the labels, alleviating the noise of samples. 3)Equipped with metric learning techniques, Locality-aware attention enjoys task-specific distance metrics and distributes attention on potentially profitable samples in a more effective way. To validate our method, we conduct comprehensive experiments on three real-world financial markets: ETFs, the China's A-share stock market, and the cryptocurrency market. LARA achieves superior performance compared with the time-series analysis methods and a set of machine learning based competitors on the Qlib platform. Extensive ablation studies and experiments demonstrate that LARA indeed captures more reliable trading opportunities.
The micro-segmentation of customers in the finance sector is a non-trivial task and has been an atypical omission from recent scientific literature. Where traditional segmentation classifies customers based on coarse features such as demographics, micro-segmentation depicts more nuanced differences between individuals, bringing forth several advantages including the potential for improved personalization in financial services. AI and representation learning offer a unique opportunity to solve the problem of micro-segmentation. Although ubiquitous in many industries, the proliferation of AI in sensitive industries such as finance has become contingent on the imperatives of responsible AI. We had previously solved the micro-segmentation problem by extracting temporal features from the state space of a recurrent neural network (RNN). However, due to the inherent opacity of RNNs our solution lacked an explanation - one of the imperatives of responsible AI. In this study, we address this issue by extracting an explanation for and providing an interpretation of our temporal features. We investigate the state space of our RNN and through a linear regression model reconstruct the trajectories in the state space with high fidelity. We show that our linear regression coefficients have not only learned the rules used to create the RNN's output data but have also learned the relationships that were not directly evident in the raw data.
Network intrusion detection systems (NIDS) are one of several solutions that make up a computer security system. They are responsible for inspecting network traffic and triggering alerts when detecting intrusion attempts. One of the most popular approaches in NIDS research today is the Anomaly-based technique, characterized by the ability to recognize previously unobserved attacks. Some A-NIDS systems go beyond the separation into normal and anomalous classes by trying to identify the type of detected anomalies. This is an important capability of a security system, as it allows a more effective response to an intrusion attempt. The existing systems with this ability are often subject to limitations such as high complexity and incorrect labeling of unknown attacks. In this work, we propose an algorithm to be used in NIDS that overcomes these limitations. Our proposal is an adaptation of the Anomaly-based classifier EFC to perform multi-class classification. It has a single layer, with low temporal complexity, and can correctly classify not only the known attacks, but also unprecedented attacks. Our proposal was evaluated in two up-to-date flow-based intrusion detection datasets: CIDDS-001 and CICIDS2017. We also conducted a specific experiment to assess our classifier's ability to correctly label unknown attacks. Our results show that the multi-class EFC is a promising classifier to be used in NIDS.
Mobile devices have become an indispensable component of modern life. Their high storage capacity gives these devices the capability to store vast amounts of sensitive personal data, which makes them a high-value target: these devices are routinely stolen by criminals for data theft, and are increasingly viewed by law enforcement agencies as a valuable source of forensic data. Over the past several years, providers have deployed a number of advanced cryptographic features intended to protect data on mobile devices, even in the strong setting where an attacker has physical access to a device. Many of these techniques draw from the research literature, but have been adapted to this entirely new problem setting. This involves a number of novel challenges, which are incompletely addressed in the literature. In this work, we outline those challenges, and systematize the known approaches to securing user data against extraction attacks. Our work proposes a methodology that researchers can use to analyze cryptographic data confidentiality for mobile devices. We evaluate the existing literature for securing devices against data extraction adversaries with powerful capabilities including access to devices and to the cloud services they rely on. We then analyze existing mobile device confidentiality measures to identify research areas that have not received proper attention from the community and represent opportunities for future research.
In this study, we demonstrate that the norm test and inner product/orthogonality test presented in \cite{Bol18} are equivalent in terms of the convergence rates associated with Stochastic Gradient Descent (SGD) methods if $\epsilon^2=\theta^2+\nu^2$ with specific choices of $\theta$ and $\nu$. Here, $\epsilon$ controls the relative statistical error of the norm of the gradient while $\theta$ and $\nu$ control the relative statistical error of the gradient in the direction of the gradient and in the direction orthogonal to the gradient, respectively. Furthermore, we demonstrate that the inner product/orthogonality test can be as inexpensive as the norm test in the best case scenario if $\theta$ and $\nu$ are optimally selected, but the inner product/orthogonality test will never be more computationally affordable than the norm test if $\epsilon^2=\theta^2+\nu^2$. Finally, we present two stochastic optimization problems to illustrate our results.
Time series forecasting (TSF) is fundamentally required in many real-world applications, such as electricity consumption planning and sales forecasting. In e-commerce, accurate time-series sales forecasting (TSSF) can significantly increase economic benefits. TSSF in e-commerce aims to predict future sales of millions of products. The trend and seasonality of products vary a lot, and the promotion activity heavily influences sales. Besides the above difficulties, we can know some future knowledge in advance except for the historical statistics. Such future knowledge may reflect the influence of the future promotion activity on current sales and help achieve better accuracy. However, most existing TSF methods only predict the future based on historical information. In this work, we make up for the omissions of future knowledge. Except for introducing future knowledge for prediction, we propose Aliformer based on the bidirectional Transformer, which can utilize the historical information, current factor, and future knowledge to predict future sales. Specifically, we design a knowledge-guided self-attention layer that uses known knowledge's consistency to guide the transmission of timing information. And the future-emphasized training strategy is proposed to make the model focus more on the utilization of future knowledge. Extensive experiments on four public benchmark datasets and one proposed large-scale industrial dataset from Tmall demonstrate that Aliformer can perform much better than state-of-the-art TSF methods. Aliformer has been deployed for goods selection on Tmall Industry Tablework, and the dataset will be released upon approval.
Traffic forecasting is an important factor for the success of intelligent transportation systems. Deep learning models including convolution neural networks and recurrent neural networks have been applied in traffic forecasting problems to model the spatial and temporal dependencies. In recent years, to model the graph structures in the transportation systems as well as the contextual information, graph neural networks (GNNs) are introduced as new tools and have achieved the state-of-the-art performance in a series of traffic forecasting problems. In this survey, we review the rapidly growing body of recent research using different GNNs, e.g., graph convolutional and graph attention networks, in various traffic forecasting problems, e.g., road traffic flow and speed forecasting, passenger flow forecasting in urban rail transit systems, demand forecasting in ride-hailing platforms, etc. We also present a collection of open data and source resources for each problem, as well as future research directions. To the best of our knowledge, this paper is the first comprehensive survey that explores the application of graph neural networks for traffic forecasting problems. We have also created a public Github repository to update the latest papers, open data and source resources.
Predicting the road traffic speed is a challenging task due to different types of roads, abrupt speed changes, and spatial dependencies between roads, which requires the modeling of dynamically changing spatial dependencies among roads and temporal patterns over long input sequences. This paper proposes a novel Spatio-Temporal Graph Attention (STGRAT) that effectively captures the spatio-temporal dynamics in road networks. The features of our approach mainly include spatial attention, temporal attention, and spatial sentinel vectors. The spatial attention takes the graph structure information (e.g., distance between roads) and dynamically adjusts spatial correlation based on road states. The temporal attention is responsible for capturing traffic speed changes, while the sentinel vectors allow the model to retrieve new features from spatially correlated nodes or preserve existing features. The experimental results show that STGRAT outperforms existing models, especially in difficult conditions where traffic speeds rapidly change (e.g., rush hours). We additionally provide a qualitative study to analyze when and where STGRAT mainly attended to make accurate predictions during a rush-hour time.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc, and 2) the instance-level shift, such as object appearance, size, etc. We build our approach based on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.
In this paper, we study object detection using a large pool of unlabeled images and only a few labeled images per category, named "few-example object detection". The key challenge consists in generating trustworthy training samples as many as possible from the pool. Using few training examples as seeds, our method iterates between model training and high-confidence sample selection. In training, easy samples are generated first and, then the poorly initialized model undergoes improvement. As the model becomes more discriminative, challenging but reliable samples are selected. After that, another round of model improvement takes place. To further improve the precision and recall of the generated training samples, we embed multiple detection models in our framework, which has proven to outperform the single model baseline and the model ensemble method. Experiments on PASCAL VOC'07, MS COCO'14, and ILSVRC'13 indicate that by using as few as three or four samples selected for each category, our method produces very competitive results when compared to the state-of-the-art weakly-supervised approaches using a large number of image-level labels.
Recently, deep learning has achieved very promising results in visual object tracking. Deep neural networks in existing tracking methods require a lot of training data to learn a large number of parameters. However, training data is not sufficient for visual object tracking as annotations of a target object are only available in the first frame of a test sequence. In this paper, we propose to learn hierarchical features for visual object tracking by using tree structure based Recursive Neural Networks (RNN), which have fewer parameters than other deep neural networks, e.g. Convolutional Neural Networks (CNN). First, we learn RNN parameters to discriminate between the target object and background in the first frame of a test sequence. Tree structure over local patches of an exemplar region is randomly generated by using a bottom-up greedy search strategy. Given the learned RNN parameters, we create two dictionaries regarding target regions and corresponding local patches based on the learned hierarchical features from both top and leaf nodes of multiple random trees. In each of the subsequent frames, we conduct sparse dictionary coding on all candidates to select the best candidate as the new target location. In addition, we online update two dictionaries to handle appearance changes of target objects. Experimental results demonstrate that our feature learning algorithm can significantly improve tracking performance on benchmark datasets.