Dynamic time warping (DTW) is an effective dissimilarity measure in many time series applications. Despite its popularity, it is prone to noises and outliers, which leads to singularity problem and bias in the measurement. The time complexity of DTW is quadratic to the length of time series, making it inapplicable in real-time applications. In this paper, we propose a novel time series dissimilarity measure named RobustDTW to reduce the effects of noises and outliers. Specifically, the RobustDTW estimates the trend and optimizes the time warp in an alternating manner by utilizing our designed temporal graph trend filtering. To improve efficiency, we propose a multi-level framework that estimates the trend and the warp function at a lower resolution, and then repeatedly refines them at a higher resolution. Based on the proposed RobustDTW, we further extend it to periodicity detection and outlier time series detection. Experiments on real-world datasets demonstrate the superior performance of RobustDTW compared to DTW variants in both outlier time series detection and periodicity detection.
Subsequence anomaly detection in long sequences is an important problem with applications in a wide range of domains. However, the approaches proposed so far in the literature have severe limitations: they either require prior domain knowledge used to design the anomaly discovery algorithms, or become cumbersome and expensive to use in situations with recurrent anomalies of the same type. In this work, we address these problems, and propose an unsupervised method suitable for domain agnostic subsequence anomaly detection. Our method, Series2Graph, is based on a graph representation of a novel low-dimensionality embedding of subsequences. Series2Graph needs neither labeled instances (like supervised techniques) nor anomaly-free data (like zero-positive learning techniques), and identifies anomalies of varying lengths. The experimental results, on the largest set of synthetic and real datasets used to date, demonstrate that the proposed approach correctly identifies single and recurrent anomalies without any prior knowledge of their characteristics, outperforming by a large margin several competing approaches in accuracy, while being up to orders of magnitude faster. This paper has appeared in VLDB 2020.
Machine learning systems, especially the methods based on deep learning, enjoy great success in modern computer vision tasks under experimental settings. Generally, these classic deep learning methods are built on the \emph{i.i.d.} assumption, supposing the training and test data are drawn from a similar distribution independently and identically. However, the aforementioned \emph{i.i.d.} assumption is in general unavailable in the real-world scenario, and as a result, leads to sharp performance decay of deep learning algorithms. Behind this, domain shift is one of the primary factors to be blamed. In order to tackle this problem, we propose using \textbf{Po}tential \textbf{E}nergy \textbf{R}anking (PoER) to decouple the object feature and the domain feature (\emph{i.e.,} appearance feature) in given images, promoting the learning of label-discriminative features while filtering out the irrelevant correlations between the objects and the background. PoER helps the neural networks to capture label-related features which contain the domain information first in shallow layers and then distills the label-discriminative representations out progressively, enforcing the neural networks to be aware of the characteristic of objects and background which is vital to the generation of domain-invariant features. PoER reports superior performance on domain generalization benchmarks, improving the average top-1 accuracy by at least 1.20\% compared to the existing methods. Moreover, we use PoER in the ECCV 2022 NICO Challenge\footnote{//nicochallenge.com}, achieving top place with only a vanilla ResNet-18. The code has been made available at //github.com/ForeverPs/PoER.
This paper researches the unexplored task-point cloud salient object detection (SOD). Differing from SOD for images, we find the attention shift of point clouds may provoke saliency conflict, i.e., an object paradoxically belongs to salient and non-salient categories. To eschew this issue, we present a novel view-dependent perspective of salient objects, reasonably reflecting the most eye-catching objects in point cloud scenarios. Following this formulation, we introduce PCSOD, the first dataset proposed for point cloud SOD consisting of 2,872 in-/out-door 3D views. The samples in our dataset are labeled with hierarchical annotations, e.g., super-/sub-class, bounding box, and segmentation map, which endows the brilliant generalizability and broad applicability of our dataset verifying various conjectures. To evidence the feasibility of our solution, we further contribute a baseline model and benchmark five representative models for a comprehensive comparison. The proposed model can effectively analyze irregular and unordered points for detecting salient objects. Thanks to incorporating the task-tailored designs, our method shows visible superiority over other baselines, producing more satisfactory results. Extensive experiments and discussions reveal the promising potential of this research field, paving the way for further study.
Since the inception of Bitcoin in 2009, the market of cryptocurrencies has grown beyond initial expectations as daily trades exceed $10 billion. As industries become automated, the need for an automated fraud detector becomes very apparent. Detecting anomalies in real time prevents potential accidents and economic losses. Anomaly detection in multivariate time series data poses a particular challenge because it requires simultaneous consideration of temporal dependencies and relationships between variables. Identifying an anomaly in real time is not an easy task specifically because of the exact anomalistic behavior they observe. Some points may present pointwise global or local anomalistic behavior, while others may be anomalistic due to their frequency or seasonal behavior or due to a change in the trend. In this paper we suggested working on real time series of trades of Ethereum from specific accounts and surveyed a large variety of different algorithms traditional and new. We categorized them according to the strategy and the anomalistic behavior which they search and showed that when bundling them together to different groups, they can prove to be a good real-time detector with an alarm time of no longer than a few seconds and with very high confidence.
Modern convolutional neural networks (CNNs)-based face detectors have achieved tremendous strides due to large annotated datasets. However, misaligned results with high detection confidence but low localization accuracy restrict the further improvement of detection performance. In this paper, the authors first predict high confidence detection results on the training set itself. Surprisingly, a considerable part of them exist in the same misalignment problem. Then, the authors carefully examine these cases and point out that annotation misalignment is the main reason. Later, a comprehensive discussion is given for the replacement rationality between predicted and annotated bounding-boxes. Finally, the authors propose a novel Bounding-Box Deep Calibration (BDC) method to reasonably replace misaligned annotations with model predicted bounding-boxes and offer calibrated annotations for the training set. Extensive experiments on multiple detectors and two popular benchmark datasets show the effectiveness of BDC on improving models' precision and recall rate, without adding extra inference time and memory consumption. Our simple and effective method provides a general strategy for improving face detection, especially for light-weight detectors in real-time situations.
While camera and LiDAR are widely used in most of the assisted and autonomous driving systems, only a few works have been proposed to associate the temporal synchronization and extrinsic calibration for camera and LiDAR which are dedicated to online sensors data fusion. The temporal and spatial calibration technologies are facing the challenges of lack of relevance and real-time. In this paper, we introduce the pose estimation model and environmental robust line features extraction to improve the relevance of data fusion and instant online ability of correction. Dynamic targets eliminating aims to seek optimal policy considering the correspondence of point cloud matching between adjacent moments. The searching optimization process aims to provide accurate parameters with both computation accuracy and efficiency. To demonstrate the benefits of this method, we evaluate it on the KITTI benchmark with ground truth value. In online experiments, our approach improves the accuracy by 38.5\% than the soft synchronization method in temporal calibration. While in spatial calibration, our approach automatically corrects disturbance errors within 0.4 second and achieves an accuracy of 0.3-degree. This work can promote the research and application of sensor fusion.
The perceptive models of autonomous driving require fast inference within a low latency for safety. While existing works ignore the inevitable environmental changes after processing, streaming perception jointly evaluates the latency and accuracy into a single metric for video online perception, guiding the previous works to search trade-offs between accuracy and speed. In this paper, we explore the performance of real time models on this metric and endow the models with the capacity of predicting the future, significantly improving the results for streaming perception. Specifically, we build a simple framework with two effective modules. One is a Dual Flow Perception module (DFP). It consists of dynamic flow and static flow in parallel to capture moving tendency and basic detection feature, respectively. Trend Aware Loss (TAL) is the other module which adaptively generates loss weight for each object with its moving speed. Realistically, we consider multiple velocities driving scene and further propose Velocity-awared streaming AP (VsAP) to jointly evaluate the accuracy. In this realistic setting, we design a efficient mix-velocity training strategy to guide detector perceive any velocities. Our simple method achieves the state-of-the-art performance on Argoverse-HD dataset and improves the sAP and VsAP by 4.7% and 8.2% respectively compared to the strong baseline, validating its effectiveness.
It is a common paradigm in object detection frameworks to treat all samples equally and target at maximizing the performance on average. In this work, we revisit this paradigm through a careful study on how different samples contribute to the overall performance measured in terms of mAP. Our study suggests that the samples in each mini-batch are neither independent nor equally important, and therefore a better classifier on average does not necessarily mean higher mAP. Motivated by this study, we propose the notion of Prime Samples, those that play a key role in driving the detection performance. We further develop a simple yet effective sampling and learning strategy called PrIme Sample Attention (PISA) that directs the focus of the training process towards such samples. Our experiments demonstrate that it is often more effective to focus on prime samples than hard samples when training a detector. Particularly, On the MSCOCO dataset, PISA outperforms the random sampling baseline and hard mining schemes, e.g. OHEM and Focal Loss, consistently by more than 1% on both single-stage and two-stage detectors, with a strong backbone ResNeXt-101.
The prevalence of networked sensors and actuators in many real-world systems such as smart buildings, factories, power plants, and data centers generate substantial amounts of multivariate time series data for these systems. The rich sensor data can be continuously monitored for intrusion events through anomaly detection. However, conventional threshold-based anomaly detection methods are inadequate due to the dynamic complexities of these systems, while supervised machine learning methods are unable to exploit the large amounts of data due to the lack of labeled data. On the other hand, current unsupervised machine learning approaches have not fully exploited the spatial-temporal correlation and other dependencies amongst the multiple variables (sensors/actuators) in the system for detecting anomalies. In this work, we propose an unsupervised multivariate anomaly detection method based on Generative Adversarial Networks (GANs). Instead of treating each data stream independently, our proposed MAD-GAN framework considers the entire variable set concurrently to capture the latent interactions amongst the variables. We also fully exploit both the generator and discriminator produced by the GAN, using a novel anomaly score called DR-score to detect anomalies by discrimination and reconstruction. We have tested our proposed MAD-GAN using two recent datasets collected from real-world CPS: the Secure Water Treatment (SWaT) and the Water Distribution (WADI) datasets. Our experimental results showed that the proposed MAD-GAN is effective in reporting anomalies caused by various cyber-intrusions compared in these complex real-world systems.
The low resolution of objects of interest in aerial images makes pedestrian detection and action detection extremely challenging tasks. Furthermore, using deep convolutional neural networks to process large images can be demanding in terms of computational requirements. In order to alleviate these challenges, we propose a two-step, yes and no question answering framework to find specific individuals doing one or multiple specific actions in aerial images. First, a deep object detector, Single Shot Multibox Detector (SSD), is used to generate object proposals from small aerial images. Second, another deep network, is used to learn a latent common sub-space which associates the high resolution aerial imagery and the pedestrian action labels that are provided by the human-based sources