Time series refer to a series of data points indexed in time order, which can be found in various fields, e.g., transportation, healthcare, and finance. Accurate time series forecasting can enhance optimization planning and decision-making support. Time series have multi-scale characteristics, i.e., different temporal patterns at different scales, which presents a challenge for time series forecasting. In this paper, we propose TPRNN, a Top-down Pyramidal Recurrent Neural Network for time series forecasting. We first construct subsequences of different scales from the input, forming a pyramid structure. Then by executing a multi-scale information interaction module from top to bottom, we model both the temporal dependencies of each scale and the influences of subsequences of different scales, resulting in a complete modeling of multi-scale temporal patterns in time series. Experiments on seven real-world datasets demonstrate that TPRNN has achieved the state-of-the-art performance with an average improvement of 8.13% in MSE compared to the best baseline.
In recent years, the foundation models have swept the computer vision field and facilitated the development of various tasks within different modalities. However, it remains an open question on how to design an infrared foundation model. In this paper, we propose InfMAE, a foundation model in infrared modality. We release an infrared dataset, called Inf30 to address the problem of lacking large-scale data for self-supervised learning in the infrared vision community. Besides, we design an information-aware masking strategy, which is suitable for infrared images. This masking strategy allows for a greater emphasis on the regions with richer information in infrared images during the self-supervised learning process, which is conducive to learning the generalized representation. In addition, we adopt a multi-scale encoder to enhance the performance of the pre-trained encoders in downstream tasks. Finally, based on the fact that infrared images do not have a lot of details and texture information, we design an infrared decoder module, which further improves the performance of downstream tasks. Extensive experiments show that our proposed method InfMAE outperforms other supervised methods and self-supervised learning methods in three downstream tasks. Our code will be made public at //github.com/liufangcen/InfMAE.
In recent years, significant progress has been made in the field of robotic reinforcement learning (RL), enabling methods that handle complex image observations, train in the real world, and incorporate auxiliary data, such as demonstrations and prior experience. However, despite these advances, robotic RL remains hard to use. It is acknowledged among practitioners that the particular implementation details of these algorithms are often just as important (if not more so) for performance as the choice of algorithm. We posit that a significant challenge to widespread adoption of robotic RL, as well as further development of robotic RL methods, is the comparative inaccessibility of such methods. To address this challenge, we developed a carefully implemented library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment, a high-quality controller for a widely-adopted robot, and a number of challenging example tasks. We provide this library as a resource for the community, describe its design choices, and present experimental results. Perhaps surprisingly, we find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation between 25 to 50 minutes of training per policy on average, improving over state-of-the-art results reported for similar tasks in the literature. These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent recovery and correction behaviors. We hope that these promising results and our high-quality open-source implementation will provide a tool for the robotics community to facilitate further developments in robotic RL. Our code, documentation, and videos can be found at //serl-robot.github.io/
Data sonification-mapping data variables to auditory variables, such as pitch or volume-is used for data accessibility, scientific exploration, and data-driven art (e.g., museum exhibitions) among others. While a substantial amount of research has been made on effective and intuitive sonification design, software support is not commensurate, limiting researchers from fully exploring its capabilities. We contribute Erie, a declarative grammar for data sonification, that enables abstractly expressing auditory mappings. Erie supports specifying extensible tone designs (e.g., periodic wave, sampling, frequency/amplitude modulation synthesizers), various encoding channels, auditory legends, and composition options like sequencing and overlaying. Using standard Web Audio and Web Speech APIs, we provide an Erie compiler for web environments. We demonstrate the expressiveness and feasibility of Erie by replicating research prototypes presented by prior work and provide a sonification design gallery. We discuss future steps to extend Erie toward other audio computing environments and support interactive data sonification.
In real-world applications, there is often a domain shift from training to test data. This observation resulted in the development of test-time adaptation (TTA). It aims to adapt a pre-trained source model to the test data without requiring access to the source data. Thereby, most existing works are limited to the closed-set assumption, i.e. there is no category shift between source and target domain. We argue that in a realistic open-world setting a category shift can appear in addition to a domain shift. This means, individual source classes may not appear in the target domain anymore, samples of new classes may be part of the target domain or even both at the same time. Moreover, in many real-world scenarios the test data is not accessible all at once but arrives sequentially as a stream of batches demanding an immediate prediction. Hence, TTA must be applied in an online manner. To the best of our knowledge, the combination of these aspects, i.e. online source-free universal domain adaptation (online SF-UniDA), has not been studied yet. In this paper, we introduce a Contrastive Mean Teacher (COMET) tailored to this novel scenario. It applies a contrastive loss to rebuild a feature space where the samples of known classes build distinct clusters and the samples of new classes separate well from them. It is complemented by an entropy loss which ensures that the classifier output has a small entropy for samples of known classes and a large entropy for samples of new classes to be easily detected and rejected as unknown. To provide the losses with reliable pseudo labels, they are embedded into a mean teacher (MT) framework. We evaluate our method across two datasets and all category shifts to set an initial benchmark for online SF-UniDA. Thereby, COMET yields state-of-the-art performance and proves to be consistent and robust across a variety of different scenarios.
Handwritten character recognition (HCR) is a challenging problem for machine learning researchers. Unlike printed text data, handwritten character datasets have more variation due to human-introduced bias. With numerous unique character classes present, some data, such as Logographic Scripts or Sino-Korean character sequences, bring new complications to the HCR problem. The classification task on such datasets requires the model to learn high-complexity details of the images that share similar features. With recent advances in computational resource availability and further computer vision theory development, some research teams have effectively addressed the arising challenges. Although known for achieving high efficiency, many common approaches are still not generalizable and use dataset-specific solutions to achieve better results. Due to complex structure and high computing demands, existing methods frequently prevent the solutions from gaining popularity. This paper proposes a straightforward, generalizable, and highly effective approach (CharNet) for detailed character image classification and compares its performance to that of existing approaches.
Time series anomaly detection has applications in a wide range of research fields and applications, including manufacturing and healthcare. The presence of anomalies can indicate novel or unexpected events, such as production faults, system defects, or heart fluttering, and is therefore of particular interest. The large size and complex patterns of time series have led researchers to develop specialised deep learning models for detecting anomalous patterns. This survey focuses on providing structured and comprehensive state-of-the-art time series anomaly detection models through the use of deep learning. It providing a taxonomy based on the factors that divide anomaly detection models into different categories. Aside from describing the basic anomaly detection technique for each category, the advantages and limitations are also discussed. Furthermore, this study includes examples of deep anomaly detection in time series across various application domains in recent years. It finally summarises open issues in research and challenges faced while adopting deep anomaly detection models.
Time series forecasting is widely used in business intelligence, e.g., forecast stock market price, sales, and help the analysis of data trend. Most time series of interest are macroscopic time series that are aggregated from microscopic data. However, instead of directly modeling the macroscopic time series, rare literature studied the forecasting of macroscopic time series by leveraging data on the microscopic level. In this paper, we assume that the microscopic time series follow some unknown mixture probabilistic distributions. We theoretically show that as we identify the ground truth latent mixture components, the estimation of time series from each component could be improved because of lower variance, thus benefitting the estimation of macroscopic time series as well. Inspired by the power of Seq2seq and its variants on the modeling of time series data, we propose Mixture of Seq2seq (MixSeq), an end2end mixture model to cluster microscopic time series, where all the components come from a family of Seq2seq models parameterized by different parameters. Extensive experiments on both synthetic and real-world data show the superiority of our approach.
Conventionally, spatiotemporal modeling network and its complexity are the two most concentrated research topics in video action recognition. Existing state-of-the-art methods have achieved excellent accuracy regardless of the complexity meanwhile efficient spatiotemporal modeling solutions are slightly inferior in performance. In this paper, we attempt to acquire both efficiency and effectiveness simultaneously. First of all, besides traditionally treating H x W x T video frames as space-time signal (viewing from the Height-Width spatial plane), we propose to also model video from the other two Height-Time and Width-Time planes, to capture the dynamics of video thoroughly. Secondly, our model is designed based on 2D CNN backbones and model complexity is well kept in mind by design. Specifically, we introduce a novel multi-view fusion (MVF) module to exploit video dynamics using separable convolution for efficiency. It is a plug-and-play module and can be inserted into off-the-shelf 2D CNNs to form a simple yet effective model called MVFNet. Moreover, MVFNet can be thought of as a generalized video modeling framework and it can specialize to be existing methods such as C2D, SlowOnly, and TSM under different settings. Extensive experiments are conducted on popular benchmarks (i.e., Something-Something V1 & V2, Kinetics, UCF-101, and HMDB-51) to show its superiority. The proposed MVFNet can achieve state-of-the-art performance with 2D CNN's complexity.
This paper focuses on two fundamental tasks of graph analysis: community detection and node representation learning, which capture the global and local structures of graphs, respectively. In the current literature, these two tasks are usually independently studied while they are actually highly correlated. We propose a probabilistic generative model called vGraph to learn community membership and node representation collaboratively. Specifically, we assume that each node can be represented as a mixture of communities, and each community is defined as a multinomial distribution over nodes. Both the mixing coefficients and the community distribution are parameterized by the low-dimensional representations of the nodes and communities. We designed an effective variational inference algorithm which regularizes the community membership of neighboring nodes to be similar in the latent space. Experimental results on multiple real-world graphs show that vGraph is very effective in both community detection and node representation learning, outperforming many competitive baselines in both tasks. We show that the framework of vGraph is quite flexible and can be easily extended to detect hierarchical communities.
The cross-domain recommendation technique is an effective way of alleviating the data sparsity in recommender systems by leveraging the knowledge from relevant domains. Transfer learning is a class of algorithms underlying these techniques. In this paper, we propose a novel transfer learning approach for cross-domain recommendation by using neural networks as the base model. We assume that hidden layers in two base networks are connected by cross mappings, leading to the collaborative cross networks (CoNet). CoNet enables dual knowledge transfer across domains by introducing cross connections from one base network to another and vice versa. CoNet is achieved in multi-layer feedforward networks by adding dual connections and joint loss functions, which can be trained efficiently by back-propagation. The proposed model is evaluated on two real-world datasets and it outperforms baseline models by relative improvements of 3.56\% in MRR and 8.94\% in NDCG, respectively.