Remaining useful life prediction plays a crucial role in the health management of industrial systems. Given the increasing complexity of systems, data-driven predictive models have attracted significant research interest. Upon reviewing the existing literature, it appears that many studies either do not fully integrate both spatial and temporal features or employ only a single attention mechanism. Furthermore, there seems to be inconsistency in the choice of data normalization methods, particularly concerning operating conditions, which might influence predictive performance. To bridge these observations, this study presents the Spatio-Temporal Attention Graph Neural Network. Our model combines graph neural networks and temporal convolutional neural networks for spatial and temporal feature extraction, respectively. The cascade of these extractors, combined with multi-head attention mechanisms for both spatio-temporal dimensions, aims to improve predictive precision and refine model explainability. Comprehensive experiments were conducted on the C-MAPSS dataset to evaluate the impact of unified versus clustering normalization. The findings suggest that our model performs state-of-the-art results using only the unified normalization. Additionally, when dealing with datasets with multiple operating conditions, cluster normalization enhances the performance of our proposed model by up to 27%.
Temporal localization of driving actions plays a crucial role in advanced driver-assistance systems and naturalistic driving studies. However, this is a challenging task due to strict requirements for robustness, reliability and accurate localization. In this work, we focus on improving the overall performance by efficiently utilizing video action recognition networks and adapting these to the problem of action localization. To this end, we first develop a density-guided label smoothing technique based on label probability distributions to facilitate better learning from boundary video-segments that typically include multiple labels. Second, we design a post-processing step to efficiently fuse information from video-segments and multiple camera views into scene-level predictions, which facilitates elimination of false positives. Our methodology yields a competitive performance on the A2 test set of the naturalistic driving action recognition track of the 2022 NVIDIA AI City Challenge with an F1 score of 0.271.
Pedestrian trajectory prediction plays an important role in autonomous driving systems and robotics. Recent work utilizing prominent deep learning models for pedestrian motion prediction makes limited a priori assumptions about human movements, resulting in a lack of explainability and explicit constraints enforced on predicted trajectories. We present a dynamics-based deep learning framework with a novel asymptotically stable dynamical system integrated into a Transformer-based model. We use an asymptotically stable dynamical system to model human goal-targeted motion by enforcing the human walking trajectory, which converges to a predicted goal position, and to provide the Transformer model with prior knowledge and explainability. Our framework features the Transformer model that works with a goal estimator and dynamical system to learn features from pedestrian motion history. The results show that our framework outperforms prominent models using five benchmark human motion datasets.
Accurate state estimation plays a critical role in ensuring the robust control of humanoid robots, particularly in the context of learning-based control policies for legged robots. However, there is a notable gap in analytical research concerning estimations. Therefore, we endeavor to further understand how various types of estimations influence the decision-making processes of policies. In this paper, we provide quantitative insight into the effectiveness of learned state estimations, employing saliency analysis to identify key estimation variables and optimize their combination for humanoid locomotion tasks. Evaluations assessing tracking precision and robustness are conducted on comparative groups of policies with varying estimation combinations in both simulated and real-world environments. Results validated that the proposed policy is capable of crossing the sim-to-real gap and demonstrating superior performance relative to alternative policy configurations.
Correspondence matching plays a crucial role in numerous robotics applications. In comparison to conventional hand-crafted methods and recent data-driven approaches, there is significant interest in plug-and-play algorithms that make full use of pre-trained backbone networks for multi-scale feature extraction and leverage hierarchical refinement strategies to generate matched correspondences. The primary focus of this paper is to address the limitations of deep feature matching (DFM), a state-of-the-art (SoTA) plug-and-play correspondence matching approach. First, we eliminate the pre-defined threshold employed in the hierarchical refinement process of DFM by leveraging a more flexible nearest neighbor search strategy, thereby preventing the exclusion of repetitive yet valid matches during the early stages. Our second technical contribution is the integration of a patch descriptor, which extends the applicability of DFM to accommodate a wide range of backbone networks pre-trained across diverse computer vision tasks, including image classification, semantic segmentation, and stereo matching. Taking into account the practical applicability of our method in real-world robotics applications, we also propose a novel patch descriptor distillation strategy to further reduce the computational complexity of correspondence matching. Extensive experiments conducted on three public datasets demonstrate the superior performance of our proposed method. Specifically, it achieves an overall performance in terms of mean matching accuracy of 0.68, 0.92, and 0.95 with respect to the tolerances of 1, 3, and 5 pixels, respectively, on the HPatches dataset, outperforming all other SoTA algorithms. Our source code, demo video, and supplement are publicly available at mias.group/GCM.
Federated Learning (FL) plays a critical role in distributed systems. In these systems, data privacy and confidentiality hold paramount importance, particularly within edge-based data processing systems such as IoT devices deployed in smart homes. FL emerges as a privacy-enforcing sub-domain of machine learning that enables model training on client devices, eliminating the necessity to share private data with a central server. While existing research has predominantly addressed challenges pertaining to data heterogeneity, there remains a current gap in addressing issues such as varying device capabilities and efficient communication. These unaddressed issues raise a number of implications in resource-constrained environments. In particular, the practical implementation of FL-based IoT or edge systems is extremely inefficient. In this paper, we propose "Resource-Efficient Federated Training Framework for Heterogeneous and Resource-Constrained Environments (REFT)," a novel approach specifically devised to address these challenges in resource-limited devices. Our proposed method uses Variable Pruning to optimize resource utilization by adapting pruning strategies to the computational capabilities of each client. Furthermore, our proposed REFT technique employs knowledge distillation to minimize the need for continuous bidirectional client-server communication. This achieves a significant reduction in communication bandwidth, thereby enhancing the overall resource efficiency. We conduct experiments for an image classification task, and the results demonstrate the effectiveness of our approach in resource-limited settings. Our technique not only preserves data privacy and performance standards but also accommodates heterogeneous model architectures, facilitating the participation of a broader array of diverse client devices in the training process, all while consuming minimal bandwidth.
Data plays a fundamental role in the training of Large Language Models (LLMs). Effective data management, particularly in the formulation of a well-suited training dataset, holds significance for enhancing model performance and improving training efficiency during pretraining and supervised fine-tuning phases. Despite the considerable importance of data management, the current research community still falls short in providing a systematic analysis of the rationale behind management strategy selection, its consequential effects, methodologies for evaluating curated datasets, and the ongoing pursuit of improved strategies. Consequently, the exploration of data management has attracted more and more attention among the research community. This survey provides a comprehensive overview of current research in data management within both the pretraining and supervised fine-tuning stages of LLMs, covering various noteworthy aspects of data management strategy design: data quantity, data quality, domain/task composition, etc. Looking toward the future, we extrapolate existing challenges and outline promising directions for development in this field. Therefore, this survey serves as a guiding resource for practitioners aspiring to construct powerful LLMs through effective data management practices. The collection of the latest papers is available at //github.com/ZigeW/data_management_LLM.
We present a large-scale study on unsupervised spatiotemporal representation learning from videos. With a unified perspective on four recent image-based frameworks, we study a simple objective that can easily generalize all these methods to space-time. Our objective encourages temporally-persistent features in the same video, and in spite of its simplicity, it works surprisingly well across: (i) different unsupervised frameworks, (ii) pre-training datasets, (iii) downstream datasets, and (iv) backbone architectures. We draw a series of intriguing observations from this study, e.g., we discover that encouraging long-spanned persistency can be effective even if the timespan is 60 seconds. In addition to state-of-the-art results in multiple benchmarks, we report a few promising cases in which unsupervised pre-training can outperform its supervised counterpart. Code is made available at //github.com/facebookresearch/SlowFast
Machine learning plays a role in many deployed decision systems, often in ways that are difficult or impossible to understand by human stakeholders. Explaining, in a human-understandable way, the relationship between the input and output of machine learning models is essential to the development of trustworthy machine-learning-based systems. A burgeoning body of research seeks to define the goals and methods of explainability in machine learning. In this paper, we seek to review and categorize research on counterfactual explanations, a specific class of explanation that provides a link between what could have happened had input to a model been changed in a particular way. Modern approaches to counterfactual explainability in machine learning draw connections to the established legal doctrine in many countries, making them appealing to fielded systems in high-impact areas such as finance and healthcare. Thus, we design a rubric with desirable properties of counterfactual explanation algorithms and comprehensively evaluate all currently-proposed algorithms against that rubric. Our rubric provides easy comparison and comprehension of the advantages and disadvantages of different approaches and serves as an introduction to major research themes in this field. We also identify gaps and discuss promising research directions in the space of counterfactual explainability.
Emotion plays an important role in detecting fake news online. When leveraging emotional signals, the existing methods focus on exploiting the emotions of news contents that conveyed by the publishers (i.e., publisher emotion). However, fake news is always fabricated to evoke high-arousal or activating emotions of people to spread like a virus, so the emotions of news comments that aroused by the crowd (i.e., social emotion) can not be ignored. Furthermore, it needs to be explored whether there exists a relationship between publisher emotion and social emotion (i.e., dual emotion), and how the dual emotion appears in fake news. In the paper, we propose Dual Emotion Features to mine dual emotion and the relationship between them for fake news detection. And we design a universal paradigm to plug it into any existing detectors as an enhancement. Experimental results on three real-world datasets indicate the effectiveness of the proposed features.
Few-shot Knowledge Graph (KG) completion is a focus of current research, where each task aims at querying unseen facts of a relation given its few-shot reference entity pairs. Recent attempts solve this problem by learning static representations of entities and references, ignoring their dynamic properties, i.e., entities may exhibit diverse roles within task relations, and references may make different contributions to queries. This work proposes an adaptive attentional network for few-shot KG completion by learning adaptive entity and reference representations. Specifically, entities are modeled by an adaptive neighbor encoder to discern their task-oriented roles, while references are modeled by an adaptive query-aware aggregator to differentiate their contributions. Through the attention mechanism, both entities and references can capture their fine-grained semantic meanings, and thus render more expressive representations. This will be more predictive for knowledge acquisition in the few-shot scenario. Evaluation in link prediction on two public datasets shows that our approach achieves new state-of-the-art results with different few-shot sizes.