Effectively aligning Large Language Models (LLMs) with human-centric values while preventing the degradation of abilities acquired through Pre-training and Supervised Fine-tuning (SFT) poses a central challenge in Reinforcement Learning from Human Feedback (RLHF). In this paper, we first discover that interpolating RLHF and SFT model parameters can adjust the trade-off between human preference and basic capabilities, thereby reducing the alignment tax at the cost of alignment reward. Inspired by this, we propose integrating the RL policy and SFT models at each optimization step in RLHF to continuously regulate the training direction, introducing the Online Merging Optimizer. Specifically, we merge gradients with the parameter differences between SFT and pretrained models, effectively steering the gradient towards maximizing rewards in the direction of SFT optimization. We demonstrate that our optimizer works well with different LLM families, such as Qwen and LLaMA, across various model sizes ranging from 1.8B to 8B, various RLHF algorithms like DPO and KTO, and existing model merging methods. It significantly enhances alignment reward while mitigating alignment tax, achieving higher overall performance across 14 benchmarks.
In-situ sensing, in conjunction with learning models, presents a unique opportunity to address persistent defect issues in Additive Manufacturing (AM) processes. However, this integration introduces significant data privacy concerns, such as data leakage, sensor data compromise, and model inversion attacks, revealing critical details about part design, material composition, and machine parameters. Differential Privacy (DP) models, which inject noise into data under mathematical guarantees, offer a nuanced balance between data utility and privacy by obscuring traces of sensing data. However, the introduction of noise into learning models, often functioning as black boxes, complicates the prediction of how specific noise levels impact model accuracy. This study introduces the Differential Privacy-HyperDimensional computing (DP-HD) framework, leveraging the explainability of the vector symbolic paradigm to predict the noise impact on the accuracy of in-situ monitoring, safeguarding sensitive data while maintaining operational efficiency. Experimental results on real-world high-speed melt pool data of AM for detecting overhang anomalies demonstrate that DP-HD achieves superior operational efficiency, prediction accuracy, and robust privacy protection, outperforming state-of-the-art Machine Learning (ML) models. For example, when implementing the same level of privacy protection (with a privacy budget set at 1), our model achieved an accuracy of 94.43\%, surpassing the performance of traditional models such as ResNet50 (52.30\%), GoogLeNet (23.85\%), AlexNet (55.78\%), DenseNet201 (69.13\%), and EfficientNet B2 (40.81\%). Notably, DP-HD maintains high performance under substantial noise additions designed to enhance privacy, unlike current models that suffer significant accuracy declines under high privacy constraints.
Compared with visual signals, Inertial Measurement Units (IMUs) placed on human limbs can capture accurate motion signals while being robust to lighting variation and occlusion. While these characteristics are intuitively valuable to help egocentric action recognition, the potential of IMUs remains under-explored. In this work, we present a novel method for action recognition that integrates motion data from body-worn IMUs with egocentric video. Due to the scarcity of labeled multimodal data, we design an MAE-based self-supervised pretraining method, obtaining strong multi-modal representations via modeling the natural correlation between visual and motion signals. To model the complex relation of multiple IMU devices placed across the body, we exploit the collaborative dynamics in multiple IMU devices and propose to embed the relative motion features of human joints into a graph structure. Experiments show our method can achieve state-of-the-art performance on multiple public datasets. The effectiveness of our MAE-based pretraining and graph-based IMU modeling are further validated by experiments in more challenging scenarios, including partially missing IMU devices and video quality corruption, promoting more flexible usages in the real world.
Classification of different object surface material types can play a significant role in the decision-making algorithms for mobile robots and autonomous vehicles. RGB-based scene-level semantic segmentation has been well-addressed in the literature. However, improving material recognition using the depth modality and its integration with SLAM algorithms for 3D semantic mapping could unlock new potential benefits in the robotics perception pipeline. To this end, we propose a complementarity-aware deep learning approach for RGB-D-based material classification built on top of an object-oriented pipeline. The approach further integrates the ORB-SLAM2 method for 3D scene mapping with multiscale clustering of the detected material semantics in the point cloud map generated by the visual SLAM algorithm. Extensive experimental results with existing public datasets and newly contributed real-world robot datasets demonstrate a significant improvement in material classification and 3D clustering accuracy compared to state-of-the-art approaches for 3D semantic scene mapping.
In the field of Sequential Decision Making (SDM), two paradigms have historically vied for supremacy: Automated Planning (AP) and Reinforcement Learning (RL). In the spirit of reconciliation, this article reviews AP, RL and hybrid methods (e.g., novel learn to plan techniques) for solving Sequential Decision Processes (SDPs), focusing on their knowledge representation: symbolic, subsymbolic, or a combination. Additionally, it also covers methods for learning the SDP structure. Finally, we compare the advantages and drawbacks of the existing methods and conclude that neurosymbolic AI poses a promising approach for SDM, since it combines AP and RL with a hybrid knowledge representation.
Solving continuous variable optimization problems by factorization machine quantum annealing (FMQA) demonstrates the potential of Ising machines to be extended as a solver for integer and real optimization problems. However, the details of the Hamiltonian function surface obtained by factorization machine (FM) have been overlooked. This study shows that in the widely common case where real numbers are represented by a combination of binary variables, the function surface of the Hamiltonian obtained by FM can be very noisy. This noise interferes with the inherent capabilities of quantum annealing and is likely to be a substantial cause of problems previously considered unsolvable due to the limitations of FMQA performance. The origin of the noise is identified and a simple, general method is proposed to prevent its occurrence. The generalization performance of the proposed method and its ability to solve practical problems is demonstrated.
Group Equivariant CNNs (G-CNNs) have shown promising efficacy in various tasks, owing to their ability to capture hierarchical features in an equivariant manner. However, their equivariance is fixed to the symmetry of the whole group, limiting adaptability to diverse partial symmetries in real-world datasets, such as limited rotation symmetry of handwritten digit images and limited color-shift symmetry of flower images. Recent efforts address this limitation, one example being Partial G-CNN which restricts the output group space of convolution layers to break full equivariance. However, such an approach still fails to adjust equivariance levels across data. In this paper, we propose a novel approach, Variational Partial G-CNN (VP G-CNN), to capture varying levels of partial equivariance specific to each data instance. VP G-CNN redesigns the distribution of the output group elements to be conditioned on input data, leveraging variational inference to avoid overfitting. This enables the model to adjust its equivariance levels according to the needs of individual data points. Additionally, we address training instability inherent in discrete group equivariance models by redesigning the reparametrizable distribution. We demonstrate the effectiveness of VP G-CNN on both toy and real-world datasets, including MNIST67-180, CIFAR10, ColorMNIST, and Flowers102. Our results show robust performance, even in uncertainty metrics.
Practical optimization problems may contain different kinds of difficulties that are often not tractable if one relies on a particular optimization method. Different optimization approaches offer different strengths that are good at tackling one or more difficulty in an optimization problem. For instance, evolutionary algorithms have a niche in handling complexities like discontinuity, non-differentiability, discreteness and non-convexity. However, evolutionary algorithms may get computationally expensive for mathematically well behaved problems with large number of variables for which classical mathematical programming approaches are better suited. In this paper, we demonstrate a decomposition strategy that allows us to synergistically apply two complementary approaches at the same time on a complex optimization problem. Evolutionary algorithms are useful in this context as their flexibility makes pairing with other solution approaches easy. The decomposition idea is a special case of bilevel optimization that separates the difficulties into two levels and assigns different approaches at each level that is better equipped at handling them. We demonstrate the benefits of the proposed decomposition idea on a wide range of test problems.
The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.
Emotion recognition in conversation (ERC) aims to detect the emotion label for each utterance. Motivated by recent studies which have proven that feeding training examples in a meaningful order rather than considering them randomly can boost the performance of models, we propose an ERC-oriented hybrid curriculum learning framework. Our framework consists of two curricula: (1) conversation-level curriculum (CC); and (2) utterance-level curriculum (UC). In CC, we construct a difficulty measurer based on "emotion shift" frequency within a conversation, then the conversations are scheduled in an "easy to hard" schema according to the difficulty score returned by the difficulty measurer. For UC, it is implemented from an emotion-similarity perspective, which progressively strengthens the model's ability in identifying the confusing emotions. With the proposed model-agnostic hybrid curriculum learning strategy, we observe significant performance boosts over a wide range of existing ERC models and we are able to achieve new state-of-the-art results on four public ERC datasets.
Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g. evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs are able to significantly outperform previous approaches being at the same time more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of different components of our framework and devise the best configuration that achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.