In this paper, we study the problem of multi-reward reinforcement learning to jointly optimize for multiple text qualities for natural language generation. We focus on the task of counselor reflection generation, where we optimize the generators to simultaneously improve the fluency, coherence, and reflection quality of generated counselor responses. We introduce two novel bandit methods, DynaOpt and C-DynaOpt, which rely on the broad strategy of combining rewards into a single value and optimizing them simultaneously. Specifically, we employ non-contextual and contextual multi-arm bandits to dynamically adjust multiple reward weights during training. Through automatic and manual evaluations, we show that our proposed techniques, DynaOpt and C-DynaOpt, outperform existing naive and bandit baselines, showcasing their potential for enhancing language models.
In this study, we propose a multitask reinforcement learning algorithm for foundational policy acquisition to generate novel motor skills. Inspired by human sensorimotor adaptation mechanisms, we aim to train encoder-decoder networks that can be commonly used to learn novel motor skills in a single movement category. To train the policy network, we develop the multitask reinforcement learning method, where the policy needs to cope with changes in goals or environments with different reward functions or physical parameters of the environment in dynamic movement generation tasks. Here, as a concrete task, we evaluated the proposed method with the ball heading task using a monopod robot model. The results showed that the proposed method could adapt to novel target positions or inexperienced ball restitution coefficients. Furthermore, we demonstrated that the acquired foundational policy network originally learned for heading motion, can be used to generate an entirely new overhead kicking skill.
We study the problem of learning a binary classifier on the vertices of a graph. In particular, we consider classifiers given by monophonic halfspaces, partitions of the vertices that are convex in a certain abstract sense. Monophonic halfspaces, and related notions such as geodesic halfspaces,have recently attracted interest, and several connections have been drawn between their properties(e.g., their VC dimension) and the structure of the underlying graph $G$. We prove several novel results for learning monophonic halfspaces in the supervised, online, and active settings. Our main result is that a monophonic halfspace can be learned with near-optimal passive sample complexity in time polynomial in $n = |V(G)|$. This requires us to devise a polynomial-time algorithm for consistent hypothesis checking, based on several structural insights on monophonic halfspaces and on a reduction to $2$-satisfiability. We prove similar results for the online and active settings. We also show that the concept class can be enumerated with delay $\operatorname{poly}(n)$, and that empirical risk minimization can be performed in time $2^{\omega(G)}\operatorname{poly}(n)$ where $\omega(G)$ is the clique number of $G$. These results answer open questions from the literature (Gonz\'alez et al., 2020), and show a contrast with geodesic halfspaces, for which some of the said problems are NP-hard (Seiffarth et al., 2023).
In this paper, we present a novel approach that combines deep metric learning and synthetic data generation using diffusion models for out-of-distribution (OOD) detection. One popular approach for OOD detection is outlier exposure, where models are trained using a mixture of in-distribution (ID) samples and ``seen" OOD samples. For the OOD samples, the model is trained to minimize the KL divergence between the output probability and the uniform distribution while correctly classifying the in-distribution (ID) data. In this paper, we propose a label-mixup approach to generate synthetic OOD data using Denoising Diffusion Probabilistic Models (DDPMs). Additionally, we explore recent advancements in metric learning to train our models. In the experiments, we found that metric learning-based loss functions perform better than the softmax. Furthermore, the baseline models (including softmax, and metric learning) show a significant improvement when trained with the generated OOD data. Our approach outperforms strong baselines in conventional OOD detection metrics.
In this paper, we study the performance of few-shot learning, specifically meta learning empowered few-shot relation networks, over supervised deep learning and conventional machine learning approaches in the problem of Sound Source Distance Estimation (SSDE). In previous research on deep supervised SSDE, low accuracies have often resulted from the mismatch between the training data (from known environments) and the test data (from unknown environments). By performing comparative experiments on a sufficient amount of data, we show that the few-shot relation network outperforms other competitors including eXtreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Convolutional Neural Network (CNN), and MultiLayer Perceptron (MLP). Hence it is possible to calibrate a microphone-equipped system, with a few labeled samples of audio recorded in a particular unknown environment to adjust and generalize our classifier to the possible input data and gain higher accuracies.
In this paper, the problem of spectral-efficient communication and computation resource allocation for distributed reconfigurable intelligent surfaces (RISs) assisted probabilistic semantic communication (PSC) in industrial Internet-of-Things (IIoT) is investigated. In the considered model, multiple RISs are deployed to serve multiple users, while PSC adopts compute-then-transmit protocol to reduce the transmission data size. To support high-rate transmission, the semantic compression ratio, transmit power allocation, and distributed RISs deployment must be jointly considered. This joint communication and computation problem is formulated as an optimization problem whose goal is to maximize the sum semantic-aware transmission rate of the system under total transmit power, phase shift, RIS-user association, and semantic compression ratio constraints. To solve this problem, a many-to-many matching scheme is proposed to solve the RIS-user association subproblem, the semantic compression ratio subproblem is addressed following greedy policy, while the phase shift of RIS can be optimized using the tensor based beamforming. Numerical results verify the superiority of the proposed algorithm.
In this paper, we propose an accurate data-free post-training quantization framework of diffusion models (ADP-DM) for efficient image generation. Conventional data-free quantization methods learn shared quantization functions for tensor discretization regardless of the generation timesteps, while the activation distribution differs significantly across various timesteps. The calibration images are acquired in random timesteps which fail to provide sufficient information for generalizable quantization function learning. Both issues cause sizable quantization errors with obvious image generation performance degradation. On the contrary, we design group-wise quantization functions for activation discretization in different timesteps and sample the optimal timestep for informative calibration image generation, so that our quantized diffusion model can reduce the discretization errors with negligible computational overhead. Specifically, we partition the timesteps according to the importance weights of quantization functions in different groups, which are optimized by differentiable search algorithms. We also select the optimal timestep for calibration image generation by structural risk minimizing principle in order to enhance the generalization ability in the deployment of quantized diffusion model. Extensive experimental results show that our method outperforms the state-of-the-art post-training quantization of diffusion model by a sizable margin with similar computational cost.
In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) when missing-modality occurs either during training or testing in real-world situations; and 2) when the computation resources are not available to finetune on heavy transformer models. To this end, we propose to utilize prompt learning and mitigate the above two challenges together. Specifically, our modality-missing-aware prompts can be plugged into multimodal transformers to handle general missing-modality cases, while only requiring less than 1% learnable parameters compared to training the entire model. We further explore the effect of different prompt configurations and analyze the robustness to missing modality. Extensive experiments are conducted to show the effectiveness of our prompt learning framework that improves the performance under various missing-modality cases, while alleviating the requirement of heavy model re-training. Code is available.
Non-IID data present a tough challenge for federated learning. In this paper, we explore a novel idea of facilitating pairwise collaborations between clients with similar data. We propose FedAMP, a new method employing federated attentive message passing to facilitate similar clients to collaborate more. We establish the convergence of FedAMP for both convex and non-convex models, and propose a heuristic method to further improve the performance of FedAMP when clients adopt deep neural networks as personalized models. Our extensive experiments on benchmark data sets demonstrate the superior performance of the proposed methods.
In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing to past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages to predict a more acceptable answer so as to address the convergence suppression problem occurred in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.
In this paper, we propose the joint learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically require pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance. Our proposed model uniquely integrates attention and Long Short Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interests with varying sizes without the prior knowledge of particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.