Exploring unknown environments efficiently is a fundamental challenge in unsupervised goal-conditioned reinforcement learning. While selecting exploratory goals at the frontier of previously explored states is an effective strategy, the policy during training may still have limited capability of reaching rare goals on the frontier, resulting in reduced exploratory behavior. We propose "Cluster Edge Exploration" ($CE^2$), a new goal-directed exploration algorithm that when choosing goals in sparsely explored areas of the state space gives priority to goal states that remain accessible to the agent. The key idea is clustering to group states that are easily reachable from one another by the current policy under training in a latent space and traversing to states holding significant exploration potential on the boundary of these clusters before doing exploratory behavior. In challenging robotics environments including navigating a maze with a multi-legged ant robot, manipulating objects with a robot arm on a cluttered tabletop, and rotating objects in the palm of an anthropomorphic robotic hand, $CE^2$ demonstrates superior efficiency in exploration compared to baseline methods and ablations.
Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets. In real-world scenarios, data collection could be costly and risky; therefore, offline RL becomes particularly challenging when the in-domain data is limited. Given recent advances in Large Language Models (LLMs) and their few-shot learning prowess, this paper introduces $\textbf{La}$nguage Models for $\textbf{Mo}$tion Control ($\textbf{LaMo}$), a general framework based on Decision Transformers to effectively use pre-trained Language Models (LMs) for offline RL. Our framework highlights four crucial components: (1) Initializing Decision Transformers with sequentially pre-trained LMs, (2) employing the LoRA fine-tuning method, in contrast to full-weight fine-tuning, to combine the pre-trained knowledge from LMs and in-domain knowledge effectively, (3) using the non-linear MLP transformation instead of linear projections, to generate embeddings, and (4) integrating an auxiliary language prediction loss during fine-tuning to stabilize the LMs and retain their original abilities on languages. Empirical results indicate $\textbf{LaMo}$ achieves excellent performance in sparse-reward tasks and closes the gap between value-based offline RL methods and decision transformers in dense-reward tasks. In particular, our method demonstrates superior performance in scenarios with limited data samples.
This study delved into the viability of Robot-supported flipped classes in English for Medical Purposes reading comprehension. In a 16-session course, the reading comprehension and then workspace performance of 444 students, with Commercially-Off-The-Shelf and Self-Generated robot flipped classes were compared. The results indicated that the flipped classes brought about a good instructional-learning ambience in postsecondary education for English for Medical Purposes (EMP) reading comprehension and adopting proactive approach for workspace performance. In tandem, the Mixed Effect Model revealed that student participation in the self-generated robot-supported flipped classes yielded a larger effect size (+17.6%) than Commercially-Off-The-Shelf robot-supported flipped classes. Analyses produced five contributing moderators of EMP reading comprehension and workspace performance: reading proficiency, attitude, manner of practicing, as well as student and teacher role.
The primary promise of decentralized learning is to allow users to engage in the training of machine learning models in a collaborative manner while keeping their data on their premises and without relying on any central entity. However, this paradigm necessitates the exchange of model parameters or gradients between peers. Such exchanges can be exploited to infer sensitive information about training data, which is achieved through privacy attacks (e.g Membership Inference Attacks -- MIA). In order to devise effective defense mechanisms, it is important to understand the factors that increase/reduce the vulnerability of a given decentralized learning architecture to MIA. In this study, we extensively explore the vulnerability to MIA of various decentralized learning architectures by varying the graph structure (e.g number of neighbors), the graph dynamics, and the aggregation strategy, across diverse datasets and data distributions. Our key finding, which to the best of our knowledge we are the first to report, is that the vulnerability to MIA is heavily correlated to (i) the local model mixing strategy performed by each node upon reception of models from neighboring nodes and (ii) the global mixing properties of the communication graph. We illustrate these results experimentally using four datasets and by theoretically analyzing the mixing properties of various decentralized architectures. Our paper draws a set of lessons learned for devising decentralized learning systems that reduce by design the vulnerability to MIA.
Q-learning is a widely used reinforcement learning technique for solving path planning problems. It primarily involves the interaction between an agent and its environment, enabling the agent to learn an optimal strategy that maximizes cumulative rewards. Although many studies have reported the effectiveness of Q-learning, it still faces slow convergence issues in practical applications. To address this issue, we propose the NDR-QL method, which utilizes neural network outputs as heuristic information to accelerate the convergence process of Q-learning. Specifically, we improved the dual-output neural network model by introducing a start-end channel separation mechanism and enhancing the feature fusion process. After training, the proposed NDR model can output a narrowly focused optimal probability distribution, referred to as the guideline, and a broadly distributed suboptimal distribution, referred to as the region. Subsequently, based on the guideline prediction, we calculate the continuous reward function for the Q-learning method, and based on the region prediction, we initialize the Q-table with a bias. We conducted training, validation, and path planning simulation experiments on public datasets. The results indicate that the NDR model outperforms previous methods by up to 5\% in prediction accuracy. Furthermore, the proposed NDR-QL method improves the convergence speed of the baseline Q-learning method by 90\% and also surpasses the previously improved Q-learning methods in path quality metrics.
In an educational setting, an estimate of the difficulty of multiple-choice questions (MCQs), a commonly used strategy to assess learning progress, constitutes very useful information for both teachers and students. Since human assessment is costly from multiple points of view, automatic approaches to MCQ item difficulty estimation are investigated, yielding however mixed success until now. Our approach to this problem takes a different angle from previous work: asking various Large Language Models to tackle the questions included in two different MCQ datasets, we leverage model uncertainty to estimate item difficulty. By using both model uncertainty features as well as textual features in a Random Forest regressor, we show that uncertainty features contribute substantially to difficulty prediction, where difficulty is inversely proportional to the number of students who can correctly answer a question. In addition to showing the value of our approach, we also observe that our model achieves state-of-the-art results on the BEA publicly available dataset.
Bidirectional privacy-preservation federated learning is crucial as both local gradients and the global model may leak privacy. However, only a few works attempt to achieve it, and they often face challenges such as excessive communication and computational overheads, or significant degradation of model accuracy, which hinders their practical applications. In this paper, we design an efficient and high-accuracy bidirectional privacy-preserving scheme for federated learning to complete secure model training and secure aggregation. To efficiently achieve bidirectional privacy, we design an efficient and accuracy-lossless model perturbation method on the server side (called $\mathbf{MP\_Server}$) that can be combined with local differential privacy (LDP) to prevent clients from accessing the model, while ensuring that the local gradients obtained on the server side satisfy LDP. Furthermore, to ensure model accuracy, we customize a distributed differential privacy mechanism on the client side (called $\mathbf{DDP\_Client}$). When combined with $\mathbf{MP\_Server}$, it ensures LDP of the local gradients, while ensuring that the aggregated result matches the accuracy of central differential privacy (CDP). Extensive experiments demonstrate that our scheme significantly outperforms state-of-the-art bidirectional privacy-preservation baselines (SOTAs) in terms of computational cost, model accuracy, and defense ability against privacy attacks. Particularly, given target accuracy, the training time of SOTAs is approximately $200$ times, or even over $1000$ times, longer than that of our scheme. When the privacy budget is set relatively small, our scheme incurs less than $6\%$ accuracy loss compared to the privacy-ignoring method, while SOTAs suffer up to $20\%$ accuracy loss. Experimental results also show that the defense capability of our scheme outperforms than SOTAs.
Federated averaging (FedAvg) is the most fundamental algorithm in Federated learning (FL). Previous theoretical results assert that FedAvg convergence and generalization degenerate under heterogeneous clients. However, recent empirical results show that FedAvg can perform well in many real-world heterogeneous tasks. These results reveal an inconsistency between FL theory and practice that is not fully explained. In this paper, we show that common heterogeneity measures contribute to this inconsistency based on rigorous convergence analysis. Furthermore, we introduce a new measure \textit{client consensus dynamics} and prove that \textit{FedAvg can effectively handle client heterogeneity when an appropriate aggregation strategy is used}. Building on this theoretical insight, we present a simple and effective FedAvg variant termed FedAWARE. Extensive experiments on three datasets and two modern neural network architectures demonstrate that FedAWARE ensures faster convergence and better generalization in heterogeneous client settings. Moreover, our results show that FedAWARE can significantly enhance the generalization performance of advanced FL algorithms when used as a plug-in module.
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
In contrast to batch learning where all training data is available at once, continual learning represents a family of methods that accumulate knowledge and learn continuously with data available in sequential order. Similar to the human learning process with the ability of learning, fusing, and accumulating new knowledge coming at different time steps, continual learning is considered to have high practical significance. Hence, continual learning has been studied in various artificial intelligence tasks. In this paper, we present a comprehensive review of the recent progress of continual learning in computer vision. In particular, the works are grouped by their representative techniques, including regularization, knowledge distillation, memory, generative replay, parameter isolation, and a combination of the above techniques. For each category of these techniques, both its characteristics and applications in computer vision are presented. At the end of this overview, several subareas, where continuous knowledge accumulation is potentially helpful while continual learning has not been well studied, are discussed.
Despite its great success, machine learning can have its limits when dealing with insufficient training data. A potential solution is the additional integration of prior knowledge into the training process which leads to the notion of informed machine learning. In this paper, we present a structured overview of various approaches in this field. We provide a definition and propose a concept for informed machine learning which illustrates its building blocks and distinguishes it from conventional machine learning. We introduce a taxonomy that serves as a classification framework for informed machine learning approaches. It considers the source of knowledge, its representation, and its integration into the machine learning pipeline. Based on this taxonomy, we survey related research and describe how different knowledge representations such as algebraic equations, logic rules, or simulation results can be used in learning systems. This evaluation of numerous papers on the basis of our taxonomy uncovers key methods in the field of informed machine learning.