The wider application of end-to-end learning methods to embodied decision-making domains remains bottlenecked by their reliance on a superabundance of training data representative of the target domain. Meta-reinforcement learning (meta-RL) approaches abandon the aim of zero-shot generalization--the goal of standard reinforcement learning (RL)--in favor of few-shot adaptation, and thus hold promise for bridging larger generalization gaps. While learning this meta-level adaptive behavior still requires substantial data, efficient environment simulators approaching real-world complexity are growing in prevalence. Even so, hand-designing sufficiently diverse and numerous simulated training tasks for these complex domains is prohibitively labor-intensive. Domain randomization (DR) and procedural generation (PG), offered as solutions to this problem, require simulators to possess carefully-defined parameters which directly translate to meaningful task diversity--a similarly prohibitive assumption. In this work, we present DIVA, an evolutionary approach for generating diverse training tasks in such complex, open-ended simulators. Like unsupervised environment design (UED) methods, DIVA can be applied to arbitrary parameterizations, but can additionally incorporate realistically-available domain knowledge--thus inheriting the flexibility and generality of UED, and the supervised structure embedded in well-designed simulators exploited by DR and PG. Our empirical results showcase DIVA's unique ability to overcome complex parameterizations and successfully train adaptive agent behavior, far outperforming competitive baselines from prior literature. These findings highlight the potential of such semi-supervised environment design (SSED) approaches, of which DIVA is the first humble constituent, to enable training in realistic simulated domains, and produce more robust and capable adaptive agents.
Time-evolving graphs arise frequently when modeling complex dynamical systems such as social networks, traffic flow, and biological processes. Developing techniques to identify and analyze communities in these time-varying graph structures is an important challenge. In this work, we generalize existing spectral clustering algorithms from static to dynamic graphs using canonical correlation analysis (CCA) to capture the temporal evolution of clusters. Based on this extended canonical correlation framework, we define the spatio-temporal graph Laplacian and investigate its spectral properties. We connect these concepts to dynamical systems theory via transfer operators, and illustrate the advantages of our method on benchmark graphs by comparison with existing methods. We show that the spatio-temporal graph Laplacian allows for a clear interpretation of cluster structure evolution over time for directed and undirected graphs.
This exploratory pilot study investigated the potential of combining a domain-specific model, BERN2, with large language models (LLMs) to enhance automated disease phenotyping from research survey data. Motivated by the need for efficient and accurate methods to harmonize the growing volume of survey data with standardized disease ontologies, we employed BERN2, a biomedical named entity recognition and normalization model, to extract disease information from the ORIGINS birth cohort survey data. After rigorously evaluating BERN2's performance against a manually curated ground truth dataset, we integrated various LLMs using prompt engineering, Retrieval-Augmented Generation (RAG), and Instructional Fine-Tuning (IFT) to refine the model's outputs. BERN2 demonstrated high performance in extracting and normalizing disease mentions, and the integration of LLMs, particularly with Few Shot Inference and RAG orchestration, further improved accuracy. This approach, especially when incorporating structured examples, logical reasoning prompts, and detailed context, offers a promising avenue for developing tools to enable efficient cohort profiling and data harmonization across large, heterogeneous research datasets.
A recently proposed scheme utilizing local noise addition and matrix masking enables data collection while protecting individual privacy from all parties, including the central data manager. Statistical analysis of such privacy-preserved data is particularly challenging for nonlinear models like logistic regression. By leveraging a relationship between logistic regression and linear regression estimators, we propose the first valid statistical analysis method for logistic regression under this setting. Theoretical analysis of the proposed estimators confirmed its validity under an asymptotic framework with increasing noise magnitude to account for strict privacy requirements. Simulations and real data analyses demonstrate the superiority of the proposed estimators over naive logistic regression methods on privacy-preserved data sets.
This study explores the application of the rate-splitting multiple access (RSMA) technique, vital for interference mitigation in modern communication systems. It investigates the use of precoding methods in RSMA, especially in complex multiple-antenna interference channels, employing deep reinforcement learning. The aim is to optimize precoders and power allocation for common and private data streams involving multiple decision-makers. A multi-agent deep deterministic policy gradient (MADDPG) framework is employed to address this complexity, where decentralized agents collectively learn to optimize actions in a continuous policy space. We also explore the challenges posed by imperfect channel side information at the transmitter. Additionally, decoding order estimation is addressed to determine the optimal decoding sequence for common and private data sequences. Simulation results demonstrate the effectiveness of the proposed RSMA method based on MADDPG, achieving the upper bound in single-antenna scenarios and closely approaching theoretical limits in multi-antenna scenarios. Comparative analysis shows superiority over other techniques such as MADDPG without rate-splitting, maximal ratio transmission (MRT), zero-forcing (ZF), and leakage-based precoding methods. These findings highlight the potential of deep reinforcement learning-driven RSMA in reducing interference and enhancing system performance in communication systems.
Connected and autonomous vehicles (CAVs) offload computationally intensive tasks to multi-access edge computing (MEC) servers via vehicle-to-infrastructure (V2I) communication, enabling applications within the vehicular metaverse, which transforms physical environment into the digital space enabling advanced analysis or predictive modeling. A core challenge is physical-to-virtual (P2V) synchronization through digital twins (DTs), reliant on MEC networks and ultra-reliable low-latency communication (URLLC). To address this, we introduce radiance field (RF) delta video compression (RFDVC), which uses RF-encoder and RF-decoder architecture using distributed RFs as DTs storing photorealistic 3D urban scenes in compressed form. This method extracts differences between CAV-frame capturing actual traffic and RF-frame capturing empty scene from the same camera pose in batches encoded and transmitted over the MEC network. Experiments show data savings up to 71% against H.264 codec and 44% against H.265 codec under different conditions as lighting changes, and rain. RFDVC also demonstrates resilience to transmission errors, achieving up to +0.29 structural similarity index measure (SSIM) improvement at block error rate (BLER) = 0.35 in non-rainy and +0.25 at BLER = 0.2 in rainy conditions, ensuring superior visual quality compared to standard video coding (VC) methods across various conditions.
This paper investigates the use of multi-agent reinforcement learning (MARL) to address distributed channel access in wireless local area networks. In particular, we consider the challenging yet more practical case where the agents heterogeneously adopt value-based or policy-based reinforcement learning algorithms to train the model. We propose a heterogeneous MARL training framework, named QPMIX, which adopts a centralized training with distributed execution paradigm to enable heterogeneous agents to collaborate. Moreover, we theoretically prove the convergence of the proposed heterogeneous MARL method when using the linear value function approximation. Our method maximizes the network throughput and ensures fairness among stations, therefore, enhancing the overall network performance. Simulation results demonstrate that the proposed QPMIX algorithm improves throughput, mean delay, delay jitter, and collision rates compared with conventional carrier-sense multiple access with collision avoidance in the saturated traffic scenario. Furthermore, the QPMIX is shown to be robust in unsaturated and delay-sensitive traffic scenarios, and promotes cooperation among heterogeneous agents.
Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements to address the static limitations of large language models (LLMs) by enabling the dynamic integration of up-to-date external information. This methodology, focusing primarily on the text domain, provides a cost-effective solution to the generation of plausible but incorrect responses by LLMs, thereby enhancing the accuracy and reliability of their outputs through the use of real-world data. As RAG grows in complexity and incorporates multiple concepts that can influence its performance, this paper organizes the RAG paradigm into four categories: pre-retrieval, retrieval, post-retrieval, and generation, offering a detailed perspective from the retrieval viewpoint. It outlines RAG's evolution and discusses the field's progression through the analysis of significant studies. Additionally, the paper introduces evaluation methods for RAG, addressing the challenges faced and proposing future research directions. By offering an organized framework and categorization, the study aims to consolidate existing research on RAG, clarify its technological underpinnings, and highlight its potential to broaden the adaptability and applications of LLMs.
In semi-supervised domain adaptation, a few labeled samples per class in the target domain guide features of the remaining target samples to aggregate around them. However, the trained model cannot produce a highly discriminative feature representation for the target domain because the training data is dominated by labeled samples from the source domain. This could lead to disconnection between the labeled and unlabeled target samples as well as misalignment between unlabeled target samples and the source domain. In this paper, we propose a novel approach called Cross-domain Adaptive Clustering to address this problem. To achieve both inter-domain and intra-domain adaptation, we first introduce an adversarial adaptive clustering loss to group features of unlabeled target data into clusters and perform cluster-wise feature alignment across the source and target domains. We further apply pseudo labeling to unlabeled samples in the target domain and retain pseudo-labels with high confidence. Pseudo labeling expands the number of ``labeled" samples in each class in the target domain, and thus produces a more robust and powerful cluster core for each class to facilitate adversarial learning. Extensive experiments on benchmark datasets, including DomainNet, Office-Home and Office, demonstrate that our proposed approach achieves the state-of-the-art performance in semi-supervised domain adaptation.
Leveraging datasets available to learn a model with high generalization ability to unseen domains is important for computer vision, especially when the unseen domain's annotated data are unavailable. We study a novel and practical problem of Open Domain Generalization (OpenDG), which learns from different source domains to achieve high performance on an unknown target domain, where the distributions and label sets of each individual source domain and the target domain can be different. The problem can be generally applied to diverse source domains and widely applicable to real-world applications. We propose a Domain-Augmented Meta-Learning framework to learn open-domain generalizable representations. We augment domains on both feature-level by a new Dirichlet mixup and label-level by distilled soft-labeling, which complements each domain with missing classes and other domain knowledge. We conduct meta-learning over domains by designing new meta-learning tasks and losses to preserve domain unique knowledge and generalize knowledge across domains simultaneously. Experiment results on various multi-domain datasets demonstrate that the proposed Domain-Augmented Meta-Learning (DAML) outperforms prior methods for unseen domain recognition.
We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is trained to minimize perplexity, an automatic metric that we compare against human judgement of multi-turn conversation quality. To capture this judgement, we propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of good conversation. Interestingly, our experiments show strong correlation between perplexity and SSA. The fact that the best perplexity end-to-end trained Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity. Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79% SSA, 23% higher than the next highest scoring chatbot that we evaluated.