Subword tokenization is a common method for vocabulary building in Neural Machine Translation (NMT) models. However, increasingly complex tasks have revealed its disadvantages. First, a vocabulary cannot be modified once it is learned, making it hard to adapt to new words. Second, in multilingual translation, the imbalance in data volumes across different languages spreads to the vocabulary, exacerbating translations involving low-resource languages. While byte-based tokenization addresses these issues, byte-based models struggle with the low information density inherent in UTF-8 byte sequences. Previous works enhance token semantics through local contextualization but fail to select an appropriate contextualizing scope based on the input. Consequently, we propose the Multi-Scale Contextualization (MSC) method, which learns contextualized information of varying scales across different hidden state dimensions. It then leverages the attention module to dynamically integrate the multi-scale contextualized information. Experiments show that MSC significantly outperforms subword-based and other byte-based methods in both multilingual and out-of-domain scenarios. Code can be found in //github.com/ictnlp/Multiscale-Contextualization.
The robotic intervention for individuals with Autism Spectrum Disorder (ASD) has generally used pre-defined scripts to deliver verbal content during one-to-one therapy sessions. This practice restricts the use of robots to limited, pre-mediated instructional curricula. In this paper, we increase robot autonomy in one such robotic intervention for children with ASD by implementing perspective-taking teaching. Our approach uses large language models (LLM) to generate verbal content as texts and then deliver it to the child via robotic speech. In the proposed pipeline, we teach perspective-taking through which our robot takes up three roles: initiator, prompter, and reinforcer. We adopted the GPT-2 + BART pipelines to generate social situations, ask questions (as initiator), and give options (as prompter) when required. The robot encourages the child by giving positive reinforcement for correct answers (as a reinforcer). In addition to our technical contribution, we conducted ten-minute sessions with domain experts simulating an actual perspective teaching session, with the researcher acting as a child participant. These sessions validated our robotic intervention pipeline through surveys, including those from NASA TLX and GodSpeed. We used BERTScore to compare our GPT-2 + BART pipeline with an all GPT-2 and found the performance of the former to be better. Based on the responses by the domain experts, the robot session demonstrated higher performance with no additional increase in mental or physical demand, temporal demand, effort, or frustration compared to a no-robot session. We also concluded that the domain experts perceived the robot as ideally safe, likable, and reliable.
Among the existing approaches for visual-based Recommender System (RS) explainability, utilizing user-uploaded item images as efficient, trustable explanations is a promising option. However, current models following this paradigm assume that, for any user, all images uploaded by other users can be considered negative training examples (i.e. bad explanatory images), an inadvertedly naive labelling assumption that contradicts the rationale of the approach. This work proposes a new explainer training pipeline by leveraging Positive-Unlabelled (PU) Learning techniques to train image-based explainer with refined subsets of reliable negative examples for each user selected through a novel user-personalized, two-step, similarity-based PU Learning algorithm. Computational experiments show this PU-based approach outperforms the state-of-the-art non-PU method in six popular real-world datasets, proving that an improvement of visual-based RS explainability can be achieved by maximizing training data quality rather than increasing model complexity.
Deep learning (DL) has successfully automated dose distribution prediction in radiotherapy planning, enhancing both efficiency and quality. However, existing methods suffer from the over-smoothing problem for their commonly used L1 or L2 loss with posterior average calculations. To alleviate this limitation, we propose a diffusion model-based method (DiffDose) for predicting the radiotherapy dose distribution of cancer patients. Specifically, the DiffDose model contains a forward process and a reverse process. In the forward process, DiffDose transforms dose distribution maps into pure Gaussian noise by gradually adding small noise and a noise predictor is simultaneously trained to estimate the noise added at each timestep. In the reverse process, it removes the noise from the pure Gaussian noise in multiple steps with the well-trained noise predictor and finally outputs the predicted dose distribution maps...
We present and evaluate a method called grammar masking, which is used to guide large language models (LLMs) toward producing syntactically correct models for a given context-free grammar. Prompt engineering methods such as few-shot learning or priming can be used to improve the chances of an LLM producing correct syntax, but the more complex the grammar, the more time-consuming and less promising these methods become. Previous work is focused primarily on the usage of either language model training or prompt engineering. In this work, a method is presented that restricts the output to a given grammar using constrained decoding to ensure the output adheres to a valid syntax. We use several DSLs built with MontiCore and task multiple LLMs to produce models with and without constrained decoding. A corresponding parser is used to confirm the syntactic correctness of each model. We show that grammar masking can dramatically improve the modeling capabilities of several LLMs, reducing the need for well-refined prompting while increasing the chance of producing correct models.
Radio link failure (RLF) prediction system in Radio Access Networks (RANs) is critical for ensuring seamless communication and meeting the stringent requirements of high data rates, low latency, and improved reliability in 5G networks. However, weather conditions such as precipitation, humidity, temperature, and wind impact these communication links. Usually, historical radio link Key Performance Indicators (KPIs) and their surrounding weather station observations are utilized for building learning-based RLF prediction models. However, such models must be capable of learning the spatial weather context in a dynamic RAN and effectively encoding time series KPIs with the weather observation data. Existing works fail to incorporate both of these essential design aspects of the prediction models. This paper fills the gap by proposing GenTrap, a novel RLF prediction framework that introduces a graph neural network (GNN)-based learnable weather effect aggregation module and employs state-of-the-art time series transformer as the temporal feature extractor for radio link failure prediction. The proposed aggregation method of GenTrap can be integrated into any existing prediction model to achieve better performance and generalizability. We evaluate GenTrap on two real-world datasets (rural and urban) with 2.6 million KPI data points and show that GenTrap offers a significantly higher F1-score (0.93 for rural and 0.79 for urban) compared to its counterparts while possessing generalization capability.
DCZNMaker is a web-based application designed to streamline decision-making processes using Multi-attribute Utility Analysis (MAUA). Built with simplicity and efficiency in mind, DCZNMaker empowers users to make informed decisions among alternatives (options) by making explicit the factors (attributes) to be taken into consideration, as well as the importance (weights) and utility (location) of each attribute. The app offers a user-friendly interface, allowing individuals to input the various attributes and their associated weights and locations effortlessly. Leveraging advanced algorithms, DCZNMaker computes and presents comprehensive analyses, aiding users in understanding the relative importance of each attribute and guiding them towards optimal decisions. Several use cases are demonstrated. Whether for personal, professional, or academic use, DCZNMaker is a versatile tool adaptable to diverse decision-making scenarios. With its intuitive design and robust functionality, DCZNMaker revolutionizes decision-making processes, empowering individuals or groups of users to make well-informed choices with confidence and clarity.
The adaptation of Large Language Model (LLM)-based agents to execute tasks via natural language prompts represents a significant advancement, notably eliminating the need for explicit retraining or fine tuning, but are constrained by the comprehensiveness and diversity of the provided examples, leading to outputs that often diverge significantly from expected results, especially when it comes to the open-ended questions. This paper introduces the Memory Sharing, a framework which integrates the real-time memory filter, storage and retrieval to enhance the In-Context Learning process. This framework allows for the sharing of memories among multiple agents, whereby the interactions and shared memories between different agents effectively enhance the diversity of the memories. The collective self-enhancement through interactive learning among multiple agents facilitates the evolution from individual intelligence to collective intelligence. Besides, the dynamically growing memory pool is utilized not only to improve the quality of responses but also to train and enhance the retriever. We evaluated our framework across three distinct domains involving specialized tasks of agents. The experimental results demonstrate that the MS framework significantly improves the agents' performance in addressing open-ended questions.
Retrieval-augmented Large Language Models (LLMs) have reshaped traditional query-answering systems, offering unparalleled user experiences. However, existing retrieval techniques often struggle to handle multi-modal query contexts. In this paper, we present an interactive Multi-modal Query Answering (MQA) system, empowered by our newly developed multi-modal retrieval framework and navigation graph index, integrated with cutting-edge LLMs. It comprises five core components: Data Preprocessing, Vector Representation, Index Construction, Query Execution, and Answer Generation, all orchestrated by a dedicated coordinator to ensure smooth data flow from input to answer generation. One notable aspect of MQA is its utilization of contrastive learning to assess the significance of different modalities, facilitating precise measurement of multi-modal information similarity. Furthermore, the system achieves efficient retrieval through our advanced navigation graph index, refined using computational pruning techniques. Another highlight of our system is its pluggable processing framework, allowing seamless integration of embedding models, graph indexes, and LLMs. This flexibility provides users diverse options for gaining insights from their multi-modal knowledge base. A preliminary video introduction of MQA is available at //youtu.be/xvUuo2ZIqWk.
Agent-based modeling and simulation has evolved as a powerful tool for modeling complex systems, offering insights into emergent behaviors and interactions among diverse agents. Integrating large language models into agent-based modeling and simulation presents a promising avenue for enhancing simulation capabilities. This paper surveys the landscape of utilizing large language models in agent-based modeling and simulation, examining their challenges and promising future directions. In this survey, since this is an interdisciplinary field, we first introduce the background of agent-based modeling and simulation and large language model-empowered agents. We then discuss the motivation for applying large language models to agent-based simulation and systematically analyze the challenges in environment perception, human alignment, action generation, and evaluation. Most importantly, we provide a comprehensive overview of the recent works of large language model-empowered agent-based modeling and simulation in multiple scenarios, which can be divided into four domains: cyber, physical, social, and hybrid, covering simulation of both real-world and virtual environments. Finally, since this area is new and quickly evolving, we discuss the open problems and promising future directions.
Multiple instance learning (MIL) is a powerful tool to solve the weakly supervised classification in whole slide image (WSI) based pathology diagnosis. However, the current MIL methods are usually based on independent and identical distribution hypothesis, thus neglect the correlation among different instances. To address this problem, we proposed a new framework, called correlated MIL, and provided a proof for convergence. Based on this framework, we devised a Transformer based MIL (TransMIL), which explored both morphological and spatial information. The proposed TransMIL can effectively deal with unbalanced/balanced and binary/multiple classification with great visualization and interpretability. We conducted various experiments for three different computational pathology problems and achieved better performance and faster convergence compared with state-of-the-art methods. The test AUC for the binary tumor classification can be up to 93.09% over CAMELYON16 dataset. And the AUC over the cancer subtypes classification can be up to 96.03% and 98.82% over TCGA-NSCLC dataset and TCGA-RCC dataset, respectively.