Domain knowledge refers to the in-depth understanding, expertise, and familiarity with a specific subject, industry, field, or area of special interest. The existing benchmarks are all lack of an overall design for domain knowledge evaluation. Holding the belief that the real ability of domain language understanding can only be fairly evaluated by an comprehensive and in-depth benchmark, we introduces the Domma, a Domain Mastery Benchmark. DomMa targets at testing Large Language Models (LLMs) on their domain knowledge understanding, it features extensive domain coverage, large data volume, and a continually updated data set based on Chinese 112 first-level subject classifications. DomMa consist of 100,000 questions in both Chinese and English sourced from graduate entrance examinations and undergraduate exams in Chinese college. We have also propose designs to make benchmark and evaluation process more suitable to LLMs.
To plan the trajectories of a large and heterogeneous swarm, sequential or synchronous distributed methods usually become intractable, due to the lack of global connectivity and clock synchronization, Moreover, the existing asynchronously distributed schemes usually require recheck-like mechanisms instead of inherently considering the other' moving tendency. To this end, we propose a novel asynchronous protocol to allocate the agents' derivable space in a distributed way, by which each agent can replan trajectory depending on its own timetable. Properties such as collision avoidance and recursive feasibility are theoretically shown and a lower bound of protocol updating is provided. Comprehensive simulations and comparisons with five state-of-the-art methods validate the effectiveness of our method and illustrate the improvement in both the completion time and the moving distance. Finally, hardware experiments are carried out, where 8 heterogeneous unmanned ground vehicles with onboard computation navigate in cluttered scenarios at a high agility.
Modern neural collaborative filtering techniques are critical to the success of e-commerce, social media, and content-sharing platforms. However, despite technical advances -- for every new application domain, we need to train an NCF model from scratch. In contrast, pre-trained vision and language models are routinely applied to diverse applications directly (zero-shot) or with limited fine-tuning. Inspired by the impact of pre-trained models, we explore the possibility of pre-trained recommender models that support building recommender systems in new domains, with minimal or no retraining, without the use of any auxiliary user or item information. Zero-shot recommendation without auxiliary information is challenging because we cannot form associations between users and items across datasets when there are no overlapping users or items. Our fundamental insight is that the statistical characteristics of the user-item interaction matrix are universally available across different domains and datasets. Thus, we use the statistical characteristics of the user-item interaction matrix to identify dataset-independent representations for users and items. We show how to learn universal (i.e., supporting zero-shot adaptation without user or item auxiliary information) representations for nodes and edges from the bipartite user-item interaction graph. We learn representations by exploiting the statistical properties of the interaction data, including user and item marginals, and the size and density distributions of their clusters.
Due to data privacy constraints, data sharing among multiple clinical centers is restricted, which impedes the development of high performance deep learning models from multicenter collaboration. Naive weight transfer methods share intermediate model weights without raw data and hence can bypass data privacy restrictions. However, performance drops are typically observed when the model is transferred from one center to the next because of the forgetting problem. Incremental transfer learning, which combines peer-to-peer federated learning and domain incremental learning, can overcome the data privacy issue and meanwhile preserve model performance by using continual learning techniques. In this work, a conventional domain/task incremental learning framework is adapted for incremental transfer learning. A comprehensive survey on the efficacy of different regularization-based continual learning methods for multicenter collaboration is performed. The influences of data heterogeneity, classifier head setting, network optimizer, model initialization, center order, and weight transfer type have been investigated thoroughly. Our framework is publicly accessible to the research community for further development.
We introduce a new tool, Transductive Local Rademacher Complexity (TLRC), to analyze the generalization performance of transductive learning methods and motivate new transductive learning algorithms. Our work extends the idea of the popular Local Rademacher Complexity (LRC) to the transductive setting with considerable changes compared to the analysis of typical LRC methods in the inductive setting. We present a localized version of Rademacher complexity based tool wihch can be applied to various transductive learning problems and gain sharp bounds under proper conditions. Similar to the development of LRC, we build TLRC by starting from a sharp concentration inequality for independent variables with variance information. The prediction function class of a transductive learning model is then divided into pieces with a sub-root function being the upper bound for the Rademacher complexity of each piece, and the variance of all the functions in each piece is limited. A carefully designed variance operator is used to ensure that the bound for the test loss on unlabeled test data in the transductive setting enjoys a remarkable similarity to that of the classical LRC bound in the inductive setting. We use the new TLRC tool to analyze the Transductive Kernel Learning (TKL) model, where the labels of test data are generated by a kernel function. The result of TKL lays the foundation for generalization bounds for two types of transductive learning tasks, Graph Transductive Learning (GTL) and Transductive Nonparametric Kernel Regression (TNKR). When the target function is low-dimensional or approximately low-dimensional, we design low rank methods for both GTL and TNKR, which enjoy particularly sharper generalization bounds by TLRC which cannot be achieved by existing learning theory methods, to the best of our knowledge.
We study how to safely control nonlinear control-affine systems that are corrupted with bounded non-stochastic noise, i.e., noise that is unknown a priori and that is not necessarily governed by a stochastic model. We focus on safety constraints that take the form of time-varying convex constraints such as collision-avoidance and control-effort constraints. We provide an algorithm with bounded dynamic regret, i.e., bounded suboptimality against an optimal clairvoyant controller that knows the realization of the noise a prior. We are motivated by the future of autonomy where robots will autonomously perform complex tasks despite real-world unpredictable disturbances such as wind gusts. To develop the algorithm, we capture our problem as a sequential game between a controller and an adversary, where the controller plays first, choosing the control input, whereas the adversary plays second, choosing the noise's realization. The controller aims to minimize its cumulative tracking error despite being unable to know the noise's realization a prior. We validate our algorithm in simulated scenarios of (i) an inverted pendulum aiming to stay upright, and (ii) a quadrotor aiming to fly to a goal location through an unknown cluttered environment.
Mechanistic interpretability seeks to understand the internal mechanisms of machine learning models, where localization -- identifying the important model components -- is a key step. Activation patching, also known as causal tracing or interchange intervention, is a standard technique for this task (Vig et al., 2020), but the literature contains many variants with little consensus on the choice of hyperparameters or methodology. In this work, we systematically examine the impact of methodological details in activation patching, including evaluation metrics and corruption methods. In several settings of localization and circuit discovery in language models, we find that varying these hyperparameters could lead to disparate interpretability results. Backed by empirical observations, we give conceptual arguments for why certain metrics or methods may be preferred. Finally, we provide recommendations for the best practices of activation patching going forwards.
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
Influenced by the stunning success of deep learning in computer vision and language understanding, research in recommendation has shifted to inventing new recommender models based on neural networks. In recent years, we have witnessed significant progress in developing neural recommender models, which generalize and surpass traditional recommender models owing to the strong representation power of neural networks. In this survey paper, we conduct a systematic review on neural recommender models, aiming to summarize the field to facilitate future progress. Distinct from existing surveys that categorize existing methods based on the taxonomy of deep learning techniques, we instead summarize the field from the perspective of recommendation modeling, which could be more instructive to researchers and practitioners working on recommender systems. Specifically, we divide the work into three types based on the data they used for recommendation modeling: 1) collaborative filtering models, which leverage the key source of user-item interaction data; 2) content enriched models, which additionally utilize the side information associated with users and items, like user profile and item knowledge graph; and 3) context enriched models, which account for the contextual information associated with an interaction, such as time, location, and the past interactions. After reviewing representative works for each type, we finally discuss some promising directions in this field, including benchmarking recommender systems, graph reasoning based recommendation models, and explainable and fair recommendations for social good.
We address the task of automatically scoring the competency of candidates based on textual features, from the automatic speech recognition (ASR) transcriptions in the asynchronous video job interview (AVI). The key challenge is how to construct the dependency relation between questions and answers, and conduct the semantic level interaction for each question-answer (QA) pair. However, most of the recent studies in AVI focus on how to represent questions and answers better, but ignore the dependency information and interaction between them, which is critical for QA evaluation. In this work, we propose a Hierarchical Reasoning Graph Neural Network (HRGNN) for the automatic assessment of question-answer pairs. Specifically, we construct a sentence-level relational graph neural network to capture the dependency information of sentences in or between the question and the answer. Based on these graphs, we employ a semantic-level reasoning graph attention network to model the interaction states of the current QA session. Finally, we propose a gated recurrent unit encoder to represent the temporal question-answer pairs for the final prediction. Empirical results conducted on CHNAT (a real-world dataset) validate that our proposed model significantly outperforms text-matching based benchmark models. Ablation studies and experimental results with 10 random seeds also show the effectiveness and stability of our models.
Commonsense knowledge and commonsense reasoning are some of the main bottlenecks in machine intelligence. In the NLP community, many benchmark datasets and tasks have been created to address commonsense reasoning for language understanding. These tasks are designed to assess machines' ability to acquire and learn commonsense knowledge in order to reason and understand natural language text. As these tasks become instrumental and a driving force for commonsense research, this paper aims to provide an overview of existing tasks and benchmarks, knowledge resources, and learning and inference approaches toward commonsense reasoning for natural language understanding. Through this, our goal is to support a better understanding of the state of the art, its limitations, and future challenges.