Data-driven decision-making is at the core of many modern applications, and understanding the data is critical in supporting trust in these decisions. However, data is dynamic and evolving, just like the real-world entities it represents. Thus, an important component of understanding data is analyzing and drawing insights from the changes it undergoes. Existing methods for exploring data change list differences exhaustively, which are not interpretable by humans and lack salient insights regarding change trends. For example, an explanation that semantically summarizes changes to highlight gender disparities in performance rewards is more human-consumable than a long list of employee salary changes. We demonstrate ChARLES, a system that derives semantic summaries of changes between two snapshots of an evolving database, in an effective, concise, and interpretable way. Our key observation is that, while datasets often evolve through point and other small-batch updates, rich data features can reveal latent semantics that can intuitively summarize the changes. Under the hood, ChARLES compares database versions, infers feasible transformations by fitting multiple regression lines over different data partitions to derive change summaries, and ranks them. ChARLES allows users to customize it to obtain their preferred explanation by navigating the accuracy-interpretability tradeoff, and offers a proof of concept for reasoning about data evolution over real-world datasets.
Obelia improves upon structured DAG-based consensus protocols used in proof-of-stake systems, allowing them to effectively scale to accommodate hundreds of validators. Obelia implements a two-tier validator system. A core group of high-stake validators that propose blocks as in current protocols and a larger group of lower-stake auxiliary validators that occasionally author blocks. Obelia incentivizes auxiliary validators to assist recovering core validators and integrates seamlessly with existing protocols. We show that Obelia does not introduce visible overhead compared to the original protocol, even when scaling to hundreds of validators, or when a large number of auxiliary validators are unreliable.
Graph neural networks are recognized for their strong performance across various applications, with the backpropagation algorithm playing a central role in the development of most GNN models. However, despite its effectiveness, BP has limitations that challenge its biological plausibility and affect the efficiency, scalability and parallelism of training neural networks for graph-based tasks. While several non-BP training algorithms, such as the direct feedback alignment, have been successfully applied to fully-connected and convolutional network components for handling Euclidean data, directly adapting these non-BP frameworks to manage non-Euclidean graph data in GNN models presents significant challenges. These challenges primarily arise from the violation of the i.i.d. assumption in graph data and the difficulty in accessing prediction errors for all samples (nodes) within the graph. To overcome these obstacles, in this paper we propose DFA-GNN, a novel forward learning framework tailored for GNNs with a case study of semi-supervised learning. The proposed method breaks the limitations of BP by using a dedicated forward training mechanism. Specifically, DFA-GNN extends the principles of DFA to adapt to graph data and unique architecture of GNNs, which incorporates the information of graph topology into the feedback links to accommodate the non-Euclidean characteristics of graph data. Additionally, for semi-supervised graph learning tasks, we developed a pseudo error generator that spreads residual errors from training data to create a pseudo error for each unlabeled node. These pseudo errors are then utilized to train GNNs using DFA. Extensive experiments on 10 public benchmarks reveal that our learning framework outperforms not only previous non-BP methods but also the standard BP methods, and it exhibits excellent robustness against various types of noise and attacks.
This study investigated the integration of cutting-edge technologies and methodologies for creating dynamic, user-centered library environments. In creative strategies for engagement and innovation, library users must be empowered to undertake the new role of modernizing library services and enhancing user experiences. It also enhances the information management and user engagement. This can be attained from personalized approaches, such as recommendation systems to interactive platforms that will have effective experiences tailored to users of different natures. It investigates the consumer engagement practices of enthusiasm, sharing, and learning about their roles in cognitive, affective, and behavioural engagements. Combined, these new approaches will help promote learning, interaction, and growth, add value, and have a more positive impact on users. The challenge for libraries in this rapidly changing, technologically advancing, and digitally networked world, with a base of expectant users, is to remain relevant and engaging. This study discusses innovative strategies for empowering library users and enhancing their engagement through creative and technological approaches. This investigation was conducted to integrate cutting-edge technologies and methodologies into creating dynamic library settings that are user-centered and foster learning, interaction, and personal growth.
Before deploying outputs from foundation models in high-stakes tasks, it is imperative to ensure that they align with human values. For instance, in radiology report generation, reports generated by a vision-language model must align with human evaluations before their use in medical decision-making. This paper presents Conformal Alignment, a general framework for identifying units whose outputs meet a user-specified alignment criterion. It is guaranteed that on average, a prescribed fraction of selected units indeed meet the alignment criterion, regardless of the foundation model or the data distribution. Given any pre-trained model and new units with model-generated outputs, Conformal Alignment leverages a set of reference data with ground-truth alignment status to train an alignment predictor. It then selects new units whose predicted alignment scores surpass a data-dependent threshold, certifying their corresponding outputs as trustworthy. Through applications to question answering and radiology report generation, we demonstrate that our method is able to accurately identify units with trustworthy outputs via lightweight training over a moderate amount of reference data. En route, we investigate the informativeness of various features in alignment prediction and combine them with standard models to construct the alignment predictor.
This paper presents a novel learning-based control framework that uses keyframing to incorporate high-level objectives in natural locomotion for legged robots. These high-level objectives are specified as a variable number of partial or complete pose targets that are spaced arbitrarily in time. Our proposed framework utilizes a multi-critic reinforcement learning algorithm to effectively handle the mixture of dense and sparse rewards. Additionally, it employs a transformer-based encoder to accommodate a variable number of input targets, each associated with specific time-to-arrivals. Throughout simulation and hardware experiments, we demonstrate that our framework can effectively satisfy the target keyframe sequence at the required times. In the experiments, the multi-critic method significantly reduces the effort of hyperparameter tuning compared to the standard single-critic alternative. Moreover, the proposed transformer-based architecture enables robots to anticipate future goals, which results in quantitative improvements in their ability to reach their targets.
This work presents a meta-reinforcement learning approach to develop a universal locomotion control policy capable of zero-shot generalization across diverse quadrupedal platforms. The proposed method trains an RL agent equipped with a memory unit to imitate reference motions using a small set of procedurally generated quadruped robots. Through comprehensive simulation and real-world hardware experiments, we demonstrate the efficacy of our approach in achieving locomotion across various robots without requiring robot-specific fine-tuning. Furthermore, we highlight the critical role of the memory unit in enabling generalization, facilitating rapid adaptation to changes in the robot properties, and improving sample efficiency.
Model library is an effective tool for improving the performance of single-model Out-of-Distribution (OoD) detector, mainly through model selection and detector fusion. However, existing methods in the literature do not provide uncertainty quantification for model selection results. Additionally, the model ensemble process primarily focuses on controlling the True Positive Rate (TPR) while neglecting the False Positive Rate (FPR). In this paper, we emphasize the significance of the proportion of models in the library that identify the test sample as an OoD sample. This proportion holds crucial information and directly influences the error rate of OoD detection.To address this, we propose inverting the commonly-used sequential p-value strategies. We define the rejection region initially and then estimate the error rate. Furthermore, we introduce a novel perspective from change-point detection and propose an approach for proportion estimation with automatic hyperparameter selection. We name the proposed approach as DOS-Storey-based Detector Ensemble (DSDE). Experimental results on CIFAR10 and CIFAR100 demonstrate the effectiveness of our approach in tackling OoD detection challenges. Specifically, the CIFAR10 experiments show that DSDE reduces the FPR from 11.07% to 3.31% compared to the top-performing single-model detector.
The large models, as predicted by scaling raw forecasts, have made groundbreaking progress in many fields, particularly in natural language generation tasks, where they have approached or even surpassed human levels. However, the unprecedented scale of their parameters brings significant computational and storage costs. These large models require substantial computational resources and GPU memory to operate. When adapting large models to specific downstream tasks, their massive parameter scale poses a significant challenge in fine-tuning on hardware platforms with limited computational power and GPU memory. To address this issue, Parameter-Efficient Fine-Tuning (PEFT) offers a practical solution by efficiently adjusting the parameters of large pre-trained models to suit various downstream tasks. Specifically, PEFT adjusts the parameters of pre-trained large models to adapt to specific tasks or domains, minimizing the introduction of additional parameters and the computational resources required. This review mainly introduces the preliminary knowledge of PEFT, the core ideas and principles of various PEFT algorithms, the applications of PEFT, and potential future research directions. By reading this review, we believe that interested parties can quickly grasp the PEFT methodology, thereby accelerating its development and innovation.
With the exponential surge in diverse multi-modal data, traditional uni-modal retrieval methods struggle to meet the needs of users demanding access to data from various modalities. To address this, cross-modal retrieval has emerged, enabling interaction across modalities, facilitating semantic matching, and leveraging complementarity and consistency between different modal data. Although prior literature undertook a review of the cross-modal retrieval field, it exhibits numerous deficiencies pertaining to timeliness, taxonomy, and comprehensiveness. This paper conducts a comprehensive review of cross-modal retrieval's evolution, spanning from shallow statistical analysis techniques to vision-language pre-training models. Commencing with a comprehensive taxonomy grounded in machine learning paradigms, mechanisms, and models, the paper then delves deeply into the principles and architectures underpinning existing cross-modal retrieval methods. Furthermore, it offers an overview of widely used benchmarks, metrics, and performances. Lastly, the paper probes the prospects and challenges that confront contemporary cross-modal retrieval, while engaging in a discourse on potential directions for further progress in the field. To facilitate the research on cross-modal retrieval, we develop an open-source code repository at //github.com/BMC-SDNU/Cross-Modal-Retrieval.
Face recognition technology has advanced significantly in recent years due largely to the availability of large and increasingly complex training datasets for use in deep learning models. These datasets, however, typically comprise images scraped from news sites or social media platforms and, therefore, have limited utility in more advanced security, forensics, and military applications. These applications require lower resolution, longer ranges, and elevated viewpoints. To meet these critical needs, we collected and curated the first and second subsets of a large multi-modal biometric dataset designed for use in the research and development (R&D) of biometric recognition technologies under extremely challenging conditions. Thus far, the dataset includes more than 350,000 still images and over 1,300 hours of video footage of approximately 1,000 subjects. To collect this data, we used Nikon DSLR cameras, a variety of commercial surveillance cameras, specialized long-rage R&D cameras, and Group 1 and Group 2 UAV platforms. The goal is to support the development of algorithms capable of accurately recognizing people at ranges up to 1,000 m and from high angles of elevation. These advances will include improvements to the state of the art in face recognition and will support new research in the area of whole-body recognition using methods based on gait and anthropometry. This paper describes methods used to collect and curate the dataset, and the dataset's characteristics at the current stage.