Data augmentation is an essential building block for learning efficient deep learning models. Among all augmentation techniques proposed so far, linear interpolation of training data points, also called mixup, has found to be effective for a large panel of applications. While the majority of works have focused on selecting the right points to mix, or applying complex non-linear interpolation, we are interested in mixing similar points more frequently and strongly than less similar ones. To this end, we propose to dynamically change the underlying distribution of interpolation coefficients through warping functions, depending on the similarity between data points to combine. We define an efficient and flexible framework to do so without losing in diversity. We provide extensive experiments for classification and regression tasks, showing that our proposed method improves both performance and calibration of models. Code available in //github.com/ENSTA-U2IS/torch-uncertainty
Learning representations of molecular structures using deep learning is a fundamental problem in molecular property prediction tasks. Molecules inherently exist in the real world as three-dimensional structures; furthermore, they are not static but in continuous motion in the 3D Euclidean space, forming a potential energy surface. Therefore, it is desirable to generate multiple conformations in advance and extract molecular representations using a 4D-QSAR model that incorporates multiple conformations. However, this approach is impractical for drug and material discovery tasks because of the computational cost of obtaining multiple conformations. To address this issue, we propose a pre-training method for molecular GNNs using an existing dataset of molecular conformations to generate a latent vector universal to multiple conformations from a 2D molecular graph. Our method, called Boltzmann GNN, is formulated by maximizing the conditional marginal likelihood of a conditional generative model for conformations generation. We show that our model has a better prediction performance for molecular properties than existing pre-training methods using molecular graphs and three-dimensional molecular structures.
As computational systems supported by artificial intelligence (AI) techniques continue to play an increasingly pivotal role in making high-stakes recommendations and decisions across various domains, the demand for explainable AI (XAI) has grown significantly, extending its impact into cognitive learning research. Providing explanations for novel concepts is recognised as a fundamental aid in the learning process, particularly when addressing challenges stemming from knowledge deficiencies and skill application. Addressing these difficulties involves timely explanations and guidance throughout the learning process, prompting the interest of AI experts in developing explainer models. In this paper, we introduce an intelligent system (CL-XAI) for Cognitive Learning which is supported by XAI, focusing on two key research objectives: exploring how human learners comprehend the internal mechanisms of AI models using XAI tools and evaluating the effectiveness of such tools through human feedback. The use of CL-XAI is illustrated with a game-inspired virtual use case where learners tackle combinatorial problems to enhance problem-solving skills and deepen their understanding of complex concepts, highlighting the potential for transformative advances in cognitive learning and co-learning.
Numerous self-supervised learning paradigms, such as contrastive learning and masked image modeling, have been proposed to acquire powerful and general representations from unlabeled data. However, these models are commonly pretrained within their specific framework alone, failing to consider the complementary nature of visual representations. To tackle this issue, we introduce Comprehensive Distillation with Multiple Self-supervised Teachers (DMT) for pretrained model compression, which leverages the strengths of multiple off-the-shelf self-supervised models. Our experimental results on prominent benchmark datasets exhibit that the proposed method significantly surpasses state-of-the-art competitors while retaining favorable efficiency metrics. On classification tasks, our DMT framework utilizing three different self-supervised ViT-Base teachers enhances the performance of both small/tiny models and the base model itself. For dense tasks, DMT elevates the AP/mIoU of standard SSL models on MS-COCO and ADE20K datasets by 4.0%.
This study introduces an innovative framework designed to automate tasks by interacting with UIs through a sequential, human-like problem-solving approach. Our approach initially transforms UI screenshots into natural language explanations through a vision-based UI analysis, circumventing traditional view hierarchy limitations. It then methodically engages with each interface, guiding the LLM to pinpoint and act on relevant UI elements, thus bolstering both precision and functionality. Employing the ERNIE Bot LLM, our approach has been demonstrated to surpass existing methodologies. It delivers superior UI interpretation across various datasets and exhibits remarkable efficiency in automating varied tasks on an Android smartphone, outperforming human capabilities in intricate tasks and significantly enhancing the PBD process.
Simulation-based inference (SBI) provides a powerful framework for inferring posterior distributions of stochastic simulators in a wide range of domains. In many settings, however, the posterior distribution is not the end goal itself -- rather, the derived parameter values and their uncertainties are used as a basis for deciding what actions to take. Unfortunately, because posterior distributions provided by SBI are (potentially crude) approximations of the true posterior, the resulting decisions can be suboptimal. Here, we address the question of how to perform Bayesian decision making on stochastic simulators, and how one can circumvent the need to compute an explicit approximation to the posterior. Our method trains a neural network on simulated data and can predict the expected cost given any data and action, and can, thus, be directly used to infer the action with lowest cost. We apply our method to several benchmark problems and demonstrate that it induces similar cost as the true posterior distribution. We then apply the method to infer optimal actions in a real-world simulator in the medical neurosciences, the Bayesian Virtual Epileptic Patient, and demonstrate that it allows to infer actions associated with low cost after few simulations.
One of the prominent methods for explaining the decision of a machine-learning classifier is by a counterfactual example. Most current algorithms for generating such examples in the textual domain are based on generative language models. Generative models, however, are trained to minimize a specific loss function in order to fulfill certain requirements for the generated texts. Any change in the requirements may necessitate costly retraining, thus potentially limiting their applicability. In this paper, we present a general search-based framework for generating counterfactual explanations in the textual domain. Our framework is model-agnostic, domain-agnostic, anytime, and does not require retraining in order to adapt to changes in the user requirements. We model the task as a search problem in a space where the initial state is the classified text, and the goal state is a text in a given target class. Our framework includes domain-independent modification operators, but can also exploit domain-specific knowledge through specialized operators. The search algorithm attempts to find a text from the target class with minimal user-specified distance from the original classified object.
Venn Prediction (VP) is a new machine learning framework for producing well-calibrated probabilistic predictions. In particular it provides well-calibrated lower and upper bounds for the conditional probability of an example belonging to each possible class of the problem at hand. This paper proposes five VP methods based on Neural Networks (NNs), which is one of the most widely used machine learning techniques. The proposed methods are evaluated experimentally on four benchmark datasets and the obtained results demonstrate the empirical well-calibratedness of their outputs and their superiority over the outputs of the traditional NN classifier.
Recommender systems have seen significant advancements with the influence of deep learning and graph neural networks, particularly in capturing complex user-item relationships. However, these graph-based recommenders heavily depend on ID-based data, potentially disregarding valuable textual information associated with users and items, resulting in less informative learned representations. Moreover, the utilization of implicit feedback data introduces potential noise and bias, posing challenges for the effectiveness of user preference learning. While the integration of large language models (LLMs) into traditional ID-based recommenders has gained attention, challenges such as scalability issues, limitations in text-only reliance, and prompt input constraints need to be addressed for effective implementation in practical recommender systems. To address these challenges, we propose a model-agnostic framework RLMRec that aims to enhance existing recommenders with LLM-empowered representation learning. It proposes a recommendation paradigm that integrates representation learning with LLMs to capture intricate semantic aspects of user behaviors and preferences. RLMRec incorporates auxiliary textual signals, develops a user/item profiling paradigm empowered by LLMs, and aligns the semantic space of LLMs with the representation space of collaborative relational signals through a cross-view alignment framework. This work further establish a theoretical foundation demonstrating that incorporating textual signals through mutual information maximization enhances the quality of representations. In our evaluation, we integrate RLMRec with state-of-the-art recommender models, while also analyzing its efficiency and robustness to noise data. Our implementation codes are available at //github.com/HKUDS/RLMRec.
Deep neural network based recommendation systems have achieved great success as information filtering techniques in recent years. However, since model training from scratch requires sufficient data, deep learning-based recommendation methods still face the bottlenecks of insufficient data and computational inefficiency. Meta-learning, as an emerging paradigm that learns to improve the learning efficiency and generalization ability of algorithms, has shown its strength in tackling the data sparsity issue. Recently, a growing number of studies on deep meta-learning based recommenddation systems have emerged for improving the performance under recommendation scenarios where available data is limited, e.g. user cold-start and item cold-start. Therefore, this survey provides a timely and comprehensive overview of current deep meta-learning based recommendation methods. Specifically, we propose a taxonomy to discuss existing methods according to recommendation scenarios, meta-learning techniques, and meta-knowledge representations, which could provide the design space for meta-learning based recommendation methods. For each recommendation scenario, we further discuss technical details about how existing methods apply meta-learning to improve the generalization ability of recommendation models. Finally, we also point out several limitations in current research and highlight some promising directions for future research in this area.
It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.