Neural Cellular Automata (NCA) are a powerful combination of machine learning and mechanistic modelling. We train NCA to learn complex dynamics from time series of images and PDE trajectories. Our method is designed to identify underlying local rules that govern large scale dynamic emergent behaviours. Previous work on NCA focuses on learning rules that give stationary emergent structures. We extend NCA to capture both transient and stable structures within the same system, as well as learning rules that capture the dynamics of Turing pattern formation in nonlinear Partial Differential Equations (PDEs). We demonstrate that NCA can generalise very well beyond their PDE training data, we show how to constrain NCA to respect given symmetries, and we explore the effects of associated hyperparameters on model performance and stability. Being able to learn arbitrary dynamics gives NCA great potential as a data driven modelling framework, especially for modelling biological pattern formation.
The Split and Rephrase (SPRP) task, which consists in splitting complex sentences into a sequence of shorter grammatical sentences, while preserving the original meaning, can facilitate the processing of complex texts for humans and machines alike. It is also a valuable testbed to evaluate natural language processing models, as it requires modelling complex grammatical aspects. In this work, we evaluate large language models on the task, showing that they can provide large improvements over the state of the art on the main metrics, although still lagging in terms of splitting compliance. Results from two human evaluations further support the conclusions drawn from automated metric results. We provide a comprehensive study that includes prompting variants, domain shift, fine-tuned pretrained language models of varying parameter size and training data volumes, contrasted with both zero-shot and few-shot approaches on instruction-tuned language models. Although the latter were markedly outperformed by fine-tuned models, they may constitute a reasonable off-the-shelf alternative. Our results provide a fine-grained analysis of the potential and limitations of large language models for SPRP, with significant improvements achievable using relatively small amounts of training data and model parameters overall, and remaining limitations for all models on the task.
Fine-tuning pre-trained Large Language Models (LLMs) is essential to align them with human values and intentions. This process often utilizes methods like pairwise comparisons and KL divergence against a reference LLM, focusing on the evaluation of full answers generated by the models. However, the generation of these responses occurs in a token level, following a sequential, auto-regressive fashion. In this paper, we introduce Token-level Direct Preference Optimization (TDPO), a novel approach to align LLMs with human preferences by optimizing policy at the token level. Unlike previous methods, which face challenges in divergence efficiency, TDPO incorporates forward KL divergence constraints for each token, improving alignment and diversity. Utilizing the Bradley-Terry model for a token-based reward system, TDPO enhances the regulation of KL divergence, while preserving simplicity without the need for explicit reward modeling. Experimental results across various text tasks demonstrate TDPO's superior performance in balancing alignment with generation diversity. Notably, fine-tuning with TDPO strikes a better balance than DPO in the controlled sentiment generation and single-turn dialogue datasets, and significantly improves the quality of generated responses compared to both DPO and PPO-based RLHF methods. Our code is open-sourced at //github.com/Vance0124/Token-level-Direct-Preference-Optimization.
Optimization constrained by high-fidelity computational models has potential for transformative impact. However, such optimization is frequently unattainable in practice due to the complexity and computational intensity of the model. An alternative is to optimize a low-fidelity model and use limited evaluations of the high-fidelity model to assess the quality of the solution. This article develops a framework to use limited high-fidelity simulations to update the optimization solution computed using the low-fidelity model. Building off a previous article [22], which introduced hyper-differential sensitivity analysis with respect to model discrepancy, this article provides novel extensions of the algorithm to enable uncertainty quantification of the optimal solution update via a Bayesian framework. Specifically, we formulate a Bayesian inverse problem to estimate the model discrepancy and propagate the posterior model discrepancy distribution through the post-optimality sensitivity operator for the low-fidelity optimization problem. We provide a rigorous treatment of the Bayesian formulation, a computationally efficient algorithm to compute posterior samples, a guide to specify and interpret the algorithm hyper-parameters, and a demonstration of the approach on three examples which highlight various types of discrepancy between low and high-fidelity models.
Classical Markov Chain Monte Carlo methods have been essential for simulating statistical physical systems and have proven well applicable to other systems with complex degrees of freedom. Motivated by the statistical physics origins, Chen, Kastoryano, and Gily\'en [CKG23] proposed a continuous-time quantum thermodynamic analog to Glauber dynamic that is (i) exactly detailed balanced, (ii) efficiently implementable, and (iii) quasi-local for geometrically local systems. Physically, their construction gives a smooth variant of the Davies' generator derived from weak system-bath interaction. In this work, we give an efficiently implementable discrete-time quantum counterpart to Metropolis sampling that also enjoys the desirable features (i)-(iii). Also, we give an alternative highly coherent quantum generalization of detailed balanced dynamics that resembles another physically derived master equation, and propose a smooth interpolation between this and earlier constructions. We study generic properties of all constructions, including the uniqueness of the fixed-point and the locality of the resulting operators. We hope our results provide a systematic approach to the possible quantum generalizations of classical Glauber and Metropolis dynamics.
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems. They learn stochastic parametric (hyper)policies by either exploring in the space of actions or in the space of parameters. Stochastic controllers, however, are often undesirable from a practical perspective because of their lack of robustness, safety, and traceability. In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version. In this paper, we make a step towards the theoretical understanding of this practice. After introducing a novel framework for modeling this scenario, we study the global convergence to the best deterministic policy, under (weak) gradient domination assumptions. Then, we illustrate how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy. Finally, we quantitatively compare action-based and parameter-based exploration, giving a formal guise to intuitive results.
Partial Label Learning (PLL) grapples with learning from ambiguously labelled data, and it has been successfully applied in fields such as image recognition. Nevertheless, traditional PLL methods rely on the closed-world assumption, which can be limiting in open-world scenarios and negatively impact model performance and generalization. To tackle these challenges, our study introduces a novel method called PLL-OOD, which is the first to incorporate Out-of-Distribution (OOD) detection into the PLL framework. PLL-OOD significantly enhances model adaptability and accuracy by merging self-supervised learning with partial label loss and pioneering the Partial-Energy (PE) score for OOD detection. This approach improves data feature representation and effectively disambiguates candidate labels, using a dynamic label confidence matrix to refine predictions. The PE score, adjusted by label confidence, precisely identifies OOD instances, optimizing model training towards in-distribution data. This innovative method markedly boosts PLL model robustness and performance in open-world settings. To validate our approach, we conducted a comprehensive comparative experiment combining the existing state-of-the-art PLL model with multiple OOD scores on the CIFAR-10 and CIFAR-100 datasets with various OOD datasets. The results demonstrate that the proposed PLL-OOD framework is highly effective and effectiveness outperforms existing models, showcasing its superiority and effectiveness.
Recently, contrastive learning (CL) has emerged as a successful method for unsupervised graph representation learning. Most graph CL methods first perform stochastic augmentation on the input graph to obtain two graph views and maximize the agreement of representations in the two views. Despite the prosperous development of graph CL methods, the design of graph augmentation schemes -- a crucial component in CL -- remains rarely explored. We argue that the data augmentation schemes should preserve intrinsic structures and attributes of graphs, which will force the model to learn representations that are insensitive to perturbation on unimportant nodes and edges. However, most existing methods adopt uniform data augmentation schemes, like uniformly dropping edges and uniformly shuffling features, leading to suboptimal performance. In this paper, we propose a novel graph contrastive representation learning method with adaptive augmentation that incorporates various priors for topological and semantic aspects of the graph. Specifically, on the topology level, we design augmentation schemes based on node centrality measures to highlight important connective structures. On the node attribute level, we corrupt node features by adding more noise to unimportant node features, to enforce the model to recognize underlying semantic information. We perform extensive experiments of node classification on a variety of real-world datasets. Experimental results demonstrate that our proposed method consistently outperforms existing state-of-the-art baselines and even surpasses some supervised counterparts, which validates the effectiveness of the proposed contrastive framework with adaptive augmentation.
In Multi-Label Text Classification (MLTC), one sample can belong to more than one class. It is observed that most MLTC tasks, there are dependencies or correlations among labels. Existing methods tend to ignore the relationship among labels. In this paper, a graph attention network-based model is proposed to capture the attentive dependency structure among the labels. The graph attention network uses a feature matrix and a correlation matrix to capture and explore the crucial dependencies between the labels and generate classifiers for the task. The generated classifiers are applied to sentence feature vectors obtained from the text feature extraction network (BiLSTM) to enable end-to-end training. Attention allows the system to assign different weights to neighbor nodes per label, thus allowing it to learn the dependencies among labels implicitly. The results of the proposed model are validated on five real-world MLTC datasets. The proposed model achieves similar or better performance compared to the previous state-of-the-art models.
Recommender systems are widely used in big information-based companies such as Google, Twitter, LinkedIn, and Netflix. A recommender system deals with the problem of information overload by filtering important information fragments according to users' preferences. In light of the increasing success of deep learning, recent studies have proved the benefits of using deep learning in various recommendation tasks. However, most proposed techniques only aim to target individuals, which cannot be efficiently applied in group recommendation. In this paper, we propose a deep learning architecture to solve the group recommendation problem. On the one hand, as different individual preferences in a group necessitate preference trade-offs in making group recommendations, it is essential that the recommendation model can discover substitutes among user behaviors. On the other hand, it has been observed that a user as an individual and as a group member behaves differently. To tackle such problems, we propose using an attention mechanism to capture the impact of each user in a group. Specifically, our model automatically learns the influence weight of each user in a group and recommends items to the group based on its members' weighted preferences. We conduct extensive experiments on four datasets. Our model significantly outperforms baseline methods and shows promising results in applying deep learning to the group recommendation problem.
Recently, ensemble has been applied to deep metric learning to yield state-of-the-art results. Deep metric learning aims to learn deep neural networks for feature embeddings, distances of which satisfy given constraint. In deep metric learning, ensemble takes average of distances learned by multiple learners. As one important aspect of ensemble, the learners should be diverse in their feature embeddings. To this end, we propose an attention-based ensemble, which uses multiple attention masks, so that each learner can attend to different parts of the object. We also propose a divergence loss, which encourages diversity among the learners. The proposed method is applied to the standard benchmarks of deep metric learning and experimental results show that it outperforms the state-of-the-art methods by a significant margin on image retrieval tasks.