The rapid development of deep learning techniques, improved computational power, and the availability of vast training data have led to significant advancements in pre-trained models and large language models (LLMs). Pre-trained models based on architectures such as BERT and Transformer, as well as LLMs like ChatGPT, have demonstrated remarkable language capabilities and found applications in Software engineering. Software engineering tasks can be divided into many categories, among which generative tasks are the most concern by researchers, where pre-trained models and LLMs possess powerful language representation and contextual awareness capabilities, enabling them to leverage diverse training data and adapt to generative tasks through fine-tuning, transfer learning, and prompt engineering. These advantages make them effective tools in generative tasks and have demonstrated excellent performance. In this paper, we present a comprehensive literature review of generative tasks in SE using pre-trained models and LLMs. We accurately categorize SE generative tasks based on software engineering methodologies and summarize the advanced pre-trained models and LLMs involved, as well as the datasets and evaluation metrics used. Additionally, we identify key strengths, weaknesses, and gaps in existing approaches, and propose potential research directions. This review aims to provide researchers and practitioners with an in-depth analysis and guidance on the application of pre-trained models and LLMs in generative tasks within SE.
Bayesian hypothesis tests leverage posterior probabilities, Bayes factors, or credible intervals to inform data-driven decision making. We propose a framework for power curve approximation with such hypothesis tests. We present a fast approach to explore the approximate sampling distribution of posterior probabilities when the conditions for the Bernstein-von Mises theorem are satisfied. We extend that approach to consider segments of such sampling distributions in a targeted manner for each sample size explored. These sampling distribution segments are used to construct power curves for various types of posterior analyses. Our resulting method for power curve approximation is orders of magnitude faster than conventional power curve estimation for Bayesian hypothesis tests. We also prove the consistency of the corresponding power estimates and sample size recommendations under certain conditions.
Graph learning architectures based on the k-dimensional Weisfeiler-Leman (k-WL) hierarchy offer a theoretically well-understood expressive power. However, such architectures often fail to deliver solid predictive performance on real-world tasks, limiting their practical impact. In contrast, global attention-based models such as graph transformers demonstrate strong performance in practice, but comparing their expressive power with the k-WL hierarchy remains challenging, particularly since these architectures rely on positional or structural encodings for their expressivity and predictive performance. To address this, we show that the recently proposed Edge Transformer, a global attention model operating on node pairs instead of nodes, has at least 3-WL expressive power. Empirically, we demonstrate that the Edge Transformer surpasses other theoretically aligned architectures regarding predictive performance while not relying on positional or structural encodings. Our code is available at //github.com/luis-mueller/towards-principled-gts
This survey explores the development of adjoint Monte Carlo methods for solving optimization problems governed by kinetic equations, a common challenge in areas such as plasma control and device design. These optimization problems are particularly demanding due to the high dimensionality of the phase space and the randomness in evaluating the objective functional, a consequence of using a forward Monte Carlo solver. To overcome these difficulties, a range of ``adjoint Monte Carlo methods'' have been devised. These methods skillfully combine Monte Carlo gradient estimators with PDE-constrained optimization, introducing innovative solutions tailored for kinetic applications. In this review, we begin by examining three primary strategies for Monte Carlo gradient estimation: the score function approach, the reparameterization trick, and the coupling method. We also delve into the adjoint-state method, an essential element in PDE-constrained optimization. Focusing on applications in the radiative transfer equation and the nonlinear Boltzmann equation, we provide a comprehensive guide on how to integrate Monte Carlo gradient techniques within both the optimize-then-discretize and the discretize-then-optimize frameworks from PDE-constrained optimization. This approach leads to the formulation of effective adjoint Monte Carlo methods, enabling efficient gradient estimation in complex, high-dimensional optimization problems.
Hypergraphs are crucial for modeling higher-order interactions in real-world data. Hypergraph neural networks (HNNs) effectively utilise these structures by message passing to generate informative node features for various downstream tasks like node classification. However, the message passing block in existing HNNs typically requires a computationally intensive training process, which limits their practical use. To tackle this challenge, we propose an alternative approach by decoupling the usage of the hypergraph structural information from the model training stage. The proposed model, simplified hypergraph neural network (SHNN), contains a training-free message-passing block that can be precomputed before the training of SHNN, thereby reducing the computational burden. We theoretically support the efficiency and effectiveness of SHNN by showing that: 1) It is more training-efficient compared to existing HNNs; 2) It utilises as much information as existing HNNs for node feature generation; and 3) It is robust against the oversmoothing issue while using long-range interactions. Experiments based on six real-world hypergraph benchmarks in node classification and hyperlink prediction present that, compared to state-of-the-art HNNs, SHNN shows both competitive performance and superior training efficiency. Specifically, on Cora-CA, SHNN achieves the highest node classification accuracy with just 2% training time of the best baseline.
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and the BM application in many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides the follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts, they are Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability, Commonsense Reasoning, Reliability&Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. In each topic, we summarize clearly the current studies and propose some future research directions. At the end of this paper, we conclude the further development of BMs in a more general view.
Geometric deep learning (GDL), which is based on neural network architectures that incorporate and process symmetry information, has emerged as a recent paradigm in artificial intelligence. GDL bears particular promise in molecular modeling applications, in which various molecular representations with different symmetry properties and levels of abstraction exist. This review provides a structured and harmonized overview of molecular GDL, highlighting its applications in drug discovery, chemical synthesis prediction, and quantum chemistry. Emphasis is placed on the relevance of the learned molecular features and their complementarity to well-established molecular descriptors. This review provides an overview of current challenges and opportunities, and presents a forecast of the future of GDL for molecular sciences.
Deep learning methods for graphs achieve remarkable performance on many node-level and graph-level prediction tasks. However, despite the proliferation of the methods and their success, prevailing Graph Neural Networks (GNNs) neglect subgraphs, rendering subgraph prediction tasks challenging to tackle in many impactful applications. Further, subgraph prediction tasks present several unique challenges, because subgraphs can have non-trivial internal topology, but also carry a notion of position and external connectivity information relative to the underlying graph in which they exist. Here, we introduce SUB-GNN, a subgraph neural network to learn disentangled subgraph representations. In particular, we propose a novel subgraph routing mechanism that propagates neural messages between the subgraph's components and randomly sampled anchor patches from the underlying graph, yielding highly accurate subgraph representations. SUB-GNN specifies three channels, each designed to capture a distinct aspect of subgraph structure, and we provide empirical evidence that the channels encode their intended properties. We design a series of new synthetic and real-world subgraph datasets. Empirical results for subgraph classification on eight datasets show that SUB-GNN achieves considerable performance gains, outperforming strong baseline methods, including node-level and graph-level GNNs, by 12.4% over the strongest baseline. SUB-GNN performs exceptionally well on challenging biomedical datasets when subgraphs have complex topology and even comprise multiple disconnected components.
Graph neural networks (GNNs) have been widely used in representation learning on graphs and achieved state-of-the-art performance in tasks such as node classification and link prediction. However, most existing GNNs are designed to learn node representations on the fixed and homogeneous graphs. The limitations especially become problematic when learning representations on a misspecified graph or a heterogeneous graph that consists of various types of nodes and edges. In this paper, we propose Graph Transformer Networks (GTNs) that are capable of generating new graph structures, which involve identifying useful connections between unconnected nodes on the original graph, while learning effective node representation on the new graphs in an end-to-end fashion. Graph Transformer layer, a core layer of GTNs, learns a soft selection of edge types and composite relations for generating useful multi-hop connections so-called meta-paths. Our experiments show that GTNs learn new graph structures, based on data and tasks without domain knowledge, and yield powerful node representation via convolution on the new graphs. Without domain-specific graph preprocessing, GTNs achieved the best performance in all three benchmark node classification tasks against the state-of-the-art methods that require pre-defined meta-paths from domain knowledge.
Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.
We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games -- surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.