In this paper, we consider topological featurizations of data defined over simplicial complexes, like images and labeled graphs, obtained by convolving this data with various filters before computing persistence. Viewing a convolution filter as a local motif, the persistence diagram of the resulting convolution describes the way the motif is distributed across the simplicial complex. This pipeline, which we call convolutional persistence, extends the capacity of topology to observe patterns in such data. Moreover, we prove that (generically speaking) for any two labeled complexes one can find some filter for which they produce different persistence diagrams, so that the collection of all possible convolutional persistence diagrams is an injective invariant. This is proven by showing convolutional persistence to be a special case of another topological invariant, the Persistent Homology Transform. Other advantages of convolutional persistence are improved stability, greater flexibility for data-dependent vectorizations, and reduced computational complexity for certain data types. Additionally, we have a suite of experiments showing that convolutions greatly improve the predictive power of persistence on a host of classification tasks, even if one uses random filters and vectorizes the resulting diagrams by recording only their total persistences.
In this paper, we address the limitations of existing text-to-image diffusion models in generating demographically fair results when given human-related descriptions. These models often struggle to disentangle the target language context from sociocultural biases, resulting in biased image generation. To overcome this challenge, we propose Fair Mapping, a flexible, model-agnostic, and lightweight approach that modifies a pre-trained text-to-image diffusion model by controlling the prompt to achieve fair image generation. One key advantage of our approach is its high efficiency. It only requires updating an additional linear network with few parameters at a low computational cost. By developing a linear network that maps conditioning embeddings into a debiased space, we enable the generation of relatively balanced demographic results based on the specified text condition. With comprehensive experiments on face image generation, we show that our method significantly improves image generation fairness with almost the same image quality compared to conventional diffusion models when prompted with descriptions related to humans. By effectively addressing the issue of implicit language bias, our method produces more fair and diverse image outputs.
We propose a novel approach to the problem of mutual information (MI) estimation via introducing normalizing flows-based estimator. The estimator maps original data to the target distribution with known closed-form expression for MI. We demonstrate that our approach yields MI estimates for the original data. Experiments with high-dimensional data are provided to show the advantages of the proposed estimator.
The rapid development of deep learning techniques, improved computational power, and the availability of vast training data have led to significant advancements in pre-trained models and large language models (LLMs). Pre-trained models based on architectures such as BERT and Transformer, as well as LLMs like ChatGPT, have demonstrated remarkable language capabilities and found applications in Software engineering. Software engineering tasks can be divided into many categories, among which generative tasks are the most concern by researchers, where pre-trained models and LLMs possess powerful language representation and contextual awareness capabilities, enabling them to leverage diverse training data and adapt to generative tasks through fine-tuning, transfer learning, and prompt engineering. These advantages make them effective tools in generative tasks and have demonstrated excellent performance. In this paper, we present a comprehensive literature review of generative tasks in SE using pre-trained models and LLMs. We accurately categorize SE generative tasks based on software engineering methodologies and summarize the advanced pre-trained models and LLMs involved, as well as the datasets and evaluation metrics used. Additionally, we identify key strengths, weaknesses, and gaps in existing approaches, and propose potential research directions. This review aims to provide researchers and practitioners with an in-depth analysis and guidance on the application of pre-trained models and LLMs in generative tasks within SE.
In this paper, we introduce a periodogram-like function, called expectile periodograms, for detecting and estimating hidden periodicity from observations with asymmetrically distributed noise. The expectile periodograms are constructed from trigonometric expectile regression where a specially designed objective function is used to substitute the squared $l_2$ norm that leads to the ordinary periodograms. The expectile periodograms have properties which are analogous to quantile periodograms, which provide a broader view of the time series by examining different expectile levels, but are much faster to calculate. The asymptotic properties are discussed and simulations show its efficiency and robustness in the presence of hidden periodicities with asymmetric or heavy-tailed noise. Finally, we leverage the inherent two-dimensional characteristics of the expectile periodograms and train a deep-learning (DL) model to classify the earthquake waveform data. Remarkably, our approach achieves heightened classification testing accuracy when juxtaposed with alternative periodogram-based methodologies.
Disentangled Representation Learning (DRL) aims to learn a model capable of identifying and disentangling the underlying factors hidden in the observable data in representation form. The process of separating underlying factors of variation into variables with semantic meaning benefits in learning explainable representations of data, which imitates the meaningful understanding process of humans when observing an object or relation. As a general learning strategy, DRL has demonstrated its power in improving the model explainability, controlability, robustness, as well as generalization capacity in a wide range of scenarios such as computer vision, natural language processing, data mining etc. In this article, we comprehensively review DRL from various aspects including motivations, definitions, methodologies, evaluations, applications and model designs. We discuss works on DRL based on two well-recognized definitions, i.e., Intuitive Definition and Group Theory Definition. We further categorize the methodologies for DRL into four groups, i.e., Traditional Statistical Approaches, Variational Auto-encoder Based Approaches, Generative Adversarial Networks Based Approaches, Hierarchical Approaches and Other Approaches. We also analyze principles to design different DRL models that may benefit different tasks in practical applications. Finally, we point out challenges in DRL as well as potential research directions deserving future investigations. We believe this work may provide insights for promoting the DRL research in the community.
This paper presents a new approach for assembling graph neural networks based on framelet transforms. The latter provides a multi-scale representation for graph-structured data. With the framelet system, we can decompose the graph feature into low-pass and high-pass frequencies as extracted features for network training, which then defines a framelet-based graph convolution. The framelet decomposition naturally induces a graph pooling strategy by aggregating the graph feature into low-pass and high-pass spectra, which considers both the feature values and geometry of the graph data and conserves the total information. The graph neural networks with the proposed framelet convolution and pooling achieve state-of-the-art performance in many types of node and graph prediction tasks. Moreover, we propose shrinkage as a new activation for the framelet convolution, which thresholds the high-frequency information at different scales. Compared to ReLU, shrinkage in framelet convolution improves the graph neural network model in terms of denoising and signal compression: noises in both node and structure can be significantly reduced by accurately cutting off the high-pass coefficients from framelet decomposition, and the signal can be compressed to less than half its original size with the prediction performance well preserved.
Adversarial attack is a technique for deceiving Machine Learning (ML) models, which provides a way to evaluate the adversarial robustness. In practice, attack algorithms are artificially selected and tuned by human experts to break a ML system. However, manual selection of attackers tends to be sub-optimal, leading to a mistakenly assessment of model security. In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching the best combination of attack algorithms and their hyper-parameters from a candidate pool of \textbf{32 base attackers}. We design a search space where attack policy is represented as an attacking sequence, i.e., the output of the previous attacker is used as the initialization input for successors. Multi-objective NSGA-II genetic algorithm is adopted for finding the strongest attack policy with minimum complexity. The experimental result shows CAA beats 10 top attackers on 11 diverse defenses with less elapsed time (\textbf{6 $\times$ faster than AutoAttack}), and achieves the new state-of-the-art on $l_{\infty}$, $l_{2}$ and unrestricted adversarial attacks.
In order to overcome the expressive limitations of graph neural networks (GNNs), we propose the first method that exploits vector flows over graphs to develop globally consistent directional and asymmetric aggregation functions. We show that our directional graph networks (DGNs) generalize convolutional neural networks (CNNs) when applied on a grid. Whereas recent theoretical works focus on understanding local neighbourhoods, local structures and local isomorphism with no global information flow, our novel theoretical framework allows directional convolutional kernels in any graph. First, by defining a vector field in the graph, we develop a method of applying directional derivatives and smoothing by projecting node-specific messages into the field. Then we propose the use of the Laplacian eigenvectors as such vector field, and we show that the method generalizes CNNs on an n-dimensional grid, and is provably more discriminative than standard GNNs regarding the Weisfeiler-Lehman 1-WL test. Finally, we bring the power of CNN data augmentation to graphs by providing a means of doing reflection, rotation and distortion on the underlying directional field. We evaluate our method on different standard benchmarks and see a relative error reduction of 8\% on the CIFAR10 graph dataset and 11% to 32% on the molecular ZINC dataset. An important outcome of this work is that it enables to translate any physical or biological problems with intrinsic directional axes into a graph network formalism with an embedded directional field.
In this paper, we propose Latent Relation Language Models (LRLMs), a class of language models that parameterizes the joint distribution over the words in a document and the entities that occur therein via knowledge graph relations. This model has a number of attractive properties: it not only improves language modeling performance, but is also able to annotate the posterior probability of entity spans for a given text through relations. Experiments demonstrate empirical improvements over both a word-based baseline language model and a previous approach that incorporates knowledge graph information. Qualitative analysis further demonstrates the proposed model's ability to learn to predict appropriate relations in context.
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.