In a seminal paper, Weitz showed that for two-state spin systems, such as the Ising and hardcore models from statistical physics, correlation decay on trees implies correlation decay on arbitrary graphs. The key gadget in Weitz's reduction has been instrumental in recent advances in approximate counting and sampling, from analysis of local Markov chains like Glauber dynamics to the design of deterministic algorithms for estimating the partition function. A longstanding open problem in the field has been to find such a reduction for more general multispin systems like the uniform distribution over proper colorings of a graph. In this paper, we show that for a rich class of multispin systems, including the ferromagnetic Potts model, there are fundamental obstacles to extending Weitz's reduction to the multispin setting. A central component of our investigation is establishing nonconvexity of the image of the belief propagation functional, the standard tool for analyzing spin systems on trees. On the other hand, we provide evidence of convexity for the antiferromagnetic Potts model.
We present a novel method for matrix completion, specifically designed for matrices where one dimension significantly exceeds the other. Our Columns Selected Matrix Completion (CSMC) method combines Column Subset Selection and Low-Rank Matrix Completion to efficiently reconstruct incomplete datasets. In each step, CSMC solves a convex optimization task. We introduce two algorithms that implement CSMC, each tailored to different problem sizes. A formal analysis outlines the necessary assumptions and the probability of a correct solution. To assess the impact of matrix size, rank, and the proportion of missing entries on solution quality and computation time, we conducted experiments on synthetic data. The method was applied to two real-world problems: recommendation systems and image inpainting. Our results show that CSMC delivers solutions comparable to state-of-the-art matrix completion algorithms based on convex optimization, but with significant runtime savings. This makes CSMC especially valuable for systems that require efficient processing of large, incomplete datasets while maintaining the integrity of the derived insights.
This study investigates the performance of three popular tokenization tools: MeCab, Sudachi, and SentencePiece, when applied as a preprocessing step for sentiment-based text classification of Japanese texts. Using Term Frequency-Inverse Document Frequency (TF-IDF) vectorization, we evaluate two traditional machine learning classifiers: Multinomial Naive Bayes and Logistic Regression. The results reveal that Sudachi produces tokens closely aligned with dictionary definitions, while MeCab and SentencePiece demonstrate faster processing speeds. The combination of SentencePiece, TF-IDF, and Logistic Regression outperforms the other alternatives in terms of classification performance.
In this paper we derive a new direct inversion method to simulate squared Bessel processes. Since the transition probability of these processes can be represented by a non-central chi-square distribution, we construct an efficient and accurate algorithm to simulate non-central chi-square variables. In this method, the dimension of the squared Bessel process, equivalently the degrees of freedom of the chi-square distribution, is treated as a variable. We therefore use a two-dimensional Chebyshev expansion to approximate the inverse function of the central chi-square distribution with one variable being the degrees of freedom. The method is accurate and efficient for any value of degrees of freedom including the computationally challenging case of small values. One advantage of the method is that noncentral chi-square samples can be generated for a whole range of values of degrees of freedom using the same Chebyshev coefficients. The squared Bessel process is a building block for the well-known Cox-Ingersoll-Ross (CIR) processes, which can be generated from squared Bessel processes through time change and linear transformation. Our direct inversion method thus allows the efficient and accurate simulation of these processes, which are used as models in a wide variety of applications.
By generating new yet effective data, data augmentation has become a promising method to mitigate the data sparsity problem in sequential recommendation. Existing works focus on augmenting the original data but rarely explore the issue of imbalanced relevance and diversity for augmented data, leading to semantic drift problems or limited performance improvements. In this paper, we propose a novel Balanced data Augmentation Plugin for Sequential Recommendation (BASRec) to generate data that balance relevance and diversity. BASRec consists of two modules: Single-sequence Augmentation and Cross-sequence Augmentation. The former leverages the randomness of the heuristic operators to generate diverse sequences for a single user, after which the diverse and the original sequences are fused at the representation level to obtain relevance. Further, we devise a reweighting strategy to enable the model to learn the preferences based on the two properties adaptively. The Cross-sequence Augmentation performs nonlinear mixing between different sequence representations from two directions. It produces virtual sequence representations that are diverse enough but retain the vital semantics of the original sequences. These two modules enhance the model to discover fine-grained preferences knowledge from single-user and cross-user perspectives. Extensive experiments verify the effectiveness of BASRec. The average improvement is up to 72.0% on GRU4Rec, 33.8% on SASRec, and 68.5% on FMLP-Rec. We demonstrate that BASRec generates data with a better balance between relevance and diversity than existing methods. The source code is available at //github.com/KingGugu/BASRec.
Hardware failures are a growing challenge for machine learning accelerators, many of which are based on systolic arrays. When a permanent hardware failure occurs in a systolic array, existing solutions include localizing and isolating the faulty processing element (PE), using a redundant PE for re-execution, or in some extreme cases decommissioning the entire accelerator for further investigation. In this paper, we propose novel algorithmic approaches that mitigate permanent hardware faults in neural network (NN) accelerators by uniquely integrating the behavior of the faulty component instead of bypassing it. In doing so, we aim for a more sustainable use of the accelerator where faulty hardware is neither bypassed nor discarded, instead being given a second life. We first introduce a CUDA-accelerated systolic array simulator in PyTorch, which enabled us to quantify the impact of permanent faults appearing on links connecting two PEs or in weight registers, where one bit is stuck at 0 or 1 in the float32, float16, or bfloat16 representation. We then propose several algorithmic mitigation techniques for a subset of stuck-at faults, such as Invertible Scaling or Shifting of activations and weights, or fine tuning with the faulty behavior. Notably, the proposed techniques do not require any hardware modification, instead relying on existing components of widely used systolic array based accelerators, such as normalization, activation, and storage units. Extensive experimental evaluations using fully connected and convolutional NNs trained on MNIST, CIFAR-10 and ImageNet show that the proposed fault-tolerant approach matches or gets very close to the original fault-free accuracy.
The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.
This paper surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of \textsc{(instruction, output)} pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains and applications, along with an analysis on aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset, etc). We also review the potential pitfalls of IT along with criticism against it, along with efforts pointing out current deficiencies of existing strategies and suggest some avenues for fruitful research.
Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems, from overcoming a limited amount of training data over regularizing the objective to limiting the amount data used to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy for existing works (C2), this survey is concerned with data augmentation methods for textual classification and aims to achieve a concise and comprehensive overview for researchers and practitioners (C3). Derived from the taxonomy, we divided more than 100 methods into 12 different groupings and provide state-of-the-art references expounding which methods are highly promising (C4). Finally, research perspectives that may constitute a building block for future work are given (C5).
We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead while increasing the model sensitivity and prediction accuracy. The proposed Attention U-Net architecture is evaluated on two large CT abdominal datasets for multi-class image segmentation. Experimental results show that AGs consistently improve the prediction performance of U-Net across different datasets and training sizes while preserving computational efficiency. The code for the proposed architecture is publicly available.
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.