We consider the performance of a least-squares regression model, as judged by out-of-sample $R^2$. Shapley values give a fair attribution of the performance of a model to its input features, taking into account interdependencies between features. Evaluating the Shapley values exactly requires solving a number of regression problems that is exponential in the number of features, so a Monte Carlo-type approximation is typically used. We focus on the special case of least-squares regression models, where several tricks can be used to compute and evaluate regression models efficiently. These tricks give a substantial speed-up, allowing many more Monte Carlo samples to be evaluated and thus better accuracy to be achieved. We refer to our method as least-squares Shapley performance attribution (LS-SPA) and describe our open-source implementation.
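The Monte Carlo approximation mentioned above can be made concrete with a short sketch (an illustration of generic permutation sampling, not the LS-SPA implementation; all names are ours): each feature's attribution is its average marginal contribution to out-of-sample $R^2$ over random feature orderings.

```python
import numpy as np

def r2_of_subset(X_tr, y_tr, X_te, y_te, S):
    """Out-of-sample R^2 of a least-squares fit on feature subset S."""
    if len(S) == 0:
        pred = np.full_like(y_te, y_tr.mean())  # baseline: predict the train mean
    else:
        beta, *_ = np.linalg.lstsq(X_tr[:, S], y_tr, rcond=None)
        pred = X_te[:, S] @ beta
    ss_res = np.sum((y_te - pred) ** 2)
    ss_tot = np.sum((y_te - y_te.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mc_shapley_r2(X_tr, y_tr, X_te, y_te, n_perms=200, seed=0):
    """Monte Carlo Shapley attribution of out-of-sample R^2."""
    rng = np.random.default_rng(seed)
    p = X_tr.shape[1]
    attr = np.zeros(p)
    base = r2_of_subset(X_tr, y_tr, X_te, y_te, [])
    for _ in range(n_perms):
        prev, S = base, []
        for j in rng.permutation(p):
            S.append(j)
            cur = r2_of_subset(X_tr, y_tr, X_te, y_te, S)
            attr[j] += cur - prev  # marginal contribution of feature j
            prev = cur
    # Attributions sum to full-model R^2 minus the baseline R^2.
    return attr / n_perms
```

Each permutation requires fitting $p$ regressions, which is exactly the cost the least-squares tricks described in the abstract are designed to reduce.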
Diffusion Probabilistic Models stand as a critical tool in generative modelling, enabling the generation of complex data distributions. This family of generative models yields record-breaking performance in tasks such as image synthesis, video generation, and molecule design. Despite their capabilities, their efficiency, especially in the reverse process, remains a challenge due to slow convergence rates and high computational costs. In this paper, we introduce an approach that leverages continuous dynamical systems to design a novel denoising network for diffusion models that is more parameter-efficient, exhibits faster convergence, and demonstrates increased noise robustness. Experimenting with Denoising Diffusion Probabilistic Models (DDPMs), our framework operates with approximately a quarter of the parameters and $\sim$30\% of the Floating Point Operations (FLOPs) of the standard U-Nets in DDPMs. Furthermore, our model is notably faster in inference than the baseline when measured under fair and equal conditions. We also provide a mathematical intuition for why our proposed reverse process is faster, as well as a mathematical discussion of the empirical trade-offs in the downstream denoising task. Finally, we argue that our method is compatible with existing performance enhancement techniques, enabling further improvements in efficiency, quality, and speed.
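For context, the reverse process the paper accelerates is the standard DDPM ancestral recursion; a minimal sketch follows (the paper's dynamical-systems-based denoiser would plug in as `eps_model`, whose signature here is our assumption).

```python
import torch

@torch.no_grad()
def ddpm_reverse_step(eps_model, x_t, t, betas):
    """One standard DDPM ancestral sampling step, x_t -> x_{t-1}.

    `eps_model` is any noise-prediction network; the abstract's
    contribution is replacing the usual U-Net here with a smaller
    network, while this surrounding recursion stays the same.
    """
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    a_t, ab_t = alphas[t], alpha_bar[t]
    eps = eps_model(x_t, t)
    # Posterior mean of x_{t-1} given x_t and the predicted noise.
    mean = (x_t - (1 - a_t) / torch.sqrt(1 - ab_t) * eps) / torch.sqrt(a_t)
    if t > 0:
        return mean + torch.sqrt(betas[t]) * torch.randn_like(x_t)
    return mean
```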
Shortcut reasoning is an irrational process of inference, which degrades the robustness of an NLP model. While a number of previous works have tackled the identification of shortcut reasoning, two major limitations remain: (i) no method is provided for quantifying the severity of the discovered shortcut reasoning; (ii) certain types of shortcut reasoning may be missed. To address these issues, we propose a novel method for identifying shortcut reasoning. The proposed method quantifies the severity of the shortcut reasoning by leveraging out-of-distribution data and does not make any assumptions about the type of tokens triggering the shortcut reasoning. Our experiments on Natural Language Inference and Sentiment Analysis demonstrate that our framework successfully discovers both shortcut reasoning known from previous work and previously unknown shortcut reasoning.
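The abstract does not spell out the severity measure, but one plausible way to operationalize "quantifying severity with out-of-distribution data" is sketched below (all names and the exact score are our assumptions, not the paper's definition): on OOD data, where a shortcut should not help, measure how strongly a candidate pattern shifts the model toward a target class.

```python
import numpy as np

def shortcut_severity(predict, ood_texts, has_pattern, target_class):
    """Heuristic severity score for a candidate shortcut (our illustration).

    predict:      callable mapping a text to a predicted class label
    ood_texts:    out-of-distribution inputs
    has_pattern:  callable testing whether the candidate pattern occurs
    target_class: the class the shortcut is suspected of triggering
    """
    preds = np.array([predict(t) for t in ood_texts])
    mask = np.array([has_pattern(t) for t in ood_texts])
    if mask.sum() == 0 or (~mask).sum() == 0:
        return 0.0  # pattern absent (or universal): score undefined
    rate_with = np.mean(preds[mask] == target_class)
    rate_without = np.mean(preds[~mask] == target_class)
    # Large positive values: the pattern alone drives the prediction.
    return rate_with - rate_without
```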
Topic modeling is a widely used technique for revealing underlying thematic structures within textual data. However, existing models have certain limitations, particularly when dealing with short text datasets that lack co-occurring words. Moreover, these models often neglect sentence-level semantics, focusing primarily on token-level semantics. In this paper, we propose PromptTopic, a novel topic modeling approach that harnesses the advanced language understanding of large language models (LLMs) to address these challenges. It involves extracting topics at the sentence level from individual documents, then aggregating and condensing these topics into a predefined number, ultimately providing coherent topics for texts of varying lengths. This approach eliminates the need for manual parameter tuning and improves the quality of extracted topics. We benchmark PromptTopic against state-of-the-art baselines on three widely differing datasets, establishing its proficiency in discovering meaningful topics. Furthermore, qualitative analysis showcases PromptTopic's ability to uncover relevant topics in multiple datasets.
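A minimal sketch of the two-stage pipeline described above (the prompt wording, helper names, and aggregation heuristic are our own assumptions; `llm` is any prompt-to-completion callable):

```python
from collections import Counter

def prompt_topics(llm, documents, k):
    """Two-stage LLM topic extraction: per-document topics, then merging."""
    # Stage 1: extract candidate topics from each document.
    candidates = []
    for doc in documents:
        reply = llm(f"List the topics of the following text, one per line:\n{doc}")
        candidates.extend(s.strip() for s in reply.splitlines() if s.strip())

    # Stage 2: aggregate and condense candidates into k final topics.
    counts = Counter(candidates)
    frequent = "\n".join(t for t, _ in counts.most_common(100))
    reply = llm(f"Merge these candidate topics into {k} coherent topics, "
                f"one per line:\n{frequent}")
    return [s.strip() for s in reply.splitlines() if s.strip()][:k]
```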
Sequential recommendation models are crucial for next-item recommendation on online platforms, capturing complex patterns in user interactions. However, many focus on a single behavior, overlooking valuable implicit interactions such as clicks and favorites. Existing multi-behavioral models, in turn, often fail to capture sequential patterns at the same time. We propose CASM, a Context-Aware Sequential Model, which leverages sequential models to seamlessly handle multiple behaviors. CASM employs context-aware multi-head self-attention over heterogeneous historical interactions and a weighted binary cross-entropy loss for precise control over each behavior's contribution. Experimental results on four datasets demonstrate CASM's superiority over state-of-the-art approaches.
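The weighted binary cross-entropy can be illustrated as follows (a sketch under assumed tensor layouts, not the authors' code): each interaction's loss is scaled by a per-behavior weight, giving explicit control over how much each behavior type (e.g., click vs. favorite vs. purchase) contributes to training.

```python
import torch
import torch.nn.functional as F

def weighted_behavior_bce(logits, labels, behavior_ids, behavior_weights):
    """Per-behavior weighted BCE.

    logits, labels:   (batch,) float tensors
    behavior_ids:     (batch,) long tensor indexing into behavior_weights
    behavior_weights: (num_behaviors,) float tensor of loss weights
    """
    per_example = F.binary_cross_entropy_with_logits(
        logits, labels, reduction="none")
    w = behavior_weights[behavior_ids]  # weight of each example's behavior
    return (w * per_example).sum() / w.sum()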
The Krylov subspace, which is generated by multiplying a given vector by the matrix of a linear transformation and its successive powers, has been extensively studied in the classical optimization literature to design algorithms that converge quickly for large linear inverse problems. For example, the conjugate gradient method (CG), one of the most popular Krylov subspace methods, is based on the idea of minimizing the residual error within the Krylov subspace. However, with the recent advancement of high-performance diffusion solvers for inverse problems, it is not clear how this classical wisdom can be synergistically combined with modern diffusion models. In this study, we propose a novel and efficient diffusion sampling strategy that synergistically combines diffusion sampling and Krylov subspace methods. Specifically, we prove that if the tangent space at a sample denoised by Tweedie's formula forms a Krylov subspace, then CG initialized with the denoised data ensures that the data-consistency update remains in the tangent space. This negates the need to compute the manifold-constrained gradient (MCG), leading to a more efficient diffusion sampling method. Our method is applicable regardless of the parametrization and setting (i.e., VE, VP). Notably, we achieve state-of-the-art reconstruction quality on challenging real-world medical inverse imaging problems, including multi-coil MRI reconstruction and 3D CT reconstruction. Moreover, our proposed method achieves an inference time more than 80 times faster than the previous state-of-the-art method.
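The data-consistency step can be sketched generically (this is plain CG on the normal equations, initialized at the Tweedie-denoised sample; `A` is an assumed measurement matrix, and the authors' actual integration with diffusion sampling is more involved):

```python
import numpy as np

def cg_data_consistency(A, y, x0, n_iter=5):
    """A few CG iterations on A^T A x = A^T y, warm-started at x0.

    x0 is the denoised sample (Tweedie estimate); per the abstract's
    argument, the CG iterates then stay in the relevant tangent space,
    so no manifold-constrained gradient is needed.
    """
    x = x0.copy()
    r = A.T @ (y - A @ x)  # residual of the normal equations
    p = r.copy()
    rs = r @ r
    for _ in range(n_iter):
        Ap = A.T @ (A @ p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```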
Dimensional analysis (DA) pays attention to fundamental physical dimensions, such as length and mass, when modelling scientific and engineering systems. It goes back at least a century to Buckingham's Pi theorem, which characterizes a scientifically meaningful model in terms of a limited number of dimensionless variables. The methodology has been exploited by statisticians only relatively recently, however, for the design and analysis of experiments, and of computer experiments in particular. The basic idea is to build models in terms of new dimensionless quantities derived from the original input and output variables. A scientifically valid formulation has the potential for improved prediction accuracy in principle, but the implementation of DA is far from straightforward: there can be a combinatorial number of possible models satisfying the conditions of the theory. We describe empirical approaches for finding effective derived variables and demonstrate improvements in prediction accuracy. As DA's dimensionless quantities for a statistical model typically compare the original variables rather than use their absolute magnitudes, DA is less dependent on the choice of experimental ranges in the training data. Hence, we are also able to illustrate sustained accuracy gains even when extrapolating substantially outside the training data.
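Buckingham's Pi theorem is constructive: the dimensionless variables correspond to a basis of the null space of the dimension matrix. A toy worked example for drag on a body (our choice of variables, purely for illustration):

```python
from sympy import Matrix

# Dimension matrix: rows are mass, length, time; columns are the
# exponents of force F, velocity v, length l, density rho, viscosity mu.
D = Matrix([[ 1,  0, 0,  1,  1],   # mass
            [ 1,  1, 1, -3, -1],   # length
            [-2, -1, 0,  0, -1]])  # time

# Each null-space vector gives the exponents of one dimensionless group:
# with 5 variables and rank 3 there are 5 - 3 = 2 groups, recoverable as
# a drag coefficient F/(rho v^2 l^2) and a Reynolds number rho v l / mu.
for v in D.nullspace():
    print(v.T)
```

The combinatorial difficulty the abstract mentions shows up here: any invertible recombination of null-space vectors is an equally valid set of dimensionless variables, so the theory alone does not single out the statistically most useful ones.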
The prevalence of powerful multilingual models, such as Whisper, has significantly advanced research on speech recognition. However, these models often struggle to handle code-switching, which is essential in multilingual speech recognition. Recent studies have attempted to address this setting by separating the modules for different languages to ensure distinct latent representations per language. Other methods have considered a switching mechanism based on language identification. In this study, a new attention-guided adaptation is proposed to conduct parameter-efficient learning for bilingual ASR. This method selects the attention heads in a model that most closely express language identity and then guides those heads to attend correctly to their corresponding languages. Experiments on a Mandarin-English code-switching speech corpus show that the proposed approach achieves a 14.2% mixed error rate, surpassing the state-of-the-art method, while training only 5.6% additional parameters over Whisper.
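A hedged sketch of what head selection and guidance might look like (tensor layouts, the selection score, and the guidance loss are all our assumptions; the paper's exact formulation may differ):

```python
import torch

def select_language_heads(head_scores, lang_masks, top_k=4):
    """Pick heads whose attention mass concentrates on one language.

    head_scores: (heads, tgt_len, src_len) averaged attention weights
    lang_masks:  (num_langs, src_len) boolean per-frame language masks
    """
    # Attention mass each head puts on each language.
    mass = torch.einsum("hts,ls->hl", head_scores, lang_masks.float())
    mass = mass / mass.sum(dim=1, keepdim=True)
    # Heads that most sharply prefer a single language.
    sharpness, _ = mass.max(dim=1)
    return torch.topk(sharpness, top_k).indices

def attention_guidance_loss(head_scores, lang_masks, frame_langs, heads):
    """Encourage selected heads to attend to the current language's frames.

    frame_langs: (tgt_len,) language id of each decoding step.
    """
    tgt_mask = lang_masks[frame_langs].float()         # (tgt_len, src_len)
    on_lang = (head_scores[heads] * tgt_mask).sum(-1)  # (top_k, tgt_len)
    return -(on_lang.clamp_min(1e-8).log()).mean()
```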
L1-norm regularized logistic regression models are widely used for analyzing data with binary responses. In such analyses, fusing regression coefficients is useful for detecting groups of variables. This paper proposes a binomial logistic regression model with a Bayesian fused lasso. Assuming a Laplace prior on the regression coefficients and on the differences between adjacent regression coefficients enables us to perform variable selection and variable fusion simultaneously in the Bayesian framework. We also propose assuming a horseshoe prior on the differences to improve the flexibility of variable fusion. A Gibbs sampler is derived to estimate the parameters, using a hierarchical expression of the priors and a data-augmentation method. Through simulation studies and real data analysis, we compare the proposed methods with existing methods.
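For concreteness, one standard way to write the Laplace variant of such a prior (our notation; $\lambda_1, \lambda_2$ are hyperparameters and $\sigma$ is the logistic function; the horseshoe variant replaces the Laplace on the differences):

$$
\begin{aligned}
y_i \mid x_i, \beta &\sim \mathrm{Bernoulli}\!\bigl(\sigma(x_i^\top \beta)\bigr), \\
\beta_j &\sim \mathrm{Laplace}(0, \lambda_1^{-1}), \qquad j = 1,\dots,p, \\
\beta_j - \beta_{j-1} &\sim \mathrm{Laplace}(0, \lambda_2^{-1}), \qquad j = 2,\dots,p.
\end{aligned}
$$

Both Laplace terms admit scale-mixture-of-normals representations, which, combined with a data-augmentation scheme for the logistic likelihood (e.g., Pólya-Gamma), is what makes a Gibbs sampler tractable.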
Representing a polygon using a set of simple shapes has numerous applications in different use-case scenarios. We consider the problem of covering the interior of a rectilinear polygon with holes by a set of area-weighted, axis-aligned rectangles such that the total weight of the rectangles in the cover is minimized. Even the unit-weight case is known to be NP-hard, and to the best of our knowledge the general problem has not been studied experimentally before. We show a new basic property of optimal solutions to the weighted problem. This allows us to speed up existing algorithms for the unit-weight case, obtain an improved ILP formulation for both the weighted and unweighted problems, and develop several approximation algorithms and heuristics for the weighted case. All our algorithms are evaluated in a large experimental study on 186 837 polygons combined with six cost functions, which provides evidence that our algorithms are both fast and yield close-to-optimal solutions in practice.
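For reference, the textbook ILP formulation of weighted rectangle cover looks as follows (a generic set-cover-style sketch using PuLP, not the improved formulation from the paper; the input encoding is our assumption): binary variable $x_r$ selects rectangle $r$, and every interior cell must be covered at least once.

```python
import pulp

def rect_cover_ilp(cells, rects, weight):
    """Weighted rectangle cover as a set-cover ILP.

    cells:  iterable of unit cells of the polygon's interior
    rects:  dict mapping rectangle id -> set of cells it covers
    weight: dict mapping rectangle id -> cost of that rectangle
    """
    prob = pulp.LpProblem("weighted_rect_cover", pulp.LpMinimize)
    x = {r: pulp.LpVariable(f"x_{r}", cat="Binary") for r in rects}
    prob += pulp.lpSum(weight[r] * x[r] for r in rects)  # total cover weight
    for c in cells:
        # Every cell of the polygon must be covered by some rectangle.
        prob += pulp.lpSum(x[r] for r in rects if c in rects[r]) >= 1
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [r for r in rects if x[r].value() > 0.5]
```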
Multi-relation Question Answering is a challenging task, as it requires elaborate analysis of the question and reasoning over multiple fact triples in a knowledge base. In this paper, we present a novel model called the Interpretable Reasoning Network, which employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the currently parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual intervention in predicting the final answer.
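The hop structure described above can be reduced to a skeleton (module choices, dimensions, and the question-update rule are our assumptions, not the authors' architecture); the attention weights and relation logits returned at each hop are the traceable intermediate predictions.

```python
import torch
import torch.nn as nn

class ReasoningHop(nn.Module):
    """One hop: attend to part of the question, predict a relation,
    then update the question representation and reasoning state."""
    def __init__(self, dim, num_relations):
        super().__init__()
        self.attn = nn.Linear(2 * dim, 1)        # scores question tokens
        self.rel_classifier = nn.Linear(dim, num_relations)
        self.rel_embed = nn.Embedding(num_relations, dim)
        self.state_update = nn.GRUCell(dim, dim)

    def forward(self, q_tokens, state):
        # q_tokens: (seq, dim); state: (1, dim)
        scores = self.attn(torch.cat(
            [q_tokens, state.expand(q_tokens.size(0), -1)], dim=-1))
        attn = torch.softmax(scores.squeeze(-1), dim=0)  # which words this hop reads
        focus = (attn.unsqueeze(-1) * q_tokens).sum(0, keepdim=True)
        rel_logits = self.rel_classifier(focus)          # predicted relation
        rel = self.rel_embed(rel_logits.argmax(-1))
        # "Consume" the analyzed part of the question (one plausible update).
        q_tokens = q_tokens - attn.unsqueeze(-1) * rel
        state = self.state_update(rel, state)
        return q_tokens, state, attn, rel_logits
```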