The template design problem (TDP) is a hard combinatorial problem with a large number of symmetries, which makes solving it more complicated. A number of techniques have been proposed in the literature to optimise its resolution, ranging from complete methods to stochastic ones. However, although metaheuristics are considered efficient methods that can find solutions of sufficient quality at a reasonable computational cost, these techniques have not yet proven efficient enough for this problem. This paper explores and analyses a wide range of metaheuristics with the aim of assessing their suitability for finding template designs; their implementation is guided by a number of issues such as problem formulation, solution encoding, the symmetrical nature of the problem, and distinct forms of hybridisation. For the TDP, we also propose an alternative slot-based problem formulation (distinct from other slot-based proposals), which offers an option beyond the classical variation-based formulation of the problem. An empirical analysis assessing the performance of all the metaheuristics (i.e., basic, integrative and collaborative algorithms working on different search spaces, with and without symmetry breaking) shows that some of our proposals can be considered state-of-the-art when applied to specific problem instances.
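To make the slot-based encoding concrete, here is a minimal Python sketch of one possible representation together with a basic hill-climbing metaheuristic; the helper names, the fixed press counts, and the penalty scheme are our own illustrative simplifications, not the formulation proposed in the paper.

```python
import random

# Illustrative slot-based TDP encoding (hypothetical, not the paper's exact
# formulation): each template is a list of `n_slots` variation indices, and
# template i is pressed `runs[i]` times. A candidate solution is the flat
# list of all slot assignments.

def production(slots_flat, runs, n_slots, n_variations):
    """Copies of each variation produced by pressing every template."""
    prod = [0] * n_variations
    for i, run in enumerate(runs):
        for v in slots_flat[i * n_slots:(i + 1) * n_slots]:
            prod[v] += run
    return prod

def waste(slots_flat, runs, n_slots, demand):
    """Overproduction, with a heavy penalty for any unmet demand."""
    prod = production(slots_flat, runs, n_slots, len(demand))
    return sum(p - d if p >= d else 10_000 * (d - p)
               for p, d in zip(prod, demand))

def hill_climb(demand, n_templates, n_slots, runs, iters=20_000, seed=0):
    """Basic local search over slot assignments (one of many metaheuristics)."""
    rng = random.Random(seed)
    sol = [rng.randrange(len(demand)) for _ in range(n_templates * n_slots)]
    best = waste(sol, runs, n_slots, demand)
    for _ in range(iters):
        pos = rng.randrange(len(sol))
        old, sol[pos] = sol[pos], rng.randrange(len(demand))
        cand = waste(sol, runs, n_slots, demand)
        if cand <= best:
            best = cand
        else:
            sol[pos] = old  # revert worsening move
    return sol, best

# Example: 3 variations, 1 template with 4 slots, pressed 60 times.
solution, cost = hill_climb(demand=[120, 60, 60], n_templates=1,
                            n_slots=4, runs=[60])
print(solution, cost)  # e.g. two slots of variation 0, one each of 1 and 2
```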
Language models have proven successful across a wide range of software engineering tasks, but their significant computational costs often hinder their practical adoption. To address this challenge, researchers have begun applying various compression strategies to improve the efficiency of language models for code. These strategies aim to optimize inference latency and memory usage, though often at the cost of reduced model effectiveness. However, there is still a significant gap in understanding how these strategies influence the efficiency and effectiveness of language models for code. Here, we empirically investigate the impact of three well-known compression strategies -- knowledge distillation, quantization, and pruning -- across three different classes of software engineering tasks: vulnerability detection, code summarization, and code search. Our findings reveal that the impact of these strategies varies greatly depending on the task and the specific compression method employed. Practitioners and researchers can use these insights to make informed decisions when selecting the most appropriate compression strategy, balancing both efficiency and effectiveness based on their specific needs.
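As a minimal illustration of two of the studied strategies, the following PyTorch sketch applies post-training dynamic quantization and unstructured magnitude pruning to a stand-in model; the architecture and settings are hypothetical and not taken from the paper's experimental setup.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in for a language model for code (real targets are transformers).
model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 2))

# Post-training dynamic quantization: Linear weights stored as int8,
# activations quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

# Unstructured L1 magnitude pruning: zero out 30% of the smallest weights.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)

print(quantized)
```

Knowledge distillation, the third studied strategy, additionally requires training a smaller student model against the teacher's outputs and is omitted from this sketch for brevity.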
System identification, the process of deriving mathematical models of dynamical systems from observed input-output data, has undergone a paradigm shift with the advent of learning-based methods. These methods have garnered significant attention for addressing the intricate challenges of data-driven discovery in nonlinear dynamical systems. Among them, Sparse Identification of Nonlinear Dynamics (SINDy) has emerged as a transformative approach, distilling complex dynamical behaviors into interpretable linear combinations of basis functions. However, SINDy relies on domain-specific expertise to construct its foundational "library" of basis functions, which limits its adaptability and universality. In this work, we introduce a nonlinear system identification framework called LeARN that transcends the need for prior domain knowledge by learning the library of basis functions directly from data. To enhance adaptability to evolving system dynamics under varying noise conditions, we employ a novel meta-learning-based system identification approach that uses a lightweight deep neural network (DNN) to dynamically refine these basis functions. This not only captures intricate system behaviors but also adapts seamlessly to new dynamical regimes. We validate our framework on the Neural Fly dataset, showcasing its robust adaptation and generalization capabilities. Despite its simplicity, LeARN achieves dynamical-error performance competitive with SINDy. This work is a step toward the autonomous discovery of dynamical systems, paving the way for a future where machine learning uncovers the governing principles of complex systems without requiring extensive domain-specific interventions.
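For readers unfamiliar with the baseline, the sketch below shows the standard SINDy recipe, a fixed basis library plus sequentially thresholded least squares; it is not LeARN itself, whose point is precisely to learn the `library` function below from data rather than hand-craft it.

```python
import numpy as np

# Standard SINDy-style identification (STLSQ): recover dx/dt = Theta(x) @ xi
# with a sparse coefficient vector xi over a hand-chosen basis library.

def library(x):
    """Hand-crafted basis functions -- the part LeARN learns from data."""
    return np.column_stack([np.ones_like(x), x, x**2, x**3, np.sin(x)])

def stlsq(theta, dxdt, threshold=0.1, iters=10):
    """Sequentially thresholded least squares for sparse coefficients."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi

# Toy noiseless data from dx/dt = -2x + 0.5 x^3.
x = np.linspace(-2, 2, 200)
dxdt = -2 * x + 0.5 * x**3
print(stlsq(library(x), dxdt))  # nonzero entries only at the x and x^3 terms
```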
To make foundation models more efficient and effective, our idea is to combine sequence transformation and state transformation. First, we prove the applicability of rotary position embedding in the state space duality algorithm, which reduces the perplexity of a hybrid of quadratic causal self-attention and state space duality by more than 4%, ensuring that the combined sequence transformation uses a unified position encoding. Second, we propose dynamic mask attention, which maintains 100% accuracy on the more challenging multi-query associative recall task, an improvement of more than 150% over quadratic causal self-attention and state space duality, ensuring that the combined sequence transformation selectively filters relevant information. Third, we design a cross-domain mixture of experts, which makes expert retrieval with more than 1024 experts 8 to 10 times faster than a standard mixture of experts, ensuring that the combined state transformation retrieves the mixture quickly. Finally, we summarize these matrix algorithms, which together form a foundation model: Wonderful Matrices, a potential competitor to popular model architectures.
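As background for the first contribution, here is a minimal NumPy sketch of the standard rotary position embedding (the rotate-half layout); it shows the generic rotation only and makes no attempt to reproduce the paper's hybrid attention/state-space architecture.

```python
import numpy as np

# Rotary position embedding (RoPE): each pair of feature dimensions is
# rotated by an angle proportional to the token position, so that dot
# products between rotated queries and keys depend on relative offsets.

def rope(x, base=10000.0):
    """x: (seq_len, dim) with even dim; returns position-rotated features."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)      # per-pair frequencies
    angles = np.outer(np.arange(seq_len), freqs)   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(8, 64)
print(rope(q).shape)  # (8, 64)
```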
Large Language Models (LLMs) have shown strong performance in solving mathematical problems, with code-based solutions proving particularly effective. However, best practices for leveraging coding instruction data to enhance mathematical reasoning remain underexplored. This study investigates three key questions: (1) How do different coding styles of mathematical code-based rationales impact LLMs' learning performance? (2) Can general-domain coding instructions improve performance? (3) How does integrating textual rationales with code-based ones during training enhance mathematical reasoning abilities? Our findings reveal that code-based rationales with concise comments, descriptive naming, and hardcoded solutions are beneficial, while improvements from general-domain coding instructions and textual rationales are relatively minor. Based on these insights, we propose CoinMath, a learning strategy designed to enhance mathematical reasoning by diversifying the coding styles of code-based rationales. CoinMath generates a variety of code-based rationales incorporating concise comments, descriptive naming conventions, and hardcoded solutions. Experimental results demonstrate that CoinMath significantly outperforms its baseline model, MAmmoTH, one of the state-of-the-art (SOTA) math LLMs.
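The following hypothetical rationale (our own toy problem, not drawn from CoinMath's training data) illustrates the three favoured stylistic attributes together: concise comments, descriptive naming, and hardcoded solutions.

```python
# Problem: "A store sells pencils at $0.25 each. Sarah buys 12 pencils and
# pays with a $5 bill. How much change does she receive?"

price_per_pencil = 0.25   # hardcoded from the problem statement
pencils_bought = 12
amount_paid = 5.00

total_cost = price_per_pencil * pencils_bought  # cost of all pencils
change = amount_paid - total_cost               # money returned to Sarah
print(change)  # 2.0
```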
Voronoi tessellation, also known as the Voronoi diagram, is an important computational geometry technique with applications in various scientific disciplines. It involves dividing a given space into regions based on proximity to a set of points. Automatic differentiation (autodifferentiation) is a powerful tool for solving optimization tasks: it constructs a computational graph through which gradients can be computed using the backpropagation algorithm. Often, however, the Voronoi tessellation remains the only non-differentiable part of a pipeline, prohibiting end-to-end differentiation. We present a method for the autodifferentiation of the 2D Voronoi tessellation that allows one to construct the tessellation and pass gradients through it, making the construction end-to-end differentiable. We provide the implementation details and present several important applications. To the best of our knowledge, this is the first autodifferentiable realization of the Voronoi tessellation that provides the full set of Voronoi geometric parameters in a differentiable way.
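For intuition on why differentiability matters here, the sketch below shows a common generic relaxation, soft Voronoi membership via a softmax over negative squared distances, through which gradients reach the site coordinates. Note that this relaxation is our illustration only; the paper's method differentiates the true tessellation and its geometric parameters rather than a softened proxy.

```python
import torch

def soft_voronoi(query_points, sites, temperature=0.1):
    """Soft region membership: rows sum to 1; hard Voronoi as T -> 0."""
    d2 = torch.cdist(query_points, sites) ** 2       # (n, k) squared distances
    return torch.softmax(-d2 / temperature, dim=-1)  # (n, k) soft assignment

sites = torch.randn(5, 2, requires_grad=True)   # generator points
queries = torch.rand(100, 2)                    # points being classified
membership = soft_voronoi(queries, sites)

# Gradients flow back to the site coordinates, enabling end-to-end training.
loss = membership.var()
loss.backward()
print(sites.grad.shape)  # (5, 2)
```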
We introduce the Coarse Payoff-Assessment Learning (CPAL) model, which captures reinforcement learning by boundedly rational decision-makers who focus on the aggregate outcomes of choosing among exogenously defined clusters of alternatives (similarity classes), rather than evaluating each alternative individually. Analyzing a smooth approximation of the model, we show that the learning dynamics exhibit steady-states corresponding to smooth Valuation Equilibria (Jehiel and Samet, 2007). We demonstrate the existence of multiple equilibria in decision trees with generic payoffs and establish the local asymptotic stability of pure equilibria when they occur. Conversely, when trivial choices featuring alternatives within the same similarity class yield sufficiently high payoffs, a unique mixed equilibrium emerges, characterized by indifferences between similarity classes, even under acute sensitivity to payoff differences. Finally, we prove that this unique mixed equilibrium is globally asymptotically stable under the CPAL dynamics.
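A schematic reading of the model follows; the notation and the logit choice rule here are our assumptions for illustration, not quoted from the paper. Each similarity class \(c\) carries a single payoff assessment \(v_t(c)\), reinforced by the realized payoff \(\pi_t\) of the chosen class,
\[
v_{t+1}(c) \;=\; v_t(c) \;+\; \alpha_t\,\mathbf{1}\{c_t = c\}\,\bigl(\pi_t - v_t(c)\bigr),
\]
and the smooth approximation chooses among classes with sensitivity \(\lambda\) to payoff differences,
\[
\Pr(c_t = c) \;=\; \frac{e^{\lambda v_t(c)}}{\sum_{c'} e^{\lambda v_t(c')}},
\]
so that steady states of these dynamics correspond to the smooth Valuation Equilibria discussed above.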
Compressing integer keys is a fundamental operation across multiple communities, such as database management (DB), information retrieval (IR), and high-performance computing (HPC). Recent advances in \emph{learned indexes} have inspired the development of \emph{learned compressors}, which leverage simple yet compact machine learning (ML) models to compress large-scale sorted keys. The core idea behind learned compressors is to \emph{losslessly} encode sorted keys by approximating them with \emph{error-bounded} ML models (e.g., piecewise linear functions) and using a \emph{residual array} to guarantee accurate key reconstruction. While the concept of learned compressors remains in its early stages of exploration, our benchmark results demonstrate that a SIMD-optimized learned compressor can significantly outperform state-of-the-art CPU-based compressors. Drawing on our preliminary experiments, this vision paper explores the potential of learned data compression to enhance critical areas in DBMS and related domains. Furthermore, we outline the key technical challenges that existing systems must address when integrating this emerging methodology.
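To ground the idea, here is a minimal single-model sketch of a learned compressor in NumPy. It is illustrative only: practical designs use error-bounded piecewise models and bit-packed residuals rather than one global least-squares line.

```python
import numpy as np

# Approximate sorted keys by a position -> key model and store only the
# residuals, which are small and need fewer bits than the raw keys.

keys = np.sort(np.random.randint(0, 1_000_000, size=10_000)).astype(np.int64)
positions = np.arange(len(keys))

# "Learn" the model: least-squares line key ~ slope * position + intercept.
slope, intercept = np.polyfit(positions, keys, deg=1)
predicted = np.rint(slope * positions + intercept).astype(np.int64)

# The residual array guarantees lossless reconstruction.
residuals = keys - predicted
reconstructed = predicted + residuals
assert np.array_equal(reconstructed, keys)

# Fewer bits per residual than per raw key => compression.
print(int(np.abs(residuals).max()).bit_length(), "bits per residual (+ sign)")
```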
We present Bluebell, a program logic for reasoning about probabilistic programs in which unary and relational styles of reasoning come together to create new reasoning tools. Unary-style reasoning is very expressive and is powered by foundational mechanisms for reasoning about probabilistic behaviour, such as independence and conditioning. The relational style of reasoning, on the other hand, naturally shines when the properties of interest compare the behaviour of similar programs (e.g., when proving differential privacy), avoiding the need to characterize the output distributions of the individual programs. So far, the two styles of reasoning have largely remained separate in the many program logics designed for the deductive verification of probabilistic programs. In Bluebell, we unify these styles of reasoning through the introduction of a new modality called "joint conditioning" that can encode and illuminate the rich interaction between conditional independence and relational liftings: the two powerhouses of the respective styles of reasoning.
With the advances of data-driven machine learning research, a wide variety of prediction problems have been tackled. It has become critical to explore how machine learning, and specifically deep learning, methods can be exploited to analyse healthcare data. A major limitation of existing methods has been the focus on grid-like data; however, physiological recordings often have an irregular and unordered structure, which makes it difficult to conceptualise them as a matrix. As such, graph neural networks have attracted significant attention by exploiting implicit information that resides in a biological system, with interacting nodes connected by edges whose weights can represent either temporal associations or anatomical junctions. In this survey, we thoroughly review the different types of graph architectures and their applications in healthcare. We provide an overview of these methods in a systematic manner, organised by their domain of application, including functional connectivity, anatomical structure, and electrical-based analysis. We also outline the limitations of existing techniques and discuss potential directions for future research.
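As a generic illustration of the building block the surveyed models share, the sketch below implements one message-passing layer over a weighted graph; the electrode interpretation and all dimensions are hypothetical and not tied to any specific surveyed method.

```python
import torch

# One graph-convolution step: nodes might be EEG electrodes, edge weights
# temporal associations or anatomical adjacency; each layer mixes the
# features of neighbouring nodes.

def gnn_layer(node_feats, adj, weight):
    """H' = ReLU(A_norm @ H @ W) with row-normalized adjacency."""
    deg = adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    return torch.relu((adj / deg) @ node_feats @ weight)

n_nodes, in_dim, out_dim = 19, 32, 16          # e.g. 19 EEG electrodes
adj = torch.rand(n_nodes, n_nodes)             # weighted connectivity
feats = torch.randn(n_nodes, in_dim)           # per-electrode features
W = torch.randn(in_dim, out_dim) * 0.1
print(gnn_layer(feats, adj, W).shape)          # (19, 16)
```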
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion using a strongly convex regularizer. This allows us to relax both the optimal value and the solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework: a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
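The core smoothing step can be made concrete: with the negative-entropy regularizer, the smoothed max is the log-sum-exp and its gradient is a softmax, as in the NumPy sketch below (the function names are ours; the paper develops the general strongly convex case, of which this is one instance).

```python
import numpy as np

def smoothed_max(x, gamma=1.0):
    """max_gamma(x) = gamma * log(sum(exp(x / gamma))); -> max as gamma -> 0."""
    m = x.max()  # shift for numerical stability
    return gamma * np.log(np.exp((x - m) / gamma).sum()) + m

def smoothed_argmax(x, gamma=1.0):
    """Gradient of smoothed_max: a softmax, i.e. a relaxed one-hot argmax."""
    e = np.exp((x - x.max()) / gamma)
    return e / e.sum()

# Substituting smoothed_max for max inside a DP recursion (e.g. Viterbi or
# DTW) makes the whole recursion differentiable with respect to its inputs.
x = np.array([1.0, 3.0, 2.5])
print(smoothed_max(x, gamma=0.1))     # close to 3.0
print(smoothed_argmax(x, gamma=0.1))  # mass concentrated on index 1
```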