High-level synthesis (HLS) refers to the automatic translation of a software program written in a high-level language into a hardware design. Modern HLS tools have moved away from the traditional approach of static (compile time) scheduling of operations to generating dynamic circuits that schedule operations at run time. Such circuits trade-off area utilisation for increased dynamism and throughput. However, existing lowering flows in dynamically scheduled HLS tools rely on conservative assumptions on their input program due to both the intermediate representations (IR) utilised as well as the lack of formal specifications on the translation into hardware. These assumptions cause suboptimal hardware performance. In this work, we lift these assumptions by proposing a new and efficient abstraction for hardware mapping; namely h-GSA, an extension of the Gated Single Static Assignment (GSA) IR. Using this abstraction, we propose a lowering flow that transforms GSA into h-GSA and maps h-GSA into dynamically scheduled hardware circuits. We compare the schedules generated by our approach to those by the state-of-the-art dynamic-scheduling HLS tool, Dynamatic, and illustrate the potential performance improvement from hardware mapping using the proposed abstraction.
We propose task-adaptive tokenization as a way to adapt the generation pipeline to the specifics of a downstream task and enhance long-form generation in mental health. Inspired by insights from cognitive science, our task-adaptive tokenizer samples variable segmentations from multiple outcomes, with sampling probabilities optimized based on task-specific data. We introduce a strategy for building a specialized vocabulary and introduce a vocabulary merging protocol that allows for the integration of task-specific tokens into the pre-trained model's tokenization step. Through extensive experiments on psychological question-answering tasks in both Chinese and English, we find that our task-adaptive tokenization approach brings a significant improvement in generation performance while using up to 60% fewer tokens. Preliminary experiments point to promising results when using our tokenization approach with very large language models.
Large language models based on self-attention mechanisms have achieved astonishing performances not only in natural language itself, but also in a variety of tasks of different nature. However, regarding processing language, our human brain may not operate using the same principle. Then, a debate is established on the connection between brain computation and artificial self-supervision adopted in large language models. One of most influential hypothesis in brain computation is the predictive coding framework, which proposes to minimize the prediction error by local learning. However, the role of predictive coding and the associated credit assignment in language processing remains unknown. Here, we propose a mean-field learning model within the predictive coding framework, assuming that the synaptic weight of each connection follows a spike and slab distribution, and only the distribution, rather than specific weights, is trained. This meta predictive learning is successfully validated on classifying handwritten digits where pixels are input to the network in sequence, and moreover on the toy and real language corpus. Our model reveals that most of the connections become deterministic after learning, while the output connections have a higher level of variability. The performance of the resulting network ensemble changes continuously with data load, further improving with more training data, in analogy with the emergent behavior of large language models. Therefore, our model provides a starting point to investigate the connection among brain computation, next-token prediction and general intelligence.
We describe a software package, TomOpt, developed to optimise the geometrical layout and specifications of detectors designed for tomography by scattering of cosmic-ray muons. The software exploits differentiable programming for the modeling of muon interactions with detectors and scanned volumes, the inference of volume properties, and the optimisation cycle performing the loss minimisation. In doing so, we provide the first demonstration of end-to-end-differentiable and inference-aware optimisation of particle physics instruments. We study the performance of the software on a relevant benchmark scenarios and discuss its potential applications.
In clinical trials of longitudinal continuous outcomes, reference based imputation (RBI) has commonly been applied to handle missing outcome data in settings where the estimand incorporates the effects of intercurrent events, e.g. treatment discontinuation. RBI was originally developed in the multiple imputation framework, however recently conditional mean imputation (CMI) combined with the jackknife estimator of the standard error was proposed as a way to obtain deterministic treatment effect estimates and correct frequentist inference. For both multiple and CMI, a mixed model for repeated measures (MMRM) is often used for the imputation model, but this can be computationally intensive to fit to multiple data sets (e.g. the jackknife samples) and lead to convergence issues with complex MMRM models with many parameters. Therefore, a step-wise approach based on sequential linear regression (SLR) of the outcomes at each visit was developed for the imputation model in the multiple imputation framework, but similar developments in the CMI framework are lacking. In this article, we fill this gap in the literature by proposing a SLR approach to implement RBI in the CMI framework, and justify its validity using theoretical results and simulations. We also illustrate our proposal on a real data application.
Electrical circuits are present in a variety of technologies, making their design an important part of computer aided engineering. The growing number of tunable parameters that affect the final design leads to a need for new approaches of quantifying their impact. Machine learning may play a key role in this regard, however current approaches often make suboptimal use of existing knowledge about the system at hand. In terms of circuits, their description via modified nodal analysis is well-understood. This particular formulation leads to systems of differential-algebraic equations (DAEs) which bring with them a number of peculiarities, e.g. hidden constraints that the solution needs to fulfill. We aim to use the recently introduced dissection concept for DAEs that can decouple a given system into ordinary differential equations, only depending on differential variables, and purely algebraic equations that describe the relations between differential and algebraic variables. The idea then is to only learn the differential variables and reconstruct the algebraic ones using the relations from the decoupling. This approach guarantees that the algebraic constraints are fulfilled up to the accuracy of the nonlinear system solver, which represents the main benefit highlighted in this article.
Cross-validation (CV) is one of the most widely used techniques in statistical learning for estimating the test error of a model, but its behavior is not yet fully understood. It has been shown that standard confidence intervals for test error using estimates from CV may have coverage below nominal levels. This phenomenon occurs because each sample is used in both the training and testing procedures during CV and as a result, the CV estimates of the errors become correlated. Without accounting for this correlation, the estimate of the variance is smaller than it should be. One way to mitigate this issue is by estimating the mean squared error of the prediction error instead using nested CV. This approach has been shown to achieve superior coverage compared to intervals derived from standard CV. In this work, we generalize the nested CV idea to the Cox proportional hazards model and explore various choices of test error for this setting.
Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions. This limits tractable querying of this knowledge to start-to-end autoregressive sampling. However, many tasks of interest -- including sequence continuation, infilling, and other forms of constrained generation -- involve sampling from intractable posterior distributions. We address this limitation by using amortized Bayesian inference to sample from these intractable posteriors. Such amortization is algorithmically achieved by fine-tuning LLMs via diversity-seeking reinforcement learning algorithms: generative flow networks (GFlowNets). We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training and reward-maximizing policy optimization. As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem and demonstrate that our approach enables data-efficient adaptation of LLMs to tasks that require multi-step rationalization and tool use.
A central challenge in the verification of quantum computers is benchmarking their performance as a whole and demonstrating their computational capabilities. In this work, we find a universal model of quantum computation, Bell sampling, that can be used for both of those tasks and thus provides an ideal stepping stone towards fault-tolerance. In Bell sampling, we measure two copies of a state prepared by a quantum circuit in the transversal Bell basis. We show that the Bell samples are classically intractable to produce and at the same time constitute what we call a circuit shadow: from the Bell samples we can efficiently extract information about the quantum circuit preparing the state, as well as diagnose circuit errors. In addition to known properties that can be efficiently extracted from Bell samples, we give two new and efficient protocols, a test for the depth of the circuit and an algorithm to estimate a lower bound to the number of T gates in the circuit. With some additional measurements, our algorithm learns a full description of states prepared by circuits with low T-count.
Research in high energy physics (HEP) requires huge amounts of computing and storage, putting strong constraints on the code speed and resource usage. To meet these requirements, a compiled high-performance language is typically used; while for physicists, who focus on the application when developing the code, better research productivity pleads for a high-level programming language. A popular approach consists of combining Python, used for the high-level interface, and C++, used for the computing intensive part of the code. A more convenient and efficient approach would be to use a language that provides both high-level programming and high-performance. The Julia programming language, developed at MIT especially to allow the use of a single language in research activities, has followed this path. In this paper the applicability of using the Julia language for HEP research is explored, covering the different aspects that are important for HEP code development: runtime performance, handling of large projects, interface with legacy code, distributed computing, training, and ease of programming. The study shows that the HEP community would benefit from a large scale adoption of this programming language. The HEP-specific foundation libraries that would need to be consolidated are identified
Zero-shot learning in prompted vision-language models, the practice of crafting prompts to build classifiers without an explicit training process, has achieved impressive performance in many settings. This success presents a seemingly surprising observation: these methods suffer relatively little from overfitting, i.e., when a prompt is manually engineered to achieve low error on a given training set (thus rendering the method no longer actually zero-shot), the approach still performs well on held-out test data. In this paper, we show that we can explain such performance well via recourse to classical PAC-Bayes bounds. Specifically, we show that the discrete nature of prompts, combined with a PAC-Bayes prior given by a language model, results in generalization bounds that are remarkably tight by the standards of the literature: for instance, the generalization bound of an ImageNet classifier is often within a few percentage points of the true test error. We demonstrate empirically that this holds for existing handcrafted prompts and prompts generated through simple greedy search. Furthermore, the resulting bound is well-suited for model selection: the models with the best bound typically also have the best test performance. This work thus provides a possible justification for the widespread practice of prompt engineering, even if it seems that such methods could potentially overfit the training data.