Low-rank adaptation (LoRA) has emerged as a new paradigm for cost-efficient fine-tuning of large language models (LLMs). However, fine-tuned LLMs often become overconfident especially when fine-tuned on small datasets. Bayesian methods, with their inherent ability to estimate uncertainty, serve as potent tools to mitigate overconfidence and enhance calibration. In this work, we introduce Laplace-LoRA, which applies a Bayesian approach to the LoRA parameters. Specifically, Laplace-LoRA applies a Laplace approximation to the posterior over the LoRA parameters, considerably improving the calibration of fine-tuned LLMs.
Two new hybrid algorithms are proposed for large-scale linear discrete ill-posed problems in general-form regularization. They are both based on Krylov subspace inner-outer iterative algorithms. At each iteration, they need to solve a linear least squares problem. It is proved that the inner linear least squares problems, which are solved by LSQR, become better conditioned as k increases, so LSQR converges faster. We also prove how to choose the stopping tolerance for LSQR in order to guarantee that the computed solutions have the same accuracy with the exact best regularized solutions. Numerical experiments are given to show the effectiveness and efficiency of our new hybrid algorithms, and comparisons are made with the existing algorithm.
A central task in knowledge compilation is to compile a CNF-SAT instance into a succinct representation format that allows efficient operations such as testing satisfiability, counting, or enumerating all solutions. Useful representation formats studied in this area range from ordered binary decision diagrams (OBDDs) to circuits in decomposable negation normal form (DNNFs). While it is known that there exist CNF formulas that require exponential size representations, the situation is less well studied for other types of constraints than Boolean disjunctive clauses. The constraint satisfaction problem (CSP) is a powerful framework that generalizes CNF-SAT by allowing arbitrary sets of constraints over any finite domain. The main goal of our work is to understand for which type of constraints (also called the constraint language) it is possible to efficiently compute representations of polynomial size. We answer this question completely and prove two tight characterizations of efficiently compilable constraint languages, depending on whether target format is structured. We first identify the combinatorial property of ``strong blockwise decomposability'' and show that if a constraint language has this property, we can compute DNNF representations of linear size. For all other constraint languages we construct families of CSP-instances that provably require DNNFs of exponential size. For a subclass of ``strong uniformly blockwise decomposable'' constraint languages we obtain a similar dichotomy for structured DNNFs. In fact, strong (uniform) blockwise decomposability even allows efficient compilation into multi-valued analogs of OBDDs and FBDDs, respectively. Thus, we get complete characterizations for all knowledge compilation classes between O(B)DDs and DNNFs.
A common way to approximate $F(A)b$ -- the action of a matrix function on a vector -- is to use the Arnoldi approximation. Since a new vector needs to be generated and stored in every iteration, one is often forced to rely on restart algorithms which are either not efficient, not stable or only applicable to restricted classes of functions. We present a new representation of the error of the Arnoldi iterates if the function $F$ is given as a Laplace transform. Based on this representation we build an efficient and stable restart algorithm. In doing so we extend earlier work for the class of Stieltjes functions which are special Laplace transforms. We report several numerical experiments including comparisons with the restart method for Stieltjes functions.
In Computational Fluid Dynamics (CFD), coarse mesh simulations offer computational efficiency but often lack precision. Applying conventional super-resolution to these simulations poses a significant challenge due to the fundamental contrast between downsampling high-resolution images and authentically emulating low-resolution physics. The former method conserves more of the underlying physics, surpassing the usual constraints of real-world scenarios. We propose a novel definition of super-resolution tailored for PDE-based problems. Instead of simply downsampling from a high-resolution dataset, we use coarse-grid simulated data as our input and predict fine-grid simulated outcomes. Employing a physics-infused UNet upscaling method, we demonstrate its efficacy across various 2D-CFD problems such as discontinuity detection in Burger's equation, Methane combustion, and fouling in Industrial heat exchangers. Our method enables the generation of fine-mesh solutions bypassing traditional simulation, ensuring considerable computational saving and fidelity to the original ground truth outcomes. Through diverse boundary conditions during training, we further establish the robustness of our method, paving the way for its broad applications in engineering and scientific CFD solvers.
In many applications, a stochastic system is studied using a model implicitly defined via a simulator. We develop a simulation-based parameter inference method for implicitly defined models. Our method differs from traditional likelihood-based inference in that it uses a metamodel for the distribution of a log-likelihood estimator. The metamodel is built on a local asymptotic normality (LAN) property satisfied by the simulation-based log-likelihood estimator under certain conditions. A method for hypothesis test is developed under the metamodel. Our method can enable accurate parameter estimation and uncertainty quantification where other Monte Carlo methods for parameter inference become highly inefficient due to large Monte Carlo variance. We demonstrate our method using numerical examples including a mechanistic model for the population dynamics of infectious disease.
Dynamical low-rank (DLR) approximation has gained interest in recent years as a viable solution to the curse of dimensionality in the numerical solution of kinetic equations including the Boltzmann and Vlasov equations. These methods include the projector-splitting and Basis-update & Galerkin DLR integrators, and have shown promise at greatly improving the computational efficiency of kinetic solutions. However, this often comes at the cost of conservation of charge, current and energy. In this work we show how a novel macro-micro decomposition may be used to separate the distribution function into two components, one of which carries the conserved quantities, and the other of which is orthogonal to them. We apply DLR approximation to the latter, and thereby achieve a clean and extensible approach to a conservative DLR scheme which retains the computational advantages of the base scheme. Moreover, our decomposition is compatible with the projector-splitting integrator, and can therefore access second-order accuracy in time via a Strang splitting scheme. We describe a first-order integrator which can exactly conserve charge and either current or energy, as well as a second-order accurate integrator which exactly conserves charge and energy. To highlight the flexibility of the proposed macro-micro decomposition, we implement a pair of velocity space discretizations, and verify the claimed accuracy and conservation properties on a suite of plasma benchmark problems.
Orthogonal meta-learners, such as DR-learner, R-learner and IF-learner, are increasingly used to estimate conditional average treatment effects. They improve convergence rates relative to na\"{\i}ve meta-learners (e.g., T-, S- and X-learner) through de-biasing procedures that involve applying standard learners to specifically transformed outcome data. This leads them to disregard the possibly constrained outcome space, which can be particularly problematic for dichotomous outcomes: these typically get transformed to values that are no longer constrained to the unit interval, making it difficult for standard learners to guarantee predictions within the unit interval. To address this, we construct orthogonal meta-learners for the prediction of counterfactual outcomes which respect the outcome space. As such, the obtained i-learner or imputation-learner is more generally expected to outperform existing learners, even when the outcome is unconstrained, as we confirm empirically in simulation studies and an analysis of critical care data. Our development also sheds broader light onto the construction of orthogonal learners for other estimands.
Many real-world datasets have an underlying dynamic graph structure, where entities and their interactions evolve over time. Machine learning models should consider these dynamics in order to harness their full potential in downstream tasks. Previous approaches for graph representation learning have focused on either sampling k-hop neighborhoods, akin to breadth-first search, or random walks, akin to depth-first search. However, these methods are computationally expensive and unsuitable for real-time, low-latency inference on dynamic graphs. To overcome these limitations, we propose graph-sprints a general purpose feature extraction framework for continuous-time-dynamic-graphs (CTDGs) that has low latency and is competitive with state-of-the-art, higher latency models. To achieve this, a streaming, low latency approximation to the random-walk based features is proposed. In our framework, time-aware node embeddings summarizing multi-hop information are computed using only single-hop operations on the incoming edges. We evaluate our proposed approach on three open-source datasets and two in-house datasets, and compare with three state-of-the-art algorithms (TGN-attn, TGN-ID, Jodie). We demonstrate that our graph-sprints features, combined with a machine learning classifier, achieve competitive performance (outperforming all baselines for the node classification tasks in five datasets). Simultaneously, graph-sprints significantly reduce inference latencies, achieving close to an order of magnitude speed-up in our experimental setting.
It has been widely observed that exact or approximate MAP (mode-seeking) decoding from natural language generation (NLG) models consistently leads to degenerate outputs (Stahlberg and Byrne, 2019, Holtzman et al., 2019). This has generally been attributed to either a fundamental inadequacy of modes in models or weaknesses in language modeling. Contrastingly in this work, we emphasize that degenerate modes can even occur in the absence of any model error, due to contamination of the training data. Specifically, we show that mixing even a tiny amount of low-entropy noise with a population text distribution can cause the data distribution's mode to become degenerate, implying that any models trained on it will be as well. As the unconditional mode of NLG models will often be degenerate, we therefore propose to apply MAP decoding to the model's distribution conditional on avoiding specific degeneracies. Using exact-search, we empirically verify that the length-conditional modes of machine translation models and language models are indeed more fluent and topical than their unconditional modes. For the first time, we also share many examples of exact modal sequences from these models, and from several variants of the LLaMA-7B model. Notably, the modes of the LLaMA models are still degenerate, showing that improvements in modeling have not fixed this issue. Because of the cost of exact mode finding algorithms, we develop an approximate mode finding approach, ACBS, which finds sequences that are both high-likelihood and high-quality. We apply this approach to LLaMA-7B, a model which was not trained for instruction following, and find that we are able to elicit reasonable outputs without any finetuning.
Gaussian processes (GPs) are a popular class of Bayesian nonparametric models, but its training can be computationally burdensome for massive training datasets. While there has been notable work on scaling up these models for big data, existing methods typically rely on a stationary GP assumption for approximation, and can thus perform poorly when the underlying response surface is non-stationary, i.e., it has some regions of rapid change and other regions with little change. Such non-stationarity is, however, ubiquitous in real-world problems, including our motivating application for surrogate modeling of computer experiments. We thus propose a new Product of Sparse GP (ProSpar-GP) method for scalable GP modeling with massive non-stationary data. The ProSpar-GP makes use of a carefully-constructed product-of-experts formulation of sparse GP experts, where different experts are placed within local regions of non-stationarity. These GP experts are fit via a novel variational inference approach, which capitalizes on mini-batching and GPU acceleration for efficient optimization of inducing points and length-scale parameters for each expert. We further show that the ProSpar-GP is Kolmogorov-consistent, in that its generative distribution defines a valid stochastic process over the prediction space; such a property provides essential stability for variational inference, particularly in the presence of non-stationarity. We then demonstrate the improved performance of the ProSpar-GP over the state-of-the-art, in a suite of numerical experiments and an application for surrogate modeling of a satellite drag simulator.