Language models (LMs) have become important tools in a variety of applications, from data processing to the creation of instruction-following assistants. But despite their advantages, LMs have certain idiosyncratic limitations such as the problem of `strong priors', where a model learns to output typical continuations in response to certain, usually local, portions of the input regardless of any earlier instructions. For example, prompt injection attacks can induce models to ignore explicit directives. In some cases, larger models have been shown to be more susceptible to these problems than similar smaller models, an example of the phenomenon of `inverse scaling'. We develop a new technique for mitigating the problem of strong priors: we take the original set of instructions, produce a weakened version of the original prompt that is even more susceptible to the strong priors problem, and then extrapolate the continuation away from the weakened prompt. This lets us infer how the model would continue a hypothetical strengthened set of instructions. Our technique conceptualises LMs as mixture models which combine a family of data generation processes, reinforcing the desired elements of the mixture. Our approach works at inference time, removing any need for retraining. We apply it to eleven models including GPT-2, GPT-3, Llama 2, and Mistral on four tasks, and find improvements in 41 of the 44 model-task combinations. Across all 44 combinations, the median increase in the proportion of tasks completed is 40%.
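For illustration, the extrapolation step described above can be sketched as follows, assuming a HuggingFace causal LM; the prompts, the hand-weakened variant, and the guidance weight gamma are illustrative choices rather than the paper's exact setup.

```python
# Minimal sketch of extrapolating away from a weakened prompt; not the authors'
# released code. The prompts and the guidance weight `gamma` are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

full_prompt = "Capitalise every word that follows.\nInput: the cat sat\nOutput:"
weak_prompt = "Input: the cat sat\nOutput:"   # instructions stripped: more prior-driven
gamma = 2.0                                   # gamma > 1 pushes away from the weak prompt

def next_token_logits(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids).logits[0, -1]

# Move from the weakened prompt's prediction toward (and past) the full prompt's,
# i.e. toward a hypothetical strengthened set of instructions.
logits = next_token_logits(weak_prompt) + gamma * (
    next_token_logits(full_prompt) - next_token_logits(weak_prompt)
)
print(tok.decode(logits.argmax().item()))
```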
The devices designed for the Internet-of-Things encompass a large variety of distinct processor architectures, forming a highly heterogeneous zoo. To tackle this heterogeneity, we employ a simulator to estimate the performance of the matrix-matrix multiplication (GEMM) kernel on processors designed to operate at the edge. Our simulator adheres to the modern implementations of GEMM advocated by GotoBLAS2, BLIS, OpenBLAS, etc., to carefully account for the amount of data transfers across the memory hierarchy incurred by different algorithmic variants of the kernel. A small collection of experiments provides the necessary data to calibrate the simulator and deliver highly accurate estimations of the execution time for a given processor architecture.
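The flavour of such an analytical model can be conveyed with a short sketch that counts only main-memory traffic for the GotoBLAS2/BLIS blocking scheme; the block sizes and 4-byte elements are placeholders, and the actual simulator also models the individual cache levels and is calibrated against measurements.

```python
# Rough sketch of a data-transfer model for a GotoBLAS2/BLIS-style blocked GEMM
# (C += A * B, with A of size m x k and B of size k x n). Only main-memory
# traffic is counted; the block sizes nc, kc and the element size are
# illustrative placeholders.
from math import ceil

def dram_traffic_bytes(m, n, k, nc, kc, elem=4):
    a = m * k * ceil(n / nc)       # each panel of A is repacked once per jc iteration
    b = k * n                      # each block of B is packed exactly once
    c = 2 * m * n * ceil(k / kc)   # each tile of C is read and written once per kc block
    return (a + b + c) * elem

# Example: compare two kc choices for an edge-sized problem.
for kc in (128, 256):
    print(kc, dram_traffic_bytes(m=512, n=512, k=512, nc=512, kc=kc))
```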
Robotic manipulation relies on analytical or learned models to simulate the system dynamics. These models are often inaccurate and based on offline information, so that the robot planner is unable to cope with mismatches between the expected and the actual behavior of the system (e.g., the presence of an unexpected obstacle). In these situations, the robot should use information gathered online to correct its planning strategy and adapt to the actual system response. We propose a sampling-based motion planning approach that uses an estimate of the model error and online observations to correct the planning strategy at each new replanning. Our approach adapts the cost function and the sampling bias of a kinodynamic motion planner when the outcome of the executed transitions is different from the expected one (e.g., when the robot unexpectedly collides with an obstacle) so that future trajectories will avoid unreliable motions. To infer the properties of a new transition, we introduce the notion of context-awareness, i.e., we store local environment information for each executed transition and avoid new transitions with context similar to previous unreliable ones. This is helpful for leveraging online information even if the simulated transitions are far (in the state-and-action space) from the executed ones. Simulation and experimental results show that the proposed approach increases the success rate in execution and reduces the number of replannings needed to reach the goal.
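A minimal sketch of the context-awareness mechanism might look as follows; the context descriptor, the similarity kernel, and the penalty weight are illustrative assumptions rather than the authors' exact formulation.

```python
# Sketch of a context memory that penalizes transitions whose local context
# resembles previously unreliable (unexpected-outcome) transitions.
import numpy as np

class ContextMemory:
    def __init__(self, bandwidth=0.5, weight=10.0):
        self.unreliable = []          # contexts of transitions that deviated from the model
        self.bandwidth = bandwidth
        self.weight = weight

    def record(self, context, expected_next, observed_next, tol=1e-2):
        # Store the context when execution deviates from the model's prediction.
        if np.linalg.norm(np.asarray(expected_next) - np.asarray(observed_next)) > tol:
            self.unreliable.append(np.asarray(context, dtype=float))

    def penalty(self, context):
        # Extra cost for candidate transitions with context similar to past failures.
        if not self.unreliable:
            return 0.0
        d = np.linalg.norm(np.stack(self.unreliable) - np.asarray(context, dtype=float), axis=1)
        return self.weight * np.exp(-(d.min() / self.bandwidth) ** 2)

# Toy usage: a failed transition makes similar contexts expensive at replanning.
mem = ContextMemory()
mem.record(context=[0.2, 0.8, 1.0], expected_next=[1.0, 0.0], observed_next=[0.4, 0.1])
print(mem.penalty([0.25, 0.75, 1.0]))   # high penalty: similar local context
print(mem.penalty([5.0, 5.0, 0.0]))     # ~0: unrelated context
```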
Gaussian processes (GPs) are commonly used for geospatial analysis, but they suffer from high computational complexity when dealing with massive data. For instance, the log-likelihood function required in estimating the statistical model parameters for geospatial data is a computationally intensive procedure that involves computing the inverse of a covariance matrix of size n x n, where n represents the number of geographical locations. As a result, studies in the literature have shifted towards approximation methods to handle larger values of n effectively while maintaining high accuracy. These methods encompass a range of techniques, including low-rank and sparse approximations. The Vecchia approximation is one of the most promising methods for speeding up the evaluation of the log-likelihood function. This study presents a parallel implementation of the Vecchia approximation, utilizing batched matrix computations on contemporary GPUs. The proposed implementation relies on batched linear algebra routines to efficiently execute the individual conditional distributions in the Vecchia algorithm. We rely on the KBLAS linear algebra library to perform batched linear algebra operations, reducing the time to solution compared to the state-of-the-art parallel implementation of the likelihood estimation operation in the ExaGeoStat software by up to 700x, 833x, and 1380x on 32GB GV100, 80GB A100, and 80GB H100 GPUs, respectively. We also successfully manage larger problem sizes on a single NVIDIA GPU, accommodating up to 1M locations with 80GB A100 and H100 GPUs while maintaining the necessary application accuracy. We further assess the accuracy of the implemented algorithm, identifying the optimal settings for the Vecchia approximation algorithm to preserve accuracy on two real geospatial datasets: soil moisture data in the Mississippi Basin area and wind speed data in the Middle East.
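For reference, the per-location conditional structure that makes the Vecchia approximation amenable to batched GPU execution can be written in plain NumPy; the exponential kernel, neighbor count, and ordering below are illustrative, and the GPU implementation instead dispatches the many small, independent solves as batched linear algebra (e.g., batched Cholesky factorizations).

```python
# Plain-NumPy sketch of the Vecchia approximation to the Gaussian log-likelihood:
# each location conditions on a small set of previously ordered neighbors, so the
# computation decomposes into many small, independent solves.
import numpy as np

def kernel(X, Y, ell=0.1):
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return np.exp(-d / ell)

def vecchia_loglik(z, X, m=30, ell=0.1):
    n, ll = len(z), 0.0
    for i in range(n):
        nb = np.argsort(np.linalg.norm(X[:i] - X[i], axis=1))[:m]  # nearest earlier points
        if len(nb) == 0:
            mean, var = 0.0, kernel(X[i:i+1], X[i:i+1], ell)[0, 0]
        else:
            K_nn = kernel(X[nb], X[nb], ell)
            k_in = kernel(X[i:i+1], X[nb], ell)[0]
            w = np.linalg.solve(K_nn, k_in)         # one small solve per location
            mean = w @ z[nb]
            var = kernel(X[i:i+1], X[i:i+1], ell)[0, 0] - w @ k_in
        ll += -0.5 * (np.log(2 * np.pi * var) + (z[i] - mean) ** 2 / var)
    return ll

# Toy usage on 500 random locations in the unit square.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))
z = rng.normal(size=500)
print(vecchia_loglik(z, X, m=20))
```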
The capability to generate simulation-ready garment models from 3D shapes of clothed humans will significantly enhance the interpretability of captured geometry of real garments, as well as their faithful reproduction in the virtual world. This will have a notable impact on fields like shape capture in social VR, and virtual try-on in the fashion industry. To align with the garment modeling process standardized by the fashion industry, as well as with cloth simulation software, 2D patterns must be recovered. This involves an inverse garment design problem, which is the focus of our work here: Starting with an arbitrary target garment geometry, our system estimates an animatable garment model by automatically adjusting its corresponding 2D template pattern, along with the material parameters of the physics-based simulation (PBS). Built upon a differentiable cloth simulator, the optimization process is directed towards minimizing the deviation of the simulated garment shape from the target geometry. Moreover, our produced patterns meet manufacturing requirements such as left-to-right symmetry, making them suited for reverse garment fabrication. We validate our approach on examples of different garment types, and show that our method faithfully reproduces both the draped garment shape and the sewing pattern.
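The structure of the optimization can be conveyed with a toy sketch in which a stand-in differentiable "simulator" replaces the cloth simulator; the functions and parameters are placeholders, not the authors' system.

```python
# Toy, runnable sketch of the inverse-design loop: gradients of a shape-deviation
# loss flow back through a (stand-in) differentiable simulator to the pattern and
# material parameters.
import torch

def toy_cloth_sim(pattern, material):
    # Stand-in for a differentiable PBS: maps pattern + material params to vertices.
    return pattern.repeat(1, 3) * torch.sigmoid(material)

target = torch.randn(50, 3)                        # captured garment geometry
pattern = torch.randn(50, 1, requires_grad=True)   # 2D template pattern parameters
material = torch.zeros(1, requires_grad=True)      # e.g., a stretch-stiffness parameter
opt = torch.optim.Adam([pattern, material], lr=0.05)

for step in range(200):
    opt.zero_grad()
    loss = ((toy_cloth_sim(pattern, material) - target) ** 2).mean()
    loss.backward()                                # gradients flow through the simulator
    opt.step()
```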
Recently, watermarking schemes for large language models (LLMs) have been proposed to distinguish text generated by machines from text written by humans. The present paper explores philosophical, political, and ethical ramifications of implementing and using watermarking schemes. A definition of authorship that includes both machines (LLMs) and humans is proposed to serve as a backdrop. It is argued that private watermarks may provide private companies with sweeping rights to determine authorship, which is incompatible with traditional standards of authorship determination. Then, possible ramifications of the so-called entropy dependence of watermarking mechanisms are explored. It is argued that entropy may vary for different, socially salient groups. This could lead to group-dependent rates at which machine-generated text is detected. Specifically, groups more interested in low-entropy text may face the challenge that it is harder to detect machine-generated text that is of interest to them.
Federated learning (FL) is a distributed machine learning paradigm that enables training models on decentralized data. The field of FL security against poisoning attacks is plagued with confusion due to the proliferation of research that makes different assumptions about the capabilities of adversaries and the adversary models under which they operate. Our work aims to clarify this confusion by presenting a comprehensive analysis of the various poisoning attacks and defensive aggregation rules (AGRs) proposed in the literature, and connecting them under a common framework. To connect existing adversary models, we present a hybrid adversary model, which lies in the middle of the spectrum of adversaries: the adversary compromises a few clients, trains a generative model (e.g., a DDPM) on their compromised samples, and generates new synthetic data to solve an optimization problem for a stronger (e.g., cheaper, more practical) attack against different robust aggregation rules. By presenting the spectrum of FL adversaries, we aim to provide practitioners and researchers with a clear understanding of the different types of threats they need to consider when designing FL systems, and to identify areas where further research is needed.
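A schematic rendering of the final, AGR-aware step of this pipeline is given below; the trimmed-mean aggregator, the perturbation direction, and the grid search over the scaling factor are generic stand-ins rather than the paper's exact optimization, and the "benign" updates play the role of those the adversary estimates from its synthetic (e.g., DDPM-generated) data.

```python
# Sketch of an AGR-tailored model-poisoning step: pick the perturbation scaling
# that moves a robust aggregate (here, trimmed mean) the most.
import numpy as np

rng = np.random.default_rng(0)
benign = rng.normal(size=(20, 100))            # stand-in for estimated benign client updates
mu, sigma = benign.mean(axis=0), benign.std(axis=0)

def trimmed_mean(updates, beta=0.2):
    k = int(beta * len(updates))
    s = np.sort(updates, axis=0)
    return s[k: len(updates) - k].mean(axis=0)

def optimize_attack(n_mal=5, gammas=np.linspace(0.0, 20.0, 200)):
    # Brute-force the scaling factor gamma that maximizes the aggregate shift.
    clean = trimmed_mean(benign)
    best_gamma, best_shift = 0.0, 0.0
    for g in gammas:
        malicious = np.tile(mu - g * sigma, (n_mal, 1))   # push against the benign mean
        shift = np.linalg.norm(trimmed_mean(np.vstack([benign, malicious])) - clean)
        if shift > best_shift:
            best_gamma, best_shift = g, shift
    return best_gamma, best_shift

print(optimize_attack())
```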
Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine-invariant Wasserstein and Stein gradient flows. Affine-invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods. Our third contribution is to study, and develop efficient algorithms based on, Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.
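As a concrete illustration of the first contribution, the Wasserstein gradient flow of the KL energy (a standard computation, written here for a target $\pi \propto e^{-V}$ with unknown normalization constant $Z$) reads as follows; only $\nabla \log \pi = -\nabla V$ enters the flow, so $Z$ cancels.

```latex
% Wasserstein gradient flow of the KL energy, with target \pi = e^{-V}/Z.
\[
\mathcal{E}(\rho) = \mathrm{KL}(\rho \,\|\, \pi), \qquad
\partial_t \rho
  = \nabla \cdot \Big( \rho \, \nabla \frac{\delta \mathcal{E}}{\delta \rho} \Big)
  = \nabla \cdot \big( \rho \, \nabla \log \tfrac{\rho}{\pi} \big)
  = \nabla \cdot ( \rho \, \nabla V ) + \Delta \rho .
\]
```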
Approximate Bayesian Computation (ABC) methods are applicable to statistical models specified by generative processes with analytically intractable likelihoods. These methods try to approximate the posterior density of a model parameter by comparing the observed data with additional process-generated simulated datasets. For computational benefit, only the values of certain well-chosen summary statistics are usually compared, instead of the whole dataset. Most ABC procedures are computationally expensive, justified only heuristically, and have poor asymptotic properties. In this article, we introduce a new empirical likelihood-based approach to the ABC paradigm called ABCel. The proposed procedure is computationally tractable and approximates the target log posterior of the parameter as a sum of two functions of the data -- namely, the mean of the optimal log-empirical likelihood weights and the estimated differential entropy of the summary functions. We rigorously justify the procedure via direct and reverse information projections onto appropriate classes of probability densities. Past applications of empirical likelihood in ABC demanded constraints based on analytically tractable estimating functions that involve both the data and the parameter, although, by the nature of the ABC problem, such functions may not be available in general. In contrast, we use constraints that are functions of the summary statistics only. Equally importantly, we show that our construction directly connects to the reverse information projection. We show that ABCel is posterior consistent and has highly favourable asymptotic properties. Its construction justifies the use of simple summary statistics like moments, quantiles, etc., which in practice produce an accurate approximation of the posterior density. We illustrate the performance of the proposed procedure in a range of applications.
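Schematically, and in our own notation (with simulated summaries $s^{(1)},\dots,s^{(N)}$ and constraint functions $g$ of the summaries only), the construction rests on the standard empirical likelihood program together with the decomposition stated above; the exact constraint functions and entropy estimator are those defined in the paper.

```latex
% Standard empirical likelihood program with summary-based constraints, and the
% decomposition of the approximate log posterior stated above (notation ours).
\[
\hat w(\theta) \;=\; \arg\max_{w \in \Delta_N} \; \sum_{i=1}^{N} \log w_i
\quad \text{subject to} \quad \sum_{i=1}^{N} w_i \, g\big(s^{(i)}; s_{\mathrm{obs}}\big) = 0 ,
\]
\[
\log \hat\pi(\theta \mid s_{\mathrm{obs}})
\;\approx\; \frac{1}{N} \sum_{i=1}^{N} \log \hat w_i(\theta) \;+\; \hat H(S) \;+\; \text{const},
\]
where $\hat H(S)$ denotes the estimated differential entropy of the summary functions.
```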
The moving discontinuous Galerkin method with interface condition enforcement (MDG-ICE) is a high-order, r-adaptive method that treats the grid as a variable and weakly enforces the conservation law, constitutive law, and corresponding interface conditions in order to implicitly fit high-gradient flow features. In this paper, we develop an optimization solver based on the Levenberg-Marquardt algorithm that features an anisotropic, locally adaptive penalty method to enhance robustness and prevent cell degeneration in the computation of hypersonic, viscous flows. Specifically, we incorporate an anisotropic grid regularization based on the mesh-implied metric that inhibits grid motion in directions with small element length scales, an element shape regularization that inhibits nonlinear deformations of the high-order elements, and a penalty regularization that penalizes degenerate elements. Additionally, we introduce a procedure for locally scaling the regularization operators in an adaptive, elementwise manner in order to maintain grid validity. We apply the proposed MDG-ICE formulation to two- and three-dimensional test cases involving viscous shocks and/or boundary layers, including Mach 17.6 hypersonic viscous flow over a circular cylinder and Mach 5 hypersonic viscous flow over a sphere, which are very challenging test cases for conventional numerical schemes on simplicial grids. Even without artificial dissipation, the computed solutions are free from spurious oscillations and yield highly symmetric surface heat-flux profiles.
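Schematically, each nonlinear iteration takes a regularized Levenberg-Marquardt step of the form below, where $x$ collects the discrete flow-field and grid unknowns, $r(x)$ is the least-squares residual of the weakly enforced conservation law, constitutive law, and interface conditions, $J = \partial r / \partial x$, and $R$ gathers the anisotropic grid, element-shape, and degeneracy penalties; the precise operators and their local, elementwise scaling are those defined in the paper.

```latex
% Regularized Levenberg--Marquardt step (schematic form).
\[
\big( J(x_k)^\top J(x_k) \;+\; \lambda_k \, R(x_k) \big)\, \Delta x_k
\;=\; - \, J(x_k)^\top r(x_k), \qquad x_{k+1} = x_k + \Delta x_k .
\]
```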
As artificial intelligence (AI) models continue to scale up, they are becoming more capable and integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMAs), interpretability provides a way to trust and understand the agent's internal reasoning mechanisms for effective use and error correction. In this paper, we provide an overview of this rapidly evolving sub-field of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI), and recommend an MLI for various types of agents, to aid their safe deployment in real-world settings.