Probabilistic programs are a powerful and convenient approach to formalise distributions over system executions. A classical verification problem for probabilistic programs is temporal inference: to compute the likelihood that the execution traces satisfy a given temporal property. This paper presents a general framework for temporal inference, which applies to a rich variety of quantitative models including those that arise in the operational semantics of probabilistic and weighted programs. The key idea underlying our framework is that in a variety of existing approaches, the main construction that enables temporal inference is that of a product between the system of interest and the temporal property. We provide a unifying mathematical definition of product constructions, enabled by the realisation that 1) both systems and temporal properties can be modelled as coalgebras and 2) product constructions are distributive laws in this context. Our categorical framework leads us to our main contribution: a sufficient condition for correctness, which is precisely what enables to use the product construction for temporal inference. We show that our framework can be instantiated to naturally recover a number of disparate approaches from the literature including, e.g., partial expected rewards in Markov reward models, resource-sensitive reachability analysis, and weighted optimization problems. Further, we demonstrate a product of weighted programs and weighted temporal properties as a new instance to show the scalability of our approach.
In adaptive systems, predictors are used to anticipate changes in the systems state or behavior that may require system adaption, e.g., changing its configuration or adjusting resource allocation. Therefore, the quality of predictors is crucial for the overall reliability and performance of the system under control. This paper studies predictors in systems exhibiting probabilistic and non-deterministic behavior modelled as Markov decision processes (MDPs). Main contributions are the introduction of quantitative notions that measure the effectiveness of predictors in terms of their average capability to predict the occurrence of failures or other undesired system behaviors. The average is taken over all memoryless policies. We study two classes of such notions. One class is inspired by concepts that have been introduced in statistical analysis to explain the impact of features on the decisions of binary classifiers (such as precision, recall, f-score). Second, we study a measure that borrows ideas from recent work on probability-raising causality in MDPs and determines the quality of a predictor by the fraction of memoryless policies under which (the set of states in) the predictor is a probability-raising cause for the considered failure scenario.
A scaled conjugate gradient method that accelerates existing adaptive methods utilizing stochastic gradients is proposed for solving nonconvex optimization problems with deep neural networks. It is shown theoretically that, whether with constant or diminishing learning rates, the proposed method can obtain a stationary point of the problem. Additionally, its rate of convergence with diminishing learning rates is verified to be superior to that of the conjugate gradient method. The proposed method is shown to minimize training loss functions faster than the existing adaptive methods in practical applications of image and text classification. Furthermore, in the training of generative adversarial networks, one version of the proposed method achieved the lowest Frechet inception distance score among those of the adaptive methods.
Human languages have evolved to be structured through repeated language learning and use. These processes introduce biases that operate during language acquisition and shape linguistic systems toward communicative efficiency. In this paper, we investigate whether the same happens if artificial languages are optimised for implicit biases of Large Language Models (LLMs). To this end, we simulate a classical referential game in which LLMs learn and use artificial languages. Our results show that initially unstructured holistic languages are indeed shaped to have some structural properties that allow two LLM agents to communicate successfully. Similar to observations in human experiments, generational transmission increases the learnability of languages, but can at the same time result in non-humanlike degenerate vocabularies. Taken together, this work extends experimental findings, shows that LLMs can be used as tools in simulations of language evolution, and opens possibilities for future human-machine experiments in this field.
Automated program verification has always been an important component of building trustworthy software. While the analysis of real-world programs remains a theoretical challenge, the automation of loop invariant analysis has effectively resolved the problem. However, real-world programs that often mix complex data structures and control flows pose challenges to traditional loop invariant generation tools. To enhance the applicability of invariant generation techniques, we proposed ACInv, an Automated Complex program loop Invariant generation tool, which combines static analysis with Large Language Models (LLMs) to generate the proper loop invariants. We utilize static analysis to extract the necessary information for each loop and embed it into prompts for the LLM to generate invariants for each loop. Subsequently, we employ an LLM-based evaluator to assess the generated invariants, refining them by either strengthening, weakening, or rejecting them based on their correctness, ultimately obtaining enhanced invariants. We conducted experiments on ACInv, which showed that ACInv outperformed previous tools on data sets with data structures, and maintained similar performance to the state-of-the-art tool AutoSpec on numerical programs without data structures. For the total data set, ACInv can solve 21% more examples than AutoSpec and can generate reference data structure templates.
Discretized techniques for vector tomographic reconstructions are prone to producing artifacts in the reconstructions. The quality of these reconstructions may further deteriorate as the amount of noise increases. In this work, we instead model the underlying vector fields using smooth neural fields. Owing to the fact that the activation functions in the neural network may be chosen to be smooth and the domain is no longer pixelated, the model results in high-quality reconstructions, even under presence of noise. In the case where we have underlying global continuous symmetry, we find that the neural network substantially improves the accuracy of the reconstruction over the existing techniques.
We introduce a unified, flexible, and easy-to-implement framework of sufficient dimension reduction that can accommodate both linear and nonlinear dimension reduction, and both the conditional distribution and the conditional mean as the targets of estimation. This unified framework is achieved by a specially structured neural network -- the Belted and Ensembled Neural Network (BENN) -- that consists of a narrow latent layer, which we call the belt, and a family of transformations of the response, which we call the ensemble. By strategically placing the belt at different layers of the neural network, we can achieve linear or nonlinear sufficient dimension reduction, and by choosing the appropriate transformation families, we can achieve dimension reduction for the conditional distribution or the conditional mean. Moreover, thanks to the advantage of the neural network, the method is very fast to compute, overcoming a computation bottleneck of the traditional sufficient dimension reduction estimators, which involves the inversion of a matrix of dimension either p or n. We develop the algorithm and convergence rate of our method, compare it with existing sufficient dimension reduction methods, and apply it to two data examples.
We introduce a new erasure decoder that applies to arbitrary quantum LDPC codes. Dubbed the cluster decoder, it generalizes the decomposition idea of Vertical-Horizontal (VH) decoding introduced by Connelly et al. in 2022. Like the VH decoder, the idea is to first run the peeling decoder and then post-process the resulting stopping set. The cluster decoder breaks the stopping set into a tree of clusters which can be solved sequentially via Gaussian Elimination (GE). By allowing clusters of unconstrained size, this decoder achieves maximum-likelihood (ML) performance with reduced complexity compared with full GE. When GE is applied only to clusters whose sizes are less than a constant, the performance is degraded but the complexity becomes linear in the block length. Our simulation results show that, for hypergraph product codes, the cluster decoder with constant cluster size achieves near-ML performance similar to VH decoding in the low-erasure-rate regime. For the general quantum LDPC codes we studied, the cluster decoder can be used to estimate the ML performance curve with reduced complexity over a wide range of erasure rates.
The science of cause and effect is extremely sophisticated and extremely hard to scale. Using a controlled experiment, scientists get rich insights by analyzing global effects, effects in different segments, and trends in effects over time. They use propensity scores to project external validity. To support the analysis of relative effects, scientists derive challenging ratio distributions. While the analytical capabilities in experimentation are advancing, we require new innovation within engineering and computational causal inference to enable an experimentation platform to make analyses performant and scalable. Of significant importance: we must unify the computing strategy for these models so that they can be consistently applied across experiments. In doing so, the industry can make significant progress towards developing a flywheel that unifies and accelerates the evaluation and roll out of experiments. In order to support unified computation, this paper introduces baseline vectors and delta vectors as common structure for estimating treatment effects. This common structure allows many statistics to be subsumed into a single API. The nature of its algebraic formulation allows linear algebra libraries to vectorize and optimize its performance, creating a single and efficient tool to support the many innovations in experimentation.
It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.