Probabilistic programming languages aid developers performing Bayesian inference. These languages provide programming constructs and tools for probabilistic modeling and automated inference. Prior work introduced a probabilistic programming language, ProbZelus, that extends probabilistic programming functionality to unbounded streams of data. This work demonstrated that the delayed sampling inference algorithm can be extended to work in a streaming context. It also showed that while delayed sampling can be deployed effectively on some programs, depending on the probabilistic model under consideration it is not guaranteed to use a bounded amount of memory over the course of a program's execution. In this paper, we present conditions on a probabilistic program's execution under which delayed sampling will execute in bounded memory. The two conditions are dataflow properties of the core operations of delayed sampling: the $m$-consumed property and the unseparated paths property. A program executes in bounded memory under delayed sampling if, and only if, it satisfies the $m$-consumed and unseparated paths properties. We propose a static analysis that abstracts over these properties to soundly ensure that any program that passes the analysis satisfies them, and thus executes in bounded memory under delayed sampling.
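To make the bounded-memory condition concrete, here is a minimal Python sketch (not ProbZelus code) of the canonical case where delayed sampling stays bounded: a Gaussian random walk whose every latent is observed, so each symbolic node is consumed after one step and can be freed. The model parameters and prior are illustrative assumptions.

```python
# Sketch of the bounded-memory pattern that delayed sampling exploits:
# in a random walk x_t ~ N(x_{t-1}, sx2) observed through
# y_t ~ N(x_t, sy2), the symbolic state collapses after each
# observation, so only one node is ever kept alive (loosely, the
# m-consumed property with m = 1).

def step(mu, var, y, sx2=1.0, sy2=1.0):
    """One streaming step: propagate the marginal of x_t, then
    condition on y_t and collapse (constant memory)."""
    mu_p, var_p = mu, var + sx2          # propagate: marginal before y_t
    k = var_p / (var_p + sy2)            # conjugate Gaussian update
    return mu_p + k * (y - mu_p), (1 - k) * var_p

mu, var = 0.0, 1.0          # prior on x_0 (illustrative)
for y in [0.3, 0.1, -0.2]:  # an unbounded stream in general
    mu, var = step(mu, var, y)
    print(mu, var)          # the state size never grows
```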
In this paper we introduce MCCE: Monte Carlo sampling of realistic Counterfactual Explanations, a model-based method that generates counterfactual explanations by producing a set of feasible examples using conditional inference trees. Unlike algorithmic-based counterfactual methods, which must solve complex optimization problems, and other model-based methods, which model the data distribution with heavyweight machine learning models, MCCE consists of only two lightweight steps (generation and post-processing). MCCE is also straightforward for the end user to understand and implement, handles any type of predictive model and any type of feature, takes actionability constraints into account when generating the counterfactual explanations, and generates as many counterfactual explanations as needed. We give a comprehensive list of performance metrics that can be used to compare counterfactual explanations, and we compare MCCE with a range of state-of-the-art methods and a new baseline method on benchmark data sets. MCCE outperforms all model-based methods and most algorithmic-based methods when validity (i.e., a correctly changed prediction) and actionability constraints are also taken into account. Finally, we show that MCCE performs almost as well when given only a small subset of the training data.
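A minimal sketch of MCCE's two-step structure under simplifying assumptions: a trivial empirical sampler stands in for the conditional inference trees, and `model` (with an sklearn-like predict interface), `x` (the instance to explain), and `immutable` (indices of fixed features) are assumed inputs.

```python
# Sketch of MCCE's two steps: generation, then post-processing.
import numpy as np

def mcce_sketch(model, X_train, x, immutable, n_samples=1000):
    # Step 1: generation -- sample mutable features from the training
    # distribution while fixing the immutable ones to x's values
    # (actionability). The real method samples from conditional
    # inference trees instead of resampling whole rows.
    idx = np.random.randint(len(X_train), size=n_samples)
    cand = X_train[idx].copy()
    cand[:, immutable] = x[immutable]
    # Step 2: post-processing -- keep candidates whose prediction
    # flips (validity), then return the one closest to x.
    valid = cand[model.predict(cand) != model.predict(x[None])[0]]
    if len(valid) == 0:
        return None
    return valid[np.argmin(np.abs(valid - x).sum(axis=1))]
```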
We propose the first near-optimal quantum algorithm for estimating, in Euclidean norm, the mean of a vector-valued random variable with finite mean and covariance. Our result aims to extend the theory of multivariate sub-Gaussian estimators to the quantum setting. Unlike in the classical setting, where any univariate estimator can be turned into a multivariate estimator with at most a logarithmic overhead in the dimension, no similar result holds in the quantum setting. Indeed, Heinrich ruled out the existence of a quantum advantage for the mean estimation problem when the sample complexity is smaller than the dimension. Our main result shows that, outside this low-precision regime, there is a quantum estimator that outperforms any classical estimator. Our approach is substantially more involved than in the univariate setting, where most quantum estimators rely only on phase estimation. We exploit a variety of additional algorithmic techniques such as amplitude amplification, the Bernstein-Vazirani algorithm, and quantum singular value transformation. Our analysis also uses concentration inequalities for multivariate truncated statistics. We develop our quantum estimators in two different input models that have appeared previously in the literature. The first provides coherent access to the binary representation of the random variable and encompasses the classical setting. In the second model, the random variable is directly encoded into the phases of quantum registers. This model arises naturally in many quantum algorithms but is often incomparable to having classical samples. We adapt our techniques to these two settings and show that the second model is strictly weaker for solving the mean estimation problem. Finally, we describe several applications of our algorithms, notably in measuring the expectation values of commuting observables and in the field of machine learning.
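As a point of contrast with the quantum algorithms, the classical reduction mentioned above is easy to sketch: apply a univariate sub-Gaussian estimator (here, median of means) coordinate-wise, boosting per-coordinate confidence via a union bound over the d coordinates. This is illustrative only; the parameter choices are standard textbook ones, not the paper's.

```python
# Classical coordinate-wise reduction with logarithmic overhead.
import numpy as np

def median_of_means(samples, k):
    """Univariate median-of-means over k groups."""
    groups = np.array_split(samples, k)
    return np.median([g.mean() for g in groups])

def multivariate_mom(X, delta=0.01):
    """Coordinate-wise estimator; union bound over d coordinates
    requires per-coordinate confidence delta/d, i.e. ~log(d/delta)
    groups."""
    n, d = X.shape
    k = max(1, int(np.ceil(np.log(d / delta))))
    return np.array([median_of_means(X[:, j], k) for j in range(d)])

X = np.random.default_rng(0).normal(size=(1000, 5))
print(multivariate_mom(X))  # close to the zero vector
```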
Thread pooling is a common programming idiom in which a fixed set of worker threads is maintained to execute tasks concurrently. The workers repeatedly pick tasks and execute them to completion. Each task is sequential, with possibly recursive code, and tasks communicate over shared memory. Executing a task can lead to the spawning of new tasks. We consider the safety verification problem for thread-pooled programs. We parameterize the problem with two parameters: the size of the thread pool and the number of context switches for each task. The size of the thread pool determines the number of workers running concurrently. The number of context switches determines how many times a worker can be swapped out while executing a single task; as in many verification problems for multithreaded recursive programs, context bounding is important for decidability. We show that the safety verification problem for thread-pooled, context-bounded, Boolean programs is EXPSPACE-complete, even if the size of the thread pool and the context bound are given in binary. Our main result, the EXPSPACE upper bound, is derived using a sequence of new succinct encoding techniques of independent language-theoretic interest. In particular, we show a polynomial-time construction of downward closures of languages accepted by succinct pushdown automata as doubly succinct nondeterministic finite automata. While there are explicit doubly exponential lower bounds on the size of nondeterministic finite automata accepting the downward closure, our result shows that these automata can be compressed. We show that thread pooling significantly reduces computational power: in contrast, if only the context bound is given in binary but there is no thread pooling, the safety verification problem becomes 3EXPSPACE-complete.
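A minimal Python sketch of the idiom (illustrative only; the paper's model is Boolean programs with context switching, which plain Python threads do not capture): a fixed pool of workers repeatedly draws tasks from a shared queue, and running a task may spawn new tasks.

```python
import queue, threading

tasks = queue.Queue()

def task(n):
    if n > 0:                 # executing a task can spawn new tasks
        tasks.put(n - 1)

def worker():
    while True:
        n = tasks.get()
        if n is None:         # poison pill shuts the worker down
            break
        task(n)               # run the task to completion
        tasks.task_done()

pool = [threading.Thread(target=worker) for _ in range(4)]  # fixed size
for t in pool: t.start()
tasks.put(10)                 # seed task; it spawns 9, 8, ...
tasks.join()                  # wait until all spawned tasks finish
for _ in pool: tasks.put(None)
for t in pool: t.join()
```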
Differential privacy is a rigorous privacy definition that guarantees that any analysis performed on a sensitive dataset reveals essentially no information about the individuals whose data are contained therein. In this work, we develop new differentially private algorithms to analyze streaming data. Specifically, we consider the problem of estimating the density of a stream of users (or, more generally, elements), which expresses the fraction of all users that actually appear in the stream. We focus on one of the strongest privacy guarantees for the streaming model, namely user-level pan-privacy, which ensures that the privacy of any user is protected even against an adversary that occasionally observes the internal state of the algorithm. Our proposed algorithms make optimal use of the entire allocated privacy budget and are specially tailored to the streaming model; hence, they outperform the conventional sampling-based approach both theoretically and experimentally.
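As a hedged illustration of the flavour of pan-private density estimation (in the spirit of the classic randomized-response construction, not necessarily this paper's exact algorithm): keep one nearly uniform bit per potential user, so that an intrusion on the internal state reveals little about any individual.

```python
# Randomized-response-style pan-private density sketch. The bias
# alpha = eps/4 is a hedged choice for small eps, and the estimate is
# noisy for small universes.
import numpy as np

def pan_private_density(stream, universe_size, eps=1.0, rng=np.random):
    alpha = eps / 4.0                       # assumed bias for small eps
    bits = rng.random(universe_size) < 0.5  # uniform initial state
    for u in stream:                        # u in {0, ..., universe_size-1}
        bits[u] = rng.random() < 0.5 + alpha  # re-randomize on arrival
    # E[mean(bits)] = 1/2 + alpha * density, so invert:
    return (bits.mean() - 0.5) / alpha

stream = [0, 3, 3, 7]                        # 3 distinct users out of 10
print(pan_private_density(stream, universe_size=10))
```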
A promising approach to obtaining improved approximation algorithms for Steiner tree is to use the bidirected cut relaxation (BCR). The integrality gap of this relaxation is at least $36/31$, and it has long been conjectured that its true value is very close to this lower bound. However, the best upper bound for general graphs is still $2$. With the aim of circumventing the asymmetric nature of BCR, Chakrabarty, Devanur and Vazirani [Math. Program., 130 (2011), pp. 1--32] introduced the simplex-embedding LP, which is equivalent to BCR. Using it, they gave a $\sqrt{2}$-approximation algorithm for quasi-bipartite graphs and showed that the integrality gap of the relaxation is at most $4/3$ for this class of graphs. In this paper, we extend their approach and show, via a fast combinatorial algorithm, that the integrality gap of BCR is at most $7/6$ on quasi-bipartite graphs. In doing so, we introduce a general and potentially widely applicable extension of the primal-dual schema. Roughly speaking, we apply the schema twice, with variable rates of growth for the duals in the second phase, where the rates depend on the degrees of the duals computed in the first phase. This technique overcomes the limitation of growing dual variables monotonically and produces a larger total dual value, presumably attaining the true integrality gap.
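For reference, here is one standard way to write BCR; the notation (root $r$, terminal set $R$, bidirected arc set $A$ with costs $c$) follows common usage and may differ in minor ways from the paper's.

```latex
% Bidirected cut relaxation (one standard statement): every cut
% separating a terminal from the root must be crossed by one unit
% of fractional arc weight.
\begin{align*}
\min\quad & \sum_{a \in A} c_a \, x_a \\
\text{s.t.}\quad & \sum_{a \in \delta^{-}(U)} x_a \ \ge\ 1
    && \forall\, U \subseteq V \setminus \{r\},\ U \cap R \neq \emptyset, \\
& x_a \ \ge\ 0 && \forall\, a \in A.
\end{align*}
```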
We solve university-level probability and statistics questions by program synthesis using OpenAI's Codex, a Transformer trained on text and fine-tuned on code. We transform course problems from MIT's 18.05 Introduction to Probability and Statistics and Harvard's STAT110 Probability into programming tasks. We then execute the generated code to obtain a solution. Since these course questions are grounded in probability, we often aim to have Codex generate probabilistic programs that simulate a large number of probabilistic dependencies to compute their solutions. Our approach requires prompt engineering to transform a question from its original form into an explicit, tractable form that results in a correct program and solution. To estimate the amount of work needed to translate an original question into its tractable form, we measure the similarity between the original and transformed questions. Our work introduces a new dataset of university-level probability and statistics problems and is the first to solve such problems in a scalable fashion using the program synthesis capabilities of large language models.
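For illustration, here is the kind of short probabilistic program we aim for Codex to generate, for a hypothetical course-style question ("what is the probability that the sum of two fair dice is at least 10?"); the actual questions and prompts in the dataset differ.

```python
# Monte Carlo simulation of a toy probability question.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
dice = rng.integers(1, 7, size=(n, 2))   # two fair six-sided dice
print((dice.sum(axis=1) >= 10).mean())   # approximately 6/36 = 0.1667
```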
Personalized recommender systems are playing an increasingly important role as more content and services become available and users struggle to identify what might interest them. Although matrix factorization and deep-learning-based methods have proved effective in user preference modeling, they violate the triangle inequality and fail to capture fine-grained preference information. To tackle this, we develop a distance-based recommendation model with several novel aspects: (i) each user and item is parameterized by a Gaussian distribution to capture the learning uncertainties; (ii) an adaptive margin generation scheme is proposed to generate margins for different training triplets; (iii) explicit user-user/item-item similarity modeling is incorporated into the objective function. The Wasserstein distance is employed to determine preferences because it obeys the triangle inequality and can measure the distance between probability distributions. In comparisons with state-of-the-art methods on five real-world datasets, the proposed model outperforms the best existing models by 4-22% in recall@K on Top-K recommendation.
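To illustrate the distance at the heart of the model: for Gaussians with diagonal covariances the 2-Wasserstein distance has a simple closed form, $W_2^2 = \lVert \mu_1 - \mu_2 \rVert^2 + \lVert \sigma_1 - \sigma_2 \rVert^2$, which satisfies the triangle inequality. The embedding dimension and scoring convention below are illustrative assumptions.

```python
# Closed-form 2-Wasserstein distance between diagonal Gaussians.
import numpy as np

def w2_diag_gauss(mu1, sigma1, mu2, sigma2):
    """W2 distance between N(mu1, diag(sigma1^2)) and
    N(mu2, diag(sigma2^2))."""
    return np.sqrt(np.sum((mu1 - mu2) ** 2) + np.sum((sigma1 - sigma2) ** 2))

# Smaller distance = stronger predicted preference for the item.
user_mu, user_sig = np.zeros(8), np.ones(8)
item_mu, item_sig = 0.1 * np.ones(8), 0.9 * np.ones(8)
print(w2_diag_gauss(user_mu, user_sig, item_mu, item_sig))
```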
This paper proposes a model-free Reinforcement Learning (RL) algorithm to synthesise policies for an unknown Markov Decision Process (MDP), such that a linear time property is satisfied. We convert the given property into a Limit Deterministic Büchi Automaton (LDBA) and then construct a product MDP that synchronizes the automaton with the original MDP. Based on the resulting LDBA, a reward function is then defined over the state-action pairs of the product MDP. With this reward function, our algorithm synthesises a policy whose traces satisfy the linear time property: as such, the policy synthesis procedure is "constrained" by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property, at any given state of the MDP; a convergence proof for the procedure is provided. Finally, the performance of the algorithm is evaluated via a set of numerical examples. We observe an improvement of one order of magnitude in the number of iterations required for the synthesis compared to existing approaches.
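A minimal sketch of the product construction, with a toy 2-state MDP and a 2-state automaton that accepts once the label "goal" has been seen. All transition tables here are illustrative assumptions, and the paper's reward scheme over the LDBA's accepting condition is more refined than this 0/1 reward.

```python
def product_step(s, q, a, mdp_step, ldba_delta, accepting, label):
    """One step of the synchronized (product) MDP: move in the MDP,
    feed the new state's label to the automaton, reward acceptance."""
    s2 = mdp_step(s, a)
    q2 = ldba_delta(q, label(s2))
    return (s2, q2), (1.0 if q2 in accepting else 0.0)

mdp_step = lambda s, a: a                       # action picks next state
label = lambda s: "goal" if s == 1 else "none"
ldba_delta = lambda q, l: 1 if (q == 1 or l == "goal") else 0
accepting = {1}

(s, q), total = (0, 0), 0.0
for a in [0, 1, 0]:
    (s, q), r = product_step(s, q, a, mdp_step, ldba_delta, accepting, label)
    total += r
print((s, q), total)  # automaton stays accepting once "goal" is seen
```

Standard model-free RL (e.g., Q-learning) on the pair (s, q) then maximizes the accumulated reward, pushing traces toward satisfying the property.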
Neural waveform models such as WaveNet are used in many recent text-to-speech systems, but the original WaveNet is quite slow in waveform generation because of its autoregressive (AR) structure. Although faster non-AR models have recently been reported, they may be prohibitively complicated due to the use of a distillation-based training method and the blend of other disparate training criteria. This study proposes a non-AR neural source-filter waveform model that can be trained directly using spectrum-based training criteria and the stochastic gradient descent method. Given the input acoustic features, the proposed model first uses a source module to generate a sine-based excitation signal and then uses a filter module to transform the excitation signal into the output speech waveform. Our experiments demonstrated that the proposed model generates waveforms at least 100 times faster than the AR WaveNet and that the quality of its synthetic speech is close to that of speech generated by the AR WaveNet. Ablation tests showed that both the sine-wave excitation signal and the spectrum-based training criteria were essential to the performance of the proposed model.
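A sketch of the source module's core idea, assuming illustrative values for the sample rate, hop size, and noise scale: integrate the upsampled F0 to obtain phase, and emit a sine for voiced samples and (here, pure) noise for unvoiced ones. The real model also adds harmonics and feeds the excitation through trainable filter blocks.

```python
# Sine-based excitation from frame-level F0 (voiced) plus noise.
import numpy as np

def sine_excitation(f0_frames, sr=16000, hop=80, noise_std=0.003,
                    rng=np.random.default_rng(0)):
    f0 = np.repeat(f0_frames, hop)          # upsample F0 to sample rate
    phase = 2 * np.pi * np.cumsum(f0 / sr)  # integrate instantaneous freq.
    e = np.sin(phase)
    e[f0 == 0] = 0.0                        # unvoiced frames: no sine
    return e + noise_std * rng.standard_normal(e.size)

f0 = np.array([0.0, 220.0, 220.0, 0.0])     # 4 frames, hedged values
print(sine_excitation(f0).shape)            # (320,)
```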
We present an implementation of a probabilistic first-order logic called TensorLog, in which classes of logical queries are compiled into differentiable functions in a neural-network infrastructure such as TensorFlow or Theano. This leads to a close integration of probabilistic logical reasoning with deep-learning infrastructure: in particular, it enables high-performance deep learning frameworks to be used for tuning the parameters of a probabilistic logic. Experimental results show that TensorLog scales to problems involving hundreds of thousands of knowledge-base triples and tens of thousands of examples.
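The core compilation idea can be sketched in a few lines: a knowledge-base relation becomes a matrix over entities, and a chain rule such as grandparent(X,Y) :- parent(X,Z), parent(Z,Y) compiles to differentiable matrix-vector products. The toy KB below is an illustrative assumption, and plain NumPy stands in for TensorFlow/Theano (where the same products would be sparse and differentiable end-to-end).

```python
# Query answering as chained relation-matrix products.
import numpy as np

entities = ["ann", "bob", "cal"]
idx = {e: i for i, e in enumerate(entities)}

parent = np.zeros((3, 3))
parent[idx["ann"], idx["bob"]] = 1.0   # parent(ann, bob)
parent[idx["bob"], idx["cal"]] = 1.0   # parent(bob, cal)

# Rule: grandparent(X, Y) :- parent(X, Z), parent(Z, Y).
x = np.zeros(3); x[idx["ann"]] = 1.0   # one-hot query entity
answer = x @ parent @ parent           # two relation "hops"
print(entities[int(answer.argmax())])  # -> "cal"
```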