Probabilistic Logic Programming is an effective formalism for encoding problems characterized by uncertainty. Some of these problems may require optimizing probability values subject to constraints among the probability distributions of random variables. Here, we introduce a new class of probabilistic logic programs, namely Probabilistic Optimizable Logic Programs, and we provide an effective algorithm for finding the best assignment to the probabilities of random variables such that a set of constraints is satisfied and an objective function is optimized. This paper is under consideration for acceptance in Theory and Practice of Logic Programming.
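To make the problem shape concrete, here is a minimal sketch of the constrained-optimization view, not the paper's algorithm or syntax: a toy program with two optimizable probabilistic facts `p1::a.` and `p2::b.` and clauses `q :- a.` `q :- b.`, so that `P(q) = p1 + p2 - p1*p2`, together with an illustrative constraint on the query probability.

```python
# A hypothetical sketch of optimizing fact probabilities under a
# constraint on a query probability; the program, constraint, and
# objective are all illustrative.
from scipy.optimize import minimize

def prob_q(p):
    p1, p2 = p
    return p1 + p2 - p1 * p2  # P(a or b) with independent facts

res = minimize(
    lambda p: -prob_q(p),                 # objective: maximize P(q)
    x0=[0.1, 0.1],
    bounds=[(0.0, 1.0), (0.0, 1.0)],      # probabilities stay in [0, 1]
    constraints=[{"type": "ineq", "fun": lambda p: 0.8 - prob_q(p)}],
)
print(res.x, prob_q(res.x))               # optimal probabilities and P(q)
```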
Sample-based trajectory optimisers are a promising tool for controlling robotic systems with non-differentiable dynamics and cost functions. Contemporary approaches derive from a restricted subclass of stochastic optimal control in which the optimal policy can be expressed in terms of an expectation over stochastic paths. By estimating this expectation with Monte Carlo sampling and reinterpreting the stochastic process as exploration noise, one obtains a stochastic search algorithm tailored to (deterministic) trajectory optimisation. For future algorithmic development, it is essential to understand the theoretical foundations that allow such methods to be derived in a principled way. In this paper, we make a connection between entropy regularisation in optimisation and deterministic optimal control. We then show that the optimal policy is given by a belief function rather than a deterministic function. The policy belief is governed by a Bayesian-type update in which the likelihood can be expressed in terms of a conditional expectation over paths induced by a prior policy. Our theoretical investigation firmly roots sample-based trajectory optimisation in the larger family of control-as-inference methods. It allows us to justify a number of heuristics that are common in the literature and to motivate a number of new improvements that benefit convergence.
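A common concrete instantiation of the Monte Carlo scheme described above is an exponentiated-cost reweighting of sampled control perturbations, as in MPPI-style methods. The following is a minimal sketch in our own notation, not the paper's algorithm; all hyperparameters are illustrative.

```python
import numpy as np

def stochastic_search_step(u, cost_fn, sigma=0.1, n_samples=64, temp=1.0):
    """One update of a control plan u (shape T x m) by reweighting
    sampled perturbations with exponentiated costs (a sketch)."""
    eps = sigma * np.random.randn(n_samples, *u.shape)   # exploration noise
    costs = np.array([cost_fn(u + e) for e in eps])      # rollout costs
    w = np.exp(-(costs - costs.min()) / temp)            # soft-min weights
    w /= w.sum()
    return u + np.tensordot(w, eps, axes=1)              # weighted update
```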
It is well known that reinforcement learning can be cast as inference in an appropriate probabilistic model. However, this commonly involves introducing a distribution over agent trajectories with probabilities proportional to exponentiated rewards. In this work, we formulate reinforcement learning as Bayesian inference without resorting to rewards, and show that rewards are derived from an agent's preferences, rather than the other way around. We argue that agent preferences should be specified stochastically rather than deterministically. Reinforcement learning via inference with stochastic preferences naturally describes agent behaviors, does not require introducing rewards or the exponential weighting of trajectories, and allows one to reason about agents on the solid foundation of Bayesian statistics. Stochastic conditioning, a probabilistic programming paradigm for conditioning models on distributions rather than values, is the formalism behind agents with probabilistic preferences. We demonstrate our approach on case studies using both a two-agent coordination game and a single agent acting in a noisy environment, showing that, despite superficial differences, both cases can be modeled and reasoned about based on the same principles.
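For reference, the conventional exponentiated-reward construction that this abstract argues against introduces binary optimality variables $\mathcal{O}_t$ with likelihood $p(\mathcal{O}_t = 1 \mid s_t, a_t) \propto \exp(r(s_t, a_t))$, so that the posterior over trajectories $\tau$ becomes (standard control-as-inference notation, not taken from this paper)
\[
p(\tau \mid \mathcal{O}_{1:T} = 1) \;\propto\; p(\tau)\, \exp\Big( \sum_{t=1}^{T} r(s_t, a_t) \Big).
\]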
We introduce Scruff, a new framework for developing AI systems using probabilistic programming. Scruff enables a variety of representations to be included, such as code with stochastic choices, neural networks, differential equations, and constraint systems. These representations are defined implicitly using a set of standardized operations that can be performed on them. General-purpose algorithms are then implemented using these operations, enabling generalization across different representations. Zero, one, or more operation implementations can be provided for any given representation, giving algorithms the flexibility to use the most appropriate available implementations for their purposes and enabling representations to be used in ways that suit their capabilities. In this paper, we explain the general approach of implicitly defined representations and provide a variety of examples of representations at varying degrees of abstraction. We also show how a relatively small set of operations can serve to unify a variety of AI algorithms. Finally, we discuss how algorithms can use policies to choose which operation implementations to use during execution.
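As a rough illustration of the implicitly-defined-representations idea (hypothetical Python names; Scruff's actual API may differ), a representation advertises which standardized operations it implements, and a general-purpose algorithm dispatches on what is available.

```python
import math, random

class GaussianRep:
    """A representation defined implicitly by the operations it supports."""
    def __init__(self, mu, sigma):
        self.mu, self.sigma = mu, sigma
    def sample(self, rng):
        return rng.gauss(self.mu, self.sigma)
    def logpdf(self, x):
        z = (x - self.mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma * math.sqrt(2 * math.pi))

def supports(rep, op):
    # zero, one, or more implementations may exist; here we just check one
    return callable(getattr(rep, op, None))

def log_weight(rep, x):
    # An algorithm written against operations, not concrete representations:
    # it works for any rep providing `logpdf`, and a policy could choose
    # among several available implementations.
    if not supports(rep, "logpdf"):
        raise NotImplementedError("representation lacks a usable operation")
    return rep.logpdf(x)

rep = GaussianRep(0.0, 1.0)
print(log_weight(rep, rep.sample(random.Random(0))))
```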
This paper presents a model-free reinforcement learning (RL) algorithm to synthesize a control policy that maximizes the satisfaction probability of linear temporal logic (LTL) specifications. To account for environment and motion uncertainties, we model the robot motion as a probabilistic labeled Markov decision process with unknown transition probabilities and unknown probabilistic label functions. The LTL task specification is converted to a limit deterministic generalized B\"uchi automaton (LDGBA) with several accepting sets to maintain dense rewards during learning. The novelty in applying the LDGBA lies in constructing an embedded LDGBA (E-LDGBA) by designing a synchronous tracking-frontier function, which records non-visited accepting sets without increasing the dimensionality or computational complexity. With appropriately designed dependent reward and discount functions, rigorous analysis shows that any method optimizing the expected discounted return of the RL-based approach is guaranteed to find the optimal policy that maximizes the satisfaction probability of the LTL specifications. A model-free RL-based motion planning strategy is developed to generate such an optimal policy. The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
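An illustrative sketch of a tracking-frontier update and the dense reward it enables is given below; the names and the reset rule are guesses for exposition, not the paper's exact construction.

```python
def update_frontier(frontier, q, accepting_sets):
    """Drop every accepting set visited by the current automaton state q;
    reset the frontier once all accepting sets have been visited."""
    remaining = {i for i in frontier if q not in accepting_sets[i]}
    return remaining if remaining else set(range(len(accepting_sets)))

def reward(frontier, q, accepting_sets, r_acc=1.0):
    # dense reward: pay off whenever q visits a not-yet-visited accepting set
    return r_acc if any(q in accepting_sets[i] for i in frontier) else 0.0
```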
We present a decomposition approach for obtaining good feasible solutions to the security-constrained alternating-current optimal power flow (SCACOPF) problem at an industrial scale and under real-world time and computational limits. The approach aims to complement the existing body of literature on bounding the problem via convex relaxations. It was designed for participation in ARPA-E's Grid Optimization (GO) Competition Challenge 1. The challenge focused on a near-real-time version of the SCACOPF problem in which a base-case operating point is optimized while taking into account possible single-element contingencies, after which the system adapts its operating point following the response of automatic frequency droop controllers and voltage regulators. Our solution approach relies on state-of-the-art nonlinear programming algorithms and employs nonconvex relaxations of complementarity constraints, a specialized two-stage decomposition technique with sparse approximations of the recourse terms, and contingency ranking and pre-screening. The paper also outlines the salient features of our implementation, such as fast evaluation of model functions and derivatives, warm-starting strategies, and asynchronous parallelism. We discuss the results of the independent benchmarking of our approach performed by ARPA-E's GO team in Challenge 1, which found that our methodology consistently produces high-quality solutions across a wide range of network sizes and difficulty levels. Finally, we conclude by outlining potential extensions and improvements of our methodology.
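Schematically, the two-stage structure can be written as follows, where $x$ is the base-case operating point and each contingency $k$ contributes a recourse term $Q_k$ that is approximated sparsely during the solve (notation ours, for illustration only):
\[
\min_{x \in \mathcal{X}} \; f(x) + \sum_{k \in \mathcal{K}} Q_k(x), \qquad Q_k(x) = \min_{y_k \in \mathcal{Y}_k(x)} g_k(x, y_k).
\]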
We generalize intuitionistic tense logics to the multi-modal case by placing grammar logics on an intuitionistic footing. We provide axiomatizations for a class of base intuitionistic grammar logics, as well as for extensions with combinations of seriality axioms and what we call "intuitionistic path axioms". We show that each axiomatization is sound and complete, with completeness shown via a standard canonical model construction.
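For orientation, classical grammar logics associate each production rule $a \to b_1 \cdots b_n$ of the underlying grammar with an inclusion axiom of the typical form (the classical schema, not an axiom taken from this paper)
\[
[a]\varphi \rightarrow [b_1]\cdots[b_n]\varphi,
\]
which semantically requires the composed accessibility relation $R_{b_1}; \dots ; R_{b_n}$ to be contained in $R_a$.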
Probabilistic programming languages aid developers performing Bayesian inference by providing programming constructs and tools for probabilistic modeling and automated inference. Prior work introduced a probabilistic programming language, ProbZelus, that extends probabilistic programming to unbounded streams of data, and demonstrated that the delayed sampling inference algorithm can be extended to work in a streaming context. ProbZelus also showed that while delayed sampling can be effectively deployed on some programs, depending on the probabilistic model under consideration it is not guaranteed to use a bounded amount of memory over the course of a program's execution. In this paper, we present conditions on a probabilistic program's execution under which delayed sampling executes in bounded memory. The two conditions are dataflow properties of the core operations of delayed sampling: the $m$-consumed property and the unseparated paths property. A program executes in bounded memory under delayed sampling if, and only if, it satisfies the $m$-consumed and unseparated paths properties. We propose a static analysis that abstracts over these properties to soundly ensure that any program passing the analysis satisfies them, and thus executes in bounded memory under delayed sampling.
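To make the two behaviours concrete, here is a schematic sketch with hypothetical `sample`/`observe` primitives (stand-ins, not ProbZelus's actual API). In the first variant every latent is consumed by an observation at each step, so the symbolic dependency chain that delayed sampling maintains can be pruned; in the second, only the initial latent is ever observed, leaving an unseparated path that grows without bound.

```python
def gaussian(mean, std):          # placeholder distribution object
    return {"mean": mean, "std": std}

def sample(dist):                 # stand-in: a real PPL registers a latent node
    return dist["mean"]

def observe(dist, value):         # stand-in: a real PPL conditions here
    pass

def bounded_step(x_prev, y):
    x = sample(gaussian(x_prev, 1.0))
    observe(gaussian(x, 0.5), y)  # consumes x each step: memory stays bounded
    return x

def unbounded_step(x0, x_prev, y):
    x = sample(gaussian(x_prev, 1.0))
    observe(gaussian(x0, 0.5), y) # only x0 is observed; the path x0 -> ... -> x
    return x                      # is never separated and keeps growing
```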
Computational design problems arise in a number of settings, from synthetic biology to computer architectures. In this paper, we aim to solve data-driven model-based optimization (MBO) problems, where the goal is to find a design input that maximizes an unknown objective function given access only to a static dataset of prior experiments. Such data-driven optimization procedures are the only practical methods in many real-world domains where active data collection is expensive (e.g., when optimizing over proteins) or dangerous (e.g., when optimizing over aircraft designs). Typical methods for MBO that optimize the design against a learned model suffer from distributional shift: it is easy to find a design that "fools" the model into predicting a high value. To overcome this, we propose conservative objective models (COMs), a method that learns a model of the objective function which lower-bounds the actual value of the ground-truth objective on out-of-distribution inputs, and uses this model for optimization. Structurally, COMs resemble adversarial training methods used to overcome adversarial examples. COMs are simple to implement and outperform a number of existing methods on a wide range of MBO problems, including optimizing protein sequences, robot morphologies, neural network weights, and superconducting materials.
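A minimal sketch of the conservative training idea in PyTorch is shown below, assuming a differentiable `model` mapping designs to scores; the inner-loop hyperparameters and the exact penalty are illustrative, not the authors' published recipe.

```python
import torch

def com_loss(model, x_data, y_data, alpha=1.0, ascent_steps=5, lr=0.05):
    """One conservative-objective-model loss evaluation (a sketch)."""
    # inner adversarial step: find designs the current model scores highly
    x_adv = x_data.clone().detach().requires_grad_(True)
    for _ in range(ascent_steps):
        score = model(x_adv).sum()
        (grad,) = torch.autograd.grad(score, x_adv)
        x_adv = (x_adv + lr * grad).detach().requires_grad_(True)
    # fit the data while pushing down predictions on adversarial designs,
    # so the learned model lower-bounds the objective off-distribution
    mse = ((model(x_data) - y_data) ** 2).mean()
    conservatism = model(x_adv).mean() - model(x_data).mean()
    return mse + alpha * conservatism
```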
Knowledge graph reasoning, which aims to predict missing facts by reasoning over the observed facts, is critical to many applications. The problem has been widely explored both by traditional logic rule-based approaches and by recent knowledge graph embedding methods. A principled logic rule-based approach is the Markov Logic Network (MLN), which is able to leverage domain knowledge expressed in first-order logic while handling its uncertainty. However, inference in MLNs is usually very difficult due to the complicated graph structures. By contrast, knowledge graph embedding methods (e.g., TransE, DistMult) learn effective entity and relation embeddings for reasoning, which is much more efficient, but they are unable to leverage domain knowledge. In this paper, we propose the probabilistic Logic Neural Network (pLogicNet), which combines the advantages of both methods. A pLogicNet defines the joint distribution of all possible triplets using a Markov logic network with first-order logic, which can be efficiently optimized with the variational EM algorithm. In the E-step, a knowledge graph embedding model is used to infer the missing triplets, while in the M-step, the weights of the logic rules are updated based on both the observed and predicted triplets. Experiments on multiple knowledge graphs demonstrate the effectiveness of pLogicNet over many competitive baselines.
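For context, the standard Markov-logic joint that pLogicNet builds on assigns, to each assignment $G$ of truth values to triplets, the probability
\[
p(G) = \frac{1}{Z} \exp\Big( \sum_{l} w_l \, n_l(G) \Big),
\]
where $n_l(G)$ is the number of true groundings of rule $l$ and $w_l$ its weight (standard MLN notation, not copied from the paper).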
This paper proposes a Reinforcement Learning (RL) algorithm to synthesize policies for a Markov Decision Process (MDP) such that a linear-time property is satisfied. We convert the property into a Limit Deterministic B\"uchi Automaton (LDBA) and then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product MDP according to the accepting conditions of the LDBA. With this reward function, our algorithm synthesizes a policy that satisfies the linear-time property: as such, the policy synthesis procedure is "constrained" by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property at any given state of the MDP; a convergence proof for the procedure is provided. Finally, the performance of the algorithm is evaluated via a set of numerical examples, where we observe an improvement of one order of magnitude in the number of iterations required for synthesis compared to existing approaches.
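An illustrative sketch of the product construction and its reward (a simplification of the paper's construction; names and constants are placeholders):

```python
# Accepting product states pay a positive reward, all others pay zero,
# so maximizing return steers the policy toward satisfying runs.
def product_reward(s, q, accepting_states, r_acc=1.0):
    return r_acc if q in accepting_states else 0.0

def product_step(mdp_step, automaton_delta, s, q, a, label):
    s_next = mdp_step(s, a)                      # original MDP transition
    q_next = automaton_delta(q, label(s_next))   # LDBA transition on labels
    return s_next, q_next
```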