In mathematical finance, Levy processes are widely used for their ability to model both continuous variation and abrupt, discontinuous jumps. These jumps are practically relevant, so reliable inference on the feature that controls jump frequencies and magnitudes, namely, the Levy density, is of critical importance. A specific obstacle to carrying out model-based (e.g., Bayesian) inference in such problems is that, for general Levy processes, the likelihood is intractable. To overcome this obstacle, here we adopt a Gibbs posterior framework that updates a prior distribution using a suitable loss function instead of a likelihood. We establish asymptotic posterior concentration rates for the proposed Gibbs posterior. In particular, in the most interesting and practically relevant case, we give conditions under which the Gibbs posterior concentrates at (nearly) the minimax optimal rate, adaptive to the unknown smoothness of the true Levy density.
In this paper, we establish the global optimality and convergence rate of an off-policy actor critic algorithm in the tabular setting without using density ratio to correct the discrepancy between the state distribution of the behavior policy and that of the target policy. Our work goes beyond existing works on the optimality of policy gradient methods in that existing works use the exact policy gradient for updating the policy parameters while we use an approximate and stochastic update step. Our update step is not a gradient update because we do not use a density ratio to correct the state distribution, which aligns well with what practitioners do. Our update is approximate because we use a learned critic instead of the true value function. Our update is stochastic because at each step the update is done for only the current state action pair. Moreover, we remove several restrictive assumptions from existing works in our analysis. Central to our work is the finite sample analysis of a generic stochastic approximation algorithm with time-inhomogeneous update operators on time-inhomogeneous Markov chains, based on its uniform contraction properties.
The recent introduction of thermodynamic integration techniques has provided a new framework for understanding and improving variational inference (VI). In this work, we present a careful analysis of the thermodynamic variational objective (TVO), bridging the gap between existing variational objectives and shedding new insights to advance the field. In particular, we elucidate how the TVO naturally connects the three key variational schemes, namely the importance-weighted VI, Renyi-VI, and MCMC-VI, which subsumes most VI objectives employed in practice. To explain the performance gap between theory and practice, we reveal how the pathological geometry of thermodynamic curves negatively affects TVO. By generalizing the integration path from the geometric mean to the weighted Holder mean, we extend the theory of TVO and identify new opportunities for improving VI. This motivates our new VI objectives, named the Holder bounds, which flatten the thermodynamic curves and promise to achieve a one-step approximation of the exact marginal log-likelihood. A comprehensive discussion on the choices of numerical estimators is provided. We present strong empirical evidence on both synthetic and real-world datasets to support our claims.
Inference is the task of drawing conclusions about unobserved variables given observations of related variables. Applications range from identifying diseases from symptoms to classifying economic regimes from price movements. Unfortunately, performing exact inference is intractable in general. One alternative is variational inference, where a candidate probability distribution is optimized to approximate the posterior distribution over unobserved variables. For good approximations, a flexible and highly expressive candidate distribution is desirable. In this work, we use quantum Born machines as variational distributions over discrete variables. We apply the framework of operator variational inference to achieve this goal. In particular, we adopt two specific realizations: one with an adversarial objective and one based on the kernelized Stein discrepancy. We demonstrate the approach numerically using examples of Bayesian networks, and implement an experiment on an IBM quantum computer. Our techniques enable efficient variational inference with distributions beyond those that are efficiently representable on a classical computer.
This work considers the economic dispatch problem for a single micro-gas turbine, governed by a discrete state-space model, under combined heat and power (CHP) operation and coupled with a utility. If the exact power and heat demands are given, existing algorithms can be used to give a quick optimal solution to the economic dispatch problem. However, in practice, the power and heat demands can not be known deterministically, but are rather predicted, resulting in an estimate and a bound on the estimation error. We consider the case in which the power and heat demands are unknown, and present a robust optimization-based approach for scheduling the turbine's heat and power generation, in which the demand is assumed to be inside an uncertainty set. We consider two different choices of the uncertainty set relying on the $\ell^\infty$- and the $\ell^1$-norms, each with different advantages, and consider the associated robust economic dispatch problems. We recast these as robust shortest-path problems on appropriately defined graphs. For the first choice, we provide an exact linear-time algorithm for the solution of the robust shortest-path problem, and for the second, we provide an exact quadratic-time algorithm and an approximate linear-time algorithm. The efficiency and usefulness of the algorithms are demonstrated using a detailed case study that employs real data on energy demand profiles and electricity tariffs.
We propose a novel method for sampling and optimization tasks based on a stochastic interacting particle system. We explain how this method can be used for the following two goals: (i) generating approximate samples from a given target distribution; (ii) optimizing a given objective function. The approach is derivative-free and affine invariant, and is therefore well-suited for solving inverse problems defined by complex forward models: (i) allows generation of samples from the Bayesian posterior and (ii) allows determination of the maximum a posteriori estimator. We investigate the properties of the proposed family of methods in terms of various parameter choices, both analytically and by means of numerical simulations. The analysis and numerical simulation establish that the method has potential for general purpose optimization tasks over Euclidean space; contraction properties of the algorithm are established under suitable conditions, and computational experiments demonstrate wide basins of attraction for various specific problems. The analysis and experiments also demonstrate the potential for the sampling methodology in regimes in which the target distribution is unimodal and close to Gaussian; indeed we prove that the method recovers a Laplace approximation to the measure in certain parametric regimes and provide numerical evidence that this Laplace approximation attracts a large set of initial conditions in a number of examples.
In settings ranging from weather forecasts to political prognostications to financial projections, probability estimates of future binary outcomes often evolve over time. For example, the estimated likelihood of rain on a specific day changes by the hour as new information becomes available. Given a collection of such probability paths, we introduce a Bayesian framework -- which we call the Gaussian latent information martingale, or GLIM -- for modeling the structure of dynamic predictions over time. Suppose, for example, that the likelihood of rain in a week is 50 %, and consider two hypothetical scenarios. In the first, one expects the forecast to be equally likely to become either 25 % or 75 % tomorrow; in the second, one expects the forecast to stay constant for the next several days. A time-sensitive decision-maker might select a course of action immediately in the latter scenario, but may postpone their decision in the former, knowing that new information is imminent. We model these trajectories by assuming predictions update according to a latent process of information flow, which is inferred from historical data. In contrast to general methods for time series analysis, this approach preserves important properties of probability paths such as the martingale structure and appropriate amount of volatility and better quantifies future uncertainties around probability paths. We show that GLIM outperforms three popular baseline methods, producing better estimated posterior probability path distributions measured by three different metrics. By elucidating the dynamic structure of predictions over time, we hope to help individuals make more informed choices.
Cluster analysis requires many decisions: the clustering method and the implied reference model, the number of clusters and, often, several hyper-parameters and algorithms' tunings. In practice, one produces several partitions, and a final one is chosen based on validation or selection criteria. There exist an abundance of validation methods that, implicitly or explicitly, assume a certain clustering notion. Moreover, they are often restricted to operate on partitions obtained from a specific method. In this paper, we focus on groups that can be well separated by quadratic or linear boundaries. The reference cluster concept is defined through the quadratic discriminant score function and parameters describing clusters' size, center and scatter. We develop two cluster-quality criteria called quadratic scores. We show that these criteria are consistent with groups generated from a general class of elliptically-symmetric distributions. The quest for this type of groups is common in applications. The connection with likelihood theory for mixture models and model-based clustering is investigated. Based on bootstrap resampling of the quadratic scores, we propose a selection rule that allows choosing among many clustering solutions. The proposed method has the distinctive advantage that it can compare partitions that cannot be compared with other state-of-the-art methods. Extensive numerical experiments and the analysis of real data show that, even if some competing methods turn out to be superior in some setups, the proposed methodology achieves a better overall performance.
In this paper, we apply a Bayesian perspective to the sampling of alternatives for multinomial logit (MNL) and mixed multinomial logit (MMNL) models. A sampling of alternatives reduces the computational challenge of evaluating the denominator of the logit choice probability for large choice sets by only using a smaller subset of sampled alternatives including the chosen alternative. To correct for the resulting overestimation of the choice probability, a correction factor has to be applied. McFadden (1978) proposes a correction factor to the utility of each alternative which is based on the probability of sampling the smaller subset of alternatives and that alternative being chosen. McFadden's correction factor ensures consistency of parameter estimates under a wide range of sampling protocols. A special sampling protocol discussed by McFadden is uniform conditioning, which assigns the same sampling probability and therefore the same correction factor to each alternative in the sampled choice set. Since a constant is added to each alternative the correction factor cancels out, but consistent estimates are still obtained. Bayesian estimation is focused on describing the full posterior distributions of the parameters of interest instead of the consistency of their point estimates. We theoretically show that uniform conditioning is sufficient to minimise the loss of information from a sampling of alternatives on the parameters of interest over the full posterior distribution in Bayesian MNL models. Minimum loss of information is, however, not guaranteed for other sampling protocols. This result extends to Bayesian MMNL models estimated using the principle of data augmentation. The application of uniform conditioning, a more restrictive sampling protocol, is thus sufficient in a Bayesian estimation context to achieve finite sample properties of MNL and MMNL parameter estimates.
We introduce a flexible and scalable class of Bayesian geostatistical models for discrete data, based on the class of nearest neighbor mixture transition distribution processes (NNMP), referred to as discrete NNMP. The proposed class characterizes spatial variability by a weighted combination of first-order conditional probability mass functions (pmfs) for each one of a given number of neighbors. The approach supports flexible modeling for multivariate dependence through specification of general bivariate discrete distributions that define the conditional pmfs. Moreover, the discrete NNMP allows for construction of models given a pre-specified family of marginal distributions that can vary in space, facilitating covariate inclusion. In particular, we develop a modeling and inferential framework for copula-based NNMPs that can attain flexible dependence structures, motivating the use of bivariate copula families for spatial processes. Compared to the traditional class of spatial generalized linear mixed models, where spatial dependence is introduced through a transformation of response means, our process-based modeling approach provides both computational and inferential advantages. We illustrate the benefits with synthetic data examples and an analysis of North American Breeding Bird Survey data.
We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Our approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler. We show empirically that this approach outperforms generic samplers in a number of difficult settings including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. We also demonstrate the use of our improved sampler for training deep energy-based models on high dimensional discrete data. This approach outperforms variational auto-encoders and existing energy-based models. Finally, we give bounds showing that our approach is near-optimal in the class of samplers which propose local updates.