Complex scientific models where the likelihood cannot be evaluated present a challenge for statistical inference. Over the past two decades, a wide range of algorithms have been proposed for learning parameters in computationally feasible ways, often under the heading of approximate Bayesian computation or likelihood-free inference. There is, however, no consensus on how to rigorously evaluate the performance of these algorithms. Here, we argue for scoring algorithms by the mean squared error in estimating expectations of functions with respect to the posterior. We show that this score implies common alternatives, including the acceptance rate and effective sample size, as limiting special cases. We then derive asymptotically optimal distributions for choosing or sampling discrete or continuous simulation parameters, respectively. Our recommendations differ significantly from guidelines based on alternative scores outside of their region of validity. As an application, we show that sequential Monte Carlo in this context can be made more accurate with no new samples by accepting particles from all rounds.
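To make the three scores concrete, the following minimal sketch computes the acceptance rate, the effective sample size, and a Monte Carlo estimate of the mean squared error of a posterior-mean estimate for a toy rejection sampler. Everything in it is an illustrative assumption rather than the setup of the work above: the prior $N(0, 1)$, the simulator $x \mid \theta \sim N(\theta, 1)$, the observed value `X_OBS`, the tolerance `eps`, and the function names. For this conjugate toy model the exact posterior mean is 0.5, which serves as the reference value.

```python
import numpy as np

X_OBS = 1.0  # hypothetical observed datum (assumption for this toy example)

def abc_rejection(n_sims, eps, rng):
    """Draw from the prior, simulate data, and return parameters with 0/1 acceptance weights."""
    theta = rng.normal(0.0, 1.0, size=n_sims)   # assumed prior: N(0, 1)
    x_sim = rng.normal(theta, 1.0)              # assumed simulator: x | theta ~ N(theta, 1)
    weights = (np.abs(x_sim - X_OBS) < eps).astype(float)
    return theta, weights

def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

def weighted_mean(values, weights):
    """Self-normalized weighted estimate of an expectation."""
    w = np.asarray(weights, dtype=float)
    return np.sum(w * np.asarray(values, dtype=float)) / w.sum()

def mse_of_posterior_mean(n_sims, eps, n_reps, rng, true_mean=0.5):
    """Monte Carlo estimate of the MSE of the posterior-mean estimate over repeated runs."""
    estimates = []
    for _ in range(n_reps):
        theta, w = abc_rejection(n_sims, eps, rng)
        if w.sum() > 0:  # skip the (unlikely) runs with no accepted draws
            estimates.append(weighted_mean(theta, w))
    return float(np.mean((np.array(estimates) - true_mean) ** 2))

rng = np.random.default_rng(0)
theta, w = abc_rejection(5000, 0.2, rng)
acceptance_rate = w.mean()
ess = effective_sample_size(w)
mse = mse_of_posterior_mean(2000, 0.2, 200, rng)
print(f"acceptance rate={acceptance_rate:.3f}, ESS={ess:.0f}, MSE={mse:.5f}")
```

With 0/1 weights the effective sample size reduces to the number of accepted draws, so in this simple case the acceptance rate and the effective sample size carry the same information; the mean squared error additionally reflects the bias introduced by the tolerance.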
We consider Ising models on the hypercube with a general interaction matrix $J$, and give a polynomial time sampling algorithm when all but $O(1)$ eigenvalues of $J$ lie in an interval of length one, a situation which occurs in many models of interest. This was previously known for the Glauber dynamics when *all* eigenvalues fit in an interval of length one; however, a single outlier can force the Glauber dynamics to mix torpidly. Our general result implies the first polynomial time sampling algorithms for low-rank Ising models such as Hopfield networks with a fixed number of patterns and Bayesian clustering models with low-dimensional contexts, and greatly improves the polynomial time sampling regime for the antiferromagnetic/ferromagnetic Ising model with inconsistent field on expander graphs. It also improves on previous approximation algorithm results based on the naive mean-field approximation in variational methods and statistical physics. Our approach is based on a new fusion of ideas from the MCMC and variational inference worlds. As part of our algorithm, we define a new nonconvex variational problem which allows us to sample from an exponential reweighting of a distribution by a negative definite quadratic form, and show how to make this procedure provably efficient using stochastic gradient descent. On top of this, we construct a new simulated tempering chain (on an extended state space arising from the Hubbard-Stratonovich transform) which overcomes the obstacle posed by large positive eigenvalues, and combine it with the SGD-based sampler to solve the full problem.
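For orientation, the Hubbard-Stratonovich transform mentioned above can be stated in its standard Gaussian-integral form. This is the textbook identity, written under the assumptions that $J$ is positive definite and that the Ising measure is $p(x) \propto \exp(\tfrac12 x^\top J x + h^\top x)$ on $x \in \{\pm 1\}^n$; the symbols $h$ and $y$ are introduced here for illustration, and the way the algorithm above applies the transform to only part of the spectrum may differ.

$$
\exp\!\left(\tfrac12 x^\top J x\right) = \frac{1}{\sqrt{(2\pi)^n \det J}} \int_{\mathbb{R}^n} \exp\!\left(-\tfrac12 y^\top J^{-1} y + x^\top y\right) dy
$$

Conditional on the auxiliary variable $y$, the spins decouple into a product measure with external field $h + y$, which is what makes an extended state space over $(x, y)$ attractive for sampling.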
Bretz et al. (2005) proposed the Multiple Comparison Procedure and Modeling (MCPMod) method for designing and analyzing dose-finding studies. Pinheiro (2014) then generalized it to various types of endpoints, including but not limited to binary endpoints, survival endpoints, count data, and longitudinal data. Pinheiro (2013) recommended using the covariance matrix estimated from the observed data to recalculate the optimal contrast and the critical value of the test. Many phase II studies combine small sample sizes per arm with low placebo response rates. Under such circumstances, observing a zero count cannot be ruled out. For example, when the placebo response rate is 10%, there is about a 4% chance of observing zero responders in the placebo group, or in any other dose group whose response rate is similar to that of placebo. In this manuscript, we illustrate the potential problem with the approach of Pinheiro (2013) using a case study and simulations. We evaluate an alternative method based on Firth's logistic regression to obtain a stable estimate of the response for each dose group. In addition, we evaluate two options for addressing the issue of problematic contrast coefficients.
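The zero-count probability quoted above is a simple binomial calculation: with independent patients, the chance of no responders is $(1 - p)^n$. The text does not state the sample size per arm, so the sketch below assumes a hypothetical $n = 30$, a value for which the probability comes out close to the 4% quoted; the function name and constant are likewise illustrative.

```python
def prob_zero_responders(response_rate: float, n_per_arm: int) -> float:
    """P(no responders) among n independent patients, each responding with the given rate."""
    if not 0.0 <= response_rate <= 1.0:
        raise ValueError("response_rate must lie in [0, 1]")
    if n_per_arm < 0:
        raise ValueError("n_per_arm must be non-negative")
    return (1.0 - response_rate) ** n_per_arm

# Assumption: 30 patients per arm (not given in the text above).
ASSUMED_N_PER_ARM = 30
p0 = prob_zero_responders(0.10, ASSUMED_N_PER_ARM)
print(f"P(zero responders) with p=0.10, n={ASSUMED_N_PER_ARM}: {p0:.3f}")
```

Smaller arms make the problem more pronounced, since the probability grows quickly as `n_per_arm` decreases.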
Many research questions concern treatment effects on outcomes that can recur several times in the same individual. For example, medical researchers are interested in treatment effects on hospitalizations in heart failure patients and sports injuries in athletes. Competing events, such as death, complicate causal inference in studies of recurrent events because once a competing event occurs, an individual cannot have more recurrent events. Several statistical estimands have been studied in recurrent event settings, with and without competing events. However, the causal interpretations of these estimands, and the conditions that are required to identify these estimands from observed data, have yet to be formalized. Here we use a counterfactual framework for causal inference to formulate several causal estimands in recurrent event settings, with and without competing events. When competing events exist, we clarify when commonly used classical statistical estimands can be interpreted as causal quantities from the causal mediation literature, such as (controlled) direct effects and total effects. Furthermore, we show that recent results on interventionist mediation estimands allow us to define new causal estimands with recurrent and competing events that may be of particular clinical relevance in many subject matter settings. We use causal directed acyclic graphs and single world intervention graphs to illustrate how to reason about identification conditions for the various causal estimands using subject matter knowledge. Furthermore, using results on counting processes, we show how our causal estimands and their identification conditions, which are articulated in discrete time, converge to classical continuous-time counterparts in the limit of fine discretizations of time. Finally, we propose several estimators and establish their consistency for the various identifying functionals.
There is a rich literature on Bayesian methods for density estimation, which characterize the unknown density as a mixture of kernels. Such methods have advantages in terms of providing uncertainty quantification in estimation, while being adaptive to a rich variety of densities. However, relative to frequentist locally adaptive kernel methods, Bayesian approaches can be slow and unstable to implement in relying on Markov chain Monte Carlo algorithms. To maintain most of the strengths of Bayesian approaches without the computational disadvantages, we propose a class of nearest neighbor-Dirichlet mixtures. The approach starts by grouping the data into neighborhoods based on standard algorithms. Within each neighborhood, the density is characterized via a Bayesian parametric model, such as a Gaussian with unknown parameters. Assigning a Dirichlet prior to the weights on these local kernels, we obtain a pseudo-posterior for the weights and kernel parameters. A simple and embarrassingly parallel Monte Carlo algorithm is proposed to sample from the resulting pseudo-posterior for the unknown density. Desirable asymptotic properties are shown, and the methods are evaluated in simulation studies and applied to a motivating data set in the context of classification.
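The steps described above (neighborhoods, local Gaussian kernels, Dirichlet weights, independent Monte Carlo draws) can be sketched in one dimension as follows. This is a deliberately simplified illustration, not the proposed method itself: the local Gaussian parameters are plug-in estimates rather than draws from a conjugate posterior, and the neighborhood size `k`, the Dirichlet `concentration`, and all function names are hypothetical choices.

```python
import numpy as np

def knn_neighborhoods(x, k):
    """Indices of the k nearest neighbours (including the point itself) of each 1-D point."""
    dist = np.abs(x[:, None] - x[None, :])
    return np.argsort(dist, axis=1)[:, :k]

def local_gaussians(x, k):
    """Plug-in mean and standard deviation of a Gaussian fitted within each neighbourhood."""
    local = x[knn_neighborhoods(x, k)]          # shape (n, k)
    return local.mean(axis=1), local.std(axis=1, ddof=1)

def sample_density(x, grid, k, concentration, rng):
    """One Monte Carlo draw of the density on `grid`, using Dirichlet-distributed kernel weights."""
    mu, sigma = local_gaussians(x, k)
    weights = rng.dirichlet(np.full(len(x), concentration))
    z = (grid[:, None] - mu[None, :]) / sigma[None, :]
    kernels = np.exp(-0.5 * z ** 2) / (sigma[None, :] * np.sqrt(2.0 * np.pi))
    return kernels @ weights                    # mixture density evaluated on the grid

rng = np.random.default_rng(0)
data = rng.normal(size=200)
grid = np.linspace(-5.0, 5.0, 2001)
# Each draw is independent of the others, so draws could be computed in parallel.
draws = np.stack([sample_density(data, grid, k=10, concentration=1.0, rng=rng)
                  for _ in range(20)])
pointwise_mean = draws.mean(axis=0)
pointwise_band = np.quantile(draws, [0.025, 0.975], axis=0)
```

The spread of `draws` across Monte Carlo samples gives a rough pointwise uncertainty band, which is the kind of uncertainty quantification the abstract refers to.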
We develop methodology for testing hypotheses regarding the slope function in functional linear regression for time series via a reproducing kernel Hilbert space approach. In contrast to most of the literature, which considers tests for the exact nullity of the slope function, we are interested in the null hypothesis that the slope function vanishes only approximately, where deviations are measured with respect to the $L^2$-norm. An asymptotically pivotal test is proposed, which does not require the estimation of nuisance parameters and long-run covariances. The key technical tools to prove the validity of our approach include a uniform Bahadur representation and a weak invariance principle for a sequential process of estimates of the slope function. Both scalar-on-function and function-on-function linear regression are considered and finite-sample methods for implementing our methodology are provided. We also illustrate the potential of our methods by means of a small simulation study and a data example.
Active inference is a unifying theory for perception and action resting upon the idea that the brain maintains an internal model of the world by minimizing free energy. From a behavioral perspective, active inference agents can be seen as self-evidencing beings that act to fulfill their optimistic predictions, namely preferred outcomes or goals. In contrast, reinforcement learning requires human-designed rewards to accomplish any desired outcome. Although active inference could provide a more natural self-supervised objective for control, its applicability has been limited because of the shortcomings in scaling the approach to complex environments. In this work, we propose a contrastive objective for active inference that strongly reduces the computational burden in learning the agent's generative model and planning future actions. Our method performs notably better than likelihood-based active inference in image-based tasks, while also being computationally cheaper and easier to train. We compare to reinforcement learning agents that have access to human-designed reward functions, showing that our approach closely matches their performance. Finally, we also show that contrastive methods perform significantly better in the case of distractors in the environment and that our method is able to generalize goals to variations in the background.
This paper is concerned with data-driven unsupervised domain adaptation, where it is unknown in advance how the joint distribution changes across domains, i.e., what factors or modules of the data distribution remain invariant or change across domains. To develop an automated way of domain adaptation with multiple source domains, we propose to use a graphical model as a compact way to encode the change property of the joint distribution, which can be learned from data, and then view domain adaptation as a problem of Bayesian inference on the graphical models. Such a graphical model distinguishes between constant and varied modules of the distribution and specifies the properties of the changes across domains, which serves as prior knowledge of the changing modules for the purpose of deriving the posterior of the target variable $Y$ in the target domain. This provides an end-to-end framework of domain adaptation, in which additional knowledge about how the joint distribution changes, if available, can be directly incorporated to improve the graphical representation. We discuss how causality-based domain adaptation can be put under this umbrella. Experimental results on both synthetic and real data demonstrate the efficacy of the proposed framework for domain adaptation. The code is available at //github.com/mgong2/DA_Infer .
This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (robustness to bounded norm adversarial perturbations, for example). Most previous work on this topic was limited in its applicability by the size of the network, the network architecture, and the complexity of the properties to be verified. In contrast, our framework applies to a general class of activation functions and specifications on neural network inputs and outputs. We formulate verification as an optimization problem (seeking to find the largest violation of the specification) and solve a Lagrangian relaxation of the optimization problem to obtain an upper bound on the worst-case violation of the specification being verified. Our approach is anytime, i.e., it can be stopped at any time and a valid bound on the maximum violation can be obtained. We develop specialized verification algorithms with provable tightness guarantees under special assumptions and demonstrate the practical significance of our general verification approach on a variety of verification tasks.
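The anytime property follows from weak duality, sketched here in generic notation that is an assumption for illustration rather than the formulation used in the work above: $f$ measures the violation of the specification, $g(z) = 0$ collects the constraints linking the activations $z$ of successive layers, and $\lambda$ is a vector of Lagrange multipliers.

$$
\max_{z \,:\, g(z) = 0} f(z) \;\le\; \max_{z} \left[ f(z) + \lambda^\top g(z) \right] \quad \text{for every } \lambda
$$

Because the right-hand side is a valid upper bound for any choice of $\lambda$, minimizing it over $\lambda$ can be interrupted at any iterate and still yields a sound, if looser, bound.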
We consider the task of learning the parameters of a *single* component of a mixture model when we are given *side information* about that component; we call this the "search problem" in mixture models. We would like to solve this with lower computational and sample complexity than solving the overall original problem, where one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and lower computational complexity than existing moment-based mixture model algorithms (e.g., tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvement in runtime and accuracy.
Amortized inference has led to efficient approximate inference for large datasets. The quality of posterior inference is largely determined by two factors: a) the ability of the variational distribution to model the true posterior and b) the capacity of the recognition network to generalize inference over all datapoints. We analyze approximate inference in variational autoencoders in terms of these factors. We find that suboptimal inference is often due to amortizing inference rather than the limited complexity of the approximating distribution. We show that this is due partly to the generator learning to accommodate the choice of approximation. Furthermore, we show that the parameters used to increase the expressiveness of the approximation play a role in generalizing inference rather than simply improving the complexity of the approximation.
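The two factors can be made concrete with a standard decomposition of the gap between the log marginal likelihood and the evidence lower bound. The notation is assumed for illustration: $\mathcal{L}[q]$ is the evidence lower bound under $q$, $q_\phi$ is the recognition network's output for datapoint $x$, and $q^*$ is the best member of the variational family for that $x$.

$$
\log p(x) - \mathcal{L}[q_\phi]
= \underbrace{\log p(x) - \mathcal{L}[q^*]}_{\text{approximation gap}}
+ \underbrace{\mathcal{L}[q^*] - \mathcal{L}[q_\phi]}_{\text{amortization gap}}
$$

Both terms are non-negative: the first reflects the limited expressiveness of the variational family (factor a), the second the cost of sharing one recognition network across all datapoints (factor b).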