Compared with the frequentist paradigm, the Bayesian paradigm supports the fitting of complicated survival models with feasible computation, owing to its advantages in handling complex censoring schemes. In this chapter, we illustrate the latest trend in Bayesian computing, namely the automation of posterior sampling, through a Bayesian analysis of multivariate survival outcomes with a complicated data structure. Motivated by the need to relax the strong proportionality assumption and the restriction to a common baseline population, we propose a generalized shared frailty model that includes both parametric and nonparametric frailty random effects, thereby capturing both treatment-wise and temporal variation across multiple events. We develop a survival-function version of the ANOVA dependent Dirichlet process to model the dependence among the baseline survival functions. Posterior sampling is carried out automatically by the No-U-Turn sampler in Stan, a contemporary Bayesian computing tool. The proposed model is validated on the bladder cancer recurrence data, and the estimates are consistent with existing results. Our model and its Bayesian inference demonstrate that the Bayesian paradigm fosters complex modeling with feasible computation in survival analysis, and that Stan greatly eases posterior inference.
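As a rough illustration of the automated posterior sampling described above, the following Python sketch fits a toy shared gamma-frailty exponential survival model with Stan's No-U-Turn sampler via cmdstanpy. It is illustrative only: the model, priors, and data are placeholders, not the chapter's generalized shared frailty / ANOVA-DDP model.

```python
# A toy gamma-frailty exponential survival model, fitted automatically by
# Stan's No-U-Turn sampler via cmdstanpy. Illustrative only: NOT the
# generalized shared frailty / ANOVA-DDP model of the chapter.
from cmdstanpy import CmdStanModel

stan_code = """
data {
  int<lower=1> N;                        // records (event or censoring times)
  int<lower=1> J;                        // subjects sharing frailties
  array[N] int<lower=1, upper=J> subj;   // subject index per record
  vector<lower=0>[N] t;                  // observed times
  array[N] int<lower=0, upper=1> event;  // 1 = event, 0 = right-censored
}
parameters {
  real<lower=0> lambda;                  // baseline hazard rate
  real<lower=0> theta;                   // frailty variance
  vector<lower=0>[J] w;                  // shared frailties
}
model {
  lambda ~ gamma(1, 1);
  theta ~ gamma(1, 1);
  w ~ gamma(1 / theta, 1 / theta);       // mean-one gamma frailty
  for (n in 1:N) {
    real rate = lambda * w[subj[n]];     // proportional-hazards frailty
    if (event[n] == 1)
      target += exponential_lpdf(t[n] | rate);
    else
      target += exponential_lccdf(t[n] | rate);  // censored contribution
  }
}
"""

with open("frailty.stan", "w") as f:
    f.write(stan_code)

model = CmdStanModel(stan_file="frailty.stan")
data = {"N": 4, "J": 2, "subj": [1, 1, 2, 2],
        "t": [5.0, 8.0, 3.0, 12.0], "event": [1, 0, 1, 1]}
fit = model.sample(data=data)  # NUTS runs with no sampler-specific user code
print(fit.summary())
```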
We consider online sequential decision problems where an agent must balance exploration and exploitation. We derive a set of Bayesian `optimistic' policies which, in the stochastic multi-armed bandit case, includes the Thompson sampling policy. We provide a new analysis showing that any algorithm producing policies in the optimistic set enjoys $\tilde O(\sqrt{AT})$ Bayesian regret for a problem with $A$ actions after $T$ rounds. We extend the regret analysis for optimistic policies to bilinear saddle-point problems which include zero-sum matrix games and constrained bandits as special cases. In this case we show that Thompson sampling can produce policies outside of the optimistic set and suffer linear regret in some instances. Finding a policy inside the optimistic set amounts to solving a convex optimization problem and we call the resulting algorithm `variational Bayesian optimistic sampling' (VBOS). The procedure works for any posteriors, i.e., it does not require the posterior to have any special properties, such as log-concavity, unimodality, or smoothness. The variational view of the problem has many useful properties, including the ability to tune the exploration-exploitation tradeoff, add regularization, incorporate constraints, and linearly parameterize the policy.
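For concreteness, here is a minimal sketch of Thompson sampling for the Bernoulli multi-armed bandit, the special case noted above in which Thompson sampling belongs to the optimistic set; the arm means and horizon are illustrative.

```python
# Minimal Thompson sampling for a Bernoulli bandit. Beta-Bernoulli
# conjugacy keeps the posterior update exact and cheap.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])   # unknown to the agent
A, T = len(true_means), 10_000
alpha = np.ones(A)                        # Beta posterior: successes + 1
beta = np.ones(A)                         # Beta posterior: failures + 1
regret = 0.0

for t in range(T):
    sampled = rng.beta(alpha, beta)       # one posterior draw per arm
    a = int(np.argmax(sampled))           # play the arm that looks best
    r = rng.random() < true_means[a]      # Bernoulli reward
    alpha[a] += r
    beta[a] += 1 - r
    regret += true_means.max() - true_means[a]

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```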
Population adjustment methods such as matching-adjusted indirect comparison (MAIC) are increasingly used to compare marginal treatment effects when there are cross-trial differences in effect modifiers and limited patient-level data. MAIC is based on propensity score weighting, which is sensitive to poor covariate overlap and cannot extrapolate beyond the observed covariate space. Current outcome regression-based alternatives can extrapolate but target a conditional treatment effect that is incompatible in the indirect comparison. When adjusting for covariates, one must integrate or average the conditional estimate over the relevant population to recover a compatible marginal treatment effect. We propose a marginalization method based on parametric G-computation that can be easily applied where the outcome regression is a generalized linear model or a Cox model. The approach views the covariate adjustment regression as a nuisance model and separates its estimation from the evaluation of the marginal treatment effect of interest. The method can accommodate a Bayesian statistical framework, which naturally integrates the analysis into a probabilistic setting. A simulation study provides proof-of-principle and benchmarks the method's performance against MAIC and the conventional outcome regression. Parametric G-computation achieves more precise and more accurate estimates than MAIC, particularly when covariate overlap is poor, and yields unbiased marginal treatment effect estimates when no assumptions are violated. Furthermore, the marginalized covariate-adjusted estimates provide greater precision and accuracy than the conditional estimates produced by the conventional outcome regression, which are systematically biased because the measure of effect is non-collapsible.
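A minimal sketch of the G-computation marginalization just described, using a logistic outcome regression: fit the conditional (nuisance) model on the trial IPD, predict potential outcomes over the target population's covariates, average, and contrast. The data, covariate names, and effect sizes are simulated placeholders.

```python
# Parametric G-computation for a marginal log-odds ratio, sketched with
# statsmodels. In the indirect-comparison setting the prediction covariates
# would come from (or be simulated to match) the target population.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
ipd = pd.DataFrame({"trt": rng.integers(0, 2, n),
                    "x": rng.normal(0.0, 1.0, n)})
lin = -0.5 + 1.0 * ipd["trt"] + 0.8 * ipd["x"] - 0.6 * ipd["trt"] * ipd["x"]
ipd["y"] = (rng.random(n) < 1 / (1 + np.exp(-lin))).astype(int)

# 1. Fit the (nuisance) conditional outcome model on the trial IPD.
fit = smf.glm("y ~ trt * x", data=ipd, family=sm.families.Binomial()).fit()

# 2. Predict potential outcomes over the *target* population's covariates,
#    here simulated to mimic a shifted covariate distribution.
target = pd.DataFrame({"x": rng.normal(0.5, 1.0, 10_000)})
p1 = fit.predict(target.assign(trt=1)).mean()   # everyone treated
p0 = fit.predict(target.assign(trt=0)).mean()   # everyone untreated

# 3. Contrast the *averaged* potential outcomes: a marginal log-odds ratio.
marginal_log_or = np.log(p1 / (1 - p1)) - np.log(p0 / (1 - p0))
print(f"marginal log-OR in target population: {marginal_log_or:.3f}")
```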
Population adjustment methods such as matching-adjusted indirect comparison (MAIC) are increasingly used to compare marginal treatment effects when there are cross-trial differences in effect modifiers and limited patient-level data. MAIC is sensitive to poor covariate overlap and cannot extrapolate beyond the observed covariate space. Current outcome regression-based alternatives can extrapolate but target a conditional treatment effect that is incompatible in the indirect comparison. When adjusting for covariates, one must integrate or average the conditional estimate over the population of interest to recover a compatible marginal treatment effect. We propose a marginalization method based on parametric G-computation that can be easily applied where the outcome regression is a generalized linear model or a Cox model. In addition, we introduce a novel general-purpose method based on multiple imputation, which we term multiple imputation marginalization (MIM), and which is applicable to a wide range of models. Both methods can accommodate a Bayesian statistical framework, which naturally integrates the analysis into a probabilistic setting. A simulation study provides proof-of-principle for the methods and benchmarks their performance against MAIC and the conventional outcome regression. The marginalized outcome regression approaches achieve more precise and more accurate estimates than MAIC, particularly when covariate overlap is poor, and yield unbiased marginal treatment effect estimates when no assumptions are violated. Furthermore, the marginalized covariate-adjusted estimates provide greater precision and accuracy than the conditional estimates produced by the conventional outcome regression, which are systematically biased because the measure of effect is non-collapsible.
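Continuing the G-computation sketch above, here is a heavily hedged illustration of a multiple-imputation-style marginalization: draw outcome-model parameters (here from the fitted GLM's asymptotic normal approximation, standing in for a full Bayesian posterior), impute a synthetic outcome dataset in the target population per draw, analyze each synthetic dataset marginally, and pool with Rubin's rules. This reuses `np`, `rng`, `fit`, and `target` from the previous block and is not the paper's exact MIM algorithm.

```python
# Multiple-imputation-style marginalization (hedged sketch, not the
# paper's exact MIM). Reuses np, rng, fit, and target from above.
def design(df):
    # design matrix matching the "y ~ trt * x" formula of `fit`
    return np.column_stack([np.ones(len(df)), df["trt"], df["x"],
                            df["trt"] * df["x"]])

M, n_t = 50, len(target)
ests, variances = [], []
for m in range(M):
    beta_m = rng.multivariate_normal(fit.params, fit.cov_params())
    # impute synthetic outcomes in the target population under each arm
    p = {t: 1 / (1 + np.exp(-(design(target.assign(trt=t)) @ beta_m)))
         for t in (0, 1)}
    y1 = rng.random(n_t) < p[1]
    y0 = rng.random(n_t) < p[0]
    # marginal (unadjusted) analysis of the m-th synthetic dataset
    p1_hat, p0_hat = y1.mean(), y0.mean()
    ests.append(np.log(p1_hat / (1 - p1_hat)) - np.log(p0_hat / (1 - p0_hat)))
    variances.append(1 / (n_t * p1_hat * (1 - p1_hat))
                     + 1 / (n_t * p0_hat * (1 - p0_hat)))  # delta method

# Rubin's rules: total variance = within + (1 + 1/M) * between
q_bar = np.mean(ests)
total_var = np.mean(variances) + (1 + 1 / M) * np.var(ests, ddof=1)
print(f"MIM-style estimate: {q_bar:.3f} (SE {np.sqrt(total_var):.3f})")
```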
Bayesian inference provides a framework to combine an arbitrary number of model components with shared parameters, allowing joint uncertainty estimation and the use of all available data sources. However, misspecification of any part of the model might propagate to all other parts and lead to unsatisfactory results. Cut distributions have been proposed as a remedy, where the information is prevented from flowing along certain directions. We consider cut distributions from an asymptotic perspective, find the equivalent of the Laplace approximation, and notice a lack of frequentist coverage for the associated credible regions. We propose algorithms based on the Posterior Bootstrap that deliver credible regions with the nominal frequentist asymptotic coverage. The algorithms involve numerical optimization programs that can be performed fully in parallel. The results and methods are illustrated in various settings, such as causal inference with propensity scores and epidemiological studies.
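A minimal sketch, under illustrative modeling assumptions, of a cut-style Posterior Bootstrap: each draw maximizes Dirichlet-weighted log-likelihoods module by module, with information flowing only forward through the cut. The two-module Gaussian model below is a placeholder, and the paper's actual algorithms may differ in detail.

```python
# Cut-posterior bootstrap sketch: per draw, maximize a Dirichlet-weighted
# log-likelihood module by module; phi never sees the second module's data.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
z = rng.normal(1.0, 1.0, 200)      # data informing module 1 (phi_true = 1)
y = rng.normal(2.0, 1.0, 50)       # data informing module 2: y ~ N(theta*phi, 1)

def draw():
    # module 1: weighted MLE of phi from z only (the "cut")
    w1 = rng.dirichlet(np.ones(len(z))) * len(z)
    phi = np.average(z, weights=w1)   # weighted normal-mean MLE = weighted mean
    # module 2: weighted MLE of theta given phi, using y only
    w2 = rng.dirichlet(np.ones(len(y))) * len(y)
    theta = minimize_scalar(lambda th: np.sum(w2 * (y - th * phi) ** 2)).x
    return phi, theta

# each draw is an independent optimization: embarrassingly parallel
draws = np.array([draw() for _ in range(1000)])
print("phi   mean/sd:", draws[:, 0].mean(), draws[:, 0].std())
print("theta mean/sd:", draws[:, 1].mean(), draws[:, 1].std())
```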
Much of the micro data used for epidemiological studies contains sensitive measurements on real individuals. As a result, such micro data cannot be published due to privacy concerns, rendering any published statistical analyses on them nearly impossible to reproduce. To promote the dissemination of key datasets for analysis without jeopardizing the privacy of individuals, we introduce a cohesive Bayesian framework for the generation of fully synthetic, high-dimensional micro datasets of mixed categorical, binary, count, and continuous variables. This process centers around a joint Bayesian model that is simultaneously compatible with all of these data types, enabling the creation of mixed synthetic datasets through posterior predictive sampling. Furthermore, a focal point of epidemiological data analysis is the study of conditional relationships between various exposures and key outcome variables through regression analysis. We design a modified data synthesis strategy to target and preserve these conditional relationships, including both nonlinearities and interactions. The proposed techniques are deployed to create a synthetic version of a confidential dataset containing dozens of health, cognitive, and social measurements on nearly 20,000 North Carolina children.
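A toy sketch of synthetic-data release by posterior predictive sampling, far simpler than the joint model for mixed data types described above: conjugate Bayesian fits to one binary and one continuous column, then a parameter draw followed by a data draw per synthetic dataset.

```python
# Posterior predictive synthesis, toy version: release draws from the
# posterior predictive instead of the confidential records themselves.
import numpy as np

rng = np.random.default_rng(3)
real_binary = rng.random(1000) < 0.37        # confidential 0/1 column
real_cont = rng.normal(50.0, 8.0, 1000)      # confidential continuous column

# Beta-Bernoulli posterior for the binary column (Beta(1, 1) prior)
a, b = 1 + real_binary.sum(), 1 + (1 - real_binary).sum()
# Normal posterior for the continuous mean (variance treated as known,
# flat prior on the mean, for simplicity)
n, s2 = len(real_cont), real_cont.var()
post_mean, post_var = real_cont.mean(), s2 / n

def synthesize(m):
    theta = rng.beta(a, b)                            # parameter draw ...
    mu = rng.normal(post_mean, np.sqrt(post_var))
    return np.column_stack([rng.random(m) < theta,    # ... then data draw
                            rng.normal(mu, np.sqrt(s2), m)])

synthetic = synthesize(1000)   # release this, not the confidential rows
print(synthetic[:3])
```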
Dynamic Linear Models (DLMs) are commonly employed for time series analysis due to their versatile structure, simple recursive updating, and probabilistic forecasting. However, the options for count time series are limited: Gaussian DLMs require continuous data, while Poisson-based alternatives often lack sufficient modeling flexibility. We introduce a novel methodology for count time series by warping a Gaussian DLM. The warping function has two components: a transformation operator that provides distributional flexibility and a rounding operator that ensures the correct support for the discrete data-generating process. Importantly, we develop conjugate inference for the warped DLM, which enables analytic and recursive updates for the state space filtering and smoothing distributions. We leverage these results to produce customized and efficient computing strategies for inference and forecasting, including Monte Carlo simulation for offline analysis and an optimal particle filter for online inference. This framework unifies and extends a variety of discrete time series models and is valid for natural counts, rounded values, and multivariate observations. Simulation studies illustrate the excellent forecasting capabilities of the warped DLM. The proposed approach is applied to a multivariate time series of daily overdose counts and demonstrates both modeling and computational successes.
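A small generative sketch of the warping idea: counts arise from pushing a latent Gaussian DLM through a transformation operator and then a rounding operator. The local-level state equation and the exp transform below are illustrative choices, not the paper's full specification.

```python
# Warped DLM data-generating sketch: transformation + rounding applied to
# a latent Gaussian dynamic linear model produces a count time series.
import numpy as np

rng = np.random.default_rng(4)
T = 100
state = 1.0 + np.cumsum(rng.normal(0.0, 0.05, T))  # local-level latent state
z = state + rng.normal(0.0, 0.1, T)                # Gaussian observation layer
y = np.floor(np.exp(z)).astype(int)                # transform, then round
# Observing y = k only constrains z to [log(k), log(k + 1)); this interval
# structure is what interval-conditioned (conjugate) filtering can exploit.
print(y[:10])
```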
In this work we analyse the parameterised complexity of propositional inclusion (PINC) and independence logic (PIND). The problems of interest are model checking (MC) and satisfiability (SAT). The complexity of these problems is well understood in the classical (non-parameterised) setting. Mahmood and Meier (FoIKS 2020) recently studied the parameterised complexity of propositional dependence logic (PDL). Continuing their work, we classify inclusion and independence logic and thereby come closer to completing the picture of the parameterised complexity of the three most studied logics in the propositional team semantics setting. We present results for each problem with respect to eight different parameterisations. It turns out that if L is a team-based logic whose atoms can be evaluated in polynomial time, then MC parameterised by teamsize is FPT. As a corollary, we obtain FPT membership for MC under the following parameterisations: formula-size, formula-depth, treewidth, and number of variables. The parameter teamsize shows interesting behaviour for SAT: for PINC, the parameter teamsize is not meaningful, whereas for PDL and PIND satisfiability is paraNP-complete. Finally, we prove that when parameterised by arity, both MC and SAT are paraNP-complete for each of the considered logics.
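To illustrate why teamsize is such a useful parameter for MC, the following brute-force model checker (a sketch, using strict splits for simplicity) confines the exponential blow-up to the $2^{|T|}$ subteam splits of disjunction, while team atoms such as inclusion are checked in polynomial time; the formulas and the team are illustrative.

```python
# Brute-force team-semantics model checking with inclusion atoms.
# Assignments are tuples of booleans over a fixed variable order, so
# ("var", 0) is the literal x0 and ("inc", [0], [1]) is the atom x0 <= x1.
from itertools import chain, combinations

def subsets(team):
    ts = tuple(team)
    return chain.from_iterable(combinations(ts, r) for r in range(len(ts) + 1))

def sat(team, phi):
    team = frozenset(team)
    op = phi[0]
    if op == "var":                        # literal holds in every assignment
        return all(s[phi[1]] for s in team)
    if op == "neg":
        return all(not s[phi[1]] for s in team)
    if op == "and":
        return sat(team, phi[1]) and sat(team, phi[2])
    if op == "or":                         # exists a split T = T1 u T2; the
        return any(sat(frozenset(t1), phi[1]) and          # only 2^|T| step
                   sat(team - frozenset(t1), phi[2])
                   for t1 in subsets(team))
    if op == "inc":                        # inclusion atom, poly-time check:
        vals = {tuple(s[j] for j in phi[2]) for s in team}
        return all(tuple(s[i] for i in phi[1]) in vals for s in team)
    raise ValueError(op)

team = {(True, False), (False, True), (True, True)}
print(sat(team, ("inc", [0], [1])))                # True: x0-values occur as x1-values
print(sat(team, ("or", ("var", 0), ("var", 1))))   # True: split {(T,F),(T,T)} / {(F,T)}
```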
In recent years, local differential privacy (LDP) has emerged as a technique of choice for privacy-preserving data collection when the aggregator is not trustworthy. LDP provides client-side privacy by adding noise at the user's end, so clients need not rely on the trustworthiness of the aggregator. However, the stronger privacy protection of LDP, compared with the central model, comes at the cost of a much harsher privacy-utility trade-off. In this work, we provide a noise-aware probabilistic modeling framework that allows Bayesian inference to account for the noise added for privacy under LDP, conditioned on the locally perturbed observations. Our framework tackles several computational and statistical challenges posed by LDP for accurate uncertainty quantification in Bayesian settings. We demonstrate the efficacy of our framework in parameter estimation for univariate and multivariate distributions as well as for logistic and linear regression.
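A minimal sketch of the noise-aware idea for one concrete mechanism: users release randomized-response bits, and the posterior over the true Bernoulli parameter conditions on the perturbed reports by marginalizing the known noise channel (a grid approximation is used for clarity; the privacy budget and data are illustrative).

```python
# Noise-aware Bayesian inference under LDP randomized response: the
# likelihood is written for the *perturbed* reports, not the raw bits.
import numpy as np

eps = 1.0                                 # LDP privacy budget
p_keep = np.exp(eps) / (1 + np.exp(eps))  # prob. of reporting the true bit

rng = np.random.default_rng(5)
true_theta = 0.3
x = rng.random(2000) < true_theta         # sensitive bits (never leave client)
flip = rng.random(2000) > p_keep
reports = x ^ flip                        # what the aggregator actually sees

# P(report = 1 | theta) marginalizes the noise channel:
#   theta * p_keep + (1 - theta) * (1 - p_keep)
grid = np.linspace(0.001, 0.999, 999)     # uniform prior over theta
q = grid * p_keep + (1 - grid) * (1 - p_keep)
ones = reports.sum()
log_post = ones * np.log(q) + (len(reports) - ones) * np.log(1 - q)
post = np.exp(log_post - log_post.max())
post /= post.sum()
print("posterior mean for theta:", np.sum(grid * post))
```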
We evaluate the robustness of Adversarial Logit Pairing, a recently proposed defense against adversarial examples. We find that a network trained with Adversarial Logit Pairing achieves 0.6% accuracy in the threat model in which the defense is considered. We provide a brief overview of the defense and the threat models/claims considered, as well as a discussion of the methodology and results of our attack, which may offer insights into the reasons underlying the vulnerability of ALP to adversarial attack.
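For reference, here is a standard $L_\infty$ projected gradient descent (PGD) attack of the kind used in such robustness evaluations, sketched in PyTorch; the stand-in model, epsilon, step size, and iteration count are placeholders rather than the paper's exact attack configuration.

```python
# Untargeted L-infinity PGD with a random start inside the eps-ball.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=40):
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()        # ascend the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to ball
        x_adv = x_adv.clamp(0, 1)                              # valid pixel range
    return x_adv.detach()

# toy usage with a stand-in model (placeholder, not the evaluated network)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x, y = torch.rand(4, 1, 28, 28), torch.randint(0, 10, (4,))
x_adv = pgd_attack(model, x, y)
print("adversarial accuracy:",
      (model(x_adv).argmax(1) == y).float().mean().item())
```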
Large margin nearest neighbor (LMNN) is a metric learner which optimizes the performance of the popular $k$NN classifier. However, its resulting metric relies on pre-selected target neighbors. In this paper, we address the feasibility of LMNN's optimization constraints with respect to these target points, and introduce a mathematical measure to evaluate the size of the feasible region of the optimization problem. We enhance the optimization framework of LMNN with a weighting scheme that favors data triplets yielding a larger feasible region. This increases the chances of obtaining a good metric as the solution of LMNN's problem. We evaluate the performance of the resulting feasibility-based LMNN algorithm on synthetic and real datasets. The empirical results show improved accuracy for different types of datasets in comparison to regular LMNN.
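A compact sketch of LMNN's triplet objective with a per-triplet weight hook of the kind the paper proposes; the feasibility-based weights themselves are the paper's contribution, so the uniform weights below are placeholders.

```python
# LMNN-style objective: pull target neighbors close, push impostors past a
# unit margin, with a per-triplet weight hook (weights here are placeholders).
import numpy as np

def lmnn_loss(L, X, triplets, weights, mu=0.5):
    """X: (n, d) data; L: (d, d) linear map; triplets: (i, j, l) with j a
    target neighbor of i (same class) and l an impostor (different class)."""
    Z = X @ L.T                                    # learned metric = Euclidean after L
    pull = push = 0.0
    for (i, j, l), w in zip(triplets, weights):
        d_ij = np.sum((Z[i] - Z[j]) ** 2)
        d_il = np.sum((Z[i] - Z[l]) ** 2)
        pull += w * d_ij                           # keep target neighbors close
        push += w * max(0.0, 1.0 + d_ij - d_il)    # hinge: impostors at margin
    return (1 - mu) * pull + mu * push

X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]])
triplets = [(0, 1, 2)]          # x1 is x0's target neighbor, x2 an impostor
print(lmnn_loss(np.eye(2), X, triplets, weights=[1.0]))
```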