Many real-world tasks involve some form of parameter estimation, i.e., the determination of a parameter encoded in a probability distribution. Often, such probability distributions arise from stochastic processes. For a stationary stochastic process with temporal correlations, the random variables that constitute it are identically distributed but not independent. This is the case, for instance, for quantum continuous measurements. In this paper we prove two fundamental results concerning the estimation of parameters encoded in a memoryful stochastic process. First, we show that for processes with finite Markov order, the Fisher information is always asymptotically linear in the number of outcomes and is determined by the conditional distribution of the process' Markov order. Second, we show, by means of suitable examples, that correlations do not necessarily enhance the metrological precision. In fact, we show that, unlike for entropic information quantities, in general nothing can be said about the sub- or super-additivity of the joint Fisher information in the presence of correlations. We discuss how the type of correlations in the process affects the scaling. We then apply these results to the case of thermometry on a spin chain.
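As a rough numerical illustration of the first result (not taken from the paper), the following Python sketch computes the exact Fisher information of a parameter-dependent two-state Markov chain, a finite-Markov-order process, by brute-force enumeration; the transition matrix and parameter value are arbitrary assumptions, and the roughly constant increments in the output exhibit the asymptotically linear growth in the number of outcomes.

```python
# Hypothetical two-state Markov chain whose transition probabilities depend on theta;
# the Fisher information of n consecutive outcomes is computed exactly by enumerating
# all 2^n sequences and using a central finite difference of the log-likelihood.
import itertools
import numpy as np

def transition(theta):
    return np.array([[1.0 - theta, theta],
                     [0.3, 0.7]])

def stationary(P):
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    return pi / pi.sum()

def log_prob(seq, theta):
    P = transition(theta)
    lp = np.log(stationary(P)[seq[0]])
    for a, b in zip(seq[:-1], seq[1:]):
        lp += np.log(P[a, b])
    return lp

def fisher_info(n, theta=0.4, h=1e-5):
    total = 0.0
    for seq in itertools.product([0, 1], repeat=n):
        score = (log_prob(seq, theta + h) - log_prob(seq, theta - h)) / (2.0 * h)
        total += np.exp(log_prob(seq, theta)) * score ** 2
    return total

for n in (2, 4, 6, 8, 10):
    print(n, fisher_info(n))   # successive increments are nearly constant: linear scaling in n
```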
To quantify uncertainties in inverse problems of partial differential equations (PDEs), we formulate them as statistical inference problems using Bayes' formula. Recently, well-justified infinite-dimensional Bayesian analysis methods have been developed to construct dimension-independent algorithms. However, there are three challenges for these infinite-dimensional Bayesian methods: prior measures usually act as regularizers and are not able to incorporate prior information efficiently; complex noises, such as more practical non-i.i.d. distributed noises, are rarely considered; and time-consuming forward PDE solvers are needed to estimate posterior statistical quantities. To address these issues, we propose an infinite-dimensional inference framework based on the infinite-dimensional variational inference method and deep generative models. Specifically, by introducing some measure equivalence assumptions, we derive the evidence lower bound in the infinite-dimensional setting and provide possible parametric strategies that yield a general inference framework called the Variational Inverting Network (VINet). This inference framework can encode prior and noise information from learning examples. In addition, relying on the power of deep neural networks, the posterior mean and variance can be efficiently and explicitly generated in the inference stage. In numerical experiments, we design specific network structures that yield a computable VINet from the general inference framework. Numerical examples of linear inverse problems governed by an elliptic equation and the Helmholtz equation are presented to illustrate the effectiveness of the proposed inference framework.
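The VINet architecture itself is not reproduced here; as a purely schematic sketch of the underlying idea, amortized variational inference that explicitly outputs a posterior mean and variance for a discretized linear inverse problem, the following Python/PyTorch snippet assumes an arbitrary forward operator A, mesh size, noise level, and network, none of which come from the paper.

```python
# Schematic amortized VI for a discretized linear inverse problem y = A u + noise:
# a small network maps the data y to the mean and log-variance of a diagonal Gaussian
# approximation of the posterior over the discretized parameter u, trained by
# maximizing an ELBO with a Gaussian likelihood and a standard Gaussian prior.
import torch
import torch.nn as nn

torch.manual_seed(0)
m, d = 20, 50                          # number of observations, mesh dimension
A = torch.randn(m, d) / d ** 0.5       # hypothetical discretized forward operator
sigma = 0.05                           # noise standard deviation (assumed known)

net = nn.Sequential(nn.Linear(m, 128), nn.ReLU(), nn.Linear(128, 2 * d))

def neg_elbo(y, n_samples=8):
    out = net(y)
    mu, log_var = out[..., :d], out[..., d:]
    std = torch.exp(0.5 * log_var)
    u = mu + std * torch.randn(n_samples, *mu.shape)          # reparameterized samples
    log_lik = -0.5 * ((y - u @ A.T) ** 2).sum(-1) / sigma ** 2
    kl = 0.5 * (mu ** 2 + std ** 2 - 1.0 - log_var).sum(-1)   # KL to the N(0, I) prior
    return -(log_lik.mean(0) - kl).mean()

u_true = torch.randn(d)
y = (u_true @ A.T + sigma * torch.randn(m)).unsqueeze(0)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):
    loss = neg_elbo(y)
    opt.zero_grad(); loss.backward(); opt.step()
mu_post = net(y)[..., :d]              # explicit posterior mean produced by the network
```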
Two-sample multiple testing problems for sparse spatial data arise frequently in a variety of scientific applications. In this article, we develop a novel neighborhood-assisted and posterior-adjusted (NAPA) approach that incorporates both spatial-smoothness and sparsity-type side information to improve the power of the test while controlling false discoveries in multiple testing. We translate the side information into a set of weights to adjust the $p$-values, where the spatial pattern is encoded by the ordering of the locations, and the sparsity structure is encoded by a set of auxiliary covariates. We establish the theoretical properties of the proposed test, including the guaranteed power improvement over some state-of-the-art alternative tests and the asymptotic false discovery control. We demonstrate the efficacy of the test through intensive simulations and two neuroimaging applications.
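For intuition only, the sketch below implements a generic weighted Benjamini-Hochberg procedure in Python, where side information enters through non-negative weights that rescale the $p$-values; this is not the NAPA procedure, and the weights, signal sizes, and nominal level are illustrative assumptions.

```python
# Generic weighted BH: hypotheses with larger weights (stronger side information)
# get smaller adjusted p-values p_i / w_i and are easier to reject.
import numpy as np

def weighted_bh(pvals, weights, alpha=0.05):
    m = len(pvals)
    w = np.asarray(weights, dtype=float)
    w = w * m / w.sum()                     # normalize so the weights average to one
    q = np.asarray(pvals) / w               # weight-adjusted p-values
    order = np.argsort(q)
    thresh = alpha * np.arange(1, m + 1) / m
    below = q[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# Toy example: the first 20 hypotheses are signals and receive larger weights.
rng = np.random.default_rng(0)
p = np.concatenate([rng.uniform(0, 0.01, 20), rng.uniform(0, 1, 180)])
w = np.concatenate([np.full(20, 4.0), np.ones(180)])
print(weighted_bh(p, w).sum(), "rejections")
```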
We consider the problem of learning the set of direct causes of a target variable from an observational joint distribution. Learning directed acyclic graphs (DAGs) that represent the causal structure is a fundamental problem in science. Several results are known when the full DAG is identifiable from the distribution, for example under the assumption of a nonlinear Gaussian data-generating process. Often, however, we are only interested in identifying the direct causes of one target variable (local causal structure), not the full DAG. In this paper, we discuss different assumptions on the data-generating process of the target variable under which the set of direct causes is identifiable from the distribution. While doing so, we place essentially no assumptions on the variables other than the target variable. In addition to the novel identifiability results, we provide two practical algorithms for estimating the direct causes from a finite random sample and demonstrate their effectiveness on several benchmark datasets. We apply this framework to learn the direct causes of the reduction in fertility rates in different countries.
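As a loose illustration of one regression-based route to local causal structure (not one of the paper's two algorithms), the sketch below assumes an additive-noise model and that no covariate is a descendant of the target: for each candidate parent set it regresses the target on that set and scores how dependent the residual remains on the full covariate vector via a simple HSIC statistic; the data-generating model, regression method, and kernel bandwidth are all illustrative choices.

```python
# Toy local causal discovery sketch under an additive-noise assumption: the smallest
# candidate set whose regression residual is (nearly) independent of all covariates
# is taken as the estimate of the direct causes.  Here the true direct causes are {x1, x2}.
import itertools
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def hsic(a, b, sigma=1.0):
    """Biased HSIC with Gaussian kernels; small values indicate weak dependence."""
    n = len(a)
    def gram(x):
        d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    K, L = gram(a), gram(b)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(H @ K @ H @ L) / n ** 2

rng = np.random.default_rng(1)
n = 300
x1, x2, x3 = rng.normal(size=(3, n))
y = x1 * x2 + 0.3 * rng.normal(size=n)          # direct causes of y: {x1, x2}
X = np.column_stack([x1, x2, x3])

for k in range(1, 4):
    for S in itertools.combinations(range(3), k):
        reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[:, list(S)], y)
        resid = y - reg.predict(X[:, list(S)])
        print(S, round(hsic(X, resid[:, None]), 5))
# Only candidate sets containing both x1 and x2 can render the residual independent of
# all covariates; preferring the smallest such set then recovers {x1, x2}.
```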
Focusing on stochastic programming (SP) with covariate information, this paper proposes an empirical risk minimization (ERM) method embedded within a nonconvex piecewise affine decision rule (PADR), which aims to learn a direct mapping from features to optimal decisions. We establish a nonasymptotic consistency result for our PADR-based ERM model for unconstrained problems and an asymptotic consistency result for constrained ones. To solve the nonconvex and nondifferentiable ERM problem, we develop an enhanced stochastic majorization-minimization algorithm and establish asymptotic convergence to (composite strong) directional stationarity, along with a complexity analysis. We show that the proposed PADR-based ERM method applies to a broad class of nonconvex SP problems with theoretical consistency guarantees and computational tractability. Our numerical study demonstrates the superior performance of PADR-based ERM methods compared to state-of-the-art approaches under various settings, with significantly lower costs, less computation time, and robustness to feature dimensions and the nonlinearity of the underlying dependency.
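To make the decision-rule class concrete, here is a toy Python/PyTorch sketch (not the paper's formulation or algorithm): the decision is a difference of two max-of-affine maps of the features, i.e., a piecewise affine rule, fitted by ERM on a newsvendor-style cost; the paper's enhanced stochastic majorization-minimization algorithm is replaced by plain SGD, and the cost, data model, and number of affine pieces are arbitrary assumptions.

```python
# Decision rule: z(x) = max_k(A_k . x + a_k) - max_k(B_k . x + b_k), a nonconvex
# piecewise affine map, trained by ERM on a newsvendor-style cost via plain SGD.
import torch

torch.manual_seed(0)
d, K = 5, 4                                        # feature dimension, affine pieces per max
A = torch.randn(K, d, requires_grad=True); a = torch.zeros(K, requires_grad=True)
B = torch.randn(K, d, requires_grad=True); b = torch.zeros(K, requires_grad=True)

def padr(x):                                       # x: (n, d) -> decisions: (n,)
    return (x @ A.T + a).max(dim=1).values - (x @ B.T + b).max(dim=1).values

def newsvendor_cost(z, demand, c_over=1.0, c_under=3.0):
    return c_over * torch.relu(z - demand) + c_under * torch.relu(demand - z)

X = torch.randn(2000, d)                           # covariates
demand = 5 + 2 * torch.sin(X[:, 0]) + X[:, 1] ** 2 + 0.5 * torch.randn(2000)

opt = torch.optim.SGD([A, a, B, b], lr=0.05)
for _ in range(500):
    idx = torch.randint(0, 2000, (64,))
    loss = newsvendor_cost(padr(X[idx]), demand[idx]).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("in-sample average cost:", newsvendor_cost(padr(X), demand).mean().item())
```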
This work considers Bayesian inference under misspecification for complex statistical models composed of simpler submodels, referred to as modules, that are coupled together. Such ``multi-modular'' models often arise when combining information from different data sources, where there is a module for each data source. When some of the modules are misspecified, the challenges of Bayesian inference under misspecification can sometimes be addressed by using ``cutting feedback'' methods, which modify conventional Bayesian inference by limiting the influence of unreliable modules. Here we investigate cutting feedback methods in the context of generalized posterior distributions, which are built from arbitrary loss functions, and present novel findings on their behaviour. We make three main contributions. First, we describe how cutting feedback methods can be defined in the generalized Bayes setting, and discuss the appropriate scaling of the different modules' loss functions relative to each other and to the prior. Second, we derive a novel result about the large sample behaviour of the posterior for a given module's parameters conditional on the parameters of other modules. This formally justifies the use of conditional Laplace approximations, which provide better approximations of conditional posterior distributions than the conditional distributions obtained from a Laplace approximation of the joint posterior. Our final contribution leverages the large sample approximations of our second contribution to provide convenient diagnostics for understanding the sensitivity of inference to the coupling of the modules, and to implement a new semi-modular posterior approach for conducting robust Bayesian modular inference. The usefulness of the methodology is illustrated in several benchmark examples from the literature on cut model inference.
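For readers unfamiliar with cutting feedback, a minimal two-module Gaussian sketch (standard Bayes, flat prior on $\varphi$, Gaussian prior on $\psi$; not the paper's generalized-Bayes setting) is given below: the cut posterior draws the shared parameter from the well-specified module alone and only then conditions the second module's parameter on it, so the contaminated module cannot feed back.

```python
# Module 1: y1 ~ N(phi, 1) (well specified).  Module 2: y2 ~ N(phi + psi, 1), but the
# observed y2 are contaminated.  Prior: phi flat, psi ~ N(0, tau^2) with a tight tau.
import numpy as np

rng = np.random.default_rng(0)
n1 = n2 = 50
tau = 0.1
y1 = rng.normal(1.0, 1.0, n1)
y2 = rng.normal(1.0, 1.0, n2) + 5.0              # module 2 data shifted by a contamination

# Full Bayes: after integrating psi out, phi's posterior mean is a precision-weighted
# average of the two modules, so the misspecified module drags phi upward.
w1, w2 = n1, 1.0 / (1.0 / n2 + tau ** 2)
print("full-posterior mean of phi:", (w1 * y1.mean() + w2 * y2.mean()) / (w1 + w2))

# Cut posterior: phi comes from module 1 only; psi | phi then comes from module 2.
phi_cut = rng.normal(y1.mean(), 1.0 / np.sqrt(n1), size=5000)
prec = n2 + 1.0 / tau ** 2
psi_cut = rng.normal(n2 * (y2.mean() - phi_cut) / prec, 1.0 / np.sqrt(prec))
print("cut-posterior mean of phi:", phi_cut.mean())   # stays near mean(y1)
```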
Moving average processes driven by exponential-tailed L\'evy noise are important extensions of their Gaussian counterparts that capture deviations from Gaussianity, more flexible dependence structures, and sample paths with jumps. Popular examples include non-Gaussian Ornstein--Uhlenbeck processes and type G Mat\'ern stochastic partial differential equation random fields. This paper is concerned with the open problem of determining their extremal dependence structure. We leverage the fact that such processes admit approximations on grids or triangulations that are used in practice for efficient simulations and inference. These approximations can be expressed as special cases of a class of linear transformations of independent, exponential-tailed random variables that bridge asymptotic dependence and independence in a novel, tractable way. This result is of independent interest since models that can capture both extremal dependence regimes are scarce and the construction of such flexible models is an active area of research. This new fundamental result allows us to show that the integral approximation of general moving average processes with exponential-tailed L\'evy noise is asymptotically independent when the mesh is fine enough. Under mild assumptions on the kernel function we also derive the limiting residual tail dependence function. For the popular exponential-tailed Ornstein--Uhlenbeck process we prove that it is asymptotically independent, but with a different residual tail dependence function than its Gaussian counterpart. Our results are illustrated through simulation studies.
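As a simulation-flavoured illustration only (the process, noise, and lag are assumptions rather than the paper's setup), the sketch below simulates an AR(1) proxy for an exponential-tailed Ornstein--Uhlenbeck process with Laplace innovations and tracks the empirical tail-dependence coefficient $\chi(u)$ at increasingly high quantiles; its decay towards zero is the empirical signature of asymptotic independence.

```python
# AR(1) surrogate x_t = rho * x_{t-1} + eps_t with exponential-tailed (Laplace) noise;
# chi_hat(u) = P(X_{t+1} > q_u | X_t > q_u) is estimated at several high quantiles.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
n, rho = 1_000_000, 0.7
eps = rng.laplace(scale=1.0, size=n)
x = lfilter([1.0], [1.0, -rho], eps)            # x[t] = rho * x[t-1] + eps[t]

for u in (0.95, 0.99, 0.999):
    q = np.quantile(x, u)
    joint = np.mean((x[:-1] > q) & (x[1:] > q))
    print(f"u = {u}: chi_hat = {joint / (1 - u):.3f}")   # decays towards zero
```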
Monte Carlo methods are a cornerstone of computer science. They allow one to sample high-dimensional distribution functions efficiently. In this paper we consider the extension of Automatic Differentiation (AD) techniques to Monte Carlo processes, addressing the problem of obtaining derivatives (and, in general, the Taylor series) of expectation values. Borrowing ideas from the lattice field theory community, we examine two approaches. One is based on reweighting, while the other represents an extension of the Hamiltonian approach typically used by Hybrid Monte Carlo (HMC) and similar algorithms. We show that the Hamiltonian approach can be understood as a change of variables of the reweighting approach, resulting in much reduced variances of the coefficients of the Taylor series. This work opens the door to finding other variance reduction techniques for derivatives of expectation values.
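A minimal sketch of the reweighting idea for first derivatives (a standard score-function identity; the Gaussian model and observable below are illustrative, not the paper's lattice setting): for samples drawn at $\theta_0$, $\partial_\theta \langle f \rangle$ equals the connected correlator of $f$ with $\partial_\theta \log p$.

```python
# Samples are drawn from p(x; theta_0) = N(theta_0, 1) and f(x) = x^2, so that
# <f> = theta^2 + 1 and the exact derivative at theta_0 = 1 is 2.  The estimator is
# d<f>/dtheta = <f * s> - <f><s> with the score s = d log p / d theta = x - theta_0.
import numpy as np

rng = np.random.default_rng(0)
theta0, n = 1.0, 1_000_000
x = rng.normal(theta0, 1.0, n)

f = x ** 2
score = x - theta0
deriv = np.mean(f * score) - np.mean(f) * np.mean(score)
print("estimated d<f>/dtheta:", deriv)          # close to the exact value 2
```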
This paper studies the efficient estimation of a large class of treatment effect parameters that arise in the analysis of experiments. Here, efficiency is understood to be with respect to a broad class of treatment assignment schemes for which the marginal probability that any unit is assigned to treatment equals a pre-specified value, e.g., one half. Importantly, we do not require that treatment status is assigned in an i.i.d. fashion, thereby accommodating complicated treatment assignment schemes that are used in practice, such as stratified block randomization and matched pairs. The class of parameters considered are those that can be expressed as the solution to a restriction on the expectation of a known function of the observed data, including possibly the pre-specified value for the marginal probability of treatment assignment. We show that this class of parameters includes, among other things, average treatment effects, quantile treatment effects, local average treatment effects as well as the counterparts to these quantities in experiments in which the unit is itself a cluster. In this setting, we establish two results. First, we derive a lower bound on the asymptotic variance of estimators of the parameter of interest in the form of a convolution theorem. Second, we show that the na\"ive method of moments estimator achieves this bound on the asymptotic variance quite generally if treatment is assigned using a "finely stratified" design. By a "finely stratified" design, we mean experiments in which units are divided into groups of a fixed size and a proportion within each group is assigned to treatment uniformly at random so that it respects the restriction on the marginal probability of treatment assignment. In this sense, "finely stratified" experiments lead to efficient estimators of treatment effect parameters "by design" rather than through ex post covariate adjustment.
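To see the "by design" efficiency gain in miniature, the following simulation sketch (with an arbitrary outcome model and covariate) compares the naive difference-in-means estimator under i.i.d. Bernoulli(1/2) assignment with the same estimator under a matched-pairs, i.e., finely stratified, design; both are essentially unbiased, but the paired design has markedly smaller variance.

```python
# Units are sorted on a covariate and paired; exactly one unit per pair is treated at
# random.  The estimator in both designs is the plain difference in means.
import numpy as np

rng = np.random.default_rng(0)
n, reps, tau = 400, 2000, 1.0

def one_experiment(paired):
    x = np.sort(rng.normal(size=n)) if paired else rng.normal(size=n)
    if paired:
        d = np.zeros(n, dtype=int)
        flips = rng.integers(0, 2, n // 2)       # which member of each adjacent pair is treated
        d[0::2], d[1::2] = flips, 1 - flips
    else:
        d = rng.integers(0, 2, n)                # i.i.d. Bernoulli(1/2) assignment
    y = 2.0 * x + tau * d + rng.normal(size=n)   # outcome depends strongly on the covariate
    return y[d == 1].mean() - y[d == 0].mean()

for paired in (False, True):
    est = np.array([one_experiment(paired) for _ in range(reps)])
    label = "matched pairs" if paired else "iid assignment"
    print(f"{label}: bias = {est.mean() - tau:+.3f}, sd = {est.std():.3f}")
```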
Hardware security vulnerabilities in computing systems compromise the security defenses of not only the hardware but also the software running on it. Recent research has shown that hardware fuzzing is a promising technique for efficiently detecting such vulnerabilities in large-scale designs such as modern processors. However, current fuzzing techniques do not adjust their strategies dynamically toward faster and broader design-space exploration, resulting in slow vulnerability detection, evident through their low design coverage. To address this problem, we propose PSOFuzz, which uses particle swarm optimization (PSO) to schedule the mutation operators and to generate initial input programs dynamically, with the objective of detecting vulnerabilities quickly. Unlike traditional PSO, which finds a single optimal solution, we use a modified PSO that dynamically computes the optimal solution for selecting the mutation operators required to explore new design regions in hardware. We also address the challenge of inefficient initial input generation by employing PSO-based input generation. With these optimizations, our final formulation outperforms fuzzers without PSO. Experiments show that PSOFuzz achieves up to 15.25$\times$ speedup for vulnerability detection and up to 2.22$\times$ speedup for coverage compared to the state-of-the-art simulation-based hardware fuzzer.
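As a self-contained toy (a generic PSO, not the modified PSO of PSOFuzz, and with a synthetic stand-in for coverage feedback), the sketch below shows the scheduling idea: each particle is a vector of selection weights over mutation operators, and the swarm moves the weights toward those that would yield the most new coverage.

```python
# Generic PSO over operator-selection weights; fitness() is a synthetic stand-in for the
# "new coverage achieved" signal that a real hardware fuzzer would feed back.
import numpy as np

rng = np.random.default_rng(0)
n_ops, n_particles, iters = 6, 12, 100
hidden_best = np.array([0.05, 0.4, 0.1, 0.3, 0.1, 0.05])   # weights that maximize the stand-in fitness

def fitness(weights):
    return -np.sum((weights - hidden_best) ** 2)

pos = rng.dirichlet(np.ones(n_ops), n_particles)     # each particle: operator-selection weights
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([fitness(p) for p in pos])
gbest = pbest[np.argmax(pbest_val)].copy()

for _ in range(iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 1e-6, None)
    pos /= pos.sum(axis=1, keepdims=True)             # keep each particle a probability vector
    vals = np.array([fitness(p) for p in pos])
    better = vals > pbest_val
    pbest[better], pbest_val[better] = pos[better], vals[better]
    gbest = pbest[np.argmax(pbest_val)].copy()

print("learned operator weights:", np.round(gbest, 2))
```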
Making decisions online is part of everyday life. Think of buying a house, parking a car, or taking part in an auction. We often make these decisions publicly, which may breach our privacy: a party observing our choices may learn a lot about our preferences. In this paper we investigate online stopping algorithms from a privacy-preserving perspective, using the mathematically rigorous notion of differential privacy. Differentially private algorithms usually face the issue of balancing privacy and utility; in this regime, in most cases, having both optimality and a high level of privacy at the same time is impossible. We propose a natural mechanism that achieves a controllable trade-off, quantified by a parameter, between the accuracy of the online algorithm and its privacy. Depending on the parameter, our mechanism can be optimal with weaker differential privacy or suboptimal yet more privacy-preserving. We conduct a detailed accuracy and privacy analysis of our mechanism applied to the optimal algorithm for the classical secretary problem. In this way, classical notions from two distinct areas, optimal stopping and differential privacy, meet for the first time.
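For reference, the classical (non-private) secretary rule analysed in the paper can be sketched as follows; the proposed privacy mechanism and its trade-off parameter are not reproduced here.

```python
# Classical secretary rule: observe the first n/e candidates without stopping, then
# accept the first candidate better than everything seen so far.
import math
import numpy as np

def secretary(values):
    n = len(values)
    k = int(n / math.e)                          # length of the observation phase
    threshold = max(values[:k]) if k else -np.inf
    for i in range(k, n):
        if values[i] > threshold:
            return i
    return n - 1                                 # forced to take the last candidate

rng = np.random.default_rng(0)
trials, n = 5000, 1000
wins = 0
for _ in range(trials):
    vals = rng.permutation(n)
    wins += vals[secretary(vals)] == n - 1       # did we stop at the overall best?
print("success probability:", wins / trials)     # close to 1/e ~= 0.368
```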