Diffusion processes with small noise conditioned to reach a target set are considered. The AMS algorithm is a Monte Carlo method that is used to sample such rare events by iteratively simulating clones of the process and selecting trajectories that have reached the highest value of a so-called importance function. In this paper, the large sample size relative variance of the AMS small probability estimator is considered. The main result is a large deviations logarithmic equivalent of the latter in the small noise asymptotics, which is rigorously derived. It is given as a maximisation problem explicit in terms of the quasi-potential cost function associated with the underlying small noise large deviations. Necessary and sufficient geometric conditions ensuring the vanishing of the obtained quantity ('weak' asymptotic efficiency) are provided. Interpretations and practical consequences are discussed.
Background: Outcome measures that are count variables with excessive zeros are common in health behaviors research. There is a lack of empirical data about the relative performance of prevailing statistical models when outcomes are zero-inflated, particularly compared with recently developed approaches. Methods: The current simulation study examined five commonly used analytical approaches for count outcomes, including two linear models (with outcomes on raw and log-transformed scales, respectively) and three count distribution-based models (i.e., Poisson, negative binomial, and zero-inflated Poisson (ZIP) models). We also considered the marginalized zero-inflated Poisson (MZIP) model, a novel alternative that estimates the effects on overall mean while adjusting for zero-inflation. Extensive simulations were conducted to evaluate their the statistical power and Type I error rate across various data conditions. Results: Under zero-inflation, the Poisson model failed to control the Type I error rate, resulting in higher than expected false positive results. When the intervention effects on the zero (vs. non-zero) and count parts were in the same direction, the MZIP model had the highest statistical power, followed by the linear model with outcomes on raw scale, negative binomial model, and ZIP model. The performance of a linear model with a log-transformed outcome variable was unsatisfactory. When only one of the effects on the zero (vs. non-zero) part and the count part existed, the ZIP model had the highest statistical power. Conclusions: The MZIP model demonstrated better statistical properties in detecting true intervention effects and controlling false positive results for zero-inflated count outcomes. This MZIP model may serve as an appealing analytical approach to evaluating overall intervention effects in studies with count outcomes marked by excessive zeros.
We study a multi-objective multi-armed bandit problem in a dynamic environment. The problem portrays a decision-maker that sequentially selects an arm from a given set. If selected, each action produces a reward vector, where every element follows a piecewise-stationary Bernoulli distribution. The agent aims at choosing an arm among the Pareto optimal set of arms to minimize its regret. We propose a Pareto generic upper confidence bound (UCB)-based algorithm with change detection to solve this problem. By developing the essential inequalities for multi-dimensional spaces, we establish that our proposal guarantees a regret bound in the order of $\gamma_T\log(T/{\gamma_T})$ when the number of breakpoints $\gamma_T$ is known. Without this assumption, the regret bound of our algorithm is $\gamma_T\log(T)$. Finally, we formulate an energy-efficient waveform design problem in an integrated communication and sensing system as a toy example. Numerical experiments on the toy example and synthetic and real-world datasets demonstrate the efficiency of our policy compared to the current methods.
We consider signal source localization from range-difference measurements. First, we give some readily-checked conditions on measurement noises and sensor deployment to guarantee the asymptotic identifiability of the model and show the consistency and asymptotic normality of the maximum likelihood (ML) estimator. Then, we devise an estimator that owns the same asymptotic property as the ML one. Specifically, we prove that the negative log-likelihood function converges to a function, which has a unique minimum and positive-definite Hessian at the true source's position. Hence, it is promising to execute local iterations, e.g., the Gauss-Newton (GN) algorithm, following a consistent estimate. The main issue involved is obtaining a preliminary consistent estimate. To this aim, we construct a linear least-squares problem via algebraic operation and constraint relaxation and obtain a closed-form solution. We then focus on deriving and eliminating the bias of the linear least-squares estimator, which yields an asymptotically unbiased (thus consistent) estimate. Noting that the bias is a function of the noise variance, we further devise a consistent noise variance estimator which involves $3$-order polynomial rooting. Based on the preliminary consistent location estimate, we prove that a one-step GN iteration suffices to achieve the same asymptotic property as the ML estimator. Simulation results demonstrate the superiority of our proposed algorithm in the large sample case.
In the usual Bayesian setting, a full probabilistic model is required to link the data and parameters, and the form of this model and the inference and prediction mechanisms are specified via de Finetti's representation. In general, such a formulation is not robust to model mis-specification of its component parts. An alternative approach is to draw inference based on loss functions, where the quantity of interest is defined as a minimizer of some expected loss, and to construct posterior distributions based on the loss-based formulation; this strategy underpins the construction of the Gibbs posterior. We develop a Bayesian non-parametric approach; specifically, we generalize the Bayesian bootstrap, and specify a Dirichlet process model for the distribution of the observables. We implement this using direct prior-to-posterior calculations, but also using predictive sampling. We also study the assessment of posterior validity for non-standard Bayesian calculations, and provide an efficient way to calibrate the scaling parameter in the Gibbs posterior so that it can achieve the desired coverage rate. We show that the developed non-standard Bayesian updating procedures yield valid posterior distributions in terms of consistency and asymptotic normality under model mis-specification. Simulation studies show that the proposed methods can recover the true value of the parameter efficiently and achieve frequentist coverage even when the sample size is small. Finally, we apply our methods to evaluate the causal impact of speed cameras on traffic collisions in England.
We consider the problem of model selection when grouping structure is inherent within the regressors. Using a Bayesian approach, we model the mean vector by a one-group global-local shrinkage prior belonging to a broad class of such priors that includes the horseshoe prior. In the context of variable selection, this class of priors was studied by Tang et al. (2018) \cite{tang2018bayesian}. A modified form of the usual class of global-local shrinkage priors with polynomial tail on the group regression coefficients is proposed. The resulting threshold rule selects the active group if within a group, the ratio of the $L_2$ norm of the posterior mean of its group coefficient to that of the corresponding ordinary least square group estimate is greater than a half. In the theoretical part of this article, we have used the global shrinkage parameter either as a tuning one or an empirical Bayes estimate of it depending on the knowledge regarding the underlying sparsity of the model. When the proportion of active groups is known, using $\tau$ as a tuning parameter, we have proved that our method enjoys variable selection consistency. In case this proportion is unknown, we propose an empirical Bayes estimate of $\tau$. Even if this empirical Bayes estimate is used, then also our half-thresholding rule captures the true sparse group structure. Though our theoretical works rely on a special form of the design matrix, but for general design matrices also, our simulation results show that the half-thresholding rule yields results similar to that of Yang and Narisetty (2020) \cite{yang2020consistent}. As a consequence of this, in a high dimensional sparse group selection problem, instead of using the so-called `gold standard' spike and slab prior, one can use the one-group global-local shrinkage priors with polynomial tail to obtain similar results.
We consider the idealized setting of gradient flow on the population risk for infinitely wide two-layer ReLU neural networks (without bias), and study the effect of symmetries on the learned parameters and predictors. We first describe a general class of symmetries which, when satisfied by the target function $f^*$ and the input distribution, are preserved by the dynamics. We then study more specific cases. When $f^*$ is odd, we show that the dynamics of the predictor reduces to that of a (non-linearly parameterized) linear predictor, and its exponential convergence can be guaranteed. When $f^*$ has a low-dimensional structure, we prove that the gradient flow PDE reduces to a lower-dimensional PDE. Furthermore, we present informal and numerical arguments that suggest that the input neurons align with the lower-dimensional structure of the problem.
We address the weak numerical solution of stochastic differential equations driven by independent Brownian motions (SDEs for short). This paper develops a new methodology to design adaptive strategies for determining automatically the step-sizes of the numerical schemes that compute the mean values of smooth functions of the solutions of SDEs. First, we introduce a general method for constructing variable step-size weak schemes for SDEs, which is based on controlling the match between the first conditional moments of the increments of the numerical integrator and the ones corresponding to an additional weak approximation. To this end, we use certain local discrepancy functions that do not involve sampling random variables. Precise directions for designing suitable discrepancy functions and for selecting starting step-sizes are given. Second, we introduce a variable step-size Euler scheme, together with a variable step-size second order weak scheme via extrapolation. Finally, numerical simulations are presented to show the potential of the introduced variable step-size strategy and the adaptive scheme to overcome known instability problems of the conventional fixed step-size schemes in the computation of diffusion functional expectations.
The nonconvex formulation of matrix completion problem has received significant attention in recent years due to its affordable complexity compared to the convex formulation. Gradient descent (GD) is the simplest yet efficient baseline algorithm for solving nonconvex optimization problems. The success of GD has been witnessed in many different problems in both theory and practice when it is combined with random initialization. However, previous works on matrix completion require either careful initialization or regularizers to prove the convergence of GD. In this work, we study the rank-1 symmetric matrix completion and prove that GD converges to the ground truth when small random initialization is used. We show that in logarithmic amount of iterations, the trajectory enters the region where local convergence occurs. We provide an upper bound on the initialization size that is sufficient to guarantee the convergence and show that a larger initialization can be used as more samples are available. We observe that implicit regularization effect of GD plays a critical role in the analysis, and for the entire trajectory, it prevents each entry from becoming much larger than the others.
Goal-conditioned reinforcement learning (GCRL) refers to learning general-purpose skills which aim to reach diverse goals. In particular, offline GCRL only requires purely pre-collected datasets to perform training tasks without additional interactions with the environment. Although offline GCRL has become increasingly prevalent and many previous works have demonstrated its empirical success, the theoretical understanding of efficient offline GCRL algorithms is not well established, especially when the state space is huge and the offline dataset only covers the policy we aim to learn. In this paper, we propose a novel provably efficient algorithm (the sample complexity is $\tilde{O}({\rm poly}(1/\epsilon))$ where $\epsilon$ is the desired suboptimality of the learned policy) with general function approximation. Our algorithm only requires nearly minimal assumptions of the dataset (single-policy concentrability) and the function class (realizability). Moreover, our algorithm consists of two uninterleaved optimization steps, which we refer to as $V$-learning and policy learning, and is computationally stable since it does not involve minimax optimization. To the best of our knowledge, this is the first algorithm with general function approximation and single-policy concentrability that is both statistically efficient and computationally stable.
This manuscript portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and use classical algorithmic theory and mathematical optimization. It is necessary as well as beneficial to take a robust approach, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to some spectacular success in modeling and systems that are now part of our daily lives.