We consider discrete-time parametric population-size-dependent branching processes (PSDBPs) with almost sure extinction and propose a new class of weighted least-squares estimators based on a single trajectory of population size counts. We prove that, conditional on non-extinction up to a finite time $n$, our estimators are consistent and asymptotically normal as $n\to\infty$. We pay particular attention to estimating the carrying capacity of a population. Our estimators are the first conditionally consistent estimators for PSDBPs and, more generally, for Markov population models with a carrying capacity. Through simulated examples, we demonstrate that our estimators outperform other least-squares estimators for PSDBPs in a variety of settings. Finally, we apply our methods to estimate the carrying capacity of the endangered Chatham Island black robin population.
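As a minimal illustration of the conditional least-squares idea (not the estimators analyzed above), the sketch below simulates a PSDBP with Poisson offspring whose mean follows a Ricker-type form $\exp(r(1 - z/K))$ and recovers $(r, K)$ from a single surviving trajectory by weighted conditional least squares; the offspring law, the weights, and the optimizer are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

def simulate_psdbp(z0, r, K, n):
    """Simulate a PSDBP with Poisson offspring whose mean exp(r*(1 - z/K))
    depends on the current population size z (a Ricker-type mechanism)."""
    z = [z0]
    for _ in range(n):
        mean = np.exp(r * (1.0 - z[-1] / K))
        z.append(rng.poisson(mean * z[-1]) if z[-1] > 0 else 0)
    return np.array(z)

def wcls_objective(theta, z):
    """Weighted conditional least squares: residuals of Z_{t+1} against its
    conditional mean Z_t * exp(r*(1 - Z_t/K)), weighted by 1/Z_t as a crude
    proxy for the conditional variance."""
    r, K = theta
    if K <= 0:
        return np.inf
    zt, zt1 = z[:-1], z[1:]
    alive = zt > 0
    pred = zt[alive] * np.exp(r * (1.0 - zt[alive] / K))
    return np.sum((zt1[alive] - pred) ** 2 / zt[alive])

traj = simulate_psdbp(z0=20, r=0.4, K=150, n=300)
fit = minimize(wcls_objective, x0=np.array([0.1, traj.max() / 2]),
               args=(traj,), method="Nelder-Mead")
print("estimated (r, K):", fit.x)   # second component is the carrying-capacity estimate
```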
This paper considers the problem of system identification (ID) of linear and nonlinear non-autonomous systems from noisy and sparse data. We propose and analyze an objective function derived from a Bayesian formulation for learning a hidden Markov model with stochastic dynamics. We then analyze this objective function in the context of several state-of-the-art approaches for both linear and nonlinear system ID. For the former, we analyze least-squares approaches for Markov parameter estimation; for the latter, we analyze the multiple shooting approach. We demonstrate the limitations of the optimization problems posed by these existing methods by showing that they can be seen as special cases of the proposed optimization objective under certain simplifying assumptions: conditional independence of data and zero model error. Furthermore, we observe that our proposed approach has improved smoothness and inherent regularization that make it well-suited for system ID, and we provide mathematical explanations for the origins of these characteristics. Finally, numerical simulations demonstrate a mean squared error more than 8.7 times lower than that of multiple shooting when data are noisy and/or sparse. Moreover, the proposed approach can identify accurate and generalizable models even when there are more parameters than data or when the underlying system exhibits chaotic behavior.
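As a point of reference for the linear-system baseline mentioned above, the sketch below performs textbook least-squares Markov parameter estimation on a toy state-space system; the system matrices, noise level, and number of estimated parameters are assumptions for illustration, and this is not the proposed Bayesian objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy stable SISO system: x_{t+1} = A x_t + B u_t, y_t = C x_t + noise.
A = np.array([[0.8, 0.2], [0.0, 0.6]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])

T, p = 400, 10                     # number of samples and of Markov parameters
u = rng.normal(size=T)
x = np.zeros(2)
y = np.zeros(T)
for t in range(T):
    y[t] = (C @ x).item() + 0.05 * rng.normal()
    x = A @ x + B[:, 0] * u[t]

# Least-squares Markov parameter estimation: regress y_t on past inputs.
# Row t of Phi is [u_{t-1}, u_{t-2}, ..., u_{t-p}].
Phi = np.column_stack([u[p - k - 1: T - k - 1] for k in range(p)])
theta, *_ = np.linalg.lstsq(Phi, y[p:], rcond=None)

true_markov = np.array([(C @ np.linalg.matrix_power(A, k) @ B).item()
                        for k in range(p)])
print(np.round(theta, 3))          # estimated Markov parameters C A^k B
print(np.round(true_markov, 3))    # ground truth for comparison
```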
We consider the problem of evaluating the performance of a decision policy using past observational data. The outcome of a policy is measured in terms of a loss or disutility (or negative reward), and the problem is to draw valid inferences about the out-of-sample loss of the specified policy when the past data were observed under a possibly unknown policy. Using a sample-splitting method, we show that it is possible to draw such inferences with finite-sample coverage guarantees that evaluate the entire loss distribution. Importantly, the method takes into account model misspecifications of the past policy -- including unmeasured confounding. The evaluation method can be used to certify the performance of a policy using observational data under an explicitly specified range of credible model assumptions.
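For concreteness, here is a minimal sample-splitting sketch that estimates the loss distribution of a target policy from logged data via importance weighting; the logging policy, outcome model, and self-normalized weighting are assumptions for illustration, and the sketch does not reproduce the finite-sample coverage guarantees described above.

```python
import numpy as np

rng = np.random.default_rng(7)

# Logged data: binary action a from a behavior policy, loss depending on (x, a).
n = 4000
x = rng.normal(size=n)
pi_b = 1.0 / (1.0 + np.exp(-x))                              # behavior policy P(a=1 | x)
a = rng.binomial(1, pi_b)
loss = (a - (x > 0)) ** 2 + 0.1 * rng.normal(size=n) ** 2    # smaller when a matches sign of x

def pi_target(x):                                            # target policy to evaluate: P(a=1 | x)
    return (x > 0).astype(float) * 0.9 + 0.05

# Sample splitting: fit the behavior-policy model on fold 1, evaluate on fold 2.
idx = rng.permutation(n)
fold1, fold2 = idx[: n // 2], idx[n // 2:]

def fit_logistic(xf, af, iters=25):
    """Crude logistic regression for the propensity, via Newton steps."""
    w = np.zeros(2)
    X = np.column_stack([np.ones_like(xf), xf])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (af - p)
        H = -(X * (p * (1 - p))[:, None]).T @ X
        w -= np.linalg.solve(H, grad)
    return w

w_hat = fit_logistic(x[fold1], a[fold1])
p_hat = 1.0 / (1.0 + np.exp(-(w_hat[0] + w_hat[1] * x[fold2])))
prop = np.where(a[fold2] == 1, p_hat, 1 - p_hat)
target = np.where(a[fold2] == 1, pi_target(x[fold2]), 1 - pi_target(x[fold2]))
w_is = target / prop                                         # importance weights on fold 2

# Self-normalized weighted estimate of the target policy's loss CDF on a grid.
grid = np.linspace(0, 2, 21)
cdf = [(w_is * (loss[fold2] <= t)).sum() / w_is.sum() for t in grid]
print(np.round(cdf, 3))
```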
A common goal in network modeling is to uncover the latent community structure present among nodes. For many real-world networks, the true connections consist of events arriving as streams, which are then aggregated to form edges, ignoring the dynamic temporal component. A natural way to account for these temporal dynamics of interactions is to use point processes as the foundation of network models for community detection. However, computational complexity hampers the scalability of such approaches to large sparse networks. To circumvent this challenge, we propose a fast online variational inference algorithm for estimating the latent structure underlying dynamic event arrivals on a network, using continuous-time point process latent network models. We describe this procedure for network models capturing community structure, which can be learned as new events are observed on the network, updating the inferred community assignments. We investigate the theoretical properties of such an inference scheme and provide regret bounds on the loss function of this procedure. The proposed inference procedure is then thoroughly compared, using both simulation studies and real data, to non-online variants. We demonstrate that online inference can obtain comparable performance, in terms of community recovery, to non-online variants, while realising computational gains. Our proposed inference framework can also be readily modified to incorporate other popular network structures.
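A rough flavor of streaming inference for event data on a network can be given with a mini-batch mean-field sketch for a homogeneous Poisson block model, a simplification of the continuous-time point-process models above; the batch size, number of blocks, and update schedule are assumptions, and this is not the paper's regret-bounded online algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)

# --- simulate a stream of directed events from a 2-block Poisson network ---
N, K, T_end = 60, 2, 50.0
z_true = rng.integers(0, K, size=N)
B_true = np.array([[0.08, 0.01], [0.01, 0.06]])        # events per unit time
events = []
for i in range(N):
    for j in range(N):
        if i == j:
            continue
        n_ij = rng.poisson(B_true[z_true[i], z_true[j]] * T_end)
        events += [(rng.uniform(0, T_end), i, j) for _ in range(n_ij)]
events.sort()                                           # arrival order

# --- mini-batch mean-field updates for the Poisson block model ---
tau = rng.dirichlet(np.ones(K), size=N)                 # soft community memberships
B = np.full((K, K), 0.05)
pi = np.full(K, 1.0 / K)
C = np.zeros((N, N))                                    # running event counts
batch, processed_until = 200, 0.0

for start in range(0, len(events), batch):
    for t, i, j in events[start:start + batch]:
        C[i, j] += 1
        processed_until = t
    T = max(processed_until, 1e-8)
    S = tau.sum(0) - tau                                # sum over j != i of tau_j
    # one coordinate-ascent sweep on the memberships
    logp = (np.log(pi)
            + (C @ tau) @ np.log(B).T - T * (S @ B.T)   # outgoing events
            + (C.T @ tau) @ np.log(B) - T * (S @ B))    # incoming events
    tau = np.exp(logp - logp.max(axis=1, keepdims=True))
    tau /= tau.sum(axis=1, keepdims=True)
    # closed-form updates for block rates and proportions
    num = tau.T @ C @ tau
    den = T * (np.outer(tau.sum(0), tau.sum(0)) - tau.T @ tau)
    B = np.maximum(num / np.maximum(den, 1e-8), 1e-8)
    pi = tau.mean(0)

print("inferred blocks:", tau.argmax(1))    # labels are recovered up to permutation
print("true blocks:    ", z_true)
```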
In this paper, we consider stochastic versions of three classical growth models given by ordinary differential equations (ODEs): the von Bertalanffy, Gompertz, and logistic differential equations. We assume that each stochastic differential equation (SDE) has some crucial parameters in the drift to be estimated, and we use the Maximum Likelihood Estimator (MLE) to estimate them. For estimating the diffusion parameter, we use the MLE in two cases and the quadratic variation of the data for the remaining SDE. We apply the Akaike information criterion (AIC), treated as a function of the drift parameters, to choose the best model for the simulated data. We present a simulation study to validate our selection method. The proposed methodology can be applied to datasets with continuous or discrete observations, as well as to highly sparse data. Indeed, we can use this method even in the extreme case where only one point is observed for each path, provided that a sufficient number of trajectories is observed. In the last two cases, the data can be viewed as incomplete observations of a model with a tractable likelihood function, so we propose a version of the Expectation-Maximization (EM) algorithm to estimate these parameters. Datasets of this type typically appear in fisheries, for instance.
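To illustrate the model-selection step, the sketch below simulates one common stochastic logistic parameterization via Euler-Maruyama, fits logistic and Gompertz drifts by a Gaussian pseudo-likelihood based on the Euler transition density, and compares AIC values; the specific SDE forms and the pseudo-likelihood are assumptions and may differ from the models and exact MLEs used in the paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)

# Euler-Maruyama simulation of a stochastic logistic model
# dX_t = r X_t (1 - X_t/K) dt + sigma X_t dW_t   (one common parameterization)
def simulate_logistic(x0, r, K, sigma, dt, n):
    x = np.empty(n + 1); x[0] = x0
    for t in range(n):
        drift = r * x[t] * (1 - x[t] / K)
        x[t + 1] = x[t] + drift * dt + sigma * x[t] * np.sqrt(dt) * rng.normal()
    return x

# Gaussian pseudo-likelihood based on the Euler transition density
def neg_loglik(theta, x, dt, drift_fn):
    r, K, sigma = theta
    if K <= 0 or sigma <= 0:
        return np.inf
    mu = x[:-1] + drift_fn(x[:-1], r, K) * dt
    sd = sigma * x[:-1] * np.sqrt(dt)
    return -np.sum(-0.5 * np.log(2 * np.pi * sd**2) - (x[1:] - mu)**2 / (2 * sd**2))

logistic_drift = lambda x, r, K: r * x * (1 - x / K)
gompertz_drift = lambda x, r, K: r * x * np.log(K / x)

x = simulate_logistic(x0=5.0, r=0.7, K=100.0, sigma=0.05, dt=0.1, n=500)

aic = {}
for name, drift in [("logistic", logistic_drift), ("gompertz", gompertz_drift)]:
    fit = minimize(neg_loglik, x0=np.array([0.5, x.max(), 0.1]),
                   args=(x, 0.1, drift), method="Nelder-Mead")
    aic[name] = 2 * 3 + 2 * fit.fun          # AIC = 2k - 2 log L with k = 3 parameters
print(aic)                                    # smaller AIC -> preferred model
```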
In this paper, I try to tame "Basu's elephants" (data with extreme selection on observables). I propose new practical large-sample and finite-sample methods for estimating and inferring heterogeneous causal effects (under unconfoundedness) in the empirically relevant context of limited overlap. I develop a general principle called "Stable Probability Weighting" (SPW) that can be used as an alternative to the widely used Inverse Probability Weighting (IPW) technique, which relies on strong overlap. I show that IPW (or its augmented version), when valid, is a special case of the more general SPW (or its doubly robust version), which adjusts for the extremeness of the conditional probabilities of the treatment states. The SPW principle can be implemented using several existing large-sample parametric, semiparametric, and nonparametric procedures for conditional moment models. In addition, I provide new finite-sample results that apply when unconfoundedness is plausible within fine strata. Since IPW estimation relies on the problematic reciprocal of the estimated propensity score, I develop a "Finite-Sample Stable Probability Weighting" (FPW) set-estimator that is unbiased in a specific sense. I also propose new finite-sample inference methods for testing a general class of weak null hypotheses. The associated computationally convenient methods, which can be used to construct valid confidence sets and to bound the finite-sample confidence distribution, are of independent interest. My large-sample and finite-sample frameworks extend to the setting of multivalued treatments.
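The instability that motivates moving beyond IPW is easy to reproduce; the sketch below contrasts plain Horvitz-Thompson IPW (using the true propensities for simplicity) with a crude self-normalized-and-trimmed alternative under limited overlap. The trimming thresholds and the data-generating process are assumptions, and this is not the proposed SPW/FPW methodology.

```python
import numpy as np

rng = np.random.default_rng(11)

# A limited-overlap data-generating process: propensities approach 0 and 1.
n = 5000
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-3.5 * x))                  # true propensity P(D=1 | x)
d = rng.binomial(1, e)
y = 2.0 + 1.0 * d + 0.5 * x + rng.normal(size=n)    # true ATE = 1

# Standard IPW (Horvitz-Thompson) relies on the reciprocal of the propensity,
# which explodes when e(x) is near 0 or 1 under limited overlap.
ipw = np.mean(d * y / e - (1 - d) * y / (1 - e))

# A simple (non-SPW) stabilization: self-normalized (Hajek) weights plus
# trimming of extreme propensities -- one crude way to tame the weights.
lo, hi = 0.05, 0.95
keep = (e > lo) & (e < hi)
w1, w0 = d[keep] / e[keep], (1 - d[keep]) / (1 - e[keep])
hajek_trim = np.sum(w1 * y[keep]) / np.sum(w1) - np.sum(w0 * y[keep]) / np.sum(w0)

print("IPW:", round(ipw, 3), " trimmed Hajek:", round(hajek_trim, 3))
```

Note that trimming changes the target to a trimmed subpopulation, which is one reason more principled adjustments for extreme treatment probabilities are of interest.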
We consider a high-dimensional sparse normal means model where the goal is to estimate the mean vector assuming the proportion of non-zero means is unknown. We model the mean vector by a one-group global-local shrinkage prior belonging to a broad class of such priors that includes the horseshoe prior. We address some questions related to the asymptotic properties of the resulting posterior distribution of the mean vector for this class of priors. We consider two ways to model the global parameter: first, by treating it as an unknown fixed parameter that is replaced by an empirical Bayes estimate, and second, by a hierarchical Bayes treatment in which it is assigned a suitable non-degenerate prior distribution. We first show that, for the class of priors under study, the posterior distribution of the mean vector contracts around the true parameter at a near-minimax rate when the empirical Bayes approach is used. Next, we prove that in the hierarchical Bayes approach, the corresponding Bayes estimate attains the minimax risk asymptotically under the squared error loss function. We also show that the posterior contracts around the true parameter at a near-minimax rate. These results generalize those of van der Pas et al. (2014) \cite{van2014horseshoe} and (2017) \cite{van2017adaptive}, proved for the horseshoe prior. We also study the asymptotic Bayes optimality of global-local shrinkage priors when the number of non-null hypotheses is unknown. Here our aim is to propose conditions on the prior density of the global parameter such that the Bayes risk induced by the decision rule attains the optimal Bayes risk up to a multiplicative constant. Under the asymptotic framework of Bogdan et al. (2011) \cite{bogdan2011asymptotic}, our proposed condition allows us to answer this question in the affirmative.
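For a concrete feel of one-group shrinkage in this model, the sketch below computes the horseshoe posterior mean coordinate-wise by numerical integration over the half-Cauchy local scale, with a simple plug-in empirical Bayes choice of the global parameter; the plug-in rule and simulation settings are assumptions for illustration, not the estimators analyzed in the paper.

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(42)

# Sparse normal means: y_i = theta_i + N(0,1), with most theta_i equal to zero.
n, q = 400, 20
theta = np.zeros(n); theta[:q] = 6.0
y = theta + rng.normal(size=n)

# Simple empirical Bayes plug-in for the global parameter tau
# (fraction of |y_i| above the universal threshold, floored at 1/n).
tau = max(np.mean(np.abs(y) > np.sqrt(2 * np.log(n))), 1.0 / n)

def horseshoe_post_mean(yi, tau):
    """E[theta_i | y_i] = y_i * E[tau^2 lam^2 / (1 + tau^2 lam^2) | y_i],
    obtained by integrating over the half-Cauchy local scale lam."""
    def marg(lam):      # N(y_i; 0, 1 + tau^2 lam^2) times half-Cauchy(lam)
        v = 1.0 + tau**2 * lam**2
        return np.exp(-yi**2 / (2 * v)) / np.sqrt(v) * 2 / (np.pi * (1 + lam**2))
    num = quad(lambda lam: (tau**2 * lam**2 / (1 + tau**2 * lam**2)) * marg(lam),
               0, np.inf, limit=200)[0]
    den = quad(marg, 0, np.inf, limit=200)[0]
    return yi * num / den

theta_hat = np.array([horseshoe_post_mean(v, tau) for v in y])
print("MSE of y:        ", np.mean((y - theta)**2))
print("MSE of horseshoe:", np.mean((theta_hat - theta)**2))
```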
This paper studies computationally and theoretically attractive estimators called Laplace type estimators (LTEs), which include means and quantiles of quasi-posterior distributions defined as transformations of general (non-likelihood-based) statistical criterion functions, such as those in GMM, nonlinear IV, empirical likelihood, and minimum distance methods. The approach generates an alternative to classical extremum estimation and also falls outside the parametric Bayesian approach. For example, it offers a new attractive estimation method for such important semi-parametric problems as censored and instrumental quantile regression, nonlinear GMM, and value-at-risk models. The LTEs are computed using Markov chain Monte Carlo methods, which help circumvent the computational curse of dimensionality. A large-sample theory is obtained for regular cases.
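A minimal sketch of the quasi-Bayesian recipe: build a quasi-posterior proportional to $\exp(L_n(\theta))$ from a GMM criterion and take its mean via random-walk Metropolis. The linear IV design, identity weighting matrix, flat prior, and tuning constants are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear IV model: x is endogenous, z is a valid instrument, true beta = 1.5.
n = 1000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + 0.3 * rng.normal(size=n)
y = 1.5 * x + u

def gmm_criterion(beta):
    """Quasi-log-likelihood L_n(beta) = -(n/2) gbar' W gbar with moments z*(y - x*beta)."""
    g = z * (y - x * beta)
    gbar = np.array([g.mean(), (g * z).mean()])      # two moment conditions
    W = np.eye(2)                                     # identity weighting for the sketch
    return -0.5 * n * gbar @ W @ gbar

# Random-walk Metropolis on the quasi-posterior exp(L_n(beta)) under a flat prior.
draws, beta, logp = [], 0.0, gmm_criterion(0.0)
for _ in range(20000):
    prop = beta + 0.05 * rng.normal()
    logp_prop = gmm_criterion(prop)
    if np.log(rng.uniform()) < logp_prop - logp:
        beta, logp = prop, logp_prop
    draws.append(beta)
draws = np.array(draws[5000:])                        # discard burn-in

print("LTE (quasi-posterior mean):  ", draws.mean())
print("quasi-posterior 90% interval:", np.quantile(draws, [0.05, 0.95]))
```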
We propose a flexible algorithm for feature detection and hypothesis testing in images with ultra-low signal-to-noise ratio using cubical persistent homology. Our main application is the identification of atomic columns and other features in transmission electron microscopy (TEM). Cubical persistent homology is used to identify local minima and their size in subregions of the frames of nanoparticle videos, which are hypothesized to correspond to relevant atomic features. We compare the performance of our algorithm to other methods employed for the detection of columns and their intensity. Additionally, Monte Carlo goodness-of-fit testing using real-valued summaries of persistence diagrams derived from smoothed images (generated from pixels residing in the vacuum region of an image) is developed and employed to identify whether or not the atomic features proposed by our algorithm are due to noise. Using these summaries derived from the generated persistence diagrams, one can produce univariate time series for the nanoparticle videos, thus providing a means for assessing fluxional behavior. A guarantee on the false discovery rate for multiple Monte Carlo testing of identical hypotheses is also established.
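As a self-contained illustration of the 0-dimensional part of such a pipeline (not the authors' implementation, which in practice would use a cubical persistent homology library), the sketch below computes sublevel-set $H_0$ persistence of a noisy frame with a union-find/elder-rule pass, so that dark, persistent local minima stand out as candidate atomic columns; the synthetic frame and 4-connectivity are assumptions.

```python
import numpy as np

def h0_persistence(img):
    """0-dimensional sublevel-set persistence of a 2D image: each local minimum
    is born at its pixel value and dies (elder rule) when its component merges
    with one born at a lower value. Returns (birth, death) pairs; the essential
    bar is reported with death = +inf."""
    h, w = img.shape
    order = np.argsort(img, axis=None, kind="stable")
    parent = -np.ones(h * w, dtype=int)          # -1 means pixel not yet added
    birth = np.full(h * w, np.inf)
    pairs = []

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]        # path compression
            a = parent[a]
        return a

    for p in order:
        i, j = divmod(p, w)
        parent[p] = p
        birth[p] = img[i, j]
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < h and 0 <= nj < w and parent[ni * w + nj] != -1:
                ra, rb = find(p), find(ni * w + nj)
                if ra == rb:
                    continue
                # merge: the component with the larger (younger) birth value dies
                young, old = (ra, rb) if birth[ra] >= birth[rb] else (rb, ra)
                if birth[young] < img[i, j]:     # skip zero-persistence pairs
                    pairs.append((birth[young], img[i, j]))
                parent[young] = old
    roots = {find(p) for p in range(h * w)}
    pairs += [(birth[r], np.inf) for r in roots]
    return sorted(pairs, key=lambda bd: bd[1] - bd[0], reverse=True)

rng = np.random.default_rng(2)
frame = rng.normal(0, 0.2, size=(64, 64))        # noisy background
for ci, cj in [(16, 16), (40, 45), (50, 12)]:    # three dark synthetic "columns"
    ii, jj = np.mgrid[0:64, 0:64]
    frame -= 1.5 * np.exp(-((ii - ci)**2 + (jj - cj)**2) / 8.0)

diagram = h0_persistence(frame)
print(diagram[:5])    # the most persistent minima should match the three columns
```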
In this article we prove that estimator stability is enough to show that leave-one-out cross-validation is a sound procedure, by providing concentration bounds in a general framework. In particular, we provide concentration bounds beyond Lipschitz continuity assumptions on the loss or on the estimator. To obtain our results, we rely on random variables whose distributions satisfy the logarithmic Sobolev inequality, which gives us a relatively rich class of distributions. We illustrate our method on several interesting examples, including linear regression, kernel density estimation, and stabilized/truncated estimators such as stabilized kernel regression.
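As a concrete instance of one of the examples above, the sketch below computes the leave-one-out cross-validated log-likelihood of a Gaussian kernel density estimator and uses it to select a bandwidth; the mixture data and bandwidth grid are assumptions, and the sketch does not touch the concentration-bound analysis itself.

```python
import numpy as np

rng = np.random.default_rng(6)

# Data from a two-component Gaussian mixture.
n = 300
x = np.concatenate([rng.normal(-2, 0.7, n // 2), rng.normal(2, 1.2, n // 2)])

def loo_log_likelihood(x, h):
    """Leave-one-out cross-validated log-likelihood of a Gaussian KDE:
    each point is scored by the density estimated from the other n-1 points."""
    diffs = (x[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(K, 0.0)                       # leave the point itself out
    f_loo = K.sum(axis=1) / ((len(x) - 1) * h)
    return np.mean(np.log(f_loo))

bandwidths = np.linspace(0.05, 1.5, 30)
scores = [loo_log_likelihood(x, h) for h in bandwidths]
best = bandwidths[int(np.argmax(scores))]
print("LOO-selected bandwidth:", round(best, 3))
```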
This note addresses the question of optimally estimating a linear functional of an object acquired through linear observations corrupted by random noise, where optimality pertains to a worst-case setting tied to a symmetric, convex, and closed model set containing the object. It complements the article "Statistical Estimation and Optimal Recovery" published in the Annals of Statistics in 1994. There, Donoho showed (among other things) that, for Gaussian noise, linear maps provide near-optimal estimation schemes relative to a performance measure relevant in Statistical Estimation. Here, we advocate for a different performance measure arguably more relevant in Optimal Recovery. We show that, relative to this new measure, linear maps still provide near-optimal estimation schemes even if the noise is merely log-concave. Our arguments, which make a connection to the deterministic-noise situation and bypass properties specific to the Gaussian case, offer an alternative to parts of Donoho's proof.
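As an illustrative special case, with notation introduced here (model set $K$, observation map $\Lambda$, functional $\ell$, and deterministic noise of level $\varepsilon$ in a norm with dual norm $\|\cdot\|_*$), the worst-case error of a linear recovery map $y \mapsto \langle a, y \rangle$ in the deterministic-noise setting decomposes as
$$
\sup_{x \in K,\ \|e\| \le \varepsilon} \bigl| \ell(x) - \langle a, \Lambda x + e \rangle \bigr|
\;=\; \sup_{x \in K} \bigl| \ell(x) - \langle a, \Lambda x \rangle \bigr| \;+\; \varepsilon \, \|a\|_* ,
$$
so the best linear map balances a model-set approximation term against a noise-amplification term; the stochastic, log-concave-noise performance measure studied in the note is different, but the arguments connect to this deterministic quantity.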