This paper deals with a projection least squares estimator of the function $J_0$ computed from multiple independent observations on $[0,T]$ of the process $Z$ defined by $dZ_t = J_0(t)d\langle M\rangle_t + dM_t$, where $M$ is a centered, continuous and square-integrable martingale vanishing at $0$. Risk bounds are established for this estimator and for an associated adaptive estimator. An appropriate transformation allows us to rewrite the differential equation $dX_t = V(X_t)(b_0(t)dt +\sigma(t)dB_t)$, where $B$ is a fractional Brownian motion with Hurst parameter $H\in (1/2,1)$, as a model of the previous type. The second part of the paper therefore deals with risk bounds for a nonparametric estimator of $b_0$ derived from the results on the projection least squares estimator of $J_0$. In particular, our results apply to the estimation of the drift function in a non-autonomous extension of the fractional Black-Scholes model introduced in Hu et al. (2003).
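For orientation, a natural form of the projection least squares contrast in this setting, minimized over a finite-dimensional function space $S_m$ (a sketch consistent with the model above; the paper's exact normalization may differ), is
$$ \widehat J_m \in \arg\min_{\tau\in S_m} \frac{1}{N}\sum_{i=1}^{N}\left[ \int_0^T \tau(s)^2\, d\langle M^i\rangle_s - 2\int_0^T \tau(s)\, dZ^i_s \right], $$
where $Z^1,\dots,Z^N$ denote the independent copies of $Z$; in expectation the bracketed term equals $\|\tau - J_0\|^2 - \|J_0\|^2$ in the norm induced by $d\langle M\rangle$, so it is minimized at $\tau = J_0$.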
Given that many popular functional forms for the Lorenz curve do not have a closed-form expression for the Gini index, and that no study has utilized the observed Gini index to estimate the parameter(s) of the corresponding parametric functional form, a simple method for estimating the Lorenz curve is introduced. It utilizes three indicators, namely the Gini index and the income shares of the bottom and the top, to calculate the values of the parameters of a specified functional form that has a closed-form expression for the Gini index. No error-minimization technique is required to estimate the Lorenz curve. Data on the Gini index and the income shares of four countries with different levels of income inequality and different economic, sociological, and regional backgrounds, taken from the United Nations University-World Income Inequality Database, are used to illustrate how the method works. The overall results indicate that the estimated Lorenz curves fit the actual observations quite well. This simple method could be useful in situations where data on income distribution are scarce. However, this study also shows that, when more data on income distribution are available, the specified functional form can be used to estimate the Lorenz curve directly. Moreover, the estimated values of the Gini index calculated from the specified functional form are virtually identical to their actual observations.
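As a minimal illustration of the idea (using the one-parameter Pareto Lorenz curve rather than the paper's functional form): the Pareto form $L(p) = 1 - (1-p)^{1-1/\alpha}$ has the closed-form Gini index $G = 1/(2\alpha - 1)$, so $\alpha$ can be computed directly from an observed Gini index, with no error minimization.

```python
# Minimal sketch (not the paper's functional form): calibrate the
# one-parameter Pareto Lorenz curve from an observed Gini index alone.

def alpha_from_gini(gini):
    """Invert the closed form G = 1/(2*alpha - 1) for the shape alpha."""
    return 0.5 * (1.0 / gini + 1.0)

def lorenz(p, alpha):
    """Pareto Lorenz curve L(p) = 1 - (1 - p)**(1 - 1/alpha)."""
    return 1.0 - (1.0 - p) ** (1.0 - 1.0 / alpha)

gini_observed = 0.40                      # hypothetical observed Gini index
alpha = alpha_from_gini(gini_observed)
bottom_40_share = lorenz(0.40, alpha)     # income share of the bottom 40%
top_10_share = 1.0 - lorenz(0.90, alpha)  # income share of the top 10%
print(alpha, bottom_40_share, top_10_share)
```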
In this study we consider the problem of detecting and quantifying changes in the distribution of the annual maximum daily maximum temperature (TXx) in a large gridded data set of European daily temperature for the years 1950-2018. Several statistical models are considered, each of which models TXx using a generalized extreme value (GEV) distribution with the GEV parameters varying smoothly over space. In contrast to several previous studies, which fit independent GEV models at the grid box level, our models pull information from neighbouring grid boxes for more efficient parameter estimation. The GEV location and scale parameters are allowed to vary in time using the log of atmospheric CO2 as a covariate. Changes are detected most strongly in the GEV location parameter, with the TXx distributions generally shifting towards hotter temperatures. Averaged across our spatial domain, the 100-year return level of TXx based on the 2018 climate is approximately 2°C hotter than that based on the 1950 climate. Moreover, again averaging across our spatial domain, the 100-year return level of TXx based on the 1950 climate corresponds approximately to a 6-year return level in the 2018 climate.
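A minimal sketch of the return-level computations described above, using SciPy's GEV implementation with hypothetical parameter values (in the paper, the location and scale depend on log atmospheric CO2):

```python
# Sketch: GEV return levels with SciPy (all parameter values hypothetical).
from scipy.stats import genextreme

mu, sigma, xi = 30.0, 2.0, -0.1   # GEV location, scale, shape (hypothetical)
T = 100.0                          # return period in years
# SciPy parametrizes the GEV with c = -xi relative to the usual convention.
z_T = genextreme.isf(1.0 / T, c=-xi, loc=mu, scale=sigma)
print(z_T)                         # 100-year return level

# Return period of that same level under a shifted (hypothetical) later
# climate: a 2-degree increase in location shortens the return period.
mu_later = mu + 2.0
print(1.0 / genextreme.sf(z_T, c=-xi, loc=mu_later, scale=sigma))
```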
This study considers a new multi-term urn process that has both within-term correlation and temporal correlation. The objective is to clarify the relationship between the urn model and the Hawkes process. The within-term correlation is represented by the P\'{o}lya urn model, and the temporal correlation is incorporated by introducing a conditional initial condition. In the double-scaling limit of this urn process, the self-exciting negative binomial distribution (SE-NBD) process, which is a marked point process, is obtained. In the standard continuous limit, this process becomes the Hawkes process, which has no within-term correlation. The difference between the two lies in the variance of the intensity function, for which a phase transition from the steady to the non-steady state can be observed. The critical point, at which a power-law distribution is obtained, is the same for the Hawkes and the urn processes. These two processes are used to analyze empirical data on financial defaults and to estimate the parameters of the models. For the default portfolio, the results produced by the urn process are superior to those obtained with the Hawkes process and confirm self-excitation.
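As noted above, the Hawkes process arises in the standard continuous limit. A minimal sketch of simulating a Hawkes process with an exponential kernel via Ogata's thinning algorithm (the parameters and kernel choice are hypothetical, not taken from the paper):

```python
# Sketch: Hawkes process simulation by Ogata's thinning algorithm.
import math
import random

def simulate_hawkes(mu, alpha, beta, t_max, seed=0):
    """Intensity lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta*(t - t_i)).
    Stationarity requires alpha/beta < 1."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        # The intensity is non-increasing between events, so its value
        # just after time t dominates it on the candidate interval.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)        # candidate waiting time
        if t >= t_max:
            return events
        lam_t = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        if rng.random() <= lam_t / lam_bar:  # accept with prob lam(t)/lam_bar
            events.append(t)

print(len(simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, t_max=100.0)))
```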
Trimming consists of cutting away parts of a geometric domain without reconstructing a global parametrization (meshing). It is a widely used operation in computer-aided design, and it generates meshes that are not fitted to the physical object they describe. This paper develops an adaptive mesh refinement strategy on trimmed geometries in the context of hierarchical B-spline based isogeometric analysis. A residual-based a posteriori estimator of the energy norm of the numerical approximation error is derived for the Poisson equation. The reliability of the estimator is proven, and the effectivity index is shown to be independent of the number of hierarchical levels and of the way the trimmed boundaries cut the underlying mesh. In particular, it is thus independent of the size of the active part of the trimmed mesh elements. Numerical experiments are performed to validate the presented theory.
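Element-wise indicators from such an estimator typically drive a SOLVE, ESTIMATE, MARK, REFINE loop. A minimal sketch of one standard ingredient, Dörfler (bulk) marking, with hypothetical indicator values; the paper may use a different marking strategy:

```python
# Sketch: Doerfler marking selects the smallest set M of elements with
# sum of eta_K**2 over M >= theta * (total of eta_K**2), so refinement
# concentrates where the a posteriori indicators are largest.

def doerfler_mark(indicators, theta=0.5):
    order = sorted(range(len(indicators)), key=lambda k: -indicators[k])
    total = sum(e * e for e in indicators)
    marked, acc = [], 0.0
    for k in order:
        marked.append(k)
        acc += indicators[k] ** 2
        if acc >= theta * total:
            break
    return marked

eta = [0.9, 0.1, 0.4, 0.05, 0.7]  # hypothetical per-element indicators
print(doerfler_mark(eta))          # indices of elements to refine
```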
The R package BayesPPD (Bayesian Power Prior Design) supports Bayesian power and type I error calculations and model fitting after incorporating historical data with the power prior and the normalized power prior for generalized linear models (GLMs). The package accommodates summary-level data or subject-level data with covariate information. It supports the use of multiple historical datasets as well as designs without historical data. Supported response distributions include normal, binary (Bernoulli/binomial), Poisson and exponential. The power parameter $a_0$ can be fixed or modeled as random using a normalized power prior for each of these distributions. In addition, the package supports the use of arbitrary sampling priors for computing Bayesian power and type I error rates, and it has specific features for GLMs that semi-automatically generate sampling priors from the historical data. Since sample size determination (SSD) for GLMs is computationally intensive, an approximation method based on asymptotic theory has been implemented to support applications of the power prior. In addition to describing the statistical methodology and the functions implemented in the package to enable SSD, we demonstrate the use of BayesPPD in two comprehensive case studies.
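For reference, given historical data $D_0$ and an initial prior $\pi_0(\theta)$, the power prior and the normalized power prior referred to above take the standard forms
$$ \pi(\theta\mid D_0, a_0) \propto L(\theta\mid D_0)^{a_0}\,\pi_0(\theta), \qquad \pi(\theta, a_0\mid D_0) \propto \frac{L(\theta\mid D_0)^{a_0}\,\pi_0(\theta)}{\int L(\theta\mid D_0)^{a_0}\,\pi_0(\theta)\,d\theta}\,\pi(a_0), $$
where $L$ is the likelihood and $a_0\in[0,1]$ discounts the historical data ($a_0=0$ discards it, $a_0=1$ pools it with the current data); the second form treats $a_0$ as random with its own prior $\pi(a_0)$.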
Chen and Zhou (2021) consider an inference problem for an Ornstein-Uhlenbeck process driven by a general one-dimensional centered Gaussian process $(G_t)_{t\ge 0}$, where the second-order mixed partial derivative of the covariance function $R(t,\, s)=\mathbb{E}[G_t G_s]$ can be decomposed into two parts, one of which coincides with that of fractional Brownian motion while the other is bounded by $(ts)^{H-1}$ with $H\in (\frac12,\,1)$, up to a constant factor. In this paper, we investigate the same problem under the assumption $H\in (0,\,\frac12)$. It is well known that there is a significant difference between the Hilbert space associated with fractional Gaussian processes for $H\in (\frac12, 1)$ and for $H\in (0, \frac12)$. The starting point of this paper is a new relationship between the inner product of the Hilbert space $\mathfrak{H}$ associated with the Gaussian process $(G_t)_{t\ge 0}$ and that of the Hilbert space $\mathfrak{H}_1$ associated with the fractional Brownian motion $(B^{H}_t)_{t\ge 0}$. We then prove strong consistency for $H\in (0, \frac12)$, and asymptotic normality together with Berry-Ess\'{e}en bounds for $H\in (0,\frac38)$, for both the least squares estimator and the moment estimator of the drift parameter constructed from continuous observations. The proofs involve a good many inequality estimates, and we also estimate the inner product using results for $\mathfrak{H}_1$ from Hu, Nualart and Zhou (2019).
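For orientation, in this line of work the model and the two estimators typically take the form (a sketch only; the exact normalizing constants in the paper may differ, and the moment estimator is written here in its fractional Brownian motion form)
$$ dX_t = -\theta X_t\,dt + dG_t,\qquad \hat\theta_T = -\frac{\int_0^T X_t\,dX_t}{\int_0^T X_t^2\,dt},\qquad \tilde\theta_T = \left(\frac{1}{H\Gamma(2H)\,T}\int_0^T X_t^2\,dt\right)^{-1/(2H)}, $$
where $\hat\theta_T$ is the least squares estimator and $\tilde\theta_T$ the moment estimator of the drift parameter $\theta>0$.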
This study concerns estimating the probability distribution of a sample maximum. The traditional approach is a parametric fit of the limiting distribution, the generalized extreme value (GEV) distribution; however, in finite samples this model is misspecified to some extent. We propose a plug-in type of kernel distribution estimator which does not require model specification. It is proved that the asymptotic convergence rates of both estimators depend on the tail index and the second-order parameter. As the tail becomes lighter, the degree of misspecification of the parametric fit becomes larger, which means its convergence rate becomes slower. In the Weibull case, which can be seen as the limit of tail lightness, only the nonparametric distribution estimator retains its consistency. Finally, we report results of numerical experiments and two real case studies.
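A minimal sketch of one plausible reading of such a plug-in estimator: smooth the empirical CDF with a kernel and raise it to the block size $m$, estimating $P(\max_{1\le j\le m} X_j \le x) = F(x)^m$ (the bandwidth rule and block size here are hypothetical, not the paper's choices):

```python
# Sketch: plug-in kernel estimator of the distribution of a sample maximum.
import numpy as np
from scipy.stats import norm

def kernel_cdf(x, sample, h):
    """Gaussian-kernel distribution estimator \\hat F(x)."""
    return norm.cdf((x - sample[:, None]) / h).mean(axis=0)

rng = np.random.default_rng(0)
sample = rng.standard_normal(500)           # observed data (hypothetical)
h = sample.std() * len(sample) ** (-1 / 3)  # rough bandwidth (hypothetical)
x = np.linspace(1.0, 4.0, 7)
m = 365                                     # block size of the maximum
print(kernel_cdf(x, sample, h) ** m)        # estimate of P(max of m <= x)
```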
The paper concerns convergence and asymptotic statistics for stochastic approximation driven by Markovian noise: $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) \,,\quad n\ge 0, $$ in which each $\theta_n\in\Re^d$, $ \{ \Phi_n \}$ is a Markov chain on a general state space $\text{X}$ with stationary distribution $\pi$, and $f:\Re^d\times \text{X} \to\Re^d$. In addition to standard Lipschitz bounds on $f$ and conditions on the vanishing step-size sequence $\{\alpha_n\}$, it is assumed that the associated mean-flow ODE $\dot\vartheta_t = \bar f(\vartheta_t)$, with $\bar f(\theta)=E[f(\theta,\Phi)]$ and $\Phi\sim\pi$, is globally asymptotically stable, with stationary point denoted $\theta^*$. Moreover, the ODE@$\infty$ defined with respect to the vector field $$ \bar f_\infty(\theta):= \lim_{r\to\infty} r^{-1} \bar f(r\theta) \,,\qquad \theta\in\Re^d, $$ is asymptotically stable. The main contributions are summarized as follows: (i) The sequence $\{\theta_n\}$ is convergent if $\Phi$ is geometrically ergodic, subject to compatible bounds on $f$. The remaining results are established under a stronger assumption on the Markov chain: a slightly weaker version of the Donsker-Varadhan Lyapunov drift condition known as (DV3). (ii) A Lyapunov function is constructed for the joint process $\{\theta_n,\Phi_n\}$ that implies convergence of $\{ \theta_n\}$ in $L_4$. (iii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error $z_n:= (\theta_n-\theta^*)/\sqrt{\alpha_n}$. Moment bounds combined with the CLT imply convergence of the normalized covariance, $$ \lim_{n \to \infty} E [ z_n z_n^T ] = \Sigma_\theta, $$ where $\Sigma_\theta$ is the asymptotic covariance appearing in the CLT. (iv) An example is provided in which the Markov chain $\Phi$ is geometrically ergodic but does not satisfy (DV3); while the algorithm is convergent, the second moment of $\theta_n$ is unbounded.
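A minimal sketch of the recursion with Markovian noise, here a toy scalar example (not from the paper) with $f(\theta,\Phi)=\Phi-\theta$ on a two-state chain, so the mean-flow ODE is $\dot\vartheta = E_\pi[\Phi] - \vartheta$, globally stable at $\theta^* = E_\pi[\Phi]$:

```python
# Sketch: scalar stochastic approximation driven by a two-state Markov chain.
import random

random.seed(1)
P = {0: [0.9, 0.1], 1: [0.2, 0.8]}  # hypothetical transition kernel
values = [0.0, 1.0]                 # Phi takes these values
# Stationary distribution is pi = (2/3, 1/3), so theta* = E_pi[Phi] = 1/3.

phi, theta = 0, 0.0
for n in range(1, 200_000):
    phi = 0 if random.random() < P[phi][0] else 1   # advance the chain
    alpha = 1.0 / n                                 # vanishing step size
    theta += alpha * (values[phi] - theta)          # f(theta, Phi) = Phi - theta
print(theta)  # approaches theta* = 1/3
```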
We propose a new method of estimation in topic models which is not a variation on the existing simplex-finding algorithms and which estimates the number of topics $K$ from the observed data. We derive new finite-sample minimax lower bounds for the estimation of the word-topic matrix $A$, as well as new upper bounds for our proposed estimator. We describe the scenarios in which our estimator is minimax adaptive. Our finite-sample analysis is valid for any number of documents ($n$), individual document length ($N_i$), dictionary size ($p$) and number of topics ($K$); both $p$ and $K$ are allowed to grow with $n$, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than the current ones, even though it starts out with the computational and theoretical disadvantage of not knowing the correct number of topics $K$, while the competing methods are given the correct value in our simulations.
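For context, these quantities refer to the standard topic-model setup (a sketch of the usual formulation, with notation matching the abstract):
$$ X_i \sim \mathrm{Multinomial}_p(N_i,\, \Pi_i), \qquad \Pi_i = A\, W_i, \qquad i = 1,\dots,n, $$
where the columns of $A \in \mathbb{R}^{p\times K}$ are probability vectors over the dictionary (one per topic) and $W_i$ lies in the $K$-dimensional probability simplex and holds the topic weights of document $i$; the goal is to estimate $A$, and here also $K$, from the observed word counts $X_1,\dots,X_n$.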
Image segmentation remains an open problem, especially when the intensities of the objects of interest overlap due to the presence of intensity inhomogeneity (also known as a bias field). To segment images with intensity inhomogeneities, a bias-correction-embedded level set model is proposed, in which Inhomogeneities are Estimated by Orthogonal Primary Functions (IEOPF). In the proposed model, the smoothly varying bias is estimated by a linear combination of a given set of orthogonal primary functions. An inhomogeneous intensity clustering energy is then defined, and membership functions of the clusters described by the level set function are introduced to rewrite the energy as the data term of the proposed model. As in popular level set methods, a regularization term and an arc length term are also included to regularize and smooth the level set function, respectively. The proposed model is then extended to multichannel and multiphase versions to segment colour images and images with multiple objects, respectively. It has been extensively tested on synthetic and real images that are widely used in the literature, as well as on the public BrainWeb and IBSR datasets. Experimental results and comparison with state-of-the-art methods demonstrate the advantages of the proposed model in terms of bias correction and segmentation accuracy.
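A minimal sketch of representing a smooth bias field as a linear combination of orthogonal basis functions, here tensor-product Legendre polynomials (a common choice for illustration; the paper's primary functions and the coefficient values below are hypothetical):

```python
# Sketch: smooth bias field as a linear combination of orthogonal
# 2-D Legendre basis functions evaluated on the image grid.
import numpy as np
from numpy.polynomial import legendre

def bias_field(coeffs, shape):
    """Evaluate sum_k w_k * P_k on a grid normalized to [-1, 1]^2."""
    ny, nx = shape
    Y, X = np.meshgrid(np.linspace(-1.0, 1.0, ny),
                       np.linspace(-1.0, 1.0, nx), indexing="ij")
    # legvander2d builds the tensor-product Legendre design matrix,
    # one column per basis function up to the given degrees.
    V = legendre.legvander2d(Y, X, deg=[2, 2])
    return V @ coeffs  # shape (ny, nx)

w = np.array([1.0, 0.1, 0.05, 0.2, 0.0, 0.0, 0.1, 0.0, 0.0])  # 3x3 coeffs
print(bias_field(w, (4, 4)))
```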