Inhomogeneous phase-type (IPH) distributions are a broad class of laws that arise as the absorption times of time-inhomogeneous Markov jump processes. In the particular time-homogeneous case, we recover the classical phase-type (PH) distributions. In matrix notation, various functionals corresponding to their distributional properties are explicitly available and succinctly described. As the number of parameters increases, IPH distributions can approximate any probability measure on the positive real line arbitrarily well in the sense of weak convergence, making them particularly attractive candidates for statistical modelling purposes. In contrast to PH distributions, the IPH class allows for a wide range of tail behaviours, which often leads to adequate estimation with a moderate number of parameters. One of the main difficulties in estimating PH and IPH distributions is their large number of matrix parameters. This drawback is best handled through the expectation-maximisation (EM) algorithm, which exploits the underlying, unobserved Markov structure. The matrixdist package provides tools for IPH distributions to efficiently evaluate functionals, simulate, and carry out maximum likelihood estimation through a three-step EM algorithm. Aggregated and right-censored data are supported by the fitting routines, and in particular, one may fit models to time-to-event data, histograms, or discretised theoretical distributions.
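To make the matrix notation concrete, the following is a minimal sketch of PH functional evaluation (the time-homogeneous special case): the density is f(t) = alpha exp(Tt) s with exit-rate vector s = -T1, and the distribution function is F(t) = 1 - alpha exp(Tt) 1. The function names are illustrative Python analogues, not the matrixdist API, which is an R package.

```python
# Sketch of phase-type (PH) density and CDF evaluation in matrix notation.
# alpha: initial distribution over transient states; T: sub-intensity matrix.
import numpy as np
from scipy.linalg import expm

def ph_density(t, alpha, T):
    """f(t) = alpha @ expm(T t) @ s, with exit-rate vector s = -T @ 1."""
    s = -T @ np.ones(T.shape[0])
    return float(alpha @ expm(T * t) @ s)

def ph_cdf(t, alpha, T):
    """F(t) = 1 - alpha @ expm(T t) @ 1."""
    return 1.0 - float(alpha @ expm(T * t) @ np.ones(T.shape[0]))

# Example: a 2-phase Coxian distribution.
alpha = np.array([1.0, 0.0])
T = np.array([[-2.0, 1.0],
              [0.0, -3.0]])
print(ph_density(1.0, alpha, T), ph_cdf(1.0, alpha, T))
```

In the inhomogeneous case the matrix exponential is replaced by a product integral of the time-dependent sub-intensity matrix; the homogeneous case above illustrates the functionals being computed.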
We consider the least-squares regression problem with unknown noise variance, where the observed data points are allowed to be corrupted by outliers. Building on the median-of-means (MOM) method introduced by Lecué and Lerasle (Ann. Statist. 48(2):906-931, 2020) in the case of known noise variance, we propose a general MOM approach for simultaneous inference of both the regression function and the noise variance, requiring only an upper bound on the noise level. Interestingly, this generalization requires care due to regularity issues that are intrinsic to the underlying convex-concave optimization problem. In the general case where the regression function belongs to a convex class, we show that our simultaneous estimator achieves, with high probability, the same convergence rates and a similar risk bound as if the noise level were known, as well as convergence rates for the estimated noise standard deviation. In the high-dimensional sparse linear setting, our estimator yields a robust analog of the square-root LASSO. Under weak moment conditions, it jointly achieves with high probability the minimax rates of estimation $s^{1/p} \sqrt{(1/n) \log(p/s)}$ for the $\ell_p$-norm of the coefficient vector, and the rate $\sqrt{(s/n) \log(p/s)}$ for the estimation of the noise standard deviation. Here $n$ denotes the sample size, $p$ the dimension and $s$ the sparsity level. We finally propose an extension to the case of unknown sparsity level $s$, providing a jointly adaptive estimator $(\widetilde \beta, \widetilde \sigma, \widetilde s)$. It simultaneously estimates the coefficient vector, the noise level and the sparsity level, with proven bounds on each of these three components that hold with high probability.
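The basic robustness device behind MOM estimators can be illustrated in a few lines: split the sample into blocks, average within blocks, and take the median of the block means, so that a few gross outliers contaminate only a minority of blocks. This is a sketch of the elementary MOM mean estimator, not the paper's full convex-concave regression procedure.

```python
# Median-of-means (MOM): block means are robustified by taking their median.
import numpy as np

def mom_mean(x, n_blocks):
    """MOM estimate of the mean: median of n_blocks block averages."""
    rng = np.random.default_rng(0)        # shuffle so outliers spread across blocks
    x = rng.permutation(np.asarray(x, dtype=float))
    blocks = np.array_split(x, n_blocks)
    return float(np.median([b.mean() for b in blocks]))

# Five gross outliers contaminate at most five of the eleven blocks,
# so the median of the block means is unaffected.
data = np.concatenate([np.full(95, 1.0), np.full(5, 1e6)])
print(mom_mean(data, n_blocks=11))
```

Choosing the number of blocks larger than twice the number of outliers guarantees a majority of uncontaminated blocks, which is the mechanism that delivers high-probability bounds under weak moment conditions.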
distr6 is an object-oriented (OO) probability distributions interface leveraging the extensibility and scalability of R6 and the speed and efficiency of Rcpp. Over 50 probability distributions are currently implemented in the package, with `core' methods including density, distribution, and generating functions, and more `exotic' ones including hazards and distribution function anti-derivatives. In addition to simple distributions, distr6 supports compositions such as truncation, mixtures, and product distributions. This paper presents the core functionality of the package and demonstrates examples for key use cases. In addition, this paper provides a critical review of the object-oriented programming paradigms in R and describes some novel implementations of design patterns and core object-oriented features introduced by the package to support distr6 components.
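The composition idea, a wrapper distribution that delegates to a base distribution, can be sketched in any OO language. Below is an illustrative Python analogue of a truncation composite; the class and method names are invented for this sketch and do not mirror the distr6 (R6-based) API.

```python
# Sketch: an OO distribution interface with a truncation composite.
from dataclasses import dataclass
import math

class Distribution:
    def pdf(self, x): raise NotImplementedError
    def cdf(self, x): raise NotImplementedError

@dataclass
class Exponential(Distribution):
    rate: float = 1.0
    def pdf(self, x): return self.rate * math.exp(-self.rate * x) if x >= 0 else 0.0
    def cdf(self, x): return 1.0 - math.exp(-self.rate * x) if x >= 0 else 0.0

@dataclass
class Truncated(Distribution):
    """Composite: the base law conditioned on lower < X <= upper."""
    base: Distribution
    lower: float
    upper: float
    def _mass(self): return self.base.cdf(self.upper) - self.base.cdf(self.lower)
    def pdf(self, x):
        if not (self.lower < x <= self.upper): return 0.0
        return self.base.pdf(x) / self._mass()
    def cdf(self, x):
        if x <= self.lower: return 0.0
        if x > self.upper: return 1.0
        return (self.base.cdf(x) - self.base.cdf(self.lower)) / self._mass()

d = Truncated(Exponential(rate=2.0), lower=0.5, upper=2.0)
print(d.cdf(2.0))   # 1.0 by construction of the truncated law
```

The composite exposes the same interface as a simple distribution, so truncation, mixtures, and products can be nested freely, which is the design pattern the package builds on.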
The problems of matrix spectral factorization and J-spectral factorization are important in practice for many MIMO control systems. We propose a numerical algorithm for J-spectral factorization that extends the Janashia-Lagvilava matrix spectral factorization method to the indefinite case. The algorithm can be applied to matrices whose leading principal submatrices all have constant signature. A numerical example is presented for illustrative purposes.
The primary objective of this work is to develop two estimation procedures - the maximum likelihood estimator (MLE) and the method of trimmed moments (MTM) - for the mean and variance of lognormal insurance payment severity data affected by different loss control mechanisms, such as truncation (due to deductibles), censoring (due to policy limits), and scaling (due to coinsurance proportions), in the insurance and financial industries. Maximum likelihood estimating equations are derived for both payment-per-payment and payment-per-loss data sets; they can be solved readily by existing iterative numerical methods. The asymptotic distributions of these estimators are established via Fisher information matrices. Further, with the goal of balancing efficiency and robustness and of removing point masses at certain data points, we develop dynamic MTM estimation procedures for lognormal claim severity models under the above-mentioned transformed data scenarios. The asymptotic distributional properties of these MTM estimators, and their comparison with the corresponding MLEs, are established along with extensive simulation studies. Purely for illustrative purposes, numerical examples based on 1500 US indemnity losses are provided to demonstrate the practical performance of the established results.
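One of the transformed-data scenarios, left truncation at a deductible (payment-per-payment data), admits a short MLE sketch: the likelihood uses the lognormal density conditioned on the loss exceeding the deductible. The variable names and simulation setup below are illustrative, not the paper's notation, and censoring and coinsurance are omitted.

```python
# Sketch: MLE for a lognormal severity model under left truncation at a
# deductible d, i.e. only losses with x > d are observed.
import numpy as np
from scipy import optimize, stats

def negloglik(params, x, d):
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    # log of the conditional density f(x) / S(d) for x > d
    logf = stats.lognorm.logpdf(x, s=sigma, scale=np.exp(mu))
    logS = stats.lognorm.logsf(d, s=sigma, scale=np.exp(mu))
    return -(logf - logS).sum()

rng = np.random.default_rng(1)
mu_true, sigma_true, d = 1.0, 0.5, 2.0
x = stats.lognorm.rvs(s=sigma_true, scale=np.exp(mu_true), size=20000, random_state=rng)
x = x[x > d]                       # losses below the deductible are unobserved
res = optimize.minimize(negloglik, x0=[0.0, 1.0], args=(x, d), method="Nelder-Mead")
mu_hat, sigma_hat = res.x
print(mu_hat, sigma_hat)
```

The estimating equations solved here numerically are the ones whose asymptotic behaviour the paper characterises via the Fisher information matrix.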
Background: Subgroup analyses are frequently conducted in randomized clinical trials (RCTs) to assess evidence of heterogeneous treatment effects across patient subpopulations. Although randomization balances covariates within subgroups in expectation, chance imbalance may be amplified in small subgroups and harm precision. Covariate adjustment in the overall analysis of an RCT is often conducted, via either ANCOVA or propensity score weighting, but has rarely been discussed for subgroup analysis. In this article, we develop propensity score weighting methodology for covariate adjustment to improve the precision and power of subgroup analyses in RCTs. Methods: We extend the propensity score weighting methodology to subgroup analyses by fitting a logistic propensity model with pre-specified covariate-subgroup interactions. We show that, by construction, overlap weighting exactly balances the covariates with interaction terms in each subgroup. Extensive simulations were performed to compare the operating characteristics of the unadjusted estimator, several propensity score weighting estimators, and the ANCOVA estimator. We apply these methods to the HF-ACTION trial to evaluate the effect of exercise training on the 6-minute walk test in pre-specified subgroups. Results: Standard errors of the adjusted estimators are smaller than those of the unadjusted estimator. The propensity score weighting estimator is as efficient as ANCOVA, and is often more efficient when the subgroup sample size is small (e.g., < 125) and/or when the outcome model is misspecified. The weighting estimators with the full-interaction propensity model consistently outperform those with the standard main-effects propensity model. Conclusion: Propensity score weighting is a transparent and objective method to adjust for chance imbalance of important covariates in subgroup analyses of RCTs. It is crucial to include the full covariate-subgroup interactions in the propensity score model.
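The mechanics of the method, a logistic propensity model with covariate-subgroup interactions followed by overlap weighting (weight 1 - e(x) for treated, e(x) for control), can be sketched on simulated data. This is an illustration of the exact-balance property, not the HF-ACTION analysis; all data below are synthetic.

```python
# Sketch: overlap weighting with full covariate-subgroup interactions in a
# simulated two-arm RCT with one covariate and one binary subgroup.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                    # baseline covariate
g = rng.integers(0, 2, size=n)            # pre-specified subgroup indicator
z = rng.integers(0, 2, size=n)            # randomized treatment assignment
y = 1.0 * z + 0.8 * x + rng.normal(size=n)

# Propensity model with the full covariate-subgroup interaction; exact
# balance requires the unpenalized MLE, hence the large C.
X = np.column_stack([x, g, x * g])
e = LogisticRegression(C=1e8, max_iter=1000).fit(X, z).predict_proba(X)[:, 1]
w = np.where(z == 1, 1 - e, e)            # overlap weights

effects, balances = {}, {}
for sub in (0, 1):
    m = g == sub
    t, c = m & (z == 1), m & (z == 0)
    effects[sub] = np.average(y[t], weights=w[t]) - np.average(y[c], weights=w[c])
    # By construction, overlap weights balance x within each subgroup.
    balances[sub] = np.average(x[t], weights=w[t]) - np.average(x[c], weights=w[c])
    print(f"subgroup {sub}: effect {effects[sub]:.3f}, x-imbalance {balances[sub]:.2e}")
```

The within-subgroup imbalance of x is numerically zero because the score equations of the logistic MLE, applied to the interaction columns, coincide with the overlap-weighted balance conditions in each subgroup.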
We provide upper bounds on the perturbation of invariant subspaces of normal matrices, measured using a metric on the space of vector subspaces of $\mathbb{C}^n$. We derive the upper bounds in terms of (1) the spectra of both the unperturbed and perturbed matrices, as well as (2) the spectrum of the unperturbed matrix only. We show that if the spectrum is well-clustered (a relation formally described as a "separation-preserving perturbation"), the latter kind of upper bound is possible and the corresponding perturbed subspace is also computable. All results are computationally favorable (e.g., computing the bounds does not require combinatorial searches or solving non-trivial optimization problems). We apply the results to a graph perturbation problem.
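The quantity being bounded can be illustrated numerically: for a symmetric (hence normal) matrix with two well-separated eigenvalue clusters, the invariant subspace of one cluster moves very little under a small symmetric perturbation, as measured by principal angles. This sketch shows the perturbation metric, not the paper's bounds.

```python
# Sketch: invariant-subspace perturbation of a normal matrix measured by
# the largest principal angle between the original and perturbed subspaces.
import numpy as np
from scipy.linalg import subspace_angles, eigh

rng = np.random.default_rng(0)
A = np.diag([1.0, 1.1, 5.0, 5.2])     # two well-separated eigenvalue clusters
E = 1e-3 * rng.standard_normal((4, 4))
E = (E + E.T) / 2                     # symmetric perturbation keeps A + E normal

_, U = eigh(A)                        # eigenvalues ascending, so the first two
_, V = eigh(A + E)                    # columns span the small-cluster subspace
theta = subspace_angles(U[:, :2], V[:, :2])
print(np.sin(theta).max())            # small, since the cluster gap is large
```

The ratio of perturbation size to cluster separation governs this quantity, which is the intuition behind the "separation-preserving perturbation" condition.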
In this paper, the exact distribution of the largest eigenvalue of a singular random matrix for multivariate analysis of variance (MANOVA) is discussed. The key to developing the distribution theory of eigenvalues of a singular random matrix is to use heterogeneous hypergeometric functions with two matrix arguments. In this study, we define the singular beta F-matrix and extend the distributions of a nonsingular beta F-matrix to the singular case. We also give the joint density of the eigenvalues and the exact distribution of the largest eigenvalue in terms of heterogeneous hypergeometric functions.
We consider the estimation of the transition matrix of a hidden Markovian process by using information geometry with respect to transition matrices. In this paper, only the histogram of $k$-memory data is used for the estimation. To establish our method, we focus on a partial observation model with the Markovian process and propose an efficient estimator whose asymptotic estimation error is given as the inverse of the projective Fisher information of transition matrices. This estimator is applied to the estimation of the transition matrix of the hidden Markovian process. In this application, we carefully discuss the equivalence problem for hidden Markovian processes on the tangent space.
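The role of the $k$-memory histogram is easiest to see in the fully observed case with $k = 2$: the pair counts of consecutive states are a sufficient statistic, and row-normalizing them estimates the transition matrix. This toy sketch omits the hidden (partially observed) structure and the projective Fisher information analysis that are the paper's actual subject.

```python
# Sketch: transition-matrix estimation from the histogram of 2-memory data
# (pair counts) for a fully observed two-state chain.
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])                 # true transition matrix

# Simulate the chain and accumulate the 2-memory histogram N[i, j].
n, x = 100_000, 0
N = np.zeros((2, 2))
for _ in range(n):
    x_next = 0 if rng.random() < P[x, 0] else 1
    N[x, x_next] += 1
    x = x_next

P_hat = N / N.sum(axis=1, keepdims=True)   # row-normalize the pair counts
print(np.round(P_hat, 3))
```

For a hidden process the observed symbols are a function of the latent state, so the histogram no longer identifies the transition matrix directly, which is where the equivalence problem on the tangent space enters.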
In this paper, we investigate the classical and Bayesian estimation of the unknown parameters of the Gumbel type-II distribution based on an adaptive type-II progressive hybrid censored sample (AT-II PHCS). The maximum likelihood estimates (MLEs) and maximum product spacing estimates (MPSEs) are developed and computed numerically using the Newton-Raphson method. Bayesian approaches are employed to estimate the parameters under symmetric and asymmetric loss functions. Since the Bayesian estimates are not available in explicit form, they are obtained using the Markov chain Monte Carlo (MCMC) method along with the Metropolis-Hastings (MH) algorithm. Based on the asymptotic normality of the MLEs, asymptotic confidence intervals are constructed. Bootstrap intervals and highest posterior density (HPD) credible intervals are also constructed. Further, a Monte Carlo simulation study is carried out. Finally, a data set on the death rate due to Covid-19 in India is analyzed for illustrative purposes.
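As a point of reference for the likelihood machinery, here is a hedged sketch of complete-sample maximum likelihood for the Gumbel type-II distribution with CDF $F(x) = \exp(-b x^{-a})$; the paper's setting additionally involves adaptive type-II progressive hybrid censoring, which is omitted here, and a generic simplex optimizer stands in for Newton-Raphson.

```python
# Sketch: complete-sample MLE for the Gumbel type-II distribution
# F(x) = exp(-b * x**(-a)), with density f(x) = a*b*x**(-a-1)*exp(-b*x**(-a)).
import numpy as np
from scipy.optimize import minimize

def negloglik(theta, x):
    a, b = theta
    if a <= 0 or b <= 0:
        return np.inf
    return -np.sum(np.log(a) + np.log(b) - (a + 1) * np.log(x) - b * x ** (-a))

rng = np.random.default_rng(0)
a_true, b_true = 2.0, 1.5
u = rng.uniform(size=5000)
x = (-np.log(u) / b_true) ** (-1 / a_true)   # inverse-CDF sampling
res = minimize(negloglik, x0=[1.0, 1.0], args=(x,), method="Nelder-Mead")
a_hat, b_hat = res.x
print(a_hat, b_hat)
```

Under AT-II PHCS the log-likelihood gains censoring terms for the removed and surviving units, but the numerical optimization proceeds along the same lines.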
This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.
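The core pipeline, sketch the matrix with a random linear map, orthonormalize the sketch, and project, fits in a few lines. This is a minimal randomized range-finder sketch in the general spirit of the abstract; the parameter names and the plain Gaussian test matrix are illustrative choices, not the paper's specific algorithms or error bounds.

```python
# Sketch-based low-rank approximation: Y = A @ Omega is the random linear
# image (the sketch); its orthonormalized range is used to compress A.
import numpy as np

def sketch_low_rank(A, rank, oversample=10, rng=None):
    rng = np.random.default_rng(rng)
    Omega = rng.standard_normal((A.shape[1], rank + oversample))
    Y = A @ Omega                     # the sketch
    Q, _ = np.linalg.qr(Y)            # orthonormal basis for the sketch range
    B = Q.T @ A                       # small projected matrix
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U)[:, :rank], s[:rank], Vt[:rank]   # user-specified rank

# On an exactly rank-5 input, the rank-5 approximation is essentially exact.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = sketch_low_rank(A, rank=5, rng=1)
err = np.linalg.norm(A - U @ (s[:, None] * Vt)) / np.linalg.norm(A)
print(err)                            # near machine precision
```

Oversampling beyond the target rank is what makes the a priori error bounds informative in practice; structure-preserving variants (e.g. for positive-semidefinite inputs) modify the projection step.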