Banks and financial institutions all over the world manage portfolios containing tens of thousands of customers. Not all customers are equally creditworthy, and many pose varying degrees of risk to the banks or financial institutions that lend them money. Assessment of credit risk is therefore paramount in credit risk management. This paper discusses the use of Bayesian principles and simulation techniques to estimate and calibrate the default probability of credit ratings. The methodology is a two-phase approach: in the first phase, a posterior density of the default rate parameter is estimated from the default history data; in the second phase, an estimate of the true default rate parameter is obtained through simulations.
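A minimal sketch of the two-phase idea, assuming a conjugate Beta-Binomial model for a single rating grade; the counts, prior, and sample sizes below are illustrative and not taken from the paper.

```python
import numpy as np

# Hedged sketch: conjugate Beta-Binomial model for one rating grade's default rate.
rng = np.random.default_rng(0)

defaults, obligors = 4, 500          # observed defaults and rated obligors (hypothetical)
a0, b0 = 1.0, 1.0                    # uniform Beta prior on the default rate

# Phase 1: posterior density of the default-rate parameter given default history.
a_post, b_post = a0 + defaults, b0 + obligors - defaults

# Phase 2: simulate from the posterior to estimate the default rate and a credible interval.
draws = rng.beta(a_post, b_post, size=100_000)
print("posterior mean PD:", draws.mean())
print("95% credible interval:", np.quantile(draws, [0.025, 0.975]))
```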
This paper proposes a data-driven set-based estimation algorithm for a class of nonlinear systems with polynomial nonlinearities. Using the system's input-output data, the proposed method computes in real time a set that guarantees inclusion of the system's state. Although the system is assumed to be of polynomial type, the exact polynomial functions and their coefficients need not be known. To this end, the estimator relies on offline and online phases. The offline phase uses past input-output data to estimate a set of possible coefficients of the polynomial system. Then, using this estimated coefficient set and side information about the system, the online phase provides a set estimate of the state. Finally, the proposed methodology is evaluated through its application to the SIR (Susceptible, Infected, Recovered) epidemic model.
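A minimal sketch of set propagation for the SIR example, not the paper's algorithm: it assumes the coefficient set estimated offline is a simple box for the transmission and recovery rates and propagates the state enclosure with crude interval arithmetic.

```python
import numpy as np

def sir_interval_step(S, I, R, beta, gamma, dt=0.1, N=1.0):
    """One Euler step of the SIR model with interval-valued state and coefficients.
    Each of S, I, R, beta, gamma is a (lo, hi) pair; crude interval arithmetic."""
    def imul(a, b):  # interval product
        prods = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
        return (min(prods), max(prods))
    inf = imul(beta, imul(S, I))                         # beta * S * I / N
    inf = (inf[0] / N, inf[1] / N)
    rec = imul(gamma, I)                                 # gamma * I
    S_next = (S[0] - dt * inf[1], S[1] - dt * inf[0])
    I_next = (I[0] + dt * (inf[0] - rec[1]), I[1] + dt * (inf[1] - rec[0]))
    R_next = (R[0] + dt * rec[0], R[1] + dt * rec[1])
    return S_next, I_next, R_next

S, I, R = (0.98, 0.99), (0.01, 0.02), (0.0, 0.0)         # uncertain initial state
beta, gamma = (0.25, 0.35), (0.08, 0.12)                 # assumed offline coefficient set
for _ in range(50):
    S, I, R = sir_interval_step(S, I, R, beta, gamma)
print("state enclosure after 5 time units:", S, I, R)
```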
Estimates from infectious disease models have constituted a significant part of the scientific evidence used to inform the response to the COVID-19 pandemic in the UK. These estimates can vary strikingly in their precision, with some being over-confident and others over-cautious. The uncertainty in epidemiological forecasts should be commensurate with the errors in their predictions. We propose Normalised Estimation Error Squared (NEES) as a metric for assessing the consistency between forecasts and future observations. We introduce a novel infectious disease model for COVID-19 and use it to demonstrate the usefulness of NEES for diagnosing over-confident and over-cautious predictions resulting from different values of a regularization parameter.
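A minimal sketch of the NEES metric itself, under the standard definition with a Gaussian forecast mean and covariance; the numbers are illustrative, not from the paper.

```python
import numpy as np

def nees(y, y_hat, P):
    """Normalised Estimation Error Squared of observation y against a forecast
    with mean y_hat and covariance P; a consistent forecaster averages about dim(y)."""
    e = np.atleast_1d(y - y_hat)
    return float(e @ np.linalg.solve(np.atleast_2d(P), e))

# Illustrative 1-D forecast: error 20 with forecast variance 400 gives NEES = 1.
print(nees(y=120.0, y_hat=100.0, P=np.array([[400.0]])))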
In this paper, we derive the mixed and componentwise condition numbers for a linear function of the solution to the total least squares with linear equality constraint (TLSE) problem. Explicit expressions for the mixed and componentwise condition numbers are obtained by dual techniques under both unstructured and structured componentwise perturbations. As an intermediate result, we recover both the unstructured and structured condition numbers for the TLS problem. We adopt the small-sample statistical condition estimation method to estimate both unstructured and structured condition numbers with high reliability. Numerical experiments are provided to illustrate the obtained results.
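A rough sketch of the statistical-perturbation idea behind small-sample condition estimation, using an ordinary least squares solver as a stand-in for the TLSE solver; it illustrates the principle only and is not the paper's dual-technique expressions.

```python
import numpy as np

rng = np.random.default_rng(1)

def solve_ls(A, b):                         # stand-in for the TLSE solver (assumption)
    return np.linalg.lstsq(A, b, rcond=None)[0]

A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
x = solve_ls(A, b)

delta, k = 1e-7, 5                          # perturbation level and number of samples
ratios = []
for _ in range(k):
    dA = delta * A * rng.choice([-1.0, 1.0], A.shape)    # componentwise relative perturbation
    db = delta * b * rng.choice([-1.0, 1.0], b.shape)
    dx = solve_ls(A + dA, b + db) - x
    ratios.append(np.max(np.abs(dx) / np.abs(x)) / delta)  # componentwise sensitivity
print("estimated componentwise condition number:", max(ratios))
```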
This study concerns estimating the probability distribution of the sample maximum. The traditional approach is parametric fitting of the limiting distribution, the generalized extreme value distribution; however, in finite samples this model is misspecified to a certain extent. We propose a plug-in type of nonparametric estimator that does not require model specification. We prove that both asymptotic convergence rates depend on the tail index and the second-order parameter. As the tail gets lighter, the degree of misspecification of the parametric fit grows, which slows its convergence rate. In the Weibull case, which can be seen as the limit of tail lightness, only the nonparametric distribution estimator retains consistency. Finally, we report the results of numerical experiments.
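A minimal sketch of a plug-in type estimator of the sample-maximum distribution, assuming i.i.d. data and using the empirical CDF raised to the block size; the data-generating distribution and block size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

data = rng.exponential(size=1000)            # observed i.i.d. sample (illustrative)
m = 50                                        # block size of the future maximum

def ecdf(x, sample):
    return np.searchsorted(np.sort(sample), x, side="right") / len(sample)

x_grid = np.linspace(0.0, 10.0, 5)
plug_in = ecdf(x_grid, data) ** m             # nonparametric plug-in estimate of P(max <= x)
true = (1 - np.exp(-x_grid)) ** m             # true distribution of the max for Exp(1)
print(np.c_[x_grid, plug_in, true])
```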
The Wasserstein distance, rooted in optimal transport (OT) theory, is a popular discrepancy measure between probability distributions with various applications to statistics and machine learning. Despite their rich structure and demonstrated utility, Wasserstein distances are sensitive to outliers in the considered distributions, which hinders applicability in practice. Inspired by the Huber contamination model, we propose a new outlier-robust Wasserstein distance $\mathsf{W}_p^\varepsilon$ which allows for $\varepsilon$ outlier mass to be removed from each contaminated distribution. Our formulation amounts to a highly regular optimization problem that lends itself better to analysis than previously considered frameworks. Leveraging this, we conduct a thorough theoretical study of $\mathsf{W}_p^\varepsilon$, encompassing characterization of optimal perturbations, regularity, duality, and statistical estimation and robustness results. In particular, by decoupling the optimization variables, we arrive at a simple dual form for $\mathsf{W}_p^\varepsilon$ that can be implemented via an elementary modification to standard, duality-based OT solvers. We illustrate the benefits of our framework via applications to generative modeling with contaminated datasets.
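A discrete toy sketch of the "remove up to $\varepsilon$ mass" idea, solved as a small linear program rather than via the paper's dual form; the supports, weights, and outlier location are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def robust_w1(mu, nu, C, eps):
    """Partial-OT surrogate: transport mass 1 - eps, never exceeding either marginal."""
    n, m = C.shape
    A_rows = np.kron(np.eye(n), np.ones(m))          # row sums of the coupling
    A_cols = np.kron(np.ones(n), np.eye(m))          # column sums of the coupling
    A_ub = np.vstack([A_rows, A_cols])
    b_ub = np.concatenate([mu, nu])
    A_eq = np.ones((1, n * m))                       # total transported mass constraint
    res = linprog(C.ravel(), A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=[1.0 - eps], bounds=(0, None))
    return res.fun

x = np.array([0.0, 1.0, 2.0, 10.0])                  # last support point acts as an outlier
y = np.array([0.0, 1.0, 2.0])
mu = np.full(4, 0.25)
nu = np.full(3, 1.0 / 3.0)
C = np.abs(x[:, None] - y[None, :])                  # |x - y| cost for W_1
print("plain OT cost:    ", robust_w1(mu, nu, C, eps=0.0))
print("robust, eps=0.25: ", robust_w1(mu, nu, C, eps=0.25))
```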
Variational methods are extremely popular in the analysis of network data. Statistical guarantees obtained for these methods typically provide asymptotic normality for the problem of estimating global model parameters under the stochastic block model. In the present work, we consider the case of networks with missing links, which is important in applications, and show that the variational approximation to the maximum likelihood estimator converges at the minimax rate. This provides the first minimax optimal and tractable estimator for the problem of parameter estimation for the stochastic block model with missing links. We complement our results with numerical studies of simulated and real networks, which confirm the advantages of this estimator over current methods.
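A compact sketch of a mean-field variational EM for a stochastic block model in which unobserved dyads are simply excluded from every sum; it illustrates the general scheme, not the paper's specific estimator or its guarantees, and the synthetic network is illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def vem_sbm(A, observed, K, n_iter=50):
    n = A.shape[0]
    tau = rng.dirichlet(np.ones(K), size=n)            # variational node-block probabilities
    for _ in range(n_iter):
        # M-step: group proportions and block connection probabilities from observed dyads.
        alpha = tau.mean(axis=0)
        W = observed * np.nan_to_num(A)                # observed edges only
        pi = np.clip((tau.T @ W @ tau) / np.maximum(tau.T @ observed @ tau, 1e-12),
                     1e-6, 1 - 1e-6)
        # E-step: mean-field update of tau, ignoring missing entries.
        logp = (W @ tau) @ np.log(pi).T + ((observed - W) @ tau) @ np.log(1 - pi).T
        logp += np.log(alpha)
        tau = np.exp(logp - logp.max(axis=1, keepdims=True))
        tau /= tau.sum(axis=1, keepdims=True)
    return tau, pi, alpha

# Tiny synthetic example with roughly 20% of dyads missing (illustrative).
n, K = 60, 2
z = rng.integers(K, size=n)
P = np.array([[0.6, 0.1], [0.1, 0.5]])
A = (rng.random((n, n)) < P[z][:, z]).astype(float)
observed = (rng.random((n, n)) > 0.2).astype(float)
np.fill_diagonal(observed, 0.0)                        # no self-loops
tau, pi, alpha = vem_sbm(np.where(observed > 0, A, np.nan), observed, K)
print("estimated block probabilities:\n", pi)
```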
During the COVID-19 pandemic, governments and public health authorities have used seroprevalence studies to estimate the proportion of persons within a given population who have antibodies to SARS-CoV-2. Seroprevalence is crucial for estimating quantities such as the infection fatality ratio, proportion of asymptomatic cases, and differences in infection rates across population subgroups. However, serologic assays are prone to false positives and negatives, and non-probability sampling methods may induce selection bias. In this paper, we consider nonparametric and parametric seroprevalence estimators that address both challenges by leveraging validation data and assuming equal probabilities of sample inclusion within covariate-defined strata. Both estimators are shown to be consistent and asymptotically normal, and consistent variance estimators are derived. Simulation studies are presented comparing the finite sample performance of the estimators over a range of assay characteristics and sampling scenarios. The methods are used to estimate SARS-CoV-2 seroprevalence in asymptomatic individuals in Belgium and North Carolina.
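A minimal sketch of an assay-adjusted, stratified estimator of the Rogan-Gladen type, which corrects the raw positivity rate using sensitivity and specificity from validation data and then weights strata by their population shares; all numbers below are hypothetical, not results from the paper.

```python
import numpy as np

def adjusted_seroprevalence(pos_rate, sensitivity, specificity):
    """Correct the raw positivity rate for imperfect assay sensitivity and specificity."""
    p = (pos_rate + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return float(np.clip(p, 0.0, 1.0))

raw = np.array([0.04, 0.07, 0.10])          # raw seropositive fraction per stratum (hypothetical)
weights = np.array([0.30, 0.45, 0.25])      # known population proportion of each stratum
sens, spec = 0.90, 0.98                     # from an assay validation sample (assumed)

adjusted = np.array([adjusted_seroprevalence(r, sens, spec) for r in raw])
print("stratified, assay-adjusted seroprevalence:", float(weights @ adjusted))
```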
Non-stationary signals are ubiquitous in real life. Many techniques have been proposed in recent decades for decomposing multi-component signals into simple oscillatory mono-components, such as the groundbreaking Empirical Mode Decomposition technique and the Iterative Filtering method. When a signal contains mono-components with rapidly varying instantaneous frequencies, such as chirps or whistles, it becomes particularly hard for most techniques to properly factor out these components. The Adaptive Local Iterative Filtering technique has recently gained interest in many applied fields of research for its ability to handle non-stationary signals with amplitude and frequency modulation. In this work, we address the open question of how to guarantee a priori convergence of this technique, and we propose two new algorithms. The first method, called Stable Adaptive Local Iterative Filtering, is a stabilized version of the Adaptive Local Iterative Filtering that we prove to be always convergent. The stability, however, comes at the cost of higher computational complexity. The second technique, called Resampled Iterative Filtering, is a new generalization of the Iterative Filtering method. We prove that Resampled Iterative Filtering is guaranteed to converge a priori for any kind of signal. Furthermore, in the discrete setting, by leveraging the mathematical properties of the matrices involved, we show that its calculations can be drastically accelerated. Finally, we present artificial and real-life examples to show the power and performance of the proposed methods.
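A minimal sketch of the basic Iterative Filtering idea (not the new stabilized or resampled algorithms): repeatedly subtract a local moving average from the signal until what remains is an oscillatory component; the two-tone test signal and filter length are illustrative choices.

```python
import numpy as np

def iterative_filtering_imf(signal, window, n_inner=20):
    """Extract one oscillatory component by iterating f <- f - L(f),
    where L is convolution with a simple moving-average filter."""
    kernel = np.ones(window) / window
    component = signal.copy()
    for _ in range(n_inner):
        trend = np.convolve(component, kernel, mode="same")
        component = component - trend
    return component

t = np.linspace(0, 1, 2000)
slow = np.sin(2 * np.pi * 3 * t)
fast = 0.5 * np.sin(2 * np.pi * 40 * t)
signal = slow + fast                                    # two-component toy signal
imf1 = iterative_filtering_imf(signal, window=50)       # window ~ one period of the fast tone
residual = signal - imf1                                # close to the slow component
print("correlation with the true fast component:", np.corrcoef(imf1, fast)[0, 1])
```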
In clinical research, the effect of a treatment or intervention is widely assessed through clinical importance rather than statistical significance. In this paper, we propose a principled statistical inference framework for learning the minimal clinically important difference (MCID), a vital concept in assessing clinical importance. We formulate the scientific question as a novel statistical learning problem, develop an efficient algorithm for parameter estimation, and establish the asymptotic theory for the proposed estimator. We conduct comprehensive simulation studies to examine the finite-sample performance of the proposed method. We also re-analyze the ChAMP (Chondral Lesions And Meniscus Procedures) trial, in which the primary outcome is the patient-reported pain score and the ultimate goal is to determine whether there is a significant difference in post-operative knee pain between patients undergoing debridement versus observation of chondral lesions during surgery. Previous analyses of this trial showed that the effect of debriding the chondral lesions does not reach statistical significance. Our analysis reinforces this conclusion: the effect of debriding the chondral lesions is not only statistically non-significant but also clinically unimportant.
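A toy sketch of the MCID idea framed as a classification problem, choosing the cut-off on the observed change score that best separates patients who report feeling improved from those who do not; the synthetic data and the simple grid search stand in for, and are not, the paper's estimator and algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 500
change = rng.normal(loc=5.0, scale=10.0, size=n)       # change in pain score (synthetic)
# Patient-reported anchor: probability of reporting improvement grows with the change score.
improved = (rng.random(n) < 1 / (1 + np.exp(-(change - 8.0) / 4.0))).astype(int)

candidates = np.linspace(change.min(), change.max(), 200)
errors = [np.mean((change >= c).astype(int) != improved) for c in candidates]
mcid_hat = candidates[int(np.argmin(errors))]          # misclassification-minimising cut-off
print("estimated MCID:", round(float(mcid_hat), 2))
```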
Data augmentation has been widely used for training deep learning systems for medical image segmentation and plays an important role in obtaining robust and transformation-invariant predictions. However, it has seldom been used at test time for segmentation and has not been formulated in a consistent mathematical framework. In this paper, we first propose a theoretical formulation of test-time augmentation for deep learning in image recognition, where the prediction is obtained by estimating its expectation via Monte Carlo simulation with prior distributions over the parameters of an image acquisition model that involves image transformations and noise. We then propose a novel uncertainty estimation method based on the formulated test-time augmentation. Experiments on segmentation of fetal brains and brain tumors from 2D and 3D Magnetic Resonance Images (MRI) showed that 1) our test-time augmentation outperforms a single-prediction baseline and dropout-based multiple predictions, and 2) it provides better uncertainty estimation than calculating model-based uncertainty alone and helps reduce overconfident incorrect predictions.
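A minimal sketch of Monte Carlo test-time augmentation for segmentation: sample a transform and noise from assumed prior distributions, predict on the augmented image, map the prediction back, and aggregate into a mean prediction and a voxel-wise entropy; the `model` function is a placeholder, not the paper's network.

```python
import numpy as np

rng = np.random.default_rng(5)

def model(image):
    """Placeholder segmentation model: 'foreground' where intensity exceeds a threshold."""
    return (image > 0.5).astype(float)

def tta_predict(image, n_samples=20, noise_std=0.05):
    probs = []
    for _ in range(n_samples):
        flip = rng.random() < 0.5                            # spatial transform: random flip
        aug = image[:, ::-1] if flip else image
        aug = aug + rng.normal(0.0, noise_std, image.shape)  # additive noise model
        pred = model(aug)
        probs.append(pred[:, ::-1] if flip else pred)        # invert the spatial transform
    probs = np.stack(probs)
    mean = probs.mean(axis=0)                                # Monte Carlo expectation
    entropy = -(mean * np.log(mean + 1e-8) + (1 - mean) * np.log(1 - mean + 1e-8))
    return mean, entropy                                     # segmentation probability, uncertainty

image = rng.random((64, 64))
seg_prob, uncertainty = tta_predict(image)
print("mean foreground fraction:", seg_prob.mean(), "mean voxel-wise entropy:", uncertainty.mean())
```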