Extreme streamflow is a key indicator of flood risk, and quantifying changes in its distribution under non-stationary climate conditions is essential for mitigating the impact of flooding events. We propose a non-stationary process mixture model (NPMM) for annual streamflow maxima over the central US (CUS) which uses downscaled climate model precipitation projections to forecast extremal streamflow. Spatial dependence for the model is specified as a convex combination of transformed Gaussian and max-stable processes, indexed by a weight parameter which identifies the asymptotic regime of the process. The weight parameter is modeled as a function of region and of regional precipitation, introducing spatio-temporal non-stationarity within the model. The NPMM is flexible, with desirable tail dependence properties, but yields an intractable likelihood. To address this, we embed a neural network within a density regression model to learn a synthetic likelihood function from simulations of the NPMM under different parameter settings. Our model is fitted using observational data for 1972-2021, and inference is carried out in a Bayesian framework. Annual streamflow maxima forecasts for 2021-2035 indicate an increase in the frequency and magnitude of extreme streamflow, with changes being more pronounced in the largest quantiles of the projected annual streamflow maxima.
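The core computational device here is the neural synthetic likelihood learned by density regression. A minimal sketch of that idea, on a one-dimensional toy problem, is given below: `simulate_summary` is a hypothetical stand-in for simulating a summary statistic from the model at a given parameter value, and a small network is trained to map parameters to a Gaussian approximation of the summary's distribution, which is then evaluated at the observed summary. This is only an illustration of the generic technique, not the NPMM's actual simulator or network architecture.

```python
# Sketch of learning a synthetic likelihood by density regression.
# `simulate_summary` is a hypothetical toy simulator, not the NPMM.
import torch
import torch.nn as nn

def simulate_summary(theta):
    # Toy stand-in: summary ~ N(sin(theta), 0.1 + 0.05*theta**2)
    return torch.sin(theta) + torch.sqrt(0.1 + 0.05 * theta**2) * torch.randn_like(theta)

# 1. Simulate (parameter, summary) pairs over a design/prior range.
theta = torch.rand(5000, 1) * 4.0
s = simulate_summary(theta)

# 2. Density-regression network: parameter -> (mean, log-variance) of the summary.
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for epoch in range(2000):
    mu, log_var = net(theta).chunk(2, dim=1)
    # Negative Gaussian log-likelihood of the simulated summaries given the parameters.
    loss = 0.5 * (log_var + (s - mu) ** 2 / log_var.exp()).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# 3. Synthetic log-likelihood: the learned conditional density evaluated at the
#    *observed* summary, viewed as a function of the parameter.
def synthetic_loglik(theta_grid, s_obs):
    with torch.no_grad():
        mu, log_var = net(theta_grid).chunk(2, dim=1)
        return -0.5 * (log_var + (s_obs - mu) ** 2 / log_var.exp())

theta_grid = torch.linspace(0.0, 4.0, 100).unsqueeze(1)
print(synthetic_loglik(theta_grid, s_obs=torch.tensor([[0.8]]))[:5])
```

In the paper's setting the same construction would be applied to simulations from the NPMM and embedded in a Bayesian sampler; the toy above only fixes the mechanics of "simulate, regress the density, evaluate at the observation."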
Semi-supervised learning (SSL) is a powerful technique for leveraging unlabeled data to improve machine learning models, but its performance can suffer in the presence of ``informative'' labels, which occur when some classes are more likely to be labeled than others. In the missing data literature, such labels are called missing not at random. In this paper, we propose a novel approach to address this issue by estimating the missing-data mechanism and using inverse propensity weighting to debias any SSL algorithm, including those using data augmentation. We also propose a likelihood ratio test to assess whether or not labels are indeed informative. Finally, we demonstrate the performance of the proposed methods on different datasets, in particular on two medical datasets for which we design pseudo-realistic missing data scenarios.
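The debiasing step is easy to illustrate in isolation. The sketch below builds a toy dataset where one class is labeled far less often (a class-dependent, hence MNAR, mechanism) and reweights each labeled example by the inverse of its labeling propensity before fitting a classifier. For brevity the true propensities are plugged in; the paper's contribution includes actually estimating this mechanism and plugging the weights into general SSL algorithms, which is not reproduced here.

```python
# Minimal sketch of inverse-propensity weighting (IPW) for a supervised loss
# when labels are "informative" (class-dependent missingness). The propensities
# below are the true ones, used as a shortcut; the paper estimates them.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: two Gaussian classes; class 1 is labeled far less often (MNAR).
n = 4000
y = rng.integers(0, 2, n)
X = rng.normal(loc=y[:, None] * 1.5, scale=1.0, size=(n, 2))
label_prob = np.where(y == 0, 0.5, 0.1)           # class-dependent labeling mechanism
labeled = rng.random(n) < label_prob

# Known (here: true) labeling propensities for the labeled examples.
propensity = np.where(y[labeled] == 0, 0.5, 0.1)

# An unweighted fit on labeled data over-represents class 0; IPW reweights each
# labeled example by 1/propensity so the labeled sample mimics the full one.
naive = LogisticRegression().fit(X[labeled], y[labeled])
debiased = LogisticRegression().fit(X[labeled], y[labeled],
                                    sample_weight=1.0 / propensity)

print("naive predicted class-1 rate:   ", naive.predict(X).mean())
print("debiased predicted class-1 rate:", debiased.predict(X).mean())
print("true class-1 rate:              ", y.mean())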
A practical challenge for structural estimation is the requirement to accurately minimize a sample objective function that is often non-smooth, non-convex, or both. This paper proposes a simple algorithm designed to find accurate solutions without performing an exhaustive search. It augments each iteration of a new Gauss-Newton algorithm with a grid-search step. A finite-sample analysis derives its optimization and statistical properties simultaneously using only econometric assumptions. After a finite number of iterations, the algorithm automatically transitions from global to fast local convergence, producing accurate estimates with high probability. Simulated examples and an empirical application illustrate the results.
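To make the "Gauss-Newton step plus grid-search step" idea concrete, here is a toy nonlinear least-squares example in which each iteration computes both candidates and keeps whichever attains the lower sample objective. This is only a schematic of combining a fast local update with a coarse global search; it is not the paper's algorithm, grid design, or theoretical tuning.

```python
# Sketch: augmenting a Gauss-Newton iteration with a grid-search step on a
# toy one-parameter nonlinear least-squares objective.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 3, 200)
y = np.exp(-1.7 * x) + 0.05 * rng.normal(size=x.size)    # true parameter: 1.7

def residuals(theta):
    return y - np.exp(-theta * x)

def objective(theta):
    return 0.5 * np.sum(residuals(theta) ** 2)

grid = np.linspace(0.0, 5.0, 51)                          # coarse global grid
theta = grid[np.argmin([objective(t) for t in grid])]     # grid initialization

for it in range(20):
    # Gauss-Newton step: linearize the residuals around the current iterate.
    r = residuals(theta)
    J = x * np.exp(-theta * x)                            # d(residual)/d(theta)
    gn_step = theta - np.sum(J * r) / np.sum(J * J)

    # Grid-search step: best point on the coarse grid.
    grid_step = grid[np.argmin([objective(t) for t in grid])]

    # Keep whichever candidate attains the lowest objective value.
    theta = min((gn_step, grid_step, theta), key=objective)

print("estimate:", theta)
```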
Spatial process models popular in geostatistics often represent the observed data as the sum of a smooth underlying process and white noise. The variation in the white noise is attributed to measurement error, or micro-scale variability, and is called the "nugget". We formally establish results on the identifiability and consistency of the nugget in spatial models based upon the Gaussian process within the framework of in-fill asymptotics, i.e., the sample size increases within a bounded sampling domain. Our work extends results in fixed-domain asymptotics for spatial models without the nugget. More specifically, we establish the identifiability of parameters in the Mat\'ern covariance function and the consistency of their maximum likelihood estimators in the presence of discontinuities due to the nugget. We also present simulation studies to demonstrate the role of the identifiable quantities in spatial interpolation.
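The parameters at issue (partial sill, range, and nugget of a Mat\'ern model) can be fitted by maximum likelihood as in the short sketch below, which simulates a one-dimensional Mat\'ern(3/2) field plus a nugget on a bounded interval and recovers the parameters numerically. It only illustrates the estimation problem; it does not reproduce the in-fill asymptotic analysis of identifiability and consistency.

```python
# Sketch: maximum likelihood for a Matern(nu=3/2) covariance plus a nugget on
# simulated one-dimensional data in a bounded domain.
import numpy as np
from scipy.optimize import minimize
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(2)
s = np.sort(rng.uniform(0, 1, 150))                      # locations in [0, 1]
D = np.abs(s[:, None] - s[None, :])                      # pairwise distances

def matern32(d, sigma2, rho):
    a = np.sqrt(3.0) * d / rho
    return sigma2 * (1.0 + a) * np.exp(-a)

# Simulate: smooth Matern field plus white-noise nugget.
true = dict(sigma2=1.0, rho=0.2, tau2=0.3)
K = matern32(D, true["sigma2"], true["rho"]) + true["tau2"] * np.eye(s.size)
z = np.linalg.cholesky(K + 1e-10 * np.eye(s.size)) @ rng.normal(size=s.size)

def neg_loglik(log_params):
    sigma2, rho, tau2 = np.exp(log_params)               # keep parameters positive
    C = matern32(D, sigma2, rho) + tau2 * np.eye(s.size)
    cf = cho_factor(C)
    quad = z @ cho_solve(cf, z)
    logdet = 2.0 * np.sum(np.log(np.diag(cf[0])))
    return 0.5 * (logdet + quad)

fit = minimize(neg_loglik, x0=np.log([0.5, 0.1, 0.1]), method="Nelder-Mead")
print("MLE (sigma2, rho, tau2):", np.exp(fit.x))
```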
Very unhealthy air quality is consistently associated with numerous diseases. Appropriate extreme-value analysis and accurate prediction are in rising demand, both for exploring potential linked causes and for providing guidance to environmental agencies on public policy. This paper models the spatial and temporal patterns of both moderate and very poor PM10 concentrations collected from 342 representative monitors distributed throughout mainland Spain from 2017 to 2021. We first propose and compare a series of Bayesian generalized extreme value models for annual maximum PM10 concentrations, including fixed effects as well as spatio-temporal random effects modeled with the Stochastic Partial Differential Equation (SPDE) approach and a lag-one dynamic autoregressive component. Shared and distinct effects of interrelated factors are identified through a joint Bayesian model of annual mean and annual maximum PM10 concentrations, which brings the inferential power of the bulk of the data to the tail analysis and is implemented with the Integrated Nested Laplace Approximation (INLA), which is faster and more accurate than MCMC in this setting. Under WAIC, DIC, and other criteria, the best model is selected and shows good predictive ability, with the first four years of data used for training and the final year for testing. The findings are applied to identify hot-spot regions with extremely poor air quality using excursion functions specified at the grid level. The results suggest that the Community of Madrid and the northwestern boundary of Spain are likely to be exposed simultaneously to severe air pollution exceeding the warning risk threshold. The joint model also provides evidence that some predictors (precipitation, vapour pressure, and population density) have comparable effects on the mean and the maxima, whereas others (altitude and temperature) act in opposite directions on the two scales of PM10 concentration.
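The full model is a Bayesian spatio-temporal GEV fitted with INLA-SPDE (typically via R-INLA); as a much smaller illustration of the extreme-value building block, the sketch below fits a GEV to one station's annual maxima with scipy and computes a return level. It does not reflect the spatial random effects, the joint mean-maxima model, or the excursion-function analysis.

```python
# Sketch: fit a generalized extreme value (GEV) distribution to one station's
# annual maxima and compute a return level. Note scipy's shape c equals -xi.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(3)

# Toy "annual maximum PM10" series for one monitor.
annual_max = genextreme.rvs(c=-0.1, loc=40.0, scale=8.0, size=50, random_state=rng)

c_hat, loc_hat, scale_hat = genextreme.fit(annual_max)
print("shape (xi):", -c_hat, " location:", loc_hat, " scale:", scale_hat)

# 20-year return level: the level exceeded on average once every 20 years.
return_level_20 = genextreme.ppf(1.0 - 1.0 / 20.0, c_hat, loc=loc_hat, scale=scale_hat)
print("20-year return level:", return_level_20)
```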
Robust estimation under a multivariate normal (MVN) mixture model is always a computational challenge. A recently proposed maximum pseudo $\beta$-likelihood estimator estimates the unknown parameters of an MVN mixture model in the spirit of the minimum density power divergence (DPD) methodology, but with a relatively simpler and more tractable computational algorithm, even in larger dimensions. In this letter, we rigorously derive the existence and weak consistency of the maximum pseudo $\beta$-likelihood estimator for MVN mixture models under a reasonable set of assumptions.
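To show where the robustness comes from, the sketch below minimizes the empirical DPD objective for a single multivariate normal component (not a mixture), using the closed form $\int N(x;\mu,\Sigma)^{1+\beta}\,dx = (2\pi)^{-d\beta/2}|\Sigma|^{-\beta/2}(1+\beta)^{-d/2}$. The diagonal-covariance restriction and the single component are simplifications for illustration; the maximum pseudo $\beta$-likelihood estimator for a full MVN mixture is more involved.

```python
# Sketch: the density power divergence (DPD) objective for a single MVN with
# diagonal covariance, fitted in the presence of gross outliers.
import numpy as np
from scipy.stats import multivariate_normal
from scipy.optimize import minimize

rng = np.random.default_rng(4)
d, beta = 2, 0.3

# Clean data plus a cluster of gross outliers.
X = np.vstack([rng.normal(0.0, 1.0, size=(300, d)),
               rng.normal(8.0, 0.5, size=(30, d))])

def dpd_objective(params):
    mu = params[:d]
    log_sd = params[d:]                              # diagonal Sigma for simplicity
    cov = np.diag(np.exp(2.0 * log_sd))
    f = multivariate_normal.pdf(X, mean=mu, cov=cov)
    integral = ((2.0 * np.pi) ** (-d * beta / 2.0)
                * np.linalg.det(cov) ** (-beta / 2.0)
                * (1.0 + beta) ** (-d / 2.0))
    # Empirical minimum-DPD objective (up to terms not involving the parameters).
    return integral - (1.0 + 1.0 / beta) * np.mean(f ** beta)

fit = minimize(dpd_objective, x0=np.zeros(2 * d), method="Nelder-Mead")
print("robust mean estimate:", fit.x[:d])            # stays near 0 despite outliers
print("robust sds:", np.exp(fit.x[d:]))
```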
This paper develops and analyzes an accelerated proximal descent method for finding stationary points of nonconvex composite optimization problems. The objective function is of the form $f+h$ where $h$ is a proper closed convex function, $f$ is a differentiable function on the domain of $h$, and $\nabla f$ is Lipschitz continuous on the domain of $h$. The main advantage of this method is that it is "parameter-free" in the sense that it does not require knowledge of the Lipschitz constant of $\nabla f$ or of any global topological properties of $f$. It is shown that the proposed method can obtain an $\varepsilon$-approximate stationary point with iteration complexity bounds that are optimal, up to logarithmic terms in $\varepsilon$, in both the convex and nonconvex settings. Some discussion is also given about how the proposed method can be leveraged in other existing optimization frameworks, such as min-max smoothing and penalty frameworks for constrained programming, to create more specialized parameter-free methods. Finally, numerical experiments are presented to support the practical viability of the method.
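To fix the composite $f+h$ setting and what "no Lipschitz constant supplied" means in practice, here is a plain (non-accelerated) proximal gradient sketch with a backtracking stepsize on an $\ell_1$-regularized least-squares problem. It is not the accelerated parameter-free method analyzed in the paper; it only illustrates the prox step and the adaptive stepsize test.

```python
# Sketch: proximal gradient for f + h with backtracking, so no Lipschitz
# constant of grad(f) is supplied. Here h is the L1 norm (soft-threshold prox).
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(60, 120))
b = rng.normal(size=60)
lam = 0.1

f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)          # smooth part
grad_f = lambda x: A.T @ (A @ x - b)
h = lambda x: lam * np.sum(np.abs(x))                 # nonsmooth convex part
prox_h = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)

x, t = np.zeros(120), 1.0
for it in range(500):
    g = grad_f(x)
    # Backtrack until the standard quadratic upper-bound test holds.
    while True:
        x_new = prox_h(x - t * g, t)
        dx = x_new - x
        if f(x_new) <= f(x) + g @ dx + np.sum(dx ** 2) / (2.0 * t):
            break
        t *= 0.5
    x, t = x_new, t * 1.2                              # mild stepsize increase

print("objective:", f(x) + h(x), " nonzeros:", int(np.sum(x != 0)))
```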
The Distributional Random Forest (DRF) is a recently introduced Random Forest algorithm for estimating multivariate conditional distributions. Due to its general estimation procedure, it can be employed to estimate a wide range of targets such as conditional average treatment effects, conditional quantiles, and conditional correlations. However, only results about the consistency and convergence rate of the DRF prediction are available so far. We characterize the asymptotic distribution of DRF and develop a bootstrap approximation of it. This allows us to derive inferential tools for quantifying standard errors and for constructing confidence regions with asymptotic coverage guarantees. In simulation studies, we empirically validate the developed theory for inference of low-dimensional targets and for testing distributional differences between two populations.
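As a rough illustration of what "bootstrap inference for a forest prediction" means, the sketch below bootstraps the training sample, refits a generic random forest each time, and forms a percentile interval for a conditional mean at one test point. This mimics the idea only; it is not DRF (available as the `drf` package) nor the specific bootstrap approximation of the DRF weights whose validity the paper establishes.

```python
# Sketch: nonparametric bootstrap of a forest-based conditional-mean estimate
# at a single test point, with a percentile confidence interval.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
n = 1000
X = rng.uniform(-2, 2, size=(n, 3))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=n)
x_test = np.array([[1.0, 0.0, 0.0]])                  # target conditional mean: sin(1) ~ 0.84

boot_preds = []
for b in range(200):
    idx = rng.integers(0, n, n)                       # resample the training data
    rf = RandomForestRegressor(n_estimators=100, random_state=b)
    rf.fit(X[idx], y[idx])
    boot_preds.append(rf.predict(x_test)[0])

lo, hi = np.percentile(boot_preds, [2.5, 97.5])
print(f"point estimate: {np.mean(boot_preds):.3f}, 95% CI: ({lo:.3f}, {hi:.3f})")
```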
This paper proposes a flexible framework for inferring large-scale time-varying and time-lagged correlation networks from multivariate or high-dimensional non-stationary time series with piecewise smooth trends. Built on a novel and unified multiple-testing procedure for time-lagged cross-correlation functions with a fixed or diverging number of lags, our method can accurately recover flexible time-varying network structures associated with complex functional structures at all time points. We broaden the applicability of our method to settings with structural breaks by developing difference-based nonparametric estimators of cross-correlations, achieve accurate family-wise error control via a bootstrap-assisted procedure adaptive to the complex temporal dynamics, and enhance the probability of recovering the time-varying network structures using a new uniform variance reduction technique. We prove the asymptotic validity of the proposed method and demonstrate its effectiveness in finite samples through simulation studies and empirical applications.
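A bare-bones version of the object being inferred is sketched below: rolling-window, lagged cross-correlations of first-differenced series, thresholded into a time-varying adjacency over (series, series, lag). The fixed threshold is a hypothetical stand-in for the bootstrap-calibrated, family-wise-error-controlling threshold and the variance-reduction technique developed in the paper, none of which is reproduced here.

```python
# Sketch: a time-varying, time-lagged correlation network from rolling-window
# cross-correlations of first-differenced series (differencing removes trends).
import numpy as np

rng = np.random.default_rng(7)
T, p, max_lag, window = 600, 5, 2, 100

# Toy series with smooth trends; series 1 follows series 0 at lag 1,
# but only in the second half of the sample.
X = rng.normal(size=(T, p))
X[T // 2 + 1:, 1] += 0.8 * X[T // 2:-1, 0]
X += np.linspace(0.0, 3.0, T)[:, None]

dX = np.diff(X, axis=0)                               # difference out the trends

def lagged_corr_network(t, threshold=0.35):
    """Boolean adjacency over (i, j, lag) from the window of dX ending at t."""
    W = dX[t - window:t]
    adj = np.zeros((p, p, max_lag + 1), dtype=bool)
    for lag in range(max_lag + 1):
        a, b = W[lag:], W[:W.shape[0] - lag]          # series i at t vs series j at t - lag
        for i in range(p):
            for j in range(p):
                if lag == 0 and i == j:
                    continue                          # skip trivial self-correlations
                r = np.corrcoef(a[:, i], b[:, j])[0, 1]
                adj[i, j, lag] = abs(r) > threshold
    return adj

print("edges in first-half window: ", int(lagged_corr_network(250).sum()))
print("edges in second-half window:", int(lagged_corr_network(590).sum()))
```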
The paper addresses the problem of estimating the model parameters of the logistic exponential distribution based on a progressive type-I hybrid censored sample. The maximum likelihood estimates are obtained and computed numerically using the Newton-Raphson method. Further, the Bayes estimates are derived under squared error, LINEX, and generalized entropy loss functions. Two types of prior distributions (independent and bivariate) are considered for the purpose of Bayesian estimation. Since the Bayes estimates are not available in explicit form, Lindley's approximation technique is employed to obtain approximate Bayes estimates. Interval estimates of the parameters are constructed based on the normal approximation of the maximum likelihood estimates and the normal approximation of the log-transformed maximum likelihood estimates. The highest posterior density credible intervals are obtained using the importance sampling method. Furthermore, numerical computations are reported to illustrate some of the results obtained in the paper. A real-life dataset is considered for the purpose of illustration.
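As a small sketch of the estimation target, the code below computes maximum likelihood estimates for a complete (uncensored) logistic-exponential sample by numerical optimization, assuming the common parameterization with survival function $S(x) = 1/\{1 + (e^{\lambda x} - 1)^\kappa\}$. Progressive type-I hybrid censoring, the Bayes estimates, and Lindley's approximation require substantially more bookkeeping and are not reproduced.

```python
# Sketch: MLE for the logistic-exponential distribution on a complete sample,
# assuming survival S(x) = 1 / (1 + (exp(lambda*x) - 1)**kappa).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
lam_true, kap_true = 0.5, 2.0

# Inverse-transform sampling by setting S(x) = u, u ~ U(0,1):
#   x = (1/lambda) * log(1 + ((1 - u) / u)**(1/kappa))
u = rng.uniform(size=300)
x = np.log(1.0 + ((1.0 - u) / u) ** (1.0 / kap_true)) / lam_true

def neg_loglik(params):
    lam, kap = np.exp(params)                        # keep both parameters positive
    e = np.expm1(lam * x)                            # exp(lambda*x) - 1
    logf = (np.log(kap) + np.log(lam) + lam * x
            + (kap - 1.0) * np.log(e) - 2.0 * np.log1p(e ** kap))
    return -np.sum(logf)

fit = minimize(neg_loglik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
print("MLE (lambda, kappa):", np.exp(fit.x))
```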
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
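The exploding-and-vanishing-signal problem that criticality tuning addresses can be seen in a few lines: propagate a random input through a deep tanh network at several weight-initialization scales and track the typical activation size with depth. Too small a scale makes the signal die out exponentially, too large drives activations into saturation (and gradients toward exploding), while the critical choice (weight variance of order 1/fan-in for tanh) leaves only a slow drift. This is a standard signal-propagation experiment, not the book's effective-theory calculation of finite-width corrections.

```python
# Small numerical illustration of criticality: forward signal propagation
# through a deep tanh MLP at sub-critical, critical, and super-critical
# weight-initialization variances.
import numpy as np

rng = np.random.default_rng(9)
width, depth = 500, 60
x0 = rng.normal(size=width)

for c_w in [0.5, 1.0, 2.0]:                          # weight variance = c_w / width
    x = x0.copy()
    norms = []
    for layer in range(depth):
        W = rng.normal(scale=np.sqrt(c_w / width), size=(width, width))
        x = np.tanh(W @ x)
        norms.append(np.sqrt(np.mean(x ** 2)))       # typical activation size
    print(f"c_w = {c_w}: RMS activation at depth 10/30/60 = "
          f"{norms[9]:.3f} / {norms[29]:.3f} / {norms[-1]:.3f}")
```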