The use of expectiles in risk management contexts has recently gathered substantial momentum because of their excellent axiomatic and probabilistic properties. While expectile estimation at central levels already has a substantial history, expectile estimation at extreme levels has so far only been considered when the underlying distribution has a heavy right tail. This article focuses on the challenging short-tailed setting when the distribution of the variable of interest has a negative extreme value index and is bounded to the right. We derive an asymptotic expansion of extreme expectiles in this context under a general second-order extreme value condition. This asymptotic expansion makes it possible to study two semiparametric estimators of extreme expectiles, whose asymptotic properties we obtain in a general model of strictly stationary but weakly dependent observations. A simulation study and real data analysis illustrate the performance of the proposed statistical techniques.
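As a point of reference for the expectile concept, the following is a minimal sketch of how an empirical expectile at a fixed level can be computed by asymmetric least squares on a sample that is bounded to the right; the article's semiparametric estimators for extreme levels under weak dependence are not reproduced here.

```python
import numpy as np

def sample_expectile(x, tau=0.99, tol=1e-10, max_iter=500):
    """Empirical tau-expectile: the minimiser of the asymmetric squared loss
    sum_i |tau - 1{x_i < e}| (x_i - e)^2, computed by fixed-point iteration."""
    x = np.asarray(x, dtype=float)
    e = x.mean()                                  # the 0.5-expectile is the mean
    for _ in range(max_iter):
        w = np.where(x >= e, tau, 1.0 - tau)      # asymmetric weights
        e_new = np.sum(w * x) / np.sum(w)         # first-order condition solved for e
        if abs(e_new - e) < tol:
            break
        e = e_new
    return e

# Short-tailed example: Beta samples are bounded above (negative extreme value index).
rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=10_000)
print(sample_expectile(x, tau=0.999))
```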
We identify the average dose-response function (ADRF) for a continuously valued error-contaminated treatment by a weighted conditional expectation. We then estimate the weights nonparametrically by maximising a local generalised empirical likelihood subject to an expanding set of conditional moment equations incorporated into the deconvolution kernels. Thereafter, we construct a deconvolution kernel estimator of ADRF. We derive the asymptotic bias and variance of our ADRF estimator and provide its asymptotic linear expansion, which helps conduct statistical inference. To select our smoothing parameters, we adopt the simulation-extrapolation method and propose a new extrapolation procedure to stabilise the computation. Monte Carlo simulations and a real data study illustrate our method's practical performance.
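To illustrate the deconvolution-kernel ingredient only, in its simplest Nadaraya-Watson form with an assumed Laplace measurement error and a standard flat-top kernel, a minimal sketch follows; the local generalised empirical likelihood weighting and the simulation-extrapolation bandwidth selection of the paper are not reproduced.

```python
import numpy as np

def deconvolution_kernel(u, h, b, n_grid=2001):
    """Deconvolution kernel for Laplace(0, b) measurement error:
    K_U(u) = (1/2pi) * int_{-1}^{1} cos(t u) phi_K(t) / phi_eps(t/h) dt,
    with phi_K(t) = (1 - t^2)^3 on [-1, 1] and phi_eps(t) = 1 / (1 + b^2 t^2)."""
    t = np.linspace(-1.0, 1.0, n_grid)
    ratio = (1 - t**2) ** 3 * (1 + (b * t / h) ** 2)   # phi_K(t) / phi_eps(t/h)
    integrand = np.cos(np.outer(np.atleast_1d(u), t)) * ratio
    dt = t[1] - t[0]
    return integrand.sum(axis=1) * dt / (2 * np.pi)

def deconv_regression(x0, W, Y, h, b):
    """Nadaraya-Watson-type deconvolution estimate of E[Y | X = x0] when only
    W = X + eps is observed, with eps ~ Laplace(0, b)."""
    k = deconvolution_kernel((x0 - W) / h, h, b)
    return np.sum(k * Y) / np.sum(k)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 2000)
W = X + rng.laplace(scale=0.1, size=2000)           # error-contaminated treatment
Y = np.sin(np.pi * X) + 0.2 * rng.normal(size=2000)
print(deconv_regression(0.3, W, Y, h=0.25, b=0.1))
```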
Sampling-based inference techniques are central to modern cosmological data analysis; these methods, however, scale poorly with dimensionality and typically require approximate or intractable likelihoods. In this paper we describe how Truncated Marginal Neural Ratio Estimation (TMNRE) (a new approach in so-called simulation-based inference) naturally evades these issues, improving the $(i)$ efficiency, $(ii)$ scalability, and $(iii)$ trustworthiness of the inferred posteriors. Using measurements of the Cosmic Microwave Background (CMB), we show that TMNRE can achieve converged posteriors using orders of magnitude fewer simulator calls than conventional Markov Chain Monte Carlo (MCMC) methods. Remarkably, the required number of samples is effectively independent of the number of nuisance parameters. In addition, a property called \emph{local amortization} allows the performance of rigorous statistical consistency checks that are not accessible to sampling-based methods. TMNRE promises to become a powerful tool for cosmological data analysis, particularly in the context of extended cosmologies, where the timescale required for conventional sampling-based inference methods to converge can greatly exceed that of simple cosmological models such as $\Lambda$CDM. To perform these computations, we use an implementation of TMNRE via the open-source code \texttt{swyft}.
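A toy sketch of the neural ratio estimation idea underlying TMNRE is given below, written in plain PyTorch rather than \texttt{swyft}; the truncation scheme, local amortization, and the CMB simulator are all omitted, and the one-parameter simulator is purely hypothetical.

```python
import torch
import torch.nn as nn

# Toy simulator with uniform prior theta ~ U(-1, 1); the observable is x = theta + noise.
def simulate(theta):
    return theta + 0.1 * torch.randn_like(theta)

# Classifier whose logit approximates log r(theta, x) = log p(x | theta) - log p(x).
net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    theta = 2 * torch.rand(256, 1) - 1             # prior draws
    x = simulate(theta)                            # jointly drawn pairs -> label 1
    theta_shuffled = theta[torch.randperm(256)]    # marginal pairs -> label 0
    inputs = torch.cat([torch.cat([theta, x], dim=1),
                        torch.cat([theta_shuffled, x], dim=1)])
    labels = torch.cat([torch.ones(256, 1), torch.zeros(256, 1)])
    loss = bce(net(inputs), labels)
    opt.zero_grad(); loss.backward(); opt.step()

# After training, exp(net([theta, x0])) is proportional to the posterior-to-prior ratio,
# so evaluating it on a 1-d grid of theta for a fixed observation x0 gives the posterior.
```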
We investigate the extreme value theory of a class of random sequences defined by the all-time suprema of aggregated self-similar Gaussian processes with trend. This study is motivated by its potential applications in various areas and its theoretical interest. We consider both stationary and non-stationary sequences, according to whether the trend functions are identical. We show that a sequence of suitably normalised $k$th order statistics converges in distribution to a limiting random variable which can be a negative log transformed Erlang distributed random variable, a Normal random variable, or a mixture of the two, according to three conditions determined by the model parameters. Remarkably, this phenomenon resembles that for the stationary Normal sequence. We also show that various moments of the normalised $k$th order statistics converge to the moments of the corresponding limiting random variable. These results enable us to analyse various properties of these random sequences and reveal the particular features of this class within extreme value theory.
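As an illustrative special case (not the general aggregated self-similar setting of the abstract), the all-time supremum of a standard Brownian motion with negative linear trend has an exact exponential law, so the limiting behaviour of its normalised $k$th order statistics can be checked by simulation:

```python
import numpy as np
from scipy.special import digamma

# For standard Brownian motion (self-similar with H = 1/2) with trend -c*t, the
# supremum M = sup_{t>=0} (B_t - c t) is exactly Exponential with rate 2c. The k-th
# largest of n i.i.d. such suprema, suitably normalised, then has a negative-log-Gamma(k)
# limit (Gumbel for k = 1), echoing the Erlang-type limiting law in the abstract.
rng = np.random.default_rng(1)
c, n, k, reps = 1.0, 2000, 3, 5000

kth_largest = np.empty(reps)
for r in range(reps):
    M = rng.exponential(scale=1.0 / (2 * c), size=n)   # i.i.d. all-time suprema
    kth_largest[r] = np.sort(M)[-k]                    # k-th largest order statistic

z = 2 * c * kth_largest - np.log(n)                    # classical normalisation
print(z.mean(), -digamma(k))                           # empirical vs limiting mean
```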
The Akaike information criterion (AIC) is a common tool for model selection. It is frequently used in violation of regularity conditions at parameter space singularities and boundaries. The expected AIC is generally not asymptotically equivalent to its target at singularities and boundaries, and convergence to the target at nearby parameter points may be slow. We develop a generalized AIC for candidate models with or without singularities and boundaries. We show that the expectation of this generalized form converges everywhere in the parameter space, and its convergence can be faster than that of the AIC. We illustrate the generalized AIC on example models from phylogenomics, showing that it can outperform the AIC and gives rise to an interpolated effective number of model parameters, which can differ substantially from the number of parameters near singularities and boundaries. We outline methods for estimating the often unknown generating parameter and bias correction term of the generalized AIC.
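For concreteness, the sketch below computes the classical AIC for a simple boundary-constrained Gaussian model, the kind of situation in which the classical bias correction degrades; the generalised criterion and its interpolated effective number of parameters are not reproduced.

```python
import numpy as np
from scipy import optimize, stats

# Classical AIC = 2k - 2 * max log-likelihood. The true mean is placed exactly on the
# boundary of the constrained parameter space (mu >= 0), where the expectation of the
# classical AIC need not match its target.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=200)       # true mu = 0 sits on the boundary

def neg_loglik(params):
    mu, log_sigma = params
    return -np.sum(stats.norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

res = optimize.minimize(neg_loglik, x0=[0.1, 0.0],
                        bounds=[(0.0, None), (None, None)])  # mu constrained to [0, inf)
k = 2                                              # nominal number of free parameters
aic = 2 * k + 2 * res.fun                          # res.fun = minimised negative log-likelihood
print(aic)
```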
This work considers Gaussian process interpolation with a periodized version of the Mat{\'e}rn covariance function (Stein, 1999, Section 6.7) with Fourier coefficients $\phi(\alpha^2 + j^2)^{-\nu - 1/2}$. Convergence rates are studied for the joint maximum likelihood estimation of $\nu$ and $\phi$ when the data is sampled according to the model. The mean integrated squared error is also analyzed with fixed and estimated parameters, showing that maximum likelihood estimation yields asymptotically the same error as if the ground truth were known. Finally, the case where the observed function is a ``deterministic'' element of a continuous Sobolev space is also considered, suggesting that bounding assumptions on some parameters can lead to different estimates.
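A minimal sketch of how such a periodized covariance can be assembled from its Fourier coefficients and used for plain Gaussian process interpolation is given below; the joint maximum likelihood estimation of $\nu$ and $\phi$ is not reproduced, and the truncation level and test function are arbitrary choices.

```python
import numpy as np

def periodic_matern_cov(x, y, phi=1.0, alpha=1.0, nu=1.5, J=200):
    """Periodised Matern covariance on [0, 1) built from its Fourier series
    k(x, y) = sum_{j in Z} phi * (alpha^2 + j^2)^(-nu - 1/2) * cos(2*pi*j*(x - y)),
    truncated at |j| <= J."""
    d = np.subtract.outer(x, y)
    j = np.arange(1, J + 1)
    coef = phi * (alpha**2 + j**2) ** (-nu - 0.5)
    return (phi * alpha ** (-2 * nu - 1)
            + 2 * np.sum(coef[:, None, None]
                         * np.cos(2 * np.pi * j[:, None, None] * d), axis=0))

# GP interpolation (simple kriging with a small nugget for numerical stability).
rng = np.random.default_rng(0)
x_obs = np.sort(rng.uniform(0, 1, 15))
f_obs = np.sin(2 * np.pi * x_obs)              # stand-in for the observed function
x_new = np.linspace(0, 1, 200)

K = periodic_matern_cov(x_obs, x_obs) + 1e-8 * np.eye(len(x_obs))
k_star = periodic_matern_cov(x_new, x_obs)
f_pred = k_star @ np.linalg.solve(K, f_obs)
```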
We consider a strictly stationary random field on the two-dimensional integer lattice with regularly varying marginal and finite-dimensional distributions. Exploiting the regular variation, we define the spatial extremogram which takes into account only the largest values in the random field. This extremogram is a spatial autocovariance function. We define the corresponding extremal spectral density and its estimator, the extremal periodogram. Based on the extremal periodogram, we consider the Whittle estimator for suitable classes of parametric random fields including the Brown-Resnick random field and regularly varying max-moving averages.
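A minimal sketch of the lag-domain empirical extremogram on a simulated max-moving-average field follows; the extremal periodogram and the Whittle estimator themselves are not reproduced, and the threshold choice is illustrative.

```python
import numpy as np

def empirical_extremogram(field, lag, q=0.95):
    """Empirical upper-tail extremogram at spatial lag h = (h1, h2): an estimate of
    P(X_{s+h} > u | X_s > u) with u the q-th marginal quantile of the field."""
    u = np.quantile(field, q)
    h1, h2 = lag
    a = field[: field.shape[0] - h1, : field.shape[1] - h2]
    b = field[h1:, h2:]
    return np.mean((a > u) & (b > u)) / np.mean(a > u)

rng = np.random.default_rng(0)
# Toy regularly varying field: i.i.d. Pareto noise smoothed by a 3x3 max-moving average.
noise = rng.pareto(2.0, size=(260, 260)) + 1.0
field = np.maximum.reduce([noise[i:i + 256, j:j + 256]
                           for i in range(3) for j in range(3)])
print([empirical_extremogram(field, (h, 0)) for h in range(1, 5)])
```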
The horseshoe prior is known to possess many desirable properties for Bayesian estimation of sparse parameter vectors, yet its density function lacks an analytic form. As such, it is challenging to find a closed-form solution for the posterior mode. Conventional horseshoe estimators use the posterior mean to estimate the parameters, but these estimates are not sparse. We propose a novel expectation-maximisation (EM) procedure for computing the maximum a posteriori (MAP) estimates of the parameters in the case of the standard linear model. A particular strength of our approach is that the M-step depends only on the form of the prior and is independent of the form of the likelihood. We introduce several simple modifications of this EM procedure that allow for straightforward extension to generalised linear models. In experiments performed on simulated and real data, our approach performs comparably to, or better than, state-of-the-art sparse estimation methods in terms of statistical performance and computational cost.
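The sketch below shows the general EM/MM structure such a procedure can take for the linear model, with a surrogate E-step based on the standard inverse-gamma augmentation of the half-Cauchy; it is not the paper's exact algorithm, and unlike the paper's estimator it shrinks small coefficients towards, rather than exactly to, zero.

```python
import numpy as np

def horseshoe_em_map(X, y, tau=1.0, sigma2=1.0, n_iter=200):
    """Illustrative EM/MM-style sketch of MAP estimation in the linear model
    y = X beta + eps, eps ~ N(0, sigma2 I), under a horseshoe-type prior
    beta_j ~ N(0, tau^2 lam_j^2), lam_j ~ C+(0, 1). The local-shrinkage updates
    use plug-in conditional expectations from the inverse-gamma augmentation of
    the half-Cauchy as a surrogate E-step; the beta update is the ridge-type
    M-step common to normal scale-mixture priors."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    inv_nu = np.ones(p)
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        # surrogate E-step: E[1/lam_j^2] and E[1/nu_j] from the conditionals
        # lam_j^2 | nu_j, beta_j ~ IG(1, 1/nu_j + beta_j^2 / (2 tau^2)),
        # nu_j | lam_j^2 ~ IG(1, 1 + 1/lam_j^2).
        inv_lam2 = 1.0 / (inv_nu + beta**2 / (2.0 * tau**2))
        inv_nu = 1.0 / (1.0 + inv_lam2)
        # M-step: weighted ridge system with prior precisions inv_lam2 / tau^2
        beta = np.linalg.solve(XtX + (sigma2 / tau**2) * np.diag(inv_lam2), Xty)
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
beta_true = np.zeros(20)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.normal(size=100)
print(np.round(horseshoe_em_map(X, y), 2))
```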
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction based question answering, or machine translation). However, it builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and to propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we show that these approaches yield more robust models on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviates catastrophic forgetting issues during adaptation.
A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the remaining challenges. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects, encompassing settings where text is used as an outcome, treatment, or as a means to address confounding. In addition, we explore potential uses of causal inference to improve the performance, robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the computational linguistics community.
This paper focuses on the expected difference in borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects, and hence the estimation error can be substantial. We therefore propose an alternative approach to constructing the estimators so that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, degrees of nonlinearity, and distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is strikingly substantial once the causal effects are accounted for correctly.
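As a generic illustration of the confounding issue (not the paper's proposed estimators), the sketch below contrasts a naive difference-in-means estimate of a decision's effect on repayment with an inverse-propensity-weighted estimate on simulated data, using the true propensity score for simplicity.

```python
import numpy as np

# A confounder C (e.g. borrower quality) drives both the lender's decision T and the
# repayment Y, so the naive difference in means is biased; reweighting by the propensity
# score recovers the true effect of 1.0 in this simulated example.
rng = np.random.default_rng(0)
n = 100_000
C = rng.normal(size=n)                               # confounder
p = 1 / (1 + np.exp(-2 * C))                         # propensity of approval given C
T = rng.binomial(1, p)                               # lender's credit decision
Y = 1.0 * T + 2.0 * C + rng.normal(size=n)           # repayment; true causal effect = 1.0

naive = Y[T == 1].mean() - Y[T == 0].mean()          # confounded difference in means
ipw = np.mean(T * Y / p) - np.mean((1 - T) * Y / (1 - p))
print(naive, ipw)    # naive is far from 1.0; the IPW estimate is close to 1.0
```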