Extreme value theory (EVT) provides an elegant mathematical tool for the statistical analysis of rare events. Typically, when data are collected from multiple clusters, analysts want to preserve cluster information, such as region, period, and group. To incorporate information from a large number of clusters into extreme value analysis, we combine the mixed effects model (MEM) with the regression technique of EVT. In the field of small area estimation, the MEM is well known as an important tool for providing reliable estimates for clusters with small sample sizes. In the context of EVT for rare-event analysis, the sample size of extreme value data within each cluster is often small, so the MEM may improve the predictive accuracy of extreme value analysis. This motivates us to verify the effectiveness of the MEM in EVT through theoretical studies and numerical experiments, including an application to the risk assessment of heavy rainfall in Japan.
We propose a spectral clustering algorithm for analyzing the dependence structure of multivariate extremes. More specifically, we focus on the asymptotic dependence of multivariate extremes characterized by the angular or spectral measure in extreme value theory. Our work studies the theoretical performance of spectral clustering based on a random $k$-nearest neighbor graph constructed from an extremal sample, i.e., the angular part of random vectors for which the radius exceeds a large threshold. In particular, we derive the asymptotic distribution of extremes arising from a linear factor model and prove that, under certain conditions, spectral clustering can consistently identify the clusters of extremes arising in this model. Leveraging this result, we propose a simple and consistent estimation strategy for learning the angular measure. Our theoretical findings are complemented by numerical experiments illustrating the finite-sample performance of our methods.
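A minimal sketch of the pipeline described above, not the paper's implementation: heavy-tailed data are simulated from a toy linear factor model, the angular parts of the observations whose radius exceeds a high empirical quantile are retained, a random $k$-nearest neighbor graph is built on those angles, and off-the-shelf spectral clustering is applied. The factor loadings, the threshold level, $k$, and the number of clusters are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)

# Toy heavy-tailed sample from a linear factor model X = Z A^T with Pareto factors.
A = np.abs(rng.normal(size=(5, 3)))           # factor loadings (d=5 variables, K=3 factors)
Z = rng.pareto(2.5, size=(10_000, 3)) + 1.0   # heavy-tailed factors
X = Z @ A.T

# Extremal sample: keep the angular part Theta = X / ||X|| when the radius is large.
R = np.linalg.norm(X, axis=1)
thresh = np.quantile(R, 0.95)
Theta = X[R > thresh] / R[R > thresh, None]

# Random k-nearest-neighbor graph on the angles, then spectral clustering.
knn = kneighbors_graph(Theta, n_neighbors=10, mode="connectivity", include_self=False)
affinity = 0.5 * (knn + knn.T)                # symmetrize the k-NN graph
labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            assign_labels="kmeans", random_state=0).fit_predict(affinity)
print(np.bincount(labels))
```

In this toy model each large factor pushes the observation toward one column of the loading matrix, so the recovered clusters roughly correspond to those directions.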
The L\'evy distribution, alongside the Normal and Cauchy distributions, is one of only three stable distributions whose densities can be obtained in closed form. However, there are only a few specific goodness-of-fit tests for the L\'evy distribution. In this paper, two novel classes of goodness-of-fit tests for the L\'evy distribution are proposed. Both are based on V-empirical Laplace transforms. The new tests are scale-free under the null hypothesis, which makes them suitable for testing the composite hypothesis. The finite-sample and limiting properties of the test statistics are obtained. In addition, a generalization of the recent Bhati-Kattumannil goodness-of-fit test to the L\'evy distribution is considered. For assessing the quality of the novel and competitor tests, the local Bahadur efficiencies are computed, and an extensive power study is conducted. Both criteria clearly demonstrate the quality of the new tests. The applicability of the novel tests is demonstrated with two real-data examples.
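A minimal sketch of the general idea, not the authors' statistics: an L2-type goodness-of-fit test for the L\'evy law that compares the empirical Laplace transform of scale-normalized data with the Laplace transform $e^{-\sqrt{2t}}$ of the standard L\'evy distribution, with the critical value calibrated by Monte Carlo. The weight function $e^{-t}$, the integration grid, and the sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def levy_sample(n, scale=1.0, rng=rng):
    # If Z ~ N(0,1), then scale / Z^2 follows a Levy(0, scale) law.
    return scale / rng.normal(size=n) ** 2

def statistic(x, t_grid=np.linspace(0.01, 5.0, 200)):
    c_hat = len(x) / np.sum(1.0 / x)           # MLE of the scale parameter
    y = x / c_hat                              # scale-free under the null
    L_emp = np.exp(-np.outer(t_grid, y)).mean(axis=1)   # empirical Laplace transform
    L_levy = np.exp(-np.sqrt(2.0 * t_grid))             # Laplace transform of Levy(0, 1)
    return np.trapz((L_emp - L_levy) ** 2 * np.exp(-t_grid), t_grid)

# Monte Carlo critical value under H0, then a test on Levy vs. lognormal data.
n, B = 100, 2000
null_stats = np.array([statistic(levy_sample(n)) for _ in range(B)])
crit = np.quantile(null_stats, 0.95)
print("reject for Levy data:     ", statistic(levy_sample(n)) > crit)
print("reject for lognormal data:", statistic(rng.lognormal(0, 1, n)) > crit)
```

Because the data are normalized by the estimated scale before the Laplace transform is evaluated, the null distribution of the statistic does not depend on the true scale, which is what makes Monte Carlo calibration of the composite hypothesis straightforward.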
In recent years, there has been growing interest in understanding complex microstructures and their effect on macroscopic properties. In general, it is difficult to derive an effective constitutive law for such microstructures with reasonable accuracy and meaningful parameters. One numerical approach to bridge the scales is computational homogenization, in which a microscopic problem is solved at every macroscopic point, essentially replacing the effective constitutive model. Such approaches are, however, computationally expensive and typically infeasible in multi-query contexts such as optimization and material design. To render these analyses tractable, surrogate models are required that can accurately approximate and accelerate the microscopic problem over a large design space of shapes, material parameters, and loading conditions. In previous works, such models were constructed in a data-driven manner using methods such as Neural Networks (NN) or Gaussian Process Regression (GPR). However, these approaches currently suffer from issues such as the need for large amounts of training data, a lack of embedded physics, and considerable extrapolation errors. In this work, we develop a reduced order model based on Proper Orthogonal Decomposition (POD), the Empirical Cubature Method (ECM) and a geometrical transformation method with the following key features: (i) large shape variations of the microstructure are captured, (ii) only relatively small amounts of training data are necessary, and (iii) highly non-linear history-dependent behaviors are treated. The proposed framework is tested and examined in two numerical examples, involving two scales and large geometrical variations. In both cases, high speed-ups and accuracy are achieved, together with good extrapolation behavior.
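A minimal sketch of the POD ingredient only (the ECM hyper-reduction and the geometrical transformation method are beyond this snippet): snapshots of microscopic solution fields are compressed by a truncated SVD, and new fields are approximated in the resulting reduced basis. The snapshot data and the energy tolerance are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Snapshot matrix: each column is a microscopic solution (e.g. a displacement
# field with n_dof entries) computed for one training parameter/loading case.
n_dof, n_snapshots = 5000, 60
snapshots = rng.normal(size=(n_dof, 8)) @ rng.normal(size=(8, n_snapshots))

# Proper Orthogonal Decomposition: truncated SVD of the snapshot matrix.
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 1.0 - 1e-8) + 1)   # keep modes up to an energy tolerance
basis = U[:, :r]                                   # reduced basis (n_dof x r)

# A new full-order field is approximated by its projection onto the POD basis.
u_new = snapshots @ rng.normal(size=n_snapshots)
u_rom = basis @ (basis.T @ u_new)
print(r, np.linalg.norm(u_new - u_rom) / np.linalg.norm(u_new))
```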
The emerging theory of graph limits exhibits an analytic perspective on graphs, showing that many important concepts and tools in graph theory and its applications can be described more naturally (and sometimes proved more easily) in analytic language. We extend the theory of graph limits to the ordered setting, presenting a limit object for dense vertex-ordered graphs, which we call an orderon. As a special case, this yields limit objects for matrices whose rows and columns are ordered, and for dynamic graphs that expand (via vertex insertions) over time. Along the way, we devise an ordered locality-preserving variant of the cut distance between ordered graphs, showing that two graphs are close with respect to this distance if and only if they are similar in terms of their ordered subgraph frequencies. We show that the space of orderons is compact with respect to this distance notion, which is key to a successful analysis of combinatorial objects through their limits. We derive several applications of the ordered limit theory in extremal combinatorics, sampling, and property testing in ordered graphs. In particular, we prove a new ordered analogue of the well-known result by Alon and Stav [RS\&A'08] on the furthest graph from a hereditary property; this is the first known result of this type in the ordered setting. Unlike the unordered regime, here the random graph model $G(n, p)$ with an ordering over the vertices is not always asymptotically the furthest from the property for some $p$. However, using our ordered limit theory, we show that random graphs generated by a stochastic block model, where the blocks are consecutive in the vertex ordering, are (approximately) the furthest. Additionally, we describe an alternative analytic proof of the ordered graph removal lemma [Alon et al., FOCS'17].
High covariate dimensionality is increasingly common in model estimation, and existing techniques to address this issue typically require sparsity or discrete heterogeneity of the unobservable parameter vector. However, neither restriction may be supported by economic theory in some empirical contexts, leading to severe bias and misleading inference. The clustering-based grouped parameter estimator (GPE) introduced in this paper drops both restrictions in favour of the natural one that the parameter support be compact. GPE exhibits robust large sample properties under standard conditions and accommodates both sparse and non-sparse parameters whose support can be bounded away from zero. Extensive Monte Carlo simulations demonstrate the excellent performance of GPE in terms of bias reduction and size control compared to competing estimators. An empirical application of GPE to estimating price and income elasticities of demand for gasoline highlights its practical utility.
Variational inference is an alternative estimation technique for Bayesian models. Recent work shows that variational methods provide consistent estimation via efficient, deterministic algorithms. Other tools, such as model selection using variational AICs (VAIC), have been developed and studied for the linear regression case. While mixed effects models have enjoyed some study in the variational context, tools for model selection are lacking. One important feature of model selection in mixed effects models, particularly longitudinal models, is the selection of the random effects, which in turn determine the covariance structure for the repeatedly sampled outcome. To address this, we derive a VAIC specifically for variational mixed effects (VME) models. As part of our study we also implement a parameter-efficient VME that reduces any general random effects structure down to a single subject-specific score. This model accommodates a wide range of random effect structures, including random intercept and slope models as well as random functional effects. Our VAIC can model and perform selection on a variety of VME models, including classic longitudinal models as well as longitudinal scalar-on-function regression. As we demonstrate empirically, our VAIC performs well in discriminating between correctly and incorrectly specified random effects structures. Finally, we illustrate the use of VAICs for VMEs on two datasets: a study of lead levels in children and a study of diffusion tensor imaging.
In Causal Bayesian Optimization (CBO), an agent intervenes on an unknown structural causal model to maximize a downstream reward variable. In this paper, we consider the generalization where other agents or external events also intervene on the system, which is key for enabling adaptiveness to non-stationarities such as weather changes, market forces, or adversaries. We formalize this generalization of CBO as Adversarial Causal Bayesian Optimization (ACBO) and introduce the first algorithm for ACBO with bounded regret: Causal Bayesian Optimization with Multiplicative Weights (CBO-MW). Our approach combines a classical online learning strategy with causal modeling of the rewards. To achieve this, it computes optimistic counterfactual reward estimates by propagating uncertainty through the causal graph. We derive regret bounds for CBO-MW that naturally depend on graph-related quantities. We further propose a scalable implementation for the case of combinatorial interventions and submodular rewards. Empirically, CBO-MW outperforms non-causal and non-adversarial Bayesian optimization methods on synthetic environments and environments based on real-world data. Our experiments include a realistic demonstration of how CBO-MW can be used to learn users' demand patterns in a shared mobility system and reposition vehicles in strategic areas.
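A minimal sketch of the multiplicative-weights ingredient of an ACBO-style loop, not the paper's CBO-MW: the learner keeps a weight per candidate intervention, samples proportionally to the weights, and updates them with optimistic (upper-confidence) reward estimates for every action. The bandit-style environment, the optimism bonus, and the learning rate are illustrative stand-ins for the causal-graph and Gaussian-process machinery.

```python
import numpy as np

rng = np.random.default_rng(3)

n_actions, T, eta = 5, 500, 0.1
weights = np.ones(n_actions)
counts = np.zeros(n_actions)
mean_reward = np.zeros(n_actions)

def environment_reward(a, t):
    # Non-stationary reward influenced by an external "adversarial" intervention.
    base = np.array([0.2, 0.4, 0.6, 0.5, 0.3])
    drift = 0.3 * np.sin(2 * np.pi * t / 200) * (a == 2)
    return np.clip(base[a] + drift + 0.05 * rng.normal(), 0.0, 1.0)

for t in range(T):
    probs = weights / weights.sum()
    a = rng.choice(n_actions, p=probs)
    r = environment_reward(a, t)
    counts[a] += 1
    mean_reward[a] += (r - mean_reward[a]) / counts[a]
    # Optimistic estimate for every action, used in the exponential-weights
    # update in place of the single observed reward.
    bonus = np.sqrt(np.log(t + 2) / np.maximum(counts, 1))
    optimistic = np.clip(mean_reward + bonus, 0.0, 1.0)
    weights *= np.exp(eta * optimistic)
    weights /= weights.max()          # keep the weights numerically stable

print("empirical play frequencies:", np.round(counts / T, 2))
```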
To quantify uncertainties in inverse problems of partial differential equations (PDEs), we formulate them as statistical inference problems using Bayes' formula. Recently, well-justified infinite-dimensional Bayesian analysis methods have been developed to construct dimension-independent algorithms. However, these infinite-dimensional Bayesian methods face three challenges: prior measures usually act as regularizers and cannot incorporate prior information efficiently; complex noise models, such as the more practical non-i.i.d. noise, are rarely considered; and time-consuming forward PDE solvers are needed to estimate posterior statistical quantities. To address these issues, we propose an infinite-dimensional inference framework based on the infinite-dimensional variational inference method and deep generative models. Specifically, by introducing some measure equivalence assumptions, we derive the evidence lower bound in the infinite-dimensional setting and provide possible parametric strategies that yield a general inference framework called the Variational Inverting Network (VINet). This inference framework can encode prior and noise information from learning examples. In addition, by relying on the power of deep neural networks, the posterior mean and variance can be efficiently and explicitly generated in the inference stage. In numerical experiments, we design specific network structures that yield a computable VINet from the general inference framework. Numerical examples of linear inverse problems of an elliptic equation and the Helmholtz equation are presented to illustrate the effectiveness of the proposed inference framework.
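A minimal finite-dimensional sketch of the variational-inference ingredient, not VINet itself: a mean-field Gaussian approximation for a small discretized linear inverse problem $y = Ax + \varepsilon$ is fitted by maximizing the evidence lower bound $\mathbb{E}_q[\log p(y \mid x)] - \mathrm{KL}(q \,\|\, p)$ with the reparameterization trick. The dimensions, noise level, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m, sigma = 20, 40, 0.5
A = rng.normal(size=(m, d)) / np.sqrt(d)
x_true = rng.normal(size=d)
y = A @ x_true + sigma * rng.normal(size=m)

mu = np.zeros(d)                    # variational mean
log_s = np.full(d, -1.0)            # log of variational standard deviations
lr, n_mc = 0.05, 32

for step in range(2000):
    eps = rng.normal(size=(n_mc, d))
    x = mu + np.exp(log_s) * eps                       # reparameterized samples
    resid = y - x @ A.T                                # (n_mc, m)
    # Monte Carlo gradients of the ELBO: Gaussian likelihood, N(0, I) prior.
    g_x = (resid @ A) / sigma**2 - x                   # d/dx [log p(y|x) + log p(x)]
    g_mu = g_x.mean(axis=0)
    g_log_s = (g_x * eps).mean(axis=0) * np.exp(log_s) + 1.0   # +1 from the entropy term
    mu += lr * g_mu
    log_s += lr * g_log_s

print("relative error of the variational posterior mean:",
      np.linalg.norm(mu - x_true) / np.linalg.norm(x_true))
```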
In this paper, we propose a method for estimating model parameters from Small-Angle Scattering (SAS) data based on Bayesian inference. Conventional SAS data analyses involve manual parameter adjustment by analysts or optimization using gradient methods. These analysis processes tend to rely on heuristics and may converge to local solutions. Furthermore, it is difficult to evaluate the reliability of the results obtained by conventional analysis methods. Our method solves these problems by estimating model parameters as probability distributions from SAS data within the framework of Bayesian inference. We evaluate the performance of our method through numerical experiments using artificial data for representative measurement target models. The results of the numerical experiments show that our method provides not only high accuracy and reliability of estimation, but also perspectives on the transition point of estimability with respect to the measurement time and the lower bound of the angular domain of the measured data.
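A minimal sketch in the same spirit, not the paper's implementation: the parameters of a monodisperse-sphere scattering model are inferred from synthetic SAS data with a random-walk Metropolis sampler, so the output is a posterior distribution rather than a single point estimate. The sphere radius, noise level, priors, and proposal scales are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def sphere_intensity(q, radius, scale):
    qr = q * radius
    form = 3.0 * (np.sin(qr) - qr * np.cos(qr)) / qr**3   # sphere form-factor amplitude
    return scale * form**2

# Synthetic measurement: I(q) for a sphere of radius 40 (arbitrary units) plus noise.
q = np.linspace(0.01, 0.3, 120)
true_R, true_scale, noise = 40.0, 1.0, 0.001
data = sphere_intensity(q, true_R, true_scale) + noise * rng.normal(size=q.size)

def log_posterior(theta):
    R, scale = theta
    if not (1.0 < R < 100.0 and 0.0 < scale < 10.0):      # flat priors on a box
        return -np.inf
    resid = data - sphere_intensity(q, R, scale)
    return -0.5 * np.sum((resid / noise) ** 2)

# Random-walk Metropolis sampling of the posterior over (R, scale).
theta = np.array([38.0, 0.98])
lp = log_posterior(theta)
samples = []
for _ in range(20_000):
    prop = theta + rng.normal(scale=[0.1, 0.001])
    lp_prop = log_posterior(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta.copy())

samples = np.array(samples[5_000:])                        # discard burn-in
print("posterior mean and std of R:", samples[:, 0].mean(), samples[:, 0].std())
```

The posterior spread reported here is the kind of reliability information that a point estimate from manual tuning or gradient-based fitting does not provide.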
Pre-trained deep neural network language models such as ELMo, GPT, BERT and XLNet have recently achieved state-of-the-art performance on a variety of language understanding tasks. However, their size makes them impractical for a number of scenarios, especially on mobile and edge devices. In particular, the input word embedding matrix accounts for a significant proportion of the model's memory footprint, due to the large input vocabulary and embedding dimensions. Knowledge distillation techniques have had success at compressing large neural network models, but they are ineffective at yielding student models with vocabularies different from the original teacher models. We introduce a novel knowledge distillation technique for training a student model with a significantly smaller vocabulary as well as lower embedding and hidden state dimensions. Specifically, we employ a dual-training mechanism that trains the teacher and student models simultaneously to obtain optimal word embeddings for the student vocabulary. We combine this approach with learning shared projection matrices that transfer layer-wise knowledge from the teacher model to the student model. Our method is able to compress the BERT_BASE model by more than 60x, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7MB. Experimental results also demonstrate higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques.
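A minimal sketch of the layer-wise projection idea only (the dual-training step that aligns the teacher and student vocabularies is omitted): a trainable matrix maps the student's hidden states into the teacher's hidden dimension so that an MSE distillation loss can be applied layer by layer. The dimensions and the toy single-layer "encoders" are illustrative stand-ins for BERT-sized models.

```python
import torch
import torch.nn as nn

teacher_dim, student_dim, seq_len, batch = 768, 192, 16, 4

teacher_layer = nn.TransformerEncoderLayer(d_model=teacher_dim, nhead=12, batch_first=True)
student_layer = nn.TransformerEncoderLayer(d_model=student_dim, nhead=4, batch_first=True)
proj = nn.Linear(student_dim, teacher_dim, bias=False)    # shared up-projection
down = torch.randn(teacher_dim, student_dim) / teacher_dim ** 0.5  # stand-in student embedding map

opt = torch.optim.Adam(list(student_layer.parameters()) + list(proj.parameters()), lr=1e-4)

for step in range(3):
    token_emb = torch.randn(batch, seq_len, teacher_dim)   # stand-in teacher-side token embeddings
    student_in = token_emb @ down                          # the same tokens in the student's smaller space
    with torch.no_grad():
        teacher_hidden = teacher_layer(token_emb)          # frozen teacher
    student_hidden = student_layer(student_in)
    loss = nn.functional.mse_loss(proj(student_hidden), teacher_hidden)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(step, loss.item())
```

Because only the small projection matrix is shared across layers, the student keeps its reduced embedding and hidden dimensions while still receiving layer-wise guidance from the teacher.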