Generalized linear models (GLMs) form one of the most popular classes of models in statistics. The gamma variant is used, for instance, in actuarial science for the modelling of claim amounts in insurance. A flaw of GLMs is that they are not robust against outliers (i.e., against erroneous or extreme data points). A difference in trends between the bulk of the data and the outliers thus yields skewed inference and predictions. To address this problem, robust methods have been introduced. The most commonly applied robust method is frequentist and consists of an estimator derived from a modification of the derivative of the log-likelihood. We propose an alternative approach which is modelling-based and thus fundamentally different. It allows for an understanding and interpretation of the modelling, and it can be applied for both frequentist and Bayesian statistical analyses. The approach possesses appealing theoretical and empirical properties.
Convergence rates for $L_2$ approximation in a Hilbert space $H$ are a central theme in numerical analysis. The present work is inspired by Schaback (Math. Comp., 1999), who showed, in the context of best pointwise approximation for radial basis function interpolation, that the convergence rate for sufficiently smooth functions can be doubled, compared to the general rate for functions in the "native space" $H$. Motivated by this, we obtain a general result for $H$-orthogonal projection onto a finite dimensional subspace of $H$: namely, that any known $L_2$ convergence rate for all functions in $H$ translates into a doubled $L_2$ convergence rate for functions in a smoother normed space $B$, along with a similarly improved error bound in the $H$-norm, provided that $L_2$, $H$ and $B$ are suitably related. As a special case we improve the known $L_2$ and $H$-norm convergence rates for kernel interpolation in reproducing kernel Hilbert spaces, with particular attention to a recent study (Kaarnioja, Kazashi, Kuo, Nobile, Sloan, Numer. Math., 2022) of periodic kernel-based interpolation at lattice points applied to parametric partial differential equations. A second application is to radial basis function interpolation for general conditionally positive definite basis functions, where again the $L_2$ convergence rate is doubled, and the convergence rate in the native space norm is similarly improved, for all functions in a smoother normed space $B$.
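The doubling mechanism can be sketched in one simple setting. The hypotheses below are illustrative assumptions and not necessarily those of the paper: all spaces are real, $P_n$ denotes the $H$-orthogonal projection onto the subspace $V_n$, the known rate reads $\|u - P_n u\|_{L_2} \le C n^{-r} \|u\|_H$ for all $u \in H$ (with $C$, $r$, $n$ as placeholder symbols), and $f \in B$ means that there is a $g \in L_2$ with $\langle u, f \rangle_H = \langle u, g \rangle_{L_2}$ for all $u \in H$, with $\|f\|_B := \|g\|_{L_2}$. Writing $e = f - P_n f$, so that $P_n e = 0$ and $e$ is $H$-orthogonal to $V_n$, one obtains

$$\|e\|_H^2 = \langle e, f \rangle_H = \langle e, g \rangle_{L_2} \le \|e\|_{L_2} \|g\|_{L_2} \le C n^{-r} \|e\|_H \|f\|_B ,$$

$$\|e\|_H \le C n^{-r} \|f\|_B , \qquad \|e\|_{L_2} \le C n^{-r} \|e\|_H \le C^2 n^{-2r} \|f\|_B .$$

Under these assumptions the $H$-norm error inherits the original $L_2$ rate, and the $L_2$ error converges at twice that rate.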
Deep learning enables the modelling of high-resolution histopathology whole-slide images (WSI). Weakly supervised learning of tile-level data is typically applied for tasks where labels only exist on the patient or WSI level (e.g. patient outcomes or histological grading). In this context, there is a need for improved spatial interpretability of predictions from such models. We propose a novel method, Wsi rEgion sElection aPproach (WEEP), for model interpretation. It provides a principled yet straightforward way to establish the spatial area of WSI required for assigning a particular prediction label. We demonstrate WEEP on a binary classification task in the area of breast cancer computational pathology. WEEP is easy to implement, is directly connected to the model-based decision process, and offers information relevant to both research and diagnostic applications.
Decision making and learning in the presence of uncertainty have attracted significant attention in view of the increasing need to achieve robust and reliable operations. This need becomes more prominent when the uncertainty stems from adversarial attacks. In this paper we focus on linear and non-linear classification problems and propose a novel adversarial training method for robust classifiers, inspired by Support Vector Machine (SVM) margins. We view robustness under a data-driven lens, and derive finite sample complexity bounds for both linear and non-linear classifiers in binary and multi-class scenarios. Notably, our bounds match natural classifiers' complexity. Our algorithm minimizes a worst-case surrogate loss using Linear Programming (LP) and Second Order Cone Programming (SOCP) for linear and non-linear models. Numerical experiments on the benchmark MNIST and CIFAR10 datasets show that our approach performs comparably to state-of-the-art methods, without needing adversarial examples during training. Our work offers a comprehensive framework for enhancing binary linear and non-linear classifier robustness, embedding robustness in learning under the presence of adversaries.
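To illustrate what a worst-case surrogate loss can look like for a linear classifier, the following minimal sketch assumes a hinge surrogate and perturbations bounded in the $\ell_\infty$ norm by a radius `eps`. Both choices, and all function names, are assumptions made for illustration; this is not the paper's algorithm. Under these assumptions the worst case has a closed form through the dual ($\ell_1$) norm of the weights, and the sketch checks it against a brute-force search over the corners of the perturbation box.

```python
from itertools import product


def worst_case_hinge(w, b, x, y, eps):
    """Closed-form hinge loss under the worst l_inf perturbation of radius eps.

    Assumes a linear score w.x + b and a label y in {-1, +1}.
    The worst perturbation lowers the margin by eps * ||w||_1.
    """
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    l1_norm = sum(abs(wi) for wi in w)
    return max(0.0, 1.0 - margin + eps * l1_norm)


def brute_force_worst_case(w, b, x, y, eps):
    """Same quantity by enumerating the corners of the l_inf ball.

    The hinge loss is convex in the perturbation, so its maximum over
    the box is attained at a corner. Exponential in len(x); for checks only.
    """
    worst = 0.0
    for signs in product((-1.0, 1.0), repeat=len(x)):
        x_adv = [xi + eps * s for xi, s in zip(x, signs)]
        margin = y * (sum(wi * xi for wi, xi in zip(w, x_adv)) + b)
        worst = max(worst, max(0.0, 1.0 - margin))
    return worst


if __name__ == "__main__":
    w, b, x, y, eps = [2.0, -1.0], 0.5, [1.0, 1.0], 1, 0.25
    print(worst_case_hinge(w, b, x, y, eps))
    print(brute_force_worst_case(w, b, x, y, eps))
```

Because the closed form is piecewise linear in the weights and the bias, minimizing its sum over a training set can be written as a linear program, which is one reason LP formulations arise naturally for robust linear classifiers.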
We identify reduced order models (ROM) of forced systems from data using invariant foliations. The forcing can be external, parametric, periodic or quasi-periodic. The process has four steps: 1. identify an approximate invariant torus and the linear dynamics about the torus; 2. identify a globally defined invariant foliation about the torus; 3. identify a local foliation about an invariant manifold that complements the global foliation; 4. extract the invariant manifold as the leaf going through the torus and interpret the result. We combine steps 2 and 3, so that we can track the location of the invariant torus and scale the invariance equations appropriately. We highlight some fundamental limitations of invariant manifolds and foliations when fitting them to data, which require further mathematics to resolve.
We present a method for end-to-end learning of Koopman surrogate models for optimal performance in control. In contrast to previous contributions that employ standard reinforcement learning (RL) algorithms, we use a training algorithm that exploits the potential differentiability of environments based on mechanistic simulation models. We evaluate the performance of our method by comparing it to that of other combinations of controller type and training algorithm on an eNMPC case study known from the literature. Our method exhibits superior performance on this problem, thereby constituting a promising avenue towards more capable controllers that employ dynamic surrogate models.
The aim of this paper is to study the complexity of the model checking problem MC for inquisitive propositional logic InqB and for inquisitive modal logic InqM, that is, the problem of deciding whether a given finite structure for the logic satisfies a given formula. In recent years, this problem has been thoroughly investigated for several variations of dependence and team logics, systems closely related to inquisitive logic. Building upon some ideas presented by Yang, we prove that the model checking problems for InqB and InqM are both AP-complete.
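To make the model checking problem concrete for InqB, the following minimal sketch evaluates the standard support relation between an information state (a set of worlds) and a formula. The tuple encoding of formulas, the dictionary `valuation`, and the function names are assumptions chosen for illustration. The implication clause enumerates all subsets of the state, so this is a naive exponential-time procedure, not the algorithm behind the complexity result, and it does not cover InqM.

```python
from itertools import combinations


def subsets(state):
    """Yield every subset of an information state as a frozenset."""
    worlds = list(state)
    for size in range(len(worlds) + 1):
        for combo in combinations(worlds, size):
            yield frozenset(combo)


def supports(valuation, state, formula):
    """Decide whether `state` supports `formula` in InqB.

    valuation: dict mapping each world to the set of atoms true there.
    formula:   ("atom", p) | ("bot",) | ("and", a, b)
               | ("idisj", a, b) | ("imp", a, b)
    """
    tag = formula[0]
    if tag == "atom":
        return all(formula[1] in valuation[w] for w in state)
    if tag == "bot":
        return len(state) == 0
    if tag == "and":
        return (supports(valuation, state, formula[1])
                and supports(valuation, state, formula[2]))
    if tag == "idisj":  # inquisitive disjunction
        return (supports(valuation, state, formula[1])
                or supports(valuation, state, formula[2]))
    if tag == "imp":  # quantifies over all substates
        return all(supports(valuation, t, formula[2])
                   for t in subsets(state)
                   if supports(valuation, t, formula[1]))
    raise ValueError(f"unknown connective: {tag!r}")


def neg(a):
    """Negation as implication to falsum."""
    return ("imp", a, ("bot",))


if __name__ == "__main__":
    valuation = {"w1": {"p"}, "w2": set()}
    p = ("atom", "p")
    whether_p = ("idisj", p, neg(p))  # the polar question ?p
    print(supports(valuation, frozenset({"w1", "w2"}), whether_p))
    print(supports(valuation, frozenset({"w1"}), whether_p))
```

In the example, the state containing both worlds does not settle whether `p` holds, so it fails to support the question, whereas each singleton state does.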
In many application settings, the data have missing entries which make analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and testing data. We show the consistency of two approaches in prediction. A striking result is that the widely-used method of imputing with a constant, such as the mean, prior to learning is consistent when missing values are not informative. This contrasts with inferential settings where mean imputation is criticized for distorting the distribution of the data. That such a simple approach can be consistent is important in practice. We also show that a predictor suited for complete observations can predict optimally on incomplete data, through multiple imputation. Finally, to compare imputation with learning directly with a model that accounts for missing values, we further analyze decision trees. These can naturally tackle empirical risk minimization with missing values, due to their ability to handle the half-discrete nature of incomplete variables. After comparing different missing-value strategies in trees, both theoretically and empirically, we recommend using the "missing incorporated in attribute" method as it can handle both non-informative and informative missing values.
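The two strategies singled out above can be sketched as simple preprocessing steps. This is a minimal illustration under stated assumptions: rows are lists of floats with `None` marking a missing entry, the function names are hypothetical, and the "missing incorporated in attribute" behaviour is emulated by a column-duplication encoding (each column is copied, with missing entries set to a very small value in one copy and a very large value in the other), so that an ordinary threshold split can send missing values to either side.

```python
def fit_column_means(train_rows):
    """Per-column mean of the observed entries, computed on training data only."""
    n_cols = len(train_rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in train_rows if row[j] is not None]
        # Fall back to 0.0 if a column is entirely missing (arbitrary choice).
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def impute_constant(rows, constants):
    """Replace each missing entry by the constant for its column."""
    return [[constants[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def mia_encode(rows, big=1e10):
    """Duplicate every column; missing becomes -big in one copy, +big in the other.

    `big` is assumed to lie outside the range of the observed data.
    """
    return [[c for v in row for c in ((-big, big) if v is None else (v, v))]
            for row in rows]


if __name__ == "__main__":
    train = [[1.0, None], [3.0, 4.0]]
    test = [[None, 2.0]]
    means = fit_column_means(train)
    print(impute_constant(test, means))
    print(mia_encode(test, big=100.0))
```

The same constants are applied to training and test rows, matching the setting in which missing values appear in both. The encoded rows can be passed to any tree learner that splits on thresholds.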
Lattices are architected metamaterials whose properties strongly depend on their geometrical design. The analogy between lattices and graphs enables the use of graph neural networks (GNNs) as a faster surrogate model compared to traditional methods such as finite element modelling. In this work, we generate a large dataset of structure-property relationships for strut-based lattices. The dataset is made available to the community and can fuel the development of methods anchored in physical principles for the fitting of fourth-order tensors. In addition, we present a higher-order GNN model trained on this dataset. The key features of the model are (i) SE(3) equivariance, and (ii) consistency with the thermodynamic law of conservation of energy. We compare the model to non-equivariant models based on a number of error metrics and demonstrate its benefits in terms of predictive performance and reduced training requirements. Finally, we demonstrate an example application of the model to an architected material design task. The methods which we developed are applicable to fourth-order tensors beyond elasticity, such as the piezo-optical tensor.
Ever since the seminal work of R. A. Fisher and F. Yates, factorial designs have been an important experimental tool to simultaneously estimate the effects of multiple treatment factors. In factorial designs, the number of treatment combinations grows exponentially with the number of treatment factors, which motivates the forward selection strategy based on the sparsity, hierarchy, and heredity principles for factorial effects. Although this strategy is intuitive and has been widely used in practice, its rigorous statistical theory has not been formally established. To fill this gap, we establish design-based theory for forward factor selection in factorial designs based on the potential outcome framework. We not only prove a consistency property for the factor selection procedure but also discuss statistical inference after factor selection. In particular, with selection consistency, we quantify the advantages of forward selection based on asymptotic efficiency gain in estimating factorial effects. With inconsistent selection in higher-order interactions, we propose two strategies and investigate their impact on subsequent inference. Our formulation differs from the existing literature on variable selection and post-selection inference because our theory is based solely on the physical randomization of the factorial design and does not rely on a correctly specified outcome model.
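The exponential growth of treatment combinations, and the contrast form of factorial effects, can be illustrated for two-level factors. This is a minimal sketch under assumptions: levels are coded as -1 and +1, `mean_outcome` is a hypothetical dictionary of average outcomes per treatment combination, and effects use the common contrast definition with divisor $2^{K-1}$. It does not implement the forward selection procedure or its design-based inference.

```python
from itertools import product


def treatment_combinations(k):
    """All 2**k combinations of k two-level factors coded as -1/+1."""
    return list(product((-1, 1), repeat=k))


def factorial_effect(mean_outcome, k, factors):
    """Contrast estimate of the effect indexed by `factors`.

    mean_outcome: dict mapping each treatment combination to its mean outcome.
    factors:      tuple of factor indices, e.g. (0,) for a main effect
                  or (0, 1) for a two-way interaction.
    """
    total = 0.0
    for z in treatment_combinations(k):
        sign = 1
        for j in factors:
            sign *= z[j]
        total += sign * mean_outcome[z]
    return total / 2 ** (k - 1)


if __name__ == "__main__":
    k = 2
    # Hypothetical mean outcomes with a main effect of factor 0
    # and an interaction between factors 0 and 1.
    mean_outcome = {z: 1 + 2 * z[0] + 0.5 * z[0] * z[1]
                    for z in treatment_combinations(k)}
    for factors in [(0,), (1,), (0, 1)]:
        print(factors, factorial_effect(mean_outcome, k, factors))
```

With $K$ factors there are $2^K - 1$ such effects, which is why a selection strategy that exploits sparsity, hierarchy, and heredity becomes attractive as $K$ grows.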
Most state-of-the-art machine learning techniques revolve around the optimisation of loss functions. Defining appropriate loss functions is therefore critical to successfully solving problems in this field. We present a survey of the most commonly used loss functions for a wide range of different applications, divided into classification, regression, ranking, sample generation and energy based modelling. Overall, we introduce 33 different loss functions and we organise them into an intuitive taxonomy. Each loss function is given a theoretical backing and we describe where it is best used. This survey aims to provide a reference of the most essential loss functions for both beginner and advanced machine learning practitioners.