Species distribution models (SDMs) are key tools in ecology, conservation and management of natural resources. They are commonly trained with scientific survey data but, since surveys are expensive, there is a need for complementary sources of information to train them. To this end, several authors have proposed to use expert elicitation, since local citizens and subject-area experts can hold valuable information on species distributions. Expert knowledge has been incorporated within SDMs, for example, through informative priors. However, existing approaches pose challenges related to assessing the reliability of the experts. Since expert knowledge is inherently subjective and prone to biases, we should optimally calibrate experts' assessments and make inference on their reliability. Moreover, demonstrated examples of improved species distribution predictions from expert elicitation, compared to using survey data alone, remain few. In this work, we propose a novel approach to incorporating expert knowledge on species distribution within SDMs and demonstrate that it leads to significantly better predictions. First, we propose an expert elicitation process in which experts summarize their beliefs about species occurrence probability with maps. Second, we collect survey data to calibrate the expert assessments. Third, we propose a hierarchical Bayesian model that combines the two information sources and can be used to make predictions over the study area. We apply our methods to study the distribution of spring-spawning pikeperch larvae in a coastal area of the Gulf of Finland. According to our results, the expert information significantly improves species distribution predictions compared to predictions conditioned on survey data only. However, experts' reliability also varies considerably, and even generally reliable experts had spatially structured biases in their assessments.
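A minimal sketch of the kind of calibration the abstract describes, assuming experts supply per-site occurrence probabilities and the survey yields binary detections; expert-specific weights play the role of reliability parameters. The simulated data, logistic pooling form, and MAP fit are illustrative stand-ins, not the paper's actual hierarchical model.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, logit

# Illustrative setup: n survey sites, K experts who each provided an occurrence-probability map.
rng = np.random.default_rng(0)
n, K = 200, 3
expert_probs = rng.uniform(0.05, 0.95, size=(n, K))   # experts' assessed probabilities at the survey sites
y = rng.binomial(1, expit(logit(expert_probs[:, 0]) + 0.5 * rng.normal(size=n)))  # simulated survey detections

def neg_log_posterior(params):
    """Logistic pooling of expert logits; the weights act as per-expert reliability parameters."""
    alpha, w = params[0], params[1:]
    p = expit(alpha + logit(expert_probs) @ w)
    loglik = np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    logprior = -0.5 * np.sum(params**2)                # weak Gaussian prior (MAP point estimate, not full Bayes)
    return -(loglik + logprior)

fit = minimize(neg_log_posterior, x0=np.zeros(K + 1))
print("intercept:", fit.x[0], "expert weights (reliability):", fit.x[1:])
```

In a full analysis the weights would receive posterior distributions (and, as the abstract notes, spatially structured bias terms), rather than the point estimates shown here.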
There are many issues that can cause problems when attempting to infer model parameters from data. Data and models are both imperfect, and as such there are multiple scenarios in which standard methods of inference will lead to misleading conclusions: corrupted data, models that are only representative of subsets of the data, or multiple regions in which the model is best fit using different parameters. Methods exist for the exclusion of some anomalous types of data, but in practice, data cleaning is often undertaken by hand before attempting to fit models to data. In this work, we introduce the concept of Bayesian data selection: the simultaneous inference of both model parameters and parameters representing our belief that each observation within the data should be included in the inference. The aim, within a Bayesian setting, is to find the regions of observation space for which the model can well represent the data, and to find the corresponding model parameters for those regions. A number of approaches are explored and applied to test problems in linear regression, and to the problem of fitting an ODE model, approximated by a finite difference method, to data. The approaches are extremely simple to implement, can aid mixing of Markov chains designed to sample from the arising densities, and are very broadly applicable to the majority of inferential problems. As such, this approach has the potential to change the way that we conduct and interpret the fitting of models to data.
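One way to make the idea concrete is the toy Gibbs sampler below for straight-line regression, in which each observation carries a binary inclusion indicator, so the posterior jointly covers the regression parameters and the belief that each point belongs to the model. The mixture-with-uniform-background form, known noise level, and all priors are illustrative choices, not necessarily those used in the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Toy data: a straight line with a handful of corrupted observations.
n = 100
x = np.linspace(0, 1, n)
y = 2.0 + 3.0 * x + 0.1 * rng.normal(size=n)
y[::10] += rng.uniform(-5, 5, size=10)            # corrupted points the selection step should learn to exclude

X = np.column_stack([np.ones(n), x])
sigma, bg_density, pi_incl = 0.1, 0.1, 0.9         # known noise sd, flat "background" density, prior inclusion prob
beta, z = np.zeros(2), np.ones(n, dtype=bool)

for it in range(2000):
    # 1) Update the regression coefficients from the currently included points (conjugate normal, vague prior).
    Xz, yz = X[z], y[z]
    V = np.linalg.inv(Xz.T @ Xz / sigma**2 + np.eye(2) / 100.0)
    m = V @ (Xz.T @ yz / sigma**2)
    beta = rng.multivariate_normal(m, V)
    # 2) Update each inclusion indicator given the current fit.
    lik_in = pi_incl * norm.pdf(y, X @ beta, sigma)
    lik_out = (1 - pi_incl) * bg_density
    z = rng.uniform(size=n) < lik_in / (lik_in + lik_out)

print("posterior draw of beta:", beta, "| points kept:", z.sum())
```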
Agent-based models (ABMs) are computational scenario generators that can be used to predict the possible future outcomes of the complex systems they represent. To better understand the robustness of these predictions, it is necessary to understand the full scope of the possible phenomena the model can generate. Most often, due to high-dimensional parameter spaces, this is a computationally expensive task. Inspired by ideas coming from systems biology, we show that for multiple macroeconomic models, including an agent-based model and several Dynamic Stochastic General Equilibrium (DSGE) models, there are only a few stiff parameter combinations that have strong effects, while the remaining, sloppy, directions are largely irrelevant. This suggests an algorithm that efficiently explores the space of parameters by primarily moving along the stiff directions. We apply our algorithm to a medium-sized agent-based model and show that it recovers all possible dynamics of the unemployment rate. The application of this method to agent-based models may lead to a more thorough and robust understanding of their features, and provide enhanced parameter sensitivity analyses. Several promising paths for future research are discussed.
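A minimal numpy sketch of the stiff/sloppy idea, assuming we can evaluate a scalar fit surface over the parameters: estimate the local Hessian, read off stiff and sloppy directions from its spectrum, and take larger exploratory steps along the stiff eigenvectors. The quadratic test surface and the step-scaling rule are illustrative stand-ins for the macroeconomic models and the algorithm described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(2)

def loss(theta):
    """Stand-in objective with one stiff and two sloppy directions (a real ABM/DSGE fit would go here)."""
    H_true = np.diag([100.0, 1.0, 0.01])
    return 0.5 * theta @ H_true @ theta

def numerical_hessian(f, theta, eps=1e-4):
    d = len(theta)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i, e_j = np.eye(d)[i] * eps, np.eye(d)[j] * eps
            H[i, j] = (f(theta + e_i + e_j) - f(theta + e_i - e_j)
                       - f(theta - e_i + e_j) + f(theta - e_i - e_j)) / (4 * eps**2)
    return H

theta0 = np.zeros(3)
eigval, eigvec = np.linalg.eigh(numerical_hessian(loss, theta0))
print("sloppy-to-stiff spectrum:", eigval)

# Explore preferentially along stiff directions: per-direction step size tied to its share of the curvature.
step_scale = eigval / eigval.sum()
proposals = [theta0 + eigvec @ (step_scale * rng.normal(size=3)) for _ in range(1000)]
```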
Many modern online 3D applications and videogames rely on parametric models of human faces to create believable avatars. However, manually reproducing someone's facial likeness with a parametric model is difficult and time-consuming. A machine learning solution to this task is highly desirable but also challenging. This paper proposes a novel approach to the so-called Face-to-Parameters (F2P) problem, aiming to reconstruct a parametric face from a single image. The proposed method uses synthetic data, domain decomposition, and domain adaptation to address the multifaceted challenges of solving the F2P problem. The open-sourced codebase illustrates our key observations and provides means for quantitative evaluation. The presented approach proves practical in an industrial application; it improves accuracy and allows for more efficient model training. The techniques have the potential to extend to other types of parametric models.
Markov chain Monte Carlo (MCMC) is an all-purpose tool that allows one to generate dependent replicates from a posterior distribution for effectively any Bayesian hierarchical model. As such, MCMC has become a standard in Bayesian statistics. However, convergence issues, tuning, and the effective sample size of the MCMC are nontrivial considerations that are often overlooked or can be difficult to assess. Moreover, these practical issues can produce a significant computational burden. This motivates us to consider finding closed-form expressions of the posterior distribution that are computationally straightforward to sample from directly. We focus on a broad class of Bayesian generalized linear mixed-effects models (GLMMs) that allows one to jointly model data of different types (e.g., Gaussian, Poisson, and binomial distributed observations). Exact sampling from the posterior distribution of a Bayesian GLMM is so difficult a problem that it is now arguably overlooked as one worth attempting to solve. To solve this problem, we derive a new class of distributions that gives one the flexibility to specify the prior on fixed and random effects as any conjugate multivariate distribution. We refer to this new distribution as the generalized conjugate multivariate (GCM) distribution, and several technical results are provided. The expression of the exact posterior distribution is given along with the steps to obtain direct independent simulations from the posterior distribution. These direct simulations have an efficient projection/regression form, and hence, we refer to our method as Exact Posterior Regression (EPR). Several theoretical results are developed that create the foundation for EPR. Illustrative examples are provided, including a simulation study and an analysis of estimates from the U.S. Census Bureau's American Community Survey (ACS).
We showcase a variety of functions and classes that implement sampling procedures with improved exploration of the parameter space assisted by machine learning. Special attention is paid to setting sane defaults so that the adjustments required by different problems remain minimal. This collection of routines can be employed for different types of analysis, from finding bounds on the parameter space to accumulating samples in regions of interest. In particular, we discuss two methods that incorporate different machine learning models: regression and classification. We show that a machine learning classifier can provide higher efficiency for exploring the parameter space. We also introduce a boosting technique to improve the slow convergence at the start of the process. The use of these routines is best explained through a few examples that illustrate the type of results one can obtain, together with the code used to produce them and descriptions of the adjustments that can be made to adapt the calculation to other problems. We conclude by showing the impact of these techniques when exploring the parameter space of the two-Higgs-doublet model constrained to match the measured Higgs boson signal strength. The code used for this paper and instructions on how to use it are available on the web.
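A sketch of the classification-assisted loop this kind of sampling relies on, assuming a black-box test that labels parameter points as allowed or excluded: the classifier is retrained as points accumulate and is used to decide where the expensive test is worth running. The toy target, batch sizes, and RandomForestClassifier choice are illustrative and do not reflect the package's actual interface.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)

def is_allowed(theta):
    """Placeholder for an expensive check, e.g. whether a parameter point matches the measured signal strength."""
    return int(np.linalg.norm(theta) < 0.5)

dim, pool_size, keep = 2, 5000, 200
X = rng.uniform(-1, 1, size=(100, dim))                      # small random seed sample
y = np.array([is_allowed(t) for t in X])

for _ in range(10):
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    candidates = rng.uniform(-1, 1, size=(pool_size, dim))
    # Run the expensive test only where the classifier predicts a good chance of being allowed.
    scores = clf.predict_proba(candidates)[:, -1]
    picked = candidates[np.argsort(scores)[-keep:]]
    X = np.vstack([X, picked])
    y = np.concatenate([y, [is_allowed(t) for t in picked]])

print("allowed points accumulated:", int(y.sum()), "of", len(y))
```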
We provide a decision-theoretic analysis of bandit experiments. Working within the framework of diffusion asymptotics, we define suitable notions of asymptotic Bayes and minimax risk for these experiments. For normally distributed rewards, the minimal Bayes risk can be characterized as the solution to a second-order partial differential equation (PDE). Using a limit of experiments approach, we show that this PDE characterization also holds asymptotically under both parametric and non-parametric distributions of the rewards. The approach further identifies the state variables to which it is asymptotically sufficient to restrict attention, and thereby suggests a practical strategy for dimension reduction. The PDEs characterizing minimal Bayes risk can be solved efficiently using sparse matrix routines, and we derive the optimal Bayes and minimax policies from their numerical solutions. These optimal policies substantially dominate existing methods such as Thompson sampling and UCB, often by a factor of two. The framework also covers time discounting and pure exploration.
When the available hardware cannot meet the memory and compute requirements to efficiently train high-performing machine learning models, a compromise in either the training quality or the model complexity is needed. In Federated Learning (FL), nodes are orders of magnitude more constrained than traditional server-grade hardware and are often battery powered, severely limiting the sophistication of models that can be trained under this paradigm. While most research has focused on designing better aggregation strategies to improve convergence rates and on alleviating the communication costs of FL, fewer efforts have been devoted to accelerating on-device training. This stage, which repeats hundreds of times (i.e., once per round) and can involve thousands of devices, accounts for the majority of the time required to train federated models and for essentially all of the energy consumed on the client side. In this work, we present the first study on the unique aspects that arise when introducing sparsity at training time in FL workloads. We then propose ZeroFL, a framework that relies on highly sparse operations to accelerate on-device training. Models trained with ZeroFL and 95% sparsity achieve up to 2.3% higher accuracy compared to competitive baselines obtained from adapting a state-of-the-art sparse training framework to the FL setting.
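Not ZeroFL itself, but a minimal PyTorch sketch of the basic ingredient it builds on: keeping only the largest-magnitude fraction of weights during a client's local training so that most operations act on sparse tensors. The model, sparsity level, and the point at which the mask is reapplied are illustrative assumptions.

```python
import torch
import torch.nn as nn

def topk_mask(tensor, sparsity=0.95):
    """Boolean mask keeping only the largest-magnitude (1 - sparsity) fraction of entries."""
    k = max(1, int(tensor.numel() * (1 - sparsity)))
    threshold = tensor.abs().flatten().topk(k).values.min()
    return tensor.abs() >= threshold

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# One simulated local round on a client, with random stand-in data.
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
for _ in range(10):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:                        # sparsify weight matrices, leave biases dense
                p.mul_(topk_mask(p, sparsity=0.95))
```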
The labels used to train machine learning (ML) models are of paramount importance. Typically for ML classification tasks, datasets contain hard labels, yet learning using soft labels has been shown to yield benefits for model generalization, robustness, and calibration. Earlier work found success in forming soft labels from multiple annotators' hard labels; however, this approach may not converge to the best labels and necessitates many annotators, which can be expensive and inefficient. We focus on efficiently eliciting soft labels from individual annotators. We collect and release a dataset of soft labels for CIFAR-10 via a crowdsourcing study ($N=248$). We demonstrate that learning with our labels achieves comparable model performance to prior approaches while requiring far fewer annotators. Our elicitation methodology therefore shows promise towards enabling practitioners to enjoy the benefits of improved model performance and reliability with fewer annotators, and serves as a guide for future dataset curators on the benefits of leveraging richer information, such as categorical uncertainty, from individual annotators.
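A small sketch of the downstream use of such labels, assuming each example comes with a full probability vector from an annotator rather than a single class index; the loss shown is the standard soft-target cross-entropy, not a method specific to the paper.

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, soft_targets):
    """Cross-entropy against a full probability vector instead of a hard class index."""
    return -(soft_targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

# Stand-in batch: 8 examples, 10 classes (e.g. CIFAR-10), annotator-elicited probabilities as targets.
logits = torch.randn(8, 10, requires_grad=True)
soft_targets = torch.softmax(torch.randn(8, 10), dim=-1)

loss = soft_cross_entropy(logits, soft_targets)
loss.backward()
print(loss.item())
```

Recent versions of PyTorch also accept class-probability targets directly in `F.cross_entropy`, which can replace the helper above.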
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction-based question answering, or machine translation). However, it builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and to propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviates catastrophic forgetting during adaptation.
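For context on the distributionally robust optimization framework mentioned above, the sketch below shows a standard group-DRO-style objective that upweights the worst-performing groups; it is a generic baseline, not the parametric reformulation developed in the thesis, and the group structure and temperature are illustrative assumptions.

```python
import torch

def group_dro_loss(per_example_losses, group_ids, num_groups, temperature=1.0):
    """Softmax-weighted average of per-group mean losses, putting most weight on the worst group."""
    group_losses = torch.stack([
        per_example_losses[group_ids == g].mean() if (group_ids == g).any()
        else per_example_losses.new_zeros(())
        for g in range(num_groups)
    ])
    weights = torch.softmax(group_losses.detach() / temperature, dim=0)   # robust weights, no gradient through them
    return (weights * group_losses).sum()

# Stand-in: 16 examples drawn from 3 domains (e.g. different text genres).
losses = torch.rand(16, requires_grad=True)
domains = torch.randint(0, 3, (16,))
robust_loss = group_dro_loss(losses, domains, num_groups=3)
robust_loss.backward()
```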
Graph Neural Networks (GNNs), which generalize deep neural networks to graph-structured data, have drawn considerable attention and achieved state-of-the-art performance in numerous graph-related tasks. However, existing GNN models mainly focus on designing graph convolution operations. The graph pooling (or downsampling) operations, which play an important role in learning hierarchical representations, are usually overlooked. In this paper, we propose a novel graph pooling operator, called Hierarchical Graph Pooling with Structure Learning (HGP-SL), which can be integrated into various graph neural network architectures. HGP-SL incorporates graph pooling and structure learning into a unified module to generate hierarchical representations of graphs. More specifically, the graph pooling operation adaptively selects a subset of nodes to form an induced subgraph for the subsequent layers. To preserve the integrity of the graph's topological information, we further introduce a structure learning mechanism to learn a refined graph structure for the pooled graph at each layer. By combining the HGP-SL operator with graph neural networks, we perform graph-level representation learning with a focus on the graph classification task. Experimental results on six widely used benchmarks demonstrate the effectiveness of our proposed model.
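A bare-bones sketch of the node-selection step such a pooling operator performs, using dense adjacency for clarity; the scoring projection, the top-k rule, and the omission of HGP-SL's structure-learning refinement are all simplifications of what the paper actually proposes.

```python
import torch
import torch.nn as nn

class TopKPool(nn.Module):
    """Score nodes, keep the top-k, and return the induced subgraph (dense adjacency for clarity)."""
    def __init__(self, in_dim, ratio=0.5):
        super().__init__()
        self.score = nn.Linear(in_dim, 1)
        self.ratio = ratio

    def forward(self, x, adj):
        scores = self.score(x).squeeze(-1)                              # one scalar score per node
        k = max(1, int(self.ratio * x.size(0)))
        keep = scores.topk(k).indices
        x_pooled = x[keep] * torch.sigmoid(scores[keep]).unsqueeze(-1)  # gate kept features by their scores
        adj_pooled = adj[keep][:, keep]                                 # induced subgraph among kept nodes
        return x_pooled, adj_pooled

x = torch.randn(10, 16)            # 10 nodes, 16 features
adj = (torch.rand(10, 10) > 0.7).float()
pool = TopKPool(in_dim=16)
x_p, adj_p = pool(x, adj)
print(x_p.shape, adj_p.shape)      # e.g. torch.Size([5, 16]) torch.Size([5, 5])
```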