Visualization is an essential operation when assessing the risk of rare events such as coastal or river floodings. The goal is to display a few prototype events that best represent the probability law of the observed phenomenon, a task known as quantization. It becomes a challenge when data is expensive to generate and critical events are scarce, like extreme natural hazard. In the case of floodings, each event relies on an expensive-to-evaluate hydraulic simulator which takes as inputs offshore meteo-oceanic conditions and dyke breach parameters to compute the water level map. In this article, Lloyd's algorithm, which classically serves to quantize data, is adapted to the context of rare and costly-to-observe events. Low probability is treated through importance sampling, while Functional Principal Component Analysis combined with a Gaussian process deal with the costly hydraulic simulations. The calculated prototype maps represent the probability distribution of the flooding events in a minimal expected distance sense, and each is associated to a probability mass. The method is first validated using a 2D analytical model and then applied to a real coastal flooding scenario. The two sources of error, the metamodel and the importance sampling, are evaluated to quantify the precision of the method.
This paper studies the problem of learning an unknown function $f$ from given data about $f$. The learning problem is to give an approximation $\hat f$ to $f$ that predicts the values of $f$ away from the data. There are numerous settings for this learning problem depending on (i) what additional information we have about $f$ (known as a model class assumption), (ii) how we measure the accuracy of how well $\hat f$ predicts $f$, (iii) what is known about the data and data sites, (iv) whether the data observations are polluted by noise. A mathematical description of the optimal performance possible (the smallest possible error of recovery) is known in the presence of a model class assumption. Under standard model class assumptions, it is shown in this paper that a near optimal $\hat f$ can be found by solving a certain discrete over-parameterized optimization problem with a penalty term. Here, near optimal means that the error is bounded by a fixed constant times the optimal error. This explains the advantage of over-parameterization which is commonly used in modern machine learning. The main results of this paper prove that over-parameterized learning with an appropriate loss function gives a near optimal approximation $\hat f$ of the function $f$ from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near optimal recovery of $f$. An extension of these results to the case where the data is polluted by additive deterministic noise is also given.
Predictive black-box models can exhibit high accuracy but their opaque nature hinders their uptake in safety-critical deployment environments. Explanation methods (XAI) can provide confidence for decision-making through increased transparency. However, existing XAI methods are not tailored towards models in sensitive domains where one predictor is of special interest, such as a treatment effect in a clinical model, or ethnicity in policy models. We introduce Path-Wise Shapley effects (PWSHAP), a framework for assessing the targeted effect of a binary (e.g.~treatment) variable from a complex outcome model. Our approach augments the predictive model with a user-defined directed acyclic graph (DAG). The method then uses the graph alongside on-manifold Shapley values to identify effects along causal pathways whilst maintaining robustness to adversarial attacks. We establish error bounds for the identified path-wise Shapley effects and for Shapley values. We show PWSHAP can perform local bias and mediation analyses with faithfulness to the model. Further, if the targeted variable is randomised we can quantify local effect modification. We demonstrate the resolution, interpretability, and true locality of our approach on examples and a real-world experiment.
Modeling data obtained from dynamical systems has gained attention in recent years as a challenging task for machine learning models. Previous approaches assume the measurements to be distributed on a grid. However, for real-world applications like weather prediction, the observations are taken from arbitrary locations within the spatial domain. In this paper, we propose TaylorPDENet - a novel machine learning method that is designed to overcome this challenge. Our algorithm uses the multidimensional Taylor expansion of a dynamical system at each observation point to estimate the spatial derivatives to perform predictions. TaylorPDENet is able to accomplish two objectives simultaneously: accurately forecast the evolution of a complex dynamical system and explicitly reconstruct the underlying differential equation describing the system. We evaluate our model on a variety of advection-diffusion equations with different parameters and show that it performs similarly to equivalent approaches on grid-structured data while being able to process unstructured data as well.
An N-of-1 trial is a multiple crossover trial conducted in a single individual to provide evidence to directly inform personalized treatment decisions. Advancements in wearable devices greatly improved the feasibility of adopting these trials to identify optimal individual treatment plans, particularly when treatments differ among individuals and responses are highly heterogeneous. Our work was motivated by the I-STOP-AFib Study, which examined the impact of different triggers on atrial fibrillation (AF) occurrence. We described a causal framework for 'N-of-1' trial using potential treatment selection paths and potential outcome paths. Two estimands of individual causal effect were defined:(a) the effect of continuous exposure, and (b) the effect of an individual observed behavior. We addressed three challenges: (a) imperfect compliance to the randomized treatment assignment; (b) binary treatments and binary outcomes which led to the 'non-collapsibility' issue of estimating odds ratios; and (c) serial inference in the longitudinal observations. We adopted the Bayesian IV approach where the study randomization was the IV as it impacted the choice of exposure of a subject but not directly the outcome. Estimations were through a system of two parametric Bayesian models to estimate the individual causal effect. Our model got around the non-collapsibility and non-consistency by modeling the confounding mechanism through latent structural models and by inferring with Bayesian posterior of functionals. Autocorrelation present in the repeated measurements was also accounted for. The simulation study showed our method largely reduced bias and greatly improved the coverage of the estimated causal effect, compared to existing methods (ITT, PP, and AT). We applied the method to I-STOP-AFib Study to estimate the individual effect of alcohol on AF occurrence.
We study the problem of certifying the robustness of Bayesian neural networks (BNNs) to adversarial input perturbations. Given a compact set of input points $T \subseteq \mathbb{R}^m$ and a set of output points $S \subseteq \mathbb{R}^n$, we define two notions of robustness for BNNs in an adversarial setting: probabilistic robustness and decision robustness. Probabilistic robustness is the probability that for all points in $T$ the output of a BNN sampled from the posterior is in $S$. On the other hand, decision robustness considers the optimal decision of a BNN and checks if for all points in $T$ the optimal decision of the BNN for a given loss function lies within the output set $S$. Although exact computation of these robustness properties is challenging due to the probabilistic and non-convex nature of BNNs, we present a unified computational framework for efficiently and formally bounding them. Our approach is based on weight interval sampling, integration, and bound propagation techniques, and can be applied to BNNs with a large number of parameters, and independently of the (approximate) inference method employed to train the BNN. We evaluate the effectiveness of our methods on various regression and classification tasks, including an industrial regression benchmark, MNIST, traffic sign recognition, and airborne collision avoidance, and demonstrate that our approach enables certification of robustness and uncertainty of BNN predictions.
Simultaneous localisation and mapping (SLAM) play a vital role in autonomous robotics. Robotic platforms are often resource-constrained, and this limitation motivates resource-efficient SLAM implementations. While sparse visual SLAM algorithms offer good accuracy for modest hardware requirements, even these more scalable sparse approaches face limitations when applied to large-scale and long-term scenarios. A contributing factor is that the point clouds resulting from SLAM are inefficient to use and contain significant redundancy. This paper proposes the use of subset selection algorithms to reduce the map produced by sparse visual SLAM algorithms. Information-theoretic techniques have been applied to simpler related problems before, but they do not scale if applied to the full visual SLAM problem. This paper proposes a number of novel information\hyp{}theoretic utility functions for map point selection and optimises these functions using greedy algorithms. The reduced maps are evaluated using practical data alongside an existing visual SLAM implementation (ORB-SLAM 2). Approximate selection techniques proposed in this paper achieve trajectory accuracy comparable to an offline baseline while being suitable for online use. These techniques enable the practical reduction of maps for visual SLAM with competitive trajectory accuracy. Results also demonstrate that SLAM front-end performance can significantly impact the performance of map point selection. This shows the importance of testing map point selection with a front-end implementation. To exploit this, this paper proposes an approach that includes a model of the front-end in the utility function when additional information is available. This approach outperforms alternatives on applicable datasets and highlights future research directions.
Learning on big data brings success for artificial intelligence (AI), but the annotation and training costs are expensive. In future, learning on small data is one of the ultimate purposes of AI, which requires machines to recognize objectives and scenarios relying on small data as humans. A series of machine learning models is going on this way such as active learning, few-shot learning, deep clustering. However, there are few theoretical guarantees for their generalization performance. Moreover, most of their settings are passive, that is, the label distribution is explicitly controlled by one specified sampling scenario. This survey follows the agnostic active sampling under a PAC (Probably Approximately Correct) framework to analyze the generalization error and label complexity of learning on small data using a supervised and unsupervised fashion. With these theoretical analyses, we categorize the small data learning models from two geometric perspectives: the Euclidean and non-Euclidean (hyperbolic) mean representation, where their optimization solutions are also presented and discussed. Later, some potential learning scenarios that may benefit from small data learning are then summarized, and their potential learning scenarios are also analyzed. Finally, some challenging applications such as computer vision, natural language processing that may benefit from learning on small data are also surveyed.
Connecting Vision and Language plays an essential role in Generative Intelligence. For this reason, in the last few years, a large research effort has been devoted to image captioning, i.e. the task of describing images with syntactically and semantically meaningful sentences. Starting from 2015 the task has generally been addressed with pipelines composed of a visual encoding step and a language model for text generation. During these years, both components have evolved considerably through the exploitation of object regions, attributes, and relationships and the introduction of multi-modal connections, fully-attentive approaches, and BERT-like early-fusion strategies. However, regardless of the impressive results obtained, research in image captioning has not reached a conclusive answer yet. This work aims at providing a comprehensive overview and categorization of image captioning approaches, from visual encoding and text generation to training strategies, used datasets, and evaluation metrics. In this respect, we quantitatively compare many relevant state-of-the-art approaches to identify the most impactful technical innovations in image captioning architectures and training strategies. Moreover, many variants of the problem and its open challenges are analyzed and discussed. The final goal of this work is to serve as a tool for understanding the existing state-of-the-art and highlighting the future directions for an area of research where Computer Vision and Natural Language Processing can find an optimal synergy.
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern 1) a taxonomy and extensive overview of the state-of-the-art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.
Edge intelligence refers to a set of connected systems and devices for data collection, caching, processing, and analysis in locations close to where data is captured based on artificial intelligence. The aim of edge intelligence is to enhance the quality and speed of data processing and protect the privacy and security of the data. Although recently emerged, spanning the period from 2011 to now, this field of research has shown explosive growth over the past five years. In this paper, we present a thorough and comprehensive survey on the literature surrounding edge intelligence. We first identify four fundamental components of edge intelligence, namely edge caching, edge training, edge inference, and edge offloading, based on theoretical and practical results pertaining to proposed and deployed systems. We then aim for a systematic classification of the state of the solutions by examining research results and observations for each of the four components and present a taxonomy that includes practical problems, adopted techniques, and application goals. For each category, we elaborate, compare and analyse the literature from the perspectives of adopted techniques, objectives, performance, advantages and drawbacks, etc. This survey article provides a comprehensive introduction to edge intelligence and its application areas. In addition, we summarise the development of the emerging research field and the current state-of-the-art and discuss the important open issues and possible theoretical and technical solutions.