Large-scale administrative or observational datasets are increasingly used to inform decision making. While this effort aims to ground policy in real-world evidence, challenges have arise as that selection bias and other forms of distribution shift often plague observational data. Previous attempts to provide robust inferences have given guarantees depending on a user-specified amount of possible distribution shift (e.g., the maximum KL divergence between the observed and target distributions). However, decision makers will often have additional knowledge about the target distribution which constrains the kind of shifts which are possible. To leverage such information, we proposed a framework that enables statistical inference in the presence of distribution shifts which obey user-specified constraints in the form of functions whose expectation is known under the target distribution. The output is high-probability bounds on the value an estimand takes on the target distribution. Hence, our method leverages domain knowledge in order to partially identify a wide class of estimands. We analyze the computational and statistical properties of methods to estimate these bounds, and show that our method can produce informative bounds on a variety of simulated and semisynthetic tasks.
Recently established, directed dependence measures for pairs $(X,Y)$ of random variables build upon the natural idea of comparing the conditional distributions of $Y$ given $X=x$ with the marginal distribution of $Y$. They assign pairs $(X,Y)$ values in $[0,1]$, the value is $0$ if and only if $X,Y$ are independent, and it is $1$ exclusively for $Y$ being a function of $X$. Here we show that comparing randomly drawn conditional distributions with each other instead or, equivalently, analyzing how sensitive the conditional distribution of $Y$ given $X=x$ is on $x$, opens the door to constructing novel families of dependence measures $\Lambda_\varphi$ induced by general convex functions $\varphi: \mathbb{R} \rightarrow \mathbb{R}$, containing, e.g., Chatterjee's coefficient of correlation as special case. After establishing additional useful properties of $\Lambda_\varphi$ we focus on continuous $(X,Y)$, translate $\Lambda_\varphi$ to the copula setting, consider the $L^p$-version and establish an estimator which is strongly consistent in full generality. A real data example and a simulation study illustrate the chosen approach and the performance of the estimator. Complementing the afore-mentioned results, we show how a slight modification of the construction underlying $\Lambda_\varphi$ can be used to define new measures of explainability generalizing the fraction of explained variance.
Strong stability is a property of time integration schemes for ODEs that preserve temporal monotonicity of solutions in arbitrary (inner product) norms. It is proved that explicit Runge--Kutta schemes of order $p\in 4\mathbb{N}$ with $s=p$ stages for linear autonomous ODE systems are not strongly stable, closing an open stability question from [Z.~Sun and C.-W.~Shu, SIAM J. Numer. Anal. 57 (2019), 1158--1182]. Furthermore, for explicit Runge--Kutta methods of order $p\in\mathbb{N}$ and $s>p$ stages, we prove several sufficient as well as necessary conditions for strong stability. These conditions involve both the stability function and the hypocoercivity index of the ODE system matrix. This index is a structural property combining the Hermitian and skew-Hermitian part of the system matrix.
Temporal irreversibility, often referred to as the arrow of time, is a fundamental concept in statistical mechanics. Markers of irreversibility also provide a powerful characterisation of information processing in biological systems. However, current approaches tend to describe temporal irreversibility in terms of a single scalar quantity, without disentangling the underlying dynamics that contribute to irreversibility. Here we propose a broadly applicable information-theoretic framework to characterise the arrow of time in multivariate time series, which yields qualitatively different types of irreversible information dynamics. This multidimensional characterisation reveals previously unreported high-order modes of irreversibility, and establishes a formal connection between recent heuristic markers of temporal irreversibility and metrics of information processing. We demonstrate the prevalence of high-order irreversibility in the hyperactive regime of a biophysical model of brain dynamics, showing that our framework is both theoretically principled and empirically useful. This work challenges the view of the arrow of time as a monolithic entity, enhancing both our theoretical understanding of irreversibility and our ability to detect it in practical applications.
We present a robust deep incremental learning framework for regression tasks on financial temporal tabular datasets which is built upon the incremental use of commonly available tabular and time series prediction models to adapt to distributional shifts typical of financial datasets. The framework uses a simple basic building block (decision trees) to build self-similar models of any required complexity to deliver robust performance under adverse situations such as regime changes, fat-tailed distributions, and low signal-to-noise ratios. As a detailed study, we demonstrate our scheme using XGBoost models trained on the Numerai dataset and show that a two layer deep ensemble of XGBoost models over different model snapshots delivers high quality predictions under different market regimes. We also show that the performance of XGBoost models with different number of boosting rounds in three scenarios (small, standard and large) is monotonically increasing with respect to model size and converges towards the generalisation upper bound. We also evaluate the robustness of the model under variability of different hyperparameters, such as model complexity and data sampling settings. Our model has low hardware requirements as no specialised neural architectures are used and each base model can be independently trained in parallel.
Neural networks have revolutionized the field of machine learning with increased predictive capability. In addition to improving the predictions of neural networks, there is a simultaneous demand for reliable uncertainty quantification on estimates made by machine learning methods such as neural networks. Bayesian neural networks (BNNs) are an important type of neural network with built-in capability for quantifying uncertainty. This paper discusses aleatoric and epistemic uncertainty in BNNs and how they can be calculated. With an example dataset of images where the goal is to identify the amplitude of an event in the image, it is shown that epistemic uncertainty tends to be lower in images which are well-represented in the training dataset and tends to be high in images which are not well-represented. An algorithm for out-of-distribution (OoD) detection with BNN epistemic uncertainty is introduced along with various experiments demonstrating factors influencing the OoD detection capability in a BNN. The OoD detection capability with epistemic uncertainty is shown to be comparable to the OoD detection in the discriminator network of a generative adversarial network (GAN) with comparable network architecture.
We show how to learn discrete field theories from observational data of fields on a space-time lattice. For this, we train a neural network model of a discrete Lagrangian density such that the discrete Euler--Lagrange equations are consistent with the given training data. We, thus, obtain a structure-preserving machine learning architecture. Lagrangian densities are not uniquely defined by the solutions of a field theory. We introduce a technique to derive regularisers for the training process which optimise numerical regularity of the discrete field theory. Minimisation of the regularisers guarantees that close to the training data the discrete field theory behaves robust and efficient when used in numerical simulations. Further, we show how to identify structurally simple solutions of the underlying continuous field theory such as travelling waves. This is possible even when travelling waves are not present in the training data. This is compared to data-driven model order reduction based approaches, which struggle to identify suitable latent spaces containing structurally simple solutions when these are not present in the training data. Ideas are demonstrated on examples based on the wave equation and the Schr\"odinger equation.
Despite its importance for insurance, there is almost no literature on statistical hail damage modeling. Statistical models for hailstorms exist, though they are generally not open-source, but no study appears to have developed a stochastic hail impact function. In this paper, we use hail-related insurance claim data to build a Gaussian line process with extreme marks to model both the geographical footprint of a hailstorm and the damage to buildings that hailstones can cause. We build a model for the claim counts and claim values, and compare it to the use of a benchmark deterministic hail impact function. Our model proves to be better than the benchmark at capturing hail spatial patterns and allows for localized and extreme damage, which is seen in the insurance data. The evaluation of both the claim counts and value predictions shows that performance is improved compared to the benchmark, especially for extreme damage. Our model appears to be the first to provide realistic estimates for hail damage to individual buildings.
Very distinct strategies can be deployed to recognize and characterize an unknown environment or a shape. A recent and promising approach, especially in robotics, is to reduce the complexity of the exploratory units to a minimum. Here, we show that this frugal strategy can be taken to the extreme by exploiting the power of statistical geometry and introducing new invariant features. We show that an elementary robot devoid of any orientation or observation system, exploring randomly, can access global information about an environment such as the values of the explored area and perimeter. The explored shapes are of arbitrary geometry and may even non-connected. From a dictionary, this most simple robot can thus identify various shapes such as famous monuments and even read a text.
In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints such as on computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communications and synchronisation, and is difficult to scale. In this paper we present four algorithms to solve these problems. The combination of these algorithms enable each agent to improve their task allocation strategy through reinforcement learning, while changing how much they explore the system in response to how optimal they believe their current strategy is, given their past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource usage limits, limiting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, to then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to 6.7% of the theoretical optimal within the system configurations considered. It provides 5x better performance recovery over no-knowledge retention approaches when system connectivity is impacted, and is tested against systems up to 100 agents with less than a 9% impact on the algorithms' performance.
Hashing has been widely used in approximate nearest search for large-scale database retrieval for its computation and storage efficiency. Deep hashing, which devises convolutional neural network architecture to exploit and extract the semantic information or feature of images, has received increasing attention recently. In this survey, several deep supervised hashing methods for image retrieval are evaluated and I conclude three main different directions for deep supervised hashing methods. Several comments are made at the end. Moreover, to break through the bottleneck of the existing hashing methods, I propose a Shadow Recurrent Hashing(SRH) method as a try. Specifically, I devise a CNN architecture to extract the semantic features of images and design a loss function to encourage similar images projected close. To this end, I propose a concept: shadow of the CNN output. During optimization process, the CNN output and its shadow are guiding each other so as to achieve the optimal solution as much as possible. Several experiments on dataset CIFAR-10 show the satisfying performance of SRH.