Before delegating a task to an autonomous system, a human operator may want a guarantee about the behavior of the system. This paper extends previous work on conformal prediction for functional data and conformalized quantile regression to provide conformal prediction intervals over the future behavior of an autonomous system executing a fixed control policy on a Markov Decision Process (MDP). The prediction intervals are constructed by applying conformal corrections to prediction intervals computed by quantile regression. The resulting intervals guarantee that with probability $1-\delta$ the observed trajectory will lie inside the prediction interval, where the probability is computed with respect to the starting state distribution and the stochasticity of the MDP. The method is illustrated on MDPs for invasive species management and StarCraft2 battles.
Evaluating the individual movements for teammates in soccer players is crucial for assessing teamwork, scouting, and fan engagement. It has been said that players in a 90-min game do not have the ball for about 87 minutes on average. However, it has remained difficult to evaluate an attacking player without receiving the ball, and to reveal how movement contributes to the creation of scoring opportunities for teammates. In this paper, we evaluate players who create off-ball scoring opportunities by comparing actual movements with the reference movements generated via trajectory prediction. First, we predict the trajectories of players using a graph variational recurrent neural network that can accurately model the relationship between players and predict the long-term trajectory. Next, based on the difference in the modified off-ball evaluation index between the actual and the predicted trajectory as a reference, we evaluate how the actual movement contributes to scoring opportunity compared to the predicted movement. For verification, we examined the relationship with the annual salary, the goals, and the rating in the game by experts for all games of a team in a professional soccer league in a year. The results show that the annual salary and the proposed indicator correlated significantly, which could not be explained by the existing indicators and goals. Our results suggest the effectiveness of the proposed method as an indicator for a player without the ball to create a scoring chance for teammates.
Over the years, the separate fields of motion planning, mapping, and human trajectory prediction have advanced considerably. However, the literature is still sparse in providing practical frameworks that enable mobile manipulators to perform whole-body movements and account for the predicted motion of moving obstacles. Previous optimisation-based motion planning approaches that use distance fields have suffered from the high computational cost required to update the environment representation. We demonstrate that GPU-accelerated predicted composite distance fields significantly reduce the computation time compared to calculating distance fields from scratch. We integrate this technique with a complete motion planning and perception framework that accounts for the predicted motion of humans in dynamic environments, enabling reactive and pre-emptive motion planning that incorporates predicted motions. To achieve this, we propose and implement a novel human trajectory prediction method that combines intention recognition with trajectory optimisation-based motion planning. We validate our resultant framework on a real-world Toyota Human Support Robot (HSR) using live RGB-D sensor data from the onboard camera. In addition to providing analysis on a publicly available dataset, we release the Oxford Indoor Human Motion (Oxford-IHM) dataset and demonstrate state-of-the-art performance in human trajectory prediction. The Oxford-IHM dataset is a human trajectory prediction dataset in which people walk between regions of interest in an indoor environment. Both static and robot-mounted RGB-D cameras observe the people while tracked with a motion-capture system.
Unit testing is one of the most established quality-assurance techniques for software development. One major advantage of unit testing is the adjustable trade-off between efficiency (i.e., testing effort) and effectiveness (i.e., fault-detection probability). To this end, various strategies have been proposed to exploit this trade-off. In particular, test-suite reduction (TSR) reduces the number of (presumably redundant) test cases while testing a single program version. Regression-test selection (RTS) selects test cases for testing consecutive program revisions. However, both TSR and RTS may influence -- or even obstruct -- each others' performance when used in combination. For instance, test cases discarded during TSR for a particular program version may become relevant again for RTS. However, finding a combination of both strategies leading to a reasonable trade-off throughout the version history of a program is an open question. The goal of this paper is to gain a better understanding of the interactions between TSR and RTS with respect to efficiency and effectiveness. To this end, we present a configurable framework called RegreTS for automated unit-testing of C programs. The framework comprises different strategies for TSR and RTS and possible combinations thereof. We apply this framework to a collection of subject systems, delivering several crucial insights. First, TSR has almost always a negative impact on the effectiveness of RTS, yet a positive impact on efficiency. Second, test cases revealing to testers the effect of program modifications between consecutive program versions are far more effective than test cases simply covering modified code parts, yet causing much more testing effort.
Microfinance in developing areas such as Africa has been proven to improve the local economy significantly. However, many applicants in developing areas cannot provide adequate information required by the financial institution to make a lending decision. As a result, it is challenging for microfinance institutions to assign credit properly based on conventional policies. In this paper, we formulate the decision-making of microfinance into a rigorous optimization-based framework involving learning and control. We propose an algorithm to explore and learn the optimal policy to approve or reject applicants. We provide the conditions under which the algorithms are guaranteed to converge to an optimal one. The proposed algorithm can naturally deal with missing information and systematically tradeoff multiple objectives such as profit maximization, financial inclusion, social benefits, and economic development. Through extensive simulation of both real and synthetic microfinance datasets, we showed our proposed algorithm is superior to existing benchmarks. To the best of our knowledge, this paper is the first to make a connection between microfinance and control and use control-theoretic tools to optimize the policy with a provable guarantee.
This paper considers the problem of constructing a confidence sequence for bounded random processes. Building upon the gambling approach pioneered by Hendriks (2018) and Jun and Orabona (2019) and following the recent work of Waudby-Smith and Ramdas (2020) and Orabona and Jun (2021), this paper revisits the idea of Cover (1991)'s universal portfolio in constructing confidence sequences and demonstrates new properties, based on a natural \emph{two-horse race} perspective on the gambling approach. The main result of this paper is a new algorithm based on a mixture of lower bounds, which closely approximates the performance of Cover's universal portfolio with only constant per-round time complexity. A higher-order generalization of a lower bound in (Fan et al, 2015), which is invoked in the proposed algorithm, may be of independent interest.
In modern autonomy stacks, prediction modules are paramount to planning motions in the presence of other mobile agents. However, failures in prediction modules can mislead the downstream planner into making unsafe decisions. Indeed, the high uncertainty inherent to the task of trajectory forecasting ensures that such mispredictions occur frequently. Motivated by the need to improve safety of autonomous vehicles without compromising on their performance, we develop a probabilistic run-time monitor that detects when a "harmful" prediction failure occurs, i.e., a task-relevant failure detector. We achieve this by propagating trajectory prediction errors to the planning cost to reason about their impact on the AV. Furthermore, our detector comes equipped with performance measures on the false-positive and the false-negative rate and allows for data-free calibration. In our experiments we compared our detector with various others and found that our detector has the highest area under the receiver operator characteristic curve.
Given a partial differential equation (PDE), goal-oriented error estimation allows us to understand how errors in a diagnostic quantity of interest (QoI), or goal, occur and accumulate in a numerical approximation, for example using the finite element method. By decomposing the error estimates into contributions from individual elements, it is possible to formulate adaptation methods, which modify the mesh with the objective of minimising the resulting QoI error. However, the standard error estimate formulation involves the true adjoint solution, which is unknown in practice. As such, it is common practice to approximate it with an 'enriched' approximation (e.g. in a higher order space or on a refined mesh). Doing so generally results in a significant increase in computational cost, which can be a bottleneck compromising the competitiveness of (goal-oriented) adaptive simulations. The central idea of this paper is to develop a "data-driven" goal-oriented mesh adaptation approach through the selective replacement of the expensive error estimation step with an appropriately configured and trained neural network. In doing so, the error estimator may be obtained without even constructing the enriched spaces. An element-by-element construction is employed here, whereby local values of various parameters related to the mesh geometry and underlying problem physics are taken as inputs, and the corresponding contribution to the error estimator is taken as output. We demonstrate that this approach is able to obtain the same accuracy with a reduced computational cost, for adaptive mesh test cases related to flow around tidal turbines, which interact via their downstream wakes, and where the overall power output of the farm is taken as the QoI. Moreover, we demonstrate that the element-by-element approach implies reasonably low training costs.
Variational inference has recently emerged as a popular alternative to the classical Markov chain Monte Carlo (MCMC) in large-scale Bayesian inference. The core idea of variational inference is to trade statistical accuracy for computational efficiency. It aims to approximate the posterior, reducing computation costs but potentially compromising its statistical accuracy. In this work, we study this statistical and computational trade-off in variational inference via a case study in inferential model selection. Focusing on Gaussian inferential models (a.k.a. variational approximating families) with diagonal plus low-rank precision matrices, we initiate a theoretical study of the trade-offs in two aspects, Bayesian posterior inference error and frequentist uncertainty quantification error. From the Bayesian posterior inference perspective, we characterize the error of the variational posterior relative to the exact posterior. We prove that, given a fixed computation budget, a lower-rank inferential model produces variational posteriors with a higher statistical approximation error, but a lower computational error; it reduces variances in stochastic optimization and, in turn, accelerates convergence. From the frequentist uncertainty quantification perspective, we consider the precision matrix of the variational posterior as an uncertainty estimate. We find that, relative to the true asymptotic precision, the variational approximation suffers from an additional statistical error originating from the sampling uncertainty of the data. Moreover, this statistical error becomes the dominant factor as the computation budget increases. As a consequence, for small datasets, the inferential model need not be full-rank to achieve optimal estimation error. We finally demonstrate these statistical and computational trade-offs inference across empirical studies, corroborating the theoretical findings.
Support vector machine (SVM) is a powerful classification method that has achieved great success in many fields. Since its performance can be seriously impaired by redundant covariates, model selection techniques are widely used for SVM with high dimensional covariates. As an alternative to model selection, significant progress has been made in the area of model averaging in the past decades. Yet no frequentist model averaging method was considered for SVM. This work aims to fill the gap and to propose a frequentist model averaging procedure for SVM which selects the optimal weight by cross validation. Even when the number of covariates diverges at an exponential rate of the sample size, we show asymptotic optimality of the proposed method in the sense that the ratio of its hinge loss to the lowest possible loss converges to one. We also derive the convergence rate which provides more insights to model averaging. Compared to model selection methods of SVM which require a tedious but critical task of tuning parameter selection, the model averaging method avoids the task and shows promising performances in the empirical studies.
Formal XAI (explainable AI) is a growing area that focuses on computing explanations with mathematical guarantees for the decisions made by ML models. Inside formal XAI, one of the most studied cases is that of explaining the choices taken by decision trees, as they are traditionally deemed as one of the most interpretable classes of models. Recent work has focused on studying the computation of "sufficient reasons", a kind of explanation in which given a decision tree $T$ and an instance $x$, one explains the decision $T(x)$ by providing a subset $y$ of the features of $x$ such that for any other instance $z$ compatible with $y$, it holds that $T(z) = T(x)$, intuitively meaning that the features in $y$ are already enough to fully justify the classification of $x$ by $T$. It has been argued, however, that sufficient reasons constitute a restrictive notion of explanation, and thus the community has started to study their probabilistic counterpart, in which one requires that the probability of $T(z) = T(x)$ must be at least some value $\delta \in (0, 1]$, where $z$ is a random instance that is compatible with $y$. Our paper settles the computational complexity of $\delta$-sufficient-reasons over decision trees, showing that both (1) finding $\delta$-sufficient-reasons that are minimal in size, and (2) finding $\delta$-sufficient-reasons that are minimal inclusion-wise, do not admit polynomial-time algorithms (unless P=NP). This is in stark contrast with the deterministic case ($\delta = 1$) where inclusion-wise minimal sufficient-reasons are easy to compute. By doing this, we answer two open problems originally raised by Izza et al. On the positive side, we identify structural restrictions of decision trees that make the problem tractable, and show how SAT solvers might be able to tackle these problems in practical settings.