We investigate the problem of autonomous object classification and semantic SLAM, which in general exhibits a tight coupling between classification, metric SLAM and planning under uncertainty. We contribute a unified framework for inference and belief space planning (BSP) that addresses prominent sources of uncertainty in this context: classification aliasing (classier cannot distinguish between candidate classes from certain viewpoints), classifier epistemic uncertainty (classifier receives data "far" from its training set), and localization uncertainty (camera and object poses are uncertain). Specifically, we develop two methods for maintaining a joint distribution over robot and object poses, and over posterior class probability vector that considers epistemic uncertainty in a Bayesian fashion. The first approach is Multi-Hybrid (MH), where multiple hybrid beliefs over poses and classes are maintained to approximate the joint belief over poses and posterior class probability. The second approach is Joint Lambda Pose (JLP), where the joint belief is maintained directly using a novel JLP factor. Furthermore, we extend both methods to BSP, planning while reasoning about future posterior epistemic uncertainty indirectly, or directly via a novel information-theoretic reward function. Both inference methods utilize a novel viewpoint-dependent classifier uncertainty model that leverages the coupling between poses and classification scores and predicts the epistemic uncertainty from certain viewpoints. In addition, this model is used to generate predicted measurements during planning. To the best of our knowledge, this is the first work that reasons about classifier epistemic uncertainty within semantic SLAM and BSP.
Normalizing flows can generate complex target distributions and thus show promise in many applications in Bayesian statistics as an alternative or complement to MCMC for sampling posteriors. Since no data set from the target posterior distribution is available beforehand, the flow is typically trained using the reverse Kullback-Leibler (KL) divergence that only requires samples from a base distribution. This strategy may perform poorly when the posterior is complicated and hard to sample with an untrained normalizing flow. Here we explore a distinct training strategy, using the direct KL divergence as loss, in which samples from the posterior are generated by (i) assisting a local MCMC algorithm on the posterior with a normalizing flow to accelerate its mixing rate and (ii) using the data generated this way to train the flow. The method only requires a limited amount of \textit{a~priori} input about the posterior, and can be used to estimate the evidence required for model validation, as we illustrate on examples.
We present an architecture that is effective for continual learning in an especially demanding setting, where task boundaries do not exist or are unknown. Our architecture comprises an encoder, pre-trained on a separate dataset, and an ensemble of simple one-layer classifiers. Two main innovations are required to make this combination work. First, the provision of suitably generic pre-trained encoders has been made possible thanks to recent progress in self-supervised training methods. Second, pairing each classifier in the ensemble with a key, where the key-space is identical to the latent space of the encoder, allows them to be used collectively, yet selectively, via k-nearest neighbour lookup. We show that models trained with the encoders-and-ensembles architecture are state-of-the-art for the task-free setting on standard image classification continual learning benchmarks, and improve on prior state-of-the-art by a large margin in the most challenging cases. We also show that the architecture learns well in a fully incremental setting, where one class is learned at a time, and we demonstrate its effectiveness in this setting with up to 100 classes. Finally, we show that the architecture works in a task-free continual learning context where the data distribution changes gradually, and existing approaches requiring knowledge of task boundaries cannot be applied.
In this paper, we propose an iterative receiver based on gridless variational Bayesian line spectra estimation (VALSE) named JCCD-VALSE that \emph{j}ointly estimates the \emph{c}arrier frequency offset (CFO), the \emph{c}hannel with high resolution and carries out \emph{d}ata decoding. Based on a modularized point of view and motivated by the high resolution and low complexity gridless VALSE algorithm, three modules named the VALSE module, the minimum mean squared error (MMSE) module and the decoder module are built. Soft information is exchanged between the modules to progressively improve the channel estimation and data decoding accuracy. Since the delays of multipaths of the channel are treated as continuous parameters, instead of on a grid, the leakage effect is avoided. Besides, the proposed approach is a more complete Bayesian approach as all the nuisance parameters such as the noise variance, the parameters of the prior distribution of the channel, the number of paths are automatically estimated. Numerical simulations and sea test data are utilized to demonstrate that the proposed approach performs significantly better than the existing grid-based generalized approximate message passing (GAMP) based \emph{j}oint \emph{c}hannel and \emph{d}ata decoding approach (JCD-GAMP). Furthermore, it is also verified that joint processing including CFO estimation provides performance gain.
Human motion prediction is an important and challenging topic that has promising prospects in efficient and safe human-robot-interaction systems. Currently, the majority of the human motion prediction algorithms are based on deterministic models, which may lead to risky decisions for robots. To solve this problem, we propose a probabilistic model for human motion prediction in this paper. The key idea of our approach is to extend the conventional deterministic motion prediction neural network to a Bayesian one. On one hand, our model could generate several future motions when given an observed motion sequence. On the other hand, by calculating the Epistemic Uncertainty and the Heteroscedastic Aleatoric Uncertainty, our model could tell the robot if the observation has been seen before and also give the optimal result among all possible predictions. We extensively validate our approach on a large scale benchmark dataset Human3.6m. The experiments show that our approach performs better than deterministic methods. We further evaluate our approach in a Human-Robot-Interaction (HRI) scenario. The experimental results show that our approach makes the interaction more efficient and safer.
A set of novel approaches for estimating epistemic uncertainty in deep neural networks with a single forward pass has recently emerged as a valid alternative to Bayesian Neural Networks. On the premise of informative representations, these deterministic uncertainty methods (DUMs) achieve strong performance on detecting out-of-distribution (OOD) data while adding negligible computational costs at inference time. However, it remains unclear whether DUMs are well calibrated and can seamlessly scale to real-world applications - both prerequisites for their practical deployment. To this end, we first provide a taxonomy of DUMs, evaluate their calibration under continuous distributional shifts and their performance on OOD detection for image classification tasks. Then, we extend the most promising approaches to semantic segmentation. We find that, while DUMs scale to realistic vision tasks and perform well on OOD detection, the practicality of current methods is undermined by poor calibration under realistic distributional shifts.
The ability to detect and segment moving objects in a scene is essential for building consistent maps, making future state predictions, avoiding collisions, and planning. In this paper, we address the problem of moving object segmentation from 3D LiDAR scans. We propose a novel approach that pushes the current state of the art in LiDAR-only moving object segmentation forward to provide relevant information for autonomous robots and other vehicles. Instead of segmenting the point cloud semantically, i.e., predicting the semantic classes such as vehicles, pedestrians, roads, etc., our approach accurately segments the scene into moving and static objects, i.e., also distinguishing between moving cars vs. parked cars. Our proposed approach exploits sequential range images from a rotating 3D LiDAR sensor as an intermediate representation combined with a convolutional neural network and runs faster than the frame rate of the sensor. We compare our approach to several other state-of-the-art methods showing superior segmentation quality in urban environments. Additionally, we created a new benchmark for LiDAR-based moving object segmentation based on SemanticKITTI. We published it to allow other researchers to compare their approaches transparently and we furthermore published our code.
We propose two generic methods for improving semi-supervised learning (SSL). The first integrates weight perturbation (WP) into existing "consistency regularization" (CR) based methods. We implement WP by leveraging variational Bayesian inference (VBI). The second method proposes a novel consistency loss called "maximum uncertainty regularization" (MUR). While most consistency losses act on perturbations in the vicinity of each data point, MUR actively searches for "virtual" points situated beyond this region that cause the most uncertain class predictions. This allows MUR to impose smoothness on a wider area in the input-output manifold. Our experiments show clear improvements in classification errors of various CR based methods when they are combined with VBI or MUR or both.
The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often refereed to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of hitherto attempts at handling uncertainty in general and formalizing this distinction in particular.
The main obstacle to weakly supervised semantic image segmentation is the difficulty of obtaining pixel-level information from coarse image-level annotations. Most methods based on image-level annotations use localization maps obtained from the classifier, but these only focus on the small discriminative parts of objects and do not capture precise boundaries. FickleNet explores diverse combinations of locations on feature maps created by generic deep neural networks. It selects hidden units randomly and then uses them to obtain activation scores for image classification. FickleNet implicitly learns the coherence of each location in the feature maps, resulting in a localization map which identifies both discriminative and other parts of objects. The ensemble effects are obtained from a single network by selecting random hidden unit pairs, which means that a variety of localization maps are generated from a single image. Our approach does not require any additional training steps and only adds a simple layer to a standard convolutional neural network; nevertheless it outperforms recent comparable techniques on the Pascal VOC 2012 benchmark in both weakly and semi-supervised settings.
Reinforcement learning and symbolic planning have both been used to build intelligent autonomous agents. Reinforcement learning relies on learning from interactions with real world, which often requires an unfeasibly large amount of experience. Symbolic planning relies on manually crafted symbolic knowledge, which may not be robust to domain uncertainties and changes. In this paper we present a unified framework {\em PEORL} that integrates symbolic planning with hierarchical reinforcement learning (HRL) to cope with decision-making in a dynamic environment with uncertainties. Symbolic plans are used to guide the agent's task execution and learning, and the learned experience is fed back to symbolic knowledge to improve planning. This method leads to rapid policy search and robust symbolic plans in complex domains. The framework is tested on benchmark domains of HRL.