Learning a reward function from demonstrations suffers from low sample-efficiency. Even with abundant data, current inverse reinforcement learning methods that focus on learning from a single environment can fail to handle slight changes in the environment dynamics. We tackle these challenges through adaptive environment design. In our framework, the learner repeatedly interacts with the expert, with the former selecting environments to identify the reward function as quickly as possible from the expert's demonstrations in said environments. This results in improvements in both sample-efficiency and robustness, as we show experimentally, for both exact and approximate inference.
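As a toy illustration of the interaction protocol (a hypothetical sketch, not the algorithm proposed in the paper), the learner below repeatedly picks the candidate environment on which its surviving reward hypotheses disagree the most, observes an expert demonstration there, and discards inconsistent hypotheses.

```python
# Toy illustration of adaptive environment design for reward learning
# (a hypothetical sketch, not the method proposed in the paper).
# An "environment" is just a set of available actions, and the expert
# demonstrates its best action under the true (unknown) reward.

true_reward = {"a": 1.0, "b": 0.5, "c": 0.2}
hypotheses = [                       # candidate reward functions the learner entertains
    {"a": 1.0, "b": 0.5, "c": 0.2},
    {"a": 0.1, "b": 0.9, "c": 0.3},
    {"a": 0.4, "b": 0.2, "c": 0.8},
]
environments = [("a", "b"), ("b", "c"), ("a", "c")]   # selectable action sets

def best_action(actions, reward):
    return max(actions, key=reward.get)

for _ in range(2):
    # Choose the environment on which the surviving hypotheses disagree most.
    env = max(environments,
              key=lambda e: len({best_action(e, h) for h in hypotheses}))
    demo = best_action(env, true_reward)                # expert demonstration
    hypotheses = [h for h in hypotheses if best_action(env, h) == demo]

print(hypotheses)   # hypotheses consistent with all demonstrations so far
```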
We explore a kind of first-order predicate logic with intended semantics in the reals. Compared to other approaches in the literature, we work predominantly in the multiplicative reals $[0,\infty]$, showing they support three generations of connectives, which we call non-linear, linear additive, and linear multiplicative. Means and harmonic means emerge as natural candidates for bounded existential and universal quantifiers, and indeed we show that they behave as expected in relation to the other logical connectives. We explain this through the well-known fact that min/max and arithmetic mean/harmonic mean sit at opposite ends of a spectrum, that of p-means. We give syntax and semantics for this quantitative predicate logic, and as example applications, we show how softmax is the quantitative semantics of argmax, and R\'enyi entropy/Hill numbers are additive/multiplicative semantics of the same formula. Indeed, the additive reals also fit into the story by exploiting the Napierian duality $-\log \dashv 1/\exp$, which highlights a formal distinction between 'additive' and 'multiplicative' quantities. Finally, we describe two attempts at a categorical semantics via enriched hyperdoctrines. We discuss why hyperdoctrines are in fact probably inadequate for this kind of logic.
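To recall the standard facts about $p$-means behind this spectrum (background material, not a construction from the paper): for $x_1,\dots,x_n \in [0,\infty]$,
$$M_p(x_1,\dots,x_n) = \Big(\tfrac{1}{n}\sum_{i=1}^n x_i^{\,p}\Big)^{1/p},$$
with $M_1$ the arithmetic mean, $M_{-1}$ the harmonic mean, and $M_p \to \max_i x_i$ as $p\to+\infty$, $M_p \to \min_i x_i$ as $p\to-\infty$. Likewise, $\mathrm{softmax}_i(x) = e^{x_i}/\sum_j e^{x_j}$ is the usual smooth proxy for the indicator of $\arg\max$.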
We develop statistical models for samples of distribution-valued stochastic processes featuring time-indexed univariate distributions, with emphasis on functional principal component analysis. The proposed model presents an intrinsic rather than transformation-based approach. The starting point is a transport process representation for distribution-valued processes under the Wasserstein metric. Substituting transports for distributions addresses the challenge of centering distribution-valued processes and leads to a useful and interpretable decomposition of each realized process into a process-specific single transport and a real-valued trajectory. This representation makes it possible to utilize a scalar multiplication operation for transports and facilitates not only functional principal component analysis but also the introduction of a latent Gaussian process. This Gaussian process proves especially useful for the case where the distribution-valued processes are only observed on a sparse grid of time points, establishing an approach for longitudinal distribution-valued data. We study the convergence of the key components of this novel representation to their population targets and demonstrate the practical utility of the proposed approach through simulations and several data illustrations.
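As background on the transport representation (the paper's precise construction may differ), recall that for atomless univariate distributions the Wasserstein-optimal transport map has a closed form: the map pushing $\mu$ forward to $\nu$ is $T_{\mu\to\nu} = F_\nu^{-1}\circ F_\mu$, the quantile function of $\nu$ composed with the distribution function of $\mu$. Such maps are increasing functions on the real line, so they can be composed and inverted, and a natural scalar multiplication is available, e.g. $a \odot T = \mathrm{id} + a\,(T-\mathrm{id})$ for suitable $a$, which is what makes operations like centering and rescaling of transports, and hence an intrinsic functional principal component analysis, feasible.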
We propose a new Monte Carlo-based estimator for digital options with assets modelled by a stochastic differential equation (SDE). The new estimator is based on repeated path splitting and relies on the correlation of approximate paths of the underlying SDE that share parts of a Brownian path. Combining this new estimator with Multilevel Monte Carlo (MLMC) leads to an estimator with a computational complexity that is similar to the complexity of an MLMC estimator when applied to options with Lipschitz payoffs. This preprint includes detailed calculations and proofs (in grey colour) which are not peer-reviewed and not included in the published article.
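For reference, the standard MLMC identity that such estimators build on (generic notation, not the paper's) is
$$\mathbb{E}[P_L] \;=\; \mathbb{E}[P_0] \;+\; \sum_{\ell=1}^{L}\mathbb{E}[P_\ell - P_{\ell-1}],$$
where $P_\ell$ denotes the payoff computed from a discretisation of the SDE with step size $h_\ell = 2^{-\ell}h_0$; each correction term is estimated with independent samples, and the efficiency gain comes from the strong coupling of $P_\ell$ and $P_{\ell-1}$ along a shared Brownian path. For a discontinuous digital payoff this coupling degrades near the discontinuity, which is the difficulty the path-splitting construction is designed to mitigate.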
Inductive conformal predictors (ICPs) are algorithms that are able to generate prediction sets, instead of point predictions, which are valid at a user-defined confidence level, assuming only exchangeability. These algorithms are useful for reliable machine learning and are increasingly popular. The ICP development process involves dividing development data into three parts: training, calibration and test. When development data are limited or expensive to obtain, how best to divide them is an open question. This study presents several experiments that explore this question and considers the case for allowing overlap of examples between the training and calibration sets. Conclusions are drawn that will be of value to academics and practitioners planning to use ICPs.
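To make the splitting question concrete, a minimal sketch of a split (inductive) conformal regressor is given below; the model, split sizes and synthetic data are illustrative stand-ins, not the configurations studied in the paper.

```python
# Minimal sketch of an inductive (split) conformal predictor for regression.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=1000)

# Split the development data into proper training and calibration sets.
X_train, y_train = X[:600], y[:600]
X_cal, y_cal = X[600:900], y[600:900]
X_test = X[900:]

model = Ridge().fit(X_train, y_train)

# Nonconformity scores on the calibration set (absolute residuals).
scores = np.abs(y_cal - model.predict(X_cal))

# For miscoverage alpha, take the ceil((n+1)(1-alpha))/n empirical quantile.
alpha = 0.1
n = len(scores)
q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n),
                method="higher")

# Prediction intervals valid at level 1 - alpha under exchangeability.
pred = model.predict(X_test)
lower, upper = pred - q, pred + q
```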
We give a new coalgebraic semantics for intuitionistic modal logic with $\Box$. In particular, we provide a coalgebraic representation of intuitionistic descriptive modal frames and of intuitionistic modal Kripke frames based on image-finite posets. This gives a solution to a problem in the area of coalgebraic logic for these classes of frames, raised explicitly by Litak (2014) and de Groot and Pattinson (2020). Our key technical tool is a recent generalization of a construction by Ghilardi, in the form of a right adjoint to the inclusion of the category of Esakia spaces in the category of Priestley spaces. As an application of these results, we study bisimulations of intuitionistic modal frames, describe dual spaces of free modal Heyting algebras, and provide a path towards a theory of coalgebraic intuitionistic logics.
It was recently conjectured that every component of a discrete-time rational dynamical system is a solution to an algebraic difference equation that is linear in its highest-shift term (a quasi-linear equation). We prove that the conjecture holds in the special case of holonomic sequences, which can straightforwardly be represented by rational dynamical systems. We propose two algorithms for converting holonomic recurrence equations into such quasi-linear equations. The two algorithms differ in their efficiency and in the minimality of the orders of the equations they produce.
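A standard example of the representation in question (illustrative, not taken from the paper): the holonomic Fibonacci recurrence corresponds to the polynomial, hence rational, dynamical system $x_{n+1} = y_n$, $y_{n+1} = x_n + y_n$, and its first component satisfies $x_{n+2} - x_{n+1} - x_n = 0$, an equation that is linear, and in particular quasi-linear, in the highest-shift term $x_{n+2}$.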
Contrastive learning models, which maximize the similarity between feature representations of different views of the same image while minimizing the similarity between representations of views of different images, have achieved great success in unsupervised visual representation learning. In text summarization, the output summary is a shorter form of the input document and the two have similar meanings. In this paper, we propose a contrastive learning model for supervised abstractive text summarization, where we view a document, its gold summary and its model-generated summaries as different views of the same mean representation and maximize the similarities between them during training. We improve over a strong sequence-to-sequence text generation model (i.e., BART) on three different summarization datasets. Human evaluation also shows that our model achieves better faithfulness ratings compared to its counterpart without contrastive objectives.
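A simplified sketch of such a similarity-maximising objective is shown below; the toy embedding encoder stands in for BART's encoder, and the pairwise cosine loss is an illustrative choice rather than the paper's exact formulation.

```python
# Sketch of a contrastive term that maximises similarity between a document,
# its gold summary and a model-generated summary (simplified illustration).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, dim = 100, 32
embed = torch.nn.Embedding(vocab, dim)

def represent(token_ids):
    """Mean-pooled sequence representation (placeholder for a BART encoder)."""
    return embed(token_ids).mean(dim=0)

doc = torch.randint(vocab, (50,))
gold = torch.randint(vocab, (12,))
generated = torch.randint(vocab, (12,))

views = [represent(x) for x in (doc, gold, generated)]

# Maximise pairwise cosine similarity between all views of the same content.
pairs = [(0, 1), (0, 2), (1, 2)]
loss = 0.0
for i, j in pairs:
    loss = loss - F.cosine_similarity(views[i], views[j], dim=0)
loss = loss / len(pairs)
loss.backward()   # gradients flow into the shared encoder parameters
```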
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality (`late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer-based architecture that uses `fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and share only what is necessary. We find that this strategy improves fusion performance while also reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.
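To illustrate the bottleneck idea, the sketch below runs one fusion layer in which each modality attends only over its own tokens plus a few shared bottleneck tokens, and the updated bottlenecks are averaged across modalities; the off-the-shelf TransformerEncoderLayer modules and all dimensions are illustrative assumptions, not the paper's implementation.

```python
# Sketch of one "fusion bottleneck" layer: cross-modal information can flow
# only through a small set of shared bottleneck tokens.
import torch
import torch.nn as nn

dim, n_bottleneck = 64, 4
video = torch.randn(2, 196, dim)   # (batch, video tokens, dim)
audio = torch.randn(2, 100, dim)   # (batch, audio tokens, dim)
bottleneck = torch.randn(2, n_bottleneck, dim)

video_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
audio_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)

# Video stream: tokens exchange information only with the bottlenecks.
v_out = video_layer(torch.cat([video, bottleneck], dim=1))
video, b_from_video = v_out[:, :196], v_out[:, 196:]

# Audio stream: same, using the same (pre-update) bottlenecks.
a_out = audio_layer(torch.cat([audio, bottleneck], dim=1))
audio, b_from_audio = a_out[:, :100], a_out[:, 100:]

# The averaged bottlenecks carry the condensed cross-modal information
# forward to the next layer.
bottleneck = 0.5 * (b_from_video + b_from_audio)
```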
Data augmentation has been widely used to improve the generalizability of machine learning models. However, comparatively little work studies data augmentation for graphs. This is largely due to the complex, non-Euclidean structure of graphs, which limits possible manipulation operations. Augmentation operations commonly used in vision and language have no analogs for graphs. Our work studies graph data augmentation for graph neural networks (GNNs) in the context of improving semi-supervised node classification. We discuss practical and theoretical motivations, considerations and strategies for graph data augmentation. Our work shows that neural edge predictors can effectively encode class-homophilic structure to promote intra-class edges and demote inter-class edges in a given graph structure, and our main contribution introduces the GAug graph data augmentation framework, which leverages these insights to improve performance in GNN-based node classification via edge prediction. Extensive experiments on multiple benchmarks show that augmentation via GAug improves performance across GNN architectures and datasets.
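The sketch below illustrates one way such edge-prediction-based augmentation can be applied to a graph's adjacency matrix: add the most probable missing edges and drop the least probable existing ones. The random "edge probabilities" stand in for a trained neural edge predictor (e.g., a graph auto-encoder), and all sizes are illustrative; this is not the GAug implementation itself.

```python
# Sketch of edge-prediction-based graph augmentation (simplified illustration).
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = (rng.random((n, n)) < 0.3).astype(int)
A = np.triu(A, 1); A = A + A.T                       # symmetric, no self-loops

edge_probs = rng.random((n, n))                       # placeholder predictor output
edge_probs = np.triu((edge_probs + edge_probs.T) / 2, 1)

iu = np.triu_indices(n, 1)
existing = [(i, j) for i, j in zip(*iu) if A[i, j] == 1]
missing  = [(i, j) for i, j in zip(*iu) if A[i, j] == 0]

k = 2
# Add the k most probable missing edges (likely intra-class under homophily) ...
for i, j in sorted(missing, key=lambda e: edge_probs[e], reverse=True)[:k]:
    A[i, j] = A[j, i] = 1
# ... and drop the k least probable existing edges (likely noisy / inter-class).
for i, j in sorted(existing, key=lambda e: edge_probs[e])[:k]:
    A[i, j] = A[j, i] = 0

# A now serves as the augmented adjacency for GNN training (omitted here).
```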
Benefiting from the rapid development of deep learning techniques, salient object detection has recently achieved remarkable progress. However, two major challenges still hinder its application in embedded devices: low-resolution output and heavy model weight. To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. More specifically, given a coarse saliency prediction in the deepest layer, we first employ residual learning to learn side-output residual features for saliency refinement, which can be achieved with very few convolutional parameters while preserving accuracy. Secondly, we further propose reverse attention to guide such side-output residual learning in a top-down manner. By erasing the currently predicted salient regions from side-output features, the network eventually explores the missing object parts and details, which results in high-resolution and accurate predictions. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, with advantages in terms of simplicity, efficiency (45 FPS) and model size (81 MB).
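A minimal sketch of the reverse-attention refinement step is given below; the tensor shapes, channel counts and the small residual head are illustrative assumptions rather than the paper's exact architecture.

```python
# Sketch of reverse-attention guided side-output residual refinement.
import torch
import torch.nn as nn
import torch.nn.functional as F

coarse_pred = torch.randn(1, 1, 14, 14)     # coarse saliency from the deepest layer
side_feat   = torch.randn(1, 64, 28, 28)    # higher-resolution side-output feature
residual_conv = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                              nn.Conv2d(64, 1, 3, padding=1))

up = F.interpolate(coarse_pred, size=side_feat.shape[2:], mode="bilinear",
                   align_corners=False)

# Reverse attention: emphasise regions NOT yet predicted as salient, so the
# side branch focuses on missing object parts and boundary details.
reverse_att = 1.0 - torch.sigmoid(up)
residual = residual_conv(side_feat * reverse_att)

# Residual learning: the refined prediction is the upsampled coarse
# prediction plus the learned side-output residual.
refined_pred = up + residual
```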