The current paper studies sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly-realizable. It has recently been understood that, even under this seemingly strong assumption and access to a generative model, worst-case sample complexities can be prohibitively (i.e., exponentially) large. We investigate the setting where the learner additionally has access to interactive demonstrations from an expert policy, and we present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries. In particular, Delphi requires $\tilde{\mathcal{O}}(d)$ expert queries and a $\texttt{poly}(d,H,|\mathcal{A}|,1/\varepsilon)$ amount of exploratory samples to provably recover an $\varepsilon$-suboptimal policy. Compared to pure RL approaches, this corresponds to an exponential improvement in sample complexity with surprisingly-little expert input. Compared to prior imitation learning (IL) approaches, our required number of expert demonstrations is independent of $H$ and logarithmic in $1/\varepsilon$, whereas all prior work required at least linear factors of both in addition to the same dependence on $d$. Towards establishing the minimal amount of expert queries needed, we show that, in the same setting, any learner whose exploration budget is polynomially-bounded (in terms of $d,H,$ and $|\mathcal{A}|$) will require at least $\tilde\Omega(\sqrt{d})$ oracle calls to recover a policy competing with the expert's value function. Under the weaker assumption that the expert's policy is linear, we show that the lower bound increases to $\tilde\Omega(d)$.
The problem of distributed optimization requires a group of networked agents to compute a parameter that minimizes the average of their local cost functions. While there are a variety of distributed optimization algorithms that can solve this problem, they are typically vulnerable to "Byzantine" agents that do not follow the algorithm. Recent attempts to address this issue focus on single dimensional functions, or assume certain statistical properties of the functions at the agents. In this paper, we provide two resilient, scalable, distributed optimization algorithms for multi-dimensional functions. Our schemes involve two filters, (1) a distance-based filter and (2) a min-max filter, which each remove neighborhood states that are extreme (defined precisely in our algorithms) at each iteration. We show that these algorithms can mitigate the impact of up to F (unknown) Byzantine agents in the neighborhood of each regular agent. In particular, we show that if the network topology satisfies certain conditions, all of the regular agents' states are guaranteed to converge to a bounded region that contains the minimizer of the average of the regular agents' functions.
We present a sample-efficient deep learning strategy for topology optimization. Our end-to-end approach is supervised and includes physics-based preprocessing and equivariant networks. We analyze how different components of our deep learning pipeline influence the number of required training samples via a large-scale comparison. The results demonstrate that including physical concepts not only drastically improves the sample efficiency but also the predictions' physical correctness. Finally, we publish two topology optimization datasets containing problems and corresponding ground truth solutions. We are confident that these datasets will improve comparability and future progress in the field.
Applications of industrial robotic manipulators such as cobots can require efficient online motion planning in environments that have a combination of static and non-static obstacles. Existing general purpose planning methods often produce poor quality solutions when available computation time is restricted, or fail to produce a solution entirely. We propose a new motion planning framework designed to operate in a user-defined task space, as opposed to the robot's workspace, that intentionally trades off workspace generality for planning and execution time efficiency. Our framework automatically constructs trajectory libraries that are queried online, similar to previous methods that exploit offline computation. Importantly, our method also offers bounded suboptimality guarantees on trajectory length. The key idea is to establish approximate isometries known as $\epsilon$-Gromov-Hausdorff approximations such that points that are close by in task space are also close in configuration space. These bounding relations further imply that trajectories can be smoothly concatenated, which enables our framework to address batch-query scenarios where the objective is to find a minimum length sequence of trajectories that visit an unordered set of goals. We evaluate our framework in simulation with several kinematic configurations, including a manipulator mounted to a mobile base. Results demonstrate that our method achieves feasible real-time performance for practical applications and suggest interesting opportunities for extending its capabilities.
Catastrophic forgetting (CF) occurs when a neural network loses the information previously learned while training on a set of samples from a different distribution, i.e., a new task. Existing approaches have achieved remarkable results in mitigating CF, especially in a scenario called task incremental learning. However, this scenario is not realistic, and limited work has been done to achieve good results on more realistic scenarios. In this paper, we propose a novel regularization method called Centroids Matching, that, inspired by meta-learning approaches, fights CF by operating in the feature space produced by the neural network, achieving good results while requiring a small memory footprint. Specifically, the approach classifies the samples directly using the feature vectors produced by the neural network, by matching those vectors with the centroids representing the classes from the current task, or all the tasks up to that point. Centroids Matching is faster than competing baselines, and it can be exploited to efficiently mitigate CF, by preserving the distances between the embedding space produced by the model when past tasks were over, and the one currently produced, leading to a method that achieves high accuracy on all the tasks, without using an external memory when operating on easy scenarios, or using a small one for more realistic ones. Extensive experiments demonstrate that Centroids Matching achieves accuracy gains on multiple datasets and scenarios.
Graph connectivity is a fundamental combinatorial optimization problem that arises in many practical applications, where usually a spanning subgraph of a network is used for its operation. However, in the real world, links may fail unexpectedly deeming the networks non-operational, while checking whether a link is damaged is costly and possibly erroneous. After an event that has damaged an arbitrary subset of the edges, the network operator must find a spanning tree of the network using non-damaged edges by making as few checks as possible. Motivated by such questions, we study the problem of finding a spanning tree in a network, when we only have access to noisy queries of the form "Does edge e exist?". We design efficient algorithms, even when edges fail adversarially, for all possible error regimes; 2-sided error (where any answer might be erroneous), false positives (where "no" answers are always correct) and false negatives (where "yes" answers are always correct). In the first two regimes we provide efficient algorithms and give matching lower bounds for general graphs. In the False Negative case we design efficient algorithms for large interesting families of graphs (e.g. bounded treewidth, sparse). Using the previous results, we provide tight algorithms for the practically useful family of planar graphs in all error regimes.
Various methods have been developed to combine inference across multiple sets of results for unsupervised clustering, within the ensemble and consensus clustering literature. The approach of reporting results from one `best' model out of several candidate clustering models generally ignores the uncertainty that arises from model selection, and results in inferences that are sensitive to the particular model and parameters chosen, and assumptions made, especially with small sample size or small cluster sizes. Bayesian model averaging (BMA) is a popular approach for combining results across multiple models that offers some attractive benefits in this setting, including probabilistic interpretation of the combine cluster structure and quantification of model-based uncertainty. In this work we introduce clusterBMA, a method that enables weighted model averaging across results from multiple unsupervised clustering algorithms. We use a combination of clustering internal validation criteria as a novel approximation of the posterior model probability for weighting the results from each model. From a combined posterior similarity matrix representing a weighted average of the clustering solutions across models, we apply symmetric simplex matrix factorisation to calculate final probabilistic cluster allocations. This method is implemented in an accompanying R package. We explore the performance of this approach through a case study that aims to to identify probabilistic clusters of individuals based on electroencephalography (EEG) data. We also use simulated datasets to explore the ability of the proposed technique to identify robust integrated clusters with varying levels of separations between subgroups, and with varying numbers of clusters between models.
We study exact active learning of binary and multiclass classifiers with margin. Given an $n$-point set $X \subset \mathbb{R}^m$, we want to learn any unknown classifier on $X$ whose classes have finite strong convex hull margin, a new notion extending the SVM margin. In the standard active learning setting, where only label queries are allowed, learning a classifier with strong convex hull margin $\gamma$ requires in the worst case $\Omega\big(1+\frac{1}{\gamma}\big)^{(m-1)/2}$ queries. On the other hand, using the more powerful seed queries (a variant of equivalence queries), the target classifier could be learned in $O(m \log n)$ queries via Littlestone's Halving algorithm; however, Halving is computationally inefficient. In this work we show that, by carefully combining the two types of queries, a binary classifier can be learned in time $\operatorname{poly}(n+m)$ using only $O(m^2 \log n)$ label queries and $O\big(m \log \frac{m}{\gamma}\big)$ seed queries; the result extends to $k$-class classifiers at the price of a $k!k^2$ multiplicative overhead. Similar results hold when the input points have bounded bit complexity, or when only one class has strong convex hull margin against the rest. We complement the upper bounds by showing that in the worst case any algorithm needs $\Omega\big(k m \log \frac{1}{\gamma}\big)$ seed and label queries to learn a $k$-class classifier with strong convex hull margin $\gamma$.
We show that in a cooperative $N$-agent network, one can design locally executable policies for the agents such that the resulting discounted sum of average rewards (value) well approximates the optimal value computed over all (including non-local) policies. Specifically, we prove that, if $|\mathcal{X}|, |\mathcal{U}|$ denote the size of state, and action spaces of individual agents, then for sufficiently small discount factor, the approximation error is given by $\mathcal{O}(e)$ where $e\triangleq \frac{1}{\sqrt{N}}\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]$. Moreover, in a special case where the reward and state transition functions are independent of the action distribution of the population, the error improves to $\mathcal{O}(e)$ where $e\triangleq \frac{1}{\sqrt{N}}\sqrt{|\mathcal{X}|}$. Finally, we also devise an algorithm to explicitly construct a local policy. With the help of our approximation results, we further establish that the constructed local policy is within $\mathcal{O}(\max\{e,\epsilon\})$ distance of the optimal policy, and the sample complexity to achieve such a local policy is $\mathcal{O}(\epsilon^{-3})$, for any $\epsilon>0$.
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
We propose a new method for event extraction (EE) task based on an imitation learning framework, specifically, inverse reinforcement learning (IRL) via generative adversarial network (GAN). The GAN estimates proper rewards according to the difference between the actions committed by the expert (or ground truth) and the agent among complicated states in the environment. EE task benefits from these dynamic rewards because instances and labels yield to various extents of difficulty and the gains are expected to be diverse -- e.g., an ambiguous but correctly detected trigger or argument should receive high gains -- while the traditional RL models usually neglect such differences and pay equal attention on all instances. Moreover, our experiments also demonstrate that the proposed framework outperforms state-of-the-art methods, without explicit feature engineering.