Inverse Kinematics (IK) solves the problem of mapping from the Cartesian space to the joint configuration space of a robotic arm. It has a wide range of applications in areas such as computer graphics, protein structure prediction, and robotics. With the vast advances in artificial neural networks (NNs), many researchers have recently turned to data-driven approaches to solving the IK problem. Unfortunately, NNs become inadequate for robotic arms with redundant Degrees-of-Freedom (DoFs). This is because such arms may have multiple angle solutions that reach the same desired pose, while typical NNs only implement one-to-one mapping functions, which associate just one consistent output with a given input. In order to train usable NNs to solve the IK problem, most existing works employ customized training datasets in which every desired pose has only one angle solution. This inevitably limits the generalization and automation of the proposed approaches. This paper makes advances on two fronts: (1) a systematic and mechanical approach to training data collection that covers the entire working space of the robotic arm, and can be fully automated and done only once after the arm is developed; and (2) a novel NN-based framework that can leverage the redundant DoFs to produce multiple angle solutions for any given desired pose of the robotic arm. The latter is especially useful for robotic applications such as obstacle avoidance and posture imitation.
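The one-to-many nature of IK described above can be seen on a toy example. The following is a minimal sketch, assuming a planar two-link arm with hypothetical link lengths `l1` and `l2`; this arm is not redundant and is not the paper's setup or method, but it already has two joint-angle solutions (elbow-up and elbow-down) for most reachable targets, which a single-output regressor cannot represent.

```python
import math


def forward(l1, l2, q1, q2):
    """End-effector position of a planar two-link arm (illustrative model)."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y


def inverse(l1, l2, x, y):
    """Return every joint-angle pair (q1, q2) that reaches the target (x, y)."""
    # Law of cosines gives the cosine of the elbow angle.
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(c2) > 1.0:
        return []  # target lies outside the reachable workspace
    s = math.sqrt(max(0.0, 1.0 - c2 * c2))
    # Two elbow branches, which collapse to one when the arm is fully stretched.
    branches = [s] if s == 0.0 else [s, -s]
    solutions = []
    for s2 in branches:
        q2 = math.atan2(s2, c2)
        q1 = math.atan2(y, x) - math.atan2(l2 * s2, l1 + l2 * c2)
        solutions.append((q1, q2))
    return solutions


# Both returned configurations reach the same target position.
for q1, q2 in inverse(1.0, 1.0, 1.0, 1.0):
    print(q1, q2, forward(1.0, 1.0, q1, q2))
```

A dataset built by sampling joint angles and recording poses for this arm would contain both branches for the same target, which is the kind of ambiguity the abstract refers to.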
Reinforcement learning (RL) and trajectory optimization (TO) present strong complementary advantages. On one hand, RL approaches are able to learn global control policies directly from data, but generally require large sample sizes to properly converge towards feasible policies. On the other hand, TO methods are able to exploit gradient-based information extracted from simulators to quickly converge towards a locally optimal control trajectory which is only valid within the vicinity of the solution. Over the past decade, several approaches have aimed to adequately combine the two classes of methods in order to obtain the best of both worlds. Following on from this line of research, we propose several improvements on top of these approaches to learn global control policies quicker, notably by leveraging sensitivity information stemming from TO methods via Sobolev learning, and augmented Lagrangian techniques to enforce the consensus between TO and policy learning. We evaluate the benefits of these improvements on various classical tasks in robotics through comparison with existing approaches in the literature.
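The idea of Sobolev learning mentioned above, fitting a model to target derivatives as well as target values, can be sketched as follows. This is an illustration under strong assumptions, not the paper's implementation: the policy is a hypothetical scalar linear map `u = k * x + b`, the targets `(x, u_target, du_dx_target)` stand in for controls and sensitivities that a trajectory optimizer might provide, and plain gradient descent replaces whatever optimizer and consensus scheme the authors use.

```python
def sobolev_loss(k, b, samples, weight=1.0):
    """Mean squared error on values plus weighted error on derivatives."""
    total = 0.0
    for x, u, du in samples:
        value_err = (k * x + b) - u  # zeroth-order (value) mismatch
        deriv_err = k - du           # first-order mismatch: d(k*x + b)/dx = k
        total += value_err ** 2 + weight * deriv_err ** 2
    return total / len(samples)


def fit_linear_policy(samples, weight=1.0, lr=0.1, steps=500):
    """Fit u = k * x + b by gradient descent on the Sobolev loss."""
    k, b = 0.0, 0.0
    n = len(samples)
    for _ in range(steps):
        grad_k, grad_b = 0.0, 0.0
        for x, u, du in samples:
            value_err = (k * x + b) - u
            deriv_err = k - du
            grad_k += 2.0 * (value_err * x + weight * deriv_err) / n
            grad_b += 2.0 * value_err / n
        k -= lr * grad_k
        b -= lr * grad_b
    return k, b


# Hypothetical targets generated from u = -2 * x + 0.5 with sensitivity -2.
samples = [(x, -2.0 * x + 0.5, -2.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
k, b = fit_linear_policy(samples)
print(k, b, sobolev_loss(k, b, samples))
```

The derivative term adds information per sample, which is the intuition behind using sensitivities to reduce the amount of data needed.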
Quickly and reliably finding accurate inverse kinematics (IK) solutions remains a challenging problem for robotic manipulation. Existing numerical solvers are broadly applicable, but rely on local search techniques to manage highly nonconvex objective functions. Recently, learning-based approaches have shown promise as a means to generate fast and accurate IK results; learned solvers can easily be integrated with other learning algorithms in end-to-end systems. However, learning-based methods have an Achilles' heel: each robot of interest requires a specialized model which must be trained from scratch. To address this key shortcoming, we investigate a novel distance-geometric robot representation coupled with a graph structure that allows us to leverage the flexibility of graph neural networks (GNNs). We use this approach to train the first learned generative graphical inverse kinematics (GGIK) solver that is, crucially, "robot-agnostic": a single model is able to provide IK solutions for a variety of different robots. Additionally, the generative nature of GGIK allows the solver to produce a large number of diverse solutions in parallel with minimal additional computation time, making it appropriate for applications such as sampling-based motion planning. Finally, GGIK can complement local IK solvers by providing reliable initializations. These advantages, as well as the ability to use task-relevant priors and to continuously improve with new data, suggest that GGIK has the potential to be a key component of flexible, learning-based robotic manipulation systems.
We propose a novel hybrid cable-based robot with manipulator and camera for high-accuracy, medium-throughput plant monitoring in a vertical hydroponic farm and, as an example application, demonstrate non-destructive plant mass estimation. Plant monitoring with high temporal and spatial resolution is important to both farmers and researchers to detect anomalies and develop predictive models for plant growth. The availability of high-quality, off-the-shelf structure-from-motion (SfM) and photogrammetry packages has enabled a vibrant community of roboticists to apply computer vision for non-destructive plant monitoring. While existing approaches tend to focus on either high throughput (e.g. satellite, unmanned aerial vehicle (UAV), vehicle-mounted, conveyor-belt imagery) or high accuracy and robustness to occlusions (e.g. turn-table scanner or robot arm), we propose a middle ground that achieves high accuracy with a medium-throughput, highly automated robot. Our design pairs the workspace scalability of a cable-driven parallel robot (CDPR) with the dexterity of a 4-degree-of-freedom (DoF) robot arm to autonomously image many plants from a variety of viewpoints. We describe our robot design and demonstrate it experimentally by collecting daily photographs of 54 plants from 64 viewpoints each. We show that our approach can produce scientifically useful measurements, operate fully autonomously after initial calibration, and produce better reconstructions and plant property estimates than those of over-canopy methods (e.g. UAV). As example applications, we show that our system can successfully estimate plant mass with a Mean Absolute Error (MAE) of 0.586 g and, when used to perform hypothesis testing on the relationship between mass and age, produces p-values comparable to ground-truth data (p=0.0020 and p=0.0016, respectively).
Connectivity augmentation problems are among the most elementary questions in Network Design. Many of these problems admit natural $2$-approximation algorithms, often through various classic techniques, whereas it remains open whether approximation factors below $2$ can be achieved. One of the most basic examples thereof is the Weighted Connectivity Augmentation Problem (WCAP). In WCAP, one is given an undirected graph together with a set of additional weighted candidate edges, and the task is to find a cheapest set of candidate edges whose addition to the graph increases its edge-connectivity. We present a $(1.5+\varepsilon)$-approximation algorithm for WCAP, showing for the first time that factors below $2$ are achievable. On a high level, we design a well-chosen local search algorithm, inspired by recent advances for Weighted Tree Augmentation. To measure progress, we consider a directed weakening of WCAP and show that it has highly structured planar solutions. Interpreting a solution of the original problem as one of this directed weakening allows us to describe local exchange steps in a clean and algorithmically amenable way. Leveraging these insights, we show that we can efficiently search for good exchange steps within a component class for link sets that is closely related to bounded treewidth subgraphs of circle graphs. Moreover, we prove that an optimum solution can be decomposed into smaller components, at least one of which leads to a good local search step as long as we did not yet achieve the claimed approximation guarantee.
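To make the WCAP problem statement above concrete, here is a brute-force sketch for tiny instances. It is exponential in both the number of vertices and the number of candidate links and is in no way the local search algorithm of the paper; the function names and the example instance are assumptions made for illustration only.

```python
from itertools import combinations


def edge_connectivity(n, edges):
    """Minimum number of edges crossing any cut of a graph on vertices 0..n-1."""
    best = None
    # Enumerate nonempty vertex subsets that exclude vertex n-1,
    # so each cut is considered exactly once.
    for mask in range(1, 2 ** (n - 1)):
        cut = sum(1 for u, v in edges if ((mask >> u) & 1) != ((mask >> v) & 1))
        best = cut if best is None else min(best, cut)
    return best


def cheapest_augmentation(n, edges, links):
    """Cheapest subset of weighted links (u, v, w) raising edge-connectivity by one."""
    target = edge_connectivity(n, edges) + 1
    best_cost, best_set = None, None
    for r in range(1, len(links) + 1):
        for subset in combinations(links, r):
            cost = sum(w for _, _, w in subset)
            if best_cost is not None and cost >= best_cost:
                continue  # cannot improve on the incumbent
            augmented = edges + [(u, v) for u, v, _ in subset]
            if edge_connectivity(n, augmented) >= target:
                best_cost, best_set = cost, list(subset)
    return best_cost, best_set


# Hypothetical instance: a path 0-1-2-3 with three candidate links.
path = [(0, 1), (1, 2), (2, 3)]
links = [(0, 3, 5), (0, 2, 2), (1, 3, 2)]
print(cheapest_augmentation(4, path, links))
```

On this instance the single link closing the path into a cycle is feasible but more expensive than the two short links together, which shows why the choice of link set is a genuine optimization problem.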
Contrastive learning has recently shown immense potential in unsupervised visual representation learning. Existing studies in this track mainly focus on intra-image invariance learning. The learning typically uses rich intra-image transformations to construct positive pairs and then maximizes agreement using a contrastive loss. The merits of inter-image invariance, conversely, remain much less explored. One major obstacle to exploiting inter-image invariance is that it is unclear how to reliably construct inter-image positive pairs and how to derive effective supervision from them, since no pair annotations are available. In this work, we present a comprehensive empirical study to better understand the role of inter-image invariance learning from three main constituent components: pseudo-label maintenance, sampling strategy, and decision boundary design. To facilitate the study, we introduce a unified and generic framework that supports the integration of unsupervised intra- and inter-image invariance learning. Through carefully designed comparisons and analysis, multiple valuable observations are revealed: 1) online labels converge faster and perform better than offline labels; 2) semi-hard negative samples are more reliable and unbiased than hard negative samples; 3) a less stringent decision boundary is more favorable for inter-image invariance learning. With all the obtained recipes, our final model, namely InterCLR, shows consistent improvements over state-of-the-art intra-image invariance learning methods on multiple standard benchmarks. We hope this work will provide useful experience for devising effective unsupervised inter-image invariance learning. Code: //github.com/open-mmlab/mmselfsup.
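The phrase "maximizes agreement using a contrastive loss" can be illustrated with the generic InfoNCE form of such a loss. This is a minimal sketch in plain Python, assuming cosine similarity and a hypothetical `temperature` value; it is not the InterCLR loss, whose sampling strategy and decision boundary are the subject of the study, and the helper names are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(query, positive, negatives, temperature=0.2):
    """Cross-entropy of picking the positive among positive + negatives."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# The loss is small when the positive is the closest candidate to the query
# and large when a negative is closer than the positive.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
confused = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
print(aligned, confused)
```

In the intra-image setting the positive is another augmented view of the same image; in the inter-image setting studied here, the positive would instead come from a different image chosen via pseudo-labels.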
Aerial base stations (ABSs) allow smart farms to offload the processing of complex tasks from internet of things (IoT) devices to ABSs. Because IoT devices have limited energy and computing resources, a system that relies on ABS support requires an advanced scheduling solution. This paper introduces a novel multi-actor-based risk-sensitive reinforcement learning approach for ABS task scheduling for smart agriculture. The problem is defined as task offloading with a strict condition on completing the IoT tasks before their deadlines. Moreover, the algorithm must also consider the limited energy capacity of the ABSs. The results show that our proposed approach outperforms several heuristics and the classic Q-Learning approach. Furthermore, we provide a mixed integer linear programming solution to determine a lower bound on the performance and to clarify the gap between our risk-sensitive solution and the optimal solution. Our extensive simulation results demonstrate that our method is a promising approach for providing guaranteed task processing services for the IoT tasks in a smart farm, while increasing the hovering time of the ABSs in this farm.
The iterative linear quadratic regulator (iLQR) has gained wide popularity in addressing trajectory optimization problems with nonlinear system models. However, as a model-based shooting method, it relies heavily on an accurate system model to update the optimal control actions and the trajectory determined with forward integration, thus becoming vulnerable to inevitable model inaccuracies. Recently, learning-based methods for optimal control problems have made significant progress in addressing unknown system models, particularly when the system has complex interactions with the environment. Yet a deep neural network is normally required to fit a substantial amount of sampled data. In this work, we present Neural-iLQR, a learning-aided shooting method over the unconstrained control space, in which a neural network with a simple structure is used to represent the local system model. In this framework, the trajectory optimization task is achieved by iteratively and simultaneously refining the optimal policy and the neural network, without relying on prior knowledge of the system model. Through comprehensive evaluations on two illustrative control tasks, the proposed method is shown to outperform the conventional iLQR significantly in the presence of inaccuracies in system models.
Designing learning systems which are invariant to certain data transformations is critical in machine learning. Practitioners can typically enforce a desired invariance on the trained model through the choice of a network architecture, e.g. using convolutions for translations, or using data augmentation. Yet, enforcing true invariance in the network can be difficult, and data invariances are not always known a priori. State-of-the-art methods for learning data augmentation policies require held-out data and are based on bilevel optimization problems, which are complex to solve and often computationally demanding. In this work we investigate new ways of learning invariances only from the training data. Using learnable augmentation layers built directly into the network, we demonstrate that our method is very versatile. It can incorporate any type of differentiable augmentation and be applied to a broad class of learning problems beyond computer vision. We provide empirical evidence showing that our approach is easier and faster to train than modern automatic data augmentation techniques based on bilevel optimization, while achieving comparable results. Experiments show that while the invariances transferred to a model through automatic data augmentation are limited by the model expressivity, the invariance yielded by our approach is insensitive to it by design.
We formulate an ergodic theory for the (almost sure) limit $\mathcal{P}^\text{co}_{\tilde{\mathcal{E}}}$ of a sequence $(\mathcal{P}^\text{co}_{\mathcal{E}_n})$ of successive dynamic imprecise probability kinematics (DIPK, introduced in Caprio and Gong, 2021) updates of a set $\mathcal{P}^\text{co}_{\mathcal{E}_0}$ representing the initial beliefs of an agent. As a consequence, we formulate a strong law of large numbers.
Deep neural networks have been able to outperform humans in some cases, such as image recognition and image classification. However, with the emergence of various novel categories, the ability to continuously widen the learning capability of such networks from limited samples remains a challenge. Techniques such as meta-learning and few-shot learning have shown promising results, as they can learn or generalize to a novel category or task based on prior knowledge. In this paper, we perform a study of the existing few-shot meta-learning techniques in the computer vision domain based on their method and evaluation metrics. We provide a taxonomy for the techniques and categorize them as data-augmentation, embedding, optimization and semantics based learning for few-shot, one-shot and zero-shot settings. We then describe the seminal work done in each category and discuss their approach towards solving the predicament of learning from few samples. Lastly, we provide a comparison of these techniques on the commonly used benchmark datasets Omniglot and MiniImagenet, along with a discussion of future directions for improving the performance of these techniques towards the final goal of outperforming humans.
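As one concrete instance of the embedding category in the taxonomy above, the sketch below classifies a query by its nearest class prototype, in the style of prototypical networks. It is an illustration under assumptions rather than a method taken from the survey: the embeddings are assumed to be precomputed by some feature extractor, and the labels and vectors are made up.

```python
def prototypes(support):
    """Average the support embeddings of each class into one prototype.

    support: dict mapping a class label to a list of embedding vectors.
    """
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos


def classify(query, protos):
    """Return the label whose prototype is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(protos, key=lambda label: sq_dist(query, protos[label]))


# Hypothetical 2-way 2-shot episode with 2-D embeddings.
support = {
    "cat": [[1.0, 0.0], [0.8, 0.2]],
    "dog": [[0.0, 1.0], [0.2, 0.8]],
}
protos = prototypes(support)
print(classify([0.9, 0.1], protos), classify([0.1, 0.7], protos))
```

With one support example per class this reduces to one-shot nearest-neighbor classification in the embedding space.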