Perceiving and interacting with 3D articulated objects, such as cabinets, doors, and faucets, pose particular challenges for future home-assistant robots performing daily tasks in human environments. Besides parsing the articulated parts and joint parameters, researchers recently advocate learning manipulation affordance over the input shape geometry which is more task-aware and geometrically fine-grained. However, taking only passive observations as inputs, these methods ignore many hidden but important kinematic constraints (e.g., joint location and limits) and dynamic factors (e.g., joint friction and restitution), therefore losing significant accuracy for test cases with such uncertainties. In this paper, we propose a novel framework, named AdaAfford, that learns to perform very few test-time interactions for quickly adapting the affordance priors to more accurate instance-specific posteriors. We conduct large-scale experiments using the PartNet-Mobility dataset and prove that our system performs better than baselines.
This paper investigates the problem of regret minimization in linear time-varying (LTV) dynamical systems. Due to the simultaneous presence of uncertainty and non-stationarity, designing online control algorithms for unknown LTV systems remains a challenging task. At a cost of NP-hard offline planning, prior works have introduced online convex optimization algorithms, although they suffer from nonparametric rate of regret. In this paper, we propose the first computationally tractable online algorithm with regret guarantees that avoids offline planning over the state linear feedback policies. Our algorithm is based on the optimism in the face of uncertainty (OFU) principle in which we optimistically select the best model in a high confidence region. Our algorithm is then more explorative when compared to previous approaches. To overcome non-stationarity, we propose either a restarting strategy (R-OFU) or a sliding window (SW-OFU) strategy. With proper configuration, our algorithm is attains sublinear regret $O(T^{2/3})$. These algorithms utilize data from the current phase for tracking variations on the system dynamics. We corroborate our theoretical findings with numerical experiments, which highlight the effectiveness of our methods. To the best of our knowledge, our study establishes the first model-based online algorithm with regret guarantees under LTV dynamical systems.
Real-time control for robotics is a popular research area in the reinforcement learning (RL) community. Through the use of techniques such as reward shaping, researchers have managed to train online agents across a multitude of domains. Despite these advances, solving goal-oriented tasks still require complex architectural changes or heavy constraints to be placed on the problem. To address this issue, recent works have explored how curriculum learning can be used to separate a complex task into sequential sub-goals, hence enabling the learning of a problem that may otherwise be too difficult to learn from scratch. In this article, we present how curriculum learning, reward shaping, and a high number of efficiently parallelized environments can be coupled together to solve the problem of multiple cube stacking. Finally, we extend the best configuration identified on a higher complexity environment with differently shaped objects.
Social robots are expected to be a human labor support technology, and one application of them is an advertising medium in public spaces. When social robots provide information, such as recommended shops, adaptive communication according to the user's state is desired. User engagement, which is also defined as the level of interest in the robot, is likely to play an important role in adaptive communication. Therefore, in this paper, we propose a new framework to estimate user engagement. The proposed method focuses on four unsolved open problems: multi-party interactions, process of state change in engagement, difficulty in annotating engagement, and interaction dataset in the real world. The accuracy of the proposed method for estimating engagement was evaluated using interaction duration. The results show that the interaction duration can be accurately estimated by considering the influence of the behaviors of other people; this also implies that the proposed model accurately estimates the level of engagement during interaction with the robot.
We study the problem of online learning in competitive settings in the context of two-sided matching markets. In particular, one side of the market, the agents, must learn about their preferences over the other side, the firms, through repeated interaction while competing with other agents for successful matches. We propose a class of decentralized, communication- and coordination-free algorithms that agents can use to reach to their stable match in structured matching markets. In contrast to prior works, the proposed algorithms make decisions based solely on an agent's own history of play and requires no foreknowledge of the firms' preferences. Our algorithms are constructed by splitting up the statistical problem of learning one's preferences, from noisy observations, from the problem of competing for firms. We show that under realistic structural assumptions on the underlying preferences of the agents and firms, the proposed algorithms incur a regret which grows at most logarithmically in the time horizon. Our results show that, in the case of matching markets, competition need not drastically affect the performance of decentralized, communication and coordination free online learning algorithms.
Mesh degeneration is a bottleneck for fluid-structure interaction (FSI) simulations and for shape optimization via the method of mappings. In both cases, an appropriate mesh motion technique is required. The choice is typically based on heuristics, e.g., the solution operators of partial differential equations (PDE), such as the Laplace or biharmonic equation. Especially the latter, which shows good numerical performance for large displacements, is expensive. Moreover, from a continuous perspective, choosing the mesh motion technique is to a certain extent arbitrary and has no influence on the physically relevant quantities. Therefore, we consider approaches inspired by machine learning. We present a hybrid PDE-NN approach, where the neural network (NN) serves as parameterization of a coefficient in a second order nonlinear PDE. We ensure existence of solutions for the nonlinear PDE by the choice of the neural network architecture. Moreover, we propose a splitting of the monolithic FSI system into three smaller subsystems, in order to segregate the mesh motion. We assess the quality of the learned mesh motion technique by applying it to a FSI benchmark problem.
Learning causal structures from observation and experimentation is a central task in many domains. For example, in biology, recent advances allow us to obtain single-cell expression data under multiple interventions such as drugs or gene knockouts. However, a key challenge is that often the targets of the interventions are uncertain or unknown. Thus, standard causal discovery methods can no longer be used. To fill this gap, we propose a Bayesian framework (BaCaDI) for discovering the causal structure that underlies data generated under various unknown experimental/interventional conditions. BaCaDI is fully differentiable and operates in the continuous space of latent probabilistic representations of both causal structures and interventions. This enables us to approximate complex posteriors via gradient-based variational inference and to reason about the epistemic uncertainty in the predicted structure. In experiments on synthetic causal discovery tasks and simulated gene-expression data, BaCaDI outperforms related methods in identifying causal structures and intervention targets. Finally, we demonstrate that, thanks to its rigorous Bayesian approach, our method provides well-calibrated uncertainty estimates.
There are many challenges in applying deep reinforcement learning (DRL) to autonomous driving in a structured environment such as an urban area. This is because the massive traffic flows moving along the road network change dynamically. It is a key factor to detect changes in the intentions of surrounding vehicles and quickly find a response strategy. In this paper, we suggest a new framework that effectively combines graph-based intention representation learning and reinforcement learning for kinodynamic planning. Specifically, the movement of dynamic agents is expressed as a graph. The spatio-temporal locality of node features is conserved and the features are aggregated by considering the interaction between adjacent nodes. We simultaneously learn motion planner and controller that share the aggregated information via a safe RL framework. We intuitively interpret a given situation with predicted trajectories to generate additional cost signals. The dense cost signals encourage the policy to be safe for dynamic risk. Moreover, by utilizing the data obtained through the direct rollout of learned policy, robust intention inference is achieved for various situations encountered in training. We set up a navigation scenario in which various situations exist by using CARLA, an urban driving simulator. The experiments show the state-of-the-art performance of our approach compared to the existing baselines.
We propose and demonstrate a new approach for fast and accurate surrogate modelling of urban drainage system hydraulics based on physics-guided machine learning. The surrogates are trained against a limited set of simulation results from a hydrodynamic (HiFi) model. Our approach reduces simulation times by one to two orders of magnitude compared to a HiFi model. It is thus slower than e.g. conceptual hydrological models, but it enables simulations of water levels, flows and surcharges in all nodes and links of a drainage network and thus largely preserves the level of detail provided by HiFi models. Comparing time series simulated by the surrogate and the HiFi model, R2 values in the order of 0.9 are achieved. Surrogate training times are currently in the order of one hour. However, they can likely be reduced through the application of transfer learning and graph neural networks. Our surrogate approach will be useful for interactive workshops in initial design phases of urban drainage systems, as well as for real time applications. In addition, our model formulation is generic and future research should investigate its application for simulating other water systems.
Seamlessly interacting with humans or robots is hard because these agents are non-stationary. They update their policy in response to the ego agent's behavior, and the ego agent must anticipate these changes to co-adapt. Inspired by humans, we recognize that robots do not need to explicitly model every low-level action another agent will make; instead, we can capture the latent strategy of other agents through high-level representations. We propose a reinforcement learning-based framework for learning latent representations of an agent's policy, where the ego agent identifies the relationship between its behavior and the other agent's future strategy. The ego agent then leverages these latent dynamics to influence the other agent, purposely guiding them towards policies suitable for co-adaptation. Across several simulated domains and a real-world air hockey game, our approach outperforms the alternatives and learns to influence the other agent.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc, and 2) the instance-level shift, such as object appearance, size, etc. We build our approach based on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.