Predicting the future motion of dynamic agents is of paramount importance to ensure safety or assess risks in motion planning for autonomous robots. In this paper, we propose a two-stage motion prediction method, referred to as R-Pred, that effectively utilizes both the scene and interaction context using a cascade of the initial trajectory proposal network and the trajectory refinement network. The initial trajectory proposal network produces M trajectory proposals corresponding to M modes of a future trajectory distribution. The trajectory refinement network enhances each of M proposals using 1) the tube-query scene attention (TQSA) and 2) the proposal-level interaction attention (PIA). TQSA uses tube-queries to aggregate the local scene context features pooled from proximity around the trajectory proposals of interest. PIA further enhances the trajectory proposals by modeling inter-agent interactions using a group of trajectory proposals selected based on their distances from neighboring agents. Our experiments conducted on the Argoverse and nuScenes datasets demonstrate that the proposed refinement network provides significant performance improvements compared to the single-stage baseline and that R-Pred achieves state-of-the-art performance in some categories of the benchmark.
The importance of unspanned macroeconomic variables for Dynamic Term Structure Models has been intensively discussed in the literature. To our best knowledge the earlier studies considered only linear interactions between the economy and the real-world dynamics of interest rates in DTSMs. We propose a generalized modelling setup for Gaussian DTSMs which allows for unspanned nonlinear associations between the two and we exploit it in forecasting. Specifically, we construct a custom sequential Monte Carlo estimation and forecasting scheme where we introduce Gaussian Process priors to model nonlinearities. Sequential scheme we propose can also be used with dynamic portfolio optimization to assess the potential of generated economic value to investors. The methodology is presented using US Treasury data and selected macroeconomic indices. Namely, we look at core inflation and real economic activity. We contrast the results obtained from the nonlinear model with those stemming from an application of a linear model. Unlike for real economic activity, in case of core inflation we find that, compared to linear models, application of nonlinear models leads to statistically significant gains in economic value across considered maturities.
Dynamic Movement Primitives (DMP) have found remarkable applicability and success in various robotic tasks, which can be mainly attributed to their generalization, modulation and robustness properties. Nevertheless, the spatial generalization of DMP can be problematic in some cases, leading to excessive or unnatural spatial scaling. Moreover, incorporating intermediate points (via-points) to adjust the DMP trajectory, is not adequately addressed. In this work we propose an improved online spatial generalization, that remedies the shortcomings of the classical DMP generalization, and moreover allows the incorporation of dynamic via-points. This is achieved by designing an online adaptation scheme for the DMP weights which is proved to minimize the distance from the demonstrated acceleration profile in order to retain the shape of the demonstration, subject to dynamic via-point and initial/final state constraints. Extensive comparative simulations with the classical and other DMP variants are conducted, while experimental results validate the applicability and efficacy of the proposed method.
Objects rarely sit in isolation in everyday human environments. If we want robots to operate and perform tasks in our human environments, they must understand how the objects they manipulate will interact with structural elements of the environment for all but the simplest of tasks. As such, we'd like our robots to reason about how multiple objects and environmental elements relate to one another and how those relations may change as the robot interacts with the world. We examine the problem of predicting inter-object and object-environment relations between previously unseen objects and novel environments purely from partial-view point clouds. Our approach enables robots to plan and execute sequences to complete multi-object manipulation tasks defined from logical relations. This removes the burden of providing explicit, continuous object states as goals to the robot. We explore several different neural network architectures for this task. We find the best performing model to be a novel transformer-based neural network that both predicts object-environment relations and learns a latent-space dynamics function. We achieve reliable sim-to-real transfer without any fine-tuning. Our experiments show that our model understands how changes in observed environmental geometry relate to semantic relations between objects. We show more videos on our website: //sites.google.com/view/erelationaldynamics.
Recently, speech separation (SS) task has achieved remarkable progress driven by deep learning technique. However, it is still challenging to separate target signals from noisy mixture, as neural model is vulnerable to assign background noise to each speaker. In this paper, we propose a noise-aware SS method called NASS, which aims to improve the speech quality of separated signals in noisy conditions. Specifically, NASS views background noise as an independent speaker and predicts it with other speakers in a mask-based manner. Then we conduct patch-wise contrastive learning on feature level to minimize the mutual information between the predicted noise-speaker and other speakers, which suppresses the noise information in separated signals. The experimental results show that NASS effectively improves the noise-robustness for different mask-based separation backbones with less than 0.1M parameter increase. Furthermore, SI-SNRi results demonstrate that NASS achieves state-of-the-art performance on WHAM! dataset.
We introduce Reactive Action and Motion Planner (RAMP), which combines the strengths of search-based and reactive approaches for motion planning. In essence, RAMP is a hierarchical approach where a novel variant of a Model Predictive Path Integral (MPPI) controller is used to generate trajectories which are then followed asynchronously by a local vector field controller. We demonstrate, in the context of a table clearing application, that RAMP can rapidly find paths in the robot's configuration space, satisfy task and robot-specific constraints, and provide safety by reacting to static or dynamically moving obstacles. RAMP achieves superior performance through a number of key innovations: we use Signed Distance Function (SDF) representations directly from the robot configuration space, both for collision checking and reactive control. The use of SDFs allows for a smoother definition of collision cost when planning for a trajectory, and is critical in ensuring safety while following trajectories. In addition, we introduce a novel variant of MPPI which, combined with the safety guarantees of the vector field trajectory follower, performs incremental real-time global trajectory planning. Simulation results establish that our method can generate paths that are comparable to traditional and state-of-the-art approaches in terms of total trajectory length while being up to 30 times faster. Real-world experiments demonstrate the safety and effectiveness of our approach in challenging table clearing scenarios.
This research focuses on the discovery and localization of hidden objects in the wild and serves unmanned systems. Through empirical analysis, infrared and visible image fusion (IVIF) enables hard-to-find objects apparent, whereas multimodal salient object detection (SOD) accurately delineates the precise spatial location of objects within the picture. Their common characteristic of seeking complementary cues from different source images motivates us to explore the collaborative relationship between Fusion and Salient object detection tasks on infrared and visible images via an Interactively Reinforced multi-task paradigm for the first time, termed IRFS. To the seamless bridge of multimodal image fusion and SOD tasks, we specifically develop a Feature Screening-based Fusion subnetwork (FSFNet) to screen out interfering features from source images, thereby preserving saliency-related features. After generating the fused image through FSFNet, it is then fed into the subsequent Fusion-Guided Cross-Complementary SOD subnetwork (FC$^2$Net) as the third modality to drive the precise prediction of the saliency map by leveraging the complementary information derived from the fused image. In addition, we develop an interactive loop learning strategy to achieve the mutual reinforcement of IVIF and SOD tasks with a shorter training period and fewer network parameters. Comprehensive experiment results demonstrate that the seamless bridge of IVIF and SOD mutually enhances their performance, and highlights their superiority.
This paper proposes a method for learning continuous control policies for active landmark localization and exploration using an information-theoretic cost. We consider a mobile robot detecting landmarks within a limited sensing range, and tackle the problem of learning a control policy that maximizes the mutual information between the landmark states and the sensor observations. We employ a Kalman filter to convert the partially observable problem in the landmark state to Markov decision process (MDP), a differentiable field of view to shape the reward, and an attention-based neural network to represent the control policy. The approach is further unified with active volumetric mapping to promote exploration in addition to landmark localization. The performance is demonstrated in several simulated landmark localization tasks in comparison with benchmark methods.
Recommender systems play a fundamental role in web applications in filtering massive information and matching user interests. While many efforts have been devoted to developing more effective models in various scenarios, the exploration on the explainability of recommender systems is running behind. Explanations could help improve user experience and discover system defects. In this paper, after formally introducing the elements that are related to model explainability, we propose a novel explainable recommendation model through improving the transparency of the representation learning process. Specifically, to overcome the representation entangling problem in traditional models, we revise traditional graph convolution to discriminate information from different layers. Also, each representation vector is factorized into several segments, where each segment relates to one semantic aspect in data. Different from previous work, in our model, factor discovery and representation learning are simultaneously conducted, and we are able to handle extra attribute information and knowledge. In this way, the proposed model can learn interpretable and meaningful representations for users and items. Unlike traditional methods that need to make a trade-off between explainability and effectiveness, the performance of our proposed explainable model is not negatively affected after considering explainability. Finally, comprehensive experiments are conducted to validate the performance of our model as well as explanation faithfulness.
In this paper, we propose a novel multi-task learning architecture, which incorporates recent advances in attention mechanisms. Our approach, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with task-specific soft-attention modules, which are trainable in an end-to-end manner. These attention modules allow for learning of task-specific features from the global pool, whilst simultaneously allowing for features to be shared across different tasks. The architecture can be built upon any feed-forward neural network, is simple to implement, and is parameter efficient. Experiments on the CityScapes dataset show that our method outperforms several baselines in both single-task and multi-task learning, and is also more robust to the various weighting schemes in the multi-task loss function. We further explore the effectiveness of our method through experiments over a range of task complexities, and show how our method scales well with task complexity compared to baselines.
Recommender systems play a crucial role in mitigating the problem of information overload by suggesting users' personalized items or services. The vast majority of traditional recommender systems consider the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during the interactions with users. We model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies via recommending trial-and-error items and receiving reinforcements of these items from users' feedbacks. In particular, we introduce an online user-agent interacting environment simulator, which can pre-train and evaluate model parameters offline before applying the model online. Moreover, we validate the importance of list-wise recommendations during the interactions between users and agent, and develop a novel approach to incorporate them into the proposed framework LIRD for list-wide recommendations. The experimental results based on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.