In this paper, we present a method for table tennis ball trajectory filtering and prediction. Our gray-box approach builds on a physical model. At the same time, we use data to learn parameters of the dynamics model, of an extended Kalman filter, and of a neural model that infers the ball's initial condition. We demonstrate superior prediction performance of our approach over two black-box approaches, which are not supplied with physical prior knowledge. We demonstrate that initializing the spin from parameters of the ball launcher using a neural network drastically improves long-time prediction performance over estimating the spin purely from measured ball positions. An accurate prediction of the ball trajectory is crucial for successful returns. We therefore evaluate the return performance with a pneumatic artificial muscular robot and achieve a return rate of 29/30 (97.7%).
The pedestrian trajectory prediction task is an essential component of intelligent systems. Its applications include but are not limited to autonomous driving, robot navigation, and anomaly detection of monitoring systems. Due to the diversity of motion behaviors and the complex social interactions among pedestrians, accurately forecasting their future trajectory is challenging. Existing approaches commonly adopt GANs or CVAEs to generate diverse trajectories. However, GAN-based methods do not directly model data in a latent space, which may make them fail to have full support over the underlying data distribution; CVAE-based methods optimize a lower bound on the log-likelihood of observations, which may cause the learned distribution to deviate from the underlying distribution. The above limitations make existing approaches often generate highly biased or inaccurate trajectories. In this paper, we propose a novel generative flow based framework with dual graphormer for pedestrian trajectory prediction (STGlow). Different from previous approaches, our method can more precisely model the underlying data distribution by optimizing the exact log-likelihood of motion behaviors. Besides, our method has clear physical meanings for simulating the evolution of human motion behaviors. The forward process of the flow gradually degrades complex motion behavior into simple behavior, while its reverse process represents the evolution of simple behavior into complex motion behavior. Further, we introduce a dual graphormer combining with the graph structure to more adequately model the temporal dependencies and the mutual spatial interactions. Experimental results on several benchmarks demonstrate that our method achieves much better performance compared to previous state-of-the-art approaches.
Deep learning (DL) models for spatio-temporal traffic flow forecasting employ convolutional or graph-convolutional filters along with recurrent neural networks to capture spatial and temporal dependencies in traffic data. These models, such as CNN-LSTM, utilize traffic flows from neighboring detector stations to predict flows at a specific location of interest. However, these models are limited in their ability to capture the broader dynamics of the traffic system, as they primarily learn features specific to the detector configuration and traffic characteristics at the target location. Hence, the transferability of these models to different locations becomes challenging, particularly when data is unavailable at the new location for model training. To address this limitation, we propose a traffic flow physics-based feature transformation for spatio-temporal DL models. This transformation incorporates Newell's uncongested and congested-state estimators of traffic flows at the target locations, enabling the models to learn broader dynamics of the system. Our methodology is empirically validated using traffic data from two different locations. The results demonstrate that the proposed feature transformation improves the models' performance in predicting traffic flows over different prediction horizons, as indicated by better goodness-of-fit statistics. An important advantage of our framework is its ability to be transferred to new locations where data is unavailable. This is achieved by appropriately accounting for spatial dependencies based on station distances and various traffic parameters. In contrast, regular DL models are not easily transferable as their inputs remain fixed. It should be noted that due to data limitations, we were unable to perform spatial sensitivity analysis, which calls for further research using simulated data.
Autonomous vehicles require accurate and reliable short-term trajectory predictions for safe and efficient driving. While most commercial automated vehicles currently use state machine-based algorithms for trajectory forecasting, recent efforts have focused on end-to-end data-driven systems. Often, the design of these models is limited by the availability of datasets, which are typically restricted to generic scenarios. To address this limitation, we have developed a synthetic dataset for short-term trajectory prediction tasks using the CARLA simulator. This dataset is extensive and incorporates what is considered complex scenarios - pedestrians crossing the road, vehicles overtaking - and comprises 6000 perspective view images with corresponding IMU and odometry information for each frame. Furthermore, an end-to-end short-term trajectory prediction model using convolutional neural networks (CNN) and long short-term memory (LSTM) networks has also been developed. This model can handle corner cases, such as slowing down near zebra crossings and stopping when pedestrians cross the road, without the need for explicit encoding of the surrounding environment. In an effort to accelerate this research and assist others, we are releasing our dataset and model to the research community. Our datasets are publicly available on //github.com/navigatinguncertainty.
We propose a simple and efficient approach to generate a prediction intervals (PI) for approximated and forecasted trends. Our method leverages a weighted asymmetric loss function to estimate the lower and upper bounds of the PI, with the weights determined by its coverage probability. We provide a concise mathematical proof of the method, show how it can be extended to derive PIs for parametrised functions and argue why the method works for predicting PIs of dependent variables. The presented tests of the method on a real-world forecasting task using a neural network-based model show that it can produce reliable PIs in complex machine learning scenarios.
Robotic manipulation of slender objects is challenging, especially when the induced deformations are large and nonlinear. Traditionally, learning-based control approaches, such as imitation learning, have been used to address deformable material manipulation. These approaches lack generality and often suffer critical failure from a simple switch of material, geometric, and/or environmental (e.g., friction) properties. This article tackles a fundamental but difficult deformable manipulation task: forming a predefined fold in paper with only a single manipulator. A data-driven framework combining physically-accurate simulation and machine learning is used to train a deep neural network capable of predicting the external forces induced on the manipulated paper given a grasp position. We frame the problem using scaling analysis, resulting in a control framework robust against material and geometric changes. Path planning is then carried out over the generated "neural force manifold" to produce robot manipulation trajectories optimized to prevent sliding, with offline trajectory generation finishing 15$\times$ faster than previous physics-based folding methods. The inference speed of the trained model enables the incorporation of real-time visual feedback to achieve closed-loop sensorimotor control. Real-world experiments demonstrate that our framework can greatly improve robotic manipulation performance compared to state-of-the-art folding strategies, even when manipulating paper objects of various materials and shapes.
The ICESat-2, launched in 2018, carries the ATLAS instrument, which is a photon-counting spaceborne lidar that provides strip samples over the terrain. While primarily designed for snow and ice monitoring, there has been a great interest in using ICESat-2 to predict forest above-ground biomass density (AGBD). As ICESat-2 is on a polar orbit, it provides good spatial coverage of boreal forests. The aim of this study is to evaluate the estimation of mean AGBD from ICESat-2 data using a hierarchical modeling approach combined with rigorous statistical inference. We propose a hierarchical hybrid inference approach for uncertainty quantification of the AGBD estimated from ICESat-2 lidar strips. Our approach models the errors coming from the multiple modeling steps, including the allometric models used for predicting tree-level AGB. For testing the procedure, we have data from two adjacent study sites, denoted Valtimo and Nurmes, of which Valtimo site is used for model training and Nurmes for validation. The ICESat-2 estimated mean AGBD in the Nurmes validation area was 63.2$\pm$1.9 Mg/ha (relative standard error of 2.9%). The local reference hierarchical model-based estimate obtained from wall-to-wall airborne lidar data was 63.9$\pm$0.6 Mg/ha (relative standard error of 1.0%). The reference estimate was within the 95% confidence interval of the ICESat-2 hierarchical hybrid estimate. The small standard errors indicate that the proposed method is useful for AGBD assessment. However, some sources of error were not accounted for in the study and thus the real uncertainties are probably slightly larger than those reported.
This paper addresses human-robot collaboration (HRC) challenges of integrating predictions of human activity to provide a proactive-n-reactive response capability for the robot. Prior works that consider current or predicted human poses as static obstacles are too nearsighted or too conservative in planning, potentially causing delayed robot paths. Alternatively, time-varying prediction of human poses would enable robot paths that avoid anticipated human poses, synchronized dynamically in time and space. Herein, a proactive path planning method, denoted STAP, is presented that uses spatiotemporal human occupancy maps to find robot trajectories that anticipate human movements, allowing robot passage without stopping. In addition, STAP anticipates delays from robot speed restrictions required by ISO/TS 15066 speed and separation monitoring (SSM). STAP also proposes a sampling-based planning algorithm based on RRT* to solve the spatio-temporal motion planning problem and find paths of minimum expected duration. Experimental results show STAP generates paths of shorter duration and greater average robot-human separation distance throughout tasks. Additionally, STAP more accurately estimates robot trajectory durations in HRC, which are useful in arriving at proactive-n-reactive robot sequencing.
Recent developments in deep learning have made remarkable progress in speeding up the prediction of quantum chemical (QC) properties by removing the need for expensive electronic structure calculations like density functional theory. However, previous methods learned from 1D SMILES sequences or 2D molecular graphs failed to achieve high accuracy as QC properties primarily depend on the 3D equilibrium conformations optimized by electronic structure methods, far different from the sequence-type and graph-type data. In this paper, we propose a novel approach called Uni-Mol+ to tackle this challenge. Uni-Mol+ first generates a raw 3D molecule conformation from inexpensive methods such as RDKit. Then, the raw conformation is iteratively updated to its target DFT equilibrium conformation using neural networks, and the learned conformation will be used to predict the QC properties. To effectively learn this update process towards the equilibrium conformation, we introduce a two-track Transformer model backbone and train it with the QC property prediction task. We also design a novel approach to guide the model's training process. Our extensive benchmarking results demonstrate that the proposed Uni-Mol+ significantly improves the accuracy of QC property prediction in various datasets. We have made the code and model publicly available at \url{//github.com/dptech-corp/Uni-Mol}.
Classic algorithms and machine learning systems like neural networks are both abundant in everyday life. While classic computer science algorithms are suitable for precise execution of exactly defined tasks such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts leading to more robust, better performing, more interpretable, more computationally efficient, and more data efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm. When integrating an algorithm into a neural architecture, it is important that the algorithm is differentiable such that the architecture can be trained end-to-end and gradients can be propagated back through the algorithm in a meaningful way. To make algorithms differentiable, this thesis proposes a general method for continuously relaxing algorithms by perturbing variables and approximating the expectation value in closed form, i.e., without sampling. In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable renderers, and differentiable logic gate networks. Finally, this thesis presents alternative training strategies for learning with algorithms.
Graph Convolutional Network (GCN) has been widely applied in transportation demand prediction due to its excellent ability to capture non-Euclidean spatial dependence among station-level or regional transportation demands. However, in most of the existing research, the graph convolution was implemented on a heuristically generated adjacency matrix, which could neither reflect the real spatial relationships of stations accurately, nor capture the multi-level spatial dependence of demands adaptively. To cope with the above problems, this paper provides a novel graph convolutional network for transportation demand prediction. Firstly, a novel graph convolution architecture is proposed, which has different adjacency matrices in different layers and all the adjacency matrices are self-learned during the training process. Secondly, a layer-wise coupling mechanism is provided, which associates the upper-level adjacency matrix with the lower-level one. It also reduces the scale of parameters in our model. Lastly, a unitary network is constructed to give the final prediction result by integrating the hidden spatial states with gated recurrent unit, which could capture the multi-level spatial dependence and temporal dynamics simultaneously. Experiments have been conducted on two real-world datasets, NYC Citi Bike and NYC Taxi, and the results demonstrate the superiority of our model over the state-of-the-art ones.