We propose a novel hybrid cable-based robot with a manipulator and camera for high-accuracy, medium-throughput plant monitoring in a vertical hydroponic farm and, as an example application, demonstrate non-destructive plant mass estimation. Plant monitoring with high temporal and spatial resolution is important to both farmers and researchers for detecting anomalies and developing predictive models of plant growth. The availability of high-quality, off-the-shelf structure-from-motion (SfM) and photogrammetry packages has enabled a vibrant community of roboticists to apply computer vision to non-destructive plant monitoring. While existing approaches tend to focus on either high throughput (e.g., satellite, unmanned aerial vehicle (UAV), vehicle-mounted, or conveyor-belt imagery) or high accuracy and robustness to occlusions (e.g., turntable scanner or robot arm), we propose a middle ground that achieves high accuracy with medium throughput in a highly automated robot. Our design pairs the workspace scalability of a cable-driven parallel robot (CDPR) with the dexterity of a 4 degree-of-freedom (DoF) robot arm to autonomously image many plants from a variety of viewpoints. We describe our robot design and demonstrate it experimentally by collecting daily photographs of 54 plants from 64 viewpoints each. We show that our approach produces scientifically useful measurements, operates fully autonomously after initial calibration, and yields better reconstructions and plant property estimates than over-canopy methods (e.g., UAV). As example applications, we show that our system can estimate plant mass with a Mean Absolute Error (MAE) of 0.586g and, when used for hypothesis testing on the relationship between mass and age, produces p-values comparable to those from ground-truth data (p=0.0020 and p=0.0016, respectively).
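The abstract does not spell out the mass-estimation pipeline, so the following is a minimal sketch of one plausible approach: regress mass on a geometric proxy (the convex-hull volume of the reconstructed point cloud) and score with the MAE metric reported above. The names `hull_volume` and `fit_and_score` and the linear model are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch: regress plant mass from a geometric proxy (convex-hull
# volume of the SfM point cloud) and score with MAE. The volume feature and
# linear model are assumptions for illustration only.
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

def hull_volume(points: np.ndarray) -> float:
    """Volume of the convex hull of an (N, 3) plant point cloud."""
    return ConvexHull(points).volume

def fit_and_score(clouds, masses_g):
    """clouds: list of (N_i, 3) arrays; masses_g: ground-truth masses in grams."""
    X = np.array([[hull_volume(c)] for c in clouds])
    y = np.asarray(masses_g)
    model = LinearRegression().fit(X, y)
    return mean_absolute_error(y, model.predict(X))
```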
Human body orientation estimation (HBOE) is widely used in applications such as robotics, surveillance, pedestrian analysis, and autonomous driving. Although many approaches have addressed the HBOE problem, from controlled settings to challenging in-the-wild environments, they assume that human instances have already been detected and take a well-cropped sub-image as input. This setting is less efficient and error-prone in real applications such as crowded scenes. In this paper, we propose a single-stage, end-to-end trainable framework for tackling the multi-person HBOE problem. By integrating the prediction of bounding boxes and direction angles into one embedding, our method can jointly estimate the location and orientation of all bodies in an image directly. Our key idea is to integrate the HBOE task into the multi-scale anchor-channel predictions of persons, so that both tasks benefit from shared intermediate features. As a result, our approach naturally handles difficult instances involving low resolution and occlusion, as in object detection. We validate the efficiency and effectiveness of our method with extensive experiments on the recently released MEBOW benchmark. In addition, we annotate the ambiguous instances ignored by the MEBOW dataset with weak body-orientation labels, preserving its integrity and consistency in support of multi-person studies. Our work is available at \url{//github.com/hnuzhy/JointBDOE}.
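To make the "one embedding" idea concrete, here is a minimal PyTorch sketch of a detection head that emits box offsets, objectness, and a body-orientation angle per anchor from the same feature map. The layer sizes and the (sin, cos) angle encoding are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch of a joint box + orientation head: one conv predicts, per anchor,
# 4 box offsets + 1 objectness + 2 orientation terms (sin, cos), so detection
# and HBOE share the same intermediate features. Dimensions are assumptions.
import torch
import torch.nn as nn

class JointBoxOrientationHead(nn.Module):
    def __init__(self, in_channels: int, num_anchors: int):
        super().__init__()
        self.num_anchors = num_anchors
        self.pred = nn.Conv2d(in_channels, num_anchors * 7, kernel_size=1)

    def forward(self, feat):                                  # feat: (B, C, H, W)
        B, _, H, W = feat.shape
        out = self.pred(feat).view(B, self.num_anchors, 7, H, W)
        box = out[:, :, :4]                                   # box offsets
        obj = out[:, :, 4].sigmoid()                          # objectness
        angle = torch.atan2(out[:, :, 5], out[:, :, 6])       # decode orientation
        return box, obj, angle
```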
The miniaturization of inertial measurement units (IMUs) facilitates their widespread use in a growing number of application domains. Orientation estimation is a prerequisite for most further data processing steps in inertial motion tracking, such as position/velocity estimation, joint angle estimation, and 3D visualization, and errors in the estimated orientations severely affect all subsequent processing steps. Recent systematic comparisons of existing algorithms show that out-of-the-box accuracy is often low and that application-specific tuning is required to obtain high accuracy. In the present work, we propose and extensively evaluate a quaternion-based orientation estimation algorithm built on a novel approach: filtering the acceleration measurements in an almost-inertial frame. The algorithm includes extensions for gyroscope bias estimation and magnetic disturbance rejection, as well as a variant for offline data processing. In contrast to all existing work, we evaluate against a large collection of publicly available datasets and eight literature methods. The proposed method consistently outperforms all literature methods and achieves an average RMSE of 2.9{\deg}, while the errors obtained with literature methods range from 5.3{\deg} to 16.7{\deg}. Since the evaluation was performed with one single fixed parametrization across a very diverse dataset collection, we conclude that the proposed method provides unprecedented out-of-the-box performance for a broad range of motions, sensor hardware, and environmental conditions. This gain in orientation estimation accuracy is expected to advance the field of IMU-based motion analysis and provide performance benefits in numerous applications. The provided open-source implementation makes the proposed method easy to employ.
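For readers new to the problem, the sketch below shows a generic accelerometer-aided gyroscope integration step, i.e., the baseline structure that most such filters share. It is not the paper's algorithm (which filters the accelerations in an almost-inertial frame), and the gain `k` is an illustrative assumption.

```python
# Generic complementary-filter step for IMU orientation (NOT the paper's
# method): integrate the gyroscope, then nudge the estimate so the predicted
# gravity direction matches the accelerometer. Quaternions are [w, x, y, z].
import numpy as np

def quat_mult(p, q):
    pw, px, py, pz = p; qw, qx, qy, qz = q
    return np.array([pw*qw - px*qx - py*qy - pz*qz,
                     pw*qx + px*qw + py*qz - pz*qy,
                     pw*qy - px*qz + py*qw + pz*qx,
                     pw*qz + px*qy - py*qx + pz*qw])

def quat_to_mat(q):
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def step(q, gyro, acc, dt, k=0.01):          # k: correction gain (assumption)
    # strapdown gyroscope integration: q_dot = 0.5 * q (x) (0, omega)
    q = q + 0.5 * dt * quat_mult(q, np.concatenate([[0.0], gyro]))
    q /= np.linalg.norm(q)
    # accelerometer tilt correction: align predicted gravity with measurement
    a = acc / np.linalg.norm(acc)
    g_pred = quat_to_mat(q).T @ np.array([0.0, 0.0, 1.0])  # gravity, sensor frame
    corr = k * np.cross(g_pred, a)           # small body-frame correction
    q = q + 0.5 * quat_mult(q, np.concatenate([[0.0], corr]))
    return q / np.linalg.norm(q)
```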
Since batch algorithms cope poorly with model mismatch and disturbances, this contribution proposes an adaptive scheme based on a continuous Lyapunov function for online robot dynamic identification. The paper derives stable update rules for neural networks, inspired by the model reference adaptive control paradigm. The network structure consists of three parallel neural networks, each estimating one of the robot's dynamic terms. A Lyapunov candidate is selected to construct an energy surface for a convex optimization framework, and the learning rules are derived directly from the Lyapunov function so that its time derivative is negative. Finally, experimental results on a 3-DOF Phantom Omni haptic device demonstrate the efficiency of the proposed method.
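As a concrete instance of a Lyapunov-derived learning rule, the MRAC-style sketch below updates a single weight matrix so that the standard candidate $V=\tfrac{1}{2}e^\top e+\tfrac{1}{2}\operatorname{tr}(\tilde W^\top \Gamma^{-1}\tilde W)$ is non-increasing. The single-layer structure and the gain `Gamma` are assumptions for illustration, not the paper's three-network design.

```python
# Illustrative Lyapunov-derived adaptation law (MRAC-style): with estimation
# error e and regressor phi, W_dot = -Gamma @ outer(phi, e) renders the time
# derivative of the quadratic Lyapunov candidate negative semidefinite.
import numpy as np

def adapt_step(W, phi, e, Gamma, dt):
    """One Euler step of the adaptation law W_dot = -Gamma @ outer(phi, e)."""
    return W - dt * Gamma @ np.outer(phi, e)

# Example: W maps a feature vector phi (n,) to an estimated torque (m,)
n, m = 5, 3
W = np.zeros((n, m))
Gamma = 0.1 * np.eye(n)          # adaptation gain (tuning assumption)
phi = np.random.randn(n)         # network features / regressor
e = np.random.randn(m)           # estimation or tracking error
W = adapt_step(W, phi, e, Gamma, dt=1e-3)
```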
Continuum robots have the potential to enable new applications in medicine, inspection, and countless other areas due to their unique shape, compliance, and size. Excellent progress has been made in the mechanical design and dynamic modelling of continuum robots, to the point that there are some canonical designs, although new concepts continue to be explored. In this paper, we turn to the problem of state estimation for continuum robots that can be modelled with the common Cosserat rod model. Sensing for continuum robots might comprise external camera observations, embedded tracking coils, or strain gauges. We repurpose a Gaussian process (GP) regression approach to state estimation, initially developed for continuous-time trajectory estimation in $SE(3)$. In our case, the continuous variable is not time but arclength, and we show how to estimate the continuous shape (and strain) of the robot (along with associated uncertainties) given discrete, noisy measurements of both pose and strain along the length. We demonstrate our approach quantitatively through simulations as well as through experiments. Our evaluations show that accurate and continuous estimates of a continuum robot's shape can be achieved, resulting in average end-effector errors between the estimated and ground-truth shape as low as 3.5mm and 0.016$^\circ$ in simulation, and 3.3mm and 0.035$^\circ$ for unloaded configurations and 6.2mm and 0.041$^\circ$ for loaded ones in experiments, when using discrete pose measurements.
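The core change relative to trajectory estimation is swapping time for arclength. Below is a minimal numpy sketch of GP regression over the rod coordinate $s$, assuming a squared-exponential kernel and a single scalar shape coordinate (the paper estimates full pose and strain in $SE(3)$); kernel and noise values are illustrative, not the paper's tuned parameters.

```python
# GP regression over arclength s (not time): regress one shape coordinate from
# noisy, discrete measurements along the rod, with pointwise uncertainty.
import numpy as np

def rbf(s1, s2, length=0.1, var=1.0):
    d = s1[:, None] - s2[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

s_meas = np.linspace(0.0, 1.0, 8)                   # measurement arclengths [m]
y_meas = np.sin(2 * np.pi * s_meas) + 0.01 * np.random.randn(8)  # e.g. x(s)
s_query = np.linspace(0.0, 1.0, 200)                # continuous query points

K = rbf(s_meas, s_meas) + 1e-4 * np.eye(8)          # measurement noise on diag
Ks = rbf(s_query, s_meas)
mean = Ks @ np.linalg.solve(K, y_meas)              # continuous shape estimate
cov = rbf(s_query, s_query) - Ks @ np.linalg.solve(K, Ks.T)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))     # pointwise uncertainty
```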
Estimating human pose is an important yet challenging task in multimedia applications. Existing pose estimation libraries focus on reproducing standard pose estimation algorithms. When it comes to customising these algorithms for real-world applications, none of the existing libraries offers both the flexibility to develop custom pose estimation algorithms and the high performance needed to execute them on commodity devices. In this paper, we introduce Hyperpose, a novel flexible and high-performance pose estimation library. Hyperpose provides expressive Python APIs that enable developers to easily customise pose estimation algorithms for their applications. It further provides a model inference engine highly optimised for real-time pose estimation. This engine can dynamically dispatch carefully designed pose estimation tasks to CPUs and GPUs, thus automatically achieving high utilisation of hardware resources irrespective of deployment environments. Extensive evaluation results show that Hyperpose can achieve up to 3.1x to 7.3x higher pose estimation throughput compared to state-of-the-art pose estimation libraries without compromising estimation accuracy. As of 2021, Hyperpose had received over 1,000 stars on GitHub and attracted users from both industry and academia.
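As a purely hypothetical illustration (not Hyperpose's actual API), the sketch below shows the kind of Python-level configurability the abstract describes: users compose the pipeline declaratively, while an optimised engine handles inference and device dispatch.

```python
# Hypothetical shape of a flexible pose-estimation API; all names here are
# invented for illustration and do NOT correspond to Hyperpose's real API.
from dataclasses import dataclass

@dataclass
class PoseConfig:
    backbone: str = "resnet50"
    head: str = "openpose_paf"
    input_size: tuple = (368, 368)
    device: str = "cuda"            # the engine dispatches work to CPU/GPU

class PoseEstimator:
    def __init__(self, config: PoseConfig):
        self.config = config        # a real engine would build an optimised graph here

    def infer(self, image):
        # placeholder: a real engine would run the optimised pipeline here
        return []                   # list of per-person keypoint sets
```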
Liquid water, besides being fundamental for life on Earth, has long fascinated scientists due to its several anomalies. Different hypotheses have been put forward to explain these peculiarities. The most widely accepted one posits the presence, in the supercooled region, of two phases at different densities: the low-density liquid phase and the high-density liquid phase. In our previous work [Faccio et al., J. Mol. Liq. 355 (2022): 118922], we showed that it is possible to identify these two forms in water networks through a computational approach based on molecular dynamics simulation and on the calculation of the total communicability of the associated graph, in which the nodes correspond to water molecules and the edges represent the connections (interactions) between molecules. In this paper, we present a more in-depth investigation of the application of graph-theory-based approaches to the analysis of the structure of water networks. In particular, we investigate different connectivity and centrality measures, and we report on the use of a variety of global metrics aimed at giving a topological and geometrical characterization of liquid water.
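Total communicability, the measure used in the cited work, has a compact definition: for an adjacency matrix $A$, the total communicability of node $i$ is $\sum_j [e^A]_{ij}$. A minimal sketch, assuming the water network graph (nodes = molecules, edges = interactions) has already been built from the MD trajectory:

```python
# Total communicability: the i-th row sum of the matrix exponential of the
# adjacency matrix. The random graph stands in for a real water network.
import networkx as nx
import numpy as np
from scipy.linalg import expm

def total_communicability(G: nx.Graph) -> dict:
    A = nx.to_numpy_array(G)
    tc = expm(A).sum(axis=1)                  # TC(i) = sum_j [e^A]_{ij}
    return dict(zip(G.nodes(), tc))

G = nx.erdos_renyi_graph(20, 0.2, seed=1)     # stand-in for a water network
print(sorted(total_communicability(G).items(), key=lambda kv: -kv[1])[:3])
```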
Deep robot vision models are widely used for recognizing objects from camera images, but they perform poorly when detecting objects at untrained positions. Although this problem can be alleviated by training with large datasets, the cost of dataset collection cannot be ignored. Existing visual attention models tackle the problem by employing a data-efficient structure that learns to extract task-relevant image areas. However, since these models cannot modify their attention targets after training, they are difficult to apply to dynamically changing tasks. This paper proposes a novel Key-Query-Value visual attention model that can switch attention targets by externally modifying the Query representations, i.e., top-down attention. The proposed model is evaluated in a simulator and in a real-world environment. In the simulator experiments, it outperforms existing end-to-end robot vision models in both performance and data efficiency; in the real-world robot experiments, it shows high precision as well as scalability and extensibility.
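A minimal PyTorch sketch of the Key-Query-Value idea: keys and values are computed from image features, while the Query is supplied externally from a task representation, so swapping the query vector retargets attention after training. Dimensions and names are illustrative assumptions.

```python
# Top-down KQV attention: image features provide keys/values; the query comes
# from an external task embedding, so attention targets can be changed at
# inference time without retraining.
import torch
import torch.nn as nn

class TopDownAttention(nn.Module):
    def __init__(self, feat_dim=256, query_dim=64, d=64):
        super().__init__()
        self.to_q = nn.Linear(query_dim, d)
        self.to_k = nn.Linear(feat_dim, d)
        self.to_v = nn.Linear(feat_dim, d)

    def forward(self, feats, query):     # feats: (B, N, feat_dim), query: (B, query_dim)
        q = self.to_q(query).unsqueeze(1)                      # (B, 1, d)
        k, v = self.to_k(feats), self.to_v(feats)              # (B, N, d)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        return (attn @ v).squeeze(1), attn.squeeze(1)          # feature, attention map

att = TopDownAttention()
out, amap = att(torch.randn(2, 196, 256), torch.randn(2, 64))  # 14x14 feature grid
```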
We propose coordinating guiding vector fields to achieve two tasks simultaneously with a team of robots: first, the guidance and navigation of multiple robots to possibly different paths or surfaces, typically embedded in 2D or 3D; second, their motion coordination while tracking the prescribed paths or surfaces. The motion coordination is defined by desired parametric displacements between robots on the path or surface. Such a desired displacement is achieved by controlling the virtual coordinates, which correspond to the path or surface parameters, between the guiding vector fields. Rigorous mathematical guarantees, underpinned by dynamical systems theory and Lyapunov theory, are provided for the effective distributed motion coordination and navigation of robots on paths or surfaces from all initial positions. As an example of a practical robotic application, we derive a control algorithm from the proposed coordinating guiding vector fields for a Dubins-car-like model with actuation saturation. Our proposed algorithm is distributed and scalable to an arbitrary number of robots. Furthermore, extensive illustrative simulations and outdoor experiments with fixed-wing aircraft validate the effectiveness and robustness of our algorithm.
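As background for a single robot (the building block that is coordinated above via virtual path parameters), a planar guiding vector field for a path given as the zero level set of $\varphi$ combines a tangent term with an error-correcting term, $\chi = E\nabla\varphi - k\varphi\nabla\varphi$. A minimal sketch for a circular path follows, with the gain and rotation convention as assumptions:

```python
# Single 2D guiding vector field for the circle phi(p) = x^2 + y^2 - r^2 = 0:
# a tangent term propagates along the path, an error term drives phi to zero.
import numpy as np

E = np.array([[0.0, 1.0], [-1.0, 0.0]])      # 90-degree rotation (propagation direction)

def gvf(p, r=1.0, k=1.0):
    phi = p[0] ** 2 + p[1] ** 2 - r ** 2      # signed path error
    n = np.array([2.0 * p[0], 2.0 * p[1]])    # gradient of phi (path normal)
    return E @ n - k * phi * n                # tangent motion + error correction

# Integrate a point robot; it converges to and circulates the unit circle.
p = np.array([2.0, 0.5])
for _ in range(2000):
    v = gvf(p)
    p += 0.005 * v / (np.linalg.norm(v) + 1e-9)
```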
This work addresses the novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image. Most current methods for 3D hand analysis from monocular RGB images focus only on estimating the 3D locations of hand keypoints, which cannot fully express the 3D shape of the hand. In contrast, we propose a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of the hand surface, which contains richer information about both 3D hand shape and pose. To train the networks with full supervision, we create a large-scale synthetic dataset containing both ground-truth 3D meshes and 3D poses. When fine-tuning the networks on real-world datasets without 3D ground truth, we propose a weakly-supervised approach that leverages the depth map as weak supervision during training. Through extensive evaluations on our proposed new datasets and two public datasets, we show that our proposed method can produce accurate and reasonable 3D hand meshes, and can achieve superior 3D hand pose estimation accuracy compared with state-of-the-art methods.
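A minimal PyTorch sketch of the basic building block, a graph convolution that mixes vertex features along mesh edges through a normalized adjacency; this is a generic GCN layer for illustration, not necessarily the paper's exact operator.

```python
# Generic graph convolution on mesh vertices: features are propagated along
# mesh edges via the symmetrically normalized adjacency D^-1/2 (A+I) D^-1/2.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, in_dim, out_dim, adj):        # adj: (V, V) mesh adjacency
        super().__init__()
        a = adj + torch.eye(adj.shape[0])             # add self-loops
        d = a.sum(1).rsqrt()                          # D^-1/2 as a vector
        self.register_buffer("norm_adj", d[:, None] * a * d[None, :])
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x):                             # x: (B, V, in_dim)
        return torch.relu(self.lin(self.norm_adj @ x))
```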
Automatic image captioning has recently approached human-level performance thanks to the latest advances in computer vision and natural language understanding. However, most current models can only generate plain, factual descriptions of a given image's content. For human beings, in contrast, caption writing is flexible and diverse: additional language dimensions, such as emotion, humor, and style, are often incorporated to produce diverse, emotional, or appealing captions. In particular, we are interested in generating sentiment-conveying image descriptions, which have received little attention. The main challenge is how to effectively inject sentiments into the generated captions without altering the semantic match between the visual content and the generated descriptions. In this work, we propose two different models that employ different schemes for injecting sentiments into image captions. Compared with the few existing approaches, the proposed models are much simpler and yet more effective. The experimental results show that our models outperform the state-of-the-art in generating sentimental (i.e., sentiment-bearing) image captions. In addition, we can easily steer the model by assigning different sentiments to a test image to generate captions with the corresponding sentiments.
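One simple injection scheme (an assumption for illustration; the paper proposes two specific models of its own) is to fuse a learned sentiment embedding with the image feature before initializing the caption decoder, steering word choice without changing the described content:

```python
# Illustrative sentiment injection: a learned sentiment embedding is fused with
# the image feature to initialize an LSTM caption decoder.
import torch
import torch.nn as nn

class SentimentCaptioner(nn.Module):
    def __init__(self, vocab=10000, img_dim=2048, emb=256, hid=512, n_sentiments=3):
        super().__init__()
        self.sent_emb = nn.Embedding(n_sentiments, emb)   # e.g. negative/neutral/positive
        self.init_h = nn.Linear(img_dim + emb, hid)
        self.word_emb = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, img_feat, sentiment, tokens):       # tokens: (B, T) word ids
        h0 = torch.tanh(self.init_h(torch.cat([img_feat, self.sent_emb(sentiment)], -1)))
        h0 = h0.unsqueeze(0)                              # (1, B, hid)
        y, _ = self.lstm(self.word_emb(tokens), (h0, torch.zeros_like(h0)))
        return self.out(y)                                # next-word logits
```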