Simulation is an important step in robotics for developing control policies and testing physical parameters. Soft robotics presents unique simulation challenges due to the nonlinearity of its deformable material components along with other innovative, and often complex, physical properties. Because simulating soft, heterogeneous objects with traditional techniques is computationally expensive, rigid-body robotics simulators are not well suited to soft robots. As a result, many engineers must build one-off simulators tailored to their system or accept the reduced performance of existing simulators. To facilitate the development of this exciting technology, this work presents an interactive-speed, accurate, and versatile simulator for a wide variety of soft robots. Cronos, our open-source 3D simulation engine, parallelizes a mass-spring model for ultra-fast performance on both deformable and rigid objects. Our approach applies to a wide array of nonlinear material configurations, including high deformability, volumetric actuation, and heterogeneous stiffness; this versatility makes it possible to mix materials and geometric components freely within a single robot simulation. By exploiting the flexibility and scalability of nonlinear Hookean mass-spring systems, the framework simulates soft and rigid objects via a highly parallel model at near real-time speed. We describe an efficient CUDA GPU implementation, which we demonstrate computes over 1 billion elements per second on consumer-grade GPUs. The dynamic physical accuracy of the system is validated against Euler-Bernoulli beam theory, natural frequency predictions, and empirical data from a soft structure under large deformation.
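To make the parallel structure concrete, here is a minimal NumPy sketch of one explicit integration step of a Hookean mass-spring system of the kind Cronos parallelizes: every spring's force is computed independently, which is what maps so well onto a GPU. The names, constants, and integration scheme are illustrative assumptions, not Cronos's actual implementation.

```python
# Minimal sketch of one semi-implicit Euler step of a mass-spring system.
# All names and constants are illustrative, not Cronos's actual API.
import numpy as np

def step(pos, vel, mass, springs, rest_len, stiffness, dt=1e-4, damping=0.999):
    """pos, vel: (N, 3) arrays; mass: (N,); springs: (M, 2) index pairs."""
    i, j = springs[:, 0], springs[:, 1]
    d = pos[j] - pos[i]                               # spring vectors, (M, 3)
    length = np.linalg.norm(d, axis=1, keepdims=True)
    # Hookean force along each spring; each spring is independent, which is
    # what makes the model embarrassingly parallel on a GPU.
    f = stiffness[:, None] * (length - rest_len[:, None]) * d / length
    force = np.zeros_like(pos)
    np.add.at(force, i, f)                            # scatter-add to endpoints
    np.add.at(force, j, -f)
    force[:, 2] -= 9.81 * mass                        # gravity
    vel = damping * (vel + dt * force / mass[:, None])
    return pos + dt * vel, vel
```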
Development of controllers, novel robot kinematics, and learning-based applications in robotics today happens almost exclusively in simulation before implementation in the real world. In particular, Modular Reconfigurable Robots (MRRs) are an exciting innovation in industrial robotics, promising greater flexibility, improved maintainability, and cost-efficiency compared to traditional manipulators. However, there is no tool or standardized way to simulate and model assemblies of modules the way it has been done for robotic manipulators for decades. We introduce the Toolbox for Industrial Modular Robotics (Timor), a Python toolbox that bridges this gap and integrates modular robotics into existing simulation and optimization pipelines. Our open-source library comes with various examples and tutorials and can easily be integrated with existing simulation tools, not least by offering URDF export of arbitrary modular robot assemblies, enabling rapid model generation.
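To illustrate the intended workflow, the sketch below loads a database of modules, chains a few of them into a serial assembly, and exports the result to URDF for use in existing simulators. The identifiers (ModulesDB, ModuleAssembly, from_serial_modules, to_urdf) and the module names are assumptions for illustration and may not match Timor's actual API.

```python
# Hypothetical Timor workflow; identifiers are assumed, not verified API.
from timor import ModulesDB, ModuleAssembly  # assumed import path

# Load module definitions (geometry, kinematics, connectors) from disk.
db = ModulesDB.from_json("my_module_set.json")

# Chain modules into a serial assembly by their (hypothetical) names.
assembly = ModuleAssembly.from_serial_modules(
    db, ["base", "joint_90deg", "link_200mm", "joint_90deg", "gripper"]
)

# Export to URDF so any URDF-based simulator can consume the model.
assembly.to_urdf("assembly.urdf")
```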
Given the versatility of generative adversarial networks (GANs), we seek to understand the benefits of using an existing GAN to enhance simulated images and reduce the sim-to-real gap. We conduct this analysis in the context of simulating robot performance and image-based perception. Specifically, we quantify the GAN's ability to reduce the sim-to-real difference in image-based perception for robotics. Using semantic segmentation, we analyze the sim-to-real difference in training and testing with nominal and enhanced simulations of a city environment. As a secondary application, we consider the use of the GAN in enhancing an indoor environment, where object detection is used to analyze the enhancement in training and testing. The results quantify the reduction in the sim-to-real gap achieved by the GAN and illustrate the benefits of its use.
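As a rough illustration of the enhancement step, the sketch below passes a batch of simulated frames through a GAN generator before they reach the perception model. The Generator here is a stand-in placeholder with random weights; the specific pretrained GAN used in the analysis is not reproduced.

```python
# Placeholder sketch: enhance simulated frames with a GAN generator before
# feeding them to a perception model. The Generator is a stand-in; in
# practice one would load a pretrained image-to-image translation network.
import torch

class Generator(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(3, 64, 3, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(64, 3, 3, padding=1), torch.nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)

gen = Generator().eval()                             # load pretrained weights here
with torch.no_grad():
    sim_batch = torch.rand(8, 3, 256, 256) * 2 - 1   # simulated frames in [-1, 1]
    enhanced = gen(sim_batch)                        # frames for the segmenter
```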
Evolution is the theory that the plants and animals of today descended from species that existed in the past. Scientists such as Charles Darwin and Alfred Wallace dedicated their lives to observing how species interact with their environment, grow, and change. With genetic algorithms we can predict future changes as well as simulate the process. Genetic algorithms let us present multiple variables and parameters to an environment and change their values to simulate different situations. By building a genetic algorithm around entities in an environment, we can assign the entities varying characteristics, such as speed, size, and cloning probability, to simulate real natural selection and evolution on a compressed timescale. Learning how species grow and evolve helps us improve technology, protect endangered species, and understand how diseases spread, along with possible ways of making an environment uninhabitable for them. Using a genetic algorithm whose entities are parameterized by speed, size, and cloning probability, we can test several changes to the environment and observe how the species interacts with it. After testing environments with varying amounts of food while keeping the starting population at 10 entities, we found that an environment with scarce food was not sustainable for small, slow entities. All environments showed an increase in speed, but the food-rich environments allowed the entities to survive the entire duration of 50 generations and the population to grow significantly.
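A compact Python sketch of this kind of simulation appears below: entities carry heritable traits (speed, size, cloning probability), compete for a limited food supply each generation, and reproduce with mutation. The specific fitness and mutation rules are illustrative assumptions, not the exact rules used in the experiments described above.

```python
# Toy natural-selection simulation; fitness and mutation rules are
# illustrative assumptions, not the experiments' exact rules.
import random

def run(generations=50, pop_size=10, food_per_gen=40):
    pop = [{"speed": 1.0, "size": 1.0, "clone_p": 0.3} for _ in range(pop_size)]
    for _ in range(generations):
        # Faster entities reach food first; bigger entities need more of it.
        food = food_per_gen
        survivors = []
        for e in sorted(pop, key=lambda e: -e["speed"]):
            if food >= e["size"]:
                food -= e["size"]
                survivors.append(e)
        # Survivors clone with small Gaussian mutations on every trait.
        children = [
            {k: max(0.1, v + random.gauss(0, 0.1)) for k, v in e.items()}
            for e in survivors if random.random() < e["clone_p"]
        ]
        pop = survivors + children
        if not pop:
            break
    return pop

final = run()
print(len(final), "entities survived 50 generations")
```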
Recently, end-to-end deep learning-based stitching models have attracted growing attention. However, the most challenging aspect of deep learning-based stitching is obtaining pairs of narrow-field-of-view input images and wide-field-of-view ground-truth images captured from real-world scenes. To overcome this difficulty, we develop a weakly-supervised learning mechanism that trains the stitching model without requiring genuine ground-truth images. In addition, we propose a stitching model that takes multiple real-world fisheye images as input and creates a 360° output image in equirectangular projection format. In particular, our model consists of color consistency correction, warping, and blending, and is trained with perceptual and SSIM losses. The effectiveness of the proposed algorithm is verified on two real-world stitching datasets.
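For concreteness, below is a minimal PyTorch sketch of an SSIM loss term of the kind used to train such a stitching model alongside a perceptual loss. The window size and stability constants follow common SSIM defaults and are not necessarily the paper's exact settings.

```python
# Differentiable SSIM loss using uniform windows; constants are the common
# SSIM defaults, not necessarily the paper's exact settings.
import torch
import torch.nn.functional as F

def ssim_loss(x, y, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """x, y: (B, C, H, W) images in [0, 1]; returns 1 - mean SSIM."""
    pad = window // 2
    mu_x = F.avg_pool2d(x, window, stride=1, padding=pad)
    mu_y = F.avg_pool2d(y, window, stride=1, padding=pad)
    var_x = F.avg_pool2d(x * x, window, 1, pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window, 1, pad) - mu_y ** 2
    cov = F.avg_pool2d(x * y, window, 1, pad) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1 - ssim.mean()
```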
Finite mixture modelling is a popular method in the field of clustering, beneficial largely due to its soft cluster membership probabilities. However, the most common algorithm for fitting finite mixture models, the EM algorithm, suffers from a number of issues. We address the issues that plague clustering with finite mixture models, including convergence to solutions corresponding to local maxima and slow runtimes in high-dimensional cases. We do so by developing two novel algorithms that incorporate a spectral decomposition of the data matrix and a non-parametric bootstrap sampling scheme. Simulations show the validity of our algorithms and demonstrate not only their flexibility but also their ability to avoid local-maximum solutions when compared to other (bootstrapped) clustering algorithms for estimating finite mixture models. Our algorithms typically exhibit more consistent convergence as well as a significant increase in speed over other bootstrapped algorithms that fit finite mixture models.
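The sketch below conveys the bootstrap idea in isolation: refit a Gaussian mixture via EM on resampled data and keep the solution that scores best on the full data set, reducing the chance of settling at a poor local maximum. It uses scikit-learn's standard EM implementation and omits the spectral decomposition step, so it illustrates the sampling scheme rather than the proposed algorithms themselves.

```python
# Bootstrap-restarted EM for a Gaussian mixture; illustrates the sampling
# scheme only (no spectral decomposition step).
import numpy as np
from sklearn.mixture import GaussianMixture

def bootstrap_em(X, k=3, n_boot=20, seed=0):
    rng = np.random.default_rng(seed)
    best, best_ll = None, -np.inf
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))   # resample with replacement
        gm = GaussianMixture(n_components=k).fit(X[idx])
        ll = gm.score(X)                             # log-likelihood on full data
        if ll > best_ll:
            best, best_ll = gm, ll
    return best

X = np.concatenate([np.random.randn(200, 5), np.random.randn(200, 5) + 3])
model = bootstrap_em(X, k=2)
```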
Animating an avatar that reflects a user's actions in the VR world enables natural interaction with the virtual environment and has the potential to let remote users communicate and collaborate as if they had met in person. However, a typical VR system provides only a very sparse set of up to three positional sensors, a head-mounted display (HMD) and, optionally, two hand-held controllers, making estimation of the user's full-body movement a difficult problem. In this work, we present a data-driven, physics-based method that predicts the user's realistic full-body movement from the transformations of these VR trackers and simulates an avatar character to mimic the user's actions in the virtual world in real time. We train our system using reinforcement learning with carefully designed pretraining processes to ensure the success of the training and the quality of the simulation. We demonstrate the effectiveness of the method with an extensive set of examples.
While motion compensation greatly improves video deblurring quality, performing motion compensation and video deblurring separately incurs substantial computational overhead. This paper proposes a real-time video deblurring framework built from a lightweight multi-task unit that supports both video deblurring and motion compensation efficiently. The multi-task unit is specifically designed to handle large portions of the two tasks with a single shared network and consists of a multi-task detail network plus simple networks for deblurring and motion compensation. The unit minimizes the cost of incorporating motion compensation into video deblurring and enables real-time operation. Moreover, by stacking multiple multi-task units, our framework provides flexible control over the trade-off between cost and deblurring quality. We experimentally validate the state-of-the-art deblurring quality of our approach, which runs much faster than previous methods, and show practical real-time performance (30.99 dB at 30 fps on the DVD dataset).
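A schematic PyTorch sketch of the stacking idea follows: each unit shares one detail network between a deblurring head and a motion (flow) head, and units can be repeated to trade computation for quality. Layer shapes and the residual formulation are placeholders, not the paper's architecture.

```python
# Schematic multi-task unit: one shared detail network feeding a deblurring
# head and a flow head; shapes are placeholders, not the paper's design.
import torch
import torch.nn as nn

class MultiTaskUnit(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.shared = nn.Sequential(                   # shared detail network
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.deblur = nn.Conv2d(ch, 3, 3, padding=1)   # simple deblurring head
        self.motion = nn.Conv2d(ch, 2, 3, padding=1)   # simple flow head (dx, dy)

    def forward(self, x):
        feat = self.shared(x)
        return x + self.deblur(feat), self.motion(feat)  # residual deblur + flow

units = nn.ModuleList(MultiTaskUnit() for _ in range(3))  # stack more for quality
frame = torch.rand(1, 3, 128, 128)
for u in units:
    frame, flow = u(frame)
```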
Evacuation planning is a crucial part of disaster management where the goal is to relocate people to safety and minimize casualties. Every evacuation plan has two essential components: routing and scheduling. However, joint optimization of these two components with objectives such as minimizing average evacuation time or evacuation completion time is a computationally hard problem. To approach it, we present MIP-LNS, a scalable optimization method that combines heuristic search with mathematical optimization and can optimize a variety of objective functions. We use real-world road network and population data from Harris County in Houston, Texas, and apply MIP-LNS to find evacuation routes and a schedule for the area. We show that, within a given time limit, our proposed method finds better solutions than existing methods in terms of average evacuation time, evacuation completion time, and the optimality guarantee of the solutions. We perform agent-based simulations of evacuation in our study area to demonstrate the efficacy and robustness of our solution, showing that the prescribed evacuation plan remains effective even if the evacuees deviate from the suggested schedule up to a certain extent. We also examine how evacuation plans are affected by road failures. Our results show that MIP-LNS can use information about the estimated failure times of roads to produce better evacuation plans, in terms of evacuating more people successfully and conveniently.
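The toy sketch below illustrates the large-neighborhood-search pattern at the heart of a method like MIP-LNS: start from a feasible plan, repeatedly "destroy" a small part of it (here, a few groups' route choices) and re-optimize only that part, keeping improvements. A real implementation would re-solve each neighborhood with a MIP solver; the greedy repair and the tiny routing model here are stand-ins.

```python
# Toy LNS loop over a miniature routing model; the greedy repair stands in
# for the MIP re-solve a real MIP-LNS implementation would use.
import random

random.seed(0)
groups = [random.randint(10, 50) for _ in range(30)]   # evacuees per group
rate = [5, 8, 12]                                      # route capacity (people/min)

def completion_time(assign):
    load = [0, 0, 0]
    for g, r in zip(groups, assign):
        load[r] += g
    return max(load[r] / rate[r] for r in range(3))    # last route to finish

assign = [random.randrange(3) for _ in groups]         # initial feasible plan
best = completion_time(assign)
for _ in range(500):                                   # LNS iterations
    trial = assign[:]
    for i in random.sample(range(len(groups)), 4):     # destroy: free 4 groups
        trial[i] = min(range(3),                       # repair: best single route
                       key=lambda r: completion_time(trial[:i] + [r] + trial[i+1:]))
    if completion_time(trial) < best:
        assign, best = trial, completion_time(trial)
print(f"evacuation completion time: {best:.1f} min")
```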
Image registration is a critical component of many medical image analysis applications. In recent years, there has been a tremendous surge in the development of deep learning (DL)-based medical image registration models. This paper provides a comprehensive review of medical image registration. First, supervised registration categories are discussed, for example, fully supervised, dual supervised, and weakly supervised registration. Next, similarity-based as well as generative adversarial network (GAN)-based registration are presented as part of unsupervised registration. Deep iterative registration is then described, with emphasis on deep similarity-based and reinforcement learning-based registration. Moreover, the application areas of medical image registration are reviewed, focusing on monomodal and multimodal registration and the associated imaging modalities, for instance, X-ray, CT, ultrasound, and MRI. The existing challenges are highlighted, a major one being the absence of training datasets with known transformations. Finally, promising future research directions in the field of DL-based medical image registration are discussed.
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core designs. First, we develop an asymmetric encoder-decoder architecture, with an encoder that operates only on the visible subset of patches (without mask tokens), along with a lightweight decoder that reconstructs the original image from the latent representation and mask tokens. Second, we find that masking a high proportion of the input image, e.g., 75%, yields a nontrivial and meaningful self-supervisory task. Coupling these two designs enables us to train large models efficiently and effectively: we accelerate training (by 3x or more) and improve accuracy. Our scalable approach allows for learning high-capacity models that generalize well: e.g., a vanilla ViT-Huge model achieves the best accuracy (87.8%) among methods that use only ImageNet-1K data. Transfer performance in downstream tasks outperforms supervised pre-training and shows promising scaling behavior.
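The masking step is simple enough to sketch directly. The function below performs MAE-style random masking on a batch of patch embeddings, keeping the 25% visible subset that the encoder sees; it mirrors the masking logic only, with placeholder shapes, not the full asymmetric encoder-decoder.

```python
# MAE-style random masking of patch embeddings; placeholder shapes, masking
# logic only (no encoder/decoder).
import torch

def random_mask(patches, mask_ratio=0.75):
    """patches: (B, N, D). Returns visible patches and the shuffle indices."""
    B, N, D = patches.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)                 # one random score per patch
    ids_shuffle = noise.argsort(dim=1)       # random permutation of patch ids
    ids_keep = ids_shuffle[:, :n_keep]       # first n_keep patches stay visible
    visible = torch.gather(
        patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    return visible, ids_shuffle              # decoder unshuffles with ids_shuffle

patches = torch.rand(2, 196, 768)            # 14x14 grid of ViT patch embeddings
visible, ids = random_mask(patches)          # encoder sees only 49 of 196 patches
```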