In this paper, we propose a modular navigation system that can be mounted on a regular powered wheelchair to assist disabled children and the elderly with autonomous mobility and shared-control features. The lack of independent mobility drastically affects an individual's mental and physical health, making them feel less self-reliant; this is especially true for children with cerebral palsy and limited cognitive skills. To address this problem, we propose a comparatively inexpensive and modular system that uses a stereo camera to perform tasks such as path planning, obstacle avoidance, and collision detection in environments with narrow corridors. To keep installation simple, we avoid major changes to the wheelchair hardware by replacing wheel encoders with a stereo camera for visual odometry. An open-source software package, Real-Time Appearance-Based Mapping (RTAB-Map), running on top of the Robot Operating System (ROS), performs visual SLAM, allowing the wheelchair to map the environment and localize itself within it. Path planning is performed by the move_base package provided by ROS, which quickly and efficiently computes the path trajectory for the wheelchair. In this work, we present the design and development of the system along with its significant functionalities. Further, we report experimental results from a Gazebo simulation and real-world scenarios to demonstrate the effectiveness of our proposed system with a compact form factor and a single stereo camera.
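As an illustration of how the navigation stack is driven, the sketch below sends a single goal to move_base through its standard actionlib interface; the node name, map frame, and goal coordinates are illustrative assumptions, not values from the paper.

```python
#!/usr/bin/env python
# Minimal sketch: commanding the ROS navigation stack described above.
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

rospy.init_node('wheelchair_nav_goal')  # hypothetical node name
client = actionlib.SimpleActionClient('move_base', MoveBaseAction)
client.wait_for_server()

goal = MoveBaseGoal()
goal.target_pose.header.frame_id = 'map'   # frame published by the SLAM layer
goal.target_pose.header.stamp = rospy.Time.now()
goal.target_pose.pose.position.x = 2.0     # illustrative target, 2 m ahead
goal.target_pose.pose.orientation.w = 1.0  # keep current heading

client.send_goal(goal)
client.wait_for_result()
```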
The loss of an upper limb can have a substantial impact on a person's quality of life, since it limits their ability to work, interact, and perform daily duties independently. Prosthetics uses artificial limbs to help people who have lost limbs enhance their function and quality of life. Despite significant breakthroughs in prosthetic technology, rejection rates for complex prosthetic devices remain high [1]-[5]. A quarter to a third of upper-limb amputees abandon their prostheses due to a lack of comprehension of the technology. The most extensively used method for monitoring muscle activity and controlling the prosthetic arm, surface electromyography (sEMG), has significant drawbacks, including a low signal-to-noise ratio and poor amplitude resolution [6]-[8]. Unlike myoelectric control systems, which use electrical muscle activation to calculate end-effector velocity, our strategy employs ultrasound to directly monitor mechanical muscle deformation and then uses the extracted signals to proportionally control end-effector position. This investigation used four separate hand motions performed by three physically healthy volunteers. A virtual robotic hand simulation was created using ROS. After observing performance comparable to that of a natural hand with very little training, we concluded that our control method is reliable and natural.
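To make the control scheme concrete, here is a minimal sketch of proportional position control from a deformation signal; the linear mapping and the per-user calibration constants are our assumptions, since the exact signal extraction is not described here.

```python
import numpy as np

def deformation_to_position(deformation, rest, max_deform, pos_min=0.0, pos_max=1.0):
    """Map an ultrasound-derived muscle-deformation value to a normalized
    end-effector position by proportional (linear) scaling. `rest` and
    `max_deform` would come from a per-user calibration step (assumed)."""
    alpha = np.clip((deformation - rest) / (max_deform - rest), 0.0, 1.0)
    return pos_min + alpha * (pos_max - pos_min)

# Example: a stream of deformation measurements (arbitrary units)
for d in (0.1, 0.4, 0.9):
    print(deformation_to_position(d, rest=0.1, max_deform=1.0))
```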
The method proposed in this paper addresses the problem of time series forecasting. Although some exquisitely designed models achieve excellent prediction performance, how to extract more useful information and make accurate predictions remains an open issue. Most modern models focus only on a short range of information, which is fatal for problems such as time series forecasting that need to capture long-term characteristics. The main concern of this work is therefore to further mine the relationship between local and global information contained in a time series to produce more precise predictions. To this end, we make three main contributions that are experimentally verified to yield performance advantages. First, the original time series is transformed into a difference sequence, which serves as input to the proposed model. Second, we introduce a global atrous sliding window into the forecasting model; it draws on the concept of fuzzy time series to associate relevant global information with temporal data within a time period, and uses a central-bidirectional atrous algorithm to capture underlying related features, ensuring the validity and consistency of the captured data. Third, we devise semi-asymmetric convolution, a variant of the widely used asymmetric convolution, to more flexibly extract relationships between adjacent elements and their associated global features, with adjustable convolution ranges in the vertical and horizontal directions. The proposed model achieves state-of-the-art results on most of the provided time series datasets compared with competitive modern models.
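The first two ingredients are easy to picture in code: differencing the raw series and applying an atrous (dilated) convolution over it. The PyTorch sketch below uses arbitrary sizes and a plain dilated convolution; it illustrates the general mechanism, not the paper's central-bidirectional variant.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 128)     # (batch, channels, time): raw series
dx = x[..., 1:] - x[..., :-1]  # first-order difference sequence (model input)

# A dilated 1-D convolution widens the receptive field without extra
# parameters; dilation=4 is an illustrative choice, not the paper's.
atrous = nn.Conv1d(1, 8, kernel_size=3, dilation=4, padding=4)
features = atrous(dx)
print(features.shape)          # torch.Size([1, 8, 127])
```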
Ports are striving for innovative technological solutions to cope with the ever-increasing growth of transport while at the same time reducing their environmental footprint. An emerging technology with the potential to substantially increase the efficiency of the multifaceted and interconnected port processes is the digital twin. Although digital twins have been successfully integrated in many industries, there is still a lack of cross-domain understanding of what constitutes a digital twin. Furthermore, the implementation of digital twins in complex systems such as the port is still in its infancy. This paper attempts to fill this research gap by conducting an extensive cross-domain literature review of what constitutes a digital twin, keeping in mind the extent to which the respective findings can be applied to the port. It turns out that the digital twin of the port is most comparable to those of complex systems such as smart cities and supply chains, both in terms of its functional relevance and in terms of its requirements and characteristics. The literature review, considering the different port processes and port characteristics, results in the identification of three core requirements of a digital port twin, which are described in detail: situational awareness, comprehensive data analytics capabilities for intelligent decision making, and the provision of an interface to promote multi-stakeholder governance and collaboration. Finally, specific operational scenarios are proposed showing how the port's digital twin can contribute to energy savings by improving the use of port resources, facilities, and operations.
The StyleGAN family has succeeded in high-fidelity image generation and allows for flexible and plausible editing of generated images by manipulating the semantic-rich latent style space. However, projecting a real image into this latent space encounters an inherent trade-off between inversion quality and editability. Existing encoder-based or optimization-based StyleGAN inversion methods attempt to mitigate the trade-off but achieve only limited performance. To fundamentally resolve this problem, we propose a novel two-phase framework that designates two separate networks to tackle editing and reconstruction respectively, instead of balancing the two. Specifically, in Phase I, a W-space-oriented StyleGAN inversion network is trained and used to perform image inversion and editing, which assures editability but sacrifices reconstruction quality. In Phase II, a carefully designed rectifying network is utilized to correct the inversion errors and perform ideal reconstruction. Experimental results show that our approach yields near-perfect reconstructions without sacrificing editability, thus allowing accurate manipulation of real images. Further, we evaluate the performance of our rectifying network and observe strong generalizability to unseen manipulation types and out-of-domain images.
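For intuition, the two phases compose as in the sketch below; it treats the encoder, the StyleGAN generator, and the rectifying network as given modules, and the residual-style correction in Phase II is our assumption about one plausible formulation.

```python
import torch

def invert_and_edit(x, encoder, generator, rectifier, edit_direction, strength=1.0):
    """Two-phase sketch: W-space inversion and editing, then rectification."""
    w = encoder(x)                            # Phase I: project image into W space
    w_edited = w + strength * edit_direction  # latent edit preserves editability
    x_coarse = generator(w_edited)            # coarse, editable reconstruction
    # Phase II: rectify details lost by the W-space projection
    # (residual correction is an assumed formulation).
    return x_coarse + rectifier(x, x_coarse)
```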
Smart contracts offer a way to credibly commit to a mechanism, as long as it can be expressed as an easily computable mapping from inputs, in the form of transactions on-chain, to outputs: allocations and payments. But proposers decide which transactions to include, allowing them to manipulate these mechanisms and extract temporary monopoly rents known as MEV. Motivated both by general interest in running auctions on-chain and by current proposals to conduct MEV auctions on-chain, we study how these manipulations affect the equilibria of auctions. Formally, we consider an independent private value auction in which bidders simultaneously submit private bids and public tips, the latter paid to the proposer upon inclusion. A single additional bidder may bribe the proposer to omit competing bids. We show that even if bids are completely sealed, tips reveal bids in equilibrium, which suggests that encrypting bids may not prevent manipulation. Further, we show that collusion at the transaction-inclusion step is extremely profitable for the colluding bidder: as the number of bidders increases, the probability that the winner is not colluding and the economic efficiency of the auction both decrease faster than $1/n$. Running the auction over multiple blocks, each with a different proposer, alleviates the problem only if the number of blocks is larger than the number of bidders. We argue that blockchains with more than one concurrent proposer can credibly execute auctions on-chain, as long as tips can be conditioned on the number of proposers that include the transaction.
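To get a feel for why censorship is so damaging, the toy Monte Carlo below pits one colluding bidder against n honest bidders; the tip schedule and the condition under which the bribe succeeds are stylized assumptions of ours, not the paper's equilibrium analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, trials=100_000):
    """n honest bidders draw i.i.d. U[0,1] values and bid truthfully; a
    colluder censors all other bids whenever his value exceeds the tips
    the proposer would forgo (assumed to be 5% of the honest bids)."""
    values = rng.uniform(size=(trials, n))
    v_c = rng.uniform(size=trials)
    tips = 0.05 * values.sum(axis=1)       # assumed tip schedule
    censored = v_c > tips                  # bribe outbids the forgone tips
    honest_best = values.max(axis=1)
    colluder_wins = censored | (v_c > honest_best)
    welfare = np.where(colluder_wins, v_c, honest_best)
    efficiency = welfare.mean() / np.maximum(v_c, honest_best).mean()
    return colluder_wins.mean(), efficiency

for n in (2, 4, 8, 16):
    print(n, simulate(n))
```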
Spatial audio, which focuses on immersive 3D sound rendering, is widely applied in the acoustic industry. One of the key problems of current spatial audio rendering methods is the lack of personalization based on the different anatomies of individuals, which is essential for producing accurate sound source positions. In this work, we address this problem from an interdisciplinary perspective. The rendering of spatial audio is strongly correlated with the 3D shape of the human body, particularly the ears. To this end, we propose to achieve personalized spatial audio by reconstructing 3D human ears from single-view images. First, to benchmark the ear reconstruction task, we introduce AudioEar3D, a high-quality 3D ear dataset consisting of 112 point-cloud ear scans with RGB images. To train a reconstruction model in a self-supervised manner, we further collect a 2D ear dataset, named AudioEar2D, composed of 2,000 images, each with manual annotations of occlusion and 55 landmarks. To our knowledge, both datasets are the largest and highest-quality of their kind available for public use. Further, we propose AudioEarM, a reconstruction method guided by a depth estimation network trained on synthetic data, with two loss functions tailored for ear data. Lastly, to bridge the gap between the vision and acoustics communities, we develop a pipeline that integrates the reconstructed ear mesh with an off-the-shelf 3D human body and simulates a personalized Head-Related Transfer Function (HRTF), which is the core of spatial audio rendering. Code and data are publicly available at //github.com/seanywang0408/AudioEar.
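Once a personalized HRTF is available, rendering reduces to filtering a mono source with the listener's left and right head-related impulse responses. The sketch below shows this standard operation; the function and variable names are ours.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Spatialize a mono signal with a pair of head-related impulse
    responses (the time-domain form of the HRTF). A personalized HRTF,
    e.g. one simulated from a reconstructed ear mesh, simply supplies
    different impulse responses here."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)  # (samples, 2) stereo output
```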
In autonomous robot exploration tasks, a mobile robot needs to actively explore and map an unknown environment as fast as possible. Since the environment is revealed progressively during exploration, the robot must frequently re-plan its path online as new information is acquired by onboard sensors and used to update its partial map. While state-of-the-art exploration planners are frontier- and sampling-based, encouraged by recent developments in deep reinforcement learning (DRL), we propose ARiADNE, an attention-based neural approach that delivers real-time, non-myopic path planning for autonomous exploration. ARiADNE learns dependencies at multiple spatial scales between areas of the agent's partial map and implicitly predicts the potential gains associated with exploring those areas. This allows the agent to sequence movement actions that balance the natural trade-off between exploitation/refinement of the map in known areas and exploration of new areas. We experimentally demonstrate that our method outperforms both learning-based and non-learning state-of-the-art baselines in terms of average trajectory length to complete exploration in hundreds of simplified 2D indoor scenarios. We further validate our approach in high-fidelity Robot Operating System (ROS) simulations, where we consider a real sensor model and a realistic low-level motion controller, toward deployment on real robots.
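As a rough picture of the attention mechanism, the snippet below applies self-attention over node features of the agent's partial-map graph; the node count, feature size, and use of PyTorch's stock attention layer are illustrative stand-ins for ARiADNE's actual architecture.

```python
import torch
import torch.nn as nn

nodes = torch.randn(1, 50, 64)  # (batch, map-graph nodes, node features)
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
context, weights = attn(nodes, nodes, nodes)  # dependencies across map areas
# `context` would feed a policy head scoring candidate viewpoints;
# `weights` indicates which areas each node attends to.
```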
The rapid development of diagnostic technologies in healthcare is placing higher demands on physicians to handle and integrate the heterogeneous, yet complementary, data produced during routine practice. For instance, personalized diagnosis and treatment planning for a single cancer patient rely on various images (e.g., radiological, pathological, and camera images) and non-image data (e.g., clinical and genomic data). However, such decision-making procedures can be subjective and qualitative, and exhibit large inter-subject variability. With recent advances in multi-modal deep learning, an increasingly large number of efforts have been devoted to a key question: how do we extract and aggregate multi-modal information to ultimately provide more objective, quantitative computer-aided clinical decision making? This paper reviews recent studies addressing this question. Briefly, the review covers (1) an overview of current multi-modal learning workflows, (2) a summary of multi-modal fusion methods, (3) a discussion of performance, (4) applications in disease diagnosis and prognosis, and (5) challenges and future directions.
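As a concrete example of the simplest fusion family covered by such reviews, the sketch below concatenates per-modality embeddings before a shared classifier; the encoders and dimensions are placeholders.

```python
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Minimal intermediate-fusion sketch: encode each modality
    separately, concatenate the embeddings, then classify."""
    def __init__(self, img_dim=512, clin_dim=32, hidden=128, n_classes=2):
        super().__init__()
        self.img_enc = nn.Linear(img_dim, hidden)    # stands in for a CNN
        self.clin_enc = nn.Linear(clin_dim, hidden)  # clinical/genomic data
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, img_feat, clin_feat):
        z = torch.cat([self.img_enc(img_feat), self.clin_enc(clin_feat)], dim=-1)
        return self.head(torch.relu(z))
```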
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
Recent advances in 3D fully convolutional networks (FCNs) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafted features or class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that first uses a 3D FCN to roughly define a candidate region, which is then used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to ~10% and allows it to focus on a more detailed segmentation of the organs and vessels. We use training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital, comprising 150 CT scans and targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5% to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve significantly higher performance on small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN-based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download at //github.com/holgerroth/3Dunet_abdomen_cascade.
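The hand-off between the two stages amounts to bounding the coarse prediction and cropping the volume before the fine pass; the margin below and the Dice helper are illustrative choices of ours, not the paper's exact settings.

```python
import numpy as np

def crop_to_candidate(volume, coarse_mask, margin=8):
    """Stage 1 -> Stage 2: bound the coarse binary mask, pad by a margin,
    and crop the CT volume so the second FCN classifies far fewer voxels."""
    zs, ys, xs = np.nonzero(coarse_mask)
    lo = np.maximum(np.array([zs.min(), ys.min(), xs.min()]) - margin, 0)
    hi = np.minimum(np.array([zs.max(), ys.max(), xs.max()]) + margin + 1,
                    volume.shape)
    return volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]

def dice(pred, target):
    """Dice score, 2*|A & B| / (|A| + |B|): the metric reported above."""
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum())
```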