In this paper, we propose a vision-based solution for indoor Micro Air Vehicle (MAV) navigation, with a primary focus on autonomous warehouses. Our system uses a single camera as the primary sensor for detection, localization, and path planning. To detect lines in warehouse environments, we apply HSV color detection and the Hough Line Transform, and a Kalman filter enables the camera to track yellow lines reliably. We evaluated our vision-based line-following algorithm through MAV flight tests in the Gazebo 11 simulator using ROS Noetic. The simulation results demonstrate that the system can successfully navigate narrow indoor spaces. The proposed system could significantly reduce labor costs and improve overall productivity in warehouse operations. This work contributes to the growing field of MAV applications in autonomous warehouses, addressing the need for efficient logistics and supply chain solutions.
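The abstract does not give the filter design. As a minimal sketch, assuming the Hough stage yields one lateral offset of the yellow line per frame (in pixels), a constant-velocity Kalman filter can smooth that offset and bridge frames where no line is detected. The class name `LineKalman1D`, the noise values `q` and `r`, and the measurement sequence are hypothetical, not taken from the paper.

```python
class LineKalman1D:
    """Constant-velocity Kalman filter for the line's lateral offset (hypothetical tuning).

    State x = [offset, offset_rate]; measurement z = offset from the Hough step.
    """

    def __init__(self, q=1e-2, r=4.0, dt=1.0):
        self.x = [0.0, 0.0]                      # state estimate
        self.P = [[100.0, 0.0], [0.0, 100.0]]    # large initial uncertainty
        self.q, self.r, self.dt = q, r, dt

    def predict(self):
        dt, P = self.dt, self.P
        # x <- F x with F = [[1, dt], [0, 1]]
        self.x = [self.x[0] + dt * self.x[1], self.x[1]]
        # P <- F P F^T + q * I (simplified process noise)
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        self.P = [[p00, p01], [p10, p11]]
        return self.x[0]

    def update(self, z):
        P = self.P
        s = P[0][0] + self.r                     # innovation variance, H = [1, 0]
        k0, k1 = P[0][0] / s, P[1][0] / s        # Kalman gain
        y = z - self.x[0]                        # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[P[0][0] - k0 * P[0][0], P[0][1] - k0 * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        return self.x[0]

    def step(self, z=None):
        """Predict, then correct only if a line was detected (z is None on a missed frame)."""
        self.predict()
        return self.x[0] if z is None else self.update(z)


kf = LineKalman1D()
measurements = [10.2, 9.7, 10.4, None, 9.9, 10.1, 10.0, 9.8]  # None = no line detected
estimates = [kf.step(z) for z in measurements]
```

The filtered offset could then feed a lateral velocity controller; on a missed frame the filter coasts on its prediction instead of reporting a jump.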
In this paper, we present SonoSAMTrack, which combines SonoSAM, a promptable foundation model for segmenting objects of interest in ultrasound images, with a state-of-the-art contour tracking model to propagate segmentations on 2D+t and 3D ultrasound datasets. Fine-tuned and tested exclusively on a rich, diverse set of objects from $\approx200$k ultrasound image-mask pairs, SonoSAM demonstrates state-of-the-art performance on 7 unseen ultrasound datasets, outperforming competing methods by a significant margin. We also extend SonoSAM to 2D+t applications and demonstrate superior performance, making it a valuable tool for generating dense annotations and segmenting anatomical structures in clinical workflows. Further, to increase the practical utility of the work, we propose a two-step process of fine-tuning followed by knowledge distillation to a smaller-footprint model without compromising performance. We present detailed qualitative and quantitative comparisons of SonoSAM with state-of-the-art methods, showcasing the efficacy of the method. Finally, we demonstrate that SonoSAMTrack reduces the number of clicks needed in a dense video annotation problem: adult cardiac ultrasound chamber segmentation.
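The abstract does not state the distillation objective. A common choice, assumed here, trains the small student on a weighted sum of binary cross-entropy against the ground-truth mask and binary cross-entropy against the fine-tuned teacher's per-pixel probabilities. The function names and the weight `alpha` are hypothetical.

```python
import numpy as np


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and (soft) targets."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))))


def distillation_loss(student_prob, teacher_prob, gt_mask, alpha=0.5):
    """Hypothetical mask-level distillation objective.

    alpha weights the supervised term (student vs. ground truth) against the
    distillation term (student vs. the frozen teacher's soft mask).
    """
    supervised = bce(student_prob, gt_mask)
    distill = bce(student_prob, teacher_prob)
    return alpha * supervised + (1.0 - alpha) * distill


gt = np.array([[0.0, 1.0], [1.0, 1.0]])
teacher = np.array([[0.1, 0.9], [0.8, 0.95]])
loss = distillation_loss(student_prob=teacher, teacher_prob=teacher, gt_mask=gt)
```

Using the teacher's soft probabilities rather than its thresholded mask passes on its uncertainty near boundaries, which is one motivation for distilling instead of retraining the small model from labels alone.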
In this paper, we design an efficient, multi-stage image segmentation framework that incorporates a weighted difference of anisotropic and isotropic total variation (AITV). The framework consists of two stages, smoothing and thresholding, and is thus referred to as SaT. In the first stage, a smoothed image is obtained from an AITV-regularized Mumford-Shah (MS) model, which can be solved efficiently by the alternating direction method of multipliers (ADMM) using the closed-form proximal operator of the $\ell_1 -\alpha \ell_2$ regularizer. We also analyze the convergence of the ADMM algorithm. In the second stage, we threshold the smoothed image by $K$-means clustering to obtain the final segmentation. Numerical experiments demonstrate that the proposed framework is versatile for both grayscale and color images, efficient in producing high-quality segmentation results within a few seconds, and robust to input images corrupted by noise, blur, or both. We compare the AITV method with its original convex TV and nonconvex TV$^p$ ($0<p<1$) counterparts, showcasing the qualitative and quantitative advantages of our proposed method.
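The ADMM step relies on the closed-form proximal operator of $\ell_1 - \alpha\ell_2$. A minimal sketch of that operator, assuming $0 \le \alpha \le 1$, is below; we assume it is applied to each pixel's vector of horizontal and vertical differences inside the ADMM loop. The function name is hypothetical.

```python
import math


def prox_l1_minus_alpha_l2(y, lam, alpha):
    """Proximal operator of lam * (||x||_1 - alpha * ||x||_2), assuming 0 <= alpha <= 1.

    Solves argmin_x lam * (||x||_1 - alpha * ||x||_2) + 0.5 * ||x - y||_2^2.
    """
    ymax = max(abs(v) for v in y)
    if ymax > lam:
        # Soft-threshold, then push the result outward by alpha * lam along its direction.
        z = [math.copysign(max(abs(v) - lam, 0.0), v) for v in y]
        nz = math.sqrt(sum(v * v for v in z))
        return [v * (nz + alpha * lam) / nz for v in z]
    x = [0.0] * len(y)
    if ymax > (1.0 - alpha) * lam:
        # One-sparse solution at a largest-magnitude entry (ties broken by first index).
        i = max(range(len(y)), key=lambda k: abs(y[k]))
        x[i] = math.copysign(ymax + (alpha - 1.0) * lam, y[i])
    return x


grad_at_pixel = [3.0, 4.0]   # hypothetical (D_x u, D_y u) at one pixel
shrunk = prox_l1_minus_alpha_l2(grad_at_pixel, lam=1.0, alpha=0.5)
```

With $\alpha = 0$ the operator reduces to ordinary soft thresholding (anisotropic TV). For $\alpha > 0$, large gradients are shrunk less than under soft thresholding, which is consistent with the edge-preserving behaviour expected of the nonconvex regularizer.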
Our objective in this paper is to estimate spine curvature in DXA scans. To this end, we first train a neural network to predict the middle spine curve in the scan, and then use an integral-based method to determine the curvature along that curve. We compare the curvature with the standard scoliosis angle measure obtained using the DXA Scoliosis Method (DSM). Our approach improves on the prior work of Jamaludin et al. 2018. We show that the maximum curvature can be used as a scoring function for ordering the severity of spinal deformation.
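The abstract does not spell out the integral-based estimator. Under that caveat, one sketch uses the identity $\Delta\theta = \int \kappa\, ds$: over a short window, curvature is approximated as the total turning of the tangent divided by the arc length of the window. The function name, the `window` parameter, and the test curve are hypothetical; in practice the points would come from the network's predicted midline, and pixel-to-millimetre scaling is not handled.

```python
import math


def curvature_profile(points, window=2):
    """Estimate signed curvature along a sampled 2D curve (e.g. a predicted spine midline).

    For each interior segment i, curvature = (total tangent turning between segments
    i - window and i + window) / (arc length between those segments' midpoints).
    """
    angles = [math.atan2(y1 - y0, x1 - x0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    seg_len = [math.dist(p, q) for p, q in zip(points, points[1:])]
    kappa = []
    for i in range(window, len(angles) - window):
        turn = 0.0
        for j in range(i - window, i + window):
            d = angles[j + 1] - angles[j]
            turn += math.atan2(math.sin(d), math.cos(d))  # wrap so crossing +-pi adds no jump
        length = (0.5 * seg_len[i - window]
                  + sum(seg_len[i - window + 1:i + window])
                  + 0.5 * seg_len[i + window])
        kappa.append(turn / length)
    return kappa


# Sanity check on a quarter circle of radius 50: curvature should be close to 1/50.
R = 50.0
arc = [(R * math.cos(t), R * math.sin(t)) for t in (k * (math.pi / 2) / 59 for k in range(60))]
kappa = curvature_profile(arc)
max_curvature = max(abs(k) for k in kappa)  # candidate severity score
```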
In this paper, we investigate a dynamic packet scheduling algorithm designed to enhance the eXtended Reality (XR) capacity of fifth-generation (5G)-Advanced networks with multiple cells, users, and services. The scheduler exploits the newly defined protocol data unit (PDU)-set information of XR traffic flows to improve its quality-of-service awareness. To evaluate the proposed solution, we conduct advanced dynamic system-level simulations. The results show that the proposed scheduler increases XR capacity by up to 45% while maintaining the same enhanced mobile broadband (eMBB) cell throughput as well-known baseline schedulers.
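The abstract does not give the scheduling metric. As an illustrative sketch only, a PDU-set-aware scheduler might rank queued PDU sets by a proportional-fair ratio multiplied by an urgency term that grows as the PDU set's delay budget runs out, and skip sets whose budget has already expired. The `PduSet` record, the `schedule` function, and the `min_budget_ms` floor are hypothetical and are not the paper's scheduler.

```python
from dataclasses import dataclass


@dataclass
class PduSet:
    """Hypothetical record for one XR PDU set waiting in a UE's buffer."""
    ue_id: int
    remaining_bytes: int
    remaining_budget_ms: float  # time left before the PDU-set delay budget expires


def schedule(pdu_sets, inst_rate, avg_rate, min_budget_ms=0.5):
    """Return PDU sets in service order, dropping those whose delay budget has expired.

    inst_rate and avg_rate map ue_id -> bits/s.
    """
    live = [s for s in pdu_sets if s.remaining_budget_ms > 0]

    def priority(s):
        pf = inst_rate[s.ue_id] / max(avg_rate[s.ue_id], 1.0)         # proportional fairness
        urgency = 1.0 / max(s.remaining_budget_ms, min_budget_ms)      # deadline awareness
        return pf * urgency

    return sorted(live, key=priority, reverse=True)


queue = [PduSet(0, 12000, 2.0), PduSet(1, 12000, 10.0), PduSet(2, 8000, -1.0)]
rates = {0: 1e6, 1: 1e6, 2: 1e6}
order = schedule(queue, rates, rates)
```

Dropping expired sets reflects the assumption that an XR frame delivered after its budget is of little use; whether the proposed scheduler does this is not stated in the abstract.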
In this paper, we study satellite-terrestrial computing in sixth-generation (6G) wireless networks, where multiple terrestrial base stations (BSs) and low earth orbit (LEO) satellites collaboratively provide edge computing services to ground user equipments (GUEs) and space user equipments (SUEs) around the world. In particular, we design a complete satellite-terrestrial computing process, covering both communication and computing, according to the characteristics of 6G wireless networks. To minimize the weighted total energy consumption while meeting the delay requirements of computing tasks, we propose an energy-efficient satellite-terrestrial computing algorithm that jointly optimizes offloading selection, beamforming design, and resource allocation. Finally, both theoretical analysis and simulation results confirm the fast convergence and superior performance of the proposed algorithm.
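To make the energy/delay trade-off behind offloading selection concrete, here is a sketch for a single task under textbook edge-computing models that we assume rather than take from the paper: local energy $\kappa C f^2$ and delay $C/f$, and offloading delay $D/r + C/f_{\text{edge}}$ with Shannon rate $r = B\log_2(1 + pg/N)$. It does not model the joint beamforming or resource allocation; all names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    data_bits: float   # input size D
    cycles: float      # required CPU cycles C
    deadline_s: float  # delay requirement


def local_cost(task, f_local_hz, kappa=1e-27):
    """(delay, energy) of local execution: T = C / f, E = kappa * C * f^2."""
    return task.cycles / f_local_hz, kappa * task.cycles * f_local_hz ** 2


def offload_cost(task, bandwidth_hz, tx_power_w, channel_gain, noise_w, f_edge_hz, prop_delay_s=0.0):
    """(delay, UE transmit energy) when offloading to a BS or an LEO satellite."""
    rate = bandwidth_hz * math.log2(1.0 + tx_power_w * channel_gain / noise_w)
    t_tx = task.data_bits / rate
    return t_tx + prop_delay_s + task.cycles / f_edge_hz, tx_power_w * t_tx


def choose(task, options):
    """Lowest-energy option that meets the deadline; options maps name -> (delay, energy)."""
    feasible = {k: v for k, v in options.items() if v[0] <= task.deadline_s}
    return min(feasible, key=lambda k: feasible[k][1]) if feasible else None


task = Task(data_bits=1e6, cycles=1e9, deadline_s=0.5)
options = {
    "local": local_cost(task, f_local_hz=1e9),
    "bs": offload_cost(task, 10e6, 0.2, 1e-6, 1e-10, f_edge_hz=10e9),
    "leo": offload_cost(task, 20e6, 0.5, 1e-8, 1e-10, f_edge_hz=5e9, prop_delay_s=0.01),
}
best = choose(task, options)
```

Here local execution misses the 0.5 s deadline, and the terrestrial BS wins over the LEO satellite because of its better channel and shorter delay; with a weaker terrestrial link the choice would flip.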
In power systems, the incorporation of capacitors offers a wide range of established advantages. These include improving the system's power factor, optimizing voltage profiles, increasing the current-carrying capacity of cables and transformers, and reducing losses through reactive power compensation. Different techniques have been applied to improve the performance of distribution systems by reducing line losses. This paper focuses on reducing line losses through the optimal placement and sizing of capacitors. Optimal capacitor placement is analysed using load flow analysis with the Newton-Raphson method, and candidate buses are ranked by their sensitivity, measured by the loss sensitivity factor. The optimal capacitor size is determined using Particle Swarm Optimization (PSO). The analysis is conducted on the IEEE 14-bus system in MATLAB. The results show that placing capacitors at the most sensitive buses leads to a significant reduction in line losses. In addition, optimal sizing substantially improves the voltage profile, and the proposed method reduces power loss by 21.02 percent.
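A minimal sketch of the PSO sizing step, assuming one candidate bus has already been selected by the loss sensitivity factor and replacing the Newton-Raphson load flow with a hypothetical single-line loss model $P_{\text{loss}} = R\,(P^2 + (Q - Q_c)^2)/V^2$. The load, impedance, bounds, and PSO coefficients are made-up illustrative values.

```python
import random


def line_loss_w(qc_kvar, p_kw=800.0, q_kvar=600.0, r_ohm=0.5, v_kv=11.0):
    """Toy single-phase loss model: I^2 R with I = S / V (kVA / kV = A), result in watts."""
    return r_ohm * (p_kw ** 2 + (q_kvar - qc_kvar) ** 2) / v_kv ** 2


def pso_minimize(f, lo, hi, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal 1-D particle swarm optimisation over the interval [lo, hi]."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                          # each particle's personal best
    best_val = [f(x) for x in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g], best_val[g]    # swarm's global best
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], lo), hi)  # keep the size within bounds
            val = f(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val < g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val


qc_opt, loss_opt = pso_minimize(line_loss_w, 0.0, 1200.0)
```

In this toy model the analytic optimum is $Q_c = Q = 600$ kvar, which makes the swarm's answer easy to check; in the study itself each fitness evaluation would run a full load flow on the IEEE 14-bus system.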
In this paper, we address the problem of dynamic network embedding, that is, representing the nodes of a dynamic network as evolving vectors in a low-dimensional space. While static network embedding is a broad and well-established field, dynamic network embedding is comparatively in its infancy. We propose that a wide class of established static network embedding methods can produce interpretable and powerful dynamic network embeddings when applied to the dilated unfolded adjacency matrix. We provide a theoretical guarantee that, regardless of embedding dimension, these unfolded methods produce stable embeddings, meaning that nodes with identical latent behaviour are exchangeable, regardless of their position in time or space. We additionally define a hypothesis testing framework that evaluates the quality of a dynamic network embedding by testing for planted structure in simulated networks. Using this framework, we demonstrate that, even in trivial cases, unstable methods are often either conservative or encode incorrect structure. In contrast, our suite of stable unfolded methods is both more interpretable and more powerful than its unstable counterparts.
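As a sketch of "static method on the dilated unfolded adjacency matrix", the code below stacks the $T$ snapshots side by side into the unfolded matrix $\mathbf{A} = [\mathbf{A}^{(1)} \mid \dots \mid \mathbf{A}^{(T)}]$, forms the symmetric dilation $\begin{pmatrix} 0 & \mathbf{A} \\ \mathbf{A}^\top & 0 \end{pmatrix}$, and applies plain adjacency spectral embedding to it. The paper covers a wider class of static methods; spectral embedding is our choice here, and the function names are hypothetical.

```python
import numpy as np


def dilated_unfolded_adjacency(adjs):
    """Symmetric dilation [[0, A], [A^T, 0]] of the unfolded matrix A = [A1 | ... | AT]."""
    A = np.hstack(adjs)                        # unfolded: n x nT
    n, nT = A.shape
    return np.block([[np.zeros((n, n)), A],
                     [A.T, np.zeros((nT, nT))]])


def unfolded_spectral_embedding(adjs, d):
    """Static adjacency spectral embedding applied to the dilated matrix.

    Returns an (n x d) time-invariant anchor embedding and a (T x n x d) dynamic embedding.
    """
    D = dilated_unfolded_adjacency(adjs)
    vals, vecs = np.linalg.eigh(D)             # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:d]           # d largest (positive) eigenvalues
    X = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
    n, T = adjs[0].shape[0], len(adjs)
    return X[:n], X[n:].reshape(T, n, d)


# Two identical snapshots: a stable method must give each node the same vector at both times.
A1 = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
anchor, dynamic = unfolded_spectral_embedding([A1, A1], d=2)
```

Because every eigenvector of the dilation has the form $[\mathbf{u}; \mathbf{v}]$ with $\mathbf{v} = \mathbf{A}^\top \mathbf{u}/\lambda$, identical snapshots produce identical blocks of $\mathbf{v}$, which is a small instance of the stability property described above.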
Segment Anything Model (SAM), a vision foundation model trained on large-scale annotations, has recently attracted growing attention in medical image segmentation. Despite SAM's impressive capabilities on natural scenes, its performance declines on medical images, especially those with blurry boundaries and highly irregular, low-contrast regions. In this paper, we propose SAMIHS, a SAM-based parameter-efficient fine-tuning method for intracranial hemorrhage segmentation, which is a crucial and challenging step in stroke diagnosis and surgical planning. Unlike previous SAM and SAM-based methods, SAMIHS incorporates parameter-refactoring adapters into SAM's image encoder and makes efficient, flexible use of the adapters' parameters. Additionally, we employ a combo loss that combines binary cross-entropy loss with a boundary-sensitive loss to enhance SAMIHS's ability to recognize boundary regions. Experimental results on two public datasets demonstrate the effectiveness of the proposed method. Code is available at //github.com/mileswyn/SAMIHS.
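The abstract does not define the boundary-sensitive loss. As an illustrative stand-in, not the SAMIHS loss, the sketch below adds pixel-wise binary cross-entropy to a second cross-entropy term averaged only over ground-truth boundary pixels, so errors along the hemorrhage edge are penalized twice. `boundary_map`, `combo_loss`, and `boundary_weight` are hypothetical names.

```python
import numpy as np


def boundary_map(mask):
    """Pixels of a binary mask whose 4-neighbourhood contains the other label."""
    m = mask.astype(bool)
    p = np.pad(m, 1, mode="edge")
    neighbours = [p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]]  # up, down, left, right
    return np.any([nb != m for nb in neighbours], axis=0)


def combo_loss(prob, mask, boundary_weight=1.0, eps=1e-7):
    """BCE over all pixels plus BCE averaged over ground-truth boundary pixels."""
    prob = np.clip(prob, eps, 1.0 - eps)
    y = mask.astype(float)
    pixel_bce = -(y * np.log(prob) + (1.0 - y) * np.log(1.0 - prob))
    b = boundary_map(mask)
    boundary_term = pixel_bce[b].mean() if b.any() else 0.0
    return float(pixel_bce.mean() + boundary_weight * boundary_term)


mask = np.zeros((6, 6), dtype=int)
mask[2:4, 2:4] = 1                                  # toy 2 x 2 lesion
good = np.where(mask == 1, 0.99, 0.01)              # confident, correct prediction
blurry = good.copy()
blurry[boundary_map(mask)] = 0.5                    # uncertain along the edge
```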
In this paper, we propose applying meta-learning to low-resource automatic speech recognition (ASR). We formulate ASR for different languages as different tasks and meta-learn the initialization parameters from many pretraining languages to achieve fast adaptation to an unseen target language, using the recently proposed model-agnostic meta-learning (MAML) algorithm. We evaluate the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results show that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, because MAML is model-agnostic, this paper also opens a new research direction of applying meta-learning to more speech-related applications.
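To illustrate the MAML meta-gradient, including the second-order term that flows through the inner update, here is a toy sketch in which scalar linear-regression tasks $y = a x$ stand in for languages. It is not an ASR model; the function name, step sizes, and task slopes are hypothetical.

```python
def maml_toy(task_slopes, xs, alpha=0.05, beta=0.1, steps=500, w=0.0):
    """Exact (second-order) MAML on toy 1-D regression tasks y = a * x.

    Task loss: L_t(w) = mean((w*x - a_t*x)^2) = (w - a_t)^2 * m, with m = mean(x^2).
    Inner step: w_t' = w - alpha * dL_t/dw
    Meta step:  w <- w - beta * mean_t dL_t(w_t')/dw
    The same xs serve as support and query data for brevity.
    """
    m = sum(x * x for x in xs) / len(xs)
    for _ in range(steps):
        meta_grad = 0.0
        for a in task_slopes:
            w_adapted = w - alpha * 2.0 * (w - a) * m          # one inner gradient step
            # Chain rule through the inner step: dw'/dw = 1 - alpha * d2L/dw2 = 1 - 2*alpha*m
            meta_grad += 2.0 * (w_adapted - a) * m * (1.0 - 2.0 * alpha * m)
        w -= beta * meta_grad / len(task_slopes)
    return w


meta_init = maml_toy(task_slopes=[1.0, 2.0, 3.0, 6.0], xs=[-1.0, 0.5, 1.0, 2.0])
```

In this symmetric toy the meta-learned initialization converges to the average slope, the point from which one inner step moves closest to every task; with real networks and unequal task data, the initialization is no longer a simple average.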
BERT, a pre-trained Transformer model, has achieved ground-breaking performance on multiple NLP tasks. In this paper, we describe BERTSUM, a simple variant of BERT for extractive summarization. Our system is the state of the art on the CNN/Dailymail dataset, outperforming the previous best-performing system by 1.65 ROUGE-L. The code to reproduce our results is available at //github.com/nlpyang/BertSum.
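The abstract describes BERTSUM only as a simple BERT variant. A sketch assuming a BERTSUM-style input layout is below: a [CLS] token is inserted before every sentence so each sentence gets its own vector to score, and segment ids alternate between sentences. Whitespace splitting stands in for WordPiece tokenization, the sentence scores are made up, and the function names are hypothetical.

```python
def bertsum_inputs(sentences, max_len=512):
    """Hypothetical BERTSUM-style layout: [CLS] tokens [SEP] per sentence.

    Segment ids alternate 0/1 between sentences; cls_positions index the [CLS]
    tokens whose output vectors would be scored for extraction.
    """
    tokens, segments, cls_positions = [], [], []
    for i, sent in enumerate(sentences):
        piece = ["[CLS]"] + sent.split() + ["[SEP]"]
        if len(tokens) + len(piece) > max_len:
            break  # truncate at sentence granularity
        cls_positions.append(len(tokens))
        tokens.extend(piece)
        segments.extend([i % 2] * len(piece))
    return tokens, segments, cls_positions


def select_summary(sentences, scores, k=3):
    """Pick the k highest-scoring sentences and return them in document order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


sents = ["the cat sat", "it was warm", "dogs bark"]
tokens, segments, cls_positions = bertsum_inputs(sents)
summary = select_summary(sents, scores=[0.2, 0.9, 0.5], k=2)
```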