The global burden of acute and chronic wounds presents a compelling case for enhancing wound classification methods, a vital step in diagnosing and determining optimal treatments. Recognizing this need, we introduce an innovative multi-modal network based on a deep convolutional neural network for categorizing wounds into four categories: diabetic, pressure, surgical, and venous ulcers. Our multi-modal network uses wound images and their corresponding body locations for more precise classification. A unique aspect of our methodology is incorporating a body map system that facilitates accurate wound location tagging, improving upon traditional wound image classification techniques. A distinctive feature of our approach is the integration of models such as VGG16, ResNet152, and EfficientNet within a novel architecture. This architecture includes elements like spatial and channel-wise Squeeze-and-Excitation modules, Axial Attention, and an Adaptive Gated Multi-Layer Perceptron, providing a robust foundation for classification. Our multi-modal network was trained and evaluated on two distinct datasets comprising relevant images and corresponding location information. Notably, our proposed network outperformed traditional methods, reaching an accuracy range of 74.79% to 100% for Region of Interest (ROI) without location classifications, 73.98% to 100% for ROI with location classifications, and 78.10% to 100% for whole image classifications. This marks a significant enhancement over previously reported performance metrics in the literature. Our results indicate the potential of our multi-modal network as an effective decision-support tool for wound image classification, paving the way for its application in various clinical contexts.
We present a novel approach to cooperative aerial transportation through a team of drones, using optimal control theory and a hierarchical control strategy. We assume the drones are connected to the payload through rigid attachments, essentially transforming the whole system into a larger flying object with "thrust modules" at the attachment locations of the drones. We investigate the optimal arrangement of the thrust modules around the payload, so that the resulting system is robust to disturbances. We choose the $\mathcal{H}_2$ norm as a measure of robustness, and propose an iterative optimization routine to compute the optimal layout of the vehicles around the object. We experimentally validate our approach using four drones and comparing the disturbance rejection performances achieved by two different layouts (the optimal one and a sub-optimal one), and observe that the results match our predictions.
Product attribute value extraction plays an important role for many real-world applications in e-Commerce such as product search and recommendation. Previous methods treat it as a sequence labeling task that needs more annotation for position of values in the product text. This limits their application to real-world scenario in which only attribute values are weakly-annotated for each product without their position. Moreover, these methods only use product text (i.e., product title and description) and do not consider the semantic connection between the multiple attribute values of a given product and its text, which can help attribute value extraction. In this paper, we reformulate this task as a multi-label classification task that can be applied for real-world scenario in which only annotation of attribute values is available to train models (i.e., annotation of positional information of attribute values is not available). We propose a classification model with semantic matching and negative label sampling for attribute value extraction. Semantic matching aims to capture semantic interactions between attribute values of a given product and its text. Negative label sampling aims to enhance the model's ability of distinguishing similar values belonging to the same attribute. Experimental results on three subsets of a large real-world e-Commerce dataset demonstrate the effectiveness and superiority of our proposed model.
Deep Neural Networks and Reinforcement Learning methods have empirically shown great promise in tackling challenging combinatorial problems. In those methods a deep neural network is used as a solution generator which is then trained by gradient-based methods (e.g., policy gradient) to successively obtain better solution distributions. In this work we introduce a novel theoretical framework for analyzing the effectiveness of such methods. We ask whether there exist generative models that (i) are expressive enough to generate approximately optimal solutions; (ii) have a tractable, i.e, polynomial in the size of the input, number of parameters; (iii) their optimization landscape is benign in the sense that it does not contain sub-optimal stationary points. Our main contribution is a positive answer to this question. Our result holds for a broad class of combinatorial problems including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem. As a byproduct of our analysis we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
Listener head generation centers on generating non-verbal behaviors (e.g., smile) of a listener in reference to the information delivered by a speaker. A significant challenge when generating such responses is the non-deterministic nature of fine-grained facial expressions during a conversation, which varies depending on the emotions and attitudes of both the speaker and the listener. To tackle this problem, we propose the Emotional Listener Portrait (ELP), which treats each fine-grained facial motion as a composition of several discrete motion-codewords and explicitly models the probability distribution of the motions under different emotion in conversation. Benefiting from the ``explicit'' and ``discrete'' design, our ELP model can not only automatically generate natural and diverse responses toward a given speaker via sampling from the learned distribution but also generate controllable responses with a predetermined attitude. Under several quantitative metrics, our ELP exhibits significant improvements compared to previous methods.
This paper summarizes the progress in developing a rugged, low-cost, automated ground cone robot network capable of traffic delineation at lane-level precision. A holonomic omnidirectional base with a traffic delineator was developed to allow flexibility in initialization. RTK GPS was utilized to reduce minimum position error to 2 centimeters. Due to recent developments, the cost of the platform is now less than $1,600. To minimize the effects of GPS-denied environments, wheel encoders and an Extended Kalman Filter were implemented to maintain lane-level accuracy during operation and a maximum error of 1.97 meters through 50 meters with little to no GPS signal. Future work includes increasing the operational speed of the platforms, incorporating lanelet information for path planning, and cross-platform estimation.
This paper presents a comprehensive review of the current state of the art in quantum navigation systems, with a specific focus on their application in maritime navigation. Quantum technologies have the potential to revolutionise navigation and positioning systems due to their ability to provide highly accurate and secure information. The review covers the principles of quantum navigation and highlights the latest developments in quantum-enhanced sensors, atomic clocks, and quantum communication protocols. The paper also discusses the challenges and opportunities of using quantum technologies in maritime navigation, including the effects that the maritime environment and the specificity of marine applications can have on the performance of quantum sensors. Finally, the paper concludes with a discussion on the future of quantum navigation systems and their potential impact on the maritime industry. This review aims at providing a valuable resource for researchers and engineers interested in the development and deployment of quantum navigation systems.
Creating presentation materials requires complex multimodal reasoning skills to summarize key concepts and arrange them in a logical and visually pleasing manner. Can machines learn to emulate this laborious process? We present a novel task and approach for document-to-slide generation. Solving this involves document summarization, image and text retrieval, slide structure and layout prediction to arrange key elements in a form suitable for presentation. We propose a hierarchical sequence-to-sequence approach to tackle our task in an end-to-end manner. Our approach exploits the inherent structures within documents and slides and incorporates paraphrasing and layout prediction modules to generate slides. To help accelerate research in this domain, we release a dataset about 6K paired documents and slide decks used in our experiments. We show that our approach outperforms strong baselines and produces slides with rich content and aligned imagery.
Sampling methods (e.g., node-wise, layer-wise, or subgraph) has become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage that necessities mitigating both types of variance to obtain faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization compared to the existing methods.
High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristic of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we proposed an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we used a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of a large amount of unlabeled data, we adopted a conditional random field to refine the preliminary classification results generated by GANs. Experimental results obtained using two commonly studied datasets demonstrate that the proposed framework achieved encouraging classification accuracy using a small number of data for training.
In order to answer natural language questions over knowledge graphs, most processing pipelines involve entity and relation linking. Traditionally, entity linking and relation linking has been performed either as dependent sequential tasks or independent parallel tasks. In this paper, we propose a framework called "EARL", which performs entity linking and relation linking as a joint single task. EARL uses a graph connection based solution to the problem. We model the linking task as an instance of the Generalised Travelling Salesman Problem (GTSP) and use GTSP approximate algorithm solutions. We later develop EARL which uses a pair-wise graph-distance based solution to the problem.The system determines the best semantic connection between all keywords of the question by referring to a knowledge graph. This is achieved by exploiting the "connection density" between entity candidates and relation candidates. The "connection density" based solution performs at par with the approximate GTSP solution.We have empirically evaluated the framework on a dataset with 5000 questions. Our system surpasses state-of-the-art scores for entity linking task by reporting an accuracy of 0.65 to 0.40 from the next best entity linker.