In this study, we aim to initiate the development of Radiology Foundation Model, termed as RadFM.We consider the construction of foundational models from the perspectives of dataset construction, model design, and thorough evaluation. Our contribution can be concluded as follows: (i), we construct a large-scale Medical Multi-modal Dataset, MedMD, which consists of 16M 2D and 3D medical scans with high-quality text descriptions or reports across various data formats, modalities, and tasks, covering over 5000 distinct diseases. To the best of our knowledge, this is the first large-scale, high-quality, medical visual-language dataset, with both 2D and 3D scans; (ii ), we propose an architecture that enables visually conditioned generative pre-training, i.e., allowing for integration of text input with 2D or 3D medical scans, and generate responses for diverse radiologic tasks. The model was initially pre-trained on MedMD and subsequently fine-tuned on the domain-specific dataset, which is a radiologic cleaned version of MedMD, containing 3M radiologic visual-language pairs, termed as RadMD; (iii), we propose a new evaluation benchmark, RadBench, that comprises five tasks, including modality recognition, disease diagnosis, visual question answering, report generation and rationale diagnosis, aiming to comprehensively assess the capability of foundation models in handling practical clinical problems. We conduct both automatic and human evaluation on RadBench, in both cases, RadFM significantly outperforms existing multi-modal foundation models. The codes, data, and model checkpoint will all be made publicly available to promote further research and development in the field.
The following is a technical report to test the validity of the proposed Subspace Pyramid Fusion Module (SPFM) to capture multi-scale feature representations, which is more useful for semantic segmentation. In this investigation, we have proposed the Efficient Shuffle Attention Module(ESAM) to reconstruct the skip-connections paths by fusing multi-level global context features. Experimental results on two well-known semantic segmentation datasets, including Camvid and Cityscapes, show the effectiveness of our proposed method.
Urban Air Mobility (UAM) represents a promising solution for future transportation. In this study, we introduce VertiSim, an advanced event-driven simulator developed to evaluate e-VTOL transportation networks. Uniquely, VertiSim simultaneously models passenger, aircraft, and energy flows, reflecting the interrelated complexities of UAM systems. We utilized VertiSim to assess 19 operational scenarios serving a daily demand for 2,834 passengers with varying fleet sizes and vertiport distances. The study aims to support stakeholders in making informed decisions about fleet size, network design, and infrastructure development by understanding tradeoffs in passenger delay time, operational costs, and fleet utilization. Our simulations, guided by a heuristic dispatch and charge policy, indicate that fleet size significantly influences passenger delay and energy consumption within UAM networks. We find that increasing the fleet size can reduce average passenger delays, but this comes at the cost of higher operational expenses due to an increase in the number of repositioning flights. Additionally, our analysis highlights how vertiport distances impact fleet utilization: longer distances result in reduced total idle time and increased cruise and charge times, leading to more efficient fleet utilization but also longer passenger delays. These findings are important for UAM network planning, especially in balancing fleet size with vertiport capacity and operational costs. Simulator demo is available at: //tinyurl.com/vertisim-vis
In this work, we tested the Triplet Extraction (TE) capabilities of a variety of Large Language Models (LLMs) of different sizes in the Zero- and Few-Shots settings. In detail, we proposed a pipeline that dynamically gathers contextual information from a Knowledge Base (KB), both in the form of context triplets and of (sentence, triplets) pairs as examples, and provides it to the LLM through a prompt. The additional context allowed the LLMs to be competitive with all the older fully trained baselines based on the Bidirectional Long Short-Term Memory (BiLSTM) Network architecture. We further conducted a detailed analysis of the quality of the gathered KB context, finding it to be strongly correlated with the final TE performance of the model. In contrast, the size of the model appeared to only logarithmically improve the TE capabilities of the LLMs.
Dynamic Bayesian Networks (DBNs), renowned for their interpretability, have become increasingly vital in representing complex stochastic processes in various domains such as gene expression analysis, healthcare, and traffic prediction. Structure learning of DBNs from data is challenging, particularly for datasets with thousands of variables. Most current algorithms for DBN structure learning are adaptations from those used in static Bayesian Networks (BNs), and are typically focused on small-scale problems. In order to solve large-scale problems while taking full advantage of existing algorithms, this paper introduces a novel divide-and-conquer strategy, originally developed for static BNs, and adapts it for large-scale DBN structure learning. In this work, we specifically concentrate on 2 Time-sliced Bayesian Networks (2-TBNs), a special class of DBNs. Furthermore, we leverage the prior knowledge of 2-TBNs to enhance the performance of the strategy we introduce. Our approach significantly improves the scalability and accuracy of 2-TBN structure learning. Experimental results demonstrate the effectiveness of our method, showing substantial improvements over existing algorithms in both computational efficiency and structure learning accuracy. On problem instances with more than 1,000 variables, our approach improves two accuracy metrics by 74.45% and 110.94% on average , respectively, while reducing runtime by 93.65% on average.
Semantic communication is a promising communication paradigm that utilizes Deep Neural Networks (DNNs) to extract the information relevant to downstream tasks, hence significantly reducing the amount of transmitted data. In current practice, the semantic communication transmitter for a specific task is typically pre-trained and shared by all users. However, due to user heterogeneity, it is desirable to use different transmitters according to the available computational and communication resources of users. In this paper, we first show that it is possible to dynamically adjust the computational and communication overhead of DNN-based transmitters, thereby achieving adaptive semantic communication. After that, we investigate the user association and resource allocation problem in a multi-cell network where users are equipped with adaptive semantic communication transmitters. To solve this problem, we decompose it into three subproblems involving the scheduling of each user, the resource allocation of each base station (BS), and the user association between users and BSs. Then we solve each problem progressively based on the solution of the previous subproblem. The final algorithm can obtain near-optimal solutions in polynomial time. Numerical results show that our algorithm outperforms benchmarks under various situations.
As one of the potential key technologies of 6G, semantic communication is still in its infancy and there are many open problems, such as semantic entropy definition and semantic channel coding theory. To address these challenges, we investigate semantic information measures and semantic channel coding theorem. Specifically, we propose a semantic entropy definition as the uncertainty in the semantic interpretation of random variable symbols in the context of knowledge bases, which can be transformed into existing semantic entropy definitions under given conditions. Moreover, different from traditional communications, semantic communications can achieve accurate transmission of semantic information under a non-zero bit error rate. Based on this property, we derive a semantic channel coding theorem for a typical semantic communication with many-to-one source (i.e., multiple source sequences express the same meaning), and prove its achievability and converse based on a generalized Fano's inequality. Finally, numerical results verify the effectiveness of the proposed semantic entropy and semantic channel coding theorem.
In this preliminary study, we provide two methods for estimating the gradients of functions of real value. Both methods are built on derivative estimations that are calculated using the standard method or the Squire-Trapp method for any given direction. Gradients are computed as the average of derivatives in uniformly sampled directions. The first method uses a uniformly distributed set of axes that consists of orthogonal unit vectors that span the space. The second method only uses a uniformly distributed set of unit vectors. Both methods essentially minimize the error through an average of estimations to cancel error terms. Both methods are essentially a conceptual generalization of the method used to estimate normal fractal surfaces.
This article presents the affordances that Generative Artificial Intelligence can have in disinformation context, one of the major threats to our digitalized society. We present a research framework to generate customized agent-based social networks for disinformation simulations that would enable understanding and evaluation of the phenomena whilst discussing open challenges.
Incompleteness is a common problem for existing knowledge graphs (KGs), and the completion of KG which aims to predict links between entities is challenging. Most existing KG completion methods only consider the direct relation between nodes and ignore the relation paths which contain useful information for link prediction. Recently, a few methods take relation paths into consideration but pay less attention to the order of relations in paths which is important for reasoning. In addition, these path-based models always ignore nonlinear contributions of path features for link prediction. To solve these problems, we propose a novel KG completion method named OPTransE. Instead of embedding both entities of a relation into the same latent space as in previous methods, we project the head entity and the tail entity of each relation into different spaces to guarantee the order of relations in the path. Meanwhile, we adopt a pooling strategy to extract nonlinear and complex features of different paths to further improve the performance of link prediction. Experimental results on two benchmark datasets show that the proposed model OPTransE performs better than state-of-the-art methods.
With the rapid growth of knowledge bases (KBs), question answering over knowledge base, a.k.a. KBQA has drawn huge attention in recent years. Most of the existing KBQA methods follow so called encoder-compare framework. They map the question and the KB facts to a common embedding space, in which the similarity between the question vector and the fact vectors can be conveniently computed. This, however, inevitably loses original words interaction information. To preserve more original information, we propose an attentive recurrent neural network with similarity matrix based convolutional neural network (AR-SMCNN) model, which is able to capture comprehensive hierarchical information utilizing the advantages of both RNN and CNN. We use RNN to capture semantic-level correlation by its sequential modeling nature, and use an attention mechanism to keep track of the entities and relations simultaneously. Meanwhile, we use a similarity matrix based CNN with two-directions pooling to extract literal-level words interaction matching utilizing CNNs strength of modeling spatial correlation among data. Moreover, we have developed a new heuristic extension method for entity detection, which significantly decreases the effect of noise. Our method has outperformed the state-of-the-arts on SimpleQuestion benchmark in both accuracy and efficiency.