Introduction: The current study investigates the complex association between COVID-19 and the studied districts' socioecology (e.g. internal and external built environment, sociodemographic profiles) to quantify their contributions to the early outbreaks and epidemic resurgence of COVID-19. Methods: We aligned the analytic model's architecture with the hierarchical structure of residents' socioecology, using a multi-headed hierarchical convolutional neural network to structure the vast array of hierarchically related predictive features, representing buildings' internal and external built environments and residents' sociodemographic profiles, as model input. COVID-19 cases accumulated in buildings across three adjacent districts in Hong Kong (HK), both before and during HK's epidemic resurgence, were modeled. A forward-chaining validation was performed to examine the model's performance in forecasting COVID-19 cases over 3-, 7-, and 14-day horizons during the two months after the resurgence model was built, in line with the forecasting needs of an evolving pandemic. Results: Different sets of factors were found to be linked to the earlier waves of COVID-19 outbreaks than to the epidemic resurgence. Sociodemographic factors such as work hours, monthly household income, employment types, and the number of non-working adults or children in household populations were of high importance to the studied buildings' COVID-19 case counts during the early waves of COVID-19. Factors constituting one's internal built environment, such as the number of distinct households in the buildings, the number of distinct households per floor, and the number of floors, corridors, and lifts, had the greatest unique contributions to the building-level COVID-19 case counts during epidemic resurgence.
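To make the modeling setup above concrete, below is a minimal, hedged sketch (not the study's actual code) of a multi-headed network that takes two hierarchically related feature groups per building, one head for built-environment features and one for sociodemographic features, and predicts a building-level COVID-19 case count; the feature-group sizes, the 1D-convolutional treatment of the grouped features, and the fusion layer are all illustrative assumptions.

```python
# Hedged sketch of a two-headed model for building-level case counts.
# Feature names, group sizes, and architecture details are illustrative only.
import torch
import torch.nn as nn

class MultiHeadBuildingNet(nn.Module):
    def __init__(self, n_built_env: int = 12, n_sociodem: int = 20):
        super().__init__()
        # Head 1: internal/external built-environment features (e.g. floors, lifts).
        self.built_env_head = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(4), nn.Flatten(), nn.Linear(8 * 4, 16), nn.ReLU(),
        )
        # Head 2: sociodemographic features (e.g. work hours, household income).
        self.sociodem_head = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(4), nn.Flatten(), nn.Linear(8 * 4, 16), nn.ReLU(),
        )
        # Fusion layer maps the concatenated head outputs to a predicted case count.
        self.regressor = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, built_env, sociodem):
        h1 = self.built_env_head(built_env.unsqueeze(1))   # (B, 1, n_built_env)
        h2 = self.sociodem_head(sociodem.unsqueeze(1))     # (B, 1, n_sociodem)
        return self.regressor(torch.cat([h1, h2], dim=1))  # (B, 1) predicted cases

# Toy forward pass with random per-building feature vectors.
model = MultiHeadBuildingNet()
pred = model(torch.randn(4, 12), torch.randn(4, 20))
print(pred.shape)  # torch.Size([4, 1])
```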
In Internet of Things (IoT) status update systems, where information is sampled and subsequently transmitted from a source to a destination node, it is imperative to maintain the timeliness of information and to update the system at an optimal frequency. Optimizing information freshness in resource-limited status update systems often involves Constrained Markov Decision Process (CMDP) problems with update rate constraints. Solving CMDP problems, especially with multiple constraints, is a challenging task. To address this, we present a token-based approach that transforms the CMDP into an unconstrained MDP, simplifying the solution process. We apply this approach to systems with one and two update rate constraints for optimizing the Age of Incorrect Information (AoII) and Age of Information (AoI) metrics, respectively, and explore the analytical and numerical aspects. Additionally, we introduce an iterative triangle bisection method for solving CMDP problems with two constraints and compare its results with the token-based MDP approach. Our findings show that the token-based approach yields superior performance over baseline policies, converging to the optimal policy as the maximum number of tokens increases.
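As an illustration of the token-based idea described above, the following sketch (an interpretation, not the paper's formulation) augments a truncated AoI state with a token bucket, so the update-rate constraint becomes a hard resource in an unconstrained MDP that plain value iteration can solve; the AoI cap, token cap, token arrival probability, and reliable channel are assumptions.

```python
# Hedged sketch: value iteration on an AoI state augmented with a token bucket.
import itertools
import numpy as np

AOI_MAX, TOKEN_MAX = 10, 3   # truncated age of information, token bucket size
P_TOKEN = 0.3                # token arrival prob ~ allowed long-run update rate
GAMMA = 0.95

states = list(itertools.product(range(1, AOI_MAX + 1), range(TOKEN_MAX + 1)))
V = {s: 0.0 for s in states}

def step(aoi, tokens, transmit):
    """Return a list of (prob, next_state, cost) outcomes for one decision epoch."""
    cost = aoi                                  # instantaneous AoI cost
    success = transmit and tokens > 0           # an update needs an available token
    next_aoi = 1 if success else min(aoi + 1, AOI_MAX)
    next_tok = tokens - 1 if success else tokens
    return [(p, (next_aoi, min(next_tok + arrived, TOKEN_MAX)), cost)
            for arrived, p in ((1, P_TOKEN), (0, 1 - P_TOKEN))]

for _ in range(500):                            # value iteration to convergence
    V = {s: min(sum(p * (c + GAMMA * V[ns]) for p, ns, c in step(*s, a))
                for a in (0, 1))
         for s in states}

policy = {s: min((0, 1), key=lambda a: sum(p * (c + GAMMA * V[ns])
                                           for p, ns, c in step(*s, a)))
          for s in states}
print(policy[(5, 2)])  # e.g. transmit (1) when AoI is high and tokens remain
```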
This research evaluates the non-commercial open-source large language models (LLMs) Meditron, MedAlpaca, Mistral, and Llama-2 for their efficacy in interpreting medical guidelines stored in PDF format. As a specific test scenario, we applied these models to the guidelines for hypertension in children and adolescents provided by the European Society of Cardiology (ESC). Leveraging Streamlit, a Python library, we developed a user-friendly medical document chatbot tool (MedDoc-Bot). This tool enables authorized users to upload PDF files and pose questions, generating interpretive responses from four locally stored LLMs. A pediatric expert provides a benchmark for evaluation by formulating questions and reference responses extracted from the ESC guidelines, and rates the model-generated responses on their fidelity and relevance. Additionally, we computed METEOR and chrF scores to assess the similarity of model responses to the reference answers. Our study found that Llama-2 and Mistral performed well in the metric evaluation; however, Llama-2 was slower when dealing with text and tabular data. In our human evaluation, responses generated by Mistral, Meditron, and Llama-2 exhibited reasonable fidelity and relevance. This study provides valuable insights into the strengths and limitations of LLMs for future developments in medical document interpretation. Open-Source Code: https://github.com/yaseen28/MedDoc-Bot
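For orientation only, here is a minimal Streamlit sketch in the spirit of MedDoc-Bot (not the code in the linked repository): it uploads a guideline PDF, extracts its text, and queries a locally hosted model; the placeholder model, the naive truncation instead of retrieval, and the prompt format are assumptions, whereas the real tool serves four locally stored LLMs.

```python
# Illustrative sketch only; not the MedDoc-Bot repository code.
import streamlit as st
from pypdf import PdfReader
from transformers import pipeline

st.title("Guideline Q&A (sketch)")

uploaded = st.file_uploader("Upload a guideline PDF", type="pdf")
question = st.text_input("Ask a question about the document")

@st.cache_resource
def load_model():
    # Placeholder local model; in practice swap in Mistral/Llama-2/Meditron/MedAlpaca weights.
    return pipeline("text-generation", model="distilgpt2")

if uploaded and question:
    text = "\n".join(page.extract_text() or "" for page in PdfReader(uploaded).pages)
    context = text[:1500]  # naive truncation; a real tool would chunk and retrieve
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    answer = load_model()(prompt, max_new_tokens=128)[0]["generated_text"]
    st.write(answer[len(prompt):])
```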
In this work, we propose a martingale-based neural network, SOC-MartNet, for solving high-dimensional Hamilton-Jacobi-Bellman (HJB) equations, where no explicit expression is needed for the Hamiltonian $\inf_{u \in U} H(t,x,u,z,p)$, and stochastic optimal control problems (SOCPs) with controls on both drift and volatility. We reformulate the HJB equations into a stochastic neural network learning process, i.e., training a control network and a value network such that the associated Hamiltonian process is minimized and the cost process becomes a martingale. To enforce the martingale property of the cost process, we employ an adversarial network and construct a loss function based on the projection property of conditional expectations. The control/value networks and the adversarial network are then trained adversarially, so that the cost process is driven towards a martingale and the minimum principle is satisfied by the control. Numerical results show that the proposed SOC-MartNet is effective and efficient for solving HJB-type equations and SOCPs with dimensions up to $500$ in a small number of training epochs.
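The following hedged sketch illustrates the kind of martingale-enforcing adversarial loss the abstract describes (it is not the authors' implementation): a value network, a control network, and an adversarial test network are evaluated on one Euler step of toy controlled dynamics, and the projection of the cost-process increment onto the test function forms the loss; the dynamics, running cost, and network sizes are illustrative assumptions.

```python
# Hedged sketch of a martingale-style adversarial loss; toy dynamics and cost.
import torch
import torch.nn as nn

d, batch, dt = 5, 256, 0.01

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(), nn.Linear(64, out_dim))

value_net = mlp(d + 1, 1)      # V(t, x)
control_net = mlp(d + 1, d)    # u(t, x)
adversary = mlp(d + 1, 1)      # test function rho(t, x) for the projection loss

def running_cost(x, u):
    return x.pow(2).sum(1, keepdim=True) + u.pow(2).sum(1, keepdim=True)

t = torch.rand(batch, 1)
x = torch.randn(batch, d)
u = control_net(torch.cat([t, x], 1))
# One Euler step of toy controlled dynamics dX = u dt + dW.
x_next = x + u * dt + torch.randn_like(x) * dt**0.5

# Cost-process increment: V(t+dt, X_{t+dt}) - V(t, X_t) + f(X_t, u) dt.
dM = (value_net(torch.cat([t + dt, x_next], 1))
      - value_net(torch.cat([t, x], 1))
      + running_cost(x, u) * dt)

rho = adversary(torch.cat([t, x], 1))
martingale_loss = (rho * dM).mean().pow(2)   # projection of dM onto the test function

# In the full method, value/control nets minimize this (plus a Hamiltonian term)
# while the adversary maximizes it; only the loss evaluation is sketched here.
print(martingale_loss.item())
```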
Motivation: Alzheimer's Disease hallmarks include amyloid-beta deposits and brain atrophy, detectable via PET and MRI scans, respectively. PET is expensive, invasive, and exposes patients to ionizing radiation. MRI is cheaper, non-invasive, and free from ionizing radiation, but limited to measuring brain atrophy. Goal: To develop a 3D image translation model that synthesizes amyloid-beta PET images from T1-weighted MRI, exploiting the known relationship between amyloid-beta and brain atrophy. Approach: The model was trained on 616 PET/MRI pairs and validated with 264 pairs. Results: The model synthesized amyloid-beta PET images from T1-weighted MRI with a high degree of similarity, reflected in high SSIM and PSNR values (SSIM > 0.95, PSNR = 28). Impact: Our model demonstrates the feasibility of synthesizing amyloid-beta PET images from structural MRI, significantly enhancing accessibility for large-cohort studies and early dementia detection, while also reducing cost, invasiveness, and radiation exposure.
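As a small, hedged illustration of the reported image-quality metrics, the snippet below computes SSIM and PSNR between a ground-truth and a synthesized PET volume using scikit-image; the random arrays stand in for real co-registered volumes and the volume shape is an arbitrary assumption.

```python
# Hedged sketch of the SSIM/PSNR evaluation; random arrays stand in for real volumes.
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

real_pet = np.random.rand(96, 96, 96).astype(np.float32)     # ground-truth PET volume
synth_pet = np.clip(real_pet + 0.02 * np.random.randn(96, 96, 96), 0, 1).astype(np.float32)

data_range = float(real_pet.max() - real_pet.min())
ssim = structural_similarity(real_pet, synth_pet, data_range=data_range)
psnr = peak_signal_noise_ratio(real_pet, synth_pet, data_range=data_range)
print(f"SSIM={ssim:.3f}, PSNR={psnr:.1f} dB")  # the abstract reports SSIM > 0.95 and PSNR = 28
```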
The integration of AI assistants, especially through the development of Large Language Models (LLMs), into computer science education has sparked significant debate. An emerging body of work has looked into using LLMs in education, but few studies have examined the impacts of LLMs on students in entry-level programming courses, particularly in real-world contexts and over extended periods. To address this research gap, we conducted a semester-long, between-subjects study with 50 students using CodeTutor, an LLM-powered assistant developed by our research team. Our results show that students who used CodeTutor (the experimental group) achieved statistically significant improvements in their final scores compared to peers who did not use the tool (the control group). Within the experimental group, those without prior experience with LLM-powered tools demonstrated significantly greater performance gains than their counterparts. Students gave positive feedback on CodeTutor's capabilities but raised concerns about its limited role in developing critical thinking skills. Over the semester, students' agreement with CodeTutor's suggestions decreased, with a growing preference for support from traditional human teaching assistants. Our analysis further reveals that the quality of user prompts was significantly correlated with the effectiveness of CodeTutor's responses. Building on these results, we discuss the implications of our findings for integrating Generative AI literacy into curricula to foster critical thinking skills, and we examine the temporal dynamics of user engagement with LLM-powered tools. We further discuss the discrepancy between the anticipated functions of such tools and students' actual capabilities, which highlights the need for tailored strategies to improve educational outcomes.
This article presents the affordances that Generative Artificial Intelligence can have in the context of disinformation, one of the major threats to our digitalized society. We present a research framework for generating customized agent-based social networks for disinformation simulations that would enable understanding and evaluating the phenomenon, and we discuss open challenges.
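As a toy illustration of what an agent-based disinformation simulation on a synthetic social network might look like (a hedged sketch, not the proposed framework's code), the snippet below seeds a false claim among a few accounts on a scale-free graph and lets it spread probabilistically; the network model, share probability, and seeding are assumptions.

```python
# Hedged toy sketch of disinformation spread on a synthetic social network.
import random
import networkx as nx

random.seed(0)
G = nx.barabasi_albert_graph(n=500, m=3, seed=0)   # scale-free "social" network
believes = {node: False for node in G.nodes}
for s in random.sample(list(G.nodes), 5):          # accounts seeding the false claim
    believes[s] = True

SHARE_PROB = 0.08                                  # chance a believer convinces a neighbor
for step in range(20):
    newly = []
    for node, b in believes.items():
        if b:
            for nb in G.neighbors(node):
                if not believes[nb] and random.random() < SHARE_PROB:
                    newly.append(nb)
    for n in newly:
        believes[n] = True
    print(f"step {step:2d}: {sum(believes.values())} agents exposed")
```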
Knowledge plays a critical role in artificial intelligence. Recently, the extensive success of pre-trained language models (PLMs) has drawn significant attention to how knowledge can be acquired, maintained, updated, and used by language models. Despite the enormous amount of related studies, a unified view of how knowledge circulates within language models throughout the learning, tuning, and application processes is still lacking, which may prevent us from further understanding the connections between current advances or from recognizing existing limitations. In this survey, we revisit PLMs as knowledge-based systems by dividing the life cycle of knowledge in PLMs into five critical periods and investigating how knowledge circulates as it is built, maintained, and used. To this end, we systematically review existing studies of each period of the knowledge life cycle, summarize the main challenges and current limitations, and discuss future directions.
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
We study the problem of incorporating prior knowledge into a deep Transformer-based model, i.e., Bidirectional Encoder Representations from Transformers (BERT), to enhance its performance on semantic textual matching tasks. By probing and analyzing what BERT already knows when solving this task, we obtain a better understanding of what task-specific knowledge BERT needs the most and where it is most needed. The analysis further motivates us to take a different approach than most existing works. Instead of using prior knowledge to create a new training task for fine-tuning BERT, we directly inject knowledge into BERT's multi-head attention mechanism. This leads us to a simple yet effective approach that enjoys a fast training stage, as it saves the model from training on additional data or tasks other than the main task. Extensive experiments demonstrate that the proposed knowledge-enhanced BERT is able to consistently improve semantic textual matching performance over the original BERT model, and the performance benefit is most salient when training data is scarce.
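A generic, hedged sketch of the injection idea follows (the paper's exact scheme, knowledge source, and scaling are not reproduced here): pairwise prior knowledge, e.g. that two tokens are synonyms, is added as a bias to the multi-head attention logits before the softmax.

```python
# Hedged sketch: adding a pairwise knowledge bias to attention logits.
import torch
import torch.nn.functional as F

batch, heads, seq, d_head = 2, 4, 8, 16
q = torch.randn(batch, heads, seq, d_head)
k = torch.randn(batch, heads, seq, d_head)
v = torch.randn(batch, heads, seq, d_head)

# knowledge[b, i, j] > 0 when tokens i and j are known to be related
# (e.g. synonyms from a lexical resource); zero otherwise.
knowledge = torch.zeros(batch, seq, seq)
knowledge[:, 2, 5] = knowledge[:, 5, 2] = 1.0

scores = q @ k.transpose(-2, -1) / d_head**0.5   # (B, H, S, S) raw attention logits
scores = scores + knowledge.unsqueeze(1)         # broadcast the knowledge bias over heads
attn = F.softmax(scores, dim=-1)
out = attn @ v                                   # knowledge-aware attention output
print(out.shape)  # torch.Size([2, 4, 8, 16])
```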
Deep convolutional neural networks (CNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment in devices with low memory resources or in applications with strict latency requirements. A natural approach, therefore, is to perform model compression and acceleration in deep networks without significantly decreasing model performance. Tremendous progress has been made in this area over the past few years. In this paper, we survey recently developed techniques for compacting and accelerating CNN models. These techniques are roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and sharing are described first, followed by the other techniques. For each scheme, we provide an insightful analysis of the performance, related applications, advantages, and drawbacks. We then go through several very recent successful methods, for example, dynamic capacity networks and stochastic depth networks. After that, we survey the evaluation metrics, the main datasets used for evaluating model performance, and recent benchmarking efforts. Finally, we conclude the paper and discuss remaining challenges and possible directions on this topic.
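To make the first category concrete, here is a hedged sketch of one representative technique, magnitude-based parameter pruning, which zeros out the weights with the smallest absolute values; the layer, sparsity level, and the omission of fine-tuning are illustrative choices rather than a prescription from any surveyed method.

```python
# Hedged sketch of magnitude-based parameter pruning on a single layer.
import torch
import torch.nn as nn

layer = nn.Linear(256, 128)
sparsity = 0.7                                   # prune 70% of the weights

with torch.no_grad():
    w = layer.weight
    # Threshold at the k-th smallest absolute value, then mask out everything below it.
    threshold = w.abs().flatten().kthvalue(int(sparsity * w.numel())).values
    mask = (w.abs() > threshold).float()
    w.mul_(mask)

print(f"remaining nonzero weights: {int(mask.sum())} / {w.numel()}")
```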