With the increasing importance of remote healthcare monitoring in the healthcare industry, it is essential to evaluate the usefulness and ease of use that this technology brings to remote healthcare. With this research, we aim to understand the perspectives of healthcare professionals, their competencies in using technology related to remote healthcare monitoring, and their trust in and adoption of that technology. In addition to these core factors, we introduce sustainability as a pivotal dimension of the Technology Acceptance Model, reflecting its importance in motivating and determining the use of remote healthcare technology. The results suggest that participants hold a positive view of remote monitoring devices for telemedicine, but have concerns about security and privacy and believe that network coverage in remote areas needs to improve. Advances in technology and a focus on sustainable development can nevertheless facilitate more effective and widespread adoption of remote monitoring devices in telemedicine.
Understanding the severity of conditions shown in images is crucial in medical diagnosis, serving as a key guide for clinical assessment and treatment, as well as for evaluating longitudinal progression. This paper proposes ConPrO: a novel representation learning method for severity assessment in medical images using Contrastive learning-integrated Preference Optimization. Unlike conventional contrastive learning methods, which maximize the distance between classes, ConPrO injects into the latent vector knowledge of the preferred distances between the various severity classes and the normal class. We systematically examine the key components of our framework to illuminate how contrastive prediction tasks acquire valuable representations. We show that our representation learning framework offers valuable severity ordering in the feature space while outperforming previous state-of-the-art methods on classification tasks. We achieve 6% and 20% relative improvements over a supervised and a self-supervised baseline, respectively. In addition, we discuss severity indicators and related applications of preference comparison in the medical domain.
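To make the preference-injection idea concrete, below is a minimal sketch (not the authors' released code) of a Bradley-Terry-style preference loss over embedding distances: given a normal-class anchor, an embedding from a less severe class should sit closer to the anchor than one from a more severe class. The function name `conpro_style_loss` and the cosine-distance choice are illustrative assumptions.

```python
# A minimal sketch, not the authors' implementation.
import torch
import torch.nn.functional as F

def conpro_style_loss(anchor, preferred, rejected):
    """Preference loss over distances to a normal-class anchor.

    anchor:    embeddings of normal-class samples, shape (B, D)
    preferred: embeddings of the LESS severe class, shape (B, D)
    rejected:  embeddings of the MORE severe class, shape (B, D)
    """
    d_pref = 1.0 - F.cosine_similarity(anchor, preferred, dim=-1)
    d_rej = 1.0 - F.cosine_similarity(anchor, rejected, dim=-1)
    # Maximize the log-probability that the more severe sample lies farther
    # from the normal anchor, imposing an ordering on the feature space.
    return -F.logsigmoid(d_rej - d_pref).mean()

B, D = 8, 128
anchor = torch.randn(B, D, requires_grad=True)
mild, severe = torch.randn(B, D), torch.randn(B, D)
loss = conpro_style_loss(anchor, mild, severe)
loss.backward()
```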
Current recommendation systems are significantly affected by temporal data shift: the inconsistency between the distribution of historical data and that of online data. Most existing models focus on utilizing updated data, overlooking the transferable, temporal data shift-free information that can be learned from shifting data. We propose the Temporal Invariance of Association theorem, which states that, given a fixed search space, the relationship between a data point and the data in the search space remains invariant over time. Leveraging this principle, we design a retrieval-based recommendation framework that trains a data shift-free relevance network on shifting data, significantly enhancing the predictive performance of the original model in the recommendation system. However, retrieval-based recommendation models incur substantial inference-time costs when deployed online. To address this, we further design a distillation framework that distills the relevance network's knowledge into a parameterized module using shifting data. The distilled module can be deployed online alongside the original model with only a minimal increase in inference time. Extensive experiments on multiple real-world datasets demonstrate that our framework significantly improves the performance of the original model by utilizing shifting data.
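As an illustration of the distillation step, here is a minimal sketch with an assumed architecture (not the paper's exact design): a relevance network aggregates retrieved historical interactions offline, and a small parametric module is trained to mimic its output so that online serving avoids the retrieval step entirely.

```python
# Illustrative sketch; class names and shapes are our assumptions.
import torch
import torch.nn as nn

class RelevanceNet(nn.Module):
    """Scores a query against K retrieved historical interactions and
    aggregates them into a single evidence vector."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Bilinear(dim, dim, 1)

    def forward(self, query, retrieved):              # (B, D), (B, K, D)
        q = query.unsqueeze(1).expand_as(retrieved)   # broadcast query
        w = torch.softmax(self.score(q, retrieved).squeeze(-1), dim=-1)
        return (w.unsqueeze(-1) * retrieved).sum(dim=1)

class DistilledModule(nn.Module):
    """Parametric stand-in trained to mimic the relevance network, so no
    retrieval is needed at serving time."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, query):
        return self.mlp(query)

B, K, D = 32, 16, 64
teacher, student = RelevanceNet(D), DistilledModule(D)
query, retrieved = torch.randn(B, D), torch.randn(B, K, D)
with torch.no_grad():                     # retrieval + teacher run offline
    target = teacher(query, retrieved)
loss = nn.functional.mse_loss(student(query), target)
loss.backward()
```

At serving time only the distilled module runs, which is why the added inference cost stays minimal.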
Transformers trained on natural language data have been shown to learn its hierarchical structure and to generalize to sentences with unseen syntactic structures without any explicitly encoded structural bias. In this work, we investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge. We extensively experiment with transformer models trained on multiple synthetic datasets under different training objectives, and show that while other objectives, such as sequence-to-sequence modeling and prefix language modeling, often fail to lead to hierarchical generalization, models trained with the language modeling objective consistently learn to generalize hierarchically. We then conduct pruning experiments to study how transformers trained with the language modeling objective encode hierarchical structure, and find that subnetworks with different generalization behaviors coexist within the model (subnetworks corresponding to hierarchical structure and to linear order). Finally, we take a Bayesian perspective to further explain transformers' preference for hierarchical generalization: we establish a correlation between whether transformers generalize hierarchically on a dataset and whether the simplest explanation of that dataset is provided by a hierarchical grammar rather than by a regular grammar exhibiting linear generalization.
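To illustrate the kind of ambiguity at stake, the toy generator below (our illustrative setup, in the spirit of English question-formation benchmarks, not the paper's exact datasets) produces declaratives consistent with both a linear rule (front the first auxiliary) and a hierarchical rule (front the main-clause auxiliary); sentences with a relative clause on the subject disambiguate the two.

```python
# Toy illustration of linear vs. hierarchical question formation.
import random

SUBJ = ["the cat", "the dog", "the bird"]
REL = ["that can swim", "that does bark"]

def declarative(with_rc):
    subj = random.choice(SUBJ)
    rc = f" {random.choice(REL)}" if with_rc else ""
    return f"{subj}{rc} can fly"

def linear_rule(sent):
    """Front the FIRST auxiliary in the string (wrong for RC subjects)."""
    words = sent.split()
    i = next(j for j, w in enumerate(words) if w in {"can", "does"})
    return " ".join([words[i]] + words[:i] + words[i + 1:]) + " ?"

def hierarchical_rule(sent):
    """Front the MAIN-clause auxiliary (the last one in this toy grammar)."""
    words = sent.split()
    i = max(j for j, w in enumerate(words) if w in {"can", "does"})
    return " ".join([words[i]] + words[:i] + words[i + 1:]) + " ?"

simple = declarative(with_rc=False)   # both rules agree -> ambiguous
hard = declarative(with_rc=True)      # rules diverge -> disambiguates
print(simple, "->", linear_rule(simple))
print(hard, "->", linear_rule(hard), "| vs |", hierarchical_rule(hard))
```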
The concept of differential privacy (DP) quantitatively measures privacy loss by observing changes in the output distribution caused by the inclusion of an individual in the target dataset. DP, generally used as a constraint, has been prominent in safeguarding machine learning datasets at industry giants such as Apple and Google. A common methodology for guaranteeing DP is to incorporate appropriate noise into query outputs, thereby establishing a statistical defense against privacy attacks such as membership inference and linkage attacks. However, especially for small datasets, existing DP mechanisms occasionally add an excessive amount of noise to query outputs, thereby discarding data utility. This is because traditional DP computes privacy loss based on the worst-case scenario, i.e., statistical outliers. In this work, to tackle this challenge, we use per-instance DP (pDP) as a constraint, measuring privacy loss for each data instance and optimizing the noise tailored to individual instances. In a nutshell, we propose a per-instance noise variance optimization (NVO) game, framed as a common-interest sequential game, and show that its Nash equilibrium (NE) points inherently guarantee pDP for all data instances. Through extensive experiments, our proposed pDP algorithm demonstrated an average performance improvement of up to 99.53% over the conventional DP algorithm in terms of KL divergence.
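The contrast between worst-case and per-instance calibration can be seen numerically. The sketch below (our simplification, not the proposed NVO game) computes the per-instance sensitivity of a mean query and shows that classical DP scales everyone's noise to the single outlier, while pDP-style calibration permits far smaller noise for typical instances.

```python
# Worst-case vs. per-instance noise calibration for a mean query (sketch).
import numpy as np

rng = np.random.default_rng(0)
data = np.append(rng.normal(50.0, 5.0, size=99), 200.0)   # one outlier
n, eps = len(data), 1.0

def per_instance_sensitivity(i, data):
    """|mean(D) - mean(D without x_i)| equals |x_i - mean(others)| / n."""
    others = np.delete(data, i)
    return abs(data[i] - others.mean()) / len(data)

sens = np.array([per_instance_sensitivity(i, data) for i in range(n)])
worst = sens.max()               # classical DP calibrates to the outlier
scale_dp = worst / eps           # one large Laplace scale for everyone
scale_pdp = sens / eps           # instance-tailored scales (pDP view)

print(f"worst-case Laplace scale:  {scale_dp:.4f}")
print(f"median per-instance scale: {np.median(scale_pdp):.4f}")
```

The gap between the two printed scales is exactly the utility that worst-case calibration discards on typical instances.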
Hypertension is a global health concern with increasing prevalence, underscoring the need for effective monitoring and analysis of blood pressure (BP) dynamics. We analyzed a substantial BP dataset comprising 75,636,128 records from 2,054,462 unique patients, collected between 2000 and 2022 at Emory Healthcare in Georgia, USA, and representing a demographically diverse population. We examined and compared population-wide statistics of bivariate changes in systolic BP (SBP) and diastolic BP (DBP) across sex, age, and race/ethnicity. The analysis revealed that males have higher BP levels than females and exhibit a distinct BP profile with age. Notably, average SBP rises consistently with age, whereas average DBP peaks in the forties age group. Among the racial/ethnic groups studied, Black patients have marginally higher BPs and a greater standard deviation. We also found a significant correlation between SBP and DBP at the population level, a phenomenon not previously researched. These results emphasize the importance of demography-specific BP analysis for clinical diagnosis and provide valuable insights for developing personalized, demography-specific healthcare interventions.
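For readers who wish to reproduce this style of analysis on their own data, the sketch below runs the demography-stratified bivariate summary described above on synthetic data; the column names and distributional parameters are our assumptions, not the Emory schema.

```python
# Demography-stratified SBP/DBP summary on synthetic data (sketch).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 10_000
df = pd.DataFrame({
    "sex": rng.choice(["F", "M"], n),
    "age_group": rng.choice(["20-39", "40-59", "60+"], n),
    "sbp": rng.normal(125, 15, n),
})
# Induce a positive SBP-DBP correlation, as reported at the population level.
df["dbp"] = 0.5 * df["sbp"] + rng.normal(15, 8, n)

summary = (
    df.groupby(["sex", "age_group"])[["sbp", "dbp"]]
      .apply(lambda g: pd.Series({
          "mean_sbp": g["sbp"].mean(),
          "mean_dbp": g["dbp"].mean(),
          "sbp_dbp_corr": g["sbp"].corr(g["dbp"]),
      }))
)
print(summary.round(2))
```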
Artificial intelligence (AI) applications have shown promise in the management of pandemics and have been widely used to assist in the identification, classification, and diagnosis of medical images. In response to the global outbreak of Monkeypox (Mpox), the HeHealth.ai team leveraged an existing tool for screening sexually transmitted diseases (STDs) to develop a digital screening test for symptomatic Mpox using AI approaches. Prior to the Mpox outbreak, the team had developed a smartphone app with which users can photograph their own penises using their smartphone cameras to screen for symptomatic STDs. The AI model was initially developed using 5,000 cases and uses a modified convolutional neural network to output prediction scores across visually diagnosable penile pathologies, including Syphilis, Herpes Simplex Virus, and Human Papilloma Virus. From June 2022 to October 2022, a total of about 22,000 users downloaded the HeHealth app, and about 21,000 images were analyzed using HeHealth AI technology. Beginning in July 2022, we engaged in formative research, stakeholder engagement, rapid consolidation of images, a validation study, and implementation of the tool. From July 2022 to October 2022, a total of 1,000 Mpox-related images were used to train the Mpox symptom checker tool. Our digital symptom checker tool showed an accuracy of 87% for ruling in Mpox and 90% for ruling out symptomatic Mpox. Hurdles identified included data privacy and security concerns for app users, an initial lack of data to train the AI tool, and the generalizability of input data. We offer several suggestions to help others get started on similar projects in emergency situations, including engaging a wide range of stakeholders, assembling a multidisciplinary team, prioritizing pragmatism, and recognizing that big data is, in fact, made up of small data.
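Since the abstract does not define the rule-in/rule-out metrics precisely, the sketch below (on illustrative synthetic labels, not HeHealth data) computes the two quantities most commonly mapped to those terms: specificity, which supports ruling a condition in when the test is positive, and sensitivity, which supports ruling it out when the test is negative. The mapping to the reported 87%/90% figures is our assumption.

```python
# Rule-in/rule-out evaluation of a binary symptom checker (sketch).
import numpy as np

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, 500)                        # 1 = Mpox
scores = np.clip(0.7 * y_true + rng.normal(0.15, 0.2, 500), 0, 1)
y_pred = (scores >= 0.5).astype(int)

tp = int(((y_pred == 1) & (y_true == 1)).sum())
tn = int(((y_pred == 0) & (y_true == 0)).sum())
fp = int(((y_pred == 1) & (y_true == 0)).sum())
fn = int(((y_pred == 0) & (y_true == 1)).sum())

sensitivity = tp / (tp + fn)   # high value supports ruling OUT on a negative
specificity = tn / (tn + fp)   # high value supports ruling IN on a positive
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```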
Autonomous systems will soon be ubiquitous, from manufacturing autonomy to agricultural field robots, and from healthcare assistants to the entertainment industry. Most of these systems are developed with modular sub-components for decision-making, planning, and control that may be hand-engineered or learning-based. While these existing approaches perform well in the situations they were specifically designed for, they can perform especially poorly in the rare, out-of-distribution scenarios that will inevitably arise at test time. The rise of foundation models trained on multiple tasks with impressively large datasets from a variety of fields has led researchers to believe that these models may provide the common-sense reasoning that existing planners lack. Researchers posit that this common-sense reasoning will bridge the gap between algorithm development and deployment on out-of-distribution tasks, much as humans adapt to unexpected scenarios. Large language models have already penetrated the robotics and autonomous systems domains, as researchers scramble to showcase potential use cases in deployment. While this application direction is empirically promising, foundation models are known to hallucinate, generating decisions that sound reasonable but are in fact poor. We argue that there is a need to step back and design systems that can both quantify the certainty of a model's decision and detect when it may be hallucinating. In this work, we discuss the current use cases of foundation models for decision-making tasks, provide a general definition of hallucination with examples, discuss existing approaches to hallucination detection and mitigation with a focus on decision problems, and explore areas for further research in this exciting field.
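As a concrete example of the certainty-quantification layer we call for, the sketch below implements one common screening signal (our illustration, not a specific published detector): flag a proposed action when the model's predictive entropy over candidate actions is high, and defer to a fallback planner.

```python
# Entropy-based screening of a model's proposed action (sketch).
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def screen_decision(action_probs, max_entropy_frac=0.6):
    """Accept the proposed action only if the normalized entropy of the
    distribution over candidate actions stays below a threshold."""
    h = entropy(action_probs)
    h_max = math.log(len(action_probs))     # entropy of a uniform choice
    confident = h <= max_entropy_frac * h_max
    return confident, h / h_max

# Peaked distribution -> accept; near-uniform -> defer to a fallback planner.
print(screen_decision([0.85, 0.10, 0.03, 0.02]))   # (True, ~0.40)
print(screen_decision([0.30, 0.28, 0.22, 0.20]))   # (False, ~0.99)
```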
Face recognition technology has advanced significantly in recent years, due largely to the availability of large and increasingly complex training datasets for use in deep learning models. These datasets, however, typically comprise images scraped from news sites or social media platforms and, therefore, have limited utility in more advanced security, forensics, and military applications. These applications involve lower resolutions, longer ranges, and elevated viewpoints. To meet these critical needs, we collected and curated the first and second subsets of a large multi-modal biometric dataset designed for use in the research and development (R&D) of biometric recognition technologies under extremely challenging conditions. Thus far, the dataset includes more than 350,000 still images and over 1,300 hours of video footage of approximately 1,000 subjects. To collect this data, we used Nikon DSLR cameras, a variety of commercial surveillance cameras, specialized long-range R&D cameras, and Group 1 and Group 2 UAV platforms. The goal is to support the development of algorithms capable of accurately recognizing people at ranges up to 1,000 m and from high angles of elevation. These advances will include improvements to the state of the art in face recognition and will support new research in the area of whole-body recognition using methods based on gait and anthropometry. This paper describes the methods used to collect and curate the dataset, and the dataset's characteristics at the current stage.
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven particularly relevant for natural language processing (NLP), experiencing rapid spread and wide adoption in recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. To help close this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
Inspired by the human cognitive system, attention is a mechanism that imitates human cognitive awareness of specific information, amplifying critical details to focus on the essential aspects of the data. Deep learning has employed attention to boost performance in many applications. Interestingly, the same attention design can suit the processing of different data modalities and can easily be incorporated into large networks. Furthermore, multiple complementary attention mechanisms can be incorporated into one network. Hence, attention techniques have become extremely attractive. However, the literature lacks a comprehensive survey of attention techniques to guide researchers in employing attention in their deep models. Note that, besides being demanding in terms of training data and computational resources, transformers cover only a single category of self-attention out of the many categories available. We fill this gap and provide an in-depth survey of 50 attention techniques, categorizing them by their most prominent features. We begin by introducing the fundamental concepts behind the success of the attention mechanism. We then present the strengths and limitations of each attention category, describe their fundamental building blocks and basic formulations with primary usage, and review their applications, specifically in computer vision. We also discuss the challenges and open questions related to attention mechanisms in general. Finally, we recommend possible future research directions for deep attention.
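For reference, the block below gives a minimal NumPy sketch of scaled dot-product attention, the basic formulation underlying the transformer category of self-attention that this survey situates among many other attention designs.

```python
# Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V (sketch).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)     # (n_q, n_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (n_q, d_v)

rng = np.random.default_rng(0)
n, d = 5, 16
x = rng.normal(size=(n, d))                            # token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)                                       # (5, 16)
```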