Radiology reports are an instrumental part of modern medicine, informing key clinical decisions such as diagnosis and treatment. The worldwide shortage of radiologists, however, restricts access to expert care and imposes heavy workloads, contributing to avoidable errors and delays in report delivery. While recent progress in automated report generation with vision-language models offer clear potential in ameliorating the situation, the path to real-world adoption has been stymied by the challenge of evaluating the clinical quality of AI-generated reports. In this study, we build a state-of-the-art report generation system for chest radiographs, \textit{Flamingo-CXR}, by fine-tuning a well-known vision-language foundation model on radiology data. To evaluate the quality of the AI-generated reports, a group of 16 certified radiologists provide detailed evaluations of AI-generated and human written reports for chest X-rays from an intensive care setting in the United States and an inpatient setting in India. At least one radiologist (out of two per case) preferred the AI report to the ground truth report in over 60$\%$ of cases for both datasets. Amongst the subset of AI-generated reports that contain errors, the most frequently cited reasons were related to the location and finding, whereas for human written reports, most mistakes were related to severity and finding. This disparity suggested potential complementarity between our AI system and human experts, prompting us to develop an assistive scenario in which \textit{Flamingo-CXR} generates a first-draft report, which is subsequently revised by a clinician. This is the first demonstration of clinician-AI collaboration for report writing, and the resultant reports are assessed to be equivalent or preferred by at least one radiologist to reports written by experts alone in 80$\%$ of in-patient cases and 60$\%$ of intensive care cases.
Most existing neural network-based approaches for solving stochastic optimal control problems using the associated backward dynamic programming principle rely on the ability to simulate the underlying state variables. However, in some problems, this simulation is infeasible, leading to the discretization of state variable space and the need to train one neural network for each data point. This approach becomes computationally inefficient when dealing with large state variable spaces. In this paper, we consider a class of this type of stochastic optimal control problems and introduce an effective solution employing multitask neural networks. To train our multitask neural network, we introduce a novel scheme that dynamically balances the learning across tasks. Through numerical experiments on real-world derivatives pricing problems, we prove that our method outperforms state-of-the-art approaches.
Patient care may be improved by recommending treatments based on patient characteristics when there is treatment effect heterogeneity. Recently, there has been a great deal of attention focused on the estimation of optimal treatment rules that maximize expected outcomes. However, there has been comparatively less attention given to settings where the outcome is right-censored, especially with regard to the practical use of estimators. In this study, simulations were undertaken to assess the finite-sample performance of estimators for optimal treatment rules and estimators for the expected outcome under treatment rules. The simulations were motivated by the common setting in biomedical and public health research where the data is observational, survival times may be right-censored, and there is interest in estimating baseline treatment decisions to maximize survival probability. A variety of outcome regression and direct search estimation methods were compared for optimal treatment rule estimation across a range of simulation scenarios. Methods that flexibly model the outcome performed comparatively well, including in settings where the treatment rule was non-linear. R code to reproduce this study's results are available on Github.
Epilepsy is a clinical neurological disorder characterized by recurrent and spontaneous seizures consisting of abnormal high-frequency electrical activity in the brain. In this condition, the transmembrane potential dynamics are characterized by rapid and sharp wavefronts traveling along the heterogeneous and anisotropic conduction pathways of the brain. This work employs the monodomain model, coupled with specific neuronal ionic models characterizing ion concentration dynamics, to mathematically describe brain tissue electrophysiology in grey and white matter at the organ scale. This multiscale model is discretized in space with the high-order discontinuous Galerkin method on polygonal and polyhedral grids (PolyDG) and advanced in time with a Crank-Nicolson scheme. This ensures, on the one hand, efficient and accurate simulations of the high-frequency electrical activity that is responsible for epileptic seizure and, on the other hand, keeps reasonably low the computational costs by a suitable combination of high-order approximations and agglomerated polytopal meshes. We numerically investigate synthetic test cases on a two-dimensional heterogeneous squared domain discretized with a polygonal grid, and on a two-dimensional brainstem in a sagittal plane with an agglomerated polygonal grid that takes full advantage of the flexibility of the PolyDG approximation of the semidiscrete formulation. Finally, we provide a theoretical analysis of stability and an a-priori convergence analysis for a simplified mathematical problem.
Physics-informed neural networks (PINNs), rooted in deep learning, have emerged as a promising approach for solving partial differential equations (PDEs). By embedding the physical information described by PDEs into feedforward neural networks, PINNs are trained as surrogate models to approximate solutions without the need for label data. Nevertheless, even though PINNs have shown remarkable performance, they can face difficulties, especially when dealing with equations featuring rapidly changing solutions. These difficulties encompass slow convergence, susceptibility to becoming trapped in local minima, and reduced solution accuracy. To address these issues, we propose a binary structured physics-informed neural network (BsPINN) framework, which employs binary structured neural network (BsNN) as the neural network component. By leveraging a binary structure that reduces inter-neuron connections compared to fully connected neural networks, BsPINNs excel in capturing the local features of solutions more effectively and efficiently. These features are particularly crucial for learning the rapidly changing in the nature of solutions. In a series of numerical experiments solving Burgers equation, Euler equation, Helmholtz equation, and high-dimension Poisson equation, BsPINNs exhibit superior convergence speed and heightened accuracy compared to PINNs. From these experiments, we discover that BsPINNs resolve the issues caused by increased hidden layers in PINNs resulting in over-smoothing, and prevent the decline in accuracy due to non-smoothness of PDEs solutions.
Recent lay language generation systems have used Transformer models trained on a parallel corpus to increase health information accessibility. However, the applicability of these models is constrained by the limited size and topical breadth of available corpora. We introduce CELLS, the largest (63k pairs) and broadest-ranging (12 journals) parallel corpus for lay language generation. The abstract and the corresponding lay language summary are written by domain experts, assuring the quality of our dataset. Furthermore, qualitative evaluation of expert-authored plain language summaries has revealed background explanation as a key strategy to increase accessibility. Such explanation is challenging for neural models to generate because it goes beyond simplification by adding content absent from the source. We derive two specialized paired corpora from CELLS to address key challenges in lay language generation: generating background explanations and simplifying the original abstract. We adopt retrieval-augmented models as an intuitive fit for the task of background explanation generation, and show improvements in summary quality and simplicity while maintaining factual correctness. Taken together, this work presents the first comprehensive study of background explanation for lay language generation, paving the path for disseminating scientific knowledge to a broader audience. CELLS is publicly available at: //github.com/LinguisticAnomalies/pls_retrieval.
In this study, we proposed a design methodology for a piezoelectric energy-harvesting device optimized for maximal power generation at a designated frequency using topology optimization. The proposed methodology emphasizes the design of a unimorph-type piezoelectric energy harvester, wherein a piezoelectric film is affixed to a singular side of a silicon cantilever beam. Both the substrate and the piezoelectric film components underwent concurrent optimization. Constraints were imposed to ensure that the resultant design is amenable to microfabrication, with specific emphasis on the etchability of piezoelectric energy harvesters. Several numerical examples were provided to validate the efficacy of the proposed method. The results showed that the proposed method derives both the substrate and piezoelectric designs that maximize the electromechanical coupling coefficient and allows the eigenfrequency of the device and minimum output voltage to be set to the desired values. Furthermore, the proposed method can provide solutions that satisfy the cross-sectional shape, substrate-depend, and minimum output voltage constraints. The solutions obtained by the proposed method are manufacturable in the field of microfabrication.
Although supervised machine learning is popular for information extraction from clinical notes, creating large annotated datasets requires extensive domain expertise and is time-consuming. Meanwhile, large language models (LLMs) have demonstrated promising transfer learning capability. In this study, we explored whether recent LLMs can reduce the need for large-scale data annotations. We curated a manually-labeled dataset of 769 breast cancer pathology reports, labeled with 13 categories, to compare zero-shot classification capability of the GPT-4 model and the GPT-3.5 model with supervised classification performance of three model architectures: random forests classifier, long short-term memory networks with attention (LSTM-Att), and the UCSF-BERT model. Across all 13 tasks, the GPT-4 model performed either significantly better than or as well as the best supervised model, the LSTM-Att model (average macro F1 score of 0.83 vs. 0.75). On tasks with high imbalance between labels, the differences were more prominent. Frequent sources of GPT-4 errors included inferences from multiple samples and complex task design. On complex tasks where large annotated datasets cannot be easily collected, LLMs can reduce the burden of large-scale data labeling. However, if the use of LLMs is prohibitive, the use of simpler supervised models with large annotated datasets can provide comparable results. LLMs demonstrated the potential to speed up the execution of clinical NLP studies by reducing the need for curating large annotated datasets. This may result in an increase in the utilization of NLP-based variables and outcomes in observational clinical studies.
Languages have long been described according to their perceived rhythmic attributes. The associated typologies are of interest in psycholinguistics as they partly predict newborns' abilities to discriminate between languages and provide insights into how adult listeners process non-native languages. Despite the relative success of rhythm metrics in supporting the existence of linguistic rhythmic classes, quantitative studies have yet to capture the full complexity of temporal regularities associated with speech rhythm. We argue that deep learning offers a powerful pattern-recognition approach to advance the characterization of the acoustic bases of speech rhythm. To explore this hypothesis, we trained a medium-sized recurrent neural network on a language identification task over a large database of speech recordings in 21 languages. The network had access to the amplitude envelopes and a variable identifying the voiced segments, assuming that this signal would poorly convey phonetic information but preserve prosodic features. The network was able to identify the language of 10-second recordings in 40% of the cases, and the language was in the top-3 guesses in two-thirds of the cases. Visualization methods show that representations built from the network activations are consistent with speech rhythm typologies, although the resulting maps are more complex than two separated clusters between stress and syllable-timed languages. We further analyzed the model by identifying correlations between network activations and known speech rhythm metrics. The findings illustrate the potential of deep learning tools to advance our understanding of speech rhythm through the identification and exploration of linguistically relevant acoustic feature spaces.
We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain on the accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
Graph representation learning for hypergraphs can be used to extract patterns among higher-order interactions that are critically important in many real world problems. Current approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic for various learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention based graph neural network called Hyper-SAGNN applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms the state-of-the-art methods on traditional tasks while also achieving great performance on a new task called outsider identification. Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.