Continuous monitoring and patient acuity assessments are key aspects of Intensive Care Unit (ICU) practice, but both are limited by time constraints imposed on healthcare providers. Moreover, anticipating clinical trajectories remains imprecise. The objectives of this study are to (1) develop an electronic phenotype of acuity using automated variable retrieval within the electronic health records and (2) describe transitions between acuity states that illustrate the clinical trajectories of ICU patients. We gathered two single-center, longitudinal electronic health record datasets for 51,372 adult ICU patients admitted to the University of Florida Health (UFH) Gainesville (GNV) and Jacksonville (JAX). We developed algorithms to quantify acuity status at four-hour intervals for each ICU admission and identify acuity phenotypes using continuous acuity status and k-means clustering approach. 51,073 admissions for 38,749 patients in the UFH GNV dataset and 22,219 admissions for 12,623 patients in the UFH JAX dataset had at least one ICU stay lasting more than four hours. There were three phenotypes: persistently stable, persistently unstable, and transitioning from unstable to stable. For stable patients, approximately 0.7%-1.7% would transition to unstable, 0.02%-0.1% would expire, 1.2%-3.4% would be discharged, and the remaining 96%-97% would remain stable in the ICU every four hours. For unstable patients, approximately 6%-10% would transition to stable, 0.4%-0.5% would expire, and the remaining 89%-93% would remain unstable in the ICU in the next four hours. We developed phenotyping algorithms for patient acuity status every four hours while admitted to the ICU. This approach may be useful in developing prognostic and clinical decision-support tools to aid patients, caregivers, and providers in shared decision-making processes regarding escalation of care and patient values.
Modern language models capture a large body of factual knowledge. However, some facts can be incorrectly induced or become obsolete over time, resulting in factually incorrect generations. This has led to the development of various editing methods that allow updating facts encoded by the model. Evaluation of these methods has primarily focused on testing whether an individual fact has been successfully injected, and if similar predictions for other subjects have not changed. Here we argue that such evaluation is limited, since injecting one fact (e.g. ``Jack Depp is the son of Johnny Depp'') introduces a ``ripple effect'' in the form of additional facts that the model needs to update (e.g.``Jack Depp is the sibling of Lily-Rose Depp''). To address this issue, we propose a novel set of evaluation criteria that consider the implications of an edit on related facts. Using these criteria, we then construct RippleEdits, a diagnostic benchmark of 5K factual edits, capturing a variety of types of ripple effects. We evaluate prominent editing methods on RippleEdits, showing that current methods fail to introduce consistent changes in the model's knowledge. In addition, we find that a simple in-context editing baseline obtains the best scores on our benchmark, suggesting a promising research direction for model editing.
Predicting the future trajectories of nearby objects plays a pivotal role in Robotics and Automation such as autonomous driving. While learning-based trajectory prediction methods have achieved remarkable performance on public benchmarks, the generalization ability of these approaches remains questionable. The poor generalizability on unseen domains, a well-recognized defect of data-driven approaches, can potentially harm the real-world performance of trajectory prediction models. We are thus motivated to improve generalization ability of models instead of merely pursuing high accuracy on average. Due to the lack of benchmarks for quantifying the generalization ability of trajectory predictors, we first construct a new benchmark called argoverse-shift, where the data distributions of domains are significantly different. Using this benchmark for evaluation, we identify that the domain shift problem seriously hinders the generalization of trajectory predictors since state-of-the-art approaches suffer from severe performance degradation when facing those out-of-distribution scenes. To enhance the robustness of models against domain shift problems, we propose a plug-and-play strategy for domain normalization in trajectory prediction. Our strategy utilizes the Frenet coordinate frame for modeling and can effectively narrow the domain gap of different scenes caused by the variety of road geometry and topology. Experiments show that our strategy noticeably boosts the prediction performance of the state-of-the-art in domains that were previously unseen to the models, thereby improving the generalization ability of data-driven trajectory prediction methods.
Federated Averaging (FedAvg) is known to experience convergence issues when encountering significant clients system heterogeneity and data heterogeneity. Server momentum has been proposed as an effective mitigation. However, existing server momentum works are restrictive in the momentum formulation, do not properly schedule hyperparameters and focus only on system homogeneous settings, which leaves the role of server momentum still an under-explored problem. In this paper, we propose a general framework for server momentum, that (a) covers a large class of momentum schemes that are unexplored in federated learning (FL), (b) enables a popular stagewise hyperparameter scheduler, (c) allows heterogeneous and asynchronous local computing. We provide rigorous convergence analysis for the proposed framework. To our best knowledge, this is the first work that thoroughly analyzes the performances of server momentum with a hyperparameter scheduler and system heterogeneity. Extensive experiments validate the effectiveness of our proposed framework.
In the United States, regions are frequently divided into districts for the purpose of electing representatives. How the districts are drawn can affect who's elected, and drawing districts to give an advantage to a certain group is known as gerrymandering. It can be surprisingly difficult to detect gerrymandering, but one algorithmic method is to compare a current districting plan to a large number of randomly sampled plans to see whether it is an outlier. Recombination Markov chains are often used for this random sampling: randomly choose two districts, consider their union, and split this union in a new way. This works well in practice, but the theory behind it remains underdeveloped. For example, it's not known if recombination Markov chains are irreducible, that is, if recombination moves suffice to move from any districting plan to any other. Irreducibility of recombination Markov chains can be formulated as a graph problem: for a graph $G$, is the space of all partitions of $G$ into $k$ connected subgraphs ($k$ districts) connected by recombination moves? We consider three simply connected districts and district sizes $k_1\pm 1$ vertices, $k_2\pm 1$ vertices, and $k3\pm 1$ vertices. We prove for arbitrarily large triangular regions in the triangular lattice, recombination Markov chains are irreducible. This is the first proof of irreducibility under tight district size constraints for recombination Markov chains beyond small or trivial examples.
This dissertation reports some first steps towards a compositional account of active inference and the Bayesian brain. Specifically, we use the tools of contemporary applied category theory to supply functorial semantics for approximate inference. To do so, we define on the `syntactic' side the new notion of Bayesian lens and show that Bayesian updating composes according to the compositional lens pattern. Using Bayesian lenses, and inspired by compositional game theory, we define fibrations of statistical games and classify various problems of statistical inference as corresponding sections: the chain rule of the relative entropy is formalized as a strict section, while maximum likelihood estimation and the free energy give lax sections. In the process, we introduce a new notion of `copy-composition'. On the `semantic' side, we present a new formalization of general open dynamical systems (particularly: deterministic, stochastic, and random; and discrete- and continuous-time) as certain coalgebras of polynomial functors, which we show collect into monoidal opindexed categories (or, alternatively, into algebras for multicategories of generalized polynomial functors). We use these opindexed categories to define monoidal bicategories of cilia: dynamical systems which control lenses, and which supply the target for our functorial semantics. Accordingly, we construct functors which explain the bidirectional compositional structure of predictive coding neural circuits under the free energy principle, thereby giving a formal mathematical underpinning to the bidirectionality observed in the cortex. Along the way, we explain how to compose rate-coded neural circuits using an algebra for a multicategory of linear circuit diagrams, showing subsequently that this is subsumed by lenses and polynomial functors.
End-to-end Speech Translation (ST) aims to convert speech into target text within a unified model. The inherent differences between speech and text modalities often impede effective cross-modal and cross-lingual transfer. Existing methods typically employ hard alignment (H-Align) of individual speech and text segments, which can degrade textual representations. To address this, we introduce Soft Alignment (S-Align), using adversarial training to align the representation spaces of both modalities. S-Align creates a modality-invariant space while preserving individual modality quality. Experiments on three languages from the MuST-C dataset show S-Align outperforms H-Align across multiple tasks and offers translation capabilities on par with specialized translation models.
Traditional neural networks are simple to train but they typically produce overconfident predictions. In contrast, Bayesian neural networks provide good uncertainty quantification but optimizing them is time consuming due to the large parameter space. This paper proposes to combine the advantages of both approaches by performing Variational Inference in the Final layer Output space (VIFO), because the output space is much smaller than the parameter space. We use neural networks to learn the mean and the variance of the probabilistic output. Like standard, non-Beyesian models, VIFO enjoys simple training and one can use Rademacher complexity to provide risk bounds for the model. On the other hand, using the Bayesian formulation we incorporate collapsed variational inference with VIFO which significantly improves the performance in practice. Experiments show that VIFO and ensembles of VIFO provide a good tradeoff in terms of run time and uncertainty quantification, especially for out of distribution data.
The wayward quality of continuous prompts stresses the importance of their interpretability as unexpected and unpredictable behaviors appear following training, especially in the context of large language models automating people-sensitive tasks such as resume screening. In this paper we present a novel method of constructing continuous prompts via discrete prompt embeddings and evaluate improvements to continuous prompt interpretability and inference accuracy. For a set of manually designed discrete prompts $\mathcal{D}$, which we tokenize each into tensor form, we train a model to predict the weights such that the linear combinations of those prompts correspond to higher performance on natural language understanding tasks.
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (eg. sentiment classification, span-prediction based question answering or machine translation). However, it builds upon the assumption that the data distribution is stationary, ie. that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models as demonstrated on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviate catastrophic forgetting issues during adaptation.
Deep Convolutional Neural Networks (CNNs) are a special type of Neural Networks, which have shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNN is largely achieved with the use of multiple non-linear feature extraction stages that can automatically learn hierarchical representation from the data. Availability of a large amount of data and improvements in the hardware processing units have accelerated the research in CNNs and recently very interesting deep CNN architectures are reported. The recent race in deep CNN architectures for achieving high performance on the challenging benchmarks has shown that the innovative architectural ideas, as well as parameter optimization, can improve the CNN performance on various vision-related tasks. In this regard, different ideas in the CNN design have been explored such as use of different activation and loss functions, parameter optimization, regularization, and restructuring of processing units. However, the major improvement in representational capacity is achieved by the restructuring of the processing units. Especially, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in the recently reported CNN architectures and consequently, classifies the recent innovations in CNN architectures into seven different categories. These seven categories are based on spatial exploitation, depth, multi-path, width, feature map exploitation, channel boosting and attention. Additionally, it covers the elementary understanding of the CNN components and sheds light on the current challenges and applications of CNNs.