A typical gait analysis requires the examination of the motion of nine joint angles on the left-hand side and six joint angles on the right-hand side across multiple subjects. Due to the quantity and complexity of the data, it is useful to calculate the amount by which a subject's gait deviates from an average normal profile and to represent this deviation as a single number. Such a measure can quantify the overall severity of a condition affecting walking, monitor progress, or evaluate the outcome of an intervention prescribed to improve the gait pattern. The gait deviation index, the gait profile score, and the overall abnormality measure are standard benchmarks for quantifying gait abnormality. However, these indices do not account for the intrinsic smoothness of the gait movement at each joint/plane or for the potential co-variation between the joints/planes. Utilizing a multivariate functional principal components analysis, we propose the functional gait deviation index (FGDI), which accounts for both. We show that FGDI scales with overall gait function, provides a consistent measure of gait abnormality, and is implemented easily using an interactive web app.
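As a rough illustration of what "a single number" for gait deviation can look like, the sketch below computes a root-mean-square difference between a subject's joint-angle curves and a reference mean. This is not the FGDI and does not use functional principal components; the function name `rms_deviation`, the dictionary layout, and the joint names are assumptions made for the example only.

```python
import math

def rms_deviation(subject, reference_mean):
    """RMS difference between a subject's joint-angle curves and a reference mean.

    Both arguments map a joint/plane name to a list of angles (in degrees)
    sampled at the same points of the gait cycle. Hypothetical layout.
    """
    total, count = 0.0, 0
    for joint, ref_curve in reference_mean.items():
        subj_curve = subject[joint]
        if len(subj_curve) != len(ref_curve):
            raise ValueError(f"length mismatch for {joint}")
        for s, r in zip(subj_curve, ref_curve):
            total += (s - r) ** 2
            count += 1
    return math.sqrt(total / count)

# Toy data: two joints, three samples each (illustrative values only).
reference = {"knee_flexion": [5.0, 20.0, 60.0], "hip_flexion": [30.0, 10.0, -5.0]}
subject = {"knee_flexion": [8.0, 24.0, 60.0], "hip_flexion": [30.0, 10.0, -5.0]}
score = rms_deviation(subject, reference)
```

A pointwise score like this treats every sample and every joint independently, which is exactly the limitation the abstract raises about smoothness and co-variation.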
The field of computational pathology has witnessed remarkable progress in the development of both task-specific predictive models and task-agnostic self-supervised vision encoders. However, despite the explosive growth of generative artificial intelligence (AI), there has been limited study on building general purpose, multimodal AI assistants tailored to pathology. Here we present PathChat, a vision-language generalist AI assistant for human pathology using an in-house developed foundational vision encoder pretrained on 100 million histology images from over 100,000 patient cases and 1.18 million pathology image-caption pairs. The vision encoder is then combined with a pretrained large language model and the whole system is finetuned on over 250,000 diverse disease agnostic visual language instructions. We compare PathChat against several multimodal vision language AI assistants as well as GPT4V, which powers the commercially available multimodal general purpose AI assistant ChatGPT-4. When relevant clinical context is provided with the histology image, PathChat achieved a diagnostic accuracy of 87% on multiple-choice questions based on publicly available cases of diverse tissue origins and disease models. Additionally, using open-ended questions and human expert evaluation, we found that overall PathChat produced more accurate and pathologist-preferable responses to diverse queries related to pathology. As an interactive and general vision language AI assistant that can flexibly handle both visual and natural language inputs, PathChat can potentially find impactful applications in pathology education, research, and human-in-the-loop clinical decision making.
Human trajectory prediction is typically posed as a zero-shot generalization problem: a predictor is learnt on a dataset of human motion in training scenes, and then deployed on unseen test scenes. While this paradigm has yielded tremendous progress, it fundamentally assumes that trends in human behavior within the deployment scene are constant over time. As such, current prediction models are unable to adapt to scene-specific transient human behaviors, such as crowds temporarily gathering to see buskers, pedestrians hurrying through the rain and avoiding puddles, or a protest breaking out. We formalize the problem of scene-specific adaptive trajectory prediction and propose a new adaptation approach inspired by prompt tuning called latent corridors. By augmenting the input of any pre-trained human trajectory predictor with learnable image prompts, the predictor can improve in the deployment scene by inferring trends from extremely small amounts of new data (e.g., 2 humans observed for 30 seconds). With less than 0.1% additional model parameters, we see up to 23.9% ADE improvement in MOTSynth simulated data and 16.4% ADE in MOT and Wildtrack real pedestrian data. Qualitatively, we observe that latent corridors imbue predictors with an awareness of scene geometry and scene-specific human behaviors that non-adaptive predictors struggle to capture. The project website can be found at //neerja.me/atp_latent_corridors/.
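The improvements above are reported in ADE (average displacement error). A minimal sketch of that metric follows, assuming the usual definition as the mean Euclidean distance between predicted and ground-truth positions over the prediction horizon; the function names and the toy numbers are illustrative and not taken from the paper.

```python
import math

def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between matching points of two trajectories."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and equally long")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)

def relative_improvement(baseline_ade, adapted_ade):
    """Percentage reduction in ADE relative to a baseline predictor."""
    return 100.0 * (baseline_ade - adapted_ade) / baseline_ade

# Toy trajectories of (x, y) positions; values are made up for illustration.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
guess = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade = average_displacement_error(guess, truth)  # (0 + 1 + 2) / 3
```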
We simulate the behaviour of independent reinforcement learning algorithms playing the Crawford and Sobel (1982) game of strategic information transmission. We show that a sender and a receiver training together converge to strategies approximating the ex-ante optimal equilibrium of the game. Communication occurs to the largest extent predicted by Nash equilibrium. The conclusion is robust to alternative specifications of the learning hyperparameters and of the game. We discuss implications for theories of equilibrium selection in information transmission games, for work on emerging communication among algorithms in computer science, and for the economics of collusion in markets populated by artificially intelligent agents.
We apply a physics-informed deep-learning approach, the physics-informed neural network (PINN) approach, to the Black-Scholes equation for pricing American and European options. We test our approach on both simulated and real market data and compare it to analytical and numerical benchmarks. Our model accurately captures the price behaviour on simulated data, while also exhibiting reasonable performance on market data. We also experiment with the architecture and learning process of our PINN model to provide more understanding of the convergence and stability issues that impact performance.
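To make the "analytical benchmark" concrete, the sketch below evaluates the standard closed-form Black-Scholes price of a European call and checks, by finite differences, the residual of the Black-Scholes equation $V_t + \tfrac{1}{2}\sigma^2 S^2 V_{SS} + r S V_S - r V = 0$. A PINN loss would typically penalise a residual of this kind, although the abstract does not describe the authors' loss, so that link is an assumption. The function names, step sizes, and parameter values are illustrative, and American options (which have no such closed form) are not covered.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def european_call(S, K, r, sigma, tau):
    """Closed-form Black-Scholes call price; tau is time to maturity in years."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def bs_residual(V, S, t, r, sigma, hS=1e-2, ht=1e-4):
    """Black-Scholes PDE residual of a pricing function V(S, t).

    Derivatives are approximated with central finite differences.
    """
    V_t = (V(S, t + ht) - V(S, t - ht)) / (2.0 * ht)
    V_S = (V(S + hS, t) - V(S - hS, t)) / (2.0 * hS)
    V_SS = (V(S + hS, t) - 2.0 * V(S, t) + V(S - hS, t)) / hS ** 2
    return V_t + 0.5 * sigma ** 2 * S ** 2 * V_SS + r * S * V_S - r * V(S, t)

# Illustrative parameters: at-the-money call, one year to maturity.
K, r, sigma, T = 100.0, 0.05, 0.2, 1.0
price = european_call(100.0, K, r, sigma, T)

def exact(S, t):
    return european_call(S, K, r, sigma, T - t)

residual = bs_residual(exact, 100.0, 0.0, r, sigma)  # close to zero
```

The closed-form price drives the residual to (numerically) zero, which is the property a trained network is asked to reproduce at sampled points.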
Inferring a diffusion equation from discretely-observed measurements is a statistical challenge of significant importance in a variety of fields, from single-molecule tracking in biophysical systems to modeling financial instruments. Assuming that the underlying dynamical process obeys a $d$-dimensional stochastic differential equation of the form $$\mathrm{d}\boldsymbol{x}_t=\boldsymbol{b}(\boldsymbol{x}_t)\mathrm{d} t+\Sigma(\boldsymbol{x}_t)\mathrm{d}\boldsymbol{w}_t,$$ we propose neural network-based estimators of both the drift $\boldsymbol{b}$ and the spatially-inhomogeneous diffusion tensor $D = \Sigma\Sigma^{T}$ and provide statistical convergence guarantees when $\boldsymbol{b}$ and $D$ are $s$-H\"older continuous. Notably, our bound aligns with the minimax optimal rate $N^{-\frac{2s}{2s+d}}$ for nonparametric function estimation even in the presence of correlation within observational data, which necessitates careful handling when establishing fast-rate generalization bounds. Our theoretical results are bolstered by numerical experiments demonstrating accurate inference of spatially-inhomogeneous diffusion tensors.
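The estimation problem can be illustrated in one dimension with much simpler tools than the neural network estimators proposed above. The sketch below simulates the stochastic differential equation with the Euler-Maruyama scheme and then recovers a constant $D = \Sigma^2$ and a linear drift slope from the increments, using $\mathbb{E}[\Delta x^2] \approx D\,\Delta t$ and a least-squares fit of $\Delta x / \Delta t$ on $x$. These moment estimators are swapped in purely for illustration; the function names, the Ornstein-Uhlenbeck test case, and all parameter values are assumptions.

```python
import math
import random

def simulate_sde(b, sigma, x0, dt, n_steps, seed=0):
    """Euler-Maruyama path of dx = b(x) dt + sigma(x) dw in one dimension."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        noise = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        xs.append(x + b(x) * dt + sigma(x) * noise)
    return xs

def estimate_diffusion(xs, dt):
    """Moment estimate of a constant D = sigma^2 from squared increments."""
    incs = [x1 - x0 for x0, x1 in zip(xs, xs[1:])]
    return sum(d * d for d in incs) / (len(incs) * dt)

def estimate_drift_slope(xs, dt):
    """Least-squares slope a in the linear drift model b(x) = a * x."""
    num = sum(x0 * (x1 - x0) / dt for x0, x1 in zip(xs, xs[1:]))
    den = sum(x0 * x0 for x0 in xs[:-1])
    return num / den

# Ornstein-Uhlenbeck test case: b(x) = -x, sigma(x) = 0.5, so D = 0.25.
path = simulate_sde(lambda x: -x, lambda x: 0.5, x0=0.0, dt=0.01, n_steps=50_000)
D_hat = estimate_diffusion(path, dt=0.01)    # should be close to 0.25
a_hat = estimate_drift_slope(path, dt=0.01)  # should be close to -1
```

Note that consecutive observations along one path are correlated, which is the complication the abstract highlights when deriving fast-rate bounds.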
In this chapter we provide an introduction to fractional dissipative partial differential equations (PDEs) with a focus on trying to understand their dynamics. We focus on the class of reaction-diffusion equations, but we also provide an outlook on closely related classes of PDEs. To simplify the exposition, we only discuss the cases of fractional time derivatives and fractional space derivatives in the PDE separately. As our main tools, we describe analytical as well as numerical methods, which are generically necessary to study nonlinear dynamics. We start with the analytical study of steady states and local linear stability for fractional time derivatives. Then we extend this view to a global perspective and consider time-fractional PDEs and gradient flows. Next, we continue to steady states, linear stability analysis and bifurcations for space-fractional PDEs. As a final analytical consideration we discuss existence and stability of traveling waves for space-fractional PDEs. In the last parts, we provide numerical discretization schemes for fractional (dissipative) PDEs and we utilize these techniques within numerical continuation in applied examples of fractional reaction-diffusion PDEs. We conclude with a brief summary and outlook on open questions in the field.
The multiobjective evolutionary optimization algorithm (MOEA) is a powerful approach for tackling multiobjective optimization problems (MOPs), which can find a finite set of approximate Pareto solutions in a single run. However, under mild regularity conditions, the Pareto optimal set of a continuous MOP could be a low-dimensional continuous manifold that contains infinitely many solutions. In addition, structure constraints on the whole optimal solution set, which characterize the patterns shared among all solutions, could be required in many real-life applications. It is very challenging for existing finite-population-based MOEAs to handle these structure constraints properly. In this work, we propose the first model-based algorithmic framework to learn the whole solution set with structure constraints for multiobjective optimization. In our approach, the Pareto optimality can be traded off with a preferred structure among the whole solution set, which could be crucial for many real-world problems. We also develop an efficient evolutionary learning method to train the set model with structure constraints. Experimental studies on benchmark test suites and real-world application problems demonstrate the promising performance of our proposed framework.
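The "finite set of approximate Pareto solutions" that a population-based method returns can be illustrated with a non-dominated filter. The sketch below assumes every objective is minimised and uses a quadratic-time scan; it is a generic illustration of Pareto dominance, not the set-model framework proposed above, and the function names and sample points are made up.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(points):
    """Keep the points that no other point dominates (minimisation)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Two objectives, five candidate solutions (illustrative values).
candidates = [(1, 4), (2, 2), (4, 1), (3, 3), (2, 5)]
front = pareto_front(candidates)  # (3, 3) and (2, 5) are dominated
```

However large the population, a filter like this yields finitely many points, whereas the true Pareto set of a continuous problem may be a continuum.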
Multicalibration is a notion of fairness for predictors that requires them to provide calibrated predictions across a large set of protected groups. Multicalibration is known to be a goal distinct from loss minimization, even for simple predictors such as linear functions. In this work, we consider the setting where the protected groups can be represented by neural networks of size $k$, and the predictors are neural networks of size $n > k$. We show that minimizing the squared loss over all neural nets of size $n$ implies multicalibration for all but a bounded number of unlucky values of $n$. We also give evidence that our bound on the number of unlucky values is tight, given our proof technique. Previously, results of the flavor that loss minimization yields multicalibration were known only for predictors that were near the ground truth, and hence were rather limited in applicability. Unlike these, our results rely on the expressivity of neural nets and utilize the representation of the predictor.
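A small numerical check can make the notion of calibration "across groups" more tangible. The sketch below measures, for each group and each bin of predicted values, the absolute mean residual between labels and predictions, and reports the worst cell. This is one simple variant chosen for illustration; formal definitions of multicalibration differ in how cells are weighted, and the function name, binning scheme, and toy data are assumptions.

```python
def multicalibration_violation(preds, labels, groups, n_bins=10, min_count=1):
    """Largest absolute mean residual over (group, prediction-bin) cells.

    preds:  predicted probabilities in [0, 1]
    labels: observed outcomes (0 or 1)
    groups: list of membership masks, one list of booleans per group
    """
    worst = 0.0
    for member in groups:
        cells = {}  # bin index -> (sum of residuals, count)
        for p, y, m in zip(preds, labels, member):
            if not m:
                continue
            b = min(int(p * n_bins), n_bins - 1)
            s, c = cells.get(b, (0.0, 0))
            cells[b] = (s + (y - p), c + 1)
        for s, c in cells.values():
            if c >= min_count:
                worst = max(worst, abs(s / c))
    return worst

# A predictor that is calibrated on the whole population but not on a subgroup.
preds = [0.5, 0.5, 0.5, 0.5]
labels = [1, 0, 1, 0]
everyone = [True, True, True, True]
subgroup = [True, False, True, False]
overall = multicalibration_violation(preds, labels, [everyone])
with_subgroup = multicalibration_violation(preds, labels, [everyone, subgroup])
```

Here the constant prediction 0.5 looks perfectly calibrated overall yet is off by 0.5 on the subgroup, which is the gap multicalibration is meant to rule out.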
Disentangled Representation Learning (DRL) aims to learn a model capable of identifying and disentangling the underlying factors hidden in the observable data in representation form. Separating the underlying factors of variation into variables with semantic meaning helps in learning explainable representations of data, imitating the meaningful understanding process of humans when observing an object or relation. As a general learning strategy, DRL has demonstrated its power in improving model explainability, controllability, robustness, and generalization capacity in a wide range of scenarios such as computer vision, natural language processing, and data mining. In this article, we comprehensively review DRL from various aspects including motivations, definitions, methodologies, evaluations, applications, and model designs. We discuss works on DRL based on two well-recognized definitions, i.e., Intuitive Definition and Group Theory Definition. We further categorize the methodologies for DRL into the following groups: Traditional Statistical Approaches, Variational Auto-encoder Based Approaches, Generative Adversarial Networks Based Approaches, Hierarchical Approaches, and Other Approaches. We also analyze principles for designing different DRL models that may benefit different tasks in practical applications. Finally, we point out challenges in DRL as well as potential research directions deserving future investigation. We believe this work may provide insights for promoting DRL research in the community.
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
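The abstract does not spell out the attention mechanism itself. The sketch below assumes scaled dot-product attention, $\mathrm{softmax}(QK^{T}/\sqrt{d_k})\,V$, written with plain Python lists for readability; it omits multiple heads, masking, and learned projections, and the function names and toy matrices are illustrative.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Attention output for queries Q, keys K, and values V (lists of rows)."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# A zero query scores every key equally, so the output is the mean of the values.
Q = [[0.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
result = scaled_dot_product_attention(Q, K, V)  # [[2.0, 3.0]]
```

Each query row is processed independently of the others, which is the property that makes the computation easy to parallelise compared with a recurrent model.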