Learning from demonstrations (LfD) enables humans to easily teach collaborative robots (cobots) new motions that can be generalized to new task configurations without retraining. However, state-of-the-art LfD methods require manually tuning intrinsic parameters and have rarely been used in industrial contexts without experts. We propose a parameter-free LfD method based on probabilistic movement primitives, where parameters are determined using Jensen-Shannon divergence and Bayesian optimization, and users do not have to perform manual parameter tuning. The cobot's precision in reproducing learned motions, and its ease of teaching and use by non-expert users are evaluated in two field tests. In the first field test, the cobot works on elevator door maintenance. In the second test, three factory workers teach the cobot tasks useful for their daily workflow. Errors between the cobot and target joint angles are insignificant -- at worst 0.28 deg -- and the motion is accurately reproduced -- GMCC score of 1. Questionnaires completed by the workers highlighted the method's ease of use and the accuracy of the reproduced motion. Public implementation of our method and datasets are made available online.
A Retrieval-Augmented Language Model (RALM) augments a generative language model by retrieving context-specific knowledge from an external database. This strategy facilitates impressive text generation quality even with smaller models, thus reducing orders of magnitude of computational demands. However, RALMs introduce unique system design challenges due to (a) the diverse workload characteristics between LM inference and retrieval and (b) the various system requirements and bottlenecks for different RALM configurations such as model sizes, database sizes, and retrieval frequencies. We propose Chameleon, a heterogeneous accelerator system that integrates both LM and retrieval accelerators in a disaggregated architecture. The heterogeneity ensures efficient acceleration of both LM inference and retrieval, while the accelerator disaggregation enables the system to independently scale both types of accelerators to fulfill diverse RALM requirements. Our Chameleon prototype implements retrieval accelerators on FPGAs and assigns LM inference to GPUs, with a CPU server orchestrating these accelerators over the network. Compared to CPU-based and CPU-GPU vector search systems, Chameleon achieves up to 23.72x speedup and 26.2x energy efficiency. Evaluated on various RALMs, Chameleon exhibits up to 2.16x reduction in latency and 3.18x speedup in throughput compared to the hybrid CPU-GPU architecture. These promising results pave the way for bringing accelerator heterogeneity and disaggregation into future RALM systems.
Image Schemas are repetitive cognitive patterns that influence the way we conceptualize and reason about various concepts present in speech. These patterns are deeply embedded within our cognitive processes and are reflected in our bodily expressions including gestures. Particularly, metaphoric gestures possess essential characteristics and semantic meanings that align with Image Schemas, to visually represent abstract concepts. The shape and form of gestures can convey abstract concepts, such as extending the forearm and hand or tracing a line with hand movements to visually represent the image schema of PATH. Previous behavior generation models have primarily focused on utilizing speech (acoustic features and text) to drive the generation model of virtual agents. They have not considered key semantic information as those carried by Image Schemas to effectively generate metaphoric gestures. To address this limitation, we introduce META4, a deep learning approach that generates metaphoric gestures from both speech and Image Schemas. Our approach has two primary goals: computing Image Schemas from input text to capture the underlying semantic and metaphorical meaning, and generating metaphoric gestures driven by speech and the computed image schemas. Our approach is the first method for generating speech driven metaphoric gestures while leveraging the potential of Image Schemas. We demonstrate the effectiveness of our approach and highlight the importance of both speech and image schemas in modeling metaphoric gestures.
Generalized linear models (GLMs) are routinely used for modeling relationships between a response variable and a set of covariates. The simple form of a GLM comes with easy interpretability, but also leads to concerns about model misspecification impacting inferential conclusions. A popular semi-parametric solution adopted in the frequentist literature is quasi-likelihood, which improves robustness by only requiring correct specification of the first two moments. We develop a robust approach to Bayesian inference in GLMs through quasi-posterior distributions. We show that quasi-posteriors provide a coherent generalized Bayes inference method, while also approximating so-called coarsened posteriors. In so doing, we obtain new insights into the choice of coarsening parameter. Asymptotically, the quasi-posterior converges in total variation to a normal distribution and has important connections with the loss-likelihood bootstrap posterior. We demonstrate that it is also well-calibrated in terms of frequentist coverage. Moreover, the loss-scale parameter has a clear interpretation as a dispersion, and this leads to a consolidated method of moments estimator.
The remarkable success of GPT models across various tasks, including toponymy recognition motivates us to assess the performance of the GPT-3 model in the geocoding address parsing task. To ensure that the evaluation more accurately mirrors performance in real-world scenarios with diverse user input qualities and resolve the pressing need for a 'gold standard' evaluation dataset for geocoding systems, we introduce a benchmark dataset of low-quality address descriptions synthesized based on human input patterns mining from actual input logs of a geocoding system in production. This dataset has 21 different input errors and variations; contains over 239,000 address records that are uniquely selected from streets across all U.S. 50 states and D.C.; and consists of three subsets to be used as training, validation, and testing sets. Building on this, we train and gauge the performance of the GPT-3 model in extracting address components, contrasting its performance with transformer-based and LSTM-based models. The evaluation results indicate that Bidirectional LSTM-CRF model has achieved the best performance over these transformer-based models and GPT-3 model. Transformer-based models demonstrate very comparable results compared to the Bidirectional LSTM-CRF model. The GPT-3 model, though trailing in performance, showcases potential in the address parsing task with few-shot examples, exhibiting room for improvement with additional fine-tuning. We open source the code and data of this presented benchmark so that researchers can utilize it for future model development or extend it to evaluate similar tasks, such as document geocoding.
Throughout the life sciences we routinely seek to interpret measurements and observations using parameterised mechanistic mathematical models. A fundamental and often overlooked choice in this approach involves relating the solution of a mathematical model with noisy and incomplete measurement data. This is often achieved by assuming that the data are noisy measurements of the solution of a deterministic mathematical model, and that measurement errors are additive and normally distributed. While this assumption of additive Gaussian noise is extremely common and simple to implement and interpret, it is often unjustified and can lead to poor parameter estimates and non-physical predictions. One way to overcome this challenge is to implement a different measurement error model. In this review, we demonstrate how to implement a range of measurement error models in a likelihood-based framework for estimation, identifiability analysis, and prediction, called Profile-Wise Analysis. This frequentist approach to uncertainty quantification for mechanistic models leverages the profile likelihood for targeting parameters and understanding their influence on predictions. Case studies, motivated by simple caricature models routinely used in systems biology and mathematical biology literature, illustrate how the same ideas apply to different types of mathematical models. Open-source Julia code to reproduce results is available on GitHub.
When modeling a vector of risk variables, extreme scenarios are often of special interest. The peaks-over-thresholds method hinges on the notion that, asymptotically, the excesses over a vector of high thresholds follow a multivariate generalized Pareto distribution. However, existing literature has primarily concentrated on the setting when all risk variables are always large simultaneously. In reality, this assumption is often not met, especially in high dimensions. In response to this limitation, we study scenarios where distinct groups of risk variables may exhibit joint extremes while others do not. These discernible groups are derived from the angular measure inherent in the corresponding max-stable distribution, whence the term extreme direction. We explore such extreme directions within the framework of multivariate generalized Pareto distributions, with a focus on their probability density functions in relation to an appropriate dominating measure. Furthermore, we provide a stochastic construction that allows any prespecified set of risk groups to constitute the distribution's extreme directions. This construction takes the form of a smoothed max-linear model and accommodates the full spectrum of conceivable max-stable dependence structures. Additionally, we introduce a generic simulation algorithm tailored for multivariate generalized Pareto distributions, offering specific implementations for extensions of the logistic and H\"usler-Reiss families capable of carrying arbitrary extreme directions.
Self-attention and position embedding are two key modules in Transformer based LLMs. The potential relationship among them are far from well studied, especially for context window extending. In this paper, we introduce collinear constrained relationship to fuse RoPE and self-attention, and name it as Collinear Constrained Attention (CoCA). We've analyzed the computational and spatial complexity of CoCA and have determined that it adds only minimal additional overhead compared to the original Transformer-based models. We provide an efficient implementation of CoCA, and make it drop-in replacement for any existing position embedding and attention modules in Transformer based models. Experiments show that CoCA performs extraordinary well on context window extending. For instance, a CoCA based GPT model trained with 512 context length can extend the context window up to 8K without perplexity diverging. This indicates more than 16x context window extending without any fine-tuning. Our code is released here: //github.com/codefuse-ai/Collinear-Constrained-Attention
Complete observation of event histories is often impossible due to sampling effects such as right-censoring and left-truncation, but also due to reporting delays and incomplete event adjudication. This is for example the case during interim stages of clinical trials and for health insurance claims. In this paper, we develop a parametric method that takes the aforementioned effects into account, treating the latter two as partially exogenous. The method, which takes the form of a two-step M-estimation procedure, is applicable to multistate models in general, including competing risks and recurrent event models. The effect of reporting delays is derived via thinning, extending existing results for Poisson models. To address incomplete event adjudication, we propose an imputed likelihood approach which, compared to existing methods, has the advantage of allowing for dependencies between the event history and adjudication processes as well as allowing for unreported events and multiple event types. We establish consistency and asymptotic normality under standard identifiability, integrability, and smoothness conditions, and we demonstrate the validity of the percentile bootstrap. Finally, a simulation study shows favorable finite sample performance of our method compared to other alternatives, while an application to disability insurance data illustrates its practical potential.
Graph representation learning for hypergraphs can be used to extract patterns among higher-order interactions that are critically important in many real world problems. Current approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic for various learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention based graph neural network called Hyper-SAGNN applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms the state-of-the-art methods on traditional tasks while also achieving great performance on a new task called outsider identification. Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.
Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from the large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that will first use a 3D FCN to roughly define a candidate region, which will then be used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to ~10% and allows it to focus on more detailed segmentation of the organs and vessels. We utilize training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital that includes 150 CT scans, targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5 to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve a significantly higher performance in small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download: //github.com/holgerroth/3Dunet_abdomen_cascade.