There are increasing efforts to automate clinical methods for early diagnosis of developmental disorders, among them the General Movement Assessment (GMA), a video-based tool to classify infant motor functioning. Optimal pose estimation is a crucial part of the automated GMA. In this study we compare the performance of available generic- and infant-pose estimators, and the choice of viewing angle for optimal recordings, i.e., conventional diagonal view used in GMA vs. top-down view. For this study, we used 4500 annotated video-frames from 75 recordings of infant spontaneous motor functions from 4 to 26 weeks. To determine which available pose estimation method and camera angle yield the best pose estimation accuracy on infants in a GMA related setting, the distance to human annotations as well as the percentage of correct key-points (PCK) were computed and compared. The results show that the best performing generic model trained on adults, ViTPose, also performs best on infants. We see no improvement from using specialized infant-pose estimators over the generic pose estimators on our own infant dataset. However, when retraining a generic model on our data, there is a significant improvement in pose estimation accuracy. The pose estimation accuracy obtained from the top-down view is significantly better than that obtained from the diagonal view, especially for the detection of the hip key-points. The results also indicate only limited generalization capabilities of infant-pose estimators to other infant datasets, which hints that one should be careful when choosing infant pose estimators and using them on infant datasets which they were not trained on. While the standard GMA method uses a diagonal view for assessment, pose estimation accuracy significantly improves using a top-down view. This suggests that a top-down view should be included in recording setups for automated GMA research.
Text-to-image generation and text-guided image manipulation have received considerable attention in the field of image generation tasks. However, the mainstream evaluation methods for these tasks have difficulty in evaluating whether all the information from the input text is accurately reflected in the generated images, and they mainly focus on evaluating the overall alignment between the input text and the generated images. This paper proposes new evaluation metrics that assess the alignment between input text and generated images for every individual object. Firstly, according to the input text, chatGPT is utilized to produce questions for the generated images. After that, we use Visual Question Answering(VQA) to measure the relevance of the generated images to the input text, which allows for a more detailed evaluation of the alignment compared to existing methods. In addition, we use Non-Reference Image Quality Assessment(NR-IQA) to evaluate not only the text-image alignment but also the quality of the generated images. Experimental results show that our proposed evaluation approach is the superior metric that can simultaneously assess finer text-image alignment and image quality while allowing for the adjustment of these ratios.
Various technologies, including computer vision models, are employed for the automatic monitoring of manual assembly processes in production. These models detect and classify events such as the presence of components in an assembly area or the connection of components. A major challenge with detection and classification algorithms is their susceptibility to variations in environmental conditions and unpredictable behavior when processing objects that are not included in the training dataset. As it is impractical to add all possible subjects in the training sample, an alternative solution is necessary. This study proposes a model that simultaneously performs classification and anomaly detection, employing metric learning to generate vector representations of images in a multidimensional space, followed by classification using cross-entropy. For experimentation, a dataset of over 327,000 images was prepared. Experiments were conducted with various computer vision model architectures, and the outcomes of each approach were compared.
The direct parametrisation method for invariant manifolds is adjusted to consider a varying parameter. More specifically, the case of systems experiencing a Hopf bifurcation in the parameter range of interest are investigated, and the ability to predict the amplitudes of the limit cycle oscillations after the bifurcation is demonstrated. The cases of the Ziegler pendulum and Beck's column, both of which have a follower force, are considered for applications. By comparison with the eigenvalue trajectories in the conservative case, it is advocated that using two master modes to derive the ROM, instead of only considering the unstable one, should give more accurate results. Also, in the specific case where an exceptional bifurcation point is met, a numerical strategy enforcing the presence of Jordan blocks in the Jacobian matrix during the procedure, is devised. The ROMs are constructed for the Ziegler pendulum having two and three degrees of freedom, and then Beck's column is investigated, where a finite element procedure is used to space discretize the problem. The numerical results show the ability of the ROMs to correctly predict the amplitude of the limit cycles up to a certain range, and it is shown that computing the ROM after the Hopf bifurcation gives the most satisfactory results. This feature is analyzed in terms of phase space representations, and the two proposed adjustments are shown to improve the validity range of the ROMs.
Despite significant advancements, segmentation based on deep neural networks in medical and surgical imaging faces several challenges, two of which we aim to address in this work. First, acquiring complete pixel-level segmentation labels for medical images is time-consuming and requires domain expertise. Second, typical segmentation pipelines cannot detect out-of-distribution (OOD) pixels, leaving them prone to spurious outputs during deployment. In this work, we propose a novel segmentation approach exploiting OOD detection that learns only from sparsely annotated pixels from multiple positive-only classes. %but \emph{no background class} annotation. These multi-class positive annotations naturally fall within the in-distribution (ID) set. Unlabelled pixels may contain positive classes but also negative ones, including what is typically referred to as \emph{background} in standard segmentation formulations. Here, we forgo the need for background annotation and consider these together with any other unseen classes as part of the OOD set. Our framework can integrate, at a pixel-level, any OOD detection approaches designed for classification tasks. To address the lack of existing OOD datasets and established evaluation metric for medical image segmentation, we propose a cross-validation strategy that treats held-out labelled classes as OOD. Extensive experiments on both multi-class hyperspectral and RGB surgical imaging datasets demonstrate the robustness and generalisation capability of our proposed framework.
Splitting methods are widely used for solving initial value problems (IVPs) due to their ability to simplify complicated evolutions into more manageable subproblems which can be solved efficiently and accurately. Traditionally, these methods are derived using analytic and algebraic techniques from numerical analysis, including truncated Taylor series and their Lie algebraic analogue, the Baker--Campbell--Hausdorff formula. These tools enable the development of high-order numerical methods that provide exceptional accuracy for small timesteps. Moreover, these methods often (nearly) conserve important physical invariants, such as mass, unitarity, and energy. However, in many practical applications the computational resources are limited. Thus, it is crucial to identify methods that achieve the best accuracy within a fixed computational budget, which might require taking relatively large timesteps. In this regime, high-order methods derived with traditional methods often exhibit large errors since they are only designed to be asymptotically optimal. Machine Learning techniques offer a potential solution since they can be trained to efficiently solve a given IVP with less computational resources. However, they are often purely data-driven, come with limited convergence guarantees in the small-timestep regime and do not necessarily conserve physical invariants. In this work, we propose a framework for finding machine learned splitting methods that are computationally efficient for large timesteps and have provable convergence and conservation guarantees in the small-timestep limit. We demonstrate numerically that the learned methods, which by construction converge quadratically in the timestep size, can be significantly more efficient than established methods for the Schr\"{o}dinger equation if the computational budget is limited.
An aspect of interest in surveillance of diseases is whether the survival time distribution changes over time. By following data in health registries over time, this can be monitored, either in real time or retrospectively. With relevant risk factors registered, these can be taken into account in the monitoring as well. A challenge in monitoring survival times based on registry data is that data on cause of death might either be missing or uncertain. To quantify the burden of disease in such cases, excess hazard methods can be used, where the total hazard is modelled as the population hazard plus the excess hazard due to the disease. We propose a CUSUM procedure for monitoring for changes in the survival time distribution in cases where use of excess hazard models is relevant. The procedure is based on a survival log-likelihood ratio and extends previously suggested methods for monitoring of time to event to the excess hazard setting. The procedure takes into account changes in the population risk over time, as well as changes in the excess hazard which is explained by observed covariates. Properties, challenges and an application to cancer registry data will be presented.
Lattice Boltzmann models are briefly introduced together with references to methods used to predict their ability for simulations of systems described by partial differential equations that are first order in time and low order in space derivatives. Several previous works have been devoted to analyzing the accuracy of these models with special emphasis on deviations from pure Newtonian viscous behaviour, related to higher order space derivatives of even order. The presentcontribution concentrates on possible inaccuracies of the advection behaviour linked to space derivatives of odd order. Detailed properties of advection-diffusion and athermal fluids are presented for two-dimensional situations allowing to propose situations that are accurate to third order in space derivatives. Simulations of the advection of a gaussian dot or vortex are presented. Similar results are discussed in appendices for three-dimensional advection-diffusion.
DNA technologies have evolved significantly in the past years enabling the sequencing of a large number of genomes in a short time. Nevertheless, the underlying computational problem is hard, and many technical factors and limitations complicate obtaining the complete sequence of a genome. Many genomes are left in a draft state, in which each chromosome is represented by a set of sequences with partial information on their relative order. Recently, some approaches have been proposed to compare draft genomes by comparing paths in de Bruijn graphs, which are constructed by many practical genome assemblers. In this article we introduce gcBB, a method for comparing genomes represented as succinct colored de Bruijn graphs directly, without resorting to sequence alignments, by means of the entropy and expectation measures based on the Burrows-Wheeler Similarity Distribution. We also introduce an improved version of gcBB, called mgcBB, that improves the time performance considerably through the selection of different data structures. We have compared phylogenies of genomes obtained by other methods to those obtained with gcBB, achieving promising results.
Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. We illustrate the utility of CheckList with tests for three tasks, identifying critical failures in both commercial and state-of-art models. In a user study, a team responsible for a commercial sentiment analysis model found new and actionable bugs in an extensively tested model. In another user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.
Radiologist is "doctor's doctor", biomedical image segmentation plays a central role in quantitative analysis, clinical diagnosis, and medical intervention. In the light of the fully convolutional networks (FCN) and U-Net, deep convolutional networks (DNNs) have made significant contributions in biomedical image segmentation applications. In this paper, based on U-Net, we propose MDUnet, a multi-scale densely connected U-net for biomedical image segmentation. we propose three different multi-scale dense connections for U shaped architectures encoder, decoder and across them. The highlights of our architecture is directly fuses the neighboring different scale feature maps from both higher layers and lower layers to strengthen feature propagation in current layer. Which can largely improves the information flow encoder, decoder and across them. Multi-scale dense connections, which means containing shorter connections between layers close to the input and output, also makes much deeper U-net possible. We adopt the optimal model based on the experiment and propose a novel Multi-scale Dense U-Net (MDU-Net) architecture with quantization. Which reduce overfitting in MDU-Net for better accuracy. We evaluate our purpose model on the MICCAI 2015 Gland Segmentation dataset (GlaS). The three multi-scale dense connections improve U-net performance by up to 1.8% on test A and 3.5% on test B in the MICCAI Gland dataset. Meanwhile the MDU-net with quantization achieves the superiority over U-Net performance by up to 3% on test A and 4.1% on test B.