Machine learning models trained with structural health monitoring data have become a powerful tool for system identification. This paper presents a physics-informed Gaussian process (GP) model for Timoshenko beam elements. The model is constructed as a multi-output GP with covariance and cross-covariance kernels analytically derived based on the differential equations for deflections, rotations, strains, bending moments, shear forces and applied loads. Stiffness identification is performed in a Bayesian format by maximising a posterior model through a Markov chain Monte Carlo method, yielding a stochastic model for the structural parameters. The optimised GP model is further employed for probabilistic predictions of unobserved responses. Additionally, an entropy-based method for physics-informed sensor placement optimisation is presented, exploiting heterogeneous sensor position information and structural boundary conditions built into the GP model. Results demonstrate that the proposed approach is effective at identifying structural parameters and is capable of fusing data from heterogeneous and multi-fidelity sensors. Probabilistic predictions of structural responses and internal forces are in closer agreement with measured data. We validate our model with an experimental setup and discuss the quality and uncertainty of the obtained results. The proposed approach has potential applications in the field of structural health monitoring (SHM) for both mechanical and structural systems.
While model architectures and training strategies have become more generic and flexible with respect to different data modalities over the past years, a persistent limitation lies in the assumption of fixed quantities and arrangements of input features. This limitation becomes particularly relevant in scenarios where the attributes captured during data acquisition vary across different samples. In this work, we aim at effectively leveraging data with varying features, without the need to constrain the input space to the intersection of potential feature sets or to expand it to their union. We propose a novel architecture that can directly process data without the necessity of aligned feature modalities by learning a general embedding space that captures the relationship between features across data samples with varying sets of features. This is achieved via a set-transformer architecture augmented by feature-encoder layers, thereby enabling the learning of a shared latent feature space from data originating from heterogeneous feature spaces. The advantages of the model are demonstrated for automatic cancer cell detection in acute myeloid leukemia in flow cytometry data, where the features measured during acquisition often vary between samples. Our proposed architecture's capacity to operate seamlessly across incongruent feature spaces is particularly relevant in this context, where data scarcity arises from the low prevalence of the disease. The code is available for research purposes at //github.com/lisaweijler/FATE.
Estimating a prediction function is a fundamental component of many data analyses. The Super Learner ensemble, a particular implementation of stacking, has desirable theoretical properties and has been used successfully in many applications. Dimension reduction can be accomplished by using variable screening algorithms, including the lasso, within the ensemble prior to fitting other prediction algorithms. However, the performance of a Super Learner using the lasso for dimension reduction has not been fully explored in cases where the lasso is known to perform poorly. We provide empirical results that suggest that a diverse set of candidate screening algorithms should be used to protect against poor performance of any one screen, similar to the guidance for choosing a library of prediction algorithms for the Super Learner.
Imaging through perturbed multimode fibres based on deep learning has been widely researched. However, existing methods mainly use target-speckle pairs in different configurations. It is challenging to reconstruct targets without trained networks. In this paper, we propose a physics-assisted, unsupervised, learning-based fibre imaging scheme. The role of the physical prior is to simplify the mapping relationship between the speckle pattern and the target image, thereby reducing the computational complexity. The unsupervised network learns target features according to the optimized direction provided by the physical prior. Therefore, the reconstruction process of the online learning only requires a few speckle patterns and unpaired targets. The proposed scheme also increases the generalization ability of the learning-based method in perturbed multimode fibres. Our scheme has the potential to extend the application of multimode fibre imaging.
We collect robust proposals given in the field of regression models with heteroscedastic errors. Our motivation stems from the fact that the practitioner frequently faces the confluence of two phenomena in the context of data analysis: non--linearity and heteroscedasticity. The impact of heteroscedasticity on the precision of the estimators is well--known, however the conjunction of these two phenomena makes handling outliers more difficult. An iterative procedure to estimate the parameters of a heteroscedastic non--linear model is considered. The studied estimators combine weighted $MM-$regression estimators, to control the impact of high leverage points, and a robust method to estimate the parameters of the variance function.
Modern deep learning models with great expressive power can be trained to overfit the training data but still generalize well. This phenomenon is referred to as \textit{benign overfitting}. Recently, a few studies have attempted to theoretically understand benign overfitting in neural networks. However, these works are either limited to neural networks with smooth activation functions or to the neural tangent kernel regime. How and when benign overfitting can occur in ReLU neural networks remains an open problem. In this work, we seek to answer this question by establishing algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise. We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk. Our result also reveals a sharp transition between benign and harmful overfitting under different conditions on data distribution in terms of test risk. Experiments on synthetic data back up our theory.
Purpose: To provide a simulation framework for routine neuroimaging test data, which allows for "stress testing" of deep segmentation networks against acquisition shifts that commonly occur in clinical practice for T2 weighted (T2w) fluid attenuated inversion recovery (FLAIR) Magnetic Resonance Imaging (MRI) protocols. Approach: The approach simulates "acquisition shift derivatives" of MR images based on MR signal equations. Experiments comprise the validation of the simulated images by real MR scans and example stress tests on state-of-the-art MS lesion segmentation networks to explore a generic model function to describe the F1 score in dependence of the contrast-affecting sequence parameters echo time (TE) and inversion time (TI). Results: The differences between real and simulated images range up to 19 % in gray and white matter for extreme parameter settings. For the segmentation networks under test the F1 score dependency on TE and TI can be well described by quadratic model functions (R^2 > 0.9). The coefficients of the model functions indicate that changes of TE have more influence on the model performance than TI. Conclusions: We show that these deviations are in the range of values as may be caused by erroneous or individual differences of relaxation times as described by literature. The coefficients of the F1 model function allow for quantitative comparison of the influences of TE and TI. Limitations arise mainly from tissues with the low baseline signal (like CSF) and when the protocol contains contrast-affecting measures that cannot be modelled due to missing information in the DICOM header.
Transfer learning and ensembling are two popular techniques for improving the performance and robustness of neural networks. Due to the high cost of pre-training, ensembles of models fine-tuned from a single pre-trained checkpoint are often used in practice. Such models end up in the same basin of the loss landscape, which we call the pre-train basin, and thus have limited diversity. In this work, we show that ensembles trained from a single pre-trained checkpoint may be improved by better exploring the pre-train basin, however, leaving the basin results in losing the benefits of transfer learning and in degradation of the ensemble quality. Based on the analysis of existing exploration methods, we propose a more effective modification of the Snapshot Ensembles (SSE) for transfer learning setup, StarSSE, which results in stronger ensembles and uniform model soups.
We propose a learning-based compression scheme that envelopes a standard codec between pre and post-processing deep CNNs. Specifically, we demonstrate improvements over prior approaches utilizing a compression-decompression network by introducing: (a) an edge-aware loss function to prevent blurring that is commonly occurred in prior works & (b) a super-resolution convolutional neural network (CNN) for post-processing along with a corresponding pre-processing network for improved rate-distortion performance in the low rate regime. The algorithm is assessed on a variety of datasets varying from low to high resolution namely Set 5, Set 7, Classic 5, Set 14, Live 1, Kodak, General 100, CLIC 2019. When compared to JPEG, JPEG2000, BPG, and recent CNN approach, the proposed algorithm contributes significant improvement in PSNR with an approximate gain of 20.75%, 8.47%, 3.22%, 3.23% and 24.59%, 14.46%, 10.14%, 8.57% at low and high bit-rates respectively. Similarly, this improvement in MS-SSIM is approximately 71.43%, 50%, 36.36%, 23.08%, 64.70% and 64.47%, 61.29%, 47.06%, 51.52%, 16.28% at low and high bit-rates respectively. With CLIC 2019 dataset, PSNR is found to be superior with approximately 16.67%, 10.53%, 6.78%, and 24.62%, 17.39%, 14.08% at low and high bit-rates respectively, over JPEG2000, BPG, and recent CNN approach. Similarly, the MS-SSIM is found to be superior with approximately 72%, 45.45%, 39.13%, 18.52%, and 71.43%, 50%, 41.18%, 17.07% at low and high bit-rates respectively, compared to the same approaches. A similar type of improvement is achieved with other datasets also.
The study of time-inhomogeneous Markov jump processes is a traditional topic within probability theory that has recently attracted substantial attention in various applications. However, their flexibility also incurs a substantial mathematical burden which is usually circumvented by using well-known generic distributional approximations or simulations. This article provides a novel approximation method that tailors the dynamics of a time-homogeneous Markov jump process to meet those of its time-inhomogeneous counterpart on an increasingly fine Poisson grid. Strong convergence of the processes in terms of the Skorokhod $J_1$ metric is established, and convergence rates are provided. Under traditional regularity assumptions, distributional convergence is established for unconditional proxies, to the same limit. Special attention is devoted to the case where the target process has one absorbing state and the remaining ones transient, for which the absorption times also converge. Some applications are outlined, such as univariate hazard-rate density estimation, ruin probabilities, and multivariate phase-type density evaluation.
Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and directly accomplish some tasks that are hard for computers in the pipeline with the help of machine-based approaches. In this paper, we survey existing works on human-in-the-loop from a data perspective and classify them into three categories with a progressive relationship: (1) the work of improving model performance from data processing, (2) the work of improving model performance through interventional model training, and (3) the design of the system independent human-in-the-loop. Using the above categorization, we summarize major approaches in the field, along with their technical strengths/ weaknesses, we have simple classification and discussion in natural language processing, computer vision, and others. Besides, we provide some open challenges and opportunities. This survey intends to provide a high-level summarization for human-in-the-loop and motivates interested readers to consider approaches for designing effective human-in-the-loop solutions.