In epidemiological studies, participants' disease status is often collected through self-reported outcomes in place of formal medical tests due to budget constraints. However, self-reported outcomes are often subject to measurement error and may lead to biased estimates if used directly in statistical analyses. In this paper, we propose statistical methods that correct for outcome measurement error in survival analyses with multiple failure types through a reweighting strategy. We also discuss the asymptotic properties of the proposed estimators and derive their asymptotic variances. The work is motivated by the Conservation of Hearing Study (CHEARS), which aims to evaluate risk factors for hearing loss in the Nurses' Health Study II (NHS II). We apply the proposed method to adjust for the measurement errors in self-reported hearing outcomes; the analysis results suggest that tinnitus is positively associated with moderate hearing loss at both low- or mid- and high sound frequencies, with similar effect sizes across frequencies.
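The bias from misclassified self-reports can be illustrated with a much simpler correction than the paper's survival-analysis reweighting: the classical Rogan-Gladen adjustment of an observed prevalence, sketched below under the assumption that the sensitivity and specificity of self-report (e.g., from a validation subsample) are known. The numbers in the example are hypothetical.

```python
def rogan_gladen(p_obs, sensitivity, specificity):
    """Correct an observed prevalence for outcome misclassification.

    Classical Rogan-Gladen estimator; sensitivity and specificity of the
    self-report (relative to a gold-standard test) are assumed known,
    for instance from a validation subsample.
    """
    return (p_obs + specificity - 1.0) / (sensitivity + specificity - 1.0)

# Hypothetical example: 30% self-reported hearing loss,
# 80% sensitivity and 95% specificity of self-report.
p_true = rogan_gladen(0.30, 0.80, 0.95)  # corrected prevalence, 1/3
```

With perfect sensitivity and specificity the estimator returns the observed prevalence unchanged, which is a useful sanity check.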
Digital image correlation (DIC) has become a valuable tool for evaluating mechanical experiments, particularly fatigue crack growth experiments. The evaluation requires accurate information on the crack path and crack tip position, which is difficult to obtain due to inherent noise and artefacts. Machine learning models have been very successful at recognizing this relevant information, but training robust models that generalize well requires big data. However, data is typically scarce in the field of materials science and engineering because experiments are expensive and time-consuming. We present a method to generate synthetic DIC data using generative adversarial networks with a physics-guided discriminator. To decide whether data samples are real or fake, this discriminator additionally receives the derived von Mises equivalent strain. We show that this physics-guided approach leads to improved results in terms of the visual quality of samples, sliced Wasserstein distance, and geometry score.
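The derived channel fed to the physics-guided discriminator can be computed directly from the strain tensor. A minimal sketch of the von Mises equivalent strain for a single point, assuming the DIC strain has been embedded into a full 3x3 tensor and applied pointwise over the field (shapes and names are illustrative, not the paper's implementation):

```python
import numpy as np

def von_mises_strain(eps):
    """Von Mises equivalent strain of a 3x3 strain tensor:
    eps_vm = sqrt(2/3 * e:e), with e the deviatoric part of eps.
    Applied pointwise, this yields the derived channel for the discriminator.
    """
    eps = np.asarray(eps, dtype=float)
    dev = eps - np.trace(eps) / 3.0 * np.eye(3)  # deviatoric part
    return np.sqrt(2.0 / 3.0 * np.sum(dev * dev))

# Uniaxial, volume-preserving strain of magnitude e gives eps_vm = e.
vm = von_mises_strain(np.diag([0.01, -0.005, -0.005]))  # 0.01
```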
The purpose of this paper is to provide a characterization of the error of the best polynomial approximation of composite functions in weighted spaces. Such a characterization is essential for the convergence analysis of numerical methods applied to non-linear problems, and for numerical approaches that use regularization techniques to compensate for low smoothness of the solution. The result is obtained through an estimate of the derivatives of composite functions in the weighted uniform norm.
While there is wide agreement that physical activity is an important component of a healthy lifestyle, it is unclear how many people adhere to public health recommendations on physical activity. The Physical Activity Guidelines (PAG), published by the CDC, provide guidance to American adults, but it is difficult to assess compliance with these guidelines. The PAG further complicate adherence assessment by recommending that activity occur in bouts of at least 10 minutes. To better understand the measurement capabilities of various instruments for quantifying activity, and to propose an approach for evaluating activity relative to the PAG, researchers at Iowa State University administered the Physical Activity Measurement Survey (PAMS) to over 1,000 participants in four Iowa counties. In this paper, we develop a two-part Bayesian measurement error model and apply it to the PAMS data in order to assess compliance with the PAG in the Iowa adult population. The model explicitly accounts for the 10-minute bout requirement put forth in the PAG. The measurement error model corrects biased estimates and accounts for day-to-day variation in activity. The model is also applied to the nationally representative National Health and Nutrition Examination Survey.
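The 10-minute bout requirement can be made concrete with a small sketch that counts only activity minutes occurring in sufficiently long runs. The strict consecutive-minutes rule here is a simplification (bout definitions in the literature sometimes tolerate short interruptions), and the function name is illustrative:

```python
import numpy as np

def bout_minutes(active, min_bout=10):
    """Count active minutes that occur in runs of at least `min_bout`
    consecutive active minutes, mirroring the PAG bout requirement.

    `active` is a 1D boolean sequence, one entry per minute of wear time.
    """
    active = np.asarray(active, dtype=bool)
    total = 0
    run = 0
    for a in active:
        if a:
            run += 1
        else:
            if run >= min_bout:  # run just ended; count it if long enough
                total += run
            run = 0
    if run >= min_bout:  # handle a run that reaches the end of the record
        total += run
    return total

# A 12-minute bout counts in full; the trailing 8-minute run does not.
mins = bout_minutes([1] * 12 + [0] * 5 + [1] * 8)  # 12
```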
The application of deep learning to non-stationary temporal datasets can lead to overfitted models that underperform under regime changes. In this work, we propose a modular machine learning pipeline for ranking predictions on temporal panel datasets that is robust under regime changes. The modularity of the pipeline allows the use of different models, including gradient boosting decision trees (GBDTs) and neural networks, with and without feature engineering. We evaluate our framework on financial data for stock portfolio prediction and find that GBDT models with dropout display high performance, robustness, and generalisability with reduced complexity and computational cost. We then demonstrate how online learning techniques, which require no retraining of models, can be used post-prediction to enhance the results. First, we show that dynamic feature projection improves robustness by reducing drawdown during regime changes. Second, we demonstrate that dynamic model ensembling, based on selecting models with good recent performance, leads to improved Sharpe and Calmar ratios for out-of-sample predictions. We also evaluate the robustness of our pipeline across different data splits and random seeds, with good reproducibility.
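The dynamic ensembling step, selecting models with good recent performance, can be sketched as follows; the window length, scoring rule (a plain mean of per-era scores), and function name are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def select_recent_best(model_scores, window=12, top_k=3):
    """Pick the models with the best mean realised score over the most
    recent `window` eras; their next-era predictions would then be
    equally weighted. Requires no retraining of the underlying models.

    model_scores: array of shape (n_eras, n_models), per-era scores.
    Returns the indices of the selected models.
    """
    recent = np.asarray(model_scores)[-window:]
    mean_score = recent.mean(axis=0)
    return np.argsort(mean_score)[::-1][:top_k]
```

In a live setting the selection would be refreshed each era as new realised scores arrive, which is what makes the ensembling "dynamic".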
We present a robust deep incremental learning framework for regression tasks on financial temporal tabular datasets, built upon the incremental use of commonly available tabular and time series prediction models to adapt to the distributional shifts typical of financial datasets. The framework uses a simple basic building block (decision trees) to build self-similar models of any required complexity that deliver robust performance under adverse conditions such as regime changes, fat-tailed distributions, and low signal-to-noise ratios. As a detailed study, we demonstrate our scheme using XGBoost models trained on the Numerai dataset and show that a two-layer deep ensemble of XGBoost models over different model snapshots delivers high-quality predictions under different market regimes. We also show that the performance of XGBoost models with different numbers of boosting rounds in three scenarios (small, standard, and large) increases monotonically with model size and converges towards the generalisation upper bound. We further evaluate the robustness of the model under variation of different hyperparameters, such as model complexity and data sampling settings. Our model has low hardware requirements, as no specialised neural architectures are used and each base model can be trained independently in parallel.
Computational models of neurodegeneration aim to emulate the evolving pattern of pathology in the brain during neurodegenerative diseases such as Alzheimer's disease. Previous studies have made specific choices about the mechanisms of pathology production and diffusion, or have assumed that all subjects lie on the same disease progression trajectory. However, the complexity and heterogeneity of neurodegenerative pathology suggest that multiple mechanisms may contribute synergistically through complex interactions, and that the degree of contribution of each mechanism may vary among individuals. We therefore put forward a coupled-mechanisms modelling framework that non-linearly combines network-topology-informed pathology appearance with the process of pathology spreading within a dynamic modelling system. We account for the heterogeneity of disease by fitting the model at the individual level, allowing the epicenters and the rate of progression to vary among subjects. We construct a Bayesian model selection framework to account for feature importance and parameter uncertainty, which yields the combination of mechanisms that best explains the observations for each individual in the ADNI dataset. With the obtained distribution of mechanism importance for each subject, we are able to identify subgroups of patients sharing similar combinations of apparent mechanisms.
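One widely used spreading mechanism of the kind such frameworks combine is network diffusion on the structural connectome, u(t) = exp(-beta t L) u(0) with L the graph Laplacian. A minimal standalone sketch of that single mechanism follows; the paper's framework couples a spreading term like this non-linearly with an appearance term, so this is illustrative only:

```python
import numpy as np

def diffuse_pathology(adjacency, u0, beta, t):
    """Network-diffusion spreading: u(t) = exp(-beta * t * L) @ u0,
    with L the combinatorial Laplacian of the (undirected) connectome.
    Uses the symmetric eigendecomposition of L to form the matrix
    exponential. Total pathology is conserved by this mechanism.
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    w, V = np.linalg.eigh(L)  # L is symmetric for an undirected graph
    return V @ (np.exp(-beta * t * w) * (V.T @ np.asarray(u0, dtype=float)))
```

For long times the pathology equilibrates over each connected component, while short times keep it concentrated near the epicenter; epicenter location and the rate beta are exactly the kinds of quantities the framework fits per subject.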
Quantile treatment effects (QTEs) can characterize the potentially heterogeneous causal effect of a treatment on different points of the entire outcome distribution. Propensity score (PS) methods are commonly employed to estimate QTEs in non-randomized studies. Empirical and theoretical studies have shown that insufficient or unnecessary adjustment for covariates in PS models can lead to bias and efficiency loss in estimating treatment effects. Striking a balance between bias and efficiency through variable selection is a crucial concern in causal inference. It is essential to acknowledge that the covariates related to the treatment and the outcome may vary across different quantiles of the outcome distribution. However, previous studies have overlooked adjusting for different covariates separately in the PS models when estimating different QTEs. In this article, we propose the quantile regression outcome-adaptive lasso (QROAL) method to select covariates that can provide unbiased and efficient estimates of QTEs. A distinctive feature of the proposed method is its use of linear quantile regression models to construct penalty weights, enabling covariate selection in PS models separately for each QTE. We conducted simulation studies to show the superiority of the proposed method over the outcome-adaptive lasso (OAL) method in variable selection. Moreover, the proposed method exhibited favorable performance compared to OAL in terms of root mean square error across a range of settings, including both homogeneous and heterogeneous scenarios. Additionally, we applied the QROAL method to datasets from the China Health and Retirement Longitudinal Study (CHARLS) to explore the impact of smoking status on the severity of depression symptoms.
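The adaptive-lasso idea behind the penalty weights can be sketched generically: coefficients from an outcome model fitted at quantile tau (here assumed to come from a linear quantile regression) are inverted, so covariates that appear unrelated to that quantile of the outcome receive large penalties and tend to drop out of the PS model for that QTE. The exponent gamma and the small eps guard are illustrative choices, not the paper's exact construction:

```python
import numpy as np

def adaptive_weights(beta_hat, gamma=1.0, eps=1e-8):
    """Adaptive-lasso penalty weights from outcome-model coefficients
    fitted at a given quantile. Small |beta_hat[j]| (covariate looks
    unrelated to this quantile of the outcome) yields a large penalty
    on covariate j in the PS model; eps avoids division by zero.
    """
    return 1.0 / (np.abs(np.asarray(beta_hat, dtype=float)) + eps) ** gamma

# A covariate with a strong quantile-regression coefficient (2.0) is
# penalized less than a weak one (0.5).
w = adaptive_weights(np.array([2.0, 0.5]))
```

Because the weights are rebuilt from a fresh quantile-regression fit at each tau, different covariate sets can survive the penalty for different QTEs, which is the separation the abstract describes.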
A variant of the standard notion of branching bisimilarity for processes with discrete relative timing is proposed that is coarser than the standard notion. Using a version of ACP (Algebra of Communicating Processes) with abstraction for processes with discrete relative timing, it is shown that the proposed variant allows both the functional correctness and the performance properties of the PAR (Positive Acknowledgement with Retransmission) protocol to be analyzed. In the version of ACP concerned, the difference between the standard notion of branching bisimilarity and its proposed variant is characterized by a single axiom schema.
Compared to widely used likelihood-based approaches, the minimum contrast (MC) method is a computationally efficient method for estimation and inference of parametric stationary point processes. This advantage becomes more pronounced when analyzing complex point process models, such as multivariate log-Gaussian Cox processes (LGCP). Despite its practical advantages, there is very little work on the MC method for multivariate point processes. The aim of this article is to introduce a new MC method for parametric multivariate stationary spatial point processes. A contrast function is calculated based on the trace of the power of the difference between the conjectured $K$-function matrix and its nonparametric unbiased edge-corrected estimator. Under standard assumptions, the asymptotic normality of the MC estimator of the model parameters is derived. The performance of the proposed method is illustrated with bivariate LGCP simulations and a real data analysis of a bivariate point pattern of the 2014 terrorist attacks in Nigeria.
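The contrast function described above can be sketched for a matrix-valued K-function evaluated on a grid of distances; the quadrature (a left Riemann sum) and the default power are illustrative choices rather than the paper's exact specification:

```python
import numpy as np

def contrast(K_model, K_hat, r_grid, power=2):
    """Minimum-contrast objective for a multivariate point process:
    integrate over distances r the trace of the matrix power of the
    difference between the conjectured K-function matrix K_model(r)
    and its nonparametric estimate K_hat(r).

    K_model, K_hat: arrays of shape (n_r, d, d); r_grid: shape (n_r,).
    """
    diff = np.asarray(K_model) - np.asarray(K_hat)
    M = diff
    for _ in range(power - 1):  # matrix power of the difference at each r
        M = np.einsum('rij,rjk->rik', M, diff)
    tr = np.trace(M, axis1=-2, axis2=-1)
    return float(np.sum(tr[:-1] * np.diff(r_grid)))  # left Riemann sum
```

Minimizing this objective over the model parameters (which enter through K_model) yields the MC estimator; no likelihood evaluation of the LGCP is needed, which is the computational advantage the abstract points to.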
This article presents the openCFS submodule scattered data reader for coupling multi-physical simulations performed in different simulation programs. One example is the forward coupling of a surface vibration simulation (mechanical system) to an acoustic propagation simulation that uses time-dependent acoustic absorbing material as a noise mitigation measure. The nearest-neighbor search of the target and source points for the interpolation is performed using the FLANN or the CGAL library. In doing so, the coupled field (e.g., surface velocity) is interpolated from a source representation, consisting of field values physically stored and organized in a file directory, to a target representation, namely the quadrature points in the case of the finite element method. A test case of the functionality is presented in the "testsuite" module of the openCFS software, called "Abc2dcsvt". The scattered data reader module has been successfully applied in numerous studies on flow-induced sound generation. Within this short article, the functionality and usability of this module are described.
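The role of the nearest-neighbor search in this interpolation can be illustrated with a brute-force Python analogue; openCFS itself performs the same query with FLANN or CGAL spatial trees in C++, so the function below is only a sketch with illustrative names and shapes:

```python
import numpy as np

def nearest_neighbor_interpolate(src_pts, src_vals, tgt_pts):
    """Map a coupled field (e.g., surface velocity) from scattered
    source points onto target points (e.g., FEM quadrature points) by
    nearest-neighbor lookup.

    Brute-force O(n_src * n_tgt) search; tree-based libraries such as
    FLANN or CGAL answer the same query much faster for large clouds.
    """
    src = np.asarray(src_pts)[None, :, :]  # (1, n_src, dim)
    tgt = np.asarray(tgt_pts)[:, None, :]  # (n_tgt, 1, dim)
    idx = np.argmin(np.sum((tgt - src) ** 2, axis=-1), axis=1)
    return np.asarray(src_vals)[idx]

# Each target point picks up the value of its nearest source point.
vals = nearest_neighbor_interpolate(
    [[0.0, 0.0], [1.0, 0.0]], [10.0, 20.0], [[0.1, 0.0], [0.9, 0.0]]
)  # [10.0, 20.0]
```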