Cough is a primary symptom of most respiratory diseases, and changes in cough characteristics provide valuable information for diagnosing respiratory diseases. The characterization of cough sounds still lacks concrete evidence, which makes it difficult to accurately distinguish between different types of coughs and between coughs and other sounds. The objective of this research work is to characterize cough sounds with voiced content and cough sounds without voiced content, and to compare the characteristics of cough sounds with those of speech. The proposed method utilizes spectral roll-off, spectral entropy, spectral flatness, spectral flux, zero crossing rate, spectral centroid, and spectral bandwidth, attributes that capture respiratory-system, glottal, and voicing-related information in cough sounds. These attributes are then subjected to statistical analysis using the minimum, maximum, mean, median, and standard deviation. The experimental results show that the mean and frequency distribution of spectral roll-off, spectral centroid, and spectral bandwidth are higher for cough sounds than for speech signals. Spectral flatness in cough sounds rises to about 0.22, whereas spectral flux varies between 0.3 and 0.6. The zero crossing rate (ZCR) of most cough-sound frames lies between 0.05 and 0.4. These attributes contribute significant information when characterizing cough sounds.
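As a concrete illustration, frame-level versions of these attributes and the five summary statistics can be computed with the librosa library. The sketch below is ours, not the authors' code; "cough.wav" is a placeholder file name, and spectral flux and entropy are computed by hand since librosa exposes no direct functions for them.

```python
# Minimal sketch (not the authors' code): frame-level spectral attributes
# of a cough recording via librosa; "cough.wav" is a placeholder.
import numpy as np
import librosa

y, sr = librosa.load("cough.wav", sr=None)
S = np.abs(librosa.stft(y))                      # magnitude spectrogram

feats = {
    "rolloff":   librosa.feature.spectral_rolloff(S=S, sr=sr)[0],
    "centroid":  librosa.feature.spectral_centroid(S=S, sr=sr)[0],
    "bandwidth": librosa.feature.spectral_bandwidth(S=S, sr=sr)[0],
    "flatness":  librosa.feature.spectral_flatness(S=S)[0],
    "zcr":       librosa.feature.zero_crossing_rate(y)[0],
}

# Spectral flux and entropy computed by hand from the per-frame
# normalised magnitude spectrum P.
P = S / (S.sum(axis=0, keepdims=True) + 1e-12)
feats["flux"] = np.sqrt((np.diff(P, axis=1) ** 2).sum(axis=0))
feats["entropy"] = -(P * np.log2(P + 1e-12)).sum(axis=0) / np.log2(P.shape[0])

# The five summary statistics used in the paper's statistical analysis.
for name, x in feats.items():
    print(f"{name:9s} min={x.min():.3f} max={x.max():.3f} mean={x.mean():.3f} "
          f"median={np.median(x):.3f} std={x.std():.3f}")
```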
In survival analysis, longitudinal information on the health status of a patient can be used to dynamically update the predicted probability that the patient will experience an event of interest. Traditional approaches to dynamic prediction, such as joint models, become computationally infeasible with more than a handful of longitudinal covariates, warranting the development of methods that can handle a larger number of them. We introduce the R package pencal, which implements Penalized Regression Calibration (PRC), an approach that makes it possible to handle many longitudinal covariates as predictors of survival. pencal uses mixed-effects models to summarize the trajectories of the longitudinal covariates up to a prespecified landmark time, and a penalized Cox model to predict survival based on both baseline covariates and summary measures of the longitudinal covariates. This article illustrates the structure of the R package, provides a step-by-step example showing how to estimate PRC, compute dynamic predictions of survival, and validate performance, and shows how parallelization can be used to significantly reduce computing time.
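To make the two-stage structure concrete, here is a Python paraphrase of the PRC idea (pencal itself is an R package; this is not its API). The data are synthetic, the column names are hypothetical, and per-subject least-squares slopes stand in for pencal's shrunken mixed-model random effects.

```python
# Sketch of the two-stage PRC idea in Python (pencal is R; this is a
# paraphrase, not its API). Synthetic data, hypothetical column names.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
landmark, n = 2.0, 40
ids = np.arange(n)

# Longitudinal data: 4 visits per subject before the landmark time.
long_df = pd.DataFrame({"id": np.repeat(ids, 4),
                        "time": np.tile([0.0, 0.5, 1.0, 1.5], n)})
long_df["marker"] = 1 + 0.3 * long_df["time"] + rng.normal(0, 0.2, len(long_df))

# Survival data: subjects still at risk at the landmark.
surv_df = pd.DataFrame({"id": ids,
                        "event_time": landmark + rng.exponential(3, n),
                        "event": rng.integers(0, 2, n),
                        "age": rng.normal(60, 8, n)})

# Stage 1: summarize each trajectory up to the landmark (least-squares
# slope/level here; pencal uses mixed-effects model predictions).
rows = []
for sid, g in long_df[long_df["time"] <= landmark].groupby("id"):
    slope, level = np.polyfit(g["time"], g["marker"], deg=1)
    rows.append({"id": sid, "marker_slope": slope, "marker_level": level})
summaries = pd.DataFrame(rows)

# Stage 2: penalized Cox model on baseline covariates + summaries.
data = surv_df.merge(summaries, on="id").drop(columns="id")
cph = CoxPHFitter(penalizer=0.1)  # ridge-type penalty
cph.fit(data, duration_col="event_time", event_col="event")
print(cph.summary[["coef", "exp(coef)"]])
```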
We consider the flow of a fluid whose response characteristics change with the value of the norm of the symmetric part of the velocity gradient, behaving as an Euler fluid below a critical value and as a Navier-Stokes fluid at and above it, the norm being determined by the external stimuli. We show that such a fluid, while flowing past a bluff body, develops boundary layers that are practically identical to those encountered within the context of the classical boundary layer theory propounded by Prandtl. Unlike the classical boundary layer theory, which arises as an approximation within the context of the Navier-Stokes theory, here the development of boundary layers is due to a change in the response characteristics of the constitutive relation. We study the flow of such a fluid past an airfoil and compare it with the solution of the Navier-Stokes equations. We find that the results are in excellent agreement with regard to the velocity and vorticity fields in the two cases.
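For concreteness, one plausible piecewise constitutive relation matching this description is the following (our notation, not necessarily the authors' exact form), with $\boldsymbol{D}$ the symmetric part of the velocity gradient and $\delta$ the critical value of its norm:

```latex
% One plausible form of the activated constitutive relation (our notation).
\boldsymbol{T} =
\begin{cases}
  -p\,\boldsymbol{I}, & \lVert \boldsymbol{D} \rVert < \delta
    \quad \text{(Euler fluid)},\\[2pt]
  -p\,\boldsymbol{I} + 2\mu\,\boldsymbol{D}, & \lVert \boldsymbol{D} \rVert \ge \delta
    \quad \text{(Navier--Stokes fluid)},
\end{cases}
\qquad
\boldsymbol{D} = \tfrac{1}{2}\left(\nabla\boldsymbol{v} + \nabla\boldsymbol{v}^{\mathsf{T}}\right).
```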
Model averaging (MA), a technique for combining estimators from a set of candidate models, has attracted increasing attention in machine learning and statistics. In the existing literature, there is an implicit understanding that MA can be viewed as a form of shrinkage estimation that draws the response vector towards the subspaces spanned by the candidate models. This paper explores this perspective by establishing connections between MA and shrinkage in a linear regression setting with multiple nested models. We first demonstrate that the optimal MA estimator is the best linear estimator with monotone non-increasing weights in a Gaussian sequence model. The Mallows MA, which estimates weights by minimizing Mallows' $C_p$, is a variation of the positive-part Stein estimator. Motivated by these connections, we develop a novel MA procedure based on blockwise Stein estimation. Our resulting Stein-type MA estimator is asymptotically optimal across a broad parameter space when the variance is known. Numerical results support our theoretical findings. The connections established in this paper may open up new avenues for investigating MA from different perspectives. A discussion of some topics for future research concludes the paper.
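To fix ideas, the estimators referred to here take the following standard forms in the Gaussian sequence model $y_i = \theta_i + \sigma\varepsilon_i$ (details such as the shrinkage constants may differ from the paper's exact construction):

```latex
% Positive-part James--Stein estimator on all m coordinates, and its
% blockwise version applied within blocks y_{(b)} of sizes m_b, b=1,...,B.
\hat{\theta}^{\mathrm{JS+}} =
  \left(1 - \frac{(m-2)\,\sigma^2}{\lVert y \rVert^2}\right)_{\!+} y,
\qquad
\hat{\theta}^{\mathrm{block}}_{(b)} =
  \left(1 - \frac{(m_b-2)\,\sigma^2}{\lVert y_{(b)} \rVert^2}\right)_{\!+} y_{(b)}.
```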
We consider the problem of estimating the marginal independence structure of a Bayesian network from observational data in the form of an undirected graph called the unconditional dependence graph. We show that unconditional dependence graphs of Bayesian networks correspond to the graphs having equal independence and intersection numbers. Using this observation, a Gr\"obner basis for a toric ideal associated to unconditional dependence graphs of Bayesian networks is given and then extended by additional binomial relations to connect the space of all such graphs. An MCMC method, called GrUES (Gr\"obner-based Unconditional Equivalence Search), is implemented based on the resulting moves and applied to synthetic Gaussian data. GrUES recovers the true marginal independence structure via a penalized maximum likelihood or MAP estimate at a higher rate than simple independence tests while also yielding an estimate of the posterior, for which the $20\%$ HPD credible sets include the true structure at a high rate for data-generating graphs with density at least $0.5$.
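As a concrete, if brute-force, illustration of the graphical criterion, the sketch below checks for tiny graphs whether the independence number equals the intersection number (the minimum edge clique cover size) using networkx. It is ours, illustrative only, and exponential in the number of maximal cliques.

```python
# Brute-force illustration (tiny graphs only): a graph arises as the
# unconditional dependence graph of a Bayesian network iff its
# independence and intersection numbers are equal.
from itertools import combinations
import networkx as nx

def independence_number(G):
    # Maximum independent set = maximum clique in the complement graph.
    return max(len(c) for c in nx.find_cliques(nx.complement(G)))

def intersection_number(G):
    # Minimum number of cliques covering all edges; it suffices to search
    # over maximal cliques, since any clique extends to a maximal one.
    cliques = [set(c) for c in nx.find_cliques(G)]
    edges = {frozenset(e) for e in G.edges}
    if not edges:
        return 0
    for k in range(1, len(cliques) + 1):
        for subset in combinations(cliques, k):
            covered = {frozenset(p) for c in subset
                       for p in combinations(sorted(c), 2)}
            if edges <= covered:
                return k

for name, G in [("K4", nx.complete_graph(4)), ("C4", nx.cycle_graph(4))]:
    a, i = independence_number(G), intersection_number(G)
    print(f"{name}: independence={a}, intersection={i}, realizable={a == i}")
```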
Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: while taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential to transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.
This paper investigates the problem of estimating the larger location parameter of two general location families from a decision-theoretic perspective. In this estimation problem, we use the criteria of minimizing the risk function and the Pitman closeness under a general bowl-shaped loss function. Conditions for the inadmissibility of general location-equivariant estimators are provided. We prove that a natural estimator (the analogue of the BLEE of unordered location parameters) is inadmissible under certain conditions on the underlying densities, and propose a dominating estimator. We also derive a class of improved estimators using Kubokawa's IERD approach and observe that the boundary estimator of this class is a Brewster-Zidek-type estimator. Additionally, under the generalized Pitman criterion, we show that the natural estimator is inadmissible and obtain improved estimators. The results are implemented for different loss functions, and explicit expressions for the dominating estimators are provided. We explore applications of these results to the exponential and normal distributions under specified loss functions. A simulation study is also conducted to compare the risk performance of the proposed estimators. Finally, we present a real-life data analysis to illustrate the practical applications of the paper's findings.
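In symbols, a standard version of the setup reads as follows (the paper's exact assumptions may differ):

```latex
% A standard formulation: independent observations from two location
% families with ordered parameters (the paper's assumptions may differ).
X_i \sim f_i(x_i - \theta_i), \quad i = 1, 2, \qquad \theta_1 \le \theta_2,
\qquad \text{estimate } \theta_2 \text{ under loss } L(d - \theta_2),
```

where $L$ is bowl-shaped; a natural estimator in this setting is, for instance, the maximum of the two componentwise BLEEs.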
Cancer is a significant health issue globally, and it is well known that cancer risk varies geographically. However, in many countries there are no small area-level data on cancer risk factors with high resolution and complete reach, which hinders the development of targeted prevention strategies. Using Australia as a case study, the 2017-2018 National Health Survey was used to generate prevalence estimates for 2221 small areas across Australia for eight cancer risk factor measures covering smoking, alcohol, physical activity, diet, and weight. Utilising a recently developed Bayesian two-stage small area estimation methodology, the model incorporated survey-only covariates, spatial smoothing, and hierarchical modelling techniques, along with a vast array of small area-level auxiliary data, including census, remoteness, and socioeconomic data. The models borrowed strength from previously published cancer risk estimates provided by the Social Health Atlases of Australia. Estimates were internally and externally validated. By improving the reach and resolution of previously published cancer risk factor estimates, we show that in 2017-18 health behaviours across Australia exhibited greater spatial disparities than previously recognised. The derived estimates reveal a higher prevalence of unhealthy behaviours in more remote areas and areas of lower socioeconomic status, a trend that aligns well with previous work. Our study addresses the gaps in small area-level cancer risk factor estimates in Australia. The new estimates provide improved spatial resolution and reach and will enable more targeted cancer prevention strategies at the small area level, supporting policy makers, researchers, and the general public in understanding the spatial distribution of cancer risk factors in Australia. To help disseminate the results of this work, they will be made available in the Australian Cancer Atlas 2.0.
Among the participants in a randomized experiment with anticipated heterogeneous treatment effects, is it possible to identify which subjects have a positive treatment effect? While subgroup analysis has received attention, claims about individual participants are much more challenging. We frame the problem in terms of multiple hypothesis testing: each individual has a null hypothesis (stating, for example, that the potential outcomes are equal) and we aim to identify those for whom the null is false (for example, the treatment potential outcome stochastically dominates the control one). We develop a novel algorithm that identifies such a subset, with nonasymptotic control of the false discovery rate (FDR). Our algorithm allows for interaction: a human data scientist (or a computer program) may adaptively guide the algorithm in a data-dependent manner to gain power. We show how to extend the methods to observational settings and achieve a type of doubly robust FDR control. We also propose several extensions: (a) relaxing the null to nonpositive effects, (b) moving from unpaired to paired samples, and (c) subgroup identification. We demonstrate via numerical experiments and theoretical analysis that the proposed method has valid FDR control in finite samples and reasonably high identification power.
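The authors' interactive procedure is not reproduced here, but the multiple-testing frame can be illustrated with a non-interactive baseline: one p-value per participant, then Benjamini-Hochberg selection. Obtaining valid per-individual p-values is the hard part the paper addresses; in this sketch they are synthetic.

```python
# Illustration of the FDR machinery only (not the paper's interactive
# algorithm): synthetic per-participant p-values, BH selection at level 0.1.
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
n, n_alt = 200, 40
# Nulls: uniform p-values; alternatives (positive effects): small p-values.
pvals = np.concatenate([rng.uniform(size=n - n_alt),
                        rng.beta(0.2, 5.0, size=n_alt)])

reject, p_adj, _, _ = multipletests(pvals, alpha=0.1, method="fdr_bh")
discovered = np.flatnonzero(reject)
false = (discovered < n - n_alt).sum()   # known here because data are synthetic
print(f"discoveries: {len(discovered)}, false among them: {false}")
```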
Flooding is one of the most disruptive and costly climate-related disasters and presents an escalating threat to population health due to climate change and urbanization patterns. Previous studies have investigated the consequences of flood exposure on only a handful of health outcomes and have focused on a single flood event or affected region. To address this gap, we conducted a nationwide, multi-decade analysis of the impacts of severe floods on a wide range of health outcomes in the United States by linking a novel satellite-based high-resolution flood exposure database with Medicare cause-specific hospitalization records over the period 2000-2016. Using a self-matched study design with a distributed lag model, we examined how cause-specific hospitalization rates deviate from expected rates during and up to four weeks after severe flood exposure. Our results revealed that the risk of hospitalization was consistently elevated during and for at least four weeks following severe flood exposure for nervous system diseases (3.5%; 95% confidence interval [CI]: 0.6%, 6.4%), skin and subcutaneous tissue diseases (3.4%; 95% CI: 0.3%, 6.7%), and injury and poisoning (1.5%; 95% CI: -0.07%, 3.2%). Increases in the hospitalization rate for these causes, musculoskeletal system diseases, and mental health-related impacts varied with the proportion of Black residents in each ZIP Code. Our findings demonstrate the need for targeted preparedness strategies for hospital personnel before, during, and after severe flooding.
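The following sketch shows the general shape of a distributed lag Poisson regression on synthetic weekly counts. It is illustrative only and does not reproduce the paper's self-matched design; all variable names and effect sizes are made up.

```python
# Illustrative distributed lag Poisson regression (not the paper's model):
# weekly hospitalization counts vs. a flood indicator at lags 0-4 weeks.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
weeks = 300
flood = (rng.uniform(size=weeks) < 0.03).astype(float)   # rare flood weeks

# Lag matrix: flood exposure 0..4 weeks before each outcome week.
X = pd.DataFrame({f"flood_lag{l}": pd.Series(flood).shift(l, fill_value=0.0)
                  for l in range(5)})
true_rr = np.array([1.05, 1.04, 1.03, 1.02, 1.01])       # simulated lag effects
y = rng.poisson(50 * np.exp(X.values @ np.log(true_rr)))

fit = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit()
ci = fit.conf_int()
# Percent change in the hospitalization rate at each lag, with 95% CI.
for l in range(5):
    name = f"flood_lag{l}"
    print(f"lag {l}: {100 * (np.exp(fit.params[name]) - 1):+.1f}% "
          f"(95% CI {100 * (np.exp(ci.loc[name, 0]) - 1):+.1f}%, "
          f"{100 * (np.exp(ci.loc[name, 1]) - 1):+.1f}%)")
```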
Health-related quality of life (Hr-QoL) scales provide crucial information on neurodegenerative disease progression, help improve patient care, and constitute a meaningful endpoint for therapeutic research. However, Hr-QoL progression is usually poorly documented, as is the case for multiple system atrophy (MSA), a rare and rapidly progressing alpha-synucleinopathy. This work aimed to describe Hr-QoL progression during the natural course of MSA, explore disparities between patients, and identify informative items using a four-step statistical strategy. We leveraged the data of the French MSA cohort, comprising annual assessments with the MSA-QoL questionnaire for more than 500 patients over up to 11 years. The four-step strategy (1) determined the subdimensions of Hr-QoL in MSA; (2) modelled the subdimension trajectories over time, accounting for the risk of death; (3) mapped the sequence of item impairments to disease stages; and (4) identified the most informative items specific to each disease stage. Among the 536 patients included, 50% were women, and the average age at entry was 65.1 years; 63.1% died during follow-up. Four dimensions were identified. In addition to the original motor, nonmotor, and emotional domains, an oropharyngeal component was highlighted. While the motor and oropharyngeal domains deteriorated rapidly, the nonmotor and emotional aspects were already slightly to moderately impaired at cohort entry and deteriorated slowly over the course of the disease. Impairments were associated with sex, diagnosis subtype, and delay since symptom onset. Except for the emotional domain, each dimension was driven by key identified items. Hr-QoL is a multidimensional concept that deteriorates progressively over the course of MSA, and its description brings essential knowledge for improving patient care. As exemplified with MSA, this thorough description of Hr-QoL using the original four-step analysis can provide new perspectives on the management of neurodegenerative diseases, ultimately delivering better support focused on the patient's perspective.