Adverse posttraumatic neuropsychiatric sequelae (APNS) are common among veterans and millions of Americans after traumatic exposures, resulting in substantial burdens for trauma survivors and society. Despite the numerous studies conducted on APNS over the past decades, progress in understanding the underlying neurobiological mechanisms has been limited by several unique challenges. One of these challenges is the reliance on subjective self-report measures to assess APNS, which can easily result in measurement errors and biases (e.g., recall bias). To mitigate this issue, in this paper we investigate the potential of leveraging objective longitudinal mobile device data to identify homogeneous APNS states and to study the dynamic transitions and potential risk factors of APNS after trauma exposure. To handle the specific challenges posed by longitudinal mobile device data, we developed exploratory hidden Markov factor models and designed a stabilized expectation-maximization algorithm for parameter estimation. Simulation studies were conducted to evaluate the performance of parameter estimation and model selection. Finally, to demonstrate the practical utility of the method, we applied it to mobile device data collected from the Advancing Understanding of RecOvery afteR traumA (AURORA) study.
Understanding how and why certain communities bear a disproportionate burden of disease is challenging due to the scarcity of data on these communities. Surveys provide a useful avenue for accessing hard-to-reach populations, as many surveys specifically oversample understudied and vulnerable populations. When survey data are used for analysis, it is important to account for the complex survey design that gave rise to the data in order to avoid biased conclusions. The field of Bayesian survey statistics aims to account for such survey design while leveraging the advantages of Bayesian models, which can flexibly handle sparsity through borrowing of information and provide a coherent inferential framework from which variances for complex models and data types are easily obtained. For these reasons, Bayesian survey methods seem uniquely well-positioned for health disparities research, where heterogeneity and sparsity are frequent considerations. This review discusses three main approaches found in the Bayesian survey methodology literature: 1) multilevel regression and post-stratification, 2) weighted pseudolikelihood-based methods, and 3) synthetic population generation. We discuss advantages and disadvantages of each approach, examine recent applications and extensions, and consider how these approaches may be leveraged to improve research in population health equity.
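The weighted pseudolikelihood idea in approach 2) can be illustrated with a minimal sketch: design weights enter the log-likelihood as multipliers, which for a normal mean reduces to the familiar weighted estimator. The strata, sample sizes, and weights below are entirely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: two strata with different outcome means.
# Stratum 1 (80% of the population, mean 1.0) is undersampled;
# stratum 2 (20%, mean 3.0) is oversampled, as in many health surveys.
n1, n2 = 100, 100                      # equal sample sizes despite unequal strata
y = np.concatenate([rng.normal(1.0, 1.0, n1), rng.normal(3.0, 1.0, n2)])
w = np.concatenate([np.full(n1, 8.0), np.full(n2, 2.0)])  # weights ~ 1/P(selection)

# Unweighted MLE of the mean ignores the design and is pulled
# toward the oversampled stratum.
unweighted = y.mean()

# Weighted pseudolikelihood: maximize sum_i w_i * log N(y_i | mu, sigma^2),
# which for mu reduces to the Hajek-type weighted estimator.
weighted = np.sum(w * y) / np.sum(w)

print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

The true population mean here is 0.8 * 1.0 + 0.2 * 3.0 = 1.4; the weighted estimate recovers it, while the unweighted one sits near 2.0.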
According to ICH Q8 guidelines, the biopharmaceutical manufacturer submits a design space (DS) definition as part of the regulatory approval application; process parameter (PP) deviations within this space are then not considered a change and do not trigger a regulatory post-approval procedure. A DS can be described by non-linear PP ranges, i.e., the range of one PP conditioned on specific values of another. However, independent PP ranges (linear combinations) are often preferred in biopharmaceutical manufacturing due to their operational simplicity. While some statistical software supports the calculation of a DS comprised of linear combinations, such methods are generally based on discretizing the parameter space - an approach that scales poorly as the number of PPs increases. Here, we introduce a novel method for finding linear PP combinations using a numeric optimizer to calculate the largest design space within the parameter space that results in critical quality attribute (CQA) boundaries within acceptance criteria, as predicted by a regression model. A precomputed approximation of tolerance intervals is used in inequality constraints to facilitate fast evaluations of this boundary using a single matrix multiplication. Correctness of the method was validated against different ground truths with known design spaces. Compared to state-of-the-art grid-based approaches, the optimizer-based procedure is more accurate, generally yields a larger DS, and enables the calculation in higher dimensions. Furthermore, a proposed weighting scheme can be used to favor certain PPs over others, thereby enabling a more dynamic approach to DS definition and exploration. The increased PP ranges of the larger DS provide greater operational flexibility for biopharmaceutical manufacturers.
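A minimal sketch of the optimizer-based idea, assuming a hypothetical linear CQA model on two PPs and a fixed margin standing in for the precomputed tolerance-interval width (all coefficients, limits, and the box parameterization below are illustrative, not from the paper):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical fitted regression model for a single CQA on two PPs,
# both scaled to [0, 1]:
beta = np.array([0.0, 1.0, 1.0])   # intercept, PP1 slope, PP2 slope
y_max = 1.5                        # upper acceptance limit for the CQA
margin = 0.0                       # stand-in for the tolerance-interval width

def corners(x):
    """Design matrix of the four corners of the box [l1,u1] x [l2,u2]."""
    l1, u1, l2, u2 = x
    return np.array([[1.0, p1, p2] for p1 in (l1, u1) for p2 in (l2, u2)])

def neg_area(x):
    l1, u1, l2, u2 = x
    return -(u1 - l1) * (u2 - l2)

cons = [
    # The CQA prediction (a single matrix multiplication) must respect
    # the acceptance limit, minus the margin, at every corner of the box.
    {"type": "ineq", "fun": lambda x: y_max - margin - corners(x) @ beta},
    {"type": "ineq", "fun": lambda x: x[1] - x[0]},  # u1 >= l1
    {"type": "ineq", "fun": lambda x: x[3] - x[2]},  # u2 >= l2
]
res = minimize(neg_area, x0=[0.1, 0.4, 0.1, 0.4], method="SLSQP",
               bounds=[(0, 1)] * 4, constraints=cons)
l1, u1, l2, u2 = res.x
print(f"DS: PP1 in [{l1:.2f}, {u1:.2f}], PP2 in [{l2:.2f}, {u2:.2f}]")
```

For this toy model the largest admissible box is PP1, PP2 in [0, 0.75] (area 0.5625), which the optimizer reaches without any grid over the parameter space.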
BACKGROUND: Software Process Improvement (SPI) is a systematic approach to increase the efficiency and effectiveness of a software development organization and to enhance software products. OBJECTIVE: This paper aims to identify and characterize evaluation strategies and measurements used to assess the impact of different SPI initiatives. METHOD: The systematic literature review includes 148 papers published between 1991 and 2008. The selected papers were classified according to SPI initiative, applied evaluation strategies, and measurement perspectives. Potential confounding factors interfering with the evaluation of the improvement effort were assessed. RESULTS: Seven distinct evaluation strategies were identified, of which the most common, "Pre-Post Comparison", was applied in 49 percent of the inspected papers. Quality was the most measured attribute (62 percent), followed by Cost (41 percent) and Schedule (18 percent). Among measurement perspectives, "Project" represents the majority with 66 percent. CONCLUSION: The evaluation validity of SPI initiatives is challenged by the scarce consideration of potential confounding factors, particularly given that "Pre-Post Comparison" was identified as the most common evaluation strategy, and by inaccurate descriptions of the evaluation context. Measurements assessing the short- and mid-term impact of SPI initiatives prevail, whereas long-term measurements in terms of customer satisfaction and return on investment are used less often.
The aim of this paper is to develop estimation and inference methods for the drift parameters of multivariate L\'evy-driven continuous-time autoregressive processes of order $p\in\mathbb{N}$. Starting from a continuous-time observation of the process, we develop consistent and asymptotically normal maximum likelihood estimators. We then relax the unrealistic assumption of continuous-time observation by considering natural discretizations based on a combination of Riemann-sum, finite-difference, and thresholding approximations. The resulting estimators are also proven to be consistent and asymptotically normal under a general set of conditions, allowing for both finite and infinite jump activity in the driving L\'evy process. In the discretized setting, allowing for irregularly spaced observations is of great practical importance. In this respect, CAR($p$) models are not just relevant for "true" continuous-time processes: a CAR($p$) specification provides a natural continuous-time interpolation for modeling irregularly spaced data - even if the observed process is inherently discrete. As a practically relevant application, we consider the setting where the multivariate observation is known to possess a graphical structure. We refer to such a process as GrCAR and discuss the corresponding drift estimators and their properties. The finite-sample behavior of all theoretical asymptotic results is empirically assessed by extensive simulation experiments.
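For the simplest case $p=1$ with Brownian driving noise (an Ornstein-Uhlenbeck process $dX_t = -aX_t\,dt + dW_t$), the Riemann-sum/finite-difference discretization of the continuous-time drift MLE can be sketched as follows; the parameter values and grid are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical CAR(1) example: the Ornstein-Uhlenbeck process
# dX_t = -a X_t dt + dW_t, simulated on a fine grid (Euler scheme).
a_true, dt, n = 2.0, 0.01, 200_000
x = np.empty(n + 1)
x[0] = 0.0
noise = rng.normal(0.0, np.sqrt(dt), n)
for i in range(n):
    x[i + 1] = x[i] - a_true * x[i] * dt + noise[i]

# Discretized MLE for the drift: the continuous-time estimator
# a_hat = -(int X_t dX_t) / (int X_t^2 dt), with the stochastic
# integral replaced by a finite-difference sum and the time
# integral by a Riemann sum.
dx = np.diff(x)
a_hat = -np.sum(x[:-1] * dx) / np.sum(x[:-1] ** 2 * dt)
print(f"a_true = {a_true}, a_hat = {a_hat:.3f}")
```

With a L\'evy driver exhibiting jumps, the thresholding step mentioned in the abstract would additionally discard increments `dx` that exceed a cutoff before forming the sums.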
Antimicrobial resistance is becoming a major threat to public health throughout the world. Researchers are attempting to counter it by developing both new antibiotics and patient-specific treatments. In the second case, whole-genome sequencing has had a huge impact in two ways: first, it is becoming cheaper and faster to perform, which makes it competitive with standard phenotypic tests; second, phenotypic patterns of resistance can be statistically associated with specific mutations in the genome. It is therefore now possible to develop catalogues of genomic variants associated with resistance to specific antibiotics, in order to improve the prediction of resistance and suggest treatments. Robust methods for identifying mutations associated with resistance, and for continuously updating the available catalogues, are essential. This work proposes a general method for studying minimal inhibitory concentration (MIC) distributions and for identifying clusters of strains showing different levels of resistance to antimicrobials. Once the clusters are identified and strains are allocated to them, regression methods can be applied to identify, with high statistical power, the mutations associated with resistance. The method is applied to a new 96-well microtiter plate used for testing M. tuberculosis.
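A minimal sketch of the clustering step, assuming a two-component Gaussian mixture on the log2(MIC) scale fitted by EM (the paper's actual model may differ; all readings below are simulated):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical log2(MIC) readings: a susceptible cluster around -2
# and a resistant cluster around 3 (values are illustrative only).
x = np.concatenate([rng.normal(-2.0, 0.8, 300), rng.normal(3.0, 0.8, 100)])

# EM for a two-component Gaussian mixture on the log2(MIC) scale.
pi, mu, sigma = 0.5, np.array([x.min(), x.max()]), np.array([1.0, 1.0])
for _ in range(100):
    # E-step: posterior probability that each strain is resistant.
    d0 = (1 - pi) * np.exp(-0.5 * ((x - mu[0]) / sigma[0]) ** 2) / sigma[0]
    d1 = pi * np.exp(-0.5 * ((x - mu[1]) / sigma[1]) ** 2) / sigma[1]
    r = d1 / (d0 + d1)
    # M-step: update mixing weight, means, and standard deviations.
    pi = r.mean()
    mu = np.array([np.sum((1 - r) * x) / np.sum(1 - r),
                   np.sum(r * x) / np.sum(r)])
    sigma = np.array([np.sqrt(np.sum((1 - r) * (x - mu[0]) ** 2) / np.sum(1 - r)),
                      np.sqrt(np.sum(r * (x - mu[1]) ** 2) / np.sum(r))])

resistant = r > 0.5   # strain-to-cluster allocation
print(f"cluster means: {mu.round(2)}, resistant fraction: {resistant.mean():.2f}")
```

The resulting allocation (`resistant`) is what would feed the downstream regression of resistance status on genomic variants.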
In the literature, the reliability analysis of one-shot devices has been studied under accelerated life testing in the presence of various stress factors. The application of one-shot devices can be extended to the biomedical field, where the survival time of patients afflicted with a disease is often subject to stress factors such as environmental stress, co-morbidity, and disease severity. This work is concerned with one-shot device data analysis and applies it to SEER gallbladder cancer data. The two-parameter logistic exponential distribution is used as the lifetime distribution. For robust parameter estimation, weighted minimum density power divergence estimators (WMDPDE) are obtained along with the conventional maximum likelihood estimators (MLE). The asymptotic behaviour of the WMDPDE and of a robust test statistic based on the density power divergence measure are also studied. The performance of the estimators is evaluated through extensive simulation experiments, and these developments are then applied to the SEER gallbladder cancer data. Given the importance of knowing exactly when to inspect the one-shot devices put to the test, a search for optimal inspection times is performed. This optimization is designed to minimize a cost function that balances the precision of the estimation against experimental cost. The search is accomplished through a population-based heuristic optimization method, the genetic algorithm.
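The genetic-algorithm search can be illustrated with a toy cost function that trades a precision term against a waiting cost (the cost function, time window, and GA settings below are hypothetical stand-ins, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical cost for K inspection times in [0, 10]: a precision term
# that decays as inspections move later, plus a linear cost of waiting.
# This toy cost has its minimum at t = ln(100) ~ 4.61 for every inspection.
K = 3
def cost(tau):
    return np.sum(np.exp(-tau)) + 0.01 * np.sum(tau)

# A minimal genetic algorithm: tournament selection, uniform crossover,
# Gaussian mutation, and elitism.
pop = rng.uniform(0, 10, (50, K))
for gen in range(200):
    fit = np.array([cost(p) for p in pop])
    # tournament selection of parents
    i, j = rng.integers(0, 50, (2, 50))
    parents = pop[np.where(fit[i] < fit[j], i, j)]
    # uniform crossover between consecutive parents
    mask = rng.random((50, K)) < 0.5
    children = np.where(mask, parents, np.roll(parents, 1, axis=0))
    # Gaussian mutation, clipped to the allowed time window
    children += rng.normal(0, 0.3, (50, K))
    children = np.clip(children, 0, 10)
    # elitism: carry the best individual over unchanged
    children[0] = pop[np.argmin(fit)]
    pop = children

best = pop[np.argmin([cost(p) for p in pop])]
print(f"best inspection times: {np.sort(best).round(2)}, cost: {cost(best):.3f}")
```

In the paper's setting the cost function would instead combine the estimation-precision criterion with the experimental cost of each inspection.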
The crossed random-effects model is widely used in applied statistics, finding applications in various fields such as longitudinal studies, e-commerce, and recommender systems, among others. However, these models encounter scalability challenges, as the computational time grows disproportionately with the number of data points N, typically scaling as N^(3/2) or worse. Our inspiration for addressing this issue comes from observing the recommender system employed by an online clothing retailer. Our dataset comprises over 700,000 clients, 5,000 items, and 5,000,000 measurements. When applying the maximum likelihood approach to fit crossed random effects, computational inefficiency becomes a significant concern, limiting the applicability of this approach in large-scale settings. To tackle the scalability issues, previous research by Ghosh et al. (2022a) and Ghosh et al. (2022b) has explored linear and logistic regression models utilizing fixed-effect features based on client and item variables, while incorporating random intercept terms for clients and items. In this study, we present a more generalized version of the problem, allowing random slopes in addition to random intercepts. This extension enables us to capture the variability in effect size among both clients and items. Importantly, we have developed a scalable solution to this problem and have empirically demonstrated the consistency of our estimates: as the number of data points increases, our estimates converge towards the true parameters. To validate our approach, we implement the proposed algorithm using Stitch Fix data.
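A sketch of how crossed random-intercept variance components can be estimated in O(N) by the method of moments on a balanced client-by-item grid; this is a simplified stand-in for the scalable approach, and all sizes and variances below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical balanced client x item grid:
R, C = 300, 100                    # clients, items
sa2, sb2, se2 = 1.0, 0.5, 2.0      # true variance components
a = rng.normal(0, np.sqrt(sa2), R)             # client random intercepts
b = rng.normal(0, np.sqrt(sb2), C)             # item random intercepts
y = a[:, None] + b[None, :] + rng.normal(0, np.sqrt(se2), (R, C))

# O(N) method-of-moments estimates, avoiding the O(N^(3/2)) cost of
# exact maximum likelihood. The three empirical moments satisfy:
V = y.var()                        # ~ sa2 + sb2 + se2
Va = y.mean(axis=1).var()          # ~ sa2 + se2 / C  (variance of client means)
Vb = y.mean(axis=0).var()          # ~ sb2 + se2 / R  (variance of item means)
se2_hat = (V - Va - Vb) / (1 - 1 / C - 1 / R)
sa2_hat = Va - se2_hat / C
sb2_hat = Vb - se2_hat / R
print(f"client var: {sa2_hat:.2f}, item var: {sb2_hat:.2f}, noise var: {se2_hat:.2f}")
```

Each moment is a single pass over the data, so the whole procedure remains linear in N even for the 5,000,000-measurement scale described above; handling unbalanced grids and random slopes requires the more general machinery of the paper.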
Mobile Artificial Intelligence-Generated Content (AIGC) technology refers to the adoption of AI algorithms deployed at mobile edge networks to automate the information creation process while fulfilling the requirements of end users. Mobile AIGC has recently attracted phenomenal attention and can be a key enabling technology for an emerging application called the human digital twin (HDT). HDT empowered by mobile AIGC is expected to revolutionize personalized healthcare by generating rare disease data, modeling high-fidelity digital twins, building versatile testbeds, and providing 24/7 customized medical services. To promote the development of this new paradigm, in this article, we propose a system architecture of mobile AIGC-driven HDT and highlight the corresponding design requirements and challenges. Moreover, we illustrate two use cases, i.e., mobile AIGC-driven HDT in customized surgery planning and in personalized medication. In addition, we conduct an experimental study to demonstrate the effectiveness of the proposed mobile AIGC-driven HDT solution, which shows a particular application in a virtual physical therapy teaching platform. Finally, we conclude this article by briefly discussing several open issues and future directions.
Multivariate functional data that are cross-sectionally compositional are attracting increasing interest in the statistical modeling literature, a major example being trajectories over time of compositions derived from cause-specific mortality rates. In this work, we develop a novel functional concurrent regression model in which the independent variables are functional compositions. This allows us to investigate the relationship over time between life expectancy at birth and compositions derived from cause-specific mortality rates of four distinct age classes, namely 0--4, 5--39, 40--64 and 65+. A penalized approach is developed to estimate the regression coefficients and select the relevant variables, and an efficient computational strategy based on an augmented Lagrangian algorithm is derived to solve the resulting optimization problem. The good performance of the model in predicting the response function and estimating the unknown functional coefficients is shown in a simulation study. The results on real data confirm the important role of neoplasms and cardiovascular diseases in determining life expectancy, as found in other studies, and reveal several other contributions not yet observed.
Recently, there has been great interest in estimating the conditional average treatment effect using flexible machine learning methods. However, in practice, investigators often have working hypotheses about effect heterogeneity across pre-defined subgroups of study units, which we call the groupwise approach. This paper compares two modern ways to estimate groupwise treatment effects, a nonparametric approach and a semiparametric approach, with the goal of better informing practice. Specifically, we compare (a) the underlying assumptions, (b) efficiency and adaptation to the underlying data-generating models, and (c) a way to combine the two approaches. We also discuss how to test a key assumption concerning the semiparametric estimator and how to obtain cluster-robust standard errors if study units in the same subgroups are correlated. We demonstrate our findings by conducting simulation studies and reanalyzing the Early Childhood Longitudinal Study.
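The two estimation strategies can be contrasted in a minimal simulated example: subgroup-specific differences in means (nonparametric, no outcome-model assumptions) versus a working linear model with a group-by-treatment interaction (a simple semiparametric specification). All numbers below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical randomized study with two pre-defined subgroups
# whose true treatment effects are 1.0 and 3.0 (illustrative values).
n = 4000
g = rng.integers(0, 2, n)              # subgroup indicator
t = rng.integers(0, 2, n)              # randomized binary treatment
tau = np.where(g == 0, 1.0, 3.0)       # groupwise treatment effects
y = 0.5 * g + tau * t + rng.normal(0, 1, n)

# Nonparametric approach: subgroup-specific difference in means.
np_est = [y[(g == k) & (t == 1)].mean() - y[(g == k) & (t == 0)].mean()
          for k in (0, 1)]

# Semiparametric approach: a working linear model with a
# group-by-treatment interaction, fit by least squares.
X = np.column_stack([np.ones(n), g, t, g * t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
sp_est = [beta[2], beta[2] + beta[3]]  # effects in group 0 and group 1

print(f"nonparametric: {np.round(np_est, 2)}, semiparametric: {np.round(sp_est, 2)}")
```

Under randomization the two estimators agree here; the trade-offs discussed in the paper arise when the outcome model is misspecified or when covariate adjustment is brought in.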