
This work is motivated by learning the individualized minimal clinically important difference, a vital concept for assessing clinical importance in various biomedical studies. We formulate the scientific question as a high-dimensional statistical problem in which the parameter of interest lies in an individualized linear threshold. The goal is to develop a hypothesis testing procedure for the significance of a single element of this parameter as well as of a linear combination of its elements. The difficulty stems both from the high-dimensional nuisance involved in developing such a testing procedure and from the fact that this high-dimensional threshold model is nonregular, so the limiting distribution of the corresponding estimator is nonstandard. To deal with these challenges, we construct a test statistic via a new bias-corrected smoothed decorrelated score approach and establish its asymptotic distributions under both the null and local alternative hypotheses. We propose a double-smoothing approach to select the optimal bandwidth in our test statistic and provide theoretical guarantees for the selected bandwidth. We conduct simulation studies to demonstrate how the proposed procedure can be applied in empirical studies. We apply the proposed method to a clinical trial where the scientific goal is to assess the clinical importance of a surgical procedure.
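
For intuition about the "smoothed" part of the approach: in threshold models of this type, the non-differentiable indicator is typically replaced by a smooth kernel surrogate before the score is decorrelated. The display below is a generic sketch under that reading, not the authors' exact formulation; $X$, $\boldsymbol{\beta}$, $K$ and $h$ are illustrative notation.

$$
\mathbf{1}\{X^{\top}\boldsymbol{\beta} > 0\} \;\approx\; K\!\left(\frac{X^{\top}\boldsymbol{\beta}}{h}\right),
$$

where $K(\cdot)$ is a smooth, distribution-function-like kernel (e.g., the standard normal CDF) and $h>0$ is a bandwidth that shrinks with the sample size; a bias correction and careful bandwidth selection are then needed to control the error introduced by this approximation.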

Related Content

Differential privacy (DP) has become a rigorous central concept for privacy protection over the past decade. We use Gaussian differential privacy (GDP) to gauge the level of privacy protection for releasing statistical summaries from data. GDP is a natural and easy-to-interpret differential privacy criterion based on the statistical hypothesis testing framework. The Gaussian mechanism is a natural and fundamental mechanism that can be used to perturb multivariate statistics to satisfy a $\mu$-GDP criterion, where $\mu>0$ stands for the level of privacy protection. Requiring a certain level of differential privacy inevitably leads to a loss of statistical utility. We improve ordinary Gaussian mechanisms by developing rank-deficient James-Stein Gaussian mechanisms for releasing private multivariate statistics, and show that the proposed mechanisms have higher statistical utility. Laplace mechanisms, the most commonly used mechanisms in the pure DP framework, are also investigated under the GDP criterion. We show that optimal calibration of multivariate Laplace mechanisms requires more information about the statistic than just its global sensitivity, and derive the minimal amount of Laplace perturbation for releasing $\mu$-GDP contingency tables. Gaussian mechanisms are shown to have higher statistical utility than Laplace mechanisms, except at very low levels of privacy. The utility of the proposed multivariate mechanisms is further demonstrated using differentially private hypothesis tests on contingency tables. Bootstrap-based goodness-of-fit and homogeneity tests utilizing the proposed rank-deficient James-Stein mechanisms exhibit higher power than natural competitors.
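
As a point of reference for the ordinary (non-James-Stein) Gaussian mechanism under $\mu$-GDP: adding Gaussian noise with standard deviation equal to the $\ell_2$ global sensitivity divided by $\mu$ satisfies $\mu$-GDP. The sketch below is a minimal illustration of that baseline mechanism only, not of the rank-deficient James-Stein mechanisms proposed in the paper; the function and variable names are ours.

```python
import numpy as np

def gaussian_mechanism(statistic, l2_sensitivity, mu, rng=None):
    """Release a multivariate statistic under mu-GDP by adding
    i.i.d. Gaussian noise with scale sigma = Delta_2 / mu."""
    rng = np.random.default_rng() if rng is None else rng
    statistic = np.asarray(statistic, dtype=float)
    sigma = l2_sensitivity / mu          # noise scale calibrated to mu-GDP
    noise = rng.normal(0.0, sigma, size=statistic.shape)
    return statistic + noise

# Example: privatize a 2x3 contingency table of counts. Under
# add-or-remove-one-record neighboring, one cell changes by 1, so Delta_2 = 1.
table = np.array([[12, 30, 8], [20, 15, 5]], dtype=float)
private_table = gaussian_mechanism(table, l2_sensitivity=1.0, mu=1.0)
print(private_table)
```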

A multiverse analysis evaluates all combinations of "reasonable" analytic decisions to promote robustness and transparency, but can lead to a combinatorial explosion of analyses to compute. Long delays before assessing results prevent users from diagnosing errors and iterating early. We contribute (1) approximation algorithms for estimating multiverse sensitivity and (2) monitoring visualizations for assessing progress and controlling execution on the fly. We evaluate how quickly three sampling-based algorithms converge to accurately rank sensitive decisions in both synthetic and real multiverse analyses. Compared to uniform random sampling, round robin and sketching approaches are 2 times faster in the best case, while on average estimating sensitivity accurately using 20% of the full multiverse. To enable analysts to stop early to fix errors or decide when results are "good enough" to move forward, we visualize both effect size and decision sensitivity estimates with confidence intervals, and surface potential issues including runtime warnings and model quality metrics.
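
To make the sampling idea concrete, the sketch below shows one plausible reading of round-robin sampling over a multiverse: cycle through the options of each decision in turn so every option accumulates sampled universes at a similar rate, and estimate each decision's sensitivity from the running per-option results. This is an illustrative sketch under our own assumptions, not the paper's implementation; all names are hypothetical.

```python
import itertools
import random

def round_robin_universes(decisions):
    """Yield universes (dicts mapping decision -> option) so that, for each
    decision, its options are covered at a roughly equal rate."""
    cyclers = {d: itertools.cycle(opts) for d, opts in decisions.items()}
    names = list(decisions)
    while True:
        # Fix the "focus" decision's option by round robin and fill the
        # remaining decisions uniformly at random.
        for focus in names:
            universe = {d: random.choice(decisions[d]) for d in names}
            universe[focus] = next(cyclers[focus])
            yield universe

# Example multiverse: three analytic decisions with a few options each.
decisions = {
    "outlier_rule": ["none", "iqr", "zscore"],
    "covariates":   ["minimal", "full"],
    "model":        ["ols", "robust"],
}
sampler = round_robin_universes(decisions)
for universe in (next(sampler) for _ in range(6)):
    print(universe)
```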

Generalized linear mixed models (GLMMs) are widely used in research for their ability to model correlated outcomes with non-Gaussian conditional distributions. The proper selection of fixed and random effects is a critical part of the modeling process, where model misspecification may lead to significant bias. However, the joint selection of fixed and random effects has historically been limited to lower-dimensional GLMMs, largely due to the use of criterion-based model selection strategies. Here we present the R package glmmPen, one of the first to select fixed and random effects in higher dimensions using a penalized GLMM modeling framework. Model parameters are estimated using a Monte Carlo expectation conditional minimization (MCECM) algorithm, which leverages Stan and RcppArmadillo for increased computational efficiency. Our package supports multiple distributional families and penalty functions. In this manuscript we discuss the modeling procedure, estimation scheme, and software implementation through an application to a pancreatic cancer subtyping study.
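
Schematically, joint selection of fixed and random effects in a penalized GLMM framework amounts to minimizing a penalized negative (marginal) log-likelihood of the generic form below. This is a sketch of the general idea in our own notation; the exact parameterization and penalties used by glmmPen may differ.

$$
\min_{\boldsymbol{\beta},\,\boldsymbol{\Gamma}} \; -\ell(\boldsymbol{\beta}, \boldsymbol{\Gamma}) \;+\; \lambda_0 \sum_{j} p\big(|\beta_j|\big) \;+\; \lambda_1 \sum_{k} p\big(\lVert \boldsymbol{\gamma}_k \rVert_2\big),
$$

where $\boldsymbol{\beta}$ collects the fixed effects, $\boldsymbol{\Gamma} = (\boldsymbol{\gamma}_1, \boldsymbol{\gamma}_2, \ldots)$ parameterizes the random-effect covariance, and $p(\cdot)$ is a sparsity-inducing penalty; the group penalty on each $\boldsymbol{\gamma}_k$ can zero out an entire random effect, so fixed and random effects are selected simultaneously.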

Modern biomedical datasets are increasingly high dimensional and exhibit complex correlation structures. Generalized Linear Mixed Models (GLMMs) have long been employed to account for such dependencies. However, proper specification of the fixed and random effects in GLMMs is increasingly difficult in high dimensions, and computational complexity grows with the dimension of the random effects. We present a novel reformulation of the GLMM using a factor model decomposition of the random effects, enabling scalable computation of GLMMs in high dimensions by reducing the latent space from a large number of random effects to a smaller set of latent factors. We also extend our prior work to estimate model parameters using a modified Monte Carlo Expectation Conditional Minimization algorithm, allowing us to perform variable selection on the fixed and random effects simultaneously. We show through simulation that, with this factor model decomposition, our method can fit high-dimensional penalized GLMMs faster than comparable methods and more easily scales to dimensions not attainable with existing approaches.
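
The factor decomposition can be sketched as follows (in our notation, as an illustration of the stated idea): instead of working with a $q$-dimensional random-effect vector per group, the random effects are expressed through a small number $r \ll q$ of latent factors,

$$
\boldsymbol{\alpha}_i \;=\; \mathbf{B}\,\mathbf{f}_i, \qquad \mathbf{B} \in \mathbb{R}^{q \times r}, \quad \mathbf{f}_i \sim N(\mathbf{0}, \mathbf{I}_r),
$$

so that the latent space integrated over in the Monte Carlo E-step has dimension $r$ rather than $q$.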

Taxi-demand prediction is an important application of machine learning that enables taxi-providing facilities to optimize their operations and city planners to improve transportation infrastructure and services. However, the use of sensitive data in these systems raises concerns about privacy and security. In this paper, we propose the use of federated learning for taxi-demand prediction, which allows multiple parties to train a machine learning model on their own data while keeping the data private and secure. This can enable organizations to build models on data they otherwise would not be able to access. Despite its potential benefits, federated learning for taxi-demand prediction poses several technical challenges, such as class imbalance, data scarcity among some parties, and the need to ensure model generalization to accommodate diverse facilities and geographic regions. To effectively address these challenges, we propose a system that utilizes region-independent encoding for geographic lat-long coordinates. By doing so, the proposed model is not limited to a specific region, enabling it to perform optimally in any area. Furthermore, we employ cost-sensitive learning and various regularization techniques to mitigate issues related to data scarcity and overfitting, respectively. Evaluation with real-world data collected from 16 taxi service providers in Japan over a period of six months showed that the proposed system predicted demand levels accurately, within 1\% error compared to a single model trained with the integrated data. The system also effectively defended against membership inference attacks on passenger data.
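
For context, the federated setup can be pictured with a generic federated averaging loop in which each provider trains locally and only model parameters, never raw trip records, are shared with the server. This is a schematic sketch of standard FedAvg, not the specific system described above; all names and the toy usage are illustrative.

```python
import numpy as np

def federated_averaging(global_weights, clients, local_train, rounds=10):
    """Generic FedAvg loop: each client updates a copy of the global model
    on its private data; the server aggregates by a weighted average."""
    for _ in range(rounds):
        updates, sizes = [], []
        for client_data in clients:
            local_w = [w.copy() for w in global_weights]
            local_w = local_train(local_w, client_data)   # stays on the client
            updates.append(local_w)
            sizes.append(len(client_data))
        total = float(sum(sizes))
        # Average client updates, weighted by local data size.
        global_weights = [
            sum(n / total * u[i] for u, n in zip(updates, sizes))
            for i in range(len(global_weights))
        ]
    return global_weights

# Toy usage: two "providers" with dummy data and a placeholder local update.
clients = [list(range(100)), list(range(40))]
local_train = lambda w, data: [wi + 0.01 * len(data) for wi in w]
print(federated_averaging([np.zeros(3)], clients, local_train, rounds=3))
```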

Attention-based graph neural networks (GNNs), such as graph attention networks (GATs), have become popular neural architectures for processing graph-structured data and learning node embeddings. Despite their empirical success, these models rely on labeled data, and their theoretical properties have yet to be fully understood. In this work, we propose a novel attention-based node embedding framework for graphs. Our framework builds upon a hierarchical kernel for multisets of subgraphs around nodes (e.g. neighborhoods), and each kernel leverages the geometry of a smooth statistical manifold to compare pairs of multisets by "projecting" the multisets onto the manifold. By explicitly computing node embeddings with a manifold of Gaussian mixtures, our method leads to a new attention mechanism for neighborhood aggregation. We provide theoretical insights into the generalizability and expressivity of our embeddings, contributing to a deeper understanding of attention-based GNNs. We propose efficient unsupervised and supervised methods for learning the embeddings, with the unsupervised method not requiring any labeled data. Through experiments on several node classification benchmarks, we demonstrate that our proposed method outperforms existing attention-based graph models like GATs. Our code is available at //github.com/BorgwardtLab/fisher_information_embedding.
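
For reference, the standard GAT-style attention used for neighborhood aggregation, against which the proposed embeddings are compared, computes for node $i$ and neighbor $j$

$$
e_{ij} = \mathrm{LeakyReLU}\!\big(\mathbf{a}^{\top}[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j]\big), \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})}, \qquad \mathbf{h}_i' = \sigma\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\,\mathbf{W}\mathbf{h}_j\Big),
$$

whereas the framework above derives its attention weights from projections of neighborhood multisets onto a statistical manifold of Gaussian mixtures.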

Objective. Algorithmic differentiation (AD) can be a useful technique for numerically optimizing design and algorithmic parameters of, and quantifying uncertainties in, computer simulations. However, the effectiveness of AD depends on how "well-linearizable" the software is. In this study, we assess how promising derivative information of a typical proton computed tomography (pCT) scan computer simulation is for the aforementioned applications. Approach. This study is mainly based on numerical experiments, in which we repeatedly evaluate three representative computational steps with perturbed input values. We support our observations with a review of the algorithmic steps and arithmetic operations performed by the software, using debugging techniques. Main results. The model-based iterative reconstruction (MBIR) subprocedure (at the end of the software pipeline) and the Monte Carlo (MC) simulation (at the beginning) were piecewise differentiable. Jumps in the MBIR function arose from the discrete computation of the set of voxels intersected by a proton path. Jumps in the MC function likely arose from changes in the control flow that affect the number of random numbers consumed. The tracking algorithm solves an inherently non-differentiable problem. Significance. The MC and MBIR codes are ready for the integration of AD, and further research on surrogate models for the tracking subprocedure is necessary.
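
The numerical-experiment methodology can be illustrated with a simple perturbation probe: evaluate a computational step at finely spaced inputs around a nominal value and inspect the difference quotients for jumps or kinks, which indicate piecewise rather than global differentiability. The sketch below is a generic probe of this kind, not the study's actual pipeline; `step` stands in for one of the simulated computational steps.

```python
import numpy as np

def probe_differentiability(step, x0, h=1e-4, n=50):
    """Evaluate `step` on a fine grid around x0 and return the grid, the
    outputs, and forward difference quotients; discontinuities or kinks
    show up as outlying quotients."""
    xs = x0 + h * np.arange(-n, n + 1)
    ys = np.array([step(x) for x in xs])
    quotients = np.diff(ys) / h
    return xs, ys, quotients

# Toy example: a piecewise-differentiable function with a kink at x = 1.
step = lambda x: x**2 if x < 1.0 else 2.0 * x - 1.0 + 0.5 * (x - 1.0)
xs, ys, q = probe_differentiability(step, x0=1.0)
print("max jump in difference quotients:", np.max(np.abs(np.diff(q))))
```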

Quantile regression (QR) can be used to describe the comprehensive relationship between a response and predictors. Prior domain knowledge and assumptions in applications are usually formulated as constraints on the parameters to improve estimation efficiency. This paper develops methods based on multi-block ADMM to fit general penalized QR with linear constraints on the regression coefficients. Different formulations for handling the linear constraints and general penalties are explored and compared. The most efficient one has explicit expressions for each parameter and avoids the nested-loop iterations of some existing algorithms. Additionally, a parallel ADMM algorithm is developed for big data stored in a distributed fashion. The stopping criterion and convergence of the algorithm are established. Extensive numerical experiments and a real data example demonstrate the computational efficiency of the proposed algorithms. Details of the theoretical proofs and different algorithm variations are presented in the Appendix.
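
Concretely, the class of problems addressed here can be written, in generic notation of our own, as penalized quantile regression under linear constraints,

$$
\min_{\boldsymbol{\beta}} \; \sum_{i=1}^{n} \rho_{\tau}\big(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}\big) \;+\; P_{\lambda}(\boldsymbol{\beta}) \quad \text{subject to} \quad \mathbf{A}\boldsymbol{\beta} \le \mathbf{b}, \;\; \mathbf{C}\boldsymbol{\beta} = \mathbf{d},
$$

where $\rho_{\tau}(u) = u\{\tau - \mathbf{1}(u < 0)\}$ is the check loss at quantile level $\tau$ and $P_{\lambda}$ is a general penalty; the multi-block ADMM splits the loss, the penalty, and the constraints into separate blocks with explicit updates for each parameter.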

Access to individual-level health data is essential for gaining new insights and advancing science. In particular, modern methods based on artificial intelligence rely on the availability of and access to large datasets. In the health sector, access to individual-level data is often challenging due to privacy concerns. A promising alternative is the generation of fully synthetic data, i.e. data generated through a randomised process that have statistical properties similar to those of the original data, but do not have a one-to-one correspondence with the original individual-level records. In this study, we use a state-of-the-art synthetic data generation method and perform in-depth quality analyses of the generated data for a specific use case in the field of nutrition. We demonstrate the need for careful analyses of synthetic data that go beyond descriptive statistics and provide valuable insights into how to realise the full potential of synthetic datasets. By extending the methods, but also by thoroughly analysing the effects of sampling from a trained model, we are able to largely reproduce significant real-world analysis results in the chosen use case.

We propose a data segmentation methodology for the high-dimensional linear regression problem where the regression parameters are allowed to undergo multiple changes. The proposed methodology, MOSEG, proceeds in two stages: the data are first scanned for multiple change points using a moving window-based procedure, which is followed by a location refinement stage. MOSEG enjoys computational efficiency thanks to the adoption of a coarse grid in the first stage, while achieving theoretical consistency in estimating both the total number and the locations of the change points without requiring independence or sub-Gaussianity. We also propose MOSEG$.$MS, a multiscale extension of MOSEG which, while comparable to MOSEG in terms of computational complexity, achieves theoretical consistency for a broader parameter space that permits multiscale change points. We demonstrate the good performance of the proposed methods in comparative simulation studies and in an application to predicting the equity premium.
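
To illustrate the first-stage scan in spirit: over a coarse grid of candidate locations, estimate the regression parameter on the window to the left and to the right and flag locations where the two estimates differ markedly. The sketch below is a schematic moving-window scan with a lasso fit in each window, under our own simplifications; it is not the exact MOSEG statistic or threshold rule.

```python
import numpy as np
from sklearn.linear_model import Lasso

def moving_window_scan(X, y, bandwidth, grid_step, alpha=0.1):
    """For each grid point k, fit a lasso on the left and right windows of
    length `bandwidth` and record the l2 distance between the two fits."""
    n = len(y)
    grid = range(bandwidth, n - bandwidth, grid_step)
    scores = []
    for k in grid:
        left = Lasso(alpha=alpha).fit(X[k - bandwidth:k], y[k - bandwidth:k])
        right = Lasso(alpha=alpha).fit(X[k:k + bandwidth], y[k:k + bandwidth])
        scores.append(np.linalg.norm(left.coef_ - right.coef_))
    return np.array(list(grid)), np.array(scores)

# Synthetic example: a single change in a sparse parameter at t = 200.
rng = np.random.default_rng(0)
n, p = 400, 50
X = rng.normal(size=(n, p))
beta1, beta2 = np.zeros(p), np.zeros(p)
beta1[:3], beta2[:3] = 2.0, -2.0
y = np.concatenate([X[:200] @ beta1, X[200:] @ beta2]) + rng.normal(size=n)
grid, scores = moving_window_scan(X, y, bandwidth=60, grid_step=10)
# Grid points with large scores are candidate change points; a second
# refinement stage would then localize each change more precisely.
print("peak near:", grid[np.argmax(scores)])
```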
