
This work is motivated by learning the individualized minimal clinically important difference, a vital concept for assessing clinical importance in various biomedical studies. We formulate the scientific question as a high-dimensional statistical problem in which the parameter of interest lies in an individualized linear threshold. The goal is to develop a hypothesis testing procedure for the significance of a single element of this parameter as well as of a linear combination of its elements. The difficulty stems both from the high-dimensional nuisance involved in developing such a testing procedure and from the fact that this high-dimensional threshold model is nonregular, so the limiting distribution of the corresponding estimator is nonstandard. To deal with these challenges, we construct a test statistic via a new bias-corrected smoothed decorrelated score approach and establish its asymptotic distributions under both the null and local alternative hypotheses. We propose a double-smoothing approach to select the optimal bandwidth in our test statistic and provide theoretical guarantees for the selected bandwidth. We conduct simulation studies to demonstrate how the proposed procedure can be applied in empirical studies. We apply the proposed method to a clinical trial where the scientific goal is to assess the clinical importance of a surgical procedure.
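
For intuition about the "smoothed" part of the approach: in threshold models of this type, the non-differentiable indicator is typically replaced by a smooth kernel surrogate before the score is decorrelated. The display below is a generic sketch under that reading, not the authors' exact formulation; $X$, $\boldsymbol{\beta}$, $K$ and $h$ are illustrative notation.

$$
\mathbf{1}\{X^{\top}\boldsymbol{\beta} > 0\} \;\approx\; K\!\left(\frac{X^{\top}\boldsymbol{\beta}}{h}\right),
$$

where $K(\cdot)$ is a smooth, distribution-function-like kernel (e.g., the standard normal CDF) and $h>0$ is a bandwidth that shrinks with the sample size; a bias correction and careful bandwidth selection are then needed to control the error introduced by this approximation.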

Related Content

Differential privacy (DP) has become a rigorous central concept for privacy protection over the past decade. We use Gaussian differential privacy (GDP) to gauge the level of privacy protection for releasing statistical summaries from data. GDP is a natural and easy-to-interpret differential privacy criterion based on the statistical hypothesis testing framework. The Gaussian mechanism is a natural and fundamental mechanism that can be used to perturb multivariate statistics to satisfy a $\mu$-GDP criterion, where $\mu>0$ stands for the level of privacy protection. Requiring a certain level of differential privacy inevitably leads to a loss of statistical utility. We improve ordinary Gaussian mechanisms by developing rank-deficient James-Stein Gaussian mechanisms for releasing private multivariate statistics, and show that the proposed mechanisms have higher statistical utility. Laplace mechanisms, the most commonly used mechanisms in the pure DP framework, are also investigated under the GDP criterion. We show that optimal calibration of multivariate Laplace mechanisms requires more information about the statistic than just its global sensitivity, and derive the minimal amount of Laplace perturbation for releasing $\mu$-GDP contingency tables. Gaussian mechanisms are shown to have higher statistical utility than Laplace mechanisms, except at very low levels of privacy. The utility of the proposed multivariate mechanisms is further demonstrated using differentially private hypothesis tests on contingency tables. Bootstrap-based goodness-of-fit and homogeneity tests utilizing the proposed rank-deficient James-Stein mechanisms exhibit higher power than natural competitors.
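
As a point of reference for the ordinary (non-James-Stein) Gaussian mechanism under $\mu$-GDP: adding Gaussian noise with standard deviation equal to the $\ell_2$ global sensitivity divided by $\mu$ satisfies $\mu$-GDP. The sketch below is a minimal illustration of that baseline mechanism only, not of the rank-deficient James-Stein mechanisms proposed in the paper; the function and variable names are ours.

```python
import numpy as np

def gaussian_mechanism(statistic, l2_sensitivity, mu, rng=None):
    """Release a multivariate statistic under mu-GDP by adding
    i.i.d. Gaussian noise with scale sigma = Delta_2 / mu."""
    rng = np.random.default_rng() if rng is None else rng
    statistic = np.asarray(statistic, dtype=float)
    sigma = l2_sensitivity / mu          # noise scale calibrated to mu-GDP
    noise = rng.normal(0.0, sigma, size=statistic.shape)
    return statistic + noise

# Example: privatize a 2x3 contingency table of counts. Under
# add-or-remove-one-record neighboring, one cell changes by 1, so Delta_2 = 1.
table = np.array([[12, 30, 8], [20, 15, 5]], dtype=float)
private_table = gaussian_mechanism(table, l2_sensitivity=1.0, mu=1.0)
print(private_table)
```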

A multiverse analysis evaluates all combinations of "reasonable" analytic decisions to promote robustness and transparency, but can lead to a combinatorial explosion of analyses to compute. Long delays before assessing results prevent users from diagnosing errors and iterating early. We contribute (1) approximation algorithms for estimating multiverse sensitivity and (2) monitoring visualizations for assessing progress and controlling execution on the fly. We evaluate how quickly three sampling-based algorithms converge to accurately rank sensitive decisions in both synthetic and real multiverse analyses. Compared to uniform random sampling, round robin and sketching approaches are 2 times faster in the best case, while on average estimating sensitivity accurately using 20% of the full multiverse. To enable analysts to stop early to fix errors or decide when results are "good enough" to move forward, we visualize both effect size and decision sensitivity estimates with confidence intervals, and surface potential issues including runtime warnings and model quality metrics.
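
To make the sampling idea concrete, the sketch below shows one plausible reading of round-robin sampling over a multiverse: cycle through the options of each decision in turn so every option accumulates sampled universes at a similar rate, and estimate each decision's sensitivity from the running per-option results. This is an illustrative sketch under our own assumptions, not the paper's implementation; all names are hypothetical.

```python
import itertools
import random

def round_robin_universes(decisions):
    """Yield universes (dicts mapping decision -> option) so that, for each
    decision, its options are covered at a roughly equal rate."""
    cyclers = {d: itertools.cycle(opts) for d, opts in decisions.items()}
    names = list(decisions)
    while True:
        # Fix the "focus" decision's option by round robin and fill the
        # remaining decisions uniformly at random.
        for focus in names:
            universe = {d: random.choice(decisions[d]) for d in names}
            universe[focus] = next(cyclers[focus])
            yield universe

# Example multiverse: three analytic decisions with a few options each.
decisions = {
    "outlier_rule": ["none", "iqr", "zscore"],
    "covariates":   ["minimal", "full"],
    "model":        ["ols", "robust"],
}
sampler = round_robin_universes(decisions)
for universe in (next(sampler) for _ in range(6)):
    print(universe)
```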

Generalized linear mixed models (GLMMs) are widely used in research for their ability to model correlated outcomes with non-Gaussian conditional distributions. The proper selection of fixed and random effects is a critical part of the modeling process, where model misspecification may lead to significant bias. However, the joint selection of fixed and random effects has historically been limited to lower-dimensional GLMMs, largely due to the use of criterion-based model selection strategies. Here we present the R package glmmPen, one of the first to select fixed and random effects in higher dimensions using a penalized GLMM modeling framework. Model parameters are estimated using a Monte Carlo expectation conditional minimization (MCECM) algorithm, which leverages Stan and RcppArmadillo for increased computational efficiency. Our package supports multiple distributional families and penalty functions. In this manuscript we discuss the modeling procedure, estimation scheme, and software implementation through an application to a pancreatic cancer subtyping study.
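
Schematically, joint selection of fixed and random effects in a penalized GLMM framework amounts to minimizing a penalized negative (marginal) log-likelihood of the generic form below. This is a sketch of the general idea in our own notation; the exact parameterization and penalties used by glmmPen may differ.

$$
\min_{\boldsymbol{\beta},\,\boldsymbol{\Gamma}} \; -\ell(\boldsymbol{\beta}, \boldsymbol{\Gamma}) \;+\; \lambda_0 \sum_{j} p\big(|\beta_j|\big) \;+\; \lambda_1 \sum_{k} p\big(\lVert \boldsymbol{\gamma}_k \rVert_2\big),
$$

where $\boldsymbol{\beta}$ collects the fixed effects, $\boldsymbol{\Gamma} = (\boldsymbol{\gamma}_1, \boldsymbol{\gamma}_2, \ldots)$ parameterizes the random-effect covariance, and $p(\cdot)$ is a sparsity-inducing penalty; the group penalty on each $\boldsymbol{\gamma}_k$ can zero out an entire random effect, so fixed and random effects are selected simultaneously.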

Modern biomedical datasets are increasingly high dimensional and exhibit complex correlation structures. Generalized Linear Mixed Models (GLMMs) have long been employed to account for such dependencies. However, proper specification of the fixed and random effects in GLMMs is increasingly difficult in high dimensions, and computational complexity grows with the dimension of the random effects. We present a novel reformulation of the GLMM using a factor model decomposition of the random effects, enabling scalable computation of GLMMs in high dimensions by reducing the latent space from a large number of random effects to a smaller set of latent factors. We also extend our prior work to estimate model parameters using a modified Monte Carlo Expectation Conditional Minimization algorithm, allowing us to perform variable selection on the fixed and random effects simultaneously. We show through simulation that, with this factor model decomposition, our method can fit high-dimensional penalized GLMMs faster than comparable methods and more easily scales to dimensions not attainable with existing approaches.
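
The factor decomposition can be sketched as follows (in our notation, as an illustration of the stated idea): instead of working with a $q$-dimensional random-effect vector per group, the random effects are expressed through a small number $r \ll q$ of latent factors,

$$
\boldsymbol{\alpha}_i \;=\; \mathbf{B}\,\mathbf{f}_i, \qquad \mathbf{B} \in \mathbb{R}^{q \times r}, \quad \mathbf{f}_i \sim N(\mathbf{0}, \mathbf{I}_r),
$$

so that the latent space integrated over in the Monte Carlo E-step has dimension $r$ rather than $q$.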

Taxi-demand prediction is an important application of machine learning that enables taxi-providing facilities to optimize their operations and city planners to improve transportation infrastructure and services. However, the use of sensitive data in these systems raises concerns about privacy and security. In this paper, we propose the use of federated learning for taxi-demand prediction, which allows multiple parties to train a machine learning model on their own data while keeping the data private and secure. This can enable organizations to build models on data they otherwise would not be able to access. Despite its potential benefits, federated learning for taxi-demand prediction poses several technical challenges, such as class imbalance, data scarcity among some parties, and the need to ensure model generalization to accommodate diverse facilities and geographic regions. To effectively address these challenges, we propose a system that utilizes region-independent encoding for geographic lat-long coordinates. By doing so, the proposed model is not limited to a specific region, enabling it to perform optimally in any area. Furthermore, we employ cost-sensitive learning and various regularization techniques to mitigate issues related to data scarcity and overfitting, respectively. Evaluation with real-world data collected from 16 taxi service providers in Japan over a period of six months showed that the proposed system predicted demand levels accurately, within 1\% error compared to a single model trained with the integrated data. The system also effectively defended against membership inference attacks on passenger data.
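
For context, the federated setup can be pictured with a generic federated averaging loop in which each provider trains locally and only model parameters, never raw trip records, are shared with the server. This is a schematic sketch of standard FedAvg, not the specific system described above; all names and the toy usage are illustrative.

```python
import numpy as np

def federated_averaging(global_weights, clients, local_train, rounds=10):
    """Generic FedAvg loop: each client updates a copy of the global model
    on its private data; the server aggregates by a weighted average."""
    for _ in range(rounds):
        updates, sizes = [], []
        for client_data in clients:
            local_w = [w.copy() for w in global_weights]
            local_w = local_train(local_w, client_data)   # stays on the client
            updates.append(local_w)
            sizes.append(len(client_data))
        total = float(sum(sizes))
        # Average client updates, weighted by local data size.
        global_weights = [
            sum(n / total * u[i] for u, n in zip(updates, sizes))
            for i in range(len(global_weights))
        ]
    return global_weights

# Toy usage: two "providers" with dummy data and a placeholder local update.
clients = [list(range(100)), list(range(40))]
local_train = lambda w, data: [wi + 0.01 * len(data) for wi in w]
print(federated_averaging([np.zeros(3)], clients, local_train, rounds=3))
```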

Attention-based graph neural networks (GNNs), such as graph attention networks (GATs), have become popular neural architectures for processing graph-structured data and learning node embeddings. Despite their empirical success, these models rely on labeled data, and their theoretical properties have yet to be fully understood. In this work, we propose a novel attention-based node embedding framework for graphs. Our framework builds upon a hierarchical kernel for multisets of subgraphs around nodes (e.g. neighborhoods), and each kernel leverages the geometry of a smooth statistical manifold to compare pairs of multisets by "projecting" the multisets onto the manifold. By explicitly computing node embeddings with a manifold of Gaussian mixtures, our method leads to a new attention mechanism for neighborhood aggregation. We provide theoretical insights into the generalizability and expressivity of our embeddings, contributing to a deeper understanding of attention-based GNNs. We propose efficient unsupervised and supervised methods for learning the embeddings, with the unsupervised method not requiring any labeled data. Through experiments on several node classification benchmarks, we demonstrate that our proposed method outperforms existing attention-based graph models like GATs. Our code is available at //github.com/BorgwardtLab/fisher_information_embedding.
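
For reference, the standard GAT-style attention used for neighborhood aggregation, against which the proposed embeddings are compared, computes for node $i$ and neighbor $j$

$$
e_{ij} = \mathrm{LeakyReLU}\!\big(\mathbf{a}^{\top}[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j]\big), \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})}, \qquad \mathbf{h}_i' = \sigma\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\,\mathbf{W}\mathbf{h}_j\Big),
$$

whereas the framework above derives its attention weights from projections of neighborhood multisets onto a statistical manifold of Gaussian mixtures.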

Objective. Algorithmic differentiation (AD) can be a useful technique for numerically optimizing design and algorithmic parameters of, and quantifying uncertainties in, computer simulations. However, the effectiveness of AD depends on how "well-linearizable" the software is. In this study, we assess how promising derivative information of a typical proton computed tomography (pCT) scan computer simulation is for the aforementioned applications. Approach. This study is mainly based on numerical experiments, in which we repeatedly evaluate three representative computational steps with perturbed input values. We support our observations with a review of the algorithmic steps and arithmetic operations performed by the software, using debugging techniques. Main results. The model-based iterative reconstruction (MBIR) subprocedure (at the end of the software pipeline) and the Monte Carlo (MC) simulation (at the beginning) were piecewise differentiable. Jumps in the MBIR function arose from the discrete computation of the set of voxels intersected by a proton path. Jumps in the MC function likely arose from changes in the control flow that affect the number of random numbers consumed. The tracking algorithm solves an inherently non-differentiable problem. Significance. The MC and MBIR codes are ready for the integration of AD, and further research on surrogate models for the tracking subprocedure is necessary.
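
The numerical-experiment methodology can be illustrated with a simple perturbation probe: evaluate a computational step at finely spaced inputs around a nominal value and inspect the difference quotients for jumps or kinks, which indicate piecewise rather than global differentiability. The sketch below is a generic probe of this kind, not the study's actual pipeline; `step` stands in for one of the simulated computational steps.

```python
import numpy as np

def probe_differentiability(step, x0, h=1e-4, n=50):
    """Evaluate `step` on a fine grid around x0 and return the grid, the
    outputs, and forward difference quotients; discontinuities or kinks
    show up as outlying quotients."""
    xs = x0 + h * np.arange(-n, n + 1)
    ys = np.array([step(x) for x in xs])
    quotients = np.diff(ys) / h
    return xs, ys, quotients

# Toy example: a piecewise-differentiable function with a kink at x = 1.
step = lambda x: x**2 if x < 1.0 else 2.0 * x - 1.0 + 0.5 * (x - 1.0)
xs, ys, q = probe_differentiability(step, x0=1.0)
print("max jump in difference quotients:", np.max(np.abs(np.diff(q))))
```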

Quantile regression (QR) can be used to describe the comprehensive relationship between a response and predictors. Prior domain knowledge and assumptions in applications are usually formulated as constraints on the parameters to improve estimation efficiency. This paper develops methods based on multi-block ADMM to fit general penalized QR with linear constraints on the regression coefficients. Different formulations for handling the linear constraints and general penalties are explored and compared. The most efficient one has explicit expressions for each parameter and avoids the nested-loop iterations of some existing algorithms. Additionally, a parallel ADMM algorithm is developed for big data stored in a distributed fashion. The stopping criterion and convergence of the algorithm are established. Extensive numerical experiments and a real data example demonstrate the computational efficiency of the proposed algorithms. Details of the theoretical proofs and different algorithm variations are presented in the Appendix.
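
Concretely, the class of problems addressed here can be written, in generic notation of our own, as penalized quantile regression under linear constraints,

$$
\min_{\boldsymbol{\beta}} \; \sum_{i=1}^{n} \rho_{\tau}\big(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}\big) \;+\; P_{\lambda}(\boldsymbol{\beta}) \quad \text{subject to} \quad \mathbf{A}\boldsymbol{\beta} \le \mathbf{b}, \;\; \mathbf{C}\boldsymbol{\beta} = \mathbf{d},
$$

where $\rho_{\tau}(u) = u\{\tau - \mathbf{1}(u < 0)\}$ is the check loss at quantile level $\tau$ and $P_{\lambda}$ is a general penalty; the multi-block ADMM splits the loss, the penalty, and the constraints into separate blocks with explicit updates for each parameter.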

Access to individual-level health data is essential for gaining new insights and advancing science. In particular, modern methods based on artificial intelligence rely on the availability of and access to large datasets. In the health sector, access to individual-level data is often challenging due to privacy concerns. A promising alternative is the generation of fully synthetic data, i.e. data generated through a randomised process that have statistical properties similar to those of the original data, but do not have a one-to-one correspondence with the original individual-level records. In this study, we use a state-of-the-art synthetic data generation method and perform in-depth quality analyses of the generated data for a specific use case in the field of nutrition. We demonstrate the need for careful analyses of synthetic data that go beyond descriptive statistics and provide valuable insights into how to realise the full potential of synthetic datasets. By extending the methods, but also by thoroughly analysing the effects of sampling from a trained model, we are able to largely reproduce significant real-world analysis results in the chosen use case.

We propose a data segmentation methodology for the high-dimensional linear regression problem where the regression parameters are allowed to undergo multiple changes. The proposed methodology, MOSEG, proceeds in two stages: the data are first scanned for multiple change points using a moving window-based procedure, which is followed by a location refinement stage. MOSEG enjoys computational efficiency thanks to the adoption of a coarse grid in the first stage, while achieving theoretical consistency in estimating both the total number and the locations of the change points without requiring independence or sub-Gaussianity. We also propose MOSEG$.$MS, a multiscale extension of MOSEG which, while comparable to MOSEG in terms of computational complexity, achieves theoretical consistency for a broader parameter space that permits multiscale change points. We demonstrate the good performance of the proposed methods in comparative simulation studies and in an application to predicting the equity premium.
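
To illustrate the first-stage scan in spirit: over a coarse grid of candidate locations, estimate the regression parameter on the window to the left and to the right and flag locations where the two estimates differ markedly. The sketch below is a schematic moving-window scan with a lasso fit in each window, under our own simplifications; it is not the exact MOSEG statistic or threshold rule.

```python
import numpy as np
from sklearn.linear_model import Lasso

def moving_window_scan(X, y, bandwidth, grid_step, alpha=0.1):
    """For each grid point k, fit a lasso on the left and right windows of
    length `bandwidth` and record the l2 distance between the two fits."""
    n = len(y)
    grid = range(bandwidth, n - bandwidth, grid_step)
    scores = []
    for k in grid:
        left = Lasso(alpha=alpha).fit(X[k - bandwidth:k], y[k - bandwidth:k])
        right = Lasso(alpha=alpha).fit(X[k:k + bandwidth], y[k:k + bandwidth])
        scores.append(np.linalg.norm(left.coef_ - right.coef_))
    return np.array(list(grid)), np.array(scores)

# Synthetic example: a single change in a sparse parameter at t = 200.
rng = np.random.default_rng(0)
n, p = 400, 50
X = rng.normal(size=(n, p))
beta1, beta2 = np.zeros(p), np.zeros(p)
beta1[:3], beta2[:3] = 2.0, -2.0
y = np.concatenate([X[:200] @ beta1, X[200:] @ beta2]) + rng.normal(size=n)
grid, scores = moving_window_scan(X, y, bandwidth=60, grid_step=10)
# Grid points with large scores are candidate change points; a second
# refinement stage would then localize each change more precisely.
print("peak near:", grid[np.argmax(scores)])
```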
