
Variable selection is crucial in high-dimensional omics-based analyses, since it is biologically reasonable to assume that only a subset of non-noisy features contributes to the data structure. However, the task is particularly hard in an unsupervised setting, and a priori ad hoc variable selection is still a very frequent approach, despite its evident drawbacks and lack of reproducibility. We propose a Bayesian variable selection approach for rank-based transcriptomic analysis. Using rankings of the data instead of the actual continuous measurements makes the conclusions more robust than those of classical statistical methods, and embedding variable selection into the inferential task makes the analysis fully reproducible. Specifically, we develop a novel extension of the Bayesian Mallows model for variable selection that allows a full probabilistic analysis, leading to coherent quantification of uncertainty. We test our approach on simulated data generated by several procedures, demonstrating the versatility and robustness of the method under different scenarios. We then use the approach to analyse genome-wide RNA-seq gene expression data from ovarian cancer samples: several genes that affect cancer development are correctly detected in a completely unsupervised fashion, showing the method's usefulness for signature discovery in cancer genomics. Moreover, the ability to also quantify uncertainty plays a key role in the subsequent biological investigation.
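To make the rank-based likelihood concrete, here is a minimal Python sketch of a Mallows model on the rankings of a fixed subset of genes. It assumes the footrule distance and the $\alpha/n$ scaling of the exponent; it does not implement the variable-selection extension described above. The function names and the toy expression values are hypothetical, and the normalizing constant is obtained by brute-force enumeration, which is only feasible for a handful of genes.

```python
import itertools
import math


def ranks(values):
    """Rank values in increasing order (1 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def footrule(r, rho):
    """Footrule distance: sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(r, rho))


def log_partition(alpha, n):
    """log Z_n(alpha) by enumerating all n! rankings (tiny n only).
    Z does not depend on the consensus rho because the footrule is right-invariant."""
    ident = list(range(1, n + 1))
    total = sum(math.exp(-alpha / n * footrule(p, ident))
                for p in itertools.permutations(ident))
    return math.log(total)


def mallows_loglik(sample_ranks, rho, alpha):
    """Mallows log-likelihood: sum over samples of -alpha/n * d(R, rho) - log Z_n(alpha)."""
    n = len(rho)
    log_z = log_partition(alpha, n)
    return sum(-alpha / n * footrule(r, rho) - log_z for r in sample_ranks)


# Hypothetical expression values for 4 selected genes (columns) in 3 samples (rows).
expression = [
    [5.1, 2.0, 8.3, 0.4],
    [4.8, 2.5, 9.0, 0.1],
    [6.0, 1.2, 7.7, 2.2],
]
sample_ranks = [ranks(row) for row in expression]
rho = [3, 2, 4, 1]  # candidate consensus ranking of the 4 genes
print(sample_ranks, mallows_loglik(sample_ranks, rho, alpha=2.0))
```

Only the ranks enter the likelihood, so any monotone distortion of a sample's expression values (for example a different normalization) leaves the result unchanged; this is the robustness argument made above.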

Related content

App extensions in iOS 8: features that let apps interact with each other and with the system.
  • Today (iOS and OS X): widgets for the Today view of Notification Center
  • Share (iOS and OS X): post content to web services or share content with others
  • Actions (iOS and OS X): app extensions to view or manipulate content inside another app
  • Photo Editing (iOS): edit a photo or video in Apple's Photos app with extensions from third-party apps
  • Finder Sync (OS X): remote file storage in the Finder with support for Finder content annotation
  • Storage Provider (iOS): an interface between files inside an app and other apps on a user's device
  • Custom Keyboard (iOS): system-wide alternative keyboards


Correlated data are ubiquitous in today's data-driven society. A fundamental task in analyzing these data is to understand, characterize and utilize the correlations in them in order to conduct valid inference. Yet explicit regression analysis of correlations has so far been limited to longitudinal data, a special form of correlated data, while implicit analysis via mixed-effects models lacks generality as a full inferential tool. This paper proposes a novel regression approach for modelling the correlation structure, leveraging a new generalized z-transformation. This transformation maps correlation matrices, which are constrained to be positive definite, to vectors with unrestricted support, and it is order-invariant. Building on these two properties, we develop a regression model that relates the transformed parameters to any covariates. We show that, coupled with a mean and a variance regression model, maximum likelihood yields asymptotically normal parameter estimates and, crucially, enables statistical inference for all the parameters. The performance of our framework is demonstrated in extensive simulations. More importantly, we illustrate the use of our model with the analysis of the classroom data, highly unbalanced multilevel clustered data with within-class and within-school correlations, and the analysis of the malaria immune response data in Benin, longitudinal data with time-dependent covariates in addition to time. Our analyses reveal previously unknown insights.
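The abstract does not spell out the generalized z-transformation. As an illustration only, the Python sketch below uses one known order-invariant generalization of Fisher's z, the matrix-logarithm parametrization of Archakov and Hansen (2021): the off-diagonal entries of $\log C$ form an unrestricted vector $\gamma$, and for a $2 \times 2$ matrix they reduce to $\operatorname{arctanh}(r)$. Whether this matches the paper's transformation is an assumption, and the function names are hypothetical. The inverse map uses a fixed-point iteration on the diagonal of $\log C$ that rescales until $\exp(\log C)$ has a unit diagonal.

```python
import numpy as np


def _sym_fun(A, f):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T


def corr_to_gamma(C):
    """Map a positive definite correlation matrix to the unrestricted vector
    of upper off-diagonal entries of its matrix logarithm."""
    L = _sym_fun(np.asarray(C, dtype=float), np.log)
    return L[np.triu_indices_from(L, k=1)]


def gamma_to_corr(gamma, n, tol=1e-12, max_iter=1000):
    """Inverse map: find the diagonal x so that expm(G) has a unit diagonal,
    where G has off-diagonal entries gamma and diagonal x."""
    G = np.zeros((n, n))
    iu = np.triu_indices(n, k=1)
    G[iu] = gamma
    G = G + G.T
    x = np.zeros(n)
    for _ in range(max_iter):
        np.fill_diagonal(G, x)
        step = np.log(np.diag(_sym_fun(G, np.exp)))
        x = x - step
        if np.max(np.abs(step)) < tol:
            break
    np.fill_diagonal(G, x)
    C = _sym_fun(G, np.exp)
    return (C + C.T) / 2  # symmetrize away rounding noise


# Any real vector maps to a valid (unit-diagonal, positive definite) correlation matrix.
C3 = gamma_to_corr(np.array([0.8, -0.5, 0.3]), 3)
print(np.round(C3, 4), np.linalg.eigvalsh(C3))
```

Because $\gamma$ is unconstrained, a regression model such as $\gamma = X\beta$ can be fitted without worrying about positive definiteness; relabelling the variables simply permutes the entries of $\gamma$, which is the kind of order invariance the abstract refers to.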

Anomaly detection is the problem of recognizing abnormal inputs based on seen examples of normal data. Despite recent advances in deep learning for recognizing image anomalies, these methods still prove incapable of handling complex medical images, such as barely visible abnormalities in chest X-rays and metastases in lymph nodes. To address this problem, we introduce a powerful new method for image anomaly detection. It relies on the classical autoencoder approach, with a redesigned training pipeline that handles high-resolution, complex images and a robust way of computing an image abnormality score. We revisit the very problem statement of fully unsupervised anomaly detection, in which no abnormal examples at all are provided during model setup. We propose to relax this unrealistic assumption by using a very small number of anomalies of confined variability merely to initiate the search for the model's hyperparameters. We evaluate our solution on natural image datasets with a known benchmark, as well as on two medical datasets containing radiology and digital pathology images. The proposed approach establishes a new strong baseline for image anomaly detection and outperforms state-of-the-art approaches in complex medical image analysis tasks.
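The setup described above (train a reconstruction model on normal data only, score inputs by reconstruction error, and use a few anomalies only to pick hyperparameters) can be sketched in Python with NumPy. As a simplifying assumption, a linear autoencoder fitted in closed form by PCA stands in for the paper's deep autoencoder, the score is plain squared reconstruction error rather than the paper's robust score, and the only hyperparameter searched is the latent size. All names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_linear_autoencoder(X, k):
    """Linear autoencoder with k latent units, fitted in closed form via PCA:
    the top-k right singular vectors act as tied encoder/decoder weights."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k].T  # W has shape (d, k)


def abnormality_score(X, mu, W):
    """Per-sample squared reconstruction error."""
    Xc = X - mu
    return np.sum((Xc - Xc @ W @ W.T) ** 2, axis=1)


def auroc(scores_normal, scores_anomalous):
    """Mann-Whitney estimate of P(anomaly score > normal score); ties count 1/2."""
    sn = np.asarray(scores_normal)[:, None]
    sa = np.asarray(scores_anomalous)[None, :]
    return float(np.mean((sa > sn) + 0.5 * (sa == sn)))


def select_latent_size(X_train, X_val_normal, X_val_anomalous, candidates):
    """Pick the latent size whose scores best separate the few labelled anomalies."""
    best_k, best_auc = None, -1.0
    for k in candidates:
        mu, W = fit_linear_autoencoder(X_train, k)
        auc = auroc(abnormality_score(X_val_normal, mu, W),
                    abnormality_score(X_val_anomalous, mu, W))
        if auc > best_auc:
            best_k, best_auc = k, auc
    return best_k, best_auc


# Synthetic data: normal samples lie near a 3-dimensional subspace of R^20.
rng = np.random.default_rng(0)
basis = rng.normal(size=(3, 20))
X_train = 3.0 * rng.normal(size=(500, 3)) @ basis + 0.05 * rng.normal(size=(500, 20))
X_val_normal = 3.0 * rng.normal(size=(100, 3)) @ basis + 0.05 * rng.normal(size=(100, 20))
X_val_anomalous = rng.normal(size=(5, 20))  # only a handful of anomalies
k, auc = select_latent_size(X_train, X_val_normal, X_val_anomalous, [1, 2, 3, 5])
print(k, auc)
```

The anomalies are never used to fit the model itself. They only rank candidate configurations, which mirrors the abstract's point that a small labelled set is enough to steer the hyperparameter search.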

We study the finite sample behavior of Lasso-based inference methods such as post double Lasso and debiased Lasso. We show that these methods can exhibit substantial omitted variable biases (OVBs) due to Lasso not selecting relevant controls. This phenomenon can occur even when the coefficients are sparse and the sample size is large, exceeding the number of controls. Therefore, relying on the existing asymptotic inference theory can be problematic in empirical applications. We compare the Lasso-based inference methods to modern high-dimensional OLS-based methods and provide practical guidance.
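The mechanism behind these biases is the textbook omitted-variable formula: if $y = \alpha d + \beta x + \varepsilon$ and the control $x$ is dropped, the OLS coefficient on $d$ converges to $\alpha + \beta\,\mathrm{Cov}(d, x)/\mathrm{Var}(d)$. The Python simulation below, with hypothetical coefficients, shows this for a control with a modest coefficient, the kind a penalized selection step might leave out. It illustrates the bias only and does not implement post double Lasso or debiased Lasso.

```python
import numpy as np


def ols_slopes(y, X):
    """OLS with an intercept; returns the slope coefficients only."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[1:]


rng = np.random.default_rng(1)
n = 100_000
alpha, beta, gamma = 1.0, 0.3, 0.8          # hypothetical structural coefficients
d = rng.normal(size=n)                      # variable of interest
x = gamma * d + rng.normal(size=n)          # relevant control correlated with d
y = alpha * d + beta * x + rng.normal(size=n)

with_control = ols_slopes(y, np.column_stack([d, x]))[0]
without_control = ols_slopes(y, d)[0]       # control omitted, e.g. not selected
predicted = alpha + beta * gamma            # Cov(d, x) / Var(d) = gamma here
print(with_control, without_control, predicted)
```

A large sample does not remove the bias: the estimate without the control settles near $\alpha + \beta\gamma$, not $\alpha$, however large $n$ becomes.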

In this paper, we propose a propensity score adapted variable selection procedure to select covariates for inclusion in propensity score models, in order to eliminate confounding bias and improve statistical efficiency in observational studies. Our variable selection approach is specially designed for causal inference: it only requires the propensity scores to be $\sqrt{n}$-consistently estimated through a parametric model and does not require correct specification of the potential outcome models. By using estimated propensity scores as inverse probability of treatment weights when fitting an adaptive lasso on the outcome, it excludes instrumental variables while including confounders and outcome predictors. We show its oracle properties under the "linear association" conditions. We also perform numerical simulations to illustrate our propensity score adapted covariate selection procedure and evaluate its performance under model misspecification. We further compare it with other covariate selection methods on artificial data and find that it is better at excluding instrumental variables and spurious covariates.
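The following NumPy sketch shows the weighting-plus-penalty idea under stated assumptions. The propensity scores are taken as already estimated (the toy example plugs in the true ones). The adaptive penalty weights are $1/|\hat\beta_j|^{\gamma}$ from an initial weighted least-squares fit. The treatment coefficient is left unpenalized, and $\lambda$ is fixed rather than tuned. Coordinate descent minimizes $\frac{1}{2n}\sum_i w_i (y_i - x_i^\top\beta)^2 + \lambda \sum_j p_j |\beta_j|$ with $w_i = T_i/e_i + (1-T_i)/(1-e_i)$. Names and settings are hypothetical, and this is not presented as the paper's exact estimator.

```python
import numpy as np


def ipt_weights(t, e):
    """Inverse probability of treatment weights from propensity scores e."""
    return t / e + (1 - t) / (1 - e)


def weighted_adaptive_lasso(X, y, w, lam, unpenalized=(), gamma=1.0,
                            tol=1e-8, max_iter=1000):
    """Coordinate descent for
    (1/2n) * sum_i w_i (y_i - x_i' b)^2 + lam * sum_j pen_j |b_j|,
    with adaptive penalties pen_j = 1 / |b_init_j|^gamma."""
    n, p = X.shape
    # Weighted centering takes the place of an explicit intercept.
    Xc = X - np.average(X, axis=0, weights=w)
    yc = y - np.average(y, weights=w)
    # Initial weighted least-squares fit supplies the adaptive penalty weights.
    sw = np.sqrt(w)
    b_init, *_ = np.linalg.lstsq(Xc * sw[:, None], yc * sw, rcond=None)
    pen = 1.0 / np.abs(b_init) ** gamma
    pen[np.asarray(unpenalized, dtype=int)] = 0.0
    z = np.mean(w[:, None] * Xc**2, axis=0)
    beta = np.zeros(p)
    r = yc.copy()  # residual y - X beta (beta starts at zero)
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            rho = np.mean(w * Xc[:, j] * (r + Xc[:, j] * beta[j]))
            new = np.sign(rho) * max(abs(rho) - lam * pen[j], 0.0) / z[j]  # soft-threshold
            r += Xc[:, j] * (beta[j] - new)
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 5))
e = 1 / (1 + np.exp(-0.8 * X[:, 0]))   # propensity depends on the confounder X0
t = rng.binomial(1, e).astype(float)
y = 2.0 * t + 1.5 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)
beta = weighted_adaptive_lasso(np.column_stack([t, X]), y, ipt_weights(t, e),
                               lam=0.05, unpenalized=(0,))
print(np.round(beta, 3))  # [treatment, X0, X1, X2, X3, X4]
```

In this toy setting the confounder `X0` and the outcome predictor `X1` are kept while the pure-noise covariates are set exactly to zero. Showing that instrumental variables are excluded would require a richer simulation than this sketch.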

Plausibility is a formalization of exact tests for parametric models and generalizes procedures such as Fisher's exact test. The resulting tests are based on cumulative probabilities of the probability density function and evaluate consistency with a parametric family while providing exact control of the $\alpha$ level for finite sample sizes. Model comparisons are inefficient in this approach. We generalize plausibility by incorporating weighting, which allows model comparisons to be performed. We show that one weighting scheme is asymptotically equivalent to the likelihood ratio test (LRT) and, unlike the LRT, has finite-sample guarantees for the test size under the null hypothesis. We confirm the theoretical properties in simulations that mimic the data set of our application. We apply the method to a retinoblastoma data set and demonstrate a parent-of-origin effect. Weighted plausibility also has applications in high-dimensional data analysis, where P-values for penalized regression models can be derived. We demonstrate superior performance compared with a data-splitting procedure in a simulation study. We apply weighted plausibility to a high-dimensional case-control gene expression data set for prostate cancer. We discuss the flexibility of the approach by relating weighted plausibility to targeted learning, the bootstrap, and sparsity selection.
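Here is a minimal Python sketch of the idea for a binomial model. The exact p-value of an observed count is the null probability of all outcomes at least as extreme. "Extreme" means either less probable under the null (the ordering behind Fisher's exact test) or, in the weighted version, having a larger weight. Using the log-likelihood ratio as the weight is an assumption about what an LRT-like weighting could look like, not necessarily the paper's scheme, and the function names are hypothetical.

```python
from math import comb, log

TIE_TOL = 1e-12  # treat near-equal values as ties (floating-point guard)


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def plausibility_pvalue(k_obs, n, p0, weight=None):
    """Exact p-value: null probability of all outcomes at least as extreme as k_obs.

    weight=None ranks outcomes by null probability (less probable = more extreme);
    otherwise outcomes with weight(k) >= weight(k_obs) count as at least as extreme.
    """
    f0 = [binom_pmf(k, n, p0) for k in range(n + 1)]
    if weight is None:
        return sum(f for f in f0 if f <= f0[k_obs] * (1 + TIE_TOL))
    w = [weight(k) for k in range(n + 1)]
    return sum(f for f, wk in zip(f0, w) if wk >= w[k_obs] - TIE_TOL)


def lr_weight(n, p0):
    """Weight = log likelihood ratio of the MLE p_hat = k/n against p0."""
    def term(count, p_hat, p_null):
        return 0.0 if count == 0 else count * log(p_hat / p_null)

    def weight(k):
        p_hat = k / n
        return term(k, p_hat, p0) + term(n - k, 1 - p_hat, 1 - p0)

    return weight


n, p0, k_obs = 12, 0.3, 7
print(plausibility_pvalue(k_obs, n, p0),
      plausibility_pvalue(k_obs, n, p0, lr_weight(n, p0)))
```

Because each p-value is a tail probability of the null distribution under a fixed ordering of outcomes, the finite-sample size guarantee holds for any choice of weight. The weight only decides which alternatives the test is most sensitive to.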

Ensuring fairness in machine learning algorithms is a challenging and important task. We consider the problem of clustering a set of points while ensuring fairness constraints. While there have been several attempts to capture group fairness in the k-clustering problem, fairness at an individual level is not well studied. We introduce a new notion of individual fairness in k-clustering based on features that are not necessarily used for clustering. We show that this problem is NP-hard and does not admit a constant-factor approximation. We then design a randomized algorithm that guarantees approximation both for the clustering distance objective and for individual fairness, under natural restrictions on the distance metric and fairness constraints. Finally, our experimental results validate that our algorithm achieves lower clustering costs than existing algorithms while remaining competitive in individual fairness.

This paper addresses the task of estimating a covariance matrix under a patternless sparsity assumption. In contrast to existing approaches based on thresholding or shrinkage penalties, we propose a likelihood-based method that regularizes the distance from the covariance estimate to a symmetric sparsity set. This formulation avoids unwanted shrinkage induced by more common norm penalties and enables optimization of the resulting non-convex objective by solving a sequence of smooth, unconstrained subproblems. These subproblems are generated and solved via the proximal distance version of the majorization-minimization principle. The resulting algorithm executes rapidly, gracefully handles settings where the number of parameters exceeds the number of cases, yields a positive definite solution, and enjoys desirable convergence properties. Empirically, we demonstrate that our approach outperforms competing methods by several metrics across a suite of simulated experiments. Its merits are illustrated on an international migration dataset and a classic case study on flow cytometry. Our findings suggest that the marginal and conditional dependency networks for the cell signalling data are more similar than previously concluded.
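In a proximal distance method, the penalty $\frac{\rho}{2}\,\operatorname{dist}(\Sigma, S_k)^2$ is majorized at iteration $m$ by $\frac{\rho}{2}\lVert \Sigma - P_{S_k}(\Sigma_m)\rVert_F^2$, which is smooth in $\Sigma$ and touches the penalty at $\Sigma = \Sigma_m$. The Python sketch below computes the projection $P_{S_k}$, the squared distance and that surrogate. It assumes $S_k$ is the set of symmetric matrices with at most $k$ nonzero off-diagonal pairs and an unrestricted diagonal. The paper's exact sparsity set and the rest of the algorithm (likelihood term, schedule for $\rho$) are not reproduced, and the names are hypothetical.

```python
import numpy as np


def project_sparse_symmetric(A, k):
    """Frobenius-norm projection of a symmetric matrix onto the set of symmetric
    matrices with at most k nonzero off-diagonal pairs (diagonal unrestricted).
    Keeping the k largest |a_ij| pairs is optimal because zeroing a pair costs 2*a_ij^2."""
    A = np.asarray(A, dtype=float)
    iu = np.triu_indices(A.shape[0], k=1)
    vals = A[iu]
    keep = np.argsort(-np.abs(vals), kind="stable")[:k]
    P = np.diag(np.diag(A))
    P[iu[0][keep], iu[1][keep]] = vals[keep]
    P[iu[1][keep], iu[0][keep]] = vals[keep]
    return P


def dist_sq(A, k):
    """Squared Frobenius distance from A to the sparsity set."""
    return float(np.sum((A - project_sparse_symmetric(A, k)) ** 2))


def majorizer(Sigma, Sigma_m, k, rho):
    """Surrogate (rho/2)*||Sigma - P(Sigma_m)||_F^2: it lies above
    (rho/2)*dist(Sigma)^2 everywhere and equals it at Sigma = Sigma_m."""
    return 0.5 * rho * float(np.sum((Sigma - project_sparse_symmetric(Sigma_m, k)) ** 2))


A = np.array([[2.0, 0.9, 0.1], [0.9, 3.0, -0.5], [0.1, -0.5, 1.0]])
print(project_sparse_symmetric(A, 1), dist_sq(A, 1))
```

Because the surrogate is a plain squared Frobenius distance to a fixed matrix, each MM subproblem is smooth and unconstrained, which is the property the abstract relies on.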

Algorithmic fairness has recently attracted considerable interest in the data mining and machine learning communities. So far, existing research has focused mostly on developing quantitative metrics to measure algorithmic disparities across protected groups, and on approaches for adjusting algorithm outputs to reduce such disparities. In this paper, we study the problem of identifying the sources of model disparities. Unlike existing interpretation methods, which typically learn feature importance, we consider the causal relationships among feature variables and propose a novel framework that decomposes the disparity into the sum of contributions from fairness-aware causal paths, i.e., paths on the causal graph linking the sensitive attribute to the final predictions. We also consider the scenario in which the directions of certain edges within those paths cannot be determined. Our framework is model-agnostic and applicable to a variety of quantitative disparity measures. Empirical evaluations on both synthetic and real-world data sets show that our method can provide precise and comprehensive explanations of the model disparities.
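In the special case of a linear structural equation model, the effect of the sensitive attribute on the prediction decomposes exactly into a sum over directed paths. Each path contributes the product of its edge coefficients (Wright's path-tracing rule). The Python sketch below illustrates this additive idea on a hypothetical graph with made-up coefficients. It is not the paper's model-agnostic framework, and it assumes a linear SEM with zero-mean noise, no intercepts and known edge directions.

```python
def directed_paths(graph, src, dst, prefix=()):
    """Yield every directed path from src to dst in a DAG given as adjacency dicts."""
    prefix = prefix + (src,)
    if src == dst:
        yield prefix
        return
    for nxt in graph.get(src, {}):
        yield from directed_paths(graph, nxt, dst, prefix)


def path_contributions(graph, src, dst):
    """Contribution of each path = product of the edge coefficients along it."""
    contrib = {}
    for path in directed_paths(graph, src, dst):
        c = 1.0
        for a, b in zip(path, path[1:]):
            c *= graph[a][b]
        contrib[" -> ".join(path)] = c
    return contrib


def mean_under_do(graph, order, src, value):
    """Node means after setting src = value in the linear SEM;
    `order` must be a topological order of the nodes."""
    means = {}
    for v in order:
        if v == src:
            means[v] = value
        else:
            means[v] = sum(w[v] * means[u] for u, w in graph.items() if v in w)
    return means


# Hypothetical DAG: sensitive attribute A, mediators Z and X, prediction Yhat.
graph = {
    "A": {"X": 0.8, "Z": 0.3, "Yhat": 0.5},
    "Z": {"X": 0.5},
    "X": {"Yhat": 1.2},
}
order = ["A", "Z", "X", "Yhat"]
contrib = path_contributions(graph, "A", "Yhat")
disparity = (mean_under_do(graph, order, "A", 1.0)["Yhat"]
             - mean_under_do(graph, order, "A", 0.0)["Yhat"])
print(contrib, disparity)
```

The mean difference in predictions between the two groups equals the sum of the three path contributions. Beyond the linear case this exact additivity no longer holds automatically, which is why a dedicated decomposition framework is needed.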

The aim of this paper is to offer the first systematic exploration and definition of equivalent causal models in the setting where the two models are not made up of the same variables. The idea is that two models are equivalent when they agree on all "essential" causal information that can be expressed using their common variables. I do so by focussing on the two main features of causal models, namely their structural relations and their functional relations. In particular, I define several relations of causal ancestry and several relations of causal sufficiency, and require that the most general of these relations be preserved across equivalent models.

We address the problem of anomaly detection, that is, detecting anomalous events in a video sequence. Anomaly detection methods based on convolutional neural networks (CNNs) typically leverage proxy tasks, such as reconstructing input video frames, to learn models describing normality without seeing anomalous samples at training time, and quantify the extent of abnormality using the reconstruction error at test time. The main drawbacks of these approaches are that they do not explicitly consider the diversity of normal patterns, and that the powerful representation capacity of CNNs allows them to reconstruct abnormal video frames as well. To address this problem, we present an unsupervised learning approach to anomaly detection that considers the diversity of normal patterns explicitly while lessening the representation capacity of CNNs. To this end, we propose to use a memory module with a new update scheme, in which items in the memory record prototypical patterns of normal data. We also present novel feature compactness and separateness losses to train the memory, boosting the discriminative power of both memory items and deeply learned features from normal data. Experimental results on standard benchmarks demonstrate the effectiveness and efficiency of our approach, which outperforms the state of the art.
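Here is a NumPy sketch of two of the ingredients named above, under simplifying assumptions. Memory items are read with softmax attention over dot-product similarities. The compactness loss pulls each query toward its nearest memory item. The separateness loss is a margin (triplet-style) hinge that keeps the nearest item closer than the second nearest. This illustrates the general technique only: the paper's exact similarity measure, memory update scheme and loss weighting are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def read_memory(queries, memory):
    """Each query (row) reads a convex combination of memory items,
    weighted by softmax attention over dot-product similarities."""
    weights = softmax(queries @ memory.T, axis=1)  # (num_queries, num_items)
    return weights @ memory


def compactness_separateness(queries, memory, margin=1.0):
    """Compactness: mean squared distance from each query to its nearest item.
    Separateness: mean hinge on (nearest - second nearest + margin)."""
    d2 = ((queries[:, None, :] - memory[None, :, :]) ** 2).sum(axis=2)
    nearest_two = np.sort(d2, axis=1)[:, :2]
    compact = nearest_two[:, 0].mean()
    separate = np.maximum(nearest_two[:, 0] - nearest_two[:, 1] + margin, 0.0).mean()
    return float(compact), float(separate)


memory = np.array([[1.0, 0.0], [0.0, 1.0]])   # two hypothetical prototype items
queries = np.array([[1.0, 0.0], [0.5, 0.5]])  # one on a prototype, one in between
print(read_memory(queries, memory), compactness_separateness(queries, memory))
```

A query that sits on a prototype incurs no loss, while a query halfway between two prototypes is penalized by both terms. Minimizing the two losses therefore pushes normal features toward distinct prototypes, so that an abnormal frame, which lies far from every item, is harder to reconstruct from memory.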
