
Inference of population structure from genetic data plays an important role in population and medical genetics studies. The traditional EIGENSTRAT method has been widely used for computing and selecting the top principal components that capture population structure information (Price et al., 2006). With the advancement and decreasing cost of sequencing technology, whole-genome sequencing data provide much richer information about the underlying population structure. However, the EIGENSTRAT method was originally developed for analyzing array-based genotype data and thus may not perform well on sequencing data, for two reasons. First, the number of genetic variants $p$ is much larger than the sample size $n$ in sequencing data, so the sample-to-marker ratio $n/p$ is nearly zero, violating the assumption of the Tracy-Widom test used in the EIGENSTRAT method. Second, the EIGENSTRAT method may not handle the linkage disequilibrium (LD) in sequencing data well. To resolve these two critical issues, we propose a new statistical method called ERStruct to estimate the number of latent sub-populations from sequencing data. We use the ratio of successive eigenvalues as a more robust test statistic, and we approximate the null distribution of this statistic using modern random matrix theory. Simulation studies show that our proposed ERStruct method outperforms the traditional Tracy-Widom test on sequencing data. We further use two public data sets, from HapMap 3 and the 1000 Genomes Project, to demonstrate the performance of our ERStruct method. We also implement ERStruct as a MATLAB toolbox, publicly available on GitHub at //github.com/bglvly/ERStruct.
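As a toy illustration of the eigenvalue-ratio idea (not the exact ERStruct procedure, whose null distribution comes from random matrix theory), the sketch below computes successive-eigenvalue ratios for a standardized genotype-like matrix. With two planted sub-populations, one leading eigenvalue separates from the bulk and the first ratio drops well below the rest. All sizes and parameter values are illustrative.

```python
import numpy as np

def eigenvalue_ratios(genotypes):
    """Ratios of successive eigenvalues of the sample covariance.

    genotypes: (n samples, p markers) array, standardized per marker.
    Uses the n x n Gram matrix, whose nonzero spectrum matches the p x p one.
    """
    n, p = genotypes.shape
    gram = genotypes @ genotypes.T / p
    eigvals = np.linalg.eigvalsh(gram)[::-1]      # descending order
    eigvals = eigvals[eigvals > 1e-12]            # drop numerically-zero ones
    return eigvals[1:] / eigvals[:-1]

# Two planted sub-populations shift the marker means, so one leading
# eigenvalue separates from the Marchenko-Pastur bulk and the first
# successive ratio is markedly small.
rng = np.random.default_rng(0)
n, p = 100, 2000
labels = rng.integers(0, 2, size=n)
shift = np.where(labels[:, None] == 1, 0.5, -0.5)
X = rng.standard_normal((n, p)) + shift
X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize per marker
ratios = eigenvalue_ratios(X)
```

In this regime ($n/p = 0.05$) the first ratio is far below the near-one ratios of the noise bulk, which is what a ratio-based test exploits.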

Related content


One of the major issues in computational mechanics is taking geometrical complexity into account. To overcome this difficulty and avoid expensive mesh generation, geometrically unfitted methods, i.e. numerical methods that use simple computational meshes which do not fit the boundary of the domain and/or its internal interfaces, have been widely developed. In the present work, we investigate the performance of an unfitted method called $\phi$-FEM, which converges optimally and uses classical finite element spaces, so that it can be easily implemented with general FEM libraries. The main idea is to account for the geometry through a level set function describing the boundary or the interface. Up to now, the $\phi$-FEM approach has been proposed, tested and substantiated mathematically only in the simplest settings: the Poisson equation with Dirichlet/Neumann/Robin boundary conditions. Our goal here is to demonstrate its applicability to more sophisticated governing equations arising in computational mechanics. We consider the linear elasticity equations with either pure Dirichlet boundary conditions or mixed ones (Dirichlet and Neumann boundary conditions co-existing on parts of the boundary), an interface problem (linear elasticity with material coefficients changing abruptly across an internal interface), a model of elastic structures with cracks, and finally the heat equation. In all these settings, we derive an appropriate variant of $\phi$-FEM and then illustrate it by numerical tests on manufactured solutions. We also compare the accuracy and efficiency of $\phi$-FEM with those of the standard fitted FEM on meshes of similar size, revealing the substantial gains that $\phi$-FEM can achieve in both accuracy and computational time.
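The level-set idea that drives $\phi$-FEM can be illustrated without any FEM machinery: the domain is described implicitly as $\{\phi < 0\}$, and the computational mesh is a simple Cartesian grid that does not fit the boundary. The hedged sketch below (plain Python, illustrative values only) performs the first step of any unfitted method: selecting the grid cells that intersect a disk-shaped domain.

```python
import numpy as np

def phi(x, y):
    """Level set of a disk of radius 0.4 centered at (0.5, 0.5); domain = {phi < 0}."""
    return (x - 0.5) ** 2 + (y - 0.5) ** 2 - 0.4 ** 2

N = 20                                # cells per side on the unit square
xs = np.linspace(0.0, 1.0, N + 1)     # uniform Cartesian grid, unfitted to the disk
active = []                           # cells that intersect the physical domain
for i in range(N):
    for j in range(N):
        corners = [(xs[i], xs[j]), (xs[i + 1], xs[j]),
                   (xs[i], xs[j + 1]), (xs[i + 1], xs[j + 1])]
        if any(phi(x, y) < 0 for x, y in corners):
            active.append((i, j))
frac = len(active) / N ** 2           # fraction of cells kept for assembly
```

The retained cells cover the disk plus a thin band around its boundary; the actual $\phi$-FEM formulation then imposes the boundary condition weakly through $\phi$ on this unfitted mesh.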

Spatial transcriptomics is a modern sequencing technology that measures the activity of thousands of genes in a tissue sample and maps where that activity occurs. It has enabled the study of so-called spatially expressed genes, i.e., genes that exhibit spatial variation across the tissue. Understanding their functions and their interactions in different areas of the tissue is of great scientific interest, as it may lead to a deeper understanding of several key biological mechanisms. However, adequate statistical tools that exploit the new spatial mapping information to reach more specific conclusions are still lacking. In this work, we introduce SpaRTaCo, a new statistical model that clusters the spatial expression profiles of the genes according to the areas of the tissue. This is accomplished by co-clustering, i.e., inferring the latent block structure of the data and inducing two types of clustering: of the genes, using their expression across the tissue, and of the image areas, using the gene expression in the spots where the RNA is collected. Our proposed methodology is validated with a series of simulation experiments, and its usefulness in answering specific biological questions is illustrated with an application to a human brain tissue sample processed with the 10X-Visium protocol.
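A minimal sketch of the block idea behind co-clustering (not the SpaRTaCo model itself, which is model-based): when the data carry a latent block structure, clustering the rows (genes) and the columns (tissue spots) recovers it. The toy k-means routine and all sizes below are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k=2, iters=30):
    """Minimal Lloyd's k-means with greedy farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(d.argmax())])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Toy expression matrix with a planted block structure: two gene clusters
# (rows) by two tissue-area clusters (columns), plus Gaussian noise.
rng = np.random.default_rng(1)
block_means = np.array([[0.0, 3.0], [3.0, 0.0]])
row_truth = np.repeat([0, 1], 30)     # 60 genes
col_truth = np.repeat([0, 1], 20)     # 40 spots
M = block_means[np.ix_(row_truth, col_truth)] + 0.5 * rng.standard_normal((60, 40))
gene_labels = kmeans(M)               # cluster genes by expression across spots
spot_labels = kmeans(M.T)             # cluster spots by expression across genes
```

The two independent clusterings jointly induce the gene-by-area blocks; SpaRTaCo infers an analogous block structure within a single probabilistic model rather than by two separate k-means runs.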

We present a computational approach to the solution of the Kiefer-Weiss problem. We propose algorithms for constructing optimal sampling plans and evaluating their performance. For the particular case of Bernoulli observations, the proposed algorithms are implemented as R program code. Using the developed computer program, we numerically compare the optimal tests with the respective sequential probability ratio test (SPRT) and the fixed-sample-size test over a wide range of hypothesized values and type I and type II errors. The results are compared with those of D.~Freeman and L.~Weiss (Journal of the American Statistical Association, 59 (1964)). The R source code for the algorithms for constructing optimal sampling plans and evaluating their characteristics is available at //github.com/tosinabase/Kiefer-Weiss.
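For context, the SPRT that the optimal Kiefer-Weiss plans are compared against is easy to state for Bernoulli data. The sketch below implements Wald's SPRT with the standard approximate stopping boundaries; it is a baseline illustration in Python, not the authors' R code, and all parameter values are illustrative.

```python
import math

def sprt_bernoulli(observations, p0, p1, alpha, beta):
    """Wald's SPRT for H0: p = p0 vs H1: p = p1 on a Bernoulli stream.

    alpha / beta are the target type I / type II error rates. Returns the
    decision and the number of observations consumed.
    """
    lower = math.log(beta / (1 - alpha))      # accept H0 at or below this
    upper = math.log((1 - beta) / alpha)      # accept H1 at or above this
    llr = 0.0                                 # running log-likelihood ratio
    for t, x in enumerate(observations, start=1):
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", t
        if llr <= lower:
            return "accept H0", t
    return "continue", len(observations)

# With well-separated hypotheses, a run of like outcomes stops quickly.
decision_h1 = sprt_bernoulli([1] * 10, p0=0.3, p1=0.7, alpha=0.05, beta=0.05)
decision_h0 = sprt_bernoulli([0] * 10, p0=0.3, p1=0.7, alpha=0.05, beta=0.05)
```

The SPRT minimizes the expected sample size under $p_0$ and $p_1$; the Kiefer-Weiss problem instead minimizes the maximum expected sample size over all $p$, which is what the optimal plans in the paper target.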

The problem of finding an ancestral acyclic directed mixed graph (ADMG) that represents the causal relationships between a set of variables is an important area of research in causal inference. Most existing score-based structure learning methods focus on learning directed acyclic graph (DAG) models without latent variables. A number of score-based methods have recently been proposed for ADMG learning, yet they are heuristic in nature and do not guarantee an optimal solution. We propose a novel exact score-based method that solves an integer programming (IP) formulation and returns a score-maximizing ancestral ADMG for a set of continuous variables following a multivariate Gaussian distribution. We generalize the state-of-the-art IP model for DAG learning and derive new classes of valid inequalities to formulate an IP model for ADMG learning. Empirically, our model can be solved efficiently for medium-sized problems and achieves better accuracy than both state-of-the-art score-based methods and benchmark constraint-based methods.
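Exact score-based search can be illustrated at a scale where brute force replaces integer programming (and without latent variables, so over DAGs rather than ADMGs). The hedged sketch below enumerates every DAG on three Gaussian variables and returns the BIC-maximizing one; all function names and data are illustrative, not the paper's IP model.

```python
import itertools
import numpy as np

def gaussian_bic(X, parents_of):
    """BIC score of a Gaussian DAG via per-node linear regressions."""
    n, d = X.shape
    score = 0.0
    for j in range(d):
        pa = parents_of[j]
        A = np.column_stack([X[:, pa], np.ones(n)]) if pa else np.ones((n, 1))
        resid = X[:, j] - A @ np.linalg.lstsq(A, X[:, j], rcond=None)[0]
        sigma2 = (resid ** 2).mean()
        score += -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)   # log-likelihood
        score -= 0.5 * np.log(n) * (A.shape[1] + 1)            # BIC penalty
    return score

def is_acyclic(edges, d):
    """Kahn's algorithm on the edge list."""
    indeg, adj = [0] * d, {i: [] for i in range(d)}
    for a, b in edges:
        adj[a].append(b)
        indeg[b] += 1
    stack, seen = [i for i in range(d) if indeg[i] == 0], 0
    while stack:
        u = stack.pop()
        seen += 1
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    return seen == d

def best_dag(X):
    """Exhaustive exact search: score every DAG, return the best edge set."""
    d = X.shape[1]
    candidates = [(a, b) for a in range(d) for b in range(d) if a != b]
    best_score, best_edges = -np.inf, ()
    for r in range(len(candidates) + 1):
        for edges in itertools.combinations(candidates, r):
            if not is_acyclic(edges, d):
                continue
            parents = {j: [a for a, b in edges if b == j] for j in range(d)}
            s = gaussian_bic(X, parents)
            if s > best_score:
                best_score, best_edges = s, edges
    return best_edges

# Data from a chain X0 -> X1 -> X2; the search recovers its skeleton
# (BIC is score-equivalent, so any member of the chain's class may win).
rng = np.random.default_rng(2)
x0 = rng.standard_normal(500)
x1 = 2.0 * x0 + 0.3 * rng.standard_normal(500)
x2 = -1.5 * x1 + 0.3 * rng.standard_normal(500)
dag = best_dag(np.column_stack([x0, x1, x2]))
```

Enumeration explodes combinatorially with the number of variables, which is exactly why the paper encodes the search space as an IP with valid inequalities instead.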

We develop a post-selective Bayesian framework to jointly and consistently estimate parameters in group-sparse linear regression models. After selection with the Group LASSO (or generalized variants such as the overlapping, sparse, or standardized Group LASSO), uncertainty estimates for the selected parameters are unreliable unless adjustments are made for selection bias. Existing post-selective approaches are limited to uncertainty estimation for (i) real-valued projections onto very specific selected subspaces of the group-sparse problem, and (ii) selection events that can be broadly categorized as polyhedral, i.e., expressible as linear inequalities in the data variables. Our Bayesian methods address these gaps by deriving a likelihood adjustment factor, and an approximation thereof, that eliminates bias from selection. At a very nominal price for this adjustment, experiments on simulated data and on data from the Human Connectome Project demonstrate the efficacy of our methods for joint estimation of group-sparse parameters and their uncertainties post selection.
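The Group LASSO selection event that such post-selective methods must adjust for is driven by block soft-thresholding: a group enters the model only when its norm exceeds the regularization level. A minimal sketch of that proximal operator (the selection mechanism only, not the paper's likelihood adjustment; all names are illustrative):

```python
import numpy as np

def group_soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_2 (block soft-thresholding).

    The whole group shrinks toward zero and is selected (left nonzero)
    only if its Euclidean norm exceeds lam -- the mechanism behind
    group selection in proximal-gradient Group LASSO solvers.
    """
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v

# One strong group survives (shrunken); one weak group is zeroed out.
coefs = {"g1": np.array([3.0, 4.0]), "g2": np.array([0.3, -0.2, 0.1])}
selected = {g: group_soft_threshold(v, lam=2.5) for g, v in coefs.items()}
```

Because the event "group $g$ is selected" is determined by such norm comparisons on the data, naive post-selection intervals for the surviving groups are biased, which is the gap the adjustment factor addresses.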

We present a means of formulating and solving the well-known structure-and-motion problem in computer vision with probabilistic graphical models (PGMs). We model the unknown camera poses and 3D feature coordinates, as well as the observed 2D projections, as Gaussian random variables, using sigma point parameterizations to effectively linearize the nonlinear relationships between these variables. The variables involved in each projection are grouped into a cluster, and we connect the clusters in a cluster graph. Loopy belief propagation is performed over this graph in an iterative re-initialization and estimation procedure, and we find that our approach shows promise both in simulation and on real-world data. The PGM is easily extendable to include additional parameters or constraints.
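The sigma point parameterization mentioned above is the standard unscented transform: a small set of deterministic points capturing a Gaussian's mean and covariance is pushed through the nonlinearity, and the moments are re-estimated from the transformed points. A hedged numpy sketch (van der Merwe scaled sigma points; parameter values illustrative, not the paper's implementation):

```python
import numpy as np

def sigma_points(mean, cov, alpha=0.1, beta=2.0, kappa=0.0):
    """Van der Merwe scaled sigma points and weights for N(mean, cov)."""
    n = len(mean)
    lam = alpha ** 2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * cov)       # columns set the spread
    pts = [mean] + [mean + S[:, i] for i in range(n)] \
                 + [mean - S[:, i] for i in range(n)]
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha ** 2 + beta)
    return np.array(pts), wm, wc

def unscented_transform(f, mean, cov):
    """Push sigma points through f and re-estimate mean and covariance."""
    pts, wm, wc = sigma_points(mean, cov)
    fx = np.array([f(p) for p in pts])
    m = wm @ fx
    dev = fx - m
    return m, (wc[:, None] * dev).T @ dev

# Sanity check: for a linear map the transform is exact.
m_in = np.array([1.0, 2.0])
P_in = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 1.0], [0.0, 1.0]])
m_out, P_out = unscented_transform(lambda x: A @ x, m_in, P_in)
```

In the structure-and-motion setting, `f` would be the camera projection function, so each cluster's nonlinear measurement model is linearized without computing Jacobians.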

Projections of bipartite or two-mode networks capture co-occurrences, and are used in diverse fields (e.g., ecology, economics, bibliometrics, politics) to represent unipartite networks. A key challenge in analyzing such networks is determining whether an observed number of co-occurrences between two nodes is significant, and therefore whether an edge exists between them. One approach, the fixed degree sequence model (FDSM), evaluates the significance of an edge's weight by comparison to a null model in which the degree sequences of the original bipartite network are fixed. Although the FDSM is an intuitive null model, it is computationally expensive because it requires Monte Carlo simulation to estimate each edge's $p$-value, and therefore is impractical for large projections. In this paper, we explore four potential alternatives to FDSM: the fixed fill model (FFM), fixed row model (FRM), fixed column model (FCM), and stochastic degree sequence model (SDSM). We compare these models to FDSM in terms of accuracy, speed, statistical power, similarity, and ability to recover known communities. We find that the computationally fast SDSM offers a statistically conservative but close approximation of the computationally impractical FDSM under a wide range of conditions, and that it correctly recovers a known community structure even when the signal is weak. Therefore, although each backbone model may have particular applications, we recommend SDSM for extracting the backbone of bipartite projections when FDSM is impractical.
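A hedged sketch of the Monte Carlo logic shared by these null models, shown for the simplest one (the FFM, which preserves only the total number of 1s): permute the bipartite matrix entries, recompute the projection, and estimate upper-tail p-values for each co-occurrence count. The matrices and sample counts below are toy values, not the paper's benchmarks.

```python
import numpy as np

def ffm_pvalues(B, n_sim=500, seed=0):
    """Monte Carlo upper-tail p-values for projection edge weights under the
    fixed fill model (FFM): the null preserves only the total number of 1s."""
    rng = np.random.default_rng(seed)
    obs = B @ B.T                          # observed co-occurrence counts
    flat = B.flatten()
    exceed = np.zeros(obs.shape)
    for _ in range(n_sim):
        perm = rng.permutation(flat).reshape(B.shape)
        exceed += (perm @ perm.T) >= obs   # null weight at least as large?
    return (exceed + 1) / (n_sim + 1)      # add-one Monte Carlo estimate

# Rows 0 and 1 co-occur in 15 columns (a real backbone edge);
# rows 2 and 3 never co-occur (no edge).
B = np.zeros((4, 30), dtype=int)
B[0, :15] = 1
B[1, :15] = 1
B[2, 15:21] = 1
B[3, 21:27] = 1
pvals = ffm_pvalues(B)
```

FDSM replaces the entry permutation with samples that preserve both row and column degree sequences, which is what makes it so much more expensive; SDSM avoids the sampling loop altogether by using each cell's approximate inclusion probability.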

Influence maximization is the task of selecting a small number of seed nodes in a social network to maximize the spread of influence from these seeds, and it has been widely investigated over the past two decades. In the canonical setting, the whole social network as well as its diffusion parameters is given as input. In this paper, we consider the more realistic sampling setting where the network is unknown and we only have a set of passively observed cascades that record the set of activated nodes at each diffusion step. We study the task of influence maximization from these cascade samples (IMS), and present constant approximation algorithms for this task under mild conditions on the seed set distribution. To achieve the optimization goal, we also provide a novel solution to the network inference problem, that is, learning diffusion parameters and the network structure from the cascade data. Compared with prior solutions, our network inference algorithm requires weaker assumptions and does not rely on maximum-likelihood estimation or convex programming. Our IMS algorithms enhance the learn-then-optimize approach by allowing a constant approximation ratio even when the diffusion parameters are hard to learn, and we do not need any assumption related to the network structure or diffusion parameters.
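For reference, the canonical setting (known network and known diffusion parameters) is usually attacked with greedy seed selection under the independent cascade model. The sketch below is that baseline, not the paper's IMS algorithm that works from cascade samples alone; the graph and all parameters are illustrative.

```python
import random

def simulate_ic(graph, seeds, rng):
    """One run of the independent cascade (IC) model; returns spread size.

    graph: {u: [(v, p), ...]} with per-edge activation probability p."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def greedy_im(graph, k, n_sim=200, seed=0):
    """Greedy seed selection using Monte Carlo estimates of expected spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for es in graph.values() for v, _ in es}
    seeds = []
    for _ in range(k):
        best = max((u for u in nodes if u not in seeds),
                   key=lambda u: sum(simulate_ic(graph, seeds + [u], rng)
                                     for _ in range(n_sim)))
        seeds.append(best)
    return seeds

# A hub (node 0) reaching five neighbors with probability 0.9 dominates.
toy_graph = {0: [(i, 0.9) for i in range(1, 6)], 6: [(7, 0.1)]}
chosen = greedy_im(toy_graph, k=1)
```

The paper's setting removes the first input entirely: neither `toy_graph`'s edges nor its probabilities are observed, and both must be inferred from cascade samples before (or while) seeds are chosen.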

Traditional 3D face models learn a latent representation of faces using linear subspaces built from no more than 300 training scans of a single database. The main roadblock to building a large-scale face model from diverse 3D databases lies in the lack of dense correspondence among raw scans. To address these problems, this paper proposes an innovative framework to jointly learn a nonlinear face model from a diverse set of raw 3D scan databases and establish dense point-to-point correspondence among their scans. Specifically, by treating input raw scans as unorganized point clouds, we explore the use of PointNet architectures to convert point clouds into identity and expression feature representations, from which decoder networks recover the 3D face shapes. Further, we propose a weakly supervised learning approach that does not require correspondence labels for the scans. We demonstrate the superior dense correspondence and representation power of our proposed method in shape and expression, and its contribution to single-image 3D face reconstruction.
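The key PointNet property exploited here, invariance to the ordering of an unorganized point cloud, comes from a shared per-point map followed by a symmetric pooling. A minimal numpy sketch with random untrained weights (illustrative sizes; not the paper's trained identity/expression encoder):

```python
import numpy as np

def pointnet_encode(points, W1, W2):
    """PointNet-style global feature: shared per-point MLP + max-pool.

    The max-pool is a symmetric function, so the encoding is invariant
    to the ordering of the points in the (unorganized) cloud.
    """
    h = np.maximum(points @ W1, 0.0)   # shared layer 1, ReLU
    h = np.maximum(h @ W2, 0.0)        # shared layer 2, ReLU
    return h.max(axis=0)               # order-invariant pooling

rng = np.random.default_rng(3)
W1 = rng.standard_normal((3, 32))      # untrained toy weights
W2 = rng.standard_normal((32, 64))
cloud = rng.standard_normal((500, 3))  # a raw, unorganized scan
feature = pointnet_encode(cloud, W1, W2)
shuffled = cloud[rng.permutation(len(cloud))]
```

Because the global feature does not depend on point order, raw scans with different vertex counts and orderings can be mapped into a common latent space, which is what makes learning across heterogeneous databases feasible.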

We consider the task of learning the parameters of a {\em single} component of a mixture model when we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than that of solving the original problem, where one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these, we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and improved computational complexity compared to existing moment-based mixture model algorithms (e.g., tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvements in runtime and accuracy.
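A toy illustration of why side information helps in the search problem (a windowed-mean heuristic, not the paper's matrix-based algorithm; all values are illustrative): a few samples known to come from the target component localize it, so its mean can be estimated without fitting the other components at all.

```python
import numpy as np

# Mixture of two 1-D Gaussians; the goal is the mean of the second
# ("target") component only, not the parameters of the full mixture.
rng = np.random.default_rng(4)
data = np.concatenate([rng.normal(-2.0, 1.0, 1000),   # nuisance component
                       rng.normal(3.0, 1.0, 1000)])   # target component
side = rng.normal(3.0, 1.0, 10)    # side information: 10 target samples

mu0 = side.mean()                  # crude estimate from side info alone
window = np.abs(data - mu0) < 2.0  # keep only data near the target
mu_hat = data[window].mean()       # refined estimate of the target mean
```

The naive overall mean of `data` is badly biased by the nuisance component, while the side-information-guided estimate lands near the target mean; the paper formalizes and generalizes this intuition to the four scenarios above.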
