
Precision medicine is a clinical approach to disease prevention, detection and treatment that considers each individual's genetic background, environment and lifestyle. The development of this tailored avenue has been driven by the increased availability of omics methods, large cohorts of temporal samples, and their integration with clinical data. Despite this immense progress, existing computational methods for data analysis fail to provide appropriate solutions for these complex, high-dimensional and longitudinal data. In this work we develop a new method termed TCAM, a dimensionality reduction technique for multi-way data that overcomes major limitations of trajectory analysis of longitudinal omics data. Using real-world data, we show that TCAM outperforms traditional methods, as well as state-of-the-art tensor-based approaches for longitudinal microbiome data analysis. Moreover, we demonstrate the versatility of TCAM by applying it to several different omics datasets, and its applicability as a drop-in replacement within standard machine-learning (ML) tasks.
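The TCAM implementation itself is not shown here; as a point of reference, the following minimal sketch illustrates the conventional two-way workaround that the abstract argues against: flattening a subjects x time x features longitudinal tensor and applying plain PCA, which discards the temporal mode structure. All names, shapes and data are illustrative only.

```python
# Conventional baseline for longitudinal omics trajectory analysis (not TCAM):
# unfold the 3-way tensor so each subject becomes one long vector, then PCA.
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_timepoints, n_features = 40, 5, 200
X = rng.normal(size=(n_subjects, n_timepoints, n_features))  # synthetic omics tensor

# Mode-1 unfolding: each subject becomes one (time * features) vector.
X_flat = X.reshape(n_subjects, n_timepoints * n_features)
X_flat -= X_flat.mean(axis=0)            # center before PCA

# PCA via SVD; rows of `scores` are per-subject trajectory embeddings.
U, s, Vt = np.linalg.svd(X_flat, full_matrices=False)
scores = U[:, :2] * s[:2]
print(scores.shape)                      # (40, 2)
```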

Related content

Dimensionality reduction is the transformation of data from a high-dimensional space into a low-dimensional space such that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Dimensionality reduction is common in fields that deal with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics.
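As a concrete minimal example of the idea described above, PCA projects high-dimensional observations onto a low-dimensional subspace that preserves as much variance as possible. Data here are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))        # 500 observations in 50 dimensions
X_low = PCA(n_components=3).fit_transform(X)
print(X_low.shape)                    # (500, 3)
```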

Normalizing Flows (NFs) are universal density estimators based on Neural Networks. However, this universality is limited: the density's support needs to be diffeomorphic to a Euclidean space. In this paper, we propose a novel method to overcome this limitation without sacrificing universality. The proposed method inflates the data manifold by adding noise in the normal space, trains an NF on this inflated manifold, and, finally, deflates the learned density. Our main result provides sufficient conditions on the manifold and the specific choice of noise under which the corresponding estimator is exact. Our method has the same computational complexity as NFs and does not require computing an inverse flow. We also show that, if the embedding dimension is much larger than the manifold dimension, noise in the normal space can be well approximated by Gaussian noise. This allows using our method for approximating arbitrary densities on unknown manifolds provided that the manifold dimension is known.
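A heavily hedged sketch of the inflation step described above: data on a d-dimensional manifold embedded in R^D are perturbed with Gaussian noise (the abstract notes full-space Gaussian noise approximates normal-space noise when D is much larger than d). The deflation constant below is a leading-order reconstruction under that assumption, not the paper's exact estimator; `flow` stands in for any density model with a fit/log_prob interface and is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, sigma = 10, 1, 0.05
t = rng.uniform(0, 2 * np.pi, size=2000)
X = np.zeros((2000, D))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)           # a circle (d=1) embedded in R^D

X_inflated = X + sigma * rng.normal(size=X.shape)  # inflate the data manifold

# flow.fit(X_inflated)                 # train any normalizing flow (hypothetical API)
# log_p_inflated = flow.log_prob(x_query)
# Deflate: divide out the Gaussian mass spread over the D - d normal directions
# (leading-order correction assumed here, not the paper's exact formula).
log_deflation = 0.5 * (D - d) * np.log(2 * np.pi * sigma**2)
# log_p_manifold = log_p_inflated + log_deflation
```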

We propose a new wavelet-based method for density estimation when the data are size-biased. More specifically, we consider a power of the density of interest, where this power exceeds 1/2. Warped wavelet bases are employed, where warping is attained by some continuous cumulative distribution function. This can be seen as a general framework in which conventional orthonormal wavelet estimation is the special case where the warping distribution is the standard uniform c.d.f. We show that both linear and nonlinear wavelet estimators are consistent, with optimal and/or near-optimal rates. Monte Carlo simulations are performed to compare four special settings which are easy to interpret in practice. An application to a real dataset on fatal traffic accidents involving alcohol illustrates the method. We observe that warped bases provide more flexible and superior estimates for both simulated and real data. Moreover, we find that estimating a power of the density (for instance, its square root) further improves the results.
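The following is a minimal sketch of the warping idea alone, assuming a Haar basis and ignoring both the size-biasing correction and the power transform from the paper: warp the data through a chosen c.d.f. G, fit a linear wavelet (Haar, i.e. histogram-like) estimator on [0, 1], then map back through the warping density. The warping distribution here is an arbitrary illustrative choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.gamma(shape=3.0, scale=1.0, size=2000)   # data with unknown density f

G = stats.gamma(a=2.0, scale=1.5)                # warping distribution (illustrative)
U = G.cdf(X)                                     # warped data, supported on [0, 1]

j = 4                                            # resolution level: 2^j Haar bins
k = np.floor(U * 2**j).astype(int).clip(0, 2**j - 1)
counts = np.bincount(k, minlength=2**j)
alpha = 2**(j / 2) * counts / len(U)             # empirical Haar scaling coefficients
h_hat = alpha * 2**(j / 2)                       # piecewise-constant estimate of h

def f_hat(x):
    """Warped estimate: f(x) = h(G(x)) * g(x), g the warping density."""
    bins = np.floor(G.cdf(x) * 2**j).astype(int).clip(0, 2**j - 1)
    return h_hat[bins] * G.pdf(x)

print(f_hat(np.array([1.0, 3.0, 6.0])))
```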

Models in which the covariance matrix has the structure of a sparse matrix plus a low-rank perturbation are ubiquitous in machine learning applications. It is often desirable for learning algorithms to take advantage of such structures, avoiding costly matrix computations that often require cubic time and quadratic storage. This is often accomplished by performing operations that maintain such structures, e.g. matrix inversion via the Sherman-Morrison-Woodbury formula. In this paper we consider the matrix square root and inverse square root operations. Given a low-rank perturbation to a matrix, we argue that a low-rank approximate correction to the (inverse) square root exists. We do so by establishing a geometric decay bound on the true correction's eigenvalues. We then proceed to frame the correction as the solution of an algebraic Riccati equation, and discuss how a low-rank solution to that equation can be computed. We analyze the approximation error incurred when approximately solving the algebraic Riccati equation, providing spectral and Frobenius norm forward and backward error bounds. Finally, we describe several applications of our algorithms, and demonstrate their utility in numerical experiments.
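A small numerical check of the decay claim above, under illustrative choices of the base matrix and perturbation (not the paper's Riccati-based algorithm): the correction sqrt(A + UU^T) - sqrt(A) has rapidly decaying eigenvalues, which is what justifies approximating it with a low-rank matrix.

```python
import numpy as np

def sym_sqrt(M):
    """Principal square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

rng = np.random.default_rng(0)
n, r = 200, 3
A = np.diag(rng.uniform(1.0, 10.0, size=n))   # easy-to-root base matrix
U = rng.normal(size=(n, r))                   # rank-r perturbation

correction = sym_sqrt(A + U @ U.T) - sym_sqrt(A)
eigs = np.sort(np.abs(np.linalg.eigvalsh(correction)))[::-1]
print(eigs[:8])   # magnitudes fall off sharply after the first few
```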

Motivated by the continuous generation of high-frequency data streams, real-time learning is becoming increasingly important. These data streams must be processed sequentially, and their properties may change over time. In this streaming setting, we propose techniques for minimizing convex objectives through unbiased estimates of their gradients, commonly referred to as stochastic approximation problems. Our methods rely on stochastic approximation algorithms because of their applicability and computational advantages. The approach includes iterate averaging, which guarantees optimal statistical efficiency under classical conditions. Our non-asymptotic analysis shows accelerated convergence when the learning rate is selected according to the expected data streams. We show that the averaged estimate converges optimally and robustly for any data stream rate. In addition, noise reduction can be achieved by processing the data in a specific pattern, which is advantageous for large-scale machine learning problems. These theoretical results are illustrated for various data streams, showing the effectiveness of the proposed algorithms.
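A minimal sketch of the general setup described above: stochastic gradient steps on a streaming least-squares objective with Polyak-Ruppert iterate averaging and a decaying learning rate. The stream, model, and step-size exponent are illustrative, not the paper's specific choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
theta_star = rng.normal(size=d)                # unknown parameter to recover

theta = np.zeros(d)
theta_bar = np.zeros(d)
for t in range(1, 100_001):
    x = rng.normal(size=d)                     # one streaming observation
    y = x @ theta_star + 0.5 * rng.normal()
    grad = (x @ theta - y) * x                 # unbiased gradient estimate
    theta -= (0.5 * t ** -0.66) * grad         # decaying learning rate ~ t^(-2/3)
    theta_bar += (theta - theta_bar) / t       # running average of iterates

print(np.linalg.norm(theta_bar - theta_star))  # averaged iterate is close
```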

In this paper, we estimate the seroprevalence against COVID-19 by country and derive the seroprevalence over the world. To estimate seroprevalence, we use serological surveys (also called serosurveys) conducted within each country. Two issues arise when incorporating serosurveys to estimate world seroprevalence. First, there are countries in which no serological survey has been conducted. Second, the sample collection dates differ from country to country. We attempt to tackle these problems using vaccination data, confirmed cases data, and national statistics. We construct Bayesian models to separately estimate the numbers of people who have antibodies produced by infection or by vaccination. For the number of people with antibodies due to infection, we develop a hierarchical model that combines the information contained in both confirmed cases data and national statistics. At the same time, we propose regression models to estimate missing values in the vaccination data. As of 31 July 2021, the proposed methods yield a 95% credible interval for the world seroprevalence of [38.6%, 59.2%].
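A toy illustration of the kind of Bayesian aggregation described above (not the paper's hierarchical model): each country's serosurvey is modelled as binomial with a flat Beta(1, 1) prior, posteriors are sampled via conjugacy, and population-weighted draws give a credible interval for the aggregate. All numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
# (positives, tested, population) -- entirely fabricated for illustration.
surveys = {"A": (120, 1000, 50e6), "B": (45, 900, 10e6), "C": (300, 1500, 80e6)}

draws, weights = [], []
for pos, n, popn in surveys.values():
    draws.append(rng.beta(1 + pos, 1 + n - pos, size=10_000))  # conjugate posterior
    weights.append(popn)
world = np.average(np.vstack(draws), axis=0, weights=weights)
print(np.percentile(world, [2.5, 97.5]))  # 95% credible interval for the aggregate
```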

Stochastic actor-oriented models (SAOM) are a broadly applied modelling framework for analysing network dynamics using network panel data. They have been extended to address co-evolution of multiple networks as well as networks and behaviour. This paper extends the SAOM to the analysis of multiple network panels through a random coefficient multilevel model, estimated with a Bayesian approach. This is illustrated by a study of the dynamic interdependence of friendship and minor delinquency, represented by the combination of a one-mode and a two-mode network, using a sample of 81 school classes in the first year of secondary school.

Tensor numerical methods, based on the rank-structured tensor representation of $d$-variate functions and operators, are designed to provide $O(dn)$ complexity of numerical calculations on $n^{\otimes d}$ grids, in contrast to the $O(n^d)$ scaling of conventional grid-based methods. However, multiple tensor operations may lead to an enormous increase in the tensor ranks (curse of ranks) of the target data, making calculations intractable. Therefore, one of the most important steps in tensor calculations is a robust and efficient rank reduction procedure, which must be performed many times in the course of various tensor transforms in multidimensional operator and function calculus. The rank reduction scheme based on the Reduced Higher Order SVD (RHOSVD) introduced in [33] played a significant role in the development of tensor numerical methods. Here, we briefly survey the essentials of the RHOSVD method and then focus on some new theoretical and computational aspects of the RHOSVD, demonstrating that this rank reduction technique constitutes the basic ingredient in tensor computations for real-life problems. In particular, a stability analysis of the RHOSVD is presented. We introduce the multilinear algebra of tensors represented in the range-separated (RS) tensor format. This makes it possible to apply the RHOSVD rank-reduction techniques to non-regular functional data with many singularities, for example, to the rank-structured computation of the collective multi-particle interaction potentials in bio-molecular modeling, as well as to complicated composite radial functions. New theoretical and numerical results on the application of the RHOSVD to scattered data modeling are presented. The RHOSVD has proven to be an efficient rank reduction technique in numerous applications, ranging from the numerical treatment of multi-particle systems to the numerical solution of PDE-constrained control problems.
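A compact sketch of the RHOSVD idea under simplified assumptions: a 3-way tensor given in canonical (CP) form with large rank R is compressed to a rank-(r, r, r) Tucker representation using truncated SVDs of the three side matrices alone, so the full n x n x n tensor is never formed. Sizes and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, R, r = 64, 40, 6
xi = rng.normal(size=R)                          # canonical weights
U = [rng.normal(size=(n, R)) for _ in range(3)]  # side matrices U^(1..3)

# Truncated SVD of each side matrix yields the orthogonal Tucker bases.
V = [np.linalg.svd(Ul, full_matrices=False)[0][:, :r] for Ul in U]

# Project the canonical vectors and assemble the small r x r x r Tucker core.
P = [Vl.T @ Ul for Vl, Ul in zip(V, U)]          # each of shape (r, R)
core = np.einsum("r,ir,jr,kr->ijk", xi, P[0], P[1], P[2])
print(core.shape)                                # (6, 6, 6)
```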

Aligning a sequence to a walk in a labeled graph is a problem of fundamental importance to computational biology. For finding a walk in an arbitrary graph with $|E|$ edges that exactly matches a pattern of length $m$, a lower bound based on the Strong Exponential Time Hypothesis (SETH) implies that an algorithm significantly faster than $O(|E|m)$ time is unlikely [Equi et al., ICALP 2019]. However, for many special graphs, such as de Bruijn graphs, the problem can be solved in linear time [Bowe et al., WABI 2012]. For approximate matching, the picture is more complex. When edits (substitutions, insertions, and deletions) are only allowed to the pattern, or when the graph is acyclic, the problem is again solvable in $O(|E|m)$ time. When edits are allowed to arbitrary cyclic graphs, the problem becomes NP-complete, even on binary alphabets [Jain et al., RECOMB 2019]. These results hold even when edits are restricted to substitutions only. The complexity of approximate pattern matching on de Bruijn graphs remained open. We investigate this problem and show that the properties that make de Bruijn graphs amenable to efficient exact pattern matching do not extend to approximate matching, even in the substitutions-only case with alphabet size four. We prove that determining the existence of a matching walk in a de Bruijn graph is NP-complete when substitutions are allowed to the graph. In addition, we demonstrate that an algorithm significantly faster than $O(|E|m)$ is unlikely for de Bruijn graphs in the case where only substitutions are allowed to the pattern. This stands in contrast to pattern-to-text matching, where exact matching is solvable in linear time, as on de Bruijn graphs, but approximate matching under substitutions is solvable in subquadratic $O(n\sqrt{m})$ time, where $n$ is the length of the text [Abrahamson, SIAM J. Computing 1987].
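A short illustration of the easy exact case contrasted above: in a node-centric de Bruijn graph over k-mers, consecutive k-mers of a pattern automatically overlap by k-1 characters, so the pattern spells a walk iff every one of its k-mers is a node. This gives the linear-time exact check; the graph and pattern below are toy data.

```python
def debruijn_nodes(strings, k):
    """Collect the k-mer node set of a node-centric de Bruijn graph."""
    return {s[i:i + k] for s in strings for i in range(len(s) - k + 1)}

def matches_walk(pattern, nodes, k):
    """Exact matching: does `pattern` spell a walk? One lookup per k-mer."""
    kmers = (pattern[i:i + k] for i in range(len(pattern) - k + 1))
    return all(km in nodes for km in kmers)

nodes = debruijn_nodes(["ACGTACGG", "GTACGGTT"], k=4)
print(matches_walk("CGTACGGT", nodes, k=4))   # True
print(matches_walk("CGTTACGG", nodes, k=4))   # False
```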

This paper presents a systematic study of the effects of hyperspectral pixel dimensionality reduction on the pixel classification task. We use five dimensionality reduction methods -- PCA, KPCA, ICA, AE, and DAE -- to compress 301-dimensional hyperspectral pixels. The compressed pixels are subsequently used to perform pixel classification. Pixel classification accuracies, together with the compression method, compression rate, and reconstruction error, provide a new lens for studying the suitability of a compression method for the task of pixel classification. We use three high-resolution hyperspectral image datasets, representing three common landscape types (urban, transitional suburban, and forest), collected by the Remote Sensing and Spatial Ecosystem Modeling laboratory of the University of Toronto. We find that PCA, KPCA, and ICA exhibit greater signal reconstruction capability; however, at compression rates above 90% these methods show lower classification scores. The AE and DAE methods achieve better classification accuracy at a 95% compression rate, but their performance drops as the compression rate approaches 97%. Our results suggest that both the compression method and the compression rate are important considerations when designing a hyperspectral pixel classification pipeline.
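A minimal sketch of the pipeline studied above, using PCA as the compression step and an off-the-shelf classifier; the data are synthetic stand-ins for the 301-band pixels, and the paper's datasets and classifier choices are not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 301))                 # synthetic 301-band pixels
y = rng.integers(0, 5, size=2000)                # synthetic land-cover labels

n_components = int(301 * (1 - 0.95))             # 95% compression rate -> 15 bands
X_c = PCA(n_components=n_components).fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(X_c, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(accuracy_score(y_te, clf.predict(X_te)))   # ~chance on random data
```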

One of the main features of interest in analysing the light curves of stars is the underlying periodic behaviour. The corresponding observations are a complex type of time series, with unequally spaced time points, and are sometimes accompanied by varying measures of accuracy. The main tools for analysing this type of data rely on periodogram-like functions, constructed with a desired feature so that peaks indicate the presence of a potential period. In this paper, we explore a particular periodogram for irregularly observed time series data, similar to that of Thieler et al. (2013). We identify potential periods at the appropriate peaks and, more importantly, with a quantifiable uncertainty. Our approach is shown to generalise easily to non-parametric methods, including a weighted Gaussian process regression periodogram. We also extend the approach to correlated background noise. The proposed method for period detection relies on a test based on quadratic forms with normally distributed components. We implement the saddlepoint approximation as a faster and more accurate alternative to the simulation-based methods currently in use. A power analysis of the testing methodology is reported, together with applications using light curves from the Hunting Outbursting Young Stars citizen science project.
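As a baseline illustration of the setting (not the paper's quadratic-form test or saddlepoint machinery), the Lomb-Scargle periodogram handles unequally spaced time points, and its highest peak flags a candidate period in a simulated light curve.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 100, size=400))       # irregular observation times
period = 7.3
y = np.sin(2 * np.pi * t / period) + 0.3 * rng.normal(size=t.size)

freqs = np.linspace(0.01, 2.0, 5000)             # angular frequency grid
power = lombscargle(t, y - y.mean(), freqs)
print(2 * np.pi / freqs[np.argmax(power)])       # ~7.3, the injected period
```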
