
Due to the exponential growth of genomic data, constructing dedicated data structures has become the principal bottleneck in common bioinformatics applications. In particular, the Burrows-Wheeler Transform (BWT) is the basis of some of the most popular self-indexes for genomic data, due to its known favourable behaviour on repetitive data. Some tools that exploit the intrinsic repetitiveness of biological data have risen in popularity, due to their speed and low space consumption. We introduce a new algorithm for computing the BWT, which takes advantage of the redundancy of the data through a compressed version of matching statistics, the $\textit{CMS}$ of [Lipt\'ak et al., WABI 2022]. We show that it suffices to sort a small subset of suffixes, lowering both computation time and space. Our result is due to a new insight which links the so-called insert-heads of [Lipt\'ak et al., WABI 2022] to the well-known run boundaries of the BWT. We give two implementations of our algorithm, called $\texttt{CMS}$-$\texttt{BWT}$, both competitive in our experimental validation on highly repetitive real-life datasets. In most cases, they outperform other tools w.r.t. running time, trading off a higher memory footprint, which, however, is still considerably smaller than the total size of the input data.
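For readers unfamiliar with the transform, the following is a minimal sketch of the textbook BWT construction and of the run count that makes it attractive on repetitive data. It is not the CMS-BWT algorithm from the abstract: it sorts every suffix, which is exactly the work the paper says can be avoided. The function names `bwt` and `count_runs` and the `$` sentinel are assumptions made for illustration.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all suffixes of text + sentinel and emit,
    for each suffix in sorted order, the character that precedes it."""
    s = text + sentinel
    # Suffix array obtained by directly sorting suffix start positions
    # (illustrative only; far too slow for genomic-scale inputs).
    suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
    # s[i - 1] wraps around to the sentinel when i == 0.
    return "".join(s[i - 1] for i in suffix_array)


def count_runs(s: str) -> int:
    """Count maximal runs of equal characters; run boundaries are the
    positions where consecutive characters differ."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, count_runs(transformed))
```

On repetitive inputs the transformed string tends to have few runs relative to its length, which is the property run-length-based indexes rely on.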

Related content

Statistic


Optimizer · Functional · Scenario · Mathematics · Analysis

June 28, 2023

A Review on Optimality Investigation Strategies for the Balanced Assignment Problem

Anurag Dutta,K. Lakshmanan,A. Ramamoorthy,Liton Chandra Voumik,John Harshith,John Pravin Motha

Mathematical selection is the process of choosing a particular option from a set of candidates, and it has long been an interesting field of study for mathematicians. Combinatorial optimization is a subfield of mathematical selection that deals with problems arising in operations research, artificial intelligence, and many other promising domains. In a broader sense, an optimization problem entails maximising or minimising a real function by systematically selecting input values from within an allowed set and computing the function's value. A broad region of applied mathematics is the generalisation of metaheuristic theory and methods to other formulations. More broadly, optimization entails determining the best values of some fitness function over a fixed domain, which may include a variety of distinct types of decision variables and contexts. In this work, we study the well-known Balanced Assignment Problem and present a comparative analysis of the computational time complexity of different approaches to solving it.
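To make the problem concrete: in the balanced assignment problem, n agents are matched one-to-one with n tasks so that the total cost is minimal. The sketch below is an exhaustive-search baseline, not one of the strategies reviewed in the paper (the abstract does not name them); the function name `brute_force_assignment` and the cost matrix are hypothetical.

```python
from itertools import permutations


def brute_force_assignment(cost):
    """Try every one-to-one assignment of n agents to n tasks.

    cost[i][j] is the cost of giving task j to agent i.
    Returns (minimum total cost, tuple mapping agent index -> task index).
    """
    n = len(cost)
    best_cost, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[agent][task] for agent, task in enumerate(perm))
        if best_cost is None or total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, best_perm


if __name__ == "__main__":
    example = [[4, 1, 3],
               [2, 0, 5],
               [3, 2, 2]]
    print(brute_force_assignment(example))
```

Exhaustive search examines n! assignments, so it is only usable for very small n; polynomial-time methods such as the Hungarian algorithm are what make larger instances tractable.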

Reducible · Dataset · MoDELS · Learning · Server

June 28, 2023

An Efficient Virtual Data Generation Method for Reducing Communication in Federated Learning

Cheng Yang,Xue Yang,Dongxian Wu,Xiaohu Tang

from arxiv, There are some errors in the experimental setup of this paper

Communication overhead is one of the major challenges in Federated Learning (FL). A few classical schemes assume the server can extract auxiliary information about the participants' training data from the local models to construct a central dummy dataset. The server uses the dummy dataset to finetune the aggregated global model so that it reaches the target test accuracy in fewer communication rounds. In this paper, we summarize the above solutions into a data-based communication-efficient FL framework. The key to the proposed framework is the design of an efficient extraction module (EM) which ensures that the dummy dataset has a positive effect on finetuning the aggregated global model. Unlike existing methods that use a generator to design the EM, our proposed method, FedINIBoost, borrows the idea of gradient matching to construct the EM. Specifically, FedINIBoost builds a proxy dataset of the real dataset in two steps for each participant at each communication round. Then the server aggregates all the proxy datasets to form a central dummy dataset, which is used to finetune the aggregated global model. Extensive experiments verify the superiority of our method compared with the existing classical methods FedAVG, FedProx, Moon and FedFTG. Moreover, FedINIBoost plays a significant role in finetuning the performance of the aggregated global model at the initial stage of FL.

Extensibility · Statistic · Score · LD · Estimation/Estimator

June 27, 2023

High-dimensional statistical inference for linkage disequilibrium score regression and its cross-ancestry extensions

Fei Xue,Bingxin Zhao

from arxiv, 13 figures

Linkage disequilibrium score regression (LDSC) has emerged as an essential tool for genetic and genomic analyses of complex traits, utilizing high-dimensional data derived from genome-wide association studies (GWAS). LDSC computes the linkage disequilibrium (LD) scores using an external reference panel, and integrates the LD scores with only summary data from the original GWAS. In this paper, we investigate LDSC within a fixed-effect data integration framework, underscoring its ability to merge multi-source GWAS data and reference panels. In particular, we take account of the genome-wide dependence among the high-dimensional GWAS summary statistics, along with the block-diagonal dependence pattern in estimated LD scores. Our analysis uncovers several key factors of both the original GWAS and reference panel datasets that determine the performance of LDSC. We show that it is relatively feasible for LDSC-based estimators to achieve asymptotic normality when applied to genome-wide genetic variants (e.g., in genetic variance and covariance estimation), whereas it becomes considerably challenging when we focus on a much smaller subset of genetic variants (e.g., in partitioned heritability analysis). Moreover, by modeling the disparities in LD patterns across different populations, we unveil that LDSC can be expanded to conduct cross-ancestry analyses using data from distinct global populations (such as European and Asian). We validate our theoretical findings through extensive numerical evaluations using real genetic data from the UK Biobank study.

Singular · Tensor · Sample · Left singular vector · Singular vector

June 27, 2023

A non-backtracking method for long matrix and tensor completion

Ludovic Stephan,Yizhe Zhu

We consider the problem of low-rank rectangular matrix completion in the regime where the matrix $M$ of size $n\times m$ is ``long'', i.e., the aspect ratio $m/n$ diverges to infinity. Such matrices are of particular interest in the study of tensor completion, where they arise from the unfolding of a low-rank tensor. In the case where the sampling probability is $\frac{d}{\sqrt{mn}}$, we propose a new spectral algorithm for recovering the singular values and left singular vectors of the original matrix $M$ based on a variant of the standard non-backtracking operator of a suitably defined bipartite weighted random graph, which we call a \textit{non-backtracking wedge operator}. When $d$ is above a Kesten-Stigum-type sampling threshold, our algorithm recovers a correlated version of the singular value decomposition of $M$ with quantifiable error bounds. This is the first result in the regime of bounded $d$ for weak recovery and the first for weak consistency when $d\to\infty$ arbitrarily slowly without any polylog factors. As an application, for low-rank orthogonal $k$-tensor completion, we efficiently achieve weak recovery with sample size $O(n^{k/2})$, and weak consistency with sample size $\omega(n^{k/2})$.

Ensemble · Kernelization · Identifiable · False positive · MoDELS

June 26, 2023

Ensemble of Random and Isolation Forests for Graph-Based Intrusion Detection in Containers

Alfonso Iacovazzi,Shahid Raza

We propose a novel solution combining supervised and unsupervised machine learning models for intrusion detection at kernel level in cloud containers. In particular, the proposed solution is built over an ensemble of random and isolation forests trained on sequences of system calls that are collected at the hosting machine's kernel level. The sequences of system calls are translated into a weighted and directed graph to obtain a compact description of the container behavior, which is given as input to the ensemble model. We executed a set of experiments in a controlled environment in order to test our solution against the two most common threats that have been identified in cloud containers, and our results show that we can achieve high detection rates and low false positives in the tested attacks.
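One plausible way to turn a system-call sequence into a weighted directed graph is sketched below. The abstract does not specify the construction, so the edge rule used here (an edge from call a to call b, weighted by how often b immediately follows a) is an assumption, as are the function name `syscall_graph` and the sample trace.

```python
from collections import Counter


def syscall_graph(trace):
    """Map a sequence of system-call names to weighted directed edges.

    Assumption for illustration: edge (a, b) has weight equal to the
    number of times call b immediately follows call a in the trace.
    """
    return Counter(zip(trace, trace[1:]))


if __name__ == "__main__":
    trace = ["open", "read", "read", "write", "read", "write", "close"]
    for (src, dst), weight in sorted(syscall_graph(trace).items()):
        print(f"{src} -> {dst}: {weight}")
```

The resulting edge-weight table has a size bounded by the number of distinct call pairs rather than by the trace length, which is one sense in which such a graph is a compact description of behavior.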

Graph · Class · Exchangeable · Reducible · Mutually independent

June 26, 2023

Stochastic dynamic matching: A mixed graph-theory and linear-algebra approach

Céline Comte,Fabien Mathieu,Ana Bušić

The stochastic dynamic matching problem has recently drawn attention in the stochastic-modeling community due to its numerous applications, ranging from supply-chain management to kidney exchange programs. In this paper, we consider a matching problem in which items of different classes arrive according to independent Poisson processes. Unmatched items are stored in a queue, and compatibility constraints are described by a simple graph on the classes, so that two items can be matched if their classes are neighbors in the graph. We analyze the efficiency of matching policies, not only in terms of system stability, but also in terms of matching rates between different classes. Our results rely on the observation that, under any stable policy, the matching rates satisfy a conservation equation that equates the arrival and departure rates of each item class. Our main contributions are threefold. We first introduce a mapping between the dimension of the solution set of this conservation equation, the structure of the compatibility graph, and the existence of a stable policy. In particular, this allows us to derive a necessary and sufficient stability condition that is verifiable in polynomial time. Secondly, we describe the convex polytope of non-negative solutions of the conservation equation. When this polytope is reduced to a single point, we give a closed-form expression of the solution; in general, we characterize the vertices of this polytope using again the graph structure. Lastly, we show that greedy policies cannot, in general, achieve every point in the polytope. In contrast, non-greedy policies can reach any point of the interior of this polytope, and we give a condition for these policies to also reach the boundary of the polytope.
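As a small worked illustration of such a conservation equation, consider a compatibility graph that is a triangle on three classes. The notation is assumed here and not taken from the paper: $\lambda_i$ is the arrival rate of class $i$ and $\mu_{ij}$ is the matching rate along edge $\{i,j\}$. Equating arrivals and departures for each class gives three equations in three unknowns, so the solution is a single point:

```latex
\begin{aligned}
\lambda_1 &= \mu_{12} + \mu_{13}, &
\lambda_2 &= \mu_{12} + \mu_{23}, &
\lambda_3 &= \mu_{13} + \mu_{23}, \\[4pt]
\mu_{12} &= \tfrac{1}{2}(\lambda_1 + \lambda_2 - \lambda_3), &
\mu_{13} &= \tfrac{1}{2}(\lambda_1 + \lambda_3 - \lambda_2), &
\mu_{23} &= \tfrac{1}{2}(\lambda_2 + \lambda_3 - \lambda_1).
\end{aligned}
```

These rates are all non-negative exactly when no arrival rate exceeds the sum of the other two.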

Missing value · Transformation · Extensibility · Inversion · state-of-the-art

June 23, 2023

Transformed Distribution Matching for Missing Value Imputation

He Zhao,Ke Sun,Amir Dezfouli,Edwin Bonilla

from arxiv, ICML 2023 camera-ready version, //openreview.net/forum?id=WBWb1FU8iz

We study the problem of imputing missing values in a dataset, which has important applications in many domains. The key to missing value imputation is to capture the data distribution with incomplete samples and impute the missing values accordingly. In this paper, by leveraging the fact that any two batches of data with missing values come from the same data distribution, we propose to impute the missing values of two batches of samples by transforming them into a latent space through deep invertible functions and matching them distributionally. To learn the transformations and impute the missing values simultaneously, a simple and well-motivated algorithm is proposed. Our algorithm has fewer hyperparameters to fine-tune and generates high-quality imputations regardless of how missing values are generated. Extensive experiments over a large number of datasets and competing benchmark algorithms show that our method achieves state-of-the-art performance.

Matrix theory · Linear · Euclidean space · Backpropagation algorithm · AIM

January 1, 2022

Matrix Decomposition and Applications

Jun Lu

from arxiv, arXiv admin note: substantial text overlap with arXiv:2107.02579

In 1954, Alston S. Householder published Principles of Numerical Analysis, one of the first modern treatments of matrix decomposition, which favored a (block) LU decomposition, that is, the factorization of a matrix into the product of lower and upper triangular matrices. Matrix decomposition has since become a core technology in machine learning, largely due to the development of the back propagation algorithm in fitting a neural network. The sole aim of this survey is to give a self-contained introduction to concepts and mathematical tools in numerical linear algebra and matrix analysis in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. However, we clearly realize our inability to cover all the useful and interesting results concerning matrix decomposition given the limited scope of this discussion; for example, we omit the separate analysis of Euclidean space, Hermitian space, Hilbert space, and matters in the complex domain. We refer the reader to literature in the field of linear algebra for a more detailed introduction to the related fields.
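The LU factorization mentioned above can be sketched in a few lines. This is the scalar Doolittle variant without pivoting, not the block form and not code from the survey; the function name `lu_decompose` is hypothetical, and the sketch assumes every pivot it encounters is nonzero.

```python
def lu_decompose(a):
    """Doolittle LU factorisation without pivoting.

    Returns (lower, upper) with a = lower * upper, where lower is unit
    lower triangular and upper is upper triangular.
    Assumption: no zero pivot occurs (all leading principal minors of
    `a` are nonzero); otherwise a ZeroDivisionError is raised.
    """
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Fill row i of the upper factor.
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        # Fill column i of the lower factor, below the diagonal.
        for j in range(i + 1, n):
            partial = sum(lower[j][k] * upper[k][i] for k in range(i))
            lower[j][i] = (a[j][i] - partial) / upper[i][i]
    return lower, upper


if __name__ == "__main__":
    lower, upper = lu_decompose([[4, 3], [6, 3]])
    print(lower)
    print(upper)
```

Practical implementations add row pivoting for numerical stability; it is left out here to keep the correspondence with the definition visible.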

Graph · Knowledge graph · Machine Learning · Optimizer · Continuity

September 22, 2021

Updating Embeddings for Dynamic Knowledge Graphs

Christopher Wewer,Florian Lemmerich,Michael Cochez

Data in Knowledge Graphs often represents part of the current state of the real world. Thus, to stay up-to-date the graph data needs to be updated frequently. To utilize information from Knowledge Graphs, many state-of-the-art machine learning approaches use embedding techniques. These techniques typically compute an embedding, i.e., vector representations of the nodes as input for the main machine learning algorithm. If a graph update occurs later on -- specifically when nodes are added or removed -- the training has to be done all over again. This is undesirable, because of the time it takes and also because downstream models which were trained with these embeddings have to be retrained if they change significantly. In this paper, we investigate embedding updates that do not require full retraining and evaluate them in combination with various embedding models on real dynamic Knowledge Graphs covering multiple use cases. We study approaches that place newly appearing nodes optimally according to local information, but notice that this does not work well. However, we find that if we continue the training of the old embedding, interleaved with epochs during which we only optimize for the added and removed parts, we obtain good results in terms of typical metrics used in link prediction. This performance is obtained much faster than with a complete retraining and hence makes it possible to maintain embeddings for dynamic Knowledge Graphs.

MoDELS · CLUES · INTERACT · GPU · Neural Networks

January 28, 2021

A Graph-based Relevance Matching Model for Ad-hoc Retrieval

Yufeng Zhang,Jinghao Zhang,Zeyu Cui,Shu Wu,Liang Wang

from arxiv, To appear at AAAI 2021

To retrieve more relevant, appropriate and useful documents given a query, finding clues about that query through the text is crucial. Recent deep learning models regard the task as a term-level matching problem, which seeks exact or similar query patterns in the document. However, we argue that they are inherently based on local interactions and do not generalise to ubiquitous, non-consecutive contextual relationships. In this work, we propose a novel relevance matching model based on graph neural networks to leverage the document-level word relationships for ad-hoc retrieval. In addition to the local interactions, we explicitly incorporate all contexts of a term through the graph-of-word text format. Matching patterns can be revealed accordingly to provide a more accurate relevance score. Our approach significantly outperforms strong baselines on two ad-hoc benchmarks. We also experimentally compare our model with BERT and show our advantages on long documents.