
Single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) assesses genome-wide chromatin accessibility in thousands of cells to reveal regulatory landscapes at high resolution. However, the analysis is challenging because the data are high-dimensional and sparse. Several methods have been developed, including transformation techniques such as term frequency-inverse document frequency (TF-IDF) and dimension reduction methods such as singular value decomposition (SVD), factor analysis, and autoencoders. Yet a comprehensive comparison of these methods has not been performed, and it remains unclear what the best practice is when analyzing scATAC-seq data. We compared several scenarios for transformation and dimension reduction, as well as SVD-based feature analysis, to investigate potential enhancements in scATAC-seq information retrieval. Additionally, we investigated whether autoencoders benefit from the TF-IDF transformation. Our results reveal that the TF-IDF transformation generally leads to improved clustering and biologically relevant feature extraction.
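As a rough illustration of the pipeline the abstract describes, the sketch below applies a TF-IDF transformation to a binary cells-by-peaks matrix and then reduces it with SVD. The function name `tfidf_svd`, the `log(1 + N/df)` form of the IDF term, and the simulated data are all assumptions made for this example; they are one common variant and not necessarily any of the scenarios compared in the paper.

```python
import numpy as np

def tfidf_svd(counts, n_components=2):
    """TF-IDF transform a cells x peaks matrix, then reduce it with SVD."""
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]
    # Term frequency: normalise each cell (row) by its total signal.
    row_sums = counts.sum(axis=1, keepdims=True)
    tf = counts / np.maximum(row_sums, 1.0)
    # Inverse document frequency: down-weight peaks open in many cells.
    # (Assumed variant: log(1 + N / df).)
    cells_per_peak = (counts > 0).sum(axis=0)
    idf = np.log(1.0 + n_cells / np.maximum(cells_per_peak, 1))
    weighted = tf * idf
    # SVD; the leading components give a low-dimensional cell embedding.
    u, s, _ = np.linalg.svd(weighted, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

rng = np.random.default_rng(0)
# Two hypothetical cell populations with different sets of accessible peaks.
prob_a = np.r_[np.full(25, 0.4), np.full(25, 0.02)]
prob_b = np.r_[np.full(25, 0.02), np.full(25, 0.4)]
group_a = rng.random((20, 50)) < prob_a
group_b = rng.random((20, 50)) < prob_b
cells = np.vstack([group_a, group_b]).astype(int)

embedding = tfidf_svd(cells, n_components=2)
print(embedding.shape)
```

The resulting `embedding` has one row per cell and could be passed to any clustering routine. Real scATAC-seq matrices are far larger and are usually stored in a sparse format with a truncated SVD, which this dense toy version does not attempt.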

Related content

TF-IDF


TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and text mining. It is a statistical method for evaluating how important a word is to one document in a collection or corpus. A word's importance increases in proportion to the number of times it appears in the document, but decreases with how frequently it appears across the corpus. Various forms of TF-IDF weighting are often used by search engines as a measure or rating of the relevance between a document and a user query. Besides TF-IDF, internet search engines also use ranking methods based on link analysis to determine the order in which documents appear in search results.
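The weighting described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming whitespace tokenisation, `tf = count / document length`, and `idf = log(N / df)`; the helper name `tf_idf` and the toy corpus are made up for the example, and real systems typically use smoothed or normalised variants.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: weight} dict per document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in docs for term in set(tokens))
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

corpus = [
    "chromatin accessibility in single cells",
    "chromatin peaks in single cells",
    "search engines rank documents in order",
]
scores = tf_idf(corpus)
for term, weight in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:15s} {weight:.3f}")
```

In the first document, "accessibility" occurs in no other document and so receives the highest weight, while "in" occurs in every document and is weighted zero, which mirrors the proportional and inverse effects described in the definition.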

Precision/Accuracy · Scenario · Probability theory · Statistic · Attention

February 14, 2023

The Set Structure of Precision: Coherent Probabilities on Pre-Dynkin-Systems

Rabanus Derr,Robert C. Williamson

In the literature on imprecise probability, little attention is paid to the fact that imprecise probabilities are precise on some events. We call these sets the system of precision. We show that, under mild assumptions, the system of precision of a lower and upper probability forms a so-called (pre-)Dynkin-system. Interestingly, there are several settings, ranging from machine learning on partial data over frequential probability theory to quantum probability theory and decision making under uncertainty, in which a priori the probabilities are only desired to be precise on a specific underlying set system. At the core of all of these settings lies the observation that precise beliefs, probabilities or frequencies on two events do not necessarily imply that this precision holds for the intersection of those events. Here, (pre-)Dynkin-systems have been adopted as systems of precision, too. We show that, under extendability conditions, those pre-Dynkin-systems equipped with probabilities can be embedded into algebras of sets. Surprisingly, the extendability conditions elaborated in a strand of work in quantum physics are equivalent to coherence in the sense of Walley (1991, Statistical reasoning with imprecise probabilities, p. 84). Thus, the literature on probabilities on pre-Dynkin-systems gets linked to the literature on imprecise probability. Finally, we spell out a lattice duality which rigorously relates the system of precision to credal sets of probabilities. In particular, we provide a hitherto undescribed, parametrized family of coherent imprecise probabilities.

contrastive · Performer · Learning · Singular value decomposition · Multimodal

February 13, 2023

Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data

Ryumei Nakada,Halil Ibrahim Gulluk,Zhun Deng,Wenlong Ji,James Zou,Linjun Zhang

from arxiv, 43 pages, 3 figures, accepted by AISTATS 2023

Language-supervised vision models have recently attracted great attention in computer vision. A common approach to build such models is to use contrastive learning on paired data across the two modalities, as exemplified by Contrastive Language-Image Pre-Training (CLIP). In this paper, under linear representation settings, (i) we initiate the investigation of a general class of nonlinear loss functions for multimodal contrastive learning (MMCL) including CLIP loss and show its connection to singular value decomposition (SVD). Namely, we show that each step of loss minimization by gradient descent can be seen as performing SVD on a contrastive cross-covariance matrix. Based on this insight, (ii) we analyze the performance of MMCL. We quantitatively show that the feature learning ability of MMCL can be better than that of unimodal contrastive learning applied to each modality even under the presence of wrongly matched pairs. This characterizes the robustness of MMCL to noisy data. Furthermore, when we have access to additional unpaired data, (iii) we propose a new MMCL loss that incorporates additional unpaired datasets. We show that the algorithm can detect the ground-truth pairs and improve performance by fully exploiting unpaired datasets. The performance of the proposed algorithm was verified by numerical experiments.

Estimation/Estimator · Channel · Performer · Orthogonal · Extensibility

February 13, 2023

On the Doppler Squint Effect in OTFS Systems over Doubly-Dispersive Channels: Modeling and Evaluation

Xuehan Wang,Xu Shi,Jintao Wang,Jian Song

Extensive work has demonstrated the excellent performance of orthogonal time frequency space (OTFS) modulation in high-mobility scenarios. Time-variant wideband channel estimation is one of the key components of OTFS receivers, since data detection requires accurate channel state information (CSI). In practical wideband OTFS systems, the Doppler shift caused by high mobility is frequency-dependent, which is referred to as the Doppler Squint Effect (DSE). Unfortunately, DSE was ignored in all prior estimation schemes employed in OTFS systems, which leads to severe performance loss in channel estimation and the subsequent data detection. In this paper, we investigate the DSE of wideband time-variant channels in the delay-Doppler domain and concentrate on the characterization of OTFS channel coefficients considering DSE. The formulation and evaluation of the OTFS input-output relationship are provided for both ideal and rectangular waveforms considering DSE. The channel estimation is therefore formulated as a sparse signal recovery problem, and an orthogonal matching pursuit (OMP)-based scheme is adopted to solve it. Simulation results confirm the significance of DSE and the performance superiority compared with traditional channel estimation approaches ignoring DSE.
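For readers unfamiliar with orthogonal matching pursuit, the sketch below shows the generic greedy algorithm for recovering a sparse vector `x` from `y ≈ A @ x`. It is a textbook version on a random real-valued dictionary, not the paper's OTFS-specific scheme; the function name `omp`, the tolerance, and the toy problem sizes are assumptions made for illustration.

```python
import numpy as np

def omp(A, y, sparsity, tol=1e-10):
    """Greedy orthogonal matching pursuit for y ≈ A @ x with sparse x."""
    residual = y.copy()
    support = []
    x = np.zeros(A.shape[1], dtype=A.dtype)
    for _ in range(sparsity):
        # Stop early once the measurements are fully explained.
        if np.linalg.norm(residual) < tol:
            break
        # Pick the unused column most correlated with the current residual.
        correlations = np.abs(A.conj().T @ residual)
        if support:
            correlations[support] = 0.0
        support.append(int(np.argmax(correlations)))
        # Least-squares fit on the selected columns, then update the residual.
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs
        x[support] = coeffs
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100))
A /= np.linalg.norm(A, axis=0)          # unit-norm dictionary columns
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.5, -2.0, 1.0]  # hypothetical 3-sparse "channel"
y = A @ x_true                          # noiseless measurements

x_hat = omp(A, y, sparsity=3)
print(np.flatnonzero(x_hat))
```

In a small noiseless setting like this one the estimate often coincides with `x_true`, but OMP does not guarantee exact recovery in general, and the quality of the result depends on the dictionary and the noise level.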

Tensor · Analysis · Directed · Frequentist school · MoDELS

February 12, 2023

Bayesian Methods in Tensor Analysis

Yiyao Shi,Weining Shen

Tensors, also known as multidimensional arrays, are useful data structures in machine learning and statistics. In recent years, Bayesian methods have emerged as a popular direction for analyzing tensor-valued data since they provide a convenient way to introduce sparsity into the model and conduct uncertainty quantification. In this article, we provide an overview of frequentist and Bayesian methods for solving tensor completion and regression problems, with a focus on Bayesian methods. We review common Bayesian tensor approaches including model formulation, prior assignment, posterior computation, and theoretical properties. We also discuss potential future directions in this field.

Backward · Optimizer · Extensibility · AIM · Approximation

February 11, 2023

Numerical methods for backward stochastic differential equations: A survey

Jared Chessari,Reiichiro Kawai,Yuji Shinozaki,Toshihiro Yamada

from arxiv, 51 pages

Backward Stochastic Differential Equations (BSDEs) have been widely employed in various areas of social and natural sciences, such as the pricing and hedging of financial derivatives, stochastic optimal control problems, optimal stopping problems and gene expression. Most BSDEs cannot be solved analytically and thus numerical methods must be applied to approximate their solutions. There have been a variety of numerical methods proposed over the past few decades as well as many more currently being developed. For the most part, they exist in a complex and scattered manner with each requiring a variety of assumptions and conditions. The aim of the present work is thus to systematically survey various numerical methods for BSDEs, and in particular, compare and categorize them, for further developments and improvements. To achieve this goal, we focus primarily on the core features of each method based on an extensive collection of 333 references: the main assumptions, the numerical algorithm itself, key convergence properties and advantages and disadvantages, to provide an up-to-date coverage of numerical methods for BSDEs, with insightful summaries of each and a useful comparison and categorization.

Tensor · Optimizer · Approximation · FAST · Processing (programming language)

February 10, 2023

Fast Learnings of Coupled Nonnegative Tensor Decomposition Using Optimal Gradient and Low-rank Approximation

Xiulin Wang,Tapani Ristaniemi,Fengyu Cong

from arxiv, 12 pages, 3 figures

Nonnegative tensor decomposition has been widely applied in signal processing, neuroscience, and other fields. When it comes to group analysis of multi-block tensors, traditional tensor decomposition is insufficient to utilize the shared/similar information among tensors. In this study, we propose a coupled nonnegative CANDECOMP/PARAFAC decomposition algorithm optimized by the alternating proximal gradient method (CoNCPD-APG), which is capable of a simultaneous decomposition of tensors from different samples that are partially linked and a simultaneous extraction of common components, individual components and core tensors. Due to the low optimization efficiency brought by the nonnegative constraint and the high-dimensional nature of the data, we further propose the lraCoNCPD-APG algorithm by combining low-rank approximation and the proposed CoNCPD-APG method. When processing multi-block large-scale tensors, the proposed lraCoNCPD-APG algorithm can greatly reduce the computational load without compromising the decomposition quality. Experimental results of coupled nonnegative tensor decomposition problems designed for synthetic data, real-world face images and event-related potential data demonstrate the practicability and superiority of the proposed algorithms.

INFORMS · IR · Information retrieval · Rank · Learned

November 27, 2021

Pre-training Methods in Information Retrieval

Yixing Fan,Xiaohui Xie,Yinqiong Cai,Jia Chen,Xinyu Ma,Xiangsheng Li,Ruqing Zhang,Jiafeng Guo,Yiqun Liu

The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list in response to a user's information need. Recently, the resurgence of deep learning has greatly advanced this field and led to a hot topic named NeuIR (i.e., neural information retrieval), especially the paradigm of pre-training methods (PTMs). Owing to sophisticated pre-training objectives and huge model size, pre-trained models can learn universal language representations from massive textual data, which are beneficial to the ranking task of IR. Since a large number of works have been dedicated to the application of PTMs in IR, we believe it is the right time to summarize the current status, learn from existing methods, and gain some insights for future development. In this survey, we present an overview of PTMs applied in different components of an IR system, including the retrieval component, the re-ranking component, and other components. In addition, we introduce PTMs specifically designed for IR, and summarize available datasets as well as benchmark leaderboards. Moreover, we discuss some open challenges and envision some promising directions, with the hope of inspiring more works on these topics for future research.

Text classification · Annotation · Extensibility · state-of-the-art · Regularization term

February 15, 2021

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Yu Zhang,Zhihong Shen,Yuxiao Dong,Kuansan Wang,Jiawei Han

from arxiv, 12 pages; Accepted to WWW 2021

Multi-label text classification refers to the problem of assigning each given document its most relevant labels from the label set. Commonly, the metadata of the given documents and the hierarchy of the labels are available in real-world applications. However, most existing studies focus on only modeling the text information, with a few attempts to utilize either metadata or hierarchy signals, but not both of them. In this paper, we bridge the gap by formalizing the problem of metadata-aware text classification in a large label hierarchy (e.g., with tens of thousands of labels). To address this problem, we present the MATCH solution -- an end-to-end framework that leverages both metadata and hierarchy information. To incorporate metadata, we pre-train the embeddings of text and metadata in the same space and also leverage the fully-connected attentions to capture the interrelations between them. To leverage the label hierarchy, we propose different ways to regularize the parameters and output probability of each child label by its parents. Extensive experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH over state-of-the-art deep learning baselines.

MoDELS · CLUES · INTERACT · Graphics processing unit · Neural Networks

January 28, 2021

A Graph-based Relevance Matching Model for Ad-hoc Retrieval

Yufeng Zhang,Jinghao Zhang,Zeyu Cui,Shu Wu,Liang Wang

from arxiv, To appear at AAAI 2021

To retrieve more relevant, appropriate and useful documents given a query, finding clues about that query through the text is crucial. Recent deep learning models regard the task as a term-level matching problem, which seeks exact or similar query patterns in the document. However, we argue that they are inherently based on local interactions and do not generalise to ubiquitous, non-consecutive contextual relationships. In this work, we propose a novel relevance matching model based on graph neural networks to leverage the document-level word relationships for ad-hoc retrieval. In addition to the local interactions, we explicitly incorporate all contexts of a term through the graph-of-word text format. Matching patterns can be revealed accordingly to provide a more accurate relevance score. Our approach significantly outperforms strong baselines on two ad-hoc benchmarks. We also experimentally compare our model with BERT and show our advantages on long documents.

entity · Performer · Graph · Knowledge graph · MoDELS

June 4, 2019

Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

Deepak Nathani,Jatin Chauhan,Charu Sharma,Manohar Kaul

from arxiv, accepted as long paper in ACL 2019

The recent proliferation of knowledge graphs (KGs) coupled with incomplete or partial information, in the form of missing relations (links) between entities, has fueled a lot of research on knowledge base completion (also known as relation prediction). Several recent works suggest that convolutional neural network (CNN) based models generate richer and more expressive feature embeddings and hence also perform well on relation prediction. However, we observe that these KG embeddings treat triples independently and thus fail to cover the complex and hidden information that is inherently implicit in the local neighborhood surrounding a triple. To this effect, our paper proposes a novel attention based feature embedding that captures both entity and relation features in any given entity's neighborhood. Additionally, we also encapsulate relation clusters and multihop relations in our model. Our empirical study offers insights into the efficacy of our attention based model and we show marked performance gains in comparison to state of the art methods on all datasets.