
Nanopore sequencing is a widely used high-throughput genome sequencing technology that can sequence long fragments of a genome into raw electrical signals at low cost. Nanopore sequencing requires two computationally costly processing steps for accurate downstream genome analysis. The first step, basecalling, translates the raw electrical signals into nucleotide bases (i.e., A, C, G, T). The second step, read mapping, finds the correct location of a read in a reference genome. In existing genome analysis pipelines, basecalling and read mapping are executed separately. We observe in this work that such separate execution of the two most time-consuming steps inherently leads to (1) significant data movement and (2) redundant computations on the data, slowing down the genome analysis pipeline. This paper proposes GenPIP, an in-memory genome analysis accelerator that tightly integrates basecalling and read mapping. GenPIP improves the performance of the genome analysis pipeline with two key mechanisms: (1) in-memory fine-grained collaborative execution of the major genome analysis steps in parallel; (2) a new technique for early rejection of low-quality and unmapped reads that stops the execution of genome analysis for such reads promptly, reducing inefficient computation. Our experiments show that, for the execution of the genome analysis pipeline, GenPIP provides 41.6X (8.4X) speedup and 32.8X (20.8X) energy savings with negligible accuracy loss compared to the state-of-the-art software genome analysis tools executed on a state-of-the-art CPU (GPU). Compared to a design that combines state-of-the-art in-memory basecalling and read mapping accelerators, GenPIP provides 1.39X speedup and 1.37X energy savings.
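The early-rejection idea in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the chunked processing, the quality threshold, the warm-up length, and the `toy_basecaller` stub are hypothetical and are not taken from GenPIP. The sketch only shows the general pattern of abandoning a read as soon as its running quality looks too low, so that the remaining chunks are never processed.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical illustration only: the threshold, warm-up length, and
# basecaller stub are assumptions, not values or interfaces from GenPIP.
QUALITY_THRESHOLD = 7.0        # assumed minimum mean quality score
MIN_CHUNKS_BEFORE_REJECT = 2   # assumed warm-up before a read may be rejected


def process_read(
    signal_chunks: List[List[float]],
    basecall_chunk: Callable[[List[float]], Tuple[str, float]],
) -> Tuple[Optional[str], int]:
    """Basecall a read chunk by chunk and stop early if quality is too low.

    Returns (sequence, chunks_processed); sequence is None if rejected.
    """
    bases = []
    quality_sum = 0.0
    for i, chunk in enumerate(signal_chunks, start=1):
        seq, quality = basecall_chunk(chunk)
        bases.append(seq)
        quality_sum += quality
        # Reject once the running mean quality drops below the threshold.
        if i >= MIN_CHUNKS_BEFORE_REJECT and quality_sum / i < QUALITY_THRESHOLD:
            return None, i  # the remaining chunks are skipped
    return "".join(bases), len(signal_chunks)


def toy_basecaller(chunk: List[float]) -> Tuple[str, float]:
    # Toy stand-in: the mean signal value plays the role of a quality score.
    return "ACGT", sum(chunk) / len(chunk)
```

With the toy basecaller, a read whose chunks average below the threshold is dropped after two of its four chunks, while a high-quality read is processed in full.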

Related content


Identifiable · Analysis · MoDELS · Inference · Moments

October 27, 2022

Efficient inference and identifiability analysis for differential equation models with random parameters

Alexander P. Browning,Christopher Drovandi,Ian W. Turner,Adrianne L. Jenner,Matthew J. Simpson

from arxiv, Minor changes to text. Additional results in supplementary material. Additional statistics regarding results given in main and supplementary material

Heterogeneity is a dominant factor in the behaviour of many biological processes. Despite this, it is common for mathematical and statistical analyses to ignore biological heterogeneity as a source of variability in experimental data. Therefore, methods for exploring the identifiability of models that explicitly incorporate heterogeneity through variability in model parameters are relatively underdeveloped. We develop a new likelihood-based framework, based on moment matching, for inference and identifiability analysis of differential equation models that capture biological heterogeneity through parameters that vary according to probability distributions. As our novel method is based on an approximate likelihood function, it is highly flexible; we demonstrate identifiability analysis using both a frequentist approach based on profile likelihood, and a Bayesian approach based on Markov-chain Monte Carlo. Through three case studies, we demonstrate our method by providing a didactic guide to inference and identifiability analysis of hyperparameters that relate to the statistical moments of model parameters from independent observed data. Our approach has a computational cost comparable to analysis of models that neglect heterogeneity, a significant improvement over many existing alternatives. We demonstrate how analysis of random parameter models can aid better understanding of the sources of heterogeneity from biological data.

Point clouds · Boosting (a technique for accelerating model training) · Less · motivation · state-of-the-art

October 27, 2022

Boosting Point Clouds Rendering via Radiance Mapping

Xiaoyang Huang,Yi Zhang,Bingbing Ni,Teng Li,Kai Chen,Wenjun Zhang

In recent years, we have witnessed rapid development in NeRF-based image rendering due to its high quality. However, point cloud rendering is somewhat less explored. Compared to NeRF-based rendering, which suffers from dense spatial sampling, point cloud rendering is naturally less computation intensive, which enables its deployment on mobile computing devices. In this work, we focus on boosting the image quality of point cloud rendering with a compact model design. We first analyze the adaptation of the volume rendering formulation to point clouds. Based on the analysis, we simplify the NeRF representation to a spatial mapping function which requires only a single evaluation per pixel. Further, motivated by ray marching, we rectify the noisy raw point clouds to the estimated intersection between rays and surfaces as queried coordinates, which could avoid spatial frequency collapse and neighbor point disturbance. Composed of rasterization, spatial mapping and refinement stages, our method achieves state-of-the-art performance on point cloud rendering, outperforming prior works by notable margins, with a smaller model size. We obtain a PSNR of 31.74 on NeRF-Synthetic, 25.88 on ScanNet and 30.81 on DTU. Code and data will be released soon.

Ground truth · Analysis · binary · TOOLS · Confidence

October 26, 2022

The Inconvenient Truths of Ground Truth for Binary Analysis

Jim Alves-Foss,Varsah Venugopal

The effectiveness of binary analysis tools and techniques is often measured with respect to how well they map to a ground truth. We have found that not all ground truths are created equal. This paper challenges the binary analysis community to take a long look at the concept of ground truth, to ensure that we are in agreement with definition(s) of ground truth, so that we can be confident in the evaluation of tools and techniques. This becomes even more important as we move to trained machine learning models, which are only as useful as the validity of the ground truth in the training.

Likelihood · Inference · Learning · MoDELS · Maximum likelihood

October 26, 2022

Maximum Likelihood Learning of Energy-Based Models for Simulation-Based Inference

Pierre Glaser,Michael Arbel,Arnaud Doucet,Arthur Gretton

We introduce two synthetic likelihood methods for Simulation-Based Inference (SBI), to conduct either amortized or targeted inference from experimental observations when a high-fidelity simulator is available. Both methods learn a conditional energy-based model (EBM) of the likelihood using synthetic data generated by the simulator, conditioned on parameters drawn from a proposal distribution. The learned likelihood can then be combined with any prior to obtain a posterior estimate, from which samples can be drawn using MCMC. Our methods uniquely combine a flexible Energy-Based Model and the minimization of a KL loss: this is in contrast to other synthetic likelihood methods, which either rely on normalizing flows or minimize score-based objectives, choices that come with known pitfalls. Our first method, Amortized Unnormalized Neural Likelihood Estimation (AUNLE), introduces a tilting trick during training that makes it possible to significantly lower the computational cost of inference by enabling the use of efficient MCMC techniques. Our second method, Sequential UNLE (SUNLE), employs a robust doubly intractable approach in order to re-use simulation data and improve posterior accuracy on a specific dataset. We demonstrate the properties of both methods on a range of synthetic datasets, and apply them to a neuroscience model of the pyloric network in the crab Cancer borealis, matching the performance of other synthetic likelihood methods at a fraction of the simulation budget.

Subspace · Nonconvex · PCA · Linear · Optimizer

October 25, 2022

Local Linear Convergence of Gradient Methods for Subspace Optimization via Strict Complementarity

Dan Garber,Ron Fisher

from arxiv, In Neural Information Processing Systems (NeurIPS) 2022

We consider optimization problems in which the goal is to find a $k$-dimensional subspace of $\mathbb{R}^n$, $k \ll n$, which minimizes a convex and smooth loss. Such problems generalize the fundamental task of principal component analysis (PCA) to include robust and sparse counterparts, and logistic PCA for binary data, among others. This problem could be approached either via nonconvex gradient methods with highly efficient iterations, for which arguing about fast convergence to a global minimizer is difficult, or via a convex relaxation, for which arguing about convergence to a global minimizer is straightforward but the corresponding methods are often inefficient in high dimensions. In this work we bridge these two approaches under a strict complementarity assumption, which in particular implies that the optimal solution to the convex relaxation is unique and is also the optimal solution to the original nonconvex problem. Our main result is a proof that a natural nonconvex gradient method which is \textit{SVD-free} and requires only a single QR-factorization of an $n\times k$ matrix per iteration, converges locally with a linear rate. We also establish linear convergence results for the nonconvex projected gradient method, and the Frank-Wolfe method when applied to the convex relaxation.
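To make the "gradient step followed by a single QR factorization" pattern concrete, here is a minimal sketch for the plain PCA instance, where the objective is to maximize $\mathrm{trace}(Q^\top A Q)$ over $n\times k$ matrices $Q$ with orthonormal columns. This is an illustration under stated assumptions, not necessarily the exact method analyzed in the paper: the fixed step size, the Gram-Schmidt routine standing in for QR, and all function names are hypothetical choices.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product for small list-of-lists matrices."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def qr_orthonormalize(y: Matrix) -> Matrix:
    """Q factor of a thin QR of y, computed by modified Gram-Schmidt."""
    n, k = len(y), len(y[0])
    q_cols: List[List[float]] = []
    for j in range(k):
        v = [y[i][j] for i in range(n)]
        for q in q_cols:
            dot = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - dot * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        q_cols.append([vi / norm for vi in v])
    return [[q_cols[j][i] for j in range(k)] for i in range(n)]


def subspace_gradient_pca(a: Matrix, q0: Matrix, step: float, iters: int) -> Matrix:
    """Gradient ascent on trace(Q^T A Q) with one QR per iteration, no SVD."""
    q = qr_orthonormalize(q0)
    for _ in range(iters):
        grad = matmul(a, q)  # A Q is the gradient up to a constant factor
        y = [[q[i][j] + step * grad[i][j] for j in range(len(q[0]))]
             for i in range(len(q))]
        q = qr_orthonormalize(y)  # map back to orthonormal columns
    return q
```

For a symmetric positive semidefinite `a` and a generic starting point, the columns of the returned matrix approach a basis of the top-$k$ eigenspace. On a diagonal matrix with entries 5, 3, 1 and $k = 2$, the third row of the result shrinks toward zero.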

INFORMS · Cluster · Clustering methods · Vectorization · Reducible

October 25, 2022

Clustering of Threat Information to Mitigate Information Overload for Computer Emergency Response Teams

Philipp Kuehn,Moritz Kerk,Marc Wendelborn,Christian Reuter

from arxiv, 12 pages, 7 figures

The constantly increasing number of threats and the existing diversity of information sources pose challenges for Computer Emergency Response Teams (CERTs). In order to respond to new threats, CERTs need to gather information in a timely and comprehensive manner. However, the volume of information and sources can lead to information overload. This paper answers the question of how to reduce information overload for CERTs with the help of clustering methods. Conditions for such a framework were established and subsequently tested. In order to perform an evaluation, different types of evaluation metrics were introduced and selected in relation to the framework conditions. Furthermore, different vectorizations and distance measures in combination with the clustering methods were evaluated and interpreted. Two different ground-truth datasets were used for the evaluation, one containing threat messages and a dataset with messages from different news categories. The work shows that the K-means clustering method along with TF-IDF vectorization and cosine distance provide the best results in the domain of threat messages.
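The combination named in the abstract above (K-means, TF-IDF vectorization, cosine distance) can be sketched in a few lines of standard-library Python. This is a toy illustration, not the authors' pipeline: the whitespace tokenizer, the smoothed IDF formula, the fixed initial centroids, and the function names are all assumptions. Because every vector is scaled to unit length, assigning a message to the centroid with the highest dot product is the same as minimizing cosine distance.

```python
import math
from collections import Counter
from typing import Dict, List

SparseVec = Dict[str, float]


def tfidf_vectors(docs: List[str]) -> List[SparseVec]:
    """Unit-length TF-IDF vectors as sparse dicts (whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumed, commonly used variant).
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u: SparseVec, v: SparseVec) -> float:
    # Both inputs are unit length, so the dot product is the cosine similarity.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def kmeans_cosine(vectors: List[SparseVec], init_indices: List[int],
                  iters: int = 10) -> List[int]:
    """K-means with cosine similarity; centroids start at the given documents."""
    centroids = [dict(vectors[i]) for i in init_indices]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        for c in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            total: Counter = Counter()
            for v in members:
                for t, w in v.items():
                    total[t] += w
            norm = math.sqrt(sum(w * w for w in total.values()))
            centroids[c] = {t: w / norm for t, w in total.items()}
    return labels
```

A production setup would normally use a library implementation with a proper tokenizer and randomized initialization; the fixed `init_indices` here only keeps the example deterministic.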

QUIC · Reducible · Guidance · CASES · Comprehensibility

October 24, 2022

Technical Report: Implementation of Single Packet Number Space in Multi-Path QUIC

Yingqi Tang,Yunfei Ma,Yanmei Liu

from arxiv, 6 pages

Over the past few years, we have witnessed increasing interest in the use cases of multi-path QUIC from both industry and academia. For example, Alibaba deployed XLINK, a QoE-driven multi-path QUIC solution, in Taobao short video and showed benefits in both reduced tail latency and video re-buffering. For the time being, the multi-path QUIC protocol is in the process of standardization at the IETF QUIC working group, with the draft recently updated to version 02. The focus of the draft is to provide basic guidance on the implementation so that we can encourage more exploration, testing, and finally, an accelerated adoption of this technology. However, draft-02 has brought up an open issue on whether multi-path QUIC should be implemented using a single packet number space (SPNS) or multiple packet number spaces (MPNS), as both options co-exist in the current draft. Knowing that one cannot draw a solid conclusion without experiments, we implemented both SPNS and MPNS at Alibaba and measured their performance. The goal is to help the community better understand the implications, and we hope this report can be a useful resource for engineers and researchers who are interested in deploying multi-path QUIC.
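The difference between the two numbering schemes can be shown with a small sketch. The class and function names below are hypothetical and are not taken from the IETF draft or from the XLINK implementation; the sketch models only how packet numbers are assigned on the sender side and ignores acknowledgments, loss recovery, and encryption.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


class SinglePacketNumberSpace:
    """SPNS: all paths draw packet numbers from one shared counter."""

    def __init__(self) -> None:
        self._next = 0

    def next_packet_number(self, path_id: Hashable) -> int:
        pn = self._next
        self._next += 1
        return pn


class MultiplePacketNumberSpaces:
    """MPNS: each path keeps its own independent counter."""

    def __init__(self) -> None:
        self._next: Dict[Hashable, int] = defaultdict(int)

    def next_packet_number(self, path_id: Hashable) -> int:
        pn = self._next[path_id]
        self._next[path_id] += 1
        return pn


def number_packets(space, path_ids: List[Hashable]) -> List[Tuple[Hashable, int]]:
    """Return (path_id, packet_number) pairs in send order."""
    return [(p, space.next_packet_number(p)) for p in path_ids]
```

When packets alternate between two paths, the single-space variant leaves gaps in the numbers observed on each individual path, whereas the multiple-space variant gives every path a contiguous sequence starting at 0.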

MoDELS · Transformer models · Transformation · Inference · Model evaluation

June 23, 2020

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Zhuohan Li,Eric Wallace,Sheng Shen,Kevin Lin,Kurt Keutzer,Dan Klein,Joseph E. Gonzalez

from arxiv, ICML 2020

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is to counterintuitively train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.
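For readers unfamiliar with the two compression techniques the abstract mentions, the following sketch shows them in their simplest form: magnitude pruning and symmetric uniform quantization applied to a flat list of weights. It is a generic illustration under assumed conventions (global magnitude ranking, one scale for the whole list) and does not reproduce the paper's experimental setup.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_uniform(weights: List[float], bits: int) -> List[float]:
    """Symmetric uniform quantization; returns the dequantized float values."""
    levels = 2 ** (bits - 1) - 1          # e.g. 3 positive levels for 3 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / levels              # one scale shared by all weights
    return [round(w / scale) * scale for w in weights]
```

Pruning reduces the number of nonzero weights, while quantization reduces the number of distinct values each weight can take; the two can be applied together.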

Sentiment analysis · MoDELS · Recurrent neural networks · entity · Neural Networks

June 8, 2018

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Ethem F. Can,Aysu Ezen-Can,Fazli Can

from arxiv, ACM SIGIR 2018 Workshop on Learning from Limited or Noisy Data (LND4IR'18)

Sentiment analysis is a widely studied NLP task where the goal is to determine opinions, emotions, and evaluations of users towards a product, an entity or a service that they are reviewing. One of the biggest challenges for sentiment analysis is that it is highly language dependent. Word embeddings, sentiment lexicons, and even annotated data are language specific. Further, optimizing models for each language is very time consuming and labor intensive, especially for recurrent neural network models. From a resource perspective, it is very challenging to collect data for different languages. In this paper, we look for an answer to the following research question: can a sentiment analysis model trained on one language be reused for sentiment analysis in other languages (Russian, Spanish, Turkish, and Dutch) where the data is more limited? Our goal is to build a single model in the language with the largest dataset available for the task, and reuse it for languages that have limited resources. For this purpose, we train a sentiment analysis model using recurrent neural networks with reviews in English. We then translate reviews in other languages and reuse this model to evaluate the sentiments. Experimental results show that our robust approach of a single model trained on English reviews statistically significantly outperforms the baselines in several different languages.

Vectorization · Graph · Knowledge graph · Principle · Scenario

May 26, 2018

From Knowledge Graph Embedding to Ontology Embedding: Region Based Representations of Relational Structures

Víctor Gutiérrez-Basulto,Steven Schockaert

Recent years have witnessed the enormous success of low-dimensional vector space representations of knowledge graphs to predict missing facts or find erroneous ones. Currently, however, it is not yet well-understood how ontological knowledge, e.g. given as a set of (existential) rules, can be embedded in a principled way. To address this shortcoming, in this paper we introduce a framework based on convex regions, which can faithfully incorporate ontological knowledge into the vector space embedding. Our technical contribution is two-fold. First, we show that some of the most popular existing embedding approaches are not capable of modelling even very simple types of rules. Second, we show that our framework can represent ontologies that are expressed using so-called quasi-chained existential rules in an exact way, such that any set of facts which is induced using that vector space embedding is logically consistent and deductively closed with respect to the input ontology.