
Performer · GPUs · Compilers · C++17 · Identifiable ·

February 9, 2024

pSTL-Bench: A Micro-Benchmark Suite for Assessing Scalability of C++ Parallel STL Implementations

Ruben Laso,Diego Krupitza,Sascha Hunold

from arxiv, 15 pages, 24 figures, 4 tables

Since the advent of parallel algorithms in the C++17 Standard Template Library (STL), the STL has become a viable framework for creating performance-portable applications. Given multiple existing implementations of the parallel algorithms, a systematic, quantitative performance comparison is essential for choosing the appropriate implementation for a particular hardware configuration. In this work, we introduce a specialized set of micro-benchmarks to assess the scalability of the parallel algorithms in the STL. By selecting different backends, our micro-benchmarks can be used on multi-core systems and GPUs. Using the suite, in a case study on AMD and Intel CPUs and NVIDIA GPUs, we were able to identify substantial performance disparities among different implementations, including GCC+TBB, GCC+HPX, Intel's compiler with TBB, or NVIDIA's compiler with OpenMP and CUDA.

Related content

Performer

Networking · GPU · Sparse · Graph · PyTorch ·

March 21, 2024

iSpLib: A Library for Accelerating Graph Neural Networks using Auto-tuned Sparse Operations

Md Saidul Hoque Anik,Pranav Badhe,Rohit Gampa,Ariful Azad

Core computations in Graph Neural Network (GNN) training and inference are often mapped to sparse matrix operations such as sparse-dense matrix multiplication (SpMM). These sparse operations are harder to optimize by manual tuning because their performance depends significantly on the sparsity of input graphs, GNN models, and computing platforms. To address this challenge, we present iSpLib, a PyTorch-based C++ library equipped with auto-tuned sparse operations. iSpLib expedites GNN training with a cache-enabled backpropagation that stores intermediate matrices in local caches. The library offers a user-friendly Python plug-in that allows users to take advantage of our optimized PyTorch operations out-of-the-box for any existing linear algebra-based PyTorch implementation of popular GNNs (Graph Convolution Network, GraphSAGE, Graph Inference Network, etc.) with only two lines of additional code. We demonstrate that iSpLib obtains up to 27x overall training speedup compared to the equivalent PyTorch 2.1.0 and PyTorch Geometric 2.4.0 implementations on the CPU. Our library is publicly available at //github.com/HipGraph/iSpLib (//doi.org/10.5281/zenodo.10806511).
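As a point of reference for the sparse-dense matrix multiplication (SpMM) mentioned in the abstract, the following is a minimal pure-Python sketch of SpMM over a CSR-encoded adjacency matrix. It is not iSpLib's implementation or API; the function name `spmm_csr` and the tiny example graph are assumptions made purely for illustration. It shows why the cost depends on the sparsity pattern: the inner work is proportional to the number of stored non-zeros per row.

```python
def spmm_csr(indptr, indices, data, dense):
    """Multiply a CSR sparse matrix (n_rows x n_cols) by a dense matrix (n_cols x k).

    indptr, indices, data are the usual CSR arrays; dense is a list of rows.
    """
    n_rows = len(indptr) - 1
    k = len(dense[0]) if dense else 0
    out = [[0.0] * k for _ in range(n_rows)]
    for row in range(n_rows):
        # Only the stored non-zeros of this row contribute to the output row.
        for pos in range(indptr[row], indptr[row + 1]):
            col = indices[pos]
            val = data[pos]
            dense_row = dense[col]
            for j in range(k):
                out[row][j] += val * dense_row[j]
    return out


# Hypothetical 3-node graph with edges 0->1, 0->2, 1->2, 2->0 (unit weights).
INDPTR = [0, 2, 3, 4]
INDICES = [1, 2, 2, 0]
DATA = [1.0, 1.0, 1.0, 1.0]
# One 2-dimensional feature vector per node.
FEATURES = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Each output row is the sum of the feature rows of that node's neighbours.
AGGREGATED = spmm_csr(INDPTR, INDICES, DATA, FEATURES)
```

In a GNN layer this product corresponds to neighbourhood aggregation; a tuned library would replace the Python loops with vectorised or generated kernels.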

MoDELS · Performer · Continuity · Robustness · Controller ·

March 21, 2024

TD-MPC2: Scalable, Robust World Models for Continuous Control

Nicklas Hansen,Hao Su,Xiaolong Wang

from arxiv, ICLR 2024. Explore videos, models, data, code, and more at //tdmpc2.com

TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. Explore videos, models, data, code, and more at //tdmpc2.com

Integer Linear Programming · Linear · Neural Networks · Networking · Optimizer ·

March 21, 2024

Learning to Solve Integer Linear Programs with Davis-Yin Splitting

Daniel McKenzie,Samy Wu Fung,Howard Heaton

In many applications, a combinatorial problem must be repeatedly solved with similar, but distinct parameters. Yet, the parameters $w$ are not directly observed; only contextual data $d$ that correlates with $w$ is available. It is tempting to use a neural network to predict $w$ given $d$. However, training such a model requires reconciling the discrete nature of combinatorial optimization with the gradient-based frameworks used to train neural networks. When the problem in question is an Integer Linear Program (ILP), one approach to overcome this training issue is to consider a continuous relaxation of the combinatorial problem. While existing methods utilizing this approach have been shown to be highly effective on small problems, they do not always scale well to large problems. In this work, we draw on ideas from modern convex optimization to design a network and training scheme which scales effortlessly to problems with thousands of variables. Our experiments verify the computational advantage our proposed method enjoys on two representative problems, namely the shortest path problem and the knapsack problem.
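To make the idea of a continuous relaxation concrete, here is a small sketch on the knapsack problem, one of the two problems named in the abstract. It does not implement Davis-Yin splitting or the authors' network; it only contrasts the integer problem (solved by dynamic programming) with its LP relaxation, where each item may be taken fractionally (solved by the standard greedy rule). The function names and the item data are hypothetical.

```python
def knapsack_integer(values, weights, capacity):
    """Optimal value of the 0/1 knapsack (integer weights) via dynamic programming."""
    best = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Iterate capacities downwards so each item is used at most once.
        for cap in range(capacity, weight - 1, -1):
            best[cap] = max(best[cap], best[cap - weight] + value)
    return best[capacity]


def knapsack_relaxed(values, weights, capacity):
    """Optimal value of the LP relaxation, where each item x_i lies in [0, 1]."""
    # Greedy by value density is optimal for the fractional knapsack.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i],
                   reverse=True)
    remaining = float(capacity)
    total = 0.0
    for i in order:
        if remaining <= 0:
            break
        fraction = min(1.0, remaining / weights[i])
        total += fraction * values[i]
        remaining -= fraction * weights[i]
    return total


# Hypothetical instance, chosen only for illustration.
VALUES = [60, 100, 120]
WEIGHTS = [10, 20, 30]
CAPACITY = 50

INTEGER_OPT = knapsack_integer(VALUES, WEIGHTS, CAPACITY)
RELAXED_OPT = knapsack_relaxed(VALUES, WEIGHTS, CAPACITY)
```

The relaxed optimum is never smaller than the integer optimum, and unlike the integer solution it varies continuously with the item values, which is what makes relaxations attractive for gradient-based training.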

CoT · MoDELS · Language Modeling · Prompt · Extensibility ·

March 21, 2024

ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting

Xiaoxue Cheng,Junyi Li,Wayne Xin Zhao,Ji-Rong Wen

from arxiv, Accepted to LREC-COLING 2024

Chain-of-Thought (CoT) prompting can enhance the reasoning capabilities of large language models (LLMs), establishing itself as a primary approach to solving complex reasoning tasks. Existing CoT synthesis approaches usually focus on simpler reasoning tasks and thus result in low-quality and inconsistent CoT prompts. In response to this challenge, we present an empirical investigation of CoT prompting and introduce CoTGenius, a novel framework designed for the automatic generation of superior CoT prompts. CoTGenius is developed based on three major evolution strategies, i.e., complicate, diversify, and specify, alongside two filtering mechanisms: evolutionary success judgement and correctness verification. We further employ CoTGenius to create an extensive CoT dataset, and subsequently fine-tune the Llama 2-Chat 7B and 13B models on this dataset. We call the resulting model ChainLM. To deal with the cumulative error issue in reasoning steps, we propose a step-level debating method, wherein multiple debaters discuss each reasoning step to arrive at the correct answer. Extensive experiments demonstrate that our ChainLM models exhibit enhanced proficiency in addressing a spectrum of complex reasoning problems compared to existing models. In addition, we conduct an in-depth analysis of the impact of data categories within CoTGenius on the model performance. We release our dataset and code at //github.com/RUCAIBox/ChainLM.

Integration · Database Administrator (DBA) · Performer · INFORMS · Decomposed ·

March 20, 2024

DBA-Fusion: Tightly Integrating Deep Dense Visual Bundle Adjustment with Multiple Sensors for Large-Scale Localization and Mapping

Yuxuan Zhou,Xingxing Li,Shengyu Li,Xuanbin Wang,Shaoquan Feng,Yuxuan Tan

Visual simultaneous localization and mapping (VSLAM) has broad applications, with state-of-the-art methods leveraging deep neural networks for better robustness and applicability. However, there is a lack of research in fusing these learning-based methods with multi-sensor information, which could be indispensable to push related applications to large-scale and complex scenarios. In this paper, we tightly integrate the trainable deep dense bundle adjustment (DBA) with multi-sensor information through a factor graph. In the framework, recurrent optical flow and DBA are performed among sequential images. The Hessian information derived from DBA is fed into a generic factor graph for multi-sensor fusion, which employs a sliding window and supports probabilistic marginalization. A pipeline for visual-inertial integration is first developed, which provides the minimum ability of metric-scale localization and mapping. Furthermore, other sensors (e.g., global navigation satellite system) are integrated for driftless and geo-referencing functionality. Extensive tests are conducted on both public datasets and self-collected datasets. The results validate the superior localization performance of our approach, which enables real-time dense mapping in large-scale environments. The code has been made open-source (//github.com/GREAT-WHU/DBA-Fusion).

Multimodal · MoDELS · State Space · Language Modeling · Multimodal Learning ·

March 20, 2024

VL-Mamba: Exploring State Space Models for Multimodal Learning

Yanyuan Qiao,Zheng Yu,Longteng Guo,Sihan Chen,Zijia Zhao,Mingzhen Sun,Qi Wu,Jing Liu

Multimodal large language models (MLLMs) have attracted widespread interest and have rich applications. However, the inherent attention mechanism in their Transformer structure requires quadratic complexity and results in expensive computational overhead. Therefore, in this work, we propose VL-Mamba, a multimodal large language model based on state space models, which have been shown to have great potential for long-sequence modeling with fast inference and linear scaling in sequence length. Specifically, we first replace the transformer-based backbone language model such as LLama or Vicuna with the pre-trained Mamba language model. Then, we empirically explore how to effectively apply the 2D vision selective scan mechanism for multimodal learning and the combinations of different vision encoders and variants of pretrained Mamba language models. The extensive experiments on diverse multimodal benchmarks with competitive performance show the effectiveness of our proposed VL-Mamba and demonstrate the great potential of applying state space models for multimodal learning tasks.

NOCS · DOS · Integration · Learning · MoDELS ·

March 20, 2024

DL2Fence: Integrating Deep Learning and Frame Fusion for Enhanced Detection and Localization of Refined Denial-of-Service in Large-Scale NoCs

Haoyu Wang,Basel Halak,Jianjie Ren,Ahmad Atamli

This study introduces a refined Flooding Injection Rate-adjustable Denial-of-Service (DoS) model for Network-on-Chips (NoCs) and, more importantly, presents DL2Fence, a novel framework utilizing Deep Learning (DL) and Frame Fusion (2F) for DoS detection and localization. Two Convolutional Neural Network models for classification and segmentation were developed to detect and localize DoS respectively. It achieves detection and localization accuracies of 95.8% and 91.7%, and precision rates of 98.5% and 99.3% in a 16x16 mesh NoC. The framework's hardware overhead notably decreases by 76.3% when scaling from 8x8 to 16x16 NoCs, and it requires 42.4% less hardware compared to state-of-the-art approaches. This advancement demonstrates DL2Fence's effectiveness in balancing outstanding detection performance in large-scale NoCs with extremely low hardware overhead.

Learning · Machine Learning · INTERACT · Graph · INFORMS ·

May 27, 2021

Graph-Based Deep Learning for Medical Diagnosis and Analysis: Past, Present and Future

David Ahmedt-Aristizabal,Mohammad Ali Armin,Simon Denman,Clinton Fookes,Lars Petersson

With the advances of data-driven machine learning research, a wide variety of prediction problems have been tackled. It has become critical to explore how machine learning and specifically deep learning methods can be exploited to analyse healthcare data. A major limitation of existing methods has been the focus on grid-like data; however, the structure of physiological recordings is often irregular and unordered, which makes it difficult to conceptualise them as a matrix. As such, graph neural networks have attracted significant attention by exploiting implicit information that resides in a biological system, with interactive nodes connected by edges whose weights can be either temporal associations or anatomical junctions. In this survey, we thoroughly review the different types of graph architectures and their applications in healthcare. We provide an overview of these methods in a systematic manner, organized by their domain of application including functional connectivity, anatomical structure and electrical-based analysis. We also outline the limitations of existing techniques and discuss potential directions for future research.

Learning · Deep Learning · Identifiable · MoDELS · Object Tracking ·

July 31, 2019

Deep Learning in Video Multi-Object Tracking: A Survey

Gioele Ciaparrone,Francisco Luque Sánchez,Siham Tabik,Luigi Troiano,Roberto Tagliaferri,Francisco Herrera

from arxiv, New in v2: corrected typos and various minor mistakes. Submitted to Neurocomputing. Main text: 25 pages, 5 figures, 6 tables. Summary table in appendix at the end of the paper

The problem of Multiple Object Tracking (MOT) consists in following the trajectory of different objects in a sequence, usually a video. In recent years, with the rise of Deep Learning, the algorithms that provide a solution to this problem have benefited from the representational power of deep models. This paper provides a comprehensive survey on works that employ Deep Learning models to solve the task of MOT on single-camera videos. Four main steps in MOT algorithms are identified, and an in-depth review of how Deep Learning was employed in each one of these stages is presented. A complete experimental comparison of the presented works on the three MOTChallenge datasets is also provided, identifying a number of similarities among the top-performing methods and presenting some possible future research directions.

Discriminator · Performer · Dimensionality Reduction · Convolutional Neural Network · Multi-Task Learning ·

January 25, 2018

NDDR-CNN: Layer-wise Feature Fusing in Multi-Task CNN by Neural Discriminative Dimensionality Reduction

Yuan Gao,Qi She,Jiayi Ma,Mingbo Zhao,Wei Liu,Alan L. Yuille

from arxiv, 11 pages, 5 figures, 7 tables

State-of-the-art Convolutional Neural Networks (CNNs) benefit greatly from multi-task learning (MTL), which learns multiple related tasks simultaneously to obtain shared or mutually related representations for different tasks. The most widely-used MTL CNN structure is based on an empirical or heuristic split on a specific layer (e.g., the last convolutional layer) to minimize different task-specific losses. However, this heuristic sharing/splitting strategy may be harmful to the final performance of one or multiple tasks. In this paper, we propose a novel CNN structure for MTL, which enables automatic feature fusing at every layer. Specifically, we first concatenate features from different tasks according to their channel dimension, and then formulate the feature fusing problem as discriminative dimensionality reduction. We show that this discriminative dimensionality reduction can be done by 1x1 Convolution, Batch Normalization, and Weight Decay in one CNN, which we refer to as Neural Discriminative Dimensionality Reduction (NDDR). We perform ablation analysis in detail for different configurations in training the network. The experiments carried out on different network structures and different task sets demonstrate the promising performance and desirable generalizability of our proposed method.
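The two operations named in the abstract, channel-wise concatenation followed by a 1x1 convolution, can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the authors' NDDR layer: it omits Batch Normalization and Weight Decay, and the helper names, feature maps, and mixing weights are all hypothetical. Feature maps are stored as nested lists indexed `[channel][row][column]`.

```python
def concat_channels(features_a, features_b):
    """Concatenate two feature maps along the channel dimension."""
    return features_a + features_b


def conv1x1(features, weight):
    """Apply a 1x1 convolution: a per-pixel linear map across channels.

    features: [in_channels][height][width]
    weight:   [out_channels][in_channels]
    """
    height = len(features[0])
    width = len(features[0][0])
    return [
        [
            [
                sum(w * features[c][h][x] for c, w in enumerate(out_weights))
                for x in range(width)
            ]
            for h in range(height)
        ]
        for out_weights in weight
    ]


# Hypothetical single-channel 2x2 feature maps from two task branches.
TASK_A = [[[1.0, 2.0], [3.0, 4.0]]]
TASK_B = [[[10.0, 20.0], [30.0, 40.0]]]

# Fuse both branches, then reduce back to one channel for task A.
# The 0.5/0.5 mixing weights are made up; in training they would be learned.
FUSED = concat_channels(TASK_A, TASK_B)
TASK_A_NEXT = conv1x1(FUSED, [[0.5, 0.5]])
```

Because the 1x1 kernel has no spatial extent, the output at each pixel is just a weighted combination of the concatenated channels at that pixel, which is why it can serve as a learned dimensionality reduction.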
