
Physics-based simulation has been actively employed in generating offline visual effects in the film and animation industry. However, the computations required for high-quality scenarios are generally immense, deterring its adoption in real-time applications, e.g., virtual production, avatar live-streaming, and cloud gaming. Through extensive investigation and comprehension of modern GPU architecture, we summarize the principles that can accelerate the computation pipeline on single-GPU and multi-GPU platforms. We further demonstrate the effectiveness of these principles by applying them to the material point method to build our framework, which achieves $1.7\times$--$8.6\times$ speedup on a single GPU and $2.5\times$--$14.8\times$ on four GPUs compared to the state-of-the-art. Our pipeline is specifically designed for real-time applications (i.e., scenarios with small to medium particle counts) and achieves significant multi-GPU efficiency. We demonstrate our pipeline by simulating a snow scenario with 1.33M particles and a fountain scenario with 143K particles in real time (on average, 68.5 and 55.9 frames per second, respectively) on four NVIDIA Tesla V100 GPUs interconnected with NVLink.
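To make the pipeline stages concrete, here is a minimal NumPy sketch of one explicit material point method substep (particle-to-grid scatter, grid update, grid-to-particle gather). It is only a schematic of the stages such a framework accelerates, not the paper's implementation; kernel fusion, particle sorting, and multi-GPU halo exchange are precisely what the framework layers on top of this structure.

```python
# Schematic of one explicit MPM substep (P2G -> grid update -> G2P)
# with linear (CIC) transfers. Assumes 2D, no stress term, and
# particles that stay inside the grid interior.
import numpy as np

def mpm_substep(x, v, m, grid_res, dx, dt, gravity=-9.8):
    grid_m = np.zeros((grid_res, grid_res))
    grid_mv = np.zeros((grid_res, grid_res, 2))

    base = (x / dx).astype(int)   # lower-left grid node of each particle
    frac = x / dx - base          # fractional position inside the cell
    for di in (0, 1):             # scatter mass/momentum to 4 nodes (P2G)
        for dj in (0, 1):
            w = np.abs(1 - di - frac[:, 0]) * np.abs(1 - dj - frac[:, 1])
            idx = (base[:, 0] + di, base[:, 1] + dj)
            np.add.at(grid_m, idx, w * m)
            np.add.at(grid_mv, idx, (w * m)[:, None] * v)

    occupied = grid_m > 0         # grid update: momentum -> velocity
    grid_v = np.zeros_like(grid_mv)
    grid_v[occupied] = grid_mv[occupied] / grid_m[occupied, None]
    grid_v[..., 1] += dt * gravity

    v_new = np.zeros_like(v)      # gather back to particles (G2P)
    for di in (0, 1):
        for dj in (0, 1):
            w = np.abs(1 - di - frac[:, 0]) * np.abs(1 - dj - frac[:, 1])
            v_new += w[:, None] * grid_v[base[:, 0] + di, base[:, 1] + dj]
    return x + dt * v_new, v_new  # advected positions, updated velocities
```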

Related Content


Accurate and efficient prediction of soft tissue temperatures is essential to computer-assisted treatment systems for thermal ablation. It can be used to predict tissue temperatures and ablation volumes for personalised treatment planning and image-guided intervention. Numerically, it requires full nonlinear modelling of the coupled computational bioheat transfer and biomechanics, together with efficient solution procedures; however, existing studies have considered bioheat analysis alone or coupled linear analysis, without fully coupled nonlinear analysis. We present a coupled thermo-visco-hyperelastic finite element algorithm, based on finite-strain thermoelasticity and total Lagrangian explicit dynamics. It performs the coupled nonlinear analysis of (i) bioheat transfer under soft tissue deformations and (ii) soft tissue deformations due to thermal expansion/shrinkage. The presented method accounts for anisotropic, finite-strain, temperature-dependent, thermal, and viscoelastic behaviours of soft tissues, and it is implemented using GPU acceleration for real-time computation. We also demonstrate the translational benefits of the presented method for clinical applications using a simulation of thermal ablation in the liver. The key advantage of the presented method is that it enables full nonlinear modelling of the anisotropic, finite-strain, temperature-dependent, thermal, and viscoelastic behaviours of soft tissues, instead of the linear elastic, linear viscoelastic, and thermal-only modelling in existing methods. It also provides high computational speeds for computer-assisted treatment systems, towards enabling the operator to simulate thermal ablation accurately and visualise tissue temperatures and ablation zones immediately.
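As a rough illustration of the coupling structure described above, the sketch below shows one explicit step that first advances temperature and then feeds the resulting thermal strain into a central-difference displacement update. All function arguments (internal_forces, bioheat_rhs, etc.) are hypothetical placeholders; the paper's anisotropic, temperature-dependent visco-hyperelastic constitutive model would be evaluated inside internal_forces().

```python
# One coupled explicit step: forward-Euler bioheat update, then a
# central-difference (TLED-style) displacement update driven by the
# resulting thermal strain. Field names and callables are placeholders
# standing in for the paper's element-level routines.
import numpy as np

def coupled_step(u, u_prev, T, dt, M_lumped, C_lumped, alpha,
                 internal_forces, external_forces, bioheat_rhs):
    # (i) bioheat transfer: T_{n+1} = T_n + dt * C^{-1} q(T, u)
    T_new = T + dt * bioheat_rhs(T, u) / C_lumped

    # thermal expansion/shrinkage enters as a thermal strain increment
    eps_th = alpha * (T_new - T)

    # (ii) deformation: u_{n+1} = 2u_n - u_{n-1}
    #                           + dt^2 M^{-1} (f_ext - f_int(u, eps_th))
    f = external_forces() - internal_forces(u, eps_th)
    u_new = 2.0 * u - u_prev + dt ** 2 * f / M_lumped
    return u_new, u, T_new        # shift state for the next step
```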

The amount of data generated by numerical simulations in various scientific domains, such as molecular dynamics, climate modeling, biology, or astrophysics, has led to a fundamental redesign of application workflows. The throughput and capacity of storage subsystems have not evolved as fast as the computing power of extreme-scale supercomputers. As a result, the classical post-hoc analysis of simulation outputs has become highly inefficient. In-situ workflows have emerged as a solution in which simulation and data analytics are intertwined through shared computing resources, and thus lower latencies. Determining the best allocation (i.e., how many resources to allocate to each component of an in-situ workflow) and mapping (i.e., where and at which frequency to run the data analytics component) is a complex task whose performance assessment is crucial to the efficient execution of in-situ workflows. However, such a performance evaluation of different allocation and mapping strategies usually relies either on directly running them on the targeted execution environments, which can rapidly become extremely time- and resource-consuming, or on simulating simplified models of the components of an in-situ workflow, which can lack realism. In both cases, the validity of the performance evaluation is limited. To address this issue, we introduce SIM-SITU, a framework for the faithful simulation of in-situ workflows. This framework builds on the SimGrid toolkit and benefits from several important features of this versatile simulation tool. We designed SIM-SITU to reflect the typical structure of in-situ workflows, and thanks to its modular design, SIM-SITU has the necessary flexibility to easily and faithfully evaluate the behavior and performance of various allocation and mapping strategies for in-situ workflows. We illustrate the simulation capabilities of SIM-SITU on a molecular dynamics use case. We study the impact of different allocation and mapping strategies on performance and show how users can leverage SIM-SITU to identify interesting tradeoffs when designing their in-situ workflows.
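To see why allocation and mapping matter, consider a back-of-the-envelope model of the two classic mappings (analysis in-line on the simulation's cores versus on dedicated helper cores). This toy calculation is purely illustrative and is unrelated to SIM-SITU's SimGrid-based API; it merely shows the kind of tradeoff the framework evaluates faithfully.

```python
# Toy makespan model for two classic in-situ mappings. Illustrative
# numbers and strategy names only; not SIM-SITU's model or API.
def makespan(n_iters, t_sim, t_ana, strategy, freq=1):
    """Idealized makespan (s): t_sim per simulation iteration, t_ana per
    analyzed snapshot, one snapshot analyzed every `freq` iterations."""
    n_ana = n_iters // freq
    if strategy == "time-partitioned":    # analytics borrows the cores
        return n_iters * t_sim + n_ana * t_ana
    if strategy == "space-partitioned":   # dedicated helper cores:
        return max(n_iters * t_sim, n_ana * t_ana)  # slower side dominates
    raise ValueError(strategy)

for s in ("time-partitioned", "space-partitioned"):
    print(s, makespan(n_iters=1000, t_sim=0.8, t_ana=1.5, strategy=s, freq=10))
```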

Real-time analysis of bio-heat transfer is very beneficial in improving clinical outcomes of hyperthermia and thermal ablative treatments, but is challenging to achieve due to large computational costs. This paper presents a fast numerical algorithm well suited for real-time solutions of bio-heat transfer. It achieves real-time computation via (i) computationally efficient explicit dynamics in the temporal domain, (ii) element-level thermal load computation, (iii) computationally efficient finite elements, (iv) an explicit formulation for the unknown nodal temperature, and (v) pre-computation of constant simulation matrices and parameters, all of which lead to a significant reduction in run-time computation. The proposed methodology considers temperature-dependent thermal properties to capture the nonlinear characteristics of bio-heat transfer in soft tissue. Using parallel execution, the proposed method reduces computation time by factors of 107.71 and 274.57 compared to commercial finite element codes with and without parallelisation, respectively, when temperature-dependent thermal properties are considered, and by factors of 303.07 and 772.58 when temperature-independent thermal properties are considered. This far exceeds the computational performance of the commercial finite element codes and presents great potential for real-time predictive analysis of tissue temperature in the planning, optimisation, and evaluation of thermo-therapeutic treatments. The source code is available at //github.com/jinaojakezhang/FEDFEMBioheat.
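A minimal sketch of points (i)-(v), assuming a lumped-capacitance finite element mesh: the explicit forward-Euler update below needs no linear solve, accumulates the thermal load element by element, and evaluates a temperature-dependent conductivity k(T). The data layout and helper names (elements, k_of_T) are illustrative assumptions, not the released code's API.

```python
# Explicit, solver-free bioheat step on a lumped-capacitance FE mesh.
# `elements` yields (node indices, gradient operator, volume) per
# element; k_of_T is a temperature-dependent conductivity.
import numpy as np

def explicit_bioheat_step(T, dt, C_lumped, elements, k_of_T,
                          q_metabolic, perfusion, T_blood):
    R = np.zeros_like(T)                      # global thermal load
    for nodes, grad_op, vol in elements:      # (ii) element-level loads
        kT = k_of_T(T[nodes].mean())          # nonlinear conductivity
        g = grad_op @ T[nodes]                # element temperature gradient
        R[nodes] -= vol * kT * (grad_op.T @ g)  # conduction contribution
    R += perfusion * (T_blood - T) + q_metabolic  # Pennes source terms
    return T + dt * R / C_lumped              # (i)+(iv): explicit update
```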

Increasingly stringent throughput and latency requirements in datacenter networks demand fast and accurate congestion control. We observe that the reaction time and accuracy of existing datacenter congestion control schemes are inherently limited. They either rely only on explicit feedback about the network state (e.g., queue lengths in DCTCP) or only on variations of state (e.g., RTT gradient in TIMELY). To overcome these limitations, we propose a novel congestion control algorithm, PowerTCP, which achieves much more fine-grained congestion control by adapting to the bandwidth-window product (henceforth called power). PowerTCP leverages in-band network telemetry to react to changes in the network instantaneously without loss of throughput and while keeping queues short. Due to its fast reaction time, our algorithm is particularly well-suited for dynamic network environments and bursty traffic patterns. We show analytically and empirically that PowerTCP can significantly outperform the state-of-the-art in both traditional datacenter topologies and emerging reconfigurable datacenters where frequent bandwidth changes make congestion control challenging. In traditional datacenter networks, PowerTCP reduces tail flow completion times of short flows by 80% compared to DCQCN and TIMELY, and by 33% compared to HPCC even at 60% network load. In reconfigurable datacenters, PowerTCP achieves 85% circuit utilization without incurring additional latency and cuts tail latency by at least 2x compared to existing approaches.
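The following toy controller illustrates the core intuition only: reacting to the product of the network state (queue plus BDP) and its rate of change, rather than to either signal alone. The published PowerTCP update rule and its in-band telemetry measurements differ in detail; treat this purely as a schematic.

```python
# Toy window controller reacting to the product of network state
# (queue + BDP, the "voltage") and its rate of change (the "current").
# Not the published PowerTCP algorithm.
def power_window_update(cwnd, bdp, queue, queue_prev, dt, base_rtt,
                        gamma=0.9):
    voltage = queue + bdp
    current = (queue - queue_prev) / dt + bdp / base_rtt
    power = voltage * current                 # combines state + gradient
    power_base = bdp * bdp / base_rtt         # equilibrium (empty queue)
    target = cwnd * power_base / max(power, 1e-9)
    return gamma * target + (1 - gamma) * cwnd  # smoothed update
```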

The cloud native computing paradigm allows microservice-based applications to take advantage of cloud infrastructure in a scalable, reusable, and interoperable way. However, in a cloud native system, the vast number of configuration parameters and highly granular resource allocation policies can significantly impact performance and deployment cost. To understand and analyze these implications in an easy, quick, and cost-effective way, we present PerfSim, a discrete-event simulator for approximating and predicting the performance of cloud native service chains in user-defined scenarios. To this end, we propose a systematic approach for modeling the performance of microservice endpoint functions by collecting and analyzing their performance and network traces. Combining the extracted models with user-defined scenarios, PerfSim can then simulate the performance behavior of all services over a given period and provide an approximation of system KPIs, such as the average response time of requests. Using the processing power of a single laptop, we evaluated both the simulation accuracy and speed of PerfSim in 104 prevalent scenarios and compared the simulation results with identical deployments in a real Kubernetes cluster. We achieved ~81-99% simulation accuracy in approximating the average response time of incoming requests and a ~16-1200 times speed-up factor for the simulation.
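A toy discrete-event simulation of a two-service chain in the same spirit (this is not PerfSim's model or API): requests arrive as a Poisson process, traverse FIFO single-server services in order, and the simulator reports the average response time.

```python
# Toy discrete-event model of a two-service chain: Poisson arrivals,
# FIFO single-server stages with fixed service times.
import random

def simulate_chain(service_times, arrival_rate, n_requests, seed=0):
    rng = random.Random(seed)
    t = 0.0
    free_at = [0.0] * len(service_times)   # when each service is idle
    total = 0.0
    for _ in range(n_requests):
        t += rng.expovariate(arrival_rate)  # next arrival
        done = t
        for i, s in enumerate(service_times):
            begin = max(done, free_at[i])   # wait if the server is busy
            done = begin + s
            free_at[i] = done
        total += done - t                   # response time of this request
    return total / n_requests

print(simulate_chain([0.004, 0.010], arrival_rate=50, n_requests=10_000))
```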

Handling clustering problems is important in data statistics, pattern recognition, and image processing. The mean-shift algorithm, a common unsupervised algorithm, is widely used to solve clustering problems. However, the mean-shift algorithm is restricted by its huge computational cost. In previous research [10], we proposed a novel GPU-accelerated Faster Mean-shift algorithm, which greatly sped up clustering in cosine-embedding space. In this study, we extend and improve the previous algorithm to handle Euclidean distance metrics. Unlike conventional GPU-based mean-shift algorithms, our algorithm adopts novel Seed Selection & Early Stopping approaches, which greatly increase computing speed and reduce GPU memory consumption. In simulation testing, when processing a 200K-point clustering problem, our algorithm achieved around a 3 times speedup compared to state-of-the-art GPU-based mean-shift algorithms, with optimized GPU memory consumption. Moreover, we implemented a plug-and-play model of the Faster Mean-shift algorithm, which can be easily deployed. (The plug-and-play model is available at //github.com/masqm/Faster-Mean-Shift-Euc)
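A minimal NumPy sketch of Euclidean mean-shift with the two highlighted ideas: seed selection (iterate only a subsample of starting points) and early stopping (halt a seed once its shift falls below a tolerance). The actual library batches all seeds on the GPU; this single-threaded version only shows the logic.

```python
# Single-threaded Euclidean mean-shift with seed selection and
# per-seed early stopping.
import numpy as np

def mean_shift(X, bandwidth, n_seeds=64, tol=1e-3, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    seeds = X[rng.choice(len(X), size=min(n_seeds, len(X)), replace=False)]
    modes = []
    for s in seeds:
        for _ in range(max_iter):
            d2 = ((X - s) ** 2).sum(axis=1)          # squared distances
            w = np.exp(-d2 / (2 * bandwidth ** 2))   # Gaussian kernel
            s_new = (w[:, None] * X).sum(axis=0) / w.sum()
            shift = np.linalg.norm(s_new - s)
            s = s_new
            if shift < tol:   # early stopping: this seed has converged
                break
        modes.append(s)
    # merge near-duplicate modes into the final cluster centers
    return np.unique(np.round(np.array(modes), 2), axis=0)
```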

As the demand for real-time computing increases day by day, there has been a major paradigm shift in the processing platforms of real-time systems from single-core to multi-core, which offers advantages such as higher throughput, linear power consumption, efficient utilization of processor cores, and higher performance per unit cost over many single-core processor units. The most popular real-time schedulers currently available for the multi-core domain are partitioned and global scheduling, but neither uses this platform efficiently. Although semi-partitioned algorithms increase the utilization bound by assigning the spare capacity left by partitioning via global scheduling, they have the inherent disadvantage of off-line task splitting. To overcome these problems, a new dynamic cluster-based multi-core real-time scheduling algorithm, a hybrid scheduling approach, is proposed. This paper discusses different multi-core scheduling techniques and provides a comparative analysis of these techniques against the proposed dynamic cluster-based real-time multi-core scheduling.
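The toy first-fit packing below contrasts, as a sketch under simplified utilization-based admission tests, the partitioned view (one task set per core) with the clustered view (one task set per group of cores, scheduled globally inside the group). The capacity values are illustrative, not real schedulability bounds.

```python
# First-fit packing of task utilizations: partitioned (one bin per
# core) vs. clustered (one bin per core group).
def first_fit(utilizations, bins, capacity):
    assignment = [[] for _ in range(bins)]
    load = [0.0] * bins
    for u in sorted(utilizations, reverse=True):  # decreasing first-fit
        for b in range(bins):
            if load[b] + u <= capacity:
                assignment[b].append(u)
                load[b] += u
                break
        else:
            return None          # this test cannot place the task
    return assignment

tasks = [0.6, 0.5, 0.4, 0.4, 0.3, 0.3, 0.2]       # WCET / period
print(first_fit(tasks, bins=4, capacity=1.0))      # partitioned: 4 cores
print(first_fit(tasks, bins=2, capacity=2.0))      # clustered: 2x2 cores
```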

Recently, many works have tried to improve the performance of Chinese named entity recognition (NER) using word lexicons. As a representative, Lattice-LSTM (Zhang and Yang, 2018) has achieved new benchmark results on several public Chinese NER datasets. However, Lattice-LSTM has a complex model architecture, which limits its application in many industrial settings where real-time NER responses are needed. In this work, we propose a simple but effective method for incorporating the word lexicon into the character representations. This method avoids designing a complicated sequence modeling architecture; for any neural NER model, it requires only a subtle adjustment of the character representation layer to introduce the lexicon information. Experimental studies on four benchmark Chinese NER datasets show that our method achieves inference speeds up to 6.15 times faster than those of state-of-the-art methods, along with better performance. The experimental results also show that the proposed method can easily be combined with pre-trained models like BERT.
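A simplified sketch of the idea, assuming mean pooling (the paper's method weights matched words differently): each character's embedding is concatenated with pooled embeddings of the lexicon words containing it, grouped by the character's position in the word, so any downstream sequence model can be used unchanged.

```python
# Simplified lexicon-augmented character features (mean pooling).
# char_emb/word_emb are lookup tables of NumPy vectors; `lexicon`
# is a set of words.
import numpy as np

def augment_chars(sentence, char_emb, word_emb, lexicon, dim):
    feats = []
    for i, ch in enumerate(sentence):
        groups = {"B": [], "M": [], "E": [], "S": []}  # position in word
        for j in range(len(sentence)):                 # words covering i
            for k in range(j + 1, len(sentence) + 1):
                w = sentence[j:k]
                if w in lexicon and j <= i < k:
                    pos = ("S" if k - j == 1 else
                           "B" if i == j else
                           "E" if i == k - 1 else "M")
                    groups[pos].append(word_emb[w])
        pooled = [np.mean(g, axis=0) if g else np.zeros(dim)
                  for g in groups.values()]
        feats.append(np.concatenate([char_emb[ch], *pooled]))
    return np.stack(feats)   # feed to any sequence labeling model
```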

Unmanned aerial vehicles (UAVs) are increasingly being used in surveillance and traffic monitoring thanks to their high mobility and ability to cover areas at different altitudes and locations. One of the major challenges is to use aerial images to accurately detect and count cars in real time for traffic monitoring purposes. Several deep learning techniques based on convolutional neural networks (CNNs) have recently been proposed for real-time classification and recognition in computer vision. However, their performance depends on the scenarios in which they are used. In this paper, we investigate the performance of two state-of-the-art CNN algorithms, namely Faster R-CNN and YOLOv3, in the context of car detection from aerial images. We trained and tested these two models on a large car dataset taken from UAVs. We demonstrate that YOLOv3 outperforms Faster R-CNN in sensitivity and processing time, although the two are comparable in precision.
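A sketch of the kind of evaluation loop behind such a comparison: run either detector over the test images, time it, and accumulate precision and sensitivity (recall) at an IoU threshold. detector and iou are placeholders for whichever model and box-overlap routine are plugged in; this is not the paper's evaluation code.

```python
# Generic detector comparison loop: precision, sensitivity (recall),
# and mean processing time per image at one IoU threshold.
import time

def evaluate(detector, images, gt_boxes, iou, iou_thr=0.5):
    tp = fp = fn = 0
    t0 = time.perf_counter()
    for img, gts in zip(images, gt_boxes):
        preds = detector(img)                 # predicted boxes for img
        matched = set()
        for p in preds:
            hit = next((i for i, g in enumerate(gts)
                        if i not in matched and iou(p, g) >= iou_thr), None)
            if hit is None:
                fp += 1                       # false alarm
            else:
                tp += 1
                matched.add(hit)
        fn += len(gts) - len(matched)         # missed ground-truth cars
    dt = (time.perf_counter() - t0) / len(images)
    precision = tp / max(tp + fp, 1)
    sensitivity = tp / max(tp + fn, 1)
    return precision, sensitivity, dt         # dt = seconds per image
```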

Modern deep convolutional neural networks (CNNs) for image classification and object detection are often trained offline on large static datasets. Some applications, however, require training in real time on live video streams with a human in the loop. We refer to this class of problems as time-ordered online training (ToOT). These problems require consideration of not only the quantity of incoming training data, but also the human effort required to annotate and use it. We demonstrate and evaluate a system tailored to training an object detector on a live video stream with minimal input from a human operator. We show that we can obtain bounding box annotations from weakly supervised single-point clicks through interactive segmentation. Furthermore, by exploiting the time-ordered nature of the video stream through object tracking, we can increase the average training benefit of human interactions by 3-4 times.
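To illustrate how tracking multiplies the value of one interaction, the sketch below turns a single click into a box via interactive segmentation and then propagates that box across subsequent frames; every propagated box is a training example obtained for free. segment_from_click and make_tracker are hypothetical placeholders, not the paper's components.

```python
# One click -> many training samples: segment a box from the click,
# then let a tracker propagate it through later frames.
def annotations_per_click(frames, click, segment_from_click, make_tracker,
                          max_propagated=30):
    box = segment_from_click(frames[0], click)  # 1 human interaction
    samples = [(frames[0], box)]
    tracker = make_tracker(frames[0], box)
    for frame in frames[1:max_propagated + 1]:
        ok, box = tracker.update(frame)         # labels from tracking
        if not ok:                              # stop when the track drifts
            break
        samples.append((frame, box))
    return samples   # training benefit of the click = len(samples)
```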
