
from arxiv, Author accepted version of paper in ACM Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W 2023)

MLIR has become popular since it was open sourced in 2019. A sub-project of LLVM, MLIR offers the flexibility to represent Intermediate Representations (IR) as dialects at different abstraction levels, to mix these, and to leverage transformations between dialects, which provides opportunities for automated program optimisation and parallelisation. In addition to general purpose compilers built upon MLIR, domain specific abstractions have also been developed. In this paper we explore complementing the Flang MLIR general purpose compiler by combining it with the domain specific Open Earth Compiler's MLIR stencil dialect. By developing transformations to discover and extract stencils from Fortran, this specialisation delivers a performance improvement of between 2 and 10 times for our benchmarks on a Cray supercomputer compared to using Flang alone. Furthermore, by leveraging existing MLIR transformations we develop an auto-parallelisation approach targeting multi-threaded and distributed memory parallelism, and optimised execution on GPUs, without any modifications to the serial Fortran source code.
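To make the term "stencil" concrete, the sketch below shows the kind of pattern a stencil-extraction pass looks for: a loop nest in which each output point is computed from fixed-offset neighbours of an input array. It is a minimal Python/NumPy analogue written for illustration only, not Fortran, MLIR, or the authors' transformation, and the function names `jacobi_loop` and `jacobi_stencil` are hypothetical. Both functions compute the same 5-point Jacobi update, once as explicit loops and once as shifted whole-array slices.

```python
import numpy as np


def jacobi_loop(u):
    """One Jacobi sweep as an explicit loop nest (the form found in serial source code)."""
    v = u.copy()
    n, m = u.shape
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # 5-point stencil: each interior point reads only fixed-offset neighbours
            v[i, j] = 0.25 * (u[i - 1, j] + u[i + 1, j] + u[i, j - 1] + u[i, j + 1])
    return v


def jacobi_stencil(u):
    """The same update written as shifted slices; boundary values are left unchanged."""
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:])
    return v
```

Recognising that the loop form is equivalent to the fixed-offset slice form is, in spirit, what lets a compiler hand such a computation to a specialised stencil representation.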

Related content

Performer


Round · Optimizer · Exact · Processing (programming language) · 3D ·

November 15, 2023

Automatic cable harness layout routing in a customizable 3D environment

T. Karlsson, E. Åblad, T. Hermansson, J. S. Carlson, G. Tenfält

from arxiv, 14 pages, submitted for publication

Designing cable harnesses can be time-consuming and complex due to many design and manufacturing aspects and rules. Automating the design process can help to fulfil these rules, speed up the process, and optimize the design. To accommodate this, we formulate a harness routing optimization problem to minimize cable lengths, maximize bundling by rewarding shared paths, and optimize the cables' spatial location with respect to case-specific information of the routing environment, e.g., zones to avoid. A deterministic and computationally effective cable harness routing algorithm has been developed to solve the routing problem and is used to generate a set of cable harness topology candidates and approximate the Pareto front. Our approach was tested against a stochastic solver and an exact solver: our routing algorithm generated objective function values better than those of the stochastic approach and close to those of the exact solver. Our algorithm was able to find solutions, some of them proven to be near-optimal, for three industrial-sized 3D cases within reasonable time (on the order of seconds to minutes), and the computation times were comparable to those of the stochastic approach.
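The idea of "rewarding shared paths" can be illustrated with a deliberately simple heuristic: route cables one at a time with Dijkstra's shortest-path algorithm, and discount the cost of any edge that an earlier cable already uses. This is a swapped-in toy technique, not the authors' deterministic routing algorithm, and it ignores zones to avoid and Pareto-front generation. The names `route_cables`, `dijkstra`, and the `discount` factor are hypothetical; the sketch assumes every destination is reachable.

```python
import heapq


def dijkstra(adj, cost, src, dst):
    """Shortest path from src to dst; cost(u, v) returns the current edge weight."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in adj[u]:
            nd = d + cost(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def route_cables(edges, requests, discount=0.5):
    """Route cables sequentially; edges already carrying a cable cost `discount` times less."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    used = set()

    def cost(u, v):
        w = edges.get((u, v), edges.get((v, u)))
        return w * discount if frozenset((u, v)) in used else w

    routes = []
    for src, dst in requests:
        path = dijkstra(adj, cost, src, dst)
        used.update(frozenset(e) for e in zip(path, path[1:]))  # mark bundled edges
        routes.append(path)
    return routes
```

With `discount=1.0` every cable takes its individually shortest path; smaller values trade extra length for more bundling, which is the tension the abstract's multi-objective formulation captures.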

Approximation · Separable · Analysis · Continuity · Origin ·

November 15, 2023

A unified framework for multiscale spectral generalized FEMs and low-rank approximations to multiscale PDEs

Chupeng Ma

This work presents an abstract framework for the design, implementation, and analysis of the multiscale spectral generalized finite element method (MS-GFEM), a particular numerical multiscale method originally proposed in [I. Babuska and R. Lipton, Multiscale Model. Simul., 9 (2011), pp. 373-406]. MS-GFEM is a partition of unity method employing optimal local approximation spaces constructed from local spectral problems. We establish a general local approximation theory demonstrating exponential convergence with respect to local degrees of freedom under certain assumptions, with explicit dependence on key problem parameters. Our framework applies to a broad class of multiscale PDEs with $L^{\infty}$-coefficients in both continuous and discrete, finite element settings, including highly indefinite problems (convection-dominated diffusion, as well as the high-frequency Helmholtz, Maxwell and elastic wave equations with impedance boundary conditions), and higher-order problems. Notably, we prove a local convergence rate of $O(e^{-cn^{1/d}})$ for MS-GFEM for all these problems, improving upon the $O(e^{-cn^{1/(d+1)}})$ rate shown by Babuska and Lipton. Moreover, based on the abstract local approximation theory for MS-GFEM, we establish a unified framework for showing low-rank approximations to multiscale PDEs. This framework applies to the aforementioned problems, proving that the associated Green's functions admit an $O(|\log\epsilon|^{d})$-term separable approximation on well-separated domains with error $\epsilon>0$. Our analysis improves and generalizes the result in [M. Bebendorf and W. Hackbusch, Numerische Mathematik, 95 (2003), pp. 1-28] where an $O(|\log\epsilon|^{d+1})$-term separable approximation was proved for Poisson-type problems.

Reducible · Oracle · Mutually independent · Commentator · Approximation ·

November 14, 2023

A practical key-recovery attack on LWE-based key-encapsulation mechanism schemes using Rowhammer

Puja Mondal, Suparna Kundu, Sarani Bhattacharya, Angshuman Karmakar, Ingrid Verbauwhede

Physical attacks are serious threats to cryptosystems deployed in the real world. In this work, we propose a microarchitectural end-to-end attack methodology on generic lattice-based post-quantum key encapsulation mechanisms to recover the long-term secret key. Our attack targets a critical component of a Fujisaki-Okamoto transform that is used in the construction of almost all lattice-based key encapsulation mechanisms. We demonstrate our attack model on practical schemes such as Kyber and Saber by using Rowhammer. We show that our attack is highly practical and imposes few preconditions on the attacker. As an additional contribution, we propose an improved version of the plaintext checking oracle, which is used by almost all physical attack strategies on lattice-based key-encapsulation mechanisms. Our improvement reduces the number of queries to the plaintext checking oracle by as much as $39\%$ for Saber and approximately $23\%$ for Kyber768. This can be of independent interest and can also be used to reduce the complexity of other attacks.

Performance · Inference · AI · Design · Performer ·

November 14, 2023

System and Design Technology Co-optimization of SOT-MRAM for High-Performance AI Accelerator Memory System

Kaniz Mishty, Mehdi Sadi

SoCs are now designed with their own AI accelerator segment to accommodate the ever-increasing demand for Deep Learning (DL) applications. With powerful MAC engines for matrix multiplications, these accelerators show high computing performance. However, because of limited memory resources (i.e., bandwidth and capacity), they fail to achieve optimum system performance during large batch training and inference. In this work, we propose a memory system with high on-chip capacity and bandwidth to move AI accelerators from being memory-bound to achieving system-level peak performance. We develop the memory system with DTCO-enabled customized SOT-MRAM as large on-chip memory through STCO and detailed characterization of the DL workloads. Our workload-aware memory system achieves 8X energy and 9X latency improvement on Computer Vision (CV) benchmarks in training and 8X energy and 4.5X latency improvement on Natural Language Processing (NLP) benchmarks in training while consuming only around 50% of SRAM area at iso-capacity.
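The phrase "memory-bound" can be read through the standard roofline model: attainable throughput is the smaller of the compute peak and memory bandwidth times arithmetic intensity (FLOPs per byte moved). The sketch below is a generic illustration, not the paper's evaluation methodology; the function names and all numbers used with it are hypothetical, not figures from the paper.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline model: performance is capped by compute or by memory traffic.

    peak_flops in FLOP/s, mem_bandwidth in bytes/s, arithmetic_intensity in FLOP/byte.
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def is_memory_bound(peak_flops, mem_bandwidth, arithmetic_intensity):
    # The ridge point is the intensity at which the compute and bandwidth ceilings meet
    ridge = peak_flops / mem_bandwidth
    return arithmetic_intensity < ridge
```

For a hypothetical accelerator with a 100 TFLOP/s peak and 2 TB/s of bandwidth, a kernel with intensity 10 FLOP/byte reaches only 20 TFLOP/s; raising bandwidth to 20 TB/s lowers the ridge point to 5 FLOP/byte and the same kernel becomes compute-bound. This is one way to read the shift the abstract aims for with high on-chip bandwidth.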

Analysis · MoDELS · Approximation · Integration · Proportional ·

November 13, 2023

Bayesian survival analysis with INLA

Danilo Alvares, Janet van Niekerk, Elias Teixeira Krainski, Håvard Rue, Denis Rustand

This tutorial shows how various Bayesian survival models can be fitted using the integrated nested Laplace approximation in a clear, legible, and comprehensible manner using the INLA and INLAjoint R-packages. Such models include accelerated failure time, proportional hazards, mixture cure, competing risks, multi-state, frailty, and joint models of longitudinal and survival data, originally presented in the article "Bayesian survival analysis with BUGS" (Alvares et al., 2021). In addition, we illustrate the implementation of a new joint model for a longitudinal semicontinuous marker, recurrent events, and a terminal event. Our proposal aims to provide the reader with syntax examples for implementing survival models using a fast and accurate approximate Bayesian inferential approach.

Singular · Analysis · Convolution · Truncation error · Discretization ·

November 12, 2023

Convolution quadrature for Hadamard fractional calculus and correction methods for the subdiffusion with singular source terms

Baoli Yin, Guoyu Zhang, Yang Liu, Hong Li

from arxiv, 23 pages

The convolution quadrature method, originally developed for the Riemann-Liouville fractional calculus, is extended in this work to the Hadamard fractional calculus by using exponential-type meshes. A local truncation error analysis is presented for singular solutions. By adopting the fractional BDF-$p$ ($1\leq p \leq 6$) for the Caputo-Hadamard fractional derivative in solving the subdiffusion problem with singular source terms, and using the finite element method to discretize the space variable, we rigorously carry out a sharp error analysis and obtain optimal accuracy through a novel correction technique. Our correction method is a natural generalization of the one developed for subdiffusion problems with smooth source terms. Numerical tests confirm the correctness of our theoretical results.
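For orientation, the sketch below shows the classical starting point that the abstract extends: first-order (BDF1, also known as Grünwald-Letnikov) convolution quadrature for the Riemann-Liouville derivative of order $\alpha$ on a uniform mesh. The quadrature weights are the Taylor coefficients of $(1-\zeta)^\alpha$, and $D^\alpha u(t_k) \approx \tau^{-\alpha}\sum_{j=0}^{k}\omega_j u(t_{k-j})$. The Hadamard extension on exponential meshes, the higher-order BDF-$p$ variants, and the correction terms from the paper are not shown; function names are hypothetical.

```python
import math

import numpy as np


def cq_weights_bdf1(alpha, n):
    """Coefficients w_0..w_n of (1 - zeta)^alpha (BDF1 convolution quadrature weights)."""
    w = np.empty(n + 1)
    w[0] = 1.0
    for j in range(1, n + 1):
        # Recurrence from the binomial series: w_j = w_{j-1} * (1 - (alpha + 1) / j)
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)
    return w


def rl_derivative_cq(u_vals, alpha, tau):
    """Approximate the Riemann-Liouville derivative at every grid point t_k = k * tau.

    u_vals[k] holds u(t_k); the result at t_k is tau^(-alpha) * sum_j w_j * u(t_{k-j}).
    """
    n = len(u_vals) - 1
    w = cq_weights_bdf1(alpha, n)
    out = np.empty(n + 1)
    for k in range(n + 1):
        # u_vals[k::-1] is u(t_k), u(t_{k-1}), ..., u(t_0)
        out[k] = np.dot(w[: k + 1], u_vals[k::-1]) / tau**alpha
    return out
```

As a check, for $u(t)=t$ the exact Riemann-Liouville derivative is $t^{1-\alpha}/\Gamma(2-\alpha)$, which the scheme reproduces to first order in $\tau$ at $t=1$.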

Vectorization · FFT · PageRank · Tolerance · Performer ·

November 11, 2023

Short reasons for long vectors in HPC CPUs: a study based on RISC-V

Pablo Vizcaino, Georgios Ieronymakis, Nikolaos Dimou, Vassilis Papaefstathiou, Jesus Labarta, Filippo Mantovani

from arxiv, SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, USA, November 12-17, 2023

For years, SIMD/vector units have enhanced the capabilities of modern CPUs in High-Performance Computing (HPC) and mobile technology. Typical commercially-available SIMD units process up to 8 double-precision elements with one instruction. The optimal vector width and its impact on CPU throughput due to memory latency and bandwidth remain challenging research areas. This study examines the behavior of four computational kernels on a RISC-V core connected to a customizable vector unit, capable of operating up to 256 double precision elements per instruction. The four codes have been purposefully selected to represent non-dense workloads: SpMV, BFS, PageRank, FFT. The experimental setup allows us to measure their performance while varying the vector length, the memory latency, and bandwidth. Our results not only show that larger vector lengths allow for better tolerance of limitations in the memory subsystem but also offer hope to code developers beyond dense linear algebra.
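To see why a kernel such as SpMV counts as "non-dense", consider sparse matrix-vector multiplication in the common CSR (compressed sparse row) layout, sketched below in Python. The key line reads `x` through an index array, an irregular gather rather than a contiguous load; this is the kind of memory access whose latency, according to the study's results, longer vector lengths help to tolerate. The CSR format and function name are illustrative choices, not necessarily the exact kernel implementation used in the paper.

```python
import numpy as np


def spmv_csr(indptr, indices, data, x):
    """y = A @ x for A stored in CSR form (indptr, indices, data)."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for row in range(n_rows):
        start, end = indptr[row], indptr[row + 1]
        # Gather: x is accessed through the column-index array, so addresses are irregular
        y[row] = np.dot(data[start:end], x[indices[start:end]])
    return y
```

On a vector unit, each row's inner product becomes an indexed (gather) load followed by a reduction; rows are often short, which is part of what makes these workloads harder to vectorise than dense linear algebra.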

Networking · binary · Category · MoDELS · Link function ·

November 11, 2023

Asymptotic in a class of network models with an increasing sub-Gamma degree sequence

Jing Luo, Haoyu Wei, Xiaoyu Lei, Jiaxin Guo

from arxiv, arXiv admin note: text overlap with arXiv:2002.12733 by other authors

Under differential privacy with sub-Gamma noise, we derive the asymptotic properties of a class of network models with binary values and a general link function. In this paper, we release the degree sequences of the binary networks under a general noisy mechanism, with the discrete Laplace mechanism as a special case. We establish asymptotic results, including both consistency and asymptotic normality of the parameter estimator, when the number of parameters goes to infinity in a class of network models. Simulations and a real data example are provided to illustrate the asymptotic results.
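The discrete Laplace special case mentioned above can be sketched as follows: compute the degree sequence of an undirected binary network and add independent discrete Laplace noise, sampled as the difference of two i.i.d. geometric variables. The sketch assumes edge-level privacy, where adding or removing one edge changes two degrees by one each (an $L_1$ sensitivity of 2), which gives the noise parameter $p = e^{-\epsilon/2}$. It covers only the release step; the paper's general sub-Gamma mechanism and its parameter estimator are not shown, and the function names are hypothetical.

```python
import numpy as np


def degrees(adj):
    """Degree sequence of an undirected network given a symmetric 0/1 adjacency matrix."""
    return adj.sum(axis=1).astype(np.int64)


def release_degrees(adj, epsilon, rng):
    """Release degrees with discrete Laplace noise, P(z) proportional to p**abs(z).

    Assumes edge-level privacy: the degree sequence has L1 sensitivity 2.
    """
    d = degrees(adj)
    p = np.exp(-epsilon / 2.0)
    # The difference of two i.i.d. geometric variables is discrete Laplace distributed
    noise = rng.geometric(1.0 - p, size=d.shape) - rng.geometric(1.0 - p, size=d.shape)
    return d + noise
```

Smaller `epsilon` makes `p` closer to 1 and the released degrees noisier; the asymptotic theory in the paper concerns what can still be estimated from such noisy sequences as the network grows.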

Estimation/Estimator · Markov · Markov chain · Reversible Markov chain · Variance ·

November 9, 2023

Efficient shape-constrained inference for the autocovariance sequence from a reversible Markov chain

Stephen Berg, Hyebin Song

In this paper, we study the problem of estimating the autocovariance sequence resulting from a reversible Markov chain. A motivating application for studying this problem is the estimation of the asymptotic variance in central limit theorems for Markov chains. We propose a novel shape-constrained estimator of the autocovariance sequence, which is based on the key observation that the representability of the autocovariance sequence as a moment sequence imposes certain shape constraints. We examine the theoretical properties of the proposed estimator and provide strong consistency guarantees for our estimator. In particular, for geometrically ergodic reversible Markov chains, we show that our estimator is strongly consistent for the true autocovariance sequence with respect to an $\ell_2$ distance, and that our estimator leads to strongly consistent estimates of the asymptotic variance. Finally, we perform empirical studies to illustrate the theoretical properties of the proposed estimator as well as to demonstrate the effectiveness of our estimator in comparison with other current state-of-the-art methods for Markov chain Monte Carlo variance estimation, including batch means, spectral variance estimators, and the initial convex sequence estimator.
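The sketch below shows the raw ingredients of this problem: the empirical autocovariance sequence of a chain, which shape-constrained estimators such as the one proposed here regularise, and the batch means estimator of the asymptotic variance, one of the baselines the abstract names. It is not the authors' shape-constrained estimator; the function names and the choice of a Gaussian AR(1) chain (a reversible Markov chain) as a test case are illustrative assumptions.

```python
import numpy as np


def empirical_autocov(x, max_lag):
    """Empirical autocovariances gamma_0..gamma_max_lag (divisor n, mean-centred)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.dot(xc[: n - k], xc[k:]) / n for k in range(max_lag + 1)])


def batch_means_variance(x, n_batches):
    """Batch means estimate of the asymptotic variance in the Markov chain CLT."""
    x = np.asarray(x, dtype=float)
    b = len(x) // n_batches  # batch length; any leftover samples are dropped
    means = x[: b * n_batches].reshape(n_batches, b).mean(axis=1)
    # Var(batch mean) is roughly sigma^2 / b, so rescale by the batch length
    return b * means.var(ddof=1)
```

For a stationary AR(1) chain $x_t = \rho x_{t-1} + e_t$ with unit-variance Gaussian innovations, the asymptotic variance is $1/(1-\rho)^2$, so with $\rho = 0.5$ the batch means estimate should be close to 4.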

Pegasus · Performer · state-of-the-art · MoDELS · ROUGE ·

June 2, 2020

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu

from arxiv, Added Human Evaluation results; Code link added; Accepted for ICML 2020

Recent work pre-training Transformers with self-supervised objectives on large text corpora has shown great success when fine-tuned on downstream NLP tasks including text summarization. However, pre-training objectives tailored for abstractive text summarization have not been explored. Furthermore, there is a lack of systematic evaluation across diverse domains. In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective. In PEGASUS, important sentences are removed/masked from an input document and are generated together as one output sequence from the remaining sentences, similar to an extractive summary. We evaluated our best PEGASUS model on 12 downstream summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills. Experiments demonstrate that it achieves state-of-the-art performance on all 12 downstream datasets measured by ROUGE scores. Our model also shows surprising performance on low-resource summarization, surpassing previous state-of-the-art results on 6 datasets with only 1000 examples. Finally, we validated our results using human evaluation and show that our model summaries achieve human performance on multiple datasets.
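The gap-sentence objective can be sketched as a data-preparation step: score each sentence by how well it overlaps with the rest of the document, mask the top-scoring ones in the input, and concatenate them as the target. In this minimal sketch a plain unigram-overlap F1 stands in for ROUGE-1, sentences are already split, the 30% ratio is a default chosen for illustration, and the mask token string `[MASK1]` and function names are assumptions; it is not the released PEGASUS preprocessing code.

```python
from collections import Counter

MASK = "[MASK1]"  # assumed sentence-level mask token


def unigram_f1(candidate, reference):
    """Unigram-overlap F1, a simplified stand-in for ROUGE-1 F1."""
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def gap_sentence_example(sentences, ratio=0.3):
    """Return (masked source, target) for one gap-sentence training example."""
    m = max(1, round(len(sentences) * ratio))
    scores = []
    for i, s in enumerate(sentences):
        # Score each sentence independently against the rest of the document
        rest = " ".join(t for j, t in enumerate(sentences) if j != i)
        scores.append((unigram_f1(s, rest), i))
    chosen = sorted(i for _, i in sorted(scores, reverse=True)[:m])
    source = " ".join(MASK if i in chosen else s for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in chosen)
    return source, target
```

Choosing the sentences that best summarise the rest of the document makes the pre-training target resemble an extractive summary, which is the intuition the abstract gives for why this objective suits abstractive summarization.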