亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

MLIR has become popular since it was open sourced in 2019. A sub-project of LLVM, the flexibility provided by MLIR to represent Intermediate Representations (IR) as dialects at different abstraction levels, to mix these, and to leverage transformations between dialects provides opportunities for automated program optimisation and parallelisation. In addition to general purpose compilers built upon MLIR, domain specific abstractions have also been developed. In this paper we explore complimenting the Flang MLIR general purpose compiler by combining with the domain specific Open Earth Compiler's MLIR stencil dialect. Developing transformations to discover and extracts stencils from Fortran, this specialisation delivers between a 2 and 10 times performance improvement for our benchmarks on a Cray supercomputer compared to using Flang alone. Furthermore, by leveraging existing MLIR transformations we develop an auto-parallelisation approach targeting multi-threaded and distributed memory parallelism, and optimised execution on GPUs, without any modifications to the serial Fortran source code.

相關內容

Designing cable harnesses can be time-consuming and complex due to many design and manufacturing aspects and rules. Automating the design process can help to fulfil these rules, speed up the process, and optimize the design. To accommodate this, we formulate a harness routing optimization problem to minimize cable lengths, maximize bundling by rewarding shared paths, and optimize the cables' spatial location with respect to case-specific information of the routing environment, e.g., zones to avoid. A deterministic and computationally effective cable harness routing algorithm has been developed to solve the routing problem and is used to generate a set of cable harness topology candidates and approximate the Pareto front. Our approach was tested against a stochastic and an exact solver and our routing algorithm generated objective function values better than the stochastic approach and close to the exact solver. Our algorithm was able to find solutions, some of them being proven to be near-optimal, for three industrial-sized 3D cases within reasonable time (in magnitude of seconds to minutes) and the computation times were comparable to those of the stochastic approach.

This work presents an abstract framework for the design, implementation, and analysis of the multiscale spectral generalized finite element method (MS-GFEM), a particular numerical multiscale method originally proposed in [I. Babuska and R. Lipton, Multiscale Model.\;\,Simul., 9 (2011), pp.~373--406]. MS-GFEM is a partition of unity method employing optimal local approximation spaces constructed from local spectral problems. We establish a general local approximation theory demonstrating exponential convergence with respect to local degrees of freedom under certain assumptions, with explicit dependence on key problem parameters. Our framework applies to a broad class of multiscale PDEs with $L^{\infty}$-coefficients in both continuous and discrete, finite element settings, including highly indefinite problems (convection-dominated diffusion, as well as the high-frequency Helmholtz, Maxwell and elastic wave equations with impedance boundary conditions), and higher-order problems. Notably, we prove a local convergence rate of $O(e^{-cn^{1/d}})$ for MS-GFEM for all these problems, improving upon the $O(e^{-cn^{1/(d+1)}})$ rate shown by Babuska and Lipton. Moreover, based on the abstract local approximation theory for MS-GFEM, we establish a unified framework for showing low-rank approximations to multiscale PDEs. This framework applies to the aforementioned problems, proving that the associated Green's functions admit an $O(|\log\epsilon|^{d})$-term separable approximation on well-separated domains with error $\epsilon>0$. Our analysis improves and generalizes the result in [M. Bebendorf and W. Hackbusch, Numerische Mathematik, 95 (2003), pp.~1-28] where an $O(|\log\epsilon|^{d+1})$-term separable approximation was proved for Poisson-type problems.

Physical attacks are serious threats to cryptosystems deployed in the real world. In this work, we propose a microarchitectural end-to-end attack methodology on generic lattice-based post-quantum key encapsulation mechanisms to recover the long-term secret key. Our attack targets a critical component of a Fujisaki-Okamoto transform that is used in the construction of almost all lattice-based key encapsulation mechanisms. We demonstrate our attack model on practical schemes such as Kyber and Saber by using Rowhammer. We show that our attack is highly practical and imposes little preconditions on the attacker to succeed. As an additional contribution, we propose an improved version of the plaintext checking oracle, which is used by almost all physical attack strategies on lattice-based key-encapsulation mechanisms. Our improvement reduces the number of queries to the plaintext checking oracle by as much as $39\%$ for Saber and approximately $23\%$ for Kyber768. This can be of independent interest and can also be used to reduce the complexity of other attacks.

SoCs are now designed with their own AI accelerator segment to accommodate the ever-increasing demand of Deep Learning (DL) applications. With powerful MAC engines for matrix multiplications, these accelerators show high computing performance. However, because of limited memory resources (i.e., bandwidth and capacity), they fail to achieve optimum system performance during large batch training and inference. In this work, we propose a memory system with high on-chip capacity and bandwidth to shift the gear of AI accelerators from memory-bound to achieving system-level peak performance. We develop the memory system with DTCO-enabled customized SOT-MRAM as large on-chip memory through STCO and detailed characterization of the DL workloads. %We evaluate our workload-aware memory system on the CV and NLP benchmarks and observe significant PPA improvement compared to an SRAM-based in both inference and training modes. Our workload-aware memory system achieves 8X energy and 9X latency improvement on Computer Vision (CV) benchmarks in training and 8X energy and 4.5X latency improvement on Natural Language Processing (NLP) benchmarks in training while consuming only around 50% of SRAM area at iso-capacity.

This tutorial shows how various Bayesian survival models can be fitted using the integrated nested Laplace approximation in a clear, legible, and comprehensible manner using the INLA and INLAjoint R-packages. Such models include accelerated failure time, proportional hazards, mixture cure, competing risks, multi-state, frailty, and joint models of longitudinal and survival data, originally presented in the article "Bayesian survival analysis with BUGS" (Alvares et al., 2021). In addition, we illustrate the implementation of a new joint model for a longitudinal semicontinuous marker, recurrent events, and a terminal event. Our proposal aims to provide the reader with syntax examples for implementing survival models using a fast and accurate approximate Bayesian inferential approach.

The convolution quadrature method originally developed for the Riemann-Liouville fractional calculus is extended in this work to the Hadamard fractional calculus by using the exponential type meshes. Local truncation error analysis is presented for singular solutions. By adopting the fractional BDF-$p(1\leq p \leq 6)$ for the Caputo-Hadamard fractional derivative in solving subdiffusion problem with singular source terms, and using the finite element method to discretize the space variable, we carry out the sharp error analysis rigorously and obtain the optimal accuracy by the novel correction technique. Our correction method is a natural generalization of the one developed for subdiffusion problems with smooth source terms. Numerical tests confirm the correctness of our theoretical results.

For years, SIMD/vector units have enhanced the capabilities of modern CPUs in High-Performance Computing (HPC) and mobile technology. Typical commercially-available SIMD units process up to 8 double-precision elements with one instruction. The optimal vector width and its impact on CPU throughput due to memory latency and bandwidth remain challenging research areas. This study examines the behavior of four computational kernels on a RISC-V core connected to a customizable vector unit, capable of operating up to 256 double precision elements per instruction. The four codes have been purposefully selected to represent non-dense workloads: SpMV, BFS, PageRank, FFT. The experimental setup allows us to measure their performance while varying the vector length, the memory latency, and bandwidth. Our results not only show that larger vector lengths allow for better tolerance of limitations in the memory subsystem but also offer hope to code developers beyond dense linear algebra.

For the differential privacy under the sub-Gamma noise, we derive the asymptotic properties of a class of network models with binary values with a general link function. In this paper, we release the degree sequences of the binary networks under a general noisy mechanism with the discrete Laplace mechanism as a special case. We establish the asymptotic result including both consistency and asymptotically normality of the parameter estimator when the number of parameters goes to infinity in a class of network models. Simulations and a real data example are provided to illustrate asymptotic results.

In this paper, we study the problem of estimating the autocovariance sequence resulting from a reversible Markov chain. A motivating application for studying this problem is the estimation of the asymptotic variance in central limit theorems for Markov chains. We propose a novel shape-constrained estimator of the autocovariance sequence, which is based on the key observation that the representability of the autocovariance sequence as a moment sequence imposes certain shape constraints. We examine the theoretical properties of the proposed estimator and provide strong consistency guarantees for our estimator. In particular, for geometrically ergodic reversible Markov chains, we show that our estimator is strongly consistent for the true autocovariance sequence with respect to an $\ell_2$ distance, and that our estimator leads to strongly consistent estimates of the asymptotic variance. Finally, we perform empirical studies to illustrate the theoretical properties of the proposed estimator as well as to demonstrate the effectiveness of our estimator in comparison with other current state-of-the-art methods for Markov chain Monte Carlo variance estimation, including batch means, spectral variance estimators, and the initial convex sequence estimator.

Recent work pre-training Transformers with self-supervised objectives on large text corpora has shown great success when fine-tuned on downstream NLP tasks including text summarization. However, pre-training objectives tailored for abstractive text summarization have not been explored. Furthermore there is a lack of systematic evaluation across diverse domains. In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective. In PEGASUS, important sentences are removed/masked from an input document and are generated together as one output sequence from the remaining sentences, similar to an extractive summary. We evaluated our best PEGASUS model on 12 downstream summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills. Experiments demonstrate it achieves state-of-the-art performance on all 12 downstream datasets measured by ROUGE scores. Our model also shows surprising performance on low-resource summarization, surpassing previous state-of-the-art results on 6 datasets with only 1000 examples. Finally we validated our results using human evaluation and show that our model summaries achieve human performance on multiple datasets.

北京阿比特科技有限公司