
Numerically solving ordinary differential equations (ODEs) is a naturally serial process, and as a result the vast majority of ODE solver software is serial. In this manuscript we develop a set of parallelized ODE solvers using extrapolation methods which exploit "parallelism within the method", so that arbitrary user ODEs can be parallelized. We describe the specific choices made in the implementation of the explicit and implicit extrapolation methods which allow for generating low-overhead static schedules that are then exploited by optimized multi-threaded implementations. We demonstrate that while multi-threading gives a noticeable acceleration on both explicit and implicit problems, the explicit parallel extrapolation methods give no significant improvement over the state of the art, even with a multi-threading advantage over current optimized high-order Runge-Kutta tableaus. However, we demonstrate that the implicit parallel extrapolation methods achieve state-of-the-art performance (a 2x-4x speedup) on standard multicore x86 CPUs for systems of $<200$ stiff ODEs solved at low tolerance, a typical setup for the vast majority of users of high-level language equation solver suites. The resulting method is distributed as the first widely available open-source software for within-method parallel acceleration targeting typical modest compute architectures.
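As a rough illustration of the "parallelism within the method" idea (not the implementation described in the manuscript), the Python sketch below runs the independent sub-integrations of a Gragg midpoint extrapolation step on a thread pool and combines them with Aitken-Neville extrapolation; the step-number sequence, helper names, and fixed order are illustrative assumptions.

import numpy as np
from concurrent.futures import ThreadPoolExecutor

def midpoint_subintegration(f, t, y0, h, n):
    # Integrate over [t, t + h] with n explicit-midpoint substeps (Gragg's method).
    sub = h / n
    y_prev = y0
    y_curr = y0 + sub * f(t, y0)                     # Euler starter step
    for i in range(1, n):
        y_prev, y_curr = y_curr, y_prev + 2 * sub * f(t + i * sub, y_curr)
    return 0.5 * (y_curr + y_prev + sub * f(t + h, y_curr))   # Gragg smoothing

def extrapolation_step(f, t, y0, h, order=4, pool=None):
    # One extrapolation step: the sub-integrations (one per entry of the
    # step-number sequence) are independent and can be scheduled in parallel.
    n_seq = [2 * (j + 1) for j in range(order)]      # illustrative sequence 2, 4, 6, ...
    work = lambda n: midpoint_subintegration(f, t, y0, h, n)
    T = list(pool.map(work, n_seq)) if pool else [work(n) for n in n_seq]
    for k in range(1, order):                        # Aitken-Neville table, updated in place
        for j in range(order - 1, k - 1, -1):
            r = (n_seq[j] / n_seq[j - k]) ** 2
            T[j] = T[j] + (T[j] - T[j - 1]) / (r - 1)
    return T[-1]

# Example: one step of y' = -y from y(0) = 1 (exact value exp(-0.1) ~ 0.904837).
with ThreadPoolExecutor() as pool:
    y1 = extrapolation_step(lambda t, y: -y, 0.0, np.array([1.0]), 0.1, pool=pool)
print(y1)

In CPython the GIL limits the actual speedup obtainable from threads; the sketch only illustrates that the sub-integrations form an independent, statically schedulable workload, which is the property the manuscript's low-overhead schedules exploit.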

Related Content

In this work, we study the convergence and performance of nonlinear solvers for the Bidomain equations after decoupling the ordinary and partial differential equations of the cardiac system. We first rigorously prove that Quasi-Newton methods such as BFGS and nonlinear Conjugate-Gradient methods such as Fletcher-Reeves are globally convergent, by studying an auxiliary variational problem under physically reasonable hypotheses. Then, we compare several nonlinear solvers in terms of execution time, robustness with respect to the data, and parallel scalability. Our results suggest that Quasi-Newton methods are the best choice for this type of problem, being faster than standard Newton-Krylov methods without sacrificing robustness or scalability. In addition, first-order methods are also competitive and represent a better alternative for matrix-free implementations, which are well suited to GPU computing.
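For readers unfamiliar with the Fletcher-Reeves scheme mentioned above, here is a minimal generic sketch in Python with an Armijo backtracking line search; it is not the Bidomain-specific solver studied in the paper, and the restart safeguard, tolerances, and test problem are illustrative choices.

import numpy as np

def fletcher_reeves(f, grad, x0, tol=1e-8, max_iter=500):
    # Generic Fletcher-Reeves nonlinear CG with an Armijo backtracking line search.
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        slope = g @ d
        if slope >= 0:                       # safeguard: restart with steepest descent
            d, slope = -g, -(g @ g)
        step, fx = 1.0, f(x)
        while f(x + step * d) > fx + 1e-4 * step * slope and step > 1e-12:
            step *= 0.5                      # backtrack until the Armijo condition holds
        x_new = x + step * d
        g_new = grad(x_new)
        beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves coefficient
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

# Example: minimize the strictly convex quadratic 0.5 x'Ax - b'x (minimizer solves Ax = b).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = fletcher_reeves(lambda x: 0.5 * x @ A @ x - b @ x, lambda x: A @ x - b, np.zeros(2))
print(x_star, np.linalg.solve(A, b))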

We develop an optimization-based algorithm for parametric model order reduction (PMOR) of linear time-invariant dynamical systems. Our method aims at minimizing the $\mathcal{H}_\infty \otimes \mathcal{L}_\infty$ approximation error in the frequency and parameter domain by an optimization of the reduced order model (ROM) matrices. State-of-the-art PMOR methods often compute several nonparametric ROMs for different parameter samples, which are then combined into a single parametric ROM. However, these parametric ROMs can have a low accuracy between the utilized sample points. In contrast, our optimization-based PMOR method minimizes the approximation error across the entire parameter domain. Moreover, due to our flexible approach of optimizing the system matrices directly, we can enforce favorable features such as a port-Hamiltonian structure in our ROMs across the entire parameter domain. Our method is an extension of the recently developed SOBMOR algorithm to parametric systems. We extend both the ROM parameterization and the adaptive sampling procedure to the parametric case. Several numerical examples demonstrate the effectiveness and high accuracy of our method in a comparison with other PMOR methods.
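A minimal sketch of the "optimize the ROM matrices directly" idea, under simplifying assumptions that differ from the paper: a toy single-input single-output full-order model with affine parameter dependence A(p) = A0 + p*A1, a fixed sample grid, and a least-squares surrogate of the $\mathcal{H}_\infty \otimes \mathcal{L}_\infty$ error in place of the adaptive sampling and structure-preserving parameterization used by SOBMOR. All model data and names below are invented for illustration.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, r = 6, 2                                     # toy full order 6, reduced order 2
A0 = -np.diag(np.arange(1.0, n + 1))            # toy FOM: A(p) = A0 + p * A1
A1 = 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

def tf(s, A, Bm, Cm):
    # Transfer function value H(s) = C (sI - A)^{-1} B for a SISO system.
    return (Cm @ np.linalg.solve(s * np.eye(A.shape[0]) - A, Bm))[0, 0]

freqs = 1j * np.logspace(-1, 2, 30)             # frequency samples on the imaginary axis
params = np.linspace(0.0, 1.0, 5)               # parameter samples

def unpack(theta):
    # ROM matrices A_r(p) = Ar0 + p * Ar1, B_r, C_r from the flat optimization vector.
    Ar0 = theta[:r * r].reshape(r, r)
    Ar1 = theta[r * r:2 * r * r].reshape(r, r)
    Br = theta[2 * r * r:2 * r * r + r].reshape(r, 1)
    Cr = theta[2 * r * r + r:].reshape(1, r)
    return Ar0, Ar1, Br, Cr

def surrogate_error(theta):
    # Least-squares surrogate of the worst-case error on the fixed sample grid.
    Ar0, Ar1, Br, Cr = unpack(theta)
    return sum(abs(tf(s, A0 + p * A1, B, C) - tf(s, Ar0 + p * Ar1, Br, Cr)) ** 2
               for p in params for s in freqs)

theta0 = 0.1 * rng.standard_normal(2 * r * r + 2 * r)
result = minimize(surrogate_error, theta0, method="Nelder-Mead", options={"maxiter": 2000})
print(result.fun)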

A key challenge for a common waveform for Integrated Sensing and Communications (ISAC) -- widely seen as an attractive proposition to achieve high performance for both functionalities while efficiently utilizing available resources -- lies in leveraging information-bearing channel-coded communications signals (c.c.s) for sensing. In this paper, we investigate the sensing performance of c.c.s in (multi-user) interference-limited operation, and show that it is limited by sidelobes in the range-Doppler map, whose form depends on whether the c.c.s modulates a single-carrier or OFDM waveform. While uncoded communications signals -- comprising a block of $N$ i.i.d. zero-mean symbols -- give rise to asymptotically (i.e., as $N \rightarrow \infty$) zero sidelobes due to the law of large numbers, it is not obvious that the same holds for c.c.s, as structured channel coding schemes (e.g., linear block codes) induce dependence across codeword symbols. We show that c.c.s also give rise to asymptotically zero sidelobes -- for both single-carrier and OFDM waveforms -- by deriving upper bounds for the tail probabilities of the sidelobe magnitudes that decay as $\exp(-O(\text{code rate} \times \text{block length}))$. This implies that for any code rate, c.c.s are effective sensing signals that are robust to multi-user interference at sufficiently large block lengths, with negligible difference in performance based on whether they modulate a single-carrier or OFDM waveform. We verify the latter implication through simulations, where we observe the sensing performance (characterized by the detection and false-alarm probabilities) of a QPSK-modulated c.c.s (code rate = 120/1024, block length = 1024 symbols) to match that of a comparable interference-free FMCW waveform even at high interference levels (signal-to-interference ratio of -11 dB), for both single-carrier and OFDM waveforms.
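The law-of-large-numbers argument for the uncoded case can be checked numerically: the short Python sketch below (an illustration only, not the paper's coded-signal analysis) estimates the peak normalized sidelobe of the zero-Doppler cut for a random QPSK block and shows it shrinking as the block length grows.

import numpy as np

rng = np.random.default_rng(1)

def qpsk_block(n_sym):
    # Random (uncoded) unit-energy QPSK block of n_sym i.i.d. zero-mean symbols.
    bits = rng.integers(0, 4, n_sym)
    return np.exp(1j * (np.pi / 4 + np.pi / 2 * bits))

def peak_sidelobe(x):
    # Peak normalized sidelobe of the zero-Doppler (autocorrelation) cut.
    n = len(x)
    corr = np.correlate(x, x, mode="full") / n      # aperiodic autocorrelation
    corr[n - 1] = 0.0                               # null the mainlobe at zero lag
    return np.max(np.abs(corr))

for n in (256, 1024, 4096, 16384):
    print(n, peak_sidelobe(qpsk_block(n)))          # sidelobe level decays with block length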

In their seminal paper that initiated the field of algorithmic mechanism design, \citet{NR99} studied the problem of designing strategyproof mechanisms for scheduling jobs on unrelated machines aiming to minimize the makespan. They provided a strategyproof mechanism that achieves an $n$-approximation and they made the bold conjecture that this is the best approximation achievable by any deterministic strategyproof scheduling mechanism. After more than two decades and several efforts, $n$ remains the best known approximation and very recent work by \citet{CKK21} has been able to prove an $\Omega(\sqrt{n})$ approximation lower bound for all deterministic strategyproof mechanisms. This strong negative result, however, heavily depends on the fact that the performance of these mechanisms is evaluated using worst-case analysis. To overcome such overly pessimistic, and often uninformative, worst-case bounds, a surge of recent work has focused on the ``learning-augmented framework'', whose goal is to leverage machine-learned predictions to obtain improved approximations when these predictions are accurate (consistency), while also achieving near-optimal worst-case approximations even when the predictions are arbitrarily wrong (robustness). In this work, we study the classic strategic scheduling problem of~\citet{NR99} using the learning-augmented framework and give a deterministic polynomial-time strategyproof mechanism that is $6$-consistent and $2n$-robust. We thus achieve the ``best of both worlds'': an $O(1)$ consistency and an $O(n)$ robustness that asymptotically matches the best-known approximation. We then extend this result to provide more general worst-case approximation guarantees as a function of the prediction error. Finally, we complement our positive results by showing that any $1$-consistent deterministic strategyproof mechanism has unbounded robustness.

We extend results known for the randomized Gauss-Seidel and the Gauss-Southwell methods in the case of a Hermitian and positive definite matrix to certain classes of non-Hermitian matrices. We obtain convergence results for a whole range of parameters describing the probabilities in the randomized method or the greedy choice strategy in the Gauss-Southwell-type methods, and we identify the choices which make our convergence bounds best possible. Our main tool is to use weighted $\ell_1$-norms to measure the residuals. A major result is that the best convergence bounds we obtain for the expected values in the randomized algorithm are as good as the best bounds for the deterministic, but more costly, algorithms of Gauss-Southwell type. Numerical experiments illustrate the convergence of the method and the bounds obtained. Comparisons with the randomized Kaczmarz method are also presented.
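For reference, here is a minimal Python sketch of the classical randomized Gauss-Seidel iteration for the Hermitian positive definite case, the starting point that the paper extends to non-Hermitian matrices; the probability choice proportional to the diagonal and the random test matrix are illustrative.

import numpy as np

def randomized_gauss_seidel(A, b, n_iter=5000, rng=None):
    # Randomized Gauss-Seidel for Ax = b with Hermitian positive definite A:
    # at each step pick coordinate i with probability proportional to A[i, i]
    # and zero out the i-th component of the residual.
    rng = rng or np.random.default_rng()
    n = A.shape[0]
    probs = np.real(np.diag(A)) / np.real(np.trace(A))
    x = np.zeros(n, dtype=A.dtype)
    r = b - A @ x
    for _ in range(n_iter):
        i = rng.choice(n, p=probs)
        delta = r[i] / A[i, i]
        x[i] += delta
        r -= delta * A[:, i]                 # rank-one residual update
    return x

# Example on a random Hermitian positive definite matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50)) + 1j * rng.standard_normal((50, 50))
A = M.conj().T @ M + 50 * np.eye(50)
b = rng.standard_normal(50) + 0j
x = randomized_gauss_seidel(A, b, rng=rng)
print(np.linalg.norm(A @ x - b))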

Motivated by the success of the serial dictatorship mechanism in social choice settings, we explore its usefulness in tackling various combinatorial optimization problems. We do so by considering an abstract model, in which a set of agents are asked to act in a particular ordering, called the action sequence. Each agent acts in a way that gives her the maximum possible value, given the actions of the agents who preceded her in the action sequence. Our goal is to compute action sequences that yield approximately optimal total value to the agents (i.e., social welfare). We assume query access to the value $v_i(S)$ that agent $i$ gets when she acts after the agents in the ordered set $S$. We establish tight bounds on the social welfare that can be achieved using polynomially many queries. Even though these bounds show a marginally sublinear approximation of the optimal social welfare in general, excellent approximations can be obtained when the valuations stem from an underlying combinatorial domain. Indicatively, when the valuations are defined using bipartite matchings, arborescences in directed graphs, and satisfiability of Boolean expressions, simple query-efficient algorithms yield $2$-approximations. We discuss issues related to truthfulness and show how some of our algorithms can be implemented truthfully using VCG-like payments. Finally, we introduce and study the price of serial dictatorship, a notion that provides an optimistic measure of the quality of combinatorial optimization solutions generated by action sequences.
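A small Python sketch of the abstract model, assuming a toy bipartite-matching valuation and an exhaustive (exponential) baseline in place of the paper's query-efficient algorithms; value(i, prefix) plays the role of the oracle $v_i(S)$, and the weights are invented for illustration.

from itertools import permutations

def social_welfare(sequence, value):
    # Total value when agents act in the given order; value(i, prefix) is the
    # (query-access) value agent i obtains after the ordered prefix has acted.
    return sum(value(agent, tuple(sequence[:k])) for k, agent in enumerate(sequence))

def best_sequence_bruteforce(agents, value):
    # Exhaustive baseline (exponentially many queries); the paper studies what is
    # achievable with only polynomially many value queries.
    return max(permutations(agents), key=lambda seq: social_welfare(seq, value))

# Toy bipartite-matching valuation: each agent greedily grabs its best unmatched item.
weights = {0: {"a": 3, "b": 1}, 1: {"a": 2, "b": 2}, 2: {"b": 4}}

def matching_value(i, prefix):
    taken = set()
    for j in prefix:                          # replay the predecessors' greedy picks
        avail = {it: w for it, w in weights[j].items() if it not in taken}
        if avail:
            taken.add(max(avail, key=avail.get))
    avail = {it: w for it, w in weights[i].items() if it not in taken}
    return max(avail.values()) if avail else 0

print(best_sequence_bruteforce(range(3), matching_value))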

The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations of differential equations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equation solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
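The "residual networks are discretisations" remark can be made concrete in a few lines of Python: an explicit Euler solve of dy/dt = f_theta(t, y) in which each step is exactly a residual-block update. The weights are untrained random placeholders and there is no adjoint or backpropagation; this is purely illustrative, not the thesis's methodology.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.standard_normal((16, 16)), np.zeros(16)
W2, b2 = 0.1 * rng.standard_normal((16, 16)), np.zeros(16)

def vector_field(t, y):
    # A small neural network f_theta(t, y) acting as the ODE vector field.
    return W2 @ np.tanh(W1 @ y + b1) + b2

def neural_ode_euler(y0, t0=0.0, t1=1.0, n_steps=10):
    # Explicit-Euler solve of dy/dt = f_theta(t, y); each step, y <- y + h * f_theta(t, y),
    # has exactly the form of a residual-network block.
    h = (t1 - t0) / n_steps
    y, t = np.asarray(y0, float), t0
    for _ in range(n_steps):
        y = y + h * vector_field(t, y)       # residual-network update
        t += h
    return y

print(neural_ode_euler(rng.standard_normal(16)))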

As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
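As a concrete (and deliberately simple) example of the kind of scheme surveyed here, the Python sketch below performs per-tensor uniform symmetric quantization to a signed low-bit integer grid; the 4-bit setting, clipping range, and max-based scale are illustrative assumptions rather than a recommendation from the article.

import numpy as np

def quantize_uniform_symmetric(x, num_bits=4):
    # Uniform symmetric quantization: map real values to signed integers in
    # [-(2^(b-1) - 1), 2^(b-1) - 1] using a single per-tensor scale.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax if np.any(x) else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
q, s = quantize_uniform_symmetric(w, num_bits=4)
print("max abs error:", np.max(np.abs(w - dequantize(q, s))))   # bounded by scale / 2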

The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well as, if not better than, the original dense networks. Sparsity can reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever-growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial of sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation, the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.
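One of the simplest sparsification techniques covered by such surveys is unstructured magnitude pruning; the sketch below (illustrative only, with an arbitrary 90% sparsity target) zeroes the smallest-magnitude weights and returns the binary mask that subsequent training or sparse inference would exploit.

import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    # Unstructured magnitude pruning: zero out the smallest-magnitude fraction of
    # weights and return the pruned tensor together with its binary mask.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    threshold = np.partition(flat, k)[k] if k < flat.size else np.inf
    mask = (np.abs(weights) >= threshold).astype(weights.dtype)
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)
W_sparse, mask = magnitude_prune(W, sparsity=0.9)
print("kept fraction:", mask.mean())        # roughly 0.1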

Since deep neural networks were developed, they have made huge contributions to everyday life. Machine learning provides more rational advice than humans are capable of in almost every aspect of daily life. However, despite this achievement, the design and training of neural networks are still challenging and unpredictable procedures. To lower the technical threshold for common users, automated hyper-parameter optimization (HPO) has become a popular topic in both academic and industrial areas. This paper provides a review of the most essential topics of HPO. The first section introduces the key hyper-parameters related to model training and structure, and discusses their importance and methods for defining their value ranges. Then, the review focuses on major optimization algorithms and their applicability, covering their efficiency and accuracy especially for deep learning networks. The study next reviews major services and toolkits for HPO, comparing their support for state-of-the-art searching algorithms, feasibility with major deep learning frameworks, and extensibility for new modules designed by users. The paper concludes with problems that arise when HPO is applied to deep learning, a comparison between optimization algorithms, and prominent approaches for model evaluation with limited computational resources.
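As a baseline for the optimization algorithms such a review compares, here is a minimal random-search HPO loop in Python; the search space, the stand-in objective (no real training happens), and the trial budget are all illustrative assumptions, not recommendations from the paper.

import numpy as np

def random_search(train_and_eval, space, n_trials=20, rng=None):
    # Random-search HPO: sample n_trials configurations from the search space and
    # keep the one with the best validation score.
    rng = rng or np.random.default_rng()
    best_cfg, best_score = None, -np.inf
    for _ in range(n_trials):
        cfg = {name: sampler(rng) for name, sampler in space.items()}
        score = train_and_eval(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Illustrative search space and a stand-in objective in place of real training.
space = {
    "lr":         lambda rng: 10 ** rng.uniform(-5, -1),    # log-uniform learning rate
    "batch_size": lambda rng: int(rng.choice([32, 64, 128, 256])),
    "dropout":    lambda rng: rng.uniform(0.0, 0.5),
}
objective = lambda cfg: -((np.log10(cfg["lr"]) + 3) ** 2) - cfg["dropout"]
print(random_search(objective, space, n_trials=50, rng=np.random.default_rng(0)))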
