亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

The need to execute Deep Neural Networks (DNNs) at low latency and low power at the edge has spurred the development of new heterogeneous Systems-on-Chips (SoCs) encapsulating a diverse set of hardware accelerators. How to optimally map a DNN onto such multi-accelerator systems is an open problem. We propose ODiMO, a hardware-aware tool that performs a fine-grain mapping across different accelerators on-chip, splitting individual layers and executing them in parallel, to reduce inference energy consumption or latency, while taking into account each accelerator's quantization precision to maintain accuracy. Pareto-optimal networks in the accuracy vs. energy or latency space are pursued for three popular dataset/DNN pairs, and deployed on the DIANA heterogeneous ultra-low power edge AI SoC. We show that ODiMO reduces energy/latency by up to 33%/31% with limited accuracy drop (-0.53%/-0.32%) compared to manual heuristic mappings.

相關內容

機器學習系統(tong)設計系統(tong)評估標準

Bayesian inference and kernel methods are well established in machine learning. The neural network Gaussian process in particular provides a concept to investigate neural networks in the limit of infinitely wide hidden layers by using kernel and inference methods. Here we build upon this limit and provide a field-theoretic formalism which covers the generalization properties of infinitely wide networks. We systematically compute generalization properties of linear, non-linear, and deep non-linear networks for kernel matrices with heterogeneous entries. In contrast to currently employed spectral methods we derive the generalization properties from the statistical properties of the input, elucidating the interplay of input dimensionality, size of the training data set, and variability of the data. We show that data variability leads to a non-Gaussian action reminiscent of a ($\varphi^3+\varphi^4$)-theory. Using our formalism on a synthetic task and on MNIST we obtain a homogeneous kernel matrix approximation for the learning curve as well as corrections due to data variability which allow the estimation of the generalization properties and exact results for the bounds of the learning curves in the case of infinitely many training data points.

This article introduces a new Neural Network stochastic model to generate a 1-dimensional stochastic field with turbulent velocity statistics. Both the model architecture and training procedure ground on the Kolmogorov and Obukhov statistical theories of fully developed turbulence, so guaranteeing descriptions of 1) energy distribution, 2) energy cascade and 3) intermittency across scales in agreement with experimental observations. The model is a Generative Adversarial Network with multiple multiscale optimization criteria. First, we use three physics-based criteria: the variance, skewness and flatness of the increments of the generated field that retrieve respectively the turbulent energy distribution, energy cascade and intermittency across scales. Second, the Generative Adversarial Network criterion, based on reproducing statistical distributions, is used on segments of different length of the generated field. Furthermore, to mimic multiscale decompositions frequently used in turbulence's studies, the model architecture is fully convolutional with kernel sizes varying along the multiple layers of the model. To train our model we use turbulent velocity signals from grid turbulence at Modane wind tunnel.

Learning generic high-dimensional tasks is notably hard, as it requires a number of training data exponential in the dimension. Yet, deep convolutional neural networks (CNNs) have shown remarkable success in overcoming this challenge. A popular hypothesis is that learnable tasks are highly structured and that CNNs leverage this structure to build a low-dimensional representation of the data. However, little is known about how much training data they require, and how this number depends on the data structure. This paper answers this question for a simple classification task that seeks to capture relevant aspects of real data: the Random Hierarchy Model. In this model, each of the $n_c$ classes corresponds to $m$ synonymic compositions of high-level features, which are in turn composed of sub-features through an iterative process repeated $L$ times. We find that the number of training data $P^*$ required by deep CNNs to learn this task (i) grows asymptotically as $n_c m^L$, which is only polynomial in the input dimensionality; (ii) coincides with the training set size such that the representation of a trained network becomes invariant to exchanges of synonyms; (iii) corresponds to the number of data at which the correlations between low-level features and classes become detectable. Overall, our results indicate how deep CNNs can overcome the curse of dimensionality by building invariant representations, and provide an estimate of the number of data required to learn a task based on its hierarchically compositional structure.

The classical Minkowski problem for convex bodies has deeply influenced the development of differential geometry. During the past several decades, abundant mathematical theories have been developed for studying the solutions of the Minkowski problem, however, the numerical solution of this problem has been largely left behind, with only few methods available to achieve that goal. In this article, focusing on the two-dimensional Minkowski problem with Dirichlet boundary conditions, we introduce two solution methods, both based on operator-splitting. One of these two methods deals directly with the Dirichlet condition, while the other method uses an approximation of this Dirichlet condition. This relaxation of the Dirichlet condition makes this second method better suited than the first one to treat those situations where the Minkowski and the Dirichlet condition are not compatible. Both methods are generalizations of the solution method for the canonical Monge-Amp\`{e}re equation discussed by Glowinski et al. (Journal of Scientific Computing, 79(1), 1-47, 2019); as such they take advantage of a divergence formulation of the Minkowski problem, well-suited to a mixed finite element approximation, and to the the time-discretization via an operator-splitting scheme, of an associated initial value problem. Our methodology can be easily implemented on convex domains of rather general shape (with curved boundaries, possibly). The numerical experiments we performed validate both methods and show that if one uses continuous piecewise affine finite element approximations of the smooth solution of the Minkowski problem and of its three second order derivatives, these two methods provide nearly second order accuracy for the $L^2$ and $L^{\infty}$ error. One can extend easily the methods discussed in this article, to address the solution of three-dimensional Minkowski problem.

Bayesian inference has widely acknowledged advantages in many problems, but it can also be unreliable if the model is misspecified. Bayesian modular inference is concerned with inference in complex models which have been specified through a collection of coupled sub-models. The sub-models are called modules in the literature, and they often arise from modeling different data sources, or from combining domain knowledge from different disciplines. When some modules are misspecified, cutting feedback is a widely used Bayesian modular inference method which ensures that information from suspect model components is not used in making inferences about parameters in correctly specified modules. However, in general settings it is difficult to decide when this ``cut posterior'' is preferable to the exact posterior. When misspecification is not severe, cutting feedback may increase the uncertainty in Bayesian posterior inference greatly without reducing estimation bias substantially. This motivates semi-modular inference methods, which avoid the binary cut of cutting feedback approaches. In this work, using a local model misspecification framework, we provide the first precise formulation of the the bias-variance trade-off that has motivated the literature on semi-modular inference. We then implement a mixture-based semi-modular inference approach, demonstrating theoretically that it delivers inferences that are more accurate, in terms of a user-defined loss function, than if either the cut or full posterior were used by themselves. The new method is demonstrated in a number of applications.

Neural operators have emerged as a powerful tool for learning the mapping between infinite-dimensional parameter and solution spaces of partial differential equations (PDEs). In this work, we focus on multiscale PDEs that have important applications such as reservoir modeling and turbulence prediction. We demonstrate that for such PDEs, the spectral bias towards low-frequency components presents a significant challenge for existing neural operators. To address this challenge, we propose a hierarchical attention neural operator (HANO) inspired by the hierarchical matrix approach. HANO features a scale-adaptive interaction range and self-attentions over a hierarchy of levels, enabling nested feature computation with controllable linear cost and encoding/decoding of multiscale solution space. We also incorporate an empirical $H^1$ loss function to enhance the learning of high-frequency components. Our numerical experiments demonstrate that HANO outperforms state-of-the-art (SOTA) methods for representative multiscale problems.

Spiking neural networks (SNNs) are brain-inspired energy-efficient models that encode information in spatiotemporal dynamics. Recently, deep SNNs trained directly have shown great success in achieving high performance on classification tasks with very few time steps. However, how to design a directly-trained SNN for the regression task of object detection still remains a challenging problem. To address this problem, we propose EMS-YOLO, a novel directly-trained SNN framework for object detection, which is the first trial to train a deep SNN with surrogate gradients for object detection rather than ANN-SNN conversion strategies. Specifically, we design a full-spike residual block, EMS-ResNet, which can effectively extend the depth of the directly-trained SNN with low power consumption. Furthermore, we theoretically analyze and prove the EMS-ResNet could avoid gradient vanishing or exploding. The results demonstrate that our approach outperforms the state-of-the-art ANN-SNN conversion methods (at least 500 time steps) in extremely fewer time steps (only 4 time steps). It is shown that our model could achieve comparable performance to the ANN with the same architecture while consuming 5.83 times less energy on the frame-based COCO Dataset and the event-based Gen1 Dataset.

Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing and speech recognition. However, their superior performance comes at the considerable cost of computational complexity, which greatly hinders their applications in many resource-constrained devices, such as mobile phones and Internet of Things (IoT) devices. Therefore, methods and techniques that are able to lift the efficiency bottleneck while preserving the high accuracy of DNNs are in great demand in order to enable numerous edge AI applications. This paper provides an overview of efficient deep learning methods, systems and applications. We start from introducing popular model compression methods, including pruning, factorization, quantization as well as compact model design. To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization. We then cover efficient on-device training to enable user customization based on the local data on mobile devices. Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video and natural language processing by exploiting their spatial sparsity and temporal/token redundancy. Finally, to support all these algorithmic advancements, we introduce the efficient deep learning system design from both software and hardware perspectives.

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is to counterintuitively train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.

Deep convolutional neural networks (CNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment in devices with low memory resources or in applications with strict latency requirements. Therefore, a natural thought is to perform model compression and acceleration in deep networks without significantly decreasing the model performance. During the past few years, tremendous progress has been made in this area. In this paper, we survey the recent advanced techniques for compacting and accelerating CNNs model developed. These techniques are roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and sharing will be described at the beginning, after that the other techniques will be introduced. For each scheme, we provide insightful analysis regarding the performance, related applications, advantages, and drawbacks etc. Then we will go through a few very recent additional successful methods, for example, dynamic capacity networks and stochastic depths networks. After that, we survey the evaluation matrix, the main datasets used for evaluating the model performance and recent benchmarking efforts. Finally, we conclude this paper, discuss remaining challenges and possible directions on this topic.

北京阿比特科技有限公司