
Hardware reliability is adversely affected by the downscaling of semiconductor devices and the scale-out of systems necessitated by modern applications. Apart from crashes, this unreliability often manifests as silent data corruptions (SDCs), which affect application output. We therefore need low-cost, low-human-effort solutions to reduce the incidence of SDCs and their effect on the quality of application outputs. We propose Artificial Neural Networks (ANNs) as an effective mechanism for online error detection. We train ANNs using software fault injection. We find that the average overhead of our approach, followed by a costly error correction by re-execution, is 6.45% in terms of CPU cycles. We also report that ANNs discover 94.85% of faults, thereby resulting in minimal output quality degradation. To validate our approach, we overclock ARM Cortex A53 CPUs, execute benchmarks on them, and record the program outputs. ANNs prove to be an efficient error detection mechanism, better than a state-of-the-art approximate error detection mechanism (Topaz), both in terms of performance (12.81% CPU overhead) and quality of application output (94.11% detection coverage).

Related content

Artificial neural networks


An artificial neural network (ANN) abstracts the neuron network of the human brain from an information-processing perspective, builds a simple model of it, and forms different networks according to different connection schemes. In engineering and academia it is often simply called a neural network or a neural-like network. A neural network is a computational model made up of a large number of interconnected nodes (or neurons). Each node represents a specific output function, called the activation function. Each connection between two nodes carries a weighting value for the signal passing through that connection, called the weight, which serves as the memory of the artificial neural network. The output of the network varies with its connection scheme, its weight values, and its activation function. The network itself is usually an approximation of some algorithm or function found in nature, or it may express a logical strategy.
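The description above can be illustrated with a minimal sketch of a forward pass through a small fully connected network. The sigmoid activation, the bias terms, the layer sizes, and every weight value below are assumptions chosen for illustration; none of them come from the text.

```python
import math


def sigmoid(x):
    """Logistic activation function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases, activation=sigmoid):
    """Compute one layer: each node applies the activation to a weighted sum."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        # Each connection scales the incoming signal by its weight.
        weighted_sum = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(weighted_sum))
    return outputs


def network_forward(inputs, layers):
    """Feed the inputs through each (weights, biases) layer in order."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal


# Hypothetical weights for a network with 2 inputs, 2 hidden nodes, 1 output.
layers = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1]),
    ([[1.0, -1.0]], [0.0]),
]
output = network_forward([1.0, 2.0], layers)
print(output)
```

Changing the weights, the connection pattern encoded in `layers`, or the `activation` argument changes the output, which is the dependence the paragraph describes.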

Specialization · Reducible · Weight · Performer · Sparse

February 1, 2022

Sense: Model Hardware Co-design for Accelerating Sparse Neural Networks

Wenhao Sun,Deng Liu,Zhiwei Zou,Wendi Sun,Junpeng Wang,Yi Kang,Song Chen

from arxiv, 21 pages, 18 figures, 7 tables, Transactions on Embedded Computing Systems

Sparsity is an intrinsic property of neural networks (NNs). Many software researchers have attempted to improve sparsity through pruning, to reduce weight storage and computation workload, while hardware architects are working on how to skip redundant computations for higher energy efficiency. However, there is always overhead, so many architectures gain only minor profit. The systolic array therefore becomes a promising candidate, thanks to its low fanout and high throughput. However, sparsity is irregular, making it tricky to fit in with the rigid systolic tempo. Thus, this paper proposes a systolic-based architecture, called Sense, for processing both sparse input feature maps (IFMs) and weights, achieving large performance improvement with relatively small resource and power consumption. Meanwhile, we apply channel rearrangement to gather IFMs with approximate sparsity and co-design an adaptive weight training method to keep the sparsity ratio (zero element percentage) of each kernel at 1/2, with little accuracy loss. This treatment can effectively reduce the irregularity of sparsity and help better fit with systolic dataflow. Additionally, a dataflow called Partition Reuse is mapped to our architecture, enhancing data reuse, reducing DRAM access by 1.9x-2.6x compared with Eyeriss, and further reducing system energy consumption. The whole design is implemented on Zynq ZCU102 and performs at a peak throughput of 409.6 GOP/s, with power consumption of 11.2W; compared with previous sparse NN accelerators based on FPGA, Sense takes up 1/5 fewer LUTs and 3/4 fewer BRAMs, reaches 2.1x peak energy efficiency and achieves 1.15x-1.49x speedup.
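To make the "sparsity ratio of each kernel at 1/2" constraint concrete, here is a minimal sketch that zeroes the smallest-magnitude half of a kernel's weights. This is plain one-shot magnitude pruning, swapped in for illustration; it is not the adaptive weight training method of the paper, and the function names and weight values are hypothetical.

```python
def prune_kernel_to_half(kernel):
    """Zero the smallest-magnitude half of a flat list of kernel weights."""
    num_to_zero = len(kernel) // 2
    # Indices in ascending order of absolute value; ties keep positional order.
    order = sorted(range(len(kernel)), key=lambda i: abs(kernel[i]))
    zeroed = set(order[:num_to_zero])
    return [0.0 if i in zeroed else w for i, w in enumerate(kernel)]


def sparsity_ratio(kernel):
    """Fraction of elements that are exactly zero."""
    return sum(1 for w in kernel if w == 0.0) / len(kernel)


# Hypothetical flattened kernel with 8 weights.
kernel = [0.9, -0.1, 0.05, -0.7, 0.3, 0.02, -0.6, 0.4]
pruned = prune_kernel_to_half(kernel)
print(pruned, sparsity_ratio(pruned))
```

For a kernel with an odd number of weights this sketch rounds down, so the ratio falls slightly below 1/2.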

Contrastive · Loss · Extensibility · Object detection · Better

February 1, 2022

The KFIoU Loss for Rotated Object Detection

Xue Yang,Yue Zhou,Gefan Zhang,Jirui Yang,Wentao Wang,Junchi Yan,Xiaopeng Zhang,Qi Tian

from arxiv, 19 pages, 5 figures, 11 tables, tensorflow code: //github.com/yangxue0827/RotationDetection, pytorch code: //github.com/open-mmlab/mmrotate

Unlike the well-developed horizontal object detection area, where the computing-friendly IoU-based loss is readily adopted and fits well with the detection metrics, rotation detectors often involve a more complicated loss based on SkewIoU, which is unfriendly to gradient-based training. In this paper, we argue that one effective alternative is to devise an approximate loss that can achieve trend-level alignment with SkewIoU loss instead of strict value-level identity. Specifically, we model the objects as Gaussian distributions and adopt a Kalman filter to inherently mimic the mechanism of SkewIoU by its definition, and show its alignment with SkewIoU at trend level. This is in contrast to recent Gaussian-modeling-based rotation detectors, e.g. GWD and KLD, which involve a human-specified distribution distance metric that requires additional hyperparameter tuning. The resulting new loss, called KFIoU, is easier to implement and works better than exact SkewIoU, thanks to its full differentiability and its ability to handle non-overlapping cases. We further extend our technique to the 3-D case, which suffers from the same issues as 2-D detection. Extensive results on various public datasets (2-D/3-D, aerial/text/face images) with different base detectors show the effectiveness of our approach.
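The idea of modeling boxes as Gaussians and combining them with a Kalman-filter update can be sketched as follows. The specific formulas here are assumptions not stated in the abstract: a box of width w, height h, and angle theta is given covariance R diag(w^2/4, h^2/4) R^T, the overlap covariance is taken as S1 - K S1 with gain K = S1 (S1 + S2)^-1, and the area proxy is 4 sqrt(det S). Box centers are ignored, so this only compares shape and orientation. Treat it as an illustration under those assumptions rather than the authors' implementation.

```python
import math


def box_to_covariance(w, h, theta):
    """Covariance R diag(w^2/4, h^2/4) R^T for a box rotated by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    a, b = w * w / 4.0, h * h / 4.0
    off_diagonal = (a - b) * c * s
    return [[a * c * c + b * s * s, off_diagonal],
            [off_diagonal, a * s * s + b * c * c]]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def overlap_covariance(s1, s2):
    """Kalman-style covariance update: S1 - K S1 with K = S1 (S1 + S2)^-1."""
    total = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    gain = matmul2(s1, inv2(total))
    correction = matmul2(gain, s1)
    return [[s1[i][j] - correction[i][j] for j in range(2)] for i in range(2)]


def gaussian_area(sigma):
    """Area proxy 4 * sqrt(det(sigma)); equals w * h for a box covariance."""
    return 4.0 * math.sqrt(det2(sigma))


def kfiou(box1, box2):
    """IoU-like score for two (w, h, theta) boxes, ignoring their centers."""
    s1 = box_to_covariance(*box1)
    s2 = box_to_covariance(*box2)
    v1, v2 = gaussian_area(s1), gaussian_area(s2)
    v3 = gaussian_area(overlap_covariance(s1, s2))
    return v3 / (v1 + v2 - v3)


same = kfiou((4.0, 2.0, 0.3), (4.0, 2.0, 0.3))
rotated = kfiou((4.0, 2.0, 0.0), (4.0, 2.0, math.pi / 2))
print(same, rotated)
```

Under these assumptions, two identical boxes score 1/3 rather than 1, because the combined covariance is half of either input. The score therefore tracks the trend of overlap rather than its exact value.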

DARTS · Performer · Optimizer · Model evaluation · Estimation/Estimator

January 31, 2022

ZARTS: On Zero-order Optimization for Neural Architecture Search

Xiaoxing Wang,Wenxuan Guo,Junchi Yan,Jianlin Su,Xiaokang Yang

Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency. It introduces trainable architecture parameters to represent the importance of candidate operations and proposes a first/second-order approximation to estimate their gradients, making it possible to solve NAS by gradient descent. However, our in-depth empirical results show that the approximation often distorts the loss landscape, leading to a biased optimization objective and, in turn, inaccurate gradient estimation for the architecture parameters. This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the above approximation. Specifically, three representative zero-order optimization methods are introduced: RS, MGS, and GLD, among which MGS performs best by balancing accuracy and speed. Moreover, we explore the connections between RS/MGS and gradient descent and show that ZARTS can be seen as a robust gradient-free counterpart to DARTS. Extensive experiments on multiple datasets and search spaces show the remarkable performance of our method. In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue. We also search on the DARTS search space to compare with peer methods, and our discovered architecture achieves 97.54% accuracy on CIFAR-10 and 75.7% top-1 accuracy on ImageNet, which is state-of-the-art performance.
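As a rough illustration of zero-order optimization in general, the sketch below estimates a descent direction from loss values alone, using random Gaussian perturbations and finite differences. It is a generic random-search estimator applied to a toy quadratic; it is not the RS, MGS, or GLD procedure as implemented in ZARTS, and the function names, step sizes, and sample counts are hypothetical.

```python
import random


def zero_order_gradient(loss, params, num_samples=20, sigma=0.1, rng=random):
    """Estimate the gradient of `loss` at `params` from function values only."""
    base = loss(params)
    grad = [0.0] * len(params)
    for _ in range(num_samples):
        direction = [rng.gauss(0.0, 1.0) for _ in params]
        perturbed = [p + sigma * d for p, d in zip(params, direction)]
        # Finite-difference slope along the sampled direction.
        slope = (loss(perturbed) - base) / sigma
        for i, d in enumerate(direction):
            grad[i] += slope * d / num_samples
    return grad


def minimize(loss, params, steps=200, lr=0.05, seed=0):
    """Descend along the estimated gradient; no analytic derivative is used."""
    rng = random.Random(seed)
    for _ in range(steps):
        grad = zero_order_gradient(loss, params, rng=rng)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


def quadratic(x):
    """Toy loss with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


solution = minimize(quadratic, [0.0, 0.0])
print(solution)
```

The estimator needs only forward evaluations of the loss, which is the property that lets a zero-order method avoid approximating gradients of the architecture parameters.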

Reducible · Processing (programming language) · Learned · Networking · Better

March 1, 2021

Coarse-Fine Networks for Temporal Activity Detection in Videos

Kumara Kahatapitiya,Michael S. Ryoo

from arxiv, Accepted to be published at CVPR 2021

In this paper, we introduce 'Coarse-Fine Networks', a two-stream architecture which benefits from different abstractions of temporal resolution to learn better video representations for long-term motion. Traditional video models process inputs at one (or a few) fixed temporal resolutions without any dynamic frame selection. However, we argue that processing multiple temporal resolutions of the input, and doing so dynamically by learning to estimate the importance of each frame, can largely improve video representations, especially in the domain of temporal activity localization. To this end, we propose (1) 'Grid Pool', a learned temporal downsampling layer to extract coarse features, and (2) 'Multi-stage Fusion', a spatio-temporal attention mechanism to fuse a fine-grained context with the coarse features. We show that our method can outperform the state of the art for action detection on public datasets including Charades, with a significantly reduced compute and memory footprint.

Performer · Optimizer · Reducible · Continuity · Networking

October 8, 2020

Neural Architecture Generator Optimization

Binxin Ru,Pedro Esperanca,Fabio Carlucci

from arxiv, 20 pages, 9 figures, neural architecture search, Thirty-fourth Conference on Neural Information Processing Systems

Neural Architecture Search (NAS) was first proposed to achieve state-of-the-art performance through the discovery of new architecture patterns, without human intervention. An over-reliance on expert knowledge in the search space design has, however, led to increased performance (local optima) without significant architectural breakthroughs, thus preventing truly novel solutions from being reached. In this work we 1) are the first to investigate casting NAS as a problem of finding the optimal network generator and 2) propose a new, hierarchical and graph-based search space capable of representing an extremely large variety of network types, yet requiring only a few continuous hyper-parameters. This greatly reduces the dimensionality of the problem, enabling the effective use of Bayesian Optimisation as a search strategy. At the same time, we expand the range of valid architectures, motivating a multi-objective learning approach. We demonstrate the effectiveness of this strategy on six benchmark datasets and show that our search space generates extremely lightweight yet highly competitive models.

Approximation · Object detection · Model evaluation · Scaling · State-of-the-art

October 4, 2018

Domain Specific Approximation for Object Detection

Ting-Wu Chin,Chia-Lin Yu,Matthew Halpern,Hasan Genc,Shiao-Li Tsao,Vijay Janapa Reddi

from arxiv, 6 pages, 6 figures. Published in IEEE Micro, vol. 38, no. 1, pp. 31-40, January/February 2018

There is growing interest in object detection in advanced driver assistance systems and autonomous robots and vehicles. To enable such innovative systems, we need faster object detection. In this work, we investigate the trade-off between accuracy and speed with domain-specific approximations, i.e. category-aware image size scaling and proposals scaling, for two state-of-the-art deep learning-based object detection meta-architectures. We study the effectiveness of applying approximation both statically and dynamically to understand its potential and applicability. By conducting experiments on the ImageNet VID dataset, we show that domain-specific approximation has great potential to improve the speed of the system without deteriorating the accuracy of object detectors, i.e. up to 7.5x speedup for dynamic domain-specific approximation. To this end, we present our insights toward harvesting domain-specific approximation and devise a proof-of-concept runtime, AutoFocus, that exploits dynamic domain-specific approximation.

Continuity · Optimizer · Networking · Language modeling · Predictor/Decision function

September 5, 2018

Neural Architecture Optimization

Renqian Luo,Fei Tian,Tao Qin,Enhong Chen,Tie-Yan Liu

from arxiv, Ongoing work. Will appear at NIPS 2018

Automatic neural architecture design has shown its potential in discovering powerful neural network architectures. Existing methods, whether based on reinforcement learning or evolutionary algorithms (EA), conduct architecture search in a discrete space, which is highly inefficient. In this paper, we propose a simple and efficient method for automatic neural architecture design based on continuous optimization. We call this new approach neural architecture optimization (NAO). There are three key components in our proposed approach: (1) An encoder embeds/maps neural network architectures into a continuous space. (2) A predictor takes the continuous representation of a network as input and predicts its accuracy. (3) A decoder maps a continuous representation of a network back to its architecture. The performance predictor and the encoder enable us to perform gradient-based optimization in the continuous space to find the embedding of a new architecture with potentially better accuracy. Such a better embedding is then decoded to a network by the decoder. Experiments show that the architecture discovered by our method is very competitive for the image classification task on CIFAR-10 and the language modeling task on PTB, outperforming or on par with the best results of previous architecture search methods with a significant reduction in computational resources. Specifically, we obtain a $2.07\%$ test set error rate for the CIFAR-10 image classification task and a test set perplexity of $55.9$ for the PTB language modeling task. The best discovered architectures on both tasks are successfully transferred to other tasks such as CIFAR-100 and WikiText-2.

Object detection · Performance · FAST · MINE · Subsampling

August 27, 2018

Speeding-up Object Detection Training for Robotics with FALKON

Elisa Maiettini,Giulia Pasquale,Lorenzo Rosasco,Lorenzo Natale

The latest deep learning methods for object detection provide remarkable performance, but have limits when used in robotic applications. One of the most relevant issues is the long training time, which is due to the large size and imbalance of the associated training sets, characterized by few positive examples and a large number of negative examples (i.e. background). Proposed approaches are based on end-to-end learning by back-propagation [22] or kernel methods trained with Hard Negatives Mining on top of deep features [8]. These solutions are effective, but prohibitively slow for on-line applications. In this paper we propose a novel pipeline for object detection that overcomes this problem and provides comparable performance, with a 60x training speedup. Our pipeline combines (i) the Region Proposal Network and the deep feature extractor from [22] to efficiently select candidate RoIs and encode them into powerful representations, with (ii) the FALKON [23] algorithm, a novel kernel-based method that allows fast training on large scale problems (millions of points). We address the size and imbalance of training data by exploiting the stochastic subsampling intrinsic to the method and a novel, fast, bootstrapping approach. We assess the effectiveness of the approach on a standard Computer Vision dataset (PASCAL VOC 2007 [5]) and demonstrate its applicability to a real robotic scenario with the iCubWorld Transformations [18] dataset.

Receptive field · FAST · Block · Performer · Object detection

July 26, 2018

Receptive Field Block Net for Accurate and Fast Object Detection

Songtao Liu,Di Huang,Yunhong Wang

from arxiv, Accepted by ECCV 2018

Current top-performing object detectors depend on deep CNN backbones, such as ResNet-101 and Inception, benefiting from their powerful feature representations but suffering from high computational costs. Conversely, some lightweight model based detectors fulfil real time processing, while their accuracies are often criticized. In this paper, we explore an alternative to build a fast and accurate detector by strengthening lightweight features using a hand-crafted mechanism. Inspired by the structure of Receptive Fields (RFs) in human visual systems, we propose a novel RF Block (RFB) module, which takes the relationship between the size and eccentricity of RFs into account, to enhance the feature discriminability and robustness. We further assemble RFB to the top of SSD, constructing the RFB Net detector. To evaluate its effectiveness, experiments are conducted on two major benchmarks and the results show that RFB Net is able to reach the performance of advanced very deep detectors while keeping the real-time speed. Code is available at //github.com/ruinmessi/RFBNet.

Networking · Neural Networks · Extensibility · Stream · Object detection

March 30, 2018

Contrast-Oriented Deep Neural Networks for Salient Object Detection

Guanbin Li,Yizhou Yu

from arxiv, Accepted to TNNLS

Deep convolutional neural networks have become a key element in the recent breakthrough of salient object detection. However, existing CNN-based methods are based on either patch-wise (region-wise) training and inference or fully convolutional networks. Methods in the former category are generally time-consuming due to severe storage and computational redundancies among overlapping patches. To overcome this deficiency, methods in the second category attempt to directly map a raw input image to a predicted dense saliency map in a single network forward pass. Though being very efficient, it is arduous for these methods to detect salient objects of different scales or salient regions with weak semantic information. In this paper, we develop hybrid contrast-oriented deep neural networks to overcome the aforementioned limitations. Each of our deep networks is composed of two complementary components, including a fully convolutional stream for dense prediction and a segment-level spatial pooling stream for sparse saliency inference. We further propose an attentional module that learns weight maps for fusing the two saliency predictions from these two streams. A tailored alternate scheme is designed to train these deep networks by fine-tuning pre-trained baseline models. Finally, a customized fully connected CRF model incorporating a salient contour feature embedding can be optionally applied as a post-processing step to improve spatial coherence and contour positioning in the fused result from these two streams. Extensive experiments on six benchmark datasets demonstrate that our proposed model can significantly outperform the state of the art in terms of all popular evaluation metrics.