
The first stage of tactile sensing is data acquisition using tactile sensors, after which the sensed data is transmitted to a central unit for neuromorphic computing. Memristive crossbars have been proposed as synapses for neuromorphic computing, but device intelligence at the sensor level has not been investigated in the literature. We propose the concept of the Transistor-Memristor-Sensor (TMS) crossbar, which incorporates sensors into the memristor crossbar array configuration in the input layer of a neural network architecture. Two possible cell configurations of TMS crossbar arrays are presented: one-transistor one-memristor one-sensor (1T1M1S) and two-transistor one-memristor one-sensor (2T1M1S). We verify the proposed TMS crossbar in the practical design of an analog neural network-based Braille character recognition system. The design is validated with SPICE simulations using a circuit equivalent of the FLX-A501 force sensor, TiO$_2$ memristors, and low-power 22 nm high-k CMOS transistors. The proposed analog neuromorphic computing system is scalable and can encode 125 symbols with good accuracy in comparison with other Braille character recognition systems in the literature. The benefits of the analog implementation of TMS crossbar arrays are substantiated with accuracy, area, and power results relative to their binary counterparts.
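As an illustration of the analog computation such a crossbar performs, the following Python sketch models a toy array in which sensor voltages drive the rows, memristor conductances form the weight matrix, and each column current is a dot product by Ohm's and Kirchhoff's laws. The array size and device values are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy analog crossbar: 6 sensor rows, 3 output columns. Conductance and
# voltage ranges are illustrative, not the paper's device parameters.
G = rng.uniform(1e-6, 1e-4, size=(6, 3))   # memristor conductances (S)
v_sensors = rng.uniform(0.0, 0.5, size=6)  # force-sensor output voltages (V)

# Each column current is sum_i v_i * G_ij (Ohm's law + Kirchhoff's law),
# so the crossbar computes a matrix-vector product in the analog domain.
i_columns = v_sensors @ G
print('column currents (uA):', (i_columns * 1e6).round(2))

# A Braille classifier would pass these currents through activations; here
# we simply pick the most strongly driven column as the predicted class.
print('predicted class:', int(np.argmax(i_columns)))
```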

Related Content

A sensor (also called a transducer) is a detection device that senses a measured quantity and converts the sensed information, according to a defined rule, into an electrical signal or another required form of output, so as to satisfy requirements for the transmission, processing, storage, display, recording, and control of information.

We consider a coded compressed sensing approach for unsourced random access and replace the outer tree code proposed by Amalladinne et al. with a list-recoverable code capable of correcting t errors. A finite-length random coding bound for such codes is derived. Numerical experiments in the single-antenna quasi-static Rayleigh fading MAC show that the transition to list-recoverable codes correcting t errors improves the performance of the coded compressed sensing scheme by 7-10 dB compared to the tree-code-based scheme. We propose two practical constructions of outer codes. The first is a modification of the tree code: it uses the same code structure, the key difference being a decoder capable of correcting up to t errors. The second is based on Reed-Solomon codes and the Guruswami-Sudan list decoding algorithm. The first scheme provides energy efficiency very close to the random coding bound when decoding complexity is unbounded, but for practical parameters the second scheme is better and improves on the tree-code-based scheme when the number of active users is below 200.
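The following Python sketch illustrates the tree-code idea both outer constructions build on: the message is split into per-slot chunks, and each later slot appends linear parity bits computed over all earlier data bits, chaining the slots together so a decoder can stitch per-slot lists back into messages. The slot count, bit widths, and random parity matrices are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layout: 4 slots, 8 data bits per slot, 4 parity bits per
# later slot linking it to all of its predecessors.
N_SLOTS, DATA_BITS, PARITY_BITS = 4, 8, 4

# Random binary parity-generator matrices, one per slot after the first.
G = [rng.integers(0, 2, size=(DATA_BITS * s, PARITY_BITS))
     for s in range(1, N_SLOTS)]

def tree_encode(msg_bits):
    """Split a message into slots and append parity that chains the slots."""
    chunks = msg_bits.reshape(N_SLOTS, DATA_BITS)
    slots = [chunks[0]]                       # first slot carries data only
    for s in range(1, N_SLOTS):
        history = chunks[:s].reshape(-1)      # all earlier data bits
        parity = history @ G[s - 1] % 2       # linear parity over GF(2)
        slots.append(np.concatenate([chunks[s], parity]))
    return slots

msg = rng.integers(0, 2, size=N_SLOTS * DATA_BITS)
for i, slot in enumerate(tree_encode(msg)):
    print(f"slot {i}: {slot}")
```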

Many applications from the robotics domain can benefit from FPGA acceleration. A key question is how to integrate hardware accelerators into software-centric robotics programming environments. Recently, several approaches have demonstrated hardware acceleration for the Robot Operating System (ROS), the dominant programming environment in robotics. ROS is a middleware layer that supports composing complex robotics applications as sets of nodes that communicate via mechanisms such as publish/subscribe, and that distributes them over several compute platforms. In this paper, we present a novel approach for event-based programming of robotics applications that leverages ReconROS, a framework for flexibly mapping ROS 2 nodes to either software or reconfigurable hardware. The ReconROS executor schedules callbacks of ROS 2 nodes and uses a reconfigurable slot model with partial runtime reconfiguration to load hardware-based callbacks on demand. We describe the ReconROS executor approach, present design examples, and experimentally evaluate its functionality.
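For readers unfamiliar with the callback model the executor schedules, here is a minimal plain-software ROS 2 node using the standard rclpy API; this is ordinary ROS 2 Python, not the ReconROS API. ReconROS would dispatch such a callback to a reconfigurable hardware slot instead of running it on the CPU.

```python
import rclpy
from rclpy.node import Node
from std_msgs.msg import String

class EchoNode(Node):
    """A plain software ROS 2 node; in ReconROS, the executor could map
    the message callback to a hardware slot loaded on demand."""
    def __init__(self):
        super().__init__('echo_node')
        self.sub = self.create_subscription(String, 'in', self.on_msg, 10)
        self.pub = self.create_publisher(String, 'out', 10)

    def on_msg(self, msg):
        # Callback invoked by the executor whenever a message arrives.
        out = String()
        out.data = msg.data.upper()
        self.pub.publish(out)

def main():
    rclpy.init()
    rclpy.spin(EchoNode())

if __name__ == '__main__':
    main()
```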

Dynamic Random Access Memory (DRAM) is the de facto choice for main memory due to its cost-effectiveness. It offers larger capacity and higher bandwidth than SRAM but is slower. With each passing generation, DRAM becomes denser; one side effect is deviation from nominal process, voltage, and temperature parameters. DRAM is often considered the bottleneck of the system, as it trades performance for capacity, and with such inherent limitations, further deviation from nominal specifications is undesirable. In this paper, we investigate the impact of variations in conventional DRAM devices on performance, reliability, and energy requirements. Based on this study, we build a variation-aware framework, called VAR-DRAM, targeted at modern DRAM devices. It provides enhanced power management by taking variations into account. VAR-DRAM ensures faster program execution, as it internally remaps data from variation-affected cells to normal cells, while also ensuring data preservation. Through extensive experimentation, we find that VAR-DRAM achieves peak energy savings of up to 48.8%, with an average of 29.54%, on DDR4 memories, while improving DRAM access latency by 7.4% compared to a variation-affected device.
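The remapping idea can be sketched as a simple translation table: rows flagged as variation-affected are redirected to reserved strong rows. The thresholds, row counts, and spare pool below are hypothetical, and the sketch ignores the data-migration and preservation machinery a real VAR-DRAM-style controller would need.

```python
import numpy as np

rng = np.random.default_rng(1)

N_ROWS = 16
N_SPARES = 4    # strong rows reserved as remap targets (assumed)

# Hypothetical per-row retention slack; rows below the threshold are
# treated as variation-affected, in the spirit of VAR-DRAM's remapping.
slack = rng.normal(1.0, 0.2, N_ROWS)
weak_rows = [r for r in range(N_ROWS - N_SPARES) if slack[r] < 0.9]
spares = [r for r in range(N_ROWS - N_SPARES, N_ROWS) if slack[r] >= 0.9]

# Remap table: each weak row is redirected to a reserved strong spare
# (zip silently stops if spares run out; a real controller handles that).
remap = dict(zip(weak_rows, spares))

def translate(row):
    """Physical row actually accessed for a logical row."""
    return remap.get(row, row)

for r in weak_rows:
    print(f"logical row {r} (weak) -> physical row {translate(r)}")
```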

Agile hardware development requires fast and accurate circuit quality evaluation from the earliest design stages. Existing work on high-level synthesis (HLS) performance prediction usually requires extensive feature engineering after the synthesis process. To expedite circuit evaluation from as early a design stage as possible, we propose a rapid and accurate performance modeling approach that exploits the representational power of graph neural networks (GNNs) by representing C/C++ programs as graphs. The contribution of this work is threefold. First, we build a standard benchmark of 40k synthesizable C programs, including both synthetic programs and three sets of real-world HLS benchmarks; each program is implemented on an FPGA to generate ground-truth performance metrics. Second, we formally formulate the HLS performance prediction problem on graphs and propose multiple GNN-based modeling strategies that trade off prediction timeliness (early/late prediction) against accuracy. Third, we propose a novel hierarchical GNN that does not sacrifice timeliness but substantially improves prediction accuracy, significantly outperforming HLS tools. We conduct extensive evaluations on both synthetic and unseen real-world programs; our proposed predictor outperforms HLS tools by up to 40x and existing predictors by 2x to 5x in resource usage and timing prediction.
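To make the graph formulation concrete, this sketch encodes a tiny dataflow graph for `y = (a + b) * c` and runs one round of mean-aggregation message passing with random weights, producing a graph embedding that a regressor could map to latency or resource predictions. The operator set, feature encoding, and single layer are illustrative, not the paper's model.

```python
import numpy as np

# Toy dataflow graph for y = (a + b) * c: nodes are operations, edges
# are data dependencies, node features are one-hot op types.
ops = ['load', 'load', 'add', 'load', 'mul']
edges = [(0, 2), (1, 2), (2, 4), (3, 4)]          # (src, dst) dataflow
OP_TYPES = {'load': 0, 'add': 1, 'mul': 2}

X = np.zeros((len(ops), len(OP_TYPES)))
for i, op in enumerate(ops):
    X[i, OP_TYPES[op]] = 1.0

A = np.zeros((len(ops), len(ops)))
for s, d in edges:
    A[d, s] = 1.0                                 # messages flow src -> dst

rng = np.random.default_rng(0)
W = rng.normal(size=(X.shape[1], 8))              # random GNN layer weights

# One round of mean-aggregation message passing, then a mean readout
# over nodes to get a fixed-size graph embedding.
deg = np.maximum(A.sum(1, keepdims=True), 1)
H = np.tanh((X + A @ X / deg) @ W)
graph_embedding = H.mean(axis=0)
print(graph_embedding.round(3))
```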

The impact of device- and circuit-level effects in mixed-signal Resistive Random Access Memory (RRAM) accelerators typically manifests as performance degradation of Deep Learning (DL) algorithms, but the degree of impact varies with algorithmic features, including network architecture, capacity, weight distribution, and the type of inter-layer connections. Techniques for efficiently training sparse neural networks continue to emerge, and such networks may additionally be subject to activation sparsity, quantization, and memristive noise. In this paper, we present an extended Design Space Exploration (DSE) methodology to quantify the benefits and limitations of dense and sparse mapping schemes across a variety of network architectures. While sparse connectivity reduces power consumption and is often optimized for extracting localized features, its performance on tiled RRAM arrays may be more susceptible to noise due to under-parameterization compared to dense mapping schemes. We also present a case study quantifying and formalizing the trade-offs between typical non-idealities introduced into 1-Transistor-1-Resistor (1T1R) tiled memristive architectures and the size of modular crossbar tiles, using the CIFAR-10 dataset.
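A minimal sketch of the kind of noise study described above: a weight matrix is mapped onto fixed-size crossbar tiles, each device gets multiplicative conductance noise, and the relative output error is compared for a dense and a pruned matrix. The tile size, noise level, and sparsity below are assumed values, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
TILE = 4      # hypothetical crossbar tile size
SIGMA = 0.05  # assumed relative device conductance noise

def rram_matmul(W, x):
    """Matrix-vector product computed tile by tile with per-device
    noise: a toy stand-in for a 1T1R tiled crossbar."""
    y = np.zeros(W.shape[0])
    for r in range(0, W.shape[0], TILE):
        for c in range(0, W.shape[1], TILE):
            tile = W[r:r+TILE, c:c+TILE]
            noisy = tile * (1 + SIGMA * rng.normal(size=tile.shape))
            y[r:r+TILE] += noisy @ x[c:c+TILE]
    return y

W_dense = rng.normal(size=(8, 8))
W_sparse = W_dense * (rng.random((8, 8)) < 0.3)   # 70% of weights pruned
x = rng.normal(size=8)

for name, W in [('dense', W_dense), ('sparse', W_sparse)]:
    err = np.linalg.norm(rram_matmul(W, x) - W @ x) / np.linalg.norm(W @ x)
    print(f'{name}: relative output error {err:.3f}')
```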

The computer architecture design space is vast and complex. Tools are needed to explore new ideas and gain insight quickly, with low effort and at the desired accuracy. We propose Calipers, a criticality-based framework that models key abstractions of complex architectures and a program's execution using dynamic event-dependence graphs. By applying graph algorithms, Calipers can track instruction and event dependencies, compute critical paths, and analyze architectural bottlenecks. By manipulating the graph, Calipers enables architects to investigate a wide range of Instruction Set Architecture (ISA) and microarchitecture design choices and "what-if" scenarios during both early- and late-stage design space exploration, without recompiling or rerunning the program. Calipers can model in-order and out-of-order microarchitectures, structural hazards, and different types of ISAs, and can evaluate multiple ideas in a single run. The modeling algorithms are described in detail. We apply Calipers to explore and gain insight into complex microarchitectural and ISA ideas for RISC and EDGE processors, at lower effort than cycle-accurate simulators and with comparable accuracy. For example, among the investigations presented in the paper, experiments show that targeting only a fraction of critical loads realizes most of the benefits of value prediction.
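The core graph computation can be illustrated with a toy event-dependence DAG whose edge weights are latencies in cycles; the critical path is the longest path through the DAG, computed here by relaxation in topological order. The events and latencies are invented for illustration, not taken from the paper.

```python
from collections import defaultdict

# Toy event-dependence graph: nodes are instruction events, edge
# weights are latencies in cycles.
edges = {
    ('fetch_i0', 'exec_i0'): 1,
    ('exec_i0', 'exec_i1'): 3,   # data dependence on a 3-cycle load
    ('fetch_i0', 'fetch_i1'): 1,
    ('fetch_i1', 'exec_i1'): 1,
    ('exec_i1', 'commit'): 1,
}

succs = defaultdict(list)
indeg = defaultdict(int)
nodes = set()
for (u, v), w in edges.items():
    succs[u].append((v, w))
    indeg[v] += 1
    nodes.update((u, v))

# Longest-path relaxation in topological (Kahn) order.
dist = {n: 0 for n in nodes}
ready = [n for n in nodes if indeg[n] == 0]
while ready:
    u = ready.pop()
    for v, w in succs[u]:
        dist[v] = max(dist[v], dist[u] + w)
        indeg[v] -= 1
        if indeg[v] == 0:
            ready.append(v)

print('critical path length:', dist['commit'], 'cycles')   # 5 cycles
```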

Multi-fidelity modeling and calibration are data fusion tasks that arise ubiquitously in engineering design. In this paper, we introduce a novel approach based on latent-map Gaussian processes (LMGPs) that enables efficient and accurate data fusion. Our approach converts data fusion into a latent-space learning problem in which the relations among different data sources are learned automatically. This conversion brings attractive advantages: increased accuracy, reduced cost, the flexibility to jointly fuse any number of data sources, and the ability to visualize correlations between data sources. The visualization lets the user detect model-form errors or determine the optimal strategy for high-fidelity emulation by fitting the LMGP only to the subset of data sources that are well correlated. We also develop a new kernel function that enables LMGPs not only to build a probabilistic multi-fidelity surrogate but also to estimate calibration parameters with high accuracy and consistency. Our approach is considerably simpler to implement and use, and less prone to numerical issues, than existing technologies. We demonstrate the benefits of LMGP-based data fusion by comparing its performance against competing methods on a wide range of examples.
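A minimal sketch of the LMGP idea, assuming two sources (a biased low-fidelity model and sparse high-fidelity data): each source is assigned a point in a latent space, and the distance between latent points sets the inter-source correlation inside the kernel. Here the latent points are fixed by hand; a real LMGP learns them by maximizing the marginal likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two sources: a cheap, biased low-fidelity model and sparse
# high-fidelity data (assumed test functions for illustration).
hi = lambda x: np.sin(2 * np.pi * x)
lo = lambda x: np.sin(2 * np.pi * x) + 0.3 * x

X_lo, X_hi = rng.random(30), rng.random(5)
X = np.concatenate([X_lo, X_hi])
y = np.concatenate([lo(X_lo), hi(X_hi)])
src = np.array([0] * 30 + [1] * 5)            # source label per sample

# Fixed latent points, one per source; their distance controls
# inter-source correlation (learned in a real LMGP).
Z = np.array([[0.0], [0.4]])

def kernel(x1, s1, x2, s2, ell=0.2):
    dx = (x1[:, None] - x2[None, :]) ** 2 / ell**2
    dz = ((Z[s1][:, None, :] - Z[s2][None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * dx - dz)

K = kernel(X, src, X, src) + 1e-6 * np.eye(len(X))
alpha = np.linalg.solve(K, y)

# Predict the high-fidelity source on a grid.
Xs = np.linspace(0, 1, 5)
mu = kernel(Xs, np.ones(5, int), X, src) @ alpha
print(np.c_[Xs, mu, hi(Xs)].round(3))         # grid, prediction, truth
```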

Driven by the visions of the Internet of Things and 5G communications, edge computing systems integrate computing, storage, and network resources at the edge of the network to provide a computing infrastructure that lets developers quickly develop and deploy edge applications. Edge computing systems have received widespread attention in both industry and academia. To explore new research opportunities and assist users in selecting suitable edge computing systems for specific applications, this survey provides a comprehensive overview of existing edge computing systems and introduces representative projects. A comparison of open-source tools is presented according to their applicability. Finally, we highlight energy efficiency and deep learning optimization for edge computing systems, and discuss open issues in analyzing and designing them.

We study the use of the Wave-U-Net architecture for speech enhancement; the model was introduced by Stoller et al. for separating music vocals from accompaniment. This end-to-end learning method for audio source separation operates directly in the time domain, permitting integrated modelling of phase information and the use of large temporal contexts. Our experiments show that the proposed method improves several metrics, namely PESQ, CSIG, CBAK, COVL, and SSNR, over the state of the art on the speech enhancement task for the Voice Bank (VCTK) corpus. We find that fewer hidden layers suffice for speech enhancement compared with the original system designed for singing-voice separation in music. We see this initial result as an encouraging signal to further explore speech enhancement in the time domain, both as an end in itself and as a pre-processing step for speech recognition systems.
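A minimal PyTorch sketch of the Wave-U-Net pattern, operating directly on the waveform: 1-D convolutions, downsampling by decimation, upsampling by linear interpolation, and a skip connection. The depth, channel counts, and kernel sizes are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class TinyWaveUNet(nn.Module):
    """A one-level Wave-U-Net-style sketch: one downsampling block,
    a bottleneck, and one upsampling block with a skip connection."""
    def __init__(self, ch=16):
        super().__init__()
        self.down = nn.Conv1d(1, ch, kernel_size=15, padding=7)
        self.bottom = nn.Conv1d(ch, ch, kernel_size=15, padding=7)
        self.up = nn.Conv1d(2 * ch, ch, kernel_size=5, padding=2)
        self.out = nn.Conv1d(ch, 1, kernel_size=1)

    def forward(self, x):                     # x: (batch, 1, time)
        d = torch.relu(self.down(x))
        b = torch.relu(self.bottom(d[:, :, ::2]))          # decimate by 2
        u = nn.functional.interpolate(b, size=d.shape[-1], mode='linear')
        u = torch.relu(self.up(torch.cat([u, d], dim=1)))  # skip connection
        return self.out(u)                    # enhanced waveform

net = TinyWaveUNet()
noisy = torch.randn(2, 1, 1024)
print(net(noisy).shape)                       # torch.Size([2, 1, 1024])
```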

We investigate the problem of automatically determining what type of shoe left an impression found at a crime scene. This recognition problem is made difficult by the variability of crime scene evidence (ranging from traces of dust or oil on hard surfaces to impressions made in soil) and by the lack of comprehensive databases of shoe outsole tread patterns. We find that mid-level features extracted by pre-trained convolutional neural networks are surprisingly effective descriptors for this specialized domain. However, the choice of similarity measure for matching exemplars to a query image is essential for good performance. For matching multi-channel deep features, we propose multi-channel normalized cross-correlation and analyze its effectiveness. Our proposed metric significantly improves performance in matching crime scene shoeprints to laboratory test impressions. We also show its effectiveness on other cross-domain image retrieval problems: matching facade images to segmentation labels and aerial photos to map images. Finally, we introduce a discriminatively trained variant and fine-tune our system through the proposed metric, obtaining state-of-the-art performance.
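A sketch of multi-channel normalized cross-correlation at zero spatial offset, assuming feature maps of shape (channels, H, W): each channel is normalized to zero mean and unit norm, correlated with its counterpart, and the per-channel scores are averaged. A full matcher would also slide the query over spatial offsets.

```python
import numpy as np

def mcncc(query, exemplar, eps=1e-8):
    """Multi-channel normalized cross-correlation at zero offset for two
    feature maps of shape (channels, H, W): normalize each channel to
    zero mean and unit norm, correlate, then average over channels."""
    score = 0.0
    for q, e in zip(query, exemplar):
        qn = (q - q.mean()) / (np.linalg.norm(q - q.mean()) + eps)
        en = (e - e.mean()) / (np.linalg.norm(e - e.mean()) + eps)
        score += (qn * en).sum()
    return score / len(query)

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 16, 16))               # e.g. mid-level CNN features
print(mcncc(feat, feat))                          # self-similarity -> ~1.0
print(mcncc(feat, rng.normal(size=(8, 16, 16))))  # unrelated -> near 0
```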
