Spiking neural networks (SNNs) offer compelling potential advantages, including energy efficiency and low latency, but also pose challenges, including the non-differentiable nature of event spikes. Much of the initial research in this area has converted deep neural networks to equivalent SNNs, but this conversion approach potentially negates some of the advantages of SNN-based approaches developed from scratch. One promising application area for high-performance SNNs is template matching and image recognition. This research introduces the first high-performance SNN for the Visual Place Recognition (VPR) task: given a query image, the SNN has to find the closest match out of a list of reference images. At the core of this new system is a novel assignment scheme that implements a form of ambiguity-informed salience by up-weighting single-place-encoding neurons and down-weighting "ambiguous" neurons that respond to multiple different reference places. In a range of experiments on the challenging Nordland, Oxford RobotCar, SPEDTest, Synthia, and St Lucia datasets, we show that our SNN achieves VPR performance comparable to state-of-the-art and classical techniques, and degrades gracefully as the number of reference places increases. Our results provide a significant milestone towards SNNs that can provide robust, energy-efficient, and low-latency robot localization.
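
As a rough illustration of the weighting idea described above, the following numpy sketch down-weights neurons whose calibration responses spread over many reference places and keeps full weight for highly selective ones. The threshold, the inverse-count weighting rule, and the function names are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def ambiguity_weights(responses, threshold=0.5):
    """Hypothetical salience weighting. `responses` is an (n_neurons, n_places)
    spike-count matrix recorded on the reference traverse. Neurons that respond
    above `threshold` (relative to their own maximum) for several places are
    treated as ambiguous and down-weighted."""
    rel = responses / (responses.max(axis=1, keepdims=True) + 1e-9)
    places_per_neuron = (rel >= threshold).sum(axis=1)   # how many places excite each neuron
    weights = 1.0 / np.maximum(places_per_neuron, 1)     # 1.0 for selective neurons, <1 for ambiguous ones
    return weights

def match_place(query_spikes, responses, weights):
    """Score each reference place by a weighted similarity to the query spike counts."""
    scores = (weights[:, None] * responses * query_spikes[:, None]).sum(axis=0)
    return int(np.argmax(scores))
```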

Related content

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes high-quality submissions that contribute to the full range of neural network research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analysis, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This unique and broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: cognitive science, neuroscience, learning systems, mathematical and computational analysis, or engineering and applications. Official website address:

Food recognition is an important task for a variety of applications, including managing health conditions and assisting visually impaired people. Several food recognition studies have focused on generic types of food or specific cuisines; however, food recognition for Middle Eastern cuisines has remained unexplored. Therefore, in this paper we focus on developing a mobile-friendly, Middle Eastern cuisine-focused food recognition application for assisted living purposes. To enable a low-latency, high-accuracy food classification system, we opted to use the Mobilenet-v2 deep learning model. As some foods are more popular than others, the number of samples per class in the Middle Eastern food dataset used is relatively imbalanced. To compensate for this problem, data augmentation methods are applied to the underrepresented classes. Experimental results show that using the Mobilenet-v2 architecture for this task is beneficial in terms of both accuracy and memory usage. With the model achieving 94% accuracy on 23 food classes, the developed mobile application has the potential to serve the visually impaired by recognizing food automatically from images.
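
A minimal transfer-learning sketch of the kind of pipeline described above, assuming an ImageNet-pretrained MobileNetV2 backbone with a 23-class softmax head. Note that the abstract applies augmentation only to under-represented classes, whereas for brevity the augmentation layers here are applied to every input; all sizes and hyperparameters are illustrative assumptions.

```python
import tensorflow as tf

NUM_CLASSES = 23  # number of food classes reported in the abstract

# Light augmentation; in practice this would be applied (or oversampled)
# only for the under-represented classes.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

backbone = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
backbone.trainable = False  # freeze for the first training stage

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    augment,
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects inputs in [-1, 1]
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```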

Bridging the semantic gap between image and question is an important step towards improving the accuracy of the Visual Question Answering (VQA) task. However, most existing VQA methods focus on attention mechanisms or visual relations for reasoning about the answer, while features at different semantic levels are not fully utilized. In this paper, we present a new reasoning framework to fill the gap between visual features and semantic clues in the VQA task. Our method first extracts the features and predicates from the image and question. These features and predicates are then jointly learned in a coarse-to-fine manner within the proposed framework. Extensive experimental results on three large-scale VQA datasets show that our proposed approach achieves superior accuracy compared with other state-of-the-art methods. Furthermore, our reasoning framework also provides an explainable way to understand the decision of the deep neural network when predicting the answer.

In this paper, we propose a novel sequence verification task that aims to distinguish positive video pairs performing the same action sequence from negative ones that contain step-level transformations but still carry out the same overall task. This challenging task resides in an open-set setting without prior action detection or segmentation, which would require event-level or even frame-level annotations. To that end, we carefully reorganize two publicly available action-related datasets with a step-procedure-task structure. To fully investigate the effectiveness of any method, we collect a scripted video dataset enumerating all kinds of step-level transformations in chemical experiments. In addition, a novel evaluation metric, the Weighted Distance Ratio, is introduced to ensure equivalence across different step-level transformations during evaluation. Finally, a simple but effective baseline based on the transformer encoder with a novel sequence alignment loss is introduced to better characterize long-term dependencies between steps, and it outperforms other action recognition methods. Code and data will be released.
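
One plausible reading of such a transformer-encoder baseline is sketched below: per-step features are encoded, pooled into a sequence-level embedding, and two videos are compared by embedding distance. The layer sizes, mean pooling, and plain distance comparison are assumptions; the paper's sequence alignment loss is not reproduced here.

```python
import torch
import torch.nn as nn

class SequenceVerifier(nn.Module):
    """Sketch: encode step-level features with a transformer encoder to model
    long-term dependencies between steps, then pool to a sequence embedding."""
    def __init__(self, feat_dim=512, num_layers=2, num_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, step_features):           # (batch, num_steps, feat_dim)
        encoded = self.encoder(step_features)   # contextualized step features
        return encoded.mean(dim=1)              # mean-pool to one embedding per video

verifier = SequenceVerifier()
video_a = torch.randn(1, 10, 512)   # stand-in step-level features for two videos
video_b = torch.randn(1, 10, 512)
distance = torch.norm(verifier(video_a) - verifier(video_b), dim=-1)  # small = same sequence
```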

In recent years, spiking neural networks (SNNs) have received extensive attention in the field of brain-inspired intelligence due to their rich spatio-temporal dynamics, varied coding schemes, and event-driven characteristics that naturally fit neuromorphic hardware. With the development of SNNs, brain-inspired intelligence, an emerging research field inspired by achievements in brain science and aiming at artificial general intelligence, is attracting growing attention. In this paper, we review recent advances and discuss new frontiers in SNNs across four major research topics: essential elements (i.e., spiking neuron models, encoding methods, and topology structures), datasets, optimization algorithms, and software and hardware frameworks. We hope our survey can help researchers understand SNNs better and inspire new work that advances this field.
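
Among the essential elements listed above, the leaky integrate-and-fire (LIF) model is the most widely used spiking neuron. The sketch below simulates a single LIF neuron with Euler integration; the parameter values are arbitrary illustrative choices.

```python
import numpy as np

def lif_simulate(input_current, dt=1.0, tau=20.0, v_rest=0.0, v_reset=0.0, v_th=1.0):
    """Minimal leaky integrate-and-fire neuron. `input_current` is a 1-D array of
    injected current per time step; returns the membrane trace and the spike train."""
    v = v_rest
    v_trace, spikes = [], []
    for i_t in input_current:
        # Euler update of the membrane potential: tau * dv/dt = -(v - v_rest) + I
        v += (dt / tau) * (-(v - v_rest) + i_t)
        if v >= v_th:            # threshold crossing emits a spike
            spikes.append(1)
            v = v_reset          # hard reset after the spike
        else:
            spikes.append(0)
        v_trace.append(v)
    return np.array(v_trace), np.array(spikes)

# Example: a constant supra-threshold current produces a regular spike train.
v_trace, spikes = lif_simulate(np.full(100, 0.3))
```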

Visual place recognition (VPR) in condition-varying environments is still an open problem. Popular solutions are CNN-based image descriptors, which have been shown to outperform traditional image descriptors based on hand-crafted visual features. However, current CNN-based descriptors have two drawbacks: a) their high dimensionality and b) their lack of generalization, leading to low efficiency and poor performance in applications. In this paper, we propose to use a convolutional autoencoder (CAE) to tackle this problem. We employ a high-level layer of a pre-trained CNN to generate features, and train a CAE to map the features to a low-dimensional space, improving the condition invariance of the descriptor while reducing its dimensionality. We verify our method on three challenging datasets involving significant illumination changes, and it is shown to be superior to the state-of-the-art. For the benefit of the community, we make the source code public.
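
The compression idea can be illustrated with a small convolutional autoencoder applied to pre-trained CNN feature maps, as in the hedged sketch below; the channel counts, spatial size, and MSE reconstruction objective are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Sketch: compress high-level CNN feature maps into a compact bottleneck
    that serves as a low-dimensional, more condition-invariant place descriptor."""
    def __init__(self, in_channels=512, code_channels=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, code_channels, kernel_size=3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(code_channels, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, in_channels, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, feat):
        code = self.encoder(feat)      # bottleneck descriptor (flatten for matching)
        recon = self.decoder(code)     # reconstruction used only during training
        return code, recon

feat = torch.randn(4, 512, 16, 16)             # stand-in pre-trained CNN feature maps
code, recon = ConvAutoencoder()(feat)
loss = nn.functional.mse_loss(recon, feat)     # reconstruction objective
```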

Synthesis of ergodic, stationary visual patterns is widely applicable in texturing, shape modeling, and digital content creation. The wide applicability of this technique thus requires the pattern synthesis approaches to be scalable, diverse, and authentic. In this paper, we propose an exemplar-based visual pattern synthesis framework that aims to model the inner statistics of visual patterns and generate new, versatile patterns that meet the aforementioned requirements. To this end, we propose an implicit network based on generative adversarial network (GAN) and periodic encoding, thus calling our network the Implicit Periodic Field Network (IPFN). The design of IPFN ensures scalability: the implicit formulation directly maps the input coordinates to features, which enables synthesis of arbitrary size and is computationally efficient for 3D shape synthesis. Learning with a periodic encoding scheme encourages diversity: the network is constrained to model the inner statistics of the exemplar based on spatial latent codes in a periodic field. Coupled with continuously designed GAN training procedures, IPFN is shown to synthesize tileable patterns with smooth transitions and local variations. Last but not least, thanks to both the adversarial training technique and the encoded Fourier features, IPFN learns high-frequency functions that produce authentic, high-quality results. To validate our approach, we present novel experimental results on various applications in 2D texture synthesis and 3D shape synthesis.
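
As a rough sketch of the periodic-encoding ingredient, the function below maps input coordinates to sines and cosines at geometrically spaced frequencies before they would be fed to an implicit MLP; the number of frequencies and their spacing are illustrative assumptions, not IPFN's exact encoding.

```python
import numpy as np

def periodic_encoding(coords, num_frequencies=6):
    """Map each coordinate to sin/cos features at several frequencies, so a
    coordinate-based network can represent high-frequency, tileable patterns."""
    coords = np.atleast_2d(coords)                       # (n_points, n_dims)
    freqs = 2.0 ** np.arange(num_frequencies) * np.pi    # geometrically spaced frequencies
    angles = coords[..., None] * freqs                   # (n_points, n_dims, n_freq)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(coords.shape[0], -1)              # flatten per point

# Example: encode a small 2D grid of coordinates.
xy = np.stack(np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 4)), axis=-1).reshape(-1, 2)
features = periodic_encoding(xy)   # shape (16, 2 * 2 * 6)
```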

Visual recognition is currently one of the most important and active research areas in computer vision, pattern recognition, and even the general field of artificial intelligence. It has great fundamental importance and strong industrial demand. Deep neural networks (DNNs) have largely boosted performance on many concrete tasks, with the help of large amounts of training data and new, powerful computation resources. Although recognition accuracy is usually the first concern for new advances, efficiency is actually rather important and sometimes critical for both academic research and industrial applications. Moreover, insightful views on the opportunities and challenges of efficiency are also highly needed by the entire community. While general surveys on the efficiency of DNNs have been conducted from various perspectives, as far as we are aware, scarcely any of them has focused systematically on visual recognition, and thus it is unclear which advances are applicable to it and what else should be considered. In this paper, we present a review of recent advances, together with our suggestions on possible new directions towards improving the efficiency of DNN-related visual recognition approaches. We investigate efficiency not only from the model but also from the data point of view (which is not the case in existing surveys), and focus on the three most studied data types (images, videos, and points). This paper attempts to provide a systematic summary via a comprehensive survey that can serve as a valuable reference and inspire both researchers and practitioners who work on visual recognition problems.

Deep learning has enabled a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning is to create models that can process and link information across various modalities. Despite the extensive development of unimodal learning, it still cannot cover all aspects of human learning. Multimodal learning helps models understand and analyze information better when various senses are engaged in processing it. This paper focuses on multiple types of modalities, i.e., image, video, text, audio, body gestures, facial expressions, and physiological signals. A detailed analysis of past and current baseline approaches and an in-depth study of recent advancements in multimodal deep learning applications are provided. A fine-grained taxonomy of various multimodal deep learning applications is proposed, elaborating on the different applications in more depth. The architectures and datasets used in these applications are also discussed, along with their evaluation metrics. Finally, the main issues are highlighted separately for each domain, along with possible future research directions.

Detection and recognition of text in natural images are two main problems in the field of computer vision, with a wide variety of applications in analysis of sports videos, autonomous driving, and industrial automation, to name a few. They face common challenges stemming from how text is represented and how it is affected by environmental conditions. The current state-of-the-art scene text detection and/or recognition methods have exploited advances in deep learning architectures and report superior accuracy on benchmark datasets when tackling multi-resolution and multi-oriented text. However, several challenges affecting text in the wild remain and cause existing methods to underperform, because their models are unable to generalize to unseen data and labeled data is insufficient. Thus, unlike previous surveys in this field, the objectives of this survey are as follows: first, to offer the reader not only a review of the recent advances in scene text detection and recognition, but also the results of extensive experiments using a unified evaluation framework that assesses pre-trained models of the selected methods on challenging cases and applies the same evaluation criteria to these techniques. Second, to identify several existing challenges for detecting or recognizing text in the wild, namely in-plane rotation, multi-oriented and multi-resolution text, perspective distortion, illumination reflection, partial occlusion, complex fonts, and special characters. Finally, the paper also presents insight into potential research directions in this field to address some of the mentioned challenges that scene text detection and recognition techniques still encounter.

Answering questions that require reading text in an image is challenging for current models. One key difficulty of this task is that rare, polysemous, and ambiguous words frequently appear in images, e.g., names of places, products, and sports teams. To overcome this difficulty, resorting only to pre-trained word embedding models is far from enough. A desirable model should utilize the rich information in multiple modalities of the image to help understand the meaning of scene text; e.g., the prominent text on a bottle is most likely to be the brand. Following this idea, we propose a novel VQA approach, the Multi-Modal Graph Neural Network (MM-GNN). It first represents an image as a graph consisting of three sub-graphs, depicting the visual, semantic, and numeric modalities, respectively. Then, we introduce three aggregators that guide message passing from one graph to another to exploit the contexts in various modalities and thereby refine the features of the nodes. The updated nodes provide better features for the downstream question answering module. Experimental evaluations show that our MM-GNN represents scene text better and clearly improves performance on two VQA tasks that require reading scene text.
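
The cross-graph message passing can be illustrated with a simple attention-style aggregator, as in the hedged sketch below; the dot-product attention and residual update are assumptions and do not reproduce MM-GNN's exact aggregators.

```python
import torch
import torch.nn.functional as F

def cross_modal_aggregate(target_nodes, source_nodes, temperature=1.0):
    """Sketch of cross-graph message passing: each node in one modality attends
    over the nodes of another modality and mixes in their features."""
    scores = target_nodes @ source_nodes.t() / temperature   # (n_target, n_source) affinities
    attn = F.softmax(scores, dim=-1)                         # attention over source nodes
    messages = attn @ source_nodes                           # aggregated cross-modal context
    return target_nodes + messages                           # residual feature refinement

# Example: refine visual node features with semantic (scene-text) node features.
visual = torch.randn(5, 64)     # 5 visual objects, 64-d features (assumed sizes)
semantic = torch.randn(3, 64)   # 3 OCR tokens embedded to the same dimension
refined_visual = cross_modal_aggregate(visual, semantic)
```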
