

The detection of tiny objects in microscopic videos remains a difficult problem, especially in large-scale experiments. For tiny objects such as sperms, current detection methods struggle with blurred, irregularly shaped objects and with precise localization. To address this, we present a convolutional neural network for tiny object detection (TOD-CNN), together with an underlying dataset of high-quality sperm microscopic videos (111 videos, $>$ 278,000 annotated objects), and design a graphical user interface (GUI) for deploying and testing the proposed model effectively. TOD-CNN is highly accurate, achieving $85.60\%$ AP$_{50}$ on real-time sperm detection in microscopic videos. To demonstrate the importance of sperm detection technology in sperm quality analysis, we compute relevant sperm quality evaluation metrics and compare them with diagnoses from medical doctors.
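The AP$_{50}$ figure above counts a detection as correct only when its intersection-over-union (IoU) with a ground-truth box reaches 0.5. A minimal sketch of that matching rule (a generic illustration, not code from TOD-CNN itself):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def is_true_positive(pred, gt, threshold=0.5):
    """Under the AP_50 criterion, a prediction matches a ground-truth box when IoU >= 0.5."""
    return iou(pred, gt) >= threshold
```

Averaging precision over recall levels for all detections scored this way yields the reported AP$_{50}$.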

Related Content

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes submissions of high-quality papers that contribute to the full range of neural networks research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analysis, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This unique and broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles appear in one of five sections: cognitive science, neuroscience, learning systems, mathematical and computational analysis, or engineering and applications. Official website:

State-of-the-art methods for quantifying wear in cylinder liners of large internal combustion engines for stationary power generation require disassembly and cutting of the examined liner. This is followed by laboratory-based high-resolution microscopic surface depth measurement that quantitatively evaluates wear based on bearing load curves (also known as Abbott-Firestone curves). Such reference methods are destructive, time-consuming and costly. The goal of the research presented here is to develop nondestructive yet reliable methods for quantifying the surface topography. A novel machine learning framework is proposed that allows prediction of the bearing load curves representing the depth profiles from reflection RGB images of the liner surface. These images can be collected with a simple handheld microscope. A joint deep learning approach involving two neural network modules optimizes the prediction quality of surface roughness parameters as well. The network stack is trained using a custom-built database containing 422 perfectly aligned depth profile and reflection image pairs of liner surfaces of large gas engines. The observed success of the method suggests its great potential for on-site wear assessment of engines during service.
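The bearing load (Abbott-Firestone) curve that this framework predicts is essentially the cumulative distribution of surface heights: for each depth level, the fraction of profile points whose height lies at or above it. A minimal illustrative sketch of that construction (an assumption-level simplification, not the authors' code):

```python
def bearing_load_curve(heights):
    """Material (bearing) ratio curve of a surface depth profile.

    Sort the sampled heights from highest peak to deepest valley; the
    material ratio at the i-th height is the fraction of profile points
    at or above that height.
    """
    hs = sorted(heights, reverse=True)
    n = len(hs)
    return [(h, (i + 1) / n) for i, h in enumerate(hs)]
```

Plotting material ratio against height gives the familiar Abbott-Firestone curve used to quantify wear.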

Recently, deep learning technology has been extensively used in image recognition. However, its main applications are the recognition and detection of ordinary pictures and common scenes. It is challenging to effectively and expediently analyze remote-sensing images obtained by the image acquisition systems of unmanned aerial vehicles (UAVs), which involves identifying targets and calculating their positions. Aerial remote-sensing images differ from ordinary pictures in shooting angle and method, which gives remote-sensing images an irreplaceable role in some areas. In this study, a new target detection and recognition method for remote-sensing images is proposed, based on a deep convolutional neural network (CNN) that provides multilevel image information, combined with a region proposal network that generates multi-angle regions of interest. The proposed method produced results that were much more accurate and precise than those obtained with traditional methods, demonstrating that the proposed model has great application potential in remote-sensing image recognition.

Convolutional Neural Networks (CNNs) have been widely used in image classification. Over the years they have benefited from various enhancements and are now considered state-of-the-art techniques for image-like data. However, when they are used for regression, i.e., to estimate some function value from images, fewer recommendations are available. In this study, a novel CNN regression model is proposed. It combines convolutional layers, which extract high-level feature representations from images, with a soft labelling technique. More specifically, since deep regression is challenging, the idea is to account for uncertainty in the targets, which are treated as distributions around their means; the model then produces its estimates in the form of distributions. Building on earlier work, a histogram loss function based on the Kullback-Leibler (KL) divergence is applied during training. The model takes advantage of the CNN feature representation and can carry out estimation from multi-channel input images. To assess and illustrate the technique, the model is applied to Global Navigation Satellite System (GNSS) multipath estimation, where multipath signal parameters must be estimated from correlator output images on the I and Q channels. The multipath signal delay, magnitude, Doppler shift frequency, and phase parameters are estimated from synthetically generated datasets of satellite signals. Experiments are conducted under various receiving conditions and input image resolutions to test the quality and robustness of the estimates. The results show that the proposed soft-labelling CNN technique with a distributional loss outperforms classical CNN regression under all conditions. Furthermore, the extra learning performance achieved by the model allows the input image resolution to be reduced from 80x80 to 40x40, and sometimes to 20x20.
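The soft-labelling idea above can be made concrete: a scalar target is encoded as a discrete distribution over histogram bins (e.g., a Gaussian around the mean), and training minimizes the KL divergence between the predicted and target distributions. A minimal sketch under that assumption (the bin placement and bandwidth are illustrative, not the paper's exact settings):

```python
import math

def soft_label(target, bins, sigma=1.0):
    """Encode a scalar target as a distribution over histogram bin centres:
    a Gaussian around the target value, normalized to sum to 1."""
    w = [math.exp(-0.5 * ((b - target) / sigma) ** 2) for b in bins]
    s = sum(w)
    return [x / s for x in w]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions, usable as a histogram loss."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
```

At inference, a point estimate can be recovered from the predicted distribution, e.g. as its expectation over the bin centres.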

Graph Convolutional Networks (GCNs) have received increasing attention in recent machine learning research. How to effectively leverage the rich structural information in complex graphs, such as knowledge graphs with heterogeneous types of entities and relations, is a primary open challenge in the field. Most GCN methods are either restricted to graphs with a homogeneous type of edge (e.g., citation links only), or focus on representation learning for nodes only, instead of jointly optimizing the embeddings of both nodes and edges for target-driven objectives. This paper addresses these limitations by proposing a novel framework, the GEneralized Multi-relational Graph Convolutional Network (GEM-GCN), which combines the power of GCNs in graph-based belief propagation with the strengths of advanced knowledge-base embedding methods, and goes beyond both. Our theoretical analysis shows that GEM-GCN offers an elegant unification of several well-known GCN methods as special cases, with a new perspective on graph convolution. Experimental results on benchmark datasets show the advantageous performance of GEM-GCN over strong baseline methods on knowledge graph alignment and entity classification.
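For readers unfamiliar with the graph-convolution operation such frameworks generalize, a single standard GCN layer aggregates each node's neighbourhood (with self-loops and symmetric degree normalization) and then applies a linear transform. A plain-Python sketch of that standard propagation rule, not GEM-GCN itself:

```python
import math

def gcn_layer(adj, features, weight):
    """One graph-convolution layer: H' = D^{-1/2} (A + I) D^{-1/2} X W.

    adj:      n x n adjacency matrix (0/1 lists)
    features: n x f node feature matrix
    weight:   f x o linear transform
    """
    n = len(adj)
    # add self-loops: A + I
    a = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    # symmetric normalization
    norm = [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # aggregate neighbour features, then transform
    f_dim, o_dim = len(features[0]), len(weight[0])
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n))
            for f in range(f_dim)] for i in range(n)]
    return [[sum(agg[i][f] * weight[f][o] for f in range(f_dim))
             for o in range(o_dim)] for i in range(n)]
```

Multi-relational variants like GEM-GCN extend this by keeping separate (or factorized) transforms per edge type instead of a single shared `weight`.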

It is a common paradigm in object detection frameworks to treat all samples equally and aim at maximizing performance on average. In this work, we revisit this paradigm through a careful study of how different samples contribute to the overall performance measured in terms of mAP. Our study suggests that the samples in each mini-batch are neither independent nor equally important, and therefore a classifier that is better on average does not necessarily yield higher mAP. Motivated by this study, we propose the notion of Prime Samples, those that play a key role in driving detection performance. We further develop a simple yet effective sampling and learning strategy called PrIme Sample Attention (PISA) that directs the focus of the training process towards such samples. Our experiments demonstrate that it is often more effective to focus on prime samples than on hard samples when training a detector. In particular, on the MS COCO dataset, PISA consistently outperforms the random sampling baseline and hard mining schemes such as OHEM and Focal Loss by more than 1% on both single-stage and two-stage detectors with a strong ResNeXt-101 backbone.

Deep Convolutional Neural Networks (CNNs) are a special type of neural network that has shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNNs is largely achieved through multiple non-linear feature extraction stages that automatically learn hierarchical representations from data. The availability of large amounts of data and improvements in hardware processing units have accelerated research in CNNs, and recently very interesting deep CNN architectures have been reported. The recent race in deep CNN architectures for achieving high performance on challenging benchmarks has shown that innovative architectural ideas, as well as parameter optimization, can improve CNN performance on various vision-related tasks. In this regard, different ideas in CNN design have been explored, such as different activation and loss functions, parameter optimization, regularization, and restructuring of processing units. However, the major improvement in representational capacity has been achieved by restructuring the processing units; in particular, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey therefore focuses on the intrinsic taxonomy present in recently reported CNN architectures and classifies recent innovations into seven categories, based on spatial exploitation, depth, multi-path, width, feature-map exploitation, channel boosting, and attention. Additionally, it covers the elementary understanding of CNN components and sheds light on the current challenges and applications of CNNs.

The low resolution of objects of interest in aerial images makes pedestrian detection and action detection extremely challenging tasks. Furthermore, using deep convolutional neural networks to process large images can be demanding in terms of computational requirements. To alleviate these challenges, we propose a two-step, yes-and-no question-answering framework to find specific individuals performing one or more specific actions in aerial images. First, a deep object detector, the Single Shot MultiBox Detector (SSD), is used to generate object proposals from small aerial images. Second, another deep network is used to learn a latent common subspace that associates the high-resolution aerial imagery with the pedestrian action labels provided by human-based sources.

We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy for scenarios where objects of varied sizes appear in high-resolution images. Detection progresses in a coarse-to-fine manner, first on a down-sampled version of the image and then on a sequence of higher-resolution regions identified as likely to improve detection accuracy. Built upon reinforcement learning, our approach consists of a model (R-net) that uses coarse detection results to predict the potential accuracy gain from analyzing a region at a higher resolution, and another model (Q-net) that sequentially selects regions to zoom in on. Experiments on the Caltech Pedestrians dataset show that our approach reduces the number of processed pixels by over 50% without a drop in detection accuracy. The merits of our approach become more significant on a high-resolution test set collected from the YFCC100M dataset, where it maintains high detection performance while reducing the number of processed pixels by about 70% and the detection time by over 50%.

Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to transfer to aerial imagery, not only because of the huge variation in the scale, orientation, and shape of object instances on the earth's surface, but also due to the scarcity of well-annotated datasets of objects in aerial scenes. To advance object detection research in Earth Vision, also known as Earth Observation and Remote Sensing, we introduce a large-scale Dataset for Object deTection in Aerial images (DOTA). To this end, we collect $2806$ aerial images from different sensors and platforms. Each image is about 4000-by-4000 pixels and contains objects exhibiting a wide variety of scales, orientations, and shapes. These DOTA images are then annotated by experts in aerial image interpretation using $15$ common object categories. The fully annotated DOTA images contain $188,282$ instances, each labeled by an arbitrary (8 d.o.f.) quadrilateral. To build a baseline for object detection in Earth Vision, we evaluate state-of-the-art object detection algorithms on DOTA. Experiments demonstrate that DOTA well represents real Earth Vision applications and is quite challenging.

Salient object detection is a fundamental problem that has received a great deal of attention in computer vision. Recently, deep learning models have become powerful tools for image feature extraction. In this paper, we propose a multi-scale deep neural network (MSDNN) for salient object detection. The proposed model first extracts global high-level features and context information over the whole source image with a recurrent convolutional neural network (RCNN). Then several stacked deconvolutional layers are adopted to obtain the multi-scale feature representation and a series of saliency maps. Finally, we investigate a fusion convolution module (FCM) to build a final pixel-level saliency map. The proposed model is extensively evaluated on four salient object detection benchmark datasets. Results show that our deep model significantly outperforms 12 other state-of-the-art approaches.

北京阿比特科技有限公司