As solar capacity installed worldwide continues to grow, there is an increasing awareness that advanced inspection systems are becoming of utmost importance to schedule smart interventions and minimize the likelihood of downtime. In this work we propose a novel automatic multi-stage model to detect panel defects on aerial images captured by an unmanned aerial vehicle, using the YOLOv3 network and computer vision techniques. The model combines detections of panels and defects to refine its accuracy. Its main novelties are its versatility in processing either thermographic or visible images and detecting a large variety of defects, and its portability to both rooftop and ground-mounted PV systems and to different panel types. The proposed model has been validated on two large PV plants in the south of Italy, with an outstanding AP@0.5 exceeding 98% for panel detection, a remarkable AP@0.4 (AP@0.5) of roughly 88.3% (66.95%) for hotspots detected by means of infrared thermography, and a mAP@0.5 of almost 70% in the visible spectrum for the detection of anomalies including panel shading induced by soiling and bird droppings, delamination, the presence of puddles, and raised rooftop panels. An estimate of the soiling coverage is also produced. Finally, an analysis of the influence of YOLOv3's different output scales on detection is discussed.
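To make the panel/defect fusion step concrete, here is a minimal sketch in Python of one plausible way such a refinement could work: low-confidence defect boxes are dropped, and a defect is kept only if its centre falls inside some detected panel. The box format, threshold, and function name are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of panel/defect fusion; boxes are (x1, y1, x2, y2, conf).
def refine_defects(panel_boxes, defect_boxes, min_conf=0.25):
    """Keep defects that are confident AND whose centre lies inside a panel."""
    refined = []
    for dx1, dy1, dx2, dy2, dconf in defect_boxes:
        if dconf < min_conf:
            continue
        cx, cy = (dx1 + dx2) / 2, (dy1 + dy2) / 2   # defect box centre
        if any(px1 <= cx <= px2 and py1 <= cy <= py2
               for px1, py1, px2, py2, _ in panel_boxes):
            refined.append((dx1, dy1, dx2, dy2, dconf))
    return refined
```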
Autonomous Vehicles (AVs), i.e., self-driving cars, operate in a safety-critical domain, since errors in the autonomous driving software can lead to huge losses. Statistically, road intersections, which are part of the AVs' operational design domain (ODD), have some of the highest accident rates. Hence, testing AVs to the limits on road intersections and assuring their safety there is pertinent, and thus the focus of this paper. We present a situation-coverage-based (SitCov) AV-testing framework for the verification and validation (V&V) and safety assurance of AVs, developed in an open-source AV simulator named CARLA. The SitCov AV-testing framework focuses on vehicle-to-vehicle interaction at a road intersection under different environmental and intersection-configuration situations, using situation coverage criteria for automatic test-suite generation for safety assurance of AVs. We have developed an ontology for intersection situations and used it to generate a situation hyperspace, i.e., the space of all possible situations arising from that ontology. To evaluate our SitCov AV-testing framework, we seeded multiple faults in our ego AV and compared situation-coverage-based and random situation generation. We found that both generation methodologies trigger around the same number of seeded faults, but that situation-coverage-based generation reveals far more about the weaknesses of the ego AV's autonomous driving algorithm, especially in edge cases. Our code is publicly available online; anyone can use our SitCov AV-testing framework or build further on top of it. This paper aims to contribute to the V&V and development of AVs, not only from a theoretical point of view, but also through an open-source software contribution that releases a flexible and effective tool for the V&V and development of AVs.
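The contrast between the two generation strategies can be illustrated with a toy sketch, assuming a deliberately tiny ontology (the dimensions and values below are invented for illustration and are far smaller than the paper's hyperspace): coverage-based generation visits every situation before repeating any, while random generation samples with replacement.

```python
import itertools
import random

# Illustrative toy "ontology" of intersection situations (assumed values).
ontology = {
    "weather":   ["clear", "rain", "fog"],
    "time":      ["day", "night"],
    "other_car": ["from_left", "from_right", "oncoming"],
}

# The situation hyperspace: every combination the ontology can produce.
hyperspace = list(itertools.product(*ontology.values()))

def coverage_based(n):
    """Visit each situation once (in random order) before repeating any."""
    ordering = random.sample(hyperspace, len(hyperspace))
    return [ordering[i % len(ordering)] for i in range(n)]

def random_based(n):
    """Sample situations independently; duplicates are allowed."""
    return [random.choice(hyperspace) for _ in range(n)]
```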
At present, a major challenge for the application of automatic driving technology is the accurate prediction of vehicle trajectories. With the vigorous development of computer technology and the emergence of deep convolutional neural networks, the accuracy of prediction results has improved. However, network depth, network width, and image resolution remain important factors that restrict model accuracy and prediction quality. The main innovation of this paper is the combination of the ResNet and EfficientNet architectures, which not only greatly increases network depth but also comprehensively rebalances the choice of network width and image resolution, improving model performance while saving as much computation as possible. The experimental results show that our proposed model obtains the best prediction results. Specifically, the loss value of our method is 4 lower than that of ResNet and 2.1 lower than that of EfficientNet.
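The abstract does not specify how the two networks are combined, so the following PyTorch sketch shows one plausible fusion scheme under stated assumptions: pooled features from a ResNet and an EfficientNet backbone are concatenated and fed to a small prediction head. Every detail (backbone sizes, head, output dimension) is an assumption for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

class HybridBackbone(nn.Module):
    """Assumed fusion: concatenate pooled ResNet and EfficientNet features."""
    def __init__(self, out_dim=2):
        super().__init__()
        resnet = models.resnet18(weights=None)
        self.res = nn.Sequential(*list(resnet.children())[:-1])    # (B, 512, 1, 1)
        effnet = models.efficientnet_b0(weights=None)
        self.eff = nn.Sequential(effnet.features, effnet.avgpool)  # (B, 1280, 1, 1)
        self.head = nn.Linear(512 + 1280, out_dim)  # e.g. a predicted (x, y) offset

    def forward(self, x):
        feats = torch.cat([self.res(x).flatten(1), self.eff(x).flatten(1)], dim=1)
        return self.head(feats)

# y = HybridBackbone()(torch.randn(1, 3, 224, 224))  # -> shape (1, 2)
```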
Computer-aided diagnosis (CAD) increases diagnostic efficiency by helping doctors provide a quick and confident diagnosis, and it has played an important role in the treatment of COVID-19. In our task, we address abnormality detection and classification. The dataset is provided by the Kaggle platform, and we choose YOLOv5 as our model. We introduce methods for object detection in the related-work section; object detectors can be divided into two streams, one-stage and two-stage, with Faster R-CNN and the YOLO series as representative models. We then describe the YOLOv5 model in detail. Comparative experiments and results are shown in Section IV. We choose mean average precision (mAP) as our experimental metric: the higher the mAP, the better the model performs. The mAP@0.5 of our YOLOv5s is 0.623, which is 0.157 and 0.101 higher than Faster R-CNN and EfficientDet, respectively.
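Since mAP@0.5 is the headline metric here, a minimal sketch of how AP at an IoU threshold of 0.5 is computed for a single class may help; this is the standard VOC-style 11-point evaluation, simplified to a pooled set of predictions, and not any paper's exact evaluation code.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def ap_at_iou(preds, gts, thr=0.5):
    """preds: list of (box, score); gts: list of boxes, each matchable once."""
    preds = sorted(preds, key=lambda p: -p[1])          # descending confidence
    matched, hits = set(), []
    for box, _ in preds:
        best = max(range(len(gts)), key=lambda k: iou(box, gts[k]), default=None)
        ok = best is not None and best not in matched and iou(box, gts[best]) >= thr
        hits.append(1 if ok else 0)
        if ok:
            matched.add(best)                           # each GT counts once
    tp = np.cumsum(hits)
    recall = tp / max(len(gts), 1)
    precision = tp / np.arange(1, len(hits) + 1)
    # VOC-style 11-point interpolated average precision
    return float(np.mean([precision[recall >= r].max(initial=0.0)
                          for r in np.linspace(0, 1, 11)]))
```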
Being effective and efficient is essential for an object detector in practical use. To meet these two concerns, we comprehensively evaluate a collection of existing refinements that improve the performance of PP-YOLO while keeping its inference time almost unchanged. This paper analyzes this collection of refinements and empirically evaluates their impact on the final model performance through an incremental ablation study. Things we tried that did not work are also discussed. By combining multiple effective refinements, we boost PP-YOLO's performance from 45.9% mAP to 49.5% mAP on COCO2017 test-dev. Given this significant performance margin, we present PP-YOLOv2. In terms of speed, PP-YOLOv2 runs at 68.9 FPS at a 640x640 input size. The Paddle inference engine with TensorRT, FP16 precision, and batch size 1 further improves PP-YOLOv2's inference speed, reaching 106.5 FPS. Such performance surpasses existing object detectors with roughly the same number of parameters (i.e., YOLOv4-CSP, YOLOv5l). In addition, PP-YOLOv2 with ResNet101 achieves 50.3% mAP on COCO2017 test-dev. Source code is at //github.com/PaddlePaddle/PaddleDetection.
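Figures such as "68.9 FPS at 640x640, batch size 1" are typically obtained with a warmed-up, synchronized timing loop. The following generic PyTorch sketch shows the measurement pattern only; `model` stands for any detector and is not PP-YOLOv2 itself, which runs on Paddle rather than PyTorch.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, size=640, warmup=10, iters=100, device="cuda"):
    """Single-image (batch=1) throughput of an arbitrary detector."""
    model = model.to(device).eval()
    x = torch.randn(1, 3, size, size, device=device)
    for _ in range(warmup):              # let cuDNN autotuning settle first
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()         # flush pending async GPU work
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)
```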
Detecting objects in aerial images is challenging for at least two reasons: (1) target objects such as pedestrians are very small in pixels, making them hard to distinguish from the surrounding background; and (2) targets are in general sparsely and non-uniformly distributed, making detection very inefficient. In this paper, we address both issues, inspired by the observation that these targets are often clustered. In particular, we propose a Clustered Detection (ClusDet) network that unifies object clustering and detection in an end-to-end framework. The key components of ClusDet are a cluster proposal sub-network (CPNet), a scale estimation sub-network (ScaleNet), and a dedicated detection network (DetecNet). Given an input image, CPNet produces object cluster regions and ScaleNet estimates object scales for these regions. Each scale-normalized cluster region is then fed into DetecNet for object detection. ClusDet has several advantages over previous solutions: (1) it greatly reduces the number of chips needed for final object detection and hence achieves high runtime efficiency; (2) cluster-based scale estimation is more accurate than the previously used single-object-based estimates, effectively improving the detection of small objects; and (3) the final DetecNet is dedicated to clustered regions and implicitly models prior context information, boosting detection accuracy. The proposed method is tested on three popular aerial image datasets: VisDrone, UAVDT, and DOTA. In all experiments, ClusDet achieves promising performance in comparison with state-of-the-art detectors. Code will be available at \url{//github.com/fyangneil}.
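The three-stage flow described above can be summarized structurally as follows. This is a sketch of the inference logic only; all callables passed in (`cpnet`, `scalenet`, `detecnet`, `crop_and_resize`, `to_image_coords`, `nms`) are assumed interfaces standing in for the paper's sub-networks and utilities, not the authors' code.

```python
def clusdet_style_infer(image, cpnet, scalenet, detecnet,
                        crop_and_resize, to_image_coords, nms):
    """Sketch of cluster-then-detect inference over an aerial image."""
    detections = []
    for region in cpnet(image):                       # 1) cluster proposals
        scale = scalenet(image, region)               # 2) per-region object scale
        chip = crop_and_resize(image, region, scale)  #    scale-normalized chip
        for det in detecnet(chip):                    # 3) detect inside the chip
            detections.append(to_image_coords(det, region, scale))
    return nms(detections)                            # merge duplicates across chips
```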
The detection and classification of objects in aerial imagery have several applications, such as urban planning, crop surveillance, and traffic surveillance. However, due to the lower resolution of the objects and the effect of noise in aerial images, extracting distinguishing features for the objects is a challenge. We evaluate CenterNet, a state-of-the-art method for real-time 2D object detection, on the VisDrone2019 dataset. We evaluate the performance of the model with different backbone networks in conjunction with varying resolutions during training and testing.
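The evaluation described above amounts to a backbone-by-resolution sweep, sketched below. `build` and `evaluate` are assumed callables; the backbone names mirror common CenterNet choices and may differ from the authors' exact configurations.

```python
def sweep(build, evaluate,
          backbones=("resnet18", "dla34", "hourglass104"),
          resolutions=(512, 768, 1024)):
    """Score every (backbone, input resolution) combination."""
    results = {}
    for bb in backbones:
        for res in resolutions:
            model = build(bb)                            # backbone-specific model
            results[(bb, res)] = evaluate(model, input_size=res)
    return results
```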
Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development over the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetic under the power of deep learning, then turning the clock back 20 years we would witness the wisdom of the cold-weapon era. This paper extensively reviews 400+ papers on object detection in the light of its technical evolution, spanning over a quarter-century (from the 1990s to 2019). A number of topics are covered, including milestone detectors in history, detection datasets, metrics, fundamental building blocks of detection systems, speed-up techniques, and recent state-of-the-art detection methods. This paper also reviews important detection applications, such as pedestrian detection, face detection, and text detection, and makes an in-depth analysis of their challenges as well as technical improvements in recent years.
Generative adversarial networks (GANs) have shown promise for many computer vision problems due to their powerful ability to enhance data for training and testing. In this paper, we leverage GANs and propose a new architecture with a cascaded Single Shot Detector (SSD) for pedestrian detection at a distance, which remains a challenge due to the varied sizes of pedestrians in videos at a distance. To overcome the low-resolution issue in pedestrian detection at a distance, a DCGAN is employed first to improve the resolution and reconstruct more discriminative features for an SSD to detect objects in images or videos. A crucial advantage of our method is that it learns a multi-scale metric to distinguish multiple objects at different distances within one image, while the DCGAN serves as an encoder-decoder platform that generates parts of an image containing better discriminative information. To measure the effectiveness of our proposed method, experiments were carried out on the Canadian Institute for Advanced Research (CIFAR) dataset, and the proposed architecture achieved a much better detection rate, particularly on vehicles and pedestrians at a distance, making it highly suitable for smart-city applications that need to discover key objects or pedestrians at a distance.
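Structurally, the cascade is enhance-then-detect, as in the following PyTorch sketch. `generator` and `ssd` are assumed callables, not the paper's trained models, and the region format is an assumption for illustration.

```python
import torch

@torch.no_grad()
def detect_at_distance(image, regions, generator, ssd):
    """image: (1, 3, H, W) tensor; regions: list of (x1, y1, x2, y2) boxes
    around far-away, low-resolution areas of the frame."""
    detections = []
    for (x1, y1, x2, y2) in regions:
        crop = image[:, :, y1:y2, x1:x2]        # low-resolution distant region
        sr = generator(crop)                    # DCGAN-style enhancement step
        for det in ssd(sr):                     # detect on the enhanced crop
            detections.append((det, (x1, y1)))  # keep offset to map back later
    return detections
```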
Unmanned aerial vehicles are increasingly being used in surveillance and traffic monitoring thanks to their high mobility and ability to cover areas at different altitudes and locations. One of the major challenges is to use aerial images to accurately detect and count cars in real time for traffic-monitoring purposes. Several deep learning techniques based on convolutional neural networks (CNNs) have recently been proposed for real-time classification and recognition in computer vision. However, their performance depends on the scenarios in which they are used. In this paper, we investigate the performance of two state-of-the-art CNN algorithms, namely Faster R-CNN and YOLOv3, in the context of car detection from aerial images. We trained and tested these two models on a large car dataset taken from UAVs. We demonstrate that YOLOv3 outperforms Faster R-CNN in sensitivity and processing time, although the two are comparable in precision.
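The sensitivity/precision comparison rests on two standard ratios over true positives (TP), false positives (FP), and false negatives (FN); a minimal sketch with invented counts (not the paper's numbers):

```python
def precision(tp, fp):
    return tp / (tp + fp)      # fraction of detections that are real cars

def sensitivity(tp, fn):
    return tp / (tp + fn)      # fraction of real cars detected (a.k.a. recall)

# Hypothetical counts for one test set, purely for illustration:
tp, fp, fn = 90, 10, 25
print(f"precision={precision(tp, fp):.2f}, sensitivity={sensitivity(tp, fn):.2f}")
```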
Object detection is a major challenge in computer vision, involving both object classification and object localization within a scene. While deep neural networks have been shown in recent years to yield very powerful techniques for tackling object detection, one of the biggest obstacles to the widespread deployment of such networks on embedded devices is their high computational and memory requirements. Recently, there has been an increasing focus on exploring small deep neural network architectures for object detection that are more suitable for embedded devices, such as Tiny YOLO and SqueezeDet. Inspired by the efficiency of the Fire microarchitecture introduced in SqueezeNet and the object detection performance of the single-shot detection macroarchitecture introduced in SSD, this paper introduces Tiny SSD, a single-shot detection deep convolutional neural network for real-time embedded object detection. It is composed of a highly optimized, non-uniform Fire sub-network stack and a non-uniform sub-network stack of highly optimized SSD-based auxiliary convolutional feature layers, designed specifically to minimize model size while maintaining object detection performance. The resulting Tiny SSD possesses a model size of 2.3MB (~26X smaller than Tiny YOLO) while still achieving an mAP of 61.3% on VOC 2007 (~4.2% higher than Tiny YOLO). These experimental results show that very small deep neural network architectures can be designed for real-time object detection while remaining well-suited for embedded scenarios.
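The Fire microarchitecture that Tiny SSD builds on comes from SqueezeNet: a 1x1 "squeeze" convolution followed by parallel 1x1 and 3x3 "expand" convolutions whose outputs are concatenated, trading channels for parameters. The PyTorch sketch below follows the SqueezeNet design; the channel sizes in the usage comment are illustrative, and Tiny SSD's own Fire stack is non-uniform rather than fixed.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet-style Fire module: squeeze (1x1), then expand (1x1 + 3x3)."""
    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        self.squeeze   = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.relu(self.squeeze(x))                   # channel bottleneck
        return torch.cat([self.relu(self.expand1x1(s)),  # concat expands
                          self.relu(self.expand3x3(s))], dim=1)

# x = torch.randn(1, 96, 55, 55)
# Fire(96, 16, 64, 64)(x).shape  # -> (1, 128, 55, 55)
```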