We propose \textit{average Localisation-Recall-Precision} (aLRP), a unified, bounded, balanced and ranking-based loss function for both the classification and localisation tasks in object detection. aLRP extends the Localisation-Recall-Precision (LRP) performance metric (Oksuz et al., 2018), inspired by how Average Precision (AP) Loss extends precision to a ranking-based loss function for classification (Chen et al., 2020). aLRP has the following distinct advantages: (i) aLRP is the first ranking-based loss function for both classification and localisation tasks. (ii) Because it uses ranking for both tasks, aLRP naturally enforces high-quality localisation for high-precision classification. (iii) aLRP provides provable balance between positives and negatives. (iv) Compared with the $\sim$6 hyperparameters, on average, in the loss functions of state-of-the-art detectors, aLRP Loss has only one hyperparameter, which we did not tune in practice. On the COCO dataset, aLRP Loss improves on its ranking-based predecessor, AP Loss, by up to around $5$ AP points, achieves $48.9$ AP without test-time augmentation and outperforms all one-stage detectors. Code is available at: //github.com/kemaloksuz/aLRPLoss.
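
The LRP metric that aLRP builds on can be sketched as a single-threshold error. This is a minimal sketch assuming the formulation LRP = (1/(N_TP + N_FP + N_FN)) (sum over TPs of (1 - IoU)/(1 - τ) + N_FP + N_FN) with a TP validation threshold τ = 0.5; the function name `lrp_error` is hypothetical, and this computes the metric only, not the ranking-based aLRP Loss.

```python
def lrp_error(tp_ious, num_fp, num_fn, tau=0.5):
    """LRP error at a single score threshold (assumed formulation).

    tp_ious: IoUs of true-positive detections (each must lie in [tau, 1]).
    num_fp:  number of false positives.
    num_fn:  number of false negatives (missed ground-truth objects).
    tau:     IoU threshold used to validate a detection as a TP (assumed 0.5).
    """
    if any(iou < tau or iou > 1.0 for iou in tp_ious):
        raise ValueError("every TP IoU must lie in [tau, 1]")
    total = len(tp_ious) + num_fp + num_fn
    if total == 0:
        return 0.0  # nothing to detect and nothing detected
    # Localisation term: each TP contributes (1 - IoU) / (1 - tau), a value in [0, 1].
    loc = sum((1.0 - iou) / (1.0 - tau) for iou in tp_ious)
    # FPs and FNs each contribute a full unit of error, so the result is bounded in [0, 1].
    return (loc + num_fp + num_fn) / total
```

For example, two TPs with IoU 1.0 and 0.75 plus one false positive give (0 + 0.5 + 1) / 3 = 0.5, while a set of detections that are all false positives or misses scores the maximum error of 1.0.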

Related content

Loss function (machine learning)

A loss function, also called a distance function or metric function in AI, measures the error between the true data and the predicted data; "distance" here is meant in the abstract sense. The loss function estimates how far a model's prediction f(x) deviates from the true value Y. It is a non-negative real-valued function, usually written L(Y, f(x)); the smaller the loss, the more closely the model's predictions match the true values. The loss function is the core of the empirical risk function and an important component of the structural risk function.
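
The relationship between a per-example loss, the empirical risk and the structural risk can be sketched in a few lines. This is a minimal illustration assuming squared loss, a one-parameter linear model and an L2 complexity penalty J(f) = w²; the names `empirical_risk`, `structural_risk` and the regularisation weight `lam` are hypothetical.

```python
def squared_loss(y, y_pred):
    """Non-negative per-example loss L(Y, f(x)) = (Y - f(x))^2."""
    return (y - y_pred) ** 2


def empirical_risk(loss, f, xs, ys):
    """Average loss of model f over the training set: (1/N) * sum_i L(y_i, f(x_i))."""
    return sum(loss(y, f(x)) for x, y in zip(xs, ys)) / len(xs)


def structural_risk(loss, f, xs, ys, complexity, lam):
    """Empirical risk plus a penalty lam * J(f) on model complexity."""
    return empirical_risk(loss, f, xs, ys) + lam * complexity


# Hypothetical linear model f(x) = w * x, penalised by J(f) = w^2.
w = 2.0


def f(x):
    return w * x


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 7.0]
print(empirical_risk(squared_loss, f, xs, ys))
print(structural_risk(squared_loss, f, xs, ys, complexity=w ** 2, lam=0.1))
```

Here the predictions are 2, 4 and 6, so only the last example contributes a loss of 1 and the empirical risk is 1/3; the structural risk adds 0.1 × 4 = 0.4 on top of that.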

3 December 2020

A Simple Semi-Supervised Learning Framework for Object Detection

Kihyuk Sohn, Zizhao Zhang, Chun-Liang Li, Han Zhang, Chen-Yu Lee, Tomas Pfister

Semi-supervised learning (SSL) has the potential to improve the predictive performance of machine learning models using unlabeled data. Although there has been remarkable recent progress, SSL has mainly been demonstrated on image classification tasks. In this paper, we propose STAC, a simple yet effective SSL framework for visual object detection, along with a data augmentation strategy. STAC deploys highly confident pseudo labels of localized objects from an unlabeled image and updates the model by enforcing consistency via strong augmentations. We propose experimental protocols to evaluate the performance of semi-supervised object detection using MS-COCO and show the efficacy of STAC on both MS-COCO and VOC07. On VOC07, STAC improves the AP$^{0.5}$ from $76.30$ to $79.08$; on MS-COCO, STAC demonstrates $2{\times}$ higher data efficiency, achieving 24.38 mAP with only 5\% labeled data, compared with 23.86 mAP for a supervised baseline trained on 10\% labeled data. The code is available at //github.com/google-research/ssl_detection/.
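
The pseudo-labelling step and the strong-augmentation consistency described above can be sketched as below. This is a minimal sketch, not the STAC code: the 0.9 confidence threshold, the weight `lambda_u`, the `Detection` type and all function names are assumptions for illustration. The flip helper shows why a geometric augmentation must transform the pseudo boxes together with the image.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    box: tuple  # (x1, y1, x2, y2) in pixels
    label: int
    score: float


def filter_pseudo_labels(detections, threshold=0.9):
    """Keep only highly confident teacher detections as pseudo labels (threshold assumed)."""
    return [d for d in detections if d.score >= threshold]


def hflip_boxes(boxes, image_width):
    """Horizontally flip boxes so they stay aligned with a horizontally flipped image."""
    return [(image_width - x2, y1, image_width - x1, y2) for (x1, y1, x2, y2) in boxes]


def combined_loss(supervised_loss, unsupervised_loss, lambda_u=1.0):
    """Total objective: labeled-data loss plus weighted loss on pseudo-labeled images."""
    return supervised_loss + lambda_u * unsupervised_loss


teacher_output = [
    Detection((0, 0, 10, 10), label=1, score=0.95),
    Detection((5, 5, 20, 20), label=2, score=0.50),
]
pseudo = filter_pseudo_labels(teacher_output)
print([d.label for d in pseudo])
print(hflip_boxes([d.box for d in pseudo], image_width=100))
```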

14 November 2019

Self-Supervised Learning For Few-Shot Image Classification

Da Chen, Yuefeng Chen, Yuhong Li, Feng Mao, Yuan He, Hui Xue

From arXiv. Submitted to ICASSP 2020. Implementation: //github.com/phecy/ssl-few-shot

Few-shot image classification aims to classify unseen classes with limited labeled samples. Recent works benefit from meta-learning with episodic tasks and can quickly adapt from training classes to testing classes. Because each task has only a few samples, the initial embedding network for meta-learning becomes an essential component and can largely affect performance in practice. To this end, many pretraining methods have been proposed, but most of them are trained in a supervised way and have limited transfer ability to unseen classes. In this paper, we propose to train a more generalized embedding network with self-supervised learning (SSL), which can provide slow and robust representations for downstream tasks by learning from the data itself. We evaluate our work through extensive comparisons with previous baseline methods on two few-shot classification datasets (i.e., MiniImageNet and CUB). Based on the evaluation results, the proposed method achieves significantly better performance, improving the 1-shot and 5-shot tasks by nearly \textbf{3\%} and \textbf{4\%} on MiniImageNet, and by nearly \textbf{9\%} and \textbf{3\%} on CUB. Moreover, by pretraining on more unlabeled data, the proposed method gains improvements of (\textbf{15\%}, \textbf{13\%}) on MiniImageNet and (\textbf{15\%}, \textbf{8\%}) on CUB. Our code will be available at //github.com/phecy/ssl-few-shot.
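
Once an embedding network is trained, an N-way K-shot episode is often solved with a nearest-prototype rule. The sketch below uses that prototypical-network-style classifier as an illustrative stand-in; it is not necessarily the meta-learner used in this paper, and the embeddings are assumed to be precomputed by the (SSL-pretrained) network.

```python
import numpy as np


def prototypes(support_emb, support_labels, n_way):
    """Mean embedding per class from the support set, shape (n_way, dim)."""
    return np.stack([support_emb[support_labels == c].mean(axis=0) for c in range(n_way)])


def classify(query_emb, protos):
    """Assign each query to the class whose prototype is nearest in squared Euclidean distance."""
    # Pairwise squared distances, shape (num_query, n_way).
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)


# Toy 2-way 1-shot episode with hypothetical 2-D embeddings.
support = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = np.array([0, 1])
queries = np.array([[1.0, 1.0], [9.0, 8.0]])
print(classify(queries, prototypes(support, labels, n_way=2)))
```

The quality of the embedding is what makes this simple rule work, which is why the abstract focuses on how the embedding network is pretrained.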

4 April 2019

Libra R-CNN: Towards Balanced Learning for Object Detection

Jiangmiao Pang, Kai Chen, Jianping Shi, Huajun Feng, Wanli Ouyang, Dahua Lin

From arXiv. To appear at CVPR 2019.

Compared with model architectures, the training process, which is also crucial to the success of detectors, has received relatively little attention in object detection. In this work, we carefully revisit the standard training practice of detectors and find that detection performance is often limited by imbalance during training, which generally occurs at three levels: sample level, feature level, and objective level. To mitigate the resulting adverse effects, we propose Libra R-CNN, a simple but effective framework towards balanced learning for object detection. It integrates three novel components: IoU-balanced sampling, balanced feature pyramid, and balanced L1 loss, which reduce imbalance at the sample, feature, and objective levels, respectively. Benefiting from the overall balanced design, Libra R-CNN significantly improves detection performance. Without bells and whistles, it achieves 2.5 and 2.0 points higher Average Precision (AP) than FPN Faster R-CNN and RetinaNet, respectively, on MS COCO.
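
The balanced L1 loss can be sketched as below, following the piecewise form as we understand it from the Libra R-CNN paper: L_b(x) = (α/b)(b|x| + 1) ln(b|x| + 1) - α|x| for |x| < 1 and γ|x| + C otherwise, with α ln(b + 1) = γ. The defaults α = 0.5 and γ = 1.5 are assumptions to check against the paper, and the function name is hypothetical. Inside the unit interval its gradient is α ln(b|x| + 1), which for small residuals is much larger than the smooth L1 gradient, so accurate (inlier) samples contribute more to training.

```python
import numpy as np


def balanced_l1(x, alpha=0.5, gamma=1.5):
    """Balanced L1 loss on regression residuals x, applied elementwise.

    b is fixed by the gradient-continuity condition alpha * ln(b + 1) = gamma,
    and the constant c makes the two pieces meet at |x| = 1.
    """
    b = np.exp(gamma / alpha) - 1.0
    ax = np.abs(np.asarray(x, dtype=float))
    inner = alpha / b * (b * ax + 1.0) * np.log(b * ax + 1.0) - alpha * ax
    # c = alpha/b * (b+1) * ln(b+1) - alpha - gamma, simplified using alpha*ln(b+1) = gamma.
    c = gamma / b - alpha
    outer = gamma * ax + c
    return np.where(ax < 1.0, inner, outer)


print(balanced_l1([0.0, 0.5, 1.0, 2.0]))
```

Both the loss and its gradient are continuous at |x| = 1, and beyond that point the gradient is the constant γ, as in plain L1.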

26 February 2019

Stereo R-CNN based 3D Object Detection for Autonomous Driving

Peiliang Li, Xiaozhi Chen, Shaojie Shen

From arXiv. Accepted by CVPR 2019.

We propose a 3D object detection method for autonomous driving that fully exploits the sparse and dense, semantic and geometric information in stereo imagery. Our method, called Stereo R-CNN, extends Faster R-CNN to stereo inputs to simultaneously detect and associate objects in the left and right images. We add extra branches after the stereo Region Proposal Network (RPN) to predict sparse keypoints, viewpoints, and object dimensions, which are combined with the 2D left-right boxes to calculate a coarse 3D object bounding box. We then recover the accurate 3D bounding box by region-based photometric alignment using the left and right RoIs. Although our method requires neither depth input nor 3D position supervision, it outperforms all existing fully supervised image-based methods. Experiments on the challenging KITTI dataset show that our method outperforms the state-of-the-art stereo-based method by around 30% AP on both the 3D detection and 3D localization tasks. Code will be made publicly available.
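
The core idea behind photometric alignment, choosing the depth whose implied disparity best matches left and right pixels, can be sketched in a heavily simplified form. This is not the paper's method: Stereo R-CNN aligns RoIs using the object's 3D box, whereas this sketch only searches integer disparities for one patch of a rectified stereo pair (both images assumed to have the same width) and converts the best disparity to depth with the pinhole relation z = fB/d. Function names, focal length and baseline are hypothetical.

```python
import numpy as np


def best_disparity(left_patch, right_image, x0, max_disp):
    """Find the integer disparity minimising the sum of squared differences (SSD).

    left_patch:  (h, w) patch cropped from the left image starting at column x0.
    right_image: (h, W) rows of the right image covering the same scanlines.
    In a rectified pair, the matching right-image patch starts at column x0 - d.
    """
    _, w = left_patch.shape
    best_d, best_cost = None, np.inf
    for d in range(max_disp + 1):
        xs = x0 - d
        if xs < 0:
            break
        cost = ((right_image[:, xs:xs + w] - left_patch) ** 2).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d, focal_px, baseline_m):
    """Pinhole stereo relation z = f * B / d (d in pixels, z in metres)."""
    return focal_px * baseline_m / d


# Synthetic rectified pair: the right image is the left image shifted by 5 pixels.
rng = np.random.default_rng(0)
left = rng.random((4, 40))
right = np.roll(left, -5, axis=1)
d = best_disparity(left[:, 20:28], right, x0=20, max_disp=10)
print(d, depth_from_disparity(d, focal_px=700.0, baseline_m=0.54))
```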

19 February 2019

Augmentation for small object detection

Mate Kisantal, Zbigniew Wojna, Jakub Murawski, Jacek Naruniec, Kyunghyun Cho

In recent years, object detection has made impressive progress. Despite these improvements, there is still a significant performance gap between the detection of small and large objects. We analyze the current state-of-the-art model, Mask-RCNN, on a challenging dataset, MS COCO. We show that the overlap between small ground-truth objects and the predicted anchors is much lower than the expected IoU threshold. We conjecture that this is due to two factors: (1) only a few images contain small objects, and (2) small objects do not appear often enough even within the images that contain them. We thus propose to oversample images with small objects and to augment each of them by copy-pasting small objects many times. This allows us to trade off the detector's quality on large objects against its quality on small objects. We evaluate different pasting augmentation strategies and ultimately achieve a 9.7\% relative improvement on instance segmentation and 7.1\% on object detection of small objects, compared with the current state-of-the-art method on MS COCO.

9 July 2018

Pooling Pyramid Network for Object Detection

Pengchong Jin, Vivek Rathod, Xiangxin Zhu

We'd like to share a simple tweak to the Single Shot MultiBox Detector (SSD) family of detectors, which is effective in reducing model size while maintaining the same quality. We share box predictors across all scales and replace the convolutions between scales with max pooling. This has two advantages over vanilla SSD: (1) it avoids score miscalibration across scales; (2) the shared predictor sees the training data over all scales. Since we reduce the number of predictors to one and trim all convolutions between them, the model size is significantly smaller. We empirically show that these changes do not hurt model quality compared with vanilla SSD.
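
The two changes can be illustrated with NumPy: coarser pyramid levels are produced by max pooling alone, and one shared predictor is applied to every level. This is a minimal sketch; the 1x1 linear `weight`/`bias` predictor is a hypothetical stand-in for the learned box predictor, and the function names are not from the paper.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W, C) feature map (H and W even)."""
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).max(axis=(1, 3))


def pooling_pyramid(feature, num_levels):
    """Build coarser levels by max pooling only, with no convolutions between scales."""
    levels = [feature]
    for _ in range(num_levels - 1):
        levels.append(max_pool_2x2(levels[-1]))
    return levels


def shared_predictor(levels, weight, bias):
    """Apply ONE 1x1 predictor (the same weight and bias) to every pyramid level."""
    return [lvl @ weight + bias for lvl in levels]


feature = np.arange(16, dtype=float).reshape(4, 4, 1)
levels = pooling_pyramid(feature, num_levels=3)
outputs = shared_predictor(levels, weight=np.ones((1, 6)), bias=np.zeros(6))
print([o.shape for o in outputs])
```

Because the predictor's parameters are shared, its size does not grow with the number of pyramid levels.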

17 April 2018

DetNet: A Backbone network for Object Detection

Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun

Recent CNN-based object detectors, whether one-stage methods such as YOLO, SSD, and RetinaNet or two-stage detectors such as Faster R-CNN, R-FCN, and FPN, usually fine-tune directly from ImageNet pre-trained models designed for image classification. There has been little work on backbone feature extractors specifically designed for object detection. More importantly, there are several differences between image classification and object detection. 1. Recent object detectors such as FPN and RetinaNet usually involve extra stages compared with image classification networks, to handle objects at various scales. 2. Object detection needs not only to recognize the category of each object instance but also to locate it spatially. A large downsampling factor brings a large valid receptive field, which is good for image classification but compromises the ability to localize objects. Because of this gap between image classification and object detection, we propose DetNet, a novel backbone network specifically designed for object detection. DetNet includes the extra stages that traditional image-classification backbones lack, while maintaining high spatial resolution in deeper layers. Without any bells and whistles, it obtains state-of-the-art results for both object detection and instance segmentation on the MS COCO benchmark with a DetNet (4.8G FLOPs) backbone. The code will be released for reproduction.
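
The trade-off between downsampling and receptive field can be checked with the standard receptive-field recurrence (rf += (k - 1) · d · jump, then jump *= s). As we understand it, DetNet keeps resolution in its deeper stages with dilated convolutions instead of further striding; the abstract itself only says it maintains high spatial resolution, so treat that as an assumption. The two 7-layer stacks of 3x3 layers below are hypothetical toy configurations, not DetNet's architecture.

```python
def receptive_field(layers):
    """Return (receptive_field, output_stride) for a stack of conv/pool layers.

    layers: list of (kernel_size, stride, dilation) tuples, applied in order.
    """
    rf, jump = 1, 1  # jump = input-pixel distance between adjacent output positions
    for k, s, d in layers:
        rf += (k - 1) * d * jump
        jump *= s
    return rf, jump


# Keep downsampling: five stride-2 layers, then two stride-1 layers.
downsampled = [(3, 2, 1)] * 5 + [(3, 1, 1)] * 2
# Stop downsampling after four stride-2 layers and dilate the rest instead.
dilated = [(3, 2, 1)] * 4 + [(3, 1, 2)] * 3
print(receptive_field(downsampled))
print(receptive_field(dilated))
```

Both stacks have the same depth, but the dilated one ends at output stride 16 instead of 32 while its receptive field (223 pixels versus 191) is not smaller.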

19 March 2018

Zero-Shot Detection

Pengkai Zhu, Hanxiao Wang, Tolga Bolukbasi, Venkatesh Saligrama

From arXiv. 16 pages, 5 figures.

As we move towards large-scale object detection, it is unrealistic to expect annotated training data for all object classes at sufficient scale, so methods capable of detecting unseen objects are required. We propose a novel zero-shot method based on training an end-to-end model that fuses semantic attribute prediction with visual features to propose object bounding boxes for seen and unseen classes. While we use semantic features during training, our method is agnostic to semantic information for unseen classes at test time. Our method retains the efficiency and effectiveness of YOLO for objects seen during training, while improving its performance for novel and unseen objects. The ability of state-of-the-art detection methods to learn discriminative object features that reject background proposals also limits their performance on unseen objects. We posit that, to detect unseen objects, we must incorporate semantic information into the visual domain so that the learned visual features reflect this information and lead to improved recall for unseen objects. We test our method on the PASCAL VOC and MS COCO datasets and observe significant improvements in the average precision of unseen classes.
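
A common way to reach unseen classes is to map a visual feature into a semantic attribute space and compare it with each class's attribute vector. The sketch below shows that idea in its simplest form, a linear projection followed by cosine similarity; it is an illustrative assumption, not the paper's architecture, which fuses attribute prediction into a YOLO-style detector. The projection `proj` would be learned on seen classes only.

```python
import numpy as np


def zero_shot_scores(visual_feat, proj, class_attrs):
    """Score classes by cosine similarity in a semantic attribute space.

    visual_feat: (D,) visual feature of a proposed box.
    proj:        (D, A) projection into the A-dimensional attribute space (assumed learned).
    class_attrs: (K, A) attribute vectors, one per class, seen or unseen.
    """
    a = visual_feat @ proj
    a = a / np.linalg.norm(a)
    c = class_attrs / np.linalg.norm(class_attrs, axis=1, keepdims=True)
    return c @ a  # (K,) cosine similarities in [-1, 1]


# Hypothetical setup: identity projection, three classes, the last one unseen in training.
proj = np.eye(3)
class_attrs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
scores = zero_shot_scores(np.array([0.1, 0.2, 0.9]), proj, class_attrs)
print(int(np.argmax(scores)))
```

Because unseen classes are represented only by their attribute vectors, adding a new class requires no new visual training data.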

12 March 2018

Improving Object Localization with Fitness NMS and Bounded IoU Loss

Lachlan Tychsen-Smith, Lars Petersson

From arXiv. CVPR 2018 Main Conference (Poster).

We demonstrate that many detection methods are designed to identify only a sufficiently accurate bounding box, rather than the best available one. To address this issue, we propose a simple and fast modification to existing methods called Fitness NMS. This method is tested with the DeNet model and obtains a significantly improved mAP at higher localization accuracies without a loss in evaluation rate, and it can be used in conjunction with Soft NMS for additional improvements. Next, we derive a novel bounding box regression loss based on a set of IoU upper bounds that better matches the goal of IoU maximization while still providing good convergence properties. Building on these contributions, we investigate RoI clustering schemes for improving evaluation rates for the DeNet wide model variants and provide an analysis of localization performance at various input image dimensions. We obtain an mAP of 33.6% at 79 Hz and 41.8% at 5 Hz on MS COCO with a Titan X (Maxwell). Source code is available from: //github.com/lachlants/denet
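
For reference, the baseline that Fitness NMS modifies is plain greedy non-maximum suppression over IoU. The sketch below shows only that standard procedure, not Fitness NMS or the Bounded IoU loss; the paper's scoring changes are not reproduced here, and the 0.5 threshold is an assumed default.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Standard greedy NMS: keep the highest-scoring box, drop its overlaps, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Discard remaining boxes that overlap the kept box too much.
        order = rest[iou(boxes[i], boxes[rest]) <= iou_threshold]
    return keep


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(greedy_nms(boxes, scores))
```

Note that this baseline keeps whichever overlapping box has the highest classification score, regardless of how well it is localised, which is the behaviour the abstract identifies as a problem.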

21 February 2018

Self Paced Deep Learning for Weakly Supervised Object Detection

Enver Sangineto, Moin Nabi, Dubravko Culibrk, Nicu Sebe

From arXiv. To appear in IEEE Transactions on PAMI.

In a weakly supervised scenario, object detectors need to be trained using image-level annotation alone. Since bounding-box-level ground truth is not available, most of the solutions proposed so far are based on an iterative Multiple Instance Learning framework in which the current classifier is used to select the highest-confidence boxes in each image, which are treated as pseudo-ground truth in the next training iteration. However, the errors of an immature classifier can make the process drift, usually introducing many false positives into the training dataset. To alleviate this problem, we propose a training protocol based on the self-paced learning paradigm. The main idea is to iteratively select a subset of the most reliable images and boxes and use them for training. While similar strategies have been adopted for SVMs and other classifiers in the past few years, we are the first to show that a self-paced approach can be used with deep-network-based classifiers in an end-to-end training pipeline. Our method is built on the fully supervised Fast-RCNN architecture and can be applied to similar architectures that represent the input image as a bag of boxes. We show state-of-the-art results on Pascal VOC 2007, Pascal VOC 2010 and ILSVRC 2013. On ILSVRC 2013, our results based on a low-capacity AlexNet network outperform even those weakly supervised approaches that are based on much higher-capacity networks.
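
The self-paced idea, training first on the most reliable images and gradually admitting harder ones, can be sketched as a selection step plus a growing schedule. This is a minimal sketch assuming one pseudo box per image and a linear schedule; the paper's actual selection criteria differ in detail, and the function names and the `start` fraction are hypothetical. In a full pipeline, the pseudo boxes would be recomputed with the retrained detector before each round.

```python
def select_reliable(pseudo_boxes, fraction):
    """Keep the most confident fraction of images, one pseudo box per image.

    pseudo_boxes: dict image_id -> (box, score) from the current detector,
    where box is the highest-scoring box for the image's labelled class.
    """
    ranked = sorted(pseudo_boxes.items(), key=lambda kv: kv[1][1], reverse=True)
    k = max(1, round(fraction * len(ranked)))
    return dict(ranked[:k])


def curriculum(num_rounds, start=0.3):
    """Fraction of images used in each round, growing linearly to 1.0."""
    if num_rounds == 1:
        return [1.0]
    return [start + (1.0 - start) * r / (num_rounds - 1) for r in range(num_rounds)]


pseudo = {
    "img_a": ((10, 10, 50, 50), 0.9),
    "img_b": ((0, 0, 20, 20), 0.2),
    "img_c": ((5, 5, 40, 40), 0.6),
}
for frac in curriculum(3, start=0.4):
    subset = select_reliable(pseudo, frac)
    # A hypothetical train_round(subset) would retrain the detector here.
    print(round(frac, 2), sorted(subset))
```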