
Localization losses that regress the four box variables independently, such as the Smooth-$\ell_1$ loss, are used by default in modern detectors. Nevertheless, this kind of loss is oversimplified and inconsistent with the final evaluation metric, intersection over union (IoU). Directly employing the standard IoU as a loss is not feasible either, since the constant-zero plateau in the case of non-overlapping boxes and the non-zero gradient at the minimum can make it untrainable. Accordingly, we propose a systematic method to address these problems. Firstly, we propose a new metric, the extended IoU (EIoU), which is well-defined when two boxes do not overlap and reduces to the standard IoU when they do. Secondly, we present a convexification technique (CT) to construct a loss on the basis of EIoU, which guarantees that the gradient at the minimum is zero. Thirdly, we propose a steady optimization technique (SOT) to make the fractional EIoU loss approach the minimum more steadily and smoothly. Fourthly, to fully exploit the capability of the EIoU-based loss, we introduce an interrelated IoU-predicting head to further boost localization accuracy. With the proposed contributions, the new method incorporated into Faster R-CNN with a ResNet50+FPN backbone yields a \textbf{4.2 mAP} gain on VOC2007 and a \textbf{2.3 mAP} gain on COCO2017 over the baseline Smooth-$\ell_1$ loss, at almost \textbf{no additional training or inference cost}. Notably, the stricter the metric, the larger the gain: \textbf{8.2 mAP} on VOC2007 and \textbf{5.4 mAP} on COCO2017 at metric $AP_{90}$.
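To make the plateau problem concrete, here is a minimal PyTorch sketch (our illustration, not the paper's code) of the plain IoU loss; the disjoint-box demo shows the all-zero gradient that motivates EIoU. The paper's actual EIoU, CT, and SOT constructions cannot be reproduced from this abstract and are not attempted here.

```python
import torch

def iou(boxes1, boxes2):
    """Standard IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    lt = torch.max(boxes1[..., :2], boxes2[..., :2])   # intersection top-left
    rb = torch.min(boxes1[..., 2:], boxes2[..., 2:])   # intersection bottom-right
    wh = (rb - lt).clamp(min=0)                        # zero extent when disjoint
    inter = wh[..., 0] * wh[..., 1]
    area1 = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    area2 = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
    return inter / (area1 + area2 - inter)

def iou_loss(pred, target):
    # Constant (value 1) whenever the boxes are disjoint: the plateau.
    return 1.0 - iou(pred, target)

# Disjoint boxes: the loss is exactly 1 and the gradient w.r.t. pred is zero,
# so the box regressor receives no learning signal at all.
pred = torch.tensor([[0., 0., 1., 1.]], requires_grad=True)
target = torch.tensor([[2., 2., 3., 3.]])
iou_loss(pred, target).sum().backward()
print(pred.grad)   # tensor([[0., 0., 0., 0.]])
```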

Related content

Functional quantile regression (FQR) is a useful alternative to mean regression for functional data as it provides a comprehensive understanding of how scalar predictors influence the conditional distribution of functional responses. In this article, we study the FQR model for densely sampled, high-dimensional functional data without relying on parametric error or independent stochastic process assumptions, with the focus being on statistical inference under this challenging regime along with scalable implementation. This is achieved by a simple but powerful distributed strategy, in which we first perform separate quantile regression to compute $M$-estimators at each sampling location, and then carry out estimation and inference for the entire coefficient functions by properly exploiting the uncertainty quantification and dependence structures of $M$-estimators. We derive a uniform Bahadur representation and a strong Gaussian approximation result for the $M$-estimators on the discrete sampling grid, leading to dimension reduction and serving as the basis for inference. An interpolation-based estimator with minimax optimality and a Bayesian alternative to improve upon finite sample performance are discussed. Large sample properties for point and simultaneous interval estimators are established. The obtained minimax optimal rate under the FQR model shows an interesting phase transition phenomenon that has been previously observed in functional mean regression. The proposed methods are illustrated via simulations and an application to a mass spectrometry proteomics dataset.
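As a rough illustration of the first, embarrassingly parallel stage of this distributed strategy, the sketch below fits a separate quantile regression at every sampling location using statsmodels; the second stage (inference via the Bahadur representation and Gaussian approximation, interpolation, and simultaneous bands) is not attempted here, and all names are ours.

```python
import numpy as np
import statsmodels.api as sm

def pointwise_quantile_fits(Y, X, tau=0.5):
    """Stage one of the distributed strategy: independent quantile
    regressions on the discrete sampling grid.

    Y : (n, T) functional responses observed on a common grid of T locations
    X : (n, p) scalar predictors
    Returns a (T, p+1) array of M-estimators (intercept + slopes) per location."""
    Xd = sm.add_constant(np.asarray(X))            # design matrix with intercept
    betas = np.empty((Y.shape[1], Xd.shape[1]))
    for t in range(Y.shape[1]):
        betas[t] = sm.QuantReg(Y[:, t], Xd).fit(q=tau).params
    return betas
```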

The CUR decomposition is a technique for low-rank approximation that selects small subsets of the columns and rows of a given matrix to use as bases for its column and row spaces. It has recently attracted much interest, as it has several advantages over traditional low-rank decompositions based on orthonormal bases. These include the preservation of properties such as sparsity or non-negativity, the ability to interpret data, and reduced storage requirements. The problem of finding the skeleton sets that minimize the norm of the residual error is known to be NP-hard, but classical pivoting schemes such as column-pivoted QR tend to work well in practice. When combined with randomized dimension reduction techniques, classical pivoting-based methods become particularly effective, and have proven capable of very rapidly computing approximate CUR decompositions of large, potentially sparse, matrices. Another class of popular algorithms for computing CUR decompositions is based on drawing the columns and rows randomly from the full index sets, using specialized probability distributions based on leverage scores. Such sampling-based techniques are particularly appealing for very large scale problems, and are well supported by theoretical performance guarantees. This manuscript provides a comparative study of the various randomized algorithms for computing CUR decompositions that have recently been proposed. Additionally, it proposes some modifications and simplifications to the existing algorithms that lead to faster execution times.
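The sketch below conveys the flavor of the randomized pivoting-based approach described above: Gaussian sketching for dimension reduction, column-pivoted QR on the sketch to select skeleton indices, and a least-squares middle factor. It is an illustrative composite, not the exact formulation of any one surveyed algorithm.

```python
import numpy as np
from scipy.linalg import qr, pinv

def randomized_cur(A, k, oversample=10, seed=None):
    """Illustrative randomized CUR: pick k columns and k rows of A via
    column-pivoted QR applied to small Gaussian sketches of A and A^T."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = k + oversample

    # Column skeleton: CPQR on the (l x n) sketch G @ A pivots columns of A.
    _, _, piv_cols = qr(rng.standard_normal((l, m)) @ A,
                        mode='economic', pivoting=True)
    # Row skeleton: the same idea applied to A^T picks rows of A.
    _, _, piv_rows = qr(rng.standard_normal((l, n)) @ A.T,
                        mode='economic', pivoting=True)

    C, R = A[:, piv_cols[:k]], A[piv_rows[:k], :]
    U = pinv(C) @ A @ pinv(R)   # middle factor minimizing ||A - CUR||_F
    return C, U, R
```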

Image captioning has attracted considerable attention in recent years. However, little work has been done on game image captioning, which has some unique characteristics and requirements. In this work we propose a novel game image captioning model which integrates bottom-up attention with a new multi-level residual top-down attention mechanism. Firstly, a lower-level residual top-down attention network is added to the Faster R-CNN based bottom-up attention network to address the problem that the latter may lose important spatial information when extracting regional features. Secondly, an upper-level residual top-down attention network is implemented in the caption generation network to better fuse the extracted regional features for subsequent caption prediction. We create two game datasets to evaluate the proposed model. Extensive experiments show that our proposed model outperforms existing baseline models.
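Since the abstract does not give the exact architecture, the PyTorch block below is only a generic sketch of what one level of residual top-down attention over regional features could look like: a top-down query (e.g. the caption decoder state) reweights region features, and a residual path keeps unattended information flowing. All layer choices and names are hypothetical.

```python
import torch
import torch.nn as nn

class ResidualTopDownAttention(nn.Module):
    """Hypothetical single-level residual top-down attention block."""
    def __init__(self, feat_dim, query_dim, hidden=512):
        super().__init__()
        self.proj_v = nn.Linear(feat_dim, hidden)   # project region features
        self.proj_q = nn.Linear(query_dim, hidden)  # project top-down query
        self.score = nn.Linear(hidden, 1)

    def forward(self, regions, query):
        # regions: (B, R, feat_dim) bottom-up features; query: (B, query_dim)
        h = torch.tanh(self.proj_v(regions) + self.proj_q(query).unsqueeze(1))
        alpha = torch.softmax(self.score(h), dim=1)     # attention over regions
        attended = (alpha * regions).sum(dim=1)
        return attended + regions.mean(dim=1)           # residual connection
```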

Retrieving object instances among cluttered scenes efficiently requires compact yet comprehensive regional image representations. Intuitively, object semantics can help build the index that focuses on the most relevant regions. However, due to the lack of bounding-box datasets for objects of interest among retrieval benchmarks, most recent work on regional representations has focused on either uniform or class-agnostic region selection. In this paper, we first fill the void by providing a new dataset of landmark bounding boxes, based on the Google Landmarks dataset, that includes $86k$ images with manually curated boxes from $15k$ unique landmarks. Then, we demonstrate how a landmark detector trained on our new dataset can be leveraged to index image regions and improve retrieval accuracy while being much more efficient than existing regional methods. In addition, we introduce a novel regional aggregated selective match kernel (R-ASMK) to effectively combine information from detected regions into an improved holistic image representation. R-ASMK boosts image retrieval accuracy substantially with no dimensionality increase, while even outperforming systems that index image regions independently. Our complete image retrieval system improves upon the previous state-of-the-art by significant margins on the Revisited Oxford and Paris datasets. Code and data are available at the project webpage: https://github.com/tensorflow/models/tree/master/research/delf.
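For readers unfamiliar with selective match kernels, the sketch below captures the basic ASMK mechanism that R-ASMK builds on: per-visual-word aggregation of descriptors followed by a selectivity function that suppresses weak matches. The region-level aggregation that distinguishes R-ASMK, and all binarization details, are omitted; parameter names are ours.

```python
import numpy as np

def asmk_similarity(desc_x, words_x, desc_y, words_y, alpha=3.0, tau=0.0):
    """Simplified aggregated selective match kernel between two images:
    descriptors assigned to the same visual word are summed and L2
    normalized, and per-word dot products pass through sign(u)|u|^alpha."""
    def aggregate(descriptors, words):
        groups = {}
        for d, w in zip(descriptors, words):
            groups.setdefault(w, []).append(d)
        out = {}
        for w, ds in groups.items():
            v = np.sum(ds, axis=0)
            out[w] = v / np.linalg.norm(v)   # one normalized residual per word
        return out

    vx, vy = aggregate(desc_x, words_x), aggregate(desc_y, words_y)
    sim = 0.0
    for w in vx.keys() & vy.keys():          # only co-occurring visual words
        u = float(vx[w] @ vy[w])
        if u > tau:                          # selectivity: drop weak matches
            sim += np.sign(u) * abs(u) ** alpha
    return sim
```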

Intersection over Union (IoU) is the most popular evaluation metric used in object detection benchmarks. However, there is a gap between optimizing the commonly used distance losses for regressing the parameters of a bounding box and maximizing this metric value. The optimal objective for a metric is the metric itself. In the case of axis-aligned 2D bounding boxes, it can be shown that $IoU$ can be directly used as a regression loss. However, $IoU$ has a plateau making it infeasible to optimize in the case of non-overlapping bounding boxes. In this paper, we address the weaknesses of $IoU$ by introducing a generalized version as both a new loss and a new metric. By incorporating this generalized $IoU$ ($GIoU$) as a loss into state-of-the-art object detection frameworks, we show a consistent improvement in their performance using both the standard, $IoU$ based, and new, $GIoU$ based, performance measures on popular object detection benchmarks such as PASCAL VOC and MS COCO.
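The $GIoU$ definition itself is short enough to state directly: $GIoU = IoU - |C \setminus (A \cup B)| / |C|$, where $C$ is the smallest box enclosing both $A$ and $B$. A small PyTorch rendering of that formula (our code, not the authors'):

```python
import torch

def giou(pred, target):
    """GIoU for axis-aligned boxes (x1, y1, x2, y2): IoU minus the fraction
    of the smallest enclosing box C not covered by the union. Unlike IoU,
    it remains informative (and yields gradients) for disjoint boxes."""
    lt = torch.max(pred[..., :2], target[..., :2])
    rb = torch.min(pred[..., 2:], target[..., 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    union = area_p + area_t - inter

    # Smallest enclosing box C.
    wh_c = (torch.max(pred[..., 2:], target[..., 2:])
            - torch.min(pred[..., :2], target[..., :2]))
    area_c = wh_c[..., 0] * wh_c[..., 1]
    return inter / union - (area_c - union) / area_c

def giou_loss(pred, target):
    return 1.0 - giou(pred, target)   # ranges over [0, 2)
```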

The superior performance of Deformable Convolutional Networks arises from its ability to adapt to the geometric variations of objects. Through an examination of its adaptive behavior, we observe that while the spatial support for its neural features conforms more closely than regular ConvNets to object structure, this support may nevertheless extend well beyond the region of interest, causing features to be influenced by irrelevant image content. To address this problem, we present a reformulation of Deformable ConvNets that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training. The modeling power is enhanced through a more comprehensive integration of deformable convolution within the network, and by introducing a modulation mechanism that expands the scope of deformation modeling. To effectively harness this enriched modeling capability, we guide network training via a proposed feature mimicking scheme that helps the network to learn features that reflect the object focus and classification power of R-CNN features. With the proposed contributions, this new version of Deformable ConvNets yields significant performance gains over the original model and produces leading results on the COCO benchmark for object detection and instance segmentation.
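The modulation mechanism can be tried directly with torchvision's deformable convolution operator, which accepts per-location offsets and an optional modulation mask; the sketch below (shapes and layer choices are ours) shows one modulated layer in isolation, not the paper's full network integration.

```python
import torch
from torchvision.ops import deform_conv2d

# One modulated deformable 3x3 convolution: each sampling location gets a
# learned 2D offset plus a sigmoid modulation scalar in [0, 1] that can
# suppress contributions from irrelevant image content.
x = torch.randn(1, 16, 32, 32)
weight = torch.randn(32, 16, 3, 3)                            # out, in, kh, kw

offset_conv = torch.nn.Conv2d(16, 2 * 3 * 3, 3, padding=1)    # 2 offsets per tap
mask_conv = torch.nn.Conv2d(16, 3 * 3, 3, padding=1)          # 1 scalar per tap

offset = offset_conv(x)                    # (1, 18, 32, 32)
mask = torch.sigmoid(mask_conv(x))         # (1, 9, 32, 32)
out = deform_conv2d(x, offset, weight, padding=1, mask=mask)
print(out.shape)                           # torch.Size([1, 32, 32, 32])
```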

Non-maximum suppression (NMS) is essential for state-of-the-art object detectors to localize objects from a set of candidate locations. However, an accurate candidate location is sometimes not associated with a high classification score, which leads to object localization failure during NMS. In this paper, we introduce a novel bounding box regression loss for learning bounding box transformation and localization variance together. The resulting localization variance exhibits a strong connection to localization accuracy, which is then utilized in our new non-maximum suppression method to improve localization accuracy for object detection. On MS-COCO, we boost the AP of VGG-16 Faster R-CNN from 23.6% to 29.1% with a single model and nearly no additional computational overhead. More importantly, our method improves the AP of ResNet-50 FPN Fast R-CNN from 36.8% to 37.8%, achieving state-of-the-art bounding box refinement results.
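The abstract does not spell out the loss, but the generic recipe for learning a transformation and its variance jointly is a heteroscedastic negative log-likelihood; the sketch below shows that general idea only (the paper's exact KL-based formulation and its variance-aware NMS may differ).

```python
import torch

def nll_box_loss(pred_box, pred_log_var, target_box):
    """Generic heteroscedastic regression loss per box coordinate:
    residuals are down-weighted where the predicted variance is large,
    while the log-variance term keeps the network from inflating it.
    The learned variance can then inform NMS, as the paper proposes."""
    precision = torch.exp(-pred_log_var)
    sq_err = (pred_box - target_box) ** 2
    return (0.5 * precision * sq_err + 0.5 * pred_log_var).sum(dim=-1)
```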

Average precision (AP), the area under the recall-precision (RP) curve, is the standard performance measure for object detection. Despite its wide acceptance, it has a number of shortcomings, the most important of which are (i) the inability to distinguish very different RP curves, and (ii) the lack of directly measuring bounding box localization accuracy. In this paper, we propose 'Localization Recall Precision (LRP) Error', a new metric which we specifically designed for object detection. LRP Error is composed of three components related to localization, false negative (FN) rate and false positive (FP) rate. Based on LRP, we introduce the 'Optimal LRP', the minimum achievable LRP error representing the best achievable configuration of the detector in terms of recall-precision and the tightness of the boxes. In contrast to AP, which considers precisions over the entire recall domain, Optimal LRP determines the 'best' confidence score threshold for a class, which balances the trade-off between localization and recall-precision. In our experiments, we show that, for state-of-the-art (SOTA) object detectors, Optimal LRP provides richer and more discriminative information than AP. We also demonstrate that the best confidence score thresholds vary significantly among classes and detectors. Moreover, we present LRP results of a simple online video object detector that uses a SOTA still-image object detector, and show that class-specific optimized thresholds improve accuracy compared with the common approach of using a single general threshold for all classes. We provide source code that computes LRP for the PASCAL VOC and MSCOCO datasets at https://github.com/cancam/LRP. Our source code can easily be adapted to other datasets as well.
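In its commonly used normalized form, the LRP error combines a localization penalty over true positives with the FP and FN counts; the snippet below (ours; see the repository above for the authoritative component weights) computes that form.

```python
import numpy as np

def lrp_error(tp_ious, n_fp, n_fn, tau=0.5):
    """LRP = [ sum_{i in TP} (1 - IoU_i) / (1 - tau) + N_FP + N_FN ]
             / (N_TP + N_FP + N_FN),  with tau the TP matching threshold.
    Lower is better; 0 means perfectly tight boxes with no FPs or FNs."""
    tp_ious = np.asarray(tp_ious, dtype=float)
    total = len(tp_ious) + n_fp + n_fn
    if total == 0:
        return 0.0
    loc = np.sum((1.0 - tp_ious) / (1.0 - tau))
    return (loc + n_fp + n_fn) / total
```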

In this paper, a novel image-moments-based model for shape estimation and tracking of an object moving along a complex trajectory is presented. The camera is assumed to be stationary and looking at a moving object. Point features inside the object are sampled as measurements. An ellipse is assumed as the primitive shape approximating the object, and its shape is estimated using a combination of image moments. A dynamic model of the image moments is derived for an object moving under the constant-velocity or coordinated-turn motion model, and serves as the basis for shape estimation. An Unscented Kalman Filter-Interacting Multiple Model (UKF-IMM) filter algorithm is applied to estimate the shape of the object (approximated as an ellipse) and to track its position and velocity. A likelihood function based on the average log-likelihood is derived for the IMM filter. Simulation results of the proposed UKF-IMM algorithm with the image-moments-based models are presented, showing the shape estimates of objects moving along complex trajectories. Comparison results with a benchmark algorithm from the literature are presented, using intersection over union (IOU) and position and velocity root mean square errors (RMSE) as metrics. Results on real image data captured from a quadcopter are also presented.
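The moments-to-ellipse step is standard and worth making concrete: the sampled points' mean gives the center, and the eigen-decomposition of their second-order central moments gives the orientation and semi-axes. A NumPy sketch (ours, separate from the paper's dynamic model and filter):

```python
import numpy as np

def ellipse_from_points(points, sigma=2.0):
    """Fit the primitive ellipse from sampled point features via moments:
    center from first-order moments, orientation and semi-axes from the
    eigen-decomposition of the second-order central moment matrix."""
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    cov = np.cov(pts.T)                               # 2x2 central moments
    evals, evecs = np.linalg.eigh(cov)                # ascending eigenvalues
    semi_axes = sigma * np.sqrt(np.maximum(evals, 0.0))
    angle = np.arctan2(evecs[1, -1], evecs[0, -1])    # major-axis orientation
    return center, semi_axes, angle
```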

We demonstrate that many detection methods are designed to identify only a sufficiently accurate bounding box, rather than the best available one. To address this issue we propose a simple and fast modification to the existing methods called Fitness NMS. This method is tested with the DeNet model and obtains a significantly improved MAP at greater localization accuracies without a loss in evaluation rate. Next, we derive a novel bounding box regression loss based on a set of IoU upper bounds that better matches the goal of IoU maximization while still providing good convergence properties. Following these novelties, we investigate RoI clustering schemes for improving evaluation rates for the DeNet \textit{wide} model variants and provide an analysis of localization performance at various input image dimensions. We obtain a MAP[0.5:0.95] of 33.6\%@79Hz and 41.8\%@5Hz for MSCOCO and a Titan X (Maxwell).
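The core Fitness NMS idea, ranking candidates by a localization-aware score rather than the classification score alone, can be imitated in a few lines; the sketch below folds a predicted-IoU estimate into standard NMS and is only a stand-in for the paper's derived fitness term.

```python
import torch
from torchvision.ops import nms

def fitness_nms(boxes, cls_scores, pred_iou, iou_thresh=0.5):
    """Rank boxes in NMS by classification score times an estimated IoU
    with the ground truth, so well-localized boxes win ties over boxes
    that are merely confidently classified."""
    fitness = cls_scores * pred_iou
    keep = nms(boxes, fitness, iou_thresh)
    return boxes[keep], fitness[keep]
```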
