
With the rapid development of deep learning, object detection in aerial remote sensing images has become increasingly popular in recent years. Most current anchor-free detectors based on keypoint detection sample regression and classification features directly and design the object loss function around the horizontal bounding box, which makes them less suited to complex and diverse aerial remote sensing objects. In this paper, we propose an anchor-free aerial remote sensing object detector (BWP-Det) to detect rotated and multi-scale objects. Specifically, we design an interactive double-branch (IDB) up-sampling network, in which one branch performs gradual up-sampling for heatmap prediction while the other regresses the bounding-box parameters. We further introduce a weighted multi-scale convolution (WmConv) to highlight the difference between foreground and background, and we extract pixel-level attention features from the intermediate layers to guide both branches toward informative object regions during up-sampling. Finally, following the idea behind the horizontal IoU computation, we design a rotated IoU defined on a partitioned polar-coordinate plane, termed JIoU, which is computed as the intersection ratio after discretizing the ellipses inscribed in the rotated bounding boxes; this resolves the coupling between angle and side length in the regression of rotated bounding boxes. Our experiments on the DOTA, UCAS-AOD, and NWPU VHR-10 datasets show that BWP-Det achieves competitive performance with a simpler model and fewer regression parameters.
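
The abstract does not include reference code for JIoU, but the stated idea, taking the intersection ratio of the ellipses inscribed in two rotated boxes after discretizing the plane, can be sketched as follows. This is a minimal illustration under assumed conventions (boxes given as (cx, cy, w, h, theta) with theta in radians; a uniform Cartesian sampling grid standing in for the polar partition described in the paper); the function names and grid resolution are ours, not the authors'.

```python
import numpy as np

def in_inscribed_ellipse(pts, box):
    """Test whether points fall inside the ellipse inscribed in a rotated box.

    box: (cx, cy, w, h, theta) with theta in radians; pts: (N, 2) xy coordinates.
    """
    cx, cy, w, h, theta = box
    # Rotate points into the box's local frame.
    c, s = np.cos(-theta), np.sin(-theta)
    x = c * (pts[:, 0] - cx) - s * (pts[:, 1] - cy)
    y = s * (pts[:, 0] - cx) + c * (pts[:, 1] - cy)
    # Ellipse equation with semi-axes w/2 and h/2.
    return (x / (w / 2)) ** 2 + (y / (h / 2)) ** 2 <= 1.0

def jiou(box_a, box_b, grid=128):
    """Approximate rotated-box overlap as the IoU of the boxes' inscribed
    ellipses, estimated by discretizing the plane into sample points."""
    cxa, cya, wa, ha, _ = box_a
    cxb, cyb, wb, hb, _ = box_b
    # Axis-aligned region that certainly contains both boxes.
    ra, rb = 0.5 * np.hypot(wa, ha), 0.5 * np.hypot(wb, hb)
    x0, x1 = min(cxa - ra, cxb - rb), max(cxa + ra, cxb + rb)
    y0, y1 = min(cya - ra, cyb - rb), max(cya + ra, cyb + rb)
    xs, ys = np.meshgrid(np.linspace(x0, x1, grid), np.linspace(y0, y1, grid))
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1)
    a = in_inscribed_ellipse(pts, box_a)
    b = in_inscribed_ellipse(pts, box_b)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / max(union, 1)
```

The grid resolution controls the accuracy/cost trade-off of the approximation; a coarser grid is faster but noisier.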

Related Content

Popular social media platforms employ neural network based image moderation engines to classify images uploaded on them as having potentially objectionable content. Such moderation engines must answer a large number of queries at heavy computational cost, even though the actual number of images with objectionable content is usually a tiny fraction. Inspired by recent work on Neural Group Testing, we propose an approach which exploits this fact to reduce the overall computational cost of such engines using the technique of Compressed Sensing (CS). We present the quantitative matrix-pooled neural network (QMPNN), which takes as input $n$ images and an $m \times n$ binary pooling matrix with $m < n$, whose rows indicate $m$ pools of images, i.e., selections of $r$ images out of $n$. The QMPNN efficiently outputs the product of this matrix with the unknown sparse binary vector indicating whether each image is objectionable or not, i.e., it outputs the number of objectionable images in each pool. For suitable matrices, this output is decoded using CS decoding algorithms to predict which images were objectionable. The computational cost of running the QMPNN and the CS algorithms is significantly lower than the cost of running a neural network with the same number of parameters separately on each image to classify the images, which we demonstrate via extensive experiments. Our technique is inherently resilient to moderate levels of errors in the prediction from the QMPNN. Furthermore, we present pooled deep outlier detection, which brings CS and group testing techniques to deep outlier detection, to handle the case where the objectionable images do not belong to a set of pre-defined classes. This technique enables efficient automated moderation of off-topic images shared on topical forums dedicated to sharing images of a certain single class, many of which are currently human-moderated.
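
As a rough illustration of the decoding step, consider exact pool counts y = A x for a binary pooling matrix A and an unknown sparse binary vector x. The simple peeling decoder below is a stand-in for the CS decoding algorithms referenced in the abstract: it resolves pools whose counts are zero (all members clean) or saturated (all unresolved members objectionable), and any entries it cannot resolve are left marked for individual checking. The matrix and example are toy assumptions.

```python
import numpy as np

def peeling_decode(A, counts):
    """Recover a sparse binary vector x from exact pool counts y = A @ x
    using iterative peeling. A: (m, n) binary pooling matrix; counts: (m,)."""
    m, n = A.shape
    x = np.full(n, -1)                  # -1 = unknown, 0 = clean, 1 = objectionable
    counts = counts.astype(int).copy()
    changed = True
    while changed:
        changed = False
        for i in range(m):
            members = np.where((A[i] == 1) & (x == -1))[0]
            if members.size == 0:
                continue
            if counts[i] == 0:
                x[members] = 0          # empty pool: all unresolved members clean
                changed = True
            elif counts[i] == members.size:
                x[members] = 1          # saturated pool: all unresolved positive
                for j in members:       # peel resolved positives out of all pools
                    counts[A[:, j] == 1] -= 1
                changed = True
    return x

# Toy usage: 3 overlapping pools over 4 images, image 1 objectionable.
A = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]])
x_true = np.array([0, 1, 0, 0])
print(peeling_decode(A, A @ x_true))    # -> [0 1 0 0]
```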

This paper proposes an online, provably robust, and scalable Bayesian approach for changepoint detection. The resulting algorithm has key advantages over previous work: it provides provable robustness by leveraging the generalised Bayesian perspective, and also addresses the scalability issues of previous attempts. Specifically, the proposed generalised Bayesian formalism leads to conjugate posteriors whose parameters are available in closed form by leveraging diffusion score matching. The resulting algorithm is exact, can be updated through simple algebra, and is more than 10 times faster than its closest competitor.
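
For orientation, the classic Bayesian online changepoint detection recursion this line of work builds on (a run-length posterior updated with conjugate predictive probabilities) can be sketched as below. This uses a standard Normal-Inverse-Gamma conjugate model rather than the paper's robust, diffusion-score-matching posterior, so it illustrates the recursion's structure and per-step cost, not the proposed robustness.

```python
import numpy as np
from scipy import stats

def bocpd_gaussian(data, hazard=0.01, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Adams & MacKay-style online changepoint detection with a conjugate
    Normal-Inverse-Gamma model (a simplified stand-in for the robust
    generalised-Bayesian updates described in the abstract)."""
    T = len(data)
    R = np.zeros((T + 1, T + 1))        # R[t, r] = posterior of run length r
    R[0, 0] = 1.0
    mu, kappa = np.array([mu0]), np.array([kappa0])
    alpha, beta = np.array([alpha0]), np.array([beta0])
    for t, x in enumerate(data):
        # Predictive probability of x under each run length (Student-t).
        pred = stats.t.pdf(x, df=2 * alpha, loc=mu,
                           scale=np.sqrt(beta * (kappa + 1) / (alpha * kappa)))
        # Growth (no changepoint) and changepoint mass, then normalize.
        R[t + 1, 1:t + 2] = R[t, :t + 1] * pred * (1 - hazard)
        R[t + 1, 0] = (R[t, :t + 1] * pred).sum() * hazard
        R[t + 1] /= R[t + 1].sum()
        # Closed-form conjugate updates, prepending the prior for run length 0.
        beta = np.append(beta0, beta + kappa * (x - mu) ** 2 / (2 * (kappa + 1)))
        mu = np.append(mu0, (kappa * mu + x) / (kappa + 1))
        kappa = np.append(kappa0, kappa + 1)
        alpha = np.append(alpha0, alpha + 0.5)
    return R
```

The most likely run length at step t is `R[t].argmax()`; a sudden drop back toward zero signals a changepoint.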

Object detection (OD) is crucial to autonomous driving. Unknown objects are one of the reasons that hinder autonomous vehicles from driving beyond the operational domain. We propose a saliency-based OD algorithm (SalienDet) to detect objects that do not appear in the training sample set. SalienDet utilizes a saliency-based algorithm to enhance image features for object proposal generation. Then, we design a dataset relabeling approach to differentiate the unknown objects from all objects to achieve open-world detection. We evaluate SalienDet on the KITTI, NuScenes, and BDD datasets, and the results indicate that it outperforms existing algorithms for unknown object detection. Additionally, SalienDet can be easily adapted for incremental learning in open-world detection tasks.
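
The abstract does not specify which saliency algorithm SalienDet uses; as a generic placeholder, the spectral-residual method of Hou and Zhang (2007) below shows one way a saliency map could be computed and used to amplify image intensities before proposal generation. The gain parameter and the multiplicative enhancement are our assumptions, not the paper's design.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def spectral_residual_saliency(gray):
    """Spectral-residual saliency map for a 2D grayscale array (one generic
    choice of saliency algorithm; SalienDet's own may differ)."""
    f = np.fft.fft2(gray)
    log_amp = np.log(np.abs(f) + 1e-8)
    phase = np.angle(f)
    # Spectral residual: log amplitude minus its local average.
    sr = log_amp - uniform_filter(log_amp, size=3)
    sal = np.abs(np.fft.ifft2(np.exp(sr + 1j * phase))) ** 2
    sal = gaussian_filter(sal, sigma=2.5)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)

def saliency_enhance(image, saliency, gain=1.0):
    """Amplify intensities in salient regions before proposal generation."""
    return np.clip(image * (1.0 + gain * saliency), 0, 255)
```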

Recent Transformer-based 3D object detectors learn point cloud features either from point- or voxel-based representations. However, the former requires time-consuming sampling while the latter introduces quantization errors. In this paper, we present a novel Point-Voxel Transformer for single-stage 3D detection (PVT-SSD) that takes advantage of these two representations. Specifically, we first use voxel-based sparse convolutions for efficient feature encoding. Then, we propose a Point-Voxel Transformer (PVT) module that obtains long-range contexts in a cheap manner from voxels while attaining accurate positions from points. The key to associating the two different representations is our introduced input-dependent Query Initialization module, which could efficiently generate reference points and content queries. Then, PVT adaptively fuses long-range contextual and local geometric information around reference points into content queries. Further, to quickly find the neighboring points of reference points, we design the Virtual Range Image module, which generalizes the native range image to multi-sensor and multi-frame. The experiments on several autonomous driving benchmarks verify the effectiveness and efficiency of the proposed method. Code will be available at //github.com/Nightmare-n/PVT-SSD.
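
A minimal sketch of the fusion step described above, content queries at reference points attending to voxel features for long-range context and then being merged with local point features, might look like the following PyTorch module. Dimensions, module names, and the concatenation-based merge are illustrative assumptions, not the released PVT-SSD code.

```python
import torch
import torch.nn as nn

class PointVoxelFusion(nn.Module):
    """Content queries attend to voxel features (long-range context) and are
    merged with local point features, in the spirit of the PVT module."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.merge = nn.Linear(2 * dim, dim)

    def forward(self, queries, voxel_feats, point_feats):
        # queries:     (B, Q, C) content queries at reference points
        # voxel_feats: (B, V, C) sparse voxel features (long-range context)
        # point_feats: (B, Q, C) local geometric features gathered around
        #              each reference point (e.g. by nearest-neighbour lookup)
        ctx, _ = self.attn(queries, voxel_feats, voxel_feats)
        return self.merge(torch.cat([ctx, point_feats], dim=-1))
```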

We propose a novel method called SHS-Net for oriented normal estimation of point clouds by learning signed hyper surfaces, which can accurately predict normals with globally consistent orientation from various point clouds. Almost all existing methods estimate oriented normals through a two-stage pipeline, i.e., unoriented normal estimation followed by normal orientation, with each step implemented by a separate algorithm. However, previous methods are sensitive to parameter settings, resulting in poor results on point clouds with noise, density variations, and complex geometries. In this work, we introduce signed hyper surfaces (SHS), which are parameterized by multi-layer perceptron (MLP) layers, to learn to estimate oriented normals from point clouds in an end-to-end manner. The signed hyper surfaces are implicitly learned in a high-dimensional feature space where the local and global information is aggregated. Specifically, we introduce a patch encoding module and a shape encoding module to encode a 3D point cloud into a local latent code and a global latent code, respectively. Then, an attention-weighted normal prediction module is proposed as a decoder, which takes the local and global latent codes as input to predict oriented normals. Experimental results show that our SHS-Net outperforms the state-of-the-art methods in both unoriented and oriented normal estimation on widely used benchmarks. The code, data, and pretrained models are publicly available.
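
The described pipeline (patch and shape encoders producing local and global latent codes, followed by an attention-weighted decoder that outputs an oriented normal) can be caricatured with the skeleton below. Layer sizes, the max-pooled global code, and the softmax attention are our assumptions; the actual SHS-Net architecture is richer.

```python
import torch
import torch.nn as nn

class SHSNetSketch(nn.Module):
    """Illustrative skeleton: local + global latent codes are fused and
    decoded into a unit oriented normal via attention-weighted pooling."""
    def __init__(self, d=128):
        super().__init__()
        self.patch_enc = nn.Sequential(nn.Linear(3, d), nn.ReLU(), nn.Linear(d, d))
        self.shape_enc = nn.Sequential(nn.Linear(3, d), nn.ReLU(), nn.Linear(d, d))
        self.score = nn.Linear(2 * d, 1)          # attention weight per patch point
        self.decode = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, 3))

    def forward(self, patch, cloud):
        # patch: (B, K, 3) local neighbourhood; cloud: (B, M, 3) whole shape
        local = self.patch_enc(patch)                    # (B, K, d) local codes
        glob = self.shape_enc(cloud).max(dim=1).values   # (B, d) pooled global code
        feat = torch.cat([local, glob[:, None].expand_as(local)], dim=-1)
        w = torch.softmax(self.score(feat), dim=1)       # attention over the patch
        n = self.decode((w * feat).sum(dim=1))           # (B, 3)
        return nn.functional.normalize(n, dim=-1)        # unit oriented normal
```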

Change detection is an essential and widely utilized task in remote sensing that aims to detect and analyze changes occurring in the same geographical area over time; it has broad applications in urban development, agricultural surveys, and land cover monitoring. Detecting changes in remote sensing images is a complex challenge due to various factors, including variations in image quality, noise, registration errors, illumination changes, complex landscapes, and spatial heterogeneity. In recent years, deep learning has emerged as a powerful tool for feature extraction and addressing these challenges. Its versatility has resulted in its widespread adoption for numerous image-processing tasks. This paper presents a comprehensive survey of significant advancements in change detection for remote sensing images over the past decade. We first introduce some preliminary knowledge for the change detection task, such as the problem definition, datasets, evaluation metrics, and transformer basics, and in the methodology section provide a detailed taxonomy of existing algorithms from three different perspectives: algorithm granularity, supervision modes, and learning frameworks. This survey enables readers to gain systematic knowledge of change detection tasks from various angles. We then summarize the state-of-the-art performance on several dominant change detection datasets, providing insights into the strengths and limitations of existing algorithms. Based on our survey, some future research directions for change detection in remote sensing are identified. We hope this survey will shed light on the community and inspire further research efforts on the change detection task.

Autonomous driving is regarded as one of the most promising remedies to shield human beings from severe crashes. To this end, 3D object detection serves as a core component of such perception systems, especially for the sake of path planning, motion prediction, and collision avoidance. Generally, stereo or monocular images with corresponding 3D point clouds are already a standard layout for 3D object detection, and among these, point clouds are increasingly prevalent because they provide accurate depth information. Despite existing efforts, 3D object detection on point clouds is still in its infancy due to the inherent sparseness and irregularity of point clouds, the misalignment between the camera view and the LiDAR bird's-eye view that complicates modality synergies, occlusions and scale variations at long distances, and so on. Recently, profound progress has been made in 3D object detection, with a large body of literature investigating this vision task. As such, we present a comprehensive review of the latest progress in this field, covering all the main topics including sensors, fundamentals, and the recent state-of-the-art detection methods with their pros and cons. Furthermore, we introduce metrics and provide quantitative comparisons on popular public datasets. Avenues for future work are judiciously identified after an in-depth analysis of the surveyed works. Finally, we conclude this paper.

Object detection with transformers (DETR) reaches competitive performance with Faster R-CNN via a transformer encoder-decoder architecture. Inspired by the great success of pre-training transformers in natural language processing, we propose a pretext task named random query patch detection to unsupervisedly pre-train DETR (UP-DETR) for object detection. Specifically, we randomly crop patches from the given image and then feed them as queries to the decoder. The model is pre-trained to detect these query patches in the original image. During pre-training, we address two critical issues: multi-task learning and multi-query localization. (1) To balance the multi-task learning of classification and localization in the pretext task, we freeze the CNN backbone and propose a patch feature reconstruction branch that is jointly optimized with patch detection. (2) To perform multi-query localization, we first introduce UP-DETR with a single query patch and then extend it to multi-query patches with object query shuffle and an attention mask. In our experiments, UP-DETR significantly boosts the performance of DETR, with faster convergence and higher precision on the PASCAL VOC and COCO datasets. The code will be available soon.
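
The pretext task itself is easy to sketch: crop random patches from an image and record the boxes at which the decoder must localize them. The snippet below shows one plausible sampling routine under assumed tensor layouts (CHW images at least as large as the patch, normalized cx/cy/w/h targets as in the DETR convention); patch sizes and counts are illustrative.

```python
import torch

def random_query_patches(image, num_patches=10, patch_hw=(64, 64)):
    """Sample random patches (to be encoded as decoder queries) together
    with the ground-truth boxes the model must predict for them."""
    _, H, W = image.shape                       # image: (3, H, W) tensor
    ph, pw = patch_hw                           # assumes H >= ph and W >= pw
    patches, boxes = [], []
    for _ in range(num_patches):
        y = torch.randint(0, H - ph + 1, (1,)).item()
        x = torch.randint(0, W - pw + 1, (1,)).item()
        patches.append(image[:, y:y + ph, x:x + pw])
        # Target box in normalized cx, cy, w, h format.
        boxes.append(torch.tensor([(x + pw / 2) / W, (y + ph / 2) / H,
                                   pw / W, ph / H]))
    return torch.stack(patches), torch.stack(boxes)
```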

Conventional methods for object detection typically require a substantial amount of training data, and preparing such high-quality training data is very labor-intensive. In this paper, we propose a novel few-shot object detection network that aims at detecting objects of unseen categories with only a few annotated examples. Central to our method are our Attention-RPN, Multi-Relation Detector, and Contrastive Training strategy, which exploit the similarity between the few-shot support set and the query set to detect novel objects while suppressing false detections in the background. To train our network, we contribute a new dataset that contains 1000 categories of various objects with high-quality annotations. To the best of our knowledge, this is one of the first datasets specifically designed for few-shot object detection. Once our few-shot network is trained, it can detect objects of unseen categories without further training or fine-tuning. Our method is general and has a wide range of potential applications. We achieve new state-of-the-art performance on different datasets in the few-shot setting. The dataset link is //github.com/fanq15/Few-Shot-Object-Detection-Dataset.
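
As an illustration of the support-query attention idea (not the authors' released implementation), the snippet below pools a support feature map into a per-channel kernel and depth-wise correlates it with the query feature map, so regions resembling the support class are amplified before proposal generation. The shapes and the final multiplicative gating are assumptions.

```python
import torch
import torch.nn.functional as F

def attention_rpn_features(query_feat, support_feat):
    """Support-guided attention over a query feature map, in the spirit of
    Attention-RPN: pool the support to a 1x1 kernel per channel, then apply
    depth-wise correlation and gate the query features with the response."""
    # query_feat: (B, C, H, W); support_feat: (B, C, Hs, Ws)
    B, C, _, _ = query_feat.shape
    kernel = F.adaptive_avg_pool2d(support_feat, 1)        # (B, C, 1, 1)
    # Depth-wise correlation: one 1x1 kernel per channel, per batch element.
    out = F.conv2d(query_feat.reshape(1, B * C, *query_feat.shape[2:]),
                   kernel.reshape(B * C, 1, 1, 1), groups=B * C)
    return out.reshape(B, C, *query_feat.shape[2:]) * query_feat
```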

Inspired by the recent development of artificial satellites, remote sensing images have attracted extensive attention. Recently, noticeable progress has been made in scene classification and target detection. However, it is still not clear how to describe the content of a remote sensing image with accurate and concise sentences. In this paper, we investigate how to describe remote sensing images with accurate and flexible sentences. First, some annotation instructions are presented to better describe remote sensing images, considering their special characteristics. Second, in order to exhaustively exploit the contents of remote sensing images, a large-scale aerial image data set is constructed for remote sensing image captioning. Finally, a comprehensive review is presented on the proposed data set to fully advance the task of remote sensing captioning. Extensive experiments on the proposed data set demonstrate that the content of a remote sensing image can be completely described by the generated language descriptions. The data set is available at //github.com/2051/RSICD_optimal
