With the advance of AI, road object detection has become a prominent topic in computer vision, mostly using perspective cameras. Fisheye lenses provide omnidirectional, wide coverage, allowing fewer cameras to monitor road intersections, albeit at the cost of view distortions. To our knowledge, there is no existing open dataset prepared for traffic surveillance with fisheye cameras. This paper introduces FishEye8K, an open benchmark dataset for road object detection that comprises 157K bounding boxes across five classes (Pedestrian, Bike, Car, Bus, and Truck). In addition, we present benchmark results of state-of-the-art (SoTA) models, including variants of YOLOv5, YOLOR, YOLOv7, and YOLOv8. The dataset comprises 8,000 images recorded in 22 videos using 18 fisheye cameras for traffic monitoring in Hsinchu, Taiwan, at resolutions of 1080$\times$1080 and 1280$\times$1280. The data annotation and validation process was arduous and time-consuming, owing to the ultra-wide panoramic and hemispherical fisheye images with large distortion and numerous road participants, particularly people riding scooters. To avoid bias, frames from a given camera were assigned to either the training or the test set, maintaining a ratio of about 70:30 for both the number of images and the bounding boxes in each class. Experimental results show that YOLOv8 and YOLOR perform best at input sizes of 640$\times$640 and 1280$\times$1280, respectively. The dataset will be available on GitHub with PASCAL VOC, MS COCO, and YOLO annotation formats. We expect the FishEye8K benchmark to make significant contributions to fisheye video analytics and smart city applications.
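The camera-disjoint 70:30 split described above can be illustrated with a small sketch. The greedy assignment below is an assumption for demonstration only (the paper does not specify its splitting algorithm), and the camera names and image counts are made up.

```python
# Hypothetical sketch: greedy camera-disjoint split targeting a ~70:30 image ratio.
from typing import Dict, List, Tuple

def camera_disjoint_split(
    images_per_camera: Dict[str, int], train_ratio: float = 0.7
) -> Tuple[List[str], List[str]]:
    """Assign whole cameras to train or test so no camera appears in both."""
    total = sum(images_per_camera.values())
    train, test, train_count = [], [], 0
    # Greedily fill the training set with the largest cameras first.
    for cam, n in sorted(images_per_camera.items(), key=lambda kv: -kv[1]):
        if train_count + n <= train_ratio * total:
            train.append(cam)
            train_count += n
        else:
            test.append(cam)
    return train, test

# Example with placeholder camera counts.
print(camera_disjoint_split({"cam01": 500, "cam02": 450, "cam03": 300, "cam04": 250}))
```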
Robust detection of vulnerable road users is a safety-critical requirement for the deployment of autonomous vehicles in heterogeneous traffic. One of the most complex outstanding challenges is partial occlusion, where a target object is only partially available to the sensor due to obstruction by another foreground object. A number of leading pedestrian detection benchmarks provide annotations for partial occlusion; however, each benchmark varies greatly in its definition of the occurrence and severity of occlusion. Recent research demonstrates that a high degree of subjectivity is used to classify occlusion level in these cases, and occlusion is typically categorized into two or three broad categories such as partially and heavily occluded. This can lead to inaccurate or inconsistent reporting of pedestrian detection model performance depending on which benchmark is used. This research introduces a novel, objective benchmark for partially occluded pedestrian detection to facilitate the objective characterization of pedestrian detection models. Characterization is carried out on seven popular pedestrian detection models for a range of occlusion levels from 0% to 99%, in order to demonstrate the efficacy and increased analysis capabilities of the proposed characterization method. Results demonstrate that pedestrian detection performance degrades, and the number of false negative detections increases, as pedestrian occlusion level increases. Of the seven popular pedestrian detection routines characterized, CenterNet has the greatest overall performance, followed by SSDlite. RetinaNet has the lowest overall detection performance across the range of occlusion levels.
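The kind of per-occlusion-level characterization described here can be sketched as simple binning of ground-truth pedestrians by occlusion percentage. This is a minimal illustration, not the benchmark's own protocol; the bin width, the data format, and the miss-rate metric are assumptions.

```python
# Minimal sketch: miss rate per occlusion bin, assuming each ground-truth
# pedestrian carries an occlusion percentage (0-99) and a detected/missed flag.
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def miss_rate_by_occlusion(
    ground_truths: Iterable[Tuple[int, bool]],  # (occlusion_percent, was_detected)
    bin_width: int = 10,
) -> Dict[str, float]:
    totals, misses = defaultdict(int), defaultdict(int)
    for occlusion, detected in ground_truths:
        lo = (occlusion // bin_width) * bin_width
        key = f"{lo}-{lo + bin_width - 1}%"
        totals[key] += 1
        if not detected:
            misses[key] += 1
    return {k: misses[k] / totals[k] for k in totals}

# Toy example: heavily occluded pedestrians are missed more often.
print(miss_rate_by_occlusion([(5, True), (45, True), (85, False), (92, False)]))
```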
While LiDAR sensors have been successfully applied to 3D object detection, the affordability of radar and camera sensors has led to growing interest in fusing radars and cameras for 3D object detection. However, previous radar-camera fusion models have not been able to fully utilize radar information, in that initial 3D proposals are generated from camera features only and instance-level fusion is conducted subsequently. In this paper, we propose radar-camera multi-level fusion (RCM-Fusion), which fuses the radar and camera modalities at both the feature level and the instance level to fully utilize radar information. At the feature level, we propose a Radar Guided BEV Encoder, which utilizes radar Bird's-Eye-View (BEV) features to transform image features into precise BEV representations and then adaptively combines the radar and camera BEV features. At the instance level, we propose a Radar Grid Point Refinement module that reduces localization error by considering the characteristics of radar point clouds. Experiments conducted on the public nuScenes dataset demonstrate that the proposed RCM-Fusion offers an 11.8% gain in nuScenes detection score (NDS) over the camera-only baseline model and achieves state-of-the-art performance among radar-camera fusion methods on the nuScenes 3D object detection benchmark. Code will be made publicly available.
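To give a sense of what "adaptively combining" radar and camera BEV features can look like, here is a generic gated-fusion sketch in PyTorch. It is not the RCM-Fusion module itself; the channel sizes, the sigmoid gate, and the residual-free form are illustrative assumptions.

```python
# Illustrative gated fusion of radar and camera BEV feature maps (PyTorch).
import torch
import torch.nn as nn

class GatedBEVFusion(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        # Predict a per-location gate from the concatenated modalities.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, camera_bev: torch.Tensor, radar_bev: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([camera_bev, radar_bev], dim=1))
        # Weight each modality adaptively at every BEV cell.
        return g * camera_bev + (1.0 - g) * radar_bev

fusion = GatedBEVFusion(channels=64)
fused = fusion(torch.randn(1, 64, 128, 128), torch.randn(1, 64, 128, 128))
print(fused.shape)  # torch.Size([1, 64, 128, 128])
```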
We present YOLOBench, a benchmark comprising 550+ YOLO-based object detection models evaluated on four datasets and four embedded hardware platforms (x86 CPU, ARM CPU, NVIDIA GPU, NPU). We collect accuracy and latency numbers for a variety of YOLO-based one-stage detectors at different model scales by performing a fair, controlled comparison with a fixed training environment (code and training hyperparameters). Pareto-optimality analysis of the collected data reveals that, if modern detection heads and training techniques are incorporated into the learning process, multiple architectures of the YOLO series achieve a good accuracy-latency trade-off, including older models like YOLOv3 and YOLOv4. We also evaluate training-free accuracy estimators used in neural architecture search on YOLOBench and demonstrate that, while most state-of-the-art zero-cost accuracy estimators are outperformed by a simple baseline such as MAC count, some of them can be effectively used to predict Pareto-optimal detection models. We showcase this by using a zero-cost proxy to identify a YOLO architecture competitive with a state-of-the-art YOLOv8 model on a Raspberry Pi 4 CPU. The code and data are available at //github.com/Deeplite/deeplite-torch-zoo
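The Pareto-optimality analysis mentioned above boils down to keeping only the models that are not dominated in both latency and accuracy. The sketch below shows one way to extract that frontier; the model names and numbers are placeholders, not YOLOBench results.

```python
# Sketch: extract the accuracy-latency Pareto frontier from benchmark results.
from typing import Dict, List, Tuple

def pareto_frontier(results: Dict[str, Tuple[float, float]]) -> List[str]:
    """results maps model name -> (latency_ms, mAP); lower latency and higher mAP are better."""
    frontier, best_map = [], float("-inf")
    # Sweep models from fastest to slowest; keep those that improve accuracy.
    for name, (latency, mAP) in sorted(results.items(), key=lambda kv: kv[1][0]):
        if mAP > best_map:
            frontier.append(name)
            best_map = mAP
    return frontier

# Placeholder numbers purely for illustration.
print(pareto_frontier({
    "yolov3-tiny": (4.1, 31.0),
    "yolov4-s": (6.5, 40.2),
    "yolov8-s": (6.0, 42.1),
    "yolov8-m": (12.3, 47.8),
}))
```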
In precision agriculture, detecting productive crop fields is an essential practice that allows the farmer to evaluate operating performance separately and to compare different seed varieties, pesticides, and fertilizers. However, manually identifying productive fields is often a time-consuming and error-prone task. Previous studies have explored different methods to detect crop fields using advanced machine learning algorithms, but they often lack good-quality labeled data. In this context, we propose a high-quality dataset generated from machine-operation data combined with Sentinel-2 images tracked over time. As far as we know, it is the first dataset to overcome the lack of labeled samples by using this technique. We then apply semi-supervised classification of unlabeled data and state-of-the-art supervised and self-supervised deep learning methods to detect productive crop fields automatically. The results demonstrate high accuracy with Positive-Unlabeled learning, which fits this problem well because we have high confidence in the positive samples. The best performance is obtained with a Triplet-Loss Siamese network when an accurate labeled dataset exists, and with Contrastive Learning when a comprehensive labeled dataset is not available.
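As a rough illustration of the triplet-loss setup mentioned above, the PyTorch sketch below trains an embedding where productive-field patches are pulled together and non-field patches are pushed away. The encoder architecture, margin, and tensor shapes are illustrative assumptions, and random tensors stand in for multi-band Sentinel-2 patches.

```python
# Hedged sketch of a Triplet-Loss Siamese embedding (PyTorch).
import torch
import torch.nn as nn

# Toy encoder; the real model would be a CNN over multi-band image patches.
encoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(12 * 32 * 32, 256),  # band count and patch size are illustrative
    nn.ReLU(),
    nn.Linear(256, 64),
)
triplet_loss = nn.TripletMarginLoss(margin=1.0)

# anchor/positive: productive-field patches; negative: a non-field patch.
anchor = encoder(torch.randn(8, 12, 32, 32))
positive = encoder(torch.randn(8, 12, 32, 32))
negative = encoder(torch.randn(8, 12, 32, 32))
loss = triplet_loss(anchor, positive, negative)
loss.backward()
print(float(loss))
```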
To mitigate potential risks associated with language models, recent AI detection research proposes incorporating watermarks into machine-generated text through random vocabulary restrictions and utilizing this information for detection. While these watermarks only induce a slight deterioration in perplexity, our empirical investigation reveals a significant detriment to the performance of conditional text generation. To address this issue, we introduce a simple yet effective semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context. Experimental results demonstrate that our proposed method yields substantial improvements across various text generation models, including BART and Flan-T5, in tasks such as summarization and data-to-text generation while maintaining detection ability.
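For context, the vocabulary-restriction watermark this abstract builds on can be sketched as biasing a pseudo-random "green" subset of the vocabulary at each decoding step, keyed on the preceding token. The code below illustrates that baseline idea only, not the paper's semantic-aware variant; the `gamma` and `delta` hyperparameters and the seeding scheme are assumptions.

```python
# Minimal sketch of a green-list watermark applied to next-token logits.
import torch

def watermark_logits(logits: torch.Tensor, prev_token: int,
                     gamma: float = 0.5, delta: float = 2.0) -> torch.Tensor:
    """Boost a pseudo-random 'green' subset of the vocabulary before sampling."""
    vocab_size = logits.shape[-1]
    gen = torch.Generator().manual_seed(prev_token)              # keyed on context
    green = torch.randperm(vocab_size, generator=gen)[: int(gamma * vocab_size)]
    biased = logits.clone()
    biased[green] += delta                                       # soft vocabulary restriction
    return biased

logits = torch.randn(32000)                                      # placeholder vocab size
print(watermark_logits(logits, prev_token=17).shape)
```

A detector with the same seed can recount how many generated tokens fall in the green list; the semantic-aware method in the paper modifies how this restriction is chosen so that conditional generation quality is preserved.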
With the rise of deep convolutional neural networks, object detection has achieved prominent advances in recent years. However, such prosperity cannot camouflage the unsatisfactory situation of Small Object Detection (SOD), one of the notoriously challenging tasks in computer vision, owing to the poor visual appearance and noisy representation caused by the intrinsic structure of small targets. In addition, the lack of large-scale datasets for benchmarking small object detection methods remains a bottleneck. In this paper, we first conduct a thorough review of small object detection. Then, to catalyze the development of SOD, we construct two large-scale Small Object Detection dAtasets (SODA), SODA-D and SODA-A, which focus on the Driving and Aerial scenarios, respectively. SODA-D includes 24,704 high-quality traffic images and 277,596 instances of 9 categories. For SODA-A, we harvest 2,510 high-resolution aerial images and annotate 800,203 instances over 9 classes. To our knowledge, the proposed datasets are the first attempt at large-scale benchmarks with a vast collection of exhaustively annotated instances tailored for multi-category SOD. Finally, we evaluate the performance of mainstream methods on SODA. We expect the released benchmarks to facilitate the development of SOD and spawn more breakthroughs in this field. Datasets and codes will be available soon at: \url{//shaunyuan22.github.io/SODA}.
In many visual systems, visual tracking is often based on RGB image sequences, in which some targets are barely visible in low-light conditions, and tracking performance is thus affected significantly. Introducing other modalities such as depth and infrared data is an effective way to handle the imaging limitations of individual sources, but multi-modal imaging platforms usually require elaborate designs and cannot currently be applied in many real-world applications. Near-infrared (NIR) imaging has become an essential part of many surveillance cameras, whose imaging is switchable between RGB and NIR based on the light intensity. These two modalities are heterogeneous, with very different visual properties, and thus pose significant challenges for visual tracking. However, existing work has not studied this challenging problem. In this work, we address the cross-modal object tracking problem and contribute a new video dataset, including 654 cross-modal image sequences with over 481K frames in total and an average video length of more than 735 frames. To promote the research and development of cross-modal object tracking, we propose a new algorithm that learns a modality-aware target representation to mitigate the appearance gap between the RGB and NIR modalities during tracking. It is plug-and-play and can thus be flexibly embedded into different tracking frameworks. Extensive experiments on the dataset demonstrate the effectiveness of the proposed algorithm in two representative tracking frameworks against 17 state-of-the-art tracking methods. We will release the dataset for free academic usage; the download link and code will be made available soon.
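One plug-and-play way to realize a modality-aware representation is a shared feature extractor followed by light per-modality adapters selected by an RGB/NIR flag. The sketch below is a generic design inspired by the description, not the paper's module; the adapter form and channel size are assumptions.

```python
# Illustrative modality-aware adapter head (PyTorch).
import torch
import torch.nn as nn

class ModalityAwareHead(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        # One lightweight adapter per modality, applied to shared backbone features.
        self.adapters = nn.ModuleDict({
            "rgb": nn.Conv2d(channels, channels, kernel_size=1),
            "nir": nn.Conv2d(channels, channels, kernel_size=1),
        })

    def forward(self, features: torch.Tensor, modality: str) -> torch.Tensor:
        # Residual adapter narrows the appearance gap between modalities.
        return features + self.adapters[modality](features)

head = ModalityAwareHead(64)
print(head(torch.randn(1, 64, 32, 32), "nir").shape)  # torch.Size([1, 64, 32, 32])
```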
In recent years, substantial efforts have been devoted to presenting various methods for object detection in optical remote sensing images. However, the current survey of datasets and deep learning based methods for object detection in optical remote sensing images is not adequate. Moreover, most of the existing datasets have shortcomings; for example, the numbers of images and object categories are small, and the image diversity and variations are insufficient. These limitations greatly affect the development of deep learning based object detection methods. In this paper, we provide a comprehensive review of recent deep learning based object detection progress in both the computer vision and earth observation communities. Then, we propose a large-scale, publicly available benchmark for object DetectIon in Optical Remote sensing images, which we name DIOR. The dataset contains 23,463 images and 192,472 instances, covering 20 object classes. The proposed DIOR dataset 1) is large-scale in terms of object categories, object instances, and total images; 2) has a large range of object size variations, not only in terms of spatial resolution but also in terms of inter- and intra-class size variability across objects; 3) exhibits large variations because the images are obtained under different imaging conditions, weather, seasons, and image qualities; and 4) has high inter-class similarity and intra-class diversity. The proposed benchmark can help researchers develop and validate their data-driven methods. Finally, we evaluate several state-of-the-art approaches on our DIOR dataset to establish a baseline for future research.
In recent years, object detection has experienced impressive progress. Despite these improvements, there is still a significant gap in performance between the detection of small and large objects. We analyze a current state-of-the-art model, Mask R-CNN, on a challenging dataset, MS COCO. We show that the overlap between small ground-truth objects and the predicted anchors is much lower than the expected IoU threshold. We conjecture this is due to two factors: (1) only a few images contain small objects, and (2) even within the images that contain them, small objects do not appear often enough. We thus propose to oversample those images with small objects and to augment each of them by copy-pasting small objects many times. This allows us to trade off the quality of the detector on large objects with that on small objects. We evaluate different pasting augmentation strategies and ultimately achieve a 9.7\% relative improvement on instance segmentation and 7.1\% on object detection of small objects, compared to the current state-of-the-art method on MS COCO.
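The copy-paste augmentation described above can be sketched in a few lines: crop a small ground-truth box and paste it at random locations, emitting a new box for each copy. This is a simplified illustration of the idea; overlap checks, blending, and the paper's specific placement strategy are omitted, and the box format is assumed to be pixel coordinates (x1, y1, x2, y2).

```python
# Rough sketch of copy-paste augmentation for small objects (NumPy).
import numpy as np

def copy_paste_small_object(image: np.ndarray, box: tuple, n_copies: int = 3, rng=None):
    rng = rng or np.random.default_rng()
    x1, y1, x2, y2 = box
    patch = image[y1:y2, x1:x2].copy()
    h, w = patch.shape[:2]
    new_boxes = []
    for _ in range(n_copies):
        # Choose a random top-left corner where the patch still fits.
        ty = int(rng.integers(0, image.shape[0] - h))
        tx = int(rng.integers(0, image.shape[1] - w))
        image[ty:ty + h, tx:tx + w] = patch
        new_boxes.append((tx, ty, tx + w, ty + h))
    return image, new_boxes

img = np.zeros((480, 640, 3), dtype=np.uint8)
aug, boxes = copy_paste_small_object(img, (10, 10, 26, 26))
print(boxes)
```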
Object detection is considered one of the most challenging problems in computer vision, since it requires correct prediction of both the classes and locations of objects in images. In this study, we define a more difficult scenario, namely zero-shot object detection (ZSD), where no visual training data is available for some of the target object classes. We present a novel approach to tackle this ZSD problem, in which a convex combination of embeddings is used in conjunction with a detection framework. For evaluation of ZSD methods, we propose a simple dataset constructed from Fashion-MNIST images, as well as a custom zero-shot split for the Pascal VOC detection challenge. The experimental results suggest that our method yields promising results for ZSD.
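The convex-combination idea can be illustrated briefly: mix the seen-class embeddings using the detector's predicted class probabilities as weights, then match the resulting vector to unseen-class embeddings by similarity. The sketch below uses random tensors as stand-ins for word embeddings and detector scores, so it shows the mechanism rather than the paper's exact pipeline.

```python
# Hedged sketch of a convex combination of class embeddings for ZSD (PyTorch).
import torch
import torch.nn.functional as F

num_seen, num_unseen, dim = 10, 5, 300
seen_emb = F.normalize(torch.randn(num_seen, dim), dim=1)    # placeholder seen-class embeddings
unseen_emb = F.normalize(torch.randn(num_unseen, dim), dim=1)

probs = torch.softmax(torch.randn(1, num_seen), dim=1)       # detector's seen-class scores for one box
combined = probs @ seen_emb                                   # convex combination of embeddings
scores = F.normalize(combined, dim=1) @ unseen_emb.T          # similarity to unseen classes
print(int(scores.argmax()))                                   # predicted unseen class index
```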