亚洲精品无码黄色网站在线观看_亚洲AV无码一区二区三区久久_亚洲偷自拍视频在线观看_久久久久久精品人妻_成人亚洲国产播放器网站下载_国产日韩欧美在线不卡高清_亚洲日韩国产一区夜夜嗨

Weakly supervised learning based on scribble annotations in target extraction of remote sensing images has drawn much interest due to scribbles' flexibility in denoting winding objects and low cost of manually labeling. However, scribbles are too sparse to identify object structure and detailed information, bringing great challenges in target localization and boundary description. To alleviate these problems, in this paper, we construct two inner structure-constraints, a deformation consistency loss and a trainable active contour loss, together with a scribble-constraint to supervise the optimization of the encoder-decoder network without introducing any auxiliary module or extra operation based on prior cues. Comprehensive experiments demonstrate our method's superiority over five state-of-the-art algorithms in this field. Source code is available at //github.com/yitongli123/ISC-TE.

相關內容

可(ke)辨(bian)認的

關注 4

偽標記 · Performer · 3D · Boosting（一種模型訓練加速方式） · 點云 ·

2023 年 7 月 5 日

Teachers in concordance for pseudo-labeling of 3D sequential data

Awet Haileslassie Gebrehiwot,Patrik Vacek,David Hurych,Karel Zimmermann,Patrick Perez,Tomá? Svoboda

from arxiv, This work has been submitted to the IEEE for publication

Automatic pseudo-labeling is a powerful tool to tap into large amounts of sequential unlabeled data. It is specially appealing in safety-critical applications of autonomous driving, where performance requirements are extreme, datasets are large, and manual labeling is very challenging. We propose to leverage sequences of point clouds to boost the pseudolabeling technique in a teacher-student setup via training multiple teachers, each with access to different temporal information. This set of teachers, dubbed Concordance, provides higher quality pseudo-labels for student training than standard methods. The output of multiple teachers is combined via a novel pseudo label confidence-guided criterion. Our experimental evaluation focuses on the 3D point cloud domain and urban driving scenarios. We show the performance of our method applied to 3D semantic segmentation and 3D object detection on three benchmark datasets. Our approach, which uses only 20% manual labels, outperforms some fully supervised methods. A notable performance boost is achieved for classes rarely appearing in training data.

數據集 · Yolo · 可辨認的 · Extensibility · 標注 ·

2023 年 7 月 4 日

Synthetic Data-based Detection of Zebras in Drone Imagery

Elia Bonetto,Aamir Ahmad

from arxiv, 8 pages, 7 figures, 3 tables. Published in IEEE ECMR 2023

Nowadays, there is a wide availability of datasets that enable the training of common object detectors or human detectors. These come in the form of labelled real-world images and require either a significant amount of human effort, with a high probability of errors such as missing labels, or very constrained scenarios, e.g. VICON systems. On the other hand, uncommon scenarios, like aerial views, animals, like wild zebras, or difficult-to-obtain information, such as human shapes, are hardly available. To overcome this, synthetic data generation with realistic rendering technologies has recently gained traction and advanced research areas such as target tracking and human pose estimation. However, subjects such as wild animals are still usually not well represented in such datasets. In this work, we first show that a pre-trained YOLO detector can not identify zebras in real images recorded from aerial viewpoints. To solve this, we present an approach for training an animal detector using only synthetic data. We start by generating a novel synthetic zebra dataset using GRADE, a state-of-the-art framework for data generation. The dataset includes RGB, depth, skeletal joint locations, pose, shape and instance segmentations for each subject. We use this to train a YOLO detector from scratch. Through extensive evaluations of our model with real-world data from i) limited datasets available on the internet and ii) a new one collected and manually labelled by us, we show that we can detect zebras by using only synthetic data during training. The code, results, trained models, and both the generated and training data are provided as open-source at //eliabntt.github.io/grade-rr.

塑造 · Projection · 3D · 穩健性 · 泛函 ·

2023 年 7 月 3 日

ROAR: Robust Adaptive Reconstruction of Shapes Using Planar Projections

Amir Barda,Yotam Erel,Yoni Kasten,Amit H. Bermano

from arxiv, The first two authors contributed equally to this work. Project page: //yoterel.github.io/ROAR-project-page/

The majority of existing large 3D shape datasets contain meshes that lend themselves extremely well to visual applications such as rendering, yet tend to be topologically invalid (i.e, contain non-manifold edges and vertices, disconnected components, self-intersections). Therefore, it is of no surprise that state of the art studies in shape understanding do not explicitly use this 3D information. In conjunction with this, triangular meshes remain the dominant shape representation for many downstream tasks, and their connectivity remain a relatively untapped source of potential for more profound shape reasoning. In this paper, we introduce ROAR, an iterative geometry/topology evolution approach to reconstruct 2-manifold triangular meshes from arbitrary 3D shape representations, that is highly suitable for large existing in-the-wild datasets. ROAR leverages the visual prior large datasets exhibit by evolving the geometry of the mesh via a 2D render loss, and a novel 3D projection loss, the Planar Projection. After each geometry iteration, our system performs topological corrections. Self-intersections are reduced following a geometrically motivated attenuation term, and resolution is added to required regions using a novel face scoring function. These steps alternate until convergence is achieved, yielding a high-quality manifold mesh. We evaluate ROAR on the notoriously messy yet popular dataset ShapeNet, and present ShapeROAR - a topologically valid yet still geometrically accurate version of ShapeNet. We compare our results to state-of-the-art reconstruction methods and demonstrate superior shape faithfulness, topological correctness, and triangulation quality. In addition, we demonstrate reconstructing a mesh from neural Signed Distance Functions (SDF), and achieve comparable Chamfer distance with much fewer SDF sampling operations than the commonly used Marching Cubes approach.

Performer · MoDELS · Prompt · 查準率/準確率 · 語言模型化 ·

2023 年 7 月 1 日

InstructEval: Systematic Evaluation of Instruction Selection Methods

Anirudh Ajith,Chris Pan,Mengzhou Xia,Ameet Deshpande,Karthik Narasimhan

from arxiv, 10 content pages, 3 figures, 8 tables

In-context learning (ICL) performs tasks by prompting a large language model (LLM) using an instruction and a small set of annotated examples called demonstrations. Recent work has shown that the precise details of the inputs used in the prompt significantly impacts ICL, which has incentivized instruction selection algorithms. The effect of instruction-choice however is severely underexplored, with existing analyses being restricted to shallow subsets of models and tasks, which limits the generalizability of their insights. We develop an ICL evaluation suite to conduct a thorough assessment of these techniques. The suite includes 13 open-sourced LLMs of varying scales from 4 distinct model families and covers 9 different tasks, representing a range of task types across 3 categories. In this work, we evaluate the relative performance of 7 popular instruction selection methods using our benchmark over five desiderata relevant to ICL. We discover that using curated manually-written instructions and simple instructions without any task-specific descriptions often elicits superior ICL performance than that of automatic instruction-induction methods, pointing to a lack of generalizability among the latter. We release our evaluation suite for benchmarking instruction selection approaches, and call for more rigorous and generalizable methods in this space.

注意力機制 · Cognition · Performer · 深度學習 · Boosting（一種模型訓練加速方式） ·

2022 年 4 月 16 日

Visual Attention Methods in Deep Learning: An In-Depth Survey

Mohammed Hassanin,Saeed Anwar,Ibrahim Radwan,Fahad S Khan,Ajmal Mian

Inspired by the human cognitive system, attention is a mechanism that imitates the human cognitive awareness about specific information, amplifying critical details to focus more on the essential aspects of data. Deep learning has employed attention to boost performance for many applications. Interestingly, the same attention design can suit processing different data modalities and can easily be incorporated into large networks. Furthermore, multiple complementary attention mechanisms can be incorporated in one network. Hence, attention techniques have become extremely attractive. However, the literature lacks a comprehensive survey specific to attention techniques to guide researchers in employing attention in their deep models. Note that, besides being demanding in terms of training data and computational resources, transformers only cover a single category in self-attention out of the many categories available. We fill this gap and provide an in-depth survey of 50 attention techniques categorizing them by their most prominent features. We initiate our discussion by introducing the fundamental concepts behind the success of attention mechanism. Next, we furnish some essentials such as the strengths and limitations of each attention category, describe their fundamental building blocks, basic formulations with primary usage, and applications specifically for computer vision. We also discuss the challenges and open questions related to attention mechanism in general. Finally, we recommend possible future research directions for deep attention.

目標領域 · 源領域 · Vision · Processing（編程語言） · 圖像處理 ·

2021 年 1 月 21 日

Image-to-Image Translation: Methods and Applications

Yingxue Pang,Jianxin Lin,Tao Qin,Zhibo Chen

from arxiv, 19 pages, 17 figures

Image-to-image translation (I2I) aims to transfer images from a source domain to a target domain while preserving the content representations. I2I has drawn increasing attention and made tremendous progress in recent years because of its wide range of applications in many computer vision and image processing problems, such as image synthesis, segmentation, style transfer, restoration, and pose estimation. In this paper, we provide an overview of the I2I works developed in recent years. We will analyze the key techniques of the existing I2I works and clarify the main progress the community has made. Additionally, we will elaborate on the effect of I2I on the research and industry community and point out remaining challenges in related fields.

對象識別 · MoDELS · Backbone · Extensibility · 學成 ·

2020 年 3 月 31 日

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Mohan Zhou,Yalong Bai,Wei Zhang,Tiejun Zhao,Tao Mei

from arxiv, 10 pages, 7 figures, accepted by CVPR 2020

Most object recognition approaches predominantly focus on learning discriminative visual patterns while overlooking the holistic object structure. Though important, structure modeling usually requires significant manual annotations and therefore is labor-intensive. In this paper, we propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions into the traditional framework. We show the recognition backbone can be substantially enhanced for more robust representation learning, without any cost of extra annotation and inference speed. Specifically, we first propose an object-extent learning module for localizing the object according to the visual patterns shared among the instances in the same category. We then design a spatial context learning module for modeling the internal structures of the object, through predicting the relative positions within the extent. These two modules can be easily plugged into any backbone networks during training and detached at inference time. Extensive experiments show that our look-into-object approach (LIO) achieves large performance gain on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft). We also show that this learning paradigm is highly generalizable to other tasks such as object detection and segmentation (MS COCO). Project page: //github.com/JDAI-CV/LIO.

目標檢測 · 數據集 · 學成 · 數據驅動的方法 · 多樣性 ·

2019 年 9 月 22 日

Object Detection in Optical Remote Sensing Images: A Survey and A New Benchmark

Ke Li,Gang Wan,Gong Cheng,Liqiu Meng,Junwei Han

Substantial efforts have been devoted more recently to presenting various methods for object detection in optical remote sensing images. However, the current survey of datasets and deep learning based methods for object detection in optical remote sensing images is not adequate. Moreover, most of the existing datasets have some shortcomings, for example, the numbers of images and object categories are small scale, and the image diversity and variations are insufficient. These limitations greatly affect the development of deep learning based object detection methods. In the paper, we provide a comprehensive review of the recent deep learning based object detection progress in both the computer vision and earth observation communities. Then, we propose a large-scale, publicly available benchmark for object DetectIon in Optical Remote sensing images, which we name as DIOR. The dataset contains 23463 images and 192472 instances, covering 20 object classes. The proposed DIOR dataset 1) is large-scale on the object categories, on the object instance number, and on the total image number; 2) has a large range of object size variations, not only in terms of spatial resolutions, but also in the aspect of inter- and intra-class size variability across objects; 3) holds big variations as the images are obtained with different imaging conditions, weathers, seasons, and image quality; and 4) has high inter-class similarity and intra-class diversity. The proposed benchmark can help the researchers to develop and validate their data-driven methods. Finally, we evaluate several state-of-the-art approaches on our DIOR dataset to establish a baseline for future research.

屬性空間 · 多樣性 · Pair · MoDELS · 訓練數據 ·

2018 年 8 月 2 日

Diverse Image-to-Image Translation via Disentangled Representations

Hsin-Ying Lee,Hung-Yu Tseng,Jia-Bin Huang,Maneesh Kumar Singh,Ming-Hsuan Yang

from arxiv, ECCV 2018 (Oral). Project page: //vllab.ucmerced.edu/hylee/DRIT/ Code: //github.com/HsinYingLee/DRIT/

Image-to-image translation aims to learn the mapping between two visual domains. There are two main challenges for many applications: 1) the lack of aligned training pairs and 2) multiple possible outputs from a single input image. In this work, we present an approach based on disentangled representation for producing diverse outputs without paired training images. To achieve diversity, we propose to embed images onto two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space. Our model takes the encoded content features extracted from a given input and the attribute vectors sampled from the attribute space to produce diverse outputs at test time. To handle unpaired training data, we introduce a novel cross-cycle consistency loss based on disentangled representations. Qualitative results show that our model can generate diverse and realistic images on a wide range of tasks without paired training data. For quantitative comparisons, we measure realism with user study and diversity with a perceptual distance metric. We apply the proposed model to domain adaptation and show competitive performance when compared to the state-of-the-art on the MNIST-M and the LineMod datasets.

事件抽取 · 學成 · 逆強化學習 · GAN · 估計/估計量 ·

2018 年 4 月 21 日

Event Extraction with Generative Adversarial Imitation Learning

Tongtao Zhang,Heng Ji

We propose a new method for event extraction (EE) task based on an imitation learning framework, specifically, inverse reinforcement learning (IRL) via generative adversarial network (GAN). The GAN estimates proper rewards according to the difference between the actions committed by the expert (or ground truth) and the agent among complicated states in the environment. EE task benefits from these dynamic rewards because instances and labels yield to various extents of difficulty and the gains are expected to be diverse -- e.g., an ambiguous but correctly detected trigger or argument should receive high gains -- while the traditional RL models usually neglect such differences and pay equal attention on all instances. Moreover, our experiments also demonstrate that the proposed framework outperforms state-of-the-art methods, without explicit feature engineering.