
Estimation/Estimator · Similarity · Learning · 3D · Pairwise

July 22, 2022

Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects

Chen Zhao, Yinlin Hu, Mathieu Salzmann

from arXiv, Accepted by ECCV 2022

In this paper, we tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images. This task contrasts with the one considered by most existing deep learning methods which typically assume that the testing objects have been observed during training. To handle the unseen objects, we follow a retrieval-based strategy and prevent the network from learning object-specific features by computing multi-scale local similarities between the query image and synthetically-generated reference images. We then introduce an adaptive fusion module that robustly aggregates the local similarities into a global similarity score of pairwise images. Furthermore, we speed up the retrieval process by developing a fast retrieval strategy. Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works. Our code and pre-trained models are available at //sailor-z.github.io/projects/Unseen_Object_Pose.html.
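The retrieval idea in this abstract can be illustrated with a minimal sketch. It assumes local features are already extracted as plain vectors, handles a single scale only, and uses a softmax-weighted average as a stand-in for the learned adaptive fusion module, which the abstract does not specify. All function names and the `temperature` parameter are hypothetical and are not taken from the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def fuse_local_similarities(local_sims, temperature=0.1):
    """Softmax-weighted average of local similarities.

    Assumption: this fixed rule only stands in for the paper's learned
    adaptive fusion module.
    """
    peak = max(local_sims)
    weights = [math.exp((s - peak) / temperature) for s in local_sims]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, local_sims)) / total


def retrieve_orientation(query_feats, references):
    """Return (label, score) of the reference most similar to the query.

    query_feats: list of local feature vectors, one per image location.
    references: list of (orientation_label, local_feature_list) pairs
                laid out in the same order as query_feats.
    """
    best_label, best_score = None, -math.inf
    for label, ref_feats in references:
        local_sims = [cosine(q, r) for q, r in zip(query_feats, ref_feats)]
        score = fuse_local_similarities(local_sims)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy data: two locations, two candidate reference orientations.
query = [[1.0, 0.0], [0.0, 1.0]]
references = [
    ("rot_0", [[1.0, 0.0], [0.0, 1.0]]),
    ("rot_90", [[0.0, 1.0], [1.0, 0.0]]),
]
label, score = retrieve_orientation(query, references)
```

Because the score depends only on how well local patches match, and not on a class-specific output head, the same comparison can in principle be applied to objects that were never seen during training.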

Related content

Estimation/Estimator

Vectorization · Discretization · MaskGIT · Similarity · MoDELS

September 19, 2022

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation

Chuanxia Zheng, Long Tung Vuong, Jianfei Cai, Dinh Phung

Although two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images, their quantization operator encodes similar patches within an image into the same index, resulting in a repeated artifact for similar adjacent regions using existing decoder architectures. To address this issue, we propose to incorporate the spatially conditional normalization to modulate the quantized vectors so as to insert spatially variant information to the embedded index maps, encouraging the decoder to generate more photorealistic images. Moreover, we use multichannel quantization to increase the recombination capability of the discrete codes without increasing the cost of model and codebook. Additionally, to generate discrete tokens at the second stage, we adopt a Masked Generative Image Transformer (MaskGIT) to learn an underlying prior distribution in the compressed latent space, which is much faster than the conventional autoregressive model. Experiments on two benchmark datasets demonstrate that our proposed modulated VQGAN is able to greatly improve the reconstructed image quality as well as provide high-fidelity image generation.
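The quantization issue described in this abstract is easier to see with a small sketch of plain nearest-neighbour vector quantization. This is the generic operator, not the modulated or multichannel variant the paper proposes; the codebook and patch values are made up for illustration.

```python
def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (squared L2)."""
    indices = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


# Hypothetical 2-D codebook and three patch embeddings.
codebook = [[0.0, 0.0], [1.0, 1.0]]
patches = [[0.1, 0.0], [0.05, 0.1], [0.9, 1.0]]
indices = quantize(patches, codebook)
```

The first two patches differ slightly but receive the same index, so a decoder that sees only the index map cannot tell them apart. This is the kind of lost spatial variation that the abstract's spatially conditional normalization is meant to reintroduce.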

Learning · Performer · Estimation/Estimator · Processing (programming language) · Mean absolute error

September 19, 2022

Estimating Brain Age with Global and Local Dependencies

Yanwu Yang, Xutao Guo, Zhikai Chang, Chenfei Ye, Yang Xiang, Haiyan Lv, Ting Ma

Brain age has been shown to be a phenotype relevant to cognitive performance and brain disease. Achieving accurate brain age prediction is an essential prerequisite for optimizing the predicted brain-age difference as a biomarker. As a comprehensive biological characteristic, brain age is hard to capture accurately with models that rely on feature engineering and local processing, such as local convolution and recurrent operations that process one local neighborhood at a time. In contrast, Vision Transformers learn global attentive interactions among patch tokens, introducing less inductive bias and modeling long-range dependencies. Motivated by this, we propose a novel network for brain age estimation that learns both global and local dependencies, where the corresponding representations are captured by Successive Permuted Transformer (SPT) and convolution blocks. The SPT brings computational efficiency and locates 3D spatial information indirectly by continuously encoding 2D slices from different views. Finally, we collect a large cohort of 22645 subjects with ages ranging from 14 to 97, and our network performs the best among a series of deep learning methods, yielding a mean absolute error (MAE) of 2.855 on the validation set and 2.911 on an independent test set.
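For readers unfamiliar with the metric quoted here, the following sketch shows how a mean absolute error is computed. The ages are invented for illustration and have no connection to the paper's cohort or results.

```python
def mean_absolute_error(predicted, actual):
    """Average of |prediction - target| over all subjects."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical predicted and chronological ages, in years.
predicted_ages = [30.0, 52.0, 71.0]
true_ages = [28.0, 55.0, 70.0]
mae = mean_absolute_error(predicted_ages, true_ages)  # (2 + 3 + 1) / 3 = 2.0
```

An MAE of about 2.9, as reported in the abstract, therefore means that predictions are off by roughly 2.9 years on average, assuming ages are measured in years.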

Object recognition · 3D · INFORMS · state-of-the-art · Normalized

September 19, 2022

TANDEM3D: Active Tactile Exploration for 3D Object Recognition

Jingxi Xu, Han Lin, Shuran Song, Matei Ciocarlie

Tactile recognition of 3D objects remains a challenging task. Compared to 2D shapes, the complex geometry of 3D surfaces requires richer tactile signals, more dexterous actions, and more advanced encoding techniques. In this work, we propose TANDEM3D, a method that applies a co-training framework for exploration and decision making to 3D object recognition with tactile signals. Starting with our previous work, which introduced a co-training paradigm for 2D recognition problems, we introduce a number of advances that enable us to scale up to 3D. TANDEM3D is based on a novel encoder that builds 3D object representation from contact positions and normals using PointNet++. Furthermore, by enabling 6DOF movement, TANDEM3D explores and collects discriminative touch information with high efficiency. Our method is trained entirely in simulation and validated with real-world experiments. Compared to state-of-the-art baselines, TANDEM3D achieves higher accuracy and a lower number of actions in recognizing 3D objects and is also shown to be more robust to different types and amounts of sensor noise. Video is available at //jxu.ai/tandem3d.

Estimation/Estimator · Learning · Dataset · Performer · TOOLS

September 16, 2022

Imitrob: Imitation Learning Dataset for Training and Evaluating 6D Object Pose Estimators

Jiri Sedlar, Karla Stepanova, Radoslav Skoviera, Jan Kristof Behrens, Gabriela Sejnova, Josef Sivic, Robert Babuska

This paper introduces a dataset for training and evaluating methods for 6D pose estimation of hand-held tools in task demonstrations captured by a standard RGB camera. Despite the significant progress of 6D pose estimation methods, their performance is usually limited for heavily occluded objects, which is a common case in imitation learning where the object is typically partially occluded by the manipulating hand. Currently, there is a lack of datasets that would enable the development of robust 6D pose estimation methods for these conditions. To overcome this problem, we collect a new dataset (Imitrob) aimed at 6D pose estimation in imitation learning and other applications where a human holds a tool and performs a task. The dataset contains image sequences of three different tools and six manipulation tasks with two camera viewpoints, four human subjects, and left/right hand. Each image is accompanied by an accurate ground truth measurement of the 6D object pose, obtained by the HTC Vive motion tracking device. The use of the dataset is demonstrated by training and evaluating a recent 6D object pose estimation method (DOPE) in various setups. The dataset and code are publicly available at //imitrob.ciirc.cvut.cz/imitrobdataset.php.

Similarity · Graph · Neural Networks · Similarity measure · GPU

December 29, 2020

Image-to-Image Retrieval by Learning Similarity between Scene Graphs

Sangwoong Yoon, Woo Young Kang, Sungwook Jeon, SeongEun Lee, Changjin Han, Jonghun Park, Eun-Sol Kim

from arXiv, Accepted to AAAI 2021

As a scene graph compactly summarizes the high-level content of an image in a structured and symbolic manner, the similarity between scene graphs of two images reflects the relevance of their contents. Based on this idea, we propose a novel approach for image-to-image retrieval using scene graph similarity measured by graph neural networks. In our approach, graph neural networks are trained to predict the proxy image relevance measure, computed from human-annotated captions using a pre-trained sentence similarity model. We collect and publish the dataset for image relevance measured by human annotators to evaluate retrieval algorithms. The collected dataset shows that our method agrees better with the human perception of image similarity than other competitive baselines.

Performer · Transform · Object detection · Unsupervised · Faster R-CNN

November 18, 2020

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Zhigang Dai, Bolun Cai, Yugeng Lin, Junying Chen

Object detection with transformers (DETR) reaches competitive performance with Faster R-CNN via a transformer encoder-decoder architecture. Inspired by the great success of pre-training transformers in natural language processing, we propose a pretext task named random query patch detection to pre-train DETR without supervision (UP-DETR) for object detection. Specifically, we randomly crop patches from the given image and then feed them as queries to the decoder. The model is pre-trained to detect these query patches from the original image. During the pre-training, we address two critical issues: multi-task learning and multi-query localization. (1) To trade off multi-task learning of classification and localization in the pretext task, we freeze the CNN backbone and propose a patch feature reconstruction branch which is jointly optimized with patch detection. (2) To perform multi-query localization, we introduce UP-DETR with a single query patch and extend it to multiple query patches with object query shuffle and attention mask. In our experiments, UP-DETR significantly boosts the performance of DETR with faster convergence and higher precision on PASCAL VOC and COCO datasets. The code will be available soon.
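The pretext task rests on the fact that a randomly cropped patch comes with a free localization label: the crop box itself. A minimal sketch of that sampling step follows, assuming boxes in `(x, y, w, h)` pixel format and a hypothetical `min_size` parameter; the actual sampling scheme used by UP-DETR is not described in the abstract.

```python
import random


def random_query_patches(width, height, num_patches, min_size=16, rng=None):
    """Sample random crop boxes (x, y, w, h) that lie fully inside the image.

    Each box plays two roles: it defines the query patch to crop, and it is
    the ground-truth location the detector is asked to recover.
    """
    if min_size > width or min_size > height:
        raise ValueError("min_size must not exceed the image dimensions")
    rng = rng or random.Random()
    boxes = []
    for _ in range(num_patches):
        w = rng.randint(min_size, width)
        h = rng.randint(min_size, height)
        x = rng.randint(0, width - w)
        y = rng.randint(0, height - h)
        boxes.append((x, y, w, h))
    return boxes


# Example: ten query patches from a 640x480 image, seeded for repeatability.
boxes = random_query_patches(640, 480, num_patches=10, rng=random.Random(0))
```

No human annotation is involved at any point, which is what makes the pre-training unsupervised.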

Extensibility · Learned · SSL · Object detection · 3D

March 20, 2020

FocalMix: Semi-Supervised Learning for 3D Medical Image Detection

Dong Wang, Yuan Zhang, Kexin Zhang, Liwei Wang

from arXiv, Accepted by CVPR 2020

Applying artificial intelligence techniques in medical imaging is one of the most promising areas in medicine. However, most of the recent success in this area highly relies on large amounts of carefully annotated data, whereas annotating medical images is a costly process. In this paper, we propose a novel method, called FocalMix, which, to the best of our knowledge, is the first to leverage recent advances in semi-supervised learning (SSL) for 3D medical image detection. We conducted extensive experiments on two widely used datasets for lung nodule detection, LUNA16 and NLST. Results show that our proposed SSL methods can achieve a substantial improvement of up to 17.3% over state-of-the-art supervised learning approaches with 400 unlabeled CT scans.

Understandability · Bounding box · RGB-D · Mutually independent · 3D

February 27, 2020

Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

Yinyu Nie, Xiaoguang Han, Shihui Guo, Yujian Zheng, Jian Chang, Jian Jun Zhang

from arXiv, Accepted by CVPR 2020

Semantic reconstruction of indoor scenes refers to both scene understanding and object reconstruction. Existing works either address one part of this problem or focus on independent objects. In this paper, we bridge the gap between understanding and reconstruction, and propose an end-to-end solution to jointly reconstruct room layout, object bounding boxes and meshes from a single image. Instead of separately resolving scene understanding and object reconstruction, our method builds upon a holistic scene context and proposes a coarse-to-fine hierarchy with three components: 1. room layout with camera pose; 2. 3D object bounding boxes; 3. object meshes. We argue that understanding the context of each component can assist the task of parsing the others, which enables joint understanding and reconstruction. The experiments on the SUN RGB-D and Pix3D datasets demonstrate that our method consistently outperforms existing methods in indoor layout estimation, 3D object detection and mesh reconstruction.

Object detection · Fashion MNIST (dataset) · SimPLe · Vision · Training data

May 17, 2018

Zero-Shot Object Detection by Hybrid Region Embedding

Berkan Demirel, Ramazan Gokberk Cinbis, Nazli Ikizler-Cinbis

Object detection is considered one of the most challenging problems in computer vision, since it requires correct prediction of both classes and locations of objects in images. In this study, we define a more difficult scenario, namely zero-shot object detection (ZSD), where no visual training data is available for some of the target object classes. We present a novel approach to tackle this ZSD problem, where a convex combination of embeddings is used in conjunction with a detection framework. For evaluation of ZSD methods, we propose a simple dataset constructed from Fashion-MNIST images and also a custom zero-shot split for the Pascal VOC detection challenge. The experimental results suggest that our method yields promising results for ZSD.
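A convex combination of embeddings can be sketched as follows: a region's scores over seen classes are normalized into non-negative weights that sum to one, the seen-class embeddings are averaged with those weights, and the result is matched against unseen-class embeddings. This is one common way to realize the idea and is only an assumption about how it might be used; the abstract does not give the paper's exact formulation, and all embeddings and class names below are invented.

```python
import math


def convex_combination(weights, embeddings):
    """Weighted average of embeddings, with weights normalized to sum to 1."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    dim = len(embeddings[0])
    return [
        sum((w / total) * e[i] for w, e in zip(weights, embeddings))
        for i in range(dim)
    ]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def nearest_class(vector, class_embeddings):
    """Name of the class whose embedding is most cosine-similar to vector."""
    return max(class_embeddings, key=lambda name: cosine(vector, class_embeddings[name]))


# Hypothetical 2-D embeddings for two seen classes and two unseen classes.
seen_embeddings = [[1.0, 0.0], [0.0, 1.0]]
seen_scores = [0.75, 0.25]  # detector confidence for each seen class
unseen = {"boot": [1.0, 0.2], "bag": [0.1, 1.0]}

region_embedding = convex_combination(seen_scores, seen_embeddings)
prediction = nearest_class(region_embedding, unseen)
```

Since the weights are non-negative and sum to one, the region embedding always lies inside the convex hull of the seen-class embeddings, which keeps it in the same space as the unseen-class embeddings it is compared against.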

Object detection · Vision · Earth · Dataset · state-of-the-art

January 27, 2018

DOTA: A Large-scale Dataset for Object Detection in Aerial Images

Gui-Song Xia, Xiang Bai, Jian Ding, Zhen Zhu, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang

Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to reach aerial imagery, not only because of the huge variation in the scale, orientation and shape of the object instances on the earth's surface, but also due to the scarcity of well-annotated datasets of objects in aerial scenes. To advance object detection research in Earth Vision, also known as Earth Observation and Remote Sensing, we introduce a large-scale Dataset for Object deTection in Aerial images (DOTA). To this end, we collect 2806 aerial images from different sensors and platforms. Each image is about 4000-by-4000 pixels in size and contains objects exhibiting a wide variety of scales, orientations, and shapes. These DOTA images are then annotated by experts in aerial image interpretation using 15 common object categories. The fully annotated DOTA images contain 188,282 instances, each of which is labeled by an arbitrary (8 d.o.f.) quadrilateral. To build a baseline for object detection in Earth Vision, we evaluate state-of-the-art object detection algorithms on DOTA. Experiments demonstrate that DOTA well represents real Earth Vision applications and is quite challenging.
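An 8 d.o.f. quadrilateral is four vertices with two coordinates each. The sketch below, which is illustrative and not part of the DOTA toolkit, computes the area of such an annotation with the shoelace formula and compares it with the axis-aligned box that encloses it. The vertex values are made up.

```python
def quad_area(points):
    """Area of a simple polygon given as ordered (x, y) vertices (shoelace formula)."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def enclosing_box(points):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of the vertices."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


# A square object rotated by 45 degrees, annotated by its four corners.
rotated = [(2, 0), (4, 2), (2, 4), (0, 2)]
area = quad_area(rotated)                      # 8.0
x0, y0, x1, y1 = enclosing_box(rotated)        # (0, 0, 4, 4)
box_area = (x1 - x0) * (y1 - y0)               # 16
```

In this toy case the axis-aligned box covers twice the area of the object, which illustrates why quadrilateral labels can describe arbitrarily oriented objects more tightly than axis-aligned boxes.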


