
Saliency detection methods are central to several real-world applications such as robot navigation and satellite imagery. However, the performance of existing methods deteriorates under low-light conditions because training datasets mostly comprise well-lit images. One possible solution is to collect a new dataset for low-light conditions, but this requires pixel-level annotation, which is not only tedious and time-consuming but also infeasible when a huge training corpus is required. We propose a technique that performs classical band-pass filtering in the Fourier space to transform well-lit images into low-light images and uses them as a proxy for real low-light images. Unlike popular deep learning approaches, which require learning thousands of parameters and enormous amounts of training data, the proposed transformation is fast, simple, and easy to extend to other tasks such as low-light depth estimation. Our experiments show that state-of-the-art saliency detection and depth estimation networks trained on our proxy low-light images perform significantly better on real low-light images than networks trained using existing strategies.
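
As a minimal sketch of the idea (not the authors' exact implementation), the snippet below band-pass filters a well-lit image in the Fourier domain and attenuates the result to produce a low-light proxy; the cutoff radii and attenuation factor are illustrative assumptions.

```python
import numpy as np

def bandpass_lowlight_proxy(image, r_low=5, r_high=60, attenuation=0.3):
    """Synthesize a low-light proxy by band-pass filtering in the Fourier domain.
    `r_low`, `r_high`, and `attenuation` are illustrative values, not the paper's."""
    h, w = image.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    # Band-pass mask: keep mid frequencies, suppress very low and very high ones.
    mask = ((dist >= r_low) & (dist <= r_high)).astype(np.float32)
    proxy = np.zeros_like(image, dtype=np.float32)
    for c in range(image.shape[2]):
        spectrum = np.fft.fftshift(np.fft.fft2(image[:, :, c]))
        filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
        # Global attenuation mimics reduced illumination.
        proxy[:, :, c] = np.abs(filtered) * attenuation
    return np.clip(proxy, 0, 255).astype(np.uint8)
```

Once transformed, such proxy images stand in for real low-light data when training the saliency or depth network.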

Related Content

Face recognition in complex scenes suffers from severe challenges arising from perturbations such as pose deformation, poor illumination, and partial occlusion. Some methods use depth estimation to obtain the depth corresponding to RGB in order to improve face recognition accuracy. However, the depth they generate suffers from image blur, which introduces noise into subsequent RGB-D face recognition tasks. In addition, existing RGB-D face recognition methods cannot fully extract complementary features. In this paper, we propose a fine-grained facial depth generation network and an improved multimodal complementary feature learning network. Extensive experiments on the Lock3DFace dataset and the IIIT-D dataset show that the proposed FFDGNet and IMCFLNet improve the accuracy of RGB-D face recognition while achieving state-of-the-art performance.

Deepfake techniques have been widely used for malicious purposes, prompting extensive research interest in developing Deepfake detection methods. Deepfake manipulations typically tamper with facial parts, which can introduce inconsistencies across different parts of the face. For instance, a Deepfake technique may change a smiling mouth into an upset one while the eyes keep smiling. Existing detection methods depend on specific indicators of forgery, which tend to disappear as forgery patterns improve. To address this limitation, we propose Mover, a new Deepfake detection model that exploits unspecific facial part inconsistencies, an inevitable weakness of Deepfake videos. Mover randomly masks regions of interest (ROIs) and recovers faces to learn unspecific features, which makes fake faces difficult to recover while real faces can be recovered easily. Specifically, given a real face image, we first pretrain a masked autoencoder to learn facial part consistency by dividing the face into three parts and randomly masking ROIs, which are then recovered from the unmasked facial parts. Furthermore, to maximize the discrepancy between real and fake videos, we propose a novel model with dual networks that utilize the pretrained encoder and the masked autoencoder, respectively: 1) the pretrained encoder is fine-tuned to capture the encoding of inconsistent information in the given video, and 2) the pretrained masked autoencoder is used to map faces and distinguish real from fake videos. Our extensive experiments on standard benchmarks demonstrate that Mover is highly effective.
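
The snippet below is a minimal sketch of the masking pretext task described above: an aligned face is split into three parts and random patches inside each part are zeroed out before the autoencoder tries to recover them. The part boxes, patch size, and masking ratio are illustrative assumptions rather than Mover's exact configuration.

```python
import numpy as np

def mask_facial_rois(face, parts, mask_ratio=0.5, patch=16):
    """Randomly mask patches inside given facial parts (MAE-style pretext task).
    `parts` is a list of (y0, y1, x0, x1) boxes; layout and ratio are illustrative."""
    masked = face.copy()
    for (y0, y1, x0, x1) in parts:
        for y in range(y0, y1, patch):
            for x in range(x0, x1, patch):
                if np.random.rand() < mask_ratio:
                    masked[y:y + patch, x:x + patch] = 0  # zero out this patch
    return masked

# Example: split a 224x224 aligned face into three horizontal bands
# (roughly eyes / nose / mouth) and mask half of the patches in each.
parts = [(0, 80, 0, 224), (80, 152, 0, 224), (152, 224, 0, 224)]
face = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
masked = mask_facial_rois(face, parts)
```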

Transformers are among the most actively studied methods today, and how to apply them to 3D vision is an important research direction. Researchers from MBZUAI in the UAE have released a survey on Transformers for 3D vision, presenting a systematic and comprehensive review of more than 100 Transformer methods across different 3D vision tasks, including classification, segmentation, detection, completion, pose estimation, and others.

The success of the Transformer architecture in natural language processing has attracted attention in the computer vision community. Because Transformers can learn long-range dependencies, they have been used as a replacement for the widely used convolution operator. This replacement has proven successful in many tasks, where several state-of-the-art methods rely on Transformers for better learning. Within computer vision, the 3D field has also increasingly adopted Transformers alongside 3D convolutional neural networks and multi-layer perceptron networks. While many surveys focus on Transformers in vision, 3D vision differs from 2D vision in data representation and processing and therefore requires dedicated attention. In this work, we present a systematic and comprehensive review of more than 100 Transformer methods for different 3D vision tasks, including classification, segmentation, detection, completion, pose estimation, and others. We discuss the Transformer designs in 3D vision that allow data with various 3D representations to be processed. For each application, we highlight the key properties and contributions of the Transformer-based methods. To assess the competitiveness of these methods, we compare their performance against common non-Transformer methods on 12 3D benchmarks. Finally, we discuss different open directions and challenges for Transformers in 3D vision. In addition to the published papers, we aim to frequently update the latest relevant papers along with their corresponding implementations at: //github.com/lahoud/3d-vision-transformers.

A fundamental problem in computer vision is understanding scenes and objects in three-dimensional space. 3D understanding supports a compact representation of relationships and provides the ability to navigate and act in the real world. 3D vision plays an important role in many fields, with applications in autonomous driving, robotics, remote sensing, healthcare, augmented reality, the design industry, and many other areas. Interest in the 3D field continues to grow for several reasons: (1) the development of various 3D capture sensors, such as LiDAR and RGB-D sensors; (2) the introduction of numerous large-scale 3D geometric datasets collected and labeled in 3D; and (3) advances in 3D deep learning methods.

Common 3D deep learning methods employ deep convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs). Nevertheless, Transformer-based architectures built on attention mechanisms have demonstrated their effectiveness in various domains, such as natural language processing (NLP) and 2D image processing. While the convolution operation has a limited receptive field and the translation-equivariance property, the attention mechanism operates globally and can therefore encode long-range dependencies, allowing attention-based methods to learn richer feature representations. Witnessing the success of Transformer-based architectures in the image domain, many 3D vision methods have recently adopted Transformers in their model designs. These architectures have been proposed as solutions for the most common 3D vision applications. In 3D, Transformers have replaced or complemented previous learning methods, benefiting from their ability to capture long-range information and to learn task-specific inductive biases.

Given the growing interest in Transformers for 3D vision (Fig. 1, left), an overview of existing methods is essential for a comprehensive understanding of this emerging field. In this survey, we review methods that use Transformers for 3D vision tasks, including classification, segmentation, detection, completion, pose estimation, and others (Fig. 1, right). We highlight the Transformer design choices in 3D vision that allow data with various 3D representations to be processed. For each application, we discuss the key properties and contributions of the proposed Transformer-based methods. Finally, we compare their performance against other methods on widely used 3D datasets/benchmarks to assess the competitiveness of Transformer integration in this field.

We note that many surveys have studied deep learning methods in 3D vision. Among them, several published studies provide comprehensive reviews of methods for processing 3D data [1], [2], [3], [4]. Other studies focus on specific 3D vision applications such as segmentation [5], [6], [7], classification [8], or detection [9], [10]. In addition, some surveys examine 3D deep learning methods from the perspective of representation [11], [12], while others restrict their scope to specific input sensors [10], [13]. Since most of these surveys were published before the recent success of Transformer architectures, a dedicated review of Transformer-based architectures is still missing.

With the recent emergence of numerous vision methods that rely on attention mechanisms and Transformer architectures, many works studying these methods have appeared. Some of these works consider Transformers in vision in general [14], [15], [16], [17], [18], while others focus on specific aspects such as efficiency [19], or on specific applications such as video [20] or medical imaging [21]. Given the differences between 2D and 3D data representation and processing, a dedicated focus on Transformers applied to 3D vision applications is necessary. We therefore concentrate on the use of Transformer architectures in the 3D vision domain. This survey covers methods that use Transformer architectures with 3D input and/or output. 3D data can be acquired by many sensors, such as RGB-D sensors indoors, LiDAR outdoors, and specialized medical sensors. We include methods that take point clouds or dense 3D grids as input; in medical imaging, dense 3D grids can also be obtained by imaging different slices. In addition, we cover representative methods that apply Transformer architectures to other input data (such as multi-view images or bird's-eye views) and produce 3D output.

Fig. 1. Taxonomy of Transformer designs in 3D computer vision. We group the methods by underlying design differences related to the Transformer input, the context level, the combination with other learning methods (pure/hybrid), and scalability elements.

Attention blocks capture long-range dependencies, which helps learn context that is not fully exploited in convolution-based networks. These long-range dependencies play an important role in scene understanding, especially when local information is ambiguous. In addition, Transformers can operate on sets, which is the natural representation of point clouds. Unlike image representations, point clouds can come in varying lengths, making them similar to sentences composed of words. Given the success of Transformers in NLP, one would expect their integration into the 3D domain to follow a similar trend. Moreover, Transformers applied to 2D require positional information to be added to the feature information; in 3D, the position is readily available as the coordinates of the points in the point cloud. These properties of Transformers lay the foundation for using Transformer architectures in the 3D domain. However, there are many ways to integrate Transformers into 3D application pipelines, so we discuss the key characteristics of such integration in this section. Our discussion follows the taxonomy shown in Fig. 5.
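
As a minimal sketch of this property, the layer below uses the point coordinates themselves, lifted by a small projection, as the positional signal added to per-point features before global self-attention over the whole (variable-length) point set. The layer sizes and the additive positional projection are illustrative choices, not taken from any specific method in the survey.

```python
import torch
import torch.nn as nn

class PointSelfAttention(nn.Module):
    """Self-attention over an unordered point set, using xyz coordinates as the
    positional signal (illustrative design, not a specific surveyed method)."""
    def __init__(self, feat_dim=64, num_heads=4):
        super().__init__()
        self.pos_proj = nn.Linear(3, feat_dim)          # lift xyz into feature space
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)

    def forward(self, xyz, feats):
        # xyz: (B, N, 3) point coordinates; feats: (B, N, feat_dim) per-point features
        tokens = feats + self.pos_proj(xyz)             # coordinates act as positions
        out, _ = self.attn(tokens, tokens, tokens)      # global attention over all points
        return out

# A point cloud is a set, so N may differ between samples.
layer = PointSelfAttention()
out = layer(torch.rand(2, 1024, 3), torch.rand(2, 1024, 64))  # -> (2, 1024, 64)
```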

Integrating Transformers into 3D application pipelines has proven effective in many areas. Given their competitive performance on multiple datasets, Transformers have proven to be an adequate substitute for convolution and multi-layer perceptron operations, thanks to their ability to learn long-range dependencies. Nevertheless, a generic Transformer backbone for 3D processing is still missing. Unlike Transformer-based image processing methods [69], [112], on which many other approaches rely, most Transformer-based 3D methods use distinct Transformer designs and integrations. Developing a generic Transformer method that processes point clouds at both local and global scales and learns rich features would be highly valuable: the Transformer needs to learn fine-grained shape information while operating at the global scale of the scene in order to exploit scene context.


Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning

Abstract: Behavioral cloning is an effective method for learning policies from expert demonstrations. However, behavioral cloning often suffers from the causal confusion problem, in which the learned policy attends to an obvious effect of the expert action rather than its cause (the objects the expert policy attends to). To address this problem, this paper proposes an object-aware regularization method. The main idea is to encourage the learned policy to attend uniformly to all objects, so that it does not place all of its attention on nuisance variables that are strongly correlated with the expert actions. The method has two stages: (a) we extract semantic objects from images using the discrete codes of a vector-quantized variational autoencoder, and then (b) we randomly drop together the code components that share the same discrete value, i.e., mask out that semantic object. Experiments show that the proposed method significantly improves the performance of behavioral cloning and outperforms various other regularization and causality-based methods across Atari environments and the CARLA autonomous driving environment, even surpassing inverse reinforcement learning methods that can interact with the environment.
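
A minimal sketch of stage (b) is given below, assuming a pretrained VQ-VAE encoder has already produced a grid of discrete code indices for an observation; the drop probability and grid shape are illustrative, not the paper's settings.

```python
import numpy as np

def object_aware_mask(code_grid, drop_prob=0.5):
    """Randomly drop all spatial positions sharing the same discrete VQ code,
    masking out a whole semantic 'object' at once. `code_grid` is an (H, W)
    array of code indices from a pretrained VQ-VAE encoder (assumed input)."""
    mask = np.ones_like(code_grid, dtype=np.float32)
    for code in np.unique(code_grid):
        if np.random.rand() < drop_prob:
            mask[code_grid == code] = 0.0   # drop every position using this code
    return mask

# Usage: multiply the quantized latent by this mask before it reaches the policy,
# so the policy cannot rely on any single object-like code always being visible.
code_grid = np.random.randint(0, 8, size=(10, 10))
mask = object_aware_mask(code_grid)
```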

//www.zhuanzhi.ai/paper/53fb95a858607df85bb6d17b317fae15


This work addresses a novel and challenging problem: estimating the full 3D hand shape and pose from a single RGB image. Most current methods for 3D hand analysis from monocular RGB images focus only on estimating the 3D locations of hand keypoints, which cannot fully express the 3D shape of the hand. In contrast, we propose a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of the hand surface, which contains richer information about both 3D hand shape and pose. To train the networks with full supervision, we create a large-scale synthetic dataset containing both ground-truth 3D meshes and 3D poses. When fine-tuning the networks on real-world datasets without 3D ground truth, we propose a weakly-supervised approach that leverages the depth map as weak supervision during training. Through extensive evaluations on our proposed new datasets and two public datasets, we show that our proposed method can produce accurate and reasonable 3D hand meshes, and can achieve superior 3D hand pose estimation accuracy compared with state-of-the-art methods.
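
For illustration, the sketch below shows a generic graph-convolution layer over mesh vertices with a fixed, row-normalized adjacency matrix; it is a standard Graph CNN building block under assumed dimensions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph-convolution layer over mesh vertices: average neighbor features
    through a normalized adjacency matrix, then apply a shared linear map."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        # adj: (V, V) mesh adjacency with self-loops; row-normalize it once.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        self.register_buffer("adj_norm", adj / deg)
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        # x: (B, V, in_dim) per-vertex features
        x = torch.matmul(self.adj_norm, x)   # aggregate over mesh neighbors
        return torch.relu(self.linear(x))

# Stacking such layers can regress per-vertex 3D coordinates of the hand mesh.
V = 778                 # vertex count of a MANO-like hand mesh (assumption)
adj = torch.eye(V)      # placeholder adjacency; a real mesh topology would be used
layer = GraphConv(64, 3, adj)
verts = layer(torch.rand(2, V, 64))  # -> (2, V, 3)
```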

Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch leads to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc., and 2) the instance-level shift, such as object appearance, size, etc. We build our approach on the recent state-of-the-art Faster R-CNN model and design two domain adaptation components, at the image level and the instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory and are implemented by learning a domain classifier in an adversarial training manner. The domain classifiers at the different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach on multiple datasets, including Cityscapes, KITTI, and SIM10K. The results demonstrate the effectiveness of our approach for robust object detection in various domain shift scenarios.
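
As a hedged sketch of the adversarial component, the snippet below implements a gradient reversal layer and an image-level domain classifier over backbone feature maps; the channel sizes and classifier head are illustrative, and an analogous classifier on ROI features would give the instance-level term.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips (and scales) gradients on the backward
    pass, so the feature extractor learns to fool the domain classifier."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class ImageDomainClassifier(nn.Module):
    """Image-level domain classifier on backbone features (illustrative sizes)."""
    def __init__(self, in_channels=512, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, 1), nn.ReLU(),
            nn.Conv2d(256, 1, 1),   # per-location source/target logit
        )

    def forward(self, feat):
        feat = GradReverse.apply(feat, self.lambd)
        return self.net(feat)

# The binary domain loss (source = 0, target = 1) on these logits is added to the
# detection loss; a consistency term can tie the image- and instance-level outputs.
```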
