
Neural radiance fields (NeRF) have achieved impressive results in high-quality 3D scene reconstruction. However, NeRF relies heavily on precise camera poses. While recent works such as BARF have introduced camera pose optimization within NeRF, their applicability is limited to scenes with simple trajectories; existing methods struggle with complex trajectories involving large rotations. To address this limitation, we propose CT-NeRF, an incremental reconstruction and optimization pipeline that uses only RGB images, with no pose or depth input. In this pipeline, we first propose a local-global bundle adjustment under a pose graph connecting neighboring frames, enforcing consistency between poses to escape the local minima that arise when only consistency between poses and the scene structure is enforced. We then instantiate this pose consistency as a reprojected geometric image distance constraint derived from pixel-level correspondences between input image pairs. Through incremental reconstruction, CT-NeRF recovers both camera poses and scene structure and can handle scenes with complex trajectories. We evaluate CT-NeRF on two real-world datasets featuring complex trajectories, NeRFBuster and Free-Dataset. Results show that CT-NeRF outperforms existing methods in both novel view synthesis and pose estimation accuracy.
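As a rough sketch of what such a reprojected geometric image distance constraint can look like (generic multi-view geometry under assumed inputs and illustrative names, not the authors' implementation): a matched pixel in frame i is back-projected with NeRF-rendered depth, transformed by the estimated relative pose, reprojected into frame j, and its 2D distance to the matched pixel is penalized.

```python
# Hedged sketch: a reprojection-distance constraint for an image pair.
# Assumed inputs (illustrative names): intrinsics K, NeRF-rendered depth at
# the matched pixels of frame i, relative pose T_ji, and correspondences
# from an off-the-shelf matcher.
import torch

def reprojection_loss(uv_i, uv_j, depth_i, K, T_ji):
    """uv_i, uv_j: (N, 2) matched pixels; depth_i: (N,) rendered depth;
    K: (3, 3) intrinsics; T_ji: (4, 4) pose mapping frame i into frame j."""
    N = uv_i.shape[0]
    ones = torch.ones(N, 1, dtype=uv_i.dtype, device=uv_i.device)
    # Back-project pixels of frame i to 3D points in camera-i coordinates.
    pix_h = torch.cat([uv_i, ones], dim=1)                    # (N, 3)
    pts_i = (torch.linalg.inv(K) @ pix_h.T).T * depth_i[:, None]
    # Transform into frame j and project onto its image plane.
    pts_h = torch.cat([pts_i, ones], dim=1)                   # (N, 4)
    pts_j = (T_ji @ pts_h.T).T[:, :3]
    proj = (K @ pts_j.T).T
    uv_proj = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    # Geometric image distance to the matched pixels in frame j.
    return (uv_proj - uv_j).norm(dim=1).mean()
```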

Related content

Open X-Embodiment Collaboration,Abby O'Neill,Abdul Rehman,Abhinav Gupta,Abhiram Maddukuri,Abhishek Gupta,Abhishek Padalkar,Abraham Lee,Acorn Pooley,Agrim Gupta,Ajay Mandlekar,Ajinkya Jain,Albert Tung,Alex Bewley,Alex Herzog,Alex Irpan,Alexander Khazatsky,Anant Rai,Anchit Gupta,Andrew Wang,Andrey Kolobov,Anikait Singh,Animesh Garg,Aniruddha Kembhavi,Annie Xie,Anthony Brohan,Antonin Raffin,Archit Sharma,Arefeh Yavary,Arhan Jain,Ashwin Balakrishna,Ayzaan Wahid,Ben Burgess-Limerick,Beomjoon Kim,Bernhard Schölkopf,Blake Wulfe,Brian Ichter,Cewu Lu,Charles Xu,Charlotte Le,Chelsea Finn,Chen Wang,Chenfeng Xu,Cheng Chi,Chenguang Huang,Christine Chan,Christopher Agia,Chuer Pan,Chuyuan Fu,Coline Devin,Danfei Xu,Daniel Morton,Danny Driess,Daphne Chen,Deepak Pathak,Dhruv Shah,Dieter Büchler,Dinesh Jayaraman,Dmitry Kalashnikov,Dorsa Sadigh,Edward Johns,Ethan Foster,Fangchen Liu,Federico Ceola,Fei Xia,Feiyu Zhao,Felipe Vieira Frujeri,Freek Stulp,Gaoyue Zhou,Gaurav S. Sukhatme,Gautam Salhotra,Ge Yan,Gilbert Feng,Giulio Schiavi,Glen Berseth,Gregory Kahn,Guangwen Yang,Guanzhi Wang,Hao Su,Hao-Shu Fang,Haochen Shi,Henghui Bao,Heni Ben Amor,Henrik I Christensen,Hiroki Furuta,Homanga Bharadhwaj,Homer Walke,Hongjie Fang,Huy Ha,Igor Mordatch,Ilija Radosavovic,Isabel Leal,Jacky Liang,Jad Abou-Chakra,Jaehyung Kim,Jaimyn Drake,Jan Peters,Jan Schneider,Jasmine Hsu,Jay Vakil,Jeannette Bohg,Jeffrey Bingham,Jeffrey Wu,Jensen Gao,Jiaheng Hu,Jiajun Wu,Jialin Wu,Jiankai Sun,Jianlan Luo,Jiayuan Gu,Jie Tan,Jihoon Oh,Jimmy Wu,Jingpei Lu,Jingyun Yang,Jitendra Malik,João Silvério,Joey Hejna,Jonathan Booher,Jonathan Tompson,Jonathan Yang,Jordi Salvador,Joseph J. Lim,Junhyek Han,Kaiyuan Wang,Kanishka Rao,Karl Pertsch,Karol Hausman,Keegan Go,Keerthana Gopalakrishnan,Ken Goldberg,Kendra Byrne,Kenneth Oslund,Kento Kawaharazuka,Kevin Black,Kevin Lin,Kevin Zhang,Kiana Ehsani,Kiran Lekkala,Kirsty Ellis,Krishan Rana,Krishnan Srinivasan,Kuan Fang,Kunal Pratap Singh,Kuo-Hao Zeng,Kyle Hatch,Kyle Hsu,Laurent Itti,Lawrence Yunliang Chen,Lerrel Pinto,Li Fei-Fei,Liam Tan,Linxi "Jim" Fan,Lionel Ott,Lisa Lee,Luca Weihs,Magnum Chen,Marion Lepert,Marius Memmel,Masayoshi Tomizuka,Masha Itkina,Mateo Guaman Castro,Max Spero,Maximilian Du,Michael Ahn,Michael C. Yip,Mingtong Zhang,Mingyu Ding,Minho Heo,Mohan Kumar Srirama,Mohit Sharma,Moo Jin Kim,Naoaki Kanazawa,Nicklas Hansen,Nicolas Heess,Nikhil J Joshi,Niko Suenderhauf,Ning Liu,Norman Di Palo,Nur Muhammad Mahi Shafiullah,Oier Mees,Oliver Kroemer,Osbert Bastani,Pannag R Sanketi,Patrick "Tree" Miller,Patrick Yin,Paul Wohlhart,Peng Xu,Peter David Fagan,Peter Mitrano,Pierre Sermanet,Pieter Abbeel,Priya Sundaresan,Qiuyu Chen,Quan Vuong,Rafael Rafailov,Ran Tian,Ria Doshi,Roberto Martín-Martín,Rohan Baijal,Rosario Scalise,Rose Hendrix,Roy Lin,Runjia Qian,Ruohan Zhang,Russell Mendonca,Rutav Shah,Ryan Hoque,Ryan Julian,Samuel Bustamante,Sean Kirmani,Sergey Levine,Shan Lin,Sherry Moore,Shikhar Bahl,Shivin Dass,Shubham Sonawani,Shubham Tulsiani,Shuran Song,Sichun Xu,Siddhant Haldar,Siddharth Karamcheti,Simeon Adebola,Simon Guist,Soroush Nasiriany,Stefan Schaal,Stefan Welker,Stephen Tian,Subramanian Ramamoorthy,Sudeep Dasari,Suneel Belkhale,Sungjae Park,Suraj Nair,Suvir Mirchandani,Takayuki Osa,Tanmay Gupta,Tatsuya Harada,Tatsuya Matsushima,Ted Xiao,Thomas Kollar,Tianhe Yu,Tianli Ding,Todor Davchev,Tony Z. Zhao,Travis Armstrong,Trevor Darrell,Trinity Chung,Vidhi Jain,Vikash Kumar,Vincent Vanhoucke,Wei Zhan,Wenxuan Zhou,Wolfram Burgard,Xi Chen,Xiangyu Chen,Xiaolong Wang,Xinghao Zhu,Xinyang Geng,Xiyuan Liu,Xu Liangwei,Xuanlin Li,Yansong Pang,Yao Lu,Yecheng Jason Ma,Yejin Kim,Yevgen Chebotar,Yifan Zhou,Yifeng Zhu,Yilin Wu,Ying Xu,Yixuan Wang,Yonatan Bisk,Yongqiang Dou,Yoonyoung Cho,Youngwoon Lee,Yuchen Cui,Yue Cao,Yueh-Hua Wu,Yujin Tang,Yuke Zhu,Yunchu Zhang,Yunfan Jiang,Yunshuang Li,Yunzhu Li,Yusuke Iwasawa,Yutaka Matsuo,Zehan Ma,Zhuo Xu,Zichen Jeff Cui,Zichen Zhang,Zipeng Fu,Zipeng Lin

Large, high-capacity models trained on diverse datasets have shown remarkable success in efficiently tackling downstream applications. In domains from NLP to computer vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots, collected through a collaboration between 21 institutions and demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website //robotics-transformer-x.github.io.

Scaling machine learning models significantly improves their performance. However, such gains come at the cost of inference being slow and resource-intensive. Early-exit neural networks (EENNs) offer a promising solution: they accelerate inference by allowing intermediate layers to exit and produce a prediction early. Yet a fundamental issue with EENNs is how to determine when to exit without severely degrading performance. In other words, when is it 'safe' for an EENN to go 'fast'? To address this issue, we investigate how to adapt frameworks of risk control to EENNs. Risk control offers a distribution-free, post-hoc solution that tunes the EENN's exiting mechanism so that exits only occur when the output is of sufficient quality. We empirically validate our insights on a range of vision and language tasks, demonstrating that risk control can produce substantial computational savings, all the while preserving user-specified performance goals.
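To make the risk-control idea concrete, here is a minimal sketch of post-hoc threshold calibration for an early-exit network, assuming a held-out calibration set; the names are ours, and the paper's actual procedure (for example, the concentration bound used) may differ.

```python
# Minimal sketch: pick the smallest exit-confidence threshold whose
# empirical risk on a calibration set stays below a user-specified level.
# A formal distribution-free guarantee would replace the plain empirical
# mean with a concentration bound (e.g. Hoeffding), as in risk-control
# frameworks such as Learn-then-Test.
import numpy as np

def calibrate_exit_threshold(conf, loss_early, loss_full, alpha, grid):
    """conf: (n,) early-head confidences; loss_early / loss_full: (n,)
    losses of the early head and the full model; alpha: tolerated risk."""
    for lam in sorted(grid):          # smaller lam => more (faster) exits
        exits = conf >= lam
        extra = np.where(exits, loss_early - loss_full, 0.0)
        if extra.mean() <= alpha:     # empirical risk check
            return lam
    return np.inf                     # no safe threshold: never exit early

# Usage: lam = calibrate_exit_threshold(conf, le, lf, alpha=0.02,
#                                       grid=np.linspace(0, 1, 101))
```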

This paper presents an efficient algorithm, named Centralized Searching and Decentralized Optimization (CSDO), to find feasible solutions for the large-scale Multi-Vehicle Trajectory Planning (MVTP) problem. Because the number of non-convex constraints grows intractably with the number of agents, exploring various homotopy classes, which imply different convex domains, is crucial for finding a feasible solution. However, existing methods struggle to explore homotopy classes efficiently because they couple this exploration with time-consuming, precise trajectory solving. CSDO addresses this limitation by separating the two into different levels and integrating an efficient Multi-Agent Path Finding (MAPF) algorithm to search homotopy classes. It first searches for a coarse initial guess with a large search step, identifying a specific homotopy class. A subsequent decentralized Quadratic Programming (QP) refinement processes this guess, resolving minor collisions efficiently. Experimental results demonstrate that CSDO outperforms existing MVTP algorithms in large-scale, high-density scenarios, achieving up to a 95% success rate in 50m $\times$ 50m random scenarios in around one second. Source code is released at //github.com/YangSVM/CSDOTrajectoryPlanning.
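To illustrate the two-level structure (a toy sketch with our own simplifications, not the authors' code): the coarse MAPF search fixes a homotopy class, after which each vehicle can refine its own path independently. Below, the per-vehicle "QP" is reduced to unconstrained least-squares smoothing solved in closed form; the real refinement also handles collision and dynamics constraints.

```python
# Toy version of decentralized refinement after a coarse MAPF search.
import numpy as np

def refine_path(waypoints, w_smooth=10.0):
    """waypoints: (T, 2) coarse path for one vehicle. Minimize
    ||x - waypoints||^2 + w_smooth * ||D2 x||^2, where D2 is the
    second-difference operator (penalizing acceleration)."""
    T = waypoints.shape[0]
    D2 = np.zeros((T - 2, T))
    for t in range(T - 2):
        D2[t, t:t + 3] = [1.0, -2.0, 1.0]
    A = np.eye(T) + w_smooth * (D2.T @ D2)   # normal equations
    return np.linalg.solve(A, waypoints)     # solved per coordinate

# Decentralized step: each vehicle smooths its own coarse path independently.
coarse_paths = [np.cumsum(np.random.randn(20, 2), axis=0) for _ in range(3)]
smooth_paths = [refine_path(p) for p in coarse_paths]
```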

Expressive speech-to-speech translation (S2ST) is a key research topic in seamless communication, focusing on preserving both semantics and speaker vocal style in translated speech. Early works synthesized speaker-style-aligned speech in order to directly learn the mapping from source speech to the target speech spectrogram. To remove the reliance on style-aligned data, recent studies leverage advances in language modeling (LM) and build cascaded LMs over semantic and acoustic tokens. This work proposes SeamlessExpressiveLM, a single speech language model for expressive S2ST. We decompose the complex source-to-target speech mapping into intermediate generation steps with chain-of-thought prompting: the model is first guided to translate the target semantic content and then to transfer the speaker style to multi-stream acoustic units. Evaluated on Spanish-to-English and Hungarian-to-English translation, SeamlessExpressiveLM outperforms cascaded LMs in both semantic quality and style transfer, while also achieving better parameter efficiency.
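A schematic of the chain-of-thought generation order (hypothetical interfaces: `lm.generate`, the token lists, and the separator token are our assumptions, not the paper's API):

```python
# Sketch of the decoding order only: semantics first, then style.
def expressive_s2st(lm, src_semantic, src_acoustic, sep):
    """lm.generate(prefix, stop) -> list[int] is an assumed interface.
    src_semantic / src_acoustic: token ids from pretrained speech tokenizers."""
    # Step 1: translate semantic content, conditioned on source semantics,
    # so the meaning is fixed before any acoustic detail is generated.
    tgt_semantic = lm.generate(src_semantic + [sep], stop=sep)
    # Step 2: transfer speaker style, conditioned on the source acoustics
    # and the generated target semantics, emitting multi-stream acoustic
    # units that a unit vocoder can turn into a waveform.
    tgt_acoustic = lm.generate(src_acoustic + [sep] + tgt_semantic + [sep],
                               stop=sep)
    return tgt_semantic, tgt_acoustic
```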

In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability. Previous methods based on Score Distillation Sampling (SDS) can produce diversified 3D results by distilling 3D knowledge from large 2D diffusion models, but they usually suffer from long per-case optimization times and inconsistency issues. Recent works address the problem and generate better 3D results either by finetuning a multi-view diffusion model or by training a fast feed-forward model. However, they still lack intricate textures and complex geometries due to inconsistency and limited generated resolution. To simultaneously achieve high fidelity, consistency, and efficiency in single-image-to-3D, we propose Unique3D, a novel framework that includes a multi-view diffusion model with a corresponding normal diffusion model to generate multi-view images with their normal maps, a multi-level upscale process to progressively improve the resolution of the generated orthographic multi-views, and an instant and consistent mesh reconstruction algorithm, ISOMER, which fully integrates color and geometric priors into the mesh result. Extensive experiments demonstrate that Unique3D significantly outperforms other image-to-3D baselines in terms of geometric and textural details.
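Read as a pipeline, the described components compose roughly as follows (illustrative function names only, with the progressive multi-level upscaling collapsed into a single call; the released code may structure this differently):

```python
# Sketch of the stage composition described above.
def image_to_mesh(image, mv_diffusion, normal_diffusion, upscaler, isomer):
    views = mv_diffusion(image)                  # orthographic multi-views
    normals = normal_diffusion(image, views)     # matching normal maps
    views = [upscaler(v) for v in views]         # multi-level upscaling
    normals = [upscaler(n) for n in normals]
    return isomer(views, normals)                # ISOMER: fuse color + geometry
```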

Speculative decoding (SD) has attracted significant research attention due to the substantial speedup it can bring to LLM inference. However, despite the high speedups they offer, speculative decoding methods often achieve optimal performance only on high-end devices or at the cost of substantial GPU memory overhead. Given limited memory and the necessity of quantization, a model that performs well on a high-end GPU can slow down by up to 7 times. To this end, we propose Skippy Simultaneous Speculative Decoding (S3D), a cost-effective self-speculative SD method based on simultaneous multi-token decoding and mid-layer skipping. Compared against recent effective open-source SD systems, our method achieves one of the top performance-memory ratios while requiring minimal architectural changes and training data. Leveraging this memory efficiency, we created a smaller yet more effective SD model based on Phi-3: it is 1.4 to 2 times faster than the quantized EAGLE model and operates in half-precision while using less VRAM.
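For intuition, a generic greedy self-speculative loop looks roughly like this (a sketch of the general draft-then-verify technique under an assumed `skip_mid_layers` model flag, not S3D's exact simultaneous multi-token scheme):

```python
# Generic self-speculative decoding: draft cheaply with mid layers skipped,
# then verify all drafted tokens with one full forward pass.
import torch

@torch.no_grad()
def self_speculative_decode(model, ids, n_draft=4, max_len=64):
    """model(ids, skip_mid_layers=...) -> (B, T, V) logits is an assumed
    interface; batch size 1 is assumed for the acceptance count."""
    while ids.shape[1] < max_len:
        draft = ids
        for _ in range(n_draft):                      # cheap draft passes
            logits = model(draft, skip_mid_layers=True)
            draft = torch.cat([draft, logits[:, -1:].argmax(-1)], dim=1)
        # One full pass scores every drafted position at once.
        full = model(draft[:, :-1], skip_mid_layers=False)
        verified = full[:, ids.shape[1] - 1:].argmax(-1)
        proposed = draft[:, ids.shape[1]:]
        agree = (verified == proposed).int().cumprod(-1).sum().item()
        # Keep the agreeing prefix plus one token corrected by the verifier.
        ids = torch.cat([ids, verified[:, :agree + 1]], dim=1)
    return ids
```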

Exchangeability concerning a continuous exposure, X, implies no confounding bias when identifying the average exposure effect of X, AEE(X). When X is measured with error (Xep), two challenges arise in identifying AEE(X). First, exchangeability regarding Xep does not equal exchangeability regarding X. Second, the non-differential error assumption (NDEA) can be overly stringent in practice. To address these challenges, this article proposes a unified treatment of exchangeability and of exposure and confounder measurement errors, built on three novel concepts. The first, Probabilistic Exchangeability (PE), states that the outcomes of those with Xep=e are probabilistically exchangeable with the outcomes of those truly exposed to X=eT. The relationship between AEE(Xep) and AEE(X) on the risk-difference and risk-ratio scales is expressed mathematically as a probabilistic certainty, termed the exchangeability probability (Pe). Squared Pe (Pe2) quantifies the extent to which AEE(Xep) differs from AEE(X) due to exposure measurement error acting through mechanisms not akin to confounding. The coefficient of determination (R2) in the regression of Xep against X may sometimes suffice to measure Pe2. The second concept, Emergent Pseudo Confounding (EPC), describes the bias introduced by exposure measurement error through mechanisms akin to confounding. PE requires controlling for EPC, a requirement weaker than the NDEA. The third, Emergent Confounding (EC), describes when bias due to confounder measurement error arises. Adjustment for E(P)C can be performed like confounding adjustment. This paper clarifies when AEE(Xep) is an appropriate surrogate for AEE(X) and how to measure the difference between the two. Differential errors can be addressed and need not compromise causal inference.
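As an illustrative rendering of the attenuation-style relationship described above (our notation and our assumption about the exact form; the paper's definition of Pe may differ):

```latex
% Illustrative only: Pe^2 read as an attenuation factor on the
% risk-difference scale, with R^2 from regressing Xep on X as a
% possible empirical stand-in for Pe^2.
\[
\mathrm{AEE}(X_{ep}) \;\approx\; P_e^{2}\,\mathrm{AEE}(X),
\qquad
P_e^{2} \;\approx\; R^{2}_{\,X_{ep}\sim X}.
\]
```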

This paper revisits the problem of repairing and querying inconsistent databases equipped with universal constraints. We adopt symmetric difference repairs, in which both deletions and additions of facts can be used to restore consistency, and suppose that preferred repair actions are specified via a binary priority relation over (negated) facts. Our first contribution is to show how existing notions of optimal repairs, defined for simpler denial constraints and repairs solely based on fact deletion, can be suitably extended to our richer setting. We next study the computational properties of the resulting repair notions, in particular, the data complexity of repair checking and inconsistency-tolerant query answering. Finally, we clarify the relationship between optimal repairs of prioritized databases and repair notions introduced in the framework of active integrity constraints. In particular, we show that Pareto-optimal repairs in our setting correspond to founded, grounded and justified repairs w.r.t. the active integrity constraints obtained by translating the prioritized database. Our study also yields useful insights into the behavior of active integrity constraints.
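A toy example of a symmetric-difference repair (our own illustration, not from the paper): consistency is restored by one deletion and one addition, and the symmetric difference records exactly those repair actions.

```python
# Toy database: every employee must have exactly one department
# (a universal constraint). "ann" has two departments, "bob" has none.
db = {("emp", "ann"), ("emp", "bob"),
      ("dept", "ann", "hr"), ("dept", "ann", "it")}

def consistent(d):
    emps = {f[1] for f in d if f[0] == "emp"}
    return all(sum(1 for f in d if f[0] == "dept" and f[1] == e) == 1
               for e in emps)

# A symmetric-difference repair: delete one fact and add another.
repair = {("emp", "ann"), ("emp", "bob"),
          ("dept", "ann", "hr"), ("dept", "bob", "hr")}

assert not consistent(db) and consistent(repair)
print(db.symmetric_difference(repair))
# {('dept', 'ann', 'it'), ('dept', 'bob', 'hr')}: one deletion, one addition
```

A priority relation over (negated) facts would then rank competing repairs of this kind, which is where the paper's optimality notions come into play.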

Deep learning has enabled a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning is to create models that can process and link information across various modalities. Despite extensive progress in unimodal learning, it still cannot cover all aspects of human learning. Multimodal learning helps machines understand and analyze information better when multiple senses are engaged in processing it. This paper focuses on multiple types of modalities: image, video, text, audio, body gestures, facial expressions, and physiological signals. We provide a detailed analysis of past and current baseline approaches and an in-depth study of recent advancements in multimodal deep learning applications. A fine-grained taxonomy of various multimodal deep learning applications is proposed, elaborating on different applications in depth. Architectures and datasets used in these applications are also discussed, along with their evaluation metrics. Finally, the main issues of each domain are highlighted separately, along with possible future research directions.

Object detectors usually achieve promising results with the supervision of complete instance annotations. However, their performance is far from satisfactory with sparse instance annotations. Most existing methods for sparsely annotated object detection either re-weight the loss of hard negative samples or convert the unlabeled instances into ignored regions to reduce the interference of false negatives. We argue that these strategies are insufficient, since they can at most alleviate the negative effect caused by missing annotations. In this paper, we propose a simple but effective mechanism, called Co-mining, for sparsely annotated object detection. In Co-mining, the two branches of a Siamese network predict pseudo-label sets for each other. To enhance multi-view learning and better mine unlabeled instances, the original image and a corresponding augmented image are used as the inputs of the two branches, respectively. Co-mining can serve as a general training mechanism applicable to most modern object detectors. Experiments are performed on the MS COCO dataset with three different sparsely annotated settings, using two typical frameworks: the anchor-based detector RetinaNet and the anchor-free detector FCOS. Experimental results show that Co-mining with RetinaNet achieves 1.4%-2.1% improvement over different baselines and surpasses existing methods under the same sparsely annotated setting.
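Schematically, one training step of such a mechanism could look as follows (a sketch of the general idea, with the helper callables passed in as assumptions; not the authors' code):

```python
# Sketch: two branches exchange confident, unmatched detections as
# pseudo-labels. `det_loss` and `matches_gt` are assumed helper callables.
def co_mining_step(detector, det_loss, matches_gt,
                   image, aug_image, sparse_gt, score_thr=0.5):
    preds_orig = detector(image)       # branch 1: original view
    preds_aug = detector(aug_image)    # branch 2: augmented view
    # Confident detections matching no annotated box are treated as
    # mined instances and handed to the *other* branch as pseudo-labels.
    mined_orig = [p for p in preds_orig
                  if p.score > score_thr and not matches_gt(p, sparse_gt)]
    mined_aug = [p for p in preds_aug
                 if p.score > score_thr and not matches_gt(p, sparse_gt)]
    loss = (det_loss(preds_orig, sparse_gt + mined_aug)
            + det_loss(preds_aug, sparse_gt + mined_orig))
    return loss
```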
