精品自在线观看影片天天看,日韩纯肉无遮挡一区二区视频,国产1000免费福利在线观看,久久国产偷任你爽538,成人午夜视频一区二区国语

Open X-Embodiment Collaboration,Abhishek Padalkar,Acorn Pooley,Ajinkya Jain,Alex Bewley,Alex Herzog,Alex Irpan,Alexander Khazatsky,Anant Rai,Anikait Singh,Anthony Brohan,Antonin Raffin,Ayzaan Wahid,Ben Burgess-Limerick,Beomjoon Kim,Bernhard Sch?lkopf,Brian Ichter,Cewu Lu,Charles Xu,Chelsea Finn,Chenfeng Xu,Cheng Chi,Chenguang Huang,Christine Chan,Chuer Pan,Chuyuan Fu,Coline Devin,Danny Driess,Deepak Pathak,Dhruv Shah,Dieter Büchler,Dmitry Kalashnikov,Dorsa Sadigh,Edward Johns,Federico Ceola,Fei Xia,Freek Stulp,Gaoyue Zhou,Gaurav S. Sukhatme,Gautam Salhotra,Ge Yan,Giulio Schiavi,Gregory Kahn,Hao Su,Hao-Shu Fang,Haochen Shi,Heni Ben Amor,Henrik I Christensen,Hiroki Furuta,Homer Walke,Hongjie Fang,Igor Mordatch,Ilija Radosavovic,Isabel Leal,Jacky Liang,Jad Abou-Chakra,Jaehyung Kim,Jan Peters,Jan Schneider,Jasmine Hsu,Jeannette Bohg,Jeffrey Bingham,Jiajun Wu,Jialin Wu,Jianlan Luo,Jiayuan Gu,Jie Tan,Jihoon Oh,Jitendra Malik,Jonathan Tompson,Jonathan Yang,Joseph J. Lim,Jo?o Silvério,Junhyek Han,Kanishka Rao,Karl Pertsch,Karol Hausman,Keegan Go,Keerthana Gopalakrishnan,Ken Goldberg,Kendra Byrne,Kenneth Oslund,Kento Kawaharazuka,Kevin Zhang,Krishan Rana,Krishnan Srinivasan,Lawrence Yunliang Chen,Lerrel Pinto,Liam Tan,Lionel Ott,Lisa Lee,Masayoshi Tomizuka,Maximilian Du,Michael Ahn,Mingtong Zhang,Mingyu Ding,Mohan Kumar Srirama,Mohit Sharma,Moo Jin Kim,Naoaki Kanazawa,Nicklas Hansen,Nicolas Heess,Nikhil J Joshi,Niko Suenderhauf,Norman Di Palo,Nur Muhammad Mahi Shafiullah,Oier Mees,Oliver Kroemer,Pannag R Sanketi,Paul Wohlhart,Peng Xu,Pierre Sermanet,Priya Sundaresan,Quan Vuong,Rafael Rafailov,Ran Tian,Ria Doshi,Roberto Martín-Martín,Russell Mendonca,Rutav Shah,Ryan Hoque,Ryan Julian,Samuel Bustamante,Sean Kirmani,Sergey Levine,Sherry Moore,Shikhar Bahl,Shivin Dass,Shubham Sonawani,Shuran Song,Sichun Xu,Siddhant Haldar,Simeon Adebola,Simon Guist,Soroush Nasiriany,Stefan Schaal,Stefan Welker,Stephen Tian,Sudeep Dasari,Suneel Belkhale,Takayuki Osa,Tatsuya Harada,Tatsuya Matsushima,Ted Xiao,Tianhe Yu,Tianli Ding,Todor Davchev,Tony Z. Zhao,Travis Armstrong,Trevor Darrell,Vidhi Jain,Vincent Vanhoucke,Wei Zhan,Wenxuan Zhou,Wolfram Burgard,Xi Chen,Xiaolong Wang,Xinghao Zhu,Xuanlin Li,Yao Lu,Yevgen Chebotar,Yifan Zhou,Yifeng Zhu,Ying Xu,Yixuan Wang,Yonatan Bisk,Yoonyoung Cho,Youngwoon Lee,Yuchen Cui,Yueh-Hua Wu,Yujin Tang,Yuke Zhu,Yunzhu Li,Yusuke Iwasawa,Yutaka Matsuo,Zhuo Xu,Zichen Jeff Cui

Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website $\href{//robotics-transformer-x.github.io}{\text{robotics-transformer-x.github.io}}$.

相關內容

MoDELS

關注 43

ACM/IEEE第23屆模型驅動工程語言和系統國際會議，是模型驅動軟件和系統工程的首要會議系列，由ACM-SIGSOFT和IEEE-TCSE支持組織。自1998年以來，模型涵蓋了建模的各個方面，從語言和方法到工具和應用程序。模特的參加者來自不同的背景，包括研究人員、學者、工程師和工業專業人士。MODELS 2019是一個論壇，參與者可以圍繞建模和模型驅動的軟件和系統交流前沿研究成果和創新實踐經驗。今年的版本將為建模社區提供進一步推進建模基礎的機會，并在網絡物理系統、嵌入式系統、社會技術系統、云計算、大數據、機器學習、安全、開源等新興領域提出建模的創新應用以及可持續性。官網鏈接： · MoDELS · Performer · 掩碼 · Learning ·

2023 年 12 月 1 日

LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models

Ying Nie,Wei He,Kai Han,Yehui Tang,Tianyu Guo,Fanyi Du,Yunhe Wang

Vision-language pre-training like CLIP has shown promising performance on various downstream tasks such as zero-shot image classification and image-text retrieval. Most of the existing CLIP-alike works usually adopt relatively large image encoders like ResNet50 and ViT, while the lightweight counterparts are rarely discussed. In this paper, we propose a multi-level interaction paradigm for training lightweight CLIP models. Firstly, to mitigate the problem that some image-text pairs are not strictly one-to-one correspondence, we improve the conventional global instance-level alignment objective by softening the label of negative samples progressively. Secondly, a relaxed bipartite matching based token-level alignment objective is introduced for finer-grained alignment between image patches and textual words. Moreover, based on the observation that the accuracy of CLIP model does not increase correspondingly as the parameters of text encoder increase, an extra objective of masked language modeling (MLM) is leveraged for maximizing the potential of the shortened text encoder. In practice, an auxiliary fusion module injecting unmasked image embedding into masked text embedding at different network stages is proposed for enhancing the MLM. Extensive experiments show that without introducing additional computational cost during inference, the proposed method achieves a higher performance on multiple downstream tasks.

3D · MoDELS · 優化器 · 塑造 · Extensibility ·

2023 年 11 月 30 日

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

Yukang Cao,Yan-Pei Cao,Kai Han,Ying Shan,Kwan-Yee K. Wong

from arxiv, Project page: //yukangcao.github.io/DreamAvatar/

We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars with controllable poses. While encouraging results have been reported by recent methods on text-guided 3D common object generation, generating high-quality human avatars remains an open challenge due to the complexity of the human body's shape, pose, and appearance. We propose DreamAvatar to tackle this challenge, which utilizes a trainable NeRF for predicting density and color for 3D points and pretrained text-to-image diffusion models for providing 2D self-supervision. Specifically, we leverage the SMPL model to provide shape and pose guidance for the generation. We introduce a dual-observation-space design that involves the joint optimization of a canonical space and a posed space that are related by a learnable deformation field. This facilitates the generation of more complete textures and geometry faithful to the target pose. We also jointly optimize the losses computed from the full body and from the zoomed-in 3D head to alleviate the common multi-face ''Janus'' problem and improve facial details in the generated avatars. Extensive evaluations demonstrate that DreamAvatar significantly outperforms existing methods, establishing a new state-of-the-art for text-and-shape guided 3D human avatar generation.

SIMM · Processing（編程語言） · on the fly · Performer · 講稿 ·

2023 年 11 月 30 日

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

Biao Gong,Siteng Huang,Yutong Feng,Shiwei Zhang,Yuyuan Li,Yu Liu

Diffusion models have recently achieved remarkable progress in generating realistic images. However, challenges remain in accurately understanding and synthesizing the layout requirements in the textual prompts. To align the generated image with layout instructions, we present a training-free layout calibration system SimM that intervenes in the generative process on the fly during inference time. Specifically, following a "check-locate-rectify" pipeline, the system first analyses the prompt to generate the target layout and compares it with the intermediate outputs to automatically detect errors. Then, by moving the located activations and making intra- and inter-map adjustments, the rectification process can be performed with negligible computational overhead. To evaluate SimM over a range of layout requirements, we present a benchmark SimMBench that compensates for the lack of superlative spatial relations in existing datasets. And both quantitative and qualitative results demonstrate the effectiveness of the proposed SimM in calibrating the layout inconsistencies. Our project page is at //simm-t2i.github.io/SimM.

黑盒 · MoDELS · Prompt · 優化器 · 語言模型化 ·

2023 年 11 月 30 日

Language Models as Black-Box Optimizers for Vision-Language Models

Shihong Liu,Zhiqiu Lin,Samuel Yu,Ryan Lee,Tiffany Ling,Deepak Pathak,Deva Ramanan

from arxiv, Project site: llm-can-optimize-vlm.github.io

Vision-language models (VLMs) pre-trained on web-scale datasets have demonstrated remarkable capabilities on downstream tasks when fine-tuned with minimal data. However, many VLMs rely on proprietary data and are not open-source, which restricts the use of white-box approaches for fine-tuning. As such, we aim to develop a black-box approach to optimize VLMs through natural language prompts, thereby avoiding the need to access model parameters, feature embeddings, or even output logits. We propose employing chat-based LLMs to search for the best text prompt for VLMs. Specifically, we adopt an automatic hill-climbing procedure that converges to an effective prompt by evaluating the performance of current prompts and asking LLMs to refine them based on textual feedback, all within a conversational process without human-in-the-loop. In a challenging 1-shot image classification setup, our simple approach surpasses the white-box continuous prompting method (CoOp) by an average of 1.5% across 11 datasets including ImageNet. Our approach also outperforms both human-engineered and LLM-generated prompts. We highlight the advantage of conversational feedback that incorporates both positive and negative prompts, suggesting that LLMs can utilize the implicit gradient direction in textual feedback for a more efficient search. In addition, we find that the text prompts generated through our strategy are not only more interpretable but also transfer well across different VLM architectures in a black-box manner. Lastly, we demonstrate our framework on a state-of-the-art black-box VLM (DALL-E 3) for text-to-image optimization.

DNN · Networking · Neural Networks · 模型評估 · Extensibility ·

2023 年 11 月 30 日

ADA-GP: Accelerating DNN Training By Adaptive Gradient Prediction

Vahid Janfaza,Shantanu Mandal,Farabi Mahmud,Abdullah Muzahid

from arxiv, 13 pages, 21 figures, 5 tables

Neural network training is inherently sequential where the layers finish the forward propagation in succession, followed by the calculation and back-propagation of gradients (based on a loss function) starting from the last layer. The sequential computations significantly slow down neural network training, especially the deeper ones. Prediction has been successfully used in many areas of computer architecture to speed up sequential processing. Therefore, we propose ADA-GP, which uses gradient prediction adaptively to speed up deep neural network (DNN) training while maintaining accuracy. ADA-GP works by incorporating a small neural network to predict gradients for different layers of a DNN model. ADA-GP uses a novel tensor reorganization method to make it feasible to predict a large number of gradients. ADA-GP alternates between DNN training using backpropagated gradients and DNN training using predicted gradients. ADA-GP adaptively adjusts when and for how long gradient prediction is used to strike a balance between accuracy and performance. Last but not least, we provide a detailed hardware extension in a typical DNN accelerator to realize the speed up potential from gradient prediction. Our extensive experiments with fifteen DNN models show that ADA-GP can achieve an average speed up of 1.47X with similar or even higher accuracy than the baseline models. Moreover, it consumes, on average, 34% less energy due to reduced off-chip memory accesses compared to the baseline accelerator.

控制器 · 二次規劃 · 線性的 · 整流線性 · 修正線性單元/整流線性單元 ·

2023 年 11 月 29 日

ReLU-QP: A GPU-Accelerated Quadratic Programming Solver for Model-Predictive Control

Arun L. Bishop,John Z. Zhang,Swaminathan Gurumurthy,Kevin Tracy,Zachary Manchester

from arxiv, submitted to ICRA 2024

We present ReLU-QP, a GPU-accelerated solver for quadratic programs (QPs) that is capable of solving high-dimensional control problems at real-time rates. ReLU-QP is derived by exactly reformulating the Alternating Direction Method of Multipliers (ADMM) algorithm for solving QPs as a deep, weight-tied neural network with rectified linear unit (ReLU) activations. This reformulation enables the deployment of ReLU-QP on GPUs using standard machine-learning toolboxes. We evaluate the performance of ReLU-QP across three model-predictive control (MPC) benchmarks: stabilizing random linear dynamical systems with control limits, balancing an Atlas humanoid robot on a single foot, and tracking whole-body reference trajectories on a quadruped equipped with a six-degree-of-freedom arm. These benchmarks indicate that ReLU-QP is competitive with state-of-the-art CPU-based solvers for small-to-medium-scale problems and offers order-of-magnitude speed improvements for larger-scale problems.

MoDELS · 語言模型化 · 數據集 · Extensibility · Performer ·

2023 年 11 月 29 日

Paragraph-to-Image Generation with Information-Enriched Diffusion Model

Weijia Wu,Zhuang Li,Yefei He,Mike Zheng Shou,Chunhua Shen,Lele Cheng,Yan Li,Tingting Gao,Di Zhang,Zhongyuan Wang

from arxiv, The project website is at: //weijiawu.github.io/ParaDiffusionPage/. Code: //github.com/weijiawu/ParaDiffusion

Text-to-image (T2I) models have recently experienced rapid development, achieving astonishing performance in terms of fidelity and textual alignment capabilities. However, given a long paragraph (up to 512 words), these generation models still struggle to achieve strong alignment and are unable to generate images depicting complex scenes. In this paper, we introduce an information-enriched diffusion model for paragraph-to-image generation task, termed ParaDiffusion, which delves into the transference of the extensive semantic comprehension capabilities of large language models to the task of image generation. At its core is using a large language model (e.g., Llama V2) to encode long-form text, followed by fine-tuning with LORA to alignthe text-image feature spaces in the generation task. To facilitate the training of long-text semantic alignment, we also curated a high-quality paragraph-image pair dataset, namely ParaImage. This dataset contains a small amount of high-quality, meticulously annotated data, and a large-scale synthetic dataset with long text descriptions being generated using a vision-language model. Experiments demonstrate that ParaDiffusion outperforms state-of-the-art models (SD XL, DeepFloyd IF) on ViLG-300 and ParaPrompts, achieving up to 15% and 45% human voting rate improvements for visual appeal and text faithfulness, respectively. The code and dataset will be released to foster community research on long-text alignment.

可理解性 · Transformer · Vision · 塊 · INFORMS ·

2021 年 12 月 30 日

Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding

Zizhao Zhang,Han Zhang,Long Zhao,Ting Chen,Sercan O. Arik,Tomas Pfister

from arxiv, AAAI2022

Hierarchical structures are popular in recent vision transformers, however, they require sophisticated designs and massive datasets to work well. In this paper, we explore the idea of nesting basic local transformers on non-overlapping image blocks and aggregating them in a hierarchical way. We find that the block aggregation function plays a critical role in enabling cross-block non-local information communication. This observation leads us to design a simplified architecture that requires minor code changes upon the original vision transformer. The benefits of the proposed judiciously-selected design are threefold: (1) NesT converges faster and requires much less training data to achieve good generalization on both ImageNet and small datasets like CIFAR; (2) when extending our key ideas to image generation, NesT leads to a strong decoder that is 8$\times$ faster than previous transformer-based generators; and (3) we show that decoupling the feature learning and abstraction processes via this nested hierarchy in our design enables constructing a novel method (named GradCAT) for visually interpreting the learned model. Source code is available //github.com/google-research/nested-transformer.

目標跟蹤 · Extensibility · 模態 · 數據集 · Performer ·

2021 年 11 月 11 日

Cross-Modal Object Tracking: Modality-Aware Representations and A Unified Benchmark

Chenglong Li,Tianhao Zhu,Lei Liu,Xiaonan Si,Zilin Fan,Sulan Zhai

from arxiv, In Submission

In many visual systems, visual tracking often bases on RGB image sequences, in which some targets are invalid in low-light conditions, and tracking performance is thus affected significantly. Introducing other modalities such as depth and infrared data is an effective way to handle imaging limitations of individual sources, but multi-modal imaging platforms usually require elaborate designs and cannot be applied in many real-world applications at present. Near-infrared (NIR) imaging becomes an essential part of many surveillance cameras, whose imaging is switchable between RGB and NIR based on the light intensity. These two modalities are heterogeneous with very different visual properties and thus bring big challenges for visual tracking. However, existing works have not studied this challenging problem. In this work, we address the cross-modal object tracking problem and contribute a new video dataset, including 654 cross-modal image sequences with over 481K frames in total, and the average video length is more than 735 frames. To promote the research and development of cross-modal object tracking, we propose a new algorithm, which learns the modality-aware target representation to mitigate the appearance gap between RGB and NIR modalities in the tracking process. It is plug-and-play and could thus be flexibly embedded into different tracking frameworks. Extensive experiments on the dataset are conducted, and we demonstrate the effectiveness of the proposed algorithm in two representative tracking frameworks against 17 state-of-the-art tracking methods. We will release the dataset for free academic usage, dataset download link and code will be released soon.

MoDELS · Extensibility · surge · 學成 · Backbone ·

2021 年 6 月 15 日

Pre-Trained Models: Past, Present and Future

Xu Han,Zhengyan Zhang,Ning Ding,Yuxian Gu,Xiao Liu,Yuqi Huo,Jiezhong Qiu,Liang Zhang,Wentao Han,Minlie Huang,Qin Jin,Yanyan Lan,Yang Liu,Zhiyuan Liu,Zhiwu Lu,Xipeng Qiu,Ruihua Song,Jie Tang,Ji-Rong Wen,Jinhui Yuan,Wayne Xin Zhao,Jun Zhu

Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success and become a milestone in the field of artificial intelligence (AI). Owing to sophisticated pre-training objectives and huge model parameters, large-scale PTMs can effectively capture knowledge from massive labeled and unlabeled data. By storing knowledge into huge parameters and fine-tuning on specific tasks, the rich knowledge implicitly encoded in huge parameters can benefit a variety of downstream tasks, which has been extensively demonstrated via experimental verification and empirical analysis. It is now the consensus of the AI community to adopt PTMs as backbone for downstream tasks rather than learning models from scratch. In this paper, we take a deep look into the history of pre-training, especially its special relation with transfer learning and self-supervised learning, to reveal the crucial position of PTMs in the AI development spectrum. Further, we comprehensively review the latest breakthroughs of PTMs. These breakthroughs are driven by the surge of computational power and the increasing availability of data, towards four important directions: designing effective architectures, utilizing rich contexts, improving computational efficiency, and conducting interpretation and theoretical analysis. Finally, we discuss a series of open problems and research directions of PTMs, and hope our view can inspire and advance the future study of PTMs.