国产高清一区二区在线影院-一区二区三区免费观看在线视频播放

Articulated 3D reconstruction has valuable applications in various domains, yet it remains costly and demands intensive work from domain experts. Recent advancements in template-free learning methods show promising results with monocular videos. Nevertheless, these approaches necessitate a comprehensive coverage of all viewpoints of the subject in the input video, thus limiting their applicability to casually captured videos from online sources. In this work, we study articulated 3D shape reconstruction from a single and casually captured internet video, where the subject's view coverage is incomplete. We propose DreaMo that jointly performs shape reconstruction while solving the challenging low-coverage regions with view-conditioned diffusion prior and several tailored regularizations. In addition, we introduce a skeleton generation strategy to create human-interpretable skeletons from the learned neural bones and skinning weights. We conduct our study on a self-collected internet video collection characterized by incomplete view coverage. DreaMo shows promising quality in novel-view rendering, detailed articulated shape reconstruction, and skeleton generation. Extensive qualitative and quantitative studies validate the efficacy of each proposed component, and show existing methods are unable to solve correct geometry due to the incomplete view coverage.

相關內容

三維重建

關注 1173

在計算機視覺中, 三維重建是指根據單視圖或者多視圖的圖像重建三維信息的過程. 由于單視頻的信息不完全,因此三維重建需要利用經驗知識. 而多視圖的三維重建(類似人的雙目定位)相對比較容易, 其方法是先對攝像機進行標定, 即計算出攝像機的圖象坐標系與世界坐標系的關系.然后利用多個二維圖象中的信息重建出三維信息。物體三維重建是計算機輔助幾何設計(CAGD)、計算機圖形學(CG)、計算機動畫、計算機視覺、醫學圖像處理、科學計算和虛擬現實、數字媒體創作等領域的共性科學問題和核心技術。在計算機內生成物體三維表示主要有兩類方法。一類是使用幾何建模軟件通過人機交互生成人為控制下的物體三維幾何模型,另一類是通過一定的手段獲取真實物體的幾何形狀。前者實現技術已經十分成熟,現有若干軟件支持,比如:3DMAX、Maya、AutoCAD、UG等等,它們一般使用具有數學表達式的曲線曲面表示幾何形狀。后者一般稱為三維重建過程,三維重建是指利用二維投影恢復物體三維信息(形狀等)的數學過程和計算機技術,包括數據獲取、預處理、點云拼接和特征分析等步驟。

向量化 · 詞元分析器 · 表示 · INFORMS · 離散化 ·

2024 年 1 月 30 日

StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

Zecheng Tang,Chenfei Wu,Zekai Zhang,Mingheng Ni,Shengming Yin,Yu Liu,Zhengyuan Yang,Lijuan Wang,Zicheng Liu,Juntao Li,Nan Duan

To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, while disrupting the model's ability to capture the true semantic representation of visual scenes. This paper posits that an alternative representation of images, vector graphics, can effectively surmount this limitation by enabling a more natural and semantically coherent segmentation of the image information. Thus, we introduce StrokeNUWA, a pioneering work exploring a better visual representation ''stroke tokens'' on vector graphics, which is inherently visual semantics rich, naturally compatible with LLMs, and highly compressed. Equipped with stroke tokens, StrokeNUWA can significantly surpass traditional LLM-based and optimization-based methods across various metrics in the vector graphic generation task. Besides, StrokeNUWA achieves up to a 94x speedup in inference over the speed of prior methods with an exceptional SVG code compression ratio of 6.9%.

Extensibility · Processing（編程語言） · CCS · 可約的 · 后向 ·

2024 年 1 月 30 日

revTPL: The Reversible Temporal Process Language

Laura Bocchi,Ivan Lanese,Claudio Antares Mezzina,Shoji Yuen

Reversible debuggers help programmers to find the causes of misbehaviours in concurrent programs more quickly, by executing a program backwards from the point where a misbehaviour was observed, and looking for the bug(s) that caused it. Reversible debuggers can be founded on the well-studied theory of causal-consistent reversibility, which only allows one to undo an action provided that its consequences, if any, are undone beforehand. Causal-consistent reversibility yields more efficient debugging by reducing the number of states to be explored when looking backwards. Till now, causal-consistent reversibility has never considered time, which is a key aspect in real-world applications. Here, we study the interplay between reversibility and time in concurrent systems via a process algebra. The Temporal Process Language (TPL) by Hennessy and Regan is a well-understood extension of CCS with discrete-time and a timeout operator. We define revTPL, a reversible extension of TPL, and we show that it satisfies the properties expected from a causal-consistent reversible calculus. We show that, alternatively, revTPL can be interpreted as an extension of reversible CCS with time.

Eagle · 樣本 · 解碼 · 大語言模型 · HuggingFace ·

2024 年 1 月 26 日

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

Yuhui Li,Fangyun Wei,Chao Zhang,Hongyang Zhang

Auto-regressive decoding makes the inference of Large Language Models (LLMs) time-consuming. We propose a simple framework, EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), for lossless acceleration. Unlike traditional speculative sampling methods, EAGLE operates the drafting process auto-regressively at the more regular (second-top-layer) feature level and addresses the sampling uncertainty issues in the next-feature prediction problems by integrating tokens from one time step ahead. The acceleration provided by EAGLE is lossless: it involves no fine-tuning of the target LLM, and the generated text maintains the same distribution as that of vanilla auto-regressive decoding. As of the submission of this paper, EAGLE is the fastest known framework within the speculative sampling family. On MT-bench, EAGLE is 3x faster than vanilla decoding, 2x faster than Lookahead, and 1.6x faster than Medusa. Using gpt-fast, EAGLE attains on average 160 tokens/s with LLaMA2-Chat 13B on a single RTX 3090 GPU, compared to 24 tokens/s of Huggingface's implementations.

聯邦學習 · 學成 · Processing（編程語言） · MoDELS · 語言模型化 ·

2021 年 7 月 27 日

Federated Learning Meets Natural Language Processing: A Survey

Ming Liu,Stella Ho,Mengqi Wang,Longxiang Gao,Yuan Jin,He Zhang

from arxiv, 19 pages

Federated Learning aims to learn machine learning models from multiple decentralized edge devices (e.g. mobiles) or servers without sacrificing local data privacy. Recent Natural Language Processing techniques rely on deep learning and large pre-trained language models. However, both big deep neural and language models are trained with huge amounts of data which often lies on the server side. Since text data is widely originated from end users, in this work, we look into recent NLP models and techniques which use federated learning as the learning framework. Our survey discusses major challenges in federated natural language processing, including the algorithm challenges, system challenges as well as the privacy issues. We also provide a critical review of the existing Federated NLP evaluation methods and tools. Finally, we highlight the current research gaps and future directions.

Networking · 圖 · Performer · 網絡嵌入 · Extensibility ·

2021 年 6 月 5 日

ImGAGN:Imbalanced Network Embedding via Generative Adversarial Graph Networks

Liang Qu,Huaisheng Zhu,Ruiqi Zheng,Yuhui Shi,Hongzhi Yin

from arxiv, to be published in KDD'2021

Imbalanced classification on graphs is ubiquitous yet challenging in many real-world applications, such as fraudulent node detection. Recently, graph neural networks (GNNs) have shown promising performance on many network analysis tasks. However, most existing GNNs have almost exclusively focused on the balanced networks, and would get unappealing performance on the imbalanced networks. To bridge this gap, in this paper, we present a generative adversarial graph network model, called ImGAGN to address the imbalanced classification problem on graphs. It introduces a novel generator for graph structure data, named GraphGenerator, which can simulate both the minority class nodes' attribute distribution and network topological structure distribution by generating a set of synthetic minority nodes such that the number of nodes in different classes can be balanced. Then a graph convolutional network (GCN) discriminator is trained to discriminate between real nodes and fake (i.e., generated) nodes, and also between minority nodes and majority nodes on the synthetic balanced network. To validate the effectiveness of the proposed method, extensive experiments are conducted on four real-world imbalanced network datasets. Experimental results demonstrate that the proposed method ImGAGN outperforms state-of-the-art algorithms for semi-supervised imbalanced node classification task.

蒸餾 · Extensibility · 無監督 · 源領域 · 可辨認的 ·

2020 年 12 月 8 日

KD3A: Unsupervised Multi-Source Decentralized Domain Adaptation via Knowledge Distillation

Hao-Zhe Feng,Zhaoyang You,Minghao Chen,Tianye Zhang,Minfeng Zhu,Fei Wu,Chao Wu,Wei Chen

from arxiv, 12 pages, 5 figures

Conventional unsupervised multi-source domain adaptation (UMDA) methods assume all source domains can be accessed directly. This neglects the privacy-preserving policy, that is, all the data and computations must be kept decentralized. There exists three problems in this scenario: (1) Minimizing the domain distance requires the pairwise calculation of the data from source and target domains, which is not accessible. (2) The communication cost and privacy security limit the application of UMDA methods (e.g., the domain adversarial training). (3) Since users have no authority to check the data quality, the irrelevant or malicious source domains are more likely to appear, which causes negative transfer. In this study, we propose a privacy-preserving UMDA paradigm named Knowledge Distillation based Decentralized Domain Adaptation (KD3A), which performs domain adaptation through the knowledge distillation on models from different source domains. KD3A solves the above problems with three components: (1) A multi-source knowledge distillation method named Knowledge Vote to learn high-quality domain consensus knowledge. (2) A dynamic weighting strategy named Consensus Focus to identify both the malicious and irrelevant domains. (3) A decentralized optimization strategy for domain distance named BatchNorm MMD. The extensive experiments on DomainNet demonstrate that KD3A is robust to the negative transfer and brings a 100x reduction of communication cost compared with other decentralized UMDA methods. Moreover, our KD3A significantly outperforms state-of-the-art UMDA approaches.

編譯器 · 優化器 · 學成 · Performer · TVM ·

2020 年 2 月 6 日

The Deep Learning Compiler: A Comprehensive Survey

Mingzhen Li,Yi Liu,Xiaoyan Liu,Qingxiao Sun,Xin You,Hailong Yang,Zhongzhi Luan,Depei Qian

The difficulty of deploying various deep learning (DL) models on diverse DL hardwares has boosted the research and development of DL compilers in the community. Several DL compilers have been proposed from both industry and academia such as Tensorflow XLA and TVM. Similarly, the DL compilers take the DL models described in different DL frameworks as input, and then generate optimized codes for diverse DL hardwares as output. However, none of the existing survey has analyzed the unique design of the DL compilers comprehensively. In this paper, we perform a comprehensive survey of existing DL compilers by dissecting the commonly adopted design in details, with emphasis on the DL oriented multi-level IRs, and frontend/backend optimizations. Specifically, we provide a comprehensive comparison among existing DL compilers from various aspects. In addition, we present detailed analysis of the multi-level IR design and compiler optimization techniques. Finally, several insights are highlighted as the potential research directions of DL compiler. This is the first survey paper focusing on the unique design of DL compiler, which we hope can pave the road for future research towards the DL compiler.

圖注意力網絡 · INTERACT · 圖 · 注意力機制 · INFORMS ·

2019 年 5 月 20 日

KGAT: Knowledge Graph Attention Network for Recommendation

Xiang Wang,Xiangnan He,Yixin Cao,Meng Liu,Tat-Seng Chua

from arxiv, KDD 2019 research track

To provide more accurate, diverse, and explainable recommendation, it is compulsory to go beyond modeling user-item interactions and take side information into account. Traditional methods like factorization machine (FM) cast it as a supervised learning problem, which assumes each interaction as an independent instance with side information encoded. Due to the overlook of the relations among instances or items (e.g., the director of a movie is also an actor of another movie), these methods are insufficient to distill the collaborative signal from the collective behaviors of users. In this work, we investigate the utility of knowledge graph (KG), which breaks down the independent interaction assumption by linking items with their attributes. We argue that in such a hybrid structure of KG and user-item graph, high-order relations --- which connect two items with one or multiple linked attributes --- are an essential factor for successful recommendation. We propose a new method named Knowledge Graph Attention Network (KGAT) which explicitly models the high-order connectivities in KG in an end-to-end fashion. It recursively propagates the embeddings from a node's neighbors (which can be users, items, or attributes) to refine the node's embedding, and employs an attention mechanism to discriminate the importance of the neighbors. Our KGAT is conceptually advantageous to existing KG-based recommendation methods, which either exploit high-order relations by extracting paths or implicitly modeling them with regularization. Empirical results on three public benchmarks show that KGAT significantly outperforms state-of-the-art methods like Neural FM and RippleNet. Further studies verify the efficacy of embedding propagation for high-order relation modeling and the interpretability benefits brought by the attention mechanism.

可理解性 · GAN · GANs · Better · 生成式對抗網絡 ·

2018 年 12 月 8 日

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

David Bau,Jun-Yan Zhu,Hendrik Strobelt,Bolei Zhou,Joshua B. Tenenbaum,William T. Freeman,Antonio Torralba

from arxiv, 18 pages, 19 figures

Generative Adversarial Networks (GANs) have recently achieved impressive results for many real-world applications, and many GAN variants have emerged with improvements in sample quality and training stability. However, they have not been well visualized or understood. How does a GAN represent our visual world internally? What causes the artifacts in GAN results? How do architectural choices affect GAN learning? Answering such questions could enable us to develop new insights and better models. In this work, we present an analytic framework to visualize and understand GANs at the unit-, object-, and scene-level. We first identify a group of interpretable units that are closely related to object concepts using a segmentation-based network dissection method. Then, we quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output. We examine the contextual relationship between these units and their surroundings by inserting the discovered object concepts into new images. We show several practical applications enabled by our framework, from comparing internal representations across different layers, models, and datasets, to improving GANs by locating and removing artifact-causing units, to interactively manipulating objects in a scene. We provide open source interpretation tools to help researchers and practitioners better understand their GAN models.

Capsule · CapsNet · Networking · 判別器 · MoDELS ·

2018 年 2 月 17 日

CapsuleGAN: Generative Adversarial Capsule Network

Ayush Jaiswal,Wael AbdAlmageed,Premkumar Natarajan

We present Generative Adversarial Capsule Network (CapsuleGAN), a framework that uses capsule networks (CapsNets) instead of the standard convolutional neural networks (CNNs) as discriminators within the generative adversarial network (GAN) setting, while modeling image data. We provide guidelines for designing CapsNet discriminators and the updated GAN objective function, which incorporates the CapsNet margin loss, for training CapsuleGAN models. We show that CapsuleGAN outperforms convolutional-GAN at modeling image data distribution on the MNIST dataset of handwritten digits, evaluated on the generative adversarial metric and at semi-supervised image classification.