
This paper presents ER-NeRF, a novel conditional Neural Radiance Fields (NeRF) based architecture for talking portrait synthesis that can concurrently achieve fast convergence, real-time rendering, and state-of-the-art performance with small model size. Our idea is to explicitly exploit the unequal contribution of spatial regions to guide talking portrait modeling. Specifically, to improve the accuracy of dynamic head reconstruction, a compact and expressive NeRF-based Tri-Plane Hash Representation is introduced by pruning empty spatial regions with three planar hash encoders. For speech audio, we propose a Region Attention Module to generate a region-aware condition feature via an attention mechanism. Different from existing methods that utilize an MLP-based encoder to learn the cross-modal relation implicitly, the attention mechanism builds an explicit connection between audio features and spatial regions to capture the priors of local motions. Moreover, a direct and fast Adaptive Pose Encoding is introduced to optimize the head-torso separation problem by mapping the complex transformation of the head pose into spatial coordinates. Extensive experiments demonstrate that, compared to previous methods, our method renders talking portrait videos with higher fidelity and better audio-lip synchronization, with realistic details and high efficiency.

Related content

Attention

Extensibility · Performer · SimPLe · Linear · Round

September 8, 2023

Towards Practical Capture of High-Fidelity Relightable Avatars

Haotian Yang,Mingwu Zheng,Wanquan Feng,Haibin Huang,Yu-Kun Lai,Pengfei Wan,Zhongyuan Wang,Chongyang Ma

from arxiv, Accepted to SIGGRAPH Asia 2023 (Conference); Project page: //travatar-paper.github.io/

In this paper, we propose a novel framework, Tracking-free Relightable Avatar (TRAvatar), for capturing and reconstructing high-fidelity 3D avatars. Compared to previous methods, TRAvatar works in a more practical and efficient setting. Specifically, TRAvatar is trained with dynamic image sequences captured in a Light Stage under varying lighting conditions, enabling realistic relighting and real-time animation for avatars in diverse scenes. Additionally, TRAvatar allows for tracking-free avatar capture and obviates the need for accurate surface tracking under varying illumination conditions. Our contributions are two-fold: First, we propose a novel network architecture that explicitly builds on and ensures the satisfaction of the linear nature of lighting. Trained on simple group light captures, TRAvatar can predict the appearance in real-time with a single forward pass, achieving high-quality relighting effects under illuminations of arbitrary environment maps. Second, we jointly optimize the facial geometry and relightable appearance from scratch based on image sequences, where the tracking is implicitly learned. This tracking-free approach brings robustness for establishing temporal correspondences between frames under different lighting conditions. Extensive qualitative and quantitative experiments demonstrate that our framework achieves superior performance for photorealistic avatar animation and relighting.
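The "linear nature of lighting" mentioned in the abstract above is the superposition property of light transport: the appearance under a sum of lights equals the sum of the appearances under each light. The following is a minimal sketch of that property only, not of the TRAvatar network; the function name `relight`, the array shapes, and the random data are assumptions made for illustration.

```python
import numpy as np

def relight(basis_images: np.ndarray, light_weights: np.ndarray) -> np.ndarray:
    """Linearly combine per-light images (hypothetical helper).

    basis_images:  (num_lights, H, W, 3), appearance under each light alone.
    light_weights: (num_lights,), intensity of each light in the target lighting.
    """
    # Contract the light axis: sum_i weights[i] * basis_images[i].
    return np.tensordot(light_weights, basis_images, axes=1)

rng = np.random.default_rng(0)
basis = rng.random((4, 2, 2, 3))          # toy data: 4 lights, 2x2 RGB images
w_a = np.array([1.0, 0.0, 0.5, 0.0])      # first lighting condition
w_b = np.array([0.0, 2.0, 0.0, 0.25])     # second lighting condition

# Rendering under the summed lighting matches summing the two renderings.
combined = relight(basis, w_a + w_b)
separate = relight(basis, w_a) + relight(basis, w_b)
print(np.allclose(combined, separate))
```

A model that respects this property can, in principle, be trained on simple light patterns and still be evaluated under an arbitrary environment map expressed as a weighted sum of those patterns.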

MIMO · Correlation coefficient · Channel · Optimizer · INFORMS

September 8, 2023

Double RIS-Assisted MIMO Systems Over Spatially Correlated Rician Fading Channels and Finite Scatterers

Ha An Le,Trinh Van Chien,Van Duc Nguyen,Wan Choi

from arxiv, 15 pages, 9 figures, accepted by IEEE Transactions on Communications

This paper investigates double RIS-assisted MIMO communication systems over Rician fading channels with finite scatterers, spatial correlation, and the existence of a double-scattering link between the transceiver. First, the statistical information is derived in closed form for the aggregated channels, unveiling various influences of the system and environment on the average channel power gains. Next, we study two active and passive beamforming designs corresponding to two objectives. The first problem maximizes channel capacity by jointly optimizing the active precoding and combining matrices at the transceivers and passive beamforming at the double RISs subject to the transmitting power constraint. In order to tackle the inherently non-convex issue, we propose an efficient alternating optimization (AO) algorithm based on the alternating direction method of multipliers (ADMM). The second problem enhances communication reliability by jointly training the encoder and decoder at the transceivers and the phase shifters at the RISs. Each neural network representing a system entity in an end-to-end learning framework is proposed to minimize the symbol error rate of the detected symbols by controlling the transceiver and the RIS phase shifts. Numerical results verify our analysis and demonstrate the superior improvements of phase shift designs to boost system performance.

Precision/Accuracy · CNN · Robot · Reducible · Loss

September 7, 2023

Efficient Single Object Detection on Image Patches with Early Exit Enhanced High-Precision CNNs

Arne Moos

This paper proposes a novel approach for detecting objects using mobile robots in the context of the RoboCup Standard Platform League, with a primary focus on detecting the ball. The challenge lies in detecting a dynamic object in varying lighting conditions and blurred images caused by fast movements. To address this challenge, the paper presents a convolutional neural network architecture designed specifically for computationally constrained robotic platforms. The proposed CNN is trained to achieve high precision classification of single objects in image patches and to determine their precise spatial positions. The paper further integrates Early Exits into the existing high-precision CNN architecture to reduce the computational cost of easily rejectable cases in the background class. The training process involves a composite loss function based on confidence and positional losses with dynamic weighting and data augmentation. The proposed approach achieves a precision of 100% on the validation dataset and a recall of almost 87%, while maintaining an execution time of around 170 $\mu$s per hypothesis. By combining the proposed approach with an Early Exit, a runtime optimization of more than 28%, on average, can be achieved compared to the original CNN. Overall, this paper provides an efficient solution for enhanced detection of objects, especially the ball, in computationally constrained robotic platforms.
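The Early Exit idea in the abstract above can be sketched as a two-stage classifier in which a cheap first stage rejects obvious background patches, so the expensive stage runs only on the remaining candidates. This is a minimal illustration under stated assumptions, not the paper's architecture: the function names, the threshold value, and the toy brightness-based stages are all hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Detection = Tuple[float, float, float]  # (confidence, x, y); hypothetical layout

def classify_with_early_exit(
    patch: Sequence[float],
    early_stage: Callable[[Sequence[float]], float],
    full_stage: Callable[[Sequence[float]], Detection],
    reject_threshold: float = 0.9,
) -> Optional[Detection]:
    """Return None if the cheap stage confidently rejects the patch as background."""
    background_score = early_stage(patch)
    if background_score >= reject_threshold:
        return None  # early exit: the expensive stage is skipped
    return full_stage(patch)

# Toy stand-ins for the two network stages (assumptions, not real models).
calls = {"full": 0}

def toy_early(patch: Sequence[float]) -> float:
    return 1.0 - sum(patch) / len(patch)  # dark patch -> high background score

def toy_full(patch: Sequence[float]) -> Detection:
    calls["full"] += 1
    return (sum(patch) / len(patch), 0.5, 0.5)

result_dark = classify_with_early_exit([0.0, 0.05, 0.0, 0.05], toy_early, toy_full)
result_bright = classify_with_early_exit([0.9, 0.8, 0.9, 0.8], toy_early, toy_full)
print(result_dark, result_bright, calls["full"])
```

The runtime saving depends on how many patches exit early, which is consistent with the abstract reporting an average rather than a fixed speed-up.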

massive MIMO · MIMO · SAS · Directed · Optimizer

September 6, 2023

Sub-Array Selection in Full-Duplex Massive MIMO for Enhanced Self-Interference Suppression

Mobeen Mahmood,Asil Koc,Duc Tuong Nguyen,Robert Morawski,Tho Le-Ngoc

from arxiv, This paper has been accepted for publication in IEEE Globecom 2023

This study considers a novel full-duplex (FD) massive multiple-input multiple-output (mMIMO) system using hybrid beamforming (HBF) architecture, which allows for simultaneous uplink (UL) and downlink (DL) transmission over the same frequency band. Particularly, our objective is to mitigate the strong self-interference (SI) solely through the design of the UL and DL RF beamforming stages jointly with sub-array selection (SAS) for transmit (Tx) and receive (Rx) sub-arrays at the base station (BS). Based on the measured SI channel in an anechoic chamber, we propose a min-SI beamforming scheme with SAS, which applies perturbations to the beam directivity to enhance SI suppression in UL and DL beam directions. To solve this challenging nonconvex optimization problem, we propose a swarm intelligence-based algorithmic solution to find the optimal perturbations as well as the Tx and Rx sub-arrays to minimize SI subject to the directivity degradation constraints for the UL and DL beams. The results show that the proposed min-SI BF scheme can achieve SI suppression as high as 78 dB in FD mMIMO systems.

Piecewise · Microsoft Surface · Example · Smooth · Shaping

September 6, 2023

A High-Order Ultra-Weak Variational Formulation for Electromagnetic Waves Utilizing Curved Elements

Timo Lähivaara,William Hall,Matti Malinen,Dale Ota,Vijaya Shankar,Peter Monk

The Ultra Weak Variational Formulation (UWVF) is a special Trefftz discontinuous Galerkin method, here applied to the time-harmonic Maxwell's equations. The method uses superpositions of plane waves to represent solutions element by element on a finite element mesh. We discuss the use of our parallel UWVF implementation called ParMax, and concentrate on methods for obtaining high order solutions in the presence of scatterers with piecewise smooth boundaries. In particular, we show how curved surface triangles can be incorporated in the UWVF. This requires quadrature to assemble the system matrices. We also show how to implement a total field and scattered field approach, together with the transmission conditions across an interface to handle resistive sheets. We note also that a wide variety of element shapes can be used, that the elements can be large compared to the wavelength of the radiation, and that a matrix free version is easy to implement (although computationally costly). Our contributions are illustrated by several numerical examples showing that curved elements can improve the efficiency of the UWVF, and that the method accurately handles resistive screens as well as PEC and penetrable scatterers. Using large curved elements and the matrix free approach, we are able to simulate scattering from an aircraft at X-band frequencies. The innovations here demonstrate the applicability of the UWVF for industrial examples.

FRN · INFORMS · Networking · MoDELS · Learned

April 12, 2021

Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition

Delian Ruan,Yan Yan,Shenqi Lai,Zhenhua Chai,Chunhua Shen,Hanzi Wang

from arxiv, IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021 (CVPR 2021)

In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules including an intra-feature relation modeling module and an inter-feature relation modeling module are developed in FRN. Experimental results on both the in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and the in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.

Performer · Extensibility · Federated learning · Similarity · Pairwise

January 7, 2021

Personalized Cross-Silo Federated Learning on Non-IID Data

Yutao Huang,Lingyang Chu,Zirui Zhou,Lanjun Wang,Jiangchuan Liu,Jian Pei,Yong Zhang

from arxiv, Accepted by AAAI 2021. The API of this work is available at Huawei Cloud (//t.ly/nGN9), free registration is required before use

Non-IID data present a tough challenge for federated learning. In this paper, we explore a novel idea of facilitating pairwise collaborations between clients with similar data. We propose FedAMP, a new method employing federated attentive message passing to encourage similar clients to collaborate more. We establish the convergence of FedAMP for both convex and non-convex models, and propose a heuristic method to further improve the performance of FedAMP when clients adopt deep neural networks as personalized models. Our extensive experiments on benchmark data sets demonstrate the superior performance of the proposed methods.

entity · Few-shot learning · Attention mechanism · Graph · Networking

October 19, 2020

Adaptive Attentional Network for Few-Shot Knowledge Graph Completion

Jiawei Sheng,Shu Guo,Zhenyu Chen,Juwei Yue,Lihong Wang,Tingwen Liu,Hongbo Xu

from arxiv, 11 pages, 3 figures

Few-shot Knowledge Graph (KG) completion is a focus of current research, where each task aims at querying unseen facts of a relation given its few-shot reference entity pairs. Recent attempts solve this problem by learning static representations of entities and references, ignoring their dynamic properties, i.e., entities may exhibit diverse roles within task relations, and references may make different contributions to queries. This work proposes an adaptive attentional network for few-shot KG completion by learning adaptive entity and reference representations. Specifically, entities are modeled by an adaptive neighbor encoder to discern their task-oriented roles, while references are modeled by an adaptive query-aware aggregator to differentiate their contributions. Through the attention mechanism, both entities and references can capture their fine-grained semantic meanings, and thus render more expressive representations. This will be more predictive for knowledge acquisition in the few-shot scenario. Evaluation in link prediction on two public datasets shows that our approach achieves new state-of-the-art results with different few-shot sizes.
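The query-aware aggregation described in the abstract above, in which different references contribute differently to each query, can be illustrated with a generic attention-weighted average. This is only a sketch of the general mechanism: the dot-product scoring, the function names, and the toy embeddings are assumptions, and the paper's actual aggregator may differ.

```python
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

def aggregate_references(query: np.ndarray, references: np.ndarray):
    """Weight each reference embedding by its similarity to the query.

    query:      (d,) embedding of the query entity pair.
    references: (K, d) embeddings of the K few-shot reference pairs.
    Returns the (d,) aggregated representation and the (K,) attention weights.
    """
    scores = references @ query      # dot-product similarity (assumed scoring)
    weights = softmax(scores)        # contributions sum to 1
    return weights @ references, weights

# Toy embeddings: the first reference points in the same direction as the query.
references = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
query = np.array([2.0, 0.0])

aggregated, weights = aggregate_references(query, references)
print(weights, aggregated)
```

With a static (query-independent) average, all three references would receive equal weight regardless of the query.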

INFORMS · Few-shot learning · Graph · Correlation coefficient · Networking

November 21, 2019

Knowledge Graph Transfer Network for Few-Shot Recognition

Riquan Chen,Tianshui Chen,Xiaolu Hui,Hefeng Wu,Guanbin Li,Liang Lin

from arxiv, accepted by AAAI 2020 as oral paper

Few-shot learning aims to learn novel categories from very few samples given some base categories with sufficient training samples. The main challenge of this task is that the novel categories are prone to being dominated by the color, texture, or shape of the object or by the background context (namely specificity), which are distinct for the given few training samples but not common to the corresponding categories (see Figure 1). Fortunately, we find that transferring information from the correlated base categories can help learn the novel concepts and thus avoid the novel concept being dominated by the specificity. Besides, incorporating semantic correlations among different categories can effectively regularize this information transfer. In this work, we represent the semantic correlations in the form of a structured knowledge graph and integrate this graph into deep neural networks to promote few-shot learning by a novel Knowledge Graph Transfer Network (KGTN). Specifically, by initializing each node with the classifier weight of the corresponding category, a propagation mechanism is learned to adaptively propagate node messages through the graph to explore node interaction and transfer classifier information of the base categories to those of the novel ones. Extensive experiments on the ImageNet dataset show significant performance improvement compared with current leading competitors. Furthermore, we construct an ImageNet-6K dataset that covers larger scale categories, i.e., 6,000 categories, and experiments on this dataset further demonstrate the effectiveness of our proposed model.

Meta-learning · Speech recognition · MAML · Learned · End-to-end

October 26, 2019

Meta Learning for End-to-End Low-Resource Speech Recognition

Jui-Yang Hsu,Yuan-Jui Chen,Hung-yi Lee

from arxiv, 5 pages, submitted to ICASSP 2020

In this paper, we proposed to apply a meta-learning approach to low-resource automatic speech recognition (ASR). We formulated ASR for different languages as different tasks, and meta-learned the initialization parameters from many pretraining languages to achieve fast adaptation to an unseen target language, via the recently proposed model-agnostic meta-learning algorithm (MAML). We evaluated the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results showed that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, owing to MAML's model-agnostic property, this paper also opens a new research direction of applying meta learning to more speech-related applications.
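The idea of meta-learning an initialization with MAML can be sketched on a toy problem where each "task" is a scalar quadratic loss. This is not the MetaASR setup: the quadratic losses, the function name `maml_quadratic`, and all hyperparameters are assumptions chosen so that the gradients can be written out by hand.

```python
from typing import Sequence

def maml_quadratic(
    task_centers: Sequence[float],
    inner_lr: float = 0.1,
    outer_lr: float = 0.1,
    steps: int = 200,
    theta: float = 0.0,
) -> float:
    """Meta-learn an initialization theta for tasks L_c(theta) = (theta - c)^2."""
    for _ in range(steps):
        meta_grad = 0.0
        for c in task_centers:
            # Inner loop: one gradient step on this task, starting from theta.
            adapted = theta - inner_lr * 2.0 * (theta - c)
            # Outer gradient of the post-adaptation loss (adapted - c)^2 with
            # respect to theta; d(adapted)/d(theta) = 1 - 2 * inner_lr.
            meta_grad += 2.0 * (adapted - c) * (1.0 - 2.0 * inner_lr)
        # Meta-update: move the shared initialization using the averaged gradient.
        theta -= outer_lr * meta_grad / len(task_centers)
    return theta

# Three toy tasks; the learned initialization settles between their optima.
centers = [-1.0, 0.5, 2.0]
theta_init = maml_quadratic(centers)
print(theta_init)
```

In this toy case the meta-learned initialization converges to the mean of the task optima, the point from which a single inner step reduces the average loss the most. In the speech setting, the tasks would be languages and theta would be the network weights.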