Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

Loading...
Thumbnail Image
File version

Accepted Manuscript (AM)

Author(s)
Li, Jiahe
Zhang, Jiawei
Bai, Xiao
Zhou, Jun
Gu, Lin
Griffith University Author(s)
Primary Supervisor
Other Supervisors
Editor(s)
Date
2023
Size
File type(s)
Location

Paris, France

License
Abstract

This paper presents ER-NeRF, a novel conditional Neural Radiance Fields (NeRF) based architecture for talking portrait synthesis that can concurrently achieve fast convergence, real-time rendering, and state-of-the-art performance with small model size. Our idea is to explicitly exploit the unequal contribution of spatial regions to guide talking portrait modeling. Specifically, to improve the accuracy of dynamic head reconstruction, a compact and expressive NeRF-based Tri-Plane Hash Representation is introduced by pruning empty spatial regions with three planar hash encoders. For speech audio, we propose a Region Attention Module to generate region-aware condition feature via an attention mechanism. Different from existing methods that utilize an MLP-based encoder to learn the cross-modal relation implicitly, the attention mechanism builds an explicit connection between audio features and spatial regions to capture the priors of local motions. Moreover, a direct and fast Adaptive Pose Encoding is introduced to optimize the head-torso separation problem by mapping the complex transformation of the head pose into spatial coordinates. Extensive experiments demonstrate that our method renders better high-fidelity and audio-lips synchronized talking portrait videos, with realistic details and high efficiency compared to previous methods. Code is available at https://github.com/Fictionarry/ER-NeRF.

Journal Title
Conference Title

2023 IEEE/CVF International Conference on Computer Vision (ICCV)

Book Title
Edition
Volume
Issue
Thesis Type
Degree Program
School
Publisher link
Patent number
Funder(s)
Grant identifier(s)
Rights Statement
Rights Statement

This work is covered by copyright. You must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a specified licence, refer to the licence for details of permitted re-use. If you believe that this work infringes copyright please make a copyright takedown request using the form at https://www.griffith.edu.au/copyright-matters.

Item Access Status
Note
Access the data
Related item(s)
Subject
Persistent link to this record
Citation

Li, J; Zhang, J; Bai, X; Zhou, J; Gu, L, Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis, 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 7534-7544