EVA-Gaussian: 3D Gaussian-Based Real-time Human Novel View Synthesis Under Diverse Camera Settings


Yingdong Hu1, Zhening Liu1, Jiawei Shao1,2, Zehong Lin1*, Jun Zhang1

1The Hong Kong University of Science and Technology    2Institute of Artificial Intelligence (TeleAI), China Telecom

*Corresponding author

Abstract


We propose a three-stage pipeline named EVA-Gaussian for 3D human novel view synthesis across diverse camera settings. Specifically, we first introduce an Efficient cross-View Attention (EVA) module to accurately estimate the position of each 3D Gaussian from the source images. Then, we integrate the source images with the estimated Gaussian position map to predict the attributes and feature embeddings of the 3D Gaussians. Finally, we employ a recurrent feature refiner to correct artifacts caused by geometric errors in position estimation and enhance visual fidelity. To further improve synthesis quality, we incorporate a powerful anchor loss function for both 3D Gaussian attributes and human face landmarks. Experimental results on the THuman2.0 and THumansit datasets showcase the superiority of our EVA-Gaussian approach in rendering quality across diverse camera settings.


Video



Free View Rendering



Method Overview


 

Overview of EVA-Gaussian. EVA-Gaussian takes sparse-view images captured around a human subject as input and performs three key stages: (1) estimating the positions of the 3D Gaussians, (2) inferring the remaining attributes of these Gaussians (i.e., opacities, scales, quaternions, and features), and (3) refining the output image in a recurrent manner.

 

EVA Module


 

Efficient cross-View Attention (EVA) module for 3D Gaussian position estimation. EVA takes multi-view image features as input, embeds them into window patches using a shifted-window scheme, and performs cross-view attention between the features from different views.
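
To make the idea of windowed cross-view attention concrete, the following is a minimal NumPy sketch, not the EVA implementation. It assumes two views with feature maps of shape (H, W, C), windows taken along the image width only, identity query/key/value projections, and a cyclic roll to realize the window shift. The function name `window_cross_attention` and all of its parameters are hypothetical and chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def window_cross_attention(feat_q, feat_kv, window, shift=0):
    """Attend from one view (queries) to another view (keys/values).

    feat_q, feat_kv: arrays of shape (H, W, C); W must be divisible by window.
    window: window length along the width axis (assumption for this sketch).
    shift: cyclic shift applied before partitioning, e.g. window // 2.
    """
    H, W, C = feat_q.shape
    assert feat_kv.shape == (H, W, C) and W % window == 0

    # Cyclically shift both views so that window borders move.
    # Note: a full implementation would mask the wrap-around region.
    q = np.roll(feat_q, -shift, axis=1)
    kv = np.roll(feat_kv, -shift, axis=1)

    # Partition each row into non-overlapping windows: (H, W // window, window, C).
    q = q.reshape(H, W // window, window, C)
    kv = kv.reshape(H, W // window, window, C)

    # Scaled dot-product attention inside each window, across the two views.
    scores = q @ kv.transpose(0, 1, 3, 2) / np.sqrt(C)
    attn = softmax(scores, axis=-1)
    out = attn @ kv

    # Merge windows and undo the shift.
    return np.roll(out.reshape(H, W, C), shift, axis=1)


rng = np.random.default_rng(0)
view_a = rng.normal(size=(4, 8, 3))
view_b = rng.normal(size=(4, 8, 3))
fused = window_cross_attention(view_a, view_b, window=4, shift=2)
print(fused.shape)
```

Restricting attention to windows keeps the cost per pixel proportional to the window length rather than the full image width, and alternating between unshifted and shifted windows lets information cross window borders over successive layers.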

 

Regularization Loss


 

Attribute regularization. We regularize the opacities and scales of the Gaussians, as well as the position mismatches among the Gaussians in the landmark collection. The position mismatch is no longer optimized once it falls below a specific tolerance.
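
The tolerance on the landmark position mismatch can be illustrated with a small sketch. This is one possible reading, not the loss used by EVA-Gaussian: it assumes the mismatch is the Euclidean distance between the 3D positions predicted for the same landmark from two views, and that pairs already within the tolerance contribute nothing to the loss. The function name `landmark_anchor_loss` and the tolerance value are hypothetical.

```python
import math


def landmark_anchor_loss(points_a, points_b, tolerance):
    """Mean position mismatch over landmark pairs, ignoring pairs within tolerance.

    points_a, points_b: equal-length sequences of 3D points for the same
    landmarks, as estimated from two different source views (assumption).
    """
    total = 0.0
    for pa, pb in zip(points_a, points_b):
        mismatch = math.dist(pa, pb)
        # Pairs that already agree within the tolerance are not penalized,
        # so they stop receiving gradient in an actual training setup.
        if mismatch > tolerance:
            total += mismatch
    return total / len(points_a)


view_a_landmarks = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
view_b_landmarks = [(0.0, 0.0, 0.01), (1.0, 0.0, 2.0)]
loss = landmark_anchor_loss(view_a_landmarks, view_b_landmarks, tolerance=0.05)
print(loss)
```

In this example only the second landmark pair exceeds the tolerance, so it alone contributes to the loss.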

 

Visualization


 


Citation



  @misc{hu2024evagaussian3dgaussianbasedrealtime,
    title={EVA-Gaussian: 3D Gaussian-based Real-time Human Novel View Synthesis under Diverse Camera Settings}, 
    author={Yingdong Hu and Zhening Liu and Jiawei Shao and Zehong Lin and Jun Zhang},
    year={2024},
    eprint={2410.01425},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2410.01425}, 
}