Neural Free-Viewpoint Performance Rendering under
Complex Human-object Interactions

Guoxing Sun1   Xin Chen1,2   Yizhang Chen1   Anqi Pang1,2   Yuheng Jiang1   Pei Lin1   Lan Xu1   Jingya Wang1,*   Jingyi Yu1,*  


Our approach achieves photo-realistic reconstruction results of human activities in novel views under challenging human-object interactions, using only six RGB cameras. (a) Capture setting. (b) Our results.


4D reconstruction of human-object interaction is critical for immersive VR/AR experiences and human activity understanding. Recent methods still fail to recover fine geometry and texture from sparse RGB inputs, especially in challenging human-object interaction scenarios. In this paper, we propose a neural human performance capture and rendering system that generates high-quality geometry and photo-realistic texture for both humans and objects under challenging interaction scenarios in arbitrary novel views, from only sparse RGB streams. To deal with the complex occlusions caused by human-object interactions, we adopt a layer-wise scene decoupling strategy and perform volumetric reconstruction and neural rendering of the human and the object. Specifically, for geometry reconstruction, we propose an interaction-aware human-object capture scheme that jointly considers human reconstruction and object reconstruction together with their correlations. Occlusion-aware human reconstruction and robust human-aware object tracking are proposed for consistent 4D human-object dynamic reconstruction. For neural texture rendering, we propose a layer-wise human-object rendering scheme, which combines direction-aware neural blending weight learning and spatial-temporal texture completion to provide high-resolution and photo-realistic texture results in occluded scenarios. Extensive experiments demonstrate the effectiveness of our approach in achieving high-quality geometry and texture reconstruction in free viewpoints for challenging human-object interactions.
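As a rough illustration of what direction-aware blending with occlusion handling can look like, the sketch below weights each source camera by how closely its viewing direction matches the target view and gives occluded cameras zero weight. This is not the method described above, which learns its blending weights with a neural network: the softmax-over-cosine rule is a hand-crafted stand-in, and the names `blend_weights`, `blend_colors`, and `sharpness` are hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length direction")
    return tuple(c / n for c in v)


def blend_weights(target_dir, source_dirs, visible, sharpness=10.0):
    """Hand-crafted stand-in for learned blending weights (assumption).

    Each visible source view is scored by the cosine between its viewing
    direction and the target direction; scores pass through a softmax.
    Views flagged as occluded (visible=False) receive weight 0.
    """
    t = normalize(target_dir)
    scores = []
    for d, vis in zip(source_dirs, visible):
        if not vis:
            scores.append(None)  # occluded: excluded from the softmax
            continue
        s = normalize(d)
        scores.append(sharpness * sum(a * b for a, b in zip(t, s)))

    valid = [s for s in scores if s is not None]
    if not valid:
        # No camera sees this point; a texture completion step would be
        # needed here, which this sketch does not attempt.
        return [0.0] * len(scores)

    m = max(valid)  # subtract the max for numerical stability
    exps = [0.0 if s is None else math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend_colors(colors, weights):
    """Weighted sum of per-view RGB samples for one surface point."""
    return tuple(sum(w * c[i] for w, c in zip(weights, colors)) for i in range(3))


if __name__ == "__main__":
    target = (0.0, 0.0, 1.0)
    sources = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
    visible = [True, True, False]  # third camera is blocked by the object
    w = blend_weights(target, sources, visible)
    print(w)
    print(blend_colors([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)], w))
```

The third camera shares the target direction but is marked occluded, so it contributes nothing. A purely direction-based weight would have favored it.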

Comparison on novel view synthesis

The compared methods are [1] NHFVV, [2] NHR, and [3] NeRF.


Technical Paper


This work was supported by NSFC programs (61976138, 61977047), the National Key Research and Development Program (2018YFB2100500), STCSM (2015F0203-000-06), SHMEC (2019-01-07-00-01-E00003) and Shanghai YangFan Program (21YF1429500). We thank Yannan He and Han Liang for discussions and help with human tracking.


      title={Neural Free-Viewpoint Performance Rendering under Complex Human-object Interactions}, 
      author={Sun, Guoxing and Chen, Xin and Chen, Yizhang and Pang, Anqi and Lin, Pei and Jiang, Yuheng and Xu, Lan and Wang, Jingya and Yu, Jingyi},
      booktitle={Proceedings of the 29th ACM International Conference on Multimedia}, 

This project page's template is adapted from TightCap and Neural Actor.