ConRF: Zero-shot Stylization of 3D Scenes with Conditioned Radiation Fields

arXiv 2024

1Durham University, UK   2IHPC, A*STAR, Singapore   3Tencent Jarvis Research Center, China

Zero-shot 3D style transfer with a single condition. Given a set of multi-view content images of a 3D scene, ConRF can transfer an arbitrary text reference style or an arbitrary image reference style to the 3D scene in a zero-shot manner.

Abstract

Most existing works on arbitrary 3D NeRF style transfer require retraining for each individual style condition. This work aims to achieve zero-shot, controllable stylization of 3D scenes using text or visual input as the conditioning factor. We introduce ConRF, a novel zero-shot stylization method. Specifically, because CLIP features are ambiguous with respect to style, we employ a conversion process that maps the CLIP feature space to the style space of a pre-trained VGG network, and then refine this CLIP multi-modal knowledge into a style-transfer neural radiance field. Additionally, we use a 3D volumetric representation to perform local style transfer. By combining these operations, ConRF can take either text or images as references and generate novel-view sequences with global or local stylization. Our experiments demonstrate that ConRF outperforms existing methods for 3D scene stylization and single-text conditioned stylization in terms of visual quality.
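To make the CLIP-to-VGG conversion step concrete, below is a minimal, hypothetical PyTorch sketch of a small mapping network that projects a CLIP embedding (from either a text or an image encoder) into a VGG-like style code, which could then condition the stylization field. The module name, layer sizes, and dimensions are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: map a CLIP embedding into a VGG-style code.
# Names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn


class ClipToVggStyleMapper(nn.Module):
    """Projects a frozen CLIP embedding (text or image) into a style code
    intended to live in the style space of a pre-trained VGG network."""

    def __init__(self, clip_dim: int = 512, style_dim: int = 1024):
        super().__init__()
        self.mapper = nn.Sequential(
            nn.Linear(clip_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, style_dim),
        )

    def forward(self, clip_feat: torch.Tensor) -> torch.Tensor:
        # clip_feat: (B, clip_dim) embedding from a frozen CLIP encoder.
        # Returns: (B, style_dim) style code used to condition stylization.
        return self.mapper(clip_feat)


if __name__ == "__main__":
    mapper = ClipToVggStyleMapper()
    fake_clip_feature = torch.randn(2, 512)   # stand-in for a CLIP embedding
    style_code = mapper(fake_clip_feature)    # shape: (2, 1024)
    print(style_code.shape)
```

In practice, such a mapper would be trained so that style codes predicted from CLIP image embeddings match the style statistics extracted by VGG from the same images; at inference, a CLIP text embedding can then be passed through the same mapper, which is what enables text-conditioned stylization without retraining per style.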

Global image reference transfer

Global text reference transfer

the colors are bright and bold and the lines are dynamic

Chinese ink painting style

Local transfer

BibTeX

@misc{miao2024conrf,
  title={ConRF: Zero-shot Stylization of 3D Scenes with Conditioned Radiation Fields},
  author={Xingyu Miao and Yang Bai and Haoran Duan and Fan Wan and Yawen Huang and Yang Long and Yefeng Zheng},
  year={2024},
  eprint={2402.01950},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}