Tour into the Picture
using a spidery mesh interface to make animation from a single image

TUM CV Challenge SS24


Wenbo Ji, Xiang Ji, Hongru Li, Yuming Li, Shilin Zhang

Group 31
Technical University of Munich

Original input image for Tour into the Picture

Foreground cutout result for Tour into the Picture

Motivation


Building on the "Tour into the Picture" (TIP) approach [2], we aim to develop algorithms that automatically infer two key structures from a single 2D image: the regular, program-like textures or patterns on 2D planes, and the 3D position of those planes within the scene.

For example, from the single metro station image below, we can infer the camera pose, partition the image into distinct planes (walls, floor, ceiling, and far plane), and recognize repeated patterns.

This enables flexible image editing, such as inpainting, moving the camera, and extending the image space. These operations require an understanding of the scene's 3D structure and real-time rendering of spatially consistent views.

Scene decomposition into foreground, background planes, and vanishing point geometry

Introduction


The project aims to develop a graphical user interface (GUI) that allows users to extract a simple scene model from a single 2D image, facilitating easy animation and scene manipulation.

With our GUI, users can intuitively distinguish between foreground and background objects. The background geometry is approximated using simple polygons, forming a polyhedral model with the vanishing point at its base.
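The polygon layout described above can be sketched in a few lines. This is a minimal illustration of a spidery-mesh style construction, not the project's actual implementation: it assumes the user has supplied a vanishing point and an axis-aligned back-wall rectangle in pixel coordinates, and the names `ray_to_border` and `spidery_mesh` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def ray_to_border(vp: Point, corner: Point, width: float, height: float) -> Point:
    """Extend the ray from vp through corner until it reaches the image border.

    Assumes vp lies inside the image and corner differs from vp.
    """
    dx, dy = corner[0] - vp[0], corner[1] - vp[1]
    # Ray parameter t at which the ray meets each border it is heading toward.
    candidates = []
    if dx > 0:
        candidates.append((width - vp[0]) / dx)
    elif dx < 0:
        candidates.append((0.0 - vp[0]) / dx)
    if dy > 0:
        candidates.append((height - vp[1]) / dy)
    elif dy < 0:
        candidates.append((0.0 - vp[1]) / dy)
    t = min(candidates)  # the first border the ray reaches
    return (vp[0] + t * dx, vp[1] + t * dy)


def spidery_mesh(
    vp: Point,
    back_wall: Tuple[float, float, float, float],
    width: float,
    height: float,
) -> Dict[str, List[Point]]:
    """Return the back-wall corners and the points where the radial lines exit the image.

    back_wall is (left, top, right, bottom) in pixels, with the y axis pointing down.
    Corners are listed as top-left, top-right, bottom-right, bottom-left.
    """
    left, top, right, bottom = back_wall
    if not (left < vp[0] < right and top < vp[1] < bottom):
        raise ValueError("vanishing point must lie strictly inside the back wall")
    inner = [(left, top), (right, top), (right, bottom), (left, bottom)]
    outer = [ray_to_border(vp, corner, width, height) for corner in inner]
    return {"back_wall": inner, "border_hits": outer}


mesh = spidery_mesh(vp=(50.0, 50.0), back_wall=(40.0, 40.0, 60.0, 60.0),
                    width=100.0, height=100.0)
print(mesh["border_hits"])
```

Each pair of neighbouring radial lines, together with the back-wall edge between them, bounds one of the four side planes (ceiling, right wall, floor, left wall). The back wall itself is the fifth plane.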

Specifying the vanishing point is also user-driven, ensuring that the virtual vanishing point aligns with the user's perception.

Finally, users can determine the proximity of objects in the scene, effectively setting camera parameters to position foreground objects as desired.
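One common way to turn such a choice into numbers is the pinhole relation between a floor contact point and its depth. The sketch below is an assumption for illustration, not the method confirmed by the project: it supposes the camera looks parallel to the floor, the vanishing point lies on the horizon, and the focal length (in pixels) and camera height are chosen by the user. The function names are hypothetical.

```python
def floor_depth(y_contact: float, y_vanish: float,
                focal_px: float, camera_height: float) -> float:
    """Depth of a point on the floor, from the image row where it appears.

    With the image y axis pointing down, a floor point at depth Z projects to
    y = y_vanish + focal_px * camera_height / Z, so Z follows by inversion.
    """
    dy = y_contact - y_vanish
    if dy <= 0:
        raise ValueError("floor contact must lie below the vanishing point")
    return focal_px * camera_height / dy


def billboard_height(pixel_height: float, depth: float, focal_px: float) -> float:
    """Scene-space height of an upright foreground cutout placed at the given depth."""
    return pixel_height * depth / focal_px


# Hypothetical values: horizon at row 300, object touches the floor at row 600.
depth = floor_depth(y_contact=600.0, y_vanish=300.0, focal_px=800.0, camera_height=1.5)
print(depth, billboard_height(pixel_height=200.0, depth=depth, focal_px=800.0))
```

Under these assumptions, changing the focal length or camera height rescales every depth by the same factor, which is why the user's choice of camera parameters determines how near or far the foreground objects appear.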

Tour into the Picture GUI workflow for extracting a scene model from one image

Method


Overview of our method: Step 1, data selection; Step 2, image decomposition; Step 3, fitting the perspective projection; Step 4, camera positioning.


Pipeline: data selection, image decomposition, perspective projection fitting, and camera positioning
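To make the last step concrete, the following sketch shows how a point of the reconstructed scene would be re-projected once the virtual camera moves. It is a generic pinhole projection under stated assumptions (camera translated but not rotated, y axis pointing down, z axis pointing into the picture), not the project's own renderer. The name `project` and all numeric values are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def project(point: Vec3, camera: Vec3, focal_px: float,
            principal: Tuple[float, float]) -> Tuple[float, float]:
    """Project a 3D scene point into the image of a translated pinhole camera."""
    # Express the point relative to the camera position.
    x = point[0] - camera[0]
    y = point[1] - camera[1]
    z = point[2] - camera[2]
    if z <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division followed by a shift to the principal point.
    return (principal[0] + focal_px * x / z, principal[1] + focal_px * y / z)


corner = (1.0, 1.5, 4.0)  # a floor corner of the scene box, in scene units
print(project(corner, (0.0, 0.0, 0.0), 800.0, (400.0, 300.0)))  # original view
print(project(corner, (0.0, 0.0, 2.0), 800.0, (400.0, 300.0)))  # camera moved forward
```

Moving the camera forward pushes the projected corner away from the principal point, which is the zoom-in effect seen when touring into the picture.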


Experimental Results


Custom data - Chilli room

Tour into the Picture result on a chilli room image

Simple complexity - Simple room

Tour into the Picture result on a simple room image

Medium complexity - Museum

Tour into the Picture result on a museum image

Medium complexity - Shopping mall

Tour into the Picture result on a shopping mall image

References


  
    [1] Zhiqing Cao, Xin Sun, and Jiaoying Shi.
    Tour into the picture using relative depth calculation.
    In Proceedings of the 2004 ACM SIGGRAPH International Conference on Virtual Reality Continuum and its Applications in Industry, pages 38–44, 2004.
    [2] Youichi Horry, Ken-Ichi Anjyo, and Kiyoshi Arai.
    Tour into the picture: using a spidery mesh interface to make animation from a single image.
    In Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’97, pages 225–232, USA, 1997. ACM Press/Addison-Wesley Publishing Co.
    [3] Jian Liu, Kuangrong Hao, Huan Liu, and Yongsheng Ding.
    An improved algorithm based on TIP using a vanishing line.
    In 2013 IEEE Third International Conference on Information Science and Technology (ICIST), pages 546–549. IEEE, 2013.
    [4] Guihang Wang, Xuejin Chen, and Si Chen.
    Cut-and-fold: Automatic 3D modeling from a single image.
    In 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), pages 1–6, 2014.