By Aljosa Smolic, Disney Research Zurich
Stereoscopic 3D (S3D) has reached wide adoption in consumer and professional markets. The current success of S3D is due to the fact that both the technology and the understanding of content creation have reached a high level of maturity. However, production of high-quality S3D content remains a difficult and expensive art.
S3D production has to consider the fundamentals of human 3D perception as well as the capabilities and limitations of 3D displays, and combine them with artistic intent. To help with this, Disney Research Zurich has developed advanced S3D production tools, algorithms, and systems, a key component of which is awareness of the disparity or depth composition of the input S3D content. In some cases, sparse but highly robust and accurate disparity information is estimated automatically. Other algorithms estimate dense disparity or depth maps. User interaction is part of some of the concepts, while others are fully automatic.
Fundamentals and limitations of stereo 3D perception
S3D content creation has to provide a pleasing and expressive mapping of the broad real 3D world into the limited stereoscopic comfort zone, to create the depth illusion. The fact that this is a difficult art was the motivation for developing the tools, algorithms, and systems outlined in the following discussion.
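The comfort-zone constraint follows from simple viewing geometry: the depth at which a point is perceived depends on its on-screen parallax, the viewing distance, and the viewer's eye separation. The sketch below illustrates this standard relation; the specific parameter values (2 m viewing distance, 65 mm eye separation) are illustrative assumptions, not part of the tools described here.

```python
def perceived_depth(parallax_mm, viewing_distance_mm=2000.0, eye_sep_mm=65.0):
    """Perceived depth Z of a point with screen parallax p (positive =
    behind the screen), from similar triangles: Z = V * e / (e - p)."""
    if parallax_mm >= eye_sep_mm:
        # Parallax at or beyond the eye separation forces the eyes to
        # diverge -- perceptually "beyond infinity" and uncomfortable.
        return float("inf")
    return viewing_distance_mm * eye_sep_mm / (eye_sep_mm - parallax_mm)
```

At zero parallax the point sits on the screen plane; small positive parallax pushes it behind the screen, and the depth grows rapidly (and then diverges) as parallax approaches the eye separation, which is why only a narrow parallax range is comfortable.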
Computational stereo camera
Our computational stereo camera system features a closed control loop from analysis to automatic adjustments of the physical camera and rig properties. Our freely-programmable architecture comprises a high-performance computational unit that analyses the scene in real-time (e.g., by computing dense disparity and by tracking scene elements) and implements knowledge from stereography. For efficient camera operation, we devise a set of interaction metaphors that abstract the actual camera rig operations into intuitive gestures. The operator controls the camera using a multitouch stereoscopic user interface, which also enables instant monitoring of the S3D content and the related stereo parameters.
On-set analysis and monitoring of stereoscopic video play an important role in S3D productions. Our stereo analyser helps crews detect camera and lens misalignments, and can remove vertical disparities as well as keystoning automatically and in real-time. Our system furthermore analyses and visualises the horizontal disparity distribution, and warns the user in case of uncomfortable settings.
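A disparity-distribution check of this kind can be sketched as follows. This is a minimal illustration, not the analyser's actual implementation: the comfort range and outlier tolerance are hypothetical parameters, and a real system would tune them per screen size and viewing setup.

```python
import numpy as np

def disparity_warning(disparity, comfort_range=(-30.0, 60.0), tolerance=0.01):
    """Histogram a frame's disparity values (in pixels) and flag the frame
    if more than `tolerance` of them fall outside the comfort range."""
    disparity = np.asarray(disparity, dtype=float).ravel()
    lo, hi = comfort_range
    hist, edges = np.histogram(disparity, bins=64)
    frac_outside = np.mean((disparity < lo) | (disparity > hi))
    return hist, edges, bool(frac_outside > tolerance)
```

The returned histogram is what an on-set monitor would visualise over time, while the boolean flag drives the warning.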
Depth script visualisation, disparity histograms
As 3D movie making becomes more popular, the artistic desire to use depth as an important storytelling element increases. Filmmakers carefully plan and design depth throughout the movie. We therefore developed a production tool that visualises depth over individual takes or through an entire movie. The image below shows a typical output of our tool.
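The core of such a visualisation is a per-frame summary of the disparity distribution, plotted over time as a depth chart. A minimal sketch, assuming per-frame disparity maps are already available (robust percentiles stand in for near/far planes to suppress outliers; the exact statistics used by the production tool are not specified in this article):

```python
import numpy as np

def depth_script(disparity_maps):
    """Summarise a take as (frame, near, far) rows, where near/far are
    robust 2nd/98th percentiles of each frame's disparity map. Plotting
    the near/far envelope over the frame index yields a depth chart."""
    rows = []
    for frame, d in enumerate(disparity_maps):
        near, far = np.percentile(np.asarray(d, dtype=float), [2, 98])
        rows.append((frame, float(near), float(far)))
    return rows
```

Concatenating these rows across shots gives the movie-wide depth script that filmmakers can review against their planned depth design.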
Nonlinear disparity mapping by image-domain warping
In many cases, captured stereo content still requires modification in post-production, for example:
* Display adaptation – showing S3D on a different screen size requires modification of disparities.
* Artistic modification – manipulation of depth distribution in post-production may be required due to artistic decisions.
* Problematic disparities – correction of errors during shooting can become necessary.
We developed a novel approach for remapping the disparity range of a stereoscopic image pair after capture that is based on image-domain warping (IDW). The following are examples of nonlinear disparity mapping, with originals on the left and modified versions on the right:
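The mapping step itself can be illustrated in isolation. The sketch below applies a hypothetical logarithmic disparity curve, which compresses large disparities more than small ones; in IDW the remapped values would then drive an image warp rather than a direct per-pixel shift, so this covers only the mapping, not the rendering.

```python
import numpy as np

def remap_disparity(d, target_range=(-10.0, 20.0)):
    """Nonlinearly compress a disparity map into `target_range` using a
    log curve: nearby depth structure is preserved while extreme
    disparities are squeezed toward the range limits."""
    d = np.asarray(d, dtype=float)
    lo, hi = d.min(), d.max()
    t = (d - lo) / (hi - lo)                # normalise to [0, 1]
    t = np.log1p(9.0 * t) / np.log(10.0)    # log curve, still spans [0, 1]
    new_lo, new_hi = target_range
    return new_lo + t * (new_hi - new_lo)
```

Because the curve is concave, the midpoint of the input range lands above the midpoint of the output range: foreground separation is kept at the expense of distant depth, a common choice when adapting content to smaller screens.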
Stereo to multiview conversion
Although S3D is widely adopted today, the necessity to wear glasses and the limitation to two views, which prevents the perception of all natural 3D cues, are often regarded as the main limitations of today’s mainstream 3D systems. These two shortcomings of S3D are addressed by multiview autostereoscopic displays (MAD). However, content creation for MADs is still a difficult task. We apply the same algorithms as described in the previous section (IDW) for optimum view synthesis from stereo (two-view) input.
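The geometric relationship behind such view synthesis is simple: each of the display's views sits at a fractional baseline position between the two input cameras, so its disparity relative to the left view is the input disparity scaled by that fraction. The sketch below shows only this scaling step, not the IDW rendering that produces the actual view images.

```python
import numpy as np

def multiview_disparities(d, n_views=8):
    """For an n-view autostereoscopic display, view v lies at fractional
    baseline position a_v in [0, 1] between the left (0) and right (1)
    input cameras; its disparity w.r.t. the left view is a_v * d."""
    positions = np.linspace(0.0, 1.0, n_views)
    return [a * np.asarray(d, dtype=float) for a in positions]
```

Each scaled disparity field would then drive the warp that synthesises the corresponding intermediate view from the stereo input.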
Interactive 2D-to-3D conversion using discontinuous warps
For user-assisted 2D-to-3D conversion, we have introduced a new workflow called StereoBrush, in which the user 'paints' depth onto a 2D image via sparse scribbles. In contrast to existing methods in which the conversion pipeline is separated into discrete steps, including rotoscoping, proxy geometry generation, and rendering (with inpainting), our method accomplishes all steps simultaneously, providing instantaneous, intuitive 3D feedback to the user. Our method operates directly on the image domain, creating stereoscopic pairs from sparse, possibly erroneous user input while preserving important depth effects. In addition, inpainting is avoided by means of a stereo-aware stretching of background content to fill in holes.
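The background-stretching idea can be illustrated on a single scanline. This is a deliberately naive sketch, not StereoBrush's discontinuous warp: it forward-shifts pixels by their disparity and, instead of inpainting, fills the disocclusion holes by repeating (stretching) the last written background value.

```python
import numpy as np

def render_right_view(row, disparity):
    """Forward-warp one scanline by per-pixel disparity (in pixels).
    Holes opened at depth discontinuities are filled by stretching the
    neighbouring background value rather than by inpainting."""
    out = np.full(len(row), np.nan)
    for x, (v, d) in enumerate(zip(row, disparity)):
        xr = int(round(x + d))
        if 0 <= xr < len(out):
            out[xr] = v
    # Stretch background: carry the last valid value into each hole.
    for x in range(len(out)):
        if np.isnan(out[x]):
            out[x] = out[x - 1] if x > 0 else row[0]
    return out
```

With zero disparity the scanline is reproduced unchanged; where disparity jumps, the gap behind the foreground is covered by stretched background, which is visually far less objectionable than a synthesised hole fill.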
Automatic 2D-to-3D conversion for sports
In addition, we have developed a system to automatically create high-quality stereoscopic video from monoscopic footage of field-based sports by exploiting context-specific priors, such as the ground plane, player size and known background. Our main contribution is a novel technique that constructs per-shot panoramas to ensure temporally-consistent stereoscopic depth in the output stereo video. Players are rendered as billboards at correct depths on the ground plane.
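The ground-plane prior that places each billboard can be sketched with simple pinhole geometry. The camera height, focal length, and horizon row below are hypothetical values for illustration; the actual system recovers the geometry from the known field layout.

```python
def ground_depth(y_pixel, horizon_y=200.0, focal_px=1000.0, cam_height_m=15.0):
    """Depth of a ground-plane point (e.g. a player's feet) whose image
    row is y_pixel, for a pinhole camera at height h above the field:
    Z = f * h / (y - y_horizon), valid only below the horizon line."""
    dy = y_pixel - horizon_y
    if dy <= 0:
        raise ValueError("point is at or above the horizon")
    return focal_px * cam_height_m / dy
```

Feet detected lower in the frame are closer to the camera, so each player billboard receives a single consistent depth from its contact point with the ground, which is what keeps the synthesised stereo stable across a shot.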
Producing high-quality S3D requires highly-skilled and experienced individuals, and can be an expensive and difficult process. By developing tools for estimating disparity, either by involving some user interaction or being fully automatic, we are confident that we can help in the drive to make the production process easier and keep costs down, while ensuring the best possible experience for the audience.
Dr Aljosa 'Josh' Smolic is Senior Research Scientist and Group Leader of Advanced Video Technology at Disney Research Zurich, the research centre of The Walt Disney Company affiliated with ETH Zurich.
The author would like to thank the following contributors: S. Poulakos, S. Heinzle, P. Greisen, M. Lang, A. Hornung, M. Farre, N. Stefanoski, O. Wang, L. Schnyder, R. Monroy, and M. Gross.
A PDF of the full version of this paper can be downloaded from www.cvmp-conference.org.