Technology: Disparity-aware Stereo 3D Production Tools
By Aljosa Smolic, Disney Research Zurich
Stereoscopic 3D (S3D) has been widely adopted in both consumer and professional markets. This success is due to the maturity that both the technology and the understanding of content creation have reached. However, producing high-quality S3D content remains a difficult and expensive art.
S3D production has to take into account the fundamentals of human 3D perception as well as the capabilities and limitations of 3D displays, and combine them with artistic intent. To help with this, Disney Research Zurich has developed advanced S3D production tools, algorithms, and systems. A key component of all of them is awareness of the disparity or depth composition of the input S3D content. In some cases, sparse but highly robust and accurate disparity information is estimated automatically. Other algorithms estimate dense disparity or depth maps. User interaction is part of some of the concepts, while others are fully automatic.
Fundamentals and limitations of stereo 3D perception
S3D content creation has to provide a pleasing and expressive mapping of the broad real 3D world into the limited stereoscopic comfort zone in order to create the illusion of depth. The difficulty of this art motivated the development of the tools, algorithms, and systems outlined in the following discussion.

Stereoscopic comfort zone.
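The geometry behind the comfort zone can be illustrated with a small calculation. The sketch below uses a textbook similar-triangles model and is not part of the tools described in this article. The function name, the 65 mm default eye separation, and the sign convention (positive parallax places a point behind the screen, negative parallax in front of it) are assumptions made for illustration.

```python
def perceived_distance(parallax, viewing_distance, eye_separation=0.065):
    """Distance (in metres) from the viewer at which a point appears.

    parallax         -- horizontal offset on the screen between the right-eye
                        and left-eye image of the point, in metres
                        (assumed convention: positive = behind the screen)
    viewing_distance -- distance from the viewer to the screen, in metres
    eye_separation   -- distance between the eyes, in metres (assumed 65 mm)
    """
    if parallax >= eye_separation:
        # Lines of sight are parallel or diverging: no finite fusion point.
        return float("inf")
    # Similar triangles between the eye baseline and the on-screen offset.
    return eye_separation * viewing_distance / (eye_separation - parallax)


if __name__ == "__main__":
    # Viewer sitting 2 m from the screen; parallax values are illustrative.
    for p in (-0.065, 0.0, 0.03):
        print(p, perceived_distance(p, 2.0))
```

Under this model, zero parallax places a point on the screen plane, and the perceived distance grows rapidly as the parallax approaches the eye separation, which is one reason the usable disparity range on a display is limited.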
Computational stereo camera
Our computational stereo camera system features a closed control loop from analysis to automatic adjustment of the physical camera and rig properties. Our freely programmable architecture comprises a high-performance computational unit that analyses the scene in real time (e.g., by computing dense disparity and by tracking scene elements) and implements knowledge from stereography. For efficient camera operation, we devised a set of interaction metaphors that abstract the actual camera rig operations into intuitive gestures. The operator controls the camera using a multitouch stereoscopic user interface, which also allows the S3D content and the related stereo parameters to be monitored instantly.

Computational stereo camera system featuring intuitive interaction metaphors.
Stereoscopic analyser
On-set analysis and monitoring of stereoscopic video play an important role in S3D productions. Our stereo analyser helps crews detect camera and lens misalignments, and can remove vertical disparities as well as keystoning automatically and in real time. The system also analyses and visualises the horizontal disparity distribution, and warns the user when the settings become uncomfortable for viewers.

The Disney Research system analyses and displays disparities, histogram, and stereo parameters in real time.
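The kind of check performed on the horizontal disparity distribution can be sketched as follows. This is a minimal illustration, not the Disney Research analyser: the function names, the comfort limits expressed as fractions of the image width, the 5% tolerance, and the sign convention (negative disparity in front of the screen) are all hypothetical.

```python
import math
from collections import Counter


def disparity_histogram(disparities, bin_width=10.0):
    """Count disparities (in pixels) per bin, keyed by the bin's lower edge."""
    counts = Counter()
    for d in disparities:
        counts[math.floor(d / bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


def comfort_report(disparities, image_width,
                   near_limit=-0.01, far_limit=0.02, tolerance=0.05):
    """Return a list of warnings for a set of horizontal disparities.

    near_limit / far_limit are hypothetical comfort bounds given as
    fractions of the image width; tolerance is the fraction of samples
    allowed outside a bound before a warning is raised.
    """
    n = len(disparities)
    if n == 0:
        return []
    lo = near_limit * image_width
    hi = far_limit * image_width
    too_near = sum(1 for d in disparities if d < lo)
    too_far = sum(1 for d in disparities if d > hi)
    warnings = []
    if too_near / n > tolerance:
        warnings.append("too much negative disparity (in front of screen)")
    if too_far / n > tolerance:
        warnings.append("too much positive disparity (behind screen)")
    return warnings


if __name__ == "__main__":
    samples = [-30, -25, -5, 0, 3, 10, 12, 45]  # illustrative values, pixels
    print(disparity_histogram(samples))
    print(comfort_report(samples, image_width=1920))
```

A sparse set of reliable disparities is enough for this kind of check, since only the distribution matters and not a value at every pixel.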
Depth script visualisation, disparity histograms
As 3D movie making becomes more popular, the artistic desire to use depth as an important storytelling element increases. Filmmakers carefully plan and design depth throughout the movie. We therefore developed a production tool that allows for visualisation of depth over individual takes or through an entire movie. The image below shows a typical output of our tool.

Depth script visualisation for an edited movie.
Nonlinear disparity mapping by image-domain warping
In many cases, captured stereo content still requires modification in post-production. Typical reasons include:
* Display adaptation – showing S3D on a different screen size requires modification of disparities.
* Artistic modification – manipulation of depth distribution in post-production may be required due to artistic decisions.
* Problematic disparities – errors introduced during shooting may need to be corrected.
We developed a novel approach for remapping the disparity range of a stereoscopic image pair after capture that is based on image-domain warping (IDW). The following are examples of nonlinear disparity mapping, with originals on the left and modified versions on the right:

The cow in this example is shifted back in depth, while not changing the background depth.

The car causes an edge violation, which is corrected by pushing it back in depth (image copyright KUK Filmproduction GmbH).
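Conceptually, a disparity mapping is a function that takes each source disparity to a target disparity. The sketch below shows a linear and a simple nonlinear (logarithmic) mapping applied to disparity values only. It does not perform the image-domain warp that would realise those targets in the images, and the function names, the `strength` parameter, and the example ranges are assumptions made for illustration.

```python
import math


def linear_map(d, src, dst):
    """Map disparity d linearly from range src=(lo, hi) to dst=(lo, hi)."""
    (s0, s1), (t0, t1) = src, dst
    return t0 + (d - s0) * (t1 - t0) / (s1 - s0)


def log_map(d, src, dst, strength=4.0):
    """Map disparity d nonlinearly from src to dst.

    The logarithmic curve gives more of the target range to disparities
    near the lower end of src and compresses those near the upper end.
    strength (> 0) is a hypothetical parameter controlling the curvature.
    """
    (s0, s1), (t0, t1) = src, dst
    u = (d - s0) / (s1 - s0)                      # normalise to [0, 1]
    v = math.log1p(strength * u) / math.log1p(strength)
    return t0 + v * (t1 - t0)


if __name__ == "__main__":
    source_range = (-40.0, 60.0)   # illustrative disparity range, pixels
    target_range = (-20.0, 30.0)   # illustrative range for another display
    for d in (-40.0, 0.0, 60.0):
        print(d,
              linear_map(d, source_range, target_range),
              round(log_map(d, source_range, target_range), 2))
```

Both functions keep the endpoints of the range fixed and are monotonic, so the depth order of scene elements is preserved. A mapping applied only to selected image regions, as in the cow example, would need a spatially varying function rather than a single global curve.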
Stereo to multiview conversion
Although S3D is widely adopted today, two aspects are often regarded as the main limitations of today’s mainstream 3D systems: the need to wear glasses, and the restriction to two views, which prevents the perception of all natural 3D cues. Multiview autostereoscopic displays (MADs) address both shortcomings. However, content creation for MADs is still a difficult task. We apply the same IDW algorithms described in the previous section to synthesise the required views from stereo (two-view) input.

Stereo to multiview conversion.
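The relationship between a stereo pair and the additional views can be illustrated with a simplified model. In the sketch below, a view position of 0 corresponds to the left input image and 1 to the right one, and a feature with a known disparity is assumed to move linearly between them. This is not the IDW-based synthesis itself; the function names and the linear model are assumptions made for illustration.

```python
def view_positions(num_views, start=0.0, end=1.0):
    """Evenly spaced view positions; 0 = left input view, 1 = right input view.

    Choosing start < 0 or end > 1 extrapolates beyond the input pair.
    """
    if num_views < 2:
        raise ValueError("need at least two views")
    step = (end - start) / (num_views - 1)
    return [start + i * step for i in range(num_views)]


def feature_x_in_view(x_left, disparity, alpha):
    """Horizontal position of a feature in the view at position alpha.

    disparity is defined here as x_right - x_left, in pixels.
    """
    return x_left + alpha * disparity


if __name__ == "__main__":
    # One feature at x = 100 px in the left image with 8 px disparity.
    for alpha in view_positions(5):
        print(alpha, feature_x_in_view(100.0, 8.0, alpha))
```

Target positions like these, computed for a sparse set of features, are the sort of constraints a warp can be asked to satisfy for each output view.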
Interactive 2D-to-3D conversion using discontinuous warps
For user-assisted 2D-to-3D conversion, we have introduced a new workflow called StereoBrush, in which the user ‘paints’ depth onto a 2D image via sparse scribbles. Existing methods separate the conversion pipeline into discrete steps, including rotoscoping, proxy geometry generation, and rendering (with inpainting). In contrast, our method accomplishes all steps simultaneously, providing instant, intuitive 3D feedback to the user. It operates directly in the image domain, creating stereoscopic pairs from sparse, possibly erroneous user input while preserving important depth effects. In addition, inpainting is avoided by means of a stereo-aware stretching of background content to fill in holes.

The StereoBrush application allows the user to 'paint' depth onto a 2D image using sparse scribbles.
Automatic 2D-to-3D conversion for sports
In addition, we have developed a system that automatically creates high-quality stereoscopic video from monoscopic footage of field-based sports by exploiting context-specific priors, such as the ground plane, player size, and known background. Our main contribution is a novel technique that constructs per-shot panoramas to ensure temporally consistent stereoscopic depth in the output stereo video. Players are rendered as billboards at correct depths on the ground plane.

Overview of depth map generation.
Conclusion
Producing high-quality S3D requires highly skilled and experienced individuals, and can be an expensive and difficult process. By developing disparity-aware tools, some relying on user interaction and others fully automatic, we are confident that we can help make the production process easier and keep costs down, while ensuring the best possible experience for the audience.
Dr Aljosa ‘Josh’ Smolic is Senior Research Scientist and Group Leader of Advanced Video Technology for Disney Research Zurich, the research centre of The Walt Disney Company, related to ETH Zurich.
The author would like to thank the following contributors: S. Poulakos, S. Heinzle, P. Greisen, M. Lang, A. Hornung, M. Farre, N. Stefanoski, O. Wang, L. Schnyder, R. Monroy, and M. Gross.
A PDF of the full version of this paper can be downloaded from www.cvmp-conference.org.
Nice article. The tools mentioned are very similar to those already commercially available in the Cel-Scope3D stereoscopic analyser.
This is good research, and some uses are great in the live production environment.
However, I’m not convinced about the Nonlinear disparity mapping by image-domain warping module.
To a trained eye, you can see the warping of the grass on the right in the cow image.
The cow’s “snout” is also disfigured.
“Volume sculpting” as I call it, after-the-fact, is prone to artifacts, and works only depending on the scene.
Certainly high-end software such as Nuke, Mistika and even the Re:flex plugin can achieve success, but it depends on the scene in question.
It’s not “there” yet as a complete solution.
Regards,
Clyde