By Dmitriy Vatolin, YUVsoft
The Sixth International Workshop on Video Processing and Quality Metrics for Consumer Electronics (VPQM-2012), sponsored by Intel, was held in Scottsdale, Arizona, on 19 – 20 January 2012. The two-day workshop drew specialists from U.S. and European research institutions and universities working on video quality. The first day was dedicated mostly to the quality of conventional 2D video; the second was given fully to 3D video topics.
Our main impression from the first ‘2D’ day is that the primary research focus is on online video. Internet services such as YouTube, Netflix and Hulu are booming in popularity; video accounts for a considerable part of overall traffic, and its share is rising steeply. Traffic on wireless mobile networks, and the infrastructure behind them, are also growing rapidly.
A speaker at the workshop stated that more than 50% of mobile network traffic is video. All networks are becoming increasingly heterogeneous, and the number of display device types is also rising – from cell phones to high-definition large screens. This growth adds even more complexity to transmission and playback problems.
In his keynote speech, titled “Optimising Media Delivery in the Future Mobile Cloud”, Jeffrey Foerster of Intel reminded us that it is necessary to evaluate the overall Quality of Experience (QoE) that the end user has while watching video, not just objective distortions in the transmitted content. In the case of mobile devices, the chain between the video content provider and the end-user device (e.g. smart phone) is quite complex, so we need to account for the time between a user command and the actual start of playback (or any other requested action), and to use efficient buffering and optimisation of transmission parameters to minimise delays and lags.
The rising popularity of stereo 3D
Stereo 3D video is growing in popularity among consumers. The ‘Customer’s Willingness to Pay’ report estimates that 33% of users are now ready to pay more to watch movies in 3D instead of just 2D.
Even more noteworthy is that the majority of 3D content is currently stereo. There is also some room for multiview video, which contains more than two views of a scene taken from different viewpoints, and for 2D+Depth video, which is compact as a 3D representation but requires view generation in the receiving device and poses other complications.
Currently, the transmission of 3D video is a challenge. More views create additional traffic, raising the question of how to compress video data effectively while retaining compatibility with various devices. A further complication is that assessment of 3D video quality is an insufficiently researched area compared with 2D video. Addressing 3D video quality is much more intricate in principle as well. For example, the estimation of multiview video quality must take into account not only the level of distortions relative to the original video, but also the uniformity of artefacts between views – specifically, how well each view conforms to its neighbours after encoding, transmission and decoding.
In the case of 2D+Depth video, any direct pixel-based comparisons of original views with the decoded images are grossly incorrect, since the results depend heavily on the view generation algorithm used by the receiving device; furthermore, the key factors are quality of edge processing and handling of occlusion areas.
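The considerations above can be made concrete with a toy full-reference score: a minimal illustrative sketch, not the metric from any presented paper, that combines per-view fidelity with a penalty for artefacts that differ between views. Frames are assumed to be greyscale NumPy arrays.

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio between two greyscale frames."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def stereo_quality(ref_l, ref_r, dec_l, dec_r, w_consistency=0.3):
    """Toy full-reference stereo score: average per-view fidelity minus a
    penalty when coding artefacts differ between the two views.

    The inter-view term compares the *residuals* (decoded minus original)
    of the views; similar residuals mean artefacts are uniform across views.
    The weight is an illustrative free parameter, not a published value.
    """
    fidelity = 0.5 * (psnr(ref_l, dec_l) + psnr(ref_r, dec_r))
    res_l = dec_l.astype(np.float64) - ref_l.astype(np.float64)
    res_r = dec_r.astype(np.float64) - ref_r.astype(np.float64)
    asymmetry = np.mean(np.abs(res_l - res_r))  # 0 when artefacts match
    return fidelity - w_consistency * asymmetry
```

The design point is the second term: two encodings with the same average PSNR can score differently here, because distortion concentrated in one view is penalised for breaking inter-view conformity.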
Stereo 3D as the baseline
Because stereo dominates 3D content, multiview formats were virtually ignored in the presented papers. Stereo was assumed both in matters of coding and transmission and in matters of display: almost all displays used for subjective quality assessment were dual-view, i.e. stereo, devices.
Overall, quality-metric development for 3D video is still in its early stages, and we have a long way to go to reach adequate and widely-adopted 3D video quality indicators and estimation methods.
Asymmetric stereo compression
Coding and transmission artefacts mainly relate to compression. Regardless of whether the stereo views are compressed independently or with respect to inter-view similarity, artefacts that differ between views can appear, and human perception in such situations is important. Two papers were presented on this topic. The first considered subjective perception of video content encoded for different quality levels in the left and right views (1).
Stereoscopic suppression occurs when a viewer, presented with two views of differing quality, perceives the 3D scene at a quality close to that of the better view. The paper confirmed this effect, stating that the brain naturally discriminates between the eyes, favouring the higher-quality image; information from the other image is used as auxiliary input, mainly for disparity estimation and depth perception.
Of the observers, 70% favoured the experience where bits were unequally allocated to the left and right views, compared with equal allocation. No research data, however, reports how much of a difference (in terms of the overall bitrate) is possible between equal and unequal bitrate stereo video while still maintaining the same subjective quality. Nevertheless, this effect should be considered when generating content for low-capacity networks.
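As a rough illustration of how suppression could be modelled, perceived binocular quality can be pulled toward the better view. The weighting below is hypothetical; no parameter values are taken from the paper.

```python
def perceived_stereo_quality(q_left, q_right, dominance=0.75):
    """Illustrative model of stereoscopic suppression: perceived quality
    is weighted toward the better of the two view qualities (e.g. PSNR
    in dB). `dominance` is a hypothetical weight; 0.5 would reduce the
    model to a simple average, i.e. no suppression at all.
    """
    better, worse = max(q_left, q_right), min(q_left, q_right)
    return dominance * better + (1.0 - dominance) * worse
```

Under this model a (40 dB, 30 dB) pair is perceived at 37.5 dB, closer to the better view than the 35 dB a naive average would predict, which is why unequal bit allocation can save bitrate.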
A cautionary note, however: excessive exploitation of this effect can be disastrous when applied to high-quality content coding, including storage applications. The problem is that degraded correspondence between views affects subsequent stereo processing, for example disparity estimation, stereo-to-2D+Depth and stereo-to-multiview conversion. An open question is the viewer’s comfort level when looking at unequally encoded stereo for an extended duration. Overall, this method is not recommended for Blu-ray 3D.
Naturally, then, another paper on this topic considered brain adaptation to quality asymmetry (2). The problem is that when the same view is always the high-quality reference, frequent viewing of such video may cause unwanted brain adaptation. The brain will carry these adaptations over to real-life images, which can cause problems, for example, for viewers with different acuity in each eye, or for viewers whose leading eye is not on the same side as the view coded at maximum quality. A possible technical solution is an encoding format that allows the high-quality view to change over time, preventing the viewer from becoming acclimatised to either view.
3D strength versus quality of experience
Yet another paper studying the question of stereo perception quality examined the relationship between 3D effect strength (the difference between minimum and maximum disparity in a scene) and global QoE (3). The authors conducted subjective testing for various scene types in a number of videos, both captured and rendered, taking into account not only regular 2D image quality but also naturalness, overall visual comfort and depth rendering quality.
Increasing the 3D effect does increase perceived depth, as people can easily distinguish different levels of binocular depth. Beyond some threshold, however, the same increase reduces visual comfort. The maximum perceived depth range was expressed as depth of focus (DoF). On the basis of their findings, the authors stated that a DoF of 0.1 should be the target for natural scenes, while for synthetic scenes the DoF may be relaxed to 0.2.
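Such thresholds can be checked against a disparity range using standard stereoscopic viewing geometry. The geometry and the default eye separation and viewing distance below are textbook assumptions, not values taken from the paper.

```python
def perceived_depth(disparity_m, eye_sep=0.065, view_dist=2.0):
    """Distance (m) from viewer to the fused point, from standard stereo
    geometry: z = e*D / (e - d). Positive screen disparity d (uncrossed)
    places the point behind the screen; negative (crossed), in front.
    Defaults (65 mm eyes, 2 m viewing distance) are illustrative.
    """
    return eye_sep * view_dist / (eye_sep - disparity_m)

def depth_budget_diopters(d_min_m, d_max_m, eye_sep=0.065, view_dist=2.0):
    """Depth range expressed in dioptres (1/m), the unit in which the
    comfort thresholds discussed above are stated."""
    z_near = perceived_depth(d_min_m, eye_sep, view_dist)
    z_far = perceived_depth(d_max_m, eye_sep, view_dist)
    return abs(1.0 / z_near - 1.0 / z_far)

def is_comfortable(d_min_m, d_max_m, synthetic=False, **kw):
    """Check a scene's disparity range against the reported targets:
    0.1 dioptre for natural scenes, 0.2 for synthetic ones."""
    limit = 0.2 if synthetic else 0.1
    return depth_budget_diopters(d_min_m, d_max_m, **kw) <= limit
```

For example, at 2 m a ±5 mm on-screen disparity range stays within the 0.1-dioptre natural-scene budget, while ±15 mm exceeds even the 0.2-dioptre synthetic-scene limit.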
For 3D video formats, subjective testing of several compressed stereo representations was reported (4). Currently, the most widespread format is side-by-side (SBS) stereo, with each view’s resolution halved in one dimension; this approach preserves compatibility with older devices. Two other formats that are part of the ITU 3D TV categorisation were also considered:
* SBS using Scalable Video Coding (SVC) for passing additional information to restore video to its original resolution;
* encoding of one view using Advanced Video Coding (AVC) and encoding a second view using Multiview Video Coding (MVC), but still supporting playback of 2D video (the first view) on legacy devices.
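The SBS packing mentioned above is easy to sketch. The version below uses naive decimation for brevity; a real encoder would low-pass filter each view before subsampling to avoid aliasing.

```python
import numpy as np

def pack_side_by_side(left, right):
    """Pack a stereo pair into one frame-compatible SBS frame by halving
    the horizontal resolution of each view and placing the halves
    side by side. Input and output are 2D greyscale arrays.
    """
    half_l = left[:, ::2]   # keep every second column of the left view
    half_r = right[:, ::2]  # and of the right view
    return np.concatenate([half_l, half_r], axis=1)
```

The packed frame has the same dimensions as either input, which is exactly why legacy 2D pipelines can carry it untouched; the SVC-based variant in the list above then transmits the extra data needed to restore full resolution.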
The reference H.264 codec was used for encoding. The results led the authors to conclude that the MVC-based format currently provides better quality; most likely, it will be the path of choice for various content providers.
Other measurement techniques
Characteristically, an objective metric for measuring 3D video quality was suggested in only one paper, from Tampere University of Technology (5). It is a full-reference metric requiring the original video; besides comparison with the original, correspondence between views is also taken into account. The authors demonstrated that the metric surpasses classical methods of video quality estimation. The research is a work in progress, and the paper offers several directions for further improvement.
Another report considered automatic detection of distorted scenes in stereo video using an unsupervised approach (6), but no ready-to-use objective metrics for detecting the various stereo artefacts have yet been presented.
In another paper, the current development state was shown for the Depth Index – the aggregate measure of depth-related characteristics of a video (7). Currently, this index is computed on the basis of perspective information extracted from the image. The objective results are compared with a subjective rating determined by a small group of observers. Development of such a metric will help to increase the accuracy of QoE estimation for 3D content.
The proceedings of the workshop are available online at www.vpqm.org. The organisers announced that the next workshop would be held as early as July of this year, also in Arizona.
(1) R. Palaniappan and N. Jayant, “Subjective quality in 3DTV: effects of unequal bit allocation to left and right views” http://enpub.fulton.asu.edu/resp/vpqm/vpqm12/Papers/vpqm12_p20.pdf
(2) A.K. Jain, C. Bal, A. Robinson, D. MacLeod, and T.Q. Nguyen, “Temporal aspects of binocular suppression in 3D video” http://enpub.fulton.asu.edu/resp/vpqm/vpqm12/Papers/vpqm12_p21.pdf
(3) W. Chen, J. Fournier, M. Barkowsky, and P. Le Callet, “Exploration of quality of experience of stereoscopic images: binocular depth” http://enpub.fulton.asu.edu/resp/vpqm/vpqm12/Papers/vpqm12_p14.pdf
(4) T. Zhu, L. Karam, and T. Lam, “Subjective assessment of compressed 3D video” http://enpub.fulton.asu.edu/resp/vpqm/vpqm12/Papers/vpqm12_p10.pdf
(5) L. Jin, A. Boev, A. Gotchev, and K. Egiazarian, “3D-DCT based multi-scale full-reference quality metric for stereoscopic video” http://enpub.fulton.asu.edu/resp/vpqm/vpqm12/Papers/vpqm12_p24.pdf
(6) A. Voronov, A. Borisov, and D. Vatolin, “System for automatic detection of distorted scenes in stereo video” http://enpub.fulton.asu.edu/resp/vpqm/vpqm12/Papers/vpqm12_p5.pdf
(7) L. Goldmann, T. Ebrahimi, P. Lebreton, and A. Raake, “Towards a descriptive depth index for 3D content: measuring perspective depth cues” http://enpub.fulton.asu.edu/resp/vpqm/vpqm12/Papers/vpqm12_p15.pdf
Dmitriy Vatolin is the CEO of YUVsoft, an R&D company offering professional software for 2D-to-stereo 3D semi-automatic conversion and stereo processing.