The document is a white paper from Nomad3D that discusses the requirements for interactive online 3D video and introduces their 3D+F codec as a solution. The 3D+F codec runs as an efficient extension on existing 2D video codecs like H.264. It represents stereoscopic video as a single fused 2D view plus metadata, avoiding doubling the complexity and bandwidth needed for separate left/right views. The paper argues 3D+F meets the requirements for low latency, bitrate and power needed for applications like gaming, telemedicine and military communications better than alternatives like encoding left/right views separately with H.264.
1. Nomad3D White Paper
White Paper: Mobile Interactive Online 3D
Video Requirements
Dr. Alain Fogel, CEO Nomad3D
www.nomad3D.com
Contact: Dr. Alain Fogel, CEO Nomad3D
Email: alain.fogel@nomad3D
Date: 29.07.2012 Page 1
1 Nomad3D Executive and Technical Summary
Nomad3D has developed a revolutionary 3D video codec that runs as an efficient
extension on existing 2D video codecs. This codec, referred to as Nomad3D 3D+F, or
3D+F for short, is able to provide interactive 3D video capabilities on top of existing 2D
video infrastructure, where traditional 3D extensions of 2D video codecs fail. Nomad3D
3D+F provides for an efficient and low latency coding and decoding of stereoscopic 3D
video as a thin software layer on top of existing 2D video codec infrastructure. It is well
positioned for use in interactive 3D video gaming, telemedicine, 3D teleconferencing,
3D military communication and mobile 3D environments.
3D+F represents the stereoscopic left/right views as a single fused 2D view plus
additional fusion metadata. The fused 2D view is coded using a traditional 2D
video codec (H.264, VP8), whereas the fusion metadata (FusionData) is efficiently
encoded using Nomad3D's fusion codec. Nomad3D IP protects the fusion and de-fusion
technology, as well as technical aspects of the fusion codec.
Traditional stereoscopic 3D video codecs essentially operate by separately encoding left
and right views. This approach leads to an increase by a factor of two compared to a 2D
video codec in terms of complexity, required resources and bandwidth. The 3D+F codec
resolves this complexity issue by transforming a stereoscopic pair into (1) a single fused
(Cyclops) 2D view, and (2) FusionData. The FusionData encodes technical elements of
the fusion process and assists in recreating left-right views at decoding time. The
complexity of the fusion and defusion, as well as the fusion codec, is very small.
Moreover, the bandwidth required for FusionData is small. The overall complexity of
the 3D+F codec is therefore mainly determined by the complexity of the 2D video
codec in the lower branch of Figure 1. The 2D video codec in 3D+F is a free parameter,
and may be chosen from a number of well-performing video codecs, such as
H.264/AVC (Wikipedia, 2012) and VP8 (WebM, 2012).
For a given video quality target, 3D+F may gain close to a factor of two compared to a
standard 3D video codec in coding complexity, required resources and bandwidth.
3D+F is therefore very well suited to be deployed in demanding and/or resource-
constrained ecosystems.
[Block diagram. Encoder side: Left-Right Views -> Fusion. Upper branch: FusionData,
encoded and decoded by the Fusion Codec (Software). Lower branch: Cyclops (2D) View,
encoded and decoded by a Standard 2D Codec (2D Video Encoding / 2D Video Decoding).
Decoder side: De-Fusion -> Left-Right Views.]
Figure 1 - 3D+F Overview
2 Interactive Video Overview
Interactive video involves two or more parties interacting in real-time via graphical,
video, textual or audio interfaces. Interactive video is important and/or relevant in video
gaming, telemedicine, and military communications, among others. The respective
markets are growing at tremendous speeds due to the increasing availability of powerful
mobile devices and increasingly powerful connectivity. As stereoscopic 3D (S3D)
becomes more available due to maturing technologies such as auto-stereoscopic
smartphones and tablets (no glasses), interactive video will migrate to S3D devices. The
use of interactive 3D video is believed to lead to a richer user experience and/or a better
understanding of context and situation.
2.1 Interactive Video Gaming
Interactive video gaming over the Internet is a large and growing market, including the
developing Online Cloud Gaming (OCG) market. According to PwC (PwC, 2012), the
online market will overtake the traditional gaming market, as shown in Figure 2, a
trend clearly supported by the deployment of mobile devices. Social Gaming
companies such as Zynga and Playfish and Game on Demand companies such as
OnLive, Microsoft, and Sony Computer Entertainment are forming the bulk of the
ecosystem of OCG.
2.2 Telemedicine
Telemedicine is a developing area, benefitting from the ubiquity of smartphones and
tablets. The technology leaders in this market (e.g. Philips, Cisco, Alcatel-Lucent, and
Honeywell) are cooperating with leading carriers (e.g. Orange and Vodafone), leading
device manufacturers (e.g. Apple, Samsung, and LG) and providers of video
conferencing systems (e.g. Cisco, Polycom, and Vidyo).
2.3 Military communications
Military Communications have specific requirements for interactive video, in particular
in the area of airborne surveillance (e.g. drones), target identification, tracking and
engaging. The use of interactive 3D video is critical to a better understanding of context
and situation.
Figure 2 - Online/Wireless Gaming Market vs. Console/PC Market
3 Interactive Online Mobile Video: The Pain and Requirements
In the following, we present the pain points and requirements of interactive online
mobile video. We point out the differences between 2D and 3D, the resulting
differences in the requirements, and the consequences for the choice of video codec.
3.1 3D Cursor
3D interactive applications very often require a 3D cursor, i.e. a cursor that is
perceived at the depth of the object it points to. The correspondence between the
3D cursor and the actual pixel pointers in the left and right view is hard to implement
without the help of a depth or disparity map. This depth or disparity map must
therefore be provided in addition to the H.264 Baseline streams, which requires
significant additional computational, power and bandwidth resources.
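To illustrate why a disparity map makes the 3D cursor tractable, here is a minimal Python sketch. It is not Nomad3D's method: it assumes rectified views (purely horizontal disparity), a reference view midway between the left and right views, and a hypothetical sign convention in which disparity = x_left - x_right. The names place_3d_cursor and CursorPixels are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CursorPixels:
    left: tuple[int, int]   # (x, y) of the cursor in the left view
    right: tuple[int, int]  # (x, y) of the cursor in the right view


def place_3d_cursor(x: int, y: int, disparity_map: list[list[float]]) -> CursorPixels:
    """Map a cursor at (x, y) in the reference view to left/right pixels.

    Assumptions (not taken from the white paper): views are rectified, the
    reference view lies midway between left and right, and the disparity at
    a pixel is x_left - x_right, so each view is shifted by half of it.
    """
    width = len(disparity_map[y])
    half = disparity_map[y][x] / 2.0

    def clamp(value: float) -> int:
        # Keep the cursor inside the image after the horizontal shift.
        return min(max(round(value), 0), width - 1)

    return CursorPixels(left=(clamp(x + half), y), right=(clamp(x - half), y))


if __name__ == "__main__":
    # Hypothetical 3x20 disparity map with one object at disparity 4.
    disparity = [[0.0] * 20 for _ in range(3)]
    disparity[1][10] = 4.0
    print(place_3d_cursor(10, 1, disparity))
```

Drawing the cursor at these two positions makes it appear at the depth of the object under it. Without the per-pixel disparity, the application would have to estimate the correspondence itself.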
3.2 Latency
Latency is the time between capturing and displaying a video frame. Latency is a
primary issue in interactive video and is preferably less than 100 ms end-to-end.
H.264 cannot achieve this in its highest profiles, where latency is often above
1 second. Hence, it is typically used in its Baseline profile. Latency gets worse
for 3D video content with separate or multi-view
encoding of left and right views.
3.3 Bit Rate
The required bit rate (or bandwidth) for video transmission depends on the resolution,
frame rate, quality requirement and video codec capability to compress and decompress
within the limitations of the communication channel bandwidth (network capability).
For 3D content, the required bit rate for a given quality may increase by as much as a
factor of two.
3.3.1 Cloud gaming
For OCG the required bit rate for good video quality with H.264 exceeds 3 Mbps. In the
US, available bit rates on the public Internet are usually between 1 and 7 Mbps, making
3D gaming with H.264 problematic. In addition, 3D OCG very often needs a depth or
disparity map to implement a 3D cursor, which requires significant additional
transmission bandwidth.
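The arithmetic behind this paragraph can be sketched in a few lines of Python. The figures are the ones quoted in this section (3 Mbps for 2D, up to a factor of two for 3D, 1 to 7 Mbps available). The function names are illustrative, and the bandwidth of the depth or disparity map is left out because the paper does not quantify it.

```python
def stereo_bitrate_mbps(bitrate_2d_mbps: float, increase: float = 1.0) -> float:
    """Bit rate of stereoscopic video for a given relative increase over 2D.

    increase=1.0 means +100 %, i.e. the factor of two quoted for coding
    left and right views separately.
    """
    return bitrate_2d_mbps * (1.0 + increase)


def headroom_mbps(required_mbps: float, available_mbps: float) -> float:
    """Spare bandwidth on a link; a negative value means the stream does not fit."""
    return available_mbps - required_mbps


if __name__ == "__main__":
    required_3d = stereo_bitrate_mbps(3.0)  # 3 Mbps 2D baseline from the text
    for available in (1.0, 7.0):            # ends of the quoted US range
        print(available, headroom_mbps(required_3d, available))
```

Under these assumptions only links near the top of the quoted range carry the 3D stream at all, and that is before any disparity map is added.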
3.3.2 Telemedicine and military communications
The required bit rate for interactive telemedicine and military applications is on the
order of 10 Mbps. This rate is difficult to sustain on public US networks. In Europe and
Asia the situation is slightly better, but still at the limit of what is possible. For 3D
interactive video with the requirement of a 3D cursor using a standard H.264 codec,
network performance is insufficient to sustain acceptable 3D quality.
3.4 Computing Power
3.4.1 Cloud gaming and military
Strong computing capabilities are required to decode a flow of H.264-coded 2D video.
3D video decoding using a standard H.264 solution requires doubling of this compute
power. On a 4.3-inch smartphone, this implies a reduction in battery life of up to
40%, and on a tablet, of up to 20%.
3.4.2 Telemedicine and military
In addition to decoding computing capabilities, telemedicine and military
communication also require HD encoding capabilities. The H.264 encoder consumes
at least twice as much power as the decoder, and therefore the impact on
computing power and battery life is severe.
3.5 Conclusion
The requirements for interactive 3D video are at or over the limit of current capabilities
of public networks and terminals using standard H.264 coding techniques.
4 The Solution: NOMAD3D 3D+F Codec
4.1 Features
Nomad3D has developed an innovative 3D CODEC with the following features:
3D+F is compatible with 2D codecs (e.g. H.264, VP8)
3D+F is a layer on top of an underlying 2D codec that can be reused for 3D as is.
3D+F is high performance and is power efficient.
3D+F requires no change in the hardware of 2D application processors.
3D+F is enabled by pre-installed or downloadable software.
4.2 Compliance of 3D+F with Interactive 3D Online Video
The Nomad3D 3D+F 3D codec is the solution for complying with the requirements of
3D interactive online video. Specifically, compared to 2D:
3D+F adds at most one video frame of latency (about 33 ms at 30 fps).
3D+F requires at most a 30% bit rate increase (H.264: 130%).
3D+F does not need an additional depth map to implement a 3D cursor.
3D+F increases decoder power consumption by less than 15%.
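As a rough check on the figures in this list, the following Python sketch converts them into comparable numbers. It uses only values stated in this paper: one added frame of latency, a bit rate increase of 30% against 130%, a decoder power increase of 15% against the doubling mentioned in Section 3.4.1, and the 100 ms latency target from Section 3.2. The helper names are illustrative.

```python
def frame_period_ms(fps: float) -> float:
    """Duration of one video frame in milliseconds."""
    return 1000.0 / fps


def relative_cost(increase_a: float, increase_b: float) -> float:
    """Cost of solution A relative to solution B, both given as increases over 2D.

    Example: increases of 0.30 and 1.30 give (1 + 0.30) / (1 + 1.30).
    """
    return (1.0 + increase_a) / (1.0 + increase_b)


if __name__ == "__main__":
    added_ms = frame_period_ms(30.0)
    print(f"added latency: {added_ms:.1f} ms "
          f"({added_ms / 100.0:.0%} of a 100 ms budget)")
    print(f"bit rate vs. 3D H.264: {relative_cost(0.30, 1.30):.2f}")
    print(f"decoder power vs. 3D H.264: {relative_cost(0.15, 1.00):.2f}")
```

With these inputs, one frame period at 30 fps is roughly a third of the 100 ms budget, and both the bit rate and the decoder power come out at a little under 60% of the 3D H.264 figures.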
The advantages of 3D+F vs. H.264 for interactive online 3D video are summarized in
Table 1 below:
                                     3D H.264            NOMAD3D
                                     Baseline Profile    3D+F
Latency                              Low                 Low
Decoder Computing Power              High                Very Low
Encoder Computing Power              Very High           Low
Software Implementation              No                  Yes
Hardware Impact                      High                None
Required Bit Rate                    High                Low
3D cursor complexity                 High                Low
Additional Depth or Disparity Map    Yes                 No
Table 1 - Comparison of H.264 and 3D+F for Interactive Online 3D Video
REFERENCES
[1] WebM. (2012, July 25). Welcome to the WebM Project. Retrieved July 25, 2012, from
WebM: http://www.webmproject.org/
[2] Wikipedia. (2012, July 25). H.264/MPEG-4 AVC. Retrieved July 25, 2012, from
Wikipedia: http://en.wikipedia.org/wiki/H.264/MPEG-4_AVC
[3] PwC. Internet source:
http://www.pwc.com/gx/en/global-entertainment-media-outlook/segment-insights/video-games.jhtml