Standards: Part 13 - Exploring MPEG4-Part 10 - H.264/AVC

The H.264/AVC codec has been very successful. Here we dig deeper into how profiles and levels work to facilitate deployment of delivery systems and receiving client-player designs.

This article is part of our growing series on Broadcast Standards.
The first 26 articles are now available in Broadcast Standards – The Book.

In 2004, it was uncertain whether H.264/AVC or VC-1 would become dominant. VC-1 was based on a popular Microsoft Windows Media format offered to SMPTE for ratification. Eventually, H.264 did become the codec of choice for a wide variety of applications. Its successors and enhancements (SVC, HEVC & LCEVC) must offer significant advantages to gain similar traction.

About The Standard

The MPEG-4 Part 10 standard is very large and complex with approximately 900 pages of densely concentrated detail.

The support for profiles and levels is fundamental to successfully deploying your content using the H.264 format. There have also been some important extensions (SVC, MVC and MFC) to the original codec design. These are included as Annexes to the main body of the standard and require a high degree of focus to interpret correctly.

Section 3 briefly describes the terminology and abbreviations used throughout the standard. Understanding these makes the rest of the standard much easier to comprehend.

The notational conventions in Section 5 are relevant if you want to understand the mathematical and logical concepts described later on. These will be most useful to codec developers.

The structure of the Network Abstraction Layer packets (NAL units) is described in Section 7. This includes the Sequence Parameter Set (SPS) NAL unit, whose payload carries the profile and level parameters. Read this in combination with the Annexes to glean the specific values and locations for the profile_idc, profile_iop and level_idc bytes in the SPS NAL unit.

Profiles are described in Annex A for the core AVC compression standard. More profiles are introduced in Annex F which describes Scalable Video Coding (SVC). Annexes G, H and I address various multi-view coding techniques for stereoscopic and 3D viewing. The additional profiles needed to constrain them and signal the client-player are also described there.

Levels are addressed comprehensively in Sub-section A.3 of Annex A.

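Annex B describes the Byte Stream Syntax as opposed to the bitstream syntax in Section 7. Each NAL unit in the byte stream is preceded by a start code prefix (the three bytes 0x00 0x00 0x01), which lets a decoder locate NAL unit boundaries and resynchronize itself to the incoming stream. Because the stream is byte-aligned, the decoder can scan it in 8-bit bytes to unpack the payloads in the NAL units.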
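To illustrate start code framing, here is a minimal Python sketch that splits an Annex B byte stream into NAL units and reads nal_unit_type from the low five bits of each one-byte NAL unit header. The function names are hypothetical and the sample bytes are illustrative, not a decodable stream. It assumes a well-formed stream: emulation prevention guarantees that 0x000001 never appears inside a NAL unit, and the standard requires that a NAL unit never ends in a 0x00 byte, so trailing zeros can be stripped safely.

```python
from __future__ import annotations

START_CODE = b"\x00\x00\x01"


def split_annexb(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL units, dropping the start codes."""
    nal_units: list[bytes] = []
    start = stream.find(START_CODE)
    while start != -1:
        payload_start = start + len(START_CODE)
        next_start = stream.find(START_CODE, payload_start)
        end = len(stream) if next_start == -1 else next_start
        # A NAL unit never ends in 0x00, so trailing zeros belong to the next
        # four-byte start code (0x00000001) or to trailing zero padding.
        nal = stream[payload_start:end].rstrip(b"\x00")
        if nal:
            nal_units.append(nal)
        start = next_start
    return nal_units


def nal_unit_type(nal: bytes) -> int:
    """nal_unit_type is the low five bits of the one-byte NAL unit header."""
    return nal[0] & 0x1F


# Illustrative bytes only: an SPS, a PPS and an IDR slice header fragment.
stream = bytes.fromhex("00000001 6742e01f 00000001 68ce3c80 000001 65888480")
for nal in split_annexb(stream):
    print(nal_unit_type(nal), nal.hex())
# prints: 7 6742e01f / 8 68ce3c80 / 5 65888480
```

nal_unit_type 7 is a Sequence Parameter Set, 8 is a Picture Parameter Set and 5 is a slice of an IDR picture. Both three-byte and four-byte start codes are handled because the extra leading zero is stripped from the end of the previous NAL unit.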

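Annex C describes the Hypothetical Reference Decoder (HRD), a buffering model used to check that a bitstream's timing and buffer usage conform to the standard. It is also a useful way to understand the decoding process.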

Supplemental Enhancement Information (SEI) is described in Annex D. This additional metadata describes the content in the stream. Decoders have some discretion in how they respond to this.

Annex E describes Video Usability Information (VUI) which parameterizes the sample aspect ratio, over-scanning, video signal range, color primaries, transfer characteristics, matrix coefficients and timing. The client-player uses this to present the video canvas correctly.

The rest of this article will focus on Profiles and Levels. This is an area of some complexity, and detailed explanations of how it works are hard to find.

Profiles & Levels

Choose the profile and level that best suits your needs. Encoders transmit the details to the client which interprets the bitstream accordingly.

Profiles define which subset of the coding tools an encoder may use. This is a huge benefit because it reduces the complexity of encoder configurations. The decoder has counterparts for each of these tools, so it only needs to implement the tools belonging to the profiles it supports.

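Levels are important in the receiving client-player. They set upper limits on the decoding workload and memory: the maximum picture size, the macroblock processing rate (which bounds the frame-rate), the bit rate and the buffer sizes. Color depth is determined by the profile, not the level.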

Do not confuse the container profiles defined by the MPEG4 systems layer with Part 10 video compression profiles. They are not the same thing.

Signaling The Profile & Level

The profile and level signaling mechanism has become very complex because the standard has been revised multiple times while retaining the necessary backwards compatibility with many millions of previously deployed devices.

The profile and level values are located at the start of the Sequence Parameter Set (SPS) NAL unit payload, immediately after the one-byte NAL unit header. Unpack it carefully to reveal three bytes representing these properties:

• profile_idc
• profile_iop
• level_idc

The profile_iop value uses individual bits as flagging indicators. Conventional Boolean notation applies with the value 1 representing TRUE and the value 0 representing FALSE.

Byte 1 contains the profile_idc which identifies the foundation profile. The same profile_idc value can identify several different profiles, which are then distinguished by the constraint flags in the profile_iop byte. For example, Baseline and Constrained Baseline share the same profile_idc, and IOP constraint bit-flag 1 determines which is selected.

Byte 2 is the Interoperability Profile (IOP) described as the profile_iop. The H.264 standard itself codes this byte as six constraint flags (constraint_set0_flag to constraint_set5_flag) followed by two reserved bits that must be zero; the profile-iop name comes from IETF RFC 6184. These flags alter the behavior of the profile specified in the profile_idc and can also affect the interpretation of the level_idc value. To unambiguously select a profile, Bytes 1 and 2 must be combined. The meaning of the individual constraint flags depends on the context. Refer to Section 7.4.2.1.1 for details and cross-references to the applicable annex descriptions.

Byte 3 contains the level_idc, which describes the level at which the chosen profile is operating so the client can determine whether it has the decoding resources to handle the stream.
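The three bytes can be unpacked with a few lines of code. This is a minimal Python sketch, assuming you already have one complete SPS NAL unit (for example from an Annex B splitter); the class and function names are hypothetical. Bit numbering follows the standard, with constraint_set0_flag in the most significant bit of the second byte.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileLevel:
    profile_idc: int
    constraint_flags: tuple[bool, ...]  # constraint_set0_flag .. constraint_set5_flag
    level_idc: int


def parse_profile_level(three_bytes: bytes) -> ProfileLevel:
    """Unpack profile_idc, the profile_iop constraint flags and level_idc."""
    if len(three_bytes) < 3:
        raise ValueError("need three bytes: profile_idc, profile_iop, level_idc")
    profile_idc, profile_iop, level_idc = three_bytes[0], three_bytes[1], three_bytes[2]
    # constraint_set0_flag is the most significant bit of profile_iop; the six
    # flags are followed by two reserved bits that should be zero.
    flags = tuple(bool(profile_iop & (0x80 >> k)) for k in range(6))
    return ProfileLevel(profile_idc, flags, level_idc)


def parse_sps(nal: bytes) -> ProfileLevel:
    """Read the profile and level bytes from an SPS NAL unit (nal_unit_type 7)."""
    if len(nal) < 4 or (nal[0] & 0x1F) != 7:
        raise ValueError("not a complete SPS NAL unit")
    # Skip the one-byte NAL unit header; the SPS payload starts with the three bytes.
    return parse_profile_level(nal[1:4])


sps = parse_sps(bytes.fromhex("6742e01f"))
print(sps.profile_idc, sps.constraint_flags[:3], sps.level_idc)
# prints: 66 (True, True, True) 31
```

The bytes 42 e0 1f decode as profile_idc 66 with constraint flags 0, 1 and 2 set (Constrained Baseline) at level_idc 31 (level 3.1). Written as the hex string 42e01f, the same three bytes form the profile-level-id parameter that RFC 6184 uses in SDP.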

Profile Categories

Many of the profiles are derived from the same common Baseline and High ancestors. This has implications when the behavior of level_idc values is examined. This diagram illustrates the inheritance:

H.264 profiles can also be grouped according to which part of the ISO standard they are described in:

Category	Description
Core	The foundation set of profiles in H.264 defines non-scalable 2D flat presentations. The player application may transform the video canvas that the images are being drawn onto.
Pro	Professional users, camera ingest and editing require additional profiles.
SVC	The Scalable Video Coding standard introduces more profiles.
MVC	Multi-view Video Coding encodes two or more views, predicting each additional view from the base view. The player needs stereoscopic or multi-view display support.
MFC	Multi-resolution Frame-Compatible coding packs two reduced-resolution views into a single frame and adds an enhancement layer that restores full resolution for stereoscopic imaging.
3D	The 3D-AVC standard adds two more profiles for enhanced 3D support.

Current List Of Profiles

These are the currently defined profiles for H.264. Gleaning the profile_idc and profile_iop values by carefully reading the standard is somewhat arduous as there is no corresponding summary table included.

The profile_idc value is shown in the IDC column. The optional constraint settings in the profile_iop are listed in the IOP column. All combinations of IDC and IOP are unique.

Category	Profile name	IDC	IOP	Description
Core	Constrained Baseline	66	1	Useful for video conferencing and mobile applications.
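Core	Baseline	66	-	Improves the robustness of the Constrained Baseline profile with error-resilience tools (flexible macroblock ordering, arbitrary slice order and redundant slices). The differences are subtle.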
Core	Extended	88	-	Designed for streaming with additional capabilities to support stream switching.
Core	Main	77	-	Standard Definition TV over DVB transports.
Core	High	100	-	High Definition TV broadcast and storage. Adopted by Blu-ray discs and HDTV transmissions.
Core	Progressive High	100	4	Based on the High profile without interlace support.
Core	Constrained High	100	4 & 5	Based on the Progressive High profile. Removes support for Bi-Predictive slices.
Core	High 10	110	-	Based on the High profile with support for sample depths up to 10 bits.
Core	High 4:2:2	122	-	Based on High 10 with added support for 4:2:2 chroma sampling.
Core	High 4:4:4 Predictive	244	-	Based on High 4:2:2 with full 4:4:4 chroma sampling and sample depths up to 14 bits. Adds lossless region coding and three separate color planes.
Pro	High 10 Intra	110	3	Based on High 10 constrained to all intra-frame coding.
Pro	High 4:2:2 Intra	122	3	Based on High 4:2:2 constrained to all intra-frame coding.
Pro	High 4:4:4 Intra	244	3	Based on High 4:4:4 constrained to all intra-frame coding.
Pro	CAVLC 4:4:4 Intra	44	-	Based on High 4:4:4 Intra, restricted to CAVLC (context-adaptive variable length coding) entropy coding.
SVC	Scalable Baseline	83	-	Adds scalability to the Baseline profile. Useful for video conferencing, mobile and surveillance applications.
SVC	Scalable Constrained Baseline	83	5	Adds scalability to the Constrained Baseline profile. Suitable for Real-Time applications.
SVC	Scalable High	86	-	Adds scalability to the High profile. Suitable for broadcast and streaming applications.
SVC	Scalable Constrained High	86	5	Based on the Constrained High profile with added support for scalability. Used for real-time communications.
SVC	Scalable High Intra	86	3	Used for production applications that need high quality content with Intra support.
MVC	Stereo High	128	-	Based on the High profile with MVC extensions to encode two views.
MVC	Multi-view High	118	-	Based on the High profile. Used when more than two views are required. Lacks support for interlace.
MFC	MFC High	134	-	Enhanced resolution stereoscopic imaging based on the High profile. This packs two images into a single frame.
MFC	MFC Depth High	135	-	Adds depth maps for enhanced 3D rendering.
3D	Multi-view Depth High	138	-	Adds depth map and video texture mapping for better 3D rendition.
3D	Enhanced Multi-view Depth High	139	-	Multiple views with depth mapping support.
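The table can be turned into a lookup. This Python sketch transcribes the rows above into a dictionary keyed by profile_idc and the set of constraint flags each profile requires, then picks the most specific matching row. The names are hypothetical, and the lookup is only as complete as the table: a flag combination that is not listed falls back to the least constrained profile with the same profile_idc, and unlisted profile_idc values are reported as unknown.

```python
from __future__ import annotations

# (profile_idc, constraint flags that must be set) -> profile name.
# Transcribed from the table above; flag k means constraint_set<k>_flag.
PROFILES: dict[tuple[int, frozenset[int]], str] = {
    (66, frozenset()): "Baseline",
    (66, frozenset({1})): "Constrained Baseline",
    (77, frozenset()): "Main",
    (88, frozenset()): "Extended",
    (100, frozenset()): "High",
    (100, frozenset({4})): "Progressive High",
    (100, frozenset({4, 5})): "Constrained High",
    (110, frozenset()): "High 10",
    (110, frozenset({3})): "High 10 Intra",
    (122, frozenset()): "High 4:2:2",
    (122, frozenset({3})): "High 4:2:2 Intra",
    (244, frozenset()): "High 4:4:4 Predictive",
    (244, frozenset({3})): "High 4:4:4 Intra",
    (44, frozenset()): "CAVLC 4:4:4 Intra",
    (83, frozenset()): "Scalable Baseline",
    (83, frozenset({5})): "Scalable Constrained Baseline",
    (86, frozenset()): "Scalable High",
    (86, frozenset({5})): "Scalable Constrained High",
    (86, frozenset({3})): "Scalable High Intra",
    (128, frozenset()): "Stereo High",
    (118, frozenset()): "Multi-view High",
    (134, frozenset()): "MFC High",
    (135, frozenset()): "MFC Depth High",
    (138, frozenset()): "Multi-view Depth High",
    (139, frozenset()): "Enhanced Multi-view Depth High",
}


def profile_name(profile_idc: int, flags: tuple[bool, ...]) -> str:
    """Return the most specific profile whose required flags are all set."""
    set_flags = {k for k, on in enumerate(flags) if on}
    matches = [
        (required, name)
        for (idc, required), name in PROFILES.items()
        if idc == profile_idc and required <= set_flags
    ]
    if not matches:
        return f"unknown or reserved profile_idc {profile_idc}"
    # The row requiring the most flags is the most constrained matching profile.
    _, name = max(matches, key=lambda match: len(match[0]))
    return name


print(profile_name(100, (False, False, False, False, True, True)))
# prints: Constrained High
```

Choosing the row with the most required flags mirrors how the table reads: Constrained High is High plus flags 4 and 5, so a stream with both flags set matches all three High rows and the most constrained one wins.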

The standard defines profile_idc as an unsigned 8-bit integer value (0-255). Any profile_idc values not currently defined in the standard are reserved entirely for future use. They will be defined jointly by ITU-T and ISO/IEC.

The annexes at the end of ISO/IEC 14496-10 are the authoritative source. Table 5 in IETF RFC 6184 is also helpful.

Levels

The levels set limits on the picture resolution, frame-rate, bit rate and buffering that the client-player's decoder must be able to handle. Within any given level, there is a trade-off between frame-rate and picture size because the number of macroblocks decoded per second is capped. If you have a higher frame-rate, the pictures must be smaller. The number of frames that can be held in the decoded picture buffer also depends on the picture size. The level limits defined in Table A.1 describe the minimum capabilities the client must support.
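The frame-rate versus picture size trade-off is simple arithmetic on three Table A.1 columns: the maximum macroblock processing rate (MaxMBPS), the maximum frame size (MaxFS) and the maximum decoded picture buffer size (MaxDpbMbs). This Python sketch uses a handful of rows transcribed by hand, so check them against the standard before relying on them. The function names are hypothetical, and the sketch ignores bit rate, buffer size and the other constraints in Sub-section A.3.

```python
from __future__ import annotations

import math

# A few rows of Table A.1, transcribed by hand (check against the standard):
# level_idc: (MaxMBPS in macroblocks/s, MaxFS in macroblocks, MaxDpbMbs in macroblocks)
LEVEL_LIMITS = {
    30: (40_500, 1_620, 8_100),
    31: (108_000, 3_600, 18_000),
    40: (245_760, 8_192, 32_768),
    42: (522_240, 8_704, 34_816),
    51: (983_040, 36_864, 184_320),
    52: (2_073_600, 36_864, 184_320),
}


def frame_size_in_mbs(width: int, height: int) -> int:
    """Pictures are coded as 16x16 macroblocks, so round each dimension up."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def level_capacity(level_idc: int, width: int, height: int) -> tuple[float, int]:
    """Return (max frames per second, DPB size in frames) for a picture size."""
    max_mbps, max_fs, max_dpb_mbs = LEVEL_LIMITS[level_idc]
    fs = frame_size_in_mbs(width, height)
    if fs > max_fs:
        raise ValueError(f"{width}x{height} ({fs} MBs) exceeds MaxFS {max_fs}")
    max_fps = max_mbps / fs
    # Decoded picture buffer capacity in whole frames, capped at 16.
    dpb_frames = min(max_dpb_mbs // fs, 16)
    return max_fps, dpb_frames


for level_idc, (w, h) in [(31, (1280, 720)), (40, (1920, 1080)),
                          (42, (1920, 1080)), (52, (3840, 2160))]:
    fps, dpb = level_capacity(level_idc, w, h)
    print(f"level_idc {level_idc}: {w}x{h} up to {fps:.1f} fps, {dpb} DPB frames")
```

This prints 30.0 fps for 1280×720 at level 3.1, 30.1 fps for 1920×1080 at level 4, 64.0 fps at level 4.2, and 64.0 fps for 3840×2160 at level 5.2. Note that 1080 lines are coded as 68 macroblock rows (1088 lines).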

The level_idc is an unsigned 8-bit integer value (0-255). However, the standard describes the intermediate levels in Table A.1 as non-integer values. The intermediate levels describe alternative picture sizes and frame-rates within the available bandwidth and buffering capacity of each level.

Here is a summary list showing just the main levels and resolutions. The standard mentions that some implementations may only use these integer numbered levels and omit support for the intermediate ones:

Level grouping	Description
1	Small pictures for older mobile devices.
2	Quarter SD frame size or low frame-rate SD.
3	SD, plus 1280×720 HD at levels 3.1 and 3.2.
4	HD up to 1920×1080 (and 2048×1080 at level 4.2).
5	4K (3840×2160 from level 5.1).
6	8K.

There are some arcane rules for how the level_idc is combined with the profile_idc and the profile_iop constraint flags to determine the actual levels. These are described in Sub-section A.3 in Annex A.

The level limits are applied differently for the Low vs. High profiles. Level limits are described in Table A.1. To determine the indicated level from the level_idc value, you need to treat each group of profiles differently.

• Baseline, main and extended (low) profiles. The Baseline, Constrained Baseline, Main, and Extended profiles share a common set of level limits (see Section A.3.1). Level 1b does not fit the numeric scheme: it shares level_idc 11 with level 1.1 and is distinguished by setting constraint flag 3 (constraint_set3_flag) to 1.

• High profiles. The child profiles derived from the High profile similarly share some common behaviors which are described separately (see Section A.3.2). Level 1b is treated as a special case and has a level_idc value equal to 9.

After dealing with the special case for level 1b, the standard uses a fixed-point decimal representation where the integer value in level_idc is divided by 10 to yield the level number. For example, level 6.1 is represented by a level_idc value of 61.

Level	level_idc value	Type
1	10	Main
1b	11 - with constraint bit 3 set to 1 for child profiles based on the baseline, main and extended profiles.	Intermediate
1b	9 for all child profiles based on the High profile.	Intermediate
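1.1	11 - with constraint bit 3 set to 0 for child profiles based on the baseline, main and extended profiles; always 11 for child profiles based on the High profile.	Intermediate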
1.2	12	Intermediate
1.3	13	Intermediate
2	20	Main
2.1	21	Intermediate
2.2	22	Intermediate
3	30	Main
3.1	31	Intermediate
3.2	32	Intermediate
4	40	Main
4.1	41	Intermediate
4.2	42	Intermediate
5	50	Main
5.1	51	Intermediate
5.2	52	Intermediate
6	60	Main
6.1	61	Intermediate
6.2	62	Intermediate
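The level 1b special case is where a decoder needs the profile_idc and a constraint flag to interpret level_idc. This Python sketch covers only the two core groups described above (the Baseline/Main/Extended group and the High group); SVC and multi-view profiles are deliberately rejected, and the function name is hypothetical.

```python
from __future__ import annotations

# Core profile groups used in Sub-section A.3 of Annex A.
LOW_GROUP = {66, 77, 88}               # Baseline, Constrained Baseline, Main, Extended
HIGH_GROUP = {100, 110, 122, 244, 44}  # High and the profiles derived from it
NUMERIC_LEVEL_IDCS = {10, 11, 12, 13, 20, 21, 22, 30, 31, 32,
                      40, 41, 42, 50, 51, 52, 60, 61, 62}


def level_name(profile_idc: int, constraint_set3_flag: bool, level_idc: int) -> str:
    """Map level_idc to a level name, handling the level 1b special case."""
    if profile_idc in LOW_GROUP:
        # Low group: level 1b shares level_idc 11 with level 1.1 and is flagged by bit 3.
        if level_idc == 11 and constraint_set3_flag:
            return "1b"
    elif profile_idc in HIGH_GROUP:
        # High group: level 1b has its own level_idc of 9. For profile_idc 110,
        # 122 and 244, bit 3 selects an Intra profile instead of level 1b.
        if level_idc == 9:
            return "1b"
    else:
        raise ValueError(f"profile_idc {profile_idc} is outside this sketch")
    if level_idc not in NUMERIC_LEVEL_IDCS:
        raise ValueError(f"level_idc {level_idc} is not a defined level")
    # Fixed-point decimal: level_idc 61 means level 6.1, 40 means level 4.
    whole, tenth = divmod(level_idc, 10)
    return str(whole) if tenth == 0 else f"{whole}.{tenth}"


print(level_name(66, True, 11), level_name(66, False, 11), level_name(110, True, 11))
# prints: 1b 1.1 1.1
```

The last call shows why the context matters: the same level_idc and flag that mean level 1b for Baseline mean level 1.1 for High 10 Intra.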

A decoder that conforms to a given level must be able to decode every bitstream that conforms to that level or to any lower level, within the profiles it supports.
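A client-player can apply the "this level and all lower levels" rule by ranking the level names. This Python sketch places level 1b between levels 1 and 1.1, where its limits fall in Table A.1. It checks the level only; the decoder must also support the stream's profile, which is a separate test. The function name is hypothetical.

```python
# Levels in increasing order of capability; 1b sits between 1 and 1.1.
LEVEL_ORDER = ["1", "1b", "1.1", "1.2", "1.3", "2", "2.1", "2.2",
               "3", "3.1", "3.2", "4", "4.1", "4.2", "5", "5.1", "5.2",
               "6", "6.1", "6.2"]


def level_supported(decoder_level: str, stream_level: str) -> bool:
    """True if a decoder conforming to decoder_level must accept stream_level."""
    return LEVEL_ORDER.index(stream_level) <= LEVEL_ORDER.index(decoder_level)


print(level_supported("4.1", "3.1"), level_supported("3.1", "4"))
# prints: True False
```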

Conclusion

Standards compliance does not guarantee interoperability. Make sure the profile and level you encode with are consistent with the capabilities of your target client-player.

For example, if company A makes a video encoder that produces high definition pictures and company B makes a video player that only plays standard definition content, the two are incompatible even though they may both claim to be (and are) 100% standards compliant.

Bear in mind also that H.264 is primarily a lossy codec. Lossless coding is available in the High 4:4:4 Predictive profile, which can bypass the transform and quantization for individual macroblocks or for every macroblock in a picture. Profiles such as Baseline, Main and High are lossy only.

