Color space basics for HDMI video

There are two general measures of video perception: resolution and color. I'm keeping both notions deliberately broad here, since the goal is to explain how they relate to HDMI transmission.

I've covered the resolution part in previous posts, as this metric is simpler to quantify. Every screen has a limited number of pixels; the more, the better. The same logic applies to refresh rate: the faster we change the image on the screen, the smoother the motion looks. Screens are rectangular matrices of pixels with a small number of aspect ratios (roughly, from long 21:9 to square-ish 3:2) and resolutions of 1, 2, 4, 5, or 8K. Higher HDMI standards support more Ks. Thanks to Steve Jobs' legacy, we even have an upper limit for useful pixel density, and consequently for the number of usable pixels, known as 'Retina'.

The second part, color, is more a matter of taste and perception. Much like with audio, there's always room for debate on compressed vs. lossless and on the number and range of color (or audio frequency) levels. So, let's review the color-related terms one by one.

Color space

Color space is basically the color wheel from any color picker window in the software of one's choice, just arranged in a more scientific, two-axis planar pattern. This half-parabolic, rainbow-colored shape, called the 'chromaticity diagram', was introduced by the CIE in 1931 and is still referenced today as the space of all visible colors, not counting their luminosity.

Then there are 'color models': systems with multi-letter names for representing a set of colors (a color gamut). The ultimate goal for a color model is to cover the entire color space, but in reality, all the RGBs and CMYKs try to cover as much of it as they can (or as much as is technically possible with the technology they are made for).

RGB

One of the best-known color models is RGB. I intentionally don't distinguish between the variations of RGB (sRGB, AdobeRGB, and xRGB), as the difference doesn't matter for today's topic. RGB, as the name suggests, mimics the human eye's cone cells, which perceive any color as a mixture of red, green, and blue.

RGB is the go-to format for PCs, monitors, and most image processing activities. Specifically, sRGB is the standard for web images.

The channels, R, G, and B, are each represented as an 8-bit number, 0 to 255. Some models use more raw data to get more tonal levels: 16 bits per channel, or 48 bits per pixel. So should we aim to use as many bits per pixel as we can? Well, no, it's not practical. A 24-bit Full HD RGB frame takes 1920 × 1080 × 24 bits = 5.93 MiB of space (or 11.87 MiB for a 48-bit one). That seems manageable, but a minute of Full HD 30 fps 24-bit video occupies 10.43 GiB of storage, which is obviously too much even for modern networks (that's 938 GiB for a 90-minute movie).
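
These sizes are easy to verify with a quick back-of-the-envelope check in Python (raw, uncompressed data only; no container or audio):

```python
def raw_size_mib(width, height, bits_per_pixel):
    """Size of one uncompressed frame in MiB."""
    return width * height * bits_per_pixel / 8 / 2**20

frame_24 = raw_size_mib(1920, 1080, 24)   # ~5.93 MiB
frame_48 = raw_size_mib(1920, 1080, 48)   # ~11.87 MiB

minute_gib = frame_24 * 30 * 60 / 1024    # 30 fps for 60 s -> ~10.43 GiB
movie_gib = minute_gib * 90               # ~938 GiB for a 90-minute film

print(f"frame: {frame_24:.2f} / {frame_48:.2f} MiB")
print(f"minute: {minute_gib:.2f} GiB, movie: {movie_gib:.0f} GiB")
```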

There are two ways to mitigate this: either use compression algorithms like H.264/H.265 (which is what YouTube and other streaming services do) or use a color space that spends fewer bits for the same perceived result. Within RGB, one option is the limited 'video' range, with values from 16 to 235 instead of the full 0 to 255.
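
A minimal sketch of that range squeeze, assuming a simple linear rescale of 8-bit values:

```python
def full_to_limited(value):
    """Map a full-range 8-bit value (0-255) onto the limited 'video' range (16-235)."""
    return round(16 + value * 219 / 255)

assert full_to_limited(0) == 16      # video black
assert full_to_limited(255) == 235   # video white
```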

CMYK

Worth a quick mention: CMYK has no use in HDMI video transmission, as it's made for printed physical products.

YUV, YPbPr, and YCbCr

These terms are often used interchangeably, although they differ slightly; all we care about here is the general idea. YUV works best for photo and video content. Human eyes perceive images as a combination of luminance and color. Luminance is the most important part: black-and-white images are enough to give us a pretty good idea of what's shown. In the dark, we see just black and white, as there's no light to reflect off colored objects, and we get by just fine.

So the idea is to devote one channel to luminance alone and the other two to color: red-difference and blue-difference. YCbCr can be converted to and from RGB with a matrix. But the Y, Cb, and Cr components are gamma-corrected, meaning they are non-linear. While RGB uses uniform steps for its color channels, luminance (and, consequently, the red- and blue-difference channels) doesn't work like that: luminance levels are perceived in a more complex pattern, so the bright part of the range can get by with fewer steps while the dark part needs more.
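
As a rough sketch of that matrix conversion, here's a full-range version using the BT.709 luma coefficients; BT.601 and BT.2020 use different constants, and real video pipelines add offsets and scaling for limited range:

```python
# Gamma-corrected (R', G', B') values are assumed to be in [0, 1].
KR, KB = 0.2126, 0.0722          # BT.709; BT.601 would use 0.299 / 0.114
KG = 1.0 - KR - KB

def rgb_to_ycbcr(r, g, b):
    y = KR * r + KG * g + KB * b          # luma
    cb = (b - y) / (2 * (1 - KB))         # blue-difference, in [-0.5, 0.5]
    cr = (r - y) / (2 * (1 - KR))         # red-difference, in [-0.5, 0.5]
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    r = y + 2 * (1 - KR) * cr
    b = y + 2 * (1 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b

print(rgb_to_ycbcr(1.0, 1.0, 1.0))  # white -> (1.0, 0.0, 0.0): all luma, no chroma
```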

This attention to luminance levels brings us to HDR.

HDR

HDR stands for high dynamic range, as opposed to SDR, standard dynamic range. A wider range means more bits per channel (usually 10 or 12), higher maximum luminance levels, and extra metadata for the display.

That means the number of luminance steps increases (the same goes for both chromatic channels), and the maximum luminance level is an order of magnitude greater than in SDR. In addition to the raw data, there's metadata for monitor settings: all screens are built differently in terms of color calibration and maximum peak brightness, so the source communicates luminosity and contrast values to the display for it to adjust. HDR was introduced in HDMI 2.0 as 'static HDR', meaning one set of metadata is sent for the whole video stream. HDMI 2.1 can use 'dynamic HDR', sending different metadata for every scene. HDR is a well-known, widely marketed feature that genuinely elevates the user experience. Dynamic HDR is usually labeled 'HDR10+' on the box (as opposed to static 'HDR10').
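
As an illustrative sketch only (the real thing is a binary HDMI InfoFrame, not a Python object), static HDR10 metadata carries roughly these values, following the SMPTE ST 2086 mastering-display parameters plus content light levels:

```python
from dataclasses import dataclass

@dataclass
class StaticHdrMetadata:
    """Rough shape of static HDR10 metadata; illustrative, not the wire format."""
    max_mastering_luminance: float  # peak of the mastering display, cd/m^2 (nits)
    min_mastering_luminance: float  # black level of the mastering display, cd/m^2
    max_cll: int                    # MaxCLL: brightest single pixel in the stream, nits
    max_fall: int                   # MaxFALL: brightest frame-average light level, nits

# One set of values describes the whole stream (static HDR).
# Dynamic HDR (HDR10+) would instead update values like these per scene.
meta = StaticHdrMetadata(1000.0, 0.0001, max_cll=1000, max_fall=400)
```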

Here lies the answer to a question some people ask: "Is 10-bit color ('deep color') HDR?" No, it is not. 10-bit color in RGB just means more steps for the chromatic channels, not an extra-wide luminance range. Does HDR always take 10 or more bits per channel? Yes: 10-bit color can be either HDR or SDR, but an HDR signal is always 10-bit or more.

Chroma subsampling

The last piece is subsampling. It's a third option, sitting between compression and a limited color model. Chroma subsampling is applied to YCbCr to keep the luminance data intact, since it's the most precious to us, while sharing chroma samples across blocks of pixels instead of storing a color for every pixel.

A standard sample is a two-row block of pixels, usually 4×2 (four columns, two rows). Subsampling is written as three numbers, J:a:b.

The first digit, 'J', in 4:4:4 or 4:2:0 is the number of pixels in a row; in practice, it's always 4.

'a' is the number of distinct chroma samples in the first row. It can be treated as horizontal chroma resolution: do we reduce the 4 pixel colors of the first row to '1' or '2', or keep all '4'?

'b' is the number of chroma changes between the first and second rows. This is the vertical chroma resolution and can be either '0' (no difference; the second row reuses the first row's chroma) or equal to 'a' (the second row carries its own chroma samples).

So, 4:4:4 means no subsampling; 4:2:2 or 4:4:0 reduce the block's 8 chroma samples to just 4; 4:1:1 or 4:2:0 reduce them to 2 (horizontally or vertically). The practically useful schemes are 4:4:4 and 4:2:0, as they are by far the most common.
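
As a quick sanity check of the savings, here's the average bits per pixel each scheme yields for 8-bit YCbCr (the helper below is hypothetical; it just walks the J:a:b definition over the 4×2 block):

```python
def bits_per_pixel(j, a, b, bit_depth=8):
    """Average bits per pixel: luma for every pixel, shared chroma per 4x2 block."""
    chroma_samples = a + b          # samples per block, per chroma channel
    pixels = j * 2                  # pixels in the 4x2 reference block
    return bit_depth * (1 + 2 * chroma_samples / pixels)

for scheme in [(4, 4, 4), (4, 2, 2), (4, 2, 0)]:
    print(scheme, bits_per_pixel(*scheme), "bits/pixel")
# (4, 4, 4) -> 24.0, (4, 2, 2) -> 16.0, (4, 2, 0) -> 12.0
```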

In conclusion, different color spaces are made for different applications. A PC monitor with SDR works best with RGB, displaying pixel-perfect text on flat color backgrounds. YCbCr is meant for non-productivity applications like photos, movies, or video games. YCbCr with no subsampling is great for everything; 4:2:0 is visually indistinguishable from perfect when watching movies and sports.

HDR should always be 10-bit or more; there's no 8-bit HDR, and there never was.

All of this applies to devices with HDMI 2.0 and higher, and there's no point in talking about color spaces for earlier versions.
