How JPEG Works in 4 “Simple” Steps
JPEG is one of the most widely used image formats today. Many of us have probably used it countless times without really knowing how it works, when to use it, or how it manages to reduce image size so significantly. In this blog post we will see that its core principles are not too complex, at least at a high level.
To begin with, JPEG compression consists of the following 4 steps:
1. Colorspace Transform
2. Discrete Cosine Transform
3. Quantization
4. Zigzag Scan and Huffman Encoding
1) Colorspace Transform
According to the JFIF standard, the image is converted from RGB into the YCbCr colorspace. Colorspaces are too complicated to cover fully in this post, but in short: RGB stands for Red, Green and Blue, the colorspace that most modern monitors use. YCbCr, on the other hand, splits a color into Y (luma, i.e. brightness) and Cb/Cr (the blue-difference and red-difference chrominance). The same separation of brightness and color was already used by analog color television. This raises a simple question: why do we convert something “new” into something “old”? The answer lies in how we see colors. Human vision is much more sensitive to changes in brightness than to changes in color, so once brightness and color are stored separately, we can throw away a lot of color detail without noticeably changing how the image looks. The conversion itself loses almost nothing; the savings come from what we do with the chroma channels next.
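To make the conversion concrete, here is a minimal Python sketch of the RGB to YCbCr equations from the JFIF specification, together with their inverse. It assumes NumPy and 8-bit RGB input; the function names are illustrative only. The round trip at the end shows that the conversion by itself keeps the image intact (up to rounding), so the compression gains must come from later steps.

```python
import numpy as np


def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB array with 0-255 values to float YCbCr (JFIF equations)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b             # luma (brightness)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b   # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b   # red-difference chroma
    return np.stack([y, cb, cr], axis=-1)


def ycbcr_to_rgb(ycc: np.ndarray) -> np.ndarray:
    """Inverse JFIF conversion, rounded and clipped back to 8-bit RGB."""
    y, cb, cr = ycc[..., 0], ycc[..., 1] - 128, ycc[..., 2] - 128
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    rgb = np.stack([r, g, b], axis=-1)
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)


# A gray pixel carries no color information: Cb and Cr sit at the neutral value 128.
gray = rgb_to_ycbcr(np.array([[[200, 200, 200]]]))
assert np.allclose(gray, [[[200, 128, 128]]])

# Apart from rounding, the conversion is reversible: nothing has been thrown away yet.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
assert np.array_equal(ycbcr_to_rgb(rgb_to_ycbcr(img)), img)
```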
In addition, using YCbCr lets us compress more efficiently without sacrificing much perceived image quality, through a sub-step called chroma subsampling (often simply called “downsampling”). It reduces the resolution of the two chroma channels while keeping the luma channel at full resolution. The most common ratio is 4:2:0, which halves the chroma resolution along both the X and Y axes, so each chroma channel keeps only a quarter of its samples. The ratio can be changed (for example to 4:2:2 or 4:4:4) according to compression and quality needs.
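A minimal sketch of 4:2:0 subsampling, assuming NumPy and assuming the encoder simply averages each 2x2 group of chroma samples. Real encoders may filter differently, and padding odd sizes by repeating the edge is an illustrative choice here, not a rule from the standard. The decoder-side helper just stretches each sample back over a 2x2 area.

```python
import numpy as np


def subsample_420(chroma: np.ndarray) -> np.ndarray:
    """4:2:0 subsampling: halve a chroma plane in both directions by averaging 2x2 blocks."""
    h, w = chroma.shape
    # Assumption: odd sizes are padded by repeating the last row/column.
    padded = np.pad(chroma.astype(np.float64), ((0, h % 2), (0, w % 2)), mode="edge")
    ph, pw = padded.shape
    return padded.reshape(ph // 2, 2, pw // 2, 2).mean(axis=(1, 3))


def upsample_420(small: np.ndarray, shape: tuple) -> np.ndarray:
    """Decoder side: repeat each chroma sample over a 2x2 area (nearest neighbour)."""
    big = np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)
    return big[: shape[0], : shape[1]]


cb = np.arange(16, dtype=np.float64).reshape(4, 4)
small = subsample_420(cb)
print(small)  # values: [[2.5, 4.5], [10.5, 12.5]]
print(cb.size, "->", small.size, "chroma samples")  # 16 -> 4 chroma samples
```

With Y kept at full size and both chroma planes reduced to a quarter, an image goes from 3 samples per pixel to 1.5.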
2) Discrete Cosine Transform
This is the main step that makes JPEG compression effective. Below you can see the 8x8 frequency representation of the 2-dimensional DCT used in JPEG compression: 64 cosine patterns, each combining one horizontal and one vertical frequency.
The top-left corner represents the lowest frequency and the bottom-right corner the highest, where “frequency” means the frequency of the cosine wave. The DCT lets us represent any 8x8 image as a weighted sum of these 64 patterns; the weights are called coefficients. For instance, think of an 8x8 pixel chess board in which every square is a single pixel. Its rapidly alternating pattern is described far better by the high-frequency patterns than by the low-frequency ones, so the high-frequency coefficients are large. The main purpose of the discrete cosine transform is therefore to reveal which frequencies are most prominent in the image. In typical photographs, most of the energy ends up in a few low-frequency coefficients, which is exactly what the next steps exploit. For images larger than 8x8 pixels, JPEG uses a concept called block splitting: the image is divided into 8x8 pixel blocks and the DCT is computed for each block separately.
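The DCT and block splitting can be sketched in a few lines of Python, assuming NumPy. This version writes the orthonormal 2-D DCT-II as a matrix product and applies the usual JPEG level shift of 128 first; the helper names are hypothetical, and production codecs use fast factorized DCTs rather than a plain matrix multiply. The chess board example reproduces the observation above: its largest coefficient lands in the bottom-right (highest-frequency) position.

```python
import numpy as np

N = 8  # JPEG block size


def dct_matrix(n: int = N) -> np.ndarray:
    """Orthonormal DCT-II matrix: row u is the u-th cosine pattern sampled at n points."""
    x = np.arange(n)
    u = x.reshape(-1, 1)
    m = np.sqrt(2 / n) * np.cos((2 * x + 1) * u * np.pi / (2 * n))
    m[0, :] = np.sqrt(1 / n)  # the constant (DC) pattern gets a smaller scale factor
    return m


D = dct_matrix()


def dct2(block: np.ndarray) -> np.ndarray:
    """2-D DCT of one 8x8 block; pixels are first shifted from 0..255 to -128..127."""
    return D @ (block.astype(np.float64) - 128) @ D.T


def idct2(coeffs: np.ndarray) -> np.ndarray:
    """Inverse 2-D DCT, undoing the level shift."""
    return D.T @ coeffs @ D + 128


def split_into_blocks(channel: np.ndarray) -> np.ndarray:
    """Split an (H, W) plane into an (H/8, W/8, 8, 8) grid of blocks.

    Assumes H and W are multiples of 8; real encoders pad the edges first.
    """
    h, w = channel.shape
    return channel.reshape(h // N, N, w // N, N).swapaxes(1, 2)


# The one-pixel chess board from the text: its strongest coefficient is the
# highest-frequency pattern in the bottom-right corner.
board = np.indices((N, N)).sum(axis=0) % 2 * 255
coeffs = dct2(board)
peak = np.unravel_index(np.argmax(np.abs(coeffs)), coeffs.shape)
print(tuple(int(i) for i in peak))  # (7, 7)

# Transforming every block of a toy 16x24 image:
image = np.arange(16 * 24).reshape(16, 24) % 256
all_coeffs = np.array([[dct2(b) for b in row] for row in split_into_blocks(image)])
print(all_coeffs.shape)  # (2, 3, 8, 8)
```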
3) Quantization
This is the step where we use the coefficients computed in the discrete cosine transform stage. Here we use a “quantization table” whose values directly determine the amount of compression: each entry is the step size for one frequency, and higher frequencies usually get larger steps so they are removed more aggressively. This is the table that changes when you change the JPEG quality setting in most image editing software. The table is used as follows:
a) Each value in the 8x8 block obtained from the discrete cosine transform is divided by the corresponding value in the quantization table.
b) The result is rounded to the nearest integer.
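By the end of the operation, the resulting table usually contains many “0” values in the higher frequency ranges. Any coefficient whose magnitude is less than half of its quantization step rounds to zero: for instance, a coefficient of 30 divided by a step of 61 gives about 0.49, which rounds to 0. Since high-frequency coefficients tend to be small and their steps large, most of them end up as zeros. This rounding is where JPEG actually discards information.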
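Here is a sketch of steps a) and b), assuming NumPy. The base table is the example luminance table from Annex K of the JPEG standard, and the quality-to-table scaling follows the convention of the libjpeg (IJG) library; other encoders may scale differently. The coefficient values in the usage example are made up for illustration.

```python
import numpy as np

# Example luminance table from Annex K of the JPEG standard (ITU-T T.81).
# Encoders may use any table; this one is simply the well-known reference.
BASE_LUMA_TABLE = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])


def scale_table(quality: int) -> np.ndarray:
    """Map a 1-100 quality setting to a table, following the libjpeg (IJG) convention.

    This scaling is an encoder convention, not part of the JPEG standard itself.
    """
    quality = min(max(quality, 1), 100)
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    table = (BASE_LUMA_TABLE * scale + 50) // 100
    return np.clip(table, 1, 255)  # baseline JPEG stores 8-bit table entries


def quantize(coeffs: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Steps a) and b): divide element-wise by the table, then round to the nearest integer."""
    return np.rint(coeffs / table).astype(np.int32)


def dequantize(q: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Decoder side: multiply back. The rounding error from quantize() cannot be undone."""
    return q * table


# Made-up coefficients: a strong low-frequency value and a weak high-frequency one.
coeffs = np.zeros((8, 8))
coeffs[0, 1] = 95.0  # 95 / 11 ≈ 8.6  -> 9
coeffs[0, 7] = 30.0  # 30 / 61 ≈ 0.49 -> 0 (less than half a step)
q = quantize(coeffs, scale_table(50))
print(q[0, 1], q[0, 7])                      # 9 0
print(dequantize(q, scale_table(50))[0, 1])  # 99, not the original 95
```

Lowering the quality raises every step size (quality 10 turns the top-left 16 into 80), so more coefficients round to zero.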
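4) Zigzag Scan and Huffman Encoding
Obviously we produce those zeros for a reason, and this is the part where we use them. At this point each block is an 8x8 table of integers. To encode it, we first need to convert it into a one-dimensional stream of integers. We do that with a principle called “Zigzag Scan”, which can be seen below. It reads the table diagonally from the top-left (lowest frequency) to the bottom-right (highest frequency), so the zeros from the high frequencies gather at the end of the stream.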
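The zigzag order can be generated instead of hard-coded by walking the anti-diagonals of the block and alternating direction. This is a plain-Python sketch with hypothetical helper names; in row-major numbering the order it produces starts 0, 1, 8, 16, 9, 2, ..., which matches the standard JPEG zigzag sequence.

```python
def zigzag_order(n: int = 8) -> list[tuple[int, int]]:
    """(row, col) positions of an n x n block in JPEG zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies each anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left -> top-right
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block) -> list:
    """Flatten an 8x8 block (nested lists or a NumPy array) into a 64-value stream."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# A quantized block whose only nonzero values sit in the low-frequency corner:
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 12, -3, 2
stream = zigzag_scan(block)
print(stream[:5])       # [12, -3, 2, 0, 0]
print(stream.count(0))  # 61, all trailing after the nonzero values
```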
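In JPEG compression, “Huffman Encoding” is the encoding method most commonly used. It is again a complicated topic that deserves its own dedicated blog post, but at its core it builds a binary tree that gives short bit codes to frequent symbols and longer codes to rare ones. Before Huffman coding, JPEG run-length encodes the zeros: each nonzero value is stored together with the number of zeros preceding it, and a single end-of-block symbol replaces the whole run of trailing zeros. As a result, an integer stream ending in many zeros is very beneficial for this algorithm.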
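The following plain-Python sketch puts the two ideas together: run-length encoding the zeros with an end-of-block symbol, then building a Huffman code from symbol frequencies. It is deliberately simplified and the names are hypothetical. Real baseline JPEG codes the DC coefficient separately as a difference from the previous block, Huffman-codes (run, size) categories followed by raw amplitude bits, splits zero runs longer than 15, and stores its code tables in the file header.

```python
import heapq
from collections import Counter

EOB = "EOB"  # end-of-block marker: "all remaining coefficients are zero"


def run_length_encode(ac: list[int]) -> list:
    """Turn zigzag-ordered AC coefficients into (zeros_before, value) pairs plus EOB.

    Simplified: real JPEG also splits long zero runs and omits EOB
    when the last coefficient is nonzero.
    """
    last = max((i for i, v in enumerate(ac) if v != 0), default=-1)
    symbols, run = [], 0
    for v in ac[: last + 1]:
        if v == 0:
            run += 1
        else:
            symbols.append((run, v))
            run = 0
    symbols.append(EOB)  # one symbol stands in for the whole trailing run of zeros
    return symbols


def huffman_code(symbols: list) -> dict:
    """Build a prefix-free code in which frequent symbols get shorter bit strings."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (total frequency, unique tie-breaker, {symbol: code so far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)  # take the two rarest subtrees...
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}  # ...and make them siblings
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


ac = [-3, 2, 0, 0, 1] + [0] * 58  # 63 AC coefficients after the zigzag scan
symbols = run_length_encode(ac)
print(symbols)  # [(0, -3), (0, 2), (2, 1), 'EOB']

# Symbols from this block plus two sparser made-up blocks:
stream = symbols + run_length_encode([1] + [0] * 62) + run_length_encode([0] * 63)
codes = huffman_code(stream)
print(len(codes[EOB]), len(codes[(2, 1)]))  # 1 3
bits = "".join(codes[s] for s in stream)
print(len(bits))  # 15
```

Because the end-of-block symbol is the most frequent one here, it receives the shortest code.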
To conclude, JPEG is a format that most of us use almost every single day, and as we have seen, the principles behind it are not extremely complicated but rather interesting. However, it is also worth stating that each of these principles has far more sophisticated and detailed variants that modern JPEG encoders use today. This post is just a simple, introductory look at JPEG compression.
Moving on to the use cases of JPEG, it works well for images where transparency and text sharpness are not the main concern. Since JPEG is a lossy compression algorithm, it does affect image quality. This is especially noticeable in images that contain text: sharp edges depend on exactly the high-frequency coefficients that quantization removes, so JPEG creates artifacts around the edges of the letters and makes them visually unpleasant. JPEG also does not support transparency, which is another important limitation to keep in mind. In those cases, a lossless format like PNG, or WebP (which offers a lossless mode and supports transparency), could be a better solution. However, if your images need neither transparency nor sharp text, JPEG may be a wise choice since it reduces image size significantly.
Reduced image size is especially beneficial for websites that contain a large number of images, where poorly optimized images hurt performance and greatly increase load times. This has multiple side effects, such as decreased user retention, lost revenue, and wasted resources. This is exactly the problem we focus on solving at image4io.
We highly recommend the JPEG explanation videos by Mike Pound on Computerphile:
JPEG Part 0: Colourspaces: https://www.youtube.com/watch?v=LFXN9PiOGtY
JPEG Part 1: https://www.youtube.com/watch?v=n_uNPbdenRs
JPEG Part 2: DCT: https://www.youtube.com/watch?v=Q2aEzeMDHMA
JPEG (Wikipedia): https://en.wikipedia.org/wiki/JPEG
Chroma Subsampling (Wikipedia): https://en.wikipedia.org/wiki/Chroma_subsampling
Discrete Cosine Transform (Wikipedia): https://en.wikipedia.org/wiki/Discrete_cosine_transform