Introduction
Introduction
The term DCT stands for Discrete Cosine Transform. In the tradition of scratchapixel, we we will start with examples to give you a sense of what it is and how it works, and we will finish the lesson we a more formal explanation (with equations and the proper technical terms). Going from an example to the theory will make it easier for you to make sense of the technical terms and equations which can be abstract at times. It is recommended that you read the lesson on Image Transform before you read this one.
To summarise what we have introduced in the lesson on Image Transform, DCT belongs to a family of mathematical tools that can be used to convert a signal into a different representation which we are going to explain soon. A signal can be many things. It can be a sound or an image. In the case of a sound the signal will be a one-dimensional curve and in the case of an image we will be dealing with a two-dimensional set of data (discrete ones hence the term discrete being used everywhere as opposed to continuous). The process for transforming these signals is the same, independently of their origin and what they represent (a sound, an image, a curve representing the variation of temperature or the profile of an ocean surface). The process we described (transforming the input signal into a different representation) can be see as a form of translation. The signal is expressed in one "language" and we translate it into another "language". Of course the reverse process is possible which makes it possible to go back and forth before the two languages. If the translation process is said to be lossless, after many translation cycles, the signal will be the same as the original one. If we lose some elements in the translation process but keep the overall "meaning" of the original signal, the process is said to be lossy. These terms will are fully explained in the lesson on Image Compression.
DCTs can be seen as tool that makes it possible to translate an input signal (a series of discrete samples) into cosine functions. What does it mean ? It means that we will be using the cosine trigonometric function to represent or encode if you prefer the input signal. This technique comes from the discovery in the early 19th century that a function (our signal) can be represented as a series of sinusoids. This technique belongs to a class of transforms known as the Fourier series (named after the French mathematician Joseph Fourier who developed the idea). If you take a curve, you can represent this curves as a series of sinusoid functions (cosine and sine). The DCT is a subclass of the Fourier series in the sense that instead of using both the cosine and sine functions to transform a signal, it makes only use of the cosine function. The subclass using both sine and cosine to encode a signal is know as the Fourier Transform (DFT) and if you use the sine function only it would be a Sine Transform (DST). DFTs are very useful in CG and image processing and we will talk about them in detail int the next lesson. Don't worry if that idea of representing a signal as a series of cosine function seems an abstract concept to you now. We have the entire lesson to explain it.
The reason why DCTs are explained before DFTs is because they are the simplest type of transform from a mathematical standpoint (they only make use of the cosine function). Despite the relative simplicity of the technique, it is widely used especially in the field of image and audio processing particularly for compression (you will understand why soon). DCTs are like a magician trick. Playing with them is quite mesmerising and should convince many of you that mathematics can be fun (if maths were taught in a playful way, as we shall do in this lesson, maybe students would be much more enthusiastic about studying them). We will first introduce DCTs in an intuitive way. We will be coding a 1D and 2D DCT and will finish the lesson but studying the properties of DCTs, present a more formal mathematical definition of the technique and look at how it can be used in image processing and CG.
Cosine
The first thing we will do before getting any deeper in the lesson is to review what we know about the cosine function. Lets look at the function cos(t). We know the function is a periodic function. The wave repeats over a length of 2π (PI) along the x-axis. One wave is called a period and this period's length (2π) is called the primitive period of the cosine function. If we now graph the function cos(2t) we can see that the period of this function is shorter than before (the length of the wave is π instead of 2π). We can also see that the cosine wave appears twice as frequently on the graph. The frequency of the function is defined by the number of waves that appears on the graph over a certain range of values along the x axis. The more waves the higher the frequency of the function. If there's only a few waves or even only a fraction of a wave, we say that the function has a low frequency. As the frequency of the function increases, its period shortens. When the frequency decreases the length of the wave (its period) increases. The frequency of the function can be changed by multiplying t with some constant (k, y=cos(kt)). When this constant is one, the period of the function is 2π (primitive period). When k is lower than 1 the period is greater than 2π and when k is greater than one, the period is lower than 2π.

Figure 1: cos(kt) with k=1 (green curve), k=2 (blue curve) and k=1/2 (red curve). The vertical lines indicates the length of the function's period. Green curve, period=2π, blue curve, period=π, red curve, period=4π.
For the sake of the demonstration we will shift the cosine curve so that the it starts at the origin of the graph. To do so, we can offset the function's input by π/2 (a quarter of the function's primitive period). To shift the function we just need to add this value to the result of the cosine function's input parameter (xt+offset). We will be using the following equation:
$$y=cos(kt+\frac{\pi}{2})$$

Figure 2: we shift the original cosine curve (green) by π/2. The wave of the resulting curve (in red) starts at the origin of the graph.
Imagine we have two points on the graph A and B (where A is the located at the origin of the graph). We want one single cosine wave to pass through these two points (that is, cos(A+PI/2)=cos(B+PI/2)=0). The length between A and B defines the period of the function (period=B-A). We know that the primitive period of the cosine function is 2π, therefore to remap this basic period to our new period defined by the distance between A and B, we will use the following equation:
$$y=cos(\frac{2\pi t}{||AB||}+\frac{\pi}{2})=cos(\pi (\frac{2t}{||AB||}+\frac{1}{2}))$$
Where ||AB|| means the distance from A to B. To convince yourself that this is working, replace AB with 2π (the primitive period) and the equation becomes cos(t+PI/2). As an example, here is what the graph of this function looks like for A=0 and B=4:

Figure 3: we remapped the primitive period of the cosine function so that a wave fits between points A and B.
That will conclude our revision on the cosine function. What you need to remember is that the cosine function has a period (which is defined by the length of a wave). That the primitive period of the function is 2π but that you can use the above equation to create a cosine function with a period of arbitrary length. We added an offset to this equation so that the wave always starts at the origin of the graph (cos(0)=0) independently from its period. In the next chapter, we will be using this equation to build a one dimensional DCT (a DCT that we can use for a 1D signal such as an audio signal or a curve representing the evolution of a certain value over time). Finally it's important to remember that phase, frequency and amplitude are the three parameters you can use to change/control the shape of a cosine (sinusoids in general) function.
Chapter 1 of 4 Next Chapter »