
In the previous chapter, we mentioned that the rendering process can be looked at as a two-step process:

- projecting 3D shapes onto the surface of a canvas and determining which parts of these surfaces are visible from a given point of view,
- simulating the way light propagates through space, which, combined with a description of the way light interacts with the materials objects are made of, gives these objects their final appearance (their color, their brightness, their texture, etc.).

In this chapter we will only review the first step in more detail, and more precisely explain how each of these problems (projecting the objects' shapes onto the surface of the canvas, and the visibility problem) is typically solved. While many solutions exist, we will only look at the most common ones. This is just an overall presentation: each method will be studied in a separate lesson, and an implementation of these algorithms provided (as a self-contained C++ program).

## Going from 3D to 2D: the Projection Matrix

An image is just a representation of a 3D scene on a flat surface: the surface of a canvas or the screen. As explained in the previous chapter, to create an image that looks like reality to our brain, we need to simulate the way an image of the world is formed in our eyes. The principle is quite simple. We just need to extend lines from the objects' corners towards the eye and find the intersection of these lines with a flat surface perpendicular to the line of sight. By connecting these points to each other to re-form the objects' edges, we get a **wireframe** representation of the scene.

One of the most important visual properties of this sort of projection is that an object gets smaller as it moves further away from the eye (the rear edges of a box are smaller than the front edges). This effect is called **foreshortening**.

There are two important things to note about this type of projection. First, the eye is in the centre of the canvas. In other words, the line of sight always passes through the middle of the image (figure 2). Note also that the size of the canvas itself is something we can change. We can more easily understand the impact of changing the size of the canvas if we draw the viewing frustum (figure 3). The **frustum** is the pyramid defined by tracing lines from each corner of the canvas towards the eye, and extending these lines further down into the scene (as far as the eye can see). It is also referred to as the viewing frustum or viewing volume. You can easily see that the only objects visible to the camera are those contained within the volume of that pyramid. By changing the size of the canvas, we can either extend that volume or make it smaller. The larger the volume, the more of the scene we see. If you are familiar with the concept of focal length in photography, you will have recognised that this has the same effect as changing the focal length of photographic lenses. Another way of saying this is that by changing the size of the canvas, we change the field of view.

Something interesting happens when the canvas becomes infinitesimally small: the lines forming the frustum end up parallel to each other (they are orthogonal to the canvas). This is of course impossible in reality, but not in the virtual world of a computer. In this particular case, you get what we call an **orthographic projection**. It's important to note that orthographic projection is a form of perspective projection, only one in which the size of the canvas is virtually zero. This has the effect of cancelling out the **foreshortening effect**: the sizes of the edges of objects are preserved when projected to the screen.

Geometrically, computing the intersection point of these lines with the screen is incredibly simple. If you look at the adjacent figure (where P is the point being projected onto the canvas, and P' is the projected point), you can see that the angles \(\angle ABC\) and \(\angle AB'C'\) are the same. A is defined as the eye, AB the distance of the point P along the z-axis (P's z-coordinate), and BC the distance of the point P along the y-axis (P's y-coordinate). B'C' is the y-coordinate of P', and AB' the z-coordinate of P' (and also the distance from the eye to the canvas). When two triangles have the same angles, we say that they are **similar**. Similar triangles have an interesting property: the ratio of the lengths of their corresponding sides is constant. Based on this property, we can write that:
$${ BC \over AB } = { B'C' \over AB' }$$

If we assume that the canvas is located 1 unit away from the eye (in other words, that AB' equals 1; this is purely a convention to simplify this demonstration), and if we substitute AB, BC, AB' and B'C' with the respective points' coordinates, we get: $${ BC \over AB } = { B'C' \over 1 } \rightarrow P'.y = { P.y \over P.z }.$$

In other words, to find the y-coordinate of the projected point, you simply need to divide the point's y-coordinate by its z-coordinate. The same principle can be used to compute the x-coordinate of P':

$$ P'.x = { P.x \over P.z }.$$

This is a simple yet extremely important relationship in computer graphics, known as the **perspective divide** or **z-divide** (if you were stranded on a desert island and needed to remember one thing about computer graphics, it would probably be this equation).

In computer graphics, we generally perform this operation using what we call a **perspective projection matrix**. As its name indicates, it's a matrix which, when applied to points, projects them onto the screen. In the next lesson, we will explain step by step how and why this matrix works, and learn how to build and use it.

But wait! Whether you need the perspective projection at all depends on the technique you use to solve the visibility problem. Anticipating what we will learn in the second part of this chapter, algorithms for solving the visibility problem come in two main categories:

- Rasterisation,
- Ray-tracing.

Algorithms of the first category rely on projecting P onto the screen to compute P'. For these algorithms, the perspective projection matrix is therefore needed. In ray tracing, rather than projecting the geometry onto the screen, we trace a ray passing through P' and look for P. With this approach we obviously don't need to project P anymore, since we already know P', which means that in ray tracing the perspective projection is technically not needed (and therefore never used).

The advantage of the rasterisation approach over ray tracing is mainly speed. Computing the intersection of rays with geometry is a computationally expensive operation. This intersection time also grows linearly with the amount of geometry contained in the scene, as we will see in one of the next lessons. On the other hand, the projection process is incredibly simple, relies on basic math operations (multiplications, divisions, etc.), and can be aggressively optimised (especially if special hardware is designed for this purpose, which is the case with GPUs). Graphics cards almost all use an algorithm based on the rasterisation approach (which is one of the reasons they can render 3D scenes so quickly, at interactive frame rates). When real-time rendering APIs such as OpenGL or DirectX are used, the projection matrix needs to be dealt with. Even if you are only interested in ray tracing, you should know about it for historical reasons at least: it is one of the most important techniques in rendering and the most commonly used technique for producing real-time 3D computer graphics. Plus, it is likely that at some point you will have to deal with the GPU anyway, and real-time rendering APIs do not compute this matrix for you. You will have to do it yourself.

The next three lessons are devoted to studying the construction of the orthographic and perspective projection matrices, and how to use them in OpenGL to display images and 3D geometry.