*Keywords: 3D, foreshortening, stereoscopic vision, origin, coordinates, coordinate system, 3D scene, topology, model, mesh, polygon, vertices, edges, perspective projection, viewing frustum, perspective divide, similar triangles, screen space, normalize.*Want to report bugs, errors or send feedback? Write at feedback@scratchapixel.com. We do this for free, please be kind. Thank you.

## Understand how it works!

If you are here, it's probably because you want to learn computer graphics. Each reader may have a different reason for being here, but we are all driven by the same desire: understand how it works! Scratchapixel was created to answer this particular question. Here you will learn how it works and learn techniques used to created CGI, from the simplest and most important methods, to the more complicated and less common ones. Maybe you like video games, and you would like to know how it works, how they are made. Maybe you have seen a film a Pixar film and wonder what's the magic behind it. Whether you are at school, university, already working in the industry (or retired), it is never a bad time to be interested in these topics, to learn or improve your knowledge and we always need a resource like Scratchapixel to find answers to these questions. That's why we are here.

Scratchapixel is accessible to all. They are lessons for all levels. Of course it requires a minimum of knowledge in programming. While we plan to write a quick introductory lesson on programming in the near future, Scratchapixel's mission isn't about teaching you programming. However, while you will be learning about implementing different techniques used for producing CGI, you will also likely improve your programming skills in the process and learn a few programming tricks. Whether you consider yourself a beginner or an expert in programming, you will find here all sort of lessons adapted to your level. Start simple, with basic programs and progress from there. While we are the topic of complexity, if you look at the list of lessons for each category, you will see a math label followed by a series of pluses (the sign "+"). The more pluses, the more difficult is the lessons in terms of math (three pluses being the maximum).

## A gentle introduction to computer graphics programming

You want to learn computer graphics. First, do you know what it is? In the second lesson of this section, you can find a definition of computer graphics, and also learn about how it generally works. Maybe you have heard about terms such as modeling, geometry, animation, 3D, 2D, digital images, 3D viewport, real-time rendering, and compositing but you are unsure about what they mean and more importantly, how they relate to each other. The second lesson of this section will answer these questions. From there, you should know little about CG programming, but have a general understanding of CG and the different tools and processes involved in the making of CGI.

What's next? Our world is fundamentally **three-dimensional**. At least as far as we can experience it with our senses. To which some people like to add the dimension of **time**. Time plays an important role in CGI, but we will come back to this later on. Objects from the real world then are three-dimensional. That's a fact we think we can all agree on without having to get into proving it. However, what's interesting, is that **vision**, one of the senses by which this three-dimensional world can be experienced, is principally a two-dimensional process. We could maybe say that the image created in our mind is dimensionless (we don't understand yet very well how images 'appear' in our brain), but when we speak of an image, it generally means to us a flat surface, on which the dimensionality of objects has been reduced from three to two dimensions (the surface of the canvas or the surface of the screen). The only reason why this image on the canvas looks accurate to our brain is because objects get smaller as they get further away from where you stand, an effect called **foreshortening**. If you are not convinced yet, think of an image as nothing more than a mirror reflection. The surface of the mirror is perfectly flat, and yet, we can't make the difference between looking at the image of a scene reflected from a mirror and looking directly at the scene: you don't perceive the reflection, just the object. It's only because we have two eyes that we can get a sense of seeing things in 3D, something we call **stereoscopic vision**. Each eye looks at the same scene from a slightly different angle, and the brain can use these two images of the same scene to approximate the distance and the position of objects in 3D space with respect to each other. However stereoscopic vision is quite limited in a way as we can't measure the distance to objects or their size very accurately (which computers can do). Human vision is quite sophisticated and an impressive result of evolution, but it's nonetheless a trick, and it can be fooled easily (many magicians' tricks are based on this). To some extent, computer graphics is a mean by which we can create images of artificial worlds and present them to the brain (through the mean of vision), as an experience of reality (something we call photo-realism), exactly like a mirror reflection. This theme is quite common in science fiction, but technology is not far from making this possible.

What have we learned so far? That the world is three-dimensional, that the way we look at it is two-dimensional, and that if you can replicate the shape and the appearance of objects, the brain can not make the difference between looking at these objects directly, and looking at an image of these objects. Computer graphics is not limited to creating photoreal images but while it's easier to create non-photo-realistic images than to create perfectly photo-realistic ones, the goal of computer graphics is realism (as much in the way things move than they appear).

All we need to do now, is learn what the rules for making such a photo-real image are, and that's what you will also learn here on Scratchapixel.

## Describing objects populating the virtual world

The difference between the painter who is painting a real scene (unless the subject of the painting comes from his/her imagination), and us, trying to create an image with a computer, is that we have to first somehow describe the shape (and the appearance) of objects making up the scene we want to render an image of to the computer.

One of the simplest and most important concepts we learn at school is the idea of space in which points can be defined. The position of a point is generally defined in relation to an **origin**. On a ruler, this is generally the tick marked with the number zero. If we use two rulers, one perpendicular to the other, we can define the position of points in two dimensions. Add a third ruler, perpendicular to the first two, and you can define the position of points in three dimensions. The actual numbers representing the position of the point with respect to one of the tree rulers are called the points **coordinates**. We are all familiar with the concept of coordinates to mark where we are with respect to some reference point or line (for example the Greenwich meridian). We can now define points in three dimensions. Let's imagine that you just bought a computer. This computer probably came in a box, and this box has eight corners (sorry for stating the obvious). One way of describing this box is to measure the distance of these 8 corners with respect to one of the corners. This corner acts as the origin of our **coordinate system** and the distance of this reference corner with respect to itself will be 0 in all dimensions. However, the distance from the reference corner to the other seven corners will be different than 0. Let's imagine that our box has the following dimensions:

The first number represents the width, the second number the height, and the third number the corner's depth. Corner 1 as you can see, is the origin from which all the over corners have been measured. All you need to do from here is somehow write a program in which you will define the concept of a three-dimensional point, and use it to store the coordinates of the eight points you just measured. In C/C++, such a program could look like this:

Like in any language, there are always different ways of doing the same thing. This program shows one possible way in C/C++ for defining the concept of point (line 1) and storing the box corners in memory (in this example as an array of eight points)

You have somehow created your first 3D program. It doesn't produce an image yet, but you can already store the description of a 3D object in memory. In CG, the collection of these objects is called a **scene** (a scene also includes the concept of camera and lights but we will talk about this another time). As suggested before, we are lacking two very important things to make the process complete and interesting. First to represent the box in the memory of the computer, ideally, we also need a system that defines how these eight points are connected to make up the faces of the box. In CG, this is called the **topology** of the object (an object is also called a **model**). We will talk about this in the Geometry section and the 3D Basic Render section (in the lesson on rendering triangles and polygonal meshes). Topology refers to how points which we generally call **vertices** are connected to form faces (or flat surfaces). These faces are also called **polygons**. The box would be made of six faces or six polygons and the set of polygons forms what we call a **polygonal mesh** or simply a **mesh**. The second thing we are missing is a system to create an image of that box. This requires to project the corners of the box onto an imaginary canvas, a process we call **perspective projection**.

## Creating an image of this virtual world

The process of projecting a 3D point of the surface of the canvas, actually involves a special matrix called the perspective matrix (don't worry if you don't know what a matrix is). Using this matrix to project point is not necessary but makes things much easier. However, you don't need mathematics and matrices to figure out how it works. You can see an image, or a canvas as some sort of flat surface placed some distance away from the eye. Trace four lines all starting from the eye to each one of the four corners of the canvas and extend these lines further away into the world (as far as you can see). You get a pyramid which we call a **viewing frustum** (and not frustrum). The viewing frustum defines some sort of volume in 3D space and the canvas itself is just a plane cutting of this volume perpendicular to the eye's line of sight. Place your box in front of the canvas. Next, trace a line from each corner of the box to the eye and mark a dot where the line intersects the canvas. Find out on the canvas, the dots corresponding to each of the twelve edges of the box, and trace a line between these dots. What do you see? An image of the box.

The three rulers used to measure the coordinates of the box corner form what we call a coordinate system. It's a system in which points can be measured to. All points' coordinates relate to this coordinate system. Note that a coordinate can either be positive or negative (or zero) depending on whether it's located on the right or the left of the ruler's origin (the value 0). In CG, this coordinate system is often called the **world** coordinate system, and the point (0,0,0), the **origin**.

Let's move the apex of the viewing frustum at the origin and orient the line of sight (the view direction) along the negative z-axis (figure 3). Many graphics applications use this configuration as their default "viewing system". Keep in mind that the top of the pyramid is the point from which we will be looking at the scene. Let's also move the canvas one unit away from the origin. Finally, let's move the box some distance away from the origin so that it is fully contained within the volume of the frustum. Because the box is in a new position (we moved it), the coordinates of its eight corners changed and we need to measure them again. Note that because the box is on the left side of the ruler's origin from which we measure the object's depth, all depth coordinates also called **z-coordinates** will be negative. Four of the corners are also below the reference point used to measure the object's height and will have a negative height or y-coordinate. Finally, four of the corners will be to the left of the ruler's origin measuring the object's width: their width or x-coordinates will also be negative. The new coordinates of the box's corners are:

Let's look at our setup from the side and trace a line from one of the corners to the origin (the viewpoint). We can define two triangles: ABC and AB'C'. As you can see these two triangles have the same origin (A). They are also somehow copies of each other, in the sense that the angle defined by the edges AB and AC is the same as the angle defined by the edge AB', AC'. Such triangles are said to be **similar**. Similar triangles have an interesting property: the ratio between their adjacent and opposite sides is the same. In other words:

Because the canvas is 1 unit away from the origin, we know that AB' equals 1. We also know the position of B and C which are the z (depth) and y coordinates (height) respectively of the corner. If we substitute these numbers in the above equation, we get:

$${ P.y \over P.z } = { P'.y \over 1}.$$Where y' is the y coordinate of the point where the line going from the corner to the viewpoint intersects the canvas, which is, as we said earlier, the dot from which we can draw an image of the box on the canvas. Thus:

$$P'.y = { P.y \over P.z }.$$As you can see, the projection of the corner's y-coordinate on the canvas, is nothing more than the corner's y-coordinate divided by its depth (the z-coordinate). This is probably one of the simplest and most fundamental relations in computer graphics, known as the z or **perspective divide**. The same principle applies to the x coordinate. The projected point x coordinate (x') is the corner's x coordinate divided by its z coordinate.

Note though that because the z-coordinate of P is negative in our example (we will explain why this is always the case in the lesson from the Foundations of 3D Rendering section dedicated to the perspective projection matrix), when the x-coordinate is positive, the projected point's x-coordinate will become negative (similarly, if P.x is negative, P'.x will become positive. The same problem happens with the y-coordinate). As a result, the image of the 3D object is mirrored both vertically and horizontally, which is not an effect we want. Thus, to avoid this problem, we will divide the P.x and P.y coordinates with -P.z instead; this will preserve the sign of the x and y coordinates. We finally get:

$$ \begin{array}{l} P'.x = { P.x \over -P.z }\\ P'.y = { P.y \over -P.z }. \end{array} $$We now have a method to compute the actual positions of the corners as they appear on the surface of the canvas. These are the two-dimensional coordinates of the points projected on the canvas. Let's update our basic program to compute these coordinates:

The size of the canvas itself is also arbitrary. It can also be a square or a rectangle. In our example, we made it two units wide in both dimensions, which means that the x and y coordinates of any points lying on the canvas are contained in the range -1 to 1 (figure 9).

Question: what happens if any of the projected point coordinates is not in this range if for instance x' equals -1.1? The point is simply not visible, it lies outside the boundary of the canvas.

At this point we say that the projected point coordinates are in **screen space** (the space of the screen, where screen and canvas in this context our synonymous). But they are not easy to manipulate, because they can either be negative or positive, and we don't know what they refer to with respect to for example the dimension of your computer screen (if we want to display these dots on the screen). For this reason, we will first normalize them, which simply means that we convert them from whatever range they are initially in, to the range [0,1]. In our case, because we need to map the coordinates from -1,1 to 0,1 we can simply write:

Coordinates of the projected points are not in the range 0,1. Such coordinates are said to be defined in NDC space, which stands for Normalized Device Coordinates. This is convenient because regardless of the original size of the canvas (or screen), which can be different depending on the settings you used, we now have all points coordinates defined in a common space. The term **normalize** is very common. It means that you somehow remap values from whatever range they were originally in, to the range [0,1]. Finally, we generally prefer to define point coordinates with regard to the dimensions of the final image, which as you may know or not, is defined in terms of pixels. A digital image is nothing else than a two-dimensional array of pixels (as is your computer screen).

A 512x512 image is a digital image having 512 rows of 512 pixels, if you prefer to see it the other way around, 512 columns of 512 vertically aligned pixels. Since our coordinates are already normalized, all we need to do to express them in terms of pixels is to multiply these NDC coordinates by the image dimension (512). Here, our canvas being square, we will also use a square image:

The resulting coordinates are said to be in raster space (XX what does raster mean, please explain). Our program is still limited because it doesn't create an image of the box, but if you compile it and run it with the following commands (copy/paste the code in a file and save it as box.cpp):

You can use a paint program to create an image (set its size to 512x512) and add dots at the pixel coordinates that you computed with the program. Then connect the dots to form the edges of the box, and you will get an actual image of the box (as shown in the video below). Pixel coordinates are integers, so you will need to round off the numbers given by the program.

## What have we learned?

- We first need to describe three-dimensional objects using things such as vertices and topology (information about how these vertices are connected to each other to form polygons or faces) before we can produce an image of the 3D scene (a scene is a collection of objects).
- That rendering is the process by which an image of a 3D scene is created. No matter which technique you use to create 3D models (there are quite a few), rendering is a necessary step to 'see' any 3D virtual world.
- From this simple exercise it should be quite apparent that mathematics (more than programming) is essential in the process of making an image with a computer. The computer is merely a tool used to speed up the computation, but the rules used to create this image are pure mathematics. Geometry plays a particularly important role in this process, particularly to handle objects transformations (scale, rotation, translation) but also provide solutions to problems such as computing angles between lines, or finding out the intersection between a line and other simple shapes (a plane, a sphere, etc.).
- In conclusion, computer graphics is mostly mathematics applied to a computer program whose purpose is to generate an image (photo-real or not) at the quickest possible speed (and the accuracy that computers are capable of).
- Modeling includes all techniques used to create 3D models. Modeling techniques will be discussed in the Geometry/Modeling section.
- While static models are fine, it is also possible to animate them over time. This means that an image of the model at each time step needs to be rendered (you can for instance translate, rotate or scale the box a little between each consecutive image whether by animating the corners' coordinates or applying a transformation matrix to the model). More advanced animation techniques can be used to simulate the deformation of the skin by bones and muscles. But all these techniques have in common that geometry (the faces making up the models) are deformed over time. Hence time, as suggested in the introduction is important in CGI as well. Check the Animation section to learn about this topic.
- One particular field overlaps both animation and modeling. It includes all techniques used to
**simulate**the motion of objects in a realistic manner. A very large field of computer graphics is devoted to simulating the motion of fluids (water, fire, smoke), fabric, hair, etc. The laws of physics are applied to 3D models to make them move, deform or break like they would in the real world. Physics simulations are generally very computationally expensive, but they can also run in real-time (it all depends on the complexity of the scene you simulate). - Rendering is also a computationally expensive task. How expensive depends on how much geometry your scene is made up of and how photo-real you want the final image to be. In rendering, we differentiate two modes, an
**off-line**and a**realtime**rendering mode. Real-time is used (it's a requirement) for video games, in which the content of the 3D scenes needs to be rendered at least 30 frames per second (generally 60 frames a second is considered a standard). The GPU is a processor specially designed to render 3D scenes at the quickest possible speed. Offline rendering is commonly used in the production of CGI for films where real-time is not a requirement (images are precomputed and stored before being displayed at 24 or 30 or 60 fps). It may take from a few seconds to many hours before one single image is complete, but it generally handles far more geometry and produce higher quality images than real-time rendering. However, real-time or offline rendering does tend to overlap more and more these days, with video games pushing the amount of geometry they can handle as well as quality, and offline rendering engines trying to take advantage of the latest advancements in the field of CPU technology to greatly improve their performances.

## Where should I start?

We hope the simple box example got you hooked but the main goal of this introduction is to underline the role that geometry plays in computer graphics. Of course, it's not only about geometry, but a lot of the problems can be solved with geometry. Most computer graphics books start with a chapter on geometry, which is always a bit discouraging because it seems like you need to study a lot before you can get to making fun stuff. However, we really recommend you to read the lesson on Geometry first before anything else. We will talk and learn about points but also about the concept of a vector and normal. We will learn about coordinate systems and more importantly about the concept of matrix. Matrices are used extensively to handle transformations such as rotation, scaling, or translation. These concepts are used everywhere throughout all computer graphics literature which is why you need to study them first.

Many CG books do not provide a very good introduction to geometry may be because the authors assume that readers already know about it or that it's better to read books devoted to this particular topic. Our lesson on geometry is very different. It's very thorough and explains everything in very simple words (including things that only people who worked in production will tell you about). Do start your studies by reading this lesson first.

## What should I read next?

It's generally easier and more fun to start learning computer graphics programming with rendering.