Rendering an Image of a 3D Scene: an Overview

An image of a 3D scene can be generated in multiple ways, but of course, any way you choose should produce the same image for any given scene. In most cases, the goal of rendering is to create a photorealistic image (non-photorealistic rendering, or NPR, is also possible). But what does that mean, and how can it be achieved? Photorealistic essentially means that we need to create an image so "real" that it looks like a photograph, or (if photography didn't exist) that it would look like reality to our eyes, like the reflection of the world off the surface of a mirror. How do we do that? By understanding the laws of physics that make objects appear the way they do, and simulating these laws on the computer. In other words, rendering is nothing less than simulating the laws of physics responsible for making up the world we live in, as it appears to us. Many laws contribute to making up this world, but fewer contribute to the way it looks. For example, gravity, which plays a role in making objects fall (gravity is used in solid body simulation), has little to do with the way an orange looks. Thus, in rendering, we will be interested in what makes objects look the way they do, which is essentially the result of the way light propagates through space and interacts with objects (or, more precisely, matter). This is exactly what we will be simulating.

Perspective Projection and the Visibility Problem

But first, we need to understand and reproduce the way objects look to our eyes: not so much in terms of their appearance, but in terms of their shape and their size with respect to their distance from the eye. The human eye is an optical system that converges light rays (light reflected from objects) to a focal point.

Figure 1: the human eye is an optical system that converges light rays (light reflected from objects) to a focal point. As a result, by geometric construction, objects that are farther away from our eyes appear smaller than those that are close.

As a result, by geometric construction, objects that are farther away from our eyes appear smaller than those that are close (assuming all objects have the same size). Or, to say it differently, an object appears smaller as we move away from it. Again, this is purely the result of the way our eyes are designed. But because we are accustomed to seeing the world that way, it makes sense to produce images that have the same effect: something called the foreshortening effect. Cameras and photographic lenses were designed to produce images of that sort. More than simulating the laws of physics, photorealistic rendering is also about simulating the way our visual system works. We need to produce images of the world on a flat surface, similar to the way images are created in our eyes (which is mostly the result of the way our eyes are designed; we are not too sure about how it works in the brain, but this is not really important for us).

How do we do that? A basic method consists of tracing lines from the corners of objects to the eye and finding the intersection of these lines with the surface of an imaginary canvas (a flat surface on which the image will be drawn, such as a sheet of paper or the surface of the screen) perpendicular to the line of sight (figure 2).

Figure 2: to create an image of the box, we trace lines from the corners of the object to the eye. We then connect the points where these lines intersect an imaginary plane (the canvas) to recreate the edges of the cube. This is an example of perspective projection.

These intersection points can then be connected to each other to recreate the edges of the objects. The process by which a 3D point is projected onto the surface of the canvas (the process we just described) is called perspective projection. Figure 3 shows what a box looks like when this technique is used to "trace" an image of that object on a flat surface (the canvas).
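The idea can be sketched in a few lines of C++. This is a minimal sketch under assumed conventions (eye at the origin, canvas placed at distance 1 along the z-axis; the type and function names are ours, for illustration only): projecting a point amounts to dividing its x- and y-coordinates by its z-coordinate, which is exactly what produces the foreshortening effect.

```cpp
#include <cassert>
#include <cmath>

// Minimal point types for this sketch (hypothetical names).
struct Vec3 { float x, y, z; };
struct Vec2 { float x, y; };

// Project a point expressed in camera space onto a canvas at z = 1,
// assuming the eye sits at the origin and looks down the positive z-axis.
// Dividing x and y by z (the distance from the eye) is what makes
// distant objects project smaller than near ones.
Vec2 project(const Vec3& p)
{
    return { p.x / p.z, p.y / p.z };
}
```

For instance, a point twice as far from the eye projects to coordinates half as large, which is the foreshortening effect described above.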

Figure 3: image of a cube created using perspective projection.

This sort of rendering in computer graphics is called a wireframe because only the edges of the objects are drawn. This image, though, is not photoreal. If the box were opaque, the front faces of the box (at most three of them) should occlude or hide the rear ones, which is clearly not the case in this image (and if more objects were in the scene, they would potentially occlude each other). Thus, one of the problems we need to figure out in rendering is not only how to project the geometry onto the canvas, but also how to determine which parts of the geometry are visible and which are hidden, something known as the visibility problem (determining which surfaces, and parts of surfaces, are not visible from a certain viewpoint). This process is known in computer graphics under many names: hidden surface elimination, hidden surface determination (also known as hidden surface removal), occlusion culling, and visible surface determination. Why so many names? Because this was one of the first major problems in rendering, and for this reason, a lot of research was done in this area in the early days of computer graphics (and a lot of different names were given to the different algorithms that resulted from this research). Because it requires finding out whether a given surface is hidden or visible, you can look at the problem in two different ways: do I design an algorithm that looks for hidden surfaces (and removes them), or do I design one in which I focus on finding the visible ones? Of course, both should produce the same image in the end, but they can lead to different algorithms (one of which might be better than the others).

The visibility problem can be solved in many different ways, but solutions generally fall within two main categories. In historical-chronological order:

  • Rasterization
  • Ray tracing

Rasterization is not a common name, but for those of you who are already familiar with hidden surface elimination algorithms, this category includes the z-buffer and painter's algorithms, among others. Almost all graphics cards (GPUs) use an algorithm from this category (likely z-buffering). Both methods will be detailed in the next chapter.
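To give a feel for the z-buffer idea mentioned above, here is a minimal sketch (the class layout and names are ours, not how a real GPU implements it): every pixel stores the depth of the nearest surface seen so far, and a new fragment only overwrites the pixel if it is closer to the eye than what is already there.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// A toy depth buffer: one depth value and one "color" (here, an object id)
// per pixel. Pixels start at infinite depth, meaning nothing is visible yet.
struct ZBuffer
{
    int width, height;
    std::vector<float> depth;
    std::vector<int> color;

    ZBuffer(int w, int h)
        : width(w), height(h),
          depth(w * h, std::numeric_limits<float>::max()),
          color(w * h, 0) {}

    // Depth test: the fragment wins (is visible) only if it is closer
    // than whatever was previously stored at this pixel.
    bool write(int x, int y, float z, int objectId)
    {
        int i = y * width + x;
        if (z >= depth[i]) return false; // hidden behind a closer surface
        depth[i] = z;
        color[i] = objectId;
        return true;
    }
};
```

Note that fragments can arrive in any order: a far surface drawn first is simply overwritten later by a nearer one, which is why rasterizers don't need to sort objects the way the painter's algorithm does.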


Even though we haven't really explained how the visibility problem can be solved, let's assume for now that we know how to flatten a 3D scene onto a flat surface (using perspective projection) and determine which parts of the geometry are visible from a certain viewpoint. This is a big step towards generating a photorealistic image, but what else do we need? Objects are not only defined by their shape but also by their appearance (this time not in terms of how big they appear on the screen, but in terms of their look: their color, their texture, how bright they are). Furthermore, objects are only visible to the human eye because light bounces off their surface. How can we define what the appearance of an object is? The appearance of an object can be defined as the way the material this object is made of interacts with light. Light is emitted by light sources (such as the sun, a light bulb, the flame of a candle, etc.) and travels in a straight line. When it comes in contact with an object, two things might happen to it: it can either be absorbed by the object, or it can be reflected back into the environment. When light is reflected off the surface of an object, it keeps traveling (potentially in a different direction than the one it came from) until it either comes in contact with another object (in which case the process repeats: light is either absorbed or reflected) or reaches our eyes (when it does, the photoreceptors the surface of the eye is made of convert the light into an electrical signal that is sent to the brain).

Figure 4: an object appears yellow under white light because it absorbs most of the blue light and reflects green and red light which combined together form a yellow color.

In CG, we generally won't try to simulate the way light interacts with atoms, but the way it behaves at the object level. However, things are not that simple. While the maths involved in computing the new direction of a tennis ball bouncing off the surface of an object are simple, the problem is that surfaces at the microscopic level (not the atomic level) are generally not flat at all, which causes light to bounce in all sorts of (in some cases, almost random) directions. From the distance at which we generally look at common objects (a car, a pillow, a fruit), we don't see the microscopic structure of these objects, although it has a considerable impact on the way they reflect light and thus the way they look. However, we are obviously not going to represent objects at the microscopic level, for obvious reasons (the amount of geometry needed would simply not fit within the memory of any conventional, or for that matter non-conventional, computer). What do we do then? The solution to this problem is to come up with another mathematical model for simulating the way light interacts with any given material at the microscopic level. This, in short, is the role played by what we call a shader in computer graphics. A shader is an implementation of a mathematical model designed to simulate the way light interacts with matter at the microscopic level.
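As a concrete example of such a mathematical model (one not named in the text above, but probably the simplest one there is), here is a sketch of the Lambertian, or diffuse, model: instead of tracking the microscopic roughness that scatters light everywhere, it assumes statistically uniform scattering, so the amount of light reflected only depends on the cosine of the angle between the surface normal and the light direction, scaled by the surface color (albedo). All names here are ours, for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// A minimal "shader": the Lambertian (diffuse) reflection model.
// normal and lightDir are assumed to be unit vectors; albedo is the
// fraction of light the surface reflects (its "color" in grayscale).
// The max() clamps surfaces facing away from the light to zero.
float diffuse(const Vec3& normal, const Vec3& lightDir, float albedo)
{
    return albedo * std::max(0.0f, dot(normal, lightDir));
}
```

A real shader evaluates a model like this one (or a far more sophisticated one) at every visible point of a surface; the point is that a small closed-form formula stands in for microscopic detail we could never store explicitly.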

Light Transport

Rendering is mostly about simulating the way light travels in space. Light is emitted from light sources, is reflected off the surface of objects, and some of that light eventually reaches our eyes. This is how and why we see the objects around us. As mentioned in the introduction to ray tracing, it is not very efficient to follow the path of light from a light source to the eye. When a photon hits an object, we do not know which direction this photon will take after it has been reflected off the surface of the object. It might travel towards the eye, but since the eye is itself very small, it is far more likely to miss it. While it's not impossible to write a program in which we simulate the transport of light as it occurs in nature (this method is called forward tracing), it is, as mentioned before, never done in practice because of its inefficiency.

Figure 5: in the real world, light travels from light sources (the sun, light bulbs, the flame of a candle, etc.) to the eye. This is called forward tracing (left). However, in computer graphics and rendering, it's more efficient to simulate the path of light the other way around, from the eye, to the object, to the light source. This is called backward tracing.

A much more efficient solution is to follow the path of light the other way around, from the eye to the light source. Because we follow the natural path of light backward, we call this approach backward tracing.
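In backward tracing, the starting point is fixed: one ray per pixel, cast from the eye through the canvas. A minimal sketch of that first step (under assumed conventions: eye at the origin, a square canvas spanning [-1, 1] in x and y at z = 1; names are ours):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Build the direction of the primary ray for pixel (x, y) of a
// width x height image. Backward tracing shoots this ray from the eye
// (at the origin) into the scene and then asks what it hits, instead of
// waiting for a photon from a light source to happen to reach the eye.
Vec3 primaryRayDirection(int x, int y, int width, int height)
{
    // Sample the pixel center, remap to [-1, 1], and flip y so the
    // top row of the image maps to +y in camera space.
    float px = (2.0f * (x + 0.5f) / width) - 1.0f;
    float py = 1.0f - (2.0f * (y + 0.5f) / height);
    Vec3 dir = { px, py, 1.0f };
    // Normalize the direction to unit length.
    float len = std::sqrt(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
    return { dir.x / len, dir.y / len, dir.z / len };
}
```

Once the ray hits a surface, a renderer shades that point and, if needed, traces further rays toward the lights; that is the part the light transport algorithms discussed below take care of.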

Both terms are sometimes swapped in the CG literature. Almost all renderers follow light from the eye to the emission source. Because in CG this is the 'default' implementation, some people prefer to call this method forward tracing. However, in Scratchapixel, we will use forward tracing when light travels from the source to the eye, and backward tracing when we follow its path the other way around.

The main point here is that rendering is, for the most part, about simulating the way light propagates through space. This is not a simple problem: not because we don't understand it well, but because if we were to simulate what truly happens in nature, there would be so many photons (or light particles) to follow the paths of that it would take a very long time to get an image. Thus, in practice, we follow the paths of very few photons instead, just to keep the render time down, but obviously the final image is not as accurate as it would be if the paths of all photons were simulated. Finding a good tradeoff between photorealism and render time is really the crux of rendering. In rendering, a light transport algorithm is an algorithm designed to simulate the way light travels in space in order to produce an image of a 3D scene that matches "reality" as closely as possible.

When light bounces off a diffuse surface and illuminates other objects around it, we call this effect indirect diffuse. Light can also be reflected off the surface of shiny objects, creating caustics (the disco ball effect). Unfortunately, it is very hard to come up with an algorithm capable of simulating all these effects at once (using a single light transport algorithm to simulate them all). In practice, it is often necessary to simulate these effects independently.

Light transport is central to rendering and is a very large field of research.


In this chapter, we learned that rendering can essentially be seen as a two-step process:

  • projecting the shapes of the objects onto the surface of the canvas and determining which parts of these shapes are visible from a given viewpoint (the visibility problem),
  • simulating the way light propagates through the scene and interacts with objects to give them their final appearance (light transport and shading).

Have you ever heard the term graphics or rendering pipeline? The term is most often used in the context of real-time rendering APIs (such as OpenGL, DirectX or Metal). The rendering process, as explained in this chapter, can be decomposed into at least two steps: visibility and shading. Each of these steps, though, can be decomposed into smaller steps or stages (which is the more commonly used term). Stages are generally executed in sequential order (the input of any given stage generally depends on the output of the preceding stage). This sequence of stages forms what we call the rendering pipeline.

It is really important that you always keep this distinction in mind. When you study a particular technique, always try to think about whether it relates to one or the other. Most lessons from this section (and the advanced rendering section) fall within one of these categories:

Projection/Visibility Problem
  • Perspective Projection Matrix
  • Rays and Cameras
  • Rendering a Triangle with Ray Tracing
  • Rendering Simple Shapes with Ray Tracing
  • Rendering a Mesh Using Ray Tracing
  • Transform Objects using Matrices
  • Rendering the Utah Teapot
  • The REYES algorithm: an Example of Rasterisation

Light Transport/Shading
  • The Rendering Equation
  • Example of a Light Transport Algorithm: Path Tracing
  • Area Lights
  • Shaders and BRDFs
  • Texturing
  • (Motion Blur)
  • (Depth of Field)

We will briefly detail both steps in the next chapters.