Rendering an Image of a 3D Scene: an Overview

The Visibility Problem

We already explained what the visibility problem is in the previous chapters. To create a photorealistic image, we need to determine which parts of an object are visible from a given viewpoint. The problem is that when we project the corners of a box, for example, and connect the projected points to draw its edges, every face of the box ends up being drawn. In reality, however, only the front faces of the box would be visible, while the rear ones would be hidden.

In computer graphics, this problem is principally solved using one of two methods: ray tracing and rasterisation. We will quickly explain how they work. While it is hard to say whether one method is older than the other, rasterisation was far more popular in the early days of computer graphics. Ray tracing is notoriously more computationally expensive (and uses more memory) than rasterisation, and is therefore far slower. Computers back then were so slow (and had so little memory) that rendering images with ray tracing was not considered a viable option, at least in a production environment (to produce films, for example). For this reason, almost every renderer used rasterisation, while ray tracing was generally limited to research projects.

However, for reasons we will explain in the next chapter, ray tracing is much better than rasterisation when it comes to simulating effects such as reflections, soft shadows, etc. In short, it is easier to create photorealistic images with ray tracing; it just takes longer than rendering geometry with rasterisation, which in turn is less suited than ray tracing to simulating realistic shading and light effects. We will explain why in the next chapter. Real-time rendering APIs and GPUs generally use rasterisation, because in real time, speed is what determines the choice of algorithm.

What was true for ray tracing in the 80s and 90s is no longer true today. Computers are now so powerful that ray tracing is used by probably every off-line renderer today (or at least, they propose a hybrid approach in which both algorithms are implemented). Why? Because, again, it is the easiest way of simulating important effects such as sharp and glossy reflections, soft shadows, etc. As long as speed is not an issue, it is in fact superior in many ways to rasterisation (though making ray tracing work efficiently still requires a lot of work). Pixar's PhotoRealistic RenderMan, the renderer Pixar developed to produce many of its first feature films (Toy Story, A Bug's Life, Finding Nemo), was based on a rasterisation algorithm called REYES (which stands for Renders Everything You Ever Saw). It is widely considered one of the best visible surface determination algorithms ever conceived, and the GPU rendering pipeline has many similarities with it. But Pixar's current renderer, RIS, is now a pure ray tracer. Introducing ray tracing allowed the studio to greatly push the realism and complexity of the images it produced over the years.

Rasterisation to Solve the Visibility Problem: How Does it Work?

We hopefully made the difference between rasterisation and ray tracing clear already (read the previous chapter). Let's nonetheless repeat that we can look at the rasterisation approach as if we were moving a point along a line connecting P, a point on the surface of the geometry, to the eye, until it "lies" on the surface of the canvas. Of course this line is only implicit; we never actually need to construct it, but this is how, intuitively, we can interpret the projection process.

Figure 1: the projection process can be seen as if the point we want to project were moved down along a line connecting the point (or vertex) to the eye. We can stop moving the point along that line when it lies on the plane of the canvas. Obviously we don't "slide" the point along this line explicitly, but this is how the projection process can be interpreted.

Figure 2: several points in the scene may project to the same point on the canvas. The point visible to the camera is the one closest to the eye among all the points aligned along that ray.

Remember that what we need to solve here is the visibility problem. In other words, there might be situations in which several points in the scene, P, P1, P2, etc., project onto the same point P' on the canvas (remember that the canvas is also the surface of the screen). However, the only point actually visible through the camera is the one, among all the points lying on the line connecting the eye to these points, that is closest to the eye, as shown in Figure 2.

To solve the visibility problem, we first need to express P' in terms of its position in the image: what are the coordinates of the pixel P' falls onto? Remember that projecting a point onto the surface of the canvas gives another point P' whose coordinates are real numbers. However, P' also necessarily falls within a given pixel of our final image. So how do we go from expressing P' in terms of its position on the surface of the canvas to defining it in terms of its position in the final image (the coordinates of the pixel P' falls onto)? This involves a simple change of coordinate systems.

Figure 3: computing the coordinates of a point on the canvas in terms of pixel values requires transforming the point's coordinates from screen space to NDC space, and from NDC space to raster space.

Converting points from screen space to raster space is actually really simple. Because the coordinates of P' expressed in raster space can only be positive, we first need to normalise P's original coordinates, in other words convert them from whatever range they are originally in to the range [0, 1] (when points are defined that way, we say they are defined in NDC space, which stands for Normalized Device Coordinates). Once in NDC space, converting the point's coordinates to raster space is trivial: just multiply the normalised coordinates by the image dimensions and round the result down to the nearest integer (pixel coordinates are always whole numbers, or integers if you prefer). The range the coordinates of P' are originally in depends on the size of the canvas in screen space. For the sake of simplicity, we will just assume that the canvas is two units long in each of the two dimensions (width and height), which means that the coordinates of P' in screen space are in the range [-1, 1]. Here is the pseudocode to project P and convert the resulting point P' from screen space to raster space:

int width = 64, height = 64; // dimension of the image in pixels
Vec3f P = Vec3f(-1, 2, 10);
Vec2f P_proj;
P_proj.x = P.x / P.z; // -0.1
P_proj.y = P.y / P.z; // 0.2
// convert from screen space coordinates to normalized coordinates
Vec2f P_proj_nor;
P_proj_nor.x = (P_proj.x + 1) / 2; // (-0.1 + 1) / 2 = 0.45
P_proj_nor.y = (1 - P_proj.y) / 2; // (1 - 0.2) / 2 = 0.4
// finally, convert to raster space
Vec2i P_proj_raster;
P_proj_raster.x = (int)(P_proj_nor.x * width);
P_proj_raster.y = (int)(P_proj_nor.y * height);
if (P_proj_raster.x == width) P_proj_raster.x = width - 1;
if (P_proj_raster.y == height) P_proj_raster.y = height - 1;
This conversion process is explained in detail in the lesson on cameras [link].

There are a few things to notice in this code. First, the original point P and the projected point in screen space and NDC space all use the Vec3f or Vec2f types, in which coordinates are defined as reals (floats). The final point in raster space, however, uses the Vec2i type, in which coordinates are defined as integers (the coordinates of a pixel in the image). Arrays in programming are 0-indexed; therefore, the coordinates of a point in raster space should never be greater than the image width minus one or the image height minus one. This may nonetheless happen when P's coordinates in screen space are exactly 1 in either dimension. The code checks for this case (lines 14-15) and clamps the coordinates to the right range if it happens. Also, the origin of the NDC coordinate system is conventionally located in the lower-left corner of the image, while the origin of the raster coordinate system is located in the upper-left corner (see Figure 3). The y coordinate therefore needs to be flipped somewhere along the way; in this version of the code, the flip is folded into the screen-to-NDC conversion (compare lines 8 and 9).

But why do we need this conversion? To solve the visibility problem, we will use the following method:

  • Project all the points in the scene onto the image plane and convert their coordinates to raster space.
  • For each pixel of the image, keep a list of the points that project onto it, along with their distance to the eye.
  • Once all the points have been processed, sort each pixel's list by distance: the point with the shortest distance to the eye is the point visible through that pixel.

An algorithm based on this approach is called a depth sorting algorithm (a self-explanatory name). The concept of ordering objects (or points on the surfaces of objects) by depth is the basis of all rasterisation algorithms. Quite a few exist, the most famous of which are:

  • the painter's algorithm,
  • the z-buffer (or depth buffer) algorithm.

Keep in mind that while this may sound old-fashioned to you, all graphics cards use an implementation of the z-buffer algorithm to produce images. These algorithms (at least z-buffering) are still commonly used today.

When Opacity Gets in the Way

Why do we need to keep a list of points? Finding the point with the shortest distance to the eye doesn't require storing all the points in a list. Indeed, you could very well do the following (a minimal code sketch follows the list):

  • For each pixel in the image, set the variable z to infinity.
  • For each point in the scene:
    • Project the point and compute its raster coordinates.
    • If the distance z' from the current point to the eye is smaller than the distance z stored in the pixel the point projects to, update z with z'. If z' is greater than z, the point is located further away than the point currently stored for that pixel.
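
Here is a minimal sketch in C++ of this simpler version of the algorithm, reusing the screen-to-raster conversion shown earlier. The function name and the idea of rendering a bare list of points are only illustrative; a real rasteriser works on triangles, not isolated points.

#include <limits>
#include <vector>

struct Vec3f { float x, y, z; };

// Minimal sketch: for each pixel we only keep the depth of the closest point
// seen so far, instead of a full list of candidate points.
void findVisiblePoints(const std::vector<Vec3f>& points, int width, int height)
{
    // one depth value per pixel, initialised to infinity
    std::vector<float> zbuffer(width * height, std::numeric_limits<float>::infinity());
    for (const Vec3f& P : points) {
        // project the point onto the canvas, then convert to NDC and raster space
        float xproj = P.x / P.z;
        float yproj = P.y / P.z;
        int xraster = (int)((xproj + 1) / 2 * width);
        int yraster = (int)((1 - yproj) / 2 * height);
        // ignore points that project outside the canvas
        if (xraster < 0 || xraster >= width || yraster < 0 || yraster >= height) continue;
        // if this point is closer than the one currently stored, keep its depth
        float& z = zbuffer[yraster * width + xraster];
        if (P.z < z) z = P.z;
    }
}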

You can see that you can get the same result without having to store a list of visible points and sort it at the end. So why did we use one? We used one because, in our example, we assumed that all the points in the scene were opaque. But what happens if they are not fully opaque? Clearly, if several semi-transparent points project onto the same pixel, they may be visible through each other. In this particular case, it is necessary to keep track of all the points visible through that pixel, sort them by distance, and use a special compositing technique (we will learn about this in the lesson on the REYES algorithm) to blend them correctly.
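
To illustrate the idea (and only as a sketch; the exact compositing technique is covered in the lesson on the REYES algorithm), here is how the semi-transparent points stored for one pixel could be sorted by distance and blended front to back with the classic "over" operator. The Sample structure and function name are made up for this example.

#include <algorithm>
#include <vector>

// Hypothetical per-pixel sample: colour, opacity (alpha) and distance to the eye.
struct Sample { float r, g, b, alpha, z; };

// Sort the samples that project onto one pixel from front to back and blend
// them with the "over" operator. Only an illustration of the principle.
void compositePixel(std::vector<Sample>& samples, float pixel[3])
{
    std::sort(samples.begin(), samples.end(),
              [](const Sample& a, const Sample& b) { return a.z < b.z; });
    float transmitted = 1; // fraction of light not yet blocked by closer samples
    pixel[0] = pixel[1] = pixel[2] = 0;
    for (const Sample& s : samples) {
        pixel[0] += transmitted * s.alpha * s.r;
        pixel[1] += transmitted * s.alpha * s.g;
        pixel[2] += transmitted * s.alpha * s.b;
        transmitted *= (1 - s.alpha); // each sample lets (1 - alpha) of the light through
    }
}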

Ray Tracing to Solve the Visibility Problem: How Does it Work?

Figure 4: in ray tracing, we explicitly trace rays from the eye down into the scene. If the ray intersects some geometry, the pixel the ray passes through takes the color of the intersected object.

With rasterisation, points are projected onto the screen to find their respective positions on the image plane. But we can look at the problem the other way around. Rather than going from the point to the pixel, we can start from the pixel and convert it into a point on the image plane (we take the centre of the pixel and convert its coordinates, defined in raster space, to screen space). This gives us P'. We can then trace a ray starting from the eye, passing through P', and extend it down into the scene (by default, we will assume that P' is the centre of the pixel). If we find that the ray intersects an object, then we know that the point of intersection P is the point visible through that pixel. In short, ray tracing is a method for solving the point visibility problem by means of explicitly tracing rays from the eye down into the scene.
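
As a sketch, here is how the direction of such a primary ray can be computed for a given pixel, by converting the pixel's centre from raster space back to screen space (the reverse of the conversion shown earlier). We assume the same conventions as before: the eye at the origin, a canvas two units wide lying one unit away from the eye; the function name is our own.

#include <cmath>

struct Vec3f { float x, y, z; };

// Direction of the ray going from the eye (at the origin) through the centre
// of pixel (x, y). The canvas is assumed to be 2 units wide and to lie 1 unit
// away from the eye, matching the projection code used earlier.
Vec3f primaryRayDirection(int x, int y, int width, int height)
{
    // raster space -> NDC space (the +0.5 selects the centre of the pixel)
    float ndcX = (x + 0.5f) / width;
    float ndcY = (y + 0.5f) / height;
    // NDC space -> screen space (undo the y flip used in the raster conversion)
    float screenX = 2 * ndcX - 1;
    float screenY = 1 - 2 * ndcY;
    // vector from the eye to the point on the canvas, normalised
    Vec3f dir = { screenX, screenY, 1 };
    float len = std::sqrt(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
    dir.x /= len; dir.y /= len; dir.z /= len;
    return dir;
}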

Note that, in a way, ray tracing and rasterisation are a reflection of each other. They are based on the same principle, but ray tracing goes from the eye to the object, while rasterisation goes from the object to the eye. While both make it possible to find which point is visible for any given pixel in the image (they give the exact same result in that respect), implementing them requires solving very different problems. Ray tracing is more complicated in a way, because it requires solving the ray-geometry intersection problem. Do we even have a way of finding the intersection of a ray with geometry? While it might be possible to find a way of computing whether or not a ray intersects a sphere, can we find a similar method to compute the intersection of a ray with a cone, for instance? And what about other shapes, such as NURBS, subdivision surfaces and implicit surfaces? As you can see, ray tracing can be used as long as a technique exists to compute the intersection of a ray with any type of geometry a scene might contain (or your renderer might support).
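
For the sphere at least, the answer is yes. As an illustration, here is a sketch of the classic geometric ray-sphere intersection test (the structures and the function name are our own for this example):

#include <cmath>

struct Vec3f { float x, y, z; };

float dot(const Vec3f& a, const Vec3f& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Return true if the ray O + t*D (D normalised, t >= 0) hits the sphere of
// centre C and radius r, and store the distance to the nearest hit in t.
bool intersectSphere(const Vec3f& O, const Vec3f& D, const Vec3f& C, float r, float& t)
{
    Vec3f L = { C.x - O.x, C.y - O.y, C.z - O.z }; // ray origin to sphere centre
    float tca = dot(L, D);                         // projection of L onto the ray direction
    float d2 = dot(L, L) - tca * tca;              // squared distance from the centre to the ray
    if (d2 > r * r) return false;                  // the ray misses the sphere
    float thc = std::sqrt(r * r - d2);
    float t0 = tca - thc, t1 = tca + thc;          // distances to the two intersection points
    if (t1 < 0) return false;                      // the sphere is entirely behind the ray
    t = (t0 < 0) ? t1 : t0;                        // nearest intersection in front of the origin
    return true;
}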

Over the years, a lot of research was put into finding efficient ways of computing the intersection of rays with the simplest of all possible shapes, the triangle, but also into ray tracing other types of geometry directly: NURBS, implicit surfaces, etc. However, one possible alternative to supporting all geometry types is to convert all geometry into a single representation before the rendering process starts, and have the renderer test the intersection of rays with that one representation only. Because the triangle is an ideal rendering primitive, most of the time all geometry is converted into triangle meshes, which means that rather than implementing one ray-object intersection test per geometry type, you only need to test for the intersection of rays with triangles. This has many advantages:

  • the renderer only needs to implement (and maintain) a single ray-primitive intersection routine,
  • since all the rendering time is spent in that one routine, the effort of optimising it is concentrated in a single place,
  • acceleration structures can be built around a single primitive type.
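
As an example, a single, well-optimised ray-triangle routine is then all the renderer needs to intersect rays with the scene's geometry. The sketch below uses the well-known Möller-Trumbore test; the helper functions and names are our own.

#include <cmath>

struct Vec3f { float x, y, z; };

Vec3f sub(const Vec3f& a, const Vec3f& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
float dot(const Vec3f& a, const Vec3f& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3f cross(const Vec3f& a, const Vec3f& b)
{ return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x }; }

// Sketch of the Möller-Trumbore ray-triangle intersection test. Returns true
// if the ray O + t*D hits the triangle (v0, v1, v2), and stores the distance in t.
bool rayTriangleIntersect(const Vec3f& O, const Vec3f& D,
                          const Vec3f& v0, const Vec3f& v1, const Vec3f& v2, float& t)
{
    Vec3f e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3f pvec = cross(D, e2);
    float det = dot(e1, pvec);
    if (std::fabs(det) < 1e-8f) return false; // ray parallel to the triangle plane
    float invDet = 1 / det;
    Vec3f tvec = sub(O, v0);
    float u = dot(tvec, pvec) * invDet;       // first barycentric coordinate
    if (u < 0 || u > 1) return false;
    Vec3f qvec = cross(tvec, e1);
    float v = dot(D, qvec) * invDet;          // second barycentric coordinate
    if (v < 0 || u + v > 1) return false;
    t = dot(e2, qvec) * invDet;               // distance along the ray
    return t >= 0;
}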

Comparing Rasterisation and Ray Tracing

Figure 5: the principle of an acceleration structure consists of dividing space into sub-regions. As the ray travels from one sub-region to the next, we only need to check for a possible intersection with the geometry contained in the current sub-region. Instead of testing all the objects in the scene, we only test those contained in the sub-regions the ray passes through, which potentially saves a lot of costly ray-geometry intersection tests.

We already talked a few times about the differences between ray tracing and rasterisation. Why would you choose one over the other? As mentioned before, when it comes to solving the visibility problem, rasterisation is definitely faster than ray tracing. Why is that? Converting geometry so that it works with the rasterisation algorithm may take some time, but projecting the geometry itself is very fast (it just takes a few multiplications, additions and divisions). In comparison, computing the intersection of a ray with geometry requires far more instructions and is therefore more expensive. The main difficulty with ray tracing is that render time increases linearly with the amount of geometry the scene contains: because we have to check whether any given ray intersects any of the triangles in the scene, the final cost is the number of triangles multiplied by the cost of a single ray-triangle intersection test. Fortunately, this problem can be alleviated by the use of an acceleration structure. The idea behind acceleration structures is that space can be divided into sub-spaces (for instance, you can divide a box containing all the geometry into a grid, where each cell of the grid represents a sub-space of the original box), and that objects can be sorted depending on the sub-space they fall into. This idea is illustrated in Figure 5.

If these sub-spaces are significantly larger than the objects' average size, then it is likely that a sub-space will contain more than one object (of course, it all depends on how the objects are distributed in space). Instead of testing all the objects in the scene, we can first test whether a ray intersects a given sub-space (in other words, whether it passes through it). If it does, we then test whether the ray intersects any of the objects it contains; if it doesn't, we can skip the ray-geometry intersection test for all of these objects. This means only a subset of the scene's geometry is tested, which obviously saves time.
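
The sketch below illustrates the principle in its simplest form: each sub-region is bounded by an axis-aligned box, and we only test the objects a sub-region contains if the ray intersects that box. A real implementation would use a grid, a BVH or a kd-tree with a proper traversal, as covered in the lessons on acceleration structures; the structure and function names used here are our own.

#include <algorithm>
#include <utility>
#include <vector>

struct Vec3f { float x, y, z; };

// One sub-region of space: an axis-aligned bounding box plus the indices of
// the objects that fall inside it.
struct SubRegion { Vec3f min, max; std::vector<int> objectIndices; };

// Classic "slab" test: does the ray O + t*D pass through the box [bmin, bmax]?
bool intersectBox(const Vec3f& O, const Vec3f& D, const Vec3f& bmin, const Vec3f& bmax)
{
    float tmin = (bmin.x - O.x) / D.x, tmax = (bmax.x - O.x) / D.x;
    if (tmin > tmax) std::swap(tmin, tmax);
    float tymin = (bmin.y - O.y) / D.y, tymax = (bmax.y - O.y) / D.y;
    if (tymin > tymax) std::swap(tymin, tymax);
    if (tmin > tymax || tymin > tmax) return false;
    tmin = std::max(tmin, tymin);
    tmax = std::min(tmax, tymax);
    float tzmin = (bmin.z - O.z) / D.z, tzmax = (bmax.z - O.z) / D.z;
    if (tzmin > tzmax) std::swap(tzmin, tzmax);
    return !(tmin > tzmax || tzmin > tmax);
}

void traceThroughRegions(const Vec3f& O, const Vec3f& D, const std::vector<SubRegion>& regions)
{
    for (const SubRegion& region : regions) {
        // if the ray misses this sub-region, skip all the objects it contains
        if (!intersectBox(O, D, region.min, region.max)) continue;
        for (int objectIndex : region.objectIndices) {
            (void)objectIndex; // test the ray against this object only (ray-triangle, ray-sphere, ...)
        }
    }
}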

If acceleration structures can be used to speed up ray tracing, then isn't ray tracing superior to rasterisation? Yes and no. First, it is still generally slower, and using an acceleration structure raises a whole lot of new problems.

Many different acceleration structures have been proposed and, as you can guess, they all have strengths and weaknesses, though some are of course more popular than others. You will find many lessons devoted to this particular topic in the section on Ray Tracing Techniques.

From reading all this, you may think that all the problems lie with ray tracing. Well, in fact, ray tracing is popular for a reason. First, in principle, it is incredibly simple to implement. We showed in the first lesson of this section that a very basic ray tracer can be written in no more than a few hundred lines of code. In reality, we could argue that it wouldn't take much more code to write a renderer based on the rasterisation algorithm, but the concept of ray tracing still seems easier to code, perhaps because it is a more natural way of thinking about the process of making an image of a 3D scene. Far more importantly, effects such as reflections or soft shadows, which play a critical role in the photo-realism of an image, are straightforward to simulate with ray tracing and very hard to simulate with rasterisation. To understand why, we first need to look at shading and light transport in more detail, which is the topic of the next chapter.

"Rasterization is fast, but needs cleverness to support complex visual effects. Ray tracing supports complex visual effects, but needs cleverness to be fast." (David Luebke - NVIDIA)

"With rasterization it is easy to do it very fast, but hard to make it look good. With ray tracing it is easy to make it look good, but very hard to make it fast."

Summary

In this chapter, we only looked at ray tracing and rasterisation as two possible ways of solving the visibility problem. Rasterisation is still the method by which graphics cards render 3D scenes, and it remains faster than ray tracing when it comes to solving the visibility problem. You can speed up ray tracing with an acceleration structure; however, acceleration structures come with their own set of issues: it is hard to find one that performs well regardless of the scene configuration (the number of primitives to render, their sizes and their distribution in space). They also require extra memory and take time to build.

It is important to appreciate that, at this stage, ray tracing does not really have any definite advantage over rasterisation when it comes to solving the visibility problem alone. However, ray tracing is better than rasterisation at simulating light or shading effects such as soft shadows or reflections. When we say better, we actually mean that it is more straightforward to simulate them with ray tracing than with rasterisation, which does not mean at all that these effects can't be simulated with rasterisation; it just generally requires more work. We insist on this point because there is a common misbelief that effects such as reflections can't be done with rasterisation, which is why ray tracing is used. This is simply not true. One might think about using a hybrid approach in which rasterisation is used for the visible surface elimination step and ray tracing for shading, the second step of the rendering process, but having to implement both systems in the same application obviously requires more work than using one unified framework. And since ray tracing makes it easier to simulate things such as reflections, most people prefer to use ray tracing to solve the visibility problem as well.