The Visibility Problem
We already explained what the visibility problem is in the previous chapters. To create a photorealistic image, we need to determine which part of an object is visible from a given viewpoint. The problem is that when we project the corners of the box for example and connect the projected points to draw the edges of the box, all faces of the box are visible. However, in reality only the front faces of the box would be visible, while the rear ones would be hidden.
In computer graphics you can solve this problem using principally two methods: ray tracing and rasterisation. We will quickly explain how they work. While it's hard to know whether one method is older than the other, rasterisation was far more popular in the early days of computer graphics. Ray tracing is notoriously more computationally expensive (and uses more memory) than rasterisation, and thus is far slower in comparison. Computers back then where so slow (and had so little memory), that rendering images using ray tracing was not considered a viable option, at least in a production environment (to produce films for example). For this reason, almost every renderer used rasterisation (ray tracing was generally limited to research projects). However, for reasons we will explain in the next chapter, ray tracing is way better than rasterisation when it comes to simulating effects such as reflections, soft shadows, etc. In summary, it's easier to create photo-realistic images with ray tracing, only it takes longer compared to rendering geometry using rasterisation which in turn, is less adapted than ray tracing to simulate realistic shading and light effects. We will explain why in the next chapter. Realtime rendering APIs and GPUs are generally using rasterisation, because speed in real time is obviously what determines the choice of the algorithm. What was true for ray tracing in the 80s and 90s is however not true today. Computers are now so powerful, that ray tracing is used by probably every off-line renderer today (at least, they propose a hybrid approach in which both algorithms are implemented). Why? Because again it's the easiest way of simulating important effects such as sharp and glossy reflections, soft shadows, etc. As long as speed is not an issue, it is in fact superior in many ways to rasterisation (making ray tracing work efficiently though still requires a lot of work). Pixar's PhotoRealistic RenderMan, the renderer Pixar developed to produce many of its first feature films (Toys Story, Nemo, Bug's Life) was based on a rasterisation algorithm (the algorithm is called REYES; it stands for Renders Everything You Ever Saw. It is by far considered as one of the best visible surface determination algorithm ever conceived - The GPU rendering pipeline has many similarities with REYES). But their current renderer called RIS is now a pure ray tracer. Introducing ray tracing allowed the studio to greatly push the realism and the complexity of the images it produced over the years.
Rasterisation to Solve the Visibility Problem: How Does it Work?
We hopefully clearly explained already the difference between rasterisation and ray tracing (read the previous chapter). However let's repeat again, that we can look at the rasterisation approach, as if we were moving a point along a line connecting P, a point on the surface of the geometry, to the eye, until it "lies" on the surface of the canvas. Of course this line is only implicit, we actually never really need to construct it, but this how, intuitively we can interpret the projection process.
Remember that what we need to solve here is the visibility problem. In other words, there might be situations in which several points in the scene, P, P1, P2, etc. projects onto the same point P' onto the canvas (remember that the canvas is also the surface of the screen). However, the only point that is actually visible through the camera is the point along the line connecting the eye to all these points, which is the closest to the eye, as showed in the figure 2.To solve the visibility problem, we first need to express P' in terms of it position in the image: what are the coordinates of the pixel in the image, P' falls onto? Remember that the projection of a point to the surface of the canvas, gives another point P' which coordinates are real. However, P' also necessarily falls within a given pixel of our final image. So how do we go from expressing P's in terms of its position on the surface of the canvas, to defining it in terms of its position in the final image (the coordinates of the pixel in the image, P' falls onto)? This involves a simple change of coordinate systems.
- The coordinate system in which the point is originally defined is called screen space (or image space). It is defined by an origin which is located in the centre of the canvas. All axes of this two dimensional coordinate system have unit length (their length is 1). Note that the x or y coordinate of any point defined in this coordinate system can be negative if it lies to the left of the x-axis (for the x-coordinate) or below the y-axis (for the y-coordinate).
- The coordinate system in which points are defined with respect to the grid formed by the pixels of the image, is called raster space. Its origin is generally located in the upper left corner of the image. Its axes also have unit length and a pixel is considered to be one unit length in this coordinate system. Thus, the actual size of the canvas in this coordinate system is given by the image vertical (height) and horizontal (width) dimension (which is expressed in terms of pixels).
Converting points from screen space to raster space it actually really simple. Because the coordinates P' expressed in raster space can only be positive, we first need to normalise P's original coordinates. In other words, convert them from whatever range they are in originally, to the range 0, 1 (when points are defined that way, we say they are defined in NDC space. NDC stands for Normalized Device Coordinates). Once converted to NDC space, converting the point's coordinates to raster space is trivial: just multiply the normalised coordinates by the image dimensions, and round the number off to the nearest integer value (pixel coordinates are always round numbers, or integers if you prefer). The range P' coordinates are originally in, depends on the size of the canvas in screen space. For the sake of simplicity, we will just assume that the canvas is two units long in each of the two dimensions (width and height), which means that P' coordinates in screen space, are in the range [-1, 1]. Here is the pseudo code to convert P's coordinates from screen space to raster space:
There are a few things to notice in this code. First that the original point P, the projected point in screen space and NDC space all use the Vec3f or Vec2f types in which the coordinates are defined as real (floats). However the final point in raster space uses the Vec2i type in which coordinates are defined as integers (the coordinate of a pixel in the image). Arrays in programming, are 0-indexed, thereby, the coordinates of a point in raster point should never be greater than the width of the image minus one or the image height minus one. However this may happen when P's coordinates in screen space are exactly 1 in either dimension. The code checks this case (lines 14-15), and clamp the coordinates to the right range if it happens. Also, the origin of the NDC space coordinate is located in the lower left corner of the image, but the origin of the raster space system is located in the upper left corner (see figure 3). Therefore, the y coordinate needs to be inverted when converted from NDC to raster space (check the difference between line 8 and 9 in the code).
But why do we need this conversion? To solve the visibility problem we will use the following method:
- Project all points onto the the screen.
- For each projected point, convert P' coordinates from screen space to raster space.
Question from a reader: "you say, project all points onto the screen. How do we find these points in the first place?". Very good question. Technically, we would break down the triangles or the polygons objects are made of, into smaller geometry elements no bigger than a pixel when projected onto the screen. In real-time APIs and OpenGL this what we generally refer to as fragments. Check the lesson on the REYES algorithm in this section to learn how this work in more details.
- Find the pixel the point maps to (using the projected point raster coordinates), and store the distance of that point to the eye, in a special list of points (called the depth list), maintained by that pixel.
- At the end of the process, sort the points in the list of each pixel, by order of increasing distance. The result of this process is that obviously, the point visible for any given pixel in the image, is the first point from that pixel's list.
An algorithm based on this approach, is called a depth sorting algorithm (a self-explanatory name). The concept of ordering objects in depth (or points on the surfaces of objects) is the base of all rasterisation algorithms. Quite a few exist among the most famous of which are:
- the z-buffering algorithm. This is probably the most commonly used one from this category. The REYES algorithm which we present in this section implements the z-buffer algorithm. It is very similar to the technique we described in which points on the surfaces of objects (objects are subdivided into very small surfaces, or fragments which are then projected onto the screen), are projected onto the screen and stored into depth lists.
- the Painter algorithm
- the Newell's algorithm
- ... (list to be extended)
Keep in mind that while this may sound like old fashion to you, all graphics card are using one implementation of the z-buffer algorithm, to produce images. These algorithms (at least z-buffering) are still commonly used today.
Why do we need to keep a list of points? Storing the point with the shortest distance to the eye doesn't require to store all the points in a list. Indeed, you could very well do the following thing:
- For each pixel in the image, set the variable z to infinity.
- For each point in the scene.
- Project the point and compute its raster coordinates
- If the distance from current point to the eye z' is smaller than the distance z stored in the pixel the point projects to, then update z with z'. If z' is greater than z, then the point is located further away from the point currently stored for that pixel.
You can see that, you can get the same result without having to store a list of visible points and sorting them out at the end. So why did we use one? We used one because in our example, we just assume that all points in the scene were opaque. But what happens if they are not fully opaque? Clearly, if several semi-transparent points project to the same pixel, they may be visible throughout each other. In this particular case, it is necessary to keep track of all the points visible through that particular pixel, sort them out by distance, and use a special compositing technique (we will learn about this in the lesson on the REYES algorithm) to blend them correctly.
Ray Tracing to Solve the Visibility Problem: How Does it Work?
With rasterisation, points are projected onto the screen to find their respective position on the image plane. But we can look at the problem the other way around. Rather than going from the point to the pixel, we can actually start from the pixel and convert it into a point on the image plane (we take the centre of the pixel and convert its coordinates defined in raster space to screen space). This gives us P'. We can then trace a ray starting from the eye, passing through P' and extend it down into the scene (by default we will assume that P' is centre of the pixel). If we find that the ray intersects an object, then we know that the point of intersection P, is actually the point visible through that pixel. In short, ray tracing is a method to solve the point's visibility problem, by the mean of explicitly tracing rays from the eye down into the scene.
Note that in a way, ray tracing and rasterisation are really a reflection of each other. They are really based on the same principle, but ray tracing is going from the eye to the object, while rasterisation goes from the object to the eye. While they make it possible to find which point is actually visible for any given pixel in the image (they give the exact same result with that respect), implementing them requires to solve very different problems. Ray tracing is more complicated in a way, because it requires to solve the ray-geometry intersection problem. Do we even have a way of finding the intersection of a ray with geometry? While it might be possible to find a way of computing whether or not a ray intersects a sphere, can we find a similar method to compute the intersection of a ray with a cone for instance? And what about another shape, and what about NURBS, subdivision surfaces, and implicit surfaces? As you can see, ray tracing can be used as long as a technique exists to compute the intersection of a ray with any type of geometry a scene might contain (or your renderer might support).
Over the years, a lot of research was put into efficient ways of computing the intersection of rays with a course the simplest of all possible shape - the triangle - but also directly ray tracing other types of geometry: NURBS, implicit surfaces, etc. However, one possible alternative to supporting all geometry types, is to convert all geometry to a single geometry representation before the rendering process starts, and have the renderer only test the intersection of rays with that one geometry representation. Because triangles, is an ideal rendering primitive, most of the time, all geometry is converted to triangles meshes, which means that rather than implementing a ray-object intersection test per geometry type, you only need to test for the intersection of rays with triangles. This has many advantages:
- First as suggested before, the triangle has many properties that makes it very attractive as a geometry primitive. It's co-planar, a triangle is indivisible (as creating more faces by connecting the existing vertices to each other, as you would for faces having at least four or more vertices), but it can easily be subdivided into more triangles. Finally, the math for computing the barycentric coordinates of a triangle (which is used in texturing) are simple and robust.
- Because triangles are a good geometry primitive, a lot of research was done to find the best possible ray-triangle intersection test. What is a good ray triangle intersection algorithm? It's one that is fast (get to the result using as few operations as possible), uses the least memory (some algorithms are more memory-hungry than others because they require to store precomputed variables on the triangle geometry) and is also robust (floating point arithmetic issues are hard to avoid).
- From a coding point of view, supporting one single-routine is obviously far more advantageous than having to code many routines to handle all geometry types. Supporting triangles only simplifies the code in many places, but also allows to design code that works best with triangles in general. This is particular true when it comes to acceleration structures. Computing the intersection of rays with geometry is by far the most expensive operation in a ray tracer. The time it takes to test the intersection with all geometry in the scene grows linearly with the amount of geometry the scene contains. As soon as the scene contains even just hundreds of such primitives it becomes necessary to implement strategies to quickly discard section of the scene, which we know have no chances to be intersected by the ray and test for only subsections of the scene which the ray will potentially intersect. These strategies save a considerable amount of time and are generally based on acceleration structures. We will study acceleration structures in the section devoted to ray tracing techniques. Also it's worth noticing that specially designed hardware has been already built in the past, to handle to the ray-triangle intersection test specifically, allowing complex scenes to run near real-time using ray tracing. It's quite obvious that in the future, graphics card will natively support the ray-triangle intersection test and that video games will evolve towards ray tracing.
Comparing Rasterisation and Ray Tracing
We already talked a few times about the difference between ray tracing and rasterisation. Why would you choose one or the other? As mentioned before, to sort the visibility problem, rasterisation is definitely faster than ray tracing. Why is that? Converting geometry to make it work with the rasterisation algorithm takes eventually some time, but projecting the geometry itself is very fast (it just takes a few multiplications, additions and divisions). In comparison, computing the intersection of a ray with geometry requires far more instructions and is therefore more expensive. The main difficulty with ray tracing is that render time increases linearly with the amount of geometry the scene contains. Because we have to check whether any given ray intersects any of the triangles in the scene, the final cost is then the number of triangles multiplied by the cost of a single ray-triangle intersection test. Hopefully this problem can be alleviated by the use of an acceleration structure. The idea behind acceleration structures, is that space can be divided into subspaces (for instance you can divide a box containing all the geometry to form a grid - each cell of that grid represents a sub-space of the original box), and that objects can be sorted depending on the sub-space they fall into. This idea is illustrated in the figure 5.
If these sub-spaces are significantly larger than the objects' average size, then it is likely that a subspace will contain more than one object (of course it all depends how they are organised in space). Instead of testing all objects in the scene, we can first test if a ray intersects a given subspace (in other words, if it passes through that sub-space). If it does, we can then test if the ray intersects any of the objects it contains, but if it doesn't, we can then skip the ray-intersection test for all these objects. This leads to only testing a subset of the scene's geometry, which is obviously saving time.
If acceleration structures can be used to accelerate ray tracing then isn't ray tracing superior to rasterisation? Yes and no. First it is still generally slower, but using an acceleration structure raises a all lot of new problems.
- First building this structure takes time, which means the render can't start until it's actually built: this generally never takes more than a few seconds, but, if you intend to use ray tracing in a real time application, then these few seconds are already too much (the acceleration structures needs to be built for every rendered frame if the geometry changes from frame to frame).
- Second, an acceleration potentially takes a lot of memory. Obviously this all depends on the scene complexity, however, because a good chunk of the memory needs to be used for the acceleration structure, this means that less is available for doing other things, particularly to store geometry. In practice this means you can potentially render less geometry with ray tracing than with rasterisation.
- Finally finding a good acceleration structure is very difficult. Imagine that you have one triangle on one side of the scene and all the other triangles stuck together in a very small region of space. If we build a grid for this scene many of the cells will be empty but the main problem is that when a ray traverses the cell containing the cluster of triangles, we will still need to perform a lot of intersection tests. Saving one test over the hundreds that may be required, is negligible and clearly shows that a grid as an acceleration structure for that sort of scene is not a good choice. As you can see, the efficiency of the acceleration structure depends very much on the scene, and the way objects are scattered: are object smalls or large, is it a mix of small and large objects, are objects uniformly distributed over space, or very unevenly distributed? Is the scene a combination of any of these options?
Many different acceleration structures have been proposed and they all have as you can guess strengths and weaknesses, but of course some of them are more popular than others. You will find many lessons devoted to this particular topic in the section devoted to Ray Tracing Techniques.
From reading all this you may think that all problems are with ray tracing. Well, in fact, ray tracing is popular for a reason. First in its principle, it is incredibly simple to implement. We showed in the first lesson of this section that a very basic ray tracer can be written in no more than a few hundreds lines of code. In reality, we could argue that it wouldn't take much more code to write a renderer based on the rasterisation algorithm, but still, the concept of ray tracing seems to be easier to code, as maybe it is a more natural way of thinking of the process of making an image of a 3D scene. But far more importantly, it happens that if you use ray tracing, computing effects such as reflection or soft shadow which play a critical role in the photo-realism of an image, are just straightforward to simulate in ray tracing, and very hard to simulate if you use rasterisation. To understand why, we first need to look at shading and light transport in more details, which is the topic of our next chapter.
In this chapter we only look at using ray tracing and rasterisation as two possible ways of solving the visibility problem. Rasterisation is still the method by which graphics card render 3D scenes. Rasterisation is still faster compared to ray tracing when it comes to using one algorithm or the other to solve the visibility problem. You can accelerate ray tracing through with an acceleration structure, however acceleration structures come with their own set of issues: it's hard to find a good acceleration structure, one that performs well regardless of the scene configuration (number of primitives to render, their sizes and their distribution in space). They also require extra memory and building them takes time.
It is important to appreciate that at this stage, ray tracing does not really have any definite advantages over rasterisation. However, ray tracing is better than rasterisation to simulate light or shading effects such as soft shadows or reflections. When we say better we actually mean that is more straightforward to simulate them with ray tracing than it is with rasterisation, which doesn't mean at all these effects can't be simulated with rasterisation. It just generally only requires more work. We insist on this point, because there is a common misbelief regarding the fact that effects such as reflections for example can't be done with rasterisation, which is why ray tracing is used. It is simply not true. However, one might think about using a hybrid approach in which rasterisation is used for the visibility surface elimination step, and ray tracing is used for shading, the second step of the rendering process, but having to implement both systems in the same applications obviously requires more work than just using one unified framework. And since, ray tracing makes it easier to simulate things such as reflections, then most people prefer to use ray tracing to solve the visibility problem as well.