The Visibility ProblemReading time: 23 mins.
The Visibility Problem
We already explained what the visibility problem is in the previous chapters. To create a photorealistic image, we need to determine which part of an object is visible from a given viewpoint. The problem is that when we project the corners of the box for example and connect the projected points to draw the edges of the box, all faces of the box are visible. However, in reality, only the front faces of the box would be visible, while the rear ones would be hidden.
In computer graphics, you can solve this problem using principally two methods: ray tracing and rasterization. We will quickly explain how they work. While it's hard to know whether one method is older than the other, rasterization was far more popular in the early days of computer graphics. Ray tracing is notoriously more computationally expensive (and uses more memory) than rasterization, and thus is far slower in comparison. Computers back then were so slow (and had so little memory), that rendering images using ray tracing was not considered a viable option, at least in a production environment (to produce films for example). For this reason, almost every renderer used rasterization (ray tracing was generally limited to research projects). However, for reasons we will explain in the next chapter, ray tracing is way better than rasterization when it comes to simulating effects such as reflections, soft shadows, etc. In summary, it's easier to create photo-realistic images with ray tracing, only it takes longer compared to rendering geometry using rasterization which in turn, is less adapted than ray tracing to simulate realistic shading and light effects. We will explain why in the next chapter. Real-time rendering APIs and GPUs are generally using rasterization because speed in real-time is obviously what determines the choice of the algorithm. What was true for ray tracing in the 80s and 90s is however not true today. Computers are now so powerful, that ray tracing is used by probably every offline renderer today (at least, they propose a hybrid approach in which both algorithms are implemented). Why? Because again it's the easiest way of simulating important effects such as sharp and glossy reflections, soft shadows, etc. As long as the speed is not an issue, it is superior in many ways to rasterization (making ray tracing work efficiently though still requires a lot of work). Pixar's PhotoRealistic RenderMan, the renderer Pixar developed to produce many of its first feature films (Toys Story, Nemo, Bug's Life) was based on a rasterization algorithm (the algorithm is called REYES; it stands for Renders Everything You Ever Saw. It is by far considered one of the best visible surface determination algorithms ever conceived - The GPU rendering pipeline has many similarities with REYES). But their current renderer called RIS is now a pure ray tracer. Introducing ray tracing allowed the studio to greatly push the realism and complexity of the images it produced over the years.
Rasterisation to Solve the Visibility Problem: How Does it Work?
We hopefully clearly explained already the difference between rasterization and ray tracing (read the previous chapter). However let's repeat, that we can look at the rasterization approach as if we were moving a point along a line connecting P, a point on the surface of the geometry, to the eye until it "lies" on the surface of the canvas. Of course, this line is only implicit, we never really need to construct it, but this is how intuitively we can interpret the projection process.
Remember that what we need to solve here is the visibility problem. In other words, there might be situations in which several points in the scene, P, P1, P2, etc. project onto the same point P' onto the canvas (remember that the canvas is also the surface of the screen). However, the only point that is visible through the camera is the point along the line connecting the eye to all these points, which is the closest to the eye, as shown in Figure 2.
To solve the visibility problem, we first need to express P' in terms of its position in the image: what are the coordinates of the pixel in the image, P' falls onto? Remember that the projection of a point to the surface of the canvas gives another point P' whose coordinates are real. However, P' also necessarily falls within a given pixel of our final image. So how do we go from expressing P's in terms of their position on the surface of the canvas, to defining it in terms of their position in the final image (the coordinates of the pixel in the image, P' falls onto)? This involves a simple change of coordinate systems.
The coordinate system in which the point is originally defined is called screen space (or image space). It is defined by an origin that is located in the center of the canvas. All axes of this two-dimensional coordinate system have unit length (their length is 1). Note that the x or y coordinate of any point defined in this coordinate system can be negative if it lies to the left of the x-axis (for the x-coordinate) or below the y-axis (for the y-coordinate).
The coordinate system in which points are defined with respect to the grid formed by the pixels of the image, is called raster space. Its origin is generally located in the upper-left corner of the image. Its axes also have unit length and a pixel is considered to be one unit length in this coordinate system. Thus, the actual size of the canvas in this coordinate system is given by the image's vertical (height) and horizontal (width) dimensions (which are expressed in terms of pixels).
Converting points from screen space to raster space is simple. Because the coordinates P' expressed in raster space can only be positive, we first need to normalize P's original coordinates. In other words, convert them from whatever range they are originally in, to the range [0, 1] (when points are defined that way, we say they are defined in NDC space. NDC stands for Normalized Device Coordinates). Once converted to NDC space, converting the point's coordinates to raster space is trivial: just multiply the normalized coordinates by the image dimensions, and round the number off to the nearest integer value (pixel coordinates are always round numbers, or integers if you prefer). The range P' coordinates are originally in, depends on the size of the canvas in screen space. For the sake of simplicity, we will just assume that the canvas is two units long in each of the two dimensions (width and height), which means that P' coordinates in screen space, are in the range [-1, 1]. Here is the pseudo-code to convert P's coordinates from screen space to raster space:
int width = 64, height = 64; //dimension of the image in pixels Vec3f P = Vec3f(-1, 2, 10); Vec2f P_proj; P_proj.x = P.x / P.z; //-0.1 P_proj.y = P.y / P.z; //0.2 // convert from screen space coordinates to normalized coordinates Vec2f P_proj_nor; P_proj_nor.x = (P_proj.x + 1) / 2; //(-0.1 + 1) / 2 = 0.45 P_proj_nor.y = (1 - P_proj.y ) / 2; //(1 - 0.2) / 2 = 0.4 // finally, convert to raster space Vec2i P_proj_raster; P_proj_raster.x = (int)(P_proj_nor.x * width); P_proj_raster.y = (int)(P_proj_nor.y * height); if (P_proj_raster.x == width) P_proj_raster.x = width - 1; if (P_proj_raster.y == height) P_proj_raster.y = height - 1;
This conversion process is explained in detail in the lesson 3D Viewing: the Pinhole Camera Model.
There are a few things to notice in this code. First that the original point P, the projected point in screen space, and NDC space all use the Vec3f or Vec2f types in which the coordinates are defined as real (floats). However, the final point in raster space uses the Vec2i type in which coordinates are defined as integers (the coordinate of a pixel in the image). Arrays in programming, are 0-indexed, thereby, the coordinates of a point in raster point should never be greater than the width of the image minus one or the image height minus one. However, this may happen when P's coordinates in screen space are exactly 1 in either dimension. The code checks this case (lines 14-15) and clamps the coordinates to the right range if it happens. Also, the origin of the NDC space coordinate is located in the lower-left corner of the image, but the origin of the raster space system is located in the upper-left corner (see figure 3). Therefore, the y coordinate needs to be inverted when converted from NDC to raster space (check the difference between lines 8 and 9 in the code).
But why do we need this conversion? To solve the visibility problem we will use the following method:
Project all points onto the screen.
For each projected point, convert P's coordinates from screen space to raster space.
Find the pixel the point maps to (using the projected point raster coordinates), and store the distance of that point to the eye, in a special list of points (called the depth list), maintained by that pixel.
You say, project all points onto the screen. How do we find these points in the first place?"
Very good question. Technically, we would break down the triangles or the polygons objects are made of, into smaller geometry elements no bigger than a pixel when projected onto the screen. In real-time APIs (OpenGL, DirectX, Vulkan, Metal, etc.) this is what we generally refer to as fragments. Check the lesson on the REYES algorithm in this section to learn how this works in more detail.
At the end of the process, sort the points in the list of each pixel, by order of increasing distance. As a result of this process, the point visible for any given pixel in the image is the first point from that pixel's list.
Why do points need to be sorted according to their depth?
The list needs to be sorted because points are not necessarily ordered in depth when projected onto the screen. Assuming you insert points by adding them at the top of the list, you may project a point B further from the eye than a point A, after you projected A. In which case B will be the first point in the list, even though its distance to the eye, is greater than the distance to A. Thus sorting is required.
An algorithm based on this approach is called a depth sorting algorithm (a self-explanatory name). The concept of depth ordering is the base of all rasterization algorithms. Quite a few exist among the most famous of which are:
the z-buffering algorithm. This is probably the most commonly used one from this category. The REYES algorithm which we present in this section implements the z-buffer algorithm. It is very similar to the technique we described in which points on the surfaces of objects (objects are subdivided into very small surfaces or fragments which are then projected onto the screen), are projected onto the screen and stored into depth lists.
the Painter algorithm
... (list to be extended)
Keep in mind that while this may sound like old fashion to you, all graphics cards are using one implementation of the z-buffer algorithm, to produce images. These algorithms (at least z-buffering) are still commonly used today.
Why do we need to keep a list of points? Storing the point with the shortest distance to the eye shouldn't require storing all the points in a list. Indeed, you could very well do the following thing:
For each pixel in the image, set the variable z to infinity.
For each point in the scene.
Project the point and compute its raster coordinates
If the distance from the current point to the eye z' is smaller than the distance z stored in the pixel the point projects to, then update z with z'. If z' is greater than z, then the point is located further away from the point currently stored for that pixel.
You can see that, you can get the same result without having to store a list of visible points and sorting them out at the end. So why did we use one? We used one because, in our example, we just assume that all points in the scene were opaque. But what happens if they are not fully opaque? If several semi-transparent points project to the same pixel, they may be visible throughout each other. In this particular case, it is necessary to keep track of all the points visible through that particular pixel, sort them out by distance, and use a special compositing technique (we will learn about this in the lesson on the REYES algorithm) to blend them correctly.
Ray Tracing to Solve the Visibility Problem: How Does It Work?
With rasterization, points are projected onto the screen to find their respective position on the image plane. But we can look at the problem the other way around. Rather than going from the point to the pixel, we can start from the pixel and convert it into a point on the image plane (we take the center of the pixel and convert its coordinates defined in raster space to screen space). This gives us P'. We can then trace a ray starting from the eye, passing through P', and extend it down into the scene (by default we will assume that P' is the center of the pixel). If we find that the ray intersects an object, then we know that the point of intersection P is the point visible through that pixel. In short, ray tracing is a method to solve the point's visibility problem, by the mean of explicitly tracing rays from the eye down into the scene.
Note that in a way, ray tracing and rasterization are a reflection of each other. They are based on the same principle, but ray tracing is going from the eye to the object, while rasterization goes from the object to the eye. While they make it possible to find which point is visible for any given pixel in the image (they give the same result in that respect), implementing them requires solving very different problems. Ray tracing is more complicated in a way because it requires solving the ray-geometry intersection problem. Do we even have a way of finding the intersection of a ray with geometry? While it might be possible to find a way of computing whether or not a ray intersects a sphere, can we find a similar method to compute the intersection of a ray with a cone for instance? And what about another shape, and what about NURBS, subdivision surfaces, and implicit surfaces? As you can see, ray tracing can be used as long as a technique exists to compute the intersection of a ray with any type of geometry a scene might contain (or your renderer might support).
Over the years, a lot of research was put into efficient ways of computing the intersection of rays with the simplest of all possible shapes - the triangle - but also directly ray tracing other types of geometry: NURBS, implicit surfaces, etc. However, one possible alternative to supporting all geometry types is to convert all geometry to a single geometry representation before the rendering process starts, and have the renderer only test the intersection of rays with that one geometry representation. Because triangles are an ideal rendering primitive, most of the time, all geometry is converted to triangles meshes, which means that rather than implementing a ray-object intersection test per geometry type, you only need to test for the intersection of rays with triangles. This has many advantages:
First as suggested before, the triangle has many properties that make it very attractive as a geometry primitive. It's co-planar, a triangle is indivisible (as creating more faces by connecting the existing vertices, as you would for faces having at least four or more vertices), but it can easily be subdivided into more triangles. Finally, the math for computing the barycentric coordinates of a triangle (which is used in texturing) is simple and robust.
Because triangles are a good geometry primitive, a lot of research was done to find the best possible ray-triangle intersection test. What is a good ray triangle intersection algorithm? It needs to be fast (get to the result using as few operations as possible). It needs to use the least memory possible (some algorithms are more memory-hungry than others because they require storing precomputed variables on the triangle geometry). And it also needs to be robust (floating-point arithmetic issues are hard to avoid).
From a coding point of view, supporting one single routine is far more advantageous than having to code many routines to handle all geometry types. Supporting triangles only simplifies the code in many places but also allows to design code that works best with triangles in general. This is particularly true when it comes to acceleration structures. Computing the intersection of rays with geometry is by far the most expensive operation in a ray tracer. The time it takes to test the intersection with all geometry in the scene grows linearly with the amount of geometry the scene contains. As soon as the scene contains even just hundreds of such primitives it becomes necessary to implement strategies to quickly discard sections of the scene, which we know have no chances to be intersected by the ray, and test for only subsections of the scene that the ray will potentially intersect. These strategies save a considerable amount of time and are generally based on acceleration structures. We will study acceleration structures in the section devoted to ray tracing techniques. Also, it's worth noticing that specially designed hardware has been already built in the past, to handle the ray-triangle intersection test specifically, allowing complex scenes to run near real-time using ray tracing. It's quite obvious that in the future, graphics cards will natively support the ray-triangle intersection test and that video games will evolve towards ray tracing.
Comparing rasterization and ray-tracing
We already talked a few times about the difference between ray tracing and rasterization. Why would you choose one or the other? As mentioned before, to sort the visibility problem, rasterization is faster than ray tracing. Why is that? Converting geometry to make it work with the rasterization algorithm takes eventually some time, but projecting the geometry itself is very fast (it just takes a few multiplications, additions, and divisions). In comparison, computing the intersection of a ray with geometry requires far more instructions and is, therefore, more expensive. The main difficulty with ray tracing is that render time increases linearly with the amount of geometry the scene contains. Because we have to check whether any given ray intersects any of the triangles in the scene, the final cost is then the number of triangles multiplied by the cost of a single ray-triangle intersection test. Hopefully, this problem can be alleviated by the use of an acceleration structure. The idea behind acceleration structures is that space can be divided into subspaces (for instance you can divide a box containing all the geometry to form a grid - each cell of that grid represents a sub-space of the original box) and that objects can be sorted depending on the sub-space they fall into. This idea is illustrated in figure 5.
If these sub-spaces are significantly larger than the objects' average size, then it is likely that a subspace will contain more than one object (of course it all depends on how they are organized in space). Instead of testing all objects in the scene, we can first test if a ray intersects a given subspace (in other words, if it passes through that sub-space). If it does, we can then test if the ray intersects any of the objects it contains, but if it doesn't, we can then skip the ray-intersection test for all these objects. This leads to only testing a subset of the scene's geometry, which is saving time.
If acceleration structures can be used to accelerate ray tracing then isn't ray tracing superior to rasterization? Yes and no. First, it is still generally slower, but using an acceleration structure raises a lot of new problems.
First building this structure takes time, which means the render can't start until it's built: this generally never takes more than a few seconds, but, if you intend to use ray tracing in a real-time application, then these few seconds are already too much (the acceleration structures needs to be built for every rendered frame if the geometry changes from frame to frame).
Second, an acceleration structure potentially takes a lot of memory. This all depends on the scene complexity, however, because a good chunk of the memory needs to be used for the acceleration structure, this means that less is available for doing other things, particularly storing geometry. In practice, this means you can potentially render less geometry with ray tracing than with rasterization.
Finally finding a good acceleration structure is very difficult. Imagine that you have one triangle on one side of the scene and all the other triangles stuck together in a very small region of space. If we build a grid for this scene many of the cells will be empty but the main problem is that when a ray traverses the cell containing the cluster of triangles, we will still need to perform a lot of intersection tests. Saving one test over the hundreds that may be required, is negligible and clearly shows that a grid as an acceleration structure for that sort of scene is not a good choice. As you can see, the efficiency of the acceleration structure depends very much on the scene, and the way objects are scattered: are object smalls or large, is it a mix of small and large objects, are objects uniformly distributed over space or very unevenly distributed? Is the scene a combination of any of these options?
Many different acceleration structures have been proposed and they all have as you can guess strengths and weaknesses, but of course, some of them are more popular than others. You will find many lessons devoted to this particular topic in the section devoted to Ray Tracing Techniques.
From reading all this you may think that all problems are with ray tracing. Well, ray tracing is popular for a reason. First, in its principle, it is incredibly simple to implement. We showed in the first lesson of this section that a very basic raytracer can be written in no more than a few hundred lines of code. In reality, we could argue that it wouldn't take much more code to write a renderer based on the rasterization algorithm, but still, the concept of ray tracing seems to be easier to code, as maybe it is a more natural way of thinking of the process of making an image of a 3D scene. But far more importantly, it happens that if you use ray tracing, computing effects such as reflection or soft shadow which play a critical role in the photo-realism of an image, are just straightforward to simulate in ray tracing, and very hard to simulate if you use rasterization. To understand why, we first need to look at shading and light transport in more detail, which is the topic of our next chapter.
Rasterization is fast but needs cleverness to support complex visual effects. Ray tracing supports complex visual effects but needs cleverness to be fast - David Luebke (NVIDIA).
With rasterization it is easy to do it very fast, but hard to make it look good. With ray tracing it is easy to make it look good, but very hard to make it fast.
In this chapter, we only look at using ray tracing and rasterization as two possible ways of solving the visibility problem. Rasterisation is still the method by which graphics cards render 3D scenes. Rasterisation is still faster compared to ray tracing when it comes to using one algorithm or the other to solve the visibility problem. You can accelerate ray tracing through with an acceleration structure, however, acceleration structures come with their own set of issues: it's hard to find a good acceleration structure, one that performs well regardless of the scene configuration (number of primitives to render, their sizes and their distribution in space). They also require extra memory and building them takes time.
It is important to appreciate that at this stage, ray tracing does not have any definite advantages over rasterization. However, ray tracing is better than rasterization to simulate light or shading effects such as soft shadows or reflections. When we say better we mean that is more straightforward to simulate them with ray tracing than it is with rasterization, which doesn't mean at all these effects can't be simulated with rasterization. It just generally only requires more work. We insist on this point because there is a common misbelief regarding the fact that effects such as reflections for example can't be done with rasterization, which is why ray tracing is used. It is simply not true. However, one might think about using a hybrid approach in which rasterization is used for the visibility surface elimination step, and ray tracing is used for shading, the second step of the rendering process, but having to implement both systems in the same applications requires more work than just using one unified framework. And since ray tracing makes it easier to simulate things such as reflections, then most people prefer to use ray tracing to solve the visibility problem as well.