The Visibility Problem, the Depth Buffer Algorithm and Depth Interpolation
Reading time: 15 mins.In the second chapter of this lesson, we learned that in the third coordinate of the projected point (the point in screen space) we store the original vertex zcoordinate (the zcoordinate of the point in camera space):
$$ \begin{array}{l} P_{screen}.x = \dfrac{ near * P_{camera}.x }{ P_{camera}.z}\\ P_{screen}.y = \dfrac{ near * P_{camera}.z }{ P_{camera}.z}\\ P_{screen}.z = P_{camera}.z\\ \end{array} $$Finding the zcoordinate of a point on the surface of the triangle is useful when a pixel overlaps more than one triangle. And the way we find that zcoordinate is by interpolating the original vertices zcoordinates using the barycentric coordinates that we learned about in the previous chapter. In other words, we can treat the zcoordinates of the triangle vertices as any other vertex attribute, and interpolate them the same way we interpolated colors in the previous chapter. Before we look into the details of how this zcoordinate is computed, let's start to explain why we need to do so.
The DepthBuffer or ZBuffer Algorithm and Hidden Surface Removal
When a pixel overlaps a point, what we see through that pixel is a small area on the surface of a triangle, which for simplification we will reduce to a single point (denoted P in figure 1). Thus each pixel covering a triangle corresponds to a point on the surface of that triangle. Of course, if a pixel covers more than one triangle, we then have several of these points. The problem when this happens is to find which one of these points is visible. We have illustrated this concept in 2D in figure 2. We could test triangles from back to front (this technique would require sorting triangles by decreasing depth first) but this doesn't always work when triangle intersects each other (figure 2, bottom). The only reliable solution is to compute the depth of each triangle a pixel overlaps, and then compare these depth values to find out which one is the closest to the camera. If you look at figure 2, you can see that a pixel in the image overlaps two triangles in P1 and P2. However, the P1 zcoordinate (Z1) is lower than the P2 zcoordinate (Z2) thus we can deduce that P1 is in front of P2. Note that this technique is needed because triangles are tested in a "random" order. As mentioned before we could sort out triangles in decreasing depth order but this is not good enough. Generally, they are just tested in the order they are specified in the program, and for this reason, a triangle T1 that is closer to the camera can be tested before a triangle T2 that is further away. If we were not comparing these triangles' depth, then we would end up in this case seeing the triangle which was tested last (T2) when in fact we should be seeing T1. As mentioned many times before, this is called the visibility problem or hidden surface problem. Algorithms for ordering objects so that they are drawn correctly are called visible surface algorithms or hidden surface removal algorithms. The depthbuffer or zbuffer algorithm that we are going to study next belongs to this category of algorithms.
One solution to the visibility problem is to use a depthbuffer or zbuffer. A depthbuffer is nothing more than a twodimensional array of floats that has the same dimension as the framebuffer and that is used to store the depth of the object as the triangles are being rasterized. When this array is created, we initialize each pixel in the array with a very large number. If we find that a pixel overlaps the current triangle, we do as follows:

We first compute the zcoordinate or depth of the point on the triangle that the pixel overlaps.

We then compare that current triangle depth with the value stored in the depth buffer for that pixel.

If we find that the value stored in the depthbuffer is greater than the depth of the point on the triangle, then the new point is closer to the observer or the camera than the point stored in the depth buffer at that pixel location. The value stored in the depthbuffer is then replaced with the new depth, and the framebuffer is updated with the current triangle color. On the other hand, if the value stored in the depthbuffer is smaller than the current depth sample, then the triangle that the pixel overlaps is hidden by the object whose depth is currently stored in the depthbuffer.
Note that once all triangles have been processed, the depthbuffer contains "some sort" of image, that represents the "distance" between the visible parts of the objects in the scene and the camera (this is not a distance but the zcoordinate of each point visible through the camera). The depth buffer is essentially useful to solve the visibility problem, however, it can also be used in postprocessing to do things such as 2D depth of field, adding fog, etc. All these effects are better done in 3D but applying them in 2D is often faster but the result is not always as accurate as what you can get in 3D.
Here is an implementation of the depthbuffer algorithm in pseudocode:
float *depthBuffer = new float [imageWidth * imageHeight]; // Initialize depthbuffer with a very large number for (uint32_t y = 0; y < imageHeight; ++y) for (uint32_t x = 0; x < imageWidth; ++x) depthBuffer[y][x] = INFINITY; for (each triangle in the scene) { // Project triangle vertices ... // Compute 2D triangle boundingbox ... for (uint32_t y = bbox.min.y; y <= bbox.max.y; ++y) { for (uint32_t x = bbox.min.x; x <= bbox.max.x; ++x) { if (pixelOverlapsTriangle(i + 0.5, j + 0.5) { // Compute the zcoordinate of the point on the triangle surface float z = computeDepth(...); // Current point is closest than object stored in depth/framebuffer if (z < depthBuffer[y][x]) { // Update depthbuffer with that depth depthBuffer[y][x] = z; frameBuffer[y][x] = triangleColor; } } } } }
Finding Z by Interpolation
Hopefully, the principle of the depthbuffer is simple and easy to understand. All we need to do now is explained how depth values are computed. First, let's repeat one more time what that depth value is. When a pixel overlaps a triangle, it overlaps a small surface on the surface of the triangle, which as mentioned in the introduction we will reduce to a point for simplification (point P in figure 1). What we want to find here, is this point zcoordinate. As also mentioned earlier in this chapter, if we know the triangle vertices' zcoordinate (which we do, they are stored in the projected point zcoordinate), all we need to do is interpolate these coordinates using P's barycentric coordinates (figure 4):
$$P.z = \lambda_0 * V0.z + \lambda_1 * V1.z + \lambda_2 * V2.z.$$Technically this sounds reasonable, though unfortunately, it doesn't work. Let's see why. The problem is not in the formula itself which is perfectly fine. The problem is that once the vertices of a triangle are projected onto the canvas (once we have performed the perspective divide), then z, the value we want to interpolate, doesn't vary linearly anymore across the surface of the 2D triangle. This is easier to demonstrate with a 2D example.
The secret lies in figure 4. Imagine that we want to find the "image" of a line defined in 2D space by two vertices V0 and V1. The canvas is represented by the horizontal green line. This line is one unit away (along the zaxis) from the coordinate system origin. If we trace lines from V0 and V1 to the origin, then we intersect the green lines in two points (denoted V0' and V1' in the figure). The zcoordinate of this point is 1 since they lie on the canvas which is 1 unit away from the origin. The xcoordinate of the points can easily be computed using perspective projection. We just need to divide the original vertex xcoordinates by their zcoordinate. We get:
$$ \begin{array}{l} V0'.x = \dfrac{V0.x}{V0.z} = \dfrac{4}{2} = 2,\\ V1'.x = \dfrac{V1.x}{V1.z} = \dfrac{2}{5} = 0.4. \end{array} $$The goal of the exercise is to find the zcoordinate of P, a point on the line defined by V0 and V1. In this example, all we know about P is the position of its projection P', on the green line. The coordinates of P' are {0,1}. The problem is similar to trying to find the zcoordinate of a point on the triangle that a pixel overlaps. In our example, P' would be the pixel and P would be the point on the triangle that the pixel overlaps. What we need to do now, is compute the "barycentric coordinate" of P' with respect to V0' and V1'. Let's call the resulting value \(\lambda\). Like our triangle barycentric coordinates, \(\lambda\) is also in the range [0,1]. To find \(\lambda\), we just need to take the distance between V0' and P' (along the xaxis), and divide this number by the distance between V0' and V1'. If linearly interpolating the zcoordinates of the original vertices V0 and V1 using \(\lambda\) to find the depth of P works, then we should get the number 4 (we can easily see by just looking at the illustration that the coordinates of P are {0,4}). Let's first compute \(\lambda\):
$$\lambda=\dfrac{P'x  V0'.x}{V1'.x  V0'.x} = \dfrac{02}{0.42}= \dfrac{2}{2.4} = 0.833.$$If we now linearly interpolate V0 and V1 zcoordinate to find the P zcoordinate we get:
$$P.z = V0.z * (1\lambda) + V1.z * \lambda\ = 2 * 1.666 + 5 * 0.833 = 4.5.$$This is not the value we expect! Interpolating the original vertices zcoordinates, using P's "barycentric coordinates" or \(\lambda\) in this example, to find P zcoordinate doesn't work. Why? The reason is simple. Perspective projection preserves lines but does not preserve distances. It's quite easy to see in figure 4, that the ratio of the distance between V0 and P over the distance between V0 and V1 (0.666) is not the same as the ratio of the distance between V0' and P' over the distance between V0' and V1' (0.833). If \(\lambda\) was equal to 0.666 it would work fine, but here is the problem, it's equal to 0.833 instead! So, how do we find the zcoordinate of P?
The solution to the problem is to compute the inverse of the P zcoordinate by interpolating the inverse of the vertices V0 and V1 zcoordinates using \\(\\lambda\\). In other words, the solution is:
$$\dfrac{1}{P.z} = \color{purple}{\dfrac{1}{V0.z} * (1\lambda) + \dfrac{1}{V1.z} * \lambda}.$$Let's check that it works:
$$\dfrac{1}{P.z} = \dfrac{1}{V0.z} * (1\lambda) + \dfrac{1}{V1.z} * \lambda = \dfrac{1}{2} * (12/2.4)+ \dfrac{1}{5} * (2/2.4) = 0.25.$$If now take the inverse of this result, we get for P zcoordinate the value 4. Which is the correct result! As mentioned before, the solution is to linearly interpolate the vertex's zcoordinates using barycentric coordinates, and invert the resulting number to find the depth of P (its zcoordinate). In the case of our triangle, the formula is:
$$\dfrac{1}{P.z} = \dfrac{1}{V0.z} * \lambda_0 + \dfrac{1}{V1.z} * \lambda_1 + \dfrac{1}{V2.z} * \lambda_2.$$Let's now look into this problem more formally. Why do we need to interpolate the vertex's inverse zcoordinates? The formal explanation is a bit complicated and you can skip it if you want. Let's consider a line in camera space defined by two vertices whose coordinates are denoted \((X_0,Z_0)\) and \((X_1,Z_1)\). The projection of these vertices on the screen is denoted \(S_0\) and \(S_1\) respectively (in our example, we will assume that the distance between the camera origin and the canvas is 1 as shown in figure 5). Let's call S a point on the line defined by \(S_0\) and \(S_1\). S has a corresponding point P on the 2D line whose coordinates are (X,Z = 1) (we assume in this example that the screen or the vertical line on which the points are projected is 1 unit away from the coordinate system origin). Finally, the parameters \(t\) and \(q\) are defined such that:
$$ \begin{array}{l} P = P_0 * (1t) + P_1 * t,\\ S = S_0 * (1q) + S_1 * q.\\ \end{array} $$Which we can also write as:
$$ \begin{array}{l} P = P_0 + t * (P_1  P_0),\\ S = S_0 + q * (S_1  S_0).\\ \end{array} $$The (X,Z) coordinates of point P can thus be computed by interpolation (equation 1):
$$(X,Z) = (X_0 + t * (X_1  X_0), Z_0 + t * (Z_1  Z_0)).$$Similarly (equation 2):
$$S = S_0 + q * (S_1  S_0).$$S is a 1D point (it has been projected on the screen) thus it has no zcoordinate. S can also be computed as:
$$S = \dfrac{X}{Z}.$$Therefore:
$$Z = \dfrac{X}{S}.$$If we replace the numerator with equation 1 and the denominator with equation 2, then we get (equation 3):
$$Z = \dfrac{\color{red}{X_0} + t * (\color{green}{X_1}  \color{red}{X_0})}{S_0 + q * (S_1  S_0)}$$We also have:
$$\begin{array}{l} S_0 = \dfrac{X_0}{Z_0},\\ S_1 = \dfrac{X_1}{Z_1}. \end{array} $$Therefore (equation 4):
$$\begin{array}{l} \color{red}{X_0 = S_0 * Z_0},\\ \color{green}{X_1 = S_1 * Z_1}. \end{array} $$If now replace \(X_0\) and \(X_1\) in equation 3 with equation 4, we get (equation 5):
$$Z = \dfrac{\color{red}{S_0 * Z_0} + t * (\color{green}{S_1 * Z_1}  \color{red}{S_0 * Z_0})}{S_0 + q * (S_1  S_0)}$$Remember from equation 1 that (equation 6):
$$Z = Z_0 + t * (Z_1  Z_0).$$If we combine equations 5 and 6 we get:
$$Z_0 + t * (Z_1  Z_0) = \dfrac{\color{red}{S_0 * Z_0} + t * (\color{green}{S_1 * Z_1}  \color{red}{S_0 * Z_0})}{S_0 + q * (S_1  S_0)}.$$Which can be simplified to:
$$ \begin{array}{l} (Z_0 + t (Z_1  Z_0))(S_0 + q(S_1  S_0))=S_0Z_0 + t(S_1Z_1  S_0Z_0),\\ Z_0S_0 + Z_0q(S_1  S_0)+t(Z_1  Z_0)S_0+t (Z_1  Z_0)q(S_1  S_0)=S_0Z_0 + t (S_1Z_1  S_0Z_0),\\ t[(Z_1  Z_0)S_0 + (Z_1  Z_0)q(S_1  S_0) (S_1Z_1  S_0Z_0)] =qZ_0(S_1  S_0),\\ t[Z_1S_0  Z_0S_0 + (Z_1  Z_0)q(S_1  S_0)  S_1Z_1 + S_0Z_0] =qZ_0(S_1  S_0),\\ t(S_1  S_0)[Z_1  q(Z_1  Z_0)]=qZ_0(S_1  S_0),\\ t[qZ_0 +(1q)Z_1]=qZ_0. \end{array} $$We can now express the parameter \(t\) in terms of \(q\):
$$t=\dfrac{qZ_0}{qZ_0 +(1q)Z_1}.$$If we substitute for t in equation 6, we get:
$$ \begin{array}{l} Z &= Z_0 + t * (Z_1  Z_0) = Z_0 + \dfrac{qZ_0(Z_1  Z_0)}{qZ_0 +(1q)Z_1},\\ &= \dfrac{qZ_0^2 + (1q)Z_0Z_1 + qZ_0Z_1  qZ_0^2}{qZ_0 +(1q)Z_1},\\ &= \dfrac{Z_0Z_1}{qZ_0 +(1q)Z_1},\\ &= \dfrac{1}{\dfrac{q}{Z_1} + \dfrac{(1q)}{Z_0}},\\ &= \dfrac{1}{\dfrac{1}{Z_0} +q (\dfrac{1}{Z1}  \dfrac{1}{Z_0})}.\\ \end{array} $$And from there you can write:
$$\begin{array}{l} \dfrac{1}{Z} &= \dfrac{1}{Z_0} +q (\dfrac{1}{Z1}  \dfrac{1}{Z_0}) = \color{purple}{\dfrac{1}{Z_0}(1q) + \dfrac{1}{Z_1}q}. \end{array} $$Which is the formula we wanted to end up with.
You can use a different approach to explain the depth interpolation issue (but we prefer the one above). You can see the triangle (in 3D or camera space) lying on a plane. The plane equation is (equation 1):
$$AX + BY + CZ = D.$$We know that:
$$ \begin{array}{l} X_{screen} = \dfrac{X_{camera}}{Z_{camera}},\\ Y_{screen} = \dfrac{Y_{camera}}{Z_{camera}}.\\ \end{array} $$Thus:
$$ \begin{array}{l} X_{camera} = X_{screen}Z_{camera},\\ Y_{camera} = Y_{screen}Z_{camera}. \end{array} $$If we substitute these two equations in equation 1 and solve for \(Z_{camera}\), we get:
$$ \begin{array}{l} AX_{screen}Z_{camera} + BY_{screen}Z_{camera} + CZ_{camera} = D,\\ Z_{camera}(AX_{screen} + BY_{screen} + C) = D,\\ \dfrac{D}{Z_{camera}} = AX_{screen} + BY_{screen} + C,\\ \dfrac{1}{Z_{camera}} = \dfrac{A}{D}X_{screen} + \dfrac{B}{D}Y_{screen} + \dfrac{C}{D},\\ \dfrac{1}{Z_{camera}} = {A'}X_{screen} + {B'}Y_{screen} + {C'},\\ \end{array} $$With: \(A'=\dfrac{A}{D}\), \(B'=\dfrac{B}{D}\), \(C'=\dfrac{C}{D}\).
What this equation shows is that \(1/Z_{camera}\) is an affine function of \(X_{camera}\) and \(Y_{camera}\) which can be interpolated linearly across the surface of the projected triangle (the triangle in screen, NDC or raster space).
Other Visible Surface Algorithms
As mentioned in the introduction, the zbuffer algorithm belongs to the family of hidden surface removal or visible surface algorithms. These algorithms can be divided into two categories: the object space and image space algorithms. The "painter's" algorithm which we haven't talked about in this lesson belongs to the former, while the zbuffer algorithm belongs to the latter type. The concept behind the painter's algorithm is roughly to paint or draw objects from back to front. This technique requires objects to be sorted in depth. As explained earlier in this chapter, first objects are passed down to the renderer in arbitrary order, and then when two triangles intersect each other, it becomes difficult to figure out which one is in front of the other (thus deciding which one should be drawn first). This algorithm is not used anymore but the zbuffer is very common (GPUs use it).