Rasterization: a Practical Implementation

News (August, 31): We are working on Scratchapixel 3.0 at the moment (current version of 2). The idea is to make the project open source by storing the content of the website on GitHub as Markdown files. In practice, that means you and the rest of the community will be able to edit the content of the pages if you want to contribute (typos and bug fixes, rewording sentences). You will also be able to contribute by translating pages to different languages if you want to. Then when we publish the site we will translate the Markdown files to HTML. That means new design as well.

That's what we are busy with right now and why there won't be a lot of updates in the weeks to come. More news about SaP 3.0 soon.

We are looking for native Engxish (yes we know there's a typo here) speakers that will be willing to readproof a few lessons. If you are interested please get in touch on Discord, in the #scratchapixel3-0 channel. Also looking for at least one experienced full dev stack dev that would be willing to give us a hand with the next design.

Feel free to send us your requests, suggestions, etc. (on Discord) to help us improve the website.

And you can also donate). Donations go directly back into the development of the project. The more donation we get the more content you will get and the quicker we will be able to deliver it to you.

21 mns read.

Improving the Rasterization Algorithm

Figure 1: jagged edges pixel artefacts can be reduced using anti-aliasing.

Figure 2: when one sample is used the triangle is missed. But by using sub-pixels, we can detect that the pixel overlaps the triangle at least partially. The pixel color is equal to the sum of the sub-pixels colors divided by the total number of sub-pixels or samples (in this example, 16 samples).

Figure 3: pixels can't properly capture the shape of continuous surfaces.

Figure 4: using sub-pixels to improve fight aliasing.

Figure 5: anti-aliasing helps with smoothing jagged edges.

Figure 6: 1 sample per pixels or 1 pps (top) vs. 4 samples per pixels (bottom).

Figure 7: if the 4 corners of an 8x8 pixel grid overlap the triangle then all remaining pixels of the grids cover the triangle too.

Aliasing and Anti-Aliasing

All the techniques we presented in the previous chapters are the foundation of the rasterization algorithm. Though, we have only implemented these techniques in a very basic way. The GPU rendering pipeline and other rasterization-based production renderers use the same concepts but they used highly optimized versions of these algorithms. Presenting all the different tricks that are used to speed up the algorithm goes way beyond the scope of an introduction. We will just review some of them quickly now but we plan to devote a lesson to this topic in the future.

First, let's consider one basic problem with 3D rendering. If you zoom up on the image of the triangle that we rendered in the previous chapter, you will notice that the edges of the triangle are not regular (in fact, it is not specific to the edge of the triangle, you can also see that the checkerboard pattern is irregular on the edge of the squares). The steps that you can easily see in figure 1, are called jaggies. These jagged edges or stair-stepped edges (depending on how you prefer to call them) are not an artifact. This is just the result of the fact that the triangle is broken down into pixels. What we do with the rasterization process is break down a continuous surface (the triangle) into discrete elements (the pixels), a process that we already mentioned in the introduction to rendering. The problem is similar to trying to represent a continuous curve or surface with Lego bricks. You simply can't, you will always see the bricks (figure 2). The solution to this problem in rendering is called anti-aliasing (also denoted AA). Rather than rendering only 1 sample per pixel (check if the pixel overlaps the triangle by testing if the point in the center of the pixel covers the triangles), we split the pixel into sub-pixels and repeat the coverage test for each sub-pixels. Of course, each sub-pixel is nothing else than another brick thus this doesn't solve the problem entirely. Nonetheless, it allows for capturing the edges of the objects with slightly more precision. Pixels are most of the time divided into an N by N number of sub-pixels where N is generally a power of 2 (2, 4, 8, etc.), though it can technically take on any value greater or equal to 1 (1, 2, 3, 4, 5, etc.). There are different ways of addressing this aliasing issue. The method we described belongs to the category of sampling-based anti-aliasing methods.

The pixel final color is computed as the sum of all the sub-pixels color divided by the total number of sub-pixels. Let's take an example (as with pixels, if a sub-pixel or sample covers a triangle, it then takes on the color of that triangle, otherwise it takes on the background color which is usually black). Imagine that the triangle is white. If only 2 or the 4 samples overlap the triangle, the final pixel color will be equal to (0+0+1+1)/4=0.5. The pixel won't be completely white, but it won't be completely black either. Thus rather than having a "binary" transition between the edge of the triangle and the background, that transition is more gradual, which reduces the stair-stepped pixels artifact. This is what we call anti-aliasing. To understand anti-aliasing you need to study signal processing theory, which again is a very large and pretty complex topic on its own.

The reason why it is best to choose N as a power of 2, is because most processors these days, can run several instructions in parallel and the number of instructions run in parallel is also generally a power of 2. You can look on the Web for things such as SSE instruction sets which are specific to CPUs, but GPUs use the same concept. SSE is a feature that is available on most modern CPUs and that can be used to run generally 4 or 8 floating point calculations at the same (in one cycle). All this means is that, for the price of 1 floating point operation, you get 3 or 7 for free. This can in theory speed up your rendering time by 4 or 8 (you can never reach that level of performance though because you need to pay a small penalty for setting these instructions up). You can use SSE instructions for example to render 2x2 sup-pixels for the cost of computing 1 pixel, and as a result, you get smoother edges for "free* (the stair-stepped edges are less visible).

Rendering Blocks of Pixels

Another common technique to accelerate rasterization is to render blocks of pixels, but rather than testing all pixels contained in the block, we first start to test pixels at the corners of the block. GPU algorithms can use blocks of 8x8 pixels. In fact, the technique used is more elaborate and is based on a concept of tiles, but we won't detail it here. If all four corners of that 8x8 grid cover the triangle, then necessarily, the other pixels of the grid also cover the rectangle (as shown in figure 7). In that case, there is no need to test all the other pixels which saves a lot of time. They can just be filled up with the triangle's colors. If vertex attributes need to be interpolated across the pixels block, this is also straightforward because if you have computed them at the block's corners, then all you need to do is linearly interpolate them in both directions (horizontally and vertically). This optimization only works when triangles are close to the screen and thus large in screen space. Small triangles don't benefit from this technique.

Optimizing the Edge Function

The edge function too can be optimized. Let's have a look at the function implementation again:

int orient2d(const Point2D& a, const Point2D& b, const Point2D& c) { return (b.x-a.x)*(c.y-a.y) - (b.y-a.y)*(c.x-a.x); }

Recall that a and b in this function are the triangle vertices and that c is the pixel coordinates (in raster space). One interesting thing to note is that this function is going to be called for each pixel contained in the triangle bounding box. Though while we iterate over multiple pixels only c changes. The variable a and b stay the same. Suppose we evaluate the equation one time and get a result w0:

w0 = (b.x-a.x)*(c.y-a.y) - (b.y-a.y)*(c.x-a.x);

Then c.x gets incremented by an amount s (the per pixel step). The new value of w0 is going to be:

w0_new = (b.x-a.x)*(c.y-a.y) - (b.y-a.y)*(c.x+s-a.x);

Subtracting the first equation from the second, we get:

w0_new - w0 = -(b.y-a.y)*s;

-(b.y-a.y)*s is a constant value for a given triangle, because s is the same amount each time (one pixel), and a and b as already mentioned are constant too. We can calculate it once and store it in a variable (call it w0_step) and then the calculation reduces to:

w0_new = w0 + w0step;

You can do this for w1 and w2, and also do a similar thing for the c.y step.

The edge function uses 2 mults and 5 subs but with this trick, it can be reduced to a simple addition (of course you need to compute a few initial values). This technique is well documented on the internet. We won't be using it in this lesson but we will study it in more detail and implement it in another lesson devoted to advanced rasterization techniques.

Fixed Point Coordinates

Figure 8: fixed point coordinates.

Finally and to conclude this section, we will briefly talk about the technique that consists of converting vertex coordinates which are initially defined in floating-point format to fixed-point format just before the rasterization stage. Fixed-point is the fancy word (in fact the correct technical term) for integer. When vertex coordinates are converted from NDC to raster space, they are then also converted from floating point numbers to fixed point numbers. Why do we do that? There is no easy and quick answer to this question. But to be short, let's just say that GPUs use fixed point arithmetic because from a computing point of view, manipulating integers is easier and faster than manipulating floats or doubles (it only requires logical bit operations). Again this is just a very generic explanation. The conversion from floating point to integer coordinates and how is the rasterization process implemented using integer coordinates is a large and complex topic that is not documented on the Internet (you will find very little information about this topic which is very strange considering that this very process is central to the way modern GPUs work).

The conversion step involves rounding off the vertex coordinates to the nearest integer. Though if you only do so, then you sort of snap the vertex coordinates to the nearest pixel corner coordinates. This is not so much an issue when you render a still image, but it creates visual artifacts with animation (vertices are snapped to different pixels from frame to frame). The workaround is to convert the number to the smallest integer value but also reserve some bits to encode the sub-pixel position of the vertex (the fractional part of the vertex position). GPUs typically use 4 bits to encode sub-pixel precision (you can search graphics APIs documentation for the term sub-pixel precision). In other words, on a 32 bits integer, 1-bit might is used to encode the number's sign, 27 bits are used to encode the vertex integer position, and 4 bits are used to encode the fractional position of the vertex within the pixel. This means that the vertex position is "snapped" to the closest corner of a 16x16 sub-pixel grid as shown in figure 8 (with 4 bits, you can represent any integer number in the range [1:15]). Somehow the vertex position is still snapped to some grid corner, but snapping, in this case, is less of a problem than when the vertex is snapped to the pixel coordinates. This conversion process leads to many other issues one of which is integer overflow (overflow occurs when the result of an arithmetic operation produces a number that is greater than the number that can be encoded with the number of available bits). This may happen because integers cover a smaller range of values than floats. Things gets also sort of complex when anti-aliasing is thrown into the mix. It would take a lesson on its own to explore the topic in detail.

Fixed points coordinates allow to speed up the rasterization process and the edge function even more. This is one of the reasons for converting vertex coordinates to integers. These optimization techniques will be presented in another lesson.

Notes Regarding our Implementation of the Rasterization Algorithm

Finally, we are going to quickly review the code provided in the source code chapter. Here is a description of its main components:

Use this code for learning purposes only. It is not efficient at all. Many simple optimizations could be made but we wrote it for clarity, not efficiency. Our goal is not to write production code but code that helps to learn how things work in principle. Optimizations are left to you but trying to improve the code could be a good exercise.
The object that is being rendered by the program is stored in an include file. This is fine for this small program but this would never be done in a professional application. Geometry files can be very large and including this data in the program means that the size of the program will become very large as well (which is not a good thing, plus compile time is going to increase a lot). Though again for this simple test and because the object we are using is quite light, this is not an issue. Though more generally if you are unsure about the way we represent the geometry of a 3D object in a program, check the lesson on this topic in the Modeling and Geometry section. The only information we will be using in this program are the triangle vertices' positions (in world space) and their texture or st coordinates.

Here is the result the program should produce. On the left, you can see a render of the object in Maya. On the right, our render. As you can see the results are identical in terms of objects' position and shading. Maya uses a technique called stochastic sampling that has for effect to make the aliasing artifact slightly less visible than in our render, but the difference is subtle.

As you can see, there is nothing magic about rendering. When you know the rules, you can reproduce images produced by professional applications.

As a bonus, we exported the position in world space of the point on the triangle that each pixel in the image overlaps. We then displayed all the points in a 3D viewer. You can see the results below. Not surprisingly, points only show up on parts of the object that are directly visible to the camera. This shows that the depth-buffer technique works as expected. Every triangle on the back of the object that is hidden by another triangle is not rendered. The second image shows a close-up of the same point set (right).

Conclusion

The main advantage of rasterization is simplicity and speed. The main drawback is that it is only useful to solve the visibility problem. Remember that rendering involves two steps: visibility and shading. The algorithm is of no use when it comes to shading.

In this lesson, we have told you everything you needed to know about the rasterization algorithm. Everything else you can learn from there is not so much about the technique itself but more about optimizing it (how to run it in parallel, how to make it efficient, how to run it on the GPU, etc.).

We hope to have provided one of the best introductions to the rasterization technique. If you appreciated this work and learned something while reading this lesson, please consider donating.

Exercises

Reference

A Parallel Algorithm for Polygon Rasterization. Juan Pineda. Siggraph 1988.

A Subdivision Algorithm for Computer Display of Curved Surfaces. Edwin Earl Catmull. Thesis 1974.

Fundamentals of Texture Mapping and Image Wrapping. Paul S. Heckbert. Thesis 1989.