**Contents**

You may have heard the term Quasi Monte Carlo or Quasi Monte Carlo Ray Tracing and wonder what it means (another one of these magic words people from the graphics community use all the time and each time your hear it you grind your teeth in frustration and think "damn if someone could just explain it to me in simple terms" - that's why we are here for). The math behind Quasi Monte Carlo can be quite complex, and it requires more than a chapter to study them seriously (entire Mathematics text books are devoted to the matter). However to give you a complete overview on Monte Carlo, we will try in this chapter to give you a quick introduction to this concept, and explain as usual, what it is, how and why it works.

We will revise this chapter, once all other lessons are completed. QMC (Quasi Monte Carlo) is a very interesting method and a lesson will be devoted to this topic in the advanced lesson at some point in time. Once we have completed the lesson on Light Transport Algorithms and Sampling, it will become easier to experiment with QMC and show with more concrete example why it is superior to basic MC.

## Introduction

Basic Monte Carlo works. We showed this in the previous chapters. It's a great technique, simple to implement and it can be used to solve many problems, especially in computer graphics. We also studied importance sampling in the previous chapter. It potentially reduces variance by increasing the probability of generating samples where the function being integrated is the most important. However both basic Monte Carlo and important sampling suffer from a problem known as **sample clumping**. When the samples are generated, some of them might be very close to each other. They form clumps. Our credit sample being limited we need to use it wisely to gather as much information as possible on the integrand. If two (or more) samples are close to each other, they each provide us with the same information on the function (more or less), and thus while one of them is useful the others are definitely wasted. Importance sampling doesn't save us from clumping. The PDF only controls the density of samples within any part of the integrand, not their numbers. An example of clumping is showed in figure 1.

When Monte Carlo is used, we know that samples are randomly chosen. On the other hand (and in contrast to using a Monte Carlo integration), you can use a deterministic quadratic technique such as the Riemann sum, in which the function is sampled at perfectly regular intervals (as showed in figure 2). So, on one hand we have perfectly regularly spaced samples (with the Riemann sum) and on the other, you have samples whose positions are completely random (with Monte Carlo integration) which potentially lead to clumping. So, as you may have guessed, you might be tempted to ask: "can we come up with something in between? Can we use some sort of slider to go from completely regularly spaced to completely random, and find some sort of sweet spot somewhere in the middle?" And the quick answer is yes: that's more or less what Quasi Monte Carlo is all about. Remember that the only reason why we don't use deterministic techniques such as the Riemann sum is because they don't scale up well: they become expensive as the dimension of the integral increases. Which is also why MC integration is attractive in comparaison.

In fact, nothing imposes us in a Monte Carlo integration to use stochastic samples. We could very well replace \(\xi\) with a series of evenly distributed samples in the MC integration, and the resulting estimate would still be valid. In other words, we can replace random samples by non random samples and the approximation still works. Note that when the points are perfectly evenly distributed then, the Monte Carlo integration and the Riemann sum are similar. If \({s_1, s_2, ..., s_n}\) is a sequence of evenly distributed numbers on the interval [a,b], then:

$$\lim_{N \to \infty} {(b-a) \over N} \sum_{i=1}^N f(s_i) = \int_a^b f(x) \: dx,$$where f is a Riemann-integrable function. Though in QMC the sequences commonly used are said to be **equidistributed**, a term we will explain later. But as an intuitive definition, we can say that equidistributed points are points which are more or less evenly distributed, where the measure of the difference between the actual point distribution compared to a squence of points perfectly well distributed is called in QMC jargon, **discrepancy**. In the world of QMC we won't really speak of variance but more of discrepancy. Again, these will be explained briefly in this chapter and in more details in the lesson on QMC in the advanced section. But in short, QMC blurs the lines between random and deterministic sequences. And in effect, the idea behind using these sequences of points, is to get as closely as possible to the ideal situation in which **points are well-distributed, and yet not uniformly distributed**. Figure 3 below illustrates this idea. The random sequence at the top shows some clumping and gaps (regions with no samples) and thus is not ideal. The sequence in the middle is purely regular and thus will cause aliasing. The sequence at the bottom is somewhere in between: while it exhibits some property of regularity, the points are not completely uniformly distributed. The idea of dividing the sample space into regular intervals and randomly placing a sample within each one of these sub-intervals is called jittering or stratified sampling.

Quasi Monte Carlo methods were first proposed in the 1950s about a decade after Monte Carlo methods (which makes sense because once people started to get familiar with Monte Carlo methods, they started to look into ways of improving them). QMC is used in many areas, engineering, finance and of course computer graphics (to just name a few), and even though few people can claim to be specialists of the subject, it certainly gained popularity over the last few years. Whether using QMC is better than traditional MC in rendering is often debated. It does converge faster than MC bit it has its own disatvantages. For example it can sometimes produce aliasing

Remember that in terms of visual artifact, deterministic techniques such as the Rieman sum produce **aliasing** while Monte Carlo integration produces **noise**. Visually noise is often considered as the lesser of two evils, compared to aliasing.

These visual artifact can easily be generated. Figure 4 shows the result of sampling a sine function with increasing frequency. When the sampling rate gets lower than the Nyquist frequency, noise or aliasing show up. A lesson of the advanced of the advanced section is devoted to aliasing in wich the concept of Nyquist limit is explained and we will explain aliasing from a mathematical point of view when we get to the topic of sampling and Fourier analysis (which we kept for the end).

## Stratified Sampling

Before we look into generating sequences of quasi random numbers, we will first talk about a method which is somewhere in between random and regular distributed samples. This method is called stratified sampling and was introduced to the graphics community by Robert L. Cook in a seminal paper entitled Stochastic Sampling in Computer Graphics (which he published in 1986 but the technique was developed at Pixar in 1983). This paper is quite fundamental to the field of computer graphics and will be studied in depth in the lesson on Sampling (in the basic section). In this paper Cook studied the quality of different sampling strategies notably by using Fourier analysis which is with Monte Carlo theory, another one of these really large topics which we better keep for the end. In his paper, Cook didn't call the method stratified sampling but **jittered sampling**, but nowadays it only seems to be known under the former name.

The concept of stratified sampling is very simple. It combines somehow the idea of regular and random sampling. The interval of integration is divided into N subintervals or cells (also often called **strata**), samples are placed in the middle of these subintervals but are jittered by some negative or positive random offset which can't be greater than half the width of a cell. In other words if h is the width of the cell: \(-h/2 \leq \xi \leq h/2\). Technically though we generally place the sample on the left boundary of the cell and jitter its position by \(h * \xi\). In other words, we simply place a random sample within each strata. The result can be seen a sum of N Monte Carlo integrals over sub-domains.

Extending this idea to 2D is straightforward. The domain of integration (often a unit square, such as pixel for instance) is divided into N * M strata and a random sample is placed within each cell. Again, this can easily be extended to higher dimensions. In the case of stratified sampling, variance reduces linearly with the number of samples (\(\sigma \propto 1 / N\)). In almost all cases, stratified sampling is superior to random sampling and should be preferentially used.

## Quasi Random Numbers and Low-Discrepancy Sequences

Quasi Monte Carlo integration relies on sequences of points with particular properties. As explained in the introduction, the goal is to generate sequences of samples which are not exactly uniformly distributed (uniformly distributed samples cause aliasing) and yet which appear to have some regularity in the way they are spaced. Mathematicians introduced the concept of **discrepancy** to measure the distribution of these points. Intuitively, discrepancy can be seen (interpreted) as a measure of how the samples deviate in a way from a regular distribution. Random sequences have a high discrepancy while a sequence of uniformly distributed samples would have a zero discrepancy. The sort of sequences we are after are the ones with the lowest possible discrepancy but not a zero discrepancy. There are sequences which minimize randomness (the deviation of the actual samples position from a uniform distribution should be a small as possible) but are not exactly uniform (i.e. we are obviously not interested in sequences with 0 discrepancy).

## Example: the Van der Corput Sequence

Algorithms for generating low discrepancy sequences were found in the 1950's (it will hopefully be clear to you how computer technology has influenced their design). Interestingly, these algorithms are often quite simple to implement (they often only take few lines of code) but the mathematics behind them can be quite complex. These sequences can be generated in different ways but for this introduction we will only consider a particular type of sequence called **Van der Corput sequence** (many over algorithms are constructed using this approach). Having one example to play with, should at least show you what these sequences look like (and help us to study some of their basic properties). To generate a Van der Corput sequence, you need to be familiar with the concept of radical and radical inverse which we will now explain (we will also explain why it's called radical inverse further down).

Every integer number can decomposed or redefined if you prefer in a **base** of your choice. For example our numerical system uses a **base ten**. Also recall that any number (integer) to the power of zero is one (\(10^0=1\)). So in base 10 for instance, the number 271 could be defined as: \(\color{red}{2}\times\color{green}{10^2} + \color{red}{7} \times \color{green}{10^1} + \color{red}{1} \times \color{green}{10^0}\). Mathematically we could write this sum as:

$$n = \sum_{i=1}^N \color{red}{d_i} \color{green}{b^{i-1}}.$$

where \(b\) is the **base** in which you want to define your number (in our example \(b= 10\)) and the \(d\)'s which are called **digits**, are in our example the numbers 2, 7 and 1. The same method can be used with a base two which as you know, is used by computers to encode numbers. The first bit (the one to the right of a byte) represents 1 (\(2^0\)).

The second bit is 2 (\(2^1\)).

The third bit is 4 (\(2^2\)).

The fourth bit is 8 (\(2^3\)), and so on.

If you combine these bits together you can create more numbers. For example number 3 can be encoded by turning on the first and second bit. Adding up the numbers they represent gives 3 (\(2^0 + 2^1 = 1 + 2 = 3\)).

This byte encodes the number 11:

(\(1 \times 2^0 + 1 \times 2^1 + 0 \times 2^2 + 1 \times 2^3 = 1 + 2 + 0 + 8 = 11\))

Which we can write mathematically as:

$$n = \sum_{i=1}^{8} d_i 2^{i-1}.$$

where \(d_i\) in base 2 can either be 0 or 1. With this in hand, we can now decompose any integer into a series of digits in any given base (we call this, a digit expansion in base b). Now note that the number we can recreate with this sequence is a positive integer. In other words, you can see this number as being a series of 0 and 1 on the left inside of a "imaginary" **decimal point**.

The idea behind the Van der Corput sequence, is to mirror these bits around this imaginary decimal point. For example our number 11, becomes:

That can be done by making a slight change to the equation which we used for computing n:

$$ \begin{equation} \tag{equation 1} \phi_b(n) = \sum_{i=1}^N { d_{i-1} \over b^i }. \end{equation} $$which is equivalent to:

$$\phi_b(n) = {d_0 \over {2^1}} + {d_1 \over {2^2}} ... + {d_{N-1}\over {2^N}}.$$As you can see, instead of multiplying the digit \(d_i\) by \(b^{i-1}\), we divide it instead by \(b^{i} \) (we take its inverse). We call this function the **radical inverse** function \(\phi_b(n)\) in base b and it computes a one-dimensional Van der Corput sequence (why it's called "radical inverse" is explained further down in a note). The coefficients \(d_i \) are given as explained, by the digit expansion in base b of n:

Keep in mind that:

$${1 \over b^n} = 1 \times b^{-n}.$$If n = 1 for example, \(\phi_2(1) = 1 \times 2-{1} = 1 \times 0.5 = 0.5\) (in base 2). If n = 3, \(\phi_2(3) = 1 \times 2^{-1} + 1 \times 2^{-2} = 0.75\). The following table shows the first few numbers of the the Van der Corput sequence (in base 2).

\(i\) decimal |
\(i\) binary |
\(\phi_2(i)\) binary |
\(\phi_2(i)\) decimal |
---|---|---|---|

0 |
0000.0 |
0.0000 |
0.0 |

1 |
0001.0 |
0.1000 |
0.5 |

2 |
0010.0 |
0.0100 |
0.25 |

3 |
0011.0 |
0.1100 |
0.75 |

4 |
0100.0 |
0.0010 |
0.125 |

5 |
0101.0 |
0.1010 |
0.625 |

... |
|||

11 |
1011.0 |
0.1101 |
0.8125 |

... |

Implementing this algorithm is actually quite simple. The first task is to find a way of computing the digit expansion. The function we need is the modulus operator: num % 2 computes the remainder when num (an integer) is divided by 2. So for example 1 % 2 = 0, 2 % 2 = 0, 3 % 2 = 1, 4 % 2 = 0, and so on. So in essence num % 2 tells us whether the current value of n is a multiple of the base or not. In a binary integer, the only bit that is making the number odd or even is the right-most bit (the least significant bit which has the bit position 0). So if number is even (n % 2 = 0), this last bit should be 0.

The next step is to start adding up the result of the digit divided by the base raised to the power of the digit position plus one (as in equation 1).

The next step is to compute the next digit in the expansion. To do so, we will simply divide n by the base. For example if n = 11, then n / 2 = 5. Keep in mind we work with integers. Why are we doing this? Let's take an example. When the last three bits of a binary integer are set to 1 (and all the others to 0), we get the integer 7. Our test 7 % 2 gives 1, which set our first bit to 1. Now that we know the value for the right-most bit, we can as well remove it from the number by shifting it to the right. This gives us a new digit number and since we know that each bit in a digit number is twice the number represented by the bit on its left, shifting the least significant bit to the right (which is another way to say that we remove it) corresponds to dividing the number by 2. For example if we remove the right-most bit in 111 (the integer 7) we get 11 which is the integer 3 which is also, as you can see, 7 divided by 2. This idea is illustrated in figure 6.

As you can see with figure 6, by successively dividing n by 2, we will be able to test if the next digit in the expansion should be set to 1 or 0 by just looking at the result of n % 2. Let's take another example with n = 11:

n = |
n % 2 = |
n / 2 = |
n binary |
\(d_i\) |
---|---|---|---|---|

11 |
1 |
5 |
00001011 |
\(d_0 = 1\) |

5 |
1 |
2 |
x0000101 |
\(d_1 = 1\) |

2 |
0 |
1 |
xx000010 |
\(d_2 = 0\) |

1 |
1 |
0 |
xxx00001 |
\(d_3 = 1\) |

0 |
- |
- |
- |
- |

Finally here is the complete implementation of the algorithm:

**Radical Inverse**"? In mathematics, a radical function is a function which has a variable inside a \(\sqrt{ } \). However, radical in the context of the Van der Corput algorithm has nothing to do with the square root sign. Rather, it's the adjective from

**radix**, which is another word for base, as in base-10, base-2, and so on. You're taking the expansion to base b - the expansion with radix b - and writing it backwards - "inverting" it.

The Van der Corput sequence has been extended to multi-dimensions by Halton (1960), Sobol (1967), Faure (1982) and Niederreiter (1987). For more information, check the lesson on Sampling in the advanced section. The **Halton sequence** uses different Van der Corput sequences in different bases, where the base is a prime number (2, 3, 5, 7, 11, etc.). For example if you want to generate 2D points, you can assign a Van der Corput sequence in base 2 to the x coordinates of the points, and another sequence in base 3 to the point's y coordinate.

The **Hammersley sequence** uses a Van der Corput sequence in base 2 for one of the coordinates and just assign \(n / N\) to the other (assuming your point is a 2D point).

The following code creates an Halton sequence for generating 2D points:

The figure above shows a plot of points from a 2D Hammersely sequence for different number of samples (N = 16, 24 and 32). Note that the point (0,0) in red, is always in the sequence and that points are nicely distributed for all values of N. The code used to generated the Van der Corput sequence can be optimized using bit operations (for instance dividing a integer by 2 can be done with a bit shift operation). Because the sequence is called with incrementing values of n, results of previous calculation could also be re-used (which could also speed up the generation of the samples). The next revision of this lesson will include an optimized version of this function.

## Quasi Monte Carlo

The idea behind quasi Monte Carlo is obviously to use quasi random sequences in the generation of samples rather than random samples. These sequences are very often used to generate pixel samples. In this lesson and the lesson Introduction to Shading and Radiometry, we showed how Monte Carlo integration could be used to approximate radiance passing through a given pixel (figure 7). Random sampling is of course the easiest (at least the most basic) way of generating these samples. We can also use stratified sampling (which is better than random sampling) or of course quasi random sequences (like the Halton or the Hammersley sequence which we talked about in this chapter). When low-discrepancy sequences are used, we speak of **Quasi Monte Carlo ray tracing** (as opposed to Monte Carlo ray tracing which is using random or stratified sampling). Note that computing the radiance of a pixel is only one example of Monte Carlo integration in which low-discrepancy sequences can be used. Halton, Hammersley or any other such sequences are also commonly used in shading (this will be explained in the next lessons).

## Pros and Cons of Stratified Sampling & Quasi Monte Carlo

QMC has a faster rate of convergence than random sampling and this is certainly its main advantages from a rendering point of view (we will explain mathematically why in a future revision of this lesson). Images produced with low-discrepancy sequences, have generally less noise and have a better visual quality than images produced with random sampling (for the sample number of samples). With stratified sampling it is not always possible to efficiently generate samples for arbitrary values of N (try to generate 7 2D samples for instance using stratified sampling) while with low-discrepancy sequences, this is not a problem. Furthermore, note that if you want to change the value of N, you can do so without losing the previously calculated samples. N can be increased while all samples from the earlier computation can still be used. This is particularly useful in adaptive algorithms in which typically a low number of samples is used as a starting point and more samples are generated on the fly in areas of the frame where variance is high.

Cons: as mentioned earlier in this chapter, QMC can suffer from aliasing and more over, because low-discrepancy have a periodic structure, they can generate a discernible pattern (figure 8). This is particularly hard to avoid as sequences are potentially re-used from one pixel to another. Getting rid of the structure can require additional work (such as rotating the samples, etc.). This potentially makes a robust implementation of LDS in a renderer more complex than the stratified sampling approach which is really simple to implement. QMC has been been a quite active field of research since the 2000's. While low discrepancy sequences are now considered as a viable alternative to other sampling methods, there is still an active debate in the graphics community as whether they are superior to other methods and really worth the pain. A. Kensler for instance, has recently published a new technique for generating stratified samples which is competitive with low discrepancy quasi-Monte Carlo sequences (see reference below).

Generally, generating (good) random sequences is a complex art and studying their properties involved complex techniques (such as Fourier analysis). A chapter can only give a very superficial overview of these methods. You will find a more thorough treatment of the subject matter in this lesson on sampling which you can find in the advanced section.

## A Final World

This conclude our introduction to Monte Carlo methods. These lessons hopefully gave you a good understanding of what Monte Carlo methods are, how and why they work. While we spoke about Monte Carlo ray tracing a few times in the last few lessons, you may still want to see how all the things we learned so far apply to rendering and ray tracing in particular. This is the topic of our next lesson.

## Reference

*Stochastic Sampling in Computer Graphics*, Robert L. Cook, Siggraph 1986.

*Efficient Multidimensional Sampling*, T. Kollig and A. Keller, 2002.

*Correlated Multi-Jittered Sampling*, A. Kensler, 2013.