**Contents**

## Generating (Pseudo-)Random Numbers on a Computer

This chapter will be short. We don't intend, at least in this version of the lesson, to talk about the topic of random number generator in very much detail. However, because Monte Carlo methods rely mostly on being able to generate random numbers (often with a given PDF), it is really important to mention that having a good random number generator is important to guarantee the quality of the output of Monte Carlo method. We will explain what a good random number generator is in a moment, and explain briefly how these generators work.

The fact is, generating random numbers is a prerequisite to using any of the Monte Carlo methods in a computer program. So how do you generate random numbers? True random numbers (true as in **unpredictable**) can actually be generated with a computer! Imagine for instance a device that would measure a natural phenomenon which is itself random (such as the temperature of your cpu or the number of times you typed the letter 'a' on your keyboard in the last 10 seconds) and convert this measure to a digit that the computer can use as a random number. As random as these natural phenomena truly are, it is not hard to understand why such a device would be ideal for providing random numbers to a program. Beside being too slow for a computer, the main problem of such a system is that the random values that we get from monitoring these "real world" random variables, doesn't necessarily map properly to the type of random variable a computer program needs. As the Russian mathematician Ilya M. Sobol puts it:

Furthermore, the quality of the numbers produced by such device would be hard to check. And maybe most importantly, it may be advantageous to "lock" the sequence of generated random numbers, so that each time the program is run, the same numbers are produced in the same order. You might think, well that's not really random then? You are right they are in fact **deterministic** which sounds really like the opposite of random, which is why we call such system **pseudorandom** generator rather than random. They produce numbers which are still randomly distributed, but their values and the order in which they are produced is always the same. In rendering, this property is very important. If you use Monte Carlo methods to produce an image, a "repeatable" random number generator (it would pseudorandom then), allows to lock the noise pattern in the image. "Repeatability" of the results in a production environment, is an absolute necessity. Thus, any random number generator that relies on a physical device (and they do exist, we will talk about them later in this chapter), however probably "truly random", is actually not always ideal. To summarize "controllable and repeatable randomness" is generally what we are truly after.

Generally, you can use three different approach to get random numbers in a program. You can use pre-generated sequences of random numbers stored in tables (an old but very easy to implement method if one day you have to quickly write something of your own), you can use a (true-)random number generator or a pseudorandom number generator.

## Tables of Random Numbers

The principle is very simple and is similar to the way lottery numbers are generated. If you write each number from 0 to 99 (say) on a card, put all the cards in a hat shake it and draw all the cards from this hat one by one, you end up with a sequence of 100 random numbers. To use these numbers in a program, you can store them in memory (in the same order in which they were picked from the hat) and create a global variable to keep track the next random number that can be used from that table. This is simple, of course, but there is two difficulties with this approach. The sequence of number needs to be truly random (you don't shake the hat very well, and many of the numbers end up in increasing order). And it is hard to guarantee that the physical device used in generating these numbers, produces sequences with the desired probability distribution (you can use statistical tests to check whether your sequence of random numbers has some desired properties such for instance whether the numbers are truly independent and identically distributed or i.i.d.). Furthermore, the sequence of random numbers is limited in size.

In 1955, Rand Corporation released a sequence of 1 million random digits. It is the longest sequence of random numbers ever published (look for "A MILLION Random Digits WITH 100,000 Normal Deviates", on the Rand Corporation website).

## (True-)Random Number Generator

As its name suggests, a random number generator produces truly random numbers (as in "you will never know what you will get" or in more formal terms, the results are **unpredictable**). These are generally produced by physical devices also known as noise generator which are coupled with a computer. In computing, an apparatus that produces random numbers from a physical process is called a hardware random number generator or TRNG (for true random number generator).

As mentioned before, if "true" randomness is required, these generators are good but they present some difficulties of their own. First as mentioned, noise is generated by a physical device (such as a vacuum diode) and the resulting analog signal needs to be converted somehow to a sequence of digits (a sequence of 0 and 1 which can then be interpreted as digits). This is not necessarily a trivial tasks. Plus, these generators don't necessary produce random numbers as quickly as the program needs them (or put it differently, it is hard to synchronize the calls of the program to the random number generator with the output of the hardware noise generator). They can also suffer from hardware failures and the quality of the sequence they produce is hard to check (they can suffer from bias, which happens when they statistically produce more 0s than 1s).

## (Pseudo-)random Number Generator

As long as the random numbers produced by a system can be checked by some tests that proof that they have the same "statistical" properties than a truly random variable (such as being uniformly distributed for example), the mean by which these numbers are generated doesn't really matter. Computers are deterministic in nature so producing truly random numbers with a computer is challenging as we mentioned before, which is why we generally resort to using noise generators (which we talked about earlier on) if we need "true" randomness. However, what we can do on a computer, is develop some sort of algorithm for generating a sequence of numbers that approximates the properties of random numbers. When numbers are produced by some sort of algorithm or formula that simulates the values of a random variable X, they are called pseudorandom numbers. And the algorithm is called a **pseudorandom number generator** (or **PRNG**). The term "simulate" here is important: it simply means that the algorithm can generate sequences of numbers which have statistical properties that are similar (and this can be tested) to that of the random variable we want to simulate. For instance if we need to simulate a random variable X with probability distribution D, then we will need to test whether the sequence of numbers produced by our PRNG has the same distribution.

Before it gets too abstract, let's give a practical example of such generator. One of the first PRNG algorithms was actually proposed by John von Neumann himself. It is called the **middle-square method**. This method is presented as an example but don't use it in production. The idea is very simple. We start with a four digit number such as \(n_0 = 0.4872\). If we square this number we get the eight-digit number 0.69**7997**13. If we now take out the middle four digits of this number, we get \(n_1 = 0.7997\). If we square this number we get 0.89**4259**47. Once again we take out the middle four digits to find the next number in our sequence: \(n_2 = 0.4259\). If we keep repeating this process again and again, we get the following sequence of numbers: \(n_0\) = 0.4872, \(n_1\) = 0.7997, \(n_2\) = 0.4259, \(n_3\) = 0.2610, \(n_4\) = 0.0881, \(n_5\) = 0.6816, etc. The following small program is a quick implementation of this algorithm (it only compute the first 10 "random numbers" of the sequence).

Again, this algorithm is too limited to be actually really useful. For example the program doesn't work if the seed is 0. Better algorithms exist and we will briefly talk about them in a moment. But for now, let's discuss the pros and cons of PRNG. In computing, at least for people writing programs using Monte Carlo methods, pseudorandom generators are generally the preferred choice. First a software solution is generally better than a hardware one, because it's more reliable, easier to integrate to other programs, and because they can be programmed, they can generate sequences of random numbers in a predictable manner (and the statistical properties of these pseudorandom numbers are also predictable. They need to be checked only once which is not the case of numbers produced by noise generators). In other words they are deterministic, which as we explained before, is useful to "lock" the output of a process based on a random number generator (which is the case of Monte Carlo ray tracing). PRNG algorithms are also generally pretty fast and fit in just a few lines of code. Note that all these algorithms can be seeded. In other words, you can intialize them with a number, and the sequence of resulting numbers depends on this seed.

The main disadvantage is that the sequences produced by these algorithm are **periodic**. Why? Because each number depends on the previous number in the sequence. We can write this as:

The precision of any number in the memory of a computer is limited by the number of bytes used to store this number. If the number of bytes is fixed, only a finite number of "numbers" can be generated. Therefore, it is expected that sooner or later, one of the generated number (let's call this number \(x_L\)) will coincide with one of the numbers already generated in the sequence (let's call this number \(x_N\)). The period of the sequence would then be L - N (the period is the number of values in the sequence before the sequence repeats itself). Should we care about this? The period of the C library function drand48() for example is \(2^{48}\ = 281474976710656\). This is a very big number, and some of the more recent generators have even longer periods so unless the number of random numbers generated by your program exceeds this period (which is unlikely to happen), you shouldn't probably care.

## The C++11 Random Library

It is quite useful and important to know the function from the C/C++ libraries you can use to generate pseudorandom numbers. For a long time you could only use functions such as drand48(). It produces (almost) uniformly distributed random numbers in the interval [0,1]. If you wanted to generated random numbers with any other distribution you had to write you own!

Hopefully, the latest C++ standard (C++11), now comes with a random library. It provides a pseudorandom generator which is made of two parts. The first part is called the generator engine. It generates the numbers (you can choose among three different algorithms for generating these numbers) and the second part controls the distribution of the outcome (uniform, poisson, normal, etc.).