**Contents**

**News (August, 31)**: We are working on Scratchapixel 3.0 at the moment (current version of 2). The idea is to make the project open source by storing the content of the website on GitHub as Markdown files. In practice, that means you and the rest of the community will be able to edit the content of the pages if you want to contribute (typos and bug fixes, rewording sentences). You will also be able to contribute by translating pages to different languages if you want to. Then when we publish the site we will translate the Markdown files to HTML. That means new design as well.

That's what we are busy with right now and why there won't be a lot of updates in the weeks to come. More news about SaP 3.0 soon.

We are looking for native Engxish (yes we know there's a typo here) speakers that will be willing to readproof a few lessons. If you are interested please get in touch on Discord, in the #scratchapixel3-0 channel. Also looking for at least one experienced full dev stack dev that would be willing to give us a hand with the next design.

Feel free to send us your requests, suggestions, etc. (on Discord) to help us improve the website.

And you can also donate). Donations go directly back into the development of the project. The more donation we get the more content you will get and the quicker we will be able to deliver it to you.

Our next step is to develop a virtual camera working on the same principle as a pinhole camera. More precisely, our goal is to create a camera model delivering images similar to those produced by a real pinhole camera. If we take a picture of a given object with a pinhole camera, then when a 3D replica of that object is rendered with our virtual camera, the size and shape of the object in the CG render must match exactly the size and shape of the real object in the photograph. But before we start looking into the model itself, it is important to learn a few more things about computer graphics camera models.

First the details:

- CG cameras have a near and far clipping plane. Objects closer than the near clipping plane or farther than the far clipping plane are invisible to the camera. They can be used to exclude some of a scene's geometry and render only certain portions of the scene. They are necessary for rasterization to work.
- In this chapter, we will also see why in CG the image plane is positioned in front of the camera's aperture rather than behind as with real pinhole cameras. This plays an important role in the way cameras are conventionally defined in CG.
- Finally, we need to look into how we can render a scene from any given viewpoint. This is something we already talked about in the previous lesson, but this point will be covered again briefly in this chapter.

The really important question we haven't looked into yet (asked and answered), is, "studying real cameras to understand how they work is great, but how is the camera model being used in the end to produce images?". We will show in this chapter, that the answer to this question depends on whether we use the rasterization or ray-tracing rendering technique.

In this chapter, we won't get into developing a virtual camera model just yet. We are first going to review the points listed above one by one to give a complete "picture" of the way cameras work in CG. The model itself will be introduced and implemented in a program in the next (and final) chapter of this lesson.

## How Do We Represent Cameras in the CG World?

Photographs produced by real-world pinhole cameras are upside down. This is happening because, as explained in the first chapter, the film plane is located behind the center of projection, however, this can be avoided if the projection plane lies on the same side as the scene as shown in figure 1. In the real world, the image plane can't be located in front of the aperture, but in the virtual world of computers, constructing our camera that way is not a problem at all. Conceptually, by construction, this leads to seeing the hole of the camera (which is also the center of projection) as the actual position of the eye, and the image plane, the image that the eye is looking at.

Defining our virtual camera that way, is showing us more clearly how constructing an image by following light rays from wherever point in the scene they are emitted from, to the eye, turns out to be a simple geometrical problem which was given the name of (as you know it now) **perspective projection**. Perspective projection is a method for building an image through this apparatus which is a sort of pyramid whose apex is aligned with the eye, and whose base defines the surface of a canvas on which the image of the 3D scene is "projected" onto.

## Near and Far Clipping Planes and the Viewing Frustum

The **near** and **far clipping** plane are virtual planes located in front of the camera and parallel to the image plane (the plane in which the image is contained). The location of each clipping plane is measured along the camera's line of sight (the camera's local z-axis). They are used in most virtual camera models and have no equivalent in the real world. Objects closer than the near clipping plane or farther than the far clipping plane are invisible to the camera. Scanline renderers using the z-buffer algorithm such as OpenGL, need these clipping planes to control the range of depth values over which the objects' depth coordinates are remapped when points from the scene are projected onto the image plane (and this is their primary if only function). Without getting into too many details, let's just say that adjusting the near and far clipping planes can also help to resolve precision issues that arise with this type of renderers. More information on this problem known as z-fighting can be found in the next lesson. In ray tracing, clipping planes are not required by the algorithm to work and are generally not used.

## The Near Clipping Plane and the Image Plane

The canvas (also termed screen in other CG books), is the 2D surface (a bounded region of the image plane) on which the image of the scene is projected onto. In the previous lesson, by convention, we placed the canvas 1 unit away from the eye, however, the position of the canvas along the camera's local z-axis doesn't matter. We only made that choice because it simplified the equations for computing the point's projected coordinates, but, as you can see in figure 3, the projection of the geometry onto the canvas produces the same image regardless of its position, thus you are not required to keep the distance from the eye to the canvas equal to 1. We also know that the viewing frustum is a truncated pyramid (the base of the pyramid is defined by the far clipping plane and the top of the pyramid is cut off by the near clipping plane). This volume defines the part of the scene that is visible to the camera. A very common way of projecting points onto the canvas in CG, is to remap points contained within the volume defined by the viewing frustum to the **unit cube** (a cube of side length 1). This technique is central to the development of the perspective projection matrix which is the topic of our next lesson. We don't need to understand it for now. What is interesting to know about the perspective projection matrix in the context of this lesson though, is that it works on the basis that the image plane is located at the near clipping plane. We won't be using the matrix in this lesson nor studying it, however, in anticipation of the next lesson devoted to this topic, we will place the canvas at the near clipping plane. Keep in mind though that this is an arbitrary decision, and that, unless you use a special technique such as the perspective projection matrix that requires the canvas to be positioned at a specific location, it can otherwise be positioned anywhere along the camera's local z-axis.

From now on and for the rest of this lesson, we will though assume that the canvas (or screen or image plane), is positioned at the near clipping plane. Keep in mind that this is just an arbitrary decision and that the equations we will develop in the next chapter to project points onto the canvas works independently from its position along the camera's line of sight (which is also the camera z-axis). This setup is illustrated in figure 4.

Keep in mind as well, that the distance between the eye and the canvas, the near clipping plane, and the focal length are all different things. We will focus on this point more fully in the next chapter.

## Computing the Canvas Size and the Canvas Coordinates

The reason why we insisted a lot in the previous section on the fact that the canvas could be anywhere along the camera's local z-axis, is because that position affects the canvas size. When the distance between the eye and the canvas decreases, the canvas gets smaller, and when that distance increases, it gets bigger. The bottom-left and top-right coordinates of the canvas are directly linked to the canvas size. Once we know the size, computing these coordinates is trivial considering that the canvas (or screen) is centered about the image plane coordinate system origin. Why are these coordinates important? Because they can be used to easily check whether a point projected on the image plane lies within the canvas and is therefore visible to the camera. In figures5, 6, and 7 two points are projected onto the canvas. One of them (P1') is contained within the canvas limits and is visible to the camera. The other (P2') is outside the boundaries and is thus invisible. When we both know the canvas coordinates and the point projected coordinates, then testing if the point is visible is simple.

Let's see how we can mathematically compute these coordinates. In the second chapter of this lesson, we gave the equation to compute the canvas size (we will assume that the canvas is a square for now as in figure 3, 4, and 6):

$$\text{Canvas Size} = 2 * \tan({\theta \over 2}) * \text{Distance to Canvas}$$**Where \(\theta\) is the angle of view** (hence the division by 2). Note that when the canvas is a square, the vertical and horizontal angles of view are the same. Since the distance from the eye to the canvas is defined as the near clipping plane, we can write:

Where \(Z_{near}\) is the distance between the eye and the near clipping plane along the camera local z-axis (figure 5). Since the canvas is centered on the image plane coordinate system's origin, computing the canvas's corner coordinates is trivial. We need to divide the canvas size by 2, and set the sign of the coordinate based on the corner's position relative to the coordinate system's origin:

$$ \begin{array}{l} \text{top} &=&&\dfrac{\text {canvas size}}{2}\\ \text{right} &=&&\dfrac{\text {canvas size}}{2}\\ \text{bottom} &=&-&\dfrac{\text {canvas size}}{2}\\ \text{left} &=&-&\dfrac{\text {canvas size}}{2}\\ \end{array} $$Once we know the canvas bottom-left and top-right canvas coordinates, we can then compare the projected point coordinates with these values (we, of course, first need to compute the coordinates of the point onto the image plane which is positioned at the near clipping plane. We will learn how to do so in the next chapter). Points lie within the canvas boundary (and are therefore visible) if their x and y coordinates are either greater or equal and lower or equal than the canvas bottom-left and top-right canvas coordinates respectively. The following code fragment computes the canvas coordinates and tests the coordinates of a point lying on the image plane against these coordinates:

## Camera to World and World to Camera Matrix

Finally, we need to have a method to produce images of an object or scene from any given viewpoint. This is a topic we have already talked about in the previous lesson, but we will cover it again briefly in this chapter. CG cameras are similar to real cameras in that respect: in CG, we look at the camera's view (the equivalent of a real camera viewfinder) and move around the scene or object to select a viewpoint ("viewpoint" is the camera position in relation to the subject).

When a camera is created, by default it is located at the origin and oriented along the negative z-axis (figure 8). The reason of this orientation is explained in detail in the previous lesson. By doing so, the camera's local and world coordinate system's x-axis point in the same direction. Defining the camera's transformations with a 4x4 matrix is convenient. This 4x4 matrix which is no different from 4x4 matrices used to transform 3D objects, is called the **camera-to-world** transformation matrix (because it defines the camera's transformations with respect to the world coordinate system).

The camera-to-world transformation matrix is used differently depending on whether rasterization or ray-tracing is being used:

- In rasterization, the inverse of the matrix (the world-to-camera 4x4 matrix) is used to convert points defined in world space to camera space. Once in camera space, we can perform a perspective divide to compute the projected point coordinates in the image plane. An in-depth description of this process can be found in the previous lesson.
- In ray-tracing, we build camera rays in the camera's default position (the rays' origin and direction) and then transform them with the camera-to-world matrix. The full process is detailed in the lesson "Rays and Cameras" [link]

Don't worry if you don't understand yet very well how things work in ray-tracing. We will study rasterization first and then move on to ray-tracing next.

## Understanding How Virtual Cameras Are Used

At this point of the lesson, we have explained almost everything there is to know about pinhole cameras and CG cameras. However, we haven't explained how images are formed with these cameras. The process depends on whether the rendering technique being used is rasterization or ray-tracing. We are now going to consider each case individually.

Before we do so though, let's briefly recall again the principle of a pinhole camera. When light rays emitted by a light source intersect objects from the scene, they are reflected off of the surface of these objects in random directions. For each point of the scene visible by the camera, only one of these reflected rays will pass through the aperture of the pinhole camera and strike the surface of the photographic paper (or film or sensor) in one unique location. If we divide the surface of the film into a regular grid of pixels, what we get is a **digital pinhole camera** which is essentially what we want our virtual camera to be (figure 9 and 10).

This is how things work with a real pinhole camera. But how does it work in CG? In CG, cameras are built on the principle of a pinhole camera, but the image plane is in front of the center of projection (the aperture which in our virtual camera model we prefer to call the eye) as shown in figure 11. How is the image produced with this virtual pinhole camera model, depends on the rendering technique that's being used. Let's consider the two main visibility algorithms: rasterization and ray-tracing.

RasterisationWe are not going to explain how the rasterization algorithm works in this chapter. To have a complete overview of the algorithm, you are invited to read the lesson devoted to the REYES algorithm [link], a popular rasterisationrasterization algorithm. What we are going to look into, is how is the pinhole camera model being used with this particular rendering technique. To do so, let's recall that each ray passing through the aperture of a pinhole camera strikes the surface of the film in one location, which is eventually a pixel if we consider the case of digital images. Let's take the case of one particular ray R, reflected off of the surface of an object at O, traveling towards the eye in the direction D, passing through the aperture of the camera in A, and striking the image at the pixel location X (figure 12). To simulate this process, all we need to do is compute in which pixel of an image any given light ray strikes the image, and record the color of this light ray (the color of the object at the point where the ray was emitted from, which in the real world, is essentially the information carried by the light ray itself) at that pixel location in the image.

This essentially comes back to computing the pixel coordinates X of the 3D point O using perspective projection. In perspective projection, the position of a 3D point onto the image plane is found by computing the intersection of a line connecting the point to the eye with the image plane. The method for computing this point of intersection, was described in detail in the previous lesson. In the next chapter, we will learn how to compute these coordinates when the canvas is positioned at an arbitrary distance from the eye (in the previous lesson, the distance between the eye and the canvas was always assumed to be equal to 1).

The projection process can be seen as an interpretation of the way an image is formed inside a pinhole camera, by "following" the path of light rays from whether points they are emitted from in the scene to the eye and "recording" the position (in terms of pixel coordinates) where these light rays intersect the image plane. To do so, we first need to transform points from world space to camera space, perform a perspective divide on the points in camera space to compute their coordinates in screen space, then convert the points' coordinates in screen space to NDC space, and finally convert these coordinates from NDC space to raster space. We used this method in the previous lesson to produce a wireframe image of a 3D object.

In this technique, the image is formed by a collection of "points" (these are not points, but conceptually, it is convenient to define where the light rays are reflected off of the surface of objects as points) projected onto the image. In other words, you start from the geometry, and you "cast" light paths to the eye, to find the pixel coordinates where these rays hit the image plane and from the coordinates of these intersections points on the canvas, you can then find where they should be recorded in the digital image. In a way, the rasterization approach is "object-centric".

Ray-TracingThe way things work in ray-tracing (with respect to the camera model), is the opposite of the way the rasterization algorithm works. When a light ray R reflected off of the surface of an object passes through the aperture of the pinhole camera and hit the surface of the image plane, it hits a particular pixel X on the image as described earlier. In other words, to each pixel X in an image, corresponds a light ray R with a given direction D and a given origin O. Note that, we do not need to know the origin of the ray to define its direction. The direction of the ray can be found by tracing a line from O (the point where the ray is emitted) to the camera's aperture A. It can also be defined by tracing a line from the pixel X that the ray intersects to the camera's aperture A (as shown in figure 13). Therefore, if you can find the ray direction D by tracing a line from X (the pixel) to A (the camera's aperture), then you can extend this ray into the scene to find O (the origin of the light ray) as shown in figure 14. This is the principle of ray-tracing (which is also called ray-casting). We can produce an image by setting the pixel's colors with the color of the light rays' respective points of origin. Due to the nature of the pinhole camera, to each pixel in the image corresponds one singular light ray that we can construct by tracing a line from the pixel to the camera's aperture. We then cast this ray into the scene, and set the pixel's color to the color of the object the ray intersects (if any — the ray might not intersect any geometry indeed, in which case we set the pixel's color to black). This point of intersection corresponds to the point on the surface of the object, from which the light ray was reflected off towards the eye.

To the contrary of the rasterization algorithm, ray-tracing is "image-centric". Rather than following the natural path of the light ray, from the object to the camera (as we do with rasterization in a way), **we follow the same path but in the other direction**, from the camera to the object.

**ideal pinhole camera**in a way, because we consider that a single ray only is passing through the aperture (as opposed to a beam of light containing many rays as with real pinhole cameras). This is of course impossible with a real pinhole camera. When the hole becomes too small light rays are diffracted. With such an ideal pinhole camera, we can create perfectly sharp images.

Here is the complete algorithm in pseudo-code:

As explained in the first lesson, things are a bit more complex in ray-tracing because any given camera ray can intersect several objects as shown in figure 15. Of all these points, the point visible to the camera is the point with the closest distance to the eye. If you are interested in a quick introduction to the ray-tracing algorithm, you can read the first lesson of this section or keep reading the lessons from this section which are devoted to ray-tracing specifically.

**an "exchange" or transport of light energy between surfaces**. You can find information on this topic in lessons from the Mathematics and Physics of Compute Graphics (check the Mathematics of Shading and Monte Carlo Methods) as well as the lesson called Monte Carlo Ray Tracing and Path Tracing.

## What's Next?

We are finally ready to implement a pinhole camera model with the same controls as the controls you can find in software such as Maya. It will be followed as usual with the source code of a program, capable of producing images matching the output of Maya.