# The Perspective and Orthographic Projection Matrix

## What Will We Study in This Chapter?

In the first chapter, we discussed the crucial role that projection matrices play in the GPU rendering pipeline. We highlighted the existence of two types of GPU rendering pipelines: the older "fixed-function pipeline" and the newer one, often referred to as the "programmable rendering pipeline." We touched on the process of clipping, which involves discarding or trimming primitives that fall outside or on the boundaries of the frustum, and how this occurs during the transformation of points by the projection matrix. Additionally, we clarified that projection matrices actually transform points from camera space to homogeneous clip space, not to NDC (Normalized Device Coordinate) space. Now, it's time to delve deeper into these subjects. We will explain the mechanism of clipping during the transformation process, define what clip space entails, and review the application of projection matrices in both the old and new GPU rendering pipelines.

## Clipping and Clip Space

Let's briefly recall that the primary goal of clipping is to "reject" geometric primitives that are behind the eye or positioned exactly at the eye (which would result in a division by zero, an undesirable outcome) and, more broadly, to trim the parts of geometric primitives that lie outside the viewing area (further details on this topic can be found in Chapter 2). This viewing area is delineated by the truncated pyramid shape of the perspective or viewing frustum. Implementing this step is a necessity in any professional rendering system. Note that this process can create more triangles than were initially present in the scene, as illustrated in Figure 1.

The most commonly used clipping algorithms include the Cohen-Sutherland algorithm for lines and the Sutherland-Hodgman algorithm for polygons. It turns out that clipping is more efficiently executed in clip space than in camera space (before vertices are transformed by the projection matrix) or screen space (after the perspective divide). Remember that when points are transformed by the projection matrix, they are first processed as they would be with any other 4x4 matrix. The transformed coordinates are then normalized, meaning the x, y, and z coordinates of the transformed point are divided by the transformed point's w-coordinate. Clip space refers to the space in which points exist just before they undergo this normalization.

In summary, the process on a GPU unfolds as follows:

• Points are transformed from camera space to clip space in the vertex shader. The input vertex is converted from Cartesian coordinates to homogeneous coordinates, and its w-coordinate is set to 1. The predefined gl_Position variable, where the transformed point is stored, also represents a point in homogeneous coordinates. However, when the input vertex is multiplied by the projection matrix, the normalization step has not yet occurred. gl_Position is in homogeneous clip space.

• After all vertices have been processed by the vertex shader, triangles with vertices now in clip space undergo clipping.

• Once clipping is complete, all vertices are normalized. The x, y, and z coordinates of each vertex are divided by their respective w-coordinate, marking the occurrence of the perspective divide.
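The three steps above can be sketched on the CPU as follows. This is only an illustration, not what a GPU driver actually runs: the `Vec4` and `Mat4` aliases, the function names, and the deliberately minimal perspective matrix (which does nothing except copy -z into w) are assumptions made for this example, and the row-vector convention is used.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper types for this sketch (not part of any graphics API).
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // indexed as m[row][col]

// Step 1: promote a camera-space point to homogeneous coordinates (w = 1)
// and multiply it by the projection matrix (row vector times matrix).
// The result is in clip space: no division has happened yet.
Vec4 toClipSpace(float x, float y, float z, const Mat4& proj)
{
    const Vec4 p = {x, y, z, 1.0f};
    Vec4 clip = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            clip[col] += p[row] * proj[row][col];
    return clip;
}

// Step 3: the perspective divide. It is meant to run after clipping
// (step 2, not shown here) has removed every vertex with w <= 0.
Vec4 perspectiveDivide(const Vec4& clip)
{
    assert(clip[3] > 0.0f && "clipping should have rejected this vertex");
    return {clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3], 1.0f};
}

// A deliberately minimal perspective matrix: x, y, z are left unchanged
// and -z is copied into w.
Mat4 simplePerspective()
{
    Mat4 m = {};      // all coefficients start at zero
    m[0][0] = 1.0f;
    m[1][1] = 1.0f;
    m[2][2] = 1.0f;
    m[2][3] = -1.0f;  // w' = -z
    return m;
}
```

With this matrix, the camera-space point (1, -1, -2) becomes (1, -1, -2, 2) in clip space, and the perspective divide turns it into (0.5, -0.5, -1).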

After the normalization step, points that are visible to the camera fall within the range $$[-1,1]$$ in both x and y dimensions. This is part of the final stage of the point-matrix multiplication process, where the coordinates are normalized as mentioned:

\begin{align*} -1 \leq \frac{x'}{w'} \leq 1 \\ -1 \leq \frac{y'}{w'} \leq 1 \\ -1 \leq \frac{z'}{w'} \leq 1 \\ \end{align*}

Or, depending on the convention being used, $$0 \leq \frac{z'}{w'} \leq 1$$. Hence, we can also express this as:

\begin{align*} -w' \leq x' \leq w' \\ -w' \leq y' \leq w' \\ -w' \leq z' \leq w' \\ \end{align*}

These inequalities describe the state of x', y', and z' before they are normalized by w', in other words, while the coordinates are still in clip space. We can add a fourth inequality: $$0 < w'$$. Its purpose is to ensure that we never divide any of the coordinates by 0, which would be a degenerate case.
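The step from the normalized inequalities to the clip-space ones can be spelled out for x (y and z follow the same pattern). Multiplying an inequality by a positive number preserves its direction, so, assuming $$w' > 0$$:

\begin{align*} -1 \leq \frac{x'}{w'} \leq 1 \quad \Longleftrightarrow \quad -w' \leq x' \leq w' \end{align*}

If $$w'$$ were negative, multiplying through would reverse both inequalities, so the clip-space form is only equivalent to the normalized form when $$0 < w'$$ holds.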

These equations are mathematically sound. However, there's no need to attempt to visualize what vertices look like or what it means to work within a four-dimensional space. What this essentially indicates is that the clip space for a given vertex with coordinates {x, y, z} is defined by the extents [-w, w] (where the w value specifies the dimensions of the clip space). It's important to note that this clip space is consistent for each coordinate of the point, and the clip space for any given vertex is cubic. However, it's also crucial to understand that each point is likely to have its unique clip space (each set of x, y, and z coordinates is likely to have a different w value). In other words, every vertex exists within its own clip space and must "fit" within it.
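As a minimal sketch of what "fitting within its own clip space" means in code, the test below checks a single clip-space vertex against the extents given by its own w. The `Vec4` alias and the function name are hypothetical, and the sketch assumes the convention in which normalized z spans [-1, 1].

```cpp
#include <array>

using Vec4 = std::array<float, 4>;  // hypothetical clip-space vertex {x, y, z, w}

// Returns true when a clip-space vertex lies inside its own clip volume,
// i.e. when w > 0 and each of x, y, z is within [-w, w].
bool insideClipVolume(const Vec4& v)
{
    const float w = v[3];
    if (w <= 0.0f)
        return false;  // the divide would be degenerate or flip signs
    for (int i = 0; i < 3; ++i)
        if (v[i] < -w || v[i] > w)
            return false;
    return true;
}
```

For instance, x = 3 is rejected for a vertex whose w is 2 but accepted for a vertex whose w is 5, which is why each vertex has to be tested against its own w.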

This lesson focuses solely on projection matrices. All that is necessary to know in this context is where clipping occurs in the vertex transformation pipeline and the definition of clip space, which we have just elucidated. Further details will be covered in lessons on the Sutherland-Hodgman and Cohen-Sutherland algorithms, which are found in the Advanced Rasterization Techniques section.

## The "Old" Point (or Vertex) Transformation Pipeline

The fixed-function pipeline is now deprecated in OpenGL and other graphics APIs. It is advised not to use it anymore. Instead, use the "new" programmable GPU rendering pipeline. This section is retained for reference purposes and because you might still encounter some articles on the Web referencing methods from the old pipeline.

The term "vertex" is preferred when discussing the transformation of points (vertices) in OpenGL (or Direct3D, Metal, or any other graphics API you can think of). In the old fixed-function pipeline, OpenGL (and other graphics APIs) offered two modes for altering the camera's state: GL_PROJECTION and GL_MODELVIEW. GL_PROJECTION was used for setting the projection matrix itself. As we have learned (see the previous chapter), this matrix is constructed from the left, right, bottom, and top screen coordinates (determined by the camera's field of view and near clipping plane), as well as the near and far clipping planes (parameters of the camera). These parameters delineate the camera's frustum shape, and all vertices or points within this frustum are visible. In OpenGL, these parameters were specified through a call to glFrustum (an implementation of which was shown in the previous chapter):

void glFrustum(GLdouble left, GLdouble right, GLdouble bottom, GLdouble top, GLdouble nearVal, GLdouble farVal);


The GL_MODELVIEW mode was used to set the world-to-camera matrix. A typical sequence of calls in an OpenGL program to set the perspective projection matrix and the model-view matrix would be:

glMatrixMode(GL_PROJECTION);
glFrustum(l, r, b, t, n, f);
glMatrixMode(GL_MODELVIEW);
glTranslatef(0.0f, 0.0f, 10.0f);
...


Initially, the GL_PROJECTION mode is activated (line 1). Then, to configure the projection matrix, a call to glFrustum is made (line 2), providing the left, right, bottom, and top screen coordinates, along with the near and far clipping planes as arguments. After setting up the projection matrix, the mode is switched to GL_MODELVIEW (line 3). In essence, the GL_MODELVIEW matrix can be considered a combination of the "VIEW" transformation matrix (the world-to-camera matrix) with the "MODEL" matrix (the transformation applied to the object, or the object-to-world matrix). There was no separate concept of the world-to-camera transform apart from the object-to-world transform; both were combined in the GL_MODELVIEW matrix.

$$GL\_MODELVIEW = M_{\text{object-to-world}} \times M_{\text{world-to-camera}}$$
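To make the order of this product concrete, here is a small sketch using the row-vector convention (points are multiplied on the left of the matrix). All type and function names are hypothetical helpers written for this example. With row vectors, the left-hand matrix of a product is applied first.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // indexed as m[row][col]

// Row vector times matrix: p' = p * m.
Vec4 transformPoint(const Vec4& p, const Mat4& m)
{
    Vec4 out = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            out[col] += p[row] * m[row][col];
    return out;
}

// Matrix product a * b. With row vectors, a is applied first, then b.
Mat4 multiply(const Mat4& a, const Mat4& b)
{
    Mat4 c = {};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Uniform scale, used here as a stand-in object-to-world matrix.
Mat4 uniformScale(float s)
{
    Mat4 m = {};
    m[0][0] = m[1][1] = m[2][2] = s;
    m[3][3] = 1.0f;
    return m;
}

// Translation for row vectors: the offsets live in the last row.
Mat4 translation(float tx, float ty, float tz)
{
    Mat4 m = {};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    m[3][0] = tx; m[3][1] = ty; m[3][2] = tz;
    return m;
}
```

Taking `uniformScale(2)` as the object-to-world matrix and `translation(0, 0, -10)` as the world-to-camera matrix, the point (1, 1, 1) transformed by their product ends up at (2, 2, -8), the same result as applying the two matrices one after the other.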

First, a point $$P_w$$ in world space (or in object space, when a model transformation is included) is transformed to camera space (or eye space) using the GL_MODELVIEW matrix. The resulting point $$P_c$$ is then transformed by the GL_PROJECTION matrix, ending up as a point in homogeneous coordinates whose w-coordinate contains the z-coordinate of $$P_c$$ (more precisely its negation, $$-z$$, since with glFrustum the camera looks down the negative z-axis).
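The sketch below builds the perspective matrix documented for glFrustum and shows where the w-coordinate of the projected point comes from. It is written for row vectors, so the matrix is the transpose of the one printed in the OpenGL documentation. The helper names are hypothetical, and this is a CPU illustration rather than the fixed-function implementation itself.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // indexed as m[row][col]

// Perspective matrix as documented for glFrustum, transposed for use
// with row vectors (p' = p * M). n and f are the near and far distances.
Mat4 frustumRowVector(float l, float r, float b, float t, float n, float f)
{
    Mat4 m = {};  // all coefficients start at zero
    m[0][0] = 2 * n / (r - l);
    m[1][1] = 2 * n / (t - b);
    m[2][0] = (r + l) / (r - l);
    m[2][1] = (t + b) / (t - b);
    m[2][2] = -(f + n) / (f - n);
    m[2][3] = -1;  // copies -z of the camera-space point into w'
    m[3][2] = -2 * f * n / (f - n);
    return m;
}

// Transforms a camera-space point (w = 1) to homogeneous clip space.
Vec4 projectToClip(float x, float y, float z, const Mat4& m)
{
    const Vec4 p = {x, y, z, 1.0f};
    Vec4 clip = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            clip[col] += p[row] * m[row][col];
    return clip;
}
```

With `frustumRowVector(-1, 1, -1, 1, 1, 10)`, a camera-space point at z = -4 gets w' = 4, and a point on the near plane (z = -1) ends up with z'/w' of approximately -1 after the divide.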

## The Vertex Transformation Pipeline in the New Programmable GPU Rendering Pipeline

The pipeline in the new programmable GPU rendering pipeline remains largely similar to the old pipeline, but with a significant difference in setup. In this updated pipeline, the concepts of GL_MODELVIEW and GL_PROJECTION modes no longer exist. Instead, this functionality can now be custom-programmed within a vertex shader. As outlined in the first chapter of this lesson, the vertex shader acts as a mini-program that dictates how the GPU processes the vertices of the scene's geometry. This means all vertex transformations, including the world-to-camera transformation and, more critically, the projection transformation, should be executed here. It's important to note that a program utilizing the OpenGL API won't generate an image unless both the vertex and its corresponding fragment shader are defined. The simplest vertex shader might look something like this:

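// Assumes GLSL 3.30; adjust the version to match your OpenGL context.
#version 330 core

in vec3 vert;

void main()
{
    // does not alter the vertex at all, only promotes it to homogeneous coordinates (w = 1)
    gl_Position = vec4(vert, 1.0);
}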


This example doesn't transform the input vertex with a perspective projection matrix. Under certain conditions it can still produce a visible result, depending on the geometry's size and position as well as the viewport configuration, but this falls outside the scope of our current discussion. From this code snippet, we observe that the input vertex, declared as a vec3, is promoted to a vec4 with its w-coordinate set to 1, which makes it a point in homogeneous coordinates. Similarly, gl_Position represents a point in homogeneous coordinates. As anticipated, the vertex shader outputs the vertex position in clip space (refer to the diagram of the vertex transformation pipeline mentioned earlier).

In practice, a more commonly used vertex shader would be structured as follows:

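// Assumes GLSL 3.30; adjust the version to match your OpenGL context.
#version 330 core

// both matrices are set by the application before drawing
uniform mat4 worldToCamMatrix, projMatrix;
in vec3 vert;

void main()
{
    // world space -> camera space -> clip space (column-vector convention)
    gl_Position = projMatrix * worldToCamMatrix * vec4(vert, 1.0);
}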


This shader employs both a world-to-camera and projection matrix to transition the vertex through camera space into clip space. These matrices are configured externally via specific calls (glGetUniformLocation to locate the shader variable and glUniformMatrix4fv to set the matrix variable using the identified location), facilitated by the OpenGL API:

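Matrix44f worldToCamera = ...
Matrix44f projMatrix = ...
// Note: Determine if you need to transpose the matrix before using it in glUniformMatrix4fv
// worldToCamera.transposeMe();
// projMatrix.transposeMe();
// p is the handle of the linked shader program; glGetUniformLocation returns a GLint (-1 if not found)
GLint projMatrixLoc = glGetUniformLocation(p, "projMatrix");
GLint worldToCamLoc = glGetUniformLocation(p, "worldToCamMatrix");
// glUniformMatrix4fv takes a pointer to 16 contiguous floats; &m[0][0] assumes
// Matrix44f stores its coefficients contiguously and exposes them via operator[]
glUniformMatrix4fv(projMatrixLoc, 1, GL_FALSE, &projMatrix[0][0]);
glUniformMatrix4fv(worldToCamLoc, 1, GL_FALSE, &worldToCamera[0][0]);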


Do I need to transpose the matrix in an OpenGL program?

It can be confusing to determine whether you should transpose your matrix before passing it to the graphics pipeline, because two separate conventions are involved. The first is memory layout: the OpenGL specification describes matrices in column-major order, and glUniformMatrix4fv() reads the 16 coefficients it receives as four consecutive columns when its transpose flag is GL_FALSE. The second is the vector convention: a matrix written for row vectors is the transpose of the equivalent matrix written for column vectors. These two differences cancel out, because a matrix stored in row-major order occupies memory in exactly the same way as its transpose stored in column-major order. Therefore, if your matrices are written for row vectors and laid out in memory in row-major order, there's no need to transpose them before passing them to OpenGL. If your layout and convention don't line up this way, you still don't have to transpose manually. The third parameter of glUniformMatrix4fv(), set to GL_FALSE in the example, tells the graphics API whether to transpose the matrix's coefficients for you: setting it to GL_TRUE makes OpenGL read the coefficients as row-major.

The order in which matrices are applied in OpenGL vertex shaders reflects the same conventions. You might notice the operation $$Proj * View * vtx$$ rather than $$vtx * View * Proj$$. The former indicates that vertices are treated as column vectors (the matrix is multiplied by the point, rather than the point by the matrix, as explained in our lesson on Geometry). The shader therefore expects the column-vector form of each matrix, which is exactly what it receives when a row-vector matrix stored in row-major order is uploaded with GL_FALSE.

Remember, OpenGL operates with column vectors and column-major matrices. Therefore, if you're using row vectors, as is the case on Scratchapixel, the shader needs the transpose of your matrices. Whether that calls for an explicit transposition (the commented-out transposeMe() calls in the snippet above) depends on how your matrix class stores its coefficients: with row-major storage the transposition is already implied by the memory layout, whereas with column-major storage you must transpose explicitly or pass GL_TRUE. While modern OpenGL offers alternative methods for handling this, they are beyond the scope of this lesson. Further information on these alternatives can readily be found online.
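The layout question can be checked on the CPU without any OpenGL call. The sketch below uses hypothetical helper names and a plain `m[row][col]` array rather than Scratchapixel's Matrix44f. It flattens a row-vector matrix in row-major order and its transpose in column-major order, so the two 16-float sequences can be compared.

```cpp
#include <array>

using Mat4 = std::array<std::array<float, 4>, 4>;  // indexed as m[row][col]
using Flat16 = std::array<float, 16>;

Mat4 transposed(const Mat4& m)
{
    Mat4 t = {};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            t[col][row] = m[row][col];
    return t;
}

// Row-major: the four rows are written one after another.
Flat16 flattenRowMajor(const Mat4& m)
{
    Flat16 out = {};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            out[4 * row + col] = m[row][col];
    return out;
}

// Column-major: the four columns are written one after another. This is
// the layout glUniformMatrix4fv reads when its transpose flag is GL_FALSE.
Flat16 flattenColumnMajor(const Mat4& m)
{
    Flat16 out = {};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            out[4 * col + row] = m[row][col];
    return out;
}

// Translation matrix for row vectors: the offsets live in the last row.
Mat4 translationRowVector(float tx, float ty, float tz)
{
    Mat4 m = {};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    m[3][0] = tx; m[3][1] = ty; m[3][2] = tz;
    return m;
}
```

For any matrix `M`, `flattenRowMajor(M)` and `flattenColumnMajor(transposed(M))` produce the same 16 floats. For a row-vector translation matrix, the offsets land at indices 12, 13, and 14, which is where a column-major reader expects the translation of the equivalent column-vector matrix.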