About the Projection Matrix, the GPU Rendering Pipeline and Clipping
What Will We Study in This Chapter?
In the first chapter, we discussed the crucial role that projection matrices play in the GPU rendering pipeline. We highlighted the existence of two types of GPU rendering pipelines: the older "fixed-function pipeline" and the newer one, often referred to as the "programmable rendering pipeline." We looked at the process of clipping, which involves discarding or trimming primitives that fall outside or on the boundaries of the frustum, and how this occurs during the transformation of points by the projection matrix. Additionally, we clarified that projection matrices actually transform points from camera space to homogeneous clip space, not to NDC (Normalized Device Coordinate) space. Now, it's time to delve deeper into these subjects. We will explain the mechanism of clipping during the transformation process, define what clip space entails, and review the application of projection matrices in both the old and new GPU rendering pipelines.
Clipping and Clip Space
Let's briefly recall that the primary goal of clipping is to "reject" geometric primitives that are behind the eye or positioned exactly at the eye (which would result in a division by zero, an undesirable outcome) and, more broadly, to trim the parts of geometric primitives that lie outside the viewing area (further details on this topic can be found in Chapter 2). This viewing area is delineated by the truncated pyramid shape of the perspective or viewing frustum. Implementing this step is a necessity in any professional rendering system. It's important to note that this process can lead to the creation of more triangles than were initially present in the scene, as illustrated in Figure 1.
The most commonly used clipping algorithms include the Cohen-Sutherland algorithm for lines and the Sutherland-Hodgman algorithm for polygons. It turns out that clipping is more efficiently executed in clip space than in camera space (before vertices are transformed by the projection matrix) or screen space (after the perspective divide). It's crucial to remember that when points are transformed by the projection matrix, they are first processed as they would be with any other 4x4 matrix. The transformed coordinates are then normalized, meaning the x-, y-, and z-coordinates of the transformed point are divided by its w-coordinate (which, for a perspective projection, holds the depth of the point in camera space). Clip space refers to the space in which points exist just before they undergo this normalization.
In summary, the process on a GPU unfolds as follows:

Points are transformed from camera space to clip space in the vertex shader. The input vertex is converted from Cartesian coordinates to homogeneous coordinates, and its w-coordinate is set to 1. The predefined gl_Position variable, where the transformed point is stored, also represents a point in homogeneous coordinates. However, when the input vertex is multiplied by the projection matrix, the normalization step has not yet occurred: gl_Position is in homogeneous clip space.
After all vertices have been processed by the vertex shader, triangles with vertices now in clip space undergo clipping.

Once clipping is complete, all vertices are normalized. The x-, y-, and z-coordinates of each vertex are divided by their respective w-coordinate, marking the occurrence of the perspective divide.
After the normalization step, points that are visible to the camera fall within the range \([-1,1]\) in x, y, and z. This is the final stage of the point-matrix multiplication process, where the coordinates are normalized as mentioned:
\[
\begin{align*}
-1 \leq \frac{x'}{w'} \leq 1 \\
-1 \leq \frac{y'}{w'} \leq 1 \\
-1 \leq \frac{z'}{w'} \leq 1 \\
\end{align*}
\]
Or, depending on the convention being used, \(0 \leq \frac{z'}{w'} \leq 1\). Multiplying each inequality through by \(w'\) (assumed positive), we can also express this as:
\[
\begin{align*}
-w' \leq x' \leq w' \\
-w' \leq y' \leq w' \\
-w' \leq z' \leq w' \\
\end{align*}
\]
This is the state x', y', and z' are in before they get normalized by w', or, to put it differently, when coordinates are in clip space. We can introduce a fourth inequality: \(0 < w'\). Its purpose is to ensure that we never divide any of the coordinates by 0, which would be a degenerate case. It is also what allows the normalized inequalities to be multiplied through by \(w'\) without flipping their direction.
These inequalities are mathematically sound. However, there's no need to attempt to visualize what vertices look like or what it means to work within a four-dimensional space. What this essentially indicates is that the clip space for a given vertex with coordinates {x, y, z} is defined by the extents [-w, w] (where the w value specifies the dimensions of the clip space). It's important to note that these extents are the same for each coordinate of the point, so the clip space for any given vertex is cubic. However, it's also crucial to understand that each point is likely to have its own clip space (each set of x, y, and z coordinates is likely to have a different w value). In other words, every vertex exists within its own clip space and must "fit" within it.
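A minimal C++ sketch of this containment test is given below. The Vec4 type and the insideClipVolume name are hypothetical, and the z test assumes the \([-1,1]\) depth convention; with a \([0,1]\) convention it would become 0 <= z <= w.

```cpp
struct Vec4 { float x, y, z, w; };

// True when a clip-space vertex satisfies 0 < w and -w <= x, y, z <= w.
// Assumption: depth follows the [-1, 1] convention after the divide.
bool insideClipVolume(const Vec4& c) {
    return c.w > 0.0f &&
           -c.w <= c.x && c.x <= c.w &&
           -c.w <= c.y && c.y <= c.w &&
           -c.w <= c.z && c.z <= c.w;
}
```

Note how the same x value can pass or fail depending on the vertex's own w: x = 0.5 fits when w = 2 but not when w = 0.4, which is what "each vertex has its own clip space" means in practice.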
This lesson focuses solely on projection matrices. All that is necessary to know in this context is where clipping occurs in the vertex transformation pipeline and the definition of clip space, which we have just explained. Further details will be covered in lessons on the Sutherland-Hodgman and Cohen-Sutherland algorithms, which are found in the Advanced Rasterization Techniques section.
The "Old" Point (or Vertex) Transformation Pipeline
The fixed-function pipeline is now deprecated in OpenGL and other graphics APIs. It is advised not to use it anymore. Instead, use the "new" programmable GPU rendering pipeline. This section is retained for reference purposes and because you might still encounter some articles on the Web referencing methods from the old pipeline.
The term "vertex" is preferred when discussing the transformation of points (vertices) in OpenGL (or Direct3D, Metal, or any other graphics API you can think of). In the old fixed-function pipeline, OpenGL offered two modes for altering the camera's state: GL_PROJECTION and GL_MODELVIEW. GL_PROJECTION was used for setting the projection matrix itself. As we have learned (see the previous chapter), this matrix is constructed from the left, right, bottom, and top screen coordinates (determined by the camera's field of view and near clipping plane), as well as the near and far clipping planes (parameters of the camera). These parameters delineate the camera's frustum shape, and all vertices or points within this frustum are visible. In OpenGL, these parameters were specified through a call to glFrustum (an implementation of which was shown in the previous chapter):
void glFrustum(GLdouble left, GLdouble right, GLdouble bottom, GLdouble top, GLdouble nearVal, GLdouble farVal);
The GL_MODELVIEW mode was used to set the world-to-camera matrix. A typical sequence of calls in an OpenGL program to set the perspective projection matrix and the model-view matrix would be:
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glFrustum(l, r, b, t, n, f);
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glTranslatef(0, 0, 10);
...
Initially, the GL_PROJECTION mode is activated (line 1). Then, to configure the projection matrix, a call to glFrustum is made, providing the left, right, bottom, and top screen coordinates, along with the near and far clipping planes as arguments. After setting up the projection matrix, the mode is switched to GL_MODELVIEW (line 4). In essence, the GL_MODELVIEW matrix can be considered a combination of the "VIEW" transformation matrix (the world-to-camera matrix) with the "MODEL" matrix (the transformation applied to the object, or the object-to-world matrix). There was no separate concept of the world-to-camera transform apart from the object-to-world transform; both were combined in the GL_MODELVIEW matrix.
$$GL\_MODELVIEW = M_{\text{object-to-world}} \times M_{\text{world-to-camera}}$$
Initially, a point \(P_w\), expressed in world space (or in object space when a model transform is present), is transformed to camera space (or eye space) using the GL_MODELVIEW matrix. The resultant point \(P_c\) is then projected onto the image plane using the GL_PROJECTION matrix, ending up as a point in homogeneous coordinates, where the w-coordinate contains the z-coordinate of point \(P_c\) (more precisely \(-z\), since the OpenGL camera looks down the negative z-axis).
The Vertex Transformation Pipeline in the New Programmable GPU Rendering Pipeline
The vertex transformation pipeline in the new programmable GPU rendering pipeline remains largely similar to the old one, but with a significant difference in setup. In this updated pipeline, the concepts of GL_MODELVIEW and GL_PROJECTION modes no longer exist. Instead, this functionality can now be custom-programmed within a vertex shader. As outlined in the first chapter of this lesson, the vertex shader acts as a mini-program that dictates how the GPU processes the vertices of the scene's geometry. This means all vertex transformations, including the world-to-camera transformation and, more critically, the projection transformation, should be executed here. It's important to note that a program utilizing the OpenGL API won't generate an image unless both the vertex shader and its corresponding fragment shader are defined. The simplest vertex shader might look something like this:
in vec3 vert;

void main()
{
    // does not alter the vertices at all
    gl_Position = vec4(vert, 1);
}
This example doesn't transform the input vertex with a perspective projection matrix, which, under certain conditions, can still produce a visible result based on the geometry's size and position, as well as the viewport configuration. However, this falls outside the scope of our current discussion. From this code snippet, we observe that the input vertex, declared as a vec3, is promoted to a vec4 with its w-coordinate set to 1, essentially a point in homogeneous coordinates. Similarly, gl_Position represents a point in homogeneous coordinates. As anticipated, the vertex shader outputs the vertex position in clip space (refer to the diagram of the vertex transformation pipeline mentioned earlier).
In practice, a more commonly used vertex shader would be structured as follows:
uniform mat4 worldToCamMatrix, projMatrix;
in vec3 vert;

void main()
{
    gl_Position = projMatrix * worldToCamMatrix * vec4(vert, 1);
}
This shader employs both a world-to-camera and a projection matrix to move the vertex through camera space into clip space. These matrices are set from the application through OpenGL API calls (glGetUniformLocation to locate the shader variable and glUniformMatrix4fv to set the matrix variable using the identified location):
Matrix44f worldToCamera = ...
// Note: Determine if you need to transpose the matrix before using it in glUniformMatrix4fv
// worldToCamera.transposeMe();
// projMatrix.transposeMe();
// p is the handle of the linked shader program; glGetUniformLocation returns a GLint (-1 if not found)
GLint projMatrixLoc = glGetUniformLocation(p, "projMatrix");
GLint worldToCamLoc = glGetUniformLocation(p, "worldToCamMatrix");
// assumes Matrix44f converts to a pointer to its 16 float coefficients
glUniformMatrix4fv(projMatrixLoc, 1, GL_FALSE, projMatrix);
glUniformMatrix4fv(worldToCamLoc, 1, GL_FALSE, worldToCamera);
Do I need to transpose the matrix in an OpenGL program?
It can be confusing to determine whether you should transpose your matrix before passing it to the graphics pipeline. The OpenGL specification writes matrices for column vectors, and glUniformMatrix4fv(), with its transpose flag set to GL_FALSE, expects the 16 coefficients in memory in column-major order. The confusion arises because a matrix written for row vectors and stored in row-major order has exactly the same memory layout as the equivalent column-vector matrix (its transpose) stored in column-major order. Therefore, if your matrices follow the row-vector convention and are laid out in memory in row-major order, there's no need to transpose them before passing them to OpenGL. Conversely, if your memory layout is the transpose of what OpenGL expects (for example, a column-vector matrix stored in row-major order), a transposition is needed, though you don't have to do it yourself. The third parameter of glUniformMatrix4fv(), set to GL_FALSE in the example, tells the graphics API whether to transpose the matrix's coefficients for you, so you can avoid a manual transposition by setting it to GL_TRUE.
The situation becomes more perplexing when considering the order in which matrices are applied in OpenGL vertex shaders. You will see the operation \(Proj * View * vtx\) rather than \(vtx * View * Proj\). The former indicates the column-vector convention (the matrix is multiplied by the point, rather than the point by the matrix, as explained in our lesson on Geometry). Thus, the shader treats vertices as column vectors, even though the application in this lesson builds its matrices for row vectors. Confused yet?
Remember, GLSL treats vectors as column vectors and stores matrices in column-major order. Therefore, if you're using row vectors, as is the case on Scratchapixel, the matrix the shader needs is mathematically the transpose of yours. Whether this requires an explicit transposition (the commented-out transposeMe() calls in the code above) depends on how your matrix class lays out its coefficients in memory: a row-vector matrix stored in row-major order already matches what OpenGL expects. While modern OpenGL offers alternative methods for handling this, they are beyond the scope of this lesson. Further information on these alternatives can readily be found online.
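The relationship between vector convention and memory layout can be verified with a short C++ sketch. It does not call OpenGL; it only compares the 16 floats that would be handed to glUniformMatrix4fv. The names Flat16, flattenRowMajor, flattenColumnMajor, transpose, and rowVectorTranslation are hypothetical helpers introduced for this illustration.

```cpp
#include <array>

using Mat4 = std::array<std::array<float, 4>, 4>;  // indexed m[row][col]
using Flat16 = std::array<float, 16>;

// Store the coefficients one row after another (row-major order).
Flat16 flattenRowMajor(const Mat4& m) {
    Flat16 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r * 4 + c] = m[r][c];
    return out;
}

// Store the coefficients one column after another (column-major order),
// the layout read by glUniformMatrix4fv when its transpose flag is GL_FALSE.
Flat16 flattenColumnMajor(const Mat4& m) {
    Flat16 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[c * 4 + r] = m[r][c];
    return out;
}

Mat4 transpose(const Mat4& m) {
    Mat4 t{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            t[c][r] = m[r][c];
    return t;
}

// Translation matrix written for row vectors (translation in the last row).
Mat4 rowVectorTranslation(float tx, float ty, float tz) {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    m[3][0] = tx; m[3][1] = ty; m[3][2] = tz;
    return m;
}
```

For a translation by (1, 2, 3), flattening the row-vector matrix in row-major order and flattening its transpose (the column-vector matrix) in column-major order give the same 16 floats, with the translation at indices 12, 13, and 14. This is why such a matrix can be passed with the transpose flag left at GL_FALSE.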