Geometry

Transforming Points and Vectors

Point Transformation Techniques

This section covers the steps needed to transform points with matrices, focusing on how translation is folded into matrix multiplication, a topic earlier discussions did not cover in depth. Translation is one of the simplest operations to apply to a point, yet it is not a linear operation (it moves the origin), so expressing it as a matrix multiplication requires a structural change to the point itself.

Recall from earlier discussions that two matrices can be multiplied only when their sizes are compatible, that is, m x p and p x n. We start from the 3x3 identity matrix, which leaves a point's coordinates unchanged, and look at how it must be modified to accommodate translation. Translation amounts to adding a value to each coordinate of a point: for example, (1, 1, 1) becomes (2, 3, 4) when 1, 2, and 3 are added to its x, y, and z coordinates, respectively. For the purposes of this discussion, points and vectors are treated as 1x3 matrices.

To incorporate translation into the matrix that already performs rotation, we introduce a fourth term to encode the translation components. This extension requires adding T_X, T_Y, and T_Z to the matrix multiplication formula, resulting in a modified expression that includes these translation values:

$$ \begin{array}{l} P'.x = P.x * M_{00} + P.y * M_{10} + P.z * M_{20} + T_X\\ P'.y = P.x * M_{01} + P.y * M_{11} + P.z * M_{21} + T_Y\\ P'.z = P.x * M_{02} + P.y * M_{12} + P.z * M_{22} + T_Z \end{array} $$

This adjustment suggests a 4x3 matrix, diverging from the initial 3x3 format. To address the discrepancy in matrix sizes and enable multiplication with a point represented as a 1x3 matrix, we expand the point to a 1x4 matrix by adding a fourth component set to 1, transforming it into a homogeneous point. This adaptation seamlessly integrates translation into our matrix, as shown in the following formula, where the translation is effectively encoded by multiplying the added component by the matrix's translation terms:

$$ \begin{array}{l} P'.x = P.x * M_{00} + P.y * M_{10} + P.z * M_{20} + 1 * M_{30}\\ P'.y = P.x * M_{01} + P.y * M_{11} + P.z * M_{21} + 1 * M_{31}\\ P'.z = P.x * M_{02} + P.y * M_{12} + P.z * M_{22} + 1 * M_{32} \end{array} $$
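The formula above can be checked with a few lines of C++. This is only a minimal sketch: the `Point3` type and the `transformPoint` and `checkTranslationExample` names are hypothetical and not part of the lesson's code. It assumes the row-vector convention used so far, stores the matrix as a plain 4x3 array, and never stores the fourth coordinate of the point, which is simply taken to be 1.

```cpp
#include <cassert>

// Hypothetical minimal point type: only x, y, z are stored; w = 1 is implied.
struct Point3 { float x, y, z; };

// Multiply the row vector (p.x, p.y, p.z, 1) by a 4x3 matrix m, indexed m[row][col].
// Rows 0..2 hold the 3x3 linear part, row 3 holds the translation.
Point3 transformPoint(const Point3& p, const float m[4][3])
{
    Point3 r;
    r.x = p.x * m[0][0] + p.y * m[1][0] + p.z * m[2][0] + 1.0f * m[3][0];
    r.y = p.x * m[0][1] + p.y * m[1][1] + p.z * m[2][1] + 1.0f * m[3][1];
    r.z = p.x * m[0][2] + p.y * m[1][2] + p.z * m[2][2] + 1.0f * m[3][2];
    return r;
}

// Worked example from the text: identity linear part plus a translation of (1, 2, 3).
void checkTranslationExample()
{
    const float m[4][3] = {
        {1, 0, 0},
        {0, 1, 0},
        {0, 0, 1},
        {1, 2, 3}   // M30, M31, M32 carry T_X, T_Y, T_Z
    };
    const Point3 p{1, 1, 1};
    const Point3 q = transformPoint(p, m);
    assert(q.x == 2 && q.y == 3 && q.z == 4);
}
```

Calling `checkTranslationExample()` reproduces the earlier example in which (1, 1, 1) moves to (2, 3, 4).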

With this in place, translation, scale, and rotation can all be encoded in a single matrix that acts on points in homogeneous coordinates. The fourth coordinate is not stored in code; it is implicitly 1, and the transformation formulas are written accordingly, which gives our matrix a 4x3 shape at this stage. The 4x4 format most commonly used in computer graphics (CG) is obtained by appending a fourth column, which plays a role in perspective projection and a few other transformations and is otherwise set to (0, 0, 0, 1). The next section looks more closely at homogeneous points and at what happens when the fourth column departs from these default values, an uncommon but possible case.
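Written out in full, and assuming the row-vector convention of the formulas above, the 4x4 layout for a matrix that only rotates, scales, and translates would look as follows. Here $T_X$, $T_Y$, and $T_Z$ occupy the positions $M_{30}$, $M_{31}$, and $M_{32}$, and the fourth column is the default (0, 0, 0, 1):

$$ \begin{bmatrix} x & y & z & 1 \end{bmatrix} \begin{bmatrix} M_{00} & M_{01} & M_{02} & 0\\ M_{10} & M_{11} & M_{12} & 0\\ M_{20} & M_{21} & M_{22} & 0\\ T_X & T_Y & T_Z & 1 \end{bmatrix} = \begin{bmatrix} x' & y' & z' & 1 \end{bmatrix} $$

The fourth component of the result is $x * 0 + y * 0 + z * 0 + 1 * 1 = 1$, so the transformed point is still a homogeneous point with a fourth coordinate of 1.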

Homogeneous Coordinates Are No Magic

Representing points in homogeneous coordinates is what makes it possible to multiply them by [4x4] matrices. In code, however, this representation is usually handled implicitly, because the homogeneous coordinate (w) is almost always 1. A C++ Point class therefore typically stores just three floats (x, y, and z) and never declares the fourth w coordinate. When a homogeneous point is multiplied by a [4x4] matrix, the w coordinate of the transformed point is computed from the coefficients of the matrix's fourth column. This column is usually (0, 0, 0, 1), which gives a transformed w coordinate (w') of 1, so the transformed x', y', and z' coordinates can be used directly.

The situation changes with projection matrices, whose fourth column differs from (0, 0, 0, 1) and can therefore produce a w' other than 1. In that case, the transformed coordinates (x', y', z') must be normalized by dividing each of them by w' to convert the point back to Cartesian coordinates, as in the following pseudo-code:

P'.x = P.x * M00 + P.y * M10 + P.z * M20 + M30;
P'.y = P.x * M01 + P.y * M11 + P.z * M21 + M31;
P'.z = P.x * M02 + P.y * M12 + P.z * M22 + M32;
w'   = P.x * M03 + P.y * M13 + P.z * M23 + M33;
if (w' != 1 && w' != 0) {
    P'.x /= w', P'.y /= w', P'.z /= w';
}

With this approach, the Point type never needs to declare a w coordinate. The input is assumed to be a Cartesian point, or equivalently a homogeneous point whose undeclared w coordinate is always 1, and w' is computed on the fly. The division matters mainly when the matrix is a projection matrix: normalizing the coordinates brings w' back to 1, which turns the result into a point that is usable in the Cartesian coordinate system again.
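To see the division at work, here is a small C++ sketch of the pseudo-code above. The names `Point3`, `transformHomogeneous`, and `checkDivideExample` are hypothetical, and the matrix is not a real projection matrix: it is a made-up example whose fourth column is (0, 0, 1, 0), chosen only so that w' equals the point's z coordinate.

```cpp
#include <cassert>

// Hypothetical minimal point type: w is never stored and is assumed to be 1.
struct Point3 { float x, y, z; };

// Row-vector convention, m[row][col]. Computes w' on the fly and divides when needed.
Point3 transformHomogeneous(const Point3& p, const float m[4][4])
{
    float x = p.x * m[0][0] + p.y * m[1][0] + p.z * m[2][0] + m[3][0];
    float y = p.x * m[0][1] + p.y * m[1][1] + p.z * m[2][1] + m[3][1];
    float z = p.x * m[0][2] + p.y * m[1][2] + p.z * m[2][2] + m[3][2];
    float w = p.x * m[0][3] + p.y * m[1][3] + p.z * m[2][3] + m[3][3];
    if (w != 1 && w != 0) {
        x /= w; y /= w; z /= w;   // back to Cartesian coordinates (w' = 1)
    }
    return {x, y, z};
}

void checkDivideExample()
{
    // Illustrative matrix: fourth column is (0, 0, 1, 0), so w' = P.z.
    const float m[4][4] = {
        {1, 0, 0, 0},
        {0, 1, 0, 0},
        {0, 0, 1, 1},
        {0, 0, 0, 0}
    };
    const Point3 q = transformHomogeneous(Point3{2, 4, 2}, m);
    // Before the divide: (2, 4, 2) with w' = 2. After the divide: (1, 2, 1).
    assert(q.x == 1 && q.y == 2 && q.z == 1);
}
```

With a matrix whose fourth column is (0, 0, 0, 1), the same function leaves w' at 1 and skips the division entirely.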

The main takeaway is that homogeneous coordinates typically need attention only when points are multiplied by a perspective projection matrix. This rarely comes up in ray tracing, which typically does not use such matrices. To learn more about the role of the w coordinate, see the Perspective and Orthographic Projection Matrix lesson, which explains how 3D points are projected onto the image plane using perspective projection and clarifies the concept of homogeneous points.

Implementing this functionality in C++ can follow two paths:

Some developers always compute w' and divide the transformed coordinates by w' when it differs from 1. This method is general, but outside the context of projection matrices the extra work is unnecessary and wastes CPU time in the majority of cases.
Alternatively, one can ignore w and w' altogether and assume that every matrix has a fourth column of (0, 0, 0, 1). Projection matrices are then handled by a separate function that computes w' and divides the coordinates accordingly.
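The second path could be sketched as two functions. Everything here is an assumption made for illustration: the `Point3` and `Matrix44` types and the names `transformAffine`, `transformProjective`, and `checkBothPathsAgree` do not come from the lesson's code.

```cpp
#include <cassert>

// Hypothetical minimal types for illustration only.
struct Point3 { float x, y, z; };
struct Matrix44 { float m[4][4]; };   // row-vector convention, m[row][col]

// Fast path: assumes the fourth column is (0, 0, 0, 1), so w' is never computed.
inline Point3 transformAffine(const Point3& p, const Matrix44& M)
{
    return {
        p.x * M.m[0][0] + p.y * M.m[1][0] + p.z * M.m[2][0] + M.m[3][0],
        p.x * M.m[0][1] + p.y * M.m[1][1] + p.z * M.m[2][1] + M.m[3][1],
        p.x * M.m[0][2] + p.y * M.m[1][2] + p.z * M.m[2][2] + M.m[3][2]
    };
}

// Projection path: same x', y', z', plus w' from the fourth column and the divide.
inline Point3 transformProjective(const Point3& p, const Matrix44& M)
{
    Point3 r = transformAffine(p, M);
    const float w = p.x * M.m[0][3] + p.y * M.m[1][3] + p.z * M.m[2][3] + M.m[3][3];
    if (w != 1 && w != 0) {
        r.x /= w; r.y /= w; r.z /= w;
    }
    return r;
}

// For a matrix whose fourth column is (0, 0, 0, 1), both paths give the same point.
void checkBothPathsAgree()
{
    const Matrix44 T = {{ {1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {1, 2, 3, 1} }};
    const Point3 a = transformAffine(Point3{1, 1, 1}, T);
    const Point3 b = transformProjective(Point3{1, 1, 1}, T);
    assert(a.x == b.x && a.y == b.y && a.z == b.z);
}
```

The trade-off is that the caller has to know which kind of matrix it holds. Passing a projection matrix to the fast path would silently skip the divide.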

For clarity and to maintain a balance between generality and optimization, the example adopts a generic approach that includes computing w' and normalizing the coordinates when necessary:

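void multVecMatrix(const Vec3<T> &src, Vec3<T> &dst) const
{
    // Compute into temporaries so the call stays correct when src and dst are the same object.
    T x = src.x * m[0][0] + src.y * m[1][0] + src.z * m[2][0] + m[3][0];
    T y = src.x * m[0][1] + src.y * m[1][1] + src.z * m[2][1] + m[3][1];
    T z = src.x * m[0][2] + src.y * m[1][2] + src.z * m[2][2] + m[3][2];
    // w' comes from the fourth column; the w coordinate of src is implicitly 1.
    T w = src.x * m[0][3] + src.y * m[1][3] + src.z * m[2][3] + m[3][3];
    // Divide only when the matrix actually changed w (e.g. a projection matrix).
    if (w != 1 && w != 0) {
        x /= w;
        y /= w;
        z /= w;
    }
    dst.x = x;
    dst.y = y;
    dst.z = z;
}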

Vector Transformation

Vectors, unlike points, represent a direction and a magnitude but no position, which makes transforming them simpler than transforming points. Translating a vector is meaningless because it has no position to move, so a transformation only affects its direction and possibly its length. Vectors are therefore transformed with the same formulas as points, minus the translation terms.

Here's the straightforward code for vector transformation, which notably excludes the translation component present in point transformation:

V'.x = V.x * M00 + V.y * M10 + V.z * M20;
V'.y = V.x * M01 + V.y * M11 + V.z * M21;
V'.z = V.x * M02 + V.y * M12 + V.z * M22;

Implementing vector transformation in C++ is achieved as follows, maintaining the exclusion of translation to preserve the vector's directional integrity:

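void multDirMatrix(const Vec3<T> &src, Vec3<T> &dst) const
{
    // Only the upper-left 3x3 block is used: row 3 (the translation) is deliberately skipped.
    // Temporaries keep the call correct when src and dst are the same object.
    T x = src.x * m[0][0] + src.y * m[1][0] + src.z * m[2][0];
    T y = src.x * m[0][1] + src.y * m[1][1] + src.z * m[2][1];
    T z = src.x * m[0][2] + src.y * m[1][2] + src.z * m[2][2];
    dst.x = x, dst.y = y, dst.z = z;
}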
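The difference between the two functions is easiest to see by applying the same matrix to the same three numbers, once as a point and once as a direction. The sketch below uses minimal stand-ins for `Vec3<T>` and the matrix class. These stand-ins and the `checkPointVersusDirection` helper are assumptions made for illustration and are much smaller than a real implementation would be.

```cpp
#include <cassert>

// Minimal stand-ins for the lesson's types (illustrative only).
template<typename T>
struct Vec3 { T x, y, z; };

template<typename T>
struct Matrix44
{
    T m[4][4];   // row-vector convention, m[row][col]; row 3 holds the translation

    // Points: the translation row is added and w' is checked.
    void multVecMatrix(const Vec3<T>& src, Vec3<T>& dst) const
    {
        T x = src.x * m[0][0] + src.y * m[1][0] + src.z * m[2][0] + m[3][0];
        T y = src.x * m[0][1] + src.y * m[1][1] + src.z * m[2][1] + m[3][1];
        T z = src.x * m[0][2] + src.y * m[1][2] + src.z * m[2][2] + m[3][2];
        T w = src.x * m[0][3] + src.y * m[1][3] + src.z * m[2][3] + m[3][3];
        if (w != 1 && w != 0) { x /= w; y /= w; z /= w; }
        dst.x = x; dst.y = y; dst.z = z;
    }

    // Directions: only the upper-left 3x3 block is applied.
    void multDirMatrix(const Vec3<T>& src, Vec3<T>& dst) const
    {
        T x = src.x * m[0][0] + src.y * m[1][0] + src.z * m[2][0];
        T y = src.x * m[0][1] + src.y * m[1][1] + src.z * m[2][1];
        T z = src.x * m[0][2] + src.y * m[1][2] + src.z * m[2][2];
        dst.x = x; dst.y = y; dst.z = z;
    }
};

void checkPointVersusDirection()
{
    // Uniform scale by 2 combined with a translation of (1, 2, 3).
    const Matrix44<float> M = {{ {2, 0, 0, 0}, {0, 2, 0, 0}, {0, 0, 2, 0}, {1, 2, 3, 1} }};
    const Vec3<float> v{1, 1, 1};
    Vec3<float> asPoint{}, asDirection{};
    M.multVecMatrix(v, asPoint);       // scaled, then translated
    M.multDirMatrix(v, asDirection);   // scaled only
    assert(asPoint.x == 3 && asPoint.y == 4 && asPoint.z == 5);
    assert(asDirection.x == 2 && asDirection.y == 2 && asDirection.z == 2);
}
```

The scale affects both results, while the translation row only moves the point.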

Transforming Normals

Normals, despite their vector-like properties, introduce additional complexity in their transformation, a subject to be elaborated in a dedicated chapter on Transforming Normals.

Concluding Insights

This lesson explains why [4x4] matrices are preferred over [3x3] matrices: the coefficients $M_{30}$, $M_{31}$, and $M_{32}$ encode the translation values. Multiplying by a [4x4] matrix requires points to carry an additional coordinate, so Cartesian points are implicitly treated as homogeneous points whose w coordinate is 1. The fourth column of most transformation matrices is (0, 0, 0, 1), which keeps w' equal to 1. Matrices whose fourth column differs, most notably projection matrices, can produce a w' other than 1, in which case x', y', and z' must be divided by w' to return to Cartesian coordinates.

Transformations can also be represented by means other than matrices. Rotations, for example, can be expressed with Euler's rotation vector, Rodrigues' rotation formula, or quaternions. These representations address specific problems such as gimbal lock, which arises when a rotation is built from three successive rotations about coordinate axes, as rotation matrices often are. Quaternions, despite their complexity, are favored for interpolating rotations smoothly and for avoiding gimbal lock, which illustrates the range of tools available for managing transformations in computer graphics.