Generating Camera Rays with Ray-Tracing

Standard Coordinate Systems

When it comes to the vertex transformation pipeline, the terminology used to describe the coordinate systems that vertices are transformed to differs between rendering APIs, particularly between two of the most common ones, OpenGL and the RenderMan Interface (the definitions of the coordinate systems we use in this lesson follow those of the RenderMan Interface). If you are familiar with the OpenGL rendering pipeline, you may have thought that we used some of the terms in the previous chapter incorrectly. First, it is important to remember that this lesson is not about OpenGL and the classic vertex transformation pipeline used by z-buffer-based renderers, but about how primary (or camera) rays are generated by ray tracers. The process for generating these rays is the reverse of the one used by rasterization-based renderers to project points onto the image plane. Second, note that the meaning of the coordinate systems used to define the space in which points are transformed doesn't depend on whether we are using a ray tracer or a rasterizer; it is more a matter of convention. Generally, world, camera, object, and raster space have the same definition in all rendering APIs, but the definitions of clipping coordinates (which are more specific to OpenGL), NDC space, and screen space can vary from system to system.

To help you distinguish the system we described in this lesson for building primary rays from the OpenGL vertex transformation pipeline (which we described in the previous lesson), we will briefly describe both again and compare them in this chapter. Hopefully, this exercise will remove any confusion that this lesson may have introduced in the minds of graphics API experts.

In ray tracing, we start from a pixel position, which we transform to a point on the image plane, from which we can build a ray direction. The process involves transforming the point's coordinates, which are originally expressed in pixel coordinates (raster space), to NDC space and then to screen space (a process we have described in the previous chapters). The raster-to-NDC conversion remaps the original pixel coordinates to the range [0,1]. The NDC-to-screen conversion remaps values from [0,1] to [-1,1] if the image is square, and to [-aspect ratio, aspect ratio] along the x-axis and [-1,1] along the y-axis if the width of the image is greater than its height. These coordinates are then scaled by the tangent of half the camera's field of view, that is, tan(fov/2). To summarize, we go from raster space to NDC space, then to screen space.
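The raster-to-NDC-to-screen chain can be sketched in a few lines of Python. This is a minimal illustration, not the lesson's reference implementation: the function name `pixel_to_camera_ray` is hypothetical, and the code assumes a vertical field of view, a ray through the pixel center (hence the 0.5 offset), a raster origin at the top-left corner (hence the flipped y-axis), and a camera looking down the negative z-axis with the image plane one unit away.

```python
import math

def pixel_to_camera_ray(px, py, width, height, fov_deg):
    """Return the unit direction, in camera space, of the ray through pixel (px, py)."""
    aspect = width / height
    scale = math.tan(math.radians(fov_deg) * 0.5)  # tan(fov / 2)

    # Raster -> NDC: remap pixel coordinates to [0, 1].
    # The 0.5 offset aims at the pixel center (assumption).
    ndc_x = (px + 0.5) / width
    ndc_y = (py + 0.5) / height

    # NDC -> screen: [-aspect, aspect] along x, [-1, 1] along y.
    # y is flipped because raster y is assumed to grow downward.
    screen_x = (2.0 * ndc_x - 1.0) * aspect
    screen_y = 1.0 - 2.0 * ndc_y

    # Scale by tan(fov / 2) to account for the field of view.
    cam_x = screen_x * scale
    cam_y = screen_y * scale

    # The image plane is assumed to sit at z = -1; normalize the direction.
    length = math.sqrt(cam_x * cam_x + cam_y * cam_y + 1.0)
    return (cam_x / length, cam_y / length, -1.0 / length)
```

With a 640x480 image, a ray through the exact center of the frame (`px = 319.5`, `py = 239.5`) points straight down the negative z-axis, while the ray through pixel (0, 0) points up and to the left.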

The process for rasterization is different. The goal is to take a point in 3D space, project it onto the image plane, and convert the resulting coordinates to pixel coordinates. We start from a point in 3D space that we need to project onto the image, a process for which a perspective (or orthographic) projection matrix is generally used. First, points are converted from 3D world space to camera space: the point's coordinates are then defined with respect to the camera's coordinate system. Once in camera space, the points are projected onto the image plane using, for instance, a perspective projection matrix. At this stage of the OpenGL vertex transformation pipeline, the points are said to have clipping coordinates. The perspective divide hasn't been applied yet (the points are still expressed in homogeneous coordinates), but OpenGL can already test whether they are visible. If they pass this visibility or clipping test, they are converted from homogeneous to Cartesian coordinates (a process known as the z-divide or perspective divide, in which the x, y, and z coordinates are divided by the homogeneous coordinate w; with a perspective projection matrix, w holds the point's depth in camera space). The points are then said to have normalized device coordinates, or to be in NDC space. Note that in this space, the coordinates of visible points in OpenGL are contained in the range [-1,1]. In the RenderMan Interface, NDC space defines points whose coordinates are in the range [0,1]. Finally, points in NDC space are converted to raster coordinates or window coordinates, which are nothing other than the point's final pixel coordinates.
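The steps of this pipeline can be sketched as follows. This is a simplified illustration under several assumptions: the point is already in camera space (the world-to-camera transform is omitted), the matrix follows the usual OpenGL-style perspective layout with a column-vector convention, clipping is reduced to rejecting a single point (a real pipeline clips whole primitives), and the final raster y-axis is flipped to match a top-left origin. The names `perspective_matrix`, `mat_vec`, and `project_to_raster` are hypothetical helpers.

```python
import math

def perspective_matrix(fov_deg, aspect, near, far):
    """OpenGL-style perspective matrix (row-major storage, column-vector convention)."""
    f = 1.0 / math.tan(math.radians(fov_deg) * 0.5)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def project_to_raster(p_camera, proj, width, height):
    """Return (raster_x, raster_y, ndc_z) for a camera-space point, or None if clipped."""
    # Camera space -> clipping coordinates (still homogeneous).
    x, y, z, w = mat_vec(proj, [p_camera[0], p_camera[1], p_camera[2], 1.0])

    # Clipping test: the point is visible only if -w <= x, y, z <= w.
    if w <= 0.0 or not all(-w <= c <= w for c in (x, y, z)):
        return None

    # Perspective divide -> NDC space, each coordinate in [-1, 1].
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w

    # NDC -> raster (pixel) coordinates; y flipped for a top-left origin (assumption).
    raster_x = (ndc_x + 1.0) * 0.5 * width
    raster_y = (1.0 - ndc_y) * 0.5 * height
    return raster_x, raster_y, ndc_z
```

A point on the camera's line of sight, such as (0, 0, -5), lands at the center of a 640x480 frame, while a point behind the camera fails the clipping test and is rejected.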

We have summarized the different steps of the two systems and their associated coordinate systems in the following table:

Computing Ray Direction from Pixel Coordinates	OpenGL/Metal/Vulkan/DirectX (vertex transformation pipeline)
Position of a pixel (pixel coordinates). Raster space.	Point's coordinates are defined in world space.
Transform points from pixel coordinates or raster space to NDC space. Point coordinates are remapped to the range [0, 1].	Transform point from world to camera space.
Remap point's coordinates from NDC to screen space. Point's coordinates in screen space vary between [-aspect ratio, aspect ratio] along the x-axis, and [-1, 1] along the y-axis.	Project point onto the near-clipping plane (the image plane) using a projection matrix.
Points in screen space are scaled by the tangent of half the camera's field of view, tan(fov/2).	Clipping coordinates (before the perspective divide). Point passes the clipping test: is it visible or not?
Coordinates of the resulting point are used to build the ray direction vector.	Perspective divide. The resulting point's coordinates are said to be in NDC space. They are now in the range [-1, 1] (including the z coordinate).
The ray origin and direction are transformed by the camera-to-world matrix. The ray can now be tested for intersection against the geometry of the scene.	The point is transformed from NDC space to raster space or window coordinates (pixel coordinates).
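The last row of the left column, transforming the ray by the camera-to-world matrix, treats the origin and the direction differently: the origin is a point and is affected by translation, while the direction is a vector and is not. The sketch below illustrates this with a hypothetical camera rotated 90 degrees about the y-axis and placed at (2, 1, 0). The helper names and the column-vector matrix layout are assumptions made for this example only.

```python
def transform_point(m, p):
    """Apply a 4x4 matrix to a point (implicit w = 1): rotation plus translation."""
    return tuple(
        m[r][0] * p[0] + m[r][1] * p[1] + m[r][2] * p[2] + m[r][3]
        for r in range(3)
    )

def transform_vector(m, v):
    """Apply a 4x4 matrix to a direction (implicit w = 0): translation is ignored."""
    return tuple(
        m[r][0] * v[0] + m[r][1] * v[1] + m[r][2] * v[2]
        for r in range(3)
    )

# Hypothetical camera-to-world matrix (row-major storage, column-vector convention):
# a 90-degree rotation about the y-axis followed by a translation to (2, 1, 0).
camera_to_world = [
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 1.0, 0.0, 1.0],
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# In camera space the ray starts at the origin and, for the central pixel,
# points down the negative z-axis.
ray_origin = transform_point(camera_to_world, (0.0, 0.0, 0.0))
ray_direction = transform_vector(camera_to_world, (0.0, 0.0, -1.0))
```

Here the world-space origin is the camera position (2, 1, 0), and the direction becomes (-1, 0, 0): the rotation applies to both, the translation only to the origin.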

Coordinate Space Name	Description
Object Space	Space in which 3D models are before they eventually get transformed. When you model an object in a 3D modeling application, the model is generally centered at the origin. However, you may need to move this object (which includes rotation, translation, and scaling) to another position in the rendered scene. Object space refers to the position of the object before it gets transformed to this final position in the scene.
World Space	The vertex coordinates are expressed with respect to the world Cartesian coordinate system. These are the coordinates of the vertices after the object has been transformed to its final position in the scene.
Camera Space	Vertices in world space are transformed so that they are expressed with respect to the camera's Cartesian coordinate system (the world and camera Cartesian coordinate systems are aligned when the camera is in its default position).
Screen Space	In the RenderMan specifications, this space refers to the coordinates of a point on the image plane of the camera. In OpenGL, it refers to the position of a projected point expressed in pixel coordinates (by default, the origin of OpenGL window coordinates is the bottom-left corner of the frame, whereas raster space as used in this lesson has its origin at the top-left corner).
NDC Space	Normalized Device Coordinates. In the RenderMan specifications, points expressed in this space have their x and y coordinates within the range [0, 1] (same as raster space, but rather than running over [0, width] along the x-axis and [0, height] along the y-axis, both coordinates run from 0 to 1). In OpenGL, points expressed in this space have their x, y, and z coordinates within the range [-1, 1], regardless of the image aspect ratio. The range [-aspect ratio, aspect ratio] along the x-axis and [-1, 1] along the y-axis (assuming the width of the image is greater than its height) applies to the screen space used in this lesson, not to OpenGL's NDC space.
Raster Space	In the RenderMan specifications, it refers to the position of a point expressed in pixel coordinates (same as the screen space or window coordinates in OpenGL). In this space, 1 unit corresponds to 1 pixel (see NDC space).
Clipping Coordinates	Coordinates of the projected points before the z divide (OpenGL only).
Window Coordinates	Coordinates of the projected points in pixel coordinates (OpenGL only). Similar to raster space in RenderMan.
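Because the two NDC conventions differ only by a remapping, converting between them is straightforward. The sketch below uses hypothetical function names and assumes that the [0, 1] convention has its origin at the top-left corner with y pointing down (like raster space), while the [-1, 1] convention has y pointing up.

```python
def raster_to_ndc_renderman(px, py, width, height):
    """Raster space ([0, width] x [0, height]) to RenderMan-style NDC ([0, 1] x [0, 1])."""
    return px / width, py / height

def ndc_renderman_to_opengl(x, y):
    """RenderMan-style NDC ([0, 1], y down) to OpenGL-style NDC ([-1, 1], y up)."""
    return 2.0 * x - 1.0, 1.0 - 2.0 * y

def ndc_opengl_to_renderman(x, y):
    """OpenGL-style NDC ([-1, 1], y up) to RenderMan-style NDC ([0, 1], y down)."""
    return (x + 1.0) * 0.5, (1.0 - y) * 0.5

# The center of a 640x480 frame maps to (0.5, 0.5) in one convention
# and to (0.0, 0.0) in the other.
center = ndc_renderman_to_opengl(*raster_to_ndc_renderman(320, 240, 640, 480))

# The top-left corner maps to (0, 0) and (-1, 1) respectively.
top_left = ndc_renderman_to_opengl(0.0, 0.0)
```

When reading code or documentation that mentions NDC space, checking which of these two ranges is meant avoids an off-by-a-remap error.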

In conclusion, we have shown that the two systems described do not do the same thing and should not be confused. The system we have described in the previous chapters converts points defined in pixel coordinates to points on the image plane, which are used to compute ray directions. The second system, the OpenGL vertex transformation pipeline, projects points from the 3D scene onto the image plane and converts these projected points to pixel coordinates.
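That the two systems run in opposite directions can be checked with a small round trip: build the camera-space ray through a pixel, pick any point along that ray, project it back, and recover the original pixel. The sketch below is a simplified illustration with hypothetical function names. It assumes a camera looking down the negative z-axis, an image plane at z = -1, pixel centers at a 0.5 offset, and a top-left raster origin, and it replaces the projection matrix with the equivalent direct division by depth.

```python
import math

def pixel_to_camera_point(px, py, width, height, fov_deg):
    """Ray-tracing direction: raster -> NDC -> screen -> point on the image plane (z = -1)."""
    aspect = width / height
    scale = math.tan(math.radians(fov_deg) * 0.5)
    x = (2.0 * (px + 0.5) / width - 1.0) * aspect * scale
    y = (1.0 - 2.0 * (py + 0.5) / height) * scale
    return (x, y, -1.0)

def camera_point_to_pixel(p, width, height, fov_deg):
    """Rasterization direction: camera space -> perspective divide -> NDC [-1, 1] -> raster."""
    aspect = width / height
    scale = math.tan(math.radians(fov_deg) * 0.5)
    # Divide by the depth (-z) and undo the field-of-view and aspect-ratio scaling.
    ndc_x = p[0] / (-p[2] * scale * aspect)
    ndc_y = p[1] / (-p[2] * scale)
    return ((ndc_x + 1.0) * 0.5 * width, (1.0 - ndc_y) * 0.5 * height)

width, height, fov = 640, 480, 60.0
pixel = (123, 45)

# Point on the image plane hit by the ray through the pixel center.
on_plane = pixel_to_camera_point(*pixel, width, height, fov)

# Any point further along the same ray projects back to the same pixel.
t = 7.5
far_point = tuple(t * c for c in on_plane)
raster = camera_point_to_pixel(far_point, width, height, fov)
recovered = (math.floor(raster[0]), math.floor(raster[1]))
```

Up to floating-point rounding, `raster` is the center of the starting pixel, (123.5, 45.5), so `recovered` equals the original pixel coordinates.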

A Word of Warning

Coordinate systems with standard names, such as screen space and NDC space, appear both in the OpenGL vertex transformation pipeline and in the RenderMan Interface; however, their usage and definition differ between these two rendering APIs. If you are familiar with the OpenGL rendering pipeline, we recommend that you read the next chapter, in which we explain the difference between the two rendering interfaces.