Splat (2024)

source code

A simple and not very optimized Gaussian splat renderer.

Developed in the early days of a university project, in order to help me understand how Gaussian splats work.


Notes    

Below, I’ve included the notes I took on the math behind 3DGS. They’re not very fleshed out, but I may as well just put them here so they don’t sit alone in some private git repo :D

Step 1: Reading the Data    

The center of each Gaussian is given as 3D coordinates \(\vec c\):

\[\vec c = \begin{bmatrix} x & y & z \end{bmatrix}.\]

The point_cloud.ply files store scale and rotation as a vector and quaternion, respectively:

\[\vec s = (s_x, s_y, s_z)^T,\quad q = (x, y, z, w).\]

From these, we can obtain \(3\times 3\) scale and rotation matrices, \(S\) and \(R\) respectively. For splatting, we are interested in the covariance matrix \(\Sigma\), which describes how the Gaussian distribution falls off from the center outwards – this is kinda like a three-dimensional version of what the variance \(\sigma\) is for a 1D Gaussian distribution.

The covariance \(\Sigma\) can be obtained as the result of applying both the rotation \(R\) and scaling \(S\):

\[\Sigma = RSS^T R^T = (RS)(RS)^T.\]

Note that since \(\Sigma\) is the result of a multiplication of a matrix \(RS\) with its transpose \((RS)^T\), it is symmetric, i.e. of some form

\[\Sigma = \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}.\]

Color is stored using spherical harmonics – this is what makes color view-angle-dependent.

However, the first harmonic \(Y_{0}^{0}\) does not depend on the angles \(\theta\) and \(\varphi\), hence can be used to obtain a view-independent “base” color that, for a simple Gaussian renderer will look okay enough.

\[Y_{0}^{0}(\theta,\varphi)={\frac{1}{2}}\sqrt{\frac{1}{\pi}} \approx 0.2820947917.\]

The point_cloud.ply files store the coefficients as the properties f_dc_0 through f_dc_2. The colors can be obtained as such:

\[r = 0.5 * Y_{0}^{0} * \texttt{f\\\_dc\\\_0}\] \[g = 0.5 * Y_{0}^{0} * \texttt{f\\\_dc\\\_1}\] \[b = 0.5 * Y_{0}^{0} * \texttt{f\\\_dc\\\_2}\]

The opacity value is stored as the opacity property, where we obtain \(\alpha\) like:

\[\alpha = \left(1 + e^{-\texttt{opacity}} \right)^{-1}\]

Note that this way, the values \((r,g,b,\alpha)\) are between \(0\) and \(1\).

Step 2: Rendering the Thing    

The overall approach is this: Given our 3D covariance matrix \(\Sigma\), as well as the view matrix \(W\) and the projection transform, find the 2D covariance matrix \(\Sigma'\) that approximates the projection of the 3D Gaussian onto the screen as a 2D Gaussian.

Typically, we’re given a \(4\times 4\) model-view matrix \(W'\). To work with the \(3\times 3\) matrix \(\Sigma\), we’ll take the upper left corner of \(W'\) to get a \(3\times 3\) matrix \(W\), thereby disregarding translation.

We’ll also want to perform projection, for which we have the \(4\times 4\) projection matrix \(P\). This transformation is, however, not affine, i.e. we can’t just take the upper left corner here. Instead, the non-affine projection is approximated as follows:

\[\vec v_p + J(\vec v - \vec v_c)\]

This is using the first two terms of the Taylor expansion, and akin to the 1D case of approximating some function \(f(x)\) near a point \(x_0\) as \(f(x) \approx f(x_0) + f'(x_0)(x-x_0)\). Here, we’re approximating near the Gaussian’s center \(v_c\). The center, projected into view space is \(v_p\), equivalent to the \(f(x_0)\) term. For what would be the derivative \(f'\) in the 1D case, we’re using the Jacobian \(J\).

For projecting, we have the following transformation of a point \(\vec u\) onto the near plane at \(n\):

\[\begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix} = \mathbf m (\vec u) = \begin{bmatrix} -n u_0 / u_2 \\ -n u_1 / u_2 \\ || (u_0, u_1, u_2)^T || \end{bmatrix}\]

The Jacobian \(J_{\mathbf m}\) of the function \(\mathbf m\) is defined as

\[J_{\mathbf m} = \begin{bmatrix} \displaystyle\frac{\partial \mathbf m}{\partial u_0} & \displaystyle\frac{\partial \mathbf m}{\partial u_1} & \displaystyle\frac{\partial \mathbf m}{\partial u_2} \end{bmatrix}.\]

This yields:

\[J_{\mathbf m} = \begin{bmatrix} -n/u_2 & 0 & u_0 / ||\vec u|| \\ 0 & -n/u_2 & u_1 / ||\vec u|| \\ u_0 / u_2^2 & u_1 / u_2^2 & u_2 / ||\vec u|| \end{bmatrix}\]

The \(2\times 2\) covariance matrix in camera space is obtained by calculating

\[\Sigma' = J W \Sigma (JW)^T\]

where we discard the third row and column.

We can then get the eigenvalues \(\lambda_{1/2}\) and eigenvectors \(\vec e_{1/2}\) of \(\Sigma'\). The vectors \(\vec b_i = \sqrt{2\lambda_i} \vec e_i\) span a screen-space quad corresponding to the projection of the Gaussian.