Explanation of the paper 'View-warped Multi-view Soft Shadows for Local Area Lights'

UPDATE: after receiving a small correction from one of the authors of the paper, the article has been corrected.

In this post, I explain the intuition behind the paper 'View-warped Multi-view Soft Shadows for Local Area Lights'.

The paper describes a real-time technique for rendering soft shadows. Unlike a point light, an area-light does not cast a sharp shadow; it casts a soft shadow, as can be seen in the image below.

The yellow object is an area-light that emits light, and the blue object is an occluder that blocks it. The area-light casts a shadow on the flat plane below the occluder. A point from which every part of the area-light is visible is not in shadow. A point from which only part of the area-light is visible receives light from that part alone; it is in the penumbra, where the shadow fades gradually from dark to lit, as illustrated by the gradient at the bottom of the image. A point from which no part of the area-light is visible receives no light from it and is black; such points form the umbra. The image below illustrates soft shadows as implemented in a 3D rendering engine:

To implement soft shadows, we need to find a value in the range $[0,1]$ for each point on the plane: $1$ means the point is not in shadow, $0$ means the point is in the umbra, and values in between mean the point is in the penumbra. Multi-view rasterization (MVR) is an approach to rendering soft shadows that gives good quality. The idea is simple: pick a number of arbitrary points on the area-light, and render a shadow map from each of these points, as illustrated below:

The red, green, and blue triangles illustrate the shadow map frustums. Using these shadow maps, we can compute soft shadows for every point on the plane. For each point, we loop through all the shadow maps and check whether the point is visible from the point on the area-light that the shadow map was rendered from; this gives a binary result, either $0$ or $1$. If we take the average of all these results, we get a soft shadow. Points that are visible from many shadow maps, like P1, get an average close to $1$, which is exactly what we want. Points like P0 that are in the umbra are not visible from any shadow map, so P0 gets a value of $0$. Finally, points like P2 that are near the umbra are visible from only a few shadow maps, and so get a value close to $0$; again, exactly our desired outcome.
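
To make the averaging concrete, here is a minimal Python sketch in 2D. It is only an illustration under my own assumptions, not code from the paper: the scene, the sample count, and the names `segments_intersect` and `soft_shadow` are all made up, and a direct line-segment intersection test stands in for the shadow map lookup that a real renderer would perform.

```python
def segments_intersect(p, q, a, b):
    """Return True if the 2D segment p-q properly crosses the segment a-b."""
    def cross(o, u, v):
        # z-component of the cross product (u - o) x (v - o)
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

    d1 = cross(a, b, p)
    d2 = cross(a, b, q)
    d3 = cross(p, q, a)
    d4 = cross(p, q, b)
    # p and q lie on opposite sides of a-b, and a and b on opposite sides of p-q
    return d1 * d2 < 0 and d3 * d4 < 0


def soft_shadow(point, light_samples, occluder):
    """Average the binary visibility of `point` over all light samples.

    Each sample plays the role of one shadow map: the result is 1 if the
    sample can see the point and 0 if the occluder is in the way.
    """
    a, b = occluder
    visible = [0 if segments_intersect(point, s, a, b) else 1
               for s in light_samples]
    return sum(visible) / len(visible)


# Hypothetical scene: an area-light spanning x in [-1, 1] at height 4,
# sampled at 5 points, above an occluder spanning x in [-1, 1] at height 2.
light_samples = [(-1.0 + 0.5 * i, 4.0) for i in range(5)]
occluder = ((-1.0, 2.0), (1.0, 2.0))

for x in (0.0, 2.2, 10.0):
    print(x, soft_shadow((x, 0.0), light_samples, occluder))
```

In this setup, the point (0, 0) directly under the occluder is hidden from every sample, the far-away point (10, 0) sees every sample, and (2.2, 0) sees three of the five samples, so it lands in the penumbra.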

The main issue with MVR is that we have to render a lot of shadow maps to get good results. The paper presents an elegant solution to the problem: we convert our geometry into a point cloud, and reproject these points into all of our shadow map frustums. First, we make a frustum that entirely contains the area light source, as illustrated below.

As can be seen, the frustum entirely contains our yellow light source. We rasterize our geometry from the perspective of this frustum, and all geometry contained in the frustum is rasterized into fragments. Every rasterized fragment gives us a point, and we append that point to a buffer (in HLSL, an AppendStructuredBuffer). The image below illustrates the points we get from rasterizing a single triangle.