Friday, January 20, 2012

Sample Distribution Shadow Maps

Choosing the right space in which to perform calculations on incoming data is very important. Sometimes it gives a significant performance increase or memory saving. There are many examples of such well-chosen spaces:
  •  Frustum culling may be performed in clip space instead of world space. This simple change gives a significant performance boost because it minimizes the number of operations and makes good use of CPU SIMD extensions such as SSE.
  •  Deferred shadow techniques save performance by skipping expensive filtering on distant splits, and at the same time save memory by resolving shadows for each split in turn (reusing the same shadow map several times per frame). This is possible because the calculation has moved from world space to screen space.
  •  Deferred shading and its variations can also save a lot of performance by eliminating overdraw in the lighting calculations. Again, this is possible because lighting moves from world space (or object space) to screen space.
The list could go on, and I'm sure you can think of other examples.
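A minimal Python sketch of the clip-space culling idea from the first example. It assumes an OpenGL-style clip volume (-w <= x, y, z <= w), column vectors and a row-major 4x4 view-projection matrix; the function names are illustrative only. The box corners are transformed once, and each frustum plane then becomes a plain comparison against ±w, which is the kind of branch-free arithmetic that maps well onto SSE in a native implementation:

```python
from itertools import product


def transform(m, p):
    """Multiply a row-major 4x4 matrix by the point (x, y, z, 1)."""
    x, y, z = p
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in m)


def aabb_outside_frustum(view_proj, box_min, box_max):
    """True if the world-space box is certainly outside the view frustum."""
    # Move all 8 corners of the box into clip space once.
    corners = [transform(view_proj, c) for c in product(*zip(box_min, box_max))]
    # In clip space every frustum plane is a plain comparison against +-w.
    outside_plane = (
        lambda c: c[0] < -c[3], lambda c: c[0] > c[3],  # left, right
        lambda c: c[1] < -c[3], lambda c: c[1] > c[3],  # bottom, top
        lambda c: c[2] < -c[3], lambda c: c[2] > c[3],  # near, far
    )
    # Reject the box only if all corners are outside the same plane.
    return any(all(test(c) for c in corners) for test in outside_plane)


identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(aabb_outside_frustum(identity, (2, 2, 2), (3, 3, 3)))       # True
print(aabb_outside_frustum(identity, (0.5, -3, 0), (2, 3, 0.5)))  # False
```

The test is conservative: a box is rejected only when all of its corners lie outside one plane, so a few invisible boxes near the frustum edges still pass.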

Recently I read a paper about the Sample Distribution Shadow Maps (SDSM) technique, which aims to improve shadow mapping quality and does so in a very ingenious way. The authors work in the light's clip space in order to use as much of the useful shadow map area as possible. Their work is based on the Cascaded Shadow Maps (CSM) technique, but instead of manually selecting a partition scheme for the camera frustum, as the Parallel Split Shadow Maps (PSSM) technique does, they use information about the distribution of shadow samples in the light's clip space.
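The core idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the camera depth buffer has already been turned into samples of the form (view depth, light x, light y), splits the sampled depth range [z_min, z_max] logarithmically (a PSSM-style formula applied to the tight sampled range instead of the full near/far range), and then takes the bounding rectangle of the samples that fall into each split. The function names are hypothetical; on the GPU these min/max steps would normally be parallel reductions.

```python
from bisect import bisect_right


def log_split_depths(z_min, z_max, count):
    """Logarithmic split distances from z_min to z_max (both must be > 0)."""
    ratio = z_max / z_min
    return [z_min * ratio ** (i / count) for i in range(count + 1)]


def sample_distribution_bounds(samples, count=4):
    """Return (split_depths, rects) for a non-empty list of samples.

    samples: (view_depth, light_x, light_y) tuples, i.e. visible camera pixels
    already projected into light clip space (an assumption of this sketch).
    rects[i] is the tight (x_min, y_min, x_max, y_max) rectangle of split i,
    or None when no sample falls into that split.
    """
    z_min = min(s[0] for s in samples)
    z_max = max(s[0] for s in samples)
    splits = log_split_depths(z_min, z_max, count)
    rects = [None] * count
    for z, x, y in samples:
        # Index of the split containing z; z_max itself goes to the last split.
        i = min(bisect_right(splits, z) - 1, count - 1)
        r = rects[i]
        if r is None:
            rects[i] = (x, y, x, y)
        else:
            rects[i] = (min(r[0], x), min(r[1], y), max(r[2], x), max(r[3], y))
    return splits, rects
```

Because both the depth range and the rectangles come from the samples themselves, no split parameters have to be tuned by hand, and empty splits can be skipped entirely.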
In the image you can see the shadow sample distribution (orange) in light clip space and the tight bounds selected for each split (red, green, yellow and blue, respectively).
By doing this they can greatly reduce the size of the AABB that encloses the objects to be drawn into each shadow map. This not only removes the need to hand-tune split scheme parameters, but also significantly improves the utilization of each shadow map, and often even improves performance, because the AABB of the objects that must be drawn into the shadow map is usually smaller. Moreover, the algorithm can skip drawing occluded objects into the shadow map, because the corresponding shadow map pixels are not needed during the shadow resolving process.
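To illustrate why tighter bounds also reduce the number of objects drawn, here is a hypothetical caster-selection helper. It assumes each object already has an AABB expressed in light clip space and keeps only the objects whose x/y footprint overlaps a split's rectangle; depth is deliberately not tested in this sketch, since anything between the light and the receivers may still cast a shadow into the rectangle.

```python
def casters_for_split(objects, split_rect):
    """Names of objects whose light-space x/y footprint overlaps split_rect.

    objects: dict mapping a name to a light-clip-space AABB
             (x_min, y_min, z_min, x_max, y_max, z_max).
    split_rect: (x_min, y_min, x_max, y_max) tight bounds of one split, or
                None if no visible sample fell into that split.
    Depth is intentionally ignored: an object anywhere between the light and
    the receivers may still shadow the rectangle.
    """
    if split_rect is None:
        return []
    rx0, ry0, rx1, ry1 = split_rect
    return [name
            for name, (x0, y0, _z0, x1, y1, _z1) in objects.items()
            # Standard interval-overlap test on both axes.
            if x0 <= rx1 and x1 >= rx0 and y0 <= ry1 and y1 >= ry0]


scene = {"tree": (0, 0, 0, 1, 1, 5), "rock": (3, 3, 0, 4, 4, 1),
         "wall": (-2, 0.5, 0, -0.5, 0.8, 2)}
print(casters_for_split(scene, (-5, -5, 5, 5)))    # ['tree', 'rock', 'wall']
print(casters_for_split(scene, (0.5, 0.5, 2, 2)))  # ['tree']
```

With a loose rectangle every object survives; with a tight one only the objects that can actually shadow visible pixels remain.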

Of course there are some drawbacks:
  •  Shadow edges shimmer because the shadow volume changes every frame. The authors, however, argue that this is not a problem: with a tighter shadow volume and much more intelligent use of shadow map space, they achieve almost sub-pixel quality along the whole view frustum.
  •  Unfortunately, this technique makes CPU frustum culling impossible without the overhead of a CPU <-> GPU sync, because the split bounds are computed on the GPU from the depth buffer and would have to be read back. But I think that with fully hardware frustum and occlusion culling (like this) the problem will be almost solved.

3 comments:

  1. And what about offloading this entirely to the CPU, the way software occlusion culling is done, with half-space rasterization into a small buffer?

  2. SDSM uses the main camera's depth buffer to determine which shadow map pixels are actually needed during the real shadow resolving process (they call this the "sample distribution"). Then, based on this distribution, they create tight bounds for each split. In other words, they try to cover the distribution area with 4 rectangles (for a 4-split SDSM) so as to minimize wasted space. It's like covering an arbitrary 2D shape with 4 rectangles.

    I've attached a screenshot to the original post to make it easier to understand what is going on.

  3. I've thought about your idea a bit more, and now I think it's a really good idea. If our engine uses software occlusion culling (SOC), we will already have a filled occlusion buffer (OB) before rendering shadows, and if it is not too big (for instance 200x100), we can do all the work entirely on the CPU.
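    As a rough sketch of that CPU path, the helper below turns a small occlusion buffer into light-space samples that a split-bounds computation could then consume. It is hypothetical in every detail: it assumes the OB stores positive linear view-space depth per pixel in row-major order (row 0 at the top), that cleared pixels hold an "empty" value (infinity here), a symmetric perspective camera looking down -z, and a row-major 4x4 matrix mapping view space to light clip space. At 200x100 this is only 20,000 pixels per frame.

```python
import math


def ob_to_light_samples(depth, width, height, fov_y, aspect, view_to_light,
                        empty=math.inf):
    """Convert a small occlusion buffer into (view_depth, light_x, light_y).

    depth: row-major list of width*height positive linear view-space depths,
           row 0 at the top; pixels equal to `empty` (nothing drawn) are skipped.
    fov_y: vertical field of view in radians; aspect = view width / height.
    view_to_light: row-major 4x4 matrix from view space to light clip space.
    """
    tan_half = math.tan(fov_y / 2.0)
    samples = []
    for py in range(height):
        for px in range(width):
            d = depth[py * width + px]
            if d == empty:
                continue
            # Pixel centre in normalized device coordinates.
            ndc_x = 2.0 * (px + 0.5) / width - 1.0
            ndc_y = 1.0 - 2.0 * (py + 0.5) / height
            # View-space point on the ray through the pixel (camera looks down -z).
            v = (ndc_x * tan_half * aspect * d, ndc_y * tan_half * d, -d, 1.0)
            lx, ly, _lz, lw = (sum(m * c for m, c in zip(row, v))
                               for row in view_to_light)
            # Perspective divide (w stays 1 for an orthographic light matrix).
            samples.append((d, lx / lw, ly / lw))
    return samples
```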
