Depth peeling optimizations

Performance tuning and new optimizations for vtkDualDepthPeelingPass.

This branch adds the following optimizations to the new peeling code:

  • Delayed occlusion queries: Rather than checking the occlusion ratio after every pass (which causes a full pipeline stall), a vtkOpenGLOcclusionQueryQueue object tracks the occlusion queries. The queries are only checked after a significant number of passes has been rendered. For example, they are checked after the number of passes that was needed to complete the last frame, since depth complexity typically varies little between frames.

  • Depth complexity analysis: During the pre-peeling initialization pass through the geometry, the stencil buffer is used to count the number of non-occluded translucent fragments that will be rendered to each pixel. This information is asynchronously transferred to system memory while the first few peeling passes occur. Once it is available, it is inspected to determine the exact number of passes needed to fully process the scene.

  • Stenciled fullscreen blend passes: At several points during peeling, fullscreen textures need to be blended to produce either intermediate or final renderings. These passes reuse the stencil buffer from the depth complexity analysis to restrict the blending operations to those pixels that should have fragments from the current peel layers.
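
The first two optimizations can be illustrated with a small CPU-only sketch. It does not use OpenGL or the actual vtkOpenGLOcclusionQueryQueue API; the names PassesFromStencilCounts, DelayedQuerySchedule, and SimulateFrame are hypothetical and exist only for this illustration. The sketch assumes that each dual depth peeling pass resolves two layers per pixel (one from the front, one from the back) and that, without history from a previous frame, queries are checked at a fixed interval. Neither assumption is confirmed by this branch's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: given per-pixel translucent fragment counts read back
// from the stencil buffer, estimate how many peeling passes are needed.
// Assumption: one dual peeling pass resolves two layers per pixel.
inline int PassesFromStencilCounts(const std::vector<std::uint8_t>& counts)
{
  if (counts.empty())
  {
    return 0;
  }
  const int maxLayers = *std::max_element(counts.begin(), counts.end());
  return (maxLayers + 1) / 2; // ceil(maxLayers / 2)
}

// Hypothetical schedule deciding after which passes the (stalling) occlusion
// query results are read back.
struct DelayedQuerySchedule
{
  int PassesLastFrame = 0; // 0 means no previous frame is known
  int CheckInterval = 4;   // assumed fallback interval without history

  // completedPasses is the 1-based number of passes rendered so far.
  bool ShouldCheck(int completedPasses) const
  {
    if (this->PassesLastFrame > 0)
    {
      // Skip all checks until the previous frame's pass count is reached.
      return completedPasses >= this->PassesLastFrame;
    }
    return completedPasses % this->CheckInterval == 0;
  }
};

struct FrameStats
{
  int Passes;      // peeling passes rendered
  int QueryChecks; // number of stalling query readbacks
};

// Simulates one frame in which passesNeeded passes fully peel the scene.
// Peeling stops at the first query check that observes completion.
inline FrameStats SimulateFrame(int passesNeeded, const DelayedQuerySchedule& schedule)
{
  FrameStats stats{ 0, 0 };
  for (;;)
  {
    ++stats.Passes;
    if (!schedule.ShouldCheck(stats.Passes))
    {
      continue; // no stall: keep peeling without reading the query
    }
    ++stats.QueryChecks;
    if (stats.Passes >= passesNeeded)
    {
      break;
    }
  }
  return stats;
}
```

In this model, a frame whose depth complexity matches the previous frame needs a single query readback instead of one per pass. The cost of delaying is visible in the no-history case, where a few redundant passes may be rendered before the next check.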

New depth peeling benchmarks show that these modifications significantly improve performance, with frame rate gains between 6.9% and 17.5% across the tests below. The benchmarks were run on an NVIDIA Quadro 2000M with the command:

TimingTests -regex ^(testname)$ -ss 8 -se 8

The test details and results (without --> with this branch's modifications) are shown below. Each measurement is the average of 5 trials.

Test name:                  DepthPeeling01
Depth Complexity:           2-6
Number of Triangles:        1M
Frame Rate (FPS):           85.4 --> 91.3 ( +6.9%)
First Frame Time (ms):      235  --> 223  ( -5.4%)
Subsequent Frame Time (ms): 11.7 --> 11.0 ( -6.4%)

Test name:                  DepthPeeling03
Depth Complexity:           8-12
Number of Triangles:        3M
Frame Rate (FPS):           20.1 --> 23.6 (+17.5%)
First Frame Time (ms):      261  --> 246  ( -5.7%)
Subsequent Frame Time (ms): 50   --> 42   (-14.9%)

Test name:                  DepthPeeling05
Depth Complexity:           14-20
Number of Triangles:        5M
Frame Rate (FPS):           9.15 --> 10.8 (+17.5%)
First Frame Time (ms):      304  --> 271  (-10.7%)
Subsequent Frame Time (ms): 109  --> 93   (-14.9%)

Test name:                  DepthPeeling10
Depth Complexity:           25-40
Number of Triangles:        10M
Frame Rate (FPS):           2.57 --> 2.84 (+10.4%)
First Frame Time (ms):      541  --> 357  (-33.9%)
Subsequent Frame Time (ms): 389  --> 353  ( -9.4%)