Posts Tagged ‘Optimization’

Shader Variant Stripping in Custom SRP

This week I’ve been looking at Shader Variant Stripping while using a Custom Scriptable Rendering Pipeline (SRP).

Last year I was working on a small project called “The Maze Where The Minotaur Lives” to create a Custom Scriptable Rendering Pipeline and learn more about Shaders and Rendering in Unity.

And on and off for the last year, I’ve been taking what I’ve learned to create another Custom SRP to refine my understanding and address issues I encountered during that project.

The Retro Rendering Pipeline pixelates Shadows, Specular Highlights, Reflections, and Refractions.

One of those issues was Shader Variants.

Unity’s Shader compiler automates the creation of every possible shader variant. For every keyword used in a “shader_feature” and “multi_compile” pragma, it generates a variation based on combinations of other shader features and multi-compile keywords.

This is great since it means you don’t have to manually create these variations yourself, which is very time-consuming. But what it does mean is that Unity will automatically create EVERY possible shader variant, whether they are used or not, which is very time-consuming. >: (

This leads to extremely long build times.

This isn’t too bad, since Unity will cache the shader compiler results during the first build so that it can be used in subsequent builds. But, when you’re working on a project where shader changes are frequent, it doesn’t help much. Requiring shaders to recompile, every single time.

What’s worse is Unity will bundle all the unused shader variants with your build, whether they are used or not, adding unnecessary bloat to your final build.

The initial build times I was experiencing on The Maze Where The Minotaur Lives were 2 hours+, creating 120,000+ shader variants for a single diffuse shader. It may have been more since it was some time ago.

Two hours is not something to be proud of and every small change to a shader meant torment.

At University (in 2007), when I was learning about animation, rendering, and compositing for Film. I would often hear the bleeding edge students gloating about having 3-4 hour render times for a single frame in 3DS Max.

They would turn on all the settings, Global Illumination, Ray Traced lighting, and anything else that online tutorials said would make it look pretty.

I never understood that.

Incidentally, my render times were 4 minutes per frame and looked as bad as theirs.

A single frame from my short animation from 2007, called Ninja Stars and was created in 3DS Max.

As an experiment, I wanted to see how long it would take to compile all the shaders when I removed a few optimizations.

This is what it looks like without any pragma optimizations on the Retro Diffuse shader in The Maze Where The Minotaur Lives.

And this is what it looks like trying to build it.

That’s a lot of shader variants.

To get it working again, I used the “vertex” and “fragment” suffixes to help narrow down which part of the shader the keywords should compile in. I also used the “local” suffix, to ensure that certain keywords only took place within their own shader, and not in combination with other shaders.

Adding the “local”, “vertex” and “fragment” suffixes to the pragma definitions on the same Retro Diffuse Shader.

And this is what the build looks like now.

On the left is the number of Vertex Shader variants (12,288), and on the right is the number of Fragment Shader variants (392,216).

It’s an improvement, but it would take hours to build and it also introduces many variants that will never be used. And for Android builds, these numbers double.

To optimize and strip this further, I wrote a preprocess that uses the IPreprocessShaders interface and used the rendering pipeline settings to help determine what shader variants could safely be left out.

This code snippet removes shader keywords related to Direction Shadows, removing “DIRECTIONAL_SHADOWS_ENABLED” and “_DIRECTION_PCF” keywords.

This is what the builds look like now.

Significantly fewer shader variants.

The stripping process can take some time depending on the number of variants, taking up to 10 minutes+ for some shaders. But it’s a huge reduction in build time, from hours to minutes. It also leaves out the shaders combinations that will never be used.

But when working on Shaders, 10 minutes is still a lot of time to test small changes.

So in the new Retro Rendering Pipeline, I wanted to improve it, focusing on removing unnecessary features and simplifying the Material and Rendering pipeline GUI.

On the left, is the rendering pipeline used in The Maze Where The Minotaur Lives. On the right is the new rendering pipeline.

The biggest change I made was combining all the Light’s Shadow Filtering into a single option for all Light types, which reduces the number of shader variants significantly. I also moved the Specular lighting model to be global instead of per material, simplifying the material interface to have a simple “Use Specular” check box.

The new Retro Diffuse shader during build time (without stripping) generates 42,000+ Fragment Variants, taking 1.5+ hours to compile. With the variant stripping preprocess, the number is down to 512, taking less than 5 minutes to compile.

The new Diffuse shader removes Shadow Filtering per light type and combines them into a single set of Keywords, reducing the number of variants.

The reduction in build time is very welcome.

If you’d like to know more, the Unity Blog posted a great article, “Stripping scriptable shader variants“, covering shader stripping in-depth. It’s worth a read if you’re working with Shaders and/or Custom Rendering Pipelines.

Visibility Culling for the Maze

This week I worked on optimization, focusing on visibility culling in the maze to reduce the amount of overdraw and learn where it fits into the Unity Scriptable Rendering Pipeline.

On the left, is how the maze looks to the player. On the right, is what is being drawn to the screen.

Since the maze is procedurally generated at runtime, I can’t use Unity’s in-built occlusion system to minimize the overdraw. That means I need to come up with my own solution to manage occlusion culling.

I started out by researching a few techniques for occlusion culling, looking at how classic rogue-like games do shadow casting in tile-based dungeons, and toying with the idea of some kind of specialized flood fill algorithm to figure out what should be visible.

It was interesting, but it all felt a little overkill.

So I ended up going with a simpler approach, which was raycasting and grid intersections.

The test scene I used to code the grid intersection test. The white box is the starting point, the blue line is the ray, the red points are intersections on the Z-axis, and the greens point are intersections on the X-axis.

Since it’s in first-person, I only needed to make sure that walls immediately in front of the camera were visible. And since the maze is generated on a grid, grid intersection made the most sense.

First, I cast a series of rays across the viewing angle of the camera like a fan, ignoring any up and down rotation. When the ray hits a wall, I convert the hit position into a grid coordinate, which I then keep in a “visible” list.

Sweeping from left to right inside the camera. Green lines indicate walls that have been detected for the first time to keep them visible.

I then use the line between the camera and the ray’s hit point to calculate where it intersects the maze grid, and calculate the visible cells along that line, adding any new cells to the “visible” list.

The walls detected by the maze culling system. Notice how some of the shadows don’t look correct.

I then expand the “visible” list to include neighboring cells. This ensures that shadows cast by walls on the opposite side don’t disappear and prevents creating light where there shouldn’t be any.

The detected walls expanded to it’s neighbours.

Any walls that aren’t detected by the rays, are turned off, or culled.

It worked for a stationary camera. But once I started moving around the maze there were holes where walls should be. This problem was especially noticeable near corners.

Some walls not being detected and culled, creating a gap in the middle.

To fix it, I modified the rays to fire from the side of the camera instead, which allows the culling system to see around the corners before the player does.

The modified ray cast to fix culling visibility around corners.

The problem still happens, but it’s less obvious.

No more gap in the walls.

With culling enabled, the amount of overdraw is significantly reduced. And because there is less to render, it also reduces the number of walls and objects that need to be rendered for shadow casting lights.

After a bit of tweaking and adjusting, I managed to increase the frame rate by 50 to 100 frames.

The overdraw image on the right now renders significantly less walls thanks to culling.

I originally learned the raycasting technique last year in a course by Gustavo Pezzi of Pikuma called “Raycasting Programming with C”. It’s a great course that goes over creating a classic Wolfenstein-style game. If you’re interested in that kind of thing, you should check it out at

There are still a lot of improvements I’d like to make to how the maze culling works, especially since it only culls the inside of the maze. But I’m not going to worry about it too much yet.

I’m not sure what I’ll be moving on to next, but I am hoping to start finishing this little project up. So I guess my focus next will be to finish it maybe?