Evil Stick Blog

DX9 Engine Design, part 1

by Evil Stick Man on May.20, 2009, under Game related, Technology

Note: A lot of this is based on http://ati.amd.com/developer/gdc/D3DTutorial3_Pipeline_Performance.pdf, which was a presentation given at GDC several years ago (I don’t know the exact year). It is based on DirectX 9, and while most of the info is probably still valid, it wouldn’t surprise me if there are several items that were no longer valid as of the advent of DirectX 10 and later. Most of what I say will be a basic thought exercise stemming from the presentation, coupled with my experiences working with a DX9 engine at my last position.

At the core of every 3D game is its graphics engine. The graphics engine determines what the player sees, which then determines how they react. It’s something that’s very easy to do wrong, and the specifics are often under heated debate both within a company and on the web. Below are some guidelines to use when designing a high-performance 3D engine, or when optimizing an existing engine. There won’t be any code from a working engine here, as I currently don’t have an engine utilizing these principles (it’s under development); these are tips I discovered on the job.

Know thine enemy

First, before we embark on any engine enhancement journey, we need to make sure we can actually pinpoint where our engine slowdown is originating from. There’s no point in streamlining your vertex information if what you perceive as slowdown is actually taking place in your AI or physics code. There are a number of ways to gauge overall engine performance, and determine at least the major causes of slowdown:

  • Keep ample statistics - the success of your optimization effort, as well as turnaround time on improvements, will hinge on your ability to judge the results of your changes quickly and accurately. One of the best ways to do this is to build a set of metrics into your engine. Proper encapsulation coupled with judicious use of time keeping will go a long way towards identifying problem areas. Encase your metrics code in “#ifdef __DEBUG” blocks, so it compiles out of release builds, and you can’t go wrong. At my last position, we often relied on these statistics to detect major problems both in the engine and in the engine management structure - for instance, if we weren’t releasing an image and were leaking memory to the video card, we could tell pretty quickly by watching a list of loaded images, coupled with instance counts.
  • Find some effective third-party tools - when I was tracking down engine speed blocks, both PIX for Windows (included with the DirectX SDK) and NVIDIA’s PerfHUD proved invaluable. PIX was great for tracking actual geometry issues - if something was coming out garbled, I could quickly use PIX to view the offending D3D call and the vertex/index buffers causing the trouble. PerfHUD, on the other hand, did a great job of providing metrics on the graphics pipeline itself. At a glance it’d give you load on your shader processors (vertex, pixel, and geometry), number of DIPs (DrawIndexedPrimitive calls), memory usage, triangles in a scene, and so on. The instruction manual for this tool is in excess of 40 pages, and it is packed full of useful ways to use it.
  • Profiling helps, but it depends on your profiler - I was able to obtain some info using Visual Studio’s profiler, but at a certain level of engine complexity it fails to be useful. Most of the discoveries I was able to make were largely due to our own engine metrics, as well as use of the aforementioned tools.
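To make the “judicious use of time keeping” idea a bit more concrete, here is a minimal sketch of a scoped timer that feeds a table of per-section statistics. FrameStats, ScopedTimer, and ENGINE_TIMED_SCOPE are names I made up for illustration, and std::chrono stands in for whatever clock your engine actually uses. The macro keys off the __DEBUG flag mentioned above; that is assumed to be a flag your own build defines (Visual Studio’s built-in debug macro is _DEBUG).

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical per-section counters; a real engine would track more.
struct SectionStats {
    long long calls = 0;
    double totalMs = 0.0;
};

// Accumulates timing samples by section name.
class FrameStats {
public:
    void record(const std::string& name, double ms) {
        assert(ms >= 0.0);
        SectionStats& s = sections_[name];
        ++s.calls;
        s.totalMs += ms;
    }

    // Returns zeroed stats for a section that was never recorded.
    SectionStats get(const std::string& name) const {
        auto it = sections_.find(name);
        return it == sections_.end() ? SectionStats() : it->second;
    }

private:
    std::map<std::string, SectionStats> sections_;
};

// RAII timer: measures from construction to destruction of the object.
class ScopedTimer {
public:
    ScopedTimer(FrameStats& stats, const char* name)
        : stats_(stats), name_(name), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - start_;
        stats_.record(name_, elapsed.count());
    }

private:
    FrameStats& stats_;
    const char* name_;
    std::chrono::steady_clock::time_point start_;
};

// Compiles to nothing unless the (assumed) __DEBUG flag is defined.
#ifdef __DEBUG
#define ENGINE_TIMED_SCOPE(stats, name) ScopedTimer engineScopedTimer((stats), (name))
#else
#define ENGINE_TIMED_SCOPE(stats, name) ((void)0)
#endif
```

Dropping ENGINE_TIMED_SCOPE(stats, "render"); at the top of a function would then time that function in debug builds only, and dumping the table once per second is usually enough to spot which section is growing.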

Basically, you’ve got two potential bottlenecks in your engine: the CPU, and the GPU. Maxing out one will leave the other waiting. 95% of games are CPU limited, so start there in your quest for optimization. You can tell pretty quickly if you’re CPU limited by watching the graphs in PerfHUD - if you’re seeing only 25% GPU usage and only sending out 100k triangles and you’re still running slow, odds are you’re CPU bound and no amount of rendering optimization will help you. In fact, contrary to popular wisdom, in these cases you want to do more work on the graphics card, not less, to take some load off your CPU.

Watch your creations and deletions

If you’re creating and deleting graphics objects during your game’s main loop, I highly recommend reorganizing your code structure. Creation and deletion are time-consuming operations, and can drastically impact performance. Your game should have, at least, a load and an unload phase that doesn’t impact play at all. There are exceptions to this (specifically when dealing with large, open worlds), but if your levels are small there should be no reason for you to constantly create and delete objects in your game loop. Store objects to be destroyed in a queue, and kill them when you have some extra time in your rendering (metrics in your engine help here).
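Here is a rough sketch of the “store objects to be destroyed in a queue” idea. DestroyQueue is a hypothetical class, not something from an existing engine. Each queued entry is just a callable that releases one resource; for a D3D object that would typically be a call to its Release() method, which is left out here so the sketch stays self-contained.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical deferred-destruction queue.
class DestroyQueue {
public:
    // Queue a destruction instead of performing it immediately.
    void defer(std::function<void()> destroy) {
        assert(destroy != nullptr);
        pending_.push_back(std::move(destroy));
    }

    // Runs queued destructions until the time budget is spent or the queue
    // is empty. Returns how many destructions ran.
    std::size_t flush(std::chrono::microseconds budget) {
        const auto deadline = std::chrono::steady_clock::now() + budget;
        std::size_t destroyed = 0;
        while (!pending_.empty() && std::chrono::steady_clock::now() < deadline) {
            runFront();
            ++destroyed;
        }
        return destroyed;
    }

    // For the unload phase, where there is no frame to protect.
    std::size_t flushAll() {
        std::size_t destroyed = 0;
        while (!pending_.empty()) {
            runFront();
            ++destroyed;
        }
        return destroyed;
    }

    std::size_t size() const { return pending_.size(); }

private:
    void runFront() {
        std::function<void()> destroy = std::move(pending_.front());
        pending_.pop_front();
        destroy();
    }

    std::deque<std::function<void()>> pending_;
};
```

During play you would call flush() with whatever time your frame metrics say is left over, and call flushAll() when a level unloads. Note that the budget is only checked between entries, so one slow destruction can still overrun it.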

Use all of your time

Set your presentation interval (the PresentationInterval field of D3DPRESENT_PARAMETERS) to D3DPRESENT_INTERVAL_IMMEDIATE, and take control of engine rendering yourself. Letting DirectX do vsync for you is nice, but when you use this your present calls will block until their 1/60th of a second is complete (at least in my testing that would happen). In a highly dynamic environment, such as a game engine, this will mean that you’re going to start dropping frames once your non-rendering code takes more than its allotted time to run, and that for a brief segment of time you’ll be effectively running at 1/2 your refresh rate.

Basically all you need here is a method to call “present” at a regular interval. You can either set a timer event to simply fire off a present call every 1/60th of a second (if you’re going for 60FPS), or, at the least, control the slowdown at a more granular level. By making some intelligent use of engine compartmentalization, you can also spread out your engine’s labor so that you minimize cyclic slowdown. Keep a queue of tasks that don’t necessarily need to be performed immediately, and then wait for some free time in your engine. Keep an overall frame time, and if you get to a draw call after having taken less time than expected, use that extra time to kill some objects, or load some objects, or do some poly sorting. How you end up specifically using this will depend on your engine architecture, but this’ll give you one more dial to play with when optimizing for performance.
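As a sketch of the bookkeeping this needs, the hypothetical FramePacer below tracks when the next present is due and how much slack is left before it. The class and its method names are my own invention, and the resynchronise-when-late policy is just one assumption among several reasonable ones. Time is passed in as a parameter rather than read from the clock inside the class, which keeps the logic easy to test.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical frame pacer: tells the main loop when to present and how much
// spare time is left for deferred work.
class FramePacer {
public:
    using Clock = std::chrono::steady_clock;

    FramePacer(Clock::time_point start, Clock::duration interval)
        : interval_(interval), nextPresent_(start + interval) {
        assert(interval > Clock::duration::zero());
    }

    // True once the next present is due.
    bool presentDue(Clock::time_point now) const { return now >= nextPresent_; }

    // Time left before the next present; zero if we are already late.
    Clock::duration slack(Clock::time_point now) const {
        return now < nextPresent_ ? nextPresent_ - now : Clock::duration::zero();
    }

    // Call right after presenting. If we fell behind by a frame or more,
    // resynchronise to "now" instead of trying to catch up.
    void onPresented(Clock::time_point now) {
        nextPresent_ += interval_;
        if (nextPresent_ <= now) {
            nextPresent_ = now + interval_;
        }
    }

private:
    Clock::duration interval_;
    Clock::time_point nextPresent_;
};
```

In the main loop, the value returned by slack() is the budget available for the deferred jobs mentioned above (killing objects, loading objects, poly sorting). Once presentDue() returns true, present the frame and call onPresented().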

The graphics card is a pipeline

One thing to keep in mind is that a graphics card, at its core, is largely just a big parallel-processing beast. It wants to do one thing, and do it repeatedly. Think of your graphics card as a water pipe - it pushes the water it gets along a fixed path, and does that very well. However, if you want to change that path you need to completely drain the pipe before you reconfigure things, which reduces the amount of water you can push through. With that analogy in mind, here are some sorting criteria for your objects that go beyond the standard “Sort objects front to back, and back to front for transparency” logic.

Avoid excessive shader changes

After you’ve sorted your polys front to back, try sorting your objects by shader. The vital distinction here is to avoid excessive switches between the programmable pipeline and the fixed function pipeline. A previous engine I worked on had cars that were made up of multiple objects split pretty evenly between shaded and unshaded. By organizing the game to draw all the shaded parts first and then draw all the fixed function stuff, we were able to get about 5 extra frames per second.

Avoid excessive state changes

State changes (such as texture state changes) are processed by the CPU, and not the graphics card. After you’ve sorted your objects by z-val (which you might try doing in the shader when you’re CPU-bound) and vertex shader, try sorting by texture stage state.

Avoid excessive texture changes

Sort your objects by the texture they use, if possible.
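The three sorting suggestions above can be folded into a single comparison. The sketch below is only one way to do it: DrawItem and its fields are hypothetical, and the priority order (pipeline type, then shader, then state, then texture, then depth) is an assumption you should check against your own metrics, since a scene with heavy overdraw may prefer depth to rank higher.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical description of one opaque draw call.
struct DrawItem {
    bool usesShader;          // programmable pipeline (true) or fixed function (false)
    std::uint32_t shaderId;   // which shader, if any
    std::uint32_t stateId;    // id for a unique combination of texture stage states
    std::uint32_t textureId;  // which texture is bound
    float depth;              // distance from the camera
};

// Shaded items first, then grouped by shader, state, and texture;
// front to back within each group.
void sortOpaque(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(), [](const DrawItem& a, const DrawItem& b) {
        return std::make_tuple(!a.usesShader, a.shaderId, a.stateId, a.textureId, a.depth) <
               std::make_tuple(!b.usesShader, b.shaderId, b.stateId, b.textureId, b.depth);
    });
}

// Counts how often consecutive items would force a shader or pipeline switch.
std::size_t countShaderSwitches(const std::vector<DrawItem>& items) {
    std::size_t switches = 0;
    for (std::size_t i = 1; i < items.size(); ++i) {
        if (items[i].usesShader != items[i - 1].usesShader ||
            items[i].shaderId != items[i - 1].shaderId) {
            ++switches;
        }
    }
    return switches;
}
```

Feeding a counter like countShaderSwitches into your engine metrics before and after sorting is a cheap way to confirm the sort is actually reducing switches. Transparent objects still need their own back-to-front pass and are not covered here.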

Decouple your rendering code from your game

At its core, the graphics card just wants to sit there and draw polys. It doesn’t care about your awesome physics, or your lifelike AI, it just wants to make pretty pictures. So try to separate your actual geometry data from your game logic as much as humanly possible. The ideal engine in my mind has the rendering code running on a separate thread, just spinning in a loop, flinging polys at the graphics card. Consider storing all of your geometry objects in a geometry manager, which handles all of your sorting and communication with the card. Return a pointer to your parent object if you need it, and you’re good to go. More on this in later posts.
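A very small sketch of the geometry manager idea follows. GeometryManager, its Handle type, and its methods are hypothetical, and the separate render thread is deliberately left out to keep the example short. The point is only the ownership split: game objects hold a handle, the manager owns the geometry and decides the draw order, and each entry keeps a pointer back to its parent object.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical geometry manager: owns the render-side data and hands out handles.
class GeometryManager {
public:
    using Handle = std::size_t;

    // Registers a piece of geometry; 'owner' points back at the game object.
    Handle add(std::uint32_t textureId, void* owner) {
        entries_.push_back(Entry{textureId, owner});
        return entries_.size() - 1;
    }

    // Lets rendering code reach the parent object when it really needs to.
    void* ownerOf(Handle h) const {
        assert(h < entries_.size());
        return entries_[h].owner;
    }

    // Decides the draw order without involving the game logic. Only texture
    // grouping is shown; a real manager would also issue the draw calls.
    std::vector<Handle> buildDrawList() const {
        std::vector<Handle> order(entries_.size());
        for (Handle h = 0; h < order.size(); ++h) {
            order[h] = h;
        }
        std::stable_sort(order.begin(), order.end(), [this](Handle a, Handle b) {
            return entries_[a].textureId < entries_[b].textureId;
        });
        return order;
    }

private:
    struct Entry {
        std::uint32_t textureId;
        void* owner;
    };
    std::vector<Entry> entries_;
};
```

Because the game only ever talks to the manager through add() and handles, moving buildDrawList() and the actual drawing onto a render thread later would not change the game-side code, although the manager would then need locking or double buffering.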

There’ll be more to come here, but I figure that’s enough for now. Watch for part 2 in the near future.
