Article
files/profile

How to Optimize Virtual Function Calls in C++

Alex J. Champandard on July 31, 2007

High-level logic is typically full of indirection, so performance suffers when it’s used heavily. This is the case in game AI when you start adding more behaviors; a constant overhead for virtual calls quickly takes its toll.

So what do you do to optimize your engine, once the typical bottlenecks (i.e. collision queries, animation, path-finding) are taken care of? The best thing you can do is optimize your logic framework… and virtual functions are the primary target!

1) Prepare for Platform-Specific Optimizations

The first strategy is to abstract away frequent calls to virtual functions. (Yes, this means hiding slow indirection behind fast indirection!) You’ll have to do lots of experimenting to optimize for each platform, so it’s best to keep the performance sensitive code localized.

Make sure your most common observer dispatches, uses of the strategy patterns, or frequent virtual calls in general are hidden behind macros or ideally inline functions (these have no overhead in Release builds). You want to to be able make changes in very few places when optimizing.

2) Call Fewer Virtual Functions

If you have a large API, many virtual methods might use the empty default implementation, or may be called for very small operations. Even if it’s a nice modular interface, this isn’t ideal for speed.

  • Consider simplifying your API so all function implementations are useful and mandatory. If necessary, consolidate multiple virtual functions into one big function. (Keep the API the same, just move the point of indirection.)

  • In some cases where functions are called infrequently, it can be faster to have a conditional check of an empty pointer, then double dereferencing. (In the best case, this gives you a 200% speed increase.)

This last tricks strategically increases the level of indirection for performance, but it should be avoided for frequent calls.

MyObject& obj;
MyObject* ptr;
// Profile this:
obj.doOptionalStuff();
// Against this:
if (ptr != NULL) {
    ptr->doStuff();
}

3) Consider Memory Allocation

The performance hit for indirection is much lower when your objects are in the cache. You want to be allocating memory as wisely as possible. Objects that are called nearby in the code should be together in memory.

Preventing excessive cache misses isn’t always straightforward for logic built as a tree or graph. You’ll need to experiment with different strategies, but just having your own allocators for small virtual objects is a good start. Specifically, keep all the logic for actor AI in a small and well identified block of memory.

4) Use Compile-Time Indirection

In many cases, indirection is only necessary for ease of development. If that’s the case, you probably don’t need runtime indirection, and you can optimize it out using C++ tricks.

  • Consider using a template method pattern, where the implementation of the algorithm is a C++ template. This allows the compiler to optimize calls to functions. On the down side, it can bloat the code and it’s more awkward to write.

template 
class MyAlgorithm : public BASE {
    void process() {
        // Non-virtual call can be optimized.
        while (BASE::step()) {
        }
    }
}
  • Alternatively, consider helper classes in the form of the curiously recurring template pattern (CRTP). This the base class to access member functions of the derived, which helps optimizations.

5) Reduce Indirection

For certain functions types, it may also be faster to store a fast delegate. If a delegate points to a non-virtual class, this can be optimized better by certain compilers. Sadly, however, this varies greatly from one compiler to another.

// Self-contained member function pointer.
using namespace fastdelegate;
typedef FastDelegate()> ProcessStuff;

Summary

Now I’m supposed to say that premature optimization is the root of all evil. That may well be true, but if you don’t establish an efficient framework upfront, once you’re in production it’s usually too late to make drastic changes.

Professional AI developers take a few years and multiple attempts to get it right, just like dynamic programming mature and become more efficient… so do your homework and stay ahead of the pack!

Do you have any tricks to optimize the indirections in your AI logic?

Discussion 4 Comments

gware on July 31st, 2007

Not only virtual calls should be optimized :) Although I agree with the moto "early optimization is the root of all evil" I will also say that it's important to help the compiler understand what you want, so that he will the one to do the evil part ;) (or at least some of it) Anyway, here are some tricks I find useful : - avoid pointer to functions. They cannot be optimized so it's better to call directly the function than passing pointers to function when possible. (stl::for_each is a good exemple of things you should try to avoid since it uses a functor, and most of the time it's easier to loop on each object) - use the const keyword. Some compilers (on some architectures) treat variable differently if they are const or not, thus it helps them optimizing register use. - use the const keyword on functions. It helps the compiler detect if an object is modified or not, and it may be able to optimize generated code. - prefer : unsigned int const vect_size = myvect.size(); for (unsigned int index = 0; index < vect_size; ++index) instead of : for (unsigned int index = 0; index < myvect.size(); ++index) which can generate calls to size at each iteration (in case vect has been modified during the loop) - do not mix inline a virtual calls - try to avoid virtual methods in little objects, since there is an overhead for the vtable.

gware on August 1st, 2007

I forgot to mention the pimpl idiom (see wikipedia, GoF for "opaque pointers" or "pimpl") as a very efficient way to reduce indirections. From my experience, a lot of interfaces are created because of cross platform objectives. As an example: you have a FileSystem class that will abstract platform specific operations, and two platforms Win32 and Linux. No need to create an interface and make all methods go through a vtable indirection. From my point of view interfaces (virtual methods and such) are ONLY needed for run-time polymorphism. If during your execution all pointer to some interface points to the same kind of objects you probably do not want to use an interface. First check if policies are a good solution (see "Modern C++ design" by Andrei Alexandrescu), which is Alex's fourth point. Policies offers something great: users are able to change the implementation. Sometimes this is not what you want. For example, you may want to totally hide the internals from your users. This means they do not provide any template parameter( and do not need to rebuild if they use a provided typedef and you changed the implementation). This is where the pimpl idiom comes into play. By using a class wrapping an implementation pointer, you can provide your user a fully static interface with no virtual methods, and most of the time methods prone to inlining (thus no indirection at all). Also this can help reducing build time, since nobody except the wrapper class need to know the implementation : you do not need to rebuild every source using your implementation anymore.

alexjc on August 3rd, 2007

Wow, thanks Gabriel. That's some great optimization advice! I tried to cover the most important points about virtual functions, but I didn't realize I'd left so many tips out :-) Anyway, here are another two: [LIST] [*]Use simple data-types as return values if necessary (like integers, booleans, floats, etc.) [*]Keep the number of input parameters short (1 or 2 maximum) and keep them simple, in the same way as above. [/LIST] These tricks allow the compiler to use registers instead of the stack to pass around values, so it reduces the overhead of any function call. Alex

bknafla on February 19th, 2008

Another idea to optimize virtual function calls is to call them with a batch of data to work on. If, for example, a class offers a virtual function call to add elements to it also offer a call to add a number of elements at the same time. The virtual member function can batch process the elements and therefore amortizes the virtual function call. Cheers, Bjoern

If you'd like to add a comment or question on this page, simply log-in to the site. You can create an account from the sign-up page if necessary... It takes less than a minute!