The Curiously Recurring Template Pattern

The CRTP is a C++ idiom which is both confusing when you read about it, but straightforward enough that you may have already used it without realizing it. Its kind of funny that way; a strange recursive templating technique.

I was working on some new system at work which involved some tedious class hierarchies in performance-critical components. While doing some research, I fell up on the CRTP wiki page while researching the enduring battle between the performance of polymorphic code and the complexities of modern CPU architectures.

Hold on a minute. Some background first.

Dynamic polymorphism is a language abstraction in which different types of objects can use generic interfaces, and the type of object (and thus, the nature of the work to be done) is determined at runtime. This is achieved in C++ though the use of virtual functions, which in turn have entries in a class virtual function tables. Because parent classes and sibling classes can share virtual function names, the purpose of a vtable is for a class to say “my implementation of function f() is located here.” When an object of a particular type is instantiated, the compiler throws into it a pointer to the vtable for the proper class. Class functions which are non-virtual do not require entries in a vtable because there is only one implementation and location of that function.

Static polymorphism is a language abstraction where different types of objects are determined at compile-time, and as a result do not require work at runtime to determine the type of the object or variable. As a result, class hierarchies built-in this fashion don’t require a vtable. This is where the CRTP comes in.

Say you have a class hierarchy like this:


class A
{
public:
  virtual void f () = 0;
  void g ();
};

class X : public A
{
public:
  void f ();
};

class Y : public A
{
public:
  void f ();
};

A is a parent class with a pure virtual function f(), with X and Y implementing A. A has an additional non-virtual function g(). Now say you have a function foo(A &a):


void
foo (A &a)
{
  a.f ();
}

When foo() calls A::f(), the following things happen:

  1. The object’s vptr (pointer to the class vtable) is used to retrieve the appropriate vtable
  2. The address of f() according to the vtable is loaded from an index into the vtable
  3. The program jumps to the loaded address

You can use the “-fdump-class-hierarchy” flag in g++ to generate the class hierarchy and vtable structure for your code. For the above code, the vtables look like this:

Vtable for A
A::_ZTV1A: 3u entries
0     (int (*)(...))0
8     (int (*)(...))(& _ZTI1A)
16    __cxa_pure_virtual

Vtable for X
X::_ZTV1X: 3u entries
0     (int (*)(...))0
8     (int (*)(...))(& _ZTI1X)
16    X::f

Vtable for Y
Y::_ZTV1Y: 3u entries
0     (int (*)(...))0
8     (int (*)(...))(& _ZTI1Y)
16    Y::f

You can see above entries in each class for the default constructor, the copy-constructor, and then an entry to the pure virtual function f(), with their offsets into the vtable. They’re eight bytes apart because its table of pointers. Note that A::g() does not appear in the vtable – this is because g() is a non-virtual function in A.

Obviously this requires a little more work than a non-virtual function, and there’s been research to determine what exactly the cost of this extra work is. As I mentioned earlier, there’s a lot of argument over the topic, and although it’s really no more expensive than de-referencing a pointer for every virtual function call, there are more subtle consequences regarding branch prediction, pipelining, etc that weigh in.

Regardless of all of that, if you have some very intensive code that uses lots of dynamic polymorphism, it’s going to cost you. Maybe not a lot, but some.

So what does the CRTP have to do with this? Well the goal is to resolve as much type information as possible at compile-time so that you can skip the vtable lookup step at runtime. We can do this by using templates inside the parent class definition. Now consider the following, revised class hierarchy:


template <typename Child>
class A<Child>
{
public:
  void f ();
  void g ();
};

class X : public A<X>
{
public:
  void f ();
};

class Y : public A<Y>
{
public:
  void f ();
};

Now we’ll implement A::f() as so:

template<typename Child>
void
A::f ()
{
  (static_cast<Child*> (this)) -> f ();
}

Ok, this probably need some explaining.

Note how the parent class uses a type Child as its templated typename. In typical uses of generic programming, this is not particularly unusual. Now, each of X and Y are defined as a subclass of A<Child>, where they pass their own class name as the templated type. This is confusing; however, totally syntactically legal. At compile-time, function Child::f() will get placed directly into A::f(), thus avoiding the need for a runtime vtable lookup. As a result, the virtual identifier is dropped from the parent class A, and now the vtables for all classes are empty. Thus, no runtime overhead.

What’s the tradeoff? Well, our function foo(A &a) from earlier no longer works, because A now requires a template parameter. You’d have to create overloaded functions or come up with some other way to deal with the fact that you’re using both A<X> and A<Y>. That’s the flexibility you lose with static polymorphism!

I said earlier that you may have used this pattern without even realizing it. This happened to me when I was working on the project I mentioned – and at the time I didn’t even consider it polymorphism (and thought I was getting away with a hack). Turns out it’s actually a recognized idiom. I guess C++ is like that.

Advertisements

One response to “The Curiously Recurring Template Pattern

  • Dave Raymer

    Great post, I like that you provided the research paper — I found that one independently a few days ago.

    With regard to the cost of virtual methods: I have done some preliminary testing with the type of blade used where I work for our high performance applications (I work for a company that handles millions of transactions in a day, blades are a hex-core latest generation processors with copious amounts of memory) and for GCC 4.7.2 with -03, we are seeing an median average of about 18.5% overhead for use of virtual methods vs. non-virtual methods. Thought some other folks might like a number.

    With CRTP, the overhead is 0 …

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: