C++ is bad: inheritance is counterintuitive: big_bad

big_bad_al

C++ is bad: inheritance is counterintuitive

Jul 08, 2008 20:49

In our latest installment of "why not to program in C++," let's take a look at a simple class hierarchy:
class Duck {
public:
Duck() {
InitializeMigratoryRoute();
}

// Default behavior is nonmigratory; this is overridden in the Mallard class.
virtual void InitializeMigratoryRoute();
}

class Mallard : public Duck {
public:
Mallard();

// Migrate south in the fall and north in the spring
virtual void InitializeMigratoryRoute();
}

class DonaldDuck : public Duck {
public:
DonaldDuck() {
outfit_.AddShirt(Outfit::Shirts::SAILOR);
outfit_.AddHat(Outfit::Hats::SAILOR);
}

private:
Outfit outfit_;
}You can assume this code compiles and runs (i.e., the Outfit class is defined, it implements AddShirt(), both versions of InitializeMigratoryRoute() are defined, etc). I would say this looks like perfectly reasonable code: its intention is clear and straightforward, and any sane language should have no trouble implementing this. [1] However, we're dealing with C++, which means there are always more problems, and in fact this code is not in the least alright. In particular, there are two major things wrong with it. Do you see what they are?

Take your time; try to find where this code goes awry.

Do you have some ideas about what's wrong? Let's take a look. First off, none of the Mallards will migrate. The first thing that happens in the Mallard constructor is a call to the Duck constructor, to set up the base class. This, in turn, calls InitializeMigratoryRoute(). However, since the Mallard object isn't created yet, it would be unsafe to call the overridden version (which might depend on some member variables in the Mallard object itself that haven't been constructed yet). The constructor therefore calls Duck::InitializeMigratoryRoute() instead of Mallard's version. It doesn't matter that InitializeMigratoryRoute() is a virtual function that was overridden in the Mallard class; if you construct the base class before constructing the derived class (which is how derived classes are constructed in C++), calling the derived version may rely on data that isn't created yet, and so the only safe thing to do is to call the implementation in the base class.

This leads to very nefarious bugs: any pure Ducks you create will function perfectly well, and the Mallard code is written correctly, yet it has the wrong behavior. In other words, the correctly implemented code has the wrong behavior while the incorrect code has the right behavior, and consequently it's really hard to find this kind of bug.

Incidentally, Java does this the other way around-Mallard::InitializeMigratoryRoute() would be called in the constructor, even though it may very well use data whose value hasn't been initialized (which is to say, the variables have automatically been set to 0/null/false/etc by the virtual machine, but their intended values haven't been stored because the code in Mallard's constructor hasn't run yet). I'm not convinced Java's way of doing things is any better than C++'s way; they both can have unexpected surprises if you're not on guard. Perhaps we should get in the habit of having initializers that are separate from the constructors, to avoid this issue entirely. Edit: if you use initializers, you can't have const variables. Getting this whole issue right is a tricky, subtle problem.

So there's the first thing wrong with that code. Any idea about the second problem? Any C++ programmer worth his or her salt knows about memory leaks. Do you see the memory leak in the above code? It doesn't matter that there isn't a new or malloc anywhere to be found; I assure you there's a memory leak there, just waiting to happen. If you need a hint, take a look at this code:
Duck* temp = new DonaldDuck();
delete temp;Do you see it? The destructor of Duck is not virtual, which means that when you delete a Duck pointer, you call ~Duck() as the destructor, even if you should be calling ~DonaldDuck(). Consequently, outfit_'s destructor is never called. If it dynamically allocated its own member variables, that memory will never be deallocated. This can be tricky to notice, since the crux of the problem is that the code you didn't write and can't see (Duck's destructor) breaks the code you didn't write and can't see (the internals of Outfit).

Any class with a virtual function should have a virtual destructor to avoid this issue, and these days g++/GCC even has a -Wnon-virtual-dtor flag that warns you if you have a non-virtual destructor in a class with virtual functions. There is even an argument to be made that all destructors should always be virtual, to proactively avoid this problem whether or not you anticipate subclasses. The only drawback is that it would add a vtable to the classes that don't have other virtual functions, and it adds an extra layer of indirection when calling the destructor. but unless you're writing really performance intensive code, the time needed to follow one extra pointer doesn't seem that onerous. The reason this isn't considered a good practice is that the benefit gotten is even smaller than the cost incurred; very few classes without virtual functions ever get subclassed. Nevertheless I think it's interesting to consider this idea, even if we reject it afterwards.

As always in my "C++ is bad" posts, the conclusion here is that even simple things in C++ can have very surprising outcomes, and to really understand what your code is doing requires knowledge of the underlying implementation that the compiler creates. I would much rather use a high-level language where I don't need to worry about these details. The behavior of your code should be as unsurprising as possible, and I don't think C++ passes this test.

[1] Sure, a good language shouldn't need to distinguish between public and private inheritance (inheritance should always be an IsA/public relationship, not a HasA/private relationship, which is really encapsulation), and a good language should also not require the virtual keyword. but I'll accept those problems as necessary evils of an old language.

software engineering, code, software, inheritance, cpp, cpp is bad