The little exception shop of horrors

This post first appeared on my old C++ blog. You might have seen it before.
I think by now we can all agree that exceptions are generally a good thing in C++. They allow us to separate the error handling from the general flow of control inside a program and also enable us to handle errors at the most appropriate point. More often than not, the best point to handle errors is quite far removed from the source of the exception.

Imagine my surprise when I came across the following gem that was almost worthy of the Daily WTF:

catch (std::string ex) {
  // Do something horrible with the exception
}
catch (const char *ex) {
  // Eh?
}
catch (int i) {
  // Uh oh
}

I guess in some circles, being able to throw any sort of POD or object as an exception is considered a good thing; it’s not something I consider a good idea. For the moment we’ll also gloss over the advisability of catching exceptions by value – that doesn’t mean I’m condoning that sort of coding, especially not when you’re catching types that might trigger memory allocations when copied.

But wait, it gets better. You thought that the code snippet above was part of a larger chunk of code that I omitted, didn’t you? Well, it actually wasn’t – it was all parked in a header file. Obviously in the interests of brevity I removed the code that actually handled the exceptions while preserving the full structure of the code in the header file.

So what the heck is going on here, then? It looks like in order to avoid duplication of code – always a worthy goal – the above header file got included wherever it was “needed”, so at least one of the project’s source files was full of code like this:

void doSomething() {
  try {
    // Lots of processing that might throw
  }
#include "catch_clauses.H"
}

Pass the Stroustrup, I need to refactor a developer or three…

Some Boost.Test notes

These are just a couple of notes for some neat tips and tricks I’ve discovered over the years when using Boost.Test. They may not be all that useful to everybody else but they’re the ones I tend to forget about and then end up rediscovering. I’m using most of these with recent versions of Boost and these were tested with 1.54.

  • You can nest test suites. I don’t use this feature and was surprised to discover it, but it makes a lot of sense if you’re dealing with large test suites that can be easily broken down into smaller logical chunks.
  • If you arrange the tests in test suites (which you should anyway), you can run a single test suite by using --run_test=TestSuite. You can also run a single test inside a test suite or a nested suite by giving the appropriate path to --run_test, like so: --run_test=TestSuite/TestName.
  • Boost.Test supports fixtures. Make lots of use of them to set up the environment for each of your tests instead of dumping a lot of initialization code into each test case. This keeps the test cases smaller and easier to read, and lets you follow the “let the first call in your test be the function you’re testing” mantra.
  • Fixtures are normal C++ structs/classes, so you can derive from them to create more specialized fixtures. For example, in a set of test suites used to test a fairly large set of COM APIs, I use a single fixture to initialize COM, then derive fixtures to set up the tests for each interface. This cuts down a lot on duplicated initialization code.
  • Members of a fixture can be used directly in your unit tests.
  • You can use the BOOST_CHECK/BOOST_REQUIRE macros in fixtures as well, although I would advise making fairly minimal use of that. It does help if you are trying to test what I call “flaky” pieces of code.

Make sure you’re using the right command processor when running Incredibuild

Quick hack/warning for those who use an alternative command line processor like TCC and also use Xoreax’ Incredibuild for distributed builds. Incredibuild is awesome, by the way, and if you have a larger C++ project that takes a long time to build, you should use it. And no, I’m not getting paid or receiving free stuff for writing that.

However, if you have to start your Visual Studio instance from the command line – because you need to set some environment variables first, or because of your general awesomeness – make sure you’re starting it from a stock Windows shell. Either the standard Windows command line (cmd.exe) or PowerShell will do nicely, thank you. If you start VS from TCC and have a couple of build tasks that shell out, Incredibuild wants to shell out into TCC to run these tasks, and the shelled-out tasks don’t seem to return control to Incredibuild again. Yes, I was too lazy to investigate further, as the method described above works.

C++ memory management – why does my object still hang around after its destructor was called?

I recently came across a discussion on LinkedIn where someone had run into memory related undefined behaviour. This prompted me to write this post as it’s a common, subtle and often not very well understood bug that’s easily introduced into C++ code.

Before we start, please keep in mind that what we’re talking about here is undefined behaviour in any standardized version of C++. You’re not guaranteed to see this behaviour in your particular environment and pretty much anything might happen, including your cat hitting the power button on your computer.

The behaviour I am describing in this blog post is fairly common though – especially on Windows – and it’s one of the harder problems to debug, especially if you haven’t encountered it before or it’s been so long you can’t remember it (lucky you).

Let’s assume you have the following code:

#include <cstring> // strncpy
#include <string>

class notMuch
{
public:
  notMuch(char const* str)
  {
    strncpy(m_str, str, sizeof(m_str));
  }

  char* getStr()
  {
    return &m_str[0];
  }

private:
  char m_str[315];
};

void oops()
{
  notMuch* nm = new notMuch("Oops");
  delete nm;
  std::string hmmm(nm->getStr());
}

Obviously this is a contrived example and decidedly bad code on multiple levels, but I constructed it like this to make a point.

Now assume that you are debugging the above code because you are trying to track down some odd memory behaviour. For some reason, everything works fine when you run the code under the debugger, even though there is clearly something wrong and you’d expect it to fail in pretty spectacular fashion. Only that it doesn’t. You run the code and everything seems to work, even though it is clearly accessing an invalid object.

In order to understand what is happening here, we need to delve a bit deeper into C++ memory handling.

In desktop and server OSs, the call to operator new will be handled by your runtime library’s memory manager. The memory manager will attempt to locate a suitably sized chunk of memory and return it to operator new, which then makes use of this memory. In a low memory condition, the runtime’s memory manager will request additional memory from the operating system; but as OS calls tend to be fairly expensive operations compared to checking an internal list, the runtime tries to minimize these calls and keeps the requested memory around even when it’s not in use. When the object’s destructor is called, it takes care of its business – not much in this case, as the object only contains an array of PODs, so the destructor is essentially a no-op – and then uses operator delete to “free” the memory.

Notice the quotation marks around “free”? That’s because in most cases, freeing the memory results in an entry in a list or bit field being updated. In the interest of performance, the memory isn’t being touched otherwise and now patiently waits to be reused, but the data in the chunk of memory remains unaltered until the memory chunk is being reused. The pointer to the object nm is still pointing to the (now freed) memory location. It’s invalid, but neither the runtime nor the user can determine that by simply looking at the pointer itself.

Unfortunately this means you now have an invalid pointer that’s pointing to an unusable object somewhere in the weeds that you shouldn’t be able to dereference. The trouble is that in a lot of implementations, you are still able to dereference the pointer and make use of the underlying object as if it hadn’t been deleted. In the example above, this works the majority of the time. If we don’t assume a multithreaded environment, there is no way the memory containing the invalid object will be touched, so it’s safe to access and the user ends up seeing the correct data.

So it’s bad, but not really, right?

Not so fast.

Let’s now assume that before our call to getStr(), we insert another function call that triggers a memory allocation. Depending on the size of the requested chunk of memory and the current availability of other chunks of memory, the memory manager will occasionally reuse the memory that was previously occupied by nm. Not all the time, but occasionally. And as already mentioned, “nm” is indistinguishable from a valid pointer. All of a sudden, the attempt to create the string “hmmm” results in the string containing garbage or the program crashing, but the behaviour cannot be reproduced consistently and the crashes seem to depend on the phase of the moon.

The big question is, what can we do about this problem?

First, using RAII is a much better idea than raw pointer slinging. The above example contains a good reason why – if nm had been handled in RAII fashion, by using a std::shared_ptr or by simply allocating the object on the stack, the string hmmm would have been constructed while nm was still in scope and a valid object.

Second, a lot of developers advocate nullptr’ing nm after the delete, at least in debug mode. That’s a valid approach and it will allow you to see the problem in the debugger, but it relies on the weakest link – the one located between the chair and the keyboard – to apply this pattern consistently. In any reasonably sized code base one can pretty much expect that someone will forget to apply the pattern in a critical piece of code, with predictable late night debugging sessions being the result.

Third, the use of a good memory debugger like Valgrind or Rational Purify will catch this issue and a whole raft of other, similar issues like double deletes, buffer overruns, memory leaks etc. If you’re writing production C++ code and you care about the quality of your work, you need to use one of these tools.

Visual Studio 2012 C++ Variadic Template support in the runtime library

As VS2012’s C++ compiler doesn’t support “true” variadic templates, the new runtime library classes that use variadic templates are implemented using macro magic behind the scenes. In order to get the “variadic” templates to accept more than the default of five parameters, you’ll have to set _VARIADIC_MAX to the desired maximum number of parameters (between five and ten).

For more information, see the “faux variadics” section of this blog post on MSDN.

If you’re using boost::variant, you need to have a look at Boost 1.53

I was profiling some code a while ago that makes extensive use of boost::variant and one of the lessons from the profiler run was that boost variants appear to be fairly expensive to construct and copy.

As of 1.53, variants support rvalue constructors and rvalue assignment operators. My initial measurements suggest that when used with types that are “move enabled”, there is a benefit in upgrading to this version of boost variant, both in performance and memory consumption.

I don’t want to see another “using namespace xxx;” in a header file ever again

There, I’ve said it. No tiptoeing around.

As a senior developer/team lead, I get involved in hiring new team members and in certain cases also help out other teams with interviewing people. As part of the interview process, candidates are usually asked to write code, so I review a lot of code submissions. One trend I noticed with recent C++ code submissions is that the first line I encounter in *any* header file is

using namespace std;

If we happen to review the code submission using our code review system (a practice I’d highly recommend), the above line is usually followed by a comment along the lines of “Timo is not going to like this”. And they’re right, I don’t.

So, why am I convinced that this is really, really bad practice, despite a ton of (maybe not very good) C++ textbooks containing the above piece of code verbatim? Let’s review for a moment what the above statement does. Basically, it pulls the whole contents of the namespace “std” (or whatever other namespace the author named in the using directive) into the current namespace, with no exceptions. And I mean *everything*, not just the one or two classes/typedefs/templates you’re trying to use. Now, the reason that namespaces were introduced in the first place is to improve modularization and reduce the chances of naming conflicts. They basically allow you to write the following code and ensure that the compiler picks the correct implementations:

std::vector<std::string> names;
my_cool_reimplementation::vector<our_internal_stuff::string> othernames;

Now assume that we’re trying to reduce the amount of typing and put the above using statement in the code (or worse, use both namespaces) and write the following:

vector<string> names;
vector<our_internal_stuff::string> othernames;

If the author of the code is very lucky, the correct implementation of vector will be picked, at least initially. But some time down the road, you’ll encounter strange compiler errors. Good luck finding those – I’ve been in situations where it took days to track down this sort of problem. That’s a heck of a timewaster you bought yourself to save typing five characters.

Also, if you’re putting the using directive into a header file, you’re aggravating the problem, because the conflicts you will run into sooner or later will show up in a module far, far away for no apparent reason – until you find out that three layers down, one include file happens to include the file containing the using directive, and it suddenly polluted whichever namespace that file was in with the whole contents of namespace std.

So why is using namespace std; found in so many textbooks? My theory is that it does reduce visual clutter and also helps with the layout of the book. In a dead tree book you only have a very limited amount of space, so you need to make the most of it. Code examples are generally fairly trivial – well, they have to be if you want to get the point across. Neither of these constraints really holds true when it comes to production code in the age of 24″ monitors and text editors that can handle more than 60-80 characters per line (try it, it works!). So, don’t do it.

So, what can you do if you absolutely, positively have to use a using declaration in a header file? There are ways to reduce the impact of having to do so – you can use one of the following techniques, or combine them.

First, you can simply use a typedef. I would suggest that this is good practice anyway, even if I don’t always follow my own advice. Using a typedef actually has two benefits – it makes a typename more readable, and it documents the author’s intent if a well-chosen name has been used. Compare the following declarations:

std::map<std::string, long> clientLocations;

typedef std::map<std::string, long> ClientNameToZip;
ClientNameToZip clientLocations;

The second declaration – even though it spans two lines – is much more self-documenting than the first, and it gets rid of the namespace fuzz at the same time.

Another option is to limit the impact of the using statement in two ways. First, by only “using” the names of the symbols you actually need, for example:

using std::string;

Again, just throwing this statement into a header file is almost as bad an idea as putting in “using namespace”, so the second way is to limit its visibility by making use of C++ scopes, ensuring that your using declaration only affects the parts of the code that actually need it. For example, say we have the following:

namespace bar
{
  struct zzz
  {
    …
  };
}

Unfortunately, what you then can’t do is simply pull the namespace into a class definition in a header file:

class foo
{
  using namespace bar;

  zzz m_snooze; // Doesn't pull in bar::zzz, would be nice though
};

What you can do, if the design of your code allows for it, is to pull the ‘bar’ namespace into the namespace that contains class foo, like so:

namespace yxxzzz
{
  using namespace bar;

  class foo
  {
    zzz m_snooze;
  };
}

This style has the advantage that you’re “only” cluttering a single namespace with your using directive rather than the whole global namespace. At least this way, you are limiting possible naming conflicts to a single namespace. But it’s still a headache if you are using the, err, using directive in a header file this way.

Another way is to limit the scope of a using directive to a single function, for example like this:

void temp()
{
  using namespace std;
  string test = "fooBar";
}

This works in both header and implementation files.

Actually, I would write the above code like this, because there is no good reason (but plenty of bad ones) to pull in the whole std namespace if you only need part of it.

void temp()
{
  using std::string;
  string test = "fooBar";
}

In either case, you are restricting the visibility of a using directive to the part of the code that requires it rather than throwing it right into everybody’s way. The bigger your projects get, the more important it is to ensure proper modularisation and minimise unintended, harmful side effects.

If you want to remove a (C++) project from a Visual Studio 2010 solution

… make sure that you have removed all dependencies on the project that you are about to remove before you remove the project from the solution.

If you don’t, the projects that still depend on the project you just removed will retain the dependencies, but those dependencies become invisible, and the only way to rid yourself of the “phantom dependencies” is to edit the affected vcxproj files with a text editor and remove the dependency entries manually.
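For reference, the stale entry you’re hunting for looks roughly like this (project name and GUID are placeholders, and depending on how the dependency was originally set up it may instead live in the .sln file’s ProjectDependencies section):

```xml
<!-- Inside the dependent project's .vcxproj -->
<ItemGroup>
  <ProjectReference Include="..\RemovedProject\RemovedProject.vcxproj">
    <Project>{00000000-0000-0000-0000-000000000000}</Project>
  </ProjectReference>
</ItemGroup>
```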

Don’t ask how long it took me to figure that out.