Lisp-like filtered container views in C++

Lisp dialects like Clojure have a very rich set of algorithms that can present altered views on containers without modifying the data in the underlying container. This is very important in functional languages: data is immutable, so you would otherwise have to return copies of containers, which is costly even though the containers are optimised to share structure between versions. Having these algorithms available prevents unnecessary data copies. While I am not going into mutating algorithms in this post, the tradition of non-modifying algorithms that work on containers leads to an expressiveness that I often miss in multi-paradigm languages like C++. As an example I will show you how to use a filtered container view in C++ like you would in Clojure.
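
To give a flavour of what I mean, here is a minimal sketch using Boost.Range’s filtered adaptor – the data and the predicate are made up purely for illustration:

#include <iostream>
#include <vector>

#include <boost/range/adaptor/filtered.hpp>

bool is_even(int i) { return i % 2 == 0; }

int main() {
  std::vector<int> numbers = { 1, 2, 3, 4, 5, 6 };

  // filtered() provides a lazy, non-modifying view - nothing in 'numbers' is copied
  for (int n : numbers | boost::adaptors::filtered(is_even)) {
    std::cout << n << "\n";
  }
}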

An elegant way to extract keys from a C++ map

I’ve been doing a reasonable amount of Clojure development recently and, as with a lot of other Lisp dialects, have marveled at the ease of separately pulling out the keys and values from a map. This is a very common operation after all, but C++ only appears to support manual key or value extraction from a std::map.

Obviously the code isn’t hard to write. In C++11, the following function will return a vector of all keys in a map:

std::vector<std::string> extract_keys(std::map<std::string, std::string> const& input_map) {
  std::vector<std::string> retval;
  for (auto const& element : input_map) {
    retval.push_back(element.first);
  }
  return retval;
}

Same goes for the values:

std::vector<std::string> extract_values(std::map<std::string, std::string> const& input_map) {
  std::vector<std::string> retval;
  for (auto const& element : input_map) {
    retval.push_back(element.second);
  }
  return retval;
}

This being C++, we most likely don’t want to go and write the code every time, so we turn it into a set of templates:

template<typename TK, typename TV>
std::vector<TK> extract_keys(std::map<TK, TV> const& input_map) {
  std::vector<TK> retval;
  for (auto const& element : input_map) {
    retval.push_back(element.first);
  }
  return retval;
}

template<typename TK, typename TV>
std::vector<TV> extract_values(std::map<TK, TV> const& input_map) {
  std::vector<TV> retval;
  for (auto const& element : input_map) {
    retval.push_back(element.second);
  }
  return retval;
}

The code above is reasonably efficient under most circumstances. Yes, you can improve it by calling retval.reserve(input_map.size()) to make sure we don’t have to reallocate memory for the vector, and make some other small tweaks, but overall, the couple of lines above do the job.

They’re also a lot better than finding the above for loop in the middle of a function. That requires me to scratch my head for a couple of seconds before the usual dim light bulb appears over my head and tells me, “hey, someone’s trying to pull the keys out of a map”. Why is this?

Because just having the loop floating around in the middle of your or my code doesn’t document intent; all it documents is that, well, we have a loop that copies stuff. Even if you wrote the code yourself and come back to it six months later, it is likely to give you a bit of a pause until you remember what it does. Of course you can put a comment right next to it, but that is also a poor attempt at documenting intent. Not to mention that the comment will go out of date faster than the code itself will. Using a function with a descriptive name is much better as you can then read the code and the function name gives you a reasonable expectation of what the code is supposed to do.

Of course, if you use Boost, you can just write this piece of code using the Boost Range library (boost::copy lives in boost/range/algorithm/copy.hpp and the map_keys adaptor in boost/range/adaptor/map.hpp):

boost::copy(input_map | boost::adaptors::map_keys,
            std::back_inserter(output_vector));

It’s probably not going to do Good Things for your compile time, but it’s succinct, readable and documents intent.
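
If you want the values instead, the same pattern should work with the map_values adaptor from the same header:

boost::copy(input_map | boost::adaptors::map_values,
            std::back_inserter(output_vector));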

Symantec sold a C++ compiler?

Stuff you find that shows you’ve been around this programming business for a while:

CD of the Symantec C++ Windows + DOS development environment back from 1993
Original copy of Symantec C++ (née Zortech C++) 6.1

Most people these days are surprised that Symantec actually sold a C++ compiler at some point. I used this particular copy in my first, not very successful business venture – I started out using Walter Bright’s Zortech C++ compiler which eventually morphed into Symantec C++.

At the time, I preferred this compiler to Microsoft’s C++ compiler – it compiled considerably faster, even if the code it generated wasn’t quite as fast as the Microsoft compiler’s (and the code produced by that one was beaten hands down by the Watcom C/C++ compiler, which I also used at some point). All of the C++ compilers of that era were, err, slightly quirky and portability of the code between them – if it wasn’t plain C – was still a distant dream.

Symantec C++ was a bit of a niche product and eventually lost the popularity contest to the Microsoft C++/Borland C++ compiler duopoly, with companies like Watcom filling specialist niches.

I even found an old Dr Dobb’s article by Al Stevens in which he looks at the above compiler.

Useful regular expressions for searching C++ code with Visual Studio

I’m generally more of a grep person but sometimes it’s easier to just use the built-in search in Visual Studio, especially if you want to be able to restrict the search to parts of your Visual Studio solution. Visual Studio has pretty powerful search built in if you use regular expressions instead of the default text matching. Here are a couple of regexes to get you started:

Find all shared_ptr calls that use “new” instead of the recommended make_shared: shared_ptr<.+>\(new .+\)

Find all empty destructors – very useful if you want to remove them in C++11 code: ~.+\s+\{\s*\}
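
To illustrate what the second regex is after and why you’d want the destructors gone, here is a made-up before and after – in C++11 you can either delete the empty destructor outright or default it explicitly:

// what the regex finds - a user-provided destructor that does nothing
class widget_before {
public:
  ~widget_before() {}
};

// the C++11 version - either omit the destructor entirely or default it,
// which keeps the destructor trivial where the members allow it
class widget_after {
public:
  ~widget_after() = default;
};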

Checking C++ library versions during build time

In my previous post, I discussed various strategies for managing third party libraries. In this post I’ll discuss a couple of techniques you can use to ensure that a specific version of your source code will get compiled with the correct version of the required libraries.

Yes, you can rely on your package management tools to always deliver you the correct versions. If you’re a little more paranoid and/or have spent way too much time debugging problems stemming from mixing the wrong libraries, you may want to continue reading.

Suggestion #1 – use C++ compile time assertions to check library versions

This suggestion only works if your libraries have version numbers that are easily accessible as compile time constants. You can use something like BOOST_STATIC_ASSERT or C++11’s static_assert to do a compile time check of the version number against your expected version number. If the test fails, it’ll break the compilation so you get an immediate hint that there might be a problem.

The code for this could look something like this example:

First, in the header file the version number constant is defined:

...
const int libgbrmpzyyxx_version = 0x123;
...

The header file or source file pulling in all the version headers then checks that it’s pulled in the correct version:

#include "boost/static_assert.hpp"
#include "lib_to_check"

BOOST_STATIC_ASSERT(libgbrmpzyyxx_version == 0x123);

If you are so inclined and are using C++11, you can replace the BOOST_STATIC_ASSERT with the standard static_assert.
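
For example, using the same made-up version constant as above:

static_assert(libgbrmpzyyxx_version == 0x123,
              "wrong version of libgbrmpzyyxx - please update your third party libraries");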

My suggested approach would be to have a single header file in each module of your project that pulls in all relevant #include files from all required libraries. This file should also contain all the checks necessary to determine if the libraries that got pulled in have the correct version numbers. This way, having a compilation error in a single, well-named file (call it ‘libchecks.H’, for example) should immediately suggest to you that a library needs updating. If you keep the naming scheme consistent, a quick glance at the error message should provoke the right sort of “don’t make me think – I know what the problem is already” type response.
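
A sketch of what such a file might look like, using the C++11 static_assert spelling (substitute BOOST_STATIC_ASSERT if you need to) – the library names and version numbers are of course placeholders for whatever your project actually uses:

// libchecks.H - pulled in by every source file in this module.
// A compile error in here means "one of the third party libraries needs updating".
#pragma once

#include <boost/version.hpp>
#include "libgbrmpzyyxx/libgbrmpzyyxx.h"

static_assert(BOOST_VERSION == 105500,
              "this module expects Boost 1.55.0");
static_assert(libgbrmpzyyxx_version == 0x123,
              "this module expects libgbrmpzyyxx version 0x123");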

Suggestion #2 – use link failures to indicate library versioning problems

This is a variation on suggestion #1, only that instead of using a compile time check along the lines of BOOST_STATIC_ASSERT, your library contains a version-specific symbol which your code references. Obviously if the code references a symbol that doesn’t exist in the library, the link will fail and you’ll get a message which is relatively easy to parse for a human and still pinpoints the problem. The advantage of this method is that it works across languages – you can use it in plain C code when linking against a library that is implemented in C++, for example, or in C extension modules built for dynamic languages. Its main downside is that in case of a version mismatch, the build fails a lot later in the process and gives you the same information that you would have received using suggestion #1, only three cups of coffee later. That said, if your project builds fast enough, the difference in elapsed time between suggestions #1 & #2 might be negligible. On the other hand, if your build takes hours or days to complete, you really should try to make suggestion #1 work for you.

This suggestion relies on the fact that somewhere in the library code, a constant is defined that is externally visible (note the extern below – a plain const at namespace scope has internal linkage in C++), i.e.

...
extern const int libgbrmpzyyxx_1_23_version = 0;
...

And somewhere in your code, you try to access the above constant simply by referencing it – the extern declaration would normally come from the library’s header:

extern const int libgbrmpzyyxx_1_23_version;
int test_lib_version = libgbrmpzyyxx_1_23_version;

Suggestion #3 – use runtime checks

Sometimes, the only way to work out if you are using the right version of a library is a runtime check. This is unfortunate, especially if you have long build times, but if your library returns, say, a version string, this is the earliest point at which you can check that your project linked with or loaded the correct version. If you work a lot with shared libraries that are loaded dynamically at runtime, this is a worthwhile check anyway to ensure that both your build and runtime environments are consistent. If anything I would consider this an additional check to complement the ones described in suggestions #1 & #2. It also has the advantage that you can leave the check in the code you ship and thus detect a potential misconfiguration at the client end a lot more easily.
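
A sketch of what such a check might look like, assuming the library exposes some sort of version string at runtime – the function name and the version value here are entirely made up:

#include <cstring>
#include <stdexcept>
#include <string>

// stand-in for whatever your library actually provides to report its version
extern "C" const char* libgbrmpzyyxx_version_string();

void check_libgbrmpzyyxx_version() {
  const char* actual = libgbrmpzyyxx_version_string();
  if (std::strcmp(actual, "1.23") != 0) {
    throw std::runtime_error(
        std::string("expected libgbrmpzyyxx 1.23, linked/loaded ") + actual);
  }
}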

Conclusion

I personally prefer suggestion #1 as I want to ensure the build fails as early as possible. Suggestion #2 works especially when you can’t use boost for whatever reason and don’t have a C++11 compiler, but otherwise I personally would not use it. Suggestion #3 is something you use when you need it, but if you do at least try to cover the relevant cases in your unit tests so your QA team doesn’t have to try and find out manually if you are using the correct library version for every component.

Managing third party libraries in C++ projects

Every reasonably sized C++ project these days will use some third party libraries. Some of them like boost are viewed as extensions of the standard libraries that no sane developer would want to be without. Then there is whatever GUI toolkit your project uses, possibly another toolkit to deal with data access, the ACE libraries, etc etc. You get the picture.

Somehow, these third party libraries have to be integrated into your development and build process in such a way that they don’t become major stumbling blocks. I’ll discuss a few approaches that I have encountered in the multitude of projects I have been part of, along with their advantages and problems.

All the developers download and install the libraries themselves

This is what I call the “good luck with that” approach. You’ll probably end up documenting the various third party libraries and which versions of which library were used in which release on the internal wiki and as long as everybody can still download the appropriate versions, everything works. Kinda.

The problems will start to rear their ugly heads when either someone has to try to build an older version of your project and can’t find a copy of the library anymore, someone forgets to update the CI server or – my favourite – the “bleeding edge” member of the team starts tracking the latest releases and just randomly checks in “fixes” needed to build with newer versions of the library that nobody else is using. Oh, and someone else missed the standup and the discussion that everybody needs to update libgrmblfx to a newer, but not current version and is now having a hard time figuring out why their build is broken.

Whichever way you look at it, this approach is an exercise in controlled chaos. It works most of the time, and you can usually get away with it in smaller and/or short term projects, but you’re always teetering on the edge of the Abyss Of Massive Headaches.

What’s the problem? Just check third party libraries into your version control repository!

This is the tried and tested approach. It works well if you are using a centralised VCS/CM system that just checks out a copy of the source. Think CVS, Subversion, Perforce and the like. Most of these systems are able to handle binaries well in addition to “just” managing source code. You can easily check in pre-built versions of your third party libraries. Yes, the checkouts may be a little on the slow side when a library is updated but in most cases, that’s an event that occurs every few months. In a lot of teams I used to work in, the libraries would be updated in the main development branches after every release and then kept stable until the next release unless extremely important fixes required further updates. This model works well overall and generally keeps things reasonably stable, which is what you want for a productive team because you don’t want to fight your tools. Third party libraries are tools – never forget that.

The big downside to this approach shows when you are using a DVCS like git or Mercurial. Both will happily ingest large “source” trees containing pre-built third-party libraries, but these things can be pretty big even when compressed. A pre-built current boost takes up several gigabytes of disk space depending on your build configurations and whether you’re building 32-bit and 64-bit versions at the same time. Assuming a fairly agile release frequency, you’re not going to miss many releases so you’ll be adding those several gigabytes to the repository every six months or so. Over the course of a few years, you will end up with a pretty large repository that will take your local developers half an hour to an hour to clone. Your remote developers will either have to mirror the repository – which has its own set of challenges if it has to be a two-way mirror – or will suddenly find themselves resorting to overnight clones and hoping nothing breaks during the clone. Yes, there are workarounds like Mercurial’s Largefiles extension and git-annex, and they’re certainly workable if you are planning for them from the beginning.

The one big upside of this approach is that it is extremely easy to reproduce the exact combination of source code and third party libraries that go into each and every release provided an appropriate release branching or release tagging strategy is used. You also don’t need to maintain multiple repositories of different types like you have to in the approach I’ll discuss next.

Handle third party libraries using a separate package management tool

I admit I’m hugely biased towards this approach when working with a team that is using a DVCS. It keeps the large binaries out of the source code repository and puts them into a repository managed by a tool that was designed for the express purpose of managing binary packages. Typical package management tools would be NuGet, Ivy and similar tools. What they all have in common is that they use a repository format that is optimised for storing large binary packages, usually in a compressed format. They also make it easy to pull a specific version of a package out of the repository and put it into an appropriate place in your source tree or anywhere else on your hard drive.

Instead of containing the whole third party library, your source control system contains a configuration file or two that specifies which versions of which third party libraries are needed to build whichever version of your project. You obviously need to hook these tools into your build process to ensure that the correct third party libraries get pulled in during build time.

The downside of these tools is that you get to maintain and back up yet another repository that needs to be treated as having immutable history like most regular VCS/DVCSs have. This requires additional discipline to ensure nobody touches a package once it’s been made part of the overall build process – if you need to put in a patch, the correct way is to rev the package so you are able to reproduce the correct state of the source tree and its third party libraries at any given time.

TL;DR – how should I manage my third party libraries?

If you’re using a centralised version control system, checking in the binaries into the VCS is fine. Yes, the admin might yell at you for taking up precious space, but it’s easy, reasonably fast and simplifies release management.

If you are using a DVCS, use a separate package management tool, either a third party one or one that you roll yourself. Just make sure you keep the third party libraries in your own internal repository so you’re not at somebody else’s mercy when they decide to suddenly delete one of the libraries you’re using.

Polymorphism and boost::shared_ptr

Reposted from my old blog. Here’s the news from 2009…

I’m currently in the final stages of converting a library from raw pointers to boost::shared_ptr. I’m mainly doing this to ensure the correct pointer/pointee ownership rather than the outright prevention of memory leaks, but the latter is a nice side effect of the former.

One problem I ran into was that the library I’m updating and its clients make rather heavy use of polymorphism. Of course in 99% of the code that was fine as the objects were accessed through pointers to base classes, but the last 1% was causing problems because that part of the code contained lots of dynamic_cast statements. These classes unfortunately need to know the exact type they were dealing with so there was no easy way around the use of these idioms. It probably isn’t news to most of the people reading this blog that dynamic_cast and boost::shared_ptr don’t play that nicely.

The main issue is that taking a pointer that is held inside a boost::shared_ptr, dynamic_casting it down the hierarchy and then stuffing it into another boost::shared_ptr is a good way to ensure double deletion. Oops. So, if you see the following code you better get the debugger out…

   boost::shared_ptr<A> something(new A(blah));
   ...
   boost::shared_ptr<B> something_else(dynamic_cast<B*>(something.get()));

So far so bad, but I couldn’t believe that something with a flaw that obvious would be in the boost libraries. And of course, there is a way around this – boost provides alternatives to the four C++ casts with similar names that work on boost::shared_ptrs. You can find these alternatives – which are really just wrappers around the C++ casts, but designed to work with the boost smart pointers – in the include file boost/pointer_cast.hpp. If you’re using smart pointers because you need polymorphic behaviour of, say, items stored in standard C++ containers, have a look at this page right now. If you don’t have the time or inclination to check the linked document right now, the management summary is: “The next time someone tells you that you can’t use boost::shared_ptr with dynamic_cast, point them in the direction of boost::dynamic_pointer_cast”. Using boost::dynamic_pointer_cast would change the above example to:

   boost::shared_ptr<A> something(new A(blah));
   ...
   boost::shared_ptr<B> something_else =
     boost::dynamic_pointer_cast<B>(something);

Problem solved.

Using pantheios to log from a C++ JNI DLL

I originally published this post on my old blog in 2009. I’ve edited it a little for readability but left the contents unchanged, so it may be out of date and not reflect the current state of the pantheios library. I also haven’t been using pantheios for logging since about 2010, and have been using Boost.Log instead.

I recently had to come up with a logging solution for C++ code in a JNI DLL/shared library that provides the data translation layer between Java and underlying native C and C++ libraries. As usual, some logging was required to aid fault-finding in a production environment, if necessary. A quick survey of the state of C++ logging showed that not a lot had changed since I last looked at logging libraries. In fact, a lot of them seem to have survived unchanged for several years. I’m not sure if that is a good thing and a sign of maturity or a sign of “making do” and the low priority most projects assign to a performant logging library. Eventually I settled on pantheios as it offered several features that were crucial for this application. The major one was that pantheios is extremely modular and will only link in the parts you really need. I consider this a major advantage over the more monolithic libraries that pull in all their functionality all the time, especially when you link them in as a static library (yes, log4cxx, I’m looking at you). Linking in the logging library as a static library was necessary to avoid conflicts with other libraries that are being used in the same process.

Initial tests in a simple command line program suggested that it worked well and matched the requirements. Unfortunately I couldn’t get it to log at all inside the JNI DLL, so I ended up trawling Google’s search results for quite a while and experimenting with quite a lot of different settings until I ended up with a working combination.

First, pantheios initialises itself automatically if you use it inside a regular executable. For various reasons, it can’t do that inside a Windows DLL, so you have to do that explicitly. Fortunately, Matthew Wilson, the author of pantheios, had explained on a mailing list how to do this. Typically, I can’t find the post anymore so here’s the code that I’m using to initialise the library, which is more or less a verbatim copy of Matthew’s code minus a couple of lines that weren’t required:

#include <iostream>

#include <pantheios/pantheios.hpp>
#include <pantheios/inserters.hpp> 
#include <pantheios/frontends/stock.h>

const char PANTHEIOS_FE_PROCESS_IDENTITY[] = "JNITestDll.1";

BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD  dwReason,
                       LPVOID lpReserved
                                         )
{
  if (dwReason == DLL_PROCESS_ATTACH)
  {
    if (pantheios::pantheios_init() < 0)
    {
      std::cout << "Failed to initialise the Pantheios logging libraries!\n" << std::endl;
      return FALSE;
    }
    else
    {
      pantheios::log_INFORMATIONAL("Logger enabled!");
    }
  }
  else if (dwReason == DLL_PROCESS_DETACH)
  {
    pantheios::log_INFORMATIONAL("Logger disabled!");
    pantheios::pantheios_uninit();
  }
  return TRUE;  // ok
}

This seemed to initialise the library correctly – I wasn’t getting any error messages to suggest otherwise – but unfortunately I still wasn’t getting any output either. I could see that pantheios_init() returned a value that indicated successful initialisation and that the logging functions were called, but the output went straight into the bit bucket somewhere.

It took me a little while to work out what happened but in the end I tracked it down to something that I filed under “JNI oddity”. Pantheios supports implicit linking for both its frontends (the part that you interact with) and its backends, which are responsible for sending the output somewhere. Being the usual lazy so-and-so programmer, I had borrowed one of the implicit link files from the samples, which should have worked OK as it did for a command line executable – but it didn’t.

After some poking and prodding I realised that the issue was that by default, this particular implicit link file made pantheios use the Windows console logger when the code was built for Windows. This didn’t work, probably because the code was running in a DLL and there wasn’t a console associated with it. Switching to the fprintf backend fixed this issue and I was finally seeing logging output from the JNI DLL. Here is the code for the implicit linking:

/* Pantheios Header Files */
#include <pantheios/implicit_link/core.h>
#include <pantheios/implicit_link/fe.simple.h>
#include <platformstl/platformstl.h>
#include <pantheios/implicit_link/be.fprintf.h>

#if (   defined(UNIX) || \
        defined(unix)) && \
    (   defined(_WIN32) || \
        defined(_WIN64))
# include <unixem/implicit_link.h>
#endif /* _WIN32 || _WIN64 */

All in all I’m happy with Pantheios as a logging solution. If you’re looking for a versatile C++ logger, I’d recommend you look at it.

Note from 2014: In a project that is not using or cannot use Boost, I would still look at pantheios first before looking at other libraries.

A neat way of handling common C++ exceptions

Another slightly edited post from my old C++ blog. Again, you may have seen this one before.

This post is a quasi follow-up to the “little exception shop of horrors”. As I mentioned in that post, I believe the reason for dumping all the exception handling code into a single header file was a misguided attempt at avoiding code duplication. No, I didn’t write that code, so I can only speculate as to why it was done. I inherited the project that contained this code and the reasons were lost in the mists of time. I did file it under “sound idea but bad execution”. It doesn’t fix the problem and you still have code duplication as the preprocessor will do the duplication work for you. Ah well, at least you don’t have to type the code in yourself multiple times. I couldn’t help but think that there must be a better way.

I more or less forgot about the whole thing as the code was about to be retired anyway. At some point I was talking to a colleague about it and he showed me a much nicer way of addressing this problem without the code duplication and in a much cleaner fashion. Start with a try/catch block like this:

try {
  // ... Lots of code here
}
catch (...) {
  handle_common_exceptions();
}

The handler function looks like this:

void handle_common_exceptions() {
  try {
    throw;
  }
  catch (specialised_exception const &ref) {
    // handling code
  }
  catch (another_sub_exception const &ex) {
    // ... more exception handling code ..
  }
  catch (std::bad_alloc const &ref) {
    // ... even more
  }
}

The elegant part is that you rethrow the exception that has been caught in the original try block and then handle those exceptions that your code actually can handle at this point. Normally, catch (...) is only useful if you are in a piece of code which requires you to stop any exceptions from escaping. You can’t determine what the actual exception is, so you can’t handle it appropriately. The only thing you can do is to say “oops, sorry”.

Rethrowing the exception inside the catch-all handler does restore access to the exception information so you can handle the exceptions that are appropriate at the point of the catch handler. As long as you don’t add another catch-all handler inside the handler function (note that I didn’t), those exceptions that you cannot handle at this point propagate further up the call chain as they escape the handler function due to the rethrow.

Quite neat, don’t you think? Thanks to Alan Stokes for showing this technique to me.

The little exception shop of horrors

This post first appeared on my old C++ blog. You might have seen it before.

I think by now we can all agree that exceptions are generally a good thing in C++. They allow us to separate the error handling from the general flow of control inside a program and also enable us to handle errors at the most appropriate point. More often than not, the best point to handle errors is quite far removed from the source of the exception.

Imagine my surprise when I came across the following gem that was almost worthy of the Daily WTF:

catch (std::string ex) {
  // Do something horrible with the exception
}
catch (const char *ex) {
  // Eh?
}
catch (int i) {
  // Uh oh
}

I guess in some circles, being able to throw any sort of POD or object as an exception is considered a good thing. It’s not necessarily something I consider a good idea. For the moment we’ll also gloss over the advisability of catching exceptions by value. That doesn’t mean I’m condoning that sort of coding, especially not when you’re catching types that might trigger memory allocations when copied.

But wait, it gets better. You thought that the code snippet above was part of a larger chunk of code that I omitted, didn’t you? Well, it actually wasn’t – it was all parked in a header file. Obviously in the interests of brevity I removed the code that actually handled the exceptions while preserving the full structure of the code in the header file.

So what the heck is going on here then? It looks like in order to avoid duplication of code – always a worthy goal – the above header file got included wherever “needed”, so at least one of the project’s source files was full of code like this:

void doSomething() {
  try {
    // Lots of processing that might throw
  }
#include "catch_clauses.H"
}

Pass the Stroustrup, I need to refactor a developer or three…