Sometimes, std::set just doesn’t cut it from a performance point of view

A piece of code I recently worked with required data structures that hold unique, sorted data elements. The requirement for the data being both sorted and unique came from it being fed into std::set_intersection(), so using an std::set seemed to be an obvious way of fulfilling these requirements. The code did fulfill all the requirements, but I found the performance somewhat wanting in this particular implementation (Visual Studio 2008 with the standard library implementation shipped by Microsoft). The main problem was that this code is extremely critical to the performance of the application and it simply wasn’t fast enough at this point.

Pointing the profiler at the code very quickly suggested that the cost of destroying the std::set was rather out of proportion to the cost of the rest of the code. To make matters worse, the std::set in question was used as a temporary accumulator and was thus created and destroyed very often. The cost was in the destruction of the elements in the std::set, so the obvious technique – keep the std::set around, but clear() its contents – did not yield any improvement in the overall runtime: a red-black tree (the underlying implementation of this – and most – std::sets) is very expensive to tear down because you have to traverse the whole tree and delete it node by node, even if the data held in the nodes is a POD, and clear() has to do exactly the same traversal as the destructor. Clearly, a different approach was needed.
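
To illustrate the asymmetry, here is a minimal, self-contained benchmark sketch – not the original profiling code; it uses C++11 <chrono>, which wasn’t available in VS2008, and a made-up element count – that compares tearing down a std::set against tearing down a std::vector holding the same PODs:

#include <chrono>
#include <cstdint>
#include <cstdio>
#include <set>
#include <vector>

int main()
{
    typedef std::chrono::steady_clock clock_type;
    const uint32_t count = 1000000; // arbitrary element count for this sketch

    // Fill a std::set; destroying it means tearing down the red-black
    // tree node by node, even for POD elements.
    std::set<uint32_t>* s = new std::set<uint32_t>;
    for (uint32_t i = 0; i < count; ++i)
        s->insert(i);
    clock_type::time_point start = clock_type::now();
    delete s;
    long long setUs = std::chrono::duration_cast<std::chrono::microseconds>(
        clock_type::now() - start).count();

    // Fill a std::vector with the same PODs; destroying it is essentially
    // a single deallocation of one contiguous buffer.
    std::vector<uint32_t>* v = new std::vector<uint32_t>(count);
    start = clock_type::now();
    delete v;
    long long vecUs = std::chrono::duration_cast<std::chrono::microseconds>(
        clock_type::now() - start).count();

    std::printf("set teardown:    %lld us\n", setUs);
    std::printf("vector teardown: %lld us\n", vecUs);
    return 0;
}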

A post on gamedev.net suggested that I wasn’t the first person to run into this issue, nor did it appear to be specific to my platform. The post also suggested a technique that should work for me. Basically, std::set provides three guarantees – its contents are sorted, its keys are unique, and the lookup for a given key is O(log n). As I mentioned in the introduction, I only really needed two of those three guarantees, namely unique elements in sorted order; std::set_intersection() expects input iterators as its parameters, so despite its name it is not tied to a std::set, even though that is the obvious initial choice of data structure.

In this particular case – a container accumulating a varying number of PODs that eventually get fed into std::set_intersection() – I could make use of the fact that at the point of accumulation, I didn’t care whether the data was sorted and unique. As long as I could ensure the data in the container fulfilled both criteria before calling std::set_intersection(), I would be fine.

The simplest and least CPU-expensive way I was able to come up with was to accumulate the data in a std::vector of PODs, which is cheap to destroy – much as described in the gamedev.net thread above. I then took care of the sorted and unique requirements just before feeding the data into std::set_intersection():

#include <algorithm>
#include <cstdint>
#include <vector>

std::vector<uint32_t> accumulator;

... accumulate lots of data ...

// Establish the two guarantees std::set_intersection() relies on:
// sorted order first, then no duplicates.
std::sort(accumulator.begin(), accumulator.end());
accumulator.erase(std::unique(accumulator.begin(),
                              accumulator.end()),
                  accumulator.end());

Note the call to accumulator.erase() – std::unique doesn’t actually remove the ‘superfluous’ elements from the std::vector, it just returns an iterator to the new logical end of the range. In the code I was working on I couldn’t make use of that feature, so I had to actually shrink the std::vector to contain only the elements of interest. Nevertheless, changing over from a real std::set to the ‘fake’ one resulted in a speed increase of about 2x-3x, which was very welcome.
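
Put together, the whole pattern looks roughly like this – a minimal sketch with made-up input values standing in for the accumulated data:

#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Sort the accumulated values and drop the duplicates so the vector
// meets the preconditions of std::set_intersection().
void sort_and_unique(std::vector<uint32_t>& v)
{
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
}

int main()
{
    // Made-up stand-ins for the accumulated data: unsorted, with duplicates.
    const uint32_t rawA[] = { 5, 3, 9, 3, 1 };
    const uint32_t rawB[] = { 9, 2, 5, 5, 7 };
    std::vector<uint32_t> a(rawA, rawA + 5);
    std::vector<uint32_t> b(rawB, rawB + 5);

    sort_and_unique(a); // a is now 1 3 5 9
    sort_and_unique(b); // b is now 2 5 7 9

    std::vector<uint32_t> result;
    std::set_intersection(a.begin(), a.end(),
                          b.begin(), b.end(),
                          std::back_inserter(result)); // result is 5 9
    return 0;
}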

Basically, if you need a std::set, you need a std::set – but keep its destruction cost in mind when the container in question only has a short lifetime. I don’t advocate ripping all sets out of your application and replacing them with sorted, uniquified std::vectors. However, in some cases like the one described above, when you don’t need the full functionality of a std::set, a std::vector can offer a very reasonable alternative, especially once you have identified the std::set as a bottleneck. Yes, you could probably also get O(log n) lookup speed in the sorted, uniquified vector by using std::binary_search, but that’s getting a little too close to declaring “the only C++ data structure I’m familiar with is a std::vector”. Using the appropriate data structure (like a std::set) communicates additional information to the readers of your code; workarounds like the one above are not that obvious and tend to obscure your code somewhat.
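
For completeness, the O(log n) membership test on such a sorted, uniquified vector would look something like this (a hypothetical helper, not from the original code):

#include <algorithm>
#include <cstdint>
#include <vector>

// Assumes 'sorted' has already been sorted and uniquified as shown above;
// std::binary_search requires a sorted range.
bool contains(const std::vector<uint32_t>& sorted, uint32_t key)
{
    return std::binary_search(sorted.begin(), sorted.end(), key);
}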

Quick tip if you see “bad DLL or entry point ‘msobj80.dll’” when building software with VS2008

Try stopping mspdbsrv.exe (the process that generates the .pdb files during a build) if it is still running. My understanding is that it’s supposed to shut down at the end of the compilation, but it seems that it can turn into a zombie process, and if that happens, you can get the above error when linking your binaries.

Anyway, I just ran into this problem and stopping the process via the Task Manager resolved it for me.
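
If you prefer the command line over Task Manager, killing the process with taskkill should work just as well:

taskkill /F /IM mspdbsrv.exe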

On combining #import and /MP in C++ builds with VS2010

I’m currently busy porting a large native C++ project from VS2008 to VS2010, and one of the issues I kept running into was build times. The VS2008 build uses a distributed build system; unfortunately, the vendor doesn’t support VS2010 yet, so I couldn’t use the same infrastructure. In order to get a decent build speed, I started exploring MSBuild’s ability to build projects in parallel (which is fairly similar to VS2008’s ability to build projects in parallel) and the C++ compiler’s ability to make use of multiple processors/cores, aka the /MP switch.

Read More

Using CEDET-1.0 pre7 with Emacs 23.2

It’s been mentioned in several places that GNU Emacs versions from sometime after 23.1.50 come with an integrated version of CEDET. While I think that’s a superb idea, it unfortunately managed to break my setup, which relies on a common set of emacs-lisp files that I keep under version control and distribute across the machines I work on. Those machines run different GNU-based Emacsen (pure GNU, Emacs/W32, Carbon Emacs, etc.), so I can’t rely on the built-in CEDET. Unfortunately, when I got a new machine and put GNU Emacs 23.2 on it, my carefully crafted (OK, I’m exaggerating) .emacs wouldn’t play ball because the built-in CEDET conflicted with the pre7 that I had already installed on that machine.

I didn’t want to make extensive modifications to my .emacs, but a little time spent on Google brought up a post by Marco Bardelli on the CEDET-devel mailing list with a short code snippet that removes the built-in CEDET from the load-path. After putting this into my .emacs, my -pre7 config is working again.

For those in a similar quandary, here is the snippet in all its glory:

;; Strip the patch level from emacs-version (e.g. "23.2.1" -> "23.2")
;; to build the path of the built-in CEDET, then drop it from load-path.
(setq load-path
      (remove (concat "/usr/share/emacs/"
                      (substring emacs-version 0 -2) "/lisp/cedet")
              load-path))

Welcome back to the new blog, almost the same as the old blog

The move from the UK to the other side of the Atlantic is almost complete; I’m just waiting for my household items – and more importantly, my computer books etc. – to turn up. So it’s time to start blogging again in the next few weeks. Due to some server trouble in the UK, combined with the fact that I do like Serendipity as a blogging system but was never 100% happy with it, I’ve switched to using WordPress on a server here in the US. The old blog will stay up, at least as long as the server stays put, but I won’t add any new content to it.

Enough meta-blogging, though. I should be back to the usual 1-2 posts per month soon, so if you would kindly subscribe to the RSS feed, you’ll see when all of this is up and running again.