Sometimes, std::set just doesn’t cut it from a performance point of view

A piece of code I recently worked with required data structures that hold unique, sorted data elements. The requirement for the data being both sorted and unique came from it being fed into std::set_intersection() so using an std::set seemed to be an obvious way of fulfilling these requirements. The code did fulfill all the requirements but I found the performance somewhat wanting in this particular implementation (Visual Studio 2008 with the standard library implementation shipped by Microsoft). The main problem was is that this code is extremely critical to the performance of the application and it simply wasn’t fast enough at this point.

Pointing the profiler at the code very quickly suggested that the cost of destruction of the std::set seemed to be rather out of proportion compared to the cost of the rest of the code. To make matters worse, the std::set in question was used as a temporary accumulator and was thus created and destroyed very often. Also, the cost was in the destruction of the elements in the std::set, so the obvious technique – keep the std::set around, but clear() its contents – did not yield any improvement in the overall runtime, mainly due to the fact that red-black trees (the underlying implementation of this – and most – std::sets) are very expensive to destroy as you have to traverse the whole tree and delete it node by node, even if the data held in the nodes is a POD. Clearly, a different approach was needed.

A post on gamedev.net suggested that I wasn’t the first person to run into this issue, nor did it appear to be that specific to my platform. The article also suggested a technique that should work for me. Basically, std::set provides three guarantees – the contents is sorted, the keys unique and the lookup speed for a given key is in the order of O(log n). As I mentioned in the introduction, I did only really need two of the three guarantees, namely unique elements in a sorted order; std::set_intersection() expects input iterators as its parameters so despite its name, it is not tied to using a std::set, even though it’s the obvious initial choice of data structure.

In this particular case – a container accumulating a varying number of PODs, that eventually get fed into std::set_intersection – I could make use of the fact that at the point of accumulation of the data, I basically didn’t care if the data was sorted and unique. As long as I could ensure the data in the container fulfilled both criteria before calling std::set_intersection, I should be fine.

The simplest way and the least expensive in terms of CPU time I was able to come up with was to accumulate the data in a std::vector of PODs, which is cheap to destroy. Much like as described in the above thread on gamedev.net. I then took care of the sorted and unique requirements just before I fed it into the std::set_intersection:

std::vector<uint32_t> accumulator;

... accumulate lots of data ...

std::sort(accumulator.begin(), accumulator.end());
accumulator.erase(std::unique(accumulator.begin(),
                              accumulator.end()),
                  accumulator.end());

Note the call to accumulator.erase() – std::unique doesn’t actually remove the ‘superfluous’ elements from the std::vector, it just returns the iterator to the new end of the container. In the code I was using I couldn’t make use of this feature so I had to actually shrink the std::vector to only contain the elements of interest. Nevertheless, changing over from a real set to the ‘fake’ resulted in a speed increase of about 2x-3x which was very welcome.

Basically if you need a std::set, you need a std::set and you’ll have to keep in mind its restrictions when the container in question only has a short lifetime. I don’t advocate ripping out all sets from your application and replacing them with sorted, uniqified std::vectors. However in some cases like the one I described above when you don’t need the full functionality of a std::set, using a std::vector can offer a very reasonable alternative, especially when you identify the std::set as a bottleneck. Yes, you could probably also get O(log n) lookup speed in the sorted and uniqified vector by using std::binary_search but that’s getting a little too close to declaring “the only data C++ structure I’m familiar with is a std::vector”. Using the appropriate data structure (like a std::set) communicates additional information to the readers of your code; using workarounds like the one above are not that obvious and tend to obscure your code somewhat.

Quick tip if you see “bad DLL or entry point ‘msobj80.dll'” when building software with VS2008

Try stopping mspdbsrv.exe, which is the process that generates the pdb files during a build if it is still running. My understanding is that it’s supposed to shut down at the end of the compilation, but it seems that it can turn into a zombie process. If that happens, you can get error shown in the post’s title when linking your binaries.

Anyway, I just ran into this issue and stopping the process via the Task Manager resolved the issue for me.

On combining #import and /MP in C++ builds with VS2010

I’m currently busy porting a large native C++ project from VS2008 to VS2010 and one of the issues I keep running into was build times. The VS2008 build uses a distributed build system; Unfortunately the vendor doesn’t support VS2010 yet, so I couldn’t use the same infrastructure. In order to get a decent build speed, I started exploring MSBuild’s ability to build projects in parallel (which is fairly similar to VS2008’s ability to build projects in parallel) and the C++ compiler’s ability to make use of multiple processors/cores, aka the /MP switch.

Read More

Using CEDET-1.0 pre7 with Emacs 23.2

It’s been mentioned in several places that GNU Emacs versions sometime after 23.1.50 do come with an integrated version of CEDET. While I think that’s a superb idea it unfortunately managed to break my setup, which relies on a common set of emacs-lisp files that I hold under version control and distribute across the machines I work on. Those machines have different versions of GNU-based Emacsen (pure GNU, Emacs/W32, Carbon Emacs etc) so I can’t rely on the default CEDET. Unfortunately when I got a new machine and put a GNU Emacs 23.2 on there, my carefully crafted (OK, I’m exaggerating) .emacs wouldn’t play ball because the built-in CEDET conflicted with the pre7 that I had already installed on that machine.

I didn’t want to have to make extensive modifications to my .emacs, but a little time spent on Google brought up a post by Marco Bardelli on the CEDET-devel mailing list with a little code snippet that removes the built-in CEDET from the load-path. After putting this into my .emacs, my -pre7 config is working again.

For those in a similar quandary, here is the snippet in all its glory:

(setq load-path
      (remove (concat "/usr/share/emacs/" 
		      (substring emacs-version 0 -2) "/lisp/cedet")
	      load-path))

Welcome back to the new blog, almost the same as the old blog

The move to the other side of the Atlantic from the UK is almost complete. I’m just waiting for my household items – and more importantly, my computer books etc – to turn up. So it’s time to start blogging again in the next few weeks with a new blog or at least a new URL.

Due to some server trouble in the UK, combined with the fact that I do like Serendipity as a blogging system but was never 100% happy with it, I’ve switched to using WordPress on a server here in the US. The old blog will stay up, at least as long as the server stays put, but I won’t add any new content to the old blog.

Enough meta-blogging though, I should be back to the usual 1-2 post per month soon, so if you kindly could subscribe to the RSS feed and you’ll see when all of this is up and running again.

The homebuilt NAS/home server, revisited

This is a reblog of my “building a home NAS server” series on my old blog. The server still exists, still works but I’m about to embark on an overhaul so I wanted to consolidate all the articles on the same blog.

I’ve blogged building my own NAS/home server before, see here, here, here and here.

After a few months, I think it might be time for an interim update.

In its original incarnation, the server wasn’t as stable as it should have been given my previous experience of FreeBSD. For some reason, it would crash every few weeks and sometimes even hang on reboot. Not good, especially as it happened a few times while I wasn’t home. I guess I should have heeded the warning about the zfs integration being experimental… Things got worse when I added a wireless card and retired my access point. Roughly around this point in time I got fed up with this enough to go back and start building an OpenSolaris VM to try out a mail server setup similar to the one I’m running on FreeBSD.

Before I got anywhere with this, FreeBSD 8.0 came out, so I upgraded. ZFS had be promoted from experimental, the wireless stack has been overhauled, etc pp. The stability problems disappeared and the machine has been utterly reliable since then. Where before, trying to use Time Machine from to back up my MacBook via the wireless network was a good way to a 50% chance to crash the server, it now “just works”. This is where I wanted to get and I’ve now got there. Performance also seems to have improved – copying large files from the server to my Windows 7 machine sees a reliable 78MB/s via my Gigabit network now.

I’ve still got a couple of small changes I want to make to the machine – for example, I’ve got 4GB of RAM that I want to put into the machine. This should enable zfs readahead which should give me further performance improvements. I also plan to add two more fans to blow cold air over the hard drives to keep them happy and working for longer. Edit: Actually it didn’t as 4GB RAM on the mainboard result in slightly less than 4GB available to the OS. I did enable the readahead manually, though.

If I built another home server, I would probably get another motherboard. Not that there is anything particularly wrong with the one in the machine, but the CPU fan speed control only goes down to 70%, so no wonder that the CPU fan is noisier than it should be.

Was it worth it? Overall I’d say yes, although I probably should have stuck to tried and tested technology (either using FreeBSD’s built-in RAID5 or use OpenSolaris with zfs). This caused unnecessary problems at the beginning and pushed up the cost as I was dithering between either. Next time I probably set up the server on OpenSolaris and run the mail server on FreeBSD in a VM running on OpenSolaris. Given that the current configuration is working, I leave it alone for the time being though.

A couple of useful Emacs modes

highlight-changes-mode – as the name implies, it highlights changes that you make to a file. I do find it useful for the typical scenario of checking out a file, making a couple of smaller changes to it and then having to diff it to work out what you actually changed. As mentioned over at Emacswiki it doesn’t play too nicely with font-locking but I’ll try out some of the suggestions in the “Taming Highlight-Changes-Mode” section on this page.

nxml-mode – my preferred mode for editing XML. Of course it would be better if I could be bothered to create schemas for some of the files I’m editing but even without them, it does a pretty good job. As it’s trying to parse the XML that you write, it’s very helpful when it comes to highlighting mismatched tags or auto complete tags.

ido-mode – I’ve only recently started to use it and I’m still trying to work out if it is useful enough for me or if the improved file finding capability does bother me more than it helps. Yes, I know it can do a lot more but so far I’m only using the improved file finding and buffer switching. I really rate the buffer switching which is the main reason I leave it turned on.

Not really a mode, but I like using bm.el for visible bookmarks. I don’t use bookmarks that often but the package is extremely useful when I do need them.