A piece of code I recently worked with required data structures that hold unique, sorted data elements. The requirement for the data being both sorted and unique came from it being fed into std::set_intersection() so using an std::set seemed to be an obvious way of fulfilling these requirements. The code did fulfill all the requirements but I found the performance somewhat wanting in this particular implementation (Visual Studio 2008 with the standard library implementation shipped by Microsoft). The main problem was is that this code is extremely critical to the performance of the application and it simply wasn’t fast enough at this point.

Pointing the profiler at the code very quickly suggested that the cost of destruction of the std::set seemed to be rather out of proportion compared to the cost of the rest of the code. To make matters worse, the std::set in question was used as a temporary accumulator and was thus created and destroyed very often. Also, the cost was in the destruction of the elements in the std::set, so the obvious technique – keep the std::set around, but clear() its contents – did not yield any improvement in the overall runtime, mainly due to the fact that red-black trees (the underlying implementation of this – and most – std::sets) are very expensive to destroy as you have to traverse the whole tree and delete it node by node, even if the data held in the nodes is a POD. Clearly, a different approach was needed.

A post on gamedev.net suggested that I wasn’t the first person to run into this issue, nor did it appear to be that specific to my platform. The article also suggested a technique that should work for me. Basically, std::set provides three guarantees – the contents is sorted, the keys unique and the lookup speed for a given key is in the order of O(log n). As I mentioned in the introduction, I did only really need two of the three guarantees, namely unique elements in a sorted order; std::set_intersection() expects input iterators as its parameters so despite its name, it is not tied to using a std::set, even though it’s the obvious initial choice of data structure.

In this particular case – a container accumulating a varying number of PODs, that eventually get fed into std::set_intersection – I could make use of the fact that at the point of accumulation of the data, I basically didn’t care if the data was sorted and unique. As long as I could ensure the data in the container fulfilled both criteria before calling std::set_intersection, I should be fine.

The simplest way and the least expensive in terms of CPU time I was able to come up with was to accumulate the data in a std::vector of PODs, which is cheap to destroy. Much like as described in the above thread on gamedev.net. I then took care of the sorted and unique requirements just before I fed it into the std::set_intersection:

std::vector<uint32_t> accumulator;

... accumulate lots of data ...

std::sort(accumulator.begin(), accumulator.end());

Note the call to accumulator.erase() – std::unique doesn’t actually remove the ‘superfluous’ elements from the std::vector, it just returns the iterator to the new end of the container. In the code I was using I couldn’t make use of this feature so I had to actually shrink the std::vector to only contain the elements of interest. Nevertheless, changing over from a real set to the ‘fake’ resulted in a speed increase of about 2x-3x which was very welcome.

Basically if you need a std::set, you need a std::set and you’ll have to keep in mind its restrictions when the container in question only has a short lifetime. I don’t advocate ripping out all sets from your application and replacing them with sorted, uniqified std::vectors. However in some cases like the one I described above when you don’t need the full functionality of a std::set, using a std::vector can offer a very reasonable alternative, especially when you identify the std::set as a bottleneck. Yes, you could probably also get O(log n) lookup speed in the sorted and uniqified vector by using std::binary_search but that’s getting a little too close to declaring “the only data C++ structure I’m familiar with is a std::vector”. Using the appropriate data structure (like a std::set) communicates additional information to the readers of your code; using workarounds like the one above are not that obvious and tend to obscure your code somewhat.

2 thoughts on “Sometimes, std::set just doesn’t cut it from a performance point of view”

    1. Thanks for the tip – I haven’t tried that (partially because it would require this change in every subproject so we don’t accidentally mix the various build configurations of the STL). It would be interesting if it does change the speed difference between the vector implementation and the set implementation or if the relative speed between the two stays the same.

Leave a Reply