The last paragraph is actually the interesting one for every C++-programmer, the leading-in might not be understandable.
The new lexer component of KDevelop-PG-Qt is not very fast, it can take several minutes with complex grammars, especially when using UTF-8 as internal encoding for the automata. Of course, first I want to finish the most important features and that for I am writing an example lexer for PHP to check its capabilities to spot bugs and missing features, but now I was curious and asked Valgrind/Callgrind for analysis. I expected a lot of useless copying/assignments because I have not done any optimisation yet and tried to be “defensive” by using too much copying. Indeed: For my first example about 78% of the runtime were spent on copying. But then I tried a full (as far as it is finished) PHP example with UTF-8 and it was a real surprise:
It spends most of the time with comparing instances of std::vector<bool>! For the powerset construction I have to maintain following states for subsets of states in the NFA and after that I map those subsets to new unique IDs for the states of the DFA. For both cases I wanted to use a map with vector<bool> (a subset of states) as key_type, I could not decide if I should use map or unordered_map – I used map, but there is also some code to make it possible to use QBitArray with unordered_map, just for the case. However: I expected the minimisation to be a bottle neck (disclaimer: I stopped valgrind after some time, maybe it becomes a bottle neck later), but now the comparisons used in the map-implementation are crucial. I was not very economical with those map-operations – I could have done the mapping while computing the necessary subsets, it has to be optimised and I may use unordered_map and unordered_set (hashes) instead of map and set, but that is not the reason why I am writing this blog post.
The interesting thing about it is the usage of std::_Bit_reference. std::vector<bool> has a specialised implementation, the booleans get packed, thus it will only need size()/8+const Bytes when it is full. There is a special hashing-implementation directly using the integers used to store the bits, but there is only a generic comparison implementation. Thus it will iterate over all the bits, instead of using comparisons of chunks (like 32 or 64 bit, depending on the CPU). Of course it will be much slower, keep in mind that each iteration requires some bit-manipulation. Especially in my case of a red-black-tree the vector will get compared to very similar ones next to it in the tree, and in that case it takes especially long time. Be aware of vector<bool>, it is often awesome (safe a lot of memory), but something like this slow comparison may happen, and always keep in mind that the index-operator might be slower than for vector<char>. Unfortunately there seems to be no API for accessing the integer-chunks. Btw. QBitArray does not support comparison at all. I have filed a bug-report to GCC/libstdc++.