KDevelop-PG-Qt: Generalised Expression Parsing

Hi guys and non-guys,

apparently I had no motivation for blogging in the last few months, however…

Currently (once in a while) I am porting the KDevelop-PG-Qt grammar from Flex+Bison to KDevelop-PG—Qt, a parser generator should trust itself and I hope it will increase the quality of the code (and of course it will allow using Unicode). Long Some time ago I implemented specialised parsing for (arithmetic) expressions: the recursive descent approach is not very good in those cases, you have to care about not using left-recursion and you will get very, very deep syntax trees, taking a lot of memory for simple expressions like “0” etc. thus increasing memory usage significantly (e. g. in KDevelop’s PHP-plugin). What was the approach? You can specify postfix, prefix, binary and ternary operators and parentheses. For each operator you have to specify the type of the tokens which should be interpreted as operator and a priority (higher≙stronger binding), additionally, binary and ternary operators have an associativity. I implemented it using a shunting yard-like approach (actually without knowing that termat that time).

However, when porting the grammar of KDevelop-PG-Qt I noticed that this approach did not suffice, but of course I wanted to use the specialised expression parsing for regular expressions etc.: I needed an “empty binary operator” for the usual concatenations in regular expressions (you can just write ab and do not have to use an operator to join them, e.g. a.b). It had been on my TODO-list for long time, and now I had to implement it. Well, when you have to change something, of course you want to choose a more general approach, thus instead of adding a special syntax for “concat-operators” or something like that—I tinkered with the idea—I allowed using arbitrary expressions as operator (already tinkered with that idea long time ago): The parser will parse whatever syntactic unit specified as operator instead of eating a single token. The interesting observation I want to tell about–let us start with an example: With this new feature we could implement Knuth’s hyperoperator: 3↑3, 3↑↑3, 3↑↑↑3 etc., the operator would be specified by the expression “UP_ARROW+” or something like that. Interestingly this is kinda a ternary operator: we can specify an arbitrary integer for the number of arrows, and the number is encoded in unary representation. Let us choose a more comfortable notation: 3[12]3 for twelve arrows, or why not 3[1+3*2]3? In terms of the classical ternary operator: “cond ? opt1 : opt2“ is a binary expression, where “? opt1 :” is the operator. Now it is obviously a normal ternary operator taking the arithmetic expressions as parameters, but we can interprete it as binary operator where the operator symbol itself is an expression surrounded by brackets. In fact we can handle any n-ary infix operators that way as binary infix operators, n-ary prefix operators as unary prefix operators and n-ary postfix operators as unary postfix operators, since for precedence the inner arguments do not matter, and the expression parsing problem gets reduced to these three basic cases and parentheses—of course excluding certain kinds of clashes between the symbols used (colon for both a ternary (cond? opt1 : opt2) and binary (x:y) operators etc.) where probably all usual linear time parsing algorithms will fail and which are unsuitable for programming languages. This allows a great simplification of the parsing algorithm, I had a lot of code dedicated to handle ternary operators, when composing the syntax tree the code had to look for them explicitly (of course now it is obvious that this was unnecessary). For now I have removed the support for ternary operators, later I might add the syntactic sugar to implement such operators easily without caring about these technique, that for I have to allow custom fields in the AST for specific operators, but the shunting yard routine will not have to be aware of this—I will stop with those details, I hope you got the interesting point how shunting yard can be used in a very general way and how expression parsing gets simplified to the few basic cases found in most programming languages by allowing everything to be an “operator”.

Back from NWERC 2011

Hi folks!

This weekend I participated at the North Western European Regional Contest, a regional preliminary round for the ACM ICPC. It is an international students contest about solving as many as possible out of 10 algorithmic problems in 5 hours in teams of three students and one PC. It was my first time there and it was quite fun.

At saturday we visited the historic harbour of Bremen Vegesack, which was partly boring and partly quite nice. At sunday there was the actual contest at the quite decadent and walled Campus of Jacobs University in Bremen (you even have to pay tuition of ~€25000, ~$33000 per year, at public universities you pay about €400-1400, I could live from it for long time, but even at elitist economical universities like EBS and WHU you have to pay much less). We used KDevelop 4.2 for solving the tasks and enjoyed the nice completion, source code reformatting (nice when you are not sure any longer which braces belong together and do not have much time :)), and inline documentation of STL functions. Thank you, KDevelop guys. :)

Our team from RWTH solved only three out of 10 tasks (30st place out of 70). B and E were quite trivial (I did not even had to read B since my team members solved it). I want to tell a bit about the other problems, I hope it is interesting for some of you. Of course you may try to solve the problems your self, it is quite a lot of fun. ;)


Given a natural number x<10^15 we should compute all n,k such that binomial(n,k)=x. It was quite simple, but we had a bug in our implementation and I do not have any clue what the problem was. We chose to check the cases binomial(n,1)=n and binomial(n,2)=n·(n-1)/2 manually and brute force the solution for all other cases by computing lines of Pascal’s triangle for k values such that binomial(n,k)≤x and terminating when binomial(n,2)>x. Alternatively you could check more terms explicitly (the team from LMU did it) and compute the whole triangle for the remaining values or use binary search to find the parameters.


Given a sequence of films watched by a person we should answer for each time position how many different films have been watched since the last time he watched the current one. By maintaining a binary tree containing the films ordered by the last time they were watched and a map containing this time for each movie this queries can be easily answered to if you have a way to get the number of nodes after a specific one in the tree in O(log n). That for you would have to store the size of each subtree and STL set/map unfortunately do not support this. Thus we decided to write our own tree, we went for a Treap with random priorities since it is the easiest way to avoid degeneration (easier to implement than AVL and Red-Black trees). Unfortunately we did not have enough time for debugging the tree and we even forgot to update the counters when balancing the tree after insertion. There were simplier solutions using segment trees.

For this task there were particularly many wrong submissions (the man presenting the solutions assumed that many teams tried to solve this one because the first team solved it yet after 20 minutes and thought it would be the easiest one, he even presented a very funny solution using java.lang.String. a linear time Java indexOf() call, which was of course way too slow).


This task was about deciding whether it is possible to partition a given pattern of black and white square into disjoint, L-shaped “puzzle pieces” of one black (in the corner) and two white squares. We had no idea how to solve that, we thought a lot about backtracking, but there semt to be no clever way to do that in polynomial time. The trick was to encode this problem in Boolean formulas and to use a standard algorithm (e.g. resolution) for the 2-SAT problem using variables encoding “white square x,y belongs to the same puzzle piece as the white square x+1,y+1“ etc.


Nobody solved this problem at the contest (though there were some people at the semi-live contest at Spoj solving it, a coach of another team said he knew a very similiar task from a different contest). Look at it, it looks like really complicated spam. Only one team tried. Cool geometric structures, quad-trees etc. will not help you to solve this problem: 500×500 fields, each one is either water or grass. You want to build a pool area. The border has to be grass and every pool of water has to be surrounded by a wall, which will cost $a per wall. You can transform water fields into grass, which will cost $b per field, and grass into water for $c. You want to build the cheapest pool area possible.

Actually it is a simple max-flow problem, since a minimum cut in a certain flow encoding the landscape is the optimal solution. You have a source, a sink and a node for every field. A cut should divide the nodes into water-nodes and grass-nodes. Source-to-x edges should have a capacity of the price needed to transform x into a grass-field and x-to-sink edges should have a capacity refering of the price needed to transform x into a water-field, unless the node represents a field at the border, then the capacity should be infinity. Additionally you add x-to-y nodes with a capacity of the price needed to build a wall for adjacent nodes x and y. That is it (if I am not wrong, I hope so :D). Really awesome.


I will post a full description later. For each observation you can yield an inueqility and you can complete the information in a Floyd-Warshall-like manner.


Spam. You have 13 out of 52 regular playing cards and want to partition it into as few as possible bunches of single cards, pairs, triples, quadruples, full-houses and streets of at least five consecutive cards. You can simply use brute force: Test all combinations of up to two full-houses and up to two streets (cannot be more for 13 cards) and greedily take the remaining cards as quadruples, triples, pairs and single cards. We tried that with a lot of copy-and-paste code and our program printed wrong answers, but we have no idea in which cases it failed. Alternatively you can choose a probably more compact solution by building a directed acyclic graph of all 2^13 combinations of your cards where edges represents adding one of the allowed bunches. Then you use breadth-first-search to get the solution. This was one of those typical tasks with useless information in the input (cards were encoded as 4h for ♥4 and Jd for a ♦Jack, in fact the suits did not even matter).


Given integer coordinates of up to 10000 RFID chips, 250000 RFID sensors and 25 walls compute which sensors can detect which chips. Each sensor can detect any chip within a radius of r, while walls reduce the radius by 1. No two sensors are closer to each other than r (“due to interference”^^). We used a sweep-line algorithm. Whenever the sweep line reaches the value x where x+r is the abscissa of a product we add the product to an ordered binary tree (std::set in C++) ordered by the ordinate. When reaching the value x where x-r is the abscissa of a chip we erase it from the tree. If x-r is the abscissa of a sensor we query all products in the tree with ordinates between y-r and y+r using lower_bound and upper_bound where y is the ordinate of the sensor. We count the number of walls intersecting the line between sensor and chip (we check that by adding up angle(p1, q1, p2)+angle(q1, p2, q2)+angle(p1, q2, p2)+angle(q1, p1, q2), which should be 2π (aka 360 degrees) for intersecting line segments) and check whether the chip is within the resulting reduced radius. This solution has a complexity of O((n+m)log(n+m)+n·w) where n is the number of chips, m the number of sensors and w the number of walls. This is guaranteed since the distance between any two sensors is at least r, thus at most 7 sensors can detect one chip (imagine a regular hexagon of sensors with one sensor in the center). Thus for many walls you would probably require a more sophisticated algorithm taking walls into account when sweeping.

Surprisingly only 7 teams solved it, I thought it would be kinda a standard-geometric-sweep-line-problem, it was even easier than E for us (there we used int instead of long long :D). Alternatively you can use binning (since the coordinate space is limited) or quad-trees to solve this problem.


You have a couple of train-stations and ≤1000 connections between these stations. Each connection will be once an hour at a certain minute (e.g. 13:47, 14:47 and so on). However, there is a certain probability that the train will be too late. If it is too late the delay is uniformly distributed in a range from 1 to some number ≤120. You have to compute the minimum expected duration of a trip from the start to the target. We solved that problem by a fixed-point iteration: For each station and each minute (0≤m<60) we store the best known expected duration to the target. We propagate that information till it does not change any longer. Unfortunately it timed out, probably because of floating point issues: we recognised changes which where insignificant or caused by rounding. Only one team solved this problem, three teams tried.


Yesterday I returned from Bremen, entered the first train at about 17:00, arrived at 0:45 and slept till 12:00. :) It was a really nice experience. If you are a student, like algorithms and want to participate: go ahead, look for two comrades, ask your professor to register you and join your regional contest! :) (at some universities there is a local preliminary contest)

Some visual impressions:

Coloured wave


The camera of my N97 mini mobile phone had a bug or something like that, however, when trying to photograph an ugly super market or something like that in the early evening and accidentally moving the camera too early I got this nice photo. Could not reproduce. :(
Some of my fingers

Could not reproduce :(

The contest was sponsored by Facebook and we had to attend a boring presentation of a Facebook employee. It was quite funny how he tried to make advertisement to work there. He assumed everybody in the room would use Facebook, what? We could get free t-shirts when answering his silly questions, did he think we would like to work there because of a t-shirt? It happened multiple times that he said that he is not allowed to tell about xy, really a great opportunity to work for a company where you cannot even superficially talk openly about the things you are doing there. And he told us that they would finally delete all messages six days after you have pressed the delete-button, haha. :D

I have pushed some solutions here.

Switched to LuaLaTeX


Have you ever tried to write a little bit complex command in LaTeX? I did at some occasions, and finally it somehow worked, but it has always been ugly. However, there is LuaTeX/LuaLaTeX, it provides real scripting within your documents:

  for i=0, 15 do
    tex.print("Math: $x_{" .. i .. "}$")

That is just awesome, in plain LaTeX this would be ugly, but it gets even more ugly if you have to deal with floating points, external files etc. Well, for now I do not need any complex macro, so I cannot talk about actual experiences with Lua stuff, but I encountered some problems when translating my LaTeX document to LuaLaTeX two weeks ago.

unicode-math does not work properly

When enabling the unicode-math package for using Unicode characters in formulas (⊂∀∃∂ etc.) I have to select a font using the \setmathfont command, I tried “Latin Modern Math”, “STIXGeneral”, “XITS Math”, “Neo Euler” and “Asana Math”, otherwise Unicode symbols will not get displayed. However, with all of these fonts formulas do not look as good as with standard LaTeX lmodern-package, which is usable from LuaLaTeX, too, \setmathfont will override it. Some of them are too bold, some have ugly ℝ, ℚ, ℂ (\mathbb) and \mathcal symbols etc. Thus I decided to port the uniinput package provided by the Neo project (they create the keyboard layout I am using) to LuaLaTeX. I thought it would be nice to check the Lua capabilities that for, however, I faced the next problem.

Lua is not Unicode aware

That is really annoying, LuaLaTeX’s claim is to support a) sophisticated scripting and b) native Unicode support. However, they choosed Lua as scripting language, which does not support Unicode natively. I could not find the functions I needed to write a succinct macro for declaring a Unicode character in math mode (for examble ℝ should be replaced with \mathbb{R}), simply something to split a 8-bit-UTF-8-string into its Unicode characters and to do conversions between character codes and strings. I did not want to write it myself. Thus I choosed a quick and dirty way: using some regexp-magic and a Ruby script to convert uniinput.sty into uniinput-lualatex.sty. It works now, you can use it if you want to…

Making it working with KileIP

KileIP currently has the latex-command hard coded to display previews of formulas. I was too lazy to fix that and I wanted to be able to fall back to LaTeX if there were unexpected problems, thus I made my document working with both LaTeX and LuaLaTeX:


Well, next time I need a complex macro I will certainly use Lua and it will hopefully work with my setup. :)

KDevelop-PG-Qt 1.0 Beta

Today KDevelop-PG-Qt 1.0 Beta (aka 0.9.82) got released, the parser generator used by several KDevelop (and Quanta) language plugins. There are some new features, and various bugs have been reported in the last few months since 0.9.5 and are now fixed.

New features

Most effort has been spent for implementing the lexer generation. You can now write a simple specification of the lexical structure by using regular expressions, the generated token stream class, which generates tokens from the input-data, can be used directly by the generated parser, but you can also just create a lexer without a parser. Thus, despite the name, KDevelop-PG-Qt is no longer just a parser generator. My motivation for writing the lexer generator was the lack of decent Unicode support in most lexers. Quex is quite good, but it is not free software (despite the LGPL based license) because of excluding military usage. The lexer can not only read different encodings, but also use different encodings for internal processing, e.g. it can either convert a UTF-8 stream to UTF-32 first or directly operate on UTF-8 bytes. Simple example:

%lexer ->
    for FOR ;
    "some keyword" SOME_KEYWORD ;
    {alphabetic}* WORD ;

It does also include limited support for look-ahead (full support like in Flex make generated code more complicated and inefficient), and a simlar – but in my experience more useful for modern languages – feature called “barriers” which will be explained in the documentation soon.

The API used by the generated files has been cleaned up, some method names were very counterintuitive, some methods were not used at all. This break requires updating KDevelop-PG-Qt to build the latest versions of PHP, QMake, SGML and CSS plugins (you may have to do a clean build).

There are also some minor new features: AST structs now get forward declared reducing dependencies, code which relied on definitions at some places has been fixed some time ago. The token types do not get declared by the parser any longer. Additionally you no longer need “;;” in the grammer files, though it is still supported for compatibility.

Bug fixes

The bug fixes include proper line numbering. In GCC you can now see correct line numbers refering to the location of wrong code in the grammer file, for other compilers you may have to activate the –use-line-directive, then it will use the #line directive instead of the GCC specific syntax, but you will not see the location in the generated code. The CMake-macro will do that automatically. Some compatibility errors have been reported reported by Fedora and Debian packagers and fixed, special thanks to them. KDevelop-PG-Qt builds with QT_STRICT_ITERATOS now and also builds under Windows with MSVC (MSVC is not perfect for C++11 and Flex is not perfect for Windows). Annoying wrongly formated error messages in the lexer and parser of KDevelop-PG-Qt have been fixed, too.

Some bug fixes might be necessary, thus there is this beta-version first. There is some work to do now first: The documentation and the kate grammar file for proper syntax highlighting have to be updated.


This is the second release using C++11 features, especially the auto keyword and STL hashes (unordered_map and unordered_set). I had used variadic templates before to construct UTF-8 automata, but for compatibility with MSVC it has been replaced with a Ruby script generating some code, the variadic template had been quite ugly anyway.

Future development of KDevelop-PG-Qt

  • Parser and lexer should be rewritten using KDevelop-PG-Qt
  • Long requested: there should be a way to mark conflicts as resolved, that for the conflict reporting has to be refactored a bit
  • The next release will hopefully support LL(k) parsing, making some stupid look-aheads obsolete
  • Cleaning up further code ;)

Special thanks to Milian Wolff for all his patience, bug reports and support with the release.

I will post the link to the tar-ball as soon as it is uploaded, there is only the tag for now.Tar-ball can be found here.

Parliamentary democracy has nothing to do with legitimation

I have recently noticed a belief I had been indoctrinated with: western political systems—parliamentary democracies—would be better than other ones because of their democratic legitimation, the reign is legitimate because the government is elected. It had appeared natural to me because it had been repeated again and again: some state or decision is legitimate because there was an election at some time. It had been told at school and everywhere, even spontaneous polls with arbitrary options pretend to do legitimation. But there are several really big flaws in this argumentation:

  • What is our target function? What should a legitimate decision be like? Why should “on-ground-of-election” be a most important value? Politics should be about resolving conflicts while protecting the freedom of everybody, elections do not automatically imply freedom, in fact they can set dictatorships like the Nazi regime up, and do—more often—simply nothing, thus do not eliminate a lot of injustice in the world.
  • Rousseau told us that only democracy is able to ensure freedom inside a society by implementing the volonté générale, but what have results of elections to do with the volonté générale? Nothing, in representative systems the decisions do not even exactly reflet the volonté de tous, in fact it is a very, very bad approximation, thus decisions are far away from being legitimate, they are the result of power games, nothing more.
  • The sample for elections does not strictly correlate with the set of affected persons, elections are part of an random distribution of power, not a legitimate one. Non-human animals, foreigners, children, incapacitated, how could elections legitimate an arbitrary reign over those persons?
  • To be more explicit: There is no intrinsic property of elections ensuring freedom and humanity.
  • External circumstances are ignored when saying “democratic” systems would ensure freedom. Wether freedom is possible is determined by the whole social order, not by single political decisions. Remember companies, employers, customers, teachers, “friends” etc., there are power and herrschaft everywhere, and they are not inherently related to politics. However, do not falsely conclude that it is all about polity to create a state ensuring freedom, polity inherently includes herrschaft, not freedom, instead, it is much more desirable to have external circumstances inherently including freedom and eliminating polity as far as possible. A lack of polity is not a fundamental problem.

You may argue that these are flaws of democracy, but it could approximate legitimation. Maybe, but not within our system. Majority decisions and representative systems can not converge to legitimacy, except of by annulment. But progress in that areas—I cannot even observe—is not the most important aspect of freedom, as I said, it is about the social circumstances, they may ensure personal freedom, but wherever politics get necessary for organisation we should of course try to find the volonté générale, consensus. But parliamentary democracies have nothing to do with that. They are not democracies–i.e. where the volonté générale is the political maxim–they are just chaotic systems of herrschaft, consisting of power games and occasional elections, but politicians call it “democracy” and pretend it would be a most important value–complacent hypocrisy.

But what makes parliamentary democracies good? Why do we prefer them to autocracies? They fulfil a very simple job: politicians can easier lose their power in so called “democracies”, thus they are more careful with doing bad things if the electorate (not the general public!) could care about it. Additionally there is the separation of powers—contemned by Rousseau—providing some limitation of arbitrariness in an undemocratic system. Courts ensure continuity. That are the true colours of parliamentary democracy! We should realise that the world will not get better just because there are parliamentary democracies.

MeeGo finally dead, absolutely dead

Some people could not believe it that Nokia’s deal with Microsoft killed MeeGo, a promising Linux distribution for mobile devices, though it had been arguable to drop Maemo and not using Plasma Mobile. I hope they will now believe it: Even Intel is dropping MeeGo, Tizen is coming slowly, probably not providing a full featured GNU/Linux (why should it be like that, if everything is HTML5/JavaScript-focused), providing no more big benefit compared to Android, Necessitas is there, Tizle is not. There is finally obviously no more prospect for MeeGo, forget it. I hope the are chances for Free Software and KDE on Android, although that is not optimal, no full featured GNU/Linux and has an uncertain future because of Chrome OS, I currently do not see any alternative.

EcmaScript is no assembly language


Windows 8, Tizen – there seems to be a HTML 5 + JavaScript hype. The myriads of web developers are expected to write all their stupid apps to make a platform popular. Some may know that I really do not like EcmaScript, however, it is good to allow using it, there may be domains where it is appropriate, there may be cases where it is the best choice to reuse some EcmaScript code. But seriously, that way technical possibilities get wasted. Reusing all the high-quality desktop application code – nearly impossible. Using certain paradigms of great programming languages – nearly impossible. I know, you can translate LLVM IL to EcmaScript, but EcmaScript is really not an assembly language. That it is not its purpose and it is not suitable, do not tell me V8 would be good, it cannot transform EcmaScript to an assembly language, even Google is now propagating Nativ Client (NaCl), really cool technology, to allow any language (great developments of the past!) being ed, by reusing existing languages, suitable as universal low-level representations: x86 machine code and LLVM IL. It had taken long time until there was finally a usable version of Qt for Android (Necessitas), reimplementation of everything and Java got in the way of a quick port, although there has always been the possibility to use native object-files. How long would something usable take for Tizen? And do not forget: Qt is huge an popular, it may be unaffordable for other projects somehow relying on certain OS capabilities. Thus, why not just supplying existing API? Why not just keeping great ecosystem, being open for new innovation and new technology which is compatible with it? Why limiting everything with EcmaScript’s poor capabilities? Web developers can still be attracted, EcmaScript and even HTML 5 can perfectly be combined with Qt. Even Apple failed with its original web-apps-only strategy, why should anybody else succeed with that nonsense now, years later? Microsoft is despaired, Google wants to control everything with Chrome OS, but why the hell should MeeGo be dropped?

New HTML/JavaScript-focused architectures: you suck.

Tizen wants to allow native access, but they “anticipate” that most stuff should be written with EcmaScript. Thus it will certainly not be easier than with Android to reuse existing technology.

Barfüßigkeit in den Pinakotheken

Ein Erlebnis, von dem ich bislang hier noch nicht erzählt habe (eines von vielen): Vor einigen Wochen stattete ich der Pinakothek der Moderne in München einen Besuch ab. Neben kräftiger BMW-Werbung gab es wirklich großartige Werke zu sehen, von Dalí, Beuys und vielen anderen. Fotos durften gemacht werden und es war einfach ein schöner Besuch. Wie es mir beliebte, war ich barfuß unterwegs, und als ich mit dem Gedanken spielte, das Museum zu verlassen oder noch einen Teil zu erkunden, und hierzu ins Foyer zurück geschritten war, sprach mich doch tatsächlich eine Angestellte des Museums an, ich könne mich nicht ohne Schuhen in ihren Räumlichkeiten aufhalten. Sie fragte nach dem Aufenthaltsort meiner Schuhe – Garderobe/Rucksack – ich möge sie doch bitte anlegen, so sei die Hausordnung. Auch hielt sie sich nicht damit zurück, mit dem Kontaktieren des Sicherheitspersonals zu drohen. Da ich mir dachte, wie blödsinnig dies ist, wollte ich doch interessehalber diese Hausordnung auch einmal sehen, geht ja nicht an, dass man sich vage auf irgendein Dokument beruft und nichts vorzuweisen hat, zudem hatte ich mich in einer ähnlichen Situation schon einmal abwimmeln lassen – dies sollte sich nicht wiederholen. Sie reagierte gereizt und beleidigt, ich würde ihr nicht glauben, in der Tat hatte ich meine Zweifel, was da in der Hausordnung bitte stehen sollte, doch meine Motivation war mein Interesse.

Ganz ohne den Sicherheitsdienst zu kontaktieren ging sie dann doch zum Thresen und suchte nach der Hausordnung, fragte Kolleginnen nach dem Aufenthaltsort ihrer und rufte sogar an anderer Stelle an. Die Forderung nach Einsicht schien wohl doch nicht völlig unberechtigt. Während dessen redeten sowohl sie als auch eine andere Angestellte auf mich ein, wieso ich ihr denn nicht glaube, obwohl sie so lange schon dort arbeite (die eine), und dass man schließlich nicht wolle, dass das Museum zu einer Art Badewiese verkomme (die andere, wenn ich mich recht entsinne). Schließlich wurde die Hausordnung doch in einem Papierstoß ausfindig gemacht, und ich staunte nicht schlecht: Es wurde tatsächlich in einem Abschnitt konstatiert, dass das Betreten sowohl mit Rollschuhen als auch ohne Schuhe „(‚barfuß‘)“ nicht gestattet sei. So verließ ich verwundert und noch immer barfüßig das Gebäude und wurde noch von hinten als Querulant beschuldigt, da ich nunmal Interesse gezeigt habe.

In einem Satz mit Rollschuhen? Geht es um eine vermeintlich erhöhte Rutschgefährdung, durch die empfindliche Kunstwerke, wie etwa geklebte, frei im Raum stehende Schnüre bedroht werden könnten? Bei den genervten Angestellten war keine Nachfrage mehr möglich, doch die Antwort eines Herrn Regierungsdirektors auf meine EMail-Anfrage brachte dann zum Vorschein:

Als Museen von Weltrang sind wir der Kunst und Ästhetik in besonderer Weise verpflichtet und haben deshalb in unserer Hausordnung die Erwartungshaltung festgeschrieben, dass unsere Besucher der Kunst in üblicher Bekleidung gegenübertreten, wozu nach unserer Auffassung auch Schuhe gehören.

Wunderbar, ich dachte zumindest die Pinakothek der Moderne hätte sich der Kunst verschrieben, und nicht bürgerlicher Selbstdarstellung. Tja, was soll man machen, künftig heißt es dann wohl Waschlappen um die Füße binden. Frei Kunst und freies Denken! Jaja… Das erklärt jedenfalls, wenn keine Kunst in Moscheen aufgehängt wird, wäre ja respektlos, und so mancher Orden hat wohl kollektives Hausverbot in den Pinakotheken.

GPLv3, LGPLv3, AGPLv3 Discussion


Just a short notice: I have started a discussion about the exclusion of GPLv3, LGPLv3 and AGPLv3 by the current licensing policy at the kde-licensing mailing list—as promised in my previous blog post discussing some arguments. Will be offline for few days, thus do not wonder if I am not answering.


Unexpected Bottle Neck: vector<bool>

Hi folks!

The last paragraph is actually the interesting one for every C++-programmer, the leading-in might not be understandable.

The new lexer component of KDevelop-PG-Qt is not very fast, it can take several minutes with complex grammars, especially when using UTF-8 as internal encoding for the automata. Of course, first I want to finish the most important features and that for I am writing an example lexer for PHP to check its capabilities to spot bugs and missing features, but now I was curious and asked Valgrind/Callgrind for analysis. I expected a lot of useless copying/assignments because I have not done any optimisation yet and tried to be “defensive” by using too much copying. Indeed: For my first example about 78% of the runtime were spent on copying. But then I tried a full (as far as it is finished) PHP example with UTF-8 and it was a real surprise:

KDevelop-PG-Qt: Callgrind results: a lot of comparisons on vector<bool>

KCacheGrind Overview

It spends most of the time with comparing instances of std::vector<bool>! For the powerset construction I have to maintain following states for subsets of states in the NFA and after that I map those subsets to new unique IDs for the states of the DFA. For both cases I wanted to use a map with vector<bool> (a subset of states) as key_type, I could not decide if I should use map or unordered_map – I used map, but there is also some code to make it possible to use QBitArray with unordered_map, just for the case. However: I expected the minimisation to be a bottle neck (disclaimer: I stopped valgrind after some time, maybe it becomes a bottle neck later), but now the comparisons used in the map-implementation are crucial. I was not very economical with those map-operations – I could have done the mapping while computing the necessary subsets, it has to be optimised and I may use unordered_map and unordered_set (hashes) instead of map and set, but that is not the reason why I am writing this blog post. ;)

The interesting thing about it is the usage of std::_Bit_reference. std::vector<bool> has a specialised implementation, the booleans get packed, thus it will only need size()/8+const Bytes when it is full. There is a special hashing-implementation directly using the integers used to store the bits, but there is only a generic comparison implementation. Thus it will iterate over all the bits, instead of using comparisons of chunks (like 32 or 64 bit, depending on the CPU). Of course it will be much slower, keep in mind that each iteration requires some bit-manipulation. Especially in my case of a red-black-tree the vector will get compared to very similar ones next to it in the tree, and in that case it takes especially long time. Be aware of vector<bool>, it is often awesome (safe a lot of memory), but something like this slow comparison may happen, and always keep in mind that the index-operator might be slower than for vector<char&gt. Unfortunately there seems to be no API for accessing the integer-chunks. Btw. QBitArray does not support comparison at all. I have filed a bug-report to GCC/libstdc++.