KDevelop-PG-Qt 1.0 Beta

KDevelop-PG-Qt 1.0 Beta (aka 0.9.82), the parser generator used by several KDevelop (and Quanta) language plugins, was released today. There are some new features, and various bugs reported in the months since 0.9.5 have been fixed.

New features

Most of the effort went into implementing lexer generation. You can now write a simple specification of the lexical structure using regular expressions. The generated token stream class, which produces tokens from the input data, can be used directly by the generated parser, but you can also create just a lexer without a parser. Thus, despite the name, KDevelop-PG-Qt is no longer just a parser generator. My motivation for writing the lexer generator was the lack of decent Unicode support in most lexers. Quex is quite good, but it is not free software (despite its LGPL-based license) because it excludes military usage. The lexer can not only read different encodings but also use different encodings for internal processing, e.g. it can either convert a UTF-8 stream to UTF-32 first or operate directly on UTF-8 bytes. Simple example:

%lexer ->
    for FOR ;
    "some keyword" SOME_KEYWORD ;
    {alphabetic}* WORD ;
    ;
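To make the two internal processing modes mentioned above a bit more concrete, here is a minimal C++ sketch of the difference. It is not code generated by KDevelop-PG-Qt; the function names are made up for illustration, and the decoder assumes well-formed UTF-8 without validating it. The first function converts the input to UTF-32 code points before any further processing, the second one works on the UTF-8 bytes directly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points first.
// Assumes well-formed input; a real lexer would have to handle errors.
inline std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)      { cp = lead;        len = 1; }  // ASCII
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }  // 2-byte sequence
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }  // 3-byte sequence
        else                  { cp = lead & 0x07; len = 4; }  // 4-byte sequence
        // Each continuation byte contributes its lower six bits.
        for (std::size_t k = 1; k < len && i + k < bytes.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: work on the UTF-8 bytes directly, without conversion.
// Every byte that is not a continuation byte (10xxxxxx) starts a code point.
inline std::size_t count_code_points_bytewise(const std::string& bytes) {
    std::size_t count = 0;
    for (const char c : bytes)
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80)
            ++count;
    return count;
}

// Usage: both views agree on the number of characters in "h\u00E9llo".
inline void check_utf8_example() {
    const std::string text = std::string("h\xC3\xA9") + "llo";
    assert(decode_utf8(text).size() == 5);
    assert(count_code_points_bytewise(text) == 5);
}
```

Converting first keeps the matching logic simple because every element is one code point. Operating on bytes avoids the conversion step, but the automaton then has to encode multi-byte sequences itself.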

The lexer generator also includes limited support for look-ahead (full support as in Flex would make the generated code more complicated and less efficient), and a similar feature called “barriers”, which in my experience is more useful for modern languages and will be explained in the documentation soon.

The API used by the generated files has been cleaned up: some method names were very counterintuitive, and some methods were not used at all. Because of this break, you have to update KDevelop-PG-Qt to build the latest versions of the PHP, QMake, SGML and CSS plugins (you may have to do a clean build).

There are also some minor new features. AST structs now get forward declared, reducing dependencies; code which relied on the definitions in some places was fixed some time ago. The token types are no longer declared by the parser. Additionally, you no longer need “;;” in the grammar files, though it is still supported for compatibility.

Bug fixes

The bug fixes include proper line numbering. With GCC you can now see correct line numbers referring to the location of wrong code in the grammar file. For other compilers you may have to activate the --use-line-directive option; KDevelop-PG-Qt will then use the #line directive instead of the GCC-specific syntax, but you will not see the location in the generated code. The CMake macro does that automatically. Some compatibility errors were reported by Fedora and Debian packagers and have been fixed; special thanks to them. KDevelop-PG-Qt now builds with QT_STRICT_ITERATORS and also builds under Windows with MSVC (MSVC is not perfect for C++11 and Flex is not perfect for Windows). Annoying, wrongly formatted error messages in the lexer and parser of KDevelop-PG-Qt have been fixed, too.
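As a rough illustration of the #line mechanism mentioned above, here is a small C++ sketch. It is not actual output of KDevelop-PG-Qt: the file name "example_grammar.g" and the line number 120 are made-up values. The standard #line directive tells the compiler to report the following code as if it came from the given line of the given file, which is how errors in embedded code can point back into the grammar file.

```cpp
#include <cassert>
#include <cstring>

// "example_grammar.g" and line 120 are hypothetical, chosen only for illustration.
#line 120 "example_grammar.g"
inline int reported_line() { return __LINE__; }          // treated as line 120
inline const char* reported_file() { return __FILE__; }  // treated as line 121 of example_grammar.g

// Usage: the compiler now attributes the code above to the grammar file,
// so diagnostics for it would name example_grammar.g instead of the generated file.
inline void check_reported_location() {
    assert(reported_line() == 120);
    assert(std::strcmp(reported_file(), "example_grammar.g") == 0);
}
```

The drawback described above follows directly from this: once the location is remapped, the compiler no longer reports the position in the generated file for that code.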

Some further bug fixes might be necessary, which is why there is a beta version first. There is still some work to do: the documentation and the Kate grammar file for proper syntax highlighting have to be updated.

C++11

This is the second release using C++11 features, especially the auto keyword and STL hashes (unordered_map and unordered_set). I had used variadic templates before to construct UTF-8 automata, but for compatibility with MSVC they have been replaced with a Ruby script generating some code; the variadic template had been quite ugly anyway.

Future development of KDevelop-PG-Qt

  • Parser and lexer should be rewritten using KDevelop-PG-Qt
  • Long requested: there should be a way to mark conflicts as resolved; for that, the conflict reporting has to be refactored a bit
  • The next release will hopefully support LL(k) parsing, making some stupid look-aheads obsolete
  • Cleaning up further code ;)

Special thanks to Milian Wolff for all his patience, bug reports and support with the release.

I will post the link to the tar-ball as soon as it is uploaded; there is only the tag for now. The tar-ball can be found here.
