Graphical KDevelop-PG-Qt Output

KDevelop-PG-Qt is really boring, it is just generating some boring C++-code. Well, I have not implemented QML-based animations or multitouch-gestures for KDevelop-PG-Qt, but now you can get .dot-output, graphs which can be visualized using GraphViz, e.g. dot on the command-line or KGraphViewer. That way you can visualize the finite-state-machines used for generating the lexer. I guess everybody knows what this is:

utf8 dfa overview

Overview of the DFA

You have not got it? Let us zoom in:
utf8 dfa, rectangular excerpt

Rectangular view

utf8 dfa, square exceprt

Square view

You can also download the .dot-file and browse it using KGraphViewer, or browse the .svg-file (generated by using dot) with Gwenview or whatever. What is this automaton about? It as actually quite simple: The automaton will read single bytes from UTF-8-encoded input and recognize if the input represents a single alphabetic character (e.g. A or ? or ? or whatever). That is quite complicate, because there are many ranges in Unicode representing alphabetic characters and UTF-8 encoding makes it more complicate. This DFA is an optimal (minimum number of states) Moore-automaton (previous versions used Mealy for output, but Moore for optimization, that was stupid), and it needs 206 states, it is really minimal, no heuristics. You are right: Unicode is really complicated. Unfortunately it took 65 seconds to generate the lexer for this file:

%token_stream Lexer ; -- necessary name
%input_encoding "utf8" -- encoding used by the deterministic finite automaton
%token ALPHABETIC ; -- a token
%lexer ->
  {alphabetic} ALPHABETIC ; -- a lexer-rule

I think such automatons like {alphabetic} should be cached (currently unimplemented), because they are responsible for most of the runtime. The implementation of the .dot-output is still both buggy and hackish, that has to be changed. But the graphs look nice and they help with spotting errors in the lexer.

Leave a Reply

XHTML: Use <blockquote cite="name"> for quotations, <pre lang="text    ∨ cpp-qt ∨ cpp ∨ bash ∨ other language"> for code, [latex] for formulas and <em> for em. Contact me if the comment does not get published, it may have accidentally been marked as spam.

Anti-Spam Quiz: