Archive for the ‘KDE’ Category

KDevelop-PG-Qt: Generalised Expression Parsing

Sunday, February 26th, 2012

Hi guys and non-guys,

apparently I had no motivation for blogging in the last few months, however…

Currently (once in a while) I am porting the KDevelop-PG-Qt grammar from Flex+Bison to KDevelop-PG-Qt itself: a parser generator should trust itself, and I hope the port will increase the quality of the code (and of course it will allow using Unicode). Some time ago I implemented specialised parsing for (arithmetic) expressions: the recursive descent approach is not very good in those cases, since you have to take care to avoid left-recursion, and you get very, very deep syntax trees that take a lot of memory even for simple expressions like “0”, increasing memory usage significantly (e.g. in KDevelop’s PHP plugin). What was the approach? You can specify postfix, prefix, binary and ternary operators as well as parentheses. For each operator you specify the type of the tokens which should be interpreted as that operator and a priority (higher ≙ stronger binding); binary and ternary operators additionally have an associativity. I implemented it using a shunting-yard-like approach (actually without knowing that term at that time).
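To illustrate the scheme (a minimal self-contained sketch in plain C++, not KDevelop-PG-Qt’s actual output): each binary operator carries a priority and an associativity, and the shunting-yard loop pops stacked operators while they bind at least as strongly.

    #include <cctype>
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    struct Op { int prio; bool rightAssoc; };
    const std::map<char, Op> ops = {
        {'+', {1, false}}, {'-', {1, false}},
        {'*', {2, false}}, {'^', {3, true}}};

    // convert "1+2*3^2" to postfix; single-digit operands for brevity
    std::vector<std::string> toPostfix(const std::string& in) {
        std::vector<std::string> out;
        std::vector<char> stack;
        for (char c : in) {
            if (std::isdigit(static_cast<unsigned char>(c))) {
                out.push_back({c});
                continue;
            }
            Op o = ops.at(c);
            // pop every stacked operator that binds at least as strongly
            while (!stack.empty()) {
                Op top = ops.at(stack.back());
                if (top.prio > o.prio || (top.prio == o.prio && !o.rightAssoc)) {
                    out.push_back({stack.back()});
                    stack.pop_back();
                } else break;
            }
            stack.push_back(c);
        }
        while (!stack.empty()) { out.push_back({stack.back()}); stack.pop_back(); }
        return out;
    }

    int main() {
        for (const auto& t : toPostfix("1+2*3^2"))
            std::cout << t << ' ';              // prints: 1 2 3 2 ^ * +
        std::cout << '\n';
    }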

However, when porting the grammar of KDevelop-PG-Qt I noticed that this approach did not suffice, although I of course wanted to use the specialised expression parsing for regular expressions etc.: I needed an “empty binary operator” for the usual concatenation in regular expressions (you can simply write ab and do not have to use an explicit operator to join the two parts, as in a.b). It had been on my TODO list for a long time, and now I had to implement it. Well, when you have to change something, of course you want to choose a more general approach, thus instead of adding a special syntax for “concat operators” or something like that (I tinkered with the idea), I allowed using arbitrary expressions as operators (an idea I had already tinkered with long ago): the parser will parse whatever syntactic unit is specified as the operator instead of eating a single token.

Now to the interesting observation, starting with an example: with this new feature we could implement Knuth’s hyperoperator: 3↑3, 3↑↑3, 3↑↑↑3 etc., where the operator would be specified by the expression “UP_ARROW+” or something like that. Interestingly, this is kind of a ternary operator: we can write an arbitrary number of arrows, and that number is encoded in unary representation. Let us choose a more comfortable notation: 3[12]3 for twelve arrows, or why not 3[1+3*2]3? In terms of the classical ternary operator: “cond ? opt1 : opt2” is a binary expression where “? opt1 :” is the operator. It is obviously a normal ternary operator taking arithmetic expressions as parameters, but we can equally interpret it as a binary operator whose operator symbol is itself an expression surrounded by brackets. In fact we can handle any n-ary infix operators this way as binary infix operators, n-ary prefix operators as unary prefix operators and n-ary postfix operators as unary postfix operators, since the inner arguments do not matter for precedence. The expression parsing problem thus gets reduced to these three basic cases plus parentheses, of course excluding certain kinds of clashes between the symbols used (a colon appearing in both a ternary (cond ? opt1 : opt2) and a binary (x:y) operator etc.), where probably all the usual linear-time parsing algorithms will fail and which are unsuitable for programming languages anyway.

This allows a great simplification of the parsing algorithm: I had a lot of code dedicated to handling ternary operators, and when composing the syntax tree the code had to look for them explicitly (now it is obvious that this was unnecessary). For now I have removed the special support for ternary operators; later I might add syntactic sugar to implement such operators easily without having to care about this technique. For that I will have to allow custom fields in the AST for specific operators, but the shunting-yard routine will not have to be aware of it. I will stop with these details; I hope you got the interesting point: by allowing everything to be an “operator”, shunting yard can be used in a very general way, and expression parsing gets simplified to the few basic cases found in most programming languages.
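To make the observation concrete, here is a precedence-climbing sketch (a close relative of shunting yard; all the helper names like peek(), eat(), parsePrimary() and prioOf() as well as the Ast and Token types are hypothetical, this is not the generator’s code). The whole “? middle :” sequence gets consumed as if it were a single binary operator token, and the middle argument never interacts with the outer precedence levels:

    // Sketch: every helper used here is assumed, not real KDevelop-PG-Qt API.
    Ast* parseExpression(int minPrio) {
        Ast* lhs = parsePrimary();
        while (startsBinaryOperator(peek()) && prioOf(peek()) >= minPrio) {
            Token op = peek();
            Ast* middle = nullptr;
            if (op.kind == Question) {         // "? expr :" read as ONE operator
                eat(Question);
                middle = parseExpression(0);   // inner argument: precedence resets
                eat(Colon);
            } else {
                eat(op.kind);
            }
            // +1 makes the operator left-associative; drop it for right-associative
            Ast* rhs = parseExpression(prioOf(op) + 1);
            lhs = middle ? makeTernary(lhs, middle, rhs)
                         : makeBinary(op, lhs, rhs);
        }
        return lhs;
    }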

Back from NWERC 2011

Monday, November 28th, 2011

Hi folks!

This weekend I participated in the North Western European Regional Contest, a regional preliminary round of the ACM ICPC. It is an international students’ contest: teams of three students sharing one PC try to solve as many as possible of 10 algorithmic problems in 5 hours. It was my first time there and it was quite fun.

On Saturday we visited the historic harbour of Bremen Vegesack, which was partly boring and partly quite nice. On Sunday the actual contest took place at the quite decadent and walled campus of Jacobs University in Bremen (you even have to pay tuition of ~€25,000 (~$33,000) per year; at public universities you pay about €400-1400, I could live on that for a long time, and even at elitist economics universities like EBS and WHU you have to pay much less). We used KDevelop 4.2 for solving the tasks and enjoyed the nice completion, the source code reformatting (nice when you are no longer sure which braces belong together and do not have much time :)) and the inline documentation of STL functions. Thank you, KDevelop guys. :)

Our team from RWTH solved only three out of 10 tasks (30th place out of 70). B and E were quite trivial (I did not even have to read B since my team members solved it). I want to tell you a bit about the other problems; I hope it is interesting for some of you. Of course you may try to solve the problems yourself, it is quite a lot of fun. ;)

A

Given a natural number x<10^15, we should compute all n,k such that binomial(n,k)=x. It was quite simple, but we had a bug in our implementation and I do not have any clue what the problem was. We chose to check the cases binomial(n,1)=n and binomial(n,2)=n·(n-1)/2 manually and to brute-force the solution for all other cases by computing rows of Pascal’s triangle for k values with binomial(n,k)≤x, terminating when binomial(n,2)>x. Alternatively you could check more terms explicitly (the team from LMU did that) and compute the whole triangle for the remaining values, or use binary search to find the parameters.
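A sketch of that brute force (hypothetical code, not our contest submission; it uses GCC’s unsigned __int128 to dodge overflow in the row computation): k=1 and k=2 are handled directly, and rows of Pascal’s triangle cover the rest, where n stays small because binomial(n,3) already grows cubically:

    #include <cstdint>
    #include <iostream>
    #include <utility>
    #include <vector>

    int main() {
        const uint64_t x = 3003;    // sample input; 3003 = C(14,6) = C(15,5) = ...
        std::vector<std::pair<uint64_t, uint64_t>> hits;
        // k = 1: C(x,1) = x always matches (its mirror is (x, x-1))
        hits.push_back({x, 1});
        // k = 2: scan n(n-1)/2 = x (binary search works as well)
        for (uint64_t n = 2; n * (n - 1) / 2 <= x; ++n)
            if (n * (n - 1) / 2 == x) hits.push_back({n, 2});
        // k >= 3: C(n,3) <= x bounds n by roughly 182000 for x < 10^15;
        // walk each row up to the middle and stop once an entry exceeds x
        for (uint64_t n = 6; n * (n - 1) * (n - 2) / 6 <= x; ++n) {
            uint64_t c = 1;                          // C(n,0)
            for (uint64_t k = 1; 2 * k <= n; ++k) {
                // exact update C(n,k) = C(n,k-1)*(n-k+1)/k, widened against overflow
                unsigned __int128 t = (unsigned __int128)c * (n - k + 1) / k;
                if (t > x) break;                    // rows increase up to n/2
                c = (uint64_t)t;
                if (c == x && k >= 3) hits.push_back({n, k});
            }
        }
        for (auto [n, k] : hits)                     // mirrors (n, n-k) omitted
            std::cout << "C(" << n << "," << k << ") = " << x << '\n';
    }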

C

Given a sequence of films watched by a person, we should answer, for each position in the sequence, how many different films have been watched since the last time the person watched the current one. By maintaining a binary tree containing the films, ordered by the last time they were watched, and a map storing this time for each film, these queries can easily be answered, provided you have a way to count the nodes after a specific one in the tree in O(log n). For that you have to store the size of each subtree, and the STL set/map unfortunately do not support this. Thus we decided to write our own tree; we went for a treap with random priorities since that is the easiest way to avoid degeneration (easier to implement than AVL or red-black trees). Unfortunately we did not have enough time for debugging the tree, and we even forgot to update the counters when balancing the tree after insertion. There were simpler solutions using segment trees.
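The core of the size augmentation, as a sketch (an assumed node layout, not our contest treap): with a subtree-size field, “how many nodes come after this key” is a single walk from the root, and the catch is that every rotation has to update the sizes as well, which is exactly the update we forgot:

    struct Node {
        long long key;          // e.g. the time the film was last watched
        int size;               // number of nodes in this subtree
        Node *left, *right;
    };

    int sz(const Node* n) { return n ? n->size : 0; }

    // count the nodes whose key is strictly greater than k, in O(height)
    int countAfter(const Node* root, long long k) {
        int count = 0;
        for (const Node* n = root; n; ) {
            if (k < n->key) {                // n and its right subtree lie after k
                count += 1 + sz(n->right);
                n = n->left;
            } else {                         // n and its left subtree do not
                n = n->right;
            }
        }
        return count;
    }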

For this task there were particularly many wrong submissions (the man presenting the solutions assumed that many teams tried this one because the first team solved it after only 20 minutes, concluding it would be the easiest one. He even presented a very funny solution using java.lang.String and a linear-time Java indexOf() call, which was of course way too slow).

D

This task was about deciding whether it is possible to partition a given pattern of black and white squares into disjoint, L-shaped “puzzle pieces” of one black square (in the corner) and two white squares. We had no idea how to solve that; we thought a lot about backtracking, but there seemed to be no clever way to do it in polynomial time. The trick was to encode the problem in Boolean formulas and to use a standard algorithm (e.g. resolution) for the 2-SAT problem, with variables encoding statements like “the white square at x,y belongs to the same puzzle piece as the white square at x+1,y+1”.
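For reference, the standard linear-time machinery such a solution plugs into (a generic sketch, the puzzle-specific clause construction is omitted): every clause a∨b contributes the implications ¬a→b and ¬b→a, and the formula is satisfiable iff no variable ends up in the same strongly connected component as its negation:

    #include <algorithm>
    #include <vector>

    // generic 2-SAT via implication graph + Kosaraju's SCC algorithm;
    // literal encoding: variable v is node 2v, its negation is 2v+1
    struct TwoSat {
        int n;
        std::vector<std::vector<int>> g, gr;   // implications and their reverses
        std::vector<int> comp, order;
        std::vector<bool> seen;
        explicit TwoSat(int vars)
            : n(2 * vars), g(n), gr(n), comp(n, -1), seen(n, false) {}
        void addClause(int a, int b) {         // a OR b
            g[a ^ 1].push_back(b); gr[b].push_back(a ^ 1);  // not a -> b
            g[b ^ 1].push_back(a); gr[a].push_back(b ^ 1);  // not b -> a
        }
        void dfs1(int u) {
            seen[u] = true;
            for (int v : g[u]) if (!seen[v]) dfs1(v);
            order.push_back(u);
        }
        void dfs2(int u, int c) {
            comp[u] = c;
            for (int v : gr[u]) if (comp[v] < 0) dfs2(v, c);
        }
        bool satisfiable() {                   // Kosaraju's two passes
            for (int i = 0; i < n; ++i) if (!seen[i]) dfs1(i);
            std::reverse(order.begin(), order.end());
            int c = 0;
            for (int u : order) if (comp[u] < 0) dfs2(u, c++);
            for (int v = 0; v < n; v += 2)
                if (comp[v] == comp[v ^ 1]) return false;  // x and not-x clash
            return true;
        }
    };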

F

Nobody solved this problem at the contest (though some people solved it at the semi-live contest at Spoj; a coach of another team said he knew a very similar task from a different contest). Look at it, it reads like really complicated spam, and only one team tried it. Cool geometric structures, quad-trees etc. will not help you here: 500×500 fields, each one either water or grass. You want to build a pool area. The border has to be grass, and every pool of water has to be surrounded by a wall, which costs $a per wall segment. You can transform water fields into grass for $b per field, and grass into water for $c. You want to build the cheapest pool area possible.

Actually it is a simple max-flow problem, since a minimum cut in a certain flow network encoding the landscape is the optimal solution. You have a source, a sink and a node for every field. A cut divides the nodes into water-nodes and grass-nodes. Source-to-x edges get a capacity equal to the price of transforming x into a grass field; x-to-sink edges get a capacity equal to the price of transforming x into a water field, unless the node represents a field at the border, in which case the capacity is infinite. Additionally you add x-to-y edges with a capacity equal to the price of building a wall, for adjacent fields x and y. That is it (if I am not wrong, I hope so :D). Really awesome.
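Spelled out as a sketch (MaxFlow stands for any standard max-flow implementation with an addEdge(from, to, capacity) method; both that interface and the 'W' grid encoding are my assumptions): after running max-flow from node 0 to node 1, the flow value is the weight of a minimum cut and thus the cheapest total cost:

    #include <string>
    #include <vector>

    // assumed: struct MaxFlow { void addEdge(int from, int to, long long cap); ... };
    const long long INF = 1LL << 60;   // border fields may never become water

    // a, b, c: wall cost, water->grass cost, grass->water cost
    void buildNetwork(MaxFlow& mf, const std::vector<std::string>& grid,
                      long long a, long long b, long long c) {
        const int H = grid.size(), W = grid[0].size();
        auto id = [&](int x, int y) { return 2 + x * W + y; }; // 0 = source, 1 = sink
        for (int x = 0; x < H; ++x)
            for (int y = 0; y < W; ++y) {
                bool water = grid[x][y] == 'W';
                bool border = x == 0 || y == 0 || x == H - 1 || y == W - 1;
                // cutting source->field means the field ends up grass
                mf.addEdge(0, id(x, y), water ? b : 0);
                // cutting field->sink means it ends up water; forbidden on the border
                mf.addEdge(id(x, y), 1, border ? INF : (water ? 0 : c));
                // neighbours on different sides of the cut pay for one wall
                if (x + 1 < H) { mf.addEdge(id(x, y), id(x + 1, y), a);
                                 mf.addEdge(id(x + 1, y), id(x, y), a); }
                if (y + 1 < W) { mf.addEdge(id(x, y), id(x, y + 1), a);
                                 mf.addEdge(id(x, y + 1), id(x, y), a); }
            }
        // answer = mf.maxflow(0, 1), by max-flow/min-cut duality
    }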

G

I will post a full description later. For each observation you can derive an inequality, and you can complete the information in a Floyd-Warshall-like manner.

H

Spam. You have 13 out of 52 regular playing cards and want to partition them into as few bunches as possible, where a bunch is a single card, a pair, a triple, a quadruple, a full house or a straight of at least five consecutive cards. You can simply use brute force: test all combinations of up to two full houses and up to two straights (there cannot be more among 13 cards) and greedily take the remaining cards as quadruples, triples, pairs and single cards. We tried that with a lot of copy-and-paste code and our program printed wrong answers, but we have no idea in which cases it failed. Alternatively you can choose a probably more compact solution by building a directed acyclic graph on all 2^13 combinations of your cards, where edges represent playing one of the allowed bunches, and then use breadth-first search to get the solution. This was one of those typical tasks with useless information in the input (cards were encoded as 4h for ♥4 and Jd for a ♦Jack; in fact the suits did not even matter).
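The BFS alternative in code, as a sketch (the moves table, i.e. the list of every legal bunch expressed as a 13-bit submask of the hand, is assumed to be precomputed):

    #include <queue>
    #include <vector>

    // fewest bunches needed to play away all 13 cards; bit i of a state
    // means card i is still in the hand, and every entry of `moves` is the
    // bitmask of one legal bunch (single, pair, ..., straight)
    int minBunches(const std::vector<int>& moves) {
        std::vector<int> dist(1 << 13, -1);
        std::queue<int> q;
        const int full = (1 << 13) - 1;
        dist[full] = 0;
        q.push(full);
        while (!q.empty()) {
            int s = q.front(); q.pop();
            if (s == 0) return dist[s];                // hand emptied
            for (int m : moves)
                if ((s & m) == m && dist[s ^ m] < 0) { // bunch fully available
                    dist[s ^ m] = dist[s] + 1;
                    q.push(s ^ m);
                }
        }
        return -1;   // unreachable: 13 singles always empty the hand
    }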

I

Given integer coordinates of up to 10000 RFID chips, 250000 RFID sensors and 25 walls, compute which sensors can detect which chips. Each sensor can detect any chip within a radius of r, while each wall in between reduces the radius by 1. No two sensors are closer to each other than r (“due to interference”^^). We used a sweep-line algorithm. Whenever the sweep line reaches a value x such that x+r is the abscissa of a chip, we add the chip to an ordered binary tree (std::set in C++) ordered by the ordinate. When it reaches a value x such that x-r is the abscissa of a chip, we erase the chip from the tree. When x is the abscissa of a sensor, we query all chips in the tree with ordinates between y-r and y+r using lower_bound and upper_bound, where y is the ordinate of the sensor. For each candidate we count the number of walls intersecting the line between sensor and chip (we test intersection by adding up angle(p1, q1, p2)+angle(q1, p2, q2)+angle(p1, q2, p2)+angle(q1, p1, q2), which should be 2π (aka 360 degrees) for intersecting line segments) and check whether the chip is within the correspondingly reduced radius. This solution has a complexity of O((n+m)log(n+m)+n·w) where n is the number of chips, m the number of sensors and w the number of walls. This bound holds because the distance between any two sensors is at least r, thus at most 7 sensors can detect any one chip (imagine a regular hexagon of sensors with one sensor in the centre). For many walls you would probably require a more sophisticated algorithm taking the walls into account while sweeping.
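As an aside, the more common integer-only way to test whether two segments properly intersect, as an alternative to summing angles (which needs floating point and a tolerance): the endpoints of each segment have to lie strictly on opposite sides of the line through the other:

    struct Point { long long x, y; };

    // twice the signed area of the triangle (o, a, b):
    // positive = left turn, negative = right turn, zero = collinear
    long long cross(Point o, Point a, Point b) {
        return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
    }

    // proper intersection (crossing in one interior point; touching excluded)
    bool segmentsIntersect(Point p1, Point p2, Point q1, Point q2) {
        auto sgn = [](long long v) { return (v > 0) - (v < 0); };
        return sgn(cross(p1, p2, q1)) * sgn(cross(p1, p2, q2)) < 0 &&
               sgn(cross(q1, q2, p1)) * sgn(cross(q1, q2, p2)) < 0;
    }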

Surprisingly only 7 teams solved it; I had thought it would be kind of a standard geometric sweep-line problem, and it was even easier than E for us (there we used int instead of long long :D). Alternatively you can use binning (since the coordinate space is limited) or quad-trees to solve this problem.

J

You have a couple of train stations and ≤1000 connections between them. Each connection departs once an hour at a certain minute (e.g. 13:47, 14:47 and so on). However, there is a certain probability that the train will be late, and if it is, the delay is uniformly distributed over a range from 1 to some number ≤120. You have to compute the minimum expected duration of a trip from the start to the target. We solved the problem with a fixed-point iteration: for each station and each minute (0≤m<60) we store the best known expected duration to the target and propagate that information until it no longer changes. Unfortunately it timed out, probably because of floating point issues: we recognised changes which were insignificant or caused by rounding. Only one team solved this problem; three teams tried.

Summary

Yesterday I returned from Bremen: I boarded the first train at about 17:00, arrived at 0:45 and slept till 12:00. :) It was a really nice experience. If you are a student, like algorithms and want to participate: go ahead, look for two comrades, ask your professor to register you and join your regional contest! :) (At some universities there is a local preliminary contest.)

PS:
Some visual impressions:

Coloured wave

:)


The camera of my N97 mini mobile phone had a bug or something like that. However, when trying to photograph an ugly supermarket or something like that in the early evening and accidentally moving the camera too early, I got this nice photo. Could not reproduce. :(
Some of my fingers

Could not reproduce :(

PPS:
The contest was sponsored by Facebook and we had to attend a boring presentation by a Facebook employee. It was quite funny how he tried to advertise working there. He assumed everybody in the room uses Facebook, what? We could get free t-shirts for answering his silly questions; did he think we would want to work there because of a t-shirt? Multiple times he said that he was not allowed to talk about this or that, really a great pitch for a company where you cannot even superficially talk openly about the things you are doing there. And he told us that they finally delete all messages six days after you press the delete button, haha. :D

PPPS:
I have pushed some solutions here.

KDevelop-PG-Qt 1.0 Beta

Tuesday, October 4th, 2011

Today KDevelop-PG-Qt 1.0 Beta (aka 0.9.82) got released, the parser generator used by several KDevelop (and Quanta) language plugins. There are some new features, and various bugs reported in the months since 0.9.5 are now fixed.

New features

Most effort has been spent on implementing lexer generation. You can now write a simple specification of the lexical structure using regular expressions; the generated token stream class, which produces tokens from the input data, can be used directly by the generated parser, but you can also create just a lexer without a parser. Thus, despite the name, KDevelop-PG-Qt is no longer just a parser generator. My motivation for writing the lexer generator was the lack of decent Unicode support in most lexers. Quex is quite good, but it is not free software (despite its LGPL-based license) because it excludes military usage. The lexer can not only read different encodings, but also use different encodings for internal processing: e.g. it can either convert a UTF-8 stream to UTF-32 first or operate directly on the UTF-8 bytes. A simple example:

%lexer ->
    for FOR ;
    "some keyword" SOME_KEYWORD ;
    {alphabetic}* WORD ;
    ;

It also includes limited support for look-ahead (full support like in Flex would make the generated code more complicated and inefficient) and a similar feature called “barriers”, which in my experience is more useful for modern languages and will be explained in the documentation soon.

The API used by the generated files has been cleaned up: some method names were very counterintuitive, and some methods were not used at all. Because of this break you have to update KDevelop-PG-Qt to build the latest versions of the PHP, QMake, SGML and CSS plugins (you may have to do a clean build).

There are also some minor new features: AST structs now get forward-declared, reducing dependencies (code which relied on full definitions in some places was fixed some time ago), and the token types are no longer declared by the parser. Additionally, you no longer need “;;” in the grammar files, though it is still supported for compatibility.

Bug fixes

The bug fixes include proper line numbering. With GCC you now see correct line numbers referring to the location of wrong code in the grammar file; for other compilers you may have to activate --use-line-directive, which uses the portable #line directive instead of the GCC-specific syntax, at the price of not seeing the location in the generated code. The CMake macro does that automatically. Some compatibility errors reported by Fedora and Debian packagers have been fixed, special thanks to them. KDevelop-PG-Qt now builds with QT_STRICT_ITERATORS and also builds under Windows with MSVC (MSVC is not perfect for C++11 and Flex is not perfect for Windows). Annoyingly misformatted error messages in the lexer and parser of KDevelop-PG-Qt have been fixed, too.
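For illustration (a hand-written example, not actual generated output; php.g is a made-up file name):

    // after this directive the compiler attributes diagnostics for the
    // following lines to the grammar file instead of the generated .cpp
    #line 42 "php.g"
    int parseDocument();   // an error on this line would be reported at php.g:42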

Some further bug fixes might turn out to be necessary, hence this beta version first. And there is other work to do now: the documentation and the Kate grammar file for proper syntax highlighting have to be updated.

C++11

This is the second release using C++11 features, especially the auto keyword and STL hashes (unordered_map and unordered_set). I had previously used variadic templates to construct UTF-8 automata, but for compatibility with MSVC they have been replaced by a Ruby script generating some code; the variadic template had been quite ugly anyway.

Future development of KDevelop-PG-Qt

  • Parser and lexer should be rewritten using KDevelop-PG-Qt
  • Long requested: there should be a way to mark conflicts as resolved; for that, the conflict reporting has to be refactored a bit
  • The next release will hopefully support LL(k) parsing, making some stupid look-aheads obsolete
  • Cleaning up further code ;)

Special thanks to Milian Wolff for all his patience, bug reports and support with the release.

I will post the link to the tar-ball as soon as it is uploaded; there is only the tag for now. Update: the tar-ball can be found here.

GPLv3, LGPLv3, AGPLv3 Discussion

Wednesday, August 24th, 2011

Hi!

Just a short notice: I have started a discussion on the kde-licensing mailing list about the exclusion of GPLv3, LGPLv3 and AGPLv3 by the current licensing policy, as promised in my previous blog post discussing some of the arguments. I will be offline for a few days, so do not wonder if I am not answering.

Regards
Jonathan

Unexpected Bottleneck: vector<bool>

Tuesday, August 23rd, 2011

Hi folks!

The last paragraph is actually the interesting one for every C++ programmer; the lead-in might not be understandable.

The new lexer component of KDevelop-PG-Qt is not very fast; it can take several minutes with complex grammars, especially when using UTF-8 as the internal encoding for the automata. Of course I first want to finish the most important features, and for that I am writing an example lexer for PHP to check the lexer’s capabilities and to spot bugs and missing features. But now I was curious and asked Valgrind/Callgrind for an analysis. I expected a lot of useless copying/assignments, because I have not done any optimisation yet and tried to be “defensive” by copying too much. Indeed: for my first example about 78% of the runtime was spent on copying. But then I tried a full (as far as it is finished) PHP example with UTF-8, and it was a real surprise:

KDevelop-PG-Qt: Callgrind results: a lot of comparisons on vector<bool>

KCacheGrind Overview


It spends most of the time comparing instances of std::vector<bool>! For the powerset construction I have to maintain sets of following states for subsets of the NFA’s states, and afterwards I map those subsets to new unique IDs for the states of the DFA. In both cases I wanted to use a map with vector<bool> (a subset of states) as key_type. I could not decide whether to use map or unordered_map; I used map, but there is also some code making it possible to use QBitArray with unordered_map, just in case. However: I expected the minimisation to be the bottleneck (disclaimer: I stopped Valgrind after some time, maybe it becomes the bottleneck later), but for now the comparisons used in the map implementation are crucial. I was not very economical with those map operations (I could have done the mapping while computing the necessary subsets), it has to be optimised, and I may use unordered_map and unordered_set (hashes) instead of map and set, but that is not the reason why I am writing this blog post. ;)

The interesting thing about it is the usage of std::_Bit_reference. std::vector<bool> has a specialised implementation: the booleans get packed, so a full vector only needs size()/8+const bytes. There is a special hashing implementation directly using the integers in which the bits are stored, but there is only a generic comparison implementation. Thus comparison iterates over all the bits instead of comparing whole chunks (32 or 64 bits, depending on the CPU). Of course that is much slower; keep in mind that each iteration requires some bit manipulation. Especially in my case of a red-black tree, a vector gets compared to the very similar ones next to it in the tree, and then it takes especially long. Be aware of vector<bool>: it is often awesome (saves a lot of memory), but something like this slow comparison may happen, and always keep in mind that the index operator might be slower than for vector<char>. Unfortunately there seems to be no API for accessing the integer chunks. Btw. QBitArray does not support comparison at all. I have filed a bug report with GCC/libstdc++.
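A minimal sketch to reproduce the effect (numbers will vary with machine and library version): two bit sequences differing only in the last position, compared once as vector<bool> and once as manually packed 64-bit chunks:

    #include <chrono>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    int main() {
        const std::size_t bits = 1 << 20;
        std::vector<bool> a(bits, true), b(bits, true);
        b[bits - 1] = false;                     // differ only in the last bit
        std::vector<std::uint64_t> ca(bits / 64, ~0ull), cb(ca);
        cb.back() ^= 1;                          // same difference, packed by hand

        auto time = [](auto f) {
            auto t0 = std::chrono::steady_clock::now();
            volatile bool r = false;             // keep the comparison alive
            for (int i = 0; i < 1000; ++i) r = f();
            (void)r;
            return std::chrono::duration<double, std::milli>(
                       std::chrono::steady_clock::now() - t0).count();
        };
        std::cout << "vector<bool>  : " << time([&] { return a == b; }) << " ms\n";
        std::cout << "uint64 chunks : " << time([&] { return ca == cb; }) << " ms\n";
    }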

The FSF about Free Games

Sunday, July 17th, 2011

I have noticed a new FSF bulletin article via identi.ca: The Free Game Lag by Danny Piccarillo.
The article is about the lack of FLOSS games. It rejects the theory that Free Software is an unsuitable method for game development: free games will evolve like any other field of software, they are merely a low priority at the moment because games are not that important. Seriously, though, the arguments of that article are null and void, because it does not take the specific properties of game development into account. So let me explain my own thoughts on the issue:

Some time ago I thought it would be impossible for Free Software to conquer game development. There is a lot of work involved in developing a big computer game, but no one has a specific commercial interest in the development of games, so selling licenses seems to be the only possible business model (by comparison, many companies are interested in the development of Linux or Apache). There will not be any Red Hat or Google or whoever extensively sponsoring the development of free games, nothing more than some GSoC Wesnoth projects (that is much less than the big game industry). What was wrong about my thought? Game development does not necessarily need such a “business model” to be successful. First of all we should notice that there are sophisticated Free Software game engines; XNA or similar proprietary software is not needed: there are e.g. Irrlicht, Ogre3D, Blender and Panda3D, and sophisticated graphical effects, physics etc. no longer seem to depend on proprietary software. When looking at the development of major games one may notice that there are seldom epic new concepts; most of the time it is new content, i.e. new graphics/artwork, new story, new characters, new items, new quests etc. That is a lot of work, but: it can be done by communities. Gamers have already created huge modifications in their free time. Once free games have reached a certain level, including good graphics etc., there could be entirely new, big communities of gamers, and because community and cooperation are integral parts of Free Software, such games would not stop growing. But currently most “serious” gamers only take notice of proprietary major games.

But of course those major 3D-shooter/RPG/strategy/…-games are not the only ones; many people also play so-called “casual games”, which tend to be very widespread, and proprietary. One may argue that casual gamers do not want to spend time contributing, but I think there is enough hope that they may become interested in it, too. The Gluon Project, which we all know about, seems to have some very nice approaches here: it is trying to build such communities for free games, which currently do not exist, supported by OpenDesktop.org software (and hardware, but that is not as important). For 2D real-time games it looks very promising. There are also some innovative approaches for turn-based games; e.g. a short time ago I found out about Toss, a free research project combining game creation and general game playing (from the AI point of view). I am sure it would be awesome if communities could be built around such software as well (there is Zillions of Games, but it is proprietary).

When people ask you how gaming as we know it can exist in a free software world, you should open your response with, “It can’t, but it can be better.”

That is definitely right, but the specific properties of game development have to be taken into account. There are chances for free games; we should not forsake all hope just because success seems impossible with the current businesses. Rather, we should hope that all those businesses (“big game industry”, “app development”, all using DRM etc.) will perish, although currently they seem to be strengthening. Free Software and community can succeed by using entirely new methods.

PS:
What I forgot to write yesterday: crowd funding should be considered, too. Why should ordinary gamers not pay for development instead of licenses, if there are good chances that there will be some good results? That is often a good alternative to selling copies; especially unknown artists can benefit. Of course it should not be misused (which is often the case at kickstarter.com: people receive such funding and bear no risk, but then they create DRMed products, sell them in the app store etc.; that is strictly against the street performer protocol ;)). Btw. OpenDesktop.org supports donations.

Regarding Dynamic Typing

Sunday, July 10th, 2011

Currently there seems to be a big hype around EcmaScript (JavaScript). Google probably wants to enslave the world with Chrome OS, where the system is not much more than a provider of an EcmaScript virtual machine (maybe it will support native applications through NaCl, just like Android does not only allow Java…), Microsoft wants to reimplement their Windows interface with it, and Qt invented QML, the EcmaScript extension we all know about, providing cool declarative features. Today I want to talk about a fundamental property of EcmaScript and why it just sucks: dynamic typing.

Clarification

First let us clarify the meaning of “dynamic typing”: it means that the types of expressions get checked at runtime, and expressions which may have any type are possible. It should not be confused with duck typing. For example, many dynamically typed languages have a lot of built-in functions relying on specific types (“the frog will not get accepted”), e.g. most functions in PHP’s standard library (expecting string, integer, array or whatever). But C++ function templates, for instance, provide duck typing (they will accept anything looking like an iterator), and the signatures in old g++ versions provided duck typing, too. Determining types may happen at compile time (type inference) even with dynamic typing, and of course optimising compilers/interpreters do exactly that.
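A tiny illustration of the distinction: the template below duck-types its arguments (anything with ++, != and copying is accepted), yet every call is still checked at compile time:

    #include <iostream>
    #include <list>
    #include <vector>

    // duck typing at compile time: any type supporting ++, != and copying
    // is accepted, and misuse is rejected by the compiler, not at runtime
    template <typename It>
    int countAll(It begin, It end) {
        int n = 0;
        for (It i = begin; i != end; ++i) ++n;
        return n;
    }

    int main() {
        std::vector<int> v{1, 2, 3};
        std::list<double> l{1.5, 2.5};
        std::cout << countAll(v.begin(), v.end()) << ' '   // 3
                  << countAll(l.begin(), l.end()) << '\n'; // 2
        // countAll(&v, &l) would not even compile: the contract is enforced
    }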

Impact on Development

Let us talk about an argument for dynamic typing: it makes life easier. Actually, that can be right: there are domains where you just do not want to care about such stuff, for example when writing a shell script of a few lines or when quickly doing some calculations with a computer algebra system. But Typo3, MediaWiki, Windows, Plasma etc. are much more than that. Why do I doubt that dynamic typing makes life easier in those contexts? Because it is error-prone. It is always better when errors get detected at compile time. It is good to fulfil contracts when programming, and they should get verified at compile time, so that violations can be found easily and will not annoy the user.

One type of contract (not the only one, cf. design by contract) which has been used for a long time is the type system: the programmer guarantees that a variable has a certain type. What happens in dynamically typed languages? You do not have to state the contract, and the compiler (or code checker) will usually not be able to check it; it exists only in your brain. But of course you will still rely on that contract; the type is something you rely on most of the time when programming, and I know that x is an integer when using x for some arithmetic. You will make mistakes, and you will get buggy software. That is the fundamental disadvantage.

Of course I have to weigh it against the advantage of dynamic typing: you can write code quickly and efficiently without mentioning the type everywhere. But there are more proper ways to achieve that. Use type inference: the type of a variable gets determined by the compiler when it is initialised, and you get an error when you try to change the type. That is good, because in most cases the type of a variable does not change, and you also get informed about undefined variables (a typo should not cause a runtime error, but in dynamically typed languages it does). For the cases where you need a structure allowing different types at the same position, there are algebraic data types. With an algebraic data type you can state the contract with only a few tokens (instead of a nested array/dictionary structure whose layout is only implicitly given by the code manipulating it, as often happens in dynamically typed languages), and for a variable declaration you only need one token, maybe a single character. That minimalistic overhead in code length is definitely worth it once the software has reached a certain complexity, and that threshold is probably not very high: annoying mistakes which could have been avoided with static type checking can already occur in small programs that just compute some stuff.
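In modern C++ terms (std::variant is C++17, so it postdates this post; take it as an illustration of the idea, not of what was available back then): type inference fixes a variable’s type at initialisation, and an algebraic data type states the “exactly one of these” contract in a single line:

    #include <iostream>
    #include <string>
    #include <variant>

    // an algebraic data type: a value is exactly one of these alternatives,
    // and the contract is stated once instead of living only in your head
    using Value = std::variant<int, double, std::string>;

    int main() {
        auto x = 42;            // inferred as int, and fixed from here on
        // x = "hello";         // compile-time error instead of a runtime surprise
        Value v = std::string("hi");
        // forgetting to handle an alternative is caught by the compiler
        std::visit([](const auto& a) { std::cout << a << '\n'; }, v);
        std::cout << x << '\n';
    }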

Performance

Dynamic typing causes big overhead, because instructions have to be chosen at runtime based on type information all the time. It is also much more complicated to optimise dynamically typed languages: there might be corner cases where the type is not the expected one, and the runtime has to care about them etc. I often read statements like “the performance-critical parts are implemented natively”, but regarding the sheer amount of applications running on such languages (JavaScript, PHP, Ruby, Python, Lua) we have to state: it is performance-critical. PHP is used for more than a preprocessor, QML is used for more than just representing the UI, JavaScript is used for drawing a lot of complex stuff in the browser, Python gets used for scientific computations, and Ruby is establishing new standards regarding overhead (well, that is not true, Scheme has been that slow before ;), but Ruby allows modifying a lot of stuff at runtime, too). There is reasonable overhead, for abstraction, generalisation, internationalisation etc., but dynamic typing affects nearly every operation when running the program. That is unreasonable, and of course it sums up to significant overhead, although it is simply not needed (and bad for the environment ;)).

Special Issues

Regarding extreme flexibility

First of all: in 95% of applications you do not need it; you do not have to modify types at runtime, add member functions to classes or objects and all that stuff. Sometimes it may be a good way to establish abstraction etc., but in those cases there are usually alternatives: meta-programming can be done at compile time. When all those types get manipulated in Ruby, they usually could have been manipulated at compile time, too, but Ruby does not support sophisticated compile-time meta-programming (ML and Template Haskell do; in C++ and D it is kind of limited). Regarding the collection of information, debugging etc. with such features: debugging facilities should not influence performance and cleanness, and I am sure that with some meta-programming you could implement language features allowing all that while debugging without neglecting the type system. And of course a lot of flexibility at runtime can be achieved without allowing any type everywhere: dynamic dispatch (including stuff like inheritance, interfaces, signatures and even multi-dispatch), variant types in a few places (e.g. QVariant, although I think it is used too often; signals can be implemented in a type-safe way, and there are those type-safe plugin factories as an alternative to QtScript and Kross), signals and slots, aspects etc.

Regarding EcmaScript

You might say that EcmaScript is becoming fast enough thanks to good compilers and extensions like type-safe arrays (e.g. containing only floating-point numbers). But EcmaScript will stay EcmaScript: it will keep the downsides of dynamic typing, and those type-safe arrays are an ugly hack to make it feasible for some specific applications. It is simply lacking a proper type system, and it will not get one.

Regarding QML

Using EcmaScript for QtScript was a pragmatic choice, not an awesome innovation: there were many web developers who knew JavaScript. Unfortunately that created yet another way to integrate scripts, and certainly not the most flexible one (cf. my previous blog post); for some reason they did not want to reuse KDE’s innovations (as with QtCreator and KDevelop, but that is really a different topic…). QML is based on EcmaScript because QtScript had been based on it before. Dynamic typing is definitely not an inherent property of such a declarative UI; most of it could have looked the same with a native implementation based on C++, and implementations in Ruby or whatever would have been easily possible, too. I have to admit that C++ is not perfect: it does not provide sophisticated meta-programming, algebraic types or one-letter type inference (“auto” has four letters ;)). The last one may be a small problem, but overall C++ is simply not simple enough ;), and languages like Scala, D and OCaml have certain problems, too. Hence some of the non-declarative code in QML would have been disproportionately complicated compared to the declarative code. The general approach of declarative UI is certainly good, and now we probably have to accept that it has been implemented using EcmaScript. We can accept it as long as it is still possible to write Plasmoids using C++ or whatever; obviously that is the case. Thus QML is generally a good development in my opinion, although implementing program logic in it is often not a good idea and although dynamic typing leaves a bitter aftertaste.

I hope you have got my points about dynamic typing. Any opinions?

Skype Reverse Engineered

Thursday, June 2nd, 2011

Hi!

Good news for Free Software and open protocols: there has been a successful attempt to reverse engineer Skype (Magnet URI). Nice timing: shortly after Microsoft’s acquisition, Skype finally got broken. :) The author also provides modified Skype executables allowing debugging etc., which is usually prevented by really elaborate anti-features (encryption, the binary kills itself if there is a debugger etc.). The sample code is able to send a message via Skype, awesome!

Hopefully there will soon be implementations for Telepathy or libpurple (used by Pidgin and Telepathy Haze). One may say we should not promote such proprietary protocols, but: for many people Skype is important (do not say you do not know anybody using it; it varies, in some groups everybody uses it, somewhere else it is different; I am not using it). The chances that those people will switch to Free Software (KDE/GNU/Linux) are much higher if there is support for Skype without proprietary software. And once they start using Pidgin or Telepathy, it is no problem for them to use open protocols, too: you can simply tell them which buttons to click, and then they can communicate with you using an open protocol like Jingle or SIP (or XMPP for text messages). Thus it does not only help to spread Free Software, but also to spread open protocols. And all future attempts to commercialise the private data created by Skype can be easily prevented. Even Windows users may start using Pidgin or something like that, advertisement notwithstanding. Regarding “it is just a hack”: they cannot simply change their protocol, because there is Skype hardware which cannot be updated and would stop working (that would make all the non-idealist customers unhappy, too). And for WLM/.NET Messenger/MSN and Oscar/ICQ this approach has been working well for a long time (even with Kopete ;)). Really, really great news!

Unification and Orthogonalisation: Part 3, HTML-Engines and Web Browsing

Thursday, June 2nd, 2011

Hi!

In this blog post I want to share some thoughts about HTML rendering engines and web browsers in KDE. There seem to be some problems, and some of them may be easier to solve with KDE 5 (breaking some API may help).

Does there have to be a KDE-browsing-experience?

When looking at the browser statistics for my blog, most people seem to use Firefox or Chrome/Chromium; using Konqueror or Rekonq is kind of exotic. Browser statistics may be wrong (I sometimes use a “faked” user agent string when websites do not like Konqueror), but I even know many KDE users personally who use Konqueror or Chromium. Still, I think KDE browsers actually have potential. Everybody is annoyed by Gtk+ file-selection dialogues, but there are also real features provided by KDE: in some areas Rekonq provides a unique interface, and Konqueror can satisfy many wishes for flexibility. I am not that happy with Firefox and Chromium (quoting Martin Gräßlin: “it is broken” (client-side decoration ;))). KDE web browsing should not be dropped, and certainly nobody is planning to do that.

Rendering engines

We should be honest about KHTML and admit that its JavaScript support is poor and technically not up to date. Many websites do not work, even KDE-related ones and popular sites like Blogspot. KHTML is certainly not dead yet, because kdewebkit is not a working alternative yet: it is lagging behind regarding KDE integration (KDE-style widgets are a must-have), and it cannot be considered stable yet either. But there is one thing which would definitely be nice: a common interface. KDE is able to attract plugin developers (see Plasma) and has great technical capabilities for it (KPluginFactory etc.), but they would have to be used. If there were a unified way to access both KHTML KParts and WebKit KParts, it would give people wanting to write a plugin something to rely on, and of course it would be better for every user if KHTML plugins could be used with WebKit in Konqueror. The API could be unified with KDE 5, but there is still a lot of time left until then; maybe an alternative should be found.

Konqueror and Rekonq

That has to be said: Rekonq started as a “hey, I will try to build a web browser” project; today it is the default web browser in Kubuntu and provides e.g. the awesome address bar. It is a serious project now, so why is it not integrated with KDE technology? Konqueror makes extensive use of KParts and KXmlGui; I think it would have been no problem to implement Rekonq’s user interface elements as plugins on top of Konqueror’s infrastructure. Not only would Konqueror have benefited, Rekonq would have got a built-in plugin system, PDF viewing, split views, synchronously displaying HTML while modifying it via FTP etc. for free. I am sure some performance optimisations could have been done for KParts/Konqueror, too, without breaking everything. Now we have got two independent web browsers. Even with Dolphin there is a problem: the promised integration did not come true; the useful dockers (e.g. for Nepomuk), the address bar etc. are not available in Konqueror, which uses its own sidebar unaffected by Dolphin. But with Rekonq it is even worse, because Rekonq does not even try to integrate with KParts.

It would be nice if there were a single, unified web browser with a reliable rendering/JavaScript engine, using all the great technology KDE supplies, especially all the flexibility provided by kdeui.

Regards
A loyal Konqueror user

Graphical KDevelop-PG-Qt Output

Tuesday, April 26th, 2011

KDevelop-PG-Qt is really boring, it just generates some boring C++ code. Well, I still have not implemented QML-based animations or multi-touch gestures for KDevelop-PG-Qt, but now you can get .dot output: graphs which can be visualised using GraphViz, e.g. with dot on the command line or with KGraphViewer. That way you can visualise the finite state machines used for generating the lexer. I guess everybody knows what this is:

utf8 dfa overview

Overview of the DFA


You have not got it? Let us zoom in:
utf8 dfa, rectangular excerpt

Rectangular view

utf8 dfa, square excerpt

Square view

You can also download the .dot file and browse it using KGraphViewer, or browse the .svg file (generated using dot) with Gwenview or whatever. What is this automaton about? It is actually quite simple: the automaton reads single bytes of UTF-8-encoded input and recognises whether the input represents a single alphabetic character, from the ASCII range or anywhere else in Unicode. That is quite complicated, because there are many ranges in Unicode representing alphabetic characters, and the UTF-8 encoding complicates things further. This DFA is an optimal (minimum number of states) Moore automaton (previous versions used Mealy automata for output, but Moore automata for optimisation, which was stupid), and it needs 206 states; it is really minimal, no heuristics. You are right: Unicode is really complicated. Unfortunately it took 65 seconds to generate the lexer for this file:

%token_stream Lexer ; -- necessary name
 
%input_encoding "utf8" -- encoding used by the deterministic finite automaton
 
%token ALPHABETIC ; -- a token
 
%lexer ->
  {alphabetic} ALPHABETIC ; -- a lexer-rule
  ;

I think automatons like {alphabetic} should be cached (currently unimplemented), because they are responsible for most of the runtime. The implementation of the .dot output is still both buggy and hackish; that has to be changed. But the graphs look nice and they help with spotting errors in the lexer.