Archive for the ‘Algorithms’ Category

Proseminar: Computing Maximal Independent Sets an a PRAM

Sunday, February 26th, 2012

Last year I had a proseminar at university and I do not want the stuff to moulder on my hard drive. It was about randomised algorithms and I had to write about ways to compute maximal independent sets in parallel. The considerations are primarily theoretically, proving runtime complexities on abstract parallel machines—we do not have a problem with using polynomially many cores for a trivial task (we simply have to choose anything that fits) which would in practice be more difficult to distribute to the nodes than computing it sequentially—but it might be interesting to see how randomisation can guarantee uncoupling between different processes resulting in a better runtime complexity, I enjoyed it. What is a maximal independent set? Given a graph it is a subset of the vertices of the graph being both independent and dominating, no two of its members are adjacent, but every other vertex is adjacent to a member of the set—not to be confused with maximum independent sets being maximal independent sets of maximal cardinality, computing maximum independent sets is NP hard. If anybody is interested in a German explanation of randomised parallel computation of maximal independent sets not requiring specific knowledge about the theory parallel or randomised algorithms, here you go (slides).

Back from NWERC 2011

Monday, November 28th, 2011

Hi folks!

This weekend I participated at the North Western European Regional Contest, a regional preliminary round for the ACM ICPC. It is an international students contest about solving as many as possible out of 10 algorithmic problems in 5 hours in teams of three students and one PC. It was my first time there and it was quite fun.

At saturday we visited the historic harbour of Bremen Vegesack, which was partly boring and partly quite nice. At sunday there was the actual contest at the quite decadent and walled Campus of Jacobs University in Bremen (you even have to pay tuition of ~€25000, ~$33000 per year, at public universities you pay about €400-1400, I could live from it for long time, but even at elitist economical universities like EBS and WHU you have to pay much less). We used KDevelop 4.2 for solving the tasks and enjoyed the nice completion, source code reformatting (nice when you are not sure any longer which braces belong together and do not have much time :)), and inline documentation of STL functions. Thank you, KDevelop guys. :)

Our team from RWTH solved only three out of 10 tasks (30st place out of 70). B and E were quite trivial (I did not even had to read B since my team members solved it). I want to tell a bit about the other problems, I hope it is interesting for some of you. Of course you may try to solve the problems your self, it is quite a lot of fun. ;)


Given a natural number x<10^15 we should compute all n,k such that binomial(n,k)=x. It was quite simple, but we had a bug in our implementation and I do not have any clue what the problem was. We chose to check the cases binomial(n,1)=n and binomial(n,2)=n·(n-1)/2 manually and brute force the solution for all other cases by computing lines of Pascal’s triangle for k values such that binomial(n,k)≤x and terminating when binomial(n,2)>x. Alternatively you could check more terms explicitly (the team from LMU did it) and compute the whole triangle for the remaining values or use binary search to find the parameters.


Given a sequence of films watched by a person we should answer for each time position how many different films have been watched since the last time he watched the current one. By maintaining a binary tree containing the films ordered by the last time they were watched and a map containing this time for each movie this queries can be easily answered to if you have a way to get the number of nodes after a specific one in the tree in O(log n). That for you would have to store the size of each subtree and STL set/map unfortunately do not support this. Thus we decided to write our own tree, we went for a Treap with random priorities since it is the easiest way to avoid degeneration (easier to implement than AVL and Red-Black trees). Unfortunately we did not have enough time for debugging the tree and we even forgot to update the counters when balancing the tree after insertion. There were simplier solutions using segment trees.

For this task there were particularly many wrong submissions (the man presenting the solutions assumed that many teams tried to solve this one because the first team solved it yet after 20 minutes and thought it would be the easiest one, he even presented a very funny solution using java.lang.String. a linear time Java indexOf() call, which was of course way too slow).


This task was about deciding whether it is possible to partition a given pattern of black and white square into disjoint, L-shaped “puzzle pieces” of one black (in the corner) and two white squares. We had no idea how to solve that, we thought a lot about backtracking, but there semt to be no clever way to do that in polynomial time. The trick was to encode this problem in Boolean formulas and to use a standard algorithm (e.g. resolution) for the 2-SAT problem using variables encoding “white square x,y belongs to the same puzzle piece as the white square x+1,y+1“ etc.


Nobody solved this problem at the contest (though there were some people at the semi-live contest at Spoj solving it, a coach of another team said he knew a very similiar task from a different contest). Look at it, it looks like really complicated spam. Only one team tried. Cool geometric structures, quad-trees etc. will not help you to solve this problem: 500×500 fields, each one is either water or grass. You want to build a pool area. The border has to be grass and every pool of water has to be surrounded by a wall, which will cost $a per wall. You can transform water fields into grass, which will cost $b per field, and grass into water for $c. You want to build the cheapest pool area possible.

Actually it is a simple max-flow problem, since a minimum cut in a certain flow encoding the landscape is the optimal solution. You have a source, a sink and a node for every field. A cut should divide the nodes into water-nodes and grass-nodes. Source-to-x edges should have a capacity of the price needed to transform x into a grass-field and x-to-sink edges should have a capacity refering of the price needed to transform x into a water-field, unless the node represents a field at the border, then the capacity should be infinity. Additionally you add x-to-y nodes with a capacity of the price needed to build a wall for adjacent nodes x and y. That is it (if I am not wrong, I hope so :D). Really awesome.


I will post a full description later. For each observation you can yield an inueqility and you can complete the information in a Floyd-Warshall-like manner.


Spam. You have 13 out of 52 regular playing cards and want to partition it into as few as possible bunches of single cards, pairs, triples, quadruples, full-houses and streets of at least five consecutive cards. You can simply use brute force: Test all combinations of up to two full-houses and up to two streets (cannot be more for 13 cards) and greedily take the remaining cards as quadruples, triples, pairs and single cards. We tried that with a lot of copy-and-paste code and our program printed wrong answers, but we have no idea in which cases it failed. Alternatively you can choose a probably more compact solution by building a directed acyclic graph of all 2^13 combinations of your cards where edges represents adding one of the allowed bunches. Then you use breadth-first-search to get the solution. This was one of those typical tasks with useless information in the input (cards were encoded as 4h for ♥4 and Jd for a ♦Jack, in fact the suits did not even matter).


Given integer coordinates of up to 10000 RFID chips, 250000 RFID sensors and 25 walls compute which sensors can detect which chips. Each sensor can detect any chip within a radius of r, while walls reduce the radius by 1. No two sensors are closer to each other than r (“due to interference”^^). We used a sweep-line algorithm. Whenever the sweep line reaches the value x where x+r is the abscissa of a product we add the product to an ordered binary tree (std::set in C++) ordered by the ordinate. When reaching the value x where x-r is the abscissa of a chip we erase it from the tree. If x-r is the abscissa of a sensor we query all products in the tree with ordinates between y-r and y+r using lower_bound and upper_bound where y is the ordinate of the sensor. We count the number of walls intersecting the line between sensor and chip (we check that by adding up angle(p1, q1, p2)+angle(q1, p2, q2)+angle(p1, q2, p2)+angle(q1, p1, q2), which should be 2π (aka 360 degrees) for intersecting line segments) and check whether the chip is within the resulting reduced radius. This solution has a complexity of O((n+m)log(n+m)+n·w) where n is the number of chips, m the number of sensors and w the number of walls. This is guaranteed since the distance between any two sensors is at least r, thus at most 7 sensors can detect one chip (imagine a regular hexagon of sensors with one sensor in the center). Thus for many walls you would probably require a more sophisticated algorithm taking walls into account when sweeping.

Surprisingly only 7 teams solved it, I thought it would be kinda a standard-geometric-sweep-line-problem, it was even easier than E for us (there we used int instead of long long :D). Alternatively you can use binning (since the coordinate space is limited) or quad-trees to solve this problem.


You have a couple of train-stations and ≤1000 connections between these stations. Each connection will be once an hour at a certain minute (e.g. 13:47, 14:47 and so on). However, there is a certain probability that the train will be too late. If it is too late the delay is uniformly distributed in a range from 1 to some number ≤120. You have to compute the minimum expected duration of a trip from the start to the target. We solved that problem by a fixed-point iteration: For each station and each minute (0≤m<60) we store the best known expected duration to the target. We propagate that information till it does not change any longer. Unfortunately it timed out, probably because of floating point issues: we recognised changes which where insignificant or caused by rounding. Only one team solved this problem, three teams tried.


Yesterday I returned from Bremen, entered the first train at about 17:00, arrived at 0:45 and slept till 12:00. :) It was a really nice experience. If you are a student, like algorithms and want to participate: go ahead, look for two comrades, ask your professor to register you and join your regional contest! :) (at some universities there is a local preliminary contest)

Some visual impressions:

Coloured wave


The camera of my N97 mini mobile phone had a bug or something like that, however, when trying to photograph an ugly super market or something like that in the early evening and accidentally moving the camera too early I got this nice photo. Could not reproduce. :(
Some of my fingers

Could not reproduce :(

The contest was sponsored by Facebook and we had to attend a boring presentation of a Facebook employee. It was quite funny how he tried to make advertisement to work there. He assumed everybody in the room would use Facebook, what? We could get free t-shirts when answering his silly questions, did he think we would like to work there because of a t-shirt? It happened multiple times that he said that he is not allowed to tell about xy, really a great opportunity to work for a company where you cannot even superficially talk openly about the things you are doing there. And he told us that they would finally delete all messages six days after you have pressed the delete-button, haha. :D

I have pushed some solutions here.


Tuesday, April 5th, 2011

Yet another Spoj-problem: SCRAPER. Short description of the problem (shorter than the official one): F (F?5·10?) floors in a scyscraper ([0,F[), E (E<100) elevators with a min-floor X?[0,F[ and an increment Y. Now you should decide if you can use the elevators to get from floor A to floor B. My solution (it took some time to fix all bugs, because I am out of practice): Two elevators ((X?, Y?) and (X?, Y?)) are “adjacent” if there is a common floor, i.e. {x ? [X?, F[ ? [X?, F[ | x ? X? (Y?) ? x ? X? (Y?)} ? ø. If we are able to compute the adjacency, we can simply use a depth-first-search to check if there is a path between A and B (identifying all elevators visiting a certain floor is trivial). How to check it? Without loss of generality X??X?. Let d := X? - X?. If the elevator 0 wants to reach a floor visited by elevator 1, the number of Y?-steps f has to comply with Y?·f ? d (Y?). We can compute a solution for f by using the extended Euclidean algorithm. It can solve the equation a·Y? + b·Y? = gcd(Y?, Y?) =: g, and there exists a solution for f iff g|d. If this is the case, we can conclude Y?/g·a ? 1 (Y?) ? Y?/g·a·d ? d, that looks like our previous equation, now we know that f ? a·d/g. We can perform a “plain” integer division, because, since g|d. For the computed f we want to know if there exists any n such that X? ? X?+Y?·f+n·lcm(Y?, Y?) < F (adding n·lcm(Y?, Y?) floors does not change anything). We can perform that quite simply by comparing the expression with X? for the biggest possible n (where the expression is smaller than F): X? ? F-1 - ((F-1 - (X?+Y?·f)) % lcm(Y?,Y?)). That's it. I used some caching and handled special cases for the extended Euclidean algorithm for optimisation, but that is not that interesting. Here you can get the source-code, but of course you should try to solve it yourself and you should not have even read this text!


Wednesday, March 30th, 2011


Today I want to talk about my solution for a Spoj-problem. Spoj is a website providing a lot of algorithmic problems and you can collect points by solving them. Well, that is the problem, it is called EXPLOSN:

There has been an assault by some rebels at a public place, they caused a big explosion. Some witnesses (at most 100000) are still alive, including at least one rebel. Because we live in a police state, myriads of policemen can arrest all witnesses and rebels, each of them accuse exactly one person to be a “terrorist”, because the inhabitants of the city are very bourgeois, they never lie, but the rebels are humorous and so they sometimes accuse each other, sometimes they lie. However, the government fears for its political power, and thus they want to execute some people. They do not care about careful investigations, but their judgement should be consistent with the testimonies and because the executioners want to earn money, they want to execute as few people as possible (it will be still enough people for an effective deterrence). The program should output the number of people they have to execute. Read the original statement for a pro-government-formulation. ;) You should not read further if you want to solve the problem yourself, and why should you not want to do that? :D

However, actually it is not important if we sympathise with the government or the rebels because we can state it in terms of graph-theory: We want to find a maximum independent set in the accusation-graph, its complement is the minimal set of morituri. Generally this is a very hard problem, 100000 nodes are infeasible, but in our graph the out-degree of each node is exactly one. Such a graph has some special properties: Of course it has a finite number of weakly connected components. Because it has as many vertices as values, there is at least one (undirected) cycle in the component. But this cycle is rectified (? ? ?, not ? ? ?) because otherwise there would be one node in the cycle with out-degree ? 2. Notice: The length of the cycle may be one if a rebel is accusing himself. The subgraph consisting of the other nodes has less vertices than edges, thus it is a forest. We can conclude that the trees in the forest are rectified (child ? parent) using the same argument as before. The root of each tree is accusing an element of the cycle. For each of the trees we can now easily compute the minimal number of morituri recursively, one time considering the root being a moriturus, and one time considering the root not being a moriturus. Then we can sum up those values for each node in the cycle while no two adjacent nodes should stay alive. The ideal constellation of morituri and non-morituri in the cycle can be easily computed using dynamic programming. By adding the result for each weak connected component we gain the result.

Source code.

Algorithmic ideas needed for Unicode

Monday, March 21st, 2011


KDevelop-PG-Qt development is continuing slowly, and I need some creative ideas. You do not have to know anything about KDev-PG-Qt or lexers, it is an algorithmic issue about Unicode, I am failing at arranging my thoughts in a way that could be transformed into (maintainable, understandable) code.

The problem is quite simple: Given a range of Unicode characters (e.g. “from a to z”, a-z, or 0xef-0x19fe3), I want to represent this range in UTF-8 or UTF-16 encoding, using 8-bit or 16-bit ranges respectively (e.g. 0×14-0×85 is a 8-bit range). For example any ASCII-range (like a-z) stays the same encoded in UTF-8 or UTF-16, for a more sophisticated example you should know how UTF-16 (the encoding used by QString) works:

  • Any UTF-32 codepoint (character) between 0×0 and 0xd800 (including 0×0, excluding 0xd800) and between 0xe000 and 0×10000 is simply converted into a 16-Bit integer, it does not get changed.
  • For any larger codepoint subtract 0×10000, now you have got a 20-Bit number, split it into two 10-Bit numbers, add 0xd800 to the first one and 0xdc00 to the second one. Now you have got two 16-Bit numbers, the high-surrogate and the low-surrogate, together those numbers represent the character in UTF-16.
  • Any surrogates (between 0xd800 and 0xe000) are invalid.

The example: 0xffef to 0×20000 should be transformed into UTF-16. The starting-codepoint is within UCS-2 (smaller than 0×10000 and it is not a surrogate), the end is not within UCS-2, it has to be encoded with a surrogate pair. So in UTF-16 the range would be 0xffef to (0×80+0xd800, 0×0+0xdc00) = (0xd880, 0xdc00). But I want to have 16-Bit ranges, a range from a single 16-Bit number to a pair of 16-Bit numbers is simply nonsense. Thus the range has to be split into two parts:

  • The UCS-2 part: 0xffef-0×10000
  • The non-UCS-2 part: 0×1000 = (0xd800, 0xdc00) to (0xd880, 0xdc00), resulting in (0xd800 – 0xd880) followed by (0xdc00 – 0xe000)

In that example there are exactly two parts, but that may be different, a range may have to be split because 0xd800-0xe000 are invalid, the starting codepoint may be non-UCS-2, then there can be up to three ranges, e.g.:
We want to encode the range 0x1000f-0x200ff. 0x1000f is (0xd800, 0xdc0f) in UTF-16, 0x200ff is (0xd880, 0xdcff), so we need three ranges:

  • The first few surrogate pairs with the same high-surrogate: 0xd800 followed by (0xdc0f – 0xe000)
  • The majority of surrogate pairs in the middle, where every low-surrogate is allowed: (0xd801 – 0xd880) followed by (0xdc00 – 0xe000)
  • The rest with the same high-surrogate: 0xd880 followed by (0xdc00 – 0xdcff)

I hope it is clear which output-format I would like to have: a set of sequences of 8/16-Bit ranges (ranges may have only one element). Unfortunately there are a lot of cases, I have implemented the conversion (commit 1dbfb66ca66c392c6ffdaa3ff9d00d399370cf0e) for UTF-16 using a lot of if/else for handling all special cases, involving 130 lines of ugly code, well that was not too bad, but now I need the same for UTF-8: up to four ranges in sequence, and much more special cases, 500 lines of unreadable, boring, unmaintainable code just for UTF-8? I do not like this idea. I need some creative ideas how to implement it in a more abstract way without handling all cases explicitly, and without brute force (checking all codepoints within the range separately). There is one simple idea: do not care about 0xd800-0xe000 and remove it at the end. But I have currently no idea how to simplify the rest, information how many UTF-8 surrogates are needed to represent the start and the beginning and if the surrogates have the maximum or the minimum value should definitely be used. Read this for a comprehensive description of the UTF-8 format, it is not more complicate than UTF-16. Consider it a contest, the winner will recieve eternal glory, I hope somebody has a nice idea. :D

The KDev-PG-Qt Lexer-Generation-Algorithm

Saturday, February 19th, 2011

Notice: This post may require some background-information about lexers/regexp-parsing/automatons.

Today I want to describe the essential ideas for lexer-generation in KDevelop-PG-Qt. Each lexer-rule is described with a regular expression. The generated code has to determine which regular expressions match the following input-characters and which one to choose. A common approach for choosing one of multiple matching regular expressions is choosing the longest match (eat characters greedily), if there are multiple longest matches, priorities from the definition are used. That approach is easy to implement and very appropriate for real-world-languages, that is not the interesting part.
The usual approach for detecting the regular expressions is a translation to finite-state-machines. Each regular expression can be easily transformed into a nondeterministic-finite-automaton (NFA), accepting the words described by the regular expression, wich can be transformed into an equivalent deterministic-finite-automaton. For multiple tokens you can simply create a NFA accepting each of the expressions, while taking care about the rules which may match the current input using some additional states. Now the interesting part about KDev-PG-Qt’s implementation: The usual approach for handling the transitions between the states of the finite-state-machines is a table mapping the input character to the next state:

Foreach state store: table (character → next state(s))

But KDev-PG-Qt should support Unicode, it should be possible to process even UTF-32 input, more than a million possible inputs, memory usage, size of generated tables? You can imagine what would happen with a few hundred states. But it is possible to take advantage of a property of Unicode: similar characters are grouped in blocks, and usually tokens will consist of similar tokens. So it is a good approach to save the transitions blockwise, e.g. A-F, X-Z, a-i and m-p, such representations are obviously much more efficient than table-based approaches. But KDev-PG-Qt should support both approaches, but the used algorithms should always be the same (generic programming, yeah!), so I need some kind of abstraction: sets of characters which may be represented using either bits or blocks/sequences of characters. That for I have also choosen a different internal format for transitions:

Foreach state store: table (next state → set of characters for this transition)

The algorithms are now using set-based-arithmetics (union, difference, intersection, complement) to work with any representation of character-sets. For tables it can be easily implemented using bit-operations, for the interval-based representation it is a bit more complicated, but possible in Ο(n), where n is the number of intervals in the sets you are currently dealing with. So it can run without using Gibibytes of memory and within reasonable time for creating a Unicode-aware lexer.
However, not only KDev-PG-Qt should not require too much memory, but also the generated lexer should be short and efficient. That is why there should be differnt code-generation-strategies for the different representations of sets of characters. For the tables you can simply generate tables or switch-statements for the state-transitions. But for the sequence-based-approach you can do better: The sets of characters get translated into nested if-else-statements, some kind a static binary search, they can decide in which interval the character from the input is. So the lexer will use a very small number of comparisons to decide which state to choose next instead of using a large table, that makes the generated code even more readable (although that is not the purpose).
Everything is programmed using a lot of templates, some of you might not like this approach, but it is really nice to be able to create one algorithm accepting different character-encodings (ASCII, Latin1, UTF-16, UTF-32, …) supporting different internal storage representations and different output-formats for the decisions.

Well, that is it, I hope this post is interesting for at least a few people. ;)

The User

Of course minimization is also implemented, so you will not get useless states in the generated code.