Archive for the ‘Internet’ Category

Back from NWERC 2011

Monday, November 28th, 2011

Hi folks!

This weekend I participated in the North-Western European Regional Contest (NWERC), a regional preliminary round for the ACM ICPC. It is an international student contest in which teams of three students sharing one PC try to solve as many of 10 algorithmic problems as possible within 5 hours. It was my first time there and it was quite fun.

On Saturday we visited the historic harbour of Bremen Vegesack, which was partly boring and partly quite nice. On Sunday the actual contest took place at the quite decadent, walled campus of Jacobs University in Bremen (tuition there is about €25,000 (~$33,000) per year, which I could live on for a long time; at public universities you pay about €400–1,400, and even elitist private business schools like EBS and WHU charge much less). We used KDevelop 4.2 for solving the tasks and enjoyed the nice code completion, the source-code reformatting (handy when you are no longer sure which braces belong together and do not have much time :)) and the inline documentation of STL functions. Thank you, KDevelop guys. :)

Our team from RWTH solved only three out of 10 tasks (30th place out of 70). B and E were quite trivial (I did not even have to read B since my team mates solved it). I want to tell you a bit about the other problems; I hope it is interesting for some of you. Of course you may also try to solve the problems yourself, it is quite a lot of fun. ;)

A

Given a natural number x < 10^15, we had to compute all n, k such that binomial(n,k) = x. It was quite simple, but we had a bug in our implementation and I do not have any clue what the problem was. We chose to check the cases binomial(n,1) = n and binomial(n,2) = n·(n-1)/2 manually and to brute-force the solution for all other cases by computing lines of Pascal's triangle for the k values with binomial(n,k) ≤ x, terminating when binomial(n,2) > x. Alternatively you could check more terms explicitly (the team from LMU did that) and compute the whole triangle for the remaining values, or use binary search to find the parameters.
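
For illustration, here is a minimal sketch of the binary-search variant (my reconstruction, not our contest submission; the I/O format is an assumption). For a fixed k, binomial(n,k) grows strictly with n, and binomial(2k,k) > 10^15 already holds for k ≥ 27, so only a few values of k have to be searched:

    #include <cstdio>

    typedef unsigned long long ull;

    // binomial(n, k), capped at cap + 1; uses a 128-bit intermediate
    // (GCC/Clang extension) so the products cannot overflow. The division
    // in each step is exact: after step i, r == binomial(n - k + i, i).
    ull binom_capped(ull n, ull k, ull cap) {
        unsigned __int128 r = 1;
        for (ull i = 1; i <= k; ++i) {
            r = r * (n - k + i) / i;
            if (r > cap) return cap + 1;  // intermediate values only grow
        }
        return (ull)r;
    }

    int main() {
        ull x;  // 1 < x < 10^15 (input format is an assumption)
        if (scanf("%llu", &x) != 1) return 1;
        printf("%llu 1\n", x);  // binomial(x, 1) == x always holds
        for (ull k = 2; binom_capped(2 * k, k, x) <= x; ++k) {
            ull lo = 2 * k, hi = x;  // binomial(n, k) is increasing in n
            while (lo < hi) {
                ull mid = lo + (hi - lo) / 2;
                if (binom_capped(mid, k, x) < x) lo = mid + 1;
                else hi = mid;
            }
            if (binom_capped(lo, k, x) == x)
                printf("%llu %llu\n", lo, k);
        }
        return 0;
    }

Only solutions with k ≤ n/2 are printed; the mirrored ones follow from binomial(n,k) = binomial(n,n-k).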

C

Given a sequence of films watched by a person, we had to answer, for each position in the sequence, how many different films have been watched since the last time the person watched the current one. By maintaining a binary tree containing the films ordered by the last time they were watched, together with a map storing that time for each film, these queries can easily be answered, provided you have a way to count the nodes after a specific one in the tree in O(log n). For that you would have to store the size of each subtree, which the STL set/map unfortunately do not support. Thus we decided to write our own tree; we went for a treap with random priorities, since that is the easiest way to avoid degeneration (easier to implement than AVL or red-black trees). Unfortunately we did not have enough time to debug the tree, and we even forgot to update the counters when rebalancing after insertion. There were simpler solutions using segment trees.
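
That subtree-size bookkeeping is exactly what GCC's pb_ds "policy tree" provides out of the box. Here is a hedged sketch of the idea on top of it (not our treap code; it assumes a reasonably recent GCC, and the input format and the output for a first viewing are assumptions):

    #include <cstdio>
    #include <map>
    #include <ext/pb_ds/assoc_container.hpp>
    #include <ext/pb_ds/tree_policy.hpp>

    // The policy tree maintains subtree sizes for us, so order_of_key()
    // counts elements smaller than a key in O(log n) -- the operation
    // std::set lacks.
    using namespace __gnu_pbds;
    typedef tree<int, null_type, std::less<int>, rb_tree_tag,
                 tree_order_statistics_node_update> ordered_set;

    int main() {
        int n;  // number of viewing events (assumed input format)
        if (scanf("%d", &n) != 1) return 1;
        ordered_set last_positions;  // last-viewing position of each known film
        std::map<int, int> last_of;  // film id -> last-viewing position
        for (int i = 0; i < n; ++i) {
            int film;
            scanf("%d", &film);
            std::map<int, int>::iterator it = last_of.find(film);
            if (it == last_of.end()) {
                printf("-1\n");  // assumed output for a first viewing
            } else {
                int t = it->second;
                // films whose last-viewing position exceeds t were watched
                // since we last saw this film
                printf("%d\n", (int)(last_positions.size()
                                     - last_positions.order_of_key(t) - 1));
                last_positions.erase(t);
            }
            last_positions.insert(i);
            last_of[film] = i;
        }
        return 0;
    }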

For this task there were particularly many wrong submissions (the man presenting the solutions assumed that many teams tried this one because the first team solved it after only 20 minutes, so everybody thought it would be the easiest. He even presented a very funny solution using java.lang.String and a linear-time Java indexOf() call, which was of course way too slow).

D

This task was about deciding whether it is possible to partition a given pattern of black and white squares into disjoint, L-shaped "puzzle pieces" consisting of one black square (in the corner) and two white squares. We had no idea how to solve it; we thought a lot about backtracking, but there seemed to be no clever way to do that in polynomial time. The trick was to encode the problem in Boolean formulas and use a standard algorithm (e.g. resolution) for the 2-SAT problem, with variables encoding statements like "the white square at (x,y) belongs to the same puzzle piece as the white square at (x+1,y+1)" etc.
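
The grid encoding is the problem-specific part; the 2-SAT machinery underneath is standard. Here is a generic solver sketch using the implication graph and Kosaraju's SCC algorithm (an alternative to the resolution method mentioned above; not the judges' code):

    #include <cstdio>
    #include <vector>

    // Literal 2*v means "variable v is true", 2*v + 1 means "v is false".
    // A clause (a OR b) adds the implications !a -> b and !b -> a. The
    // formula is satisfiable iff no variable ends up in the same strongly
    // connected component as its negation.
    struct TwoSat {
        int n;
        std::vector<std::vector<int>> g, gr;  // implication graph + reverse
        std::vector<int> comp, order;
        std::vector<char> seen;
        TwoSat(int vars) : n(2 * vars), g(n), gr(n), comp(n, -1), seen(n, 0) {}
        void add_clause(int a, bool na, int b, bool nb) {  // (a^na) OR (b^nb)
            int la = 2 * a + na, lb = 2 * b + nb;
            g[la ^ 1].push_back(lb); gr[lb].push_back(la ^ 1);
            g[lb ^ 1].push_back(la); gr[la].push_back(lb ^ 1);
        }
        void dfs1(int u) {
            seen[u] = 1;
            for (int v : g[u]) if (!seen[v]) dfs1(v);
            order.push_back(u);  // post-order for Kosaraju
        }
        void dfs2(int u, int c) {
            comp[u] = c;
            for (int v : gr[u]) if (comp[v] < 0) dfs2(v, c);
        }
        bool satisfiable() {
            for (int i = 0; i < n; ++i) if (!seen[i]) dfs1(i);
            int c = 0;
            for (int i = n - 1; i >= 0; --i)
                if (comp[order[i]] < 0) dfs2(order[i], c++);
            for (int v = 0; v < n; v += 2)
                if (comp[v] == comp[v + 1]) return false;
            return true;
        }
    };

    int main() {  // tiny usage example: (x0 OR x1) AND (!x0 OR x1)
        TwoSat ts(2);
        ts.add_clause(0, false, 1, false);
        ts.add_clause(0, true, 1, false);
        printf("%s\n", ts.satisfiable() ? "satisfiable" : "unsatisfiable");
        return 0;
    }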

F

Nobody solved this problem during the contest (though some people solved it in the semi-live contest at SPOJ, and a coach of another team said he knew a very similar task from a different contest). Look at it, it reads like really complicated spam. Only one team tried. Cool geometric structures, quad-trees etc. will not help you solve this problem: 500×500 fields, each of which is either water or grass. You want to build a pool area. The border has to be grass, and every pool of water has to be surrounded by a wall, which costs $a per wall segment. You can transform water fields into grass for $b per field, and grass fields into water for $c. You want to build the cheapest pool area possible.

Actually it is a simple max-flow problem, since a minimum cut in a certain flow network encoding the landscape yields the optimal solution. You have a source, a sink and a node for every field, and a cut divides the nodes into water nodes and grass nodes. Source-to-x edges get a capacity equal to the price of turning x into a grass field, and x-to-sink edges get a capacity equal to the price of turning x into a water field, unless the node represents a field at the border, in which case the capacity is infinite. Additionally you add x-to-y edges with a capacity equal to the price of a wall for all adjacent nodes x and y. That is it (if I am not wrong, I hope so :D). Really awesome.
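
Here is a hedged sketch of that construction on top of a standard Dinic's max-flow implementation (the input format, the 'W'/'G' field encoding and the orientation of the network, source side = water and sink side = grass, are my assumptions based on the description above):

    #include <algorithm>
    #include <cstdio>
    #include <queue>
    #include <vector>

    struct Dinic {
        struct E { int to, cap; };
        std::vector<E> es;                   // edges, reverse edge at id ^ 1
        std::vector<std::vector<int>> g;     // adjacency: edge ids per node
        std::vector<int> level, it;
        Dinic(int n) : g(n), level(n), it(n) {}
        void add(int u, int v, int cap, int rcap = 0) {
            g[u].push_back((int)es.size()); es.push_back(E{v, cap});
            g[v].push_back((int)es.size()); es.push_back(E{u, rcap});
        }
        bool bfs(int s, int t) {             // build level graph
            std::fill(level.begin(), level.end(), -1);
            std::queue<int> q;
            q.push(s); level[s] = 0;
            while (!q.empty()) {
                int u = q.front(); q.pop();
                for (int id : g[u])
                    if (es[id].cap > 0 && level[es[id].to] < 0) {
                        level[es[id].to] = level[u] + 1;
                        q.push(es[id].to);
                    }
            }
            return level[t] >= 0;
        }
        int dfs(int u, int t, int f) {       // blocking-flow DFS
            if (u == t) return f;
            for (int &i = it[u]; i < (int)g[u].size(); ++i) {
                E &e = es[g[u][i]];
                if (e.cap > 0 && level[e.to] == level[u] + 1) {
                    int d = dfs(e.to, t, std::min(f, e.cap));
                    if (d > 0) { e.cap -= d; es[g[u][i] ^ 1].cap += d; return d; }
                }
            }
            return 0;
        }
        long long maxflow(int s, int t) {
            long long total = 0;
            while (bfs(s, t)) {
                std::fill(it.begin(), it.end(), 0);
                while (int d = dfs(s, t, 1 << 29)) total += d;
            }
            return total;
        }
    };

    int main() {
        int n, m, a, b, c;  // grid size; wall, water->grass, grass->water prices
        if (scanf("%d %d %d %d %d", &n, &m, &a, &b, &c) != 5) return 1;
        std::vector<std::vector<char>> f(n, std::vector<char>(m));
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < m; ++j) scanf(" %c", &f[i][j]);
        const int INF = 1 << 29;
        int S = n * m, T = n * m + 1;
        Dinic net(n * m + 2);
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < m; ++j) {
                int x = i * m + j;
                bool border = i == 0 || j == 0 || i == n - 1 || j == m - 1;
                // cutting source->x means x ends up grass: pay the grass price
                net.add(S, x, f[i][j] == 'W' ? b : 0);
                // cutting x->sink means x ends up water: pay the water price;
                // border fields must never become water
                net.add(x, T, border ? INF : (f[i][j] == 'G' ? c : 0));
                // a wall is paid wherever water and grass end up adjacent
                if (i + 1 < n) net.add(x, x + m, a, a);
                if (j + 1 < m) net.add(x, x + 1, a, a);
            }
        printf("%lld\n", net.maxflow(S, T));
        return 0;
    }

Max flow equals min cut, so the printed value is directly the minimum total price, assuming I got the orientation right.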

G

I will post a full description later. From each observation you can derive an inequality, and you can complete the information in a Floyd-Warshall-like manner.
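
Since I have not described the observations yet, here is only a generic sketch of that propagation: upper bounds on differences t_j - t_i chain transitively, exactly like shortest paths (the sample bounds below are made up):

    #include <algorithm>
    #include <cstdio>

    const int N = 4;
    const long long INF = 1LL << 60;

    int main() {
        // illustrative bounds: t1 - t0 <= 3, t2 - t1 <= 4, t3 - t2 <= -1
        long long ub[N][N];
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) ub[i][j] = (i == j ? 0 : INF);
        ub[0][1] = 3; ub[1][2] = 4; ub[2][3] = -1;

        // Floyd-Warshall-style closure over the known inequalities
        for (int k = 0; k < N; ++k)
            for (int i = 0; i < N; ++i)
                for (int j = 0; j < N; ++j)
                    if (ub[i][k] < INF && ub[k][j] < INF)
                        ub[i][j] = std::min(ub[i][j], ub[i][k] + ub[k][j]);

        printf("t3 - t0 <= %lld\n", ub[0][3]);  // 3 + 4 - 1 = 6
        return 0;
    }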

H

Spam. You have 13 out of 52 regular playing cards and want to partition them into as few bunches as possible, where a bunch is a single card, a pair, a triple, a quadruple, a full house or a street of at least five consecutive cards. You can simply use brute force: test all combinations of up to two full houses and up to two streets (there cannot be more among 13 cards) and greedily take the remaining cards as quadruples, triples, pairs and single cards. We tried that with a lot of copy-and-paste code, and our program printed wrong answers, but we have no idea in which cases it failed. Alternatively you can choose a probably more compact solution by building a directed acyclic graph over all 2^13 subsets of your cards, where an edge represents adding one of the allowed bunches, and running breadth-first search on it (or, equivalently, a DP over subsets, see the sketch below). This was one of those typical tasks with useless information in the input (cards were encoded as 4h for ♥4 and Jd for a ♦Jack; in fact the suits did not even matter).
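
A hedged sketch of that subset formulation as a DP (not our contest code; the input is simplified to 13 rank values from 2 to 14, since the suits do not matter):

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // valid[s] marks card subsets that form one allowed bunch;
    // dp[mask] is the fewest bunches covering exactly the cards in mask.
    int rank_of[13];

    bool is_valid(int s) {
        int cnt[15] = {0};
        std::vector<int> ranks;
        for (int i = 0; i < 13; ++i)
            if (s >> i & 1) { cnt[rank_of[i]]++; ranks.push_back(rank_of[i]); }
        int sz = (int)ranks.size();
        std::sort(ranks.begin(), ranks.end());
        // single, pair, triple, quadruple: all cards share one rank
        if (sz >= 1 && sz <= 4 && ranks[0] == ranks[sz - 1]) return true;
        // full house: a triple and a pair of two different ranks
        if (sz == 5) {
            int a = ranks[0], b = ranks[4];
            if (a != b && ((cnt[a] == 3 && cnt[b] == 2) ||
                           (cnt[a] == 2 && cnt[b] == 3))) return true;
        }
        // street: at least 5 distinct consecutive ranks
        if (sz >= 5) {
            bool ok = true;
            for (int i = 1; i < sz; ++i)
                if (ranks[i] != ranks[i - 1] + 1) ok = false;
            if (ok) return true;
        }
        return false;
    }

    int main() {
        for (int i = 0; i < 13; ++i) scanf("%d", &rank_of[i]);
        std::vector<char> valid(1 << 13, 0);
        for (int s = 1; s < (1 << 13); ++s) valid[s] = is_valid(s);
        std::vector<int> dp(1 << 13, 99);
        dp[0] = 0;
        for (int mask = 1; mask < (1 << 13); ++mask)
            for (int sub = mask; sub; sub = (sub - 1) & mask)  // submask walk
                if (valid[sub])
                    dp[mask] = std::min(dp[mask], dp[mask ^ sub] + 1);
        printf("%d\n", dp[(1 << 13) - 1]);
        return 0;
    }

The submask enumeration costs O(3^13) steps overall, which is tiny; it plays the role of the BFS over the DAG mentioned above.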

I

Given integer coordinates of up to 10,000 RFID chips, 250,000 RFID sensors and 25 walls, compute which sensors can detect which chips. Each sensor can detect any chip within a radius of r, while each wall in between reduces the radius by 1. No two sensors are closer to each other than r ("due to interference" ^^). We used a sweep-line algorithm: whenever the sweep line reaches a value x such that x+r is the abscissa of a chip, we add that chip to an ordered binary tree (std::set in C++) ordered by the ordinate, and when it reaches a value x such that x-r is the abscissa of a chip, we erase the chip from the tree. When x is the abscissa of a sensor, we query all chips in the tree with ordinates between y-r and y+r using lower_bound and upper_bound, where y is the ordinate of the sensor. For each candidate chip we count the walls intersecting the line segment between sensor and chip (we check that by adding up angle(p1, q1, p2)+angle(q1, p2, q2)+angle(p1, q2, p2)+angle(q1, p1, q2), which should be 2π (aka 360 degrees) for intersecting line segments) and test whether the chip is within the correspondingly reduced radius. This solution has a complexity of O((n+m)·log(n+m)+n·w), where n is the number of chips, m the number of sensors and w the number of walls. The bound holds because the distance between any two sensors is at least r, so at most 7 sensors can detect a given chip (imagine a regular hexagon of sensors with one more sensor in the center). For many walls you would probably need a more sophisticated algorithm that takes the walls into account while sweeping.
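
By the way, the wall test can also be done without floating-point angles: a standard orientation-based crossing test on integer coordinates is exact. A small self-contained sketch (it reports proper crossings and deliberately ignores segments that merely touch at an endpoint):

    #include <cstdio>

    struct P { long long x, y; };

    // cross product of (a - o) and (b - o): > 0 left turn, < 0 right turn
    long long cross(P o, P a, P b) {
        return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
    }

    // true iff segments p1-p2 and q1-q2 properly cross: the endpoints of
    // each segment lie strictly on opposite sides of the other's line
    bool segments_cross(P p1, P p2, P q1, P q2) {
        long long d1 = cross(p1, p2, q1), d2 = cross(p1, p2, q2);
        long long d3 = cross(q1, q2, p1), d4 = cross(q1, q2, p2);
        return ((d1 > 0) != (d2 > 0)) && ((d3 > 0) != (d4 > 0));
    }

    int main() {  // tiny usage example
        P p1 = {0, 0}, p2 = {4, 4}, q1 = {0, 4}, q2 = {4, 0};
        printf("%d\n", segments_cross(p1, p2, q1, q2));  // prints 1
        return 0;
    }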

Surprisingly only 7 teams solved it; I had thought it would be kind of a standard geometric sweep-line problem, and for us it was even easier than E (where we had used int instead of long long :D). Alternatively you can use binning (since the coordinate space is limited) or quad-trees to solve this problem.

J

You have a couple of train stations and ≤1000 connections between them. Each connection runs once an hour at a certain minute (e.g. 13:47, 14:47 and so on). However, there is a certain probability that the train will be late; if it is, the delay is uniformly distributed over a range from 1 to some number ≤120 minutes. You have to compute the minimum expected duration of a trip from the start to the target. We solved the problem by a fixed-point iteration: for each station and each minute (0 ≤ m < 60) we store the best known expected duration to the target, and we propagate that information until it no longer changes. Unfortunately our solution timed out, probably because of floating-point issues: we treated insignificant changes caused by rounding as real updates. Only one team solved this problem; three teams tried.
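
In sketch form, the termination criterion we should have used; the toy update below just solves E = 1 + 0.5·E (the expected number of fair-coin flips until the first head), whose fixed point is 2, so it only illustrates the loop shape, not the real timetable relaxation:

    #include <cmath>
    #include <cstdio>

    // Toy illustration only: stop the fixed-point iteration once the
    // update falls below a tolerance instead of requiring exact equality
    // of doubles (our bug). The real state space would hold one value per
    // (station, minute) pair.
    int main() {
        double best = 0.0;
        const double eps = 1e-9;
        for (;;) {
            double updated = 1.0 + 0.5 * best;  // stand-in for the real update
            if (std::fabs(updated - best) < eps) break;
            best = updated;
        }
        printf("%.9f\n", best);  // converges to ~2.0
        return 0;
    }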

Summary

Yesterday I returned from Bremen: I boarded the first train at about 17:00, arrived at 0:45 and slept till 12:00. :) It was a really nice experience. If you are a student, like algorithms and want to participate: go ahead, find two comrades, ask your professor to register you and join your regional contest! :) (At some universities there is a local preliminary contest.)

PS:
Some visual impressions:

Coloured wave

:)


The camera of my N97 mini mobile phone had a bug or something like that; however, while trying to photograph an ugly supermarket or something like that in the early evening and accidentally moving the camera too early, I got this nice photo. Could not reproduce it. :(
Some of my fingers

Could not reproduce :(

PPS:
The contest was sponsored by Facebook and we had to attend a boring presentation by a Facebook employee. It was quite funny how he tried to advertise working there. He assumed everybody in the room was using Facebook, what? We could get free t-shirts for answering his silly questions; did he think we would want to work there because of a t-shirt? Multiple times he said that he was not allowed to talk about this or that, really a great opportunity to work for a company where you cannot even superficially talk openly about the things you are doing there. And he told us that they finally delete all messages six days after you have pressed the delete button, haha. :D

PPPS:
I have pushed some solutions here.

Skype Reverse Engineered

Thursday, June 2nd, 2011

Hi!

Good news for Free Software and open protocols: there has been a successful attempt to reverse engineer Skype (Magnet URI). Nice timing: shortly after Microsoft's acquisition, Skype could finally be broken. :) He also includes modified Skype executables that allow debugging etc., which is usually prevented by really elaborate anti-features (encryption, the program kills itself if there is a debugger, etc.). The sample code is able to send a message via Skype, awesome!

Hopefully there will soon be implementations for Telepathy or libpurple (used by Pidgin and Telepathy Haze). One may say we should not promote such proprietary protocols, but: for many people Skype is important (do not say you do not know anybody who uses it; it varies, in some groups everybody is using it, elsewhere nobody is, and I am not using it myself). The chances that those people will switch to Free Software (KDE/GNU/Linux) are much higher if there is Skype support without proprietary software. And once they start using Pidgin or Telepathy, it is no problem for them to use open protocols, too: you can simply tell them which buttons to click, and then they can communicate with you via an open protocol like Jingle or SIP (or XMPP for text messages). Thus it does not only help to spread Free Software, but also to spread open protocols. And all future attempts to commercialise the private data created by Skype can be easily prevented. Even Windows users may start using Pidgin or something like that despite all the advertising. Regarding "it is just a hack": they cannot simply change their protocol, because there is Skype hardware which cannot be updated and would stop working (and that would make all the non-idealist customers unhappy, too). And for WLM/.NET Messenger/MSN and OSCAR/ICQ this approach has been working well for a long time (even with Kopete ;)). Really, really great news!

Are you using Wolfram Alpha?

Tuesday, May 17th, 2011

Hi!

This blog post will not be very long, because I want to sleep soon; I will do further investigations later.

Many of you will know about Wolfram Alpha. When I first saw it, I thought: hey, that is really cool. Many people are using it. But it has some shortcomings:

  • They want you to buy Mathematica
  • It is proprietary
  • You cannot create arbitrarily complicated queries
  • You cannot save temporary results
  • Sometimes English grammar is too ambiguous
  • No localization

Well, that service should not be used; it is actually not that great, and like Google Mail, Chrome OS, iOS or WP7 it is becoming really popular and is, in my opinion, a threat to Free Software: especially highly qualified people are using it and stop caring about Free Software ("just a web app"). But what are the alternatives?

  • There is Sage, an awesome computer algebra system combining a lot of free software and using Python instead of ugly, specialised languages like those of Mathematica, Maple and Maxima
  • Semantic databases in RDF format, e.g. DBpedia, which extracts its data from Wikipedia, or governmental websites like data.gov or data.gov.uk

For people able to use SPARQL, the mentioned RDF databases are very powerful tools: no limitations on queries or anything like that. So what is missing? A nice interface anybody can use. It would be really nice to have a free tool parsing natural-language queries (e.g. using Earley parsers and some probabilistics; it would not really have to understand arbitrary sentences, some fixed structures like "where", "by", "all" and fixed sets of attributes would be enough). Those queries could be transformed into e.g. DBpedia SPARQL queries, and the RDF results could be rendered as nice tables and maybe graphs. A free implementation would be a) awesome and b) much more seriously usable than Wolfram Alpha. Do you know about any project trying to do something like that? Any comments? What is your opinion about such software?

Regards

PS:
A few random thoughts about it:

  • Of course natural-language input would be nice for math, too.
  • With SPARQL access from Sage/Python, data querying and calculations could be combined; such a tool could generate Python code containing SPARQL.
  • I do not know whether there are practically usable distributed RDF databases, but such software could make it possible to distribute query evaluation to the peers; Free Software projects probably cannot afford Wolfram's computing capacity.
  • Combining different RDF databases may be a problem.

What is special about the number 36?

Saturday, March 5th, 2011

It is the smallest natural number which is not an element of the list of special numbers in the German Wikipedia. By the way: the smallest special number seems to be -2, while -0.5 seems to be the smallest value of a Unicode digit (༳, U+0F33, TIBETAN DIGIT HALF ZERO).

Are you still using Google?

Friday, March 4th, 2011

Hi!

If you are still using Google, you may have noticed it: yesterday (March 3rd, 2011) they changed their search page. Clicks now get redirected via http://google.com/url?q=<url>, even if you have disabled JavaScript (before, there was an onmousedown event or something like that changing the link just before you clicked it). Do you really want Google to know about most of the pages you are visiting? They do not only track your search queries and the links you click; they also provide Google Analytics, which collects even more data on many websites. Well, there are alternatives:

1. There is Scroogle, a nice proxy for Google. I trust them more than Google, because they cannot make much profit from your data.
2. Install your own proxy on your website (disclaimer: I have written it; do not use the example installation, it is just there for a quick look and I will track your visits, so install it yourself!).
3. Addendum: there are free, libre, distributed search engines, e.g. YaCy (see also peer-search.net) and Seeks.
4. There is startpage.com, which claims to respect privacy; I do not know much about it (found it a few minutes ago), opinions? Addendum: a commenter informed me about DuckDuckGo; the page looks nice, and they claim not to track their users, too.
5. Access Google via a proxy, without JavaScript or cookies, and use a script to remove the google.com/url links.
6. Do not use Yahoo, Bing, Exalead, Ask.com, Baidu, Fireball or whatever instead; they are probably no better than Google.

For Webmasters:
1. Stop using Google Analytics. Although users can block it, it is not very nice (i.e. it is immoral) to let unsuspecting visitors send their surfing behaviour to Google; there are better alternatives, e.g. Webalizer and CrawlTrack.
2. Stop using Blogger/Blogspot; install WordPress or whatever CMS you like on your own cheap webspace or your own expensive webserver.
3. (addendum) Stop using Google AdSense or reCAPTCHA.

Google's Android and V8 are nice, but the search engine and Google Analytics are really problematic. Care about your privacy and about the privacy of the visitors of your websites!

PS:
I am currently using reCAPTCHA on this blog (as you may see). I am not sure whether this is okay; its privacy policy seems to be nicer than the usual Google behaviour, but every visitor sends some information to Google, they do not say how they handle visitors who do not comment, and who wants to trust Google anyway? Okay, I will remove this captcha; I am realising that it is probably really bad. Any opinions?

PPS:
Sorry, I have to apologise: I thought reCAPTCHA would be nice because it helps digitise books, but… I have replaced it with a math captcha. I know bots can break it easily; I will see what happens. I hope it will block most of them.

PPPS:
You could at least try this to remove the spying redirection links, and Adblock to block Google Analytics.