wiki2beamer 0.7 is out

Hi all out there,

Wiki2beamer is a small tool to create LaTeX beamer presentations by converting a wiki-like syntax to LaTeX beamer code. It took a while after the 0.7 alpha 2 release, but I wanted to give it some time: wiki2beamer-0.7 is released. Major new features are:

  • Easy animated code listings via the [code] environment
  • Total escaping from wiki syntax via the [nowiki] environment
  • Template-less mode via the [autotemplate] environment
  • A setup.py
  • A man-page!!!11

So, go there, and spread the word :)

How many cities does the world have?

So this is what I did this weekend: for my diploma thesis I needed a list of all the cities of the world (to filter text collections). Corpora like the “Getty Thesaurus of Geographic Names” already exist, but they are not free and go far beyond my needs - I just want a list.

Who could have such a list? First try: IATA airport list from wikipedia. Parsed. Erroneous (typos, problems with parsing…). Second try: openstreetmap.org.

And here is where the story begins. Openstreetmap.org collects geographical knowledge in a huge database and publishes it under a free license (CC-BY-SA). The database dump can be downloaded as a huge XML file, which is currently about 5GB bzip2-compressed. The expected decompression ratio is 10:1, so this is a fricking 50GB XML file. The structure is fairly simple: there are nodes, ways and relations, and these can have tags. The tags then encode information like “this is a place of the kind city” or “its name is Footown”. To get this information I wrote a little Python script that iterates over every line and extracts the relevant parts with some regular expressions. A run on the 5MB bzip2ed relations file took 17s. Well … that was too slow. So I removed the regexps and did some dirty by-hand parsing. 8s. Better. But still too slow.
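For illustration, the line-by-line regex approach could look roughly like the sketch below. This is not the original script: the function name extract_places is made up, and it assumes the usual dump layout in which every <tag k="…" v="…"/> sits on its own line between the opening and closing line of its node, way or relation.

```python
import re
from xml.sax.saxutils import unescape

# One tag per line, e.g. <tag k="place" v="city"/> (assumed layout).
TAG_RE = re.compile(r'<tag k="([^"]*)" v="([^"]*)"\s*/>')
# Opening line of an element; does not match the closing </node> form.
OPEN_RE = re.compile(r'<(node|way|relation)\b')
CLOSE_RE = re.compile(r'</(node|way|relation)>')


def extract_places(lines):
    """Yield (kind, name) for every element that has a place and a name tag."""
    tags = {}
    for line in lines:
        if OPEN_RE.search(line):
            tags = {}  # a new element starts: forget the previous tags
            continue
        match = TAG_RE.search(line)
        if match:
            # Undo XML escaping in the value (&amp;, &lt;, &gt;, &quot;).
            tags[match.group(1)] = unescape(match.group(2), {"&quot;": '"'})
            continue
        if CLOSE_RE.search(line):
            if "place" in tags and "name" in tags:
                yield tags["place"], tags["name"]
            tags = {}
```

Because it only needs an iterable of lines, the same function should work on a file opened with bz2.open(path, "rt", encoding="utf-8") or on sys.stdin behind bzcat.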

So, next step: C.
The evil thing!


But, you have to admit - there’s nothing else when it comes to performance. About six hours of coding later I got the first results: 3.5s for a 1,170,000-line XML file. Good. After some further improvements I got it down to 2.6s. Yeah! :)

On a 1.6GHz Core2 Duo the combination of bzcat and grepplaces.c (both running on one core) gives around 1.2MB/s reading speed on the bzip2ed planet-file. So a complete scan over the planet-file now takes about 70 minutes.
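The 70 minutes follow directly from the file size and the reading speed. A small back-of-the-envelope check, where scan_minutes is just an illustrative helper name and the sizes are the rounded figures from this post:

```python
def scan_minutes(compressed_mb, rate_mb_per_s):
    """Time in minutes to read compressed_mb megabytes at rate_mb_per_s."""
    return compressed_mb / rate_mb_per_s / 60


# About 5GB of bzip2ed data, read at about 1.2MB/s.
print(round(scan_minutes(5 * 1000, 1.2)))  # counting 1GB as 1000MB
print(round(scan_minutes(5 * 1024, 1.2)))  # counting 1GB as 1024MB
```

Either way of counting a gigabyte lands within a minute or two of the 70 minutes.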

So, here’s the code: http://cleeus.de/grepplaces/

The extracted corpus will follow, as soon as it’s ready.

Until then, some statistics:


$ grep "^city" places_planet.txt | wc -l
4179
$ grep "^town" places_planet.txt | wc -l
29401
$ grep "^village" places_planet.txt | wc -l
249716
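
The three grep calls each read the whole file again. The same counts could be collected in a single pass, sketched here in Python. The helper name count_by_prefix is made up, and the only assumption about places_planet.txt is the one the grep patterns already make: each line starts with the kind of place.

```python
def count_by_prefix(lines, prefixes=("city", "town", "village")):
    """Count lines starting with each prefix, like grep "^prefix" | wc -l."""
    counts = {prefix: 0 for prefix in prefixes}
    for line in lines:
        for prefix in prefixes:
            if line.startswith(prefix):
                counts[prefix] += 1
    return counts
```

It could be called as count_by_prefix(open("places_planet.txt", encoding="utf-8")).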

Zyxel NWD-210N with Linux (Gentoo, 2.6.29-rc2-git1)

Last time I wrote about getting my Zyxel NWD-210N WLAN USB stick to run, I had to fall back to ndiswrapper, which uses the Windows drivers from Zyxel and somehow magically gets them to run under Linux.

Well, this was with Linux kernel 2.6.25, while 2.6.29 is in the making now. Time to give it a second try, this time with native Linux drivers from the kernel. The output from lsusb -v suggested that the chipset inside is made by Ralink (USB ID 0586:3416). An Ubuntu hardware list gave me the missing hint to find the right driver: inside the Zyxel NWD-210N there is a Ralink 2870 USB chipset. Ralink provides native drivers and firmware on their homepage.

The kernel hackers have a native kernel driver for this chipset in the making, which is marked as a staging driver. Worth giving it a try ;)

Step 1: Get 2.6.29-rc2-git1 (emerge -av ">=sys-kernel/git-sources-2.6.29_rc2-r1")

Step 2: Get RT2870STA.dat from Ralink’s Linux drivers and put it at /etc/Wireless/RT2870STA/RT2870STA.dat

Step 3: Copy over old .config and configure the kernel to include the staging driver

Device Drivers ---> [*] Staging drivers --->
[M] Ralink 2870 wireless support

Step 4: Run make && make modules_install and set up the kernel to boot

After a reboot with this brand new kernel and modules everything is ready to plug in the device.

The LED on the thingy instantly started flashing and the dmesg output says:

usb 1-3: new high speed USB device using ehci_hcd and address 4
usb 1-3: configuration #1 chosen from 1 choice
rt2870sta: module is from the staging directory, the quality is unknown, you have been warned.
rtusb init --->

After a few moments and some dmesg spam a new network interface (ra0) appeared and KNetworkManager showed the surrounding WLAN networks on this second interface. :)

I have yet to test how stable the driver and the connections are, but this post you are currently reading was published over the new rt2870sta driver :)

Preparations

Looks like I’ve got another lecture to give. In our (we = Computers and Society Department) course “Information Rules 1”, which will take place next semester, I already have the “Lex Informatica” lecture. In this lesson we teach our students the basics of the theory of regulation (Lessig, Reidenberg). I already have this lesson pretty stable; it just needs some polishing to make a few points clearer. But now I additionally have the “Open Source” lesson. This means a lot of reading for me. I think I’ll start with some articles from our Open Source Yearbook and then dive into the papers from all the famous and not so famous scientists. This is both challenging and exciting for me but also means some work, obviously. Well, no more boredom on cold winter days. ;)