Knzlrdll && Schwrmstrm

SPON produziert ja selten wirklich Niveauvolles. So ist auch das hier bestimmt nicht auf der Basis einer handfesten Recherche entstanden. Aber gut isses trotzdem.

Und wo wir schonmal dabei sind: Freier Strom für freie Häcker — Deutschlands Energieversorgung jetzt per GSM kontrollierbar. Wurde in diversen Medien als tolle Erfindung kolportiert.

Performance Python

As I needed some heavy optimization of some inner loop in an O(n^2) problem, I turned to C extensions for Python this weekend. Well, writing native C extensions for Python (2.x) turned out to be a real mess. The Python C API isn’t something you want to use, which is understandable since Python is a heavily object oriented language where C is only procedural.

But, there are some cool extensions which make calling C code from Python easy. Of all those extensions, I tried out Pyrex.

Pyrex is basically a language that is very similar to Python. It’s so similar, that you can just cut-and-paste most of your Python code into a Pyrex module file (.pyx) (there are some limitations, e.g. no *-comprehension, but these can be fixed easily, most of the time). The next step is to compile the Pyrex Code into C code with the Pyrex compiler and then compile this into shared library with gcc. This shared library can then be imported into Python as a module and the methods can be called.

The fun starts when you a) start calling other native C libraries and b) start writing optimized Pyrex code. Calling native C libraries is pretty easy, most of the time. The Pyrex compiler builds the required code to convert the basic python types into C types. E.g. if you have a C function that want’s a char *p as an argument, you can just assign it with a python str-Object, same goes for int and float.

The second use-case is writing really fast Pyrex code. If you just cut-and-paste Python code, your variables will be complex Python-objects and the C output of the Pyrex compiler contains some really nasty Python-API calls on these. E.g. adding two python variables is anything else then trivial and results in something like this:

add.pyx:

a = 1
b = 2
c = a + b

turns into:

[…snip…]

PyObject *__pyx_1 = 0;
PyObject *__pyx_2 = 0;
PyObject *__pyx_3 = 0;

[..snip…]

/* “/home/kai/add.pyx”:1 */
[…snip (3 nasty lines for value assignment to Python-object)…]

/* “/home/kai/add.pyx”:2 */
[…snip (same here) …]

/* “/home/kai/add.pyx”:3 */
__pyx_1 = __Pyx_GetName(__pyx_m, __pyx_n_a); if (!__pyx_1) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 3; goto __pyx_L1;}
__pyx_2 = __Pyx_GetName(__pyx_m, __pyx_n_b); if (!__pyx_2) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 3; goto __pyx_L1;}
__pyx_3 = PyNumber_Add(__pyx_1, __pyx_2); if (!__pyx_3) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 3; goto __pyx_L1;}
Py_DECREF(__pyx_1); __pyx_1 = 0;
Py_DECREF(__pyx_2); __pyx_2 = 0;
if (PyObject_SetAttr(__pyx_m, __pyx_n_c, __pyx_3) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 3; goto __pyx_L1;}
Py_DECREF(__pyx_3); __pyx_3 = 0;

[…snip…]

This is really nasty and anything else then fast, isn’t it?

BUT! If we tell Pyrex to treat our variables as C integers, it can generate clean C code:


cdef int a
cdef int b
cdef int c
a = 1
b = 2
c = a + b

what we get is:


[…snip…]
static int __pyx_v_3add_a;
static int __pyx_v_3add_b;
static int __pyx_v_3add_c;

[…snip…]
/* “/home/kai/add.pyx”:4 */
__pyx_v_3add_a = 1;

/* “/home/kai/add.pyx”:5 */
__pyx_v_3add_b = 2;

/* “/home/kai/add.pyx”:6 */
__pyx_v_3add_c = (__pyx_v_3add_a + __pyx_v_3add_b);
[…snip…]

And this is just clean C code. The same goes for loops. If you just use your python code, alot of Python-object operations have to be used. But if you use C variables, then Pyrex can create really fast C code and can even do pointer arithmetics. Pyrex allows you to speed up your code where you really need it without all the hassle of using the Python C API.

And my code turned out about five times faster after doing just some minor C adjustments :)

As I mentioned above, there are some other methods of improving Python performance, there is a good overview from the people at SciPy. And before you turn to Python+C, check the Python-performance tips first.

Wiki2beamer 0.8 is out (update)

Some weeks ago, already, I released wiki2beamer 0.8. It’s mostly a maintenance release. It now works again with python 2.4 which makes it easier to run on ancient systems (like our universities ;) ), has a litle bug fixed where “expressions” immediatly following lists were not transformed (so you had to add a newline) and the license was changed to “GPL 2.0 or later” so we don’t get stuck in some copyright problems as time moves on.

I also put the manpage for wiki2beamer online here (via man2html) so google can find it and non-*nix users can read it, too.

I should probably finish my thesis before I start coding at wiki2beamer, but if anyone out here is interested, there still are some things to be done:

  1. Create a Lessig-style slide environment, like the [code]-environment
  2. Split the code into a commandline wiki2beamer frontend and a python module as backend
  3. Write a formal syntax description, so we can create a real parser/compiler instead of these regular expression tricks.
  4. Build distro packages. (Fedora, RedHat, Arch, … anyone? Gentoo already has one. Update: I’ve built a Debian/Ubuntu package now, too. Go get it at SourceForge.)
  5. Build a windows installable package. (?)
  6. Update online documentation.

Also, we finally seem to get users :)
According to the sourceforge download statistics, we had 86 downloads in the last 2 months.

So, add one and head over to get wiki2beamer 0.8.

Jobpy released (pp is good, jobpy is more special)

Hello world Jobpy!

As part of my diploma thesis I needed a multithreaded job producer/consumer architecture implementation with at-least-once semantics. To make testing easier I implemented the architecture and semantics part in a seperate framework and made sure it worked well. The code is pretty java-ish which I’m not all that happy about, but I thought that maybe it could be of use for someone anyway. So I created a little sourceforge.net project for jobpy.

You may ask yourself now why the hell to use jobpy when there is also parallelpython (aka pp). Well, there are some problems with pp - E.g. it’s pretty hard to share ressources like a socket or a database among the consumer threads because the jobs have to be serializable. But sometimes data structures cannot be serialized and you would have to code around this limitation. Jobpy is not designed to replace pp, it’s just a solution for the special case of multithreaded producer/consumer job processing with optionally shared ressources.

wiki2beamer 0.7 is out

Hi all out there,

Wiki2beamer is a small tool to create LaTeX beamer presentations by converting a wiki-like syntax to LaTeX beamer code. It took a while after the 0.7 alpha 2 release, but I wanted to give it some time: wiki2beamer-0.7 is released. Major new features are:

  • Easy animated code listings via the [code] environment
  • Total escaping from wiki syntax via the [nowiki] environment
  • Template-less mode via the [autotemplate] environment
  • A setup.py
  • A man-page!!!11

So, go there, and spread the word :)

wiki2beamer 0.7 alpha 2

Hey there, wiki2beamer 0.7 alpha 2 is out. This means there is a dramatic increase of features of which the most important are:

  1. autotemplate-support: You don’t have to create a template file anymore, wiki2beamer can now do the job for you and create a fully functional .tex file from the .txt sources.
  2. code-environment: There is a new code environment which allows you to create animated code listings with a very dense and easy grammar.
  3. nowiki-environment: The nowiki environment allows you to completely escape from wiki2beamer.

These features were written in a very short time, so there probably are bugs. Also some feedback about the syntax and usability would be very helpfull.

wiki2beamer 0.6 is out

Do you like LaTeX? Would you also like to create your presentation slides for your talks with LaTeX-Beamer but fear the overhead? Then wiki2beamer probably is something for you. wiki2beamer speeds up beamer code creation by converting a media-wiki like syntax into functioning LaTeX-Beamer coder.

Here’s an example of the syntax:


==== Frametitle ====
* Bullet 1
* Bullet 2

# Enumerate
#<2-> Enumerate with pup-up effect

This really speeds up slide creation alot. Now wiki2beamer 0.6 is out which contains a small patch from me which allows you to manually close frames with [frame]>   and then continue with some LaTeX code outside of the frame environment. The examples contain a demonstration of how this works and how it can be used, espacially with verbatim environments and beamer.

On may, 8th I will give a small talk at Freitagsrunde TechTalks where I show how to efficiently create source-code centric presentations with LaTeX beamer (and wiki2beamer). So, I’d be glad if you’d just show up if you’re interested.

How many cities has the world?

So this is what I did this weekend: For my diploma thesis I needed a list of all the cities of the world (to filter text collections). There already exist corpora like the “Getty Thesaurus of Geographic Names” but they are not free and way over my needs - I just want a list.

Who could have such a list? First try: IATA airport list from wikipedia. Parsed. Erroneous (typos, problems with parsing…). Second try: openstreetmap.org.

And here is where the story begins. Openstreetmap.org collects geographical knowledge in a huge database and publishes it under a free license (CC-BY-SA). The database dump can be downloaded as a huge XML file which currently is about 5GB in bzip2 compression. The expected ratio for decompression is 10:1 so this is a fricking 50GB XML file. The structure is fairly simple. There are nodes, ways and relations and these can have tags. The tags then encode information like “this is a place of the kind city” or “its name is Footown”. To get this information I wrote a little python script that iterates over every line and extracts the relevant parts with some regular expressions. A run on the 5MB bzip2ed relations file took 17s. Well … that was to slow. So I removed the regexps and did some dirty by-hand parsing. 8s. Better. But still to slow.

So, next step: C.
The evil thing!


But, you have to admit - there’s nothing else when it comes to performance. About six hours of coding later I got the first results: 3.5s for a 1,170,000 lines XML file. Good. After some further improvements I got it down to 2.6s. Yeah! :)

On a 1.6GHz Core2 Duo the combination of bzcat and grepplaces.c (both running on one core) gives around 1.2MB/s reading speed on the bzip2ed planet-file. So a complete scan over the planet-file now takes about 70 minutes.

So, here’s the code: http://cleeus.de/grepplaces/

The extracted corpus will follow, as soon as it’s ready.

So long, some statistics:


$ grep “^city” places_planet.txt | wc -l
4179
$ grep “^town” places_planet.txt | wc -l
29401
$ grep “^village” places_planet.txt | wc -l
249716

Zyxel NWD-210N with Linux (Gentoo, 2.6.29-rc2-git1)

Last time I wrote about getting my Zyxel NWD 210N WLAN USB stick to run I had to fall back to ndiswrapper which uses the windows drivers from zyxel and somehow magically gets them to run under linux.

Well, this was with Linux kernel 2.6.25 while 2.6.29 is in the making now. Time to give it a second try, this time with native Linux drivers from the kernel. The output from lsusb -v suggested that the chipset inside is made by Ralink (USB ID 0586:3416). Some Ubuntu hardware list gave me the missing hint to find the right driver: Inside the Zyxel NWD210N there is a Ralink 2870 USB chipset. Ralink provides native drivers and firmware on their homepage.

The kernel hackers have a native kernel driver in the making for this chipset which is marked as a staging driver. Worth giving it a try ;)
Step 1: Get 2.6.29-rc2-git1 (emerge -av >=sys-kernel/git-sources-2.6.29_rc2-r1)
Step 2: Get RT2870STA.dat from Ralink’s Linux drivers and put it to /etc/Wireless/RT2870STA/RT2870STA.dat

Step 3: Copy over old .config and configure the kernel to include the staging driver

Device Drivers —> [*] Staging drivers —>
[M] Ralink 2870 wireless support

Step 4: make && make modules_install and setup the kernel to boot

After a reboot with this brand new kernel and modules everything is ready to plug in the device.

The LED on the thingy instantly started flashing and the dmesg output says:

usb 1-3: new high speed USB device using ehci_hcd and address 4
usb 1-3: configuration #1 chosen from 1 choice
rt2870sta: module is from the staging directory, the quality is unknown, you have been warned.
rtusb init —>

After a few moments and some dmesg spam a new network interface (ra0) appeared and KNetworkManager showed the WLAN networks around over this second interface. :)

I have yet to test how stable the driver and the connections are, but this post you are currently reading was published over the new rt2870sta driver :)

Preparations

Looks like I’ve got another lecture to read. In our (we = Computers and Society Departement) course “Information Rules 1″ which will take place next semester, I allready have the “Lex Informatica” lecture. In this lesson we teach our students the basics of the theory of regulation (Lessig, Reidenberg). I allready have this lesson pretty stable, it just needs a roundup to make some points more clear. But now I additionally have the “Open Source” lesson. This means alot of reading for me. I think I’ll start with some articles from our Open Source Yearbook and then dive into the papers from all the famous and not so famous scientists. This is both challenging and exciting for me but also means some work, obviously. Well, no more boredom in cold winter days. ;)