blog rss feed

This post became a rather longish braindump of my view on the current situation in the mobile computing devices market. It's based on what I read everyday in the news and some basic economic terms thrown in. Most of the ideas here are not new. You might have read them somewhere else. In fact I have read most of them somewhere in someone else's braindump but can't find it again for linking.

During the students talks at the Information Rules 1 conference, it became clear to me that the mobile devices market with all it's highly integrated smartphones and tablets looks a lot like the mainframe market of the 70's and 80's.

Back then large companies were busy cranking out features on proprietary platforms with proprietary operating systems, locking users deeply in and taking money out of them. This was possible because they controlled the whole platform stack: the hardware, the operating system, the developer tools and a lot of userland software.

This obviously is exactly what Apple is doing (and others are trying to do) today. They tightly control the hardware, the operating system, the developer tools, a lot of userland software and even the distribution channels for 3rd party software. And they do a good job of monetizing that control.

In 1981, IBM did something that was going to change the whole computing industry (and kill their own mainframe business, which, apparently IBM didn't foresee). They released the IBM-PC and opened the spec so that anyone could build clones. Although IBM didn't fully open it (they left out the BIOS code, expecting that none would be able to clone that), many vendors started to build clones. In a short timespan most PCs built were "IBM-PC compatible" .

What followed was a time of incompatibilities, bugs and attempts to restore lock in to specific operating system stacks on top of the x86/IBM-PC platform. But it was also a time of enormous growth. A huge number of hardware vendors popped up and improved the platform, all being held together by the IBM spec and mostly Intel which moved the internal interfaces (PCI, ATA, AGP, PCIe, SATA) forward. It wasn't all beautiful. Microsoft took over a big market share in operating systems. But this can still be considered an improvement as Microsoft didn't control the hardware.

Today only very few companies are still locked in to the old mainframe machines (a few days ago, NASA announced that they finally got rid of their last mainframe, an IBM Z9). Many are free and use software that at least with a bit of work could run on any operating system on any hardware. And even better: We have a whole bunch of free and open source operating systems which run on a multitude of hardware and pretty much can keep up with the commercial competitors.

Now what's happening on the mobile devices market? Apple has taken a good share with its closed iOS platform and makes the mainframe companies of the past look like hippies. Microsoft is just now really entering the game with its Windows Phone platform and strong partnership (if not to call it an acquisition) of Nokia. For me this looks a lot like control over hardware and operating system, too (although Microsoft still licenses Windows Phone to other hardware vendors). Blackberry covers another segment of the market with a closed hardware and operating system stack which they keep for themselves.

This could really have easily become a costly disaster with only one huge monopolist left in the end. But then there also is Google and Android.

With Google developing, pushing and licensing the Android operating system under mostly open source terms (without strong copyleft), diversity is coming back into the game: On the Android platform, the control over the operating system is very much decoupled from the hardware. It's decoupled so much that most of the times you can put a different flavor of Android on your device.

Now Google recently has acquired Motorola Mobility. This looks a lot like the vertical integration strategy of Apple, Microsoft+Nokia and Blackberry. But really I think it's not (and I am not the first to recognize ). Google is an advertisement platform for the open internet. Any company which creates a closed network/platform where Google cannot enter (distribute it's advertisement) is a threat to it. The massive momentum in the closed mobile platforms is such a threat (as facebook is one, too).

With the Android platform, Google provides a free operating system for device vendors (for them it's just free as in beer) which lowers the marginal costs (per-device) for creating mobile devices. This puts price pressure on all closed platform mobile device vendors and creates a huge number of devices which are not so closed and more open for Google's advertisement. Essentially, Google has to prevent any closed platform to get a monopoly or a substantially large share of the mobile device market in order not to loose its advertisement market there. Yes, Google could try to be the monopolist itself. But by giving away Android, Google triggers multiple positive feedback cycles (more devices - more apps - more buyers - more devices - ...). Closing future versions of Android would slow down these feedback loops and thus is not in the interest of Google. Acquiring Motorola has probably more to do with it's patent pool. Microsoft and Apple are known to sue or threaten to sue Android device vendors with their own patent pools and at least Microsoft is known to receive patent licensing fees for Android devices. By acquiring the Motorola patent pool, Google can use these patents to lower or eliminate these license fees and further push the Android platform.

Google's motivation to prevent a monopoly is good for everyone. They make the closed devices a little more open. But this is not yet the freedom we need and mostly have on the PC market. You cannot switch Android for Windows Phone or iOS on any mobile device. At least I have not heard of any device where this is possible.

The reasons is that there was no IBM-PC moment yet for the mobile devices market. You simply cannot swap out the display, the microprocessor, the wireless connectivity components, ... . They all are so tightly integrated, beautifully connected and interwoven with the operating system and userland software that it's hard for an operating system to run on anything other than it was designed for.

Of course this is not only a business problem but mainly an engineering problem. Components are not yet small enough to allow the specification of flexible and capable physical interfaces which could be used to build component based mobile devices as beautiful and usable as the highly integrated devices of today are.

I don't believe there is much more room for innovation on the smartphone and tablet markets in terms of form factors. A smartphone has to fit into a human hand. A tablet has to fit into two human hands. Both are rectangles, because every digital display is a rectangle. And that's about it. Of course there may be other form factors (implants, wearable, ...) but those will be a totally different thing.

Now what needs to happen for an IBM-PC moment on mobile devices are three things:

  1. Hardware components have to become a lot smaller.
  2. One big player has to make a good spec.
  3. A critical mass of standardized devices and components have to be built and sold.

I sincerely hope that this will happen some day.

The good thing is, it's not that unlikely. Once the hardware components will be small enough, device design will not be dominated by engineering problems and solutions anymore. Like a PC case, the phone or tablet could become a case for components. Then everything that is needed will be the spec for developing components and plugging them together. Oh yes, and someone has to start building those machines and components. As with the IBM-PC and Intel, the CPU designer (or a vendor of some other important component which is protected by strong IP rights) could then play the role of the spec designer. When the moment comes, I think we can expect one of the mobile CPU designers (ARM, MIPS, ...) to take that road (because they would get a monopoly, and which company would miss that opportunity).

The bad thing is: It might take some time (or never happen at all). The engineering problems currently seem far away from being easily solvable.

When there is sufficient supply of vertically integrated but differentiated mobile devices (different models with different features), and at the same time the costs of customizing are high then of course market demand for customizable devices will be low. The automobile market works like that. You would never buy some components and plug a car together. Regulatory and technological problems increase customization costs while at the same time the market has differentiated the products enough so that everyone will find a car that fits his needs.

The IBM-PC moment might indeed never repeat, because the IBM-PC was not only an open design, but also a leapfrog at the same time. Unlike most competitors who used 8bit processors, IBM used a 16bit chip in the IBM-PC that was able to deliver considerable more performance. So the advantage of buying "IBM-PC" or "IBM-PC compatible" was not only a minor improvement in a feature here or there, but a big killer feature for the customer.

To repeat that on the mobile market, a similar killer feature may be needed to gain critical mass quickly and get other vendors to adopt. In the prospect of the current engineering problems, sadly, steady evolution towards a automobile-like differentiated but vertically integrated market seems more plausible.

As seen on the PC market, a highly vertically disintegrated market can be very innovative. Open standards are the key as competition no longer has to be about who is setting the standards but about features inside the standard. But finding a player who at the same time is capable of and willing to put a killer feature in use to push an open standard will not be easy. And back then, the IBM-PC moment wasn't really intentional. I believe that had IBM known the consequences, they would not have opened their platform for anyone to clone.

tldr

Mobile devices (smartphones, tablets) are highly integrated products. Vendors control the hardware, the operating system, the developer tools and a lot of userland code. This is similar to the situation on the mainframe market of the 70's and 80's. Back then, the open IBM-PC hardware and software platform broke through the strong lock in effects on those markets. Google's Android is a step into the right direction but not enough. An IBM-PC moment is needed for mobile devices to become free.

disclaimer

It's easy to miss a critical piece when putting these chains of arguments together. If you think I have made a mistake, just drop me a note.

This is a good one on TED:

A.J. Jacobs: How healthy living nearly killed me.

Food for your brain: news.ycombinator.com

It's better than heise.de .

Tried a lot of different beans here in Berlin. These are the best: "African Queen" by Impala Coffee

A house blend. Dark roasted, with a hint of caramel.

Get it fresh.

When I listen to music, I mostly use my atom-board home server as a player and of course I want to control it from remote. But I don't like those complicated networked sound architectures (mpd, pulseaudio, jack). I just want to ssh on the box, put some music on the playlist and disconnect.

To do that, you obviously need a player that works from the command line. So far I used herrie. It has done a great job and I still like its simple interface.

But recently I wanted to upgrade from the onboard Intel HD audio to a solution with optical out. As I will need the single PCI slot for a gigabit ethernet card, I bought a USB soundcard, a "Creative SoundBlaster X-Fi 5.1 Sourround Pro USB". It works pretty much out of the box, you just need to build snd-usb-audio module into the kernel (3.0.4 that is). Sadly it's a pretty stupid device: It has no hardware mixer. So alsamixer won't give me a volume control bar, unless I set one up through a complicated asound.conf soft mixer. But I don't really need that anyway.

So, back on topic, the new USB soundcard now is connected, but for some reason herrie doesn't work well with it. Sound is horrible and stuttering. Maybe I'll file a bug...

So it was time to look for other players. Gentoo has media-sound/moc. Turns out it was a good idea to try something new. First of all, moc works nicely with the USB soundcard. It might have something to do with the architecture. Moc is multithreaded and has a seperate thread for the playback and for the user interface.

This is how moc looks on my console: moc music on console audio player screenshot

Besides beeing a functional console music player, moc also has some cool features:

  • can run in the background (without screen)
  • is themable
  • can seek pretty fast
  • has it's own soft mixer for volume
  • supports a good number of audio formats

So if you are looking for a good console audio player, you can give it a try.

Most native code / C++ software vendors nowadays have implemented coredumps for their software. "coredump" is a term from computer history, when main memory was magnetic core memory. A coredump is the frozen and stored state of a process at the time of a CPU exception (aka crash). It can be loaded in the debugger and analyzed (given you have the binary and the debugging symbols). On Linux it must be enabled in the kernel and by the system administrator via rlimit. On Windows, developers can use structured exception handling to catch a deadly exception and execute some little piece of code which then calls a Windows API function which creates a coredump.

When a crash in a production environment should happen, you then get a coredump stored on disk and can receive that from you customer. While crashes should not happen at all, when they happen, coredumps are extremely helpful.

Now recently I got a very special coredump on my desk. Here is what happened (Intel assembly this time, as it's on Windows).

The exception handler has catched an exception: "Unhandled exception at 0x57dff720 in SomeBinary.exe.dmp: 0xC0000005: Access violation."

Hmm, not much to see here, but the debugger put's us at the position of the crash:

.
.
.
004D4AFF  mov         edx,dword ptr [ecx]
004D4B01  push        eax
004D4B02  mov         eax,dword ptr [edx]
004D4B04  call        eax
-- baaammmm, crash here --
.
.
.

With registers:

EAX = 57DFF720 //indeed, thats what happened
ECX = 7843BD48
EDX = 7891F8D0
EIP = 004D4B06 //crash here

Hmm, what could that be? When something like that happens during development, you most likely have used an old invalid pointer or committed a crime of pointer juggling. When it happens during production, it's probably something terrible like a heap corruption or a race condition which you didn't find during development. The fact that there were something like a hundred other threads open in this coredump, didn't make matters betters. I feared for the worst. So the first thing I did was to find the bigger picture of where exactly I am in the code. The problem with production assembly code is, that it's optimized and has little to do with the source code anymore. Especially C++ templates and inlining make reading assembly hard, even if you have the sources.

From the debugger line it was clear that this is in the return section of a method call when the compiler starts calling destructors. Something like this:

void SomeClass::someMethod(...) {
  boost::intrusive_ptr<SomeBaseClass> foo(new SomeContreteClass());
  .
  .
  .
  // -- crash here, right before the return --
  return;
}

A little more assembler context made it clear, that the final call instruction should have been the invocation of the virtual destructor from the SomeConcreteClass object in the heap. foo is an intrusively reference counted pointer on the dynamically allocated object. Intrusively counting the reference means, that the object itself contains the counter and carries it, even when it is degraded to just a raw pointer. Because SomeConcreteClass was derived from SomeBaseClass and polymorphic, the compiler needs to find out the correct destructor uppon destruction of foo. With that knowledge I could start looking at the heap. ECX seemed to contain the address of the object in the heap. Now I had a look at the memory:

@ECX:
0x7843BD48 | 0082cf54 00000001 ...

That looks like a vtable pointer followed by the refcount which was down to 1. Maybe the vtable pointer was bad? So I followed the vtable pointer.

0x0082CF54 | 004355b0 0042c5d0 0042c5e0 ...

Well, this doesn't help much, but the disassembly looked sane and the debugger had a symbol for the address:

SomeConcreteClass::`vftable':
0082CF54  mov         al,55h //whatever, I don't care 
0082CF56  inc         ebx    //but it looks OK
0082CF57  add         al,dl

So why could that fail?

Just look what happend. The first mov instruction at 0x004D4AFF is supposed to load the address of the vtable into EDX which should be 0x0082cf54. But EDX has a totally different value (0x7891F8D0).

And that's the end of the story: the CPU must have failed on us. I repeat: The CPU has failed.

Either through broken RAM (most likely) or something like a TLB bug. I just loved finding that one ... and telling the customer to fix his hardware :)

You know, those web browsers with ssl support ... they have a reason why they basically ignore any OCSP/CRL errors.

Today I stumbled uppon a root certificate of a not-so-small international CA. Opinions differ, but generally CA roots should also be revokable and thus have an OCSP responder configured. Said certificate is one of the better roots and thus has a responder https://ocsp.some-ca.com set.

Now when you contact this responder something unexpected happens:

~ $ openssl s_client -host ocsp.some-ca.com -port 443
CONNECTED(00000003)
3125948630696:error:140770FC:SSL routines:\
  SSL23_GET_SERVER_HELLO:\
  unknown protocol:s23_clnt.c:683:

...

hmm ... something went wrong. tcpdump + wireshark shed some light on the problem: SYN, SYN-ACK, ACK ... tcp up, that's good ... now we send the SSL Client Hello. What we didn't expect was the reply from the server. A blatant (unencrypted):

HTTP/1.1 400 Bad Request
Server: Apache-Coyote/1.1
Transfer-Encoding: chunked
Date: Thu, 03 Nov 2011 17:45:54 GMT
Connection: close

Indeed, openssl is right: that doesn't look like an SSL handshake. It looks more like a slap with a large trout.

Die Bundesregierung will mal wieder Vorratsdatenspeicherung, wir wollen es nicht. Also gibt es eine Petition die man mitzeichnen kann. Wir müssen also mal wieder alle unser Login rauskramen und mitmachen.

Argumente dafür:

  • Sie wissen eh schon auf welcher Seite du stehst.
  • Es ihnen nochmal zu sagen, kann schlimmstenfalls dazu führen, dass die Botschaft diesmal ankommt.
  • Ja, Petitionen bewirken formell eh nichts. Die Wirkung kann aber doch informell in den Köpfen der Abgeordneten eintreten.

Also nochmal: Login rauskramen, mitzeichnen.

Charilaos Kalogirou on #AltDevBlogADay: "Optimizing for the instruction cache"

The basic advice is: When you write an algorithm that iterates over data and your code branches at some point depending on the values of your current data point (say by a virtual member function call), then the CPU gets frequent cache misses on the instruction cache. If you organize your data (e.g., by sorting or keeping different lists), you can spare your CPU these misses.

Yesterday I had an epiphany (a small one). It happened when I installed a Nagios server and started to set up hosts and checks. Nagios is a network monitoring tool that consists of a nagios server that runs checks on and collects information about a number of other hosts. As an admin you can get alerts and notifications about events or just view the green/yellow/red status of these checks.

The elegance in Nagios is, that checks are very simple and isolated. They just shell commands that return a status (green/yellow/red) and a message. That's it. There are a number of predefined checks and you can easily add new ones.

Now, when you start monitoring an existing network, you don't have checks for every service there is. But you can add them. One after another. Every check you add, adds value to the monitoring system. And eventually you will have checks for all the vital parts in the network.

It's like Unit Testing for admins.

When you have a legacy code base that is not unit tested and start adding tests, you have 0% code coverage. But every test you add, adds value to the test suite.

I think there is a more general lesson to be learned from these two examples. In any existing system, there are two strategies for improvement:

  1. Revolution
  2. Evolution

Revolution means, making a big investment and creating something new to replace the old. Think of an old house that has become to costly to repair. You have to tear it down and build a new one. Thats a big investment and an associated risk that the house might not stay on budget and make you go bankrupt. That's why revolution most of the times is not feasible.

Evolution means small repeated investments to replace or enhance the old with something new. Speaking in terms of house-building, it would mean to replace parts of the house, maybe room by room or wall by wall with something new. Of course that's hard if your architecture does not support incremental change (buildings don't, most of the time). Supporting incremental change is key to improvement in situations where the revolution approach is infeasible. To support incremental change means:

  1. The returns of added increments must be immediate.
  2. The costs of added increments must stay constant.
  3. Returns must not diminish.

All three conditions hold true for Unit Testing and Nagios. I'll explain a bit more about the three rules:

1. Instant Benefits

You have to start with something small and even that has to be of value to you. Building a new house is the opposite: It is of no use only until it is completely done. When introducing a Unit Testing framework to an existing codebase, you have low fixed costs (adding the framework) and get immediate results from your first test on.

2. Constant Costs

According to the second law of thermodynamics, entropy in any closed system will only grow. This seems to be true for complex systems, too. When increments are added, most of the times a bit of coherence and order is lost. Adding another increment will be more costly.

In software development you can observe that directly: It all starts out with a clean design. Then someone has to add something unexpected and violates the design, a little bit. The next time someone wants to add something, he will have a harder time. Costs didn't stay constant but grew with the number of increments.

In Unit Testing, the architecture of isolated test does it's best to not let this happen. It tries to keep entropy of the system low. Nagios checks work the same way. They are isolated and consistently organized. Adding another check is not influenced by existing checks.

3. No Diminishing Returns

Diminishing returns is a term from economics. It means that returns will scale lower than linearly with investment. This must not be the case when you want a system to work. Adding an increment must result in an outcome that is proportional or even higher than proportional to the investment.

Take Unit Testing. Every test you add, results in code coverage (the return) which is more or less proportional to the investment (there might be a point where adding more tests doesn't result in more code coverage, but that's when you have a fully tested system and just don't need to add more tests). Again, Nagios checks behave the same way: Every check you add results in a better ability to react to defects in the network (linear returns).

When returns start to diminsh, you probably need a new architecture to replace the old one. It might be a completely different one, it might need to be more flexible, it might be designed for completly different goals. Diminishing returns mean, that you have pushed the system to it's limits. Don't go any further.

Conclusion

If you want to build systems that induce constant improvement, build them with these three rules in mind.

Examples here might have been overly simplistic. There always are border conditions when the mechanics stop working. That's probably when you need a new system (revolution). But when you see something that doesn't have these mechanics in place, you should either change it or leave it.

The bible for the modern professional programmer. You will find a lot of reviews on the net, so I don't have to write another one. It's about how to write good code, which seems to be still more an art than a science. It should be taught in every CS education. But it isn't. Instead that's where you will find some of the most horrible code anwhere. I barely remember those tree and graph implementations in a Java int[] array ... . Ah, and the variable names ... p q t0 za s ... . As if there is bytes to save ... well our professors learned programming on punchcards.

But, back on topic: "Clean Code: A Handbook of Agile Software Craftsmanship" teaches you, what university courses do wrong and how your code can get better. Go read it. :)

music of the month:

bassdrive.com

drum'n'bass -- classic :)

Most programers already have heard of it: There will be a new C++ ISO standard. It's in the making for quite some time now and compiler vendors already have implemented a lot of the features. It's commonly refered to as C++0x as people expected it to appear somewhere in 200x. Now it seems to become C++11, maybe C++12.

The changes are huge and looking at C++0x source code is like looking at a new language. It has both, new core language features and useful additions to the STL. In this post I'll introduce the feature of which I think that they will change the way C++ is used the most.

shared_ptr

shared_ptr is an addition to the STL. It's a template class std::shared_ptr<T> that implements a reference counting pointer. Reference counting is a commonly known technique to C++ programmers and every big codebase seems to have it's own implementation. They are also refered to as smart pointers.

The idea is simple: Instead of using raw pointers, you pass the pointer into an object that overloads the * and -> operators so you can use the object like a pointer. But additionally the object has a reference counter attached. Every time you make a copy of that pointer object, the reference counter is increased. Uppon destruction, the reference counter is decreased. If the reference counter goes down to 0, delete is called. Now the big advantage of having this in the STL means, that we can easily pass shared_ptr in interfaces.

Constructing shared_ptr works like this:

#include <string>
#include <iostream>
#include <memory>
using namespace std;

int main(int, char**) {
  shared_ptr<string> hello = shared_ptr<string>(new string("Hello World!"));
  cout << *hello << endl;
  return 0;
}

As you can see, there is no delete. Usually this was a memory leak, but shared_ptr will call delete for us. Common shared_ptr implementations create an internal refcounter object on the heap, so you will have 2 new calls. Using the make_shared template function can even eliminate that and it's also shorter:

shared_ptr<string> hello = make_shared<string>("Hello World!");

You can pass around copies of hello as you like, the refcounter will make sure it stays alive as long as needed. Using shared_ptr almost feels like garbage collection in C++. There is one situation where reference counting fails: cycles. Think of a tree where parents know their children and children know their parents. If you use shared_ptr for both, forward and backward links, you have a strong cycle and the refcounters will never go down to 0. That's what weak_ptr is for. You can make a weak_ptr from a shared_ptr. weak_ptr get's notified when the object it points to is destructed, so you won't get dangling pointers. To access an object behind a weak_ptr, you have to call lock() on it. This will return a shared_ptr. My prediction is, that C++ libraries will soon start to use smart pointers extensively and that will make C++ a better language.

If you want to learn more about smart pointers I can suggest the "Advanced STL" video lecture series by Stephan T. Lavavej on channel 9 . In part 3 he discusses everything about smart pointers.

auto

The auto keyword is a new core language feature. It implements L-value type inference which is an idea that most prominently is used in the Go language. In any assignment statement, there is an L-value which is on the left side of the = operator and an R-value (on the right side) which is the result of an evaluated expression. The R-value already has a strict type. When the L-value is a definition of a new variable, most of the time you want it to have the type of the R-value. And since your compiler already knows that type, you can now just tell it to use it:

auto hello = make_shared<string>("Hello World!");

hello now has exactly the type std::shared_ptr<std::string>. This might not seem to be that important, but it greatly improves readability and coding speed in C++.

lambdas

In one of my last posts, I explained why the STL algorithms are decoupled from the containers they work on. Now some of these algorithms, most prominently for_each, require a function that will be used / applied uppon the elements in the container. Modern dynamic languages like python long support functions as arguments, not to mention functional languages.

C++03 did too, but the syntax was verbose. You could either pass a function pointer to a real function or implement a callable function object (an object which has the () operator overloaded). Both were to complicated and it was much easier to write a simple for loop then to use for_each. Now lambda expressions change that. Take a look at this simple example:

vector<int> a_list;

for_each(a_list.begin(); a_list.end();
  [](int x) { cout << x << endl; }
);

There is a lot more to lambdas, which I haven't even fully explored myself. E.g. using notations like [&] or [=] you can access the variable in the lambdas scope. You can read some more in the wikipedia section about C++0x lambdas. Lambdas will greatly increase the power and expressiveness of the language and will help improve the functional style parts of C++.

function / bind

A std::function is a typed callable object. It is typed on it's inputs and outputs and can be constructed from a variety of constructs: raw function pointers, function objects, lambdas and binds. Using std::function it is easy to setup callbacks / signaling mechanisms in a typesafe C++ manner. The classic example is a button onClick event:

class Button {
public:
  void setOnClickEvent(std::function<void (void)> event) {
    onClickEvent = event;
  }
private:
  void performOnClick() {
    if(onClickEvent) {
      onClickEvent();
    }
  }
  std::function<void (void)> onClickEvent;
};

The std::bind is a first order function object -- a function that has some arguments saturated, while others are left to be filled. It looks like this:

void repeatedBeep(unsigned int x) {
  for(unsigned int i=0; i<x; ++i) {
    //...
  }
}

int main(int, char**) {

  Button button;

  button.setOnClick( std::bind(repeatedBeep, 42) );

  //... 
}

Arguments in a std::bind can also be other function objects which then results in a higher order function object. Wheeee... functional programming in C++ :) Admitted, you can do lot's of stupid stuff with binds and functions (and lambdas), exspecially if your arguments go out of scope where you have passed references. But as always: C++ assumes the intelligent programmer. ;) I guess when C++0x is in all stable mainstream compilers, we will see a new generation of GUI libraries which will make good use of these facilities.

There is a lot more to C++0x, but most of them are details which make a difference for the hard-core library implementer. The features above are what I think will influence our every day code and architecture. And I don't know what you think, but I'm pretty excited about these new features.

A while ago I have found an interview with Herb Sutter ( this one on channel 9 ) where he was talking about the upcoming C++ standard (C++0x or C++11). There was one sentence in that Interview, that didn't seem that much important, but I have found it very true and it constantly comes back in my mind: It's important to get exposed to other languages. (at 18:00 in the video)

And it really is important: If you keep sticking your head deeper and deeper into one single language, you tend to forget why you do things the way you do. But at the moment you switch languages, you have to start thinking again. What's the right way to express what you want to do? Doing that, makes you a better programmer (of course you should always do that in your native language, as well).

I just noticed this when I had to do some Java again. Java removes some features from C++, that allow you to do stupid stuff and adds some features that encourage you to do good stuff. You can learn from Java that maybe you shouldn't do everything you can do. Or take Python: Pythons standard libraries is once of the best I have seen. Everything is where you intuitively expect it to be. Also, in Python there are simple free functions that are not part of any class (but part of a package/module/namespace) and that happens to provide much of the simplicity of pythons standard library. You can learn from Python how good libraries should look like and that not everything is a class.

So, next time you have an itch and need to create something, choose a new language. Even if it slows you down, take the time and your motivation to try out ... maybe ... Ruby, Go, Scala, C++0x or even Haskel. It will make you a better programmer.

Social networks are a good thing: People can connect, communicate and share. I think the Internet and within it, the social networks, will raise a whole generation of young people who will have learned to apreciate different cultures and people from around the world.

But most of these social networks, as much as they do good, they also violate your privacy through a very clever economic and technical measure: The Like! Button.

Content producers profit from putting such buttons on their pages because it increases their audience. Users profit from it because they can share stuff more easily. And the button providers profit from it because they can track where you go, what you look at and provide targeted advertising.

Now targeted advertising and profiling is not necessarily a problem. It only becomes a problem when you can't control your profile anymore. And that is what has happened with those Like buttons: They are everywhere and you cannot opt out. Or can you?

Yes you can, even with Chromium. I am currently trying out two Chromium extensions: Ghostery and Disconnect.

They both do a similar job: They block those tracking web bugs and like buttons. Unlike the popular Firfox ShareMeNot plugin they completely remove the buttons and thus remove function. But I can accept that and share manually.

Yay, looks like wiki2beamer is mentioned in the 2011/08 issue of the german print magazine "Linux Magazin" (thanks for the hint to sping). According to this content overview it's only a small "Tool Tips" section. But hey, it's print!

There seems to be no free online version of the article so I guess I'll get a copy at the kiosk.

Do you want an invite to Google+? Just drop me a line.

Today we take memory for granted. As a developer you often don't think of it anymore and lavishly reserve buffers for all kinds of stuff. That XML file? Just throw it into a DOM parser. That ZIP file? Just decompress it into a memory buffer and write that to disk. That email attachment? Just decode it in memory and see what it is. This is perfectly ok, because if you would think for half an hour about every chunk of memory you're about to reserve, you would get nothing done on time.

But, there is always a but. The day will come when that XML file is a database dump with gazillions of nodes or it contains a huge chunk of encoded binary and your new / malloc just cannot get you fresh memory anymore. The day will come, when some email user attaches a 9GB ISO image to an email and tries to send it over your filtering SMTP server (I've heard, chief executives really like to do that).

Yes, new can actually fail. On 32bit architectures you have a theoretical raw address space of 4GB. But your process layout, stack, libraries and operating system restrict that considerably so that only about 2GB (at least on windows) are left for use. Now the heap, where most of your long-living memory chunks will be allocated, is like a harddisk: it can be fragmented. That means, if you sprinkled small objects all over your heap, chances are your memory manager (new/malloc) won't find a large continous chunk anymore. And that's when it will fail and throw an exception.

When that happens, and you really need to solve the problem, it's like standing with your back to the wall. You have two choices:

  1. Offload large chunks to the harddisk, or
  2. rewrite everything to be a streaming architecture.

Well, 1. is kinda obvious: Go to the places where your code wants to get huge chunks of memory and replace it with read/write access to temporary files. That's ugly, but it works and is often the only option you have in existing codebases.

Now what does 2. mean? A streaming architecture?

In a streaming architecture you can only read and write fixed size chunks at once. That is your program reserves some fixed buffer and then repeatedly reads into it, runs some processing on it and finally writes the buffer content to some sink - repeat. Your read sources and write sinks could be files, sockets, devices, whatever. The thing is: Your overall design has to consider this requirement up front. Otherwise you've lost. Converting an existing step-by-step design into an on-the-fly streaming architecture is close to a full rewrite.

Creating such an architecture can be hard. It might require some dirty hacks, when the output needs calculated values (like checksums) in the beginning when they are only known after fully running through the input once. Or it might require seekable input stream semantics when you need to know the the size of your input before you have fully processed it. That's why designing clever output formats is so difficult.

But it's worth it in the long run. What would you think if tar -xz suddenly crashed with an out of memory exception?

So, next time you process a file: Do it in chunks! :)

One part of learning a language is getting to know the standard runtime / library environment. In all modern languages there are huge standard libraries and they make up a big part of what the language feels like, and how efficient you are as a developer in your day-to-day job.

In some languages these libraries are closely interwoven with the language itself. Take Java and it's String class. It's part of the library and located in the package java.lang. but it's so close to the language that every string literal is converted to a String instance.

Or take the Python and its Exception classes. Even when basic language expressions, like import package fail (e.g. because package cannot be found) an Exception from the package exceptions. is thrown.

C++ takes a somewhat different approach.

C++ is rigorously divided into core language features and a standard library that is implemented using these language features. C++ core language features are things like:

  • basic data types, arrays, pointers
  • functions
  • classes, inheritance, polymorphism
  • templates
  • exceptions
  • runtime type information

The standard library in C++ provides a number of mostly generic templates (classes and functions) and is called the Standard Template Library - STL.

The basic building blocks of the STL are containers, iterators and algorithms. Containers are standard computer science data structures - vectors, linked lists, priority queues, (hash) maps, sets, etc..

When using them, you will sometimes miss some helpfull member functions. E.g. there are no .sort, no .revert, no .search member function in the std::vector class. At first you'll find that really annoying. But like most of the things in C++, there is deeper meaning.

As said above, there is a collection of algorithms in the <algorithm> header. These standard algorithms like sort, replace, copy, for_each work on all these standard containers. Now why are they decoupled? The argument is simple and is called the NxM problem:

Given N containers and M algorithms, you have to maintain NxM implementations of these algorithms.

This is costly and error prone. The inventors of the STL felt the need to solve this problem and invented a glue, a unified interface that can expose some of the properties of containers so that generic algorithms can work on them. This glue is iterators.

An iterator is like a fancy reference/pointer to an element of a container. The weakest type of iterators supports the ++ increment operator and the * dereference operator. That means you can make the iterator point to the next element in the container and access what it's pointing to. The most powerful iterators expose a random access subscript operator []. Going on from this abstraction, you can implement algorithm template methods like std::copy that work equaly on raw C arrays and doubly linked lists. Yes, you can use the STL algorithms on raw C arrays.

Try this:

#include <algorithm>
#include <iostream>

int main(int, char**) {
  int foo[42];
  const size_t foo_size = sizeof(foo)/sizeof(foo[0]);
  int bar[foo_size];

  for(size_t i=0; i<foo_size; i++) {
    foo[i] = i;
  }

  std::copy(foo, foo+foo_size, bar);

  for(size_t i=0; i<foo_size; i++) {
    std::cout << bar[i] << " ";
  }

  return 0;
}

And next time, we'll take a look at the assembly :)

great musik, great people, love everywhere :)

bookmark: Der Fall Böse - Raus

As most of you may know, on my job I'm not coding one of those new fancy languages but old-school, hardcore, C++. Now some may scream: OMFG, C/C++ - it causes security bugs, it's verbose code, it's not cross platform, it's so 90's - your job must be terrible. But after more then a year, I can only say: I love it. In this and probably some of the next posts, I'll try to show some of the things I like.

Admitted, C++ is not for the faint-hearted. You have to be prepared to get engaged with the language and the machine. Once you got that, you will see that:

  • C++ is one of the most portable languages
  • C++ (in not-C) seldomly causes security bugs (no more then say Java).
  • You get rewarded with uncompared performance.

Of course you always have to choose the right tool for the job. If I had to make a bunch of half educated "IT specialists" code some random business application, I'd probably not choose C++ because they wouldn't have a clue why it crashed all the time (Java could be a good tool here). Or if I'd want to write a tiny tool to do some text processing, I'd just use python ;). But with clever coders and high performance requirements, C++ is still an excelent choice.

And the performance gains can be dramatic. During my diploma thesis (together with Daniel Käs), I did some heavy statics calculations on vast amounts of data. One of those was to calculate a distance matrix. That is you have a number of points (vectors) with N dimensions (talking about N>20000 here) and want to know the distance of each point to every other point. That's a distance matrix. Now our distance function was the cosine similariy measure which is a little more demanding, but not much.

The first implementation was pure python. It took ages. We were effectively unable to repeat the calculations more then once a day. Then I just ported the distance function to C (using Cython). That already improved performance by a factor of 2-3. But the real performance gain came, when I also ported the inner for-loop into C (which, from a python point of view reduced complexity from O(NxN) to O(N) and from a C point of view increased complexity from O(1) to O(N)). That was another speedup of like x10.

Now that was C, but C++ can be equally fast. I just recently discovered what a good optimizer can do from some template code and carefull inlining. Let's take a look at the "workhorse class of the STL" (S.T.Lavavej): std::vector. Some even more old-school and hardcore C coders worry that using vectors instead of plain old arrays is considerably slower. But this is not really true. A vector is a dynamically growable array-like structure. It has the subscript operator [] overloaded which means, you can even use it like an array. But it has comfortable features like a .size() method and a .push_back(...) method. Let's say we have a class Foo and make a vector of Foos and assign it some values:

#include <vector>;

class Foo {
public:
  int get_bar() const { return bar; }
  Foo& operator=(int val) { bar = val; return *this; }
private:
  int bar;
};

int main(int, char**) {
  std::vector<Foo> foos;
  foos.resize(100);

  for(std::vector<Foo>::size_type i=0; i&lt;foos.size(); i++) {
    foos[i] = i;
  }

  return 0;
}

std::vector<Foo>::size_type is the correct type for indexing foo, you could also say size_t here. Assigning i to foos[i] works, because the assignment operator = in Foo is overloaded.

Is that assignment loop slower than a plain C loop over an array of ints? Let's find out and ask ourselfes: What would gcc do? At first it needs to know the definition of the subscript operator [] on the vector template. The following implementation is from libstdc++ 4.4

reference operator[](size_type __n) {
    return *(this->_M_impl._M_start + __n);
}

Modern compilers like gcc decide for themselfes when it's a good idea to inline code. Inlining is good when the inlined function is only a few lines of code. The subscript operator certainly is a good candidate for inlining. So let's do that, and see what our loop becomes:

for(std::vector<Foo>::size_type i=0; i<foos.size(); i++) {
    *(foos._M_impl._M_start + i) = i;
}

Looks like some good old pointer math. But that = operator is still a method. It's also small and certainly will be inlined, too. Let's also inline that one (for brevity, only the asignment statement, the loop stays untouched):

(*(foos._M_impl._M_start + i)).bar = i;

The next step is to resolve the . member access operators. What they do is nothing more than adding the offset of the member to the base address (this-pointer) of the object. Now a good compiler has the 0-offset optimization, that is: there is no additional data between the this pointer and the first class member (unless there is a virtual function member in which case the compiler has to add a vtable). Our Foo class only has the .bar member which is it's first member. So the offset to the this pointer is 0, so it can be eliminated.

*((foos._M_impl._M_start + i)+0) = i;

which is obviously equal to:

*(foos._M_impl._M_start + i) = i;

Now we have to do the same with those strange _M_impl and _M_start members in the STL vector implementation. _M_impl is an intentionally uglyfied name for the internal proxy object that manages vector memory. And it just happens to be the case, that _M_impl is the first data member in vector. It's of type _Vector_impl which is a nested class in _Vector_base. _M_start is a raw pointer (Foo*) to the memory block inside the _Vector_impl and also happens to be the first data member in _Vector_impl. Our optimization can go on and eliminate _M_impl and _M_start:

*((int*)(&foos + 0 + 0) + i) = i;

which is equal to:

((int*)&foos)[i] = i;

This looks absolutely equal to flat memory access on an int[] C array. And for the non-believers, we'll now have a look at the emitted assembler code. First the C version (compiled with gcc version 4.4.5, on x86_64 with -O2).

int foos[100];
int i;
for(i=0; i<sizeof(foos)/sizeof(int); i++) {
  foos[i] = i;
}

which compiles and disassembles to:

#0000000000000830 <main>:
# ... snip ... (function prologue)
# i=0 (eax is our i)
xor   %eax,%eax
# foos start address (rsp) -> rbp, rbx and rdx
mov   %rsp,%rbp
mov   %rsp,%rbx
mov   %rsp,%rdx
# some NOP / prefetch instruction
nopl   (%rax)
# foos[i] = i
mov    %eax,(%rdx) #loop begin / 0x858
# i++
add    $0x1,%eax
add    $0x4,%rdx
# test i<100 ?
cmp    $0x64,%eax    
# goto loop-start
jne    858 <main+0x28> #loop begin

Pretty clever gcc, huh? ;) It notices, that we're accessing foos with the loop counter. It then allocates a register (rdx) and increments this register by 4 (the size of an int) in each loop round. This way it doesn't actually have to add i to the start address of foos. Well let's see what it can do with the C++ code from above.

#0000000000000a10 <main>:
# ... snip ...
# 100 -> edx
mov   $0x64,%edx
# ... snip ...
#foos.size() inlined -> rdx
mov   0x10(%rsp),%rdi
mov   0x18(%rsp),%rdx
sub   %rdi,%rdx
sar   $0x2,%rdx
#initial test on size() 
test  %rdx,%rdx
je    a84 <main+0x74>  #a84 is behind loop
# i = 0
xor   %eax,%eax
# some NOP / prefetch instruction
nopl  (%rax)
# foos[i] = i
mov   %eax,(%rdi,%rax,4) #loop begin / 0xa78
# i++
add   $0x1,%rax
# test i<100 ?
cmp   %rdx,%rax
# goto loop begin
jb    a78 <main+0x68> #loop begin

Quite different, I have to admit. But it's just logical. In the C version, gcc can assume that the size to compare i with is allways greater 0, so it can skip the initial check. In C++ it cannot do this, because .size() could return a negative number (see a6e). But what we can see here is another optimization. .size() normally is a method. But it's also getting inlined (no call, just memory access). And not only that, it's also moved out of the loop and the result stored in register rdx (see a5d). The only other major difference is the operation used to move the value of i into the memory. The C++ version uses a relative addressing mode where the CPU has to do additional internal calculations. This might be a bit slower the the C version.

The C version might be faster, but it's not a whole order of magnitude or something like that. It's just a matter of loop invariant analysis and choosing the one or the other opcode.

But don't forget what you get for that tiny little bit of performance impact: You're getting all the benefits of a fully functional dynamic-size storage. In debug mode, you're usually even getting bounds checking. You're getting exception-safety (that is at least the weak property: no resource leaks). And in release mode, you have that nice and clean near-C speed.

C++ is a type safe, high level and abstract language and still let's you write code that is amazingly fast. And this is only one of the reasons, why C++ is worth beeing loved :)

A new star is born.

Well ... stars ... there are many and most of them are not that bright either. But I was looking for something new to create a website with. That old horrible joomla/php thing broke apart after some php update (I don't remember which one) and was way to much for me either.

So I was looking for something fresh. As I already just uploaded static snapshots of the old php cms (via httrack and some sed+curl scripts), a static site compiler seemed like a very appealing idea to me.

A static site compiler does something really simple: It takes a number of input and configuration files, runs them through it's processing machinery and spits out a collection of HTML and auxiliary files (aka: a website ... you know like websites from the 90's). You can then simply upload and statically serve that site (or even publish it with your favourite VCS).

No PHP. No database. No security bugs. Simple, fast, static fileserving. Such a site might even survive that slashdot link on your average shared web host.

I tried a few existing ones but none of them was what I needed. When I have to install a dozen additional libraries (for which my distribution doesn't even have packages yet), before I get a tool up and running, there is something wrong. And there is also something wrong, when it pulls in some huge template engine for which I first have to learn a new language before I can get started.

So none of the tools I tried had all of the following properties:

  • simple to use
  • simple to install
  • few dependencies
  • supports blogging, rss feeds and simple pages
  • low maintenance

I had an itch.

Now in a few weekends I hacked together a few hundred sloc of python sriptness. It's far from done, it's completely undocumented, it's not (yet) unittested. But it only needs a python runtime and it can already put together this blog :)

Stay tuned, there's more to come.