<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
	<title>std::blog&#x003c;cleeus&#x003e;</title>
	<link>http://www.cleeus.de/w/</link>
	<description>My blog about stuff.</description>
	
<item>
	<title>Where is the mobile IBM-PC?</title>
	<link>http://www.cleeus.de/w/blog/2012/02/18/where_is_the_mobile_ibm_pc.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Sat, 18 Feb 2012 22:42:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2012/02/18/where_is_the_mobile_ibm_pc.html</guid>
	<description>
	<![CDATA[
<p>This post became a rather longish braindump of my view on the
current situation in the mobile computing devices market.
It's based on what I read everyday in the news and some
basic economic terms thrown in. Most of the ideas here
are not new. You might have read them somewhere else.
In fact I have read most of them somewhere in someone else's braindump
but can't find it again for linking.</p>
<p>During the students talks at the Information Rules 1 conference,
it became clear to me that the mobile
devices market with all it's highly integrated smartphones
and tablets looks a lot like
the mainframe market of the 70's and 80's.</p>
<p>Back then large companies were busy cranking out features
on proprietary platforms with proprietary operating systems,
locking users deeply in
and taking money out of them. This was possible because
they controlled the whole platform stack:
the hardware, the operating system, the developer tools
and a lot of userland software.</p>
<p>This obviously is exactly what Apple is doing (and others are trying to do) today.
They tightly control the hardware, the operating
system, the developer tools, a lot of userland software and even the distribution
channels for 3rd party software.
And they do a good job of monetizing that control.</p>
<p>In 1981, IBM did something that was going to change the whole computing industry
(and kill their own mainframe business, which, apparently IBM didn't foresee).
They released the IBM-PC and opened the spec so that anyone could build clones.
Although IBM didn't fully open it (they left out the BIOS code, expecting
that none would be able to clone that), many vendors started to build clones.
In a short timespan
<a href="http://en.wikipedia.org/wiki/Influence_of_the_IBM_PC_on_the_personal_computer_market">most PCs built were "IBM-PC compatible"</a>
.</p>
<p>What followed was a time of incompatibilities, bugs and attempts to restore lock in to
specific operating system stacks on top of the x86/IBM-PC platform.
But it was also a time of enormous growth. 
A huge number of hardware vendors popped up and improved the platform,
all being held together by the IBM spec and mostly Intel which
moved the internal interfaces (PCI, ATA, AGP, PCIe, SATA) forward.
It wasn't all beautiful. Microsoft took over a big market share in operating systems.
But this can still be considered an improvement as Microsoft didn't control the hardware.</p>
<p>Today only very few companies are still locked in to the
old mainframe machines (a few days ago, 
<a href="http://blogs.nasa.gov/cm/blog/NASA-CIO-Blog/posts/post_1329017818806.html">NASA announced</a>
that they finally got rid of their last mainframe, an IBM Z9).
Many are free and use software that at least with a bit of work could run on any
operating system on any hardware. And even better: We have 
a whole bunch of free and open source operating systems which run
on a multitude of hardware and pretty much can keep up with the
commercial competitors.</p>
<p>Now what's happening on the mobile devices market?
Apple has taken a good share with its closed iOS platform
and makes the mainframe companies of the past look like hippies.
Microsoft is just now really entering the game with its Windows Phone platform
and strong partnership (if not to call it an acquisition) of Nokia.
For me this looks a lot like control over hardware and operating system, too
(although Microsoft still licenses Windows Phone to other hardware vendors).
Blackberry covers another segment of the market with a closed hardware
and operating system stack which they keep for themselves.</p>
<p>This could really have easily become a costly disaster with only one
huge monopolist left in the end.
But then there also is Google and Android.</p>
<p>With Google developing, pushing and licensing the Android operating
system under mostly open source terms (without strong copyleft),
diversity is coming back into the game:
On the Android platform, the control over the operating system is
very much decoupled from the hardware. It's decoupled so much that
most of the times you can put a different flavor of Android
on your device.</p>
<p>Now Google recently has acquired Motorola Mobility. This looks a lot
like the vertical integration strategy of Apple, Microsoft+Nokia and
Blackberry. But really I think it's not (and I am
<a href="http://www.googleopoly.net/googleopoly_2.pdf">not the first to recognize</a>
).
Google is an advertisement platform for the open internet.
Any company which creates a closed network/platform where Google
cannot enter (distribute it's advertisement) is a threat to it.
The 
<a href="https://trigger.io/cross-platform-application-development-blog/2012/01/24/youve-probably-underestimated-just-how-big-this-is/">massive momentum</a>
in the closed mobile platforms is such a threat
(as facebook is one, too).</p>
<p>With the Android platform, Google provides a free
operating system for device vendors
(for them it's just free as in beer)
which lowers the marginal costs (per-device)
for creating mobile devices.
This puts price pressure on all closed platform mobile device
vendors and creates a huge number of devices which are not so
closed and more open for Google's advertisement.
Essentially, Google has to prevent any closed platform to get a monopoly
or a substantially large share of the mobile device market in order not
to loose its advertisement market there.
Yes, Google could try to be the monopolist itself.
But by giving away Android, Google triggers multiple positive feedback
cycles (more devices - more apps - more buyers - more devices - ...).
Closing future versions of Android would slow down these feedback loops
and thus is not in the interest of Google.
Acquiring Motorola has probably more to do with it's patent pool.
Microsoft and Apple are known to sue or threaten to sue Android
device vendors with their own patent pools and at least Microsoft is
<a href="http://googleblog.blogspot.com/2011/08/when-patents-attack-android.html">known</a>
<a href="http://www.infoworld.com/t/android/barnes-noble-takes-microsofts-android-patents-179648">to</a>
<a href="http://arstechnica.com/microsoft/news/2011/11/barnes-noble-microsoft-using-patents-to-cripple-android-competition.ars">receive</a>
patent licensing fees for Android devices.
By acquiring the Motorola patent pool, Google can use these
patents to lower or eliminate these license fees and further
push the Android platform.</p>
<p>Google's motivation to prevent a monopoly is good for everyone.
They make the closed devices a little more open.
But this is not yet the freedom we need and mostly have on the PC market.
You cannot switch Android for Windows Phone or iOS on
any mobile device. At least I have not heard of any device where
this is possible.</p>
<p>The reasons is that there was no IBM-PC moment yet
for the mobile devices market. You simply cannot swap out the display,
the microprocessor, the wireless connectivity components, ... .
They all are so tightly integrated, beautifully connected and interwoven
with the operating system and userland software that it's hard for an
operating system to run on anything other than it was designed for.</p>
<p>Of course this is not only a business problem but mainly an engineering
problem. Components are not yet small enough to allow the
specification of flexible and capable physical interfaces which
could be used to build component based mobile devices as beautiful and usable
as the highly integrated devices of today are.</p>
<p>I don't believe there is much more room for innovation on the
smartphone and tablet markets in terms of form factors.
A smartphone has to fit into a human hand. A tablet has to fit
into two human hands. Both are rectangles, because every digital
display is a rectangle. And that's about it.
Of course there may be other form factors (implants, wearable, ...)
but those will be a totally different thing.</p>
<p>Now what needs to happen for an IBM-PC moment on
mobile devices are three things:</p>
<ol>
<li>Hardware components have to become a lot smaller.</li>
<li>One big player has to make a good spec.</li>
<li>A critical mass of standardized devices and components have to be built and sold.</li>
</ol>
<p>I sincerely hope that this will happen some day.</p>
<p>The good thing is, it's not that unlikely.
Once the hardware components will be small enough, device
design will not be dominated by engineering problems
and solutions anymore. Like a PC case, the phone
or tablet could become a case for components. Then everything
that is needed will be the spec for developing components
and plugging them together.
Oh yes, and someone has to start building those machines
and components.
As with the IBM-PC and Intel, the CPU designer
(or a vendor of some other important component which is
protected by strong IP rights)
could then play the role of the spec designer.
When the moment comes, I think we can expect one
of the mobile CPU designers (ARM, MIPS, ...) to
take that road (because they would get a monopoly,
and which company would miss that opportunity).</p>
<p>The bad thing is: It might take some time (or never happen at all).
The engineering problems currently seem far away from being
easily solvable.</p>
<p>When there is sufficient supply of vertically integrated but differentiated
mobile devices (different models with different features),
and at the same time the costs of customizing are high then
of course market demand for customizable devices will be low.
The automobile market works like that. You would never buy some
components and plug a car together. Regulatory and technological
problems increase customization costs while at the same time
the market has differentiated the products enough so that
everyone will find a car that fits his needs.</p>
<p>The IBM-PC moment might indeed never repeat, because the IBM-PC
was not only an open design, but also a leapfrog at the same time.
Unlike most competitors who used 8bit processors, IBM used a 16bit
chip in the IBM-PC that was able to deliver considerable more
performance. So the advantage of buying "IBM-PC" or
"IBM-PC compatible" was not only a minor improvement in a feature
here or there, but a big killer feature for the customer.</p>
<p>To repeat that on the mobile market, a similar killer feature
may be needed to gain critical mass quickly and get other
vendors to adopt. In the prospect of the current engineering
problems, sadly, steady evolution towards a automobile-like
differentiated but vertically integrated market seems more plausible.</p>
<p>As seen on the PC market, a highly vertically disintegrated market
can be very innovative. Open standards are the key as competition no longer
has to be about who is setting the standards but about features inside
the standard.
But finding a player who at the same time is capable of and willing to
put a killer feature in use to push an open standard will not be easy.
And back then, the IBM-PC moment wasn't really intentional.
I believe that had IBM known the consequences,
they would not have opened their platform for anyone to clone.</p>
<h2>tldr</h2>
<p>Mobile devices (smartphones, tablets) are highly integrated products.
Vendors control the hardware, the operating system, the developer
tools and a lot of userland code. This is similar to the
situation on the mainframe market of the 70's and 80's.
Back then, the open IBM-PC hardware and software platform broke through
the strong lock in effects on those markets.
Google's Android is a step into the right direction but not enough.
An IBM-PC moment is needed for mobile devices to become free.</p>
<h2>disclaimer</h2>
<p>It's easy to miss a critical piece when putting these chains of arguments
together. If you think I have made a mistake, just drop me a note.</p>
	]]>
	</description>
</item>


<item>
	<title>silence and joy</title>
	<link>http://www.cleeus.de/w/blog/2012/01/24/silence_and_joy.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Tue, 24 Jan 2012 20:50:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2012/01/24/silence_and_joy.html</guid>
	<description>
	<![CDATA[
<p>This is a good one on TED:</p>
<p><a href="http://feedproxy.google.com/~r/TEDTalks_video/~5/5O5tm-8aROc/AJJacobs_2011P.mp4">A.J. Jacobs: How healthy living nearly killed me.</a></p>
	]]>
	</description>
</item>


<item>
	<title>brainfood</title>
	<link>http://www.cleeus.de/w/blog/2011/11/23/brainfood.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Wed, 23 Nov 2011 07:50:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/11/23/brainfood.html</guid>
	<description>
	<![CDATA[
<p>Food for your brain:
<a href="http://news.ycombinator.com/">news.ycombinator.com</a></p>
<p>It's better than heise.de .</p>
	]]>
	</description>
</item>


<item>
	<title>Best coffee beans in town</title>
	<link>http://www.cleeus.de/w/blog/2011/11/14/best_coffee_in_town.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Mon, 14 Nov 2011 08:00:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/11/14/best_coffee_in_town.html</guid>
	<description>
	<![CDATA[
<p>Tried a lot of different beans here in Berlin.
These are the best: "African Queen" by <a href="http://www.impala-coffee.com/">Impala Coffee</a></p>
<p>A house blend. Dark roasted, with a hint of caramel.</p>
<p>Get it fresh.</p>
	]]>
	</description>
</item>


<item>
	<title>moc: music on the console</title>
	<link>http://www.cleeus.de/w/blog/2011/11/11/moc_music_on_console.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Fri, 11 Nov 2011 17:03:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/11/11/moc_music_on_console.html</guid>
	<description>
	<![CDATA[
<p>When I listen to music, I mostly use my atom-board home server as a player
and of course I want to control it from remote.
But I don't like those complicated networked sound architectures (mpd, pulseaudio, jack).
I just want to ssh on the box, put some music on the playlist and disconnect.</p>
<p>To do that, you obviously need a player that works from the command line.
So far I used <a href="http://herrie.info/">herrie</a>. It has done a great job and I still
like its simple interface.</p>
<p>But recently I wanted to upgrade from the onboard Intel HD audio to a solution
with optical out. As I will need the single PCI slot for a gigabit ethernet card,
I bought a USB soundcard, a "Creative SoundBlaster X-Fi 5.1 Sourround Pro USB".
It works pretty much out of the box, you just need to build snd-usb-audio module
into the kernel (3.0.4 that is).
Sadly it's a pretty stupid device: It has no hardware mixer.
So alsamixer won't give me a volume control bar, unless I set one up through
a complicated asound.conf soft mixer. But I don't really need that anyway.</p>
<p>So, back on topic, the new USB soundcard now is connected, but for some reason
herrie doesn't work well with it. Sound is horrible and stuttering.
Maybe I'll file a bug...</p>
<p>So it was time to look for other players. Gentoo has
<a href="http://packages.gentoo.org/package/media-sound/moc">media-sound/moc</a>.
Turns out it was a good idea to try something new. First of all, moc works
nicely with the USB soundcard. It might have something to do with the architecture.
<a href="http://moc.daper.net/">Moc</a> is multithreaded and has a seperate thread for the playback and for the user interface.</p>
<p>This is how <a href="http://moc.daper.net/">moc</a> looks on my console:
<img alt="moc music on console audio player screenshot" src="../media/postdata/moc-screenshot.png" /></p>
<p>Besides beeing a functional console music player, moc also has some cool features:</p>
<ul>
<li>can run in the background (without screen)</li>
<li>is themable</li>
<li>can seek pretty fast</li>
<li>has it's own soft mixer for volume</li>
<li>supports a good number of audio formats</li>
</ul>
<p>So if you are looking for a good console audio player, you can give it a try.</p>
	]]>
	</description>
</item>


<item>
	<title>What failing RAM looks like in the debugger.</title>
	<link>http://www.cleeus.de/w/blog/2011/11/11/failing_ram_in_the_debugger.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Fri, 11 Nov 2011 07:00:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/11/11/failing_ram_in_the_debugger.html</guid>
	<description>
	<![CDATA[
<p>Most native code / C++ software vendors nowadays have implemented coredumps
for their software. "coredump" is a term from computer history, when
main memory was <a href="http://en.wikipedia.org/wiki/Magnetic-core_memory">magnetic core memory</a>.
A coredump is the frozen and stored state of a process at the time of a CPU exception (aka crash).
It can be loaded in the debugger and analyzed (given you have the binary and the debugging symbols).
On Linux it must be enabled in the kernel and by the system administrator via rlimit.
On Windows, developers can use structured exception handling to catch a deadly
exception and execute some little piece of code which then calls a Windows API function
which creates a coredump.</p>
<p>When a crash in a production environment should happen, you then get a coredump stored on disk
and can receive that from you customer. While crashes should not happen at all, when they happen,
coredumps are extremely helpful.</p>
<p>Now recently I got a very special coredump on my desk.
Here is what happened (Intel assembly this time, as it's on Windows).</p>
<p>The exception handler has catched an exception:
"Unhandled exception at 0x57dff720 in SomeBinary.exe.dmp: 0xC0000005: Access violation."</p>
<p>Hmm, not much to see here, but the debugger put's us at the position of the crash:</p>
<div class="codehilite"><pre>.
.
.
004D4AFF  mov         edx,dword ptr [ecx]
004D4B01  push        eax
004D4B02  mov         eax,dword ptr [edx]
004D4B04  call        eax
-- baaammmm, crash here --
.
.
.
</pre></div>


<p>With registers:</p>
<div class="codehilite"><pre>EAX = 57DFF720 //indeed, thats what happened
ECX = 7843BD48
EDX = 7891F8D0
EIP = 004D4B06 //crash here
</pre></div>


<p>Hmm, what could that be? When something like that happens during development, you most
likely have used an old invalid pointer or committed a crime of pointer juggling.
When it happens during production, it's probably something terrible like
a heap corruption or a race condition which you didn't find during development.
The fact that there were something like a hundred
other threads open in this coredump, didn't make matters betters.
I feared for the worst.
So the first thing I did was to find the bigger picture of where exactly I am in the code.
The problem with production assembly code is, that it's optimized and has little to do
with the source code anymore. Especially C++ templates and inlining make reading assembly
hard, even if you have the sources.</p>
<p>From the debugger line it was clear that this is in the return section of a method call
when the compiler starts calling destructors. Something like this:</p>
<div class="codehilite"><pre><span class="kt">void</span> <span class="n">SomeClass</span><span class="o">::</span><span class="n">someMethod</span><span class="p">(...)</span> <span class="p">{</span>
  <span class="n">boost</span><span class="o">::</span><span class="n">intrusive_ptr</span><span class="o">&lt;</span><span class="n">SomeBaseClass</span><span class="o">&gt;</span> <span class="n">foo</span><span class="p">(</span><span class="k">new</span> <span class="n">SomeContreteClass</span><span class="p">());</span>
  <span class="p">.</span>
  <span class="p">.</span>
  <span class="p">.</span>
  <span class="c1">// -- crash here, right before the return --</span>
  <span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>


<p>A little more assembler context made it clear, that the final <code>call</code> instruction
should have been the invocation of the virtual destructor from the SomeConcreteClass
object in the heap.
<code>foo</code> is an intrusively reference counted pointer on the dynamically allocated object.
Intrusively counting the reference means, that the object itself contains
the counter and carries it, even when it is degraded to just a raw pointer.
Because SomeConcreteClass was derived from SomeBaseClass and polymorphic, the
compiler needs to find out the correct destructor uppon destruction of
<code>foo</code>. With that knowledge I could start looking at the heap.
ECX seemed to contain the address of the object in the heap. Now I had a look
at the memory:</p>
<div class="codehilite"><pre><span class="nv">@ECX:</span>
<span class="mh">0x7843BD48</span> <span class="o">|</span> <span class="mo">00</span><span class="mi">82</span><span class="n">cf54</span> <span class="mo">00000001</span> <span class="o">...</span>
</pre></div>


<p>That looks like a vtable pointer followed by the refcount which was down to 1.
Maybe the vtable pointer was bad? So I followed the vtable pointer.</p>
<div class="codehilite"><pre><span class="mh">0x0082CF54</span> <span class="o">|</span> <span class="mo">004355</span><span class="n">b0</span> <span class="mo">0042</span><span class="n">c5d0</span> <span class="mo">0042</span><span class="n">c5e0</span> <span class="o">...</span>
</pre></div>


<p>Well, this doesn't help much, but the disassembly looked sane and the debugger had a symbol
for the address:</p>
<div class="codehilite"><pre>SomeConcreteClass::`vftable&#39;:
0082CF54  mov         al,55h //whatever, I don&#39;t care 
0082CF56  inc         ebx    //but it looks OK
0082CF57  add         al,dl
</pre></div>


<p>So why could that fail?</p>
<p>Just look what happend. The first <code>mov</code> instruction at <code>0x004D4AFF</code> is supposed
to load the address of the vtable into EDX which should be <code>0x0082cf54</code>.
But EDX has a totally different value (<code>0x7891F8D0</code>).</p>
<p>And that's the end of the story: the CPU must have failed on us.
I repeat: The CPU has failed.</p>
<p>Either through broken RAM (most likely) or something like a
<a href="http://en.wikipedia.org/wiki/AMD_Phenom">TLB</a>
<a href="http://www.theinquirer.net/inquirer/news/1049427/intel-denies-core-i7-tlb-bug">bug</a>.
I just loved finding that one ... and telling the customer to
fix his hardware :)</p>
	]]>
	</description>
</item>


<item>
	<title>PKI bug of the day</title>
	<link>http://www.cleeus.de/w/blog/2011/11/03/pki_bug_of_the_day.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Thu, 03 Nov 2011 17:45:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/11/03/pki_bug_of_the_day.html</guid>
	<description>
	<![CDATA[
<p>You know, those web browsers with ssl support ... they have a reason why they basically
ignore any OCSP/CRL errors.</p>
<p>Today I stumbled uppon a root certificate of a not-so-small international CA.
Opinions differ, but generally CA roots should also be revokable and thus have an OCSP
responder configured. Said certificate is one of the better roots and thus
has a responder <code>https://ocsp.some-ca.com</code> set.</p>
<p>Now when you contact this responder something unexpected happens:</p>
<div class="codehilite"><pre>~ $ openssl s_client -host ocsp.some-ca.com -port 443
CONNECTED(00000003)
3125948630696:error:140770FC:SSL routines:\
  SSL23_GET_SERVER_HELLO:\
  unknown protocol:s23_clnt.c:683:

...
</pre></div>


<p>hmm ... something went wrong.
<code>tcpdump</code> + <code>wireshark</code> shed some light on the problem: SYN, SYN-ACK, ACK ... tcp up, that's good ... now we send the SSL Client Hello.
What we didn't expect was the reply from the server. A blatant (unencrypted):</p>
<div class="codehilite"><pre>HTTP/1.1 400 Bad Request
Server: Apache-Coyote/1.1
Transfer-Encoding: chunked
Date: Thu, 03 Nov 2011 17:45:54 GMT
Connection: close
</pre></div>


<p>Indeed, openssl is right: that doesn't look like an SSL handshake. It looks more like a
<a href="http://www.youtube.com/watch?v=IhJQp-q1Y1s">slap with a large trout</a>.</p>
	]]>
	</description>
</item>


<item>
	<title>Zeichne mit: Vorratsdatenspeicherung verbieten.</title>
	<link>http://www.cleeus.de/w/blog/2011/09/13/epetition_gegen_vorratsdatenspeicherung.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Tue, 13 Sep 2011 17:54:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/09/13/epetition_gegen_vorratsdatenspeicherung.html</guid>
	<description>
	<![CDATA[
<p>Die Bundesregierung will mal wieder Vorratsdatenspeicherung, wir wollen es nicht.
Also gibt es eine Petition <a href="http://zeichnemit.de/">die man mitzeichnen kann</a>.
Wir müssen also mal wieder alle unser Login rauskramen und mitmachen.</p>
<p>Argumente dafür:</p>
<ul>
<li>Sie wissen eh schon auf welcher Seite du stehst.</li>
<li>Es ihnen nochmal zu sagen, kann schlimmstenfalls dazu führen, dass die Botschaft diesmal ankommt.</li>
<li>Ja, Petitionen bewirken formell eh nichts. Die Wirkung kann aber doch informell in den Köpfen der Abgeordneten eintreten.</li>
</ul>
<p>Also nochmal: Login rauskramen, <a href="http://zeichnemit.de/">mitzeichnen</a>.</p>
	]]>
	</description>
</item>


<item>
	<title>reblog: Optimizing For The Instruction Cache</title>
	<link>http://www.cleeus.de/w/blog/2011/08/26/optimizing_for_instruction_cache.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Fri, 26 Aug 2011 16:19:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/08/26/optimizing_for_instruction_cache.html</guid>
	<description>
	<![CDATA[
<p>Charilaos Kalogirou on <a href="http://altdevblogaday.com/">#AltDevBlogADay</a>:
<a href="http://altdevblogaday.com/2011/08/22/optimizing-for-the-instruction-cache/">"Optimizing for the instruction cache"</a></p>
<p>The basic advice is: When you write an algorithm that iterates over data and your code branches at some point
depending on the values of your current data point (say by a virtual member function call), then the CPU gets frequent cache misses
on the instruction cache. If you organize your data (e.g., by sorting or keeping different lists), you can spare your
CPU these misses.</p>
	]]>
	</description>
</item>


<item>
	<title>The Mechanics of Improvement</title>
	<link>http://www.cleeus.de/w/blog/2011/08/11/mechanics_of_improvement.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Thu, 11 Aug 2011 06:03:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/08/11/mechanics_of_improvement.html</guid>
	<description>
	<![CDATA[
<p>Yesterday I had an epiphany (a small one). It happened when I installed
a Nagios server and started to set up hosts and checks. Nagios is
a network monitoring tool that consists of a nagios server that runs checks
on and collects information about a number of other hosts. As an admin you can
get alerts and notifications about events or just view the green/yellow/red
status of these checks.</p>
<p>The elegance in Nagios is, that checks are very simple and isolated.
They just shell commands that return a status (green/yellow/red)
and a message. That's it. There are a number of predefined checks and you can
easily add new ones.</p>
<p>Now, when you start monitoring an existing network, you don't have checks for every
service there is. But you can add them. One after another. Every check you add,
adds value to the monitoring system. And eventually you will have checks for all the vital parts
in the network.</p>
<p>It's like Unit Testing for admins.</p>
<p>When you have a legacy code base that is not unit tested and start adding tests,
you have 0% code coverage. But every test you add, adds value to the test suite.</p>
<p>I think there is a more general lesson to be learned from these two examples.
In any existing system, there are two strategies for improvement:</p>
<ol>
<li>Revolution</li>
<li>Evolution</li>
</ol>
<p>Revolution means, making a big investment and creating something new to replace the old.
Think of an old house that has become to costly to repair. You have to tear it down and
build a new one. Thats a big investment and an associated risk that the house might not
stay on budget and make you go bankrupt. That's why revolution most of the times is not
feasible.</p>
<p>Evolution means small repeated investments to replace or enhance the old with something new.
Speaking in terms of house-building, it would mean to replace parts of the house, maybe room
by room or wall by wall with something new. Of course that's hard if your architecture
does not support incremental change (buildings don't, most of the time).
Supporting incremental change is key to improvement
in situations where the revolution approach is infeasible.
To support incremental change means:</p>
<ol>
<li>The returns of added increments must be immediate.</li>
<li>The costs of added increments must stay constant.</li>
<li>Returns must not diminish.</li>
</ol>
<p>All three conditions hold true for Unit Testing and Nagios. I'll explain a bit more about the
three rules:</p>
<h2>1. Instant Benefits</h2>
<p>You have to start with something small and even that has to be of value to you. Building
a new house is the opposite: It is of no use only until it is completely done.
When introducing a Unit Testing framework to an existing codebase, you have low fixed
costs (adding the framework) and get immediate results from your first test on.</p>
<h2>2. Constant Costs</h2>
<p>According to the second law of thermodynamics, entropy in any closed system will only grow.
This seems to be true for complex systems, too. When increments are added, most of the times
a bit of coherence and order is lost. Adding another increment will be more costly.</p>
<p>In software development you can observe that directly:
It all starts out with a clean design. Then someone has to add something unexpected
and violates the design, a little bit. The next time someone wants to add something,
he will have a harder time. Costs didn't stay constant but grew with the number of
increments.</p>
<p>In Unit Testing, the architecture of isolated test does it's best to not let this happen.
It tries to keep entropy of the system low. Nagios checks work the same way.
They are isolated and consistently organized. Adding another check is not influenced
by existing checks.</p>
<h2>3. No Diminishing Returns</h2>
<p><a href="http://en.wikipedia.org/wiki/Diminishing_returns">Diminishing returns</a> is a term from economics.
It means that returns will scale lower than linearly with investment.
This must not be the case when you want a system to work. Adding an increment must result
in an outcome that is proportional or even higher than proportional to the investment.</p>
<p>Take Unit Testing. Every test you add, results in code coverage (the return) which is
more or less proportional to the investment (there might be a point where adding more tests
doesn't result in more code coverage, but that's when you have a fully tested system and
just don't need to add more tests). Again, Nagios checks behave the same way: Every check you add
results in a better ability to react to defects in the network (linear returns).</p>
<p>When returns start to diminsh, you probably need a new architecture to replace the old one.
It might be a completely different one, it might need to be more flexible, it might be designed
for completly different goals. Diminishing returns mean, that you have
pushed the system to it's limits. Don't go any further.</p>
<h2>Conclusion</h2>
<p>If you want to build systems that induce constant improvement, build them with these
three rules in mind.</p>
<p>Examples here might have been overly simplistic. There always are border conditions when
the mechanics stop working. That's probably when you need a new system (revolution).
But when you see something that doesn't have these mechanics in place,
you should either change it or leave it.</p>
	]]>
	</description>
</item>


<item>
	<title>On the bookshelf: "Clean Code" by Robert C. Martin</title>
	<link>http://www.cleeus.de/w/blog/2011/08/10/book_clean_code.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Wed, 10 Aug 2011 04:44:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/08/10/book_clean_code.html</guid>
	<description>
	<![CDATA[
<p>The bible for the modern professional programmer.
You will find a lot of reviews on the net, so I don't have to write another one.
It's about how to write good code, which seems to be still more an art than a science.
It should be taught in every CS education. But it isn't. Instead that's where
you will find some of the most horrible code anwhere.
I barely remember those tree and graph implementations
in a Java <code>int[]</code> array ... . Ah, and the variable names ... <code>p</code> <code>q</code> <code>t0</code> <code>za</code> <code>s</code> ... .
As if there is bytes to save ... well our professors learned programming on punchcards.</p>
<p>But, back on topic:
<a href="http://www.amazon.de/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882">"Clean Code: A Handbook of Agile Software Craftsmanship"</a>
teaches you, what university courses do wrong and how your code can
get better. Go read it. :)</p>
	]]>
	</description>
</item>


<item>
	<title>MOTM: bassdrive.com</title>
	<link>http://www.cleeus.de/w/blog/2011/08/09/bassdrive_com.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Tue, 09 Aug 2011 06:48:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/08/09/bassdrive_com.html</guid>
	<description>
	<![CDATA[
<p>music of the month:</p>
<p><a href="http://bassdrive.com/">bassdrive.com</a></p>
<p>drum'n'bass -- classic :)</p>
	]]>
	</description>
</item>


<item>
	<title>C++0x: New Features That Will Make a Difference.</title>
	<link>http://www.cleeus.de/w/blog/2011/08/08/cpp0x_new_features_that_matter.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Mon, 08 Aug 2011 04:40:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/08/08/cpp0x_new_features_that_matter.html</guid>
	<description>
	<![CDATA[
<p>Most programers already have heard of it: There will be a new C++ ISO standard.
It's in the making for quite some time now and compiler vendors already have implemented
a lot of the features. It's commonly refered to as C++0x as people expected it to appear
somewhere in 200x. Now it seems to become C++11, maybe C++12.</p>
<p>The changes are huge and looking at C++0x source code is like looking at a new language.
It has both, new core language features and useful additions to the STL.
In this post I'll introduce the feature of which I think that they will
change the way C++ is used the most.</p>
<h2>shared_ptr</h2>
<p><code>shared_ptr</code> is an addition to the STL. It's a template class <code>std::shared_ptr&lt;T&gt;</code> that implements
a reference counting pointer. Reference counting is a commonly known technique to C++ programmers and every
big codebase seems to have it's own implementation. They are also refered to as smart pointers.</p>
<p>The idea is simple: Instead of using raw pointers,
you pass the pointer into an object that overloads the <code>*</code> and <code>-&gt;</code> operators so you can use
the object like a pointer. But additionally the object has a reference counter attached. Every
time you make a copy of that pointer object, the reference counter is increased. Uppon destruction,
the reference counter is decreased. If the reference counter goes down to 0, <code>delete</code> is called.
Now the big advantage of having this in the STL means, that we can easily pass <code>shared_ptr</code> in
interfaces.</p>
<p>Constructing <code>shared_ptr</code> works like this:</p>
<div class="codehilite"><pre><span class="cp">#include &lt;string&gt;</span>
<span class="cp">#include &lt;iostream&gt;</span>
<span class="cp">#include &lt;memory&gt;</span>
<span class="k">using</span> <span class="k">namespace</span> <span class="n">std</span><span class="p">;</span>

<span class="kt">int</span> <span class="n">main</span><span class="p">(</span><span class="kt">int</span><span class="p">,</span> <span class="kt">char</span><span class="o">**</span><span class="p">)</span> <span class="p">{</span>
  <span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">string</span><span class="o">&gt;</span> <span class="n">hello</span> <span class="o">=</span> <span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">string</span><span class="o">&gt;</span><span class="p">(</span><span class="k">new</span> <span class="n">string</span><span class="p">(</span><span class="s">&quot;Hello World!&quot;</span><span class="p">));</span>
  <span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="o">*</span><span class="n">hello</span> <span class="o">&lt;&lt;</span> <span class="n">endl</span><span class="p">;</span>
  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>


<p>As you can see, there is no <code>delete</code>. Usually this was a memory leak, but <code>shared_ptr</code> will call
<code>delete</code> for us. Common <code>shared_ptr</code> implementations create an internal refcounter object on the
heap, so you will have 2 <code>new</code> calls. Using the <code>make_shared</code> template function can even
eliminate that and it's also shorter:</p>
<div class="codehilite"><pre><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">string</span><span class="o">&gt;</span> <span class="n">hello</span> <span class="o">=</span> <span class="n">make_shared</span><span class="o">&lt;</span><span class="n">string</span><span class="o">&gt;</span><span class="p">(</span><span class="s">&quot;Hello World!&quot;</span><span class="p">);</span>
</pre></div>


<p>You can pass around copies of <code>hello</code> as you like, the refcounter will make sure it stays alive
as long as needed. Using <code>shared_ptr</code> almost feels like garbage collection in C++. There
is one situation where reference counting fails: cycles. Think of a tree where parents know
their children and children know their parents. If you use <code>shared_ptr</code> for both, forward and backward
links, you have a strong cycle and the refcounters will never go down to 0.
That's what <code>weak_ptr</code> is for. You can make a <code>weak_ptr</code> from a <code>shared_ptr</code>. <code>weak_ptr</code> get's notified
when the object it points to is destructed, so you won't get dangling pointers.
To access an object behind a <code>weak_ptr</code>, you have to call <code>lock()</code> on it. This will return a <code>shared_ptr</code>.
My prediction is, that C++ libraries will soon start to use smart pointers extensively
and that will make C++ a better language.</p>
<p>If you want to learn more about smart pointers I can suggest the
<a href="http://blogs.msdn.com/b/vcblog/archive/2011/02/11/10128357.aspx">"Advanced STL" video lecture series by Stephan T. Lavavej on channel 9</a>
. In 
<a href="http://channel9.msdn.com/shows/Going+Deep/C9-Lectures-Stephan-T-Lavavej-Standard-Template-Library-STL-3-of-n/">part 3</a>
he discusses everything about smart pointers.</p>
<h2>auto</h2>
<p>The <code>auto</code> keyword is a new core language feature. It implements L-value type inference which is an
idea that most prominently is used in the Go language. In any assignment statement, there is an L-value
which is on the left side of the <code>=</code> operator and an R-value (on the right side) which is the result
of an evaluated expression.
The R-value already has a strict type. When the L-value is a definition of a new variable, most of the time
you want it to have the type of the R-value. And since your compiler already knows that type, you can now
just tell it to use it:</p>
<div class="codehilite"><pre><span class="k">auto</span> <span class="n">hello</span> <span class="o">=</span> <span class="n">make_shared</span><span class="o">&lt;</span><span class="n">string</span><span class="o">&gt;</span><span class="p">(</span><span class="s">&quot;Hello World!&quot;</span><span class="p">);</span>
</pre></div>


<p><code>hello</code> now has exactly the type <code>std::shared_ptr&lt;std::string&gt;</code>. This might not seem to be
that important, but it greatly improves readability and coding speed in C++.</p>
<h2>lambdas</h2>
<p>In one of my last posts, I explained why the STL algorithms are decoupled from the containers they work on.
Now some of these algorithms, most prominently <code>for_each</code>, require a function that will be used / applied uppon
the elements in the container. Modern dynamic languages like python long support functions as arguments, not to
mention functional languages.</p>
<p>C++03 did too, but the syntax was verbose. You could either pass a function pointer to a real function or implement
a callable function object (an object which has the <code>()</code> operator overloaded). Both were to complicated
and it was much easier to write a simple <code>for</code> loop then to use <code>for_each</code>. Now lambda expressions change that.
Take a look at this simple example:</p>
<div class="codehilite"><pre><span class="n">vector</span><span class="o">&lt;</span><span class="kt">int</span><span class="o">&gt;</span> <span class="n">a_list</span><span class="p">;</span>

<span class="n">for_each</span><span class="p">(</span><span class="n">a_list</span><span class="p">.</span><span class="n">begin</span><span class="p">();</span> <span class="n">a_list</span><span class="p">.</span><span class="n">end</span><span class="p">();</span>
  <span class="p">[](</span><span class="kt">int</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span> <span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="n">x</span> <span class="o">&lt;&lt;</span> <span class="n">endl</span><span class="p">;</span> <span class="p">}</span>
<span class="p">);</span>
</pre></div>


<p>There is a lot more to lambdas, which I haven't even fully explored myself. E.g. using notations like <code>[&amp;]</code>
or <code>[=]</code> you can access the variable in the lambdas scope. You can read some more in
the <a href="http://en.wikipedia.org/wiki/C%2B%2B0x#Lambda_functions_and_expressions">wikipedia section about C++0x lambdas</a>.
Lambdas will greatly increase the power and expressiveness of the language and will help improve the
functional style parts of C++.</p>
<h2>function / bind</h2>
<p>A <code>std::function</code> is a typed callable object. It is typed on it's inputs and outputs and  can be constructed from
a variety of constructs: raw function pointers, function objects, lambdas and binds.
Using <code>std::function</code> it is easy to setup callbacks / signaling mechanisms in a typesafe C++ manner.
The classic example is a button onClick event:</p>
<div class="codehilite"><pre><span class="k">class</span> <span class="nc">Button</span> <span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
  <span class="kt">void</span> <span class="n">setOnClickEvent</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o">&lt;</span><span class="kt">void</span> <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="o">&gt;</span> <span class="n">event</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">onClickEvent</span> <span class="o">=</span> <span class="n">event</span><span class="p">;</span>
  <span class="p">}</span>
<span class="k">private</span><span class="o">:</span>
  <span class="kt">void</span> <span class="n">performOnClick</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">if</span><span class="p">(</span><span class="n">onClickEvent</span><span class="p">)</span> <span class="p">{</span>
      <span class="n">onClickEvent</span><span class="p">();</span>
    <span class="p">}</span>
  <span class="p">}</span>
  <span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o">&lt;</span><span class="kt">void</span> <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="o">&gt;</span> <span class="n">onClickEvent</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>


<p>The <code>std::bind</code> is a first order function object -- a function that has some arguments saturated, while others are left to be filled.
It looks like this:</p>
<div class="codehilite"><pre><span class="kt">void</span> <span class="n">repeatedBeep</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">for</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o">&lt;</span><span class="n">x</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">//...</span>
  <span class="p">}</span>
<span class="p">}</span>

<span class="kt">int</span> <span class="n">main</span><span class="p">(</span><span class="kt">int</span><span class="p">,</span> <span class="kt">char</span><span class="o">**</span><span class="p">)</span> <span class="p">{</span>

  <span class="n">Button</span> <span class="n">button</span><span class="p">;</span>

  <span class="n">button</span><span class="p">.</span><span class="n">setOnClick</span><span class="p">(</span> <span class="n">std</span><span class="o">::</span><span class="n">bind</span><span class="p">(</span><span class="n">repeatedBeep</span><span class="p">,</span> <span class="mi">42</span><span class="p">)</span> <span class="p">);</span>

  <span class="c1">//... </span>
<span class="p">}</span>
</pre></div>


<p>Arguments in a <code>std::bind</code> can also be other function objects which then results in a higher order function object. Wheeee... functional
programming in C++ :)
Admitted, you can do lot's of stupid stuff with binds and functions (and lambdas), exspecially if your arguments go out of scope where
you have passed references. But as always: C++ assumes the intelligent programmer. ;)
I guess when C++0x is in all stable mainstream compilers, we will see a new generation of GUI libraries which will make good use of these
facilities.</p>
<p>There is a lot more to C++0x, but most of them are details which make a difference for the hard-core library implementer.
The features above are what I think will influence our every day code and architecture. And I don't know what you think,
but I'm pretty excited about these new features.</p>
	]]>
	</description>
</item>


<item>
	<title>Expose yourself to other languages.</title>
	<link>http://www.cleeus.de/w/blog/2011/08/07/expose_yourself_to_other_languages.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Sun, 07 Aug 2011 06:37:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/08/07/expose_yourself_to_other_languages.html</guid>
	<description>
	<![CDATA[
<p>A while ago I have found an interview with
<a href="http://en.wikipedia.org/wiki/Herb_Sutter">Herb Sutter</a>
(
<a href="http://channel9.msdn.com/Shows/Going+Deep/Conversation-with-Herb-Sutter-Perspectives-on-Modern-C0x11">this one on channel 9</a>
) where he was talking about the upcoming C++ standard (C++0x or C++11).
There was one sentence in that Interview, that didn't seem that much important, but I have found it very true
and it constantly comes back in my mind: It's important to get exposed to other languages. (at 18:00 in the video)</p>
<p>And it really is important: If you keep sticking your head deeper and deeper into one single language,
you tend to forget why you do things the way you do. But at the moment you switch languages, you have to start
thinking again. What's the right way to express what you want to do?
Doing that, makes you a better programmer (of course you should always do that in your native language, as well).</p>
<p>I just noticed this when I had to do some Java again. Java removes some features from C++, that allow you to do
stupid stuff and adds some features that encourage you to do good stuff. You can learn from Java that
maybe you shouldn't do everything you can do. Or take Python: Pythons standard libraries is once of the best
I have seen. Everything is where you intuitively expect it to be. Also, in Python there are simple free functions
that are not part of any class (but part of a package/module/namespace) and that happens to provide much of the
simplicity of pythons standard library. You can learn from Python how good libraries should
look like and that not everything is a class.</p>
<p>So, next time you have an itch and need to create something, choose a new language. Even if it slows you down, take
the time and your motivation to try out ... maybe ... Ruby, Go, Scala, C++0x or even Haskel. It will make you
a better programmer.</p>
	]]>
	</description>
</item>


<item>
	<title>Web Privacy with Chromium</title>
	<link>http://www.cleeus.de/w/blog/2011/07/31/social_web_privacy_on_chromium.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Sun, 31 Jul 2011 08:51:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/07/31/social_web_privacy_on_chromium.html</guid>
	<description>
	<![CDATA[
<p>Social networks are a good thing: People
can connect, communicate and share.
I think the Internet and within it, the social networks, will raise a whole
generation of young people who will have learned to apreciate
different cultures and people from around the world.</p>
<p>But most of these social networks, as much as they do good,
they also violate your privacy through a very clever economic
and technical measure: The Like! Button.</p>
<p>Content producers profit from putting such buttons
on their pages because it increases their audience. Users
profit from it because they can share stuff more easily.
And the button providers profit from it because they can
track where you go, what you look at and provide
targeted advertising.</p>
<p>Now targeted advertising and profiling is not necessarily a problem.
It only becomes a problem when you can't control your profile anymore.
And that is what has happened with those Like buttons:
They are everywhere and you cannot opt out. Or can you?</p>
<p>Yes you can, even with Chromium. I am currently trying out two
Chromium extensions:
<a href="https://chrome.google.com/webstore/detail/mlomiejdfkolichcflejclcbmpeaniij">Ghostery</a>
and
<a href="https://chrome.google.com/webstore/detail/jeoacafpbcihiomhlakheieifhpjdfeo">Disconnect</a>.</p>
<p>They both do a similar job: They block those tracking web bugs and like buttons.
Unlike the popular Firfox <a href="http://sharemenot.cs.washington.edu/">ShareMeNot</a> plugin
they completely remove the buttons and thus remove function.
But I can accept that and share manually.</p>
	]]>
	</description>
</item>


<item>
	<title>wiki2beamer in the press</title>
	<link>http://www.cleeus.de/w/blog/2011/07/30/wiki2beamer_in_linux_magazin.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Sat, 30 Jul 2011 20:15:59 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/07/30/wiki2beamer_in_linux_magazin.html</guid>
	<description>
	<![CDATA[
<p>Yay, looks like <a href="http://wiki2beamer.sourceforge.net/">wiki2beamer</a> is
mentioned in the 2011/08 issue of the german print
magazine "Linux Magazin" (thanks for the hint to <a href="http://blog.hartwork.org">sping</a>).
According to <a href="http://shop.linuxnewmedia.de/dh11108.html">this content overview</a> it's
only a small "Tool Tips" section. But hey, it's print!</p>
<p>There seems to be no free online version of the article so I guess I'll get a
copy at the kiosk.</p>
	]]>
	</description>
</item>


<item>
	<title>Google+ anyone?</title>
	<link>http://www.cleeus.de/w/blog/2011/07/27/invite_to_google_plus_anyone.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Wed, 27 Jul 2011 21:06:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/07/27/invite_to_google_plus_anyone.html</guid>
	<description>
	<![CDATA[
<p>Do you want an invite to Google+? Just drop me a line.</p>
	]]>
	</description>
</item>


<item>
	<title>Constant memory makes the difference.</title>
	<link>http://www.cleeus.de/w/blog/2011/07/26/constante_memory_makes_the_difference.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Tue, 26 Jul 2011 18:05:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/07/26/constante_memory_makes_the_difference.html</guid>
	<description>
	<![CDATA[
<p>Today we take memory for granted. As a developer you often don't
think of it anymore and lavishly reserve buffers for all kinds of stuff.
That XML file? Just throw it into a DOM parser.
That ZIP file? Just decompress it into a memory buffer and write that to disk.
That email attachment? Just decode it in memory and see what it is.
This is perfectly ok, because if you would think for half an hour about every
chunk of memory you're about to reserve, you would get nothing done on time.</p>
<p>But, there is always a but.
The day will come when that XML file is a database dump with gazillions
of nodes or it contains a huge chunk of encoded binary and
your <code>new</code> / <code>malloc</code> just cannot get you fresh memory anymore.
The day will come, when some email user attaches a 9GB ISO image to an email
and tries to send it over your filtering SMTP server (I've heard, chief executives
really like to do that).</p>
<p>Yes, <code>new</code> can actually fail. On 32bit architectures you have a theoretical
raw address space of 4GB. But your process layout, stack, libraries and operating
system restrict that considerably so that only about 2GB (at least on windows)
are left for use.
Now the heap, where most of your long-living memory chunks will be allocated,
is like a harddisk: it can be fragmented. That means, if you sprinkled small objects
all over your heap, chances are your memory manager (new/malloc) won't find
a large continous chunk anymore. And that's when it will fail and throw an exception.</p>
<p>When that happens, and you really need to solve the problem, it's like standing with your
back to the wall. You have two choices:</p>
<ol>
<li>Offload large chunks to the harddisk, or</li>
<li>rewrite everything to be a streaming architecture.</li>
</ol>
<p>Well, 1. is kinda obvious: Go to the places where your code wants to get huge chunks of memory
and replace it with read/write access to temporary files. That's ugly, but it works and is
often the only option you have in existing codebases.</p>
<p>Now what does 2. mean? A streaming architecture?</p>
<p>In a streaming architecture you can only read and write fixed size chunks at once. That is
your program reserves some fixed buffer and then repeatedly reads into it, runs some processing on it
and finally writes the buffer content to some sink - repeat. Your read sources and write sinks could be
files, sockets, devices, whatever. The thing is: <em>Your overall design has to consider this requirement
up front.</em> Otherwise you've lost. Converting an existing step-by-step design into an on-the-fly streaming
architecture is close to a full rewrite.</p>
<p>Creating such an architecture can be hard. It might require some dirty hacks, when the output needs calculated
values (like checksums) in the beginning when they are only known after fully running through the input once.
Or it might require seekable input stream semantics when you need to know the the size of your input before
you have fully processed it. That's why designing clever output formats is so difficult.</p>
<p>But it's worth it in the long run. What would you think if <code>tar -xz</code> suddenly crashed with an out of memory
exception?</p>
<p>So, next time you process a file: Do it in chunks! :)</p>
	]]>
	</description>
</item>


<item>
	<title>C++: The NxM problem</title>
	<link>http://www.cleeus.de/w/blog/2011/07/25/cpp_m_times_n_problem.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Mon, 25 Jul 2011 19:04:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/07/25/cpp_m_times_n_problem.html</guid>
	<description>
	<![CDATA[
<p>One part of learning a language is getting
to know the standard runtime / library environment.
In all modern languages there are huge standard libraries
and they make up a big part of what the language feels like,
and how efficient you are as a developer in your day-to-day job.</p>
<p>In some languages these libraries are closely interwoven with the
language itself. Take Java and it's String class. It's part
of the library and located in the package <code>java.lang.</code> but it's so close to
the language that every string literal is converted to a String instance.</p>
<p>Or take the Python and its Exception classes. Even when basic language
expressions, like <code>import package</code> fail (e.g. because package cannot be found)
an Exception from the package <code>exceptions.</code> is thrown.</p>
<p>C++ takes a somewhat different approach.</p>
<p>C++ is rigorously divided into core language features and a standard library
that is implemented <em>using</em> these language features.
C++ core language features are things like:</p>
<ul>
<li>basic data types, arrays, pointers</li>
<li>functions</li>
<li>classes, inheritance, polymorphism</li>
<li>templates</li>
<li>exceptions</li>
<li>runtime type information</li>
</ul>
<p>The standard library in C++ provides a number of
mostly generic templates (classes and functions) and is called
the Standard Template Library - STL.</p>
<p>The basic building blocks of the STL are containers, iterators and algorithms.
Containers are standard computer science data structures - vectors, linked lists, priority queues, 
(hash) maps, sets, etc..</p>
<p>When using them, you will sometimes miss some helpfull member functions. E.g. there
are no .sort, no .revert, no .search member function in the std::vector class. At first you'll find
that really annoying. But like most of the things in C++, there is deeper meaning.</p>
<p>As said above, there is a collection of algorithms in the <code>&lt;algorithm&gt;</code> header.
These standard algorithms like sort, replace, copy, for_each work on all these
standard containers. Now why are they decoupled?
The argument is simple and is called the NxM problem:</p>
<p><em>Given N containers and M algorithms, you have to maintain NxM implementations
of these algorithms.</em></p>
<p>This is costly and error prone. The inventors of the STL
felt the need to solve this problem and invented a glue,
a unified interface that can expose some of the properties of containers so
that generic algorithms can work on them. This glue is iterators.</p>
<p>An iterator is like a fancy reference/pointer to an element of a container.
The weakest type of iterators supports the ++ increment operator and the * dereference operator.
That means you can make the iterator point to the next element in the container and
access what it's pointing to. The most powerful iterators expose a random access
subscript operator []. Going on from this abstraction, you can implement algorithm
template methods like std::copy that work equaly on raw C arrays and doubly linked lists.
Yes, you can use the STL algorithms on raw C arrays.</p>
<p>Try this:</p>
<div class="codehilite"><pre><span class="cp">#include &lt;algorithm&gt;</span>
<span class="cp">#include &lt;iostream&gt;</span>

<span class="kt">int</span> <span class="n">main</span><span class="p">(</span><span class="kt">int</span><span class="p">,</span> <span class="kt">char</span><span class="o">**</span><span class="p">)</span> <span class="p">{</span>
  <span class="kt">int</span> <span class="n">foo</span><span class="p">[</span><span class="mi">42</span><span class="p">];</span>
  <span class="k">const</span> <span class="n">size_t</span> <span class="n">foo_size</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">foo</span><span class="p">)</span><span class="o">/</span><span class="k">sizeof</span><span class="p">(</span><span class="n">foo</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
  <span class="kt">int</span> <span class="n">bar</span><span class="p">[</span><span class="n">foo_size</span><span class="p">];</span>

  <span class="k">for</span><span class="p">(</span><span class="n">size_t</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o">&lt;</span><span class="n">foo_size</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">foo</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="n">std</span><span class="o">::</span><span class="n">copy</span><span class="p">(</span><span class="n">foo</span><span class="p">,</span> <span class="n">foo</span><span class="o">+</span><span class="n">foo_size</span><span class="p">,</span> <span class="n">bar</span><span class="p">);</span>

  <span class="k">for</span><span class="p">(</span><span class="n">size_t</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o">&lt;</span><span class="n">foo_size</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="n">bar</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">&lt;&lt;</span> <span class="s">&quot; &quot;</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>


<p>And next time, we'll take a look at the assembly :)</p>
	]]>
	</description>
</item>


<item>
	<title>Bürgerfest Schwabach</title>
	<link>http://www.cleeus.de/w/blog/2011/07/24/post.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Sat, 23 Jul 2011 23:17:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/07/24/post.html</guid>
	<description>
	<![CDATA[
<p>
great musik, great people, love everywhere :)
</p>

<p>
bookmark: <a href="http://www.derfallboese.de/">Der Fall Böse</a> - <a href="http://www.youtube.com/watch?v=xv3Swdr92ns">Raus</a>
</p>
	]]>
	</description>
</item>


<item>
	<title>Why I love C++</title>
	<link>http://www.cleeus.de/w/blog/2011/07/21/post_01.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Thu, 21 Jul 2011 21:42:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/07/21/post_01.html</guid>
	<description>
	<![CDATA[
<p>As most of you may know, on my job I'm not coding one of those
new fancy languages but old-school, hardcore, C++.
Now some may scream: OMFG, C/C++ - it causes security bugs,
it's verbose code, it's not cross platform, it's so 90's -
your job must be terrible.
But after more then a year, I can only say: I love it. In this
and probably some of the next posts, I'll try to show some of the things
I like.</p>
<p>Admitted, C++ is not for the faint-hearted. You have to
be prepared to get engaged with the language and the
machine. Once you got that, you will see that:</p>
<ul>
<li>C++ is one of the most portable languages</li>
<li>C++ (in not-C) seldomly causes security bugs (no more then say Java).</li>
<li>You get rewarded with uncompared performance.</li>
</ul>
<p>Of course you always have to choose the right tool for the job. If I had
to make a bunch of half educated "IT specialists" code some random business application,
I'd probably not choose C++ because they wouldn't have a clue why it crashed all the time
(Java could be a good tool here). Or if I'd want to write a tiny tool to do some
text processing, I'd just use python ;). But with clever coders and high
performance requirements, C++ is still an excelent choice.</p>
<p>And the performance gains can be dramatic. During my diploma thesis (together with Daniel Käs),
I did some heavy statics calculations on vast amounts of data. One of those was to calculate
a distance matrix. That is you have a number of points (vectors) with N dimensions
(talking about N&gt;20000 here) and want to know the distance of each point to every other
point. That's a distance matrix. Now our distance function was the
<a href="http://en.wikipedia.org/wiki/Cosine_distance">cosine similariy measure</a>
which is a little more demanding, but not much.</p>
<p>The first implementation was pure python. It took ages. We were effectively unable to repeat
the calculations more then once a day. Then I just ported the distance
function to C (using <a href="http://www.cython.org">Cython</a>). That already improved performance
by a factor of 2-3. But the real performance gain came, when I also ported the inner for-loop
into C (which, from a python point of view reduced complexity from O(NxN) to O(N) and from a C point
of view increased complexity from O(1) to O(N)). That was another speedup of like x10.</p>
<p>Now that was C, but C++ can be equally fast. I just recently discovered what a good optimizer can
do from some template code and carefull inlining. Let's take a look at the
"workhorse class of the STL" (S.T.Lavavej): std::vector.
Some even more old-school and hardcore C coders worry that using vectors instead of plain old arrays
is considerably slower. But this is not really true. A vector is a dynamically growable array-like structure.
It has the subscript operator [] overloaded which means, you can even use it like an array. But it has
comfortable features like a .size() method and a .push_back(...) method.
Let's say we have a class Foo and make a vector of Foos and assign it some values:</p>
<div class="codehilite"><pre><span class="cp">#include &lt;vector&gt;;</span>

<span class="k">class</span> <span class="nc">Foo</span> <span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
  <span class="kt">int</span> <span class="n">get_bar</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span> <span class="k">return</span> <span class="n">bar</span><span class="p">;</span> <span class="p">}</span>
  <span class="n">Foo</span><span class="o">&amp;</span> <span class="k">operator</span><span class="o">=</span><span class="p">(</span><span class="kt">int</span> <span class="n">val</span><span class="p">)</span> <span class="p">{</span> <span class="n">bar</span> <span class="o">=</span> <span class="n">val</span><span class="p">;</span> <span class="k">return</span> <span class="o">*</span><span class="k">this</span><span class="p">;</span> <span class="p">}</span>
<span class="k">private</span><span class="o">:</span>
  <span class="kt">int</span> <span class="n">bar</span><span class="p">;</span>
<span class="p">};</span>

<span class="kt">int</span> <span class="n">main</span><span class="p">(</span><span class="kt">int</span><span class="p">,</span> <span class="kt">char</span><span class="o">**</span><span class="p">)</span> <span class="p">{</span>
  <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">Foo</span><span class="o">&gt;</span> <span class="n">foos</span><span class="p">;</span>
  <span class="n">foos</span><span class="p">.</span><span class="n">resize</span><span class="p">(</span><span class="mi">100</span><span class="p">);</span>

  <span class="k">for</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">Foo</span><span class="o">&gt;::</span><span class="n">size_type</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o">&amp;</span><span class="n">lt</span><span class="p">;</span><span class="n">foos</span><span class="p">.</span><span class="n">size</span><span class="p">();</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">foos</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>


<p><code>std::vector&lt;Foo&gt;::size_type</code> is the correct type for indexing foo, you could also say <code>size_t</code> here.
Assigning <code>i</code> to <code>foos[i]</code> works, because the assignment operator = in Foo is overloaded.</p>
<p>Is that assignment loop slower than a plain C loop over an array of ints? Let's find out and ask ourselfes: What would gcc do?
At first it needs to know the definition of the subscript operator [] on the vector template.
<a href="http://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.4/a01371.htm">The following implementation is from libstdc++ 4.4</a></p>
<div class="codehilite"><pre><span class="n">reference</span> <span class="k">operator</span><span class="p">[](</span><span class="n">size_type</span> <span class="n">__n</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="o">*</span><span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">_M_impl</span><span class="p">.</span><span class="n">_M_start</span> <span class="o">+</span> <span class="n">__n</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>


<p>Modern compilers like gcc decide for themselfes when it's a good idea to inline code. Inlining
is good when the inlined function is only a few lines of code.
The subscript operator certainly is a good candidate for inlining. So let's do that, and see what our
loop becomes:</p>
<div class="codehilite"><pre><span class="k">for</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">Foo</span><span class="o">&gt;::</span><span class="n">size_type</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o">&lt;</span><span class="n">foos</span><span class="p">.</span><span class="n">size</span><span class="p">();</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
    <span class="o">*</span><span class="p">(</span><span class="n">foos</span><span class="p">.</span><span class="n">_M_impl</span><span class="p">.</span><span class="n">_M_start</span> <span class="o">+</span> <span class="n">i</span><span class="p">)</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>


<p>Looks like some good old pointer math. But that = operator is still a method. It's also small and certainly
will be inlined, too. Let's also inline that one (for brevity, only the asignment statement, the loop stays untouched):</p>
<div class="codehilite"><pre><span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="n">foos</span><span class="p">.</span><span class="n">_M_impl</span><span class="p">.</span><span class="n">_M_start</span> <span class="o">+</span> <span class="n">i</span><span class="p">)).</span><span class="n">bar</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
</pre></div>


<p>The next step is to resolve the . member access operators. What they do is nothing more than adding the offset of
the member to the base address (this-pointer) of the object. Now a good compiler has the 0-offset optimization, that
is: there is no additional data between the this pointer and the first class member (unless there is a virtual function member
in which case the compiler has to add a vtable). Our Foo class only has the .bar member which is it's first member. So the
offset to the this pointer is 0, so it can be eliminated.</p>
<div class="codehilite"><pre><span class="o">*</span><span class="p">((</span><span class="n">foos</span><span class="p">.</span><span class="n">_M_impl</span><span class="p">.</span><span class="n">_M_start</span> <span class="o">+</span> <span class="n">i</span><span class="p">)</span><span class="o">+</span><span class="mi">0</span><span class="p">)</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
</pre></div>


<p>which is obviously equal to:</p>
<div class="codehilite"><pre><span class="o">*</span><span class="p">(</span><span class="n">foos</span><span class="p">.</span><span class="n">_M_impl</span><span class="p">.</span><span class="n">_M_start</span> <span class="o">+</span> <span class="n">i</span><span class="p">)</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
</pre></div>


<p>Now we have to do the same with those strange <code>_M_impl</code> and <code>_M_start</code> members in the STL vector implementation.
<code>_M_impl</code> is an intentionally uglyfied name for the internal proxy object that manages vector memory. And it
just happens to be the case, that <code>_M_impl</code> is the first data member in vector. It's of type <code>_Vector_impl</code> which is a nested
class in <code>_Vector_base</code>. <code>_M_start</code> is a raw pointer (<code>Foo*</code>) to the memory block inside the <code>_Vector_impl</code> and also happens to be the first
data member in <code>_Vector_impl</code>. Our optimization can go on and eliminate <code>_M_impl</code> and <code>_M_start</code>:</p>
<div class="codehilite"><pre><span class="o">*</span><span class="p">((</span><span class="kt">int</span><span class="o">*</span><span class="p">)(</span><span class="o">&amp;</span><span class="n">foos</span> <span class="o">+</span> <span class="mi">0</span> <span class="o">+</span> <span class="mi">0</span><span class="p">)</span> <span class="o">+</span> <span class="n">i</span><span class="p">)</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
</pre></div>


<p>which is equal to:</p>
<div class="codehilite"><pre><span class="p">((</span><span class="kt">int</span><span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">foos</span><span class="p">)[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
</pre></div>


<p>This looks absolutely equal to flat memory access on an int[] C array. And for the non-believers,
we'll now have a look at the emitted assembler code.
First the C version (compiled with gcc version 4.4.5, on x86_64 with -O2).</p>
<div class="codehilite"><pre><span class="kt">int</span> <span class="n">foos</span><span class="p">[</span><span class="mi">100</span><span class="p">];</span>
<span class="kt">int</span> <span class="n">i</span><span class="p">;</span>
<span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o">&lt;</span><span class="k">sizeof</span><span class="p">(</span><span class="n">foos</span><span class="p">)</span><span class="o">/</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">int</span><span class="p">);</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
  <span class="n">foos</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>


<p>which compiles and disassembles to:</p>
<div class="codehilite"><pre><span class="c">#0000000000000830 &lt;main&gt;:</span>
<span class="c"># ... snip ... (function prologue)</span>
<span class="c"># i=0 (eax is our i)</span>
<span class="nf">xor</span>   <span class="nv">%eax</span><span class="p">,</span><span class="nv">%eax</span>
<span class="c"># foos start address (rsp) -&gt; rbp, rbx and rdx</span>
<span class="nf">mov</span>   <span class="nv">%rsp</span><span class="p">,</span><span class="nv">%rbp</span>
<span class="nf">mov</span>   <span class="nv">%rsp</span><span class="p">,</span><span class="nv">%rbx</span>
<span class="nf">mov</span>   <span class="nv">%rsp</span><span class="p">,</span><span class="nv">%rdx</span>
<span class="c"># some NOP / prefetch instruction</span>
<span class="nf">nopl</span>   <span class="p">(</span><span class="nv">%rax</span><span class="p">)</span>
<span class="c"># foos[i] = i</span>
<span class="nf">mov</span>    <span class="nv">%eax</span><span class="p">,(</span><span class="nv">%rdx</span><span class="p">)</span> <span class="c">#loop begin / 0x858</span>
<span class="c"># i++</span>
<span class="nf">add</span>    <span class="no">$0x1</span><span class="p">,</span><span class="nv">%eax</span>
<span class="nf">add</span>    <span class="no">$0x4</span><span class="p">,</span><span class="nv">%rdx</span>
<span class="c"># test i&lt;100 ?</span>
<span class="nf">cmp</span>    <span class="no">$0x64</span><span class="p">,</span><span class="nv">%eax</span>    
<span class="c"># goto loop-start</span>
<span class="nf">jne</span>    <span class="mh">858</span> <span class="p">&lt;</span><span class="no">main</span><span class="p">+</span><span class="mi">0x28</span><span class="p">&gt;</span> <span class="c">#loop begin</span>
</pre></div>


<p>Pretty clever gcc, huh? ;)
It notices, that we're accessing <code>foos</code> with the loop counter.
It then allocates a register (rdx) and increments this register by 4 (the size of an int) in each loop round.
This way it doesn't actually have to add <code>i</code> to the start address of <code>foos</code>.
Well let's see what it can do with the C++ code from above.</p>
<div class="codehilite"><pre><span class="c">#0000000000000a10 &lt;main&gt;:</span>
<span class="c"># ... snip ...</span>
<span class="c"># 100 -&gt; edx</span>
<span class="nf">mov</span>   <span class="no">$0x64</span><span class="p">,</span><span class="nv">%edx</span>
<span class="c"># ... snip ...</span>
<span class="c">#foos.size() inlined -&gt; rdx</span>
<span class="nf">mov</span>   <span class="mi">0x10</span><span class="p">(</span><span class="nv">%rsp</span><span class="p">),</span><span class="nv">%rdi</span>
<span class="nf">mov</span>   <span class="mi">0x18</span><span class="p">(</span><span class="nv">%rsp</span><span class="p">),</span><span class="nv">%rdx</span>
<span class="nf">sub</span>   <span class="nv">%rdi</span><span class="p">,</span><span class="nv">%rdx</span>
<span class="nf">sar</span>   <span class="no">$0x2</span><span class="p">,</span><span class="nv">%rdx</span>
<span class="c">#initial test on size() </span>
<span class="nf">test</span>  <span class="nv">%rdx</span><span class="p">,</span><span class="nv">%rdx</span>
<span class="nf">je</span>    <span class="mh">a84</span> <span class="p">&lt;</span><span class="no">main</span><span class="p">+</span><span class="mi">0x74</span><span class="p">&gt;</span>  <span class="c">#a84 is behind loop</span>
<span class="c"># i = 0</span>
<span class="nf">xor</span>   <span class="nv">%eax</span><span class="p">,</span><span class="nv">%eax</span>
<span class="c"># some NOP / prefetch instruction</span>
<span class="nf">nopl</span>  <span class="p">(</span><span class="nv">%rax</span><span class="p">)</span>
<span class="c"># foos[i] = i</span>
<span class="nf">mov</span>   <span class="nv">%eax</span><span class="p">,(</span><span class="nv">%rdi</span><span class="p">,</span><span class="nv">%rax</span><span class="p">,</span><span class="mi">4</span><span class="p">)</span> <span class="c">#loop begin / 0xa78</span>
<span class="c"># i++</span>
<span class="nf">add</span>   <span class="no">$0x1</span><span class="p">,</span><span class="nv">%rax</span>
<span class="c"># test i&lt;100 ?</span>
<span class="nf">cmp</span>   <span class="nv">%rdx</span><span class="p">,</span><span class="nv">%rax</span>
<span class="c"># goto loop begin</span>
<span class="nf">jb</span>    <span class="mh">a78</span> <span class="p">&lt;</span><span class="no">main</span><span class="p">+</span><span class="mi">0x68</span><span class="p">&gt;</span> <span class="c">#loop begin</span>
</pre></div>


<p>Quite different, I have to admit. But it's just logical. In the C version, gcc can assume that the size to compare
<code>i</code> with is allways greater 0, so it can skip the initial check. In C++ it cannot do this, because
.size() could return a negative number (see a6e). But what we can see here is another optimization. .size() normally is
a method. But it's also getting inlined (no call, just memory access). And not only that, it's also moved
out of the loop and the result stored in register rdx (see a5d). The only other major difference is the
operation used to move the value of <code>i</code> into the memory. The C++ version uses a relative addressing mode
where the CPU has to do additional internal calculations. This might be a bit slower the the C version.</p>
<p>The C version might be faster, but it's not a whole order of magnitude or something like that. It's just
a matter of loop invariant analysis and choosing the one or the other opcode.</p>
<p>But don't forget what you get for that tiny little bit of performance impact:
You're getting all the benefits of a fully functional dynamic-size storage.
In debug mode, you're usually even getting bounds checking.
You're getting exception-safety (that is at least the weak property: no resource leaks).
And in release mode, you have that nice and clean near-C speed.</p>
<p>C++ is a type safe, high level and abstract language
and still let's you write code that is amazingly fast.
And this is only one of the reasons, why C++ is worth beeing loved :)</p>
	]]>
	</description>
</item>


<item>
	<title>Hello World weavy!</title>
	<link>http://www.cleeus.de/w/blog/2011/07/17/post_01.html</link>
	<author>Kai Dietrich</author>
	<pubDate>Sun, 17 Jul 2011 05:00:00 -0000</pubDate>
	<guid>http://www.cleeus.de/w/blog/2011/07/17/post_01.html</guid>
	<description>
	<![CDATA[
<p>
A new star is born.
</p>
<p>
Well ... stars ... there are many and most of them are not that bright
either. But I was looking for something new to create a website with.
That old horrible joomla/php thing broke apart after some php update
(I don't remember which one) and was way to much for me either.
</p>

<p>
So I was looking for something fresh. As I already just uploaded static
snapshots of the old php cms (via httrack and some sed+curl scripts), a static
site compiler seemed like a very appealing idea to me.
</p>

<p>
A static site compiler does something really simple: It takes a number of
input and configuration files, runs them through it's processing
machinery and spits out a collection of HTML and auxiliary files
(aka: a website ... you know like websites from the 90's). You can then simply
upload and statically serve that site (or even publish it with your favourite
VCS).
</p>

<p>
No PHP. No database. No security bugs. Simple, fast, static fileserving.
Such a site might even survive that slashdot link on your average 
shared web host.
</p>

<p>
I <a href="http://iwantmyname.com/blog/2011/02/list-static-website-generators.html">tried a few existing ones</a>
but none of them was what I needed. When I have to install a dozen additional
libraries (for which my distribution doesn't even have packages yet),
before I get a tool up and running, there is something wrong.
And there is also something wrong, when it pulls in some huge
template engine for which I first have to learn a new language
before I can get started.
</p>

<p>
So none of the tools I tried had all of the following properties:
</p>
<ul>
  <li>simple to use</li>
  <li>simple to install</li>
  <li>few dependencies</li>
  <li>supports blogging, rss feeds and simple pages</li>
  <li>low maintenance</li>
</ul>
<p>
I had an itch.
</p>

<p>
Now in a few weekends I <a href="https://github.com/cleeus/weavy">
hacked together a few hundred sloc of python sriptness</a>.
It's far from done, it's completely undocumented, it's not (yet)
unittested. But it only needs a python runtime and it can already
put together this blog :)
</p>

<p>
Stay tuned, there's more to come.
</p>

	]]>
	</description>
</item>



</channel>
</rss>


