Gentoo hardened on the desktop && poor man's solid state filesystem cache

As I’ve got a new laptop (a tiny EeePC), I used the chance to try Gentoo hardened on the desktop. For those who don’t know: Gentoo hardened is a Gentoo Linux flavour where all packages are compiled with stack smashing protection (SSP) and address space layout randomization (ASLR) support. This makes the system a pretty hostile environment for attackers. So far everything went well and I have a fine GNOME desktop running. The only major glitch is that Python C extensions (ctypes) don’t work, because the Python runtime performs an mmap that violates the general W^X policy enforced by the PaX kernel patches. But that bug is reported upstream and will be fixed.

Another security measure hardened takes is so-called relro linking. Usually, when you link your executable binary against a shared library (.so), the dynamic linker loads symbols as they are needed (lazily). For this, and for the general dynamic linking process to work, references to dynamically exported symbols have to go through an indirection mechanism. One part of this indirection on IA32 and AMD64 is the GOT, the global offset table. Think of it as a table where you can look up the addresses of symbols (not quite correct, but close enough). When the dynamic linker works in lazy mode, it fills out entries in the GOT as needed. Therefore the GOT must be writable memory.

Now, if an attacker manages to insert code into a running process and then tries to transfer control to it, one convenient way to achieve this is to write into the GOT and manipulate an address there. Therefore we don’t really want the GOT to be writable. But for this to work, lazy binding has to be disabled. That’s done with the ld linker flags -z now and -z relro: -z now tells the dynamic linker to resolve all symbols at load time instead of binding them lazily, and -z relro makes the GOT read-only after relocation.
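Whether a given binary was actually linked that way is easy to check: full relro shows up as a GNU_RELRO program header plus the BIND_NOW dynamic flag. A little Python helper (my sketch, assuming binutils’ readelf is installed; not part of any hardened tool):

import subprocess

def relro_status(path):
    # ask readelf for the program headers and the dynamic section
    phdrs = subprocess.Popen(['readelf', '-lW', path],
                             stdout=subprocess.PIPE).communicate()[0]
    dynamic = subprocess.Popen(['readelf', '-dW', path],
                               stdout=subprocess.PIPE).communicate()[0]
    has_relro = 'GNU_RELRO' in phdrs   # GOT remapped read-only after relocation
    has_now = 'BIND_NOW' in dynamic    # all symbols resolved at load time
    if has_relro and has_now:
        return 'full relro'
    if has_relro:
        return 'partial relro'
    return 'no relro'

print relro_status('/bin/ls')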

Now, while this is a cool defensive feature for security, it’s terrible for desktop performance. Every program resolves all symbols from all of its libraries at load time, which slows down program startup. I didn’t do any measurements, but my HDD got pretty busy when I started heavy GUI programs. Combined with a slow laptop HDD, this is no fun.

But there is a device in this little laptop that is unused most of the time anyway and could help: the SD-card reader. Given a good controller attached to the USB 2.0 port and a fast SDHC card, it could serve as a cache for /usr, where some 90% of the system’s libraries live. So I bought a fine class 10 16GB SDHC card and put it into the slot. dd reported 20MB/s write speed. Combined with excellent random-access behaviour, this should be enough to improve the situation.

Now the idea is to somehow redirect access to /usr onto the SD card. There already are solutions out there; google for mcachefs. Mcachefs is a FUSE filesystem written in C that buffers accessed files on another filesystem, which may be a solid state disk. But mcachefs is alpha-state software and has known bugs. So I needed to go for something simpler. Writing my own Python FUSE filesystem seemed like a lot of work, and the extra FUSE overhead is something I’d like to avoid if possible anyway.

So I went for the simple approach: I just copied /usr over to the SD card and mounted it over /usr read-only. The problem is, you will never be able to unmount this and access your real /usr (e.g. to install new packages) as long as there are processes running in the background with open files from /usr. So this was no option either. To be able to switch back and forth between the read-only /usr cache and the real read-write /usr, both must be mounted, and we need something to decide which one to use. Symlink anyone? :)

That’s the solution I’m using now: move /usr to /usr.real/ and create a symlink named /usr that points either to /usr.real/ or /fscache/usr/. To make things more comfortable I created a nice Gentoo init script.

/etc/init.d/fscache start #set up the mount and symlinks
/etc/init.d/fscache stop #switch the symlink back to /usr.real/
/etc/init.d/fscache sync #run an rsync that syncs /usr.real/ into /fscache/usr/

To use it, you have to set up your SD card, rename /usr to /usr.real and create an initial symlink /usr -> /usr.real/ (the script won’t do anything if /usr is a real directory).
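Just to illustrate the core trick: in Python (the actual init script is plain shell, see the link below; the helper name is mine), the switch is nothing more than an atomic symlink replacement:

import os

def switch_usr(target):
    # repoint /usr to target ('/usr.real' or '/fscache/usr'),
    # but refuse to act if /usr is still a real directory
    if os.path.isdir('/usr') and not os.path.islink('/usr'):
        raise RuntimeError('/usr is a real directory, not touching it')
    os.symlink(target, '/usr.new')  # build the new link under a temporary name
    os.rename('/usr.new', '/usr')   # rename() swaps the links atomically

switch_usr('/fscache/usr')  # what "start" does
switch_usr('/usr.real')     # what "stop" does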

Here is the code: http://pastebin.com/zw10p6TR

Finally … backups.

Solved what’s probably everybody’s most neglected problem: backups (or rather, their absence).

  • bought a Bluetooth stick to back up the phone (with gammu/wammu)
  • bought a 2TB HDD and a USB/eSATA case to back up my systems (with rsnapshot)

It definitely was a very relaxing moment when I finished the first backup. Rsnapshot is certainly the tool to use for such home backup purposes. It allows you to keep multiple virtual full snapshots of the system while only needing the space of the first full backup plus all your changes (it uses hardlinks for unchanged files).

One thing I noticed: as I’m a bit paranoid, I set the disk encryption algorithm to AES256 instead of dm-crypt’s default AES128. It seems that takes up considerably more CPU cycles than AES128 (no real measurement, just a quick observation). The result is that on my Atom 330 board, the encryption process runs at 100% CPU usage, eating up a full core, and I only get 15MB/s write speed (as opposed to 30MB/s on a Core2Duo 1.6GHz, which maxes out the USB port). It’s still OK; since encryption is the bottleneck anyway, I don’t even have to bother using the eSATA port. The good thing is that on the next weekly backup the disk speed will not matter very much, as only the diff will get rsynced :)
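The hardlink trick is simple enough to sketch in a few lines of Python. This is definitely not rsnapshot’s implementation, just the idea: a new snapshot hardlinks every file that is unchanged against the previous snapshot and copies the rest.

import os
import shutil

def snapshot(prev, src, dst):
    # build snapshot dst of tree src, hardlinking unchanged files from prev
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        droot = dst if rel == '.' else os.path.join(dst, rel)
        if not os.path.isdir(droot):
            os.makedirs(droot)
        for name in files:
            s = os.path.join(root, name)
            p = os.path.join(prev, name) if rel == '.' else os.path.join(prev, rel, name)
            d = os.path.join(droot, name)
            st = os.stat(s)
            if os.path.isfile(p):
                pst = os.stat(p)
                if pst.st_size == st.st_size and pst.st_mtime == st.st_mtime:
                    os.link(p, d)       # unchanged: costs no extra space
                    continue
            shutil.copy2(s, d)          # new or changed: store a fresh copy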

Now … my nights will be much better, I think :)

And if you don’t have home backups yet, I hope your nights will be very restless!

shellcode grml

… spent the last 8 hours figuring out why my self-built shellcode POC segfaults …
Until I noticed that kernel+CPU enforce W^X. Damn! On the old box in the living room, without PaX, it worked :)
Dear h4cker-g0dz, plz give me a ROP-compiler.

Putting SSP back into Gentoo-hardened

One of the main advantages of Gentoo Linux is the availability of the hardened profile and kernel. The hardened profile enables a number of switches and features which, together with the hardened kernel (PaX and grsecurity patchset), provide a system with full address space layout randomization (ASLR) and stack-smashing protection (SSP). ASLR requires a kernel patch, called PaX, and all binaries to be built as position independent executables/code (PIE/PIC). SSP, also known as stack canaries, is purely a compiler feature.

Now, the old GCC 3.4.6 series had this feature (coming from an old IBM patch called ProPolice). But the current stable compiler on Gentoo, GCC 4.3.4, doesn’t have it anymore. This means current stable Gentoo-hardened systems are built without SSP.

How could we fix that? Using GCC 3.4.6 will most likely break a number of things, so it’s not really an option. But GCC 4.4.2 has a new SSP feature, a totally new implementation of the same idea. However, 4.4.2 is not available on Gentoo by default.

To use GCC 4.4.2 and with it SSP on Gentoo-hardened, you have to import the hardened-dev overlay (layman -a hardened-development). Then you have to unmask =sys-devel/gcc-4.4.2-r2 in /etc/portage/package.unmask and install it. It will be put into a new slot (4.4), so it doesn’t overwrite the old GCC by default. Once it’s compiled, you can enable it with gcc-config.

After mostly positive reports on the gentoo-hardened mailing list, I just did that on my home box. The complete rebuild of the system with the new GCC is currently running. I’m confident that nothing breaks.

So if you have a server box running hardened, I’d suggest you do the same and switch over to the new GCC in the hardened-dev overlay. It seems to work well for most people and packages. If you have a server box without an ASLR kernel/system, i.e. not Gentoo-hardened, I’d suggest you do something about it anyway. I mean, even Windows has this kind of mitigation (DEP since XP SP2, ASLR since Vista).

pyaed - a python audio entropy daemon

Same game again. Not all my boxes have TV cards to leech entropy from, so I needed some other source. The soundcard quickly comes to mind, and every box has a soundcard nowadays. The existing audio_entropyd once again wasn’t useful, because what it produced didn’t survive the FIPS 140-2 tests (i.e. wasn’t really random at all on my box). I then reimplemented the exact same algorithm it uses in Python with PyAudio to take a closer look at the data. When I dumped the output of this algorithm into a file, I could even see patterns in a hexdump of that file. Strange.

Well, there must be some randomness in there, so I went on to implement a different algorithm. It also records stereo audio, but then looks at the lowest bit (0x0001) of the samples. If this bit differs between the two channels and the current two stereo samples differ from the last two, it records that as one bit of entropy (you can argue about that, though; afaik randomsound uses the same mechanism). To add some more confidence in the entropy, it then XORs 64kbit of raw bits down into a 4kbit block. This way, it gets around 3kbit/s of entropy out of the soundcard.
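In (heavily simplified) Python the extraction looks roughly like this. It’s a sketch of the idea, not the actual pyaed source; in the daemon the raw frames come from PyAudio:

import struct

def sample_pairs(raw):
    # interleaved signed 16-bit little-endian stereo -> (left, right) tuples
    flat = struct.unpack('<%dh' % (len(raw) // 2), raw)
    return zip(flat[0::2], flat[1::2])

def extract_bits(pairs):
    # keep a bit when the two channels' lowest bits differ
    # and the stereo sample pair changed since the last one
    bits = []
    last = None
    for left, right in pairs:
        if (left & 1) != (right & 1) and (left, right) != last:
            bits.append(left & 1)
        last = (left, right)
    return bits

def fold(bits, factor=16):
    # XOR-compress: 16 raw bits -> 1 output bit (64kbit -> 4kbit)
    out = []
    for i in range(0, len(bits) - factor + 1, factor):
        b = 0
        for x in bits[i:i + factor]:
            b ^= x
        out.append(b)
    return out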

download source code

README:

Python Audio Entropy Daemon v0.0.1 (works on my machine)
(c) 2010 by Kai Dietrich

Inspired by audio_entropyd by Folkert Vanheusden
http://www.vanheusden.com/aed/
and randomsound by Daniel Silverstone
http://www.digital-scurf.org/software/randomsound

This software is licensed under the
GNU General Public License 2.0 or later.

System Requirements:
--------------------
Python 2.6
PyAudio 0.2.3
a soundcard with line or mic in
optional: rng-tools / rngd

What it does:
-------------
Pyaed records samples from an audio input device, extracts some noise/entropy
and writes it to a fifo.

Pyaed opens the default audio input device PyAudio finds and records frames
(44.1kHz, 16bit, stereo). It looks at the lowest bit (0x0001) of the samples from each channel.
If these bits differ and the samples are different from the last ones (to ignore constant signals),
a bit of entropy is recorded. To increase the quality of the randomness, it then compresses 64kbit of
entropy into 4kbit by XORing the bits. It then writes the bits into a fifo.
You can then attach rngd from the rng-tools to this fifo (rngd -f -r entropy.fifo).
rngd will test the noise with a FIPS 140-2 test for its statistical randomness
and deliver the bits to the kernel entropy buffer.

It does not work, what can I do?
--------------------------------
a) read the code (it's not that much)
b) fix the code
c) Play around with alsamixer to get noise on the default input device,
   turn up boosts and input levels until you get levels around 50%. If you want to, you can even put
   in a stereo mic to get noise from the air and not just the electromagnetic noise from the ADC.

How can I enhance the code?
---------------------------
Just do it. If you like this tool, you can just set up a project somewhere
and start collecting improvements. For me this was just some fire-and-forget
single-task code.

pyved - a python video entropy daemon

Well, what do computer scientists do when they are bored? They toy around with cryptography.
For some reason I didn’t get video_entropyd to run (it would throw v4l errors and segfault), but I desperately need entropy. What I came up with is a quick Python script which does essentially the same thing, just with many more dependencies and a high-level scripting language. I grab video frames from the TV card with pygame.camera, extract the entropy and write it to a fifo. All communication with the kernel is then handled by rngd from the rng-tools: it picks up the bits, checks whether they are really random, and only then feeds them into the kernel. All in all I have a solution which (according to rngd) generates about 80MiBit/s of entropy from a good old Bt878 receiver. I’m quite satisfied :)
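The extraction itself is simple. A rough sketch of the idea in Python (one possible reading of the algorithm, not the literal pyved source; frames are taken here as flat sequences of (r, g, b) tuples):

def frame_bits(prev, cur):
    # compare two successive frames pixel by pixel; every changed
    # color channel contributes its (noisy) least significant bit
    bits = []
    for (r0, g0, b0), (r1, g1, b1) in zip(prev, cur):
        for old, new in ((r0, r1), (g0, g1), (b0, b1)):
            if old != new:
                bits.append(new & 1)
    return bits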

Update: it turned out that number was just an artifact of reading and writing in chunks. The long-run performance is about 8kbit/s of entropy.

download source code

Python Video Entropy Daemon v0.0.1 (works on my machine)
(c) 2010 by Kai Dietrich

Inspired by video_entropyd by Folkert Vanheusden;
the main part actually is just a Python version of Folkert's code.
http://www.vanheusden.com/ved/

This software is licensed under the
GNU General Public License 2.0 or later.

System Requirements:
--------------------
Python 2.6
PyGame 1.9.1
a video4linux device
optional: rng-tools / rngd

What it does:
-------------
Pyved records frames from a video4linux device, extracts the noise/entropy
and writes it to a fifo.

Pyved opens the first video4linux device it finds and records frames (720x576, RGB).
If it finds the kernel entropy pool to be empty, it starts extracting noise
from two successive frames. Every uncorrelated change in one of the three color
channels is considered to be a bit of physical randomness and written to
the fifo "entropy.fifo". You can then attach rngd from the rng-tools to this
fifo (rngd -f -r entropy.fifo). rngd will test the noise with a FIPS 140-2 test for its
statistical randomness and deliver the bits to the kernel entropy buffer.

How fast is it?
---------------
On a Pinnacle Bt878 analogue TV card, tuned to a really bad channel,
rngd reports the following speeds (entropy bits per second):

stats: HRNG source speed: (min=1.330; avg=1.783; max=4.657)Gibits/s
stats: FIPS tests speed: (min=70.382; avg=88.529; max=89.969)Mibits/s

This is frickin' fast compared to all those commercial devices.

It does not work, what can I do?
--------------------------------
a) read the code (it's not that much)
b) fix the code
c) tune your tv-card with a tuner application to some channel before starting pyved

How can I enhance the code?
---------------------------
Just do it. If you like this tool, you can just set up a project somewhere
and start collecting improvements. For me this was just some fire-and-forget
single-task code.

thanks grml

As I’m planning to set up a fresh new Gentoo on my new Atom 330 system, I was looking for a suitable boot medium today. The Atom board only has SATA connectors and I don’t have a SATA CD-ROM drive, so I had to look for a USB-bootable 64-bit Linux. This should not be a problem nowadays … I thought.

Well, first attempt: Ubuntu 9.10 64-bit desktop. I downloaded the ISO and then looked for some tutorial. It turned out they now have a custom Debian/Ubuntu/Windows usb-creator tool which is not available on Gentoo. So I tried unetbootin, which promised to create bootable USB sticks from all kinds of distributions. It failed: for some reason the BIOS said the resulting stick was not bootable. My next try was to follow some generic tutorials for converting bootable CD images into bootable USB sticks. With this attempt I got the Ubuntu image to boot into the initramfs, but it then didn’t find the live image to mount, and I didn’t really care to hack around in the initramfs. Frustrating. This should have been easy.

Well, then I took a look at GRML, as it is said to be a nice little Linux distro for admins. And surprisingly, all it takes to make a GRML ISO boot from a USB stick is dumping the ISO onto the stick with dd: dd if=grml64-medium.iso of=/dev/sdX. Done. Works nicely. Thank you GRML!

Ooops, is this KDE4?

Yes it is…

As KDE4 (4.3.1) went stable on ~amd64 in Gentoo Linux, I accidentally pulled it in and compiled it during my last emerge --update world. And as I had it installed, I decided to just use it and make the full switch. Although it probably has some bugs here and there, it seems to be pretty usable and it looks really damned beautiful. That, again, comes at a price: it’s slow. Interaction with the GUI is a little unresponsive sometimes, but I guess I will get used to that. The most important applications are there and work. Only one important thing is still missing: a fully integrated KNetworkManager for KDE4. For now I’m using the KDE 3.5 KNetworkManager, which works fine but looks a bit ugly. A good thing is that power and display management seem to be integrated more nicely; the important special buttons worked instantly.
I’m currently working on migrating all of the application settings to KDE4.

I can see now why the Gentoo people will drop KDE 3.5 out of the tree sooner or later. It just doesn’t make a lot of sense to maintain something that upstream doesn’t maintain anymore, or at least not much.

Update: disabling the Nepomuk semantic desktop in the control panel (System Settings -> Advanced -> Desktop Search -> uncheck "Enable Nepomuk Semantic Desktop") seems to increase performance and reduce power consumption.

Besides that, I had some free time at the it-sa security fair (where I gave a talk together with Daniel about our thesis) and coded some new features for wiki2beamer:

  • selective slide compiling: mark a frame with a ! (!==== frametitle ====) to compile only the selected frame to LaTeX. Speeds up compilation times during slide creation and also saves you battery when sitting on the train making slides ;)
  • shortcut syntax for \vspace and \vspace*: --2em-- on a single line creates a \vspace of 2em, --*2em-- respectively creates a \vspace* (see the sketch after this list)
  • shortcut syntax for \uncover and \only: +<2->{content} (uncover) and -<2->{content} (only)
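To give an idea of how such a shortcut works internally, here is a hypothetical regex-based translation for the vspace shortcut (my sketch, not the actual wiki2beamer source):

import re

# a line containing only --2em-- or --*2em--
vspace_re = re.compile(r'^--(\*?)(.+?)--$')

def convert_vspace(line):
    m = vspace_re.match(line.strip())
    if m:
        star, length = m.groups()
        return '\\vspace%s{%s}' % (star, length)
    return line

print convert_vspace('--2em--')   # \vspace{2em}
print convert_vspace('--*2em--')  # \vspace*{2em}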

As I once again hacked on the wiki2beamer code, it became clear that the code will become more or less unmaintainable. It’s just a matter of time until the complexity is too high. Wiki2beamer definitely needs a formal parser and syntax.

Performance Python

As I needed some heavy optimization of an inner loop in an O(n^2) problem, I turned to C extensions for Python this weekend. Well, writing native C extensions for Python (2.x) turned out to be a real mess. The Python C API isn’t something you want to use, which is understandable since Python is a heavily object-oriented language while C is purely procedural.

But, there are some cool extensions which make calling C code from Python easy. Of all those extensions, I tried out Pyrex.

Pyrex is basically a language that is very similar to Python. It’s so similar that you can just cut-and-paste most of your Python code into a Pyrex module file (.pyx) (there are some limitations, e.g. no list comprehensions, but these can be fixed easily most of the time). The next step is to compile the Pyrex code into C code with the Pyrex compiler and then compile that into a shared library with gcc. This shared library can then be imported into Python as a module and its functions can be called.
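The build step is usually wired up through distutils. A minimal setup.py, assuming Pyrex’s distutils integration:

# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Pyrex.Distutils import build_ext

setup(
    name='add',
    ext_modules=[Extension('add', ['add.pyx'])],
    cmdclass={'build_ext': build_ext},  # compiles .pyx -> .c -> .so
)

Then python setup.py build_ext --inplace builds the module and import add works like any other Python import.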

The fun starts when you a) start calling other native C libraries and b) start writing optimized Pyrex code. Calling native C libraries is pretty easy most of the time. The Pyrex compiler builds the required code to convert the basic Python types into C types. E.g. if you have a C function that wants a char *p as an argument, you can just pass it a Python str object; the same goes for int and float.
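A minimal sketch of case a), wrapping C’s strlen in a .pyx file (the declaration only has to be close enough for Pyrex; the real prototype returns size_t):

cdef extern from "string.h":
    int strlen(char *s)

def pystrlen(s):
    # Pyrex inserts the Python str -> char* conversion automatically
    return strlen(s)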

The second use case is writing really fast Pyrex code. If you just cut-and-paste Python code, your variables will be full Python objects and the C output of the Pyrex compiler contains some really nasty Python API calls on them. E.g. adding two Python variables is anything but trivial and results in something like this:

add.pyx:

a = 1
b = 2
c = a + b

turns into:

[…snip…]

PyObject *__pyx_1 = 0;
PyObject *__pyx_2 = 0;
PyObject *__pyx_3 = 0;

[…snip…]

/* "/home/kai/add.pyx":1 */
[…snip (3 nasty lines for value assignment to Python-object)…]

/* "/home/kai/add.pyx":2 */
[…snip (same here) …]

/* "/home/kai/add.pyx":3 */
__pyx_1 = __Pyx_GetName(__pyx_m, __pyx_n_a); if (!__pyx_1) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 3; goto __pyx_L1;}
__pyx_2 = __Pyx_GetName(__pyx_m, __pyx_n_b); if (!__pyx_2) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 3; goto __pyx_L1;}
__pyx_3 = PyNumber_Add(__pyx_1, __pyx_2); if (!__pyx_3) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 3; goto __pyx_L1;}
Py_DECREF(__pyx_1); __pyx_1 = 0;
Py_DECREF(__pyx_2); __pyx_2 = 0;
if (PyObject_SetAttr(__pyx_m, __pyx_n_c, __pyx_3) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 3; goto __pyx_L1;}
Py_DECREF(__pyx_3); __pyx_3 = 0;

[…snip…]

This is really nasty and anything but fast, isn’t it?

BUT! If we tell Pyrex to treat our variables as C integers, it can generate clean C code:


cdef int a
cdef int b
cdef int c
a = 1
b = 2
c = a + b

What we get is:


[…snip…]
static int __pyx_v_3add_a;
static int __pyx_v_3add_b;
static int __pyx_v_3add_c;

[…snip…]
/* “/home/kai/add.pyx”:4 */
__pyx_v_3add_a = 1;

/* “/home/kai/add.pyx”:5 */
__pyx_v_3add_b = 2;

/* “/home/kai/add.pyx”:6 */
__pyx_v_3add_c = (__pyx_v_3add_a + __pyx_v_3add_b);
[…snip…]

And this is just clean C code. The same goes for loops: if you just use your Python code as-is, a lot of Python object operations have to be used. But if you use C variables, Pyrex can create really fast C code and can even do pointer arithmetic. Pyrex allows you to speed up your code where you really need it, without all the hassle of using the Python C API.
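For example, a typed counting loop (my sketch, using Pyrex’s integer loop syntax) compiles down to a plain C for loop:

cdef int i
cdef int total
total = 0
for i from 0 <= i < 1000000:
    total = total + i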

And my code turned out to be about five times faster after just some minor C adjustments :)

As I mentioned above, there are some other methods of improving Python performance; there is a good overview from the people at SciPy. And before you turn to Python+C, check the general Python performance tips first.

Wiki2beamer 0.8 is out (update)

Some weeks ago I released wiki2beamer 0.8. It’s mostly a maintenance release. It works again with Python 2.4, which makes it easier to run on ancient systems (like our university’s ;) ), fixes a little bug where “expressions” immediately following lists were not transformed (so you had to add a newline), and the license was changed to “GPL 2.0 or later” so we don’t get stuck in copyright problems as time moves on.

I also put the manpage for wiki2beamer online here (via man2html), so Google can find it and non-*nix users can read it, too.

I should probably finish my thesis before I continue coding on wiki2beamer, but if anyone out there is interested, there still are some things to be done:

  1. Create a Lessig-style slide environment, like the [code]-environment
  2. Split the code into a commandline wiki2beamer frontend and a python module as backend
  3. Write a formal syntax description, so we can create a real parser/compiler instead of these regular expression tricks.
  4. Build distro packages. (Fedora, RedHat, Arch, … anyone? Gentoo already has one. Update: I’ve built a Debian/Ubuntu package now, too. Go get it at SourceForge.)
  5. Build a windows installable package. (?)
  6. Update online documentation.

Also, we finally seem to be getting users :)
According to the sourceforge download statistics, we had 86 downloads in the last 2 months.

So, add one and head over to get wiki2beamer 0.8.