I have been working for the WalT project for more than 6 years now. One of the most tricky tasks about it concerns netboot on Raspberry Pi boards. This morning, I discovered yet another interesting detail, and I decided to share my experience about this topic.

U-boot

U-boot is a network bootloader that can be used to boot Raspberry Pi boards over a network. Whilst early Raspberry Pi compatible versions were quite buggy (we first used this in 2012), it is now a reliable solution.

Two-stage boot procedure, for easier maintenance

In WalT, we have a two-stage boot procedure, with a first u-boot script embedded on the SD-card, and a second one (see this file, after the SCRIPT_START tag) embedded in each WalT operating system image. In a more standard setup, you would put this second script on the server, inside the TFTP directory. And as you may guess, the first script retrieves the second one (through TFTP) and executes it.

When you want to handle several dozen nodes, such a two-step procedure can greatly reduce maintenance: the first script is very simple, so nearly all maintenance tasks that could occur concern the second one. As a result, you can modify the bootup procedure of all nodes at once, by editing this second script on the server.

Device identification

In the first boot script, you can however notice something interesting:

  1. we compute the raspberry model we are currently running on
  2. we set a variable bootp_vci accordingly

VCI stands for “vendor-class-identifier”. U-boot will set the VCI field of the DHCP request accordingly. On the remote end, the DHCP server (isc-dhcpd in our case) can take this value into account in order to point the node to a compatible kernel version (cf. Raspbian’s kernel7.img file for raspberry pi 2 & 3, and kernel.img for earlier models).
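To make the server-side decision concrete, here is a minimal Python sketch of the lookup the DHCP server performs. It is not our isc-dhcpd configuration: the VCI strings below are hypothetical placeholders (the real values are whatever the first-stage script stores in bootp_vci), and only the kernel file names come from the text above.

```python
# Hypothetical VCI strings, for illustration only: the real values are
# whatever the first-stage u-boot script puts in bootp_vci.
KERNEL_BY_VCI = {
    "example.rpi-b": "kernel.img",        # earlier models
    "example.rpi-b-plus": "kernel.img",
    "example.rpi-2-b": "kernel7.img",     # raspberry pi 2 & 3
    "example.rpi-3-b": "kernel7.img",
}


def kernel_for(vci):
    """Return the kernel file name to serve for a vendor-class-identifier."""
    try:
        return KERNEL_BY_VCI[vci]
    except KeyError:
        raise ValueError(f"unknown vendor-class-identifier: {vci!r}")


print(kernel_for("example.rpi-3-b"))  # kernel7.img
```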

Preserving firmware-provided kernel arguments

This is the tricky part. With a standard Raspbian setup, the Raspberry Pi firmware loads the Linux kernel directly (file kernel.img or kernel7.img on the SD-card). In our case, the firmware loads u-boot (compiled as file kernel.img or kernel7.img), and u-boot loads the kernel.

The Raspbian linux kernel is not standard (the repository is here). Because of this, it should be started with device-specific kernel arguments, specifying things such as the position of the DMA range in RAM. Those device-specific kernel arguments are given by the firmware to the kernel, together with user-provided arguments of file cmdline.txt. Consequently, in our setup, the firmware will call u-boot with those arguments. U-boot has to pass these arguments to the kernel, otherwise the kernel will fail to boot.

By the end of 2015, a patch to u-boot had been proposed in order to pass these arguments. But it was never integrated into the mainline repository, and it is very outdated now. However, it pointed me in the right direction. Actually, the firmware passes those kernel arguments to the kernel (or u-boot in our case) by altering node /chosen of the device-tree. Thus, u-boot can retrieve them by reading this device-tree node.

U-boot provides two environment variables related to the device-tree:

  • fdt_addr: the address of the provided dtb (device-tree blob) in RAM.
  • fdt_addr_r: an address that can be used to store a user provided dtb. In a netboot scenario, you will probably download both the kernel and its compatible dtb using TFTP. You will store the downloaded kernel at kernel_addr_r and the downloaded dtb at fdt_addr_r, then call the boot command.

So, back to our issue: we can read the firmware-provided kernel arguments like this:

# tell u-boot to look at the given device-tree
fdt addr $fdt_addr
# read "/chosen" node, property "bootargs", and store in var "given_bootargs"
fdt get value given_bootargs /chosen bootargs

This is exactly what we do here, in the second-stage u-boot script. So now we have firmware-provided kernel arguments in variable given_bootargs.

But actually, we have to process them a little.

When file cmdline.txt is not provided or empty on the SD-card, the firmware will provide default kernel arguments:

  • 3 of these default arguments are suited for an OS installed on the SD-card: root=/dev/mmcblk0p1 rootfstype=ext4 rootwait. This is of course not compatible with a network boot.
  • 1 more argument kgdboc=<something> is a kernel debugging configuration parameter. This can cause the kernel boot to fail if the kernel is not compiled with appropriate support.

We can filter out those parameters by using u-boot regular expression features:

setenv bootargs ""
for arg in "${given_bootargs}"
do
    setexpr rootprefix sub "(root).*" "root" "${arg}"
    if test "$rootprefix" != "root"
    then
        setexpr kgdbprefix sub "(kgdboc).*" "kgdboc" "${arg}"
        if test "$kgdbprefix" != "kgdboc"
        then
            # OK, we can keep this bootarg given by the firmware
            setenv bootargs "${bootargs} ${arg}"
        fi
    fi
done

And we are done. We just have to append our custom parameters for network boot (root=/dev/nfs, nfsroot=..., etc.) and the kernel should boot correctly.
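If the u-boot syntax is hard to follow, the same filtering logic can be sketched in Python. This is only an illustration of what the loop above does (drop every argument starting with root or kgdboc, keep the rest, then append the netboot parameters). The sample firmware arguments are made up for the example, and the function name is mine.

```python
def filter_bootargs(given_bootargs, extra_args=()):
    """Drop root* and kgdboc* arguments, keep the others, append extra_args."""
    kept = [
        arg
        for arg in given_bootargs.split()
        if not arg.startswith(("root", "kgdboc"))
    ]
    return " ".join(kept + list(extra_args))


# Made-up example of what the firmware could provide without cmdline.txt.
firmware_args = (
    "dma.dmachans=0x7f35 bcm2709.boardrev=0xa22082 "
    "root=/dev/mmcblk0p1 rootfstype=ext4 rootwait kgdboc=ttyAMA0,115200"
)

print(filter_bootargs(firmware_args, ["root=/dev/nfs"]))
# dma.dmachans=0x7f35 bcm2709.boardrev=0xa22082 root=/dev/nfs
```

Testing the prefix with startswith() matches the u-boot version: the substitution only yields exactly "root" (or "kgdboc") when the argument begins with that word.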

Raspberry Pi 3b+ native netboot

Raspberry Pi 3b+ model comes with a network boot procedure enabled by default. It is also possible to enable this bootup procedure on Raspberry Pi 3B model, but it is not enabled by default, and I did not test this activation procedure myself (yet). I just tested the Raspberry Pi 3b+ model.

The major plus of using this procedure is that such a node does not need a SD card anymore. And the SD card is the most frequent point of failure on Raspberry Pi boards.

Note that even if the raspberry pi foundation mentions “PXE booting”, the network boot procedure is not really compatible with a standard PXE setup. Actually, the Raspberry Pi board just tries to retrieve using TFTP the same files it usually finds on the SD card: bootcode.bin, start.elf, config.txt, cmdline.txt, dtb and overlay files, kernel or kernel7.img, etc.

ISC DHCPd setup

The tutorial written by the raspberry pi foundation is based on dnsmasq on server side. In WalT we use ISC DHCPd. We could however adapt it easily, just by adding the following code on top of our dhcpd.conf file:

class "rpi-pxe" {
  match if ((binary-to-ascii(16,8,":",substring(hardware,1,3)) = "b8:27:eb") and
            (option vendor-class-identifier = "PXEClient:Arch:00000:UNDI:002001"));
  option vendor-class-identifier "PXEClient";
  option vendor-encapsulated-options "Raspberry Pi Boot";
}

Actually, if the DHCP server does not respond with this vendor option set to value Raspberry Pi Boot, the Raspberry Pi board will consider its network boot procedure is not implemented on server-side and it will abort its network boot.
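For readers not fluent in dhcpd.conf expressions, the match condition of this class can be read as the following Python sketch. It only restates the two tests of the configuration above (MAC prefix and vendor-class-identifier); the function name and the sample MAC addresses are assumptions made for the example.

```python
RPI_OUI = bytes.fromhex("b827eb")                  # MAC prefix tested in the class
RPI_BOOT_VCI = "PXEClient:Arch:00000:UNDI:002001"  # VCI sent by the board firmware


def is_rpi_pxe(mac, vci):
    """Return True if a DHCP request would fall into the "rpi-pxe" class."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    return mac_bytes[:3] == RPI_OUI and vci == RPI_BOOT_VCI


print(is_rpi_pxe("b8:27:eb:12:34:56", RPI_BOOT_VCI))  # True
print(is_rpi_pxe("00:11:22:33:44:55", RPI_BOOT_VCI))  # False
```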

A major bug in DHCP handling

The board firmware has a major bug in the DHCP protocol handling.

A standard DHCP sequence can be summarized as follows:

1. DHCPDISCOVER rpi3b+  -> dhcpd    # Hi, could you allocate an IP for me?
2. DHCPOFFER    dhcpd   -> rpi3b+   # Well... what about 192.168.152.176?
3. DHCPREQUEST  rpi3b+  -> dhcpd    # OK, I take 192.168.152.176!
4. DHCPACK      dhcpd   -> rpi3b+   # OK, noted!

The firmware on the Raspberry Pi 3B+ does not fully follow this procedure. When the board firmware receives the DHCPOFFER message, it stops the negotiation there and immediately starts using the proposed IP (for TFTP transfers). Since the DHCP negotiation is not complete, the DHCP server does not consider this IP allocated and, after a while, it may propose the same IP to another node, leading to major network issues.

The severity of this bug is mitigated by the fact another DHCP request is often sent by the kernel or the init system shortly after this one, if the system boots correctly. Thus, after a few seconds, an IP address should be properly associated to the node.
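The failure mode can be reproduced with a toy model of the server. The sketch below is not ISC DHCPd: it is a deliberately simplified lease pool, written under the assumption that an address is only reserved once the client sends DHCPREQUEST. A real server usually holds an offered address for a short while, which is why the conflict only appears "after a while" in practice. The MAC addresses are made up.

```python
class LeasePool:
    """Toy DHCP server: an address is reserved only at DHCPREQUEST time."""

    def __init__(self, addresses):
        self.free = list(addresses)
        self.leases = {}  # mac -> ip, committed leases only

    def discover(self, mac):
        """DHCPDISCOVER -> DHCPOFFER: propose an address, commit nothing."""
        if mac in self.leases:
            return self.leases[mac]
        return self.free[0]

    def request(self, mac, ip):
        """DHCPREQUEST -> DHCPACK (True) or refusal (False)."""
        if self.leases.get(mac) == ip:
            return True
        if ip not in self.free:
            return False
        self.free.remove(ip)
        self.leases[mac] = ip
        return True


pool = LeasePool(["192.168.152.176", "192.168.152.177"])

# The 3B+ firmware stops after the offer and starts using the address.
rpi_ip = pool.discover("b8:27:eb:00:00:01")

# Another node boots and goes through the complete sequence.
other_ip = pool.discover("00:11:22:33:44:55")
pool.request("00:11:22:33:44:55", other_ip)

print(rpi_ip == other_ip)  # True: both nodes now use 192.168.152.176
```

When the kernel or the init system later performs a complete exchange (discover, then request), the lease is committed and the model no longer hands the address out to anyone else.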

Regarding WalT, after a successful first boot, the problem disappears: when a new node is detected, its IP address is removed from the set of free IPs, and the DHCPd configuration is automatically rebuilt. The new configuration associates the node’s mac address to this IP, and this association remains forever. Because of this, we plan to boot our raspberry pi 3B+ nodes with u-boot on a SD-card, at least once. After this first boot, the node is known and has a dedicated IP, thus the native netboot should work and the SD-card can be removed. However we still have to validate the robustness of this approach in a wider setup, and see if we detect other issues with the firmware.

Using kexec

Since we had issues with early Raspberry Pi compatible versions of u-boot (2012), we tried other network bootloading techniques.

A simple two-step network boot procedure could easily be set up:

  • The SD card would be populated with a minimalistic linux-based operating system.
  • After bootup, this minimalistic linux-based OS would download the target kernel and device tree (TFTP), then update config.txt on the SD-card to target these files, and reboot. (Still, this would require a way to restore config.txt for the next two-step bootup.)

But this has a major drawback. Raspberry Pi boards are robust devices, but the SD card is quite fragile. With frequent writes, its lifetime will usually not exceed one or two years. (And if the mechanism you are trying to implement often writes the same sectors, e.g. the partition table, you may very well trash several SD-cards just in the debugging phase!)

As a result, in WalT, we keep the SD-card read-only. The whole bootup procedure is read-only, and once the final OS is started, it stores file modifications in RAM (through a filesystem union mechanism).

Still, in order to overcome issues with early u-boot versions, we implemented another mechanism, based on kexec. kexec is a feature of the Linux kernel that allows it to load another kernel and switch to it without a hardware reboot. Using this technique, the simple two-step network boot procedure described above can be adapted to avoid writes on the SD-card.

This technique worked well up to (but not including) the Raspberry Pi 2B. From the Raspberry Pi 2B onward, CPUs are multi-core. kexec can only work if a single core is running when it switches to the other kernel. If it were possible to stop the 3 other cores at that point, it should work. But apparently Raspberry Pi CPUs do not provide this CPU hotplug feature. As a result, unless you force boards to use only one core (and that would be a shame), there is apparently no way to make kexec work with those recent models.

Other options

U-boot is not very fun to use. In particular, you cannot provide simple text files as boot scripts: you have to provide u-boot scripts. A u-boot script is just a text file that has been compiled with the mkimage tool (provided by package u-boot-tools in Debian and Ubuntu). mkimage adds a short binary header on top of the text file, with checksums and other information. If you open the u-boot script with your favorite text editor, you can read the textual content after the header, but if you modify it, u-boot will fail to load it again because of the wrong checksum.
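To show why editing the file breaks it, here is a small Python sketch of a header with checksums. It assumes the legacy u-boot image layout as I understand it (a 64-byte big-endian header with a magic number, a header CRC32, a data size and a data CRC32) and it is not a replacement for mkimage: fields such as OS, architecture and image type are left at zero, and the extra size table that real script images carry in front of the text is ignored.

```python
import struct
import zlib

IH_MAGIC = 0x27051956
# 7 x uint32, 4 x uint8, 32-byte name: 64 bytes, big-endian.
HEADER = struct.Struct(">7I4B32s")


def make_image(payload, name=b"boot script"):
    """Prepend a legacy-style header (with both checksums) to payload."""
    # magic, hcrc, time, size, load, ep, dcrc, os, arch, type, comp, name
    fields = [IH_MAGIC, 0, 0, len(payload), 0, 0, zlib.crc32(payload),
              0, 0, 0, 0, name]
    # The header CRC is computed while its own field is still zero.
    fields[1] = zlib.crc32(HEADER.pack(*fields))
    return HEADER.pack(*fields) + payload


def is_valid(image):
    """Check magic, header CRC, size and data CRC, like a loader would."""
    fields = list(HEADER.unpack(image[:HEADER.size]))
    magic, hcrc, size, dcrc = fields[0], fields[1], fields[3], fields[6]
    payload = image[HEADER.size:]
    fields[1] = 0
    return (magic == IH_MAGIC
            and hcrc == zlib.crc32(HEADER.pack(*fields))
            and size == len(payload)
            and dcrc == zlib.crc32(payload))


image = make_image(b"echo hello\n")
edited = image.replace(b"hello", b"HELLO")  # what a text editor would do

print(is_valid(image))   # True
print(is_valid(edited))  # False: the data checksum no longer matches
```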

For network booting a PC, there is another bootloader called ipxe, with amazing features. And grub can also be compiled with netboot features. Both of these are much easier and simpler to use, compared to u-boot.

Actually, given recent additions to these projects, one could imagine chainloading another bootloader after u-boot: u-boot can now provide a UEFI layer, and grub and ipxe both provide an ARM-UEFI version.

Debugging

Last tip: by default, serial line is not activated anymore on 3B and 3B+ models. If you want to use it, add enable_uart=1 in config.txt on the SD card. With an appropriate console=... kernel parameter passed from the firmware (see related subsection above), that should be enough to have bootup traces of the kernel displayed on serial line.

French readers may be interested in my last article published in GNU/Linux Magazine 197 (October 2016). It explains in detail the main tricks I used when I wrote my IOCCC 2015 winning entry.

The source code

Thanks again to Tristan, Pierre, Henry, Timothy, Elodie and Colin. You helped me make the trickiest parts much easier to understand!

The source code of the IOCCC 2015 winning entries has been published recently. My source code is here.

Quick start

You can compile prog.c using gcc -o prog prog.c and start playing with the resulting binary (I tested it on Ubuntu and FreeBSD).

It looks like this:

Running ./prog

This actually demonstrates the interactive mode.

Alternatively, you may specify the input data (e.g. echo 'hello' | ./prog). In this case no prompt is displayed and the rendering starts immediately.

The program uses the braille patterns range of the unicode standard: this makes it possible to treat each terminal character as a tiny 2x4 bitmap.
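As an illustration of this idea (in Python, and unrelated to the obfuscated C code itself), the sketch below converts a bitmap of 4 rows and 2 columns into the corresponding braille character. The braille patterns block starts at U+2800, and each of the 8 dots corresponds to one bit of the offset; the function and table names are mine.

```python
# Bit assigned to each dot, indexed as DOT_BITS[row][column].
# The left column holds dots 1, 2, 3, 7 and the right column dots 4, 5, 6, 8.
DOT_BITS = [
    [0x01, 0x08],
    [0x02, 0x10],
    [0x04, 0x20],
    [0x40, 0x80],
]


def braille_cell(bitmap):
    """Convert a 4-row x 2-column bitmap (truthy = dot set) to one character."""
    code = 0x2800
    for row in range(4):
        for col in range(2):
            if bitmap[row][col]:
                code |= DOT_BITS[row][col]
    return chr(code)


# A diagonal going down from the top-left corner.
print(braille_cell([[1, 0],
                    [0, 1],
                    [1, 0],
                    [0, 1]]))
```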

Obfuscation

The program is obfuscated in various ways. Let me explain the most unusual one.

At some point, the program swaps file descriptors 0 (usually stdin) and 1 (usually stdout). This is achieved using dup() and close() function calls. As a result, functions such as printf(), puts(), write(1, ...) will write to stdin instead of stdout. Depending on the way you start the program, writing to stdin may succeed or not:

     Command line            Stdin is…                       Writing to stdin will…
(A)  $ ./prog                current tty (same as stdout)    succeed!
(B)  $ ./prog < file.txt     file.txt (opened read-only)     fail.
(C)  $ echo test | ./prog    the pipe (opened read-only)     fail.

In the code, the program always tries to print the 2 chars of the interactive prompt, using write(1,"> ",2). It succeeds in case (A) and fails silently in case (B) or (C).

This is how the interactive and non-interactive modes are handled. The program actually always acts the same, but a complex side effect of the file descriptor swap makes it behave differently depending on how you started it.
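The mechanism can be tried safely in Python, without touching the real stdin and stdout. The sketch below is not a translation of prog.c: it swaps two arbitrary descriptors (using dup2() for brevity, where the C program uses dup() and close()), and it uses the two ends of a pipe to stand for a writable and a read-only descriptor. The function names are mine.

```python
import os


def swap_fds(a, b):
    """Make descriptor a refer to what b referred to, and vice versa."""
    tmp = os.dup(a)
    os.dup2(b, a)
    os.dup2(tmp, b)
    os.close(tmp)


def try_prompt(fd):
    """Try to write the 2-char prompt on fd; tell whether it worked."""
    try:
        os.write(fd, b"> ")
        return True
    except OSError:
        return False  # e.g. descriptor opened read-only


read_end, write_end = os.pipe()
print(try_prompt(write_end))  # True
print(try_prompt(read_end))   # False: fails silently, like cases (B) and (C)

swap_fds(read_end, write_end)
print(try_prompt(read_end))   # True: this number now designates the write end
```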

What’s next?

For more details about secret features or obfuscation aspects you can check out the hints file which compiles judges’ comments and my own explanations.

I also recommend looking at the other winning entries: some of the authors have been very imaginative!

These last few years, I have been working on an experimentation platform called WalT. From the user point of view, a WalT platform is a set of remotely managed nodes where you can deploy an operating system (so-called WalT image), run a distributed experiment, and collect experiment logs.

WalT architecture

WalT relies on docker for easy packaging, modification, and sharing (on the docker hub) of WalT images. The sharing aspect makes experiments easily reproducible. Nodes are low cost single-board-computers (Raspberry Pi B/B+; support for other hardware should come soon). The WalT platform is itself very easily reproducible (it is made of low cost components, free and open source software, and installation is automated). You can set up your own in a few minutes.

You can get more info on the WalT website. We presented WalT at a seminar recently; our talk material (pdf file and two videos) is available on the resources page.

The International Obfuscated C Code Contest (IOCCC) started in 1984. Candidates have to propose a C program doing something interesting in the most obfuscated way possible. (There are a few rules to follow, most notably limits on the source code size.)

I first heard about this contest many years ago, I think it was around 2003, because Fabrice Bellard (author of qemu, tcc, some maths-related things, etc.) mentions on his website he won it in 2002. Looking at the winning entries, I was really impressed.

This year, I decided to give it a try myself. And this was apparently a good idea!

The source code and related comments of the winning entries for 2015 are not published yet. They should be published in a few weeks.