Raspberry Pi network boot: the whole story.

I have been working on the WalT project for more than six years now. One of its trickiest tasks is network booting Raspberry Pi boards. This morning I discovered yet another interesting detail, so I decided to share my experience on the topic.

U-boot

U-boot is a bootloader with network support, so it can be used to boot Raspberry Pi boards over a network. While early Raspberry Pi compatible versions were quite buggy (we first used it in 2012), it is now a reliable solution.

Two-stage booting procedure, for easier maintenance

In WalT, we have a two-stage booting procedure: a first u-boot script is embedded on the SD card, and a second one (see this file, after the SCRIPT_START tag) is embedded in each WalT operating system image. In a more standard setup, you would put this second script on the server, inside the TFTP directory. As you may guess, the first script retrieves the second one (through TFTP) and executes it.

When you have to handle several dozen nodes, such a two-stage procedure greatly reduces maintenance: the first script is very simple, so nearly all maintenance concerns the second one. As a result, you can modify the boot procedure of all nodes at once by editing this second script on the server.

Device identification

In the first boot script, however, you can notice something interesting:

we detect which Raspberry Pi model we are currently running on
we set a variable bootp_vci accordingly

VCI stands for “vendor-class-identifier”. U-boot sets the VCI field of its DHCP request from this variable. On the remote end, the DHCP server (isc-dhcpd in our case) can use this value to point the node to a compatible kernel version (cf. Raspbian’s kernel7.img file for Raspberry Pi 2 and 3, and kernel.img for earlier models).
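The server-side decision can be pictured with a small Python sketch. This is not how isc-dhcpd is configured; it only illustrates the mapping from VCI to kernel file. The VCI strings and the function name are hypothetical: the real values are whatever the first-stage script stores in bootp_vci.

```python
# Hypothetical VCI strings, chosen only for illustration.
KERNEL_BY_VCI = {
    "rpi-b": "kernel.img",        # earlier models
    "rpi-b-plus": "kernel.img",
    "rpi-2-b": "kernel7.img",     # Raspberry Pi 2 and 3
    "rpi-3-b": "kernel7.img",
}


def kernel_for_vci(vci):
    """Return the kernel file name matching a VCI, or None if it is unknown."""
    return KERNEL_BY_VCI.get(vci)
```

Returning None for an unknown VCI lets the caller decide whether to ignore the request or fall back to a default.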

Preserving firmware-provided kernel arguments

This is the tricky part. With a standard Raspbian setup, the Raspberry Pi firmware loads the Linux kernel directly (file kernel.img or kernel7.img on the SD card). In our case, the firmware loads u-boot (compiled as file kernel.img or kernel7.img), and u-boot loads the kernel.

The Raspbian Linux kernel is not a mainline kernel (the repository is here). Because of this, it should be started with device-specific kernel arguments, specifying things such as the position of the DMA range in RAM. The firmware gives those device-specific arguments to the kernel, together with the user-provided arguments of file cmdline.txt. Consequently, in our setup, the firmware calls u-boot with those arguments. U-boot has to pass them on to the kernel, otherwise the kernel will fail to boot.

By the end of 2015, a patch to u-boot had been proposed to pass these arguments. It was never merged into the mainline repository, and it is now very outdated. However, it pointed me in the right direction. The firmware actually passes those kernel arguments to the kernel (or to u-boot in our case) by altering node /chosen of the device-tree. Thus, u-boot can retrieve them by reading this device-tree node.

U-boot provides two environment variables related to the device-tree:

fdt_addr: the address of the provided dtb (device-tree blob) in RAM.
fdt_addr_r: an address that can be used to store a user provided dtb. In a netboot scenario, you will probably download both the kernel and its compatible dtb using TFTP. You will store the downloaded kernel at kernel_addr_r and the downloaded dtb at fdt_addr_r, then call the boot command.

So, back to our issue: we can read the firmware-provided kernel arguments like this:

# tell u-boot to look at the given device-tree
fdt addr $fdt_addr
# read "/chosen" node, property "bootargs", and store in var "given_bootargs"
fdt get value given_bootargs /chosen bootargs

This is exactly what we do here, in the second-stage u-boot script. So now we have firmware-provided kernel arguments in variable given_bootargs.

But actually, we have to process them a little.

When file cmdline.txt is not provided or empty on the SD-card, the firmware will provide default kernel arguments:

3 of these default arguments are suited for an OS installed on the SD-card: root=/dev/mmcblk0p1 rootfstype=ext4 rootwait. This is of course not compatible with a network boot.
1 more argument kgdboc=<something> is a kernel debugging configuration parameter. This can cause the kernel boot to fail if the kernel is not compiled with appropriate support.

We can filter out those parameters by using u-boot regular expression features:

setenv bootargs ""
for arg in "${given_bootargs}"
do
    setexpr rootprefix sub "(root).*" "root" "${arg}"
    if test "$rootprefix" != "root"
    then
        setexpr kgdbprefix sub "(kgdboc).*" "kgdboc" "${arg}"
        if test "$kgdbprefix" != "kgdboc"
        then
            # OK, we can keep this bootarg given by the firmware
            setenv bootargs "${bootargs} ${arg}"
        fi
    fi
done

And we are done. We just have to append our custom parameters for network boot (root=/dev/nfs, nfsroot=..., etc.) and the kernel should boot correctly.
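To make the filtering logic easier to follow, here is a Python sketch of the same steps. It mirrors the u-boot loop above rather than replacing it. The function names and the ip=dhcp parameter are illustrative assumptions; the NFS server address and export path are hypothetical inputs.

```python
import re


def filter_firmware_bootargs(given_bootargs):
    """Drop the root* and kgdboc* arguments, keep everything else."""
    kept = []
    for arg in given_bootargs.split():
        # Same test as the setexpr/sub calls: replace "root<anything>" with
        # "root"; if the result is exactly "root", the argument started with it.
        if re.sub(r"(root).*", "root", arg, count=1) == "root":
            continue
        if re.sub(r"(kgdboc).*", "kgdboc", arg, count=1) == "kgdboc":
            continue
        kept.append(arg)
    return kept


def build_netboot_cmdline(given_bootargs, nfs_server, nfs_path):
    """Append network-boot parameters to the filtered firmware arguments."""
    # nfs_server and nfs_path are hypothetical values supplied by the caller.
    netboot_args = ["root=/dev/nfs", f"nfsroot={nfs_server}:{nfs_path}", "ip=dhcp"]
    return " ".join(filter_firmware_bootargs(given_bootargs) + netboot_args)
```

Note that an argument such as nfsroot=... is not removed by the first test: the substitution turns it into "nfsroot", which differs from "root". This is why the custom parameters can safely be appended after filtering.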

Raspberry Pi 3B+ native netboot

The Raspberry Pi 3B+ model comes with a network boot procedure enabled by default. It is also possible to enable this procedure on the Raspberry Pi 3B model, where it is disabled by default, but I have not tested that activation procedure myself (yet). I only tested the Raspberry Pi 3B+ model.

The major advantage of this procedure is that such a node no longer needs an SD card. And the SD card is the most frequent point of failure on Raspberry Pi boards.

Note that even though the Raspberry Pi Foundation mentions “PXE booting”, the network boot procedure is not really compatible with a standard PXE setup. The board simply tries to retrieve over TFTP the same files it usually finds on the SD card: bootcode.bin, start.elf, config.txt, cmdline.txt, dtb and overlay files, kernel.img or kernel7.img, etc.
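Since the board fetches these files by name, a quick check of the TFTP directory can save some debugging time. The following Python sketch only looks for the files named above; the function name is hypothetical, and the exact list a given firmware version requests may differ (dtb and overlay files are not checked here).

```python
from pathlib import Path

# Files named in the text above; not an authoritative list.
EXPECTED_FILES = ["bootcode.bin", "start.elf", "config.txt", "cmdline.txt"]
KERNEL_FILES = ["kernel.img", "kernel7.img"]


def missing_boot_files(tftp_root):
    """Return the expected boot files that are absent from tftp_root."""
    root = Path(tftp_root)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    # At least one of the two kernel files should be present.
    if not any((root / name).is_file() for name in KERNEL_FILES):
        missing.append("kernel.img or kernel7.img")
    return missing
```

An empty result means every file in this short list was found; it does not prove that the board will boot.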

ISC DHCPd setup

The tutorial written by the Raspberry Pi Foundation is based on dnsmasq on the server side. In WalT we use ISC DHCPd. We could adapt it easily, however, just by adding the following code at the top of our dhcpd.conf file:

class "rpi-pxe" {
  match if ((binary-to-ascii(16,8,":",substring(hardware,1,3)) = "b8:27:eb") and
            (option vendor-class-identifier = "PXEClient:Arch:00000:UNDI:002001"));
  option vendor-class-identifier "PXEClient";
  option vendor-encapsulated-options "Raspberry Pi Boot";
}

Actually, if the DHCP server does not respond with this vendor option set to the value Raspberry Pi Boot, the board will consider that its network boot procedure is not implemented on the server side, and it will abort its network boot.

A major bug in DHCP handling

The board firmware has a major bug in the DHCP protocol handling.

A standard DHCP sequence can be summarized as follows:

1. DHCPDISCOVER rpi3b+  -> dhcpd    # Hi, could you allocate an IP for me?
2. DHCPOFFER    dhcpd   -> rpi3b+   # Well... what about 192.168.152.176?
3. DHCPREQUEST  rpi3b+  -> dhcpd    # OK, I take 192.168.152.176!
4. DHCPACK      dhcpd   -> rpi3b+   # OK, noted!

The firmware on the Raspberry Pi 3B+ does not fully follow this procedure. When the board firmware receives the DHCPOFFER message, it stops the negotiation there and immediately starts using the proposed IP (for TFTP transfers). Since the DHCP negotiation is not complete, the DHCP server does not consider this IP as allocated and, after a while, it may propose the same IP to another node, leading to major network issues.

The severity of this bug is mitigated by the fact that another DHCP request is often sent by the kernel or the init system shortly after this one, if the system boots correctly. Thus, after a few seconds, an IP address should be properly associated with the node.

Regarding WalT, after a successful first boot, the problem disappears: when a new node is detected, its IP address is removed from the set of free IPs, and the DHCPd configuration is automatically rebuilt. The new configuration associates the node’s MAC address with this IP, and this association remains forever. Because of this, we plan to boot our Raspberry Pi 3B+ nodes with u-boot on an SD card, at least once. After this first boot, the node is known and has a dedicated IP, so the native netboot should work and the SD card can be removed. However, we still have to validate the robustness of this approach in a wider setup, and see whether we detect other issues with the firmware.
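The failure mode is easier to see with a toy model of a DHCP address pool. The Python sketch below is a simplified assumption of how a server tracks offers and leases, not a description of ISC DHCPd internals; the class name, the timeout value and the MAC addresses are hypothetical.

```python
class ToyDhcpServer:
    """Toy address pool: an offer only becomes a lease after DHCPREQUEST."""

    def __init__(self, addresses, offer_timeout=60):
        self.free = list(addresses)
        self.offers = {}   # mac -> (ip, expiry time)
        self.leases = {}   # mac -> ip
        self.offer_timeout = offer_timeout

    def _expire_offers(self, now):
        # Offers that were never confirmed go back to the free pool.
        for mac, (ip, expiry) in list(self.offers.items()):
            if now >= expiry:
                del self.offers[mac]
                self.free.insert(0, ip)

    def discover(self, mac, now):
        """Handle DHCPDISCOVER and return the offered IP (DHCPOFFER)."""
        self._expire_offers(now)
        if mac in self.leases:
            return self.leases[mac]
        if mac in self.offers:
            return self.offers[mac][0]
        ip = self.free.pop(0)
        self.offers[mac] = (ip, now + self.offer_timeout)
        return ip

    def request(self, mac, ip, now):
        """Handle DHCPREQUEST and return True when the lease is acknowledged."""
        self._expire_offers(now)
        if self.offers.get(mac, (None, 0))[0] != ip:
            return False
        del self.offers[mac]
        self.leases[mac] = ip
        return True
```

In this model, a client that calls discover() and then never calls request() keeps using an address the server still regards as free once the offer expires. A later discover() from another MAC address can then receive the same IP, which is the conflict described above.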

Using kexec

Since we had issues with early Raspberry Pi compatible versions of u-boot (2012), we tried other network bootloading techniques.

A simple two-step network bootloading procedure could easily be set up:

The SD card would be populated with a minimalistic Linux-based operating system.
After bootup, this minimalistic Linux-based OS would download the target kernel and device tree (TFTP), then update config.txt on the SD card to target these files, and reboot. (Still, this would require a way to restore config.txt for the next two-step bootup.)

But this has a major drawback. Raspberry Pi boards are robust devices, but the SD card is quite fragile. With frequent writes, its lifetime will usually not exceed one or two years. (And if the mechanism you are trying to implement often writes the same sectors, e.g. the partition table, you may very well trash several SD cards in the debugging phase alone!)

As a result, in WalT, we keep the SD-card read-only. The whole bootup procedure is read-only, and once the final OS is started, it stores file modifications in RAM (through a filesystem union mechanism).

Still, in order to overcome issues with early u-boot versions, we implemented another mechanism, based on kexec. kexec is a feature of the Linux kernel that allows it to load another kernel and switch to it without a hardware reboot. Using this technique, the simple two-step network bootloading procedure described above can be adapted to avoid writes on the SD card.

This technique worked well on every model before the Raspberry Pi 2B. From the Raspberry Pi 2B onward, CPUs are multicore, and kexec can only work if a single core is running when it switches to the other kernel. If it were possible to stop the 3 other cores at that point, it should work. But apparently Raspberry Pi CPUs do not provide this CPU hotplug feature. As a result, unless you force boards to use only one core (and that would be a shame), there is apparently no way to make kexec work with those recent models.

Other options

U-boot is not very fun to use. In particular, you cannot provide simple text files as boot scripts: you have to provide u-boot scripts. A u-boot script is just a text file that has been wrapped with the mkimage tool (provided by package u-boot-tools in Debian and Ubuntu). mkimage adds a short binary header on top of the text file, with a checksum and other information. If you open the u-boot script with your favorite text editor, you can read the textual content after the header, but if you modify it, u-boot will refuse to load it because the checksum no longer matches.
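The checksum issue can be illustrated with a Python sketch of the header. The layout below (64-byte big-endian header, magic number, CRC32 fields, script type code) is my understanding of the legacy u-boot image format and should be treated as an assumption; the function names are hypothetical, and the output is not claimed to be byte-identical to what mkimage produces.

```python
import struct
import zlib

# Assumed legacy header: 7 big-endian uint32, 4 uint8, 32-byte name = 64 bytes.
HEADER = struct.Struct(">7I4B32s")
IH_MAGIC = 0x27051956
IH_OS_LINUX, IH_ARCH_ARM, IH_TYPE_SCRIPT, IH_COMP_NONE = 5, 2, 6, 0


def wrap_script(text, name=b"boot script"):
    """Prepend a header carrying CRC32 checksums to the script text."""
    # Script payload: a table of sizes terminated by 0, then the text itself.
    data = struct.pack(">II", len(text), 0) + text
    fields = [IH_MAGIC, 0, 0, len(data), 0, 0, zlib.crc32(data),
              IH_OS_LINUX, IH_ARCH_ARM, IH_TYPE_SCRIPT, IH_COMP_NONE, name]
    # The header CRC is computed with its own field set to zero.
    fields[1] = zlib.crc32(HEADER.pack(*fields))
    return HEADER.pack(*fields) + data


def checksums_ok(image):
    """Recompute both CRCs and compare them with the stored values."""
    fields = list(HEADER.unpack_from(image))
    data = image[HEADER.size:HEADER.size + fields[3]]
    stored_header_crc, fields[1] = fields[1], 0
    header_ok = zlib.crc32(HEADER.pack(*fields)) == stored_header_crc
    return fields[0] == IH_MAGIC and header_ok and zlib.crc32(data) == fields[6]
```

Changing even one character of the text after the header makes the data CRC mismatch, which is the situation u-boot rejects. In practice you edit the plain text source and regenerate the script, for example with mkimage -A arm -O linux -T script -C none -d boot.txt boot.scr (the file names here are placeholders).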

For network booting a PC, there is another bootloader called ipxe, with amazing features. And grub can also be compiled with netboot features. Both of these are much easier and simpler to use, compared to u-boot.

Actually, given recent additions to these projects, one could imagine chainloading another bootloader after u-boot: u-boot can now provide a UEFI layer, and grub and ipxe both provide an ARM-UEFI version.

Debugging

Last tip: the serial line is no longer enabled by default on 3B and 3B+ models. If you want to use it, add enable_uart=1 to config.txt on the SD card. With an appropriate console=... kernel parameter passed from the firmware (see the related subsection above), that should be enough to have the kernel's boot traces displayed on the serial line.


October 22, 2018


Etienne Dublé
