In this post I describe how to finish setting up the software on the Raspberry Pi B+ so the attached Pi Zero nodes can boot entirely from the USB interface. This is a rather complicated procedure and I would greatly appreciate feedback from anyone who can confirm whether I got all the details correct. Before making snapshots of the root filesystem, we first move the home directory to a separate subvolume so it doesn't appear in the individual root snapshots. This can be done with the following commands
# cd /x
# btrfs sub create snaa
# mv /home/* snaa
# rmdir /home
# ln -s /x/snaa /home
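A quick sanity check after the move can be sketched as follows. The helper name check_link is my own invention and not part of any tool; it only confirms that a path is a symbolic link pointing at the expected target.

```shell
#!/bin/sh
# check_link is a hypothetical helper: succeed only if $1 is a symbolic
# link whose target is exactly $2.
check_link() {
    [ -L "$1" ] && [ "$(readlink "$1")" = "$2" ]
}

# Example use on the B+ (assumes the commands above have been run):
#   check_link /home /x/snaa && echo "home is on its own subvolume"
```

The function prints nothing itself and reports the result through its exit status, so it can be used in an if statement or with && as in the commented example.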
Now, add the following lines to /etc/hosts, listing the IP addresses of the B+ and the Zeros on the bridge and the Ethernet gadgets over the USB network.
192.168.7.2 snail
192.168.7.33 s0
192.168.7.37 s1
192.168.7.41 s2
192.168.7.45 s3
192.168.7.49 s4
Since the cluster is small and because the hosts files on the server are automatically replicated with the BTRFS copy-on-write snapshot technique used to create the root file systems for each of the Zeros, it is easy to resolve the node names and corresponding IP numbers using files. For a cluster much larger than 50 nodes, it may be better to set up a BIND name server instead.
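The five node addresses above step by four starting at 33, so the entries can also be generated rather than typed. Here is a minimal sketch; the function name gen_hosts and the assumption that node i always gets 192.168.7.(33 + 4*i) are mine, inferred only from the list above.

```shell
#!/bin/sh
# Print the hosts entries for the server and for nodes s0..s(N-1).
# Assumption: node i has address 192.168.7.(33 + 4*i), which is the
# pattern of the five addresses listed above.
gen_hosts() {
    count=$1
    echo "192.168.7.2 snail"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "192.168.7.$((33 + 4 * i)) s$i"
        i=$((i + 1))
    done
}

gen_hosts 5
```

The output goes to standard output only, so it can be inspected first and then appended to /etc/hosts by hand.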
Next, modify /etc/rc.local to conditionally create the bridge device and start rpiboot if running on the B+, or to set the MTU of the Ethernet gadget if running on a Zero. To do this, add the following lines just before the "exit 0" line at the end of the file
case "$(hostname)" in
snail*)
    # On the B+: create the bridge and start rpiboot in the background.
    echo Loading san bridge device...
    ip link add name san type bridge
    ip link set san up
    echo Starting rpiboot to boot nodes...
    /usr/bin/rpiboot -m 500000 -l -d /x/sboot -o \
        >>/var/log/rpiboot.log &
    ;;
s[0-4])
    # On a Zero: raise the MTU of the Ethernet gadget.
    echo Setting usb0 mtu to 7418
    ip link set usb0 mtu 7418
    ;;
esac
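Because the same rc.local ends up in every snapshot, it is worth checking which branch a given hostname would take. The sketch below uses the same two patterns; the function name role_for and the labels it prints are made up for illustration.

```shell
#!/bin/sh
# role_for is a hypothetical helper that applies the same patterns as
# the rc.local fragment above and prints which branch would run.
role_for() {
    case "$1" in
        snail*) echo server ;;
        s[0-4]) echo node ;;
        *)      echo none ;;
    esac
}

for h in snail s0 s4 s5 raspberrypi; do
    echo "$h -> $(role_for "$h")"
done
```

A case pattern has to match the whole string, so s[0-4] covers only the five node names; a hostname such as s5 or raspberrypi matches neither branch and rc.local does nothing extra on that machine.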
Next, configure /etc/dhcpcd.conf so it assigns an IP number to the bridge but doesn't mess with the Ethernet gadgets. Do this by adding
denyinterfaces usb0 usb1 usb2 usb3 usb4 usb5 s0 s1 s2 s3 s4
as the first line of the file. A udev rule will be used to name the Ethernet gadgets s0, s1 and so forth; however, we also include usb0, usb1 and so forth in the denyinterfaces line to avoid a possible race condition. At the end of the same file add the lines
interface san
static ip_address=192.168.7.2/24
to assign an IP number to the bridge device.
Create a udev rules file /etc/udev/rules.d/70-gadget.rules to identify which Zero is which by MAC address and name the corresponding Ethernet gadgets accordingly. The file should look like
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="02:34:33:3c:50:22", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="usb*", NAME="s0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="02:34:33:3c:50:26", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="usb*", NAME="s1"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="02:34:33:3c:50:2a", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="usb*", NAME="s2"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="02:34:33:3c:50:2e", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="usb*", NAME="s3"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="02:34:33:3c:50:32", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="usb*", NAME="s4"
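Since the five rules differ only in the last byte of the MAC address and the interface name, they can be generated with a loop. This is a sketch under the assumption that node i always has the address 02:34:33:3c:50:XX with XX = 0x22 + 4*i, which is simply the pattern of the rules above; the function name gen_rules is hypothetical.

```shell
#!/bin/sh
# Print one udev rule per node. Assumption: node i has the MAC address
# 02:34:33:3c:50:XX with XX = 0x22 + 4*i (0x22 is 34 in decimal),
# matching the five rules above.
gen_rules() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        mac=$(printf '02:34:33:3c:50:%02x' $((34 + 4 * i)))
        printf 'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="%s", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="usb*", NAME="s%d"\n' "$mac" "$i"
        i=$((i + 1))
    done
}

gen_rules 5
```

Redirecting the output to a scratch file and comparing it with the hand-written rules is a cheap way to catch a mistyped MAC address.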
Now, create a file /etc/network/interfaces.d/zeros to configure the gadgets on the B+ by adding them to the bridge. The file should look like
allow-hotplug s0 s1 s2 s3 s4
iface s0 inet manual
up ip link set s0 up
post-up ip link set s0 mtu 7418
post-up ip link set s0 master san
iface s1 inet manual
up ip link set s1 up
post-up ip link set s1 mtu 7418
post-up ip link set s1 master san
iface s2 inet manual
up ip link set s2 up
post-up ip link set s2 mtu 7418
post-up ip link set s2 master san
iface s3 inet manual
up ip link set s3 up
post-up ip link set s3 mtu 7418
post-up ip link set s3 master san
iface s4 inet manual
up ip link set s4 up
post-up ip link set s4 mtu 7418
post-up ip link set s4 master san
Note that the MTU is set at both ends of the USB Ethernet gadget. A jumbo packet size of 7418 was chosen; even this conservative value increases throughput by about a factor of ten under load. I may explore further tuning in subsequent posts.
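One way to confirm that jumbo packets really pass end to end, once the nodes are up, is to ping with fragmentation prohibited and a payload that exactly fills the MTU. An IPv4 header without options is 20 bytes and an ICMP echo header is 8 bytes, so the payload is the MTU minus 28. The helper name icmp_payload below is hypothetical, and the commented ping line assumes the iputils version of ping.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one IPv4 packet of the given
# MTU: subtract the 20-byte IPv4 header (no options) and the 8-byte
# ICMP header.
icmp_payload() {
    echo $(($1 - 28))
}

icmp_payload 7418

# On the B+, assuming the iputils version of ping and a booted s0:
#   ping -M do -c 3 -s "$(icmp_payload 7418)" s0
```

For an MTU of 7418 this gives a payload of 7390 bytes. If either end of the link were still at a smaller MTU, a ping of that size with fragmentation prohibited should fail rather than silently fragment.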
Next, install the NFS server using
# apt-get install nfs-kernel-server
and configure the /etc/exports file to allow each Zero to mount its root file system (/x/s0, /x/s1 and so forth) and the home directories, which are now in /x/snaa. The resulting exports file should look like
/x/s0 192.168.7.33(rw,no_root_squash,async,no_subtree_check)
/x/snaa 192.168.7.33(rw,no_root_squash,async,no_subtree_check)
/x/s1 192.168.7.37(rw,no_root_squash,async,no_subtree_check)
/x/snaa 192.168.7.37(rw,no_root_squash,async,no_subtree_check)
/x/s2 192.168.7.41(rw,no_root_squash,async,no_subtree_check)
/x/snaa 192.168.7.41(rw,no_root_squash,async,no_subtree_check)
/x/s3 192.168.7.45(rw,no_root_squash,async,no_subtree_check)
/x/snaa 192.168.7.45(rw,no_root_squash,async,no_subtree_check)
/x/s4 192.168.7.49(rw,no_root_squash,async,no_subtree_check)
/x/snaa 192.168.7.49(rw,no_root_squash,async,no_subtree_check)
Note that the no_root_squash flag is essential for the root file systems, and we have also included it for the home file system. The async and no_subtree_check options have been added for performance reasons.
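The exports entries follow the same per-node pattern as the hosts file, so they too can be generated. This is only a sketch: the function name gen_exports is made up, and it assumes node i has address 192.168.7.(33 + 4*i) as in the hosts entries.

```shell
#!/bin/sh
# Print the exports entries for nodes s0..s(N-1). Assumption: node i
# has address 192.168.7.(33 + 4*i), as in the hosts file.
gen_exports() {
    count=$1
    opts="rw,no_root_squash,async,no_subtree_check"
    i=0
    while [ "$i" -lt "$count" ]; do
        ip="192.168.7.$((33 + 4 * i))"
        echo "/x/s$i $ip($opts)"
        echo "/x/snaa $ip($opts)"
        i=$((i + 1))
    done
}

gen_exports 5
```

After editing /etc/exports on a running server, "exportfs -ra" should make the NFS server re-read the file without a reboot.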
We are now ready to create the root filesystems that the Zeros will mount over NFS. This will be done using the same copy-on-write snapshots that were used for creating the individual boot directories. We emphasize that the copy-on-write semantics imply that only one copy of the root filesystem will be stored on the SD card, even though there logically appear to be five additional copies, one for each Pi Zero; data is duplicated only when a snapshot later changes it. Since the fstab of each Zero will be different from that of the B+, we create the file /x/sproto/fstab to look like
LABEL=BOOT /boot vfat nofail 0 2
/dev/nfs / nfs noatime 1 1
LABEL=SWAP none swap sw,nofail 0 0
LABEL=SCRATCH /x/scratch ext4 nofail 0 2
snail:/x/snaa /x/snaa nfs vers=3,noacl,async,bg 0 0
The nofail option has been included so that the Pi Zeros boot whether or not they have a suitably formatted SD card. Note that, at this point, there are no SD cards present in any of the Pi Zeros.
Finally, we describe the script /x/supdate that creates and updates the root file systems for the Zeros using a snapshot of the current root file system of the B+. This script reads
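#!/bin/bash
# Run from /x: the subvolume names below are relative paths.
for i in s0 s1 s2 s3 s4
do
    echo "Configuring $i..."
    # Replace any previous snapshot with a fresh one of the current root.
    btrfs sub del "$i"
    btrfs sub snap / "$i"
    # Recreate the scratch mount point inside the snapshot.
    (
        if cd "$i/x"
        then
            rmdir scratch
            mkdir scratch
        fi
    )
    echo "$i" >"$i/etc/hostname"
    cp /x/sproto/fstab "$i/etc/fstab"
    rm "$i/etc/exports"
done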
Note that the new fstab is copied into each snapshot and the exports file is deleted. The script is complicated by the "rmdir scratch; mkdir scratch" sequence of commands which, for reasons I don't understand, seems necessary for creating a valid mount point for the SD card later. Run the script as
# cd /x
# ./supdate
Configuring s0...
ERROR: cannot access subvolume s0: No such file or directory
Create a snapshot of '/' in './s0'
Configuring s1...
ERROR: cannot access subvolume s1: No such file or directory
Create a snapshot of '/' in './s1'
Configuring s2...
ERROR: cannot access subvolume s2: No such file or directory
Create a snapshot of '/' in './s2'
Configuring s3...
ERROR: cannot access subvolume s3: No such file or directory
Create a snapshot of '/' in './s3'
Configuring s4...
ERROR: cannot access subvolume s4: No such file or directory
Create a snapshot of '/' in './s4'
As when creating the boot subvolumes, the error messages can be ignored. They will not appear when the script is run again. Every time we change or update the root file system on the B+ using, for example, the commands "apt-get update; apt-get upgrade", the corresponding snapshots in /x/s0, /x/s1 and so forth will need to be updated using the supdate script. Note, however, that the snapshots should not be updated while the Zeros are booted. We will add additional scripts in a subsequent post that automatically halt the Pi Zeros before performing such an update.
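Until then, a rough pre-check can at least refuse to update while any node still answers on the network. The sketch below is an assumption-laden stopgap and not the halt scripts mentioned above: the names PROBE, nodes_up and safe_to_update are my own, the default probe uses iputils ping syntax, and a node that fails to answer a ping is not guaranteed to be halted.

```shell
#!/bin/sh
# PROBE is the command used to test whether a node answers; the default
# sends a single ping and waits at most one second (iputils syntax).
PROBE=${PROBE:-"ping -c 1 -W 1"}

# Print the name of every node in the argument list that answers.
nodes_up() {
    for n in "$@"; do
        if $PROBE "$n" >/dev/null 2>&1; then
            echo "$n"
        fi
    done
}

# Succeed only when no node answers, so it is safe to run ./supdate.
safe_to_update() {
    [ -z "$(nodes_up "$@")" ]
}

# Example use on the B+:
#   safe_to_update s0 s1 s2 s3 s4 && (cd /x && ./supdate)
```

Keeping the probe command in a variable makes it easy to swap in something stricter later, for example a check over ssh.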
At this point, it should be possible to bring the entire cluster up by rebooting the B+ with the command
# /sbin/reboot
After doing this, you can check that the nodes booted by examining the log file /var/log/rpiboot.log and checking the network status. For reference, ifconfig should report something like
root@snail:/etc/ssh# /sbin/ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.46.34 netmask 255.255.255.0 broadcast 192.168.46.255
inet6 fe80::6fdf:e7e8:a7cc:c6ba prefixlen 64 scopeid 0x20<link>
ether b8:27:eb:0b:0d:c2 txqueuelen 1000 (Ethernet)
RX packets 31301 bytes 2044970 (1.9 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3889 bytes 506946 (495.0 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1 (Local Loopback)
RX packets 171 bytes 29500 (28.8 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 171 bytes 29500 (28.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 7418
inet6 fe80::34:33ff:fe3c:5022 prefixlen 64 scopeid 0x20<link>
ether 02:34:33:3c:50:22 txqueuelen 1000 (Ethernet)
RX packets 19536 bytes 2173734 (2.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 31902 bytes 37517334 (35.7 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
s1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 7418
inet6 fe80::34:33ff:fe3c:5026 prefixlen 64 scopeid 0x20<link>
ether 02:34:33:3c:50:26 txqueuelen 1000 (Ethernet)
RX packets 31298 bytes 3124166 (2.9 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 52993 bytes 68332352 (65.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
s2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 7418
inet6 fe80::34:33ff:fe3c:502a prefixlen 64 scopeid 0x20<link>
ether 02:34:33:3c:50:2a txqueuelen 1000 (Ethernet)
RX packets 20024 bytes 2191754 (2.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 30993 bytes 37127532 (35.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 7418
inet6 fe80::34:33ff:fe3c:502e prefixlen 64 scopeid 0x20<link>
ether 02:34:33:3c:50:2e txqueuelen 1000 (Ethernet)
RX packets 20349 bytes 2207350 (2.1 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 31348 bytes 36773794 (35.0 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
s4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 7418
inet6 fe80::34:33ff:fe3c:5032 prefixlen 64 scopeid 0x20<link>
ether 02:34:33:3c:50:32 txqueuelen 1000 (Ethernet)
RX packets 20014 bytes 2184902 (2.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 31372 bytes 36539786 (34.8 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
san: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 7418
inet 192.168.7.2 netmask 255.255.255.0 broadcast 192.168.7.255
inet6 fe80::f0a9:37ff:fe8a:f827 prefixlen 64 scopeid 0x20<link>
ether 02:34:33:3c:50:22 txqueuelen 1000 (Ethernet)
RX packets 1056152 bytes 207951908 (198.3 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 974310 bytes 866505756 (826.3 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
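Rather than reading through the ifconfig output by eye, the MTU of each interface can be compared against the expected value using the files the kernel provides under /sys/class/net. The following is a sketch; the names SYSNET and mtu_mismatch are hypothetical, and SYSNET is a variable only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# SYSNET is where the kernel exposes per-interface attributes; it is a
# variable here only so the function can be tried on a scratch directory.
SYSNET=${SYSNET:-/sys/class/net}

# Print every listed interface whose MTU is missing or differs from $1.
mtu_mismatch() {
    want=$1
    shift
    for ifc in "$@"; do
        got=$(cat "$SYSNET/$ifc/mtu" 2>/dev/null)
        if [ "$got" != "$want" ]; then
            echo "$ifc: mtu ${got:-unknown}, expected $want"
        fi
    done
}

# Example use on the B+:
#   mtu_mismatch 7418 san s0 s1 s2 s3 s4
```

No output means every listed interface exists and has the expected MTU; an interface reported as unknown most likely belongs to a Zero that did not boot.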