ejolson wrote: ↑Thu Oct 13, 2022 4:19 pm
Compared to the $100 bill of materials listed in the first post of this thread
viewtopic.php?p=1246773#p1246773
that configuration costs significantly more. However, eight Pi 4B computers networked through gigabit Ethernet may be useful for practical applications as well as for learning.
I spent a few minutes checking prices under the assumption there were no Pi-related supply-chain problems and came up with an estimated budget of
Code: Select all
Raspberry Pi 4B 4GB          8 x 55 = 440
Raspberry Pi POE+ Hat        8 x 20 = 160
Generic 120W POE+ Switch               70
Generic 1' Cat6 Cable        8 x  2 =  16
Generic 1TB SATA SSD                   65
Compatible USB SATA Bridge             10
Sandisk 32GB A1                        10
Aquarium tubing and wood               10
TOTAL                                 781 $US
for the cluster in the official tutorial, which is roughly 8 times the price of the super-cheap cluster.
To estimate the price-performance of the super-cheap cluster compared to the more expensive Pi 4B cluster, I used Julia to compute what is essentially the Linpack FLOPS number for a single node. In particular, I ran the program
Code: Select all
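using LinearAlgebra

# Matrix size comes from the first command-line argument.
n=parse(Int64,ARGS[1])
flmax=0
println("n = ",n)
# Repeat five times and keep the best rate; the first run is usually
# slower because it includes Julia's compilation time.
for i=1:5
    A=rand(n,n)
    b=A*ones(n)                  # right-hand side whose exact solution is all ones
    Tn=@elapsed x=A\b            # time the dense solve (LU plus triangular solves)
    flops=1/Tn*(2/3*n^3+2*n^2)   # Linpack operation count divided by elapsed time
    global flmax=max(flmax,flops)
    println("Tn = ",Tn)
    println("Error = ",norm(x-ones(n)))
    println("Flops = ",flops)
end
println()
println("Maximum Flops ",flmax)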
using the script
Code: Select all
#!/bin/bash
export OPENBLAS_NUM_THREADS=$(nproc)
julia flops.jl "$@"
Note that the export ensures OpenBLAS uses all four cores on a Pi 4B.
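As an aside, the thread count can probably also be set from inside Julia instead of through the environment. This is only a minimal sketch, not what was used for the timings in this post, and it assumes a Julia version recent enough to provide BLAS.get_num_threads (1.6 or later):

```julia
using LinearAlgebra

# Assumption: ask for one BLAS thread per logical CPU that Julia reports.
BLAS.set_num_threads(Sys.CPU_THREADS)

# Print the setting so it can be checked before running the benchmark.
println("BLAS threads: ", BLAS.get_num_threads())
```

Putting these lines at the top of flops.jl would make the wrapper script optional, though the environment variable has the advantage of leaving the program untouched.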
The results were
Code: Select all
          Cores   GFLOPS/node   Nodes   GFLOPS peak
Pi 4B       4       12.5          8       100
Pi Zero     1        0.183        6         1
Thus, the 4B cluster in the tutorial delivers roughly 12 times the price-performance of the super-cheap cluster: about 100 GFLOPS for 781 $US versus about 1 GFLOPS for 100 $US.
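For anyone who wants to redo the arithmetic with their own prices, the comparison can be sketched in a few lines of Julia. The costs and per-node rates are the ones quoted in this post; the variable names are made up for illustration, and the peak figures assume perfect scaling across nodes, which a real cluster will not reach.

```julia
# Figures copied from the budget and results tables in this post.
cost_4b     = 781.0       # $US, eight-node Pi 4B cluster
cost_zero   = 100.0       # $US, super-cheap Pi Zero cluster
gflops_4b   = 8 * 12.5    # eight nodes at 12.5 GFLOPS each
gflops_zero = 6 * 0.183   # six nodes at 0.183 GFLOPS each

# Price-performance in GFLOPS per dollar (assumes perfect scaling).
pp_4b   = gflops_4b / cost_4b
pp_zero = gflops_zero / cost_zero

println("Pi 4B cluster:   ", round(pp_4b, digits=4), " GFLOPS/\$")
println("Pi Zero cluster: ", round(pp_zero, digits=4), " GFLOPS/\$")
println("Ratio:           ", round(pp_4b / pp_zero, digits=1))
```

With the unrounded Pi Zero figure of 6 x 0.183 GFLOPS the ratio comes out a little under 12; rounding the Pi Zero peak down to 1 GFLOPS pushes it a little over.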
For reference, the output of Julia on a Pi 4B running at 1500 MHz was
Code: Select all
$ ./flops 10000
n = 10000
Tn = 57.52547519
Error = 1.2860853610341622e-8
Flops = 1.1592545119602802e10
Tn = 53.573115277
Error = 9.648042057924025e-9
Flops = 1.2447785857115644e10
Tn = 53.50908336
Error = 5.49172627187926e-9
Flops = 1.246268156342917e10
Tn = 53.132460633
Error = 7.600814386987754e-9
Flops = 1.2551021705410776e10
Tn = 53.172400765
Error = 1.516588010359596e-8
Flops = 1.2541594080243645e10
Maximum Flops 1.2551021705410776e10
while a Pi Zero running at 700 MHz produced
Code: Select all
$ ./flops 2000
n = 2000
Tn = 74.692887318
Error = 2.4186135344497732e-8
Flops = 7.151060194785297e7
Tn = 29.602310524
Error = 6.595160061447455e-10
Flops = 1.8043636590471074e8
Tn = 30.712237466
Error = 2.876098929290894e-9
Flops = 1.7391547389689398e8
Tn = 29.248006363
Error = 1.7972495306031763e-10
Flops = 1.8262213386586076e8
Tn = 30.61493337
Error = 1.0451662734371308e-10
Flops = 1.744682331586382e8
Maximum Flops 1.8262213386586076e8
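As a sanity check, the Flops lines in both logs can be reproduced from the printed timings with the same operation count that flops.jl uses, namely 2/3 n^3 for the LU factorization plus 2 n^2 for the two triangular solves. The sketch below only redoes that arithmetic with the best timings copied from the logs; the function name opcount is made up for illustration.

```julia
# Operation count for a dense solve: LU factorization plus two triangular solves.
opcount(n) = 2/3 * n^3 + 2 * n^2

# Best timings copied from the two logs (seconds).
flops_4b   = opcount(10000) / 53.132460633
flops_zero = opcount(2000)  / 29.248006363

println("Pi 4B:   ", flops_4b / 1e9, " GFLOPS")
println("Pi Zero: ", flops_zero / 1e9, " GFLOPS")
println("Single-node ratio: ", round(flops_4b / flops_zero, digits=1))
```

Keep in mind the two runs use different matrix sizes (n = 10000 versus n = 2000), so the single-node ratio is only a rough guide.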
In my opinion, a super-cheap cluster upgraded with the Pi Zero 2 would make a good compromise between too expensive and too slow.