ejolson
Posts: 10214
Joined: Tue Mar 18, 2014 11:47 am

Re: Super-cheap Computing Cluster for Learning

Mon Jan 24, 2022 5:17 am

Gavinmc42 wrote:
Mon Jan 24, 2022 5:05 am
I don't have any Zero2's yet, wonder if they will work on my old ClusterHat?
Might need to boost the power supply system.
Put the Hat on a Pi3 and that's 20 of the same cores.

What are the current requirements for a Zero2?
2.5A :o 10Amps plus the Pi3 power?
Might have to make sure the RF and HDMI is powered down.
Then how much power does the Zero2 use?
As reported in

viewtopic.php?p=1930593#p1930593

the Pi Zero 2 can consume 730 mA with just CPU compute. That is enough more than the original Zero that they likely will not work in a standard Cluster HAT.
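As a rough sanity check (a minimal sketch: the 1.2 A worst case for the Pi 3 carrying the HAT is my guess, not a measurement), the 5 V budget for four Zero 2 nodes looks like this:

Code: Select all

# rough 5 V current budget, assuming all four Zero 2 nodes compute flat out
nodes      = 4 * 0.73        # A, using the 730 mA figure above
controller = 1.2             # A, guessed worst case for the Pi 3 under the HAT
total      = nodes + controller
println(round(total, digits=2), " A, roughly ", round(5 * total, digits=1), " W at 5 V")
That works out to a bit over 4 A, which helps explain why the standard Cluster HAT power arrangement is likely marginal.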

Rather than the powered hub I've been using for the first super-cheap cluster, my plan is to start with an unpowered hub and add my own power similar to how the Pi cloud powers all those 4B nodes.

viewtopic.php?p=1690607#p1690607
Last edited by ejolson on Mon Jan 24, 2022 7:29 am, edited 1 time in total.

Gavinmc42
Posts: 6909
Joined: Wed Aug 28, 2013 3:31 am

Re: Super-cheap Computing Cluster for Learning

Mon Jan 24, 2022 6:37 am

Not sure if this is with HDMI and RF powered down.
https://8086.support/?action=faq&cat=23 ... artlang=en
So budget for about 20 watts.
1 watt per CPU core?

I think those MOSFETs should be able to handle 1 A per Pi.
Bit Pi in the sky at the moment, not sure when I will be able to snaffle 4 x Pi Zero2's ;)
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges

ejolson
Posts: 10214
Joined: Tue Mar 18, 2014 11:47 am

Re: Super-cheap Computing Cluster for Learning

Sun Apr 24, 2022 10:55 pm

Another parallel computation that can run on the super-cheap computing cluster for learning is repeated trials of computer versus computer battleship.

Details are at

viewtopic.php?p=1996855#p1996855

ejolson
Posts: 10214
Joined: Tue Mar 18, 2014 11:47 am

Re: Super-cheap Computing Cluster for Learning

Thu Jul 07, 2022 6:52 am

I fixed a minor typo in

viewtopic.php?p=1254966#p1254966

concerning the numbering of the USB hubs.

ejolson
Posts: 10214
Joined: Tue Mar 18, 2014 11:47 am

Re: Super-cheap Computing Cluster for Learning

Thu Oct 13, 2022 4:19 pm

I just discovered the official tutorials

https://www.raspberrypi.com/tutorials/

under For Home on the Pi website.

Among other suggestions, there is a tutorial on setting up a Pi cluster.

https://www.raspberrypi.com/tutorials/c ... -tutorial/

Compared to the $100 bill of materials listed in the first post of this thread

viewtopic.php?p=1246773#p1246773

that configuration costs significantly more. However, eight Pi 4B computers networked through gigabit Ethernet may be useful for practical applications as well as for learning.

ejolson
Posts: 10214
Joined: Tue Mar 18, 2014 11:47 am

Re: Super-cheap Computing Cluster for Learning

Thu Oct 13, 2022 11:08 pm

ejolson wrote:
Thu Oct 13, 2022 4:19 pm
Compared to the $100 bill of materials listed in the first post of this thread

viewtopic.php?p=1246773#p1246773

that configuration costs significantly more. However, eight Pi 4B computers networked through gigabit Ethernet may be useful for practical applications as well as for learning.
I spent a few minutes checking prices under the assumption there were no Pi-related supply-chain problems and came up with an estimated budget of

Code: Select all

Raspberry Pi 4B 4GB    8 x 55 = 440
Raspberry Pi POE+ Hat  8 x 20 = 160
Generic 120W POE+ Switch         70
Generic 1' Cat6 Cable   8 x 2 =  16
Generic 1TB SATA SSD             65
Compatible USB SATA Bridge       10
Sandisk 32GB A1                  10
Aquarium tubing and wood         10
                          TOTAL 781 $US
for the cluster in the official tutorial, which is roughly 8 times the price of the super-cheap cluster.

To estimate the price-performance of the super-cheap cluster compared to the more expensive Pi 4B cluster, I used Julia to compute what is essentially the Linpack FLOPS number for a single node. In particular, I ran the program

Code: Select all

using LinearAlgebra

n=parse(Int64,ARGS[1])       # matrix size from the command line
flmax=0
println("n = ",n)
for i=1:5
    A=rand(n,n)
    b=A*ones(n)              # right-hand side chosen so the exact solution is all ones
    Tn=@elapsed x=A\b        # time the LU factorisation and triangular solves
    flops=1/Tn*(2/3*n^3+2*n^2)    # standard operation count for solving Ax=b by LU
    global flmax=max(flmax,flops)
    println("Tn = ",Tn)
    println("Error = ",norm(x-ones(n)))
    println("Flops = ",flops)
end
println()
println("Maximum Flops ",flmax)
using the script

Code: Select all

#!/bin/bash
export OPENBLAS_NUM_THREADS=`nproc`
julia flops.jl "$@"
Note the export is to ensure OpenBLAS uses all four cores on a Pi 4B.

The results were

Code: Select all

         Cores   GFLOPS    Nodes   GFLOPS peak
Pi 4B      4      12.5       8         100
Pi Zero    1       0.183     6           1
Thus, the 4B cluster in the tutorial has roughly 12 times better price-performance than the super-cheap cluster.
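For anyone who wants to check the arithmetic, the back-of-the-envelope price-performance calculation, using the GFLOPS figures above together with the $781 and $100 bills of materials, is

Code: Select all

# GFLOPS per dollar for each cluster, using the figures above
pi4b_cluster = 8 * 12.5 / 781     # about 0.13 GFLOPS per dollar
zero_cluster = 6 * 0.183 / 100    # about 0.011 GFLOPS per dollar
println(pi4b_cluster / zero_cluster)   # prints roughly 11.7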

For reference, the output of Julia on a Pi 4B running at 1500 MHz was

Code: Select all

$ ./flops 10000
n = 10000
Tn = 57.52547519
Error = 1.2860853610341622e-8
Flops = 1.1592545119602802e10
Tn = 53.573115277
Error = 9.648042057924025e-9
Flops = 1.2447785857115644e10
Tn = 53.50908336
Error = 5.49172627187926e-9
Flops = 1.246268156342917e10
Tn = 53.132460633
Error = 7.600814386987754e-9
Flops = 1.2551021705410776e10
Tn = 53.172400765
Error = 1.516588010359596e-8
Flops = 1.2541594080243645e10

Maximum Flops 1.2551021705410776e10
while a Pi Zero running at 700 MHz obtained

Code: Select all

$ ./flops 2000
n = 2000
Tn = 74.692887318
Error = 2.4186135344497732e-8
Flops = 7.151060194785297e7
Tn = 29.602310524
Error = 6.595160061447455e-10
Flops = 1.8043636590471074e8
Tn = 30.712237466
Error = 2.876098929290894e-9
Flops = 1.7391547389689398e8
Tn = 29.248006363
Error = 1.7972495306031763e-10
Flops = 1.8262213386586076e8
Tn = 30.61493337
Error = 1.0451662734371308e-10
Flops = 1.744682331586382e8

Maximum Flops 1.8262213386586076e8
In my opinion a super-cheap cluster upgraded with the Pi Zero 2 would make a good compromise between too expensive and too slow.
Last edited by ejolson on Fri Oct 14, 2022 5:37 am, edited 3 times in total.

ejolson
Posts: 10214
Joined: Tue Mar 18, 2014 11:47 am

Re: Super-cheap Computing Cluster for Learning

Fri Oct 14, 2022 2:58 am

ejolson wrote:
Thu Oct 13, 2022 11:08 pm
In my opinion a super-cheap cluster upgraded with the Pi Zero 2 would make a good compromise between too expensive and too slow.
For a further frame of reference, I ran the same Julia code on an 8-core Ryzen 1700 PC and obtained

Code: Select all

$ ./flops 20000
n = 20000
Tn = 36.916107149
Error = 7.285739688271788e-9
Flops = 1.444933863639473e11
Tn = 35.786517102
Error = 1.192593059581478e-8
Flops = 1.490542742153364e11
Tn = 35.758374777
Error = 1.346231850567564e-7
Flops = 1.491715819468473e11
Tn = 35.76935344
Error = 3.5319862907093447e-9
Flops = 1.4912579681600565e11
Tn = 35.767223554
Error = 9.264243606074743e-9
Flops = 1.4913467703972217e11

Maximum Flops 1.491715819468473e11
While an 8-core office computer seemed special in 2017, such systems can now be had for less than the 781 $US price of the Pi 4B cluster.

Code: Select all

         Cores   GFLOPS    Nodes   GFLOPS peak
Ryzen 1700 8     149.2       1         149
Pi 4B      4      12.5       8         100
Pi Zero    1       0.183     6           1
At the same time, the performance is better and PCs are in stock.

On the other hand, one can't learn much about setting up and using a cluster without actually setting up and using a cluster. From a practical point of view, writing parallel code for SMP architectures, where communication latency between tasks is very low but total memory bandwidth becomes the bottleneck, is very different from writing it for distributed-memory clusters, where no such bottleneck occurs but communication latency is much higher.
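For a concrete taste of the distributed-memory style, here is a minimal Julia sketch using the Distributed standard library. The worker processes are started locally for illustration; on an actual cluster addprocs would instead be given the node hostnames, and the function name and chunk sizes here are just placeholders. The point is that the work is split into a few large chunks so each message is big compared to the communication latency.

Code: Select all

using Distributed
addprocs(4)    # local workers for illustration; on a cluster pass hostnames instead

@everywhere function partial_sum(r)
    # each worker sums its own chunk of the series 1/i^2
    s = 0.0
    for i in r
        s += 1.0 / i^2
    end
    return s
end

# a few large chunks keep the number of messages small relative to the latency
chunks = [i:4:1_000_000 for i in 1:4]
println(sum(pmap(partial_sum, chunks)))   # approaches pi^2/6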

ejolson
Posts: 10214
Joined: Tue Mar 18, 2014 11:47 am

Re: Super-cheap Computing Cluster for Learning

Fri Oct 14, 2022 6:37 am

ejolson wrote:
Fri Oct 14, 2022 2:58 am
While an 8-core office computer seemed special in 2017, such systems can now be had for less than the 781 $US price of the Pi 4B cluster.

Code: Select all

         Cores   GFLOPS    Nodes   GFLOPS peak
Ryzen 1700 8     149.2       1         149
Pi 4B      4      12.5       8         100
Pi Zero    1       0.183     6           1
This should probably go in the Julia thread; however, given the focus of Julia on high-performance computing it also makes sense to discuss Julia in the context of Raspberry Pi clusters.

I further tried the flops test on a Ryzen 4650G APU. Weirdly, OpenBLAS selected the slow Prescott-compatible kernels since it didn't recognize the newer AMD CPU. To remedy this I changed the startup script to

Code: Select all

#!/bin/bash
export OPENBLAS_NUM_THREADS=`nproc`
export OPENBLAS_CORETYPE=Zen
julia flops.jl "$@"
After that I obtained

Code: Select all

$ taskset -c 0-5 ./flops 20000
n = 20000
Tn = 18.387137325
Error = 1.5840662251445083e-8
Flops = 2.901013485161064e11
Tn = 17.71443638
Error = 1.053741557444158e-8
Flops = 3.011178690029162e11
Tn = 17.742647695
Error = 9.314802079674327e-9
Flops = 3.006390830178367e11
Tn = 17.777173661
Error = 2.479715104782187e-7
Flops = 3.0005519634628345e11
Tn = 17.817919714
Error = 9.64708085460602e-9
Flops = 2.993690295473813e11

Maximum Flops 3.011178690029162e11
The updated table is

Code: Select all

           Cores   GFLOPS    Nodes   GFLOPS peak
Ryzen 4650G  6     301.1       1         301
Ryzen 1700   8     149.2       1         149
Pi 4B        4      12.5       8         100
Pi Zero      1       0.183     6           1
This further illustrates that the practical use of the super-cheap cluster, as well as the Pi 4B cluster described in the tutorial, is to learn about cluster computing rather than to build a fast machine.

According to the dog developer, true SMP doesn't scale past 16 cores. Anything larger has NUMA characteristics.

While the unified CXL memory space of the latest PCIe generations may allow thousand-core NUMA architectures to become a commodity technology, the same algorithmic consideration of memory locality needed to write parallel code for a cluster will likely be needed to make efficient use of future NUMA systems. For this reason, any type of cluster could be a useful learning environment for some time to come.
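As a small single-machine illustration of how much memory locality matters (a toy sketch, nothing cluster or CXL specific), compare traversing a matrix along columns, which Julia stores contiguously, against traversing it along rows:

Code: Select all

# toy illustration of memory locality: Julia stores arrays column major,
# so running down a column touches consecutive memory
function colsum(A)
    s = 0.0
    for j in axes(A, 2), i in axes(A, 1)
        s += A[i, j]
    end
    return s
end

function rowsum(A)
    s = 0.0
    for i in axes(A, 1), j in axes(A, 2)
        s += A[i, j]
    end
    return s
end

A = rand(4000, 4000)
colsum(A); rowsum(A)            # warm up the compiler
println(@elapsed colsum(A))     # cache-friendly traversal
println(@elapsed rowsum(A))     # strided traversal, typically several times slower
The same locality considerations apply whether the far-away memory is another NUMA node or another machine on the network.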

ejolson
Posts: 10214
Joined: Tue Mar 18, 2014 11:47 am

Re: Super-cheap Computing Cluster for Learning

Thu Nov 24, 2022 5:13 am

ejolson wrote:
Fri Oct 14, 2022 6:37 am
The updated table is

Code: Select all

           Cores   GFLOPS    Nodes   GFLOPS peak
Ryzen 4650G  6     301.1       1         301
Ryzen 1700   8     149.2       1         149
Pi 4B        4      12.5       8         100
Pi Zero      1       0.183     6           1
This further illustrates that the practical use of the super-cheap cluster, as well as the Pi 4B cluster described in the tutorial, is to learn about cluster computing rather than to build a fast machine.
Here is a video comparing a six-node Pi cluster to a single-socket Ampere Altra Max server:

https://www.youtube.com/watch?v=UT5UbSJOyog

see also

https://www.servethehome.com/raspberry- ... rm-server/

In the video, 60 GFLOPS were obtained when solving systems of linear equations on the Pi cluster. That is 4.8 times the speed of Julia on a single Pi 4B. Although one might expect better scaling from the high-performance Linpack benchmark, the Julia results rely on OpenBLAS behind the scenes, which is generally faster than the ATLAS library used on the cluster.
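Taking the 12.5 GFLOPS single-node figure from the table above at face value (a rough comparison, since the cluster ran ATLAS rather than OpenBLAS), the implied speedup and parallel efficiency are

Code: Select all

single  = 12.5                   # GFLOPS, one Pi 4B with Julia and OpenBLAS
cluster = 60.0                   # GFLOPS reported for the six-node cluster
println(cluster / single)        # 4.8x speedup
println(cluster / (6 * single))  # 0.8, that is 80% parallel efficiency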

The end of the video discussed possible use cases for each machine. There it was suggested that the Pi cluster might be useful for edge computing, where low power consumption is more important than efficiency. In line with the theme of the current thread, my opinion is that such a cluster would be even more useful for learning about clusters.
