coddy
Posts: 1
Joined: Sun Oct 01, 2023 12:17 am

OpenMPI built from source not working on a raspberry pi + mac cluster

Sun Oct 01, 2023 1:36 am

I am trying to use an M1 MacBook as the master in a cluster of 4 Raspberry Pis. The Pis are the workers; they are connected to each other through a switch, the switch is connected to my home router, and the Mac is connected to that router via Wi-Fi. I built OpenMPI (4.1.5) from source for both the Raspberry Pi 4 and the MacBook, and have configured everything correctly with hosts and hostnames. Public keys are saved on each Raspberry Pi for direct login from the master.

However, when I run

Code:

mpiexec -machinefile machinefile -n 5 python mpi_run.py

machinefile:

Code:

MacBook-Air.attlocal.net
rpi1
rpi2
rpi3
rpi4

mpi_run.py file:

Code:

from mpi4py import MPI
import sys

size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()

sys.stdout.write(
    "Hello, World! I am process %d of %d on %s.\n"
    % (rank, size, name))

as a test example, it doesn't output anything. The working animation in the top right of the terminal runs for a few seconds and then nothing happens: no output and no error. mpiexec does run correctly on each machine individually.

ejolson
Posts: 11715
Joined: Tue Mar 18, 2014 11:47 am

Re: OpenMPI built from source not working on a raspberry pi + mac cluster

Sun Oct 01, 2023 4:41 am

Is the working directory shared between all the machines at the same path? Also, are you trying to launch the ranks on each node using public-key SSH?
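
Both questions can be checked from the master with a short script. The following is only a sketch, assuming the machinefile from the first post, that `ssh` is on the PATH, and that the script sits at the same absolute path on every node. The function names are made up for illustration. `BatchMode=yes` makes ssh fail rather than prompt for a password, so a missing key shows up as a failure instead of a hang.

```python
import posixpath
import shlex
import subprocess


def read_hosts(machinefile_text):
    """Return host names from machinefile text, skipping blanks and # comments."""
    hosts = []
    for line in machinefile_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split()[0])  # drop trailing options such as "slots=4"
    return hosts


def remote_check_command(host, workdir, script="mpi_run.py"):
    """Build an ssh command that succeeds only if workdir/script exists on host."""
    remote = "test -f " + shlex.quote(posixpath.join(workdir, script))
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, remote]


def check_hosts(hosts, workdir):
    """Run the check on every host and return {host: True or False}."""
    results = {}
    for host in hosts:
        proc = subprocess.run(remote_check_command(host, workdir),
                              capture_output=True, text=True)
        # Non-zero means either the key login failed or the file is missing.
        results[host] = (proc.returncode == 0)
    return results
```

Calling `check_hosts(read_hosts(open("machinefile").read()), os.getcwd())` (after `import os`) from the directory you launch in gives one True/False per host. The Mac's own entry can be left out of the list, since the check is about the remote nodes.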

To make it easier to launch an MPI job across the nodes in your cluster, I would suggest installing the Slurm workload manager.

https://slurm.schedmd.com/overview.html

Once Slurm is able to schedule non-parallel tasks on each of the nodes, it should also be able to launch an MPI job. Some notes about how I set up a super cheap cluster of Pi Zero computers are available at

viewtopic.php?t=199994

If you are still having trouble, I'd suggest getting it working first with only the Pi computers. This is because heterogeneous clusters are difficult to set up and use, especially when the nodes run different operating systems. Note that even when all the nodes are the same, simply having a mix of big performance cores and little efficiency cores causes problems.
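
For the Pi-only test, something like the sketch below could produce a reduced machinefile and the matching command line. It assumes the machinefile from the first post and that every Pi hostname starts with `rpi`. The helper names and the `machinefile.pi` filename are made up for illustration. The mpiexec options are the same ones used in the original command, with the rank count lowered to 4.

```python
def pi_only_machinefile(machinefile_text, prefix="rpi"):
    """Keep only the hosts whose name starts with prefix (assumed: the Pis)."""
    kept = []
    for line in machinefile_text.splitlines():
        host = line.strip()
        if host.startswith(prefix):
            kept.append(host)
    return "".join(host + "\n" for host in kept)


def mpiexec_command(machinefile, nprocs, script="mpi_run.py"):
    """Build the same mpiexec invocation as in the first post, as an argument list."""
    return ["mpiexec", "-machinefile", machinefile,
            "-n", str(nprocs), "python", script]
```

Writing the result of `pi_only_machinefile(...)` to `machinefile.pi` on one of the Pis and running the list from `mpiexec_command("machinefile.pi", 4)` there takes the Mac out of the picture entirely.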
