josanabr/docker-mpi
docker-mpi

MPI cluster on Docker Swarm.

Another, more complete solution is available here: https://github.com/NLKNguyen/alpine-mpich

Do you have two or more physical machines and want to try out a personal MPI cluster for your calculations? This toy project might be useful for you! Set up your Docker Swarm cluster, deploy the stackmpi image, attach to the master container, and simply run your MPI program on multiple hosts.

Prerequisites

You need to do these things only once.

Set up the Docker Swarm network (details: TBD)

Set up a registry container (this is how the image is distributed among the nodes):
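Until the details are filled in, here is a minimal sketch of one way to bring up a swarm on the manager node. It assumes the docker CLI is on the PATH; the function names and the address 192.0.2.10 are hypothetical and only for illustration.

```shell
#!/bin/sh
# Sketch only: initialise a swarm on this node unless one is already active.
# Assumption: the docker CLI is installed and the daemon is reachable.

# Succeeds if this node already takes part in a swarm.
swarm_is_active() {
  [ "$(docker info --format '{{.Swarm.LocalNodeState}}' 2>/dev/null)" = "active" ]
}

# $1: address the manager advertises to the other nodes (hypothetical value).
ensure_swarm() {
  if swarm_is_active; then
    echo "swarm already active"
  else
    docker swarm init --advertise-addr "$1" && echo "swarm initialized"
  fi
}

# Example call (commented out so that sourcing this file has no side effects):
# ensure_swarm 192.0.2.10
```

On a fresh node, docker swarm init prints a docker swarm join command with a token; running that command on each of the other machines adds them as workers. The same command can be shown again later with docker swarm join-token worker.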

docker service create --name registry --publish published=5000,target=5000 registry:2

To check that the service is running:

docker service ps registry
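docker service ps only shows the task state. To confirm that the registry also answers HTTP requests, you could poll its /v2/ endpoint, as in the sketch below. It assumes curl is installed and that the registry is reachable at the address you pass in; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: wait until a registry answers on its v2 API base endpoint.
# Assumption: curl is available; $1 is host:port, e.g. localhost:5000.

# Succeeds if GET http://<addr>/v2/ returns a success status.
registry_ready() {
  curl -fsS "http://$1/v2/" >/dev/null 2>&1
}

# $1: host:port of the registry, $2: number of attempts (default 10).
wait_for_registry() {
  addr="$1"
  tries="${2:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if registry_ready "$addr"; then
      echo "registry at $addr is up"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "registry at $addr did not respond" >&2
  return 1
}

# Example call (commented out so that sourcing this file has no side effects):
# wait_for_registry localhost:5000
```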

Now you are ready to push your image there.

Launching MPI cluster

Build the stackmpi image and push it to the registry:

docker compose build
docker compose push

stackmpi is based on Ubuntu 20.04 with MPICH installed. If you need any additional software available on each node, modify the Dockerfile accordingly, then build and push the image again.

REMARK: If you restart your machine, for some reason the image needs to be pushed to the registry again.

There are a few helper scripts available:

  • start-stack and stop-stack start or stop all nodes (the Docker stack) specified in docker-compose.yml. By default, one master node is started on the manager node (assumed to be the current machine) along with 2 worker nodes. If you want more workers, edit docker-compose.yml and change the replicas attribute in the worker section.

  • attach-stack attaches to the master container.

  • ssh-stack connects to the master via SSH. The SSH port is not exposed by default; to expose it, uncomment the port section in the master section of docker-compose.yml.

Inside the master node, a few additional commands are available:

  • node-master displays the IP of the master node,

  • node-workers displays the list of IPs of the worker nodes,

  • machines gets the list of IPs of all nodes (master and workers) and writes it to the file /root/machines (yep, not perfect).
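To illustrate what a machine file contains, the sketch below turns a list of IPs into the host file format read by MPICH's mpiexec: one host per line, optionally followed by :N for the number of processes to place there. This is not the repository's machines script; the function name make_machinefile and the sample addresses are hypothetical.

```shell
#!/bin/sh
# Sketch only: build an MPICH-style host file from IPs given on stdin.
# Blank lines and repeated IPs are dropped; first-seen order is kept.
# $1 (optional): processes per host, appended as ":N".
make_machinefile() {
  awk -v slots="$1" '
    NF && !seen[$1]++ {
      print (slots ? $1 ":" slots : $1)
    }'
}

# Example (hypothetical addresses); prints 10.0.0.2:2 and 10.0.0.3:2:
# printf '10.0.0.2\n10.0.0.3\n10.0.0.2\n' | make_machinefile 2
```

Inside a Swarm overlay network, the DNS name tasks.<service> usually resolves to the IPs of all tasks of that service, so something like getent hosts tasks.worker | awk '{print $1}' | make_machinefile could feed this function. The service name worker is an assumption here; check docker-compose.yml for the real one.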

Executing MPI program

Attach to the master container with the attach-stack command and fill the file /root/machines with the IPs of all nodes using the machines command. Now you are ready to run the simplest MPI program:

mpiexec -f machinefile -n 4 hostname

The output might look like this:

root@master:~# mpiexec -f machinefile -n 4 hostname
master
worker
master
worker
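With more processes, the raw list gets hard to read. A small helper like the one below, a sketch with the hypothetical name ranks_per_host, summarises how many processes landed on each host.

```shell
#!/bin/sh
# Sketch only: count how often each hostname appears on stdin.
# Output format: "<host>: <count>", sorted by hostname.
ranks_per_host() {
  sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Example, run inside the master container:
# mpiexec -f machinefile -n 4 hostname | ranks_per_host
```

For the sample output above, this would print master: 2 and worker: 2.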

Now let's try something closer to reality. Go to /root/project/ and build the sample program with make. Once the compilation succeeds, copy test-mpi to /shared_dir. Execute the program with:

mpiexec -f machinefile -n 2 /shared_dir/test-mpi

The output might look like this:

root@master:~# mpiexec -f machinefile -n 2 /shared_dir/test-mpi 
Processor master, rank 1 out of 2 processors. CPUs: 4   CPUs available: 4
Processor master, rank 0 out of 2 processors. CPUs: 4   CPUs available: 4
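Ranks can print in any order, as in the output above. To check that every rank reported exactly once, you could pipe the output through a small filter. The sketch below assumes the line format shown above ("rank K out of N"); the function name all_ranks_reported is hypothetical.

```shell
#!/bin/sh
# Sketch only: succeed if ranks 0..N-1 each appear exactly once on stdin.
# Assumption: lines contain "rank <K> out of", as in the sample output.
# $1: expected number of processes (N).
all_ranks_reported() {
  awk -v n="$1" '
    /rank [0-9]+ out of/ {
      for (i = 1; i < NF; i++) {
        if ($i == "rank") {
          seen[$(i + 1) + 0]++   # the field after "rank" is the rank number
        }
      }
    }
    END {
      for (r = 0; r < n; r++) {
        if (seen[r] != 1) exit 1
      }
      exit 0
    }'
}

# Example, run inside the master container:
# mpiexec -f machinefile -n 2 /shared_dir/test-mpi | all_ranks_reported 2 && echo ok
```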
