Install dependencies (Ubuntu 18.04 and 20.04)
sudo apt -y install qemu qemu-utils qemu-system-x86
Install Packer
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install packer
Go to the Packer template directory for DGX OS 5, e.g.
cd packer-maas/dgxos5
Download the DGX OS ISO to a local directory (e.g. /work)
Generate the SHA-256 checksum of the ISO
sha256sum /work/DGXOS-5.0.0-2020-10-01-18-07-44.iso
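To avoid copying the checksum by hand, the hash can be captured in a shell variable. This is a minimal sketch; the helper name `iso_sha256` is hypothetical and not part of packer-maas:

```shell
# Hypothetical helper (not part of packer-maas): print only the SHA-256
# hash of a file, without the trailing filename that sha256sum emits.
iso_sha256() {
    sha256sum "$1" | awk '{print $1}'
}
```

For example, `DGXOS5_SHA256=$(iso_sha256 /work/DGXOS-5.0.0-2020-10-01-18-07-44.iso)` stores the value so it can be supplied as dgxos5_sha256sum.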
Build image
# When building the image, you can pass variables to the `build` command or edit the dgxos5.json file.
# At a minimum, update the ISO location and SHA-256 sum (dgxos5_iso and dgxos5_sha256sum),
# e.g.:
# "variables":
# {
# "platform": "dgx1",
# "dgxos5_iso": "/work/DGXOS-5.0.0-2020-10-01-18-07-44.iso",
# "dgxos5_sha256sum": "6e5c7ba2024640b3f23ec8681c15c8ccf8997a23da91c7e9d4eacf73bb564bee"
# },
sudo packer build dgxos5.json
# Alternatively, instead of modifying the config file:
sudo packer build -var 'dgxos5_iso=/path/to/dgx_iso' -var 'dgxos5_sha256sum=<dgx_os_iso_sha256_sum>' dgxos5.json
# Available platforms: dgx1, dgx2, dgx_a100
# For more verbosity, set `PACKER_LOG=1`, e.g. sudo PACKER_LOG=1 packer build ...
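Because a mistyped platform name is easy to miss, it may help to validate it before starting a long build. The sketch below assumes `platform` can be overridden with `-var` like the other entries in the "variables" block; the function name `dgx_build_cmd` is hypothetical:

```shell
# Hypothetical wrapper (not part of packer-maas): check the platform name
# against the documented list, then print the packer command to run.
# It only prints the command; nothing is built until you run the output.
dgx_build_cmd() {
    platform=$1
    iso=$2
    sha256=$3
    case "$platform" in
        dgx1|dgx2|dgx_a100) ;;
        *) echo "unknown platform: $platform" >&2; return 1 ;;
    esac
    echo sudo packer build \
        -var "platform=$platform" \
        -var "dgxos5_iso=$iso" \
        -var "dgxos5_sha256sum=$sha256" \
        dgxos5.json
}
```

Printing rather than executing keeps the sketch side-effect free. Note that the printed command is not re-quoted, so it is only safe to paste when the ISO path contains no spaces.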
Go grab a beer while the image builds...
Add image to MAAS:
# Be sure to substitute the proper platform name, i.e. dgx1, dgx2, dgx_a100
maas $PROFILE boot-resources create name='custom/dgx1-5.0' title='NVIDIA DGX-1 5.0' architecture='amd64/generic' filetype='tgz' content@=dgxos5.tar.gz
If using MAAS 3.0 or later, you will need to use the following command, which adds the base_image argument:
# Be sure to substitute the proper platform name, i.e. dgx1, dgx2, dgx_a100
maas $PROFILE boot-resources create name='custom/dgx_a100-5.1' title='NVIDIA DGX-A100 5.1' architecture='amd64/generic' filetype='tgz' base_image=ubuntu/focal content@=dgxos5.tar.gz
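Since the name and title must be kept in step with the platform, they can be derived rather than typed. This is a sketch only: the helper name `dgx_resource_args` is hypothetical, and the DGX-2 title is assumed by analogy with the two titles shown in the commands above.

```shell
# Hypothetical helper (not part of packer-maas): derive the MAAS resource
# name and title from a platform name and a DGX OS version.
# The "DGX-2" title is an assumption based on the other two titles.
dgx_resource_args() {
    platform=$1
    version=$2
    case "$platform" in
        dgx1)     model="DGX-1" ;;
        dgx2)     model="DGX-2" ;;
        dgx_a100) model="DGX-A100" ;;
        *) echo "unknown platform: $platform" >&2; return 1 ;;
    esac
    printf "name='custom/%s-%s' title='NVIDIA %s %s'\n" \
        "$platform" "$version" "$model" "$version"
}
```

The printed pair can be pasted into either form of the boot-resources command in place of the hand-written name and title.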
Ensure that you are authenticated on the command line as your MAAS user. If you are not, you may encounter an error like this:
argument COMMAND: invalid choice: 'boot-resources' (choose from 'login', 'logout', 'list', 'refresh', 'init', 'config', 'status', 'migrate', 'reconfigure-supervisord', 'apikey', 'configauth', 'createadmin', 'changepassword')
To authenticate:
maas login $PROFILE http://<MAAS url>
# After running the above command, the following prompt appears:
API key (leave empty for anonymous access):
At the prompt, enter an API key, which can be acquired with the following command:
maas apikey --username $PROFILE
Re-run the boot-resources command once authenticated.
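To check for an existing login before running boot-resources, the output of `maas list` can be inspected. The sketch below assumes that `maas list` prints one profile per line with the profile name as the first field; verify this against your MAAS version. The helper name `profile_listed` is hypothetical.

```shell
# Hypothetical helper: succeed if the given profile name appears as the
# first whitespace-separated field of any line read from stdin.
# Assumes `maas list` output has the profile name first on each line.
profile_listed() {
    awk -v p="$1" '$1 == p { found = 1 } END { exit !found }'
}
```

It could be used as `maas list | profile_listed "$PROFILE" || echo "not logged in"`, with the login step above as the remedy.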
Boot machines in EFI mode
In MAAS, create an EFI partition in addition to the other partitions, e.g.:
# NAME      SIZE      FILESYSTEM  MOUNT POINT
sda-part1   511.7 MB  fat32       /boot/efi
sda-part2   63.9 GB   ext4        /
Troubleshooting:
# Packer sometimes leaves nbd devices mounted between builds.
# If that happens, run as root:
umount /dev/nbd*
# between builds, remove artifacts:
sudo rm -rf output-qemu/ dgxos5.tar.gz
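Before unmounting, it can be useful to see whether any nbd device is actually mounted. This is a minimal sketch that reads the kernel mount table; the helper name `nbd_mounts` is hypothetical:

```shell
# Hypothetical helper: print the mount points of any /dev/nbd* devices
# found in a mounts table (defaults to /proc/mounts on Linux).
nbd_mounts() {
    awk '$1 ~ /^\/dev\/nbd/ { print $2 }' "${1:-/proc/mounts}"
}
```

If `nbd_mounts` prints nothing, no nbd device is mounted and the umount step can be skipped.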
If you see an error like this from the boot-resources command:
[Errno 13] Permission denied: './dgxos5.tar.gz'
You can update the ownership of the tar.gz file, like this:
sudo chown <target user>:<target group> dgxos5.tar.gz
TODO Next:
- kernel parameters in MAAS (w/ tags)
- document generating one image per DGX type