dgxos5

Install dependencies (Ubuntu 18.04 and 20.04)

sudo apt -y install qemu qemu-utils qemu-system-x86

Install Packer

curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install packer

Go to the Packer template directory for DGX OS 5:

cd packer-maas/dgxos5

Download the DGX OS ISO to a local directory (e.g. /work)

Generate the SHA-256 checksum of the ISO

sha256sum /work/DGXOS-5.0.0-2020-10-01-18-07-44.iso
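`sha256sum` prints the hash followed by the file path, but only the hash belongs in `dgxos5_sha256sum`. A minimal sketch of capturing just that field; the helper name `iso_sha256` and the variable `DGXOS5_SHA256` are hypothetical and not part of this repository:

```shell
# Print only the SHA-256 hash of a file (sha256sum prints "<hash>  <path>").
iso_sha256() {
    sha256sum "$1" | awk '{print $1}'
}

# Example (path taken from the step above):
#   DGXOS5_SHA256="$(iso_sha256 /work/DGXOS-5.0.0-2020-10-01-18-07-44.iso)"
```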

Build image

# When building the image, you can either pass variables to the `build` command or edit the dgxos5.json file.
# At minimum, update the ISO location and SHA-256 sum (dgxos5_iso and dgxos5_sha256sum),
# e.g.
#    "variables":
#        {
#            "platform": "dgx1",
#            "dgxos5_iso": "/work/DGXOS-5.0.0-2020-10-01-18-07-44.iso",
#            "dgxos5_sha256sum": "6e5c7ba2024640b3f23ec8681c15c8ccf8997a23da91c7e9d4eacf73bb564bee"
#        },
sudo packer build dgxos5.json

# Optionally, instead of modifying config file:
sudo packer build -var 'dgxos5_iso=/path/to/dgx_iso' -var 'dgxos5_sha256sum=<dgx_os_iso_sha256_sum>' dgxos5.json

# Available platforms: dgx1, dgx2, dgx_a100

# For more verbosity set `PACKER_LOG=1`, e.g. sudo PACKER_LOG=1 packer build ...
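The variable overrides above can be assembled per platform. The sketch below only prints the command line so it can be reviewed before running it. It assumes that `platform` can be overridden with `-var` in the same way as the other two variables, since it is declared in the same `variables` block; the function name `packer_build_cmd` is hypothetical.

```shell
# Print (do not run) a packer command line for one platform.
# Usage: packer_build_cmd <platform> <iso_path> <sha256>
packer_build_cmd() {
    # Accept only the platforms listed in this README.
    case "$1" in
        dgx1|dgx2|dgx_a100) ;;
        *) echo "unknown platform: $1" >&2; return 1 ;;
    esac
    printf "sudo packer build -var 'platform=%s' -var 'dgxos5_iso=%s' -var 'dgxos5_sha256sum=%s' dgxos5.json\n" \
        "$1" "$2" "$3"
}
```

Printing rather than executing keeps the helper side-effect free; pipe the output to `sh` once it looks right.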

Go grab a beer while the image builds...

Add image to MAAS:

# Be sure to substitute the proper platform name, i.e. dgx1, dgx2, dgx_a100
maas $PROFILE boot-resources create name='custom/dgx1-5.0' title='NVIDIA DGX-1 5.0' architecture='amd64/generic' filetype='tgz' content@=dgxos5.tar.gz

If using MAAS 3.0 or later, you will need to use the following command, which adds the base_image argument:

# Be sure to substitute the proper platform name, i.e. dgx1, dgx2, dgx_a100
maas $PROFILE boot-resources create name='custom/dgx_a100-5.1' title='NVIDIA DGX-A100 5.1' architecture='amd64/generic' filetype='tgz' base_image=ubuntu/focal content@=dgxos5.tar.gz
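Choosing between the two commands depends only on the MAAS major version. A small sketch of that decision, assuming the caller already knows the version string; the function name `needs_base_image` is hypothetical:

```shell
# Succeed (exit 0) when the given MAAS version has a major version >= 3,
# i.e. when base_image must be passed to `boot-resources create`.
needs_base_image() {
    major="${1%%.*}"                 # "3.2.6" -> "3"
    [ "$major" -ge 3 ] 2>/dev/null   # non-numeric input counts as "no"
}

# Example:
#   if needs_base_image "3.2.6"; then EXTRA="base_image=ubuntu/focal"; fi
```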

Ensure that you are authenticated on the command line as your MAAS user. If you are not, you may encounter an error like this:

argument COMMAND: invalid choice: 'boot-resources' (choose from 'login', 'logout', 'list', 'refresh', 'init', 'config', 'status', 'migrate', 'reconfigure-supervisord', 'apikey', 'configauth', 'createadmin', 'changepassword')

To authenticate:

maas login $PROFILE http://<MAAS url>
# After running the above command, the following prompt appears:
API key (leave empty for anonymous access):

At the prompt, enter an API key, which can be acquired with the following command:

maas apikey --username $PROFILE

Re-run the boot-resources command once authenticated.
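To avoid hitting the error in the first place, the login state can be checked before calling `boot-resources`. This is a sketch that assumes `maas list` prints one logged-in profile per line with the profile name in the first column; the function name `maas_logged_in` and the variable `API_KEY` are hypothetical.

```shell
# Succeed when <profile> appears in the first column of `maas list`.
maas_logged_in() {
    maas list 2>/dev/null | awk '{print $1}' | grep -qxF -- "$1"
}

# Example: log in only when needed, passing the API key as the third
# argument instead of typing it at the prompt.
#   maas_logged_in "$PROFILE" || maas login "$PROFILE" "http://<MAAS url>" "$API_KEY"
```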

Boot machines in EFI mode

In MAAS, create an EFI partition in addition to the other partitions, e.g.:

# NAME    SIZE     FILESYSTEM   MOUNT POINT
sda-part1 511.7 MB fat32        /boot/efi
sda-part2 63.9 GB  ext4         /
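After deployment, the boot mode can be confirmed from the machine itself: Linux exposes `/sys/firmware/efi` only when it was booted through EFI. A minimal sketch; the function name `boot_mode` is hypothetical and the optional root argument exists only to make the check easy to try against a test directory:

```shell
# Print "efi" if <root>/sys/firmware/efi exists, otherwise "bios".
# With no argument, the running system is checked.
boot_mode() {
    root="${1:-}"
    if [ -d "$root/sys/firmware/efi" ]; then
        echo efi
    else
        echo bios
    fi
}
```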

Troubleshooting:

# Packer sometimes leaves nbd devices mounted between builds.
# If that happens, run as root:
umount /dev/nbd*

# between builds, remove artifacts:
sudo rm -rf output-qemu/ dgxos5.tar.gz

If you see an error like this from the boot-resources command:

[Errno 13] Permission denied: './dgxos5.tar.gz'

You can update the ownership of the tar.gz file, like this:

sudo chown <target user>:<target group> dgxos5.tar.gz

TODO Next:

  • kernel parameters in MAAS (w/ tags)
  • document generating one image per DGX type