forked from kubernetes-sigs/image-builder
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
patch: new GPU role to support both AMD and NVIDIA drivers
* Created new gpu role that can support installing a variety of GPU drivers
* Added AMD support
* Moved NVIDIA into the new role and added gpu_arch var to support selecting the type of GPU driver to install
1 parent d967e60, commit c5cbb4f
Showing 9 changed files with 177 additions and 44 deletions.
@@ -1 +1,2 @@
.vscode/
.idea/
@@ -0,0 +1,58 @@
# GPU driver installation

The GPU drivers have to be installed via the `node_custom_roles_pre` option. Otherwise, should a dist-upgrade install
a new kernel, the driver won't work when the image is booted. DKMS builds registered modules for a new kernel from a
hook that runs when that kernel is installed, so a driver installed after the new kernel is only built for the kernel
that is currently running. To get around this, we install the drivers first.
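To check after a build that the ordering worked, one option is to list which kernels DKMS has built the driver module for. The tasks below are a minimal sketch and not part of the `gpu` role; they assume the driver was registered with DKMS and that the `dkms` package is installed (the AMD tasks install it explicitly). The variable name `gpu_dkms_status` is hypothetical.

```yaml
# Hypothetical post-install check, not part of the gpu role.
- name: Show which kernels DKMS has built GPU modules for
  ansible.builtin.command:
    cmd: dkms status
  register: gpu_dkms_status  # hypothetical variable name
  changed_when: false        # read-only query, never reports a change

- name: Print the DKMS status
  ansible.builtin.debug:
    var: gpu_dkms_status.stdout_lines
```

Each installed kernel version should appear with the driver module marked as installed; a kernel missing from the list is the symptom described above.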

## NVIDIA vGPU

To install the NVIDIA vGPU driver as part of the image build process, you must have the `.run` installer from NVIDIA
(and, if licensing is required, the `.tok` licensing file) available from an S3 endpoint.
You then need to reference those files in your Packer variables file.

_This is because NVIDIA places the vGPU drivers behind a licensing wall, which means you can't use the standard
installation process for them._
_As of July 2023, NVIDIA no longer supports an internal licensing server hosted by the customer._
_This role currently doesn't support installing the publicly available drivers._

An example of the required fields is shown below. Make sure to review and change any fields where required.
If the gridd configuration or the `.tok` licensing file is not required, you can omit `gridd_feature_type`
and `nvidia_tok_location` respectively.

```json
{
  "ansible_user_vars": "gpu_vendor=nvidia nvidia_s3_url=https://s3-endpoint nvidia_bucket=nvidia nvidia_bucket_access=ACCESS_KEY nvidia_bucket_secret=SECRET_KEY nvidia_installer_location=NVIDIA-Linux-x86_64-525.85.05-grid.run nvidia_tok_location=client_configuration_token.tok gridd_feature_type=4",
  "node_custom_roles_pre": "gpu"
}
```
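`ansible_user_vars` packs every setting into one space-separated string, which is hard to review. For readability, the same NVIDIA settings are shown below as an Ansible variables mapping. This is illustrative only: the values are the placeholders from the example above, and the comments are inferred from the variable names and this README rather than from the NVIDIA task file, which is not shown in this diff.

```yaml
# Illustrative only: the NVIDIA example above, one variable per line.
gpu_vendor: nvidia                  # selects tasks/nvidia.yml in the gpu role
nvidia_s3_url: https://s3-endpoint  # S3 endpoint serving the NVIDIA files
nvidia_bucket: nvidia               # bucket holding the files
nvidia_bucket_access: ACCESS_KEY    # S3 access key
nvidia_bucket_secret: SECRET_KEY    # S3 secret key
nvidia_installer_location: NVIDIA-Linux-x86_64-525.85.05-grid.run  # the .run installer
nvidia_tok_location: client_configuration_token.tok  # optional: licensing token
gridd_feature_type: 4               # optional: gridd feature type
```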

The NVIDIA tasks in the `gpu` role do not use the S3 tasks of the `load_additional_components` role, because of a
conflict that can occur when other parts of `load_additional_components` are also used.
As the `gpu` role is loaded as part of `node_custom_roles_pre`, calling `load_additional_components` from it could
run that role out of order.

Because customers can no longer host their own licensing server, NVIDIA now requires a `.tok` file for licensing via
their cloud services. This file contains sensitive information and is unique to the company/license to which it is provided.

## AMD

Installing the AMD GPU driver is much more straightforward because AMD makes its drivers publicly available.

An example of the required fields is shown below. Make sure to review and change any fields where required.

```json
{
  "ansible_user_vars": "gpu_vendor=amd amd_version=6.0.2 amd_deb_version=6.0.60002-1 gpu_amd_usecase=dkms",
  "node_custom_roles_pre": "gpu"
}
```

_**It is highly recommended that you read through
the [AMDGPU_Installer use-cases](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/amdgpu-install.html#use-cases)
first to ensure you supply the correct one.**_

_**For example, the `rocm` use case installs more than 24GB of libraries in addition to
the driver, so the image's disk size will need to be increased to accommodate this.**_
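The role passes `gpu_amd_usecase` (default `dkms`) straight through to `amdgpu-install --usecase=...`, and `amdgpu-install` also accepts a comma-separated list of use cases such as `graphics,rocm`. The mapping below sketches the AMD example as Ansible variables with the larger `rocm` use case selected; the version values are the ones from the example above, and whether `rocm` suits a given cluster is an assumption to check against the AMD use-case documentation.

```yaml
# Illustrative only: the AMD example above as Ansible variables, using the rocm use case.
gpu_vendor: amd                  # selects tasks/amd.yml in the gpu role
amd_version: "6.0.2"             # used in the repo.radeon.com download path
amd_deb_version: "6.0.60002-1"   # version suffix of the amdgpu-install .deb
# Passed to `amdgpu-install -y --usecase=...`; the role default is dkms.
# rocm adds more than 24GB of libraries, so enlarge the image disk to match.
gpu_amd_usecase: rocm
```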
@@ -0,0 +1,17 @@
# Copyright 2024 The Kubernetes Authors.

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

# http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

---

gpu_amd_usecase: dkms
@@ -0,0 +1,53 @@
# Copyright 2024 The Kubernetes Authors.

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

# http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

---

- name: Add the root user to the render and video groups
  ansible.builtin.user:
    name: root
    groups: render,video
    append: true
  when: ansible_os_family == "Debian"

- name: Install the .deb for AMDGPU-Install
  ansible.builtin.apt:
    deb: "https://repo.radeon.com/amdgpu-install/{{ amd_version }}/ubuntu/jammy/amdgpu-install_{{ amd_deb_version }}_all.deb"
  when: ansible_os_family == "Debian"

- name: Perform a cache update
  ansible.builtin.apt:
    force_apt_get: true
    update_cache: true
  register: apt_lock_status
  until: apt_lock_status is not failed
  retries: 5
  delay: 10
  when: ansible_os_family == "Debian"

- name: Install packages required for AMD driver installation
  become: true
  ansible.builtin.apt:
    pkg:
      - "linux-headers-{{ ansible_kernel }}"
      - "linux-modules-extra-{{ ansible_kernel }}"
      - build-essential
      - dkms
      - rocminfo
      - clinfo
  when: ansible_os_family == "Debian"

- name: Run AMDGPU_Install binary with use-cases
  ansible.builtin.command:
    cmd: "amdgpu-install -y --usecase={{ gpu_amd_usecase }}"
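The `.deb` URL in these tasks hard-codes the Ubuntu `jammy` release, and the final `amdgpu-install` task has no `when` guard, unlike the tasks before it. A possible variant of those two tasks, sketched below and not part of this commit, derives the release codename from the `ansible_distribution_release` fact and applies the same Debian guard. It assumes repo.radeon.com publishes an `amdgpu-install` package for the target Ubuntu release at the chosen `amd_version`, which does not hold for every combination.

```yaml
# Sketch of a release-agnostic variant of two tasks from amd.yml (not in this commit).
- name: Install the .deb for AMDGPU-Install
  ansible.builtin.apt:
    # ansible_distribution_release is the codename, e.g. "jammy" on Ubuntu 22.04
    deb: "https://repo.radeon.com/amdgpu-install/{{ amd_version }}/ubuntu/{{ ansible_distribution_release }}/amdgpu-install_{{ amd_deb_version }}_all.deb"
  when: ansible_os_family == "Debian"

- name: Run AMDGPU_Install binary with use-cases
  ansible.builtin.command:
    cmd: "amdgpu-install -y --usecase={{ gpu_amd_usecase }}"
  # amdgpu-install is only present if the Debian-guarded task above installed it
  when: ansible_os_family == "Debian"
```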
@@ -0,0 +1,29 @@
# Copyright 2024 The Kubernetes Authors.

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

# http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

---

- name: Unload nouveau
  community.general.modprobe:
    name: nouveau
    state: absent
  ignore_errors: true

- name: Include AMD
  ansible.builtin.include_tasks: amd.yml
  when: gpu_vendor == "amd"

- name: Include NVIDIA
  ansible.builtin.include_tasks: nvidia.yml
  when: gpu_vendor == "nvidia"
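With this layout, a `gpu_vendor` value other than `amd` or `nvidia` (a typo such as `nvida`, say) makes both `when` conditions false, so the build completes without installing any GPU driver. A guard like the hypothetical task below, placed at the top of the role's `tasks/main.yml`, would fail the build early instead. It is a sketch built on the standard `ansible.builtin.assert` module, not part of this commit.

```yaml
# Hypothetical guard for the top of the gpu role's tasks/main.yml (not in this commit).
- name: Validate gpu_vendor
  ansible.builtin.assert:
    that:
      - gpu_vendor is defined
      - gpu_vendor in ["amd", "nvidia"]
    fail_msg: "gpu_vendor must be 'amd' or 'nvidia', got '{{ gpu_vendor | default('undefined') }}'"
```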
File renamed without changes.
This file was deleted.