Dedicated GPUs for time-slicing on multi-GPU setups #628

Closed
joe-schwartz-certara opened this issue Apr 8, 2024 · 5 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@joe-schwartz-certara

I'm wondering if there is a simple way to set up a configuration for dedicating a single GPU on a multi-GPU system to time-slicing. For example, my use case is that I have some services which are critical and some which are not and I want to use time-slicing for the non-critical services and leave dedicated GPUs to the critical services.

It seems like this plugin is close to allowing that and I was expecting something like

version: v1
sharing:
  timeSlicing:
    renameByDefault: true
    resources:
    - name: nvidia.com/gpu
      id: 0
      replicas: 10

to select ten time-slicing replicas for the 0-th GPU (as indexed by nvidia-smi), and then request resources for pods via either nvidia.com/gpu.shared (for non-dedicated usage of the 0-th GPU) or nvidia.com/gpu (for dedicated GPU usage). Is this kind of fine-grained control planned for the future, or is there something simple I can do to route only some of the hardware through the sharing part of the plugin?
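
To make the intent concrete, here is a sketch of how two pods would then request the two resource names, assuming renameByDefault: true advertises the replicas as nvidia.com/gpu.shared (pod names and image below are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: non-critical-service        # placeholder name
spec:
  containers:
  - name: app
    image: my-app:latest            # placeholder image
    resources:
      limits:
        nvidia.com/gpu.shared: 1    # one time-sliced replica of the shared GPU
---
apiVersion: v1
kind: Pod
metadata:
  name: critical-service            # placeholder name
spec:
  containers:
  - name: app
    image: my-app:latest            # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1           # a whole, dedicated GPU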

@frittentheke

I have the exact same question. Looking at the code doing the timeSlicing (a7c5dcf), it's possible to define devices via GPU index, GPU UUID, or even MIG UUID.

But apparently device selection is currently "disabled" via

// Disable renaming / device selection in Sharing.TimeSlicing.Resources

This restriction was there from the beginning, if you look at https://github.com/NVIDIA/k8s-device-plugin/blame/b9fe486d8b7c581e1b144ea31f0d6f6173668601/cmd/gpu-feature-discovery/main.go#L276 when the code was copied over from https://github.com/NVIDIA/gpu-feature-discovery/blob/152fa93619e973043d936f19bf20bb465c1ab289/cmd/gpu-feature-discovery/main.go#L276
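
For illustration, the config that code would parse looks roughly like this; the devices field is my reading of the ReplicatedResource / ReplicatedDevices types linked above, and the plugin currently rejects it because of exactly that restriction:

version: v1
sharing:
  timeSlicing:
    renameByDefault: true
    resources:
    - name: nvidia.com/gpu
      devices: ["0"]      # GPU index; the types also appear to allow GPU or MIG UUIDs
      replicas: 10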

@elezar @ArangoGutierrez @tariq1890 since you contributed (to) this code, may I kindly ask you to elaborate on whether adding the capability to "only" do timeSlicing / create replicas for a subset of GPU or MIG instances is planned?

I myself would love to partition all my GPUs via MIG, but only enable timeSlicing on the MIG instances of the first two.
Not being able to filter even whole GPUs is worse still, as it forces all GPUs in a machine to either do time-slicing or not (other than via node-specific config).

@joe-schwartz-certara (Author)

@frittentheke I am still bouncing around ideas on how to do the kind of fine-grained GPU access control that you and I both need. I discovered that you can override the envvar assignment from the plugin by just setting

        - name: NVIDIA_VISIBLE_DEVICES
          value: <comma separated list of the exact GPU uuids that you want the pod to use>

And if you use the same UUID(s) for two different pod specs, the applications will share the selected GPU with no problems. My lack of problems with this oversubscription method without time-slicing is probably due to the nature of the applications I'm running (they both claim all the VRAM they will need as soon as they start up), but I'm still worried that this deployment strategy has some unknown issues, since I'm basically just ignoring the plugin entirely.
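
For anyone wanting to try the same thing, a minimal sketch of what I mean (the UUID is a placeholder you would take from nvidia-smi -L, and there is deliberately no nvidia.com/gpu request, which is exactly the "ignoring the plugin" caveat above):

apiVersion: v1
kind: Pod
metadata:
  name: oversubscribed-service              # placeholder name
spec:
  containers:
  - name: app
    image: my-app:latest                    # placeholder image
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      # Placeholder UUID from `nvidia-smi -L`; give two pods the same UUID
      # and they will share that GPU.
      value: "GPU-11111111-2222-3333-4444-555555555555"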

As has been mentioned before, the feature proposed in https://docs.google.com/document/d/1BNWqgx_SmZDi-va_V31v3DnuVwYnF2EmN7D-O_fB6Oo/edit#heading=h.bxuci8gx6hna does exactly what we want. But we have to wait...

I will also comment that another, hacky workaround is to use the "whole GPU" MIG partition (i.e. on an 80GB A100, nvidia.com/mig-7g.80gb is the whole GPU), set only some of the node's GPUs to that partition, and then select only those partitioned GPUs for time-slicing. I still foresee problems if you need even finer control, e.g. where you have applications a, b, c, and d, and a+b can share a GPU, as can c+d, but c+a cannot (a scenario where a and c have large GPU requirements but b and d are small). The way Kubernetes scheduling works, you cannot guarantee that your resources will be allocated as a+b and c+d instead of some other combination.
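
Roughly, that workaround config would look like the sketch below (assuming A100-80GB GPUs, the mixed MIG strategy, and that only the GPUs you want to time-slice carry the 7g.80gb profile; the replica count is arbitrary):

version: v1
sharing:
  timeSlicing:
    renameByDefault: true
    resources:
    # Only GPUs partitioned into the "whole GPU" MIG profile are replicated;
    # unpartitioned GPUs keep being advertised as plain nvidia.com/gpu.
    - name: nvidia.com/mig-7g.80gb
      replicas: 4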

@frittentheke

I suppose the relatively new Dynamic Resource Allocation (https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/) is what will eventually solve this issue of dedicating resources to be claimed by a workload.
NVIDIA is apparently working on a DRA driver for their GPU resources: https://github.com/NVIDIA/k8s-dra-driver
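
For context, claiming a GPU through DRA looked roughly like this under the alpha API that driver targeted at the time (the API version, class name, and object names here are modelled on the k8s-dra-driver examples and may well have changed since, so treat this purely as a sketch):

apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: single-gpu                          # placeholder name
spec:
  spec:
    resourceClassName: gpu.nvidia.com       # class installed by the NVIDIA DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-gpu-pod                         # placeholder name
spec:
  resourceClaims:
  - name: gpu
    source:
      resourceClaimTemplateName: single-gpu
  containers:
  - name: app
    image: my-app:latest                    # placeholder image
    resources:
      claims:
      - name: gpu                           # the GPU is claimed per workload, not per node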


github-actions bot commented Nov 5, 2024

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

github-actions bot added the lifecycle/stale label Nov 5, 2024

github-actions bot commented Dec 5, 2024

This issue was automatically closed due to inactivity.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Dec 5, 2024