[Core] Ray job refused to submit jobs in PENDING status #4260

Closed
Michaelvll opened this issue Nov 5, 2024 · 1 comment

@Michaelvll
Collaborator

A user submitted ~1000 jobs to a cluster with ~100 nodes. At the end, 4 jobs remained in PENDING state while all other jobs had reached terminal states.

Checking ray job list shows that the most recent job submitted with ray job submit is in PENDING state, even though ray status shows all CPUs/GPUs are available. In other words, Ray never starts the job that is in PENDING state.
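
The check described above can be sketched as a small helper. This is only an illustration under stated assumptions: the function names and the (job_id, status) input shape are hypothetical and are not part of Ray or SkyPilot. The status names follow Ray's job statuses, where SUCCEEDED, FAILED, and STOPPED are terminal. With Ray installed, the pairs could presumably be built from the output of ray job list or from the job submission SDK, but that wiring is left out here.

```python
from collections import Counter

# Ray job statuses that are terminal; PENDING and RUNNING are not.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED"}


def summarize_jobs(jobs):
    """Return (status counts, IDs of jobs still PENDING).

    `jobs` is an iterable of (job_id, status) pairs. This helper and its
    input shape are hypothetical, not an API of Ray or SkyPilot.
    """
    jobs = list(jobs)
    counts = Counter(status for _, status in jobs)
    pending = [job_id for job_id, status in jobs if status == "PENDING"]
    return counts, pending


def looks_stuck(jobs):
    """True if some jobs are PENDING and every other job is terminal."""
    jobs = list(jobs)
    _, pending = summarize_jobs(jobs)
    others_terminal = all(
        status in TERMINAL_STATUSES
        for _, status in jobs
        if status != "PENDING"
    )
    return bool(pending) and others_terminal


# Illustrative, made-up input shaped like the report: 4 PENDING jobs left
# over after the rest of ~1000 jobs reached a terminal state.
example = [(f"job-{i}", "SUCCEEDED") for i in range(996)]
example += [(f"job-{i}", "PENDING") for i in range(996, 1000)]

counts, pending = summarize_jobs(example)
print(counts["PENDING"], pending, looks_stuck(example))
```

If looks_stuck returns True while ray status still reports free CPUs/GPUs, the situation matches the symptom reported here.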

Version & Commit info:

  • sky -v: PLEASE_FILL_IN
  • sky -c: PLEASE_FILL_IN
@Michaelvll Michaelvll added the P0 label Nov 5, 2024
@Michaelvll
Collaborator Author

This should be fixed by #4318
