[Core] Fix node affinity strategy when resource is empty (ray-project…
…#25344)

Why are these changes needed?
Today, the Ray scheduler always picks a random node when the resource requirement is empty, regardless of the scheduling policy/strategy.

However, under the node affinity scheduling policy, the scheduler should not fall back to random selection; it should honor the node affinity constraint.
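
To make the problem concrete, here is a toy model in plain Python. It is not Ray's scheduler code: `Strategy`, `schedule_before_fix`, and the node labels are hypothetical names chosen for illustration, and the non-random branches are simplified placeholders. It only sketches the behavior described above, where an empty resource request short-circuits to a random pick before the strategy is consulted.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Strategy:
    """Hypothetical stand-in for a scheduling strategy; not a Ray type."""
    kind: str                      # "DEFAULT" or "NODE_AFFINITY"
    node_id: Optional[str] = None  # target node for NODE_AFFINITY


def schedule_before_fix(nodes: Sequence[str], request: dict,
                        strategy: Strategy, rng: random.Random) -> str:
    """Toy model of the old behavior: an empty request goes straight to random."""
    if not request:
        # The strategy is never consulted on this path.
        return rng.choice(list(nodes))
    if strategy.kind == "NODE_AFFINITY":
        return strategy.node_id
    return nodes[0]  # placeholder for every other policy


nodes = ["head", "worker"]
pin_to_worker = Strategy(kind="NODE_AFFINITY", node_id="worker")
rng = random.Random(0)

# An empty request (for example num_cpus=0) ignores the affinity target,
# so repeated placements end up on more than one node.
placements = {schedule_before_fix(nodes, {}, pin_to_worker, rng) for _ in range(100)}

# A non-empty request honors the affinity target.
pinned = schedule_before_fix(nodes, {"CPU": 1}, pin_to_worker, rng)
```

In this model, `placements` contains nodes other than the requested one, which is the behavior this commit removes for hard node affinity.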
scv119 authored Jun 1, 2022
1 parent 52774e8 commit 49b8bbf
Showing 2 changed files with 29 additions and 2 deletions.
16 changes: 16 additions & 0 deletions python/ray/tests/test_scheduling_2.py
@@ -380,6 +380,22 @@ def get_node_id(self):
     ).remote()
     assert head_node_id == ray.get(actor.get_node_id.remote())

+    actor = Actor.options(
+        scheduling_strategy=NodeAffinitySchedulingStrategy(
+            worker_node_id, soft=False
+        ),
+        num_cpus=0,
+    ).remote()
+    assert worker_node_id == ray.get(actor.get_node_id.remote())
+
+    actor = Actor.options(
+        scheduling_strategy=NodeAffinitySchedulingStrategy(
+            head_node_id, soft=False
+        ),
+        num_cpus=0,
+    ).remote()
+    assert head_node_id == ray.get(actor.get_node_id.remote())
+
     # Wait until the target node becomes available.
     worker_actor = Actor.options(resources={"worker": 1}).remote()
     assert worker_node_id == ray.get(worker_actor.get_node_id.remote())
15 changes: 13 additions & 2 deletions src/ray/raylet/scheduling/cluster_resource_scheduler.cc
@@ -98,6 +98,16 @@ bool ClusterResourceScheduler::IsSchedulable(const ResourceRequest &resource_req
       /*ignore_object_store_memory_requirement*/ node_id == local_node_id_);
 }

+namespace {
+bool IsHardNodeAffinitySchedulingStrategy(
+    const rpc::SchedulingStrategy &scheduling_strategy) {
+  return scheduling_strategy.scheduling_strategy_case() ==
+             rpc::SchedulingStrategy::SchedulingStrategyCase::
+                 kNodeAffinitySchedulingStrategy &&
+         !scheduling_strategy.node_affinity_scheduling_strategy().soft();
+}
+}  // namespace
+
 scheduling::NodeID ClusterResourceScheduler::GetBestSchedulableNode(
     const ResourceRequest &resource_request,
     const rpc::SchedulingStrategy &scheduling_strategy,
@@ -106,8 +116,9 @@ scheduling::NodeID ClusterResourceScheduler::GetBestSchedulableNode(
     int64_t *total_violations,
     bool *is_infeasible) {
   // The zero cpu actor is a special case that must be handled the same way by all
-  // scheduling policies.
-  if (actor_creation && resource_request.IsEmpty()) {
+  // scheduling policies, except for HARD node affinity scheduling policy.
+  if (actor_creation && resource_request.IsEmpty() &&
+      !IsHardNodeAffinitySchedulingStrategy(scheduling_strategy)) {
     return scheduling_policy_->Schedule(resource_request, SchedulingOptions::Random());
   }
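
The patched guard can be read as a small truth table. The sketch below restates the condition in plain Python for illustration only: `Strategy`, `is_hard_node_affinity`, and `uses_random_path` are hypothetical names, not Ray APIs, and the strategy kinds are plain strings standing in for the protobuf enum cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """Hypothetical stand-in for rpc::SchedulingStrategy; not a Ray type."""
    kind: str           # e.g. "DEFAULT" or "NODE_AFFINITY"
    soft: bool = False  # only meaningful for NODE_AFFINITY


def is_hard_node_affinity(strategy: Strategy) -> bool:
    """Mirrors IsHardNodeAffinitySchedulingStrategy: node affinity and not soft."""
    return strategy.kind == "NODE_AFFINITY" and not strategy.soft


def uses_random_path(actor_creation: bool, request_is_empty: bool,
                     strategy: Strategy) -> bool:
    """Mirrors the patched condition in GetBestSchedulableNode."""
    return (actor_creation and request_is_empty
            and not is_hard_node_affinity(strategy))


hard = Strategy("NODE_AFFINITY", soft=False)
soft = Strategy("NODE_AFFINITY", soft=True)
default = Strategy("DEFAULT")

# Zero-resource actor creation under each strategy.
hard_goes_random = uses_random_path(True, True, hard)
soft_goes_random = uses_random_path(True, True, soft)
default_goes_random = uses_random_path(True, True, default)

# A non-empty request never takes the random shortcut.
nonempty_goes_random = uses_random_path(True, False, default)
```

As the condition is written, only hard node affinity is exempted: a zero-resource actor with soft node affinity still takes the random path, as do the other strategies.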
