sched/numa: Complete scanning of partial VMAs regardless of PID activity
NUMA Balancing skips VMAs when the current task has not trapped a NUMA
fault within the VMA. If the VMA is skipped then mm->numa_scan_offset
advances and a task that is trapping faults within the VMA may never
fully update PTEs within the VMA.

Force tasks to update PTEs for partially scanned VMAs. The VMA will
be tagged for NUMA hints by some task but this removes some of the
benefit of tracking PID activity within a VMA. A follow-on patch
will mitigate this problem.

The test cases and machines evaluated did not trigger the corner case so
the performance results are neutral with only small changes within the
noise from normal test-to-test variance. However, the next patch makes
the corner case easier to trigger.

Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Raghavendra K T <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
gormanm authored and Ingo Molnar committed Oct 10, 2023
1 parent 2e2675d commit b7a5b53
Showing 3 changed files with 18 additions and 4 deletions.
1 change: 1 addition & 0 deletions include/linux/sched/numa_balancing.h
@@ -21,6 +21,7 @@ enum numa_vmaskip_reason {
 	NUMAB_SKIP_INACCESSIBLE,
 	NUMAB_SKIP_SCAN_DELAY,
 	NUMAB_SKIP_PID_INACTIVE,
+	NUMAB_SKIP_IGNORE_PID,
 };
 
 #ifdef CONFIG_NUMA_BALANCING
3 changes: 2 additions & 1 deletion include/trace/events/sched.h
@@ -670,7 +670,8 @@ DEFINE_EVENT(sched_numa_pair_template, sched_swap_numa,
 	EM( NUMAB_SKIP_SHARED_RO, "shared_ro" ) \
 	EM( NUMAB_SKIP_INACCESSIBLE, "inaccessible" ) \
 	EM( NUMAB_SKIP_SCAN_DELAY, "scan_delay" ) \
-	EMe(NUMAB_SKIP_PID_INACTIVE, "pid_inactive" )
+	EM( NUMAB_SKIP_PID_INACTIVE, "pid_inactive" ) \
+	EMe(NUMAB_SKIP_IGNORE_PID, "ignore_pid_inactive" )
 
 /* Redefine for export. */
 #undef EM
18 changes: 15 additions & 3 deletions kernel/sched/fair.c
@@ -3113,7 +3113,7 @@ static void reset_ptenuma_scan(struct task_struct *p)
 	p->mm->numa_scan_offset = 0;
 }
 
-static bool vma_is_accessed(struct vm_area_struct *vma)
+static bool vma_is_accessed(struct mm_struct *mm, struct vm_area_struct *vma)
 {
 	unsigned long pids;
 	/*
@@ -3126,7 +3126,19 @@ static bool vma_is_accessed(struct vm_area_struct *vma)
 		return true;
 
 	pids = vma->numab_state->pids_active[0] | vma->numab_state->pids_active[1];
-	return test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)), &pids);
+	if (test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)), &pids))
+		return true;
+
+	/*
+	 * Complete a scan that has already started regardless of PID access, or
+	 * some VMAs may never be scanned in multi-threaded applications:
+	 */
+	if (mm->numa_scan_offset > vma->vm_start) {
+		trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_IGNORE_PID);
+		return true;
+	}
+
+	return false;
 }
 
 #define VMA_PID_RESET_PERIOD (4 * sysctl_numa_balancing_scan_delay)
@@ -3270,7 +3282,7 @@ static void task_numa_work(struct callback_head *work)
 		}
 
 		/* Do not scan the VMA if task has not accessed */
-		if (!vma_is_accessed(vma)) {
+		if (!vma_is_accessed(mm, vma)) {
 			trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_PID_INACTIVE);
 			continue;
 		}
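
The decision that the patched vma_is_accessed() makes can be illustrated with a small userspace model. This is only a sketch, not kernel code: struct model_mm, struct model_vma, pid_bit() and model_vma_is_accessed() are hypothetical names invented for the illustration, the PID bitmap is fixed at 64 bits, and the earlier check in the real function that lies outside the diff context is not modelled. pid_bit() is assumed to mirror the kernel's generic multiplicative hash_32() with 6 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins holding only the fields the check reads. */
struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	uint64_t pids_active;	/* models pids_active[0] | pids_active[1] */
};

struct model_mm {
	unsigned long numa_scan_offset;	/* where the next scan pass resumes */
};

/*
 * Assumed to match the generic hash_32(): multiply by the 32-bit golden
 * ratio constant and keep the top 6 bits, giving a bit index in 0..63.
 */
static unsigned int pid_bit(uint32_t pid)
{
	return (uint32_t)(pid * 0x61C88647u) >> (32 - 6);
}

/* Model of the patched check: PID activity first, then the partial-scan rule. */
bool model_vma_is_accessed(const struct model_mm *mm,
			   const struct model_vma *vma, uint32_t pid)
{
	/* The task recently trapped a NUMA hinting fault in this VMA. */
	if (vma->pids_active & (UINT64_C(1) << pid_bit(pid)))
		return true;

	/* A scan already advanced into this VMA: finish it regardless of PID. */
	if (mm->numa_scan_offset > vma->vm_start)
		return true;

	return false;
}

/* Walks through the three cases described in the commit message. */
void model_demo(void)
{
	struct model_vma vma = { 0x1000, 0x5000, 0 };
	struct model_mm mm = { 0x1000 };

	/* Only PID 100 has faulted in the VMA; PID 101 hashes to another bit. */
	vma.pids_active = UINT64_C(1) << pid_bit(100);

	assert(model_vma_is_accessed(&mm, &vma, 100));	/* active PID scans */
	assert(!model_vma_is_accessed(&mm, &vma, 101));	/* inactive PID skips */

	mm.numa_scan_offset = 0x3000;			/* scan stopped mid-VMA */
	assert(model_vma_is_accessed(&mm, &vma, 101));	/* scan is completed */
}
```

In the model, a task whose PID bit is not set skips the VMA only while the scan offset has not moved past vm_start. Once a scan has stopped part-way through the VMA, any task that reaches it finishes the scan, which corresponds to the case traced as NUMAB_SKIP_IGNORE_PID in the patch.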