Skip to content

Commit

Permalink
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/l…
Browse files Browse the repository at this point in the history
…inux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Various NUMA scheduling updates: harmonize the load-balancer and
     NUMA placement logic to not work against each other. The intended
     result is better locality, better utilization and fewer migrations.

   - Introduce Thermal Pressure tracking and optimizations, to improve
     task placement on thermally overloaded systems.

   - Implement frequency invariant scheduler accounting on (some) x86
     CPUs. This is done by observing and sampling the 'recent' CPU
     frequency average at ~tick boundaries. The CPU provides this data
     via the APERF/MPERF MSRs. This hopefully makes our capacity
     estimates more precise and keeps tasks on the same CPU better even
     if it might seem overloaded at a lower momentary frequency. (As
     usual, turbo mode is a complication that we resolve by observing
     the maximum frequency and renormalizing to it.)

   - Add asymmetric CPU capacity wakeup scan to improve capacity
     utilization on asymmetric topologies. (big.LITTLE systems)

   - PSI fixes and optimizations.

   - RT scheduling capacity awareness fixes & improvements.

   - Optimize the CONFIG_RT_GROUP_SCHED constraints code.

   - Misc fixes, cleanups and optimizations - see the changelog for
     details"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
  threads: Update PID limit comment according to futex UAPI change
  sched/fair: Fix condition of avg_load calculation
  sched/rt: cpupri_find: Trigger a full search as fallback
  kthread: Do not preempt current task if it is going to call schedule()
  sched/fair: Improve spreading of utilization
  sched: Avoid scale real weight down to zero
  psi: Move PF_MEMSTALL out of task->flags
  MAINTAINERS: Add maintenance information for psi
  psi: Optimize switching tasks inside shared cgroups
  psi: Fix cpu.pressure for cpu.max and competing cgroups
  sched/core: Distribute tasks within affinity masks
  sched/fair: Fix enqueue_task_fair warning
  thermal/cpu-cooling, sched/core: Move the arch_set_thermal_pressure() API to generic scheduler code
  sched/rt: Remove unnecessary push for unfit tasks
  sched/rt: Allow pulling unfitting task
  sched/rt: Optimize cpupri_find() on non-heterogenous systems
  sched/rt: Re-instate old behavior in select_task_rq_rt()
  sched/rt: cpupri_find: Implement fallback mechanism for !fit case
  sched/fair: Fix reordering of enqueue/dequeue_task_fair()
  sched/fair: Fix runnable_avg for throttled cfs
  ...
  • Loading branch information
torvalds committed Mar 31, 2020
2 parents 9b82f05 + 313f16e commit 642e53e
Show file tree
Hide file tree
Showing 37 changed files with 1,552 additions and 513 deletions.
16 changes: 16 additions & 0 deletions Documentation/admin-guide/kernel-parameters.txt
Original file line number Diff line number Diff line change
Expand Up @@ -4428,6 +4428,22 @@
incurs a small amount of overhead in the scheduler
but is useful for debugging and performance tuning.

sched_thermal_decay_shift=
[KNL, SMP] Set a decay shift for scheduler thermal
pressure signal. Thermal pressure signal follows the
default decay period of other scheduler pelt
signals(usually 32 ms but configurable). Setting
sched_thermal_decay_shift will left shift the decay
period for the thermal pressure signal by the shift
value.
i.e. with the default pelt decay period of 32 ms
sched_thermal_decay_shift thermal pressure decay pr
1 64 ms
2 128 ms
and so on.
Format: integer between 0 and 10
Default is 0.

skew_tick= [KNL] Offset the periodic timer tick per cpu to mitigate
xtime_lock contention on larger systems, and/or RCU lock
contention on all systems with CONFIG_MAXSMP set.
Expand Down
14 changes: 6 additions & 8 deletions Documentation/robust-futex-ABI.txt
Original file line number Diff line number Diff line change
Expand Up @@ -61,8 +61,8 @@ setup that list.
address of the associated 'lock entry', plus or minus, of what will
be called the 'lock word', from that 'lock entry'. The 'lock word'
is always a 32 bit word, unlike the other words above. The 'lock
word' holds 3 flag bits in the upper 3 bits, and the thread id (TID)
of the thread holding the lock in the bottom 29 bits. See further
word' holds 2 flag bits in the upper 2 bits, and the thread id (TID)
of the thread holding the lock in the bottom 30 bits. See further
below for a description of the flag bits.

The third word, called 'list_op_pending', contains transient copy of
Expand Down Expand Up @@ -128,7 +128,7 @@ that thread's robust_futex linked lock list a given time.
A given futex lock structure in a user shared memory region may be held
at different times by any of the threads with access to that region. The
thread currently holding such a lock, if any, is marked with the threads
TID in the lower 29 bits of the 'lock word'.
TID in the lower 30 bits of the 'lock word'.

When adding or removing a lock from its list of held locks, in order for
the kernel to correctly handle lock cleanup regardless of when the task
Expand All @@ -141,7 +141,7 @@ On insertion:
1) set the 'list_op_pending' word to the address of the 'lock entry'
to be inserted,
2) acquire the futex lock,
3) add the lock entry, with its thread id (TID) in the bottom 29 bits
3) add the lock entry, with its thread id (TID) in the bottom 30 bits
of the 'lock word', to the linked list starting at 'head', and
4) clear the 'list_op_pending' word.

Expand All @@ -155,7 +155,7 @@ On removal:

On exit, the kernel will consider the address stored in
'list_op_pending' and the address of each 'lock word' found by walking
the list starting at 'head'. For each such address, if the bottom 29
the list starting at 'head'. For each such address, if the bottom 30
bits of the 'lock word' at offset 'offset' from that address equals the
exiting threads TID, then the kernel will do two things:

Expand All @@ -180,7 +180,5 @@ any point:
future kernel configuration changes) elements.

When the kernel sees a list entry whose 'lock word' doesn't have the
current threads TID in the lower 29 bits, it does nothing with that
current threads TID in the lower 30 bits, it does nothing with that
entry, and goes on to the next entry.

Bit 29 (0x20000000) of the 'lock word' is reserved for future use.
6 changes: 6 additions & 0 deletions MAINTAINERS
Original file line number Diff line number Diff line change
Expand Up @@ -13552,6 +13552,12 @@ F: net/psample
F: include/net/psample.h
F: include/uapi/linux/psample.h

PRESSURE STALL INFORMATION (PSI)
M: Johannes Weiner <[email protected]>
S: Maintained
F: kernel/sched/psi.c
F: include/linux/psi*

PSTORE FILESYSTEM
M: Kees Cook <[email protected]>
M: Anton Vorontsov <[email protected]>
Expand Down
3 changes: 3 additions & 0 deletions arch/arm/include/asm/topology.h
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,9 @@
/* Enable topology flag updates */
#define arch_update_cpu_topology topology_update_cpu_topology

/* Replace task scheduler's default thermal pressure retrieve API */
#define arch_scale_thermal_pressure topology_get_thermal_pressure

#else

static inline void init_cpu_topology(void) { }
Expand Down
1 change: 1 addition & 0 deletions arch/arm64/configs/defconfig
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,7 @@ CONFIG_ARCH_ZX=y
CONFIG_ARCH_ZYNQMP=y
CONFIG_ARM64_VA_BITS_48=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_SMT=y
CONFIG_NUMA=y
CONFIG_SECCOMP=y
CONFIG_KEXEC=y
Expand Down
3 changes: 3 additions & 0 deletions arch/arm64/include/asm/topology.h
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,9 @@ int pcibus_to_node(struct pci_bus *bus);
/* Enable topology flag updates */
#define arch_update_cpu_topology topology_update_cpu_topology

/* Replace task scheduler's default thermal pressure retrieve API */
#define arch_scale_thermal_pressure topology_get_thermal_pressure

#include <asm-generic/topology.h>

#endif /* _ASM_ARM_TOPOLOGY_H */
25 changes: 25 additions & 0 deletions arch/x86/include/asm/topology.h
Original file line number Diff line number Diff line change
Expand Up @@ -193,4 +193,29 @@ static inline void sched_clear_itmt_support(void)
}
#endif /* CONFIG_SCHED_MC_PRIO */

#ifdef CONFIG_SMP
#include <asm/cpufeature.h>

DECLARE_STATIC_KEY_FALSE(arch_scale_freq_key);

#define arch_scale_freq_invariant() static_branch_likely(&arch_scale_freq_key)

DECLARE_PER_CPU(unsigned long, arch_freq_scale);

static inline long arch_scale_freq_capacity(int cpu)
{
return per_cpu(arch_freq_scale, cpu);
}
#define arch_scale_freq_capacity arch_scale_freq_capacity

extern void arch_scale_freq_tick(void);
#define arch_scale_freq_tick arch_scale_freq_tick

extern void arch_set_max_freq_ratio(bool turbo_disabled);
#else
static inline void arch_set_max_freq_ratio(bool turbo_disabled)
{
}
#endif

#endif /* _ASM_X86_TOPOLOGY_H */
Loading

0 comments on commit 642e53e

Please sign in to comment.