
Commit

Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "The main updates in this cycle were:

   - Group balancing enhancements and cleanups (Brendan Jackman)

   - Move the CPU isolation functionality into its own
     kernel/sched/isolation.c file, with a matching 'housekeeping_*()'
     namespace and nomenclature (Frederic Weisbecker)

   - Improve the interactive/CPU-intensive fairness calculation (Josef
     Bacik)

   - Improve the PELT code and related cleanups (Peter Zijlstra)

   - Improve the logic of pick_next_task_fair() (Uladzislau Rezki)

   - Improve the RT IPI based balancing logic (Steven Rostedt)

   - Various micro-optimizations:

       - better !CONFIG_SCHED_DEBUG optimizations (Patrick Bellasi)

       - better idle loop (Cheng Jian)

   - ... plus misc fixes, cleanups and updates"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds
  sched/sysctl: Fix attributes of some extern declarations
  sched/isolation: Document isolcpus= boot parameter flags, mark it deprecated
  sched/isolation: Add basic isolcpus flags
  sched/isolation: Move isolcpus= handling to the housekeeping code
  sched/isolation: Handle the nohz_full= parameter
  sched/isolation: Introduce housekeeping flags
  sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL
  sched/isolation: Rename is_housekeeping_cpu() to housekeeping_cpu()
  sched/isolation: Use its own static key
  sched/isolation: Make the housekeeping cpumask private
  sched/isolation: Provide a dynamic off-case to housekeeping_any_cpu()
  sched/isolation, watchdog: Use housekeeping_cpumask() instead of ad-hoc version
  sched/isolation: Move housekeeping related code to its own file
  sched/idle: Micro-optimize the idle loop
  sched/isolcpus: Fix "isolcpus=" boot parameter handling when !CONFIG_CPUMASK_OFFSTACK
  x86/tsc: Append the 'tsc=' description for the 'tsc=unstable' boot parameter
  sched/rt: Simplify the IPI based RT balancing logic
  block/ioprio: Use a helper to check for RT prio
  sched/rt: Add a helper to test for a RT task
  ...
torvalds committed Nov 13, 2017
2 parents f2be8bd + 765cc3a commit 3e20146
Showing 31 changed files with 1,270 additions and 774 deletions.
40 changes: 28 additions & 12 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -1730,20 +1730,33 @@
isapnp= [ISAPNP]
Format: <RDP>,<reset>,<pci_scan>,<verbosity>

isolcpus= [KNL,SMP] Isolate CPUs from the general scheduler.
The argument is a cpu list, as described above.
isolcpus= [KNL,SMP] Isolate a given set of CPUs from disturbance.
[Deprecated - use cpusets instead]
Format: [flag-list,]<cpu-list>

Specify one or more CPUs to isolate from disturbances
specified in the flag list (default: domain):

nohz
Disable the tick when a single task runs.
domain
Isolate from the general SMP balancing and scheduling
algorithms. Note that performing domain isolation this way
is irreversible: it's not possible to bring back a CPU to
the domains once isolated through isolcpus. It's strongly
advised to use cpusets instead to disable scheduler load
balancing through the "cpuset.sched_load_balance" file.
It offers a much more flexible interface where CPUs can
move in and out of an isolated set anytime.

You can move a process onto or off an "isolated" CPU via
the CPU affinity syscalls or cpuset.
<cpu number> begins at 0 and the maximum value is
"number of CPUs in system - 1".

The format of <cpu-list> is described above.

This option can be used to specify one or more CPUs
to isolate from the general SMP balancing and scheduling
algorithms. You can move a process onto or off an
"isolated" CPU via the CPU affinity syscalls or cpuset.
<cpu number> begins at 0 and the maximum value is
"number of CPUs in system - 1".

This option is the preferred way to isolate CPUs. The
alternative -- manually setting the CPU mask of all
tasks in the system -- can cause problems and
suboptimal load balancer performance.

iucv= [HW,NET]

@@ -4209,6 +4222,9 @@
Used to run time disable IRQ_TIME_ACCOUNTING on any
platforms where RDTSC is slow and this accounting
can add overhead.
[x86] unstable: mark the TSC clocksource as unstable, this
marks the TSC unconditionally unstable at bootup and
avoids any further wobbles once the TSC watchdog notices.

turbografx.map[2|3]= [HW,JOY]
TurboGraFX parallel port interface
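For illustration only (the flag names come from the documentation above; the CPU range 2-7 is a hypothetical choice for an 8-CPU machine), a boot command line could isolate CPUs 2-7 from both the scheduler domains and the tick with:

    isolcpus=nohz,domain,2-7

Leaving out the flag list keeps the historical behaviour, which is equivalent to passing only the default 'domain' flag:

    isolcpus=2-7
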
11 changes: 10 additions & 1 deletion drivers/base/cpu.c
@@ -18,6 +18,7 @@
#include <linux/cpufeature.h>
#include <linux/tick.h>
#include <linux/pm_qos.h>
#include <linux/sched/isolation.h>

#include "base.h"

@@ -271,8 +272,16 @@ static ssize_t print_cpus_isolated(struct device *dev,
struct device_attribute *attr, char *buf)
{
int n = 0, len = PAGE_SIZE-2;
cpumask_var_t isolated;

n = scnprintf(buf, len, "%*pbl\n", cpumask_pr_args(cpu_isolated_map));
if (!alloc_cpumask_var(&isolated, GFP_KERNEL))
return -ENOMEM;

cpumask_andnot(isolated, cpu_possible_mask,
housekeeping_cpumask(HK_FLAG_DOMAIN));
n = scnprintf(buf, len, "%*pbl\n", cpumask_pr_args(isolated));

free_cpumask_var(isolated);

return n;
}
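The attribute computed above is typically exposed as /sys/devices/system/cpu/isolated. On a machine booted with the hypothetical isolcpus= line shown earlier, reading it would produce output roughly like this (illustrative only):

    $ cat /sys/devices/system/cpu/isolated
    2-7
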
6 changes: 3 additions & 3 deletions drivers/net/ethernet/tile/tilegx.c
@@ -40,7 +40,7 @@
#include <linux/tcp.h>
#include <linux/net_tstamp.h>
#include <linux/ptp_clock_kernel.h>
#include <linux/tick.h>
#include <linux/sched/isolation.h>

#include <asm/checksum.h>
#include <asm/homecache.h>
@@ -2270,8 +2270,8 @@ static int __init tile_net_init_module(void)
tile_net_dev_init(name, mac);

if (!network_cpus_init())
cpumask_and(&network_cpus_map, housekeeping_cpumask(),
cpu_online_mask);
cpumask_and(&network_cpus_map,
housekeeping_cpumask(HK_FLAG_MISC), cpu_online_mask);

return 0;
}
2 changes: 1 addition & 1 deletion fs/proc/array.c
@@ -138,7 +138,7 @@ static const char * const task_state_array[] = {
static inline const char *get_task_state(struct task_struct *tsk)
{
BUILD_BUG_ON(1 + ilog2(TASK_REPORT_MAX) != ARRAY_SIZE(task_state_array));
return task_state_array[__get_task_state(tsk)];
return task_state_array[task_state_index(tsk)];
}

static inline int get_task_umask(struct task_struct *tsk)
16 changes: 16 additions & 0 deletions include/linux/cpumask.h
@@ -131,6 +131,11 @@ static inline unsigned int cpumask_first(const struct cpumask *srcp)
return 0;
}

static inline unsigned int cpumask_last(const struct cpumask *srcp)
{
return 0;
}

/* Valid inputs for n are -1 and 0. */
static inline unsigned int cpumask_next(int n, const struct cpumask *srcp)
{
@@ -179,6 +184,17 @@ static inline unsigned int cpumask_first(const struct cpumask *srcp)
return find_first_bit(cpumask_bits(srcp), nr_cpumask_bits);
}

/**
* cpumask_last - get the last CPU in a cpumask
* @srcp: - the cpumask pointer
*
* Returns >= nr_cpumask_bits if no CPUs set.
*/
static inline unsigned int cpumask_last(const struct cpumask *srcp)
{
return find_last_bit(cpumask_bits(srcp), nr_cpumask_bits);
}

unsigned int cpumask_next(int n, const struct cpumask *srcp);

/**
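A minimal, hypothetical use of the new helper (not part of this commit), mirroring the cpumask_first() idiom:

    #include <linux/cpumask.h>

    /*
     * Hypothetical example: pick the highest-numbered CPU in a mask,
     * falling back to CPU 0 when no bit is set (cpumask_last() then
     * returns a value >= nr_cpumask_bits).
     */
    static unsigned int example_pick_last_cpu(const struct cpumask *mask)
    {
    	unsigned int cpu = cpumask_last(mask);

    	return cpu < nr_cpu_ids ? cpu : 0;
    }
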
3 changes: 2 additions & 1 deletion include/linux/ioprio.h
@@ -3,6 +3,7 @@
#define IOPRIO_H

#include <linux/sched.h>
#include <linux/sched/rt.h>
#include <linux/iocontext.h>

/*
@@ -63,7 +64,7 @@ static inline int task_nice_ioclass(struct task_struct *task)
{
if (task->policy == SCHED_IDLE)
return IOPRIO_CLASS_IDLE;
else if (task->policy == SCHED_FIFO || task->policy == SCHED_RR)
else if (task_is_realtime(task))
return IOPRIO_CLASS_RT;
else
return IOPRIO_CLASS_BE;
19 changes: 10 additions & 9 deletions include/linux/sched.h
@@ -166,8 +166,6 @@ struct task_group;
/* Task command name length: */
#define TASK_COMM_LEN 16

extern cpumask_var_t cpu_isolated_map;

extern void scheduler_tick(void);

#define MAX_SCHEDULE_TIMEOUT LONG_MAX
@@ -332,9 +330,11 @@ struct load_weight {
struct sched_avg {
u64 last_update_time;
u64 load_sum;
u64 runnable_load_sum;
u32 util_sum;
u32 period_contrib;
unsigned long load_avg;
unsigned long runnable_load_avg;
unsigned long util_avg;
};

@@ -377,6 +377,7 @@ struct sched_statistics {
struct sched_entity {
/* For load-balancing: */
struct load_weight load;
unsigned long runnable_weight;
struct rb_node run_node;
struct list_head group_node;
unsigned int on_rq;
@@ -472,10 +473,10 @@ struct sched_dl_entity {
* conditions between the inactive timer handler and the wakeup
* code.
*/
int dl_throttled;
int dl_boosted;
int dl_yielded;
int dl_non_contending;
int dl_throttled : 1;
int dl_boosted : 1;
int dl_yielded : 1;
int dl_non_contending : 1;

/*
* Bandwidth enforcement timer. Each -deadline task has its
@@ -1246,7 +1247,7 @@ static inline pid_t task_pgrp_nr(struct task_struct *tsk)
#define TASK_REPORT_IDLE (TASK_REPORT + 1)
#define TASK_REPORT_MAX (TASK_REPORT_IDLE << 1)

static inline unsigned int __get_task_state(struct task_struct *tsk)
static inline unsigned int task_state_index(struct task_struct *tsk)
{
unsigned int tsk_state = READ_ONCE(tsk->state);
unsigned int state = (tsk_state | tsk->exit_state) & TASK_REPORT;
@@ -1259,7 +1260,7 @@ static inline unsigned int __get_task_state(struct task_struct *tsk)
return fls(state);
}

static inline char __task_state_to_char(unsigned int state)
static inline char task_index_to_char(unsigned int state)
{
static const char state_char[] = "RSDTtXZPI";

@@ -1270,7 +1271,7 @@ static inline char __task_state_to_char(unsigned int state)

static inline char task_state_to_char(struct task_struct *tsk)
{
return __task_state_to_char(__get_task_state(tsk));
return task_index_to_char(task_state_index(tsk));
}

/**
51 changes: 51 additions & 0 deletions include/linux/sched/isolation.h
@@ -0,0 +1,51 @@
#ifndef _LINUX_SCHED_ISOLATION_H
#define _LINUX_SCHED_ISOLATION_H

#include <linux/cpumask.h>
#include <linux/init.h>
#include <linux/tick.h>

enum hk_flags {
HK_FLAG_TIMER = 1,
HK_FLAG_RCU = (1 << 1),
HK_FLAG_MISC = (1 << 2),
HK_FLAG_SCHED = (1 << 3),
HK_FLAG_TICK = (1 << 4),
HK_FLAG_DOMAIN = (1 << 5),
};

#ifdef CONFIG_CPU_ISOLATION
DECLARE_STATIC_KEY_FALSE(housekeeping_overriden);
extern int housekeeping_any_cpu(enum hk_flags flags);
extern const struct cpumask *housekeeping_cpumask(enum hk_flags flags);
extern void housekeeping_affine(struct task_struct *t, enum hk_flags flags);
extern bool housekeeping_test_cpu(int cpu, enum hk_flags flags);
extern void __init housekeeping_init(void);

#else

static inline int housekeeping_any_cpu(enum hk_flags flags)
{
return smp_processor_id();
}

static inline const struct cpumask *housekeeping_cpumask(enum hk_flags flags)
{
return cpu_possible_mask;
}

static inline void housekeeping_affine(struct task_struct *t,
enum hk_flags flags) { }
static inline void housekeeping_init(void) { }
#endif /* CONFIG_CPU_ISOLATION */

static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
{
#ifdef CONFIG_CPU_ISOLATION
if (static_branch_unlikely(&housekeeping_overriden))
return housekeeping_test_cpu(cpu, flags);
#endif
return true;
}

#endif /* _LINUX_SCHED_ISOLATION_H */
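As a rough sketch of how the new interface is meant to be consumed (the example_* function names are hypothetical; the housekeeping_*() calls and HK_FLAG_MISC come from the header above):

    #include <linux/sched.h>
    #include <linux/sched/isolation.h>

    /* Keep an unbound worker thread off isolated CPUs. */
    static void example_confine_worker(struct task_struct *t)
    {
    	housekeeping_affine(t, HK_FLAG_MISC);
    }

    /* True unless the CPU was carved out via isolcpus= or nohz_full=. */
    static bool example_cpu_does_housework(int cpu)
    {
    	return housekeeping_cpu(cpu, HK_FLAG_MISC);
    }

With CONFIG_CPU_ISOLATION=n, the stubs above reduce both calls to the trivial "every CPU is a housekeeping CPU" behaviour.
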
11 changes: 11 additions & 0 deletions include/linux/sched/rt.h
@@ -18,6 +18,17 @@ static inline int rt_task(struct task_struct *p)
return rt_prio(p->prio);
}

static inline bool task_is_realtime(struct task_struct *tsk)
{
int policy = tsk->policy;

if (policy == SCHED_FIFO || policy == SCHED_RR)
return true;
if (policy == SCHED_DEADLINE)
return true;
return false;
}

#ifdef CONFIG_RT_MUTEXES
/*
* Must hold either p->pi_lock or task_rq(p)->lock.
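A minimal illustration of the new helper (this call site is hypothetical; the ioprio.h hunk above shows the real user added by this series):

    /* True for SCHED_FIFO, SCHED_RR and SCHED_DEADLINE tasks. */
    if (task_is_realtime(current))
    	pr_debug("task %s uses a realtime policy\n", current->comm);
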
6 changes: 3 additions & 3 deletions include/linux/sched/sysctl.h
@@ -38,9 +38,9 @@ extern unsigned int sysctl_numa_balancing_scan_period_max;
extern unsigned int sysctl_numa_balancing_scan_size;

#ifdef CONFIG_SCHED_DEBUG
extern unsigned int sysctl_sched_migration_cost;
extern unsigned int sysctl_sched_nr_migrate;
extern unsigned int sysctl_sched_time_avg;
extern __read_mostly unsigned int sysctl_sched_migration_cost;
extern __read_mostly unsigned int sysctl_sched_nr_migrate;
extern __read_mostly unsigned int sysctl_sched_time_avg;

int sched_proc_update_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *length,
39 changes: 2 additions & 37 deletions include/linux/tick.h
@@ -138,7 +138,6 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
#ifdef CONFIG_NO_HZ_FULL
extern bool tick_nohz_full_running;
extern cpumask_var_t tick_nohz_full_mask;
extern cpumask_var_t housekeeping_mask;

static inline bool tick_nohz_full_enabled(void)
{
@@ -162,11 +161,6 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
cpumask_or(mask, mask, tick_nohz_full_mask);
}

static inline int housekeeping_any_cpu(void)
{
return cpumask_any_and(housekeeping_mask, cpu_online_mask);
}

extern void tick_nohz_dep_set(enum tick_dep_bits bit);
extern void tick_nohz_dep_clear(enum tick_dep_bits bit);
extern void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit);
@@ -235,11 +229,8 @@ static inline void tick_dep_clear_signal(struct signal_struct *signal,

extern void tick_nohz_full_kick_cpu(int cpu);
extern void __tick_nohz_task_switch(void);
extern void __init tick_nohz_full_setup(cpumask_var_t cpumask);
#else
static inline int housekeeping_any_cpu(void)
{
return smp_processor_id();
}
static inline bool tick_nohz_full_enabled(void) { return false; }
static inline bool tick_nohz_full_cpu(int cpu) { return false; }
static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
@@ -259,35 +250,9 @@ static inline void tick_dep_clear_signal(struct signal_struct *signal,

static inline void tick_nohz_full_kick_cpu(int cpu) { }
static inline void __tick_nohz_task_switch(void) { }
static inline void tick_nohz_full_setup(cpumask_var_t cpumask) { }
#endif

static inline const struct cpumask *housekeeping_cpumask(void)
{
#ifdef CONFIG_NO_HZ_FULL
if (tick_nohz_full_enabled())
return housekeeping_mask;
#endif
return cpu_possible_mask;
}

static inline bool is_housekeeping_cpu(int cpu)
{
#ifdef CONFIG_NO_HZ_FULL
if (tick_nohz_full_enabled())
return cpumask_test_cpu(cpu, housekeeping_mask);
#endif
return true;
}

static inline void housekeeping_affine(struct task_struct *t)
{
#ifdef CONFIG_NO_HZ_FULL
if (tick_nohz_full_enabled())
set_cpus_allowed_ptr(t, housekeeping_mask);

#endif
}

static inline void tick_nohz_task_switch(void)
{
if (tick_nohz_full_enabled())
2 changes: 1 addition & 1 deletion include/trace/events/sched.h
@@ -118,7 +118,7 @@ static inline long __trace_sched_switch_state(bool preempt, struct task_struct *
if (preempt)
return TASK_STATE_MAX;

return __get_task_state(p);
return task_state_index(p);
}
#endif /* CREATE_TRACE_POINTS */

(the remaining changed files are not expanded here)
