Skip to content

Commit

Permalink
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/l…
Browse files Browse the repository at this point in the history
…inux/kernel/git/tip/tip

Pull scheduler changes from Ingo Molnar:
 "The main changes in this cycle are:

   - (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik
     van Riel, Peter Zijlstra et al.  Yay!

   - optimize preemption counter handling: merge the NEED_RESCHED flag
     into the preempt_count variable, by Peter Zijlstra.

   - wait.h fixes and code reorganization from Peter Zijlstra

   - cfs_bandwidth fixes from Ben Segall

   - SMP load-balancer cleanups from Peter Zijstra

   - idle balancer improvements from Jason Low

   - other fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
  ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
  stop_machine: Fix race between stop_two_cpus() and stop_cpus()
  sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
  sched: Fix asymmetric scheduling for POWER7
  sched: Move completion code from core.c to completion.c
  sched: Move wait code from core.c to wait.c
  sched: Move wait.c into kernel/sched/
  sched/wait: Fix __wait_event_interruptible_lock_irq_timeout()
  sched: Avoid throttle_cfs_rq() racing with period_timer stopping
  sched: Guarantee new group-entities always have weight
  sched: Fix hrtimer_cancel()/rq->lock deadlock
  sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
  sched: Fix race on toggling cfs_bandwidth_used
  sched: Remove extra put_online_cpus() inside sched_setaffinity()
  sched/rt: Fix task_tick_rt() comment
  sched/wait: Fix build breakage
  sched/wait: Introduce prepare_to_wait_event()
  sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too
  sched: Remove get_online_cpus() usage
  sched: Fix race in migrate_swap_stop()
  ...
  • Loading branch information
torvalds committed Nov 12, 2013
2 parents ad5d698 + e5137b5 commit 39cf275
Show file tree
Hide file tree
Showing 117 changed files with 3,569 additions and 1,594 deletions.
76 changes: 76 additions & 0 deletions Documentation/sysctl/kernel.txt
Original file line number Diff line number Diff line change
Expand Up @@ -355,6 +355,82 @@ utilize.

==============================================================

numa_balancing

Enables/disables automatic page fault based NUMA memory
balancing. Memory is moved automatically to nodes
that access it often.

Enables/disables automatic NUMA memory balancing. On NUMA machines, there
is a performance penalty if remote memory is accessed by a CPU. When this
feature is enabled the kernel samples what task thread is accessing memory
by periodically unmapping pages and later trapping a page fault. At the
time of the page fault, it is determined if the data being accessed should
be migrated to a local memory node.

The unmapping of pages and trapping faults incur additional overhead that
ideally is offset by improved memory locality but there is no universal
guarantee. If the target workload is already bound to NUMA nodes then this
feature should be disabled. Otherwise, if the system overhead from the
feature is too high then the rate the kernel samples for NUMA hinting
faults may be controlled by the numa_balancing_scan_period_min_ms,
numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
numa_balancing_scan_size_mb, numa_balancing_settle_count sysctls and
numa_balancing_migrate_deferred.

==============================================================

numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms,
numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb

Automatic NUMA balancing scans tasks address space and unmaps pages to
detect if pages are properly placed or if the data should be migrated to a
memory node local to where the task is running. Every "scan delay" the task
scans the next "scan size" number of pages in its address space. When the
end of the address space is reached the scanner restarts from the beginning.

In combination, the "scan delay" and "scan size" determine the scan rate.
When "scan delay" decreases, the scan rate increases. The scan delay and
hence the scan rate of every task is adaptive and depends on historical
behaviour. If pages are properly placed then the scan delay increases,
otherwise the scan delay decreases. The "scan size" is not adaptive but
the higher the "scan size", the higher the scan rate.

Higher scan rates incur higher system overhead as page faults must be
trapped and potentially data must be migrated. However, the higher the scan
rate, the more quickly a tasks memory is migrated to a local node if the
workload pattern changes and minimises performance impact due to remote
memory accesses. These sysctls control the thresholds for scan delays and
the number of pages scanned.

numa_balancing_scan_period_min_ms is the minimum time in milliseconds to
scan a tasks virtual memory. It effectively controls the maximum scanning
rate for each task.

numa_balancing_scan_delay_ms is the starting "scan delay" used for a task
when it initially forks.

numa_balancing_scan_period_max_ms is the maximum time in milliseconds to
scan a tasks virtual memory. It effectively controls the minimum scanning
rate for each task.

numa_balancing_scan_size_mb is how many megabytes worth of pages are
scanned for a given scan.

numa_balancing_settle_count is how many scan periods must complete before
the schedule balancer stops pushing the task towards a preferred node. This
gives the scheduler a chance to place the task on an alternative node if the
preferred node is overloaded.

numa_balancing_migrate_deferred is how many page migrations get skipped
unconditionally, after a page migration is skipped because a page is shared
with other tasks. This reduces page migration overhead, and determines
how much stronger the "move task near its memory" policy scheduler becomes,
versus the "move memory near its task" memory management policy, for workloads
with shared memory.

==============================================================

osrelease, ostype & version:

# cat osrelease
Expand Down
6 changes: 5 additions & 1 deletion Documentation/trace/ftrace.txt
Original file line number Diff line number Diff line change
Expand Up @@ -655,7 +655,11 @@ explains which is which.
read the irq flags variable, an 'X' will always
be printed here.

need-resched: 'N' task need_resched is set, '.' otherwise.
need-resched:
'N' both TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED is set,
'n' only TIF_NEED_RESCHED is set,
'p' only PREEMPT_NEED_RESCHED is set,
'.' otherwise.

hardirq/softirq:
'H' - hard irq occurred inside a softirq.
Expand Down
2 changes: 2 additions & 0 deletions MAINTAINERS
Original file line number Diff line number Diff line change
Expand Up @@ -7326,6 +7326,8 @@ S: Maintained
F: kernel/sched/
F: include/linux/sched.h
F: include/uapi/linux/sched.h
F: kernel/wait.c
F: include/linux/wait.h

SCORE ARCHITECTURE
M: Chen Liqin <[email protected]>
Expand Down
1 change: 1 addition & 0 deletions arch/alpha/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,4 @@ generic-y += clkdev.h

generic-y += exec.h
generic-y += trace_clock.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/arc/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -46,3 +46,4 @@ generic-y += ucontext.h
generic-y += user.h
generic-y += vga.h
generic-y += xor.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/arm/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -32,3 +32,4 @@ generic-y += termios.h
generic-y += timex.h
generic-y += trace_clock.h
generic-y += unaligned.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/arm64/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -50,3 +50,4 @@ generic-y += unaligned.h
generic-y += user.h
generic-y += vga.h
generic-y += xor.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/avr32/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ generic-y += div64.h
generic-y += emergency-restart.h
generic-y += exec.h
generic-y += futex.h
generic-y += preempt.h
generic-y += irq_regs.h
generic-y += param.h
generic-y += local.h
Expand Down
1 change: 1 addition & 0 deletions arch/blackfin/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -44,3 +44,4 @@ generic-y += ucontext.h
generic-y += unaligned.h
generic-y += user.h
generic-y += xor.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/c6x/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -56,3 +56,4 @@ generic-y += ucontext.h
generic-y += user.h
generic-y += vga.h
generic-y += xor.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/cris/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -11,3 +11,4 @@ generic-y += module.h
generic-y += trace_clock.h
generic-y += vga.h
generic-y += xor.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/frv/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -2,3 +2,4 @@
generic-y += clkdev.h
generic-y += exec.h
generic-y += trace_clock.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/h8300/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -6,3 +6,4 @@ generic-y += mmu.h
generic-y += module.h
generic-y += trace_clock.h
generic-y += xor.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/hexagon/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -53,3 +53,4 @@ generic-y += types.h
generic-y += ucontext.h
generic-y += unaligned.h
generic-y += xor.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/ia64/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -3,4 +3,5 @@ generic-y += clkdev.h
generic-y += exec.h
generic-y += kvm_para.h
generic-y += trace_clock.h
generic-y += preempt.h
generic-y += vtime.h
1 change: 1 addition & 0 deletions arch/m32r/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,4 @@ generic-y += clkdev.h
generic-y += exec.h
generic-y += module.h
generic-y += trace_clock.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/m68k/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -31,3 +31,4 @@ generic-y += trace_clock.h
generic-y += types.h
generic-y += word-at-a-time.h
generic-y += xor.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/metag/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -52,3 +52,4 @@ generic-y += unaligned.h
generic-y += user.h
generic-y += vga.h
generic-y += xor.h
generic-y += preempt.h
2 changes: 2 additions & 0 deletions arch/metag/include/asm/topology.h
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,8 @@
.last_balance = jiffies, \
.balance_interval = 1, \
.nr_balance_failed = 0, \
.max_newidle_lb_cost = 0, \
.next_decay_max_lb_cost = jiffies, \
}

#define cpu_to_node(cpu) ((void)(cpu), 0)
Expand Down
1 change: 1 addition & 0 deletions arch/microblaze/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,4 @@ generic-y += clkdev.h
generic-y += exec.h
generic-y += trace_clock.h
generic-y += syscalls.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/mips/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -11,5 +11,6 @@ generic-y += sections.h
generic-y += segment.h
generic-y += serial.h
generic-y += trace_clock.h
generic-y += preempt.h
generic-y += ucontext.h
generic-y += xor.h
19 changes: 9 additions & 10 deletions arch/mips/kernel/rtlx.c
Original file line number Diff line number Diff line change
Expand Up @@ -172,8 +172,9 @@ int rtlx_open(int index, int can_sleep)
if (rtlx == NULL) {
if( (p = vpe_get_shared(tclimit)) == NULL) {
if (can_sleep) {
__wait_event_interruptible(channel_wqs[index].lx_queue,
(p = vpe_get_shared(tclimit)), ret);
ret = __wait_event_interruptible(
channel_wqs[index].lx_queue,
(p = vpe_get_shared(tclimit)));
if (ret)
goto out_fail;
} else {
Expand Down Expand Up @@ -263,11 +264,10 @@ unsigned int rtlx_read_poll(int index, int can_sleep)
/* data available to read? */
if (chan->lx_read == chan->lx_write) {
if (can_sleep) {
int ret = 0;

__wait_event_interruptible(channel_wqs[index].lx_queue,
int ret = __wait_event_interruptible(
channel_wqs[index].lx_queue,
(chan->lx_read != chan->lx_write) ||
sp_stopping, ret);
sp_stopping);
if (ret)
return ret;

Expand Down Expand Up @@ -440,14 +440,13 @@ static ssize_t file_write(struct file *file, const char __user * buffer,

/* any space left... */
if (!rtlx_write_poll(minor)) {
int ret = 0;
int ret;

if (file->f_flags & O_NONBLOCK)
return -EAGAIN;

__wait_event_interruptible(channel_wqs[minor].rt_queue,
rtlx_write_poll(minor),
ret);
ret = __wait_event_interruptible(channel_wqs[minor].rt_queue,
rtlx_write_poll(minor));
if (ret)
return ret;
}
Expand Down
5 changes: 2 additions & 3 deletions arch/mips/mm/init.c
Original file line number Diff line number Diff line change
Expand Up @@ -124,7 +124,7 @@ void *kmap_coherent(struct page *page, unsigned long addr)

BUG_ON(Page_dcache_dirty(page));

inc_preempt_count();
pagefault_disable();
idx = (addr >> PAGE_SHIFT) & (FIX_N_COLOURS - 1);
#ifdef CONFIG_MIPS_MT_SMTC
idx += FIX_N_COLOURS * smp_processor_id() +
Expand Down Expand Up @@ -193,8 +193,7 @@ void kunmap_coherent(void)
write_c0_entryhi(old_ctx);
EXIT_CRITICAL(flags);
#endif
dec_preempt_count();
preempt_check_resched();
pagefault_enable();
}

void copy_user_highpage(struct page *to, struct page *from,
Expand Down
1 change: 1 addition & 0 deletions arch/mn10300/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -2,3 +2,4 @@
generic-y += clkdev.h
generic-y += exec.h
generic-y += trace_clock.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/openrisc/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -67,3 +67,4 @@ generic-y += ucontext.h
generic-y += user.h
generic-y += word-at-a-time.h
generic-y += xor.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/parisc/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -4,3 +4,4 @@ generic-y += word-at-a-time.h auxvec.h user.h cputime.h emergency-restart.h \
div64.h irq_regs.h kdebug.h kvm_para.h local64.h local.h param.h \
poll.h xor.h clkdev.h exec.h
generic-y += trace_clock.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/powerpc/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -2,4 +2,5 @@
generic-y += clkdev.h
generic-y += rwsem.h
generic-y += trace_clock.h
generic-y += preempt.h
generic-y += vtime.h
1 change: 1 addition & 0 deletions arch/s390/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -2,3 +2,4 @@

generic-y += clkdev.h
generic-y += trace_clock.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/score/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -4,3 +4,4 @@ header-y +=
generic-y += clkdev.h
generic-y += trace_clock.h
generic-y += xor.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/sh/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -34,3 +34,4 @@ generic-y += termios.h
generic-y += trace_clock.h
generic-y += ucontext.h
generic-y += xor.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/sparc/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -16,3 +16,4 @@ generic-y += serial.h
generic-y += trace_clock.h
generic-y += types.h
generic-y += word-at-a-time.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/tile/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -38,3 +38,4 @@ generic-y += termios.h
generic-y += trace_clock.h
generic-y += types.h
generic-y += xor.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/um/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,4 @@ generic-y += hw_irq.h irq_regs.h kdebug.h percpu.h sections.h topology.h xor.h
generic-y += ftrace.h pci.h io.h param.h delay.h mutex.h current.h exec.h
generic-y += switch_to.h clkdev.h
generic-y += trace_clock.h
generic-y += preempt.h
1 change: 1 addition & 0 deletions arch/unicore32/include/asm/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -60,3 +60,4 @@ generic-y += unaligned.h
generic-y += user.h
generic-y += vga.h
generic-y += xor.h
generic-y += preempt.h
29 changes: 5 additions & 24 deletions arch/x86/include/asm/atomic.h
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
#include <asm/processor.h>
#include <asm/alternative.h>
#include <asm/cmpxchg.h>
#include <asm/rmwcc.h>

/*
* Atomic operations that C can't guarantee us. Useful for
Expand Down Expand Up @@ -76,12 +77,7 @@ static inline void atomic_sub(int i, atomic_t *v)
*/
static inline int atomic_sub_and_test(int i, atomic_t *v)
{
unsigned char c;

asm volatile(LOCK_PREFIX "subl %2,%0; sete %1"
: "+m" (v->counter), "=qm" (c)
: "ir" (i) : "memory");
return c;
GEN_BINARY_RMWcc(LOCK_PREFIX "subl", v->counter, i, "%0", "e");
}

/**
Expand Down Expand Up @@ -118,12 +114,7 @@ static inline void atomic_dec(atomic_t *v)
*/
static inline int atomic_dec_and_test(atomic_t *v)
{
unsigned char c;

asm volatile(LOCK_PREFIX "decl %0; sete %1"
: "+m" (v->counter), "=qm" (c)
: : "memory");
return c != 0;
GEN_UNARY_RMWcc(LOCK_PREFIX "decl", v->counter, "%0", "e");
}

/**
Expand All @@ -136,12 +127,7 @@ static inline int atomic_dec_and_test(atomic_t *v)
*/
static inline int atomic_inc_and_test(atomic_t *v)
{
unsigned char c;

asm volatile(LOCK_PREFIX "incl %0; sete %1"
: "+m" (v->counter), "=qm" (c)
: : "memory");
return c != 0;
GEN_UNARY_RMWcc(LOCK_PREFIX "incl", v->counter, "%0", "e");
}

/**
Expand All @@ -155,12 +141,7 @@ static inline int atomic_inc_and_test(atomic_t *v)
*/
static inline int atomic_add_negative(int i, atomic_t *v)
{
unsigned char c;

asm volatile(LOCK_PREFIX "addl %2,%0; sets %1"
: "+m" (v->counter), "=qm" (c)
: "ir" (i) : "memory");
return c;
GEN_BINARY_RMWcc(LOCK_PREFIX "addl", v->counter, i, "%0", "s");
}

/**
Expand Down
Loading

0 comments on commit 39cf275

Please sign in to comment.