Skip to content

Commit

Permalink
Merge branch 'perf-probes-for-linus-2' of git://git.kernel.org/pub/sc…
Browse files Browse the repository at this point in the history
…m/linux/kernel/git/tip/linux-2.6-tip

* 'perf-probes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Issue at least one memory barrier in stop_machine_text_poke()
  perf probe: Correct probe syntax on command line help
  perf probe: Add lazy line matching support
  perf probe: Show more lines after last line
  perf probe: Check function address range strictly in line finder
  perf probe: Use libdw callback routines
  perf probe: Use elfutils-libdw for analyzing debuginfo
  perf probe: Rename probe finder functions
  perf probe: Fix bugs in line range finder
  perf probe: Update perf probe document
  perf probe: Do not show --line option without dwarf support
  kprobes: Add documents of jump optimization
  kprobes/x86: Support kprobes jump optimization on x86
  x86: Add text_poke_smp for SMP cross modifying code
  kprobes/x86: Cleanup save/restore registers
  kprobes/x86: Boost probes when reentering
  kprobes: Jump optimization sysctl interface
  kprobes: Introduce kprobes jump optimization
  kprobes: Introduce generic insn_slot framework
  kprobes/x86: Cleanup RELATIVEJUMP_INSTRUCTION to RELATIVEJUMP_OPCODE
  • Loading branch information
torvalds committed Mar 5, 2010
2 parents 586fac1 + e5a1101 commit 660f6a3
Show file tree
Hide file tree
Showing 18 changed files with 2,063 additions and 835 deletions.
207 changes: 195 additions & 12 deletions Documentation/kprobes.txt
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
Title : Kernel Probes (Kprobes)
Authors : Jim Keniston <[email protected]>
: Prasanna S Panchamukhi <[email protected]>
: Prasanna S Panchamukhi <[email protected]>
: Masami Hiramatsu <[email protected]>

CONTENTS

Expand All @@ -15,6 +16,7 @@ CONTENTS
9. Jprobes Example
10. Kretprobes Example
Appendix A: The kprobes debugfs interface
Appendix B: The kprobes sysctl interface

1. Concepts: Kprobes, Jprobes, Return Probes

Expand Down Expand Up @@ -42,13 +44,13 @@ registration/unregistration of a group of *probes. These functions
can speed up unregistration process when you have to unregister
a lot of probes at once.

The next three subsections explain how the different types of
probes work. They explain certain things that you'll need to
know in order to make the best use of Kprobes -- e.g., the
difference between a pre_handler and a post_handler, and how
to use the maxactive and nmissed fields of a kretprobe. But
if you're in a hurry to start using Kprobes, you can skip ahead
to section 2.
The next four subsections explain how the different types of
probes work and how jump optimization works. They explain certain
things that you'll need to know in order to make the best use of
Kprobes -- e.g., the difference between a pre_handler and
a post_handler, and how to use the maxactive and nmissed fields of
a kretprobe. But if you're in a hurry to start using Kprobes, you
can skip ahead to section 2.

1.1 How Does a Kprobe Work?

Expand Down Expand Up @@ -161,13 +163,125 @@ In case probed function is entered but there is no kretprobe_instance
object available, then in addition to incrementing the nmissed count,
the user entry_handler invocation is also skipped.

1.4 How Does Jump Optimization Work?

If you configured your kernel with CONFIG_OPTPROBES=y (currently
this option is supported on x86/x86-64, non-preemptive kernel) and
the "debug.kprobes_optimization" kernel parameter is set to 1 (see
sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump
instruction instead of a breakpoint instruction at each probepoint.

1.4.1 Init a Kprobe

When a probe is registered, before attempting this optimization,
Kprobes inserts an ordinary, breakpoint-based kprobe at the specified
address. So, even if it's not possible to optimize this particular
probepoint, there'll be a probe there.

1.4.2 Safety Check

Before optimizing a probe, Kprobes performs the following safety checks:

- Kprobes verifies that the region that will be replaced by the jump
instruction (the "optimized region") lies entirely within one function.
(A jump instruction is multiple bytes, and so may overlay multiple
instructions.)

- Kprobes analyzes the entire function and verifies that there is no
jump into the optimized region. Specifically:
- the function contains no indirect jump;
- the function contains no instruction that causes an exception (since
the fixup code triggered by the exception could jump back into the
optimized region -- Kprobes checks the exception tables to verify this);
and
- there is no near jump to the optimized region (other than to the first
byte).

- For each instruction in the optimized region, Kprobes verifies that
the instruction can be executed out of line.

1.4.3 Preparing Detour Buffer

Next, Kprobes prepares a "detour" buffer, which contains the following
instruction sequence:
- code to push the CPU's registers (emulating a breakpoint trap)
- a call to the trampoline code which calls user's probe handlers.
- code to restore registers
- the instructions from the optimized region
- a jump back to the original execution path.

1.4.4 Pre-optimization

After preparing the detour buffer, Kprobes verifies that none of the
following situations exist:
- The probe has either a break_handler (i.e., it's a jprobe) or a
post_handler.
- Other instructions in the optimized region are probed.
- The probe is disabled.
In any of the above cases, Kprobes won't start optimizing the probe.
Since these are temporary situations, Kprobes tries to start
optimizing it again if the situation is changed.

If the kprobe can be optimized, Kprobes enqueues the kprobe to an
optimizing list, and kicks the kprobe-optimizer workqueue to optimize
it. If the to-be-optimized probepoint is hit before being optimized,
Kprobes returns control to the original instruction path by setting
the CPU's instruction pointer to the copied code in the detour buffer
-- thus at least avoiding the single-step.

1.4.5 Optimization

The Kprobe-optimizer doesn't insert the jump instruction immediately;
rather, it calls synchronize_sched() for safety first, because it's
possible for a CPU to be interrupted in the middle of executing the
optimized region(*). As you know, synchronize_sched() can ensure
that all interruptions that were active when synchronize_sched()
was called are done, but only if CONFIG_PREEMPT=n. So, this version
of kprobe optimization supports only kernels with CONFIG_PREEMPT=n.(**)

After that, the Kprobe-optimizer calls stop_machine() to replace
the optimized region with a jump instruction to the detour buffer,
using text_poke_smp().

1.4.6 Unoptimization

When an optimized kprobe is unregistered, disabled, or blocked by
another kprobe, it will be unoptimized. If this happens before
the optimization is complete, the kprobe is just dequeued from the
optimized list. If the optimization has been done, the jump is
replaced with the original code (except for an int3 breakpoint in
the first byte) by using text_poke_smp().

(*)Please imagine that the 2nd instruction is interrupted and then
the optimizer replaces the 2nd instruction with the jump *address*
while the interrupt handler is running. When the interrupt
returns to original address, there is no valid instruction,
and it causes an unexpected result.

(**)This optimization-safety checking may be replaced with the
stop-machine method that ksplice uses for supporting a CONFIG_PREEMPT=y
kernel.

NOTE for geeks:
The jump optimization changes the kprobe's pre_handler behavior.
Without optimization, the pre_handler can change the kernel's execution
path by changing regs->ip and returning 1. However, when the probe
is optimized, that modification is ignored. Thus, if you want to
tweak the kernel's execution path, you need to suppress optimization,
using one of the following techniques:
- Specify an empty function for the kprobe's post_handler or break_handler.
or
- Config CONFIG_OPTPROBES=n.
or
- Execute 'sysctl -w debug.kprobes_optimization=n'

2. Architectures Supported

Kprobes, jprobes, and return probes are implemented on the following
architectures:

- i386
- x86_64 (AMD-64, EM64T)
- i386 (Supports jump optimization)
- x86_64 (AMD-64, EM64T) (Supports jump optimization)
- ppc64
- ia64 (Does not support probes on instruction slot1.)
- sparc64 (Return probes not yet implemented.)
Expand All @@ -193,6 +307,10 @@ it useful to "Compile the kernel with debug info" (CONFIG_DEBUG_INFO),
so you can use "objdump -d -l vmlinux" to see the source-to-object
code mapping.

If you want to reduce probing overhead, set "Kprobes jump optimization
support" (CONFIG_OPTPROBES) to "y". You can find this option under the
"Kprobes" line.

4. API Reference

The Kprobes API includes a "register" function and an "unregister"
Expand Down Expand Up @@ -389,7 +507,10 @@ the probe which has been registered.

Kprobes allows multiple probes at the same address. Currently,
however, there cannot be multiple jprobes on the same function at
the same time.
the same time. Also, a probepoint for which there is a jprobe or
a post_handler cannot be optimized. So if you install a jprobe,
or a kprobe with a post_handler, at an optimized probepoint, the
probepoint will be unoptimized automatically.

In general, you can install a probe anywhere in the kernel.
In particular, you can probe interrupt handlers. Known exceptions
Expand Down Expand Up @@ -453,6 +574,38 @@ reason, Kprobes doesn't support return probes (or kprobes or jprobes)
on the x86_64 version of __switch_to(); the registration functions
return -EINVAL.

On x86/x86-64, since the Jump Optimization of Kprobes modifies
instructions widely, there are some limitations to optimization. To
explain it, we introduce some terminology. Imagine a 3-instruction
sequence consisting of a two 2-byte instructions and one 3-byte
instruction.

IA
|
[-2][-1][0][1][2][3][4][5][6][7]
[ins1][ins2][ ins3 ]
[<- DCR ->]
[<- JTPR ->]

ins1: 1st Instruction
ins2: 2nd Instruction
ins3: 3rd Instruction
IA: Insertion Address
JTPR: Jump Target Prohibition Region
DCR: Detoured Code Region

The instructions in DCR are copied to the out-of-line buffer
of the kprobe, because the bytes in DCR are replaced by
a 5-byte jump instruction. So there are several limitations.

a) The instructions in DCR must be relocatable.
b) The instructions in DCR must not include a call instruction.
c) JTPR must not be targeted by any jump or call instruction.
d) DCR must not straddle the border betweeen functions.

Anyway, these limitations are checked by the in-kernel instruction
decoder, so you don't need to worry about that.

6. Probe Overhead

On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0
Expand All @@ -476,6 +629,19 @@ k = 0.49 usec; j = 0.76; r = 0.80; kr = 0.82; jr = 1.07
ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99

6.1 Optimized Probe Overhead

Typically, an optimized kprobe hit takes 0.07 to 0.1 microseconds to
process. Here are sample overhead figures (in usec) for x86 architectures.
k = unoptimized kprobe, b = boosted (single-step skipped), o = optimized kprobe,
r = unoptimized kretprobe, rb = boosted kretprobe, ro = optimized kretprobe.

i386: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
k = 0.80 usec; b = 0.33; o = 0.05; r = 1.10; rb = 0.61; ro = 0.33

x86-64: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
k = 0.99 usec; b = 0.43; o = 0.06; r = 1.24; rb = 0.68; ro = 0.30

7. TODO

a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
Expand Down Expand Up @@ -523,7 +689,8 @@ is also specified. Following columns show probe status. If the probe is on
a virtual address that is no longer valid (module init sections, module
virtual addresses that correspond to modules that've been unloaded),
such probes are marked with [GONE]. If the probe is temporarily disabled,
such probes are marked with [DISABLED].
such probes are marked with [DISABLED]. If the probe is optimized, it is
marked with [OPTIMIZED].

/sys/kernel/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly.

Expand All @@ -533,3 +700,19 @@ registered probes will be disarmed, till such time a "1" is echoed to this
file. Note that this knob just disarms and arms all kprobes and doesn't
change each probe's disabling state. This means that disabled kprobes (marked
[DISABLED]) will be not enabled if you turn ON all kprobes by this knob.


Appendix B: The kprobes sysctl interface

/proc/sys/debug/kprobes-optimization: Turn kprobes optimization ON/OFF.

When CONFIG_OPTPROBES=y, this sysctl interface appears and it provides
a knob to globally and forcibly turn jump optimization (see section
1.4) ON or OFF. By default, jump optimization is allowed (ON).
If you echo "0" to this file or set "debug.kprobes_optimization" to
0 via sysctl, all optimized probes will be unoptimized, and any new
probes registered after that will not be optimized. Note that this
knob *changes* the optimized state. This means that optimized probes
(marked [OPTIMIZED]) will be unoptimized ([OPTIMIZED] tag will be
removed). If the knob is turned on, they will be optimized again.

13 changes: 13 additions & 0 deletions arch/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,17 @@ config KPROBES
for kernel debugging, non-intrusive instrumentation and testing.
If in doubt, say "N".

config OPTPROBES
bool "Kprobes jump optimization support (EXPERIMENTAL)"
default y
depends on KPROBES
depends on !PREEMPT
depends on HAVE_OPTPROBES
select KALLSYMS_ALL
help
This option will allow kprobes to optimize breakpoint to
a jump for reducing its overhead.

config HAVE_EFFICIENT_UNALIGNED_ACCESS
bool
help
Expand Down Expand Up @@ -83,6 +94,8 @@ config HAVE_KPROBES
config HAVE_KRETPROBES
bool

config HAVE_OPTPROBES
bool
#
# An arch should select this if it provides all these things:
#
Expand Down
1 change: 1 addition & 0 deletions arch/x86/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ config X86
select ARCH_WANT_FRAME_POINTERS
select HAVE_DMA_ATTRS
select HAVE_KRETPROBES
select HAVE_OPTPROBES
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_DYNAMIC_FTRACE
select HAVE_FUNCTION_TRACER
Expand Down
4 changes: 3 additions & 1 deletion arch/x86/include/asm/alternative.h
Original file line number Diff line number Diff line change
Expand Up @@ -165,10 +165,12 @@ static inline void apply_paravirt(struct paravirt_patch_site *start,
* invalid instruction possible) or if the instructions are changed from a
* consistent state to another consistent state atomically.
* More care must be taken when modifying code in the SMP case because of
* Intel's errata.
* Intel's errata. text_poke_smp() takes care that errata, but still
* doesn't support NMI/MCE handler code modifying.
* On the local CPU you need to be protected again NMI or MCE handlers seeing an
* inconsistent instruction while you patch.
*/
extern void *text_poke(void *addr, const void *opcode, size_t len);
extern void *text_poke_smp(void *addr, const void *opcode, size_t len);

#endif /* _ASM_X86_ALTERNATIVE_H */
31 changes: 30 additions & 1 deletion arch/x86/include/asm/kprobes.h
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,10 @@ struct kprobe;

typedef u8 kprobe_opcode_t;
#define BREAKPOINT_INSTRUCTION 0xcc
#define RELATIVEJUMP_INSTRUCTION 0xe9
#define RELATIVEJUMP_OPCODE 0xe9
#define RELATIVEJUMP_SIZE 5
#define RELATIVECALL_OPCODE 0xe8
#define RELATIVE_ADDR_SIZE 4
#define MAX_INSN_SIZE 16
#define MAX_STACK_SIZE 64
#define MIN_STACK_SIZE(ADDR) \
Expand All @@ -44,6 +47,17 @@ typedef u8 kprobe_opcode_t;

#define flush_insn_slot(p) do { } while (0)

/* optinsn template addresses */
extern kprobe_opcode_t optprobe_template_entry;
extern kprobe_opcode_t optprobe_template_val;
extern kprobe_opcode_t optprobe_template_call;
extern kprobe_opcode_t optprobe_template_end;
#define MAX_OPTIMIZED_LENGTH (MAX_INSN_SIZE + RELATIVE_ADDR_SIZE)
#define MAX_OPTINSN_SIZE \
(((unsigned long)&optprobe_template_end - \
(unsigned long)&optprobe_template_entry) + \
MAX_OPTIMIZED_LENGTH + RELATIVEJUMP_SIZE)

extern const int kretprobe_blacklist_size;

void arch_remove_kprobe(struct kprobe *p);
Expand All @@ -64,6 +78,21 @@ struct arch_specific_insn {
int boostable;
};

struct arch_optimized_insn {
/* copy of the original instructions */
kprobe_opcode_t copied_insn[RELATIVE_ADDR_SIZE];
/* detour code buffer */
kprobe_opcode_t *insn;
/* the size of instructions copied to detour code buffer */
size_t size;
};

/* Return true (!0) if optinsn is prepared for optimization. */
static inline int arch_prepared_optinsn(struct arch_optimized_insn *optinsn)
{
return optinsn->size;
}

struct prev_kprobe {
struct kprobe *kp;
unsigned long status;
Expand Down
Loading

0 comments on commit 660f6a3

Please sign in to comment.