Merge tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:
 "Major changes:

   - Changed location of tracing repo from personal git repo to:
     git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git

   - Added Masami Hiramatsu as co-maintainer

   - Updated MAINTAINERS file to separate out FTRACE as it is more than
     just TRACING.

  Minor changes:

   - Added Mark Rutland as FTRACE reviewer

   - Updated user_events to make it on its way to remove the BROKEN tag.
     The changes should now be acceptable but will run it through a
     cycle and hopefully we can remove the BROKEN tag next release.

   - Added filtering to eprobes

   - Added a delta time to the benchmark trace event

   - Have the histogram and filter callbacks called via a switch
     statement instead of indirect functions. This speeds it up to avoid
     retpolines.

   - Add a way to wake up ring buffer waiters waiting for the ring
     buffer to fill up to its watermark.

   - New ioctl() on the trace_pipe_raw file to wake up ring buffer
     waiters.

   - Wake up waiters when the ring buffer is disabled. A reader may
     block when the ring buffer is disabled, but if it was blocked when
     the ring buffer is disabled it should then wake up.

  Fixes:

   - Allow splice to read partially read ring buffer pages. This fixes
     splice never moving forward.

   - Fix inverted compare that made the "shortest" ring buffer wait
     queue actually the longest.

   - Fix a race in the ring buffer between resetting a page when a
     writer goes to another page, and the reader.

   - Fix ftrace accounting bug when function hooks are added at boot up
     before the weak functions are set to "disabled".

   - Fix bug that freed a user allocated snapshot buffer when enabling a
     tracer.

   - Fix possible recursive locks in osnoise tracer

   - Fix recursive locking direct functions

   - Other minor clean ups and fixes"

* tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
  ftrace: Create separate entry in MAINTAINERS for function hooks
  tracing: Update MAINTAINERS to reflect new tracing git repo
  tracing: Do not free snapshot if tracer is on cmdline
  ftrace: Still disable enabled records marked as disabled
  tracing/user_events: Move pages/locks into groups to prepare for namespaces
  tracing: Add Masami Hiramatsu as co-maintainer
  tracing: Remove unused variable 'dups'
  MAINTAINERS: add myself as a tracing reviewer
  ring-buffer: Fix race between reset page and reading page
  tracing/user_events: Update ABI documentation to align to bits vs bytes
  tracing/user_events: Use bits vs bytes for enabled status page data
  tracing/user_events: Use refcount instead of atomic for ref tracking
  tracing/user_events: Ensure user provided strings are safely formatted
  tracing/user_events: Use WRITE instead of READ for io vector import
  tracing/user_events: Use NULL for strstr checks
  tracing: Fix spelling mistake "preapre" -> "prepare"
  tracing: Wake up waiters when tracing is disabled
  tracing: Add ioctl() to force ring buffer waiters to wake up
  tracing: Wake up ring buffer waiters on closing of the file
  ring-buffer: Add ring_buffer_wake_waiters()
  ...
torvalds committed Oct 10, 2022
2 parents dc55342 + 4f881a6 commit cdf072a
Showing 34 changed files with 1,299 additions and 486 deletions.
86 changes: 58 additions & 28 deletions Documentation/trace/user_events.rst
@@ -20,14 +20,14 @@ dynamic_events is the same as the ioctl with the u: prefix applied.

Typically programs will register a set of events that they wish to expose to
tools that can read trace_events (such as ftrace and perf). The registration
-process gives back two ints to the program for each event. The first int is the
-status index. This index describes which byte in the
+process gives back two ints to the program for each event. The first int is
+the status bit. This describes which bit in little-endian format in the
/sys/kernel/debug/tracing/user_events_status file represents this event. The
-second int is the write index. This index describes the data when a write() or
+second int is the write index which describes the data when a write() or
writev() is called on the /sys/kernel/debug/tracing/user_events_data file.

-The structures referenced in this document are contained with the
-/include/uap/linux/user_events.h file in the source tree.
+The structures referenced in this document are contained within the
+/include/uapi/linux/user_events.h file in the source tree.

**NOTE:** *Both user_events_status and user_events_data are under the tracefs
filesystem and may be mounted at different paths than above.*
@@ -38,18 +38,18 @@ Registering within a user process is done via ioctl() out to the
/sys/kernel/debug/tracing/user_events_data file. The command to issue is
DIAG_IOCSREG.

-This command takes a struct user_reg as an argument::
+This command takes a packed struct user_reg as an argument::

struct user_reg {
u32 size;
u64 name_args;
-u32 status_index;
+u32 status_bit;
u32 write_index;
};

The struct user_reg requires two inputs, the first is the size of the structure
to ensure forward and backward compatibility. The second is the command string
-to issue for registering. Upon success two outputs are set, the status index
+to issue for registering. Upon success two outputs are set, the status bit
and the write index.
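
As a rough illustration of why the documentation now calls the struct packed, the sketch below mirrors the layout described above and exposes the value a caller would put in the size field. It is not the kernel header: `user_reg_sketch` and `user_reg_size()` are hypothetical names used only here, the `<stdint.h>` types stand in for `__u32`/`__u64`, and `__attribute__((__packed__))` assumes a GCC or Clang compatible compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct user_reg, for illustration only. */
struct user_reg_sketch {
	uint32_t size;        /* Input: size of this structure */
	uint64_t name_args;   /* Input: pointer to the command string */
	uint32_t status_bit;  /* Output: bit of the event in the status page */
	uint32_t write_index; /* Output: index to use when writing data */
} __attribute__((__packed__));

/* Value a caller would store in the size field before the ioctl(). */
static inline uint32_t user_reg_size(void)
{
	return (uint32_t)sizeof(struct user_reg_sketch);
}
```

With the attribute, the fields sit back to back (4 + 8 + 4 + 4 bytes). Without it, an ABI that aligns 64-bit integers to 8 bytes would likely pad the struct after `size`, so the reported size could differ between callers built for different ABIs.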

User based events show up under tracefs like any other event under the
@@ -111,15 +111,56 @@ in realtime. This allows user programs to only incur the cost of the write() or
writev() calls when something is actively attached to the event.

User programs call mmap() on /sys/kernel/debug/tracing/user_events_status to
-check the status for each event that is registered. The byte to check in the
-file is given back after the register ioctl() via user_reg.status_index.
+check the status for each event that is registered. The bit to check in the
+file is given back after the register ioctl() via user_reg.status_bit. The bit
+is always in little-endian format. Programs can check if the bit is set either
+using a byte-wise index with a mask or a long-wise index with a little-endian
+mask.

Currently the size of user_events_status is a single page, however, custom
kernel configurations can change this size to allow more user based events. In
all cases the size of the file is a multiple of a page size.

-For example, if the register ioctl() gives back a status_index of 3 you would
-check byte 3 of the returned mmap data to see if anything is attached to that
-event.
+For example, if the register ioctl() gives back a status_bit of 3 you would
+check byte 0 (3 / 8) of the returned mmap data and then AND the result with 8
+(1 << (3 % 8)) to see if anything is attached to that event.
+
+A byte-wise index check is performed as follows::
+
+int index, mask;
+char *status_page;
+
+index = status_bit / 8;
+mask = 1 << (status_bit % 8);
+
+...
+
+if (status_page[index] & mask) {
+/* Enabled */
+}
+
+A long-wise index check is performed as follows::
+
+#include <asm/bitsperlong.h>
+#include <endian.h>
+
+#if __BITS_PER_LONG == 64
+#define endian_swap(x) htole64(x)
+#else
+#define endian_swap(x) htole32(x)
+#endif
+
+long index, mask, *status_page;
+
+index = status_bit / __BITS_PER_LONG;
+mask = 1L << (status_bit % __BITS_PER_LONG);
+mask = endian_swap(mask);
+
+...
+
+if (status_page[index] & mask) {
+/* Enabled */
+}
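
To make the byte-wise arithmetic from the documentation concrete, here is a minimal sketch that wraps it in helpers and can be run against a plain buffer instead of the real mmap()'d page. The helper names (`status_byte_index`, `status_byte_mask`, `status_enabled`) are hypothetical and not part of the kernel ABI, and `unsigned char` is used here rather than the `char` of the documented example to sidestep sign questions.

```c
#include <stdint.h>

/* Byte of the status page that holds the given status bit. */
static inline unsigned int status_byte_index(uint32_t status_bit)
{
	return status_bit / 8;
}

/* Mask selecting the given status bit within its byte. */
static inline unsigned char status_byte_mask(uint32_t status_bit)
{
	return (unsigned char)(1u << (status_bit % 8));
}

/* Non-zero when the bit is set in the (here simulated) status page. */
static inline int status_enabled(const unsigned char *status_page,
				 uint32_t status_bit)
{
	return (status_page[status_byte_index(status_bit)] &
		status_byte_mask(status_bit)) != 0;
}
```

For a status_bit of 3 the helpers give byte 0 and mask 8, matching the worked example in the documentation; a real program would presumably pass the pointer returned by mmap() on user_events_status instead of a local buffer.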

Administrators can easily check the status of all registered events by reading
the user_events_status file directly via a terminal. The output is as follows::
@@ -137,29 +178,18 @@ For example, on a system that has a single event the output looks like this::

Active: 1
Busy: 0
-Max: 4096
+Max: 32768

If a user enables the user event via ftrace, the output would change to this::

1:test # Used by ftrace

Active: 1
Busy: 1
-Max: 4096
-
-**NOTE:** *A status index of 0 will never be returned. This allows user
-programs to have an index that can be used on error cases.*
-
-Status Bits
-^^^^^^^^^^^
-The byte being checked will be non-zero if anything is attached. Programs can
-check specific bits in the byte to see what mechanism has been attached.
-
-The following values are defined to aid in checking what has been attached:
-
-**EVENT_STATUS_FTRACE** - Bit set if ftrace has been attached (Bit 0).
+Max: 32768

-**EVENT_STATUS_PERF** - Bit set if perf has been attached (Bit 1).
+**NOTE:** *A status bit of 0 will never be returned. This allows user programs
+to have a bit that can be used on error cases.*

Writing Data
------------
26 changes: 18 additions & 8 deletions MAINTAINERS
@@ -8433,6 +8433,19 @@ L: [email protected]
S: Maintained
F: drivers/platform/x86/fujitsu-tablet.c

+FUNCTION HOOKS (FTRACE)
+M: Steven Rostedt <[email protected]>
+M: Masami Hiramatsu <[email protected]>
+R: Mark Rutland <[email protected]>
+S: Maintained
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
+F: Documentation/trace/ftrace*
+F: kernel/trace/ftrace*
+F: kernel/trace/fgraph.c
+F: arch/*/*/*/*ftrace*
+F: arch/*/*/*ftrace*
+F: include/*/ftrace.h
+
FUNGIBLE ETHERNET DRIVERS
M: Dimitris Michailidis <[email protected]>
L: [email protected]
@@ -11422,7 +11435,7 @@ M: Anil S Keshavamurthy <[email protected]>
M: "David S. Miller" <[email protected]>
M: Masami Hiramatsu <[email protected]>
S: Maintained
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
F: Documentation/trace/kprobes.rst
F: include/asm-generic/kprobes.h
F: include/linux/kprobes.h
@@ -20771,14 +20784,11 @@ F: drivers/hwmon/pmbus/tps546d24.c

TRACING
M: Steven Rostedt <[email protected]>
-M: Ingo Molnar <[email protected]>
+M: Masami Hiramatsu <[email protected]>
S: Maintained
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
-F: Documentation/trace/ftrace.rst
-F: arch/*/*/*/*ftrace*
-F: arch/*/*/*ftrace*
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
+F: Documentation/trace/*
F: fs/tracefs/
-F: include/*/ftrace.h
F: include/linux/trace*.h
F: include/trace/
F: kernel/trace/
@@ -20787,7 +20797,7 @@ F: tools/testing/selftests/ftrace/

TRACING MMIO ACCESSES (MMIOTRACE)
M: Steven Rostedt <[email protected]>
-M: Ingo Molnar <mingo@kernel.org>
+M: Masami Hiramatsu <mhiramat@kernel.org>
R: Karol Herbst <[email protected]>
R: Pekka Paalanen <[email protected]>
L: [email protected]
1 change: 0 additions & 1 deletion arch/x86/include/asm/ftrace.h
@@ -23,7 +23,6 @@
#define HAVE_FUNCTION_GRAPH_RET_ADDR_PTR

#ifndef __ASSEMBLY__
-extern atomic_t modifying_ftrace_code;
extern void __fentry__(void);

static inline unsigned long ftrace_call_adjust(unsigned long addr)
2 changes: 0 additions & 2 deletions arch/x86/include/asm/kprobes.h
@@ -50,8 +50,6 @@ extern const int kretprobe_blacklist_size;

void arch_remove_kprobe(struct kprobe *p);

-extern void arch_kprobe_override_function(struct pt_regs *regs);
-
/* Architecture specific copy of original instruction*/
struct arch_specific_insn {
/* copy of the original instruction */
2 changes: 0 additions & 2 deletions arch/x86/kernel/kprobes/core.c
@@ -59,8 +59,6 @@
DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);

-#define stack_addr(regs) ((unsigned long *)regs->sp)
-
#define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\
(((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) | \
(b4##UL << 0x4)|(b5##UL << 0x5)|(b6##UL << 0x6)|(b7##UL << 0x7) | \
41 changes: 0 additions & 41 deletions include/linux/ftrace.h
@@ -1122,47 +1122,6 @@ static inline void unpause_graph_tracing(void) { }
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */

#ifdef CONFIG_TRACING
-
-/* flags for current->trace */
-enum {
-TSK_TRACE_FL_TRACE_BIT = 0,
-TSK_TRACE_FL_GRAPH_BIT = 1,
-};
-enum {
-TSK_TRACE_FL_TRACE = 1 << TSK_TRACE_FL_TRACE_BIT,
-TSK_TRACE_FL_GRAPH = 1 << TSK_TRACE_FL_GRAPH_BIT,
-};
-
-static inline void set_tsk_trace_trace(struct task_struct *tsk)
-{
-set_bit(TSK_TRACE_FL_TRACE_BIT, &tsk->trace);
-}
-
-static inline void clear_tsk_trace_trace(struct task_struct *tsk)
-{
-clear_bit(TSK_TRACE_FL_TRACE_BIT, &tsk->trace);
-}
-
-static inline int test_tsk_trace_trace(struct task_struct *tsk)
-{
-return tsk->trace & TSK_TRACE_FL_TRACE;
-}
-
-static inline void set_tsk_trace_graph(struct task_struct *tsk)
-{
-set_bit(TSK_TRACE_FL_GRAPH_BIT, &tsk->trace);
-}
-
-static inline void clear_tsk_trace_graph(struct task_struct *tsk)
-{
-clear_bit(TSK_TRACE_FL_GRAPH_BIT, &tsk->trace);
-}
-
-static inline int test_tsk_trace_graph(struct task_struct *tsk)
-{
-return tsk->trace & TSK_TRACE_FL_GRAPH;
-}
-
enum ftrace_dump_mode;

extern enum ftrace_dump_mode ftrace_dump_on_oops;
2 changes: 1 addition & 1 deletion include/linux/ring_buffer.h
@@ -101,7 +101,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table);
-
+void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);

#define RING_BUFFER_ALL_CPUS -1

3 changes: 0 additions & 3 deletions include/linux/sched.h
@@ -1390,9 +1390,6 @@ struct task_struct {
#endif

#ifdef CONFIG_TRACING
-/* State flags for use by tracers: */
-unsigned long trace;
-
/* Bitmask and counter of trace recursion: */
unsigned long trace_recursion;
#endif /* CONFIG_TRACING */
1 change: 1 addition & 0 deletions include/linux/trace_events.h
@@ -92,6 +92,7 @@ struct trace_iterator {
unsigned int temp_size;
char *fmt; /* modified format holder */
unsigned int fmt_size;
+long wait_index;

/* trace_seq for __print_flags() and __print_symbolic() etc. */
struct trace_seq tmp_seq;
15 changes: 3 additions & 12 deletions include/linux/user_events.h
@@ -20,15 +20,6 @@
#define USER_EVENTS_SYSTEM "user_events"
#define USER_EVENTS_PREFIX "u:"

-/* Bits 0-6 are for known probe types, Bit 7 is for unknown probes */
-#define EVENT_BIT_FTRACE 0
-#define EVENT_BIT_PERF 1
-#define EVENT_BIT_OTHER 7
-
-#define EVENT_STATUS_FTRACE (1 << EVENT_BIT_FTRACE)
-#define EVENT_STATUS_PERF (1 << EVENT_BIT_PERF)
-#define EVENT_STATUS_OTHER (1 << EVENT_BIT_OTHER)
-
/* Create dynamic location entry within a 32-bit value */
#define DYN_LOC(offset, size) ((size) << 16 | (offset))

@@ -45,12 +36,12 @@ struct user_reg {
/* Input: Pointer to string with event name, description and flags */
__u64 name_args;

-/* Output: Byte index of the event within the status page */
-__u32 status_index;
+/* Output: Bitwise index of the event within the status page */
+__u32 status_bit;

/* Output: Index of the event to use when writing data */
__u32 write_index;
-};
+} __attribute__((__packed__));

#define DIAG_IOC_MAGIC '*'
