Skip to content

Commit

Permalink
Merge tag 'fsnotify_for_v5.16-rc1' of git://git.kernel.org/pub/scm/li…
Browse files Browse the repository at this point in the history
…nux/kernel/git/jack/linux-fs

Pull fsnotify updates from Jan Kara:
 "Support for reporting filesystem errors through fanotify so that
  system health monitoring daemons can watch for these and act instead
  of scraping system logs"

* tag 'fsnotify_for_v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (34 commits)
  samples: remove duplicate include in fs-monitor.c
  samples: Fix warning in fsnotify sample
  docs: Fix formatting of literal sections in fanotify docs
  samples: Make fs-monitor depend on libc and headers
  docs: Document the FAN_FS_ERROR event
  samples: Add fs error monitoring example
  ext4: Send notifications on error
  fanotify: Allow users to request FAN_FS_ERROR events
  fanotify: Emit generic error info for error event
  fanotify: Report fid info for file related file system errors
  fanotify: WARN_ON against too large file handles
  fanotify: Add helpers to decide whether to report FID/DFID
  fanotify: Wrap object_fh inline space in a creator macro
  fanotify: Support merging of error events
  fanotify: Support enqueueing of error events
  fanotify: Pre-allocate pool of error events
  fanotify: Reserve UAPI bits for FAN_FS_ERROR
  fsnotify: Support FS_ERROR event type
  fanotify: Require fid_mode for any non-fd event
  fanotify: Encode empty file handle when no inode is provided
  ...
  • Loading branch information
torvalds committed Nov 6, 2021
2 parents d8b4e5b + 15c7266 commit 2acda75
Show file tree
Hide file tree
Showing 22 changed files with 690 additions and 99 deletions.
78 changes: 78 additions & 0 deletions Documentation/admin-guide/filesystem-monitoring.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
.. SPDX-License-Identifier: GPL-2.0
====================================
File system Monitoring with fanotify
====================================

File system Error Reporting
===========================

Fanotify supports the FAN_FS_ERROR event type for file system-wide error
reporting. It is meant to be used by file system health monitoring
daemons, which listen for these events and take actions (notify
sysadmin, start recovery) when a file system problem is detected.

By design, a FAN_FS_ERROR notification exposes sufficient information
for a monitoring tool to know a problem in the file system has happened.
It doesn't necessarily provide a user space application with semantics
to verify an IO operation was successfully executed. That is out of
scope for this feature. Instead, it is only meant as a framework for
early file system problem detection and reporting recovery tools.

When a file system operation fails, it is common for dozens of kernel
errors to cascade after the initial failure, hiding the original failure
log, which is usually the most useful debug data to troubleshoot the
problem. For this reason, FAN_FS_ERROR tries to report only the first
error that occurred for a file system since the last notification, and
it simply counts additional errors. This ensures that the most
important pieces of information are never lost.

FAN_FS_ERROR requires the fanotify group to be setup with the
FAN_REPORT_FID flag.

At the time of this writing, the only file system that emits FAN_FS_ERROR
notifications is Ext4.

A FAN_FS_ERROR Notification has the following format::

::

[ Notification Metadata (Mandatory) ]
[ Generic Error Record (Mandatory) ]
[ FID record (Mandatory) ]

The order of records is not guaranteed, and new records might be added
in the future. Therefore, applications must not rely on the order and
must be prepared to skip over unknown records. Please refer to
``samples/fanotify/fs-monitor.c`` for an example parser.

Generic error record
--------------------

The generic error record provides enough information for a file system
agnostic tool to learn about a problem in the file system, without
providing any additional details about the problem. This record is
identified by ``struct fanotify_event_info_header.info_type`` being set
to FAN_EVENT_INFO_TYPE_ERROR.

::

struct fanotify_event_info_error {
struct fanotify_event_info_header hdr;
__s32 error;
__u32 error_count;
};

The `error` field identifies the type of error using errno values.
`error_count` tracks the number of errors that occurred and were
suppressed to preserve the original error information, since the last
notification.

FID record
----------

The FID record can be used to uniquely identify the inode that triggered
the error through the combination of fsid and file handle. A file system
specific application can use that information to attempt a recovery
procedure. Errors that are not related to an inode are reported with an
empty file handle of type FILEID_INVALID.
1 change: 1 addition & 0 deletions Documentation/admin-guide/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,7 @@ configure specific aspects of kernel behavior to your liking.
edid
efi-stub
ext4
filesystem-monitoring
nfs/index
gpio/index
highuid
Expand Down
8 changes: 8 additions & 0 deletions fs/ext4/super.c
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@
#include <linux/part_stat.h>
#include <linux/kthread.h>
#include <linux/freezer.h>
#include <linux/fsnotify.h>

#include "ext4.h"
#include "ext4_extents.h" /* Needed for trace points definition */
Expand Down Expand Up @@ -759,6 +760,8 @@ void __ext4_error(struct super_block *sb, const char *function,
sb->s_id, function, line, current->comm, &vaf);
va_end(args);
}
fsnotify_sb_error(sb, NULL, error ? error : EFSCORRUPTED);

ext4_handle_error(sb, force_ro, error, 0, block, function, line);
}

Expand Down Expand Up @@ -789,6 +792,8 @@ void __ext4_error_inode(struct inode *inode, const char *function,
current->comm, &vaf);
va_end(args);
}
fsnotify_sb_error(inode->i_sb, inode, error ? error : EFSCORRUPTED);

ext4_handle_error(inode->i_sb, false, error, inode->i_ino, block,
function, line);
}
Expand Down Expand Up @@ -827,6 +832,8 @@ void __ext4_error_file(struct file *file, const char *function,
current->comm, path, &vaf);
va_end(args);
}
fsnotify_sb_error(inode->i_sb, inode, EFSCORRUPTED);

ext4_handle_error(inode->i_sb, false, EFSCORRUPTED, inode->i_ino, block,
function, line);
}
Expand Down Expand Up @@ -894,6 +901,7 @@ void __ext4_std_error(struct super_block *sb, const char *function,
printk(KERN_CRIT "EXT4-fs error (device %s) in %s:%d: %s\n",
sb->s_id, function, line, errstr);
}
fsnotify_sb_error(sb, NULL, errno ? errno : EFSCORRUPTED);

ext4_handle_error(sb, false, -errno, 0, 0, function, line);
}
Expand Down
3 changes: 3 additions & 0 deletions fs/nfsd/filecache.c
Original file line number Diff line number Diff line change
Expand Up @@ -602,6 +602,9 @@ nfsd_file_fsnotify_handle_event(struct fsnotify_mark *mark, u32 mask,
struct inode *inode, struct inode *dir,
const struct qstr *name, u32 cookie)
{
if (WARN_ON_ONCE(!inode))
return 0;

trace_nfsd_file_fsnotify_handle_event(inode, mask);

/* Should be no marks on non-regular files */
Expand Down
117 changes: 107 additions & 10 deletions fs/notify/fanotify/fanotify.c
Original file line number Diff line number Diff line change
Expand Up @@ -111,6 +111,16 @@ static bool fanotify_name_event_equal(struct fanotify_name_event *fne1,
return fanotify_info_equal(info1, info2);
}

static bool fanotify_error_event_equal(struct fanotify_error_event *fee1,
struct fanotify_error_event *fee2)
{
/* Error events against the same file system are always merged. */
if (!fanotify_fsid_equal(&fee1->fsid, &fee2->fsid))
return false;

return true;
}

static bool fanotify_should_merge(struct fanotify_event *old,
struct fanotify_event *new)
{
Expand Down Expand Up @@ -141,6 +151,9 @@ static bool fanotify_should_merge(struct fanotify_event *old,
case FANOTIFY_EVENT_TYPE_FID_NAME:
return fanotify_name_event_equal(FANOTIFY_NE(old),
FANOTIFY_NE(new));
case FANOTIFY_EVENT_TYPE_FS_ERROR:
return fanotify_error_event_equal(FANOTIFY_EE(old),
FANOTIFY_EE(new));
default:
WARN_ON_ONCE(1);
}
Expand Down Expand Up @@ -176,6 +189,10 @@ static int fanotify_merge(struct fsnotify_group *group,
break;
if (fanotify_should_merge(old, new)) {
old->mask |= new->mask;

if (fanotify_is_error_event(old->mask))
FANOTIFY_EE(old)->err_count++;

return 1;
}
}
Expand Down Expand Up @@ -343,13 +360,23 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
static int fanotify_encode_fh_len(struct inode *inode)
{
int dwords = 0;
int fh_len;

if (!inode)
return 0;

exportfs_encode_inode_fh(inode, NULL, &dwords, NULL);
fh_len = dwords << 2;

/*
* struct fanotify_error_event might be preallocated and is
* limited to MAX_HANDLE_SZ. This should never happen, but
* safeguard by forcing an invalid file handle.
*/
if (WARN_ON_ONCE(fh_len > MAX_HANDLE_SZ))
return 0;

return dwords << 2;
return fh_len;
}

/*
Expand All @@ -370,8 +397,14 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
fh->type = FILEID_ROOT;
fh->len = 0;
fh->flags = 0;

/*
* Invalid FHs are used by FAN_FS_ERROR for errors not
* linked to any inode. The f_handle won't be reported
* back to userspace.
*/
if (!inode)
return 0;
goto out;

/*
* !gpf means preallocated variable size fh, but fh_len could
Expand Down Expand Up @@ -403,8 +436,13 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
fh->type = type;
fh->len = fh_len;

/* Mix fh into event merge key */
*hash ^= fanotify_hash_fh(fh);
out:
/*
* Mix fh into event merge key. Hash might be NULL in case of
* unhashed FID events (i.e. FAN_FS_ERROR).
*/
if (hash)
*hash ^= fanotify_hash_fh(fh);

return FANOTIFY_FH_HDR_LEN + fh_len;

Expand Down Expand Up @@ -452,7 +490,7 @@ static struct inode *fanotify_dfid_inode(u32 event_mask, const void *data,
if (event_mask & ALL_FSNOTIFY_DIRENT_EVENTS)
return dir;

if (S_ISDIR(inode->i_mode))
if (inode && S_ISDIR(inode->i_mode))
return inode;

return dir;
Expand Down Expand Up @@ -563,6 +601,44 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id,
return &fne->fae;
}

static struct fanotify_event *fanotify_alloc_error_event(
struct fsnotify_group *group,
__kernel_fsid_t *fsid,
const void *data, int data_type,
unsigned int *hash)
{
struct fs_error_report *report =
fsnotify_data_error_report(data, data_type);
struct inode *inode;
struct fanotify_error_event *fee;
int fh_len;

if (WARN_ON_ONCE(!report))
return NULL;

fee = mempool_alloc(&group->fanotify_data.error_events_pool, GFP_NOFS);
if (!fee)
return NULL;

fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR;
fee->error = report->error;
fee->err_count = 1;
fee->fsid = *fsid;

inode = report->inode;
fh_len = fanotify_encode_fh_len(inode);

/* Bad fh_len. Fallback to using an invalid fh. Should never happen. */
if (!fh_len && inode)
inode = NULL;

fanotify_encode_fh(&fee->object_fh, inode, fh_len, NULL, 0);

*hash ^= fanotify_hash_fsid(fsid);

return &fee->fae;
}

static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
u32 mask, const void *data,
int data_type, struct inode *dir,
Expand Down Expand Up @@ -630,6 +706,9 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,

if (fanotify_is_perm_event(mask)) {
event = fanotify_alloc_perm_event(path, gfp);
} else if (fanotify_is_error_event(mask)) {
event = fanotify_alloc_error_event(group, fsid, data,
data_type, &hash);
} else if (name_event && (file_name || child)) {
event = fanotify_alloc_name_event(id, fsid, file_name, child,
&hash, gfp);
Expand Down Expand Up @@ -702,6 +781,9 @@ static void fanotify_insert_event(struct fsnotify_group *group,

assert_spin_locked(&group->notification_lock);

if (!fanotify_is_hashed_event(event->mask))
return;

pr_debug("%s: group=%p event=%p bucket=%u\n", __func__,
group, event, bucket);

Expand Down Expand Up @@ -738,8 +820,9 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR);
BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC);
BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);

BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);
BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);

mask = fanotify_group_event_mask(group, iter_info, mask, data,
data_type, dir);
Expand Down Expand Up @@ -778,9 +861,8 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
}

fsn_event = &event->fse;
ret = fsnotify_add_event(group, fsn_event, fanotify_merge,
fanotify_is_hashed_event(mask) ?
fanotify_insert_event : NULL);
ret = fsnotify_insert_event(group, fsn_event, fanotify_merge,
fanotify_insert_event);
if (ret) {
/* Permission events shouldn't be merged */
BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
Expand All @@ -805,6 +887,9 @@ static void fanotify_free_group_priv(struct fsnotify_group *group)
if (group->fanotify_data.ucounts)
dec_ucount(group->fanotify_data.ucounts,
UCOUNT_FANOTIFY_GROUPS);

if (mempool_initialized(&group->fanotify_data.error_events_pool))
mempool_exit(&group->fanotify_data.error_events_pool);
}

static void fanotify_free_path_event(struct fanotify_event *event)
Expand Down Expand Up @@ -833,7 +918,16 @@ static void fanotify_free_name_event(struct fanotify_event *event)
kfree(FANOTIFY_NE(event));
}

static void fanotify_free_event(struct fsnotify_event *fsn_event)
static void fanotify_free_error_event(struct fsnotify_group *group,
struct fanotify_event *event)
{
struct fanotify_error_event *fee = FANOTIFY_EE(event);

mempool_free(fee, &group->fanotify_data.error_events_pool);
}

static void fanotify_free_event(struct fsnotify_group *group,
struct fsnotify_event *fsn_event)
{
struct fanotify_event *event;

Expand All @@ -855,6 +949,9 @@ static void fanotify_free_event(struct fsnotify_event *fsn_event)
case FANOTIFY_EVENT_TYPE_OVERFLOW:
kfree(event);
break;
case FANOTIFY_EVENT_TYPE_FS_ERROR:
fanotify_free_error_event(group, event);
break;
default:
WARN_ON_ONCE(1);
}
Expand Down
Loading

0 comments on commit 2acda75

Please sign in to comment.