Merge tag 'netfs-lib-20231228' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull netfs updates from David Howells:

The main aims of these patches are to get high-level I/O and knowledge of
the pagecache out of the filesystem drivers as much as possible and to get
rid, as much as possible, of the knowledge that pages/folios exist.
Further, I would like to see ->write_begin, ->write_end and
->launder_folio go away.

Features that are added by these patches to that which is already there in
netfslib:

 (1) NFS-style (and Ceph-style) locking around DIO vs buffered I/O calls to
     prevent these from happening at the same time.  mmap'd I/O can, of
     necessity, happen at any time ignoring these locks.

 (2) Support for unbuffered I/O.  The data is kept in the bounce buffer and
     the pagecache is not used.  This can be turned on with an inode flag.

 (3) Support for direct I/O.  This is basically unbuffered I/O with some
     extra restrictions and no RMW.

 (4) Support for using a bounce buffer in an operation.  The bounce buffer
     may be bigger than the target data/buffer, allowing for crypto
     rounding.

 (5) ->write_begin() and ->write_end() are ignored in favour of merging all
     of that into one function, netfs_perform_write(), thereby avoiding the
     function pointer traversals.

 (6) Support for write-through caching in the pagecache.
     netfs_perform_write() adds the pages it modifies to an I/O operation
     as it goes and directly marks them writeback rather than dirty.  When
     writing back from write-through, it limits the range written back.
     This should allow CIFS to deal with byte-range mandatory locks
     correctly.

 (7) O_*SYNC and RWF_*SYNC writes use write-through rather than writing to
     the pagecache and then flushing afterwards.  An AIO O_*SYNC write will
     notify of completion when the sub-writes all complete.

 (8) Support for write-streaming where modified data is held in !uptodate
     folios, with a private struct attached indicating the range that is
     valid.

 (9) Support for write grouping, multiplexing a pointer to a group in the
     folio private data with the write-streaming data.  The writepages
     algorithm only writes stuff back that's in the nominated group.  This
     is intended for use by Ceph to write its snaps in order.

(10) Skipping reads for which we know the server could only supply zeros or
     EOF (for instance if we've done a local write that leaves a hole in
     the file and extends the local inode size).
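
As a rough illustration of item (10), the following userspace sketch splits a
read into a part that must be fetched from the server and a part that can be
zero-filled locally.  It is not the netfslib code: plan_read() and its
zero_point parameter are hypothetical names, and zero_point stands for an
assumed file offset at or above which the server is known to hold no data.

```c
#include <stddef.h>

/*
 * Hypothetical model, not the kernel implementation.
 *
 * zero_point: assumed offset at or above which the server can only
 *             supply zeros or EOF.
 * pos, len:   the byte range [pos, pos + len) being read.
 * fetch:      out, bytes that must be requested from the server.
 * zero:       out, bytes that can be cleared locally with no I/O.
 */
static void plan_read(unsigned long long zero_point,
		      unsigned long long pos, size_t len,
		      size_t *fetch, size_t *zero)
{
	if (pos >= zero_point) {
		/* Entirely above the zero point: skip the read. */
		*fetch = 0;
		*zero = len;
	} else if (pos + len <= zero_point) {
		/* Entirely below the zero point: read everything. */
		*fetch = len;
		*zero = 0;
	} else {
		/* Straddles the zero point: read the head, clear the tail. */
		*fetch = (size_t)(zero_point - pos);
		*zero = len - *fetch;
	}
}
```

With zero_point at 8192, a 4096-byte read at offset 6144 would fetch 2048
bytes and zero-fill the remaining 2048, while a read starting at 8192 would
issue no server I/O at all.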

General notes:

 (1) The fscache module is merged into the netfslib module to avoid cyclic
     exported symbol usage that prevents either module from being loaded.

 (2) Some helpers from fscache are reassigned to netfslib by name.

 (3) netfslib now makes use of folio->private, which means the filesystem
     can't use it.

 (4) The filesystem provides wrappers to call the write helpers, allowing
     it to do pre-validation, oplock/capability fetching and the passing in
     of write group info.

 (5) I want to try flushing the data when tearing down an inode before
     invalidating it to try and render launder_folio unnecessary.

 (6) Write-through caching will generate and dispatch write subrequests as
     it gathers enough data to hit wsize and has whole pages that at least
     span that size.  This needs to be a bit more flexible, allowing for a
     filesystem such as CIFS to have a variable wsize.

 (7) The filesystem driver is just given read and write calls with an
     iov_iter describing the data/buffer to use.  Ideally, they don't see
     pages or folios at all.  A function, extract_iter_to_sg(), is already
     available to decant part of an iterator into a scatterlist for crypto
     purposes.
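
The dispatch behaviour described in note (6) can be sketched in userspace as
a simple accumulator.  This is only a model under stated assumptions: struct
wt_state, wt_gather() and wt_flush() are hypothetical names, wsize is taken
to be fixed, and the whole-page requirement mentioned above is ignored.

```c
#include <stddef.h>

/* Hypothetical write-through state; not a netfslib structure. */
struct wt_state {
	size_t wsize;		/* assumed fixed maximum bytes per subrequest */
	size_t pending;		/* bytes gathered but not yet dispatched */
	unsigned int issued;	/* total subrequests dispatched so far */
};

/*
 * Account for len newly written bytes and dispatch one subrequest for
 * every full wsize of gathered data.  Returns the number of subrequests
 * dispatched by this call.
 */
static unsigned int wt_gather(struct wt_state *wt, size_t len)
{
	unsigned int n = 0;

	if (wt->wsize == 0)
		return 0;	/* nothing sensible to do without a wsize */

	wt->pending += len;
	while (wt->pending >= wt->wsize) {
		wt->pending -= wt->wsize;
		wt->issued++;
		n++;
	}
	return n;
}

/*
 * At the end of the write, dispatch whatever is left as one final,
 * short subrequest.  Returns 1 if a subrequest was dispatched, else 0.
 */
static unsigned int wt_flush(struct wt_state *wt)
{
	if (wt->pending == 0)
		return 0;
	wt->pending = 0;
	wt->issued++;
	return 1;
}
```

A variable wsize, as wanted for CIFS, would mean re-reading the limit before
each dispatch rather than treating it as a constant as this sketch does.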

AFS notes:

 (1) I pushed a pair of patches that clean up the trace header down to the
     base so that they can be shared with another branch.

9P notes:

 (1) Most of xfstests now pass - more, in fact, since upstream 9p lacks a
     writepages method and can't handle mmap writes.  An occasional oops
     (and sometimes panic) happens somewhere in the pathwalk/FID handling
     code that is unrelated to these changes.

 (2) Writes should now occur in larger-than-page-sized chunks.

 (3) It should be possible to turn on multipage folio support in 9P now.

All in all these patches remove a little over 800 lines from AFS, 300
from 9P, albeit with around 3000 lines added to netfs. Hopefully, I will
be able to remove a bunch of lines from Ceph too.

I've split the CIFS patches out to a separate branch, cifs-netfs, where
a further 2000+ lines are removed.  I can run a certain amount of
xfstests on CIFS, though I'm running into ksmbd issues and not all the
tests work correctly because of issues between fallocate and what the
SMB protocol actually supports.

I've also dropped the content-crypto patches out for the moment as
they're only usable by the ceph changes which I'm still working on.

The patch to use PG_writeback instead of PG_fscache for writing to the
cache has also been deferred, pending 9p, afs, ceph and cifs all being
converted.

* tag 'netfs-lib-20231228' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (40 commits)
  9p: Use netfslib read/write_iter
  afs: Use the netfs write helpers
  netfs: Export the netfs_sreq tracepoint
  netfs: Optimise away reads above the point at which there can be no data
  netfs: Implement a write-through caching option
  netfs: Provide a launder_folio implementation
  netfs: Provide a writepages implementation
  netfs, cachefiles: Pass upper bound length to allow expansion
  netfs: Provide netfs_file_read_iter()
  netfs: Allow buffered shared-writeable mmap through netfs_page_mkwrite()
  netfs: Implement buffered write API
  netfs: Implement unbuffered/DIO write support
  netfs: Implement unbuffered/DIO read support
  netfs: Allocate multipage folios in the writepath
  netfs: Make netfs_read_folio() handle streaming-write pages
  netfs: Provide func to copy data to pagecache for buffered write
  netfs: Dispatch write requests to process a writeback slice
  netfs: Prep to use folio->private for write grouping and streaming write
  netfs: Make the refcounting of netfs_begin_read() easier to use
  netfs: Make netfs_put_request() handle a NULL pointer
  ...

Signed-off-by: Christian Brauner <[email protected]>
brauner committed Dec 28, 2023
2 parents 861deac + 80105ed commit 86fb594
Showing 72 changed files with 4,248 additions and 2,485 deletions.
23 changes: 4 additions & 19 deletions Documentation/filesystems/netfs_library.rst
@@ -295,7 +295,6 @@ through which it can issue requests and negotiate::
 	struct netfs_request_ops {
 		void (*init_request)(struct netfs_io_request *rreq, struct file *file);
 		void (*free_request)(struct netfs_io_request *rreq);
-		int (*begin_cache_operation)(struct netfs_io_request *rreq);
 		void (*expand_readahead)(struct netfs_io_request *rreq);
 		bool (*clamp_length)(struct netfs_io_subrequest *subreq);
 		void (*issue_read)(struct netfs_io_subrequest *subreq);
@@ -317,20 +316,6 @@ The operations are as follows:
 [Optional] This is called as the request is being deallocated so that the
 filesystem can clean up any state it has attached there.

-* ``begin_cache_operation()``
-
-[Optional] This is called to ask the network filesystem to call into the
-cache (if present) to initialise the caching state for this read. The netfs
-library module cannot access the cache directly, so the cache should call
-something like fscache_begin_read_operation() to do this.
-
-The cache gets to store its state in ->cache_resources and must set a table
-of operations of its own there (though of a different type).
-
-This should return 0 on success and an error code otherwise. If an error is
-reported, the operation may proceed anyway, just without local caching (only
-out of memory and interruption errors cause failure here).
-
 * ``expand_readahead()``

 [Optional] This is called to allow the filesystem to expand the size of a
@@ -460,14 +445,14 @@ When implementing a local cache to be used by the read helpers, two things are
 required: some way for the network filesystem to initialise the caching for a
 read request and a table of operations for the helpers to call.

-The network filesystem's ->begin_cache_operation() method is called to set up a
-cache and this must call into the cache to do the work. If using fscache, for
-example, the cache would call::
+To begin a cache operation on an fscache object, the following function is
+called::

 	int fscache_begin_read_operation(struct netfs_io_request *rreq,
 					 struct fscache_cookie *cookie);

-passing in the request pointer and the cookie corresponding to the file.
+passing in the request pointer and the cookie corresponding to the file. This
+fills in the cache resources mentioned below.

 The netfs_io_request object contains a place for the cache to hang its
 state::
21 changes: 13 additions & 8 deletions MAINTAINERS
@@ -8133,6 +8133,19 @@ S: Supported
 F: fs/iomap/
 F: include/linux/iomap.h

+FILESYSTEMS [NETFS LIBRARY]
+M: David Howells <[email protected]>
+L: [email protected] (moderated for non-subscribers)
+L: [email protected]
+S: Supported
+F: Documentation/filesystems/caching/
+F: Documentation/filesystems/netfs_library.rst
+F: fs/netfs/
+F: include/linux/fscache*.h
+F: include/linux/netfs.h
+F: include/trace/events/fscache.h
+F: include/trace/events/netfs.h
+
 FINTEK F75375S HARDWARE MONITOR AND FAN CONTROLLER DRIVER
 M: Riku Voipio <[email protected]>
 L: [email protected]
@@ -8567,14 +8580,6 @@ F: Documentation/power/freezing-of-tasks.rst
 F: include/linux/freezer.h
 F: kernel/freezer.c

-FS-CACHE: LOCAL CACHING FOR NETWORK FILESYSTEMS
-M: David Howells <[email protected]>
-L: [email protected] (moderated for non-subscribers)
-S: Supported
-F: Documentation/filesystems/caching/
-F: fs/fscache/
-F: include/linux/fscache*.h
-
 FSCRYPT: FILE SYSTEM LEVEL ENCRYPTION SUPPORT
 M: Eric Biggers <[email protected]>
 M: Theodore Y. Ts'o <[email protected]>
3 changes: 2 additions & 1 deletion arch/arm/configs/mxs_defconfig
@@ -138,7 +138,8 @@ CONFIG_PWM_MXS=y
 CONFIG_NVMEM_MXS_OCOTP=y
 CONFIG_EXT4_FS=y
 # CONFIG_DNOTIFY is not set
-CONFIG_FSCACHE=m
+CONFIG_NETFS_SUPPORT=m
+CONFIG_FSCACHE=y
 CONFIG_FSCACHE_STATS=y
 CONFIG_CACHEFILES=m
 CONFIG_VFAT_FS=y
3 changes: 2 additions & 1 deletion arch/csky/configs/defconfig
@@ -34,7 +34,8 @@ CONFIG_GENERIC_PHY=y
 CONFIG_EXT4_FS=y
 CONFIG_FANOTIFY=y
 CONFIG_QUOTA=y
-CONFIG_FSCACHE=m
+CONFIG_NETFS_SUPPORT=m
+CONFIG_FSCACHE=y
 CONFIG_FSCACHE_STATS=y
 CONFIG_CACHEFILES=m
 CONFIG_MSDOS_FS=y
3 changes: 2 additions & 1 deletion arch/mips/configs/ip27_defconfig
@@ -287,7 +287,8 @@ CONFIG_BTRFS_FS_POSIX_ACL=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
 CONFIG_FUSE_FS=m
 CONFIG_CUSE=m
-CONFIG_FSCACHE=m
+CONFIG_NETFS_SUPPORT=m
+CONFIG_FSCACHE=y
 CONFIG_FSCACHE_STATS=y
 CONFIG_CACHEFILES=m
 CONFIG_PROC_KCORE=y
3 changes: 2 additions & 1 deletion arch/mips/configs/lemote2f_defconfig
@@ -238,7 +238,8 @@ CONFIG_BTRFS_FS=m
 CONFIG_QUOTA=y
 CONFIG_QFMT_V2=m
 CONFIG_AUTOFS_FS=m
-CONFIG_FSCACHE=m
+CONFIG_NETFS_SUPPORT=m
+CONFIG_FSCACHE=y
 CONFIG_CACHEFILES=m
 CONFIG_ISO9660_FS=m
 CONFIG_JOLIET=y
3 changes: 2 additions & 1 deletion arch/mips/configs/loongson3_defconfig
@@ -356,7 +356,8 @@ CONFIG_QFMT_V2=m
 CONFIG_AUTOFS_FS=y
 CONFIG_FUSE_FS=m
 CONFIG_VIRTIO_FS=m
-CONFIG_FSCACHE=m
+CONFIG_NETFS_SUPPORT=m
+CONFIG_FSCACHE=y
 CONFIG_ISO9660_FS=m
 CONFIG_JOLIET=y
 CONFIG_MSDOS_FS=m
3 changes: 2 additions & 1 deletion arch/mips/configs/pic32mzda_defconfig
@@ -68,7 +68,8 @@ CONFIG_EXT4_FS_POSIX_ACL=y
 CONFIG_EXT4_FS_SECURITY=y
 CONFIG_AUTOFS_FS=m
 CONFIG_FUSE_FS=m
-CONFIG_FSCACHE=m
+CONFIG_NETFS_SUPPORT=m
+CONFIG_FSCACHE=y
 CONFIG_ISO9660_FS=m
 CONFIG_JOLIET=y
 CONFIG_ZISOFS=y
3 changes: 2 additions & 1 deletion arch/s390/configs/debug_defconfig
@@ -637,8 +637,9 @@ CONFIG_FUSE_FS=y
 CONFIG_CUSE=m
 CONFIG_VIRTIO_FS=m
 CONFIG_OVERLAY_FS=m
+CONFIG_NETFS_SUPPORT=m
 CONFIG_NETFS_STATS=y
-CONFIG_FSCACHE=m
+CONFIG_FSCACHE=y
 CONFIG_CACHEFILES=m
 CONFIG_ISO9660_FS=y
 CONFIG_JOLIET=y
3 changes: 2 additions & 1 deletion arch/s390/configs/defconfig
@@ -622,8 +622,9 @@ CONFIG_FUSE_FS=y
 CONFIG_CUSE=m
 CONFIG_VIRTIO_FS=m
 CONFIG_OVERLAY_FS=m
+CONFIG_NETFS_SUPPORT=m
 CONFIG_NETFS_STATS=y
-CONFIG_FSCACHE=m
+CONFIG_FSCACHE=y
 CONFIG_CACHEFILES=m
 CONFIG_ISO9660_FS=y
 CONFIG_JOLIET=y
3 changes: 2 additions & 1 deletion arch/sh/configs/sdk7786_defconfig
@@ -171,7 +171,8 @@ CONFIG_BTRFS_FS=y
 CONFIG_AUTOFS_FS=m
 CONFIG_FUSE_FS=y
 CONFIG_CUSE=m
-CONFIG_FSCACHE=m
+CONFIG_NETFS_SUPPORT=m
+CONFIG_FSCACHE=y
 CONFIG_CACHEFILES=m
 CONFIG_ISO9660_FS=m
 CONFIG_JOLIET=y