Commit 7f427d3

committed May 17, 2016
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull parallel filesystem directory handling update from Al Viro.

This is the main parallel directory work by Al that makes the vfs layer able to do lookup and readdir in parallel within a single directory. That's a big change, since this used to be all protected by the directory inode mutex.

The inode mutex is replaced by an rwsem, and serialization of lookups of a single name is done by a "in-progress" dentry marker.

The series begins with xattr cleanups, and then ends with switching filesystems over to actually doing the readdir in parallel (switching to the "iterate_shared()" that only takes the read lock).

A more detailed explanation of the process from Al Viro:

"The xattr work starts with some acl fixes, then switches ->getxattr to passing inode and dentry separately. This is the point where the things start to get tricky - that got merged into the very beginning of the -rc3-based #work.lookups, to allow untangling the security_d_instantiate() mess. The xattr work itself proceeds to switch a lot of filesystems to generic_...xattr(); no complications there.

After that initial xattr work, the series then does the following:

- untangle security_d_instantiate()

- convert a bunch of open-coded lookup_one_len_unlocked() to calls of that thing; one such place (in overlayfs) actually yields a trivial conflict with overlayfs fixes later in the cycle - overlayfs ended up switching to a variant of lookup_one_len_unlocked() sans the permission checks. I would've dropped that commit (it gets overridden on merge from #ovl-fixes in #for-next; proper resolution is to use the variant in mainline fs/overlayfs/super.c), but I didn't want to rebase the damn thing - it was fairly late in the cycle...

- some filesystems had managed to depend on lookup/lookup exclusion for *fs-internal* data structures in a way that would break if we relaxed the VFS exclusion. Fixing hadn't been hard, fortunately.

- core of that series - parallel lookup machinery, replacing ->i_mutex with rwsem, making lookup_slow() take it only shared. At that point lookups happen in parallel; lookups on the same name wait for the in-progress one to be done with that dentry. Surprisingly little code, at that - almost all of it is in fs/dcache.c, with fs/namei.c changes limited to lookup_slow() - making it use the new primitive and actually switching to locking shared.

- parallel readdir stuff - first of all, we provide the exclusion on per-struct file basis, same as we do for read() vs lseek() for regular files. That takes care of most of the needed exclusion in readdir/readdir; however, these guys are trickier than lookups, so I went for switching them one-by-one. To do that, a new method '->iterate_shared()' is added and filesystems are switched to it as they are either confirmed to be OK with shared lock on directory or fixed to be OK with that. I hope to kill the original method come next cycle (almost all in-tree filesystems are switched already), but it's still not quite finished.

- several filesystems get switched to parallel readdir. The interesting part here is dealing with dcache preseeding by readdir; that needs minor adjustment to be safe with directory locked only shared. Most of the filesystems doing that got switched to in those commits. Important exception: NFS. Turns out that NFS folks, with their, er, insistence on VFS getting the fuck out of the way of the Smart Filesystem Code That Knows How And What To Lock(tm) have grown the locking of their own. They had their own homegrown rwsem, with lookup/readdir/atomic_open being *writers* (sillyunlink is the reader there). Of course, with VFS getting the fuck out of the way, as requested, the actual smarts of the smart filesystem code etc. had become exposed...

- do_last/lookup_open/atomic_open cleanups. As the result, open() without O_CREAT locks the directory only shared. Including the ->atomic_open() case. Backmerge from #for-linus in the middle of that - atomic_open() fix got brought in.

- then comes NFS switch to saner (VFS-based ;-) locking, killing the homegrown "lookup and readdir are writers" kinda-sorta rwsem. All exclusion for sillyunlink/lookup is done by the parallel lookups mechanism. Exclusion between sillyunlink and rmdir is a real rwsem now - rmdir being the writer. Result: NFS lookups/readdirs/O_CREAT-less opens happen in parallel now.

- the rest of the series consists of switching a lot of filesystems to parallel readdir; in a lot of cases ->llseek() gets simplified as well. One backmerge in there (again, #for-linus - rockridge fix)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (74 commits)
  ext4: switch to ->iterate_shared()
  hfs: switch to ->iterate_shared()
  hfsplus: switch to ->iterate_shared()
  hostfs: switch to ->iterate_shared()
  hpfs: switch to ->iterate_shared()
  hpfs: handle allocation failures in hpfs_add_pos()
  gfs2: switch to ->iterate_shared()
  f2fs: switch to ->iterate_shared()
  afs: switch to ->iterate_shared()
  befs: switch to ->iterate_shared()
  befs: constify stuff a bit
  isofs: switch to ->iterate_shared()
  get_acorn_filename(): deobfuscate a bit
  btrfs: switch to ->iterate_shared()
  logfs: no need to lock directory in lseek
  switch ecryptfs to ->iterate_shared
  9p: switch to ->iterate_shared()
  fat: switch to ->iterate_shared()
  romfs, squashfs: switch to ->iterate_shared()
  more trivial ->iterate_shared conversions
  ...

2 parents ede4090 + 0e0162b commit 7f427d3
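
The "in-progress" marker described in the merge message can be pictured with a userspace analogue. The sketch below is not kernel code and does not reproduce what fs/dcache.c does: it assumes a small fixed-size table guarded by a pthread mutex and condition variable, and every `toy_*` name is hypothetical. It only illustrates the stated behaviour, namely that lookups of different names proceed independently while a second lookup of the same name waits for the first one to finish.

```c
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_NAMES 16
#define TOY_NAME_LEN  32

/* One cached name. All toy_* names are hypothetical, not kernel API. */
struct toy_entry {
	char name[TOY_NAME_LEN];
	bool in_progress;	/* a lookup of this name is still running */
	int value;		/* result once the lookup has completed */
};

struct toy_dir {
	pthread_mutex_t lock;	/* protects the table, not the slow lookup */
	pthread_cond_t done;	/* broadcast whenever a lookup completes */
	struct toy_entry entries[TOY_MAX_NAMES];
	int nr_entries;
	int slow_lookups;	/* how many times the slow path ran */
};

static void toy_dir_init(struct toy_dir *dir)
{
	memset(dir, 0, sizeof(*dir));
	pthread_mutex_init(&dir->lock, NULL);
	pthread_cond_init(&dir->done, NULL);
}

/* Stand-in for the expensive per-filesystem lookup work. */
static int toy_slow_lookup(const char *name)
{
	return (int)strlen(name);
}

/* Returns the looked-up value, or -1 if the name cannot be stored. */
static int toy_lookup(struct toy_dir *dir, const char *name)
{
	struct toy_entry *e = NULL;
	int i, value;

	if (strlen(name) >= TOY_NAME_LEN)
		return -1;

	pthread_mutex_lock(&dir->lock);
	for (i = 0; i < dir->nr_entries; i++) {
		if (strcmp(dir->entries[i].name, name) == 0) {
			e = &dir->entries[i];
			break;
		}
	}
	if (e) {
		/* Same name: wait for the in-progress lookup, do not redo it. */
		while (e->in_progress)
			pthread_cond_wait(&dir->done, &dir->lock);
		value = e->value;
		pthread_mutex_unlock(&dir->lock);
		return value;
	}
	if (dir->nr_entries == TOY_MAX_NAMES) {
		pthread_mutex_unlock(&dir->lock);
		return -1;
	}
	/* First lookup of this name: publish an in-progress entry. */
	e = &dir->entries[dir->nr_entries++];
	strcpy(e->name, name);
	e->in_progress = true;
	dir->slow_lookups++;
	pthread_mutex_unlock(&dir->lock);

	/* The slow part runs unlocked, so other names proceed in parallel. */
	value = toy_slow_lookup(name);

	pthread_mutex_lock(&dir->lock);
	e->value = value;
	e->in_progress = false;
	pthread_cond_broadcast(&dir->done);
	pthread_mutex_unlock(&dir->lock);
	return value;
}

/* Thread entry point so several lookups can be started at once. */
struct toy_job {
	struct toy_dir *dir;
	const char *name;
	int result;
};

static void *toy_lookup_thread(void *arg)
{
	struct toy_job *job = arg;

	job->result = toy_lookup(job->dir, job->name);
	return NULL;
}
```

Starting several `toy_lookup_thread()` threads on the same name should leave `slow_lookups` at 1, since only the first caller runs the slow path. Build with `cc -pthread`.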

195 files changed, +1449 -1230 lines changed

Documentation/filesystems/porting

+53
@@ -525,3 +525,56 @@ in your dentry operations instead.
 	set_delayed_call() where it used to set *cookie.
 	->put_link() is gone - just give the destructor to set_delayed_call()
 	in ->get_link().
+--
+[mandatory]
+	->getxattr() and xattr_handler.get() get dentry and inode passed separately.
+	dentry might be yet to be attached to inode, so do _not_ use its ->d_inode
+	in the instances. Rationale: !@#!@# security_d_instantiate() needs to be
+	called before we attach dentry to inode.
+--
+[mandatory]
+	symlinks are no longer the only inodes that do *not* have i_bdev/i_cdev/
+	i_pipe/i_link union zeroed out at inode eviction. As the result, you can't
+	assume that non-NULL value in ->i_link at ->destroy_inode() implies that
+	it's a symlink. Checking ->i_mode is really needed now. In-tree we had
+	to fix shmem_destroy_callback() that used to take that kind of shortcut;
+	watch out, since that shortcut is no longer valid.
+--
+[mandatory]
+	->i_mutex is replaced with ->i_rwsem now. inode_lock() et al. work as
+	they used to - they just take it exclusive. However, ->lookup() may be
+	called with parent locked shared. Its instances must not
+		* use d_instantiate() and d_rehash() separately - use d_add() or
+		  d_splice_alias() instead.
+		* use d_rehash() alone - call d_add(new_dentry, NULL) instead.
+		* in the unlikely case when (read-only) access to filesystem
+		  data structures needs exclusion for some reason, arrange it
+		  yourself. None of the in-tree filesystems needed that.
+		* rely on ->d_parent and ->d_name not changing after dentry has
+		  been fed to d_add() or d_splice_alias(). Again, none of the
+		  in-tree instances relied upon that.
+	We are guaranteed that lookups of the same name in the same directory
+	will not happen in parallel ("same" in the sense of your ->d_compare()).
+	Lookups on different names in the same directory can and do happen in
+	parallel now.
+--
+[recommended]
+	->iterate_shared() is added; it's a parallel variant of ->iterate().
+	Exclusion on struct file level is still provided (as well as that
+	between it and lseek on the same struct file), but if your directory
+	has been opened several times, you can get these called in parallel.
+	Exclusion between that method and all directory-modifying ones is
+	still provided, of course.
+
+	Often enough ->iterate() can serve as ->iterate_shared() without any
+	changes - it is a read-only operation, after all. If you have any
+	per-inode or per-dentry in-core data structures modified by ->iterate(),
+	you might need something to serialize the access to them. If you
+	do dcache pre-seeding, you'll need to switch to d_alloc_parallel() for
+	that; look for in-tree examples.
+
+	Old method is only used if the new one is absent; eventually it will
+	be removed. Switch while you still can; the old one won't stay.
+--
+[mandatory]
+	->atomic_open() calls without O_CREAT may happen in parallel.

arch/alpha/kernel/osf_sys.c

+2-2
@@ -147,7 +147,7 @@ SYSCALL_DEFINE4(osf_getdirentries, unsigned int, fd,
 		long __user *, basep)
 {
 	int error;
-	struct fd arg = fdget(fd);
+	struct fd arg = fdget_pos(fd);
 	struct osf_dirent_callback buf = {
 		.ctx.actor = osf_filldir,
 		.dirent = dirent,
@@ -164,7 +164,7 @@ SYSCALL_DEFINE4(osf_getdirentries, unsigned int, fd,
 	if (count != buf.count)
 		error = count - buf.count;
 
-	fdput(arg);
+	fdput_pos(arg);
 	return error;
 }
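
The switch from fdget()/fdput() to fdget_pos()/fdput_pos() above ties in with the "exclusion on per-struct file basis" mentioned in the merge message. A userspace sketch of that idea follows, assuming one mutex per open file that guards the file position; the `toy_*` names and the fixed directory contents are hypothetical, and the toy always takes the lock, which is a simplification rather than a description of the kernel helpers.

```c
#include <pthread.h>
#include <stddef.h>

/* Toy directory contents shared by every open file. Hypothetical data. */
static const char *const toy_names[] = { "a", "b", "c", "d" };
#define TOY_NR_NAMES (sizeof(toy_names) / sizeof(toy_names[0]))

/* One open file: its own position and its own lock. */
struct toy_file {
	pthread_mutex_t pos_lock;
	size_t pos;
};

/* Returns 0 on success, like pthread_mutex_init(). */
static int toy_open(struct toy_file *f)
{
	f->pos = 0;
	return pthread_mutex_init(&f->pos_lock, NULL);
}

/*
 * Return the next entry name, or NULL at end of directory.
 * The lock is per open file, so two opens of the same directory
 * do not serialize against each other.
 */
static const char *toy_readdir_one(struct toy_file *f)
{
	const char *name = NULL;

	pthread_mutex_lock(&f->pos_lock);	/* role played by fdget_pos() */
	if (f->pos < TOY_NR_NAMES)
		name = toy_names[f->pos++];
	pthread_mutex_unlock(&f->pos_lock);	/* role played by fdput_pos() */
	return name;
}

/* Reposition under the same lock, so it cannot race with a read. */
static void toy_rewind(struct toy_file *f)
{
	pthread_mutex_lock(&f->pos_lock);
	f->pos = 0;
	pthread_mutex_unlock(&f->pos_lock);
}
```

Two `struct toy_file` instances opened on the same contents advance independently, while concurrent callers sharing one instance are serialized on its `pos_lock`.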
