Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull parallel filesystem directory handling update from Al Viro.

This is the main parallel directory work by Al that makes the vfs layer
able to do lookup and readdir in parallel within a single directory.
That's a big change, since this used to be all protected by the
directory inode mutex.

The inode mutex is replaced by an rwsem, and serialization of lookups of
a single name is done by an "in-progress" dentry marker.

The series begins with xattr cleanups, and then ends with switching
filesystems over to actually doing the readdir in parallel (switching to
the "iterate_shared()" that only takes the read lock).

A more detailed explanation of the process from Al Viro:
 "The xattr work starts with some acl fixes, then switches ->getxattr to
  passing inode and dentry separately.  This is the point where the
  things start to get tricky - that got merged into the very beginning
  of the -rc3-based #work.lookups, to allow untangling the
  security_d_instantiate() mess.  The xattr work itself proceeds to
  switch a lot of filesystems to generic_...xattr(); no complications
  there.

  After that initial xattr work, the series then does the following:

   - untangle security_d_instantiate()

   - convert a bunch of open-coded lookup_one_len_unlocked() to calls of
     that thing; one such place (in overlayfs) actually yields a trivial
     conflict with overlayfs fixes later in the cycle - overlayfs ended
     up switching to a variant of lookup_one_len_unlocked() sans the
     permission checks.  I would've dropped that commit (it gets
     overridden on merge from #ovl-fixes in #for-next; proper resolution
     is to use the variant in mainline fs/overlayfs/super.c), but I
     didn't want to rebase the damn thing - it was fairly late in the
     cycle...

   - some filesystems had managed to depend on lookup/lookup exclusion
     for *fs-internal* data structures in a way that would break if we
     relaxed the VFS exclusion.  Fixing hadn't been hard, fortunately.

   - core of that series - parallel lookup machinery, replacing
     ->i_mutex with rwsem, making lookup_slow() take it only shared.  At
     that point lookups happen in parallel; lookups on the same name
     wait for the in-progress one to be done with that dentry.

     Surprisingly little code, at that - almost all of it is in
     fs/dcache.c, with fs/namei.c changes limited to lookup_slow() -
     making it use the new primitive and actually switching to locking
     shared.

   - parallel readdir stuff - first of all, we provide the exclusion on
     per-struct file basis, same as we do for read() vs lseek() for
     regular files.  That takes care of most of the needed exclusion in
     readdir/readdir; however, these guys are trickier than lookups, so
     I went for switching them one-by-one.  To do that, a new method
     '->iterate_shared()' is added and filesystems are switched to it
     as they are either confirmed to be OK with shared lock on directory
     or fixed to be OK with that.  I hope to kill the original method
     come next cycle (almost all in-tree filesystems are switched
     already), but it's still not quite finished.

   - several filesystems get switched to parallel readdir.  The
     interesting part here is dealing with dcache preseeding by readdir;
     that needs minor adjustment to be safe with directory locked only
     shared.

     Most of the filesystems doing that got switched to in those
     commits.  Important exception: NFS.  Turns out that NFS folks, with
     their, er, insistence on VFS getting the fuck out of the way of the
     Smart Filesystem Code That Knows How And What To Lock(tm) have
     grown the locking of their own.  They had their own homegrown
     rwsem, with lookup/readdir/atomic_open being *writers* (sillyunlink
     is the reader there).  Of course, with VFS getting the fuck out of
     the way, as requested, the actual smarts of the smart filesystem
     code etc. had become exposed...

   - do_last/lookup_open/atomic_open cleanups.  As the result, open()
     without O_CREAT locks the directory only shared.  Including the
     ->atomic_open() case.  Backmerge from #for-linus in the middle of
     that - atomic_open() fix got brought in.

   - then comes NFS switch to saner (VFS-based ;-) locking, killing the
     homegrown "lookup and readdir are writers" kinda-sorta rwsem.  All
     exclusion for sillyunlink/lookup is done by the parallel lookups
     mechanism.  Exclusion between sillyunlink and rmdir is a real rwsem
     now - rmdir being the writer.

     Result: NFS lookups/readdirs/O_CREAT-less opens happen in parallel
     now.

   - the rest of the series consists of switching a lot of filesystems
     to parallel readdir; in a lot of cases ->llseek() gets simplified
     as well.  One backmerge in there (again, #for-linus - rockridge
     fix)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (74 commits)
  ext4: switch to ->iterate_shared()
  hfs: switch to ->iterate_shared()
  hfsplus: switch to ->iterate_shared()
  hostfs: switch to ->iterate_shared()
  hpfs: switch to ->iterate_shared()
  hpfs: handle allocation failures in hpfs_add_pos()
  gfs2: switch to ->iterate_shared()
  f2fs: switch to ->iterate_shared()
  afs: switch to ->iterate_shared()
  befs: switch to ->iterate_shared()
  befs: constify stuff a bit
  isofs: switch to ->iterate_shared()
  get_acorn_filename(): deobfuscate a bit
  btrfs: switch to ->iterate_shared()
  logfs: no need to lock directory in lseek
  switch ecryptfs to ->iterate_shared
  9p: switch to ->iterate_shared()
  fat: switch to ->iterate_shared()
  romfs, squashfs: switch to ->iterate_shared()
  more trivial ->iterate_shared conversions
  ...
torvalds committed May 17, 2016
2 parents ede4090 + 0e0162b commit 7f427d3
Showing 195 changed files with 1,449 additions and 1,230 deletions.
53 changes: 53 additions & 0 deletions Documentation/filesystems/porting
@@ -525,3 +525,56 @@ in your dentry operations instead.
 set_delayed_call() where it used to set *cookie.
 ->put_link() is gone - just give the destructor to set_delayed_call()
 in ->get_link().
+--
+[mandatory]
+->getxattr() and xattr_handler.get() get dentry and inode passed separately.
+dentry might be yet to be attached to inode, so do _not_ use its ->d_inode
+in the instances. Rationale: !@#!@# security_d_instantiate() needs to be
+called before we attach dentry to inode.
+--
+[mandatory]
+symlinks are no longer the only inodes that do *not* have i_bdev/i_cdev/
+i_pipe/i_link union zeroed out at inode eviction. As the result, you can't
+assume that non-NULL value in ->i_link at ->destroy_inode() implies that
+it's a symlink. Checking ->i_mode is really needed now. In-tree we had
+to fix shmem_destroy_callback() that used to take that kind of shortcut;
+watch out, since that shortcut is no longer valid.
+--
+[mandatory]
+->i_mutex is replaced with ->i_rwsem now. inode_lock() et al. work as
+they used to - they just take it exclusive. However, ->lookup() may be
+called with parent locked shared. Its instances must not
+* use d_instantiate() and d_rehash() separately - use d_add() or
+d_splice_alias() instead.
+* use d_rehash() alone - call d_add(new_dentry, NULL) instead.
+* in the unlikely case when (read-only) access to filesystem
+data structures needs exclusion for some reason, arrange it
+yourself. None of the in-tree filesystems needed that.
+* rely on ->d_parent and ->d_name not changing after dentry has
+been fed to d_add() or d_splice_alias(). Again, none of the
+in-tree instances relied upon that.
+We are guaranteed that lookups of the same name in the same directory
+will not happen in parallel ("same" in the sense of your ->d_compare()).
+Lookups on different names in the same directory can and do happen in
+parallel now.
+--
+[recommended]
+->iterate_shared() is added; it's a parallel variant of ->iterate().
+Exclusion on struct file level is still provided (as well as that
+between it and lseek on the same struct file), but if your directory
+has been opened several times, you can get these called in parallel.
+Exclusion between that method and all directory-modifying ones is
+still provided, of course.
+
+Often enough ->iterate() can serve as ->iterate_shared() without any
+changes - it is a read-only operation, after all. If you have any
+per-inode or per-dentry in-core data structures modified by ->iterate(),
+you might need something to serialize the access to them. If you
+do dcache pre-seeding, you'll need to switch to d_alloc_parallel() for
+that; look for in-tree examples.
+
+Old method is only used if the new one is absent; eventually it will
+be removed. Switch while you still can; the old one won't stay.
+--
+[mandatory]
+->atomic_open() calls without O_CREAT may happen in parallel.
4 changes: 2 additions & 2 deletions arch/alpha/kernel/osf_sys.c
@@ -147,7 +147,7 @@ SYSCALL_DEFINE4(osf_getdirentries, unsigned int, fd,
 long __user *, basep)
 {
 int error;
-struct fd arg = fdget(fd);
+struct fd arg = fdget_pos(fd);
 struct osf_dirent_callback buf = {
 .ctx.actor = osf_filldir,
 .dirent = dirent,
@@ -164,7 +164,7 @@ SYSCALL_DEFINE4(osf_getdirentries, unsigned int, fd,
 if (count != buf.count)
 error = count - buf.count;

-fdput(arg);
+fdput_pos(arg);
 return error;
 }

2 changes: 1 addition & 1 deletion arch/powerpc/platforms/cell/spufs/inode.c
@@ -238,7 +238,7 @@ const struct file_operations spufs_context_fops = {
 .release = spufs_dir_close,
 .llseek = dcache_dir_lseek,
 .read = generic_read_dir,
-.iterate = dcache_readdir,
+.iterate_shared = dcache_readdir,
 .fsync = noop_fsync,
 };
 EXPORT_SYMBOL_GPL(spufs_context_fops);
4 changes: 1 addition & 3 deletions drivers/staging/lustre/lustre/llite/dir.c
@@ -1865,7 +1865,6 @@ static loff_t ll_dir_seek(struct file *file, loff_t offset, int origin)
 int api32 = ll_need_32bit_api(sbi);
 loff_t ret = -EINVAL;

-inode_lock(inode);
 switch (origin) {
 case SEEK_SET:
 break;
@@ -1903,7 +1902,6 @@ static loff_t ll_dir_seek(struct file *file, loff_t offset, int origin)
 goto out;

 out:
-inode_unlock(inode);
 return ret;
 }

@@ -1922,7 +1920,7 @@ const struct file_operations ll_dir_operations = {
 .open = ll_dir_open,
 .release = ll_dir_release,
 .read = generic_read_dir,
-.iterate = ll_readdir,
+.iterate_shared = ll_readdir,
 .unlocked_ioctl = ll_dir_ioctl,
 .fsync = ll_fsync,
 };
4 changes: 2 additions & 2 deletions drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -1042,8 +1042,8 @@ static inline __u64 ll_file_maxbytes(struct inode *inode)
 /* llite/xattr.c */
 int ll_setxattr(struct dentry *dentry, const char *name,
 const void *value, size_t size, int flags);
-ssize_t ll_getxattr(struct dentry *dentry, const char *name,
-void *buffer, size_t size);
+ssize_t ll_getxattr(struct dentry *dentry, struct inode *inode,
+const char *name, void *buffer, size_t size);
 ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size);
 int ll_removexattr(struct dentry *dentry, const char *name);

6 changes: 2 additions & 4 deletions drivers/staging/lustre/lustre/llite/xattr.c
@@ -451,11 +451,9 @@ int ll_getxattr_common(struct inode *inode, const char *name,
 return rc;
 }

-ssize_t ll_getxattr(struct dentry *dentry, const char *name,
-void *buffer, size_t size)
+ssize_t ll_getxattr(struct dentry *dentry, struct inode *inode,
+const char *name, void *buffer, size_t size)
 {
-struct inode *inode = d_inode(dentry);
-
 LASSERT(inode);
 LASSERT(name);

8 changes: 4 additions & 4 deletions fs/9p/acl.c
@@ -93,7 +93,7 @@ static struct posix_acl *v9fs_get_cached_acl(struct inode *inode, int type)
 * instantiating the inode (v9fs_inode_from_fid)
 */
 acl = get_cached_acl(inode, type);
-BUG_ON(acl == ACL_NOT_CACHED);
+BUG_ON(is_uncached_acl(acl));
 return acl;
 }

@@ -213,8 +213,8 @@ int v9fs_acl_mode(struct inode *dir, umode_t *modep,
 }

 static int v9fs_xattr_get_acl(const struct xattr_handler *handler,
-struct dentry *dentry, const char *name,
-void *buffer, size_t size)
+struct dentry *dentry, struct inode *inode,
+const char *name, void *buffer, size_t size)
 {
 struct v9fs_session_info *v9ses;
 struct posix_acl *acl;
@@ -227,7 +227,7 @@ static int v9fs_xattr_get_acl(const struct xattr_handler *handler,
 if ((v9ses->flags & V9FS_ACCESS_MASK) != V9FS_ACCESS_CLIENT)
 return v9fs_xattr_get(dentry, handler->name, buffer, size);

-acl = v9fs_get_cached_acl(d_inode(dentry), handler->flags);
+acl = v9fs_get_cached_acl(inode, handler->flags);
 if (IS_ERR(acl))
 return PTR_ERR(acl);
 if (acl == NULL)
4 changes: 2 additions & 2 deletions fs/9p/vfs_dir.c
@@ -246,15 +246,15 @@ int v9fs_dir_release(struct inode *inode, struct file *filp)
 const struct file_operations v9fs_dir_operations = {
 .read = generic_read_dir,
 .llseek = generic_file_llseek,
-.iterate = v9fs_dir_readdir,
+.iterate_shared = v9fs_dir_readdir,
 .open = v9fs_file_open,
 .release = v9fs_dir_release,
 };

 const struct file_operations v9fs_dir_operations_dotl = {
 .read = generic_read_dir,
 .llseek = generic_file_llseek,
-.iterate = v9fs_dir_readdir_dotl,
+.iterate_shared = v9fs_dir_readdir_dotl,
 .open = v9fs_file_open,
 .release = v9fs_dir_release,
 .fsync = v9fs_file_fsync_dotl,
2 changes: 1 addition & 1 deletion fs/9p/vfs_inode.c
@@ -1071,7 +1071,7 @@ v9fs_vfs_getattr(struct vfsmount *mnt, struct dentry *dentry,
 if (IS_ERR(st))
 return PTR_ERR(st);

-v9fs_stat2inode(st, d_inode(dentry), d_inode(dentry)->i_sb);
+v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb);
 generic_fillattr(d_inode(dentry), stat);

 p9stat_free(st);
4 changes: 2 additions & 2 deletions fs/9p/xattr.c
@@ -138,8 +138,8 @@ ssize_t v9fs_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size)
 }

 static int v9fs_xattr_handler_get(const struct xattr_handler *handler,
-struct dentry *dentry, const char *name,
-void *buffer, size_t size)
+struct dentry *dentry, struct inode *inode,
+const char *name, void *buffer, size_t size)
 {
 const char *full_name = xattr_full_name(handler, name);

2 changes: 1 addition & 1 deletion fs/affs/dir.c
@@ -20,7 +20,7 @@ static int affs_readdir(struct file *, struct dir_context *);
 const struct file_operations affs_dir_operations = {
 .read = generic_read_dir,
 .llseek = generic_file_llseek,
-.iterate = affs_readdir,
+.iterate_shared = affs_readdir,
 .fsync = affs_file_fsync,
 };

16 changes: 8 additions & 8 deletions fs/afs/dir.c
@@ -43,7 +43,7 @@ static int afs_rename(struct inode *old_dir, struct dentry *old_dentry,
 const struct file_operations afs_dir_file_operations = {
 .open = afs_dir_open,
 .release = afs_release,
-.iterate = afs_readdir,
+.iterate_shared = afs_readdir,
 .lock = afs_lock,
 .llseek = generic_file_llseek,
 };
@@ -128,7 +128,7 @@ struct afs_lookup_cookie {
 /*
 * check that a directory page is valid
 */
-static inline void afs_dir_check_page(struct inode *dir, struct page *page)
+static inline bool afs_dir_check_page(struct inode *dir, struct page *page)
 {
 struct afs_dir_page *dbuf;
 loff_t latter;
@@ -168,11 +168,11 @@ static inline void afs_dir_check_page(struct inode *dir, struct page *page)
 }

 SetPageChecked(page);
-return;
+return true;

 error:
-SetPageChecked(page);
 SetPageError(page);
+return false;
 }

 /*
@@ -196,10 +196,10 @@ static struct page *afs_dir_get_page(struct inode *dir, unsigned long index,
 page = read_cache_page(dir->i_mapping, index, afs_page_filler, key);
 if (!IS_ERR(page)) {
 kmap(page);
-if (!PageChecked(page))
-afs_dir_check_page(dir, page);
-if (PageError(page))
-goto fail;
+if (unlikely(!PageChecked(page))) {
+if (PageError(page) || !afs_dir_check_page(dir, page))
+goto fail;
+}
 }
 return page;

4 changes: 2 additions & 2 deletions fs/autofs4/root.c
@@ -39,7 +39,7 @@ const struct file_operations autofs4_root_operations = {
 .open = dcache_dir_open,
 .release = dcache_dir_close,
 .read = generic_read_dir,
-.iterate = dcache_readdir,
+.iterate_shared = dcache_readdir,
 .llseek = dcache_dir_lseek,
 .unlocked_ioctl = autofs4_root_ioctl,
 #ifdef CONFIG_COMPAT
@@ -51,7 +51,7 @@ const struct file_operations autofs4_dir_operations = {
 .open = autofs4_dir_open,
 .release = dcache_dir_close,
 .read = generic_read_dir,
-.iterate = dcache_readdir,
+.iterate_shared = dcache_readdir,
 .llseek = dcache_dir_lseek,
 };

4 changes: 2 additions & 2 deletions fs/bad_inode.c
@@ -106,8 +106,8 @@ static int bad_inode_setxattr(struct dentry *dentry, const char *name,
 return -EIO;
 }

-static ssize_t bad_inode_getxattr(struct dentry *dentry, const char *name,
-void *buffer, size_t size)
+static ssize_t bad_inode_getxattr(struct dentry *dentry, struct inode *inode,
+const char *name, void *buffer, size_t size)
 {
 return -EIO;
 }
4 changes: 2 additions & 2 deletions fs/befs/befs.h
@@ -116,7 +116,7 @@ BEFS_I(const struct inode *inode)
 }

 static inline befs_blocknr_t
-iaddr2blockno(struct super_block *sb, befs_inode_addr * iaddr)
+iaddr2blockno(struct super_block *sb, const befs_inode_addr *iaddr)
 {
 return ((iaddr->allocation_group << BEFS_SB(sb)->ag_shift) +
 iaddr->start);
@@ -141,7 +141,7 @@ befs_iaddrs_per_block(struct super_block *sb)
 }

 static inline int
-befs_iaddr_is_empty(befs_inode_addr * iaddr)
+befs_iaddr_is_empty(const befs_inode_addr *iaddr)
 {
 return (!iaddr->allocation_group) && (!iaddr->start) && (!iaddr->len);
 }
16 changes: 8 additions & 8 deletions fs/befs/btree.c
@@ -88,15 +88,15 @@ struct befs_btree_node {
 static const befs_off_t befs_bt_inval = 0xffffffffffffffffULL;

 /* local functions */
-static int befs_btree_seekleaf(struct super_block *sb, befs_data_stream * ds,
+static int befs_btree_seekleaf(struct super_block *sb, const befs_data_stream *ds,
 befs_btree_super * bt_super,
 struct befs_btree_node *this_node,
 befs_off_t * node_off);

-static int befs_bt_read_super(struct super_block *sb, befs_data_stream * ds,
+static int befs_bt_read_super(struct super_block *sb, const befs_data_stream *ds,
 befs_btree_super * sup);

-static int befs_bt_read_node(struct super_block *sb, befs_data_stream * ds,
+static int befs_bt_read_node(struct super_block *sb, const befs_data_stream *ds,
 struct befs_btree_node *node,
 befs_off_t node_off);

@@ -134,7 +134,7 @@ static int befs_compare_strings(const void *key1, int keylen1,
 * On failure, BEFS_ERR is returned.
 */
 static int
-befs_bt_read_super(struct super_block *sb, befs_data_stream * ds,
+befs_bt_read_super(struct super_block *sb, const befs_data_stream *ds,
 befs_btree_super * sup)
 {
 struct buffer_head *bh;
@@ -193,7 +193,7 @@ befs_bt_read_super(struct super_block *sb, befs_data_stream * ds,
 */

 static int
-befs_bt_read_node(struct super_block *sb, befs_data_stream * ds,
+befs_bt_read_node(struct super_block *sb, const befs_data_stream *ds,
 struct befs_btree_node *node, befs_off_t node_off)
 {
 uint off = 0;
@@ -247,7 +247,7 @@ befs_bt_read_node(struct super_block *sb, befs_data_stream * ds,
 * actuall value stored with the key.
 */
 int
-befs_btree_find(struct super_block *sb, befs_data_stream * ds,
+befs_btree_find(struct super_block *sb, const befs_data_stream *ds,
 const char *key, befs_off_t * value)
 {
 struct befs_btree_node *this_node;
@@ -416,7 +416,7 @@ befs_find_key(struct super_block *sb, struct befs_btree_node *node,
 * until the (key_no)th key is found or the tree is out of keys.
 */
 int
-befs_btree_read(struct super_block *sb, befs_data_stream * ds,
+befs_btree_read(struct super_block *sb, const befs_data_stream *ds,
 loff_t key_no, size_t bufsize, char *keybuf, size_t * keysize,
 befs_off_t * value)
 {
@@ -548,7 +548,7 @@ befs_btree_read(struct super_block *sb, befs_data_stream * ds,
 * Also checks for an empty tree. If there are no keys, returns BEFS_BT_EMPTY.
 */
 static int
-befs_btree_seekleaf(struct super_block *sb, befs_data_stream * ds,
+befs_btree_seekleaf(struct super_block *sb, const befs_data_stream *ds,
 befs_btree_super *bt_super,
 struct befs_btree_node *this_node,
 befs_off_t * node_off)
4 changes: 2 additions & 2 deletions fs/befs/btree.h
@@ -4,10 +4,10 @@
 */


-int befs_btree_find(struct super_block *sb, befs_data_stream * ds,
+int befs_btree_find(struct super_block *sb, const befs_data_stream *ds,
 const char *key, befs_off_t * value);

-int befs_btree_read(struct super_block *sb, befs_data_stream * ds,
+int befs_btree_read(struct super_block *sb, const befs_data_stream *ds,
 loff_t key_no, size_t bufsize, char *keybuf,
 size_t * keysize, befs_off_t * value);
