Skip to content

Commit

Permalink
Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/…
Browse files Browse the repository at this point in the history
…kernel/git/mszeredi/vfs

Pull overlayfs updates from Miklos Szeredi:
 "This work from Amir adds NFS export capability to overlayfs. NFS
  exporting an overlay filesystem is a challange because we want to keep
  track of any copy-up of a file or directory between encoding the file
  handle and decoding it.

  This is achieved by indexing copied up objects by lower layer file
  handle. The index is already used for hard links, this patchset
  extends the use to NFS file handle decoding"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (51 commits)
  ovl: check ERR_PTR() return value from ovl_encode_fh()
  ovl: fix regression in fsnotify of overlay merge dir
  ovl: wire up NFS export operations
  ovl: lookup indexed ancestor of lower dir
  ovl: lookup connected ancestor of dir in inode cache
  ovl: hash non-indexed dir by upper inode for NFS export
  ovl: decode pure lower dir file handles
  ovl: decode indexed dir file handles
  ovl: decode lower file handles of unlinked but open files
  ovl: decode indexed non-dir file handles
  ovl: decode lower non-dir file handles
  ovl: encode lower file handles
  ovl: copy up before encoding non-connectable dir file handle
  ovl: encode non-indexed upper file handles
  ovl: decode connected upper dir file handles
  ovl: decode pure upper file handles
  ovl: encode pure upper file handles
  ovl: document NFS export
  vfs: factor out helpers d_instantiate_anon() and d_alloc_anon()
  ovl: store 'has_upper' and 'opaque' as bit flags
  ...
  • Loading branch information
torvalds committed Feb 5, 2018
2 parents 2deb41b + 9b6faee commit 139351f
Show file tree
Hide file tree
Showing 15 changed files with 1,905 additions and 409 deletions.
106 changes: 103 additions & 3 deletions Documentation/filesystems/overlayfs.txt
Original file line number Diff line number Diff line change
Expand Up @@ -190,6 +190,20 @@ Mount options:
Redirects are not created and not followed (equivalent to "redirect_dir=off"
if "redirect_always_follow" feature is not enabled).

When the NFS export feature is enabled, every copied up directory is
indexed by the file handle of the lower inode and a file handle of the
upper directory is stored in a "trusted.overlay.upper" extended attribute
on the index entry. On lookup of a merged directory, if the upper
directory does not match the file handle stores in the index, that is an
indication that multiple upper directories may be redirected to the same
lower directory. In that case, lookup returns an error and warns about
a possible inconsistency.

Because lower layer redirects cannot be verified with the index, enabling
NFS export support on an overlay filesystem with no upper layer requires
turning off redirect follow (e.g. "redirect_dir=nofollow").


Non-directories
---------------

Expand Down Expand Up @@ -281,9 +295,9 @@ filesystem, so both st_dev and st_ino of the file may change.

Any open files referring to this inode will access the old data.

If a file with multiple hard links is copied up, then this will
"break" the link. Changes will not be propagated to other names
referring to the same inode.
Unless "inode index" feature is enabled, if a file with multiple hard
links is copied up, then this will "break" the link. Changes will not be
propagated to other names referring to the same inode.

Unless "redirect_dir" feature is enabled, rename(2) on a lower or merged
directory will fail with EXDEV.
Expand All @@ -299,6 +313,92 @@ filesystem are not allowed. If the underlying filesystem is changed,
the behavior of the overlay is undefined, though it will not result in
a crash or deadlock.

When the overlay NFS export feature is enabled, overlay filesystems
behavior on offline changes of the underlying lower layer is different
than the behavior when NFS export is disabled.

On every copy_up, an NFS file handle of the lower inode, along with the
UUID of the lower filesystem, are encoded and stored in an extended
attribute "trusted.overlay.origin" on the upper inode.

When the NFS export feature is enabled, a lookup of a merged directory,
that found a lower directory at the lookup path or at the path pointed
to by the "trusted.overlay.redirect" extended attribute, will verify
that the found lower directory file handle and lower filesystem UUID
match the origin file handle that was stored at copy_up time. If a
found lower directory does not match the stored origin, that directory
will not be merged with the upper directory.



NFS export
----------

When the underlying filesystems supports NFS export and the "nfs_export"
feature is enabled, an overlay filesystem may be exported to NFS.

With the "nfs_export" feature, on copy_up of any lower object, an index
entry is created under the index directory. The index entry name is the
hexadecimal representation of the copy up origin file handle. For a
non-directory object, the index entry is a hard link to the upper inode.
For a directory object, the index entry has an extended attribute
"trusted.overlay.upper" with an encoded file handle of the upper
directory inode.

When encoding a file handle from an overlay filesystem object, the
following rules apply:

1. For a non-upper object, encode a lower file handle from lower inode
2. For an indexed object, encode a lower file handle from copy_up origin
3. For a pure-upper object and for an existing non-indexed upper object,
encode an upper file handle from upper inode

The encoded overlay file handle includes:
- Header including path type information (e.g. lower/upper)
- UUID of the underlying filesystem
- Underlying filesystem encoding of underlying inode

This encoding format is identical to the encoding format file handles that
are stored in extended attribute "trusted.overlay.origin".

When decoding an overlay file handle, the following steps are followed:

1. Find underlying layer by UUID and path type information.
2. Decode the underlying filesystem file handle to underlying dentry.
3. For a lower file handle, lookup the handle in index directory by name.
4. If a whiteout is found in index, return ESTALE. This represents an
overlay object that was deleted after its file handle was encoded.
5. For a non-directory, instantiate a disconnected overlay dentry from the
decoded underlying dentry, the path type and index inode, if found.
6. For a directory, use the connected underlying decoded dentry, path type
and index, to lookup a connected overlay dentry.

Decoding a non-directory file handle may return a disconnected dentry.
copy_up of that disconnected dentry will create an upper index entry with
no upper alias.

When overlay filesystem has multiple lower layers, a middle layer
directory may have a "redirect" to lower directory. Because middle layer
"redirects" are not indexed, a lower file handle that was encoded from the
"redirect" origin directory, cannot be used to find the middle or upper
layer directory. Similarly, a lower file handle that was encoded from a
descendant of the "redirect" origin directory, cannot be used to
reconstruct a connected overlay path. To mitigate the cases of
directories that cannot be decoded from a lower file handle, these
directories are copied up on encode and encoded as an upper file handle.
On an overlay filesystem with no upper layer this mitigation cannot be
used NFS export in this setup requires turning off redirect follow (e.g.
"redirect_dir=nofollow").

The overlay filesystem does not support non-directory connectable file
handles, so exporting with the 'subtree_check' exportfs configuration will
cause failures to lookup files over NFS.

When the NFS export feature is enabled, all directory index entries are
verified on mount time to check that upper file handles are not stale.
This verification may cause significant overhead in some cases.


Testsuite
---------

Expand Down
88 changes: 57 additions & 31 deletions fs/dcache.c
Original file line number Diff line number Diff line change
Expand Up @@ -1698,9 +1698,15 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
}
EXPORT_SYMBOL(d_alloc);

struct dentry *d_alloc_anon(struct super_block *sb)
{
return __d_alloc(sb, NULL);
}
EXPORT_SYMBOL(d_alloc_anon);

struct dentry *d_alloc_cursor(struct dentry * parent)
{
struct dentry *dentry = __d_alloc(parent->d_sb, NULL);
struct dentry *dentry = d_alloc_anon(parent->d_sb);
if (dentry) {
dentry->d_flags |= DCACHE_RCUACCESS | DCACHE_DENTRY_CURSOR;
dentry->d_parent = dget(parent);
Expand Down Expand Up @@ -1886,7 +1892,7 @@ struct dentry *d_make_root(struct inode *root_inode)
struct dentry *res = NULL;

if (root_inode) {
res = __d_alloc(root_inode->i_sb, NULL);
res = d_alloc_anon(root_inode->i_sb);
if (res)
d_instantiate(res, root_inode);
else
Expand Down Expand Up @@ -1925,33 +1931,19 @@ struct dentry *d_find_any_alias(struct inode *inode)
}
EXPORT_SYMBOL(d_find_any_alias);

static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected)
static struct dentry *__d_instantiate_anon(struct dentry *dentry,
struct inode *inode,
bool disconnected)
{
struct dentry *tmp;
struct dentry *res;
unsigned add_flags;

if (!inode)
return ERR_PTR(-ESTALE);
if (IS_ERR(inode))
return ERR_CAST(inode);

res = d_find_any_alias(inode);
if (res)
goto out_iput;

tmp = __d_alloc(inode->i_sb, NULL);
if (!tmp) {
res = ERR_PTR(-ENOMEM);
goto out_iput;
}

security_d_instantiate(tmp, inode);
security_d_instantiate(dentry, inode);
spin_lock(&inode->i_lock);
res = __d_find_any_alias(inode);
if (res) {
spin_unlock(&inode->i_lock);
dput(tmp);
dput(dentry);
goto out_iput;
}

Expand All @@ -1961,24 +1953,57 @@ static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected)
if (disconnected)
add_flags |= DCACHE_DISCONNECTED;

spin_lock(&tmp->d_lock);
__d_set_inode_and_type(tmp, inode, add_flags);
hlist_add_head(&tmp->d_u.d_alias, &inode->i_dentry);
spin_lock(&dentry->d_lock);
__d_set_inode_and_type(dentry, inode, add_flags);
hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
if (!disconnected) {
hlist_bl_lock(&tmp->d_sb->s_roots);
hlist_bl_add_head(&tmp->d_hash, &tmp->d_sb->s_roots);
hlist_bl_unlock(&tmp->d_sb->s_roots);
hlist_bl_lock(&dentry->d_sb->s_roots);
hlist_bl_add_head(&dentry->d_hash, &dentry->d_sb->s_roots);
hlist_bl_unlock(&dentry->d_sb->s_roots);
}
spin_unlock(&tmp->d_lock);
spin_unlock(&dentry->d_lock);
spin_unlock(&inode->i_lock);

return tmp;
return dentry;

out_iput:
iput(inode);
return res;
}

struct dentry *d_instantiate_anon(struct dentry *dentry, struct inode *inode)
{
return __d_instantiate_anon(dentry, inode, true);
}
EXPORT_SYMBOL(d_instantiate_anon);

static struct dentry *__d_obtain_alias(struct inode *inode, bool disconnected)
{
struct dentry *tmp;
struct dentry *res;

if (!inode)
return ERR_PTR(-ESTALE);
if (IS_ERR(inode))
return ERR_CAST(inode);

res = d_find_any_alias(inode);
if (res)
goto out_iput;

tmp = d_alloc_anon(inode->i_sb);
if (!tmp) {
res = ERR_PTR(-ENOMEM);
goto out_iput;
}

return __d_instantiate_anon(tmp, inode, disconnected);

out_iput:
iput(inode);
return res;
}

/**
* d_obtain_alias - find or allocate a DISCONNECTED dentry for a given inode
* @inode: inode to allocate the dentry for
Expand All @@ -1999,7 +2024,7 @@ static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected)
*/
struct dentry *d_obtain_alias(struct inode *inode)
{
return __d_obtain_alias(inode, 1);
return __d_obtain_alias(inode, true);
}
EXPORT_SYMBOL(d_obtain_alias);

Expand All @@ -2020,7 +2045,7 @@ EXPORT_SYMBOL(d_obtain_alias);
*/
struct dentry *d_obtain_root(struct inode *inode)
{
return __d_obtain_alias(inode, 0);
return __d_obtain_alias(inode, false);
}
EXPORT_SYMBOL(d_obtain_root);

Expand Down Expand Up @@ -3527,6 +3552,7 @@ bool is_subdir(struct dentry *new_dentry, struct dentry *old_dentry)

return result;
}
EXPORT_SYMBOL(is_subdir);

static enum d_walk_ret d_genocide_kill(void *data, struct dentry *dentry)
{
Expand Down
31 changes: 25 additions & 6 deletions fs/overlayfs/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -47,9 +47,28 @@ config OVERLAY_FS_INDEX
The inodes index feature prevents breaking of lower hardlinks on copy
up.

Note, that the inodes index feature is read-only backward compatible.
That is, mounting an overlay which has an index dir on a kernel that
doesn't support this feature read-only, will not have any negative
outcomes. However, mounting the same overlay with an old kernel
read-write and then mounting it again with a new kernel, will have
unexpected results.
Note, that the inodes index feature is not backward compatible.
That is, mounting an overlay which has an inodes index on a kernel
that doesn't support this feature will have unexpected results.

config OVERLAY_FS_NFS_EXPORT
bool "Overlayfs: turn on NFS export feature by default"
depends on OVERLAY_FS
depends on OVERLAY_FS_INDEX
help
If this config option is enabled then overlay filesystems will use
the inodes index dir to decode overlay NFS file handles by default.
In this case, it is still possible to turn off NFS export support
globally with the "nfs_export=off" module option or on a filesystem
instance basis with the "nfs_export=off" mount option.

The NFS export feature creates an index on copy up of every file and
directory. This full index is used to detect overlay filesystems
inconsistencies on lookup, like redirect from multiple upper dirs to
the same lower dir. The full index may incur some overhead on mount
time, especially when verifying that directory file handles are not
stale.

Note, that the NFS export feature is not backward compatible.
That is, mounting an overlay which has a full index on a kernel
that doesn't support this feature will have unexpected results.
3 changes: 2 additions & 1 deletion fs/overlayfs/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -4,4 +4,5 @@

obj-$(CONFIG_OVERLAY_FS) += overlay.o

overlay-objs := super.o namei.o util.o inode.o dir.o readdir.o copy_up.o
overlay-objs := super.o namei.o util.o inode.o dir.o readdir.o copy_up.o \
export.o
Loading

0 comments on commit 139351f

Please sign in to comment.