Merge tag 'ovl-update-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs update from Miklos Szeredi:

 - Fix failure to copy-up files from certain NFSv4 mounts

 - Sort out inconsistencies between st_ino and i_ino (used in /proc/locks)

 - Allow consistent (POSIX-y) inode numbering in more cases

 - Allow virtiofs to be used as upper layer

 - Miscellaneous cleanups and fixes

* tag 'ovl-update-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: document xino expected behavior
  ovl: enable xino automatically in more cases
  ovl: avoid possible inode number collisions with xino=on
  ovl: use a private non-persistent ino pool
  ovl: fix WARN_ON nlink drop to zero
  ovl: fix a typo in comment
  ovl: replace zero-length array with flexible-array member
  ovl: ovl_obtain_alias(): don't call d_instantiate_anon() for old
  ovl: strict upper fs requirements for remote upper fs
  ovl: check if upper fs supports RENAME_WHITEOUT
  ovl: allow remote upper
  ovl: decide if revalidate needed on a per-dentry basis
  ovl: separate detection of remote upper layer from stacked overlay
  ovl: restructure dentry revalidation
  ovl: ignore failure to copy up unknown xattrs
  ovl: document permission model
  ovl: simplify i_ino initialization
  ovl: factor out helper ovl_get_root()
  ovl: fix out of date comment and unreachable code
  ovl: fix value of i_ino for lower hardlink corner case
torvalds committed Apr 9, 2020
2 parents 9744b92 + 2eda9ea commit c6b80eb
Showing 11 changed files with 460 additions and 163 deletions.
82 changes: 80 additions & 2 deletions Documentation/filesystems/overlayfs.rst
@@ -40,13 +40,46 @@ On 64bit systems, even if all overlay layers are not on the same
 underlying filesystem, the same compliant behavior could be achieved
 with the "xino" feature. The "xino" feature composes a unique object
 identifier from the real object st_ino and an underlying fsid index.
+
 If all underlying filesystems support NFS file handles and export file
 handles with 32bit inode number encoding (e.g. ext4), overlay filesystem
 will use the high inode number bits for fsid. Even when the underlying
 filesystem uses 64bit inode numbers, users can still enable the "xino"
 feature with the "-o xino=on" overlay mount option. That is useful for the
 case of underlying filesystems like xfs and tmpfs, which use 64bit inode
-numbers, but are very unlikely to use the high inode number bit.
+numbers, but are very unlikely to use the high inode number bits. In case
+the underlying inode number does overflow into the high xino bits, overlay
+filesystem will fall back to the non xino behavior for that inode.
+
+The following table summarizes what can be expected in different overlay
+configurations.
+
+Inode properties
+````````````````
+
++--------------+------------+------------+-----------------+----------------+
+|Configuration | Persistent | Uniform    | st_ino == d_ino | d_ino == i_ino |
+|              | st_ino     | st_dev     |                 | [*]            |
++==============+=====+======+=====+======+========+========+========+=======+
+|              | dir | !dir | dir | !dir |  dir   +  !dir  |  dir   | !dir  |
++--------------+-----+------+-----+------+--------+--------+--------+-------+
+| All layers   |  Y  |  Y   |  Y  |  Y   |  Y     |   Y    |  Y     |  Y    |
+| on same fs   |     |      |     |      |        |        |        |       |
++--------------+-----+------+-----+------+--------+--------+--------+-------+
+| Layers not   |  N  |  Y   |  Y  |  N   |  N     |   Y    |  N     |  Y    |
+| on same fs,  |     |      |     |      |        |        |        |       |
+| xino=off     |     |      |     |      |        |        |        |       |
++--------------+-----+------+-----+------+--------+--------+--------+-------+
+| xino=on/auto |  Y  |  Y   |  Y  |  Y   |  Y     |   Y    |  Y     |  Y    |
+|              |     |      |     |      |        |        |        |       |
++--------------+-----+------+-----+------+--------+--------+--------+-------+
+| xino=on/auto,|  N  |  Y   |  Y  |  N   |  N     |   Y    |  N     |  Y    |
+| ino overflow |     |      |     |      |        |        |        |       |
++--------------+-----+------+-----+------+--------+--------+--------+-------+
+
+[*] nfsd v3 readdirplus verifies d_ino == i_ino. i_ino is exposed via several
+/proc files, such as /proc/locks and /proc/self/fdinfo/<fd> of an inotify
+file descriptor.
 
 
 Upper and Lower
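
Reader's note: the xino text in the hunk above says the overlay composes an inode number from the real st_ino plus an fsid index in the high bits, and falls back to non-xino behavior when the real inode number already uses those bits. The sketch below illustrates that idea in userspace C. The function name, the parameter layout, and the exact bit placement are assumptions made for illustration; the kernel's actual layout is not shown in this excerpt and may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative only: xino_compose() is a hypothetical name and this is not
 * the kernel's code. It packs an fsid index into the top xino_bits bits of
 * a 64-bit inode number and leaves the real inode number in the low bits.
 *
 * Returns false when the mapping cannot be applied: the real inode number
 * already uses the high bits (the "ino overflow" row of the table) or the
 * fsid does not fit. A caller would then fall back to non-xino behavior.
 */
static bool xino_compose(uint64_t real_ino, uint64_t fsid,
			 unsigned int xino_bits, uint64_t *out)
{
	unsigned int shift;

	if (xino_bits == 0 || xino_bits >= 64)
		return false;

	shift = 64 - xino_bits;
	if (real_ino >> shift)		/* real ino overflows into fsid bits */
		return false;
	if (fsid >> xino_bits)		/* fsid needs more than xino_bits bits */
		return false;

	*out = (fsid << shift) | real_ino;
	return true;
}
```

With xino_bits = 2, fsid 1 and real inode 5 compose to 0x4000000000000005, while a real inode number with bit 63 set is rejected and would take the fallback path.
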
@@ -248,6 +281,50 @@ overlay filesystem (though an operation on the name of the file such as
 rename or unlink will of course be noticed and handled).
 
 
+Permission model
+----------------
+
+Permission checking in the overlay filesystem follows these principles:
+
+ 1) permission check SHOULD return the same result before and after copy up
+
+ 2) task creating the overlay mount MUST NOT gain additional privileges
+
+ 3) non-mounting task MAY gain additional privileges through the overlay,
+    compared to direct access on underlying lower or upper filesystems
+
+This is achieved by performing two permission checks on each access
+
+ a) check if current task is allowed access based on local DAC (owner,
+    group, mode and posix acl), as well as MAC checks
+
+ b) check if mounting task would be allowed real operation on lower or
+    upper layer based on underlying filesystem permissions, again including
+    MAC checks
+
+Check (a) ensures consistency (1) since owner, group, mode and posix acls
+are copied up. On the other hand it can result in server enforced
+permissions (used by NFS, for example) being ignored (3).
+
+Check (b) ensures that no task gains permissions to underlying layers that
+the mounting task does not have (2). This also means that it is possible
+to create setups where the consistency rule (1) does not hold; normally,
+however, the mounting task will have sufficient privileges to perform all
+operations.
+
+Another way to demonstrate this model is drawing parallels between
+
+  mount -t overlay overlay -olowerdir=/lower,upperdir=/upper,... /merged
+
+and
+
+  cp -a /lower /upper
+  mount --bind /upper /merged
+
+The resulting access permissions should be the same. The difference is in
+the time of copy (on-demand vs. up-front).
+
+
 Multiple lower layers
 ---------------------
 
@@ -383,7 +460,8 @@ guarantee that the values of st_ino and st_dev returned by stat(2) and the
 value of d_ino returned by readdir(3) will act like on a normal filesystem.
 E.g. the value of st_dev may be different for two objects in the same
 overlay filesystem and the value of st_ino for directory objects may not be
-persistent and could change even while the overlay filesystem is mounted.
+persistent and could change even while the overlay filesystem is mounted, as
+summarized in the `Inode properties`_ table above.
 
 
 Changes to underlying filesystems
16 changes: 14 additions & 2 deletions fs/overlayfs/copy_up.c
@@ -36,6 +36,13 @@ static int ovl_ccup_get(char *buf, const struct kernel_param *param)
 module_param_call(check_copy_up, ovl_ccup_set, ovl_ccup_get, NULL, 0644);
 MODULE_PARM_DESC(check_copy_up, "Obsolete; does nothing");
 
+static bool ovl_must_copy_xattr(const char *name)
+{
+	return !strcmp(name, XATTR_POSIX_ACL_ACCESS) ||
+	       !strcmp(name, XATTR_POSIX_ACL_DEFAULT) ||
+	       !strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN);
+}
+
 int ovl_copy_xattr(struct dentry *old, struct dentry *new)
 {
 	ssize_t list_size, size, value_size = 0;
@@ -107,8 +114,13 @@ int ovl_copy_xattr(struct dentry *old, struct dentry *new)
 			continue; /* Discard */
 		}
 		error = vfs_setxattr(new, name, value, size, 0);
-		if (error)
-			break;
+		if (error) {
+			if (error != -EOPNOTSUPP || ovl_must_copy_xattr(name))
+				break;
+
+			/* Ignore failure to copy unknown xattrs */
+			error = 0;
+		}
 	}
 	kfree(value);
 out:
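
Reader's note: the copy_up.c change above makes copy-up tolerate -EOPNOTSUPP from the upper filesystem for xattrs that are not POSIX ACLs or security.* attributes, while every other failure still aborts the copy. The userspace sketch below restates that decision as a pure function so the cases are easy to enumerate. The sketch_ names are hypothetical; the string literals are the userspace spellings of the kernel constants used in the hunk.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Userspace spellings of the kernel's xattr name constants. */
#define SKETCH_ACL_ACCESS	"system.posix_acl_access"
#define SKETCH_ACL_DEFAULT	"system.posix_acl_default"
#define SKETCH_SECURITY_PREFIX	"security."

/* Mirrors ovl_must_copy_xattr(): ACLs and security.* must not be dropped. */
static bool sketch_must_copy_xattr(const char *name)
{
	return !strcmp(name, SKETCH_ACL_ACCESS) ||
	       !strcmp(name, SKETCH_ACL_DEFAULT) ||
	       !strncmp(name, SKETCH_SECURITY_PREFIX,
			strlen(SKETCH_SECURITY_PREFIX));
}

/*
 * Map the result of setting one xattr on the upper file to the error that
 * should abort the copy loop, or 0 if the loop may continue.
 */
static int sketch_xattr_copy_error(const char *name, int error)
{
	if (error == -EOPNOTSUPP && !sketch_must_copy_xattr(name))
		return 0;	/* upper fs cannot store it; not essential */
	return error;
}
```

Under these assumptions, a failed copy of "user.comment" with -EOPNOTSUPP is ignored, whereas the same error for "security.selinux", or any other error such as -ENOSPC, is still returned.
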
31 changes: 28 additions & 3 deletions fs/overlayfs/dir.c
@@ -42,7 +42,7 @@ int ovl_cleanup(struct inode *wdir, struct dentry *wdentry)
 	return err;
 }
 
-static struct dentry *ovl_lookup_temp(struct dentry *workdir)
+struct dentry *ovl_lookup_temp(struct dentry *workdir)
 {
 	struct dentry *temp;
 	char name[20];
@@ -243,6 +243,9 @@ static int ovl_instantiate(struct dentry *dentry, struct inode *inode,
 
 	ovl_dir_modified(dentry->d_parent, false);
 	ovl_dentry_set_upper_alias(dentry);
+	ovl_dentry_update_reval(dentry, newdentry,
+			DCACHE_OP_REVALIDATE | DCACHE_OP_WEAK_REVALIDATE);
+
 	if (!hardlink) {
 		/*
 		 * ovl_obtain_alias() can be called after ovl_create_real()
@@ -819,6 +822,28 @@ static bool ovl_pure_upper(struct dentry *dentry)
 	       !ovl_test_flag(OVL_WHITEOUTS, d_inode(dentry));
 }
 
+static void ovl_drop_nlink(struct dentry *dentry)
+{
+	struct inode *inode = d_inode(dentry);
+	struct dentry *alias;
+
+	/* Try to find another, hashed alias */
+	spin_lock(&inode->i_lock);
+	hlist_for_each_entry(alias, &inode->i_dentry, d_u.d_alias) {
+		if (alias != dentry && !d_unhashed(alias))
+			break;
+	}
+	spin_unlock(&inode->i_lock);
+
+	/*
+	 * Changes to underlying layers may cause i_nlink to lose sync with
+	 * reality. In this case prevent the link count from going to zero
+	 * prematurely.
+	 */
+	if (inode->i_nlink > !!alias)
+		drop_nlink(inode);
+}
+
 static int ovl_do_remove(struct dentry *dentry, bool is_dir)
 {
 	int err;
@@ -856,7 +881,7 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir)
 		if (is_dir)
 			clear_nlink(dentry->d_inode);
 		else
-			drop_nlink(dentry->d_inode);
+			ovl_drop_nlink(dentry);
 	}
 	ovl_nlink_end(dentry);
 
@@ -1201,7 +1226,7 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 		if (new_is_dir)
 			clear_nlink(d_inode(new));
 		else
-			drop_nlink(d_inode(new));
+			ovl_drop_nlink(new);
 	}
 
 	ovl_dir_modified(old->d_parent, ovl_type_origin(old) ||
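
Reader's note: the guard in ovl_drop_nlink() above, `inode->i_nlink > !!alias`, means the link count is only decremented if the result stays at or above the number of other hashed aliases that were found (0 or 1). The sketch below isolates that arithmetic in a userspace function; the name and the boolean parameter are hypothetical stand-ins for the alias search done under i_lock.

```c
#include <stdbool.h>

/*
 * Returns the link count after a guarded drop. If another hashed alias
 * still refers to the inode, the count is never taken below 1; without
 * one, it is never taken below 0.
 */
static unsigned int sketch_drop_nlink(unsigned int nlink,
				      bool other_hashed_alias)
{
	unsigned int floor = other_hashed_alias ? 1 : 0;

	return nlink > floor ? nlink - 1 : nlink;
}
```

So a count of 1 with another live alias stays at 1 instead of dropping to zero, which is the premature-zero case the WARN_ON fix targets, and a count that is already 0 is left alone.
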
40 changes: 23 additions & 17 deletions fs/overlayfs/export.c
@@ -308,29 +308,35 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
 		ovl_set_flag(OVL_UPPERDATA, inode);
 
 	dentry = d_find_any_alias(inode);
-	if (!dentry) {
-		dentry = d_alloc_anon(inode->i_sb);
-		if (!dentry)
-			goto nomem;
-		oe = ovl_alloc_entry(lower ? 1 : 0);
-		if (!oe)
-			goto nomem;
-
-		if (lower) {
-			oe->lowerstack->dentry = dget(lower);
-			oe->lowerstack->layer = lowerpath->layer;
-		}
-		dentry->d_fsdata = oe;
-		if (upper_alias)
-			ovl_dentry_set_upper_alias(dentry);
+	if (dentry)
+		goto out_iput;
+
+	dentry = d_alloc_anon(inode->i_sb);
+	if (unlikely(!dentry))
+		goto nomem;
+	oe = ovl_alloc_entry(lower ? 1 : 0);
+	if (!oe)
+		goto nomem;
+
+	if (lower) {
+		oe->lowerstack->dentry = dget(lower);
+		oe->lowerstack->layer = lowerpath->layer;
 	}
+	dentry->d_fsdata = oe;
+	if (upper_alias)
+		ovl_dentry_set_upper_alias(dentry);
 
+	ovl_dentry_update_reval(dentry, upper,
+			DCACHE_OP_REVALIDATE | DCACHE_OP_WEAK_REVALIDATE);
+
 	return d_instantiate_anon(dentry, inode);
 
 nomem:
-	iput(inode);
 	dput(dentry);
-	return ERR_PTR(-ENOMEM);
+	dentry = ERR_PTR(-ENOMEM);
+out_iput:
+	iput(inode);
+	return dentry;
 }
 
 /* Get the upper or lower dentry in stach whose on layer @idx */