Skip to content

Commit

Permalink
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel…
Browse files Browse the repository at this point in the history
…/git/ebiederm/user-namespace

Pull namespace updates from Eric Biederman:
 "There is a lot here. A lot of these changes result in subtle user
  visible differences in kernel behavior. I don't expect anything will
  care but I will revert/fix things immediately if any regressions show
  up.

  From Seth Forshee there is a continuation of the work to make the vfs
  ready for unpriviled mounts. We had thought the previous changes
  prevented the creation of files outside of s_user_ns of a filesystem,
  but it turns we missed the O_CREAT path. Ooops.

  Pavel Tikhomirov and Oleg Nesterov worked together to fix a long
  standing bug in the implemenation of PR_SET_CHILD_SUBREAPER where only
  children that are forked after the prctl are considered and not
  children forked before the prctl. The only known user of this prctl
  systemd forks all children after the prctl. So no userspace
  regressions will occur. Holding earlier forked children to the same
  rules as later forked children creates a semantic that is sane enough
  to allow checkpoing of processes that use this feature.

  There is a long delayed change by Nikolay Borisov to limit inotify
  instances inside a user namespace.

  Michael Kerrisk extends the API for files used to maniuplate
  namespaces with two new trivial ioctls to allow discovery of the
  hierachy and properties of namespaces.

  Konstantin Khlebnikov with the help of Al Viro adds code that when a
  network namespace exits purges it's sysctl entries from the dcache. As
  in some circumstances this could use a lot of memory.

  Vivek Goyal fixed a bug with stacked filesystems where the permissions
  on the wrong inode were being checked.

  I continue previous work on ptracing across exec. Allowing a file to
  be setuid across exec while being ptraced if the tracer has enough
  credentials in the user namespace, and if the process has CAP_SETUID
  in it's own namespace. Proc files for setuid or otherwise undumpable
  executables are now owned by the root in the user namespace of their
  mm. Allowing debugging of setuid applications in containers to work
  better.

  A bug I introduced with permission checking and automount is now
  fixed. The big change is to mark the mounts that the kernel initiates
  as a result of an automount. This allows the permission checks in sget
  to be safely suppressed for this kind of mount. As the permission
  check happened when the original filesystem was mounted.

  Finally a special case in the mount namespace is removed preventing
  unbounded chains in the mount hash table, and making the semantics
  simpler which benefits CRIU.

  The vfs fix along with related work in ima and evm I believe makes us
  ready to finish developing and merge fully unprivileged mounts of the
  fuse filesystem. The cleanups of the mount namespace makes discussing
  how to fix the worst case complexity of umount. The stacked filesystem
  fixes pave the way for adding multiple mappings for the filesystem
  uids so that efficient and safer containers can be implemented"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc/sysctl: Don't grab i_lock under sysctl_lock.
  vfs: Use upper filesystem inode in bprm_fill_uid()
  proc/sysctl: prune stale dentries during unregistering
  mnt: Tuck mounts under others instead of creating shadow/side mounts.
  prctl: propagate has_child_subreaper flag to every descendant
  introduce the walk_process_tree() helper
  nsfs: Add an ioctl() to return owner UID of a userns
  fs: Better permission checking for submounts
  exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction
  vfs: open() with O_CREAT should not create inodes with unknown ids
  nsfs: Add an ioctl() to return the namespace type
  proc: Better ownership of files for non-dumpable tasks in user namespaces
  exec: Remove LSM_UNSAFE_PTRACE_CAP
  exec: Test the ptracer's saved cred to see if the tracee can gain caps
  exec: Don't reset euid and egid when the tracee has CAP_SETUID
  inotify: Convert to using per-namespace limits
  • Loading branch information
torvalds committed Feb 24, 2017
2 parents ef96152 + ace0c79 commit f1ef09f
Show file tree
Hide file tree
Showing 40 changed files with 431 additions and 226 deletions.
2 changes: 1 addition & 1 deletion fs/afs/mntpt.c
Original file line number Diff line number Diff line change
Expand Up @@ -202,7 +202,7 @@ static struct vfsmount *afs_mntpt_do_automount(struct dentry *mntpt)

/* try and do the mount */
_debug("--- attempting mount %s -o %s ---", devname, options);
mnt = vfs_kern_mount(&afs_fs_type, 0, devname, options);
mnt = vfs_submount(mntpt, &afs_fs_type, devname, options);
_debug("--- mount result %p ---", mnt);

free_page((unsigned long) devname);
Expand Down
4 changes: 2 additions & 2 deletions fs/autofs4/waitq.c
Original file line number Diff line number Diff line change
Expand Up @@ -436,8 +436,8 @@ int autofs4_wait(struct autofs_sb_info *sbi,
memcpy(&wq->name, &qstr, sizeof(struct qstr));
wq->dev = autofs4_get_dev(sbi);
wq->ino = autofs4_get_ino(sbi);
wq->uid = current_real_cred()->uid;
wq->gid = current_real_cred()->gid;
wq->uid = current_cred()->uid;
wq->gid = current_cred()->gid;
wq->pid = pid;
wq->tgid = tgid;
wq->status = -EINTR; /* Status return if interrupted */
Expand Down
7 changes: 4 additions & 3 deletions fs/cifs/cifs_dfs_ref.c
Original file line number Diff line number Diff line change
Expand Up @@ -245,7 +245,8 @@ char *cifs_compose_mount_options(const char *sb_mountdata,
* @fullpath: full path in UNC format
* @ref: server's referral
*/
static struct vfsmount *cifs_dfs_do_refmount(struct cifs_sb_info *cifs_sb,
static struct vfsmount *cifs_dfs_do_refmount(struct dentry *mntpt,
struct cifs_sb_info *cifs_sb,
const char *fullpath, const struct dfs_info3_param *ref)
{
struct vfsmount *mnt;
Expand All @@ -259,7 +260,7 @@ static struct vfsmount *cifs_dfs_do_refmount(struct cifs_sb_info *cifs_sb,
if (IS_ERR(mountdata))
return (struct vfsmount *)mountdata;

mnt = vfs_kern_mount(&cifs_fs_type, 0, devname, mountdata);
mnt = vfs_submount(mntpt, &cifs_fs_type, devname, mountdata);
kfree(mountdata);
kfree(devname);
return mnt;
Expand Down Expand Up @@ -334,7 +335,7 @@ static struct vfsmount *cifs_dfs_do_automount(struct dentry *mntpt)
mnt = ERR_PTR(-EINVAL);
break;
}
mnt = cifs_dfs_do_refmount(cifs_sb,
mnt = cifs_dfs_do_refmount(mntpt, cifs_sb,
full_path, referrals + i);
cifs_dbg(FYI, "%s: cifs_dfs_do_refmount:%s , mnt:%p\n",
__func__, referrals[i].node_name, mnt);
Expand Down
8 changes: 4 additions & 4 deletions fs/debugfs/inode.c
Original file line number Diff line number Diff line change
Expand Up @@ -187,9 +187,9 @@ static const struct super_operations debugfs_super_operations = {

static struct vfsmount *debugfs_automount(struct path *path)
{
struct vfsmount *(*f)(void *);
f = (struct vfsmount *(*)(void *))path->dentry->d_fsdata;
return f(d_inode(path->dentry)->i_private);
debugfs_automount_t f;
f = (debugfs_automount_t)path->dentry->d_fsdata;
return f(path->dentry, d_inode(path->dentry)->i_private);
}

static const struct dentry_operations debugfs_dops = {
Expand Down Expand Up @@ -540,7 +540,7 @@ EXPORT_SYMBOL_GPL(debugfs_create_dir);
*/
struct dentry *debugfs_create_automount(const char *name,
struct dentry *parent,
struct vfsmount *(*f)(void *),
debugfs_automount_t f,
void *data)
{
struct dentry *dentry = start_creating(name, parent);
Expand Down
10 changes: 3 additions & 7 deletions fs/exec.c
Original file line number Diff line number Diff line change
Expand Up @@ -1426,12 +1426,8 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
struct task_struct *p = current, *t;
unsigned n_fs;

if (p->ptrace) {
if (ptracer_capable(p, current_user_ns()))
bprm->unsafe |= LSM_UNSAFE_PTRACE_CAP;
else
bprm->unsafe |= LSM_UNSAFE_PTRACE;
}
if (p->ptrace)
bprm->unsafe |= LSM_UNSAFE_PTRACE;

/*
* This isn't strictly necessary, but it makes it harder for LSMs to
Expand Down Expand Up @@ -1479,7 +1475,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
if (task_no_new_privs(current))
return;

inode = file_inode(bprm->file);
inode = bprm->file->f_path.dentry->d_inode;
mode = READ_ONCE(inode->i_mode);
if (!(mode & (S_ISUID|S_ISGID)))
return;
Expand Down
1 change: 0 additions & 1 deletion fs/mount.h
Original file line number Diff line number Diff line change
Expand Up @@ -89,7 +89,6 @@ static inline int is_mounted(struct vfsmount *mnt)
}

extern struct mount *__lookup_mnt(struct vfsmount *, struct dentry *);
extern struct mount *__lookup_mnt_last(struct vfsmount *, struct dentry *);

extern int __legitimize_mnt(struct vfsmount *, unsigned);
extern bool legitimize_mnt(struct vfsmount *, unsigned);
Expand Down
9 changes: 6 additions & 3 deletions fs/namei.c
Original file line number Diff line number Diff line change
Expand Up @@ -1100,7 +1100,6 @@ static int follow_automount(struct path *path, struct nameidata *nd,
bool *need_mntput)
{
struct vfsmount *mnt;
const struct cred *old_cred;
int err;

if (!path->dentry->d_op || !path->dentry->d_op->d_automount)
Expand Down Expand Up @@ -1129,9 +1128,7 @@ static int follow_automount(struct path *path, struct nameidata *nd,
if (nd->total_link_count >= 40)
return -ELOOP;

old_cred = override_creds(&init_cred);
mnt = path->dentry->d_op->d_automount(path);
revert_creds(old_cred);
if (IS_ERR(mnt)) {
/*
* The filesystem is allowed to return -EISDIR here to indicate
Expand Down Expand Up @@ -2941,10 +2938,16 @@ static inline int open_to_namei_flags(int flag)

static int may_o_create(const struct path *dir, struct dentry *dentry, umode_t mode)
{
struct user_namespace *s_user_ns;
int error = security_path_mknod(dir, dentry, mode, 0);
if (error)
return error;

s_user_ns = dir->dentry->d_sb->s_user_ns;
if (!kuid_has_mapping(s_user_ns, current_fsuid()) ||
!kgid_has_mapping(s_user_ns, current_fsgid()))
return -EOVERFLOW;

error = inode_permission(dir->dentry->d_inode, MAY_WRITE | MAY_EXEC);
if (error)
return error;
Expand Down
127 changes: 76 additions & 51 deletions fs/namespace.c
Original file line number Diff line number Diff line change
Expand Up @@ -636,28 +636,6 @@ struct mount *__lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
return NULL;
}

/*
* find the last mount at @dentry on vfsmount @mnt.
* mount_lock must be held.
*/
struct mount *__lookup_mnt_last(struct vfsmount *mnt, struct dentry *dentry)
{
struct mount *p, *res = NULL;
p = __lookup_mnt(mnt, dentry);
if (!p)
goto out;
if (!(p->mnt.mnt_flags & MNT_UMOUNT))
res = p;
hlist_for_each_entry_continue(p, mnt_hash) {
if (&p->mnt_parent->mnt != mnt || p->mnt_mountpoint != dentry)
break;
if (!(p->mnt.mnt_flags & MNT_UMOUNT))
res = p;
}
out:
return res;
}

/*
* lookup_mnt - Return the first child mount mounted at path
*
Expand Down Expand Up @@ -878,6 +856,13 @@ void mnt_set_mountpoint(struct mount *mnt,
hlist_add_head(&child_mnt->mnt_mp_list, &mp->m_list);
}

static void __attach_mnt(struct mount *mnt, struct mount *parent)
{
hlist_add_head_rcu(&mnt->mnt_hash,
m_hash(&parent->mnt, mnt->mnt_mountpoint));
list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
}

/*
* vfsmount lock must be held for write
*/
Expand All @@ -886,28 +871,45 @@ static void attach_mnt(struct mount *mnt,
struct mountpoint *mp)
{
mnt_set_mountpoint(parent, mp, mnt);
hlist_add_head_rcu(&mnt->mnt_hash, m_hash(&parent->mnt, mp->m_dentry));
list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
__attach_mnt(mnt, parent);
}

static void attach_shadowed(struct mount *mnt,
struct mount *parent,
struct mount *shadows)
void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp, struct mount *mnt)
{
if (shadows) {
hlist_add_behind_rcu(&mnt->mnt_hash, &shadows->mnt_hash);
list_add(&mnt->mnt_child, &shadows->mnt_child);
} else {
hlist_add_head_rcu(&mnt->mnt_hash,
m_hash(&parent->mnt, mnt->mnt_mountpoint));
list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
}
struct mountpoint *old_mp = mnt->mnt_mp;
struct dentry *old_mountpoint = mnt->mnt_mountpoint;
struct mount *old_parent = mnt->mnt_parent;

list_del_init(&mnt->mnt_child);
hlist_del_init(&mnt->mnt_mp_list);
hlist_del_init_rcu(&mnt->mnt_hash);

attach_mnt(mnt, parent, mp);

put_mountpoint(old_mp);

/*
* Safely avoid even the suggestion this code might sleep or
* lock the mount hash by taking advantage of the knowledge that
* mnt_change_mountpoint will not release the final reference
* to a mountpoint.
*
* During mounting, the mount passed in as the parent mount will
* continue to use the old mountpoint and during unmounting, the
* old mountpoint will continue to exist until namespace_unlock,
* which happens well after mnt_change_mountpoint.
*/
spin_lock(&old_mountpoint->d_lock);
old_mountpoint->d_lockref.count--;
spin_unlock(&old_mountpoint->d_lock);

mnt_add_count(old_parent, -1);
}

/*
* vfsmount lock must be held for write
*/
static void commit_tree(struct mount *mnt, struct mount *shadows)
static void commit_tree(struct mount *mnt)
{
struct mount *parent = mnt->mnt_parent;
struct mount *m;
Expand All @@ -925,7 +927,7 @@ static void commit_tree(struct mount *mnt, struct mount *shadows)
n->mounts += n->pending_mounts;
n->pending_mounts = 0;

attach_shadowed(mnt, parent, shadows);
__attach_mnt(mnt, parent);
touch_mnt_namespace(n);
}

Expand Down Expand Up @@ -989,6 +991,21 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
}
EXPORT_SYMBOL_GPL(vfs_kern_mount);

struct vfsmount *
vfs_submount(const struct dentry *mountpoint, struct file_system_type *type,
const char *name, void *data)
{
/* Until it is worked out how to pass the user namespace
* through from the parent mount to the submount don't support
* unprivileged mounts with submounts.
*/
if (mountpoint->d_sb->s_user_ns != &init_user_ns)
return ERR_PTR(-EPERM);

return vfs_kern_mount(type, MS_SUBMOUNT, name, data);
}
EXPORT_SYMBOL_GPL(vfs_submount);

static struct mount *clone_mnt(struct mount *old, struct dentry *root,
int flag)
{
Expand Down Expand Up @@ -1764,7 +1781,6 @@ struct mount *copy_tree(struct mount *mnt, struct dentry *dentry,
continue;

for (s = r; s; s = next_mnt(s, r)) {
struct mount *t = NULL;
if (!(flag & CL_COPY_UNBINDABLE) &&
IS_MNT_UNBINDABLE(s)) {
s = skip_mnt_tree(s);
Expand All @@ -1786,14 +1802,7 @@ struct mount *copy_tree(struct mount *mnt, struct dentry *dentry,
goto out;
lock_mount_hash();
list_add_tail(&q->mnt_list, &res->mnt_list);
mnt_set_mountpoint(parent, p->mnt_mp, q);
if (!list_empty(&parent->mnt_mounts)) {
t = list_last_entry(&parent->mnt_mounts,
struct mount, mnt_child);
if (t->mnt_mp != p->mnt_mp)
t = NULL;
}
attach_shadowed(q, parent, t);
attach_mnt(q, parent, p->mnt_mp);
unlock_mount_hash();
}
}
Expand Down Expand Up @@ -1992,10 +2001,18 @@ static int attach_recursive_mnt(struct mount *source_mnt,
{
HLIST_HEAD(tree_list);
struct mnt_namespace *ns = dest_mnt->mnt_ns;
struct mountpoint *smp;
struct mount *child, *p;
struct hlist_node *n;
int err;

/* Preallocate a mountpoint in case the new mounts need
* to be tucked under other mounts.
*/
smp = get_mountpoint(source_mnt->mnt.mnt_root);
if (IS_ERR(smp))
return PTR_ERR(smp);

/* Is there space to add these mounts to the mount namespace? */
if (!parent_path) {
err = count_mounts(ns, source_mnt);
Expand All @@ -2022,16 +2039,19 @@ static int attach_recursive_mnt(struct mount *source_mnt,
touch_mnt_namespace(source_mnt->mnt_ns);
} else {
mnt_set_mountpoint(dest_mnt, dest_mp, source_mnt);
commit_tree(source_mnt, NULL);
commit_tree(source_mnt);
}

hlist_for_each_entry_safe(child, n, &tree_list, mnt_hash) {
struct mount *q;
hlist_del_init(&child->mnt_hash);
q = __lookup_mnt_last(&child->mnt_parent->mnt,
child->mnt_mountpoint);
commit_tree(child, q);
q = __lookup_mnt(&child->mnt_parent->mnt,
child->mnt_mountpoint);
if (q)
mnt_change_mountpoint(child, smp, q);
commit_tree(child);
}
put_mountpoint(smp);
unlock_mount_hash();

return 0;
Expand All @@ -2046,6 +2066,11 @@ static int attach_recursive_mnt(struct mount *source_mnt,
cleanup_group_ids(source_mnt, NULL);
out:
ns->pending_mounts = 0;

read_seqlock_excl(&mount_lock);
put_mountpoint(smp);
read_sequnlock_excl(&mount_lock);

return err;
}

Expand Down Expand Up @@ -2794,7 +2819,7 @@ long do_mount(const char *dev_name, const char __user *dir_name,

flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN |
MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
MS_STRICTATIME | MS_NOREMOTELOCK);
MS_STRICTATIME | MS_NOREMOTELOCK | MS_SUBMOUNT);

if (flags & MS_REMOUNT)
retval = do_remount(&path, flags & ~MS_REMOUNT, mnt_flags,
Expand Down
2 changes: 1 addition & 1 deletion fs/nfs/namespace.c
Original file line number Diff line number Diff line change
Expand Up @@ -226,7 +226,7 @@ static struct vfsmount *nfs_do_clone_mount(struct nfs_server *server,
const char *devname,
struct nfs_clone_mount *mountdata)
{
return vfs_kern_mount(&nfs_xdev_fs_type, 0, devname, mountdata);
return vfs_submount(mountdata->dentry, &nfs_xdev_fs_type, devname, mountdata);
}

/**
Expand Down
2 changes: 1 addition & 1 deletion fs/nfs/nfs4namespace.c
Original file line number Diff line number Diff line change
Expand Up @@ -279,7 +279,7 @@ static struct vfsmount *try_location(struct nfs_clone_mount *mountdata,
mountdata->hostname,
mountdata->mnt_path);

mnt = vfs_kern_mount(&nfs4_referral_fs_type, 0, page, mountdata);
mnt = vfs_submount(mountdata->dentry, &nfs4_referral_fs_type, page, mountdata);
if (!IS_ERR(mnt))
break;
}
Expand Down
17 changes: 17 additions & 0 deletions fs/notify/inotify/inotify.h
Original file line number Diff line number Diff line change
Expand Up @@ -30,3 +30,20 @@ extern int inotify_handle_event(struct fsnotify_group *group,
const unsigned char *file_name, u32 cookie);

extern const struct fsnotify_ops inotify_fsnotify_ops;

#ifdef CONFIG_INOTIFY_USER
static inline void dec_inotify_instances(struct ucounts *ucounts)
{
dec_ucount(ucounts, UCOUNT_INOTIFY_INSTANCES);
}

static inline struct ucounts *inc_inotify_watches(struct ucounts *ucounts)
{
return inc_ucount(ucounts->ns, ucounts->uid, UCOUNT_INOTIFY_WATCHES);
}

static inline void dec_inotify_watches(struct ucounts *ucounts)
{
dec_ucount(ucounts, UCOUNT_INOTIFY_WATCHES);
}
#endif
Loading

0 comments on commit f1ef09f

Please sign in to comment.