commoncap: handle idmapped mounts

When interacting with user namespace aware and non-user namespace aware
filesystem capabilities, the vfs performs various security checks to
determine whether the filesystem capabilities can be used by the
caller, whether they need to be removed, and so on. The main
infrastructure for this resides in the capability codepaths but they are
called through the LSM security infrastructure even though they are not
technically an LSM or optional. This extends the existing security hooks
security_inode_removexattr(), security_inode_killpriv(),
security_inode_getsecurity() to pass down the mount's user namespace and
makes them aware of idmapped mounts.

In order to actually get filesystem capabilities from disk the
capability infrastructure exposes the get_vfs_caps_from_disk() helper.
For user namespace aware filesystem capabilities a root uid is stored
alongside the capabilities.
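
As a rough illustration of where that root uid lives, the sketch below reads it out of a raw security.capability xattr value in userspace. The constants are meant to mirror include/uapi/linux/capability.h (revision 3 layout: magic_etc, two permitted/inheritable pairs, then rootid); read_le32() and fscaps_rootid() are hypothetical helper names that do not exist in the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Values intended to mirror include/uapi/linux/capability.h. */
#define VFS_CAP_REVISION_MASK 0xFF000000u
#define VFS_CAP_REVISION_3    0x03000000u
/* magic_etc (4) + 2 * (permitted + inheritable) (16) + rootid (4) */
#define XATTR_CAPS_SZ_3       24u
#define ROOTID_OFFSET         20u

/* Read a little-endian 32-bit value independent of host endianness. */
uint32_t read_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: extract the root uid from a raw v3
 * security.capability value. Returns 0 on success, -1 if the
 * buffer is not a revision 3 (user namespace aware) capability.
 */
int fscaps_rootid(const unsigned char *buf, size_t len, uint32_t *rootid)
{
	if (buf == NULL || rootid == NULL || len != XATTR_CAPS_SZ_3)
		return -1;
	if ((read_le32(buf) & VFS_CAP_REVISION_MASK) != VFS_CAP_REVISION_3)
		return -1;
	*rootid = read_le32(buf + ROOTID_OFFSET);
	return 0;
}
```

Older revisions carry no rootid field, which is why the sketch rejects anything that is not revision 3.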

In order to determine whether the caller can make use of the filesystem
capability or whether it needs to be ignored it is translated according
to the superblock's user namespace. If it can be translated to uid 0
according to that id mapping the caller can use the filesystem
capabilities stored on disk. If we are accessing the inode that holds
the filesystem capabilities through an idmapped mount we map the root
uid according to the mount's user namespace. Afterwards the checks are
identical to non-idmapped mounts: reading filesystem caps from disk
enforces that the root uid associated with the filesystem capability
must have a mapping in the superblock's user namespace and that the
caller is either in the same user namespace or is a descendant of the
superblock's user namespace. For filesystems that are mountable inside
user namespaces the caller can just mount the filesystem and won't
usually need to idmap it. If they do want to idmap it they can create an
idmapped mount and mark it with a user namespace they created and which
is thus a descendant of s_user_ns. For filesystems that are not
mountable inside user namespaces the descendant rule is trivially true
because the s_user_ns will be the initial user namespace.
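
The translation order described above can be modelled in userspace roughly as follows. This is a simplified sketch, not kernel code: struct ns_model holds a single mapping extent instead of a full uid_map, and the model_* functions are hypothetical stand-ins for the roles of make_kuid(), kuid_into_mnt(), from_kuid() and the walk done by rootid_owns_currentns().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_ID UINT32_MAX

/* Simplified stand-in for a user namespace: one extent plus a parent. */
struct ns_model {
	uint32_t first;        /* first id as seen inside the namespace */
	uint32_t lower_first;  /* first kernel-side id it maps to */
	uint32_t count;        /* number of consecutive ids mapped */
	const struct ns_model *parent;
};

/* Id inside the namespace -> kernel id (the role of make_kuid()). */
uint32_t model_make_kuid(const struct ns_model *ns, uint32_t uid)
{
	if (uid < ns->first || uid - ns->first >= ns->count)
		return INVALID_ID;
	return ns->lower_first + (uid - ns->first);
}

/* Kernel id -> id inside the namespace (the role of from_kuid()). */
uint32_t model_from_kuid(const struct ns_model *ns, uint32_t kuid)
{
	if (kuid == INVALID_ID || kuid < ns->lower_first ||
	    kuid - ns->lower_first >= ns->count)
		return INVALID_ID;
	return ns->first + (kuid - ns->lower_first);
}

/* Shift a kernel id through the mount's idmapping (kuid_into_mnt()). */
uint32_t model_kuid_into_mnt(const struct ns_model *mnt_ns, uint32_t kuid)
{
	return model_make_kuid(mnt_ns, kuid);
}

/*
 * Can the caller use a capability whose on-disk root uid is disk_rootid?
 * Step 1: translate via the superblock's namespace.
 * Step 2: shift via the mount's namespace.
 * Step 3: accept if the result is uid 0 in the caller's namespace or
 *         in any of its ancestors.
 */
bool model_root_usable(const struct ns_model *fs_ns,
		       const struct ns_model *mnt_ns,
		       const struct ns_model *caller_ns,
		       uint32_t disk_rootid)
{
	const struct ns_model *ns;
	uint32_t kroot = model_make_kuid(fs_ns, disk_rootid);

	if (kroot == INVALID_ID)
		return false;
	kroot = model_kuid_into_mnt(mnt_ns, kroot);
	if (kroot == INVALID_ID)
		return false;
	for (ns = caller_ns; ns != NULL; ns = ns->parent)
		if (model_from_kuid(ns, kroot) == 0)
			return true;
	return false;
}
```

With an identity namespace {0, 0, UINT32_MAX, NULL} standing in for the initial user namespace, passing it as mnt_ns leaves the root uid unchanged, which matches the non-idmapped behavior. Passing a mount namespace that maps 1000 to 100000 lets a caller whose namespace maps 100000 to uid 0 use a capability stored with root uid 1000.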

If the initial user namespace is passed nothing changes, so non-idmapped
mounts will see the same behavior as before.

Link: https://lore.kernel.org/r/[email protected]
Cc: Christoph Hellwig <[email protected]>
Cc: David Howells <[email protected]>
Cc: Al Viro <[email protected]>
Cc: [email protected]
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: James Morris <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Christian Brauner committed Jan 24, 2021
1 parent c7c7a1a commit 71bc356
Showing 11 changed files with 146 additions and 74 deletions.
2 changes: 1 addition & 1 deletion fs/attr.c
@@ -143,7 +143,7 @@ int setattr_prepare(struct user_namespace *mnt_userns, struct dentry *dentry,
if (ia_valid & ATTR_KILL_PRIV) {
int error;

error = security_inode_killpriv(dentry);
error = security_inode_killpriv(mnt_userns, dentry);
if (error)
return error;
}
18 changes: 11 additions & 7 deletions fs/xattr.c
@@ -262,7 +262,8 @@ __vfs_setxattr_locked(struct user_namespace *mnt_userns, struct dentry *dentry,
if (error)
return error;

error = security_inode_setxattr(dentry, name, value, size, flags);
error = security_inode_setxattr(mnt_userns, dentry, name, value, size,
flags);
if (error)
goto out;

@@ -313,18 +314,20 @@ vfs_setxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
EXPORT_SYMBOL_GPL(vfs_setxattr);

static ssize_t
xattr_getsecurity(struct inode *inode, const char *name, void *value,
size_t size)
xattr_getsecurity(struct user_namespace *mnt_userns, struct inode *inode,
const char *name, void *value, size_t size)
{
void *buffer = NULL;
ssize_t len;

if (!value || !size) {
len = security_inode_getsecurity(inode, name, &buffer, false);
len = security_inode_getsecurity(mnt_userns, inode, name,
&buffer, false);
goto out_noalloc;
}

len = security_inode_getsecurity(inode, name, &buffer, true);
len = security_inode_getsecurity(mnt_userns, inode, name, &buffer,
true);
if (len < 0)
return len;
if (size < len) {
@@ -414,7 +417,8 @@ vfs_getxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
if (!strncmp(name, XATTR_SECURITY_PREFIX,
XATTR_SECURITY_PREFIX_LEN)) {
const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
int ret = xattr_getsecurity(inode, suffix, value, size);
int ret = xattr_getsecurity(mnt_userns, inode, suffix, value,
size);
/*
* Only overwrite the return value if a security module
* is actually active.
@@ -486,7 +490,7 @@ __vfs_removexattr_locked(struct user_namespace *mnt_userns,
if (error)
return error;

error = security_inode_removexattr(dentry, name);
error = security_inode_removexattr(mnt_userns, dentry, name);
if (error)
goto out;

4 changes: 3 additions & 1 deletion include/linux/capability.h
@@ -271,7 +271,9 @@ static inline bool checkpoint_restore_ns_capable(struct user_namespace *ns)
}

/* audit system wants to get cap info from files as well */
extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
int get_vfs_caps_from_disk(struct user_namespace *mnt_userns,
const struct dentry *dentry,
struct cpu_vfs_cap_data *cpu_caps);

int cap_convert_nscap(struct user_namespace *mnt_userns, struct dentry *dentry,
const void **ivalue, size_t size);
15 changes: 9 additions & 6 deletions include/linux/lsm_hook_defs.h
@@ -133,17 +133,20 @@ LSM_HOOK(int, 0, inode_follow_link, struct dentry *dentry, struct inode *inode,
LSM_HOOK(int, 0, inode_permission, struct inode *inode, int mask)
LSM_HOOK(int, 0, inode_setattr, struct dentry *dentry, struct iattr *attr)
LSM_HOOK(int, 0, inode_getattr, const struct path *path)
LSM_HOOK(int, 0, inode_setxattr, struct dentry *dentry, const char *name,
const void *value, size_t size, int flags)
LSM_HOOK(int, 0, inode_setxattr, struct user_namespace *mnt_userns,
struct dentry *dentry, const char *name, const void *value,
size_t size, int flags)
LSM_HOOK(void, LSM_RET_VOID, inode_post_setxattr, struct dentry *dentry,
const char *name, const void *value, size_t size, int flags)
LSM_HOOK(int, 0, inode_getxattr, struct dentry *dentry, const char *name)
LSM_HOOK(int, 0, inode_listxattr, struct dentry *dentry)
LSM_HOOK(int, 0, inode_removexattr, struct dentry *dentry, const char *name)
LSM_HOOK(int, 0, inode_removexattr, struct user_namespace *mnt_userns,
struct dentry *dentry, const char *name)
LSM_HOOK(int, 0, inode_need_killpriv, struct dentry *dentry)
LSM_HOOK(int, 0, inode_killpriv, struct dentry *dentry)
LSM_HOOK(int, -EOPNOTSUPP, inode_getsecurity, struct inode *inode,
const char *name, void **buffer, bool alloc)
LSM_HOOK(int, 0, inode_killpriv, struct user_namespace *mnt_userns,
struct dentry *dentry)
LSM_HOOK(int, -EOPNOTSUPP, inode_getsecurity, struct user_namespace *mnt_userns,
struct inode *inode, const char *name, void **buffer, bool alloc)
LSM_HOOK(int, -EOPNOTSUPP, inode_setsecurity, struct inode *inode,
const char *name, const void *value, size_t size, int flags)
LSM_HOOK(int, 0, inode_listsecurity, struct inode *inode, char *buffer,
1 change: 1 addition & 0 deletions include/linux/lsm_hooks.h
@@ -444,6 +444,7 @@
* @inode_killpriv:
* The setuid bit is being removed. Remove similar security labels.
* Called with the dentry->d_inode->i_mutex held.
* @mnt_userns: user namespace of the mount
* @dentry is the dentry being changed.
* Return 0 on success. If error is returned, then the operation
* causing setuid bit removal is failed.
54 changes: 34 additions & 20 deletions include/linux/security.h
@@ -145,13 +145,16 @@ extern int cap_capset(struct cred *new, const struct cred *old,
const kernel_cap_t *inheritable,
const kernel_cap_t *permitted);
extern int cap_bprm_creds_from_file(struct linux_binprm *bprm, struct file *file);
extern int cap_inode_setxattr(struct dentry *dentry, const char *name,
const void *value, size_t size, int flags);
extern int cap_inode_removexattr(struct dentry *dentry, const char *name);
extern int cap_inode_need_killpriv(struct dentry *dentry);
extern int cap_inode_killpriv(struct dentry *dentry);
extern int cap_inode_getsecurity(struct inode *inode, const char *name,
void **buffer, bool alloc);
int cap_inode_setxattr(struct dentry *dentry, const char *name,
const void *value, size_t size, int flags);
int cap_inode_removexattr(struct user_namespace *mnt_userns,
struct dentry *dentry, const char *name);
int cap_inode_need_killpriv(struct dentry *dentry);
int cap_inode_killpriv(struct user_namespace *mnt_userns,
struct dentry *dentry);
int cap_inode_getsecurity(struct user_namespace *mnt_userns,
struct inode *inode, const char *name, void **buffer,
bool alloc);
extern int cap_mmap_addr(unsigned long addr);
extern int cap_mmap_file(struct file *file, unsigned long reqprot,
unsigned long prot, unsigned long flags);
@@ -345,16 +348,21 @@ int security_inode_follow_link(struct dentry *dentry, struct inode *inode,
int security_inode_permission(struct inode *inode, int mask);
int security_inode_setattr(struct dentry *dentry, struct iattr *attr);
int security_inode_getattr(const struct path *path);
int security_inode_setxattr(struct dentry *dentry, const char *name,
int security_inode_setxattr(struct user_namespace *mnt_userns,
struct dentry *dentry, const char *name,
const void *value, size_t size, int flags);
void security_inode_post_setxattr(struct dentry *dentry, const char *name,
const void *value, size_t size, int flags);
int security_inode_getxattr(struct dentry *dentry, const char *name);
int security_inode_listxattr(struct dentry *dentry);
int security_inode_removexattr(struct dentry *dentry, const char *name);
int security_inode_removexattr(struct user_namespace *mnt_userns,
struct dentry *dentry, const char *name);
int security_inode_need_killpriv(struct dentry *dentry);
int security_inode_killpriv(struct dentry *dentry);
int security_inode_getsecurity(struct inode *inode, const char *name, void **buffer, bool alloc);
int security_inode_killpriv(struct user_namespace *mnt_userns,
struct dentry *dentry);
int security_inode_getsecurity(struct user_namespace *mnt_userns,
struct inode *inode, const char *name,
void **buffer, bool alloc);
int security_inode_setsecurity(struct inode *inode, const char *name, const void *value, size_t size, int flags);
int security_inode_listsecurity(struct inode *inode, char *buffer, size_t buffer_size);
void security_inode_getsecid(struct inode *inode, u32 *secid);
@@ -831,8 +839,9 @@ static inline int security_inode_getattr(const struct path *path)
return 0;
}

static inline int security_inode_setxattr(struct dentry *dentry,
const char *name, const void *value, size_t size, int flags)
static inline int security_inode_setxattr(struct user_namespace *mnt_userns,
struct dentry *dentry, const char *name, const void *value,
size_t size, int flags)
{
return cap_inode_setxattr(dentry, name, value, size, flags);
}
@@ -852,25 +861,30 @@ static inline int security_inode_listxattr(struct dentry *dentry)
return 0;
}

static inline int security_inode_removexattr(struct dentry *dentry,
const char *name)
static inline int security_inode_removexattr(struct user_namespace *mnt_userns,
struct dentry *dentry,
const char *name)
{
return cap_inode_removexattr(dentry, name);
return cap_inode_removexattr(mnt_userns, dentry, name);
}

static inline int security_inode_need_killpriv(struct dentry *dentry)
{
return cap_inode_need_killpriv(dentry);
}

static inline int security_inode_killpriv(struct dentry *dentry)
static inline int security_inode_killpriv(struct user_namespace *mnt_userns,
struct dentry *dentry)
{
return cap_inode_killpriv(dentry);
return cap_inode_killpriv(mnt_userns, dentry);
}

static inline int security_inode_getsecurity(struct inode *inode, const char *name, void **buffer, bool alloc)
static inline int security_inode_getsecurity(struct user_namespace *mnt_userns,
struct inode *inode,
const char *name, void **buffer,
bool alloc)
{
return cap_inode_getsecurity(inode, name, buffer, alloc);
return cap_inode_getsecurity(mnt_userns, inode, name, buffer, alloc);
}

static inline int security_inode_setsecurity(struct inode *inode, const char *name, const void *value, size_t size, int flags)
5 changes: 3 additions & 2 deletions kernel/auditsc.c
@@ -1930,7 +1930,7 @@ static inline int audit_copy_fcaps(struct audit_names *name,
if (!dentry)
return 0;

rc = get_vfs_caps_from_disk(dentry, &caps);
rc = get_vfs_caps_from_disk(&init_user_ns, dentry, &caps);
if (rc)
return rc;

@@ -2481,7 +2481,8 @@ int __audit_log_bprm_fcaps(struct linux_binprm *bprm,
ax->d.next = context->aux;
context->aux = (void *)ax;

get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps);
get_vfs_caps_from_disk(&init_user_ns,
bprm->file->f_path.dentry, &vcaps);

ax->fcap.permitted = vcaps.permitted;
ax->fcap.inheritable = vcaps.inheritable;
62 changes: 49 additions & 13 deletions security/commoncap.c
@@ -303,17 +303,25 @@ int cap_inode_need_killpriv(struct dentry *dentry)

/**
* cap_inode_killpriv - Erase the security markings on an inode
* @dentry: The inode/dentry to alter
*
* @mnt_userns: user namespace of the mount the inode was found from
* @dentry: The inode/dentry to alter
*
* Erase the privilege-enhancing security markings on an inode.
*
* If the inode has been found through an idmapped mount the user namespace of
* the vfsmount must be passed through @mnt_userns. This function will then
* take care to map the inode according to @mnt_userns before checking
* permissions. On non-idmapped mounts or if permission checking is to be
* performed on the raw inode simply pass init_user_ns.
*
* Returns 0 if successful, -ve on error.
*/
int cap_inode_killpriv(struct dentry *dentry)
int cap_inode_killpriv(struct user_namespace *mnt_userns, struct dentry *dentry)
{
int error;

error = __vfs_removexattr(&init_user_ns, dentry, XATTR_NAME_CAPS);
error = __vfs_removexattr(mnt_userns, dentry, XATTR_NAME_CAPS);
if (error == -EOPNOTSUPP)
error = 0;
return error;
@@ -366,7 +374,8 @@ static bool is_v3header(size_t size, const struct vfs_cap_data *cap)
* by the integrity subsystem, which really wants the unconverted values -
* so that's good.
*/
int cap_inode_getsecurity(struct inode *inode, const char *name, void **buffer,
int cap_inode_getsecurity(struct user_namespace *mnt_userns,
struct inode *inode, const char *name, void **buffer,
bool alloc)
{
int size, ret;
@@ -386,7 +395,7 @@ int cap_inode_getsecurity(struct inode *inode, const char *name, void **buffer,
return -EINVAL;

size = sizeof(struct vfs_ns_cap_data);
ret = (int)vfs_getxattr_alloc(&init_user_ns, dentry, XATTR_NAME_CAPS,
ret = (int)vfs_getxattr_alloc(mnt_userns, dentry, XATTR_NAME_CAPS,
&tmpbuf, size, GFP_NOFS);
dput(dentry);

@@ -412,6 +421,9 @@ int cap_inode_getsecurity(struct inode *inode, const char *name, void **buffer,
root = le32_to_cpu(nscap->rootid);
kroot = make_kuid(fs_ns, root);

/* If this is an idmapped mount shift the kuid. */
kroot = kuid_into_mnt(mnt_userns, kroot);

/* If the root kuid maps to a valid uid in current ns, then return
* this as a nscap. */
mappedroot = from_kuid(current_user_ns(), kroot);
@@ -595,10 +607,24 @@ static inline int bprm_caps_from_vfs_caps(struct cpu_vfs_cap_data *caps,
return *effective ? ret : 0;
}

/*
/**
* get_vfs_caps_from_disk - retrieve vfs caps from disk
*
* @mnt_userns: user namespace of the mount the inode was found from
* @dentry: dentry from which @inode is retrieved
* @cpu_caps: vfs capabilities
*
* Extract the on-exec-apply capability sets for an executable file.
*
* If the inode has been found through an idmapped mount the user namespace of
* the vfsmount must be passed through @mnt_userns. This function will then
* take care to map the inode according to @mnt_userns before checking
* permissions. On non-idmapped mounts or if permission checking is to be
* performed on the raw inode simply pass init_user_ns.
*/
int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps)
int get_vfs_caps_from_disk(struct user_namespace *mnt_userns,
const struct dentry *dentry,
struct cpu_vfs_cap_data *cpu_caps)
{
struct inode *inode = d_backing_inode(dentry);
__u32 magic_etc;
@@ -654,6 +680,7 @@ int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data
/* Limit the caps to the mounter of the filesystem
* or the more limited uid specified in the xattr.
*/
rootkuid = kuid_into_mnt(mnt_userns, rootkuid);
if (!rootid_owns_currentns(rootkuid))
return -ENODATA;

@@ -699,7 +726,8 @@ static int get_file_caps(struct linux_binprm *bprm, struct file *file,
if (!current_in_userns(file->f_path.mnt->mnt_sb->s_user_ns))
return 0;

rc = get_vfs_caps_from_disk(file->f_path.dentry, &vcaps);
rc = get_vfs_caps_from_disk(file_mnt_user_ns(file),
file->f_path.dentry, &vcaps);
if (rc < 0) {
if (rc == -EINVAL)
printk(KERN_NOTICE "Invalid argument reading file caps for %s\n",
@@ -964,16 +992,25 @@ int cap_inode_setxattr(struct dentry *dentry, const char *name,

/**
* cap_inode_removexattr - Determine whether an xattr may be removed
* @dentry: The inode/dentry being altered
* @name: The name of the xattr to be changed
*
* @mnt_userns: User namespace of the mount the inode was found from
* @dentry: The inode/dentry being altered
* @name: The name of the xattr to be changed
*
* Determine whether an xattr may be removed from an inode, returning 0 if
* permission is granted, -ve if denied.
*
* If the inode has been found through an idmapped mount the user namespace of
* the vfsmount must be passed through @mnt_userns. This function will then
* take care to map the inode according to @mnt_userns before checking
* permissions. On non-idmapped mounts or if permission checking is to be
* performed on the raw inode simply pass init_user_ns.
*
* This is used to make sure security xattrs don't get removed by those who
* aren't privileged to remove them.
*/
int cap_inode_removexattr(struct dentry *dentry, const char *name)
int cap_inode_removexattr(struct user_namespace *mnt_userns,
struct dentry *dentry, const char *name)
{
struct user_namespace *user_ns = dentry->d_sb->s_user_ns;

@@ -987,8 +1024,7 @@ int cap_inode_removexattr(struct dentry *dentry, const char *name)
struct inode *inode = d_backing_inode(dentry);
if (!inode)
return -EINVAL;
if (!capable_wrt_inode_uidgid(&init_user_ns, inode,
CAP_SETFCAP))
if (!capable_wrt_inode_uidgid(mnt_userns, inode, CAP_SETFCAP))
return -EPERM;
return 0;
}