Merge branch 'for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup changes from Tejun Heo:

 - Waiman made the debug controller work and a lot more useful on
   cgroup2

 - There were a couple of issues with cgroup subtree delegation. The
   documentation on delegating to a non-root user was missing some
   parts and cgroup namespace support wasn't factoring in delegation
   at all. The documentation is updated and there is now a mount
   option to make cgroup namespaces fit for delegation

* 'for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: implement "nsdelegate" mount option
  cgroup: restructure cgroup_procs_write_permission()
  cgroup: "cgroup.subtree_control" should be writeable by delegatee
  cgroup: fix lockdep warning in debug controller
  cgroup: refactor cgroup_masks_read() in the debug controller
  cgroup: make debug an implicit controller on cgroup2
  cgroup: Make debug cgroup support v2 and thread mode
  cgroup: Make Kconfig prompt of debug cgroup more accurate
  cgroup: Move debug cgroup to its own file
  cgroup: Keep accurate count of tasks in each css_set
torvalds committed Jul 6, 2017
2 parents 109a5db + 5136f63 commit 9ced560
Showing 8 changed files with 548 additions and 201 deletions.
60 changes: 43 additions & 17 deletions Documentation/cgroup-v2.txt
@@ -149,6 +149,16 @@ during boot, before manual intervention is possible. To make testing
 and experimenting easier, the kernel parameter cgroup_no_v1= allows
 disabling controllers in v1 and make them always available in v2.

+cgroup v2 currently supports the following mount options.
+
+  nsdelegate
+
+	Consider cgroup namespaces as delegation boundaries. This
+	option is system wide and can only be set on mount or modified
+	through remount from the init namespace. The mount option is
+	ignored on non-init namespace mounts. Please refer to the
+	Delegation section for details.
+

 2-2. Organizing Processes

@@ -308,18 +318,27 @@ file.

 2-5-1. Model of Delegation

-A cgroup can be delegated to a less privileged user by granting write
-access of the directory and its "cgroup.procs" file to the user. Note
-that resource control interface files in a given directory control the
-distribution of the parent's resources and thus must not be delegated
-along with the directory.
-
-Once delegated, the user can build sub-hierarchy under the directory,
-organize processes as it sees fit and further distribute the resources
-it received from the parent. The limits and other settings of all
-resource controllers are hierarchical and regardless of what happens
-in the delegated sub-hierarchy, nothing can escape the resource
-restrictions imposed by the parent.
+A cgroup can be delegated in two ways. First, to a less privileged
+user by granting write access of the directory and its "cgroup.procs"
+and "cgroup.subtree_control" files to the user. Second, if the
+"nsdelegate" mount option is set, automatically to a cgroup namespace
+on namespace creation.
+
+Because the resource control interface files in a given directory
+control the distribution of the parent's resources, the delegatee
+shouldn't be allowed to write to them. For the first method, this is
+achieved by not granting access to these files. For the second, the
+kernel rejects writes to all files other than "cgroup.procs" and
+"cgroup.subtree_control" on a namespace root from inside the
+namespace.
+
+The end results are equivalent for both delegation types. Once
+delegated, the user can build sub-hierarchy under the directory,
+organize processes inside it as it sees fit and further distribute the
+resources it received from the parent. The limits and other settings
+of all resource controllers are hierarchical and regardless of what
+happens in the delegated sub-hierarchy, nothing can escape the
+resource restrictions imposed by the parent.

 Currently, cgroup doesn't impose any restrictions on the number of
 cgroups in or nesting depth of a delegated sub-hierarchy; however,
@@ -329,10 +348,12 @@ this may be limited explicitly in the future.
 2-5-2. Delegation Containment

 A delegated sub-hierarchy is contained in the sense that processes
-can't be moved into or out of the sub-hierarchy by the delegatee. For
-a process with a non-root euid to migrate a target process into a
-cgroup by writing its PID to the "cgroup.procs" file, the following
-conditions must be met.
+can't be moved into or out of the sub-hierarchy by the delegatee.
+
+For delegations to a less privileged user, this is achieved by
+requiring the following conditions for a process with a non-root euid
+to migrate a target process into a cgroup by writing its PID to the
+"cgroup.procs" file.

 - The writer must have write access to the "cgroup.procs" file.

@@ -359,6 +380,11 @@ destination cgroup C00 is above the points of delegation and U0 would
 not have write access to its "cgroup.procs" files and thus the write
 will be denied with -EACCES.

+For delegations to namespaces, containment is achieved by requiring
+that both the source and destination cgroups are reachable from the
+namespace of the process which is attempting the migration. If either
+is not reachable, the migration is rejected with -ENOENT.
+

 2-6. Guidelines

@@ -1413,7 +1439,7 @@ D. Deprecated v1 Core Features

 - Multiple hierarchies including named ones are not supported.

-- All mount options and remounting are not supported.
+- All v1 mount options are not supported.

 - The "tasks" file is removed and "cgroup.procs" is not sorted.

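The namespace containment rule added to section 2-5-2 above can be illustrated with a small userspace model. This is only a sketch: `struct cg_node`, `cg_is_descendant()` and `ns_migration_check()` are hypothetical names invented for this example, and they stand in for the kernel's own data structures, which this part of the diff does not show. The model assumes "reachable from the namespace" means "is the namespace root or lies below it".

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a cgroup; not the kernel's struct cgroup. */
struct cg_node {
	const char *name;
	const struct cg_node *parent;	/* NULL for the hierarchy root */
};

/* Return true if @cg is @ancestor itself or lies anywhere below it. */
static bool cg_is_descendant(const struct cg_node *cg,
			     const struct cg_node *ancestor)
{
	for (; cg; cg = cg->parent)
		if (cg == ancestor)
			return true;
	return false;
}

/*
 * Model of the documented rule: both the source and the destination
 * cgroup must be reachable from the migrating process's namespace
 * root, otherwise the migration fails with -ENOENT.
 */
static int ns_migration_check(const struct cg_node *ns_root,
			      const struct cg_node *src,
			      const struct cg_node *dst)
{
	if (!cg_is_descendant(src, ns_root) || !cg_is_descendant(dst, ns_root))
		return -ENOENT;
	return 0;
}
```

With a namespace rooted at cgroup A, moving a process between two children of A passes the check, while any migration whose source or destination is a sibling of A or the hierarchy root is refused.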
12 changes: 12 additions & 0 deletions include/linux/cgroup-defs.h
@@ -67,12 +67,21 @@ enum {
 enum {
 	CGRP_ROOT_NOPREFIX = (1 << 1), /* mounted subsystems have no named prefix */
 	CGRP_ROOT_XATTR = (1 << 2), /* supports extended attributes */
+
+	/*
+	 * Consider namespaces as delegation boundaries. If this flag is
+	 * set, controller specific interface files in a namespace root
+	 * aren't writeable from inside the namespace.
+	 */
+	CGRP_ROOT_NS_DELEGATE = (1 << 3),
 };

 /* cftype->flags */
 enum {
 	CFTYPE_ONLY_ON_ROOT = (1 << 0), /* only create on root cgrp */
 	CFTYPE_NOT_ON_ROOT = (1 << 1), /* don't create on root cgrp */
+	CFTYPE_NS_DELEGATABLE = (1 << 2), /* writeable beyond delegation boundaries */
+
 	CFTYPE_NO_PREFIX = (1 << 3), /* (DON'T USE FOR NEW FILES) no subsys prefix */
 	CFTYPE_WORLD_WRITABLE = (1 << 4), /* (DON'T USE FOR NEW FILES) S_IWUGO */

@@ -166,6 +175,9 @@ struct css_set {
 	/* the default cgroup associated with this css_set */
 	struct cgroup *dfl_cgrp;

+	/* internal task count, protected by css_set_lock */
+	int nr_tasks;
+
 	/*
 	 * Lists running through all tasks using this cgroup group.
 	 * mg_tasks lists tasks which belong to this cset but are in the
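How the two new flags in this header might combine can be sketched in userspace C. The flag values are copied from the hunk above; everything else is an assumption of this example. In particular, `ns_delegate_write_check()` is a hypothetical helper, its boolean parameters stand in for state the kernel would derive itself, and the `-EPERM` return value is assumed, since the code that performs the real check is not part of the diff shown here.

```c
#include <errno.h>
#include <stdbool.h>

/* Flag values as they appear in the cgroup-defs.h hunk above. */
enum { CGRP_ROOT_NS_DELEGATE = (1 << 3) };
enum { CFTYPE_NS_DELEGATABLE = (1 << 2) };

/*
 * Hypothetical model of the documented behavior: with "nsdelegate"
 * active, a writer inside a non-init cgroup namespace may write only
 * to delegatable files of the cgroup that is its namespace root.
 * The -EPERM error code is an assumption of this sketch.
 */
static int ns_delegate_write_check(unsigned int root_flags,
				   unsigned int cft_flags,
				   bool writer_in_init_ns,
				   bool cgrp_is_ns_root)
{
	if ((root_flags & CGRP_ROOT_NS_DELEGATE) &&
	    !(cft_flags & CFTYPE_NS_DELEGATABLE) &&
	    !writer_in_init_ns && cgrp_is_ns_root)
		return -EPERM;
	return 0;
}
```

Under this model, a controller interface file on a namespace root is refused from inside the namespace, while files flagged CFTYPE_NS_DELEGATABLE, which per the documentation change would be "cgroup.procs" and "cgroup.subtree_control", remain writable.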
7 changes: 5 additions & 2 deletions init/Kconfig
@@ -859,11 +859,14 @@ config CGROUP_BPF
 	  inet sockets.

 config CGROUP_DEBUG
-	bool "Example controller"
+	bool "Debug controller"
 	default n
+	depends on DEBUG_KERNEL
 	help
 	  This option enables a simple controller that exports
-	  debugging information about the cgroups framework.
+	  debugging information about the cgroups framework. This
+	  controller is for control cgroup debugging only. Its
+	  interfaces are not stable.

 	  Say N.

1 change: 1 addition & 0 deletions kernel/cgroup/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_CGROUP_FREEZER) += freezer.o
 obj-$(CONFIG_CGROUP_PIDS) += pids.o
 obj-$(CONFIG_CGROUP_RDMA) += rdma.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
+obj-$(CONFIG_CGROUP_DEBUG) += debug.o
2 changes: 2 additions & 0 deletions kernel/cgroup/cgroup-internal.h
@@ -192,6 +192,8 @@ int cgroup_rmdir(struct kernfs_node *kn);
 int cgroup_show_path(struct seq_file *sf, struct kernfs_node *kf_node,
 		     struct kernfs_root *kf_root);

+int cgroup_task_count(const struct cgroup *cgrp);
+
 /*
  * namespace.c
  */
155 changes: 2 additions & 153 deletions kernel/cgroup/cgroup-v1.c
@@ -334,19 +334,15 @@ static struct cgroup_pidlist *cgroup_pidlist_find_create(struct cgroup *cgrp,
 /**
  * cgroup_task_count - count the number of tasks in a cgroup.
  * @cgrp: the cgroup in question
- *
- * Return the number of tasks in the cgroup. The returned number can be
- * higher than the actual number of tasks due to css_set references from
- * namespace roots and temporary usages.
  */
-static int cgroup_task_count(const struct cgroup *cgrp)
+int cgroup_task_count(const struct cgroup *cgrp)
 {
 	int count = 0;
 	struct cgrp_cset_link *link;

 	spin_lock_irq(&css_set_lock);
 	list_for_each_entry(link, &cgrp->cset_links, cset_link)
-		count += refcount_read(&link->cset->refcount);
+		count += link->cset->nr_tasks;
 	spin_unlock_irq(&css_set_lock);
 	return count;
 }
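The reason for switching from the reference count to `nr_tasks` can be shown with a toy userspace model. This is a sketch under assumptions: `struct cset_model` and its helpers are hypothetical and only mimic the idea that a css_set's refcount includes non-task references (namespace roots, temporary users, as the removed comment notes), whereas a dedicated counter tracks tasks only.

```c
#include <stddef.h>

/* Hypothetical stand-in for a css_set; not the kernel structure. */
struct cset_model {
	int refcount;	/* every reference: tasks, namespace roots, temporary users */
	int nr_tasks;	/* tasks only */
};

/* A task joining the set takes a reference and bumps the task count. */
static void cset_add_task(struct cset_model *cs)
{
	cs->refcount++;
	cs->nr_tasks++;
}

/* A non-task user (for example a namespace root) only takes a reference. */
static void cset_get(struct cset_model *cs)
{
	cs->refcount++;
}

/* Old approach: sum the reference counts, which may overcount. */
static int count_by_refcount(const struct cset_model *sets, size_t n)
{
	int count = 0;

	for (size_t i = 0; i < n; i++)
		count += sets[i].refcount;
	return count;
}

/* New approach: sum the dedicated task counters. */
static int count_by_nr_tasks(const struct cset_model *sets, size_t n)
{
	int count = 0;

	for (size_t i = 0; i < n; i++)
		count += sets[i].nr_tasks;
	return count;
}
```

With three tasks spread over two sets and one extra non-task reference, the refcount-based sum reports four while the `nr_tasks`-based sum reports the actual three.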
@@ -1263,150 +1259,3 @@ static int __init cgroup_no_v1(char *str)
 	return 1;
 }
 __setup("cgroup_no_v1=", cgroup_no_v1);
-
-
-#ifdef CONFIG_CGROUP_DEBUG
-static struct cgroup_subsys_state *
-debug_css_alloc(struct cgroup_subsys_state *parent_css)
-{
-	struct cgroup_subsys_state *css = kzalloc(sizeof(*css), GFP_KERNEL);
-
-	if (!css)
-		return ERR_PTR(-ENOMEM);
-
-	return css;
-}
-
-static void debug_css_free(struct cgroup_subsys_state *css)
-{
-	kfree(css);
-}
-
-static u64 debug_taskcount_read(struct cgroup_subsys_state *css,
-				struct cftype *cft)
-{
-	return cgroup_task_count(css->cgroup);
-}
-
-static u64 current_css_set_read(struct cgroup_subsys_state *css,
-				struct cftype *cft)
-{
-	return (u64)(unsigned long)current->cgroups;
-}
-
-static u64 current_css_set_refcount_read(struct cgroup_subsys_state *css,
-					 struct cftype *cft)
-{
-	u64 count;
-
-	rcu_read_lock();
-	count = refcount_read(&task_css_set(current)->refcount);
-	rcu_read_unlock();
-	return count;
-}
-
-static int current_css_set_cg_links_read(struct seq_file *seq, void *v)
-{
-	struct cgrp_cset_link *link;
-	struct css_set *cset;
-	char *name_buf;
-
-	name_buf = kmalloc(NAME_MAX + 1, GFP_KERNEL);
-	if (!name_buf)
-		return -ENOMEM;
-
-	spin_lock_irq(&css_set_lock);
-	rcu_read_lock();
-	cset = rcu_dereference(current->cgroups);
-	list_for_each_entry(link, &cset->cgrp_links, cgrp_link) {
-		struct cgroup *c = link->cgrp;
-
-		cgroup_name(c, name_buf, NAME_MAX + 1);
-		seq_printf(seq, "Root %d group %s\n",
-			   c->root->hierarchy_id, name_buf);
-	}
-	rcu_read_unlock();
-	spin_unlock_irq(&css_set_lock);
-	kfree(name_buf);
-	return 0;
-}
-
-#define MAX_TASKS_SHOWN_PER_CSS 25
-static int cgroup_css_links_read(struct seq_file *seq, void *v)
-{
-	struct cgroup_subsys_state *css = seq_css(seq);
-	struct cgrp_cset_link *link;
-
-	spin_lock_irq(&css_set_lock);
-	list_for_each_entry(link, &css->cgroup->cset_links, cset_link) {
-		struct css_set *cset = link->cset;
-		struct task_struct *task;
-		int count = 0;
-
-		seq_printf(seq, "css_set %pK\n", cset);
-
-		list_for_each_entry(task, &cset->tasks, cg_list) {
-			if (count++ > MAX_TASKS_SHOWN_PER_CSS)
-				goto overflow;
-			seq_printf(seq, " task %d\n", task_pid_vnr(task));
-		}
-
-		list_for_each_entry(task, &cset->mg_tasks, cg_list) {
-			if (count++ > MAX_TASKS_SHOWN_PER_CSS)
-				goto overflow;
-			seq_printf(seq, " task %d\n", task_pid_vnr(task));
-		}
-		continue;
-overflow:
-		seq_puts(seq, " ...\n");
-	}
-	spin_unlock_irq(&css_set_lock);
-	return 0;
-}
-
-static u64 releasable_read(struct cgroup_subsys_state *css, struct cftype *cft)
-{
-	return (!cgroup_is_populated(css->cgroup) &&
-		!css_has_online_children(&css->cgroup->self));
-}
-
-static struct cftype debug_files[] = {
-	{
-		.name = "taskcount",
-		.read_u64 = debug_taskcount_read,
-	},
-
-	{
-		.name = "current_css_set",
-		.read_u64 = current_css_set_read,
-	},
-
-	{
-		.name = "current_css_set_refcount",
-		.read_u64 = current_css_set_refcount_read,
-	},
-
-	{
-		.name = "current_css_set_cg_links",
-		.seq_show = current_css_set_cg_links_read,
-	},
-
-	{
-		.name = "cgroup_css_links",
-		.seq_show = cgroup_css_links_read,
-	},
-
-	{
-		.name = "releasable",
-		.read_u64 = releasable_read,
-	},
-
-	{ } /* terminate */
-};
-
-struct cgroup_subsys debug_cgrp_subsys = {
-	.css_alloc = debug_css_alloc,
-	.css_free = debug_css_free,
-	.legacy_cftypes = debug_files,
-};
-#endif /* CONFIG_CGROUP_DEBUG */