Merge branch 'for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:
 "The cgroup core saw several significant updates this cycle:

   - percpu_rwsem for threadgroup locking is reinstated.  This was
     temporarily dropped due to down_write latency issues.  Oleg's
     rework of percpu_rwsem which is scheduled to be merged in this
     merge window resolves the issue.

   - On the v2 hierarchy, when controllers are enabled and disabled, all
     operations are atomic and can fail and revert cleanly.  This allows
     ->can_attach() failure which is necessary for cpu RT slices.

   - Tasks now stay associated with the original cgroups after exit
     until released.  This allows tracking resources held by zombies
     (e.g.  pids) and makes it easy to find out where zombies came from
     on the v2 hierarchy.  The pids controller was broken before these
     changes as zombies escaped the limits; unfortunately, updating this
     behavior required too many invasive changes and I don't think it's
     a good idea to backport them, so the pids controller on 4.3, the
     first version which included the pids controller, will stay broken
     at least until I'm sure about the cgroup core changes.

   - Optimization of a couple common tests using static_key"
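The percpu_rwsem point above corresponds to the cgroup_threadgroup_change_begin()/end() helpers that reappear later in the include/linux/cgroup-defs.h hunk. As a rough schematic of the reader/writer split (the demo_* names are invented, initialization via percpu_init_rwsem() is omitted, and this is not the kernel's exact code):

#include <linux/percpu-rwsem.h>

static struct percpu_rw_semaphore demo_threadgroup_rwsem;	/* hypothetical */

/* per-thread paths (fork/exit style) take the cheap per-CPU read side */
static void demo_thread_side(void)
{
	percpu_down_read(&demo_threadgroup_rwsem);
	/* ... threadgroup membership cannot change under us here ... */
	percpu_up_read(&demo_threadgroup_rwsem);
}

/* a cgroup migration takes the write side and excludes all readers at once */
static void demo_migrate_side(void)
{
	percpu_down_write(&demo_threadgroup_rwsem);
	/* ... move the whole threadgroup atomically ... */
	percpu_up_write(&demo_threadgroup_rwsem);
}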

* 'for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (38 commits)
  cgroup: fix race condition around termination check in css_task_iter_next()
  blkcg: don't create "io.stat" on the root cgroup
  cgroup: drop cgroup__DEVEL__legacy_files_on_dfl
  cgroup: replace error handling in cgroup_init() with WARN_ON()s
  cgroup: add cgroup_subsys->free() method and use it to fix pids controller
  cgroup: keep zombies associated with their original cgroups
  cgroup: make css_set_rwsem a spinlock and rename it to css_set_lock
  cgroup: don't hold css_set_rwsem across css task iteration
  cgroup: reorganize css_task_iter functions
  cgroup: factor out css_set_move_task()
  cgroup: keep css_set and task lists in chronological order
  cgroup: make cgroup_destroy_locked() test cgroup_is_populated()
  cgroup: make css_sets pin the associated cgroups
  cgroup: relocate cgroup_[try]get/put()
  cgroup: move check_for_release() invocation
  cgroup: replace cgroup_has_tasks() with cgroup_is_populated()
  cgroup: make cgroup->nr_populated count the number of populated css_sets
  cgroup: remove an unused parameter from cgroup_task_migrate()
  cgroup: fix too early usage of static_branch_disable()
  cgroup: make cgroup_update_dfl_csses() migrate all target processes atomically
  ...
torvalds committed Nov 5, 2015
2 parents 11eaaad + d574567 commit 69234ac
Showing 21 changed files with 952 additions and 771 deletions.
4 changes: 4 additions & 0 deletions Documentation/cgroups/cgroups.txt
@@ -637,6 +637,10 @@ void exit(struct task_struct *task)

Called during task exit.

void free(struct task_struct *task)

Called when the task_struct is freed.

void bind(struct cgroup *root)
(cgroup_mutex held by caller)

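The free() callback documented above is what lets a controller keep a zombie's charges in place and only release them when the task_struct itself goes away, which is how the pids problem described in the merge message gets fixed. A minimal, hypothetical sketch of the pattern (the demo_* names are invented; this is not the real pids controller code):

#include <linux/cgroup.h>
#include <linux/sched.h>

static void demo_uncharge(struct task_struct *task)
{
	/* drop whatever this controller charged against @task's cgroup */
}

static void demo_exit(struct task_struct *task)
{
	/* task is exiting but may linger as a zombie in its cgroup */
}

static void demo_free(struct task_struct *task)
{
	/* the task_struct is finally being freed: only now release the charge */
	demo_uncharge(task);
}

struct cgroup_subsys demo_cgrp_subsys = {
	.exit	= demo_exit,
	.free	= demo_free,
};

A real controller is wired up through include/linux/cgroup_subsys.h and also provides css_alloc()/css_free(); the sketch only shows the two per-task hooks.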
25 changes: 14 additions & 11 deletions Documentation/cgroups/unified-hierarchy.txt
@@ -107,12 +107,6 @@ root of unified hierarchy can be bound to other hierarchies. This
allows mixing unified hierarchy with the traditional multiple
hierarchies in a fully backward compatible way.

- For development purposes, the following boot parameter makes all
- controllers to appear on the unified hierarchy whether supported or
- not.
-
-     cgroup__DEVEL__legacy_files_on_dfl

A controller can be moved across hierarchies only after the controller
is no longer referenced in its current hierarchy. Because per-cgroup
controller states are destroyed asynchronously and controllers may
@@ -341,11 +335,11 @@ is riddled with issues.
unnecessarily complicated and probably done this way because event
delivery itself was expensive.

- Unified hierarchy implements an interface file "cgroup.populated"
- which can be used to monitor whether the cgroup's subhierarchy has
- tasks in it or not. Its value is 0 if there is no task in the cgroup
- and its descendants; otherwise, 1. poll and [id]notify events are
- triggered when the value changes.
+ Unified hierarchy implements "populated" field in "cgroup.events"
+ interface file which can be used to monitor whether the cgroup's
+ subhierarchy has tasks in it or not. Its value is 0 if there is no
+ task in the cgroup and its descendants; otherwise, 1. poll and
+ [id]notify events are triggered when the value changes.

This is significantly lighter and simpler and trivially allows
delegating management of subhierarchy - subhierarchy monitoring can
@@ -374,6 +368,10 @@ supported and the interface files "release_agent" and

- The "cgroup.clone_children" file is removed.

- /proc/PID/cgroup keeps reporting the cgroup that a zombie belonged
to before exiting. If the cgroup is removed before the zombie is
  reaped, " (deleted)" is appended to the path.


5-3. Controller File Conventions

@@ -435,6 +433,11 @@ may be specified in any order and not all pairs have to be specified.
the first entry in the file. Specific entries can use "default" as
its value to indicate inheritance of the default value.

- For events which are not very high frequency, an interface file
"events" should be created which lists event key value pairs.
Whenever a notifiable event happens, file modified event should be
generated on the file.
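
Both the "populated" field described earlier in this file and the "events" key-value convention above are meant to be watched cheaply from user space. A small user-space sketch (assuming a v2 hierarchy mounted at /sys/fs/cgroup, a child cgroup named "mygroup", and the usual kernfs poll behavior of raising POLLPRI when a value changes; error handling is abbreviated):

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[256];
	struct pollfd pfd;
	ssize_t n;

	pfd.fd = open("/sys/fs/cgroup/mygroup/cgroup.events", O_RDONLY);
	if (pfd.fd < 0)
		return 1;
	pfd.events = POLLPRI;

	for (;;) {
		n = pread(pfd.fd, buf, sizeof(buf) - 1, 0);
		if (n < 0)
			return 1;
		buf[n] = '\0';
		fputs(buf, stdout);	/* "key value" lines, e.g. "populated 1" */
		poll(&pfd, 1, -1);	/* returns when a value changes */
	}
}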


5-4. Per-Controller Changes

1 change: 1 addition & 0 deletions block/blk-cgroup.c
@@ -899,6 +899,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
struct cftype blkcg_files[] = {
{
.name = "stat",
.flags = CFTYPE_NOT_ON_ROOT,
.seq_show = blkcg_print_stat,
},
{ } /* terminate */
2 changes: 1 addition & 1 deletion block/blk-throttle.c
@@ -369,7 +369,7 @@ static void throtl_pd_init(struct blkg_policy_data *pd)
* regardless of the position of the group in the hierarchy.
*/
sq->parent_sq = &td->service_queue;
-	if (cgroup_on_dfl(blkg->blkcg->css.cgroup) && blkg->parent)
+	if (cgroup_subsys_on_dfl(io_cgrp_subsys) && blkg->parent)
sq->parent_sq = &blkg_to_tg(blkg->parent)->service_queue;
tg->td = td;
}
4 changes: 2 additions & 2 deletions block/cfq-iosched.c
@@ -1581,7 +1581,7 @@ static struct blkcg_policy_data *cfq_cpd_alloc(gfp_t gfp)
static void cfq_cpd_init(struct blkcg_policy_data *cpd)
{
struct cfq_group_data *cgd = cpd_to_cfqgd(cpd);
-	unsigned int weight = cgroup_on_dfl(blkcg_root.css.cgroup) ?
+	unsigned int weight = cgroup_subsys_on_dfl(io_cgrp_subsys) ?
CGROUP_WEIGHT_DFL : CFQ_WEIGHT_LEGACY_DFL;

if (cpd_to_blkcg(cpd) == &blkcg_root)
@@ -1599,7 +1599,7 @@ static void cfq_cpd_free(struct blkcg_policy_data *cpd)
static void cfq_cpd_bind(struct blkcg_policy_data *cpd)
{
struct blkcg *blkcg = cpd_to_blkcg(cpd);
-	bool on_dfl = cgroup_on_dfl(blkcg_root.css.cgroup);
+	bool on_dfl = cgroup_subsys_on_dfl(io_cgrp_subsys);
unsigned int weight = on_dfl ? CGROUP_WEIGHT_DFL : CFQ_WEIGHT_LEGACY_DFL;

if (blkcg == &blkcg_root)
5 changes: 2 additions & 3 deletions include/linux/backing-dev.h
@@ -13,7 +13,6 @@
#include <linux/sched.h>
#include <linux/blkdev.h>
#include <linux/writeback.h>
- #include <linux/memcontrol.h>
#include <linux/blk-cgroup.h>
#include <linux/backing-dev-defs.h>
#include <linux/slab.h>
@@ -267,8 +266,8 @@ static inline bool inode_cgwb_enabled(struct inode *inode)
{
struct backing_dev_info *bdi = inode_to_bdi(inode);

-	return cgroup_on_dfl(mem_cgroup_root_css->cgroup) &&
-		cgroup_on_dfl(blkcg_root_css->cgroup) &&
+	return cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
+		cgroup_subsys_on_dfl(io_cgrp_subsys) &&
bdi_cap_account_dirty(bdi) &&
(bdi->capabilities & BDI_CAP_CGROUP_WRITEBACK) &&
(inode->i_sb->s_iflags & SB_I_CGROUPWB);
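The conversions in the last three files all replace a per-cgroup cgroup_on_dfl(...css.cgroup) check with cgroup_subsys_on_dfl(<subsys>), one of the "common tests using static_key" from the merge message: each subsystem gets a static key that is flipped when the subsystem is bound to or unbound from the default hierarchy, so the common check becomes a patched branch instead of a pointer chase. An illustrative reduction (the demo_* key is invented; the real per-subsystem keys and macros live in include/linux/cgroup.h):

#include <linux/jump_label.h>

static DEFINE_STATIC_KEY_TRUE(demo_on_dfl_key);	/* hypothetical key */

static inline bool demo_subsys_on_dfl(void)
{
	return static_branch_likely(&demo_on_dfl_key);
}

/* flipped by the core when the subsystem leaves the default hierarchy */
static void demo_left_default_hierarchy(void)
{
	static_branch_disable(&demo_on_dfl_key);
}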
76 changes: 59 additions & 17 deletions include/linux/cgroup-defs.h
@@ -76,12 +76,24 @@ enum {
CFTYPE_ONLY_ON_ROOT = (1 << 0), /* only create on root cgrp */
CFTYPE_NOT_ON_ROOT = (1 << 1), /* don't create on root cgrp */
CFTYPE_NO_PREFIX = (1 << 3), /* (DON'T USE FOR NEW FILES) no subsys prefix */
CFTYPE_WORLD_WRITABLE = (1 << 4), /* (DON'T USE FOR NEW FILES) S_IWUGO */

/* internal flags, do not use outside cgroup core proper */
__CFTYPE_ONLY_ON_DFL = (1 << 16), /* only on default hierarchy */
__CFTYPE_NOT_ON_DFL = (1 << 17), /* not on default hierarchy */
};

/*
* cgroup_file is the handle for a file instance created in a cgroup which
* is used, for example, to generate file changed notifications. This can
* be obtained by setting cftype->file_offset.
*/
struct cgroup_file {
/* do not access any fields from outside cgroup core */
struct list_head node; /* anchored at css->files */
struct kernfs_node *kn;
};

/*
* Per-subsystem/per-cgroup state maintained by the system. This is the
* fundamental structural building block that controllers deal with.
@@ -122,6 +134,9 @@ struct cgroup_subsys_state {
*/
u64 serial_nr;

/* all cgroup_files associated with this css */
struct list_head files;

/* percpu_ref killing and RCU release */
struct rcu_head rcu_head;
struct work_struct destroy_work;
@@ -196,6 +211,9 @@ struct css_set {
*/
struct list_head e_cset_node[CGROUP_SUBSYS_COUNT];

/* all css_task_iters currently walking this cset */
struct list_head task_iters;

/* For RCU-protected deletion */
struct rcu_head rcu_head;
};
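
The new task_iters list lets the core fix up any iterator that is in the middle of walking a css_set when tasks move or css_sets are reorganized (several commits in the shortlog touch this machinery). For orientation, an in-kernel walker uses the css_task_iter API roughly like this (sketch only; demo_visit() is a placeholder and the required locking context is glossed over):

#include <linux/cgroup.h>
#include <linux/sched.h>

static void demo_visit(struct task_struct *task)
{
	/* per-task work goes here */
}

static void demo_walk_css_tasks(struct cgroup_subsys_state *css)
{
	struct css_task_iter it;
	struct task_struct *task;

	css_task_iter_start(css, &it);
	while ((task = css_task_iter_next(&it)))
		demo_visit(task);
	css_task_iter_end(&it);
}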
@@ -217,16 +235,16 @@ struct cgroup {
int id;

/*
-	 * If this cgroup contains any tasks, it contributes one to
-	 * populated_cnt. All children with non-zero popuplated_cnt of
-	 * their own contribute one. The count is zero iff there's no task
-	 * in this cgroup or its subtree.
+	 * Each non-empty css_set associated with this cgroup contributes
+	 * one to populated_cnt. All children with non-zero popuplated_cnt
+	 * of their own contribute one. The count is zero iff there's no
+	 * task in this cgroup or its subtree.
*/
int populated_cnt;

struct kernfs_node *kn; /* cgroup kernfs entry */
-	struct kernfs_node *procs_kn; /* kn for "cgroup.procs" */
-	struct kernfs_node *populated_kn; /* kn for "cgroup.subtree_populated" */
+	struct cgroup_file procs_file; /* handle for "cgroup.procs" */
+	struct cgroup_file events_file; /* handle for "cgroup.events" */

/*
* The bitmask of subsystems enabled on the child cgroups.
@@ -324,11 +342,6 @@ struct cftype {
*/
char name[MAX_CFTYPE_NAME];
unsigned long private;
-	/*
-	 * If not 0, file mode is set to this value, otherwise it will
-	 * be figured out automatically
-	 */
-	umode_t mode;

/*
* The maximum length of string, excluding trailing nul, that can
@@ -339,6 +352,14 @@
/* CFTYPE_* flags */
unsigned int flags;

/*
* If non-zero, should contain the offset from the start of css to
* a struct cgroup_file field. cgroup will record the handle of
* the created file into it. The recorded handle can be used as
* long as the containing css remains accessible.
*/
unsigned int file_offset;

/*
* Fields used for internal bookkeeping. Initialized automatically
* during registration.
@@ -414,12 +435,10 @@ struct cgroup_subsys {
int (*can_fork)(struct task_struct *task, void **priv_p);
void (*cancel_fork)(struct task_struct *task, void *priv);
void (*fork)(struct task_struct *task, void *priv);
-	void (*exit)(struct cgroup_subsys_state *css,
-		     struct cgroup_subsys_state *old_css,
-		     struct task_struct *task);
+	void (*exit)(struct task_struct *task);
+	void (*free)(struct task_struct *task);
void (*bind)(struct cgroup_subsys_state *root_css);

-	int disabled;
int early_init;

/*
@@ -473,8 +492,31 @@ struct cgroup_subsys {
unsigned int depends_on;
};

- void cgroup_threadgroup_change_begin(struct task_struct *tsk);
- void cgroup_threadgroup_change_end(struct task_struct *tsk);
+ extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;

/**
* cgroup_threadgroup_change_begin - threadgroup exclusion for cgroups
* @tsk: target task
*
* Called from threadgroup_change_begin() and allows cgroup operations to
* synchronize against threadgroup changes using a percpu_rw_semaphore.
*/
static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk)
{
percpu_down_read(&cgroup_threadgroup_rwsem);
}

/**
* cgroup_threadgroup_change_end - threadgroup exclusion for cgroups
* @tsk: target task
*
* Called from threadgroup_change_end(). Counterpart of
* cgroup_threadgroup_change_begin().
*/
static inline void cgroup_threadgroup_change_end(struct task_struct *tsk)
{
percpu_up_read(&cgroup_threadgroup_rwsem);
}

#else /* CONFIG_CGROUPS */

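The struct cgroup_file handle and the new cftype->file_offset field shown earlier in this file replace ad-hoc kernfs_node pointers; cgroup itself now uses them for "cgroup.procs" and "cgroup.events" in struct cgroup. A controller-side sketch of the pattern (all demo_* names are invented; cgroup_file_notify() is assumed to be the core-provided wrapper for poking the recorded handle, as in the released 4.4 headers, rather than something shown in this diff):

#include <linux/cgroup.h>
#include <linux/seq_file.h>
#include <linux/stddef.h>

/* hypothetical per-cgroup controller state embedding the handle */
struct demo_css {
	struct cgroup_subsys_state css;
	struct cgroup_file events_file;		/* filled in via .file_offset */
};

static int demo_events_show(struct seq_file *sf, void *v)
{
	seq_printf(sf, "overrun %d\n", 0);	/* "key value" lines, per the convention */
	return 0;
}

static struct cftype demo_files[] = {
	{
		.name		= "events",
		.file_offset	= offsetof(struct demo_css, events_file),
		.seq_show	= demo_events_show,
	},
	{ }	/* terminate */
};

/* called by the controller whenever a notifiable event happens */
static void demo_event_happened(struct demo_css *dcss)
{
	cgroup_file_notify(&dcss->events_file);	/* assumption: 4.4 helper */
}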