cgroup: add cpu.stat file to root cgroup
Currently, the root cgroup does not have a cpu.stat file. Add one which
is consistent with /proc/stat to capture global cpu statistics that
might not fall under cgroup accounting.

We haven't done this in the past because the data are already presented
in /proc/stat and we didn't want to add overhead from collecting root
cgroup stats when cgroups are configured, but no cgroups have been
created.

By keeping the data consistent with /proc/stat, I think we avoid the
first problem, while improving the usability of cgroups stats.
We avoid the second problem by computing the contents of cpu.stat from
existing data collected for /proc/stat anyway.

Signed-off-by: Boris Burkov <[email protected]>
Suggested-by: Tejun Heo <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
boryas authored and htejun committed May 28, 2020
1 parent 6b6ebb3 commit 936f2a7
Showing 3 changed files with 54 additions and 13 deletions.
6 changes: 2 additions & 4 deletions Documentation/admin-guide/cgroup-v2.rst
@@ -714,9 +714,7 @@ Conventions
 - Settings for a single feature should be contained in a single file.

 - The root cgroup should be exempt from resource control and thus
-  shouldn't have resource control interface files. Also,
-  informational files on the root cgroup which end up showing global
-  information available elsewhere shouldn't exist.
+  shouldn't have resource control interface files.

 - The default time unit is microseconds. If a different unit is ever
   used, an explicit unit suffix must be present.
@@ -985,7 +983,7 @@ CPU Interface Files
 All time durations are in microseconds.

   cpu.stat
-	A read-only flat-keyed file which exists on non-root cgroups.
+	A read-only flat-keyed file.
 	This file exists whether the controller is enabled or not.

 	It always reports the following three stats:
1 change: 0 additions & 1 deletion kernel/cgroup/cgroup.c
@@ -4874,7 +4874,6 @@ static struct cftype cgroup_base_files[] = {
 	},
 	{
 		.name = "cpu.stat",
-		.flags = CFTYPE_NOT_ON_ROOT,
 		.seq_show = cpu_stat_show,
 	},
 #ifdef CONFIG_PSI
60 changes: 52 additions & 8 deletions kernel/cgroup/rstat.c
@@ -389,18 +389,62 @@ void __cgroup_account_cputime_field(struct cgroup *cgrp,
 	cgroup_base_stat_cputime_account_end(cgrp, rstatc);
 }

+/*
+ * compute the cputime for the root cgroup by getting the per cpu data
+ * at a global level, then categorizing the fields in a manner consistent
+ * with how it is done by __cgroup_account_cputime_field for each bit of
+ * cpu time attributed to a cgroup.
+ */
+static void root_cgroup_cputime(struct task_cputime *cputime)
+{
+	int i;
+
+	cputime->stime = 0;
+	cputime->utime = 0;
+	cputime->sum_exec_runtime = 0;
+	for_each_possible_cpu(i) {
+		struct kernel_cpustat kcpustat;
+		u64 *cpustat = kcpustat.cpustat;
+		u64 user = 0;
+		u64 sys = 0;
+
+		kcpustat_cpu_fetch(&kcpustat, i);
+
+		user += cpustat[CPUTIME_USER];
+		user += cpustat[CPUTIME_NICE];
+		cputime->utime += user;
+
+		sys += cpustat[CPUTIME_SYSTEM];
+		sys += cpustat[CPUTIME_IRQ];
+		sys += cpustat[CPUTIME_SOFTIRQ];
+		cputime->stime += sys;
+
+		cputime->sum_exec_runtime += user;
+		cputime->sum_exec_runtime += sys;
+		cputime->sum_exec_runtime += cpustat[CPUTIME_STEAL];
+		cputime->sum_exec_runtime += cpustat[CPUTIME_GUEST];
+		cputime->sum_exec_runtime += cpustat[CPUTIME_GUEST_NICE];
+	}
+}
+
 void cgroup_base_stat_cputime_show(struct seq_file *seq)
 {
 	struct cgroup *cgrp = seq_css(seq)->cgroup;
 	u64 usage, utime, stime;
-
-	if (!cgroup_parent(cgrp))
-		return;
-
-	cgroup_rstat_flush_hold(cgrp);
-	usage = cgrp->bstat.cputime.sum_exec_runtime;
-	cputime_adjust(&cgrp->bstat.cputime, &cgrp->prev_cputime, &utime, &stime);
-	cgroup_rstat_flush_release();
+	struct task_cputime cputime;
+
+	if (cgroup_parent(cgrp)) {
+		cgroup_rstat_flush_hold(cgrp);
+		usage = cgrp->bstat.cputime.sum_exec_runtime;
+		cputime_adjust(&cgrp->bstat.cputime, &cgrp->prev_cputime,
+			       &utime, &stime);
+		cgroup_rstat_flush_release();
+	} else {
+		root_cgroup_cputime(&cputime);
+		usage = cputime.sum_exec_runtime;
+		utime = cputime.utime;
+		stime = cputime.stime;
+	}

 	do_div(usage, NSEC_PER_USEC);
 	do_div(utime, NSEC_PER_USEC);
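
The categorization done by root_cgroup_cputime() can be sketched outside the kernel. The following is only an illustration under stated assumptions, not the kernel implementation: the enum, struct cpu_sample, struct root_times, and sum_root_times() are hypothetical names introduced here, and the per-CPU samples are plain arrays standing in for what kcpustat_cpu_fetch() would return. It mirrors the grouping in the patch: user and nice count as user time, system, irq, and softirq count as system time, and steal, guest, and guest_nice are added only to the total.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's CPUTIME_* indices. */
enum cpu_field {
	F_USER, F_NICE, F_SYSTEM, F_SOFTIRQ, F_IRQ,
	F_STEAL, F_GUEST, F_GUEST_NICE, F_COUNT
};

/* One per-CPU snapshot; stands in for struct kernel_cpustat. */
struct cpu_sample {
	uint64_t v[F_COUNT];
};

/* Aggregated result; stands in for struct task_cputime. */
struct root_times {
	uint64_t utime;
	uint64_t stime;
	uint64_t sum_exec_runtime;
};

/* Sum per-CPU samples using the same grouping as the patch. */
static void sum_root_times(const struct cpu_sample *cpus, size_t ncpus,
			   struct root_times *out)
{
	size_t i;

	assert(out != NULL);
	out->utime = 0;
	out->stime = 0;
	out->sum_exec_runtime = 0;

	for (i = 0; i < ncpus; i++) {
		const uint64_t *s = cpus[i].v;
		uint64_t user = s[F_USER] + s[F_NICE];
		uint64_t sys = s[F_SYSTEM] + s[F_IRQ] + s[F_SOFTIRQ];

		out->utime += user;
		out->stime += sys;
		/* steal and guest time count toward usage only */
		out->sum_exec_runtime += user + sys + s[F_STEAL] +
					 s[F_GUEST] + s[F_GUEST_NICE];
	}
}
```

A caller would fill an array of struct cpu_sample, one entry per CPU, and pass it with its length. The kernel code then divides each value by NSEC_PER_USEC before printing, which this sketch leaves out.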