pm24.git/kernel/cgroup.c, branch v3.7-rc3

Revert "cgroup: Remove task_lock() from cgroup_post_fork()"

2012-10-19T21:09:35Z

This reverts commit 7e3aa30ac8c904a706518b725c451bb486daaae9. The commit incorrectly assumed that fork path always performed threadgroup_change_begin/end() and depended on that for synchronization against task exit and cgroup migration paths instead of explicitly grabbing task_lock(). threadgroup_change is not locked when forking a new process (as opposed to a new thread in the same process) and even if it were it wouldn't be effective as different processes use different threadgroup locks. Revert the incorrect optimization. Signed-off-by: Tejun Heo LKML-Reference: <20121008020000.GB2575@localhost> Acked-by: Li Zefan Cc: Frederic Weisbecker Cc: stable@vger.kernel.org

Revert "cgroup: Drop task_lock(parent) on cgroup_fork()"

2012-10-19T21:08:49Z

This reverts commit 7e381b0eb1e1a9805c37335562e8dc02e7d7848c. The commit incorrectly assumed that fork path always performed threadgroup_change_begin/end() and depended on that for synchronization against task exit and cgroup migration paths instead of explicitly grabbing task_lock(). threadgroup_change is not locked when forking a new process (as opposed to a new thread in the same process) and even if it were it wouldn't be effective as different processes use different threadgroup locks. Revert the incorrect optimization. Signed-off-by: Tejun Heo LKML-Reference: <20121008020000.GB2575@localhost> Acked-by: Li Zefan Bitterly-Acked-by: Frederic Weisbecker Cc: stable@vger.kernel.org

cgroup: notify_on_release may not be triggered in some cases

2012-10-17T00:09:36Z

notify_on_release must be triggered when the last process in a cgroup is move to another. But if the first(and only) process in a cgroup is moved to another, notify_on_release is not triggered. # mkdir /cgroup/cpu/SRC # mkdir /cgroup/cpu/DST # # echo 1 >/cgroup/cpu/SRC/notify_on_release # echo 1 >/cgroup/cpu/DST/notify_on_release # # sleep 300 & [1] 8629 # # echo 8629 >/cgroup/cpu/SRC/tasks # echo 8629 >/cgroup/cpu/DST/tasks -> notify_on_release for /SRC must be triggered at this point, but it isn't. This is because put_css_set() is called before setting CGRP_RELEASABLE in cgroup_task_migrate(), and is a regression introduce by the commit:74a1166d(cgroups: make procs file writable), which was merged into v3.0. Cc: Ben Blum Cc: # v3.0.x and later Acked-by: Li Zefan Signed-off-by: Daisuke Nishimura Signed-off-by: Tejun Heo

Merge branch 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

2012-10-02T17:52:28Z

Pull cgroup hierarchy update from Tejun Heo: "Currently, different cgroup subsystems handle nested cgroups completely differently. There's no consistency among subsystems and the behaviors often are outright broken. People at least seem to agree that the broken hierarhcy behaviors need to be weeded out if any progress is gonna be made on this front and that the fallouts from deprecating the broken behaviors should be acceptable especially given that the current behaviors don't make much sense when nested. This patch makes cgroup emit warning messages if cgroups for subsystems with broken hierarchy behavior are nested to prepare for fixing them in the future. This was put in a separate branch because more related changes were expected (didn't make it this round) and the memory cgroup wanted to pull in this and make changes on top." * 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them

cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them

2012-09-14T19:01:16Z

Currently, cgroup hierarchy support is a mess. cpu related subsystems behave correctly - configuration, accounting and control on a parent properly cover its children. blkio and freezer completely ignore hierarchy and treat all cgroups as if they're directly under the root cgroup. Others show yet different behaviors. These differing interpretations of cgroup hierarchy make using cgroup confusing and it impossible to co-mount controllers into the same hierarchy and obtain sane behavior. Eventually, we want full hierarchy support from all subsystems and probably a unified hierarchy. Users using separate hierarchies expecting completely different behaviors depending on the mounted subsystem is deterimental to making any progress on this front. This patch adds cgroup_subsys.broken_hierarchy and sets it to %true for controllers which are lacking in hierarchy support. The goal of this patch is two-fold. * Move users away from using hierarchy on currently non-hierarchical subsystems, so that implementing proper hierarchy support on those doesn't surprise them. * Keep track of which controllers are broken how and nudge the subsystems to implement proper hierarchy support. For now, start with a single warning message. We can whine louder later on. v2: Fixed a typo spotted by Michal. Warning message updated. v3: Updated memcg part so that it doesn't generate warning in the cases where .use_hierarchy=false doesn't make the behavior different from root.use_hierarchy=true. Fixed a typo spotted by Glauber. v4: Check ->broken_hierarchy after cgroup creation is complete so that ->create() can affect the result per Michal. Dropped unnecessary memcg root handling per Michal. Signed-off-by: Tejun Heo Acked-by: Michal Hocko Acked-by: Li Zefan Acked-by: Serge E. Hallyn Cc: Glauber Costa Cc: Peter Zijlstra Cc: Paul Turner Cc: Johannes Weiner Cc: Thomas Graf Cc: Vivek Goyal Cc: Paul Mackerras Cc: Ingo Molnar Cc: Arnaldo Carvalho de Melo Cc: Neil Horman Cc: Aneesh Kumar K.V

cgroup: Assign subsystem IDs during compile time

2012-09-14T16:57:43Z

WARNING: With this change it is impossible to load external built controllers anymore. In case where CONFIG_NETPRIO_CGROUP=m and CONFIG_NET_CLS_CGROUP=m is set, corresponding subsys_id should also be a constant. Up to now, net_prio_subsys_id and net_cls_subsys_id would be of the type int and the value would be assigned during runtime. By switching the macro definition IS_SUBSYS_ENABLED from IS_BUILTIN to IS_ENABLED, all *_subsys_id will have constant value. That means we need to remove all the code which assumes a value can be assigned to net_prio_subsys_id and net_cls_subsys_id. A close look is necessary on the RCU part which was introduces by following patch: commit f845172531fb7410c7fb7780b1a6e51ee6df7d52 Author: Herbert Xu Mon May 24 09:12:34 2010 Committer: David S. Miller Mon May 24 09:12:34 2010 cls_cgroup: Store classid in struct sock Tis code was added to init_cgroup_cls() /* We can't use rcu_assign_pointer because this is an int. */ smp_wmb(); net_cls_subsys_id = net_cls_subsys.subsys_id; respectively to exit_cgroup_cls() net_cls_subsys_id = -1; synchronize_rcu(); and in module version of task_cls_classid() rcu_read_lock(); id = rcu_dereference(net_cls_subsys_id); if (id >= 0) classid = container_of(task_subsys_state(p, id), struct cgroup_cls_state, css)->classid; rcu_read_unlock(); Without an explicit explaination why the RCU part is needed. (The rcu_deference was fixed by exchanging it to rcu_derefence_index_check() in a later commit, but that is a minor detail.) So here is my pondering why it was introduced and why it safe to remove it now. Note that this code was copied over to net_prio the reasoning holds for that subsystem too. The idea behind the RCU use for net_cls_subsys_id is to make sure we get a valid pointer back from task_subsys_state(). task_subsys_state() is just blindly accessing the subsys array and returning the pointer. Obviously, passing in -1 as id into task_subsys_state() returns an invalid value (out of lower bound). So this code makes sure that only after module is loaded and the subsystem registered, the id is assigned. Before unregistering the module all old readers must have left the critical section. This is done by assigning -1 to the id and issuing a synchronized_rcu(). Any new readers wont call task_subsys_state() anymore and therefore it is safe to unregister the subsystem. The new code relies on the same trick, but it looks at the subsys pointer return by task_subsys_state() (remember the id is constant and therefore we allways have a valid index into the subsys array). No precautions need to be taken during module loading module. Eventually, all CPUs will get a valid pointer back from task_subsys_state() because rebind_subsystem() which is called after the module init() function will assigned subsys[net_cls_subsys_id] the newly loaded module subsystem pointer. When the subsystem is about to be removed, rebind_subsystem() will called before the module exit() function. In this case, rebind_subsys() will assign subsys[net_cls_subsys_id] a NULL pointer and then it calls synchronize_rcu(). All old readers have left by then the critical section. Any new reader wont access the subsystem anymore. At this point we are safe to unregister the subsystem. No synchronize_rcu() call is needed. Signed-off-by: Daniel Wagner Signed-off-by: Tejun Heo Acked-by: Li Zefan Acked-by: Neil Horman Cc: "David S. Miller" Cc: "Paul E. McKenney" Cc: Andrew Morton Cc: Eric Dumazet Cc: Gao feng Cc: Glauber Costa Cc: Herbert Xu Cc: Jamal Hadi Salim Cc: John Fastabend Cc: Kamezawa Hiroyuki Cc: netdev@vger.kernel.org Cc: cgroups@vger.kernel.org

cgroup: Do not depend on a given order when populating the subsys array

2012-09-14T16:57:40Z

The *_subsys_id will be used as index to access the subsys. Therefore we need to care we populate the subsystem at the correct position by using designated initialization. With this change we are able to interleave builtin and modules in the subsys array. Signed-off-by: Daniel Wagner Signed-off-by: Tejun Heo Acked-by: Li Zefan Acked-by: Neil Horman Cc: Gao feng Cc: Jamal Hadi Salim Cc: John Fastabend Cc: netdev@vger.kernel.org Cc: cgroups@vger.kernel.org

cgroup: Wrap subsystem selection macro

2012-09-14T16:57:37Z

Before we are able to define all subsystem ids at compile time we need a more fine grained control what gets defined when we include cgroup_subsys.h. For example we define the enums for the subsystems or to declare for struct cgroup_subsys (builtin subsystem) by including cgroup_subsys.h and defining SUBSYS accordingly. Currently, the decision if a subsys is used is defined inside the header by testing if CONFIG_*=y is true. By moving this test outside of cgroup_subsys.h we are able to control it on the include level. This is done by introducing IS_SUBSYS_ENABLED which then is defined according the task, e.g. is CONFIG_*=y or CONFIG_*=m. Signed-off-by: Daniel Wagner Signed-off-by: Tejun Heo Acked-by: Li Zefan Acked-by: Neil Horman Cc: Gao feng Cc: Jamal Hadi Salim Cc: John Fastabend Cc: netdev@vger.kernel.org Cc: cgroups@vger.kernel.org

cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT

2012-09-14T16:57:32Z

CGROUP_BUILTIN_SUBSYS_COUNT is used as start index or stop index when looping over the subsys array looking either at the builtin or the module subsystems. Since all the builtin subsystems have an id which is lower then CGROUP_BUILTIN_SUBSYS_COUNT we know that any module will have an id larger than CGROUP_BUILTIN_SUBSYS_COUNT. In short the ids are sorted. We are about to change id assignment to happen only at compile time later in this series. That means we can't rely on the above trick since all ids will always be defined at compile time. Furthermore, ordering the builtin subsystems and the module subsystems is not really necessary. So we need a different way to know which subsystem is a builtin or a module one. We can use the subsys[]->module pointer for this. Any place where we need to know if a subsys is module we just check for the pointer. If it is NULL then the subsystem is a builtin one. With this we are able to drop the CGROUP_BUILTIN_SUBSYS_COUNT enum. Though we need to introduce a temporary placeholder so that we don't get a compilation error when only CONFIG_CGROUP is selected and no single controller. An empty enum definition is not valid. Later in this series we are able to remove the placeholder again. And with this change we get a fix for this: kernel/cgroup.c: In function ‘cgroup_load_subsys’: kernel/cgroup.c:4326:38: warning: array subscript is below array bounds [-Warray-bounds] when CONFIG_CGROUP=y and no built in controller was enabled. Signed-off-by: Daniel Wagner Signed-off-by: Tejun Heo Acked-by: Li Zefan Acked-by: Neil Horman Cc: Gao feng Cc: Jamal Hadi Salim Cc: John Fastabend Cc: netdev@vger.kernel.org Cc: cgroups@vger.kernel.org

cgroup: rename subsys_bits to subsys_mask

2012-08-24T22:55:33Z

In a previous discussion, Tejun Heo suggested to rename references to subsys_bits (added_bits, removed_bits, etc) by something more meaningful. Cc: Li Zefan Cc: Tejun Heo Cc: Hugh Dickins Cc: Hillf Danton Cc: Lennart Poettering Signed-off-by: Aristeu Rozanski Signed-off-by: Tejun Heo