<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pm24.git/include/linux/cgroup.h, branch v3.7-rc8</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<id>https://git.kobert.dev/pm24.git/atom?h=v3.7-rc8</id>
<link rel='self' href='https://git.kobert.dev/pm24.git/atom?h=v3.7-rc8'/>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/'/>
<updated>2012-10-02T17:52:28Z</updated>
<entry>
<title>Merge branch 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2012-10-02T17:52:28Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-10-02T17:52:28Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=68d47a137c3bef754923bccf73fb639c9b0bbd5e'/>
<id>urn:sha1:68d47a137c3bef754923bccf73fb639c9b0bbd5e</id>
<content type='text'>
Pull cgroup hierarchy update from Tejun Heo:
 "Currently, different cgroup subsystems handle nested cgroups
  completely differently.  There's no consistency among subsystems and
  the behaviors often are outright broken.

  People at least seem to agree that the broken hierarhcy behaviors need
  to be weeded out if any progress is gonna be made on this front and
  that the fallouts from deprecating the broken behaviors should be
  acceptable especially given that the current behaviors don't make much
  sense when nested.

  This patch makes cgroup emit warning messages if cgroups for
  subsystems with broken hierarchy behavior are nested to prepare for
  fixing them in the future.  This was put in a separate branch because
  more related changes were expected (didn't make it this round) and the
  memory cgroup wanted to pull in this and make changes on top."

* 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
</content>
</entry>
<entry>
<title>cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them</title>
<updated>2012-09-14T19:01:16Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-09-13T19:20:58Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=8c7f6edbda01f1b1a2e60ad61f14fe38023e433b'/>
<id>urn:sha1:8c7f6edbda01f1b1a2e60ad61f14fe38023e433b</id>
<content type='text'>
Currently, cgroup hierarchy support is a mess.  cpu related subsystems
behave correctly - configuration, accounting and control on a parent
properly cover its children.  blkio and freezer completely ignore
hierarchy and treat all cgroups as if they're directly under the root
cgroup.  Others show yet different behaviors.

These differing interpretations of cgroup hierarchy make using cgroup
confusing and it impossible to co-mount controllers into the same
hierarchy and obtain sane behavior.

Eventually, we want full hierarchy support from all subsystems and
probably a unified hierarchy.  Users using separate hierarchies
expecting completely different behaviors depending on the mounted
subsystem is deterimental to making any progress on this front.

This patch adds cgroup_subsys.broken_hierarchy and sets it to %true
for controllers which are lacking in hierarchy support.  The goal of
this patch is two-fold.

* Move users away from using hierarchy on currently non-hierarchical
  subsystems, so that implementing proper hierarchy support on those
  doesn't surprise them.

* Keep track of which controllers are broken how and nudge the
  subsystems to implement proper hierarchy support.

For now, start with a single warning message.  We can whine louder
later on.

v2: Fixed a typo spotted by Michal. Warning message updated.

v3: Updated memcg part so that it doesn't generate warning in the
    cases where .use_hierarchy=false doesn't make the behavior
    different from root.use_hierarchy=true.  Fixed a typo spotted by
    Glauber.

v4: Check -&gt;broken_hierarchy after cgroup creation is complete so that
    -&gt;create() can affect the result per Michal.  Dropped unnecessary
    memcg root handling per Michal.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Acked-by: Serge E. Hallyn &lt;serue@us.ibm.com&gt;
Cc: Glauber Costa &lt;glommer@parallels.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Thomas Graf &lt;tgraf@suug.ch&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Cc: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
</content>
</entry>
<entry>
<title>cgroup: Define CGROUP_SUBSYS_COUNT according the configuration</title>
<updated>2012-09-14T16:57:47Z</updated>
<author>
<name>Daniel Wagner</name>
<email>daniel.wagner@bmw-carit.de</email>
</author>
<published>2012-09-12T14:12:08Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=a6f00298b2ceaf50b4ab00e6ee3eb0206ac72fac'/>
<id>urn:sha1:a6f00298b2ceaf50b4ab00e6ee3eb0206ac72fac</id>
<content type='text'>
Since we know exactly how many subsystems exists at compile time we are
able to define CGROUP_SUBSYS_COUNT correctly. CGROUP_SUBSYS_COUNT will
be at max 12 (all controllers enabled). Depending on the architecture
we safe either 32 - 12 pointers (80 bytes) or 64 - 12 pointers (416
bytes) per cgroup.

With this change we can also remove the temporary placeholder to avoid
compilation errors.

Signed-off-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Acked-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Cc: Gao feng &lt;gaofeng@cn.fujitsu.com&gt;
Cc: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Cc: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
</content>
</entry>
<entry>
<title>cgroup: Assign subsystem IDs during compile time</title>
<updated>2012-09-14T16:57:43Z</updated>
<author>
<name>Daniel Wagner</name>
<email>daniel.wagner@bmw-carit.de</email>
</author>
<published>2012-09-12T14:12:07Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=8a8e04df4747661daaee77e98e102d99c9e09b98'/>
<id>urn:sha1:8a8e04df4747661daaee77e98e102d99c9e09b98</id>
<content type='text'>
WARNING: With this change it is impossible to load external built
controllers anymore.

In case where CONFIG_NETPRIO_CGROUP=m and CONFIG_NET_CLS_CGROUP=m is
set, corresponding subsys_id should also be a constant. Up to now,
net_prio_subsys_id and net_cls_subsys_id would be of the type int and
the value would be assigned during runtime.

By switching the macro definition IS_SUBSYS_ENABLED from IS_BUILTIN
to IS_ENABLED, all *_subsys_id will have constant value. That means we
need to remove all the code which assumes a value can be assigned to
net_prio_subsys_id and net_cls_subsys_id.

A close look is necessary on the RCU part which was introduces by
following patch:

  commit f845172531fb7410c7fb7780b1a6e51ee6df7d52
  Author:	Herbert Xu &lt;herbert@gondor.apana.org.au&gt;  Mon May 24 09:12:34 2010
  Committer:	David S. Miller &lt;davem@davemloft.net&gt;  Mon May 24 09:12:34 2010

  cls_cgroup: Store classid in struct sock

  Tis code was added to init_cgroup_cls()

	  /* We can't use rcu_assign_pointer because this is an int. */
	  smp_wmb();
	  net_cls_subsys_id = net_cls_subsys.subsys_id;

  respectively to exit_cgroup_cls()

	  net_cls_subsys_id = -1;
	  synchronize_rcu();

  and in module version of task_cls_classid()

	  rcu_read_lock();
	  id = rcu_dereference(net_cls_subsys_id);
	  if (id &gt;= 0)
		  classid = container_of(task_subsys_state(p, id),
					 struct cgroup_cls_state, css)-&gt;classid;
	  rcu_read_unlock();

Without an explicit explaination why the RCU part is needed. (The
rcu_deference was fixed by exchanging it to rcu_derefence_index_check()
in a later commit, but that is a minor detail.)

So here is my pondering why it was introduced and why it safe to
remove it now. Note that this code was copied over to net_prio the
reasoning holds for that subsystem too.

The idea behind the RCU use for net_cls_subsys_id is to make sure we
get a valid pointer back from task_subsys_state(). task_subsys_state()
is just blindly accessing the subsys array and returning the
pointer. Obviously, passing in -1 as id into task_subsys_state()
returns an invalid value (out of lower bound).

So this code makes sure that only after module is loaded and the
subsystem registered, the id is assigned.

Before unregistering the module all old readers must have left the
critical section. This is done by assigning -1 to the id and issuing a
synchronized_rcu(). Any new readers wont call task_subsys_state()
anymore and therefore it is safe to unregister the subsystem.

The new code relies on the same trick, but it looks at the subsys
pointer return by task_subsys_state() (remember the id is constant
and therefore we allways have a valid index into the subsys
array).

No precautions need to be taken during module loading
module. Eventually, all CPUs will get a valid pointer back from
task_subsys_state() because rebind_subsystem() which is called after
the module init() function will assigned subsys[net_cls_subsys_id] the
newly loaded module subsystem pointer.

When the subsystem is about to be removed, rebind_subsystem() will
called before the module exit() function. In this case,
rebind_subsys() will assign subsys[net_cls_subsys_id] a NULL pointer
and then it calls synchronize_rcu(). All old readers have left by then
the critical section. Any new reader wont access the subsystem
anymore.  At this point we are safe to unregister the subsystem. No
synchronize_rcu() call is needed.

Signed-off-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Acked-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Gao feng &lt;gaofeng@cn.fujitsu.com&gt;
Cc: Glauber Costa &lt;glommer@parallels.com&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Cc: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Cc: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Cc: Kamezawa Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
</content>
</entry>
<entry>
<title>cgroup: Wrap subsystem selection macro</title>
<updated>2012-09-14T16:57:37Z</updated>
<author>
<name>Daniel Wagner</name>
<email>daniel.wagner@bmw-carit.de</email>
</author>
<published>2012-09-12T14:12:05Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=5fc0b02544b3b9bd3db5a8156b5f3e7350f8e797'/>
<id>urn:sha1:5fc0b02544b3b9bd3db5a8156b5f3e7350f8e797</id>
<content type='text'>
Before we are able to define all subsystem ids at compile time we need
a more fine grained control what gets defined when we include
cgroup_subsys.h. For example we define the enums for the subsystems or
to declare for struct cgroup_subsys (builtin subsystem) by including
cgroup_subsys.h and defining SUBSYS accordingly.

Currently, the decision if a subsys is used is defined inside the
header by testing if CONFIG_*=y is true. By moving this test outside
of cgroup_subsys.h we are able to control it on the include level.

This is done by introducing IS_SUBSYS_ENABLED which then is defined
according the task, e.g. is CONFIG_*=y or CONFIG_*=m.

Signed-off-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Acked-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Cc: Gao feng &lt;gaofeng@cn.fujitsu.com&gt;
Cc: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Cc: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
</content>
</entry>
<entry>
<title>cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT</title>
<updated>2012-09-14T16:57:32Z</updated>
<author>
<name>Daniel Wagner</name>
<email>daniel.wagner@bmw-carit.de</email>
</author>
<published>2012-09-13T07:50:55Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=be45c900fdc2c66baad5a7703fb8136991d88aeb'/>
<id>urn:sha1:be45c900fdc2c66baad5a7703fb8136991d88aeb</id>
<content type='text'>
CGROUP_BUILTIN_SUBSYS_COUNT is used as start index or stop index when
looping over the subsys array looking either at the builtin or the
module subsystems. Since all the builtin subsystems have an id which
is lower then CGROUP_BUILTIN_SUBSYS_COUNT we know that any module will
have an id larger than CGROUP_BUILTIN_SUBSYS_COUNT. In short the ids
are sorted.

We are about to change id assignment to happen only at compile time
later in this series. That means we can't rely on the above trick
since all ids will always be defined at compile time. Furthermore,
ordering the builtin subsystems and the module subsystems is not
really necessary.

So we need a different way to know which subsystem is a builtin or a
module one. We can use the subsys[]-&gt;module pointer for this. Any
place where we need to know if a subsys is module we just check for
the pointer. If it is NULL then the subsystem is a builtin one.

With this we are able to drop the CGROUP_BUILTIN_SUBSYS_COUNT
enum. Though we need to introduce a temporary placeholder so that we
don't get a compilation error when only CONFIG_CGROUP is selected and
no single controller. An empty enum definition is not valid. Later in
this series we are able to remove the placeholder again.

And with this change we get a fix for this:

kernel/cgroup.c: In function ‘cgroup_load_subsys’:
kernel/cgroup.c:4326:38: warning: array subscript is below array bounds [-Warray-bounds]

when CONFIG_CGROUP=y and no built in controller was enabled.

Signed-off-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Acked-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Cc: Gao feng &lt;gaofeng@cn.fujitsu.com&gt;
Cc: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Cc: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
</content>
</entry>
<entry>
<title>cgroup: add xattr support</title>
<updated>2012-08-24T22:55:33Z</updated>
<author>
<name>Aristeu Rozanski</name>
<email>aris@redhat.com</email>
</author>
<published>2012-08-23T20:53:30Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=03b1cde6b22f625ae832b939bc7379ec1466aec5'/>
<id>urn:sha1:03b1cde6b22f625ae832b939bc7379ec1466aec5</id>
<content type='text'>
This is one of the items in the plumber's wish list.

For use cases:

&gt;&gt; What would the use case be for this?
&gt;
&gt; Attaching meta information to services, in an easily discoverable
&gt; way. For example, in systemd we create one cgroup for each service, and
&gt; could then store data like the main pid of the specific service as an
&gt; xattr on the cgroup itself. That way we'd have almost all service state
&gt; in the cgroupfs, which would make it possible to terminate systemd and
&gt; later restart it without losing any state information. But there's more:
&gt; for example, some very peculiar services cannot be terminated on
&gt; shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
&gt; services in question could just mark that on their cgroup, by setting an
&gt; xattr. On the more desktopy side of things there are other
&gt; possibilities: for example there are plans defining what an application
&gt; is along the lines of a cgroup (i.e. an app being a collection of
&gt; processes). With xattrs one could then attach an icon or human readable
&gt; program name on the cgroup.
&gt;
&gt; The key idea is that this would allow attaching runtime meta information
&gt; to cgroups and everything they model (services, apps, vms), that doesn't
&gt; need any complex userspace infrastructure, has good access control
&gt; (i.e. because the file system enforces that anyway, and there's the
&gt; "trusted." xattr namespace), notifications (inotify), and can easily be
&gt; shared among applications.
&gt;
&gt; Lennart

v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support

Original-patch-by: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Cc: Lennart Poettering &lt;lpoetter@redhat.com&gt;
Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Aristeu Rozanski &lt;aris@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: remove hierarchy_mutex</title>
<updated>2012-06-07T02:12:30Z</updated>
<author>
<name>Li Zefan</name>
<email>lizefan@huawei.com</email>
</author>
<published>2012-06-07T02:12:30Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=6be96a5c905178637ec06a5d456a76b2b304fca3'/>
<id>urn:sha1:6be96a5c905178637ec06a5d456a76b2b304fca3</id>
<content type='text'>
It was introduced for memcg to iterate cgroup hierarchy without
holding cgroup_mutex, but soon after that it was replaced with
a lockless way in memcg.

No one used hierarchy_mutex since that, so remove it.

Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: remove cgroup_subsys-&gt;populate()</title>
<updated>2012-04-11T16:16:48Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-04-10T17:16:36Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=86f82d561864e902c70282b6f17cf590c0f34691'/>
<id>urn:sha1:86f82d561864e902c70282b6f17cf590c0f34691</id>
<content type='text'>
With memcg converted, cgroup_subsys-&gt;populate() doesn't have any user
left.  Remove it.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
</content>
</entry>
<entry>
<title>cgroup: make css-&gt;refcnt clearing on cgroup removal optional</title>
<updated>2012-04-01T19:09:56Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-04-01T19:09:56Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=48ddbe194623ae089cc0576e60363f2d2e85662a'/>
<id>urn:sha1:48ddbe194623ae089cc0576e60363f2d2e85662a</id>
<content type='text'>
Currently, cgroup removal tries to drain all css references.  If there
are active css references, the removal logic waits and retries
-&gt;pre_detroy() until either all refs drop to zero or removal is
cancelled.

This semantics is unusual and adds non-trivial complexity to cgroup
core and IMHO is fundamentally misguided in that it couples internal
implementation details (references to internal data structure) with
externally visible operation (rmdir).  To userland, this is a behavior
peculiarity which is unnecessary and difficult to expect (css refs is
otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).

Unfortunately, memcg currently depends on -&gt;pre_destroy() retrials and
cgroup removal vetoing and can't be immmediately switched to the new
behavior.  This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.

Once, memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch.  Note that the following
patch is incorrect in that dput work item is in cgroup and may lose
some of dputs when multiples css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.

  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251

Note that, in not-too-distant future, cgroup core will start emitting
warning messages for subsys which require the old behavior, so please
get moving.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
</content>
</entry>
</feed>
