<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pm24.git/include/linux/cgroup.h, branch v4.12-rc2</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<id>https://git.kobert.dev/pm24.git/atom?h=v4.12-rc2</id>
<link rel='self' href='https://git.kobert.dev/pm24.git/atom?h=v4.12-rc2'/>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/'/>
<updated>2017-05-01T20:52:24Z</updated>
<entry>
<title>Merge branch 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2017-05-01T20:52:24Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-05-01T20:52:24Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=9410091dd5b4097819fcbb6d63987c51f62c85fd'/>
<id>urn:sha1:9410091dd5b4097819fcbb6d63987c51f62c85fd</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:
 "Nothing major. Two notable fixes are Li's second stab at fixing the
  long-standing race condition in the mount path and suppression of
  spurious warning from cgroup_get(). All other changes are trivial"

* 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: mark cgroup_get() with __maybe_unused
  cgroup: avoid attaching a cgroup root to two different superblocks, take 2
  cgroup: fix spurious warnings on cgroup_is_dead() from cgroup_sk_alloc()
  cgroup: move cgroup_subsys_state parent field for cache locality
  cpuset: Remove cpuset_update_active_cpus()'s parameter.
  cgroup: switch to BUG_ON()
  cgroup: drop duplicate header nsproxy.h
  kernel: convert css_set.refcount from atomic_t to refcount_t
  kernel: convert cgroup_namespace.count from atomic_t to refcount_t
</content>
</entry>
<entry>
<title>cgroup: drop duplicate header nsproxy.h</title>
<updated>2017-03-24T15:36:16Z</updated>
<author>
<name>Geliang Tang</name>
<email>geliangtang@gmail.com</email>
</author>
<published>2017-03-24T14:13:35Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=8f48cfabac57977338f5c828ed3e12fc34373c7d'/>
<id>urn:sha1:8f48cfabac57977338f5c828ed3e12fc34373c7d</id>
<content type='text'>
Drop duplicate header nsproxy.h from linux/cgroup.h.

Signed-off-by: Geliang Tang &lt;geliangtang@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups</title>
<updated>2017-03-17T14:18:47Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2017-03-16T20:54:24Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=77f88796cee819b9c4562b0b6b44691b3b7755b1'/>
<id>urn:sha1:77f88796cee819b9c4562b0b6b44691b3b7755b1</id>
<content type='text'>
Creation of a kthread goes through a couple interlocked stages between
the kthread itself and its creator.  Once the new kthread starts
running, it initializes itself and wakes up the creator.  The creator
then can further configure the kthread and then let it start doing its
job by waking it up.

In this configuration-by-creator stage, the creator is the only one
that can wake it up but the kthread is visible to userland.  When
altering the kthread's attributes from userland is allowed, this is
fine; however, for cases where CPU affinity is critical,
kthread_bind() is used to first disable affinity changes from userland
and then set the affinity.  This also prevents the kthread from being
migrated into non-root cgroups as that can affect the CPU affinity and
many other things.

Unfortunately, the cgroup side of protection is racy.  While the
PF_NO_SETAFFINITY flag prevents further migrations, userland can win
the race before the creator sets the flag with kthread_bind() and put
the kthread in a non-root cgroup, which can lead to all sorts of
problems including incorrect CPU affinity and starvation.

This bug got triggered by userland which periodically tries to migrate
all processes in the root cpuset cgroup to a non-root one.  Per-cpu
workqueue workers got caught while being created and ended up with
incorrected CPU affinity breaking concurrency management and sometimes
stalling workqueue execution.

This patch adds task-&gt;no_cgroup_migration which disallows the task to
be migrated by userland.  kthreadd starts with the flag set making
every child kthread start in the root cgroup with migration
disallowed.  The flag is cleared after the kthread finishes
initialization by which time PF_NO_SETAFFINITY is set if the kthread
should stay in the root cgroup.

It'd be better to wait for the initialization instead of failing but I
couldn't think of a way of implementing that without adding either a
new PF flag, or sleeping and retrying from waiting side.  Even if
userland depends on changing cgroup membership of a kthread, it either
has to be synchronized with kthread_create() or periodically repeat,
so it's unlikely that this would break anything.

v2: Switch to a simpler implementation using a new task_struct bit
    field suggested by Oleg.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Suggested-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reported-and-debugged-by: Chris Mason &lt;clm@fb.com&gt;
Cc: stable@vger.kernel.org # v4.3+ (we can't close the race on &lt; v4.3)
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>kernel: convert cgroup_namespace.count from atomic_t to refcount_t</title>
<updated>2017-03-06T19:55:22Z</updated>
<author>
<name>Elena Reshetova</name>
<email>elena.reshetova@intel.com</email>
</author>
<published>2017-02-20T10:19:00Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=387ad9674b0013c8756ad20d854ff005b0c313ad'/>
<id>urn:sha1:387ad9674b0013c8756ad20d854ff005b0c313ad</id>
<content type='text'>
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova &lt;elena.reshetova@intel.com&gt;
Signed-off-by: Hans Liljestrand &lt;ishkamiel@gmail.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: David Windsor &lt;dwindsor@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: fix a comment typo</title>
<updated>2016-12-27T19:52:47Z</updated>
<author>
<name>Geliang Tang</name>
<email>geliangtang@gmail.com</email>
</author>
<published>2016-12-24T15:28:35Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=7b4632f048415263669676dda20fd5d811c3d3e4'/>
<id>urn:sha1:7b4632f048415263669676dda20fd5d811c3d3e4</id>
<content type='text'>
Fix a comment typo in cgroup.h.

Signed-off-by: Geliang Tang &lt;geliangtang@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2016-10-14T19:18:50Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-10-14T19:18:50Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=f34d3606f76a8121b9d4940d2dd436bebeb2f9d7'/>
<id>urn:sha1:f34d3606f76a8121b9d4940d2dd436bebeb2f9d7</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:

 - tracepoints for basic cgroup management operations added

 - kernfs and cgroup path formatting functions updated to behave in the
   style of strlcpy()

 - non-critical bug fixes

* 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL
  cgroup: fix error handling regressions in proc_cgroup_show() and cgroup_release_agent()
  cpuset: fix error handling regression in proc_cpuset_show()
  cgroup: add tracepoints for basic operations
  cgroup: make cgroup_path() and friends behave in the style of strlcpy()
  kernfs: remove kernfs_path_len()
  kernfs: make kernfs_path*() behave in the style of strlcpy()
  kernfs: add dummy implementation of kernfs_path_from_node()
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace</title>
<updated>2016-10-06T16:52:23Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-10-06T16:52:23Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=14986a34e1289424811443a524cdd9e1688c7913'/>
<id>urn:sha1:14986a34e1289424811443a524cdd9e1688c7913</id>
<content type='text'>
Pull namespace updates from Eric Biederman:
 "This set of changes is a number of smaller things that have been
  overlooked in other development cycles focused on more fundamental
  change. The devpts changes are small things that were a distraction
  until we managed to kill off DEVPTS_MULTPLE_INSTANCES. There is an
  trivial regression fix to autofs for the unprivileged mount changes
  that went in last cycle. A pair of ioctls has been added by Andrey
  Vagin making it is possible to discover the relationships between
  namespaces when referring to them through file descriptors.

  The big user visible change is starting to add simple resource limits
  to catch programs that misbehave. With namespaces in general and user
  namespaces in particular allowing users to use more kinds of
  resources, it has become important to have something to limit errant
  programs. Because the purpose of these limits is to catch errant
  programs the code needs to be inexpensive to use as it always on, and
  the default limits need to be high enough that well behaved programs
  on well behaved systems don't encounter them.

  To this end, after some review I have implemented per user per user
  namespace limits, and use them to limit the number of namespaces. The
  limits being per user mean that one user can not exhause the limits of
  another user. The limits being per user namespace allow contexts where
  the limit is 0 and security conscious folks can remove from their
  threat anlysis the code used to manage namespaces (as they have
  historically done as it root only). At the same time the limits being
  per user namespace allow other parts of the system to use namespaces.

  Namespaces are increasingly being used in application sand boxing
  scenarios so an all or nothing disable for the entire system for the
  security conscious folks makes increasing use of these sandboxes
  impossible.

  There is also added a limit on the maximum number of mounts present in
  a single mount namespace. It is nontrivial to guess what a reasonable
  system wide limit on the number of mount structure in the kernel would
  be, especially as it various based on how a system is using
  containers. A limit on the number of mounts in a mount namespace
  however is much easier to understand and set. In most cases in
  practice only about 1000 mounts are used. Given that some autofs
  scenarious have the potential to be 30,000 to 50,000 mounts I have set
  the default limit for the number of mounts at 100,000 which is well
  above every known set of users but low enough that the mount hash
  tables don't degrade unreaonsably.

  These limits are a start. I expect this estabilishes a pattern that
  other limits for resources that namespaces use will follow. There has
  been interest in making inotify event limits per user per user
  namespace as well as interest expressed in making details about what
  is going on in the kernel more visible"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (28 commits)
  autofs:  Fix automounts by using current_real_cred()-&gt;uid
  mnt: Add a per mount namespace limit on the number of mounts
  netns: move {inc,dec}_net_namespaces into #ifdef
  nsfs: Simplify __ns_get_path
  tools/testing: add a test to check nsfs ioctl-s
  nsfs: add ioctl to get a parent namespace
  nsfs: add ioctl to get an owning user namespace for ns file descriptor
  kernel: add a helper to get an owning user namespace for a namespace
  devpts: Change the owner of /dev/pts/ptmx to the mounter of /dev/pts
  devpts: Remove sync_filesystems
  devpts: Make devpts_kill_sb safe if fsi is NULL
  devpts: Simplify devpts_mount by using mount_nodev
  devpts: Move the creation of /dev/pts/ptmx into fill_super
  devpts: Move parse_mount_options into fill_super
  userns: When the per user per user namespace limit is reached return ENOSPC
  userns; Document per user per user namespace limits.
  mntns: Add a limit on the number of mount namespaces.
  netns: Add a limit on the number of net namespaces
  cgroupns: Add a limit on the number of cgroup namespaces
  ipcns: Add a  limit on the number of ipc namespaces
  ...
</content>
</entry>
<entry>
<title>cgroup: Add task_under_cgroup_hierarchy cgroup inline function to headers</title>
<updated>2016-08-13T04:49:41Z</updated>
<author>
<name>Sargun Dhillon</name>
<email>sargun@sargun.me</email>
</author>
<published>2016-08-12T15:56:40Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=aed704b7a634954dc28fe5c4b49db478cf2d96b7'/>
<id>urn:sha1:aed704b7a634954dc28fe5c4b49db478cf2d96b7</id>
<content type='text'>
This commit adds an inline function to cgroup.h to check whether a given
task is under a given cgroup hierarchy. This is to avoid having to put
ifdefs in .c files to gate access to cgroups. When cgroups are disabled
this always returns true.

Signed-off-by: Sargun Dhillon &lt;sargun@sargun.me&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>cgroup: make cgroup_path() and friends behave in the style of strlcpy()</title>
<updated>2016-08-10T15:23:44Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2016-08-10T15:23:44Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=4c737b41de7f4eef2a593803bad1b918dd718b10'/>
<id>urn:sha1:4c737b41de7f4eef2a593803bad1b918dd718b10</id>
<content type='text'>
cgroup_path() and friends used to format the path from the end and
thus the resulting path usually didn't start at the start of the
passed in buffer.  Also, when the buffer was too small, the partial
result was truncated from the head rather than tail and there was no
way to tell how long the full path would be.  These make the functions
less robust and more awkward to use.

With recent updates to kernfs_path(), cgroup_path() and friends can be
made to behave in strlcpy() style.

* cgroup_path(), cgroup_path_ns[_locked]() and task_cgroup_path() now
  always return the length of the full path.  If buffer is too small,
  it contains nul terminated truncated output.

* All users updated accordingly.

v2: cgroup_path() usage in kernel/sched/debug.c converted.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Serge Hallyn &lt;serge.hallyn@ubuntu.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
</content>
</entry>
<entry>
<title>kernfs: make kernfs_path*() behave in the style of strlcpy()</title>
<updated>2016-08-10T15:23:44Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2016-08-10T15:23:44Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=3abb1d90f5d930c6183534a624aa0158a71bc5eb'/>
<id>urn:sha1:3abb1d90f5d930c6183534a624aa0158a71bc5eb</id>
<content type='text'>
kernfs_path*() functions always return the length of the full path but
the path content is undefined if the length is larger than the
provided buffer.  This makes its behavior different from strlcpy() and
requires error handling in all its users even when they don't care
about truncation.  In addition, the implementation can actully be
simplified by making it behave properly in strlcpy() style.

* Update kernfs_path_from_node_locked() to always fill up the buffer
  with path.  If the buffer is not large enough, the output is
  truncated and terminated.

* kernfs_path() no longer needs error handling.  Make it a simple
  inline wrapper around kernfs_path_from_node().

* sysfs_warn_dup()'s use of kernfs_path() doesn't need error handling.
  Updated accordingly.

* cgroup_path()'s use of kernfs_path() updated to retain the old
  behavior.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@ubuntu.com&gt;
</content>
</entry>
</feed>
