<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pm24.git/kernel/exit.c, branch v3.7-rc3</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<id>https://git.kobert.dev/pm24.git/atom?h=v3.7-rc3</id>
<link rel='self' href='https://git.kobert.dev/pm24.git/atom?h=v3.7-rc3'/>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/'/>
<updated>2012-10-03T03:25:04Z</updated>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2012-10-03T03:25:04Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-10-03T03:25:04Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=aab174f0df5d72d31caccf281af5f614fa254578'/>
<id>urn:sha1:aab174f0df5d72d31caccf281af5f614fa254578</id>
<content type='text'>
Pull vfs update from Al Viro:

 - big one - consolidation of descriptor-related logics; almost all of
   that is moved to fs/file.c

   (BTW, I'm seriously tempted to rename the result to fd.c.  As it is,
   we have a situation when file_table.c is about handling of struct
   file and file.c is about handling of descriptor tables; the reasons
   are historical - file_table.c used to be about a static array of
   struct file we used to have way back).

   A lot of stray ends got cleaned up and converted to saner primitives,
   disgusting mess in android/binder.c is still disgusting, but at least
   doesn't poke so much in descriptor table guts anymore.  A bunch of
   relatively minor races got fixed in process, plus an ext4 struct file
   leak.

 - related thing - fget_light() partially unuglified; see fdget() in
   there (and yes, it generates the code as good as we used to have).

 - also related - bits of Cyrill's procfs stuff that got entangled into
   that work; _not_ all of it, just the initial move to fs/proc/fd.c and
   switch of fdinfo to seq_file.

 - Alex's fs/coredump.c spiltoff - the same story, had been easier to
   take that commit than mess with conflicts.  The rest is a separate
   pile, this was just a mechanical code movement.

 - a few misc patches all over the place.  Not all for this cycle,
   there'll be more (and quite a few currently sit in akpm's tree)."

Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
  MAX_LFS_FILESIZE should be a loff_t
  compat: fs: Generic compat_sys_sendfile implementation
  fs: push rcu_barrier() from deactivate_locked_super() to filesystems
  btrfs: reada_extent doesn't need kref for refcount
  coredump: move core dump functionality into its own file
  coredump: prevent double-free on an error path in core dumper
  usb/gadget: fix misannotations
  fcntl: fix misannotations
  ceph: don't abuse d_delete() on failure exits
  hypfs: -&gt;d_parent is never NULL or negative
  vfs: delete surplus inode NULL check
  switch simple cases of fget_light to fdget
  new helpers: fdget()/fdput()
  switch o2hb_region_dev_write() to fget_light()
  proc_map_files_readdir(): don't bother with grabbing files
  make get_file() return its argument
  vhost_set_vring(): turn pollstart/pollstop into bool
  switch prctl_set_mm_exe_file() to fget_light()
  switch xfs_find_handle() to fget_light()
  switch xfs_swapext() to fget_light()
  ...
</content>
</entry>
<entry>
<title>new helper: daemonize_descriptors()</title>
<updated>2012-09-27T01:10:00Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2012-08-22T22:42:10Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=864bdb3b6cbd9911222543fef1cfe36f88183f44'/>
<id>urn:sha1:864bdb3b6cbd9911222543fef1cfe36f88183f44</id>
<content type='text'>
descriptor-related parts of daemonize, done right.  As the
result we simplify the locking rules for -&gt;files - we
hold task_lock in *all* cases when we modify -&gt;files.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>move files_struct-related bits from kernel/exit.c to fs/file.c</title>
<updated>2012-09-27T01:08:54Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2012-08-15T23:56:12Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=7cf4dc3c8dbfdfde163d4636f621cf99a1f63bfb'/>
<id>urn:sha1:7cf4dc3c8dbfdfde163d4636f621cf99a1f63bfb</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>net: use a per task frag allocator</title>
<updated>2012-09-24T20:31:37Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-09-23T23:04:42Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=5640f7685831e088fe6c2e1f863a6805962f8e81'/>
<id>urn:sha1:5640f7685831e088fe6c2e1f863a6805962f8e81</id>
<content type='text'>
We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

Its done to increase probability of coalescing small write() into
single segments in skbs still in write queue (not yet sent)

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk-&gt;sk_sndmsg_page

Its also quite inefficient to build TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
page allocator more than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, thats order-3 pages on x86)

This increases TCP stream performance by 20% on loopback device,
but also benefits on other network devices, since 8x less frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.

Its possible some SG enabled hardware cant cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment in sub fragments, since some arches have PAGE_SIZE=65536

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Ben Hutchings &lt;bhutchings@solarflare.com&gt;
Cc: Vijay Subramanian &lt;subramanian.vijay@gmail.com&gt;
Cc: Alexander Duyck &lt;alexander.h.duyck@intel.com&gt;
Tested-by: Vijay Subramanian &lt;subramanian.vijay@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>posix_types.h: Cleanup stale __NFDBITS and related definitions</title>
<updated>2012-07-26T20:36:43Z</updated>
<author>
<name>Josh Boyer</name>
<email>jwboyer@redhat.com</email>
</author>
<published>2012-07-25T14:40:34Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=8ded2bbc1845e19c771eb55209aab166ef011243'/>
<id>urn:sha1:8ded2bbc1845e19c771eb55209aab166ef011243</id>
<content type='text'>
Recently, glibc made a change to suppress sign-conversion warnings in
FD_SET (glibc commit ceb9e56b3d1).  This uncovered an issue with the
kernel's definition of __NFDBITS if applications #include
&lt;linux/types.h&gt; after including &lt;sys/select.h&gt;.  A build failure would
be seen when passing the -Werror=sign-compare and -D_FORTIFY_SOURCE=2
flags to gcc.

It was suggested that the kernel should either match the glibc
definition of __NFDBITS or remove that entirely.  The current in-kernel
uses of __NFDBITS can be replaced with BITS_PER_LONG, and there are no
uses of the related __FDELT and __FDMASK defines.  Given that, we'll
continue the cleanup that was started with commit 8b3d1cda4f5f
("posix_types: Remove fd_set macros") and drop the remaining unused
macros.

Additionally, linux/time.h has similar macros defined that expand to
nothing so we'll remove those at the same time.

Reported-by: Jeff Law &lt;law@redhat.com&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
CC: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Josh Boyer &lt;jwboyer@redhat.com&gt;
[ .. and fix up whitespace as per akpm ]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>move exit_task_work() past exit_files() et.al.</title>
<updated>2012-07-22T19:57:57Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2012-06-27T07:31:24Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=ed3e694d78cc75fa79bf29698631b146fd27aa35'/>
<id>urn:sha1:ed3e694d78cc75fa79bf29698631b146fd27aa35</id>
<content type='text'>
... and get rid of PF_EXITING check in task_work_add().

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>pidns: find_new_reaper() can no longer switch to init_pid_ns.child_reaper</title>
<updated>2012-06-20T21:39:36Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2012-06-20T19:53:04Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=50d75f8daead8a1f850c40a3b6c6575ab19b48cf'/>
<id>urn:sha1:50d75f8daead8a1f850c40a3b6c6575ab19b48cf</id>
<content type='text'>
find_new_reaper() changes pid_ns-&gt;child_reaper, see add0d4df ("pid_ns:
zap_pid_ns_processes: fix the -&gt;child_reaper changing").

The original reason has gone away after the previous patch, -&gt;children
list must be empty after zap_pid_ns_processes().

However now we can not switch to init_pid_ns.child_reaper.
__unhash_process() relies on the "-&gt;child_reaper == parent" check, but
this check does not work if the last exiting task is also the child
reaper.

As Eric sugested, we can change __unhash_process() to use the parent's
pid_ns and remove this code.

Also, with this change we can move detach_pid(PIDTYPE_PID) back, where it
was before the previous fix.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Louis Rilling &lt;louis.rilling@kerlabs.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Acked-by: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Tested-by: Andrew Wagin &lt;avagin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pidns: guarantee that the pidns init will be the last pidns process reaped</title>
<updated>2012-06-20T21:39:36Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2012-06-20T19:53:03Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=6347e90091041e34bea625370794c92f4ce71228'/>
<id>urn:sha1:6347e90091041e34bea625370794c92f4ce71228</id>
<content type='text'>
Today we have a twofold bug.  Sometimes release_task on pid == 1 in a pid
namespace can run before other processes in a pid namespace have had
release task called.  With the result that pid_ns_release_proc can be
called before the last proc_flus_task() is done using upid-&gt;ns-&gt;proc_mnt,
resulting in the use of a stale pointer.  This same set of circumstances
can lead to waitpid(...) returning for a processes started with
clone(CLONE_NEWPID) before the every process in the pid namespace has
actually exited.

To fix this modify zap_pid_ns_processess wait until all other processes in
the pid namespace have exited, even EXIT_DEAD zombies.

The delay_group_leader and related tests ensure that the thread gruop
leader will be the last thread of a process group to be reaped, or to
become EXIT_DEAD and self reap.  With the change to zap_pid_ns_processes
we get the guarantee that pid == 1 in a pid namespace will be the last
task that release_task is called on.

With pid == 1 being the last task to pass through release_task
pid_ns_release_proc can no longer be called too early nor can wait return
before all of the EXIT_DEAD tasks in a pid namespace have exited.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Louis Rilling &lt;louis.rilling@kerlabs.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Acked-by: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Tested-by: Andrew Wagin &lt;avagin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: correctly synchronize rss-counters at exit/exec</title>
<updated>2012-06-20T21:39:36Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@openvz.org</email>
</author>
<published>2012-06-20T19:53:01Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=4fe7efdbdfb1c7e7a7f31decfd831c0f31d37091'/>
<id>urn:sha1:4fe7efdbdfb1c7e7a7f31decfd831c0f31d37091</id>
<content type='text'>
do_exit() and exec_mmap() call sync_mm_rss() before mm_release() does
put_user(clear_child_tid) which can update task-&gt;rss_stat and thus make
mm-&gt;rss_stat inconsistent.  This triggers the "BUG:" printk in check_mm().

Let's fix this bug in the safest way, and optimize/cleanup this later.

Reported-by: Markus Trippelsdorf &lt;markus@trippelsdorf.de&gt;
Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "mm: correctly synchronize rss-counters at exit/exec"</title>
<updated>2012-06-08T00:54:07Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-06-08T00:54:07Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=48d212a2eecaca2e1875925837ad27b2f43f48a3'/>
<id>urn:sha1:48d212a2eecaca2e1875925837ad27b2f43f48a3</id>
<content type='text'>
This reverts commit 40af1bbdca47e5c8a2044039bb78ca8fd8b20f94.

It's horribly and utterly broken for at least the following reasons:

 - calling sync_mm_rss() from mmput() is fundamentally wrong, because
   there's absolutely no reason to believe that the task that does the
   mmput() always does it on its own VM.  Example: fork, ptrace, /proc -
   you name it.

 - calling it *after* having done mmdrop() on it is doubly insane, since
   the mm struct may well be gone now.

 - testing mm against NULL before you call it is insane too, since a
NULL mm there would have caused oopses long before.

.. and those are just the three bugs I found before I decided to give up
looking for me and revert it asap.  I should have caught it before I
even took it, but I trusted Andrew too much.

Cc: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Cc: Markus Trippelsdorf &lt;markus@trippelsdorf.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
