<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pm24.git/kernel/futex.c, branch v4.6-rc7</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<id>https://git.kobert.dev/pm24.git/atom?h=v4.6-rc7</id>
<link rel='self' href='https://git.kobert.dev/pm24.git/atom?h=v4.6-rc7'/>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/'/>
<updated>2016-04-21T09:06:09Z</updated>
<entry>
<title>futex: Acknowledge a new waiter in counter before plist</title>
<updated>2016-04-21T09:06:09Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2016-04-21T03:09:24Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=fe1bce9e2107ba3a8faffe572483b6974201a0e6'/>
<id>urn:sha1:fe1bce9e2107ba3a8faffe572483b6974201a0e6</id>
<content type='text'>
Otherwise an incoming waker on the dest hash bucket can miss
the waiter adding itself to the plist during the lockless
check optimization (small window but still the correct way
of doing this); similarly to the decrement counterpart.

Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: bigeasy@linutronix.de
Cc: dvhart@infradead.org
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1461208164-29150-1-git-send-email-dave@stgolabs.net
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;

</content>
</entry>
<entry>
<title>futex: Handle unlock_pi race gracefully</title>
<updated>2016-04-20T10:33:13Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2016-04-15T12:35:39Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=89e9e66ba1b3bde9d8ea90566c2aee20697ad681'/>
<id>urn:sha1:89e9e66ba1b3bde9d8ea90566c2aee20697ad681</id>
<content type='text'>
If userspace calls UNLOCK_PI unconditionally without trying the TID -&gt; 0
transition in user space first then the user space value might not have the
waiters bit set. This opens the following race:

CPU0	    	      	    CPU1
uval = get_user(futex)
			    lock(hb)
lock(hb)
			    futex |= FUTEX_WAITERS
			    ....
			    unlock(hb)

cmpxchg(futex, uval, newval)

So the cmpxchg fails and returns -EINVAL to user space, which is wrong because
the futex value is valid.

To handle this (yes, yet another) corner case gracefully, check for a flag
change and retry.

[ tglx: Massaged changelog and slightly reworked implementation ]

Fixes: ccf9e6a80d9e ("futex: Make unlock_pi more robust")
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: stable@vger.kernel.org
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1460723739-5195-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>futex: Replace barrier() in unqueue_me() with READ_ONCE()</title>
<updated>2016-03-08T16:04:02Z</updated>
<author>
<name>Jianyu Zhan</name>
<email>nasa4836@gmail.com</email>
</author>
<published>2016-03-07T01:32:24Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=29b75eb2d56a714190a93d7be4525e617591077a'/>
<id>urn:sha1:29b75eb2d56a714190a93d7be4525e617591077a</id>
<content type='text'>
Commit e91467ecd1ef ("bug in futex unqueue_me") introduced a barrier() in
unqueue_me() to prevent the compiler from rereading the lock pointer which
might change after a check for NULL.

Replace the barrier() with a READ_ONCE() for the following reasons:

1) READ_ONCE() is a weaker form of barrier() that affects only the specific
   load operation, while barrier() is a general compiler level memory barrier.
   READ_ONCE() was not available at the time when the barrier was added.

2) Aside of that READ_ONCE() is descriptive and self explainatory while a
   barrier without comment is not clear to the casual reader.

No functional change.

[ tglx: Massaged changelog ]

Signed-off-by: Jianyu Zhan &lt;nasa4836@gmail.com&gt;
Acked-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Acked-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: dave@stgolabs.net
Cc: peterz@infradead.org
Cc: linux@rasmusvillemoes.dk
Cc: akpm@linux-foundation.org
Cc: fengguang.wu@intel.com
Cc: bigeasy@linutronix.de
Link: http://lkml.kernel.org/r/1457314344-5685-1-git-send-email-nasa4836@gmail.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>futex: Remove requirement for lock_page() in get_futex_key()</title>
<updated>2016-02-17T09:42:17Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2016-02-09T19:15:14Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=65d8fc777f6dcfee12785c057a6b57f679641c90'/>
<id>urn:sha1:65d8fc777f6dcfee12785c057a6b57f679641c90</id>
<content type='text'>
When dealing with key handling for shared futexes, we can drastically reduce
the usage/need of the page lock. 1) For anonymous pages, the associated futex
object is the mm_struct which does not require the page lock. 2) For inode
based, keys, we can check under RCU read lock if the page mapping is still
valid and take reference to the inode. This just leaves one rare race that
requires the page lock in the slow path when examining the swapcache.

Additionally realtime users currently have a problem with the page lock being
contended for unbounded periods of time during futex operations.

Task A
     get_futex_key()
     lock_page()
    ---&gt; preempted

Now any other task trying to lock that page will have to wait until
task A gets scheduled back in, which is an unbound time.

With this patch, we pretty much have a lockless futex_get_key().

Experiments show that this patch can boost/speedup the hashing of shared
futexes with the perf futex benchmarks (which is good for measuring such
change) by up to 45% when there are high (&gt; 100) thread counts on a 60 core
Westmere. Lower counts are pretty much in the noise range or less than 10%,
but mid range can be seen at over 30% overall throughput (hash ops/sec).
This makes anon-mem shared futexes much closer to its private counterpart.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
[ Ported on top of thp refcount rework, changelog, comments, fixes. ]
Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1455045314-8305-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>futex: Rename barrier references in ordering guarantees</title>
<updated>2016-02-17T09:42:17Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2016-02-09T19:15:13Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=8ad7b378d0d016309014cae0f640434bca7b5e11'/>
<id>urn:sha1:8ad7b378d0d016309014cae0f640434bca7b5e11</id>
<content type='text'>
Ingo suggested we rename how we reference barriers A and B
regarding futex ordering guarantees. This patch replaces,
for both barriers, MB (A) with smp_mb(); (A), such that:

 - We explicitly state that the barriers are SMP, and

 - We standardize how we reference these across futex.c
   helping readers follow what barrier does what and where.

Suggested-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1455045314-8305-2-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>rtmutex: Make wait_lock irq safe</title>
<updated>2016-01-26T10:08:35Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2016-01-13T10:25:38Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=b4abf91047cf054f203dcfac97e1038388826937'/>
<id>urn:sha1:b4abf91047cf054f203dcfac97e1038388826937</id>
<content type='text'>
Sasha reported a lockdep splat about a potential deadlock between RCU boosting
rtmutex and the posix timer it_lock.

CPU0					CPU1

rtmutex_lock(&amp;rcu-&gt;rt_mutex)
  spin_lock(&amp;rcu-&gt;rt_mutex.wait_lock)
					local_irq_disable()
					spin_lock(&amp;timer-&gt;it_lock)
					spin_lock(&amp;rcu-&gt;mutex.wait_lock)
--&gt; Interrupt
    spin_lock(&amp;timer-&gt;it_lock)

This is caused by the following code sequence on CPU1

     rcu_read_lock()
     x = lookup();
     if (x)
     	spin_lock_irqsave(&amp;x-&gt;it_lock);
     rcu_read_unlock();
     return x;

We could fix that in the posix timer code by keeping rcu read locked across
the spinlocked and irq disabled section, but the above sequence is common and
there is no reason not to support it.

Taking rt_mutex.wait_lock irq safe prevents the deadlock.

Reported-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Paul McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
</entry>
<entry>
<title>ptrace: use fsuid, fsgid, effective creds for fs access checks</title>
<updated>2016-01-21T01:09:18Z</updated>
<author>
<name>Jann Horn</name>
<email>jann@thejh.net</email>
</author>
<published>2016-01-20T23:00:04Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=caaee6234d05a58c5b4d05e7bf766131b810a657'/>
<id>urn:sha1:caaee6234d05a58c5b4d05e7bf766131b810a657</id>
<content type='text'>
By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its
credentials.

To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.

The problem was that when a privileged task had temporarily dropped its
privileges, e.g.  by calling setreuid(0, user_uid), with the intent to
perform following syscalls with the credentials of a user, it still passed
ptrace access checks that the user would not be able to pass.

While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.

In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:

 /proc/$pid/stat - uses the check for determining whether pointers
     should be visible, useful for bypassing ASLR
 /proc/$pid/maps - also useful for bypassing ASLR
 /proc/$pid/cwd - useful for gaining access to restricted
     directories that contain files with lax permissions, e.g. in
     this scenario:
     lrwxrwxrwx root root /proc/13020/cwd -&gt; /root/foobar
     drwx------ root root /root
     drwxr-xr-x root root /root/foobar
     -rw-r--r-- root root /root/foobar/secret

Therefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jann Horn &lt;jann@thejh.net&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Casey Schaufler &lt;casey@schaufler-ca.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Morris &lt;james.l.morris@oracle.com&gt;
Cc: "Serge E. Hallyn" &lt;serge.hallyn@ubuntu.com&gt;
Cc: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Willy Tarreau &lt;w@1wt.eu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: bring in additional flag for fixup_user_fault to signal unlock</title>
<updated>2016-01-16T01:56:32Z</updated>
<author>
<name>Dominik Dingel</name>
<email>dingel@linux.vnet.ibm.com</email>
</author>
<published>2016-01-16T00:57:04Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=4a9e1cda274893eca7d178d7dc265503ccb9d87a'/>
<id>urn:sha1:4a9e1cda274893eca7d178d7dc265503ccb9d87a</id>
<content type='text'>
During Jason's work with postcopy migration support for s390 a problem
regarding gmap faults was discovered.

The gmap code will call fixup_user_fault which will end up always in
handle_mm_fault.  Till now we never cared about retries, but as the
userfaultfd code kind of relies on it.  this needs some fix.

This patchset does not take care of the futex code.  I will now look
closer at this.

This patch (of 2):

With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault
to pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the
faulting we ever unlocked mmap_sem.

This patch brings in the logic to handle retries as well as it cleans up
the current documentation.  fixup_user_fault was not having the same
semantics as filemap_fault.  It never indicated if a retry happened and
so a caller wasn't able to handle that case.  So we now changed the
behaviour to always retry a locked mmap_sem.

Signed-off-by: Dominik Dingel &lt;dingel@linux.vnet.ibm.com&gt;
Reviewed-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: "Jason J. Herne" &lt;jjherne@linux.vnet.ibm.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Eric B Munson &lt;emunson@akamai.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Dominik Dingel &lt;dingel@linux.vnet.ibm.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>futex, thp: remove special case for THP in get_futex_key</title>
<updated>2016-01-16T01:56:32Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-01-16T00:53:00Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=14d27abd1d12a64c89df1ce8c00ef1403226db5a'/>
<id>urn:sha1:14d27abd1d12a64c89df1ce8c00ef1403226db5a</id>
<content type='text'>
With new THP refcounting, we don't need tricks to stabilize huge page.
If we've got reference to tail page, it can't split under us.

This patch effectively reverts a5b338f2b0b1 ("thp: update futex compound
knowledge").

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Tested-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Tested-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Acked-by: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Steve Capper &lt;steve.capper@linaro.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Tested-by: Artem Savkov &lt;artem.savkov@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op</title>
<updated>2015-12-20T11:43:25Z</updated>
<author>
<name>Darren Hart</name>
<email>dvhart@linux.intel.com</email>
</author>
<published>2015-12-18T21:36:37Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=337f13046ff03717a9e99675284a817527440a49'/>
<id>urn:sha1:337f13046ff03717a9e99675284a817527440a49</id>
<content type='text'>
While reviewing Michael Kerrisk's recent futex manpage update, I noticed
that we allow the FUTEX_CLOCK_REALTIME flag for FUTEX_WAIT_BITSET but
not for FUTEX_WAIT.

FUTEX_WAIT is treated as a simple version for FUTEX_WAIT_BITSET
internally (with a bitmask of FUTEX_BITSET_MATCH_ANY). As such, I cannot
come up with a reason for this exclusion for FUTEX_WAIT.

This change does modify the behavior of the futex syscall, changing a
call with FUTEX_WAIT | FUTEX_CLOCK_REALTIME from returning -ENOSYS, to be
equivalent to FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME with a bitset of
FUTEX_BITSET_MATCH_ANY.

Reported-by: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Signed-off-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Link: http://lkml.kernel.org/r/9f3bdc116d79d23f5ee72ceb9a2a857f5ff8fa29.1450474525.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
</feed>
