diff options
| author | Michael Kerrisk <mtk.manpages@gmail.com> | 2014-03-10 14:46:07 +0100 | 
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2014-03-16 10:41:04 -0700 | 
| commit | 4f87dac386cc43d5525da7a939d4b4e7edbea22c (patch) | |
| tree | 80f2b7381fe213302d1903b32f9617d1de3501ce /tools/perf/scripts/python/failed-syscalls-by-pid.py | |
| parent | 3b4df68d0697129c172e66fd925cf32192044f9a (diff) | |
ipc: Fix 2 bugs in msgrcv() MSG_COPY implementation
While testing and documenting the msgrcv() MSG_COPY flag that Stanislav
Kinsbursky added in commit 4a674f34ba04 ("ipc: introduce message queue
copy feature" => kernel 3.8), I discovered a couple of bugs in the
implementation.  The two bugs concern MSG_COPY interactions with other
msgrcv() flags, namely:
 (A) MSG_COPY + MSG_EXCEPT
 (B) MSG_COPY + !IPC_NOWAIT
The bugs are distinct (and the fix for the first one is obvious),
however my fix for both is a single-line patch, which is why I'm
combining them in a single mail, rather than writing two mails+patches.
 ===== (A) MSG_COPY + MSG_EXCEPT =====
With the addition of the MSG_COPY flag, there are now two msgrcv()
flags--MSG_COPY and MSG_EXCEPT--that modify the meaning of the 'msgtyp'
argument in unrelated ways.  Specifying both in the same call is a
logical error that is currently permitted, with the effect that MSG_COPY
has priority and MSG_EXCEPT is ignored.  The call should give an error
if both flags are specified.  The patch below implements that behavior.
 ===== (B) (B) MSG_COPY + !IPC_NOWAIT =====
The test code that was submitted in commit 3a665531a3b7 ("selftests: IPC
message queue copy feature test") shows MSG_COPY being used in
conjunction with IPC_NOWAIT.  In other words, if there is no message at
the position 'msgtyp'.  return immediately with the error in ENOMSG.
What was not (fully) tested is the behavior if MSG_COPY is specified
*without* IPC_NOWAIT, and there is an odd behavior.  If the queue
contains less than 'msgtyp' messages, then the call blocks until the
next message is written to the queue.  At that point, the msgrcv() call
returns a copy of the newly added message, regardless of whether that
message is at the ordinal position 'msgtyp'.  This is clearly bogus, and
problematic for applications that might want to make use of the MSG_COPY
flag.
I considered the following possible solutions to this problem:
 (1) Force the call to block until a message *does* appear at the
     position 'msgtyp'.
 (2) If the MSG_COPY flag is specified, the kernel should implicitly add
     IPC_NOWAIT, so that the call fails with ENOMSG for this case.
 (3) If the MSG_COPY flag is specified, but IPC_NOWAIT is not, generate
     an error (probably, EINVAL is the right one).
I do not know if any application would really want to have the
functionality of solution (1), especially since an application can
determine in advance the number of messages in the queue using msgctl()
IPC_STAT.  Obviously, this solution would be the most work to implement.
Solution (2) would have the effect of silently fixing any applications
that tried to employ broken behavior.  However, it would mean that if we
later decided to implement solution (1), then user-space could not
easily detect what the kernel supports (but, since I'm somewhat doubtful
that solution (1) is needed, I'm not sure that this is much of a
problem).
Solution (3) would have the effect of informing broken applications that
they are doing something broken.  The downside is that this would cause
a ABI breakage for any applications that are currently employing the
broken behavior.  However:
a) Those applications are almost certainly not getting the results they
   expect.
b) Possibly, those applications don't even exist, because MSG_COPY is
   currently hidden behind CONFIG_CHECKPOINT_RESTORE.
The upside of solution (3) is that if we later decided to implement
solution (1), user-space could determine what the kernel supports, via
the error return.
In my view, solution (3) is mildly preferable to solution (2), and
solution (1) could still be done later if anyone really cares.  The
patch below implements solution (3).
PS.  For anyone out there still listening, it's the usual story:
documenting an API (and the thinking about, and the testing of the API,
that documentation entails) is the one of the single best ways of
finding bugs in the API, as I've learned from a lot of experience.  Best
to do that documentation before releasing the API.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: stable@vger.kernel.org
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'tools/perf/scripts/python/failed-syscalls-by-pid.py')
0 files changed, 0 insertions, 0 deletions
