summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2012-03-29x86,kgdb: Fix DEBUG_RODATA limitation using text_poke()Jason Wessel
There has long been a limitation using software breakpoints with a kernel compiled with CONFIG_DEBUG_RODATA going back to 2.6.26. For this particular patch, it will apply cleanly and has been tested all the way back to 2.6.36. The kprobes code uses the text_poke() function which accommodates writing a breakpoint into a read-only page. The x86 kgdb code can solve the problem similarly by overriding the default breakpoint set/remove routines and using text_poke() directly. The x86 kgdb code will first attempt to use the traditional probe_kernel_write(), and next try using a the text_poke() function. The break point install method is tracked such that the correct break point removal routine will get called later on. Cc: x86@kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: stable@vger.kernel.org # >= 2.6.36 Inspried-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2012-03-29kgdb,debug_core: pass the breakpoint struct instead of address and memoryJason Wessel
There is extra state information that needs to be exposed in the kgdb_bpt structure for tracking how a breakpoint was installed. The debug_core only uses the the probe_kernel_write() to install breakpoints, but this is not enough for all the archs. Some arch such as x86 need to use text_poke() in order to install a breakpoint into a read only page. Passing the kgdb_bpt structure to kgdb_arch_set_breakpoint() and kgdb_arch_remove_breakpoint() allows other archs to set the type variable which indicates how the breakpoint was installed. Cc: stable@vger.kernel.org # >= 2.6.36 Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2012-03-29kgdbts: (2 of 2) fix single step awareness to work correctly with SMPJason Wessel
The do_fork and sys_open tests have never worked properly on anything other than a UP configuration with the kgdb test suite. This is because the test suite did not fully implement the behavior of a real debugger. A real debugger tracks the state of what thread it asked to single step and can correctly continue other threads of execution or conditionally stop while waiting for the original thread single step request to return. Below is a simple method to cause a fatal kernel oops with the kgdb test suite on a 2 processor ARM system: while [ 1 ] ; do ls > /dev/null 2> /dev/null; done& while [ 1 ] ; do ls > /dev/null 2> /dev/null; done& echo V1I1F100 > /sys/module/kgdbts/parameters/kgdbts Very soon after starting the test the kernel will start warning with messages like: kgdbts: BP mismatch c002487c expected c0024878 ------------[ cut here ]------------ WARNING: at drivers/misc/kgdbts.c:317 check_and_rewind_pc+0x9c/0xc4() [<c01f6520>] (check_and_rewind_pc+0x9c/0xc4) [<c01f595c>] (validate_simple_test+0x3c/0xc4) [<c01f60d4>] (run_simple_test+0x1e8/0x274) The kernel will eventually recovers, but the test suite has completely failed to test anything useful. This patch implements behavior similar to a real debugger that does not rely on hardware single stepping by using only software planted breakpoints. In order to mimic a real debugger, the kgdb test suite now tracks the most recent thread that was continued (cont_thread_id), with the intent to single step just this thread. When the response to the single step request stops in a different thread that hit the original break point that thread will now get continued, while the debugger waits for the thread with the single step pending. Here is a high level description of the sequence of events. cont_instead_of_sstep = 0; 1) set breakpoint at do_fork 2) continue 3) Save the thread id where we stop to cont_thread_id 4) Remove breakpoint at do_fork 5) Reset the PC if needed depending on kernel exception type 6) soft single step 7) Check where we stopped if current thread != cont_thread_id { if (here for more than 2 times for the same thead) { ### must be a really busy system, start test again ### goto step 1 } goto step 5 } else { cont_instead_of_sstep = 0; } 8) clean up and run test again if needed 9) Clear out any threads that were waiting on a break point at the point in time the test is ended with get_cont_catch(). This happens sometimes because breakpoints are used in place of single stepping and some threads could have been in the debugger exception handling queue because breakpoints were hit concurrently on different CPUs. This also means we wait at least one second before unplumbing the debugger connection at the very end, so as respond to any debug threads waiting to be serviced. Cc: stable@vger.kernel.org # >= 3.0 Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2012-03-29kgdbts: (1 of 2) fix single step awareness to work correctly with SMPJason Wessel
The do_fork and sys_open tests have never worked properly on anything other than a UP configuration with the kgdb test suite. This is because the test suite did not fully implement the behavior of a real debugger. A real debugger tracks the state of what thread it asked to single step and can correctly continue other threads of execution or conditionally stop while waiting for the original thread single step request to return. Below is a simple method to cause a fatal kernel oops with the kgdb test suite on a 4 processor x86 system: while [ 1 ] ; do ls > /dev/null 2> /dev/null; done& while [ 1 ] ; do ls > /dev/null 2> /dev/null; done& while [ 1 ] ; do ls > /dev/null 2> /dev/null; done& while [ 1 ] ; do ls > /dev/null 2> /dev/null; done& echo V1I1F1000 > /sys/module/kgdbts/parameters/kgdbts Very soon after starting the test the kernel will oops with a message like: kgdbts: BP mismatch 3b7da66480 expected ffffffff8106a590 WARNING: at drivers/misc/kgdbts.c:303 check_and_rewind_pc+0xe0/0x100() Call Trace: [<ffffffff812994a0>] check_and_rewind_pc+0xe0/0x100 [<ffffffff81298945>] validate_simple_test+0x25/0xc0 [<ffffffff81298f77>] run_simple_test+0x107/0x2c0 [<ffffffff81298a18>] kgdbts_put_char+0x18/0x20 The warn will turn to a hard kernel crash shortly after that because the pc will not get properly rewound to the right value after hitting a breakpoint leading to a hard lockup. This change is broken up into 2 pieces because archs that have hw single stepping (2.6.26 and up) need different changes than archs that do not have hw single stepping (3.0 and up). This change implements the correct behavior for an arch that supports hw single stepping. A minor defect was fixed where sys_open should be do_sys_open for the sys_open break point test. This solves the problem of running a 64 bit with a 32 bit user space. The sys_open() never gets called when using the 32 bit file system for the kgdb testsuite because the 32 bit binaries invoke the compat_sys_open() call leading to the test never completing. In order to mimic a real debugger, the kgdb test suite now tracks the most recent thread that was continued (cont_thread_id), with the intent to single step just this thread. When the response to the single step request stops in a different thread that hit the original break point that thread will now get continued, while the debugger waits for the thread with the single step pending. Here is a high level description of the sequence of events. cont_instead_of_sstep = 0; 1) set breakpoint at do_fork 2) continue 3) Save the thread id where we stop to cont_thread_id 4) Remove breakpoint at do_fork 5) Reset the PC if needed depending on kernel exception type 6) if (cont_instead_of_sstep) { continue } else { single step } 7) Check where we stopped if current thread != cont_thread_id { cont_instead_of_sstep = 1; goto step 5 } else { cont_instead_of_sstep = 0; } 8) clean up and run test again if needed Cc: stable@vger.kernel.org # >= 2.6.26 Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2012-03-29kgdbts: Fix kernel oops with CONFIG_DEBUG_RODATAJason Wessel
On x86 the kgdb test suite will oops when the kernel is compiled with CONFIG_DEBUG_RODATA and you run the tests after boot time. This is regression has existed since 2.6.26 by commit: b33cb815 (kgdbts: Use HW breakpoints with CONFIG_DEBUG_RODATA). The test suite can use hw breakpoints for all the tests, but it has to execute the hardware breakpoint specific tests first in order to determine that the hw breakpoints actually work. Specifically the very first test causes an oops: # echo V1I1 > /sys/module/kgdbts/parameters/kgdbts kgdb: Registered I/O driver kgdbts. kgdbts:RUN plant and detach test Entering kdb (current=0xffff880017aa9320, pid 1078) on processor 0 due to Keyboard Entry [0]kdb> kgdbts: ERROR PUT: end of test buffer on 'plant_and_detach_test' line 1 expected OK got $E14#aa WARNING: at drivers/misc/kgdbts.c:730 run_simple_test+0x151/0x2c0() [...oops clipped...] This commit re-orders the running of the tests and puts the RODATA check into its own function so as to correctly avoid the kernel oops by detecting and using the hw breakpoints. Cc: <stable@vger.kernel.org> # >= 2.6.26 Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2012-03-29kdb: Fix smatch warning on dbg_io_ops->is_consoleJason Wessel
The Smatch tool warned that the change from commit b8adde8dd (kdb: Avoid using dbg_io_ops until it is initialized) should add another null check later in the kdb_printf(). It is worth noting that the second use of dbg_io_ops->is_console is protected by the KDB_PAGER state variable which would only get set when kdb is fully active and initialized. If we ever encounter changes or defects in the KDB_PAGER state we do not want to crash the kernel in a kdb_printf/printk. CC: Tim Bird <tim.bird@am.sony.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2012-03-22kdb: Add message about CONFIG_DEBUG_RODATA on failure to install breakpointJason Wessel
On x86, if CONFIG_DEBUG_RODATA is set, one cannot set breakpoints via KDB. Apparently this is a well-known problem, as at least one distribution now ships with both KDB enabled and CONFIG_DEBUG_RODATA=y for security reasons. This patch adds an printk message to the breakpoint failure case, in order to provide suggestions about how to use the debugger. Reported-by: Tim Bird <tim.bird@am.sony.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Tim Bird <tim.bird@am.sony.com>
2012-03-22kdb: Avoid using dbg_io_ops until it is initializedTim Bird
This fixes a bug with setting a breakpoint during kdb initialization (from kdb_cmds). Any call to kdb_printf() before the initialization of the kgdboc serial console driver (which happens much later during bootup than kdb_init), results in kernel panic due to the use of dbg_io_ops before it is initialized. Signed-off-by: Tim Bird <tim.bird@am.sony.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2012-03-22kgdb,debug_core: add the ability to control the reboot notifierJason Wessel
Sometimes it is desirable to stop the kernel debugger before allowing a system to reboot either with kdb or kgdb. This patch adds the ability to turn the reboot notifier on and off or enter the debugger and stop kernel execution before rebooting. It is possible to change the setting after booting the kernel with the following: echo 1 > /sys/module/debug_core/parameters/kgdbreboot It is also possible to change this setting using kdb / kgdb to manipulate the variable directly. Using KDB: mm kgdbreboot 1 Using gdb: set kgdbreboot=1 Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2012-03-22KDB: Fix usability issues relating to the 'enter' key.Andrei Warkentin
This fixes the following problems: 1) Typematic-repeat of 'enter' gives warning message and leaks make/break if KDB exits. Repeats look something like 0x1c 0x1c .... 0x9c 2) Use of 'keypad enter' gives warning message and leaks the ENTER break/make code out if KDB exits. KP ENTER repeats look someting like 0xe0 0x1c 0xe0 0x1c ... 0xe0 0x9c. 3) Lag on the order of seconds between "break" and "make" when expecting the enter "break" code. Seen under virtualized environments such as VMware ESX. The existing special enter handler tries to glob the enter break code, but this fails if the other (KP) enter was used, or if there was a key repeat. It also fails if you mashed some keys along with enter, and you ended up with a non-enter make or non-enter break code coming after the enter make code. So first, we modify the handler to handle these cases. But performing these actions on every enter is annoying since now you can't hold ENTER down to scroll <more>d messages in KDB. Since this special behaviour is only necessary to handle the exiting KDB ('g' + ENTER) without leaking scancodes to the OS. This cleanup needs to get executed anytime the kdb_main loop exits. Tested on QEMU. Set a bp on atkbd.c to verify no scan code was leaked. Cc: Andrei Warkentin <andreiw@vmware.com> [jason.wessel@windriver.com: move cleanup calls to kdb_main.c] Signed-off-by: Andrei Warkentin <andrey.warkentin@gmail.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2012-03-22kgdb,debug-core,gdbstub: Hook the reboot notifier for debugger detachJason Wessel
The gdbstub and kdb should get detached if the system is rebooting. Calling gdbstub_exit() will set the proper debug core state and send a message to any debugger that is connected to correctly detach. An attached debugger will receive the exit code from include/linux/reboot.h based on SYS_HALT, SYS_REBOOT, etc... Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2012-03-22kgdb: Respect that flush op is optionalJan Kiszka
Not all kgdb I/O drivers implement a flush operation. Adjust gdbstub_exit accordingly. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2012-03-22kgdb: x86: Return all segment registers also in 64-bit modeJan Kiszka
Even if the content is always 0, gdb expects us to return also ds, es, fs, and gs while in x86-64 mode. Do this to avoid ugly errors on "info registers". [jason.wessel@windriver.com: adjust NUMREGBYTES for two new regs] Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2012-03-18Linux 3.3v3.3Linus Torvalds
2012-03-18Don't limit non-nested epoll pathsJason Baron
Commit 28d82dc1c4ed ("epoll: limit paths") that I did to limit the number of possible wakeup paths in epoll is causing a few applications to longer work (dovecot for one). The original patch is really about limiting the amount of epoll nesting (since epoll fds can be attached to other fds). Thus, we probably can allow an unlimited number of paths of depth 1. My current patch limits it at 1000. And enforce the limits on paths that have a greater depth. This is captured in: https://bugzilla.redhat.com/show_bug.cgi?id=681578 Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking changes from David Miller: "1) icmp6_dst_alloc() returns NULL instead of ERR_PTR() leading to crashes, particularly during shutdown. Reported by Dave Jones and fixed by Eric Dumazet. 2) hyperv and wimax/i2400m return NETDEV_TX_BUSY when they have already freed the SKB, which causes crashes as to the caller this means requeue the packet. Fixes from Eric Dumazet. 3) usbnet driver doesn't allocate the right amount of headroom on fresh RX SKBs, fix from Eric Dumazet. 4) Fix regression in ip6_mc_find_dev_rcu(), as an RCU lookup it abolutely should not take a reference to 'dev', this leads to leaks. Fix from RonQing Li. 5) Fix netfilter ctnetlink race between delete and timeout expiration. From Pablo Neira Ayuso. 6) Revert SFQ change which causes regressions, specifically queueing to tail can lead to unavoidable flow starvation. From Eric Dumazet. 7) Fix a memory leak and a crash on corrupt firmware files in bnx2x, from Michal Schmidt." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: netfilter: ctnetlink: fix race between delete and timeout expiration ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu. wimax/i2400m: fix erroneous NETDEV_TX_BUSY use net/hyperv: fix erroneous NETDEV_TX_BUSY use net/usbnet: reserve headroom on rx skbs bnx2x: fix memory leak in bnx2x_init_firmware() bnx2x: fix a crash on corrupt firmware file sch_sfq: revert dont put new flow at the end of flows ipv6: fix icmp6_dst_alloc()
2012-03-17Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar. * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools, x86: Build perf on older user-space as well perf tools: Use scnprintf where applicable perf tools: Incorrect use of snprintf results in SEGV
2012-03-17netfilter: ctnetlink: fix race between delete and timeout expirationPablo Neira Ayuso
Kerin Millar reported hardlockups while running `conntrackd -c' in a busy firewall. That system (with several processors) was acting as backup in a primary-backup setup. After several tries, I found a race condition between the deletion operation of ctnetlink and timeout expiration. This patch fixes this problem. Tested-by: Kerin Millar <kerframil@gmail.com> Reported-by: Kerin Millar <kerframil@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.RongQing.Li
ip6_mc_find_dev_rcu() is called with rcu_read_lock(), so don't need to dev_hold(). With dev_hold(), not corresponding dev_put(), will lead to leak. [ bug introduced in 96b52e61be1 (ipv6: mcast: RCU conversions) ] Signed-off-by: RongQing.Li <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16Merge branch 'akpm' (more patches from Andrew)Linus Torvalds
Merge some more email patches from Andrew Morton: "A couple of nilfs fixes" * emailed from Andrew Morton <akpm@linux-foundation.org>: nilfs2: fix NULL pointer dereference in nilfs_load_super_block() nilfs2: clamp ns_r_segments_percentage to [1, 99]
2012-03-16nilfs2: fix NULL pointer dereference in nilfs_load_super_block()Ryusuke Konishi
According to the report from Slicky Devil, nilfs caused kernel oops at nilfs_load_super_block function during mount after he shrank the partition without resizing the filesystem: BUG: unable to handle kernel NULL pointer dereference at 00000048 IP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2] *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP ... Call Trace: [<d0d7a87b>] init_nilfs+0x4b/0x2e0 [nilfs2] [<d0d6f707>] nilfs_mount+0x447/0x5b0 [nilfs2] [<c0226636>] mount_fs+0x36/0x180 [<c023d961>] vfs_kern_mount+0x51/0xa0 [<c023ddae>] do_kern_mount+0x3e/0xe0 [<c023f189>] do_mount+0x169/0x700 [<c023fa9b>] sys_mount+0x6b/0xa0 [<c04abd1f>] sysenter_do_call+0x12/0x28 Code: 53 18 8b 43 20 89 4b 18 8b 4b 24 89 53 1c 89 43 24 89 4b 20 8b 43 20 c7 43 2c 00 00 00 00 23 75 e8 8b 50 68 89 53 28 8b 54 b3 20 <8b> 72 48 8b 7a 4c 8b 55 08 89 b3 84 00 00 00 89 bb 88 00 00 00 EIP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2] SS:ESP 0068:ca9bbdcc CR2: 0000000000000048 This turned out due to a defect in an error path which runs if the calculated location of the secondary super block was invalid. This patch fixes it and eliminates the reported oops. Reported-by: Slicky Devil <slicky.dvl@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Slicky Devil <slicky.dvl@gmail.com> Cc: <stable@vger.kernel.org> [2.6.30+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-16nilfs2: clamp ns_r_segments_percentage to [1, 99]Haogang Chen
ns_r_segments_percentage is read from the disk. Bogus or malicious value could cause integer overflow and malfunction due to meaningless disk usage calculation. This patch reports error when mounting such bogus volumes. Signed-off-by: Haogang Chen <haogangchen@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-16Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull maintainer update from James Morris: "Please pull this patch which adds Serge as maintainer of the capabilities code, as discussed on lwn and the lsm list. New capabilities must be signed off by the maintainer, and new uses of any capabilities should at be cc'd to the maintainer." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: MAINTAINERS: Add Serge as maintainer of capabilities
2012-03-16Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreamingLinus Torvalds
Pull c6x bugfix from Mark Salter: "Remove dead code from entry.S which causes a build failure when using a newer assembler (v2.22 complains about it, v2.20 ignores it)." * tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming: C6X: remove dead code from entry.S
2012-03-16afs: Remote abort can cause BUG in rxrpc codeAnton Blanchard
When writing files to afs I sometimes hit a BUG: kernel BUG at fs/afs/rxrpc.c:179! With a backtrace of: afs_free_call afs_make_call afs_fs_store_data afs_vnode_store_data afs_write_back_from_locked_page afs_writepages_region afs_writepages The cause is: ASSERT(skb_queue_empty(&call->rx_queue)); Looking at a tcpdump of the session the abort happens because we are exceeding our disk quota: rx abort fs reply store-data error diskquota exceeded (32) So the abort error is valid. We hit the BUG because we haven't freed all the resources for the call. By freeing any skbs in call->rx_queue before calling afs_free_call we avoid hitting leaking memory and avoid hitting the BUG. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: David Howells <dhowells@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-16afs: Read of file returns EBADMSGAnton Blanchard
A read of a large file on an afs mount failed: # cat junk.file > /dev/null cat: junk.file: Bad message Looking at the trace, call->offset wrapped since it is only an unsigned short. In afs_extract_data: _enter("{%u},{%zu},%d,,%zu", call->offset, len, last, count); ... if (call->offset < count) { if (last) { _leave(" = -EBADMSG [%d < %zu]", call->offset, count); return -EBADMSG; } Which matches the trace: [cat ] ==> afs_extract_data({65132},{524},1,,65536) [cat ] <== afs_extract_data() = -EBADMSG [0 < 65536] call->offset went from 65132 to 0. Fix this by making call->offset an unsigned int. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: David Howells <dhowells@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-16C6X: remove dead code from entry.SMark Salter
The ENDPROC() on sys_fadvise64_c6x() in arch/c6x/kernel/entry.S is outside of the conditional block with the matching ENTRY() macro. This leads a newer (v2.22 vs. v2.20) assembler to complain: /tmp/ccGZBaPT.s: Assembler messages: /tmp/ccGZBaPT.s: Error: .size expression for sys_fadvise64_c6x does not evaluate to a constant The conditional block became dead code when c6x switched to generic unistd.h and should be removed along with the offending ENDPROC(). Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: David Howells <dhowells@redhat.com>
2012-03-16wimax/i2400m: fix erroneous NETDEV_TX_BUSY useEric Dumazet
A driver start_xmit() method cannot free skb and return NETDEV_TX_BUSY, since caller is going to reuse freed skb. In fact netif_tx_stop_queue() / netif_stop_queue() is needed before returning NETDEV_TX_BUSY or you can trigger a ksoftirqd fatal loop. In case of memory allocation error, only safe way is to drop the packet and return NETDEV_TX_OK Also increments tx_dropped counter Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16net/hyperv: fix erroneous NETDEV_TX_BUSY useEric Dumazet
A driver start_xmit() method cannot free skb and return NETDEV_TX_BUSY, since caller is going to reuse freed skb. This is mostly a revert of commit bf769375c (staging: hv: fix the return status of netvsc_start_xmit()) In fact netif_tx_stop_queue() / netif_stop_queue() is needed before returning NETDEV_TX_BUSY or you can trigger a ksoftirqd fatal loop. In case of memory allocation error, only safe way is to drop the packet and return NETDEV_TX_OK Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16net/usbnet: reserve headroom on rx skbsEric Dumazet
network drivers should reserve some headroom on incoming skbs so that we dont need expensive reallocations, eg forwarding packets in tunnels. This NET_SKB_PAD padding is done in various helpers, like __netdev_alloc_skb_ip_align() in this patch, combining NET_SKB_PAD and NET_IP_ALIGN magic. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Oliver Neukum <oneukum@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16bnx2x: fix memory leak in bnx2x_init_firmware()Michal Schmidt
When cycling the interface down and up, bnx2x_init_firmware() knows that the firmware is already loaded, but nevertheless it allocates certain arrays anew (init_data, init_ops, init_ops_offsets, iro_arr). The old arrays are leaked. Fix the leaks by returning early if the firmware was already loaded. Because if the firmware is loaded, so are the arrays. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Acked-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16bnx2x: fix a crash on corrupt firmware fileMichal Schmidt
If the requested firmware is deemed corrupt and then released, reset the pointer to NULL in order to avoid double-freeing it in bnx2x_release_firmware() or dereferencing it in bnx2x_init_firmware(). Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Acked-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16sch_sfq: revert dont put new flow at the end of flowsEric Dumazet
This reverts commit d47a0ac7b6 (sch_sfq: dont put new flow at the end of flows) As Jesper found out, patch sounded great but has bad side effects. In stress situation, pushing new flows in front of the queue can prevent old flows doing any progress. Packets can stay in SFQ queue for unlimited amount of time. It's possible to add heuristics to limit this problem, but this would add complexity outside of SFQ scope. A more sensible answer to Dave Taht concerns (who reported the issued I tried to solve in original commit) is probably to use a qdisc hierarchy so that high prio packets dont enter a potentially crowded SFQ qdisc. Reported-by: Jesper Dangaard Brouer <jdb@comx.dk> Cc: Dave Taht <dave.taht@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16ipv6: fix icmp6_dst_alloc()Eric Dumazet
commit 87a115783 ( ipv6: Move xfrm_lookup() call down into icmp6_dst_alloc().) forgot to convert one error path, leading to crashes in mld_sendpack() Many thanks to Dave Jones for providing a very complete bug report. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16MAINTAINERS: Add Serge as maintainer of capabilitiesJames Morris
Add Serge as maintainer of capabilities, per suggestion on LWN: http://lwn.net/Articles/486306/ Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-03-15Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge patches from Andrew Morton: "Nine patches - some bug fixes and some MAINTAINERS fiddling." * emailed from Andrew Morton <akpm@linux-foundation.org>: drivers/video/backlight/s6e63m0.c: fix corruption storing gamma mode MAINTAINERS: add entry for exynos mipi display drivers MAINTAINERS: fix link to Gustavo Padovans tree MAINTAINERS: add Johan to Bluetooth maintainers MAINTAINERS: Gustavo has moved prctl: use CAP_SYS_RESOURCE for PR_SET_MM option rapidio/tsi721: fix bug in register offset definitions MAINTAINERS: update ST's Mailing list for SPEAr memcg: free mem_cgroup by RCU to fix oops
2012-03-15Merge branch 'i2c-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging Pull i2c subsystem fixes from Jean Delvare. * 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: i2c-algo-bit: Fix spurious SCL timeouts under heavy load i2c-core: Comment says "transmitted" but means "received"
2012-03-15Merge tag 'hwmon-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck. * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (zl6100) Enable interval between chip accesses for all chips hwmon: (w83627ehf) Describe undocumented pwm attributes hwmon: (w83627ehf) Fix temp2 source for W83627UHG hwmon: (w83627ehf) Fix memory leak in probe function hwmon: (w83627ehf) Fix writing into fan_stop_time for NCT6775F/NCT6776F
2012-03-15Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm exynos/intel updates from Dave Airlie: "Two minor updates from Jesse for Intel SNB fixes, and a few fixes from Samsung for exynos. The pull req has Alan's commit in it since Intel based their tree on my tree at that time, but it all seems fine wrt merging." * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm exynos: use drm_fb_helper_set_par directly drm/exynos: Fix fb_videomode <-> drm_mode_modeinfo conversion drm/exynos: fix runtime_pm fimd device state on probe drm/exynos: use correct 'exynos-drm' name for platform device drm/i915: support 32 bit BGR formats in sprite planes drm/i915: fix color order for BGR formats on SNB drm/gma500: Fix Cedarview boot failures in 3.3-rc
2012-03-15Merge branch 'v4l_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "For 4 fixes for 3.3 (all trivial): - uvc video driver: fixes a division by zero; - davinci: add module.h to fix compilation; - smsusb: fix the delivery system setting; - smsdvb: the get_frontend implementation there is broken. The smsdvb patch has 127 lines, but it is trivial: instead of returning a cache of the set_frontend (with is wrong, as it doesn't have the updated values for the data, and the implementation there is buggy), it copies the information of the detected DVB parameters from the smsdvb private structures into the corresponding DVBv5 struct fields." * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: [media] smsdvb: fix get_frontend [media] smsusb: fix the default delivery system setting [media] media: davinci: added module.h to resolve unresolved macros [media] [FOR,v3.3] uvcvideo: Avoid division by 0 in timestamp calculation
2012-03-15Merge branch '3.3-urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull target fixes from Nicholas Bellinger: "This series addresses two recently reported regression bugs related to legacy SCSI reservation usage in target core, and iscsi-target reservation conflict handling. The second patch in particular addresses possible data-corruption with SCSI reservations that is specific to iscsi-target fabric LUNs with multiple client writers. Both patches need to go into v3.2 stable ASAP, and the branch based on the last target-pending/3.3-rc-fixes HEAD. Again, thanks to Martin Svec for his help to identify and address this regression bug with iscsi-target." * '3.3-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: iscsi-target: Fix reservation conflict -EBUSY response handling bug target: Fix compatible reservation handling (CRH=1) with legacy RESERVE/RELEASE
2012-03-15drivers/video/backlight/s6e63m0.c: fix corruption storing gamma modeDan Carpenter
strict_strtoul() writes a long but ->gamma_mode only has space to store an int, so on 64 bit systems we end up scribbling over ->gamma_table_count as well. I've changed it to use kstrtouint() instead. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Inki Dae <inki.dae@samsung.com> Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-15MAINTAINERS: add entry for exynos mipi display driversDonghwa Lee
I'd like to add Inki Dae, Donghwa Lee and Kyungmin Park as maintainers who developers for exynos mipi display drivers for video/driver/exynos/exynos_mipi* and include/video/exynos_mipi*. Signed-off-by: Donghwa Lee <dh09.lee@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Kukjin Kim <kgene.kim@samsung.com> Cc: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-15MAINTAINERS: fix link to Gustavo Padovans treeJohan Hedberg
Gustavo's tree is called just bluetooth.git and not bluetooth-2.6.git anymore. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: "Gustavo F. Padovan" <padovan@profusion.mobi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-15MAINTAINERS: add Johan to Bluetooth maintainersJohan Hedberg
I've been coordinating Bluetooth patches in my tree for some time and it's possible I'll do it in the future too, so add myself to the Bluetooth sections as well as mention my tree there. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: "Gustavo F. Padovan" <padovan@profusion.mobi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-15MAINTAINERS: Gustavo has movedGustavo Padovan
This is going to be the primary e-mail for kernel development. Signed-off-by: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-15prctl: use CAP_SYS_RESOURCE for PR_SET_MM optionCyrill Gorcunov
CAP_SYS_ADMIN is already overloaded left and right, so to have more fine-grained access control use CAP_SYS_RESOURCE here. The CAP_SYS_RESOUCE is chosen because this prctl option allows a current process to adjust some fields of memory map descriptor which rather represents what the process owns: pointers to code, data, stack segments, command line, auxiliary vector data and etc. Suggested-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Bolle <pebolle@tiscali.nl> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-15rapidio/tsi721: fix bug in register offset definitionsAlexandre Bounine
Fix indexed register offset definitions that use decimal (wrong) instead of hexadecimal (correct) notation for indexing multipliers. Incorrect definitions do not affect Tsi721 driver in its current default configuration because it uses only IDB queue 0. Loss of inbound doorbell functionality should be observed if queue other than 0 is used. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Chul Kim <chul.kim@idt.com> Cc: <stable@vger.kernel.org> [3.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-15MAINTAINERS: update ST's Mailing list for SPEArViresh Kumar
We have created a ST's Mailing list for SPEAr. This can be accessed from non-st email ids. I want people to cc this list, when they have changes specific to SPEAr. So, its better to get this updated in MAINTAINERS file. linux-arm-kernel@lists.infradead.org is also added for SPEAr. Signed-off-by: Viresh Kumar <viresh.kumar@st.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-15memcg: free mem_cgroup by RCU to fix oopsHugh Dickins
After fixing the GPF in mem_cgroup_lru_del_list(), three times one machine running a similar load (moving and removing memcgs while swapping) has oopsed in mem_cgroup_zone_nr_lru_pages(), when retrieving memcg zone numbers for get_scan_count() for shrink_mem_cgroup_zone(): this is where a struct mem_cgroup is first accessed after being chosen by mem_cgroup_iter(). Just what protects a struct mem_cgroup from being freed, in between mem_cgroup_iter()'s css_get_next() and its css_tryget()? css_tryget() fails once css->refcnt is zero with CSS_REMOVED set in flags, yes: but what if that memory is freed and reused for something else, which sets "refcnt" non-zero? Hmm, and scope for an indefinite freeze if refcnt is left at zero but flags are cleared. It's tempting to move the css_tryget() into css_get_next(), to make it really "get" the css, but I don't think that actually solves anything: the same difficulty in moving from css_id found to stable css remains. But we already have rcu_read_lock() around the two, so it's easily fixed if __mem_cgroup_free() just uses kfree_rcu() to free mem_cgroup. However, a big struct mem_cgroup is allocated with vzalloc() instead of kzalloc(), and we're not allowed to vfree() at interrupt time: there doesn't appear to be a general vfree_rcu() to help with this, so roll our own using schedule_work(). The compiler decently removes vfree_work() and vfree_rcu() when the config doesn't need them. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>