From f37755490fe9bf76f6ba1d8c6591745d3574a6a6 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Red Hat)" Date: Mon, 15 Feb 2016 12:36:14 -0500 Subject: tracepoints: Do not trace when cpu is offline The tracepoint infrastructure uses RCU sched protection to enable and disable tracepoints safely. There are some instances where tracepoints are used in infrastructure code (like kfree()) that get called after a CPU is going offline, and perhaps when it is coming back online but hasn't been registered yet. This can probuce the following warning: [ INFO: suspicious RCU usage. ] 4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S ------------------------------- include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/8/0. stack backtrace: CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S 4.4.0-00006-g0fe53e8-dirty #34 Call Trace: [c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable) [c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170 [c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440 [c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100 [c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150 [c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140 [c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310 [c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60 [c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40 [c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560 [c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360 [c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14 This warning is not a false positive either. RCU is not protecting code that is being executed while the CPU is offline. Instead of playing "whack-a-mole(TM)" and adding conditional statements to the tracepoints we find that are used in this instance, simply add a cpu_online() test to the tracepoint code where the tracepoint will be ignored if the CPU is offline. Use of raw_smp_processor_id() is fine, as there should never be a case where the tracepoint code goes from running on a CPU that is online and suddenly gets migrated to a CPU that is offline. Link: http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.org Reported-by: Denis Kirjanov Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints") Cc: stable@vger.kernel.org # v2.6.28+ Signed-off-by: Steven Rostedt --- include/linux/tracepoint.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include') diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index acd522a91539..acfdbf353a0b 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -14,8 +14,10 @@ * See the file COPYING for more details. */ +#include #include #include +#include #include #include @@ -132,6 +134,9 @@ extern void syscall_unregfunc(void); void *it_func; \ void *__data; \ \ + if (!cpu_online(raw_smp_processor_id())) \ + return; \ + \ if (!(cond)) \ return; \ prercu; \ -- cgit v1.2.3-70-g09d2 From b33c8ff4431a343561e2319f17c14286f2aa52e2 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 12 Feb 2016 22:26:42 +0100 Subject: tracing: Fix freak link error caused by branch tracer In my randconfig tests, I came across a bug that involves several components: * gcc-4.9 through at least 5.3 * CONFIG_GCOV_PROFILE_ALL enabling -fprofile-arcs for all files * CONFIG_PROFILE_ALL_BRANCHES overriding every if() * The optimized implementation of do_div() that tries to replace a library call with an division by multiplication * code in drivers/media/dvb-frontends/zl10353.c doing u32 adc_clock = 450560; /* 45.056 MHz */ if (state->config.adc_clock) adc_clock = state->config.adc_clock; do_div(value, adc_clock); In this case, gcc fails to determine whether the divisor in do_div() is __builtin_constant_p(). In particular, it concludes that __builtin_constant_p(adc_clock) is false, while __builtin_constant_p(!!adc_clock) is true. That in turn throws off the logic in do_div() that also uses __builtin_constant_p(), and instead of picking either the constant- optimized division, and the code in ilog2() that uses __builtin_constant_p() to figure out whether it knows the answer at compile time. The result is a link error from failing to find multiple symbols that should never have been called based on the __builtin_constant_p(): dvb-frontends/zl10353.c:138: undefined reference to `____ilog2_NaN' dvb-frontends/zl10353.c:138: undefined reference to `__aeabi_uldivmod' ERROR: "____ilog2_NaN" [drivers/media/dvb-frontends/zl10353.ko] undefined! ERROR: "__aeabi_uldivmod" [drivers/media/dvb-frontends/zl10353.ko] undefined! This patch avoids the problem by changing __trace_if() to check whether the condition is known at compile-time to be nonzero, rather than checking whether it is actually a constant. I see this one link error in roughly one out of 1600 randconfig builds on ARM, and the patch fixes all known instances. Link: http://lkml.kernel.org/r/1455312410-1058841-1-git-send-email-arnd@arndb.de Acked-by: Nicolas Pitre Fixes: ab3c9c686e22 ("branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y") Cc: stable@vger.kernel.org # v2.6.30+ Signed-off-by: Arnd Bergmann Signed-off-by: Steven Rostedt --- include/linux/compiler.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 00b042c49ccd..48f5aab117ae 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -144,7 +144,7 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect); */ #define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) ) #define __trace_if(cond) \ - if (__builtin_constant_p((cond)) ? !!(cond) : \ + if (__builtin_constant_p(!!(cond)) ? !!(cond) : \ ({ \ int ______r; \ static struct ftrace_branch_data \ -- cgit v1.2.3-70-g09d2