author | Linus Torvalds <torvalds@linux-foundation.org> | 2016-01-11 14:39:17 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2016-01-11 14:39:17 -0800 |
commit | 5cb52b5e1654f3f1ed9c32e34456d98559c85aa0 (patch) | |
tree | 737c73d6aef99a17f57c2974f1e2a142a5f1a377 /tools/perf/builtin-script.c | |
parent | 24af98c4cf5f5e69266e270c7f3fb34b82ff6656 (diff) | |
parent | 3eb9ede23bdd96e9ba60e2b4d4d17a7c35d58448 (diff) |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Intel Knights Landing support. (Harish Chegondi)
- Intel Broadwell-EP uncore PMU support. (Kan Liang)
- Core code improvements. (Peter Zijlstra)
- Event filter, LBR and PEBS fixes. (Stephane Eranian)
- Enable cycles:pp on Intel Atom. (Stephane Eranian)
- Add cycles:ppp support for Skylake. (Andi Kleen)
- Various x86 NMI overhead optimizations. (Andi Kleen)
- Intel PT enhancements. (Takao Indoh)
- AMD cache events fix. (Vince Weaver)
Tons of tooling changes:
- Show random perf tool tips in the 'perf report' bottom line
(Namhyung Kim)
- perf report now defaults to --group if the perf.data file has
grouped events, try it with:
# perf record -e '{cycles,instructions}' -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.093 MB perf.data (1247 samples) ]
# perf report
# Samples: 1K of event 'anon group { cycles, instructions }'
# Event count (approx.): 1955219195
#
# Overhead Command Shared Object Symbol
2.86% 0.22% swapper [kernel.kallsyms] [k] intel_idle
1.05% 0.33% firefox libxul.so [.] js::SetObjectElement
1.05% 0.00% kworker/0:3 [kernel.kallsyms] [k] gen6_ring_get_seqno
0.88% 0.17% chrome chrome [.] 0x0000000000ee27ab
0.65% 0.86% firefox libxul.so [.] js::ValueToId<(js::AllowGC)1>
0.64% 0.23% JS Helper libxul.so [.] js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::splay
0.62% 1.27% firefox libxul.so [.] js::GetIterator
0.61% 1.74% firefox libxul.so [.] js::NativeSetProperty
0.61% 0.31% firefox libxul.so [.] js::SetPropertyByDefining
- Introduce the 'perf stat record/report' workflow:
Generate perf.data files from 'perf stat', to tap into the
scripting capabilities perf has instead of defining a 'perf stat'
specific scripting support to calculate event ratios, etc.
Simple example:
$ perf stat record -e cycles usleep 1
Performance counter stats for 'usleep 1':
1,134,996 cycles
0.000670644 seconds time elapsed
$ perf stat report
Performance counter stats for '/home/acme/bin/perf stat record -e cycles usleep 1':
1,134,996 cycles
0.000670644 seconds time elapsed
$
It generates PERF_RECORD_ userspace records to store the details:
$ perf report -D | grep PERF_RECORD
0xf0 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27637
0x118 [0x12]: PERF_RECORD_CPU_MAP nr: 1 cpu: 65535
0x12a [0x40]: PERF_RECORD_STAT_CONFIG
0x16a [0x30]: PERF_RECORD_STAT
-1 -1 0x19a [0x40]: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x1f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text
0x1da [0x18]: PERF_RECORD_STAT_ROUND
[acme@ssdandy linux]$
An effort was made so that perf.data files generated this way do
not produce cryptic messages when processed by older tools.
The 'perf script' bits need rebasing, will go up later.
- Make command line options always available, even when they depend
on some feature being enabled, warning the user about use of such
options (Wang Nan)
- Support hw breakpoint events (mem:0xAddress) in the default output
mode in 'perf script' (Wang Nan)
- Fixes and improvements for supporting annotating ARM binaries,
support ARM call and jump instructions, more work needed to have
arch specific stuff separated into tools/perf/arch/*/annotate/
(Russell King)
- Add initial 'perf config' command, for now just with a --list
option to show the contents of the configuration file in use and a
basic man page describing its format; commands for doing edits and
detailed documentation are being reviewed and proofread. (Taeung
Song)
- Allow BPF scriptlets to specify arguments to be fetched using DWARF
info, using a prologue generated at compile/build time (He Kuang,
Wang Nan)
- Allow attaching BPF scriptlets to module symbols (Wang Nan)
- Allow attaching BPF scriptlets to userspace code using uprobe (Wang
Nan)
- BPF programs can now specify 'perf probe' tunables via their section
name, separating key=val values using semicolons (Wang Nan)
Testing some of these new BPF features:
Use case: get callchains when receiving SSL packets, filter them in the
kernel, at an arbitrary place.
# cat ssl.bpf.c
#define SEC(NAME) __attribute__((section(NAME), used))
struct pt_regs;
SEC("func=__inet_lookup_established hnum")
int func(struct pt_regs *ctx, int err, unsigned short port)
{
return err == 0 && port == 443;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
#
# perf record -a -g -e ssl.bpf.c
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.787 MB perf.data (3 samples) ]
# perf script | head -30
swapper 0 [000] 58783.268118: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
8572a8 process_backlog (/lib/modules/4.3.0+/build/vmlinux)
856b11 net_rx_action (/lib/modules/4.3.0+/build/vmlinux)
2a284b __do_softirq (/lib/modules/4.3.0+/build/vmlinux)
2a2ba3 irq_exit (/lib/modules/4.3.0+/build/vmlinux)
96b7a4 do_IRQ (/lib/modules/4.3.0+/build/vmlinux)
969807 ret_from_intr (/lib/modules/4.3.0+/build/vmlinux)
2dede5 cpu_startup_entry (/lib/modules/4.3.0+/build/vmlinux)
95d5bc rest_init (/lib/modules/4.3.0+/build/vmlinux)
1163ffa start_kernel ([kernel.vmlinux].init.text)
11634d7 x86_64_start_reservations ([kernel.vmlinux].init.text)
1163623 x86_64_start_kernel ([kernel.vmlinux].init.text)
qemu-system-x86 9178 [003] 58785.792417: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
856660 netif_receive_skb_internal (/lib/modules/4.3.0+/build/vmlinux)
8566ec netif_receive_skb_sk (/lib/modules/4.3.0+/build/vmlinux)
430a br_handle_frame_finish ([bridge])
48bc br_handle_frame ([bridge])
855f44 __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
#
- Use the various 'perf probe' options to list functions and see what
variables can be collected at any given point. Experiment first by
collecting without a filter, then add a filter. Use it together with
'perf trace' or 'perf top', with or without callchains. If it
explodes, please tell us!
- Introduce a new callchain mode, "folded", that lists one-line
representations of all callchains for a given histogram entry,
facilitating 'perf report' output processing by other tools, such
as Brendan Gregg's flamegraph tools (Namhyung Kim)
E.g:
# perf report | grep -v ^# | head
18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry
|
---cpu_startup_entry
|
|--12.07%--start_secondary
|
--6.30%--rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
#
Becomes, in "folded" mode:
# perf report -g folded | grep -v ^# | head -5
18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry
12.07% cpu_startup_entry;start_secondary
6.30% cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
16.90% 0.00% swapper [kernel.kallsyms] [k] call_cpuidle
11.23% call_cpuidle;cpu_startup_entry;start_secondary
5.67% call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
16.90% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter
11.23% cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
5.67% cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
15.12% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter_state
#
The user can also select one of "count", "period" or "percent" as
the first column.
... and lots of infrastructure enhancements, plus fixes and other
changes, features I failed to list - see the shortlog and the git log
for details"
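The "folded" callchain output described in the pull message is meant to be consumed by other tools. As a rough illustration only, the C sketch below splits one folded chain line (a percentage followed by symbols joined with ';') into its parts. The names `folded_chain`, `parse_folded_line` and `MAX_FRAMES` are hypothetical and not part of perf. The sketch assumes the "percent" first column and does not try to tell chain lines apart from histogram entry lines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FRAMES 64	/* hypothetical cap, chosen for the sketch */

/* One parsed chain line, e.g. "12.07% cpu_startup_entry;start_secondary". */
struct folded_chain {
	double percent;			/* value of the leading "NN.NN%" column */
	int nr_frames;			/* number of symbols stored in frames[] */
	const char *frames[MAX_FRAMES];	/* pointers into the caller's buffer */
};

/*
 * Parse one folded chain line in place: every ';' in the buffer is
 * overwritten with '\0' so that frames[] can point straight into it.
 * Returns 0 on success, -1 if the line does not start with "<number>%"
 * or has nothing after the percentage. Frames beyond MAX_FRAMES are
 * silently dropped.
 */
static int parse_folded_line(char *line, struct folded_chain *out)
{
	char *end, *p;

	assert(line != NULL && out != NULL);

	line[strcspn(line, "\n")] = '\0';	/* drop a trailing newline */

	out->percent = strtod(line, &end);	/* skips leading whitespace */
	if (end == line || *end != '%')
		return -1;

	p = end + 1;
	while (*p == ' ' || *p == '\t')
		p++;
	if (*p == '\0')
		return -1;

	out->nr_frames = 0;
	while (p && out->nr_frames < MAX_FRAMES) {
		char *sep = strchr(p, ';');

		if (sep)
			*sep = '\0';
		out->frames[out->nr_frames++] = p;
		p = sep ? sep + 1 : NULL;
	}
	return 0;
}
```

A caller would read 'perf report -g folded' output line by line, skip lines starting with '#', and pass the remaining chain lines to the parser. Histogram entry lines carry several columns and need separate handling.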
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (271 commits)
perf evlist: Add --trace-fields option to show trace fields
perf record: Store data mmaps for dwarf unwind
perf libdw: Check for mmaps also in MAP__VARIABLE tree
perf unwind: Check for mmaps also in MAP__VARIABLE tree
perf unwind: Use find_map function in access_dso_mem
perf evlist: Remove perf_evlist__(enable|disable)_event functions
perf evlist: Make perf_evlist__open() open evsels with their cpus and threads (like perf record does)
perf report: Show random usage tip on the help line
perf hists: Export a couple of hist functions
perf diff: Use perf_hpp__register_sort_field interface
perf tools: Add overhead/overhead_children keys defaults via string
perf tools: Remove list entry from struct sort_entry
perf tools: Include all tools/lib directory for tags/cscope/TAGS targets
perf script: Align event name properly
perf tools: Add missing headers in perf's MANIFEST
perf tools: Do not show trace command if it's not compiled in
perf report: Change default to use event group view
perf top: Decay periods in callchains
tools lib: Move bitmap.[ch] from tools/perf/ to tools/{lib,include}/
tools lib: Sync tools/lib/find_bit.c with the kernel
...
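The pull message says BPF programs can pass 'perf probe' tunables through the section name as key=val pairs separated by semicolons. The sketch below shows one way such a string could be split. It is not the parser perf uses: `split_section_name`, `struct kv` and the input strings in the usage note are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* One key=val piece; both pointers refer into the caller's buffer. */
struct kv {
	const char *key;
	const char *val;
};

/*
 * Split "k1=v1;k2=v2;..." in place, overwriting ';' and the first '='
 * of each piece with '\0'. Returns the number of pairs stored, or -1 if
 * a piece has no '='. At most 'max' pairs are stored; anything after
 * that is ignored.
 */
static int split_section_name(char *sec, struct kv *pairs, int max)
{
	char *p = sec;
	int n = 0;

	assert(sec != NULL && pairs != NULL);

	while (p && *p && n < max) {
		char *next = strchr(p, ';');
		char *eq;

		if (next)
			*next++ = '\0';

		eq = strchr(p, '=');
		if (!eq)
			return -1;
		*eq = '\0';

		pairs[n].key = p;
		pairs[n].val = eq + 1;
		n++;
		p = next;
	}
	return n;
}
```

Applied to the section name from the ssl.bpf.c example, "func=__inet_lookup_established hnum", this yields a single pair whose value still contains the probe argument list. A made-up string such as "module=bridge;func=br_handle_frame" yields two pairs.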
Diffstat (limited to 'tools/perf/builtin-script.c')
-rw-r--r-- | tools/perf/builtin-script.c | 245 |
1 files changed, 199 insertions, 46 deletions
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 72b5deb4bd79..c691214d820f 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3,9 +3,9 @@
 #include "perf.h"
 #include "util/cache.h"
 #include "util/debug.h"
-#include "util/exec_cmd.h"
+#include <subcmd/exec-cmd.h>
 #include "util/header.h"
-#include "util/parse-options.h"
+#include <subcmd/parse-options.h>
 #include "util/perf_regs.h"
 #include "util/session.h"
 #include "util/tool.h"
@@ -18,7 +18,11 @@
 #include "util/sort.h"
 #include "util/data.h"
 #include "util/auxtrace.h"
+#include "util/cpumap.h"
+#include "util/thread_map.h"
+#include "util/stat.h"
 #include <linux/bitmap.h>
+#include "asm/bug.h"
 
 static char const *script_name;
 static char const *generate_script_lang;
@@ -32,6 +36,7 @@ static bool print_flags;
 static bool nanosecs;
 static const char *cpu_list;
 static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
+static struct perf_stat_config stat_config;
 
 unsigned int scripting_max_stack = PERF_MAX_STACK_DEPTH;
 
@@ -130,6 +135,18 @@ static struct {
 
 		.invalid_fields = PERF_OUTPUT_TRACE,
 	},
+
+	[PERF_TYPE_BREAKPOINT] = {
+		.user_set = false,
+
+		.fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
+			  PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
+			  PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
+			  PERF_OUTPUT_SYM | PERF_OUTPUT_DSO |
+			  PERF_OUTPUT_PERIOD,
+
+		.invalid_fields = PERF_OUTPUT_TRACE,
+	},
 };
 
 static bool output_set_by_user(void)
@@ -204,6 +221,9 @@ static int perf_evsel__check_attr(struct perf_evsel *evsel,
 	struct perf_event_attr *attr = &evsel->attr;
 	bool allow_user_set;
 
+	if (perf_header__has_feat(&session->header, HEADER_STAT))
+		return 0;
+
 	allow_user_set = perf_header__has_feat(&session->header,
 					       HEADER_AUXTRACE);
 
@@ -588,8 +608,35 @@ static void print_sample_flags(u32 flags)
 	printf(" %-4s ", str);
 }
 
-static void process_event(union perf_event *event, struct perf_sample *sample,
-			  struct perf_evsel *evsel, struct addr_location *al)
+struct perf_script {
+	struct perf_tool tool;
+	struct perf_session *session;
+	bool show_task_events;
+	bool show_mmap_events;
+	bool show_switch_events;
+	bool allocated;
+	struct cpu_map *cpus;
+	struct thread_map *threads;
+	int name_width;
+};
+
+static int perf_evlist__max_name_len(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+	int max = 0;
+
+	evlist__for_each(evlist, evsel) {
+		int len = strlen(perf_evsel__name(evsel));
+
+		max = MAX(len, max);
+	}
+
+	return max;
+}
+
+static void process_event(struct perf_script *script, union perf_event *event,
+			  struct perf_sample *sample, struct perf_evsel *evsel,
+			  struct addr_location *al)
 {
 	struct thread *thread = al->thread;
 	struct perf_event_attr *attr = &evsel->attr;
@@ -604,7 +651,12 @@ static void process_event(union perf_event *event, struct perf_sample *sample,
 
 	if (PRINT_FIELD(EVNAME)) {
 		const char *evname = perf_evsel__name(evsel);
-		printf("%s: ", evname ? evname : "[unknown]");
+
+		if (!script->name_width)
+			script->name_width = perf_evlist__max_name_len(script->session->evlist);
+
+		printf("%*s: ", script->name_width,
+		       evname ? evname : "[unknown]");
 	}
 
 	if (print_flags)
@@ -643,65 +695,81 @@ static void process_event(union perf_event *event, struct perf_sample *sample,
 	printf("\n");
 }
 
-static int default_start_script(const char *script __maybe_unused,
-				int argc __maybe_unused,
-				const char **argv __maybe_unused)
-{
-	return 0;
-}
+static struct scripting_ops *scripting_ops;
 
-static int default_flush_script(void)
+static void __process_stat(struct perf_evsel *counter, u64 tstamp)
 {
-	return 0;
+	int nthreads = thread_map__nr(counter->threads);
+	int ncpus = perf_evsel__nr_cpus(counter);
+	int cpu, thread;
+	static int header_printed;
+
+	if (counter->system_wide)
+		nthreads = 1;
+
+	if (!header_printed) {
+		printf("%3s %8s %15s %15s %15s %15s %s\n",
+		       "CPU", "THREAD", "VAL", "ENA", "RUN", "TIME", "EVENT");
+		header_printed = 1;
+	}
+
+	for (thread = 0; thread < nthreads; thread++) {
+		for (cpu = 0; cpu < ncpus; cpu++) {
+			struct perf_counts_values *counts;
+
+			counts = perf_counts(counter->counts, cpu, thread);
+
+			printf("%3d %8d %15" PRIu64 " %15" PRIu64 " %15" PRIu64 " %15" PRIu64 " %s\n",
+				counter->cpus->map[cpu],
+				thread_map__pid(counter->threads, thread),
+				counts->val,
+				counts->ena,
+				counts->run,
+				tstamp,
+				perf_evsel__name(counter));
+		}
+	}
 }
 
-static int default_stop_script(void)
+static void process_stat(struct perf_evsel *counter, u64 tstamp)
 {
-	return 0;
+	if (scripting_ops && scripting_ops->process_stat)
+		scripting_ops->process_stat(&stat_config, counter, tstamp);
+	else
+		__process_stat(counter, tstamp);
 }
 
-static int default_generate_script(struct pevent *pevent __maybe_unused,
-				   const char *outfile __maybe_unused)
+static void process_stat_interval(u64 tstamp)
 {
-	return 0;
+	if (scripting_ops && scripting_ops->process_stat_interval)
+		scripting_ops->process_stat_interval(tstamp);
 }
 
-static struct scripting_ops default_scripting_ops = {
-	.start_script = default_start_script,
-	.flush_script = default_flush_script,
-	.stop_script = default_stop_script,
-	.process_event = process_event,
-	.generate_script = default_generate_script,
-};
-
-static struct scripting_ops *scripting_ops;
-
 static void setup_scripting(void)
 {
 	setup_perl_scripting();
 	setup_python_scripting();
-
-	scripting_ops = &default_scripting_ops;
 }
 
 static int flush_scripting(void)
 {
-	return scripting_ops->flush_script();
+	return scripting_ops ? scripting_ops->flush_script() : 0;
 }
 
 static int cleanup_scripting(void)
 {
 	pr_debug("\nperf script stopped\n");
 
-	return scripting_ops->stop_script();
+	return scripting_ops ? scripting_ops->stop_script() : 0;
 }
 
-static int process_sample_event(struct perf_tool *tool __maybe_unused,
+static int process_sample_event(struct perf_tool *tool,
 				union perf_event *event,
 				struct perf_sample *sample,
 				struct perf_evsel *evsel,
 				struct machine *machine)
 {
+	struct perf_script *scr = container_of(tool, struct perf_script, tool);
 	struct addr_location al;
 
 	if (debug_mode) {
@@ -727,20 +795,16 @@ static int process_sample_event(struct perf_tool *tool __maybe_unused,
 	if (cpu_list && !test_bit(sample->cpu, cpu_bitmap))
 		goto out_put;
 
-	scripting_ops->process_event(event, sample, evsel, &al);
+	if (scripting_ops)
+		scripting_ops->process_event(event, sample, evsel, &al);
+	else
+		process_event(scr, event, sample, evsel, &al);
+
 out_put:
 	addr_location__put(&al);
 	return 0;
 }
 
-struct perf_script {
-	struct perf_tool tool;
-	struct perf_session *session;
-	bool show_task_events;
-	bool show_mmap_events;
-	bool show_switch_events;
-};
-
 static int process_attr(struct perf_tool *tool, union perf_event *event,
 			struct perf_evlist **pevlist)
 {
@@ -1156,6 +1220,8 @@ static int parse_output_fields(const struct option *opt __maybe_unused,
 			type = PERF_TYPE_TRACEPOINT;
 		else if (!strcmp(str, "raw"))
 			type = PERF_TYPE_RAW;
+		else if (!strcmp(str, "break"))
+			type = PERF_TYPE_BREAKPOINT;
 		else {
 			fprintf(stderr, "Invalid event type in field string.\n");
 			rc = -EINVAL;
@@ -1421,7 +1487,7 @@ static int list_available_scripts(const struct option *opt __maybe_unused,
 	char first_half[BUFSIZ];
 	char *script_root;
 
-	snprintf(scripts_path, MAXPATHLEN, "%s/scripts", perf_exec_path());
+	snprintf(scripts_path, MAXPATHLEN, "%s/scripts", get_argv_exec_path());
 
 	scripts_dir = opendir(scripts_path);
 	if (!scripts_dir)
@@ -1542,7 +1608,7 @@ int find_scripts(char **scripts_array, char **scripts_path_array)
 	if (!session)
 		return -1;
 
-	snprintf(scripts_path, MAXPATHLEN, "%s/scripts", perf_exec_path());
+	snprintf(scripts_path, MAXPATHLEN, "%s/scripts", get_argv_exec_path());
 
 	scripts_dir = opendir(scripts_path);
 	if (!scripts_dir) {
@@ -1600,7 +1666,7 @@ static char *get_script_path(const char *script_root, const char *suffix)
 	char lang_path[MAXPATHLEN];
 	char *__script_root;
 
-	snprintf(scripts_path, MAXPATHLEN, "%s/scripts", perf_exec_path());
+	snprintf(scripts_path, MAXPATHLEN, "%s/scripts", get_argv_exec_path());
 
 	scripts_dir = opendir(scripts_path);
 	if (!scripts_dir)
@@ -1695,6 +1761,87 @@ static void script__setup_sample_type(struct perf_script *script)
 	}
 }
 
+static int process_stat_round_event(struct perf_tool *tool __maybe_unused,
+				    union perf_event *event,
+				    struct perf_session *session)
+{
+	struct stat_round_event *round = &event->stat_round;
+	struct perf_evsel *counter;
+
+	evlist__for_each(session->evlist, counter) {
+		perf_stat_process_counter(&stat_config, counter);
+		process_stat(counter, round->time);
+	}
+
+	process_stat_interval(round->time);
+	return 0;
+}
+
+static int process_stat_config_event(struct perf_tool *tool __maybe_unused,
+				     union perf_event *event,
+				     struct perf_session *session __maybe_unused)
+{
+	perf_event__read_stat_config(&stat_config, &event->stat_config);
+	return 0;
+}
+
+static int set_maps(struct perf_script *script)
+{
+	struct perf_evlist *evlist = script->session->evlist;
+
+	if (!script->cpus || !script->threads)
+		return 0;
+
+	if (WARN_ONCE(script->allocated, "stats double allocation\n"))
+		return -EINVAL;
+
+	perf_evlist__set_maps(evlist, script->cpus, script->threads);
+
+	if (perf_evlist__alloc_stats(evlist, true))
+		return -ENOMEM;
+
+	script->allocated = true;
+	return 0;
+}
+
+static
+int process_thread_map_event(struct perf_tool *tool,
+			     union perf_event *event,
+			     struct perf_session *session __maybe_unused)
+{
+	struct perf_script *script = container_of(tool, struct perf_script, tool);
+
+	if (script->threads) {
+		pr_warning("Extra thread map event, ignoring.\n");
+		return 0;
+	}
+
+	script->threads = thread_map__new_event(&event->thread_map);
+	if (!script->threads)
+		return -ENOMEM;
+
+	return set_maps(script);
+}
+
+static
+int process_cpu_map_event(struct perf_tool *tool __maybe_unused,
+			  union perf_event *event,
+			  struct perf_session *session __maybe_unused)
+{
+	struct perf_script *script = container_of(tool, struct perf_script, tool);
+
+	if (script->cpus) {
+		pr_warning("Extra cpu map event, ignoring.\n");
+		return 0;
+	}
+
+	script->cpus = cpu_map__new_data(&event->cpu_map.data);
+	if (!script->cpus)
+		return -ENOMEM;
+
+	return set_maps(script);
+}
+
 int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 {
 	bool show_full_info = false;
@@ -1723,6 +1870,11 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 			.auxtrace_info = perf_event__process_auxtrace_info,
 			.auxtrace = perf_event__process_auxtrace,
 			.auxtrace_error = perf_event__process_auxtrace_error,
+			.stat = perf_event__process_stat_event,
+			.stat_round = process_stat_round_event,
+			.stat_config = process_stat_config_event,
+			.thread_map = process_thread_map_event,
+			.cpu_map = process_cpu_map_event,
 			.ordered_events = true,
 			.ordering_requires_timestamps = true,
 		},
@@ -1836,7 +1988,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 		scripting_max_stack = itrace_synth_opts.callchain_sz;
 
 	/* make sure PERF_EXEC_PATH is set for scripts */
-	perf_set_argv_exec_path(perf_exec_path());
+	set_argv_exec_path(get_argv_exec_path());
 
 	if (argc && !script_name && !rec_script_path && !rep_script_path) {
 		int live_pipe[2];
@@ -2076,6 +2228,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 	flush_scripting();
 
 out_delete:
+	perf_evlist__free_stats(session->evlist);
 	perf_session__delete(session);
 
 	if (script_started)
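Two patterns recur in the diff above: the default scripting_ops table is dropped in favour of NULL checks at each call site, and callbacks that receive only a struct perf_tool pointer recover the enclosing struct perf_script with container_of(). The standalone sketch below imitates both in userspace. Every type and function in it (`struct tool`, `struct script`, `struct ops`, `flush`, `name_width_of`, `demo_flush`) is a simplified stand-in invented for illustration, not the perf definition.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() helper. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct tool { int unused; };		/* stand-in for struct perf_tool */

struct script {				/* stand-in for struct perf_script */
	struct tool tool;		/* embedded member handed to callbacks */
	int name_width;
};

struct ops {				/* stand-in for struct scripting_ops */
	int (*flush_script)(void);
};

/* NULL means no scripting language backend was selected. */
static struct ops *ops;

/* Same shape as flush_scripting() after the patch: NULL falls back to 0. */
static int flush(void)
{
	return ops ? ops->flush_script() : 0;
}

/* Recover the outer struct from a pointer to its embedded member. */
static int name_width_of(struct tool *tool)
{
	struct script *scr;

	assert(tool != NULL);
	scr = container_of(tool, struct script, tool);
	return scr->name_width;
}

/* Hypothetical backend used only to exercise the non-NULL path. */
static int demo_flush(void)
{
	return 7;
}

static struct ops demo_ops = { .flush_script = demo_flush };
```

With `ops` left NULL, `flush()` returns 0 without dereferencing anything; after `ops = &demo_ops` it forwards to the backend. This mirrors how the patch lets the built-in process_event() run when no Perl or Python script is loaded.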