diff options
| author | Linus Torvalds <torvalds@linux-foundation.org> | 2016-01-11 14:39:17 -0800 | 
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2016-01-11 14:39:17 -0800 | 
| commit | 5cb52b5e1654f3f1ed9c32e34456d98559c85aa0 (patch) | |
| tree | 737c73d6aef99a17f57c2974f1e2a142a5f1a377 /tools/include/linux/bitmap.h | |
| parent | 24af98c4cf5f5e69266e270c7f3fb34b82ff6656 (diff) | |
| parent | 3eb9ede23bdd96e9ba60e2b4d4d17a7c35d58448 (diff) | |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes:
   - Intel Knights Landing support.  (Harish Chegondi)
   - Intel Broadwell-EP uncore PMU support.  (Kan Liang)
   - Core code improvements.  (Peter Zijlstra.)
   - Event filter, LBR and PEBS fixes.  (Stephane Eranian)
   - Enable cycles:pp on Intel Atom.  (Stephane Eranian)
   - Add cycles:ppp support for Skylake.  (Andi Kleen)
   - Various x86 NMI overhead optimizations.  (Andi Kleen)
   - Intel PT enhancements.  (Takao Indoh)
   - AMD cache events fix.  (Vince Weaver)
  Tons of tooling changes:
   - Show random perf tool tips in the 'perf report' bottom line
     (Namhyung Kim)
   - perf report now defaults to --group if the perf.data file has
     grouped events, try it with:
      # perf record -e '{cycles,instructions}' -a sleep 1
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 1.093 MB perf.data (1247 samples) ]
      # perf report
      # Samples: 1K of event 'anon group { cycles, instructions }'
      # Event count (approx.): 1955219195
      #
      #       Overhead  Command     Shared Object      Symbol
         2.86%   0.22%  swapper     [kernel.kallsyms]  [k] intel_idle
         1.05%   0.33%  firefox     libxul.so          [.] js::SetObjectElement
         1.05%   0.00%  kworker/0:3 [kernel.kallsyms]  [k] gen6_ring_get_seqno
         0.88%   0.17%  chrome      chrome             [.] 0x0000000000ee27ab
         0.65%   0.86%  firefox     libxul.so          [.] js::ValueToId<(js::AllowGC)1>
         0.64%   0.23%  JS Helper   libxul.so          [.] js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::splay
         0.62%   1.27%  firefox     libxul.so          [.] js::GetIterator
         0.61%   1.74%  firefox     libxul.so          [.] js::NativeSetProperty
         0.61%   0.31%  firefox     libxul.so          [.] js::SetPropertyByDefining
   - Introduce the 'perf stat record/report' workflow:
     Generate perf.data files from 'perf stat', to tap into the
     scripting capabilities perf has instead of defining a 'perf stat'
     specific scripting support to calculate event ratios, etc.
     Simple example:
        $ perf stat record -e cycles usleep 1
         Performance counter stats for 'usleep 1':
               1,134,996      cycles
             0.000670644 seconds time elapsed
        $ perf stat report
         Performance counter stats for '/home/acme/bin/perf stat record -e cycles usleep 1':
               1,134,996      cycles
             0.000670644 seconds time elapsed
        $
     It generates PERF_RECORD_ userspace records to store the details:
        $ perf report -D | grep PERF_RECORD
        0xf0 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27637
        0x118 [0x12]: PERF_RECORD_CPU_MAP nr: 1 cpu: 65535
        0x12a [0x40]: PERF_RECORD_STAT_CONFIG
        0x16a [0x30]: PERF_RECORD_STAT
        -1 -1 0x19a [0x40]: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x1f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text
        0x1da [0x18]: PERF_RECORD_STAT_ROUND
        [acme@ssdandy linux]$
     An effort was made to make perf.data files generated like this to
     not generate cryptic messages when processed by older tools.
     The 'perf script' bits need rebasing, will go up later.
   - Make command line options always available, even when they depend
     on some feature being enabled, warning the user about use of such
     options (Wang Nan)
   - Support hw breakpoint events (mem:0xAddress) in the default output
     mode in 'perf script' (Wang Nan)
   - Fixes and improvements for supporting annotating ARM binaries,
     support ARM call and jump instructions, more work needed to have
     arch specific stuff separated into tools/perf/arch/*/annotate/
     (Russell King)
   - Add initial 'perf config' command, for now just with a --list
     command to the contents of the configuration file in use and a
     basic man page describing its format, commands for doing edits and
     detailed documentation are being reviewed and proof-read.  (Taeung
     Song)
   - Allows BPF scriptlets specify arguments to be fetched using DWARF
     info, using a prologue generated at compile/build time (He Kuang,
     Wang Nan)
   - Allow attaching BPF scriptlets to module symbols (Wang Nan)
   - Allow attaching BPF scriptlets to userspace code using uprobe (Wang
     Nan)
   - BPF programs now can specify 'perf probe' tunables via its section
     name, separating key=val values using semicolons (Wang Nan)
     Testing some of these new BPF features:
        Use case: get callchains when receiving SSL packets, filter then in the
                  kernel, at arbitrary place.
        # cat ssl.bpf.c
        #define SEC(NAME) __attribute__((section(NAME), used))
        struct pt_regs;
        SEC("func=__inet_lookup_established hnum")
        int func(struct pt_regs *ctx, int err, unsigned short port)
        {
                return err == 0 && port == 443;
        }
        char _license[] SEC("license") = "GPL";
        int  _version   SEC("version") = LINUX_VERSION_CODE;
        #
        # perf record -a -g -e ssl.bpf.c
        ^C[ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.787 MB perf.data (3 samples) ]
        # perf script | head -30
        swapper     0 [000] 58783.268118: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
           8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
           896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
           8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
           855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
           8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
           8572a8 process_backlog (/lib/modules/4.3.0+/build/vmlinux)
           856b11 net_rx_action (/lib/modules/4.3.0+/build/vmlinux)
           2a284b __do_softirq (/lib/modules/4.3.0+/build/vmlinux)
           2a2ba3 irq_exit (/lib/modules/4.3.0+/build/vmlinux)
           96b7a4 do_IRQ (/lib/modules/4.3.0+/build/vmlinux)
           969807 ret_from_intr (/lib/modules/4.3.0+/build/vmlinux)
           2dede5 cpu_startup_entry (/lib/modules/4.3.0+/build/vmlinux)
           95d5bc rest_init (/lib/modules/4.3.0+/build/vmlinux)
          1163ffa start_kernel ([kernel.vmlinux].init.text)
          11634d7 x86_64_start_reservations ([kernel.vmlinux].init.text)
          1163623 x86_64_start_kernel ([kernel.vmlinux].init.text)
        qemu-system-x86  9178 [003] 58785.792417: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
           8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
           896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
           8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
           855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
           8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
           856660 netif_receive_skb_internal (/lib/modules/4.3.0+/build/vmlinux)
           8566ec netif_receive_skb_sk (/lib/modules/4.3.0+/build/vmlinux)
             430a br_handle_frame_finish ([bridge])
             48bc br_handle_frame ([bridge])
           855f44 __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
           8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
        #
   - Use 'perf probe' various options to list functions, see what
     variables can be collected at any given point, experiment first
     collecting without a filter, then filter, use it together with
     'perf trace', 'perf top', with or without callchains, if it
     explodes, please tell us!
   - Introduce a new callchain mode: "folded", that will list per line
     representations of all callchains for a give histogram entry,
     facilitating 'perf report' output processing by other tools, such
     as Brendan Gregg's flamegraph tools (Namhyung Kim)
     E.g:
        # perf report | grep -v ^# | head
           18.37%     0.00%  swapper  [kernel.kallsyms]   [k] cpu_startup_entry
                           |
                           ---cpu_startup_entry
                              |
                              |--12.07%--start_secondary
                              |
                               --6.30%--rest_init
                                         start_kernel
                                         x86_64_start_reservations
                                         x86_64_start_kernel
         #
     Becomes, in "folded" mode:
        # perf report -g folded | grep -v ^# | head -5
            18.37%     0.00%  swapper [kernel.kallsyms]   [k] cpu_startup_entry
          12.07% cpu_startup_entry;start_secondary
           6.30% cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
            16.90%     0.00%  swapper [kernel.kallsyms]   [k] call_cpuidle
          11.23% call_cpuidle;cpu_startup_entry;start_secondary
           5.67% call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
            16.90%     0.00%  swapper [kernel.kallsyms]   [k] cpuidle_enter
          11.23% cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
           5.67% cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
            15.12%     0.00%  swapper [kernel.kallsyms]   [k] cpuidle_enter_state
         #
     The user can also select one of "count", "period" or "percent" as
     the first column.
  ... and lots of infrastructure enhancements, plus fixes and other
  changes, features I failed to list - see the shortlog and the git log
  for details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (271 commits)
  perf evlist: Add --trace-fields option to show trace fields
  perf record: Store data mmaps for dwarf unwind
  perf libdw: Check for mmaps also in MAP__VARIABLE tree
  perf unwind: Check for mmaps also in MAP__VARIABLE tree
  perf unwind: Use find_map function in access_dso_mem
  perf evlist: Remove perf_evlist__(enable|disable)_event functions
  perf evlist: Make perf_evlist__open() open evsels with their cpus and threads (like perf record does)
  perf report: Show random usage tip on the help line
  perf hists: Export a couple of hist functions
  perf diff: Use perf_hpp__register_sort_field interface
  perf tools: Add overhead/overhead_children keys defaults via string
  perf tools: Remove list entry from struct sort_entry
  perf tools: Include all tools/lib directory for tags/cscope/TAGS targets
  perf script: Align event name properly
  perf tools: Add missing headers in perf's MANIFEST
  perf tools: Do not show trace command if it's not compiled in
  perf report: Change default to use event group view
  perf top: Decay periods in callchains
  tools lib: Move bitmap.[ch] from tools/perf/ to tools/{lib,include}/
  tools lib: Sync tools/lib/find_bit.c with the kernel
  ...
Diffstat (limited to 'tools/include/linux/bitmap.h')
| -rw-r--r-- | tools/include/linux/bitmap.h | 68 | 
1 files changed, 68 insertions, 0 deletions
| diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h new file mode 100644 index 000000000000..28f5493da491 --- /dev/null +++ b/tools/include/linux/bitmap.h @@ -0,0 +1,68 @@ +#ifndef _PERF_BITOPS_H +#define _PERF_BITOPS_H + +#include <string.h> +#include <linux/bitops.h> + +#define DECLARE_BITMAP(name,bits) \ +	unsigned long name[BITS_TO_LONGS(bits)] + +int __bitmap_weight(const unsigned long *bitmap, int bits); +void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1, +		 const unsigned long *bitmap2, int bits); + +#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1))) + +#define BITMAP_LAST_WORD_MASK(nbits)					\ +(									\ +	((nbits) % BITS_PER_LONG) ?					\ +		(1UL<<((nbits) % BITS_PER_LONG))-1 : ~0UL		\ +) + +#define small_const_nbits(nbits) \ +	(__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG) + +static inline void bitmap_zero(unsigned long *dst, int nbits) +{ +	if (small_const_nbits(nbits)) +		*dst = 0UL; +	else { +		int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); +		memset(dst, 0, len); +	} +} + +static inline int bitmap_weight(const unsigned long *src, int nbits) +{ +	if (small_const_nbits(nbits)) +		return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)); +	return __bitmap_weight(src, nbits); +} + +static inline void bitmap_or(unsigned long *dst, const unsigned long *src1, +			     const unsigned long *src2, int nbits) +{ +	if (small_const_nbits(nbits)) +		*dst = *src1 | *src2; +	else +		__bitmap_or(dst, src1, src2, nbits); +} + +/** + * test_and_set_bit - Set a bit and return its old value + * @nr: Bit to set + * @addr: Address to count from + */ +static inline int test_and_set_bit(int nr, unsigned long *addr) +{ +	unsigned long mask = BIT_MASK(nr); +	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr); +	unsigned long old; + +	old = *p; +	*p = old | mask; + +	return (old & mask) != 0; +} + +#endif /* _PERF_BITOPS_H */ | 
