<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pm24.git/kernel/bpf/core.c, branch v5.2-rc4</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<id>https://git.kobert.dev/pm24.git/atom?h=v5.2-rc4</id>
<link rel='self' href='https://git.kobert.dev/pm24.git/atom?h=v5.2-rc4'/>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/'/>
<updated>2019-05-30T18:26:32Z</updated>
<entry>
<title>treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152</title>
<updated>2019-05-30T18:26:32Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-05-27T06:55:01Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=2874c5fd284268364ece81a7bd936f3c8168e567'/>
<id>urn:sha1:2874c5fd284268364ece81a7bd936f3c8168e567</id>
<content type='text'>
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Allison Randal &lt;allison@lohutok.net&gt;
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: fix out of bounds backwards jmps due to dead code removal</title>
<updated>2019-05-11T01:49:27Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-05-11T01:03:09Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=af959b18fd447170a10865283ba691af4353cc7f'/>
<id>urn:sha1:af959b18fd447170a10865283ba691af4353cc7f</id>
<content type='text'>
systemtap folks reported the following splat recently:

  [ 7790.862212] WARNING: CPU: 3 PID: 26759 at arch/x86/kernel/kprobes/core.c:1022 kprobe_fault_handler+0xec/0xf0
  [...]
  [ 7790.864113] CPU: 3 PID: 26759 Comm: sshd Not tainted 5.1.0-0.rc7.git1.1.fc31.x86_64 #1
  [ 7790.864198] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS[...]
  [ 7790.864314] RIP: 0010:kprobe_fault_handler+0xec/0xf0
  [ 7790.864375] Code: 48 8b 50 [...]
  [ 7790.864714] RSP: 0018:ffffc06800bdbb48 EFLAGS: 00010082
  [ 7790.864812] RAX: ffff9e2b75a16320 RBX: 0000000000000000 RCX: 0000000000000000
  [ 7790.865306] RDX: ffffffffffffffff RSI: 000000000000000e RDI: ffffc06800bdbbf8
  [ 7790.865514] RBP: ffffc06800bdbbf8 R08: 0000000000000000 R09: 0000000000000000
  [ 7790.865960] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc06800bdbbf8
  [ 7790.866037] R13: ffff9e2ab56a0418 R14: ffff9e2b6d0bb400 R15: ffff9e2b6d268000
  [ 7790.866114] FS:  00007fde49937d80(0000) GS:ffff9e2b75a00000(0000) knlGS:0000000000000000
  [ 7790.866193] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 7790.866318] CR2: 0000000000000000 CR3: 000000012f312000 CR4: 00000000000006e0
  [ 7790.866419] Call Trace:
  [ 7790.866677]  do_user_addr_fault+0x64/0x480
  [ 7790.867513]  do_page_fault+0x33/0x210
  [ 7790.868002]  async_page_fault+0x1e/0x30
  [ 7790.868071] RIP: 0010:          (null)
  [ 7790.868144] Code: Bad RIP value.
  [ 7790.868229] RSP: 0018:ffffc06800bdbca8 EFLAGS: 00010282
  [ 7790.868362] RAX: ffff9e2b598b60f8 RBX: ffffc06800bdbe48 RCX: 0000000000000004
  [ 7790.868629] RDX: 0000000000000004 RSI: ffffc06800bdbc6c RDI: ffff9e2b598b60f0
  [ 7790.868834] RBP: ffffc06800bdbcf8 R08: 0000000000000000 R09: 0000000000000004
  [ 7790.870432] R10: 00000000ff6f7a03 R11: 0000000000000000 R12: 0000000000000001
  [ 7790.871859] R13: ffffc06800bdbcb8 R14: 0000000000000000 R15: ffff9e2acd0a5310
  [ 7790.873455]  ? vfs_read+0x5/0x170
  [ 7790.874639]  ? vfs_read+0x1/0x170
  [ 7790.875834]  ? trace_call_bpf+0xf6/0x260
  [ 7790.877044]  ? vfs_read+0x1/0x170
  [ 7790.878208]  ? vfs_read+0x5/0x170
  [ 7790.879345]  ? kprobe_perf_func+0x233/0x260
  [ 7790.880503]  ? vfs_read+0x1/0x170
  [ 7790.881632]  ? vfs_read+0x5/0x170
  [ 7790.882751]  ? kprobe_ftrace_handler+0x92/0xf0
  [ 7790.883926]  ? __vfs_read+0x30/0x30
  [ 7790.885050]  ? ftrace_ops_assist_func+0x94/0x100
  [ 7790.886183]  ? vfs_read+0x1/0x170
  [ 7790.887283]  ? vfs_read+0x5/0x170
  [ 7790.888348]  ? ksys_read+0x5a/0xe0
  [ 7790.889389]  ? do_syscall_64+0x5c/0xa0
  [ 7790.890401]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

After some debugging, turns out that the logic in 2cbd95a5c4fb
("bpf: change parameters of call/branch offset adjustment") has
a bug that is exposed after 52875a04f4b2 ("bpf: verifier: remove
dead code") in that we miss some of the jump offset adjustments
after code patching when we remove dead code, more concretely,
upon backward jump spanning over the area that is being removed.

BPF insns of a case that was hit pre 52875a04f4b2:

  [...]
  676: (85) call bpf_perf_event_output#-47616
  677: (05) goto pc-636
  678: (62) *(u32 *)(r10 -64) = 0
  679: (bf) r7 = r10
  680: (07) r7 += -64
  681: (05) goto pc-44
  682: (05) goto pc-1
  683: (05) goto pc-1

BPF insns afterwards:

  [...]
  618: (85) call bpf_perf_event_output#-47616
  619: (05) goto pc-638
  620: (62) *(u32 *)(r10 -64) = 0
  621: (bf) r7 = r10
  622: (07) r7 += -64
  623: (05) goto pc-44

To illustrate the bug, situation looks as follows:
     ____
  0 |    | &lt;-- foo: [...]
  1 |____|
  2 |____| &lt;-- pos / end_new  ^
  3 |    |                    |
  4 |    |                    |  len
  5 |____|                    |  (remove region)
  6 |    | &lt;-- end_old        v
  7 |    |
  8 |    | &lt;-- curr  (jmp foo)
  9 |____|

The condition curr &gt;= end_new &amp;&amp; curr + off + 1 &lt; end_new in the
branch delta adjustments is never hit because curr + off + 1 &lt;
end_new is compared as unsigned and therefore curr + off + 1 &gt;
end_new in unsigned realm as curr + off + 1 becomes negative
since the insns are memmove()'d before the offset adjustments.

Correct BPF insns after this fix:

  [...]
  618: (85) call bpf_perf_event_output#-47216
  619: (05) goto pc-578
  620: (62) *(u32 *)(r10 -64) = 0
  621: (bf) r7 = r10
  622: (07) r7 += -64
  623: (05) goto pc-44

Note that unprivileged case is not affected from this.

Fixes: 52875a04f4b2 ("bpf: verifier: remove dead code")
Fixes: 2cbd95a5c4fb ("bpf: change parameters of call/branch offset adjustment")
Reported-by: Frank Ch. Eigler &lt;fche@redhat.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next</title>
<updated>2019-05-08T05:03:58Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-05-08T05:03:58Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=80f232121b69cc69a31ccb2b38c1665d770b0710'/>
<id>urn:sha1:80f232121b69cc69a31ccb2b38c1665d770b0710</id>
<content type='text'>
Pull networking updates from David Miller:
 "Highlights:

   1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.

   2) Add fib_sync_mem to control the amount of dirty memory we allow to
      queue up between synchronize RCU calls, from David Ahern.

   3) Make flow classifier more lockless, from Vlad Buslov.

   4) Add PHY downshift support to aquantia driver, from Heiner
      Kallweit.

   5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces
      contention on SLAB spinlocks in heavy RPC workloads.

   6) Partial GSO offload support in XFRM, from Boris Pismenny.

   7) Add fast link down support to ethtool, from Heiner Kallweit.

   8) Use siphash for IP ID generator, from Eric Dumazet.

   9) Pull nexthops even further out from ipv4/ipv6 routes and FIB
      entries, from David Ahern.

  10) Move skb-&gt;xmit_more into a per-cpu variable, from Florian
      Westphal.

  11) Improve eBPF verifier speed and increase maximum program size,
      from Alexei Starovoitov.

  12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit
      spinlocks. From Neil Brown.

  13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.

  14) Improve link partner cap detection in generic PHY code, from
      Heiner Kallweit.

  15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan
      Maguire.

  16) Remove SKB list implementation assumptions in SCTP, your's truly.

  17) Various cleanups, optimizations, and simplifications in r8169
      driver. From Heiner Kallweit.

  18) Add memory accounting on TX and RX path of SCTP, from Xin Long.

  19) Switch PHY drivers over to use dynamic featue detection, from
      Heiner Kallweit.

  20) Support flow steering without masking in dpaa2-eth, from Ioana
      Ciocoi.

  21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri
      Pirko.

  22) Increase the strict parsing of current and future netlink
      attributes, also export such policies to userspace. From Johannes
      Berg.

  23) Allow DSA tag drivers to be modular, from Andrew Lunn.

  24) Remove legacy DSA probing support, also from Andrew Lunn.

  25) Allow ll_temac driver to be used on non-x86 platforms, from Esben
      Haabendal.

  26) Add a generic tracepoint for TX queue timeouts to ease debugging,
      from Cong Wang.

  27) More indirect call optimizations, from Paolo Abeni"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
  cxgb4: Fix error path in cxgb4_init_module
  net: phy: improve pause mode reporting in phy_print_status
  dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
  net: macb: Change interrupt and napi enable order in open
  net: ll_temac: Improve error message on error IRQ
  net/sched: remove block pointer from common offload structure
  net: ethernet: support of_get_mac_address new ERR_PTR error
  net: usb: smsc: fix warning reported by kbuild test robot
  staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
  net: dsa: support of_get_mac_address new ERR_PTR error
  net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
  vrf: sit mtu should not be updated when vrf netdev is the link
  net: dsa: Fix error cleanup path in dsa_init_module
  l2tp: Fix possible NULL pointer dereference
  taprio: add null check on sched_nest to avoid potential null pointer dereference
  net: mvpp2: cls: fix less than zero check on a u32 variable
  net_sched: sch_fq: handle non connected flows
  net_sched: sch_fq: do not assume EDT packets are ordered
  net: hns3: use devm_kcalloc when allocating desc_cb
  net: hns3: some cleanup for struct hns3_enet_ring
  ...
</content>
</entry>
<entry>
<title>bpf: Use vmalloc special flag</title>
<updated>2019-04-30T10:37:59Z</updated>
<author>
<name>Rick Edgecombe</name>
<email>rick.p.edgecombe@intel.com</email>
</author>
<published>2019-04-26T00:11:38Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=d53d2f78ceadba081fc7785570798c3c8d50a718'/>
<id>urn:sha1:d53d2f78ceadba081fc7785570798c3c8d50a718</id>
<content type='text'>
Use new flag VM_FLUSH_RESET_PERMS for handling freeing of special
permissioned memory in vmalloc and remove places where memory was set RW
before freeing which is no longer needed. Don't track if the memory is RO
anymore because it is now tracked in vmalloc.

Signed-off-by: Rick Edgecombe &lt;rick.p.edgecombe@intel.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: &lt;akpm@linux-foundation.org&gt;
Cc: &lt;ard.biesheuvel@linaro.org&gt;
Cc: &lt;deneen.t.dock@intel.com&gt;
Cc: &lt;kernel-hardening@lists.openwall.com&gt;
Cc: &lt;kristen@linux.intel.com&gt;
Cc: &lt;linux_dti@icloud.com&gt;
Cc: &lt;will.deacon@arm.com&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Nadav Amit &lt;nadav.amit@gmail.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lkml.kernel.org/r/20190426001143.4983-19-namit@vmware.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: implement lookup-free direct value access for maps</title>
<updated>2019-04-10T00:05:46Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-04-09T21:20:03Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=d8eca5bbb2be9bc7546f9e733786fa2f1a594c67'/>
<id>urn:sha1:d8eca5bbb2be9bc7546f9e733786fa2f1a594c67</id>
<content type='text'>
This generic extension to BPF maps allows for directly loading
an address residing inside a BPF map value as a single BPF
ldimm64 instruction!

The idea is similar to what BPF_PSEUDO_MAP_FD does today, which
is a special src_reg flag for ldimm64 instruction that indicates
that inside the first part of the double insns's imm field is a
file descriptor which the verifier then replaces as a full 64bit
address of the map into both imm parts. For the newly added
BPF_PSEUDO_MAP_VALUE src_reg flag, the idea is the following:
the first part of the double insns's imm field is again a file
descriptor corresponding to the map, and the second part of the
imm field is an offset into the value. The verifier will then
replace both imm parts with an address that points into the BPF
map value at the given value offset for maps that support this
operation. Currently supported is array map with single entry.
It is possible to support more than just single map element by
reusing both 16bit off fields of the insns as a map index, so
full array map lookup could be expressed that way. It hasn't
been implemented here due to lack of concrete use case, but
could easily be done so in future in a compatible way, since
both off fields right now have to be 0 and would correctly
denote a map index 0.

The BPF_PSEUDO_MAP_VALUE is a distinct flag as otherwise with
BPF_PSEUDO_MAP_FD we could not differ offset 0 between load of
map pointer versus load of map's value at offset 0, and changing
BPF_PSEUDO_MAP_FD's encoding into off by one to differ between
regular map pointer and map value pointer would add unnecessary
complexity and increases barrier for debugability thus less
suitable. Using the second part of the imm field as an offset
into the value does /not/ come with limitations since maximum
possible value size is in u32 universe anyway.

This optimization allows for efficiently retrieving an address
to a map value memory area without having to issue a helper call
which needs to prepare registers according to calling convention,
etc, without needing the extra NULL test, and without having to
add the offset in an additional instruction to the value base
pointer. The verifier then treats the destination register as
PTR_TO_MAP_VALUE with constant reg-&gt;off from the user passed
offset from the second imm field, and guarantees that this is
within bounds of the map value. Any subsequent operations are
normally treated as typical map value handling without anything
extra needed from verification side.

The two map operations for direct value access have been added to
array map for now. In future other types could be supported as
well depending on the use case. The main use case for this commit
is to allow for BPF loader support for global variables that
reside in .data/.rodata/.bss sections such that we can directly
load the address of them with minimal additional infrastructure
required. Loader support has been added in subsequent commits for
libbpf library.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: verbose jump offset overflow check</title>
<updated>2019-04-03T23:27:38Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2019-04-02T04:27:44Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=4f73379ec5c2891598aa715c6df7ac9afdc86fbf'/>
<id>urn:sha1:4f73379ec5c2891598aa715c6df7ac9afdc86fbf</id>
<content type='text'>
Larger programs may trigger 16-bit jump offset overflow check
during instruction patching. Make this error verbose otherwise
users cannot decipher error code without printks in the verifier.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2019-03-06T15:59:36Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-03-06T15:59:36Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=203b6609e0ede49eb0b97008b1150c69e9d2ffd3'/>
<id>urn:sha1:203b6609e0ede49eb0b97008b1150c69e9d2ffd3</id>
<content type='text'>
Pull perf updates from Ingo Molnar:
 "Lots of tooling updates - too many to list, here's a few highlights:

   - Various subcommand updates to 'perf trace', 'perf report', 'perf
     record', 'perf annotate', 'perf script', 'perf test', etc.

   - CPU and NUMA topology and affinity handling improvements,

   - HW tracing and HW support updates:
      - Intel PT updates
      - ARM CoreSight updates
      - vendor HW event updates

   - BPF updates

   - Tons of infrastructure updates, both on the build system and the
     library support side

   - Documentation updates.

   - ... and lots of other changes, see the changelog for details.

  Kernel side updates:

   - Tighten up kprobes blacklist handling, reduce the number of places
     where developers can install a kprobe and hang/crash the system.

   - Fix/enhance vma address filter handling.

   - Various PMU driver updates, small fixes and additions.

   - refcount_t conversions

   - BPF updates

   - error code propagation enhancements

   - misc other changes"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits)
  perf script python: Add Python3 support to syscall-counts-by-pid.py
  perf script python: Add Python3 support to syscall-counts.py
  perf script python: Add Python3 support to stat-cpi.py
  perf script python: Add Python3 support to stackcollapse.py
  perf script python: Add Python3 support to sctop.py
  perf script python: Add Python3 support to powerpc-hcalls.py
  perf script python: Add Python3 support to net_dropmonitor.py
  perf script python: Add Python3 support to mem-phys-addr.py
  perf script python: Add Python3 support to failed-syscalls-by-pid.py
  perf script python: Add Python3 support to netdev-times.py
  perf tools: Add perf_exe() helper to find perf binary
  perf script: Handle missing fields with -F +..
  perf data: Add perf_data__open_dir_data function
  perf data: Add perf_data__(create_dir|close_dir) functions
  perf data: Fail check_backup in case of error
  perf data: Make check_backup work over directories
  perf tools: Add rm_rf_perf_data function
  perf tools: Add pattern name checking to rm_rf
  perf tools: Add depth checking to rm_rf
  perf data: Add global path holder
  ...
</content>
</entry>
<entry>
<title>bpf: fix u64_stats_init() usage in bpf_prog_alloc()</title>
<updated>2019-03-01T23:31:36Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2019-03-01T22:33:11Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=4b9113045b1745ec8512d6743680809edca6a74e'/>
<id>urn:sha1:4b9113045b1745ec8512d6743680809edca6a74e</id>
<content type='text'>
We need to iterate through all possible cpus.

Fixes: 492ecee892c2 ("bpf: enable program stats")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Tested-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: enable program stats</title>
<updated>2019-02-27T16:22:50Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2019-02-25T22:28:39Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=492ecee892c2a4ba6a14903d5d586ff750b7e805'/>
<id>urn:sha1:492ecee892c2a4ba6a14903d5d586ff750b7e805</id>
<content type='text'>
JITed BPF programs are indistinguishable from kernel functions, but unlike
kernel code BPF code can be changed often.
Typical approach of "perf record" + "perf report" profiling and tuning of
kernel code works just as well for BPF programs, but kernel code doesn't
need to be monitored whereas BPF programs do.
Users load and run large amount of BPF programs.
These BPF stats allow tools monitor the usage of BPF on the server.
The monitoring tools will turn sysctl kernel.bpf_stats_enabled
on and off for few seconds to sample average cost of the programs.
Aggregated data over hours and days will provide an insight into cost of BPF
and alarms can trigger in case given program suddenly gets more expensive.

The cost of two sched_clock() per program invocation adds ~20 nsec.
Fast BPF progs (like selftests/bpf/progs/test_pkt_access.c) will slow down
from ~10 nsec to ~30 nsec.
static_key minimizes the cost of the stats collection.
There is no measurable difference before/after this patch
with kernel.bpf_stats_enabled=0

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: introduce bpf_spin_lock</title>
<updated>2019-02-01T19:55:38Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2019-01-31T23:40:04Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=d83525ca62cf8ebe3271d14c36fb900c294274a2'/>
<id>urn:sha1:d83525ca62cf8ebe3271d14c36fb900c294274a2</id>
<content type='text'>
Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
bpf program serialize access to other variables.

Example:
struct hash_elem {
    int cnt;
    struct bpf_spin_lock lock;
};
struct hash_elem * val = bpf_map_lookup_elem(&amp;hash_map, &amp;key);
if (val) {
    bpf_spin_lock(&amp;val-&gt;lock);
    val-&gt;cnt++;
    bpf_spin_unlock(&amp;val-&gt;lock);
}

Restrictions and safety checks:
- bpf_spin_lock is only allowed inside HASH and ARRAY maps.
- BTF description of the map is mandatory for safety analysis.
- bpf program can take one bpf_spin_lock at a time, since two or more can
  cause dead locks.
- only one 'struct bpf_spin_lock' is allowed per map element.
  It drastically simplifies implementation yet allows bpf program to use
  any number of bpf_spin_locks.
- when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed.
- bpf program must bpf_spin_unlock() before return.
- bpf program can access 'struct bpf_spin_lock' only via
  bpf_spin_lock()/bpf_spin_unlock() helpers.
- load/store into 'struct bpf_spin_lock lock;' field is not allowed.
- to use bpf_spin_lock() helper the BTF description of map value must be
  a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
  Nested lock inside another struct is not allowed.
- syscall map_lookup doesn't copy bpf_spin_lock field to user space.
- syscall map_update and program map_update do not update bpf_spin_lock field.
- bpf_spin_lock cannot be on the stack or inside networking packet.
  bpf_spin_lock can only be inside HASH or ARRAY map value.
- bpf_spin_lock is available to root only and to all program types.
- bpf_spin_lock is not allowed in inner maps of map-in-map.
- ld_abs is not allowed inside spin_lock-ed region.
- tracing progs and socket filter progs cannot use bpf_spin_lock due to
  insufficient preemption checks

Implementation details:
- cgroup-bpf class of programs can nest with xdp/tc programs.
  Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
  Other solutions to avoid nested bpf_spin_lock are possible.
  Like making sure that all networking progs run with softirq disabled.
  spin_lock_irqsave is the simplest and doesn't add overhead to the
  programs that don't use it.
- arch_spinlock_t is used when its implemented as queued_spin_lock
- archs can force their own arch_spinlock_t
- on architectures where queued_spin_lock is not available and
  sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used.
- presence of bpf_spin_lock inside map value could have been indicated via
  extra flag during map_create, but specifying it via BTF is cleaner.
  It provides introspection for map key/value and reduces user mistakes.

Next steps:
- allow bpf_spin_lock in other map types (like cgroup local storage)
- introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
  to request kernel to grab bpf_spin_lock before rewriting the value.
  That will serialize access to map elements.

Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
</feed>
