<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pm24.git/kernel, branch v6.6-rc1</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<id>https://git.kobert.dev/pm24.git/atom?h=v6.6-rc1</id>
<link rel='self' href='https://git.kobert.dev/pm24.git/atom?h=v6.6-rc1'/>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/'/>
<updated>2023-09-09T21:25:11Z</updated>
<entry>
<title>Merge tag 'riscv-for-linus-6.6-mw2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux</title>
<updated>2023-09-09T21:25:11Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-09-09T21:25:11Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=1b37a0a2d46f0c5fa5eee170ddeeb83342faa117'/>
<id>urn:sha1:1b37a0a2d46f0c5fa5eee170ddeeb83342faa117</id>
<content type='text'>
Pull more RISC-V updates from Palmer Dabbelt:

 - The kernel now dynamically probes for misaligned access speed, as
   opposed to relying on a table of known implementations.

 - Support for non-coherent devices on systems using the Andes AX45MP
   core, including the RZ/Five SoCs.

 - Support for the V extension in ptrace(), again.

 - Support for KASLR.

 - Support for the BPF prog pack allocator in RISC-V.

 - A handful of bug fixes and cleanups.

* tag 'riscv-for-linus-6.6-mw2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (25 commits)
  soc: renesas: Kconfig: For ARCH_R9A07G043 select the required configs if dependencies are met
  riscv: Kconfig.errata: Add dependency for RISCV_SBI in ERRATA_ANDES config
  riscv: Kconfig.errata: Drop dependency for MMU in ERRATA_ANDES_CMO config
  riscv: Kconfig: Select DMA_DIRECT_REMAP only if MMU is enabled
  bpf, riscv: use prog pack allocator in the BPF JIT
  riscv: implement a memset like function for text
  riscv: extend patch_text_nosync() for multiple pages
  bpf: make bpf_prog_pack allocator portable
  riscv: libstub: Implement KASLR by using generic functions
  libstub: Fix compilation warning for rv32
  arm64: libstub: Move KASLR handling functions to kaslr.c
  riscv: Dump out kernel offset information on panic
  riscv: Introduce virtual kernel mapping KASLR
  RISC-V: Add ptrace support for vectors
  soc: renesas: Kconfig: Select the required configs for RZ/Five SoC
  cache: Add L2 cache management for Andes AX45MP RISC-V core
  dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller
  riscv: mm: dma-noncoherent: nonstandard cache operations support
  riscv: errata: Add Andes alternative ports
  riscv: asm: vendorid_list: Add Andes Technology to the vendors list
  ...
</content>
</entry>
<entry>
<title>Merge tag 'dma-mapping-6.6-2023-09-09' of git://git.infradead.org/users/hch/dma-mapping</title>
<updated>2023-09-09T18:41:22Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-09-09T18:41:22Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=474197a4f7921a97aa1aabc6f759823a024ff25f'/>
<id>urn:sha1:474197a4f7921a97aa1aabc6f759823a024ff25f</id>
<content type='text'>
Pull dma-mapping fixes from Christoph Hellwig:

 - move a dma-debug call that prints a message out from a lock that's
   causing problems with the lock order in serial drivers (Sergey
   Senozhatsky)

 - fix the CONFIG_DMA_NUMA_CMA Kconfig entry to have the right
   dependency and not default to y (Christoph Hellwig)

 - move an ifdef a bit to remove a __maybe_unused that seems to trip up
   some sensitivities (Christoph Hellwig)

 - revert a bogus check in the CMA allocator (Zhenhua Huang)

* tag 'dma-mapping-6.6-2023-09-09' of git://git.infradead.org/users/hch/dma-mapping:
  Revert "dma-contiguous: check for memory region overlap"
  dma-pool: remove a __maybe_unused label in atomic_pool_expand
  dma-contiguous: fix the Kconfig entry for CONFIG_DMA_NUMA_CMA
  dma-debug: don't call __dma_entry_alloc_check_leak() under free_entries_lock
</content>
</entry>
<entry>
<title>Merge tag 'printk-for-6.6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux</title>
<updated>2023-09-08T19:13:01Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-09-08T19:13:01Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=01a46efcd8f4af44691d7273edf0c5c07dc9b619'/>
<id>urn:sha1:01a46efcd8f4af44691d7273edf0c5c07dc9b619</id>
<content type='text'>
Pull printk fix from Petr Mladek:

 - Revert exporting symbols needed for dumping the raw printk buffer in
   panic().

   I pushed the export prematurely before the user was ready for merging
   into the mainline.

* tag 'printk-for-6.6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  Revert "printk: export symbols for debug modules"
</content>
</entry>
<entry>
<title>Merge patch series "bpf, riscv: use BPF prog pack allocator in BPF JIT"</title>
<updated>2023-09-08T18:25:25Z</updated>
<author>
<name>Palmer Dabbelt</name>
<email>palmer@rivosinc.com</email>
</author>
<published>2023-09-08T17:18:02Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=77eea559bae925e3cd3c5193efcd37675ec227df'/>
<id>urn:sha1:77eea559bae925e3cd3c5193efcd37675ec227df</id>
<content type='text'>
Puranjay Mohan &lt;puranjay12@gmail.com&gt; says:

Here is some data to prove the V2 fixes the problem:

Without this series:
root@rv-selftester:~/src/kselftest/bpf# time ./test_tag
test_tag: OK (40945 tests)

real    7m47.562s
user    0m24.145s
sys     6m37.064s

With this series applied:
root@rv-selftester:~/src/selftest/bpf# time ./test_tag
test_tag: OK (40945 tests)

real    7m29.472s
user    0m25.865s
sys     6m18.401s

BPF programs currently consume a page each on RISCV. For systems with many BPF
programs, this adds significant pressure to instruction TLB. High iTLB pressure
usually causes slow down for the whole system.

Song Liu introduced the BPF prog pack allocator[1] to mitigate the above issue.
It packs multiple BPF programs into a single huge page. It is currently only
enabled for the x86_64 BPF JIT.

I enabled this allocator on the ARM64 BPF JIT[2]. It is being reviewed now.

This patch series enables the BPF prog pack allocator for the RISCV BPF JIT.

======================================================
Performance Analysis of prog pack allocator on RISCV64
======================================================

Test setup:
===========

Host machine: Debian GNU/Linux 11 (bullseye)
Qemu Version: QEMU emulator version 8.0.3 (Debian 1:8.0.3+dfsg-1)
u-boot-qemu Version: 2023.07+dfsg-1
opensbi Version: 1.3-1

To test the performance of the BPF prog pack allocator on RV, a stresser
tool[4] linked below was built. This tool loads 8 BPF programs on the system and
triggers 5 of them in an infinite loop by doing system calls.

The runner script starts 20 instances of the above which loads 8*20=160 BPF
programs on the system, 5*20=100 of which are being constantly triggered.
The script is passed a command which would be run in the above environment.

The script was run with following perf command:
./run.sh "perf stat -a \
        -e iTLB-load-misses \
        -e dTLB-load-misses  \
        -e dTLB-store-misses \
        -e instructions \
        --timeout 60000"

The output of the above command is discussed below before and after enabling the
BPF prog pack allocator.

The tests were run on qemu-system-riscv64 with 8 cpus, 16G memory. The rootfs
was created using Bjorn's riscv-cross-builder[5] docker container linked below.

Results
=======

Before enabling prog pack allocator:
------------------------------------

Performance counter stats for 'system wide':

           4939048      iTLB-load-misses
           5468689      dTLB-load-misses
            465234      dTLB-store-misses
     1441082097998      instructions

      60.045791200 seconds time elapsed

After enabling prog pack allocator:
-----------------------------------

Performance counter stats for 'system wide':

           3430035      iTLB-load-misses
           5008745      dTLB-load-misses
            409944      dTLB-store-misses
     1441535637988      instructions

      60.046296600 seconds time elapsed

Improvements in metrics
=======================

It was expected that the iTLB-load-misses would decrease as now a single huge
page is used to keep all the BPF programs compared to a single page for each
program earlier.

--------------------------------------------
The improvement in iTLB-load-misses: -30.5 %
--------------------------------------------

I repeated this expriment more than 100 times in different setups and the
improvement was always greater than 30%.

This patch series is boot tested on the Starfive VisionFive 2 board[6].
The performance analysis was not done on the board because it doesn't
expose iTLB-load-misses, etc. The stresser program was run on the board to test
the loading and unloading of BPF programs

[1] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/
[2] https://lore.kernel.org/all/20230626085811.3192402-1-puranjay12@gmail.com/
[3] https://lore.kernel.org/all/20230626085811.3192402-2-puranjay12@gmail.com/
[4] https://github.com/puranjaymohan/BPF-Allocator-Bench
[5] https://github.com/bjoto/riscv-cross-builder
[6] https://www.starfivetech.com/en/site/boards

* b4-shazam-merge:
  bpf, riscv: use prog pack allocator in the BPF JIT
  riscv: implement a memset like function for text
  riscv: extend patch_text_nosync() for multiple pages
  bpf: make bpf_prog_pack allocator portable

Link: https://lore.kernel.org/r/20230831131229.497941-1-puranjay12@gmail.com
Signed-off-by: Palmer Dabbelt &lt;palmer@rivosinc.com&gt;
</content>
</entry>
<entry>
<title>Revert "dma-contiguous: check for memory region overlap"</title>
<updated>2023-09-08T08:58:32Z</updated>
<author>
<name>Zhenhua Huang</name>
<email>quic_zhenhuah@quicinc.com</email>
</author>
<published>2023-09-07T08:03:56Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=f875db4f20f4ec2e4fa3b3be0e5081976e0b5dad'/>
<id>urn:sha1:f875db4f20f4ec2e4fa3b3be0e5081976e0b5dad</id>
<content type='text'>
This reverts commit 3fa6456ebe13adab3ba1817c8e515a5b88f95dce.

The Commit broke the CMA region creation through DT on arm64,
as showed below logs with "memblock=debug":
[    0.000000] memblock_phys_alloc_range: 41943040 bytes align=0x200000
from=0x0000000000000000 max_addr=0x00000000ffffffff
early_init_dt_alloc_reserved_memory_arch+0x34/0xa0
[    0.000000] memblock_reserve: [0x00000000fd600000-0x00000000ffdfffff]
memblock_alloc_range_nid+0xc0/0x19c
[    0.000000] Reserved memory: overlap with other memblock reserved region

&gt;From call flow, region we defined in DT was always reserved before entering
into rmem_cma_setup. Also, rmem_cma_setup has one routine cma_init_reserved_mem
to ensure the region was reserved. Checking the region not reserved here seems
not correct.

early_init_fdt_scan_reserved_mem:
    fdt_scan_reserved_mem
        __reserved_mem_reserve_reg
		early_init_dt_reserve_memory
			memblock_reserve(using “reg” prop case)
        fdt_init_reserved_mem
		__reserved_mem_alloc_size
			*early_init_dt_alloc_reserved_memory_arch*
				memblock_reserve(dynamic alloc case)
        __reserved_mem_init_node
		rmem_cma_setup(region overlap check here should always fail)

Example DT can be used to reproduce issue:

    dump_mem: mem_dump_region {
            compatible = "shared-dma-pool";
            alloc-ranges = &lt;0x0 0x00000000 0x0 0xffffffff&gt;;
            reusable;
            size = &lt;0 0x2800000&gt;;
    };

Signed-off-by: Zhenhua Huang &lt;quic_zhenhuah@quicinc.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2023-09-08T01:33:07Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-09-08T01:33:07Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=73be7fb14e83d24383f840a22f24d3ed222ca319'/>
<id>urn:sha1:73be7fb14e83d24383f840a22f24d3ed222ca319</id>
<content type='text'>
Pull networking updates from Jakub Kicinski:
 "Including fixes from netfilter and bpf.

  Current release - regressions:

   - eth: stmmac: fix failure to probe without MAC interface specified

  Current release - new code bugs:

   - docs: netlink: fix missing classic_netlink doc reference

  Previous releases - regressions:

   - deal with integer overflows in kmalloc_reserve()

   - use sk_forward_alloc_get() in sk_get_meminfo()

   - bpf_sk_storage: fix the missing uncharge in sk_omem_alloc

   - fib: avoid warn splat in flow dissector after packet mangling

   - skb_segment: call zero copy functions before using skbuff frags

   - eth: sfc: check for zero length in EF10 RX prefix

  Previous releases - always broken:

   - af_unix: fix msg_controllen test in scm_pidfd_recv() for
     MSG_CMSG_COMPAT

   - xsk: fix xsk_build_skb() dereferencing possible ERR_PTR()

   - netfilter:
      - nft_exthdr: fix non-linear header modification
      - xt_u32, xt_sctp: validate user space input
      - nftables: exthdr: fix 4-byte stack OOB write
      - nfnetlink_osf: avoid OOB read
      - one more fix for the garbage collection work from last release

   - igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU

   - bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t

   - handshake: fix null-deref in handshake_nl_done_doit()

   - ip: ignore dst hint for multipath routes to ensure packets are
     hashed across the nexthops

   - phy: micrel:
      - correct bit assignments for cable test errata
      - disable EEE according to the KSZ9477 errata

  Misc:

   - docs/bpf: document compile-once-run-everywhere (CO-RE) relocations

   - Revert "net: macsec: preserve ingress frame ordering", it appears
     to have been developed against an older kernel, problem doesn't
     exist upstream"

* tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
  net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs()
  Revert "net: team: do not use dynamic lockdep key"
  net: hns3: remove GSO partial feature bit
  net: hns3: fix the port information display when sfp is absent
  net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue
  net: hns3: fix debugfs concurrency issue between kfree buffer and read
  net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read()
  net: hns3: Support query tx timeout threshold by debugfs
  net: hns3: fix tx timeout issue
  net: phy: Provide Module 4 KSZ9477 errata (DS80000754C)
  netfilter: nf_tables: Unbreak audit log reset
  netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
  netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
  netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID
  netfilter: nfnetlink_osf: avoid OOB read
  netfilter: nftables: exthdr: fix 4-byte stack OOB write
  selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc
  bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
  bpf: bpf_sk_storage: Fix invalid wait context lockdep report
  s390/bpf: Pass through tail call counter in trampolines
  ...
</content>
</entry>
<entry>
<title>Revert "printk: export symbols for debug modules"</title>
<updated>2023-09-07T12:19:42Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2023-09-05T08:19:02Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=4952801fc6adb5b50b8ec2bcc5aeef92fcce8730'/>
<id>urn:sha1:4952801fc6adb5b50b8ec2bcc5aeef92fcce8730</id>
<content type='text'>
This reverts commit 3e00123a13d824d63072b1824c9da59cd78356d9.

No, we never export random symbols for out of tree modules.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Acked-by: Petr Mladek &lt;pmladek@suse.com&gt;
Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
Link: https://lore.kernel.org/r/20230905081902.321778-1-hch@lst.de
</content>
</entry>
<entry>
<title>bpf: make bpf_prog_pack allocator portable</title>
<updated>2023-09-06T13:26:05Z</updated>
<author>
<name>Puranjay Mohan</name>
<email>puranjay12@gmail.com</email>
</author>
<published>2023-08-31T13:12:26Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=20e490adea279d49d57b800475938f5b67926d98'/>
<id>urn:sha1:20e490adea279d49d57b800475938f5b67926d98</id>
<content type='text'>
The bpf_prog_pack allocator currently uses module_alloc() and
module_memfree() to allocate and free memory. This is not portable
because different architectures use different methods for allocating
memory for BPF programs. Like ARM64 and riscv use vmalloc()/vfree().

Use bpf_jit_alloc_exec() and bpf_jit_free_exec() for memory management
in bpf_prog_pack allocator. Other architectures can override these with
their implementation and will be able to use bpf_prog_pack directly.

On architectures that don't override bpf_jit_alloc/free_exec() this is
basically a NOP.

Signed-off-by: Puranjay Mohan &lt;puranjay12@gmail.com&gt;
Acked-by: Song Liu &lt;song@kernel.org&gt;
Acked-by: Björn Töpel &lt;bjorn@kernel.org&gt;
Tested-by: Björn Töpel &lt;bjorn@rivosinc.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/r/20230831131229.497941-2-puranjay12@gmail.com
Signed-off-by: Palmer Dabbelt &lt;palmer@rivosinc.com&gt;
</content>
</entry>
<entry>
<title>bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc</title>
<updated>2023-09-06T09:08:14Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>martin.lau@kernel.org</email>
</author>
<published>2023-09-01T23:11:28Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=55d49f750b1cb1f177fb1b00ae02cba4613bcfb7'/>
<id>urn:sha1:55d49f750b1cb1f177fb1b00ae02cba4613bcfb7</id>
<content type='text'>
The commit c83597fa5dc6 ("bpf: Refactor some inode/task/sk storage functions
for reuse"), refactored the bpf_{sk,task,inode}_storage_free() into
bpf_local_storage_unlink_nolock() which then later renamed to
bpf_local_storage_destroy(). The commit accidentally passed the
"bool uncharge_mem = false" argument to bpf_selem_unlink_storage_nolock()
which then stopped the uncharge from happening to the sk-&gt;sk_omem_alloc.

This missing uncharge only happens when the sk is going away (during
__sk_destruct).

This patch fixes it by always passing "uncharge_mem = true". It is a
noop to the task/inode/cgroup storage because they do not have the
map_local_storage_(un)charge enabled in the map_ops. A followup patch
will be done in bpf-next to remove the uncharge_mem argument.

A selftest is added in the next patch.

Fixes: c83597fa5dc6 ("bpf: Refactor some inode/task/sk storage functions for reuse")
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20230901231129.578493-3-martin.lau@linux.dev
</content>
</entry>
<entry>
<title>bpf: bpf_sk_storage: Fix invalid wait context lockdep report</title>
<updated>2023-09-06T09:07:54Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>martin.lau@kernel.org</email>
</author>
<published>2023-09-01T23:11:27Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=a96a44aba556c42b432929d37d60158aca21ad4c'/>
<id>urn:sha1:a96a44aba556c42b432929d37d60158aca21ad4c</id>
<content type='text'>
'./test_progs -t test_local_storage' reported a splat:

[   27.137569] =============================
[   27.138122] [ BUG: Invalid wait context ]
[   27.138650] 6.5.0-03980-gd11ae1b16b0a #247 Tainted: G           O
[   27.139542] -----------------------------
[   27.140106] test_progs/1729 is trying to lock:
[   27.140713] ffff8883ef047b88 (stock_lock){-.-.}-{3:3}, at: local_lock_acquire+0x9/0x130
[   27.141834] other info that might help us debug this:
[   27.142437] context-{5:5}
[   27.142856] 2 locks held by test_progs/1729:
[   27.143352]  #0: ffffffff84bcd9c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x40
[   27.144492]  #1: ffff888107deb2c0 (&amp;storage-&gt;lock){..-.}-{2:2}, at: bpf_local_storage_update+0x39e/0x8e0
[   27.145855] stack backtrace:
[   27.146274] CPU: 0 PID: 1729 Comm: test_progs Tainted: G           O       6.5.0-03980-gd11ae1b16b0a #247
[   27.147550] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[   27.149127] Call Trace:
[   27.149490]  &lt;TASK&gt;
[   27.149867]  dump_stack_lvl+0x130/0x1d0
[   27.152609]  dump_stack+0x14/0x20
[   27.153131]  __lock_acquire+0x1657/0x2220
[   27.153677]  lock_acquire+0x1b8/0x510
[   27.157908]  local_lock_acquire+0x29/0x130
[   27.159048]  obj_cgroup_charge+0xf4/0x3c0
[   27.160794]  slab_pre_alloc_hook+0x28e/0x2b0
[   27.161931]  __kmem_cache_alloc_node+0x51/0x210
[   27.163557]  __kmalloc+0xaa/0x210
[   27.164593]  bpf_map_kzalloc+0xbc/0x170
[   27.165147]  bpf_selem_alloc+0x130/0x510
[   27.166295]  bpf_local_storage_update+0x5aa/0x8e0
[   27.167042]  bpf_fd_sk_storage_update_elem+0xdb/0x1a0
[   27.169199]  bpf_map_update_value+0x415/0x4f0
[   27.169871]  map_update_elem+0x413/0x550
[   27.170330]  __sys_bpf+0x5e9/0x640
[   27.174065]  __x64_sys_bpf+0x80/0x90
[   27.174568]  do_syscall_64+0x48/0xa0
[   27.175201]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   27.175932] RIP: 0033:0x7effb40e41ad
[   27.176357] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 &lt;48&gt; 3d 01 f0 ff ff 73 01 c3 48 8b 0d8
[   27.179028] RSP: 002b:00007ffe64c21fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[   27.180088] RAX: ffffffffffffffda RBX: 00007ffe64c22768 RCX: 00007effb40e41ad
[   27.181082] RDX: 0000000000000020 RSI: 00007ffe64c22008 RDI: 0000000000000002
[   27.182030] RBP: 00007ffe64c21ff0 R08: 0000000000000000 R09: 00007ffe64c22788
[   27.183038] R10: 0000000000000064 R11: 0000000000000202 R12: 0000000000000000
[   27.184006] R13: 00007ffe64c22788 R14: 00007effb42a1000 R15: 0000000000000000
[   27.184958]  &lt;/TASK&gt;

It complains about acquiring a local_lock while holding a raw_spin_lock.
It means it should not allocate memory while holding a raw_spin_lock
since it is not safe for RT.

raw_spin_lock is needed because bpf_local_storage supports tracing
context. In particular for task local storage, it is easy to
get a "current" task PTR_TO_BTF_ID in tracing bpf prog.
However, task (and cgroup) local storage has already been moved to
bpf mem allocator which can be used after raw_spin_lock.

The splat is for the sk storage. For sk (and inode) storage,
it has not been moved to bpf mem allocator. Using raw_spin_lock or not,
kzalloc(GFP_ATOMIC) could theoretically be unsafe in tracing context.
However, the local storage helper requires a verifier accepted
sk pointer (PTR_TO_BTF_ID), it is hypothetical if that (mean running
a bpf prog in a kzalloc unsafe context and also able to hold a verifier
accepted sk pointer) could happen.

This patch avoids kzalloc after raw_spin_lock to silent the splat.
There is an existing kzalloc before the raw_spin_lock. At that point,
a kzalloc is very likely required because a lookup has just been done
before. Thus, this patch always does the kzalloc before acquiring
the raw_spin_lock and remove the later kzalloc usage after the
raw_spin_lock. After this change, it will have a charge and then
uncharge during the syscall bpf_map_update_elem() code path.
This patch opts for simplicity and not continue the old
optimization to save one charge and uncharge.

This issue is dated back to the very first commit of bpf_sk_storage
which had been refactored multiple times to create task, inode, and
cgroup storage. This patch uses a Fixes tag with a more recent
commit that should be easier to do backport.

Fixes: b00fa38a9c1c ("bpf: Enable non-atomic allocations in local storage")
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20230901231129.578493-2-martin.lau@linux.dev
</content>
</entry>
</feed>
