<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pm24.git/kernel/bpf/memalloc.c, branch v6.6-rc2</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<id>https://git.kobert.dev/pm24.git/atom?h=v6.6-rc2</id>
<link rel='self' href='https://git.kobert.dev/pm24.git/atom?h=v6.6-rc2'/>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/'/>
<updated>2023-07-28T16:41:10Z</updated>
<entry>
<title>bpf: Non-atomically allocate freelist during prefill</title>
<updated>2023-07-28T16:41:10Z</updated>
<author>
<name>YiFei Zhu</name>
<email>zhuyifei@google.com</email>
</author>
<published>2023-07-28T04:33:59Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=d1a02358d48d659c2400fa3bbaf9cde2cf9f5040'/>
<id>urn:sha1:d1a02358d48d659c2400fa3bbaf9cde2cf9f5040</id>
<content type='text'>
In internal testing of test_maps, we sometimes observed failures like:
  test_maps: test_maps.c:173: void test_hashmap_percpu(unsigned int, void *):
    Assertion `bpf_map_update_elem(fd, &amp;key, value, BPF_ANY) == 0' failed.
where the errno is ENOMEM. After some troubleshooting and enabling
the warnings, we saw:
  [   91.304708] percpu: allocation failed, size=8 align=8 atomic=1, atomic alloc failed, no space left
  [   91.304716] CPU: 51 PID: 24145 Comm: test_maps Kdump: loaded Tainted: G                 N 6.1.38-smp-DEV #7
  [   91.304719] Hardware name: Google Astoria/astoria, BIOS 0.20230627.0-0 06/27/2023
  [   91.304721] Call Trace:
  [   91.304724]  &lt;TASK&gt;
  [   91.304730]  [&lt;ffffffffa7ef83b9&gt;] dump_stack_lvl+0x59/0x88
  [   91.304737]  [&lt;ffffffffa7ef83f8&gt;] dump_stack+0x10/0x18
  [   91.304738]  [&lt;ffffffffa75caa0c&gt;] pcpu_alloc+0x6fc/0x870
  [   91.304741]  [&lt;ffffffffa75ca302&gt;] __alloc_percpu_gfp+0x12/0x20
  [   91.304743]  [&lt;ffffffffa756785e&gt;] alloc_bulk+0xde/0x1e0
  [   91.304746]  [&lt;ffffffffa7566c02&gt;] bpf_mem_alloc_init+0xd2/0x2f0
  [   91.304747]  [&lt;ffffffffa7547c69&gt;] htab_map_alloc+0x479/0x650
  [   91.304750]  [&lt;ffffffffa751d6e0&gt;] map_create+0x140/0x2e0
  [   91.304752]  [&lt;ffffffffa751d413&gt;] __sys_bpf+0x5a3/0x6c0
  [   91.304753]  [&lt;ffffffffa751c3ec&gt;] __x64_sys_bpf+0x1c/0x30
  [   91.304754]  [&lt;ffffffffa7ef847a&gt;] do_syscall_64+0x5a/0x80
  [   91.304756]  [&lt;ffffffffa800009b&gt;] entry_SYSCALL_64_after_hwframe+0x63/0xcd

This makes sense: in atomic context, the percpu allocator does not
create new chunks; it only does so in non-atomic context. So if all
percpu chunks are full during prefill, the very next unit_alloc
fails immediately with -ENOMEM.

The prefill phase does not actually run in atomic context, so we can
use this fact to allocate non-atomically with GFP_KERNEL instead of
GFP_NOWAIT. This avoids the immediate -ENOMEM.

GFP_NOWAIT still has to be used in unit_alloc when the bpf program
runs in atomic context. Even when the program runs in non-atomic
context, in most cases an rcu read lock is held for it, so GFP_NOWAIT
is still needed. The same is often true for BPF_MAP_UPDATE_ELEM
syscalls.
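
As a toy userspace sketch of the flag choice described above (the enum
values and the prefill parameter are illustrative stand-ins, not the
kernel's real gfp flags or the actual memalloc.c code):

```c
/* Illustrative stand-ins for the kernel's gfp flags. */
enum gfp { GFP_KERNEL, GFP_NOWAIT };

/* Prefill runs in process context and may sleep, so it can use
 * GFP_KERNEL; unit_alloc may run under rcu_read_lock or with IRQs
 * disabled, so it must stick with GFP_NOWAIT. */
static enum gfp choose_gfp(int prefill)
{
	return prefill ? GFP_KERNEL : GFP_NOWAIT;
}
```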

Signed-off-by: YiFei Zhu &lt;zhuyifei@google.com&gt;
Acked-by: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Acked-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/20230728043359.3324347-1-zhuyifei@google.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: work around -Wuninitialized warning</title>
<updated>2023-07-26T00:14:18Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2023-07-25T20:26:40Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=63e2da3b7f7f63f881aa508825b0c4241e9910e1'/>
<id>urn:sha1:63e2da3b7f7f63f881aa508825b0c4241e9910e1</id>
<content type='text'>
Splitting these out into separate helper functions means that we
actually pass an uninitialized variable into another function call
if dec_active() happens not to be inlined and CONFIG_PREEMPT_RT
is disabled:

kernel/bpf/memalloc.c: In function 'add_obj_to_free_list':
kernel/bpf/memalloc.c:200:9: error: 'flags' is used uninitialized [-Werror=uninitialized]
  200 |         dec_active(c, flags);

Avoid this by passing the flags by reference, so they either get
initialized and dereferenced through a pointer, or the pointer never
gets accessed at all.
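
A minimal userspace model of the warning and the fix (the function
names mirror the commit, but the bodies are illustrative stand-ins for
the kernel's local_irq_save/restore pair):

```c
static int preempt_rt; /* stands in for CONFIG_PREEMPT_RT */

static void inc_active(unsigned long *flags)
{
	if (preempt_rt)
		flags[0] = 1; /* models local_irq_save(*flags) */
}

static void dec_active(unsigned long *flags)
{
	if (preempt_rt)
		(void)flags[0]; /* models local_irq_restore(*flags) */
}

static int add_obj_to_free_list(void)
{
	/* May legitimately stay unwritten when preempt_rt is 0; passing
	 * the pointer instead of the value means nothing uninitialized
	 * is ever read, so the compiler stays quiet. */
	unsigned long flags[1];

	inc_active(flags);
	dec_active(flags);
	return 0;
}
```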

Fixes: 18e027b1c7c6d ("bpf: Factor out inc/dec of active flag into helpers.")
Suggested-by: Alexei Starovoitov &lt;alexei.starovoitov@gmail.com&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Link: https://lore.kernel.org/r/20230725202653.2905259-1-arnd@kernel.org
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Add object leak check.</title>
<updated>2023-07-12T21:45:23Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2023-07-06T03:34:47Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=4ed8b5bcfada6687baa478bf8febe891d9107118'/>
<id>urn:sha1:4ed8b5bcfada6687baa478bf8febe891d9107118</id>
<content type='text'>
The object leak check is cheap. Do it unconditionally to spot difficult races
in bpf_mem_alloc.

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20230706033447.54696-15-alexei.starovoitov@gmail.com
</content>
</entry>
<entry>
<title>bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu().</title>
<updated>2023-07-12T21:45:23Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2023-07-06T03:34:45Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=5af6807bdb10d1af9d412d7d6c177ba8440adffb'/>
<id>urn:sha1:5af6807bdb10d1af9d412d7d6c177ba8440adffb</id>
<content type='text'>
Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu().
Unlike bpf_mem_[cache_]free(), which links objects into a per-cpu
free list for immediate reuse, the _rcu() flavor waits for an RCU
grace period and then moves objects onto the free_by_rcu_ttrace list,
where they wait for an RCU task trace grace period before being freed
into slab.

The life cycle of objects:
alloc: dequeue free_llist
free: enqueue free_llist
free_rcu: enqueue free_by_rcu -&gt; waiting_for_gp
free_llist above high watermark -&gt; free_by_rcu_ttrace
after RCU GP waiting_for_gp -&gt; free_by_rcu_ttrace
free_by_rcu_ttrace -&gt; waiting_for_gp_ttrace -&gt; slab
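
The grace-period-driven transitions above can be modeled as a tiny
state machine (a toy userspace sketch; the state names mirror the list
names in the commit, and the transition table is illustrative):

```c
/* States named after the lists in the life cycle above. */
enum obj_state {
	FREE_LLIST,
	FREE_BY_RCU,
	WAITING_FOR_GP,
	FREE_BY_RCU_TTRACE,
	WAITING_FOR_GP_TTRACE,
	SLAB,
};

/* What a completed RCU (or RCU task trace) grace period does to an
 * object parked on one of the waiting lists. */
static enum obj_state after_gp(enum obj_state s)
{
	switch (s) {
	case WAITING_FOR_GP:
		return FREE_BY_RCU_TTRACE;
	case WAITING_FOR_GP_TTRACE:
		return SLAB;
	default:
		return s; /* grace periods do not move the other lists */
	}
}
```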

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/bpf/20230706033447.54696-13-alexei.starovoitov@gmail.com
</content>
</entry>
<entry>
<title>bpf: Allow reuse from waiting_for_gp_ttrace list.</title>
<updated>2023-07-12T21:45:23Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2023-07-06T03:34:42Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=04fabf00b4d3aff5d010ecb617001814e409e24a'/>
<id>urn:sha1:04fabf00b4d3aff5d010ecb617001814e409e24a</id>
<content type='text'>
alloc_bulk() can reuse elements from free_by_rcu_ttrace.
Let it reuse from waiting_for_gp_ttrace as well to avoid unnecessary kmalloc().

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20230706033447.54696-10-alexei.starovoitov@gmail.com
</content>
</entry>
<entry>
<title>bpf: Add a hint to allocated objects.</title>
<updated>2023-07-12T21:45:23Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2023-07-06T03:34:41Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=822fb26bdb55932d0635f43cc418d2004b19e358'/>
<id>urn:sha1:822fb26bdb55932d0635f43cc418d2004b19e358</id>
<content type='text'>
To address an OOM issue when one cpu is allocating and another cpu is
freeing, add a target bpf_mem_cache hint to allocated objects, and when
the local cpu's free_llist overflows, free to that bpf_mem_cache. The
hint addresses the OOM while maintaining the same performance for the
common case when alloc/free are done on the same cpu.

Note that do_call_rcu_ttrace() now has to check the 'draining' flag in
one more case, since do_call_rcu_ttrace() is called not only for the
current cpu.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/bpf/20230706033447.54696-9-alexei.starovoitov@gmail.com
</content>
</entry>
<entry>
<title>bpf: Change bpf_mem_cache draining process.</title>
<updated>2023-07-12T21:45:22Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2023-07-06T03:34:40Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=d114dde245f9115b73756203b03a633a6fc1b36a'/>
<id>urn:sha1:d114dde245f9115b73756203b03a633a6fc1b36a</id>
<content type='text'>
The next patch will introduce cross-cpu llist access, and the existing
irq_work_sync() + drain_mem_cache() + rcu_barrier_tasks_trace()
mechanism will not be enough, since irq_work_sync() + drain_mem_cache()
on cpu A won't guarantee that the llists on cpu A are empty.
free_bulk() on cpu B might add objects back to an llist of cpu A.
Add a 'bool draining' flag.
The modified sequence looks like:
for_each_cpu:
  WRITE_ONCE(c-&gt;draining, true); // do_call_rcu_ttrace() won't be doing call_rcu() any more
  irq_work_sync(); // wait for irq_work callback (free_bulk) to finish
  drain_mem_cache(); // free all objects
rcu_barrier_tasks_trace(); // wait for RCU callbacks to execute
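
As a toy userspace model of the flag's effect (the function name
mirrors the commit, while the counter stands in for queueing a real
call_rcu_tasks_trace() callback):

```c
static int draining;
static int pending_rcu_callbacks;

/* Once draining is set, no new RCU callbacks are queued, so the final
 * rcu_barrier_tasks_trace() only waits for a bounded amount of work. */
static void do_call_rcu_ttrace(void)
{
	if (draining) /* models a READ_ONCE of the draining flag */
		return;
	pending_rcu_callbacks++; /* models call_rcu_tasks_trace() */
}
```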

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/bpf/20230706033447.54696-8-alexei.starovoitov@gmail.com
</content>
</entry>
<entry>
<title>bpf: Further refactor alloc_bulk().</title>
<updated>2023-07-12T21:45:22Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2023-07-06T03:34:39Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=7468048237b8a99c03e1325b11373f9b29ef4139'/>
<id>urn:sha1:7468048237b8a99c03e1325b11373f9b29ef4139</id>
<content type='text'>
In certain scenarios alloc_bulk() might be taking free objects mainly
from the free_by_rcu_ttrace list. In such cases get_memcg() and
set_active_memcg() are redundant, yet they show up in perf profiles.
Split the loop and only set the memcg when allocating from slab. This
patch alone makes no performance difference, but it helps in
combination with further patches.
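
A userspace sketch of the split loop (the reuse array, the counter and
the fake slab token all stand in for free_by_rcu_ttrace,
set_active_memcg() and the real slab allocation; every name here is
illustrative):

```c
/* Toy reuse list standing in for free_by_rcu_ttrace. */
static long reuse[16];
static int reuse_len;

static long slab_pool;      /* fake slab: hands out increasing tokens */
static int set_memcg_calls; /* counts the costly memcg setup */

static void reuse_push(long obj)
{
	reuse[reuse_len++] = obj;
}

static int alloc_bulk(long *out, int cnt)
{
	int i = 0;

	/* First loop: drain the reuse list; no memcg bookkeeping here. */
	while (i != cnt) {
		if (reuse_len == 0)
			break;
		out[i++] = reuse[--reuse_len];
	}
	if (i == cnt)
		return i;

	/* Pay for the memcg setup once, only when "slab" is really hit. */
	set_memcg_calls++;
	for (; i != cnt; i++)
		out[i] = ++slab_pool;
	return i;
}
```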

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/bpf/20230706033447.54696-7-alexei.starovoitov@gmail.com
</content>
</entry>
<entry>
<title>bpf: Factor out inc/dec of active flag into helpers.</title>
<updated>2023-07-12T21:45:22Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2023-07-06T03:34:38Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=18e027b1c7c6dd858b36305468251a5e4a6bcdf7'/>
<id>urn:sha1:18e027b1c7c6dd858b36305468251a5e4a6bcdf7</id>
<content type='text'>
Factor out local_inc/dec_return(&amp;c-&gt;active) into helpers.
No functional changes.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/bpf/20230706033447.54696-6-alexei.starovoitov@gmail.com
</content>
</entry>
<entry>
<title>bpf: Refactor alloc_bulk().</title>
<updated>2023-07-12T21:45:22Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2023-07-06T03:34:37Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=05ae68656a8e9d9386ce4243fe992122fd29bb51'/>
<id>urn:sha1:05ae68656a8e9d9386ce4243fe992122fd29bb51</id>
<content type='text'>
Factor out the inner body of alloc_bulk() into a separate helper.
No functional changes.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/bpf/20230706033447.54696-5-alexei.starovoitov@gmail.com
</content>
</entry>
</feed>
