<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pm24.git/include/linux/filter.h, branch v5.4-rc4</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<id>https://git.kobert.dev/pm24.git/atom?h=v5.4-rc4</id>
<link rel='self' href='https://git.kobert.dev/pm24.git/atom?h=v5.4-rc4'/>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/'/>
<updated>2019-09-16T09:44:05Z</updated>
<entry>
<title>bpf: fix accessing bpf_sysctl.file_pos on s390</title>
<updated>2019-09-16T09:44:05Z</updated>
<author>
<name>Ilya Leoshkevich</name>
<email>iii@linux.ibm.com</email>
</author>
<published>2019-08-16T10:53:00Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=d895a0f16fadb26d22ab531c49768f7642ae5c3e'/>
<id>urn:sha1:d895a0f16fadb26d22ab531c49768f7642ae5c3e</id>
<content type='text'>
"ctx:file_pos sysctl:read write ok" fails on s390 with "Read value  !=
nux". This is because verifier rewrites a complete 32-bit
bpf_sysctl.file_pos update to a partial update of the first 32 bits of
64-bit *bpf_sysctl_kern.ppos, which is not correct on big-endian
systems.

Fix by using an offset on big-endian systems.
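
A minimal C sketch of the idea (illustrative, not the kernel's exact
code; the macro name is an assumption): a 32-bit access through the
64-bit *ppos must land on the low 32 bits, whose byte offset within the
u64 depends on endianness.

  #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  /* low 32 bits of a __u64 live at byte offset 4 on big endian */
  #define FILE_POS_OFF (sizeof(__u64) - sizeof(__u32))
  #else
  /* ... and at byte offset 0 on little endian */
  #define FILE_POS_OFF 0
  #endif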

Ditto for bpf_sysctl.file_pos reads. Currently the test does not detect
a problem there, since it expects to see 0, which it gets with high
probability in error cases, so change it to seek to offset 3 and expect
3 in bpf_sysctl.file_pos.

Fixes: e1550bfe0de4 ("bpf: Add file_pos field to bpf_sysctl ctx")
Signed-off-by: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20190816105300.49035-1-iii@linux.ibm.com/
</content>
</entry>
<entry>
<title>bpf: fix narrower loads on s390</title>
<updated>2019-07-23T20:59:33Z</updated>
<author>
<name>Ilya Leoshkevich</name>
<email>iii@linux.ibm.com</email>
</author>
<published>2019-07-19T09:18:15Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=d9b8aadaffa65809d146cf0f8632a22a946367d7'/>
<id>urn:sha1:d9b8aadaffa65809d146cf0f8632a22a946367d7</id>
<content type='text'>
The very first check in test_pkt_md_access is failing on s390, which
happens because loading a part of a struct __sk_buff field produces
an incorrect result.

The preprocessed code of the check is:

{
	__u8 tmp = *((volatile __u8 *)&amp;skb-&gt;len +
		((sizeof(skb-&gt;len) - sizeof(__u8)) / sizeof(__u8)));
	if (tmp != ((*(volatile __u32 *)&amp;skb-&gt;len) &amp; 0xFF)) return 2;
};

clang generates the following code for it:

      0:	71 21 00 03 00 00 00 00	r2 = *(u8 *)(r1 + 3)
      1:	61 31 00 00 00 00 00 00	r3 = *(u32 *)(r1 + 0)
      2:	57 30 00 00 00 00 00 ff	r3 &amp;= 255
      3:	5d 23 00 1d 00 00 00 00	if r2 != r3 goto +29 &lt;LBB0_10&gt;

Finally, verifier transforms it to:

  0: (61) r2 = *(u32 *)(r1 +104)
  1: (bc) w2 = w2
  2: (74) w2 &gt;&gt;= 24
  3: (bc) w2 = w2
  4: (54) w2 &amp;= 255
  5: (bc) w2 = w2

The problem is that when verifier emits the code to replace a partial
load of a struct __sk_buff field (*(u8 *)(r1 + 3)) with a full load of
struct sk_buff field (*(u32 *)(r1 + 104)), an optional shift and a
bitwise AND, it assumes that the machine is little endian and
incorrectly decides to use a shift.

Adjust shift count calculation to account for endianness.
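
A sketch of the adjusted calculation (this mirrors the logic, not the
exact verifier code; size_default is the width of the full load, size
and off describe the narrow access within it):

  #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  shift = ((size_default - size) - (off % size_default)) * 8;
  #else
  shift = (off % size_default) * 8;
  #endif

For the example above (size_default = 4, size = 1, off = 3) this yields
a shift of 24 on little-endian, but 0 on big-endian, where byte 3
already holds the least significant byte.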

Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields")
Signed-off-by: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: rename bpf_ctx_wide_store_ok to bpf_ctx_wide_access_ok</title>
<updated>2019-07-15T21:15:53Z</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@google.com</email>
</author>
<published>2019-07-15T16:39:52Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=b43995469e5804636a55372e9bbb17ccb22441c5'/>
<id>urn:sha1:b43995469e5804636a55372e9bbb17ccb22441c5</id>
<content type='text'>
Rename bpf_ctx_wide_store_ok to bpf_ctx_wide_access_ok to indicate
that it can be used for both loads and stores.

Cc: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: allow wide (u64) aligned stores for some fields of bpf_sock_addr</title>
<updated>2019-07-08T14:22:55Z</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@google.com</email>
</author>
<published>2019-07-01T17:38:39Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=600c70bad6594cb124c641ed05355ca134650ea4'/>
<id>urn:sha1:600c70bad6594cb124c641ed05355ca134650ea4</id>
<content type='text'>
Since commit cd17d7770578 ("bpf/tools: sync bpf.h") clang decided
that it can do a single u64 store into user_ip6[2] instead of two
separate u32 ones:

 #  17: (18) r2 = 0x100000000000000
 #  ; ctx-&gt;user_ip6[2] = bpf_htonl(DST_REWRITE_IP6_2);
 #  19: (7b) *(u64 *)(r1 +16) = r2
 #  invalid bpf_context access off=16 size=8

From the compiler's point of view it does look like a correct thing
to do, so let's support it on the kernel side.

Credit to Andrii Nakryiko for a proper implementation of
bpf_ctx_wide_store_ok.
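
The check is roughly of the following shape (a sketch; see
bpf_ctx_wide_store_ok in include/linux/filter.h for the authoritative
definition): the access must be exactly u64-sized, u64-aligned and
fully contained within the field.

  #define bpf_ctx_wide_store_ok(off, size, type, field)         \
          (size == sizeof(__u64) &amp;&amp;                              \
           off &gt;= offsetof(type, field) &amp;&amp;                       \
           off + sizeof(__u64) &lt;= offsetofend(type, field) &amp;&amp;    \
           off % sizeof(__u64) == 0)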

Cc: Andrii Nakryiko &lt;andriin@fb.com&gt;
Cc: Yonghong Song &lt;yhs@fb.com&gt;
Fixes: cd17d7770578 ("bpf/tools: sync bpf.h")
Reported-by: kernel test robot &lt;rong.a.chen@intel.com&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Signed-off-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf_xdp_redirect_map: Perform map lookup in eBPF helper</title>
<updated>2019-06-28T23:31:09Z</updated>
<author>
<name>Toke Høiland-Jørgensen</name>
<email>toke@redhat.com</email>
</author>
<published>2019-06-28T09:12:34Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=43e74c0267a35d6f5127218054b2d80c7fe801f5'/>
<id>urn:sha1:43e74c0267a35d6f5127218054b2d80c7fe801f5</id>
<content type='text'>
The bpf_redirect_map() helper used by XDP programs doesn't return any
indication of whether it can successfully redirect to the map index it was
given. Instead, BPF programs have to track this themselves, leading to
programs using duplicate maps to track which entries are populated in the
devmap.

This patch fixes this by moving the map lookup into the bpf_redirect_map()
helper, which makes it possible to return failure to the eBPF program. The
lower bits of the flags argument are used as the return code, which means
that existing users who pass a '0' flag argument will get XDP_ABORTED.

With this, a BPF program can check the return code from the helper call and
react by, for instance, substituting a different redirect. This works for
any type of map used for redirect.
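
A hedged usage sketch (the tx_port devmap and the index 0 are
illustrative, assumed to be defined elsewhere): the program passes the
fallback action in the flags, so a failed lookup returns XDP_PASS
instead of XDP_ABORTED.

  SEC("xdp")
  int xdp_redirect_example(struct xdp_md *ctx)
  {
          /* lower bits of flags select the action returned on a
           * failed lookup; XDP_PASS falls back to the stack */
          return bpf_redirect_map(&amp;tx_port, 0, XDP_PASS);
  }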

Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Acked-by: Jonathan Lemon &lt;jonathan.lemon@gmail.com&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>devmap: Rename ifindex member in bpf_redirect_info</title>
<updated>2019-06-28T23:31:09Z</updated>
<author>
<name>Toke Høiland-Jørgensen</name>
<email>toke@redhat.com</email>
</author>
<published>2019-06-28T09:12:34Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=4b55cf290dc6bd3a9e5da26d1ad60e77aa88c8cf'/>
<id>urn:sha1:4b55cf290dc6bd3a9e5da26d1ad60e77aa88c8cf</id>
<content type='text'>
The bpf_redirect_info struct has an 'ifindex' member which was named back
when the redirects could only target egress interfaces. Now that we can
also redirect to sockets and CPUs, this is a bit misleading, so rename the
member to tgt_index.

Reorder the struct members so we can have 'tgt_index' and 'tgt_value' next
to each other in a subsequent patch.

Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: implement getsockopt and setsockopt hooks</title>
<updated>2019-06-27T22:25:16Z</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@google.com</email>
</author>
<published>2019-06-27T20:38:47Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=0d01da6afc5402f60325c5da31b22f7d56689b49'/>
<id>urn:sha1:0d01da6afc5402f60325c5da31b22f7d56689b49</id>
<content type='text'>
Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.

BPF_CGROUP_SETSOCKOPT can modify user setsockopt arguments before
passing them down to the kernel, or bypass the kernel completely.
BPF_CGROUP_GETSOCKOPT can inspect/modify getsockopt arguments that the
kernel returns.
Both hooks reuse existing PTR_TO_PACKET{,_END} infrastructure.

The buffer memory is pre-allocated (because I don't think there is
a precedent for working with __user memory from bpf). This might be
slow to do for each {s,g}etsockopt call, which is why I've added
__cgroup_bpf_prog_array_is_empty that exits early if there is nothing
attached to a cgroup. Note, however, that there is a race between
__cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup
program layout might have changed; this should not be a problem
because in general there is a race between multiple calls to
{s,g}etsockopt and a user adding/removing bpf progs from a cgroup.

The return code of the BPF program is handled as follows:
* 0: EPERM
* 1: success, continue with next BPF program in the cgroup chain
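
A hedged sketch of a hook using these return codes (the
SOL_SOCKET/SO_SNDBUF policy is purely illustrative):

  SEC("cgroup/getsockopt")
  int deny_sndbuf(struct bpf_sockopt *ctx)
  {
          if (ctx-&gt;level == SOL_SOCKET &amp;&amp; ctx-&gt;optname == SO_SNDBUF)
                  return 0;       /* EPERM */
          return 1;               /* run the next program in the chain */
  }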

v9:
* allow overwriting setsockopt arguments (Alexei Starovoitov):
  * use set_fs (same as kernel_setsockopt)
  * buffer is always kzalloc'd (no small on-stack buffer)

v8:
* use s32 for optlen (Andrii Nakryiko)

v7:
* return only 0 or 1 (Alexei Starovoitov)
* always run all progs (Alexei Starovoitov)
* use optval=0 as kernel bypass in setsockopt (Alexei Starovoitov)
  (decided to use optval=-1 instead, optval=0 might be a valid input)
* call getsockopt hook after kernel handlers (Alexei Starovoitov)

v6:
* rework cgroup chaining; stop as soon as bpf program returns
  0 or 2; see patch with the documentation for the details
* drop Andrii's and Martin's Acked-by (not sure they are comfortable
  with the new state of things)

v5:
* skip copy_to_user() and put_user() when ret == 0 (Martin Lau)

v4:
* don't export bpf_sk_fullsock helper (Martin Lau)
* size != sizeof(__u64) for uapi pointers (Martin Lau)
* offsetof instead of bpf_ctx_range when checking ctx access (Martin Lau)

v3:
* typos in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY comments (Andrii Nakryiko)
* reverse christmas tree in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY (Andrii
  Nakryiko)
* use __bpf_md_ptr instead of __u32 for optval{,_end} (Martin Lau)
* use BPF_FIELD_SIZEOF() for consistency (Martin Lau)
* new CG_SOCKOPT_ACCESS macro to wrap repeated parts

v2:
* moved bpf_sockopt_kern fields around to remove a hole (Martin Lau)
* aligned bpf_sockopt_kern-&gt;buf to 8 bytes (Martin Lau)
* bpf_prog_array_is_empty instead of bpf_prog_array_length (Martin Lau)
* added [0,2] return code check to verifier (Martin Lau)
* dropped unused buf[64] from the stack (Martin Lau)
* use PTR_TO_SOCKET for bpf_sockopt-&gt;sk (Martin Lau)
* dropped bpf_target_off from ctx rewrites (Martin Lau)
* use return code for kernel bypass (Martin Lau &amp; Andrii Nakryiko)

Cc: Andrii Nakryiko &lt;andriin@fb.com&gt;
Cc: Martin Lau &lt;kafai@fb.com&gt;
Signed-off-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: cgroup inet skb programs can return 0 to 3</title>
<updated>2019-05-31T23:41:29Z</updated>
<author>
<name>brakmo</name>
<email>brakmo@fb.com</email>
</author>
<published>2019-05-28T23:59:36Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=5cf1e91456301f8c4f6bbc63ff76cff12f92f31b'/>
<id>urn:sha1:5cf1e91456301f8c4f6bbc63ff76cff12f92f31b</id>
<content type='text'>
Allows cgroup inet skb programs to return values in the range [0, 3].
The second bit is used to determine if congestion occurred and the
higher-level protocol should decrease its rate, e.g. TCP would call
tcp_enter_cwr().

The bpf_prog must set expected_attach_type to BPF_CGROUP_INET_EGRESS
at load time if it uses the new return values (i.e. 2 or 3).

The expected_attach_type is currently not enforced for
BPF_PROG_TYPE_CGROUP_SKB, meaning a bpf_prog with expected_attach_type
set to BPF_CGROUP_INET_EGRESS can currently attach to
BPF_CGROUP_INET_INGRESS. Blindly enforcing expected_attach_type would
break backward compatibility.

This patch adds an enforce_expected_attach_type bit that enforces
the expected_attach_type only when the program uses the new
return values.
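
A hedged sketch of an egress program using the new return values (the
is_congested() policy check is hypothetical; the program must be loaded
with expected_attach_type = BPF_CGROUP_INET_EGRESS):

  SEC("cgroup_skb/egress")
  int egress(struct __sk_buff *skb)
  {
          if (is_congested(skb))  /* hypothetical policy check */
                  return 3;       /* allow packet, signal congestion */
          return 1;               /* allow packet */
  }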

Signed-off-by: Lawrence Brakmo &lt;brakmo@fb.com&gt;
Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: verifier: insert zero extension according to analysis result</title>
<updated>2019-05-25T01:58:37Z</updated>
<author>
<name>Jiong Wang</name>
<email>jiong.wang@netronome.com</email>
</author>
<published>2019-05-24T22:25:15Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=a4b1d3c1ddf6cb441187b6c130a473c16a05a356'/>
<id>urn:sha1:a4b1d3c1ddf6cb441187b6c130a473c16a05a356</id>
<content type='text'>
After previous patches, the verifier will mark an insn if it really needs
zero extension on dst_reg.

It is then up to back-ends to decide how to use this information to
eliminate unnecessary zero extension code-gen during JIT compilation.

One approach is for the verifier to insert explicit zero extension for
those insns that need it, in a generic way; JIT back-ends then do not
generate zero extension for sub-register writes by default.

However, only back-ends which do not have hardware zero extension want
this optimization. Back-ends like x86_64 and AArch64 have hardware zero
extension support, so for them the insertion should be disabled.

This patch introduces a new target hook "bpf_jit_needs_zext" which
returns false by default, meaning verifier zero extension insertion is
disabled by default. A back-end can override this hook to return true if
it doesn't have hardware support and wants the verifier to insert zero
extension explicitly.

Offload targets do not use this native target hook; instead, they can
get the optimization results using bpf_prog_offload_ops.finalize.

NOTE: arches can have diverse features; it is possible for one arch
to have hardware zero extension support for some sub-register write insns
but not for all. For example, PowerPC and SPARC have zero-extended loads,
but not for alu32. So when verifier zero extension insertion is enabled,
these JIT back-ends need to peephole insns to remove the zero extensions
inserted for insns that actually do have hardware zero extension support.
The peephole can be as simple as looking at the next insn: if it is the
special zero extension insn, then it is safe to eliminate it when the
current insn has hardware zero extension support.
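
A minimal sketch of such a peephole (hw_zext() stands in for a
hypothetical per-arch predicate; insn_is_zext is the helper introduced
earlier in this set):

  /* while scanning insns during JIT code-gen: skip an inserted zext
   * if the current insn already zero-extends in hardware */
  if (insn_is_zext(&amp;insn[1]) &amp;&amp; hw_zext(&amp;insn[0]))
          i++;    /* emit nothing for insn[1] */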

Reviewed-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Signed-off-by: Jiong Wang &lt;jiong.wang@netronome.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: introduce new mov32 variant for doing explicit zero extension</title>
<updated>2019-05-25T01:58:37Z</updated>
<author>
<name>Jiong Wang</name>
<email>jiong.wang@netronome.com</email>
</author>
<published>2019-05-24T22:25:14Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=7d134041a89610ae552501fc88652805addcdee4'/>
<id>urn:sha1:7d134041a89610ae552501fc88652805addcdee4</id>
<content type='text'>
The encoding for this new variant is based on the BPF_X format. The "imm"
field could previously only be 0; now it can also be 1, which means doing
zero extension unconditionally:

  .code = BPF_ALU | BPF_MOV | BPF_X
  .dst_reg = DST
  .src_reg = SRC
  .imm  = 1

We use this new form for doing zero extension, for which the verifier
will guarantee SRC == DST.

Implications on JIT back-ends when doing code-gen for
BPF_ALU | BPF_MOV | BPF_X:
  1. No change if the hardware already does zero extension unconditionally
     for sub-register writes.
  2. Otherwise, when seeing imm == 1, just generate insns to clear the
     high 32 bits. There is no need to generate insns for the move,
     because when imm == 1, dst_reg is the same as src_reg.

The interpreter doesn't need to change either; it already does
unconditional zero extension for mov32.

One helper macro, BPF_ZEXT_REG, is added to help create zero extension
insns using this new mov32 variant.

One helper function, insn_is_zext, is added for checking whether an insn
is a zero extension on dst. It will be widely used by a few JIT back-ends
in later patches in this set.
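
Their shape is roughly the following (a sketch; see
include/linux/filter.h for the authoritative definitions):

  /* special form of mov32, used for explicit zero extension on dst */
  #define BPF_ZEXT_REG(DST)                                \
          ((struct bpf_insn) {                             \
                  .code  = BPF_ALU | BPF_MOV | BPF_X,      \
                  .dst_reg = DST,                          \
                  .src_reg = DST,                          \
                  .off   = 0,                              \
                  .imm   = 1 })

  static inline bool insn_is_zext(const struct bpf_insn *insn)
  {
          return insn-&gt;code == (BPF_ALU | BPF_MOV | BPF_X) &amp;&amp;
                 insn-&gt;imm == 1;
  }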

Signed-off-by: Jiong Wang &lt;jiong.wang@netronome.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
</feed>
