summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-07-20ipv4: initialize fib_trie prior to register_netdev_notifier call.Mahesh Bandewar
Net stack initialization currently initializes fib-trie after the first call to netdevice_notifier() call. In fact fib_trie initialization needs to happen before first rtnl_register(). It does not cause any problem since there are no devices UP at this moment, but trying to bring 'lo' UP at initialization would make this assumption wrong and exposes the issue. Fixes following crash Call Trace: ? alternate_node_alloc+0x76/0xa0 fib_table_insert+0x1b7/0x4b0 fib_magic.isra.17+0xea/0x120 fib_add_ifaddr+0x7b/0x190 fib_netdev_event+0xc0/0x130 register_netdevice_notifier+0x1c1/0x1d0 ip_fib_init+0x72/0x85 ip_rt_init+0x187/0x1e9 ip_init+0xe/0x1a inet_init+0x171/0x26c ? ipv4_offload_init+0x66/0x66 do_one_initcall+0x43/0x160 kernel_init_freeable+0x191/0x219 ? rest_init+0x80/0x80 kernel_init+0xe/0x150 ret_from_fork+0x22/0x30 Code: f6 46 23 04 74 86 4c 89 f7 e8 ae 45 01 00 49 89 c7 4d 85 ff 0f 85 7b ff ff ff 31 db eb 08 4c 89 ff e8 16 47 01 00 48 8b 44 24 38 <45> 8b 6e 14 4d 63 76 74 48 89 04 24 0f 1f 44 00 00 48 83 c4 08 RIP: kmem_cache_alloc+0xcf/0x1c0 RSP: ffff9b1500017c28 CR2: 0000000000000014 Fixes: 7b1a74fdbb9e ("[NETNS]: Refactor fib initialization so it can handle multiple namespaces.") Fixes: 7f9b80529b8a ("[IPV4]: fib hash|trie initialization") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-20rtnetlink: allocate more memory for dev_set_mac_address()WANG Cong
virtnet_set_mac_address() interprets mac address as struct sockaddr, but upper layer only allocates dev->addr_len which is ETH_ALEN + sizeof(sa_family_t) in this case. We lack a unified definition for mac address, so just fix the upper layer, this also allows drivers to interpret it to struct sockaddr freely. Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-20net: dsa: b53: Add missing ARL entries for BCM53125Florian Fainelli
The BCM53125 entry was missing an arl_entries member which would basically prevent the ARL search from terminating properly. This switch has 4 ARL entries, so add that. Fixes: 1da6df85c6fb ("net: dsa: b53: Implement ARL add/del/dump operations") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-20Merge branch 'BPF-map-value-adjust-fix'David S. Miller
Daniel Borkmann says: ==================== BPF map value adjust fix First patch in the series is the actual fix and the remaining patches are just updates to selftests. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-20bpf: more tests for mixed signed and unsigned bounds checksDaniel Borkmann
Add a couple of more test cases to BPF selftests that are related to mixed signed and unsigned checks. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-20bpf: add test for mixed signed and unsigned bounds checksEdward Cree
These failed due to a bug in verifier bounds handling. Signed-off-by: Edward Cree <ecree@solarflare.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-20bpf: fix up test cases with mixed signed/unsigned boundsDaniel Borkmann
Fix the few existing test cases that used mixed signed/unsigned bounds and switch them only to one flavor. Reason why we need this is that proper boundaries cannot be derived from mixed tests. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-20bpf: allow to specify log level and reduce it for test_verifierDaniel Borkmann
For the test_verifier case, it's quite hard to parse log level 2 to figure out what's causing an issue when used to log level 1. We do want to use bpf_verify_program() in order to simulate some of the tests with strict alignment. So just add an argument to pass the level and put it to 1 for test_verifier. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-20bpf: fix mixed signed/unsigned derived min/max value boundsDaniel Borkmann
Edward reported that there's an issue in min/max value bounds tracking when signed and unsigned compares both provide hints on limits when having unknown variables. E.g. a program such as the following should have been rejected: 0: (7a) *(u64 *)(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (18) r1 = 0xffff8a94cda93400 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+7 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp 7: (7a) *(u64 *)(r10 -16) = -8 8: (79) r1 = *(u64 *)(r10 -16) 9: (b7) r2 = -1 10: (2d) if r1 > r2 goto pc+3 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0 R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 11: (65) if r1 s> 0x1 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=1 R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 12: (0f) r0 += r1 13: (72) *(u8 *)(r0 +0) = 0 R0=map_value_adj(ks=8,vs=8,id=0),min_value=0,max_value=1 R1=inv,min_value=0,max_value=1 R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 14: (b7) r0 = 0 15: (95) exit What happens is that in the first part ... 8: (79) r1 = *(u64 *)(r10 -16) 9: (b7) r2 = -1 10: (2d) if r1 > r2 goto pc+3 ... r1 carries an unsigned value, and is compared as unsigned against a register carrying an immediate. Verifier deduces in reg_set_min_max() that since the compare is unsigned and operation is greater than (>), that in the fall-through/false case, r1's minimum bound must be 0 and maximum bound must be r2. Latter is larger than the bound and thus max value is reset back to being 'invalid' aka BPF_REGISTER_MAX_RANGE. Thus, r1 state is now 'R1=inv,min_value=0'. The subsequent test ... 11: (65) if r1 s> 0x1 goto pc+2 ... is a signed compare of r1 with immediate value 1. Here, verifier deduces in reg_set_min_max() that since the compare is signed this time and operation is greater than (>), that in the fall-through/false case, we can deduce that r1's maximum bound must be 1, meaning with prior test, we result in r1 having the following state: R1=inv,min_value=0,max_value=1. Given that the actual value this holds is -8, the bounds are wrongly deduced. When this is being added to r0 which holds the map_value(_adj) type, then subsequent store access in above case will go through check_mem_access() which invokes check_map_access_adj(), that will then probe whether the map memory is in bounds based on the min_value and max_value as well as access size since the actual unknown value is min_value <= x <= max_value; commit fce366a9dd0d ("bpf, verifier: fix alu ops against map_value{, _adj} register types") provides some more explanation on the semantics. It's worth to note in this context that in the current code, min_value and max_value tracking are used for two things, i) dynamic map value access via check_map_access_adj() and since commit 06c1c049721a ("bpf: allow helpers access to variable memory") ii) also enforced at check_helper_mem_access() when passing a memory address (pointer to packet, map value, stack) and length pair to a helper and the length in this case is an unknown value defining an access range through min_value/max_value in that case. The min_value/max_value tracking is /not/ used in the direct packet access case to track ranges. However, the issue also affects case ii), for example, the following crafted program based on the same principle must be rejected as well: 0: (b7) r2 = 0 1: (bf) r3 = r10 2: (07) r3 += -512 3: (7a) *(u64 *)(r10 -16) = -8 4: (79) r4 = *(u64 *)(r10 -16) 5: (b7) r6 = -1 6: (2d) if r4 > r6 goto pc+5 R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512 R4=inv,min_value=0 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 7: (65) if r4 s> 0x1 goto pc+4 R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512 R4=inv,min_value=0,max_value=1 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 8: (07) r4 += 1 9: (b7) r5 = 0 10: (6a) *(u16 *)(r10 -512) = 0 11: (85) call bpf_skb_load_bytes#26 12: (b7) r0 = 0 13: (95) exit Meaning, while we initialize the max_value stack slot that the verifier thinks we access in the [1,2] range, in reality we pass -7 as length which is interpreted as u32 in the helper. Thus, this issue is relevant also for the case of helper ranges. Resetting both bounds in check_reg_overflow() in case only one of them exceeds limits is also not enough as similar test can be created that uses values which are within range, thus also here learned min value in r1 is incorrect when mixed with later signed test to create a range: 0: (7a) *(u64 *)(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (18) r1 = 0xffff880ad081fa00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+7 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp 7: (7a) *(u64 *)(r10 -16) = -8 8: (79) r1 = *(u64 *)(r10 -16) 9: (b7) r2 = 2 10: (3d) if r2 >= r1 goto pc+3 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp 11: (65) if r1 s> 0x4 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp 12: (0f) r0 += r1 13: (72) *(u8 *)(r0 +0) = 0 R0=map_value_adj(ks=8,vs=8,id=0),min_value=3,max_value=4 R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp 14: (b7) r0 = 0 15: (95) exit This leaves us with two options for fixing this: i) to invalidate all prior learned information once we switch signed context, ii) to track min/max signed and unsigned boundaries separately as done in [0]. (Given latter introduces major changes throughout the whole verifier, it's rather net-next material, thus this patch follows option i), meaning we can derive bounds either from only signed tests or only unsigned tests.) There is still the case of adjust_reg_min_max_vals(), where we adjust bounds on ALU operations, meaning programs like the following where boundaries on the reg get mixed in context later on when bounds are merged on the dst reg must get rejected, too: 0: (7a) *(u64 *)(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (18) r1 = 0xffff89b2bf87ce00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+6 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp 7: (7a) *(u64 *)(r10 -16) = -8 8: (79) r1 = *(u64 *)(r10 -16) 9: (b7) r2 = 2 10: (3d) if r2 >= r1 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp 11: (b7) r7 = 1 12: (65) if r7 s> 0x0 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,max_value=0 R10=fp 13: (b7) r0 = 0 14: (95) exit from 12 to 15: R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,min_value=1 R10=fp 15: (0f) r7 += r1 16: (65) if r7 s> 0x4 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp 17: (0f) r0 += r7 18: (72) *(u8 *)(r0 +0) = 0 R0=map_value_adj(ks=8,vs=8,id=0),min_value=4,max_value=4 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp 19: (b7) r0 = 0 20: (95) exit Meaning, in adjust_reg_min_max_vals() we must also reset range values on the dst when src/dst registers have mixed signed/ unsigned derived min/max value bounds with one unbounded value as otherwise they can be added together deducing false boundaries. Once both boundaries are established from either ALU ops or compare operations w/o mixing signed/unsigned insns, then they can safely be added to other regs also having both boundaries established. Adding regs with one unbounded side to a map value where the bounded side has been learned w/o mixing ops is possible, but the resulting map value won't recover from that, meaning such op is considered invalid on the time of actual access. Invalid bounds are set on the dst reg in case i) src reg, or ii) in case dst reg already had them. The only way to recover would be to perform i) ALU ops but only 'add' is allowed on map value types or ii) comparisons, but these are disallowed on pointers in case they span a range. This is fine as only BPF_JEQ and BPF_JNE may be performed on PTR_TO_MAP_VALUE_OR_NULL registers which potentially turn them into PTR_TO_MAP_VALUE type depending on the branch, so only here min/max value cannot be invalidated for them. In terms of state pruning, value_from_signed is considered as well in states_equal() when dealing with adjusted map values. With regards to breaking existing programs, there is a small risk, but use-cases are rather quite narrow where this could occur and mixing compares probably unlikely. Joint work with Josef and Edward. [0] https://lists.iovisor.org/pipermail/iovisor-dev/2017-June/000822.html Fixes: 484611357c19 ("bpf: allow access into map value arrays") Reported-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-20Merge tag 'pm-4.13-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These are two stable-candidate fixes for the intel_pstate driver and the generic power domains (genpd) framework. Specifics: - Fix the average CPU load computations in the intel_pstate driver on Knights Landing (Xeon Phi) processors that require an extra factor to compensate for a rate change differences between the TSC and MPERF which is missing (Srinivas Pandruvada). - Fix an initialization ordering issue in the generic power domains (genpd) framework (Sudeep Holla)" * tag 'pm-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present cpufreq: intel_pstate: Correct the busy calculation for KNL
2017-07-20x86: mark kprobe templates as character arrays, not single charactersLinus Torvalds
They really are, and the "take the address of a single character" makes the string fortification code unhappy (it believes that you can now only acccess one byte, rather than a byte range, and then raises errors for the memory copies going on in there). We could now remove a few 'addressof' operators (since arrays naturally degrade to pointers), but this is the minimal patch that just changes the C prototypes of those template arrays (the templates themselves are defined in inline asm). Reported-by: kernel test robot <xiaolong.ye@intel.com> Acked-and-tested-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Daniel Micay <danielmicay@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-20Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull misc filesystem fixes from Jan Kara: "Several ACL related fixes for ext2, reiserfs, and hfsplus. And also one minor isofs cleanup" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: hfsplus: Don't clear SGID when inheriting ACLs isofs: Fix off-by-one in 'session' mount option parsing reiserfs: preserve i_mode if __reiserfs_set_acl() fails ext2: preserve i_mode if ext2_set_acl() fails ext2: Don't clear SGID when inheriting ACLs reiserfs: Don't clear SGID when inheriting ACLs
2017-07-20Merge tag 'for-f2fs-v4.13-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "We've filed some bug fixes: - missing f2fs case in terms of stale SGID bit, introduced by Jan - build error for seq_file.h - avoid cpu lockup - wrong inode_unlock in error case" * tag 'for-f2fs-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: avoid cpu lockup f2fs: include seq_file.h for sysfs.c f2fs: Don't clear SGID when inheriting ACLs f2fs: remove extra inode_unlock() in error path
2017-07-20Merge branch 'stable-4.13' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds
Pull audit fix from Paul Moore: "A small audit fix, just a single line, to plug a memory leak in some audit error handling code" * 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit: audit: fix memleak in auditd_send_unicast_skb.
2017-07-20Merge tag 'libnvdimm-fixes-4.13-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "A handful of small fixes for 4.13-rc2. Three of these fixes are tagged for -stable. They have all appeared in at least one -next release with no reported issues - Fix handling of media errors that span a sector - Fix support of multiple namespaces in a libnvdimm region being in device-dax mode - Clean up the machine check notifier properly when the nfit driver fails to register - Address a static analysis (smatch) report in device-dax" * tag 'libnvdimm-fixes-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: device-dax: fix sysfs duplicate warnings MAINTAINERS: list drivers/acpi/nfit/ files for libnvdimm sub-system acpi/nfit: Fix memory corruption/Unregister mce decoder on failure device-dax: fix 'passing zero to ERR_PTR()' warning libnvdimm: fix badblock range handling of ARS range
2017-07-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID fixes from Jiri Kosina: - HID multitouch 4.12 regression fix from Dmitry Torokhov - error handling fix for HID++ driver from Gustavo A. R. Silva * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: hid-logitech-hidpp: add NULL check on devm_kmemdup() return value HID: multitouch: do not blindly set EV_KEY or EV_ABS bits
2017-07-20Merge branches 'intel_pstate' and 'pm-domains'Rafael J. Wysocki
* intel_pstate: cpufreq: intel_pstate: Correct the busy calculation for KNL * pm-domains: PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present
2017-07-20RDMA/core: Initialize port_num in qp_attrIsmail, Mustafa
Initialize the port_num for iWARP in rdma_init_qp_attr. Fixes: 5ecce4c9b17b("Check port number supplied by user verbs cmds") Cc: <stable@vger.kernel.org> # v2.6.14+ Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20RDMA/uverbs: Fix the check for port numberIsmail, Mustafa
The port number is only valid if IB_QP_PORT is set in the mask. So only check port number if it is valid to prevent modify_qp from failing due to an invalid port number. Fixes: 5ecce4c9b17b("Check port number supplied by user verbs cmds") Cc: <stable@vger.kernel.org> # v2.6.14+ Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20IB/cma: Fix reference count leak when no ipv4 addresses are setKalderon, Michal
Once in_dev_get is called to receive in_device pointer, the in_device reference counter is increased, but if there are no ipv4 addresses configured on the net-device the ifa_list will be null, resulting in a flow that doesn't call in_dev_put to decrease the ref_cnt. This was exposed when running RoCE over ipv6 without any ipv4 addresses configured Fixes: commit 8e3867310c90 ("IB/cma: Fix a race condition in iboe_addr_get_sgid()") Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20RDMA/iser: don't send an rkey if all data is written as immadiate-dataSagi Grimberg
We might get some bogus error completions in case the target will remotely invalidate the rkey and the HCA will need to retransmit from this buffer. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20rxe: fix broken receive queue drainingVijay Immanuel
If we modified the qp to ERROR state, and drained the recieve queue, post_recv must trigger the responder task to complete the drain work request. Cc: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>-- Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20RDMA/qedr: Prevent memory overrun in verbs' user responsesAmrani, Ram
Wrap ib_copy_to_udata with a function that ensures that the data being copied over to user space isn't longer than the allowed. Fixes: cecbcddf6461 ("qedr: Add support for QP verbs") Fixes: a7efd7773e31 ("qedr: Add support for PD,PKEY and CQ verbs") Fixes: ac1b36e55a51 ("qedr: Add support for user context verbs") Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20iw_cxgb4: don't use WR keys/addrs for 0 byte readsGanesh Goudar
Only use the read sge lkey/addr and the remote rkey/addr if the length of the read is not zero. Otherwise the read response might be treated as the RTR read response and not delivered to the application. Or worse Terminator hardware will fail a 0B read if the STAG is 0 even if the read length is 0. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20IB/mlx4: Fix CM REQ retries in paravirt modeHåkon Bugge
CM REQs cannot be successfully retried, because a new pv_cm_id is created for each request, without checking if one already exists. By checking if an id exists before creating one, the bug is fixed. This bug can be provoked by running an RDMA CM user-land application, but inserting a five seconds delay before the rdma_accept() call on the passive side. This delay is larger than the default CMA timeout, and triggers a retry from the active side. The retried REQ will use another pv_cm_id (the cm_id on the wire). This confuses the CM protocol and two REJs are sent from the passive side. Here is an excerpt from ibdump running without the patch: 3.285092 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello) 7.382711 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello) 7.382861 LID: 4 -> LID: 4 InfiniBand 290 CM: ConnectReject 7.387644 LID: 4 -> LID: 4 InfiniBand 290 CM: ConnectReject and here is the same with bug fix applied: 3.251010 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello) 7.349387 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello) 8.258443 LID: 4 -> LID: 4 SDP 290 CM: ConnectReply(SDP Hello) 8.259890 LID: 4 -> LID: 4 InfiniBand 290 CM: ReadyToUse Suggested-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reported-by: Wei Lin Guay <wei.lin.guay@oracle.com> Tested-by: Wei Lin Guay <wei.lin.guay@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20IB/rdmavt: Setting of QP timeout can overflow jiffies computationKaike Wan
Current computation of qp->timeout_jiffies in rvt_modify_qp() will cause overflow due to the fact that the input to the function usecs_to_jiffies is only 32-bit ( unsigned int). Overflow will occur when attr->timeout is equal to or greater than 30. The consequence is unnecessarily excessive retry and thus degradation of the system performance. This patch fixes the problem by limiting the input to 5-bit and calling usecs_to_jiffies() before multiplying the scaling factor. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20IB/core: Fix sparse warningsMatan Barak
Delete unused variables to prevent sparse warnings. Fixes: db1b5ddd5336 ("IB/core: Rename uverbs event file structure") Fixes: fd3c7904db6e ("IB/core: Change idr objects to use the new schema") Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20RDMA/bnxt_re: Fix the value reported for local ack delaySelvin Xavier
Local ack delay exposed by the driver is 0 which means infinite QP timeout. Reporting the default value to 16 (approx 260ms) Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20RDMA/bnxt_re: Report MISSED_EVENTS in req_notify_cqSelvin Xavier
While invoking the req_notify_cq hook, ULPs can request whether the CQs have any CQEs pending. If CQEs are pending, drivers can indicate it by returning 1 for req_notify_cq. The stack will poll CQ again till CQ is empty. This patch peeks the CQ for any valid entries and return accordingly. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20RDMA/bnxt_re: Fix return value of poll routineDevesh Sharma
Fix the incorrect reporting of number of polled entries by taking into account the max CQ depth in the driver. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20RDMA/bnxt_re: Enable atomics only if host bios supportsDevesh Sharma
Driver shall check if the host system bios has enabled Atomic operations capability in PCI Device Control 2 register of the pci-device. Expose the ATOMIC_HCA flag only if the Atomic operations capability is set. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20RDMA/bnxt_re: Specify RDMA component when allocating stats contextSomnath Kotur
Starting FW version 20.6.47, firmware is keeping separate statistics for L2 and RDMA. However, driver needs to specify RDMA or not when allocating stat_ctx. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20RDMA/bnxt_re: Fixed the max_rd_atomic support for initiator and destination QPEddie Wai
There's a couple of bugs in the support of max_rd_atomic and max_dest_rd_atomic. In the modify_qp, if the requested max_rd_atomic, which is the ORRQ size, is greater than what the chip can support, then we have to cap the request to chip max as we can't have the HW overflow the ORRQ. Capping the max_rd_atomic support internally is okay to do as the remaining read/atomic WRs will still be sitting in the SQ. However, for the max_dest_rd_atomic, the driver has to error out as this dictates the IRRQ size and we can't control what the remote side sends. Signed-off-by: Eddie Wai <eddie.wai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20RDMA/bnxt_re: Report supported value to IB stack in query_deviceSelvin Xavier
- Report supported value for max_mr_size to IB stack in query_device. Also, check and log if MR size requested by application in reg_user_mr() is greater than value currently supported by driver. - Report only 4K page size support for now - Fix Max_QP value returned by ibv_devinfo -vv. In case of PF, FW reserves 129 QPs for creating QP1s of VFs and PF. So the max_qp value reported by FW for PF doesn'tt include the QP1. Fixing this issue by adding 1 with the value reported by FW. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20RDMA/bnxt_re: Do not free the ctx_tbl entry if delete GID failsSelvin Xavier
This fix is added only to avoid system crash in some a specific scenario. When bnxt_re driver is loaded and if user tries to change interface mac address, delete GID fails because QP1 is still associated with existing MAC (default GID). If the above command fails GID tables are not modified in the h/w or driver, but the GID context memory is freed. Now, if the user changes the mac back to the original value, another add_gid comes to the driver where the driver reports that the GID is already present in its table and tries to access the context which was already freed. So, in this case, in order to avoid NULL pointer de-reference, this patch removes the context memory free if delete_gid fails and the same context memory is re-used in new add_gid. Memory cleanup will be taken care during driver unload, while deleting the GID table. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20RDMA/bnxt_re: Fix WQE Size posted to HW to prevent it from throwing errorSomnath Kotur
Posting WQE size of 2 results in a WQE_FORMAT_ERROR thrown by the HW as it requires host to supply WQE Size with room for atleast one SGE so that the resulting WQE size be atleast 3. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20RDMA/bnxt_re: Free doorbell page index (DPI) during dealloc ucontextDevesh Sharma
The driver must free the DPI during the dealloc_ucontext instead of freeing it during dealloc_pd. However, the DPI allocation scheme remains unchanged. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20IB/mlx5: Fix a warning messageDan Carpenter
"umem" is a valid pointer. We intended to print "*umem" or even just "err" instead. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20RDMA/ocrdma: Fix error codes in ocrdma_create_srq()Dan Carpenter
If either of these allocations fail then we return ERR_PTR(0). That's equivalent to NULL and results in a NULL pointer dereference in the caller. Fixes: fe2caefcdf58 ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20RDMA/ocrdma: Fix an error code in ocrdma_alloc_pd()Dan Carpenter
We should preserve the original "status" error code instead of resetting it to zero. Returning ERR_PTR(0) is the same as NULL and results in a NULL dereference in the callers. I added a printk() on error instead. Fixes: 45e86b33ec8b ("RDMA/ocrdma: Cache recv DB until QP moved to RTR") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20IB/cxgb3: Fix error codes in iwch_alloc_mr()Dan Carpenter
We accidentally don't set the error code on some error paths. It means return ERR_PTR(0) which is NULL and results in a NULL dereference in the caller. Fixes: 13a239330abd ("RDMA/cxgb3: Don't ignore insert_handle() failures") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20cxgb4: Fix error codes in c4iw_create_cq()Dan Carpenter
If one of these kmalloc() calls fails then we return ERR_PTR(0) which is NULL. It results in a NULL dereference in the callers. Fixes: cfdda9d76436 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20IB/i40iw: Fix error code in i40iw_create_cq()Dan Carpenter
We accidentally forgot to set the error code if ib_copy_from_udata() fails. It means we return ERR_PTR(0) which is NULL and results in a NULL dereference in the callers. Fixes: d37498417947 ("i40iw: add files for iwarp interface") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20IB/IPoIB: Fix error code in ipoib_add_port()Dan Carpenter
We accidentally don't see the error code on some of these error paths. It means we return ERR_PTR(0) which is NULL and it results in a NULL dereference in the caller. This bug dates to pre-git days. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20RDMA/bnxt_re: checking for NULL instead of IS_ERR()Dan Carpenter
bnxt_re_alloc_mw() doesn't return NULL, it returns error pointers. Fixes: 9152e0b722b2 ("RDMA/bnxt_re: HW workarounds for handling specific conditions") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20i40iw: Free QP PBLEs when the QP is destroyedTatyana Nikolova
If the physical buffer list entries (PBLEs) of a QP are freed up at i40iw_dereg_mr, they can be assigned to a newly created QP before the previous QP is destroyed. Fix this by freeing PBLEs only when the QP is destroyed. Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20i40iw: Avoid memory leak of CQP request objectsShiraz Saleem
Control Queue Pair (CQP) request objects, which have not received a completion upon interface close, remain in memory. To fix this, identify and free all pending CQP request objects during destroy CQP OP. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20i40iw: Update list correctlyHenry Orosco
To avoid infinite loop, in i40iw_ieq_handle_exception, update plist inside while loop. Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20i40iw: Add missing memory barrierHenry Orosco
Add missing write memory barrier before writing the header containing valid bit to the WQE in i40iw_puda_send. Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20i40iw: Free QP resources on CQP destroy QP failureShiraz Saleem
Current flow leaves software QP structures in memory if Control Queue Pair (CQP) destroy QP OP fails. To fix this, free QP resources on fail of CQP destroy QP OP. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>