summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-11-30KVM: SEV: Do COPY_ENC_CONTEXT_FROM with both VMs lockedPaolo Bonzini
Now that we have a facility to lock two VMs with deadlock protection, use it for the creation of mirror VMs as well. One of COPY_ENC_CONTEXT_FROM(dst, src) and COPY_ENC_CONTEXT_FROM(src, dst) would always fail, so the combination is nonsensical and it is okay to return -EBUSY if it is attempted. This sidesteps the question of what happens if a VM is MOVE_ENC_CONTEXT_FROM'd at the same time as it is COPY_ENC_CONTEXT_FROM'd: the locking prevents that from happening. Cc: Peter Gonda <pgonda@google.com> Cc: Sean Christopherson <seanjc@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211123005036.2954379-10-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30selftests: sev_migrate_tests: add tests for KVM_CAP_VM_COPY_ENC_CONTEXT_FROMPaolo Bonzini
I am putting the tests in sev_migrate_tests because the failure conditions are very similar and some of the setup code can be reused, too. The tests cover both successful creation of a mirror VM, and error conditions. Cc: Peter Gonda <pgonda@google.com> Cc: Sean Christopherson <seanjc@google.com> Message-Id: <20211123005036.2954379-9-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30KVM: SEV: move mirror status to destination of KVM_CAP_VM_MOVE_ENC_CONTEXT_FROMPaolo Bonzini
Allow intra-host migration of a mirror VM; the destination VM will be a mirror of the same ASID as the source. Fixes: b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration") Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211123005036.2954379-8-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30KVM: SEV: initialize regions_list of a mirror VMPaolo Bonzini
This was broken before the introduction of KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM, but technically harmless because the region list was unused for a mirror VM. However, it is untidy and it now causes a NULL pointer access when attempting to move the encryption context of a mirror VM. Fixes: 54526d1fd593 ("KVM: x86: Support KVM VMs sharing SEV context") Message-Id: <20211123005036.2954379-7-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30KVM: SEV: cleanup locking for KVM_CAP_VM_MOVE_ENC_CONTEXT_FROMPaolo Bonzini
Encapsulate the handling of the migration_in_progress flag for both VMs in two functions sev_lock_two_vms and sev_unlock_two_vms. It does not matter if KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM locks the destination struct kvm a bit later, and this change 1) keeps the cleanup chain of labels smaller 2) makes it possible for KVM_CAP_VM_COPY_ENC_CONTEXT_FROM to reuse the logic. Cc: Peter Gonda <pgonda@google.com> Cc: Sean Christopherson <seanjc@google.com> Message-Id: <20211123005036.2954379-6-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30KVM: SEV: do not use list_replace_init on an empty listPaolo Bonzini
list_replace_init cannot be used if the source is an empty list, because "new->next->prev = new" will overwrite "old->next": new old prev = new, next = new prev = old, next = old new->next = old->next prev = new, next = old prev = old, next = old new->next->prev = new prev = new, next = old prev = old, next = new new->prev = old->prev prev = old, next = old prev = old, next = old new->next->prev = new prev = old, next = old prev = new, next = new The desired outcome instead would be to leave both old and new the same as they were (two empty circular lists). Use list_cut_before, which already has the necessary check and is documented to discard the previous contents of the list that will hold the result. Fixes: b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration") Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211123005036.2954379-5-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30KVM: x86: Use a stable condition around all VT-d PI pathsPaolo Bonzini
Currently, checks for whether VT-d PI can be used refer to the current status of the feature in the current vCPU; or they more or less pick vCPU 0 in case a specific vCPU is not available. However, these checks do not attempt to synchronize with changes to the IRTE. In particular, there is no path that updates the IRTE when APICv is re-activated on vCPU 0; and there is no path to wakeup a CPU that has APICv disabled, if the wakeup occurs because of an IRTE that points to a posted interrupt. To fix this, always go through the VT-d PI path as long as there are assigned devices and APICv is available on both the host and the VM side. Since the relevant condition was copied over three times, take the hint and factor it into a separate function. Suggested-by: Sean Christopherson <seanjc@google.com> Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: David Matlack <dmatlack@google.com> Message-Id: <20211123004311.2954158-5-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30KVM: x86: check PIR even for vCPUs with disabled APICvPaolo Bonzini
The IRTE for an assigned device can trigger a POSTED_INTR_VECTOR even if APICv is disabled on the vCPU that receives it. In that case, the interrupt will just cause a vmexit and leave the ON bit set together with the PIR bit corresponding to the interrupt. Right now, the interrupt would not be delivered until APICv is re-enabled. However, fixing this is just a matter of always doing the PIR->IRR synchronization, even if the vCPU has temporarily disabled APICv. This is not a problem for performance, or if anything it is an improvement. First, in the common case where vcpu->arch.apicv_active is true, one fewer check has to be performed. Second, static_call_cond will elide the function call if APICv is not present or disabled. Finally, in the case for AMD hardware we can remove the sync_pir_to_irr callback: it is only needed for apic_has_interrupt_for_ppr, and that function already has a fallback for !APICv. Cc: stable@vger.kernel.org Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: David Matlack <dmatlack@google.com> Message-Id: <20211123004311.2954158-4-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30KVM: VMX: prepare sync_pir_to_irr for running with APICv disabledPaolo Bonzini
If APICv is disabled for this vCPU, assigned devices may still attempt to post interrupts. In that case, we need to cancel the vmentry and deliver the interrupt with KVM_REQ_EVENT. Extend the existing code that handles injection of L1 interrupts into L2 to cover this case as well. vmx_hwapic_irr_update is only called when APICv is active so it would be confusing to add a check for vcpu->arch.apicv_active in there. Instead, just use vmx_set_rvi directly in vmx_sync_pir_to_irr. Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211123004311.2954158-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30KVM: selftests: page_table_test: fix calculation of guest_test_phys_memMaciej S. Szmigiero
A kvm_page_table_test run with its default settings fails on VMX due to memory region add failure: > ==== Test Assertion Failure ==== > lib/kvm_util.c:952: ret == 0 > pid=10538 tid=10538 errno=17 - File exists > 1 0x00000000004057d1: vm_userspace_mem_region_add at kvm_util.c:947 > 2 0x0000000000401ee9: pre_init_before_test at kvm_page_table_test.c:302 > 3 (inlined by) run_test at kvm_page_table_test.c:374 > 4 0x0000000000409754: for_each_guest_mode at guest_modes.c:53 > 5 0x0000000000401860: main at kvm_page_table_test.c:500 > 6 0x00007f82ae2d8554: ?? ??:0 > 7 0x0000000000401894: _start at ??:? > KVM_SET_USER_MEMORY_REGION IOCTL failed, > rc: -1 errno: 17 > slot: 1 flags: 0x0 > guest_phys_addr: 0xc0000000 size: 0x40000000 This is because the memory range that this test is trying to add (0x0c0000000 - 0x100000000) conflicts with LAPIC mapping at 0x0fee00000. Looking at the code it seems that guest_test_*phys*_mem variable gets mistakenly overwritten with guest_test_*virt*_mem while trying to adjust the former for alignment. With the correct variable adjusted this test runs successfully. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <52e487458c3172923549bbcf9dfccfbe6faea60b.1637940473.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30KVM: x86/mmu: Handle "default" period when selectively waking kthreadSean Christopherson
Account for the '0' being a default, "let KVM choose" period, when determining whether or not the recovery worker needs to be awakened in response to userspace reducing the period. Failure to do so results in the worker not being awakened properly, e.g. when changing the period from '0' to any small-ish value. Fixes: 4dfe4f40d845 ("kvm: x86: mmu: Make NX huge page recovery period configurable") Cc: stable@vger.kernel.org Cc: Junaid Shahid <junaids@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211120015706.3830341-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30KVM: MMU: shadow nested paging does not have PKUPaolo Bonzini
Initialize the mask for PKU permissions as if CR4.PKE=0, avoiding incorrect interpretations of the nested hypervisor's page tables. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30KVM: x86/mmu: Remove spurious TLB flushes in TDP MMU zap collapsible pathSean Christopherson
Drop the "flush" param and return values to/from the TDP MMU's helper for zapping collapsible SPTEs. Because the helper runs with mmu_lock held for read, not write, it uses tdp_mmu_zap_spte_atomic(), and the atomic zap handles the necessary remote TLB flush. Similarly, because mmu_lock is dropped and re-acquired between zapping legacy MMUs and zapping TDP MMUs, kvm_mmu_zap_collapsible_sptes() must handle remote TLB flushes from the legacy MMU before calling into the TDP MMU. Fixes: e2209710ccc5d ("KVM: x86/mmu: Skip rmap operations if rmaps not allocated") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211120045046.3940942-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30KVM: x86/mmu: Use yield-safe TDP MMU root iter in MMU notifier unmappingSean Christopherson
Use the yield-safe variant of the TDP MMU iterator when handling an unmapping event from the MMU notifier, as most occurences of the event allow yielding. Fixes: e1eed5847b09 ("KVM: x86/mmu: Allow yielding during MMU notifier unmap/zap, if possible") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211120015008.3780032-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-29net: mscc: ocelot: fix missing unlock on error in ocelot_hwstamp_set()Wei Yongjun
Add the missing mutex_unlock before return from function ocelot_hwstamp_set() in the ocelot_setup_ptp_traps() error handling case. Fixes: 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211129151652.1165433-1-weiyongjun1@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29Merge tag 'rxrpc-fixes-20211129' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Leak fixes Here are a couple of fixes for leaks in AF_RXRPC: (1) Fix a leak of rxrpc_peer structs in rxrpc_look_up_bundle(). (2) Fix a leak of rxrpc_local structs in rxrpc_lookup_peer(). * tag 'rxrpc-fixes-20211129' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: rxrpc: Fix rxrpc_local leak in rxrpc_lookup_peer() rxrpc: Fix rxrpc_peer leak in rxrpc_look_up_bundle() ==================== Link: https://lore.kernel.org/r/163820097905.226370.17234085194655347888.stgit@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29Merge branch 'wireguard-siphash-patches-for-5-16-rc6'Jakub Kicinski
Jason A. Donenfeld says: ==================== wireguard/siphash patches for 5.16-rc Here's quite a largeish set of stable patches I've had queued up and testing for a number of months now: - Patch (1) squelches a sparse warning by fixing an annotation. - Patches (2), (3), and (5) are minor improvements and fixes to the test suite. - Patch (4) is part of a tree-wide cleanup to have module-specific init and exit functions. - Patch (6) fixes a an issue with dangling dst references, by having a function to release references immediately rather than deferring, and adds an associated test case to prevent this from regressing. - Patches (7) and (8) help mitigate somewhat a potential DoS on the ingress path due to the use of skb_list's locking hitting contention on multiple cores by switching to using a ring buffer and dropping packets on contention rather than locking up another core spinning. - Patch (9) switches kvzalloc to kvcalloc for better form. - Patch (10) fixes alignment traps in siphash with clang-13 (and maybe other compilers) on armv6, by switching to using the unaligned functions by default instead of the aligned functions by default. ==================== Link: https://lore.kernel.org/r/20211129153929.3457-1-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29siphash: use _unaligned version by defaultArnd Bergmann
On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS because the ordinary load/store instructions (ldr, ldrh, ldrb) can tolerate any misalignment of the memory address. However, load/store double and load/store multiple instructions (ldrd, ldm) may still only be used on memory addresses that are 32-bit aligned, and so we have to use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we may end up with a severe performance hit due to alignment traps that require fixups by the kernel. Testing shows that this currently happens with clang-13 but not gcc-11. In theory, any compiler version can produce this bug or other problems, as we are dealing with undefined behavior in C99 even on architectures that support this in hardware, see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363. Fortunately, the get_unaligned() accessors do the right thing: when building for ARMv6 or later, the compiler will emit unaligned accesses using the ordinary load/store instructions (but avoid the ones that require 32-bit alignment). When building for older ARM, those accessors will emit the appropriate sequence of ldrb/mov/orr instructions. And on architectures that can truly tolerate any kind of misalignment, the get_unaligned() accessors resolve to the leXX_to_cpup accessors that operate on aligned addresses. Since the compiler will in fact emit ldrd or ldm instructions when building this code for ARM v6 or later, the solution is to use the unaligned accessors unconditionally on architectures where this is known to be fast. The _aligned version of the hash function is however still needed to get the best performance on architectures that cannot do any unaligned access in hardware. This new version avoids the undefined behavior and should produce the fastest hash on all architectures we support. Link: https://lore.kernel.org/linux-arm-kernel/20181008211554.5355-4-ard.biesheuvel@linaro.org/ Link: https://lore.kernel.org/linux-crypto/CAK8P3a2KfmmGDbVHULWevB0hv71P2oi2ZCHEAqT=8dQfa0=cqQ@mail.gmail.com/ Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Fixes: 2c956a60778c ("siphash: add cryptographically secure PRF") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29wireguard: ratelimiter: use kvcalloc() instead of kvzalloc()Gustavo A. R. Silva
Use 2-factor argument form kvcalloc() instead of kvzalloc(). Link: https://github.com/KSPP/linux/issues/162 Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> [Jason: Gustavo's link above is for KSPP, but this isn't actually a security fix, as table_size is bounded to 8192 anyway, and gcc realizes this, so the codegen comes out to be about the same.] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29wireguard: receive: drop handshakes if queue lock is contendedJason A. Donenfeld
If we're being delivered packets from multiple CPUs so quickly that the ring lock is contended for CPU tries, then it's safe to assume that the queue is near capacity anyway, so just drop the packet rather than spinning. This helps deal with multicore DoS that can interfere with data path performance. It _still_ does not completely fix the issue, but it again chips away at it. Reported-by: Streun Fabio <fstreun@student.ethz.ch> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29wireguard: receive: use ring buffer for incoming handshakesJason A. Donenfeld
Apparently the spinlock on incoming_handshake's skb_queue is highly contended, and a torrent of handshake or cookie packets can bring the data plane to its knees, simply by virtue of enqueueing the handshake packets to be processed asynchronously. So, we try switching this to a ring buffer to hopefully have less lock contention. This alleviates the problem somewhat, though it still isn't perfect, so future patches will have to improve this further. However, it at least doesn't completely diminish the data plane. Reported-by: Streun Fabio <fstreun@student.ethz.ch> Reported-by: Joel Wanner <joel.wanner@inf.ethz.ch> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29wireguard: device: reset peer src endpoint when netns exitsJason A. Donenfeld
Each peer's endpoint contains a dst_cache entry that takes a reference to another netdev. When the containing namespace exits, we take down the socket and prevent future sockets from being created (by setting creating_net to NULL), which removes that potential reference on the netns. However, it doesn't release references to the netns that a netdev cached in dst_cache might be taking, so the netns still might fail to exit. Since the socket is gimped anyway, we can simply clear all the dst_caches (by way of clearing the endpoint src), which will release all references. However, the current dst_cache_reset function only releases those references lazily. But it turns out that all of our usages of wg_socket_clear_peer_endpoint_src are called from contexts that are not exactly high-speed or bottle-necked. For example, when there's connection difficulty, or when userspace is reconfiguring the interface. And in particular for this patch, when the netns is exiting. So for those cases, it makes more sense to call dst_release immediately. For that, we add a small helper function to dst_cache. This patch also adds a test to netns.sh from Hangbin Liu to ensure this doesn't regress. Tested-by: Hangbin Liu <liuhangbin@gmail.com> Reported-by: Xiumei Mu <xmu@redhat.com> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Paolo Abeni <pabeni@redhat.com> Fixes: 900575aa33a3 ("wireguard: device: avoid circular netns references") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29wireguard: selftests: rename DEBUG_PI_LIST to DEBUG_PLISTLi Zhijian
DEBUG_PI_LIST was renamed to DEBUG_PLIST since 8e18faeac3 ("lib/plist: rename DEBUG_PI_LIST to DEBUG_PLIST"). Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Fixes: 8e18faeac3e4 ("lib/plist: rename DEBUG_PI_LIST to DEBUG_PLIST") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29wireguard: main: rename 'mod_init' & 'mod_exit' functions to be module-specificRandy Dunlap
Rename module_init & module_exit functions that are named "mod_init" and "mod_exit" so that they are unique in both the System.map file and in initcall_debug output instead of showing up as almost anonymous "mod_init". This is helpful for debugging and in determining how long certain module_init calls take to execute. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29wireguard: selftests: actually test for routing loopsJason A. Donenfeld
We previously removed the restriction on looping to self, and then added a test to make sure the kernel didn't blow up during a routing loop. The kernel didn't blow up, thankfully, but on certain architectures where skb fragmentation is easier, such as ppc64, the skbs weren't actually being discarded after a few rounds through. But the test wasn't catching this. So actually test explicitly for massive increases in tx to see if we have a routing loop. Note that the actual loop problem will need to be addressed in a different commit. Fixes: b673e24aad36 ("wireguard: socket: remove errant restriction on looping to self") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29wireguard: selftests: increase default dmesg log sizeJason A. Donenfeld
The selftests currently parse the kernel log at the end to track potential memory leaks. With these tests now reading off the end of the buffer, due to recent optimizations, some creation messages were lost, making the tests think that there was a free without an alloc. Fix this by increasing the kernel log size. Fixes: 24b70eeeb4f4 ("wireguard: use synchronize_net rather than synchronize_rcu") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29wireguard: allowedips: add missing __rcu annotation to satisfy sparseJason A. Donenfeld
A __rcu annotation got lost during refactoring, which caused sparse to become enraged. Fixes: bf7b042dc62a ("wireguard: allowedips: free empty intermediate nodes when removing single node") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29scsi: ufs: ufs-pci: Add support for Intel ADLAdrian Hunter
Add PCI ID and callbacks to support Intel Alder Lake. Link: https://lore.kernel.org/r/20211124204218.1784559-1-adrian.hunter@intel.com Cc: stable@vger.kernel.org # v5.15+ Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-11-29drm/i915: Add support for panels with VESA backlights with PWM enable/disableLyude Paul
This simply adds proper support for panel backlights that can be controlled via VESA's backlight control protocol, but which also require that we enable and disable the backlight via PWM instead of via the DPCD interface. We also enable this by default, in order to fix some people's backlights that were broken by not having this enabled. For reference, backlights that require this and use VESA's backlight interface tend to be laptops with hybrid GPUs, but this very well may change in the future. v4: * Make sure that we call intel_backlight_level_to_pwm() in intel_dp_aux_vesa_enable_backlight() - vsyrjala Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://gitlab.freedesktop.org/drm/intel/-/issues/3680 Fixes: fe7d52bccab6 ("drm/i915/dp: Don't use DPCD backlights that need PWM enable/disable") Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.12+ Link: https://patchwork.freedesktop.org/patch/msgid/20211105183342.130810-2-lyude@redhat.com (cherry picked from commit 04f0d6cc62cc1eaf9242c081520c024a17ba86a3) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2021-11-29netfs: Adjust docs after foliationDavid Howells
Adjust the netfslib docs in light of the foliation changes. Also un-kdoc-mark netfs_skip_folio_read() since it's internal and isn't part of the API. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-cachefs@redhat.com cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/163706992597.3179783.18360472879717076435.stgit@warthog.procyon.org.uk/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-29mt76: fix key pointer overwrite in mt7921s_write_txwi/mt7663_usb_sdio_write_txwiLorenzo Bianconi
Fix pointer overwrite in mt7921s_tx_prepare_skb and mt7663_usb_sdio_tx_prepare_skb routines since in commit '2a9e9857473b ("mt76: fix possible pktid leak") mt76_tx_status_skb_add() has been moved out of mt7921s_write_txwi()/mt7663_usb_sdio_write_txwi() overwriting hw key pointer in ieee80211_tx_info structure. Fix the issue saving key pointer before running mt76_tx_status_skb_add(). Fixes: 2a9e9857473b ("mt76: fix possible pktid leak") Tested-by: Deren Wu <deren.wu@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/eba40c84b6d114f618e2ae486cc6d0f2e9272cf9.1638193069.git.lorenzo@kernel.org
2021-11-29rxrpc: Fix rxrpc_local leak in rxrpc_lookup_peer()Eiichi Tsukata
Need to call rxrpc_put_local() for peer candidate before kfree() as it holds a ref to rxrpc_local. [DH: v2: Changed to abstract the peer freeing code out into a function] Fixes: 9ebeddef58c4 ("rxrpc: rxrpc_peer needs to hold a ref on the rxrpc_local record") Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/all/20211121041608.133740-2-eiichi.tsukata@nutanix.com/ # v1
2021-11-29rxrpc: Fix rxrpc_peer leak in rxrpc_look_up_bundle()Eiichi Tsukata
Need to call rxrpc_put_peer() for bundle candidate before kfree() as it holds a ref to rxrpc_peer. [DH: v2: Changed to abstract out the bundle freeing code into a function] Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager") Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/20211121041608.133740-1-eiichi.tsukata@nutanix.com/ # v1
2021-11-29ipv6: fix memory leak in fib6_rule_suppressmsizanoen1
The kernel leaks memory when a `fib` rule is present in IPv6 nftables firewall rules and a suppress_prefix rule is present in the IPv6 routing rules (used by certain tools such as wg-quick). In such scenarios, every incoming packet will leak an allocation in `ip6_dst_cache` slab cache. After some hours of `bpftrace`-ing and source code reading, I tracked down the issue to ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule"). The problem with that change is that the generic `args->flags` always have `FIB_LOOKUP_NOREF` set[1][2] but the IPv6-specific flag `RT6_LOOKUP_F_DST_NOREF` might not be, leading to `fib6_rule_suppress` not decreasing the refcount when needed. How to reproduce: - Add the following nftables rule to a prerouting chain: meta nfproto ipv6 fib saddr . mark . iif oif missing drop This can be done with: sudo nft create table inet test sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }' sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop - Run: sudo ip -6 rule add table main suppress_prefixlength 0 - Watch `sudo slabtop -o | grep ip6_dst_cache` to see memory usage increase with every incoming ipv6 packet. This patch exposes the protocol-specific flags to the protocol specific `suppress` function, and check the protocol-specific `flags` argument for RT6_LOOKUP_F_DST_NOREF instead of the generic FIB_LOOKUP_NOREF when decreasing the refcount, like this. [1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L71 [2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L99 Link: https://bugzilla.kernel.org/show_bug.cgi?id=215105 Fixes: ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-29Merge branch 'atlantic-fixes'David S. Miller
Sudarsana Reddy Kalluru says: ==================== net: atlantic: 11-2021 fixes The patch series contains fixes for atlantic driver to improve support of latest AQC113 chipset. Please consider applying it to 'net' tree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-29atlantic: Remove warn trace message.Sameer Saurabh
Remove the warn trace message - it's not a correct check here, because the function can still be called on the device in DOWN state Fixes: 508f2e3dce454 ("net: atlantic: split rx and tx per-queue stats") Signed-off-by: Sameer Saurabh <ssaurabh@marvell.com> Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-29atlantic: Fix statistics logic for production hardwareDmitry Bogdanov
B0 is the main and widespread device revision of atlantic2 HW. In the current state, driver will incorrectly fetch the statistics for this revision. Fixes: 5cfd54d7dc186 ("net: atlantic: minimal A2 fw_ops") Signed-off-by: Dmitry Bogdanov <dbezrukov@marvell.com> Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-29Remove Half duplex mode speed capabilities.Sameer Saurabh
Since Half Duplex mode has been deprecated by the firmware, driver should not advertise Half Duplex speed in ethtool support link speed values. Fixes: 071a02046c262 ("net: atlantic: A2: half duplex support") Signed-off-by: Sameer Saurabh <ssaurabh@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-29atlantic: Add missing DIDs and fix 115c.Nikita Danilov
At the late production stages new dev ids were introduced. These are now in production, so its important for the driver to recognize these. And also fix the board caps for AQC115C adapter. Fixes: b3f0c79cba206 ("net: atlantic: A2 hw_ops skeleton") Signed-off-by: Nikita Danilov <ndanilov@aquantia.com> Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-29atlantic: Fix to display FW bundle version instead of FW mac version.Sameer Saurabh
The correct way to reflect firmware version is to use bundle version. Hence populating the same instead of MAC fw version. Fixes: c1be0bf092bd2 ("net: atlantic: common functions needed for basic A2 init/deinit hw_ops") Signed-off-by: Sameer Saurabh <ssaurabh@marvell.com> Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-29atlatnic: enable Nbase-t speeds with base-tNikita Danilov
When 2.5G is advertised, N-Base should be advertised against the T-base caps. N5G is out of use in baseline code and driver should treat both 5G and N5G (and also 2.5G and N2.5G) equally from user perspective. Fixes: 5cfd54d7dc186 ("net: atlantic: minimal A2 fw_ops") Signed-off-by: Nikita Danilov <ndanilov@aquantia.com> Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-29atlantic: Increase delay for fw transactionsDmitry Bogdanov
The max waiting period (of 1 ms) while reading the data from FW shared buffer is too small for certain types of data (e.g., stats). There's a chance that FW could be updating buffer at the same time and driver would be unsuccessful in reading data. Firmware manual recommends to have 1 sec timeout to fix this issue. Fixes: 5cfd54d7dc186 ("net: atlantic: minimal A2 fw_ops") Signed-off-by: Dmitry Bogdanov <dbezrukov@marvell.com> Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-29drm/vc4: kms: Fix previous HVS commit waitMaxime Ripard
Our current code is supposed to serialise the commits by waiting for all the drm_crtc_commits associated to the previous HVS state. However, assuming we have two CRTCs running and being configured and we configure each one alternately, we end up in a situation where we're not waiting at all. Indeed, starting with a state (state 0) where both CRTCs are running, and doing a commit (state 1) on the first CRTC (CRTC 0), we'll associate its commit to its assigned FIFO in vc4_hvs_state. If we get a new commit (state 2), this time affecting the second CRTC (CRTC 1), the DRM core will allow both commits to execute in parallel (assuming they don't have any share resources). Our code in vc4_atomic_commit_tail is supposed to make sure we only get one commit at a time and serialised by order of submission. It does so by using for_each_old_crtc_in_state, making sure that the CRTC has a FIFO assigned, is used, and has a commit pending. If it does, then we'll wait for the commit before going forward. During the transition from state 0 to state 1, as our old CRTC state we get the CRTC 0 state 0, its commit, we wait for it, everything works fine. During the transition from state 1 to state 2 though, the use of for_each_old_crtc_in_state is wrong. Indeed, while the code assumes it's returning the state of the CRTC in the old state (so CRTC 0 state 1), it actually returns the old state of the CRTC affected by the current commit, so CRTC 0 state 0 since it wasn't part of state 1. Due to this, if we alternate between the configuration of CRTC 0 and CRTC 1, we never actually wait for anything since we should be waiting on the other every time, but it never is affected by the previous commit. Change the logic to, at every commit, look at every FIFO in the previous HVS state, and if it's in use and has a commit associated to it, wait for that commit. Fixes: 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Tested-by: Jian-Hong Pan <jhp@endlessos.org> Link: https://lore.kernel.org/r/20211117094527.146275-7-maxime@cerno.tech
2021-11-29drm/vc4: kms: Don't duplicate pending commitMaxime Ripard
Our HVS global state, when duplicated, will also copy the pointer to the drm_crtc_commit (and increase the reference count) for each FIFO if the pointer is not NULL. However, our atomic_setup function will overwrite that pointer without putting the reference back leading to a memory leak. Since the commit is only relevant during the atomic commit process, it doesn't make sense to duplicate the reference to the commit anyway. Let's remove it. Fixes: 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Tested-by: Jian-Hong Pan <jhp@endlessos.org> Link: https://lore.kernel.org/r/20211117094527.146275-6-maxime@cerno.tech
2021-11-29drm/vc4: kms: Clear the HVS FIFO commit pointer once doneMaxime Ripard
Commit 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") introduced a wait on the previous commit done on a given HVS FIFO. However, we never cleared that pointer once done. Since drm_crtc_commit_put can free the drm_crtc_commit structure directly if we were the last user, this means that it can lead to a use-after free if we were to duplicate the state, and that stale pointer would even be copied to the new state. Set the pointer to NULL once we're done with the wait so that we don't carry over a pointer to a free'd structure. Fixes: 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Tested-by: Jian-Hong Pan <jhp@endlessos.org> Link: https://lore.kernel.org/r/20211117094527.146275-5-maxime@cerno.tech
2021-11-29drm/vc4: kms: Add missing drm_crtc_commit_putMaxime Ripard
Commit 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") introduced a global state for the HVS, with each FIFO storing the current CRTC commit so that we can properly synchronize commits. However, the refcounting was off and we thus ended up leaking the drm_crtc_commit structure every commit. Add a drm_crtc_commit_put to prevent the leakage. Fixes: 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Tested-by: Jian-Hong Pan <jhp@endlessos.org> Link: https://lore.kernel.org/r/20211117094527.146275-4-maxime@cerno.tech
2021-11-29drm/vc4: kms: Fix return code checkMaxime Ripard
The HVS global state functions return an error pointer, but in most cases we check if it's NULL, possibly resulting in an invalid pointer dereference. Fixes: 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Tested-by: Jian-Hong Pan <jhp@endlessos.org> Link: https://lore.kernel.org/r/20211117094527.146275-3-maxime@cerno.tech
2021-11-29drm/vc4: kms: Wait for the commit before increasing our clock rateMaxime Ripard
Several DRM/KMS atomic commits can run in parallel if they affect different CRTC. These commits share the global HVS state, so we have some code to make sure we run commits in sequence. This synchronization code is one of the first thing that runs in vc4_atomic_commit_tail(). Another constraints we have is that we need to make sure the HVS clock gets a boost during the commit. That code relies on clk_set_min_rate and will remove the old minimum and set a new one. We also need another, temporary, minimum for the duration of the commit. The algorithm is thus to set a temporary minimum, drop the previous one, do the commit, and finally set the minimum for the current mode. However, the part that sets the temporary minimum and drops the older one runs before the commit synchronization code. Thus, under the proper conditions, we can end up mixing up the minimums and ending up with the wrong one for our current step. To avoid it, let's move the clock setup in the protected section. Fixes: d7d96c00e585 ("drm/vc4: hvs: Boost the core clock during modeset") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Tested-by: Jian-Hong Pan <jhp@endlessos.org> Link: https://lore.kernel.org/r/20211117094527.146275-2-maxime@cerno.tech
2021-11-29loop: Use pr_warn_once() for loop_control_remove() warningTetsuo Handa
kernel test robot reported that RCU stall via printk() flooding is possible [1] when stress testing. Link: https://lkml.kernel.org/r/20211129073709.GA18483@xsang-OptiPlex-9020 [1] Reported-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29net/mlx4_en: Update reported link modes for 1/10GErik Ekman
When link modes were initially added in commit 2c762679435dc ("net/mlx4_en: Use PTYS register to query ethtool settings") and later updated for the new ethtool API in commit 3d8f7cc78d0eb ("net: mlx4: use new ETHTOOL_G/SSETTINGS API") the only 1/10G non-baseT link modes configured were 1000baseKX, 10000baseKX4 and 10000baseKR. It looks like these got picked to represent other modes since nothing better was available. Switch to using more specific link modes added in commit 5711a98221443 ("net: ethtool: add support for 1000BaseX and missing 10G link modes"). Tested with MCX311A-XCAT connected via DAC. Before: % sudo ethtool enp3s0 Settings for enp3s0: Supported ports: [ FIBRE ] Supported link modes: 1000baseKX/Full 10000baseKR/Full Supported pause frame use: Symmetric Receive-only Supports auto-negotiation: No Supported FEC modes: Not reported Advertised link modes: 1000baseKX/Full 10000baseKR/Full Advertised pause frame use: Symmetric Advertised auto-negotiation: No Advertised FEC modes: Not reported Speed: 10000Mb/s Duplex: Full Auto-negotiation: off Port: Direct Attach Copper PHYAD: 0 Transceiver: internal Supports Wake-on: d Wake-on: d Current message level: 0x00000014 (20) link ifdown Link detected: yes With this change: % sudo ethtool enp3s0 Settings for enp3s0: Supported ports: [ FIBRE ] Supported link modes: 1000baseX/Full 10000baseCR/Full 10000baseSR/Full Supported pause frame use: Symmetric Receive-only Supports auto-negotiation: No Supported FEC modes: Not reported Advertised link modes: 1000baseX/Full 10000baseCR/Full 10000baseSR/Full Advertised pause frame use: Symmetric Advertised auto-negotiation: No Advertised FEC modes: Not reported Speed: 10000Mb/s Duplex: Full Auto-negotiation: off Port: Direct Attach Copper PHYAD: 0 Transceiver: internal Supports Wake-on: d Wake-on: d Current message level: 0x00000014 (20) link ifdown Link detected: yes Tested-by: Michael Stapelberg <michael@stapelberg.ch> Signed-off-by: Erik Ekman <erik@kryo.se> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>