summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-06-05igc: Fix Energy Efficient Ethernet support declarationSasha Neftin
The commit 01cf893bf0f4 ("net: intel: i40e/igc: Remove setting Autoneg in EEE capabilities") removed SUPPORTED_Autoneg field but left inappropriate ethtool_keee structure initialization. When "ethtool --show <device>" (get_eee) invoke, the 'ethtool_keee' structure was accidentally overridden. Remove the 'ethtool_keee' overriding and add EEE declaration as per IEEE specification that allows reporting Energy Efficient Ethernet capabilities. Examples: Before fix: ethtool --show-eee enp174s0 EEE settings for enp174s0: EEE status: not supported After fix: EEE settings for enp174s0: EEE status: disabled Tx LPI: disabled Supported EEE link modes: 100baseT/Full 1000baseT/Full 2500baseT/Full Fixes: 01cf893bf0f4 ("net: intel: i40e/igc: Remove setting Autoneg in EEE capabilities") Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-6-e3563aa89b0c@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05ice: map XDP queues to vectors in ice_vsi_map_rings_to_vectors()Larysa Zaremba
ice_pf_dcb_recfg() re-maps queues to vectors with ice_vsi_map_rings_to_vectors(), which does not restore the previous state for XDP queues. This leads to no AF_XDP traffic after rebuild. Map XDP queues to vectors in ice_vsi_map_rings_to_vectors(). Also, move the code around, so XDP queues are mapped independently only through .ndo_bpf(). Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-5-e3563aa89b0c@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05ice: add flag to distinguish reset from .ndo_bpf in XDP rings configLarysa Zaremba
Commit 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") has placed ice_vsi_free_q_vectors() after ice_destroy_xdp_rings() in the rebuild process. The behaviour of the XDP rings config functions is context-dependent, so the change of order has led to ice_destroy_xdp_rings() doing additional work and removing XDP prog, when it was supposed to be preserved. Also, dependency on the PF state reset flags creates an additional, fortunately less common problem: * PFR is requested e.g. by tx_timeout handler * .ndo_bpf() is asked to delete the program, calls ice_destroy_xdp_rings(), but reset flag is set, so rings are destroyed without deleting the program * ice_vsi_rebuild tries to delete non-existent XDP rings, because the program is still on the VSI * system crashes With a similar race, when requested to attach a program, ice_prepare_xdp_rings() can actually skip setting the program in the VSI and nevertheless report success. Instead of reverting to the old order of function calls, add an enum argument to both ice_prepare_xdp_rings() and ice_destroy_xdp_rings() in order to distinguish between calls from rebuild and .ndo_bpf(). Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-4-e3563aa89b0c@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05ice: remove af_xdp_zc_qps bitmapLarysa Zaremba
Referenced commit has introduced a bitmap to distinguish between ZC and copy-mode AF_XDP queues, because xsk_get_pool_from_qid() does not do this for us. The bitmap would be especially useful when restoring previous state after rebuild, if only it was not reallocated in the process. This leads to e.g. xdpsock dying after changing number of queues. Instead of preserving the bitmap during the rebuild, remove it completely and distinguish between ZC and copy-mode queues based on the presence of a device associated with the pool. Fixes: e102db780e1c ("ice: track AF_XDP ZC enabled queues in bitmap") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-3-e3563aa89b0c@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05ice: fix reads from NVM Shadow RAM on E830 and E825-C devicesJacob Keller
The ice driver reads data from the Shadow RAM portion of the NVM during initialization, including data used to identify the NVM image and device, such as the ETRACK ID used to populate devlink dev info fw.bundle. Currently it is using a fixed offset defined by ICE_CSS_HEADER_LENGTH to compute the appropriate offset. This worked fine for E810 and E822 devices which both have CSS header length of 330 words. Other devices, including both E825-C and E830 devices have different sizes for their CSS header. The use of a hard coded value results in the driver reading from the wrong block in the NVM when attempting to access the Shadow RAM copy. This results in the driver reporting the fw.bundle as 0x0 in both the devlink dev info and ethtool -i output. The first E830 support was introduced by commit ba20ecb1d1bb ("ice: Hook up 4 E830 devices by adding their IDs") and the first E825-C support was introducted by commit f64e18944233 ("ice: introduce new E825C devices family") The NVM actually contains the CSS header length embedded in it. Remove the hard coded value and replace it with logic to read the length from the NVM directly. This is more resilient against all existing and future hardware, vs looking up the expected values from a table. It ensures the driver will read from the appropriate place when determining the ETRACK ID value used for populating the fw.bundle_id and for reporting in ethtool -i. The CSS header length for both the active and inactive flash bank is stored in the ice_bank_info structure to avoid unnecessary duplicate work when accessing multiple words of the Shadow RAM. Both banks are read in the unlikely event that the header length is different for the NVM in the inactive bank, rather than being different only by the overall device family. Fixes: ba20ecb1d1bb ("ice: Hook up 4 E830 devices by adding their IDs") Co-developed-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-2-e3563aa89b0c@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05ice: fix iteration of TLVs in Preserved Fields AreaJacob Keller
The ice_get_pfa_module_tlv() function iterates over the Type-Length-Value structures in the Preserved Fields Area (PFA) of the NVM. This is used by the driver to access data such as the Part Board Assembly identifier. The function uses simple logic to iterate over the PFA. First, the pointer to the PFA in the NVM is read. Then the total length of the PFA is read from the first word. A pointer to the first TLV is initialized, and a simple loop iterates over each TLV. The pointer is moved forward through the NVM until it exceeds the PFA area. The logic seems sound, but it is missing a key detail. The Preserved Fields Area length includes one additional final word. This is documented in the device data sheet as a dummy word which contains 0xFFFF. All NVMs have this extra word. If the driver tries to scan for a TLV that is not in the PFA, it will read past the size of the PFA. It reads and interprets the last dummy word of the PFA as a TLV with type 0xFFFF. It then reads the word following the PFA as a length. The PFA resides within the Shadow RAM portion of the NVM, which is relatively small. All of its offsets are within a 16-bit size. The PFA pointer and TLV pointer are stored by the driver as 16-bit values. In almost all cases, the word following the PFA will be such that interpreting it as a length will result in 16-bit arithmetic overflow. Once overflowed, the new next_tlv value is now below the maximum offset of the PFA. Thus, the driver will continue to iterate the data as TLVs. In the worst case, the driver hits on a sequence of reads which loop back to reading the same offsets in an endless loop. To fix this, we need to correct the loop iteration check to account for this extra word at the end of the PFA. This alone is sufficient to resolve the known cases of this issue in the field. However, it is plausible that an NVM could be misconfigured or have corrupt data which results in the same kind of overflow. Protect against this by using check_add_overflow when calculating both the maximum offset of the TLVs, and when calculating the next_tlv offset at the end of each loop iteration. This ensures that the driver will not get stuck in an infinite loop when scanning the PFA. Fixes: e961b679fb0b ("ice: add board identifier info to devlink .info_get") Co-developed-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-1-e3563aa89b0c@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05drm/vmwgfx: remove unused struct 'vmw_stdu_dma'Dr. David Alan Gilbert
'vmw_stdu_dma' is unused since commit 39985eea5a6d ("drm/vmwgfx: Abstract placement selection") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240517232858.230860-1-linux@treblig.org
2024-06-05nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errorsRyusuke Konishi
The error handling in nilfs_empty_dir() when a directory folio/page read fails is incorrect, as in the old ext2 implementation, and if the folio/page cannot be read or nilfs_check_folio() fails, it will falsely determine the directory as empty and corrupt the file system. In addition, since nilfs_empty_dir() does not immediately return on a failed folio/page read, but continues to loop, this can cause a long loop with I/O if i_size of the directory's inode is also corrupted, causing the log writer thread to wait and hang, as reported by syzbot. Fix these issues by making nilfs_empty_dir() immediately return a false value (0) if it fails to get a directory folio/page. Link: https://lkml.kernel.org/r/20240604134255.7165-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+c8166c541d3971bf6c87@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c8166c541d3971bf6c87 Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations") Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05mm: fix xyz_noprof functions calling profiled functionsSuren Baghdasaryan
Grepping /proc/allocinfo for "noprof" reveals several xyz_noprof functions, which means internally they are calling profiled functions. This should never happen as such calls move allocation charge from a higher level location where it should be accounted for into these lower level helpers. Fix this by replacing profiled function calls with noprof ones. Link: https://lkml.kernel.org/r/20240531205350.3973009-1-surenb@google.com Fixes: b951aaff5035 ("mm: enable page allocation tagging") Fixes: e26d8769da6d ("mempool: hook up to memory allocation profiling") Fixes: 88ae5fb755b0 ("mm: vmalloc: enable memory allocation profiling") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Kees Cook <kees@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05codetag: avoid race at alloc_slab_obj_extsThadeu Lima de Souza Cascardo
When CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled, the following warning may be noticed: [ 48.299584] ------------[ cut here ]------------ [ 48.300092] alloc_tag was not set [ 48.300528] WARNING: CPU: 2 PID: 1361 at include/linux/alloc_tag.h:130 alloc_tagging_slab_free_hook+0x84/0xc7 [ 48.301305] Modules linked in: [ 48.301553] CPU: 2 PID: 1361 Comm: systemd-udevd Not tainted 6.10.0-rc1-00003-gac8755535862 #176 [ 48.302196] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 48.302752] RIP: 0010:alloc_tagging_slab_free_hook+0x84/0xc7 [ 48.303169] Code: 8d 1c c4 48 85 db 74 4d 48 83 3b 00 75 1e 80 3d 65 02 86 04 00 75 15 48 c7 c7 11 48 1d 85 c6 05 55 02 86 04 01 e8 64 44 a5 ff <0f> 0b 48 8b 03 48 85 c0 74 21 48 83 f8 01 74 14 48 8b 50 20 48 f7 [ 48.304411] RSP: 0018:ffff8880111b7d40 EFLAGS: 00010282 [ 48.304916] RAX: 0000000000000000 RBX: ffff88800fcc9008 RCX: 0000000000000000 [ 48.305455] RDX: 0000000080000000 RSI: ffff888014060000 RDI: ffffed1002236f97 [ 48.305979] RBP: 0000000000001100 R08: fffffbfff0aa73a1 R09: 0000000000000000 [ 48.306473] R10: ffffffff814515e5 R11: 0000000000000003 R12: ffff88800fcc9000 [ 48.306943] R13: ffff88800b2e5cc0 R14: ffff8880111b7d90 R15: 0000000000000000 [ 48.307529] FS: 00007faf5d1908c0(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000 [ 48.308223] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 48.308710] CR2: 000058fb220c9118 CR3: 00000000110cc000 CR4: 0000000000750ef0 [ 48.309274] PKRU: 55555554 [ 48.309804] Call Trace: [ 48.310029] <TASK> [ 48.310290] ? show_regs+0x84/0x8d [ 48.310722] ? alloc_tagging_slab_free_hook+0x84/0xc7 [ 48.311298] ? __warn+0x13b/0x2ff [ 48.311580] ? alloc_tagging_slab_free_hook+0x84/0xc7 [ 48.311987] ? report_bug+0x2ce/0x3ab [ 48.312292] ? handle_bug+0x8c/0x107 [ 48.312563] ? exc_invalid_op+0x34/0x6f [ 48.312842] ? asm_exc_invalid_op+0x1a/0x20 [ 48.313173] ? this_cpu_in_panic+0x1c/0x72 [ 48.313503] ? alloc_tagging_slab_free_hook+0x84/0xc7 [ 48.313880] ? putname+0x143/0x14e [ 48.314152] kmem_cache_free+0xe9/0x214 [ 48.314454] putname+0x143/0x14e [ 48.314712] do_unlinkat+0x413/0x45e [ 48.315001] ? __pfx_do_unlinkat+0x10/0x10 [ 48.315388] ? __check_object_size+0x4d7/0x525 [ 48.315744] ? __sanitizer_cov_trace_pc+0x20/0x4a [ 48.316167] ? __sanitizer_cov_trace_pc+0x20/0x4a [ 48.316757] ? getname_flags+0x4ed/0x500 [ 48.317261] __x64_sys_unlink+0x42/0x4a [ 48.317741] do_syscall_64+0xe2/0x149 [ 48.318171] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 48.318602] RIP: 0033:0x7faf5d8850ab [ 48.318891] Code: fd ff ff e8 27 dd 01 00 0f 1f 80 00 00 00 00 f3 0f 1e fa b8 5f 00 00 00 0f 05 c3 0f 1f 40 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 41 2d 0e 00 f7 d8 [ 48.320649] RSP: 002b:00007ffc44982b38 EFLAGS: 00000246 ORIG_RAX: 0000000000000057 [ 48.321182] RAX: ffffffffffffffda RBX: 00005ba344a44680 RCX: 00007faf5d8850ab [ 48.321667] RDX: 0000000000000000 RSI: 00005ba344a44430 RDI: 00007ffc44982b40 [ 48.322139] RBP: 00007ffc44982c00 R08: 0000000000000000 R09: 0000000000000007 [ 48.322598] R10: 00005ba344a44430 R11: 0000000000000246 R12: 0000000000000000 [ 48.323071] R13: 00007ffc44982b40 R14: 0000000000000000 R15: 0000000000000000 [ 48.323596] </TASK> This is due to a race when two objects are allocated from the same slab, which did not have an obj_exts allocated for. In such a case, the two threads will notice the NULL obj_exts and after one assigns slab->obj_exts, the second one will happily do the exchange if it reads this new assigned value. In order to avoid that, verify that the read obj_exts does not point to an allocated obj_exts before doing the exchange. Link: https://lkml.kernel.org/r/20240527183007.1595037-1-cascardo@igalia.com Fixes: 09c46563ff6d ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05mm/hugetlb: do not call vma_add_reservation upon ENOMEMOscar Salvador
sysbot reported a splat [1] on __unmap_hugepage_range(). This is because vma_needs_reservation() can return -ENOMEM if allocate_file_region_entries() fails to allocate the file_region struct for the reservation. Check for that and do not call vma_add_reservation() if that is the case, otherwise region_abort() and region_del() will see that we do not have any file_regions. If we detect that vma_needs_reservation() returned -ENOMEM, we clear the hugetlb_restore_reserve flag as if this reservation was still consumed, so free_huge_folio() will not increment the resv count. [1] https://lore.kernel.org/linux-mm/0000000000004096100617c58d54@google.com/T/#ma5983bc1ab18a54910da83416b3f89f3c7ee43aa Link: https://lkml.kernel.org/r/20240528205323.20439-1-osalvador@suse.de Fixes: df7a6d1f6405 ("mm/hugetlb: restore the reservation if needed") Signed-off-by: Oscar Salvador <osalvador@suse.de> Reported-and-tested-by: syzbot+d3fe2dc5ffe9380b714b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-mm/0000000000004096100617c58d54@google.com/ Cc: Breno Leitao <leitao@debian.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05mm/ksm: fix ksm_zero_pages accountingChengming Zhou
We normally ksm_zero_pages++ in ksmd when page is merged with zero page, but ksm_zero_pages-- is done from page tables side, where there is no any accessing protection of ksm_zero_pages. So we can read very exceptional value of ksm_zero_pages in rare cases, such as -1, which is very confusing to users. Fix it by changing to use atomic_long_t, and the same case with the mm->ksm_zero_pages. Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-2-34bb358fdc13@linux.dev Fixes: e2942062e01d ("ksm: count all zero pages placed by KSM") Fixes: 6080d19f0704 ("ksm: add ksm zero pages for each process") Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn> Cc: Stefan Roesch <shr@devkernel.io> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05mm/ksm: fix ksm_pages_scanned accountingChengming Zhou
Patch series "mm/ksm: fix some accounting problems", v3. We encountered some abnormal ksm_pages_scanned and ksm_zero_pages during some random tests. 1. ksm_pages_scanned unchanged even ksmd scanning has progress. 2. ksm_zero_pages maybe -1 in some rare cases. This patch (of 2): During testing, I found ksm_pages_scanned is unchanged although the scan_get_next_rmap_item() did return valid rmap_item that is not NULL. The reason is the scan_get_next_rmap_item() will return NULL after a full scan, so ksm_do_scan() just return without accounting of the ksm_pages_scanned. Fix it by just putting ksm_pages_scanned accounting in that loop, and it will be accounted more timely if that loop would last for a long time. Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-0-34bb358fdc13@linux.dev Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-1-34bb358fdc13@linux.dev Fixes: b348b5fe2b5f ("mm/ksm: add pages scanned metric") Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: xu xin <xu.xin16@zte.com.cn> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn> Cc: Stefan Roesch <shr@devkernel.io> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05kmsan: do not wipe out origin when doing partial unpoisoningAlexander Potapenko
As noticed by Brian, KMSAN should not be zeroing the origin when unpoisoning parts of a four-byte uninitialized value, e.g.: char a[4]; kmsan_unpoison_memory(a, 1); This led to false negatives, as certain poisoned values could receive zero origins, preventing those values from being reported. To fix the problem, check that kmsan_internal_set_shadow_origin() writes zero origins only to slots which have zero shadow. Link: https://lkml.kernel.org/r/20240528104807.738758-1-glider@google.com Fixes: f80be4571b19 ("kmsan: add KMSAN runtime core") Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Brian Johannesmeyer <bjohannesmeyer@gmail.com> Link: https://lore.kernel.org/lkml/20240524232804.1984355-1-bjohannesmeyer@gmail.com/T/ Reviewed-by: Marco Elver <elver@google.com> Tested-by: Brian Johannesmeyer <bjohannesmeyer@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05vmalloc: check CONFIG_EXECMEM in is_vmalloc_or_module_addr()Cong Wang
After commit 2c9e5d4a0082 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of") CONFIG_BPF_JIT does not depend on CONFIG_MODULES any more and bpf jit also uses the [MODULES_VADDR, MODULES_END] memory region. But is_vmalloc_or_module_addr() still checks CONFIG_MODULES, which then returns false for a bpf jit memory region when CONFIG_MODULES is not defined. It leads to the following kernel BUG: [ 1.567023] ------------[ cut here ]------------ [ 1.567883] kernel BUG at mm/vmalloc.c:745! [ 1.568477] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 1.569367] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0+ #448 [ 1.570247] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 [ 1.570786] RIP: 0010:vmalloc_to_page+0x48/0x1ec [ 1.570786] Code: 0f 00 00 e8 eb 1a 05 00 b8 37 00 00 00 48 ba fe ff ff ff ff 1f 00 00 4c 03 25 76 49 c6 02 48 c1 e0 28 48 01 e8 48 39 d0 76 02 <0f> 0b 4c 89 e7 e8 bf 1a 05 00 49 8b 04 24 48 a9 9f ff ff ff 0f 84 [ 1.570786] RSP: 0018:ffff888007787960 EFLAGS: 00010212 [ 1.570786] RAX: 000036ffa0000000 RBX: 0000000000000640 RCX: ffffffff8147e93c [ 1.570786] RDX: 00001ffffffffffe RSI: dffffc0000000000 RDI: ffffffff840e32c8 [ 1.570786] RBP: ffffffffa0000000 R08: 0000000000000000 R09: 0000000000000000 [ 1.570786] R10: ffff888007787a88 R11: ffffffff8475d8e7 R12: ffffffff83e80ff8 [ 1.570786] R13: 0000000000000640 R14: 0000000000000640 R15: 0000000000000640 [ 1.570786] FS: 0000000000000000(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000 [ 1.570786] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.570786] CR2: ffff888006a01000 CR3: 0000000003e80000 CR4: 0000000000350ef0 [ 1.570786] Call Trace: [ 1.570786] <TASK> [ 1.570786] ? __die_body+0x1b/0x58 [ 1.570786] ? die+0x31/0x4b [ 1.570786] ? do_trap+0x9d/0x138 [ 1.570786] ? vmalloc_to_page+0x48/0x1ec [ 1.570786] ? do_error_trap+0xcd/0x102 [ 1.570786] ? vmalloc_to_page+0x48/0x1ec [ 1.570786] ? vmalloc_to_page+0x48/0x1ec [ 1.570786] ? handle_invalid_op+0x2f/0x38 [ 1.570786] ? vmalloc_to_page+0x48/0x1ec [ 1.570786] ? exc_invalid_op+0x2b/0x41 [ 1.570786] ? asm_exc_invalid_op+0x16/0x20 [ 1.570786] ? vmalloc_to_page+0x26/0x1ec [ 1.570786] ? vmalloc_to_page+0x48/0x1ec [ 1.570786] __text_poke+0xb6/0x458 [ 1.570786] ? __pfx_text_poke_memcpy+0x10/0x10 [ 1.570786] ? __pfx___mutex_lock+0x10/0x10 [ 1.570786] ? __pfx___text_poke+0x10/0x10 [ 1.570786] ? __pfx_get_random_u32+0x10/0x10 [ 1.570786] ? srso_return_thunk+0x5/0x5f [ 1.570786] text_poke_copy_locked+0x70/0x84 [ 1.570786] text_poke_copy+0x32/0x4f [ 1.570786] bpf_arch_text_copy+0xf/0x27 [ 1.570786] bpf_jit_binary_pack_finalize+0x26/0x5a [ 1.570786] bpf_int_jit_compile+0x576/0x8ad [ 1.570786] ? __pfx_bpf_int_jit_compile+0x10/0x10 [ 1.570786] ? srso_return_thunk+0x5/0x5f [ 1.570786] ? __kmalloc_node_track_caller+0x2b5/0x2e0 [ 1.570786] bpf_prog_select_runtime+0x7c/0x199 [ 1.570786] bpf_prepare_filter+0x1e9/0x25b [ 1.570786] ? __pfx_bpf_prepare_filter+0x10/0x10 [ 1.570786] ? srso_return_thunk+0x5/0x5f [ 1.570786] ? _find_next_bit+0x29/0x7e [ 1.570786] bpf_prog_create+0xb8/0xe0 [ 1.570786] ptp_classifier_init+0x75/0xa1 [ 1.570786] ? __pfx_ptp_classifier_init+0x10/0x10 [ 1.570786] ? srso_return_thunk+0x5/0x5f [ 1.570786] ? register_pernet_subsys+0x36/0x42 [ 1.570786] ? srso_return_thunk+0x5/0x5f [ 1.570786] sock_init+0x99/0xa3 [ 1.570786] ? __pfx_sock_init+0x10/0x10 [ 1.570786] do_one_initcall+0x104/0x2c4 [ 1.570786] ? __pfx_do_one_initcall+0x10/0x10 [ 1.570786] ? parameq+0x25/0x2d [ 1.570786] ? rcu_is_watching+0x1c/0x3c [ 1.570786] ? trace_kmalloc+0x81/0xb2 [ 1.570786] ? srso_return_thunk+0x5/0x5f [ 1.570786] ? __kmalloc+0x29c/0x2c7 [ 1.570786] ? srso_return_thunk+0x5/0x5f [ 1.570786] do_initcalls+0xf9/0x123 [ 1.570786] kernel_init_freeable+0x24f/0x289 [ 1.570786] ? __pfx_kernel_init+0x10/0x10 [ 1.570786] kernel_init+0x19/0x13a [ 1.570786] ret_from_fork+0x24/0x41 [ 1.570786] ? __pfx_kernel_init+0x10/0x10 [ 1.570786] ret_from_fork_asm+0x1a/0x30 [ 1.570786] </TASK> [ 1.570819] ---[ end trace 0000000000000000 ]--- [ 1.571463] RIP: 0010:vmalloc_to_page+0x48/0x1ec [ 1.572111] Code: 0f 00 00 e8 eb 1a 05 00 b8 37 00 00 00 48 ba fe ff ff ff ff 1f 00 00 4c 03 25 76 49 c6 02 48 c1 e0 28 48 01 e8 48 39 d0 76 02 <0f> 0b 4c 89 e7 e8 bf 1a 05 00 49 8b 04 24 48 a9 9f ff ff ff 0f 84 [ 1.574632] RSP: 0018:ffff888007787960 EFLAGS: 00010212 [ 1.575129] RAX: 000036ffa0000000 RBX: 0000000000000640 RCX: ffffffff8147e93c [ 1.576097] RDX: 00001ffffffffffe RSI: dffffc0000000000 RDI: ffffffff840e32c8 [ 1.577084] RBP: ffffffffa0000000 R08: 0000000000000000 R09: 0000000000000000 [ 1.578077] R10: ffff888007787a88 R11: ffffffff8475d8e7 R12: ffffffff83e80ff8 [ 1.578810] R13: 0000000000000640 R14: 0000000000000640 R15: 0000000000000640 [ 1.579823] FS: 0000000000000000(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000 [ 1.580992] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.581869] CR2: ffff888006a01000 CR3: 0000000003e80000 CR4: 0000000000350ef0 [ 1.582800] Kernel panic - not syncing: Fatal exception [ 1.583765] ---[ end Kernel panic - not syncing: Fatal exception ]--- Fix this by checking CONFIG_EXECMEM instead. Link: https://lkml.kernel.org/r/20240528160838.102223-1-xiyou.wangcong@gmail.com Fixes: 2c9e5d4a0082 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of") Signed-off-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05mm: page_alloc: fix highatomic typing in multi-block buddiesJohannes Weiner
Christoph reports a page allocator splat triggered by xfstests: generic/176 214s ... [ 1204.507931] run fstests generic/176 at 2024-05-27 12:52:30 XFS (nvme0n1): Mounting V5 Filesystem cd936307-415f-48a3-b99d-a2d52ae1f273 XFS (nvme0n1): Ending clean mount XFS (nvme1n1): Mounting V5 Filesystem ab3ee1a4-af62-4934-9a6a-6c2fde321850 XFS (nvme1n1): Ending clean mount XFS (nvme1n1): Unmounting Filesystem ab3ee1a4-af62-4934-9a6a-6c2fde321850 XFS (nvme1n1): Mounting V5 Filesystem 7099b02d-9c58-4d1d-be1d-2cc472d12cd9 XFS (nvme1n1): Ending clean mount ------------[ cut here ]------------ page type is 3, passed migratetype is 1 (nr=512) WARNING: CPU: 0 PID: 509870 at mm/page_alloc.c:645 expand+0x1c5/0x1f0 Modules linked in: i2c_i801 crc32_pclmul i2c_smbus [last unloaded: scsi_debug] CPU: 0 PID: 509870 Comm: xfs_io Not tainted 6.10.0-rc1+ #2437 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:expand+0x1c5/0x1f0 Code: 05 16 70 bf 02 01 e8 ca fc ff ff 8b 54 24 34 44 89 e1 48 c7 c7 80 a2 28 83 48 89 c6 b8 01 00 3 RSP: 0018:ffffc90003b2b968 EFLAGS: 00010082 RAX: 0000000000000000 RBX: ffffffff83fa9480 RCX: 0000000000000000 RDX: 0000000000000005 RSI: 0000000000000027 RDI: 00000000ffffffff RBP: 00000000001f2600 R08: 00000000fffeffff R09: 0000000000000001 R10: 0000000000000000 R11: ffffffff83676200 R12: 0000000000000009 R13: 0000000000000200 R14: 0000000000000001 R15: ffffea0007c98000 FS: 00007f72ca3d5780(0000) GS:ffff8881f9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f72ca1fff38 CR3: 00000001aa0c6002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __warn+0x7b/0x120 ? expand+0x1c5/0x1f0 ? report_bug+0x191/0x1c0 ? handle_bug+0x3c/0x80 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? expand+0x1c5/0x1f0 ? expand+0x1c5/0x1f0 __rmqueue_pcplist+0x3a9/0x730 get_page_from_freelist+0x7a0/0xf00 __alloc_pages_noprof+0x153/0x2e0 __folio_alloc_noprof+0x10/0xa0 __filemap_get_folio+0x16b/0x370 iomap_write_begin+0x496/0x680 While trying to service a movable allocation (page type 1), the page allocator runs into a two-pageblock buddy on the movable freelist whose second block is typed as highatomic (page type 3). This inconsistency is caused by the highatomic reservation system operating on single pageblocks, while MAX_ORDER can be bigger than that - in this configuration, pageblock_order is 9 while MAX_PAGE_ORDER is 10. The test case is observed to make several adjacent order-3 requests with __GFP_DIRECT_RECLAIM cleared, which marks the surrounding block as highatomic. Upon freeing, the blocks merge into an order-10 buddy. When the highatomic pool is drained later on, this order-10 buddy gets moved back to the movable list, but only the first pageblock is marked movable again. A subsequent expand() of this buddy warns about the tail being of a different type. This is a long-standing bug that's surfaced by the recent block type warnings added to the allocator. The consequences seem mostly benign, it just results in odd behavior: the highatomic tail blocks are not properly drained, instead they end up on the movable list first, then go back to the highatomic list after an alloc-free cycle. To fix this, make the highatomic reservation code aware that allocations/buddies can be larger than a pageblock. While it's an old quirk, the recently added type consistency warnings seem to be the most prominent consequence of it. Set the Fixes: tag accordingly to highlight this backporting dependency. Link: https://lkml.kernel.org/r/20240530114203.GA1222079@cmpxchg.org Fixes: e0932b6c1f94 ("mm: page_alloc: consolidate free page accounting") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Tested-by: Christoph Hellwig <hch@lst.de> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05nilfs2: fix potential kernel bug due to lack of writeback flag waitingRyusuke Konishi
Destructive writes to a block device on which nilfs2 is mounted can cause a kernel bug in the folio/page writeback start routine or writeback end routine (__folio_start_writeback in the log below): kernel BUG at mm/page-writeback.c:3070! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI ... RIP: 0010:__folio_start_writeback+0xbaa/0x10e0 Code: 25 ff 0f 00 00 0f 84 18 01 00 00 e8 40 ca c6 ff e9 17 f6 ff ff e8 36 ca c6 ff 4c 89 f7 48 c7 c6 80 c0 12 84 e8 e7 b3 0f 00 90 <0f> 0b e8 1f ca c6 ff 4c 89 f7 48 c7 c6 a0 c6 12 84 e8 d0 b3 0f 00 ... Call Trace: <TASK> nilfs_segctor_do_construct+0x4654/0x69d0 [nilfs2] nilfs_segctor_construct+0x181/0x6b0 [nilfs2] nilfs_segctor_thread+0x548/0x11c0 [nilfs2] kthread+0x2f0/0x390 ret_from_fork+0x4b/0x80 ret_from_fork_asm+0x1a/0x30 </TASK> This is because when the log writer starts a writeback for segment summary blocks or a super root block that use the backing device's page cache, it does not wait for the ongoing folio/page writeback, resulting in an inconsistent writeback state. Fix this issue by waiting for ongoing writebacks when putting folios/pages on the backing device into writeback state. Link: https://lkml.kernel.org/r/20240530141556.4411-1-konishi.ryusuke@gmail.com Fixes: 9ff05123e3bf ("nilfs2: segment constructor") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05memcg: remove the lockdep assert from __mod_objcg_mlstate()Sebastian Andrzej Siewior
The assert was introduced in the commit cited below as an insurance that the semantic is the same after the local_irq_save() has been removed and the function has been made static. The original requirement to disable interrupt was due the modification of per-CPU counters which require interrupts to be disabled because the counter update operation is not atomic and some of the counters are updated from interrupt context. All callers of __mod_objcg_mlstate() acquire a lock (memcg_stock.stock_lock) which disables interrupts on !PREEMPT_RT and the lockdep assert is satisfied. On PREEMPT_RT the interrupts are not disabled and the assert triggers. The safety of the counter update is already ensured by VM_WARN_ON_IRQS_ENABLED() which is part of __mod_memcg_lruvec_state() and does not require yet another check. Remove the lockdep assert from __mod_objcg_mlstate(). Link: https://lkml.kernel.org/r/20240528141341.rz_rytN_@linutronix.de Fixes: 91882c1617c1 ("memcg: simple cleanup of stats update functions") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05mm: arm64: fix the out-of-bounds issue in contpte_clear_young_dirty_ptesBarry Song
We are passing a huge nr to __clear_young_dirty_ptes() right now. While we should pass the number of pages, we are actually passing CONT_PTE_SIZE. This is causing lots of crashes of MADV_FREE, panic oops could vary everytime. Link: https://lkml.kernel.org/r/20240524005444.135417-1-21cnbao@gmail.com Fixes: 89e86854fb0a ("mm/arm64: override clear_young_dirty_ptes() batch helper") Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Lance Yang <ioworker0@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Chris Li <chrisl@kernel.org> Cc: Barry Song <21cnbao@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Jeff Xie <xiehuan09@gmail.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Peter Xu <peterx@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05mm: huge_mm: fix undefined reference to `mthp_stats' for CONFIG_SYSFS=nBarry Song
if CONFIG_SYSFS is not enabled in config, we get the below error, All errors (new ones prefixed by >>): s390-linux-ld: mm/memory.o: in function `count_mthp_stat': >> include/linux/huge_mm.h:285:(.text+0x191c): undefined reference to `mthp_stats' s390-linux-ld: mm/huge_memory.o:(.rodata+0x10): undefined reference to `mthp_stats' vim +285 include/linux/huge_mm.h 279 280 static inline void count_mthp_stat(int order, enum mthp_stat_item item) 281 { 282 if (order <= 0 || order > PMD_ORDER) 283 return; 284 > 285 this_cpu_inc(mthp_stats.stats[order][item]); 286 } 287 Link: https://lkml.kernel.org/r/20240523210045.40444-1-21cnbao@gmail.com Fixes: ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405231728.tCAogiSI-lkp@intel.com/ Signed-off-by: Barry Song <v-songbaohua@oppo.com> Tested-by: Yujie Liu <yujie.liu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05mm: drop the 'anon_' prefix for swap-out mTHP countersBaolin Wang
The mTHP swap related counters: 'anon_swpout' and 'anon_swpout_fallback' are confusing with an 'anon_' prefix, since the shmem can swap out non-anonymous pages. So drop the 'anon_' prefix to keep consistent with the old swap counter names. This is needed in 6.10-rcX to avoid having an inconsistent ABI out in the field. Link: https://lkml.kernel.org/r/7a8989c13299920d7589007a30065c3e2c19f0e0.1716431702.git.baolin.wang@linux.alibaba.com Fixes: d0f048ac39f6 ("mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters") Fixes: 42248b9d34ea ("mm: add docs for per-order mTHP counters and transhuge_page ABI") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Suggested-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05drm/vmwgfx: Don't destroy Screen Target when CRTC is enabled but inactiveIan Forbes
drm_crtc_helper_funcs::atomic_disable can be called even when the CRTC is still enabled. This can occur when the mode changes or the CRTC is set as inactive. In the case where the CRTC is being set as inactive we only want to blank the screen. The Screen Target should remain intact as long as the mode has not changed and CRTC is enabled. This fixes a bug with GDM where locking the screen results in a permanent black screen because the Screen Target is no longer defined. Fixes: 7b0062036c3b ("drm/vmwgfx: Implement virtual crc generation") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Reviewed-by: Martin Krastev <martin.krastev@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240531203358.26677-1-ian.forbes@broadcom.com
2024-06-05drm/vmwgfx: Standardize use of kibibytes when loggingIan Forbes
Use the same standard abbreviation KiB instead of incorrect variants. Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-5-ian.forbes@broadcom.com
2024-06-05drm/vmwgfx: Remove STDU logic from generic mode_valid functionIan Forbes
STDU has its own mode_valid function now so this logic can be removed from the generic version. Fixes: 935f795045a6 ("drm/vmwgfx: Refactor drm connector probing for display modes") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-4-ian.forbes@broadcom.com
2024-06-05drm/vmwgfx: 3D disabled should not effect STDU memory limitsIan Forbes
This limit became a hard cap starting with the change referenced below. Surface creation on the device will fail if the requested size is larger than this limit so altering the value arbitrarily will expose modes that are too large for the device's hard limits. Fixes: 7ebb47c9f9ab ("drm/vmwgfx: Read new register for GB memory when available") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-3-ian.forbes@broadcom.com
2024-06-05drm/vmwgfx: Filter modes which exceed graphics memoryIan Forbes
SVGA requires individual surfaces to fit within graphics memory (max_mob_pages) which means that modes with a final buffer size that would exceed graphics memory must be pruned otherwise creation will fail. Additionally llvmpipe requires its buffer height and width to be a multiple of its tile size which is 64. As a result we have to anticipate that llvmpipe will round up the mode size passed to it by the compositor when it creates buffers and filter modes where this rounding exceeds graphics memory. This fixes an issue where VMs with low graphics memory (< 64MiB) configured with high resolution mode boot to a black screen because surface creation fails. Fixes: d947d1b71deb ("drm/vmwgfx: Add and connect connector helper function") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-2-ian.forbes@broadcom.com
2024-06-05Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-06-05 We've added 8 non-merge commits during the last 6 day(s) which contain a total of 9 files changed, 34 insertions(+), 35 deletions(-). The main changes are: 1) Fix a potential use-after-free in bpf_link_free when the link uses dealloc_deferred to free the link object but later still tests for presence of link->ops->dealloc, from Cong Wang. 2) Fix BPF test infra to set the run context for rawtp test_run callback where syzbot reported a crash, from Jiri Olsa. 3) Fix bpf_session_cookie BTF_ID in the special_kfunc_set list to exclude it for the case of !CONFIG_FPROBE, also from Jiri Olsa. 4) Fix a Coverity static analysis report to not close() a link_fd of -1 in the multi-uprobe feature detector, from Andrii Nakryiko. 5) Revert support for redirect to any xsk socket bound to the same umem as it can result in corrupted ring state which can lead to a crash when flushing rings. A different approach will be pursued for bpf-next to address it safely, from Magnus Karlsson. 6) Fix inet_csk_accept prototype in test_sk_storage_tracing.c which caused BPF CI failure after the last tree fast forwarding, from Andrii Nakryiko. 7) Fix a coccicheck warning in BPF devmap that iterator variable cannot be NULL, from Thorsten Blum. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: Revert "xsk: Document ability to redirect to any socket bound to the same umem" Revert "xsk: Support redirect to any socket bound to the same umem" bpf: Set run context for rawtp test_run callback bpf: Fix a potential use-after-free in bpf_link_free() bpf, devmap: Remove unnecessary if check in for loop libbpf: don't close(-1) in multi-uprobe feature detector bpf: Fix bpf_session_cookie BTF_ID in special_kfunc_set list selftests/bpf: fix inet_csk_accept prototype in test_sk_storage_tracing.c ==================== Link: https://lore.kernel.org/r/20240605091525.22628-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-06Merge tag 'drm-xe-fixes-2024-06-04' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - drm/xe/pf: Update the LMTT when freeing VF GT config Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Zl8uFrQp0YjTtX4p@fedora
2024-06-05ptp: Fix error message on failed pin verificationKarol Kolacinski
On failed verification of PTP clock pin, error message prints channel number instead of pin index after "pin", which is incorrect. Fix error message by adding channel number to the message and printing pin number instead of channel number. Fixes: 6092315dfdec ("ptp: introduce programmable pins.") Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://lore.kernel.org/r/20240604120555.16643-1-karol.kolacinski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05net/sched: taprio: always validate TCA_TAPRIO_ATTR_PRIOMAPEric Dumazet
If one TCA_TAPRIO_ATTR_PRIOMAP attribute has been provided, taprio_parse_mqprio_opt() must validate it, or userspace can inject arbitrary data to the kernel, the second time taprio_change() is called. First call (with valid attributes) sets dev->num_tc to a non zero value. Second call (with arbitrary mqprio attributes) returns early from taprio_parse_mqprio_opt() and bad things can happen. Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule") Reported-by: Noam Rathaus <noamr@ssd-disclosure.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20240604181511.769870-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05Merge tag 'thermal-6.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fixes from Rafael Wysocki: "Fix issues related to the handling of invalid trip points in the thermal core and in the thermal debug code that have been overlooked by some recent thermal control core changes" * tag 'thermal-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: trip: Trigger trip down notifications when trips involved in mitigation become invalid thermal: core: Introduce thermal_trip_crossed() thermal/debugfs: Allow tze_seq_show() to print statistics for invalid trips thermal/debugfs: Print initial trip temperature and hysteresis in tze_seq_show()
2024-06-05Merge tag 'acpi-6.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix the ACPI EC and AC drivers, the ACPI APEI error injection driver and build issues related to the dev_is_pnp() macro referring to pnp_bus_type that is not exported to modules. Specifics: - Fix error handling during EC operation region accesses in the ACPI EC driver (Armin Wolf) - Fix a memory leak in the APEI error injection driver introduced during its converion to a platform driver (Dan Williams) - Fix build failures related to the dev_is_pnp() macro by redefining it as a proper function and exporting it to modules as appropriate and unexport pnp_bus_type which need not be exported any more (Andy Shevchenko) - Update the ACPI AC driver to use power_supply_changed() to let the power supply core handle configuration changes properly (Thomas Weißschuh)" * tag 'acpi-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: AC: Properly notify powermanagement core about changes PNP: Hide pnp_bus_type from the non-PNP code PNP: Make dev_is_pnp() to be a function and export it for modules ACPI: EC: Avoid returning AE_OK on errors in address space handler ACPI: EC: Abort address space access upon error ACPI: APEI: EINJ: Fix einj_dev release leak
2024-06-05Merge tag 'pm-6.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix the intel_pstate and amd-pstate cpufreq drivers and the cpupower utility. Specifics: - Fix a recently introduced unchecked HWP MSR access in the intel_pstate driver (Srinivas Pandruvada) - Add missing conversion from MHz to KHz to amd_pstate_set_boost() to address sysfs inteface inconsistency and fix P-state frequency reporting on AMD Family 1Ah CPUs in the cpupower utility (Dhananjay Ugwekar) - Get rid of an excess global header file used by the amd-pstate cpufreq driver (Arnd Bergmann)" * tag 'pm-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: intel_pstate: Fix unchecked HWP MSR access cpufreq: amd-pstate: Fix the inconsistency in max frequency units cpufreq: amd-pstate: remove global header file tools/power/cpupower: Fix Pstate frequency reporting on AMD Family 1Ah CPUs
2024-06-05net/mlx5: Fix tainted pointer delete is case of flow rules creation failAleksandr Mishin
In case of flow rule creation fail in mlx5_lag_create_port_sel_table(), instead of previously created rules, the tainted pointer is deleted deveral times. Fix this bug by using correct flow rules pointers. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 352899f384d4 ("net/mlx5: Lag, use buckets in hash mode") Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240604100552.25201-1-amishin@t-argos.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05Merge tag 'for-6.10-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A fix for fast fsync that needs to handle errors during writes after some COW failure so it does not lead to an inconsistent state" * tag 'for-6.10-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: ensure fast fsync waits for ordered extents after a write failure
2024-06-05Merge tag 'bcachefs-2024-06-05' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Just a few small fixes" * tag 'bcachefs-2024-06-05' of https://evilpiepirate.org/git/bcachefs: bcachefs: Fix trans->locked assert bcachefs: Rereplicate now moves data off of durability=0 devices bcachefs: Fix GFP_KERNEL allocation in break_cycle()
2024-06-05Merge tag 'nvme-6.10-2024-06-05' of git://git.infradead.org/nvme into block-6.10Jens Axboe
Pull NVMe fixes from Keith: "nvme fixes Linux 6.10 - Use reserved tags for special fabrics operations (Chunguang) - Persistent Reservation status masking fix (Weiwen)" * tag 'nvme-6.10-2024-06-05' of git://git.infradead.org/nvme: nvme: fix nvme_pr_* status code parsing nvme-fabrics: use reserved tag for reg read/write command
2024-06-05null_blk: fix validation of block sizeAndreas Hindborg
Block size should be between 512 and PAGE_SIZE and be a power of 2. The current check does not validate this, so update the check. Without this patch, null_blk would Oops due to a null pointer deref when loaded with bs=1536 [1]. Link: https://lore.kernel.org/all/87wmn8mocd.fsf@metaspace.dk/ Signed-off-by: Andreas Hindborg <a.hindborg@samsung.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240603192645.977968-1-nmi@metaspace.dk [axboe: remove unnecessary braces and != 0 check] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-05drm/amdgpu/pptable: Fix UBSAN array-index-out-of-boundsTasos Sahanidis
Flexible arrays used [1] instead of []. Replace the former with the latter to resolve multiple UBSAN warnings observed on boot with a BONAIRE card. In addition, use the __counted_by attribute where possible to hint the length of the arrays to the compiler and any sanitizers. Signed-off-by: Tasos Sahanidis <tasos@tasossah.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05drm/amd: Fix shutdown (again) on some SMU v13.0.4/11 platformsMario Limonciello
commit cd94d1b182d2 ("dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users") attempted to fix shutdown issues that were reported since commit 31729e8c21ec ("drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11") but caused issues for some people. Adjust the workaround flow to properly only apply in the S4 case: -> For shutdown go through SMU_MSG_PrepareMp1ForUnload -> For S4 go through SMU_MSG_GfxDeviceDriverReset and SMU_MSG_PrepareMp1ForUnload Reported-and-tested-by: lectrode <electrodexsnet@gmail.com> Closes: https://github.com/void-linux/void-packages/issues/50417 Cc: stable@vger.kernel.org Fixes: cd94d1b182d2 ("dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users") Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05Merge tag 'i2c-for-6.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "This should have been my second pull request during the merge window but one dependency in the drm subsystem fell through the cracks and was only applied for rc2. Now we can finally remove I2C_CLASS_SPD" * tag 'i2c-for-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: Remove I2C_CLASS_SPD i2c: synquacer: Remove a clk reference from struct synquacer_i2c
2024-06-05Merge tag 'tpmdd-next-6.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm fixes from Jarkko Sakkinen: "The bug fix for tpm_tis_core_init() is not that critical but still makes sense to get into release for the sake of better quality. I included the Intel CPU model define change mainly to help Tony just a bit, as for this subsystem it cannot realistically speaking cause any possible harm" * tag 'tpmdd-next-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm: Switch to new Intel CPU model defines tpm_tis: Do *not* flush uninitialized work
2024-06-05btrfs: fix leak of qgroup extent records after transaction abortFilipe Manana
Qgroup extent records are created when delayed ref heads are created and then released after accounting extents at btrfs_qgroup_account_extents(), called during the transaction commit path. If a transaction is aborted we free the qgroup records by calling btrfs_qgroup_destroy_extent_records() at btrfs_destroy_delayed_refs(), unless we don't have delayed references. We are incorrectly assuming that no delayed references means we don't have qgroup extents records. We can currently have no delayed references because we ran them all during a transaction commit and the transaction was aborted after that due to some error in the commit path. So fix this by ensuring we btrfs_qgroup_destroy_extent_records() at btrfs_destroy_delayed_refs() even if we don't have any delayed references. Reported-by: syzbot+0fecc032fa134afd49df@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/0000000000004e7f980619f91835@google.com/ Fixes: 81f7eb00ff5b ("btrfs: destroy qgroup extent records on transaction abort") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-05btrfs: fix crash on racing fsync and size-extending write into preallocOmar Sandoval
We have been seeing crashes on duplicate keys in btrfs_set_item_key_safe(): BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192) ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:2620! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs] With the following stack trace: #0 btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4) #1 btrfs_drop_extents (fs/btrfs/file.c:411:4) #2 log_one_extent (fs/btrfs/tree-log.c:4732:9) #3 btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9) #4 btrfs_log_inode (fs/btrfs/tree-log.c:6626:9) #5 btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8) #6 btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8) #7 btrfs_sync_file (fs/btrfs/file.c:1933:8) #8 vfs_fsync_range (fs/sync.c:188:9) #9 vfs_fsync (fs/sync.c:202:9) #10 do_fsync (fs/sync.c:212:9) #11 __do_sys_fdatasync (fs/sync.c:225:9) #12 __se_sys_fdatasync (fs/sync.c:223:1) #13 __x64_sys_fdatasync (fs/sync.c:223:1) #14 do_syscall_x64 (arch/x86/entry/common.c:52:14) #15 do_syscall_64 (arch/x86/entry/common.c:83:7) #16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121) So we're logging a changed extent from fsync, which is splitting an extent in the log tree. But this split part already exists in the tree, triggering the BUG(). This is the state of the log tree at the time of the crash, dumped with drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py) to get more details than btrfs_print_leaf() gives us: >>> print_extent_buffer(prog.crashed_thread().stack_trace()[0]["eb"]) leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610 leaf 33439744 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160 generation 7 transid 9 size 8192 nbytes 8473563889606862198 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 204 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417704.983333333 (2024-05-22 15:41:44) mtime 1716417704.983333333 (2024-05-22 15:41:44) otime 17592186044416.000000000 (559444-03-08 01:40:16) item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13 index 195 namelen 3 name: 193 item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 4096 ram 12288 extent compression 0 (none) item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 4096 nr 8192 item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 ... So the real problem happened earlier: notice that items 4 (4k-12k) and 5 (8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and item 5 starts at i_size. Here is the state of the filesystem tree at the time of the crash: >>> root = prog.crashed_thread().stack_trace()[2]["inode"].root >>> ret, nodes, slots = btrfs_search_slot(root, BtrfsKey(450, 0, 0)) >>> print_extent_buffer(nodes[0]) leaf 30425088 level 0 items 184 generation 9 owner 5 leaf 30425088 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da ... item 179 key (450 INODE_ITEM 0) itemoff 4907 itemsize 160 generation 7 transid 7 size 4096 nbytes 12288 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 6 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417703.220000000 (2024-05-22 15:41:43) mtime 1716417703.220000000 (2024-05-22 15:41:43) otime 1716417703.220000000 (2024-05-22 15:41:43) item 180 key (450 INODE_REF 256) itemoff 4894 itemsize 13 index 195 namelen 3 name: 193 item 181 key (450 XATTR_ITEM 1640047104) itemoff 4857 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 182 key (450 EXTENT_DATA 0) itemoff 4804 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 8192 ram 12288 extent compression 0 (none) item 183 key (450 EXTENT_DATA 8192) itemoff 4751 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 Item 5 in the log tree corresponds to item 183 in the filesystem tree, but nothing matches item 4. Furthermore, item 183 is the last item in the leaf. btrfs_log_prealloc_extents() is responsible for logging prealloc extents beyond i_size. It first truncates any previously logged prealloc extents that start beyond i_size. Then, it walks the filesystem tree and copies the prealloc extent items to the log tree. If it hits the end of a leaf, then it calls btrfs_next_leaf(), which unlocks the tree and does another search. However, while the filesystem tree is unlocked, an ordered extent completion may modify the tree. In particular, it may insert an extent item that overlaps with an extent item that was already copied to the log tree. This may manifest in several ways depending on the exact scenario, including an EEXIST error that is silently translated to a full sync, overlapping items in the log tree, or this crash. This particular crash is triggered by the following sequence of events: - Initially, the file has i_size=4k, a regular extent from 0-4k, and a prealloc extent beyond i_size from 4k-12k. The prealloc extent item is the last item in its B-tree leaf. - The file is fsync'd, which copies its inode item and both extent items to the log tree. - An xattr is set on the file, which sets the BTRFS_INODE_COPY_EVERYTHING flag. - The range 4k-8k in the file is written using direct I/O. i_size is extended to 8k, but the ordered extent is still in flight. - The file is fsync'd. Since BTRFS_INODE_COPY_EVERYTHING is set, this calls copy_inode_items_to_log(), which calls btrfs_log_prealloc_extents(). - btrfs_log_prealloc_extents() finds the 4k-12k prealloc extent in the filesystem tree. Since it starts before i_size, it skips it. Since it is the last item in its B-tree leaf, it calls btrfs_next_leaf(). - btrfs_next_leaf() unlocks the path. - The ordered extent completion runs, which converts the 4k-8k part of the prealloc extent to written and inserts the remaining prealloc part from 8k-12k. - btrfs_next_leaf() does a search and finds the new prealloc extent 8k-12k. - btrfs_log_prealloc_extents() copies the 8k-12k prealloc extent into the log tree. Note that it overlaps with the 4k-12k prealloc extent that was copied to the log tree by the first fsync. - fsync calls btrfs_log_changed_extents(), which tries to log the 4k-8k extent that was written. - This tries to drop the range 4k-8k in the log tree, which requires adjusting the start of the 4k-12k prealloc extent in the log tree to 8k. - btrfs_set_item_key_safe() sees that there is already an extent starting at 8k in the log tree and calls BUG(). Fix this by detecting when we're about to insert an overlapping file extent item in the log tree and truncating the part that would overlap. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "This is dominated by a couple large series for ARM and x86 respectively, but apart from that things are calm. ARM: - Large set of FP/SVE fixes for pKVM, addressing the fallout from the per-CPU data rework and making sure that the host is not involved in the FP/SVE switching any more - Allow FEAT_BTI to be enabled with NV now that FEAT_PAUTH is completely supported - Fix for the respective priorities of Failed PAC, Illegal Execution state and Instruction Abort exceptions - Fix the handling of AArch32 instruction traps failing their condition code, which was broken by the introduction of ESR_EL2.ISS2 - Allow vcpus running in AArch32 state to be restored in System mode - Fix AArch32 GPR restore that would lose the 64 bit state under some conditions RISC-V: - No need to use mask when hart-index-bits is 0 - Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext() x86: - Fixes and debugging help for the #VE sanity check. Also disable it by default, even for CONFIG_DEBUG_KERNEL, because it was found to trigger spuriously (most likely a processor erratum as the exact symptoms vary by generation). - Avoid WARN() when two NMIs arrive simultaneously during an NMI-disabled situation (GIF=0 or interrupt shadow) when the processor supports virtual NMI. While generally KVM will not request an NMI window when virtual NMIs are supported, in this case it *does* have to single-step over the interrupt shadow or enable the STGI intercept, in order to deliver the latched second NMI. - Drop support for hand tuning APIC timer advancement from userspace. Since we have adaptive tuning, and it has proved to work well, drop the module parameter for manual configuration and with it a few stupid bugs that it had" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (32 commits) KVM: x86/mmu: Don't save mmu_invalidate_seq after checking private attr KVM: arm64: Ensure that SME controls are disabled in protected mode KVM: arm64: Refactor CPACR trap bit setting/clearing to use ELx format KVM: arm64: Consolidate initializing the host data's fpsimd_state/sve in pKVM KVM: arm64: Eagerly restore host fpsimd/sve state in pKVM KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVM KVM: arm64: Specialize handling of host fpsimd state on trap KVM: arm64: Abstract set/clear of CPTR_EL2 bits behind helper KVM: arm64: Fix prototype for __sve_save_state/__sve_restore_state KVM: arm64: Reintroduce __sve_save_state KVM: x86: Drop support for hand tuning APIC timer advancement from userspace KVM: SEV-ES: Delegate LBR virtualization to the processor KVM: SEV-ES: Disallow SEV-ES guests when X86_FEATURE_LBRV is absent KVM: SEV-ES: Prevent MSR access post VMSA encryption RISC-V: KVM: Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext function RISC-V: KVM: No need to use mask when hart-index-bit is 0 KVM: arm64: nv: Expose BTI and CSV_frac to a guest hypervisor KVM: arm64: nv: Fix relative priorities of exceptions generated by ERETAx KVM: arm64: AArch32: Fix spurious trapping of conditional instructions KVM: arm64: Allow AArch32 PSTATE.M to be restored as System mode ...
2024-06-05Merge branch 'pm-cpufreq'Rafael J. Wysocki
Merge cpufreq fixes for 6.10-rc3: - Fix a recently introduced unchecked HWP MSR access in the intel_pstate driver (Srinivas Pandruvada). - Add missing conversion from MHz to KHz to amd_pstate_set_boost() to address sysfs inteface inconsistency (Dhananjay Ugwekar). - Get rid of an excess global header file used by the amd-pstate cpufreq driver (Arnd Bergmann). * pm-cpufreq: cpufreq: intel_pstate: Fix unchecked HWP MSR access cpufreq: amd-pstate: Fix the inconsistency in max frequency units cpufreq: amd-pstate: remove global header file
2024-06-05KVM: s390x: selftests: Add shared zeropage testDavid Hildenbrand
Let's test that we can have shared zeropages in our process as long as storage keys are not getting used, that shared zeropages are properly unshared (replaced by anonymous pages) once storage keys are enabled, and that no new shared zeropages are populated after storage keys were enabled. We require the new pagemap interface to detect the shared zeropage. On an old kernel (zeropages always disabled): # ./s390x/shared_zeropage_test TAP version 13 1..3 not ok 1 Shared zeropages should be enabled ok 2 Shared zeropage should be gone ok 3 Shared zeropages should be disabled # Totals: pass:2 fail:1 xfail:0 xpass:0 skip:0 error:0 On a fixed kernel: # ./s390x/shared_zeropage_test TAP version 13 1..3 ok 1 Shared zeropages should be enabled ok 2 Shared zeropage should be gone ok 3 Shared zeropages should be disabled # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0 Testing of UFFDIO_ZEROPAGE can be added later. [ agordeev: Fixed checkpatch complaint, added ucall_common.h include ] Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Thomas Huth <thuth@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Tested-by: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/r/20240412084329.30315-1-david@redhat.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-06-05s390/crash: Do not use VM info if os_info does not have itAlexander Gordeev
The virtual memory information stored in os_info area is required for creation of the kernel image PT_LOAD program header for kernels since commit a2ec5bec56dd ("s390/mm: uncouple physical vs virtual address spaces"). By contrast, if such information in os_info is absent the PT_LOAD program header should not be created. Currently the proper PT_LOAD program header is created for kernels that contain the virtual memory information, but for kernels without one an invalid header of zero size is created. That in turn leads to stand-alone dump failures. Use OS_INFO_KASLR_OFFSET variable to check whether os_info is present or not (same as crash and makedumpfile tools do) and based on that create or do not create the kernel image PT_LOAD program header. Fixes: f4cac27dc0d6 ("s390/crash: Use old os_info to create PT_LOAD headers") Tested-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Acked-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-06-05Merge branches 'acpi-ec', 'acpi-apei' and 'pnp'Rafael J. Wysocki
Merge ACPI EC driver fixes, an ACPI APEI fix and PNP fixes for 6.10-rc3: - Fix error handling during EC operation region accesses in the ACPI EC driver (Armin Wolf). - Fix a memory leak in the APEI error injection driver introduced during its converion to a platform driver (Dan Williams). - Fix build failures related to the dev_is_pnp() macro by redefining it as a proper function and exporting it to modules as appropriate and unexport pnp_bus_type which need not be exported any more (Andy Shevchenko). * acpi-ec: ACPI: EC: Avoid returning AE_OK on errors in address space handler ACPI: EC: Abort address space access upon error * acpi-apei: ACPI: APEI: EINJ: Fix einj_dev release leak * pnp: PNP: Hide pnp_bus_type from the non-PNP code PNP: Make dev_is_pnp() to be a function and export it for modules
2024-06-05bcachefs: Fix trans->locked assertKent Overstreet
in bch2_move_data_btree, we might start with the trans unlocked from a previous loop iteration - we need a trans_begin() before iter_init(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>