summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2024-03-11Merge tag 'timers-core-2024-03-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "A large set of updates and features for timers and timekeeping: - The hierarchical timer pull model When timer wheel timers are armed they are placed into the timer wheel of a CPU which is likely to be busy at the time of expiry. This is done to avoid wakeups on potentially idle CPUs. This is wrong in several aspects: 1) The heuristics to select the target CPU are wrong by definition as the chance to get the prediction right is close to zero. 2) Due to #1 it is possible that timers are accumulated on a single target CPU 3) The required computation in the enqueue path is just overhead for dubious value especially under the consideration that the vast majority of timer wheel timers are either canceled or rearmed before they expire. The timer pull model avoids the above by removing the target computation on enqueue and queueing timers always on the CPU on which they get armed. This is achieved by having separate wheels for CPU pinned timers and global timers which do not care about where they expire. As long as a CPU is busy it handles both the pinned and the global timers which are queued on the CPU local timer wheels. When a CPU goes idle it evaluates its own timer wheels: - If the first expiring timer is a pinned timer, then the global timers can be ignored as the CPU will wake up before they expire. - If the first expiring timer is a global timer, then the expiry time is propagated into the timer pull hierarchy and the CPU makes sure to wake up for the first pinned timer. The timer pull hierarchy organizes CPUs in groups of eight at the lowest level and at the next levels groups of eight groups up to the point where no further aggregation of groups is required, i.e. the number of levels is log8(NR_CPUS). The magic number of eight has been established by experimention, but can be adjusted if needed. In each group one busy CPU acts as the migrator. It's only one CPU to avoid lock contention on remote timer wheels. The migrator CPU checks in its own timer wheel handling whether there are other CPUs in the group which have gone idle and have global timers to expire. If there are global timers to expire, the migrator locks the remote CPU timer wheel and handles the expiry. Depending on the group level in the hierarchy this handling can require to walk the hierarchy downwards to the CPU level. Special care is taken when the last CPU goes idle. At this point the CPU is the systemwide migrator at the top of the hierarchy and it therefore cannot delegate to the hierarchy. It needs to arm its own timer device to expire either at the first expiring timer in the hierarchy or at the first CPU local timer, which ever expires first. This completely removes the overhead from the enqueue path, which is e.g. for networking a true hotpath and trades it for a slightly more complex idle path. This has been in development for a couple of years and the final series has been extensively tested by various teams from silicon vendors and ran through extensive CI. There have been slight performance improvements observed on network centric workloads and an Intel team confirmed that this allows them to power down a die completely on a mult-die socket for the first time in a mostly idle scenario. There is only one outstanding ~1.5% regression on a specific overloaded netperf test which is currently investigated, but the rest is either positive or neutral performance wise and positive on the power management side. - Fixes for the timekeeping interpolation code for cross-timestamps: cross-timestamps are used for PTP to get snapshots from hardware timers and interpolated them back to clock MONOTONIC. The changes address a few corner cases in the interpolation code which got the math and logic wrong. - Simplifcation of the clocksource watchdog retry logic to automatically adjust to handle larger systems correctly instead of having more incomprehensible command line parameters. - Treewide consolidation of the VDSO data structures. - The usual small improvements and cleanups all over the place" * tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits) timer/migration: Fix quick check reporting late expiry tick/sched: Fix build failure for CONFIG_NO_HZ_COMMON=n vdso/datapage: Quick fix - use asm/page-def.h for ARM64 timers: Assert no next dyntick timer look-up while CPU is offline tick: Assume timekeeping is correctly handed over upon last offline idle call tick: Shut down low-res tick from dying CPU tick: Split nohz and highres features from nohz_mode tick: Move individual bit features to debuggable mask accesses tick: Move got_idle_tick away from common flags tick: Assume the tick can't be stopped in NOHZ_MODE_INACTIVE mode tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING tick: Move tick cancellation up to CPUHP_AP_TICK_DYING tick: Start centralizing tick related CPU hotplug operations tick/sched: Don't clear ts::next_tick again in can_stop_idle_tick() tick/sched: Rename tick_nohz_stop_sched_tick() to tick_nohz_full_stop_tick() tick: Use IS_ENABLED() whenever possible tick/sched: Remove useless oneshot ifdeffery tick/nohz: Remove duplicate between lowres and highres handlers tick/nohz: Remove duplicate between tick_nohz_switch_to_nohz() and tick_setup_sched_timer() hrtimer: Select housekeeping CPU during migration ...
2024-03-11selftests: forwarding: Add a test for NH group statsPetr Machata
Add to lib.sh support for fetching NH stats, and a new library, router_mpath_nh_lib.sh, with the common code for testing NH stats. Use the latter from router_mpath_nh.sh and router_mpath_nh_res.sh. The test works by sending traffic through a NH group, and checking that the reported values correspond to what the link that ultimately receives the traffic reports having seen. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/2a424c54062a5f1efd13b9ec5b2b0e29c6af2574.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11tools: ynl-gen: support using pre-defined values in attr checksHangbin Liu
Support using pre-defined values in checks so we don't need to use hard code number for the string, binary length. e.g. we have a definition like #define TEAM_STRING_MAX_LEN 32 Which defined in yaml like: definitions: - name: string-max-len type: const value: 32 It can be used in the attribute-sets like attribute-sets: - name: attr-option name-prefix: team-attr-option- attributes: - name: name type: string checks: len: string-max-len With this patch it will be converted to [TEAM_ATTR_OPTION_NAME] = { .type = NLA_STRING, .len = TEAM_STRING_MAX_LEN, } Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240311140727.109562-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11Merge tag 'wq-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue updates from Tejun Heo: "This cycle, a lot of workqueue changes including some that are significant and invasive. - During v6.6 cycle, unbound workqueues were updated so that they are more topology aware and flexible, which among other things improved workqueue behavior on modern multi-L3 CPUs. In the process, commit 636b927eba5b ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues") switched unbound workqueues to use per-CPU frontend pool_workqueues as a part of increasing front-back mapping flexibility. An unwelcome side effect of this change was that this made max concurrency enforcement per-CPU blowing up the maximum number of allowed concurrent executions. I incorrectly assumed that this wouldn't cause practical problems as most unbound workqueue users are self-regulate max concurrency; however, there definitely are which don't (e.g. on IO paths) and the drastic increase in the allowed max concurrency led to noticeable perf regressions in some use cases. This is now addressed by separating out max concurrency enforcement to a separate struct - wq_node_nr_active - which makes @max_active consistently mean system-wide max concurrency regardless of the number of CPUs or (finally) NUMA nodes. This is a rather invasive and, in places, a bit clunky; however, the clunkiness rises from the the inherent requirement to handle the disagreement between the execution locality domain and max concurrency enforcement domain on some modern machines. See commit 5797b1c18919 ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") for more details. - BH workqueue support is added. They are similar to per-CPU workqueues but execute work items in the softirq context. This is expected to replace tasklet. However, currently, it's missing the ability to disable and enable work items which is needed to convert many tasklet users. To avoid crowding this merge window too much, this will be included in the next merge window. A separate pull request will be sent for the couple conversion patches that are currently pending. - Waiman plugged a long-standing hole in workqueue CPU isolation where ordered workqueues didn't follow wq_unbound_cpumask updates. Ordered workqueues now follow the same rules as other unbound workqueues. - More CPU isolation improvements: Juri fixed another deficit in workqueue isolation where unbound rescuers don't respect wq_unbound_cpumask. Leonardo fixed delayed_work timers firing on isolated CPUs. - Other misc changes" * tag 'wq-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (54 commits) workqueue: Drain BH work items on hot-unplugged CPUs workqueue: Introduce from_work() helper for cleaner callback declarations workqueue: Control intensive warning threshold through cmdline workqueue: Make @flags handling consistent across set_work_data() and friends workqueue: Remove clear_work_data() workqueue: Factor out work_grab_pending() from __cancel_work_sync() workqueue: Clean up enum work_bits and related constants workqueue: Introduce work_cancel_flags workqueue: Use variable name irq_flags for saving local irq flags workqueue: Reorganize flush and cancel[_sync] functions workqueue: Rename __cancel_work_timer() to __cancel_timer_sync() workqueue: Use rcu_read_lock_any_held() instead of rcu_read_lock_held() workqueue: Cosmetic changes workqueue, irq_work: Build fix for !CONFIG_IRQ_WORK workqueue: Fix queue_work_on() with BH workqueues async: Use a dedicated unbound workqueue with raised min_active workqueue: Implement workqueue_set_min_active() workqueue: Fix kernel-doc comment of unplug_oldest_pwq() workqueue: Bind unbound workqueue rescuer to wq_unbound_cpumask kernel/workqueue: Let rescuers follow unbound wq cpumask changes ...
2024-03-11Merge tag 'vfs-6.9.pidfd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pdfd updates from Christian Brauner: - Until now pidfds could only be created for thread-group leaders but not for threads. There was no technical reason for this. We simply had no users that needed support for this. Now we do have users that need support for this. This introduces a new PIDFD_THREAD flag for pidfd_open(). If that flag is set pidfd_open() creates a pidfd that refers to a specific thread. In addition, we now allow clone() and clone3() to be called with CLONE_PIDFD | CLONE_THREAD which wasn't possible before. A pidfd that refers to an individual thread differs from a pidfd that refers to a thread-group leader: (1) Pidfds are pollable. A task may poll a pidfd and get notified when the task has exited. For thread-group leader pidfds the polling task is woken if the thread-group is empty. In other words, if the thread-group leader task exits when there are still threads alive in its thread-group the polling task will not be woken when the thread-group leader exits but rather when the last thread in the thread-group exits. For thread-specific pidfds the polling task is woken if the thread exits. (2) Passing a thread-group leader pidfd to pidfd_send_signal() will generate thread-group directed signals like kill(2) does. Passing a thread-specific pidfd to pidfd_send_signal() will generate thread-specific signals like tgkill(2) does. The default scope of the signal is thus determined by the type of the pidfd. Since use-cases exist where the default scope of the provided pidfd needs to be overriden the following flags are added to pidfd_send_signal(): - PIDFD_SIGNAL_THREAD Send a thread-specific signal. - PIDFD_SIGNAL_THREAD_GROUP Send a thread-group directed signal. - PIDFD_SIGNAL_PROCESS_GROUP Send a process-group directed signal. The scope change will only work if the struct pid is actually used for this scope. For example, in order to send a thread-group directed signal the provided pidfd must be used as a thread-group leader and similarly for PIDFD_SIGNAL_PROCESS_GROUP the struct pid must be used as a process group leader. - Move pidfds from the anonymous inode infrastructure to a tiny pseudo filesystem. This will unblock further work that we weren't able to do simply because of the very justified limitations of anonymous inodes. Moving pidfds to a tiny pseudo filesystem allows for statx on pidfds to become useful for the first time. They can now be compared by inode number which are unique for the system lifetime. Instead of stashing struct pid in file->private_data we can now stash it in inode->i_private. This makes it possible to introduce concepts that operate on a process once all file descriptors have been closed. A concrete example is kill-on-last-close. Another side-effect is that file->private_data is now freed up for per-file options for pidfds. Now, each struct pid will refer to a different inode but the same struct pid will refer to the same inode if it's opened multiple times. In contrast to now where each struct pid refers to the same inode. The tiny pseudo filesystem is not visible anywhere in userspace exactly like e.g., pipefs and sockfs. There's no lookup, there's no complex inode operations, nothing. Dentries and inodes are always deleted when the last pidfd is closed. We allocate a new inode and dentry for each struct pid and we reuse that inode and dentry for all pidfds that refer to the same struct pid. The code is entirely optional and fairly small. If it's not selected we fallback to anonymous inodes. Heavily inspired by nsfs. The dentry and inode allocation mechanism is moved into generic infrastructure that is now shared between nsfs and pidfs. The path_from_stashed() helper must be provided with a stashing location, an inode number, a mount, and the private data that is supposed to be used and it will provide a path that can be passed to dentry_open(). The helper will try retrieve an existing dentry from the provided stashing location. If a valid dentry is found it is reused. If not a new one is allocated and we try to stash it in the provided location. If this fails we retry until we either find an existing dentry or the newly allocated dentry could be stashed. Subsequent openers of the same namespace or task are then able to reuse it. - Currently it is only possible to get notified when a task has exited, i.e., become a zombie and userspace gets notified with EPOLLIN. We now also support waiting until the task has been reaped, notifying userspace with EPOLLHUP. - Ensure that ESRCH is reported for getfd if a task is exiting instead of the confusing EBADF. - Various smaller cleanups to pidfd functions. * tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits) libfs: improve path_from_stashed() libfs: add stashed_dentry_prune() libfs: improve path_from_stashed() helper pidfs: convert to path_from_stashed() helper nsfs: convert to path_from_stashed() helper libfs: add path_from_stashed() pidfd: add pidfs pidfd: move struct pidfd_fops pidfd: allow to override signal scope in pidfd_send_signal() pidfd: change pidfd_send_signal() to respect PIDFD_THREAD signal: fill in si_code in prepare_kill_siginfo() selftests: add ESRCH tests for pidfd_getfd() pidfd: getfd should always report ESRCH if a task is exiting pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together pidfd: exit: kill the no longer used thread_group_exited() pidfd: change do_notify_pidfd() to use __wake_up(poll_to_key(EPOLLIN)) pid: kill the obsolete PIDTYPE_PID code in transfer_pid() pidfd: kill the no longer needed do_notify_pidfd() in de_thread() pidfd_poll: report POLLHUP when pid_task() == NULL pidfd: implement PIDFD_THREAD flag for pidfd_open() ...
2024-03-11Merge tag 'vfs-6.9.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Misc features, cleanups, and fixes for vfs and individual filesystems. Features: - Support idmapped mounts for hugetlbfs. - Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug where the passed offset is ignored if the file is O_APPEND. The new flag allows a caller to enforce that the offset is honored to conform to posix even if the file was opened in append mode. - Move i_mmap_rwsem in struct address_space to avoid false sharing between i_mmap and i_mmap_rwsem. - Convert efs, qnx4, and coda to use the new mount api. - Add a generic is_dot_dotdot() helper that's used by various filesystems and the VFS code instead of open-coding it multiple times. - Recently we've added stable offsets which allows stable ordering when iterating directories exported through NFS on e.g., tmpfs filesystems. Originally an xarray was used for the offset map but that caused slab fragmentation issues over time. This switches the offset map to the maple tree which has a dense mode that handles this scenario a lot better. Includes tests. - Finally merge the case-insensitive improvement series Gabriel has been working on for a long time. This cleanly propagates case insensitive operations through ->s_d_op which in turn allows us to remove the quite ugly generic_set_encrypted_ci_d_ops() operations. It also improves performance by trying a case-sensitive comparison first and then fallback to case-insensitive lookup if that fails. This also fixes a bug where overlayfs would be able to be mounted over a case insensitive directory which would lead to all sort of odd behaviors. Cleanups: - Make file_dentry() a simple accessor now that ->d_real() is simplified because of the backing file work we did the last two cycles. - Use the dedicated file_mnt_idmap helper in ntfs3. - Use smp_load_acquire/store_release() in the i_size_read/write helpers and thus remove the hack to handle i_size reads in the filemap code. - The SLAB_MEM_SPREAD is a nop now. Remove it from various places in fs/ - It's no longer necessary to perform a second built-in initramfs unpack call because we retain the contents of the previous extraction. Remove it. - Now that we have removed various allocators kfree_rcu() always works with kmem caches and kmalloc(). So simplify various places that only use an rcu callback in order to handle the kmem cache case. - Convert the pipe code to use a lockdep comparison function instead of open-coding the nesting making lockdep validation easier. - Move code into fs-writeback.c that was located in a header but can be made static as it's only used in that one file. - Rewrite the alignment checking iterators for iovec and bvec to be easier to read, and also significantly more compact in terms of generated code. This saves 270 bytes of text on x86-64 (with clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also saves a bit of time for the same workload. - Switch various places to use KMEM_CACHE instead of kmem_cache_create(). - Use inode_set_ctime_to_ts() in inode_set_ctime_current() - Use kzalloc() in name_to_handle_at() to avoid kernel infoleak. - Various smaller cleanups for eventfds. Fixes: - Fix various comments and typos, and unneeded initializations. - Fix stack allocation hack for clang in the select code. - Improve dump_mapping() debug code on a best-effort basis. - Fix build errors in various selftests. - Avoid wrap-around instrumentation in various places. - Don't allow user namespaces without an idmapping to be used for idmapped mounts. - Fix sysv sb_read() call. - Fix fallback implementation of the get_name() export operation" * tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits) hugetlbfs: support idmapped mounts qnx4: convert qnx4 to use the new mount api fs: use inode_set_ctime_to_ts to set inode ctime to current time libfs: Drop generic_set_encrypted_ci_d_ops ubifs: Configure dentry operations at dentry-creation time f2fs: Configure dentry operations at dentry-creation time ext4: Configure dentry operations at dentry-creation time libfs: Add helper to choose dentry operations at mount-time libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops fscrypt: Drop d_revalidate once the key is added fscrypt: Drop d_revalidate for valid dentries during lookup fscrypt: Factor out a helper to configure the lookup dentry ovl: Always reject mounting over case-insensitive directories libfs: Attempt exact-match comparison first during casefolded lookup efs: remove SLAB_MEM_SPREAD flag usage jfs: remove SLAB_MEM_SPREAD flag usage minix: remove SLAB_MEM_SPREAD flag usage openpromfs: remove SLAB_MEM_SPREAD flag usage proc: remove SLAB_MEM_SPREAD flag usage qnx6: remove SLAB_MEM_SPREAD flag usage ...
2024-03-11Merge tag 'linux_kselftest-kunit-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit updates from Shuah Khan: - fix to make kunit_bus_type const - kunit tool change to Print UML command - DRM device creation helpers are now using the new kunit device creation helpers. This change resulted in DRM helpers switching from using a platform_device, to a dedicated bus and device type used by kunit. kunit devices don't set DMA mask and this caused regression on some drm tests as they can't allocate DMA buffers. Fix this problem by setting DMA masks on the kunit device during initialization. - KUnit has several macros which accept a log message, which can contain printf format specifiers. Some of these (the explicit log macros) already use the __printf() gcc attribute to ensure the format specifiers are valid, but those which could fail the test, and hence used __kunit_do_failed_assertion() behind the scenes, did not. These include: KUNIT_EXPECT_*_MSG(), KUNIT_ASSERT_*_MSG(), and KUNIT_FAIL() A nine-patch series adds the __printf() attribute, and fixes all of the issues uncovered. * tag 'linux_kselftest-kunit-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: Annotate _MSG assertion variants with gnu printf specifiers drm: tests: Fix invalid printf format specifiers in KUnit tests drm/xe/tests: Fix printf format specifiers in xe_migrate test net: test: Fix printf format specifier in skb_segment kunit test rtc: test: Fix invalid format specifier. time: test: Fix incorrect format specifier lib: memcpy_kunit: Fix an invalid format specifier in an assertion msg lib/cmdline: Fix an invalid format specifier in an assertion msg kunit: test: Log the correct filter string in executor_test kunit: Setup DMA masks on the kunit device kunit: make kunit_bus_type const kunit: Mark filter* params as rw kunit: tool: Print UML command
2024-03-11Merge tag 'linux_kselftest-next-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest update from Shuah Khan: - livepatch restructuring to move the module out of lib to be built as a out-of-tree modules during kselftest build. This makes it easier change, debug and rebuild the tests by running make on the selftests/livepatch directory, which is not currently possible since the modules on lib/livepatch are build and installed using the main makefile modules target. - livepatch restructuring fixes for problems found by kernel test robot. The change skips the test if kernel-devel isn't installed (default value of KDIR), or if KDIR variable passed doesn't exists. - resctrl test restructuring and new non-contiguous CBMs CAT test - new ktap_helpers to print diagnostic messages, pass/fail tests based on exit code, abort test, and finish the test. - a new test verify power supply properties. - a new ftrace to exercise function tracer across cpu hotplug. - timeout increase for mqueue test to allow the test to run on i3.metal AWS instances. - minor spelling corrections in several tests. - missing gitignore files and changes to existing gitignore files. * tag 'linux_kselftest-next-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (57 commits) kselftest: Add basic test for probing the rust sample modules selftests: lib.mk: Do not process TEST_GEN_MODS_DIR selftests: livepatch: Avoid running the tests if kernel-devel is missing selftests: livepatch: Add initial .gitignore selftests/resctrl: Add non-contiguous CBMs CAT test selftests/resctrl: Add resource_info_file_exists() selftests/resctrl: Split validate_resctrl_feature_request() selftests/resctrl: Add a helper for the non-contiguous test selftests/resctrl: Add test groups and name L3 CAT test L3_CAT selftests: sched: Fix spelling mistake "hiearchy" -> "hierarchy" selftests/mqueue: Set timeout to 180 seconds selftests/ftrace: Add test to exercize function tracer across cpu hotplug selftest: ftrace: fix minor typo in log selftests: thermal: intel: workload_hint: add missing gitignore selftests: thermal: intel: power_floor: add missing gitignore selftests: uevent: add missing gitignore selftests: Add test to verify power supply properties selftests: ktap_helpers: Add a helper to finish the test selftests: ktap_helpers: Add a helper to abort the test selftests: ktap_helpers: Add helper to pass/fail test based on exit code ...
2024-03-11selftests/bpf: Add fexit and kretprobe triggering benchmarksAndrii Nakryiko
We already have kprobe and fentry benchmarks. Let's add kretprobe and fexit ones for completeness. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240309005124.3004446-1-andrii@kernel.org
2024-03-11Merge tag 'kvm-x86-xen-6.9' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM Xen and pfncache changes for 6.9: - Rip out the half-baked support for using gfn_to_pfn caches to manage pages that are "mapped" into guests via physical addresses. - Add support for using gfn_to_pfn caches with only a host virtual address, i.e. to bypass the "gfn" stage of the cache. The primary use case is overlay pages, where the guest may change the gfn used to reference the overlay page, but the backing hva+pfn remains the same. - Add an ioctl() to allow mapping Xen's shared_info page using an hva instead of a gpa, so that userspace doesn't need to reconfigure and invalidate the cache/mapping if the guest changes the gpa (but userspace keeps the resolved hva the same). - When possible, use a single host TSC value when computing the deadline for Xen timers in order to improve the accuracy of the timer emulation. - Inject pending upcall events when the vCPU software-enables its APIC to fix a bug where an upcall can be lost (and to follow Xen's behavior). - Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen events fails, e.g. if the guest has aliased xAPIC IDs. - Extend gfn_to_pfn_cache's mutex to cover (de)activation (in addition to refresh), and drop a now-redundant acquisition of xen_lock (that was protecting the shared_info cache) to fix a deadlock due to recursively acquiring xen_lock.
2024-03-11Merge tag 'kvm-x86-pmu-6.9' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 PMU changes for 6.9: - Fix several bugs where KVM speciously prevents the guest from utilizing fixed counters and architectural event encodings based on whether or not guest CPUID reports support for the _architectural_ encoding. - Fix a variety of bugs in KVM's emulation of RDPMC, e.g. for "fast" reads, priority of VMX interception vs #GP, PMC types in architectural PMUs, etc. - Add a selftest to verify KVM correctly emulates RDMPC, counter availability, and a variety of other PMC-related behaviors that depend on guest CPUID, i.e. are difficult to validate via KVM-Unit-Tests. - Zero out PMU metadata on AMD if the virtual PMU is disabled to avoid wasting cycles, e.g. when checking if a PMC event needs to be synthesized when skipping an instruction. - Optimize triggering of emulated events, e.g. for "count instructions" events when skipping an instruction, which yields a ~10% performance improvement in VM-Exit microbenchmarks when a vPMU is exposed to the guest. - Tighten the check for "PMI in guest" to reduce false positives if an NMI arrives in the host while KVM is handling an IRQ VM-Exit.
2024-03-11objtool: Check local label in read_unwind_hints()Tiezhu Yang
When update the latest upstream gcc and binutils, it generates some objtool warnings on LoongArch, like this: arch/loongarch/kernel/entry.o: warning: objtool: ret_from_fork+0x0: unreachable instruction We can see that the reloc sym name is local label instead of section in relocation section '.rela.discard.unwind_hints', in this case, the reloc sym type is STT_NOTYPE instead of STT_SECTION. Let us check it to not return -1, then use reloc->sym->offset instead of reloc addend which is 0 to find the corresponding instruction. Here are some detailed info: [fedora@linux 6.8.test]$ gcc --version gcc (GCC) 14.0.1 20240129 (experimental) [fedora@linux 6.8.test]$ as --version GNU assembler (GNU Binutils) 2.42.50.20240129 [fedora@linux 6.8.test]$ readelf -r arch/loongarch/kernel/entry.o | grep -A 3 "rela.discard.unwind_hints" Relocation section '.rela.discard.unwind_hints' at offset 0x3a8 contains 7 entries: Offset Info Type Sym. Value Sym. Name + Addend 000000000000 000a00000063 R_LARCH_32_PCREL 0000000000000000 .Lhere_1 + 0 00000000000c 000b00000063 R_LARCH_32_PCREL 00000000000000a8 .Lhere_50 + 0 Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-11objtool: Check local label in add_dead_ends()Tiezhu Yang
When update the latest upstream gcc and binutils, it generates more objtool warnings on LoongArch, like this: init/main.o: warning: objtool: unexpected relocation symbol type in .rela.discard.unreachable We can see that the reloc sym name is local label instead of section in relocation section '.rela.discard.unreachable', in this case, the reloc sym type is STT_NOTYPE instead of STT_SECTION. As suggested by Peter Zijlstra, we add a "local_label" member in struct symbol, then set it as true if symbol type is STT_NOTYPE and symbol name starts with ".L" string in classify_symbols(). Let's check reloc->sym->local_label to not return -1 in add_dead_ends(), and also use reloc->sym->offset instead of reloc addend which is 0 to find the corresponding instruction. At the same time, let's replace the variable "addend" with "offset" to reflect the reality. Here are some detailed info: [fedora@linux 6.8.test]$ gcc --version gcc (GCC) 14.0.1 20240129 (experimental) [fedora@linux 6.8.test]$ as --version GNU assembler (GNU Binutils) 2.42.50.20240129 [fedora@linux 6.8.test]$ readelf -r init/main.o | grep -A 2 "rela.discard.unreachable" Relocation section '.rela.discard.unreachable' at offset 0x6028 contains 1 entry: Offset Info Type Sym. Value Sym. Name + Addend 000000000000 00d900000063 R_LARCH_32_PCREL 00000000000002c4 .L500^B1 + 0 Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-11objtool/LoongArch: Enable orc to be builtTiezhu Yang
Implement arch-specific init_orc_entry(), write_orc_entry(), reg_name(), orc_type_name(), print_reg() and orc_print_dump(), then set BUILD_ORC as y to build the orc related files. Co-developed-by: Jinyang He <hejinyang@loongson.cn> Signed-off-by: Jinyang He <hejinyang@loongson.cn> Co-developed-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-11objtool/x86: Separate arch-specific and generic partsTiezhu Yang
Move init_orc_entry(), write_orc_entry(), reg_name(), orc_type_name() and print_reg() from generic orc_gen.c and orc_dump.c to arch-specific orc.c, then introduce a new function orc_print_dump() to print info. This is preparation for later patch, no functionality change. Co-developed-by: Jinyang He <hejinyang@loongson.cn> Signed-off-by: Jinyang He <hejinyang@loongson.cn> Co-developed-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-11objtool/LoongArch: Implement instruction decoderTiezhu Yang
Only copy the minimal definitions of instruction opcodes and formats in inst.h from arch/loongarch to tools/arch/loongarch, and also copy the definition of sign_extend64() to tools/include/linux/bitops.h to decode the following kinds of instructions: (1) stack pointer related instructions addi.d, ld.d, st.d, ldptr.d and stptr.d (2) branch and jump related instructions beq, bne, blt, bge, bltu, bgeu, beqz, bnez, bceqz, bcnez, b, bl and jirl (3) other instructions break, nop and ertn See more info about instructions in LoongArch Reference Manual: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html Co-developed-by: Jinyang He <hejinyang@loongson.cn> Signed-off-by: Jinyang He <hejinyang@loongson.cn> Co-developed-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-11objtool/LoongArch: Enable objtool to be builtTiezhu Yang
Add the minimal changes to enable objtool build on LoongArch, most of the functions are stubs to only fix the build errors when make -C tools/objtool. This is similar with commit e52ec98c5ab1 ("objtool/powerpc: Enable objtool to be built on ppc"). Co-developed-by: Jinyang He <hejinyang@loongson.cn> Signed-off-by: Jinyang He <hejinyang@loongson.cn> Co-developed-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-11Merge tag 'kvm-x86-selftests-6.9' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM selftests changes for 6.9: - Add macros to reduce the amount of boilerplate code needed to write "simple" selftests, and to utilize selftest TAP infrastructure, which is especially beneficial for KVM selftests with multiple testcases. - Add basic smoke tests for SEV and SEV-ES, along with a pile of library support for handling private/encrypted/protected memory. - Fix benign bugs where tests neglect to close() guest_memfd files.
2024-03-11Merge tag 'kvm-riscv-6.9-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini
KVM/riscv changes for 6.9 - Exception and interrupt handling for selftests - Sstc (aka arch_timer) selftest - Forward seed CSR access to KVM userspace - Ztso extension support for Guest/VM - Zacas extension support for Guest/VM
2024-03-11Merge tag 'kvmarm-6.9' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.9 - Infrastructure for building KVM's trap configuration based on the architectural features (or lack thereof) advertised in the VM's ID registers - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to x86's WC) at stage-2, improving the performance of interacting with assigned devices that can tolerate it - Conversion of KVM's representation of LPIs to an xarray, utilized to address serialization some of the serialization on the LPI injection path - Support for _architectural_ VHE-only systems, advertised through the absence of FEAT_E2H0 in the CPU's ID register - Miscellaneous cleanups, fixes, and spelling corrections to KVM and selftests
2024-03-11Merge tag 'loongarch-kvm-6.9' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.9 * Set reserved bits as zero in CPUCFG. * Start SW timer only when vcpu is blocking. * Do not restart SW timer when it is expired. * Remove unnecessary CSR register saving during enter guest.
2024-03-11ynl: samples: fix recycling rate calculationJakub Kicinski
Running the page-pool sample on production machines under moderate networking load shows recycling rate higher than 100%: $ page-pool eth0[2] page pools: 14 (zombies: 0) refs: 89088 bytes: 364904448 (refs: 0 bytes: 0) recycling: 100.3% (alloc: 1392:2290247724 recycle: 469289484:1828235386) Note that outstanding refs (89088) == slow alloc * cache size (1392 * 64) which means this machine is recycling page pool pages perfectly, not a single page has been released. The extra 0.3% is because sample ignores allocations from the ptr_ring. Treat those the same as alloc_fast, the ring vs cache alloc is already captured accurately enough by recycling stats. With the fix: $ page-pool eth0[2] page pools: 14 (zombies: 0) refs: 89088 bytes: 364904448 (refs: 0 bytes: 0) recycling: 100.0% (alloc: 1392:2331141604 recycle: 473625579:1857460661) Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "KVM GUEST_MEMFD fixes for 6.8: - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to avoid creating an inconsistent ABI (KVM_MEM_GUEST_MEMFD is not writable from userspace, so there would be no way to write to a read-only guest_memfd). - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly clear that such VMs are purely for development and testing. - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan is to support confidential VMs with deterministic private memory (SNP and TDX) only in the TDP MMU. - Fix a bug in a GUEST_MEMFD dirty logging test that caused false passes. x86 fixes: - Fix missing marking of a guest page as dirty when emulating an atomic access. - Check for mmu_notifier invalidation events before faulting in the pfn, and before acquiring mmu_lock, to avoid unnecessary work and lock contention with preemptible kernels (including CONFIG_PREEMPT_DYNAMIC in non-preemptible mode). - Disable AMD DebugSwap by default, it breaks VMSA signing and will be re-enabled with a better VM creation API in 6.10. - Do the cache flush of converted pages in svm_register_enc_region() before dropping kvm->lock, to avoid a race with unregistering of the same region and the consequent use-after-free issue" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: SEV: disable SEV-ES DebugSwap by default KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region() KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusive KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY KVM: x86: Mark target gfn of emulated atomic instruction as dirty
2024-03-10kbuild: unexport abs_srctree and abs_objtreeMasahiro Yamada
Commit 25b146c5b8ce ("kbuild: allow Kbuild to start from any directory") exported abs_srctree and abs_objtree to avoid recomputation after the sub-make. However, this approach turned out to be fragile. Commit 5fa94ceb793e ("kbuild: set correct abs_srctree and abs_objtree for package builds") moved them above "ifneq ($(sub_make_done),1)", eliminating the need for exporting them. These are only needed in the top Makefile. If an absolute path is required in sub-directories, you can use $(abspath ) or $(realpath ) as needed. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2024-03-09Merge tag 'kvm-x86-guest_memfd_fixes-6.8' of ↵Paolo Bonzini
https://github.com/kvm-x86/linux into HEAD KVM GUEST_MEMFD fixes for 6.8: - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to avoid creating ABI that KVM can't sanely support. - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly clear that such VMs are purely a development and testing vehicle, and come with zero guarantees. - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan is to support confidential VMs with deterministic private memory (SNP and TDX) only in the TDP MMU. - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
2024-03-08tools: ynl: Fix spelling mistake "Constructred" -> "Constructed"Colin Ian King
There is a spelling mistake in an error message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240308084458.2045266-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07netdev: add queue stat for alloc failuresJakub Kicinski
Rx alloc failures are commonly counted by drivers. Support reporting those via netdev-genl queue stats. Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20240306195509.1502746-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07netdev: add per-queue statisticsJakub Kicinski
The ethtool-nl family does a good job exposing various protocol related and IEEE/IETF statistics which used to get dumped under ethtool -S, with creative names. Queue stats don't have a netlink API, yet, and remain a lion's share of ethtool -S output for new drivers. Not only is that bad because the names differ driver to driver but it's also bug-prone. Intuitively drivers try to report only the stats for active queues, but querying ethtool stats involves multiple system calls, and the number of stats is read separately from the stats themselves. Worse still when user space asks for values of the stats, it doesn't inform the kernel how big the buffer is. If number of stats increases in the meantime kernel will overflow user buffer. Add a netlink API for dumping queue stats. Queue information is exposed via the netdev-genl family, so add the stats there. Support per-queue and sum-for-device dumps. Latter will be useful when subsequent patches add more interesting common stats than just bytes and packets. The API does not currently distinguish between HW and SW stats. The expectation is that the source of the stats will either not matter much (good packets) or be obvious (skb alloc errors). Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20240306195509.1502746-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07selftests: userspace pm: avoid relaunching pm eventsMatthieu Baerts (NGI0)
'make_connection' is launched twice: once for IPv4, once for IPv6. But then, the "pm_nl_ctl events" was launched a first time, killed, then relaunched after for no particular reason. We can then move this code, and the generation of the temp file to exchange, to the init part, and remove extra conditions that no longer needed. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240306-upstream-net-next-20240304-selftests-mptcp-shared-code-shellcheck-v2-12-bc79e6e5e6a0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07selftests: mptcp: simult flows: fix shellcheck warningsMatthieu Baerts (NGI0)
shellcheck recently helped to prevent issues. It is then good to fix the other harmless issues in order to spot "real" ones later. Here, two categories of warnings are now ignored: - SC2317: Command appears to be unreachable. The cleanup() function is invoked indirectly via the EXIT trap. - SC2086: Double quote to prevent globbing and word splitting. This is recommended, but the current usage is correct and there is no need to do all these modifications to be compliant with this rule. For the modifications: - SC2034: ksft_skip appears unused. - SC2004: $/${} is unnecessary on arithmetic variables. Now this script is shellcheck (0.9.0) compliant. We can easily spot new issues. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240306-upstream-net-next-20240304-selftests-mptcp-shared-code-shellcheck-v2-11-bc79e6e5e6a0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07selftests: mptcp: pm netlink: fix shellcheck warningsMatthieu Baerts (NGI0)
shellcheck recently helped to prevent issues. It is then good to fix the other harmless issues in order to spot "real" ones later. Here, two categories of warnings are now ignored: - SC2317: Command appears to be unreachable. The cleanup() function is invoked indirectly via the EXIT trap. - SC2086: Double quote to prevent globbing and word splitting. This is recommended, but the current usage is correct and there is no need to do all these modifications to be compliant with this rule. For the modifications: - SC2034: ksft_skip appears unused. - SC2154: optstring is referenced but not assigned. - SC2006: Use $(...) notation instead of legacy backticks `...`. Now this script is shellcheck (0.9.0) compliant. We can easily spot new issues. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240306-upstream-net-next-20240304-selftests-mptcp-shared-code-shellcheck-v2-10-bc79e6e5e6a0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07selftests: mptcp: sockopt: fix shellcheck warningsMatthieu Baerts (NGI0)
shellcheck recently helped to prevent issues. It is then good to fix the other harmless issues in order to spot "real" ones later. Here, two categories of warnings are now ignored: - SC2317: Command appears to be unreachable. The cleanup() function is invoked indirectly via the EXIT trap. - SC2086: Double quote to prevent globbing and word splitting. This is recommended, but the current usage is correct and there is no need to do all these modifications to be compliant with this rule. For the modifications: - SC2034: ksft_skip appears unused. - SC2006: Use $(...) notation instead of legacy backticks `...`. - SC2145: Argument mixes string and array. Use * or separate argument. Now this script is shellcheck (0.9.0) compliant. We can easily spot new issues. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240306-upstream-net-next-20240304-selftests-mptcp-shared-code-shellcheck-v2-9-bc79e6e5e6a0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07selftests: mptcp: connect: fix shellcheck warningsMatthieu Baerts (NGI0)
shellcheck recently helped to prevent issues. It is then good to fix the other harmless issues in order to spot "real" ones later. Here, two categories of warnings are now ignored: - SC2317: Command appears to be unreachable. The cleanup() function is invoked indirectly via the EXIT trap. - SC2086: Double quote to prevent globbing and word splitting. This is recommended, but the current usage is correct and there is no need to do all these modifications to be compliant with this rule. For the modifications: - SC2034: ksft_skip appears unused. - SC2181: Check exit code directly with e.g. 'if mycmd;', not indirectly with $?. - SC2004: $/${} is unnecessary on arithmetic variables. - SC2155: Declare and assign separately to avoid masking return values. - SC2166: Prefer [ p ] && [ q ] as [ p -a q ] is not well defined. - SC2059: Don't use variables in the printf format string. Use printf '..%s..' "$foo". Now this script is shellcheck (0.9.0) compliant. We can easily spot new issues. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240306-upstream-net-next-20240304-selftests-mptcp-shared-code-shellcheck-v2-8-bc79e6e5e6a0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07selftests: mptcp: diag: fix shellcheck warningsMatthieu Baerts (NGI0)
shellcheck recently helped to prevent issues. It is then good to fix the other harmless issues in order to spot "real" ones later. Here, two categories of warnings are now ignored: - SC2317: Command appears to be unreachable. The cleanup() function is invoked indirectly via the EXIT trap. - SC2086: Double quote to prevent globbing and word splitting. This is recommended, but the current usage is correct and there is no need to do all these modifications to be compliant with this rule. For the modifications: - SC2034: ksft_skip appears unused. - SC2046: Quote '$(get_msk_inuse)' to prevent word splitting. - SC2006: Use $(...) notation instead of legacy backticks `...`. Now this script is shellcheck (0.9.0) compliant. We can easily spot new issues. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240306-upstream-net-next-20240304-selftests-mptcp-shared-code-shellcheck-v2-7-bc79e6e5e6a0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07selftests: mptcp: add mptcp_lib_events helperGeliang Tang
To avoid duplicated code in different MPTCP selftests, we can add and use helpers defined in mptcp_lib.sh. This patch unifies "pm_nl_ctl events" related code in userspace_pm.sh and mptcp_join.sh into a helper mptcp_lib_events(). Define it in mptcp_lib.sh and use it in both scripts. Note that mptcp_lib_kill_wait is now call before starting 'events' for mptcp_join.sh as well, but that's fine: each test is started from a new netns, so there will not be any existing pid there, and nothing is done when mptcp_lib_kill_wait is called with 0. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240306-upstream-net-next-20240304-selftests-mptcp-shared-code-shellcheck-v2-6-bc79e6e5e6a0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07selftests: mptcp: more operations in ns_init/exitGeliang Tang
Set more the default sysctl values in mptcp_lib_ns_init(). It is fine to do that everywhere, because they could be overridden latter if needed. mptcp_lib_ns_exit() now also try to remove temp netns files used for the stats even for selftests not using them. That's fine to do that because these files have a unique name. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240306-upstream-net-next-20240304-selftests-mptcp-shared-code-shellcheck-v2-5-bc79e6e5e6a0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07selftests: mptcp: add mptcp_lib_ns_init/exit helpersGeliang Tang
Add helpers mptcp_lib_ns_init() and mptcp_lib_ns_exit() in mptcp_lib.sh to initialize and delete the given namespaces. Then every test script can invoke these helpers and use all namespaces. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240306-upstream-net-next-20240304-selftests-mptcp-shared-code-shellcheck-v2-4-bc79e6e5e6a0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07selftests: mptcp: add local variables rndhGeliang Tang
This patch adds local variables rndh in do_transfer() functions both in mptcp_connect.sh and simult_flows.sh, setting it with ${ns1:4}, not the global variable rndh. The global one is hidden in the next commit. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240306-upstream-net-next-20240304-selftests-mptcp-shared-code-shellcheck-v2-3-bc79e6e5e6a0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07selftests: mptcp: add mptcp_lib_check_tools helperGeliang Tang
This patch exports check_tools() helper from mptcp_join.sh into mptcp_lib.sh as a public one mptcp_lib_check_tools(). The arguments "ip", "ss", "iptables" and "ip6tables" are passed into this helper to indicate whether to check ip tool, ss tool, iptables and ip6tables tools. This helper can be used in every scripts. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240306-upstream-net-next-20240304-selftests-mptcp-shared-code-shellcheck-v2-2-bc79e6e5e6a0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07selftests: mptcp: stop forcing iptables-legacyMatthieu Baerts (NGI0)
Commit 0c4cd3f86a40 ("selftests: mptcp: join: use 'iptables-legacy' if available") and commit a5a5990c099d ("selftests: mptcp: sockopt: use 'iptables-legacy' if available") forced using iptables-legacy if available. This was needed because of some issues that were visible when testing the kselftests on a v5.15.x with iptables-nft as default backend. It looks like these errors are no longer present. As mentioned by Pablo [1], the errors were maybe due to missing kernel config. We can then use iptables-nft if it is the default one, instead of using a legacy tool. We can then check the variables iptables and ip6tables are valid. We can keep the variables to easily change it later or add options. Link: https://lore.kernel.org/netdev/ZbFiixyMFpQnxzCH@calendula/ [1] Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240306-upstream-net-next-20240304-selftests-mptcp-shared-code-shellcheck-v2-1-bc79e6e5e6a0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07selftests/net: fix waiting time for ipv6_gc test in fib_tests.sh.Kui-Feng Lee
ipv6_gc fails occasionally. According to the study, fib6_run_gc() using jiffies_round() to round the GC interval could increase the waiting time up to 750ms (3/4 seconds). The timer has a granularity of 512ms at the range 4s to 32s. That means a route with an expiration time E seconds can wait for more than E * 2 + 1 seconds if the GC interval is also E seconds. E * 2 + 2 seconds should be enough for waiting for removing routes. Also remove a check immediately after replacing 5 routes since it is very likely to remove some of routes before completing the last route with a slow environment. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240305183949.258473-1-thinker.li@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07tools/net/ynl: Add nest-type-value decodingDonald Hunter
The nlctrl genetlink-legacy family uses nest-type-value encoding as described in Documentation/userspace-api/netlink/genetlink-legacy.rst Add nest-type-value decoding to ynl. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240306231046.97158-5-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07tools/net/ynl: Fix c codegen for array-nestDonald Hunter
ynl-gen-c generates e.g. 'calloc(mcast_groups, sizeof(*dst->mcast_groups))' for array-nest attrs when it should be 'n_mcast_groups'. Add a 'n_' prefix in the generated code for array-nests. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240306231046.97158-4-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07tools/net/ynl: Report netlink errors without stacktraceDonald Hunter
ynl does not handle NlError exceptions so they get reported like program failures. Handle the NlError exceptions and report the netlink errors more cleanly. Example now: Netlink error: No such file or directory nl_len = 44 (28) nl_flags = 0x300 nl_type = 2 error: -2 extack: {'bad-attr': '.op'} Example before: Traceback (most recent call last): File "/home/donaldh/net-next/./tools/net/ynl/cli.py", line 81, in <module> main() File "/home/donaldh/net-next/./tools/net/ynl/cli.py", line 69, in main reply = ynl.dump(args.dump, attrs) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/donaldh/net-next/tools/net/ynl/lib/ynl.py", line 906, in dump return self._op(method, vals, [], dump=True) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/donaldh/net-next/tools/net/ynl/lib/ynl.py", line 872, in _op raise NlError(nl_msg) lib.ynl.NlError: Netlink error: No such file or directory nl_len = 44 (28) nl_flags = 0x300 nl_type = 2 error: -2 extack: {'bad-attr': '.op'} Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240306231046.97158-3-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07tools/net/ynl: Fix extack decoding for netlink-rawDonald Hunter
Extack decoding was using a hard-coded msg header size of 20 but netlink-raw has a header size of 16. Use a protocol specific msghdr_size() when decoding the attr offssets. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240306231046.97158-2-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07bpftool: rename is_internal_mmapable_map into is_mmapable_mapAndrii Nakryiko
It's not restricted to working with "internal" maps, it cares about any map that can be mmap'ed. Reflect that in more succinct and generic name. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/20240307031228.42896-6-alexei.starovoitov@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-07libbpf: Allow specifying 64-bit integers in map BTF.Alexei Starovoitov
__uint() macro that is used to specify map attributes like: __uint(type, BPF_MAP_TYPE_ARRAY); __uint(map_flags, BPF_F_MMAPABLE); It is limited to 32-bit, since BTF_KIND_ARRAY has u32 "number of elements" field in "struct btf_array". Introduce __ulong() macro that allows specifying values bigger than 32-bit. In map definition "map_extra" is the only u64 field, so far. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20240307031228.42896-5-alexei.starovoitov@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-07Merge tag 'linux-cpupower-6.9-rc1' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux Merge a cpupower utility documentation update for 6.9-rc1 from Shuah Khan: "This cpupower update for Linux 6.9-rc1 consists of a single fix to a typo in cpupower-frequency-info.1 man page." * tag 'linux-cpupower-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux: Fix cpupower-frequency-info.1 man page typo
2024-03-07Merge branches 'for-next/reorg-va-space', 'for-next/rust-for-arm64', ↵Catalin Marinas
'for-next/misc', 'for-next/daif-cleanup', 'for-next/kselftest', 'for-next/documentation', 'for-next/sysreg' and 'for-next/dpisa', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: (39 commits) docs: perf: Fix build warning of hisi-pcie-pmu.rst perf: starfive: Only allow COMPILE_TEST for 64-bit architectures MAINTAINERS: Add entry for StarFive StarLink PMU docs: perf: Add description for StarFive's StarLink PMU dt-bindings: perf: starfive: Add JH8100 StarLink PMU perf: starfive: Add StarLink PMU support docs: perf: Update usage for target filter of hisi-pcie-pmu drivers/perf: hisi_pcie: Merge find_related_event() and get_event_idx() drivers/perf: hisi_pcie: Relax the check on related events drivers/perf: hisi_pcie: Check the target filter properly drivers/perf: hisi_pcie: Add more events for counting TLP bandwidth drivers/perf: hisi_pcie: Fix incorrect counting under metric mode drivers/perf: hisi_pcie: Introduce hisi_pcie_pmu_get_event_ctrl_val() drivers/perf: hisi_pcie: Rename hisi_pcie_pmu_{config,clear}_filter() drivers/perf: hisi: Enable HiSilicon Erratum 162700402 quirk for HIP09 perf/arm_cspmu: Add devicetree support dt-bindings/perf: Add Arm CoreSight PMU perf/arm_cspmu: Simplify counter reset perf/arm_cspmu: Simplify attribute groups perf/arm_cspmu: Simplify initialisation ... * for-next/reorg-va-space: : Reorganise the arm64 kernel VA space in preparation for LPA2 support : (52-bit VA/PA). arm64: kaslr: Adjust randomization range dynamically arm64: mm: Reclaim unused vmemmap region for vmalloc use arm64: vmemmap: Avoid base2 order of struct page size to dimension region arm64: ptdump: Discover start of vmemmap region at runtime arm64: ptdump: Allow all region boundaries to be defined at boot time arm64: mm: Move fixmap region above vmemmap region arm64: mm: Move PCI I/O emulation region above the vmemmap region * for-next/rust-for-arm64: : Enable Rust support for arm64 arm64: rust: Enable Rust support for AArch64 rust: Refactor the build target to allow the use of builtin targets * for-next/misc: : Miscellaneous arm64 patches ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512 arm64: Remove enable_daif macro arm64/hw_breakpoint: Directly use ESR_ELx_WNR for an watchpoint exception arm64: cpufeatures: Clean up temporary variable to simplify code arm64: Update setup_arch() comment on interrupt masking arm64: remove unnecessary ifdefs around is_compat_task() arm64: ftrace: Don't forbid CALL_OPS+CC_OPTIMIZE_FOR_SIZE with Clang arm64/sme: Ensure that all fields in SMCR_EL1 are set to known values arm64/sve: Ensure that all fields in ZCR_EL1 are set to known values arm64/sve: Document that __SVE_VQ_MAX is much larger than needed arm64: make member of struct pt_regs and it's offset macro in the same order arm64: remove unneeded BUILD_BUG_ON assertion arm64: kretprobes: acquire the regs via a BRK exception arm64: io: permit offset addressing arm64: errata: Don't enable workarounds for "rare" errata by default * for-next/daif-cleanup: : Clean up DAIF handling for EL0 returns arm64: Unmask Debug + SError in do_notify_resume() arm64: Move do_notify_resume() to entry-common.c arm64: Simplify do_notify_resume() DAIF masking * for-next/kselftest: : Miscellaneous arm64 kselftest patches kselftest/arm64: Test that ptrace takes effect in the target process * for-next/documentation: : arm64 documentation patches arm64/sme: Remove spurious 'is' in SME documentation arm64/fp: Clarify effect of setting an unsupported system VL arm64/sme: Fix cut'n'paste in ABI document arm64/sve: Remove bitrotted comment about syscall behaviour * for-next/sysreg: : sysreg updates arm64/sysreg: Update ID_AA64DFR0_EL1 register arm64/sysreg: Update ID_DFR0_EL1 register fields arm64/sysreg: Add register fields for ID_AA64DFR1_EL1 * for-next/dpisa: : Support for 2023 dpISA extensions kselftest/arm64: Add 2023 DPISA hwcap test coverage kselftest/arm64: Add basic FPMR test kselftest/arm64: Handle FPMR context in generic signal frame parser arm64/hwcap: Define hwcaps for 2023 DPISA features arm64/ptrace: Expose FPMR via ptrace arm64/signal: Add FPMR signal handling arm64/fpsimd: Support FEAT_FPMR arm64/fpsimd: Enable host kernel access to FPMR arm64/cpufeature: Hook new identification registers up to cpufeature
2024-03-07tools: ynl: check for overflow of constructed messagesJakub Kicinski
Donald points out that we don't check for overflows. Stash the length of the message on nlmsg_pid (nlmsg_seq would do as well). This allows the attribute helpers to remain self-contained (no extra arguments). Also let the put helpers continue to return nothing. The error is checked only in (newly introduced) ynl_msg_end(). Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240305185000.964773-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>