Age | Commit message (Collapse) | Author |
|
Rename the perf_test_util.[ch] files to memstress.[ch]. Symbols are
renamed in the following commit to reduce the amount of churn here in
hopes of playiing nice with git's file rename detection.
The name "memstress" was chosen to better describe the functionality
proveded by this library, which is to create and run a VM that
reads/writes to guest memory on all vCPUs in parallel.
"memstress" also contains the same number of chracters as "perf_test",
making it a drop-in replacement in symbols, e.g. function names, without
impacting line lengths. Also the lack of underscore between "mem" and
"stress" makes it clear "memstress" is a noun.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221012165729.3505266-2-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Create the ability to randomize page access order with the -a
argument. This includes the possibility that the same pages may be hit
multiple times during an iteration or not at all.
Population has random access as false to ensure all pages will be
touched by population and avoid page faults in late dirty memory that
would pollute the test results.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221107182208.479157-5-coltonlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Randomize which pages are written vs read using the random number
generator.
Change the variable wr_fract and associated function calls to
write_percent that now operates as a percentage from 0 to 100 where X
means each page has an X% chance of being written. Change the -f
argument to -w to reflect the new variable semantics. Keep the same
default of 100% writes.
Population always uses 100% writes to ensure all memory is actually
populated and not just mapped to the zero page. The prevents expensive
copy-on-write faults from occurring during the dirty memory iterations
below, which would pollute the performance results.
Each vCPU calculates its own random seed by adding its index to the
seed provided.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221107182208.479157-4-coltonlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Create a -r argument to specify a random seed. If no argument is
provided, the seed defaults to 1. The random seed is set with
perf_test_set_random_seed() and must be set before guest_code runs to
apply.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221107182208.479157-3-coltonlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add a command line option, -c, to pin vCPUs to physical CPUs (pCPUs),
i.e. to force vCPUs to run on specific pCPUs.
Requirement to implement this feature came in discussion on the patch
"Make page tables for eager page splitting NUMA aware"
https://lore.kernel.org/lkml/YuhPT2drgqL+osLl@google.com/
This feature is useful as it provides a way to analyze performance based
on the vCPUs and dirty log worker locations, like on the different NUMA
nodes or on the same NUMA nodes.
To keep things simple, implementation is intentionally very limited,
either all of the vCPUs will be pinned followed by an optional main
thread or nothing will be pinned.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Suggested-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221103191719.1559407-8-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Drop @num_percpu_pages from __vm_create_with_vcpus(), all callers pass
'0' and there's unlikely to be a test that allocates just enough memory
that it needs a per-CPU allocation, but not so much that it won't just do
its own memory management.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
All callers of __vm_create_with_vcpus() pass DEFAULT_GUEST_PHY_PAGES for
@slot_mem_pages; drop the param and just hardcode the "default" as the
base number of pages for slot0.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Drop a variety of 'struct kvm_vm' accessors that wrap a single variable
now that tests can simply reference the variable directly.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Take a vCPU directly instead of a VM+vcpu pair in all vCPU-scoped helpers
and ioctls.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Track vCPUs by their 'struct kvm_vcpu' object, and stop assuming that a
vCPU's ID is the same as its index when referencing a vCPU's metadata.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Drop the @vcpuids parameter from VM creators now that there are no users.
Allowing tests to specify IDs was a gigantic mistake as it resulted in
tests with arbitrary and ultimately meaningless IDs that differed only
because the author used test X intead of test Y as the source for
copy+paste (the de facto standard way to create a KVM selftest).
Except for literally two tests, x86's set_boot_cpu_id and s390's resets,
tests do not and should not care about the vCPU ID.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a VM creator that "returns" the created vCPUs by filling the provided
array. This will allow converting multi-vCPU tests away from hardcoded
vCPU IDs.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
nested
The selftests nested code only supports 4-level paging at the moment.
This means it cannot map nested guest physical addresses with more than
48 bits. Allow perf_test_util nested mode to work on hosts with more
than 48 physical addresses by restricting the guest test region to
48-bits.
While here, opportunistically fix an off-by-one error when dealing with
vm_get_max_gfn(). perf_test_util.c was treating this as the maximum
number of GFNs, rather than the maximum allowed GFN. This didn't result
in any correctness issues, but it did end up shifting the test region
down slightly when using huge pages.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-12-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add an option to dirty_log_perf_test that configures the vCPUs to run in
L2 instead of L1. This makes it possible to benchmark the dirty logging
performance of nested virtualization, which is particularly interesting
because KVM must shadow L1's EPT/NPT tables.
For now this support only works on x86_64 CPUs with VMX. Otherwise
passing -n results in the test being skipped.
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-11-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Thread creation requires taking the mmap_sem in write mode, which causes
vCPU threads running in guest mode to block while they are populating
memory. Fix this by waiting for all vCPU threads to be created and start
running before entering guest mode on any one vCPU thread.
This substantially improves the "Populate memory time" when using 1GiB
pages since it allows all vCPUs to zero pages in parallel rather than
blocking because a writer is waiting (which is waiting for another vCPU
that is busy zeroing a 1GiB page).
Before:
$ ./dirty_log_perf_test -v256 -s anonymous_hugetlb_1gb
...
Populate memory time: 52.811184013s
After:
$ ./dirty_log_perf_test -v256 -s anonymous_hugetlb_1gb
...
Populate memory time: 10.204573342s
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111001257.1446428-4-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move vCPU thread creation and joining to common helper functions. This
is in preparation for the next commit which ensures that all vCPU
threads are fully created before entering guest mode on any one
vCPU.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20211111001257.1446428-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Copy perf_test_args to the guest during VM creation instead of relying on
the caller to do so at their leisure. Ideally, tests wouldn't even be
able to modify perf_test_args, i.e. they would have no motivation to do
the sync, but enforcing that is arguably a net negative for readability.
No functional change intended.
[Set wr_fract=1 by default and add helper to override it since the new
access_tracking_perf_test needs to set it dynamically.]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20211111000310.1435032-13-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Fill the per-vCPU args when creating the perf_test VM instead of having
the caller do so. This helps ensure that any adjustments to the number
of pages (and thus vcpu_memory_bytes) are reflected in the per-VM args.
Automatically filling the per-vCPU args will also allow a future patch
to do the sync to the guest during creation.
Signed-off-by: Sean Christopherson <seanjc@google.com>
[Updated access_tracking_perf_test as well.]
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20211111000310.1435032-12-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Use the already computed guest_num_pages when creating the so called
extra VM pages for a perf test, and add a comment explaining why the
pages are allocated as extra pages.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-11-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Remove perf_test_args.host_page_size and instead use getpagesize() so
that it's somewhat obvious that, for tests that care about the host page
size, they care about the system page size, not the hardware page size,
e.g. that the logic is unchanged if hugepages are in play.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-10-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move the per-VM GPA into perf_test_args instead of storing it as a
separate global variable. It's not obvious that guest_test_phys_mem
holds a GPA, nor that it's connected/coupled with per_vcpu->gpa.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-9-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Capture the per-vCPU GPA in perf_test_vcpu_args so that tests can get
the GPA without having to calculate the GPA on their own.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-7-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Use 'pta' as a local pointer to the global perf_tests_args in order to
shorten line lengths and make the code borderline readable.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-6-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Assert that the GPA for a memslot backed by a hugepage is aligned to
the hugepage size and fix perf_test_util accordingly. Lack of GPA
alignment prevents KVM from backing the guest with hugepages, e.g. x86's
write-protection of hugepages when dirty logging is activated is
otherwise not exercised.
Add a comment explaining that guest_page_size is for non-huge pages to
try and avoid confusion about what it actually tracks.
Cc: Ben Gardon <bgardon@google.com>
Cc: Yanan Wang <wangyanan55@huawei.com>
Cc: Andrew Jones <drjones@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
[Used get_backing_src_pagesz() to determine alignment dynamically.]
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Refactor align() to work with non-pointers and split into separate
helpers for aligning up vs. down. Add align_ptr_up() for use with
pointers. Expose all helpers so that they can be used by tests and/or
other utilities. The align_down() helper in particular will be used to
ensure gpa alignment for hugepages.
No functional change intended.
[Added sepearate up/down helpers and replaced open-coded alignment
bit math throughout the KVM selftests.]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20211111000310.1435032-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
perf_test_util is used to set up KVM selftests where vCPUs touch a
region of memory. The guest code is implemented in perf_test_util.c (not
the calling selftests). The guest code requires a 1 parameter, the
vcpuid, which has to be set by calling vcpu_args_set(vm, vcpu_id, 1,
vcpu_id).
Today all of the selftests that use perf_test_util are making this call.
Instead, perf_test_util should just do it. This will save some code but
more importantly prevents mistakes since totally non-obvious that this
needs to be called and failing to do so results in vCPUs not accessing
the right regions of memory.
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210805172821.2622793-1-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Introduce a new option to dirty_log_perf_test: -x number_of_slots. This
causes the test to attempt to split the region of memory into the given
number of slots. If the region cannot be evenly divided, the test will
fail.
This allows testing with more than one slot and therefore measure how
performance scales with the number of memslots.
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210804222844.1419481-8-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Drop the memslot param from virt_pg_map() and virt_map() and shove the
hardcoded '0' down to the vm_phy_page_alloc() calls.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622200529.3650424-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Until commit 39fe2fc96694 ("selftests: kvm: make allocation of extra
memory take effect", 2021-05-27), parameter extra_mem_pages was used
only to calculate the page table size for all the memory chunks,
because real memory allocation happened with calls of
vm_userspace_mem_region_add() after vm_create_default().
Commit 39fe2fc96694 however changed the meaning of extra_mem_pages to
the size of memory slot 0. This makes the memory allocation more
flexible, but makes it harder to account for the number of
pages needed for the page tables. For example, memslot_perf_test
has a small amount of memory in slot 0 but a lot in other slots,
and adding that memory twice (both in slot 0 and with later
calls to vm_userspace_mem_region_add()) causes an error that
was fixed in commit 000ac4295339 ("selftests: kvm: fix overlapping
addresses in memslot_perf_test", 2021-05-29)
Since both uses are sensible, add a new parameter slot0_mem_pages
to vm_create_with_vcpus() and some comments to clarify the meaning of
slot0_mem_pages and extra_mem_pages. With this change,
memslot_perf_test can go back to passing the number of memory
pages as extra_mem_pages.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20210608233816.423958-4-zhenzhong.duan@intel.com>
[Squashed in a single patch and rewrote the commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
vm_get_max_gfn() casts vm->max_gfn from a uint64_t to an unsigned int,
which causes the upper 32-bits of the max_gfn to get truncated.
Nobody noticed until now likely because vm_get_max_gfn() is only used
as a mechanism to create a memslot in an unused region of the guest
physical address space (the top), and the top of the 32-bit physical
address space was always good enough.
This fix reveals a bug in memslot_modification_stress_test which was
trying to create a dummy memslot past the end of guest physical memory.
Fix that by moving the dummy memslot lower.
Fixes: 52200d0d944e ("KVM: selftests: Remove duplicate guest mode handling")
Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210521173828.1180619-1-dmatlack@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a parameter to control the backing memory type for
dirty_log_perf_test so that the test can be run with hugepages.
To: linux-kselftest@vger.kernel.org
CC: Peter Xu <peterx@redhat.com>
CC: Andrew Jones <drjones@redhat.com>
CC: Thomas Huth <thuth@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210202185734.1680553-28-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add an option to overlap the ranges of memory each vCPU accesses instead
of partitioning them. This option will increase the probability of
multiple vCPUs faulting on the same page at the same time, and causing
interesting races, if there are bugs in the page fault handler or
elsewhere in the kernel.
Reviewed-by: Jacob Xu <jacobhxu@google.com>
Reviewed-by: Makarand Sonare <makarandsonare@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210112214253.463999-6-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
It's not conventional C to put non-inline functions in header
files. Create a source file for the functions instead. Also
reduce the amount of globals and rename the functions to
something less generic.
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201218141734.54359-4-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|