diff options
author | Jason Gunthorpe <jgg@mellanox.com> | 2020-05-01 15:20:48 -0300 |
---|---|---|
committer | Jason Gunthorpe <jgg@mellanox.com> | 2020-05-11 10:47:29 -0300 |
commit | 2733ea144dcce789de20988c1056e228a07b1bff (patch) | |
tree | 2b6a154b9258de033f1d3cbbdf45ba0e3c207198 /Documentation/vm/hmm.rst | |
parent | 5c8f3c4cf18ad007242bc370da54d45d4d4293dc (diff) |
mm/hmm: remove the customizable pfn format from hmm_range_fault
Presumably the intent here was that hmm_range_fault() could put the data
into some HW specific format and thus avoid some work. However, nothing
actually does that, and it isn't clear how anything actually could do that
as hmm_range_fault() provides CPU addresses which must be DMA mapped.
Perhaps there is some special HW that does not need DMA mapping, but we
don't have any examples of this, and the theoretical performance win of
avoiding an extra scan over the pfns array doesn't seem worth the
complexity. Plus pfns needs to be scanned anyhow to sort out any
DEVICE_PRIVATE pages.
This version replaces the uint64_t with an usigned long containing a pfn
and fixed flags. On input flags is filled with the HMM_PFN_REQ_* values,
on successful output it is filled with HMM_PFN_* values, describing the
state of the pages.
amdgpu is simple to convert, it doesn't use snapshot and doesn't use
per-page flags.
nouveau uses only 16 hmm_pte entries at most (ie fits in a few cache
lines), and it sweeps over its pfns array a couple of times anyhow. It
also has a nasty call chain before it reaches the dma map and hardware
suggesting performance isn't important:
nouveau_svm_fault():
args.i.m.method = NVIF_VMM_V0_PFNMAP
nouveau_range_fault()
nvif_object_ioctl()
client->driver->ioctl()
struct nvif_driver nvif_driver_nvkm:
.ioctl = nvkm_client_ioctl
nvkm_ioctl()
nvkm_ioctl_path()
nvkm_ioctl_v0[type].func(..)
nvkm_ioctl_mthd()
nvkm_object_mthd()
struct nvkm_object_func nvkm_uvmm:
.mthd = nvkm_uvmm_mthd
nvkm_uvmm_mthd()
nvkm_uvmm_mthd_pfnmap()
nvkm_vmm_pfn_map()
nvkm_vmm_ptes_get_map()
func == gp100_vmm_pgt_pfn
struct nvkm_vmm_desc_func gp100_vmm_desc_spt:
.pfn = gp100_vmm_pgt_pfn
nvkm_vmm_iter()
REF_PTES == func == gp100_vmm_pgt_pfn()
dma_map_page()
Link: https://lore.kernel.org/r/5-v2-b4e84f444c7d+24f57-hmm_no_flags_jgg@mellanox.com
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Diffstat (limited to 'Documentation/vm/hmm.rst')
-rw-r--r-- | Documentation/vm/hmm.rst | 28 |
1 files changed, 10 insertions, 18 deletions
diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst index 9924f2caa018..561969754bc0 100644 --- a/Documentation/vm/hmm.rst +++ b/Documentation/vm/hmm.rst @@ -184,10 +184,7 @@ The usage pattern is:: range.notifier = &interval_sub; range.start = ...; range.end = ...; - range.pfns = ...; - range.flags = ...; - range.values = ...; - range.pfn_shift = ...; + range.hmm_pfns = ...; if (!mmget_not_zero(interval_sub->notifier.mm)) return -EFAULT; @@ -229,15 +226,10 @@ The hmm_range struct has 2 fields, default_flags and pfn_flags_mask, that specif fault or snapshot policy for the whole range instead of having to set them for each entry in the pfns array. -For instance, if the device flags for range.flags are:: +For instance if the device driver wants pages for a range with at least read +permission, it sets:: - range.flags[HMM_PFN_VALID] = (1 << 63); - range.flags[HMM_PFN_WRITE] = (1 << 62); - -and the device driver wants pages for a range with at least read permission, -it sets:: - - range->default_flags = (1 << 63); + range->default_flags = HMM_PFN_REQ_FAULT; range->pfn_flags_mask = 0; and calls hmm_range_fault() as described above. This will fill fault all pages @@ -246,18 +238,18 @@ in the range with at least read permission. Now let's say the driver wants to do the same except for one page in the range for which it wants to have write permission. Now driver set:: - range->default_flags = (1 << 63); - range->pfn_flags_mask = (1 << 62); - range->pfns[index_of_write] = (1 << 62); + range->default_flags = HMM_PFN_REQ_FAULT; + range->pfn_flags_mask = HMM_PFN_REQ_WRITE; + range->pfns[index_of_write] = HMM_PFN_REQ_WRITE; With this, HMM will fault in all pages with at least read (i.e., valid) and for the address == range->start + (index_of_write << PAGE_SHIFT) it will fault with write permission i.e., if the CPU pte does not have write permission set then HMM will call handle_mm_fault(). -Note that HMM will populate the pfns array with write permission for any page -that is mapped with CPU write permission no matter what values are set -in default_flags or pfn_flags_mask. +After hmm_range_fault completes the flag bits are set to the current state of +the page tables, ie HMM_PFN_VALID | HMM_PFN_WRITE will be set if the page is +writable. Represent and manage device memory from core kernel point of view |