summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/v3d
AgeCommit message (Collapse)Author
2024-07-13drm/v3d: Prevent out of bounds access in performance query extensionsTvrtko Ursulin
Check that the number of perfmons userspace is passing in the copy and reset extensions is not greater than the internal kernel storage where the ids will be copied into. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job") Cc: Maíra Canal <mcanal@igalia.com> Cc: Iago Toral Quiroga <itoral@igalia.com> Cc: stable@vger.kernel.org # v6.8+ Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240711135340.84617-2-tursulin@igalia.com
2024-06-05drm/v3d: Fix perfmon build error/warningTvrtko Ursulin
Move static const array into the source file to fix the "defined but not used" errors. The fix is perhaps not the prettiest due hand crafting the array sizes in v3d_performance_counters.h, but I did add some build time asserts to validate the counts look sensible, so hopefully it is good enough for a quick fix. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: 3cbcbe016c31 ("drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405211137.hueFkLKG-lkp@intel.com/Cc: Maíra Canal <mcanal@igalia.com> Cc: Iago Toral Quiroga <itoral@igalia.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Maxime Ripard <mripard@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240604160210.24073-1-tursulin@igalia.com
2024-05-27Merge drm/drm-next into drm-misc-nextMaxime Ripard
Let's start the new release cycle. Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-05-20drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUMMaíra Canal
V3D_PERFCNT_NUM represents the maximum number of performance counters for V3D 4.2, but not for V3D 7.1. This means that, if we use V3D_PERFCNT_NUM, we might go out-of-bounds on V3D 7.1. Therefore, use the number of performance counters on V3D 7.1 as the maximum number of counters. This will allow us to create arrays on the stack with reasonable size. Note that userspace must use the value provided by DRM_V3D_PARAM_MAX_PERF_COUNTERS. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240512222655.2792754-6-mcanal@igalia.com
2024-05-20drm/v3d: Create new IOCTL to expose performance counters informationMaíra Canal
Userspace usually needs some information about the performance counters available. Although we could replicate this information in the kernel and user-space, let's use the kernel as the "single source of truth" to avoid issues in the future (e.g. list of performance counters is updated in user-space, but not in the kernel, generating invalid requests). Therefore, create a new IOCTL to expose the performance counters information, that is name, category, and description. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240512222655.2792754-5-mcanal@igalia.com
2024-05-20drm/v3d: Create a new V3D parameter for the maximum number of perfcntMaíra Canal
The maximum number of performance counters can change from version to version and it's important for userspace to know this value, as it needs to use the counters for performance queries. Therefore, expose the maximum number of performance counters to userspace as a parameter. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240512222655.2792754-4-mcanal@igalia.com
2024-05-20drm/v3d: Different V3D versions can have different number of perfcntMaíra Canal
Currently, even though V3D 7.1 has 93 performance counters, it is not possible to create counters bigger than 87, as `v3d_perfmon_create_ioctl()` understands that counters bigger than 87 are invalid. Therefore, create a device variable to expose the maximum number of counters for a given V3D version and make `v3d_perfmon_create_ioctl()` check this variable. This commit fixes CTS failures in the performance queries tests `dEQP-VK.query_pool.performance_query.*` [1] Link: https://gitlab.freedesktop.org/mesa/mesa/-/commit/ea1f09a5f21839f4f3b93610b58507c4bd9b9b81 [1] Fixes: 6fd9487147c4 ("drm/v3d: add brcm,2712-v3d as a compatible V3D device") Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240512222655.2792754-3-mcanal@igalia.com
2024-05-20drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1Maíra Canal
Add name, category and description for each one of the 93 performance counters available on V3D. Note that V3D 4.2 has 87 performance counters, while V3D 7.1 has 93. Therefore, there are two performance counters arrays. The index of the performance counter for each V3D version is represented by its position on the array. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240512222655.2792754-2-mcanal@igalia.com
2024-05-19Merge tag 'mm-stable-2024-05-17-19-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: "The usual shower of singleton fixes and minor series all over MM, documented (hopefully adequately) in the respective changelogs. Notable series include: - Lucas Stach has provided some page-mapping cleanup/consolidation/ maintainability work in the series "mm/treewide: Remove pXd_huge() API". - In the series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one test. - In their series "Memory allocation profiling" Kent Overstreet and Suren Baghdasaryan have contributed a means of determining (via /proc/allocinfo) whereabouts in the kernel memory is being allocated: number of calls and amount of memory. - Matthew Wilcox has provided the series "Various significant MM patches" which does a number of rather unrelated things, but in largely similar code sites. - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes Weiner has fixed the page allocator's handling of migratetype requests, with resulting improvements in compaction efficiency. - In the series "make the hugetlb migration strategy consistent" Baolin Wang has fixed a hugetlb migration issue, which should improve hugetlb allocation reliability. - Liu Shixin has hit an I/O meltdown caused by readahead in a memory-tight memcg. Addressed in the series "Fix I/O high when memory almost met memcg limit". - In the series "mm/filemap: optimize folio adding and splitting" Kairui Song has optimized pagecache insertion, yielding ~10% performance improvement in one test. - Baoquan He has cleaned up and consolidated the early zone initialization code in the series "mm/mm_init.c: refactor free_area_init_core()". - Baoquan has also redone some MM initializatio code in the series "mm/init: minor clean up and improvement". - MM helper cleanups from Christoph Hellwig in his series "remove follow_pfn". - More cleanups from Matthew Wilcox in the series "Various page->flags cleanups". - Vlastimil Babka has contributed maintainability improvements in the series "memcg_kmem hooks refactoring". - More folio conversions and cleanups in Matthew Wilcox's series: "Convert huge_zero_page to huge_zero_folio" "khugepaged folio conversions" "Remove page_idle and page_young wrappers" "Use folio APIs in procfs" "Clean up __folio_put()" "Some cleanups for memory-failure" "Remove page_mapping()" "More folio compat code removal" - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb functions to work on folis". - Code consolidation and cleanup work related to GUP's handling of hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2". - Rick Edgecombe has developed some fixes to stack guard gaps in the series "Cover a guard gap corner case". - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series "mm/ksm: fix ksm exec support for prctl". - Baolin Wang has implemented NUMA balancing for multi-size THPs. This is a simple first-cut implementation for now. The series is "support multi-size THP numa balancing". - Cleanups to vma handling helper functions from Matthew Wilcox in the series "Unify vma_address and vma_pgoff_address". - Some selftests maintenance work from Dev Jain in the series "selftests/mm: mremap_test: Optimizations and style fixes". - Improvements to the swapping of multi-size THPs from Ryan Roberts in the series "Swap-out mTHP without splitting". - Kefeng Wang has significantly optimized the handling of arm64's permission page faults in the series "arch/mm/fault: accelerate pagefault when badaccess" "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS" - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it GUP-fast". - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to use struct vm_fault". - selftests build fixes from John Hubbard in the series "Fix selftests/mm build without requiring "make headers"". - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes the initialization code so that migration between different memory types works as intended. - David Hildenbrand has improved follow_pte() and fixed an errant driver in the series "mm: follow_pte() improvements and acrn follow_pte() fixes". - David also did some cleanup work on large folio mapcounts in his series "mm: mapcount for large folios + page_mapcount() cleanups". - Folio conversions in KSM in Alex Shi's series "transfer page to folio in KSM". - Barry Song has added some sysfs stats for monitoring multi-size THP's in the series "mm: add per-order mTHP alloc and swpout counters". - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled and limit checking cleanups". - Matthew Wilcox has been looking at buffer_head code and found the documentation to be lacking. The series is "Improve buffer head documentation". - Multi-size THPs get more work, this time from Lance Yang. His series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes the freeing of these things. - Kemeng Shi has added more userspace-visible writeback instrumentation in the series "Improve visibility of writeback". - Kemeng Shi then sent some maintenance work on top in the series "Fix and cleanups to page-writeback". - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the series "Improve anon_vma scalability for anon VMAs". Intel's test bot reported an improbable 3x improvement in one test. - SeongJae Park adds some DAMON feature work in the series "mm/damon: add a DAMOS filter type for page granularity access recheck" "selftests/damon: add DAMOS quota goal test" - Also some maintenance work in the series "mm/damon/paddr: simplify page level access re-check for pageout" "mm/damon: misc fixes and improvements" - David Hildenbrand has disabled some known-to-fail selftests ni the series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL". - memcg metadata storage optimizations from Shakeel Butt in "memcg: reduce memory consumption by memcg stats". - DAX fixes and maintenance work from Vishal Verma in the series "dax/bus.c: Fixups for dax-bus locking"" * tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits) memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault selftests: cgroup: add tests to verify the zswap writeback path mm: memcg: make alloc_mem_cgroup_per_node_info() return bool mm/damon/core: fix return value from damos_wmark_metric_value mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED selftests: cgroup: remove redundant enabling of memory controller Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT Docs/mm/damon/design: use a list for supported filters Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file selftests/damon: classify tests for functionalities and regressions selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None' selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts selftests/damon/_damon_sysfs: check errors from nr_schemes file reads mm/damon/core: initialize ->esz_bp from damos_quota_init_priv() selftests/damon: add a test for DAMOS quota goal ...
2024-04-25fix missing vmalloc.h includesKent Overstreet
Patch series "Memory allocation profiling", v6. Overview: Low overhead [1] per-callsite memory allocation profiling. Not just for debug kernels, overhead low enough to be deployed in production. Example output: root@moria-kvm:~# sort -rn /proc/allocinfo 127664128 31168 mm/page_ext.c:270 func:alloc_page_ext 56373248 4737 mm/slub.c:2259 func:alloc_slab_page 14880768 3633 mm/readahead.c:247 func:page_cache_ra_unbounded 14417920 3520 mm/mm_init.c:2530 func:alloc_large_system_hash 13377536 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs 11718656 2861 mm/filemap.c:1919 func:__filemap_get_folio 9192960 2800 kernel/fork.c:307 func:alloc_thread_stack_node 4206592 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable 4136960 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start 3940352 962 mm/memory.c:4214 func:alloc_anon_folio 2894464 22613 fs/kernfs/dir.c:615 func:__kernfs_new_node ... Usage: kconfig options: - CONFIG_MEM_ALLOC_PROFILING - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT - CONFIG_MEM_ALLOC_PROFILING_DEBUG adds warnings for allocations that weren't accounted because of a missing annotation sysctl: /proc/sys/vm/mem_profiling Runtime info: /proc/allocinfo Notes: [1]: Overhead To measure the overhead we are comparing the following configurations: (1) Baseline with CONFIG_MEMCG_KMEM=n (2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) (3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) (4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1) (5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT (6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) && CONFIG_MEMCG_KMEM=y (7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y Performance overhead: To evaluate performance we implemented an in-kernel test executing multiple get_free_page/free_page and kmalloc/kfree calls with allocation sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU affinity set to a specific CPU to minimize the noise. Below are results from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on 56 core Intel Xeon: kmalloc pgalloc (1 baseline) 6.764s 16.902s (2 default disabled) 6.793s (+0.43%) 17.007s (+0.62%) (3 default enabled) 7.197s (+6.40%) 23.666s (+40.02%) (4 runtime enabled) 7.405s (+9.48%) 23.901s (+41.41%) (5 memcg) 13.388s (+97.94%) 48.460s (+186.71%) (6 def disabled+memcg) 13.332s (+97.10%) 48.105s (+184.61%) (7 def enabled+memcg) 13.446s (+98.78%) 54.963s (+225.18%) Memory overhead: Kernel size: text data bss dec diff (1) 26515311 18890222 17018880 62424413 (2) 26524728 19423818 16740352 62688898 264485 (3) 26524724 19423818 16740352 62688894 264481 (4) 26524728 19423818 16740352 62688898 264485 (5) 26541782 18964374 16957440 62463596 39183 Memory consumption on a 56 core Intel CPU with 125GB of memory: Code tags: 192 kB PageExts: 262144 kB (256MB) SlabExts: 9876 kB (9.6MB) PcpuExts: 512 kB (0.5MB) Total overhead is 0.2% of total memory. Benchmarks: Hackbench tests run 100 times: hackbench -s 512 -l 200 -g 15 -f 25 -P baseline disabled profiling enabled profiling avg 0.3543 0.3559 (+0.0016) 0.3566 (+0.0023) stdev 0.0137 0.0188 0.0077 hackbench -l 10000 baseline disabled profiling enabled profiling avg 6.4218 6.4306 (+0.0088) 6.5077 (+0.0859) stdev 0.0933 0.0286 0.0489 stress-ng tests: stress-ng --class memory --seq 4 -t 60 stress-ng --class cpu --seq 4 -t 60 Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/ [2] https://lore.kernel.org/all/20240306182440.2003814-1-surenb@google.com/ This patch (of 37): The next patch drops vmalloc.h from a system header in order to fix a circular dependency; this adds it to all the files that were pulling it in implicitly. [kent.overstreet@linux.dev: fix arch/alpha/lib/memcpy.c] Link: https://lkml.kernel.org/r/20240327002152.3339937-1-kent.overstreet@linux.dev [surenb@google.com: fix arch/x86/mm/numa_32.c] Link: https://lkml.kernel.org/r/20240402180933.1663992-1-surenb@google.com [kent.overstreet@linux.dev: a few places were depending on sizes.h] Link: https://lkml.kernel.org/r/20240404034744.1664840-1-kent.overstreet@linux.dev [arnd@arndb.de: fix mm/kasan/hw_tags.c] Link: https://lkml.kernel.org/r/20240404124435.3121534-1-arnd@kernel.org [surenb@google.com: fix arc build] Link: https://lkml.kernel.org/r/20240405225115.431056-1-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-1-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-2-surenb@google.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-23drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handlerMaíra Canal
In V3D, the conclusion of a job is indicated by a IRQ. When a job finishes, then we update the local and the global GPU stats of that queue. But, while the GPU stats are being updated, a user might be reading the stats from sysfs or fdinfo. For example, on `gpu_stats_show()`, we could think about a scenario where `v3d->queue[queue].start_ns != 0`, then an interrupt happens, we update the value of `v3d->queue[queue].start_ns` to 0, we come back to `gpu_stats_show()` to calculate `active_runtime` and now, `active_runtime = timestamp`. In this simple example, the user would see a spike in the queue usage, that didn't match reality. In order to address this issue properly, use a seqcount to protect read and write sections of the code. Fixes: 09a93cc4f7d1 ("drm/v3d: Implement show_fdinfo() callback for GPU usage stats") Reported-by: Tvrtko Ursulin <tursulin@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240420213632.339941-7-mcanal@igalia.com
2024-04-23drm/v3d: Decouple stats calculation from printingMaíra Canal
Create a function to decouple the stats calculation from the printing. This will be useful in the next step when we add a seqcount to protect the stats. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240420213632.339941-6-mcanal@igalia.com
2024-04-23drm/v3d: Create function to update a set of GPU statsMaíra Canal
Given a set of GPU stats, that is, a `struct v3d_stats` related to a queue in a given context, create a function that can update this set of GPU stats. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240420213632.339941-5-mcanal@igalia.com
2024-04-23drm/v3d: Create a struct to store the GPU statsMaíra Canal
This will make it easier to instantiate the GPU stats variables and it will create a structure where we can store all the variables that refer to GPU stats. Note that, when we created the struct `v3d_stats`, we renamed `jobs_sent` to `jobs_completed`. This better express the semantics of the variable, as we are only accounting jobs that have been completed. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240420213632.339941-4-mcanal@igalia.com
2024-04-23drm/v3d: Create two functions to update all GPU stats variablesMaíra Canal
Currently, we manually perform all operations to update the GPU stats variables. Apart from the code repetition, this is very prone to errors, as we can see on commit 35f4f8c9fc97 ("drm/v3d: Don't increment `enabled_ns` twice"). Therefore, create two functions to manage updating all GPU stats variables. Now, the jobs only need to call for `v3d_job_update_stats()` when the job is done and `v3d_job_start_stats()` when starting the job. Co-developed-by: Tvrtko Ursulin <tursulin@igalia.com> Signed-off-by: Tvrtko Ursulin <tursulin@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240420213632.339941-3-mcanal@igalia.com
2024-04-15drm/v3d: Don't increment `enabled_ns` twiceMaíra Canal
The commit 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs") introduced the calculation of global GPU stats. For the regards, it used the already existing infrastructure provided by commit 09a93cc4f7d1 ("drm/v3d: Implement show_fdinfo() callback for GPU usage stats"). While adding global GPU stats calculation ability, the author forgot to delete the existing one. Currently, the value of `enabled_ns` is incremented twice by the end of the job, when it should be added just once. Therefore, delete the leftovers from commit 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs"). Fixes: 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs") Reported-by: Tvrtko Ursulin <tursulin@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240403203517.731876-2-mcanal@igalia.com
2024-02-23drm/v3d: Enable V3D to use different PAGE_SIZEMaíra Canal
Currently, the V3D driver uses PAGE_SHIFT over the assumption that PAGE_SHIFT = 12, as the PAGE_SIZE = 4KB. But, the RPi 5 is using PAGE_SIZE = 16KB, so the MMU PAGE_SHIFT is different than the system's PAGE_SHIFT. Enable V3D to be used in system's with any PAGE_SIZE by making sure that everything MMU-related uses the MMU page shift. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240214193503.164462-1-mcanal@igalia.com
2024-02-05Merge tag 'drm-misc-next-2024-01-11' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v6.9: UAPI Changes: virtio: - add Venus capset defines Cross-subsystem Changes: Core Changes: - fix drm_fixp2int_ceil() - documentation fixes - clean ups - allow DRM_MM_DEBUG with DRM=m - build fixes for debugfs support - EDID cleanups - sched: error-handling fixes - ttm: add tests Driver Changes: bridge: - ite-6505: fix DP link-training bug - samsung-dsim: fix error checking in probe - tc358767: fix regmap usage efifb: - use copy of global screen_info state hisilicon: - fix EDID includes mgag200: - improve ioremap usage - convert to struct drm_edid nouveau: - disp: use kmemdup() - fix EDID includes - documentation fixes panel: - ltk050h3146w: error-handling fixes - panel-edp: support delay between power-on and enable; use put_sync in unprepare; support Mediatek MT8173 Chromebooks, BOE NV116WHM-N49 V8.0, BOE NV122WUM-N41, CSO MNC207QS1-1 plus DT bindings - panel-lvds: support EDT ETML0700Z9NDHA plus DT bindings - panel-novatek: FRIDA FRD400B25025-A-CTK plus DT bindings qaic: - fixes to BO handling - make use of DRM managed release - fix order of remove operations rockchip: - analogix_dp: get encoder port from DT - inno_hdmi: support HDMI for RK3128 - lvds: error-handling fixes simplefb: - fix logging ssd130x: - support SSD133x plus DT bindings tegra: - fix error handling tilcdc: - make use of DRM managed release v3d: - show memory stats in debugfs vc4: - fix error handling in plane prepare_fb - fix framebuffer test in plane helpers vesafb: - use copy of global screen_info state virtio: - cleanups vkms: - fix OOB access when programming the LUT - Kconfig improvements vmwgfx: - unmap surface before changing plane state - fix memory leak in error handling - documentation fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240111154902.GA8448@linux-uq9g
2024-01-11drm/v3d: Show the memory-management stats on debugfsMaíra Canal
Dump the contents of the DRM MM allocator of the V3D driver. This will help us to debug the VA ranges allocated. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Melissa Wen <mwen@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240105145851.193492-1-mcanal@igalia.com
2024-01-11drm/v3d: Free the job and assign it to NULL if initialization failsMaíra Canal
Currently, if `v3d_job_init()` fails (e.g. in the IGT test "bad-in-sync", where we submit an invalid in-sync to the IOCTL), then we end up with the following NULL pointer dereference: [ 34.146279] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000078 [ 34.146301] Mem abort info: [ 34.146306] ESR = 0x0000000096000005 [ 34.146315] EC = 0x25: DABT (current EL), IL = 32 bits [ 34.146322] SET = 0, FnV = 0 [ 34.146328] EA = 0, S1PTW = 0 [ 34.146334] FSC = 0x05: level 1 translation fault [ 34.146340] Data abort info: [ 34.146345] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 [ 34.146351] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 34.146357] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 34.146366] user pgtable: 4k pages, 39-bit VAs, pgdp=00000001232e6000 [ 34.146375] [0000000000000078] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 [ 34.146399] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP [ 34.146406] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device algif_hash aes_neon_bs aes_neon_blk algif_skcipher af_alg bnep hid_logitech_hidpp brcmfmac_wcc brcmfmac brcmutil hci_uart vc4 btbcm cfg80211 bluetooth bcm2835_v4l2(C) snd_soc_hdmi_codec binfmt_misc cec drm_display_helper hid_logitech_dj bcm2835_mmal_vchiq(C) drm_dma_helper drm_kms_helper videobuf2_v4l2 raspberrypi_hwmon ecdh_generic videobuf2_vmalloc videobuf2_memops ecc videobuf2_common rfkill videodev libaes snd_soc_core dwc2 i2c_brcmstb snd_pcm_dmaengine snd_bcm2835(C) i2c_bcm2835 pwm_bcm2835 snd_pcm mc v3d snd_timer snd gpu_sched drm_shmem_helper nvmem_rmem uio_pdrv_genirq uio i2c_dev drm fuse dm_mod drm_panel_orientation_quirks backlight configfs ip_tables x_tables ipv6 [ 34.146556] CPU: 1 PID: 1890 Comm: v3d_submit_csd Tainted: G C 6.7.0-rc3-g49ddab089611 #68 [ 34.146563] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT) [ 34.146569] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 34.146575] pc : drm_sched_job_cleanup+0x3c/0x190 [gpu_sched] [ 34.146611] lr : v3d_submit_csd_ioctl+0x1b4/0x460 [v3d] [ 34.146653] sp : ffffffc083cbbb80 [ 34.146658] x29: ffffffc083cbbb90 x28: ffffff81035afc00 x27: ffffffe77a641168 [ 34.146668] x26: ffffff81056a8000 x25: 0000000000000058 x24: 0000000000000000 [ 34.146677] x23: ffffff81065e2000 x22: ffffff81035afe00 x21: ffffffc083cbbcf0 [ 34.146686] x20: ffffff81035afe00 x19: 00000000ffffffea x18: 0000000000000000 [ 34.146694] x17: 0000000000000000 x16: ffffffe7989e34b0 x15: 0000000000000000 [ 34.146703] x14: 0000000004000004 x13: ffffff81035afe80 x12: ffffffc083cb8000 [ 34.146711] x11: cc57e05dfbe5ef00 x10: cc57e05dfbe5ef00 x9 : ffffffe77a64131c [ 34.146719] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000000000003f [ 34.146727] x5 : 0000000000000040 x4 : ffffff81fefb03f0 x3 : ffffffc083cbba40 [ 34.146736] x2 : ffffff81056a8000 x1 : ffffffe7989e35e8 x0 : 0000000000000000 [ 34.146745] Call trace: [ 34.146748] drm_sched_job_cleanup+0x3c/0x190 [gpu_sched] [ 34.146768] v3d_submit_csd_ioctl+0x1b4/0x460 [v3d] [ 34.146791] drm_ioctl_kernel+0xe0/0x120 [drm] [ 34.147029] drm_ioctl+0x264/0x408 [drm] [ 34.147135] __arm64_sys_ioctl+0x9c/0xe0 [ 34.147152] invoke_syscall+0x4c/0x118 [ 34.147162] el0_svc_common+0xb8/0xf0 [ 34.147168] do_el0_svc+0x28/0x40 [ 34.147174] el0_svc+0x38/0x88 [ 34.147184] el0t_64_sync_handler+0x84/0x100 [ 34.147191] el0t_64_sync+0x190/0x198 [ 34.147201] Code: aa0003f4 f90007e8 f9401008 aa0803e0 (b8478c09) [ 34.147210] ---[ end trace 0000000000000000 ]--- This happens because we are calling `drm_sched_job_cleanup()` twice: once at `v3d_job_init()` and again when we call `v3d_job_cleanup()`. To mitigate this issue, we can return to the same approach that we used to use before 464c61e76de8: deallocate the job after `v3d_job_init()` fails and assign it to NULL. Then, when we call `v3d_job_cleanup()`, job is NULL and the function returns. Fixes: 464c61e76de8 ("drm/v3d: Decouple job allocation from job initiation") Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240109142857.1122704-1-mcanal@igalia.com
2024-01-09drm/v3d: Fix support for register debugging on the RPi 4Maíra Canal
RPi 4 uses V3D 4.2, which is currently not supported by the register definition stated at `v3d_core_reg_defs`. We should be able to support V3D 4.2, therefore, change the maximum version of the register definition to 42, not 41. Fixes: 0ad5bc1ce463 ("drm/v3d: fix up register addresses for V3D 7.x") Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240109113126.929446-1-mcanal@igalia.com
2023-12-04drm/v3d: Fix missing error code in v3d_submit_cpu_ioctl()Harshit Mogalapalli
Smatch warns: drivers/gpu/drm/v3d/v3d_submit.c:1222 v3d_submit_cpu_ioctl() warn: missing error code 'ret' When there is no job type or job is submitted with wrong number of BOs it is an error path, ret is zero at this point which is incorrect return. Fix this by changing it to -EINVAL. Fixes: aafc1a2bea67 ("drm/v3d: Add a CPU job submission") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reviewed-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Melissa Wen <melissa.srw@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231204122102.181298-1-harshit.m.mogalapalli@oracle.com
2023-12-01drm/v3d: Create a CPU job extension for the copy performance query jobMaíra Canal
A CPU job is a type of job that performs operations that requires CPU intervention. A copy performance query job is a job that copy the complete or partial result of a query to a buffer. In order to copy the result of a performance query to a buffer, we need to get the values from the performance monitors. So, create a user extension for the CPU job that enables the creation of a copy performance query job. This user extension will allow the creation of a CPU job that copy the results of a performance query to a BO with the possibility to indicate the availability with a availability bit. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-19-mcanal@igalia.com
2023-12-01drm/v3d: Create a CPU job extension for the reset performance query jobMaíra Canal
A CPU job is a type of job that performs operations that requires CPU intervention. A reset performance query job is a job that resets the performance queries by resetting the values of the perfmons. Moreover, we also reset the syncobjs related to the availability of the query. So, create a user extension for the CPU job that enables the creation of a reset performance job. This user extension will allow the creation of a CPU job that resets the perfmons values and resets the availability syncobj. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-18-mcanal@igalia.com
2023-12-01drm/v3d: Create a CPU job extension to copy timestamp query to a bufferMaíra Canal
A CPU job is a type of job that performs operations that requires CPU intervention. A copy timestamp query job is a job that copy the complete or partial result of a query to a buffer. As V3D doesn't provide any mechanism to obtain a timestamp from the GPU, it is a job that needs CPU intervention. So, create a user extension for the CPU job that enables the creation of a copy timestamp query job. This user extension will allow the creation of a CPU job that copy the results of a timestamp query to a BO with the possibility to indicate the timestamp availability with a availability bit. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-17-mcanal@igalia.com
2023-12-01drm/v3d: Create a CPU job extension for the reset timestamp jobMaíra Canal
A CPU job is a type of job that performs operations that requires CPU intervention. A reset timestamp job is a job that resets the timestamp queries based on the value offset of the first query. As V3D doesn't provide any mechanism to obtain a timestamp from the GPU, it is a job that needs CPU intervention. So, create a user extension for the CPU job that enables the creation of a reset timestamp job. This user extension will allow the creation of a CPU job that resets the timestamp value in the timestamp BO and resets the availability syncobj. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-16-mcanal@igalia.com
2023-12-01drm/v3d: Create a CPU job extension for the timestamp query jobMaíra Canal
A CPU job is a type of job that performs operations that requires CPU intervention. A timestamp query job is a job that calculates the query timestamp and updates the query availability by signaling a syncobj. As V3D doesn't provide any mechanism to obtain a timestamp from the GPU, it is a job that needs CPU intervention. So, create a user extension for the CPU job that enables the creation of a timestamp query job. This user extension will allow the creation of a CPU job that performs the timestamp query calculation and updates the timestamp BO with the proper value. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-15-mcanal@igalia.com
2023-12-01drm/v3d: Create a CPU job extension for a indirect CSD jobMaíra Canal
A CPU job is a type of job that performs operations that requires CPU intervention. An indirect CSD job is a job that, when executed in the queue, will map the indirect buffer, read the dispatch parameters, and submit a regular dispatch. Therefore, it is a job that needs CPU intervention. So, create a user extension for the CPU job that enables the creation of an indirect CSD. This user extension will allow the creation of a CSD job linked to a CPU job. The CPU job will wait for the indirect CSD job dependencies and, once they are signaled, it will update the CSD job parameters. Co-developed-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-14-mcanal@igalia.com
2023-12-01drm/v3d: Enable BO mappingMaíra Canal
For the indirect CSD CPU job, we will need to access the internal contents of the BO with the dispatch parameters. Therefore, create methods to allow the mapping and unmapping of the BO. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-13-mcanal@igalia.com
2023-12-01drm/v3d: Detach the CSD job BO setupMelissa Wen
Detach CSD job setup from CSD submission ioctl to reuse it in CPU submission ioctl for indirect CSD job. Signed-off-by: Melissa Wen <mwen@igalia.com> Co-developed-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-12-mcanal@igalia.com
2023-12-01drm/v3d: Create tracepoints to track the CPU jobMaíra Canal
Create tracepoints to track the three major events of a CPU job lifetime: 1. Submission of a `v3d_submit_cpu` IOCTL 2. Beginning of the execution of a CPU job 3. Ending of the execution of a CPU job Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-11-mcanal@igalia.com
2023-12-01drm/v3d: Use v3d_get_extensions() to parse CPU job dataMaíra Canal
Currently, v3d_get_extensions() only parses multisync data and assigns it to the `struct v3d_submit_ext`. But, to implement the CPU job with user extensions, we want v3d_get_extensions() to be able to parse CPU job data and assign it to the `struct v3d_cpu_job`. Therefore, allow the function v3d_get_extensions() to use `struct v3d_cpu_job *` as a parameter. If the `struct v3d_cpu_job *` is assigned to NULL, it means that the job is a GPU job and CPU job extensions should be rejected. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-10-mcanal@igalia.com
2023-12-01drm/v3d: Add a CPU job submissionMelissa Wen
Create a new type of job, a CPU job. A CPU job is a type of job that performs operations that requires CPU intervention. The overall idea is to use user extensions to enable different types of CPU job, allowing the CPU job to perform different operations according to the type of user extension. The user extension ID identify the type of CPU job that must be dealt. Having a CPU job is interesting for synchronization purposes as a CPU job has a queue like any other V3D job and can be synchoronized by the multisync extension. Signed-off-by: Melissa Wen <mwen@igalia.com> Co-developed-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-9-mcanal@igalia.com
2023-12-01drm/v3d: Decouple job allocation from job initiationMaíra Canal
We want to allow the IOCTLs to allocate the job without initiating it. This will be useful for the CPU job submission IOCTL, as the CPU job has the need to use information from the user extensions. Currently, the user extensions are parsed before the job allocation, making it impossible to fill the CPU job when parsing the user extensions. Therefore, decouple the job allocation from the job initiation. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-8-mcanal@igalia.com
2023-12-01drm/v3d: Don't allow two multisync extensions in the same jobMaíra Canal
Currently, two multisync extensions can be added to the same job and only the last multisync extension will be used. To avoid this vulnerability, don't allow two multisync extensions in the same job. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-7-mcanal@igalia.com
2023-12-01drm/v3d: Simplify job refcount handlingMelissa Wen
Instead of checking if the job is NULL every time we call the function, check it inside the function. Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-6-mcanal@igalia.com
2023-12-01drm/v3d: Detach job submissions IOCTLs to a new specific fileMelissa Wen
We will include a new job submission type, the CPU job submission. For readability and maintability, separate the job submission IOCTLs and related operations from v3d_gem.c. Minor fix in the CSD submission kernel doc: CSD (texture formatting) -> CSD (compute shader). Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-5-mcanal@igalia.com
2023-12-01drm/v3d: Move wait BO ioctl to the v3d_bo fileMelissa Wen
IOCTLs related to BO operations reside on the file v3d_bo.c. The wait BO ioctl is the only IOCTL regarding BOs that is placed in a different file. So, move it to the v3d_bo.c file. Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-4-mcanal@igalia.com
2023-12-01drm/v3d: Remove unused function headerMelissa Wen
v3d_mmu_get_offset header was added but the function was never defined. Just remove it. Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-3-mcanal@igalia.com
2023-11-10drm/sched: implement dynamic job-flow controlDanilo Krummrich
Currently, job flow control is implemented simply by limiting the number of jobs in flight. Therefore, a scheduler is initialized with a credit limit that corresponds to the number of jobs which can be sent to the hardware. This implies that for each job, drivers need to account for the maximum job size possible in order to not overflow the ring buffer. However, there are drivers, such as Nouveau, where the job size has a rather large range. For such drivers it can easily happen that job submissions not even filling the ring by 1% can block subsequent submissions, which, in the worst case, can lead to the ring run dry. In order to overcome this issue, allow for tracking the actual job size instead of the number of jobs. Therefore, add a field to track a job's credit count, which represents the number of credits a job contributes to the scheduler's credit limit. Signed-off-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231110001638.71750-1-dakr@redhat.com
2023-11-06drm/v3d: Expose the total GPU usage stats on sysfsMaíra Canal
The previous patch exposed the accumulated amount of active time per client for each V3D queue. But this doesn't provide a global notion of the GPU usage. Therefore, provide the accumulated amount of active time for each V3D queue (BIN, RENDER, CSD, TFU and CACHE_CLEAN), considering all the jobs submitted to the queue, independent of the client. This data is exposed through the sysfs interface, so that if the interface is queried at two different points of time the usage percentage of each of the queues can be calculated. Co-developed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Acked-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Melissa Wen <mwen@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230905213416.1290219-3-mcanal@igalia.com
2023-11-06drm/v3d: Implement show_fdinfo() callback for GPU usage statsMaíra Canal
This patch exposes the accumulated amount of active time per client through the fdinfo infrastructure. The amount of active time is exposed for each V3D queue: BIN, RENDER, CSD, TFU and CACHE_CLEAN. In order to calculate the amount of active time per client, a CPU clock is used through the function local_clock(). The point where the jobs has started is marked and is finally compared with the time that the job had finished. Moreover, the number of jobs submitted to each queue is also exposed on fdinfo through the identifier "v3d-jobs-<queue>". Co-developed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Acked-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Melissa Wen <mwen@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230905213416.1290219-3-mcanal@igalia.com
2023-11-02drm/v3d: add brcm,2712-v3d as a compatible V3D deviceIago Toral Quiroga
This is required to get the V3D module to load with Raspberry Pi 5. Signed-off-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231031073859.25298-5-itoral@igalia.com
2023-11-02drm/v3d: fix up register addresses for V3D 7.xIago Toral Quiroga
This patch updates a number of register addresses that have been changed in Raspberry Pi 5 (V3D 7.1) and updates the code to use the corresponding registers and addresses based on the actual V3D version. Signed-off-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231031073859.25298-3-itoral@igalia.com
2023-11-01drm/sched: Convert drm scheduler to use a work queue rather than kthreadMatthew Brost
In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1 mapping between a drm_gpu_scheduler and drm_sched_entity. At first this seems a bit odd but let us explain the reasoning below. 1. In Xe the submission order from multiple drm_sched_entity is not guaranteed to be the same completion even if targeting the same hardware engine. This is because in Xe we have a firmware scheduler, the GuC, which allowed to reorder, timeslice, and preempt submissions. If a using shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls apart as the TDR expects submission order == completion order. Using a dedicated drm_gpu_scheduler per drm_sched_entity solve this problem. 2. In Xe submissions are done via programming a ring buffer (circular buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow control on the ring for free. A problem with this design is currently a drm_gpu_scheduler uses a kthread for submission / job cleanup. This doesn't scale if a large number of drm_gpu_scheduler are used. To work around the scaling issue, use a worker rather than kthread for submission / job cleanup. v2: - (Rob Clark) Fix msm build - Pass in run work queue v3: - (Boris) don't have loop in worker v4: - (Tvrtko) break out submit ready, stop, start helpers into own patch v5: - (Boris) default to ordered work queue v6: - (Luben / checkpatch) fix alignment in msm_ringbuffer.c - (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue - (Luben) Update comment for drm_sched_wqueue_enqueue - (Luben) Positive check for submit_wq in drm_sched_init - (Luben) s/alloc_submit_wq/own_submit_wq v7: - (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue v8: - (Luben) Adjust var names / comments Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://lore.kernel.org/r/20231031032439.1558703-3-matthew.brost@intel.com Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
2023-10-30drm/v3d: wait for all jobs to finish before unregisteringMaíra Canal
Currently, we are only warning the user if the BIN or RENDER jobs don't finish before we unregister V3D. We must wait for all jobs to finish before unregistering. Therefore, warn the user if TFU or CSD jobs are not done by the time the driver is unregistered. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Maíra Canal <mairacanal@riseup.net> Link: https://patchwork.freedesktop.org/patch/msgid/20231023105927.101502-1-mcanal@igalia.com
2023-10-26drm/sched: Convert the GPU scheduler to variable number of run-queuesLuben Tuikov
The GPU scheduler has now a variable number of run-queues, which are set up at drm_sched_init() time. This way, each driver announces how many run-queues it requires (supports) per each GPU scheduler it creates. Note, that run-queues correspond to scheduler "priorities", thus if the number of run-queues is set to 1 at drm_sched_init(), then that scheduler supports a single run-queue, i.e. single "priority". If a driver further sets a single entity per run-queue, then this creates a 1-to-1 correspondence between a scheduler and a scheduled entity. Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Emma Anholt <emma@anholt.net> Cc: etnaviv@lists.freedesktop.org Cc: lima@lists.freedesktop.org Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20231023032251.164775-1-luben.tuikov@amd.com
2023-10-05drm/v3d: Annotate struct v3d_perfmon with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct v3d_perfmon. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Emma Anholt <emma@anholt.net> Cc: Melissa Wen <mwen@igalia.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230922173216.3823169-9-keescook@chromium.org
2023-07-27drm/v3d: Avoid -Wconstant-logical-operand in nsecs_to_jiffies_timeout()Nathan Chancellor
A proposed update to clang's -Wconstant-logical-operand to warn when the left hand side is a constant shows the following instance in nsecs_to_jiffies_timeout() when NSEC_PER_SEC is not a multiple of HZ, such as CONFIG_HZ=300: In file included from drivers/gpu/drm/v3d/v3d_debugfs.c:12: drivers/gpu/drm/v3d/v3d_drv.h:343:24: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand] 343 | if (NSEC_PER_SEC % HZ && | ~~~~~~~~~~~~~~~~~ ^ drivers/gpu/drm/v3d/v3d_drv.h:343:24: note: use '&' for a bitwise operation 343 | if (NSEC_PER_SEC % HZ && | ^~ | & drivers/gpu/drm/v3d/v3d_drv.h:343:24: note: remove constant to silence this warning 1 warning generated. Turn this into an explicit comparison against zero to make the expression a boolean to make it clear this should be a logical check, not a bitwise one. Link: https://reviews.llvm.org/D142609 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Maíra Canal <mairacanal@riseup.net> Link: https://patchwork.freedesktop.org/patch/msgid/20230718-nsecs_to_jiffies_timeout-constant-logical-operand-v1-1-36ed8fc8faea@kernel.org
2023-06-26drm: Clear fd/handle callbacks in struct drm_driverThomas Zimmermann
Clear all assignments of struct drm_driver's fd/handle callbacks to drm_gem_prime_fd_to_handle() and drm_gem_prime_handle_to_fd(). These functions are called by default. Add a TODO item to convert vmwgfx to the defaults as well. v2: * remove TODO item (Zack) * also update amdgpu's amdgpu_partition_driver Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Simon Ser <contact@emersion.fr> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Jeffrey Hugo <quic_jhugo@quicinc.com> # qaic Link: https://patchwork.freedesktop.org/patch/msgid/20230620080252.16368-3-tzimmermann@suse.de