pm24.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2024-04-25	fix missing vmalloc.h includes	Kent Overstreet
	Patch series "Memory allocation profiling", v6. Overview: Low overhead [1] per-callsite memory allocation profiling. Not just for debug kernels, overhead low enough to be deployed in production. Example output: root@moria-kvm:~# sort -rn /proc/allocinfo 127664128 31168 mm/page_ext.c:270 func:alloc_page_ext 56373248 4737 mm/slub.c:2259 func:alloc_slab_page 14880768 3633 mm/readahead.c:247 func:page_cache_ra_unbounded 14417920 3520 mm/mm_init.c:2530 func:alloc_large_system_hash 13377536 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs 11718656 2861 mm/filemap.c:1919 func:__filemap_get_folio 9192960 2800 kernel/fork.c:307 func:alloc_thread_stack_node 4206592 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable 4136960 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start 3940352 962 mm/memory.c:4214 func:alloc_anon_folio 2894464 22613 fs/kernfs/dir.c:615 func:__kernfs_new_node ... Usage: kconfig options: - CONFIG_MEM_ALLOC_PROFILING - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT - CONFIG_MEM_ALLOC_PROFILING_DEBUG adds warnings for allocations that weren't accounted because of a missing annotation sysctl: /proc/sys/vm/mem_profiling Runtime info: /proc/allocinfo Notes: [1]: Overhead To measure the overhead we are comparing the following configurations: (1) Baseline with CONFIG_MEMCG_KMEM=n (2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) (3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) (4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1) (5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT (6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) && CONFIG_MEMCG_KMEM=y (7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y Performance overhead: To evaluate performance we implemented an in-kernel test executing multiple get_free_page/free_page and kmalloc/kfree calls with allocation sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU affinity set to a specific CPU to minimize the noise. Below are results from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on 56 core Intel Xeon: kmalloc pgalloc (1 baseline) 6.764s 16.902s (2 default disabled) 6.793s (+0.43%) 17.007s (+0.62%) (3 default enabled) 7.197s (+6.40%) 23.666s (+40.02%) (4 runtime enabled) 7.405s (+9.48%) 23.901s (+41.41%) (5 memcg) 13.388s (+97.94%) 48.460s (+186.71%) (6 def disabled+memcg) 13.332s (+97.10%) 48.105s (+184.61%) (7 def enabled+memcg) 13.446s (+98.78%) 54.963s (+225.18%) Memory overhead: Kernel size: text data bss dec diff (1) 26515311 18890222 17018880 62424413 (2) 26524728 19423818 16740352 62688898 264485 (3) 26524724 19423818 16740352 62688894 264481 (4) 26524728 19423818 16740352 62688898 264485 (5) 26541782 18964374 16957440 62463596 39183 Memory consumption on a 56 core Intel CPU with 125GB of memory: Code tags: 192 kB PageExts: 262144 kB (256MB) SlabExts: 9876 kB (9.6MB) PcpuExts: 512 kB (0.5MB) Total overhead is 0.2% of total memory. Benchmarks: Hackbench tests run 100 times: hackbench -s 512 -l 200 -g 15 -f 25 -P baseline disabled profiling enabled profiling avg 0.3543 0.3559 (+0.0016) 0.3566 (+0.0023) stdev 0.0137 0.0188 0.0077 hackbench -l 10000 baseline disabled profiling enabled profiling avg 6.4218 6.4306 (+0.0088) 6.5077 (+0.0859) stdev 0.0933 0.0286 0.0489 stress-ng tests: stress-ng --class memory --seq 4 -t 60 stress-ng --class cpu --seq 4 -t 60 Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/ [2] https://lore.kernel.org/all/20240306182440.2003814-1-surenb@google.com/ This patch (of 37): The next patch drops vmalloc.h from a system header in order to fix a circular dependency; this adds it to all the files that were pulling it in implicitly. [kent.overstreet@linux.dev: fix arch/alpha/lib/memcpy.c] Link: https://lkml.kernel.org/r/20240327002152.3339937-1-kent.overstreet@linux.dev [surenb@google.com: fix arch/x86/mm/numa_32.c] Link: https://lkml.kernel.org/r/20240402180933.1663992-1-surenb@google.com [kent.overstreet@linux.dev: a few places were depending on sizes.h] Link: https://lkml.kernel.org/r/20240404034744.1664840-1-kent.overstreet@linux.dev [arnd@arndb.de: fix mm/kasan/hw_tags.c] Link: https://lkml.kernel.org/r/20240404124435.3121534-1-arnd@kernel.org [surenb@google.com: fix arc build] Link: https://lkml.kernel.org/r/20240405225115.431056-1-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-1-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-2-surenb@google.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2021-12-09	drm/vmwgfx: add support for updating only offsets of constant buffers	Roland Scheidegger
	This adds support for the SVGA_3D_CMD_DX_SET_VS/PS/GS/HS/DS/CS_CONSTANT_BUFFER_OFFSET commands (which only update the offset, but don't rebind the buffer), which saves some overhead. Signed-off-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Martin Krastev <krastevm@vmware.com> Signed-off-by: Zack Rusin <zackr@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211206172620.3139754-11-zack@kde.org
2021-12-09	drm/vmwgfx: support 64 UAVs	Zack Rusin
	If the host supports SVGA3D_DEVCAP_GL43, we can handle 64 instead of just 8 UAVs. Based on a patch from Roland Scheidegger <sroland@vmware.com>. Signed-off-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Martin Krastev <krastevm@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211206172620.3139754-9-zack@kde.org
2021-12-09	drm/vmwgfx: Remove the dedicated memory accounting	Zack Rusin
	vmwgfx shared very elaborate memory accounting with ttm. It was moved from ttm to vmwgfx in change f07069da6b4c ("drm/ttm: move memory accounting into vmwgfx v4") but because of complexity it was hard to maintain. Some parts of the code weren't freeing memory correctly and some were missing accounting all together. While those would be fairly easy to fix the fundamental reason for memory accounting in the driver was the ability to invoke shrinker which is part of TTM code as well (with support for unified memory hopefully coming soon). That meant that vmwgfx had a lot of code that was either unused or duplicating code from TTM. Removing this code also prevents excessive calls to global swapout which were common during memory pressure because both vmwgfx and TTM would invoke the shrinker when memory usage reached half of RAM. Fixes: f07069da6b4c ("drm/ttm: move memory accounting into vmwgfx v4") Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Martin Krastev <krastevm@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211206172620.3139754-2-zack@kde.org
2021-06-16	drm/vmwgfx: Update device headers	Zack Rusin
	Historically our device headers have been forked versions of the internal device headers, this has made maintaining them a bit of a burden. To fix the situation, going forward, the device headers will be verbatim copies of the internal headers. To do that the driver code has to be adapted to use pristine device headers. This will make future update to the device headers trivial and automatic. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Martin Krastev <krastevm@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210615182336.995192-2-zackr@vmware.com
2021-06-12	drm/vmwgfx: Fix some static checker warnings	Zack Rusin
	Fix some minor issues that Coverity spotted in the code. None of that are serious but they're all valid concerns so fixing them makes sense. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Martin Krastev <krastevm@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210609172307.131929-5-zackr@vmware.com
2021-05-11	drm/vmwgfx: Add basic support for SVGA3	Zack Rusin
	SVGA3 is the next version of our PCI device. Some of the changes include using MMIO for register accesses instead of ioports, deprecating the FIFO MMIO and removing a lot of the old and legacy functionality. SVGA3 doesn't support guest backed objects right now so everything except 3D is working. v2: Fixes all the static analyzer warnings Signed-off-by: Zack Rusin <zackr@vmware.com> Cc: Martin Krastev <krastevm@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210505191007.305872-1-zackr@vmware.com
2021-01-19	drm/vmwgfx/vmwgfx_binding: Provide some missing param descriptions and ↵	Lee Jones
	remove others Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/vmwgfx/vmwgfx_binding.c:340: warning: Function parameter or member 'shader_slot' not described in 'vmw_binding_add' drivers/gpu/drm/vmwgfx/vmwgfx_binding.c:340: warning: Function parameter or member 'slot' not described in 'vmw_binding_add' drivers/gpu/drm/vmwgfx/vmwgfx_binding.c:376: warning: Function parameter or member 'from' not described in 'vmw_binding_transfer' drivers/gpu/drm/vmwgfx/vmwgfx_binding.c:498: warning: Function parameter or member 'to' not described in 'vmw_binding_state_commit' drivers/gpu/drm/vmwgfx/vmwgfx_binding.c:498: warning: Excess function parameter 'ctx' description in 'vmw_binding_state_commit' drivers/gpu/drm/vmwgfx/vmwgfx_binding.c:498: warning: Excess function parameter 'scrubbed' description in 'vmw_binding_state_commit' drivers/gpu/drm/vmwgfx/vmwgfx_binding.c:520: warning: Function parameter or member 'cbs' not described in 'vmw_binding_rebind_all' drivers/gpu/drm/vmwgfx/vmwgfx_binding.c:520: warning: Excess function parameter 'ctx' description in 'vmw_binding_rebind_all' drivers/gpu/drm/vmwgfx/vmwgfx_binding.c:795: warning: Function parameter or member 'shader_slot' not described in 'vmw_emit_set_sr' Cc: VMware Graphics <linux-graphics-maintainer@vmware.com> Cc: Roland Scheidegger <sroland@vmware.com> Cc: Zack Rusin <zackr@vmware.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Zack Rusin <zackr@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210115181601.3432599-5-lee.jones@linaro.org
2021-01-14	drm/vmwgfx: Cleanup the cmd/fifo split	Zack Rusin
	Lets try to cleanup the usage of the term FIFO which we used for both our MMIO based cmd queue processing and for general command processing which could have been using command buffers interface. We're going to rename the functions which are processing commands (and work either via MMIO or command buffers) as _cmd_ and functions which operate on the MMIO based commands as FIFO to match the SVGA device naming. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Martin Krastev <krastevm@vmware.com> Link: https://patchwork.freedesktop.org/patch/414044/?series=85516&rev=2
2020-03-23	drm/vmwgfx: Add support for streamoutput with mob commands	Deepak Rawat
	With SM5 capability a new version of streamoutput is supported by device which need backing mob and a new field. With this change the new command is supported in command buffer. v2: Also track streamoutput context binding in binding manager. v3: Track only one streamoutput as only one can be set to context. v4: Fix comment typos Signed-off-by: Deepak Rawat <drawat.floss@gmail.com> Signed-off-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Thomas Hellström (VMware) <thomas_os@shipmail.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Roland Scheidegger <sroland@vmware.com>
2020-03-23	drm/vmwgfx: Rename stream output target binding tracker struct	Deepak Rawat
	Previous name vmw_ctx_bindinfo_so is misleading because it actually represent so target and stream output is a new resource type that needs tracking for SM5 capable device. Also rename binding type enum and internal functions to reflect these belongs to so targets. Signed-off-by: Deepak Rawat <drawat.floss@gmail.com> Reviewed-by: Thomas Hellström (VMware) <thomas_os@shipmail.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Roland Scheidegger <sroland@vmware.com>
2020-03-23	drm/vmwgfx: Add support for UA view commands	Deepak Rawat
	Virtual device now support new commands to manage unordered access views. Allow them as part of user-space command buffer. This involves adding UA view cotable, binding tracker info, new view type and command verifier functions. v2: fix comment typo v3: style fixes (don't use deprecated PTR_RET) Signed-off-by: Deepak Rawat <drawat.floss@gmail.com> Signed-off-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Thomas Hellström (VMware) <thomas_os@shipmail.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Roland Scheidegger <sroland@vmware.com>
2020-03-23	drm/vmwgfx: Support SM5 shader type in command buffer	Deepak Rawat
	Virtual device now supports new shader types, allow them as valid shader type in command buffer. Also add per shader bind info in binding manager state for new shader type. Signed-off-by: Deepak Rawat <drawat.floss@gmail.com> Reviewed-by: Thomas Hellström (VMware) <thomas_os@shipmail.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Roland Scheidegger <sroland@vmware.com>
2019-04-08	drm/vmwgfx: Use preprocessor macro for FIFO allocation	Deepak Rawat
	Whenever FIFO allocation fails an error message is printed to dmesg. Since this is common operation a lot of similar messages are scattered everywhere. Use preprocessor macro to remove this cluttering. Signed-off-by: Deepak Rawat <drawat@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
2019-04-08	drm/vmwgfx: Be more restrictive when dirtying resources	Thomas Hellstrom
	Currently we flag resources as dirty (GPU contents not yet read back to the backing MOB) whenever they have been part of a command stream. Obviously many resources can't be dirty and others can only be dirty when written to by the GPU. That is when they are either bound to the context as render-targets, depth-stencil, copy / clear destinations and stream-output targets, or similarly when there are corresponding views into them. So mark resources dirty only in these special cases. Context- and cotable resources are always marked dirty when referenced. This is important for upcoming emulated coherent memory, since we can avoid issuing automatic readbacks to non-dirty resources when the CPU tries to access part of the backing MOB. Testing: Unigine Heaven with max GPU memory set to 256MB resulting in heavy resource thrashing. --- v2: Addressed review comments by Deepak Rawat. v3: Added some documentation Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Deepak Rawat <drawat@vmware.com>
2018-06-29	drm/vmwgfx: add SPDX idenitifier and clarify license	Dirk Hohndel (VMware)
	This is dual licensed under GPL-2.0 or MIT. vmwgfx_msg.h is the odd one out that is GPL-2.0+ or MIT. Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dirk Hohndel (VMware) <dirk@hohndel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180506231626.115996-9-dirk@hohndel.org
2017-12-27	drm/ttm: use an operation ctx for ttm_mem_global_alloc	Roger He
	forward the operation context to ttm_mem_global_alloc as well, and the ultimate goal is swapout enablement for reserved BOs Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Roger He <Hongbo.He@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-12	drm/vmwgfx: Initial DX support	Thomas Hellstrom
	Initial DX support. Co-authored with Sinclair Yeh, Charmaine Lee and Jakob Bornecrantz. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Sinclair Yeh <syeh@vmware.com> Signed-off-by: Charmaine Lee <charmainel@vmware.com>