summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
AgeCommit message (Collapse)Author
2021-01-13drm/i915/selftests: Bump the scheduling error threshold for fast heartbeatsChris Wilson
Since we are system_highpri_wq, we expected the heartbeat to be scheduled promptly. However, we see delays of over 10ms upsetting our assertions. Accept this as inevitable and bump the minimum error threshold to 20ms (from 6 jiffies). <6> [616.784749] rcs0: Heartbeat delay: 3570us [2802, 9188] <6> [616.807790] bcs0: Heartbeat delay: 2111us [745, 4372] <6> [616.853776] vcs0: Heartbeat delay: 6485us [2424, 11637] <3> [616.859296] vcs0: Heartbeat delay was 6485us, expected less than 6000us <3> [616.860901] i915/intel_heartbeat_live_selftests: live_heartbeat_fast failed with error -22 v2: More context from CI. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210113163115.5740-2-chris@chris-wilson.co.uk
2020-10-21drm/i915/selftests: Flush the old heartbeat more gentlyChris Wilson
In order to test how fast the heartbeat can respond, we measure with the interval set to its minimum. Before we measure though, we want to be sure we start with a fresh pulse, and so wait until any old one is complete. During that wait though, we were continually flushing the work, and so continually re-evaluating to see if the pulse was complete, and each attempt would count as an unresponsive system. If the engine did not complete the request in the couple of busy-spins, it would flag an error. This is unfortunate, so let's not busy-spin waiting for the old heartbeat, but terminate it and start afresh. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019142841.32273-1-chris@chris-wilson.co.uk
2020-08-17drm/i915: Add a couple of missing i915_active_fini()Chris Wilson
We use i915_active_fini() as a debug check on the i915_active state before freeing. If we forget to call it, we may end up angering the debugobjects contained within. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-06-22drm/i915/params: switch to device specific parametersJani Nikula
Start using device specific parameters instead of module parameters for most things. The module parameters become the immutable initial values for i915 parameters. The device specific parameters in i915->params start life as a copy of i915_modparams. Any later changes are only reflected in the debugfs. The stragglers are: * i915.force_probe and i915.modeset. Needed before dev_priv is available. This is fine because the parameters are read-only and never modified. * i915.verbose_state_checks. Passing dev_priv to I915_STATE_WARN and I915_STATE_WARN_ON would result in massive and ugly churn. This is handled by not exposing the parameter via debugfs, and leaving the parameter writable in sysfs. This may be fixed up in follow-up work. * i915.inject_probe_failure. Only makes sense in terms of the module, not the device. This is handled by not exposing the parameter via debugfs. v2: Fix uc i915 lookup code (Michał Winiarski) Cc: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com> Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20200618150402.14022-1-jani.nikula@intel.com
2020-06-18drm/i915/selftests: Enable selftesting of busy-statsChris Wilson
A couple of very simple tests to ensure that the basic properties of per-engine busyness accounting [0% and 100% busy] are faithful. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200617130916.15261-1-chris@chris-wilson.co.uk
2020-02-28drm/i915/selftests: Disable heartbeat around manual pulse testsChris Wilson
Still chasing the mystery of the stray idle flush, let's ensure that the heartbeat does not run at the same time as our test and confuse us. References: https://gitlab.freedesktop.org/drm/intel/issues/541 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-8-chris@chris-wilson.co.uk
2019-12-21drm/i915: Remove i915->kernel_contextChris Wilson
Allocate only an internal intel_context for the kernel_context, forgoing a global GEM context for internal use as we only require a separate address space (for our own protection). Now having weaned GT from requiring ce->gem_context, we can stop referencing it entirely. This also means we no longer have to create random and unnecessary GEM contexts for internal use. GEM contexts are now entirely for tracking GEM clients, and intel_context the execution environment on the GPU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191221160324.1073045-1-chris@chris-wilson.co.uk
2019-11-29drm/i915/selftests: Wait only on the expected barrierChris Wilson
Wait on only the last request on the kernel_context after emitting a barrier so that we do not wait for everything in general and by doing so cause an accidental emission of the barrier! Bugzilla; https://bugs.freedesktop.org/show_bug.cgi?id=112405 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191129103455.744389-1-chris@chris-wilson.co.uk
2019-11-28drm/i915/selftests: Try to show where the pulse wentChris Wilson
We have a case of a mysteriously absent pulse, so dump the engine details to see if we can find out what happened to it. References: https://bugs.freedesktop.org/show_bug.cgi?id=112405 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191128102546.3857140-1-chris@chris-wilson.co.uk
2019-11-02drm/i915/selftests: Flush all active callbacksChris Wilson
Flushing the outer i915_active is not enough, as we need the barrier to be applied across all the active dma_fence callbacks. So we must serialise with each outstanding fence. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112096 References: f79520bb3337 ("drm/i915/selftests: Synchronize checking active status with retirement") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191101181022.25633-1-chris@chris-wilson.co.uk
2019-10-31drm/i915/selftests: Pretty print the i915_activeChris Wilson
If the idle_pulse fails to flush the i915_active, dump the tree to see if that has any clues. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191031101116.19894-1-chris@chris-wilson.co.uk
2019-10-31drm/i915/selftests: Assert that the idle_pulse is sentChris Wilson
When checking the heartbeat pulse, we expect it to have been sent by the time we have slept. We can verify this by checking the engine serial number to see if that matches the predicted pulse serial. It will always be true if, and only if, the pulse was sent by itself (as designed by the test). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191031094259.23028-1-chris@chris-wilson.co.uk
2019-10-28drm/i915/selftests: Initialise err in case there are no engines!Chris Wilson
drivers/gpu/drm/i915//gt/selftest_engine_heartbeat.c:255 live_heartbeat_fast() error: uninitialized symbol 'err'. drivers/gpu/drm/i915//gt/selftest_engine_heartbeat.c:320 live_heartbeat_off() error: uninitialized symbol 'err'. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191025135943.12524-2-chris@chris-wilson.co.uk
2019-10-24drm/i915/selftests: Flush any i915_active callback work as wellChris Wilson
Make trebly sure that all possible callbacks and their delayed brethren are complete before asserting that the i915_active should be idle after flushing all barriers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191023235359.27132-1-chris@chris-wilson.co.uk
2019-10-23drm/i915/gt: Replace hangcheck by heartbeatsChris Wilson
Replace sampling the engine state every so often with a periodic heartbeat request to measure the health of an engine. This is coupled with the forced-preemption to allow long running requests to survive so long as they do not block other users. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-5-chris@chris-wilson.co.uk
2019-10-22drm/i915/selftests: Synchronize checking active status with retirementChris Wilson
If retirement is running on another thread, we may inspect the status of the i915_active before its retirement callback is complete. As we expect it to be running synchronously, we can wait for any callback to complete by acquiring the i915_active.mutex. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191022112111.9317-1-chris@chris-wilson.co.uk
2019-10-21drm/i915/gt: Introduce barrier pulses along enginesChris Wilson
To flush idle barriers, and even inflight requests, we want to send a preemptive 'pulse' along an engine. We use a no-op request along the pinned kernel_context at high priority so that it should run or else kick off the stuck requests. We can use this to ensure idle barriers are immediately flushed, as part of a context cancellation mechanism, or as part of a heartbeat mechanism to detect and reset a stuck GPU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191021174339.5389-1-chris@chris-wilson.co.uk