author:    Dave Airlie <airlied@redhat.com>  2021-01-15 15:03:36 +1000
committer: Dave Airlie <airlied@redhat.com>  2021-01-15 15:03:36 +1000
commit:    fb5cfcaa2efbb4c71abb1dfbc8f4da727e0bfd89 (patch)
tree:      33fc62a96a9f17b90c8d24e3397a4f57d5169161 /drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
parent:    dfa7c521bfc0bb2fa9f59ac3435233593e74e424 (diff)
parent:    368fd0d79c099493f2b8e80f2ffaa6f70dd0461a (diff)
Merge tag 'drm-intel-gt-next-2021-01-14' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
UAPI Changes:
- Deprecate I915_PMU_LAST and optimize state tracking (Tvrtko)
Avoid relying on the last-item ABI marker in i915_drm.h and add a
comment marking it as deprecated.
Cross-subsystem Changes:
Core Changes:
Driver Changes:
- Restore clear residuals security mitigations for Ivybridge and
Baytrail (Chris)
- Close #1858: Allow sysadmin to choose applied GPU security mitigations
through i915.mitigations=... similar to CPU (Chris)
- Fix for #2024: GPU hangs on HSW GT1 (Chris)
- Fix for #2707: Driver hang when editing UVs in Blender (Chris, Ville)
- Fix for #2797: False positive GuC loading error message (Chris)
- Fix for #2859: Missing GuC firmware for older Cometlakes (Chris)
- Lessen probability of GPU hang due to DMAR faults [reason 7,
next page table ptr is invalid] on Tigerlake (Chris)
- Fix REVID macros for TGL to fetch correct stepping (Aditya)
- Limit frequency drop to RPe on parking (Chris, Edward)
- Limit W/A 1406941453 to TGL, RKL and DG1 (Swathi)
- Make W/A 22010271021 permanent on DG1 (Lucas)
- Implement W/A 16011163337 to prevent a HS/DS hang on DG1 (Swathi)
- Only disable preemption on gen8 render engines (Chris)
- Disable arbitration around Braswell's PDP updates (Chris)
- Disable arbitration on no-preempt requests (Chris)
- Check for arbitration after writing start seqno before busywaiting (Chris)
- Retain default context state across shrinking (Venkata, CQ)
- Fix mismatch between misplaced vma check and vma insert for 32-bit
addressing userspaces (Chris, CQ)
- Propagate error for vmap() failure instead of kernel NULL deref (Chris)
- Propagate error from cancelled submit due to context closure
immediately (Chris)
- Fix RCU race on HWSP tracking per request (Chris)
- Clear CMD parser shadow and GPU reloc batches (Matt A)
- Populate logical context during first pin (Maarten)
- Optimistically prune dma-resv from the shrinker (Chris)
- Fix for virtual engine ownership race (Chris)
- Remove timeslice suppression to restore fairness for virtual engines (Chris)
- Rearrange IVB/HSW workarounds properly between GT and engine (Chris)
- Taint the reset mutex with the shrinker (Chris)
- Replace direct submit with direct call to tasklet (Chris)
- Multiple corrections to virtual engine dequeue and breadcrumbs code (Chris)
- Avoid wakeref from potentially hard IRQ context in PMU (Tvrtko)
- Use raw clock for RC6 time estimation in PMU (Tvrtko)
- Differentiate OOM failures from invalid map types (Chris)
- Fix Gen9 to have 64 MOCS entries similar to Gen11 (Chris)
- Ignore repeated attempts to suspend request flow across reset (Chris)
- Remove livelock from "do_idle_maps" VT-d W/A (Chris)
- Cancel the preemption timeout early in case engine reset fails (Chris)
- Code flow optimization in the scheduling code (Chris)
- Clear the execlists timers upon reset (Chris)
- Drain the breadcrumbs just once (Chris, Matt A)
- Track the overall GT awake/busy time (Chris)
- Tweak submission tasklet flushing to avoid starvation (Chris)
- Track timelines created using the HWSP to restore on resume (Chris)
- Use cmpxchg64 for 32b compatibility for active tracking (Chris)
- Prefer recycling an idle GGTT fence to avoid GPU wait (Chris)
- Restructure GT code organization for clearer split between GuC
and execlists (Chris, Daniele, John, Matt A)
- Remove GuC code that will remain unused by new interfaces (Matt B)
- Restructure the CS timestamp clocks code to be local to GT (Chris)
- Fix error return paths in perf code (Zhang)
- Replace idr_init() by idr_init_base() in perf (Deepak)
- Fix shmem_pin_map error path (Colin)
- Drop redundant free_work worker for GEM contexts (Chris, Mika)
- Increase readability and understandability of intel_workarounds.c (Lucas)
- Defer enabling the breadcrumb interrupt to after submission (Chris)
- Deal with buddy alloc block sizes beyond 4G (Venkata, Chris)
- Encode fence specific waitqueue behaviour into the wait.flags (Chris)
- Don't cancel the breadcrumb interrupt shadow too early (Chris)
- Cancel submitted requests upon context reset (Chris)
- Use correct locks in GuC code (Tvrtko)
- Prevent use of engine->wa_ctx after error (Chris, Matt R)
- Fix build warning on 32-bit (Arnd)
- Avoid memory leak if platform would have more than 16 W/A (Tvrtko)
- Avoid unnecessary #if CONFIG_PM in PMU code (Chris, Tvrtko)
- Improve debugging output (Chris, Tvrtko, Matt R)
- Make file local variables static (Jani)
- Avoid uint*_t types in i915 (Jani)
- Selftest improvements (Chris, Matt A, Dan)
- Documentation fixes (Chris, Jose)
Signed-off-by: Dave Airlie <airlied@redhat.com>
# Conflicts:
# drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
# drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
# drivers/gpu/drm/i915/gt/intel_lrc.c
# drivers/gpu/drm/i915/gvt/mmio_context.h
# drivers/gpu/drm/i915/i915_drv.h
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210114152232.GA21588@jlahtine-mobl.ger.corp.intel.com
Diffstat (limited to 'drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c')
-rw-r--r--  drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c  197
1 file changed, 157 insertions(+), 40 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c b/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
index 999079686846..a4242ca8dcd7 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
@@ -7,34 +7,146 @@
 #include "intel_gt.h"
 #include "intel_gt_clock_utils.h"
 
-#define MHZ_12   12000000 /* 12MHz (24MHz/2), 83.333ns */
-#define MHZ_12_5 12500000 /* 12.5MHz (25MHz/2), 80ns */
-#define MHZ_19_2 19200000 /* 19.2MHz, 52.083ns */
+static u32 read_reference_ts_freq(struct intel_uncore *uncore)
+{
+	u32 ts_override = intel_uncore_read(uncore, GEN9_TIMESTAMP_OVERRIDE);
+	u32 base_freq, frac_freq;
+
+	base_freq = ((ts_override & GEN9_TIMESTAMP_OVERRIDE_US_COUNTER_DIVIDER_MASK) >>
+		     GEN9_TIMESTAMP_OVERRIDE_US_COUNTER_DIVIDER_SHIFT) + 1;
+	base_freq *= 1000000;
+
+	frac_freq = ((ts_override &
+		      GEN9_TIMESTAMP_OVERRIDE_US_COUNTER_DENOMINATOR_MASK) >>
+		     GEN9_TIMESTAMP_OVERRIDE_US_COUNTER_DENOMINATOR_SHIFT);
+	frac_freq = 1000000 / (frac_freq + 1);
+
+	return base_freq + frac_freq;
+}
+
+static u32 gen10_get_crystal_clock_freq(struct intel_uncore *uncore,
+					u32 rpm_config_reg)
+{
+	u32 f19_2_mhz = 19200000;
+	u32 f24_mhz = 24000000;
+	u32 crystal_clock =
+		(rpm_config_reg & GEN9_RPM_CONFIG0_CRYSTAL_CLOCK_FREQ_MASK) >>
+		GEN9_RPM_CONFIG0_CRYSTAL_CLOCK_FREQ_SHIFT;
 
-static u32 read_clock_frequency(const struct intel_gt *gt)
+	switch (crystal_clock) {
+	case GEN9_RPM_CONFIG0_CRYSTAL_CLOCK_FREQ_19_2_MHZ:
+		return f19_2_mhz;
+	case GEN9_RPM_CONFIG0_CRYSTAL_CLOCK_FREQ_24_MHZ:
+		return f24_mhz;
+	default:
+		MISSING_CASE(crystal_clock);
+		return 0;
+	}
+}
+
+static u32 gen11_get_crystal_clock_freq(struct intel_uncore *uncore,
+					u32 rpm_config_reg)
 {
-	if (INTEL_GEN(gt->i915) >= 11) {
-		u32 config;
-
-		config = intel_uncore_read(gt->uncore, RPM_CONFIG0);
-		config &= GEN11_RPM_CONFIG0_CRYSTAL_CLOCK_FREQ_MASK;
-		config >>= GEN11_RPM_CONFIG0_CRYSTAL_CLOCK_FREQ_SHIFT;
-
-		switch (config) {
-		case 0: return MHZ_12;
-		case 1:
-		case 2: return MHZ_19_2;
-		default:
-		case 3: return MHZ_12_5;
+	u32 f19_2_mhz = 19200000;
+	u32 f24_mhz = 24000000;
+	u32 f25_mhz = 25000000;
+	u32 f38_4_mhz = 38400000;
+	u32 crystal_clock =
+		(rpm_config_reg & GEN11_RPM_CONFIG0_CRYSTAL_CLOCK_FREQ_MASK) >>
+		GEN11_RPM_CONFIG0_CRYSTAL_CLOCK_FREQ_SHIFT;
+
+	switch (crystal_clock) {
+	case GEN11_RPM_CONFIG0_CRYSTAL_CLOCK_FREQ_24_MHZ:
+		return f24_mhz;
+	case GEN11_RPM_CONFIG0_CRYSTAL_CLOCK_FREQ_19_2_MHZ:
+		return f19_2_mhz;
+	case GEN11_RPM_CONFIG0_CRYSTAL_CLOCK_FREQ_38_4_MHZ:
+		return f38_4_mhz;
+	case GEN11_RPM_CONFIG0_CRYSTAL_CLOCK_FREQ_25_MHZ:
+		return f25_mhz;
+	default:
+		MISSING_CASE(crystal_clock);
+		return 0;
+	}
+}
+
+static u32 read_clock_frequency(struct intel_uncore *uncore)
+{
+	u32 f12_5_mhz = 12500000;
+	u32 f19_2_mhz = 19200000;
+	u32 f24_mhz = 24000000;
+
+	if (INTEL_GEN(uncore->i915) <= 4) {
+		/*
+		 * PRMs say:
+		 *
+		 *     "The value in this register increments once every 16
+		 *     hclks." (through the “Clocking Configuration”
+		 *     (“CLKCFG”) MCHBAR register)
+		 */
+		return RUNTIME_INFO(uncore->i915)->rawclk_freq * 1000 / 16;
+	} else if (INTEL_GEN(uncore->i915) <= 8) {
+		/*
+		 * PRMs say:
+		 *
+		 *     "The PCU TSC counts 10ns increments; this timestamp
+		 *     reflects bits 38:3 of the TSC (i.e. 80ns granularity,
+		 *     rolling over every 1.5 hours).
+		 */
+		return f12_5_mhz;
+	} else if (INTEL_GEN(uncore->i915) <= 9) {
+		u32 ctc_reg = intel_uncore_read(uncore, CTC_MODE);
+		u32 freq = 0;
+
+		if ((ctc_reg & CTC_SOURCE_PARAMETER_MASK) == CTC_SOURCE_DIVIDE_LOGIC) {
+			freq = read_reference_ts_freq(uncore);
+		} else {
+			freq = IS_GEN9_LP(uncore->i915) ? f19_2_mhz : f24_mhz;
+
+			/*
+			 * Now figure out how the command stream's timestamp
+			 * register increments from this frequency (it might
+			 * increment only every few clock cycle).
+			 */
+			freq >>= 3 - ((ctc_reg & CTC_SHIFT_PARAMETER_MASK) >>
+				      CTC_SHIFT_PARAMETER_SHIFT);
 		}
-	} else if (INTEL_GEN(gt->i915) >= 9) {
-		if (IS_GEN9_LP(gt->i915))
-			return MHZ_19_2;
-		else
-			return MHZ_12;
-	} else {
-		return MHZ_12_5;
+
+		return freq;
+	} else if (INTEL_GEN(uncore->i915) <= 12) {
+		u32 ctc_reg = intel_uncore_read(uncore, CTC_MODE);
+		u32 freq = 0;
+
+		/*
+		 * First figure out the reference frequency. There are 2 ways
+		 * we can compute the frequency, either through the
+		 * TIMESTAMP_OVERRIDE register or through RPM_CONFIG. CTC_MODE
+		 * tells us which one we should use.
+		 */
+		if ((ctc_reg & CTC_SOURCE_PARAMETER_MASK) == CTC_SOURCE_DIVIDE_LOGIC) {
+			freq = read_reference_ts_freq(uncore);
+		} else {
+			u32 c0 = intel_uncore_read(uncore, RPM_CONFIG0);
+
+			if (INTEL_GEN(uncore->i915) <= 10)
+				freq = gen10_get_crystal_clock_freq(uncore, c0);
+			else
+				freq = gen11_get_crystal_clock_freq(uncore, c0);
+
+			/*
+			 * Now figure out how the command stream's timestamp
+			 * register increments from this frequency (it might
+			 * increment only every few clock cycle).
+			 */
+			freq >>= 3 - ((c0 & GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK) >>
+				      GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_SHIFT);
+		}
+
+		return freq;
 	}
+
+	MISSING_CASE("Unknown gen, unable to read command streamer timestamp frequency\n");
+	return 0;
 }
 
 void intel_gt_init_clock_frequency(struct intel_gt *gt)
@@ -43,20 +155,27 @@ void intel_gt_init_clock_frequency(struct intel_gt *gt)
 	 * Note that on gen11+, the clock frequency may be reconfigured.
 	 * We do not, and we assume nobody else does.
 	 */
-	gt->clock_frequency = read_clock_frequency(gt);
+	gt->clock_frequency = read_clock_frequency(gt->uncore);
+	if (gt->clock_frequency)
+		gt->clock_period_ns = intel_gt_clock_interval_to_ns(gt, 1);
+
 	GT_TRACE(gt,
-		 "Using clock frequency: %dkHz\n",
-		 gt->clock_frequency / 1000);
+		 "Using clock frequency: %dkHz, period: %dns, wrap: %lldms\n",
+		 gt->clock_frequency / 1000,
+		 gt->clock_period_ns,
+		 div_u64(mul_u32_u32(gt->clock_period_ns, S32_MAX),
+			 USEC_PER_SEC));
+
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
 void intel_gt_check_clock_frequency(const struct intel_gt *gt)
 {
-	if (gt->clock_frequency != read_clock_frequency(gt)) {
+	if (gt->clock_frequency != read_clock_frequency(gt->uncore)) {
 		dev_err(gt->i915->drm.dev,
 			"GT clock frequency changed, was %uHz, now %uHz!\n",
 			gt->clock_frequency,
-			read_clock_frequency(gt));
+			read_clock_frequency(gt->uncore));
 	}
 }
 #endif
@@ -66,26 +185,24 @@ static u64 div_u64_roundup(u64 nom, u32 den)
 	return div_u64(nom + den - 1, den);
 }
 
-u32 intel_gt_clock_interval_to_ns(const struct intel_gt *gt, u32 count)
+u64 intel_gt_clock_interval_to_ns(const struct intel_gt *gt, u64 count)
 {
-	return div_u64_roundup(mul_u32_u32(count, 1000 * 1000 * 1000),
-			       gt->clock_frequency);
+	return div_u64_roundup(count * NSEC_PER_SEC, gt->clock_frequency);
 }
 
-u32 intel_gt_pm_interval_to_ns(const struct intel_gt *gt, u32 count)
+u64 intel_gt_pm_interval_to_ns(const struct intel_gt *gt, u64 count)
 {
 	return intel_gt_clock_interval_to_ns(gt, 16 * count);
 }
 
-u32 intel_gt_ns_to_clock_interval(const struct intel_gt *gt, u32 ns)
+u64 intel_gt_ns_to_clock_interval(const struct intel_gt *gt, u64 ns)
 {
-	return div_u64_roundup(mul_u32_u32(gt->clock_frequency, ns),
-			       1000 * 1000 * 1000);
+	return div_u64_roundup(gt->clock_frequency * ns, NSEC_PER_SEC);
 }
 
-u32 intel_gt_ns_to_pm_interval(const struct intel_gt *gt, u32 ns)
+u64 intel_gt_ns_to_pm_interval(const struct intel_gt *gt, u64 ns)
 {
-	u32 val;
+	u64 val;
 
 	/*
 	 * Make these a multiple of magic 25 to avoid SNB (eg. Dell XPS
@@ -94,9 +211,9 @@ u32 intel_gt_ns_to_pm_interval(const struct intel_gt *gt, u32 ns)
 	 * EI/thresholds are "bad", leading to a very sluggish or even
 	 * frozen machine.
 	 */
-	val = DIV_ROUND_UP(intel_gt_ns_to_clock_interval(gt, ns), 16);
+	val = div_u64_roundup(intel_gt_ns_to_clock_interval(gt, ns), 16);
 	if (IS_GEN(gt->i915, 6))
-		val = roundup(val, 25);
+		val = div_u64_roundup(val, 25) * 25;
 
 	return val;
 }
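The frequency decode in the first hunk reduces to two arithmetic steps: TIMESTAMP_OVERRIDE yields a reference frequency as an integer MHz part (divider field + 1) plus a fractional part of 1MHz / (denominator field + 1), and the command streamer timestamp then ticks at that reference shifted right by (3 - shift). The standalone sketch below mirrors that arithmetic for illustration only; the field values and helper names are hypothetical stand-ins, not i915 driver API.

/* Illustrative sketch of the register decode done by read_reference_ts_freq()
 * and read_clock_frequency() in the diff above. The field values are made up
 * for the example; nothing here touches hardware.
 */
#include <inttypes.h>
#include <stdio.h>

/* TIMESTAMP_OVERRIDE-style decode: an integer MHz part derived from a
 * divider field, plus a fractional part of 1MHz / (denominator + 1).
 */
static uint32_t reference_ts_freq(uint32_t divider_field, uint32_t denom_field)
{
	uint32_t base_freq = (divider_field + 1) * 1000000;
	uint32_t frac_freq = 1000000 / (denom_field + 1);

	return base_freq + frac_freq;
}

/* The command-stream timestamp ticks once every 2^(3 - shift) reference
 * clock cycles, hence the "freq >>= 3 - shift" in the driver code.
 */
static uint32_t cs_timestamp_freq(uint32_t ref_freq_hz, uint32_t shift_field)
{
	return ref_freq_hz >> (3 - shift_field);
}

int main(void)
{
	/* Hypothetical fields: divider 18 and denominator 4 decode to
	 * 19MHz + 0.2MHz = 19.2MHz.
	 */
	uint32_t ref = reference_ts_freq(18, 4);

	printf("reference clock: %" PRIu32 " Hz\n", ref);
	/* With a shift field of 1 the CS timestamp runs at ref / 4. */
	printf("cs timestamp:    %" PRIu32 " Hz\n", cs_timestamp_freq(ref, 1));
	return 0;
}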
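The conversion helpers at the bottom of the diff are likewise easy to check by hand: ticks become nanoseconds through a round-up division by the GT clock frequency, a PM (RPS) interval corresponds to 16 GT clocks, and on Sandybridge the resulting value is rounded up to a multiple of 25. The standalone sketch below replays that math with an assumed 19.2 MHz clock; the names and sample values are illustrative, not driver code.

/* Illustrative userspace sketch of the tick<->ns math used by the
 * intel_gt_*_interval helpers in the diff above. Names and sample
 * values are hypothetical.
 */
#include <inttypes.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ULL

/* Round-up integer division, mirroring div_u64_roundup() in the patch. */
static uint64_t div_round_up_u64(uint64_t num, uint64_t den)
{
	return (num + den - 1) / den;
}

/* GT clock ticks -> nanoseconds: ns = ceil(count * 1e9 / freq). */
static uint64_t clock_interval_to_ns(uint64_t freq_hz, uint64_t count)
{
	return div_round_up_u64(count * NSEC_PER_SEC, freq_hz);
}

/* Nanoseconds -> GT clock ticks: count = ceil(freq * ns / 1e9). */
static uint64_t ns_to_clock_interval(uint64_t freq_hz, uint64_t ns)
{
	return div_round_up_u64(freq_hz * ns, NSEC_PER_SEC);
}

/* PM (RPS) intervals tick once every 16 GT clocks, so divide by 16;
 * Sandybridge additionally wants the value rounded up to a multiple of 25.
 */
static uint64_t ns_to_pm_interval(uint64_t freq_hz, uint64_t ns, int is_gen6)
{
	uint64_t val = div_round_up_u64(ns_to_clock_interval(freq_hz, ns), 16);

	if (is_gen6)
		val = div_round_up_u64(val, 25) * 25;

	return val;
}

int main(void)
{
	uint64_t freq_hz = 19200000;	/* example: a 19.2 MHz GT clock */

	/* One tick of a 19.2 MHz clock is ~52ns (rounded up to 53). */
	printf("1 tick    = %" PRIu64 " ns\n", clock_interval_to_ns(freq_hz, 1));
	/* 1ms expressed in GT clocks and in PM intervals. */
	printf("1 ms      = %" PRIu64 " ticks\n", ns_to_clock_interval(freq_hz, 1000000));
	printf("1 ms (PM) = %" PRIu64 " intervals\n", ns_to_pm_interval(freq_hz, 1000000, 1));
	return 0;
}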