From 20277d8c1ff57b575dc2c1a1b2898cf211c51800 Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Fri, 15 Dec 2023 15:02:03 -0800 Subject: [PATCH 001/200] drm/xe: Fix UBSAN splat in add_preempt_fences() add_preempt_fences() calls dma_resv_reserve_fences() with num_fences == 0 resulting in the below UBSAN splat. Short circuit add_preempt_fences() if num_fences == 0. [ 58.652241] ================================================================================ [ 58.660736] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13 [ 58.667281] shift exponent 64 is too large for 64-bit type 'long unsigned int' [ 58.674539] CPU: 2 PID: 1170 Comm: xe_gpgpu_fill Not tainted 6.6.0-rc3-guc+ #630 [ 58.674545] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.3243.A01.2006102133 06/10/2020 [ 58.674547] Call Trace: [ 58.674548] [ 58.674550] dump_stack_lvl+0x92/0xb0 [ 58.674555] __ubsan_handle_shift_out_of_bounds+0x15a/0x300 [ 58.674559] ? rcu_is_watching+0x12/0x60 [ 58.674564] ? software_resume+0x141/0x210 [ 58.674575] ? new_vma+0x44b/0x600 [xe] [ 58.674606] dma_resv_reserve_fences.cold+0x40/0x66 [ 58.674612] new_vma+0x4b3/0x600 [xe] [ 58.674638] xe_vm_bind_ioctl+0xffd/0x1e00 [xe] [ 58.674663] ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe] [ 58.674680] drm_ioctl_kernel+0xc1/0x170 [ 58.674686] ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe] [ 58.674703] drm_ioctl+0x247/0x4c0 [ 58.674709] ? find_held_lock+0x2b/0x80 [ 58.674716] __x64_sys_ioctl+0x8c/0xb0 [ 58.674720] do_syscall_64+0x3c/0x90 [ 58.674723] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 58.674727] RIP: 0033:0x7fce4bd1aaff [ 58.674730] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 [ 58.674731] RSP: 002b:00007ffc57434050 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 58.674734] RAX: ffffffffffffffda RBX: 00007ffc574340e0 RCX: 00007fce4bd1aaff [ 58.674736] RDX: 00007ffc574340e0 RSI: 0000000040886445 RDI: 0000000000000003 [ 58.674737] RBP: 0000000040886445 R08: 0000000000000002 R09: 00007ffc574341b0 [ 58.674739] R10: 000055de43eb3780 R11: 0000000000000246 R12: 00007ffc574340e0 [ 58.674740] R13: 0000000000000003 R14: 00007ffc574341b0 R15: 0000000000000001 [ 58.674747] [ 58.674748] ================================================================================ Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Lucas De Marchi Signed-off-by: Matthew Brost Reviewed-by: Lucas De Marchi Link: https://lore.kernel.org/r/20231215230203.719244-1-matthew.brost@intel.com Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/xe/xe_vm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 322c1ecceccac5..1ca917b8315c2a 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -279,6 +279,9 @@ static int add_preempt_fences(struct xe_vm *vm, struct xe_bo *bo) struct xe_exec_queue *q; int err; + if (!vm->preempt.num_exec_queues) + return 0; + err = xe_bo_lock(bo, true); if (err) return err; From 717cf0a78340f5bf0e47ed5000e6b8685890d9d7 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Mon, 18 Dec 2023 08:33:01 -0800 Subject: [PATCH 002/200] drm/xe: Fix warning on impossible condition Having a different value for op is not possible: this is already kept out of user-visible warning by the check in xe_wait_user_fence_ioctl() if op > MAX_OP. 
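For reference, the bound check in question looks roughly like this (a paraphrased sketch of the check in xe_wait_user_fence_ioctl(), not part of either patch):

	if (XE_IOCTL_DBG(xe, args->op > MAX_OP))
		return -EINVAL;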
The warning is useful: if this switch() is not updated when a new op is added, it should be triggered. Fix the warning as reported by 0-DAY CI Kernel: drivers/gpu/drm/xe/xe_wait_user_fence.c:46:2: warning: variable 'passed' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized] Closes: https://lore.kernel.org/oe-kbuild-all/202312170357.KPSinwPs-lkp@intel.com/ Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reviewed-by: Rodrigo Vivi Link: https://lore.kernel.org/r/20231218163301.3453285-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/xe/xe_wait_user_fence.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/xe/xe_wait_user_fence.c b/drivers/gpu/drm/xe/xe_wait_user_fence.c index b0a7896f7fcb6e..a75eeba7bfe583 100644 --- a/drivers/gpu/drm/xe/xe_wait_user_fence.c +++ b/drivers/gpu/drm/xe/xe_wait_user_fence.c @@ -46,6 +46,7 @@ static int do_compare(u64 addr, u64 value, u64 mask, u16 op) break; default: XE_WARN_ON("Not possible"); + return -EINVAL; } return passed ? 0 : 1; From 4e124151fcfc3b13786b81627b5d4f0373d3c8f1 Mon Sep 17 00:00:00 2001 From: Matt Roper Date: Fri, 15 Dec 2023 13:45:31 -0800 Subject: [PATCH 003/200] drm/xe/dg2: Drop pre-production workarounds Pre-production hardware is anything before C0 (for DG2-G10), before B1 (for DG2-G11), or before A1 (for DG2-G12). Workarounds specific to such hardware were already removed from i915 in commit eaeb4b361452 ("drm/i915/dg2: Drop pre-production GT workarounds") and there's even less value in keeping these around in the Xe driver. v2: - Drop Wa_14011441408 from xe_mocs.c. (Gustavo) - Drop Wa_14010648519, Wa_14010198302, and Wa_1608949956 which were mis-implemented; they were only supposed to apply to early steppings of DG2-G10, but were being applied unconditionally on all DG2. (Gustavo) - Drop reference to Wa_16011620976; the implementation stays because it still matches Wa_22015475538.
(Gustavo) Cc: Gustavo Sousa Reviewed-by: Gustavo Sousa Link: https://lore.kernel.org/r/20231215214531.2576215-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper --- drivers/gpu/drm/xe/xe_guc.c | 8 +- drivers/gpu/drm/xe/xe_mocs.c | 23 +--- drivers/gpu/drm/xe/xe_wa.c | 174 +---------------------------- drivers/gpu/drm/xe/xe_wa_oob.rules | 7 +- 4 files changed, 5 insertions(+), 207 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 482cb0df9f15bc..76b31d542e1ae5 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -133,13 +133,10 @@ static u32 guc_ctl_wa_flags(struct xe_guc *guc) if (XE_WA(gt, 22012773006)) flags |= GUC_WA_POLLCS; - if (XE_WA(gt, 16011759253)) - flags |= GUC_WA_GAM_CREDITS; - if (XE_WA(gt, 14014475959)) flags |= GUC_WA_HOLD_CCS_SWITCHOUT; - if (XE_WA(gt, 22011391025) || XE_WA(gt, 14012197797)) + if (XE_WA(gt, 22011391025)) flags |= GUC_WA_DUAL_QUEUE; /* @@ -150,9 +147,6 @@ static u32 guc_ctl_wa_flags(struct xe_guc *guc) if (GRAPHICS_VERx100(xe) < 1270) flags |= GUC_WA_PRE_PARSER; - if (XE_WA(gt, 16011777198)) - flags |= GUC_WA_RCS_RESET_BEFORE_RC6; - if (XE_WA(gt, 22012727170) || XE_WA(gt, 22012727685)) flags |= GUC_WA_CONTEXT_ISOLATION; diff --git a/drivers/gpu/drm/xe/xe_mocs.c b/drivers/gpu/drm/xe/xe_mocs.c index ef79552e4f2fe1..d205d684d80903 100644 --- a/drivers/gpu/drm/xe/xe_mocs.c +++ b/drivers/gpu/drm/xe/xe_mocs.c @@ -290,18 +290,6 @@ static const struct xe_mocs_entry dg2_mocs_desc[] = { MOCS_ENTRY(3, 0, L3_3_WB | L3_LKUP(1)), }; -static const struct xe_mocs_entry dg2_mocs_desc_g10_ax[] = { - /* Wa_14011441408: Set Go to Memory for MOCS#0 */ - MOCS_ENTRY(0, 0, L3_1_UC | L3_GLBGO(1) | L3_LKUP(1)), - /* UC - Coherent; GO:Memory */ - MOCS_ENTRY(1, 0, L3_1_UC | L3_GLBGO(1) | L3_LKUP(1)), - /* UC - Non-Coherent; GO:Memory */ - MOCS_ENTRY(2, 0, L3_1_UC | L3_GLBGO(1)), - - /* WB - LC */ - MOCS_ENTRY(3, 0, L3_3_WB | L3_LKUP(1)), -}; - static const struct xe_mocs_entry pvc_mocs_desc[] = { /* Error */ MOCS_ENTRY(0, 0, L3_3_WB), @@ -409,15 +397,8 @@ static unsigned int get_mocs_settings(struct xe_device *xe, info->unused_entries_index = 1; break; case XE_DG2: - if (xe->info.subplatform == XE_SUBPLATFORM_DG2_G10 && - xe->info.step.graphics >= STEP_A0 && - xe->info.step.graphics <= STEP_B0) { - info->size = ARRAY_SIZE(dg2_mocs_desc_g10_ax); - info->table = dg2_mocs_desc_g10_ax; - } else { - info->size = ARRAY_SIZE(dg2_mocs_desc); - info->table = dg2_mocs_desc; - } + info->size = ARRAY_SIZE(dg2_mocs_desc); + info->table = dg2_mocs_desc; info->uc_index = 1; info->n_entries = XELP_NUM_MOCS_ENTRIES; info->unused_entries_index = 3; diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c index 5f61dd87c58688..7d6b86d602c8a7 100644 --- a/drivers/gpu/drm/xe/xe_wa.c +++ b/drivers/gpu/drm/xe/xe_wa.c @@ -125,13 +125,6 @@ static const struct xe_rtp_entry_sr gt_was[] = { /* DG2 */ - { XE_RTP_NAME("16010515920"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), - GRAPHICS_STEP(A0, B0), - ENGINE_CLASS(VIDEO_DECODE)), - XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F18(0), ALNUNIT_CLKGATE_DIS)), - XE_RTP_ENTRY_FLAG(FOREACH_ENGINE), - }, { XE_RTP_NAME("22010523718"), XE_RTP_RULES(SUBPLATFORM(DG2, G10)), XE_RTP_ACTIONS(SET(UNSLICE_UNIT_LEVEL_CLKGATE, CG3DDISCFEG_CLKGATE_DIS)) @@ -140,61 +133,6 @@ static const struct xe_rtp_entry_sr gt_was[] = { XE_RTP_RULES(SUBPLATFORM(DG2, G10)), XE_RTP_ACTIONS(SET(SUBSLICE_UNIT_LEVEL_CLKGATE, DSS_ROUTER_CLKGATE_DIS)) }, - { XE_RTP_NAME("14012362059"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, 
B0)), - XE_RTP_ACTIONS(SET(XEHP_MERT_MOD_CTRL, FORCE_MISS_FTLB)) - }, - { XE_RTP_NAME("14012362059"), - XE_RTP_RULES(SUBPLATFORM(DG2, G11), GRAPHICS_STEP(A0, B0)), - XE_RTP_ACTIONS(SET(XEHP_MERT_MOD_CTRL, FORCE_MISS_FTLB)) - }, - { XE_RTP_NAME("14010948348"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, B0)), - XE_RTP_ACTIONS(SET(UNSLCGCTL9430, MSQDUNIT_CLKGATE_DIS)) - }, - { XE_RTP_NAME("14011037102"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, B0)), - XE_RTP_ACTIONS(SET(UNSLCGCTL9444, LTCDD_CLKGATE_DIS)) - }, - { XE_RTP_NAME("14011371254"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, B0)), - XE_RTP_ACTIONS(SET(XEHP_SLICE_UNIT_LEVEL_CLKGATE, NODEDSS_CLKGATE_DIS)) - }, - { XE_RTP_NAME("14011431319"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, B0)), - XE_RTP_ACTIONS(SET(UNSLCGCTL9440, - GAMTLBOACS_CLKGATE_DIS | - GAMTLBVDBOX7_CLKGATE_DIS | GAMTLBVDBOX6_CLKGATE_DIS | - GAMTLBVDBOX5_CLKGATE_DIS | GAMTLBVDBOX4_CLKGATE_DIS | - GAMTLBVDBOX3_CLKGATE_DIS | GAMTLBVDBOX2_CLKGATE_DIS | - GAMTLBVDBOX1_CLKGATE_DIS | GAMTLBVDBOX0_CLKGATE_DIS | - GAMTLBKCR_CLKGATE_DIS | GAMTLBGUC_CLKGATE_DIS | - GAMTLBBLT_CLKGATE_DIS), - SET(UNSLCGCTL9444, - GAMTLBGFXA0_CLKGATE_DIS | GAMTLBGFXA1_CLKGATE_DIS | - GAMTLBCOMPA0_CLKGATE_DIS | GAMTLBCOMPA1_CLKGATE_DIS | - GAMTLBCOMPB0_CLKGATE_DIS | GAMTLBCOMPB1_CLKGATE_DIS | - GAMTLBCOMPC0_CLKGATE_DIS | GAMTLBCOMPC1_CLKGATE_DIS | - GAMTLBCOMPD0_CLKGATE_DIS | GAMTLBCOMPD1_CLKGATE_DIS | - GAMTLBMERT_CLKGATE_DIS | - GAMTLBVEBOX3_CLKGATE_DIS | GAMTLBVEBOX2_CLKGATE_DIS | - GAMTLBVEBOX1_CLKGATE_DIS | GAMTLBVEBOX0_CLKGATE_DIS)) - }, - { XE_RTP_NAME("14010569222"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, B0)), - XE_RTP_ACTIONS(SET(UNSLICE_UNIT_LEVEL_CLKGATE, GAMEDIA_CLKGATE_DIS)) - }, - { XE_RTP_NAME("14011028019"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, B0)), - XE_RTP_ACTIONS(SET(SSMCGCTL9530, RTFUNIT_CLKGATE_DIS)) - }, - { XE_RTP_NAME("14010680813"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, B0)), - XE_RTP_ACTIONS(SET(XEHP_GAMSTLB_CTRL, - CONTROL_BLOCK_CLKGATE_DIS | - EGRESS_BLOCK_CLKGATE_DIS | - TAG_BLOCK_CLKGATE_DIS)) - }, { XE_RTP_NAME("14014830051"), XE_RTP_RULES(PLATFORM(DG2)), XE_RTP_ACTIONS(CLR(SARB_CHICKEN1, COMP_CKN_IN)) @@ -212,10 +150,6 @@ static const struct xe_rtp_entry_sr gt_was[] = { INVALIDATION_BROADCAST_MODE_DIS | GLOBAL_INVALIDATION_MODE)) }, - { XE_RTP_NAME("14010648519"), - XE_RTP_RULES(PLATFORM(DG2)), - XE_RTP_ACTIONS(SET(XEHP_L3NODEARBCFG, XEHP_LNESPARE)) - }, /* PVC */ @@ -376,13 +310,6 @@ static const struct xe_rtp_entry_sr engine_was[] = { XE_RTP_ACTIONS(SET(VFG_PREEMPTION_CHICKEN, POLYGON_TRIFAN_LINELOOP_DISABLE)) }, - { XE_RTP_NAME("22012826095, 22013059131"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(B0, C0), - FUNC(xe_rtp_match_first_render_or_compute)), - XE_RTP_ACTIONS(FIELD_SET(LSC_CHICKEN_BIT_0_UDW, - MAXREQS_PER_BANK, - REG_FIELD_PREP(MAXREQS_PER_BANK, 2))) - }, { XE_RTP_NAME("22012826095, 22013059131"), XE_RTP_RULES(SUBPLATFORM(DG2, G11), FUNC(xe_rtp_match_first_render_or_compute)), @@ -390,28 +317,11 @@ static const struct xe_rtp_entry_sr engine_was[] = { MAXREQS_PER_BANK, REG_FIELD_PREP(MAXREQS_PER_BANK, 2))) }, - { XE_RTP_NAME("22013059131"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(B0, C0), - FUNC(xe_rtp_match_first_render_or_compute)), - XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0, FORCE_1_SUB_MESSAGE_PER_FRAGMENT)) - }, { XE_RTP_NAME("22013059131"), XE_RTP_RULES(SUBPLATFORM(DG2, G11), FUNC(xe_rtp_match_first_render_or_compute)), 
XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0, FORCE_1_SUB_MESSAGE_PER_FRAGMENT)) }, - { XE_RTP_NAME("14010918519"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, B0), - FUNC(xe_rtp_match_first_render_or_compute)), - XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0_UDW, - FORCE_SLM_FENCE_SCOPE_TO_TILE | - FORCE_UGM_FENCE_SCOPE_TO_TILE, - /* - * Ignore read back as it always returns 0 in these - * steps - */ - .read_mask = 0)) - }, { XE_RTP_NAME("14015227452"), XE_RTP_RULES(PLATFORM(DG2), FUNC(xe_rtp_match_first_render_or_compute)), @@ -428,21 +338,11 @@ static const struct xe_rtp_entry_sr engine_was[] = { FUNC(xe_rtp_match_first_render_or_compute)), XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0_UDW, UGM_FRAGMENT_THRESHOLD_TO_3)) }, - { XE_RTP_NAME("16011620976, 22015475538"), + { XE_RTP_NAME("22015475538"), XE_RTP_RULES(PLATFORM(DG2), FUNC(xe_rtp_match_first_render_or_compute)), XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0_UDW, DIS_CHAIN_2XSIMD8)) }, - { XE_RTP_NAME("22012654132"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, C0), - FUNC(xe_rtp_match_first_render_or_compute)), - XE_RTP_ACTIONS(SET(CACHE_MODE_SS, ENABLE_PREFETCH_INTO_IC, - /* - * Register can't be read back for verification on - * DG2 due to Wa_14012342262 - */ - .read_mask = 0)) - }, { XE_RTP_NAME("22012654132"), XE_RTP_RULES(SUBPLATFORM(DG2, G11), FUNC(xe_rtp_match_first_render_or_compute)), @@ -461,68 +361,11 @@ static const struct xe_rtp_entry_sr engine_was[] = { XE_RTP_RULES(PLATFORM(DG2), ENGINE_CLASS(RENDER)), XE_RTP_ACTIONS(SET(ROW_CHICKEN2, DISABLE_READ_SUPPRESSION)) }, - { XE_RTP_NAME("14013392000"), - XE_RTP_RULES(SUBPLATFORM(DG2, G11), GRAPHICS_STEP(A0, B0), - ENGINE_CLASS(RENDER)), - XE_RTP_ACTIONS(SET(ROW_CHICKEN2, ENABLE_LARGE_GRF_MODE)) - }, - { XE_RTP_NAME("14012419201"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, B0), - ENGINE_CLASS(RENDER)), - XE_RTP_ACTIONS(SET(ROW_CHICKEN4, - DISABLE_HDR_PAST_PAYLOAD_HOLD_FIX)) - }, - { XE_RTP_NAME("14012419201"), - XE_RTP_RULES(SUBPLATFORM(DG2, G11), GRAPHICS_STEP(A0, B0), - ENGINE_CLASS(RENDER)), - XE_RTP_ACTIONS(SET(ROW_CHICKEN4, - DISABLE_HDR_PAST_PAYLOAD_HOLD_FIX)) - }, - { XE_RTP_NAME("1308578152"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(B0, C0), - ENGINE_CLASS(RENDER), - FUNC(xe_rtp_match_first_gslice_fused_off)), - XE_RTP_ACTIONS(CLR(CS_DEBUG_MODE1(RENDER_RING_BASE), - REPLAY_MODE_GRANULARITY)) - }, { XE_RTP_NAME("22010960976, 14013347512"), XE_RTP_RULES(PLATFORM(DG2), ENGINE_CLASS(RENDER)), XE_RTP_ACTIONS(CLR(XEHP_HDC_CHICKEN0, LSC_L1_FLUSH_CTL_3D_DATAPORT_FLUSH_EVENTS_MASK)) }, - { XE_RTP_NAME("1608949956, 14010198302"), - XE_RTP_RULES(PLATFORM(DG2), ENGINE_CLASS(RENDER)), - XE_RTP_ACTIONS(SET(ROW_CHICKEN, - MDQ_ARBITRATION_MODE | UGM_BACKUP_MODE)) - }, - { XE_RTP_NAME("22010430635"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, B0), - ENGINE_CLASS(RENDER)), - XE_RTP_ACTIONS(SET(ROW_CHICKEN4, - DISABLE_GRF_CLEAR)) - }, - { XE_RTP_NAME("14013202645"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(B0, C0), - ENGINE_CLASS(RENDER)), - XE_RTP_ACTIONS(SET(RT_CTRL, DIS_NULL_QUERY)) - }, - { XE_RTP_NAME("14013202645"), - XE_RTP_RULES(SUBPLATFORM(DG2, G11), GRAPHICS_STEP(A0, B0), - ENGINE_CLASS(RENDER)), - XE_RTP_ACTIONS(SET(RT_CTRL, DIS_NULL_QUERY)) - }, - { XE_RTP_NAME("22012532006"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, C0), - ENGINE_CLASS(RENDER)), - XE_RTP_ACTIONS(SET(HALF_SLICE_CHICKEN7, - DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA)) - }, - { XE_RTP_NAME("22012532006"), - XE_RTP_RULES(SUBPLATFORM(DG2, G11), 
GRAPHICS_STEP(A0, B0), - ENGINE_CLASS(RENDER)), - XE_RTP_ACTIONS(SET(HALF_SLICE_CHICKEN7, - DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA)) - }, { XE_RTP_NAME("14015150844"), XE_RTP_RULES(PLATFORM(DG2), FUNC(xe_rtp_match_first_render_or_compute)), XE_RTP_ACTIONS(SET(XEHP_HDC_CHICKEN0, DIS_ATOMIC_CHAINING_TYPED_WRITES, @@ -652,21 +495,6 @@ static const struct xe_rtp_entry_sr lrc_was[] = { /* DG2 */ - { XE_RTP_NAME("16011186671"), - XE_RTP_RULES(SUBPLATFORM(DG2, G11), GRAPHICS_STEP(A0, B0)), - XE_RTP_ACTIONS(CLR(VFLSKPD, DIS_MULT_MISS_RD_SQUASH), - SET(VFLSKPD, DIS_OVER_FETCH_CACHE)) - }, - { XE_RTP_NAME("14010469329"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, B0)), - XE_RTP_ACTIONS(SET(XEHP_COMMON_SLICE_CHICKEN3, - XEHP_DUAL_SIMD8_SEQ_MERGE_DISABLE)) - }, - { XE_RTP_NAME("14010698770, 22010613112, 22010465075"), - XE_RTP_RULES(SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, B0)), - XE_RTP_ACTIONS(SET(XEHP_COMMON_SLICE_CHICKEN3, - DISABLE_CPS_AWARE_COLOR_PIPE)) - }, { XE_RTP_NAME("16013271637"), XE_RTP_RULES(PLATFORM(DG2)), XE_RTP_ACTIONS(SET(XEHP_SLICE_COMMON_ECO_CHICKEN1, diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules b/drivers/gpu/drm/xe/xe_wa_oob.rules index 727bdc42921249..e73b84e01ea1be 100644 --- a/drivers/gpu/drm/xe/xe_wa_oob.rules +++ b/drivers/gpu/drm/xe/xe_wa_oob.rules @@ -1,13 +1,8 @@ 22012773006 GRAPHICS_VERSION_RANGE(1200, 1250) -16011759253 SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, B0) 14014475959 GRAPHICS_VERSION_RANGE(1270, 1271), GRAPHICS_STEP(A0, B0) PLATFORM(DG2) 22011391025 PLATFORM(DG2) -14012197797 PLATFORM(DG2), GRAPHICS_STEP(A0, B0) -16011777198 SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, C0) - SUBPLATFORM(DG2, G11), GRAPHICS_STEP(A0, B0) -22012727170 SUBPLATFORM(DG2, G10), GRAPHICS_STEP(A0, C0) - SUBPLATFORM(DG2, G11) +22012727170 SUBPLATFORM(DG2, G11) 22012727685 SUBPLATFORM(DG2, G11) 16015675438 PLATFORM(PVC) SUBPLATFORM(DG2, G10) From 6901f732691f12154f35ee405c25b00ef51266ab Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 17:53:35 +0100 Subject: [PATCH 004/200] drm/xe: Add command MI_LOAD_REGISTER_MEM We will need this shortly during context state preparation. Cc: Matt Roper Reviewed-by: Matt Roper Link: https://lore.kernel.org/r/20231214185955.1791-2-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/instructions/xe_mi_commands.h | 3 +++ drivers/gpu/drm/xe/xe_lrc.c | 14 ++++++++++++++ 2 files changed, 17 insertions(+) diff --git a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h index 1cfa96167fde31..c74ceb550dce7f 100644 --- a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h +++ b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h @@ -56,6 +56,9 @@ #define MI_FLUSH_IMM_QW REG_FIELD_PREP(MI_FLUSH_DW_LEN_DW, 5 - 2) #define MI_FLUSH_DW_USE_GTT REG_BIT(2) +#define MI_LOAD_REGISTER_MEM (__MI_INSTR(0x29) | XE_INSTR_NUM_DW(4)) +#define MI_LRM_USE_GGTT REG_BIT(22) + #define MI_BATCH_BUFFER_START __MI_INSTR(0x31) #endif diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c index b7fa3831b68451..44586e612eaf5d 100644 --- a/drivers/gpu/drm/xe/xe_lrc.c +++ b/drivers/gpu/drm/xe/xe_lrc.c @@ -964,6 +964,20 @@ static int dump_mi_command(struct drm_printer *p, drm_printf(p, " - %#6x = %#010x\n", dw[i], dw[i + 1]); return numdw; + case MI_LOAD_REGISTER_MEM & MI_OPCODE: + drm_printf(p, "[%#010x] MI_LOAD_REGISTER_MEM: %s%s\n", + inst_header, + dw[0] & MI_LRI_LRM_CS_MMIO ? "CS_MMIO " : "", + dw[0] & MI_LRM_USE_GGTT ? 
"USE_GGTT " : ""); + if (numdw == 4) + drm_printf(p, " - %#6x = %#010llx\n", + dw[1], ((u64)(dw[3]) << 32 | (u64)(dw[2]))); + else + drm_printf(p, " - %*ph (%s)\n", + (int)sizeof(u32) * (numdw - 1), dw + 1, + numdw < 4 ? "truncated" : "malformed"); + return numdw; + case MI_FORCE_WAKEUP: drm_printf(p, "[%#010x] MI_FORCE_WAKEUP\n", inst_header); return numdw; From 54020e2b406d8d4be6d79409957f2130e93b4fa3 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 17:53:36 +0100 Subject: [PATCH 005/200] drm/xe: Define registers used by memory based irq processing The RING_INT_SRC_RPT_PTR register points to a cacheline in memory to which an engine must report as source of interrupt prior to generating an interrupt to the host. The RING_INT_STATUS_RPT_PTR register points to the first cacheline of the Interrupt Status Report (ISR) page (4KB) in graphics memory to which all engines report their interrupt status. The RING_IMR register has the interrupt enables and interrupt masks for an engine. We will refer to these registers shortly. Bspec: 45963, 45964, 45965 Reviewed-by: Matt Roper Link: https://lore.kernel.org/r/20231214185955.1791-3-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/regs/xe_engine_regs.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/xe/regs/xe_engine_regs.h b/drivers/gpu/drm/xe/regs/xe_engine_regs.h index 5592774fc69036..bd9c956f48a7c1 100644 --- a/drivers/gpu/drm/xe/regs/xe_engine_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_engine_regs.h @@ -75,7 +75,9 @@ #define FF_THREAD_MODE(base) XE_REG((base) + 0xa0) #define FF_TESSELATION_DOP_GATE_DISABLE BIT(19) +#define RING_INT_SRC_RPT_PTR(base) XE_REG((base) + 0xa4) #define RING_IMR(base) XE_REG((base) + 0xa8) +#define RING_INT_STATUS_RPT_PTR(base) XE_REG((base) + 0xac) #define RING_EIR(base) XE_REG((base) + 0xb0) #define RING_EMR(base) XE_REG((base) + 0xb4) From e3408839dd27b2645636f91c85a7fd847e36cb91 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 17:53:37 +0100 Subject: [PATCH 006/200] drm/xe: Update LRC context layout definitions The new memory based interrupt processing uses additional entries in the context. Add required definitions. Bspec: 45585, 60184 Reviewed-by: Matt Roper Link: https://lore.kernel.org/r/20231214185955.1791-4-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/regs/xe_lrc_layout.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/gpu/drm/xe/regs/xe_lrc_layout.h b/drivers/gpu/drm/xe/regs/xe_lrc_layout.h index 4be81abc86ad9d..1825d8f79db64e 100644 --- a/drivers/gpu/drm/xe/regs/xe_lrc_layout.h +++ b/drivers/gpu/drm/xe/regs/xe_lrc_layout.h @@ -14,4 +14,13 @@ #define CTX_PDP0_UDW (0x30 + 1) #define CTX_PDP0_LDW (0x32 + 1) +#define CTX_LRM_INT_MASK_ENABLE 0x50 +#define CTX_INT_MASK_ENABLE_REG (CTX_LRM_INT_MASK_ENABLE + 1) +#define CTX_INT_MASK_ENABLE_PTR (CTX_LRM_INT_MASK_ENABLE + 2) +#define CTX_LRI_INT_REPORT_PTR 0x55 +#define CTX_INT_STATUS_REPORT_REG (CTX_LRI_INT_REPORT_PTR + 1) +#define CTX_INT_STATUS_REPORT_PTR (CTX_LRI_INT_REPORT_PTR + 2) +#define CTX_INT_SRC_REPORT_REG (CTX_LRI_INT_REPORT_PTR + 3) +#define CTX_INT_SRC_REPORT_PTR (CTX_LRI_INT_REPORT_PTR + 4) + #endif From 7158a688935ca90c5036e67b2b95c3119b3a0ac7 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 17:53:38 +0100 Subject: [PATCH 007/200] drm/xe: Update definition of GT_INTR_DW Add bits definitions that we will be using in upcoming patch. 
Reviewed-by: Matt Roper Link: https://lore.kernel.org/r/20231214185955.1791-5-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/regs/xe_gt_regs.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h index 1dd361046b5dc0..6aaaf1f63c728a 100644 --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h @@ -430,6 +430,15 @@ #define VOLTAGE_MASK REG_GENMASK(10, 0) #define GT_INTR_DW(x) XE_REG(0x190018 + ((x) * 4)) +#define INTR_GSC REG_BIT(31) +#define INTR_GUC REG_BIT(25) +#define INTR_MGUC REG_BIT(24) +#define INTR_BCS8 REG_BIT(23) +#define INTR_BCS(x) REG_BIT(15 - (x)) +#define INTR_CCS(x) REG_BIT(4 + (x)) +#define INTR_RCS0 REG_BIT(0) +#define INTR_VECS(x) REG_BIT(31 - (x)) +#define INTR_VCS(x) REG_BIT(x) #define RENDER_COPY_INTR_ENABLE XE_REG(0x190030) #define VCS_VECS_INTR_ENABLE XE_REG(0x190034) From 35c933f68048da55ca043b1a2f1fef386e133a9d Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 17:53:39 +0100 Subject: [PATCH 008/200] drm/xe: Define IRQ offsets used by HW engines When interrupts are delivered using memory based mechanism, engines will write status to the report page at the offset (in bytes) that corresponds to their interrupt bit from the GT_INTR_DW register. Add engine interrupt offset definitions to engine info as we will need this to process memory based interrupts. Bspec: 46149, 50829, 50844 Reviewed-by: Matt Roper Link: https://lore.kernel.org/r/20231214185955.1791-6-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_hw_engine.c | 28 +++++++++++++++++++++++++ drivers/gpu/drm/xe/xe_hw_engine_types.h | 2 ++ 2 files changed, 30 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c index 1fa5cf5eea9785..832989c83a2534 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine.c +++ b/drivers/gpu/drm/xe/xe_hw_engine.c @@ -34,6 +34,7 @@ struct engine_info { const char *name; unsigned int class : 8; unsigned int instance : 8; + unsigned int irq_offset : 8; enum xe_force_wake_domains domain; u32 mmio_base; }; @@ -43,6 +44,7 @@ static const struct engine_info engine_infos[] = { .name = "rcs0", .class = XE_ENGINE_CLASS_RENDER, .instance = 0, + .irq_offset = ilog2(INTR_RCS0), .domain = XE_FW_RENDER, .mmio_base = RENDER_RING_BASE, }, @@ -50,6 +52,7 @@ static const struct engine_info engine_infos[] = { .name = "bcs0", .class = XE_ENGINE_CLASS_COPY, .instance = 0, + .irq_offset = ilog2(INTR_BCS(0)), .domain = XE_FW_RENDER, .mmio_base = BLT_RING_BASE, }, @@ -57,6 +60,7 @@ static const struct engine_info engine_infos[] = { .name = "bcs1", .class = XE_ENGINE_CLASS_COPY, .instance = 1, + .irq_offset = ilog2(INTR_BCS(1)), .domain = XE_FW_RENDER, .mmio_base = XEHPC_BCS1_RING_BASE, }, @@ -64,6 +68,7 @@ static const struct engine_info engine_infos[] = { .name = "bcs2", .class = XE_ENGINE_CLASS_COPY, .instance = 2, + .irq_offset = ilog2(INTR_BCS(2)), .domain = XE_FW_RENDER, .mmio_base = XEHPC_BCS2_RING_BASE, }, @@ -71,6 +76,7 @@ static const struct engine_info engine_infos[] = { .name = "bcs3", .class = XE_ENGINE_CLASS_COPY, .instance = 3, + .irq_offset = ilog2(INTR_BCS(3)), .domain = XE_FW_RENDER, .mmio_base = XEHPC_BCS3_RING_BASE, }, @@ -78,6 +84,7 @@ static const struct engine_info engine_infos[] = { .name = "bcs4", .class = XE_ENGINE_CLASS_COPY, .instance = 4, + .irq_offset = ilog2(INTR_BCS(4)), .domain = XE_FW_RENDER, .mmio_base = XEHPC_BCS4_RING_BASE, }, @@ -85,6 
+92,7 @@ static const struct engine_info engine_infos[] = { .name = "bcs5", .class = XE_ENGINE_CLASS_COPY, .instance = 5, + .irq_offset = ilog2(INTR_BCS(5)), .domain = XE_FW_RENDER, .mmio_base = XEHPC_BCS5_RING_BASE, }, @@ -92,12 +100,14 @@ static const struct engine_info engine_infos[] = { .name = "bcs6", .class = XE_ENGINE_CLASS_COPY, .instance = 6, + .irq_offset = ilog2(INTR_BCS(6)), .domain = XE_FW_RENDER, .mmio_base = XEHPC_BCS6_RING_BASE, }, [XE_HW_ENGINE_BCS7] = { .name = "bcs7", .class = XE_ENGINE_CLASS_COPY, + .irq_offset = ilog2(INTR_BCS(7)), .instance = 7, .domain = XE_FW_RENDER, .mmio_base = XEHPC_BCS7_RING_BASE, @@ -106,6 +116,7 @@ static const struct engine_info engine_infos[] = { .name = "bcs8", .class = XE_ENGINE_CLASS_COPY, .instance = 8, + .irq_offset = ilog2(INTR_BCS8), .domain = XE_FW_RENDER, .mmio_base = XEHPC_BCS8_RING_BASE, }, @@ -114,6 +125,7 @@ static const struct engine_info engine_infos[] = { .name = "vcs0", .class = XE_ENGINE_CLASS_VIDEO_DECODE, .instance = 0, + .irq_offset = 32 + ilog2(INTR_VCS(0)), .domain = XE_FW_MEDIA_VDBOX0, .mmio_base = BSD_RING_BASE, }, @@ -121,6 +133,7 @@ static const struct engine_info engine_infos[] = { .name = "vcs1", .class = XE_ENGINE_CLASS_VIDEO_DECODE, .instance = 1, + .irq_offset = 32 + ilog2(INTR_VCS(1)), .domain = XE_FW_MEDIA_VDBOX1, .mmio_base = BSD2_RING_BASE, }, @@ -128,6 +141,7 @@ static const struct engine_info engine_infos[] = { .name = "vcs2", .class = XE_ENGINE_CLASS_VIDEO_DECODE, .instance = 2, + .irq_offset = 32 + ilog2(INTR_VCS(2)), .domain = XE_FW_MEDIA_VDBOX2, .mmio_base = BSD3_RING_BASE, }, @@ -135,6 +149,7 @@ static const struct engine_info engine_infos[] = { .name = "vcs3", .class = XE_ENGINE_CLASS_VIDEO_DECODE, .instance = 3, + .irq_offset = 32 + ilog2(INTR_VCS(3)), .domain = XE_FW_MEDIA_VDBOX3, .mmio_base = BSD4_RING_BASE, }, @@ -142,6 +157,7 @@ static const struct engine_info engine_infos[] = { .name = "vcs4", .class = XE_ENGINE_CLASS_VIDEO_DECODE, .instance = 4, + .irq_offset = 32 + ilog2(INTR_VCS(4)), .domain = XE_FW_MEDIA_VDBOX4, .mmio_base = XEHP_BSD5_RING_BASE, }, @@ -149,6 +165,7 @@ static const struct engine_info engine_infos[] = { .name = "vcs5", .class = XE_ENGINE_CLASS_VIDEO_DECODE, .instance = 5, + .irq_offset = 32 + ilog2(INTR_VCS(5)), .domain = XE_FW_MEDIA_VDBOX5, .mmio_base = XEHP_BSD6_RING_BASE, }, @@ -156,6 +173,7 @@ static const struct engine_info engine_infos[] = { .name = "vcs6", .class = XE_ENGINE_CLASS_VIDEO_DECODE, .instance = 6, + .irq_offset = 32 + ilog2(INTR_VCS(6)), .domain = XE_FW_MEDIA_VDBOX6, .mmio_base = XEHP_BSD7_RING_BASE, }, @@ -163,6 +181,7 @@ static const struct engine_info engine_infos[] = { .name = "vcs7", .class = XE_ENGINE_CLASS_VIDEO_DECODE, .instance = 7, + .irq_offset = 32 + ilog2(INTR_VCS(7)), .domain = XE_FW_MEDIA_VDBOX7, .mmio_base = XEHP_BSD8_RING_BASE, }, @@ -170,6 +189,7 @@ static const struct engine_info engine_infos[] = { .name = "vecs0", .class = XE_ENGINE_CLASS_VIDEO_ENHANCE, .instance = 0, + .irq_offset = 32 + ilog2(INTR_VECS(0)), .domain = XE_FW_MEDIA_VEBOX0, .mmio_base = VEBOX_RING_BASE, }, @@ -177,6 +197,7 @@ static const struct engine_info engine_infos[] = { .name = "vecs1", .class = XE_ENGINE_CLASS_VIDEO_ENHANCE, .instance = 1, + .irq_offset = 32 + ilog2(INTR_VECS(1)), .domain = XE_FW_MEDIA_VEBOX1, .mmio_base = VEBOX2_RING_BASE, }, @@ -184,6 +205,7 @@ static const struct engine_info engine_infos[] = { .name = "vecs2", .class = XE_ENGINE_CLASS_VIDEO_ENHANCE, .instance = 2, + .irq_offset = 32 + ilog2(INTR_VECS(2)), .domain = XE_FW_MEDIA_VEBOX2, 
.mmio_base = XEHP_VEBOX3_RING_BASE, }, @@ -191,6 +213,7 @@ static const struct engine_info engine_infos[] = { .name = "vecs3", .class = XE_ENGINE_CLASS_VIDEO_ENHANCE, .instance = 3, + .irq_offset = 32 + ilog2(INTR_VECS(3)), .domain = XE_FW_MEDIA_VEBOX3, .mmio_base = XEHP_VEBOX4_RING_BASE, }, @@ -198,6 +221,7 @@ static const struct engine_info engine_infos[] = { .name = "ccs0", .class = XE_ENGINE_CLASS_COMPUTE, .instance = 0, + .irq_offset = ilog2(INTR_CCS(0)), .domain = XE_FW_RENDER, .mmio_base = COMPUTE0_RING_BASE, }, @@ -205,6 +229,7 @@ static const struct engine_info engine_infos[] = { .name = "ccs1", .class = XE_ENGINE_CLASS_COMPUTE, .instance = 1, + .irq_offset = ilog2(INTR_CCS(1)), .domain = XE_FW_RENDER, .mmio_base = COMPUTE1_RING_BASE, }, @@ -212,6 +237,7 @@ static const struct engine_info engine_infos[] = { .name = "ccs2", .class = XE_ENGINE_CLASS_COMPUTE, .instance = 2, + .irq_offset = ilog2(INTR_CCS(2)), .domain = XE_FW_RENDER, .mmio_base = COMPUTE2_RING_BASE, }, @@ -219,6 +245,7 @@ static const struct engine_info engine_infos[] = { .name = "ccs3", .class = XE_ENGINE_CLASS_COMPUTE, .instance = 3, + .irq_offset = ilog2(INTR_CCS(3)), .domain = XE_FW_RENDER, .mmio_base = COMPUTE3_RING_BASE, }, @@ -397,6 +424,7 @@ static void hw_engine_init_early(struct xe_gt *gt, struct xe_hw_engine *hwe, hwe->class = info->class; hwe->instance = info->instance; hwe->mmio_base = info->mmio_base; + hwe->irq_offset = info->irq_offset; hwe->domain = info->domain; hwe->name = info->name; hwe->fence_irq = &gt->fence_irq[info->class]; diff --git a/drivers/gpu/drm/xe/xe_hw_engine_types.h b/drivers/gpu/drm/xe/xe_hw_engine_types.h index 39908dec042a49..dfeaaac08b7f95 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine_types.h +++ b/drivers/gpu/drm/xe/xe_hw_engine_types.h @@ -116,6 +116,8 @@ struct xe_hw_engine { u16 instance; /** @logical_instance: logical instance of this hw engine */ u16 logical_instance; + /** @irq_offset: IRQ offset of this hw engine */ + u16 irq_offset; /** @mmio_base: MMIO base address of this hw engine*/ u32 mmio_base; /** From f15de1936f8d1bb5b4f7ee55da7fdba8c7540792 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 17:53:40 +0100 Subject: [PATCH 009/200] drm/xe: Add XE_BO_NEEDS_UC flag to force UC mode instead of WB When we map a BO in the GGTT, by default we use the PAT index that corresponds to XE_CACHE_WB, but an upcoming feature will require use of the PAT index of XE_CACHE_UC. Define a new BO flag that can be used during BO creation to force the alternate caching.
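For illustration, a caller requesting an uncached GGTT mapping would combine the new flag at creation time; this sketch mirrors the allocation added later in this series (error handling trimmed):

	struct xe_bo *bo;

	bo = xe_bo_create_pin_map(xe, tile, NULL, SZ_4K,
				  ttm_bo_type_kernel,
				  XE_BO_CREATE_SYSTEM_BIT |
				  XE_BO_CREATE_GGTT_BIT |
				  XE_BO_NEEDS_UC |
				  XE_BO_NEEDS_CPU_ACCESS);
	if (IS_ERR(bo))
		return PTR_ERR(bo);

With XE_BO_NEEDS_UC set, xe_ggtt_map_bo() selects the XE_CACHE_NONE PAT index instead of XE_CACHE_WB when writing the GGTT PTEs.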
Cc: Matt Roper Reviewed-by: Matt Roper Link: https://lore.kernel.org/r/20231214185955.1791-7-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_bo.h | 1 + drivers/gpu/drm/xe/xe_ggtt.c | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h index 9b1279aca1272c..97b32528c600f6 100644 --- a/drivers/gpu/drm/xe/xe_bo.h +++ b/drivers/gpu/drm/xe/xe_bo.h @@ -44,6 +44,7 @@ #define XE_BO_FIXED_PLACEMENT_BIT BIT(11) #define XE_BO_PAGETABLE BIT(12) #define XE_BO_NEEDS_CPU_ACCESS BIT(13) +#define XE_BO_NEEDS_UC BIT(14) /* this one is trigger internally only */ #define XE_BO_INTERNAL_TEST BIT(30) #define XE_BO_INTERNAL_64K BIT(31) diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c index 3efd2d066bf723..c639dbf3bdd27b 100644 --- a/drivers/gpu/drm/xe/xe_ggtt.c +++ b/drivers/gpu/drm/xe/xe_ggtt.c @@ -334,7 +334,8 @@ int xe_ggtt_insert_special_node(struct xe_ggtt *ggtt, struct drm_mm_node *node, void xe_ggtt_map_bo(struct xe_ggtt *ggtt, struct xe_bo *bo) { - u16 pat_index = tile_to_xe(ggtt->tile)->pat.idx[XE_CACHE_WB]; + u16 cache_mode = bo->flags & XE_BO_NEEDS_UC ? XE_CACHE_NONE : XE_CACHE_WB; + u16 pat_index = tile_to_xe(ggtt->tile)->pat.idx[cache_mode]; u64 start = bo->ggtt_node.start; u64 offset, pte; From a6581ebe76856bf23d1a7f3ee95828173b560a05 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 17:53:41 +0100 Subject: [PATCH 010/200] drm/xe/vf: Introduce Memory Based Interrupts Handler The register based interrupts infrastructure does not scale efficiently to allow delivering interrupts to a large number of virtual machines. Memory based interrupt reporting provides an efficient and scalable infrastructure. Define a handler to read and dispatch memory based interrupts. We will use this handler in an upcoming patch.
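The reporting convention the handler relies on is simple: hardware raises an interrupt by setting a whole byte in the relevant report page to 0xff, and software acknowledges it by clearing the byte. Stripped of the iosys_map plumbing used in the actual code, the core test-and-clear step amounts to the following simplified sketch:

	/* Simplified sketch; the real handler reads/writes through an iosys_map. */
	static bool slot_pending(u8 *page, unsigned int slot)
	{
		u8 value = page[slot];	/* HW writes 0xff when the irq fires */

		if (value)
			page[slot] = 0;	/* clear to acknowledge */

		return value;
	}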
Cc: Matt Roper Reviewed-by: Matt Roper Link: https://lore.kernel.org/r/20231214185955.1791-8-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/Makefile | 4 +- drivers/gpu/drm/xe/xe_device.c | 7 + drivers/gpu/drm/xe/xe_device.h | 5 + drivers/gpu/drm/xe/xe_device_types.h | 5 + drivers/gpu/drm/xe/xe_memirq.c | 430 +++++++++++++++++++++++++++ drivers/gpu/drm/xe/xe_memirq.h | 26 ++ drivers/gpu/drm/xe/xe_memirq_types.h | 37 +++ 7 files changed, 513 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/xe/xe_memirq.c create mode 100644 drivers/gpu/drm/xe/xe_memirq.h create mode 100644 drivers/gpu/drm/xe/xe_memirq_types.h diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile index 53bd2a8ba1ae5c..9214915d9566cf 100644 --- a/drivers/gpu/drm/xe/Makefile +++ b/drivers/gpu/drm/xe/Makefile @@ -146,7 +146,9 @@ xe-y += xe_bb.o \ xe-$(CONFIG_HWMON) += xe_hwmon.o # graphics virtualization (SR-IOV) support -xe-y += xe_sriov.o +xe-y += \ + xe_memirq.o \ + xe_sriov.o xe-$(CONFIG_PCI_IOV) += \ xe_lmtt.o \ diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index d9ae77fe7382dd..86867d42d53298 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -29,12 +29,14 @@ #include "xe_gt.h" #include "xe_gt_mcr.h" #include "xe_irq.h" +#include "xe_memirq.h" #include "xe_mmio.h" #include "xe_module.h" #include "xe_pat.h" #include "xe_pcode.h" #include "xe_pm.h" #include "xe_query.h" +#include "xe_sriov.h" #include "xe_tile.h" #include "xe_ttm_stolen_mgr.h" #include "xe_ttm_sys_mgr.h" @@ -456,6 +458,11 @@ int xe_device_probe(struct xe_device *xe) err = xe_ggtt_init_early(tile->mem.ggtt); if (err) return err; + if (IS_SRIOV_VF(xe)) { + err = xe_memirq_init(&tile->sriov.vf.memirq); + if (err) + return err; + } } err = drmm_add_action_or_reset(&xe->drm, xe_driver_flr_fini, xe); diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h index 3da83b23320638..af8ac2e9e27095 100644 --- a/drivers/gpu/drm/xe/xe_device.h +++ b/drivers/gpu/drm/xe/xe_device.h @@ -168,6 +168,11 @@ static inline bool xe_device_has_sriov(struct xe_device *xe) return xe->info.has_sriov; } +static inline bool xe_device_has_memirq(struct xe_device *xe) +{ + return GRAPHICS_VERx100(xe) >= 1250; +} + u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size); #endif diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h index c45ef17b347323..71f23ac365e669 100644 --- a/drivers/gpu/drm/xe/xe_device_types.h +++ b/drivers/gpu/drm/xe/xe_device_types.h @@ -16,6 +16,7 @@ #include "xe_heci_gsc.h" #include "xe_gt_types.h" #include "xe_lmtt_types.h" +#include "xe_memirq_types.h" #include "xe_platform_types.h" #include "xe_pt_types.h" #include "xe_sriov_types.h" @@ -192,6 +193,10 @@ struct xe_tile { /** @sriov.pf.lmtt: Local Memory Translation Table. */ struct xe_lmtt lmtt; } pf; + struct { + /** @sriov.vf.memirq: Memory Based Interrupts. 
*/ + struct xe_memirq memirq; + } vf; } sriov; /** @migrate: Migration helper for vram blits and clearing */ diff --git a/drivers/gpu/drm/xe/xe_memirq.c b/drivers/gpu/drm/xe/xe_memirq.c new file mode 100644 index 00000000000000..76e95535d7f60a --- /dev/null +++ b/drivers/gpu/drm/xe/xe_memirq.c @@ -0,0 +1,430 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2023 Intel Corporation + */ + +#include + +#include "regs/xe_gt_regs.h" +#include "regs/xe_guc_regs.h" +#include "regs/xe_regs.h" + +#include "xe_assert.h" +#include "xe_bo.h" +#include "xe_device.h" +#include "xe_device_types.h" +#include "xe_gt.h" +#include "xe_gt_printk.h" +#include "xe_guc.h" +#include "xe_hw_engine.h" +#include "xe_map.h" +#include "xe_memirq.h" +#include "xe_sriov.h" +#include "xe_sriov_printk.h" + +#define memirq_assert(m, condition) xe_tile_assert(memirq_to_tile(m), condition) +#define memirq_debug(m, msg...) xe_sriov_dbg_verbose(memirq_to_xe(m), "MEMIRQ: " msg) + +static struct xe_tile *memirq_to_tile(struct xe_memirq *memirq) +{ + return container_of(memirq, struct xe_tile, sriov.vf.memirq); +} + +static struct xe_device *memirq_to_xe(struct xe_memirq *memirq) +{ + return tile_to_xe(memirq_to_tile(memirq)); +} + +static const char *guc_name(struct xe_guc *guc) +{ + return xe_gt_is_media_type(guc_to_gt(guc)) ? "media GuC" : "GuC"; +} + +/** + * DOC: Memory Based Interrupts + * + * MMIO register based interrupts infrastructure used for non-virtualized mode + * or SRIOV-8 (which supports 8 Virtual Functions) does not scale efficiently + * to allow delivering interrupts to a large number of Virtual machines or + * containers. Memory based interrupt status reporting provides an efficient + * and scalable infrastructure. + * + * For memory based interrupt status reporting hardware sequence is: + * * Engine writes the interrupt event to memory + * (Pointer to memory location is provided by SW. This memory surface must + * be mapped to system memory and must be marked as un-cacheable (UC) on + * Graphics IP Caches) + * * Engine triggers an interrupt to host. + */ + +/** + * DOC: Memory Based Interrupts Page Layout + * + * `Memory Based Interrupts`_ requires three different objects, which are + * called "page" in the specs, even if they aren't page-sized or aligned. + * + * To simplify the code we allocate a single page size object and then use + * offsets to embedded "pages". The address of those "pages" are then + * programmed in the HW via LRI and LRM in the context image. + * + * - _`Interrupt Status Report Page`: this page contains the interrupt + * status vectors for each unit. Each bit in the interrupt vectors is + * converted to a byte, with the byte being set to 0xFF when an + * interrupt is triggered; interrupt vectors are 16b big so each unit + * gets 16B. One space is reserved for each bit in one of the + * GT_INTR_DWx registers, so this object needs a total of 1024B. + * This object needs to be 4KiB aligned. + * + * - _`Interrupt Source Report Page`: this is the equivalent of the + * GEN11_GT_INTR_DWx registers, with each bit in those registers being + * mapped to a byte here. The offsets are the same, just bytes instead + * of bits. This object needs to be cacheline aligned. + * + * - Interrupt Mask: the HW needs a location to fetch the interrupt + * mask vector to be used by the LRM in the context, so we just use + * the next available space in the interrupt page. 
+ * + * :: + * + * 0x0000 +===========+ <== Interrupt Status Report Page + * | | + * | | ____ +----+----------------+ + * | | / | 0 | USER INTERRUPT | + * +-----------+ __/ | 1 | | + * | HWE(n) | __ | | CTX SWITCH | + * +-----------+ \ | | WAIT SEMAPHORE | + * | | \____ | 15 | | + * | | +----+----------------+ + * | | + * 0x0400 +===========+ <== Interrupt Source Report Page + * | HWE(0) | + * | HWE(1) | + * | | + * | HWE(x) | + * 0x0440 +===========+ <== Interrupt Enable Mask + * | | + * | | + * +-----------+ + */ + +static void __release_xe_bo(struct drm_device *drm, void *arg) +{ + struct xe_bo *bo = arg; + + xe_bo_unpin_map_no_vm(bo); +} + +static int memirq_alloc_pages(struct xe_memirq *memirq) +{ + struct xe_device *xe = memirq_to_xe(memirq); + struct xe_tile *tile = memirq_to_tile(memirq); + struct xe_bo *bo; + int err; + + BUILD_BUG_ON(!IS_ALIGNED(XE_MEMIRQ_SOURCE_OFFSET, SZ_64)); + BUILD_BUG_ON(!IS_ALIGNED(XE_MEMIRQ_STATUS_OFFSET, SZ_4K)); + + /* XXX: convert to managed bo */ + bo = xe_bo_create_pin_map(xe, tile, NULL, SZ_4K, + ttm_bo_type_kernel, + XE_BO_CREATE_SYSTEM_BIT | + XE_BO_CREATE_GGTT_BIT | + XE_BO_NEEDS_UC | + XE_BO_NEEDS_CPU_ACCESS); + if (IS_ERR(bo)) { + err = PTR_ERR(bo); + goto out; + } + + memirq_assert(memirq, !xe_bo_is_vram(bo)); + memirq_assert(memirq, !memirq->bo); + + iosys_map_memset(&bo->vmap, 0, 0, SZ_4K); + + memirq->bo = bo; + memirq->source = IOSYS_MAP_INIT_OFFSET(&bo->vmap, XE_MEMIRQ_SOURCE_OFFSET); + memirq->status = IOSYS_MAP_INIT_OFFSET(&bo->vmap, XE_MEMIRQ_STATUS_OFFSET); + memirq->mask = IOSYS_MAP_INIT_OFFSET(&bo->vmap, XE_MEMIRQ_ENABLE_OFFSET); + + memirq_assert(memirq, !memirq->source.is_iomem); + memirq_assert(memirq, !memirq->status.is_iomem); + memirq_assert(memirq, !memirq->mask.is_iomem); + + memirq_debug(memirq, "page offsets: source %#x status %#x\n", + xe_memirq_source_ptr(memirq), xe_memirq_status_ptr(memirq)); + + return drmm_add_action_or_reset(&xe->drm, __release_xe_bo, memirq->bo); + +out: + xe_sriov_err(memirq_to_xe(memirq), + "Failed to allocate memirq page (%pe)\n", ERR_PTR(err)); + return err; +} + +static void memirq_set_enable(struct xe_memirq *memirq, bool enable) +{ + iosys_map_wr(&memirq->mask, 0, u32, enable ? GENMASK(15, 0) : 0); + + memirq->enabled = enable; +} + +/** + * xe_memirq_init - Initialize data used by `Memory Based Interrupts`_. + * @memirq: the &xe_memirq to initialize + * + * Allocate `Interrupt Source Report Page`_ and `Interrupt Status Report Page`_ + * used by `Memory Based Interrupts`_. + * + * These allocations are managed and will be implicitly released on unload. + * + * Note: This function shall be called only by the VF driver. + * + * If this function fails then VF driver won't be able to operate correctly. + * If `Memory Based Interrupts`_ are not used this function will return 0. + * + * Return: 0 on success or a negative error code on failure. + */ +int xe_memirq_init(struct xe_memirq *memirq) +{ + struct xe_device *xe = memirq_to_xe(memirq); + int err; + + memirq_assert(memirq, IS_SRIOV_VF(xe)); + + if (!xe_device_has_memirq(xe)) + return 0; + + err = memirq_alloc_pages(memirq); + if (unlikely(err)) + return err; + + /* we need to start with all irqs enabled */ + memirq_set_enable(memirq, true); + + return 0; +} + +/** + * xe_memirq_source_ptr - Get GGTT's offset of the `Interrupt Source Report Page`_. + * @memirq: the &xe_memirq to query + * + * Shall be called only on VF driver when `Memory Based Interrupts`_ are used + * and xe_memirq_init() didn't fail. 
+ * + * Return: GGTT's offset of the `Interrupt Source Report Page`_. + */ +u32 xe_memirq_source_ptr(struct xe_memirq *memirq) +{ + memirq_assert(memirq, IS_SRIOV_VF(memirq_to_xe(memirq))); + memirq_assert(memirq, xe_device_has_memirq(memirq_to_xe(memirq))); + memirq_assert(memirq, memirq->bo); + + return xe_bo_ggtt_addr(memirq->bo) + XE_MEMIRQ_SOURCE_OFFSET; +} + +/** + * xe_memirq_status_ptr - Get GGTT's offset of the `Interrupt Status Report Page`_. + * @memirq: the &xe_memirq to query + * + * Shall be called only on VF driver when `Memory Based Interrupts`_ are used + * and xe_memirq_init() didn't fail. + * + * Return: GGTT's offset of the `Interrupt Status Report Page`_. + */ +u32 xe_memirq_status_ptr(struct xe_memirq *memirq) +{ + memirq_assert(memirq, IS_SRIOV_VF(memirq_to_xe(memirq))); + memirq_assert(memirq, xe_device_has_memirq(memirq_to_xe(memirq))); + memirq_assert(memirq, memirq->bo); + + return xe_bo_ggtt_addr(memirq->bo) + XE_MEMIRQ_STATUS_OFFSET; +} + +/** + * xe_memirq_enable_ptr - Get GGTT's offset of the Interrupt Enable Mask. + * @memirq: the &xe_memirq to query + * + * Shall be called only on VF driver when `Memory Based Interrupts`_ are used + * and xe_memirq_init() didn't fail. + * + * Return: GGTT's offset of the Interrupt Enable Mask. + */ +u32 xe_memirq_enable_ptr(struct xe_memirq *memirq) +{ + memirq_assert(memirq, IS_SRIOV_VF(memirq_to_xe(memirq))); + memirq_assert(memirq, xe_device_has_memirq(memirq_to_xe(memirq))); + memirq_assert(memirq, memirq->bo); + + return xe_bo_ggtt_addr(memirq->bo) + XE_MEMIRQ_ENABLE_OFFSET; +} + +/** + * xe_memirq_init_guc - Prepare GuC for `Memory Based Interrupts`_. + * @memirq: the &xe_memirq + * @guc: the &xe_guc to setup + * + * Register `Interrupt Source Report Page`_ and `Interrupt Status Report Page`_ + * to be used by the GuC when `Memory Based Interrupts`_ are required. + * + * Shall be called only on VF driver when `Memory Based Interrupts`_ are used + * and xe_memirq_init() didn't fail. + * + * Return: 0 on success or a negative error code on failure. + */ +int xe_memirq_init_guc(struct xe_memirq *memirq, struct xe_guc *guc) +{ + bool is_media = xe_gt_is_media_type(guc_to_gt(guc)); + u32 offset = is_media ? ilog2(INTR_MGUC) : ilog2(INTR_GUC); + u32 source, status; + int err; + + memirq_assert(memirq, IS_SRIOV_VF(memirq_to_xe(memirq))); + memirq_assert(memirq, xe_device_has_memirq(memirq_to_xe(memirq))); + memirq_assert(memirq, memirq->bo); + + source = xe_memirq_source_ptr(memirq) + offset; + status = xe_memirq_status_ptr(memirq) + offset * SZ_16; + + err = xe_guc_self_cfg64(guc, GUC_KLV_SELF_CFG_MEMIRQ_SOURCE_ADDR_KEY, + source); + if (unlikely(err)) + goto failed; + + err = xe_guc_self_cfg64(guc, GUC_KLV_SELF_CFG_MEMIRQ_STATUS_ADDR_KEY, + status); + if (unlikely(err)) + goto failed; + + return 0; + +failed: + xe_sriov_err(memirq_to_xe(memirq), + "Failed to setup report pages in %s (%pe)\n", + guc_name(guc), ERR_PTR(err)); + return err; +} + +/** + * xe_memirq_reset - Disable processing of `Memory Based Interrupts`_. + * @memirq: struct xe_memirq + * + * This is part of the driver IRQ setup flow. + * + * This function shall only be used by the VF driver on platforms that use + * `Memory Based Interrupts`_. 
+ */ +void xe_memirq_reset(struct xe_memirq *memirq) +{ + memirq_assert(memirq, IS_SRIOV_VF(memirq_to_xe(memirq))); + memirq_assert(memirq, xe_device_has_memirq(memirq_to_xe(memirq))); + + if (memirq->bo) + memirq_set_enable(memirq, false); +} + +/** + * xe_memirq_postinstall - Enable processing of `Memory Based Interrupts`_. + * @memirq: the &xe_memirq + * + * This is part of the driver IRQ setup flow. + * + * This function shall only be used by the VF driver on platforms that use + * `Memory Based Interrupts`_. + */ +void xe_memirq_postinstall(struct xe_memirq *memirq) +{ + memirq_assert(memirq, IS_SRIOV_VF(memirq_to_xe(memirq))); + memirq_assert(memirq, xe_device_has_memirq(memirq_to_xe(memirq))); + + if (memirq->bo) + memirq_set_enable(memirq, true); +} + +static bool memirq_received(struct xe_memirq *memirq, struct iosys_map *vector, + u16 offset, const char *name) +{ + u8 value; + + value = iosys_map_rd(vector, offset, u8); + if (value) { + if (value != 0xff) + xe_sriov_err_ratelimited(memirq_to_xe(memirq), + "Unexpected memirq value %#x from %s at %u\n", + value, name, offset); + iosys_map_wr(vector, offset, u8, 0x00); + } + + return value; +} + +static void memirq_dispatch_engine(struct xe_memirq *memirq, struct iosys_map *status, + struct xe_hw_engine *hwe) +{ + memirq_debug(memirq, "STATUS %s %*ph\n", hwe->name, 16, status->vaddr); + + if (memirq_received(memirq, status, ilog2(GT_RENDER_USER_INTERRUPT), hwe->name)) + xe_hw_engine_handle_irq(hwe, GT_RENDER_USER_INTERRUPT); +} + +static void memirq_dispatch_guc(struct xe_memirq *memirq, struct iosys_map *status, + struct xe_guc *guc) +{ + const char *name = guc_name(guc); + + memirq_debug(memirq, "STATUS %s %*ph\n", name, 16, status->vaddr); + + if (memirq_received(memirq, status, ilog2(GUC_INTR_GUC2HOST), name)) + xe_guc_irq_handler(guc, GUC_INTR_GUC2HOST); +} + +/** + * xe_memirq_handler - The `Memory Based Interrupts`_ Handler. + * @memirq: the &xe_memirq + * + * This function reads and dispatches `Memory Based Interrupts`. 
+ */ +void xe_memirq_handler(struct xe_memirq *memirq) +{ + struct xe_device *xe = memirq_to_xe(memirq); + struct xe_tile *tile = memirq_to_tile(memirq); + struct xe_hw_engine *hwe; + enum xe_hw_engine_id id; + struct iosys_map map; + unsigned int gtid; + struct xe_gt *gt; + + if (!memirq->bo) + return; + + memirq_assert(memirq, !memirq->source.is_iomem); + memirq_debug(memirq, "SOURCE %*ph\n", 32, memirq->source.vaddr); + memirq_debug(memirq, "SOURCE %*ph\n", 32, memirq->source.vaddr + 32); + + for_each_gt(gt, xe, gtid) { + if (gt->tile != tile) + continue; + + for_each_hw_engine(hwe, gt, id) { + if (memirq_received(memirq, &memirq->source, hwe->irq_offset, "SRC")) { + map = IOSYS_MAP_INIT_OFFSET(&memirq->status, + hwe->irq_offset * SZ_16); + memirq_dispatch_engine(memirq, &map, hwe); + } + } + } + + /* GuC and media GuC (if present) must be checked separately */ + + if (memirq_received(memirq, &memirq->source, ilog2(INTR_GUC), "SRC")) { + map = IOSYS_MAP_INIT_OFFSET(&memirq->status, ilog2(INTR_GUC) * SZ_16); + memirq_dispatch_guc(memirq, &map, &tile->primary_gt->uc.guc); + } + + if (!tile->media_gt) + return; + + if (memirq_received(memirq, &memirq->source, ilog2(INTR_MGUC), "SRC")) { + map = IOSYS_MAP_INIT_OFFSET(&memirq->status, ilog2(INTR_MGUC) * SZ_16); + memirq_dispatch_guc(memirq, &map, &tile->media_gt->uc.guc); + } +} diff --git a/drivers/gpu/drm/xe/xe_memirq.h b/drivers/gpu/drm/xe/xe_memirq.h new file mode 100644 index 00000000000000..2d40d03c309561 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_memirq.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef _XE_MEMIRQ_H_ +#define _XE_MEMIRQ_H_ + +#include + +struct xe_guc; +struct xe_memirq; + +int xe_memirq_init(struct xe_memirq *memirq); + +u32 xe_memirq_source_ptr(struct xe_memirq *memirq); +u32 xe_memirq_status_ptr(struct xe_memirq *memirq); +u32 xe_memirq_enable_ptr(struct xe_memirq *memirq); + +void xe_memirq_reset(struct xe_memirq *memirq); +void xe_memirq_postinstall(struct xe_memirq *memirq); +void xe_memirq_handler(struct xe_memirq *memirq); + +int xe_memirq_init_guc(struct xe_memirq *memirq, struct xe_guc *guc); + +#endif diff --git a/drivers/gpu/drm/xe/xe_memirq_types.h b/drivers/gpu/drm/xe/xe_memirq_types.h new file mode 100644 index 00000000000000..625b6b8736cc42 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_memirq_types.h @@ -0,0 +1,37 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef _XE_MEMIRQ_TYPES_H_ +#define _XE_MEMIRQ_TYPES_H_ + +#include + +struct xe_bo; + +/* ISR */ +#define XE_MEMIRQ_STATUS_OFFSET 0x0 +/* IIR */ +#define XE_MEMIRQ_SOURCE_OFFSET 0x400 +/* IMR */ +#define XE_MEMIRQ_ENABLE_OFFSET 0x440 + +/** + * struct xe_memirq - Data used by the `Memory Based Interrupts`_. + * + * @bo: buffer object with `Memory Based Interrupts Page Layout`_. + * @source: iosys pointer to `Interrupt Source Report Page`_. + * @status: iosys pointer to `Interrupt Status Report Page`_. + * @mask: iosys pointer to Interrupt Enable Mask. + * @enabled: internal flag used to control processing of the interrupts. 
+ */ +struct xe_memirq { + struct xe_bo *bo; + struct iosys_map source; + struct iosys_map status; + struct iosys_map mask; + bool enabled; +}; + +#endif From 9a30b04f15f043cdb5add993413a4fc5b692b25f Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 17:53:42 +0100 Subject: [PATCH 011/200] drm/xe/vf: Update LRC with memory based interrupts data When Memory Based Interrupts are used, the VF driver must provide in the LRC the references to the Source and Status Report Pages. Update the LRC according to the requirements. Reviewed-by: Matt Roper Link: https://lore.kernel.org/r/20231214185955.1791-9-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_lrc.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c index 44586e612eaf5d..f17e9785355e8a 100644 --- a/drivers/gpu/drm/xe/xe_lrc.c +++ b/drivers/gpu/drm/xe/xe_lrc.c @@ -19,6 +19,8 @@ #include "xe_gt_printk.h" #include "xe_hw_fence.h" #include "xe_map.h" +#include "xe_memirq.h" +#include "xe_sriov.h" #include "xe_vm.h" #define CTX_VALID (1 << 0) @@ -532,6 +534,27 @@ static void set_context_control(u32 *regs, struct xe_hw_engine *hwe) /* TODO: Timestamp */ } +static void set_memory_based_intr(u32 *regs, struct xe_hw_engine *hwe) +{ + struct xe_memirq *memirq = >_to_tile(hwe->gt)->sriov.vf.memirq; + struct xe_device *xe = gt_to_xe(hwe->gt); + + if (!IS_SRIOV_VF(xe) || !xe_device_has_memirq(xe)) + return; + + regs[CTX_LRM_INT_MASK_ENABLE] = MI_LOAD_REGISTER_MEM | + MI_LRI_LRM_CS_MMIO | MI_LRM_USE_GGTT; + regs[CTX_INT_MASK_ENABLE_REG] = RING_IMR(0).addr; + regs[CTX_INT_MASK_ENABLE_PTR] = xe_memirq_enable_ptr(memirq); + + regs[CTX_LRI_INT_REPORT_PTR] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(2) | + MI_LRI_LRM_CS_MMIO | MI_LRI_FORCE_POSTED; + regs[CTX_INT_STATUS_REPORT_REG] = RING_INT_STATUS_RPT_PTR(0).addr; + regs[CTX_INT_STATUS_REPORT_PTR] = xe_memirq_status_ptr(memirq); + regs[CTX_INT_SRC_REPORT_REG] = RING_INT_SRC_RPT_PTR(0).addr; + regs[CTX_INT_SRC_REPORT_PTR] = xe_memirq_source_ptr(memirq); +} + static int lrc_ring_mi_mode(struct xe_hw_engine *hwe) { struct xe_device *xe = gt_to_xe(hwe->gt); @@ -667,6 +690,7 @@ static void *empty_lrc_data(struct xe_hw_engine *hwe) regs = data + LRC_PPHWSP_SIZE; set_offsets(regs, reg_offsets(xe, hwe->class), hwe); set_context_control(regs, hwe); + set_memory_based_intr(regs, hwe); reset_stop_ring(regs, hwe); return data; From aef4eb7c7dec62f8b289651540fcc851257b1a16 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 17:53:43 +0100 Subject: [PATCH 012/200] drm/xe/vf: Setup memory based interrupts in GuC When Memory Based Interrupts are used, the VF driver must provide to the GuC references to the Source and Status Report Pages. 
Reviewed-by: Matt Roper Link: https://lore.kernel.org/r/20231214185955.1791-10-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_guc.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 76b31d542e1ae5..811e8b20127055 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -22,8 +22,10 @@ #include "xe_guc_log.h" #include "xe_guc_pc.h" #include "xe_guc_submit.h" +#include "xe_memirq.h" #include "xe_mmio.h" #include "xe_platform_types.h" +#include "xe_sriov.h" #include "xe_uc.h" #include "xe_uc_fw.h" #include "xe_wa.h" @@ -568,10 +570,20 @@ static void guc_enable_irq(struct xe_guc *guc) int xe_guc_enable_communication(struct xe_guc *guc) { + struct xe_device *xe = guc_to_xe(guc); int err; guc_enable_irq(guc); + if (IS_SRIOV_VF(xe) && xe_device_has_memirq(xe)) { + struct xe_gt *gt = guc_to_gt(guc); + struct xe_tile *tile = gt_to_tile(gt); + + err = xe_memirq_init_guc(&tile->sriov.vf.memirq, guc); + if (err) + return err; + } + xe_mmio_rmw32(guc_to_gt(guc), PMINTRMSK, ARAT_EXPIRED_INTRMSK, 0); From b130289b23244dc5bc5fcbd42ac57ad689cccae9 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 17:53:44 +0100 Subject: [PATCH 013/200] drm/xe/vf: Add VF specific interrupt handler There are small differences in handling of the register based interrupts on the VF driver as some registers are not accessible to the VF driver. Additionally VFs must support Memory Based Interrupts. Add VF specific interrupt handler for this. Cc: Matt Roper Reviewed-by: Matt Roper Link: https://lore.kernel.org/r/20231214185955.1791-11-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_irq.c | 71 +++++++++++++++++++++++++++++++++++++ 1 file changed, 71 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_irq.c b/drivers/gpu/drm/xe/xe_irq.c index d1f5ba4bb745a3..907c8ff0fa21fe 100644 --- a/drivers/gpu/drm/xe/xe_irq.c +++ b/drivers/gpu/drm/xe/xe_irq.c @@ -17,7 +17,9 @@ #include "xe_gt.h" #include "xe_guc.h" #include "xe_hw_engine.h" +#include "xe_memirq.h" #include "xe_mmio.h" +#include "xe_sriov.h" /* * Interrupt registers for a unit are always consecutive and ordered @@ -498,6 +500,9 @@ static void xelp_irq_reset(struct xe_tile *tile) gt_irq_reset(tile); + if (IS_SRIOV_VF(tile_to_xe(tile))) + return; + mask_and_disable(tile, PCU_IRQ_OFFSET); } @@ -508,6 +513,9 @@ static void dg1_irq_reset(struct xe_tile *tile) gt_irq_reset(tile); + if (IS_SRIOV_VF(tile_to_xe(tile))) + return; + mask_and_disable(tile, PCU_IRQ_OFFSET); } @@ -518,11 +526,34 @@ static void dg1_irq_reset_mstr(struct xe_tile *tile) xe_mmio_write32(mmio, GFX_MSTR_IRQ, ~0); } +static void vf_irq_reset(struct xe_device *xe) +{ + struct xe_tile *tile; + unsigned int id; + + xe_assert(xe, IS_SRIOV_VF(xe)); + + if (GRAPHICS_VERx100(xe) < 1210) + xelp_intr_disable(xe); + else + xe_assert(xe, xe_device_has_memirq(xe)); + + for_each_tile(tile, xe, id) { + if (xe_device_has_memirq(xe)) + xe_memirq_reset(&tile->sriov.vf.memirq); + else + gt_irq_reset(tile); + } +} + static void xe_irq_reset(struct xe_device *xe) { struct xe_tile *tile; u8 id; + if (IS_SRIOV_VF(xe)) + return vf_irq_reset(xe); + for_each_tile(tile, xe, id) { if (GRAPHICS_VERx100(xe) >= 1210) dg1_irq_reset(tile); @@ -545,8 +576,26 @@ static void xe_irq_reset(struct xe_device *xe) } } +static void vf_irq_postinstall(struct xe_device *xe) +{ + struct xe_tile *tile; + unsigned int id; + + for_each_tile(tile, xe, id) + if 
(xe_device_has_memirq(xe)) + xe_memirq_postinstall(&tile->sriov.vf.memirq); + + if (GRAPHICS_VERx100(xe) < 1210) + xelp_intr_enable(xe, true); + else + xe_assert(xe, xe_device_has_memirq(xe)); +} + static void xe_irq_postinstall(struct xe_device *xe) { + if (IS_SRIOV_VF(xe)) + return vf_irq_postinstall(xe); + xe_display_irq_postinstall(xe, xe_root_mmio_gt(xe)); /* @@ -563,8 +612,30 @@ static void xe_irq_postinstall(struct xe_device *xe) xelp_intr_enable(xe, true); } +static irqreturn_t vf_mem_irq_handler(int irq, void *arg) +{ + struct xe_device *xe = arg; + struct xe_tile *tile; + unsigned int id; + + spin_lock(&xe->irq.lock); + if (!xe->irq.enabled) { + spin_unlock(&xe->irq.lock); + return IRQ_NONE; + } + spin_unlock(&xe->irq.lock); + + for_each_tile(tile, xe, id) + xe_memirq_handler(&tile->sriov.vf.memirq); + + return IRQ_HANDLED; +} + static irq_handler_t xe_irq_handler(struct xe_device *xe) { + if (IS_SRIOV_VF(xe) && xe_device_has_memirq(xe)) + return vf_mem_irq_handler; + if (GRAPHICS_VERx100(xe) >= 1210) return dg1_irq_handler; else From b2e1f97fb41843ddee5afbf4ba7812642f3cfab9 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 20:06:20 +0100 Subject: [PATCH 014/200] drm/xe: Add GT oriented drm_printers Instead of using generic drm_err_printer() that just adds static prefix, or drm_info_printer() that outputs only device name, add new helpers that create dedicated drm_printers that use our GT oriented xe_gt_err() and xe_gt_info() functions. Cc: Rodrigo Vivi Cc: Matt Roper Reviewed-by: Lucas De Marchi Link: https://lore.kernel.org/r/20231218190629.502-2-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_gt_printk.h | 44 +++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_gt_printk.h b/drivers/gpu/drm/xe/xe_gt_printk.h index 5991bcadd47e03..c2b004d3f48e30 100644 --- a/drivers/gpu/drm/xe/xe_gt_printk.h +++ b/drivers/gpu/drm/xe/xe_gt_printk.h @@ -43,4 +43,48 @@ #define xe_gt_WARN_ON_ONCE(_gt, _condition) \ xe_gt_WARN_ONCE((_gt), _condition, "%s(%s)", "gt_WARN_ON_ONCE", __stringify(_condition)) +static inline void __xe_gt_printfn_err(struct drm_printer *p, struct va_format *vaf) +{ + struct xe_gt *gt = p->arg; + + xe_gt_err(gt, "%pV", vaf); +} + +static inline void __xe_gt_printfn_info(struct drm_printer *p, struct va_format *vaf) +{ + struct xe_gt *gt = p->arg; + + xe_gt_info(gt, "%pV", vaf); +} + +/** + * xe_gt_err_printer - Construct a &drm_printer that outputs to xe_gt_err() + * @gt: the &xe_gt pointer to use in xe_gt_err() + * + * Return: The &drm_printer object. + */ +static inline struct drm_printer xe_gt_err_printer(struct xe_gt *gt) +{ + struct drm_printer p = { + .printfn = __xe_gt_printfn_err, + .arg = gt, + }; + return p; +} + +/** + * xe_gt_info_printer - Construct a &drm_printer that outputs to xe_gt_info() + * @gt: the &xe_gt pointer to use in xe_gt_info() + * + * Return: The &drm_printer object. + */ +static inline struct drm_printer xe_gt_info_printer(struct xe_gt *gt) +{ + struct drm_printer p = { + .printfn = __xe_gt_printfn_info, + .arg = gt, + }; + return p; +} + #endif From e8b9b3097ca82f29d4e4e32d0ad79732ed041b7c Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 20:06:21 +0100 Subject: [PATCH 015/200] drm/xe: Report TLB timeout using GT oriented functions We track TLB invalidation seqno per GT and we have GT oriented message helpers, so it's better to use GT oriented log functions. 
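For illustration, the conversion below boils down to:

    /* before: manual device and GT id plumbing */
    drm_err(&gt_to_xe(gt)->drm, "gt%d: TLB invalidation fence timeout, seqno=%d recv=%d",
            gt->info.id, fence->seqno, gt->tlb_invalidation.seqno_recv);

    /* after: xe_gt_err() derives the device and adds the GT prefix itself */
    xe_gt_err(gt, "TLB invalidation fence timeout, seqno=%d recv=%d",
              fence->seqno, gt->tlb_invalidation.seqno_recv);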
Cc: Matt Roper Cc: Matthew Brost Reviewed-by: Lucas De Marchi Link: https://lore.kernel.org/r/20231218190629.502-3-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c index 7eef23a00d77ee..e3a4131ebb5879 100644 --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c @@ -8,6 +8,7 @@ #include "abi/guc_actions_abi.h" #include "xe_device.h" #include "xe_gt.h" +#include "xe_gt_printk.h" #include "xe_guc.h" #include "xe_guc_ct.h" #include "xe_trace.h" @@ -30,8 +31,8 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work) break; trace_xe_gt_tlb_invalidation_fence_timeout(fence); - drm_err(&gt_to_xe(gt)->drm, "gt%d: TLB invalidation fence timeout, seqno=%d recv=%d", - gt->info.id, fence->seqno, gt->tlb_invalidation.seqno_recv); + xe_gt_err(gt, "TLB invalidation fence timeout, seqno=%d recv=%d", + fence->seqno, gt->tlb_invalidation.seqno_recv); list_del(&fence->link); fence->base.error = -ETIME; @@ -312,9 +313,7 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt, */ int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno) { - struct xe_device *xe = gt_to_xe(gt); struct xe_guc *guc = &gt->uc.guc; - struct drm_printer p = drm_err_printer(__func__); int ret; /* @@ -325,8 +324,10 @@ int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno) tlb_invalidation_seqno_past(gt, seqno), TLB_TIMEOUT); if (!ret) { - drm_err(&xe->drm, "gt%d: TLB invalidation time'd out, seqno=%d, recv=%d\n", - gt->info.id, seqno, gt->tlb_invalidation.seqno_recv); + struct drm_printer p = xe_gt_err_printer(gt); + + xe_gt_err(gt, "TLB invalidation time'd out, seqno=%d, recv=%d\n", + seqno, gt->tlb_invalidation.seqno_recv); xe_guc_ct_print(&guc->ct, &p, true); return -ETIME; } From 587c73343ac79000223b05e1e58a0657a0b59f01 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 20:06:22 +0100 Subject: [PATCH 016/200] drm/xe: Introduce GuC Doorbells Manager MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The GFX doorbell solution provides a mechanism for submission of workload to the graphics hardware by a ring3 application without the penalty of ring transition for each workload submission. This feature is not currently used by the Linux drivers, but in SR-IOV mode the doorbells are treated as a shared resource and the PF driver must be able to provision an exclusive range of doorbell IDs across all enabled VFs. Introduce a simple GuC doorbell ID manager that will be used by the PF driver for VF provisioning and can later be used by the submission code once we are ready to switch from H2G based notifications to the doorbells mechanism.
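A minimal usage sketch of the intended PF flow (vf_dbs and pf_spare are made-up placeholder values; the functions are the ones added by this patch):

    int begin_db, ret;

    /* bare-metal/PF: manage all doorbells supported by the hardware */
    ret = xe_guc_db_mgr_init(&guc->dbm, ~0);
    if (ret)
            return ret;

    /* PF: reserve a contiguous range for a VF, keeping some IDs spare */
    begin_db = xe_guc_db_mgr_reserve_range(&guc->dbm, vf_dbs, pf_spare);
    if (begin_db < 0)
            return begin_db; /* e.g. -ENOSPC if no contiguous range is left */

    /* PF: return the range when the VF is unprovisioned */
    xe_guc_db_mgr_release_range(&guc->dbm, begin_db, vf_dbs);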
Cc: Matthew Brost Cc: Piotr Piórkowski Reviewed-by: Piotr Piórkowski Link: https://lore.kernel.org/r/20231218190629.502-4-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/Makefile | 1 + drivers/gpu/drm/xe/xe_guc_db_mgr.c | 257 +++++++++++++++++++++++++++++ drivers/gpu/drm/xe/xe_guc_db_mgr.h | 22 +++ drivers/gpu/drm/xe/xe_guc_types.h | 15 ++ drivers/gpu/drm/xe/xe_uc.c | 5 + 5 files changed, 300 insertions(+) create mode 100644 drivers/gpu/drm/xe/xe_guc_db_mgr.c create mode 100644 drivers/gpu/drm/xe/xe_guc_db_mgr.h diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile index 9214915d9566cf..a1aa0028e153d1 100644 --- a/drivers/gpu/drm/xe/Makefile +++ b/drivers/gpu/drm/xe/Makefile @@ -93,6 +93,7 @@ xe-y += xe_bb.o \ xe_guc.o \ xe_guc_ads.o \ xe_guc_ct.o \ + xe_guc_db_mgr.o \ xe_guc_debugfs.o \ xe_guc_hwconfig.o \ xe_guc_log.o \ diff --git a/drivers/gpu/drm/xe/xe_guc_db_mgr.c b/drivers/gpu/drm/xe/xe_guc_db_mgr.c new file mode 100644 index 00000000000000..15816f90e96058 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_guc_db_mgr.c @@ -0,0 +1,257 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2023 Intel Corporation + */ + +#include +#include + +#include + +#include "regs/xe_guc_regs.h" + +#include "xe_assert.h" +#include "xe_gt_printk.h" +#include "xe_guc.h" +#include "xe_guc_db_mgr.h" +#include "xe_guc_types.h" + +/** + * DOC: GuC Doorbells + * + * The GFX doorbell solution provides a mechanism for submission of workload + * to the graphics hardware by a ring3 application without the penalty of + * ring transition for each workload submission. + * + * In SR-IOV mode, the doorbells are treated as a shared resource and the PF + * must be able to provision an exclusive range of IDs across VFs, which may + * want to use this feature. + */ + +static struct xe_guc *dbm_to_guc(struct xe_guc_db_mgr *dbm) +{ + return container_of(dbm, struct xe_guc, dbm); +} + +static struct xe_gt *dbm_to_gt(struct xe_guc_db_mgr *dbm) +{ + return guc_to_gt(dbm_to_guc(dbm)); +} + +static struct xe_device *dbm_to_xe(struct xe_guc_db_mgr *dbm) +{ + return gt_to_xe(dbm_to_gt(dbm)); +} + +#define dbm_assert(_dbm, _cond) xe_gt_assert(dbm_to_gt(_dbm), _cond) +#define dbm_mutex(_dbm) (&dbm_to_guc(_dbm)->submission_state.lock) + +static void __fini_dbm(struct drm_device *drm, void *arg) +{ + struct xe_guc_db_mgr *dbm = arg; + unsigned int weight; + + mutex_lock(dbm_mutex(dbm)); + + weight = bitmap_weight(dbm->bitmap, dbm->count); + if (weight) { + struct drm_printer p = xe_gt_info_printer(dbm_to_gt(dbm)); + + xe_gt_err(dbm_to_gt(dbm), "GuC doorbells manager unclean (%u/%u)\n", + weight, dbm->count); + xe_guc_db_mgr_print(dbm, &p, 1); + } + + bitmap_free(dbm->bitmap); + dbm->bitmap = NULL; + dbm->count = 0; + + mutex_unlock(dbm_mutex(dbm)); +} + +/** + * xe_guc_db_mgr_init() - Initialize GuC Doorbells Manager. + * @dbm: the &xe_guc_db_mgr to initialize + * @count: number of doorbells to manage + * + * The bare-metal or PF driver can pass ~0 as &count to indicate that all + * doorbells supported by the hardware are available for use. + * + * Only VF drivers will have to provide an explicit number of doorbell IDs + * that they can use. + * + * Return: 0 on success or a negative error code on failure.
+ */ +int xe_guc_db_mgr_init(struct xe_guc_db_mgr *dbm, unsigned int count) +{ + int ret; + + if (count == ~0) + count = GUC_NUM_DOORBELLS; + + dbm_assert(dbm, !dbm->bitmap); + dbm_assert(dbm, count <= GUC_NUM_DOORBELLS); + + if (!count) + goto done; + + dbm->bitmap = bitmap_zalloc(count, GFP_KERNEL); + if (!dbm->bitmap) + return -ENOMEM; + dbm->count = count; + + ret = drmm_add_action_or_reset(&dbm_to_xe(dbm)->drm, __fini_dbm, dbm); + if (ret) + return ret; +done: + xe_gt_dbg(dbm_to_gt(dbm), "using %u doorbell(s)\n", dbm->count); + return 0; +} + +static int dbm_reserve_chunk_locked(struct xe_guc_db_mgr *dbm, + unsigned int count, unsigned int spare) +{ + unsigned int used; + int index; + + dbm_assert(dbm, count); + dbm_assert(dbm, count <= GUC_NUM_DOORBELLS); + dbm_assert(dbm, dbm->count <= GUC_NUM_DOORBELLS); + lockdep_assert_held(dbm_mutex(dbm)); + + if (!dbm->count) + return -ENODATA; + + if (spare) { + used = bitmap_weight(dbm->bitmap, dbm->count); + if (used + count + spare > dbm->count) + return -EDQUOT; + } + + index = bitmap_find_next_zero_area(dbm->bitmap, dbm->count, 0, count, 0); + if (index >= dbm->count) + return -ENOSPC; + + bitmap_set(dbm->bitmap, index, count); + + return index; +} + +static void dbm_release_chunk_locked(struct xe_guc_db_mgr *dbm, + unsigned int start, unsigned int count) +{ + dbm_assert(dbm, count); + dbm_assert(dbm, count <= GUC_NUM_DOORBELLS); + dbm_assert(dbm, dbm->count); + dbm_assert(dbm, dbm->count <= GUC_NUM_DOORBELLS); + lockdep_assert_held(dbm_mutex(dbm)); + + if (IS_ENABLED(CONFIG_DRM_XE_DEBUG)) { + unsigned int n; + + for (n = 0; n < count; n++) + dbm_assert(dbm, test_bit(start + n, dbm->bitmap)); + } + bitmap_clear(dbm->bitmap, start, count); +} + +/** + * xe_guc_db_mgr_reserve_id_locked() - Reserve a single GuC Doorbell ID. + * @dbm: the &xe_guc_db_mgr + * + * This function expects that the submission lock is already taken. + * + * Return: ID of the allocated GuC doorbell or a negative error code on failure. + */ +int xe_guc_db_mgr_reserve_id_locked(struct xe_guc_db_mgr *dbm) +{ + return dbm_reserve_chunk_locked(dbm, 1, 0); +} + +/** + * xe_guc_db_mgr_release_id_locked() - Release a single GuC Doorbell ID. + * @dbm: the &xe_guc_db_mgr + * @id: the GuC Doorbell ID to release + * + * This function expects that the submission lock is already taken. + */ +void xe_guc_db_mgr_release_id_locked(struct xe_guc_db_mgr *dbm, unsigned int id) +{ + return dbm_release_chunk_locked(dbm, id, 1); +} + +/** + * xe_guc_db_mgr_reserve_range() - Reserve a range of GuC Doorbell IDs. + * @dbm: the &xe_guc_db_mgr + * @count: number of GuC doorbell IDs to reserve + * @spare: number of GuC doorbell IDs to keep available + * + * This function is dedicated for use by the PF, which expects that the + * allocated range for the VF will be contiguous and that there will be at + * least &spare IDs still available for PF use after this reservation. + * + * Return: starting ID of the allocated GuC doorbell ID range or + * a negative error code on failure. + */ +int xe_guc_db_mgr_reserve_range(struct xe_guc_db_mgr *dbm, + unsigned int count, unsigned int spare) +{ + int ret; + + mutex_lock(dbm_mutex(dbm)); + ret = dbm_reserve_chunk_locked(dbm, count, spare); + mutex_unlock(dbm_mutex(dbm)); + + return ret; +} + +/** + * xe_guc_db_mgr_release_range() - Release a range of Doorbell IDs.
+ * @dbm: the &xe_guc_db_mgr + * @start: the starting ID of GuC doorbell ID range to release + * @count: number of GuC doorbell IDs to release + */ +void xe_guc_db_mgr_release_range(struct xe_guc_db_mgr *dbm, + unsigned int start, unsigned int count) +{ + mutex_lock(dbm_mutex(dbm)); + dbm_release_chunk_locked(dbm, start, count); + mutex_unlock(dbm_mutex(dbm)); +} + +/** + * xe_guc_db_mgr_print() - Print status of GuC Doorbells Manager. + * @dbm: the &xe_guc_db_mgr to print + * @p: the &drm_printer to print to + * @indent: tab indentation level + */ +void xe_guc_db_mgr_print(struct xe_guc_db_mgr *dbm, + struct drm_printer *p, int indent) +{ + unsigned int rs, re; + unsigned int total; + + drm_printf_indent(p, indent, "count: %u\n", dbm->count); + if (!dbm->bitmap) + return; + + mutex_lock(dbm_mutex(dbm)); + + total = 0; + for_each_clear_bitrange(rs, re, dbm->bitmap, dbm->count) { + drm_printf_indent(p, indent, "available range: %u..%u (%u)\n", + rs, re - 1, re - rs); + total += re - rs; + } + drm_printf_indent(p, indent, "available total: %u\n", total); + + total = 0; + for_each_set_bitrange(rs, re, dbm->bitmap, dbm->count) { + drm_printf_indent(p, indent, "reserved range: %u..%u (%u)\n", + rs, re - 1, re - rs); + total += re - rs; + } + drm_printf_indent(p, indent, "reserved total: %u\n", total); + + mutex_unlock(dbm_mutex(dbm)); +} diff --git a/drivers/gpu/drm/xe/xe_guc_db_mgr.h b/drivers/gpu/drm/xe/xe_guc_db_mgr.h new file mode 100644 index 00000000000000..c250fa0ca9d63b --- /dev/null +++ b/drivers/gpu/drm/xe/xe_guc_db_mgr.h @@ -0,0 +1,22 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef _XE_GUC_DB_MGR_H_ +#define _XE_GUC_DB_MGR_H_ + +struct drm_printer; +struct xe_guc_db_mgr; + +int xe_guc_db_mgr_init(struct xe_guc_db_mgr *dbm, unsigned int count); + +int xe_guc_db_mgr_reserve_id_locked(struct xe_guc_db_mgr *dbm); +void xe_guc_db_mgr_release_id_locked(struct xe_guc_db_mgr *dbm, unsigned int id); + +int xe_guc_db_mgr_reserve_range(struct xe_guc_db_mgr *dbm, unsigned int count, unsigned int spare); +void xe_guc_db_mgr_release_range(struct xe_guc_db_mgr *dbm, unsigned int start, unsigned int count); + +void xe_guc_db_mgr_print(struct xe_guc_db_mgr *dbm, struct drm_printer *p, int indent); + +#endif diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h index cd80802e891893..16de203c62a7e1 100644 --- a/drivers/gpu/drm/xe/xe_guc_types.h +++ b/drivers/gpu/drm/xe/xe_guc_types.h @@ -17,6 +17,19 @@ #include "xe_guc_pc_types.h" #include "xe_uc_fw_types.h" +/** + * struct xe_guc_db_mgr - GuC Doorbells Manager. + * + * Note: GuC Doorbells Manager is relying on &xe_guc::submission_state.lock + * to protect its members. 
+ */ +struct xe_guc_db_mgr { + /** @count: number of doorbells to manage */ + unsigned int count; + /** @bitmap: bitmap to track allocated doorbells */ + unsigned long *bitmap; +}; + /** * struct xe_guc - Graphic micro controller */ @@ -31,6 +44,8 @@ struct xe_guc { struct xe_guc_ct ct; /** @pc: GuC Power Conservation */ struct xe_guc_pc pc; + /** @dbm: GuC Doorbell Manager */ + struct xe_guc_db_mgr dbm; /** @submission_state: GuC submission state */ struct { /** @exec_queue_lookup: Lookup an xe_engine from guc_id */ diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c index 25e1ddfd2f86a7..4408ea1751e7ee 100644 --- a/drivers/gpu/drm/xe/xe_uc.c +++ b/drivers/gpu/drm/xe/xe_uc.c @@ -9,6 +9,7 @@ #include "xe_gsc.h" #include "xe_gt.h" #include "xe_guc.h" +#include "xe_guc_db_mgr.h" #include "xe_guc_pc.h" #include "xe_guc_submit.h" #include "xe_huc.h" @@ -60,6 +61,10 @@ int xe_uc_init(struct xe_uc *uc) if (ret) goto err; + ret = xe_guc_db_mgr_init(&uc->guc.dbm, ~0); + if (ret) + goto err; + return 0; err: From 4ceb8645bd85aeefab0929ada82a95603c6e1f2b Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 20:06:23 +0100 Subject: [PATCH 017/200] drm/xe/kunit: Set SR-IOV mode of the fake device We want to add code that will check the driver's SR-IOV mode. Update xe_pci_fake_device_init() and struct xe_pci_fake_data to either explicitly specify the desired SR-IOV mode of the fake device or fall back to the default bare-metal mode. Cc: Lucas De Marchi Reviewed-by: Lucas De Marchi Link: https://lore.kernel.org/r/20231218190629.502-5-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/tests/xe_pci.c | 3 +++ drivers/gpu/drm/xe/tests/xe_pci_test.h | 2 ++ 2 files changed, 5 insertions(+) diff --git a/drivers/gpu/drm/xe/tests/xe_pci.c b/drivers/gpu/drm/xe/tests/xe_pci.c index 602793644f61d7..f62809ca8b51f5 100644 --- a/drivers/gpu/drm/xe/tests/xe_pci.c +++ b/drivers/gpu/drm/xe/tests/xe_pci.c @@ -156,6 +156,9 @@ int xe_pci_fake_device_init(struct xe_device *xe) return -ENODEV; done: + xe->sriov.__mode = data && data->sriov_mode ? + data->sriov_mode : XE_SRIOV_MODE_NONE; + kunit_activate_static_stub(test, read_gmdid, fake_read_gmdid); xe_info_init_early(xe, desc, subplatform_desc); diff --git a/drivers/gpu/drm/xe/tests/xe_pci_test.h b/drivers/gpu/drm/xe/tests/xe_pci_test.h index 811ffe5bd9fd2e..f40dcec8399291 100644 --- a/drivers/gpu/drm/xe/tests/xe_pci_test.h +++ b/drivers/gpu/drm/xe/tests/xe_pci_test.h @@ -9,6 +9,7 @@ #include #include "xe_platform_types.h" +#include "xe_sriov_types.h" struct xe_device; struct xe_graphics_desc; @@ -23,6 +24,7 @@ void xe_call_for_each_graphics_ip(xe_graphics_fn xe_fn); void xe_call_for_each_media_ip(xe_media_fn xe_fn); struct xe_pci_fake_data { + enum xe_sriov_mode sriov_mode; enum xe_platform platform; enum xe_subplatform subplatform; u32 graphics_verx100; From 5095d13d758b4e602eb78771abf65ee5dc867645 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 20:06:24 +0100 Subject: [PATCH 018/200] drm/xe/kunit: Define helper functions to allocate fake xe device There will be more KUnit tests added that will require a fake device. Define generic helper functions to avoid code duplication.
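A sketch of how a test suite is expected to use these helpers (the sample_* names are placeholders):

    static void sample_test(struct kunit *test)
    {
            struct xe_device *xe = test->priv; /* stored by the helper */

            KUNIT_EXPECT_NOT_ERR_OR_NULL(test, xe);
    }

    static struct kunit_case sample_test_cases[] = {
            KUNIT_CASE(sample_test),
            {}
    };

    static struct kunit_suite sample_suite = {
            .name = "sample",
            .test_cases = sample_test_cases,
            .init = xe_kunit_helper_xe_device_test_init,
    };

    kunit_test_suites(&sample_suite);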
Cc: Lucas De Marchi Reviewed-by: Lucas De Marchi Link: https://lore.kernel.org/r/20231218190629.502-6-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/Makefile | 3 + drivers/gpu/drm/xe/tests/xe_kunit_helpers.c | 80 +++++++++++++++++++++ drivers/gpu/drm/xe/tests/xe_kunit_helpers.h | 17 +++++ 3 files changed, 100 insertions(+) create mode 100644 drivers/gpu/drm/xe/tests/xe_kunit_helpers.c create mode 100644 drivers/gpu/drm/xe/tests/xe_kunit_helpers.h diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile index a1aa0028e153d1..df8601d6a59f0c 100644 --- a/drivers/gpu/drm/xe/Makefile +++ b/drivers/gpu/drm/xe/Makefile @@ -156,6 +156,9 @@ xe-$(CONFIG_PCI_IOV) += \ xe_lmtt_2l.o \ xe_lmtt_ml.o +xe-$(CONFIG_DRM_XE_KUNIT_TEST) += \ + tests/xe_kunit_helpers.o + # i915 Display compat #defines and #includes subdir-ccflags-$(CONFIG_DRM_XE_DISPLAY) += \ -I$(srctree)/$(src)/display/ext \ diff --git a/drivers/gpu/drm/xe/tests/xe_kunit_helpers.c b/drivers/gpu/drm/xe/tests/xe_kunit_helpers.c new file mode 100644 index 00000000000000..6d72dbf061393e --- /dev/null +++ b/drivers/gpu/drm/xe/tests/xe_kunit_helpers.c @@ -0,0 +1,80 @@ +// SPDX-License-Identifier: GPL-2.0 AND MIT +/* + * Copyright © 2023 Intel Corporation + */ + +#include +#include +#include + +#include +#include + +#include "tests/xe_kunit_helpers.h" +#include "tests/xe_pci_test.h" +#include "xe_device_types.h" + +/** + * xe_kunit_helper_alloc_xe_device - Allocate a &xe_device for a KUnit test. + * @test: the &kunit where this &xe_device will be used + * @dev: The parent device object + * + * This function allocates xe_device using drm_kunit_helper_alloc_device(). + * The xe_device allocation is managed by the test. + * + * @dev should be allocated using drm_kunit_helper_alloc_device(). + * + * This function uses KUNIT_ASSERT to detect any allocation failures. + * + * Return: A pointer to the new &xe_device. + */ +struct xe_device *xe_kunit_helper_alloc_xe_device(struct kunit *test, + struct device *dev) +{ + struct xe_device *xe; + + xe = drm_kunit_helper_alloc_drm_device(test, dev, + struct xe_device, + drm, DRIVER_GEM); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, xe); + return xe; +} +EXPORT_SYMBOL_IF_KUNIT(xe_kunit_helper_alloc_xe_device); + +/** + * xe_kunit_helper_xe_device_test_init - Prepare a &xe_device for a KUnit test. + * @test: the &kunit where this fake &xe_device will be used + * + * This function allocates and initializes a fake &xe_device and stores its + * pointer as &kunit.priv to allow the test code to access it. + * + * This function can be directly used as custom implementation of + * &kunit_suite.init. + * + * It is possible to prepare specific variant of the fake &xe_device by passing + * in &kunit.priv pointer to the struct xe_pci_fake_data supplemented with + * desired parameters prior to calling this function. + * + * This function uses KUNIT_ASSERT to detect any failures. + * + * Return: Always 0. 
+ */ +int xe_kunit_helper_xe_device_test_init(struct kunit *test) +{ + struct xe_device *xe; + struct device *dev; + int err; + + dev = drm_kunit_helper_alloc_device(test); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, dev); + + xe = xe_kunit_helper_alloc_xe_device(test, dev); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, xe); + + err = xe_pci_fake_device_init(xe); + KUNIT_ASSERT_EQ(test, err, 0); + + test->priv = xe; + return 0; +} +EXPORT_SYMBOL_IF_KUNIT(xe_kunit_helper_xe_device_test_init); diff --git a/drivers/gpu/drm/xe/tests/xe_kunit_helpers.h b/drivers/gpu/drm/xe/tests/xe_kunit_helpers.h new file mode 100644 index 00000000000000..067a1babf0497c --- /dev/null +++ b/drivers/gpu/drm/xe/tests/xe_kunit_helpers.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: GPL-2.0 AND MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef _XE_KUNIT_HELPERS_H_ +#define _XE_KUNIT_HELPERS_H_ + +struct device; +struct kunit; +struct xe_device; + +struct xe_device *xe_kunit_helper_alloc_xe_device(struct kunit *test, + struct device *dev); +int xe_kunit_helper_xe_device_test_init(struct kunit *test); + +#endif From 0b75475723b182400a4bfa5aaff9a969afdfdb76 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 20:06:25 +0100 Subject: [PATCH 019/200] drm/xe/kunit: Restore test->priv when done with fake xe device The current KUnit implementation does not reset test->priv in case of parametrized tests, which may lead to wrongly treating the output pointer to the fake xe_device from the first call as an input pointer to a struct xe_pci_fake_data on subsequent calls. Restore test->priv to its original value to avoid this invalid access. Acked-by: Lucas De Marchi Link: https://lore.kernel.org/r/20231218190629.502-7-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/tests/xe_kunit_helpers.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/gpu/drm/xe/tests/xe_kunit_helpers.c b/drivers/gpu/drm/xe/tests/xe_kunit_helpers.c index 6d72dbf061393e..fefe79b3b75a42 100644 --- a/drivers/gpu/drm/xe/tests/xe_kunit_helpers.c +++ b/drivers/gpu/drm/xe/tests/xe_kunit_helpers.c @@ -41,6 +41,13 @@ struct xe_device *xe_kunit_helper_alloc_xe_device(struct kunit *test, } EXPORT_SYMBOL_IF_KUNIT(xe_kunit_helper_alloc_xe_device); +static void kunit_action_restore_priv(void *priv) +{ + struct kunit *test = kunit_get_current_test(); + + test->priv = priv; +} + /** * xe_kunit_helper_xe_device_test_init - Prepare a &xe_device for a KUnit test. * @test: the &kunit where this fake &xe_device will be used @@ -74,6 +81,9 @@ int xe_kunit_helper_xe_device_test_init(struct kunit *test) err = xe_pci_fake_device_init(xe); KUNIT_ASSERT_EQ(test, err, 0); + err = kunit_add_action_or_reset(test, kunit_action_restore_priv, test->priv); + KUNIT_ASSERT_EQ(test, err, 0); + test->priv = xe; return 0; } From d8ba1ede4cbd8df3a2a9a8a089df04398b8a7db6 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 20:06:26 +0100 Subject: [PATCH 020/200] drm/xe/kunit: Use xe kunit helper in RTP test Replace drm_kunit_helper_alloc_drm_device() with the xe helper.
Cc: Lucas De Marchi Reviewed-by: Lucas De Marchi Link: https://lore.kernel.org/r/20231218190629.502-8-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/tests/xe_rtp_test.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/tests/xe_rtp_test.c b/drivers/gpu/drm/xe/tests/xe_rtp_test.c index 4a697289767569..21e77026518b6e 100644 --- a/drivers/gpu/drm/xe/tests/xe_rtp_test.c +++ b/drivers/gpu/drm/xe/tests/xe_rtp_test.c @@ -15,6 +15,7 @@ #include "regs/xe_reg_defs.h" #include "xe_device.h" #include "xe_device_types.h" +#include "xe_kunit_helpers.h" #include "xe_pci_test.h" #include "xe_reg_sr.h" #include "xe_rtp.h" @@ -276,9 +277,7 @@ static int xe_rtp_test_init(struct kunit *test) dev = drm_kunit_helper_alloc_device(test); KUNIT_ASSERT_NOT_ERR_OR_NULL(test, dev); - xe = drm_kunit_helper_alloc_drm_device(test, dev, - struct xe_device, - drm, DRIVER_GEM); + xe = xe_kunit_helper_alloc_xe_device(test, dev); KUNIT_ASSERT_NOT_ERR_OR_NULL(test, xe); /* Initialize an empty device */ From 29d52c9c1b9d1abdbddc9b6cbd8eb2d70b025e6c Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 20:06:27 +0100 Subject: [PATCH 021/200] drm/xe/kunit: Use xe kunit helper in WA test Use the xe helper code to allocate a fake xe device. Cc: Lucas De Marchi Reviewed-by: Lucas De Marchi Link: https://lore.kernel.org/r/20231218190629.502-9-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/tests/xe_wa_test.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/tests/xe_wa_test.c b/drivers/gpu/drm/xe/tests/xe_wa_test.c index a53c22a1958247..3d7f3ef62ae937 100644 --- a/drivers/gpu/drm/xe/tests/xe_wa_test.c +++ b/drivers/gpu/drm/xe/tests/xe_wa_test.c @@ -9,6 +9,7 @@ #include #include "xe_device.h" +#include "xe_kunit_helpers.h" #include "xe_pci_test.h" #include "xe_reg_sr.h" #include "xe_tuning.h" @@ -108,9 +109,7 @@ static int xe_wa_test_init(struct kunit *test) dev = drm_kunit_helper_alloc_device(test); KUNIT_ASSERT_NOT_ERR_OR_NULL(test, dev); - xe = drm_kunit_helper_alloc_drm_device(test, dev, - struct xe_device, - drm, DRIVER_GEM); + xe = xe_kunit_helper_alloc_xe_device(test, dev); KUNIT_ASSERT_NOT_ERR_OR_NULL(test, xe); test->priv = &data; From 90ad6f3017894860429bc1f8820024e0b177e676 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 20:06:28 +0100 Subject: [PATCH 022/200] drm/xe/kunit: Enable CONFIG_LOCKDEP in tests Tests might use locking, so it is better to have LOCKDEP enabled.
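As an example of what these options catch: the doorbells manager added earlier in this series annotates its locking contract with lockdep_assert_held(), so with PROVE_LOCKING a test calling a _locked() helper without the mutex fails loudly instead of racing silently (sketch):

    int id;

    mutex_lock(dbm_mutex(dbm));
    id = xe_guc_db_mgr_reserve_id_locked(dbm); /* ok: lockdep_assert_held() passes */
    mutex_unlock(dbm_mutex(dbm));

    id = xe_guc_db_mgr_reserve_id_locked(dbm); /* lockdep splat: mutex not held */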
Cc: Lucas De Marchi Reviewed-by: Lucas De Marchi Link: https://lore.kernel.org/r/20231218190629.502-10-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/.kunitconfig | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/xe/.kunitconfig b/drivers/gpu/drm/xe/.kunitconfig index 9590eac91af39d..ad4b9b4a9f5544 100644 --- a/drivers/gpu/drm/xe/.kunitconfig +++ b/drivers/gpu/drm/xe/.kunitconfig @@ -11,3 +11,8 @@ CONFIG_DRM_XE_DISPLAY=n CONFIG_EXPERT=y CONFIG_FB=y CONFIG_DRM_XE_KUNIT_TEST=y +CONFIG_LOCK_DEBUGGING_SUPPORT=y +CONFIG_PROVE_LOCKING=y +CONFIG_DEBUG_MUTEXES=y +CONFIG_LOCKDEP=y +CONFIG_DEBUG_LOCKDEP=y From 6e2546131750a7c5e5dc668f9050d6a99c095d51 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Mon, 18 Dec 2023 20:06:29 +0100 Subject: [PATCH 023/200] drm/xe/kunit: Add GuC Doorbells Manager tests MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add few tests to make sure that basic usage scenarios of the GuC Doorbells Manager are implemented correctly. Cc: Piotr Piórkowski Reviewed-by: Piotr Piórkowski Link: https://lore.kernel.org/r/20231218190629.502-11-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/tests/xe_guc_db_mgr_test.c | 201 ++++++++++++++++++ drivers/gpu/drm/xe/xe_guc_db_mgr.c | 4 + 2 files changed, 205 insertions(+) create mode 100644 drivers/gpu/drm/xe/tests/xe_guc_db_mgr_test.c diff --git a/drivers/gpu/drm/xe/tests/xe_guc_db_mgr_test.c b/drivers/gpu/drm/xe/tests/xe_guc_db_mgr_test.c new file mode 100644 index 00000000000000..a87a7b4b040a2f --- /dev/null +++ b/drivers/gpu/drm/xe/tests/xe_guc_db_mgr_test.c @@ -0,0 +1,201 @@ +// SPDX-License-Identifier: GPL-2.0 AND MIT +/* + * Copyright © 2023 Intel Corporation + */ + +#include + +#include "xe_device.h" +#include "xe_kunit_helpers.h" + +static int guc_dbm_test_init(struct kunit *test) +{ + struct xe_guc_db_mgr *dbm; + + xe_kunit_helper_xe_device_test_init(test); + dbm = &xe_device_get_gt(test->priv, 0)->uc.guc.dbm; + + mutex_init(dbm_mutex(dbm)); + test->priv = dbm; + return 0; +} + +static void test_empty(struct kunit *test) +{ + struct xe_guc_db_mgr *dbm = test->priv; + + KUNIT_ASSERT_EQ(test, xe_guc_db_mgr_init(dbm, 0), 0); + KUNIT_ASSERT_EQ(test, dbm->count, 0); + + mutex_lock(dbm_mutex(dbm)); + KUNIT_EXPECT_LT(test, xe_guc_db_mgr_reserve_id_locked(dbm), 0); + mutex_unlock(dbm_mutex(dbm)); + + KUNIT_EXPECT_LT(test, xe_guc_db_mgr_reserve_range(dbm, 1, 0), 0); +} + +static void test_default(struct kunit *test) +{ + struct xe_guc_db_mgr *dbm = test->priv; + + KUNIT_ASSERT_EQ(test, xe_guc_db_mgr_init(dbm, ~0), 0); + KUNIT_ASSERT_EQ(test, dbm->count, GUC_NUM_DOORBELLS); +} + +static const unsigned int guc_dbm_params[] = { + GUC_NUM_DOORBELLS / 64, + GUC_NUM_DOORBELLS / 32, + GUC_NUM_DOORBELLS / 8, + GUC_NUM_DOORBELLS, +}; + +static void uint_param_get_desc(const unsigned int *p, char *desc) +{ + snprintf(desc, KUNIT_PARAM_DESC_SIZE, "%u", *p); +} + +KUNIT_ARRAY_PARAM(guc_dbm, guc_dbm_params, uint_param_get_desc); + +static void test_size(struct kunit *test) +{ + const unsigned int *p = test->param_value; + struct xe_guc_db_mgr *dbm = test->priv; + unsigned int n; + int id; + + KUNIT_ASSERT_EQ(test, xe_guc_db_mgr_init(dbm, *p), 0); + KUNIT_ASSERT_EQ(test, dbm->count, *p); + + mutex_lock(dbm_mutex(dbm)); + for (n = 0; n < *p; n++) { + KUNIT_EXPECT_GE(test, id = xe_guc_db_mgr_reserve_id_locked(dbm), 0); + KUNIT_EXPECT_LT(test, id, dbm->count); + } + KUNIT_EXPECT_LT(test, 
xe_guc_db_mgr_reserve_id_locked(dbm), 0); + mutex_unlock(dbm_mutex(dbm)); + + mutex_lock(dbm_mutex(dbm)); + for (n = 0; n < *p; n++) + xe_guc_db_mgr_release_id_locked(dbm, n); + mutex_unlock(dbm_mutex(dbm)); +} + +static void test_reuse(struct kunit *test) +{ + const unsigned int *p = test->param_value; + struct xe_guc_db_mgr *dbm = test->priv; + unsigned int n; + + KUNIT_ASSERT_EQ(test, xe_guc_db_mgr_init(dbm, *p), 0); + + mutex_lock(dbm_mutex(dbm)); + for (n = 0; n < *p; n++) + KUNIT_EXPECT_GE(test, xe_guc_db_mgr_reserve_id_locked(dbm), 0); + KUNIT_EXPECT_LT(test, xe_guc_db_mgr_reserve_id_locked(dbm), 0); + mutex_unlock(dbm_mutex(dbm)); + + mutex_lock(dbm_mutex(dbm)); + for (n = 0; n < *p; n++) { + xe_guc_db_mgr_release_id_locked(dbm, n); + KUNIT_EXPECT_EQ(test, xe_guc_db_mgr_reserve_id_locked(dbm), n); + } + KUNIT_EXPECT_LT(test, xe_guc_db_mgr_reserve_id_locked(dbm), 0); + mutex_unlock(dbm_mutex(dbm)); + + mutex_lock(dbm_mutex(dbm)); + for (n = 0; n < *p; n++) + xe_guc_db_mgr_release_id_locked(dbm, n); + mutex_unlock(dbm_mutex(dbm)); +} + +static void test_range_overlap(struct kunit *test) +{ + const unsigned int *p = test->param_value; + struct xe_guc_db_mgr *dbm = test->priv; + int id1, id2, id3; + unsigned int n; + + KUNIT_ASSERT_EQ(test, xe_guc_db_mgr_init(dbm, ~0), 0); + KUNIT_ASSERT_LE(test, *p, dbm->count); + + KUNIT_ASSERT_GE(test, id1 = xe_guc_db_mgr_reserve_range(dbm, *p, 0), 0); + for (n = 0; n < dbm->count - *p; n++) { + KUNIT_ASSERT_GE(test, id2 = xe_guc_db_mgr_reserve_range(dbm, 1, 0), 0); + KUNIT_ASSERT_NE(test, id2, id1); + KUNIT_ASSERT_NE_MSG(test, id2 < id1, id2 > id1 + *p - 1, + "id1=%d id2=%d", id1, id2); + } + KUNIT_ASSERT_LT(test, xe_guc_db_mgr_reserve_range(dbm, 1, 0), 0); + xe_guc_db_mgr_release_range(dbm, 0, dbm->count); + + if (*p >= 1) { + KUNIT_ASSERT_GE(test, id1 = xe_guc_db_mgr_reserve_range(dbm, 1, 0), 0); + KUNIT_ASSERT_GE(test, id2 = xe_guc_db_mgr_reserve_range(dbm, *p - 1, 0), 0); + KUNIT_ASSERT_NE(test, id2, id1); + KUNIT_ASSERT_NE_MSG(test, id1 < id2, id1 > id2 + *p - 2, + "id1=%d id2=%d", id1, id2); + for (n = 0; n < dbm->count - *p; n++) { + KUNIT_ASSERT_GE(test, id3 = xe_guc_db_mgr_reserve_range(dbm, 1, 0), 0); + KUNIT_ASSERT_NE(test, id3, id1); + KUNIT_ASSERT_NE(test, id3, id2); + KUNIT_ASSERT_NE_MSG(test, id3 < id2, id3 > id2 + *p - 2, + "id3=%d id2=%d", id3, id2); + } + KUNIT_ASSERT_LT(test, xe_guc_db_mgr_reserve_range(dbm, 1, 0), 0); + xe_guc_db_mgr_release_range(dbm, 0, dbm->count); + } +} + +static void test_range_compact(struct kunit *test) +{ + const unsigned int *p = test->param_value; + struct xe_guc_db_mgr *dbm = test->priv; + unsigned int n; + + KUNIT_ASSERT_EQ(test, xe_guc_db_mgr_init(dbm, ~0), 0); + KUNIT_ASSERT_NE(test, *p, 0); + KUNIT_ASSERT_LE(test, *p, dbm->count); + if (dbm->count % *p) + kunit_skip(test, "must be divisible"); + + KUNIT_ASSERT_GE(test, xe_guc_db_mgr_reserve_range(dbm, *p, 0), 0); + for (n = 1; n < dbm->count / *p; n++) + KUNIT_ASSERT_GE(test, xe_guc_db_mgr_reserve_range(dbm, *p, 0), 0); + KUNIT_ASSERT_LT(test, xe_guc_db_mgr_reserve_range(dbm, 1, 0), 0); + xe_guc_db_mgr_release_range(dbm, 0, dbm->count); +} + +static void test_range_spare(struct kunit *test) +{ + const unsigned int *p = test->param_value; + struct xe_guc_db_mgr *dbm = test->priv; + int id; + + KUNIT_ASSERT_EQ(test, xe_guc_db_mgr_init(dbm, ~0), 0); + KUNIT_ASSERT_LE(test, *p, dbm->count); + + KUNIT_ASSERT_LT(test, xe_guc_db_mgr_reserve_range(dbm, *p, dbm->count), 0); + KUNIT_ASSERT_LT(test, xe_guc_db_mgr_reserve_range(dbm, *p, dbm->count - *p + 1), 
0); + KUNIT_ASSERT_EQ(test, id = xe_guc_db_mgr_reserve_range(dbm, *p, dbm->count - *p), 0); + KUNIT_ASSERT_LT(test, xe_guc_db_mgr_reserve_range(dbm, 1, dbm->count - *p), 0); + xe_guc_db_mgr_release_range(dbm, id, *p); +} + +static struct kunit_case guc_dbm_test_cases[] = { + KUNIT_CASE(test_empty), + KUNIT_CASE(test_default), + KUNIT_CASE_PARAM(test_size, guc_dbm_gen_params), + KUNIT_CASE_PARAM(test_reuse, guc_dbm_gen_params), + KUNIT_CASE_PARAM(test_range_overlap, guc_dbm_gen_params), + KUNIT_CASE_PARAM(test_range_compact, guc_dbm_gen_params), + KUNIT_CASE_PARAM(test_range_spare, guc_dbm_gen_params), + {} +}; + +static struct kunit_suite guc_dbm_suite = { + .name = "guc_dbm", + .test_cases = guc_dbm_test_cases, + .init = guc_dbm_test_init, +}; + +kunit_test_suites(&guc_dbm_suite); diff --git a/drivers/gpu/drm/xe/xe_guc_db_mgr.c b/drivers/gpu/drm/xe/xe_guc_db_mgr.c index 15816f90e96058..c1c04575d82d92 100644 --- a/drivers/gpu/drm/xe/xe_guc_db_mgr.c +++ b/drivers/gpu/drm/xe/xe_guc_db_mgr.c @@ -255,3 +255,7 @@ void xe_guc_db_mgr_print(struct xe_guc_db_mgr *dbm, mutex_unlock(dbm_mutex(dbm)); } + +#if IS_BUILTIN(CONFIG_DRM_XE_KUNIT_TEST) +#include "tests/xe_guc_db_mgr_test.c" +#endif From c5be725eb09de1f1083ba9b4762460ebc66b669c Mon Sep 17 00:00:00 2001 From: Tejas Upadhyay Date: Tue, 19 Dec 2023 12:26:13 +0530 Subject: [PATCH 024/200] drm/xe/xelpg: Extend Wa_14019877138 for Graphics 12.70/71 Wa_14019877138 is also needed for xe_lpg graphics 12.70/71 Reviewed-by: Matt Roper Signed-off-by: Tejas Upadhyay --- drivers/gpu/drm/xe/xe_wa.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c index 7d6b86d602c8a7..3456ca6f7b323f 100644 --- a/drivers/gpu/drm/xe/xe_wa.c +++ b/drivers/gpu/drm/xe/xe_wa.c @@ -536,6 +536,10 @@ static const struct xe_rtp_entry_sr lrc_was[] = { XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1270, 1271)), XE_RTP_ACTIONS(SET(CACHE_MODE_1, MSAA_OPTIMIZATION_REDUC_DISABLE)) }, + { XE_RTP_NAME("14019877138"), + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1270, 1271), ENGINE_CLASS(RENDER)), + XE_RTP_ACTIONS(SET(XEHP_PSS_CHICKEN, FD_END_COLLECT)) + }, /* Xe2_LPG */ From 0eb16fd26795639d5420b58bb12d11c7705e6dfa Mon Sep 17 00:00:00 2001 From: Daniele Ceraolo Spurio Date: Wed, 6 Dec 2023 13:14:41 -0800 Subject: [PATCH 025/200] drm/xe/guc: Use FAST_REQUEST for non-blocking H2G messages We're currently sending non-blocking H2G messages using the EVENT type, which suppresses all CTB protocol replies from the GuC, including the failure cases. This might cause errors to slip through and manifest as unexpected behavior (e.g. a context state might not be what the driver thinks it is because the state change command was silently rejected by the GuC). To avoid this kind of problems, we can use the FAST_REQUEST type instead, which suppresses the reply only on success; this way we still get the advantage of not having to wait for an ack from the GuC (i.e. the H2G is still non-blocking) while still detecting errors. Since we can't escalate to the caller when a non-blocking message fails, we need to escalate to GT reset instead. Note that FAST_REQUEST failures are NOT expected and are usually a sign that the H2G was either malformed or requested an illegal operation. v2: assign fence values to FAST_REQUEST messages, fix abi doc, use xe_gt printers (Michal). 
v3: fix doc alignment, fix and improve prints (Michal) Signed-off-by: Daniele Ceraolo Spurio Cc: John Harrison Cc: Matthew Brost Cc: Michal Wajdeczko Reviewed-by: Matthew Brost #v2 Reviewed-by: Michal Wajdeczko --- drivers/gpu/drm/xe/abi/guc_messages_abi.h | 2 + drivers/gpu/drm/xe/xe_guc_ct.c | 58 ++++++++++++++++++++--- 2 files changed, 54 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/xe/abi/guc_messages_abi.h b/drivers/gpu/drm/xe/abi/guc_messages_abi.h index 3d199016cf881c..ff888d16bd4f1b 100644 --- a/drivers/gpu/drm/xe/abi/guc_messages_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_messages_abi.h @@ -24,6 +24,7 @@ * | | 30:28 | **TYPE** - message type | * | | | - _`GUC_HXG_TYPE_REQUEST` = 0 | * | | | - _`GUC_HXG_TYPE_EVENT` = 1 | + * | | | - _`GUC_HXG_TYPE_FAST_REQUEST` = 2 | * | | | - _`GUC_HXG_TYPE_NO_RESPONSE_BUSY` = 3 | * | | | - _`GUC_HXG_TYPE_NO_RESPONSE_RETRY` = 5 | * | | | - _`GUC_HXG_TYPE_RESPONSE_FAILURE` = 6 | @@ -46,6 +47,7 @@ #define GUC_HXG_MSG_0_TYPE (0x7 << 28) #define GUC_HXG_TYPE_REQUEST 0u #define GUC_HXG_TYPE_EVENT 1u +#define GUC_HXG_TYPE_FAST_REQUEST 2u #define GUC_HXG_TYPE_NO_RESPONSE_BUSY 3u #define GUC_HXG_TYPE_NO_RESPONSE_RETRY 5u #define GUC_HXG_TYPE_RESPONSE_FAILURE 6u diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index 24a33fa36496a4..4cde93c18a2d41 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -17,6 +17,7 @@ #include "xe_device.h" #include "xe_gt.h" #include "xe_gt_pagefault.h" +#include "xe_gt_printk.h" #include "xe_gt_tlb_invalidation.h" #include "xe_guc.h" #include "xe_guc_submit.h" @@ -448,7 +449,7 @@ static int h2g_write(struct xe_guc_ct *ct, const u32 *action, u32 len, GUC_HXG_EVENT_MSG_0_DATA0, action[0]); } else { cmd[1] = - FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_FAST_REQUEST) | FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION | GUC_HXG_EVENT_MSG_0_DATA0, action[0]); } @@ -475,11 +476,31 @@ static int h2g_write(struct xe_guc_ct *ct, const u32 *action, u32 len, return 0; } +/* + * The CT protocol accepts a 16 bits fence. This field is fully owned by the + * driver, the GuC will just copy it to the reply message. Since we need to + * be able to distinguish between replies to REQUEST and FAST_REQUEST messages, + * we use one bit of the seqno as an indicator for that and a rolling counter + * for the remaining 15 bits. 
+ */ +#define CT_SEQNO_MASK GENMASK(14, 0) +#define CT_SEQNO_UNTRACKED BIT(15) +static u16 next_ct_seqno(struct xe_guc_ct *ct, bool is_g2h_fence) +{ + u32 seqno = ct->fence_seqno++ & CT_SEQNO_MASK; + + if (!is_g2h_fence) + seqno |= CT_SEQNO_UNTRACKED; + + return seqno; +} + static int __guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, u32 len, u32 g2h_len, u32 num_g2h, struct g2h_fence *g2h_fence) { struct xe_device *xe = ct_to_xe(ct); + u16 seqno; int ret; xe_assert(xe, !g2h_len || !g2h_fence); @@ -505,7 +526,7 @@ static int __guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, if (g2h_fence_needs_alloc(g2h_fence)) { void *ptr; - g2h_fence->seqno = (ct->fence_seqno++ & 0xffff); + g2h_fence->seqno = next_ct_seqno(ct, true); ptr = xa_store(&ct->fence_lookup, g2h_fence->seqno, g2h_fence, GFP_ATOMIC); @@ -514,6 +535,10 @@ static int __guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, goto out; } } + + seqno = g2h_fence->seqno; + } else { + seqno = next_ct_seqno(ct, false); } if (g2h_len) @@ -523,8 +548,7 @@ static int __guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, if (unlikely(ret)) goto out_unlock; - ret = h2g_write(ct, action, len, g2h_fence ? g2h_fence->seqno : 0, - !!g2h_fence); + ret = h2g_write(ct, action, len, seqno, !!g2h_fence); if (unlikely(ret)) { if (ret == -EAGAIN) goto retry; @@ -786,7 +810,8 @@ static int parse_g2h_event(struct xe_guc_ct *ct, u32 *msg, u32 len) static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len) { - struct xe_device *xe = ct_to_xe(ct); + struct xe_gt *gt = ct_to_gt(ct); + struct xe_device *xe = gt_to_xe(gt); u32 response_len = len - GUC_CTB_MSG_MIN_LEN; u32 fence = FIELD_GET(GUC_CTB_MSG_0_FENCE, msg[0]); u32 type = FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[1]); @@ -794,10 +819,31 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len) lockdep_assert_held(&ct->lock); + /* + * Fences for FAST_REQUEST messages are not tracked in ct->fence_lookup. + * Those messages should never fail, so if we do get an error back it + * means we're likely doing an illegal operation and the GuC is + * rejecting it. We have no way to inform the code that submitted the + * H2G that the message was rejected, so we need to escalate the + * failure to trigger a reset. + */ + if (fence & CT_SEQNO_UNTRACKED) { + if (type == GUC_HXG_TYPE_RESPONSE_FAILURE) + xe_gt_err(gt, "FAST_REQ H2G fence 0x%x failed! e=0x%x, h=%u\n", + fence, + FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, msg[1]), + FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, msg[1])); + else + xe_gt_err(gt, "unexpected response %u for FAST_REQ H2G fence 0x%x!\n", + type, fence); + + return -EPROTO; + } + g2h_fence = xa_erase(&ct->fence_lookup, fence); if (unlikely(!g2h_fence)) { /* Don't tear down channel, as send could've timed out */ - drm_warn(&xe->drm, "G2H fence (%u) not found!\n", fence); + xe_gt_warn(gt, "G2H fence (%u) not found!\n", fence); g2h_release_space(ct, GUC_CTB_HXG_MSG_MAX_LEN); return 0; } From 0e209fa7bf66e8a5b8a9efdc4d4926dcb441af18 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Thu, 21 Dec 2023 14:28:04 -0800 Subject: [PATCH 026/200] drm/xe: Disable 32bits build Add a dependency on CONFIG_64BIT since currently the xe driver doesn't build on 32bits. It may be enabled again after all the issues are fixed. 
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reviewed-by: Rodrigo Vivi Link: https://lore.kernel.org/r/20231221222809.4123220-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi --- drivers/gpu/drm/xe/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig index 5b3da06e7ba304..a53b0fdc15a746 100644 --- a/drivers/gpu/drm/xe/Kconfig +++ b/drivers/gpu/drm/xe/Kconfig @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only config DRM_XE tristate "Intel Xe Graphics" - depends on DRM && PCI && MMU && (m || (y && KUNIT=y)) + depends on DRM && PCI && MMU && (m || (y && KUNIT=y)) && 64BIT select INTERVAL_TREE # we need shmfs for the swappable backing store, and in particular # the shmem_readpage() which depends upon tmpfs From f031c3a7af8ea06790dd0a71872c4f0175084baa Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Roberto=20de=20Souza?= Date: Tue, 26 Dec 2023 08:11:14 -0800 Subject: [PATCH 027/200] drm/xe/uapi: Remove DRM_XE_VM_BIND_FLAG_ASYNC comment left over MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This is a comment left over of commit d3d767396a02 ("drm/xe/uapi: Remove sync binds"). Fixes: d3d767396a02 ("drm/xe/uapi: Remove sync binds") Reviewed-by: Rodrigo Vivi Cc: Matthew Brost Signed-off-by: José Roberto de Souza --- include/uapi/drm/xe_drm.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h index 9fa3ae324731a6..50bbea0992d9c3 100644 --- a/include/uapi/drm/xe_drm.h +++ b/include/uapi/drm/xe_drm.h @@ -832,7 +832,6 @@ struct drm_xe_vm_destroy { * * and the @flags can be: * - %DRM_XE_VM_BIND_FLAG_READONLY - * - %DRM_XE_VM_BIND_FLAG_ASYNC * - %DRM_XE_VM_BIND_FLAG_IMMEDIATE - Valid on a faulting VM only, do the * MAP operation immediately rather than deferring the MAP to the page * fault handler. From 570a8fc233b2adb659015bfb09f90a46a6b594d4 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Thu, 7 Dec 2023 09:51:17 -0800 Subject: [PATCH 028/200] drm/xe/xe2: Add workaround 16020183090 Graphics version 20.04, used in Lunar Lake, needs WA 16020183090 for steppings A*. Set ENABLE_SEMAPHORE_POLL_BIT in INSTPM(RENDER_RING_BASE) and whitelist CSBE_DEBUG_STATUS for userspace to be able to use it and complement the workaround. Cc: Haridhar Kalvala Cc: Matt Roper Reviewed-by: Matt Roper Link: https://lore.kernel.org/r/20231207175117.2334022-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi --- drivers/gpu/drm/xe/regs/xe_engine_regs.h | 4 ++++ drivers/gpu/drm/xe/xe_reg_whitelist.c | 8 ++++++++ drivers/gpu/drm/xe/xe_wa.c | 5 +++++ 3 files changed, 17 insertions(+) diff --git a/drivers/gpu/drm/xe/regs/xe_engine_regs.h b/drivers/gpu/drm/xe/regs/xe_engine_regs.h index bd9c956f48a7c1..0b1266c88a6af3 100644 --- a/drivers/gpu/drm/xe/regs/xe_engine_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_engine_regs.h @@ -83,6 +83,9 @@ #define RING_EMR(base) XE_REG((base) + 0xb4) #define RING_ESR(base) XE_REG((base) + 0xb8) +#define INSTPM(base) XE_REG((base) + 0xc0, XE_REG_OPTION_MASKED) +#define ENABLE_SEMAPHORE_POLL_BIT REG_BIT(13) + #define RING_CMD_CCTL(base) XE_REG((base) + 0xc4, XE_REG_OPTION_MASKED) /* * CMD_CCTL read/write fields take a MOCS value and _not_ a table index. 
@@ -138,6 +141,7 @@ #define TAIL_ADDR 0x001FFFF8 #define RING_CTX_TIMESTAMP(base) XE_REG((base) + 0x3a8) +#define CSBE_DEBUG_STATUS(base) XE_REG((base) + 0x3fc) #define RING_FORCE_TO_NONPRIV(base, i) XE_REG(((base) + 0x4d0) + (i) * 4) #define RING_FORCE_TO_NONPRIV_DENY REG_BIT(30) diff --git a/drivers/gpu/drm/xe/xe_reg_whitelist.c b/drivers/gpu/drm/xe/xe_reg_whitelist.c index e66ae1bdaf9c04..3fa2ece7d2289c 100644 --- a/drivers/gpu/drm/xe/xe_reg_whitelist.c +++ b/drivers/gpu/drm/xe/xe_reg_whitelist.c @@ -7,9 +7,11 @@ #include "regs/xe_engine_regs.h" #include "regs/xe_gt_regs.h" +#include "regs/xe_regs.h" #include "xe_gt_types.h" #include "xe_platform_types.h" #include "xe_rtp.h" +#include "xe_step.h" #undef XE_REG_MCR #define XE_REG_MCR(...) XE_REG(__VA_ARGS__, .mcr = 1) @@ -56,6 +58,12 @@ static const struct xe_rtp_entry_sr register_whitelist[] = { RING_FORCE_TO_NONPRIV_DENY, XE_RTP_ACTION_FLAG(ENGINE_BASE))) }, + { XE_RTP_NAME("16020183090"), + XE_RTP_RULES(GRAPHICS_VERSION(2004), GRAPHICS_STEP(A0, B0), + ENGINE_CLASS(RENDER)), + XE_RTP_ACTIONS(WHITELIST(CSBE_DEBUG_STATUS(RENDER_RING_BASE), 0)) + }, + {} }; diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c index 3456ca6f7b323f..b77d406e083e2d 100644 --- a/drivers/gpu/drm/xe/xe_wa.c +++ b/drivers/gpu/drm/xe/xe_wa.c @@ -571,6 +571,11 @@ static const struct xe_rtp_entry_sr lrc_was[] = { XE_RTP_RULES(GRAPHICS_VERSION(2004), ENGINE_CLASS(RENDER)), XE_RTP_ACTIONS(SET(XEHP_PSS_CHICKEN, FLSH_IGNORES_PSD)) }, + { XE_RTP_NAME("16020183090"), + XE_RTP_RULES(GRAPHICS_VERSION(2004), GRAPHICS_STEP(A0, B0), + ENGINE_CLASS(RENDER)), + XE_RTP_ACTIONS(SET(INSTPM(RENDER_RING_BASE), ENABLE_SEMAPHORE_POLL_BIT)) + }, {} }; From fe761f3465c00f372cb4acb004579c7c377ae08b Mon Sep 17 00:00:00 2001 From: Jani Nikula Date: Thu, 4 Jan 2024 18:46:00 +0200 Subject: [PATCH 029/200] drm/i915: don't make assumptions about intel_wakeref_t type intel_wakeref_t is supposed to be a mostly opaque cookie to its users. It should only be checked for being non-zero and set to zero. Debug logging its actual value is meaningless. Switch to just debug logging whether the async_put_wakeref is non-zero. The issue dates back to much earlier than commit b49e894c3fd8 ("drm/i915: Replace custom intel runtime_pm tracker with ref_tracker library"), but this is the one that brought about a build failure due to the printf format. 
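For reference, the intended usage keeps the cookie opaque end to end (a sketch; rpm stands for the device's runtime PM structure):

    intel_wakeref_t wakeref;

    wakeref = intel_runtime_pm_get_if_in_use(rpm);
    if (!wakeref)
            return; /* only zero vs non-zero is meaningful */

    /* ... work that needs the device awake ... */

    intel_runtime_pm_put(rpm, wakeref); /* hand the cookie back unmodified */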
Reported-by: Stephen Rothwell Closes: https://lore.kernel.org/r/20240102111222.2db11208@canb.auug.org.au Fixes: b49e894c3fd8 ("drm/i915: Replace custom intel runtime_pm tracker with ref_tracker library") Cc: Andrzej Hajda Cc: Imre Deak Signed-off-by: Jani Nikula Reviewed-by: Imre Deak Reviewed-by: Andrzej Hajda Reviewed-by: Andi Shyti Link: https://patchwork.freedesktop.org/patch/msgid/20240104164600.783371-1-jani.nikula@intel.com --- drivers/gpu/drm/i915/display/intel_display_power.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_display_power.c b/drivers/gpu/drm/i915/display/intel_display_power.c index f23080a4368dd0..6fd4fa52253a35 100644 --- a/drivers/gpu/drm/i915/display/intel_display_power.c +++ b/drivers/gpu/drm/i915/display/intel_display_power.c @@ -405,8 +405,8 @@ print_async_put_domains_state(struct i915_power_domains *power_domains) struct drm_i915_private, display.power.domains); - drm_dbg(&i915->drm, "async_put_wakeref %u\n", - power_domains->async_put_wakeref); + drm_dbg(&i915->drm, "async_put_wakeref: %s\n", + str_yes_no(power_domains->async_put_wakeref)); print_power_domains(power_domains, "async_put_domains[0]", &power_domains->async_put_domains[0]); From fdbadf504375886a0320ac6f84c850322a6b32e1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Roberto=20de=20Souza?= Date: Thu, 4 Jan 2024 08:18:32 -0800 Subject: [PATCH 030/200] drm/xe: Fix definition of intel_wakeref_t MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit i915 defines it as unsigned long so Xe should do the same to avoid compilation warnings: CC [M] drivers/gpu/drm/i915/i915_gem.o CC [M] drivers/gpu/drm/xe/i915-display/intel_display_power_well.o In file included from ./include/drm/drm_mm.h:51, from drivers/gpu/drm/xe/xe_bo_types.h:11, from drivers/gpu/drm/xe/xe_bo.h:11, from ./drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h:11, from ./drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h:15, from drivers/gpu/drm/i915/display/intel_display_power.c:8: drivers/gpu/drm/i915/display/intel_display_power.c: In function ‘print_async_put_domains_state’: drivers/gpu/drm/i915/display/intel_display_power.c:408:29: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘int’ [-Wformat=] 408 | drm_dbg(&i915->drm, "async_put_wakeref %lu\n", | ^~~~~~~~~~~~~~~~~~~~~~~~~ 409 | power_domains->async_put_wakeref); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | | | int ./include/drm/drm_print.h:410:39: note: in definition of macro ‘drm_dev_dbg’ 410 | __drm_dev_dbg(NULL, dev, cat, fmt, ##__VA_ARGS__) | ^~~ ./include/drm/drm_print.h:510:33: note: in expansion of macro ‘drm_dbg_driver’ 510 | #define drm_dbg(drm, fmt, ...) 
drm_dbg_driver(drm, fmt, ##__VA_ARGS__) | ^~~~~~~~~~~~~~ drivers/gpu/drm/i915/display/intel_display_power.c:408:9: note: in expansion of macro ‘drm_dbg’ 408 | drm_dbg(&i915->drm, "async_put_wakeref %lu\n", | ^~~~~~~ drivers/gpu/drm/i915/display/intel_display_power.c:408:50: note: format string is defined here 408 | drm_dbg(&i915->drm, "async_put_wakeref %lu\n", | ~~^ | | | long unsigned int | %u CC [M] drivers/gpu/drm/i915/i915_gem_evict.o CC [M] drivers/gpu/drm/i915/i915_gem_gtt.o CC [M] drivers/gpu/drm/xe/i915-display/intel_display_trace.o CC [M] drivers/gpu/drm/xe/i915-display/intel_display_wa.o CC [M] drivers/gpu/drm/i915/i915_query.o Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Cc: Maarten Lankhorst Reviewed-by: Jani Nikula Signed-off-by: José Roberto de Souza --- drivers/gpu/drm/xe/compat-i915-headers/intel_wakeref.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/compat-i915-headers/intel_wakeref.h b/drivers/gpu/drm/xe/compat-i915-headers/intel_wakeref.h index 1c5e30cf10caa8..ecb1c070770697 100644 --- a/drivers/gpu/drm/xe/compat-i915-headers/intel_wakeref.h +++ b/drivers/gpu/drm/xe/compat-i915-headers/intel_wakeref.h @@ -5,4 +5,4 @@ #include -typedef bool intel_wakeref_t; +typedef unsigned long intel_wakeref_t; From 9bab383d47c934ff550f31b3e05b4509fa6136bd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Roberto=20de=20Souza?= Date: Thu, 4 Jan 2024 08:18:41 -0800 Subject: [PATCH 031/200] drm/xe: Use intel_wakeref_t in intel_runtime_pm functions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Now intel_wakeref_t is a unsigned long and Xe KMD version of those functions should use the same type, so changing from bool to intel_wakeref_t. Cc: Maarten Lankhorst Reviewed-by: Jani Nikula Signed-off-by: José Roberto de Souza --- drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h b/drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h index 5d2a77b52db415..420eba0e4be00b 100644 --- a/drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h +++ b/drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h @@ -162,18 +162,18 @@ static inline struct drm_i915_private *kdev_to_i915(struct device *kdev) #include "intel_wakeref.h" -static inline bool intel_runtime_pm_get(struct xe_runtime_pm *pm) +static inline intel_wakeref_t intel_runtime_pm_get(struct xe_runtime_pm *pm) { struct xe_device *xe = container_of(pm, struct xe_device, runtime_pm); if (xe_pm_runtime_get(xe) < 0) { xe_pm_runtime_put(xe); - return false; + return 0; } - return true; + return 1; } -static inline bool intel_runtime_pm_get_if_in_use(struct xe_runtime_pm *pm) +static inline intel_wakeref_t intel_runtime_pm_get_if_in_use(struct xe_runtime_pm *pm) { struct xe_device *xe = container_of(pm, struct xe_device, runtime_pm); @@ -187,7 +187,7 @@ static inline void intel_runtime_pm_put_unchecked(struct xe_runtime_pm *pm) xe_pm_runtime_put(xe); } -static inline void intel_runtime_pm_put(struct xe_runtime_pm *pm, bool wakeref) +static inline void intel_runtime_pm_put(struct xe_runtime_pm *pm, intel_wakeref_t wakeref) { if (wakeref) intel_runtime_pm_put_unchecked(pm); From 0cfb7caefabd740a13ae0c26d092641a5ac7e785 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Thu, 4 Jan 2024 23:20:22 +0100 Subject: [PATCH 032/200] drm/xe: Allocate dedicated workqueue for SR-IOV workers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 
Content-Transfer-Encoding: 8bit We plan to use several workers where we might be running long operations. Allocate dedicated workqueue to avoid undesired interaction with non-virtualized workers. Reviewed-by: Piotr Piórkowski Link: https://lore.kernel.org/r/20240104222031.277-2-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_device.c | 4 ++++ drivers/gpu/drm/xe/xe_device_types.h | 2 ++ drivers/gpu/drm/xe/xe_sriov.c | 32 ++++++++++++++++++++++++++++ drivers/gpu/drm/xe/xe_sriov.h | 1 + 4 files changed, 39 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index 86867d42d53298..004e65544e8d82 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -438,6 +438,10 @@ int xe_device_probe(struct xe_device *xe) xe_pat_init_early(xe); + err = xe_sriov_init(xe); + if (err) + return err; + xe->info.mem_region_mask = 1; err = xe_display_init_nommio(xe); if (err) diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h index 71f23ac365e669..163d889d407bc8 100644 --- a/drivers/gpu/drm/xe/xe_device_types.h +++ b/drivers/gpu/drm/xe/xe_device_types.h @@ -321,6 +321,8 @@ struct xe_device { struct { /** @sriov.__mode: SR-IOV mode (Don't access directly!) */ enum xe_sriov_mode __mode; + /** @sriov.wq: workqueue used by the virtualization workers */ + struct workqueue_struct *wq; } sriov; /** @clients: drm clients info */ diff --git a/drivers/gpu/drm/xe/xe_sriov.c b/drivers/gpu/drm/xe/xe_sriov.c index 42a0e0c917a091..f295d91886b129 100644 --- a/drivers/gpu/drm/xe/xe_sriov.c +++ b/drivers/gpu/drm/xe/xe_sriov.c @@ -3,6 +3,8 @@ * Copyright © 2023 Intel Corporation */ +#include + #include "xe_assert.h" #include "xe_sriov.h" @@ -53,3 +55,33 @@ void xe_sriov_probe_early(struct xe_device *xe, bool has_sriov) drm_info(&xe->drm, "Running in %s mode\n", xe_sriov_mode_to_string(xe_device_sriov_mode(xe))); } + +static void fini_sriov(struct drm_device *drm, void *arg) +{ + struct xe_device *xe = arg; + + destroy_workqueue(xe->sriov.wq); + xe->sriov.wq = NULL; +} + +/** + * xe_sriov_init - Initialize SR-IOV specific data. + * @xe: the &xe_device to initialize + * + * In this function we create dedicated workqueue that will be used + * by the SR-IOV specific workers. + * + * Return: 0 on success or a negative error code on failure. + */ +int xe_sriov_init(struct xe_device *xe) +{ + if (!IS_SRIOV(xe)) + return 0; + + xe_assert(xe, !xe->sriov.wq); + xe->sriov.wq = alloc_workqueue("xe-sriov-wq", 0, 0); + if (!xe->sriov.wq) + return -ENOMEM; + + return drmm_add_action_or_reset(&xe->drm, fini_sriov, xe); +} diff --git a/drivers/gpu/drm/xe/xe_sriov.h b/drivers/gpu/drm/xe/xe_sriov.h index 5af73a3172b03c..1545552162c9e8 100644 --- a/drivers/gpu/drm/xe/xe_sriov.h +++ b/drivers/gpu/drm/xe/xe_sriov.h @@ -13,6 +13,7 @@ const char *xe_sriov_mode_to_string(enum xe_sriov_mode mode); void xe_sriov_probe_early(struct xe_device *xe, bool has_sriov); +int xe_sriov_init(struct xe_device *xe); static inline enum xe_sriov_mode xe_device_sriov_mode(struct xe_device *xe) { From b97d87039fe5a2fc91feb9f42c5b443ad0927864 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Thu, 4 Jan 2024 23:20:23 +0100 Subject: [PATCH 033/200] drm/xe: Define Virtual Function Identifier MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit According to the PCI Express specification, the SR-IOV Virtual Functions (VFs) are numbered starting with 1 (VF1, VF2, ...). 
Additionally, both the driver and the GuC will refer to the Physical Function (PF) as VF0. Define a helper macro to represent VFn and the PF. Reviewed-by: Piotr Piórkowski Link: https://lore.kernel.org/r/20240104222031.277-3-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_sriov_types.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_sriov_types.h b/drivers/gpu/drm/xe/xe_sriov_types.h index 999a4311b98bf2..1a138108d1395d 100644 --- a/drivers/gpu/drm/xe/xe_sriov_types.h +++ b/drivers/gpu/drm/xe/xe_sriov_types.h @@ -8,6 +8,18 @@ #include +/** + * VFID - Virtual Function Identifier + * @n: VF number + * + * Helper macro to represent Virtual Function (VF) Identifier. + * VFID(0) is used as an alias for PFID, which represents the Physical Function. + * + * Note: According to the PCI spec, SR-IOV VF numbers are 1-based (VF1, VF2, ...). + */ +#define VFID(n) (n) +#define PFID VFID(0) + /** * enum xe_sriov_mode - SR-IOV mode * @XE_SRIOV_MODE_NONE: bare-metal mode (non-virtualized) From 13f976ea62208d64d2f324bce27f79c574394caf Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Thu, 4 Jan 2024 23:20:24 +0100 Subject: [PATCH 034/200] drm/xe: Introduce GT-oriented SR-IOV logging macros MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit To simplify logging and help identify SR-IOV specific messages related to the GT, define a set of helper macros that add a prefix to the messages based on the current SR-IOV mode. Reviewed-by: Piotr Piórkowski Link: https://lore.kernel.org/r/20240104222031.277-4-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_gt_sriov_printk.h | 34 +++++++++++++++++++++++++ 1 file changed, 34 insertions(+) create mode 100644 drivers/gpu/drm/xe/xe_gt_sriov_printk.h diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_printk.h b/drivers/gpu/drm/xe/xe_gt_sriov_printk.h new file mode 100644 index 00000000000000..17624b16300ad7 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gt_sriov_printk.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef _XE_GT_SRIOV_PRINTK_H_ +#define _XE_GT_SRIOV_PRINTK_H_ + +#include "xe_gt_printk.h" +#include "xe_sriov_printk.h" + +#define __xe_gt_sriov_printk(gt, _level, fmt, ...) \ + xe_gt_printk((gt), _level, "%s" fmt, xe_sriov_printk_prefix(gt_to_xe(gt)), ##__VA_ARGS__) + +#define xe_gt_sriov_err(_gt, _fmt, ...) \ + __xe_gt_sriov_printk(_gt, err, _fmt, ##__VA_ARGS__) + +#define xe_gt_sriov_notice(_gt, _fmt, ...) \ + __xe_gt_sriov_printk(_gt, notice, _fmt, ##__VA_ARGS__) + +#define xe_gt_sriov_info(_gt, _fmt, ...) \ + __xe_gt_sriov_printk(_gt, info, _fmt, ##__VA_ARGS__) + +#define xe_gt_sriov_dbg(_gt, _fmt, ...) \ + __xe_gt_sriov_printk(_gt, dbg, _fmt, ##__VA_ARGS__) + +/* for low level noisy debug messages */ +#ifdef CONFIG_DRM_XE_DEBUG_SRIOV +#define xe_gt_sriov_dbg_verbose(_gt, _fmt, ...) xe_gt_sriov_dbg(_gt, _fmt, ##__VA_ARGS__) +#else +#define xe_gt_sriov_dbg_verbose(_gt, _fmt, ...) typecheck(struct xe_gt *, (_gt)) +#endif + +#endif From e6cbc458b4f875ce35610af635911d6926804c4c Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Thu, 4 Jan 2024 23:20:25 +0100 Subject: [PATCH 035/200] drm/xe/guc: Add helpers for HXG messages MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In addition to MMIO and CTB communication between the host driver and the GuC firmware, we will start using the GuC HXG message protocol in communication between SR-IOV VFs and the PF.
Define helpers related to HXG message protocol to minimize code duplication. Reviewed-by: Piotr Piórkowski Link: https://lore.kernel.org/r/20240104222031.277-5-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_guc_hxg_helpers.h | 108 ++++++++++++++++++++++++ 1 file changed, 108 insertions(+) create mode 100644 drivers/gpu/drm/xe/xe_guc_hxg_helpers.h diff --git a/drivers/gpu/drm/xe/xe_guc_hxg_helpers.h b/drivers/gpu/drm/xe/xe_guc_hxg_helpers.h new file mode 100644 index 00000000000000..aeeb573c6842e5 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_guc_hxg_helpers.h @@ -0,0 +1,108 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef _XE_GUC_HXG_HELPERS_H_ +#define _XE_GUC_HXG_HELPERS_H_ + +#include +#include + +#include "abi/guc_messages_abi.h" + +/** + * hxg_sizeof - Queries size of the object or type (in HXG units). + * @T: the object or type + * + * Force a compilation error if actual size is not aligned to HXG unit (u32). + * + * Return: size in dwords (u32). + */ +#define hxg_sizeof(T) (sizeof(T) / sizeof(u32) + BUILD_BUG_ON_ZERO(sizeof(T) % sizeof(u32))) + +static inline const char *guc_hxg_type_to_string(unsigned int type) +{ + switch (type) { + case GUC_HXG_TYPE_REQUEST: + return "request"; + case GUC_HXG_TYPE_FAST_REQUEST: + return "fast-request"; + case GUC_HXG_TYPE_EVENT: + return "event"; + case GUC_HXG_TYPE_NO_RESPONSE_BUSY: + return "busy"; + case GUC_HXG_TYPE_NO_RESPONSE_RETRY: + return "retry"; + case GUC_HXG_TYPE_RESPONSE_FAILURE: + return "failure"; + case GUC_HXG_TYPE_RESPONSE_SUCCESS: + return "response"; + default: + return ""; + } +} + +static inline bool guc_hxg_type_is_action(unsigned int type) +{ + switch (type) { + case GUC_HXG_TYPE_REQUEST: + case GUC_HXG_TYPE_FAST_REQUEST: + case GUC_HXG_TYPE_EVENT: + return true; + default: + return false; + } +} + +static inline bool guc_hxg_type_is_reply(unsigned int type) +{ + switch (type) { + case GUC_HXG_TYPE_NO_RESPONSE_BUSY: + case GUC_HXG_TYPE_NO_RESPONSE_RETRY: + case GUC_HXG_TYPE_RESPONSE_FAILURE: + case GUC_HXG_TYPE_RESPONSE_SUCCESS: + return true; + default: + return false; + } +} + +static inline u32 guc_hxg_msg_encode_success(u32 *msg, u32 data0) +{ + msg[0] = FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_RESPONSE_SUCCESS) | + FIELD_PREP(GUC_HXG_RESPONSE_MSG_0_DATA0, data0); + + return GUC_HXG_RESPONSE_MSG_MIN_LEN; +} + +static inline u32 guc_hxg_msg_encode_failure(u32 *msg, u32 error, u32 hint) +{ + msg[0] = FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_RESPONSE_FAILURE) | + FIELD_PREP(GUC_HXG_FAILURE_MSG_0_HINT, hint) | + FIELD_PREP(GUC_HXG_FAILURE_MSG_0_ERROR, error); + + return GUC_HXG_FAILURE_MSG_LEN; +} + +static inline u32 guc_hxg_msg_encode_busy(u32 *msg, u32 counter) +{ + msg[0] = FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_NO_RESPONSE_BUSY) | + FIELD_PREP(GUC_HXG_BUSY_MSG_0_COUNTER, counter); + + return GUC_HXG_BUSY_MSG_LEN; +} + +static inline u32 guc_hxg_msg_encode_retry(u32 *msg, u32 reason) +{ + msg[0] = FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_NO_RESPONSE_RETRY) | + FIELD_PREP(GUC_HXG_RETRY_MSG_0_REASON, reason); + + return GUC_HXG_RETRY_MSG_LEN; +} + +#endif From e83679985ac73cca54259abaf7d55835c150bbe4 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Thu, 4 Jan 2024 23:20:26 +0100 Subject: 
[PATCH 036/200] drm/xe/guc: Update a few GuC CTB ABI definitions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In upcoming GuC ABI definitions we will want to refer to the maximum number of dwords that can fit into a CTB HXG message. Add an explicit definition named GUC_CTB_MAX_DWORDS and start using it. Reviewed-by: Piotr Piórkowski Link: https://lore.kernel.org/r/20240104222031.277-6-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h index 3b83f907ece461..4aaed1cb4e125d 100644 --- a/drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h @@ -81,12 +81,13 @@ static_assert(sizeof(struct guc_ct_buffer_desc) == 64); #define GUC_CTB_HDR_LEN 1u #define GUC_CTB_MSG_MIN_LEN GUC_CTB_HDR_LEN -#define GUC_CTB_MSG_MAX_LEN 256u +#define GUC_CTB_MSG_MAX_LEN (GUC_CTB_MSG_MIN_LEN + GUC_CTB_MAX_DWORDS) #define GUC_CTB_MSG_0_FENCE (0xffff << 16) #define GUC_CTB_MSG_0_FORMAT (0xf << 12) #define GUC_CTB_FORMAT_HXG 0u #define GUC_CTB_MSG_0_RESERVED (0xf << 8) #define GUC_CTB_MSG_0_NUM_DWORDS (0xff << 0) +#define GUC_CTB_MAX_DWORDS 255 /** * DOC: CTB HXG Message From fa6c12e036c9450c43782d52648bf0fb915a7bbb Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Thu, 4 Jan 2024 23:20:27 +0100 Subject: [PATCH 037/200] drm/xe/guc: Add Relay Communication ABI definitions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The communication between Virtual Function (VF) drivers and Physical Function (PF) drivers is based on the GuC firmware acting as a proxy (relay) agent. Add related ABI definitions that we will be using in upcoming patches with our GuC Relay implementation. Reviewed-by: Piotr Piórkowski Link: https://lore.kernel.org/r/20240104222031.277-7-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- .../gpu/drm/xe/abi/guc_actions_sriov_abi.h | 174 ++++++++++++++++++ .../gpu/drm/xe/abi/guc_relay_actions_abi.h | 79 ++++++++ .../drm/xe/abi/guc_relay_communication_abi.h | 118 ++++++++++++ 3 files changed, 371 insertions(+) create mode 100644 drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h create mode 100644 drivers/gpu/drm/xe/abi/guc_relay_actions_abi.h create mode 100644 drivers/gpu/drm/xe/abi/guc_relay_communication_abi.h diff --git a/drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h new file mode 100644 index 00000000000000..5496a5890847d9 --- /dev/null +++ b/drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h @@ -0,0 +1,174 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef _GUC_ACTIONS_PF_ABI_H +#define _GUC_ACTIONS_PF_ABI_H + +#include "guc_communication_ctb_abi.h" + +/** + * DOC: GUC2PF_RELAY_FROM_VF + * + * This message is used by the GuC firmware to forward a VF2PF `Relay Message`_ + * received from the Virtual Function (VF) driver to this Physical Function (PF) + * driver. + * + * This message is always sent as `CTB HXG Message`_.
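+ * + * As an illustrative sketch only (matching what this driver's + * xe_guc_relay_process_guc2pf(), added later in this series, does), a PF + * handler can recover the routing fields defined below with:: + * + * vfid = FIELD_GET(GUC2PF_RELAY_FROM_VF_EVENT_MSG_1_VFID, msg[1]); + * rid = FIELD_GET(GUC2PF_RELAY_FROM_VF_EVENT_MSG_2_RELAY_ID, msg[2]);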
+ * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_GUC_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_EVENT_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:16 | MBZ | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | ACTION = _`XE_GUC_ACTION_GUC2PF_RELAY_FROM_VF` = 0x5100 | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:0 | **VFID** - source VF identifier | + * +---+-------+--------------------------------------------------------------+ + * | 2 | 31:0 | **RELAY_ID** - VF/PF message ID | + * +---+-------+-----------------+--------------------------------------------+ + * | 3 | 31:0 | **RELAY_DATA1** | | + * +---+-------+-----------------+ | + * |...| | | [Embedded `Relay Message`_] | + * +---+-------+-----------------+ | + * | n | 31:0 | **RELAY_DATAx** | | + * +---+-------+-----------------+--------------------------------------------+ + */ +#define XE_GUC_ACTION_GUC2PF_RELAY_FROM_VF 0x5100 + +#define GUC2PF_RELAY_FROM_VF_EVENT_MSG_MIN_LEN (GUC_HXG_EVENT_MSG_MIN_LEN + 2u) +#define GUC2PF_RELAY_FROM_VF_EVENT_MSG_MAX_LEN \ + (GUC2PF_RELAY_FROM_VF_EVENT_MSG_MIN_LEN + GUC_RELAY_MSG_MAX_LEN) +#define GUC2PF_RELAY_FROM_VF_EVENT_MSG_0_MBZ GUC_HXG_EVENT_MSG_0_DATA0 +#define GUC2PF_RELAY_FROM_VF_EVENT_MSG_1_VFID GUC_HXG_EVENT_MSG_n_DATAn +#define GUC2PF_RELAY_FROM_VF_EVENT_MSG_2_RELAY_ID GUC_HXG_EVENT_MSG_n_DATAn +#define GUC2PF_RELAY_FROM_VF_EVENT_MSG_3_RELAY_DATA1 GUC_HXG_EVENT_MSG_n_DATAn +#define GUC2PF_RELAY_FROM_VF_EVENT_MSG_n_RELAY_DATAx GUC_HXG_EVENT_MSG_n_DATAn +#define GUC2PF_RELAY_FROM_VF_EVENT_MSG_NUM_RELAY_DATA GUC_RELAY_MSG_MAX_LEN + +/** + * DOC: PF2GUC_RELAY_TO_VF + * + * This H2G message is used by the Physical Function (PF) driver to send embedded + * VF2PF `Relay Message`_ to the VF. + * + * This action message must be sent over CTB as `CTB HXG Message`_. 
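+ * + * For illustration only, a minimal sketch of composing this header, mirroring + * prepare_pf2guc() added later in this series (vfid and rid are caller-chosen + * values):: + * + * msg[0] = FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + * FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + * FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, XE_GUC_ACTION_PF2GUC_RELAY_TO_VF); + * msg[1] = FIELD_PREP(PF2GUC_RELAY_TO_VF_REQUEST_MSG_1_VFID, vfid); + * msg[2] = FIELD_PREP(PF2GUC_RELAY_TO_VF_REQUEST_MSG_2_RELAY_ID, rid);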
+ * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = `GUC_HXG_TYPE_FAST_REQUEST`_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:16 | MBZ | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | ACTION = _`XE_GUC_ACTION_PF2GUC_RELAY_TO_VF` = 0x5101 | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:0 | **VFID** - target VF identifier | + * +---+-------+--------------------------------------------------------------+ + * | 2 | 31:0 | **RELAY_ID** - VF/PF message ID | + * +---+-------+-----------------+--------------------------------------------+ + * | 3 | 31:0 | **RELAY_DATA1** | | + * +---+-------+-----------------+ | + * |...| | | [Embedded `Relay Message`_] | + * +---+-------+-----------------+ | + * | n | 31:0 | **RELAY_DATAx** | | + * +---+-------+-----------------+--------------------------------------------+ + */ +#define XE_GUC_ACTION_PF2GUC_RELAY_TO_VF 0x5101 + +#define PF2GUC_RELAY_TO_VF_REQUEST_MSG_MIN_LEN (GUC_HXG_REQUEST_MSG_MIN_LEN + 2u) +#define PF2GUC_RELAY_TO_VF_REQUEST_MSG_MAX_LEN \ + (PF2GUC_RELAY_TO_VF_REQUEST_MSG_MIN_LEN + GUC_RELAY_MSG_MAX_LEN) +#define PF2GUC_RELAY_TO_VF_REQUEST_MSG_0_MBZ GUC_HXG_REQUEST_MSG_0_DATA0 +#define PF2GUC_RELAY_TO_VF_REQUEST_MSG_1_VFID GUC_HXG_REQUEST_MSG_n_DATAn +#define PF2GUC_RELAY_TO_VF_REQUEST_MSG_2_RELAY_ID GUC_HXG_REQUEST_MSG_n_DATAn +#define PF2GUC_RELAY_TO_VF_REQUEST_MSG_3_RELAY_DATA1 GUC_HXG_REQUEST_MSG_n_DATAn +#define PF2GUC_RELAY_TO_VF_REQUEST_MSG_n_RELAY_DATAx GUC_HXG_REQUEST_MSG_n_DATAn +#define PF2GUC_RELAY_TO_VF_REQUEST_MSG_NUM_RELAY_DATA GUC_RELAY_MSG_MAX_LEN + +/** + * DOC: GUC2VF_RELAY_FROM_PF + * + * This message is used by the GuC firmware to deliver `Relay Message`_ from the + * Physical Function (PF) driver to this Virtual Function (VF) driver. + * See `GuC Relay Communication`_ for details. + * + * This message is always sent over CTB. 
+ * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_GUC_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_EVENT_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:16 | MBZ | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | ACTION = _`XE_GUC_ACTION_GUC2VF_RELAY_FROM_PF` = 0x5102 | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:0 | **RELAY_ID** - VF/PF message ID | + * +---+-------+-----------------+--------------------------------------------+ + * | 2 | 31:0 | **RELAY_DATA1** | | + * +---+-------+-----------------+ | + * |...| | | [Embedded `Relay Message`_] | + * +---+-------+-----------------+ | + * | n | 31:0 | **RELAY_DATAx** | | + * +---+-------+-----------------+--------------------------------------------+ + */ +#define XE_GUC_ACTION_GUC2VF_RELAY_FROM_PF 0x5102 + +#define GUC2VF_RELAY_FROM_PF_EVENT_MSG_MIN_LEN (GUC_HXG_EVENT_MSG_MIN_LEN + 1u) +#define GUC2VF_RELAY_FROM_PF_EVENT_MSG_MAX_LEN \ + (GUC2VF_RELAY_FROM_PF_EVENT_MSG_MIN_LEN + GUC_RELAY_MSG_MAX_LEN) +#define GUC2VF_RELAY_FROM_PF_EVENT_MSG_0_MBZ GUC_HXG_EVENT_MSG_0_DATA0 +#define GUC2VF_RELAY_FROM_PF_EVENT_MSG_1_RELAY_ID GUC_HXG_EVENT_MSG_n_DATAn +#define GUC2VF_RELAY_FROM_PF_EVENT_MSG_n_RELAY_DATAx GUC_HXG_EVENT_MSG_n_DATAn +#define GUC2VF_RELAY_FROM_PF_EVENT_MSG_NUM_RELAY_DATA GUC_RELAY_MSG_MAX_LEN + +/** + * DOC: VF2GUC_RELAY_TO_PF + * + * This message is used by the Virtual Function (VF) drivers to communicate with + * the Physical Function (PF) driver and send `Relay Message`_ to the PF driver. + * See `GuC Relay Communication`_ for details. + * + * This message must be sent over CTB. 
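+ * + * A minimal VF-side usage sketch: in the Xe driver this action is not built + * by hand; the relay payload goes through xe_guc_relay_send_to_pf() (added + * later in this series), which prepends this header:: + * + * ret = xe_guc_relay_send_to_pf(relay, msg, len, buf, buf_size);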
+ * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_ or GUC_HXG_TYPE_FAST_REQUEST_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:16 | MBZ | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | ACTION = _`XE_GUC_ACTION_VF2GUC_RELAY_TO_PF` = 0x5103 | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:0 | **RELAY_ID** - VF/PF message ID | + * +---+-------+-----------------+--------------------------------------------+ + * | 2 | 31:0 | **RELAY_DATA1** | | + * +---+-------+-----------------+ | + * |...| | | [Embedded `Relay Message`_] | + * +---+-------+-----------------+ | + * | n | 31:0 | **RELAY_DATAx** | | + * +---+-------+-----------------+--------------------------------------------+ + */ +#define XE_GUC_ACTION_VF2GUC_RELAY_TO_PF 0x5103 + +#define VF2GUC_RELAY_TO_PF_REQUEST_MSG_MIN_LEN (GUC_HXG_REQUEST_MSG_MIN_LEN + 1u) +#define VF2GUC_RELAY_TO_PF_REQUEST_MSG_MAX_LEN \ + (VF2GUC_RELAY_TO_PF_REQUEST_MSG_MIN_LEN + GUC_RELAY_MSG_MAX_LEN) +#define VF2GUC_RELAY_TO_PF_REQUEST_MSG_0_MBZ GUC_HXG_REQUEST_MSG_0_DATA0 +#define VF2GUC_RELAY_TO_PF_REQUEST_MSG_1_RELAY_ID GUC_HXG_REQUEST_MSG_n_DATAn +#define VF2GUC_RELAY_TO_PF_REQUEST_MSG_n_RELAY_DATAx GUC_HXG_REQUEST_MSG_n_DATAn +#define VF2GUC_RELAY_TO_PF_REQUEST_MSG_NUM_RELAY_DATA GUC_RELAY_MSG_MAX_LEN + +#endif diff --git a/drivers/gpu/drm/xe/abi/guc_relay_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_relay_actions_abi.h new file mode 100644 index 00000000000000..747e428de421f8 --- /dev/null +++ b/drivers/gpu/drm/xe/abi/guc_relay_actions_abi.h @@ -0,0 +1,79 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef _ABI_GUC_RELAY_ACTIONS_ABI_H_ +#define _ABI_GUC_RELAY_ACTIONS_ABI_H_ + +/** + * DOC: GuC Relay Debug Actions + * + * This range of action codes is reserved for debugging purposes only and should + * be used only on debug builds. These actions may not be supported by the + * production drivers. Their definitions could be changed in the future. + * + * _`GUC_RELAY_ACTION_DEBUG_ONLY_START` = 0xDEB0 + * _`GUC_RELAY_ACTION_DEBUG_ONLY_END` = 0xDEFF + */ + +#define GUC_RELAY_ACTION_DEBUG_ONLY_START 0xDEB0 +#define GUC_RELAY_ACTION_DEBUG_ONLY_END 0xDEFF + +/** + * DOC: VFXPF_TESTLOOP + * + * This `Relay Message`_ is used to selftest the `GuC Relay Communication`_. + * + * The following opcodes are defined: + * VFXPF_TESTLOOP_OPCODE_NOP_ will return no data. + * VFXPF_TESTLOOP_OPCODE_BUSY_ will reply with BUSY response first. + * VFXPF_TESTLOOP_OPCODE_RETRY_ will reply with RETRY response instead. + * VFXPF_TESTLOOP_OPCODE_ECHO_ will return same data as received. + * VFXPF_TESTLOOP_OPCODE_FAIL_ will always fail with error. 
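+ * + * For example (illustrative encoding only, based on the request layout + * below), an ECHO request with a single payload dword could be built as:: + * + * msg[0] = FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + * FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + * FIELD_PREP(GUC_HXG_REQUEST_MSG_0_DATA0, VFXPF_TESTLOOP_OPCODE_ECHO) | + * FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_RELAY_ACTION_VFXPF_TESTLOOP); + * msg[1] = 0xd1; /* arbitrary payload to be echoed back */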
+ * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_ or GUC_HXG_TYPE_FAST_REQUEST_ | + * | | | or GUC_HXG_TYPE_EVENT_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:16 | **OPCODE** | + * | | | - _`VFXPF_TESTLOOP_OPCODE_NOP` = 0x0 | + * | | | - _`VFXPF_TESTLOOP_OPCODE_BUSY` = 0xB | + * | | | - _`VFXPF_TESTLOOP_OPCODE_RETRY` = 0xD | + * | | | - _`VFXPF_TESTLOOP_OPCODE_ECHO` = 0xE | + * | | | - _`VFXPF_TESTLOOP_OPCODE_FAIL` = 0xF | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | ACTION = _`IOV_ACTION_SELFTEST_RELAY` | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:0 | **DATA1** = optional, depends on **OPCODE**: | + * | | | for VFXPF_TESTLOOP_OPCODE_BUSY_: time in ms for reply | + * | | | for VFXPF_TESTLOOP_OPCODE_FAIL_: expected error | + * | | | for VFXPF_TESTLOOP_OPCODE_ECHO_: payload | + * +---+-------+--------------------------------------------------------------+ + * |...| 31:0 | **DATAn** = only for **OPCODE** VFXPF_TESTLOOP_OPCODE_ECHO_ | + * +---+-------+--------------------------------------------------------------+ + * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:0 | DATA0 = MBZ | + * +---+-------+--------------------------------------------------------------+ + * |...| 31:0 | DATAn = only for **OPCODE** VFXPF_TESTLOOP_OPCODE_ECHO_ | + * +---+-------+--------------------------------------------------------------+ + */ +#define GUC_RELAY_ACTION_VFXPF_TESTLOOP (GUC_RELAY_ACTION_DEBUG_ONLY_START + 1) +#define VFXPF_TESTLOOP_OPCODE_NOP 0x0 +#define VFXPF_TESTLOOP_OPCODE_BUSY 0xB +#define VFXPF_TESTLOOP_OPCODE_RETRY 0xD +#define VFXPF_TESTLOOP_OPCODE_ECHO 0xE +#define VFXPF_TESTLOOP_OPCODE_FAIL 0xF + +#endif diff --git a/drivers/gpu/drm/xe/abi/guc_relay_communication_abi.h b/drivers/gpu/drm/xe/abi/guc_relay_communication_abi.h new file mode 100644 index 00000000000000..f92625f047967a --- /dev/null +++ b/drivers/gpu/drm/xe/abi/guc_relay_communication_abi.h @@ -0,0 +1,118 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef _ABI_GUC_RELAY_COMMUNICATION_ABI_H +#define _ABI_GUC_RELAY_COMMUNICATION_ABI_H + +#include + +#include "guc_actions_sriov_abi.h" +#include "guc_communication_ctb_abi.h" +#include "guc_messages_abi.h" + +/** + * DOC: GuC Relay Communication + * + * The communication between Virtual Function (VF) drivers and Physical Function + * (PF) drivers is based on the GuC firmware acting as a proxy (relay) agent. + * + * To communicate with the PF driver, VF's drivers use `VF2GUC_RELAY_TO_PF`_ + * action that takes the `Relay Message`_ as opaque payload and requires the + * relay message identifier (RID) as additional parameter. 
+ * + * This identifier is used by the drivers to match related messages. + * + * The GuC forwards this `Relay Message`_ and its identifier to the PF driver + * in `GUC2PF_RELAY_FROM_VF`_ action. This event message additionally contains + * the identifier of the origin VF (VFID). + * + * Likewise, to communicate with the VF drivers, the PF driver uses the + * `PF2GUC_RELAY_TO_VF`_ action that, in addition to the `Relay Message`_ + * and the relay message identifier (RID), also takes the target VF identifier. + * + * The GuC uses the target VFID from that message to select where to send the + * `GUC2VF_RELAY_FROM_PF`_ with the embedded `Relay Message`_ carrying the response:: + * + * VF GuC PF + * | | | + * [ ] VF2GUC_RELAY_TO_PF | | + * [ ]---------------------------> [ ] | + * [ ] { rid, msg } [ ] | + * [ ] [ ] GUC2PF_RELAY_FROM_VF | + * [ ] [ ]---------------------------> [ ] + * [ ] | { VFID, rid, msg } [ ] + * [ ] | [ ] + * [ ] | PF2GUC_RELAY_TO_VF [ ] + * [ ] [ ] <---------------------------[ ] + * [ ] [ ] { VFID, rid, reply } | + * [ ] GUC2VF_RELAY_FROM_PF [ ] | + * [ ] <---------------------------[ ] | + * | { rid, reply } | | + * | | | + * + * It is also possible that the PF driver will initiate communication with a + * selected VF driver. The same GuC action messages will be used:: + * + * VF GuC PF + * | | | + * | | PF2GUC_RELAY_TO_VF [ ] + * | [ ] <---------------------------[ ] + * | [ ] { VFID, rid, msg } [ ] + * | GUC2VF_RELAY_FROM_PF [ ] [ ] + * [ ] <---------------------------[ ] [ ] + * [ ] { rid, msg } | [ ] + * [ ] | [ ] + * [ ] VF2GUC_RELAY_TO_PF | [ ] + * [ ]---------------------------> [ ] [ ] + * | { rid, reply } [ ] [ ] + * | [ ] GUC2PF_RELAY_FROM_VF [ ] + * | [ ]---------------------------> [ ] + * | | { VFID, rid, reply } | + * | | | + */ + +/** + * DOC: Relay Message + * + * The `Relay Message`_ is used by the Physical Function (PF) driver and Virtual + * Function (VF) drivers to communicate using `GuC Relay Communication`_. + * + * The format of the `Relay Message`_ follows the format of the generic `HXG Message`_. + * + * +--------------------------------------------------------------------------+ + * | `Relay Message`_ | + * +==========================================================================+ + * | `HXG Message`_ | + * +--------------------------------------------------------------------------+ + * + * The maximum length of the `Relay Message`_ is limited by the maximum length of + * the `CTB HXG Message`_ and the format of the `GUC2PF_RELAY_FROM_VF`_ message. + */ + +#define GUC_RELAY_MSG_MIN_LEN GUC_HXG_MSG_MIN_LEN +#define GUC_RELAY_MSG_MAX_LEN \ + (GUC_CTB_MAX_DWORDS - GUC2PF_RELAY_FROM_VF_EVENT_MSG_MIN_LEN) + +static_assert(PF2GUC_RELAY_TO_VF_REQUEST_MSG_MIN_LEN > + VF2GUC_RELAY_TO_PF_REQUEST_MSG_MIN_LEN); + +/** + * DOC: Relay Error Codes + * + * The `GuC Relay Communication`_ can be used to pass a `Relay Message`_ between + * drivers that run on different Operating Systems. To help in troubleshooting, + * `GuC Relay Communication`_ uses error codes that mostly match errno values.
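+ * + * For example, a local -EINVAL is carried over the wire as + * GUC_RELAY_ERROR_INVALID_ARGUMENT (22) and can be decoded back to -EINVAL on + * the other end, while an error that should not be disclosed to the remote + * driver is reported as GUC_RELAY_ERROR_UNDISCLOSED (0).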
+ */ + +#define GUC_RELAY_ERROR_UNDISCLOSED 0 +#define GUC_RELAY_ERROR_OPERATION_NOT_PERMITTED 1 /* EPERM */ +#define GUC_RELAY_ERROR_PERMISSION_DENIED 13 /* EACCES */ +#define GUC_RELAY_ERROR_INVALID_ARGUMENT 22 /* EINVAL */ +#define GUC_RELAY_ERROR_INVALID_REQUEST_CODE 56 /* EBADRQC */ +#define GUC_RELAY_ERROR_NO_DATA_AVAILABLE 61 /* ENODATA */ +#define GUC_RELAY_ERROR_PROTOCOL_ERROR 71 /* EPROTO */ +#define GUC_RELAY_ERROR_MESSAGE_SIZE 90 /* EMSGSIZE */ + +#endif From 811fe9f556fcb281ea2db1b0fff3bab20f0a4d42 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Thu, 4 Jan 2024 23:20:28 +0100 Subject: [PATCH 038/200] drm/xe/guc: Introduce Relay Communication for SR-IOV MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There are scenarios where the SR-IOV Virtual Function (VF) driver will need additional data that is not available over the VF MMIO BAR, cannot be queried from the GuC firmware, and must be obtained from the Physical Function (PF) driver. To allow such communication between VF and PF drivers, the GuC supports a set of H2G and G2H actions that relay embedded messages, which are otherwise opaque to the GuC. To allow use of this communication mechanism, provide functions for sending requests and handling replies, and a placeholder where handlers for incoming requests will be added. Reviewed-by: Piotr Piórkowski Link: https://lore.kernel.org/r/20240104222031.277-8-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/Makefile | 1 + drivers/gpu/drm/xe/xe_guc.c | 5 + drivers/gpu/drm/xe/xe_guc_relay.c | 925 ++++++++++++++++++++++++ drivers/gpu/drm/xe/xe_guc_relay.h | 37 + drivers/gpu/drm/xe/xe_guc_relay_types.h | 36 + drivers/gpu/drm/xe/xe_guc_types.h | 4 + 6 files changed, 1008 insertions(+) create mode 100644 drivers/gpu/drm/xe/xe_guc_relay.c create mode 100644 drivers/gpu/drm/xe/xe_guc_relay.h create mode 100644 drivers/gpu/drm/xe/xe_guc_relay_types.h diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile index df8601d6a59f0c..6952da8979ea71 100644 --- a/drivers/gpu/drm/xe/Makefile +++ b/drivers/gpu/drm/xe/Makefile @@ -148,6 +148,7 @@ xe-$(CONFIG_HWMON) += xe_hwmon.o # graphics virtualization (SR-IOV) support xe-y += \ + xe_guc_relay.o \ xe_memirq.o \ xe_sriov.o diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 811e8b20127055..311a0364bff159 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -21,6 +21,7 @@ #include "xe_guc_hwconfig.h" #include "xe_guc_log.h" #include "xe_guc_pc.h" +#include "xe_guc_relay.h" #include "xe_guc_submit.h" #include "xe_memirq.h" #include "xe_mmio.h" @@ -263,6 +264,10 @@ int xe_guc_init(struct xe_guc *guc) if (ret) goto out; + ret = xe_guc_relay_init(&guc->relay); + if (ret) + goto out; + ret = xe_guc_pc_init(&guc->pc); if (ret) goto out; diff --git a/drivers/gpu/drm/xe/xe_guc_relay.c b/drivers/gpu/drm/xe/xe_guc_relay.c new file mode 100644 index 00000000000000..0e738325640822 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_guc_relay.c @@ -0,0 +1,925 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2023 Intel Corporation + */ + +#include +#include + +#include + +#include "abi/guc_actions_sriov_abi.h" +#include "abi/guc_relay_actions_abi.h" +#include "abi/guc_relay_communication_abi.h" + +#include "xe_assert.h" +#include "xe_device.h" +#include "xe_gt.h" +#include "xe_gt_sriov_printk.h" +#include "xe_guc.h" +#include "xe_guc_ct.h" +#include "xe_guc_hxg_helpers.h" +#include "xe_guc_relay.h" +#include
"xe_guc_relay_types.h" +#include "xe_sriov.h" + +/* + * How long should we wait for the response? + * XXX this value is subject for the profiling. + */ +#define RELAY_TIMEOUT_MSEC (2500) + +static void relays_worker_fn(struct work_struct *w); + +static struct xe_guc *relay_to_guc(struct xe_guc_relay *relay) +{ + return container_of(relay, struct xe_guc, relay); +} + +static struct xe_guc_ct *relay_to_ct(struct xe_guc_relay *relay) +{ + return &relay_to_guc(relay)->ct; +} + +static struct xe_gt *relay_to_gt(struct xe_guc_relay *relay) +{ + return guc_to_gt(relay_to_guc(relay)); +} + +static struct xe_device *relay_to_xe(struct xe_guc_relay *relay) +{ + return gt_to_xe(relay_to_gt(relay)); +} + +#define relay_assert(relay, condition) xe_gt_assert(relay_to_gt(relay), condition) +#define relay_notice(relay, msg...) xe_gt_sriov_notice(relay_to_gt(relay), "relay: " msg) +#define relay_debug(relay, msg...) xe_gt_sriov_dbg_verbose(relay_to_gt(relay), "relay: " msg) + +static int relay_get_totalvfs(struct xe_guc_relay *relay) +{ + struct xe_device *xe = relay_to_xe(relay); + struct pci_dev *pdev = to_pci_dev(xe->drm.dev); + + return IS_SRIOV_VF(xe) ? 0 : pci_sriov_get_totalvfs(pdev); +} + +static bool relay_is_ready(struct xe_guc_relay *relay) +{ + return mempool_initialized(&relay->pool); +} + +static u32 relay_get_next_rid(struct xe_guc_relay *relay) +{ + u32 rid; + + spin_lock(&relay->lock); + rid = ++relay->last_rid; + spin_unlock(&relay->lock); + + return rid; +} + +/** + * struct relay_transaction - internal data used to handle transactions + * + * Relation between struct relay_transaction members:: + * + * <-------------------- GUC_CTB_MAX_DWORDS --------------> + * <-------- GUC_RELAY_MSG_MAX_LEN ---> + * <--- offset ---> <--- request_len -------> + * +----------------+-------------------------+----------+--+ + * | | | | | + * +----------------+-------------------------+----------+--+ + * ^ ^ + * / / + * request_buf request + * + * <-------------------- GUC_CTB_MAX_DWORDS --------------> + * <-------- GUC_RELAY_MSG_MAX_LEN ---> + * <--- offset ---> <--- response_len ---> + * +----------------+----------------------+-------------+--+ + * | | | | | + * +----------------+----------------------+-------------+--+ + * ^ ^ + * / / + * response_buf response + */ +struct relay_transaction { + /** + * @incoming: indicates whether this transaction represents an incoming + * request from the remote VF/PF or this transaction + * represents outgoing request to the remote VF/PF. + */ + bool incoming; + + /** + * @remote: PF/VF identifier of the origin (or target) of the relay + * request message. + */ + u32 remote; + + /** @rid: identifier of the VF/PF relay message. */ + u32 rid; + + /** + * @request: points to the inner VF/PF request message, copied to the + * #response_buf starting at #offset. + */ + u32 *request; + + /** @request_len: length of the inner VF/PF request message. */ + u32 request_len; + + /** + * @response: points to the placeholder buffer where inner VF/PF + * response will be located, for outgoing transaction + * this could be caller's buffer (if provided) otherwise + * it points to the #response_buf starting at #offset. + */ + u32 *response; + + /** + * @response_len: length of the inner VF/PF response message (only + * if #status is 0), initially set to the size of the + * placeholder buffer where response message will be + * copied. 
+ */ + u32 response_len; + + /** + * @offset: offset to the start of the inner VF/PF relay message inside + * buffers; this offset is equal the length of the outer GuC + * relay header message. + */ + u32 offset; + + /** + * @request_buf: buffer with VF/PF request message including outer + * transport message. + */ + u32 request_buf[GUC_CTB_MAX_DWORDS]; + + /** + * @response_buf: buffer with VF/PF response message including outer + * transport message. + */ + u32 response_buf[GUC_CTB_MAX_DWORDS]; + + /** + * @reply: status of the reply, 0 means that data pointed by the + * #response is valid. + */ + int reply; + + /** @done: completion of the outgoing transaction. */ + struct completion done; + + /** @link: transaction list link */ + struct list_head link; +}; + +static u32 prepare_pf2guc(u32 *msg, u32 target, u32 rid) +{ + msg[0] = FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, XE_GUC_ACTION_PF2GUC_RELAY_TO_VF); + msg[1] = FIELD_PREP(PF2GUC_RELAY_TO_VF_REQUEST_MSG_1_VFID, target); + msg[2] = FIELD_PREP(PF2GUC_RELAY_TO_VF_REQUEST_MSG_2_RELAY_ID, rid); + + return PF2GUC_RELAY_TO_VF_REQUEST_MSG_MIN_LEN; +} + +static u32 prepare_vf2guc(u32 *msg, u32 rid) +{ + msg[0] = FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, XE_GUC_ACTION_VF2GUC_RELAY_TO_PF); + msg[1] = FIELD_PREP(VF2GUC_RELAY_TO_PF_REQUEST_MSG_1_RELAY_ID, rid); + + return VF2GUC_RELAY_TO_PF_REQUEST_MSG_MIN_LEN; +} + +static struct relay_transaction * +__relay_get_transaction(struct xe_guc_relay *relay, bool incoming, u32 remote, u32 rid, + const u32 *action, u32 action_len, u32 *resp, u32 resp_size) +{ + struct relay_transaction *txn; + + relay_assert(relay, action_len >= GUC_RELAY_MSG_MIN_LEN); + relay_assert(relay, action_len <= GUC_RELAY_MSG_MAX_LEN); + relay_assert(relay, !(!!resp ^ !!resp_size)); + relay_assert(relay, resp_size <= GUC_RELAY_MSG_MAX_LEN); + relay_assert(relay, resp_size == 0 || resp_size >= GUC_RELAY_MSG_MIN_LEN); + + if (unlikely(!relay_is_ready(relay))) + return ERR_PTR(-ENODEV); + + /* + * For incoming requests we can't use GFP_KERNEL as those are delivered + * with CTB lock held which is marked as used in the reclaim path. + * Btw, that's one of the reason why we use mempool here! + */ + txn = mempool_alloc(&relay->pool, incoming ? GFP_ATOMIC : GFP_KERNEL); + if (!txn) + return ERR_PTR(-ENOMEM); + + txn->incoming = incoming; + txn->remote = remote; + txn->rid = rid; + txn->offset = remote ? + prepare_pf2guc(incoming ? txn->response_buf : txn->request_buf, remote, rid) : + prepare_vf2guc(incoming ? 
txn->response_buf : txn->request_buf, rid); + + relay_assert(relay, txn->offset); + relay_assert(relay, txn->offset + GUC_RELAY_MSG_MAX_LEN <= ARRAY_SIZE(txn->request_buf)); + relay_assert(relay, txn->offset + GUC_RELAY_MSG_MAX_LEN <= ARRAY_SIZE(txn->response_buf)); + + txn->request = txn->request_buf + txn->offset; + memcpy(&txn->request_buf[txn->offset], action, sizeof(u32) * action_len); + txn->request_len = action_len; + + txn->response = resp ?: txn->response_buf + txn->offset; + txn->response_len = resp_size ?: GUC_RELAY_MSG_MAX_LEN; + txn->reply = -ENOMSG; + INIT_LIST_HEAD(&txn->link); + init_completion(&txn->done); + + return txn; +} + +static struct relay_transaction * +relay_new_transaction(struct xe_guc_relay *relay, u32 target, const u32 *action, u32 len, + u32 *resp, u32 resp_size) +{ + u32 rid = relay_get_next_rid(relay); + + return __relay_get_transaction(relay, false, target, rid, action, len, resp, resp_size); +} + +static struct relay_transaction * +relay_new_incoming_transaction(struct xe_guc_relay *relay, u32 origin, u32 rid, + const u32 *action, u32 len) +{ + return __relay_get_transaction(relay, true, origin, rid, action, len, NULL, 0); +} + +static void relay_release_transaction(struct xe_guc_relay *relay, struct relay_transaction *txn) +{ + relay_assert(relay, list_empty(&txn->link)); + + txn->offset = 0; + txn->response = NULL; + txn->reply = -ESTALE; + mempool_free(txn, &relay->pool); +} + +static int relay_send_transaction(struct xe_guc_relay *relay, struct relay_transaction *txn) +{ + u32 len = txn->incoming ? txn->response_len : txn->request_len; + u32 *buf = txn->incoming ? txn->response_buf : txn->request_buf; + u32 *msg = buf + txn->offset; + int ret; + + relay_assert(relay, txn->offset); + relay_assert(relay, txn->offset + len <= GUC_CTB_MAX_DWORDS); + relay_assert(relay, len >= GUC_RELAY_MSG_MIN_LEN); + relay_assert(relay, len <= GUC_RELAY_MSG_MAX_LEN); + + relay_debug(relay, "sending %s.%u to %u = %*ph\n", + guc_hxg_type_to_string(FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[0])), + txn->rid, txn->remote, (int)sizeof(u32) * len, msg); + + ret = xe_guc_ct_send_block(relay_to_ct(relay), buf, len + txn->offset); + + if (unlikely(ret > 0)) { + relay_notice(relay, "Unexpected data=%d from GuC, wrong ABI?\n", ret); + ret = -EPROTO; + } + if (unlikely(ret < 0)) { + relay_notice(relay, "Failed to send %s.%x to GuC (%pe) %*ph ...\n", + guc_hxg_type_to_string(FIELD_GET(GUC_HXG_MSG_0_TYPE, buf[0])), + FIELD_GET(GUC_HXG_REQUEST_MSG_0_ACTION, buf[0]), + ERR_PTR(ret), (int)sizeof(u32) * txn->offset, buf); + relay_notice(relay, "Failed to send %s.%u to %u (%pe) %*ph\n", + guc_hxg_type_to_string(FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[0])), + txn->rid, txn->remote, ERR_PTR(ret), (int)sizeof(u32) * len, msg); + } + + return ret; +} + +static void __fini_relay(struct drm_device *drm, void *arg) +{ + struct xe_guc_relay *relay = arg; + + mempool_exit(&relay->pool); +} + +/** + * xe_guc_relay_init - Initialize a &xe_guc_relay + * @relay: the &xe_guc_relay to initialize + * + * Initialize remaining members of &xe_guc_relay that may depend + * on the SR-IOV mode. + * + * Return: 0 on success or a negative error code on failure. 
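+ * + * A minimal usage sketch, matching how xe_guc_init() is wired up earlier in + * this patch:: + * + * ret = xe_guc_relay_init(&guc->relay); + * if (ret) + * goto out;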
+ */ +int xe_guc_relay_init(struct xe_guc_relay *relay) +{ + const int XE_RELAY_MEMPOOL_MIN_NUM = 1; + struct xe_device *xe = relay_to_xe(relay); + int err; + + relay_assert(relay, !relay_is_ready(relay)); + + if (!IS_SRIOV(xe)) + return 0; + + spin_lock_init(&relay->lock); + INIT_WORK(&relay->worker, relays_worker_fn); + INIT_LIST_HEAD(&relay->pending_relays); + INIT_LIST_HEAD(&relay->incoming_actions); + + err = mempool_init_kmalloc_pool(&relay->pool, XE_RELAY_MEMPOOL_MIN_NUM + + relay_get_totalvfs(relay), + sizeof(struct relay_transaction)); + if (err) + return err; + + relay_debug(relay, "using mempool with %d elements\n", relay->pool.min_nr); + + return drmm_add_action_or_reset(&xe->drm, __fini_relay, relay); +} + +static u32 to_relay_error(int err) +{ + /* XXX: assume that relay errors match errno codes */ + return err < 0 ? -err : GUC_RELAY_ERROR_UNDISCLOSED; +} + +static int from_relay_error(u32 error) +{ + /* XXX: assume that relay errors match errno codes */ + return error ? -error : -ENODATA; +} + +static u32 sanitize_relay_error(u32 error) +{ + /* XXX TBD if generic error codes will be allowed */ + if (!IS_ENABLED(CONFIG_DRM_XE_DEBUG)) + error = GUC_RELAY_ERROR_UNDISCLOSED; + return error; +} + +static u32 sanitize_relay_error_hint(u32 hint) +{ + /* XXX TBD if generic error codes will be allowed */ + if (!IS_ENABLED(CONFIG_DRM_XE_DEBUG)) + hint = 0; + return hint; +} + +static u32 prepare_error_reply(u32 *msg, u32 error, u32 hint) +{ + msg[0] = FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_RESPONSE_FAILURE) | + FIELD_PREP(GUC_HXG_FAILURE_MSG_0_HINT, hint) | + FIELD_PREP(GUC_HXG_FAILURE_MSG_0_ERROR, error); + + XE_WARN_ON(!FIELD_FIT(GUC_HXG_FAILURE_MSG_0_ERROR, error)); + XE_WARN_ON(!FIELD_FIT(GUC_HXG_FAILURE_MSG_0_HINT, hint)); + + return GUC_HXG_FAILURE_MSG_LEN; +} + +static int relay_send_message_and_wait(struct xe_guc_relay *relay, + struct relay_transaction *txn, + u32 *buf, u32 buf_size) +{ + unsigned long timeout = msecs_to_jiffies(RELAY_TIMEOUT_MSEC); + u32 *msg = &txn->request_buf[txn->offset]; + u32 len = txn->request_len; + u32 type, action, data0; + int ret; + long n; + + type = FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[0]); + action = FIELD_GET(GUC_HXG_REQUEST_MSG_0_ACTION, msg[0]); + data0 = FIELD_GET(GUC_HXG_REQUEST_MSG_0_DATA0, msg[0]); + + relay_debug(relay, "%s.%u to %u action %#x:%u\n", + guc_hxg_type_to_string(type), + txn->rid, txn->remote, action, data0); + + /* list ordering does not need to match RID ordering */ + spin_lock(&relay->lock); + list_add_tail(&txn->link, &relay->pending_relays); + spin_unlock(&relay->lock); + +resend: + ret = relay_send_transaction(relay, txn); + if (unlikely(ret < 0)) + goto unlink; + +wait: + n = wait_for_completion_timeout(&txn->done, timeout); + if (unlikely(n == 0 && txn->reply)) { + ret = -ETIME; + goto unlink; + } + + relay_debug(relay, "%u.%u reply %d after %u msec\n", + txn->remote, txn->rid, txn->reply, jiffies_to_msecs(timeout - n)); + if (unlikely(txn->reply)) { + reinit_completion(&txn->done); + if (txn->reply == -EAGAIN) + goto resend; + if (txn->reply == -EBUSY) + goto wait; + if (txn->reply > 0) + ret = from_relay_error(txn->reply); + else + ret = txn->reply; + goto unlink; + } + + relay_debug(relay, "%u.%u response %*ph\n", txn->remote, txn->rid, + (int)sizeof(u32) * txn->response_len, txn->response); + relay_assert(relay, txn->response_len >= GUC_RELAY_MSG_MIN_LEN); + ret = txn->response_len; + +unlink: + spin_lock(&relay->lock); + list_del_init(&txn->link); + 
spin_unlock(&relay->lock); + + if (unlikely(ret < 0)) { + relay_notice(relay, "Unsuccessful %s.%u %#x:%u to %u (%pe) %*ph\n", + guc_hxg_type_to_string(type), txn->rid, + action, data0, txn->remote, ERR_PTR(ret), + (int)sizeof(u32) * len, msg); + } + + return ret; +} + +static int relay_send_to(struct xe_guc_relay *relay, u32 target, + const u32 *msg, u32 len, u32 *buf, u32 buf_size) +{ + struct relay_transaction *txn; + int ret; + + relay_assert(relay, len >= GUC_RELAY_MSG_MIN_LEN); + relay_assert(relay, len <= GUC_RELAY_MSG_MAX_LEN); + relay_assert(relay, FIELD_GET(GUC_HXG_MSG_0_ORIGIN, msg[0]) == GUC_HXG_ORIGIN_HOST); + relay_assert(relay, guc_hxg_type_is_action(FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[0]))); + + if (unlikely(!relay_is_ready(relay))) + return -ENODEV; + + txn = relay_new_transaction(relay, target, msg, len, buf, buf_size); + if (IS_ERR(txn)) + return PTR_ERR(txn); + + switch (FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[0])) { + case GUC_HXG_TYPE_REQUEST: + ret = relay_send_message_and_wait(relay, txn, buf, buf_size); + break; + case GUC_HXG_TYPE_FAST_REQUEST: + relay_assert(relay, !GUC_HXG_TYPE_FAST_REQUEST); + fallthrough; + case GUC_HXG_TYPE_EVENT: + ret = relay_send_transaction(relay, txn); + break; + default: + ret = -EINVAL; + break; + } + + relay_release_transaction(relay, txn); + return ret; +} + +#ifdef CONFIG_PCI_IOV +/** + * xe_guc_relay_send_to_vf - Send a message to the VF. + * @relay: the &xe_guc_relay which will send the message + * @target: target VF number + * @msg: request message to be sent + * @len: length of the request message (in dwords, can't be 0) + * @buf: placeholder for the response message + * @buf_size: size of the response message placeholder (in dwords) + * + * This function can only be used by the driver running in the SR-IOV PF mode. + * + * Return: Non-negative response length (in dwords) or + * a negative error code on failure. + */ +int xe_guc_relay_send_to_vf(struct xe_guc_relay *relay, u32 target, + const u32 *msg, u32 len, u32 *buf, u32 buf_size) +{ + relay_assert(relay, IS_SRIOV_PF(relay_to_xe(relay))); + + return relay_send_to(relay, target, msg, len, buf, buf_size); +} +#endif + +/** + * xe_guc_relay_send_to_pf - Send a message to the PF. + * @relay: the &xe_guc_relay which will send the message + * @msg: request message to be sent + * @len: length of the message (in dwords, can't be 0) + * @buf: placeholder for the response message + * @buf_size: size of the response message placeholder (in dwords) + * + * This function can only be used by driver running in SR-IOV VF mode. + * + * Return: Non-negative response length (in dwords) or + * a negative error code on failure. + */ +int xe_guc_relay_send_to_pf(struct xe_guc_relay *relay, + const u32 *msg, u32 len, u32 *buf, u32 buf_size) +{ + relay_assert(relay, IS_SRIOV_VF(relay_to_xe(relay))); + + return relay_send_to(relay, PFID, msg, len, buf, buf_size); +} + +static int relay_handle_reply(struct xe_guc_relay *relay, u32 origin, + u32 rid, int reply, const u32 *msg, u32 len) +{ + struct relay_transaction *pending; + int err = -ESRCH; + + spin_lock(&relay->lock); + list_for_each_entry(pending, &relay->pending_relays, link) { + if (pending->remote != origin || pending->rid != rid) { + relay_debug(relay, "%u.%u still awaits response\n", + pending->remote, pending->rid); + continue; + } + err = 0; /* found! 
*/ + if (reply == 0) { + if (len > pending->response_len) { + reply = -ENOBUFS; + err = -ENOBUFS; + } else { + memcpy(pending->response, msg, 4 * len); + pending->response_len = len; + } + } + pending->reply = reply; + complete_all(&pending->done); + break; + } + spin_unlock(&relay->lock); + + return err; +} + +static int relay_handle_failure(struct xe_guc_relay *relay, u32 origin, + u32 rid, const u32 *msg, u32 len) +{ + int error = FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, msg[0]); + u32 hint __maybe_unused = FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, msg[0]); + + relay_assert(relay, len); + relay_debug(relay, "%u.%u error %#x (%pe) hint %u debug %*ph\n", + origin, rid, error, ERR_PTR(-error), hint, 4 * (len - 1), msg + 1); + + return relay_handle_reply(relay, origin, rid, error ?: -EREMOTEIO, NULL, 0); +} + +static int relay_testloop_action_handler(struct xe_guc_relay *relay, u32 origin, + const u32 *msg, u32 len, u32 *response, u32 size) +{ + static ktime_t last_reply = 0; + u32 type = FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[0]); + u32 action = FIELD_GET(GUC_HXG_REQUEST_MSG_0_ACTION, msg[0]); + u32 opcode = FIELD_GET(GUC_HXG_REQUEST_MSG_0_DATA0, msg[0]); + ktime_t now = ktime_get(); + bool busy; + int ret; + + relay_assert(relay, guc_hxg_type_is_action(type)); + relay_assert(relay, action == GUC_RELAY_ACTION_VFXPF_TESTLOOP); + + if (!IS_ENABLED(CONFIG_DRM_XE_DEBUG_SRIOV)) + return -ECONNREFUSED; + + if (!last_reply) + last_reply = now; + busy = ktime_before(now, ktime_add_ms(last_reply, 2 * RELAY_TIMEOUT_MSEC)); + if (!busy) + last_reply = now; + + switch (opcode) { + case VFXPF_TESTLOOP_OPCODE_NOP: + if (type == GUC_HXG_TYPE_EVENT) + return 0; + return guc_hxg_msg_encode_success(response, 0); + case VFXPF_TESTLOOP_OPCODE_BUSY: + if (type == GUC_HXG_TYPE_EVENT) + return -EPROTO; + msleep(RELAY_TIMEOUT_MSEC / 8); + if (busy) + return -EINPROGRESS; + return guc_hxg_msg_encode_success(response, 0); + case VFXPF_TESTLOOP_OPCODE_RETRY: + if (type == GUC_HXG_TYPE_EVENT) + return -EPROTO; + msleep(RELAY_TIMEOUT_MSEC / 8); + if (busy) + return guc_hxg_msg_encode_retry(response, 0); + return guc_hxg_msg_encode_success(response, 0); + case VFXPF_TESTLOOP_OPCODE_ECHO: + if (type == GUC_HXG_TYPE_EVENT) + return -EPROTO; + if (size < len) + return -ENOBUFS; + ret = guc_hxg_msg_encode_success(response, len); + memcpy(response + ret, msg + ret, (len - ret) * sizeof(u32)); + return len; + case VFXPF_TESTLOOP_OPCODE_FAIL: + return -EHWPOISON; + default: + break; + } + + relay_notice(relay, "Unexpected action %#x opcode %#x\n", action, opcode); + return -EBADRQC; +} + +static int relay_action_handler(struct xe_guc_relay *relay, u32 origin, + const u32 *msg, u32 len, u32 *response, u32 size) +{ + u32 type; + int ret; + + relay_assert(relay, len >= GUC_HXG_MSG_MIN_LEN); + + if (FIELD_GET(GUC_HXG_REQUEST_MSG_0_ACTION, msg[0]) == GUC_RELAY_ACTION_VFXPF_TESTLOOP) + return relay_testloop_action_handler(relay, origin, msg, len, response, size); + + type = FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[0]); + + /* XXX: PF services will be added later */ + ret = -EOPNOTSUPP; + + if (type == GUC_HXG_TYPE_EVENT) + relay_assert(relay, ret <= 0); + + return ret; +} + +static struct relay_transaction *relay_dequeue_transaction(struct xe_guc_relay *relay) +{ + struct relay_transaction *txn; + + spin_lock(&relay->lock); + txn = list_first_entry_or_null(&relay->incoming_actions, struct relay_transaction, link); + if (txn) + list_del_init(&txn->link); + spin_unlock(&relay->lock); + + return txn; +} + +static void 
relay_process_incoming_action(struct xe_guc_relay *relay) +{ + struct relay_transaction *txn; + bool again = false; + u32 type; + int ret; + + txn = relay_dequeue_transaction(relay); + if (!txn) + return; + + type = FIELD_GET(GUC_HXG_MSG_0_TYPE, txn->request_buf[txn->offset]); + + ret = relay_action_handler(relay, txn->remote, + txn->request_buf + txn->offset, txn->request_len, + txn->response_buf + txn->offset, + ARRAY_SIZE(txn->response_buf) - txn->offset); + + if (ret == -EINPROGRESS) { + again = true; + ret = guc_hxg_msg_encode_busy(txn->response_buf + txn->offset, 0); + } + + if (ret > 0) { + txn->response_len = ret; + ret = relay_send_transaction(relay, txn); + } + + if (ret < 0) { + u32 error = to_relay_error(ret); + + relay_notice(relay, "Failed to handle %s.%u from %u (%pe) %*ph\n", + guc_hxg_type_to_string(type), txn->rid, txn->remote, + ERR_PTR(ret), 4 * txn->request_len, txn->request_buf + txn->offset); + + txn->response_len = prepare_error_reply(txn->response_buf + txn->offset, + txn->remote ? + sanitize_relay_error(error) : error, + txn->remote ? + sanitize_relay_error_hint(-ret) : -ret); + ret = relay_send_transaction(relay, txn); + again = false; + } + + if (again) { + spin_lock(&relay->lock); + list_add(&txn->link, &relay->incoming_actions); + spin_unlock(&relay->lock); + return; + } + + if (unlikely(ret < 0)) + relay_notice(relay, "Failed to process action.%u (%pe) %*ph\n", + txn->rid, ERR_PTR(ret), 4 * txn->request_len, + txn->request_buf + txn->offset); + + relay_release_transaction(relay, txn); +} + +static bool relay_needs_worker(struct xe_guc_relay *relay) +{ + return !list_empty(&relay->incoming_actions); +} + +static void relay_kick_worker(struct xe_guc_relay *relay) +{ + queue_work(relay_to_xe(relay)->sriov.wq, &relay->worker); +} + +static void relays_worker_fn(struct work_struct *w) +{ + struct xe_guc_relay *relay = container_of(w, struct xe_guc_relay, worker); + + relay_process_incoming_action(relay); + + if (relay_needs_worker(relay)) + relay_kick_worker(relay); +} + +static int relay_queue_action_msg(struct xe_guc_relay *relay, u32 origin, u32 rid, + const u32 *msg, u32 len) +{ + struct relay_transaction *txn; + + txn = relay_new_incoming_transaction(relay, origin, rid, msg, len); + if (IS_ERR(txn)) + return PTR_ERR(txn); + + spin_lock(&relay->lock); + list_add_tail(&txn->link, &relay->incoming_actions); + spin_unlock(&relay->lock); + + relay_kick_worker(relay); + return 0; +} + +static int relay_process_msg(struct xe_guc_relay *relay, u32 origin, u32 rid, + const u32 *msg, u32 len) +{ + u32 type; + int err; + + if (unlikely(len < GUC_HXG_MSG_MIN_LEN)) + return -EPROTO; + + if (FIELD_GET(GUC_HXG_MSG_0_ORIGIN, msg[0]) != GUC_HXG_ORIGIN_HOST) + return -EPROTO; + + type = FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[0]); + relay_debug(relay, "received %s.%u from %u = %*ph\n", + guc_hxg_type_to_string(type), rid, origin, 4 * len, msg); + + switch (type) { + case GUC_HXG_TYPE_REQUEST: + case GUC_HXG_TYPE_FAST_REQUEST: + case GUC_HXG_TYPE_EVENT: + err = relay_queue_action_msg(relay, origin, rid, msg, len); + break; + case GUC_HXG_TYPE_RESPONSE_SUCCESS: + err = relay_handle_reply(relay, origin, rid, 0, msg, len); + break; + case GUC_HXG_TYPE_NO_RESPONSE_BUSY: + err = relay_handle_reply(relay, origin, rid, -EBUSY, NULL, 0); + break; + case GUC_HXG_TYPE_NO_RESPONSE_RETRY: + err = relay_handle_reply(relay, origin, rid, -EAGAIN, NULL, 0); + break; + case GUC_HXG_TYPE_RESPONSE_FAILURE: + err = relay_handle_failure(relay, origin, rid, msg, len); + break; + default: + err = -EBADRQC; 
+ } + + if (unlikely(err)) + relay_notice(relay, "Failed to process %s.%u from %u (%pe) %*ph\n", + guc_hxg_type_to_string(type), rid, origin, + ERR_PTR(err), 4 * len, msg); + + return err; +} + +/** + * xe_guc_relay_process_guc2vf - Handle relay notification message from the GuC. + * @relay: the &xe_guc_relay which will handle the message + * @msg: message to be handled + * @len: length of the message (in dwords) + * + * This function will handle relay messages received from the GuC. + * + * This function can only be used if the driver is running in SR-IOV mode. + * + * Return: 0 on success or a negative error code on failure. + */ +int xe_guc_relay_process_guc2vf(struct xe_guc_relay *relay, const u32 *msg, u32 len) +{ + u32 rid; + + relay_assert(relay, len >= GUC_HXG_MSG_MIN_LEN); + relay_assert(relay, FIELD_GET(GUC_HXG_MSG_0_ORIGIN, msg[0]) == GUC_HXG_ORIGIN_GUC); + relay_assert(relay, FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[0]) == GUC_HXG_TYPE_EVENT); + relay_assert(relay, FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, msg[0]) == + XE_GUC_ACTION_GUC2VF_RELAY_FROM_PF); + + if (unlikely(!IS_SRIOV_VF(relay_to_xe(relay)))) + return -EPERM; + + if (unlikely(!relay_is_ready(relay))) + return -ENODEV; + + if (unlikely(len < GUC2VF_RELAY_FROM_PF_EVENT_MSG_MIN_LEN)) + return -EPROTO; + + if (unlikely(len > GUC2VF_RELAY_FROM_PF_EVENT_MSG_MAX_LEN)) + return -EMSGSIZE; + + if (unlikely(FIELD_GET(GUC_HXG_EVENT_MSG_0_DATA0, msg[0]))) + return -EPFNOSUPPORT; + + rid = FIELD_GET(GUC2VF_RELAY_FROM_PF_EVENT_MSG_1_RELAY_ID, msg[1]); + + return relay_process_msg(relay, PFID, rid, + msg + GUC2VF_RELAY_FROM_PF_EVENT_MSG_MIN_LEN, + len - GUC2VF_RELAY_FROM_PF_EVENT_MSG_MIN_LEN); +} + +#ifdef CONFIG_PCI_IOV +/** + * xe_guc_relay_process_guc2pf - Handle relay notification message from the GuC. + * @relay: the &xe_guc_relay which will handle the message + * @msg: message to be handled + * @len: length of the message (in dwords) + * + * This function will handle relay messages received from the GuC. + * + * This function can only be used if the driver is running in SR-IOV PF mode. + * + * Return: 0 on success or a negative error code on failure.
+ */ +int xe_guc_relay_process_guc2pf(struct xe_guc_relay *relay, const u32 *msg, u32 len) +{ + u32 origin, rid; + int err; + + relay_assert(relay, len >= GUC_HXG_EVENT_MSG_MIN_LEN); + relay_assert(relay, FIELD_GET(GUC_HXG_MSG_0_ORIGIN, msg[0]) == GUC_HXG_ORIGIN_GUC); + relay_assert(relay, FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[0]) == GUC_HXG_TYPE_EVENT); + relay_assert(relay, FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, msg[0]) == + XE_GUC_ACTION_GUC2PF_RELAY_FROM_VF); + + if (unlikely(!IS_SRIOV_PF(relay_to_xe(relay)))) + return -EPERM; + + if (unlikely(!relay_is_ready(relay))) + return -ENODEV; + + if (unlikely(len < GUC2PF_RELAY_FROM_VF_EVENT_MSG_MIN_LEN)) + return -EPROTO; + + if (unlikely(len > GUC2PF_RELAY_FROM_VF_EVENT_MSG_MAX_LEN)) + return -EMSGSIZE; + + if (unlikely(FIELD_GET(GUC_HXG_EVENT_MSG_0_DATA0, msg[0]))) + return -EPFNOSUPPORT; + + origin = FIELD_GET(GUC2PF_RELAY_FROM_VF_EVENT_MSG_1_VFID, msg[1]); + rid = FIELD_GET(GUC2PF_RELAY_FROM_VF_EVENT_MSG_2_RELAY_ID, msg[2]); + + if (unlikely(origin > relay_get_totalvfs(relay))) + return -ENOENT; + + err = relay_process_msg(relay, origin, rid, + msg + GUC2PF_RELAY_FROM_VF_EVENT_MSG_MIN_LEN, + len - GUC2PF_RELAY_FROM_VF_EVENT_MSG_MIN_LEN); + + return err; +} +#endif diff --git a/drivers/gpu/drm/xe/xe_guc_relay.h b/drivers/gpu/drm/xe/xe_guc_relay.h new file mode 100644 index 00000000000000..385429aa188ab4 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_guc_relay.h @@ -0,0 +1,37 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef _XE_GUC_RELAY_H_ +#define _XE_GUC_RELAY_H_ + +#include +#include + +struct xe_guc_relay; + +int xe_guc_relay_init(struct xe_guc_relay *relay); + +int xe_guc_relay_send_to_pf(struct xe_guc_relay *relay, + const u32 *msg, u32 len, u32 *buf, u32 buf_size); + +int xe_guc_relay_process_guc2vf(struct xe_guc_relay *relay, const u32 *msg, u32 len); + +#ifdef CONFIG_PCI_IOV +int xe_guc_relay_send_to_vf(struct xe_guc_relay *relay, u32 target, + const u32 *msg, u32 len, u32 *buf, u32 buf_size); +int xe_guc_relay_process_guc2pf(struct xe_guc_relay *relay, const u32 *msg, u32 len); +#else +static inline int xe_guc_relay_send_to_vf(struct xe_guc_relay *relay, u32 target, + const u32 *msg, u32 len, u32 *buf, u32 buf_size) +{ + return -ENODEV; +} +static inline int xe_guc_relay_process_guc2pf(struct xe_guc_relay *relay, const u32 *msg, u32 len) +{ + return -ENODEV; +} +#endif + +#endif diff --git a/drivers/gpu/drm/xe/xe_guc_relay_types.h b/drivers/gpu/drm/xe/xe_guc_relay_types.h new file mode 100644 index 00000000000000..5999fcb77e96c4 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_guc_relay_types.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef _XE_GUC_RELAY_TYPES_H_ +#define _XE_GUC_RELAY_TYPES_H_ + +#include +#include +#include + +/** + * struct xe_guc_relay - Data used by the VF-PF Relay Communication over GuC. + */ +struct xe_guc_relay { + /**@lock: protects all internal data. */ + spinlock_t lock; + + /** @worker: dispatches incoming action messages. */ + struct work_struct worker; + + /** @pending_relays: list of sent requests that await a response. */ + struct list_head pending_relays; + + /** @incoming_actions: list of incoming relay action messages to process. */ + struct list_head incoming_actions; + + /** @pool: pool of the relay message buffers. */ + mempool_t pool; + + /** @last_rid: last Relay-ID used while sending a message. 
+	u32 last_rid;
+};
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
index 16de203c62a7e1..dc6059de669c3f 100644
--- a/drivers/gpu/drm/xe/xe_guc_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_types.h
@@ -15,6 +15,7 @@
 #include "xe_guc_fwif.h"
 #include "xe_guc_log_types.h"
 #include "xe_guc_pc_types.h"
+#include "xe_guc_relay_types.h"
 #include "xe_uc_fw_types.h"
 
 /**
@@ -85,6 +86,9 @@ struct xe_guc {
 		u32 size;
 	} hwconfig;
 
+	/** @relay: GuC Relay Communication used in SR-IOV */
+	struct xe_guc_relay relay;
+
 	/**
 	 * @notify_reg: Register which is written to notify GuC of H2G messages
 	 */

From 4469eae6bc52b3746b39941f90b9213bcef0255a Mon Sep 17 00:00:00 2001
From: Michal Wajdeczko
Date: Thu, 4 Jan 2024 23:20:29 +0100
Subject: [PATCH 039/200] drm/xe/kunit: Allow to replace xe_guc_ct_send_recv()
 with stub
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We want to use replacement functions in upcoming kunit tests.

Reviewed-by: Piotr Piórkowski
Link: https://lore.kernel.org/r/20240104222031.277-9-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko
---
 drivers/gpu/drm/xe/xe_guc_ct.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 4cde93c18a2d41..8f208267ffc638 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -9,6 +9,8 @@
 #include
 #include
 
+#include
+
 #include
 
 #include "abi/guc_actions_abi.h"
@@ -782,6 +784,7 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
 int xe_guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
 			u32 *response_buffer)
 {
+	KUNIT_STATIC_STUB_REDIRECT(xe_guc_ct_send_recv, ct, action, len, response_buffer);
 	return guc_ct_send_recv(ct, action, len, response_buffer, false);
 }
 
From 927b042a8daf2c773fd1802b388e22ca6087235c Mon Sep 17 00:00:00 2001
From: Michal Wajdeczko
Date: Thu, 4 Jan 2024 23:20:30 +0100
Subject: [PATCH 040/200] drm/xe/kunit: Add GuC Relay kunit tests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add a few tests to make sure that some negative and normal use
scenarios of the GuC Relay are handled correctly.
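
For readers unfamiliar with the KUnit static-stub machinery these tests
rely on (wired up in the previous patch), a minimal sketch of the
pattern follows; the function names here are illustrative only, not
taken from the driver:

	#include <kunit/static_stub.h>
	#include <kunit/test.h>

	static int real_op(int x)
	{
		/* Tests may divert this call; a no-op outside KUnit runs. */
		KUNIT_STATIC_STUB_REDIRECT(real_op, x);
		return x + 1;
	}

	static int fake_op(int x)
	{
		return 42;	/* canned result used by the test */
	}

	static void example_test(struct kunit *test)
	{
		/* The stub stays active until the test exits. */
		kunit_activate_static_stub(test, real_op, fake_op);
		KUNIT_EXPECT_EQ(test, real_op(0), 42);
	}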
Reviewed-by: Piotr Piórkowski Link: https://lore.kernel.org/r/20240104222031.277-10-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/tests/xe_guc_relay_test.c | 522 +++++++++++++++++++ drivers/gpu/drm/xe/xe_guc_relay.c | 21 +- 2 files changed, 540 insertions(+), 3 deletions(-) create mode 100644 drivers/gpu/drm/xe/tests/xe_guc_relay_test.c diff --git a/drivers/gpu/drm/xe/tests/xe_guc_relay_test.c b/drivers/gpu/drm/xe/tests/xe_guc_relay_test.c new file mode 100644 index 00000000000000..13701451b92351 --- /dev/null +++ b/drivers/gpu/drm/xe/tests/xe_guc_relay_test.c @@ -0,0 +1,522 @@ +// SPDX-License-Identifier: GPL-2.0 AND MIT +/* + * Copyright © 2023 Intel Corporation + */ + +#include +#include +#include + +#include "xe_device.h" +#include "xe_kunit_helpers.h" +#include "xe_pci_test.h" + +#define TEST_RID 1234 +#define TEST_VFID 5 +#define TEST_LEN 6 +#define TEST_ACTION 0xa +#define TEST_DATA(n) (0xd0 + (n)) + +static int replacement_relay_get_totalvfs(struct xe_guc_relay *relay) +{ + return TEST_VFID; +} + +static int relay_test_init(struct kunit *test) +{ + struct xe_pci_fake_data fake = { + .sriov_mode = XE_SRIOV_MODE_PF, + .platform = XE_TIGERLAKE, /* some random platform */ + .subplatform = XE_SUBPLATFORM_NONE, + }; + struct xe_guc_relay *relay; + struct xe_device *xe; + + test->priv = &fake; + xe_kunit_helper_xe_device_test_init(test); + + xe = test->priv; + KUNIT_ASSERT_EQ(test, xe_sriov_init(xe), 0); + + relay = &xe_device_get_gt(xe, 0)->uc.guc.relay; + kunit_activate_static_stub(test, relay_get_totalvfs, + replacement_relay_get_totalvfs); + + KUNIT_ASSERT_EQ(test, xe_guc_relay_init(relay), 0); + KUNIT_EXPECT_TRUE(test, relay_is_ready(relay)); + relay->last_rid = TEST_RID - 1; + + test->priv = relay; + return 0; +} + +static const u32 TEST_MSG[TEST_LEN] = { + FIELD_PREP_CONST(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP_CONST(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) | + FIELD_PREP_CONST(GUC_HXG_EVENT_MSG_0_ACTION, TEST_ACTION) | + FIELD_PREP_CONST(GUC_HXG_EVENT_MSG_0_DATA0, TEST_DATA(0)), + TEST_DATA(1), TEST_DATA(2), TEST_DATA(3), TEST_DATA(4), +}; + +static int replacement_xe_guc_ct_send_recv_always_fails(struct xe_guc_ct *ct, + const u32 *msg, u32 len, + u32 *response_buffer) +{ + struct kunit *test = kunit_get_current_test(); + + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ct); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, msg); + KUNIT_ASSERT_GE(test, len, GUC_HXG_MSG_MIN_LEN); + + return -ECOMM; +} + +static int replacement_xe_guc_ct_send_recv_expects_pf2guc_relay(struct xe_guc_ct *ct, + const u32 *msg, u32 len, + u32 *response_buffer) +{ + struct kunit *test = kunit_get_current_test(); + + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ct); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, msg); + KUNIT_ASSERT_GE(test, len, PF2GUC_RELAY_TO_VF_REQUEST_MSG_MIN_LEN); + KUNIT_ASSERT_EQ(test, len, PF2GUC_RELAY_TO_VF_REQUEST_MSG_MIN_LEN + TEST_LEN); + KUNIT_EXPECT_EQ(test, GUC_HXG_ORIGIN_HOST, FIELD_GET(GUC_HXG_MSG_0_ORIGIN, msg[0])); + KUNIT_EXPECT_EQ(test, GUC_HXG_TYPE_REQUEST, FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[0])); + KUNIT_EXPECT_EQ(test, XE_GUC_ACTION_PF2GUC_RELAY_TO_VF, + FIELD_GET(GUC_HXG_REQUEST_MSG_0_ACTION, msg[0])); + KUNIT_EXPECT_EQ(test, TEST_VFID, + FIELD_GET(PF2GUC_RELAY_TO_VF_REQUEST_MSG_1_VFID, msg[1])); + KUNIT_EXPECT_EQ(test, TEST_RID, + FIELD_GET(PF2GUC_RELAY_TO_VF_REQUEST_MSG_2_RELAY_ID, msg[2])); + KUNIT_EXPECT_MEMEQ(test, TEST_MSG, msg + PF2GUC_RELAY_TO_VF_REQUEST_MSG_MIN_LEN, + sizeof(u32) * TEST_LEN); + return 0; +} + +static const u32 
test_guc2pf[GUC2PF_RELAY_FROM_VF_EVENT_MSG_MAX_LEN] = { + /* transport */ + FIELD_PREP_CONST(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_GUC) | + FIELD_PREP_CONST(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) | + FIELD_PREP_CONST(GUC_HXG_EVENT_MSG_0_ACTION, XE_GUC_ACTION_GUC2PF_RELAY_FROM_VF), + FIELD_PREP_CONST(GUC2PF_RELAY_FROM_VF_EVENT_MSG_1_VFID, TEST_VFID), + FIELD_PREP_CONST(GUC2PF_RELAY_FROM_VF_EVENT_MSG_2_RELAY_ID, TEST_RID), + /* payload */ + FIELD_PREP_CONST(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP_CONST(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_RESPONSE_SUCCESS), +}; + +static const u32 test_guc2vf[GUC2VF_RELAY_FROM_PF_EVENT_MSG_MAX_LEN] = { + /* transport */ + FIELD_PREP_CONST(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_GUC) | + FIELD_PREP_CONST(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) | + FIELD_PREP_CONST(GUC_HXG_EVENT_MSG_0_ACTION, XE_GUC_ACTION_GUC2VF_RELAY_FROM_PF), + FIELD_PREP_CONST(GUC2VF_RELAY_FROM_PF_EVENT_MSG_1_RELAY_ID, TEST_RID), + /* payload */ + FIELD_PREP_CONST(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP_CONST(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_RESPONSE_SUCCESS), +}; + +static void pf_rejects_guc2pf_too_short(struct kunit *test) +{ + const u32 len = GUC2PF_RELAY_FROM_VF_EVENT_MSG_MIN_LEN - 1; + struct xe_guc_relay *relay = test->priv; + const u32 *msg = test_guc2pf; + + KUNIT_ASSERT_EQ(test, -EPROTO, xe_guc_relay_process_guc2pf(relay, msg, len)); +} + +static void pf_rejects_guc2pf_too_long(struct kunit *test) +{ + const u32 len = GUC2PF_RELAY_FROM_VF_EVENT_MSG_MAX_LEN + 1; + struct xe_guc_relay *relay = test->priv; + const u32 *msg = test_guc2pf; + + KUNIT_ASSERT_EQ(test, -EMSGSIZE, xe_guc_relay_process_guc2pf(relay, msg, len)); +} + +static void pf_rejects_guc2pf_no_payload(struct kunit *test) +{ + const u32 len = GUC2PF_RELAY_FROM_VF_EVENT_MSG_MIN_LEN; + struct xe_guc_relay *relay = test->priv; + const u32 *msg = test_guc2pf; + + KUNIT_ASSERT_EQ(test, -EPROTO, xe_guc_relay_process_guc2pf(relay, msg, len)); +} + +static void pf_fails_no_payload(struct kunit *test) +{ + struct xe_guc_relay *relay = test->priv; + const u32 msg = 0; + + KUNIT_ASSERT_EQ(test, -EPROTO, relay_process_msg(relay, TEST_VFID, TEST_RID, &msg, 0)); +} + +static void pf_fails_bad_origin(struct kunit *test) +{ + struct xe_guc_relay *relay = test->priv; + static const u32 msg[] = { + FIELD_PREP_CONST(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_GUC) | + FIELD_PREP_CONST(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_RESPONSE_SUCCESS), + }; + u32 len = ARRAY_SIZE(msg); + + KUNIT_ASSERT_EQ(test, -EPROTO, relay_process_msg(relay, TEST_VFID, TEST_RID, msg, len)); +} + +static void pf_fails_bad_type(struct kunit *test) +{ + struct xe_guc_relay *relay = test->priv; + const u32 msg[] = { + FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, 4), /* only 4 is undefined */ + }; + u32 len = ARRAY_SIZE(msg); + + KUNIT_ASSERT_EQ(test, -EBADRQC, relay_process_msg(relay, TEST_VFID, TEST_RID, msg, len)); +} + +static void pf_txn_reports_error(struct kunit *test) +{ + struct xe_guc_relay *relay = test->priv; + struct relay_transaction *txn; + + txn = __relay_get_transaction(relay, false, TEST_VFID, TEST_RID, + TEST_MSG, TEST_LEN, NULL, 0); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, txn); + + kunit_activate_static_stub(test, xe_guc_ct_send_recv, + replacement_xe_guc_ct_send_recv_always_fails); + KUNIT_EXPECT_EQ(test, -ECOMM, relay_send_transaction(relay, txn)); + + relay_release_transaction(relay, txn); +} + +static void pf_txn_sends_pf2guc(struct kunit *test) +{ + struct xe_guc_relay *relay = 
test->priv; + struct relay_transaction *txn; + + txn = __relay_get_transaction(relay, false, TEST_VFID, TEST_RID, + TEST_MSG, TEST_LEN, NULL, 0); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, txn); + + kunit_activate_static_stub(test, xe_guc_ct_send_recv, + replacement_xe_guc_ct_send_recv_expects_pf2guc_relay); + KUNIT_ASSERT_EQ(test, 0, relay_send_transaction(relay, txn)); + + relay_release_transaction(relay, txn); +} + +static void pf_sends_pf2guc(struct kunit *test) +{ + struct xe_guc_relay *relay = test->priv; + + kunit_activate_static_stub(test, xe_guc_ct_send_recv, + replacement_xe_guc_ct_send_recv_expects_pf2guc_relay); + KUNIT_ASSERT_EQ(test, 0, + xe_guc_relay_send_to_vf(relay, TEST_VFID, + TEST_MSG, TEST_LEN, NULL, 0)); +} + +static int replacement_xe_guc_ct_send_recv_loopback_relay(struct xe_guc_ct *ct, + const u32 *msg, u32 len, + u32 *response_buffer) +{ + struct kunit *test = kunit_get_current_test(); + struct xe_guc_relay *relay = test->priv; + u32 *reply = kunit_kzalloc(test, len * sizeof(u32), GFP_KERNEL); + int (*guc2relay)(struct xe_guc_relay *, const u32 *, u32); + u32 action; + int err; + + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ct); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, msg); + KUNIT_ASSERT_GE(test, len, GUC_HXG_MSG_MIN_LEN); + KUNIT_ASSERT_EQ(test, GUC_HXG_TYPE_REQUEST, + FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[0])); + KUNIT_ASSERT_GE(test, len, GUC_HXG_REQUEST_MSG_MIN_LEN); + KUNIT_ASSERT_NOT_NULL(test, reply); + + switch (FIELD_GET(GUC_HXG_REQUEST_MSG_0_ACTION, msg[0])) { + case XE_GUC_ACTION_PF2GUC_RELAY_TO_VF: + KUNIT_ASSERT_GE(test, len, PF2GUC_RELAY_TO_VF_REQUEST_MSG_MIN_LEN); + action = XE_GUC_ACTION_GUC2PF_RELAY_FROM_VF; + guc2relay = xe_guc_relay_process_guc2pf; + break; + case XE_GUC_ACTION_VF2GUC_RELAY_TO_PF: + KUNIT_ASSERT_GE(test, len, VF2GUC_RELAY_TO_PF_REQUEST_MSG_MIN_LEN); + action = XE_GUC_ACTION_GUC2VF_RELAY_FROM_PF; + guc2relay = xe_guc_relay_process_guc2vf; + break; + default: + KUNIT_FAIL(test, "bad RELAY action %#x", msg[0]); + return -EINVAL; + } + + reply[0] = FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_GUC) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) | + FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION, action); + memcpy(reply + 1, msg + 1, sizeof(u32) * (len - 1)); + + err = guc2relay(relay, reply, len); + KUNIT_EXPECT_EQ(test, err, 0); + + return err; +} + +static void test_requires_relay_testloop(struct kunit *test) +{ + /* + * The debug relay action GUC_RELAY_ACTION_VFXPF_TESTLOOP is available + * only on builds with CONFIG_DRM_XE_DEBUG_SRIOV enabled. + * See "kunit.py --kconfig_add" option if it's missing. 
+ */ + if (!IS_ENABLED(CONFIG_DRM_XE_DEBUG_SRIOV)) + kunit_skip(test, "requires %s\n", __stringify(CONFIG_DRM_XE_DEBUG_SRIOV)); +} + +static void pf_loopback_nop(struct kunit *test) +{ + struct xe_guc_relay *relay = test->priv; + u32 request[] = { + FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_RELAY_ACTION_VFXPF_TESTLOOP) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_DATA0, VFXPF_TESTLOOP_OPCODE_NOP), + }; + u32 response[GUC_HXG_RESPONSE_MSG_MIN_LEN]; + int ret; + + test_requires_relay_testloop(test); + + kunit_activate_static_stub(test, relay_kick_worker, relay_process_incoming_action); + kunit_activate_static_stub(test, xe_guc_ct_send_recv, + replacement_xe_guc_ct_send_recv_loopback_relay); + ret = xe_guc_relay_send_to_vf(relay, TEST_VFID, + request, ARRAY_SIZE(request), + response, ARRAY_SIZE(response)); + KUNIT_ASSERT_EQ(test, ret, GUC_HXG_RESPONSE_MSG_MIN_LEN); + KUNIT_EXPECT_EQ(test, FIELD_GET(GUC_HXG_MSG_0_ORIGIN, response[0]), + GUC_HXG_ORIGIN_HOST); + KUNIT_EXPECT_EQ(test, FIELD_GET(GUC_HXG_MSG_0_TYPE, response[0]), + GUC_HXG_TYPE_RESPONSE_SUCCESS); + KUNIT_EXPECT_EQ(test, FIELD_GET(GUC_HXG_RESPONSE_MSG_0_DATA0, response[0]), 0); +} + +static void pf_loopback_echo(struct kunit *test) +{ + struct xe_guc_relay *relay = test->priv; + u32 request[] = { + FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_RELAY_ACTION_VFXPF_TESTLOOP) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_DATA0, VFXPF_TESTLOOP_OPCODE_ECHO), + TEST_DATA(1), TEST_DATA(2), TEST_DATA(3), TEST_DATA(4), + }; + u32 response[ARRAY_SIZE(request)]; + unsigned int n; + int ret; + + test_requires_relay_testloop(test); + + kunit_activate_static_stub(test, relay_kick_worker, relay_process_incoming_action); + kunit_activate_static_stub(test, xe_guc_ct_send_recv, + replacement_xe_guc_ct_send_recv_loopback_relay); + ret = xe_guc_relay_send_to_vf(relay, TEST_VFID, + request, ARRAY_SIZE(request), + response, ARRAY_SIZE(response)); + KUNIT_ASSERT_EQ(test, ret, ARRAY_SIZE(response)); + KUNIT_EXPECT_EQ(test, FIELD_GET(GUC_HXG_MSG_0_ORIGIN, response[0]), + GUC_HXG_ORIGIN_HOST); + KUNIT_EXPECT_EQ(test, FIELD_GET(GUC_HXG_MSG_0_TYPE, response[0]), + GUC_HXG_TYPE_RESPONSE_SUCCESS); + KUNIT_EXPECT_EQ(test, FIELD_GET(GUC_HXG_RESPONSE_MSG_0_DATA0, response[0]), + ARRAY_SIZE(response)); + for (n = GUC_HXG_RESPONSE_MSG_MIN_LEN; n < ret; n++) + KUNIT_EXPECT_EQ(test, request[n], response[n]); +} + +static void pf_loopback_fail(struct kunit *test) +{ + struct xe_guc_relay *relay = test->priv; + u32 request[] = { + FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_RELAY_ACTION_VFXPF_TESTLOOP) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_DATA0, VFXPF_TESTLOOP_OPCODE_FAIL), + }; + u32 response[GUC_HXG_RESPONSE_MSG_MIN_LEN]; + int ret; + + test_requires_relay_testloop(test); + + kunit_activate_static_stub(test, relay_kick_worker, relay_process_incoming_action); + kunit_activate_static_stub(test, xe_guc_ct_send_recv, + replacement_xe_guc_ct_send_recv_loopback_relay); + ret = xe_guc_relay_send_to_vf(relay, TEST_VFID, + request, ARRAY_SIZE(request), + response, ARRAY_SIZE(response)); + KUNIT_ASSERT_EQ(test, ret, -EREMOTEIO); +} + +static void pf_loopback_busy(struct kunit *test) +{ + struct xe_guc_relay *relay = test->priv; + u32 request[] = { + 
FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_RELAY_ACTION_VFXPF_TESTLOOP) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_DATA0, VFXPF_TESTLOOP_OPCODE_BUSY), + TEST_DATA(0xb), + }; + u32 response[GUC_HXG_RESPONSE_MSG_MIN_LEN]; + int ret; + + test_requires_relay_testloop(test); + + kunit_activate_static_stub(test, relay_testonly_nop, relay_process_incoming_action); + kunit_activate_static_stub(test, relay_kick_worker, relay_process_incoming_action); + kunit_activate_static_stub(test, xe_guc_ct_send_recv, + replacement_xe_guc_ct_send_recv_loopback_relay); + ret = xe_guc_relay_send_to_vf(relay, TEST_VFID, + request, ARRAY_SIZE(request), + response, ARRAY_SIZE(response)); + KUNIT_ASSERT_EQ(test, ret, GUC_HXG_RESPONSE_MSG_MIN_LEN); +} + +static void pf_loopback_retry(struct kunit *test) +{ + struct xe_guc_relay *relay = test->priv; + u32 request[] = { + FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_RELAY_ACTION_VFXPF_TESTLOOP) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_DATA0, VFXPF_TESTLOOP_OPCODE_RETRY), + TEST_DATA(0xd), TEST_DATA(0xd), + }; + u32 response[GUC_HXG_RESPONSE_MSG_MIN_LEN]; + int ret; + + test_requires_relay_testloop(test); + + kunit_activate_static_stub(test, relay_kick_worker, relay_process_incoming_action); + kunit_activate_static_stub(test, xe_guc_ct_send_recv, + replacement_xe_guc_ct_send_recv_loopback_relay); + ret = xe_guc_relay_send_to_vf(relay, TEST_VFID, + request, ARRAY_SIZE(request), + response, ARRAY_SIZE(response)); + KUNIT_ASSERT_EQ(test, ret, GUC_HXG_RESPONSE_MSG_MIN_LEN); +} + +static struct kunit_case pf_relay_test_cases[] = { + KUNIT_CASE(pf_rejects_guc2pf_too_short), + KUNIT_CASE(pf_rejects_guc2pf_too_long), + KUNIT_CASE(pf_rejects_guc2pf_no_payload), + KUNIT_CASE(pf_fails_no_payload), + KUNIT_CASE(pf_fails_bad_origin), + KUNIT_CASE(pf_fails_bad_type), + KUNIT_CASE(pf_txn_reports_error), + KUNIT_CASE(pf_txn_sends_pf2guc), + KUNIT_CASE(pf_sends_pf2guc), + KUNIT_CASE(pf_loopback_nop), + KUNIT_CASE(pf_loopback_echo), + KUNIT_CASE(pf_loopback_fail), + KUNIT_CASE_SLOW(pf_loopback_busy), + KUNIT_CASE_SLOW(pf_loopback_retry), + {} +}; + +static struct kunit_suite pf_relay_suite = { + .name = "pf_relay", + .test_cases = pf_relay_test_cases, + .init = relay_test_init, +}; + +static void vf_rejects_guc2vf_too_short(struct kunit *test) +{ + const u32 len = GUC2VF_RELAY_FROM_PF_EVENT_MSG_MIN_LEN - 1; + struct xe_guc_relay *relay = test->priv; + const u32 *msg = test_guc2vf; + + KUNIT_ASSERT_EQ(test, -EPROTO, xe_guc_relay_process_guc2vf(relay, msg, len)); +} + +static void vf_rejects_guc2vf_too_long(struct kunit *test) +{ + const u32 len = GUC2VF_RELAY_FROM_PF_EVENT_MSG_MAX_LEN + 1; + struct xe_guc_relay *relay = test->priv; + const u32 *msg = test_guc2vf; + + KUNIT_ASSERT_EQ(test, -EMSGSIZE, xe_guc_relay_process_guc2vf(relay, msg, len)); +} + +static void vf_rejects_guc2vf_no_payload(struct kunit *test) +{ + const u32 len = GUC2VF_RELAY_FROM_PF_EVENT_MSG_MIN_LEN; + struct xe_guc_relay *relay = test->priv; + const u32 *msg = test_guc2vf; + + KUNIT_ASSERT_EQ(test, -EPROTO, xe_guc_relay_process_guc2vf(relay, msg, len)); +} + +static struct kunit_case vf_relay_test_cases[] = { + KUNIT_CASE(vf_rejects_guc2vf_too_short), + KUNIT_CASE(vf_rejects_guc2vf_too_long), + KUNIT_CASE(vf_rejects_guc2vf_no_payload), + {} +}; + +static struct kunit_suite vf_relay_suite = { + .name 
= "vf_relay", + .test_cases = vf_relay_test_cases, + .init = relay_test_init, +}; + +static void xe_drops_guc2pf_if_not_ready(struct kunit *test) +{ + struct xe_device *xe = test->priv; + struct xe_guc_relay *relay = &xe_device_get_gt(xe, 0)->uc.guc.relay; + const u32 *msg = test_guc2pf; + u32 len = GUC2PF_RELAY_FROM_VF_EVENT_MSG_MIN_LEN + GUC_RELAY_MSG_MIN_LEN; + + KUNIT_ASSERT_EQ(test, -ENODEV, xe_guc_relay_process_guc2pf(relay, msg, len)); +} + +static void xe_drops_guc2vf_if_not_ready(struct kunit *test) +{ + struct xe_device *xe = test->priv; + struct xe_guc_relay *relay = &xe_device_get_gt(xe, 0)->uc.guc.relay; + const u32 *msg = test_guc2vf; + u32 len = GUC2VF_RELAY_FROM_PF_EVENT_MSG_MIN_LEN + GUC_RELAY_MSG_MIN_LEN; + + KUNIT_ASSERT_EQ(test, -ENODEV, xe_guc_relay_process_guc2vf(relay, msg, len)); +} + +static void xe_rejects_send_if_not_ready(struct kunit *test) +{ + struct xe_device *xe = test->priv; + struct xe_guc_relay *relay = &xe_device_get_gt(xe, 0)->uc.guc.relay; + u32 msg[GUC_RELAY_MSG_MIN_LEN]; + u32 len = ARRAY_SIZE(msg); + + KUNIT_ASSERT_EQ(test, -ENODEV, xe_guc_relay_send_to_pf(relay, msg, len, NULL, 0)); + KUNIT_ASSERT_EQ(test, -ENODEV, relay_send_to(relay, TEST_VFID, msg, len, NULL, 0)); +} + +static struct kunit_case no_relay_test_cases[] = { + KUNIT_CASE(xe_drops_guc2pf_if_not_ready), + KUNIT_CASE(xe_drops_guc2vf_if_not_ready), + KUNIT_CASE(xe_rejects_send_if_not_ready), + {} +}; + +static struct kunit_suite no_relay_suite = { + .name = "no_relay", + .test_cases = no_relay_test_cases, + .init = xe_kunit_helper_xe_device_test_init, +}; + +kunit_test_suites(&no_relay_suite, + &pf_relay_suite, + &vf_relay_suite); diff --git a/drivers/gpu/drm/xe/xe_guc_relay.c b/drivers/gpu/drm/xe/xe_guc_relay.c index 0e738325640822..b772088979bddc 100644 --- a/drivers/gpu/drm/xe/xe_guc_relay.c +++ b/drivers/gpu/drm/xe/xe_guc_relay.c @@ -8,6 +8,8 @@ #include +#include + #include "abi/guc_actions_sriov_abi.h" #include "abi/guc_relay_actions_abi.h" #include "abi/guc_relay_communication_abi.h" @@ -60,6 +62,7 @@ static int relay_get_totalvfs(struct xe_guc_relay *relay) struct xe_device *xe = relay_to_xe(relay); struct pci_dev *pdev = to_pci_dev(xe->drm.dev); + KUNIT_STATIC_STUB_REDIRECT(relay_get_totalvfs, relay); return IS_SRIOV_VF(xe) ? 
0 : pci_sriov_get_totalvfs(pdev); } @@ -392,6 +395,11 @@ static u32 prepare_error_reply(u32 *msg, u32 error, u32 hint) return GUC_HXG_FAILURE_MSG_LEN; } +static void relay_testonly_nop(struct xe_guc_relay *relay) +{ + KUNIT_STATIC_STUB_REDIRECT(relay_testonly_nop, relay); +} + static int relay_send_message_and_wait(struct xe_guc_relay *relay, struct relay_transaction *txn, u32 *buf, u32 buf_size) @@ -434,8 +442,10 @@ static int relay_send_message_and_wait(struct xe_guc_relay *relay, reinit_completion(&txn->done); if (txn->reply == -EAGAIN) goto resend; - if (txn->reply == -EBUSY) + if (txn->reply == -EBUSY) { + relay_testonly_nop(relay); goto wait; + } if (txn->reply > 0) ret = from_relay_error(txn->reply); else @@ -751,6 +761,7 @@ static bool relay_needs_worker(struct xe_guc_relay *relay) static void relay_kick_worker(struct xe_guc_relay *relay) { + KUNIT_STATIC_STUB_REDIRECT(relay_kick_worker, relay); queue_work(relay_to_xe(relay)->sriov.wq, &relay->worker); } @@ -849,7 +860,7 @@ int xe_guc_relay_process_guc2vf(struct xe_guc_relay *relay, const u32 *msg, u32 relay_assert(relay, FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, msg[0]) == XE_GUC_ACTION_GUC2VF_RELAY_FROM_PF); - if (unlikely(!IS_SRIOV_VF(relay_to_xe(relay)))) + if (unlikely(!IS_SRIOV_VF(relay_to_xe(relay)) && !kunit_get_current_test())) return -EPERM; if (unlikely(!relay_is_ready(relay))) @@ -895,7 +906,7 @@ int xe_guc_relay_process_guc2pf(struct xe_guc_relay *relay, const u32 *msg, u32 relay_assert(relay, FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, msg[0]) == XE_GUC_ACTION_GUC2PF_RELAY_FROM_VF); - if (unlikely(!IS_SRIOV_PF(relay_to_xe(relay)))) + if (unlikely(!IS_SRIOV_PF(relay_to_xe(relay)) && !kunit_get_current_test())) return -EPERM; if (unlikely(!relay_is_ready(relay))) @@ -923,3 +934,7 @@ int xe_guc_relay_process_guc2pf(struct xe_guc_relay *relay, const u32 *msg, u32 return err; } #endif + +#if IS_BUILTIN(CONFIG_DRM_XE_KUNIT_TEST) +#include "tests/xe_guc_relay_test.c" +#endif From 26d4481ac23fe16bb7d64d2b43db250bcb65003e Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Thu, 4 Jan 2024 23:20:31 +0100 Subject: [PATCH 041/200] drm/xe/guc: Start handling GuC Relay event messages MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit GuC Relay infrastructure is ready, start handling relay messages from the GuC to unblock testing on the live system. 
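
As a quick reference for how these events are routed, this is roughly
how the first dword of a GuC HXG message is decoded before the relay
handlers are invoked (field and action names as used throughout this
series; a sketch, not the exact CT handler code):

	u32 hdr = msg[0];
	u32 origin = FIELD_GET(GUC_HXG_MSG_0_ORIGIN, hdr);	/* GUC or HOST */
	u32 type = FIELD_GET(GUC_HXG_MSG_0_TYPE, hdr);		/* e.g. EVENT */
	u32 action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, hdr);

	if (action == XE_GUC_ACTION_GUC2PF_RELAY_FROM_VF ||
	    action == XE_GUC_ACTION_GUC2VF_RELAY_FROM_PF)
		/* dispatched to xe_guc_relay_process_guc2{pf,vf}() */;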
Reviewed-by: Piotr Piórkowski
Link: https://lore.kernel.org/r/20240104222031.277-11-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko
---
 drivers/gpu/drm/xe/xe_guc_ct.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 8f208267ffc638..c29f095aa1b98a 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -14,6 +14,7 @@
 #include
 
 #include "abi/guc_actions_abi.h"
+#include "abi/guc_actions_sriov_abi.h"
 #include "abi/guc_klvs_abi.h"
 #include "xe_bo.h"
 #include "xe_device.h"
@@ -22,6 +23,7 @@
 #include "xe_gt_printk.h"
 #include "xe_gt_tlb_invalidation.h"
 #include "xe_guc.h"
+#include "xe_guc_relay.h"
 #include "xe_guc_submit.h"
 #include "xe_map.h"
 #include "xe_pm.h"
@@ -969,6 +971,12 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
 		ret = xe_guc_access_counter_notify_handler(guc, payload, adj_len);
 		break;
+	case XE_GUC_ACTION_GUC2PF_RELAY_FROM_VF:
+		ret = xe_guc_relay_process_guc2pf(&guc->relay, payload, adj_len);
+		break;
+	case XE_GUC_ACTION_GUC2VF_RELAY_FROM_PF:
+		ret = xe_guc_relay_process_guc2vf(&guc->relay, payload, adj_len);
+		break;
 	default:
 		drm_err(&xe->drm, "unexpected action 0x%04x\n", action);
 	}

From 0d68d06553ee9ad6d4ffc000599765211cad4930 Mon Sep 17 00:00:00 2001
From: Ruthuvikas Ravikumar
Date: Sat, 23 Dec 2023 00:53:52 +0530
Subject: [PATCH 042/200] drm/xe: Add mocs reset kunit

This kunit test verifies that the MOCS register contents match the
KMD-programmed values both before and after GT reset.

v2: Remove extra blank lines between the local variable definitions
    (Matt Roper)

Cc: Matt Roper
Signed-off-by: Ruthuvikas Ravikumar
Signed-off-by: Janga Rahul Kumar
Reviewed-by: Matt Roper
Link: https://lore.kernel.org/r/20231222192352.927101-1-janga.rahul.kumar@intel.com
Signed-off-by: Matt Roper
---
 drivers/gpu/drm/xe/tests/xe_mocs.c      | 36 +++++++++++++++++++++++++
 drivers/gpu/drm/xe/tests/xe_mocs_test.c |  1 +
 drivers/gpu/drm/xe/tests/xe_mocs_test.h |  1 +
 3 files changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/xe/tests/xe_mocs.c b/drivers/gpu/drm/xe/tests/xe_mocs.c
index 7dd34f94e8094c..df5c36b70ab45b 100644
--- a/drivers/gpu/drm/xe/tests/xe_mocs.c
+++ b/drivers/gpu/drm/xe/tests/xe_mocs.c
@@ -128,3 +128,39 @@ void xe_live_mocs_kernel_kunit(struct kunit *test)
 	xe_call_for_each_device(mocs_kernel_test_run_device);
 }
 EXPORT_SYMBOL_IF_KUNIT(xe_live_mocs_kernel_kunit);
+
+static int mocs_reset_test_run_device(struct xe_device *xe)
+{
+	/* Check the mocs setup is retained over GT reset */
+
+	struct live_mocs mocs;
+	struct xe_gt *gt;
+	unsigned int flags;
+	int id;
+	struct kunit *test = xe_cur_kunit();
+
+	for_each_gt(gt, xe, id) {
+		flags = live_mocs_init(&mocs, gt);
+		kunit_info(test, "mocs_reset_test before reset\n");
+		if (flags & HAS_GLOBAL_MOCS)
+			read_mocs_table(gt, &mocs.table);
+		if (flags & HAS_LNCF_MOCS)
+			read_l3cc_table(gt, &mocs.table);
+
+		xe_gt_reset_async(gt);
+		flush_work(&gt->reset.worker);
+
+		kunit_info(test, "mocs_reset_test after reset\n");
+		if (flags & HAS_GLOBAL_MOCS)
+			read_mocs_table(gt, &mocs.table);
+		if (flags & HAS_LNCF_MOCS)
+			read_l3cc_table(gt, &mocs.table);
+	}
+	return 0;
+}
+
+void xe_live_mocs_reset_kunit(struct kunit *test)
+{
+	xe_call_for_each_device(mocs_reset_test_run_device);
+}
+EXPORT_SYMBOL_IF_KUNIT(xe_live_mocs_reset_kunit);
diff --git a/drivers/gpu/drm/xe/tests/xe_mocs_test.c b/drivers/gpu/drm/xe/tests/xe_mocs_test.c
index ef56bd517b28c2..4f62e7a4270bdb 100644
--- a/drivers/gpu/drm/xe/tests/xe_mocs_test.c
+++ b/drivers/gpu/drm/xe/tests/xe_mocs_test.c
@@ -9,6 +9,7 @@
 
 static struct kunit_case xe_mocs_tests[] = {
 	KUNIT_CASE(xe_live_mocs_kernel_kunit),
+	KUNIT_CASE(xe_live_mocs_reset_kunit),
 	{}
 };
 
diff --git a/drivers/gpu/drm/xe/tests/xe_mocs_test.h b/drivers/gpu/drm/xe/tests/xe_mocs_test.h
index 7faa3575e6c3a9..e7699d4954115a 100644
--- a/drivers/gpu/drm/xe/tests/xe_mocs_test.h
+++ b/drivers/gpu/drm/xe/tests/xe_mocs_test.h
@@ -9,5 +9,6 @@ struct kunit;
 
 void xe_live_mocs_kernel_kunit(struct kunit *test);
+void xe_live_mocs_reset_kunit(struct kunit *test);
 
 #endif

From 2b35ae108c7f5486adbc9e70377110ab8c91f61b Mon Sep 17 00:00:00 2001
From: Michal Wajdeczko
Date: Fri, 5 Jan 2024 18:19:47 +0100
Subject: [PATCH 043/200] drm/xe: Fix compilation without CONFIG_KUNIT
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

It looks like the declaration of kunit_get_current_test() was available
in xe_guc_relay.c only with CONFIG_KUNIT enabled. If CONFIG_KUNIT is
disabled, we fail with:

In file included from ../include/linux/build_bug.h:5,
                 from ../include/linux/bitfield.h:10,
                 from ../drivers/gpu/drm/xe/xe_guc_relay.c:6:
../drivers/gpu/drm/xe/xe_guc_relay.c: In function ‘xe_guc_relay_process_guc2vf’:
../drivers/gpu/drm/xe/xe_guc_relay.c:863:52: error: implicit declaration of function ‘kunit_get_current_test’ [-Werror=implicit-function-declaration]
  863 |  if (unlikely(!IS_SRIOV_VF(relay_to_xe(relay)) && !kunit_get_current_test()))
      |                                                    ^~~~~~~~~~~~~~~~~~~~~~
../include/linux/compiler.h:77:42: note: in definition of macro ‘unlikely’
   77 | # define unlikely(x) __builtin_expect(!!(x), 0)
      |                                          ^

Reported-by: Mika Kuoppala
Fixes: 811fe9f556fc ("drm/xe/guc: Introduce Relay Communication for SR-IOV")
Reviewed-by: José Roberto de Souza
Tested-by: Vinay Belgaumkar
Link: https://lore.kernel.org/r/20240105171947.321-1-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko
---
 drivers/gpu/drm/xe/xe_guc_relay.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/xe/xe_guc_relay.c b/drivers/gpu/drm/xe/xe_guc_relay.c
index b772088979bddc..c0a2d8d5d3b3cd 100644
--- a/drivers/gpu/drm/xe/xe_guc_relay.c
+++ b/drivers/gpu/drm/xe/xe_guc_relay.c
@@ -9,6 +9,7 @@
 
 #include
 #include
+#include
 
 #include "abi/guc_actions_sriov_abi.h"
 #include "abi/guc_relay_actions_abi.h"

From 264ed178781c05f87735e2712a34c4ab35b0c91c Mon Sep 17 00:00:00 2001
From: Colin Ian King
Date: Tue, 2 Jan 2024 09:20:14 +0000
Subject: [PATCH 044/200] drm/xe: Fix spelling mistake "gueue" -> "queue"

There is a spelling mistake in a drm_info message. Fix it.
Signed-off-by: Colin Ian King
Reviewed-by: Lucas De Marchi
Signed-off-by: Lucas De Marchi
Link: https://lore.kernel.org/r/20240102092014.3347566-1-colin.i.king@gmail.com
---
 drivers/gpu/drm/xe/xe_wait_user_fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_wait_user_fence.c b/drivers/gpu/drm/xe/xe_wait_user_fence.c
index a75eeba7bfe583..f69721339201d6 100644
--- a/drivers/gpu/drm/xe/xe_wait_user_fence.c
+++ b/drivers/gpu/drm/xe/xe_wait_user_fence.c
@@ -148,7 +148,7 @@ int xe_wait_user_fence_ioctl(struct drm_device *dev, void *data,
 
 		if (q) {
 			if (q->ops->reset_status(q)) {
-				drm_info(&xe->drm, "exec gueue reset detected\n");
+				drm_info(&xe->drm, "exec queue reset detected\n");
 				err = -EIO;
 				break;
 			}

From be8755a0a81866bbf89bf3fb03ae180978b5a91f Mon Sep 17 00:00:00 2001
From: Lucas De Marchi
Date: Thu, 21 Dec 2023 08:32:13 -0800
Subject: [PATCH 045/200] drm/xe/kunit: Drop xe_wa tests for pre-production DG2

As workarounds for pre-production DG2 were dropped in commit
707d1b992cfe ("drm/xe/dg2: Drop pre-production workarounds"), there's
no point running the kunit tests for them. Drop those steppings from
kunit.

Cc: Gustavo Sousa
Cc: Matt Roper
Reviewed-by: Matt Roper
Link: https://lore.kernel.org/r/20231221163213.3849523-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi
---
 drivers/gpu/drm/xe/tests/xe_wa_test.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/xe/tests/xe_wa_test.c b/drivers/gpu/drm/xe/tests/xe_wa_test.c
index 3d7f3ef62ae937..5cfd5844c17b3d 100644
--- a/drivers/gpu/drm/xe/tests/xe_wa_test.c
+++ b/drivers/gpu/drm/xe/tests/xe_wa_test.c
@@ -66,14 +66,8 @@ static const struct platform_test_case cases[] = {
 	PLATFORM_CASE(ALDERLAKE_P, C0),
 	SUBPLATFORM_CASE(ALDERLAKE_S, RPLS, D0),
 	SUBPLATFORM_CASE(ALDERLAKE_P, RPLU, E0),
-	SUBPLATFORM_CASE(DG2, G10, A0),
-	SUBPLATFORM_CASE(DG2, G10, A1),
-	SUBPLATFORM_CASE(DG2, G10, B0),
 	SUBPLATFORM_CASE(DG2, G10, C0),
-	SUBPLATFORM_CASE(DG2, G11, A0),
-	SUBPLATFORM_CASE(DG2, G11, B0),
 	SUBPLATFORM_CASE(DG2, G11, B1),
-	SUBPLATFORM_CASE(DG2, G12, A0),
 	SUBPLATFORM_CASE(DG2, G12, A1),
 	PLATFORM_CASE(PVC, B0),
 	PLATFORM_CASE(PVC, B1),

From ddb5bade29de7a3e1e1ce42df33f4a98f8a9f323 Mon Sep 17 00:00:00 2001
From: Nirmoy Das
Date: Thu, 4 Jan 2024 19:26:15 +0100
Subject: [PATCH 046/200] drm/xe/xe2: synchronise CS_CHICKEN1 with WMTP support

The recommendation is to read the FUSE4 register to check whether WMTP
has been enabled/disabled by HW. If enabled, we don't need to do
anything special; however, if disabled, the recommendation is to also
disable WMTP mode in the FF_SLICE_CS_CHICKEN2 register, falling back to
thread-group and mid-batch preemption only.

However on Linux, the per-context CS_CHICKEN1 is how userspace controls
pre-emption, so instead use the default lrc to disable WMTP using
CS_CHICKEN1, if disabled by HW. Userspace is still free to set
CS_CHICKEN1 to whatever they want later.
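
Condensed, the change below boils down to the following check (a sketch
only; the real implementation is an RTP rule in the diff, and the
surrounding control flow here is abbreviated):

	/* Xe2 render/compute only: is WMTP fused off? */
	if (xe_mmio_read32(hwe->gt, XEHP_FUSE4) & CFEG_WMTP_DISABLE)
		/* then program PREEMPT_GPGPU_THREAD_GROUP_LEVEL into the
		 * CS_CHICKEN1 preemption-level field of the default LRC */;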
v2: remove redundant version check and also add descriptive name (Matt)
v3: remove usage of REG_FIELD_GET (Matt)

Cc: Matt Roper
Co-developed-by: Matthew Auld
Signed-off-by: Matthew Auld
Signed-off-by: Nirmoy Das
Reviewed-by: Matt Roper
Link: https://lore.kernel.org/r/20240104182615.21327-1-nirmoy.das@intel.com
Signed-off-by: Matt Roper
---
 drivers/gpu/drm/xe/regs/xe_gt_regs.h |  1 +
 drivers/gpu/drm/xe/xe_hw_engine.c    | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
index 6aaaf1f63c728a..6dfad86aaea647 100644
--- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
@@ -146,6 +146,7 @@
 
 /* Fuse readout registers for GT */
 #define XEHP_FUSE4			XE_REG(0x9114)
+#define   CFEG_WMTP_DISABLE		REG_BIT(20)
 #define   CCS_EN_MASK			REG_GENMASK(19, 16)
 #define   GT_L3_EXC_MASK		REG_GENMASK(6, 4)
diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
index 832989c83a2534..e279ef6c527cd7 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine.c
+++ b/drivers/gpu/drm/xe/xe_hw_engine.c
@@ -316,6 +316,19 @@ static bool xe_hw_engine_match_fixed_cslice_mode(const struct xe_gt *gt,
 	       xe_rtp_match_first_render_or_compute(gt, hwe);
 }
 
+static bool xe_rtp_cfeg_wmtp_disabled(const struct xe_gt *gt,
+				      const struct xe_hw_engine *hwe)
+{
+	if (GRAPHICS_VER(gt_to_xe(gt)) < 20)
+		return false;
+
+	if (hwe->class != XE_ENGINE_CLASS_COMPUTE &&
+	    hwe->class != XE_ENGINE_CLASS_RENDER)
+		return false;
+
+	return xe_mmio_read32(hwe->gt, XEHP_FUSE4) & CFEG_WMTP_DISABLE;
+}
+
 void
 xe_hw_engine_setup_default_lrc_state(struct xe_hw_engine *hwe)
 {
@@ -346,6 +359,14 @@ xe_hw_engine_setup_default_lrc_state(struct xe_hw_engine *hwe)
 		  XE_RTP_ACTIONS(FIELD_SET(RCU_MODE, RCU_MODE_FIXED_SLICE_CCS_MODE,
 					   RCU_MODE_FIXED_SLICE_CCS_MODE))
 		},
+		/* Disable WMTP if HW doesn't support it */
+		{ XE_RTP_NAME("DISABLE_WMTP_ON_UNSUPPORTED_HW"),
+		  XE_RTP_RULES(FUNC(xe_rtp_cfeg_wmtp_disabled)),
+		  XE_RTP_ACTIONS(FIELD_SET(CS_CHICKEN1(0),
+					   PREEMPT_GPGPU_LEVEL_MASK,
+					   PREEMPT_GPGPU_THREAD_GROUP_LEVEL)),
+		  XE_RTP_ENTRY_FLAG(FOREACH_ENGINE)
+		},
 		{}
 	};

From fa78e188d8d1df850eb232a2631012093aeeb0e0 Mon Sep 17 00:00:00 2001
From: Badal Nilawar
Date: Thu, 4 Jan 2024 18:37:02 +0530
Subject: [PATCH 047/200] drm/xe/dgfx: Release mmap mappings on rpm suspend
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Release the mmap mappings of all vram objects associated with a
userfault so that, while the PCIe function is in D3hot, any access to
those memory mappings will raise a userfault. Upon such a userfault, if
the graphics function is in D3, a runtime resume of the dGPU is
triggered to transition it to D0 before the memory mappings are
accessed.
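
The resulting flow, roughly (a sketch condensed from the changes below;
all names appear in the diff):

	/* runtime suspend: unmap VRAM bos so the next CPU access faults
	 * instead of touching a device in D3hot */
	mutex_lock(&xe->mem_access.vram_userfault.lock);
	list_for_each_entry_safe(bo, on, &xe->mem_access.vram_userfault.list,
				 vram_userfault_link)
		xe_bo_runtime_pm_release_mmap_offset(bo);
	mutex_unlock(&xe->mem_access.vram_userfault.lock);

	/* fault handler: take a mem-access reference first, which runtime
	 * resumes the device to D0 before the mapping is re-created */
	xe_device_mem_access_get(xe);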
v2: - Avoid iomem check before bo migration check as bo can migrate to system memory (Matthew Auld) v3: - Delete bo userfault link during bo destroy - Upon bo move (vram-smem), do bo userfault link deletion in xe_bo_move_notify instead of xe_bo_move (Thomas Hellström) - Grab lock in rpm hook while deleting bo userfault link (Matthew Auld) v4: - Add kernel doc and wrap vram_userfault related stuff in the structure (Matthew Auld) - Get rpm wakeref before taking dma reserve lock (Matthew Auld) - In suspend path apply lock for entire list op including list iteration (Matthew Auld) v5: - Use mutex lock instead of spin lock v6: - Fix review comments (Matthew Auld) Cc: Rodrigo Vivi Cc: Matthew Auld Cc: Anshuman Gupta Signed-off-by: Badal Nilawar Acked-by: Thomas Hellström #For the xe_bo_move_notify() changes Reviewed-by: Matthew Auld Link: https://lore.kernel.org/r/20240104130702.950078-1-badal.nilawar@intel.com Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/xe/xe_bo.c | 56 ++++++++++++++++++++++++++-- drivers/gpu/drm/xe/xe_bo.h | 2 + drivers/gpu/drm/xe/xe_bo_types.h | 3 ++ drivers/gpu/drm/xe/xe_device_types.h | 16 ++++++++ drivers/gpu/drm/xe/xe_pci.c | 2 + drivers/gpu/drm/xe/xe_pm.c | 17 +++++++++ drivers/gpu/drm/xe/xe_pm.h | 1 + 7 files changed, 93 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 8e4a3b1f6b938e..2e4d2157179c89 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -586,6 +586,8 @@ static int xe_bo_move_notify(struct xe_bo *bo, { struct ttm_buffer_object *ttm_bo = &bo->ttm; struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev); + struct ttm_resource *old_mem = ttm_bo->resource; + u32 old_mem_type = old_mem ? old_mem->mem_type : XE_PL_SYSTEM; int ret; /* @@ -605,6 +607,18 @@ static int xe_bo_move_notify(struct xe_bo *bo, if (ttm_bo->base.dma_buf && !ttm_bo->base.import_attach) dma_buf_move_notify(ttm_bo->base.dma_buf); + /* + * TTM has already nuked the mmap for us (see ttm_bo_unmap_virtual), + * so if we moved from VRAM make sure to unlink this from the userfault + * tracking. 
+ */ + if (mem_type_is_vram(old_mem_type)) { + mutex_lock(&xe->mem_access.vram_userfault.lock); + if (!list_empty(&bo->vram_userfault_link)) + list_del_init(&bo->vram_userfault_link); + mutex_unlock(&xe->mem_access.vram_userfault.lock); + } + return 0; } @@ -1063,6 +1077,11 @@ static void xe_ttm_bo_destroy(struct ttm_buffer_object *ttm_bo) if (bo->vm && xe_bo_is_user(bo)) xe_vm_put(bo->vm); + mutex_lock(&xe->mem_access.vram_userfault.lock); + if (!list_empty(&bo->vram_userfault_link)) + list_del(&bo->vram_userfault_link); + mutex_unlock(&xe->mem_access.vram_userfault.lock); + kfree(bo); } @@ -1110,16 +1129,20 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf) { struct ttm_buffer_object *tbo = vmf->vma->vm_private_data; struct drm_device *ddev = tbo->base.dev; + struct xe_device *xe = to_xe_device(ddev); + struct xe_bo *bo = ttm_to_xe_bo(tbo); + bool needs_rpm = bo->flags & XE_BO_CREATE_VRAM_MASK; vm_fault_t ret; int idx, r = 0; + if (needs_rpm) + xe_device_mem_access_get(xe); + ret = ttm_bo_vm_reserve(tbo, vmf); if (ret) - return ret; + goto out; if (drm_dev_enter(ddev, &idx)) { - struct xe_bo *bo = ttm_to_xe_bo(tbo); - trace_xe_bo_cpu_fault(bo); if (should_migrate_to_system(bo)) { @@ -1137,10 +1160,24 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf) } else { ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot); } + if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) - return ret; + goto out; + /* + * ttm_bo_vm_reserve() already has dma_resv_lock. + */ + if (ret == VM_FAULT_NOPAGE && mem_type_is_vram(tbo->resource->mem_type)) { + mutex_lock(&xe->mem_access.vram_userfault.lock); + if (list_empty(&bo->vram_userfault_link)) + list_add(&bo->vram_userfault_link, &xe->mem_access.vram_userfault.list); + mutex_unlock(&xe->mem_access.vram_userfault.lock); + } dma_resv_unlock(tbo->base.resv); +out: + if (needs_rpm) + xe_device_mem_access_put(xe); + return ret; } @@ -1254,6 +1291,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, #ifdef CONFIG_PROC_FS INIT_LIST_HEAD(&bo->client_link); #endif + INIT_LIST_HEAD(&bo->vram_userfault_link); drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size); @@ -2264,6 +2302,16 @@ int xe_bo_dumb_create(struct drm_file *file_priv, return err; } +void xe_bo_runtime_pm_release_mmap_offset(struct xe_bo *bo) +{ + struct ttm_buffer_object *tbo = &bo->ttm; + struct ttm_device *bdev = tbo->bdev; + + drm_vma_node_unmap(&tbo->base.vma_node, bdev->dev_mapping); + + list_del_init(&bo->vram_userfault_link); +} + #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST) #include "tests/xe_bo.c" #endif diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h index 97b32528c600f6..350cc73cadf8d8 100644 --- a/drivers/gpu/drm/xe/xe_bo.h +++ b/drivers/gpu/drm/xe/xe_bo.h @@ -249,6 +249,8 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data, struct drm_file *file); int xe_gem_mmap_offset_ioctl(struct drm_device *dev, void *data, struct drm_file *file); +void xe_bo_runtime_pm_release_mmap_offset(struct xe_bo *bo); + int xe_bo_dumb_create(struct drm_file *file_priv, struct drm_device *dev, struct drm_mode_create_dumb *args); diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h index 64c2249a4e4079..14ef13b7b421f3 100644 --- a/drivers/gpu/drm/xe/xe_bo_types.h +++ b/drivers/gpu/drm/xe/xe_bo_types.h @@ -88,6 +88,9 @@ struct xe_bo { * objects. 
 	 */
 	u16 cpu_caching;
+
+	/** @vram_userfault_link: Link into @mem_access.vram_userfault.list */
+	struct list_head vram_userfault_link;
 };
 
 #define intel_bo_to_drm_bo(bo) (&(bo)->ttm.base)
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 163d889d407bc8..8404685b241824 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -387,6 +387,22 @@ struct xe_device {
 	struct {
 		/** @ref: ref count of memory accesses */
 		atomic_t ref;
+
+		/** @vram_userfault: Encapsulate vram_userfault related stuff */
+		struct {
+			/**
+			 * @lock: Protects access to @vram_userfault.list
+			 * Using a mutex instead of a spinlock as the lock is
+			 * held across an entire list operation, which may sleep
+			 */
+			struct mutex lock;
+
+			/**
+			 * @list: Keep list of userfaulted vram bos, which need
+			 * to release their mmap mappings in the runtime
+			 * suspend path
+			 */
+			struct list_head list;
+		} vram_userfault;
 	} mem_access;
 
 	/**
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index dcc5ded1558e71..7ba2000f535bda 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -774,6 +774,8 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		 str_yes_no(xe_device_has_sriov(xe)),
 		 xe_sriov_mode_to_string(xe_device_sriov_mode(xe)));
 
+	xe_pm_init_early(xe);
+
 	err = xe_device_probe(xe);
 	if (err)
 		return err;
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index b429c2876a7640..d5f219796d7ef2 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -163,6 +163,12 @@ static void xe_pm_runtime_init(struct xe_device *xe)
 	pm_runtime_put(dev);
 }
 
+void xe_pm_init_early(struct xe_device *xe)
+{
+	INIT_LIST_HEAD(&xe->mem_access.vram_userfault.list);
+	drmm_mutex_init(&xe->drm, &xe->mem_access.vram_userfault.lock);
+}
+
 void xe_pm_init(struct xe_device *xe)
 {
 	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
@@ -214,6 +220,7 @@ struct task_struct *xe_pm_read_callback_task(struct xe_device *xe)
 
 int xe_pm_runtime_suspend(struct xe_device *xe)
 {
+	struct xe_bo *bo, *on;
 	struct xe_gt *gt;
 	u8 id;
 	int err = 0;
@@ -247,6 +254,16 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
 	 */
 	lock_map_acquire(&xe_device_mem_access_lockdep_map);
 
+	/*
+	 * Applying the lock for the entire list operation as xe_ttm_bo_destroy
+	 * and xe_bo_move_notify also check and delete the bo entry from the
+	 * userfault list.
+ */ + mutex_lock(&xe->mem_access.vram_userfault.lock); + list_for_each_entry_safe(bo, on, + &xe->mem_access.vram_userfault.list, vram_userfault_link) + xe_bo_runtime_pm_release_mmap_offset(bo); + mutex_unlock(&xe->mem_access.vram_userfault.lock); + if (xe->d3cold.allowed) { err = xe_bo_evict_all(xe); if (err) diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h index 6b9031f7af2439..64a97c6726a727 100644 --- a/drivers/gpu/drm/xe/xe_pm.h +++ b/drivers/gpu/drm/xe/xe_pm.h @@ -20,6 +20,7 @@ struct xe_device; int xe_pm_suspend(struct xe_device *xe); int xe_pm_resume(struct xe_device *xe); +void xe_pm_init_early(struct xe_device *xe); void xe_pm_init(struct xe_device *xe); void xe_pm_runtime_fini(struct xe_device *xe); int xe_pm_runtime_suspend(struct xe_device *xe); From 29f424eb8702b686cb6f07ddd659c6312e0c796d Mon Sep 17 00:00:00 2001 From: Matthew Auld Date: Wed, 13 Dec 2023 17:47:04 +0000 Subject: [PATCH 048/200] drm/xe/exec: move fence reservation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We currently assume that we can upfront know exactly how many fence slots we will need at the start of the exec, however the TTM bo_validate can itself consume numerous fence slots, and due to how the dma_resv_reserve_fences() works it only ensures that at least that many fence slots are available. With this it is quite possible that TTM steals some of the fence slots and then when it comes time to do the vma binding and final exec stage we are lacking enough fence slots, leading to some nasty BUG_ON(). A simple fix is to reserve our own fences later, after the validate stage. References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/698 Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Matthew Brost Tested-by: José Roberto de Souza Reviewed-by: José Roberto de Souza --- drivers/gpu/drm/xe/xe_exec.c | 40 ++++++++++++++++++++++++++++++++++-- 1 file changed, 38 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c index ba92e5619da390..63e82e5285bc6f 100644 --- a/drivers/gpu/drm/xe/xe_exec.c +++ b/drivers/gpu/drm/xe/xe_exec.c @@ -96,7 +96,44 @@ static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec) { - return drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec); + struct xe_vm *vm = container_of(vm_exec->vm, struct xe_vm, gpuvm); + struct drm_gem_object *obj; + unsigned long index; + int num_fences; + int ret; + + ret = drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec); + if (ret) + return ret; + + /* + * 1 fence slot for the final submit, and one more for every per-tile + * bind. Note that there are potentially many vma per object/dma-resv, + * however the fence slot will just be re-used, since they are largely + * the same timeline and the seqno should be in order. + */ + num_fences = 1 + vm->xe->info.tile_count; + + /* + * We don't know upfront exactly how many fence slots we will need at + * the start of the exec, since the TTM bo_validate above can consume + * numerous fence slots. Also due to how the dma_resv_reserve_fences() + * works it only ensures that at least that many fence slots are + * available i.e if there are already 10 slots available and we reserve + * two more, it can just noop without reserving anything. With this it + * is quite possible that TTM steals some of the fence slots and then + * when it comes time to do the vma binding and final exec stage we are + * lacking enough fence slots, leading to some nasty BUG_ON() when + * adding the fences. 
Hence just add our own fences here, after the
+	 * validate stage.
+	 */
+	drm_exec_for_each_locked_object(&vm_exec->exec, index, obj) {
+		ret = dma_resv_reserve_fences(obj->resv, num_fences);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
 }
 
 int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
@@ -189,7 +226,6 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	}
 
 	vm_exec.vm = &vm->gpuvm;
-	vm_exec.num_fences = 1 + vm->xe->info.tile_count;
 	vm_exec.flags = DRM_EXEC_INTERRUPTIBLE_WAIT;
 	if (xe_vm_in_lr_mode(vm)) {
 		drm_exec_init(exec, vm_exec.flags);

From f4e8ab468fc6cfaf718bb8610940d57a5e2309ba Mon Sep 17 00:00:00 2001
From: Matthew Auld
Date: Wed, 13 Dec 2023 17:47:05 +0000
Subject: [PATCH 049/200] drm/xe/exec: reserve fence slot for CPU bind
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Looks possible to switch from CPU binding to GPU binding mid exec, and
if that happens for the same dma-resv we might use two fence slots, one
for the dummy fence, and another for the actual GPU bind.

References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/698
Signed-off-by: Matthew Auld
Cc: Thomas Hellström
Cc: Matthew Brost
Reviewed-by: José Roberto de Souza
---
 drivers/gpu/drm/xe/xe_exec.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 63e82e5285bc6f..0c78a377f4532c 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -107,12 +107,14 @@ static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec)
 		return ret;
 
 	/*
-	 * 1 fence slot for the final submit, and one more for every per-tile
-	 * bind. Note that there are potentially many vma per object/dma-resv,
-	 * however the fence slot will just be re-used, since they are largely
-	 * the same timeline and the seqno should be in order.
+	 * 1 fence slot for the final submit, and 1 more per tile for the GPU
+	 * bind and 1 extra for the CPU bind. Note that there are potentially
+	 * many vma per object/dma-resv, however the fence slot will just be
+	 * re-used, since they are largely the same timeline and the seqno
+	 * should be in order. In the case of CPU bind there is a dummy fence
+	 * used for all CPU binds, so no need to have a per-tile slot for that.
 	 */
-	num_fences = 1 + vm->xe->info.tile_count;
+	num_fences = 1 + 1 + vm->xe->info.tile_count;
 
 	/*
 	 * We don't know upfront exactly how many fence slots we will need at

From 97d0047cbb17318431eaf37dfe1a6855539340f9 Mon Sep 17 00:00:00 2001
From: Matthew Brost
Date: Thu, 4 Jan 2024 00:00:39 -0800
Subject: [PATCH 050/200] drm/xe: Fix exec IOCTL long running exec queue ring
 full condition

The intent is to return -EWOULDBLOCK to the user if a long running exec
queue is full during the exec IOCTL. -EWOULDBLOCK aliases to -EAGAIN,
which results in the exec IOCTL doing a retry loop. Fix this by
ensuring the retry loop is broken when returning -EWOULDBLOCK.
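
The subtlety is that the two error codes cannot be told apart by value,
which is why an explicit flag is needed (a sketch of the pattern used in
the diff below):

	/* On Linux, EWOULDBLOCK is defined as EAGAIN, so a plain
	 * 'err == -EAGAIN' check cannot distinguish "ring full, give up"
	 * from "transient contention, please retry". */
	err = -EWOULDBLOCK;	/* == -EAGAIN */
	skip_retry = true;
	...
	if (err == -EAGAIN && !skip_retry)
		goto retry;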
Fixes: 8ae8a2e8dd21 ("drm/xe: Long running job update")
Reported-by: Sai Gowtham Ch
Signed-off-by: Matthew Brost
Reviewed-by: Brian Welty
---
 drivers/gpu/drm/xe/xe_exec.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 0c78a377f4532c..9db1867878a42e 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -154,7 +154,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	struct xe_sched_job *job;
 	struct dma_fence *rebind_fence;
 	struct xe_vm *vm;
-	bool write_locked;
+	bool write_locked, skip_retry = false;
 	ktime_t end = 0;
 	int err = 0;
 
@@ -265,7 +265,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	}
 
 	if (xe_exec_queue_is_lr(q) && xe_exec_queue_ring_full(q)) {
-		err = -EWOULDBLOCK;
+		err = -EWOULDBLOCK;	/* Aliased to -EAGAIN */
+		skip_retry = true;
 		goto err_exec;
 	}
 
@@ -375,7 +376,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		up_write(&vm->lock);
 	else
 		up_read(&vm->lock);
-	if (err == -EAGAIN)
+	if (err == -EAGAIN && !skip_retry)
 		goto retry;
 err_syncs:
 	for (i = 0; i < num_syncs; i++)

From 5030e16140b655ba00217d47680e697480ac3587 Mon Sep 17 00:00:00 2001
From: Matthew Brost
Date: Tue, 2 Jan 2024 12:35:38 -0800
Subject: [PATCH 051/200] drm/xe/guc: Only take actions in CT irq handler if
 CTs are enabled

Protect the entire IRQ handler with the CT-enabled check rather than
just the G2H handler.

v2: Return on not enabled in CT irq handler (Michal)

Suggested-by: Michal Wajdeczko
Signed-off-by: Matthew Brost
Reviewed-by: Michal Wajdeczko
---
 drivers/gpu/drm/xe/xe_guc_ct.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_ct.h b/drivers/gpu/drm/xe/xe_guc_ct.h
index f15f8a4857e07e..9ecb67db8ec404 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.h
+++ b/drivers/gpu/drm/xe/xe_guc_ct.h
@@ -24,9 +24,11 @@ void xe_guc_ct_print(struct xe_guc_ct *ct, struct drm_printer *p, bool atomic);
 
 static inline void xe_guc_ct_irq_handler(struct xe_guc_ct *ct)
 {
+	if (!ct->enabled)
+		return;
+
 	wake_up_all(&ct->wq);
-	if (ct->enabled)
-		queue_work(system_unbound_wq, &ct->g2h_worker);
+	queue_work(system_unbound_wq, &ct->g2h_worker);
 	xe_guc_ct_fast_path(ct);
 }
 
From 9d0c1c5618be02c5acda7e6bbb728007b0632984 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?=
Date: Fri, 22 Dec 2023 18:59:04 +0100
Subject: [PATCH 052/200] drm/xe/vm: Fix an error path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

If using VM_BIND_OP_UNMAP_ALL without any bound vmas for the vm, we
will end up dereferencing an uninitialized variable and leaking a bo
lock. Fix this.
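
For clarity, the hazard in the old error path looked roughly like this
(abridged from the removed lines in the diff below):

	xe_bo_lock(bo, true);
	vm_bo = drm_gpuvm_bo_find(&vm->gpuvm, obj);
	if (!vm_bo)
		break;	/* no vma bound: 'ops' is left uninitialized and
			 * the bo lock is never dropped */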
v2: - Updated commit message (Lucas De Marchi) Reported-by: Dafna Hirschfeld Closes: https://lore.kernel.org/intel-xe/jrwua7ckbiozfcaodx4gg2h4taiuxs53j5zlpf3qzvyhyiyl2d@pbs3plurokrj/ Suggested-by: Dafna Hirschfeld Fixes: b06d47be7c83 ("drm/xe: Port Xe to GPUVA") Signed-off-by: Thomas Hellström Acked-by: Lucas De Marchi Reviewed-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20231222175904.16732-1-thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/xe/xe_vm.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 1ca917b8315c2a..127842656a2395 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -2063,9 +2063,11 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo, if (err) return ERR_PTR(err); - vm_bo = drm_gpuvm_bo_find(&vm->gpuvm, obj); - if (!vm_bo) - break; + vm_bo = drm_gpuvm_bo_obtain(&vm->gpuvm, obj); + if (IS_ERR(vm_bo)) { + xe_bo_unlock(bo); + return ERR_CAST(vm_bo); + } ops = drm_gpuvm_bo_unmap_ops_create(vm_bo); drm_gpuvm_bo_put(vm_bo); From 9d03bf30e78673d827484bbc17a6fd8f5e43a039 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= Date: Tue, 9 Jan 2024 12:24:02 +0100 Subject: [PATCH 053/200] drm/xe: Use __iomem for the regs pointer MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The regs pointer points to IO memory. Annotate it properly and fix the corresponding sparse warning. Fixes: a4e2f3a299ea ("drm/xe: refactor xe_mmio_probe_tiles to support MMIO extension") Cc: Koby Elbaz Cc: Ofir Bitton Cc: Moti Haimovski Cc: Rodrigo Vivi Signed-off-by: Thomas Hellström Reviewed-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240109112405.108136-2-thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/xe/xe_mmio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c index f660cfb79f504e..c8c5d74b6e9041 100644 --- a/drivers/gpu/drm/xe/xe_mmio.c +++ b/drivers/gpu/drm/xe/xe_mmio.c @@ -303,7 +303,7 @@ void xe_mmio_probe_tiles(struct xe_device *xe) u8 id, tile_count = xe->info.tile_count; struct xe_gt *gt = xe_root_mmio_gt(xe); struct xe_tile *tile; - void *regs; + void __iomem *regs; u32 mtcfg; if (tile_count == 1) From 20855b62a30538361e587cfc7c5245f07d4f826a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= Date: Tue, 9 Jan 2024 12:24:03 +0100 Subject: [PATCH 054/200] drm/xe: Annotate xe_mem_region::mapping with __iomem MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The pointer points to IO memory, but the __iomem annotation was incorrectly placed. Annotate it correctly, update its usage accordingly and fix the corresponding sparse error. Fixes: 0887a2e7ab62 ("drm/xe: Make xe_mem_region struct") Cc: Oak Zeng Cc: Michael J. 
Ruhl Cc: Matthew Brost Cc: Rodrigo Vivi Signed-off-by: Thomas Hellström Reviewed-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240109112405.108136-3-thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/xe/xe_bo.c | 4 ++-- drivers/gpu/drm/xe/xe_device_types.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 2e4d2157179c89..338f9688a2c939 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -442,7 +442,7 @@ static int xe_ttm_io_mem_reserve(struct ttm_device *bdev, if (vram->mapping && mem->placement & TTM_PL_FLAG_CONTIGUOUS) - mem->bus.addr = (u8 *)vram->mapping + + mem->bus.addr = (u8 __force *)vram->mapping + mem->bus.offset; mem->bus.offset += vram->io_start; @@ -748,7 +748,7 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict, /* Create a new VMAP once kernel BO back in VRAM */ if (!ret && resource_is_vram(new_mem)) { struct xe_mem_region *vram = res_to_mem_region(new_mem); - void *new_addr = vram->mapping + + void __iomem *new_addr = vram->mapping + (new_mem->start << PAGE_SHIFT); if (XE_WARN_ON(new_mem->start == XE_BO_INVALID_OFFSET)) { diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h index 8404685b241824..3c2bcd597d2a52 100644 --- a/drivers/gpu/drm/xe/xe_device_types.h +++ b/drivers/gpu/drm/xe/xe_device_types.h @@ -98,7 +98,7 @@ struct xe_mem_region { */ resource_size_t actual_physical_size; /** @mapping: pointer to VRAM mappable space */ - void *__iomem mapping; + void __iomem *mapping; }; /** From 9d612ee52c6096bc70d43f54921ba2831ffbf1ad Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= Date: Tue, 9 Jan 2024 12:24:04 +0100 Subject: [PATCH 055/200] drm/xe: Annotate multiple mmio pointers with __iomem MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There are a couple of pointers pointing to MMIO space. Annotate them with __iomem and fix the corresponding sparse warnings. 
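
As background, a small generic example of what sparse checks here (not
code from this driver):

	void __iomem *regs = pci_iomap(pdev, 0, 0);

	u32 ok = readl(regs + 0x1234);		/* IO accessor: fine */
	u32 bad = *(u32 *)(regs + 0x1234);	/* cast drops the __iomem
						 * address space: sparse
						 * warns, as fixed here */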
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Fixes: 3b0d4a557996 ("drm/xe: Move register MMIO into xe_tile") Fixes: 399a13323f0d ("drm/xe: add 28-bit address support in struct xe_reg") Cc: Rodrigo Vivi Cc: Matthew Brost Cc: Lucas De Marchi Cc: Matt Roper Cc: Koby Elbaz Cc: Ofir Bitton Cc: Moti Haimovski Signed-off-by: Thomas Hellström Reviewed-by: Lucas De Marchi Reviewed-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20240109112405.108136-4-thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/xe/xe_device_types.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h index 3c2bcd597d2a52..7eda86bd4c2a43 100644 --- a/drivers/gpu/drm/xe/xe_device_types.h +++ b/drivers/gpu/drm/xe/xe_device_types.h @@ -147,7 +147,7 @@ struct xe_tile { size_t size; /** @regs: pointer to tile's MMIO space (starting with registers) */ - void *regs; + void __iomem *regs; } mmio; /** @@ -160,7 +160,7 @@ struct xe_tile { size_t size; /** @regs: pointer to tile's additional MMIO-extension space */ - void *regs; + void __iomem *regs; } mmio_ext; /** @mem: memory management info for tile */ @@ -306,7 +306,7 @@ struct xe_device { /** @size: size of MMIO space for device */ size_t size; /** @regs: pointer to MMIO space for device */ - void *regs; + void __iomem *regs; } mmio; /** @mem: memory info for device */ From dcddb6f0b06d454c9a3b2b240a43f0e7310c7f7c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= Date: Tue, 9 Jan 2024 12:24:05 +0100 Subject: [PATCH 056/200] drm/xe: Annotate xe_ttm_stolen_mgr::mapping with __iomem MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The pointer points to IO memory, but the __iomem annotation was incorrectly placed. Annotate it correctly, update its usage accordingly and fix the corresponding sparse error. Fixes: d8b52a02cb40 ("drm/xe: Implement stolen memory.") Cc: Maarten Lankhorst Cc: Matthew Brost Cc: Rodrigo Vivi Signed-off-by: Thomas Hellström Reviewed-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240109112405.108136-5-thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c index d2b00d0bf1e203..e5d7d5e2bec129 100644 --- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c +++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c @@ -31,7 +31,7 @@ struct xe_ttm_stolen_mgr { /* GPU base offset */ resource_size_t stolen_base; - void *__iomem mapping; + void __iomem *mapping; }; static inline struct xe_ttm_stolen_mgr * @@ -275,7 +275,7 @@ static int __xe_ttm_stolen_io_mem_reserve_bar2(struct xe_device *xe, drm_WARN_ON(&xe->drm, !(mem->placement & TTM_PL_FLAG_CONTIGUOUS)); if (mem->placement & TTM_PL_FLAG_CONTIGUOUS && mgr->mapping) - mem->bus.addr = (u8 *)mgr->mapping + mem->bus.offset; + mem->bus.addr = (u8 __force *)mgr->mapping + mem->bus.offset; mem->bus.offset += mgr->io_base; mem->bus.is_iomem = true; From 9fbedddfc90062e09426108335585487647067e3 Mon Sep 17 00:00:00 2001 From: Shekhar Chauhan Date: Tue, 9 Jan 2024 11:25:50 +0530 Subject: [PATCH 057/200] drm/xe/xe2_lpg: Add Wa_16018610683 Force max 128KB SLM during WMTP PASS1 Restore. 
BSpec: 70202 Signed-off-by: Shekhar Chauhan Reviewed-by: Matt Roper Link: https://lore.kernel.org/r/20240109055550.679289-1-shekhar.chauhan@intel.com Signed-off-by: Matt Roper --- drivers/gpu/drm/xe/regs/xe_gt_regs.h | 3 +++ drivers/gpu/drm/xe/xe_wa.c | 5 ++++- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h index 6dfad86aaea647..4017319c63000f 100644 --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h @@ -345,6 +345,9 @@ #define ROW_CHICKEN3 XE_REG_MCR(0xe49c, XE_REG_OPTION_MASKED) #define DIS_FIX_EOT1_FLUSH REG_BIT(9) +#define TDL_TSL_CHICKEN XE_REG_MCR(0xe4c4, XE_REG_OPTION_MASKED) +#define SLM_WMTP_RESTORE REG_BIT(11) + #define ROW_CHICKEN XE_REG_MCR(0xe4f0, XE_REG_OPTION_MASKED) #define UGM_BACKUP_MODE REG_BIT(13) #define MDQ_ARBITRATION_MODE REG_BIT(12) diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c index b77d406e083e2d..3299130ba10ae0 100644 --- a/drivers/gpu/drm/xe/xe_wa.c +++ b/drivers/gpu/drm/xe/xe_wa.c @@ -455,7 +455,10 @@ static const struct xe_rtp_entry_sr engine_was[] = { PPHWSP_CSB_AND_TIMESTAMP_REPORT_DIS, XE_RTP_ACTION_FLAG(ENGINE_BASE))) }, - + { XE_RTP_NAME("16018610683"), + XE_RTP_RULES(GRAPHICS_VERSION(2004), FUNC(xe_rtp_match_first_render_or_compute)), + XE_RTP_ACTIONS(SET(TDL_TSL_CHICKEN, SLM_WMTP_RESTORE)) + }, {} }; From b16483f9f8120b530327879fa3ea576e897946da Mon Sep 17 00:00:00 2001 From: Brian Welty Date: Fri, 5 Jan 2024 11:04:39 -0800 Subject: [PATCH 058/200] drm/xe: Fix guc_exec_queue_set_priority We need to set q->priority prior to calling guc_exec_queue_add_msg() as that will call init_policies(), which sets the scheduling properties to those stored in the exec_queue. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Brian Welty Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/xe/xe_guc_submit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index 21ac68e3246f86..5de3ac47c4623d 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -1308,8 +1308,8 @@ static int guc_exec_queue_set_priority(struct xe_exec_queue *q, if (!msg) return -ENOMEM; - guc_exec_queue_add_msg(q, msg, SET_SCHED_PROPS); q->priority = priority; + guc_exec_queue_add_msg(q, msg, SET_SCHED_PROPS); return 0; } From a8004af338f6b3319476ecbed63ea49bf393fc1f Mon Sep 17 00:00:00 2001 From: Brian Welty Date: Fri, 5 Jan 2024 11:04:40 -0800 Subject: [PATCH 059/200] drm/xe: Fix modifying exec_queue priority in xe_migrate_init After exec_queue has been created, we cannot simply modify q->priority. This needs to be done by the backend via q->ops. However, in this case, it would be more efficient to simply pass a flag when creating the exec_queue and set the desired priority upfront during queue creation. To that end, a new flag EXEC_QUEUE_FLAG_HIGH_PRIORITY is introduced. The priority field is moved to be with other scheduling properties and is now exec_queue.sched_props.priority. This is no longer set to an initial value by the backend, but is now set within __xe_exec_queue_create().
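[ Illustration, not part of the patch: the ordering rule behind the guc_exec_queue_set_priority() fix above, with our comments added. The message handler ends up in init_policies(), which reads the priority stored in the exec_queue, so the store must happen before the message is queued. ]

q->priority = priority;                          /* 1: publish the new value  */
guc_exec_queue_add_msg(q, msg, SET_SCHED_PROPS); /* 2: handler reads it later */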
Fixes: b4eecedc75c1 ("drm/xe: Fix potential deadlock handling page faults") Signed-off-by: Brian Welty Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/xe/xe_exec_queue.c | 5 +++++ drivers/gpu/drm/xe/xe_exec_queue_types.h | 6 ++++-- drivers/gpu/drm/xe/xe_guc_submit.c | 7 +++---- drivers/gpu/drm/xe/xe_migrate.c | 5 ++--- 4 files changed, 14 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c index 44fe8097b7cdac..bcfc4127c7c59f 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.c +++ b/drivers/gpu/drm/xe/xe_exec_queue.c @@ -67,6 +67,11 @@ static struct xe_exec_queue *__xe_exec_queue_create(struct xe_device *xe, q->sched_props.timeslice_us = hwe->eclass->sched_props.timeslice_us; q->sched_props.preempt_timeout_us = hwe->eclass->sched_props.preempt_timeout_us; + if (q->flags & EXEC_QUEUE_FLAG_KERNEL && + q->flags & EXEC_QUEUE_FLAG_HIGH_PRIORITY) + q->sched_props.priority = XE_EXEC_QUEUE_PRIORITY_KERNEL; + else + q->sched_props.priority = XE_EXEC_QUEUE_PRIORITY_NORMAL; if (xe_exec_queue_is_parallel(q)) { q->parallel.composite_fence_ctx = dma_fence_context_alloc(1); diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h index 3d7e704ec3d9f3..8d4b7feb8c306b 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h @@ -52,8 +52,6 @@ struct xe_exec_queue { struct xe_vm *vm; /** @class: class of this exec queue */ enum xe_engine_class class; - /** @priority: priority of this exec queue */ - enum xe_exec_queue_priority priority; /** * @logical_mask: logical mask of where job submitted to exec queue can run */ @@ -84,6 +82,8 @@ struct xe_exec_queue { #define EXEC_QUEUE_FLAG_VM BIT(4) /* child of VM queue for multi-tile VM jobs */ #define EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD BIT(5) +/* kernel exec_queue only, set priority to highest level */ +#define EXEC_QUEUE_FLAG_HIGH_PRIORITY BIT(6) /** * @flags: flags for this exec queue, should statically setup aside from ban @@ -142,6 +142,8 @@ struct xe_exec_queue { u32 timeslice_us; /** @preempt_timeout_us: preemption timeout in micro-seconds */ u32 preempt_timeout_us; + /** @priority: priority of this exec queue */ + enum xe_exec_queue_priority priority; } sched_props; /** @compute: compute exec queue state */ diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index 5de3ac47c4623d..54ffcfcdd41f9c 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -421,7 +421,7 @@ static void init_policies(struct xe_guc *guc, struct xe_exec_queue *q) { struct exec_queue_policy policy; struct xe_device *xe = guc_to_xe(guc); - enum xe_exec_queue_priority prio = q->priority; + enum xe_exec_queue_priority prio = q->sched_props.priority; u32 timeslice_us = q->sched_props.timeslice_us; u32 preempt_timeout_us = q->sched_props.preempt_timeout_us; @@ -1231,7 +1231,6 @@ static int guc_exec_queue_init(struct xe_exec_queue *q) err = xe_sched_entity_init(&ge->entity, sched); if (err) goto err_sched; - q->priority = XE_EXEC_QUEUE_PRIORITY_NORMAL; if (xe_exec_queue_is_lr(q)) INIT_WORK(&q->guc->lr_tdr, xe_guc_exec_queue_lr_cleanup); @@ -1301,14 +1300,14 @@ static int guc_exec_queue_set_priority(struct xe_exec_queue *q, { struct xe_sched_msg *msg; - if (q->priority == priority || exec_queue_killed_or_banned(q)) + if (q->sched_props.priority == priority || exec_queue_killed_or_banned(q)) return 0; msg = kmalloc(sizeof(*msg), GFP_KERNEL); if (!msg) return 
-ENOMEM; - q->priority = priority; + q->sched_props.priority = priority; guc_exec_queue_add_msg(q, msg, SET_SCHED_PROPS); return 0; diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c index adf1dab5eba253..02fca8f9adc284 100644 --- a/drivers/gpu/drm/xe/xe_migrate.c +++ b/drivers/gpu/drm/xe/xe_migrate.c @@ -344,7 +344,8 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile) m->q = xe_exec_queue_create(xe, vm, logical_mask, 1, hwe, EXEC_QUEUE_FLAG_KERNEL | - EXEC_QUEUE_FLAG_PERMANENT); + EXEC_QUEUE_FLAG_PERMANENT | + EXEC_QUEUE_FLAG_HIGH_PRIORITY); } else { m->q = xe_exec_queue_create_class(xe, primary_gt, vm, XE_ENGINE_CLASS_COPY, @@ -355,8 +356,6 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile) xe_vm_close_and_put(vm); return ERR_CAST(m->q); } - if (xe->info.has_usm) - m->q->priority = XE_EXEC_QUEUE_PRIORITY_KERNEL; mutex_init(&m->job_mutex); From 4ae3aeab32d7f37cde4724524f5525703e5a9b54 Mon Sep 17 00:00:00 2001 From: Sujaritha Sundaresan Date: Tue, 9 Jan 2024 16:34:18 +0530 Subject: [PATCH 060/200] drm/xe: Add vram frequency sysfs attributes Add vram frequency sysfs attributes under the below hierarchy; /device/tile#/memory/freq0 |-max_freq |-min_freq v2: Drop "vram" from attribute names (Rodrigo) v3: Add documentation for new sysfs (Riana) Drop prefix from XEHP_PCODE_FREQUENCY_CONFIG (Riana) v4: Create sysfs under tile#/freq0 after removal of physical_memsize attribute v5: Revert back to creating sysfs under tile#/memory/freq0 Remove definition of GT_FREQUENCY_MULTIPLIER (Rodrigo) v6: Rename attributes to max/min_freq (Anshuman) Fix review comments (Rodrigo) v7: Make documentation more verbose Move sysfs to separate file (Anshuman) v8: Fix platform specific conditions and add kernel doc (Anshuman) Fix typos and remove redundant headers (Riana) v9: Fix typo (Riana) Change function name to include "sysfs" (Lucas) Signed-off-by: Sujaritha Sundaresan Reviewed-by: Anshuman Gupta Link: https://lore.kernel.org/r/20240109110418.2065101-1-sujaritha.sundaresan@intel.com Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/xe/Makefile | 1 + drivers/gpu/drm/xe/xe_pcode_api.h | 7 ++ drivers/gpu/drm/xe/xe_tile_sysfs.c | 3 + drivers/gpu/drm/xe/xe_vram_freq.c | 129 +++++++++++++++++++++++++++++ drivers/gpu/drm/xe/xe_vram_freq.h | 13 +++ 5 files changed, 153 insertions(+) create mode 100644 drivers/gpu/drm/xe/xe_vram_freq.c create mode 100644 drivers/gpu/drm/xe/xe_vram_freq.h diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile index 6952da8979ea71..f3495d0e701ccd 100644 --- a/drivers/gpu/drm/xe/Makefile +++ b/drivers/gpu/drm/xe/Makefile @@ -139,6 +139,7 @@ xe-y += xe_bb.o \ xe_uc_debugfs.o \ xe_uc_fw.o \ xe_vm.o \ + xe_vram_freq.o \ xe_wait_user_fence.o \ xe_wa.o \ xe_wopcm.o diff --git a/drivers/gpu/drm/xe/xe_pcode_api.h b/drivers/gpu/drm/xe/xe_pcode_api.h index 5935cfe302044e..f153ce96f69a79 100644 --- a/drivers/gpu/drm/xe/xe_pcode_api.h +++ b/drivers/gpu/drm/xe/xe_pcode_api.h @@ -42,6 +42,13 @@ #define POWER_SETUP_I1_SHIFT 6 /* 10.6 fixed point format */ #define POWER_SETUP_I1_DATA_MASK REG_GENMASK(15, 0) +#define PCODE_FREQUENCY_CONFIG 0x6e +/* Frequency Config Sub Commands (param1) */ +#define PCODE_MBOX_FC_SC_READ_FUSED_P0 0x0 +#define PCODE_MBOX_FC_SC_READ_FUSED_PN 0x1 +/* Domain IDs (param2) */ +#define PCODE_MBOX_DOMAIN_HBM 0x2 + struct pcode_err_decode { int errno; const char *str; diff --git a/drivers/gpu/drm/xe/xe_tile_sysfs.c b/drivers/gpu/drm/xe/xe_tile_sysfs.c index 0f8d3e7fce46a2..0662968d7bcbe6 100644 --- 
a/drivers/gpu/drm/xe/xe_tile_sysfs.c +++ b/drivers/gpu/drm/xe/xe_tile_sysfs.c @@ -9,6 +9,7 @@ #include "xe_tile.h" #include "xe_tile_sysfs.h" +#include "xe_vram_freq.h" static void xe_tile_sysfs_kobj_release(struct kobject *kobj) { @@ -50,6 +51,8 @@ void xe_tile_sysfs_init(struct xe_tile *tile) tile->sysfs = &kt->base; + xe_vram_freq_sysfs_init(tile); + err = drmm_add_action_or_reset(&xe->drm, tile_sysfs_fini, tile); if (err) drm_warn(&xe->drm, "%s: drmm_add_action_or_reset failed, err: %d\n", diff --git a/drivers/gpu/drm/xe/xe_vram_freq.c b/drivers/gpu/drm/xe/xe_vram_freq.c new file mode 100644 index 00000000000000..7331462933070b --- /dev/null +++ b/drivers/gpu/drm/xe/xe_vram_freq.c @@ -0,0 +1,129 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2024 Intel Corporation + */ +#include <linux/sysfs.h> +#include <drm/drm_managed.h> + +#include "xe_gt_types.h" +#include "xe_pcode.h" +#include "xe_pcode_api.h" +#include "xe_tile.h" +#include "xe_tile_sysfs.h" +#include "xe_vram_freq.h" + +/** + * DOC: Xe VRAM freq + * + * Provides sysfs entries for vram frequency in tile + * + * device/tile#/memory/freq0/max_freq - This is maximum frequency. This value is read-only as it + * is the fixed fuse point P0. It is not the system + * configuration. + * device/tile#/memory/freq0/min_freq - This is minimum frequency. This value is read-only as it + * is the fixed fuse point PN. It is not the system + * configuration. + */ + +static struct xe_tile *dev_to_tile(struct device *dev) +{ + return kobj_to_tile(dev->kobj.parent); +} + +static ssize_t max_freq_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct xe_tile *tile = dev_to_tile(dev); + struct xe_gt *gt = tile->primary_gt; + u32 val, mbox; + int err; + + mbox = REG_FIELD_PREP(PCODE_MB_COMMAND, PCODE_FREQUENCY_CONFIG) + | REG_FIELD_PREP(PCODE_MB_PARAM1, PCODE_MBOX_FC_SC_READ_FUSED_P0) + | REG_FIELD_PREP(PCODE_MB_PARAM2, PCODE_MBOX_DOMAIN_HBM); + + err = xe_pcode_read(gt, mbox, &val, NULL); + if (err) + return err; + + /* data_out - Fused P0 for domain ID in units of 50 MHz */ + val *= 50; + + return sysfs_emit(buf, "%u\n", val); +} +static DEVICE_ATTR_RO(max_freq); + +static ssize_t min_freq_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct xe_tile *tile = dev_to_tile(dev); + struct xe_gt *gt = tile->primary_gt; + u32 val, mbox; + int err; + + mbox = REG_FIELD_PREP(PCODE_MB_COMMAND, PCODE_FREQUENCY_CONFIG) + | REG_FIELD_PREP(PCODE_MB_PARAM1, PCODE_MBOX_FC_SC_READ_FUSED_PN) + | REG_FIELD_PREP(PCODE_MB_PARAM2, PCODE_MBOX_DOMAIN_HBM); + + err = xe_pcode_read(gt, mbox, &val, NULL); + if (err) + return err; + + /* data_out - Fused Pn for domain ID in units of 50 MHz */ + val *= 50; + + return sysfs_emit(buf, "%u\n", val); +} +static DEVICE_ATTR_RO(min_freq); + +static struct attribute *freq_attrs[] = { + &dev_attr_max_freq.attr, + &dev_attr_min_freq.attr, + NULL +}; + +static const struct attribute_group freq_group_attrs = { + .name = "freq0", + .attrs = freq_attrs, +}; + +static void vram_freq_sysfs_fini(struct drm_device *drm, void *arg) +{ + struct kobject *kobj = arg; + + sysfs_remove_group(kobj, &freq_group_attrs); + kobject_put(kobj); +} + +/* + * xe_vram_freq_sysfs_init - Initialize vram frequency sysfs component + * @tile: Xe Tile object + * + * It needs to be initialized after the main tile component is ready + */ + +void xe_vram_freq_sysfs_init(struct xe_tile *tile) +{ + struct xe_device *xe = tile_to_xe(tile); + struct kobject *kobj; + int err; + + if (xe->info.platform != XE_PVC) + return; + + kobj = 
kobject_create_and_add("memory", tile->sysfs); + if (!kobj) + drm_warn(&xe->drm, "failed to add memory directory, err: %d\n", -ENOMEM); + + err = sysfs_create_group(kobj, &freq_group_attrs); + if (err) { + kobject_put(kobj); + drm_warn(&xe->drm, "failed to register vram freq sysfs, err: %d\n", err); + return; + } + + err = drmm_add_action_or_reset(&xe->drm, vram_freq_sysfs_fini, kobj); + if (err) + drm_warn(&xe->drm, "%s: drmm_add_action_or_reset failed, err: %d\n", + __func__, err); +} diff --git a/drivers/gpu/drm/xe/xe_vram_freq.h b/drivers/gpu/drm/xe/xe_vram_freq.h new file mode 100644 index 00000000000000..cbe8c12fbd6459 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_vram_freq.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef _XE_VRAM_FREQ_H_ +#define _XE_VRAM_FREQ_H_ + +struct xe_tile; + +void xe_vram_freq_sysfs_init(struct xe_tile *tile); + +#endif /* _XE_VRAM_FREQ_H_ */ From 69cac0a8f3ef8db4d62441c4a2686ec676c9facd Mon Sep 17 00:00:00 2001 From: Vinay Belgaumkar Date: Mon, 8 Jan 2024 14:58:42 -0800 Subject: [PATCH 061/200] drm/xe: Check skip_guc_pc before setting SLPC flag Don't set SLPC GuC feature ctl flag if skip_guc_pc is true. v2: Skip the freq related sysfs creation as well (Badal) v3: Remove unnecessary parenthesis (Lucas) Fixes: 975e4a3795d4 ("drm/xe: Manually setup C6 when skip_guc_pc is set") Fixes: bef52b5c7a19 ("drm/xe: Create a xe_gt_freq component for raw management and sysfs") Reviewed-by: Lucas De Marchi Signed-off-by: Vinay Belgaumkar Link: https://lore.kernel.org/r/20240108225842.966066-1-vinay.belgaumkar@intel.com Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/xe/xe_gt_freq.c | 3 +++ drivers/gpu/drm/xe/xe_guc.c | 7 ++++++- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_gt_freq.c b/drivers/gpu/drm/xe/xe_gt_freq.c index 3adfa6686e7cf9..e5b0f4ecdbe826 100644 --- a/drivers/gpu/drm/xe/xe_gt_freq.c +++ b/drivers/gpu/drm/xe/xe_gt_freq.c @@ -196,6 +196,9 @@ void xe_gt_freq_init(struct xe_gt *gt) struct xe_device *xe = gt_to_xe(gt); int err; + if (xe->info.skip_guc_pc) + return; + gt->freq = kobject_create_and_add("freq0", gt->sysfs); if (!gt->freq) { drm_warn(&xe->drm, "failed to add freq0 directory to %s\n", diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 311a0364bff159..6e73ebf6725148 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -63,7 +63,12 @@ static u32 guc_ctl_debug_flags(struct xe_guc *guc) static u32 guc_ctl_feature_flags(struct xe_guc *guc) { - return GUC_CTL_ENABLE_SLPC; + u32 flags = 0; + + if (!guc_to_xe(guc)->info.skip_guc_pc) + flags |= GUC_CTL_ENABLE_SLPC; + + return flags; } static u32 guc_ctl_log_params_flags(struct xe_guc *guc) From a109d19992294736abd4f4232ea639e03eb1f9e7 Mon Sep 17 00:00:00 2001 From: "Paul E. 
McKenney" Date: Wed, 10 Jan 2024 06:48:29 -0800 Subject: [PATCH 062/200] drm/xe: Fix build bug for GCC 11 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Building drivers/gpu/drm/xe/xe_gt_pagefault.c with GCC 11 results in the following build errors: ./include/linux/fortify-string.h:57:33: error: writing 16 bytes into a region of size 0 [-Werror=stringop-overflow=] 57 | #define __underlying_memcpy __builtin_memcpy | ^ ./include/linux/fortify-string.h:644:9: note: in expansion of macro ‘__underlying_memcpy’ 644 | __underlying_##op(p, q, __fortify_size); \ | ^~~~~~~~~~~~~ ./include/linux/fortify-string.h:689:26: note: in expansion of macro ‘__fortify_memcpy_chk’ 689 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ | ^~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/xe/xe_gt_pagefault.c:340:17: note: in expansion of macro ‘memcpy’ 340 | memcpy(pf_queue->data + pf_queue->tail, msg, len * sizeof(u32)); | ^~~~~~ In file included from drivers/gpu/drm/xe/xe_device_types.h:17, from drivers/gpu/drm/xe/xe_vm_types.h:16, from drivers/gpu/drm/xe/xe_bo.h:13, from drivers/gpu/drm/xe/xe_gt_pagefault.c:16: drivers/gpu/drm/xe/xe_gt_types.h:102:25: note: at offset [1144, 265324] into destination object ‘tile’ of size 8 102 | struct xe_tile *tile; | ^~~~ Fix these by removing -Wstringop-overflow from drm/xe builds. Closes: https://lore.kernel.org/all/45ad1d0f-a10f-483e-848a-76a30252edbe@paulmck-laptop/ Fixes: 7a8bc11782d3 ("drm/xe: Enable W=1 warnings by default") Suggested-by: Stephen Rothwell Signed-off-by: Paul E. McKenney [ This particular warning is broken on GCC11. In future changes it will be moved to the normal C flags in the top level Makefile (out of Makefile.extrawarn), but accounting for the compiler support. Just remove it out of xe's forced extra warnings for now ] Signed-off-by: Lucas De Marchi --- drivers/gpu/drm/xe/Makefile | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile index f3495d0e701ccd..e16b84f79ddf36 100644 --- a/drivers/gpu/drm/xe/Makefile +++ b/drivers/gpu/drm/xe/Makefile @@ -17,7 +17,6 @@ subdir-ccflags-y += $(call cc-option, -Wunused-const-variable) subdir-ccflags-y += $(call cc-option, -Wpacked-not-aligned) subdir-ccflags-y += $(call cc-option, -Wformat-overflow) subdir-ccflags-y += $(call cc-option, -Wformat-truncation) -subdir-ccflags-y += $(call cc-option, -Wstringop-overflow) subdir-ccflags-y += $(call cc-option, -Wstringop-truncation) # The following turn off the warnings enabled by -Wextra ifeq ($(findstring 2, $(KBUILD_EXTRA_WARN)),) From 6e144a7d6f8a22f22f49f2ecf4268da1c75bcc4a Mon Sep 17 00:00:00 2001 From: Brian Welty Date: Wed, 10 Jan 2024 09:32:49 -0800 Subject: [PATCH 063/200] drm/xe: Refactor __xe_exec_queue_create() Split __xe_exec_queue_create() into two functions, alloc and init. We have an issue in that exec_queue_user_extensions are applied too late. In the case of USM properties, these need to be set prior to xe_lrc_init(). Refactor the logic here, so we can resolve this in follow-on. We only need the xe_vm_lock held during the exec_queue_init function. 
Signed-off-by: Brian Welty Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/xe/xe_exec_queue.c | 58 ++++++++++++++++++++---------- 1 file changed, 39 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c index bcfc4127c7c59f..cd2b9a8e51f561 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.c +++ b/drivers/gpu/drm/xe/xe_exec_queue.c @@ -30,16 +30,14 @@ enum xe_exec_queue_sched_prop { XE_EXEC_QUEUE_SCHED_PROP_MAX = 3, }; -static struct xe_exec_queue *__xe_exec_queue_create(struct xe_device *xe, - struct xe_vm *vm, - u32 logical_mask, - u16 width, struct xe_hw_engine *hwe, - u32 flags) +static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe, + struct xe_vm *vm, + u32 logical_mask, + u16 width, struct xe_hw_engine *hwe, + u32 flags) { struct xe_exec_queue *q; struct xe_gt *gt = hwe->gt; - int err; - int i; /* only kernel queues can be permanent */ XE_WARN_ON((flags & EXEC_QUEUE_FLAG_PERMANENT) && !(flags & EXEC_QUEUE_FLAG_KERNEL)); @@ -82,8 +80,23 @@ static struct xe_exec_queue *__xe_exec_queue_create(struct xe_device *xe, q->bind.fence_seqno = XE_FENCE_INITIAL_SEQNO; } - for (i = 0; i < width; ++i) { - err = xe_lrc_init(q->lrc + i, hwe, q, vm, SZ_16K); + return q; +} + +static void __xe_exec_queue_free(struct xe_exec_queue *q) +{ + if (q->vm) + xe_vm_put(q->vm); + kfree(q); +} + +static int __xe_exec_queue_init(struct xe_exec_queue *q) +{ + struct xe_device *xe = gt_to_xe(q->gt); + int i, err; + + for (i = 0; i < q->width; ++i) { + err = xe_lrc_init(q->lrc + i, q->hwe, q, q->vm, SZ_16K); if (err) goto err_lrc; } @@ -100,16 +113,15 @@ static struct xe_exec_queue *__xe_exec_queue_create(struct xe_device *xe, * can perform GuC CT actions when needed. Caller is expected to have * already grabbed the rpm ref outside any sensitive locks. 
*/ - if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !vm)) + if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !q->vm)) drm_WARN_ON(&xe->drm, !xe_device_mem_access_get_if_ongoing(xe)); - return q; + return 0; err_lrc: for (i = i - 1; i >= 0; --i) xe_lrc_finish(q->lrc + i); - kfree(q); - return ERR_PTR(err); + return err; } struct xe_exec_queue *xe_exec_queue_create(struct xe_device *xe, struct xe_vm *vm, @@ -119,16 +131,27 @@ struct xe_exec_queue *xe_exec_queue_create(struct xe_device *xe, struct xe_vm *v struct xe_exec_queue *q; int err; + q = __xe_exec_queue_alloc(xe, vm, logical_mask, width, hwe, flags); + if (IS_ERR(q)) + return q; + if (vm) { err = xe_vm_lock(vm, true); if (err) - return ERR_PTR(err); + goto err_post_alloc; } - q = __xe_exec_queue_create(xe, vm, logical_mask, width, hwe, flags); + + err = __xe_exec_queue_init(q); if (vm) xe_vm_unlock(vm); + if (err) + goto err_post_alloc; return q; + +err_post_alloc: + __xe_exec_queue_free(q); + return ERR_PTR(err); } struct xe_exec_queue *xe_exec_queue_create_class(struct xe_device *xe, struct xe_gt *gt, @@ -179,10 +202,7 @@ void xe_exec_queue_fini(struct xe_exec_queue *q) xe_lrc_finish(q->lrc + i); if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !q->vm)) xe_device_mem_access_put(gt_to_xe(q->gt)); - if (q->vm) - xe_vm_put(q->vm); - - kfree(q); + __xe_exec_queue_free(q); } void xe_exec_queue_assign_name(struct xe_exec_queue *q, u32 instance) From 6ae24344e2e3e12e06f7b382af4bba2fd417b2ff Mon Sep 17 00:00:00 2001 From: Brian Welty Date: Wed, 10 Jan 2024 09:32:50 -0800 Subject: [PATCH 064/200] drm/xe: Add exec_queue.sched_props.job_timeout_ms The purpose here is to allow optimizing exec_queue_set_job_timeout() in a follow-on patch. Currently it does q->ops->set_job_timeout(...). But we'd like to apply exec_queue_user_extensions much earlier and q->ops cannot be called before __xe_exec_queue_init(). It will be much more efficient to instead only have to set q->sched_props.job_timeout_ms when applying user extensions. That value will then be used during q->ops->init().
Signed-off-by: Brian Welty Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/xe/xe_exec_queue.c | 2 ++ drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++ drivers/gpu/drm/xe/xe_guc_submit.c | 2 +- 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c index cd2b9a8e51f561..4ff7e50112ca95 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.c +++ b/drivers/gpu/drm/xe/xe_exec_queue.c @@ -65,6 +65,8 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe, q->sched_props.timeslice_us = hwe->eclass->sched_props.timeslice_us; q->sched_props.preempt_timeout_us = hwe->eclass->sched_props.preempt_timeout_us; + q->sched_props.job_timeout_ms = + hwe->eclass->sched_props.job_timeout_ms; if (q->flags & EXEC_QUEUE_FLAG_KERNEL && q->flags & EXEC_QUEUE_FLAG_HIGH_PRIORITY) q->sched_props.priority = XE_EXEC_QUEUE_PRIORITY_KERNEL; diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h index 8d4b7feb8c306b..6041892d894e23 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h @@ -142,6 +142,8 @@ struct xe_exec_queue { u32 timeslice_us; /** @preempt_timeout_us: preemption timeout in micro-seconds */ u32 preempt_timeout_us; + /** @job_timeout_ms: job timeout in milliseconds */ + u32 job_timeout_ms; /** @priority: priority of this exec queue */ enum xe_exec_queue_priority priority; } sched_props; diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index 54ffcfcdd41f9c..392cbde6295732 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -1218,7 +1218,7 @@ static int guc_exec_queue_init(struct xe_exec_queue *q) init_waitqueue_head(&ge->suspend_wait); timeout = (q->vm && xe_vm_in_lr_mode(q->vm)) ? MAX_SCHEDULE_TIMEOUT : - q->hwe->eclass->sched_props.job_timeout_ms; + q->sched_props.job_timeout_ms; err = xe_sched_init(&ge->sched, &drm_sched_ops, &xe_sched_ops, get_submit_wq(guc), q->lrc[0].ring.size / MAX_JOB_SIZE_BYTES, 64, From 25ce7c5063b335808e1753ced5f0069981073f17 Mon Sep 17 00:00:00 2001 From: Brian Welty Date: Wed, 10 Jan 2024 09:32:51 -0800 Subject: [PATCH 065/200] drm/xe: Finish refactoring of exec_queue_create Setting of exec_queue user extensions is moved from the end of the ioctl function to earlier, into __xe_exec_queue_alloc(). This fixes a bug in that the USM attributes for access counters were being applied too late, and effectively were ignored. However, in order to apply user extensions this early, we can no longer call q->ops functions. Instead, make it more efficient. The user extension functions can simply update the q->sched_props values and they will be applied by the backend during q->ops->init().
v2: minor changes for readability (Matt) Signed-off-by: Brian Welty Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/xe/xe_exec_queue.c | 74 ++++++++++++++++++++---------- drivers/gpu/drm/xe/xe_exec_queue.h | 3 +- drivers/gpu/drm/xe/xe_gsc.c | 2 +- drivers/gpu/drm/xe/xe_gt.c | 4 +- drivers/gpu/drm/xe/xe_migrate.c | 2 +- 5 files changed, 57 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c index 4ff7e50112ca95..c0b7434e78f114 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.c +++ b/drivers/gpu/drm/xe/xe_exec_queue.c @@ -30,14 +30,18 @@ enum xe_exec_queue_sched_prop { XE_EXEC_QUEUE_SCHED_PROP_MAX = 3, }; +static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q, + u64 extensions, int ext_number, bool create); + static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe, struct xe_vm *vm, u32 logical_mask, u16 width, struct xe_hw_engine *hwe, - u32 flags) + u32 flags, u64 extensions) { struct xe_exec_queue *q; struct xe_gt *gt = hwe->gt; + int err; /* only kernel queues can be permanent */ XE_WARN_ON((flags & EXEC_QUEUE_FLAG_PERMANENT) && !(flags & EXEC_QUEUE_FLAG_KERNEL)); @@ -50,8 +54,6 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe, q->flags = flags; q->hwe = hwe; q->gt = gt; - if (vm) - q->vm = xe_vm_get(vm); q->class = hwe->class; q->width = width; q->logical_mask = logical_mask; @@ -73,6 +75,21 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe, else q->sched_props.priority = XE_EXEC_QUEUE_PRIORITY_NORMAL; + if (extensions) { + /* + * may set q->usm, must come before xe_lrc_init(), + * may overwrite q->sched_props, must come before q->ops->init() + */ + err = exec_queue_user_extensions(xe, q, extensions, 0, true); + if (err) { + kfree(q); + return ERR_PTR(err); + } + } + + if (vm) + q->vm = xe_vm_get(vm); + if (xe_exec_queue_is_parallel(q)) { q->parallel.composite_fence_ctx = dma_fence_context_alloc(1); q->parallel.composite_fence_seqno = XE_FENCE_INITIAL_SEQNO; @@ -128,12 +145,14 @@ static int __xe_exec_queue_init(struct xe_exec_queue *q) struct xe_exec_queue *xe_exec_queue_create(struct xe_device *xe, struct xe_vm *vm, u32 logical_mask, u16 width, - struct xe_hw_engine *hwe, u32 flags) + struct xe_hw_engine *hwe, u32 flags, + u64 extensions) { struct xe_exec_queue *q; int err; - q = __xe_exec_queue_alloc(xe, vm, logical_mask, width, hwe, flags); + q = __xe_exec_queue_alloc(xe, vm, logical_mask, width, hwe, flags, + extensions); if (IS_ERR(q)) return q; @@ -178,7 +197,7 @@ struct xe_exec_queue *xe_exec_queue_create_class(struct xe_device *xe, struct xe if (!logical_mask) return ERR_PTR(-ENODEV); - return xe_exec_queue_create(xe, vm, logical_mask, 1, hwe0, flags); + return xe_exec_queue_create(xe, vm, logical_mask, 1, hwe0, flags, 0); } void xe_exec_queue_destroy(struct kref *ref) @@ -262,7 +281,11 @@ static int exec_queue_set_priority(struct xe_device *xe, struct xe_exec_queue *q if (XE_IOCTL_DBG(xe, value > xe_exec_queue_device_get_max_priority(xe))) return -EPERM; - return q->ops->set_priority(q, value); + if (!create) + return q->ops->set_priority(q, value); + + q->sched_props.priority = value; + return 0; } static bool xe_exec_queue_enforce_schedule_limit(void) @@ -329,7 +352,11 @@ static int exec_queue_set_timeslice(struct xe_device *xe, struct xe_exec_queue * !xe_hw_engine_timeout_in_range(value, min, max)) return -EINVAL; - return q->ops->set_timeslice(q, value); + if (!create) + return 
q->ops->set_timeslice(q, value); + + q->sched_props.timeslice_us = value; + return 0; } static int exec_queue_set_preemption_timeout(struct xe_device *xe, @@ -345,7 +372,11 @@ static int exec_queue_set_preemption_timeout(struct xe_device *xe, !xe_hw_engine_timeout_in_range(value, min, max)) return -EINVAL; - return q->ops->set_preempt_timeout(q, value); + if (!create) + return q->ops->set_preempt_timeout(q, value); + + q->sched_props.preempt_timeout_us = value; + return 0; } static int exec_queue_set_persistence(struct xe_device *xe, struct xe_exec_queue *q, @@ -380,7 +411,9 @@ static int exec_queue_set_job_timeout(struct xe_device *xe, struct xe_exec_queue !xe_hw_engine_timeout_in_range(value, min, max)) return -EINVAL; - return q->ops->set_job_timeout(q, value); + q->sched_props.job_timeout_ms = value; + + return 0; } static int exec_queue_set_acc_trigger(struct xe_device *xe, struct xe_exec_queue *q, @@ -655,6 +688,7 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data, if (eci[0].engine_class == DRM_XE_ENGINE_CLASS_VM_BIND) { for_each_gt(gt, xe, id) { struct xe_exec_queue *new; + u32 flags; if (xe_gt_is_media_type(gt)) continue; @@ -673,14 +707,13 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data, /* The migration vm doesn't hold rpm ref */ xe_device_mem_access_get(xe); + flags = EXEC_QUEUE_FLAG_PERSISTENT | EXEC_QUEUE_FLAG_VM | + (id ? EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD : 0); + migrate_vm = xe_migrate_get_vm(gt_to_tile(gt)->migrate); new = xe_exec_queue_create(xe, migrate_vm, logical_mask, - args->width, hwe, - EXEC_QUEUE_FLAG_PERSISTENT | - EXEC_QUEUE_FLAG_VM | - (id ? - EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD : - 0)); + args->width, hwe, flags, + args->extensions); xe_device_mem_access_put(xe); /* now held by engine */ @@ -728,7 +761,8 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data, q = xe_exec_queue_create(xe, vm, logical_mask, args->width, hwe, xe_vm_in_lr_mode(vm) ? 
0 : - EXEC_QUEUE_FLAG_PERSISTENT); + EXEC_QUEUE_FLAG_PERSISTENT, + args->extensions); up_read(&vm->lock); xe_vm_put(vm); if (IS_ERR(q)) @@ -744,12 +778,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data, } } - if (args->extensions) { - err = exec_queue_user_extensions(xe, q, args->extensions, 0, true); - if (XE_IOCTL_DBG(xe, err)) - goto kill_exec_queue; - } - q->persistent.xef = xef; mutex_lock(&xef->exec_queue.lock); diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h index d959cc4a1a82ab..02ce8d204622f7 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.h +++ b/drivers/gpu/drm/xe/xe_exec_queue.h @@ -16,7 +16,8 @@ struct xe_file; struct xe_exec_queue *xe_exec_queue_create(struct xe_device *xe, struct xe_vm *vm, u32 logical_mask, u16 width, - struct xe_hw_engine *hw_engine, u32 flags); + struct xe_hw_engine *hw_engine, u32 flags, + u64 extensions); struct xe_exec_queue *xe_exec_queue_create_class(struct xe_device *xe, struct xe_gt *gt, struct xe_vm *vm, enum xe_engine_class class, u32 flags); diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c index a8a895cf4b4488..5b84fc9ab8adcd 100644 --- a/drivers/gpu/drm/xe/xe_gsc.c +++ b/drivers/gpu/drm/xe/xe_gsc.c @@ -356,7 +356,7 @@ int xe_gsc_init_post_hwconfig(struct xe_gsc *gsc) q = xe_exec_queue_create(xe, NULL, BIT(hwe->logical_instance), 1, hwe, EXEC_QUEUE_FLAG_KERNEL | - EXEC_QUEUE_FLAG_PERMANENT); + EXEC_QUEUE_FLAG_PERMANENT, 0); if (IS_ERR(q)) { xe_gt_err(gt, "Failed to create queue for GSC submission\n"); err = PTR_ERR(q); diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c index 3af2adec129561..0f2258dc4a0053 100644 --- a/drivers/gpu/drm/xe/xe_gt.c +++ b/drivers/gpu/drm/xe/xe_gt.c @@ -235,7 +235,7 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt) return -ENOMEM; q = xe_exec_queue_create(xe, NULL, BIT(hwe->logical_instance), 1, - hwe, EXEC_QUEUE_FLAG_KERNEL); + hwe, EXEC_QUEUE_FLAG_KERNEL, 0); if (IS_ERR(q)) { err = PTR_ERR(q); xe_gt_err(gt, "hwe %s: xe_exec_queue_create failed (%pe)\n", @@ -252,7 +252,7 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt) } nop_q = xe_exec_queue_create(xe, NULL, BIT(hwe->logical_instance), - 1, hwe, EXEC_QUEUE_FLAG_KERNEL); + 1, hwe, EXEC_QUEUE_FLAG_KERNEL, 0); if (IS_ERR(nop_q)) { err = PTR_ERR(nop_q); xe_gt_err(gt, "hwe %s: nop xe_exec_queue_create failed (%pe)\n", diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c index 02fca8f9adc284..83fa92d2a35a14 100644 --- a/drivers/gpu/drm/xe/xe_migrate.c +++ b/drivers/gpu/drm/xe/xe_migrate.c @@ -345,7 +345,7 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile) m->q = xe_exec_queue_create(xe, vm, logical_mask, 1, hwe, EXEC_QUEUE_FLAG_KERNEL | EXEC_QUEUE_FLAG_PERMANENT | - EXEC_QUEUE_FLAG_HIGH_PRIORITY); + EXEC_QUEUE_FLAG_HIGH_PRIORITY, 0); } else { m->q = xe_exec_queue_create_class(xe, primary_gt, vm, XE_ENGINE_CLASS_COPY, From 801e8c7ed6705bd34508f52376cdbb3fc374c921 Mon Sep 17 00:00:00 2001 From: Brian Welty Date: Wed, 10 Jan 2024 09:32:52 -0800 Subject: [PATCH 066/200] drm/xe: Remove set_job_timeout_ms() from exec_queue_ops This function is no longer used as the job_timeout is now updated prior to calling queue_ops.init(). 
Signed-off-by: Brian Welty Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 -- drivers/gpu/drm/xe/xe_execlist.c | 8 -------- drivers/gpu/drm/xe/xe_guc_submit.c | 16 ---------------- 3 files changed, 26 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h index 6041892d894e23..e7f84dee5275ff 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h @@ -200,8 +200,6 @@ struct xe_exec_queue_ops { int (*set_timeslice)(struct xe_exec_queue *q, u32 timeslice_us); /** @set_preempt_timeout: Set preemption timeout for exec queue */ int (*set_preempt_timeout)(struct xe_exec_queue *q, u32 preempt_timeout_us); - /** @set_job_timeout: Set job timeout for exec queue */ - int (*set_job_timeout)(struct xe_exec_queue *q, u32 job_timeout_ms); /** * @suspend: Suspend exec queue from executing, allowed to be called * multiple times in a row before resume with the caveat that diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c index 96b5224eb4787d..58dfe6a78ffebe 100644 --- a/drivers/gpu/drm/xe/xe_execlist.c +++ b/drivers/gpu/drm/xe/xe_execlist.c @@ -418,13 +418,6 @@ static int execlist_exec_queue_set_preempt_timeout(struct xe_exec_queue *q, return 0; } -static int execlist_exec_queue_set_job_timeout(struct xe_exec_queue *q, - u32 job_timeout_ms) -{ - /* NIY */ - return 0; -} - static int execlist_exec_queue_suspend(struct xe_exec_queue *q) { /* NIY */ @@ -455,7 +448,6 @@ static const struct xe_exec_queue_ops execlist_exec_queue_ops = { .set_priority = execlist_exec_queue_set_priority, .set_timeslice = execlist_exec_queue_set_timeslice, .set_preempt_timeout = execlist_exec_queue_set_preempt_timeout, - .set_job_timeout = execlist_exec_queue_set_job_timeout, .suspend = execlist_exec_queue_suspend, .suspend_wait = execlist_exec_queue_suspend_wait, .resume = execlist_exec_queue_resume, diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index 392cbde6295732..7c29b8333c7192 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -1350,21 +1350,6 @@ static int guc_exec_queue_set_preempt_timeout(struct xe_exec_queue *q, return 0; } -static int guc_exec_queue_set_job_timeout(struct xe_exec_queue *q, u32 job_timeout_ms) -{ - struct xe_gpu_scheduler *sched = &q->guc->sched; - struct xe_guc *guc = exec_queue_to_guc(q); - struct xe_device *xe = guc_to_xe(guc); - - xe_assert(xe, !exec_queue_registered(q)); - xe_assert(xe, !exec_queue_banned(q)); - xe_assert(xe, !exec_queue_killed(q)); - - sched->base.timeout = job_timeout_ms; - - return 0; -} - static int guc_exec_queue_suspend(struct xe_exec_queue *q) { struct xe_sched_msg *msg = q->guc->static_msgs + STATIC_MSG_SUSPEND; @@ -1415,7 +1400,6 @@ static const struct xe_exec_queue_ops guc_exec_queue_ops = { .set_priority = guc_exec_queue_set_priority, .set_timeslice = guc_exec_queue_set_timeslice, .set_preempt_timeout = guc_exec_queue_set_preempt_timeout, - .set_job_timeout = guc_exec_queue_set_job_timeout, .suspend = guc_exec_queue_suspend, .suspend_wait = guc_exec_queue_suspend_wait, .resume = guc_exec_queue_resume, From 86f41f4333e31b62d143c5e38c0c58c85193c4c8 Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Tue, 9 Jan 2024 17:24:36 -0800 Subject: [PATCH 067/200] drm/xe: Add build on bug to assert page fault queue works If PF_QUEUE_NUM_DW % PF_MSG_LEN_DW != 0 then the page fault queue logic does not work when wrapping 
occurs. Add a build bug on to assert PF_QUEUE_NUM_DW % PF_MSG_LEN_DW == 0 to enforce this restriction and document the code. v2: - s/NUM_PF_QUEUE/PF_QUEUE_NUM_DW (Brian) Cc: Lucas De Marchi Signed-off-by: Matthew Brost Reviewed-by: Lucas De Marchi --- drivers/gpu/drm/xe/xe_gt_pagefault.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c index 4489aadc7a525f..0a61e4413679ec 100644 --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c @@ -328,6 +328,11 @@ int xe_guc_pagefault_handler(struct xe_guc *guc, u32 *msg, u32 len) u32 asid; bool full; + /* + * The below logic doesn't work unless PF_QUEUE_NUM_DW % PF_MSG_LEN_DW == 0 + */ + BUILD_BUG_ON(PF_QUEUE_NUM_DW % PF_MSG_LEN_DW); + if (unlikely(len != PF_MSG_LEN_DW)) return -EPROTO; From 1fd77ceaf0d843af2b7fde83e447b0738d0404cb Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Tue, 9 Jan 2024 17:24:37 -0800 Subject: [PATCH 068/200] drm/xe: Invert page fault queue head / tail Convention for queues in Linux is the producer moves the head and consumer moves the tail. Fix the page fault queue to conform to this convention. Cc: Lucas De Marchi Signed-off-by: Matthew Brost Reviewed-by: Lucas De Marchi --- drivers/gpu/drm/xe/xe_gt_pagefault.c | 14 +++++++------- drivers/gpu/drm/xe/xe_gt_types.h | 12 ++++++------ 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c index 0a61e4413679ec..3ca715e2ec19ec 100644 --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c @@ -282,9 +282,9 @@ static bool get_pagefault(struct pf_queue *pf_queue, struct pagefault *pf) bool ret = false; spin_lock_irq(&pf_queue->lock); - if (pf_queue->head != pf_queue->tail) { + if (pf_queue->tail != pf_queue->head) { desc = (const struct xe_guc_pagefault_desc *) - (pf_queue->data + pf_queue->head); + (pf_queue->data + pf_queue->tail); pf->fault_level = FIELD_GET(PFD_FAULT_LEVEL, desc->dw0); pf->trva_fault = FIELD_GET(XE2_PFD_TRVA_FAULT, desc->dw0); @@ -302,7 +302,7 @@ static bool get_pagefault(struct pf_queue *pf_queue, struct pagefault *pf) pf->page_addr |= FIELD_GET(PFD_VIRTUAL_ADDR_LO, desc->dw2) << PFD_VIRTUAL_ADDR_LO_SHIFT; - pf_queue->head = (pf_queue->head + PF_MSG_LEN_DW) % + pf_queue->tail = (pf_queue->tail + PF_MSG_LEN_DW) % PF_QUEUE_NUM_DW; ret = true; } @@ -315,7 +315,7 @@ static bool pf_queue_full(struct pf_queue *pf_queue) { lockdep_assert_held(&pf_queue->lock); - return CIRC_SPACE(pf_queue->tail, pf_queue->head, PF_QUEUE_NUM_DW) <= + return CIRC_SPACE(pf_queue->head, pf_queue->tail, PF_QUEUE_NUM_DW) <= PF_MSG_LEN_DW; } @@ -342,8 +342,8 @@ int xe_guc_pagefault_handler(struct xe_guc *guc, u32 *msg, u32 len) spin_lock_irqsave(&pf_queue->lock, flags); full = pf_queue_full(pf_queue); if (!full) { - memcpy(pf_queue->data + pf_queue->tail, msg, len * sizeof(u32)); - pf_queue->tail = (pf_queue->tail + len) % PF_QUEUE_NUM_DW; + memcpy(pf_queue->data + pf_queue->head, msg, len * sizeof(u32)); + pf_queue->head = (pf_queue->head + len) % PF_QUEUE_NUM_DW; queue_work(gt->usm.pf_wq, &pf_queue->worker); } else { drm_warn(&xe->drm, "PF Queue full, shouldn't be possible"); @@ -389,7 +389,7 @@ static void pf_queue_work_func(struct work_struct *w) send_pagefault_reply(>->uc.guc, &reply); if (time_after(jiffies, threshold) && - pf_queue->head != pf_queue->tail) { + pf_queue->tail != pf_queue->head) { queue_work(gt->usm.pf_wq, w); break; } diff --git 
a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h index f7468466047593..b15503dabba4b4 100644 --- a/drivers/gpu/drm/xe/xe_gt_types.h +++ b/drivers/gpu/drm/xe/xe_gt_types.h @@ -225,16 +225,16 @@ struct xe_gt { #define PF_QUEUE_NUM_DW 128 /** @data: data in the page fault queue */ u32 data[PF_QUEUE_NUM_DW]; - /** - * @head: head pointer in DWs for page fault queue, - * moved by worker which processes faults. - */ - u16 head; /** * @tail: tail pointer in DWs for page fault queue, - * moved by G2H handler. + * moved by worker which processes faults (consumer). */ u16 tail; + /** + * @head: head pointer in DWs for page fault queue, + * moved by G2H handler (producer). + */ + u16 head; /** @lock: protects page fault queue */ spinlock_t lock; /** @worker: to process page faults */ From d0ca70c0339838198a704b15b7e6c3318f887536 Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Tue, 9 Jan 2024 17:24:38 -0800 Subject: [PATCH 069/200] drm/xe: Add build on bug to assert access counter queue works If ACC_QUEUE_NUM_DW % ACC_MSG_LEN_DW != 0 then the access counter queue logic does not work when wrapping occurs. Add a build bug on to assert ACC_QUEUE_NUM_DW % ACC_MSG_LEN_DW == 0 to enforce this restriction and document the code. v2: - s/NUM_ACC_QUEUE/ACC_QUEUE_NUM_DW (Brian) Cc: Lucas De Marchi Signed-off-by: Matthew Brost Reviewed-by: Lucas De Marchi --- drivers/gpu/drm/xe/xe_gt_pagefault.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c index 3ca715e2ec19ec..13183088401f65 100644 --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c @@ -629,6 +629,11 @@ int xe_guc_access_counter_notify_handler(struct xe_guc *guc, u32 *msg, u32 len) u32 asid; bool full; + /* + * The below logic doesn't work unless ACC_QUEUE_NUM_DW % ACC_MSG_LEN_DW == 0 + */ + BUILD_BUG_ON(ACC_QUEUE_NUM_DW % ACC_MSG_LEN_DW); + if (unlikely(len != ACC_MSG_LEN_DW)) return -EPROTO; From 7c0f97cb62dcc57463e3c66301330648cbf9b24a Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Tue, 9 Jan 2024 17:24:39 -0800 Subject: [PATCH 070/200] drm/xe: Invert access counter queue head / tail Convention for queues in Linux is the producer moves the head and consumer moves the tail. Fix the access counter queue to conform to this convention. 
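[ Illustration, not part of the patches: the head/tail convention both queue fixes above adopt, as a self-contained sketch. Names and sizes are invented; QUEUE_NUM_DW is kept a power of two (required by CIRC_SPACE) and a multiple of MSG_LEN_DW so a message never wraps mid-way, which is exactly what the BUILD_BUG_ON patches above enforce. ]

#include <linux/circ_buf.h>
#include <linux/string.h>
#include <linux/types.h>

#define QUEUE_NUM_DW	128
#define MSG_LEN_DW	4

struct example_queue {
	u32 data[QUEUE_NUM_DW];
	u16 head;	/* advanced by the producer (G2H handler) */
	u16 tail;	/* advanced by the consumer (worker) */
};

/* producer: returns false when there is no room for one message */
static bool example_produce(struct example_queue *q, const u32 *msg)
{
	if (CIRC_SPACE(q->head, q->tail, QUEUE_NUM_DW) < MSG_LEN_DW)
		return false;

	memcpy(q->data + q->head, msg, MSG_LEN_DW * sizeof(u32));
	q->head = (q->head + MSG_LEN_DW) % QUEUE_NUM_DW;
	return true;
}

/* consumer: returns false when the queue is empty */
static bool example_consume(struct example_queue *q, u32 *msg)
{
	if (q->tail == q->head)
		return false;

	memcpy(msg, q->data + q->tail, MSG_LEN_DW * sizeof(u32));
	q->tail = (q->tail + MSG_LEN_DW) % QUEUE_NUM_DW;
	return true;
}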
Cc: Lucas De Marchi Signed-off-by: Matthew Brost Reviewed-by: Lucas De Marchi --- drivers/gpu/drm/xe/xe_gt_pagefault.c | 14 +++++++------- drivers/gpu/drm/xe/xe_gt_types.h | 13 +++++++------ 2 files changed, 14 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c index 13183088401f65..5c2603075af963 100644 --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c @@ -564,9 +564,9 @@ static bool get_acc(struct acc_queue *acc_queue, struct acc *acc) bool ret = false; spin_lock(&acc_queue->lock); - if (acc_queue->head != acc_queue->tail) { + if (acc_queue->tail != acc_queue->head) { desc = (const struct xe_guc_acc_desc *) - (acc_queue->data + acc_queue->head); + (acc_queue->data + acc_queue->tail); acc->granularity = FIELD_GET(ACC_GRANULARITY, desc->dw2); acc->sub_granularity = FIELD_GET(ACC_SUBG_HI, desc->dw1) << 31 | @@ -579,7 +579,7 @@ static bool get_acc(struct acc_queue *acc_queue, struct acc *acc) acc->va_range_base = make_u64(desc->dw3 & ACC_VIRTUAL_ADDR_RANGE_HI, desc->dw2 & ACC_VIRTUAL_ADDR_RANGE_LO); - acc_queue->head = (acc_queue->head + ACC_MSG_LEN_DW) % + acc_queue->tail = (acc_queue->tail + ACC_MSG_LEN_DW) % ACC_QUEUE_NUM_DW; ret = true; } @@ -607,7 +607,7 @@ static void acc_queue_work_func(struct work_struct *w) } if (time_after(jiffies, threshold) && - acc_queue->head != acc_queue->tail) { + acc_queue->tail != acc_queue->head) { queue_work(gt->usm.acc_wq, w); break; } @@ -618,7 +618,7 @@ static bool acc_queue_full(struct acc_queue *acc_queue) { lockdep_assert_held(&acc_queue->lock); - return CIRC_SPACE(acc_queue->tail, acc_queue->head, ACC_QUEUE_NUM_DW) <= + return CIRC_SPACE(acc_queue->head, acc_queue->tail, ACC_QUEUE_NUM_DW) <= ACC_MSG_LEN_DW; } @@ -643,9 +643,9 @@ int xe_guc_access_counter_notify_handler(struct xe_guc *guc, u32 *msg, u32 len) spin_lock(&acc_queue->lock); full = acc_queue_full(acc_queue); if (!full) { - memcpy(acc_queue->data + acc_queue->tail, msg, + memcpy(acc_queue->data + acc_queue->head, msg, len * sizeof(u32)); - acc_queue->tail = (acc_queue->tail + len) % ACC_QUEUE_NUM_DW; + acc_queue->head = (acc_queue->head + len) % ACC_QUEUE_NUM_DW; queue_work(gt->usm.acc_wq, &acc_queue->worker); } else { drm_warn(>_to_xe(gt)->drm, "ACC Queue full, dropping ACC"); diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h index b15503dabba4b4..047cde6cda107e 100644 --- a/drivers/gpu/drm/xe/xe_gt_types.h +++ b/drivers/gpu/drm/xe/xe_gt_types.h @@ -252,15 +252,16 @@ struct xe_gt { /** @data: data in the page fault queue */ u32 data[ACC_QUEUE_NUM_DW]; /** - * @head: head pointer in DWs for page fault queue, - * moved by worker which processes faults. + * @tail: tail pointer in DWs for access counter queue, + * moved by worker which processes counters + * (consumer). */ - u16 head; + u16 tail; /** - * @tail: tail pointer in DWs for page fault queue, - * moved by G2H handler. + * @head: head pointer in DWs for access counter queue, + * moved by G2H handler (producer). */ - u16 tail; + u16 head; /** @lock: protects page fault queue */ spinlock_t lock; /** @worker: to process access counters */ From c10da95afa68060e13c5f920d96671943a7e54d9 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 5 Jan 2024 15:22:23 +0300 Subject: [PATCH 071/200] drm/xe/device: clean up on error in probe() This error path should clean up before returning. Smatch detected this bug: drivers/gpu/drm/xe/xe_device.c:487 xe_device_probe() warn: missing unwind goto? 
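[ Illustration, not part of the patch: the unwind pattern the smatch "missing unwind goto" check enforces. Helper names here are invented, not the driver's API; the point is that each failure path releases exactly what earlier steps acquired, so a mid-probe error may not simply 'return err'. ]

static int example_probe(struct example_device *d)
{
	int err;

	err = example_irq_init(d);
	if (err)
		return err;		/* nothing acquired yet */

	err = example_set_flat_ccs(d);
	if (err)
		goto err_irq_shutdown;	/* a bare return would leak the IRQ */

	err = example_probe_vram(d);
	if (err)
		goto err_irq_shutdown;

	return 0;

err_irq_shutdown:
	example_irq_shutdown(d);
	return err;
}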
Fixes: 4cb12b71923b ("drm/xe/xe2: Determine bios enablement for flat ccs on igfx") Signed-off-by: Dan Carpenter Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/xe/xe_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index 004e65544e8d82..a94c0b27f04e8f 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -495,7 +495,7 @@ int xe_device_probe(struct xe_device *xe) err = xe_device_set_has_flat_ccs(xe); if (err) - return err; + goto err_irq_shutdown; err = xe_mmio_probe_vram(xe); if (err) From 88ec23528b32ddb9ce2e8492f2629b0056353697 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 5 Jan 2024 15:20:35 +0300 Subject: [PATCH 072/200] drm/xe/selftests: Fix an error pointer dereference bug Check if "bo" is an error pointer before calling xe_bo_lock() on it. Fixes: d6abc18d6693 ("drm/xe/xe2: Modify xe_bo_test for system memory") Signed-off-by: Dan Carpenter Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/xe/tests/xe_bo.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c index 412b2e7ce40cb3..3436fd9cf2b273 100644 --- a/drivers/gpu/drm/xe/tests/xe_bo.c +++ b/drivers/gpu/drm/xe/tests/xe_bo.c @@ -125,14 +125,13 @@ static void ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile, bo = xe_bo_create_user(xe, NULL, NULL, SZ_1M, DRM_XE_GEM_CPU_CACHING_WC, ttm_bo_type_device, bo_flags); - - xe_bo_lock(bo, false); - if (IS_ERR(bo)) { KUNIT_FAIL(test, "Failed to create bo.\n"); return; } + xe_bo_lock(bo, false); + kunit_info(test, "Verifying that CCS data is cleared on creation.\n"); ret = ccs_test_migrate(tile, bo, false, 0ULL, 0xdeadbeefdeadbeefULL, test); From cf46019e8550a810cc023af7aa020ba43103b44d Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 5 Jan 2024 15:20:22 +0300 Subject: [PATCH 073/200] drm/xe: unlock on error path in xe_vm_add_compute_exec_queue() Drop the "&vm->lock" before returning. 
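[ Illustration, not part of the patches: the ERR_PTR convention behind both fixes above, shown as a fragment. Constructors such as xe_bo_create_user() encode errors in the returned pointer rather than returning NULL, so the result must pass IS_ERR() before any other call, including locking, consumes it. ]

#include <linux/err.h>

bo = xe_bo_create_user(xe, NULL, NULL, SZ_1M, DRM_XE_GEM_CPU_CACHING_WC,
		       ttm_bo_type_device, bo_flags);
if (IS_ERR(bo))
	return PTR_ERR(bo);	/* never lock or dereference an ERR_PTR */

xe_bo_lock(bo, false);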
Fixes: 24f947d58fe5 ("drm/xe: Use DRM GPUVM helpers for external- and evicted objects") Signed-off-by: Dan Carpenter Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/xe/xe_vm.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 127842656a2395..a7e7a0b240991a 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -335,13 +335,13 @@ int xe_vm_add_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q) down_write(&vm->lock); err = drm_gpuvm_exec_lock(&vm_exec); if (err) - return err; + goto out_up_write; pfence = xe_preempt_fence_create(q, q->compute.context, ++q->compute.seqno); if (!pfence) { err = -ENOMEM; - goto out_unlock; + goto out_fini; } list_add(&q->compute.link, &vm->preempt.exec_queues); @@ -364,8 +364,9 @@ int xe_vm_add_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q) up_read(&vm->userptr.notifier_lock); -out_unlock: +out_fini: drm_exec_fini(exec); +out_up_write: up_write(&vm->lock); return err; From ef51d7542d143f3fd9a48d4e2c307563661668aa Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= Date: Wed, 10 Jan 2024 17:34:15 +0100 Subject: [PATCH 074/200] drm/xe/migrate: Fix CCS copy for small VRAM copy chunks MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Since the migrate code is using the identity map for addressing VRAM, copy chunks may become as small as 64K if the VRAM resource is fragmented. However, a chunk size smaller than 1MiB may mean that the *next* chunk's offset into the CCS metadata backup memory is not page-aligned, and the XY_CTRL_SURF_COPY_BLT command can't handle that, and even if it could, the current code doesn't handle the offset calculation correctly. To fix this, make sure we align the size of VRAM copy chunks to 1MiB. If the remaining data to copy is smaller than that, that's not a problem, so use the remaining size. If the VRAM copy chunk becomes fragmented due to the size alignment restriction, don't use the identity map, but instead emit PTEs into the page-table like we do for system memory. v2: - Rebase v3: - Future proof somewhat by taking into account the real data size to flat CCS metadata size ratio. (Matt Roper) - Invert a couple of if-statements for better readability. - Fix support for 4K-granularity VRAM sizes. (Tested on DG1). v4: - Fix up code comments - Fix debug printout format typo. v5: - Add a Fixes: tag.
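[ Worked example of the chunk-size arithmetic in the diff below, assuming the flat-CCS ratio of 1 metadata byte per 256 data bytes: xe_device_ccs_bytes(xe, SZ_64K) = 64 KiB / 256 = 256, so min_chunk_size = SZ_4K * SZ_64K / 256 = 4096 * 65536 / 256 = 1 MiB. Every 1 MiB of VRAM copied then consumes exactly 4 KiB, one page, of CCS backup memory, which keeps the next chunk's metadata offset page-aligned as the message above requires. ]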
Cc: Matt Roper Cc: Matthew Auld Cc: Matthew Brost Fixes: e89b384cde62 ("drm/xe/migrate: Update emit_pte to cope with a size level than 4k") Signed-off-by: Thomas Hellström Reviewed-by: Matthew Auld Link: https://patchwork.freedesktop.org/patch/msgid/20240110163415.524165-1-thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/xe/tests/xe_migrate.c | 2 +- drivers/gpu/drm/xe/xe_migrate.c | 128 ++++++++++++++++---------- 2 files changed, 80 insertions(+), 50 deletions(-) diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c index 7a32faa2f68880..a6523df0f1d39f 100644 --- a/drivers/gpu/drm/xe/tests/xe_migrate.c +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c @@ -331,7 +331,7 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test) xe_res_first_sg(xe_bo_sg(pt), 0, pt->size, &src_it); emit_pte(m, bb, NUM_KERNEL_PDE - 1, xe_bo_is_vram(pt), false, - &src_it, XE_PAGE_SIZE, pt); + &src_it, XE_PAGE_SIZE, pt->ttm.resource); run_sanity_job(m, xe, bb, bb->len, "Writing PTE for our fake PT", test); diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c index 83fa92d2a35a14..88a32b272dda6c 100644 --- a/drivers/gpu/drm/xe/xe_migrate.c +++ b/drivers/gpu/drm/xe/xe_migrate.c @@ -62,6 +62,8 @@ struct xe_migrate { * out of the pt_bo. */ struct drm_suballoc_manager vm_update_sa; + /** @min_chunk_size: For dgfx, Minimum chunk size */ + u64 min_chunk_size; }; #define MAX_PREEMPTDISABLE_TRANSFER SZ_8M /* Around 1ms. */ @@ -363,6 +365,19 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile) if (err) return ERR_PTR(err); + if (IS_DGFX(xe)) { + if (xe_device_has_flat_ccs(xe)) + /* min chunk size corresponds to 4K of CCS Metadata */ + m->min_chunk_size = SZ_4K * SZ_64K / + xe_device_ccs_bytes(xe, SZ_64K); + else + /* Somewhat arbitrary to avoid a huge amount of blits */ + m->min_chunk_size = SZ_64K; + m->min_chunk_size = roundup_pow_of_two(m->min_chunk_size); + drm_dbg(&xe->drm, "Migrate min chunk size is 0x%08llx\n", + (unsigned long long)m->min_chunk_size); + } + return m; } @@ -374,16 +389,35 @@ static u64 max_mem_transfer_per_pass(struct xe_device *xe) return MAX_PREEMPTDISABLE_TRANSFER; } -static u64 xe_migrate_res_sizes(struct xe_device *xe, struct xe_res_cursor *cur) +static u64 xe_migrate_res_sizes(struct xe_migrate *m, struct xe_res_cursor *cur) { - /* - * For VRAM we use identity mapped pages so we are limited to current - * cursor size. For system we program the pages ourselves so we have no - * such limitation. - */ - return min_t(u64, max_mem_transfer_per_pass(xe), - mem_type_is_vram(cur->mem_type) ? cur->size : - cur->remaining); + struct xe_device *xe = tile_to_xe(m->tile); + u64 size = min_t(u64, max_mem_transfer_per_pass(xe), cur->remaining); + + if (mem_type_is_vram(cur->mem_type)) { + /* + * VRAM we want to blit in chunks with sizes aligned to + * min_chunk_size in order for the offset to CCS metadata to be + * page-aligned. If it's the last chunk it may be smaller. + * + * Another constraint is that we need to limit the blit to + * the VRAM block size, unless size is smaller than + * min_chunk_size. + */ + u64 chunk = max_t(u64, cur->size, m->min_chunk_size); + + size = min_t(u64, size, chunk); + if (size > m->min_chunk_size) + size = round_down(size, m->min_chunk_size); + } + + return size; +} + +static bool xe_migrate_allow_identity(u64 size, const struct xe_res_cursor *cur) +{ + /* If the chunk is not fragmented, allow identity map. 
*/ + return cur->size >= size; } static u32 pte_update_size(struct xe_migrate *m, @@ -396,7 +430,12 @@ static u32 pte_update_size(struct xe_migrate *m, u32 cmds = 0; *L0_pt = pt_ofs; - if (!is_vram) { + if (is_vram && xe_migrate_allow_identity(*L0, cur)) { + /* Offset into identity map. */ + *L0_ofs = xe_migrate_vram_ofs(tile_to_xe(m->tile), + cur->start + vram_region_gpu_offset(res)); + cmds += cmd_size; + } else { /* Clip L0 to available size */ u64 size = min(*L0, (u64)avail_pts * SZ_2M); u64 num_4k_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE); @@ -412,11 +451,6 @@ static u32 pte_update_size(struct xe_migrate *m, /* Each chunk has a single blit command */ cmds += cmd_size; - } else { - /* Offset into identity map. */ - *L0_ofs = xe_migrate_vram_ofs(tile_to_xe(m->tile), - cur->start + vram_region_gpu_offset(res)); - cmds += cmd_size; } return cmds; @@ -426,10 +460,10 @@ static void emit_pte(struct xe_migrate *m, struct xe_bb *bb, u32 at_pt, bool is_vram, bool is_comp_pte, struct xe_res_cursor *cur, - u32 size, struct xe_bo *bo) + u32 size, struct ttm_resource *res) { struct xe_device *xe = tile_to_xe(m->tile); - + struct xe_vm *vm = m->q->vm; u16 pat_index; u32 ptes; u64 ofs = at_pt * XE_PAGE_SIZE; @@ -442,13 +476,6 @@ static void emit_pte(struct xe_migrate *m, else pat_index = xe->pat.idx[XE_CACHE_WB]; - /* - * FIXME: Emitting VRAM PTEs to L0 PTs is forbidden. Currently - * we're only emitting VRAM PTEs during sanity tests, so when - * that's moved to a Kunit test, we should condition VRAM PTEs - * on running tests. - */ - ptes = DIV_ROUND_UP(size, XE_PAGE_SIZE); while (ptes) { @@ -468,20 +495,22 @@ static void emit_pte(struct xe_migrate *m, addr = xe_res_dma(cur) & PAGE_MASK; if (is_vram) { - /* Is this a 64K PTE entry? */ - if ((m->q->vm->flags & XE_VM_FLAG_64K) && - !(cur_ofs & (16 * 8 - 1))) { - xe_tile_assert(m->tile, IS_ALIGNED(addr, SZ_64K)); + if (vm->flags & XE_VM_FLAG_64K) { + u64 va = cur_ofs * XE_PAGE_SIZE / 8; + + xe_assert(xe, (va & (SZ_64K - 1)) == + (addr & (SZ_64K - 1))); + flags |= XE_PTE_PS64; } - addr += vram_region_gpu_offset(bo->ttm.resource); + addr += vram_region_gpu_offset(res); devmem = true; } - addr = m->q->vm->pt_ops->pte_encode_addr(m->tile->xe, - addr, pat_index, - 0, devmem, flags); + addr = vm->pt_ops->pte_encode_addr(m->tile->xe, + addr, pat_index, + 0, devmem, flags); bb->cs[bb->len++] = lower_32_bits(addr); bb->cs[bb->len++] = upper_32_bits(addr); @@ -693,8 +722,8 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m, bool usm = xe->info.has_usm; u32 avail_pts = max_mem_transfer_per_pass(xe) / LEVEL0_PAGE_TABLE_ENCODE_SIZE; - src_L0 = xe_migrate_res_sizes(xe, &src_it); - dst_L0 = xe_migrate_res_sizes(xe, &dst_it); + src_L0 = xe_migrate_res_sizes(m, &src_it); + dst_L0 = xe_migrate_res_sizes(m, &dst_it); drm_dbg(&xe->drm, "Pass %u, sizes: %llu & %llu\n", pass++, src_L0, dst_L0); @@ -715,6 +744,7 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m, &ccs_ofs, &ccs_pt, 0, 2 * avail_pts, avail_pts); + xe_assert(xe, IS_ALIGNED(ccs_it.start, PAGE_SIZE)); } /* Add copy commands size here */ @@ -727,20 +757,20 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m, goto err_sync; } - if (!src_is_vram) - emit_pte(m, bb, src_L0_pt, src_is_vram, true, &src_it, src_L0, - src_bo); - else + if (src_is_vram && xe_migrate_allow_identity(src_L0, &src_it)) xe_res_next(&src_it, src_L0); - - if (!dst_is_vram) - emit_pte(m, bb, dst_L0_pt, dst_is_vram, true, &dst_it, src_L0, - dst_bo); else + emit_pte(m, bb, src_L0_pt, src_is_vram, true, &src_it, src_L0, + src); + + 
if (dst_is_vram && xe_migrate_allow_identity(src_L0, &dst_it)) xe_res_next(&dst_it, src_L0); + else + emit_pte(m, bb, dst_L0_pt, dst_is_vram, true, &dst_it, src_L0, + dst); if (copy_system_ccs) - emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src_bo); + emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src); bb->cs[bb->len++] = MI_BATCH_BUFFER_END; update_idx = bb->len; @@ -949,7 +979,7 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m, bool usm = xe->info.has_usm; u32 avail_pts = max_mem_transfer_per_pass(xe) / LEVEL0_PAGE_TABLE_ENCODE_SIZE; - clear_L0 = xe_migrate_res_sizes(xe, &src_it); + clear_L0 = xe_migrate_res_sizes(m, &src_it); drm_dbg(&xe->drm, "Pass %u, size: %llu\n", pass++, clear_L0); @@ -976,12 +1006,12 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m, size -= clear_L0; /* Preemption is enabled again by the ring ops. */ - if (!clear_vram) { - emit_pte(m, bb, clear_L0_pt, clear_vram, true, &src_it, clear_L0, - bo); - } else { + if (clear_vram && xe_migrate_allow_identity(clear_L0, &src_it)) xe_res_next(&src_it, clear_L0); - } + else + emit_pte(m, bb, clear_L0_pt, clear_vram, true, &src_it, clear_L0, + dst); + bb->cs[bb->len++] = MI_BATCH_BUFFER_END; update_idx = bb->len; From 88cbf8502023dcb97bf9e40655d4848ba14350e0 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Thu, 11 Jan 2024 17:20:51 +0100 Subject: [PATCH 075/200] drm/xe: Split GuC communication initialization Soon we will be trying to communicate with the GuC firmware very early during VF driver probe, before we finish normal init steps. Split GuC communication initialization code so the GuC MMIO based communication xe_guc_mmio_send() functions will work where needed. Cc: Matthew Brost Reviewed-by: Matthew Brost Link: https://lore.kernel.org/r/20240111162051.585-1-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_guc.c | 21 +++++++++++++++++---- drivers/gpu/drm/xe/xe_guc.h | 1 + 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 6e73ebf6725148..0fd9b5efe4c2d6 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -243,6 +243,22 @@ static void guc_fini(struct drm_device *drm, void *arg) xe_force_wake_put(gt_to_fw(guc_to_gt(guc)), XE_FORCEWAKE_ALL); } +/** + * xe_guc_comm_init_early - early initialization of GuC communication + * @guc: the &xe_guc to initialize + * + * Must be called prior to first MMIO communication with GuC firmware. 
+ */ +void xe_guc_comm_init_early(struct xe_guc *guc) +{ + struct xe_gt *gt = guc_to_gt(guc); + + if (xe_gt_is_media_type(gt)) + guc->notify_reg = MED_GUC_HOST_INTERRUPT; + else + guc->notify_reg = GUC_HOST_INTERRUPT; +} + int xe_guc_init(struct xe_guc *guc) { struct xe_device *xe = guc_to_xe(guc); @@ -283,10 +299,7 @@ int xe_guc_init(struct xe_guc *guc) guc_init_params(guc); - if (xe_gt_is_media_type(gt)) - guc->notify_reg = MED_GUC_HOST_INTERRUPT; - else - guc->notify_reg = GUC_HOST_INTERRUPT; + xe_guc_comm_init_early(guc); xe_uc_fw_change_status(&guc->fw, XE_UC_FIRMWARE_LOADABLE); diff --git a/drivers/gpu/drm/xe/xe_guc.h b/drivers/gpu/drm/xe/xe_guc.h index d3e49e7fd7c34f..94f2dc5f6f90ba 100644 --- a/drivers/gpu/drm/xe/xe_guc.h +++ b/drivers/gpu/drm/xe/xe_guc.h @@ -13,6 +13,7 @@ struct drm_printer; +void xe_guc_comm_init_early(struct xe_guc *guc); int xe_guc_init(struct xe_guc *guc); int xe_guc_init_post_hwconfig(struct xe_guc *guc); int xe_guc_post_load_init(struct xe_guc *guc); From 3c01e01214026114609c577ce31f81d4e037dd50 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Thu, 11 Jan 2024 16:48:38 +0100 Subject: [PATCH 076/200] drm/xe/guc: Treat non-response message after BUSY as unexpected Once the GuC has replied with a GUC_HXG_TYPE_NO_RESPONSE_BUSY message, we expect that only a RESPONSE_SUCCESS or FAILURE message will be sent; anything else is a violation of the HXG protocol. Cc: Matthew Brost Reviewed-by: Matthew Brost Link: https://lore.kernel.org/r/20240111154838.541-1-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_guc.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 0fd9b5efe4c2d6..235d27b17ff993 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -731,8 +731,12 @@ int xe_guc_mmio_send_recv(struct xe_guc *guc, const u32 *request, if (unlikely(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, header) != GUC_HXG_ORIGIN_GUC)) goto proto; - if (unlikely(ret)) + if (unlikely(ret)) { + if (FIELD_GET(GUC_HXG_MSG_0_TYPE, header) != + GUC_HXG_TYPE_NO_RESPONSE_BUSY) + goto proto; goto timeout; + } } if (FIELD_GET(GUC_HXG_MSG_0_TYPE, header) == From d898c2e55593fea5da068de48a878c66520a4af8 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Thu, 11 Jan 2024 16:27:24 +0100 Subject: [PATCH 077/200] drm/xe/guc: Return CTB response length Not all CTB responses from the GuC are fixed size, and we need to pass the response length to the caller if there was a response_buffer. The easiest solution is to return it as a positive value from all xe_guc_ct_send_recv() functions. The CTB response length is always between 1 and 254 (i.e. GUC_HXG_MSG_MIN_LEN and GUC_CTB_MAX_DWORDS - GUC_HXG_MSG_MIN_LEN). Cc: Matthew Brost Reviewed-by: Matthew Brost Link: https://lore.kernel.org/r/20240111152724.497-1-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_guc_ct.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index c29f095aa1b98a..d490a4d42c01bb 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -780,7 +780,7 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len, ret = -EIO; } - return ret > 0 ? 0 : ret; + return ret > 0 ? response_buffer ? 
g2h_fence.response_len : 0 : ret; } int xe_guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len, From d4978a67ae97a2b875c8e8b6684866ee1d35fa80 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Thu, 11 Jan 2024 22:06:32 +0100 Subject: [PATCH 078/200] drm/xe/guc: Use HXG definitions on HXG messages While parsing and processing CTB G2H messages, we should extract the underlying HXG message and use the HXG definitions on that message. Using the outer CTB layer message with the HXG definitions requires use of a shifted dword index, which might be confusing: FIELD_GET(GUC_HXG_MSG_0_xxx, msg[1]) instead of: FIELD_GET(GUC_HXG_MSG_0_xxx, hxg[0]) Cc: Matthew Brost Reviewed-by: Matthew Brost Link: https://lore.kernel.org/r/20240111210632.717-1-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_guc_ct.c | 72 +++++++++++++++++++++------------- 1 file changed, 44 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index d490a4d42c01bb..b2cc3f993eb6ca 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -796,9 +796,20 @@ int xe_guc_ct_send_recv_no_fail(struct xe_guc_ct *ct, const u32 *action, return guc_ct_send_recv(ct, action, len, response_buffer, true); } +static u32 *msg_to_hxg(u32 *msg) +{ + return msg + GUC_CTB_MSG_MIN_LEN; +} + +static u32 msg_len_to_hxg_len(u32 len) +{ + return len - GUC_CTB_MSG_MIN_LEN; +} + static int parse_g2h_event(struct xe_guc_ct *ct, u32 *msg, u32 len) { - u32 action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, msg[1]); + u32 *hxg = msg_to_hxg(msg); + u32 action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, hxg[0]); lockdep_assert_held(&ct->lock); @@ -817,9 +828,10 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len) { struct xe_gt *gt = ct_to_gt(ct); struct xe_device *xe = gt_to_xe(gt); - u32 response_len = len - GUC_CTB_MSG_MIN_LEN; + u32 *hxg = msg_to_hxg(msg); + u32 hxg_len = msg_len_to_hxg_len(len); u32 fence = FIELD_GET(GUC_CTB_MSG_0_FENCE, msg[0]); - u32 type = FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[1]); + u32 type = FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg[0]); struct g2h_fence *g2h_fence; lockdep_assert_held(&ct->lock); @@ -836,8 +848,8 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len) if (type == GUC_HXG_TYPE_RESPONSE_FAILURE) xe_gt_err(gt, "FAST_REQ H2G fence 0x%x failed! 
e=0x%x, h=%u\n", fence, - FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, msg[1]), - FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, msg[1])); + FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, hxg[0]), + FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, hxg[0])); else xe_gt_err(gt, "unexpected response %u for FAST_REQ H2G fence 0x%x!\n", type, fence); @@ -857,18 +869,14 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len) if (type == GUC_HXG_TYPE_RESPONSE_FAILURE) { g2h_fence->fail = true; - g2h_fence->error = - FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, msg[1]); - g2h_fence->hint = - FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, msg[1]); + g2h_fence->error = FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, hxg[0]); + g2h_fence->hint = FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, hxg[0]); } else if (type == GUC_HXG_TYPE_NO_RESPONSE_RETRY) { g2h_fence->retry = true; - g2h_fence->reason = - FIELD_GET(GUC_HXG_RETRY_MSG_0_REASON, msg[1]); + g2h_fence->reason = FIELD_GET(GUC_HXG_RETRY_MSG_0_REASON, hxg[0]); } else if (g2h_fence->response_buffer) { - g2h_fence->response_len = response_len; - memcpy(g2h_fence->response_buffer, msg + GUC_CTB_MSG_MIN_LEN, - response_len * sizeof(u32)); + g2h_fence->response_len = hxg_len; + memcpy(g2h_fence->response_buffer, hxg, hxg_len * sizeof(u32)); } g2h_release_space(ct, GUC_CTB_HXG_MSG_MAX_LEN); @@ -884,14 +892,13 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len) static int parse_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len) { struct xe_device *xe = ct_to_xe(ct); - u32 hxg, origin, type; + u32 *hxg = msg_to_hxg(msg); + u32 origin, type; int ret; lockdep_assert_held(&ct->lock); - hxg = msg[1]; - - origin = FIELD_GET(GUC_HXG_MSG_0_ORIGIN, hxg); + origin = FIELD_GET(GUC_HXG_MSG_0_ORIGIN, hxg[0]); if (unlikely(origin != GUC_HXG_ORIGIN_GUC)) { drm_err(&xe->drm, "G2H channel broken on read, origin=%d, reset required\n", @@ -901,7 +908,7 @@ static int parse_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len) return -EPROTO; } - type = FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg); + type = FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg[0]); switch (type) { case GUC_HXG_TYPE_EVENT: ret = parse_g2h_event(ct, msg, len); @@ -927,14 +934,19 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len) { struct xe_device *xe = ct_to_xe(ct); struct xe_guc *guc = ct_to_guc(ct); - u32 action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, msg[1]); - u32 *payload = msg + GUC_CTB_HXG_MSG_MIN_LEN; - u32 adj_len = len - GUC_CTB_HXG_MSG_MIN_LEN; + u32 hxg_len = msg_len_to_hxg_len(len); + u32 *hxg = msg_to_hxg(msg); + u32 action, adj_len; + u32 *payload; int ret = 0; - if (FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[1]) != GUC_HXG_TYPE_EVENT) + if (FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg[0]) != GUC_HXG_TYPE_EVENT) return 0; + action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, hxg[0]); + payload = hxg + GUC_HXG_EVENT_MSG_MIN_LEN; + adj_len = hxg_len - GUC_HXG_EVENT_MSG_MIN_LEN; + switch (action) { case XE_GUC_ACTION_SCHED_CONTEXT_MODE_DONE: ret = xe_guc_sched_done_handler(guc, payload, adj_len); @@ -995,6 +1007,7 @@ static int g2h_read(struct xe_guc_ct *ct, u32 *msg, bool fast_path) u32 tail, head, len; s32 avail; u32 action; + u32 *hxg; lockdep_assert_held(&ct->fast_lock); @@ -1045,10 +1058,11 @@ static int g2h_read(struct xe_guc_ct *ct, u32 *msg, bool fast_path) avail * sizeof(u32)); } - action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, msg[1]); + hxg = msg_to_hxg(msg); + action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, hxg[0]); if (fast_path) { - if (FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[1]) != GUC_HXG_TYPE_EVENT) + if (FIELD_GET(GUC_HXG_MSG_0_TYPE, 
hxg[0]) != GUC_HXG_TYPE_EVENT) return 0; switch (action) { @@ -1074,9 +1088,11 @@ static void g2h_fast_path(struct xe_guc_ct *ct, u32 *msg, u32 len) { struct xe_device *xe = ct_to_xe(ct); struct xe_guc *guc = ct_to_guc(ct); - u32 action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, msg[1]); - u32 *payload = msg + GUC_CTB_HXG_MSG_MIN_LEN; - u32 adj_len = len - GUC_CTB_HXG_MSG_MIN_LEN; + u32 hxg_len = msg_len_to_hxg_len(len); + u32 *hxg = msg_to_hxg(msg); + u32 action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, hxg[0]); + u32 *payload = hxg + GUC_HXG_MSG_MIN_LEN; + u32 adj_len = hxg_len - GUC_HXG_MSG_MIN_LEN; int ret = 0; switch (action) { From 33ff1f21bd2fb69620d5ffc7afccf74cbc403097 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Thu, 11 Jan 2024 19:25:59 +0100 Subject: [PATCH 079/200] drm/xe: Allow to exclude part of GGTT from allocations MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Soon we will be required to exclude some of the GGTT addresses from the allocations, since on some platforms running the SR-IOV VF mode, we will be able to use only a selected range of the GGTT space. Add helper functions to manage such GGTT range exclusions, and follow the naming from the similar concept used by GVT-g. Reviewed-by: Piotr Piórkowski Link: https://lore.kernel.org/r/20240111182559.629-1-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_ggtt.c | 71 ++++++++++++++++++++++++++++++++++++ drivers/gpu/drm/xe/xe_ggtt.h | 3 ++ 2 files changed, 74 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c index c639dbf3bdd27b..6fdf830678b3bd 100644 --- a/drivers/gpu/drm/xe/xe_ggtt.c +++ b/drivers/gpu/drm/xe/xe_ggtt.c @@ -11,9 +11,12 @@ #include #include "regs/xe_gt_regs.h" +#include "regs/xe_regs.h" +#include "xe_assert.h" #include "xe_bo.h" #include "xe_device.h" #include "xe_gt.h" +#include "xe_gt_printk.h" #include "xe_gt_tlb_invalidation.h" #include "xe_map.h" #include "xe_mmio.h" @@ -312,6 +315,74 @@ void xe_ggtt_printk(struct xe_ggtt *ggtt, const char *prefix) } } +static void xe_ggtt_dump_node(struct xe_ggtt *ggtt, + const struct drm_mm_node *node, const char *description) +{ + char buf[10]; + + if (IS_ENABLED(CONFIG_DRM_XE_DEBUG)) { + string_get_size(node->size, 1, STRING_UNITS_2, buf, sizeof(buf)); + xe_gt_dbg(ggtt->tile->primary_gt, "GGTT %#llx-%#llx (%s) %s\n", + node->start, node->start + node->size, buf, description); + } +} + +/** + * xe_ggtt_balloon - prevent allocation of specified GGTT addresses + * @ggtt: the &xe_ggtt where we want to make reservation + * @start: the starting GGTT address of the reserved region + * @end: the end GGTT address of the reserved region + * @node: the &drm_mm_node to hold reserved GGTT node + * + * Use xe_ggtt_deballoon() to release a reserved GGTT node. + * + * Return: 0 on success or a negative error code on failure. 
+ */ +int xe_ggtt_balloon(struct xe_ggtt *ggtt, u64 start, u64 end, struct drm_mm_node *node) +{ + int err; + + xe_tile_assert(ggtt->tile, start < end); + xe_tile_assert(ggtt->tile, IS_ALIGNED(start, XE_PAGE_SIZE)); + xe_tile_assert(ggtt->tile, IS_ALIGNED(end, XE_PAGE_SIZE)); + xe_tile_assert(ggtt->tile, !drm_mm_node_allocated(node)); + + node->color = 0; + node->start = start; + node->size = end - start; + + mutex_lock(&ggtt->lock); + err = drm_mm_reserve_node(&ggtt->mm, node); + mutex_unlock(&ggtt->lock); + + if (xe_gt_WARN(ggtt->tile->primary_gt, err, + "Failed to balloon GGTT %#llx-%#llx (%pe)\n", + node->start, node->start + node->size, ERR_PTR(err))) + return err; + + xe_ggtt_dump_node(ggtt, node, "balloon"); + return 0; +} + +/** + * xe_ggtt_deballoon - release a reserved GGTT region + * @ggtt: the &xe_ggtt where reserved node belongs + * @node: the &drm_mm_node with reserved GGTT region + * + * See xe_ggtt_balloon() for details. + */ +void xe_ggtt_deballoon(struct xe_ggtt *ggtt, struct drm_mm_node *node) +{ + if (!drm_mm_node_allocated(node)) + return; + + xe_ggtt_dump_node(ggtt, node, "deballoon"); + + mutex_lock(&ggtt->lock); + drm_mm_remove_node(node); + mutex_unlock(&ggtt->lock); +} + int xe_ggtt_insert_special_node_locked(struct xe_ggtt *ggtt, struct drm_mm_node *node, u32 size, u32 align, u32 mm_flags) { diff --git a/drivers/gpu/drm/xe/xe_ggtt.h b/drivers/gpu/drm/xe/xe_ggtt.h index a09c166dff7010..42705e1338e123 100644 --- a/drivers/gpu/drm/xe/xe_ggtt.h +++ b/drivers/gpu/drm/xe/xe_ggtt.h @@ -16,6 +16,9 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt); int xe_ggtt_init(struct xe_ggtt *ggtt); void xe_ggtt_printk(struct xe_ggtt *ggtt, const char *prefix); +int xe_ggtt_balloon(struct xe_ggtt *ggtt, u64 start, u64 size, struct drm_mm_node *node); +void xe_ggtt_deballoon(struct xe_ggtt *ggtt, struct drm_mm_node *node); + int xe_ggtt_insert_special_node(struct xe_ggtt *ggtt, struct drm_mm_node *node, u32 size, u32 align); int xe_ggtt_insert_special_node_locked(struct xe_ggtt *ggtt, From 1113e52ffee7b45def230d10edb1f2924c7b3f9e Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Thu, 11 Jan 2024 19:56:03 +0100 Subject: [PATCH 080/200] drm/xe: Fix potential deadlock in __fini_dbm MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If the Doorbell Manager is in an unclean state during the fini phase, we try to print its state for debug purposes, but we missed the fact that we are already holding a lock, so xe_guc_db_mgr_print() will deadlock since it also attempts to grab the same lock. 
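The fix follows the usual kernel split between a public function that takes the lock and an internal *_locked helper for callers that already hold it. A minimal sketch of the pattern, with placeholder names rather than the driver's actual API:

/* Sketch only: 'foo' is illustrative, not the real xe type. */
static void foo_print_locked(struct foo *f)
{
	/* body runs with f->lock already held by the caller */
}

void foo_print(struct foo *f)
{
	mutex_lock(&f->lock);
	foo_print_locked(f);	/* safe: we took the lock ourselves */
	mutex_unlock(&f->lock);
}

__fini_dbm(), which is called with the doorbell manager lock held, can then call the _locked variant directly, which is exactly what the patch below does. 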
Fixes: 587c73343ac7 ("drm/xe: Introduce GuC Doorbells Manager") Cc: Piotr Piórkowski Reviewed-by: Piotr Piórkowski Link: https://lore.kernel.org/r/20240111185603.673-1-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_guc_db_mgr.c | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_guc_db_mgr.c b/drivers/gpu/drm/xe/xe_guc_db_mgr.c index c1c04575d82d92..8d9a0287df6b9b 100644 --- a/drivers/gpu/drm/xe/xe_guc_db_mgr.c +++ b/drivers/gpu/drm/xe/xe_guc_db_mgr.c @@ -46,6 +46,8 @@ static struct xe_device *dbm_to_xe(struct xe_guc_db_mgr *dbm) #define dbm_assert(_dbm, _cond) xe_gt_assert(dbm_to_gt(_dbm), _cond) #define dbm_mutex(_dbm) (&dbm_to_guc(_dbm)->submission_state.lock) +static void dbm_print_locked(struct xe_guc_db_mgr *dbm, struct drm_printer *p, int indent); + static void __fini_dbm(struct drm_device *drm, void *arg) { struct xe_guc_db_mgr *dbm = arg; @@ -59,7 +61,7 @@ static void __fini_dbm(struct drm_device *drm, void *arg) xe_gt_err(dbm_to_gt(dbm), "GuC doorbells manager unclean (%u/%u)\n", weight, dbm->count); - xe_guc_db_mgr_print(dbm, &p, 1); + dbm_print_locked(dbm, &p, 1); } bitmap_free(dbm->bitmap); @@ -219,14 +221,7 @@ void xe_guc_db_mgr_release_range(struct xe_guc_db_mgr *dbm, mutex_unlock(dbm_mutex(dbm)); } -/** - * xe_guc_db_mgr_print() - Print status of GuC Doorbells Manager. - * @dbm: the &xe_guc_db_mgr to print - * @p: the &drm_printer to print to - * @indent: tab indentation level - */ -void xe_guc_db_mgr_print(struct xe_guc_db_mgr *dbm, - struct drm_printer *p, int indent) +static void dbm_print_locked(struct xe_guc_db_mgr *dbm, struct drm_printer *p, int indent) { unsigned int rs, re; unsigned int total; @@ -235,8 +230,6 @@ void xe_guc_db_mgr_print(struct xe_guc_db_mgr *dbm, if (!dbm->bitmap) return; - mutex_lock(dbm_mutex(dbm)); - total = 0; for_each_clear_bitrange(rs, re, dbm->bitmap, dbm->count) { drm_printf_indent(p, indent, "available range: %u..%u (%u)\n", @@ -252,7 +245,19 @@ void xe_guc_db_mgr_print(struct xe_guc_db_mgr *dbm, total += re - rs; } drm_printf_indent(p, indent, "reserved total: %u\n", total); +} +/** + * xe_guc_db_mgr_print() - Print status of GuC Doorbells Manager. + * @dbm: the &xe_guc_db_mgr to print + * @p: the &drm_printer to print to + * @indent: tab indentation level + */ +void xe_guc_db_mgr_print(struct xe_guc_db_mgr *dbm, + struct drm_printer *p, int indent) +{ + mutex_lock(dbm_mutex(dbm)); + dbm_print_locked(dbm, p, indent); mutex_unlock(dbm_mutex(dbm)); } From ca630876aa98c5118ada07604ed8688ee707ddfa Mon Sep 17 00:00:00 2001 From: Matt Roper Date: Thu, 11 Jan 2024 14:02:38 -0800 Subject: [PATCH 081/200] drm/xe/migrate: Cap PTEs written by MI_STORE_DATA_IMM to 510 Although MI_STORE_DATA_IMM's "length" field is 10-bits, 0x3FE is considered the largest legal value accepted. Since that instruction field is always encoded in (val-2) format, this translates to 0x400 dwords for the true maximum length of the instruction. Subtracting the instruction header (1 dword) and address (2 dwords), that leaves 0x3FD dwords (i.e., 0x1FE qwords) for PTE values. 
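Restating that arithmetic as a quick sanity check (no new information beyond the commit message above):

/*
 * MI_STORE_DATA_IMM payload budget:
 *
 *   largest legal "length" field value : 0x3FE
 *   (val - 2) encoding bias            : +2  -> 0x400 dwords total
 *   instruction header                 : -1 dword
 *   destination address                : -2 dwords
 *   --------------------------------------------------
 *   payload                            : 0x3FD dwords
 *
 * 0x3FD / 2 = 0x1FE (510) qword PTEs per instruction, which is the
 * MAX_PTE_PER_SDI value introduced below.
 */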
Bspec: 60246, 45753 Reviewed-by: Matthew Brost Link: https://lore.kernel.org/r/20240111220238.1467572-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper --- drivers/gpu/drm/xe/xe_migrate.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c index 88a32b272dda6c..44725f978f3e1e 100644 --- a/drivers/gpu/drm/xe/xe_migrate.c +++ b/drivers/gpu/drm/xe/xe_migrate.c @@ -72,6 +72,15 @@ struct xe_migrate { #define NUM_PT_SLOTS 32 #define LEVEL0_PAGE_TABLE_ENCODE_SIZE SZ_2M +/* + * Although MI_STORE_DATA_IMM's "length" field is 10-bits, 0x3FE is the largest + * legal value accepted. Since that instruction field is always stored in + * (val-2) format, this translates to 0x400 dwords for the true maximum length + * of the instruction. Subtracting the instruction header (1 dword) and + * address (2 dwords), that leaves 0x3FD dwords (0x1FE qwords) for PTE values. + */ +#define MAX_PTE_PER_SDI 0x1FE + /** * xe_tile_migrate_engine() - Get this tile's migrate engine. * @tile: The tile. @@ -444,7 +453,7 @@ static u32 pte_update_size(struct xe_migrate *m, *L0_ofs = xe_migrate_vm_addr(pt_ofs, 0); /* MI_STORE_DATA_IMM */ - cmds += 3 * DIV_ROUND_UP(num_4k_pages, 0x1ff); + cmds += 3 * DIV_ROUND_UP(num_4k_pages, MAX_PTE_PER_SDI); /* PDE qwords */ cmds += num_4k_pages * 2; @@ -479,7 +488,7 @@ static void emit_pte(struct xe_migrate *m, ptes = DIV_ROUND_UP(size, XE_PAGE_SIZE); while (ptes) { - u32 chunk = min(0x1ffU, ptes); + u32 chunk = min(MAX_PTE_PER_SDI, ptes); bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(chunk); bb->cs[bb->len++] = ofs; @@ -1098,7 +1107,7 @@ static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs, * This shouldn't be possible in practice.. might change when 16K * pages are used. Hence the assert. */ - xe_tile_assert(tile, update->qwords <= 0x1ff); + xe_tile_assert(tile, update->qwords <= MAX_PTE_PER_SDI); if (!ppgtt_ofs) ppgtt_ofs = xe_migrate_vram_ofs(tile_to_xe(tile), xe_bo_addr(update->pt_bo, 0, @@ -1107,7 +1116,7 @@ static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs, do { u64 addr = ppgtt_ofs + ofs * 8; - chunk = min(update->qwords, 0x1ffU); + chunk = min(update->qwords, MAX_PTE_PER_SDI); /* Ensure populatefn can do memset64 by aligning bb->cs */ if (!(bb->len & 1)) @@ -1283,7 +1292,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m, batch_size = 6 + num_updates * 2; for (i = 0; i < num_updates; i++) { - u32 num_cmds = DIV_ROUND_UP(updates[i].qwords, 0x1ff); + u32 num_cmds = DIV_ROUND_UP(updates[i].qwords, MAX_PTE_PER_SDI); /* align noop + MI_STORE_DATA_IMM cmd prefix */ batch_size += 4 * num_cmds + updates[i].qwords * 2; From 52e3fa3e3ea3ee05e32c1a8d72bb3ae306a4da64 Mon Sep 17 00:00:00 2001 From: Brian Welty Date: Wed, 10 Jan 2024 16:21:11 -0800 Subject: [PATCH 082/200] drm/xe: Fix bounds checking in __xe_bo_placement_for_flags() Requesting all memory regions on PVC will fill bo->placements up to XE_BO_MAX_PLACEMENTS. The subsequent call to try_add_stolen() will trip over the bounds checking even though XE_PL_STOLEN is not expected to be used in this case. This is hit with igt@xe_exec_fault_mode@once-basic-prefetch: xe 0000:8c:00.0: [drm] Assertion `*c < (sizeof(bo->placements) / sizeof((bo->placements)[0]) + ((int)(sizeof(struct { int:(-!!(__builtin_types_compatible_p(typeof((bo->placements)), typeof(&(bo->placements)[0])))); }))))` failed! 
WARNING: CPU: 30 PID: 6161 at drivers/gpu/drm/xe/xe_bo.c:203 __xe_bo_placement_for_flags+0x218/0x240 [xe] This is fixed here by moving the bounds checks closer to where we actually write into the bo->placements array. Fixes: 8c54ee8a8606 ("drm/xe: Ensure that we don't access the placements array out-of-bounds") Link: https://patchwork.freedesktop.org/patch/msgid/20240111002111.10190-1-brian.welty@intel.com Signed-off-by: Matthew Brost Signed-off-by: Brian Welty Reviewed-by: Matthew Brost --- drivers/gpu/drm/xe/xe_bo.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 338f9688a2c939..26fe73f58d7207 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -125,9 +125,9 @@ static struct xe_mem_region *res_to_mem_region(struct ttm_resource *res) static void try_add_system(struct xe_device *xe, struct xe_bo *bo, u32 bo_flags, u32 *c) { - xe_assert(xe, *c < ARRAY_SIZE(bo->placements)); - if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) { + xe_assert(xe, *c < ARRAY_SIZE(bo->placements)); + bo->placements[*c] = (struct ttm_place) { .mem_type = XE_PL_TT, }; @@ -145,6 +145,8 @@ static void add_vram(struct xe_device *xe, struct xe_bo *bo, struct xe_mem_region *vram; u64 io_size; + xe_assert(xe, *c < ARRAY_SIZE(bo->placements)); + vram = to_xe_ttm_vram_mgr(ttm_manager_type(&xe->ttm, mem_type))->vram; xe_assert(xe, vram && vram->usable_size); io_size = vram->io_size; @@ -175,8 +177,6 @@ static void add_vram(struct xe_device *xe, struct xe_bo *bo, static void try_add_vram(struct xe_device *xe, struct xe_bo *bo, u32 bo_flags, u32 *c) { - xe_assert(xe, *c < ARRAY_SIZE(bo->placements)); - if (bo->props.preferred_gt == XE_GT1) { if (bo_flags & XE_BO_CREATE_VRAM1_BIT) add_vram(xe, bo, bo->placements, bo_flags, XE_PL_VRAM1, c); @@ -193,9 +193,9 @@ static void try_add_vram(struct xe_device *xe, struct xe_bo *bo, static void try_add_stolen(struct xe_device *xe, struct xe_bo *bo, u32 bo_flags, u32 *c) { - xe_assert(xe, *c < ARRAY_SIZE(bo->placements)); - if (bo_flags & XE_BO_CREATE_STOLEN_BIT) { + xe_assert(xe, *c < ARRAY_SIZE(bo->placements)); + bo->placements[*c] = (struct ttm_place) { .mem_type = XE_PL_STOLEN, .flags = bo_flags & (XE_BO_CREATE_PINNED_BIT | From 1c7531f50eaa425eca8ff726287b8df3a4a51e55 Mon Sep 17 00:00:00 2001 From: Jani Nikula Date: Thu, 11 Jan 2024 12:47:16 +0200 Subject: [PATCH 083/200] drm/xe: display support should not depend on EXPERT Remove the DRM_XE_DISPLAY config dependency on EXPERT. I can only presume the idea was that only experts should be able to disable it, but the effect is the opposite: with the EXPERT dependency, the option is hidden and cannot be enabled at all unless EXPERT is set. 
Reported-by: Eero Tamminen Reviewed-by: Francois Dugast Signed-off-by: Jani Nikula Link: https://patchwork.freedesktop.org/patch/msgid/20240111104716.3548744-1-jani.nikula@intel.com --- drivers/gpu/drm/xe/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig index a53b0fdc15a746..1b57ae38210d4c 100644 --- a/drivers/gpu/drm/xe/Kconfig +++ b/drivers/gpu/drm/xe/Kconfig @@ -47,7 +47,7 @@ config DRM_XE config DRM_XE_DISPLAY bool "Enable display support" - depends on DRM_XE && EXPERT && DRM_XE=m + depends on DRM_XE && DRM_XE=m select FB_IOMEM_HELPERS select I2C select I2C_ALGOBIT From ddc3c0877e16669eb61782f0fe3abc786cc426a1 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Fri, 12 Jan 2024 17:06:52 +0100 Subject: [PATCH 084/200] drm/xe: Use kstrdup while creating snapshot There is no need to copy the string step by step; use the existing helper. Cc: Rodrigo Vivi Reviewed-by: Rodrigo Vivi Link: https://lore.kernel.org/r/20240112160652.893-1-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko --- drivers/gpu/drm/xe/xe_hw_engine.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c index e279ef6c527cd7..3aaab507f37feb 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine.c +++ b/drivers/gpu/drm/xe/xe_hw_engine.c @@ -749,7 +749,6 @@ struct xe_hw_engine_snapshot * xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe) { struct xe_hw_engine_snapshot *snapshot; - int len; if (!xe_hw_engine_is_valid(hwe)) return NULL; @@ -759,11 +758,7 @@ xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe) if (!snapshot) return NULL; - len = strlen(hwe->name) + 1; - snapshot->name = kzalloc(len, GFP_ATOMIC); - if (snapshot->name) - strscpy(snapshot->name, hwe->name, len); - + snapshot->name = kstrdup(hwe->name, GFP_ATOMIC); snapshot->class = hwe->class; snapshot->logical_instance = hwe->logical_instance; snapshot->forcewake.domain = hwe->domain; From 85f3b79fb5788e2a1ec938a70d8f5c7150a670b9 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Wed, 22 Nov 2023 12:31:46 -0800 Subject: [PATCH 085/200] drm/xe: Group normal kunit tests in a single module Creating one module for each compilation unit to be tested seems excessive as the number of tests increases. Group them all in a single kunit test module called xe_test.ko. The tests requiring the physical device, aka "live" tests, are still kept in separate modules since they are normally triggered via igt, and not via kunit.py. After igt is converted, those can be merged into a single module as well. 
Signed-off-by: Lucas De Marchi Acked-by: Michal Wajdeczko Link: https://patchwork.freedesktop.org/patch/msgid/20231122203147.988021-2-lucas.demarchi@intel.com --- drivers/gpu/drm/xe/tests/Makefile | 7 ++++++- drivers/gpu/drm/xe/tests/xe_pci_test.c | 5 ----- drivers/gpu/drm/xe/tests/xe_rtp_test.c | 5 ----- drivers/gpu/drm/xe/tests/xe_test_mod.c | 10 ++++++++++ drivers/gpu/drm/xe/tests/xe_wa_test.c | 5 ----- 5 files changed, 16 insertions(+), 16 deletions(-) create mode 100644 drivers/gpu/drm/xe/tests/xe_test_mod.c diff --git a/drivers/gpu/drm/xe/tests/Makefile b/drivers/gpu/drm/xe/tests/Makefile index 39d8a0892274ab..9d1d88af8b2fa5 100644 --- a/drivers/gpu/drm/xe/tests/Makefile +++ b/drivers/gpu/drm/xe/tests/Makefile @@ -1,10 +1,15 @@ # SPDX-License-Identifier: GPL-2.0 +# "live" kunit tests obj-$(CONFIG_DRM_XE_KUNIT_TEST) += \ xe_bo_test.o \ xe_dma_buf_test.o \ xe_migrate_test.o \ - xe_mocs_test.o \ + xe_mocs_test.o + +# Normal kunit tests +obj-$(CONFIG_DRM_XE_KUNIT_TEST) += xe_test.o +xe_test-y = xe_test_mod.o \ xe_pci_test.o \ xe_rtp_test.o \ xe_wa_test.o diff --git a/drivers/gpu/drm/xe/tests/xe_pci_test.c b/drivers/gpu/drm/xe/tests/xe_pci_test.c index 171e4180f1aa92..a6705a536391d9 100644 --- a/drivers/gpu/drm/xe/tests/xe_pci_test.c +++ b/drivers/gpu/drm/xe/tests/xe_pci_test.c @@ -64,8 +64,3 @@ static struct kunit_suite xe_pci_test_suite = { }; kunit_test_suite(xe_pci_test_suite); - -MODULE_AUTHOR("Intel Corporation"); -MODULE_LICENSE("GPL"); -MODULE_DESCRIPTION("xe_pci kunit test"); -MODULE_IMPORT_NS(EXPORTED_FOR_KUNIT_TESTING); diff --git a/drivers/gpu/drm/xe/tests/xe_rtp_test.c b/drivers/gpu/drm/xe/tests/xe_rtp_test.c index 21e77026518b6e..06759d7547838f 100644 --- a/drivers/gpu/drm/xe/tests/xe_rtp_test.c +++ b/drivers/gpu/drm/xe/tests/xe_rtp_test.c @@ -311,8 +311,3 @@ static struct kunit_suite xe_rtp_test_suite = { }; kunit_test_suite(xe_rtp_test_suite); - -MODULE_AUTHOR("Intel Corporation"); -MODULE_LICENSE("GPL"); -MODULE_DESCRIPTION("xe_rtp kunit test"); -MODULE_IMPORT_NS(EXPORTED_FOR_KUNIT_TESTING); diff --git a/drivers/gpu/drm/xe/tests/xe_test_mod.c b/drivers/gpu/drm/xe/tests/xe_test_mod.c new file mode 100644 index 00000000000000..875f3e6f965e0c --- /dev/null +++ b/drivers/gpu/drm/xe/tests/xe_test_mod.c @@ -0,0 +1,10 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright © 2023 Intel Corporation + */ +#include + +MODULE_AUTHOR("Intel Corporation"); +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("xe kunit tests"); +MODULE_IMPORT_NS(EXPORTED_FOR_KUNIT_TESTING); diff --git a/drivers/gpu/drm/xe/tests/xe_wa_test.c b/drivers/gpu/drm/xe/tests/xe_wa_test.c index 5cfd5844c17b3d..439477593fafe2 100644 --- a/drivers/gpu/drm/xe/tests/xe_wa_test.c +++ b/drivers/gpu/drm/xe/tests/xe_wa_test.c @@ -156,8 +156,3 @@ static struct kunit_suite xe_rtp_test_suite = { }; kunit_test_suite(xe_rtp_test_suite); - -MODULE_AUTHOR("Intel Corporation"); -MODULE_LICENSE("GPL"); -MODULE_DESCRIPTION("xe_wa kunit test"); -MODULE_IMPORT_NS(EXPORTED_FOR_KUNIT_TESTING); From e2dc52f849f8694bdabb75127164c9df622af459 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= Date: Wed, 17 Jan 2024 14:40:44 +0100 Subject: [PATCH 086/200] drm/xe/dmabuf: Make xe_dmabuf_ops static MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit It is not referenced outside of the xe_dma_buf.c source file. 
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Rodrigo Vivi Cc: Matthew Brost Signed-off-by: Thomas Hellström Reviewed-by: Francois Dugast Link: https://patchwork.freedesktop.org/patch/msgid/20240117134048.165425-2-thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/xe/xe_dma_buf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c index 64ed303728fda9..da2627ed6ae7a9 100644 --- a/drivers/gpu/drm/xe/xe_dma_buf.c +++ b/drivers/gpu/drm/xe/xe_dma_buf.c @@ -175,7 +175,7 @@ static int xe_dma_buf_begin_cpu_access(struct dma_buf *dma_buf, return 0; } -const struct dma_buf_ops xe_dmabuf_ops = { +static const struct dma_buf_ops xe_dmabuf_ops = { .attach = xe_dma_buf_attach, .detach = xe_dma_buf_detach, .pin = xe_dma_buf_pin, From 79f8eacbdf9dad7ead39b3319e31e12d4dc6529e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= Date: Wed, 17 Jan 2024 14:40:48 +0100 Subject: [PATCH 087/200] drm/xe: Use a NULL pointer instead of 0. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The last argument of xe_pcode_read() is a pointer. Use NULL instead of 0. Fixes: 92d44a422d0d ("drm/xe/hwmon: Expose card reactive critical power") Cc: Rodrigo Vivi Signed-off-by: Thomas Hellström Reviewed-by: Francois Dugast Link: https://patchwork.freedesktop.org/patch/msgid/20240117134048.165425-6-thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/xe/xe_hwmon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c index 6ef2aa1eae8b09..174ed2185481e3 100644 --- a/drivers/gpu/drm/xe/xe_hwmon.c +++ b/drivers/gpu/drm/xe/xe_hwmon.c @@ -419,7 +419,7 @@ static int xe_hwmon_pcode_read_i1(struct xe_gt *gt, u32 *uval) return xe_pcode_read(gt, PCODE_MBOX(PCODE_POWER_SETUP, POWER_SETUP_SUBCOMMAND_READ_I1, 0), - uval, 0); + uval, NULL); } static int xe_hwmon_pcode_write_i1(struct xe_gt *gt, u32 uval) From c5a06c9169f3b1db0564019296ee41792a368f5a Mon Sep 17 00:00:00 2001 From: Karthik Poosa Date: Wed, 17 Jan 2024 11:20:35 +0530 Subject: [PATCH 088/200] drm/xe/guc: Enable WA 14018913170 The GuC handles the WA, the KMD just needs to set the flag to enable it on the appropriate platforms. v2: - Fixed CI checkpatch warning, alignment should match open parenthesis. - Fixed GUC FW version check to use XE_UC_FW_VER_RELEASE which points to current GUC FW version instead of XE_UC_FW_VER_COMPATIBILITY which holds GUC FW I/F version (Badal). v3: - Removed extra character in debug print. 
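To make the version gating concrete, here is how the new GUC_VER() packing compares releases; the specific firmware versions below are illustrative, not taken from the patch:

/*
 * GUC_VER(maj, min, pat) = (maj << 16) | (min << 8) | (pat), so:
 *
 *   GUC_VER(70, 7, 0)  = 0x460700   threshold for this WA
 *   GUC_VER(70, 6, 4)  = 0x460604 < 0x460700 -> warn, flag not set
 *   GUC_VER(70, 12, 1) = 0x460C01 > 0x460700 -> GUC_WA_ENABLE_TSC_CHECK_ON_RC6 set
 */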
Signed-off-by: Karthik Poosa Reviewed-by: Badal Nilawar Link: https://patchwork.freedesktop.org/patch/msgid/20240117055035.2417711-1-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/xe/xe_guc.c | 13 +++++++++++++ drivers/gpu/drm/xe/xe_guc_fwif.h | 1 + drivers/gpu/drm/xe/xe_wa_oob.rules | 5 +++++ 3 files changed, 19 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 235d27b17ff993..2891b0cc4f7f9a 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -132,10 +132,15 @@ static u32 guc_ctl_ads_flags(struct xe_guc *guc) return flags; } +#define GUC_VER(maj, min, pat) (((maj) << 16) | ((min) << 8) | (pat)) + static u32 guc_ctl_wa_flags(struct xe_guc *guc) { struct xe_device *xe = guc_to_xe(guc); struct xe_gt *gt = guc_to_gt(guc); + struct xe_uc_fw *uc_fw = &guc->fw; + struct xe_uc_fw_version *version = &uc_fw->versions.found[XE_UC_FW_VER_RELEASE]; + u32 flags = 0; if (XE_WA(gt, 22012773006)) @@ -165,6 +170,14 @@ static u32 guc_ctl_wa_flags(struct xe_guc *guc) if (XE_WA(gt, 1509372804)) flags |= GUC_WA_RENDER_RST_RC6_EXIT; + if (XE_WA(gt, 14018913170)) { + if (GUC_VER(version->major, version->minor, version->patch) >= GUC_VER(70, 7, 0)) + flags |= GUC_WA_ENABLE_TSC_CHECK_ON_RC6; + else + drm_warn(&xe->drm, "can't apply WA 14018913170, GUC version expected >= 70.7.0, found %u %u %u\n", + version->major, version->minor, version->patch); + } + return flags; } diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h b/drivers/gpu/drm/xe/xe_guc_fwif.h index 4dd5a88a782658..c281fdbfd2d679 100644 --- a/drivers/gpu/drm/xe/xe_guc_fwif.h +++ b/drivers/gpu/drm/xe/xe_guc_fwif.h @@ -97,6 +97,7 @@ struct guc_update_exec_queue_policy { #define GUC_WA_POLLCS BIT(18) #define GUC_WA_RENDER_RST_RC6_EXIT BIT(19) #define GUC_WA_RCS_REGS_IN_CCS_REGS_LIST BIT(21) +#define GUC_WA_ENABLE_TSC_CHECK_ON_RC6 BIT(22) #define GUC_CTL_FEATURE 2 #define GUC_CTL_ENABLE_SLPC BIT(2) diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules b/drivers/gpu/drm/xe/xe_wa_oob.rules index e73b84e01ea1be..b138cbd51bdb1f 100644 --- a/drivers/gpu/drm/xe/xe_wa_oob.rules +++ b/drivers/gpu/drm/xe/xe_wa_oob.rules @@ -17,3 +17,8 @@ 14019821291 MEDIA_VERSION_RANGE(1300, 2000) 14015076503 MEDIA_VERSION(1300) 16020292621 GRAPHICS_VERSION(2004), GRAPHICS_STEP(A0, B0) +14018913170 GRAPHICS_VERSION(2004), GRAPHICS_STEP(A0, B0) + MEDIA_VERSION(2000), GRAPHICS_STEP(A0, A1) + GRAPHICS_VERSION_RANGE(1270, 1274) + MEDIA_VERSION(1300) + PLATFORM(DG2) From 34e9d836f9d0362a45009d61e211e0d5fbdcc28a Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Tue, 16 Jan 2024 13:02:07 +0100 Subject: [PATCH 089/200] drm/xe: Mark internal gmdid mappings as const The mapping between HW IP version and its description is const, so mark it as such. 
Signed-off-by: Michal Wajdeczko Cc: Lucas De Marchi Reviewed-by: Gustavo Sousa Link: https://patchwork.freedesktop.org/patch/msgid/20240116120207.1133-1-michal.wajdeczko@intel.com --- drivers/gpu/drm/xe/xe_pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c index 7ba2000f535bda..6664d1b2efdb9a 100644 --- a/drivers/gpu/drm/xe/xe_pci.c +++ b/drivers/gpu/drm/xe/xe_pci.c @@ -340,14 +340,14 @@ static const struct xe_device_desc lnl_desc = { __diag_pop(); /* Map of GMD_ID values to graphics IP */ -static struct gmdid_map graphics_ip_map[] = { +static const struct gmdid_map graphics_ip_map[] = { { 1270, &graphics_xelpg }, { 1271, &graphics_xelpg }, { 2004, &graphics_xe2 }, }; /* Map of GMD_ID values to media IP */ -static struct gmdid_map media_ip_map[] = { +static const struct gmdid_map media_ip_map[] = { { 1300, &media_xelpmp }, { 2000, &media_xe2 }, }; From a54e016ace26304505dfd1bd2fb0278a91dae310 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Fri, 12 Jan 2024 11:25:53 +0100 Subject: [PATCH 090/200] drm/xe/guc: Return CTB HXG response DATA0 if no buffer provided Most of the synchronous GuC HXG action responses are defined in such a way that only the mandatory DATA0 from the HXG header is used, and only in a few cases is it more than MBZ (must be zero). For those cases where an HXG action returns just DATA0, return that value if the caller didn't provide a buffer for the full response. Signed-off-by: Michal Wajdeczko Cc: Matthew Brost Reviewed-by: Matthew Brost Link: https://patchwork.freedesktop.org/patch/msgid/20240112102554.761-1-michal.wajdeczko@intel.com --- drivers/gpu/drm/xe/xe_guc_ct.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index b2cc3f993eb6ca..4ae1a0cd9537aa 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -33,6 +33,7 @@ struct g2h_fence { u32 *response_buffer; u32 seqno; + u32 response_data; u16 response_len; u16 error; u16 hint; @@ -45,6 +46,7 @@ struct g2h_fence { static void g2h_fence_init(struct g2h_fence *g2h_fence, u32 *response_buffer) { g2h_fence->response_buffer = response_buffer; + g2h_fence->response_data = 0; g2h_fence->response_len = 0; g2h_fence->fail = false; g2h_fence->retry = false; @@ -780,7 +782,7 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len, ret = -EIO; } - return ret > 0 ? response_buffer ? g2h_fence.response_len : 0 : ret; + return ret > 0 ? response_buffer ? g2h_fence.response_len : g2h_fence.response_data : ret; } int xe_guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len, @@ -877,6 +879,8 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len) } else if (g2h_fence->response_buffer) { g2h_fence->response_len = hxg_len; memcpy(g2h_fence->response_buffer, hxg, hxg_len * sizeof(u32)); + } else { + g2h_fence->response_data = FIELD_GET(GUC_HXG_RESPONSE_MSG_0_DATA0, hxg[0]); } g2h_release_space(ct, GUC_CTB_HXG_MSG_MAX_LEN); From 6af7ee08279cdb2e1d832f718f2f3c3dcbef5a14 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Fri, 12 Jan 2024 11:25:54 +0100 Subject: [PATCH 091/200] drm/xe/guc: Add kernel-doc for xe_guc_ct_send_recv() Add initial documentation for the recently updated xe_guc_ct_send_recv(). 
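A hypothetical caller-side sketch of the two return conventions being documented here (illustrative only, not code from these patches):

	u32 buf[GUC_CTB_MAX_DWORDS];
	int ret;

	/* With a response buffer: a non-negative return is the response
	 * length in dwords and the payload has been copied into buf. */
	ret = xe_guc_ct_send_recv(ct, action, len, buf);

	/* Without a buffer: a non-negative return is DATA0 taken straight
	 * from the HXG response header; negative values remain error codes
	 * in both cases. */
	ret = xe_guc_ct_send_recv(ct, action, len, NULL);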
Signed-off-by: Michal Wajdeczko Cc: Matthew Brost Reviewed-by: Matthew Brost Link: https://patchwork.freedesktop.org/patch/msgid/20240112102554.761-2-michal.wajdeczko@intel.com --- drivers/gpu/drm/xe/xe_guc_ct.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index 4ae1a0cd9537aa..ee5d99456aebc8 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -785,6 +785,24 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len, return ret > 0 ? response_buffer ? g2h_fence.response_len : g2h_fence.response_data : ret; } +/** + * xe_guc_ct_send_recv - Send and receive HXG to the GuC + * @ct: the &xe_guc_ct + * @action: the dword array with `HXG Request`_ message (can't be NULL) + * @len: length of the `HXG Request`_ message (in dwords, can't be 0) + * @response_buffer: placeholder for the `HXG Response`_ message (can be NULL) + * + * Send a `HXG Request`_ message to the GuC over CT communication channel and + * blocks until GuC replies with a `HXG Response`_ message. + * + * For non-blocking communication with GuC use xe_guc_ct_send(). + * + * Note: The size of &response_buffer must be at least GUC_CTB_MAX_DWORDS_. + * + * Return: response length (in dwords) if &response_buffer was not NULL, or + * DATA0 from `HXG Response`_ if &response_buffer was NULL, or + * a negative error code on failure. + */ int xe_guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len, u32 *response_buffer) { From 997a55caa1c3b770979836bbfd82b311addf95c7 Mon Sep 17 00:00:00 2001 From: Daniele Ceraolo Spurio Date: Wed, 17 Jan 2024 10:26:19 -0800 Subject: [PATCH 092/200] drm/xe/gsc: Initialize GSC proxy The GSC uC needs to communicate with the CSME to perform certain operations. Since the GSC can't perform this communication directly on platforms where it is integrated in GT, the graphics driver needs to transfer the messages from GSC to CSME and back. The proxy flow must be manually started after the GSC is loaded to signal to GSC that we're ready to handle its messages and allow it to query its init data from CSME. Note that the component must be removed before the pci_remove call completes, so we can't use a drmm helper for it and we need to instead perform the cleanup as part of the removal flow. 
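Condensed, the relay described above looks roughly like this (a pseudocode sketch with made-up helper names; the real loop is proxy_query() in xe_gsc_proxy.c below):

	/* sketch only, helpers are hypothetical */
	send_to_gsc(PROXY_QUERY);
	for (;;) {
		reply = read_reply_from_gsc();
		if (type(reply) == GSC_PROXY_MSG_TYPE_PROXY_END)
			break;				/* GSC has all it needs */
		csme_reply = send_to_csme(reply);	/* via the mei component */
		send_to_gsc(csme_reply);		/* then wait for the next GSC message */
	}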
v2: add function documentation, more targeted memory clear, clearer logs and variable names (Alan) Signed-off-by: Daniele Ceraolo Spurio Cc: Alan Previn Cc: Suraj Kandpal Reviewed-by: Alan Previn Link: https://patchwork.freedesktop.org/patch/msgid/20240117182621.2653049-2-daniele.ceraolospurio@intel.com --- drivers/gpu/drm/xe/Makefile | 1 + .../gpu/drm/xe/abi/gsc_proxy_commands_abi.h | 44 ++ drivers/gpu/drm/xe/xe_device.c | 22 +- drivers/gpu/drm/xe/xe_gsc.c | 52 +- drivers/gpu/drm/xe/xe_gsc.h | 1 + drivers/gpu/drm/xe/xe_gsc_proxy.c | 468 ++++++++++++++++++ drivers/gpu/drm/xe/xe_gsc_proxy.h | 17 + drivers/gpu/drm/xe/xe_gsc_submit.c | 13 + drivers/gpu/drm/xe/xe_gsc_submit.h | 1 + drivers/gpu/drm/xe/xe_gsc_types.h | 24 + drivers/gpu/drm/xe/xe_gt.c | 13 + drivers/gpu/drm/xe/xe_gt.h | 1 + drivers/gpu/drm/xe/xe_uc.c | 14 + drivers/gpu/drm/xe/xe_uc.h | 1 + 14 files changed, 659 insertions(+), 13 deletions(-) create mode 100644 drivers/gpu/drm/xe/abi/gsc_proxy_commands_abi.h create mode 100644 drivers/gpu/drm/xe/xe_gsc_proxy.c create mode 100644 drivers/gpu/drm/xe/xe_gsc_proxy.h diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile index e16b84f79ddf36..fe8b266a981918 100644 --- a/drivers/gpu/drm/xe/Makefile +++ b/drivers/gpu/drm/xe/Makefile @@ -76,6 +76,7 @@ xe-y += xe_bb.o \ xe_ggtt.o \ xe_gpu_scheduler.o \ xe_gsc.o \ + xe_gsc_proxy.o \ xe_gsc_submit.o \ xe_gt.o \ xe_gt_ccs_mode.o \ diff --git a/drivers/gpu/drm/xe/abi/gsc_proxy_commands_abi.h b/drivers/gpu/drm/xe/abi/gsc_proxy_commands_abi.h new file mode 100644 index 00000000000000..80bbf06a3eb8c7 --- /dev/null +++ b/drivers/gpu/drm/xe/abi/gsc_proxy_commands_abi.h @@ -0,0 +1,44 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef _ABI_GSC_PROXY_COMMANDS_ABI_H +#define _ABI_GSC_PROXY_COMMANDS_ABI_H + +#include + +/* Heci client ID for proxy commands */ +#define HECI_MEADDRESS_PROXY 10 + +/* FW-defined proxy header */ +struct xe_gsc_proxy_header { + /* + * hdr: + * Bits 0-7: type of the proxy message (see enum xe_gsc_proxy_type) + * Bits 8-15: rsvd + * Bits 16-31: length in bytes of the payload following the proxy header + */ + u32 hdr; +#define GSC_PROXY_TYPE GENMASK(7, 0) +#define GSC_PROXY_PAYLOAD_LENGTH GENMASK(31, 16) + + u32 source; /* Source of the Proxy message */ + u32 destination; /* Destination of the Proxy message */ +#define GSC_PROXY_ADDRESSING_KMD 0x10000 +#define GSC_PROXY_ADDRESSING_GSC 0x20000 +#define GSC_PROXY_ADDRESSING_CSME 0x30000 + + u32 status; /* Command status */ +} __packed; + +/* FW-defined proxy types */ +enum xe_gsc_proxy_type { + GSC_PROXY_MSG_TYPE_PROXY_INVALID = 0, + GSC_PROXY_MSG_TYPE_PROXY_QUERY = 1, + GSC_PROXY_MSG_TYPE_PROXY_PAYLOAD = 2, + GSC_PROXY_MSG_TYPE_PROXY_END = 3, + GSC_PROXY_MSG_TYPE_PROXY_NOTIFICATION = 4, +}; + +#endif diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index a94c0b27f04e8f..7a3c1c13dc3471 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -26,6 +26,7 @@ #include "xe_exec_queue.h" #include "xe_exec.h" #include "xe_ggtt.h" +#include "xe_gsc_proxy.h" #include "xe_gt.h" #include "xe_gt_mcr.h" #include "xe_irq.h" @@ -434,6 +435,7 @@ int xe_device_probe(struct xe_device *xe) struct xe_tile *tile; struct xe_gt *gt; int err; + u8 last_gt; u8 id; xe_pat_init_early(xe); @@ -521,16 +523,18 @@ int xe_device_probe(struct xe_device *xe) goto err_irq_shutdown; for_each_gt(gt, xe, id) { + last_gt = id; + err = xe_gt_init(gt); if (err) - goto err_irq_shutdown; + goto 
err_fini_gt; } xe_heci_gsc_init(xe); err = xe_display_init(xe); if (err) - goto err_irq_shutdown; + goto err_fini_gt; err = drm_dev_register(&xe->drm, 0); if (err) @@ -551,6 +555,14 @@ int xe_device_probe(struct xe_device *xe) err_fini_display: xe_display_driver_remove(xe); +err_fini_gt: + for_each_gt(gt, xe, id) { + if (id < last_gt) + xe_gt_remove(gt); + else + break; + } + err_irq_shutdown: xe_irq_shutdown(xe); err: @@ -568,12 +580,18 @@ static void xe_device_remove_display(struct xe_device *xe) void xe_device_remove(struct xe_device *xe) { + struct xe_gt *gt; + u8 id; + xe_device_remove_display(xe); xe_display_fini(xe); xe_heci_gsc_fini(xe); + for_each_gt(gt, xe, id) + xe_gt_remove(gt); + xe_irq_shutdown(xe); } diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c index 5b84fc9ab8adcd..85074b7304021a 100644 --- a/drivers/gpu/drm/xe/xe_gsc.c +++ b/drivers/gpu/drm/xe/xe_gsc.c @@ -13,6 +13,7 @@ #include "xe_bo.h" #include "xe_device.h" #include "xe_exec_queue.h" +#include "xe_gsc_proxy.h" #include "xe_gsc_submit.h" #include "xe_gt.h" #include "xe_gt_printk.h" @@ -242,8 +243,31 @@ static int gsc_upload(struct xe_gsc *gsc) if (err) return err; + return 0; +} + +static int gsc_upload_and_init(struct xe_gsc *gsc) +{ + struct xe_gt *gt = gsc_to_gt(gsc); + int ret; + + ret = gsc_upload(gsc); + if (ret) + return ret; + + xe_uc_fw_change_status(&gsc->fw, XE_UC_FIRMWARE_TRANSFERRED); xe_gt_dbg(gt, "GSC FW async load completed\n"); + /* HuC auth failure is not fatal */ + if (xe_huc_is_authenticated(>->uc.huc, XE_HUC_AUTH_VIA_GUC)) + xe_huc_auth(>->uc.huc, XE_HUC_AUTH_VIA_GSC); + + ret = xe_gsc_proxy_start(gsc); + if (ret) + return ret; + + xe_gt_dbg(gt, "GSC proxy init completed\n"); + return 0; } @@ -257,19 +281,12 @@ static void gsc_work(struct work_struct *work) xe_device_mem_access_get(xe); xe_force_wake_get(gt_to_fw(gt), XE_FW_GSC); - ret = gsc_upload(gsc); - if (ret && ret != -EEXIST) { + ret = gsc_upload_and_init(gsc); + if (ret && ret != -EEXIST) xe_uc_fw_change_status(&gsc->fw, XE_UC_FIRMWARE_LOAD_FAIL); - goto out; - } - - xe_uc_fw_change_status(&gsc->fw, XE_UC_FIRMWARE_TRANSFERRED); - - /* HuC auth failure is not fatal */ - if (xe_huc_is_authenticated(>->uc.huc, XE_HUC_AUTH_VIA_GUC)) - xe_huc_auth(>->uc.huc, XE_HUC_AUTH_VIA_GSC); + else + xe_uc_fw_change_status(&gsc->fw, XE_UC_FIRMWARE_RUNNING); -out: xe_force_wake_put(gt_to_fw(gt), XE_FW_GSC); xe_device_mem_access_put(xe); } @@ -302,6 +319,10 @@ int xe_gsc_init(struct xe_gsc *gsc) else if (ret) goto out; + ret = xe_gsc_proxy_init(gsc); + if (ret && ret != -ENODEV) + goto out; + return 0; out: @@ -410,6 +431,15 @@ void xe_gsc_wait_for_worker_completion(struct xe_gsc *gsc) flush_work(&gsc->work); } +/** + * xe_gsc_remove() - Clean up the GSC structures before driver removal + * @gsc: the GSC uC + */ +void xe_gsc_remove(struct xe_gsc *gsc) +{ + xe_gsc_proxy_remove(gsc); +} + /* * wa_14015076503: if the GSC FW is loaded, we need to alert it before doing a * GSC engine reset by writing a notification bit in the GS1 register and then diff --git a/drivers/gpu/drm/xe/xe_gsc.h b/drivers/gpu/drm/xe/xe_gsc.h index bc1ef7f31ea2c2..c6fb32e3fd7964 100644 --- a/drivers/gpu/drm/xe/xe_gsc.h +++ b/drivers/gpu/drm/xe/xe_gsc.h @@ -14,6 +14,7 @@ int xe_gsc_init(struct xe_gsc *gsc); int xe_gsc_init_post_hwconfig(struct xe_gsc *gsc); void xe_gsc_wait_for_worker_completion(struct xe_gsc *gsc); void xe_gsc_load_start(struct xe_gsc *gsc); +void xe_gsc_remove(struct xe_gsc *gsc); void xe_gsc_wa_14015076503(struct xe_gt *gt, bool prep); diff --git 
a/drivers/gpu/drm/xe/xe_gsc_proxy.c b/drivers/gpu/drm/xe/xe_gsc_proxy.c new file mode 100644 index 00000000000000..86353c5a81cd14 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gsc_proxy.c @@ -0,0 +1,468 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2023 Intel Corporation + */ + +#include "xe_gsc_proxy.h" + +#include +#include + +#include +#include +#include + +#include "abi/gsc_proxy_commands_abi.h" +#include "regs/xe_gsc_regs.h" +#include "xe_bo.h" +#include "xe_gsc.h" +#include "xe_gsc_submit.h" +#include "xe_gt.h" +#include "xe_gt_printk.h" +#include "xe_map.h" +#include "xe_mmio.h" + +/* + * GSC proxy: + * The GSC uC needs to communicate with the CSME to perform certain operations. + * Since the GSC can't perform this communication directly on platforms where it + * is integrated in GT, the graphics driver needs to transfer the messages from + * GSC to CSME and back. The proxy flow must be manually started after the GSC + * is loaded to signal to GSC that we're ready to handle its messages and allow + * it to query its init data from CSME; GSC will then trigger an HECI2 interrupt + * if it needs to send messages to CSME again. + * The proxy flow is as follow: + * 1 - Xe submits a request to GSC asking for the message to CSME + * 2 - GSC replies with the proxy header + payload for CSME + * 3 - Xe sends the reply from GSC as-is to CSME via the mei proxy component + * 4 - CSME replies with the proxy header + payload for GSC + * 5 - Xe submits a request to GSC with the reply from CSME + * 6 - GSC replies either with a new header + payload (same as step 2, so we + * restart from there) or with an end message. + */ + +/* + * The component should load quite quickly in most cases, but it could take + * a bit. Using a very big timeout just to cover the worst case scenario + */ +#define GSC_PROXY_INIT_TIMEOUT_MS 20000 + +/* shorthand define for code compactness */ +#define PROXY_HDR_SIZE (sizeof(struct xe_gsc_proxy_header)) + +/* the protocol supports up to 32K in each direction */ +#define GSC_PROXY_BUFFER_SIZE SZ_32K +#define GSC_PROXY_CHANNEL_SIZE (GSC_PROXY_BUFFER_SIZE * 2) + +static struct xe_gt * +gsc_to_gt(struct xe_gsc *gsc) +{ + return container_of(gsc, struct xe_gt, uc.gsc); +} + +static inline struct xe_device *kdev_to_xe(struct device *kdev) +{ + return dev_get_drvdata(kdev); +} + +static bool gsc_proxy_init_done(struct xe_gsc *gsc) +{ + struct xe_gt *gt = gsc_to_gt(gsc); + u32 fwsts1 = xe_mmio_read32(gt, HECI_FWSTS1(MTL_GSC_HECI1_BASE)); + + return REG_FIELD_GET(HECI1_FWSTS1_CURRENT_STATE, fwsts1) == + HECI1_FWSTS1_PROXY_STATE_NORMAL; +} + +static int proxy_send_to_csme(struct xe_gsc *gsc, u32 size) +{ + struct xe_gt *gt = gsc_to_gt(gsc); + struct i915_gsc_proxy_component *comp = gsc->proxy.component; + int ret; + + ret = comp->ops->send(comp->mei_dev, gsc->proxy.to_csme, size); + if (ret < 0) { + xe_gt_err(gt, "Failed to send CSME proxy message\n"); + return ret; + } + + ret = comp->ops->recv(comp->mei_dev, gsc->proxy.from_csme, GSC_PROXY_BUFFER_SIZE); + if (ret < 0) { + xe_gt_err(gt, "Failed to receive CSME proxy message\n"); + return ret; + } + + return ret; +} + +static int proxy_send_to_gsc(struct xe_gsc *gsc, u32 size) +{ + struct xe_gt *gt = gsc_to_gt(gsc); + u64 addr_in = xe_bo_ggtt_addr(gsc->proxy.bo); + u64 addr_out = addr_in + GSC_PROXY_BUFFER_SIZE; + int err; + + /* the message must contain at least the gsc and proxy headers */ + if (size > GSC_PROXY_BUFFER_SIZE) { + xe_gt_err(gt, "Invalid GSC proxy message size: %u\n", size); + return -EINVAL; + } + + err = 
xe_gsc_pkt_submit_kernel(gsc, addr_in, size, + addr_out, GSC_PROXY_BUFFER_SIZE); + if (err) { + xe_gt_err(gt, "Failed to submit gsc proxy rq (%pe)\n", ERR_PTR(err)); + return err; + } + + return 0; +} + +static int validate_proxy_header(struct xe_gsc_proxy_header *header, + u32 source, u32 dest, u32 max_size) +{ + u32 type = FIELD_GET(GSC_PROXY_TYPE, header->hdr); + u32 length = FIELD_GET(GSC_PROXY_PAYLOAD_LENGTH, header->hdr); + + if (header->destination != dest || header->source != source) + return -ENOEXEC; + + if (length + PROXY_HDR_SIZE > max_size) + return -E2BIG; + + switch (type) { + case GSC_PROXY_MSG_TYPE_PROXY_PAYLOAD: + if (length > 0) + break; + fallthrough; + case GSC_PROXY_MSG_TYPE_PROXY_INVALID: + return -EIO; + default: + break; + } + + return 0; +} + +#define proxy_header_wr(xe_, map_, offset_, field_, val_) \ + xe_map_wr_field(xe_, map_, offset_, struct xe_gsc_proxy_header, field_, val_) + +#define proxy_header_rd(xe_, map_, offset_, field_) \ + xe_map_rd_field(xe_, map_, offset_, struct xe_gsc_proxy_header, field_) + +static u32 emit_proxy_header(struct xe_device *xe, struct iosys_map *map, u32 offset) +{ + xe_map_memset(xe, map, offset, 0, PROXY_HDR_SIZE); + + proxy_header_wr(xe, map, offset, hdr, + FIELD_PREP(GSC_PROXY_TYPE, GSC_PROXY_MSG_TYPE_PROXY_QUERY) | + FIELD_PREP(GSC_PROXY_PAYLOAD_LENGTH, 0)); + + proxy_header_wr(xe, map, offset, source, GSC_PROXY_ADDRESSING_KMD); + proxy_header_wr(xe, map, offset, destination, GSC_PROXY_ADDRESSING_GSC); + proxy_header_wr(xe, map, offset, status, 0); + + return offset + PROXY_HDR_SIZE; +} + +static int proxy_query(struct xe_gsc *gsc) +{ + struct xe_gt *gt = gsc_to_gt(gsc); + struct xe_device *xe = gt_to_xe(gt); + struct xe_gsc_proxy_header *to_csme_hdr = gsc->proxy.to_csme; + void *to_csme_payload = gsc->proxy.to_csme + PROXY_HDR_SIZE; + u32 wr_offset; + u32 reply_offset; + u32 size; + int ret; + + wr_offset = xe_gsc_emit_header(xe, &gsc->proxy.to_gsc, 0, + HECI_MEADDRESS_PROXY, 0, PROXY_HDR_SIZE); + wr_offset = emit_proxy_header(xe, &gsc->proxy.to_gsc, wr_offset); + + size = wr_offset; + + while (1) { + /* + * Poison the GSC response header space to make sure we don't + * read a stale reply. + */ + xe_gsc_poison_header(xe, &gsc->proxy.from_gsc, 0); + + /* send proxy message to GSC */ + ret = proxy_send_to_gsc(gsc, size); + if (ret) + goto proxy_error; + + /* check the reply from GSC */ + ret = xe_gsc_read_out_header(xe, &gsc->proxy.from_gsc, 0, + PROXY_HDR_SIZE, &reply_offset); + if (ret) { + xe_gt_err(gt, "Invalid gsc header in proxy reply (%pe)\n", + ERR_PTR(ret)); + goto proxy_error; + } + + /* copy the proxy header reply from GSC */ + xe_map_memcpy_from(xe, to_csme_hdr, &gsc->proxy.from_gsc, + reply_offset, PROXY_HDR_SIZE); + + /* stop if this was the last message */ + if (FIELD_GET(GSC_PROXY_TYPE, to_csme_hdr->hdr) == GSC_PROXY_MSG_TYPE_PROXY_END) + break; + + /* make sure the GSC-to-CSME proxy header is sane */ + ret = validate_proxy_header(to_csme_hdr, + GSC_PROXY_ADDRESSING_GSC, + GSC_PROXY_ADDRESSING_CSME, + GSC_PROXY_BUFFER_SIZE - reply_offset); + if (ret) { + xe_gt_err(gt, "invalid GSC to CSME proxy header! 
(%pe)\n", + ERR_PTR(ret)); + goto proxy_error; + } + + /* copy the rest of the message */ + size = FIELD_GET(GSC_PROXY_PAYLOAD_LENGTH, to_csme_hdr->hdr); + xe_map_memcpy_from(xe, to_csme_payload, &gsc->proxy.from_gsc, + reply_offset + PROXY_HDR_SIZE, size); + + /* send the GSC message to the CSME */ + ret = proxy_send_to_csme(gsc, size + PROXY_HDR_SIZE); + if (ret < 0) + goto proxy_error; + + /* reply size from CSME, including the proxy header */ + size = ret; + if (size < PROXY_HDR_SIZE) { + xe_gt_err(gt, "CSME to GSC proxy msg too small: 0x%x\n", size); + ret = -EPROTO; + goto proxy_error; + } + + /* make sure the CSME-to-GSC proxy header is sane */ + ret = validate_proxy_header(gsc->proxy.from_csme, + GSC_PROXY_ADDRESSING_CSME, + GSC_PROXY_ADDRESSING_GSC, + GSC_PROXY_BUFFER_SIZE - reply_offset); + if (ret) { + xe_gt_err(gt, "invalid CSME to GSC proxy header! %d\n", ret); + goto proxy_error; + } + + /* Emit a new header for sending the reply to the GSC */ + wr_offset = xe_gsc_emit_header(xe, &gsc->proxy.to_gsc, 0, + HECI_MEADDRESS_PROXY, 0, size); + + /* copy the CSME reply and update the total msg size to include the GSC header */ + xe_map_memcpy_to(xe, &gsc->proxy.to_gsc, wr_offset, gsc->proxy.from_csme, size); + + size += wr_offset; + } + +proxy_error: + return ret < 0 ? ret : 0; +} + +static int gsc_proxy_request_handler(struct xe_gsc *gsc) +{ + struct xe_gt *gt = gsc_to_gt(gsc); + int slept; + int err; + + if (!gsc->proxy.component_added) + return -ENODEV; + + /* when GSC is loaded, we can queue this before the component is bound */ + for (slept = 0; slept < GSC_PROXY_INIT_TIMEOUT_MS; slept += 100) { + if (gsc->proxy.component) + break; + + msleep(100); + } + + mutex_lock(&gsc->proxy.mutex); + if (!gsc->proxy.component) { + xe_gt_err(gt, "GSC proxy component not bound!\n"); + err = -EIO; + } else { + err = proxy_query(gsc); + } + mutex_unlock(&gsc->proxy.mutex); + return err; +} + +static int xe_gsc_proxy_component_bind(struct device *xe_kdev, + struct device *mei_kdev, void *data) +{ + struct xe_device *xe = kdev_to_xe(xe_kdev); + struct xe_gt *gt = xe->tiles[0].media_gt; + struct xe_gsc *gsc = >->uc.gsc; + + mutex_lock(&gsc->proxy.mutex); + gsc->proxy.component = data; + gsc->proxy.component->mei_dev = mei_kdev; + mutex_unlock(&gsc->proxy.mutex); + + return 0; +} + +static void xe_gsc_proxy_component_unbind(struct device *xe_kdev, + struct device *mei_kdev, void *data) +{ + struct xe_device *xe = kdev_to_xe(xe_kdev); + struct xe_gt *gt = xe->tiles[0].media_gt; + struct xe_gsc *gsc = >->uc.gsc; + + xe_gsc_wait_for_worker_completion(gsc); + + mutex_lock(&gsc->proxy.mutex); + gsc->proxy.component = NULL; + mutex_unlock(&gsc->proxy.mutex); +} + +static const struct component_ops xe_gsc_proxy_component_ops = { + .bind = xe_gsc_proxy_component_bind, + .unbind = xe_gsc_proxy_component_unbind, +}; + +static void proxy_channel_free(struct drm_device *drm, void *arg) +{ + struct xe_gsc *gsc = arg; + + if (!gsc->proxy.bo) + return; + + if (gsc->proxy.to_csme) { + kfree(gsc->proxy.to_csme); + gsc->proxy.to_csme = NULL; + gsc->proxy.from_csme = NULL; + } + + if (gsc->proxy.bo) { + iosys_map_clear(&gsc->proxy.to_gsc); + iosys_map_clear(&gsc->proxy.from_gsc); + xe_bo_unpin_map_no_vm(gsc->proxy.bo); + gsc->proxy.bo = NULL; + } +} + +static int proxy_channel_alloc(struct xe_gsc *gsc) +{ + struct xe_gt *gt = gsc_to_gt(gsc); + struct xe_tile *tile = gt_to_tile(gt); + struct xe_device *xe = gt_to_xe(gt); + struct xe_bo *bo; + void *csme; + int err; + + csme = kzalloc(GSC_PROXY_CHANNEL_SIZE, 
GFP_KERNEL); + if (!csme) + return -ENOMEM; + + bo = xe_bo_create_pin_map(xe, tile, NULL, GSC_PROXY_CHANNEL_SIZE, + ttm_bo_type_kernel, + XE_BO_CREATE_SYSTEM_BIT | + XE_BO_CREATE_GGTT_BIT); + if (IS_ERR(bo)) { + kfree(csme); + return PTR_ERR(bo); + } + + gsc->proxy.bo = bo; + gsc->proxy.to_gsc = IOSYS_MAP_INIT_OFFSET(&bo->vmap, 0); + gsc->proxy.from_gsc = IOSYS_MAP_INIT_OFFSET(&bo->vmap, GSC_PROXY_BUFFER_SIZE); + gsc->proxy.to_csme = csme; + gsc->proxy.from_csme = csme + GSC_PROXY_BUFFER_SIZE; + + err = drmm_add_action_or_reset(&xe->drm, proxy_channel_free, gsc); + if (err) + return err; + + return 0; +} + +/** + * xe_gsc_proxy_init() - init objects and MEI component required by GSC proxy + * @gsc: the GSC uC + * + * Return: 0 if the initialization was successful, a negative errno otherwise. + */ +int xe_gsc_proxy_init(struct xe_gsc *gsc) +{ + int err; + struct xe_gt *gt = gsc_to_gt(gsc); + struct xe_tile *tile = gt_to_tile(gt); + struct xe_device *xe = tile_to_xe(tile); + + mutex_init(&gsc->proxy.mutex); + + if (!IS_ENABLED(CONFIG_INTEL_MEI_GSC_PROXY)) { + xe_gt_info(gt, "can't init GSC proxy due to missing mei component\n"); + return -ENODEV; + } + + /* no multi-tile devices with this feature yet */ + if (tile->id > 0) { + xe_gt_err(gt, "unexpected GSC proxy init on tile %u\n", tile->id); + return -EINVAL; + } + + err = proxy_channel_alloc(gsc); + if (err) + return err; + + err = component_add_typed(xe->drm.dev, &xe_gsc_proxy_component_ops, + I915_COMPONENT_GSC_PROXY); + if (err < 0) { + xe_gt_err(gt, "Failed to add GSC_PROXY component (%pe)\n", ERR_PTR(err)); + return err; + } + + gsc->proxy.component_added = true; + + /* the component must be removed before unload, so can't use drmm for cleanup */ + + return 0; +} + +/** + * xe_gsc_proxy_remove() - remove the GSC proxy MEI component + * @gsc: the GSC uC + */ +void xe_gsc_proxy_remove(struct xe_gsc *gsc) +{ + struct xe_gt *gt = gsc_to_gt(gsc); + struct xe_device *xe = gt_to_xe(gt); + + if (gsc->proxy.component_added) { + component_del(xe->drm.dev, &xe_gsc_proxy_component_ops); + gsc->proxy.component_added = false; + } +} + +/** + * xe_gsc_proxy_start() - start the proxy by submitting the first request + * @gsc: the GSC uC + * + * Return: 0 if the proxy is now enabled, a negative errno otherwise. + */ +int xe_gsc_proxy_start(struct xe_gsc *gsc) +{ + int err; + + /* + * The handling of the first proxy request must be manually triggered to + * notify the GSC that we're ready to support the proxy flow. 
+ */ + err = gsc_proxy_request_handler(gsc); + if (err) + return err; + + if (!gsc_proxy_init_done(gsc)) { + xe_gt_err(gsc_to_gt(gsc), "GSC FW reports proxy init not completed\n"); + return -EIO; + } + + return 0; +} diff --git a/drivers/gpu/drm/xe/xe_gsc_proxy.h b/drivers/gpu/drm/xe/xe_gsc_proxy.h new file mode 100644 index 00000000000000..5dc6321efbafb8 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gsc_proxy.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef _XE_GSC_PROXY_H_ +#define _XE_GSC_PROXY_H_ + +#include + +struct xe_gsc; + +int xe_gsc_proxy_init(struct xe_gsc *gsc); +void xe_gsc_proxy_remove(struct xe_gsc *gsc); +int xe_gsc_proxy_start(struct xe_gsc *gsc); + +#endif diff --git a/drivers/gpu/drm/xe/xe_gsc_submit.c b/drivers/gpu/drm/xe/xe_gsc_submit.c index 8c5381e5913fc6..9ecc1ead684470 100644 --- a/drivers/gpu/drm/xe/xe_gsc_submit.c +++ b/drivers/gpu/drm/xe/xe_gsc_submit.c @@ -5,6 +5,8 @@ #include "xe_gsc_submit.h" +#include + #include "abi/gsc_command_header_abi.h" #include "xe_bb.h" #include "xe_exec_queue.h" @@ -68,6 +70,17 @@ u32 xe_gsc_emit_header(struct xe_device *xe, struct iosys_map *map, u32 offset, return offset + GSC_HDR_SIZE; }; +/** + * xe_gsc_poison_header - poison the MTL GSC header in memory + * @xe: the Xe device + * @map: the iosys map to write to + * @offset: offset from the start of the map at which the header resides + */ +void xe_gsc_poison_header(struct xe_device *xe, struct iosys_map *map, u32 offset) +{ + xe_map_memset(xe, map, offset, POISON_FREE, GSC_HDR_SIZE); +}; + /** * xe_gsc_check_and_update_pending - check the pending bit and update the input * header with the retry handle from the output header diff --git a/drivers/gpu/drm/xe/xe_gsc_submit.h b/drivers/gpu/drm/xe/xe_gsc_submit.h index 0801da5d446a9d..1939855031a625 100644 --- a/drivers/gpu/drm/xe/xe_gsc_submit.h +++ b/drivers/gpu/drm/xe/xe_gsc_submit.h @@ -14,6 +14,7 @@ struct xe_gsc; u32 xe_gsc_emit_header(struct xe_device *xe, struct iosys_map *map, u32 offset, u8 heci_client_id, u64 host_session_id, u32 payload_size); +void xe_gsc_poison_header(struct xe_device *xe, struct iosys_map *map, u32 offset); bool xe_gsc_check_and_update_pending(struct xe_device *xe, struct iosys_map *in, u32 offset_in, diff --git a/drivers/gpu/drm/xe/xe_gsc_types.h b/drivers/gpu/drm/xe/xe_gsc_types.h index 57fefd66a7ea2f..07bfa87ec0026b 100644 --- a/drivers/gpu/drm/xe/xe_gsc_types.h +++ b/drivers/gpu/drm/xe/xe_gsc_types.h @@ -6,12 +6,16 @@ #ifndef _XE_GSC_TYPES_H_ #define _XE_GSC_TYPES_H_ +#include +#include +#include #include #include "xe_uc_fw_types.h" struct xe_bo; struct xe_exec_queue; +struct i915_gsc_proxy_component; /** * struct xe_gsc - GSC @@ -34,6 +38,26 @@ struct xe_gsc { /** @work: delayed load and proxy handling work */ struct work_struct work; + + /** @proxy: sub-structure containing the SW proxy-related variables */ + struct { + /** @component: struct for communication with mei component */ + struct i915_gsc_proxy_component *component; + /** @mutex: protects the component binding and usage */ + struct mutex mutex; + /** @component_added: whether the component has been added */ + bool component_added; + /** @bo: object to store message to and from the GSC */ + struct xe_bo *bo; + /** @to_gsc: map of the memory used to send messages to the GSC */ + struct iosys_map to_gsc; + /** @from_gsc: map of the memory used to recv messages from the GSC */ + struct iosys_map from_gsc; + /** @to_csme: pointer to the memory used to send messages to CSME */ + 
void *to_csme; + /** @from_csme: pointer to the memory used to recv messages from CSME */ + void *from_csme; + } proxy; }; #endif diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c index 0f2258dc4a0053..1fe4d54409d344 100644 --- a/drivers/gpu/drm/xe/xe_gt.c +++ b/drivers/gpu/drm/xe/xe_gt.c @@ -78,6 +78,19 @@ void xe_gt_sanitize(struct xe_gt *gt) gt->uc.guc.submission_state.enabled = false; } +/** + * xe_gt_remove() - Clean up the GT structures before driver removal + * @gt: the GT object + * + * This function should only act on objects/structures that must be cleaned + * before the driver removal callback is complete and therefore can't be + * deferred to a drmm action. + */ +void xe_gt_remove(struct xe_gt *gt) +{ + xe_uc_remove(>->uc); +} + static void gt_fini(struct drm_device *drm, void *arg) { struct xe_gt *gt = arg; diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h index 4486e083f5eff3..c1675bd44cf6d6 100644 --- a/drivers/gpu/drm/xe/xe_gt.h +++ b/drivers/gpu/drm/xe/xe_gt.h @@ -41,6 +41,7 @@ int xe_gt_suspend(struct xe_gt *gt); int xe_gt_resume(struct xe_gt *gt); void xe_gt_reset_async(struct xe_gt *gt); void xe_gt_sanitize(struct xe_gt *gt); +void xe_gt_remove(struct xe_gt *gt); /** * xe_gt_any_hw_engine_by_reset_domain - scan the list of engines and return the diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c index 4408ea1751e7ee..8f37a809525fb3 100644 --- a/drivers/gpu/drm/xe/xe_uc.c +++ b/drivers/gpu/drm/xe/xe_uc.c @@ -7,6 +7,7 @@ #include "xe_device.h" #include "xe_gsc.h" +#include "xe_gsc_proxy.h" #include "xe_gt.h" #include "xe_guc.h" #include "xe_guc_db_mgr.h" @@ -261,3 +262,16 @@ int xe_uc_suspend(struct xe_uc *uc) return xe_guc_suspend(&uc->guc); } + +/** + * xe_uc_remove() - Clean up the UC structures before driver removal + * @uc: the UC object + * + * This function should only act on objects/structures that must be cleaned + * before the driver removal callback is complete and therefore can't be + * deferred to a drmm action. + */ +void xe_uc_remove(struct xe_uc *uc) +{ + xe_gsc_remove(&uc->gsc); +} diff --git a/drivers/gpu/drm/xe/xe_uc.h b/drivers/gpu/drm/xe/xe_uc.h index 5d5110c0c834b9..e4d4e3c99f0e7a 100644 --- a/drivers/gpu/drm/xe/xe_uc.h +++ b/drivers/gpu/drm/xe/xe_uc.h @@ -20,5 +20,6 @@ int xe_uc_stop(struct xe_uc *uc); int xe_uc_start(struct xe_uc *uc); int xe_uc_suspend(struct xe_uc *uc); int xe_uc_sanitize_reset(struct xe_uc *uc); +void xe_uc_remove(struct xe_uc *uc); #endif From eb08104f90fc474054211244d668d3fe1d84bccb Mon Sep 17 00:00:00 2001 From: Daniele Ceraolo Spurio Date: Wed, 17 Jan 2024 10:26:20 -0800 Subject: [PATCH 093/200] drm/xe/gsc: add support for GSC proxy interrupt The GSC notifies us of a proxy request via the HECI2 interrupt. The interrupt must be enabled both in the HECI layer and in our usual gt irq programming; for the latter, the interrupt is enabled via the same enable register as the GSC CS, but it does have its own mask register. When the interrupt is received, we also need to de-assert it in both layers. The handling of the proxy request is deferred to the same worker that we use for GSC load. New flags have been added to distinguish between the init case and the proxy interrupt. 
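In pseudo-form the deferred handling is the usual record-and-queue pattern: the interrupt path only records which action is pending and kicks the worker, and the worker snapshots and clears the pending mask before acting on it. A condensed sketch (illustrative only, built from the struct xe_gsc fields this patch introduces; not the literal driver code):

	/* interrupt path: record the action, then kick the worker */
	spin_lock(&gsc->lock);
	gsc->work_actions |= GSC_ACTION_SW_PROXY;
	spin_unlock(&gsc->lock);
	queue_work(gsc->wq, &gsc->work);

	/* worker: snapshot and clear the mask, then act on the snapshot */
	u32 actions;

	spin_lock_irq(&gsc->lock);
	actions = gsc->work_actions;
	gsc->work_actions = 0;
	spin_unlock_irq(&gsc->lock);

	if (actions & GSC_ACTION_SW_PROXY)
		xe_gsc_proxy_request_handler(gsc);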
v2: rename irq define, fix include ordering (Alan) Signed-off-by: Daniele Ceraolo Spurio Cc: Alan Previn Cc: Suraj Kandpal Reviewed-by: Alan Previn Link: https://patchwork.freedesktop.org/patch/msgid/20240117182621.2653049-3-daniele.ceraolospurio@intel.com --- drivers/gpu/drm/xe/regs/xe_gt_regs.h | 2 + drivers/gpu/drm/xe/xe_gsc.c | 26 +++++++-- drivers/gpu/drm/xe/xe_gsc_proxy.c | 81 +++++++++++++++++++++++++--- drivers/gpu/drm/xe/xe_gsc_proxy.h | 3 ++ drivers/gpu/drm/xe/xe_gsc_types.h | 9 ++++ drivers/gpu/drm/xe/xe_irq.c | 44 +++++++++++---- 6 files changed, 143 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h index 4017319c63000f..0d4bfc35ff37ef 100644 --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h @@ -459,6 +459,7 @@ #define INTR_ENGINE_CLASS(x) REG_FIELD_GET(GENMASK(18, 16), x) #define INTR_ENGINE_INTR(x) REG_FIELD_GET(GENMASK(15, 0), x) #define OTHER_GUC_INSTANCE 0 +#define OTHER_GSC_HECI2_INSTANCE 3 #define OTHER_GSC_INSTANCE 6 #define IIR_REG_SELECTOR(x) XE_REG(0x190070 + ((x) * 4)) @@ -467,6 +468,7 @@ #define VCS0_VCS1_INTR_MASK XE_REG(0x1900a8) #define VCS2_VCS3_INTR_MASK XE_REG(0x1900ac) #define VECS0_VECS1_INTR_MASK XE_REG(0x1900d0) +#define HECI2_RSVD_INTR_MASK XE_REG(0x1900e4) #define GUC_SG_INTR_MASK XE_REG(0x1900e8) #define GPM_WGBOXPERF_INTR_MASK XE_REG(0x1900ec) #define GUNIT_GSC_INTR_MASK XE_REG(0x1900f4) diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c index 85074b7304021a..0b90fd9ef63a44 100644 --- a/drivers/gpu/drm/xe/xe_gsc.c +++ b/drivers/gpu/drm/xe/xe_gsc.c @@ -276,16 +276,27 @@ static void gsc_work(struct work_struct *work) struct xe_gsc *gsc = container_of(work, typeof(*gsc), work); struct xe_gt *gt = gsc_to_gt(gsc); struct xe_device *xe = gt_to_xe(gt); + u32 actions; int ret; + spin_lock_irq(&gsc->lock); + actions = gsc->work_actions; + gsc->work_actions = 0; + spin_unlock_irq(&gsc->lock); + xe_device_mem_access_get(xe); xe_force_wake_get(gt_to_fw(gt), XE_FW_GSC); - ret = gsc_upload_and_init(gsc); - if (ret && ret != -EEXIST) - xe_uc_fw_change_status(&gsc->fw, XE_UC_FIRMWARE_LOAD_FAIL); - else - xe_uc_fw_change_status(&gsc->fw, XE_UC_FIRMWARE_RUNNING); + if (actions & GSC_ACTION_FW_LOAD) { + ret = gsc_upload_and_init(gsc); + if (ret && ret != -EEXIST) + xe_uc_fw_change_status(&gsc->fw, XE_UC_FIRMWARE_LOAD_FAIL); + else + xe_uc_fw_change_status(&gsc->fw, XE_UC_FIRMWARE_RUNNING); + } + + if (actions & GSC_ACTION_SW_PROXY) + xe_gsc_proxy_request_handler(gsc); xe_force_wake_put(gt_to_fw(gt), XE_FW_GSC); xe_device_mem_access_put(xe); @@ -299,6 +310,7 @@ int xe_gsc_init(struct xe_gsc *gsc) gsc->fw.type = XE_UC_FW_TYPE_GSC; INIT_WORK(&gsc->work, gsc_work); + spin_lock_init(&gsc->lock); /* The GSC uC is only available on the media GT */ if (tile->media_gt && (gt != tile->media_gt)) { @@ -422,6 +434,10 @@ void xe_gsc_load_start(struct xe_gsc *gsc) return; } + spin_lock_irq(&gsc->lock); + gsc->work_actions |= GSC_ACTION_FW_LOAD; + spin_unlock_irq(&gsc->lock); + queue_work(gsc->wq, &gsc->work); } diff --git a/drivers/gpu/drm/xe/xe_gsc_proxy.c b/drivers/gpu/drm/xe/xe_gsc_proxy.c index 86353c5a81cd14..309ef80e3b95e8 100644 --- a/drivers/gpu/drm/xe/xe_gsc_proxy.c +++ b/drivers/gpu/drm/xe/xe_gsc_proxy.c @@ -21,6 +21,7 @@ #include "xe_gt_printk.h" #include "xe_map.h" #include "xe_mmio.h" +#include "xe_pm.h" /* * GSC proxy: @@ -74,6 +75,30 @@ static bool gsc_proxy_init_done(struct xe_gsc *gsc) HECI1_FWSTS1_PROXY_STATE_NORMAL; } +static void 
__gsc_proxy_irq_rmw(struct xe_gsc *gsc, u32 clr, u32 set) +{ + struct xe_gt *gt = gsc_to_gt(gsc); + + /* make sure we never accidentally write the RST bit */ + clr |= HECI_H_CSR_RST; + + xe_mmio_rmw32(gt, HECI_H_CSR(MTL_GSC_HECI2_BASE), clr, set); +} + +static void gsc_proxy_irq_clear(struct xe_gsc *gsc) +{ + /* The status bit is cleared by writing to it */ + __gsc_proxy_irq_rmw(gsc, 0, HECI_H_CSR_IS); +} + +static void gsc_proxy_irq_toggle(struct xe_gsc *gsc, bool enabled) +{ + u32 set = enabled ? HECI_H_CSR_IE : 0; + u32 clr = enabled ? 0 : HECI_H_CSR_IE; + + __gsc_proxy_irq_rmw(gsc, clr, set); +} + static int proxy_send_to_csme(struct xe_gsc *gsc, u32 size) { struct xe_gt *gt = gsc_to_gt(gsc); @@ -264,7 +289,7 @@ static int proxy_query(struct xe_gsc *gsc) return ret < 0 ? ret : 0; } -static int gsc_proxy_request_handler(struct xe_gsc *gsc) +int xe_gsc_proxy_request_handler(struct xe_gsc *gsc) { struct xe_gt *gt = gsc_to_gt(gsc); int slept; @@ -286,12 +311,36 @@ static int gsc_proxy_request_handler(struct xe_gsc *gsc) xe_gt_err(gt, "GSC proxy component not bound!\n"); err = -EIO; } else { + /* + * clear the pending interrupt and allow new proxy requests to + * be generated while we handle the current one + */ + gsc_proxy_irq_clear(gsc); err = proxy_query(gsc); } mutex_unlock(&gsc->proxy.mutex); return err; } +void xe_gsc_proxy_irq_handler(struct xe_gsc *gsc, u32 iir) +{ + struct xe_gt *gt = gsc_to_gt(gsc); + + if (unlikely(!iir)) + return; + + if (!gsc->proxy.component) { + xe_gt_err(gt, "GSC proxy irq received without the component being bound!\n"); + return; + } + + spin_lock(&gsc->lock); + gsc->work_actions |= GSC_ACTION_SW_PROXY; + spin_unlock(&gsc->lock); + + queue_work(gsc->wq, &gsc->work); +} + static int xe_gsc_proxy_component_bind(struct device *xe_kdev, struct device *mei_kdev, void *data) { @@ -434,11 +483,28 @@ void xe_gsc_proxy_remove(struct xe_gsc *gsc) { struct xe_gt *gt = gsc_to_gt(gsc); struct xe_device *xe = gt_to_xe(gt); + int err = 0; - if (gsc->proxy.component_added) { - component_del(xe->drm.dev, &xe_gsc_proxy_component_ops); - gsc->proxy.component_added = false; - } + if (!gsc->proxy.component_added) + return; + + /* disable HECI2 IRQs */ + xe_pm_runtime_get(xe); + err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GSC); + if (err) + xe_gt_err(gt, "failed to get forcewake to disable GSC interrupts\n"); + + /* try to disable irq even if forcewake failed */ + gsc_proxy_irq_toggle(gsc, false); + + if (!err) + xe_force_wake_put(gt_to_fw(gt), XE_FW_GSC); + xe_pm_runtime_put(xe); + + xe_gsc_wait_for_worker_completion(gsc); + + component_del(xe->drm.dev, &xe_gsc_proxy_component_ops); + gsc->proxy.component_added = false; } /** @@ -451,11 +517,14 @@ int xe_gsc_proxy_start(struct xe_gsc *gsc) { int err; + /* enable the proxy interrupt in the GSC shim layer */ + gsc_proxy_irq_toggle(gsc, true); + /* * The handling of the first proxy request must be manually triggered to * notify the GSC that we're ready to support the proxy flow. 
*/ - err = gsc_proxy_request_handler(gsc); + err = xe_gsc_proxy_request_handler(gsc); if (err) return err; diff --git a/drivers/gpu/drm/xe/xe_gsc_proxy.h b/drivers/gpu/drm/xe/xe_gsc_proxy.h index 5dc6321efbafb8..908f9441f09340 100644 --- a/drivers/gpu/drm/xe/xe_gsc_proxy.h +++ b/drivers/gpu/drm/xe/xe_gsc_proxy.h @@ -14,4 +14,7 @@ int xe_gsc_proxy_init(struct xe_gsc *gsc); void xe_gsc_proxy_remove(struct xe_gsc *gsc); int xe_gsc_proxy_start(struct xe_gsc *gsc); +int xe_gsc_proxy_request_handler(struct xe_gsc *gsc); +void xe_gsc_proxy_irq_handler(struct xe_gsc *gsc, u32 iir); + #endif diff --git a/drivers/gpu/drm/xe/xe_gsc_types.h b/drivers/gpu/drm/xe/xe_gsc_types.h index 07bfa87ec0026b..060d0fe848ad7f 100644 --- a/drivers/gpu/drm/xe/xe_gsc_types.h +++ b/drivers/gpu/drm/xe/xe_gsc_types.h @@ -8,6 +8,7 @@ #include #include +#include #include #include @@ -39,6 +40,14 @@ struct xe_gsc { /** @work: delayed load and proxy handling work */ struct work_struct work; + /** @lock: protects access to the work_actions mask */ + spinlock_t lock; + + /** @work_actions: mask of actions to be performed in the work */ + u32 work_actions; +#define GSC_ACTION_FW_LOAD BIT(0) +#define GSC_ACTION_SW_PROXY BIT(1) + /** @proxy: sub-structure containing the SW proxy-related variables */ struct { /** @component: struct for communication with mei component */ diff --git a/drivers/gpu/drm/xe/xe_irq.c b/drivers/gpu/drm/xe/xe_irq.c index 907c8ff0fa21fe..2fd8cc26fc9fe0 100644 --- a/drivers/gpu/drm/xe/xe_irq.c +++ b/drivers/gpu/drm/xe/xe_irq.c @@ -14,6 +14,7 @@ #include "xe_device.h" #include "xe_display.h" #include "xe_drv.h" +#include "xe_gsc_proxy.h" #include "xe_gt.h" #include "xe_guc.h" #include "xe_hw_engine.h" @@ -131,6 +132,7 @@ void xe_irq_enable_hwe(struct xe_gt *gt) u32 ccs_mask, bcs_mask; u32 irqs, dmask, smask; u32 gsc_mask = 0; + u32 heci_mask = 0; if (xe_device_uc_enabled(xe)) { irqs = GT_RENDER_USER_INTERRUPT | @@ -180,14 +182,23 @@ void xe_irq_enable_hwe(struct xe_gt *gt) xe_mmio_write32(gt, VCS2_VCS3_INTR_MASK, ~dmask); xe_mmio_write32(gt, VECS0_VECS1_INTR_MASK, ~dmask); - if (xe_hw_engine_mask_per_class(gt, XE_ENGINE_CLASS_OTHER)) + /* + * the heci2 interrupt is enabled via the same register as the + * GSCCS interrupts, but it has its own mask register. 
+ */ + if (xe_hw_engine_mask_per_class(gt, XE_ENGINE_CLASS_OTHER)) { gsc_mask = irqs; - else if (HAS_HECI_GSCFI(xe)) + heci_mask = GSC_IRQ_INTF(1); + } else if (HAS_HECI_GSCFI(xe)) { gsc_mask = GSC_IRQ_INTF(1); + } + if (gsc_mask) { - xe_mmio_write32(gt, GUNIT_GSC_INTR_ENABLE, gsc_mask); + xe_mmio_write32(gt, GUNIT_GSC_INTR_ENABLE, gsc_mask | heci_mask); xe_mmio_write32(gt, GUNIT_GSC_INTR_MASK, ~gsc_mask); } + if (heci_mask) + xe_mmio_write32(gt, HECI2_RSVD_INTR_MASK, ~(heci_mask << 16)); } } @@ -234,6 +245,8 @@ gt_other_irq_handler(struct xe_gt *gt, const u8 instance, const u16 iir) return xe_guc_irq_handler(>->uc.guc, iir); if (instance == OTHER_MEDIA_GUC_INSTANCE && xe_gt_is_media_type(gt)) return xe_guc_irq_handler(>->uc.guc, iir); + if (instance == OTHER_GSC_HECI2_INSTANCE && xe_gt_is_media_type(gt)) + return xe_gsc_proxy_irq_handler(>->uc.gsc, iir); if (instance != OTHER_GUC_INSTANCE && instance != OTHER_MEDIA_GUC_INSTANCE) { @@ -251,15 +264,23 @@ static struct xe_gt *pick_engine_gt(struct xe_tile *tile, if (MEDIA_VER(xe) < 13) return tile->primary_gt; - if (class == XE_ENGINE_CLASS_VIDEO_DECODE || - class == XE_ENGINE_CLASS_VIDEO_ENHANCE) + switch (class) { + case XE_ENGINE_CLASS_VIDEO_DECODE: + case XE_ENGINE_CLASS_VIDEO_ENHANCE: return tile->media_gt; - - if (class == XE_ENGINE_CLASS_OTHER && - (instance == OTHER_MEDIA_GUC_INSTANCE || instance == OTHER_GSC_INSTANCE)) - return tile->media_gt; - - return tile->primary_gt; + case XE_ENGINE_CLASS_OTHER: + switch (instance) { + case OTHER_MEDIA_GUC_INSTANCE: + case OTHER_GSC_INSTANCE: + case OTHER_GSC_HECI2_INSTANCE: + return tile->media_gt; + default: + break; + }; + fallthrough; + default: + return tile->primary_gt; + } } static void gt_irq_handler(struct xe_tile *tile, @@ -486,6 +507,7 @@ static void gt_irq_reset(struct xe_tile *tile) HAS_HECI_GSCFI(tile_to_xe(tile))) { xe_mmio_write32(mmio, GUNIT_GSC_INTR_ENABLE, 0); xe_mmio_write32(mmio, GUNIT_GSC_INTR_MASK, ~0); + xe_mmio_write32(mmio, HECI2_RSVD_INTR_MASK, ~0); } xe_mmio_write32(mmio, GPM_WGBOXPERF_INTR_ENABLE, 0); From 43d48379c9399654059bd2af5898fc464641837c Mon Sep 17 00:00:00 2001 From: Fei Yang Date: Tue, 16 Jan 2024 14:37:09 -0800 Subject: [PATCH 094/200] drm/xe: correct the calculation of remaining size In function write_pgtable, the calculation of chunk in the do-while loop is wrong, we should always compare against remaining size instead of the total size update->qwords. 
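Put differently, the clamp has to track what is still left to write, not the original total, or every iteration after the first can emit past the remaining entries. A minimal sketch of the intended loop shape (not the literal xe_migrate.c code; "size" is assumed to hold the remaining qword count):

	u32 size = update->qwords;	/* qwords still to be written */

	do {
		u32 chunk = min(size, MAX_PTE_PER_SDI);

		/* ... emit "chunk" PTE qwords at the current offset ... */

		size -= chunk;
	} while (size);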
Signed-off-by: Fei Yang Reviewed-by: Matt Roper Signed-off-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20240116223709.652585-2-fei.yang@intel.com --- drivers/gpu/drm/xe/xe_migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c index 44725f978f3e1e..d5392cbbdb4910 100644 --- a/drivers/gpu/drm/xe/xe_migrate.c +++ b/drivers/gpu/drm/xe/xe_migrate.c @@ -1116,7 +1116,7 @@ static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs, do { u64 addr = ppgtt_ofs + ofs * 8; - chunk = min(update->qwords, MAX_PTE_PER_SDI); + chunk = min(size, MAX_PTE_PER_SDI); /* Ensure populatefn can do memset64 by aligning bb->cs */ if (!(bb->len & 1)) From 8ea8c918e7dbd5a61f9e98b8624437f1e295804c Mon Sep 17 00:00:00 2001 From: Vinod Govindapillai Date: Fri, 12 Jan 2024 11:28:03 +0200 Subject: [PATCH 095/200] drm/xe: Modify the cfb size to be page size aligned for FBC MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit drm_gem_private_object_init expects the object size to be page size aligned. The xe_bo create functions do not update the size for any alignment requirements. So align the cfb size to page size in the xe stolen memory handling. Signed-off-by: Vinod Govindapillai Reviewed-by: Jouni Högander Signed-off-by: Mika Kahola Link: https://patchwork.freedesktop.org/patch/msgid/20240112092803.61664-2-vinod.govindapillai@intel.com --- drivers/gpu/drm/xe/compat-i915-headers/i915_gem_stolen.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/xe/compat-i915-headers/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-headers/i915_gem_stolen.h index 888e7a87a92570..bd233007c1b74a 100644 --- a/drivers/gpu/drm/xe/compat-i915-headers/i915_gem_stolen.h +++ b/drivers/gpu/drm/xe/compat-i915-headers/i915_gem_stolen.h @@ -19,6 +19,9 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe, int err; u32 flags = XE_BO_CREATE_PINNED_BIT | XE_BO_CREATE_STOLEN_BIT; + if (align) + size = ALIGN(size, align); + bo = xe_bo_create_locked_range(xe, xe_device_get_root_tile(xe), NULL, size, start, end, ttm_bo_type_kernel, flags); From c96baaa8399389312ba6b542e18cbff9c60e3001 Mon Sep 17 00:00:00 2001 From: Jani Nikula Date: Wed, 17 Jan 2024 14:20:40 +0200 Subject: [PATCH 096/200] drm/xe: make xe_ttm_funcs const MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Place the function pointers in rodata. Also drop the extra declaration while at it. 
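The motivation generalizes to any ops table: a non-const table of function pointers is emitted into the writable .data section, while a const one lands in .rodata, where the pointers cannot be retargeted at runtime. A generic sketch (not xe code):

	struct ops {
		int (*probe)(void *ctx);
	};

	static int my_probe(void *ctx) { return 0; }

	/* writable: placed in .data */
	static struct ops ops_rw = { .probe = my_probe };

	/* read-only: placed in .rodata */
	static const struct ops ops_ro = { .probe = my_probe };

Note that the extern declaration in the header must gain the same const qualifier, as the xe_bo.h hunk below does, or the two declarations conflict.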
Reviewed-by: Thomas Hellström Link: https://patchwork.freedesktop.org/patch/msgid/20240117122044.1544174-1-jani.nikula@intel.com Signed-off-by: Jani Nikula --- drivers/gpu/drm/xe/xe_bo.c | 2 +- drivers/gpu/drm/xe/xe_bo.h | 2 +- drivers/gpu/drm/xe/xe_vm.h | 2 -- 3 files changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 26fe73f58d7207..686d716c558158 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -1041,7 +1041,7 @@ static void xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo) } } -struct ttm_device_funcs xe_ttm_funcs = { +const struct ttm_device_funcs xe_ttm_funcs = { .ttm_tt_create = xe_ttm_tt_create, .ttm_tt_populate = xe_ttm_tt_populate, .ttm_tt_unpopulate = xe_ttm_tt_unpopulate, diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h index 350cc73cadf8d8..db4b2db6b07307 100644 --- a/drivers/gpu/drm/xe/xe_bo.h +++ b/drivers/gpu/drm/xe/xe_bo.h @@ -243,7 +243,7 @@ int xe_bo_evict(struct xe_bo *bo, bool force_alloc); int xe_bo_evict_pinned(struct xe_bo *bo); int xe_bo_restore_pinned(struct xe_bo *bo); -extern struct ttm_device_funcs xe_ttm_funcs; +extern const struct ttm_device_funcs xe_ttm_funcs; int xe_gem_create_ioctl(struct drm_device *dev, void *data, struct drm_file *file); diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h index cf2f96e8c1ab92..e9c907cbcd891e 100644 --- a/drivers/gpu/drm/xe/xe_vm.h +++ b/drivers/gpu/drm/xe/xe_vm.h @@ -199,8 +199,6 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker); int xe_vm_invalidate_vma(struct xe_vma *vma); -extern struct ttm_device_funcs xe_ttm_funcs; - static inline void xe_vm_queue_rebind_worker(struct xe_vm *vm) { xe_assert(vm->xe, xe_vm_in_preempt_fence_mode(vm)); From c96baaa8399389312ba6b542e18cbff9c60e3001 Mon Sep 17 00:00:00 2001 From: Jani Nikula Date: Wed, 17 Jan 2024 14:20:41 +0200 Subject: [PATCH 097/200] drm/xe: make heci_gsc_irq_chip const MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The irq_chip definition can be const, make it so. Reviewed-by: Thomas Hellström Link: https://patchwork.freedesktop.org/patch/msgid/20240117122044.1544174-2-jani.nikula@intel.com Signed-off-by: Jani Nikula --- drivers/gpu/drm/xe/xe_heci_gsc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_heci_gsc.c b/drivers/gpu/drm/xe/xe_heci_gsc.c index bfdd33b9b23b40..1c9d38b6f5f181 100644 --- a/drivers/gpu/drm/xe/xe_heci_gsc.c +++ b/drivers/gpu/drm/xe/xe_heci_gsc.c @@ -29,7 +29,7 @@ static void heci_gsc_irq_unmask(struct irq_data *d) /* generic irq handling */ } -static struct irq_chip heci_gsc_irq_chip = { +static const struct irq_chip heci_gsc_irq_chip = { .name = "gsc_irq_chip", .irq_mask = heci_gsc_irq_mask, .irq_unmask = heci_gsc_irq_unmask, From 3cacf808c9d8e302ff7cd94579a5d3c540232f9e Mon Sep 17 00:00:00 2001 From: Jani Nikula Date: Wed, 17 Jan 2024 14:20:42 +0200 Subject: [PATCH 098/200] drm/xe: make hwmon_info const MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Make hwmon_info a const array of const pointers, and let it be placed in rodata. 
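The two const qualifiers do different jobs: the first protects the pointed-to channel-info structures, the second protects the array of pointers itself, and only with both does the whole table land in .rodata. A generic illustration:

	/* pointed-to data is const, but the pointer array itself is writable */
	static const struct hwmon_channel_info *info_a[] = { /* ... */ NULL };

	/* const pointers to const data: the entire table is read-only */
	static const struct hwmon_channel_info * const info_b[] = { /* ... */ NULL };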
Reviewed-by: Thomas Hellström Link: https://patchwork.freedesktop.org/patch/msgid/20240117122044.1544174-3-jani.nikula@intel.com Signed-off-by: Jani Nikula --- drivers/gpu/drm/xe/xe_hwmon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c index 174ed2185481e3..89c6f7f84b5a52 100644 --- a/drivers/gpu/drm/xe/xe_hwmon.c +++ b/drivers/gpu/drm/xe/xe_hwmon.c @@ -402,7 +402,7 @@ static const struct attribute_group *hwmon_groups[] = { NULL }; -static const struct hwmon_channel_info *hwmon_info[] = { +static const struct hwmon_channel_info * const hwmon_info[] = { HWMON_CHANNEL_INFO(power, HWMON_P_MAX | HWMON_P_RATED_MAX | HWMON_P_CRIT), HWMON_CHANNEL_INFO(curr, HWMON_C_CRIT), HWMON_CHANNEL_INFO(in, HWMON_I_INPUT), From 480ea9e306c7fdbfb24b0af046c28c10b98a74ab Mon Sep 17 00:00:00 2001 From: Jani Nikula Date: Wed, 17 Jan 2024 14:20:43 +0200 Subject: [PATCH 099/200] drm/xe: make gpuvm_ops const MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Place the function pointers in rodata. Reviewed-by: Thomas Hellström Link: https://patchwork.freedesktop.org/patch/msgid/20240117122044.1544174-4-jani.nikula@intel.com Signed-off-by: Jani Nikula --- drivers/gpu/drm/xe/xe_vm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index a7e7a0b240991a..7ea0f151d223c1 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -1054,7 +1054,7 @@ static struct drm_gpuva_op *xe_vm_op_alloc(void) static void xe_vm_free(struct drm_gpuvm *gpuvm); -static struct drm_gpuvm_ops gpuvm_ops = { +static const struct drm_gpuvm_ops gpuvm_ops = { .op_alloc = xe_vm_op_alloc, .vm_bo_validate = xe_gpuvm_validate, .vm_free = xe_vm_free, From 9c0155b652bfb5fb1230380754d70d4acaabf75a Mon Sep 17 00:00:00 2001 From: Jani Nikula Date: Wed, 17 Jan 2024 14:20:44 +0200 Subject: [PATCH 100/200] drm/xe: constify engine class sysfs attributes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit All of the attributes, as well as the array of attributes, can be const and placed in rodata. 
Reviewed-by: Thomas Hellström Link: https://patchwork.freedesktop.org/patch/msgid/20240117122044.1544174-5-jani.nikula@intel.com Signed-off-by: Jani Nikula --- drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c | 38 +++++++++---------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c b/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c index e49bc14f0ecf09..2345fb42fa39f0 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c +++ b/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c @@ -73,7 +73,7 @@ static ssize_t job_timeout_max_show(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->sched_props.job_timeout_max); } -static struct kobj_attribute job_timeout_max_attr = +static const struct kobj_attribute job_timeout_max_attr = __ATTR(job_timeout_max, 0644, job_timeout_max_show, job_timeout_max_store); static ssize_t job_timeout_min_store(struct kobject *kobj, @@ -109,7 +109,7 @@ static ssize_t job_timeout_min_show(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->sched_props.job_timeout_min); } -static struct kobj_attribute job_timeout_min_attr = +static const struct kobj_attribute job_timeout_min_attr = __ATTR(job_timeout_min, 0644, job_timeout_min_show, job_timeout_min_store); static ssize_t job_timeout_store(struct kobject *kobj, @@ -142,7 +142,7 @@ static ssize_t job_timeout_show(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->sched_props.job_timeout_ms); } -static struct kobj_attribute job_timeout_attr = +static const struct kobj_attribute job_timeout_attr = __ATTR(job_timeout_ms, 0644, job_timeout_show, job_timeout_store); static ssize_t job_timeout_default(struct kobject *kobj, @@ -153,7 +153,7 @@ static ssize_t job_timeout_default(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->defaults.job_timeout_ms); } -static struct kobj_attribute job_timeout_def = +static const struct kobj_attribute job_timeout_def = __ATTR(job_timeout_ms, 0444, job_timeout_default, NULL); static ssize_t job_timeout_min_default(struct kobject *kobj, @@ -164,7 +164,7 @@ static ssize_t job_timeout_min_default(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->defaults.job_timeout_min); } -static struct kobj_attribute job_timeout_min_def = +static const struct kobj_attribute job_timeout_min_def = __ATTR(job_timeout_min, 0444, job_timeout_min_default, NULL); static ssize_t job_timeout_max_default(struct kobject *kobj, @@ -175,7 +175,7 @@ static ssize_t job_timeout_max_default(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->defaults.job_timeout_max); } -static struct kobj_attribute job_timeout_max_def = +static const struct kobj_attribute job_timeout_max_def = __ATTR(job_timeout_max, 0444, job_timeout_max_default, NULL); static ssize_t timeslice_duration_store(struct kobject *kobj, @@ -234,7 +234,7 @@ static ssize_t timeslice_duration_max_show(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->sched_props.timeslice_max); } -static struct kobj_attribute timeslice_duration_max_attr = +static const struct kobj_attribute timeslice_duration_max_attr = __ATTR(timeslice_duration_max, 0644, timeslice_duration_max_show, timeslice_duration_max_store); @@ -272,7 +272,7 @@ static ssize_t timeslice_duration_min_show(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->sched_props.timeslice_min); } -static struct kobj_attribute timeslice_duration_min_attr = +static const struct kobj_attribute timeslice_duration_min_attr = __ATTR(timeslice_duration_min, 0644, timeslice_duration_min_show, timeslice_duration_min_store); 
@@ -284,7 +284,7 @@ static ssize_t timeslice_duration_show(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->sched_props.timeslice_us); } -static struct kobj_attribute timeslice_duration_attr = +static const struct kobj_attribute timeslice_duration_attr = __ATTR(timeslice_duration_us, 0644, timeslice_duration_show, timeslice_duration_store); @@ -296,7 +296,7 @@ static ssize_t timeslice_default(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->defaults.timeslice_us); } -static struct kobj_attribute timeslice_duration_def = +static const struct kobj_attribute timeslice_duration_def = __ATTR(timeslice_duration_us, 0444, timeslice_default, NULL); static ssize_t timeslice_min_default(struct kobject *kobj, @@ -307,7 +307,7 @@ static ssize_t timeslice_min_default(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->defaults.timeslice_min); } -static struct kobj_attribute timeslice_duration_min_def = +static const struct kobj_attribute timeslice_duration_min_def = __ATTR(timeslice_duration_min, 0444, timeslice_min_default, NULL); static ssize_t timeslice_max_default(struct kobject *kobj, @@ -318,7 +318,7 @@ static ssize_t timeslice_max_default(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->defaults.timeslice_max); } -static struct kobj_attribute timeslice_duration_max_def = +static const struct kobj_attribute timeslice_duration_max_def = __ATTR(timeslice_duration_max, 0444, timeslice_max_default, NULL); static ssize_t preempt_timeout_store(struct kobject *kobj, @@ -351,7 +351,7 @@ static ssize_t preempt_timeout_show(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->sched_props.preempt_timeout_us); } -static struct kobj_attribute preempt_timeout_attr = +static const struct kobj_attribute preempt_timeout_attr = __ATTR(preempt_timeout_us, 0644, preempt_timeout_show, preempt_timeout_store); static ssize_t preempt_timeout_default(struct kobject *kobj, @@ -363,7 +363,7 @@ static ssize_t preempt_timeout_default(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->defaults.preempt_timeout_us); } -static struct kobj_attribute preempt_timeout_def = +static const struct kobj_attribute preempt_timeout_def = __ATTR(preempt_timeout_us, 0444, preempt_timeout_default, NULL); static ssize_t preempt_timeout_min_default(struct kobject *kobj, @@ -375,7 +375,7 @@ static ssize_t preempt_timeout_min_default(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->defaults.preempt_timeout_min); } -static struct kobj_attribute preempt_timeout_min_def = +static const struct kobj_attribute preempt_timeout_min_def = __ATTR(preempt_timeout_min, 0444, preempt_timeout_min_default, NULL); static ssize_t preempt_timeout_max_default(struct kobject *kobj, @@ -387,7 +387,7 @@ static ssize_t preempt_timeout_max_default(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->defaults.preempt_timeout_max); } -static struct kobj_attribute preempt_timeout_max_def = +static const struct kobj_attribute preempt_timeout_max_def = __ATTR(preempt_timeout_max, 0444, preempt_timeout_max_default, NULL); static ssize_t preempt_timeout_max_store(struct kobject *kobj, @@ -423,7 +423,7 @@ static ssize_t preempt_timeout_max_show(struct kobject *kobj, return sprintf(buf, "%u\n", eclass->sched_props.preempt_timeout_max); } -static struct kobj_attribute preempt_timeout_max_attr = +static const struct kobj_attribute preempt_timeout_max_attr = __ATTR(preempt_timeout_max, 0644, preempt_timeout_max_show, preempt_timeout_max_store); @@ -460,7 +460,7 @@ static ssize_t preempt_timeout_min_show(struct kobject 
*kobj, return sprintf(buf, "%u\n", eclass->sched_props.preempt_timeout_min); } -static struct kobj_attribute preempt_timeout_min_attr = +static const struct kobj_attribute preempt_timeout_min_attr = __ATTR(preempt_timeout_min, 0644, preempt_timeout_min_show, preempt_timeout_min_store); @@ -477,7 +477,7 @@ static const struct attribute *defaults[] = { NULL }; -static const struct attribute *files[] = { +static const struct attribute * const files[] = { &job_timeout_attr.attr, &job_timeout_min_attr.attr, &job_timeout_max_attr.attr, From f87f5ea4395912d987cc9a84090801c2093e2051 Mon Sep 17 00:00:00 2001 From: Badal Nilawar Date: Fri, 19 Jan 2024 18:40:25 +0530 Subject: [PATCH 101/200] drm/xe/xe_debugfs: Print skip_guc_pc in xe info Print xe->info.skip_guc_pc in xe info Cc: Anshuman Gupta Signed-off-by: Badal Nilawar Reviewed-by: Himal Prasad Ghimiray Link: https://patchwork.freedesktop.org/patch/msgid/20240119131025.1872947-1-badal.nilawar@intel.com Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/xe/xe_debugfs.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c index c56fd7d59f057f..01db5b27bec5d3 100644 --- a/drivers/gpu/drm/xe/xe_debugfs.c +++ b/drivers/gpu/drm/xe/xe_debugfs.c @@ -55,6 +55,7 @@ static int info(struct seq_file *m, void *data) drm_printf(&p, "force_execlist %s\n", str_yes_no(xe->info.force_execlist)); drm_printf(&p, "has_flat_ccs %s\n", str_yes_no(xe->info.has_flat_ccs)); drm_printf(&p, "has_usm %s\n", str_yes_no(xe->info.has_usm)); + drm_printf(&p, "skip_guc_pc %s\n", str_yes_no(xe->info.skip_guc_pc)); for_each_gt(gt, xe, id) { drm_printf(&p, "gt%d force wake %d\n", id, xe_force_wake_ref(gt_to_fw(gt), XE_FW_GT)); From 06af1954aeccca78180190eda657af8f52d296c1 Mon Sep 17 00:00:00 2001 From: Rodrigo Vivi Date: Thu, 18 Jan 2024 16:48:56 -0500 Subject: [PATCH 102/200] drm/xe: Do not flood dmesg with guc log This information is already present at /sys/kernel/debug/dri/0/gt0/uc/guc_log if needed. v2: add missing chunk v3: remove spurious line Reviewed-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240118214856.399952-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/xe/xe_guc.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 2891b0cc4f7f9a..576ff2c1fbb965 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -456,7 +456,6 @@ static int guc_wait_ucode(struct xe_guc *guc) if (ret) { struct drm_device *drm = &xe->drm; - struct drm_printer p = drm_info_printer(drm->dev); drm_info(drm, "GuC load failed: status = 0x%08X\n", status); drm_info(drm, "GuC load failed: status: Reset = %d, BootROM = 0x%02X, UKernel = 0x%02X, MIA = 0x%02X, Auth = 0x%02X\n", @@ -478,8 +477,6 @@ static int guc_wait_ucode(struct xe_guc *guc) SOFT_SCRATCH(13))); ret = -ENXIO; } - - xe_guc_log_print(&guc->log, &p); } else { drm_dbg(&xe->drm, "GuC successfully loaded"); } From 7b5bdb447b14930b9ef3e39bd301937889c60c96 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Thu, 18 Jan 2024 16:16:08 -0800 Subject: [PATCH 103/200] drm/xe: Use _ULL for u64 division Use DIV_ROUND_UP_ULL() so it also works on 32bit build. 
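The failure mode is that an open-coded 64-bit division with a runtime divisor is lowered on 32-bit targets to a libgcc helper call (__udivdi3) that the kernel does not provide, so the module fails to link. A sketch of the distinction, using the CCS helper where this bites in xe:

	u64 size = 5ULL * SZ_1G;

	/* expands to a plain 64-bit '/': fails to link on 32-bit builds */
	u32 a = DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE(xe));

	/* routed through div_u64(): works on every architecture */
	u32 b = DIV_ROUND_UP_ULL(size, NUM_BYTES_PER_CCS_BYTE(xe));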
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reviewed-by: Matt Roper Signed-off-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240119001612.2991381-2-lucas.demarchi@intel.com --- drivers/gpu/drm/xe/xe_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index 7a3c1c13dc3471..ab417f4f7d2a77 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -642,7 +642,7 @@ void xe_device_wmb(struct xe_device *xe) u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size) { return xe_device_has_flat_ccs(xe) ? - DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE(xe)) : 0; + DIV_ROUND_UP_ULL(size, NUM_BYTES_PER_CCS_BYTE(xe)) : 0; } bool xe_device_mem_access_ongoing(struct xe_device *xe) From 6d8d038364d8ec573e9dc0872e17bee1e5f12490 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Thu, 18 Jan 2024 16:16:09 -0800 Subject: [PATCH 104/200] drm/xe/mmio: Cast to u64 when printing resource_size_t uses %pa format in printk since the size varies depending on build options. However to keep the io_size/physical_size addition in the same call we can't pass the address without adding yet another variable in these function. Simply cast it to u64 and keep using %llx. Fixes: 286089ce6929 ("drm/xe: Improve vram info debug printing") Cc: Oak Zeng Cc: Michael J. Ruhl Cc: Matthew Brost Cc: Rodrigo Vivi Reviewed-by: Matt Roper Signed-off-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240119001612.2991381-3-lucas.demarchi@intel.com --- drivers/gpu/drm/xe/xe_mmio.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c index c8c5d74b6e9041..5f6b53ea5528b2 100644 --- a/drivers/gpu/drm/xe/xe_mmio.c +++ b/drivers/gpu/drm/xe/xe_mmio.c @@ -272,8 +272,8 @@ int xe_mmio_probe_vram(struct xe_device *xe) drm_info(&xe->drm, "VRAM[%u, %u]: Actual physical size %pa, usable size exclude stolen %pa, CPU accessible size %pa\n", id, tile->id, &tile->mem.vram.actual_physical_size, &tile->mem.vram.usable_size, &tile->mem.vram.io_size); drm_info(&xe->drm, "VRAM[%u, %u]: DPA range: [%pa-%llx], io range: [%pa-%llx]\n", id, tile->id, - &tile->mem.vram.dpa_base, tile->mem.vram.dpa_base + tile->mem.vram.actual_physical_size, - &tile->mem.vram.io_start, tile->mem.vram.io_start + tile->mem.vram.io_size); + &tile->mem.vram.dpa_base, tile->mem.vram.dpa_base + (u64)tile->mem.vram.actual_physical_size, + &tile->mem.vram.io_start, tile->mem.vram.io_start + (u64)tile->mem.vram.io_size); /* calculate total size using tile size to get the correct HW sizing */ total_size += tile_size; From 406663f777bee53e9ad93dc080c333d4655ab7de Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Thu, 18 Jan 2024 16:16:10 -0800 Subject: [PATCH 105/200] drm/xe/display: Avoid calling readq() readq() is not available in 32bits and i915_gem_object_read_from_page() is supposed to allow reading arbitrary sizes determined by the `size` argument. Currently the only caller only passes a size == 8 so the second problem is not that big. Migrate to calling memcpy()/memcpy_fromio() to allow possible changes in the display side and to fix the build on 32b architectures. 
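With the copy-based implementation the old size restriction also disappears: readq() exists only on 64-bit platforms and always moves exactly eight bytes, while memcpy_fromio() copies any length. A usage sketch of the helper after this change (bo and ofs are placeholders):

	u64 val = 0;
	int err;

	/* no longer limited to size == 8; a 4-byte read works the same way */
	err = i915_gem_object_read_from_page(bo, ofs, &val, sizeof(u32));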
v2: Use memcpy/memcpy_fromio directly rather than using iosys-map with the same size == 8 bytes restriction (Matt Roper) Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Signed-off-by: Lucas De Marchi Reviewed-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20240119001612.2991381-4-lucas.demarchi@intel.com --- .../drm/xe/compat-i915-headers/gem/i915_gem_object.h | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h index 5f19550cc84536..68d9f6116bdfc3 100644 --- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h +++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h @@ -35,12 +35,10 @@ static inline int i915_gem_object_read_from_page(struct xe_bo *bo, u32 ofs, u64 *ptr, u32 size) { struct ttm_bo_kmap_obj map; - void *virtual; + void *src; bool is_iomem; int ret; - XE_WARN_ON(size != 8); - ret = xe_bo_lock(bo, true); if (ret) return ret; @@ -50,11 +48,12 @@ static inline int i915_gem_object_read_from_page(struct xe_bo *bo, goto out_unlock; ofs &= ~PAGE_MASK; - virtual = ttm_kmap_obj_virtual(&map, &is_iomem); + src = ttm_kmap_obj_virtual(&map, &is_iomem); + src += ofs; if (is_iomem) - *ptr = readq((void __iomem *)(virtual + ofs)); + memcpy_fromio(ptr, (void __iomem *)src, size); else - *ptr = *(u64 *)(virtual + ofs); + memcpy(ptr, src, size); ttm_bo_kunmap(&map); out_unlock: From 8d038f49c1f36772653a498d85024d97c4838e44 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Thu, 18 Jan 2024 16:16:11 -0800 Subject: [PATCH 106/200] drm/xe: Fix cast on trace variable Cast the pointer to unsigned long and let it be implicitly extended to u64. This fixes the build on 32bits arch. Cc: Matthew Brost Cc: Niranjana Vishwanathapura Cc: Rodrigo Vivi Reviewed-by: Matt Roper Signed-off-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240119001612.2991381-5-lucas.demarchi@intel.com --- drivers/gpu/drm/xe/xe_trace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h index 95163c303f3e11..e4e7262191adcb 100644 --- a/drivers/gpu/drm/xe/xe_trace.h +++ b/drivers/gpu/drm/xe/xe_trace.h @@ -31,7 +31,7 @@ DECLARE_EVENT_CLASS(xe_gt_tlb_invalidation_fence, ), TP_fast_assign( - __entry->fence = (u64)fence; + __entry->fence = (unsigned long)fence; __entry->seqno = fence->seqno; ), From 836e487149c27253aabf364a4978cfb8206bd14b Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Thu, 18 Jan 2024 16:16:12 -0800 Subject: [PATCH 107/200] drm/xe: Enable 32bits build Now that all the issues with 32bits are fixed, enable it again. 
Reviewed-by: Matt Roper Signed-off-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240119001612.2991381-6-lucas.demarchi@intel.com --- drivers/gpu/drm/xe/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig index 1b57ae38210d4c..1b0ef91a5d2c35 100644 --- a/drivers/gpu/drm/xe/Kconfig +++ b/drivers/gpu/drm/xe/Kconfig @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only config DRM_XE tristate "Intel Xe Graphics" - depends on DRM && PCI && MMU && (m || (y && KUNIT=y)) && 64BIT + depends on DRM && PCI && MMU && (m || (y && KUNIT=y)) select INTERVAL_TREE # we need shmfs for the swappable backing store, and in particular # the shmem_readpage() which depends upon tmpfs From 6a02867560f77328ae5637b70b06704b140aafa6 Mon Sep 17 00:00:00 2001 From: Himal Prasad Ghimiray Date: Fri, 19 Jan 2024 09:48:26 +0530 Subject: [PATCH 108/200] drm/xe/xe2: Use XE_CACHE_WB pat index MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The pat table entry associated with XE_CACHE_WB is coherent, whereas XE_CACHE_NONE is non-coherent. Migration expects coherency with the CPU, therefore use the coherent entry XE_CACHE_WB for buffers not supporting compression. For read/write to the flat ccs region the issue is not related to coherency with the CPU. The hardware expects the pat index associated with the GPUVA for indirect access to be compression enabled, hence use XE_CACHE_NONE_COMPRESSION. v2 - Fix the argument to emit_pte, pass the bool directly. (Thomas) v3 - Rebase - Update commit message (Matt) v4 - Add a Fixes: tag. (Thomas) Cc: Matt Roper Cc: Thomas Hellström Fixes: 65ef8dbad1db ("drm/xe/xe2: Update emit_pte to use compression enabled PAT index") Signed-off-by: Himal Prasad Ghimiray Reviewed-by: Thomas Hellström Signed-off-by: Thomas Hellström Link: https://patchwork.freedesktop.org/patch/msgid/20240119041826.1670496-1-himal.prasad.ghimiray@intel.com --- drivers/gpu/drm/xe/xe_migrate.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c index d5392cbbdb4910..7abf15546ced00 100644 --- a/drivers/gpu/drm/xe/xe_migrate.c +++ b/drivers/gpu/drm/xe/xe_migrate.c @@ -481,7 +481,7 @@ static void emit_pte(struct xe_migrate *m, /* Indirect access needs compression enabled uncached PAT index */ if (GRAPHICS_VERx100(xe) >= 2000) pat_index = is_comp_pte ? 
xe->pat.idx[XE_CACHE_NONE_COMPRESSION] : - xe->pat.idx[XE_CACHE_NONE]; + xe->pat.idx[XE_CACHE_WB]; else pat_index = xe->pat.idx[XE_CACHE_WB]; @@ -769,14 +769,14 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m, if (src_is_vram && xe_migrate_allow_identity(src_L0, &src_it)) xe_res_next(&src_it, src_L0); else - emit_pte(m, bb, src_L0_pt, src_is_vram, true, &src_it, src_L0, - src); + emit_pte(m, bb, src_L0_pt, src_is_vram, copy_system_ccs, + &src_it, src_L0, src); if (dst_is_vram && xe_migrate_allow_identity(src_L0, &dst_it)) xe_res_next(&dst_it, src_L0); else - emit_pte(m, bb, dst_L0_pt, dst_is_vram, true, &dst_it, src_L0, - dst); + emit_pte(m, bb, dst_L0_pt, dst_is_vram, copy_system_ccs, + &dst_it, src_L0, dst); if (copy_system_ccs) emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src); @@ -1018,8 +1018,8 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m, if (clear_vram && xe_migrate_allow_identity(clear_L0, &src_it)) xe_res_next(&src_it, clear_L0); else - emit_pte(m, bb, clear_L0_pt, clear_vram, true, &src_it, clear_L0, - dst); + emit_pte(m, bb, clear_L0_pt, clear_vram, clear_system_ccs, + &src_it, clear_L0, dst); bb->cs[bb->len++] = MI_BATCH_BUFFER_END; update_idx = bb->len; From f6bf0424cadc27d7cf6a049d2db960e4b52fa513 Mon Sep 17 00:00:00 2001 From: Moti Haimovski Date: Mon, 22 Jan 2024 12:24:24 +0200 Subject: [PATCH 109/200] drm/xe/vm: bugfix in xe_vm_create_ioctl Fix xe_vm_create_ioctl routine not freeing the vm-id allocated to it when the function fails. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Moti Haimovski Reviewed-by: Matthew Brost Reviewed-by: Tomer Tayar Signed-off-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20240122102424.4008095-1-mhaimovski@habana.ai --- drivers/gpu/drm/xe/xe_vm.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 7ea0f151d223c1..dc609e3cc0945d 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -1855,10 +1855,8 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data, mutex_lock(&xef->vm.lock); err = xa_alloc(&xef->vm.xa, &id, vm, xa_limit_32b, GFP_KERNEL); mutex_unlock(&xef->vm.lock); - if (err) { - xe_vm_close_and_put(vm); - return err; - } + if (err) + goto err_close_and_put; if (xe->info.has_asid) { mutex_lock(&xe->usm.lock); @@ -1866,11 +1864,9 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data, XA_LIMIT(1, XE_MAX_ASID - 1), &xe->usm.next_asid, GFP_KERNEL); mutex_unlock(&xe->usm.lock); - if (err < 0) { - xe_vm_close_and_put(vm); - return err; - } - err = 0; + if (err < 0) + goto err_free_id; + vm->usm.asid = asid; } @@ -1888,6 +1884,15 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data, #endif return 0; + +err_free_id: + mutex_lock(&xef->vm.lock); + xa_erase(&xef->vm.xa, id); + mutex_unlock(&xef->vm.lock); +err_close_and_put: + xe_vm_close_and_put(vm); + + return err; } int xe_vm_destroy_ioctl(struct drm_device *dev, void *data, From c885886bda2a2b345688f72f283c9c6655d73eae Mon Sep 17 00:00:00 2001 From: Sujaritha Sundaresan Date: Wed, 17 Jan 2024 10:02:15 +0530 Subject: [PATCH 110/200] drm/xe: Fix typo in vram frequency sysfs documentation Fix function naming and description for xe_vram_freq_sysfs_init function. 
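The comment style matters because kernel-doc only parses blocks that open with the "/**" marker and whose first line names the actual function; anything else is silently skipped or warned about. A minimal before/after illustration:

	/* before: plain comment with a stale name, ignored by kernel-doc */
	/*
	 * xe_vram_freq_init - Initialize vram frequency component
	 */

	/* after: kernel-doc marker plus the real name, parsed and validated */
	/**
	 * xe_vram_freq_sysfs_init - Initialize vram frequency sysfs component
	 * @tile: Xe Tile object
	 */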
v2: Add fixes tag (Riana) Fix review comments (Lucas) Fixes: 4ae3aeab32d7 ("drm/xe: Add vram frequency sysfs attributes") Signed-off-by: Sujaritha Sundaresan Reviewed-by: Riana Tauro Signed-off-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240117043215.2598677-1-sujaritha.sundaresan@intel.com --- drivers/gpu/drm/xe/xe_vram_freq.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_vram_freq.c b/drivers/gpu/drm/xe/xe_vram_freq.c index 7331462933070b..079cc283a18667 100644 --- a/drivers/gpu/drm/xe/xe_vram_freq.c +++ b/drivers/gpu/drm/xe/xe_vram_freq.c @@ -95,13 +95,12 @@ static void vram_freq_sysfs_fini(struct drm_device *drm, void *arg) kobject_put(kobj); } -/* - * xe_vram_freq_init - Initialize vram frequency component +/** + * xe_vram_freq_sysfs_init - Initialize vram frequency sysfs component * @tile: Xe Tile object * * It needs to be initialized after the main tile component is ready */ - void xe_vram_freq_sysfs_init(struct xe_tile *tile) { struct xe_device *xe = tile_to_xe(tile); From 02c4e64a860a05ca3ffe4d416c1ae9003d3453ea Mon Sep 17 00:00:00 2001 From: Shekhar Chauhan Date: Tue, 23 Jan 2024 10:35:52 +0530 Subject: [PATCH 111/200] drm/xe/xe2_lpg: Introduce performance guide changes Add performance guide changes to Xe2_LPG. BSpec: 72161 Signed-off-by: Shekhar Chauhan Reviewed-by: Matt Roper Signed-off-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20240123050552.2250699-2-shekhar.chauhan@intel.com --- drivers/gpu/drm/xe/regs/xe_gt_regs.h | 6 ++++++ drivers/gpu/drm/xe/xe_tuning.c | 9 ++++++++- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h index 0d4bfc35ff37ef..cd27480f648626 100644 --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h @@ -144,6 +144,9 @@ #define GSCPSMI_BASE XE_REG(0x880c) +#define CCCHKNREG1 XE_REG_MCR(0x8828) +#define ENCOMPPERFFIX REG_BIT(18) + /* Fuse readout registers for GT */ #define XEHP_FUSE4 XE_REG(0x9114) #define CFEG_WMTP_DISABLE REG_BIT(20) @@ -289,6 +292,9 @@ #define XEHP_L3NODEARBCFG XE_REG_MCR(0xb0b4) #define XEHP_LNESPARE REG_BIT(19) +#define L3SQCREG3 XE_REG_MCR(0xb108) +#define COMPPWOVERFETCHEN REG_BIT(28) + #define XEHP_L3SQCREG5 XE_REG_MCR(0xb158) #define L3_PWM_TIMER_INIT_VAL_MASK REG_GENMASK(9, 0) diff --git a/drivers/gpu/drm/xe/xe_tuning.c b/drivers/gpu/drm/xe/xe_tuning.c index 53ccd338fd8c0c..5c83c75bc4978d 100644 --- a/drivers/gpu/drm/xe/xe_tuning.c +++ b/drivers/gpu/drm/xe/xe_tuning.c @@ -37,7 +37,14 @@ static const struct xe_rtp_entry_sr gt_tunings[] = { XE_RTP_ACTIONS(FIELD_SET(XE2LPM_L3SQCREG5, L3_PWM_TIMER_INIT_VAL_MASK, REG_FIELD_PREP(L3_PWM_TIMER_INIT_VAL_MASK, 0x7f))) }, - + { XE_RTP_NAME("Tuning: Compression Overfetch"), + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2004, XE_RTP_END_VERSION_UNDEFINED)), + XE_RTP_ACTIONS(CLR(CCCHKNREG1, ENCOMPPERFFIX)), + }, + { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch in L3"), + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2004, XE_RTP_END_VERSION_UNDEFINED)), + XE_RTP_ACTIONS(SET(L3SQCREG3, COMPPWOVERFETCHEN)) + }, {} }; From 6240c2c43fd062515dda68e60866b4851f32d631 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= Date: Tue, 23 Jan 2024 16:31:47 +0100 Subject: [PATCH 112/200] drm/xe: Document nested struct members according to guidelines MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Document nested struct members with full 
names as described in Documentation/doc-guide/kernel-doc.rst. For this documentation we allow a column width of 100 to make it more readable. This fixes warnings similar to: drivers/gpu/drm/xe/xe_lrc_types.h:45: warning: Excess struct member 'size' description in 'xe_lrc' v2: - Only change the documentation, not the member. v3: - Fix the commit message wording. Cc: Lucas De Marchi Cc: Matthew Brost Signed-off-by: Thomas Hellström Reviewed-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240123153147.27305-1-thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/xe/xe_device_types.h | 159 ++++++++++++----------- drivers/gpu/drm/xe/xe_exec_queue_types.h | 36 ++--- drivers/gpu/drm/xe/xe_gsc_types.h | 16 +-- drivers/gpu/drm/xe/xe_gt_types.h | 99 +++++++------- drivers/gpu/drm/xe/xe_guc_ct_types.h | 4 +- drivers/gpu/drm/xe/xe_guc_submit_types.h | 18 +-- drivers/gpu/drm/xe/xe_guc_types.h | 28 ++-- drivers/gpu/drm/xe/xe_hw_engine_types.h | 70 +++++----- drivers/gpu/drm/xe/xe_lrc_types.h | 6 +- drivers/gpu/drm/xe/xe_sched_job_types.h | 6 +- drivers/gpu/drm/xe/xe_uc_fw_types.h | 9 +- drivers/gpu/drm/xe/xe_wopcm_types.h | 4 +- 12 files changed, 232 insertions(+), 223 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h index 7eda86bd4c2a43..eb2b806a1d2337 100644 --- a/drivers/gpu/drm/xe/xe_device_types.h +++ b/drivers/gpu/drm/xe/xe_device_types.h @@ -143,10 +143,10 @@ struct xe_tile { * * 8MB-16MB: global GTT */ struct { - /** @size: size of tile's MMIO space */ + /** @mmio.size: size of tile's MMIO space */ size_t size; - /** @regs: pointer to tile's MMIO space (starting with registers) */ + /** @mmio.regs: pointer to tile's MMIO space (starting with registers) */ void __iomem *regs; } mmio; @@ -156,31 +156,31 @@ struct xe_tile { * Each tile has its own additional 256MB (28-bit) MMIO-extension space. */ struct { - /** @size: size of tile's additional MMIO-extension space */ + /** @mmio_ext.size: size of tile's additional MMIO-extension space */ size_t size; - /** @regs: pointer to tile's additional MMIO-extension space */ + /** @mmio_ext.regs: pointer to tile's additional MMIO-extension space */ void __iomem *regs; } mmio_ext; /** @mem: memory management info for tile */ struct { /** - * @vram: VRAM info for tile. + * @mem.vram: VRAM info for tile. * * Although VRAM is associated with a specific tile, it can * still be accessed by all tiles' GTs. */ struct xe_mem_region vram; - /** @vram_mgr: VRAM TTM manager */ + /** @mem.vram_mgr: VRAM TTM manager */ struct xe_ttm_vram_mgr *vram_mgr; - /** @ggtt: Global graphics translation table */ + /** @mem.ggtt: Global graphics translation table */ struct xe_ggtt *ggtt; /** - * @kernel_bb_pool: Pool from which batchbuffers are allocated. + * @mem.kernel_bb_pool: Pool from which batchbuffers are allocated. * * Media GT shares a pool with its primary GT. 
*/ @@ -218,68 +218,68 @@ struct xe_device { /** @info: device info */ struct intel_device_info { - /** @graphics_name: graphics IP name */ + /** @info.graphics_name: graphics IP name */ const char *graphics_name; - /** @media_name: media IP name */ + /** @info.media_name: media IP name */ const char *media_name; - /** @tile_mmio_ext_size: size of MMIO extension space, per-tile */ + /** @info.tile_mmio_ext_size: size of MMIO extension space, per-tile */ u32 tile_mmio_ext_size; - /** @graphics_verx100: graphics IP version */ + /** @info.graphics_verx100: graphics IP version */ u32 graphics_verx100; - /** @media_verx100: media IP version */ + /** @info.media_verx100: media IP version */ u32 media_verx100; - /** @mem_region_mask: mask of valid memory regions */ + /** @info.mem_region_mask: mask of valid memory regions */ u32 mem_region_mask; - /** @platform: XE platform enum */ + /** @info.platform: XE platform enum */ enum xe_platform platform; - /** @subplatform: XE subplatform enum */ + /** @info.subplatform: XE subplatform enum */ enum xe_subplatform subplatform; - /** @devid: device ID */ + /** @info.devid: device ID */ u16 devid; - /** @revid: device revision */ + /** @info.revid: device revision */ u8 revid; - /** @step: stepping information for each IP */ + /** @info.step: stepping information for each IP */ struct xe_step_info step; - /** @dma_mask_size: DMA address bits */ + /** @info.dma_mask_size: DMA address bits */ u8 dma_mask_size; - /** @vram_flags: Vram flags */ + /** @info.vram_flags: Vram flags */ u8 vram_flags; - /** @tile_count: Number of tiles */ + /** @info.tile_count: Number of tiles */ u8 tile_count; - /** @gt_count: Total number of GTs for entire device */ + /** @info.gt_count: Total number of GTs for entire device */ u8 gt_count; - /** @vm_max_level: Max VM level */ + /** @info.vm_max_level: Max VM level */ u8 vm_max_level; - /** @va_bits: Maximum bits of a virtual address */ + /** @info.va_bits: Maximum bits of a virtual address */ u8 va_bits; - /** @is_dgfx: is discrete device */ + /** @info.is_dgfx: is discrete device */ u8 is_dgfx:1; - /** @has_asid: Has address space ID */ + /** @info.has_asid: Has address space ID */ u8 has_asid:1; - /** @force_execlist: Forced execlist submission */ + /** @info.force_execlist: Forced execlist submission */ u8 force_execlist:1; - /** @has_flat_ccs: Whether flat CCS metadata is used */ + /** @info.has_flat_ccs: Whether flat CCS metadata is used */ u8 has_flat_ccs:1; - /** @has_llc: Device has a shared CPU+GPU last level cache */ + /** @info.has_llc: Device has a shared CPU+GPU last level cache */ u8 has_llc:1; - /** @has_mmio_ext: Device has extra MMIO address range */ + /** @info.has_mmio_ext: Device has extra MMIO address range */ u8 has_mmio_ext:1; - /** @has_range_tlb_invalidation: Has range based TLB invalidations */ + /** @info.has_range_tlb_invalidation: Has range based TLB invalidations */ u8 has_range_tlb_invalidation:1; - /** @has_sriov: Supports SR-IOV */ + /** @info.has_sriov: Supports SR-IOV */ u8 has_sriov:1; - /** @has_usm: Device has unified shared memory support */ + /** @info.has_usm: Device has unified shared memory support */ u8 has_usm:1; - /** @enable_display: display enabled */ + /** @info.enable_display: display enabled */ u8 enable_display:1; - /** @skip_mtcfg: skip Multi-Tile configuration from MTCFG register */ + /** @info.skip_mtcfg: skip Multi-Tile configuration from MTCFG register */ u8 skip_mtcfg:1; - /** @skip_pcode: skip access to PCODE uC */ + /** @info.skip_pcode: skip access to PCODE uC */ u8 
skip_pcode:1; - /** @has_heci_gscfi: device has heci gscfi */ + /** @info.has_heci_gscfi: device has heci gscfi */ u8 has_heci_gscfi:1; - /** @skip_guc_pc: Skip GuC based PM feature init */ + /** @info.skip_guc_pc: Skip GuC based PM feature init */ u8 skip_guc_pc:1; #if IS_ENABLED(CONFIG_DRM_XE_DISPLAY) @@ -291,10 +291,10 @@ struct xe_device { /** @irq: device interrupt state */ struct { - /** @lock: lock for processing irq's on this device */ + /** @irq.lock: lock for processing irq's on this device */ spinlock_t lock; - /** @enabled: interrupts enabled on this device */ + /** @irq.enabled: interrupts enabled on this device */ bool enabled; } irq; @@ -303,17 +303,17 @@ struct xe_device { /** @mmio: mmio info for device */ struct { - /** @size: size of MMIO space for device */ + /** @mmio.size: size of MMIO space for device */ size_t size; - /** @regs: pointer to MMIO space for device */ + /** @mmio.regs: pointer to MMIO space for device */ void __iomem *regs; } mmio; /** @mem: memory info for device */ struct { - /** @vram: VRAM info for device */ + /** @mem.vram: VRAM info for device */ struct xe_mem_region vram; - /** @sys_mgr: system TTM manager */ + /** @mem.sys_mgr: system TTM manager */ struct ttm_resource_manager sys_mgr; } mem; @@ -327,44 +327,44 @@ struct xe_device { /** @clients: drm clients info */ struct { - /** @lock: Protects drm clients info */ + /** @clients.lock: Protects drm clients info */ spinlock_t lock; - /** @count: number of drm clients */ + /** @clients.count: number of drm clients */ u64 count; } clients; /** @usm: unified memory state */ struct { - /** @asid: convert a ASID to VM */ + /** @usm.asid: convert a ASID to VM */ struct xarray asid_to_vm; - /** @next_asid: next ASID, used to cyclical alloc asids */ + /** @usm.next_asid: next ASID, used to cyclical alloc asids */ u32 next_asid; - /** @num_vm_in_fault_mode: number of VM in fault mode */ + /** @usm.num_vm_in_fault_mode: number of VM in fault mode */ u32 num_vm_in_fault_mode; - /** @num_vm_in_non_fault_mode: number of VM in non-fault mode */ + /** @usm.num_vm_in_non_fault_mode: number of VM in non-fault mode */ u32 num_vm_in_non_fault_mode; - /** @lock: protects UM state */ + /** @usm.lock: protects UM state */ struct mutex lock; } usm; /** @persistent_engines: engines that are closed but still running */ struct { - /** @lock: protects persistent engines */ + /** @persistent_engines.lock: protects persistent engines */ struct mutex lock; - /** @list: list of persistent engines */ + /** @persistent_engines.list: list of persistent engines */ struct list_head list; } persistent_engines; /** @pinned: pinned BO state */ struct { - /** @lock: protected pinned BO list state */ + /** @pinned.lock: protected pinned BO list state */ spinlock_t lock; - /** @evicted: pinned kernel BO that are present */ + /** @pinned.kernel_bo_present: pinned kernel BO that are present */ struct list_head kernel_bo_present; - /** @evicted: pinned BO that have been evicted */ + /** @pinned.evicted: pinned BO that have been evicted */ struct list_head evicted; - /** @external_vram: pinned external BO in vram*/ + /** @pinned.external_vram: pinned external BO in vram*/ struct list_head external_vram; } pinned; @@ -385,21 +385,26 @@ struct xe_device { * triggering additional actions when they occur. 
*/ struct { - /** @ref: ref count of memory accesses */ + /** @mem_access.ref: ref count of memory accesses */ atomic_t ref; - /** @vram_userfault: Encapsulate vram_userfault related stuff */ + /** + * @mem_access.vram_userfault: Encapsulate vram_userfault + * related stuff + */ struct { /** - * @lock: Protects access to @vram_usefault.list - * Using mutex instead of spinlock as lock is applied to entire - * list operation which may sleep + * @mem_access.vram_userfault.lock: Protects access to + * @vram_usefault.list Using mutex instead of spinlock + * as lock is applied to entire list operation which + * may sleep */ struct mutex lock; /** - * @list: Keep list of userfaulted vram bo, which require to release their - * mmap mappings at runtime suspend path + * @mem_access.vram_userfault.list: Keep list of userfaulted + * vram bo, which require to release their mmap mappings + * at runtime suspend path */ struct list_head list; } vram_userfault; @@ -409,28 +414,28 @@ struct xe_device { * @pat: Encapsulate PAT related stuff */ struct { - /** Internal operations to abstract platforms */ + /** @pat.ops: Internal operations to abstract platforms */ const struct xe_pat_ops *ops; - /** PAT table to program in the HW */ + /** @pat.table: PAT table to program in the HW */ const struct xe_pat_table_entry *table; - /** Number of PAT entries */ + /** @pat.n_entries: Number of PAT entries */ int n_entries; u32 idx[__XE_CACHE_LEVEL_COUNT]; } pat; /** @d3cold: Encapsulate d3cold related stuff */ struct { - /** capable: Indicates if root port is d3cold capable */ + /** @d3cold.capable: Indicates if root port is d3cold capable */ bool capable; - /** @allowed: Indicates if d3cold is a valid device state */ + /** @d3cold.allowed: Indicates if d3cold is a valid device state */ bool allowed; - /** @power_lost: Indicates if card has really lost power. */ + /** @d3cold.power_lost: Indicates if card has really lost power. */ bool power_lost; /** - * @vram_threshold: + * @d3cold.vram_threshold: * * This represents the permissible threshold(in megabytes) * for vram save/restore. d3cold will be disallowed, @@ -439,7 +444,7 @@ struct xe_device { * Default threshold value is 300mb. 
*/ u32 vram_threshold; - /** @lock: protect vram_threshold */ + /** @d3cold.lock: protect vram_threshold */ struct mutex lock; } d3cold; @@ -547,17 +552,17 @@ struct xe_file { /** @vm: VM state for file */ struct { - /** @xe: xarray to store VMs */ + /** @vm.xe: xarray to store VMs */ struct xarray xa; - /** @lock: protects file VM state */ + /** @vm.lock: protects file VM state */ struct mutex lock; } vm; /** @exec_queue: Submission exec queue state for file */ struct { - /** @xe: xarray to store engines */ + /** @exec_queue.xe: xarray to store engines */ struct xarray xa; - /** @lock: protects file engine state */ + /** @exec_queue.lock: protects file engine state */ struct mutex lock; } exec_queue; diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h index e7f84dee5275ff..648391961fc45e 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h @@ -109,9 +109,9 @@ struct xe_exec_queue { * @persistent: persistent exec queue state */ struct { - /** @xef: file which this exec queue belongs to */ + /** @persistent.xef: file which this exec queue belongs to */ struct xe_file *xef; - /** @link: link in list of persistent exec queues */ + /** @persisiten.link: link in list of persistent exec queues */ struct list_head link; } persistent; @@ -120,55 +120,55 @@ struct xe_exec_queue { * @parallel: parallel submission state */ struct { - /** @composite_fence_ctx: context composite fence */ + /** @parallel.composite_fence_ctx: context composite fence */ u64 composite_fence_ctx; - /** @composite_fence_seqno: seqno for composite fence */ + /** @parallel.composite_fence_seqno: seqno for composite fence */ u32 composite_fence_seqno; } parallel; /** * @bind: bind submission state */ struct { - /** @fence_ctx: context bind fence */ + /** @bind.fence_ctx: context bind fence */ u64 fence_ctx; - /** @fence_seqno: seqno for bind fence */ + /** @bind.fence_seqno: seqno for bind fence */ u32 fence_seqno; } bind; }; /** @sched_props: scheduling properties */ struct { - /** @timeslice_us: timeslice period in micro-seconds */ + /** @sched_props.timeslice_us: timeslice period in micro-seconds */ u32 timeslice_us; - /** @preempt_timeout_us: preemption timeout in micro-seconds */ + /** @sched_props.preempt_timeout_us: preemption timeout in micro-seconds */ u32 preempt_timeout_us; - /** @job_timeout_ms: job timeout in milliseconds */ + /** @sched_props.job_timeout_ms: job timeout in milliseconds */ u32 job_timeout_ms; - /** @priority: priority of this exec queue */ + /** @sched_props.priority: priority of this exec queue */ enum xe_exec_queue_priority priority; } sched_props; /** @compute: compute exec queue state */ struct { - /** @pfence: preemption fence */ + /** @compute.pfence: preemption fence */ struct dma_fence *pfence; - /** @context: preemption fence context */ + /** @compute.context: preemption fence context */ u64 context; - /** @seqno: preemption fence seqno */ + /** @compute.seqno: preemption fence seqno */ u32 seqno; - /** @link: link into VM's list of exec queues */ + /** @compute.link: link into VM's list of exec queues */ struct list_head link; - /** @lock: preemption fences lock */ + /** @compute.lock: preemption fences lock */ spinlock_t lock; } compute; /** @usm: unified shared memory state */ struct { - /** @acc_trigger: access counter trigger */ + /** @usm.acc_trigger: access counter trigger */ u32 acc_trigger; - /** @acc_notify: access counter notify */ + /** @usm.acc_notify: access counter notify */ u32 
acc_notify; - /** @acc_granularity: access counter granularity */ + /** @usm.acc_granularity: access counter granularity */ u32 acc_granularity; } usm; diff --git a/drivers/gpu/drm/xe/xe_gsc_types.h b/drivers/gpu/drm/xe/xe_gsc_types.h index 060d0fe848ad7f..138d8cc0f19c22 100644 --- a/drivers/gpu/drm/xe/xe_gsc_types.h +++ b/drivers/gpu/drm/xe/xe_gsc_types.h @@ -50,21 +50,21 @@ struct xe_gsc { /** @proxy: sub-structure containing the SW proxy-related variables */ struct { - /** @component: struct for communication with mei component */ + /** @proxy.component: struct for communication with mei component */ struct i915_gsc_proxy_component *component; - /** @mutex: protects the component binding and usage */ + /** @proxy.mutex: protects the component binding and usage */ struct mutex mutex; - /** @component_added: whether the component has been added */ + /** @proxy.component_added: whether the component has been added */ bool component_added; - /** @bo: object to store message to and from the GSC */ + /** @proxy.bo: object to store message to and from the GSC */ struct xe_bo *bo; - /** @to_gsc: map of the memory used to send messages to the GSC */ + /** @proxy.to_gsc: map of the memory used to send messages to the GSC */ struct iosys_map to_gsc; - /** @from_gsc: map of the memory used to recv messages from the GSC */ + /** @proxy.from_gsc: map of the memory used to recv messages from the GSC */ struct iosys_map from_gsc; - /** @to_csme: pointer to the memory used to send messages to CSME */ + /** @proxy.to_csme: pointer to the memory used to send messages to CSME */ void *to_csme; - /** @from_csme: pointer to the memory used to recv messages from CSME */ + /** @proxy.from_csme: pointer to the memory used to recv messages from CSME */ void *from_csme; } proxy; }; diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h index 047cde6cda107e..3caaea2ff908a8 100644 --- a/drivers/gpu/drm/xe/xe_gt_types.h +++ b/drivers/gpu/drm/xe/xe_gt_types.h @@ -103,16 +103,16 @@ struct xe_gt { /** @info: GT info */ struct { - /** @type: type of GT */ + /** @info.type: type of GT */ enum xe_gt_type type; - /** @id: Unique ID of this GT within the PCI Device */ + /** @info.id: Unique ID of this GT within the PCI Device */ u8 id; - /** @reference_clock: clock frequency */ + /** @info.reference_clock: clock frequency */ u32 reference_clock; - /** @engine_mask: mask of engines present on GT */ + /** @info.engine_mask: mask of engines present on GT */ u64 engine_mask; /** - * @__engine_mask: mask of engines present on GT read from + * @info.__engine_mask: mask of engines present on GT read from * xe_pci.c, used to fake reading the engine_mask from the * hwconfig blob. */ @@ -125,14 +125,14 @@ struct xe_gt { * specific offset, as well as their own forcewake handling. 
*/ struct { - /** @fw: force wake for GT */ + /** @mmio.fw: force wake for GT */ struct xe_force_wake fw; /** - * @adj_limit: adjust MMIO address if address is below this + * @mmio.adj_limit: adjust MMIO address if address is below this * value */ u32 adj_limit; - /** @adj_offset: offect to add to MMIO address when adjusting */ + /** @mmio.adj_offset: offect to add to MMIO address when adjusting */ u32 adj_offset; } mmio; @@ -144,7 +144,7 @@ struct xe_gt { /** @reset: state for GT resets */ struct { /** - * @worker: work so GT resets can done async allowing to reset + * @reset.worker: work so GT resets can done async allowing to reset * code to safely flush all code paths */ struct work_struct worker; @@ -152,36 +152,37 @@ struct xe_gt { /** @tlb_invalidation: TLB invalidation state */ struct { - /** @seqno: TLB invalidation seqno, protected by CT lock */ + /** @tlb_invalidation.seqno: TLB invalidation seqno, protected by CT lock */ #define TLB_INVALIDATION_SEQNO_MAX 0x100000 int seqno; /** - * @seqno_recv: last received TLB invalidation seqno, protected by CT lock + * @tlb_invalidation.seqno_recv: last received TLB invalidation seqno, + * protected by CT lock */ int seqno_recv; /** - * @pending_fences: list of pending fences waiting TLB + * @tlb_invalidation.pending_fences: list of pending fences waiting TLB * invaliations, protected by CT lock */ struct list_head pending_fences; /** - * @pending_lock: protects @pending_fences and updating - * @seqno_recv. + * @tlb_invalidation.pending_lock: protects @tlb_invalidation.pending_fences + * and updating @tlb_invalidation.seqno_recv. */ spinlock_t pending_lock; /** - * @fence_tdr: schedules a delayed call to + * @tlb_invalidation.fence_tdr: schedules a delayed call to * xe_gt_tlb_fence_timeout after the timeut interval is over. */ struct delayed_work fence_tdr; - /** @fence_context: context for TLB invalidation fences */ + /** @tlb_invalidation.fence_context: context for TLB invalidation fences */ u64 fence_context; /** - * @fence_seqno: seqno to TLB invalidation fences, protected by + * @tlb_invalidation.fence_seqno: seqno to TLB invalidation fences, protected by * tlb_invalidation.lock */ u32 fence_seqno; - /** @lock: protects TLB invalidation fences */ + /** @tlb_invalidation.lock: protects TLB invalidation fences */ spinlock_t lock; } tlb_invalidation; @@ -196,7 +197,7 @@ struct xe_gt { /** @usm: unified shared memory state */ struct { /** - * @bb_pool: Pool from which batchbuffers, for USM operations + * @usm.bb_pool: Pool from which batchbuffers, for USM operations * (e.g. migrations, fixing page tables), are allocated. * Dedicated pool needed so USM operations to not get blocked * behind any user operations which may have resulted in a @@ -204,67 +205,67 @@ struct xe_gt { */ struct xe_sa_manager *bb_pool; /** - * @reserved_bcs_instance: reserved BCS instance used for USM + * @usm.reserved_bcs_instance: reserved BCS instance used for USM * operations (e.g. mmigrations, fixing page tables) */ u16 reserved_bcs_instance; - /** @pf_wq: page fault work queue, unbound, high priority */ + /** @usm.pf_wq: page fault work queue, unbound, high priority */ struct workqueue_struct *pf_wq; - /** @acc_wq: access counter work queue, unbound, high priority */ + /** @usm.acc_wq: access counter work queue, unbound, high priority */ struct workqueue_struct *acc_wq; /** - * @pf_queue: Page fault queue used to sync faults so faults can + * @usm.pf_queue: Page fault queue used to sync faults so faults can * be processed not under the GuC CT lock. 
The queue is sized so * it can sync all possible faults (1 per physical engine). * Multiple queues exists for page faults from different VMs are * be processed in parallel. */ struct pf_queue { - /** @gt: back pointer to GT */ + /** @usm.pf_queue.gt: back pointer to GT */ struct xe_gt *gt; #define PF_QUEUE_NUM_DW 128 - /** @data: data in the page fault queue */ + /** @usm.pf_queue.data: data in the page fault queue */ u32 data[PF_QUEUE_NUM_DW]; /** - * @tail: tail pointer in DWs for page fault queue, + * @usm.pf_queue.tail: tail pointer in DWs for page fault queue, * moved by worker which processes faults (consumer). */ u16 tail; /** - * @head: head pointer in DWs for page fault queue, + * @usm.pf_queue.head: head pointer in DWs for page fault queue, * moved by G2H handler (producer). */ u16 head; - /** @lock: protects page fault queue */ + /** @usm.pf_queue.lock: protects page fault queue */ spinlock_t lock; - /** @worker: to process page faults */ + /** @usm.pf_queue.worker: to process page faults */ struct work_struct worker; #define NUM_PF_QUEUE 4 } pf_queue[NUM_PF_QUEUE]; /** - * @acc_queue: Same as page fault queue, cannot process access + * @usm.acc_queue: Same as page fault queue, cannot process access * counters under CT lock. */ struct acc_queue { - /** @gt: back pointer to GT */ + /** @usm.acc_queue.gt: back pointer to GT */ struct xe_gt *gt; #define ACC_QUEUE_NUM_DW 128 - /** @data: data in the page fault queue */ + /** @usm.acc_queue.data: data in the page fault queue */ u32 data[ACC_QUEUE_NUM_DW]; /** - * @tail: tail pointer in DWs for access counter queue, + * @usm.acc_queue.tail: tail pointer in DWs for access counter queue, * moved by worker which processes counters * (consumer). */ u16 tail; /** - * @head: head pointer in DWs for access counter queue, + * @usm.acc_queue.head: head pointer in DWs for access counter queue, * moved by G2H handler (producer). 
*/ u16 head; - /** @lock: protects page fault queue */ + /** @usm.acc_queue.lock: protects page fault queue */ spinlock_t lock; - /** @worker: to process access counters */ + /** @usm.acc_queue.worker: to process access counters */ struct work_struct worker; #define NUM_ACC_QUEUE 4 } acc_queue[NUM_ACC_QUEUE]; @@ -301,7 +302,7 @@ struct xe_gt { /** @pcode: GT's PCODE */ struct { - /** @lock: protecting GT's PCODE mailbox data */ + /** @pcode.lock: protecting GT's PCODE mailbox data */ struct mutex lock; } pcode; @@ -313,32 +314,32 @@ struct xe_gt { /** @mocs: info */ struct { - /** @uc_index: UC index */ + /** @mocs.uc_index: UC index */ u8 uc_index; - /** @wb_index: WB index, only used on L3_CCS platforms */ + /** @mocs.wb_index: WB index, only used on L3_CCS platforms */ u8 wb_index; } mocs; /** @fuse_topo: GT topology reported by fuse registers */ struct { - /** @g_dss_mask: dual-subslices usable by geometry */ + /** @fuse_topo.g_dss_mask: dual-subslices usable by geometry */ xe_dss_mask_t g_dss_mask; - /** @c_dss_mask: dual-subslices usable by compute */ + /** @fuse_topo.c_dss_mask: dual-subslices usable by compute */ xe_dss_mask_t c_dss_mask; - /** @eu_mask_per_dss: EU mask per DSS*/ + /** @fuse_topo.eu_mask_per_dss: EU mask per DSS*/ xe_eu_mask_t eu_mask_per_dss; } fuse_topo; /** @steering: register steering for individual HW units */ struct { - /* @ranges: register ranges used for this steering type */ + /** @steering.ranges: register ranges used for this steering type */ const struct xe_mmio_range *ranges; - /** @group_target: target to steer accesses to */ + /** @steering.group_target: target to steer accesses to */ u16 group_target; - /** @instance_target: instance to steer accesses to */ + /** @steering.instance_target: instance to steer accesses to */ u16 instance_target; } steering[NUM_STEERING_TYPES]; @@ -350,13 +351,13 @@ struct xe_gt { /** @wa_active: keep track of active workarounds */ struct { - /** @gt: bitmap with active GT workarounds */ + /** @wa_active.gt: bitmap with active GT workarounds */ unsigned long *gt; - /** @engine: bitmap with active engine workarounds */ + /** @wa_active.engine: bitmap with active engine workarounds */ unsigned long *engine; - /** @lrc: bitmap with active LRC workarounds */ + /** @wa_active.lrc: bitmap with active LRC workarounds */ unsigned long *lrc; - /** @oob: bitmap with active OOB workaroudns */ + /** @wa_active.oob: bitmap with active OOB workaroudns */ unsigned long *oob; } wa_active; }; diff --git a/drivers/gpu/drm/xe/xe_guc_ct_types.h b/drivers/gpu/drm/xe/xe_guc_ct_types.h index d814d4ee3fc653..68c871052341fa 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct_types.h +++ b/drivers/gpu/drm/xe/xe_guc_ct_types.h @@ -87,9 +87,9 @@ struct xe_guc_ct { spinlock_t fast_lock; /** @ctbs: buffers for sending and receiving commands */ struct { - /** @send: Host to GuC (H2G, send) channel */ + /** @ctbs.send: Host to GuC (H2G, send) channel */ struct guc_ctb h2g; - /** @recv: GuC to Host (G2H, receive) channel */ + /** @ctbs.recv: GuC to Host (G2H, receive) channel */ struct guc_ctb g2h; } ctbs; /** @g2h_outstanding: number of outstanding G2H */ diff --git a/drivers/gpu/drm/xe/xe_guc_submit_types.h b/drivers/gpu/drm/xe/xe_guc_submit_types.h index 649b0a85269219..72fc0f42b0a5df 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit_types.h +++ b/drivers/gpu/drm/xe/xe_guc_submit_types.h @@ -102,9 +102,9 @@ struct xe_guc_submit_exec_queue_snapshot { /** @sched_props: scheduling properties */ struct { - /** @timeslice_us: timeslice period in micro-seconds */ + /** 
@sched_props.timeslice_us: timeslice period in micro-seconds */ u32 timeslice_us; - /** @preempt_timeout_us: preemption timeout in micro-seconds */ + /** @sched_props.preempt_timeout_us: preemption timeout in micro-seconds */ u32 preempt_timeout_us; } sched_props; @@ -118,11 +118,11 @@ struct xe_guc_submit_exec_queue_snapshot { /** @guc: GuC Engine Snapshot */ struct { - /** @wqi_head: work queue item head */ + /** @guc.wqi_head: work queue item head */ u32 wqi_head; - /** @wqi_tail: work queue item tail */ + /** @guc.wqi_tail: work queue item tail */ u32 wqi_tail; - /** @id: GuC id for this exec_queue */ + /** @guc.id: GuC id for this exec_queue */ u16 id; } guc; @@ -133,13 +133,13 @@ struct xe_guc_submit_exec_queue_snapshot { bool parallel_execution; /** @parallel: snapshot of the useful parallel scratch */ struct { - /** @wq_desc: Workqueue description */ + /** @parallel.wq_desc: Workqueue description */ struct { - /** @head: Workqueue Head */ + /** @parallel.wq_desc.head: Workqueue Head */ u32 head; - /** @tail: Workqueue Tail */ + /** @parallel.wq_desc.tail: Workqueue Tail */ u32 tail; - /** @status: Workqueue Status */ + /** @parallel.wq_desc.status: Workqueue Status */ u32 status; } wq_desc; /** @wq: Workqueue Items */ diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h index dc6059de669c3f..edcd1a950bd3b6 100644 --- a/drivers/gpu/drm/xe/xe_guc_types.h +++ b/drivers/gpu/drm/xe/xe_guc_types.h @@ -49,40 +49,40 @@ struct xe_guc { struct xe_guc_db_mgr dbm; /** @submission_state: GuC submission state */ struct { - /** @exec_queue_lookup: Lookup an xe_engine from guc_id */ + /** @submission_state.exec_queue_lookup: Lookup an xe_engine from guc_id */ struct xarray exec_queue_lookup; - /** @guc_ids: used to allocate new guc_ids, single-lrc */ + /** @submission_state.guc_ids: used to allocate new guc_ids, single-lrc */ struct ida guc_ids; - /** @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc */ + /** @submission_state.guc_ids_bitmap: used to allocate new guc_ids, multi-lrc */ unsigned long *guc_ids_bitmap; - /** @stopped: submissions are stopped */ + /** @submission_state.stopped: submissions are stopped */ atomic_t stopped; - /** @lock: protects submission state */ + /** @submission_state.lock: protects submission state */ struct mutex lock; - /** @suspend: suspend fence state */ + /** @submission_state.suspend: suspend fence state */ struct { - /** @lock: suspend fences lock */ + /** @submission_state.suspend.lock: suspend fences lock */ spinlock_t lock; - /** @context: suspend fences context */ + /** @submission_state.suspend.context: suspend fences context */ u64 context; - /** @seqno: suspend fences seqno */ + /** @submission_state.suspend.seqno: suspend fences seqno */ u32 seqno; } suspend; #ifdef CONFIG_PROVE_LOCKING #define NUM_SUBMIT_WQ 256 - /** @submit_wq_pool: submission ordered workqueues pool */ + /** @submission_state.submit_wq_pool: submission ordered workqueues pool */ struct workqueue_struct *submit_wq_pool[NUM_SUBMIT_WQ]; - /** @submit_wq_idx: submission ordered workqueue index */ + /** @submission_state.submit_wq_idx: submission ordered workqueue index */ int submit_wq_idx; #endif - /** @enabled: submission is enabled */ + /** @submission_state.enabled: submission is enabled */ bool enabled; } submission_state; /** @hwconfig: Hardware config state */ struct { - /** @bo: buffer object of the hardware config */ + /** @hwconfig.bo: buffer object of the hardware config */ struct xe_bo *bo; - /** @size: size of the hardware config */ + 
/** @hwconfig.size: size of the hardware config */ u32 size; } hwconfig; diff --git a/drivers/gpu/drm/xe/xe_hw_engine_types.h b/drivers/gpu/drm/xe/xe_hw_engine_types.h index dfeaaac08b7f95..61d9027e4665ad 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine_types.h +++ b/drivers/gpu/drm/xe/xe_hw_engine_types.h @@ -79,23 +79,23 @@ struct xe_hw_engine_class_intf { * @defaults: default scheduling properties */ struct { - /** @set_job_timeout: Set job timeout in ms for engine */ + /** @sched_props.set_job_timeout: Set job timeout in ms for engine */ u32 job_timeout_ms; - /** @job_timeout_min: Min job timeout in ms for engine */ + /** @sched_props.job_timeout_min: Min job timeout in ms for engine */ u32 job_timeout_min; - /** @job_timeout_max: Max job timeout in ms for engine */ + /** @sched_props.job_timeout_max: Max job timeout in ms for engine */ u32 job_timeout_max; - /** @timeslice_us: timeslice period in micro-seconds */ + /** @sched_props.timeslice_us: timeslice period in micro-seconds */ u32 timeslice_us; - /** @timeslice_min: min timeslice period in micro-seconds */ + /** @sched_props.timeslice_min: min timeslice period in micro-seconds */ u32 timeslice_min; - /** @timeslice_max: max timeslice period in micro-seconds */ + /** @sched_props.timeslice_max: max timeslice period in micro-seconds */ u32 timeslice_max; - /** @preempt_timeout_us: preemption timeout in micro-seconds */ + /** @sched_props.preempt_timeout_us: preemption timeout in micro-seconds */ u32 preempt_timeout_us; - /** @preempt_timeout_min: min preemption timeout in micro-seconds */ + /** @sched_props.preempt_timeout_min: min preemption timeout in micro-seconds */ u32 preempt_timeout_min; - /** @preempt_timeout_max: max preemption timeout in micro-seconds */ + /** @sched_props.preempt_timeout_max: max preemption timeout in micro-seconds */ u32 preempt_timeout_max; } sched_props, defaults; }; @@ -164,62 +164,62 @@ struct xe_hw_engine_snapshot { u16 logical_instance; /** @forcewake: Force Wake information snapshot */ struct { - /** @domain: force wake domain of this hw engine */ + /** @forcewake.domain: force wake domain of this hw engine */ enum xe_force_wake_domains domain; - /** @ref: Forcewake ref for the above domain */ + /** @forcewake.ref: Forcewake ref for the above domain */ int ref; } forcewake; /** @mmio_base: MMIO base address of this hw engine*/ u32 mmio_base; /** @reg: Useful MMIO register snapshot */ struct { - /** @ring_hwstam: RING_HWSTAM */ + /** @reg.ring_hwstam: RING_HWSTAM */ u32 ring_hwstam; - /** @ring_hws_pga: RING_HWS_PGA */ + /** @reg.ring_hws_pga: RING_HWS_PGA */ u32 ring_hws_pga; - /** @ring_execlist_status_lo: RING_EXECLIST_STATUS_LO */ + /** @reg.ring_execlist_status_lo: RING_EXECLIST_STATUS_LO */ u32 ring_execlist_status_lo; - /** @ring_execlist_status_hi: RING_EXECLIST_STATUS_HI */ + /** @reg.ring_execlist_status_hi: RING_EXECLIST_STATUS_HI */ u32 ring_execlist_status_hi; - /** @ring_execlist_sq_contents_lo: RING_EXECLIST_SQ_CONTENTS */ + /** @reg.ring_execlist_sq_contents_lo: RING_EXECLIST_SQ_CONTENTS */ u32 ring_execlist_sq_contents_lo; - /** @ring_execlist_sq_contents_hi: RING_EXECLIST_SQ_CONTENTS + 4 */ + /** @reg.ring_execlist_sq_contents_hi: RING_EXECLIST_SQ_CONTENTS + 4 */ u32 ring_execlist_sq_contents_hi; - /** @ring_start: RING_START */ + /** @reg.ring_start: RING_START */ u32 ring_start; - /** @ring_head: RING_HEAD */ + /** @reg.ring_head: RING_HEAD */ u32 ring_head; - /** @ring_tail: RING_TAIL */ + /** @reg.ring_tail: RING_TAIL */ u32 ring_tail; - /** @ring_ctl: RING_CTL */ + /** 
@reg.ring_ctl: RING_CTL */ u32 ring_ctl; - /** @ring_mi_mode: RING_MI_MODE */ + /** @reg.ring_mi_mode: RING_MI_MODE */ u32 ring_mi_mode; - /** @ring_mode: RING_MODE */ + /** @reg.ring_mode: RING_MODE */ u32 ring_mode; - /** @ring_imr: RING_IMR */ + /** @reg.ring_imr: RING_IMR */ u32 ring_imr; - /** @ring_esr: RING_ESR */ + /** @reg.ring_esr: RING_ESR */ u32 ring_esr; - /** @ring_emr: RING_EMR */ + /** @reg.ring_emr: RING_EMR */ u32 ring_emr; - /** @ring_eir: RING_EIR */ + /** @reg.ring_eir: RING_EIR */ u32 ring_eir; - /** @ring_acthd_udw: RING_ACTHD_UDW */ + /** @reg.ring_acthd_udw: RING_ACTHD_UDW */ u32 ring_acthd_udw; - /** @ring_acthd: RING_ACTHD */ + /** @reg.ring_acthd: RING_ACTHD */ u32 ring_acthd; - /** @ring_bbaddr_udw: RING_BBADDR_UDW */ + /** @reg.ring_bbaddr_udw: RING_BBADDR_UDW */ u32 ring_bbaddr_udw; - /** @ring_bbaddr: RING_BBADDR */ + /** @reg.ring_bbaddr: RING_BBADDR */ u32 ring_bbaddr; - /** @ring_dma_fadd_udw: RING_DMA_FADD_UDW */ + /** @reg.ring_dma_fadd_udw: RING_DMA_FADD_UDW */ u32 ring_dma_fadd_udw; - /** @ring_dma_fadd: RING_DMA_FADD */ + /** @reg.ring_dma_fadd: RING_DMA_FADD */ u32 ring_dma_fadd; - /** @ipehr: IPEHR */ + /** @reg.ipehr: IPEHR */ u32 ipehr; - /** @rcu_mode: RCU_MODE */ + /** @reg.rcu_mode: RCU_MODE */ u32 rcu_mode; } reg; }; diff --git a/drivers/gpu/drm/xe/xe_lrc_types.h b/drivers/gpu/drm/xe/xe_lrc_types.h index 78220336062cf5..24f20ed66fd134 100644 --- a/drivers/gpu/drm/xe/xe_lrc_types.h +++ b/drivers/gpu/drm/xe/xe_lrc_types.h @@ -28,11 +28,11 @@ struct xe_lrc { /** @ring: submission ring state */ struct { - /** @size: size of submission ring */ + /** @ring.size: size of submission ring */ u32 size; - /** @tail: tail of submission ring */ + /** @ring.tail: tail of submission ring */ u32 tail; - /** @old_tail: shadow of tail */ + /** @ring.old_tail: shadow of tail */ u32 old_tail; } ring; diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h index 71213ba9735bc7..8778c34d662030 100644 --- a/drivers/gpu/drm/xe/xe_sched_job_types.h +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h @@ -30,11 +30,11 @@ struct xe_sched_job { struct dma_fence *fence; /** @user_fence: write back value when BB is complete */ struct { - /** @used: user fence is used */ + /** @user_fence.used: user fence is used */ bool used; - /** @addr: address to write to */ + /** @user_fence.addr: address to write to */ u64 addr; - /** @value: write back value */ + /** @user_fence.value: write back value */ u64 value; } user_fence; /** @migrate_flush_flags: Additional flush flags for migration jobs */ diff --git a/drivers/gpu/drm/xe/xe_uc_fw_types.h b/drivers/gpu/drm/xe/xe_uc_fw_types.h index ee914a5d852353..bc800b69686628 100644 --- a/drivers/gpu/drm/xe/xe_uc_fw_types.h +++ b/drivers/gpu/drm/xe/xe_uc_fw_types.h @@ -124,11 +124,14 @@ struct xe_uc_fw { /** @versions: FW versions wanted and found */ struct { - /** @wanted: firmware version wanted by platform */ + /** @versions.wanted: firmware version wanted by platform */ struct xe_uc_fw_version wanted; - /** @wanted_type: type of firmware version wanted (release vs compatibility) */ + /** + * @versions.wanted_type: type of firmware version wanted + * (release vs compatibility) + */ enum xe_uc_fw_version_types wanted_type; - /** @found: fw versions found in firmware blob */ + /** @versions.found: fw versions found in firmware blob */ struct xe_uc_fw_version found[XE_UC_FW_VER_TYPE_COUNT]; } versions; diff --git a/drivers/gpu/drm/xe/xe_wopcm_types.h b/drivers/gpu/drm/xe/xe_wopcm_types.h index 
486d850c408471..99d34837c408a7 100644 --- a/drivers/gpu/drm/xe/xe_wopcm_types.h +++ b/drivers/gpu/drm/xe/xe_wopcm_types.h @@ -16,9 +16,9 @@ struct xe_wopcm { u32 size; /** @guc: GuC WOPCM Region info */ struct { - /** @base: GuC WOPCM base which is offset from WOPCM base */ + /** @guc.base: GuC WOPCM base which is offset from WOPCM base */ u32 base; - /** @size: Size of the GuC WOPCM region */ + /** @guc.size: Size of the GuC WOPCM region */ u32 size; } guc; }; From ab5ae65fb25d06c38a6617a628b964828adb4786 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Mon, 22 Jan 2024 19:12:42 -0800 Subject: [PATCH 113/200] drm/xe: Remove PVC from xe_wa kunit tests Since the PCI IDs for PVC weren't added to the xe driver, the xe_wa tests should not try to create a fake PVC device since they can't find the right PCI ID. Fix bugs when running kunit: # xe_wa_gt: ASSERTION FAILED at drivers/gpu/drm/xe/tests/xe_wa_test.c:111 Expected ret == 0, but ret == -19 (0xffffffffffffffed) [FAILED] PVC (B0) # xe_wa_gt: ASSERTION FAILED at drivers/gpu/drm/xe/tests/xe_wa_test.c:111 Expected ret == 0, but ret == -19 (0xffffffffffffffed) [FAILED] PVC (B1) # xe_wa_gt: ASSERTION FAILED at drivers/gpu/drm/xe/tests/xe_wa_test.c:111 Expected ret == 0, but ret == -19 (0xffffffffffffffed) [FAILED] PVC (C0) Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Lucas De Marchi Reviewed-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20240123031242.3548724-1-lucas.demarchi@intel.com --- drivers/gpu/drm/xe/tests/xe_wa_test.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/gpu/drm/xe/tests/xe_wa_test.c b/drivers/gpu/drm/xe/tests/xe_wa_test.c index 439477593fafe2..44570d888355fa 100644 --- a/drivers/gpu/drm/xe/tests/xe_wa_test.c +++ b/drivers/gpu/drm/xe/tests/xe_wa_test.c @@ -69,9 +69,6 @@ static const struct platform_test_case cases[] = { SUBPLATFORM_CASE(DG2, G10, C0), SUBPLATFORM_CASE(DG2, G11, B1), SUBPLATFORM_CASE(DG2, G12, A1), - PLATFORM_CASE(PVC, B0), - PLATFORM_CASE(PVC, B1), - PLATFORM_CASE(PVC, C0), GMDID_CASE(METEORLAKE, 1270, A0, 1300, A0), GMDID_CASE(METEORLAKE, 1271, A0, 1300, A0), GMDID_CASE(LUNARLAKE, 2004, A0, 2000, A0), From c65908c33b80b329ed4ed680f1333617967fe28f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Roberto=20de=20Souza?= Date: Tue, 23 Jan 2024 12:44:46 -0800 Subject: [PATCH 114/200] drm/xe: Remove double new lines in devcoredump MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Right now devcoredump has a new line between '**** GuC CT ****' and 'H2G CTB (all sizes in DW):' while other sections don't have. 
v2: remove double new line after IPEHR Cc: Rodrigo Vivi Cc: Maarten Lankhorst Cc: Stuart Summers Reviewed-by: Rodrigo Vivi Reviewed-by: Stuart Summers Signed-off-by: José Roberto de Souza Link: https://patchwork.freedesktop.org/patch/msgid/20240123204454.246788-1-jose.souza@intel.com --- drivers/gpu/drm/xe/xe_guc_ct.c | 4 ++-- drivers/gpu/drm/xe/xe_hw_engine.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index ee5d99456aebc8..0c073556909483 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -1366,7 +1366,7 @@ void xe_guc_ct_snapshot_print(struct xe_guc_ct_snapshot *snapshot, return; if (snapshot->ct_enabled) { - drm_puts(p, "\nH2G CTB (all sizes in DW):\n"); + drm_puts(p, "H2G CTB (all sizes in DW):\n"); guc_ctb_snapshot_print(&snapshot->h2g, p); drm_puts(p, "\nG2H CTB (all sizes in DW):\n"); @@ -1375,7 +1375,7 @@ void xe_guc_ct_snapshot_print(struct xe_guc_ct_snapshot *snapshot, drm_printf(p, "\tg2h outstanding: %d\n", snapshot->g2h_outstanding); } else { - drm_puts(p, "\nCT disabled\n"); + drm_puts(p, "CT disabled\n"); } } diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c index 3aaab507f37feb..789736a34c1ba2 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine.c +++ b/drivers/gpu/drm/xe/xe_hw_engine.c @@ -856,7 +856,7 @@ void xe_hw_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot, drm_printf(p, "\tDMA_FADDR: 0x%08x_%08x\n", snapshot->reg.ring_dma_fadd_udw, snapshot->reg.ring_dma_fadd); - drm_printf(p, "\tIPEHR: 0x%08x\n\n", snapshot->reg.ipehr); + drm_printf(p, "\tIPEHR: 0x%08x\n", snapshot->reg.ipehr); if (snapshot->class == XE_ENGINE_CLASS_COMPUTE) drm_printf(p, "\tRCU_MODE: 0x%08x\n", snapshot->reg.rcu_mode); From 98fefec8c38117d50cbbc6ca240ed953570ea778 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Roberto=20de=20Souza?= Date: Tue, 23 Jan 2024 12:44:47 -0800 Subject: [PATCH 115/200] drm/xe: Change devcoredump functions parameters to xe_sched_job MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When devcoredump starts to dump the VMs' contents it will be necessary to know the starting addresses of the batch buffers of the job that hung. This information is set in xe_sched_job, and xe_sched_job is not easily accessible from xe_exec_queue, so change the parameter here; the next patch will append the batch buffer addresses to the devcoredump snapshot capture.
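Both directions of the job/queue lookup are visible in this patch; a minimal sketch of the asymmetry that motivates passing the job (names taken from the diff below):

	/* job -> queue: a direct back pointer, always available */
	struct xe_exec_queue *q = job->q;

	/*
	 * queue -> job: no direct pointer; requires walking the GPU
	 * scheduler's pending list under job_list_lock, as the
	 * guc_exec_queue_print() change below does.
	 */
	list_for_each_entry(job, &sched->base.pending_list, drm.list)
		if (job->q == q)
			break;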
v3: - update functions documentation to xe_sched_job Cc: Rodrigo Vivi Cc: Maarten Lankhorst Reviewed-by: Stuart Summers Signed-off-by: José Roberto de Souza Link: https://patchwork.freedesktop.org/patch/msgid/20240123204454.246788-2-jose.souza@intel.com --- drivers/gpu/drm/xe/xe_devcoredump.c | 14 ++++++----- drivers/gpu/drm/xe/xe_devcoredump.h | 6 ++--- drivers/gpu/drm/xe/xe_guc_submit.c | 38 ++++++++++++++++++++++------- drivers/gpu/drm/xe/xe_guc_submit.h | 4 +-- 4 files changed, 42 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c index 68abc0b195beb8..fd74dae2924339 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump.c +++ b/drivers/gpu/drm/xe/xe_devcoredump.c @@ -16,6 +16,7 @@ #include "xe_guc_ct.h" #include "xe_guc_submit.h" #include "xe_hw_engine.h" +#include "xe_sched_job.h" /** * DOC: Xe device coredump @@ -123,9 +124,10 @@ static void xe_devcoredump_free(void *data) } static void devcoredump_snapshot(struct xe_devcoredump *coredump, - struct xe_exec_queue *q) + struct xe_sched_job *job) { struct xe_devcoredump_snapshot *ss = &coredump->snapshot; + struct xe_exec_queue *q = job->q; struct xe_guc *guc = exec_queue_to_guc(q); struct xe_hw_engine *hwe; enum xe_hw_engine_id id; @@ -150,7 +152,7 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump, xe_force_wake_get(gt_to_fw(q->gt), XE_FORCEWAKE_ALL); coredump->snapshot.ct = xe_guc_ct_snapshot_capture(&guc->ct, true); - coredump->snapshot.ge = xe_guc_exec_queue_snapshot_capture(q); + coredump->snapshot.ge = xe_guc_exec_queue_snapshot_capture(job); for_each_hw_engine(hwe, q->gt, id) { if (hwe->class != q->hwe->class || @@ -167,15 +169,15 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump, /** * xe_devcoredump - Take the required snapshots and initialize coredump device. - * @q: The faulty xe_exec_queue, where the issue was detected. + * @job: The faulty xe_sched_job, where the issue was detected. * * This function should be called at the crash time within the serialized * gt_reset. It is skipped if we still have the core dump device available * with the information of the 'first' snapshot. 
*/ -void xe_devcoredump(struct xe_exec_queue *q) +void xe_devcoredump(struct xe_sched_job *job) { - struct xe_device *xe = gt_to_xe(q->gt); + struct xe_device *xe = gt_to_xe(job->q->gt); struct xe_devcoredump *coredump = &xe->devcoredump; if (coredump->captured) { @@ -184,7 +186,7 @@ void xe_devcoredump(struct xe_exec_queue *q) } coredump->captured = true; - devcoredump_snapshot(coredump, q); + devcoredump_snapshot(coredump, job); drm_info(&xe->drm, "Xe device coredump has been created\n"); drm_info(&xe->drm, "Check your /sys/class/drm/card%d/device/devcoredump/data\n", diff --git a/drivers/gpu/drm/xe/xe_devcoredump.h b/drivers/gpu/drm/xe/xe_devcoredump.h index 6ac218a5c19458..df8671f0b5eb2f 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump.h +++ b/drivers/gpu/drm/xe/xe_devcoredump.h @@ -7,12 +7,12 @@ #define _XE_DEVCOREDUMP_H_ struct xe_device; -struct xe_exec_queue; +struct xe_sched_job; #ifdef CONFIG_DEV_COREDUMP -void xe_devcoredump(struct xe_exec_queue *q); +void xe_devcoredump(struct xe_sched_job *job); #else -static inline void xe_devcoredump(struct xe_exec_queue *q) +static inline void xe_devcoredump(struct xe_sched_job *job) { } #endif diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index 7c29b8333c7192..2b008ec1b6de59 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -934,7 +934,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job) drm_notice(&xe->drm, "Timedout job: seqno=%u, guc_id=%d, flags=0x%lx", xe_sched_job_seqno(job), q->guc->id, q->flags); simple_error_capture(q); - xe_devcoredump(q); + xe_devcoredump(job); } else { drm_dbg(&xe->drm, "Timedout signaled job: seqno=%u, guc_id=%d, flags=0x%lx", xe_sched_job_seqno(job), q->guc->id, q->flags); @@ -1780,7 +1780,7 @@ guc_exec_queue_wq_snapshot_print(struct xe_guc_submit_exec_queue_snapshot *snaps /** * xe_guc_exec_queue_snapshot_capture - Take a quick snapshot of the GuC Engine. - * @q: Xe exec queue. + * @job: faulty Xe scheduled job. * * This can be printed out in a later stage like during dev_coredump * analysis. @@ -1789,12 +1789,12 @@ guc_exec_queue_wq_snapshot_print(struct xe_guc_submit_exec_queue_snapshot *snaps * caller, using `xe_guc_exec_queue_snapshot_free`. */ struct xe_guc_submit_exec_queue_snapshot * -xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q) +xe_guc_exec_queue_snapshot_capture(struct xe_sched_job *job) { + struct xe_exec_queue *q = job->q; struct xe_guc *guc = exec_queue_to_guc(q); struct xe_device *xe = guc_to_xe(guc); struct xe_gpu_scheduler *sched = &q->guc->sched; - struct xe_sched_job *job; struct xe_guc_submit_exec_queue_snapshot *snapshot; int i; @@ -1852,14 +1852,16 @@ xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q) if (!snapshot->pending_list) { drm_err(&xe->drm, "Skipping GuC Engine pending_list snapshot.\n"); } else { + struct xe_sched_job *job_iter; + i = 0; - list_for_each_entry(job, &sched->base.pending_list, drm.list) { + list_for_each_entry(job_iter, &sched->base.pending_list, drm.list) { snapshot->pending_list[i].seqno = - xe_sched_job_seqno(job); + xe_sched_job_seqno(job_iter); snapshot->pending_list[i].fence = - dma_fence_is_signaled(job->fence) ? 1 : 0; + dma_fence_is_signaled(job_iter->fence) ? 1 : 0; snapshot->pending_list[i].finished = - dma_fence_is_signaled(&job->drm.s_fence->finished) + dma_fence_is_signaled(&job_iter->drm.s_fence->finished) ? 
1 : 0; i++; } @@ -1945,10 +1947,28 @@ void xe_guc_exec_queue_snapshot_free(struct xe_guc_submit_exec_queue_snapshot *s static void guc_exec_queue_print(struct xe_exec_queue *q, struct drm_printer *p) { struct xe_guc_submit_exec_queue_snapshot *snapshot; + struct xe_gpu_scheduler *sched = &q->guc->sched; + struct xe_sched_job *job; + bool found = false; + + spin_lock(&sched->base.job_list_lock); + list_for_each_entry(job, &sched->base.pending_list, drm.list) { + if (job->q == q) { + xe_sched_job_get(job); + found = true; + break; + } + } + spin_unlock(&sched->base.job_list_lock); - snapshot = xe_guc_exec_queue_snapshot_capture(q); + if (!found) + return; + + snapshot = xe_guc_exec_queue_snapshot_capture(job); xe_guc_exec_queue_snapshot_print(snapshot, p); xe_guc_exec_queue_snapshot_free(snapshot); + + xe_sched_job_put(job); } /** diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h index fc97869c5b865d..723dc2bd8df912 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.h +++ b/drivers/gpu/drm/xe/xe_guc_submit.h @@ -9,8 +9,8 @@ #include struct drm_printer; -struct xe_exec_queue; struct xe_guc; +struct xe_sched_job; int xe_guc_submit_init(struct xe_guc *guc); @@ -27,7 +27,7 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg, int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 len); struct xe_guc_submit_exec_queue_snapshot * -xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q); +xe_guc_exec_queue_snapshot_capture(struct xe_sched_job *job); void xe_guc_exec_queue_snapshot_print(struct xe_guc_submit_exec_queue_snapshot *snapshot, struct drm_printer *p); From 83ef64ebde37db364bffa19801486c957daffab0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Roberto=20de=20Souza?= Date: Tue, 23 Jan 2024 12:44:48 -0800 Subject: [PATCH 116/200] drm/xe: Nuke xe from xe_devcoredump MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit xe is never set in xe_devcoredump but if xe_device is needed devcoredump_to_xe_device() can be used. Cc: Rodrigo Vivi Cc: Maarten Lankhorst Reviewed-by: Rodrigo Vivi Signed-off-by: José Roberto de Souza Link: https://patchwork.freedesktop.org/patch/msgid/20240123204454.246788-3-jose.souza@intel.com --- drivers/gpu/drm/xe/xe_devcoredump_types.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_devcoredump_types.h b/drivers/gpu/drm/xe/xe_devcoredump_types.h index 7fdad9c3d3dde6..50106efcbc29d7 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump_types.h +++ b/drivers/gpu/drm/xe/xe_devcoredump_types.h @@ -44,8 +44,6 @@ struct xe_devcoredump_snapshot { * for reading the information. */ struct xe_devcoredump { - /** @xe: Xe device. */ - struct xe_device *xe; /** @captured: The snapshot of the first hang has already been taken. */ bool captured; /** @snapshot: Snapshot is captured at time of the first crash */ From facd388708f06ab0c5cf492323c130e924c87aed Mon Sep 17 00:00:00 2001 From: Matt Roper Date: Tue, 23 Jan 2024 12:44:49 -0800 Subject: [PATCH 117/200] drm/xe: Stash GMD_ID value in xe_gt MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Although we've stored the major and minor versions for graphics/media in xe_device, it will be simpler to implement the uapi version query if we also stash the raw register value in the GT itself. 
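As a sketch of the intended consumer (this mirrors the follow-up devcoredump patch rather than anything added here), the stashed raw value decodes with the existing GMD_ID field masks:

	u32 gmdid = gt->info.gmdid;	/* raw register value stashed at init */
	u32 arch = REG_FIELD_GET(GMD_ID_ARCH_MASK, gmdid);
	u32 release = REG_FIELD_GET(GMD_ID_RELEASE_MASK, gmdid);
	u32 revid = REG_FIELD_GET(GMD_ID_REVID, gmdid);	/* e.g. 12.70.0 */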
Signed-off-by: Matt Roper Reviewed-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240123204454.246788-4-jose.souza@intel.com Signed-off-by: José Roberto de Souza --- drivers/gpu/drm/xe/xe_gt.c | 6 ++++++ drivers/gpu/drm/xe/xe_gt_types.h | 2 ++ 2 files changed, 8 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c index 1fe4d54409d344..47cb87f599e463 100644 --- a/drivers/gpu/drm/xe/xe_gt.c +++ b/drivers/gpu/drm/xe/xe_gt.c @@ -399,6 +399,12 @@ static int gt_fw_domain_init(struct xe_gt *gt) /* Initialize CCS mode sysfs after early initialization of HW engines */ xe_gt_ccs_mode_sysfs_init(gt); + /* + * Stash hardware-reported version. Since this register does not exist + * on pre-MTL platforms, reading it there will (correctly) return 0. + */ + gt->info.gmdid = xe_mmio_read32(gt, GMD_ID); + err = xe_force_wake_put(gt_to_fw(gt), XE_FW_GT); XE_WARN_ON(err); xe_device_mem_access_put(gt_to_xe(gt)); diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h index 3caaea2ff908a8..70c615dd149865 100644 --- a/drivers/gpu/drm/xe/xe_gt_types.h +++ b/drivers/gpu/drm/xe/xe_gt_types.h @@ -117,6 +117,8 @@ struct xe_gt { * hwconfig blob. */ u64 __engine_mask; + /** @info.gmdid: raw GMD_ID value from hardware */ + u32 gmdid; } info; /** From 4376cee62092ac79ecce1a4a99f1ffd61f50d47f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Roberto=20de=20Souza?= Date: Tue, 23 Jan 2024 12:44:50 -0800 Subject: [PATCH 118/200] drm/xe: Print more device information in devcoredump MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit To properly decode batch buffers, Mesa tools need to know which platform this is. For now that can be derived from the PCI ID, but make it future-proof by also printing each GT's GMD version.
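Given the format strings in the diff below, the devcoredump header gains device-identification lines shaped roughly as follows (the values are illustrative, not from a real dump):

	PCI ID: 0x7d55
	PCI revision: 0x08
	GT id: 0
		Type: main
		IP ver: 12.70.0
		CS reference clock: 19200000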
Cc: Rodrigo Vivi Cc: Maarten Lankhorst Reviewed-by: Rodrigo Vivi Signed-off-by: José Roberto de Souza Link: https://patchwork.freedesktop.org/patch/msgid/20240123204454.246788-5-jose.souza@intel.com --- drivers/gpu/drm/xe/xe_devcoredump.c | 2 ++ drivers/gpu/drm/xe/xe_device.c | 20 ++++++++++++++++++++ drivers/gpu/drm/xe/xe_device.h | 2 ++ 3 files changed, 24 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c index fd74dae2924339..e701f0d07b6767 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump.c +++ b/drivers/gpu/drm/xe/xe_devcoredump.c @@ -63,6 +63,7 @@ static ssize_t xe_devcoredump_read(char *buffer, loff_t offset, size_t count, void *data, size_t datalen) { struct xe_devcoredump *coredump = data; + struct xe_device *xe = coredump_to_xe(coredump); struct xe_devcoredump_snapshot *ss; struct drm_printer p; struct drm_print_iterator iter; @@ -89,6 +90,7 @@ static ssize_t xe_devcoredump_read(char *buffer, loff_t offset, drm_printf(&p, "Snapshot time: %lld.%09ld\n", ts.tv_sec, ts.tv_nsec); ts = ktime_to_timespec64(ss->boot_time); drm_printf(&p, "Uptime: %lld.%09ld\n", ts.tv_sec, ts.tv_nsec); + xe_device_snapshot_print(xe, &p); drm_printf(&p, "\n**** GuC CT ****\n"); xe_guc_ct_snapshot_print(coredump->snapshot.ct, &p); diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index ab417f4f7d2a77..6faa7865b1aabc 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -727,3 +727,23 @@ void xe_device_mem_access_put(struct xe_device *xe) xe_assert(xe, ref >= 0); } + +void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p) +{ + struct xe_gt *gt; + u8 id; + + drm_printf(p, "PCI ID: 0x%04x\n", xe->info.devid); + drm_printf(p, "PCI revision: 0x%02x\n", xe->info.revid); + + for_each_gt(gt, xe, id) { + drm_printf(p, "GT id: %u\n", id); + drm_printf(p, "\tType: %s\n", + gt->info.type == XE_GT_TYPE_MAIN ? "main" : "media"); + drm_printf(p, "\tIP ver: %u.%u.%u\n", + REG_FIELD_GET(GMD_ID_ARCH_MASK, gt->info.gmdid), + REG_FIELD_GET(GMD_ID_RELEASE_MASK, gt->info.gmdid), + REG_FIELD_GET(GMD_ID_REVID, gt->info.gmdid)); + drm_printf(p, "\tCS reference clock: %u\n", gt->info.reference_clock); + } +} diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h index af8ac2e9e27095..270124da1e00e6 100644 --- a/drivers/gpu/drm/xe/xe_device.h +++ b/drivers/gpu/drm/xe/xe_device.h @@ -175,4 +175,6 @@ static inline bool xe_device_has_memirq(struct xe_device *xe) u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size); +void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p); + #endif From 89e394f0db473922a180ca65ae9b9858760fc803 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Roberto=20de=20Souza?= Date: Tue, 23 Jan 2024 12:44:51 -0800 Subject: [PATCH 119/200] drm/xe: Print registers spread in 2 u32 as u64 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This makes it easier to use those registers when copying their values into a calculator, and also makes it easier for tools to parse them. To avoid padding holes in xe_hw_engine_snapshot, the u64 variables were moved to the top of xe_hw_engine_snapshot.reg.
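The lo/hi merge follows one pattern throughout the capture hunks below; a minimal sketch, with a hypothetical read32() standing in for hw_engine_mmio_read32():

	#include <linux/types.h>

	u32 read32(u32 reg);	/* hypothetical MMIO read helper */

	static u64 ex_read_split_reg(u32 lo_reg, u32 hi_reg)
	{
		u64 val = read32(lo_reg);

		/* Fold the upper 32 bits on top of the lower half. */
		val |= (u64)read32(hi_reg) << 32;
		return val;
	}

Note the two halves are read back-to-back rather than atomically, which is an acceptable trade-off for a crash-time snapshot.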
Cc: Rodrigo Vivi Cc: Maarten Lankhorst Reviewed-by: Rodrigo Vivi Signed-off-by: José Roberto de Souza Link: https://patchwork.freedesktop.org/patch/msgid/20240123204454.246788-6-jose.souza@intel.com --- drivers/gpu/drm/xe/xe_hw_engine.c | 69 ++++++++++++------------- drivers/gpu/drm/xe/xe_hw_engine_types.h | 30 ++++------- 2 files changed, 42 insertions(+), 57 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c index 789736a34c1ba2..f7d66b71f1d717 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine.c +++ b/drivers/gpu/drm/xe/xe_hw_engine.c @@ -749,6 +749,7 @@ struct xe_hw_engine_snapshot * xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe) { struct xe_hw_engine_snapshot *snapshot; + u64 val; if (!xe_hw_engine_is_valid(hwe)) return NULL; @@ -766,19 +767,31 @@ xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe) hwe->domain); snapshot->mmio_base = hwe->mmio_base; - snapshot->reg.ring_hwstam = hw_engine_mmio_read32(hwe, RING_HWSTAM(0)); - snapshot->reg.ring_hws_pga = hw_engine_mmio_read32(hwe, - RING_HWS_PGA(0)); - snapshot->reg.ring_execlist_status_lo = + snapshot->reg.ring_execlist_status = hw_engine_mmio_read32(hwe, RING_EXECLIST_STATUS_LO(0)); - snapshot->reg.ring_execlist_status_hi = - hw_engine_mmio_read32(hwe, RING_EXECLIST_STATUS_HI(0)); - snapshot->reg.ring_execlist_sq_contents_lo = - hw_engine_mmio_read32(hwe, - RING_EXECLIST_SQ_CONTENTS_LO(0)); - snapshot->reg.ring_execlist_sq_contents_hi = - hw_engine_mmio_read32(hwe, - RING_EXECLIST_SQ_CONTENTS_HI(0)); + val = hw_engine_mmio_read32(hwe, RING_EXECLIST_STATUS_HI(0)); + snapshot->reg.ring_execlist_status |= val << 32; + + snapshot->reg.ring_execlist_sq_contents = + hw_engine_mmio_read32(hwe, RING_EXECLIST_SQ_CONTENTS_LO(0)); + val = hw_engine_mmio_read32(hwe, RING_EXECLIST_SQ_CONTENTS_HI(0)); + snapshot->reg.ring_execlist_sq_contents |= val << 32; + + snapshot->reg.ring_acthd = hw_engine_mmio_read32(hwe, RING_ACTHD(0)); + val = hw_engine_mmio_read32(hwe, RING_ACTHD_UDW(0)); + snapshot->reg.ring_acthd |= val << 32; + + snapshot->reg.ring_bbaddr = hw_engine_mmio_read32(hwe, RING_BBADDR(0)); + val = hw_engine_mmio_read32(hwe, RING_BBADDR_UDW(0)); + snapshot->reg.ring_bbaddr |= val << 32; + + snapshot->reg.ring_dma_fadd = + hw_engine_mmio_read32(hwe, RING_DMA_FADD(0)); + val = hw_engine_mmio_read32(hwe, RING_DMA_FADD_UDW(0)); + snapshot->reg.ring_dma_fadd |= val << 32; + + snapshot->reg.ring_hwstam = hw_engine_mmio_read32(hwe, RING_HWSTAM(0)); + snapshot->reg.ring_hws_pga = hw_engine_mmio_read32(hwe, RING_HWS_PGA(0)); snapshot->reg.ring_start = hw_engine_mmio_read32(hwe, RING_START(0)); snapshot->reg.ring_head = hw_engine_mmio_read32(hwe, RING_HEAD(0)) & HEAD_ADDR; @@ -792,16 +805,6 @@ xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe) snapshot->reg.ring_esr = hw_engine_mmio_read32(hwe, RING_ESR(0)); snapshot->reg.ring_emr = hw_engine_mmio_read32(hwe, RING_EMR(0)); snapshot->reg.ring_eir = hw_engine_mmio_read32(hwe, RING_EIR(0)); - snapshot->reg.ring_acthd_udw = - hw_engine_mmio_read32(hwe, RING_ACTHD_UDW(0)); - snapshot->reg.ring_acthd = hw_engine_mmio_read32(hwe, RING_ACTHD(0)); - snapshot->reg.ring_bbaddr_udw = - hw_engine_mmio_read32(hwe, RING_BBADDR_UDW(0)); - snapshot->reg.ring_bbaddr = hw_engine_mmio_read32(hwe, RING_BBADDR(0)); - snapshot->reg.ring_dma_fadd_udw = - hw_engine_mmio_read32(hwe, RING_DMA_FADD_UDW(0)); - snapshot->reg.ring_dma_fadd = - hw_engine_mmio_read32(hwe, RING_DMA_FADD(0)); snapshot->reg.ipehr = hw_engine_mmio_read32(hwe, RING_IPEHR(0)); if (snapshot->class == 
XE_ENGINE_CLASS_COMPUTE) @@ -830,14 +833,10 @@ void xe_hw_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot, snapshot->forcewake.domain, snapshot->forcewake.ref); drm_printf(p, "\tHWSTAM: 0x%08x\n", snapshot->reg.ring_hwstam); drm_printf(p, "\tRING_HWS_PGA: 0x%08x\n", snapshot->reg.ring_hws_pga); - drm_printf(p, "\tRING_EXECLIST_STATUS_LO: 0x%08x\n", - snapshot->reg.ring_execlist_status_lo); - drm_printf(p, "\tRING_EXECLIST_STATUS_HI: 0x%08x\n", - snapshot->reg.ring_execlist_status_hi); - drm_printf(p, "\tRING_EXECLIST_SQ_CONTENTS_LO: 0x%08x\n", - snapshot->reg.ring_execlist_sq_contents_lo); - drm_printf(p, "\tRING_EXECLIST_SQ_CONTENTS_HI: 0x%08x\n", - snapshot->reg.ring_execlist_sq_contents_hi); + drm_printf(p, "\tRING_EXECLIST_STATUS: 0x%016llx\n", + snapshot->reg.ring_execlist_status); + drm_printf(p, "\tRING_EXECLIST_SQ_CONTENTS: 0x%016llx\n", + snapshot->reg.ring_execlist_sq_contents); drm_printf(p, "\tRING_START: 0x%08x\n", snapshot->reg.ring_start); drm_printf(p, "\tRING_HEAD: 0x%08x\n", snapshot->reg.ring_head); drm_printf(p, "\tRING_TAIL: 0x%08x\n", snapshot->reg.ring_tail); @@ -849,13 +848,9 @@ void xe_hw_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot, drm_printf(p, "\tRING_ESR: 0x%08x\n", snapshot->reg.ring_esr); drm_printf(p, "\tRING_EMR: 0x%08x\n", snapshot->reg.ring_emr); drm_printf(p, "\tRING_EIR: 0x%08x\n", snapshot->reg.ring_eir); - drm_printf(p, "\tACTHD: 0x%08x_%08x\n", snapshot->reg.ring_acthd_udw, - snapshot->reg.ring_acthd); - drm_printf(p, "\tBBADDR: 0x%08x_%08x\n", snapshot->reg.ring_bbaddr_udw, - snapshot->reg.ring_bbaddr); - drm_printf(p, "\tDMA_FADDR: 0x%08x_%08x\n", - snapshot->reg.ring_dma_fadd_udw, - snapshot->reg.ring_dma_fadd); + drm_printf(p, "\tACTHD: 0x%016llx\n", snapshot->reg.ring_acthd); + drm_printf(p, "\tBBADDR: 0x%016llx\n", snapshot->reg.ring_bbaddr); + drm_printf(p, "\tDMA_FADDR: 0x%016llx\n", snapshot->reg.ring_dma_fadd); drm_printf(p, "\tIPEHR: 0x%08x\n", snapshot->reg.ipehr); if (snapshot->class == XE_ENGINE_CLASS_COMPUTE) drm_printf(p, "\tRCU_MODE: 0x%08x\n", diff --git a/drivers/gpu/drm/xe/xe_hw_engine_types.h b/drivers/gpu/drm/xe/xe_hw_engine_types.h index 61d9027e4665ad..d7f828c76cc5f3 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine_types.h +++ b/drivers/gpu/drm/xe/xe_hw_engine_types.h @@ -173,18 +173,20 @@ struct xe_hw_engine_snapshot { u32 mmio_base; /** @reg: Useful MMIO register snapshot */ struct { + /** @reg.ring_execlist_status: RING_EXECLIST_STATUS */ + u64 ring_execlist_status; + /** @reg.ring_execlist_sq_contents: RING_EXECLIST_SQ_CONTENTS */ + u64 ring_execlist_sq_contents; + /** @reg.ring_acthd: RING_ACTHD */ + u64 ring_acthd; + /** @reg.ring_bbaddr: RING_BBADDR */ + u64 ring_bbaddr; + /** @reg.ring_dma_fadd: RING_DMA_FADD */ + u64 ring_dma_fadd; /** @reg.ring_hwstam: RING_HWSTAM */ u32 ring_hwstam; /** @reg.ring_hws_pga: RING_HWS_PGA */ u32 ring_hws_pga; - /** @reg.ring_execlist_status_lo: RING_EXECLIST_STATUS_LO */ - u32 ring_execlist_status_lo; - /** @reg.ring_execlist_status_hi: RING_EXECLIST_STATUS_HI */ - u32 ring_execlist_status_hi; - /** @reg.ring_execlist_sq_contents_lo: RING_EXECLIST_SQ_CONTENTS */ - u32 ring_execlist_sq_contents_lo; - /** @reg.ring_execlist_sq_contents_hi: RING_EXECLIST_SQ_CONTENTS + 4 */ - u32 ring_execlist_sq_contents_hi; /** @reg.ring_start: RING_START */ u32 ring_start; /** @reg.ring_head: RING_HEAD */ @@ -205,18 +207,6 @@ struct xe_hw_engine_snapshot { u32 ring_emr; /** @reg.ring_eir: RING_EIR */ u32 ring_eir; - /** @reg.ring_acthd_udw: RING_ACTHD_UDW */ - u32 
ring_acthd_udw; - /** @reg.ring_acthd: RING_ACTHD */ - u32 ring_acthd; - /** @reg.ring_bbaddr_udw: RING_BBADDR_UDW */ - u32 ring_bbaddr_udw; - /** @reg.ring_bbaddr: RING_BBADDR */ - u32 ring_bbaddr; - /** @reg.ring_dma_fadd_udw: RING_DMA_FADD_UDW */ - u32 ring_dma_fadd_udw; - /** @reg.ring_dma_fadd: RING_DMA_FADD */ - u32 ring_dma_fadd; /** @reg.ipehr: IPEHR */ u32 ipehr; /** @reg.rcu_mode: RCU_MODE */ From 28a98c39fa9b917cea04cb429eb1744e161c82f0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Roberto=20de=20Souza?= Date: Tue, 23 Jan 2024 12:44:52 -0800 Subject: [PATCH 120/200] drm/xe: Remove additional spaces in devcoredump HW Engines section MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit I guess the intention was to keep it visually aligned, but that would require a lot of spaces and was not followed by other registers, so let's just drop it. Cc: Rodrigo Vivi Cc: Maarten Lankhorst Reviewed-by: Rodrigo Vivi Signed-off-by: José Roberto de Souza Link: https://patchwork.freedesktop.org/patch/msgid/20240123204454.246788-7-jose.souza@intel.com --- drivers/gpu/drm/xe/xe_hw_engine.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c index f7d66b71f1d717..0d17e32d70c87d 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine.c +++ b/drivers/gpu/drm/xe/xe_hw_engine.c @@ -838,17 +838,17 @@ void xe_hw_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot, drm_printf(p, "\tRING_EXECLIST_SQ_CONTENTS: 0x%016llx\n", snapshot->reg.ring_execlist_sq_contents); drm_printf(p, "\tRING_START: 0x%08x\n", snapshot->reg.ring_start); - drm_printf(p, "\tRING_HEAD: 0x%08x\n", snapshot->reg.ring_head); - drm_printf(p, "\tRING_TAIL: 0x%08x\n", snapshot->reg.ring_tail); + drm_printf(p, "\tRING_HEAD: 0x%08x\n", snapshot->reg.ring_head); + drm_printf(p, "\tRING_TAIL: 0x%08x\n", snapshot->reg.ring_tail); drm_printf(p, "\tRING_CTL: 0x%08x\n", snapshot->reg.ring_ctl); drm_printf(p, "\tRING_MI_MODE: 0x%08x\n", snapshot->reg.ring_mi_mode); drm_printf(p, "\tRING_MODE: 0x%08x\n", snapshot->reg.ring_mode); - drm_printf(p, "\tRING_IMR: 0x%08x\n", snapshot->reg.ring_imr); - drm_printf(p, "\tRING_ESR: 0x%08x\n", snapshot->reg.ring_esr); - drm_printf(p, "\tRING_EMR: 0x%08x\n", snapshot->reg.ring_emr); - drm_printf(p, "\tRING_EIR: 0x%08x\n", snapshot->reg.ring_eir); - drm_printf(p, "\tACTHD: 0x%016llx\n", snapshot->reg.ring_acthd); + drm_printf(p, "\tRING_IMR: 0x%08x\n", snapshot->reg.ring_imr); + drm_printf(p, "\tRING_ESR: 0x%08x\n", snapshot->reg.ring_esr); + drm_printf(p, "\tRING_EMR: 0x%08x\n", snapshot->reg.ring_emr); + drm_printf(p, "\tRING_EIR: 0x%08x\n", snapshot->reg.ring_eir); + drm_printf(p, "\tACTHD: 0x%016llx\n", snapshot->reg.ring_acthd); drm_printf(p, "\tBBADDR: 0x%016llx\n", snapshot->reg.ring_bbaddr); drm_printf(p, "\tDMA_FADDR: 0x%016llx\n", snapshot->reg.ring_dma_fadd); drm_printf(p, "\tIPEHR: 0x%08x\n", snapshot->reg.ipehr); From 439987f6f471c2d8c99e77d3aa75cda597796b9d Mon Sep 17 00:00:00 2001 From: Jani Nikula Date: Wed, 24 Jan 2024 11:05:15 +0200 Subject: [PATCH 121/200] drm/xe: don't build debugfs files when CONFIG_DEBUG_FS=n If we unconditionally build the debugfs files, we'll get both the static inline stubs from the headers and the real functions for CONFIG_DEBUG_FS=n. Avoid building the debugfs files with that config.
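The duplicate definitions come from the common stub pattern used in debugfs headers; a minimal header sketch with hypothetical names:

	/* foo_debugfs.h -- sketch, hypothetical names */
	struct foo;

	#ifdef CONFIG_DEBUG_FS
	void foo_debugfs_register(struct foo *f);	/* real copy lives in foo_debugfs.c */
	#else
	static inline void foo_debugfs_register(struct foo *f) { }
	#endif

With CONFIG_DEBUG_FS=n the header already supplies the inline stub, so also compiling foo_debugfs.c produces a second, conflicting definition of the same symbol; the Makefile change below builds the real implementations only when debugfs is configured in.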
Reported-by: Randy Dunlap Closes: https://lore.kernel.org/r/152521f9-119f-4c61-b467-3e91f4aecb1a@infradead.org Signed-off-by: Jani Nikula Tested-by: Randy Dunlap # build-tested Reviewed-by: Randy Dunlap Signed-off-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240124090515.3363901-1-jani.nikula@intel.com --- drivers/gpu/drm/xe/Makefile | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile index fe8b266a981918..abb2be8268d0df 100644 --- a/drivers/gpu/drm/xe/Makefile +++ b/drivers/gpu/drm/xe/Makefile @@ -222,8 +222,6 @@ xe-$(CONFIG_DRM_XE_DISPLAY) += \ i915-display/intel_ddi.o \ i915-display/intel_ddi_buf_trans.o \ i915-display/intel_display.o \ - i915-display/intel_display_debugfs.o \ - i915-display/intel_display_debugfs_params.o \ i915-display/intel_display_device.o \ i915-display/intel_display_driver.o \ i915-display/intel_display_irq.o \ @@ -267,7 +265,6 @@ xe-$(CONFIG_DRM_XE_DISPLAY) += \ i915-display/intel_modeset_setup.o \ i915-display/intel_modeset_verify.o \ i915-display/intel_panel.o \ - i915-display/intel_pipe_crc.o \ i915-display/intel_pmdemand.o \ i915-display/intel_pps.o \ i915-display/intel_psr.o \ @@ -294,6 +291,13 @@ ifeq ($(CONFIG_DRM_FBDEV_EMULATION),y) xe-$(CONFIG_DRM_XE_DISPLAY) += i915-display/intel_fbdev.o endif +ifeq ($(CONFIG_DEBUG_FS),y) + xe-$(CONFIG_DRM_XE_DISPLAY) += \ + i915-display/intel_display_debugfs.o \ + i915-display/intel_display_debugfs_params.o \ + i915-display/intel_pipe_crc.o +endif + obj-$(CONFIG_DRM_XE) += xe.o obj-$(CONFIG_DRM_XE_KUNIT_TEST) += tests/ From c6878e47431c72168da08dfbc1496c09b2d3c246 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Roberto=20de=20Souza?= Date: Wed, 24 Jan 2024 09:18:30 -0800 Subject: [PATCH 122/200] drm/xe: Fix crash in trace_dma_fence_init() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit trace_dma_fence_init() uses dma_fence_ops functions like get_driver_name() and get_timeline_name() to generate trace information, but the Xe KMD implementation of those functions makes use of xe_hw_fence_ctx, which was being set after dma_fence_init(). So just invert the order to fix the crash. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: José Roberto de Souza Reviewed-by: Matthew Brost Link: https://patchwork.freedesktop.org/patch/msgid/20240124171830.95774-1-jose.souza@intel.com --- drivers/gpu/drm/xe/xe_hw_fence.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c index a6094c81f2ad0f..a5de3e7b0bd6ab 100644 --- a/drivers/gpu/drm/xe/xe_hw_fence.c +++ b/drivers/gpu/drm/xe/xe_hw_fence.c @@ -217,13 +217,13 @@ struct xe_hw_fence *xe_hw_fence_create(struct xe_hw_fence_ctx *ctx, if (!fence) return ERR_PTR(-ENOMEM); - dma_fence_init(&fence->dma, &xe_hw_fence_ops, &ctx->irq->lock, - ctx->dma_fence_ctx, ctx->next_seqno++); - fence->ctx = ctx; fence->seqno_map = seqno_map; INIT_LIST_HEAD(&fence->irq_link); + dma_fence_init(&fence->dma, &xe_hw_fence_ops, &ctx->irq->lock, + ctx->dma_fence_ctx, ctx->next_seqno++); + trace_xe_hw_fence_create(fence); return fence; From dc75d03716fe3944210a9381884c6b699fe0de90 Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Mon, 22 Jan 2024 13:01:54 -0800 Subject: [PATCH 123/200] drm/xe/guc: Add more GuC CT states The GuC CT has more than just enabled / disabled states; rather, it has 4.
The 4 states are not-initialized, disabled, stopped, and enabled. Change the code to reflect this. These states will enable proper return codes from functions and therefore enable proper error messages. v2: - s/XE_GUC_CT_STATE_DROP_MESSAGES/XE_GUC_CT_STATE_STOPPED (Michal) - Add assert for CT being initialized (Michal) - Fix kernel doc for CT state enum (Michal) v3: - Kernel doc (Michal) - s/reiecved/received (Michal) - assert CT state not initialized in xe_guc_ct_init (Michal) - add argument xe_guc_ct_set_state to clear g2h (Michal) v4: - Drop clear_outstanding_g2h argument (Michal) v5: - Move xa_destroy outside of fast lock (CI) Cc: Michal Wajdeczko Cc: Tejas Upadhyay Signed-off-by: Matthew Brost Reviewed-by: Michal Wajdeczko Signed-off-by: Rodrigo Vivi Link: https://patchwork.freedesktop.org/patch/msgid/20240122210156.1517444-2-matthew.brost@intel.com --- drivers/gpu/drm/xe/xe_guc.c | 4 +- drivers/gpu/drm/xe/xe_guc_ct.c | 80 ++++++++++++++++++++++------ drivers/gpu/drm/xe/xe_guc_ct.h | 8 ++- drivers/gpu/drm/xe/xe_guc_ct_types.h | 18 ++++++- 4 files changed, 88 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 576ff2c1fbb965..fcb8a9efac7049 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -684,7 +684,7 @@ int xe_guc_mmio_send_recv(struct xe_guc *guc, const u32 *request, BUILD_BUG_ON(VF_SW_FLAG_COUNT != MED_VF_SW_FLAG_COUNT); - xe_assert(xe, !guc->ct.enabled); + xe_assert(xe, !xe_guc_ct_enabled(&guc->ct)); xe_assert(xe, len); xe_assert(xe, len <= VF_SW_FLAG_COUNT); xe_assert(xe, len <= MED_VF_SW_FLAG_COUNT); @@ -870,7 +870,7 @@ int xe_guc_stop(struct xe_guc *guc) { int ret; - xe_guc_ct_disable(&guc->ct); + xe_guc_ct_stop(&guc->ct); ret = xe_guc_submit_stop(guc); if (ret) diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index 0c073556909483..d480355e10a0ed 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -166,6 +166,8 @@ int xe_guc_ct_init(struct xe_guc_ct *ct) if (err) return err; + xe_assert(xe, ct->state == XE_GUC_CT_STATE_NOT_INITIALIZED); + ct->state = XE_GUC_CT_STATE_DISABLED; return 0; } @@ -285,12 +287,35 @@ static int guc_ct_control_toggle(struct xe_guc_ct *ct, bool enable) return ret > 0 ? -EPROTO : ret; } +static void xe_guc_ct_set_state(struct xe_guc_ct *ct, + enum xe_guc_ct_state state) +{ + mutex_lock(&ct->lock); /* Serialise dequeue_one_g2h() */ + spin_lock_irq(&ct->fast_lock); /* Serialise CT fast-path */ + + xe_gt_assert(ct_to_gt(ct), ct->g2h_outstanding == 0 || + state == XE_GUC_CT_STATE_STOPPED); + + ct->g2h_outstanding = 0; + ct->state = state; + + spin_unlock_irq(&ct->fast_lock); + + /* + * Lockdep doesn't like this under the fast lock and the destroy only + * needs to be serialized with the send path, which the ct lock provides.
+ */ + xa_destroy(&ct->fence_lookup); + + mutex_unlock(&ct->lock); +} + int xe_guc_ct_enable(struct xe_guc_ct *ct) { struct xe_device *xe = ct_to_xe(ct); int err; - xe_assert(xe, !ct->enabled); + xe_assert(xe, !xe_guc_ct_enabled(ct)); guc_ct_ctb_h2g_init(xe, &ct->ctbs.h2g, &ct->bo->vmap); guc_ct_ctb_g2h_init(xe, &ct->ctbs.g2h, &ct->bo->vmap); @@ -307,12 +332,7 @@ int xe_guc_ct_enable(struct xe_guc_ct *ct) if (err) goto err_out; - mutex_lock(&ct->lock); - spin_lock_irq(&ct->fast_lock); - ct->g2h_outstanding = 0; - ct->enabled = true; - spin_unlock_irq(&ct->fast_lock); - mutex_unlock(&ct->lock); + xe_guc_ct_set_state(ct, XE_GUC_CT_STATE_ENABLED); smp_mb(); wake_up_all(&ct->wq); @@ -326,15 +346,26 @@ int xe_guc_ct_enable(struct xe_guc_ct *ct) return err; } +/** + * xe_guc_ct_disable - Set GuC to disabled state + * @ct: the &xe_guc_ct + * + * Set GuC CT to disabled state, no outstanding g2h expected in this transition + */ void xe_guc_ct_disable(struct xe_guc_ct *ct) { - mutex_lock(&ct->lock); /* Serialise dequeue_one_g2h() */ - spin_lock_irq(&ct->fast_lock); /* Serialise CT fast-path */ - ct->enabled = false; /* Finally disable CT communication */ - spin_unlock_irq(&ct->fast_lock); - mutex_unlock(&ct->lock); + xe_guc_ct_set_state(ct, XE_GUC_CT_STATE_DISABLED); +} - xa_destroy(&ct->fence_lookup); +/** + * xe_guc_ct_stop - Set GuC to stopped state + * @ct: the &xe_guc_ct + * + * Set GuC CT to stopped state and clear any outstanding g2h + */ +void xe_guc_ct_stop(struct xe_guc_ct *ct) +{ + xe_guc_ct_set_state(ct, XE_GUC_CT_STATE_STOPPED); } static bool h2g_has_room(struct xe_guc_ct *ct, u32 cmd_len) @@ -509,6 +540,7 @@ static int __guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, u16 seqno; int ret; + xe_assert(xe, ct->state != XE_GUC_CT_STATE_NOT_INITIALIZED); xe_assert(xe, !g2h_len || !g2h_fence); xe_assert(xe, !num_g2h || !g2h_fence); xe_assert(xe, !g2h_len || num_g2h); @@ -520,11 +552,18 @@ static int __guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, goto out; } - if (unlikely(!ct->enabled)) { + if (ct->state == XE_GUC_CT_STATE_DISABLED) { ret = -ENODEV; goto out; } + if (ct->state == XE_GUC_CT_STATE_STOPPED) { + ret = -ECANCELED; + goto out; + } + + xe_assert(xe, xe_guc_ct_enabled(ct)); + if (g2h_fence) { g2h_len = GUC_CTB_HXG_MSG_MAX_LEN; num_g2h = 1; @@ -712,7 +751,8 @@ static bool retry_failure(struct xe_guc_ct *ct, int ret) return false; #define ct_alive(ct) \ - (ct->enabled && !ct->ctbs.h2g.info.broken && !ct->ctbs.g2h.info.broken) + (xe_guc_ct_enabled(ct) && !ct->ctbs.h2g.info.broken && \ + !ct->ctbs.g2h.info.broken) if (!wait_event_interruptible_timeout(ct->wq, ct_alive(ct), HZ * 5)) return false; #undef ct_alive @@ -1031,14 +1071,20 @@ static int g2h_read(struct xe_guc_ct *ct, u32 *msg, bool fast_path) u32 action; u32 *hxg; + xe_assert(xe, ct->state != XE_GUC_CT_STATE_NOT_INITIALIZED); lockdep_assert_held(&ct->fast_lock); - if (!ct->enabled) + if (ct->state == XE_GUC_CT_STATE_DISABLED) return -ENODEV; + if (ct->state == XE_GUC_CT_STATE_STOPPED) + return -ECANCELED; + if (g2h->info.broken) return -EPIPE; + xe_assert(xe, xe_guc_ct_enabled(ct)); + /* Calculate DW available to read */ tail = desc_read(xe, g2h, tail); avail = tail - g2h->info.head; @@ -1340,7 +1386,7 @@ struct xe_guc_ct_snapshot *xe_guc_ct_snapshot_capture(struct xe_guc_ct *ct, return NULL; } - if (ct->enabled) { + if (xe_guc_ct_enabled(ct)) { snapshot->ct_enabled = true; snapshot->g2h_outstanding = READ_ONCE(ct->g2h_outstanding); guc_ctb_snapshot_capture(xe, &ct->ctbs.h2g, diff --git 
a/drivers/gpu/drm/xe/xe_guc_ct.h b/drivers/gpu/drm/xe/xe_guc_ct.h index 9ecb67db8ec404..5083e099064f4b 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.h +++ b/drivers/gpu/drm/xe/xe_guc_ct.h @@ -13,6 +13,7 @@ struct drm_printer; int xe_guc_ct_init(struct xe_guc_ct *ct); int xe_guc_ct_enable(struct xe_guc_ct *ct); void xe_guc_ct_disable(struct xe_guc_ct *ct); +void xe_guc_ct_stop(struct xe_guc_ct *ct); void xe_guc_ct_fast_path(struct xe_guc_ct *ct); struct xe_guc_ct_snapshot * @@ -22,9 +23,14 @@ void xe_guc_ct_snapshot_print(struct xe_guc_ct_snapshot *snapshot, void xe_guc_ct_snapshot_free(struct xe_guc_ct_snapshot *snapshot); void xe_guc_ct_print(struct xe_guc_ct *ct, struct drm_printer *p, bool atomic); +static inline bool xe_guc_ct_enabled(struct xe_guc_ct *ct) +{ + return ct->state == XE_GUC_CT_STATE_ENABLED; +} + static inline void xe_guc_ct_irq_handler(struct xe_guc_ct *ct) { - if (!ct->enabled) + if (!xe_guc_ct_enabled(ct)) return; wake_up_all(&ct->wq); diff --git a/drivers/gpu/drm/xe/xe_guc_ct_types.h b/drivers/gpu/drm/xe/xe_guc_ct_types.h index 68c871052341fa..d29144c9f20bbb 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct_types.h +++ b/drivers/gpu/drm/xe/xe_guc_ct_types.h @@ -72,6 +72,20 @@ struct xe_guc_ct_snapshot { struct guc_ctb_snapshot h2g; }; +/** + * enum xe_guc_ct_state - CT state + * @XE_GUC_CT_STATE_NOT_INITIALIZED: CT not initialized, messages not expected in this state + * @XE_GUC_CT_STATE_DISABLED: CT disabled, messages not expected in this state + * @XE_GUC_CT_STATE_STOPPED: CT stopped, drop messages without errors + * @XE_GUC_CT_STATE_ENABLED: CT enabled, messages sent / received in this state + */ +enum xe_guc_ct_state { + XE_GUC_CT_STATE_NOT_INITIALIZED = 0, + XE_GUC_CT_STATE_DISABLED, + XE_GUC_CT_STATE_STOPPED, + XE_GUC_CT_STATE_ENABLED, +}; + /** * struct xe_guc_ct - GuC command transport (CT) layer * @@ -96,8 +110,8 @@ struct xe_guc_ct { u32 g2h_outstanding; /** @g2h_worker: worker to process G2H messages */ struct work_struct g2h_worker; - /** @enabled: CT enabled */ - bool enabled; + /** @state: CT state */ + enum xe_guc_ct_state state; /** @fence_seqno: G2H fence seqno - 16 bits used by CT */ u32 fence_seqno; /** @fence_lookup: G2H fence lookup */ From 83a7173bacc9eb627b04e23c3d15cbe0fa656497 Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Mon, 22 Jan 2024 13:01:55 -0800 Subject: [PATCH 124/200] drm/xe: Move TLB invalidation reset before HW reset This is a software reset which can be done immediately after stopping the UC. Signed-off-by: Matthew Brost Reviewed-by: Michal Wajdeczko Signed-off-by: Rodrigo Vivi Link: https://patchwork.freedesktop.org/patch/msgid/20240122210156.1517444-3-matthew.brost@intel.com --- drivers/gpu/drm/xe/xe_gt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c index 47cb87f599e463..675a2927a19ef1 100644 --- a/drivers/gpu/drm/xe/xe_gt.c +++ b/drivers/gpu/drm/xe/xe_gt.c @@ -638,12 +638,12 @@ static int gt_reset(struct xe_gt *gt) if (err) goto err_out; + xe_gt_tlb_invalidation_reset(gt); + err = do_gt_reset(gt); if (err) goto err_out; - xe_gt_tlb_invalidation_reset(gt); - err = do_gt_restart(gt); if (err) goto err_out; From d688b86a290ecb9ca1a413f01da056be4b7a4914 Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Mon, 22 Jan 2024 13:01:56 -0800 Subject: [PATCH 125/200] drm/xe/guc: Flush G2H handler when turning off CTs Make sure G2H handler is not running when changing the CT state to drop messages or disabled. 
This will help prevent races in the code ensuring that G2H are not being processed after changing the state. v2: - s/flush_g2h_handler/stop_g2h_handler (Michal) Signed-off-by: Matthew Brost Reviewed-by: Michal Wajdeczko Signed-off-by: Rodrigo Vivi [Rodrigo remove the extra line while pushing] Link: https://patchwork.freedesktop.org/patch/msgid/20240122210156.1517444-4-matthew.brost@intel.com --- drivers/gpu/drm/xe/xe_guc_ct.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index d480355e10a0ed..f3d356383cedfd 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -346,26 +346,34 @@ int xe_guc_ct_enable(struct xe_guc_ct *ct) return err; } +static void stop_g2h_handler(struct xe_guc_ct *ct) +{ + cancel_work_sync(&ct->g2h_worker); +} + /** * xe_guc_ct_disable - Set GuC to disabled state * @ct: the &xe_guc_ct * - * Set GuC CT to disabled state, no outstanding g2h expected in this transition + * Set GuC CT to disabled state and stop g2h handler. No outstanding g2h expected + * in this transition. */ void xe_guc_ct_disable(struct xe_guc_ct *ct) { xe_guc_ct_set_state(ct, XE_GUC_CT_STATE_DISABLED); + stop_g2h_handler(ct); } /** * xe_guc_ct_stop - Set GuC to stopped state * @ct: the &xe_guc_ct * - * Set GuC CT to stopped state and clear any outstanding g2h + * Set GuC CT to stopped state, stop g2h handler, and clear any outstanding g2h */ void xe_guc_ct_stop(struct xe_guc_ct *ct) { xe_guc_ct_set_state(ct, XE_GUC_CT_STATE_STOPPED); + stop_g2h_handler(ct); } static bool h2g_has_room(struct xe_guc_ct *ct, u32 cmd_len) From 348769d1cbfab409b9ac21c653dd4db609760175 Mon Sep 17 00:00:00 2001 From: Fei Yang Date: Wed, 24 Jan 2024 22:52:45 -0800 Subject: [PATCH 126/200] drm/xe: correct the assertion for number of PTEs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit While one MI_STORE_DATA_IMM can take no more than 0x1fe qwords, the size of the pgtable can be 512 entries. Fixes: 43d48379c939 ("drm/xe: correct the calculation of remaining size") Cc: Matt Roper Signed-off-by: Fei Yang Tested-by: José Roberto de Souza Reviewed-by: Matt Roper Signed-off-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20240125065245.1204731-2-fei.yang@intel.com --- drivers/gpu/drm/xe/xe_migrate.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c index 7abf15546ced00..9ab004871f9a62 100644 --- a/drivers/gpu/drm/xe/xe_migrate.c +++ b/drivers/gpu/drm/xe/xe_migrate.c @@ -71,6 +71,7 @@ struct xe_migrate { #define NUM_KERNEL_PDE 17 #define NUM_PT_SLOTS 32 #define LEVEL0_PAGE_TABLE_ENCODE_SIZE SZ_2M +#define MAX_NUM_PTE 512 /* * Although MI_STORE_DATA_IMM's "length" field is 10-bits, 0x3FE is the largest @@ -1107,7 +1108,7 @@ static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs, * This shouldn't be possible in practice.. might change when 16K * pages are used. Hence the assert. 
*/ - xe_tile_assert(tile, update->qwords <= MAX_PTE_PER_SDI); + xe_tile_assert(tile, update->qwords < MAX_NUM_PTE); if (!ppgtt_ofs) ppgtt_ofs = xe_migrate_vram_ofs(tile_to_xe(tile), xe_bo_addr(update->pt_bo, 0, From 9f5971bdf78e0937206556534247243ad56cd735 Mon Sep 17 00:00:00 2001 From: Matt Roper Date: Fri, 26 Jan 2024 14:06:14 -0800 Subject: [PATCH 127/200] drm/xe: Grab mem_access when disabling C6 on skip_guc_pc platforms If skip_guc_pc is set for a platform, C6 is disabled directly without acquiring a mem_access reference, triggering an assertion inside xe_gt_idle_disable_c6. Fixes: 975e4a3795d4 ("drm/xe: Manually setup C6 when skip_guc_pc is set") Cc: Rodrigo Vivi Cc: Vinay Belgaumkar Signed-off-by: Matt Roper Reviewed-by: Matthew Auld Link: https://patchwork.freedesktop.org/patch/msgid/20240126220613.865939-2-matthew.d.roper@intel.com --- drivers/gpu/drm/xe/xe_guc_pc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c index f71085228cb339..d91702592520af 100644 --- a/drivers/gpu/drm/xe/xe_guc_pc.c +++ b/drivers/gpu/drm/xe/xe_guc_pc.c @@ -963,7 +963,9 @@ void xe_guc_pc_fini(struct xe_guc_pc *pc) struct xe_device *xe = pc_to_xe(pc); if (xe->info.skip_guc_pc) { + xe_device_mem_access_get(xe); xe_gt_idle_disable_c6(pc_to_gt(pc)); + xe_device_mem_access_put(xe); return; } From 20485e3a810c480cef60caf53988619f61127e7b Mon Sep 17 00:00:00 2001 From: Badal Nilawar Date: Sat, 27 Jan 2024 22:20:40 +0530 Subject: [PATCH 128/200] drm/hwmon: Fix abi doc warnings This fixes warnings in xe, i915 hwmon docs: Warning: /sys/devices/.../hwmon/hwmon/curr1_crit is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:35 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:52 Warning: /sys/devices/.../hwmon/hwmon/energy1_input is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:54 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:65 Warning: /sys/devices/.../hwmon/hwmon/in0_input is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:46 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:0 Warning: /sys/devices/.../hwmon/hwmon/power1_crit is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:22 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:39 Warning: /sys/devices/.../hwmon/hwmon/power1_max is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:0 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:8 Warning: /sys/devices/.../hwmon/hwmon/power1_max_interval is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:62 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:30 Warning: /sys/devices/.../hwmon/hwmon/power1_rated_max is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:14 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:22 Use a path containing the driver name to differentiate the documentation of each entry. 
Fixes: fb1b70607f73 ("drm/xe/hwmon: Expose power attributes") Fixes: 92d44a422d0d ("drm/xe/hwmon: Expose card reactive critical power") Fixes: fbcdc9d3bf58 ("drm/xe/hwmon: Expose input voltage attribute") Fixes: 71d0a32524f9 ("drm/xe/hwmon: Expose hwmon energy attribute") Fixes: 4446fcf220ce ("drm/xe/hwmon: Expose power1_max_interval") Reported-by: Stephen Rothwell Closes: https://lore.kernel.org/all/20240125113345.291118ff@canb.auug.org.au/ Signed-off-by: Badal Nilawar Reviewed-by: Ashutosh Dixit Reviewed-by: Lucas De Marchi Acked-by: Rodrigo Vivi Acked-by: Jani Nikula Signed-off-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240127165040.2348009-1-badal.nilawar@intel.com --- .../ABI/testing/sysfs-driver-intel-i915-hwmon | 14 +++++++------- .../ABI/testing/sysfs-driver-intel-xe-hwmon | 14 +++++++------- 2 files changed, 14 insertions(+), 14 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon b/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon index 8d7d8f05f6cd0a..92fe7c5c5ac1d1 100644 --- a/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon +++ b/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon @@ -1,4 +1,4 @@ -What: /sys/devices/.../hwmon/hwmon/in0_input +What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon/in0_input Date: February 2023 KernelVersion: 6.2 Contact: intel-gfx@lists.freedesktop.org @@ -6,7 +6,7 @@ Description: RO. Current Voltage in millivolt. Only supported for particular Intel i915 graphics platforms. -What: /sys/devices/.../hwmon/hwmon/power1_max +What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon/power1_max Date: February 2023 KernelVersion: 6.2 Contact: intel-gfx@lists.freedesktop.org @@ -20,7 +20,7 @@ Description: RW. Card reactive sustained (PL1/Tau) power limit in microwatts. Only supported for particular Intel i915 graphics platforms. -What: /sys/devices/.../hwmon/hwmon/power1_rated_max +What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon/power1_rated_max Date: February 2023 KernelVersion: 6.2 Contact: intel-gfx@lists.freedesktop.org @@ -28,7 +28,7 @@ Description: RO. Card default power limit (default TDP setting). Only supported for particular Intel i915 graphics platforms. -What: /sys/devices/.../hwmon/hwmon/power1_max_interval +What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon/power1_max_interval Date: February 2023 KernelVersion: 6.2 Contact: intel-gfx@lists.freedesktop.org @@ -37,7 +37,7 @@ Description: RW. Sustained power limit interval (Tau in PL1/Tau) in Only supported for particular Intel i915 graphics platforms. -What: /sys/devices/.../hwmon/hwmon/power1_crit +What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon/power1_crit Date: February 2023 KernelVersion: 6.2 Contact: intel-gfx@lists.freedesktop.org @@ -50,7 +50,7 @@ Description: RW. Card reactive critical (I1) power limit in microwatts. Only supported for particular Intel i915 graphics platforms. -What: /sys/devices/.../hwmon/hwmon/curr1_crit +What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon/curr1_crit Date: February 2023 KernelVersion: 6.2 Contact: intel-gfx@lists.freedesktop.org @@ -63,7 +63,7 @@ Description: RW. Card reactive critical (I1) power limit in milliamperes. Only supported for particular Intel i915 graphics platforms. 
-What: /sys/devices/.../hwmon/hwmon/energy1_input +What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon/energy1_input Date: February 2023 KernelVersion: 6.2 Contact: intel-gfx@lists.freedesktop.org diff --git a/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon b/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon index 8c321bc9dc0440..023fd82de3f70a 100644 --- a/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon +++ b/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon @@ -1,4 +1,4 @@ -What: /sys/devices/.../hwmon/hwmon/power1_max +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon/power1_max Date: September 2023 KernelVersion: 6.5 Contact: intel-xe@lists.freedesktop.org @@ -12,7 +12,7 @@ Description: RW. Card reactive sustained (PL1) power limit in microwatts. Only supported for particular Intel xe graphics platforms. -What: /sys/devices/.../hwmon/hwmon/power1_rated_max +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon/power1_rated_max Date: September 2023 KernelVersion: 6.5 Contact: intel-xe@lists.freedesktop.org @@ -20,7 +20,7 @@ Description: RO. Card default power limit (default TDP setting). Only supported for particular Intel xe graphics platforms. -What: /sys/devices/.../hwmon/hwmon/power1_crit +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon/power1_crit Date: September 2023 KernelVersion: 6.5 Contact: intel-xe@lists.freedesktop.org @@ -33,7 +33,7 @@ Description: RW. Card reactive critical (I1) power limit in microwatts. Only supported for particular Intel xe graphics platforms. -What: /sys/devices/.../hwmon/hwmon/curr1_crit +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon/curr1_crit Date: September 2023 KernelVersion: 6.5 Contact: intel-xe@lists.freedesktop.org @@ -44,7 +44,7 @@ Description: RW. Card reactive critical (I1) power limit in milliamperes. the operating frequency if the power averaged over a window exceeds this limit. -What: /sys/devices/.../hwmon/hwmon/in0_input +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon/in0_input Date: September 2023 KernelVersion: 6.5 Contact: intel-xe@lists.freedesktop.org @@ -52,7 +52,7 @@ Description: RO. Current Voltage in millivolt. Only supported for particular Intel xe graphics platforms. -What: /sys/devices/.../hwmon/hwmon/energy1_input +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon/energy1_input Date: September 2023 KernelVersion: 6.5 Contact: intel-xe@lists.freedesktop.org @@ -60,7 +60,7 @@ Description: RO. Energy input of device in microjoules. Only supported for particular Intel xe graphics platforms. -What: /sys/devices/.../hwmon/hwmon/power1_max_interval +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon/power1_max_interval Date: October 2023 KernelVersion: 6.6 Contact: intel-xe@lists.freedesktop.org From aeacfd2dbebb97a36a7221c5ec694d04480bcd30 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Mon, 29 Jan 2024 13:45:10 -0800 Subject: [PATCH 129/200] drm/xe/xe2: Enable has_usm When xe2 support started to be added, USM was still not functional. This has changed, and now USM can be enabled for xe2. Remove the FIXME leftover to allow VMs to be created with DRM_XE_VM_CREATE_FLAG_FAULT_MODE.
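For context, fault-mode VMs are created through the VM create ioctl; a minimal userspace sketch, assuming the uapi names as of this series (fault mode also requires LR mode) and with the header path depending on the libdrm install:

	#include <stdint.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <drm/xe_drm.h>

	/* Sketch: create a fault-mode (recoverable page fault) VM on an open xe fd. */
	static int ex_create_fault_mode_vm(int xe_fd, uint32_t *vm_id)
	{
		struct drm_xe_vm_create create;

		memset(&create, 0, sizeof(create));
		create.flags = DRM_XE_VM_CREATE_FLAG_LR_MODE |
			       DRM_XE_VM_CREATE_FLAG_FAULT_MODE;

		if (ioctl(xe_fd, DRM_IOCTL_XE_VM_CREATE, &create))
			return -1;

		*vm_id = create.vm_id;
		return 0;
	}

Before this change, such a create would be rejected on xe2 because has_usm was 0.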
Signed-off-by: Lucas De Marchi Reviewed-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20240129214510.123829-1-lucas.demarchi@intel.com --- drivers/gpu/drm/xe/xe_pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c index 6664d1b2efdb9a..4b7999b24ff754 100644 --- a/drivers/gpu/drm/xe/xe_pci.c +++ b/drivers/gpu/drm/xe/xe_pci.c @@ -165,7 +165,7 @@ static const struct xe_graphics_desc graphics_xelpg = { .has_asid = 1, \ .has_flat_ccs = 1, \ .has_range_tlb_invalidation = 1, \ - .has_usm = 0 /* FIXME: implementation missing */, \ + .has_usm = 1, \ .va_bits = 48, \ .vm_max_level = 4, \ .hw_engine_mask = \ From cd43106c9b0506504b6dea3703d2d31c80b1d592 Mon Sep 17 00:00:00 2001 From: Karthik Poosa Date: Thu, 25 Jan 2024 22:26:52 +0530 Subject: [PATCH 130/200] drm/xe/guc: Reduce a print from warn to debug Reduce the print from warn to debug to avoid an unnecessary warning message in dmesg: the firmware loading logic already has the right printk priority level when checking the firmware version. Fixes: c5a06c9169f3 ("drm/xe/guc: Enable WA 14018913170") Suggested-by: Lucas De Marchi Signed-off-by: Karthik Poosa Reviewed-by: Stuart Summers Link: https://patchwork.freedesktop.org/patch/msgid/20240125165652.3764711-1-karthik.poosa@intel.com [ slightly reword debug and commit messages ] Signed-off-by: Lucas De Marchi --- drivers/gpu/drm/xe/xe_guc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index fcb8a9efac7049..868208a3982951 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -174,8 +174,8 @@ static u32 guc_ctl_wa_flags(struct xe_guc *guc) if (GUC_VER(version->major, version->minor, version->patch) >= GUC_VER(70, 7, 0)) flags |= GUC_WA_ENABLE_TSC_CHECK_ON_RC6; else - drm_warn(&xe->drm, "can't apply WA 14018913170, GUC version expected >= 70.7.0, found %u %u %u\n", - version->major, version->minor, version->patch); + drm_dbg(&xe->drm, "Skip WA 14018913170: GUC version expected >= 70.7.0, found %u.%u.%u\n", + version->major, version->minor, version->patch); } return flags; From 8945a46a7cba19054a911fd9c33f1fb34b623359 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Roberto=20de=20Souza?= Date: Tue, 30 Jan 2024 05:22:49 -0800 Subject: [PATCH 131/200] drm/xe: Use function to emit PIPE_CONTROL MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This reduces code duplication in xe_ring_ops.
v2: - fix flags of emit_pipe_imm_ggtt() - reduce to only one function v3: - fix emit_pipe_imm_ggtt() stall_only check Cc: Matt Roper Reviewed-by: Matt Roper Signed-off-by: José Roberto de Souza Link: https://patchwork.freedesktop.org/patch/msgid/20240130132249.8615-1-jose.souza@intel.com --- drivers/gpu/drm/xe/xe_ring_ops.c | 57 ++++++++++++++------------------ 1 file changed, 24 insertions(+), 33 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c index 1e4c06eacd9842..d5e9621428ef38 100644 --- a/drivers/gpu/drm/xe/xe_ring_ops.c +++ b/drivers/gpu/drm/xe/xe_ring_ops.c @@ -113,6 +113,19 @@ static int emit_flush_invalidate(u32 flag, u32 *dw, int i) return i; } +static int +emit_pipe_control(u32 *dw, int i, u32 bit_group_0, u32 bit_group_1, u32 offset, u32 value) +{ + dw[i++] = GFX_OP_PIPE_CONTROL(6) | bit_group_0; + dw[i++] = bit_group_1; + dw[i++] = offset; + dw[i++] = 0; + dw[i++] = value; + dw[i++] = 0; + + return i; +} + static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw, int i) { @@ -131,14 +144,7 @@ static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw, flags &= ~mask_flags; - dw[i++] = GFX_OP_PIPE_CONTROL(6); - dw[i++] = flags; - dw[i++] = LRC_PPHWSP_SCRATCH_ADDR; - dw[i++] = 0; - dw[i++] = 0; - dw[i++] = 0; - - return i; + return emit_pipe_control(dw, i, 0, flags, LRC_PPHWSP_SCRATCH_ADDR, 0); } static int emit_store_imm_ppgtt_posted(u64 addr, u64 value, @@ -174,14 +180,7 @@ static int emit_render_cache_flush(struct xe_sched_job *job, u32 *dw, int i) else if (job->q->class == XE_ENGINE_CLASS_COMPUTE) flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS; - dw[i++] = GFX_OP_PIPE_CONTROL(6) | PIPE_CONTROL0_HDC_PIPELINE_FLUSH; - dw[i++] = flags; - dw[i++] = 0; - dw[i++] = 0; - dw[i++] = 0; - dw[i++] = 0; - - return i; + return emit_pipe_control(dw, i, PIPE_CONTROL0_HDC_PIPELINE_FLUSH, flags, 0, 0); } static int emit_pipe_control_to_ring_end(struct xe_hw_engine *hwe, u32 *dw, int i) @@ -189,14 +188,9 @@ static int emit_pipe_control_to_ring_end(struct xe_hw_engine *hwe, u32 *dw, int if (hwe->class != XE_ENGINE_CLASS_RENDER) return i; - if (XE_WA(hwe->gt, 16020292621)) { - dw[i++] = GFX_OP_PIPE_CONTROL(6); - dw[i++] = PIPE_CONTROL_LRI_POST_SYNC; - dw[i++] = RING_NOPID(hwe->mmio_base).addr; - dw[i++] = 0; - dw[i++] = 0; - dw[i++] = 0; - } + if (XE_WA(hwe->gt, 16020292621)) + i = emit_pipe_control(dw, i, 0, PIPE_CONTROL_LRI_POST_SYNC, + RING_NOPID(hwe->mmio_base).addr, 0); return i; } @@ -204,16 +198,13 @@ static int emit_pipe_control_to_ring_end(struct xe_hw_engine *hwe, u32 *dw, int static int emit_pipe_imm_ggtt(u32 addr, u32 value, bool stall_only, u32 *dw, int i) { - dw[i++] = GFX_OP_PIPE_CONTROL(6); - dw[i++] = (stall_only ? PIPE_CONTROL_CS_STALL : - PIPE_CONTROL_FLUSH_ENABLE | PIPE_CONTROL_CS_STALL) | - PIPE_CONTROL_GLOBAL_GTT_IVB | PIPE_CONTROL_QW_WRITE; - dw[i++] = addr; - dw[i++] = 0; - dw[i++] = value; - dw[i++] = 0; /* We're thrashing one extra dword. 
*/ + u32 flags = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_GLOBAL_GTT_IVB | + PIPE_CONTROL_QW_WRITE; - return i; + if (!stall_only) + flags |= PIPE_CONTROL_FLUSH_ENABLE; + + return emit_pipe_control(dw, i, 0, flags, addr, value); } static u32 get_ppgtt_flag(struct xe_sched_job *job) From 5746eaaa805e16c49661ee79ce520773d63e3919 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Roberto=20de=20Souza?= Date: Tue, 30 Jan 2024 05:56:47 -0800 Subject: [PATCH 132/200] drm/xe: Add functions to convert regular address to canonical address and back MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Some instructions require a canonical address, like MI_BATCH_BUFFER_START (UMDs must call xe_exec with a canonical address for Xe2+). So add functions to convert a regular address to a canonical address and back; the first user of these functions will be added in the next patch. v3: - inline removed - rename highest_address_bit_get() to ppgtt_msb_get() v4: - use xe->info.va_bits instead of xe->info.dma_mask_size BSpec: 47626 Cc: Matt Roper Cc: Rodrigo Vivi Cc: Maarten Lankhorst Cc: Stuart Summers Cc: Jani Nikula Signed-off-by: José Roberto de Souza Reviewed-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20240130135648.30211-1-jose.souza@intel.com --- drivers/gpu/drm/xe/xe_device.c | 10 ++++++++++ drivers/gpu/drm/xe/xe_device.h | 3 +++ 2 files changed, 13 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index 6faa7865b1aabc..8e8567e06f0bc1 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -747,3 +747,13 @@ void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p) drm_printf(p, "\tCS reference clock: %u\n", gt->info.reference_clock); } } + +u64 xe_device_canonicalize_addr(struct xe_device *xe, u64 address) +{ + return sign_extend64(address, xe->info.va_bits - 1); +} + +u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64 address) +{ + return address & GENMASK_ULL(xe->info.va_bits - 1, 0); +} diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h index 270124da1e00e6..462f59e902b12e 100644 --- a/drivers/gpu/drm/xe/xe_device.h +++ b/drivers/gpu/drm/xe/xe_device.h @@ -177,4 +177,7 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size); void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p); +u64 xe_device_canonicalize_addr(struct xe_device *xe, u64 address); +u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64 address); + #endif From be7d51c5b4688efbd8496ad97dbdd01a41e52d37 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Roberto=20de=20Souza?= Date: Tue, 30 Jan 2024 05:56:48 -0800 Subject: [PATCH 133/200] drm/xe: Add batch buffer addresses to devcoredump MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Those addresses are necessary for Mesa tools to know where in the VM the batch buffers are, so they can parse and print the instructions in a human-readable form.
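To make the conversion concrete, a minimal sketch assuming va_bits = 48, mirroring the two helpers added in the previous patch: canonical form replicates bit 47 into bits 63:48, so 0x0000800000000000 canonicalizes to 0xffff800000000000, and masking the low 48 bits recovers the original.

	#include <linux/bitops.h>
	#include <linux/bits.h>
	#include <linux/types.h>

	#define EX_VA_BITS 48	/* assumption; the driver uses xe->info.va_bits */

	static u64 ex_canonicalize(u64 addr)
	{
		/* Sign-extend from the top VA bit into all higher bits. */
		return sign_extend64(addr, EX_VA_BITS - 1);
	}

	static u64 ex_uncanonicalize(u64 addr)
	{
		/* Keep only the low VA bits. */
		return addr & GENMASK_ULL(EX_VA_BITS - 1, 0);
	}

The snapshot code below stores the uncanonicalized form so tools see plain VM offsets.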
Cc: Rodrigo Vivi Cc: Maarten Lankhorst Reviewed-by: Rodrigo Vivi Signed-off-by: José Roberto de Souza Link: https://patchwork.freedesktop.org/patch/msgid/20240130135648.30211-2-jose.souza@intel.com --- drivers/gpu/drm/xe/xe_devcoredump.c | 5 +++ drivers/gpu/drm/xe/xe_devcoredump_types.h | 3 ++ drivers/gpu/drm/xe/xe_sched_job.c | 38 +++++++++++++++++++++++ drivers/gpu/drm/xe/xe_sched_job.h | 5 +++ drivers/gpu/drm/xe/xe_sched_job_types.h | 5 +++ 5 files changed, 56 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c index e701f0d07b6767..08d3f6cb72292b 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump.c +++ b/drivers/gpu/drm/xe/xe_devcoredump.c @@ -96,6 +96,9 @@ static ssize_t xe_devcoredump_read(char *buffer, loff_t offset, xe_guc_ct_snapshot_print(coredump->snapshot.ct, &p); xe_guc_exec_queue_snapshot_print(coredump->snapshot.ge, &p); + drm_printf(&p, "\n**** Job ****\n"); + xe_sched_job_snapshot_print(coredump->snapshot.job, &p); + drm_printf(&p, "\n**** HW Engines ****\n"); for (i = 0; i < XE_NUM_HW_ENGINES; i++) if (coredump->snapshot.hwe[i]) @@ -116,6 +119,7 @@ static void xe_devcoredump_free(void *data) xe_guc_ct_snapshot_free(coredump->snapshot.ct); xe_guc_exec_queue_snapshot_free(coredump->snapshot.ge); + xe_sched_job_snapshot_free(coredump->snapshot.job); for (i = 0; i < XE_NUM_HW_ENGINES; i++) if (coredump->snapshot.hwe[i]) xe_hw_engine_snapshot_free(coredump->snapshot.hwe[i]); @@ -155,6 +159,7 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump, coredump->snapshot.ct = xe_guc_ct_snapshot_capture(&guc->ct, true); coredump->snapshot.ge = xe_guc_exec_queue_snapshot_capture(job); + coredump->snapshot.job = xe_sched_job_snapshot_capture(job); for_each_hw_engine(hwe, q->gt, id) { if (hwe->class != q->hwe->class || diff --git a/drivers/gpu/drm/xe/xe_devcoredump_types.h b/drivers/gpu/drm/xe/xe_devcoredump_types.h index 50106efcbc29d7..d259119b2c9802 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump_types.h +++ b/drivers/gpu/drm/xe/xe_devcoredump_types.h @@ -31,8 +31,11 @@ struct xe_devcoredump_snapshot { struct xe_guc_ct_snapshot *ct; /** @ge: Guc Engine snapshot */ struct xe_guc_submit_exec_queue_snapshot *ge; + /** @hwe: HW Engine snapshot array */ struct xe_hw_engine_snapshot *hwe[XE_NUM_HW_ENGINES]; + /** @job: Snapshot of job state */ + struct xe_sched_job_snapshot *job; }; /** diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c index 01106a1156ad82..cde1407867db6c 100644 --- a/drivers/gpu/drm/xe/xe_sched_job.c +++ b/drivers/gpu/drm/xe/xe_sched_job.c @@ -278,3 +278,41 @@ int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm) return drm_sched_job_add_dependency(&job->drm, fence); } + +struct xe_sched_job_snapshot * +xe_sched_job_snapshot_capture(struct xe_sched_job *job) +{ + struct xe_exec_queue *q = job->q; + struct xe_device *xe = q->gt->tile->xe; + struct xe_sched_job_snapshot *snapshot; + size_t len = sizeof(*snapshot) + (sizeof(u64) * q->width); + u16 i; + + snapshot = kzalloc(len, GFP_ATOMIC); + if (!snapshot) + return NULL; + + snapshot->batch_addr_len = q->width; + for (i = 0; i < q->width; i++) + snapshot->batch_addr[i] = xe_device_uncanonicalize_addr(xe, job->batch_addr[i]); + + return snapshot; +} + +void xe_sched_job_snapshot_free(struct xe_sched_job_snapshot *snapshot) +{ + kfree(snapshot); +} + +void +xe_sched_job_snapshot_print(struct xe_sched_job_snapshot *snapshot, + struct drm_printer *p) +{ + u16 i; + + if (!snapshot) + return; + + for (i = 0; i < 
snapshot->batch_addr_len; i++) + drm_printf(p, "batch_addr[%u]: 0x%016llx\n", i, snapshot->batch_addr[i]); +} diff --git a/drivers/gpu/drm/xe/xe_sched_job.h b/drivers/gpu/drm/xe/xe_sched_job.h index 34f475ba7f5020..f1a660648cf00c 100644 --- a/drivers/gpu/drm/xe/xe_sched_job.h +++ b/drivers/gpu/drm/xe/xe_sched_job.h @@ -8,6 +8,7 @@ #include "xe_sched_job_types.h" +struct drm_printer; struct xe_vm; #define XE_SCHED_HANG_LIMIT 1 @@ -77,4 +78,8 @@ xe_sched_job_add_migrate_flush(struct xe_sched_job *job, u32 flags) bool xe_sched_job_is_migration(struct xe_exec_queue *q); +struct xe_sched_job_snapshot *xe_sched_job_snapshot_capture(struct xe_sched_job *job); +void xe_sched_job_snapshot_free(struct xe_sched_job_snapshot *snapshot); +void xe_sched_job_snapshot_print(struct xe_sched_job_snapshot *snapshot, struct drm_printer *p); + #endif diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h index 8778c34d662030..b1d83da50a53da 100644 --- a/drivers/gpu/drm/xe/xe_sched_job_types.h +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h @@ -43,4 +43,9 @@ struct xe_sched_job { u64 batch_addr[]; }; +struct xe_sched_job_snapshot { + u16 batch_addr_len; + u64 batch_addr[]; +}; + #endif From d1df9bfbf68c65418f30917f406b6d5bd597714e Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Wed, 24 Jan 2024 15:44:13 -0800 Subject: [PATCH 134/200] drm/xe: Only allow 1 ufence per exec / bind IOCTL MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The way exec ufences are coded only 1 ufence per IOCTL will be signaled. It is possible to fix this but for current use cases 1 ufence per IOCTL is sufficient. Enforce a limit of 1 ufence per IOCTL (both exec and bind to be uniform). v2: - Add fixes tag (Thomas) Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Mika Kahola Cc: Thomas Hellström Signed-off-by: Matthew Brost Reviewed-by: Brian Welty Link: https://patchwork.freedesktop.org/patch/msgid/20240124234413.1640825-1-matthew.brost@intel.com --- drivers/gpu/drm/xe/xe_exec.c | 10 +++++++++- drivers/gpu/drm/xe/xe_sync.h | 5 +++++ drivers/gpu/drm/xe/xe_vm.c | 10 +++++++++- 3 files changed, 23 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c index 59fd9bb40c1899..952496c6260dfb 100644 --- a/drivers/gpu/drm/xe/xe_exec.c +++ b/drivers/gpu/drm/xe/xe_exec.c @@ -150,7 +150,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file) u64 addresses[XE_HW_ENGINE_MAX_INSTANCE]; struct drm_gpuvm_exec vm_exec = {.extra.fn = xe_exec_fn}; struct drm_exec *exec = &vm_exec.exec; - u32 i, num_syncs = 0; + u32 i, num_syncs = 0, num_ufence = 0; struct xe_sched_job *job; struct dma_fence *rebind_fence; struct xe_vm *vm; @@ -196,6 +196,14 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file) SYNC_PARSE_FLAG_LR_MODE : 0)); if (err) goto err_syncs; + + if (xe_sync_is_ufence(&syncs[i])) + num_ufence++; + } + + if (XE_IOCTL_DBG(xe, num_ufence > 1)) { + err = -EINVAL; + goto err_syncs; } if (xe_exec_queue_is_parallel(q)) { diff --git a/drivers/gpu/drm/xe/xe_sync.h b/drivers/gpu/drm/xe/xe_sync.h index d284afbe917c19..f43cdcaca6c579 100644 --- a/drivers/gpu/drm/xe/xe_sync.h +++ b/drivers/gpu/drm/xe/xe_sync.h @@ -33,4 +33,9 @@ struct dma_fence * xe_sync_in_fence_get(struct xe_sync_entry *sync, int num_sync, struct xe_exec_queue *q, struct xe_vm *vm); +static inline bool xe_sync_is_ufence(struct xe_sync_entry *sync) +{ + return !!sync->ufence; +} + #endif 
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index d096a8c00bd403..8576535c4b6a65 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -2851,7 +2851,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) struct drm_gpuva_ops **ops = NULL; struct xe_vm *vm; struct xe_exec_queue *q = NULL; - u32 num_syncs; + u32 num_syncs, num_ufence = 0; struct xe_sync_entry *syncs = NULL; struct drm_xe_vm_bind_op *bind_ops; LIST_HEAD(ops_list); @@ -2988,6 +2988,14 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) SYNC_PARSE_FLAG_DISALLOW_USER_FENCE : 0)); if (err) goto free_syncs; + + if (xe_sync_is_ufence(&syncs[num_syncs])) + num_ufence++; + } + + if (XE_IOCTL_DBG(xe, num_ufence > 1)) { + err = -EINVAL; + goto free_syncs; } if (!args->num_binds) { From f01ece502af0e8c6ed5af1facbd88fe9a6160a1e Mon Sep 17 00:00:00 2001 From: Jani Nikula Date: Mon, 22 Jan 2024 12:14:27 +0200 Subject: [PATCH 135/200] drm/xe: move xe_display.[ch] under display/ All the other display related files are under display/ subdirectory, also move xe_display.[ch] there. Sort the build list while at it. Signed-off-by: Jani Nikula Reviewed-by: Rodrigo Vivi Acked-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240122101428.2683468-1-jani.nikula@intel.com --- drivers/gpu/drm/xe/Makefile | 18 +++++++++--------- drivers/gpu/drm/xe/{ => display}/xe_display.c | 0 drivers/gpu/drm/xe/{ => display}/xe_display.h | 0 3 files changed, 9 insertions(+), 9 deletions(-) rename drivers/gpu/drm/xe/{ => display}/xe_display.c (100%) rename drivers/gpu/drm/xe/{ => display}/xe_display.h (100%) diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile index abb2be8268d0df..f017a59bf01f36 100644 --- a/drivers/gpu/drm/xe/Makefile +++ b/drivers/gpu/drm/xe/Makefile @@ -185,17 +185,17 @@ $(obj)/i915-display/%.o: $(srctree)/drivers/gpu/drm/i915/display/%.c FORCE # Display code specific to xe xe-$(CONFIG_DRM_XE_DISPLAY) += \ - xe_display.o \ - display/xe_fb_pin.o \ - display/xe_hdcp_gsc.o \ - display/xe_plane_initial.o \ - display/xe_display_rps.o \ + display/ext/i915_irq.o \ + display/ext/i915_utils.o \ + display/intel_fb_bo.o \ + display/intel_fbdev_fb.o \ + display/xe_display.o \ display/xe_display_misc.o \ + display/xe_display_rps.o \ display/xe_dsb_buffer.o \ - display/intel_fbdev_fb.o \ - display/intel_fb_bo.o \ - display/ext/i915_irq.o \ - display/ext/i915_utils.o + display/xe_fb_pin.o \ + display/xe_hdcp_gsc.o \ + display/xe_plane_initial.o # SOC code shared with i915 xe-$(CONFIG_DRM_XE_DISPLAY) += \ diff --git a/drivers/gpu/drm/xe/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c similarity index 100% rename from drivers/gpu/drm/xe/xe_display.c rename to drivers/gpu/drm/xe/display/xe_display.c diff --git a/drivers/gpu/drm/xe/xe_display.h b/drivers/gpu/drm/xe/display/xe_display.h similarity index 100% rename from drivers/gpu/drm/xe/xe_display.h rename to drivers/gpu/drm/xe/display/xe_display.h From 1e5a4dfe3834dae4b97a3b26d6fb9a632667946a Mon Sep 17 00:00:00 2001 From: Jani Nikula Date: Mon, 22 Jan 2024 12:14:28 +0200 Subject: [PATCH 136/200] drm/xe: drop display/ subdir from include directories There are very few places that need to include anything from under display/. Require the display/ prefix in #include directives, and drop the subdirectory from the header search path. Sort the include lists while at it. 
Signed-off-by: Jani Nikula Reviewed-by: Rodrigo Vivi Acked-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240122101428.2683468-2-jani.nikula@intel.com --- drivers/gpu/drm/xe/Makefile | 1 - drivers/gpu/drm/xe/xe_device.c | 6 +++--- drivers/gpu/drm/xe/xe_irq.c | 2 +- drivers/gpu/drm/xe/xe_pci.c | 2 +- drivers/gpu/drm/xe/xe_pm.c | 2 +- 5 files changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile index f017a59bf01f36..c531210695db0a 100644 --- a/drivers/gpu/drm/xe/Makefile +++ b/drivers/gpu/drm/xe/Makefile @@ -165,7 +165,6 @@ xe-$(CONFIG_DRM_XE_KUNIT_TEST) += \ subdir-ccflags-$(CONFIG_DRM_XE_DISPLAY) += \ -I$(srctree)/$(src)/display/ext \ -I$(srctree)/$(src)/compat-i915-headers \ - -I$(srctree)/drivers/gpu/drm/xe/display/ \ -I$(srctree)/drivers/gpu/drm/i915/display/ \ -Ddrm_i915_gem_object=xe_bo \ -Ddrm_i915_private=xe_device diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index 8e8567e06f0bc1..5b84d730552029 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -15,20 +15,21 @@ #include #include +#include "display/xe_display.h" #include "regs/xe_gt_regs.h" #include "regs/xe_regs.h" #include "xe_bo.h" #include "xe_debugfs.h" -#include "xe_display.h" #include "xe_dma_buf.h" #include "xe_drm_client.h" #include "xe_drv.h" -#include "xe_exec_queue.h" #include "xe_exec.h" +#include "xe_exec_queue.h" #include "xe_ggtt.h" #include "xe_gsc_proxy.h" #include "xe_gt.h" #include "xe_gt_mcr.h" +#include "xe_hwmon.h" #include "xe_irq.h" #include "xe_memirq.h" #include "xe_mmio.h" @@ -43,7 +44,6 @@ #include "xe_ttm_sys_mgr.h" #include "xe_vm.h" #include "xe_wait_user_fence.h" -#include "xe_hwmon.h" #ifdef CONFIG_LOCKDEP struct lockdep_map xe_device_mem_access_lockdep_map = { diff --git a/drivers/gpu/drm/xe/xe_irq.c b/drivers/gpu/drm/xe/xe_irq.c index 2fd8cc26fc9fe0..d31e87de5a1c84 100644 --- a/drivers/gpu/drm/xe/xe_irq.c +++ b/drivers/gpu/drm/xe/xe_irq.c @@ -9,10 +9,10 @@ #include +#include "display/xe_display.h" #include "regs/xe_gt_regs.h" #include "regs/xe_regs.h" #include "xe_device.h" -#include "xe_display.h" #include "xe_drv.h" #include "xe_gsc_proxy.h" #include "xe_gt.h" diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c index 4b7999b24ff754..557f2d88a8c1ed 100644 --- a/drivers/gpu/drm/xe/xe_pci.c +++ b/drivers/gpu/drm/xe/xe_pci.c @@ -15,9 +15,9 @@ #include #include +#include "display/xe_display.h" #include "regs/xe_gt_regs.h" #include "xe_device.h" -#include "xe_display.h" #include "xe_drv.h" #include "xe_gt.h" #include "xe_macros.h" diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c index d5f219796d7ef2..4d3984ae4ff121 100644 --- a/drivers/gpu/drm/xe/xe_pm.c +++ b/drivers/gpu/drm/xe/xe_pm.c @@ -10,11 +10,11 @@ #include #include +#include "display/xe_display.h" #include "xe_bo.h" #include "xe_bo_evict.h" #include "xe_device.h" #include "xe_device_sysfs.h" -#include "xe_display.h" #include "xe_ggtt.h" #include "xe_gt.h" #include "xe_guc.h" From 97fd7a7e4e877676a2ab1a687ba958b70931abcc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= Date: Wed, 17 Jan 2024 14:40:46 +0100 Subject: [PATCH 137/200] drm/xe: Annotate mcr_[un]lock() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit These functions acquire and release the gt::mcr_lock. Annotate accordingly. Fix the corresponding sparse warning. 
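For reference, the sparse context annotations follow this general pattern (a toy example with an invented lock, not code from this driver):

  static DEFINE_SPINLOCK(toy_lock);

  /* tells sparse this function returns with toy_lock held */
  static void toy_lock_acquire(void) __acquires(&toy_lock)
  {
          spin_lock(&toy_lock);
  }

  /* tells sparse this function is entered with toy_lock held and drops it */
  static void toy_lock_release(void) __releases(&toy_lock)
  {
          spin_unlock(&toy_lock);
  }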
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Fixes: fb1d55efdfcb ("drm/xe: Cleanup OPEN_BRACE style issues") Cc: Francois Dugast Cc: Rodrigo Vivi Cc: Matthew Brost Signed-off-by: Thomas Hellström Reviewed-by: Matthew Brost Link: https://patchwork.freedesktop.org/patch/msgid/20240117134048.165425-4-thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/xe/xe_gt_mcr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_gt_mcr.c b/drivers/gpu/drm/xe/xe_gt_mcr.c index 77925b35cf8dcb..8546cd3cc50d1f 100644 --- a/drivers/gpu/drm/xe/xe_gt_mcr.c +++ b/drivers/gpu/drm/xe/xe_gt_mcr.c @@ -480,7 +480,7 @@ static bool xe_gt_mcr_get_nonterminated_steering(struct xe_gt *gt, * to synchronize with external clients (e.g., firmware), so a semaphore * register will also need to be taken. */ -static void mcr_lock(struct xe_gt *gt) +static void mcr_lock(struct xe_gt *gt) __acquires(>->mcr_lock) { struct xe_device *xe = gt_to_xe(gt); int ret = 0; @@ -500,7 +500,7 @@ static void mcr_lock(struct xe_gt *gt) drm_WARN_ON_ONCE(&xe->drm, ret == -ETIMEDOUT); } -static void mcr_unlock(struct xe_gt *gt) +static void mcr_unlock(struct xe_gt *gt) __releases(>->mcr_lock) { /* Release hardware semaphore - this is done by writing 1 to the register */ if (GRAPHICS_VERx100(gt_to_xe(gt)) >= 1270) From 78366eed6853aa6a5deccb2eb182f9334d2bd208 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= Date: Wed, 17 Jan 2024 14:40:47 +0100 Subject: [PATCH 138/200] drm/xe: Don't use __user error pointers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The error pointer macros are not aware of __user pointers and as a consequence sparse warns. Have the copy_mask() function return an integer instead of a __user pointer. 
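A minimal sketch of the resulting pattern (hypothetical helper, mirroring the copy_mask() change below): the error travels as a plain int while the __user cursor advances through an output parameter, so an error pointer never aliases a __user pointer:

  static int copy_out(void __user **ptr, const void *src, size_t n)
  {
          if (copy_to_user(*ptr, src, n))
                  return -EFAULT; /* plain int, sparse-clean */
          *ptr += n; /* advance the caller's __user cursor */
          return 0;
  }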
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Rodrigo Vivi Cc: Matthew Brost Signed-off-by: Thomas Hellström Reviewed-by: Matthew Brost Link: https://patchwork.freedesktop.org/patch/msgid/20240117134048.165425-5-thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/xe/xe_query.c | 50 +++++++++++++++++------------------ 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c index 9b35673b286c80..7e924faeeea0b0 100644 --- a/drivers/gpu/drm/xe/xe_query.c +++ b/drivers/gpu/drm/xe/xe_query.c @@ -459,21 +459,21 @@ static size_t calc_topo_query_size(struct xe_device *xe) sizeof_field(struct xe_gt, fuse_topo.eu_mask_per_dss)); } -static void __user *copy_mask(void __user *ptr, - struct drm_xe_query_topology_mask *topo, - void *mask, size_t mask_size) +static int copy_mask(void __user **ptr, + struct drm_xe_query_topology_mask *topo, + void *mask, size_t mask_size) { topo->num_bytes = mask_size; - if (copy_to_user(ptr, topo, sizeof(*topo))) - return ERR_PTR(-EFAULT); - ptr += sizeof(topo); + if (copy_to_user(*ptr, topo, sizeof(*topo))) + return -EFAULT; + *ptr += sizeof(topo); - if (copy_to_user(ptr, mask, mask_size)) - return ERR_PTR(-EFAULT); - ptr += mask_size; + if (copy_to_user(*ptr, mask, mask_size)) + return -EFAULT; + *ptr += mask_size; - return ptr; + return 0; } static int query_gt_topology(struct xe_device *xe, @@ -493,28 +493,28 @@ static int query_gt_topology(struct xe_device *xe, } for_each_gt(gt, xe, id) { + int err; + topo.gt_id = id; topo.type = DRM_XE_TOPO_DSS_GEOMETRY; - query_ptr = copy_mask(query_ptr, &topo, - gt->fuse_topo.g_dss_mask, - sizeof(gt->fuse_topo.g_dss_mask)); - if (IS_ERR(query_ptr)) - return PTR_ERR(query_ptr); + err = copy_mask(&query_ptr, &topo, gt->fuse_topo.g_dss_mask, + sizeof(gt->fuse_topo.g_dss_mask)); + if (err) + return err; topo.type = DRM_XE_TOPO_DSS_COMPUTE; - query_ptr = copy_mask(query_ptr, &topo, - gt->fuse_topo.c_dss_mask, - sizeof(gt->fuse_topo.c_dss_mask)); - if (IS_ERR(query_ptr)) - return PTR_ERR(query_ptr); + err = copy_mask(&query_ptr, &topo, gt->fuse_topo.c_dss_mask, + sizeof(gt->fuse_topo.c_dss_mask)); + if (err) + return err; topo.type = DRM_XE_TOPO_EU_PER_DSS; - query_ptr = copy_mask(query_ptr, &topo, - gt->fuse_topo.eu_mask_per_dss, - sizeof(gt->fuse_topo.eu_mask_per_dss)); - if (IS_ERR(query_ptr)) - return PTR_ERR(query_ptr); + err = copy_mask(&query_ptr, &topo, + gt->fuse_topo.eu_mask_per_dss, + sizeof(gt->fuse_topo.eu_mask_per_dss)); + if (err) + return err; } return 0; From 996da37ffa82b9d863f6fe0b7b2ce9d692d0b31e Mon Sep 17 00:00:00 2001 From: Matt Roper Date: Tue, 30 Jan 2024 12:03:09 -0800 Subject: [PATCH 139/200] drm/xe: Convert job timeouts from assert to warning xe_assert() is intended to be used only for "impossible" situations that should never be hit (and if they are hit it means there's a driver bug somewhere); assertions are only compiled into debug builds. Although we expect jobs submitted by the kernel to be well-behaved and run without error, timeouts are a legitimate possibility for reasons beyond our control (bad firmware, flaky hardware, etc.). We should use a real WARN if we encounter these, even for non-debug builds, to ensure the issue is being properly highlighted in bug reports and such. Also give the WARNs more human-readable messages and move them below the general notice-level message that gets printed for any kind of timeout to make the errors a bit more understandable. 
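The rule of thumb, as a hedged sketch (both conditions are invented for illustration): assert only what userspace and hardware cannot make happen, warn on everything else:

  /* only a driver bug can make this fire: debug-only assert is fine */
  xe_assert(xe, job_was_submitted_by_driver);

  /* reachable via bad firmware or flaky hardware: warn in all builds */
  xe_gt_WARN(gt, job_timed_out, "job timed out\n");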
v2: - Convert the VM / exec_queue_killed assertion as well. (MattB) Signed-off-by: Matt Roper Reviewed-by: Matthew Brost Link: https://patchwork.freedesktop.org/patch/msgid/20240130200308.1429134-2-matthew.d.roper@intel.com --- drivers/gpu/drm/xe/xe_guc_submit.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index 2b008ec1b6de59..4744668ef60a93 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -23,6 +23,7 @@ #include "xe_force_wake.h" #include "xe_gpu_scheduler.h" #include "xe_gt.h" +#include "xe_gt_printk.h" #include "xe_guc.h" #include "xe_guc_ct.h" #include "xe_guc_exec_queue_types.h" @@ -928,11 +929,13 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job) int i = 0; if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags)) { - xe_assert(xe, !(q->flags & EXEC_QUEUE_FLAG_KERNEL)); - xe_assert(xe, !(q->flags & EXEC_QUEUE_FLAG_VM && !exec_queue_killed(q))); - drm_notice(&xe->drm, "Timedout job: seqno=%u, guc_id=%d, flags=0x%lx", xe_sched_job_seqno(job), q->guc->id, q->flags); + xe_gt_WARN(q->gt, q->flags & EXEC_QUEUE_FLAG_KERNEL, + "Kernel-submitted job timed out\n"); + xe_gt_WARN(q->gt, q->flags & EXEC_QUEUE_FLAG_VM && !exec_queue_killed(q), + "VM job timed out on non-killed execqueue\n"); + simple_error_capture(q); xe_devcoredump(job); } else { From d83d8ae275c6bf87506b71b8a1acd98452137dc5 Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Tue, 30 Jan 2024 18:54:24 -0800 Subject: [PATCH 140/200] drm/xe: Make all GuC ABI shift values unsigned MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit All GuC ABI definitions are unsigned, and not defining them as unsigned causes build errors [1].
[1] https://lore.kernel.org/all/20240123111235.3097079-1-geert@linux-m68k.org/ Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Thomas Hellström Cc: Lucas De Marchi Cc: Michal Wajdeczko Signed-off-by: Matthew Brost Reviewed-by: Thomas Hellström Reviewed-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240131025424.2087936-1-matthew.brost@intel.com --- drivers/gpu/drm/xe/abi/guc_actions_abi.h | 4 ++-- drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h | 4 ++-- .../drm/xe/abi/guc_communication_ctb_abi.h | 8 ++++---- drivers/gpu/drm/xe/abi/guc_klvs_abi.h | 6 +++--- drivers/gpu/drm/xe/abi/guc_messages_abi.h | 20 +++++++++---------- 5 files changed, 21 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h index 3062e0e0d467ee..79ba98a169f907 100644 --- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h @@ -50,8 +50,8 @@ #define HOST2GUC_SELF_CFG_REQUEST_MSG_LEN (GUC_HXG_REQUEST_MSG_MIN_LEN + 3u) #define HOST2GUC_SELF_CFG_REQUEST_MSG_0_MBZ GUC_HXG_REQUEST_MSG_0_DATA0 -#define HOST2GUC_SELF_CFG_REQUEST_MSG_1_KLV_KEY (0xffff << 16) -#define HOST2GUC_SELF_CFG_REQUEST_MSG_1_KLV_LEN (0xffff << 0) +#define HOST2GUC_SELF_CFG_REQUEST_MSG_1_KLV_KEY (0xffffu << 16) +#define HOST2GUC_SELF_CFG_REQUEST_MSG_1_KLV_LEN (0xffffu << 0) #define HOST2GUC_SELF_CFG_REQUEST_MSG_2_VALUE32 GUC_HXG_REQUEST_MSG_n_DATAn #define HOST2GUC_SELF_CFG_REQUEST_MSG_3_VALUE64 GUC_HXG_REQUEST_MSG_n_DATAn diff --git a/drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h index 811add10c30dc2..c165e26c097669 100644 --- a/drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h @@ -242,8 +242,8 @@ struct slpc_shared_data { (HOST2GUC_PC_SLPC_REQUEST_REQUEST_MSG_MIN_LEN + \ HOST2GUC_PC_SLPC_EVENT_MAX_INPUT_ARGS) #define HOST2GUC_PC_SLPC_REQUEST_MSG_0_MBZ GUC_HXG_REQUEST_MSG_0_DATA0 -#define HOST2GUC_PC_SLPC_REQUEST_MSG_1_EVENT_ID (0xff << 8) -#define HOST2GUC_PC_SLPC_REQUEST_MSG_1_EVENT_ARGC (0xff << 0) +#define HOST2GUC_PC_SLPC_REQUEST_MSG_1_EVENT_ID (0xffu << 8) +#define HOST2GUC_PC_SLPC_REQUEST_MSG_1_EVENT_ARGC (0xffu << 0) #define HOST2GUC_PC_SLPC_REQUEST_MSG_N_EVENT_DATA_N GUC_HXG_REQUEST_MSG_n_DATAn #endif diff --git a/drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h index 4aaed1cb4e125d..8f86a16dc5777c 100644 --- a/drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h @@ -82,11 +82,11 @@ static_assert(sizeof(struct guc_ct_buffer_desc) == 64); #define GUC_CTB_HDR_LEN 1u #define GUC_CTB_MSG_MIN_LEN GUC_CTB_HDR_LEN #define GUC_CTB_MSG_MAX_LEN (GUC_CTB_MSG_MIN_LEN + GUC_CTB_MAX_DWORDS) -#define GUC_CTB_MSG_0_FENCE (0xffff << 16) -#define GUC_CTB_MSG_0_FORMAT (0xf << 12) +#define GUC_CTB_MSG_0_FENCE (0xffffu << 16) +#define GUC_CTB_MSG_0_FORMAT (0xfu << 12) #define GUC_CTB_FORMAT_HXG 0u -#define GUC_CTB_MSG_0_RESERVED (0xf << 8) -#define GUC_CTB_MSG_0_NUM_DWORDS (0xff << 0) +#define GUC_CTB_MSG_0_RESERVED (0xfu << 8) +#define GUC_CTB_MSG_0_NUM_DWORDS (0xffu << 0) #define GUC_CTB_MAX_DWORDS 255 /** diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h index 47094b9b044cbb..0400bc0fccdc9b 100644 --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h @@ -31,9 +31,9 @@ */ #define GUC_KLV_LEN_MIN 1u -#define GUC_KLV_0_KEY 
(0xffff << 16) -#define GUC_KLV_0_LEN (0xffff << 0) -#define GUC_KLV_n_VALUE (0xffffffff << 0) +#define GUC_KLV_0_KEY (0xffffu << 16) +#define GUC_KLV_0_LEN (0xffffu << 0) +#define GUC_KLV_n_VALUE (0xffffffffu << 0) /** * DOC: GuC Self Config KLVs diff --git a/drivers/gpu/drm/xe/abi/guc_messages_abi.h b/drivers/gpu/drm/xe/abi/guc_messages_abi.h index ff888d16bd4f1b..534a39db777270 100644 --- a/drivers/gpu/drm/xe/abi/guc_messages_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_messages_abi.h @@ -41,10 +41,10 @@ */ #define GUC_HXG_MSG_MIN_LEN 1u -#define GUC_HXG_MSG_0_ORIGIN (0x1 << 31) +#define GUC_HXG_MSG_0_ORIGIN (0x1u << 31) #define GUC_HXG_ORIGIN_HOST 0u #define GUC_HXG_ORIGIN_GUC 1u -#define GUC_HXG_MSG_0_TYPE (0x7 << 28) +#define GUC_HXG_MSG_0_TYPE (0x7u << 28) #define GUC_HXG_TYPE_REQUEST 0u #define GUC_HXG_TYPE_EVENT 1u #define GUC_HXG_TYPE_FAST_REQUEST 2u @@ -52,8 +52,8 @@ #define GUC_HXG_TYPE_NO_RESPONSE_RETRY 5u #define GUC_HXG_TYPE_RESPONSE_FAILURE 6u #define GUC_HXG_TYPE_RESPONSE_SUCCESS 7u -#define GUC_HXG_MSG_0_AUX (0xfffffff << 0) -#define GUC_HXG_MSG_n_PAYLOAD (0xffffffff << 0) +#define GUC_HXG_MSG_0_AUX (0xfffffffu << 0) +#define GUC_HXG_MSG_n_PAYLOAD (0xffffffffu << 0) /** * DOC: HXG Request @@ -87,8 +87,8 @@ */ #define GUC_HXG_REQUEST_MSG_MIN_LEN GUC_HXG_MSG_MIN_LEN -#define GUC_HXG_REQUEST_MSG_0_DATA0 (0xfff << 16) -#define GUC_HXG_REQUEST_MSG_0_ACTION (0xffff << 0) +#define GUC_HXG_REQUEST_MSG_0_DATA0 (0xfffu << 16) +#define GUC_HXG_REQUEST_MSG_0_ACTION (0xffffu << 0) #define GUC_HXG_REQUEST_MSG_n_DATAn GUC_HXG_MSG_n_PAYLOAD /** @@ -119,8 +119,8 @@ */ #define GUC_HXG_EVENT_MSG_MIN_LEN GUC_HXG_MSG_MIN_LEN -#define GUC_HXG_EVENT_MSG_0_DATA0 (0xfff << 16) -#define GUC_HXG_EVENT_MSG_0_ACTION (0xffff << 0) +#define GUC_HXG_EVENT_MSG_0_DATA0 (0xfffu << 16) +#define GUC_HXG_EVENT_MSG_0_ACTION (0xffffu << 0) #define GUC_HXG_EVENT_MSG_n_DATAn GUC_HXG_MSG_n_PAYLOAD /** @@ -190,8 +190,8 @@ */ #define GUC_HXG_FAILURE_MSG_LEN GUC_HXG_MSG_MIN_LEN -#define GUC_HXG_FAILURE_MSG_0_HINT (0xfff << 16) -#define GUC_HXG_FAILURE_MSG_0_ERROR (0xffff << 0) +#define GUC_HXG_FAILURE_MSG_0_HINT (0xfffu << 16) +#define GUC_HXG_FAILURE_MSG_0_ERROR (0xffffu << 0) /** * DOC: HXG Response From 152ca51d8db03f08a71c25e999812e263839fdce Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Tue, 23 Jan 2024 13:26:38 -0800 Subject: [PATCH 141/200] drm/xe: Use LRC prefix rather than CTX prefix in lrc desc defines MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The sparc build fails [1] due to CTX_VALID being redefined. Fix this by using a better naming convention, LRC_VALID, as this define is used to set bits in the lrc descriptor. To be uniform, change the other defines to the LRC prefix too.
[1] https://lore.kernel.org/all/20240123111235.3097079-1-geert@linux-m68k.org/ v2: - s/LEGACY_64B_CONTEXT/LRC_LEGACY_64B_CONTEXT (Lucas) Fixes: 0bc519d20ffa ("drm/xe: Remove GEN[0-9]*_ prefixes") Cc: Thomas Hellström Cc: Lucas De Marchi Signed-off-by: Matthew Brost Reviewed-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240123212638.1605626-1-matthew.brost@intel.com --- drivers/gpu/drm/xe/xe_lrc.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c index f17e9785355e8a..8c85e90220de5b 100644 --- a/drivers/gpu/drm/xe/xe_lrc.c +++ b/drivers/gpu/drm/xe/xe_lrc.c @@ -23,10 +23,10 @@ #include "xe_sriov.h" #include "xe_vm.h" -#define CTX_VALID (1 << 0) -#define CTX_PRIVILEGE (1 << 8) -#define CTX_ADDRESSING_MODE_SHIFT 3 -#define LEGACY_64B_CONTEXT 3 +#define LRC_VALID (1 << 0) +#define LRC_PRIVILEGE (1 << 8) +#define LRC_ADDRESSING_MODE_SHIFT 3 +#define LRC_LEGACY_64B_CONTEXT 3 #define ENGINE_CLASS_SHIFT 61 #define ENGINE_INSTANCE_SHIFT 48 @@ -786,15 +786,15 @@ int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, (q->usm.acc_notify << ACC_NOTIFY_S) | q->usm.acc_trigger); - lrc->desc = CTX_VALID; - lrc->desc |= LEGACY_64B_CONTEXT << CTX_ADDRESSING_MODE_SHIFT; + lrc->desc = LRC_VALID; + lrc->desc |= LRC_LEGACY_64B_CONTEXT << LRC_ADDRESSING_MODE_SHIFT; /* TODO: Priority */ /* While this appears to have something about privileged batches or * some such, it really just means PPGTT mode. */ if (vm) - lrc->desc |= CTX_PRIVILEGE; + lrc->desc |= LRC_PRIVILEGE; if (GRAPHICS_VERx100(xe) < 1250) { lrc->desc |= (u64)hwe->instance << ENGINE_INSTANCE_SHIFT; From 5bd24e78829ad569fa1c3ce9a05b59bb97b91f3d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= Date: Wed, 31 Jan 2024 10:16:28 +0100 Subject: [PATCH 142/200] drm/xe/vm: Subclass userptr vmas MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The construct allocating only parts of the vma structure when the userptr part is not needed is very fragile. A developer could add additional fields below the userptr part, and the code could easily attempt to access the userptr part even if it's not present. So introduce xe_userptr_vma which subclasses struct xe_vma the proper way, and accordingly modify a couple of interfaces. This should also help when adding userptr helpers to drm_gpuvm. v2: - Fix documentation of to_userptr_vma() (Matthew Brost) - Fix allocation and freeing of vmas to more clearly distinguish between the types.
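The subclassing idiom used here, reduced to a generic sketch (struct names are hypothetical):

  struct base {
          unsigned long common; /* state every object needs */
  };

  struct derived {
          struct base base; /* embedded base object */
          unsigned long extra; /* subclass-only state */
  };

  static inline struct derived *to_derived(struct base *b)
  {
          /* only valid when b is known to sit inside a struct derived */
          return container_of(b, struct derived, base);
  }

Allocating a full struct derived when the extra state is needed, and a plain struct base otherwise, keeps the layout honest, unlike the previous trick of allocating sizeof(*vma) - sizeof(struct xe_userptr) bytes.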
Closes: https://lore.kernel.org/intel-xe/0c4cc1a7-f409-4597-b110-81f9e45d1ffe@embeddedor.com/T/#u Fixes: a4cc60a55fd9 ("drm/xe: Only alloc userptr part of xe_vma for userptrs") Cc: Rodrigo Vivi Cc: Matthew Brost Cc: Lucas De Marchi Signed-off-by: Thomas Hellström Reviewed-by: Matthew Brost Link: https://patchwork.freedesktop.org/patch/msgid/20240131091628.12318-1-thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/xe/xe_gt_pagefault.c | 11 +- drivers/gpu/drm/xe/xe_pt.c | 32 +++--- drivers/gpu/drm/xe/xe_vm.c | 155 ++++++++++++++++----------- drivers/gpu/drm/xe/xe_vm.h | 16 ++- drivers/gpu/drm/xe/xe_vm_types.h | 16 +-- 5 files changed, 137 insertions(+), 93 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c index 7ce67c9d30a7b6..78970259cea948 100644 --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c @@ -165,7 +165,8 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf) goto unlock_vm; } - if (!xe_vma_is_userptr(vma) || !xe_vma_userptr_check_repin(vma)) { + if (!xe_vma_is_userptr(vma) || + !xe_vma_userptr_check_repin(to_userptr_vma(vma))) { downgrade_write(&vm->lock); write_locked = false; } @@ -181,11 +182,13 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf) /* TODO: Validate fault */ if (xe_vma_is_userptr(vma) && write_locked) { + struct xe_userptr_vma *uvma = to_userptr_vma(vma); + spin_lock(&vm->userptr.invalidated_lock); - list_del_init(&vma->userptr.invalidate_link); + list_del_init(&uvma->userptr.invalidate_link); spin_unlock(&vm->userptr.invalidated_lock); - ret = xe_vma_userptr_pin_pages(vma); + ret = xe_vma_userptr_pin_pages(uvma); if (ret) goto unlock_vm; @@ -220,7 +223,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf) dma_fence_put(fence); if (xe_vma_is_userptr(vma)) - ret = xe_vma_userptr_check_repin(vma); + ret = xe_vma_userptr_check_repin(to_userptr_vma(vma)); vma->usm.tile_invalidated &= ~BIT(tile->id); unlock_dma_resv: diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c index de1030a4758837..e45b37c3f0c262 100644 --- a/drivers/gpu/drm/xe/xe_pt.c +++ b/drivers/gpu/drm/xe/xe_pt.c @@ -618,8 +618,8 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma, if (!xe_vma_is_null(vma)) { if (xe_vma_is_userptr(vma)) - xe_res_first_sg(vma->userptr.sg, 0, xe_vma_size(vma), - &curs); + xe_res_first_sg(to_userptr_vma(vma)->userptr.sg, 0, + xe_vma_size(vma), &curs); else if (xe_bo_is_vram(bo) || xe_bo_is_stolen(bo)) xe_res_first(bo->ttm.resource, xe_vma_bo_offset(vma), xe_vma_size(vma), &curs); @@ -906,17 +906,17 @@ static void xe_vm_dbg_print_entries(struct xe_device *xe, #ifdef CONFIG_DRM_XE_USERPTR_INVAL_INJECT -static int xe_pt_userptr_inject_eagain(struct xe_vma *vma) +static int xe_pt_userptr_inject_eagain(struct xe_userptr_vma *uvma) { - u32 divisor = vma->userptr.divisor ? vma->userptr.divisor : 2; + u32 divisor = uvma->userptr.divisor ? 
uvma->userptr.divisor : 2; static u32 count; if (count++ % divisor == divisor - 1) { - struct xe_vm *vm = xe_vma_vm(vma); + struct xe_vm *vm = xe_vma_vm(&uvma->vma); - vma->userptr.divisor = divisor << 1; + uvma->userptr.divisor = divisor << 1; spin_lock(&vm->userptr.invalidated_lock); - list_move_tail(&vma->userptr.invalidate_link, + list_move_tail(&uvma->userptr.invalidate_link, &vm->userptr.invalidated); spin_unlock(&vm->userptr.invalidated_lock); return true; @@ -927,7 +927,7 @@ static int xe_pt_userptr_inject_eagain(struct xe_vma *vma) #else -static bool xe_pt_userptr_inject_eagain(struct xe_vma *vma) +static bool xe_pt_userptr_inject_eagain(struct xe_userptr_vma *uvma) { return false; } @@ -1000,9 +1000,9 @@ static int xe_pt_userptr_pre_commit(struct xe_migrate_pt_update *pt_update) { struct xe_pt_migrate_pt_update *userptr_update = container_of(pt_update, typeof(*userptr_update), base); - struct xe_vma *vma = pt_update->vma; - unsigned long notifier_seq = vma->userptr.notifier_seq; - struct xe_vm *vm = xe_vma_vm(vma); + struct xe_userptr_vma *uvma = to_userptr_vma(pt_update->vma); + unsigned long notifier_seq = uvma->userptr.notifier_seq; + struct xe_vm *vm = xe_vma_vm(&uvma->vma); int err = xe_pt_vm_dependencies(pt_update->job, &vm->rftree[pt_update->tile_id], pt_update->start, @@ -1023,7 +1023,7 @@ static int xe_pt_userptr_pre_commit(struct xe_migrate_pt_update *pt_update) */ do { down_read(&vm->userptr.notifier_lock); - if (!mmu_interval_read_retry(&vma->userptr.notifier, + if (!mmu_interval_read_retry(&uvma->userptr.notifier, notifier_seq)) break; @@ -1032,11 +1032,11 @@ static int xe_pt_userptr_pre_commit(struct xe_migrate_pt_update *pt_update) if (userptr_update->bind) return -EAGAIN; - notifier_seq = mmu_interval_read_begin(&vma->userptr.notifier); + notifier_seq = mmu_interval_read_begin(&uvma->userptr.notifier); } while (true); /* Inject errors to test_whether they are handled correctly */ - if (userptr_update->bind && xe_pt_userptr_inject_eagain(vma)) { + if (userptr_update->bind && xe_pt_userptr_inject_eagain(uvma)) { up_read(&vm->userptr.notifier_lock); return -EAGAIN; } @@ -1297,7 +1297,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue vma->tile_present |= BIT(tile->id); if (bind_pt_update.locked) { - vma->userptr.initial_bind = true; + to_userptr_vma(vma)->userptr.initial_bind = true; up_read(&vm->userptr.notifier_lock); xe_bo_put_commit(&deferred); } @@ -1642,7 +1642,7 @@ __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queu if (!vma->tile_present) { spin_lock(&vm->userptr.invalidated_lock); - list_del_init(&vma->userptr.invalidate_link); + list_del_init(&to_userptr_vma(vma)->userptr.invalidate_link); spin_unlock(&vm->userptr.invalidated_lock); } up_read(&vm->userptr.notifier_lock); diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 8576535c4b6a65..c107f7f414d88c 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -46,7 +46,7 @@ static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm) /** * xe_vma_userptr_check_repin() - Advisory check for repin needed - * @vma: The userptr vma + * @uvma: The userptr vma * * Check if the userptr vma has been invalidated since last successful * repin. The check is advisory only and can the function can be called @@ -56,15 +56,17 @@ static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm) * * Return: 0 if userptr vma is valid, -EAGAIN otherwise; repin recommended. 
*/ -int xe_vma_userptr_check_repin(struct xe_vma *vma) +int xe_vma_userptr_check_repin(struct xe_userptr_vma *uvma) { - return mmu_interval_check_retry(&vma->userptr.notifier, - vma->userptr.notifier_seq) ? + return mmu_interval_check_retry(&uvma->userptr.notifier, + uvma->userptr.notifier_seq) ? -EAGAIN : 0; } -int xe_vma_userptr_pin_pages(struct xe_vma *vma) +int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma) { + struct xe_userptr *userptr = &uvma->userptr; + struct xe_vma *vma = &uvma->vma; struct xe_vm *vm = xe_vma_vm(vma); struct xe_device *xe = vm->xe; const unsigned long num_pages = xe_vma_size(vma) >> PAGE_SHIFT; @@ -80,30 +82,30 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma) if (vma->gpuva.flags & XE_VMA_DESTROYED) return 0; - notifier_seq = mmu_interval_read_begin(&vma->userptr.notifier); - if (notifier_seq == vma->userptr.notifier_seq) + notifier_seq = mmu_interval_read_begin(&userptr->notifier); + if (notifier_seq == userptr->notifier_seq) return 0; pages = kvmalloc_array(num_pages, sizeof(*pages), GFP_KERNEL); if (!pages) return -ENOMEM; - if (vma->userptr.sg) { + if (userptr->sg) { dma_unmap_sgtable(xe->drm.dev, - vma->userptr.sg, + userptr->sg, read_only ? DMA_TO_DEVICE : DMA_BIDIRECTIONAL, 0); - sg_free_table(vma->userptr.sg); - vma->userptr.sg = NULL; + sg_free_table(userptr->sg); + userptr->sg = NULL; } pinned = ret = 0; if (in_kthread) { - if (!mmget_not_zero(vma->userptr.notifier.mm)) { + if (!mmget_not_zero(userptr->notifier.mm)) { ret = -EFAULT; goto mm_closed; } - kthread_use_mm(vma->userptr.notifier.mm); + kthread_use_mm(userptr->notifier.mm); } while (pinned < num_pages) { @@ -123,32 +125,32 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma) } if (in_kthread) { - kthread_unuse_mm(vma->userptr.notifier.mm); - mmput(vma->userptr.notifier.mm); + kthread_unuse_mm(userptr->notifier.mm); + mmput(userptr->notifier.mm); } mm_closed: if (ret) goto out; - ret = sg_alloc_table_from_pages_segment(&vma->userptr.sgt, pages, + ret = sg_alloc_table_from_pages_segment(&userptr->sgt, pages, pinned, 0, (u64)pinned << PAGE_SHIFT, xe_sg_segment_size(xe->drm.dev), GFP_KERNEL); if (ret) { - vma->userptr.sg = NULL; + userptr->sg = NULL; goto out; } - vma->userptr.sg = &vma->userptr.sgt; + userptr->sg = &userptr->sgt; - ret = dma_map_sgtable(xe->drm.dev, vma->userptr.sg, + ret = dma_map_sgtable(xe->drm.dev, userptr->sg, read_only ? 
DMA_TO_DEVICE : DMA_BIDIRECTIONAL, DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING); if (ret) { - sg_free_table(vma->userptr.sg); - vma->userptr.sg = NULL; + sg_free_table(userptr->sg); + userptr->sg = NULL; goto out; } @@ -167,8 +169,8 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma) kvfree(pages); if (!(ret < 0)) { - vma->userptr.notifier_seq = notifier_seq; - if (xe_vma_userptr_check_repin(vma) == -EAGAIN) + userptr->notifier_seq = notifier_seq; + if (xe_vma_userptr_check_repin(uvma) == -EAGAIN) goto retry; } @@ -635,7 +637,9 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni, const struct mmu_notifier_range *range, unsigned long cur_seq) { - struct xe_vma *vma = container_of(mni, struct xe_vma, userptr.notifier); + struct xe_userptr *userptr = container_of(mni, typeof(*userptr), notifier); + struct xe_userptr_vma *uvma = container_of(userptr, typeof(*uvma), userptr); + struct xe_vma *vma = &uvma->vma; struct xe_vm *vm = xe_vma_vm(vma); struct dma_resv_iter cursor; struct dma_fence *fence; @@ -651,7 +655,7 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni, mmu_interval_set_seq(mni, cur_seq); /* No need to stop gpu access if the userptr is not yet bound. */ - if (!vma->userptr.initial_bind) { + if (!userptr->initial_bind) { up_write(&vm->userptr.notifier_lock); return true; } @@ -663,7 +667,7 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni, if (!xe_vm_in_fault_mode(vm) && !(vma->gpuva.flags & XE_VMA_DESTROYED) && vma->tile_present) { spin_lock(&vm->userptr.invalidated_lock); - list_move_tail(&vma->userptr.invalidate_link, + list_move_tail(&userptr->invalidate_link, &vm->userptr.invalidated); spin_unlock(&vm->userptr.invalidated_lock); } @@ -703,7 +707,7 @@ static const struct mmu_interval_notifier_ops vma_userptr_notifier_ops = { int xe_vm_userptr_pin(struct xe_vm *vm) { - struct xe_vma *vma, *next; + struct xe_userptr_vma *uvma, *next; int err = 0; LIST_HEAD(tmp_evict); @@ -711,22 +715,23 @@ int xe_vm_userptr_pin(struct xe_vm *vm) /* Collect invalidated userptrs */ spin_lock(&vm->userptr.invalidated_lock); - list_for_each_entry_safe(vma, next, &vm->userptr.invalidated, + list_for_each_entry_safe(uvma, next, &vm->userptr.invalidated, userptr.invalidate_link) { - list_del_init(&vma->userptr.invalidate_link); - list_move_tail(&vma->combined_links.userptr, + list_del_init(&uvma->userptr.invalidate_link); + list_move_tail(&uvma->userptr.repin_link, &vm->userptr.repin_list); } spin_unlock(&vm->userptr.invalidated_lock); /* Pin and move to temporary list */ - list_for_each_entry_safe(vma, next, &vm->userptr.repin_list, - combined_links.userptr) { - err = xe_vma_userptr_pin_pages(vma); + list_for_each_entry_safe(uvma, next, &vm->userptr.repin_list, + userptr.repin_link) { + err = xe_vma_userptr_pin_pages(uvma); if (err < 0) return err; - list_move_tail(&vma->combined_links.userptr, &vm->rebind_list); + list_del_init(&uvma->userptr.repin_link); + list_move_tail(&uvma->vma.combined_links.rebind, &vm->rebind_list); } return 0; @@ -782,6 +787,14 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker) return fence; } +static void xe_vma_free(struct xe_vma *vma) +{ + if (xe_vma_is_userptr(vma)) + kfree(to_userptr_vma(vma)); + else + kfree(vma); +} + #define VMA_CREATE_FLAG_READ_ONLY BIT(0) #define VMA_CREATE_FLAG_IS_NULL BIT(1) @@ -800,14 +813,26 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm, xe_assert(vm->xe, start < end); xe_assert(vm->xe, end < vm->size); - if (!bo && !is_null) /* userptr */ + /* + 
* Allocate and ensure that the xe_vma_is_userptr() return + * matches what was allocated. + */ + if (!bo && !is_null) { + struct xe_userptr_vma *uvma = kzalloc(sizeof(*uvma), GFP_KERNEL); + + if (!uvma) + return ERR_PTR(-ENOMEM); + + vma = &uvma->vma; + } else { vma = kzalloc(sizeof(*vma), GFP_KERNEL); - else - vma = kzalloc(sizeof(*vma) - sizeof(struct xe_userptr), - GFP_KERNEL); - if (!vma) { - vma = ERR_PTR(-ENOMEM); - return vma; + if (!vma) + return ERR_PTR(-ENOMEM); + + if (is_null) + vma->gpuva.flags |= DRM_GPUVA_SPARSE; + if (bo) + vma->gpuva.gem.obj = &bo->ttm.base; } INIT_LIST_HEAD(&vma->combined_links.rebind); @@ -818,8 +843,6 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm, vma->gpuva.va.range = end - start + 1; if (read_only) vma->gpuva.flags |= XE_VMA_READ_ONLY; - if (is_null) - vma->gpuva.flags |= DRM_GPUVA_SPARSE; for_each_tile(tile, vm->xe, id) vma->tile_mask |= 0x1 << id; @@ -836,35 +859,35 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm, vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base); if (IS_ERR(vm_bo)) { - kfree(vma); + xe_vma_free(vma); return ERR_CAST(vm_bo); } drm_gpuvm_bo_extobj_add(vm_bo); drm_gem_object_get(&bo->ttm.base); - vma->gpuva.gem.obj = &bo->ttm.base; vma->gpuva.gem.offset = bo_offset_or_userptr; drm_gpuva_link(&vma->gpuva, vm_bo); drm_gpuvm_bo_put(vm_bo); } else /* userptr or null */ { if (!is_null) { + struct xe_userptr *userptr = &to_userptr_vma(vma)->userptr; u64 size = end - start + 1; int err; - INIT_LIST_HEAD(&vma->userptr.invalidate_link); + INIT_LIST_HEAD(&userptr->invalidate_link); + INIT_LIST_HEAD(&userptr->repin_link); vma->gpuva.gem.offset = bo_offset_or_userptr; - err = mmu_interval_notifier_insert(&vma->userptr.notifier, + err = mmu_interval_notifier_insert(&userptr->notifier, current->mm, xe_vma_userptr(vma), size, &vma_userptr_notifier_ops); if (err) { - kfree(vma); - vma = ERR_PTR(err); - return vma; + xe_vma_free(vma); + return ERR_PTR(err); } - vma->userptr.notifier_seq = LONG_MAX; + userptr->notifier_seq = LONG_MAX; } xe_vm_get(vm); @@ -880,13 +903,15 @@ static void xe_vma_destroy_late(struct xe_vma *vma) bool read_only = xe_vma_read_only(vma); if (xe_vma_is_userptr(vma)) { - if (vma->userptr.sg) { + struct xe_userptr *userptr = &to_userptr_vma(vma)->userptr; + + if (userptr->sg) { dma_unmap_sgtable(xe->drm.dev, - vma->userptr.sg, + userptr->sg, read_only ? 
DMA_TO_DEVICE : DMA_BIDIRECTIONAL, 0); - sg_free_table(vma->userptr.sg); - vma->userptr.sg = NULL; + sg_free_table(userptr->sg); + userptr->sg = NULL; } /* @@ -894,7 +919,7 @@ static void xe_vma_destroy_late(struct xe_vma *vma) * the notifer until we're sure the GPU is not accessing * them anymore */ - mmu_interval_notifier_remove(&vma->userptr.notifier); + mmu_interval_notifier_remove(&userptr->notifier); xe_vm_put(vm); } else if (xe_vma_is_null(vma)) { xe_vm_put(vm); @@ -902,7 +927,7 @@ static void xe_vma_destroy_late(struct xe_vma *vma) xe_bo_put(xe_vma_bo(vma)); } - kfree(vma); + xe_vma_free(vma); } static void vma_destroy_work_func(struct work_struct *w) @@ -933,7 +958,7 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence) xe_assert(vm->xe, vma->gpuva.flags & XE_VMA_DESTROYED); spin_lock(&vm->userptr.invalidated_lock); - list_del(&vma->userptr.invalidate_link); + list_del(&to_userptr_vma(vma)->userptr.invalidate_link); spin_unlock(&vm->userptr.invalidated_lock); } else if (!xe_vma_is_null(vma)) { xe_bo_assert_held(xe_vma_bo(vma)); @@ -2150,7 +2175,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op, drm_exec_fini(&exec); if (xe_vma_is_userptr(vma)) { - err = xe_vma_userptr_pin_pages(vma); + err = xe_vma_userptr_pin_pages(to_userptr_vma(vma)); if (err) { prep_vma_destroy(vm, vma, false); xe_vma_destroy_unlocked(vma); @@ -2507,7 +2532,7 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma, if (err == -EAGAIN && xe_vma_is_userptr(vma)) { lockdep_assert_held_write(&vm->lock); - err = xe_vma_userptr_pin_pages(vma); + err = xe_vma_userptr_pin_pages(to_userptr_vma(vma)); if (!err) goto retry_userptr; @@ -3138,8 +3163,8 @@ int xe_vm_invalidate_vma(struct xe_vma *vma) if (IS_ENABLED(CONFIG_PROVE_LOCKING)) { if (xe_vma_is_userptr(vma)) { WARN_ON_ONCE(!mmu_interval_check_retry - (&vma->userptr.notifier, - vma->userptr.notifier_seq)); + (&to_userptr_vma(vma)->userptr.notifier, + to_userptr_vma(vma)->userptr.notifier_seq)); WARN_ON_ONCE(!dma_resv_test_signaled(xe_vm_resv(xe_vma_vm(vma)), DMA_RESV_USAGE_BOOKKEEP)); @@ -3200,11 +3225,11 @@ int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id) if (is_null) { addr = 0; } else if (is_userptr) { + struct sg_table *sg = to_userptr_vma(vma)->userptr.sg; struct xe_res_cursor cur; - if (vma->userptr.sg) { - xe_res_first_sg(vma->userptr.sg, 0, XE_PAGE_SIZE, - &cur); + if (sg) { + xe_res_first_sg(sg, 0, XE_PAGE_SIZE, &cur); addr = xe_res_dma(&cur); } else { addr = 0; diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h index e9c907cbcd891e..df4a82e960ff0c 100644 --- a/drivers/gpu/drm/xe/xe_vm.h +++ b/drivers/gpu/drm/xe/xe_vm.h @@ -160,6 +160,18 @@ static inline bool xe_vma_is_userptr(struct xe_vma *vma) return xe_vma_has_no_bo(vma) && !xe_vma_is_null(vma); } +/** + * to_userptr_vma() - Return a pointer to an embedding userptr vma + * @vma: Pointer to the embedded struct xe_vma + * + * Return: Pointer to the embedding userptr vma + */ +static inline struct xe_userptr_vma *to_userptr_vma(struct xe_vma *vma) +{ + xe_assert(xe_vma_vm(vma)->xe, xe_vma_is_userptr(vma)); + return container_of(vma, struct xe_userptr_vma, vma); +} + u64 xe_vm_pdp4_descriptor(struct xe_vm *vm, struct xe_tile *tile); int xe_vm_create_ioctl(struct drm_device *dev, void *data, @@ -222,9 +234,9 @@ static inline void xe_vm_reactivate_rebind(struct xe_vm *vm) } } -int xe_vma_userptr_pin_pages(struct xe_vma *vma); +int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma); -int 
xe_vma_userptr_check_repin(struct xe_vma *vma); +int xe_vma_userptr_check_repin(struct xe_userptr_vma *uvma); bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, ktime_t *end); diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h index 63e8a50b88e949..1fec66ae2eb2dd 100644 --- a/drivers/gpu/drm/xe/xe_vm_types.h +++ b/drivers/gpu/drm/xe/xe_vm_types.h @@ -37,6 +37,8 @@ struct xe_vm; struct xe_userptr { /** @invalidate_link: Link for the vm::userptr.invalidated list */ struct list_head invalidate_link; + /** @userptr: link into VM repin list if userptr. */ + struct list_head repin_link; /** * @notifier: MMU notifier for user pointer (invalidation call back) */ @@ -68,8 +70,6 @@ struct xe_vma { * resv. */ union { - /** @userptr: link into VM repin list if userptr. */ - struct list_head userptr; /** @rebind: link into VM if this VMA needs rebinding. */ struct list_head rebind; /** @destroy: link to contested list when VM is being closed. */ @@ -105,11 +105,15 @@ struct xe_vma { * @pat_index: The pat index to use when encoding the PTEs for this vma. */ u16 pat_index; +}; - /** - * @userptr: user pointer state, only allocated for VMAs that are - * user pointers - */ +/** + * struct xe_userptr_vma - A userptr vma subclass + * @vma: The vma. + * @userptr: Additional userptr information. + */ +struct xe_userptr_vma { + struct xe_vma vma; struct xe_userptr userptr; }; From d6beadc8d7326adf4fb6e62bb0453b17b93816c7 Mon Sep 17 00:00:00 2001 From: Suraj Kandpal Date: Wed, 24 Jan 2024 10:22:48 +0530 Subject: [PATCH 143/200] drm/xe/gsc: Add status check during gsc header readout Before checking if data is present in the message reply, check the status in the header to see if it indicates any error. --v2 - Use drm_err() instead of drm_dbg_kms() [Daniele] --v3 - Use &xe->drm in drm_err to make it cleaner [Daniele] Cc: Daniele Ceraolo Spurio Signed-off-by: Suraj Kandpal Reviewed-by: Daniele Ceraolo Spurio Signed-off-by: Daniele Ceraolo Spurio Link: https://patchwork.freedesktop.org/patch/msgid/20240124045248.687023-1-suraj.kandpal@intel.com --- drivers/gpu/drm/xe/xe_gsc_submit.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_gsc_submit.c b/drivers/gpu/drm/xe/xe_gsc_submit.c index 9ecc1ead684470..348994b271be6a 100644 --- a/drivers/gpu/drm/xe/xe_gsc_submit.c +++ b/drivers/gpu/drm/xe/xe_gsc_submit.c @@ -125,11 +125,18 @@ int xe_gsc_read_out_header(struct xe_device *xe, { u32 marker = mtl_gsc_header_rd(xe, map, offset, validity_marker); u32 size = mtl_gsc_header_rd(xe, map, offset, message_size); + u32 status = mtl_gsc_header_rd(xe, map, offset, status); u32 payload_size = size - GSC_HDR_SIZE; if (marker != GSC_HECI_VALIDITY_MARKER) return -EPROTO; + if (status != 0) { + drm_err(&xe->drm, "GSC header readout indicates error: %d\n", + status); + return -EINVAL; + } + if (size < GSC_HDR_SIZE || payload_size < min_payload_size) return -ENODATA; From 3acc1ff1a72fce00cdbd3ef1c27108a967fd5616 Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Thu, 1 Feb 2024 09:55:32 -0800 Subject: [PATCH 144/200] drm/xe: Fix loop in vm_bind_ioctl_ops_unwind The logic for the unwind loop is incorrect, resulting in an infinite loop. Fix the unwind to go from the last operations list to the first.
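Spelled out, the defect and the fix (this mirrors the one-line change in the diff below):

  /* broken: i starts nonzero for a non-empty list, the "i" test stays
   * true, and ++i walks forward past the end of ops[] forever */
  for (i = num_ops_list - 1; i; ++i)

  /* fixed: iterate from the last operations list down to index 0 */
  for (i = num_ops_list - 1; i >= 0; --i)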
Fixes: 617eebb9c480 ("drm/xe: Fix array of binds") Signed-off-by: Matthew Brost Reviewed-by: Maarten Lankhorst Link: https://patchwork.freedesktop.org/patch/msgid/20240201175532.2303168-1-matthew.brost@intel.com --- drivers/gpu/drm/xe/xe_vm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index c107f7f414d88c..9c1c68a2fff734 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -2669,7 +2669,7 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm, { int i; - for (i = num_ops_list - 1; i; ++i) { + for (i = num_ops_list - 1; i >= 0; --i) { struct drm_gpuva_ops *__ops = ops[i]; struct drm_gpuva_op *__op; From 5fcbf83e39ecde8e54c0b3da3a755a306a0ac348 Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Thu, 1 Feb 2024 10:48:44 -0800 Subject: [PATCH 145/200] drm/xe: Drop rebind argument from xe_pt_prepare_bind This is unused, drop it. Signed-off-by: Matthew Brost Reviewed-by: Oak Zeng Link: https://patchwork.freedesktop.org/patch/msgid/20240201184844.2317004-1-matthew.brost@intel.com --- drivers/gpu/drm/xe/xe_pt.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c index e45b37c3f0c262..3a99bf6e558f79 100644 --- a/drivers/gpu/drm/xe/xe_pt.c +++ b/drivers/gpu/drm/xe/xe_pt.c @@ -861,8 +861,7 @@ static void xe_pt_commit_bind(struct xe_vma *vma, static int xe_pt_prepare_bind(struct xe_tile *tile, struct xe_vma *vma, - struct xe_vm_pgtable_update *entries, u32 *num_entries, - bool rebind) + struct xe_vm_pgtable_update *entries, u32 *num_entries) { int err; @@ -1218,7 +1217,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue "Preparing bind, with range [%llx...%llx) engine %p.\n", xe_vma_start(vma), xe_vma_end(vma), q); - err = xe_pt_prepare_bind(tile, vma, entries, &num_entries, rebind); + err = xe_pt_prepare_bind(tile, vma, entries, &num_entries); if (err) goto err; xe_tile_assert(tile, num_entries <= ARRAY_SIZE(entries)); From a856b67a84169e065ebbeee50258936b1eacc9eb Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Wed, 31 Jan 2024 16:48:48 -0800 Subject: [PATCH 146/200] drm/xe: Take a reference in xe_exec_queue_last_fence_get() Take a reference in xe_exec_queue_last_fence_get(). Also fix a reference counting underflow bug in VM bind and unbind.
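The resulting calling convention, sketched (the caller shown is illustrative):

  struct dma_fence *fence;

  fence = xe_exec_queue_last_fence_get(q, vm); /* now returns a reference */
  /* ... use fence ... */
  dma_fence_put(fence); /* caller owns the reference and must drop it */

Callers that previously added their own dma_fence_get() lose that call, and callers that used the fence without holding a reference gain a dma_fence_put(), as the hunks below show.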
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost Reviewed-by: Maarten Lankhorst Link: https://patchwork.freedesktop.org/patch/msgid/20240201004849.2219558-2-matthew.brost@intel.com --- drivers/gpu/drm/xe/xe_exec_queue.c | 8 ++++++-- drivers/gpu/drm/xe/xe_migrate.c | 5 ++++- drivers/gpu/drm/xe/xe_sched_job.c | 1 - drivers/gpu/drm/xe/xe_sync.c | 2 -- drivers/gpu/drm/xe/xe_vm.c | 1 + 5 files changed, 11 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c index c0b7434e78f114..2976635be4d396 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.c +++ b/drivers/gpu/drm/xe/xe_exec_queue.c @@ -976,20 +976,24 @@ void xe_exec_queue_last_fence_put_unlocked(struct xe_exec_queue *q) * @q: The exec queue * @vm: The VM the engine does a bind or exec for * - * Get last fence, does not take a ref + * Get last fence, takes a ref * * Returns: last fence if not signaled, dma fence stub if signaled */ struct dma_fence *xe_exec_queue_last_fence_get(struct xe_exec_queue *q, struct xe_vm *vm) { + struct dma_fence *fence; + xe_exec_queue_last_fence_lockdep_assert(q, vm); if (q->last_fence && test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &q->last_fence->flags)) xe_exec_queue_last_fence_put(q, vm); - return q->last_fence ? q->last_fence : dma_fence_get_stub(); + fence = q->last_fence ? q->last_fence : dma_fence_get_stub(); + dma_fence_get(fence); + return fence; } /** diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c index 9ab004871f9a62..894e36c28f3271 100644 --- a/drivers/gpu/drm/xe/xe_migrate.c +++ b/drivers/gpu/drm/xe/xe_migrate.c @@ -1214,8 +1214,11 @@ static bool no_in_syncs(struct xe_vm *vm, struct xe_exec_queue *q, } if (q) { fence = xe_exec_queue_last_fence_get(q, vm); - if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) { + dma_fence_put(fence); return false; + } + dma_fence_put(fence); } return true; diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c index cde1407867db6c..8151ddafb94075 100644 --- a/drivers/gpu/drm/xe/xe_sched_job.c +++ b/drivers/gpu/drm/xe/xe_sched_job.c @@ -274,7 +274,6 @@ int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm) struct dma_fence *fence; fence = xe_exec_queue_last_fence_get(job->q, vm); - dma_fence_get(fence); return drm_sched_job_add_dependency(&job->drm, fence); } diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c index e4c220cf9115e9..aab92bee1d7cf2 100644 --- a/drivers/gpu/drm/xe/xe_sync.c +++ b/drivers/gpu/drm/xe/xe_sync.c @@ -307,7 +307,6 @@ xe_sync_in_fence_get(struct xe_sync_entry *sync, int num_sync, /* Easy case... 
*/ if (!num_in_fence) { fence = xe_exec_queue_last_fence_get(q, vm); - dma_fence_get(fence); return fence; } @@ -322,7 +321,6 @@ xe_sync_in_fence_get(struct xe_sync_entry *sync, int num_sync, } } fences[current_fence++] = xe_exec_queue_last_fence_get(q, vm); - dma_fence_get(fences[current_fence - 1]); cf = dma_fence_array_create(num_in_fence, fences, vm->composite_fence_ctx, vm->composite_fence_seqno++, diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 9c1c68a2fff734..a74f79edfbc8e0 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -1984,6 +1984,7 @@ static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma, xe_exec_queue_last_fence_get(wait_exec_queue, vm); xe_sync_entry_signal(&syncs[i], NULL, fence); + dma_fence_put(fence); } } From 447f74d223b4f6cbab74963bf1099050c15374ce Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Wed, 31 Jan 2024 16:48:49 -0800 Subject: [PATCH 147/200] drm/xe: Pick correct userptr VMA to repin on REMAP op failure A REMAP op is composed of 3 VMAs: unmap, prev map, and next map. When op_execute fails with -EAGAIN, we need to update the local VMA pointer to the current op state and then repin the VMA if it is a userptr. This fixes a failure seen in xe_vm.munmap-style-unbind-userptr-one-partial. Fixes: b06d47be7c83 ("drm/xe: Port Xe to GPUVA") Signed-off-by: Matthew Brost Reviewed-by: Maarten Lankhorst Link: https://patchwork.freedesktop.org/patch/msgid/20240201004849.2219558-3-matthew.brost@intel.com --- drivers/gpu/drm/xe/xe_vm.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index a74f79edfbc8e0..7e29b816c4d426 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -2531,13 +2531,25 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma, } drm_exec_fini(&exec); - if (err == -EAGAIN && xe_vma_is_userptr(vma)) { + if (err == -EAGAIN) { lockdep_assert_held_write(&vm->lock); - err = xe_vma_userptr_pin_pages(to_userptr_vma(vma)); - if (!err) - goto retry_userptr; - trace_xe_vma_fail(vma); + if (op->base.op == DRM_GPUVA_OP_REMAP) { + if (!op->remap.unmap_done) + vma = gpuva_to_vma(op->base.remap.unmap->va); + else if (op->remap.prev) + vma = op->remap.prev; + else + vma = op->remap.next; + } + + if (xe_vma_is_userptr(vma)) { + err = xe_vma_userptr_pin_pages(to_userptr_vma(vma)); + if (!err) + goto retry_userptr; + + trace_xe_vma_fail(vma); + } } return err; From 774ef5dfc95578a9079426d5106076dcd59c4dfa Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 3 Jan 2024 12:48:02 +0100 Subject: [PATCH 148/200] drm/xe: circumvent bogus stringop-overflow warning gcc-13 warns about an array overflow that it sees but that is prevented by the "asid % NUM_PF_QUEUE" calculation: drivers/gpu/drm/xe/xe_gt_pagefault.c: In function 'xe_guc_pagefault_handler': include/linux/fortify-string.h:57:33: error: writing 16 bytes into a region of size 0 [-Werror=stringop-overflow=] include/linux/fortify-string.h:689:26: note: in expansion of macro '__fortify_memcpy_chk' 689 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ | ^~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/xe/xe_gt_pagefault.c:341:17: note: in expansion of macro 'memcpy' 341 | memcpy(pf_queue->data + pf_queue->tail, msg, len * sizeof(u32)); | ^~~~~~ drivers/gpu/drm/xe/xe_gt_types.h:102:25: note: at offset [1144, 265324] into destination object 'tile' of size 8 I found that rewriting the assignment using pointer addition rather than the
equivalent array index calculation prevents the warning, so use that instead. I sent a bug report against gcc for the false positive warning. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113214 Signed-off-by: Arnd Bergmann Signed-off-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240103114819.2913937-1-arnd@kernel.org --- drivers/gpu/drm/xe/xe_gt_pagefault.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c index 78970259cea948..c26e4fcca01e65 100644 --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c @@ -340,7 +340,7 @@ int xe_guc_pagefault_handler(struct xe_guc *guc, u32 *msg, u32 len) return -EPROTO; asid = FIELD_GET(PFD_ASID, msg[1]); - pf_queue = >->usm.pf_queue[asid % NUM_PF_QUEUE]; + pf_queue = gt->usm.pf_queue + (asid % NUM_PF_QUEUE); spin_lock_irqsave(&pf_queue->lock, flags); full = pf_queue_full(pf_queue); From 72f86ed3c88933d6fa09b036de93621ea71097a7 Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Thu, 1 Feb 2024 19:34:40 -0800 Subject: [PATCH 149/200] drm/xe: Map both mem.kernel_bb_pool and usm.bb_pool For integrated devices we need to map both mem.kernel_bb_pool and usm.bb_pool to be able to run batches from both pools. Fixes: a682b6a42d4d ("drm/xe: Support device page faults on integrated platforms") Tested-by: Brian Welty Signed-off-by: Matthew Brost Reviewed-by: Brian Welty Link: https://patchwork.freedesktop.org/patch/msgid/20240202033440.2351862-1-matthew.brost@intel.com --- drivers/gpu/drm/xe/xe_gt.c | 5 ++++- drivers/gpu/drm/xe/xe_migrate.c | 23 ++++++++++++++++++----- 2 files changed, 22 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c index 675a2927a19ef1..a30b8cfd2ddbcd 100644 --- a/drivers/gpu/drm/xe/xe_gt.c +++ b/drivers/gpu/drm/xe/xe_gt.c @@ -456,7 +456,10 @@ static int all_fw_domain_init(struct xe_gt *gt) * USM has its only SA pool to non-block behind user operations */ if (gt_to_xe(gt)->info.has_usm) { - gt->usm.bb_pool = xe_sa_bo_manager_init(gt_to_tile(gt), SZ_1M, 16); + struct xe_device *xe = gt_to_xe(gt); + + gt->usm.bb_pool = xe_sa_bo_manager_init(gt_to_tile(gt), + IS_DGFX(xe) ? SZ_1M : SZ_512K, 16); if (IS_ERR(gt->usm.bb_pool)) { err = PTR_ERR(gt->usm.bb_pool); goto err_force_wake; diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c index 894e36c28f3271..3d2438dc86eeef 100644 --- a/drivers/gpu/drm/xe/xe_migrate.c +++ b/drivers/gpu/drm/xe/xe_migrate.c @@ -180,11 +180,6 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m, if (!IS_DGFX(xe)) { /* Write out batch too */ m->batch_base_ofs = NUM_PT_SLOTS * XE_PAGE_SIZE; - if (xe->info.has_usm) { - batch = tile->primary_gt->usm.bb_pool->bo; - m->usm_batch_base_ofs = m->batch_base_ofs; - } - for (i = 0; i < batch->size; i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE : XE_PAGE_SIZE) { @@ -195,6 +190,24 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m, entry); level++; } + if (xe->info.has_usm) { + xe_tile_assert(tile, batch->size == SZ_1M); + + batch = tile->primary_gt->usm.bb_pool->bo; + m->usm_batch_base_ofs = m->batch_base_ofs + SZ_1M; + xe_tile_assert(tile, batch->size == SZ_512K); + + for (i = 0; i < batch->size; + i += vm->flags & XE_VM_FLAG_64K ? 
XE_64K_PAGE_SIZE : + XE_PAGE_SIZE) { + entry = vm->pt_ops->pte_encode_bo(batch, i, + pat_index, 0); + + xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64, + entry); + level++; + } + } } else { u64 batch_addr = xe_bo_addr(batch, 0, XE_PAGE_SIZE); From 86c99abb5f1b6fcd69fb268eeb2e34cb7c4f355c Mon Sep 17 00:00:00 2001 From: Xiaoming Wang Date: Fri, 2 Feb 2024 13:56:58 -0800 Subject: [PATCH 150/200] drm/xe/display: Fix memleak in display initialization intel_power_domains_init is called twice in xe_device_probe: 1) intel_power_domains_init() xe_display_init_nommio() xe_device_probe() 2) intel_power_domains_init() intel_display_driver_probe_noirq() xe_display_init_noirq() xe_device_probe() One of these calls needs to be removed to avoid a double allocation of power_domains->power_wells. unreferenced object 0xffff88811150ee00 (size 512): comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s) hex dump (first 32 bytes): 10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff ................ ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ................ backtrace: [] __kmem_cache_alloc_node+0x1c1/0x2b0 [] __kmalloc+0x52/0x150 [] __set_power_wells+0xc3/0x360 [xe] [] xe_display_init_nommio+0x4c/0x70 [xe] [] xe_device_probe+0x3c/0x5a0 [xe] [] xe_pci_probe+0x33f/0x5a0 [xe] [] local_pci_probe+0x47/0xa0 [] pci_device_probe+0xc3/0x1f0 [] really_probe+0x1a2/0x410 [] __driver_probe_device+0x78/0x160 [] driver_probe_device+0x1e/0x90 [] __driver_attach+0xda/0x1d0 [] bus_for_each_dev+0x7c/0xd0 [] bus_add_driver+0x119/0x220 [] driver_register+0x60/0x120 [] 0xffffffffa05e50a0 The call to intel_power_domains_cleanup() needs to stay where it is for now. The main issue is that while the init is called by the display side, shared by i915 and xe, the cleanup is called by a non-shared code path. Fixing that will be done as a separate commit. Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Signed-off-by: Xiaoming Wang [ reword commit message and explain why the fini needs to stay where it is ] Reviewed-by: Lucas De Marchi Signed-off-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240202215658.561298-1-lucas.demarchi@intel.com --- drivers/gpu/drm/xe/display/xe_display.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c index 74391d9b11ae0e..e4db069f0db3f1 100644 --- a/drivers/gpu/drm/xe/display/xe_display.c +++ b/drivers/gpu/drm/xe/display/xe_display.c @@ -134,8 +134,6 @@ static void xe_display_fini_nommio(struct drm_device *dev, void *dummy) int xe_display_init_nommio(struct xe_device *xe) { - int err; - if (!xe->info.enable_display) return 0; @@ -145,10 +143,6 @@ int xe_display_init_nommio(struct xe_device *xe) /* This must be called before any calls to HAS_PCH_* */ intel_detect_pch(xe); - err = intel_power_domains_init(xe); - if (err) - return err; - return drmm_add_action_or_reset(&xe->drm, xe_display_fini_nommio, xe); } From 6650ad3e094812f27d9a70d82e5633271a7c9a5a Mon Sep 17 00:00:00 2001 From: John Harrison Date: Fri, 2 Feb 2024 12:00:14 -0800 Subject: [PATCH 151/200] drm/xe/uc: Include patch version in expectations Patch level releases can be just as important as major level releases if they fix a critical bug. So include the patch version in the expectation check so the user is properly informed if they need to update.
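The stricter check amounts to a lexicographic comparison on (minor, patch) once the major version matches; a sketch of the recommendation test, mirroring the hunk below (the flag name is invented for illustration):

  if (wanted->minor > found->minor ||
      (wanted->minor == found->minor && wanted->patch > found->patch))
          update_recommended = true; /* found firmware works, newer blob advised */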
Signed-off-by: John Harrison Reviewed-by: Lucas De Marchi Signed-off-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240202200017.2133438-4-John.C.Harrison@Intel.com --- drivers/gpu/drm/xe/xe_uc_fw.c | 49 +++++++++++++++++++---------------- 1 file changed, 27 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_uc_fw.c b/drivers/gpu/drm/xe/xe_uc_fw.c index 9dff96dfe45592..d3f0fe2101a793 100644 --- a/drivers/gpu/drm/xe/xe_uc_fw.c +++ b/drivers/gpu/drm/xe/xe_uc_fw.c @@ -92,6 +92,7 @@ struct uc_fw_entry { const char *path; u16 major; u16 minor; + u16 patch; bool full_ver_required; }; }; @@ -102,14 +103,14 @@ struct fw_blobs_by_type { }; #define XE_GUC_FIRMWARE_DEFS(fw_def, mmp_ver, major_ver) \ - fw_def(METEORLAKE, major_ver(i915, guc, mtl, 70, 7)) \ - fw_def(DG2, major_ver(i915, guc, dg2, 70, 5)) \ - fw_def(DG1, major_ver(i915, guc, dg1, 70, 5)) \ - fw_def(ALDERLAKE_N, major_ver(i915, guc, tgl, 70, 5)) \ - fw_def(ALDERLAKE_P, major_ver(i915, guc, adlp, 70, 5)) \ - fw_def(ALDERLAKE_S, major_ver(i915, guc, tgl, 70, 5)) \ - fw_def(ROCKETLAKE, major_ver(i915, guc, tgl, 70, 5)) \ - fw_def(TIGERLAKE, major_ver(i915, guc, tgl, 70, 5)) + fw_def(METEORLAKE, major_ver(i915, guc, mtl, 70, 7, 0)) \ + fw_def(DG2, major_ver(i915, guc, dg2, 70, 5, 0)) \ + fw_def(DG1, major_ver(i915, guc, dg1, 70, 5, 0)) \ + fw_def(ALDERLAKE_N, major_ver(i915, guc, tgl, 70, 5, 0)) \ + fw_def(ALDERLAKE_P, major_ver(i915, guc, adlp, 70, 5, 0)) \ + fw_def(ALDERLAKE_S, major_ver(i915, guc, tgl, 70, 5, 0)) \ + fw_def(ROCKETLAKE, major_ver(i915, guc, tgl, 70, 5, 0)) \ + fw_def(TIGERLAKE, major_ver(i915, guc, tgl, 70, 5, 0)) #define XE_HUC_FIRMWARE_DEFS(fw_def, mmp_ver, no_ver) \ fw_def(METEORLAKE, no_ver(i915, huc_gsc, mtl)) \ @@ -121,24 +122,24 @@ struct fw_blobs_by_type { /* for the GSC FW we match the compatibility version and not the release one */ #define XE_GSC_FIRMWARE_DEFS(fw_def, major_ver) \ - fw_def(METEORLAKE, major_ver(i915, gsc, mtl, 1, 0)) + fw_def(METEORLAKE, major_ver(i915, gsc, mtl, 1, 0, 0)) #define MAKE_FW_PATH(dir__, uc__, shortname__, version__) \ __stringify(dir__) "/" __stringify(shortname__) "_" __stringify(uc__) version__ ".bin" #define fw_filename_mmp_ver(dir_, uc_, shortname_, a, b, c) \ MAKE_FW_PATH(dir_, uc_, shortname_, "_" __stringify(a ## . ## b ## . 
## c)) -#define fw_filename_major_ver(dir_, uc_, shortname_, a, b) \ +#define fw_filename_major_ver(dir_, uc_, shortname_, a, b, c) \ MAKE_FW_PATH(dir_, uc_, shortname_, "_" __stringify(a)) #define fw_filename_no_ver(dir_, uc_, shortname_) \ MAKE_FW_PATH(dir_, uc_, shortname_, "") #define uc_fw_entry_mmp_ver(dir_, uc_, shortname_, a, b, c) \ { fw_filename_mmp_ver(dir_, uc_, shortname_, a, b, c), \ - a, b, true } -#define uc_fw_entry_major_ver(dir_, uc_, shortname_, a, b) \ - { fw_filename_major_ver(dir_, uc_, shortname_, a, b), \ - a, b } + a, b, c, true } +#define uc_fw_entry_major_ver(dir_, uc_, shortname_, a, b, c) \ + { fw_filename_major_ver(dir_, uc_, shortname_, a, b, c), \ + a, b, c } #define uc_fw_entry_no_ver(dir_, uc_, shortname_) \ { fw_filename_no_ver(dir_, uc_, shortname_), \ 0, 0 } @@ -221,6 +222,7 @@ uc_fw_auto_select(struct xe_device *xe, struct xe_uc_fw *uc_fw) uc_fw->path = entries[i].path; uc_fw->versions.wanted.major = entries[i].major; uc_fw->versions.wanted.minor = entries[i].minor; + uc_fw->versions.wanted.patch = entries[i].patch; uc_fw->full_ver_required = entries[i].full_ver_required; if (uc_fw->type == XE_UC_FW_TYPE_GSC) @@ -340,19 +342,22 @@ int xe_uc_fw_check_version_requirements(struct xe_uc_fw *uc_fw) * Otherwise, at least the major version. */ if (wanted->major != found->major || - (uc_fw->full_ver_required && wanted->minor != found->minor)) { - drm_notice(&xe->drm, "%s firmware %s: unexpected version: %u.%u != %u.%u\n", + (uc_fw->full_ver_required && + ((wanted->minor != found->minor) || + (wanted->patch != found->patch)))) { + drm_notice(&xe->drm, "%s firmware %s: unexpected version: %u.%u.%u != %u.%u.%u\n", xe_uc_fw_type_repr(uc_fw->type), uc_fw->path, - found->major, found->minor, - wanted->major, wanted->minor); + found->major, found->minor, found->patch, + wanted->major, wanted->minor, wanted->patch); goto fail; } - if (wanted->minor > found->minor) { - drm_notice(&xe->drm, "%s firmware (%u.%u) is recommended, but only (%u.%u) was found in %s\n", + if (wanted->minor > found->minor || + (wanted->minor == found->minor && wanted->patch > found->patch)) { + drm_notice(&xe->drm, "%s firmware (%u.%u.%u) is recommended, but only (%u.%u.%u) was found in %s\n", xe_uc_fw_type_repr(uc_fw->type), - wanted->major, wanted->minor, - found->major, found->minor, + wanted->major, wanted->minor, wanted->patch, + found->major, found->minor, found->patch, uc_fw->path); drm_info(&xe->drm, "Consider updating your linux-firmware pkg or downloading from %s\n", XE_UC_FIRMWARE_URL); From 32ca46bf294462acad91235ab15e37f1cb3ca73b Mon Sep 17 00:00:00 2001 From: John Harrison Date: Fri, 2 Feb 2024 12:00:15 -0800 Subject: [PATCH 152/200] drm/xe/guc: Update to GuC firmware 70.19.2 API compatibility version: 1.8.2 Signed-off-by: John Harrison Reviewed-by: Lucas De Marchi Signed-off-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240202200017.2133438-5-John.C.Harrison@Intel.com --- drivers/gpu/drm/xe/xe_uc_fw.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_uc_fw.c b/drivers/gpu/drm/xe/xe_uc_fw.c index d3f0fe2101a793..d896c98dd0bb45 100644 --- a/drivers/gpu/drm/xe/xe_uc_fw.c +++ b/drivers/gpu/drm/xe/xe_uc_fw.c @@ -103,14 +103,14 @@ struct fw_blobs_by_type { }; #define XE_GUC_FIRMWARE_DEFS(fw_def, mmp_ver, major_ver) \ - fw_def(METEORLAKE, major_ver(i915, guc, mtl, 70, 7, 0)) \ - fw_def(DG2, major_ver(i915, guc, dg2, 70, 5, 0)) \ - fw_def(DG1, major_ver(i915, guc, dg1, 70, 5, 0)) \ - fw_def(ALDERLAKE_N, 
major_ver(i915, guc, tgl, 70, 5, 0)) \ - fw_def(ALDERLAKE_P, major_ver(i915, guc, adlp, 70, 5, 0)) \ - fw_def(ALDERLAKE_S, major_ver(i915, guc, tgl, 70, 5, 0)) \ - fw_def(ROCKETLAKE, major_ver(i915, guc, tgl, 70, 5, 0)) \ - fw_def(TIGERLAKE, major_ver(i915, guc, tgl, 70, 5, 0)) + fw_def(METEORLAKE, major_ver(i915, guc, mtl, 70, 19, 2)) \ + fw_def(DG2, major_ver(i915, guc, dg2, 70, 19, 2)) \ + fw_def(DG1, major_ver(i915, guc, dg1, 70, 19, 2)) \ + fw_def(ALDERLAKE_N, major_ver(i915, guc, tgl, 70, 19, 2)) \ + fw_def(ALDERLAKE_P, major_ver(i915, guc, adlp, 70, 19, 2)) \ + fw_def(ALDERLAKE_S, major_ver(i915, guc, tgl, 70, 19, 2)) \ + fw_def(ROCKETLAKE, major_ver(i915, guc, tgl, 70, 19, 2)) \ + fw_def(TIGERLAKE, major_ver(i915, guc, tgl, 70, 19, 2)) #define XE_HUC_FIRMWARE_DEFS(fw_def, mmp_ver, no_ver) \ fw_def(METEORLAKE, no_ver(i915, huc_gsc, mtl)) \ From db0adab049120e2df92420139538a22c8ee6faa0 Mon Sep 17 00:00:00 2001 From: John Harrison Date: Fri, 2 Feb 2024 12:00:16 -0800 Subject: [PATCH 153/200] drm/xe/guc: Add support for LNL firmware First release of GuC firmware for LNL is now available, so start using it. v2: Actually use xe directory. Doh! (review feedback from Lucas) Signed-off-by: John Harrison Reviewed-by: Lucas De Marchi Signed-off-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240202200017.2133438-6-John.C.Harrison@Intel.com --- drivers/gpu/drm/xe/xe_uc_fw.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/xe/xe_uc_fw.c b/drivers/gpu/drm/xe/xe_uc_fw.c index d896c98dd0bb45..4714f2c8d2bae1 100644 --- a/drivers/gpu/drm/xe/xe_uc_fw.c +++ b/drivers/gpu/drm/xe/xe_uc_fw.c @@ -103,6 +103,7 @@ struct fw_blobs_by_type { }; #define XE_GUC_FIRMWARE_DEFS(fw_def, mmp_ver, major_ver) \ + fw_def(LUNARLAKE, major_ver(xe, guc, lnl, 70, 19, 2)) \ fw_def(METEORLAKE, major_ver(i915, guc, mtl, 70, 19, 2)) \ fw_def(DG2, major_ver(i915, guc, dg2, 70, 19, 2)) \ fw_def(DG1, major_ver(i915, guc, dg1, 70, 19, 2)) \ From 17ffcdb041a4bce4db8f96552cbcf7ec8897490c Mon Sep 17 00:00:00 2001 From: Nirmoy Das Date: Wed, 31 Jan 2024 06:18:38 +0100 Subject: [PATCH 154/200] drm/xe/query: Use kzalloc for drm_xe_query_engines Use kzalloc like other routines for better consistency. 
v2: Improve the subject (Matt) Signed-off-by: Nirmoy Das Reviewed-by: Matthew Brost Signed-off-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240131051838.24705-1-nirmoy.das@intel.com --- drivers/gpu/drm/xe/xe_query.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c index 7e924faeeea0b0..4f1ab91dbec584 100644 --- a/drivers/gpu/drm/xe/xe_query.c +++ b/drivers/gpu/drm/xe/xe_query.c @@ -198,7 +198,7 @@ static int query_engines(struct xe_device *xe, return -EINVAL; } - engines = kmalloc(size, GFP_KERNEL); + engines = kzalloc(size, GFP_KERNEL); if (!engines) return -ENOMEM; @@ -212,14 +212,10 @@ static int query_engines(struct xe_device *xe, engines->engines[i].instance.engine_instance = hwe->logical_instance; engines->engines[i].instance.gt_id = gt->info.id; - engines->engines[i].instance.pad = 0; - memset(engines->engines[i].reserved, 0, - sizeof(engines->engines[i].reserved)); i++; } - engines->pad = 0; engines->num_engines = i; if (copy_to_user(query_ptr, engines, size)) { From 5ad6af5c91e9b942c44b657122270d935db3a813 Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Mon, 5 Feb 2024 15:17:14 -0800 Subject: [PATCH 155/200] drm/xe: Assume large page size if VMA not yet bound The calculation to determine the max page size of a VMA during REMAP operations assumes the VMA has been bound. This assumption is not true if the VMA is from an earlier operation in an array of binds. If a VMA has not been bound, use the maximum page size, which will ensure the previous / next REMAP operations are not incorrectly skipped. Fixes: 8f33b4f054fc ("drm/xe: Avoid doing rebinds") Signed-off-by: Matthew Brost Reviewed-by: Rodrigo Vivi Link: https://patchwork.freedesktop.org/patch/msgid/20240205231714.2956225-1-matthew.brost@intel.com --- drivers/gpu/drm/xe/xe_vm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 7e29b816c4d426..ed594fa2f8da13 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -2200,8 +2200,10 @@ static u64 xe_vma_max_pte_size(struct xe_vma *vma) return SZ_1G; else if (vma->gpuva.flags & XE_VMA_PTE_2M) return SZ_2M; + else if (vma->gpuva.flags & XE_VMA_PTE_4K) + return SZ_4K; - return SZ_4K; + return SZ_1G; /* Uninitialized, use max size */ } static u64 xe_vma_set_pte_size(struct xe_vma *vma, u64 size) From 8087199cd5951c1eba26003b3e4296dbb2110adf Mon Sep 17 00:00:00 2001 From: Matthew Auld Date: Fri, 2 Feb 2024 17:14:36 +0000 Subject: [PATCH 156/200] drm/xe/vm: don't ignore error when in_kthread MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If GUP fails and we are in_kthread, we can have pinned = 0 and ret = 0. If that happens we call sg_alloc_append_table_from_pages() with n_pages = 0, which is not well behaved and can trigger: kernel BUG at include/linux/scatterlist.h:115! depending on whether the pages array happens to be zeroed or not. Even if we don't hit that, it crashes later when trying to dma_map the returned table.
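The shape of the hazard, as a minimal sketch of the pin loop (the real code is in xe_vma_userptr_pin_pages(), patched below; "userptr" and "num_pages" stand in for the actual locals):

	pinned = 0;
	while (pinned < num_pages) {
		ret = get_user_pages_fast(userptr + pinned * PAGE_SIZE,
					  num_pages - pinned,
					  read_only ? 0 : FOLL_WRITE,
					  &pages[pinned]);
		if (ret < 0)
			break;	/* must propagate: masking this to 0 is the bug */
		pinned += ret;
	}
	if (ret < 0)
		return ret;	/* otherwise pinned == 0 reaches sg_alloc_append_table_from_pages() */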
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Matthew Brost Reviewed-by: Matthew Brost Link: https://patchwork.freedesktop.org/patch/msgid/20240202171435.427630-2-matthew.auld@intel.com --- drivers/gpu/drm/xe/xe_vm.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index ed594fa2f8da13..78952a9a15827b 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -114,11 +114,8 @@ int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma) num_pages - pinned, read_only ? 0 : FOLL_WRITE, &pages[pinned]); - if (ret < 0) { - if (in_kthread) - ret = 0; + if (ret < 0) break; - } pinned += ret; ret = 0; From 404669db60103ee6e6e4fe17bd6015bb5882e7b4 Mon Sep 17 00:00:00 2001 From: Karthik Poosa Date: Thu, 1 Feb 2024 23:36:00 +0530 Subject: [PATCH 157/200] drm/xe/hwmon: Refactor xe hwmon Check latest platform first in xe_hwmon_get_reg. Move PVC HWMON registers to regs/xe_pcode.h. Suggested-by: Matt Roper Signed-off-by: Karthik Poosa Reviewed-by: Badal Nilawar Signed-off-by: Rodrigo Vivi Link: https://patchwork.freedesktop.org/patch/msgid/20240201180600.434822-1-karthik.poosa@intel.com --- drivers/gpu/drm/xe/regs/xe_gt_regs.h | 6 ------ drivers/gpu/drm/xe/regs/xe_pcode_regs.h | 21 +++++++++++++++++++++ drivers/gpu/drm/xe/xe_hwmon.c | 25 +++++++++++++------------ 3 files changed, 34 insertions(+), 18 deletions(-) create mode 100644 drivers/gpu/drm/xe/regs/xe_pcode_regs.h diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h index cd27480f648626..15ac2d284d48f6 100644 --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h @@ -490,10 +490,4 @@ #define GT_CS_MASTER_ERROR_INTERRUPT REG_BIT(3) #define GT_RENDER_USER_INTERRUPT REG_BIT(0) -#define PVC_GT0_PACKAGE_ENERGY_STATUS XE_REG(0x281004) -#define PVC_GT0_PACKAGE_RAPL_LIMIT XE_REG(0x281008) -#define PVC_GT0_PACKAGE_POWER_SKU_UNIT XE_REG(0x281068) -#define PVC_GT0_PLATFORM_ENERGY_STATUS XE_REG(0x28106c) -#define PVC_GT0_PACKAGE_POWER_SKU XE_REG(0x281080) - #endif diff --git a/drivers/gpu/drm/xe/regs/xe_pcode_regs.h b/drivers/gpu/drm/xe/regs/xe_pcode_regs.h new file mode 100644 index 00000000000000..3dae858508c8e3 --- /dev/null +++ b/drivers/gpu/drm/xe/regs/xe_pcode_regs.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2024 Intel Corporation + */ + +#ifndef _XE_PCODE_REGS_H_ +#define _XE_PCODE_REGS_H_ + +#include "regs/xe_reg_defs.h" + +/* + * This file contains addresses of PCODE registers visible through GT MMIO space. 
+ */ + +#define PVC_GT0_PACKAGE_ENERGY_STATUS XE_REG(0x281004) +#define PVC_GT0_PACKAGE_RAPL_LIMIT XE_REG(0x281008) +#define PVC_GT0_PACKAGE_POWER_SKU_UNIT XE_REG(0x281068) +#define PVC_GT0_PLATFORM_ENERGY_STATUS XE_REG(0x28106c) +#define PVC_GT0_PACKAGE_POWER_SKU XE_REG(0x281080) + +#endif /* _XE_PCODE_REGS_H_ */ diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c index 89c6f7f84b5a52..f0194d5121a5dd 100644 --- a/drivers/gpu/drm/xe/xe_hwmon.c +++ b/drivers/gpu/drm/xe/xe_hwmon.c @@ -10,6 +10,7 @@ #include #include "regs/xe_gt_regs.h" #include "regs/xe_mchbar_regs.h" +#include "regs/xe_pcode_regs.h" #include "xe_device.h" #include "xe_gt.h" #include "xe_hwmon.h" @@ -77,32 +78,32 @@ static u32 xe_hwmon_get_reg(struct xe_hwmon *hwmon, enum xe_hwmon_reg hwmon_reg) switch (hwmon_reg) { case REG_PKG_RAPL_LIMIT: - if (xe->info.platform == XE_DG2) - reg = PCU_CR_PACKAGE_RAPL_LIMIT; - else if (xe->info.platform == XE_PVC) + if (xe->info.platform == XE_PVC) reg = PVC_GT0_PACKAGE_RAPL_LIMIT; + else if (xe->info.platform == XE_DG2) + reg = PCU_CR_PACKAGE_RAPL_LIMIT; break; case REG_PKG_POWER_SKU: - if (xe->info.platform == XE_DG2) - reg = PCU_CR_PACKAGE_POWER_SKU; - else if (xe->info.platform == XE_PVC) + if (xe->info.platform == XE_PVC) reg = PVC_GT0_PACKAGE_POWER_SKU; + else if (xe->info.platform == XE_DG2) + reg = PCU_CR_PACKAGE_POWER_SKU; break; case REG_PKG_POWER_SKU_UNIT: - if (xe->info.platform == XE_DG2) - reg = PCU_CR_PACKAGE_POWER_SKU_UNIT; - else if (xe->info.platform == XE_PVC) + if (xe->info.platform == XE_PVC) reg = PVC_GT0_PACKAGE_POWER_SKU_UNIT; + else if (xe->info.platform == XE_DG2) + reg = PCU_CR_PACKAGE_POWER_SKU_UNIT; break; case REG_GT_PERF_STATUS: if (xe->info.platform == XE_DG2) reg = GT_PERF_STATUS; break; case REG_PKG_ENERGY_STATUS: - if (xe->info.platform == XE_DG2) - reg = PCU_CR_PACKAGE_ENERGY_STATUS; - else if (xe->info.platform == XE_PVC) + if (xe->info.platform == XE_PVC) reg = PVC_GT0_PLATFORM_ENERGY_STATUS; + else if (xe->info.platform == XE_DG2) + reg = PCU_CR_PACKAGE_ENERGY_STATUS; break; default: drm_warn(&xe->drm, "Unknown xe hwmon reg id: %d\n", hwmon_reg); From 95ec8c1d6c9a16bdbee09f65d33b6b0c1cd83848 Mon Sep 17 00:00:00 2001 From: Riana Tauro Date: Tue, 6 Feb 2024 11:29:17 +0530 Subject: [PATCH 158/200] drm/xe/pm: add debug logs for D3cold add additional debug logs for PME# capability and presence of ACPI _PR3 resources. This is to identify the reason why the card is not capable of D3cold. 
No functional changes Signed-off-by: Riana Tauro Reviewed-by: Badal Nilawar Signed-off-by: Rodrigo Vivi Link: https://patchwork.freedesktop.org/patch/msgid/20240206055917.2629027-1-riana.tauro@intel.com --- drivers/gpu/drm/xe/xe_pm.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c index 4d3984ae4ff121..ab283e9a8b4e25 100644 --- a/drivers/gpu/drm/xe/xe_pm.c +++ b/drivers/gpu/drm/xe/xe_pm.c @@ -125,17 +125,26 @@ int xe_pm_resume(struct xe_device *xe) return 0; } -static bool xe_pm_pci_d3cold_capable(struct pci_dev *pdev) +static bool xe_pm_pci_d3cold_capable(struct xe_device *xe) { + struct pci_dev *pdev = to_pci_dev(xe->drm.dev); struct pci_dev *root_pdev; root_pdev = pcie_find_root_port(pdev); if (!root_pdev) return false; - /* D3Cold requires PME capability and _PR3 power resource */ - if (!pci_pme_capable(root_pdev, PCI_D3cold) || !pci_pr3_present(root_pdev)) + /* D3Cold requires PME capability */ + if (!pci_pme_capable(root_pdev, PCI_D3cold)) { + drm_dbg(&xe->drm, "d3cold: PME# not supported\n"); return false; + } + + /* D3Cold requires _PR3 power resource */ + if (!pci_pr3_present(root_pdev)) { + drm_dbg(&xe->drm, "d3cold: ACPI _PR3 not present\n"); + return false; + } return true; } @@ -171,15 +180,13 @@ void xe_pm_init_early(struct xe_device *xe) void xe_pm_init(struct xe_device *xe) { - struct pci_dev *pdev = to_pci_dev(xe->drm.dev); - /* For now suspend/resume is only allowed with GuC */ if (!xe_device_uc_enabled(xe)) return; drmm_mutex_init(&xe->drm, &xe->d3cold.lock); - xe->d3cold.capable = xe_pm_pci_d3cold_capable(pdev); + xe->d3cold.capable = xe_pm_pci_d3cold_capable(xe); if (xe->d3cold.capable) { xe_device_sysfs_init(xe); From d9890c028d66a9e1ee3cccaa081ab5aedcbfe431 Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Mon, 5 Feb 2024 20:50:10 -0800 Subject: [PATCH 159/200] drm/xe: Remove TEST_VM_ASYNC_OPS_ERROR TEST_VM_ASYNC_OPS_ERROR is broken and unused. Remove for now and will pull back in a later time when it is used, fixed, and properly hidden behind a Kconfig option. Also fixup the supported flags value. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost Reviewed-by: Maarten Lankhorst Link: https://patchwork.freedesktop.org/patch/msgid/20240206045010.2981051-1-matthew.brost@intel.com --- drivers/gpu/drm/xe/xe_vm.c | 28 +--------------------------- drivers/gpu/drm/xe/xe_vm_types.h | 8 -------- 2 files changed, 1 insertion(+), 35 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 78952a9a15827b..9d2e8088d07e22 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -37,8 +37,6 @@ #include "generated/xe_wa_oob.h" #include "xe_wa.h" -#define TEST_VM_ASYNC_OPS_ERROR - static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm) { return vm->gpuvm.r_obj; @@ -2062,7 +2060,6 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo, struct drm_gem_object *obj = bo ? 
&bo->ttm.base : NULL; struct drm_gpuva_ops *ops; struct drm_gpuva_op *__op; - struct xe_vma_op *op; struct drm_gpuvm_bo *vm_bo; int err; @@ -2109,15 +2106,6 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo, if (IS_ERR(ops)) return ops; -#ifdef TEST_VM_ASYNC_OPS_ERROR - if (operation & FORCE_ASYNC_OP_ERROR) { - op = list_first_entry_or_null(&ops->list, struct xe_vma_op, - base.entry); - if (op) - op->inject_error = true; - } -#endif - drm_gpuva_for_each_op(__op, ops) { struct xe_vma_op *op = gpuva_op_to_vma_op(__op); @@ -2560,13 +2548,6 @@ static int xe_vma_op_execute(struct xe_vm *vm, struct xe_vma_op *op) lockdep_assert_held_write(&vm->lock); -#ifdef TEST_VM_ASYNC_OPS_ERROR - if (op->inject_error) { - op->inject_error = false; - return -ENOMEM; - } -#endif - switch (op->base.op) { case DRM_GPUVA_OP_MAP: ret = __xe_vma_op_execute(vm, op->map.vma, op); @@ -2726,16 +2707,9 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm, return 0; } -#ifdef TEST_VM_ASYNC_OPS_ERROR -#define SUPPORTED_FLAGS \ - (FORCE_ASYNC_OP_ERROR | DRM_XE_VM_BIND_FLAG_READONLY | \ - DRM_XE_VM_BIND_FLAG_IMMEDIATE | DRM_XE_VM_BIND_FLAG_NULL | 0xffff) -#else #define SUPPORTED_FLAGS \ (DRM_XE_VM_BIND_FLAG_READONLY | \ - DRM_XE_VM_BIND_FLAG_IMMEDIATE | DRM_XE_VM_BIND_FLAG_NULL | \ - 0xffff) -#endif + DRM_XE_VM_BIND_FLAG_IMMEDIATE | DRM_XE_VM_BIND_FLAG_NULL) #define XE_64K_PAGE_MASK 0xffffull #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP) diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h index 1fec66ae2eb2dd..5ac9c5bebabc3c 100644 --- a/drivers/gpu/drm/xe/xe_vm_types.h +++ b/drivers/gpu/drm/xe/xe_vm_types.h @@ -21,9 +21,6 @@ struct xe_bo; struct xe_sync_entry; struct xe_vm; -#define TEST_VM_ASYNC_OPS_ERROR -#define FORCE_ASYNC_OP_ERROR BIT(31) - #define XE_VMA_READ_ONLY DRM_GPUVA_USERBITS #define XE_VMA_DESTROYED (DRM_GPUVA_USERBITS << 1) #define XE_VMA_ATOMIC_PTE_BIT (DRM_GPUVA_USERBITS << 2) @@ -360,11 +357,6 @@ struct xe_vma_op { /** @flags: operation flags */ enum xe_vma_op_flags flags; -#ifdef TEST_VM_ASYNC_OPS_ERROR - /** @inject_error: inject error to test async op error handling */ - bool inject_error; -#endif - union { /** @map: VMA map operation specific data */ struct xe_vma_op_map map; From 45883418969c445cdc901e208e190ed1a5d95956 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Thu, 1 Feb 2024 14:47:23 -0800 Subject: [PATCH 160/200] drm/xe: Always allow to override firmware The current logic for firmware selection is reminiscent of i915, where there are two backends and several platforms support only one: execlist or GuC. The xe driver has only the GuC backend and it simply doesn't work without it. Allow developers to override the firmware path even if there isn't a firmware entry in the table yet: this allows developers to more easily test the very first firmware before adding it there. The justification above is only true for GuC; however, those override paths should really be viewed as a developer aid. Simply apply the same logic to all firmware types and allow the override path to be used for all of them.
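With the reordering, the request flow reduces to roughly:

	uc_fw_auto_select(xe, uc_fw);	/* table lookup; may find no entry */
	uc_fw_override(uc_fw);		/* developer modparam, now applied unconditionally */
	xe_uc_fw_change_status(uc_fw, uc_fw->path ?
			       XE_UC_FIRMWARE_SELECTED :
			       XE_UC_FIRMWARE_NOT_SUPPORTED);

so a developer-supplied path (presumably via the module's firmware-path parameters) marks the firmware as selected even on a platform that has no table entry yet.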
Signed-off-by: Lucas De Marchi Reviewed-by: Francois Dugast Link: https://patchwork.freedesktop.org/patch/msgid/20240201224724.551130-2-lucas.demarchi@intel.com --- drivers/gpu/drm/xe/xe_uc_fw.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_uc_fw.c b/drivers/gpu/drm/xe/xe_uc_fw.c index 4714f2c8d2bae1..a8b203ac7c8141 100644 --- a/drivers/gpu/drm/xe/xe_uc_fw.c +++ b/drivers/gpu/drm/xe/xe_uc_fw.c @@ -658,6 +658,7 @@ static int uc_fw_request(struct xe_uc_fw *uc_fw, const struct firmware **firmwar xe_assert(xe, !uc_fw->path); uc_fw_auto_select(xe, uc_fw); + uc_fw_override(uc_fw); xe_uc_fw_change_status(uc_fw, uc_fw->path ? XE_UC_FIRMWARE_SELECTED : XE_UC_FIRMWARE_NOT_SUPPORTED); @@ -665,8 +666,6 @@ static int uc_fw_request(struct xe_uc_fw *uc_fw, const struct firmware **firmwar if (!xe_uc_fw_is_supported(uc_fw)) return 0; - uc_fw_override(uc_fw); - /* an empty path means the firmware is disabled */ if (!xe_device_uc_enabled(xe) || !(*uc_fw->path)) { xe_uc_fw_change_status(uc_fw, XE_UC_FIRMWARE_DISABLED); From 6badfc463d609a3db1cd4d13035a8b69c2a6ad7e Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Thu, 1 Feb 2024 14:47:24 -0800 Subject: [PATCH 161/200] drm/xe: Avoid cryptic message when there's no GuC definition If there's no GuC firmware entry in the table and the user didn't pass an override path, the error message is very cryptic: xe will simply try to continue and then fail when submitting the default context: xe 0000:00:02.0: [drm:xe_pci_probe [xe]] XE_LUNARLAKE 64b0:0001 dgfx:0 gfx:Xe2_LPG (20.04) media:Xe2_LPM (20.00) display:no dma_m_s:46 tc:1 gscfi:0 ... xe: probe of 0000:00:02.0 failed with error -22 Add an explicit error message and bail out: xe 0000:00:02.0: [drm:xe_pci_probe [xe]] XE_LUNARLAKE 64b0:0001 dgfx:0 gfx:Xe2_LPG (20.04) media:Xe2_LPM (20.00) display:no dma_m_s:46 tc:1 gscfi:0 xe 0000:00:02.0: [drm] *ERROR* No GuC firmware defined for platform xe 0000:00:02.0: [drm] *ERROR* GuC init failed with -2 ... xe: probe of 0000:00:02.0 failed with error -2 Signed-off-by: Lucas De Marchi Reviewed-by: Francois Dugast Link: https://patchwork.freedesktop.org/patch/msgid/20240201224724.551130-3-lucas.demarchi@intel.com --- drivers/gpu/drm/xe/xe_uc_fw.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_uc_fw.c b/drivers/gpu/drm/xe/xe_uc_fw.c index a8b203ac7c8141..a9d25b3fa67cfc 100644 --- a/drivers/gpu/drm/xe/xe_uc_fw.c +++ b/drivers/gpu/drm/xe/xe_uc_fw.c @@ -663,8 +663,13 @@ static int uc_fw_request(struct xe_uc_fw *uc_fw, const struct firmware **firmwar XE_UC_FIRMWARE_SELECTED : XE_UC_FIRMWARE_NOT_SUPPORTED); - if (!xe_uc_fw_is_supported(uc_fw)) + if (!xe_uc_fw_is_supported(uc_fw)) { + if (uc_fw->type == XE_UC_FW_TYPE_GUC) { + drm_err(&xe->drm, "No GuC firmware defined for platform\n"); + return -ENOENT; + } return 0; + } /* an empty path means the firmware is disabled */ if (!xe_device_uc_enabled(xe) || !(*uc_fw->path)) { From eb538b5574251a449f40b1ee35efc631228c8992 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= Date: Thu, 8 Feb 2024 14:21:15 +0100 Subject: [PATCH 162/200] drm/xe/vm: Avoid reserving zero fences MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The function xe_vm_prepare_vma was blindly accepting zero as the number of fences and forwarded that to drm_exec_prepare_obj. 
However, that leads to an out-of-bounds shift in the dma_resv_reserve_fences() and while one could argue that the dma_resv code should be robust against that, avoid attempting to reserve zero fences. Relevant stack trace: [773.183188] ------------[ cut here ]------------ [773.183199] UBSAN: shift-out-of-bounds in ../include/linux/log2.h:57:13 [773.183241] shift exponent 64 is too large for 64-bit type 'long unsigned int' [773.183254] CPU: 2 PID: 1816 Comm: xe_evict Tainted: G U 6.8.0-rc3-xe #1 [773.183256] Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 2014 10/14/2022 [773.183257] Call Trace: [773.183258] [773.183260] dump_stack_lvl+0xaf/0xd0 [773.183266] dump_stack+0x10/0x20 [773.183283] ubsan_epilogue+0x9/0x40 [773.183286] __ubsan_handle_shift_out_of_bounds+0x10f/0x170 [773.183293] dma_resv_reserve_fences.cold+0x2b/0x48 [773.183295] ? ww_mutex_lock+0x3c/0x110 [773.183301] drm_exec_prepare_obj+0x45/0x60 [drm_exec] [773.183313] xe_vm_prepare_vma+0x33/0x70 [xe] [773.183375] xe_vma_destroy_unlocked+0x55/0xa0 [xe] [773.183427] xe_vm_close_and_put+0x526/0x940 [xe] Fixes: 2714d5093620 ("drm/xe: Convert pagefaulting code to use drm_exec") Cc: Thomas Hellström Cc: Matthew Brost Cc: Rodrigo Vivi Signed-off-by: Thomas Hellström Reviewed-by: Matthew Auld Link: https://patchwork.freedesktop.org/patch/msgid/20240208132115.3132-1-thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/xe/xe_vm.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 9d2e8088d07e22..836a6e849cda82 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -995,9 +995,16 @@ int xe_vm_prepare_vma(struct drm_exec *exec, struct xe_vma *vma, int err; XE_WARN_ON(!vm); - err = drm_exec_prepare_obj(exec, xe_vm_obj(vm), num_shared); - if (!err && bo && !bo->vm) - err = drm_exec_prepare_obj(exec, &bo->ttm.base, num_shared); + if (num_shared) + err = drm_exec_prepare_obj(exec, xe_vm_obj(vm), num_shared); + else + err = drm_exec_lock_obj(exec, xe_vm_obj(vm)); + if (!err && bo && !bo->vm) { + if (num_shared) + err = drm_exec_prepare_obj(exec, &bo->ttm.base, num_shared); + else + err = drm_exec_lock_obj(exec, &bo->ttm.base); + } return err; } From 82bd83a0cf7ab1e92bd100fb91081a6855bd3545 Mon Sep 17 00:00:00 2001 From: Dani Liberman Date: Wed, 24 Jan 2024 09:50:58 +0200 Subject: [PATCH 163/200] drm/xe/irq: allocate all possible msix interrupts If platform supports MSIX, driver needs to allocate all possible interrupts. v2: - drop msix_cap and use the api return code instead. - fix commit message. v3: - pass specific type in irq flags. 
Cc: Ohad Sharabi Cc: Lucas De Marchi Signed-off-by: Dani Liberman Reviewed-by: Lucas De Marchi Signed-off-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240124075058.2302235-1-dliberman@habana.ai --- drivers/gpu/drm/xe/xe_irq.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_irq.c b/drivers/gpu/drm/xe/xe_irq.c index d31e87de5a1c84..17b3affdb6a0cb 100644 --- a/drivers/gpu/drm/xe/xe_irq.c +++ b/drivers/gpu/drm/xe/xe_irq.c @@ -683,8 +683,9 @@ static void irq_uninstall(struct drm_device *drm, void *arg) int xe_irq_install(struct xe_device *xe) { struct pci_dev *pdev = to_pci_dev(xe->drm.dev); + unsigned int irq_flags = PCI_IRQ_MSIX; irq_handler_t irq_handler; - int err, irq; + int err, irq, nvec; irq_handler = xe_irq_handler(xe); if (!irq_handler) { @@ -694,7 +695,19 @@ int xe_irq_install(struct xe_device *xe) xe_irq_reset(xe); - err = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI | PCI_IRQ_MSIX); + nvec = pci_msix_vec_count(pdev); + if (nvec <= 0) { + if (nvec == -EINVAL) { + /* MSIX capability is not supported in the device, using MSI */ + irq_flags = PCI_IRQ_MSI; + nvec = 1; + } else { + drm_err(&xe->drm, "MSIX: Failed getting count\n"); + return nvec; + } + } + + err = pci_alloc_irq_vectors(pdev, nvec, nvec, irq_flags); if (err < 0) { drm_err(&xe->drm, "MSI/MSIX: Failed to enable support %d\n", err); return err; From 63fb531fbfda81bda652546a39333b565aea324d Mon Sep 17 00:00:00 2001 From: Matthew Auld Date: Mon, 5 Feb 2024 15:31:11 +0000 Subject: [PATCH 164/200] drm/xe/display: fix i915_gem_object_is_shmem() wrapper shmem ensures the memory is cleared on allocation, however here we are using TTM, which doesn't natively support shmem (other than for swap), but instead just allocates normal system memory. And we only zero such memory for userspace allocations. In the case of intel_fbdev we are missing the memset_io() since display path incorrectly thinks object is shmem based. Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Signed-off-by: Matthew Auld Reviewed-by: Suraj Kandpal Link: https://patchwork.freedesktop.org/patch/msgid/20240205153110.38340-2-matthew.auld@intel.com --- drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h index 68d9f6116bdfc3..777c20ceabab12 100644 --- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h +++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h @@ -10,7 +10,7 @@ #include "xe_bo.h" -#define i915_gem_object_is_shmem(obj) ((obj)->flags & XE_BO_CREATE_SYSTEM_BIT) +#define i915_gem_object_is_shmem(obj) (0) /* We don't use shmem */ static inline dma_addr_t i915_gem_object_get_dma_address(const struct xe_bo *bo, pgoff_t n) { From f8237c8c6a0e3b42cf6129b6e26327e2a51d4504 Mon Sep 17 00:00:00 2001 From: Jani Nikula Date: Mon, 12 Feb 2024 16:57:57 +0200 Subject: [PATCH 165/200] drm/xe: use drm based debugging instead of dev MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Prefer drm_dbg() over dev_dbg(). 
Reviewed-by: Thomas Hellström Link: https://patchwork.freedesktop.org/patch/msgid/20240212145757.645094-1-jani.nikula@intel.com Signed-off-by: Jani Nikula --- drivers/gpu/drm/xe/xe_irq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_irq.c b/drivers/gpu/drm/xe/xe_irq.c index 17b3affdb6a0cb..2f5d179e0d00b7 100644 --- a/drivers/gpu/drm/xe/xe_irq.c +++ b/drivers/gpu/drm/xe/xe_irq.c @@ -442,7 +442,7 @@ static irqreturn_t dg1_irq_handler(int irq, void *arg) * irq as device is inaccessible. */ if (master_ctl == REG_GENMASK(31, 0)) { - dev_dbg(tile_to_xe(tile)->drm.dev, + drm_dbg(&tile_to_xe(tile)->drm, "Ignore this IRQ as device might be in DPC containment.\n"); return IRQ_HANDLED; } From 157261c58b283f5c83e3f9087eca63be8d591ab8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= Date: Fri, 9 Feb 2024 12:26:55 +0100 Subject: [PATCH 166/200] drm/xe/pt: Allow for stricter type- and range checking MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Distinguish between xe_pt and the xe_pt_dir subclass when allocating and freeing. Also use a fixed-size array for the xe_pt_dir page entries to make life easier for dynamic range- checkers. Finally rename the page-directory child pointer array to "children". While no functional change, this fixes ubsan splats similar to: [ 51.463021] ------------[ cut here ]------------ [ 51.463022] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/xe/xe_pt.c:47:9 [ 51.463023] index 0 is out of range for type 'xe_ptw *[*]' [ 51.463024] CPU: 5 PID: 2778 Comm: xe_vm Tainted: G U 6.8.0-rc1+ #218 [ 51.463026] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023 [ 51.463027] Call Trace: [ 51.463028] [ 51.463029] dump_stack_lvl+0x47/0x60 [ 51.463030] __ubsan_handle_out_of_bounds+0x95/0xd0 [ 51.463032] xe_pt_destroy+0xa5/0x150 [xe] [ 51.463088] __xe_pt_unbind_vma+0x36c/0x9b0 [xe] [ 51.463144] xe_vm_unbind+0xd8/0x580 [xe] [ 51.463204] ? drm_exec_prepare_obj+0x3f/0x60 [drm_exec] [ 51.463208] __xe_vma_op_execute+0x5da/0x910 [xe] [ 51.463268] ? __drm_gpuvm_sm_unmap+0x1cb/0x220 [drm_gpuvm] [ 51.463272] ? radix_tree_node_alloc.constprop.0+0x89/0xc0 [ 51.463275] ? drm_gpuva_it_remove+0x1f3/0x2a0 [drm_gpuvm] [ 51.463279] ? drm_gpuva_remove+0x2f/0xc0 [drm_gpuvm] [ 51.463283] xe_vm_bind_ioctl+0x1a55/0x20b0 [xe] [ 51.463344] ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe] [ 51.463414] drm_ioctl_kernel+0xb6/0x120 [ 51.463416] drm_ioctl+0x287/0x4e0 [ 51.463418] ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe] [ 51.463481] __x64_sys_ioctl+0x94/0xd0 [ 51.463484] do_syscall_64+0x86/0x170 [ 51.463486] ? syscall_exit_to_user_mode+0x7d/0x200 [ 51.463488] ? do_syscall_64+0x96/0x170 [ 51.463490] ? 
do_syscall_64+0x96/0x170 [ 51.463492] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 51.463494] RIP: 0033:0x7f246bfe817d [ 51.463498] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00 [ 51.463501] RSP: 002b:00007ffc1bd19ad0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 51.463502] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f246bfe817d [ 51.463504] RDX: 00007ffc1bd19b60 RSI: 0000000040886445 RDI: 0000000000000003 [ 51.463505] RBP: 00007ffc1bd19b20 R08: 0000000000000000 R09: 0000000000000000 [ 51.463506] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc1bd19b60 [ 51.463508] R13: 0000000040886445 R14: 0000000000000003 R15: 0000000000010000 [ 51.463510] [ 51.463517] ---[ end trace ]--- v2 - Fix kerneldoc warning (Matthew Brost) Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Rodrigo Vivi Cc: Matthew Brost Signed-off-by: Thomas Hellström Reviewed-by: Matthew Brost Link: https://patchwork.freedesktop.org/patch/msgid/20240209112655.4872-1-thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/xe/xe_pt.c | 39 +++++++++++++++++++++------------ drivers/gpu/drm/xe/xe_pt_walk.c | 2 +- drivers/gpu/drm/xe/xe_pt_walk.h | 19 +++------------- 3 files changed, 29 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c index 3a99bf6e558f79..1e35c752544790 100644 --- a/drivers/gpu/drm/xe/xe_pt.c +++ b/drivers/gpu/drm/xe/xe_pt.c @@ -20,8 +20,8 @@ struct xe_pt_dir { struct xe_pt pt; - /** @dir: Directory structure for the xe_pt_walk functionality */ - struct xe_ptw_dir dir; + /** @children: Array of page-table child nodes */ + struct xe_ptw *children[XE_PDES]; }; #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_VM) @@ -44,7 +44,7 @@ static struct xe_pt_dir *as_xe_pt_dir(struct xe_pt *pt) static struct xe_pt *xe_pt_entry(struct xe_pt_dir *pt_dir, unsigned int index) { - return container_of(pt_dir->dir.entries[index], struct xe_pt, base); + return container_of(pt_dir->children[index], struct xe_pt, base); } static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm, @@ -65,6 +65,14 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm, XE_PTE_NULL; } +static void xe_pt_free(struct xe_pt *pt) +{ + if (pt->level) + kfree(as_xe_pt_dir(pt)); + else + kfree(pt); +} + /** * xe_pt_create() - Create a page-table. * @vm: The vm to create for. @@ -85,15 +93,19 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile, { struct xe_pt *pt; struct xe_bo *bo; - size_t size; int err; - size = !level ? sizeof(struct xe_pt) : sizeof(struct xe_pt_dir) + - XE_PDES * sizeof(struct xe_ptw *); - pt = kzalloc(size, GFP_KERNEL); + if (level) { + struct xe_pt_dir *dir = kzalloc(sizeof(*dir), GFP_KERNEL); + + pt = (dir) ? &dir->pt : NULL; + } else { + pt = kzalloc(sizeof(*pt), GFP_KERNEL); + } if (!pt) return ERR_PTR(-ENOMEM); + pt->level = level; bo = xe_bo_create_pin_map(vm->xe, tile, vm, SZ_4K, ttm_bo_type_kernel, XE_BO_CREATE_VRAM_IF_DGFX(tile) | @@ -106,8 +118,7 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile, goto err_kfree; } pt->bo = bo; - pt->level = level; - pt->base.dir = level ? &as_xe_pt_dir(pt)->dir : NULL; + pt->base.children = level ? 
as_xe_pt_dir(pt)->children : NULL; if (vm->xef) xe_drm_client_add_bo(vm->xef->client, pt->bo); @@ -116,7 +127,7 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile, return pt; err_kfree: - kfree(pt); + xe_pt_free(pt); return ERR_PTR(err); } @@ -193,7 +204,7 @@ void xe_pt_destroy(struct xe_pt *pt, u32 flags, struct llist_head *deferred) deferred); } } - kfree(pt); + xe_pt_free(pt); } /** @@ -358,7 +369,7 @@ xe_pt_insert_entry(struct xe_pt_stage_bind_walk *xe_walk, struct xe_pt *parent, struct iosys_map *map = &parent->bo->vmap; if (unlikely(xe_child)) - parent->base.dir->entries[offset] = &xe_child->base; + parent->base.children[offset] = &xe_child->base; xe_pt_write(xe_walk->vm->xe, map, offset, pte); parent->num_live++; @@ -853,7 +864,7 @@ static void xe_pt_commit_bind(struct xe_vma *vma, xe_pt_destroy(xe_pt_entry(pt_dir, j_), xe_vma_vm(vma)->flags, deferred); - pt_dir->dir.entries[j_] = &newpte->base; + pt_dir->children[j_] = &newpte->base; } kfree(entries[i].pt_entries); } @@ -1506,7 +1517,7 @@ xe_pt_commit_unbind(struct xe_vma *vma, xe_pt_destroy(xe_pt_entry(pt_dir, i), xe_vma_vm(vma)->flags, deferred); - pt_dir->dir.entries[i] = NULL; + pt_dir->children[i] = NULL; } } } diff --git a/drivers/gpu/drm/xe/xe_pt_walk.c b/drivers/gpu/drm/xe/xe_pt_walk.c index 8f6c8d063f39f0..b8b3d2aea4923d 100644 --- a/drivers/gpu/drm/xe/xe_pt_walk.c +++ b/drivers/gpu/drm/xe/xe_pt_walk.c @@ -74,7 +74,7 @@ int xe_pt_walk_range(struct xe_ptw *parent, unsigned int level, u64 addr, u64 end, struct xe_pt_walk *walk) { pgoff_t offset = xe_pt_offset(addr, level, walk); - struct xe_ptw **entries = parent->dir ? parent->dir->entries : NULL; + struct xe_ptw **entries = parent->children ? parent->children : NULL; const struct xe_pt_walk_ops *ops = walk->ops; enum page_walk_action action; struct xe_ptw *child; diff --git a/drivers/gpu/drm/xe/xe_pt_walk.h b/drivers/gpu/drm/xe/xe_pt_walk.h index ec3d1e9efa6d51..5ecc4d2f0f6536 100644 --- a/drivers/gpu/drm/xe/xe_pt_walk.h +++ b/drivers/gpu/drm/xe/xe_pt_walk.h @@ -8,28 +8,15 @@ #include #include -struct xe_ptw_dir; - /** * struct xe_ptw - base class for driver pagetable subclassing. - * @dir: Pointer to an array of children if any. + * @children: Pointer to an array of children if any. * * Drivers could subclass this, and if it's a page-directory, typically - * embed the xe_ptw_dir::entries array in the same allocation. + * embed an array of xe_ptw pointers. */ struct xe_ptw { - struct xe_ptw_dir *dir; -}; - -/** - * struct xe_ptw_dir - page directory structure - * @entries: Array holding page directory children. - * - * It is the responsibility of the user to ensure @entries is - * correctly sized. - */ -struct xe_ptw_dir { - struct xe_ptw *entries[0]; + struct xe_ptw **children; }; /** From a43d5060086e328cda6a0a110fed489a9b867fad Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Tue, 13 Feb 2024 16:43:48 +0100 Subject: [PATCH 167/200] drm/xe/vf: Assume fixed GSM size if VF VFs can't use size mirrored from PCI config, but it should be safe to assume it covers full 4GiB GGTT. 
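The fixed value follows from the GGTT geometry, assuming 8-byte PTEs over 4 KiB pages: 4 GiB / 4 KiB = 1 Mi entries, at 8 bytes each = 8 MiB of GSM, hence SZ_8M below.

	/* sanity check of the arithmetic above */
	BUILD_BUG_ON((SZ_4G / SZ_4K) * 8 != SZ_8M);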
Signed-off-by: Michal Wajdeczko Cc: Matt Roper Reviewed-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20240213154355.1221-2-michal.wajdeczko@intel.com --- drivers/gpu/drm/xe/xe_ggtt.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c index 6fdf830678b3bd..ab96edb058d6ce 100644 --- a/drivers/gpu/drm/xe/xe_ggtt.c +++ b/drivers/gpu/drm/xe/xe_ggtt.c @@ -20,6 +20,7 @@ #include "xe_gt_tlb_invalidation.h" #include "xe_map.h" #include "xe_mmio.h" +#include "xe_sriov.h" #include "xe_wopcm.h" #define XELPG_GGTT_PTE_PAT0 BIT_ULL(52) @@ -144,7 +145,11 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt) struct pci_dev *pdev = to_pci_dev(xe->drm.dev); unsigned int gsm_size; - gsm_size = probe_gsm_size(pdev); + if (IS_SRIOV_VF(xe)) + gsm_size = SZ_8M; /* GGTT is expected to be 4GiB */ + else + gsm_size = probe_gsm_size(pdev); + if (gsm_size == 0) { drm_err(&xe->drm, "Hardware reported no preallocated GSM\n"); return -ENOMEM; From aec14e3370c43ac6041da4f08ef1ebb91bd45060 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Tue, 13 Feb 2024 16:43:49 +0100 Subject: [PATCH 168/200] drm/xe/vf: Don't try to capture engine data unavailable to VF Don't capture engine ring registers as those are not available to the VF driver. Signed-off-by: Michal Wajdeczko Cc: Rodrigo Vivi Reviewed-by: Rodrigo Vivi Link: https://patchwork.freedesktop.org/patch/msgid/20240213154355.1221-3-michal.wajdeczko@intel.com --- drivers/gpu/drm/xe/xe_hw_engine.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c index 0d17e32d70c87d..b5e83ea172f3ff 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine.c +++ b/drivers/gpu/drm/xe/xe_hw_engine.c @@ -25,6 +25,7 @@ #include "xe_reg_sr.h" #include "xe_rtp.h" #include "xe_sched_job.h" +#include "xe_sriov.h" #include "xe_tuning.h" #include "xe_uc_fw.h" #include "xe_wa.h" @@ -767,6 +768,10 @@ xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe) hwe->domain); snapshot->mmio_base = hwe->mmio_base; + /* no more VF accessible data below this point */ + if (IS_SRIOV_VF(gt_to_xe(hwe->gt))) + return snapshot; + snapshot->reg.ring_execlist_status = hw_engine_mmio_read32(hwe, RING_EXECLIST_STATUS_LO(0)); val = hw_engine_mmio_read32(hwe, RING_EXECLIST_STATUS_HI(0)); From 18bc97fb4a0c5920580ee3973cf0b7c6192dc7e9 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Tue, 13 Feb 2024 16:43:50 +0100 Subject: [PATCH 169/200] drm/xe/vf: Don't program MOCS if VF MOCS programming may only be done by the PF driver. Signed-off-by: Michal Wajdeczko Cc: Matt Roper Reviewed-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20240213154355.1221-4-michal.wajdeczko@intel.com --- drivers/gpu/drm/xe/xe_mocs.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_mocs.c b/drivers/gpu/drm/xe/xe_mocs.c index d205d684d80903..609d997b3e9b09 100644 --- a/drivers/gpu/drm/xe/xe_mocs.c +++ b/drivers/gpu/drm/xe/xe_mocs.c @@ -13,6 +13,7 @@ #include "xe_gt_mcr.h" #include "xe_mmio.h" #include "xe_platform_types.h" +#include "xe_sriov.h" #include "xe_step_types.h" #if IS_ENABLED(CONFIG_DRM_XE_DEBUG) @@ -539,6 +540,9 @@ void xe_mocs_init(struct xe_gt *gt) struct xe_mocs_info table; unsigned int flags; + if (IS_SRIOV_VF(gt_to_xe(gt))) + return; + /* * MOCS settings are split between "GLOB_MOCS" and/or "LNCFCMOCS" * registers depending on platform.
From 60da62fbe9afdb7f62800600e1095c5a49bb5546 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Tue, 13 Feb 2024 16:43:51 +0100 Subject: [PATCH 170/200] drm/xe/vf: Don't initialize stolen memory manager if VF VF drivers don't have access to the stolen memory. Signed-off-by: Michal Wajdeczko Cc: Lucas De Marchi Reviewed-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20240213154355.1221-5-michal.wajdeczko@intel.com --- drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c index e5d7d5e2bec129..662f1e9bfc65df 100644 --- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c +++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c @@ -19,6 +19,7 @@ #include "xe_gt.h" #include "xe_mmio.h" #include "xe_res_cursor.h" +#include "xe_sriov.h" #include "xe_ttm_stolen_mgr.h" #include "xe_ttm_vram_mgr.h" #include "xe_wa.h" @@ -205,7 +206,9 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe) u64 stolen_size, io_size, pgsize; int err; - if (IS_DGFX(xe)) + if (IS_SRIOV_VF(xe)) + stolen_size = 0; + else if (IS_DGFX(xe)) stolen_size = detect_bar2_dgfx(xe, mgr); else if (GRAPHICS_VERx100(xe) >= 1270) stolen_size = detect_bar2_integrated(xe, mgr); From 3ed34c655210090f4105be1ba5ab5f8f1dccefe1 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Tue, 13 Feb 2024 16:43:52 +0100 Subject: [PATCH 171/200] drm/xe/vf: Don't check if LMEM is initialized if VF It is the PF driver's responsibility to verify that LMEM was correctly initialized; besides, VF drivers don't have access to the GU_CNTL register. Signed-off-by: Michal Wajdeczko Cc: Lucas De Marchi Reviewed-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20240213154355.1221-6-michal.wajdeczko@intel.com --- drivers/gpu/drm/xe/xe_mmio.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c index 5f6b53ea5528b2..e3db3a17876099 100644 --- a/drivers/gpu/drm/xe/xe_mmio.c +++ b/drivers/gpu/drm/xe/xe_mmio.c @@ -20,6 +20,7 @@ #include "xe_gt_mcr.h" #include "xe_macros.h" #include "xe_module.h" +#include "xe_sriov.h" #include "xe_tile.h" #define XEHP_MTCFG_ADDR XE_REG(0x101800) @@ -363,13 +364,19 @@ static int xe_verify_lmem_ready(struct xe_device *xe) { struct xe_gt *gt = xe_root_mmio_gt(xe); + if (!IS_DGFX(xe)) + return 0; + + if (IS_SRIOV_VF(xe)) + return 0; + /* * The boot firmware initializes local memory and assesses its health. * If memory training fails, the punit will have been instructed to * keep the GT powered down; we won't be able to communicate with it * and we should not continue with driver initialization. */ - if (IS_DGFX(xe) && !(xe_mmio_read32(gt, GU_CNTL) & LMEM_INIT)) { + if (!(xe_mmio_read32(gt, GU_CNTL) & LMEM_INIT)) { drm_err(&xe->drm, "VRAM not initialized by firmware\n"); return -ENODEV; } From 602f9ebf321a9442857ff0685c9a6dfb78bf807b Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Tue, 13 Feb 2024 16:43:53 +0100 Subject: [PATCH 172/200] drm/xe/vf: Don't enable hwmon if VF Registers used by hwmon are not available for VF drivers.
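As in the neighbouring VF patches of this series, the fix is the same early-out idiom, keeping VFs off PF-owned programming paths:

	if (IS_SRIOV_VF(xe))
		return;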
Signed-off-by: Michal Wajdeczko Cc: Badal Nilawar Reviewed-by: Badal Nilawar Link: https://patchwork.freedesktop.org/patch/msgid/20240213154355.1221-7-michal.wajdeczko@intel.com --- drivers/gpu/drm/xe/xe_hwmon.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c index f0194d5121a5dd..b82233a4160624 100644 --- a/drivers/gpu/drm/xe/xe_hwmon.c +++ b/drivers/gpu/drm/xe/xe_hwmon.c @@ -17,6 +17,7 @@ #include "xe_mmio.h" #include "xe_pcode.h" #include "xe_pcode_api.h" +#include "xe_sriov.h" enum xe_hwmon_reg { REG_PKG_RAPL_LIMIT, @@ -746,6 +747,10 @@ void xe_hwmon_register(struct xe_device *xe) if (!IS_DGFX(xe)) return; + /* hwmon is not available on VFs */ + if (IS_SRIOV_VF(xe)) + return; + hwmon = devm_kzalloc(dev, sizeof(*hwmon), GFP_KERNEL); if (!hwmon) return; From 96eb895c7ec62b5ae76ea697770fabe9e48f8107 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Tue, 13 Feb 2024 16:43:54 +0100 Subject: [PATCH 173/200] drm/xe/vf: Don't program PAT if VF PAT programming can only be done by the PF driver. Besides VF drivers don't have access to control registers. Signed-off-by: Michal Wajdeczko Cc: Matt Roper Reviewed-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20240213154355.1221-8-michal.wajdeczko@intel.com --- drivers/gpu/drm/xe/xe_pat.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c index 1ff6bc79e7d444..e148934d554b0d 100644 --- a/drivers/gpu/drm/xe/xe_pat.c +++ b/drivers/gpu/drm/xe/xe_pat.c @@ -13,6 +13,7 @@ #include "xe_gt.h" #include "xe_gt_mcr.h" #include "xe_mmio.h" +#include "xe_sriov.h" #define _PAT_ATS 0x47fc #define _PAT_INDEX(index) _PICK_EVEN_2RANGES(index, 8, \ @@ -433,6 +434,10 @@ void xe_pat_init_early(struct xe_device *xe) drm_err(&xe->drm, "Missing PAT table for platform with graphics version %d.%02d!\n", GRAPHICS_VER(xe), GRAPHICS_VERx100(xe) % 100); } + + /* VFs can't program nor dump PAT settings */ + if (IS_SRIOV_VF(xe)) + xe->pat.ops = NULL; } void xe_pat_init(struct xe_gt *gt) From be46d7aacf9e66e1645e781260eeb7a14873f762 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Tue, 13 Feb 2024 16:43:55 +0100 Subject: [PATCH 174/200] drm/xe/vf: Don't support MCR registers if VF VF drivers can't operate on MCR registers. Make sure that driver is not trying to read nor write using any of MCR register. Signed-off-by: Michal Wajdeczko Cc: Lucas De Marchi Cc: Matt Roper Reviewed-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20240213154355.1221-9-michal.wajdeczko@intel.com --- drivers/gpu/drm/xe/xe_gt_mcr.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_gt_mcr.c b/drivers/gpu/drm/xe/xe_gt_mcr.c index 8546cd3cc50d1f..a7ab9ba645f996 100644 --- a/drivers/gpu/drm/xe/xe_gt_mcr.c +++ b/drivers/gpu/drm/xe/xe_gt_mcr.c @@ -10,6 +10,7 @@ #include "xe_gt_topology.h" #include "xe_gt_types.h" #include "xe_mmio.h" +#include "xe_sriov.h" /** * DOC: GT Multicast/Replicated (MCR) Register Support @@ -38,6 +39,8 @@ * ``init_steering_*()`` functions is to apply the platform-specific rules for * each MCR register type to identify a steering target that will select a * non-terminated instance. + * + * MCR registers are not available on Virtual Function (VF). 
*/ #define STEER_SEMAPHORE XE_REG(0xFD0) @@ -352,6 +355,9 @@ void xe_gt_mcr_init(struct xe_gt *gt) BUILD_BUG_ON(IMPLICIT_STEERING + 1 != NUM_STEERING_TYPES); BUILD_BUG_ON(ARRAY_SIZE(xe_steering_types) != NUM_STEERING_TYPES); + if (IS_SRIOV_VF(xe)) + return; + spin_lock_init(&gt->mcr_lock); if (gt->info.type == XE_GT_TYPE_MEDIA) { @@ -405,6 +411,9 @@ void xe_gt_mcr_set_implicit_defaults(struct xe_gt *gt) { struct xe_device *xe = gt_to_xe(gt); + if (IS_SRIOV_VF(xe)) + return; + if (xe->info.platform == XE_DG2) { u32 steer_val = REG_FIELD_PREP(MCR_SLICE_MASK, 0) | REG_FIELD_PREP(MCR_SUBSLICE_MASK, 2); @@ -588,6 +597,8 @@ u32 xe_gt_mcr_unicast_read_any(struct xe_gt *gt, struct xe_reg_mcr reg_mcr) u32 val; bool steer; + xe_gt_assert(gt, !IS_SRIOV_VF(gt_to_xe(gt))); + steer = xe_gt_mcr_get_nonterminated_steering(gt, reg_mcr, &group, &instance); @@ -619,6 +630,8 @@ u32 xe_gt_mcr_unicast_read(struct xe_gt *gt, { u32 val; + xe_gt_assert(gt, !IS_SRIOV_VF(gt_to_xe(gt))); + mcr_lock(gt); val = rw_with_mcr_steering(gt, reg_mcr, MCR_OP_READ, group, instance, 0); mcr_unlock(gt); @@ -640,6 +653,8 @@ u32 xe_gt_mcr_unicast_read(struct xe_gt *gt, void xe_gt_mcr_unicast_write(struct xe_gt *gt, struct xe_reg_mcr reg_mcr, u32 value, int group, int instance) { + xe_gt_assert(gt, !IS_SRIOV_VF(gt_to_xe(gt))); + mcr_lock(gt); rw_with_mcr_steering(gt, reg_mcr, MCR_OP_WRITE, group, instance, value); mcr_unlock(gt); @@ -658,6 +673,8 @@ void xe_gt_mcr_multicast_write(struct xe_gt *gt, struct xe_reg_mcr reg_mcr, { struct xe_reg reg = to_xe_reg(reg_mcr); + xe_gt_assert(gt, !IS_SRIOV_VF(gt_to_xe(gt))); + /* * Synchronize with any unicast operations. Once we have exclusive * access, the MULTICAST bit should already be set, so there's no need From 9bc36e58d162466236a38489e4b41f38a8848c94 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Roberto=20de=20Souza?= Date: Thu, 8 Feb 2024 10:35:38 -0800 Subject: [PATCH 175/200] drm/xe: Add uAPI to query GuC firmware submission version MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Due to a bug in GuC firmware, Mesa can't enable the use of compute engines in DG2 and newer by default. A new GuC firmware fixed the issue, but until now there was no way for Mesa to know whether the KMD was running with the fixed GuC version, so this uAPI is required. It may be expanded in the future to query other firmware versions too. This is querying the XE_UC_FW_VER_COMPATIBILITY/submission version because that is also supported by VFs, while XE_UC_FW_VER_RELEASE is not.
i915 uAPI: https://patchwork.freedesktop.org/series/129627/ Mesa usage: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25233 v2: - fixed drm_xe_query_uc_fw_version documentation - moved branch_ver as the first version number Cc: John Harrison Cc: Francois Dugast Cc: Lucas De Marchi Signed-off-by: José Roberto de Souza Reviewed-by: Rodrigo Vivi Signed-off-by: Rodrigo Vivi Link: https://patchwork.freedesktop.org/patch/msgid/20240208183539.185095-1-jose.souza@intel.com --- drivers/gpu/drm/xe/xe_query.c | 44 +++++++++++++++++++++++++++++++++++ include/uapi/drm/xe_drm.h | 31 ++++++++++++++++++++++++ 2 files changed, 75 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c index 4f1ab91dbec584..92bb06c0586eb4 100644 --- a/drivers/gpu/drm/xe/xe_query.c +++ b/drivers/gpu/drm/xe/xe_query.c @@ -516,6 +516,49 @@ static int query_gt_topology(struct xe_device *xe, return 0; } +static int +query_uc_fw_version(struct xe_device *xe, struct drm_xe_device_query *query) +{ + struct drm_xe_query_uc_fw_version __user *query_ptr = u64_to_user_ptr(query->data); + size_t size = sizeof(struct drm_xe_query_uc_fw_version); + struct drm_xe_query_uc_fw_version resp; + struct xe_uc_fw_version *version = NULL; + + if (query->size == 0) { + query->size = size; + return 0; + } else if (XE_IOCTL_DBG(xe, query->size != size)) { + return -EINVAL; + } + + if (copy_from_user(&resp, query_ptr, size)) + return -EFAULT; + + if (XE_IOCTL_DBG(xe, resp.pad || resp.pad2 || resp.reserved)) + return -EINVAL; + + switch (resp.uc_type) { + case XE_QUERY_UC_TYPE_GUC_SUBMISSION: { + struct xe_guc *guc = &xe->tiles[0].primary_gt->uc.guc; + + version = &guc->fw.versions.found[XE_UC_FW_VER_COMPATIBILITY]; + break; + } + default: + return -EINVAL; + } + + resp.branch_ver = 0; + resp.major_ver = version->major; + resp.minor_ver = version->minor; + resp.patch_ver = version->patch; + + if (copy_to_user(query_ptr, &resp, size)) + return -EFAULT; + + return 0; +} + static int (* const xe_query_funcs[])(struct xe_device *xe, struct drm_xe_device_query *query) = { query_engines, @@ -525,6 +568,7 @@ static int (* const xe_query_funcs[])(struct xe_device *xe, query_hwconfig, query_gt_topology, query_engine_cycles, + query_uc_fw_version, }; int xe_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file) diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h index 50bbea0992d9c3..0819959a52227f 100644 --- a/include/uapi/drm/xe_drm.h +++ b/include/uapi/drm/xe_drm.h @@ -574,6 +574,36 @@ struct drm_xe_query_engine_cycles { __u64 cpu_delta; }; +/** + * struct drm_xe_query_uc_fw_version - query a micro-controller firmware version + * + * Given a uc_type this will return the branch, major, minor and patch version + * of the micro-controller firmware. 
+ */ +struct drm_xe_query_uc_fw_version { + /** @uc_type: The micro-controller type to query firmware version */ +#define XE_QUERY_UC_TYPE_GUC_SUBMISSION 0 + __u16 uc_type; + + /** @pad: MBZ */ + __u16 pad; + + /** @branch_ver: branch uc fw version */ + __u32 branch_ver; + /** @major_ver: major uc fw version */ + __u32 major_ver; + /** @minor_ver: minor uc fw version */ + __u32 minor_ver; + /** @patch_ver: patch uc fw version */ + __u32 patch_ver; + + /** @pad2: MBZ */ + __u32 pad2; + + /** @reserved: Reserved */ + __u64 reserved; +}; + /** * struct drm_xe_device_query - Input of &DRM_IOCTL_XE_DEVICE_QUERY - main * structure to query device information @@ -643,6 +673,7 @@ struct drm_xe_device_query { #define DRM_XE_DEVICE_QUERY_HWCONFIG 4 #define DRM_XE_DEVICE_QUERY_GT_TOPOLOGY 5 #define DRM_XE_DEVICE_QUERY_ENGINE_CYCLES 6 +#define DRM_XE_DEVICE_QUERY_UC_FW_VERSION 7 /** @query: The type of data to query */ __u32 query; From 761b333718cf86a01067400950f1cf48f2e375fc Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Mon, 12 Feb 2024 20:32:51 -0800 Subject: [PATCH 176/200] drm/xe: Remove exec queue bind.fence_* struct xe_exec_queue bind.fence_* members are unused. Remove these. Signed-off-by: Matthew Brost Reviewed-by: Rodrigo Vivi Link: https://patchwork.freedesktop.org/patch/msgid/20240213043251.3482928-1-matthew.brost@intel.com --- drivers/gpu/drm/xe/xe_exec_queue.c | 4 ---- drivers/gpu/drm/xe/xe_exec_queue_types.h | 29 ++++++++---------------- 2 files changed, 9 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c index 2976635be4d396..da84ac93a55931 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.c +++ b/drivers/gpu/drm/xe/xe_exec_queue.c @@ -94,10 +94,6 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe, q->parallel.composite_fence_ctx = dma_fence_context_alloc(1); q->parallel.composite_fence_seqno = XE_FENCE_INITIAL_SEQNO; } - if (q->flags & EXEC_QUEUE_FLAG_VM) { - q->bind.fence_ctx = dma_fence_context_alloc(1); - q->bind.fence_seqno = XE_FENCE_INITIAL_SEQNO; - } return q; } diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h index 648391961fc45e..3df8571e4a0752 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h @@ -115,26 +115,15 @@ struct xe_exec_queue { struct list_head link; } persistent; - union { - /** - * @parallel: parallel submission state - */ - struct { - /** @parallel.composite_fence_ctx: context composite fence */ - u64 composite_fence_ctx; - /** @parallel.composite_fence_seqno: seqno for composite fence */ - u32 composite_fence_seqno; - } parallel; - /** - * @bind: bind submission state - */ - struct { - /** @bind.fence_ctx: context bind fence */ - u64 fence_ctx; - /** @bind.fence_seqno: seqno for bind fence */ - u32 fence_seqno; - } bind; - }; + /** + * @parallel: parallel submission state + */ + struct { + /** @parallel.composite_fence_ctx: context composite fence */ + u64 composite_fence_ctx; + /** @parallel.composite_fence_seqno: seqno for composite fence */ + u32 composite_fence_seqno; + } parallel; /** @sched_props: scheduling properties */ struct { From f2c9364db57992b1496db4ae5e67ab14926be3ec Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 13 Feb 2024 10:56:48 +0100 Subject: [PATCH 177/200] drm/xe: avoid function cast warnings MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit clang-16 warns about a cast between incompatible function types: 
drivers/gpu/drm/xe/xe_range_fence.c:155:10: error: cast from 'void (*)(const void *)' to 'void (*)(struct xe_range_fence *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict] 155 | .free = (void (*)(struct xe_range_fence *rfence)) kfree, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Avoid this with a trivial helper function that calls kfree() here. v2: - s/* rfence/*rfence/ (Thomas) Fixes: 845f64bdbfc9 ("drm/xe: Introduce a range-fence utility") Signed-off-by: Arnd Bergmann Reviewed-by: Thomas Hellström Signed-off-by: Thomas Hellström Link: https://patchwork.freedesktop.org/patch/msgid/20240213095719.454865-1-arnd@kernel.org --- drivers/gpu/drm/xe/xe_range_fence.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_range_fence.c b/drivers/gpu/drm/xe/xe_range_fence.c index d35d9ec58e86f9..372378e89e9892 100644 --- a/drivers/gpu/drm/xe/xe_range_fence.c +++ b/drivers/gpu/drm/xe/xe_range_fence.c @@ -151,6 +151,11 @@ xe_range_fence_tree_next(struct xe_range_fence *rfence, u64 start, u64 last) return xe_range_fence_tree_iter_next(rfence, start, last); } +static void xe_range_fence_free(struct xe_range_fence *rfence) +{ + kfree(rfence); +} + const struct xe_range_fence_ops xe_range_fence_kfree_ops = { - .free = (void (*)(struct xe_range_fence *rfence)) kfree, + .free = xe_range_fence_free, }; From f1a9abc0cf311375695bede1590364864c05976d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= Date: Fri, 9 Feb 2024 12:34:44 +0100 Subject: [PATCH 178/200] drm/xe/uapi: Remove support for persistent exec_queues MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Persistent exec_queues delays explicit destruction of exec_queues until they are done executing, but destruction on process exit is still immediate. It turns out no UMD is relying on this functionality, so remove it. If there turns out to be a use-case in the future, let's re-add. 
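The user-visible change is confined to the destroy ioctl. A minimal before/after sketch, condensed from the diff below:

	/* Before: a closed persistent queue was parked on a device-wide
	 * list and only killed once it finished executing. */
	if (!(q->flags & EXEC_QUEUE_FLAG_PERSISTENT))
		xe_exec_queue_kill(q);
	else
		xe_device_add_persistent_exec_queues(xe, q);

	/* After: destruction is always immediate. */
	xe_exec_queue_kill(q);
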
Persistent exec_queues were never used for LR VMs v2: - Don't add an "UNUSED" define for the missing property (Lucas, Rodrigo) v3: - Remove the remaining struct xe_exec_queue::persistent state (Niranjana, Lucas) Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Rodrigo Vivi Cc: Matthew Brost Cc: David Airlie Cc: Daniel Vetter Cc: Lucas De Marchi Cc: Francois Dugast Signed-off-by: Thomas Hellström Reviewed-by: Lucas De Marchi Acked-by: José Roberto de Souza Link: https://patchwork.freedesktop.org/patch/msgid/20240209113444.8396-1-thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/xe/xe_device.c | 39 ------------------------ drivers/gpu/drm/xe/xe_device.h | 4 --- drivers/gpu/drm/xe/xe_device_types.h | 8 ----- drivers/gpu/drm/xe/xe_exec_queue.c | 36 ++++------------------ drivers/gpu/drm/xe/xe_exec_queue_types.h | 10 ------ drivers/gpu/drm/xe/xe_execlist.c | 2 -- drivers/gpu/drm/xe/xe_guc_submit.c | 2 -- include/uapi/drm/xe_drm.h | 1 - 8 files changed, 6 insertions(+), 96 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index 5b84d730552029..1cf7561c8b4d92 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -86,9 +86,6 @@ static int xe_file_open(struct drm_device *dev, struct drm_file *file) return 0; } -static void device_kill_persistent_exec_queues(struct xe_device *xe, - struct xe_file *xef); - static void xe_file_close(struct drm_device *dev, struct drm_file *file) { struct xe_device *xe = to_xe_device(dev); @@ -105,8 +102,6 @@ static void xe_file_close(struct drm_device *dev, struct drm_file *file) mutex_unlock(&xef->exec_queue.lock); xa_destroy(&xef->exec_queue.xa); mutex_destroy(&xef->exec_queue.lock); - device_kill_persistent_exec_queues(xe, xef); - mutex_lock(&xef->vm.lock); xa_for_each(&xef->vm.xa, idx, vm) xe_vm_close_and_put(vm); @@ -258,9 +253,6 @@ struct xe_device *xe_device_create(struct pci_dev *pdev, xa_erase(&xe->usm.asid_to_vm, asid); } - drmm_mutex_init(&xe->drm, &xe->persistent_engines.lock); - INIT_LIST_HEAD(&xe->persistent_engines.list); - spin_lock_init(&xe->pinned.lock); INIT_LIST_HEAD(&xe->pinned.kernel_bo_present); INIT_LIST_HEAD(&xe->pinned.external_vram); @@ -599,37 +591,6 @@ void xe_device_shutdown(struct xe_device *xe) { } -void xe_device_add_persistent_exec_queues(struct xe_device *xe, struct xe_exec_queue *q) -{ - mutex_lock(&xe->persistent_engines.lock); - list_add_tail(&q->persistent.link, &xe->persistent_engines.list); - mutex_unlock(&xe->persistent_engines.lock); -} - -void xe_device_remove_persistent_exec_queues(struct xe_device *xe, - struct xe_exec_queue *q) -{ - mutex_lock(&xe->persistent_engines.lock); - if (!list_empty(&q->persistent.link)) - list_del(&q->persistent.link); - mutex_unlock(&xe->persistent_engines.lock); -} - -static void device_kill_persistent_exec_queues(struct xe_device *xe, - struct xe_file *xef) -{ - struct xe_exec_queue *q, *next; - - mutex_lock(&xe->persistent_engines.lock); - list_for_each_entry_safe(q, next, &xe->persistent_engines.list, - persistent.link) - if (q->persistent.xef == xef) { - xe_exec_queue_kill(q); - list_del_init(&q->persistent.link); - } - mutex_unlock(&xe->persistent_engines.lock); -} - void xe_device_wmb(struct xe_device *xe) { struct xe_gt *gt = xe_root_mmio_gt(xe); diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h index 462f59e902b12e..14be34d9f5434b 100644 --- a/drivers/gpu/drm/xe/xe_device.h +++ b/drivers/gpu/drm/xe/xe_device.h @@ -42,10 +42,6 @@ int xe_device_probe(struct 
xe_device *xe); void xe_device_remove(struct xe_device *xe); void xe_device_shutdown(struct xe_device *xe); -void xe_device_add_persistent_exec_queues(struct xe_device *xe, struct xe_exec_queue *q); -void xe_device_remove_persistent_exec_queues(struct xe_device *xe, - struct xe_exec_queue *q); - void xe_device_wmb(struct xe_device *xe); static inline struct xe_file *to_xe_file(const struct drm_file *file) diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h index eb2b806a1d2337..9785eef2e5a4e6 100644 --- a/drivers/gpu/drm/xe/xe_device_types.h +++ b/drivers/gpu/drm/xe/xe_device_types.h @@ -348,14 +348,6 @@ struct xe_device { struct mutex lock; } usm; - /** @persistent_engines: engines that are closed but still running */ - struct { - /** @persistent_engines.lock: protects persistent engines */ - struct mutex lock; - /** @persistent_engines.list: list of persistent engines */ - struct list_head list; - } persistent_engines; - /** @pinned: pinned BO state */ struct { /** @pinned.lock: protected pinned BO list state */ diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c index da84ac93a55931..dda90f0c989031 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.c +++ b/drivers/gpu/drm/xe/xe_exec_queue.c @@ -60,7 +60,6 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe, q->fence_irq = >->fence_irq[hwe->class]; q->ring_ops = gt->ring_ops[hwe->class]; q->ops = gt->exec_queue_ops; - INIT_LIST_HEAD(&q->persistent.link); INIT_LIST_HEAD(&q->compute.link); INIT_LIST_HEAD(&q->multi_gt_link); @@ -375,23 +374,6 @@ static int exec_queue_set_preemption_timeout(struct xe_device *xe, return 0; } -static int exec_queue_set_persistence(struct xe_device *xe, struct xe_exec_queue *q, - u64 value, bool create) -{ - if (XE_IOCTL_DBG(xe, !create)) - return -EINVAL; - - if (XE_IOCTL_DBG(xe, xe_vm_in_preempt_fence_mode(q->vm))) - return -EINVAL; - - if (value) - q->flags |= EXEC_QUEUE_FLAG_PERSISTENT; - else - q->flags &= ~EXEC_QUEUE_FLAG_PERSISTENT; - - return 0; -} - static int exec_queue_set_job_timeout(struct xe_device *xe, struct xe_exec_queue *q, u64 value, bool create) { @@ -465,7 +447,6 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = { [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY] = exec_queue_set_priority, [DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE] = exec_queue_set_timeslice, [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PREEMPTION_TIMEOUT] = exec_queue_set_preemption_timeout, - [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PERSISTENCE] = exec_queue_set_persistence, [DRM_XE_EXEC_QUEUE_SET_PROPERTY_JOB_TIMEOUT] = exec_queue_set_job_timeout, [DRM_XE_EXEC_QUEUE_SET_PROPERTY_ACC_TRIGGER] = exec_queue_set_acc_trigger, [DRM_XE_EXEC_QUEUE_SET_PROPERTY_ACC_NOTIFY] = exec_queue_set_acc_notify, @@ -492,6 +473,9 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe, return -EINVAL; idx = array_index_nospec(ext.property, ARRAY_SIZE(exec_queue_set_property_funcs)); + if (!exec_queue_set_property_funcs[idx]) + return -EINVAL; + return exec_queue_set_property_funcs[idx](xe, q, ext.value, create); } @@ -703,8 +687,7 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data, /* The migration vm doesn't hold rpm ref */ xe_device_mem_access_get(xe); - flags = EXEC_QUEUE_FLAG_PERSISTENT | EXEC_QUEUE_FLAG_VM | - (id ? EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD : 0); + flags = EXEC_QUEUE_FLAG_VM | (id ? 
EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD : 0); migrate_vm = xe_migrate_get_vm(gt_to_tile(gt)->migrate); new = xe_exec_queue_create(xe, migrate_vm, logical_mask, @@ -755,9 +738,7 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data, } q = xe_exec_queue_create(xe, vm, logical_mask, - args->width, hwe, - xe_vm_in_lr_mode(vm) ? 0 : - EXEC_QUEUE_FLAG_PERSISTENT, + args->width, hwe, 0, args->extensions); up_read(&vm->lock); xe_vm_put(vm); @@ -774,8 +755,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data, } } - q->persistent.xef = xef; - mutex_lock(&xef->exec_queue.lock); err = xa_alloc(&xef->exec_queue.xa, &id, q, xa_limit_32b, GFP_KERNEL); mutex_unlock(&xef->exec_queue.lock); @@ -918,10 +897,7 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data, if (XE_IOCTL_DBG(xe, !q)) return -ENOENT; - if (!(q->flags & EXEC_QUEUE_FLAG_PERSISTENT)) - xe_exec_queue_kill(q); - else - xe_device_add_persistent_exec_queues(xe, q); + xe_exec_queue_kill(q); trace_xe_exec_queue_close(q); xe_exec_queue_put(q); diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h index 3df8571e4a0752..c40240e8806836 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h @@ -105,16 +105,6 @@ struct xe_exec_queue { struct xe_guc_exec_queue *guc; }; - /** - * @persistent: persistent exec queue state - */ - struct { - /** @persistent.xef: file which this exec queue belongs to */ - struct xe_file *xef; - /** @persisiten.link: link in list of persistent exec queues */ - struct list_head link; - } persistent; - /** * @parallel: parallel submission state */ diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c index 58dfe6a78ffebe..1788e78caf5c8b 100644 --- a/drivers/gpu/drm/xe/xe_execlist.c +++ b/drivers/gpu/drm/xe/xe_execlist.c @@ -378,8 +378,6 @@ static void execlist_exec_queue_fini_async(struct work_struct *w) list_del(&exl->active_link); spin_unlock_irqrestore(&exl->port->lock, flags); - if (q->flags & EXEC_QUEUE_FLAG_PERSISTENT) - xe_device_remove_persistent_exec_queues(xe, q); drm_sched_entity_fini(&exl->entity); drm_sched_fini(&exl->sched); kfree(exl); diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index 4744668ef60a93..efee08680aeab1 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -1031,8 +1031,6 @@ static void __guc_exec_queue_fini_async(struct work_struct *w) if (xe_exec_queue_is_lr(q)) cancel_work_sync(&ge->lr_tdr); - if (q->flags & EXEC_QUEUE_FLAG_PERSISTENT) - xe_device_remove_persistent_exec_queues(gt_to_xe(q->gt), q); release_guc_id(guc, q); xe_sched_entity_fini(&ge->entity); xe_sched_fini(&ge->sched); diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h index 0819959a52227f..dce06de592d055 100644 --- a/include/uapi/drm/xe_drm.h +++ b/include/uapi/drm/xe_drm.h @@ -1076,7 +1076,6 @@ struct drm_xe_exec_queue_create { #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY 0 #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE 1 #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PREEMPTION_TIMEOUT 2 -#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PERSISTENCE 3 #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_JOB_TIMEOUT 4 #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_ACC_TRIGGER 5 #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_ACC_NOTIFY 6 From 237412e45390805e14a6936fb998d756c4eac9d8 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Thu, 18 Jan 2024 16:16:12 -0800 Subject: [PATCH 179/200] drm/xe: Enable 32bits build Now that 
all the issues with 32bits are fixed, enable it again. Reviewed-by: Matt Roper Signed-off-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/20240119001612.2991381-6-lucas.demarchi@intel.com --- drivers/gpu/drm/xe/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig index e36ae1f0d8859f..0e31dfb8989e92 100644 --- a/drivers/gpu/drm/xe/Kconfig +++ b/drivers/gpu/drm/xe/Kconfig @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only config DRM_XE tristate "Intel Xe Graphics" - depends on DRM && PCI && MMU && (m || (y && KUNIT=y)) && 64BIT + depends on DRM && PCI && MMU && (m || (y && KUNIT=y)) select INTERVAL_TREE # we need shmfs for the swappable backing store, and in particular # the shmem_readpage() which depends upon tmpfs From a0df2cc858c309a8bc2e87b4274772587aa25e05 Mon Sep 17 00:00:00 2001 From: Priyanka Dandamudi Date: Tue, 20 Feb 2024 10:17:48 +0530 Subject: [PATCH 180/200] drm/xe/xe_bo_move: Enhance xe_bo_move trace MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Enhanced xe_bo_move trace to be more readable. It will help to show the migration details. Src and dst details. v2: Modify trace_xe_bo_move(), it takes the integer mem_type rather than a string. Make mem_type_to_name() extern, it will be used by trace. (Thomas) v3: Move mem_type_to_name() to xe_bo.[ch] (Thomas, Matt) v4: Add device details to reduce ambiguity related to vram0/vram1. (Oak) v5: Rename mem_type_to_name to xe_mem_type_to_name. (Thomas) v6: Optimised code to use xe_bo_device(__entry->bo). (Thomas) Cc: Thomas Hellström Cc: Oak Zeng Cc: Kempczynski Zbigniew Cc: Matthew Brost Cc: Brian Welty Signed-off-by: Priyanka Dandamudi Reviewed-by: Oak Zeng Reviewed-by: Thomas Hellström Signed-off-by: Thomas Hellström Link: https://patchwork.freedesktop.org/patch/msgid/20240220044748.948496-1-priyanka.dandamudi@intel.com --- drivers/gpu/drm/xe/xe_bo.c | 11 +++++++++-- drivers/gpu/drm/xe/xe_bo.h | 1 + drivers/gpu/drm/xe/xe_drm_client.c | 12 ++---------- drivers/gpu/drm/xe/xe_trace.h | 27 +++++++++++++++++++++++---- 4 files changed, 35 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 686d716c558158..d59b67d43c25ec 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -28,6 +28,14 @@ #include "xe_ttm_stolen_mgr.h" #include "xe_vm.h" +const char *const xe_mem_type_to_name[TTM_NUM_MEM_TYPES] = { + [XE_PL_SYSTEM] = "system", + [XE_PL_TT] = "gtt", + [XE_PL_VRAM0] = "vram0", + [XE_PL_VRAM1] = "vram1", + [XE_PL_STOLEN] = "stolen" +}; + static const struct ttm_place sys_placement_flags = { .fpfn = 0, .lpfn = 0, @@ -727,8 +735,7 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict, migrate = xe->tiles[0].migrate; xe_assert(xe, migrate); - - trace_xe_bo_move(bo); + trace_xe_bo_move(bo, new_mem->mem_type, old_mem_type); xe_device_mem_access_get(xe); if (xe_bo_is_pinned(bo) && !xe_bo_is_user(bo)) { diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h index db4b2db6b07307..7e49e670d91e54 100644 --- a/drivers/gpu/drm/xe/xe_bo.h +++ b/drivers/gpu/drm/xe/xe_bo.h @@ -244,6 +244,7 @@ int xe_bo_evict_pinned(struct xe_bo *bo); int xe_bo_restore_pinned(struct xe_bo *bo); extern const struct ttm_device_funcs xe_ttm_funcs; +extern const char *const xe_mem_type_to_name[]; int xe_gem_create_ioctl(struct drm_device *dev, void *data, struct drm_file *file); diff --git a/drivers/gpu/drm/xe/xe_drm_client.c
b/drivers/gpu/drm/xe/xe_drm_client.c index 82d1305e831f29..6040e4d22b2809 100644 --- a/drivers/gpu/drm/xe/xe_drm_client.c +++ b/drivers/gpu/drm/xe/xe_drm_client.c @@ -131,14 +131,6 @@ static void bo_meminfo(struct xe_bo *bo, static void show_meminfo(struct drm_printer *p, struct drm_file *file) { - static const char *const mem_type_to_name[TTM_NUM_MEM_TYPES] = { - [XE_PL_SYSTEM] = "system", - [XE_PL_TT] = "gtt", - [XE_PL_VRAM0] = "vram0", - [XE_PL_VRAM1] = "vram1", - [4 ... 6] = NULL, - [XE_PL_STOLEN] = "stolen" - }; struct drm_memory_stats stats[TTM_NUM_MEM_TYPES] = {}; struct xe_file *xef = file->driver_priv; struct ttm_device *bdev = &xef->xe->ttm; @@ -171,7 +163,7 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file) spin_unlock(&client->bos_lock); for (mem_type = XE_PL_SYSTEM; mem_type < TTM_NUM_MEM_TYPES; ++mem_type) { - if (!mem_type_to_name[mem_type]) + if (!xe_mem_type_to_name[mem_type]) continue; man = ttm_manager_type(bdev, mem_type); @@ -182,7 +174,7 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file) DRM_GEM_OBJECT_RESIDENT | (mem_type != XE_PL_SYSTEM ? 0 : DRM_GEM_OBJECT_PURGEABLE), - mem_type_to_name[mem_type]); + xe_mem_type_to_name[mem_type]); } } } diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h index e4e7262191adcb..0cce98a6b14b78 100644 --- a/drivers/gpu/drm/xe/xe_trace.h +++ b/drivers/gpu/drm/xe/xe_trace.h @@ -12,6 +12,7 @@ #include #include +#include "xe_bo.h" #include "xe_bo_types.h" #include "xe_exec_queue_types.h" #include "xe_gpu_scheduler_types.h" @@ -31,7 +32,7 @@ DECLARE_EVENT_CLASS(xe_gt_tlb_invalidation_fence, ), TP_fast_assign( - __entry->fence = (unsigned long)fence; + __entry->fence = (u64)fence; __entry->seqno = fence->seqno; ), @@ -100,9 +101,27 @@ DEFINE_EVENT(xe_bo, xe_bo_cpu_fault, TP_ARGS(bo) ); -DEFINE_EVENT(xe_bo, xe_bo_move, - TP_PROTO(struct xe_bo *bo), - TP_ARGS(bo) +TRACE_EVENT(xe_bo_move, + TP_PROTO(struct xe_bo *bo, uint32_t new_placement, uint32_t old_placement), + TP_ARGS(bo, new_placement, old_placement), + TP_STRUCT__entry( + __field(struct xe_bo *, bo) + __field(size_t, size) + __field(u32, new_placement) + __field(u32, old_placement) + __array(char, device_id, 12) + ), + + TP_fast_assign( + __entry->bo = bo; + __entry->size = bo->size; + __entry->new_placement = new_placement; + __entry->old_placement = old_placement; + strscpy(__entry->device_id, dev_name(xe_bo_device(__entry->bo)->drm.dev), 12); + ), + TP_printk("migrate object %p [size %zu] from %s to %s device_id:%s", + __entry->bo, __entry->size, xe_mem_type_to_name[__entry->old_placement], + xe_mem_type_to_name[__entry->new_placement], __entry->device_id) ); DECLARE_EVENT_CLASS(xe_exec_queue, From 19adaccef8b246182dc89a7470aa7758245efd5d Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Mon, 19 Feb 2024 13:19:40 -0800 Subject: [PATCH 181/200] drm/xe: Fix xe_vma_set_pte_size MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit xe_vma_set_pte_size had a return value and did not set the 4k VMA flag. Both of these were incorrect. Fix these. 
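A sketch of the corrected helper (the full diff follows; the flag names are the XE_VMA_PTE_* bits from xe_vm_types.h):

	static void xe_vma_set_pte_size(struct xe_vma *vma, u64 size)
	{
		switch (size) {
		case SZ_1G:
			vma->gpuva.flags |= XE_VMA_PTE_1G;
			break;
		case SZ_2M:
			vma->gpuva.flags |= XE_VMA_PTE_2M;
			break;
		case SZ_4K:
			vma->gpuva.flags |= XE_VMA_PTE_4K;
			break;
		}
	}

xe_vma_max_pte_size() reads these flags back when deciding whether a rebind can be skipped; without the 4k flag it falls through to its "uninitialized" default of SZ_1G.
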
Fixes: c47794bdd63d ("drm/xe: Set max pte size when skipping rebinds") Signed-off-by: Matthew Brost Reviewed-by: Thomas Hellström Link: https://patchwork.freedesktop.org/patch/msgid/20240219211942.3633795-2-matthew.brost@intel.com --- drivers/gpu/drm/xe/xe_vm.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 836a6e849cda82..2b0c76c663e579 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -2198,7 +2198,7 @@ static u64 xe_vma_max_pte_size(struct xe_vma *vma) return SZ_1G; /* Uninitialized, used max size */ } -static u64 xe_vma_set_pte_size(struct xe_vma *vma, u64 size) +static void xe_vma_set_pte_size(struct xe_vma *vma, u64 size) { switch (size) { case SZ_1G: @@ -2207,9 +2207,10 @@ static u64 xe_vma_set_pte_size(struct xe_vma *vma, u64 size) case SZ_2M: vma->gpuva.flags |= XE_VMA_PTE_2M; break; + case SZ_4K: + vma->gpuva.flags |= XE_VMA_PTE_4K; + break; } - - return SZ_4K; } static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op) From 15f0e0c2c46dddd8ee56d9b3db679fd302cc4b91 Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Mon, 19 Feb 2024 13:19:41 -0800 Subject: [PATCH 182/200] drm/xe: Add XE_VMA_PTE_64K VMA flag MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add XE_VMA_PTE_64K VMA flag to ensure skipping rebinds does not cross 64k page boundaries. Fixes: 8f33b4f054fc ("drm/xe: Avoid doing rebinds") Fixes: c47794bdd63d ("drm/xe: Set max pte size when skipping rebinds") Signed-off-by: Matthew Brost Reviewed-by: Thomas Hellström Link: https://patchwork.freedesktop.org/patch/msgid/20240219211942.3633795-3-matthew.brost@intel.com --- drivers/gpu/drm/xe/xe_pt.c | 6 ++++-- drivers/gpu/drm/xe/xe_vm.c | 5 +++++ drivers/gpu/drm/xe/xe_vm_types.h | 1 + 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c index 1e35c752544790..46cea37bdac514 100644 --- a/drivers/gpu/drm/xe/xe_pt.c +++ b/drivers/gpu/drm/xe/xe_pt.c @@ -499,10 +499,12 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset, * this device *requires* 64K PTE size for VRAM, fail. 
*/ if (level == 0 && !xe_parent->is_compact) { - if (xe_pt_is_pte_ps64K(addr, next, xe_walk)) + if (xe_pt_is_pte_ps64K(addr, next, xe_walk)) { + xe_walk->vma->gpuva.flags |= XE_VMA_PTE_64K; pte |= XE_PTE_PS64; - else if (XE_WARN_ON(xe_walk->needs_64K)) + } else if (XE_WARN_ON(xe_walk->needs_64K)) { return -EINVAL; + } } ret = xe_pt_insert_entry(xe_walk, xe_parent, offset, NULL, pte); diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 2b0c76c663e579..7781290d2b99b1 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -2192,6 +2192,8 @@ static u64 xe_vma_max_pte_size(struct xe_vma *vma) return SZ_1G; else if (vma->gpuva.flags & XE_VMA_PTE_2M) return SZ_2M; + else if (vma->gpuva.flags & XE_VMA_PTE_64K) + return SZ_64K; else if (vma->gpuva.flags & XE_VMA_PTE_4K) return SZ_4K; @@ -2207,6 +2209,9 @@ static void xe_vma_set_pte_size(struct xe_vma *vma, u64 size) case SZ_2M: vma->gpuva.flags |= XE_VMA_PTE_2M; break; + case SZ_64K: + vma->gpuva.flags |= XE_VMA_PTE_64K; + break; case SZ_4K: vma->gpuva.flags |= XE_VMA_PTE_4K; break; diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h index 5ac9c5bebabc3c..91800ce7084500 100644 --- a/drivers/gpu/drm/xe/xe_vm_types.h +++ b/drivers/gpu/drm/xe/xe_vm_types.h @@ -29,6 +29,7 @@ struct xe_vm; #define XE_VMA_PTE_4K (DRM_GPUVA_USERBITS << 5) #define XE_VMA_PTE_2M (DRM_GPUVA_USERBITS << 6) #define XE_VMA_PTE_1G (DRM_GPUVA_USERBITS << 7) +#define XE_VMA_PTE_64K (DRM_GPUVA_USERBITS << 8) /** struct xe_userptr - User pointer */ struct xe_userptr { From 0f688c0eb63a643ef0568b29b12cefbb23181e1a Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Mon, 19 Feb 2024 13:19:42 -0800 Subject: [PATCH 183/200] drm/xe: Return 2MB page size for compact 64k PTEs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Compact 64k PTEs are only intended to be used within a single VMA which covers the entire 2MB range of the compact 64k PTEs. Add XE_VMA_PTE_COMPACT VMA flag to indicate compact 64k PTEs are used and update xe_vma_max_pte_size to return at least 2MB if set. v2: Include missing changes Fixes: 8f33b4f054fc ("drm/xe: Avoid doing rebinds") Fixes: c47794bdd63d ("drm/xe: Set max pte size when skipping rebinds") Reported-by: Paulo Zanoni Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/758 Signed-off-by: Matthew Brost Reviewed-by: Thomas Hellström Link: https://patchwork.freedesktop.org/patch/msgid/20240219211942.3633795-4-matthew.brost@intel.com --- drivers/gpu/drm/xe/xe_pt.c | 5 ++++- drivers/gpu/drm/xe/xe_vm.c | 2 +- drivers/gpu/drm/xe/xe_vm_types.h | 1 + 3 files changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c index 46cea37bdac514..7f54bc3e389d58 100644 --- a/drivers/gpu/drm/xe/xe_pt.c +++ b/drivers/gpu/drm/xe/xe_pt.c @@ -547,13 +547,16 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset, *child = &xe_child->base; /* - * Prefer the compact pagetable layout for L0 if possible. + * Prefer the compact pagetable layout for L0 if possible. Only + * possible if VMA covers entire 2MB region as compact 64k and + * 4k pages cannot be mixed within a 2MB region. * TODO: Suballocate the pt bo to avoid wasting a lot of * memory. 
*/ if (GRAPHICS_VERx100(tile_to_xe(xe_walk->tile)) >= 1250 && level == 1 && covers && xe_pt_scan_64K(addr, next, xe_walk)) { walk->shifts = xe_compact_pt_shifts; + xe_walk->vma->gpuva.flags |= XE_VMA_PTE_COMPACT; flags |= XE_PDE_64K; xe_child->is_compact = true; } diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 7781290d2b99b1..23a44ef85aa4bd 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -2190,7 +2190,7 @@ static u64 xe_vma_max_pte_size(struct xe_vma *vma) { if (vma->gpuva.flags & XE_VMA_PTE_1G) return SZ_1G; - else if (vma->gpuva.flags & XE_VMA_PTE_2M) + else if (vma->gpuva.flags & (XE_VMA_PTE_2M | XE_VMA_PTE_COMPACT)) return SZ_2M; else if (vma->gpuva.flags & XE_VMA_PTE_64K) return SZ_64K; diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h index 91800ce7084500..a603cc2eb56b3f 100644 --- a/drivers/gpu/drm/xe/xe_vm_types.h +++ b/drivers/gpu/drm/xe/xe_vm_types.h @@ -30,6 +30,7 @@ struct xe_vm; #define XE_VMA_PTE_2M (DRM_GPUVA_USERBITS << 6) #define XE_VMA_PTE_1G (DRM_GPUVA_USERBITS << 7) #define XE_VMA_PTE_64K (DRM_GPUVA_USERBITS << 8) +#define XE_VMA_PTE_COMPACT (DRM_GPUVA_USERBITS << 9) /** struct xe_userptr - User pointer */ struct xe_userptr { From a44bbace73dfb56a83d8dd5a6f2181d9d181522b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Winiarski?= Date: Mon, 19 Feb 2024 14:05:27 +0100 Subject: [PATCH 184/200] drm/xe/guc: Allocate GuC data structures in system memory for initial load MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit GuC load will need to happen at an earlier point in probe, where local memory is not yet available. Use system memory for GuC data structures used for initial "hwconfig" load, and realloc at a later, "post-hwconfig" load if needed, when local memory is available. Signed-off-by: Michał Winiarski Reviewed-by: Rodrigo Vivi Reviewed-by: Matthew Brost Signed-off-by: Rodrigo Vivi Link: https://patchwork.freedesktop.org/patch/msgid/20240219130530.1406044-1-michal.winiarski@intel.com --- drivers/gpu/drm/xe/xe_bo.c | 32 ++++++++++++++++++++++++++ drivers/gpu/drm/xe/xe_bo.h | 1 + drivers/gpu/drm/xe/xe_guc.c | 34 ++++++++++++++++++++++++++++ drivers/gpu/drm/xe/xe_guc_ads.c | 2 +- drivers/gpu/drm/xe/xe_guc_ct.c | 2 +- drivers/gpu/drm/xe/xe_guc_hwconfig.c | 2 +- drivers/gpu/drm/xe/xe_guc_log.c | 2 +- 7 files changed, 71 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 9d7ad67bd7b4c0..76dfaf1cd20080 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -1612,6 +1612,38 @@ struct xe_bo *xe_managed_bo_create_from_data(struct xe_device *xe, struct xe_til return bo; } +/** + * xe_managed_bo_reinit_in_vram + * @xe: xe device + * @tile: Tile where the new buffer will be created + * @src: Managed buffer object allocated in system memory + * + * Replace a managed src buffer object allocated in system memory with a new + * one allocated in vram, copying the data between them. + * Buffer object in VRAM is not going to have the same GGTT address, the caller + * is responsible for making sure that any old references to it are updated. + * + * Returns 0 for success, negative error code otherwise. 
+ */ +int xe_managed_bo_reinit_in_vram(struct xe_device *xe, struct xe_tile *tile, struct xe_bo **src) +{ + struct xe_bo *bo; + + xe_assert(xe, IS_DGFX(xe)); + xe_assert(xe, !(*src)->vmap.is_iomem); + + bo = xe_managed_bo_create_from_data(xe, tile, (*src)->vmap.vaddr, (*src)->size, + XE_BO_CREATE_VRAM_IF_DGFX(tile) | + XE_BO_CREATE_GGTT_BIT); + if (IS_ERR(bo)) + return PTR_ERR(bo); + + drmm_release_action(&xe->drm, __xe_bo_unpin_map_no_vm, *src); + *src = bo; + + return 0; +} + /* * XXX: This is in the VM bind data path, likely should calculate this once and * store, with a recalculation if the BO is moved. diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h index 7e49e670d91e54..c59ad15961ce55 100644 --- a/drivers/gpu/drm/xe/xe_bo.h +++ b/drivers/gpu/drm/xe/xe_bo.h @@ -129,6 +129,7 @@ struct xe_bo *xe_managed_bo_create_pin_map(struct xe_device *xe, struct xe_tile size_t size, u32 flags); struct xe_bo *xe_managed_bo_create_from_data(struct xe_device *xe, struct xe_tile *tile, const void *data, size_t size, u32 flags); +int xe_managed_bo_reinit_in_vram(struct xe_device *xe, struct xe_tile *tile, struct xe_bo **src); int xe_bo_placement_for_flags(struct xe_device *xe, struct xe_bo *bo, u32 bo_flags); diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 868208a3982951..8c10546e387dfb 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -272,6 +272,34 @@ void xe_guc_comm_init_early(struct xe_guc *guc) guc->notify_reg = GUC_HOST_INTERRUPT; } +static int xe_guc_realloc_post_hwconfig(struct xe_guc *guc) +{ + struct xe_tile *tile = gt_to_tile(guc_to_gt(guc)); + struct xe_device *xe = guc_to_xe(guc); + int ret; + + if (!IS_DGFX(guc_to_xe(guc))) + return 0; + + ret = xe_managed_bo_reinit_in_vram(xe, tile, &guc->fw.bo); + if (ret) + return ret; + + ret = xe_managed_bo_reinit_in_vram(xe, tile, &guc->log.bo); + if (ret) + return ret; + + ret = xe_managed_bo_reinit_in_vram(xe, tile, &guc->ads.bo); + if (ret) + return ret; + + ret = xe_managed_bo_reinit_in_vram(xe, tile, &guc->ct.bo); + if (ret) + return ret; + + return 0; +} + int xe_guc_init(struct xe_guc *guc) { struct xe_device *xe = guc_to_xe(guc); @@ -331,6 +359,12 @@ int xe_guc_init(struct xe_guc *guc) */ int xe_guc_init_post_hwconfig(struct xe_guc *guc) { + int ret; + + ret = xe_guc_realloc_post_hwconfig(guc); + if (ret) + return ret; + guc_init_params_post_hwconfig(guc); return xe_guc_ads_init_post_hwconfig(&guc->ads); diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c index 390e6f1bf4e1c8..6ad4c1a90a787b 100644 --- a/drivers/gpu/drm/xe/xe_guc_ads.c +++ b/drivers/gpu/drm/xe/xe_guc_ads.c @@ -273,7 +273,7 @@ int xe_guc_ads_init(struct xe_guc_ads *ads) ads->regset_size = calculate_regset_size(gt); bo = xe_managed_bo_create_pin_map(xe, tile, guc_ads_size(ads) + MAX_GOLDEN_LRC_SIZE, - XE_BO_CREATE_VRAM_IF_DGFX(tile) | + XE_BO_CREATE_SYSTEM_BIT | XE_BO_CREATE_GGTT_BIT); if (IS_ERR(bo)) return PTR_ERR(bo); diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index f3d356383cedfd..355edd4d758af7 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -155,7 +155,7 @@ int xe_guc_ct_init(struct xe_guc_ct *ct) primelockdep(ct); bo = xe_managed_bo_create_pin_map(xe, tile, guc_ct_size(), - XE_BO_CREATE_VRAM_IF_DGFX(tile) | + XE_BO_CREATE_SYSTEM_BIT | XE_BO_CREATE_GGTT_BIT); if (IS_ERR(bo)) return PTR_ERR(bo); diff --git a/drivers/gpu/drm/xe/xe_guc_hwconfig.c b/drivers/gpu/drm/xe/xe_guc_hwconfig.c index 
2a13a00917f8cd..ea49f3885c108d 100644 --- a/drivers/gpu/drm/xe/xe_guc_hwconfig.c +++ b/drivers/gpu/drm/xe/xe_guc_hwconfig.c @@ -78,7 +78,7 @@ int xe_guc_hwconfig_init(struct xe_guc *guc) return -EINVAL; bo = xe_managed_bo_create_pin_map(xe, tile, PAGE_ALIGN(size), - XE_BO_CREATE_VRAM_IF_DGFX(tile) | + XE_BO_CREATE_SYSTEM_BIT | XE_BO_CREATE_GGTT_BIT); if (IS_ERR(bo)) return PTR_ERR(bo); diff --git a/drivers/gpu/drm/xe/xe_guc_log.c b/drivers/gpu/drm/xe/xe_guc_log.c index bcd2f4d34081da..45135c3520e549 100644 --- a/drivers/gpu/drm/xe/xe_guc_log.c +++ b/drivers/gpu/drm/xe/xe_guc_log.c @@ -84,7 +84,7 @@ int xe_guc_log_init(struct xe_guc_log *log) struct xe_bo *bo; bo = xe_managed_bo_create_pin_map(xe, tile, guc_log_size(), - XE_BO_CREATE_VRAM_IF_DGFX(tile) | + XE_BO_CREATE_SYSTEM_BIT | XE_BO_CREATE_GGTT_BIT); if (IS_ERR(bo)) return PTR_ERR(bo); From 7606f7d0f069c0fbb033b52e898c437c3aa13f32 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Winiarski?= Date: Mon, 19 Feb 2024 14:05:28 +0100 Subject: [PATCH 185/200] drm/xe/huc: Realloc HuC FW in vram for post-hwconfig MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Similar to GuC, we're using system memory for the initial stage, and move the image to vram when it's available for subsequent loads (e.g. after reset). Signed-off-by: Michał Winiarski Reviewed-by: Rodrigo Vivi Signed-off-by: Rodrigo Vivi Link: https://patchwork.freedesktop.org/patch/msgid/20240219130530.1406044-2-michal.winiarski@intel.com --- drivers/gpu/drm/xe/xe_huc.c | 19 +++++++++++++++++++ drivers/gpu/drm/xe/xe_huc.h | 1 + drivers/gpu/drm/xe/xe_uc.c | 4 ++++ 3 files changed, 24 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_huc.c b/drivers/gpu/drm/xe/xe_huc.c index eca109791c6ae6..b545f850087cd8 100644 --- a/drivers/gpu/drm/xe/xe_huc.c +++ b/drivers/gpu/drm/xe/xe_huc.c @@ -112,6 +112,25 @@ int xe_huc_init(struct xe_huc *huc) return ret; } +int xe_huc_init_post_hwconfig(struct xe_huc *huc) +{ + struct xe_tile *tile = gt_to_tile(huc_to_gt(huc)); + struct xe_device *xe = huc_to_xe(huc); + int ret; + + if (!IS_DGFX(huc_to_xe(huc))) + return 0; + + if (!xe_uc_fw_is_loadable(&huc->fw)) + return 0; + + ret = xe_managed_bo_reinit_in_vram(xe, tile, &huc->fw.bo); + if (ret) + return ret; + + return 0; +} + int xe_huc_upload(struct xe_huc *huc) { if (!xe_uc_fw_is_loadable(&huc->fw)) diff --git a/drivers/gpu/drm/xe/xe_huc.h b/drivers/gpu/drm/xe/xe_huc.h index 532017230287ff..3ab56cc14b00a7 100644 --- a/drivers/gpu/drm/xe/xe_huc.h +++ b/drivers/gpu/drm/xe/xe_huc.h @@ -17,6 +17,7 @@ enum xe_huc_auth_types { }; int xe_huc_init(struct xe_huc *huc); +int xe_huc_init_post_hwconfig(struct xe_huc *huc); int xe_huc_upload(struct xe_huc *huc); int xe_huc_auth(struct xe_huc *huc, enum xe_huc_auth_types type); bool xe_huc_is_authenticated(struct xe_huc *huc, enum xe_huc_auth_types type); diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c index 8f37a809525fb3..d62137306d2808 100644 --- a/drivers/gpu/drm/xe/xe_uc.c +++ b/drivers/gpu/drm/xe/xe_uc.c @@ -94,6 +94,10 @@ int xe_uc_init_post_hwconfig(struct xe_uc *uc) if (err) return err; + err = xe_huc_init_post_hwconfig(&uc->huc); + if (err) + return err; + return xe_gsc_init_post_hwconfig(&uc->gsc); } From 8a4587ef9f952105d1d5a7ffcdee848219cdc743 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Winiarski?= Date: Mon, 19 Feb 2024 14:05:29 +0100 Subject: [PATCH 186/200] drm/xe/guc: Move GuC power control init to "post-hwconfig" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 
Content-Transfer-Encoding: 8bit SLPC is not used at "hwconfig" stage. Move the initialization of data structures used for SLPC to a later point in probe. Also - move the xe_guc_pc_init_early to happen just prior to initial "hwconfig" load. Signed-off-by: Michał Winiarski Reviewed-by: Matthew Brost Signed-off-by: Rodrigo Vivi Link: https://patchwork.freedesktop.org/patch/msgid/20240219130530.1406044-3-michal.winiarski@intel.com --- drivers/gpu/drm/xe/xe_gt.c | 3 --- drivers/gpu/drm/xe/xe_guc.c | 12 +++++++----- drivers/gpu/drm/xe/xe_guc_pc.c | 19 +++++++++++++++---- drivers/gpu/drm/xe/xe_guc_pc.h | 1 - 4 files changed, 22 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c index c79edfd5a34fb9..fdfb70c7db9f8b 100644 --- a/drivers/gpu/drm/xe/xe_gt.c +++ b/drivers/gpu/drm/xe/xe_gt.c @@ -368,9 +368,6 @@ static int gt_fw_domain_init(struct xe_gt *gt) if (err) goto err_force_wake; - /* Raise GT freq to speed up HuC/GuC load */ - xe_guc_pc_init_early(>->uc.guc.pc); - err = xe_uc_init_hwconfig(>->uc); if (err) goto err_force_wake; diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 8c10546e387dfb..68089556c14096 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -251,7 +251,6 @@ static void guc_fini(struct drm_device *drm, void *arg) struct xe_guc *guc = arg; xe_force_wake_get(gt_to_fw(guc_to_gt(guc)), XE_FORCEWAKE_ALL); - xe_guc_pc_fini(&guc->pc); xe_uc_fini_hw(&guc_to_gt(guc)->uc); xe_force_wake_put(gt_to_fw(guc_to_gt(guc)), XE_FORCEWAKE_ALL); } @@ -330,10 +329,6 @@ int xe_guc_init(struct xe_guc *guc) if (ret) goto out; - ret = xe_guc_pc_init(&guc->pc); - if (ret) - goto out; - ret = drmm_add_action_or_reset(>_to_xe(gt)->drm, guc_fini, guc); if (ret) goto out; @@ -367,6 +362,10 @@ int xe_guc_init_post_hwconfig(struct xe_guc *guc) guc_init_params_post_hwconfig(guc); + ret = xe_guc_pc_init(&guc->pc); + if (ret) + return ret; + return xe_guc_ads_init_post_hwconfig(&guc->ads); } @@ -574,6 +573,9 @@ int xe_guc_min_load_for_hwconfig(struct xe_guc *guc) xe_guc_ads_populate_minimal(&guc->ads); + /* Raise GT freq to speed up HuC/GuC load */ + xe_guc_pc_init_early(&guc->pc); + ret = __xe_guc_upload(guc); if (ret) return ret; diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c index d91702592520af..2839d685631bc2 100644 --- a/drivers/gpu/drm/xe/xe_guc_pc.c +++ b/drivers/gpu/drm/xe/xe_guc_pc.c @@ -956,10 +956,12 @@ int xe_guc_pc_stop(struct xe_guc_pc *pc) /** * xe_guc_pc_fini - Finalize GuC's Power Conservation component - * @pc: Xe_GuC_PC instance + * @drm: DRM device + * @arg: opaque pointer that should point to Xe_GuC_PC instance */ -void xe_guc_pc_fini(struct xe_guc_pc *pc) +static void xe_guc_pc_fini(struct drm_device *drm, void *arg) { + struct xe_guc_pc *pc = arg; struct xe_device *xe = pc_to_xe(pc); if (xe->info.skip_guc_pc) { @@ -969,9 +971,10 @@ void xe_guc_pc_fini(struct xe_guc_pc *pc) return; } + xe_force_wake_get(gt_to_fw(pc_to_gt(pc)), XE_FORCEWAKE_ALL); XE_WARN_ON(xe_guc_pc_gucrc_disable(pc)); XE_WARN_ON(xe_guc_pc_stop(pc)); - mutex_destroy(&pc->freq_lock); + xe_force_wake_put(gt_to_fw(pc_to_gt(pc)), XE_FORCEWAKE_ALL); } /** @@ -985,11 +988,14 @@ int xe_guc_pc_init(struct xe_guc_pc *pc) struct xe_device *xe = gt_to_xe(gt); struct xe_bo *bo; u32 size = PAGE_ALIGN(sizeof(struct slpc_shared_data)); + int err; if (xe->info.skip_guc_pc) return 0; - mutex_init(&pc->freq_lock); + err = drmm_mutex_init(&xe->drm, &pc->freq_lock); + if (err) + return err; bo = 
xe_managed_bo_create_pin_map(xe, tile, size, XE_BO_CREATE_VRAM_IF_DGFX(tile) | @@ -998,5 +1004,10 @@ int xe_guc_pc_init(struct xe_guc_pc *pc) return PTR_ERR(bo); pc->bo = bo; + + err = drmm_add_action_or_reset(&xe->drm, xe_guc_pc_fini, pc); + if (err) + return err; + return 0; } diff --git a/drivers/gpu/drm/xe/xe_guc_pc.h b/drivers/gpu/drm/xe/xe_guc_pc.h index cecad8e9300b4a..d3680d89490ee1 100644 --- a/drivers/gpu/drm/xe/xe_guc_pc.h +++ b/drivers/gpu/drm/xe/xe_guc_pc.h @@ -9,7 +9,6 @@ #include "xe_guc_pc_types.h" int xe_guc_pc_init(struct xe_guc_pc *pc); -void xe_guc_pc_fini(struct xe_guc_pc *pc); int xe_guc_pc_start(struct xe_guc_pc *pc); int xe_guc_pc_stop(struct xe_guc_pc *pc); int xe_guc_pc_gucrc_disable(struct xe_guc_pc *pc); From bf8ec3c3e82c70b39244ccde96a875773c1fc620 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Winiarski?= Date: Mon, 19 Feb 2024 14:05:30 +0100 Subject: [PATCH 187/200] drm/xe: Initialize GuC earlier during probe MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit SR-IOV VF has limited access to MMIO registers. Fortunately, it is able to access a curated subset that is needed to initialize the driver by communicating with SR-IOV PF using GuC CT. Initialize GuC earlier in order to keep the unified probe ordering between VF and PF modes. Signed-off-by: Michał Winiarski Reviewed-by: Matthew Brost Signed-off-by: Rodrigo Vivi Link: https://patchwork.freedesktop.org/patch/msgid/20240219130530.1406044-4-michal.winiarski@intel.com --- drivers/gpu/drm/xe/xe_device.c | 6 ++++ drivers/gpu/drm/xe/xe_gt.c | 57 +++++++++++++++++++++++----------- drivers/gpu/drm/xe/xe_gt.h | 1 + drivers/gpu/drm/xe/xe_uc.c | 10 ++++-- 4 files changed, 54 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index 1cf7561c8b4d92..ca85e81fdb4438 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -463,6 +463,12 @@ int xe_device_probe(struct xe_device *xe) } } + for_each_gt(gt, xe, id) { + err = xe_gt_init_hwconfig(gt); + if (err) + return err; + } + err = drmm_add_action_or_reset(&xe->drm, xe_driver_flr_fini, xe); if (err) return err; diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c index fdfb70c7db9f8b..b75f0bf0a9a10f 100644 --- a/drivers/gpu/drm/xe/xe_gt.c +++ b/drivers/gpu/drm/xe/xe_gt.c @@ -315,7 +315,6 @@ int xe_gt_init_early(struct xe_gt *gt) return err; xe_gt_topology_init(gt); - xe_gt_mcr_init(gt); err = xe_force_wake_put(gt_to_fw(gt), XE_FW_GT); if (err) @@ -354,8 +353,6 @@ static int gt_fw_domain_init(struct xe_gt *gt) if (err) goto err_hw_fence_irq; - xe_pat_init(gt); - if (!xe_gt_is_media_type(gt)) { err = xe_ggtt_init(gt_to_tile(gt)->mem.ggtt); if (err) @@ -364,19 +361,8 @@ static int gt_fw_domain_init(struct xe_gt *gt) xe_lmtt_init(>_to_tile(gt)->sriov.pf.lmtt); } - err = xe_uc_init(>->uc); - if (err) - goto err_force_wake; - - err = xe_uc_init_hwconfig(>->uc); - if (err) - goto err_force_wake; - xe_gt_idle_sysfs_init(>->gtidle); - /* XXX: Fake that we pull the engine mask from hwconfig blob */ - gt->info.engine_mask = gt->info.__engine_mask; - /* Enable per hw engine IRQs */ xe_irq_enable_hwe(gt); @@ -444,10 +430,6 @@ static int all_fw_domain_init(struct xe_gt *gt) if (err) goto err_force_wake; - err = xe_uc_init_post_hwconfig(>->uc); - if (err) - goto err_force_wake; - if (!xe_gt_is_media_type(gt)) { /* * USM has its only SA pool to non-block behind user operations @@ -474,6 +456,10 @@ static int all_fw_domain_init(struct xe_gt *gt) 
} } + err = xe_uc_init_post_hwconfig(&gt->uc); + if (err) + goto err_force_wake; + err = xe_uc_init_hw(&gt->uc); if (err) goto err_force_wake; @@ -503,6 +489,41 @@ static int all_fw_domain_init(struct xe_gt *gt) return err; } +/* + * Initialize enough GT to be able to load GuC in order to obtain hwconfig and + * enable CTB communication. + */ +int xe_gt_init_hwconfig(struct xe_gt *gt) +{ + int err; + + xe_device_mem_access_get(gt_to_xe(gt)); + err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); + if (err) + goto out; + + xe_gt_mcr_init(gt); + xe_pat_init(gt); + + err = xe_uc_init(&gt->uc); + if (err) + goto out_fw; + + err = xe_uc_init_hwconfig(&gt->uc); + if (err) + goto out_fw; + + /* XXX: Fake that we pull the engine mask from hwconfig blob */ + gt->info.engine_mask = gt->info.__engine_mask; + +out_fw: + xe_force_wake_put(gt_to_fw(gt), XE_FW_GT); +out: + xe_device_mem_access_put(gt_to_xe(gt)); + + return err; +} + int xe_gt_init(struct xe_gt *gt) { int err; diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h index c1675bd44cf6d6..ed6ea8057e35fe 100644 --- a/drivers/gpu/drm/xe/xe_gt.h +++ b/drivers/gpu/drm/xe/xe_gt.h @@ -33,6 +33,7 @@ static inline bool xe_fault_inject_gt_reset(void) #endif struct xe_gt *xe_gt_alloc(struct xe_tile *tile); +int xe_gt_init_hwconfig(struct xe_gt *gt); int xe_gt_init_early(struct xe_gt *gt); int xe_gt_init(struct xe_gt *gt); int xe_gt_record_default_lrcs(struct xe_gt *gt); diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c index d62137306d2808..7033f8c1b4318a 100644 --- a/drivers/gpu/drm/xe/xe_uc.c +++ b/drivers/gpu/drm/xe/xe_uc.c @@ -32,13 +32,15 @@ uc_to_xe(struct xe_uc *uc) /* Should be called once at driver load only */ int xe_uc_init(struct xe_uc *uc) { + struct xe_device *xe = uc_to_xe(uc); int ret; + xe_device_mem_access_get(xe); + /* * We call the GuC/HuC/GSC init functions even if GuC submission is off * to correctly move our tracking of the FW state to "disabled". */ - ret = xe_guc_init(&uc->guc); if (ret) goto err; @@ -52,7 +54,7 @@ int xe_uc_init(struct xe_uc *uc) goto err; if (!xe_device_uc_enabled(uc_to_xe(uc))) - return 0; + goto err; ret = xe_wopcm_init(&uc->wopcm); if (ret) @@ -66,9 +68,13 @@ int xe_uc_init(struct xe_uc *uc) if (ret) goto err; + xe_device_mem_access_put(xe); + return 0; err: + xe_device_mem_access_put(xe); + return ret; } From 69a5f1774adda6c8c0c6e751f1f66aa353d36463 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sun, 14 Jan 2024 16:09:16 +0100 Subject: [PATCH 188/200] drm/xe/guc: Remove usage of the deprecated ida_simple_xx() API ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). Note that the upper limit of ida_simple_get() is exclusive, but the one of ida_alloc_max() is inclusive. So a -1 has been added when needed.
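For reference, the shape of the conversion; this is illustrative only, with N standing in for the number of ids:

	/* deprecated: allocates from [0, N), upper bound exclusive */
	id = ida_simple_get(&ida, 0, N, GFP_KERNEL);
	/* preferred: allocates from [0, N - 1], maximum inclusive */
	id = ida_alloc_max(&ida, N - 1, GFP_KERNEL);
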
Signed-off-by: Christophe JAILLET Reviewed-by: Lucas De Marchi Signed-off-by: Lucas De Marchi Link: https://patchwork.freedesktop.org/patch/msgid/d6a9ec9dc426fca372eaa1423a83632bd743c5d9.1705244938.git.christophe.jaillet@wanadoo.fr --- drivers/gpu/drm/xe/xe_guc_submit.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index bbcd47737a5972..3ac51162cfa723 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -312,7 +312,7 @@ static void __release_guc_id(struct xe_guc *guc, struct xe_exec_queue *q, u32 xa q->guc->id - GUC_ID_START_MLRC, order_base_2(q->width)); else - ida_simple_remove(&guc->submission_state.guc_ids, q->guc->id); + ida_free(&guc->submission_state.guc_ids, q->guc->id); } static int alloc_guc_id(struct xe_guc *guc, struct xe_exec_queue *q) @@ -336,8 +336,8 @@ static int alloc_guc_id(struct xe_guc *guc, struct xe_exec_queue *q) ret = bitmap_find_free_region(bitmap, GUC_ID_NUMBER_MLRC, order_base_2(q->width)); } else { - ret = ida_simple_get(&guc->submission_state.guc_ids, 0, - GUC_ID_NUMBER_SLRC, GFP_NOWAIT); + ret = ida_alloc_max(&guc->submission_state.guc_ids, + GUC_ID_NUMBER_SLRC - 1, GFP_NOWAIT); } if (ret < 0) return ret; From e5626eb80026c4b63f8682cdeca1456303c65791 Mon Sep 17 00:00:00 2001 From: Ashutosh Dixit Date: Tue, 6 Feb 2024 11:27:31 -0800 Subject: [PATCH 189/200] drm/xe/xe_gt_idle: Drop redundant newline in name Newline in name is redundant and produces an unnecessary empty line during 'cat name'. Newline is added during sysfs_emit. See '27a1a1e2e47d ("drm/xe: stringify the argument to avoid potential vulnerability")'. v2: Add Fixes tag (Riana) Fixes: 7b076d14f21a ("drm/xe/mtl: Add support to get C6 residency/status of MTL") Reviewed-by: Riana Tauro Signed-off-by: Ashutosh Dixit --- drivers/gpu/drm/xe/xe_gt_idle.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_gt_idle.c b/drivers/gpu/drm/xe/xe_gt_idle.c index 9358f733688969..9fcae65b64699e 100644 --- a/drivers/gpu/drm/xe/xe_gt_idle.c +++ b/drivers/gpu/drm/xe/xe_gt_idle.c @@ -145,10 +145,10 @@ void xe_gt_idle_sysfs_init(struct xe_gt_idle *gtidle) } if (xe_gt_is_media_type(gt)) { - sprintf(gtidle->name, "gt%d-mc\n", gt->info.id); + sprintf(gtidle->name, "gt%d-mc", gt->info.id); gtidle->idle_residency = xe_guc_pc_mc6_residency; } else { - sprintf(gtidle->name, "gt%d-rc\n", gt->info.id); + sprintf(gtidle->name, "gt%d-rc", gt->info.id); gtidle->idle_residency = xe_guc_pc_rc6_residency; } From bb619d71224ea85ec94e0a83b2bb82ebe7df2a41 Mon Sep 17 00:00:00 2001 From: Ashutosh Dixit Date: Mon, 12 Feb 2024 19:35:48 -0800 Subject: [PATCH 190/200] drm/xe: Fix modpost warning on xe_mocs kunit module $ make W=1 -j100 M=drivers/gpu/drm/xe MODPOST drivers/gpu/drm/xe/Module.symvers WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/gpu/drm/xe/tests/xe_mocs_test.o Fix is identical to '1d425066f15f ("drm/xe: Fix modpost warning on kunit modules")'.
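With the added line, the test module carries the full metadata set that a W=1 modpost run checks for, mirrored in the one-line diff below:

	MODULE_AUTHOR("Intel Corporation");
	MODULE_LICENSE("GPL");
	MODULE_DESCRIPTION("xe_mocs kunit test");
	MODULE_IMPORT_NS(EXPORTED_FOR_KUNIT_TESTING);
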
Fixes: a6a4ea6d7d37 ("drm/xe: Add mocs kunit") Signed-off-by: Ashutosh Dixit Reviewed-by: Rodrigo Vivi --- drivers/gpu/drm/xe/tests/xe_mocs_test.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/xe/tests/xe_mocs_test.c b/drivers/gpu/drm/xe/tests/xe_mocs_test.c index 4f62e7a4270bdb..ee40f31e1e12c8 100644 --- a/drivers/gpu/drm/xe/tests/xe_mocs_test.c +++ b/drivers/gpu/drm/xe/tests/xe_mocs_test.c @@ -22,4 +22,5 @@ kunit_test_suite(xe_mocs_test_suite); MODULE_AUTHOR("Intel Corporation"); MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("xe_mocs kunit test"); MODULE_IMPORT_NS(EXPORTED_FOR_KUNIT_TESTING); From 8491b0ef3233a94901d6f28d203c5ebb2f0f0b33 Mon Sep 17 00:00:00 2001 From: Maarten Lankhorst Date: Wed, 21 Feb 2024 14:30:16 +0100 Subject: [PATCH 191/200] drm/xe/snapshot: Remove drm_err on guc alloc failures The kernel will complain loudly if allocation fails, no need to do it ourselves. Signed-off-by: Maarten Lankhorst Reviewed-by: Francois Dugast Link: https://patchwork.freedesktop.org/patch/msgid/20240221133024.898315-1-maarten.lankhorst@linux.intel.com --- drivers/gpu/drm/xe/xe_guc_submit.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index 3ac51162cfa723..ff77bc8da1b270 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -1794,18 +1794,14 @@ struct xe_guc_submit_exec_queue_snapshot * xe_guc_exec_queue_snapshot_capture(struct xe_sched_job *job) { struct xe_exec_queue *q = job->q; - struct xe_guc *guc = exec_queue_to_guc(q); - struct xe_device *xe = guc_to_xe(guc); struct xe_gpu_scheduler *sched = &q->guc->sched; struct xe_guc_submit_exec_queue_snapshot *snapshot; int i; snapshot = kzalloc(sizeof(*snapshot), GFP_ATOMIC); - if (!snapshot) { - drm_err(&xe->drm, "Skipping GuC Engine snapshot entirely.\n"); + if (!snapshot) return NULL; - } snapshot->guc.id = q->guc->id; memcpy(&snapshot->name, &q->name, sizeof(snapshot->name)); @@ -1821,9 +1817,7 @@ xe_guc_exec_queue_snapshot_capture(struct xe_sched_job *job) snapshot->lrc = kmalloc_array(q->width, sizeof(struct lrc_snapshot), GFP_ATOMIC); - if (!snapshot->lrc) { - drm_err(&xe->drm, "Skipping GuC Engine LRC snapshot.\n"); - } else { + if (snapshot->lrc) { for (i = 0; i < q->width; ++i) { struct xe_lrc *lrc = q->lrc + i; @@ -1851,9 +1845,7 @@ xe_guc_exec_queue_snapshot_capture(struct xe_sched_job *job) sizeof(struct pending_list_snapshot), GFP_ATOMIC); - if (!snapshot->pending_list) { - drm_err(&xe->drm, "Skipping GuC Engine pending_list snapshot.\n"); - } else { + if (snapshot->pending_list) { struct xe_sched_job *job_iter; i = 0; From bd71cdd209c63f3d526aef661282b5252a436c4d Mon Sep 17 00:00:00 2001 From: Maarten Lankhorst Date: Wed, 21 Feb 2024 14:30:17 +0100 Subject: [PATCH 192/200] drm/xe: Clear all snapshot members after deleting coredump It's not strictly needed to clear right now, but this prevents bugs from dangling pointers. 
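The pattern as applied in the diff below: free every sub-snapshot first, then wipe the container in one go instead of clearing members individually:

	/* after xe_guc_ct_snapshot_free(), xe_guc_exec_queue_snapshot_free(),
	 * xe_sched_job_snapshot_free() and the hwe loop have run: */
	memset(&coredump->snapshot, 0, sizeof(coredump->snapshot));
	coredump->captured = false;
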
Signed-off-by: Maarten Lankhorst Reviewed-by: Francois Dugast Link: https://patchwork.freedesktop.org/patch/msgid/20240221133024.898315-2-maarten.lankhorst@linux.intel.com --- drivers/gpu/drm/xe/xe_devcoredump.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c index 08d3f6cb72292b..ae26d8c6d01c5a 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump.c +++ b/drivers/gpu/drm/xe/xe_devcoredump.c @@ -124,6 +124,8 @@ static void xe_devcoredump_free(void *data) if (coredump->snapshot.hwe[i]) xe_hw_engine_snapshot_free(coredump->snapshot.hwe[i]); + /* To prevent stale data on next snapshot, clear everything */ + memset(&coredump->snapshot, 0, sizeof(coredump->snapshot)); coredump->captured = false; drm_info(&coredump_to_xe(coredump)->drm, "Xe device coredump has been deleted.\n"); From 76a86b58d2b3de31e88acb487ebfa0c3cc7c41d2 Mon Sep 17 00:00:00 2001 From: Maarten Lankhorst Date: Wed, 21 Feb 2024 14:30:18 +0100 Subject: [PATCH 193/200] drm/xe: Add uapi for dumpable bos MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add the flag XE_VM_BIND_FLAG_DUMPABLE to notify devcoredump that this mapping should be dumped. This is not hooked up, but the uapi should be ready before merging. It's likely easier to dump the contents of the bo's at devcoredump readout time, so it's better if the bos will stay unmodified after a hang. The NEEDS_CPU_MAPPING flag is removed as requirement. Signed-off-by: Maarten Lankhorst Reviewed-by: José Roberto de Souza Link: https://patchwork.freedesktop.org/patch/msgid/20240221133024.898315-3-maarten.lankhorst@linux.intel.com --- drivers/gpu/drm/xe/xe_vm.c | 3 ++- include/uapi/drm/xe_drm.h | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 23a44ef85aa4bd..08462f80f000ef 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -2722,7 +2722,8 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm, #define SUPPORTED_FLAGS \ (DRM_XE_VM_BIND_FLAG_READONLY | \ - DRM_XE_VM_BIND_FLAG_IMMEDIATE | DRM_XE_VM_BIND_FLAG_NULL) + DRM_XE_VM_BIND_FLAG_IMMEDIATE | DRM_XE_VM_BIND_FLAG_NULL | \ + DRM_XE_VM_BIND_FLAG_DUMPABLE) #define XE_64K_PAGE_MASK 0xffffull #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP) diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h index dce06de592d055..2fefec9c0e946c 100644 --- a/include/uapi/drm/xe_drm.h +++ b/include/uapi/drm/xe_drm.h @@ -961,6 +961,7 @@ struct drm_xe_vm_bind_op { #define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0) #define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1) #define DRM_XE_VM_BIND_FLAG_NULL (1 << 2) +#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3) /** @flags: Bind flags */ __u32 flags; From ffb7249df1998a623525648fca412e17a440a136 Mon Sep 17 00:00:00 2001 From: Maarten Lankhorst Date: Wed, 21 Feb 2024 14:30:19 +0100 Subject: [PATCH 194/200] drm/xe: Annotate each dumpable vma as such MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In preparation for snapshot dumping, mark each dumpable VMA as such, so we can walk over the VM later and dump it. 
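A later dump pass can then filter on the flag while walking the VM. Roughly, assuming the generic GPUVM iterator and with capture() as a placeholder for the real consumer:

	struct drm_gpuva *gpuva;

	drm_gpuvm_for_each_va(gpuva, &vm->gpuvm)
		if (gpuva->flags & XE_VMA_DUMPABLE)
			capture(gpuva);	/* placeholder consumer */

Such a walk is what the snap_mutex introduced in the next patch makes safe against concurrent inserts and removals.
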
Signed-off-by: Maarten Lankhorst Reviewed-by: José Roberto de Souza Link: https://patchwork.freedesktop.org/patch/msgid/20240221133024.898315-4-maarten.lankhorst@linux.intel.com --- drivers/gpu/drm/xe/xe_vm.c | 13 +++++++++++++ drivers/gpu/drm/xe/xe_vm_types.h | 3 +++ 2 files changed, 16 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 08462f80f000ef..df1e3841005d48 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -792,6 +792,7 @@ static void xe_vma_free(struct xe_vma *vma) #define VMA_CREATE_FLAG_READ_ONLY BIT(0) #define VMA_CREATE_FLAG_IS_NULL BIT(1) +#define VMA_CREATE_FLAG_DUMPABLE BIT(2) static struct xe_vma *xe_vma_create(struct xe_vm *vm, struct xe_bo *bo, @@ -804,6 +805,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm, u8 id; bool read_only = (flags & VMA_CREATE_FLAG_READ_ONLY); bool is_null = (flags & VMA_CREATE_FLAG_IS_NULL); + bool dumpable = (flags & VMA_CREATE_FLAG_DUMPABLE); xe_assert(vm->xe, start < end); xe_assert(vm->xe, end < vm->size); @@ -838,6 +840,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm, vma->gpuva.va.range = end - start + 1; if (read_only) vma->gpuva.flags |= XE_VMA_READ_ONLY; + if (dumpable) + vma->gpuva.flags |= XE_VMA_DUMPABLE; for_each_tile(tile, vm->xe, id) vma->tile_mask |= 0x1 << id; @@ -2122,6 +2126,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo, op->map.read_only = flags & DRM_XE_VM_BIND_FLAG_READONLY; op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL; + op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE; op->map.pat_index = pat_index; } else if (__op->op == DRM_GPUVA_OP_PREFETCH) { op->prefetch.region = prefetch_region; @@ -2317,6 +2322,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q, VMA_CREATE_FLAG_READ_ONLY : 0; flags |= op->map.is_null ? VMA_CREATE_FLAG_IS_NULL : 0; + flags |= op->map.dumpable ? + VMA_CREATE_FLAG_DUMPABLE : 0; vma = new_vma(vm, &op->base.map, op->map.pat_index, flags); @@ -2341,6 +2348,9 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q, flags |= op->base.remap.unmap->va->flags & DRM_GPUVA_SPARSE ? VMA_CREATE_FLAG_IS_NULL : 0; + flags |= op->base.remap.unmap->va->flags & + XE_VMA_DUMPABLE ? + VMA_CREATE_FLAG_DUMPABLE : 0; vma = new_vma(vm, op->base.remap.prev, old->pat_index, flags); @@ -2372,6 +2382,9 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q, flags |= op->base.remap.unmap->va->flags & DRM_GPUVA_SPARSE ? VMA_CREATE_FLAG_IS_NULL : 0; + flags |= op->base.remap.unmap->va->flags & + XE_VMA_DUMPABLE ? + VMA_CREATE_FLAG_DUMPABLE : 0; vma = new_vma(vm, op->base.remap.next, old->pat_index, flags); diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h index a603cc2eb56b3f..a975ac83eccae4 100644 --- a/drivers/gpu/drm/xe/xe_vm_types.h +++ b/drivers/gpu/drm/xe/xe_vm_types.h @@ -31,6 +31,7 @@ struct xe_vm; #define XE_VMA_PTE_1G (DRM_GPUVA_USERBITS << 7) #define XE_VMA_PTE_64K (DRM_GPUVA_USERBITS << 8) #define XE_VMA_PTE_COMPACT (DRM_GPUVA_USERBITS << 9) +#define XE_VMA_DUMPABLE (DRM_GPUVA_USERBITS << 10) /** struct xe_userptr - User pointer */ struct xe_userptr { @@ -294,6 +295,8 @@ struct xe_vma_op_map { bool read_only; /** @is_null: is NULL binding */ bool is_null; + /** @dumpable: whether BO is dumped on GPU hang */ + bool dumpable; /** @pat_index: The pat index to use for this operation. 

From 0cd99046ca0522d8d212eb9adb093063a5f333ae Mon Sep 17 00:00:00 2001
From: Maarten Lankhorst
Date: Wed, 21 Feb 2024 14:30:20 +0100
Subject: [PATCH 195/200] drm/xe: Add vm snapshot mutex for easily taking a vm
 snapshot during devcoredump
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The devcoredump is done in fence signaling context. Because of this, we
cannot take any of the normal mutexes or we would invert the locking
order:

Normal:      take vm->lock, then dma_fence_wait()
Devcoredump: from dma_fence_wait() context, take vm->lock

This doesn't work, and we only care about the integrity of the VMA
tree, so take the new mutex just around insertions into and removals
from the GPUVA tree.

Signed-off-by: Maarten Lankhorst
Reviewed-by: José Roberto de Souza
Link: https://patchwork.freedesktop.org/patch/msgid/20240221133024.898315-5-maarten.lankhorst@linux.intel.com
---
 drivers/gpu/drm/xe/xe_vm.c       | 8 ++++++++
 drivers/gpu/drm/xe/xe_vm_types.h | 5 +++++
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index df1e3841005d48..5c8cbaf0b8d985 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1055,7 +1055,9 @@ static int xe_vm_insert_vma(struct xe_vm *vm, struct xe_vma *vma)
 	xe_assert(vm->xe, xe_vma_vm(vma) == vm);
 	lockdep_assert_held(&vm->lock);
 
+	mutex_lock(&vm->snap_mutex);
 	err = drm_gpuva_insert(&vm->gpuvm, &vma->gpuva);
+	mutex_unlock(&vm->snap_mutex);
 	XE_WARN_ON(err);	/* Shouldn't be possible */
 
 	return err;
@@ -1066,7 +1068,9 @@ static void xe_vm_remove_vma(struct xe_vm *vm, struct xe_vma *vma)
 	xe_assert(vm->xe, xe_vma_vm(vma) == vm);
 	lockdep_assert_held(&vm->lock);
 
+	mutex_lock(&vm->snap_mutex);
 	drm_gpuva_remove(&vma->gpuva);
+	mutex_unlock(&vm->snap_mutex);
 	if (vm->usm.last_fault_vma == vma)
 		vm->usm.last_fault_vma = NULL;
 }
@@ -1293,6 +1297,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 	vm->flags = flags;
 
 	init_rwsem(&vm->lock);
+	mutex_init(&vm->snap_mutex);
 
 	INIT_LIST_HEAD(&vm->rebind_list);
 
@@ -1418,6 +1423,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 	return ERR_PTR(err);
 
 err_no_resv:
+	mutex_destroy(&vm->snap_mutex);
 	for_each_tile(tile, xe, id)
 		xe_range_fence_tree_fini(&vm->rftree[id]);
 	kfree(vm);
@@ -1517,6 +1523,8 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 
 	up_write(&vm->lock);
 
+	mutex_destroy(&vm->snap_mutex);
+
 	mutex_lock(&xe->usm.lock);
 	if (vm->flags & XE_VM_FLAG_FAULT_MODE)
 		xe->usm.num_vm_in_fault_mode--;
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index a975ac83eccae4..7d4f810f9c0465 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -160,6 +160,11 @@ struct xe_vm {
 	 * VM
 	 */
 	struct rw_semaphore lock;
+	/**
+	 * @snap_mutex: Mutex used to guard insertions and removals from gpuva,
+	 * so we can take a snapshot safely from devcoredump.
+	 */
+	struct mutex snap_mutex;
 
 	/**
 	 * @rebind_list: list of VMAs that need rebinding. Protected by the

From 0eb2a18a8fad629da8595bfc253d63d6bec71495 Mon Sep 17 00:00:00 2001
From: Maarten Lankhorst
Date: Wed, 21 Feb 2024 14:30:21 +0100
Subject: [PATCH 196/200] drm/xe: Implement VM snapshot support for BO's and
 userptr
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Since we cannot capture the BOs and userptrs immediately, perform the
capture in two stages: the immediate stage takes a reference to each BO
and userptr, while a delayed worker captures the contents and then
drops the references. The shape of this split is sketched below.
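
Schematically, the two stages divide the work like this (assumed names,
not the driver code; the real implementation is in the diff that
follows):

	/* Stage 1: runs in fence-signaling context, so it may only take
	 * references and do GFP_NOWAIT allocations — no blocking locks. */
	static void capture_now(struct snap *s, struct xe_bo *bo)
	{
		s->bo = xe_bo_get(bo);		/* just pin the object */
		queue_work(system_unbound_wq, &s->work);
	}

	/* Stage 2: runs in a worker, free to lock, allocate and copy. */
	static void capture_worker(struct work_struct *w)
	{
		struct snap *s = container_of(w, struct snap, work);

		s->data = kvmalloc(s->len, GFP_USER);
		/* ... vmap the BO, copy s->len bytes into s->data ... */
		xe_bo_put(s->bo);
	}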
This is required because in signaling context no locks can be taken, no
memory can be allocated, and no waits on userspace can be performed.
With the delayed worker, all of this can be done without resorting to
hacks.

Changes since v1:
- Fix crash on NULL captured vm.
- Use ascii85_encode to capture BO contents and save some space.
- Add length to coredump output for each captured area.
Changes since v2:
- Dump each mapping on its own line, to simplify tooling.
- Fix null pointer deref in xe_vm_snapshot_free.
Changes since v3:
- Don't add uninitialized value to snap->ofs. (Souza)
- Use kernel types for u32 and u64.
- Move snap_mutex destruction to final vm destruction. (Souza)
Changes since v4:
- Remove extra memset. (Souza)

Signed-off-by: Maarten Lankhorst
Reviewed-by: José Roberto de Souza
Link: https://patchwork.freedesktop.org/patch/msgid/20240221133024.898315-6-maarten.lankhorst@linux.intel.com
---
 drivers/gpu/drm/xe/xe_devcoredump.c       |  32 +++-
 drivers/gpu/drm/xe/xe_devcoredump_types.h |   8 +
 drivers/gpu/drm/xe/xe_vm.c                | 170 +++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_vm.h                |   5 +
 4 files changed, 211 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
index ae26d8c6d01c5a..68d3d623a05bfa 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.c
+++ b/drivers/gpu/drm/xe/xe_devcoredump.c
@@ -17,6 +17,7 @@
 #include "xe_guc_submit.h"
 #include "xe_hw_engine.h"
 #include "xe_sched_job.h"
+#include "xe_vm.h"
 
 /**
  * DOC: Xe device coredump
@@ -59,12 +60,22 @@ static struct xe_guc *exec_queue_to_guc(struct xe_exec_queue *q)
 	return &q->gt->uc.guc;
 }
 
+static void xe_devcoredump_deferred_snap_work(struct work_struct *work)
+{
+	struct xe_devcoredump_snapshot *ss = container_of(work, typeof(*ss), work);
+
+	xe_force_wake_get(gt_to_fw(ss->gt), XE_FORCEWAKE_ALL);
+	if (ss->vm)
+		xe_vm_snapshot_capture_delayed(ss->vm);
+	xe_force_wake_put(gt_to_fw(ss->gt), XE_FORCEWAKE_ALL);
+}
+
 static ssize_t xe_devcoredump_read(char *buffer, loff_t offset,
 				   size_t count, void *data, size_t datalen)
 {
 	struct xe_devcoredump *coredump = data;
 	struct xe_device *xe = coredump_to_xe(coredump);
-	struct xe_devcoredump_snapshot *ss;
+	struct xe_devcoredump_snapshot *ss = &coredump->snapshot;
 	struct drm_printer p;
 	struct drm_print_iterator iter;
 	struct timespec64 ts;
@@ -74,12 +85,14 @@ static ssize_t xe_devcoredump_read(char *buffer, loff_t offset,
 	if (!data || !coredump_to_xe(coredump))
 		return -ENODEV;
 
+	/* Ensure delayed work is captured before continuing */
+	flush_work(&ss->work);
+
 	iter.data = buffer;
 	iter.offset = 0;
 	iter.start = offset;
 	iter.remain = count;
 
-	ss = &coredump->snapshot;
 	p = drm_coredump_printer(&iter);
 
 	drm_printf(&p, "**** Xe Device Coredump ****\n");
@@ -104,6 +117,10 @@ static ssize_t xe_devcoredump_read(char *buffer, loff_t offset,
 		if (coredump->snapshot.hwe[i])
 			xe_hw_engine_snapshot_print(coredump->snapshot.hwe[i],
 						    &p);
+	if (coredump->snapshot.vm) {
+		drm_printf(&p, "\n**** VM state ****\n");
+		xe_vm_snapshot_print(coredump->snapshot.vm, &p);
+	}
 
 	return count - iter.remain;
 }
@@ -117,12 +134,15 @@ static void xe_devcoredump_free(void *data)
 	if (!data || !coredump_to_xe(coredump))
 		return;
 
+	cancel_work_sync(&coredump->snapshot.work);
+
 	xe_guc_ct_snapshot_free(coredump->snapshot.ct);
 	xe_guc_exec_queue_snapshot_free(coredump->snapshot.ge);
 	xe_sched_job_snapshot_free(coredump->snapshot.job);
 	for (i = 0; i < XE_NUM_HW_ENGINES; i++)
 		if (coredump->snapshot.hwe[i])
 			xe_hw_engine_snapshot_free(coredump->snapshot.hwe[i]);
+	xe_vm_snapshot_free(coredump->snapshot.vm);
 
 	/* To prevent stale data on next snapshot, clear everything */
 	memset(&coredump->snapshot, 0, sizeof(coredump->snapshot));
@@ -147,6 +167,9 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump,
 	ss->snapshot_time = ktime_get_real();
 	ss->boot_time = ktime_get_boottime();
 
+	ss->gt = q->gt;
+	INIT_WORK(&ss->work, xe_devcoredump_deferred_snap_work);
+
 	cookie = dma_fence_begin_signalling();
 	for (i = 0; q->width > 1 && i < XE_HW_ENGINE_MAX_INSTANCE;) {
 		if (adj_logical_mask & BIT(i)) {
@@ -162,6 +185,7 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump,
 	coredump->snapshot.ct = xe_guc_ct_snapshot_capture(&guc->ct, true);
 	coredump->snapshot.ge = xe_guc_exec_queue_snapshot_capture(job);
 	coredump->snapshot.job = xe_sched_job_snapshot_capture(job);
+	coredump->snapshot.vm = xe_vm_snapshot_capture(q->vm);
 
 	for_each_hw_engine(hwe, q->gt, id) {
 		if (hwe->class != q->hwe->class ||
@@ -172,6 +196,9 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump,
 		coredump->snapshot.hwe[id] = xe_hw_engine_snapshot_capture(hwe);
 	}
 
+	if (ss->vm)
+		queue_work(system_unbound_wq, &ss->work);
+
 	xe_force_wake_put(gt_to_fw(q->gt), XE_FORCEWAKE_ALL);
 	dma_fence_end_signalling(cookie);
 }
@@ -205,3 +232,4 @@ void xe_devcoredump(struct xe_sched_job *job)
 			xe_devcoredump_read, xe_devcoredump_free);
 }
 #endif
+
diff --git a/drivers/gpu/drm/xe/xe_devcoredump_types.h b/drivers/gpu/drm/xe/xe_devcoredump_types.h
index d259119b2c9802..6f654b63c7f1cb 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump_types.h
+++ b/drivers/gpu/drm/xe/xe_devcoredump_types.h
@@ -12,6 +12,7 @@
 #include "xe_hw_engine_types.h"
 
 struct xe_device;
+struct xe_gt;
 
 /**
  * struct xe_devcoredump_snapshot - Crash snapshot
@@ -26,6 +27,11 @@ struct xe_devcoredump_snapshot {
 	/** @boot_time: Relative boot time so the uptime can be calculated. */
 	ktime_t boot_time;
 
+	/** @gt: Affected GT, used by forcewake for delayed capture */
+	struct xe_gt *gt;
+	/** @work: Workqueue for deferred capture outside of signaling context */
+	struct work_struct work;
+
 	/* GuC snapshots */
 	/** @ct: GuC CT snapshot */
 	struct xe_guc_ct_snapshot *ct;
@@ -36,6 +42,8 @@ struct xe_devcoredump_snapshot {
 	struct xe_hw_engine_snapshot *hwe[XE_NUM_HW_ENGINES];
 	/** @job: Snapshot of job state */
 	struct xe_sched_job_snapshot *job;
+	/** @vm: Snapshot of VM state */
+	struct xe_vm_snapshot *vm;
 };
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 5c8cbaf0b8d985..5c1276d887c162 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include <linux/ascii85.h>
 #include
 #include
 #include
@@ -1523,8 +1524,6 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 
 	up_write(&vm->lock);
 
-	mutex_destroy(&vm->snap_mutex);
-
 	mutex_lock(&xe->usm.lock);
 	if (vm->flags & XE_VM_FLAG_FAULT_MODE)
 		xe->usm.num_vm_in_fault_mode--;
@@ -1550,6 +1549,8 @@ static void vm_destroy_work_func(struct work_struct *w)
 	/* xe_vm_close_and_put was not called? */
 	xe_assert(xe, !vm->size);
 
+	mutex_destroy(&vm->snap_mutex);
+
 	if (!(vm->flags & XE_VM_FLAG_MIGRATION)) {
 		xe_device_mem_access_put(xe);
@@ -3269,3 +3270,168 @@ int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id)
 
 	return 0;
 }
+
+struct xe_vm_snapshot {
+	unsigned long num_snaps;
+	struct {
+		u64 ofs, bo_ofs;
+		unsigned long len;
+		struct xe_bo *bo;
+		void *data;
+		struct mm_struct *mm;
+	} snap[];
+};
+
+struct xe_vm_snapshot *xe_vm_snapshot_capture(struct xe_vm *vm)
+{
+	unsigned long num_snaps = 0, i;
+	struct xe_vm_snapshot *snap = NULL;
+	struct drm_gpuva *gpuva;
+
+	if (!vm)
+		return NULL;
+
+	mutex_lock(&vm->snap_mutex);
+	drm_gpuvm_for_each_va(gpuva, &vm->gpuvm) {
+		if (gpuva->flags & XE_VMA_DUMPABLE)
+			num_snaps++;
+	}
+
+	if (num_snaps)
+		snap = kvzalloc(offsetof(struct xe_vm_snapshot, snap[num_snaps]), GFP_NOWAIT);
+	if (!snap)
+		goto out_unlock;
+
+	snap->num_snaps = num_snaps;
+	i = 0;
+	drm_gpuvm_for_each_va(gpuva, &vm->gpuvm) {
+		struct xe_vma *vma = gpuva_to_vma(gpuva);
+		struct xe_bo *bo = vma->gpuva.gem.obj ?
+			gem_to_xe_bo(vma->gpuva.gem.obj) : NULL;
+
+		if (!(gpuva->flags & XE_VMA_DUMPABLE))
+			continue;
+
+		snap->snap[i].ofs = xe_vma_start(vma);
+		snap->snap[i].len = xe_vma_size(vma);
+		if (bo) {
+			snap->snap[i].bo = xe_bo_get(bo);
+			snap->snap[i].bo_ofs = xe_vma_bo_offset(vma);
+		} else if (xe_vma_is_userptr(vma)) {
+			struct mm_struct *mm =
+				to_userptr_vma(vma)->userptr.notifier.mm;
+
+			if (mmget_not_zero(mm))
+				snap->snap[i].mm = mm;
+			else
+				snap->snap[i].data = ERR_PTR(-EFAULT);
+
+			snap->snap[i].bo_ofs = xe_vma_userptr(vma);
+		} else {
+			snap->snap[i].data = ERR_PTR(-ENOENT);
+		}
+		i++;
+	}
+
+out_unlock:
+	mutex_unlock(&vm->snap_mutex);
+	return snap;
+}
+
+void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap)
+{
+	for (int i = 0; i < snap->num_snaps; i++) {
+		struct xe_bo *bo = snap->snap[i].bo;
+		struct iosys_map src;
+		int err;
+
+		if (IS_ERR(snap->snap[i].data))
+			continue;
+
+		snap->snap[i].data = kvmalloc(snap->snap[i].len, GFP_USER);
+		if (!snap->snap[i].data) {
+			snap->snap[i].data = ERR_PTR(-ENOMEM);
+			goto cleanup_bo;
+		}
+
+		if (bo) {
+			dma_resv_lock(bo->ttm.base.resv, NULL);
+			err = ttm_bo_vmap(&bo->ttm, &src);
+			if (!err) {
+				xe_map_memcpy_from(xe_bo_device(bo),
+						   snap->snap[i].data,
+						   &src, snap->snap[i].bo_ofs,
+						   snap->snap[i].len);
+				ttm_bo_vunmap(&bo->ttm, &src);
+			}
+			dma_resv_unlock(bo->ttm.base.resv);
+		} else {
+			void __user *userptr = (void __user *)(size_t)snap->snap[i].bo_ofs;
+
+			kthread_use_mm(snap->snap[i].mm);
+			if (!copy_from_user(snap->snap[i].data, userptr, snap->snap[i].len))
+				err = 0;
+			else
+				err = -EFAULT;
+			kthread_unuse_mm(snap->snap[i].mm);
+
+			mmput(snap->snap[i].mm);
+			snap->snap[i].mm = NULL;
+		}
+
+		if (err) {
+			kvfree(snap->snap[i].data);
+			snap->snap[i].data = ERR_PTR(err);
+		}
+
+cleanup_bo:
+		xe_bo_put(bo);
+		snap->snap[i].bo = NULL;
+	}
+}
+
+void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p)
+{
+	unsigned long i, j;
+
+	for (i = 0; i < snap->num_snaps; i++) {
+		if (IS_ERR(snap->snap[i].data))
+			goto uncaptured;
+
+		drm_printf(p, "[%llx].length: 0x%lx\n", snap->snap[i].ofs, snap->snap[i].len);
+		drm_printf(p, "[%llx].data: ",
+			   snap->snap[i].ofs);
+
+		for (j = 0; j < snap->snap[i].len; j += sizeof(u32)) {
+			u32 *val = snap->snap[i].data + j;
+			char dumped[ASCII85_BUFSZ];
+
+			drm_puts(p, ascii85_encode(*val, dumped));
+		}
+
+		drm_puts(p, "\n");
+		continue;
+
+uncaptured:
+		drm_printf(p, "Unable to capture range [%llx-%llx]: %li\n",
+			   snap->snap[i].ofs,
+			   snap->snap[i].ofs + snap->snap[i].len - 1,
+			   PTR_ERR(snap->snap[i].data));
+	}
+}
+
+void xe_vm_snapshot_free(struct xe_vm_snapshot *snap)
+{
+	unsigned long i;
+
+	if (!snap)
+		return;
+
+	for (i = 0; i < snap->num_snaps; i++) {
+		if (!IS_ERR(snap->snap[i].data))
+			kvfree(snap->snap[i].data);
+		xe_bo_put(snap->snap[i].bo);
+		if (snap->snap[i].mm)
+			mmput(snap->snap[i].mm);
+	}
+	kvfree(snap);
+}
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index df4a82e960ff0c..6df1f1c7f85d98 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -271,3 +271,8 @@ static inline void vm_dbg(const struct drm_device *dev,
 { /* noop */ }
 #endif
 #endif
+
+struct xe_vm_snapshot *xe_vm_snapshot_capture(struct xe_vm *vm);
+void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
+void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p);
+void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);

From de74079f00897b88879fa54476320928c5605774 Mon Sep 17 00:00:00 2001
From: Matthew Brost
Date: Tue, 20 Feb 2024 19:27:43 -0800
Subject: [PATCH 197/200] drm/xe: Add debug prints for skipping rebinds

Will help debug issues with VM binds.

Signed-off-by: Matthew Brost
Reviewed-by: Rodrigo Vivi
Link: https://patchwork.freedesktop.org/patch/msgid/20240221032743.3698849-1-matthew.brost@intel.com
---
 drivers/gpu/drm/xe/xe_vm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 5c1276d887c162..394f516f20df1d 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2301,6 +2301,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 				   struct xe_sync_entry *syncs, u32 num_syncs,
 				   struct list_head *ops_list, bool last)
 {
+	struct xe_device *xe = vm->xe;
 	struct xe_vma_op *last_op = NULL;
 	struct drm_gpuva_op *__op;
 	int err = 0;
@@ -2381,6 +2382,9 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 						xe_vma_end(vma) -
 						xe_vma_start(old);
 					op->remap.start = xe_vma_end(vma);
+					vm_dbg(&xe->drm, "REMAP:SKIP_PREV: addr=0x%016llx, range=0x%016llx",
+					       (ULL)op->remap.start,
+					       (ULL)op->remap.range);
 				}
 			}
 
@@ -2414,6 +2418,9 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 					op->remap.range -=
 						xe_vma_end(old) -
 						xe_vma_start(vma);
+					vm_dbg(&xe->drm, "REMAP:SKIP_NEXT: addr=0x%016llx, range=0x%016llx",
+					       (ULL)op->remap.start,
+					       (ULL)op->remap.range);
 				}
 			}
 			break;

From a24d9099777d9c314c984b94653407710c2358b4 Mon Sep 17 00:00:00 2001
From: Dafna Hirschfeld
Date: Wed, 21 Feb 2024 10:36:22 +0200
Subject: [PATCH 198/200] drm/xe: Do not include current dir for
 generated/xe_wa_oob.h

The generated file 'generated/xe_wa_oob.h' is included using:

  #include "generated/xe_wa_oob.h"

which first looks in the source directory. But the file resides in the
build directory and should therefore be included using:

  #include <generated/xe_wa_oob.h>
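
As background (an illustration added here, not part of the patch): a
quoted include is searched relative to the directory of the including
file before the compiler's -I paths, while an angle-bracket include
consults only the -I paths — which, in the kernel build, is where the
generated headers in the (possibly separate) build directory live:

	/* assuming -I$(srctree)/... -I$(objtree)/... on the command line */
	#include "generated/xe_wa_oob.h"  /* tries the source dir first  */
	#include <generated/xe_wa_oob.h>  /* searches only the -I paths  */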
Signed-off-by: Dafna Hirschfeld
Reviewed-by: Lucas De Marchi
Signed-off-by: Lucas De Marchi
Link: https://patchwork.freedesktop.org/patch/msgid/20240221083622.1584492-1-dhirschfeld@habana.ai
---
 drivers/gpu/drm/xe/xe_gsc.c            | 3 ++-
 drivers/gpu/drm/xe/xe_guc.c            | 3 ++-
 drivers/gpu/drm/xe/xe_migrate.c        | 3 ++-
 drivers/gpu/drm/xe/xe_ring_ops.c       | 3 ++-
 drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c | 3 ++-
 drivers/gpu/drm/xe/xe_vm.c             | 3 ++-
 drivers/gpu/drm/xe/xe_wa.c             | 3 ++-
 7 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c
index 0b90fd9ef63a44..a61994292c4356 100644
--- a/drivers/gpu/drm/xe/xe_gsc.c
+++ b/drivers/gpu/drm/xe/xe_gsc.c
@@ -7,8 +7,9 @@
 
 #include
 
+#include <generated/xe_wa_oob.h>
+
 #include "abi/gsc_mkhi_commands_abi.h"
-#include "generated/xe_wa_oob.h"
 #include "xe_bb.h"
 #include "xe_bo.h"
 #include "xe_device.h"
diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
index 68089556c14096..0d2a2dd13f1121 100644
--- a/drivers/gpu/drm/xe/xe_guc.c
+++ b/drivers/gpu/drm/xe/xe_guc.c
@@ -7,9 +7,10 @@
 
 #include
 
+#include <generated/xe_wa_oob.h>
+
 #include "abi/guc_actions_abi.h"
 #include "abi/guc_errors_abi.h"
-#include "generated/xe_wa_oob.h"
 #include "regs/xe_gt_regs.h"
 #include "regs/xe_guc_regs.h"
 #include "xe_bo.h"
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 3d2438dc86eeef..a66fdf2d299122 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -12,7 +12,8 @@
 #include
 #include
 
-#include "generated/xe_wa_oob.h"
+#include <generated/xe_wa_oob.h>
+
 #include "instructions/xe_mi_commands.h"
 #include "regs/xe_gpu_commands.h"
 #include "tests/xe_test.h"
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index d5e9621428ef38..c4edffcd4a3206 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -5,7 +5,8 @@
 
 #include "xe_ring_ops.h"
 
-#include "generated/xe_wa_oob.h"
+#include <generated/xe_wa_oob.h>
+
 #include "instructions/xe_mi_commands.h"
 #include "regs/xe_engine_regs.h"
 #include "regs/xe_gpu_commands.h"
diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
index 662f1e9bfc65df..3107d2a12426ca 100644
--- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
@@ -11,7 +11,8 @@
 #include
 #include
 
-#include "generated/xe_wa_oob.h"
+#include <generated/xe_wa_oob.h>
+
 #include "regs/xe_gt_regs.h"
 #include "regs/xe_regs.h"
 #include "xe_bo.h"
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 394f516f20df1d..e3bde897f6e8aa 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -19,6 +19,8 @@
 #include
 #include
 
+#include <generated/xe_wa_oob.h>
+
 #include "xe_assert.h"
 #include "xe_bo.h"
 #include "xe_device.h"
@@ -35,7 +37,6 @@
 #include "xe_res_cursor.h"
 #include "xe_sync.h"
 #include "xe_trace.h"
-#include "generated/xe_wa_oob.h"
 #include "xe_wa.h"
 
diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c
index 3299130ba10ae0..a0264eedd443e2 100644
--- a/drivers/gpu/drm/xe/xe_wa.c
+++ b/drivers/gpu/drm/xe/xe_wa.c
@@ -9,7 +9,8 @@
 #include
 #include
 
-#include "generated/xe_wa_oob.h"
+#include <generated/xe_wa_oob.h>
+
 #include "regs/xe_engine_regs.h"
 #include "regs/xe_gt_regs.h"
 #include "regs/xe_regs.h"

From 7a975748d4dc0a524c99a390c6f74b7097ef8cf7 Mon Sep 17 00:00:00 2001
From: Lucas De Marchi
Date: Thu, 22 Feb 2024 06:41:24 -0800
Subject: [PATCH 199/200] drm/xe: Use pointers in trace events
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Commit a0df2cc858c3 ("drm/xe/xe_bo_move: Enhance xe_bo_move trace")
inadvertently reverted commit 8d038f49c1f3 ("drm/xe: Fix cast on trace
variable"), breaking the build on 32-bit.

As noted by Ville, there's no point in converting the pointers to u64
and adding casts everywhere. In fact, it's better to just use %p and
let the address be hashed. Convert all the cases in xe_trace.h to use
pointers.

Cc: Ville Syrjälä
Cc: Matt Roper
Cc: Priyanka Dandamudi
Cc: Oak Zeng
Cc: Thomas Hellström
Signed-off-by: Lucas De Marchi
Reviewed-by: Thomas Hellström
Link: https://patchwork.freedesktop.org/patch/msgid/20240222144125.2862546-1-lucas.demarchi@intel.com
---
 drivers/gpu/drm/xe/xe_trace.h | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index 0cce98a6b14b78..3b97633d81d850 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -27,16 +27,16 @@ DECLARE_EVENT_CLASS(xe_gt_tlb_invalidation_fence,
 		    TP_ARGS(fence),
 
 		    TP_STRUCT__entry(
-			     __field(u64, fence)
+			     __field(struct xe_gt_tlb_invalidation_fence *, fence)
 			     __field(int, seqno)
 			     ),
 
 		    TP_fast_assign(
-			   __entry->fence = (u64)fence;
+			   __entry->fence = fence;
			   __entry->seqno = fence->seqno;
			   ),
 
-		    TP_printk("fence=0x%016llx, seqno=%d",
+		    TP_printk("fence=%p, seqno=%d",
			      __entry->fence, __entry->seqno)
 );
 
@@ -83,16 +83,16 @@ DECLARE_EVENT_CLASS(xe_bo,
 		    TP_STRUCT__entry(
			     __field(size_t, size)
			     __field(u32, flags)
-			     __field(u64, vm)
+			     __field(struct xe_vm *, vm)
			     ),
 
		    TP_fast_assign(
			   __entry->size = bo->size;
			   __entry->flags = bo->flags;
-			   __entry->vm = (unsigned long)bo->vm;
+			   __entry->vm = bo->vm;
			   ),
 
-		    TP_printk("size=%zu, flags=0x%02x, vm=0x%016llx",
+		    TP_printk("size=%zu, flags=0x%02x, vm=%p",
			      __entry->size, __entry->flags, __entry->vm)
 );
 
@@ -346,16 +346,16 @@ DECLARE_EVENT_CLASS(xe_hw_fence,
 		    TP_STRUCT__entry(
			     __field(u64, ctx)
			     __field(u32, seqno)
-			     __field(u64, fence)
+			     __field(struct xe_hw_fence *, fence)
			     ),
 
		    TP_fast_assign(
			   __entry->ctx = fence->dma.context;
			   __entry->seqno = fence->dma.seqno;
-			   __entry->fence = (unsigned long)fence;
+			   __entry->fence = fence;
			   ),
 
-		    TP_printk("ctx=0x%016llx, fence=0x%016llx, seqno=%u",
+		    TP_printk("ctx=0x%016llx, fence=%p, seqno=%u",
			      __entry->ctx, __entry->fence, __entry->seqno)
 );
 
@@ -384,7 +384,7 @@ DECLARE_EVENT_CLASS(xe_vma,
 		    TP_ARGS(vma),
 
 		    TP_STRUCT__entry(
-			     __field(u64, vma)
+			     __field(struct xe_vma *, vma)
			     __field(u32, asid)
			     __field(u64, start)
			     __field(u64, end)
@@ -392,14 +392,14 @@ DECLARE_EVENT_CLASS(xe_vma,
			     ),
 
		    TP_fast_assign(
-			   __entry->vma = (unsigned long)vma;
+			   __entry->vma = vma;
			   __entry->asid = xe_vma_vm(vma)->usm.asid;
			   __entry->start = xe_vma_start(vma);
			   __entry->end = xe_vma_end(vma) - 1;
			   __entry->ptr = xe_vma_userptr(vma);
			   ),
 
-		    TP_printk("vma=0x%016llx, asid=0x%05x, start=0x%012llx, end=0x%012llx, ptr=0x%012llx,",
+		    TP_printk("vma=%p, asid=0x%05x, start=0x%012llx, end=0x%012llx, userptr=0x%012llx,",
			      __entry->vma, __entry->asid, __entry->start,
			      __entry->end, __entry->ptr)
 )
@@ -484,16 +484,16 @@ DECLARE_EVENT_CLASS(xe_vm,
 		    TP_ARGS(vm),
 
 		    TP_STRUCT__entry(
-			     __field(u64, vm)
+			     __field(struct xe_vm *, vm)
			     __field(u32, asid)
			     ),
 
		    TP_fast_assign(
-			   __entry->vm = (unsigned long)vm;
+			   __entry->vm = vm;
			   __entry->asid = vm->usm.asid;
			   ),
 
-		    TP_printk("vm=0x%016llx, asid=0x%05x", __entry->vm,
+		    TP_printk("vm=%p, asid=0x%05x", __entry->vm,
			      __entry->asid)
 );
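
A quick illustration of the %p behavior relied on above (example code,
not from the patch): plain %p prints a per-boot hashed value so trace
output does not leak kernel addresses, while %px (discouraged outside
debugging) prints the raw pointer:

	/* Hedged example: hashed vs. raw pointer printing. */
	pr_info("fence=%p\n", fence);	/* hashed, safe for traces/logs */
	pr_info("fence=%px\n", fence);	/* raw address; leaks layout, avoid */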

From a7a3d73686f5837916ebffda77afa4343754e7dc Mon Sep 17 00:00:00 2001
From: Erick Archer
Date: Sat, 10 Feb 2024 15:19:12 +0100
Subject: [PATCH 200/200] drm/xe: Prefer struct_size over open coded arithmetic

This is an effort to get rid of all multiplications from allocation
functions in order to prevent integer overflows [1].

As the "q" variable is a pointer to "struct xe_exec_queue" and this
structure ends in a flexible array:

struct xe_exec_queue {
	[...]
	struct xe_lrc lrc[];
};

the preferred way in the kernel is to use the struct_size() helper to
do the arithmetic instead of the argument "size + size * count" in the
kzalloc() function [2]. This way, the code is more readable and safer.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1]
Link: https://github.com/KSPP/linux/issues/160 [2]
Signed-off-by: Erick Archer
Signed-off-by: Lucas De Marchi
Link: https://patchwork.freedesktop.org/patch/msgid/20240210141913.6611-1-erick.archer@gmx.com
---
 drivers/gpu/drm/xe/xe_exec_queue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index dda90f0c989031..4bb8f897bf1502 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -46,7 +46,7 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
 	/* only kernel queues can be permanent */
 	XE_WARN_ON((flags & EXEC_QUEUE_FLAG_PERMANENT) && !(flags & EXEC_QUEUE_FLAG_KERNEL));
 
-	q = kzalloc(sizeof(*q) + sizeof(struct xe_lrc) * width, GFP_KERNEL);
+	q = kzalloc(struct_size(q, lrc, width), GFP_KERNEL);
 	if (!q)
 		return ERR_PTR(-ENOMEM);
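
To illustrate the class of bug this guards against (a generic sketch,
not xe code): with open-coded arithmetic a huge element count can wrap
the size calculation around, yielding a small allocation that is then
overrun, whereas struct_size() saturates at SIZE_MAX so the allocation
simply fails:

	/* Hypothetical example of the overflow struct_size() prevents. */
	struct item { int hdr; short elems[]; };
	struct item *p = NULL;

	size_t n = SIZE_MAX / sizeof(short) + 1;		/* hostile count   */
	size_t bad = sizeof(*p) + sizeof(short) * n;		/* wraps to ~tiny  */
	p = kzalloc(struct_size(p, elems, n), GFP_KERNEL);	/* returns NULL:   */
								/* size saturates  */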