Revert "drm/sched: Use parent fence instead of finished"
This reverts commit e4dc45b.

This is causing instability on Linus' desktop, and I'm seeing
oopses with VK CTS runs.

netconsole got me the following oops:
[ 1234.778760] BUG: kernel NULL pointer dereference, address: 0000000000000088
[ 1234.778782] #PF: supervisor read access in kernel mode
[ 1234.778787] #PF: error_code(0x0000) - not-present page
[ 1234.778791] PGD 0 P4D 0
[ 1234.778798] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 1234.778803] CPU: 7 PID: 805 Comm: systemd-journal Not tainted 6.0.0+ #2
[ 1234.778809] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 5603 07/28/2020
[ 1234.778813] RIP: 0010:drm_sched_job_done.isra.0+0xc/0x140 [gpu_sched]
[ 1234.778828] Code: aa 0f 1d ce e9 57 ff ff ff 48 89 d7 e8 9d 8f 3f ce e9 4a ff ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 53 48 89 fb <48> 8b af 88 00 00 00 f0 ff 8d f0 00 00 00 48 8b 85 80 01 00 00 f0
[ 1234.778834] RSP: 0000:ffffabe680380de0 EFLAGS: 00010087
[ 1234.778839] RAX: ffffffffc04e9230 RBX: 0000000000000000 RCX: 0000000000000018
[ 1234.778897] RDX: 00000ba278e8977a RSI: ffff953fb288b460 RDI: 0000000000000000
[ 1234.778901] RBP: ffff953fb288b598 R08: 00000000000000e0 R09: ffff953fbd98b808
[ 1234.778905] R10: 0000000000000000 R11: ffffabe680380ff8 R12: ffffabe680380e00
[ 1234.778908] R13: 0000000000000001 R14: 00000000ffffffff R15: ffff953fbd9ec458
[ 1234.778912] FS:  00007f35e7008580(0000) GS:ffff95428ebc0000(0000) knlGS:0000000000000000
[ 1234.778916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1234.778919] CR2: 0000000000000088 CR3: 000000010147c000 CR4: 00000000003506e0
[ 1234.778924] Call Trace:
[ 1234.778981]  <IRQ>
[ 1234.778989]  dma_fence_signal_timestamp_locked+0x6a/0xe0
[ 1234.778999]  dma_fence_signal+0x2c/0x50
[ 1234.779005]  amdgpu_fence_process+0xc8/0x140 [amdgpu]
[ 1234.779234]  sdma_v3_0_process_trap_irq+0x70/0x80 [amdgpu]
[ 1234.779395]  amdgpu_irq_dispatch+0xa9/0x1d0 [amdgpu]
[ 1234.779609]  amdgpu_ih_process+0x80/0x100 [amdgpu]
[ 1234.779783]  amdgpu_irq_handler+0x1f/0x60 [amdgpu]
[ 1234.779940]  __handle_irq_event_percpu+0x46/0x190
[ 1234.779946]  handle_irq_event+0x34/0x70
[ 1234.779949]  handle_edge_irq+0x9f/0x240
[ 1234.779954]  __common_interrupt+0x66/0x100
[ 1234.779960]  common_interrupt+0xa0/0xc0
[ 1234.779965]  </IRQ>
[ 1234.779968]  <TASK>
[ 1234.779971]  asm_common_interrupt+0x22/0x40
[ 1234.779976] RIP: 0010:finish_mkwrite_fault+0x22/0x110
[ 1234.779981] Code: 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 41 55 41 54 55 48 89 fd 53 48 8b 07 f6 40 50 08 0f 84 eb 00 00 00 48 8b 45 30 48 8b 18 <48> 89 df e8 66 bd ff ff 48 85 c0 74 0d 48 89 c2 83 e2 01 48 83 ea
[ 1234.779985] RSP: 0000:ffffabe680bcfd78 EFLAGS: 00000202

Revert it for now and figure it out later.

Signed-off-by: Dave Airlie <[email protected]>
airlied committed Oct 7, 2022
1 parent 6589868 commit bafaf67
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions drivers/gpu/drm/scheduler/sched_main.c
@@ -829,7 +829,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 	job = list_first_entry_or_null(&sched->pending_list,
 				       struct drm_sched_job, list);
 
-	if (job && dma_fence_is_signaled(job->s_fence->parent)) {
+	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
 		/* remove job from pending_list */
 		list_del_init(&job->list);
 
@@ -841,7 +841,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 
 		if (next) {
 			next->s_fence->scheduled.timestamp =
-				job->s_fence->parent->timestamp;
+				job->s_fence->finished.timestamp;
 			/* start TO timer for next job */
 			drm_sched_start_timeout(sched);
 		}
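
The two expressions swapped by this revert differ in kind: in the scheduler fence, `finished` is an embedded fence (so `&...->finished` is always a valid address), while `parent` is a pointer member that may be unset. The following is a minimal user-space sketch of that distinction only. The `mock_*` types and both helper functions are hypothetical stand-ins, not the kernel definitions, and the sketch does not claim to reproduce the cause of the oops above, which the commit message leaves open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for illustration; NOT kernel types. */
struct mock_fence {
	bool signaled;
	int64_t timestamp;
};

struct mock_sched_fence {
	struct mock_fence scheduled;	/* embedded fence */
	struct mock_fence finished;	/* embedded fence: its address is never NULL */
	struct mock_fence *parent;	/* pointer member: may be NULL */
};

/* Shape of the check restored by the revert: reads the embedded fence. */
static bool cleanup_ready_finished(const struct mock_sched_fence *s)
{
	return s->finished.signaled;
}

/* A check through the pointer member needs an explicit NULL guard. */
static bool cleanup_ready_parent(const struct mock_sched_fence *s)
{
	return s->parent != NULL && s->parent->signaled;
}
```

In this toy model, a scheduler fence whose `parent` has not been set can still be queried safely through `finished`, whereas any code path that reads through `parent` has to guard against NULL first.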