You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
commit da67179 upstream.
At the moment - the memory allocation for fwsec-sb is created as-needed and
is released after being used. Typically this is at some point well after
driver load, which can cause runtime suspend/resume to initially work on
driver load but then later fail on a machine that has been running for long
enough with sufficiently high enough memory pressure:
kworker/7:1: page allocation failure: order:5, mode:0xcc0(GFP_KERNEL),
nodemask=(null),cpuset=/,mems_allowed=0
CPU: 7 UID: 0 PID: 875159 Comm: kworker/7:1 Not tainted
6.17.8-300.fc43.x86_64 #1 PREEMPT(lazy)
Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024
Workqueue: pm pm_runtime_work
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
warn_alloc+0x163/0x190
? __alloc_pages_direct_compact+0x1b3/0x220
__alloc_pages_slowpath.constprop.0+0x57a/0xb10
__alloc_frozen_pages_noprof+0x334/0x350
__alloc_pages_noprof+0xe/0x20
__dma_direct_alloc_pages.isra.0+0x1eb/0x330
dma_direct_alloc_pages+0x3c/0x190
dma_alloc_pages+0x29/0x130
nvkm_firmware_ctor+0x1ae/0x280 [nouveau]
nvkm_falcon_fw_ctor+0x3e/0x60 [nouveau]
nvkm_gsp_fwsec+0x10e/0x2c0 [nouveau]
? sysvec_apic_timer_interrupt+0xe/0x90
nvkm_gsp_fwsec_sb+0x27/0x70 [nouveau]
tu102_gsp_fini+0x65/0x110 [nouveau]
? ktime_get+0x3c/0xf0
nvkm_subdev_fini+0x67/0xc0 [nouveau]
nvkm_device_fini+0x94/0x140 [nouveau]
nvkm_udevice_fini+0x50/0x70 [nouveau]
nvkm_object_fini+0xb1/0x140 [nouveau]
nvkm_object_fini+0x70/0x140 [nouveau]
? __pfx_pci_pm_runtime_suspend+0x10/0x10
nouveau_do_suspend+0xe4/0x170 [nouveau]
nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau]
pci_pm_runtime_suspend+0x67/0x1a0
? __pfx_pci_pm_runtime_suspend+0x10/0x10
__rpm_callback+0x45/0x1f0
? __pfx_pci_pm_runtime_suspend+0x10/0x10
rpm_callback+0x6d/0x80
rpm_suspend+0xe5/0x5e0
? finish_task_switch.isra.0+0x99/0x2c0
pm_runtime_work+0x98/0xb0
process_one_work+0x18f/0x350
worker_thread+0x25a/0x3a0
? __pfx_worker_thread+0x10/0x10
kthread+0xf9/0x240
? __pfx_kthread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0xf1/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
The reason this happens is because the fwsec-sb firmware image only
supports being booted from a contiguous coherent sysmem allocation. If a
system runs into enough memory fragmentation from memory pressure, such as
what can happen on systems with low amounts of memory, this can lead to a
situation where it later becomes impossible to find space for a large
enough contiguous allocation to hold fwsec-sb. This causes us to fail to
boot the firmware image, causing the GPU to fail booting and causing the
driver to fail.
Since this firmware can't use non-contiguous allocations, the best solution
to avoid this issue is to simply allocate the memory for fwsec-sb during
initial driver-load, and reuse the memory allocation when fwsec-sb needs to
be used. We then release the memory allocations on driver unload.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 594766c ("drm/nouveau/gsp: move booter handling to GPU-specific code")
Cc: <stable@vger.kernel.org> # v6.16+
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Link: https://patch.msgid.link/20251202175918.63533-1-lyude@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0 commit comments