Commit c41c0eb

printk/nbcon: Block printk kthreads when any CPU is in an emergency context
In emergency contexts, printk() tries to flush messages directly even on nbcon consoles. It is allowed to take over the console ownership and interrupt the printk kthread in the middle of a message.

Only one takeover and one repeated message should be needed in most situations. The first emergency message flushes the backlog and the printk kthreads go to sleep. Subsequent emergency messages are flushed directly and printk() does not wake up the kthreads.

However, the single takeover is not guaranteed. Any printk() in normal context on another CPU could wake up the kthreads, or a new emergency message might be added before the kthreads get to sleep. Note that the interrupted .write_thread() callbacks usually have to call nbcon_reacquire_nobuf() and restore the original device setting before checking for pending messages.

The risk of repeated takeovers will grow even bigger because __nbcon_atomic_flush_pending_con() is going to release the console ownership after each emitted record. This will be needed to prevent hardlockup reports on other CPUs which are busy waiting for the context ownership, for example, in nbcon_reacquire_nobuf() or __uart_port_nbcon_acquire().

The repeated takeovers break the output, for example:

[ 5042.650211][ T2220] Call Trace:
[ 5042.6511
** replaying previous printk message **
[ 5042.651192][ T2220] <TASK>
[ 5042.652160][ T2220] kunit_run_
** replaying previous printk message **
[ 5042.652160][ T2220] kunit_run_tests+0x72/0x90
[ 5042.653340][ T22
** replaying previous printk message **
[ 5042.653340][ T2220] ? srso_alias_return_thunk+0x5/0xfbef5
[ 5042.654628][ T2220] ? stack_trace_save+0x4d/0x70
[ 5042.6553
** replaying previous printk message **
[ 5042.655394][ T2220] ? srso_alias_return_thunk+0x5/0xfbef5
[ 5042.656713][ T2220] ? save_trace+0x5b/0x180

A more robust solution is to block the printk kthreads entirely whenever *any* CPU enters an emergency context. This ensures that critical messages can be flushed without contention from the normal, non-atomic printing path.

Link: https://lore.kernel.org/all/aNQO-zl3k1l4ENfy@pathway.suse.cz
Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20250926124912.243464-2-pmladek@suse.com
[pmladek@suse.com: Added changes proposed by John Ogness]
Signed-off-by: Petr Mladek <pmladek@suse.com>
1 parent 48e3694 commit c41c0eb
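As the commit message notes, an interrupted .write_thread() callback has to reacquire the console and restore the device state before the pending record can be emitted again. The following is a minimal sketch of that pattern, not taken from this patch: example_write_thread() and the hw_*() helpers are made-up names, while nbcon_enter_unsafe(), nbcon_exit_unsafe() and nbcon_reacquire_nobuf() are the real nbcon callback API that such drivers use.

/*
 * Hypothetical sketch of a console driver's .write_thread() callback.
 * The hw_*() helpers stand in for device access; only the nbcon_*()
 * calls reflect the real callback contract.
 */
static void example_write_thread(struct console *con,
				 struct nbcon_write_context *wctxt)
{
	unsigned int i;

	hw_save_state(con);				/* hypothetical */

	for (i = 0; i < wctxt->len; i++) {
		/* Fails when a higher-priority context took over ownership. */
		if (!nbcon_enter_unsafe(wctxt))
			break;

		hw_put_char(con, wctxt->outbuf[i]);	/* hypothetical */

		if (!nbcon_exit_unsafe(wctxt))
			break;
	}

	/*
	 * After a takeover, busy-wait until the ownership can be taken
	 * back so that the original device state can be restored. The
	 * interrupted record stays pending and is printed again later,
	 * which produces the "** replaying previous printk message **"
	 * lines shown above.
	 */
	nbcon_reacquire_nobuf(wctxt);
	hw_restore_state(con);				/* hypothetical */
}

Each repeated takeover therefore costs a busy-wait in nbcon_reacquire_nobuf() plus a replayed record, which is why blocking the kthreads for the whole emergency section is preferable to relying on a single takeover.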

1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -118,6 +118,9 @@
  * from scratch.
  */
 
+/* Counter of active nbcon emergency contexts. */
+static atomic_t nbcon_cpu_emergency_cnt = ATOMIC_INIT(0);
+
 /**
  * nbcon_state_set - Helper function to set the console state
  * @con:	Console to update
@@ -1163,6 +1166,16 @@ static bool nbcon_kthread_should_wakeup(struct console *con, struct nbcon_context *ctxt)
 	if (kthread_should_stop())
 		return true;
 
+	/*
+	 * Block the kthread when the system is in an emergency or panic mode.
+	 * It increases the chance that these contexts would be able to show
+	 * the messages directly. And it reduces the risk of interrupted writes
+	 * where the context with a higher priority takes over the nbcon console
+	 * ownership in the middle of a message.
+	 */
+	if (unlikely(atomic_read(&nbcon_cpu_emergency_cnt)))
+		return false;
+
 	cookie = console_srcu_read_lock();
 
 	flags = console_srcu_read_flags(con);
@@ -1214,6 +1227,13 @@ static int nbcon_kthread_func(void *__console)
 		if (kthread_should_stop())
 			return 0;
 
+		/*
+		 * Block the kthread when the system is in an emergency or panic
+		 * mode. See nbcon_kthread_should_wakeup() for more details.
+		 */
+		if (unlikely(atomic_read(&nbcon_cpu_emergency_cnt)))
+			goto wait_for_event;
+
 		backlog = false;
 
 		/*
@@ -1655,6 +1675,8 @@ void nbcon_cpu_emergency_enter(void)
 
 	preempt_disable();
 
+	atomic_inc(&nbcon_cpu_emergency_cnt);
+
 	cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
 	(*cpu_emergency_nesting)++;
 }
@@ -1669,10 +1691,24 @@ void nbcon_cpu_emergency_exit(void)
 	unsigned int *cpu_emergency_nesting;
 
 	cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
-
 	if (!WARN_ON_ONCE(*cpu_emergency_nesting == 0))
 		(*cpu_emergency_nesting)--;
 
+	/*
+	 * Wake up kthreads because there might be some pending messages
+	 * added by other CPUs with normal priority since the last flush
+	 * in the emergency context.
+	 */
+	if (!WARN_ON_ONCE(atomic_read(&nbcon_cpu_emergency_cnt) == 0)) {
+		if (atomic_dec_return(&nbcon_cpu_emergency_cnt) == 0) {
+			struct console_flush_type ft;
+
+			printk_get_console_flush_type(&ft);
+			if (ft.nbcon_offload)
+				nbcon_kthreads_wake();
+		}
+	}
+
 	preempt_enable();
 }
 
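For context on how the counter is driven: nbcon_cpu_emergency_enter() and nbcon_cpu_emergency_exit() bracket emergency output in callers such as dump_stack(). Below is a minimal usage sketch; the emergency_dump() wrapper is a made-up example, while the bracketing calls are the real API touched by this patch.

#include <linux/printk.h>

/*
 * Made-up example of an emergency section. While any CPU is between
 * enter() and exit(), nbcon_cpu_emergency_cnt is non-zero and the
 * printk kthreads stay blocked, so the messages below are flushed
 * directly to the nbcon consoles.
 */
static void emergency_dump(void)
{
	nbcon_cpu_emergency_enter();

	pr_emerg("example emergency report\n");
	dump_stack();

	/* The last exit wakes the kthreads again if offload is enabled. */
	nbcon_cpu_emergency_exit();
}

Note that the wake-up in nbcon_cpu_emergency_exit() only happens when the global counter drops back to zero and ft.nbcon_offload is set, so nested or concurrent emergency sections on other CPUs keep the kthreads parked until the last one finishes.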
