
Commit ee6bdb3

dianders authored and akpm00 committed
watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
If two CPUs end up reporting a hardlockup at the same time then their logs could get interleaved, which is hard to read.

The interleaving problem was especially bad with the "perf" hardlockup detector, where the locked up CPU is always the same as the running CPU and we end up in show_regs(). show_regs() has no inherent serialization, so we could mix together two stack crawls if two hardlockups happened at the same time (and if we didn't have `sysctl_hardlockup_all_cpu_backtrace` set). With this change we'll fully serialize hardlockups when using the "perf" hardlockup detector.

The interleaving problem was less bad with the "buddy" hardlockup detector. With "buddy" we always end up calling `trigger_single_cpu_backtrace(cpu)` on some CPU other than the running one. trigger_single_cpu_backtrace() always at least serializes the individual stack crawls because it eventually uses printk_cpu_sync_get_irqsave(). Unfortunately, the fact that trigger_single_cpu_backtrace() eventually calls printk_cpu_sync_get_irqsave() (on a different CPU) means that we have to drop the "lock" before calling it and we can't fully serialize all printouts associated with a given hardlockup. However, we still get the advantage of serializing the output of print_modules() and print_irqtrace_events().

Aside from serializing hardlockups from each other, this change also has the advantage of serializing hardlockups and softlockups from each other if they happen at the same time, since both use the same "lock".

Even though nobody is expected to hang while holding the lock associated with printk_cpu_sync_get_irqsave(), out of an abundance of caution we don't call printk_cpu_sync_get_irqsave() until after we print out about the hardlockup. This makes extra sure that, even if printk_cpu_sync_get_irqsave() somehow never returns, we at least print that we saw the hardlockup. This is different from the choice made for softlockup because hardlockup is really our last resort.
Link: https://lkml.kernel.org/r/20231220131534.3.I6ff691b3b40f0379bc860f80c6e729a0485b5247@changeid
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
1 parent 896260a commit ee6bdb3

1 file changed

Lines changed: 13 additions & 0 deletions

File tree

kernel/watchdog.c

@@ -151,6 +151,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 	 */
 	if (is_hardlockup(cpu)) {
 		unsigned int this_cpu = smp_processor_id();
+		unsigned long flags;
 
 		/* Only print hardlockups once. */
 		if (per_cpu(watchdog_hardlockup_warned, cpu))
@@ -165,15 +166,27 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 			return;
 		}
 
+		/*
+		 * NOTE: we call printk_cpu_sync_get_irqsave() after printing
+		 * the lockup message. While it would be nice to serialize
+		 * that printout, we really want to make sure that if some
+		 * other CPU somehow locked up while holding the lock associated
+		 * with printk_cpu_sync_get_irqsave() that we can still at least
+		 * get the message about the lockup out.
+		 */
 		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", cpu);
+		printk_cpu_sync_get_irqsave(flags);
+
 		print_modules();
 		print_irqtrace_events(current);
 		if (cpu == this_cpu) {
 			if (regs)
 				show_regs(regs);
 			else
 				dump_stack();
+			printk_cpu_sync_put_irqrestore(flags);
 		} else {
+			printk_cpu_sync_put_irqrestore(flags);
 			trigger_single_cpu_backtrace(cpu);
 		}
 