
pkg/aflow: pass fault injection info to LLM#7023

Open
officialasishkumar wants to merge 3 commits into google:master from officialasishkumar:features/fault-injection-info

@officialasishkumar

Extract fault injection reports from the kernel console output produced during crash reproduction, and pass them to the debugger LLM agent.

Fault injection is an important debugging signal: when the reproducer uses fault injection, the FAULT_INJECTION trace in the kernel log shows exactly which allocation was forced to fail and the call path that led to it. This context helps the LLM understand the root cause more accurately, especially for bugs triggered by allocation failures.

Changes:

  • Add ExtractFaultInjectionInfo() to parse FAULT_INJECTION blocks from the raw kernel console output (up to 5 blocks, each up to 50 lines).
  • Capture RawOutput from test results in RunTest.
  • Add ReproducedFaultInjection field to the reproduce result and cache it.
  • Include fault injection info in the debugger LLM prompt (conditionally, only when present).
  • Bump cache version to invalidate entries missing this field.

Fixes #6762

Comment thread pkg/aflow/action/crash/reproduce.go Outdated
// ExtractFaultInjectionInfo extracts fault injection reports from kernel console output.
// Fault injection is an important debugging signal: it shows which specific allocation
// was forced to fail and the call path that led to it.
func ExtractFaultInjectionInfo(output []byte) string {
Collaborator

Unexport this function, it's not used outside of the package.

Author

Moved the fault-injection extraction into pkg/report, so the helper in pkg/aflow is gone now. Fixed in officialasishkumar@2dce7f1.

{{.SimplifiedCRepro}}
{{if .ReproducedFaultInjection}}
The reproducer uses fault injection to force allocation failure at a specific point.
The following fault injection report(s) show what was injected:
Collaborator

I would add a sentence saying that fault injection triggers rarely executed error-handling code paths, and that the bug is frequently related to those paths. At least that's what I would say to a human debugging this.

Author

Added that note to the prompt, so it now explicitly says fault injection often exercises rarely hit error-handling paths and the bug is frequently there. Fixed in officialasishkumar@2dce7f1.

Comment thread pkg/aflow/action/crash/reproduce.go Outdated
// Fault injection is an important debugging signal: it shows which specific allocation
// was forced to fail and the call path that led to it.
func ExtractFaultInjectionInfo(output []byte) string {
const marker = "FAULT_INJECTION: forcing a failure"
Collaborator

Convert it to []byte here instead of converting and generating garbage in the loop below.

Author

Switched the marker handling to a shared []byte value, so there is no repeated conversion in the scan anymore. Fixed in officialasishkumar@2dce7f1.

Comment thread pkg/aflow/action/crash/reproduce.go Outdated
}
block = append(block, l)
}
if len(block) > 0 {
Collaborator

Can blocks be empty here? I don't see how.

Comment thread pkg/aflow/action/crash/reproduce.go Outdated
block = append(block, l)
}
if len(block) > 0 {
blocks = append(blocks, string(bytes.Join(block, []byte{'\n'})))
Collaborator

Are real reports delimited by a newline? I don't think so.

Comment thread pkg/aflow/action/crash/reproduce.go Outdated
if len(block) > 0 {
blocks = append(blocks, string(bytes.Join(block, []byte{'\n'})))
}
if len(blocks) >= 5 {
Collaborator

I think we need to deduplicate blocks because, for reproducers that run the program repeatedly, we can get dozens of copies of the same report. I'm not sure, though, whether all reports will be exactly the same (are there any varying fields?).

Author

Added deduplication after compacting the extracted fault report down to the stable marker, name, and call-trace content, so repeated runs do not spam the prompt. Fixed in officialasishkumar@2dce7f1.
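
As a rough illustration of deduplicating after normalization, the sketch below strips fields that are likely to differ between runs and keeps the first occurrence of each normalized report. The choice of varying fields is an assumption made here (the leading printk timestamp and the `CPU: ... PID: ...` line); the thread does not list which fields the PR compacts away. `normalizeReport` and `dedupReports` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// timestampRe matches a leading printk timestamp such as "[  123.456789] ".
// Assumption: no printk caller prefix follows the timestamp.
var timestampRe = regexp.MustCompile(`^\[\s*\d+\.\d+\]\s?`)

// normalizeReport drops timestamps, blank lines, and the "CPU:" line,
// which are assumed to be the parts that vary between repeated runs.
func normalizeReport(report string) string {
	var kept []string
	for _, line := range strings.Split(report, "\n") {
		line = timestampRe.ReplaceAllString(line, "")
		line = strings.TrimRight(line, " \t\r")
		if line == "" || strings.HasPrefix(line, "CPU:") {
			continue
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n")
}

// dedupReports returns the normalized reports in first-seen order,
// skipping any whose normalized text was already emitted.
func dedupReports(reports []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, r := range reports {
		key := normalizeReport(r)
		if key == "" || seen[key] {
			continue
		}
		seen[key] = true
		out = append(out, key)
	}
	return out
}

func main() {
	// Synthetic reports that differ only in timestamp, CPU, and PID.
	a := "[   10.000001] FAULT_INJECTION: forcing a failure.\n[   10.000002] CPU: 0 PID: 100\n[   10.000003] Call Trace:"
	b := "[   20.500000] FAULT_INJECTION: forcing a failure.\n[   20.500001] CPU: 1 PID: 200\n[   20.500002] Call Trace:"
	for _, r := range dedupReports([]string{a, b}) {
		fmt.Println(r)
	}
}
```

Keeping first-seen order means the prompt shows reports in the order the kernel printed them, which may matter when several distinct faults are injected in sequence.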

Comment thread pkg/aflow/action/crash/reproduce.go Outdated
// was forced to fail and the call path that led to it.
func ExtractFaultInjectionInfo(output []byte) string {
const marker = "FAULT_INJECTION: forcing a failure"
if !bytes.Contains(output, []byte(marker)) {
Collaborator

2 important things:

  • we need to symbolize the output; raw console output does not contain line numbers and is hard to interpret
  • we need to use the same line-context analysis logic that pkg/report uses to analyze raw output; for example, 2 threads can print fault reports at the same time, and the output may be intermixed at the line level

Author

Reworked this to use pkg/report for extraction and symbolization, so the fault-injection trace now goes through the same line-context handling as Linux report parsing. Fixed in officialasishkumar@2dce7f1.
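
The interleaving problem mentioned in the review can be pictured with the sketch below. It does not use or reproduce pkg/report; it assumes the kernel was built with CONFIG_PRINTK_CALLER, so each console line carries a caller tag such as `[ T1234]` (task) or `[   C0]` (CPU) after the timestamp, and it groups lines by that tag. `splitByCaller` and the sample lines are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// callerRe matches lines like "[   12.345678][ T1234] text".
// Group 1 is the caller tag (T<tid> or C<cpu>), group 2 is the message.
var callerRe = regexp.MustCompile(`^\[\s*\d+\.\d+\]\[\s*([TC]\d+)\]\s?(.*)$`)

// splitByCaller groups message text by caller tag and also returns the
// tags in first-seen order. Lines without a caller tag are skipped.
func splitByCaller(output string) (map[string][]string, []string) {
	byCaller := make(map[string][]string)
	var order []string
	for _, line := range strings.Split(output, "\n") {
		m := callerRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		id, text := m[1], m[2]
		if _, ok := byCaller[id]; !ok {
			order = append(order, id)
		}
		byCaller[id] = append(byCaller[id], text)
	}
	return byCaller, order
}

func main() {
	// Synthetic output: two tasks print fault reports at the same time.
	out := strings.Join([]string{
		"[   12.000001][ T100] FAULT_INJECTION: forcing a failure.",
		"[   12.000002][ T200] FAULT_INJECTION: forcing a failure.",
		"[   12.000003][ T100] name failslab, interval 1, probability 0, space 0, times 0",
		"[   12.000004][ T200] name fail_page_alloc, interval 1, probability 0, space 0, times 0",
	}, "\n")
	byCaller, order := splitByCaller(out)
	for _, id := range order {
		fmt.Printf("%s:\n  %s\n", id, strings.Join(byCaller[id], "\n  "))
	}
}
```

Once lines are separated per caller, each stream can be scanned for a report without picking up lines that another thread printed in between.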

@dvyukov
Collaborator

dvyukov commented Apr 15, 2026

Please rebase and remove the merge commit, we don't use merge commits.
There are some other errors reported by CI as well.

@dvyukov dvyukov requested review from a-nogikh, dvyukov, ramosian-glider and tarasmadan and removed request for dvyukov April 15, 2026 12:02
Development

Successfully merging this pull request may close these issues.

pkg/aflow/flow/patching: pass fault injection info to LLM

2 participants