Commit 6db5d9d

pkg/aflow/docs: repro from crash generation workflow
1 parent de114fb commit 6db5d9d

pkg/aflow/docs/crash-to-repro.md

Lines changed: 87 additions & 0 deletions
# Proposal: AI-Driven Reproducer Generation Workflow

## 1. Objective

Design a new AI agent workflow within the syzkaller `aflow` framework (`pkg/aflow/flow/repro/repro.go`) that
automatically converts a kernel crash report (and associated execution logs) into a reliable
syzkaller reproducer (syzlang).
The agent will leverage existing syzlang descriptions (`sys/linux/*.txt`) to ensure the generated reproducers conform
to syzkaller's type system and API constraints.

The first step is to build a working MVP that allows the remaining work to be parallelized. Any tool or feature that
can be postponed should be postponed.

## 2. High-Level Architecture & Agent Loop

The workflow will operate as an iterative feedback loop, utilizing the LLM's reasoning capabilities to bridge the gap
between a crash signature and a functional syzlang program.

1. **Context Initialization:** Ingest the kernel crash log with stack trace,
   target kernel information (`.config`, `kernel repo`, `kernel commit`) and
   the raw execution log leading up to the crash (if available from the fuzzing instance).
   * The MVP gets all available information from the syzbot dashboard. The bug ID is the input.
2. **Subsystem Analysis:** Identify the vulnerable subsystem (e.g., `io_uring`, `bpf`, `ext4`) based on the stack trace.
   * The MVP gets subsystem information from the dashboard.
3. **Syzlang Contextualization:** Query the syzlang descriptions to extract the relevant syscall signatures, structs,
   and valid flags for the identified subsystem.
   * The MVP tries to load ALL the descriptions, assuming the LLM context is big enough.
4. **Draft Generation:** The LLM generates an initial candidate `.syz` reproducer.
5. **Execution & Verification:** Compile and run the candidate against an instrumented kernel VM.
   * The MVP reuses syzkaller code to verify programs.
   * TODO: explore execution options. See open questions.
6. **Iterative Refinement:** If the crash does not reproduce, or if there is a syzlang compilation error, the agent
   analyzes the failure output, tweaks the arguments/syscall sequence, and tries again (up to a defined maximum
   iteration limit).

## 3. Required Framework Extensions

To achieve this, the `aflow` framework will need new tools and actions specifically tailored for syzlang manipulation
and program execution.

### A. New Tools (`pkg/aflow/tool`)
No tools are needed for the MVP; the items below can follow later.
* `SyzlangSearch`: A tool allowing the LLM to search for syzlang definitions.
  * *Input:* Subsystem name, syscall name, or resource type (e.g., `Search("bpf_prog_load")`).
  * *Output:* The syzlang syntax block defining the syscall, its arguments, and dependent structures from `sys/linux/`.
* *Modification to Source Browser:* Ensure the existing source browsing tools can read `sys/` directory
  contents natively, so the LLM can cross-reference kernel source with syzlang API constraints.

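To make the `SyzlangSearch` idea concrete, here is a minimal sketch of a text-based lookup. It assumes the description text is already loaded into a string and simply scans for lines that declare a syscall or one of its `name$variant` specializations. The helper name `findSyscallDefs` is hypothetical and is not part of syzkaller or `aflow`; the tool proposed above would parse the AST and also return dependent structures, which this sketch does not attempt.

```go
package main

import "strings"

// findSyscallDefs is a hypothetical helper (not a syzkaller or aflow API).
// It returns every line of a syzlang description that declares the syscall
// `name` itself ("name(") or one of its specializations ("name$VARIANT(").
func findSyscallDefs(description, name string) []string {
	var defs []string
	for _, line := range strings.Split(description, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, name+"(") || strings.HasPrefix(line, name+"$") {
			defs = append(defs, line)
		}
	}
	return defs
}
```

A plain line scan like this cannot follow type references, so it would only serve as a stopgap until an AST-based search exists.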
### B. New Actions (`pkg/aflow/action`)
For the MVP we need only `ExecuteSyzProg` and `SyzCompilerCheck`.
* `ExecuteSyzProg`: Runs the generated program in the test VM.
  * *Input:* Valid `.syz` program.
  * *Output:* Kernel log delta, crash signature produced (if any), and dmesg output.
* `SyzCompilerCheck`: Validates the LLM-generated `.syz` program syntax.
  * *Input:* Raw syzlang text.
  * *Output:* Success, or a list of syntax/type errors from the syzkaller compiler.
  * TODO: this functionality is likely already available in syzkaller.
* `CompareCrashSignature`: Evaluates whether the produced crash matches the target crash we are trying to reproduce.

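As an illustration of what `CompareCrashSignature` could do, the sketch below compares a target crash title with the title produced by a run after collapsing whitespace. The `ExecResult` type and both function names are assumptions made for this example, not existing `aflow` types, and exact title equality is only one possible matching rule.

```go
package main

import "strings"

// ExecResult is a hypothetical shape for the output of ExecuteSyzProg.
type ExecResult struct {
	KernelLog  string // kernel log delta captured during the run
	CrashTitle string // crash signature produced; empty if nothing crashed
}

// normalizeTitle collapses runs of whitespace so that cosmetic differences
// (trailing newlines, double spaces) do not affect the comparison.
func normalizeTitle(title string) string {
	return strings.Join(strings.Fields(title), " ")
}

// sameCrash reports whether the run produced the crash we are targeting.
func sameCrash(target string, res ExecResult) bool {
	if res.CrashTitle == "" {
		return false
	}
	return normalizeTitle(target) == normalizeTitle(res.CrashTitle)
}
```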
## 4. Implementation Plan

### Phase 1: Tooling & Infrastructure (Foundation)
* **Implement `SyzlangSearch` tool:** Parse the AST of `sys/linux/` and expose a search interface to the agent.
* **Implement `ExecuteSyzProg` action:** Wrap `syz-execprog` so the agent can trigger executions inside the isolated
  test VMs already managed by `aflow`'s checkout/build actions.

### Phase 2: Prompt Engineering & Context Management
The MVP does not need context window optimization.
* **System Prompt:** Define the persona
  (e.g., *"You are an expert kernel security researcher. Your goal is to write a syzkaller program to trigger
  a specific bug. Use syzlang syntax strictly."*).
* **Context Window Optimization:** Kernel logs and syzlang files can be large.
  Implement truncation and selective inclusion for dmesg and syzlang structs to avoid exceeding the token limit.

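One simple form of the truncation mentioned above is to keep only the tail of the kernel log, since the crash report usually sits at the end. The sketch below does this by line count. The function name `truncateTail` and the line-based budget are assumptions for illustration; a real implementation would more likely budget by tokens.

```go
package main

import (
	"fmt"
	"strings"
)

// truncateTail keeps only the last maxLines lines of a log and prepends a
// marker saying how many lines were dropped. A non-positive maxLines or a
// log that already fits is returned unchanged.
func truncateTail(log string, maxLines int) string {
	lines := strings.Split(strings.TrimRight(log, "\n"), "\n")
	if maxLines <= 0 || len(lines) <= maxLines {
		return log
	}
	dropped := len(lines) - maxLines
	kept := lines[dropped:]
	return fmt.Sprintf("[%d earlier lines omitted]\n%s\n", dropped, strings.Join(kept, "\n"))
}
```

The explicit marker tells the LLM that context is missing, so it does not treat the first kept line as the start of the log.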
### Phase 3: Workflow Implementation (`pkg/aflow/flow/repro/repro.go`)
* Wire the state machine. Initialize the flow with the bug report ID.
* Implement the iterative loop: *Generate -> Compile -> Execute -> Evaluate -> Refine*.
* Implement exit conditions: Success (matching crash signature produced), Max Iterations Reached,
  or Unrecoverable Error.

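The loop and its three exit conditions can be sketched as below. This is a minimal illustration under stated assumptions: the `Steps` struct, `Outcome` type, and `runLoop` function are hypothetical names, the stages are passed in as plain functions rather than real `aflow` actions, and crash matching is reduced to string equality.

```go
package main

// Outcome enumerates the exit conditions of the refinement loop.
type Outcome int

const (
	Success Outcome = iota
	MaxIterationsReached
	UnrecoverableError
)

// Steps bundles the loop stages as plain functions. All fields are
// hypothetical stand-ins for the LLM call and the aflow actions.
type Steps struct {
	Generate func(feedback string) (prog string, err error)
	Compile  func(prog string) error
	Execute  func(prog string) (crashTitle string, err error)
}

// runLoop drives Generate -> Compile -> Execute -> Evaluate -> Refine and
// returns the outcome, the reproducer (on success), and iterations used.
func runLoop(s Steps, target string, maxIter int) (Outcome, string, int) {
	feedback := ""
	for i := 1; i <= maxIter; i++ {
		prog, err := s.Generate(feedback)
		if err != nil {
			return UnrecoverableError, "", i
		}
		// A compile error is recoverable: feed it back and retry.
		if err := s.Compile(prog); err != nil {
			feedback = "compile error: " + err.Error()
			continue
		}
		title, err := s.Execute(prog)
		if err != nil {
			return UnrecoverableError, "", i
		}
		if title == target {
			return Success, prog, i
		}
		if title == "" {
			feedback = "program ran but no crash was produced"
		} else {
			feedback = "different crash produced: " + title
		}
	}
	return MaxIterationsReached, "", maxIter
}
```

Treating compile errors as feedback rather than as a fatal error follows the refinement step described in section 2; infrastructure failures in `Generate` or `Execute` end the loop immediately.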
### Phase 4: Evaluation & Syzbot Integration
* Test the workflow against historical syzbot bugs that already have known reproducers to measure the
  agent's success rate and average number of iterations.
* Deploy as an experimental job type on `syzbot.org/upstream/ai`.

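The two evaluation metrics can be computed from per-bug results roughly as follows. The `RunStat` type and `summarize` function are hypothetical, and averaging iterations over successful runs only is an assumption; the proposal does not say whether failed runs should count toward the average.

```go
package main

// RunStat is a hypothetical record of one evaluation run against a
// historical bug with a known reproducer.
type RunStat struct {
	Reproduced bool // whether the agent produced a matching crash
	Iterations int  // iterations used before the run ended
}

// summarize returns the fraction of runs that reproduced the crash and the
// mean iteration count among those successful runs.
func summarize(runs []RunStat) (successRate, meanIterations float64) {
	if len(runs) == 0 {
		return 0, 0
	}
	successes, iterSum := 0, 0
	for _, r := range runs {
		if r.Reproduced {
			successes++
			iterSum += r.Iterations
		}
	}
	successRate = float64(successes) / float64(len(runs))
	if successes > 0 {
		meanIterations = float64(iterSum) / float64(successes)
	}
	return successRate, meanIterations
}
```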
## 5. Open Questions for Discussion
* Is it better to run syz-manager in MCP mode or to create a new tool?
* How should the generated program be verified? This is likely already implemented.
