AI-powered whitebox penetration testing for Claude Code.
One command. Full audit. Any codebase.
/whitebox-pentest:full-audit /path/to/code
VulnScout is a Claude Code plugin that turns Claude into an autonomous security reviewer. It brings battle-tested pentesting methodology (HTB Academy, OffSec AWAE/OSWE) into your terminal with STRIDE threat modeling, evidence-first findings, and support for 9 languages including Solidity smart contracts.
Tested end-to-end on OWASP Juice Shop v17.1.1 -- 62 findings across SQL injection, XSS, path traversal, SSTI, SSRF, hardcoded secrets, and more.
Traditional SAST tools find patterns. VulnScout understands your application.
- Automated scan pipeline -- Semgrep + Joern CPG + secret scanning in one command, with SARIF and Markdown output
- Threat models first, then hunts -- STRIDE analysis identifies what matters before scanning
- Traces data flow, not just patterns -- follows user input from source to sink across files and services
- 15 CPG verification scripts -- proves exploitability through Code Property Graph analysis, not just pattern matching
- CVSS 3.1 auto-scoring -- every finding gets a CVSS vector and numeric score
- Handles massive codebases -- language-aware compression (Go: up to 97% fewer tokens, Python: up to 90%) lets it audit million-token monorepos
- Chains vulnerabilities -- finds SSRF-to-SSTI-to-RCE attack chains that single-pattern scanners miss
- Polyglot-native -- audits Go + Python + TypeScript microservices as one interconnected system
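The CVSS 3.1 bullet above can be made concrete. The sketch below computes a base score from a vector string using the base-score equations and metric weights published in the CVSS 3.1 specification. It is an illustration only, not VulnScout's own scoring code; the names `roundup` and `base_score` are chosen here for the example.

```python
import math

# Base-metric weights from the CVSS 3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, as defined in CVSS 3.1 Appendix A."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS 3.1 base score for a vector such as CVSS:3.1/AV:N/..."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    metrics = dict(part.split(":") for part in parts[1:])
    changed = metrics["S"] == "C"

    pr_table = PR_CHANGED if changed else PR_UNCHANGED
    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * pr_table[metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )

    iss = 1 - (
        (1 - WEIGHTS["C"][metrics["C"]])
        * (1 - WEIGHTS["I"][metrics["I"]])
        * (1 - WEIGHTS["A"][metrics["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))
```

Only the eight base metrics are handled; temporal and environmental metrics are left out to keep the sketch short.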
# Option 1: Symlink into your project's plugin directory
mkdir -p .claude/plugins
ln -s /path/to/vuln-scout/whitebox-pentest .claude/plugins/whitebox-pentest
# Option 2: Copy into your project
cp -r /path/to/vuln-scout/whitebox-pentest .claude/plugins/whitebox-pentest
# Run a full audit
/whitebox-pentest:full-audit .
# Or start with threat modeling
/whitebox-pentest:threats
Note: `.claude/plugins/` is relative to your project root. Claude Code automatically discovers plugins in this directory.
VulnScout is available as a native task in Kuzushi, the AI security scanner. When installed as a dependency, Kuzushi loads the vuln-scout plugin into its task DAG alongside Semgrep, CodeQL, and 15+ other detection tasks.
# Run vuln-scout as part of a Kuzushi scan
npx kuzushi /path/to/repo --tasks vuln-scout
# Combine with other tasks
npx kuzushi /path/to/repo --tasks semgrep,vuln-scout,threat-hunt
# Use a specific model for vuln-scout
npx kuzushi /path/to/repo --tasks vuln-scout --task-model vuln-scout=anthropic:claude-opus-4-6
# Configure via .kuzushi/config.json
# { "tasks": ["semgrep", "vuln-scout"], "taskConfig": { "vuln-scout": { "model": "anthropic:claude-opus-4-6", "maxFindings": 30 } } }
Kuzushi handles triage, verification, PoC generation, and reporting on top of vuln-scout's findings.
VulnScout includes Python scripts that run independently of Claude Code:
# Scan with Semgrep + secret scanning
python3 scripts/scan_orchestrator.py /path/to/code --tools semgrep --secrets --format sarif
# Create a Joern CPG (cached by content hash)
python3 scripts/create_cpg.py /path/to/code
# Batch-verify findings with Joern CPG analysis
python3 scripts/batch_verify.py --findings .claude/findings.json --cpg .joern/*.cpg
# Render HTML or Markdown from an existing findings artifact
python3 scripts/report.py .claude/findings.json --format html --output security-report.html
# CI gate: fail on high-severity findings
python3 scripts/scan_orchestrator.py . --tools semgrep --fail-on high --format sarif --output findings.sarif
# Validate and run prompt/skill eval suites
python3 whitebox-pentest/scripts/validate_evals.py
python3 whitebox-pentest/scripts/run_prompt_evals.py
| Command | What it does |
|---|---|
| /whitebox-pentest:full-audit | One command does everything -- scopes, threat models, audits, reports |
| /whitebox-pentest:threats | STRIDE threat modeling with data flow diagrams |
| /whitebox-pentest:sinks | Find dangerous functions across 9 languages |
| /whitebox-pentest:trace | Follow data from source to sink |
| /whitebox-pentest:scan | Run Semgrep, CodeQL, and Joern into a shared findings artifact |
| /whitebox-pentest:scope | Handle large codebases with smart compression |
| /whitebox-pentest:propagate | Found one bug? Find every instance of the pattern |
| /whitebox-pentest:verify | CPG-based false positive elimination |
| /whitebox-pentest:report | Render Markdown, JSON, SARIF, or HTML from the shared findings artifact |
| /whitebox-pentest:diff | Compare security posture between git refs and highlight regressions |
| /whitebox-pentest:auto-fix | Auto-remediate verified findings with generated patches |
| /whitebox-pentest:create-rule | Generate a custom Semgrep rule from a confirmed vulnerability pattern |
| /whitebox-pentest:mutate | Mutation-test security controls to find detection gaps |
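As a rough picture of what "highlight regressions" could mean for the diff command, the sketch below compares two lists of findings by their `stable_key` (the identifier named in the findings contract). This is an assumption about one reasonable approach, not the plugin's actual implementation; `regressions` and `resolved` are hypothetical helper names.

```python
def regressions(base_entries, head_entries):
    """Entries present at the head ref whose stable_key is absent at the base ref."""
    base_keys = {entry["stable_key"] for entry in base_entries}
    return [entry for entry in head_entries if entry["stable_key"] not in base_keys]


def resolved(base_entries, head_entries):
    """Entries present at the base ref that no longer appear at the head ref."""
    head_keys = {entry["stable_key"] for entry in head_entries}
    return [entry for entry in base_entries if entry["stable_key"] not in head_keys]


# Hypothetical findings for two git refs.
base = [{"stable_key": "sqli:login"}, {"stable_key": "xss:search"}]
head = [{"stable_key": "xss:search"}, {"stable_key": "ssrf:fetch"}]

print([e["stable_key"] for e in regressions(base, head)])
print([e["stable_key"] for e in resolved(base, head)])
```

Keying on a stable identifier rather than on file and line number keeps the comparison meaningful when code moves between commits.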
Agents run independently and return detailed analysis:
- app-mapper -- Maps architecture and trust boundaries
- threat-modeler -- STRIDE analysis and data flow diagrams (consumes app-mapper output)
- code-reviewer -- Proactive vulnerability identification
- local-tester -- Dynamic testing guidance (hands off to poc-developer)
- poc-developer -- Proof of concept development
- patch-advisor -- Specific remediation with code patches
- false-positive-verifier -- Evidence-based verification with NEEDS_REVIEW resolution path
- attack-researcher -- Autonomous attack vector exploration beyond pattern matching
Each script proves or disproves a vulnerability through Code Property Graph data flow analysis:
| Script | What it verifies |
|---|---|
| verify-sqli | SQL injection (parameterization, binding APIs) |
| verify-cmdi | Command injection (shell vs array execution) |
| verify-xss | Cross-site scripting (encoding, Content-Type) |
| verify-path | Path traversal (strong vs weak normalization) |
| verify-ssrf | Server-side request forgery (URL validation, allowlists) |
| verify-xxe | XML external entity injection (entity disabling) |
| verify-ssti | Server-side template injection (filesystem vs user templates) |
| verify-deser | Unsafe deserialization (SafeLoader, ObjectInputFilter) |
| verify-ldap | LDAP injection (filter escaping) |
| verify-randomness | Insecure randomness (crypto alternatives) |
| verify-reentrancy | Solidity reentrancy (CEI pattern) |
| verify-overflow | Solidity integer overflow (SafeMath, Solidity >=0.8) |
| verify-access-control | Solidity missing access control (onlyOwner, tx.origin) |
| verify-delegatecall | Solidity delegatecall risks (proxy patterns, EIP-1967) |
| verify-generic | Fallback for types without a dedicated script |
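To illustrate the first row of the table: the distinction that verify-sqli is described as checking (parameterization and binding APIs versus string-built queries) can be shown with Python's standard `sqlite3` module. This demonstrates the vulnerability class only and is not the Joern script; the table, data, and function names are invented for the example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two example users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a1"), ("bob", "b2")],
    )
    return conn


def lookup_unsafe(conn, name):
    # Vulnerable: user input is concatenated into the SQL text.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    # Safe: the value is bound as a parameter and never parsed as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()
payload = "x' OR '1'='1"
print(sorted(lookup_unsafe(conn, payload)))  # every secret leaks
print(lookup_safe(conn, payload))            # no rows match
```

A verifier that can tell these two call shapes apart is able to mark the second as a false positive even when both flow from user input to a query API.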
Skills activate automatically when relevant -- no configuration needed:
Core Analysis: dangerous-functions, vuln-patterns, data-flow-tracing, cpg-analysis, exploit-techniques
OWASP Mapping: security-misconfiguration, cryptographic-failures, logging-failures, exception-handling, sensitive-data-leakage, business-logic
Advanced: threat-modeling, vulnerability-chains, cross-component, cache-poisoning, postmessage-xss, sandbox-escapes, framework-patterns, nextjs-react
Infrastructure: workspace-discovery, mixed-language-monorepos, owasp-2025, secret-scanning
Extended Coverage: ai-ml-attacks, owasp-api-top10, cloud-native, compliance-mapping
Dedicated detection patterns for:
- Django -- ORM bypass, template injection, CSRF exemptions, settings exposure
- Rails -- Mass assignment, SQL interpolation, ERB injection, Marshal.load
- Spring Security -- SpEL injection, CORS/CSRF misconfiguration, actuator exposure
- GraphQL -- Introspection, depth/complexity limits, batching, nested auth bypass
- Next.js/React -- Server Actions SSRF, middleware bypass, Server Component data exposure
- Flask/Twig/Blade -- SSTI, filter callbacks, sandbox escapes
| Language | Token Reduction | Static Analysis | CPG Verification |
|---|---|---|---|
| Go | 95-97% fewer tokens | Semgrep, Joern | Yes |
| TypeScript/JS | ~80% fewer tokens | Semgrep, CodeQL | Yes |
| Python | 85-90% fewer tokens | Semgrep, Joern | Yes |
| Java | 80-85% fewer tokens | Semgrep, CodeQL | Yes |
| PHP | 80-85% fewer tokens | Semgrep | Yes |
| Ruby | 85-90% fewer tokens | Semgrep | Yes |
| Rust | 85-90% fewer tokens | Semgrep | -- |
| C#/.NET | 80-85% fewer tokens | Semgrep, CodeQL | -- |
| Solidity | 70-80% fewer tokens | Semgrep, Slither | Yes (4 scripts) |
| # | OWASP Top 10 | Coverage | Primary Skills |
|---|---|---|---|
| A01 | Broken Access Control | Covered | business-logic, owasp-api-top10 |
| A02 | Cryptographic Failures | Covered | cryptographic-failures |
| A03 | Injection | Covered | vuln-patterns, dangerous-functions |
| A04 | Insecure Design | Covered | business-logic, threat-modeling |
| A05 | Security Misconfiguration | Covered | security-misconfiguration, cloud-native |
| A06 | Vulnerable and Outdated Components | Out of scope | -- |
| A07 | Identification and Authentication Failures | Covered | vuln-patterns |
| A08 | Software and Data Integrity Failures | Covered | vuln-patterns, ai-ml-attacks |
| A09 | Security Logging and Monitoring Failures | Covered | logging-failures, sensitive-data-leakage |
| A10 | Server-Side Request Forgery | Covered | vuln-patterns, framework-patterns, cloud-native |
9/10 categories covered. A06 is intentionally out of scope -- VulnScout focuses on source review and exploitability, not dependency inventory.
/scan, /verify, and /full-audit share one contract: .claude/findings.json.
- `schema_version` identifies the artifact version.
- `kind` separates reportable `finding` entries from audit `hotspot` pivots.
- `stable_key` gives each entry a suppression-safe identifier.
- `source_tool` and `evidence` are required on every entry.
- `cvss_vector` and `cvss_score` provide CVSS 3.1 scoring.
- Severity summaries count only unsuppressed entries where `kind == "finding"`.
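The contract above can be sketched as a small consumer. The field names `kind`, `stable_key`, `source_tool`, and `evidence` come from the description; the top-level `findings` key and the per-entry `severity` and `suppressed` fields are assumptions made for this example and may not match the real schema.

```python
from collections import Counter

# Fields the contract describes as identifying or required on every entry.
REQUIRED_FIELDS = ("kind", "stable_key", "source_tool", "evidence")


def missing_fields(entry):
    """Return the required fields that are absent or empty on an entry."""
    return [field for field in REQUIRED_FIELDS if not entry.get(field)]


def severity_summary(artifact):
    """Count unsuppressed entries with kind == "finding", grouped by severity."""
    counts = Counter()
    for entry in artifact.get("findings", []):  # "findings" key is assumed
        if entry.get("kind") != "finding":
            continue  # hotspots are audit pivots, not reportable findings
        if entry.get("suppressed", False):      # "suppressed" flag is assumed
            continue
        counts[entry.get("severity", "unknown")] += 1
    return dict(counts)


# Hypothetical artifact for illustration.
artifact = {
    "schema_version": "1",
    "findings": [
        {"kind": "finding", "stable_key": "a", "source_tool": "semgrep",
         "evidence": "query built by concatenation", "severity": "high"},
        {"kind": "hotspot", "stable_key": "b", "source_tool": "semgrep",
         "evidence": "dynamic import", "severity": "high"},
        {"kind": "finding", "stable_key": "c", "source_tool": "joern",
         "evidence": "tainted path", "severity": "medium", "suppressed": True},
    ],
}

print(severity_summary(artifact))
```

Here only the first entry is counted: the second is a hotspot and the third is suppressed.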
Prompt-first orchestration adds two persisted companion artifacts:
- `.claude/audit-plan.md` captures module priority, attack surfaces, and verification strategy before deep-dive auditing.
- `.claude/review-ledger.json` records adversarial review rounds for audit plans, threat models, and finding verification.
CI-focused flags:
--since-commit <sha> # Scope to recent changes
--suppressions <path> # Apply .vuln-scout-ignore suppressions
--fail-on <severity> # Exit code 2 when blocking findings remain
--format sarif|json|md # Machine-readable or human-readable output
--secrets # Enable gitleaks/trufflehog secret scanning
/full-audit automatically:
1. Measures codebase --> Too big? Compresses with language-aware strategy
2. Detects frameworks --> Next.js, Flask, Spring, Rails, Django, Solidity...
3. Threat models --> STRIDE analysis, DFDs, trust boundaries
4. Plans the audit --> Writes `.claude/audit-plan.md` before deep dives
5. Adversarial review --> Writes `.claude/review-ledger.json` for threat/finding review loops
6. Scans (Semgrep) --> Pattern matching + taint analysis
7. Verifies (Joern) --> CPG data flow proof per finding
8. Chains findings --> Connects SSRF + SSTI + RCE across services
9. Reports --> Markdown, JSON, or SARIF with CVSS scores
Got a Go gateway, Python ML service, and TypeScript frontend? VulnScout handles it:
/whitebox-pentest:full-audit ~/code/platform
Polyglot detected: Go (450 files) + Python (380) + TypeScript (420)
Findings by Service:
auth-service (Go): 2 CRITICAL, 1 HIGH
api-gateway (Go): 1 HIGH, 2 MEDIUM
ml-pipeline (Python): 1 CRITICAL, 2 HIGH
web-frontend (TypeScript): 3 MEDIUM
Cross-Service Findings:
Auth token not validated in ml-pipeline (CRITICAL)
Error messages leak from Python to Gateway (MEDIUM)
- Injection: SQL, Command, LDAP, Template (SSTI), XXE
- Authentication: Bypass, Session attacks, JWT flaws
- Access Control: IDOR, Privilege escalation, BOLA (API)
- Business Logic: Workflow bypass, state manipulation, trust boundary violations
- Cryptography: Weak algorithms, hardcoded secrets, insecure randomness
- Deserialization: Java, PHP, Python, .NET, ML pipeline (joblib/torch.load)
- API Security: GraphQL depth attacks, mass assignment, gRPC reflection
- Cloud Native: IMDS endpoints, S3 misconfiguration, K8s RBAC, serverless env leakage
- Race Conditions: TOCTOU, double-spend attacks
- Data Leakage: Credentials in logs, error exposure, secret scanning (git history)
- Smart Contracts: Reentrancy, integer overflow, access control, delegatecall, flash loans
- Compliance: PCI-DSS, HIPAA, SOC 2, NIST CSF mapping
Required:
npm install -g repomix # Codebase compression for large repos
Recommended (enhances scanning):
pip install semgrep # Pattern matching + taint analysis
Optional (deepens analysis):
# Joern CPG analysis (data flow verification)
curl -L "https://github.com/joernio/joern/releases/latest/download/joern-install.sh" | bash
# Secret scanning (git history)
brew install gitleaks # or: pip install trufflehog
# Solidity analysis
pip install slither-analyzer
VulnScout implements methodologies from:
- HTB Academy -- Whitebox Pentesting Process (4-phase)
- OffSec AWAE -- Advanced Web Attacks and Exploitation (WEB-300)
- NahamSec -- Deep application understanding and business logic focus
"Understanding the application deeply will always beat automation."
The plugin supports two complementary approaches:
- Sink-First -- Find dangerous functions, trace data flow backward
- Understanding-First -- Map the application, then hunt with context
Both work together. Understanding reveals business logic bugs that sink scanning misses.
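The sink-first half of this approach can be sketched with Python's standard `ast` module: parse a file, list every call to a known dangerous function, and use those locations as starting points for tracing data flow backward. The sink list below is a small hypothetical sample, not the plugin's dangerous-functions catalog.

```python
import ast

# Hypothetical sample of dangerous call names for Python code.
SINKS = {"eval", "exec", "os.system", "pickle.loads", "subprocess.call"}


def call_name(node):
    """Return a dotted name for simple calls like f(...) or mod.f(...)."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return None


def find_sinks(source):
    """Return sorted (line, name) pairs for every call to a known sink."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = call_name(node)
            if name in SINKS:
                hits.append((node.lineno, name))
    return sorted(hits)


sample = (
    "import os\n"
    "\n"
    "def run(cmd):\n"
    "    os.system(cmd)\n"
    "    return eval('1+1')\n"
)
print(find_sinks(sample))
```

A name match alone says nothing about reachability or exploitability, which is why the list is treated as a set of entry points for tracing rather than as findings.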
Scope audits to recent changes or PR diffs for fast CI feedback:
# Scan only files changed since a known base
/whitebox-pentest:full-audit . --since-commit origin/main
# Prioritize modules with recent changes
/whitebox-pentest:full-audit . --recent 7
# Headless PR gate: diff scan, JSON output, no prompts
/whitebox-pentest:full-audit . --since-commit origin/main --quick --json --no-interactive
# Incremental Semgrep scan of changed files
/whitebox-pentest:scan . --since-commit origin/main --format sarif --fail-on high
--diff-base remains as a backward-compatible alias for older automation.
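The `--fail-on` gate used in these commands can be pictured as a threshold check over the findings artifact. The exit code 2 comes from the flag description; the severity ordering and the `severity`, `suppressed`, and `kind` field handling are assumptions in this sketch, and `gate_exit_code` is a hypothetical name.

```python
# Assumed ordering from least to most severe.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def gate_exit_code(findings, fail_on):
    """Return 2 if any unsuppressed finding is at or above the threshold, else 0."""
    threshold = SEVERITY_ORDER.index(fail_on)
    for entry in findings:
        if entry.get("kind", "finding") != "finding":
            continue  # hotspots do not block the build
        if entry.get("suppressed", False):
            continue  # suppressed findings do not block the build
        if SEVERITY_ORDER.index(entry["severity"]) >= threshold:
            return 2
    return 0


# Hypothetical findings for illustration.
findings = [
    {"kind": "finding", "severity": "medium"},
    {"kind": "finding", "severity": "critical", "suppressed": True},
]
print(gate_exit_code(findings, "high"))    # nothing blocking remains
print(gate_exit_code(findings, "medium"))  # the medium finding blocks
```

In a CI job the returned value would be passed to `sys.exit`, so the pipeline step fails only when blocking findings remain.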
Optionally execute generated PoC scripts to confirm exploitability:
# Audit with dynamic PoC verification (requires explicit approval per PoC)
/whitebox-pentest:full-audit . --verify-dynamic
Safety-first: PoCs run in --dry-run mode by default, require user confirmation, have a 30s timeout, and must include cleanup functions.
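The safety rules above (dry-run by default, explicit approval, a 30-second timeout, mandatory cleanup) map naturally onto a small wrapper around `subprocess.run`. The sketch below shows one way such a runner could be structured; `run_poc` and its parameters are hypothetical and are not taken from the plugin.

```python
import subprocess
import sys


def run_poc(command, cleanup, dry_run=True, approved=False, timeout=30):
    """Run a PoC command under the safety rules; return a small status dict."""
    if dry_run:
        # Default mode: report what would run without executing anything.
        return {"status": "dry-run", "command": command}
    if not approved:
        # Execution requires explicit approval for each PoC.
        return {"status": "skipped"}
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
        return {
            "status": "completed",
            "returncode": result.returncode,
            "stdout": result.stdout,
        }
    except subprocess.TimeoutExpired:
        return {"status": "timeout"}
    finally:
        # Cleanup runs whether the PoC finished, failed, or timed out.
        cleanup()


cleanup_calls = []
outcome = run_poc(
    [sys.executable, "-c", "print('ok')"],
    cleanup=lambda: cleanup_calls.append("done"),
    dry_run=False,
    approved=True,
)
print(outcome["status"], outcome["stdout"].strip())
```

Placing the cleanup call in `finally` means it still runs when the timeout fires, and the cleanup requirement exists to guarantee exactly that.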
whitebox-pentest/
.claude-plugin/plugin.json # Plugin manifest
agents/ # 8 autonomous security analysts
commands/ # 13 slash commands
hooks/ # 4 background automation hooks
skills/ # 27 auto-activated knowledge modules
evals/ # Prompt/skill trigger and workflow eval definitions
scripts/
scan_orchestrator.py # Main scan pipeline
run_semgrep.py # Semgrep wrapper + normalizer
run_secrets.py # Secret scanner (gitleaks/trufflehog)
create_cpg.py # Joern CPG creation + caching
batch_verify.py # Batch CPG verification
bundle_joern.py # Script bundler for Joern compatibility
markdown_report.py # Report generator
artifact_utils.py # Findings schema, SARIF, CVSS, dedup
prompt_artifacts.py # Audit plan, review ledger, and state validators
validate_evals.py # Prompt eval definition validator
run_prompt_evals.py # Prompt/skill benchmark runner
tool_runners/ # Modular tool runner package
joern/ # 15 CPG verification scripts
