VulnScout

AI-powered whitebox penetration testing for Claude Code.

One command. Full audit. Any codebase.

/whitebox-pentest:full-audit /path/to/code

VulnScout is a Claude Code plugin that turns Claude into an autonomous security reviewer. It brings battle-tested pentesting methodology (HTB Academy, OffSec AWAE/OSWE) into your terminal with STRIDE threat modeling, evidence-first findings, and support for 9 languages including Solidity smart contracts.

Tested end-to-end on OWASP Juice Shop v17.1.1 -- 62 findings across SQL injection, XSS, path traversal, SSTI, SSRF, hardcoded secrets, and more.

Why VulnScout?

Traditional SAST tools find patterns. VulnScout understands your application.

  • Automated scan pipeline -- Semgrep + Joern CPG + secret scanning in one command, with SARIF and Markdown output
  • Threat models first, then hunts -- STRIDE analysis identifies what matters before scanning
  • Traces data flow, not just patterns -- follows user input from source to sink across files and services
  • 15 CPG verification scripts -- proves exploitability through Code Property Graph analysis, not just pattern matching
  • CVSS 3.1 auto-scoring -- every finding gets a CVSS vector and numeric score
  • Handles massive codebases -- language-aware compression (up to 97% fewer tokens for Go, up to 90% for Python) lets it audit million-token monorepos
  • Chains vulnerabilities -- finds SSRF-to-SSTI-to-RCE attack chains that single-pattern scanners miss
  • Polyglot-native -- audits Go + Python + TypeScript microservices as one interconnected system
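
To make the CVSS 3.1 auto-scoring bullet concrete, here is a minimal sketch of how a base score can be derived from a vector string. It follows the public CVSS 3.1 base-score equations; the function names (`roundup`, `base_score`) are illustrative and are not taken from VulnScout's own code, which may compute scores differently.

```python
import math

# Metric weights from the CVSS 3.1 specification (base metrics only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS 3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS 3.1 base score for a full base vector string."""
    prefix, *metrics = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(item.split(":", 1) for item in metrics)

    scope_changed = m["S"] == "C"
    pr = (PR_CHANGED if scope_changed else PR_UNCHANGED)[m["PR"]]

    # Impact sub-score.
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score.
    exploitability = (
        8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]] * pr * WEIGHTS["UI"][m["UI"]]
    )

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope_changed:
        total *= 1.08
    return roundup(min(total, 10.0))
```

For example, `base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")` corresponds to an unauthenticated, network-reachable flaw with full impact, the profile typical of a remotely exploitable SQL injection.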

Quick Start

As a Claude Code plugin

# Option 1: Symlink into your project's plugin directory
mkdir -p .claude/plugins
ln -s /path/to/vuln-scout/whitebox-pentest .claude/plugins/whitebox-pentest

# Option 2: Copy into your project
cp -r /path/to/vuln-scout/whitebox-pentest .claude/plugins/whitebox-pentest

# Run a full audit
/whitebox-pentest:full-audit .

# Or start with threat modeling
/whitebox-pentest:threats

Note: .claude/plugins/ is relative to your project root. Claude Code automatically discovers plugins in this directory.

As a Kuzushi task

VulnScout is available as a native task in Kuzushi, the AI security scanner. When installed as a dependency, Kuzushi loads the vuln-scout plugin into its task DAG alongside Semgrep, CodeQL, and 15+ other detection tasks.

# Run vuln-scout as part of a Kuzushi scan
npx kuzushi /path/to/repo --tasks vuln-scout

# Combine with other tasks
npx kuzushi /path/to/repo --tasks semgrep,vuln-scout,threat-hunt

# Use a specific model for vuln-scout
npx kuzushi /path/to/repo --tasks vuln-scout --task-model vuln-scout=anthropic:claude-opus-4-6

# Configure via .kuzushi/config.json
# { "tasks": ["semgrep", "vuln-scout"], "taskConfig": { "vuln-scout": { "model": "anthropic:claude-opus-4-6", "maxFindings": 30 } } }

Kuzushi handles triage, verification, PoC generation, and reporting on top of vuln-scout's findings.

Standalone Scan Pipeline

VulnScout includes Python scripts that run independently of Claude Code:

# Scan with Semgrep + secret scanning
python3 scripts/scan_orchestrator.py /path/to/code --tools semgrep --secrets --format sarif

# Create a Joern CPG (cached by content hash)
python3 scripts/create_cpg.py /path/to/code

# Batch-verify findings with Joern CPG analysis
python3 scripts/batch_verify.py --findings .claude/findings.json --cpg .joern/*.cpg

# Render HTML or Markdown from an existing findings artifact
python3 scripts/report.py .claude/findings.json --format html --output security-report.html

# CI gate: fail on high-severity findings
python3 scripts/scan_orchestrator.py . --tools semgrep --fail-on high --format sarif --output findings.sarif

# Validate and run prompt/skill eval suites
python3 whitebox-pentest/scripts/validate_evals.py
python3 whitebox-pentest/scripts/run_prompt_evals.py

What You Get

13 Commands

| Command | What it does |
| --- | --- |
| /whitebox-pentest:full-audit | One command does everything -- scopes, threat models, audits, reports |
| /whitebox-pentest:threats | STRIDE threat modeling with data flow diagrams |
| /whitebox-pentest:sinks | Find dangerous functions across 9 languages |
| /whitebox-pentest:trace | Follow data from source to sink |
| /whitebox-pentest:scan | Run Semgrep, CodeQL, and Joern into a shared findings artifact |
| /whitebox-pentest:scope | Handle large codebases with smart compression |
| /whitebox-pentest:propagate | Found one bug? Find every instance of the pattern |
| /whitebox-pentest:verify | CPG-based false positive elimination |
| /whitebox-pentest:report | Render Markdown, JSON, SARIF, or HTML from the shared findings artifact |
| /whitebox-pentest:diff | Compare security posture between git refs and highlight regressions |
| /whitebox-pentest:auto-fix | Auto-remediate verified findings with generated patches |
| /whitebox-pentest:create-rule | Generate a custom Semgrep rule from a confirmed vulnerability pattern |
| /whitebox-pentest:mutate | Mutation-test security controls to find detection gaps |

8 Autonomous Agents

Agents run independently and return detailed analysis:

  • app-mapper -- Maps architecture and trust boundaries
  • threat-modeler -- STRIDE analysis and data flow diagrams (consumes app-mapper output)
  • code-reviewer -- Proactive vulnerability identification
  • local-tester -- Dynamic testing guidance (hands off to poc-developer)
  • poc-developer -- Proof of concept development
  • patch-advisor -- Specific remediation with code patches
  • false-positive-verifier -- Evidence-based verification with NEEDS_REVIEW resolution path
  • attack-researcher -- Autonomous attack vector exploration beyond pattern matching

15 Joern CPG Verification Scripts

Each script proves or disproves a vulnerability through Code Property Graph data flow analysis:

| Script | What it verifies |
| --- | --- |
| verify-sqli | SQL injection (parameterization, binding APIs) |
| verify-cmdi | Command injection (shell vs array execution) |
| verify-xss | Cross-site scripting (encoding, Content-Type) |
| verify-path | Path traversal (strong vs weak normalization) |
| verify-ssrf | Server-side request forgery (URL validation, allowlists) |
| verify-xxe | XML external entity injection (entity disabling) |
| verify-ssti | Server-side template injection (filesystem vs user templates) |
| verify-deser | Unsafe deserialization (SafeLoader, ObjectInputFilter) |
| verify-ldap | LDAP injection (filter escaping) |
| verify-randomness | Insecure randomness (crypto alternatives) |
| verify-reentrancy | Solidity reentrancy (CEI pattern) |
| verify-overflow | Solidity integer overflow (SafeMath, Solidity >=0.8) |
| verify-access-control | Solidity missing access control (onlyOwner, tx.origin) |
| verify-delegatecall | Solidity delegatecall risks (proxy patterns, EIP-1967) |
| verify-generic | Fallback for types without a dedicated script |
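
The fallback behaviour of verify-generic can be pictured as a simple dispatch: a finding's vulnerability type selects a dedicated script when one exists, and the generic script otherwise. The sketch below assumes that finding types are keyed by the same short suffixes as the script names (for example `sqli` for verify-sqli). That keying and the `select_script` helper are hypothetical, and VulnScout's real dispatch logic may differ.

```python
# Suffixes of the 14 dedicated scripts listed above; verify-generic is the 15th.
DEDICATED_SCRIPTS = {
    "sqli", "cmdi", "xss", "path", "ssrf", "xxe", "ssti", "deser",
    "ldap", "randomness", "reentrancy", "overflow",
    "access-control", "delegatecall",
}


def select_script(vuln_type: str) -> str:
    """Return the verification script name for a vulnerability type.

    The type keys are an assumption made for illustration; unknown
    types fall back to verify-generic.
    """
    key = vuln_type.strip().lower()
    if key in DEDICATED_SCRIPTS:
        return f"verify-{key}"
    return "verify-generic"
```

Keeping the fallback explicit means a new finding type never goes unverified; it simply receives the less specific check until a dedicated script exists.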

27 Auto-Activated Skills

Skills activate automatically when relevant -- no configuration needed:

Core Analysis: dangerous-functions, vuln-patterns, data-flow-tracing, cpg-analysis, exploit-techniques

OWASP Mapping: security-misconfiguration, cryptographic-failures, logging-failures, exception-handling, sensitive-data-leakage, business-logic

Advanced: threat-modeling, vulnerability-chains, cross-component, cache-poisoning, postmessage-xss, sandbox-escapes, framework-patterns, nextjs-react

Infrastructure: workspace-discovery, mixed-language-monorepos, owasp-2025, secret-scanning

Extended Coverage: ai-ml-attacks, owasp-api-top10, cloud-native, compliance-mapping

Framework Security Patterns

Dedicated detection patterns for:

  • Django -- ORM bypass, template injection, CSRF exemptions, settings exposure
  • Rails -- Mass assignment, SQL interpolation, ERB injection, Marshal.load
  • Spring Security -- SpEL injection, CORS/CSRF misconfiguration, actuator exposure
  • GraphQL -- Introspection, depth/complexity limits, batching, nested auth bypass
  • Next.js/React -- Server Actions SSRF, middleware bypass, Server Component data exposure
  • Flask/Twig/Blade -- SSTI, filter callbacks, sandbox escapes

Supported Languages

| Language | Token Reduction | Static Analysis | CPG Verification |
| --- | --- | --- | --- |
| Go | 95-97% fewer tokens | Semgrep, Joern | Yes |
| TypeScript/JS | ~80% fewer tokens | Semgrep, CodeQL | Yes |
| Python | 85-90% fewer tokens | Semgrep, Joern | Yes |
| Java | 80-85% fewer tokens | Semgrep, CodeQL | Yes |
| PHP | 80-85% fewer tokens | Semgrep | Yes |
| Ruby | 85-90% fewer tokens | Semgrep | Yes |
| Rust | 85-90% fewer tokens | Semgrep | -- |
| C#/.NET | 80-85% fewer tokens | Semgrep, CodeQL | -- |
| Solidity | 70-80% fewer tokens | Semgrep, Slither | Yes (4 scripts) |

OWASP Top 10 Mapping

| # | OWASP Top 10 | Coverage | Primary Skills |
| --- | --- | --- | --- |
| A01 | Broken Access Control | Covered | business-logic, owasp-api-top10 |
| A02 | Cryptographic Failures | Covered | cryptographic-failures |
| A03 | Injection | Covered | vuln-patterns, dangerous-functions |
| A04 | Insecure Design | Covered | business-logic, threat-modeling |
| A05 | Security Misconfiguration | Covered | security-misconfiguration, cloud-native |
| A06 | Vulnerable and Outdated Components | Out of scope | -- |
| A07 | Identification and Authentication Failures | Covered | vuln-patterns |
| A08 | Software and Data Integrity Failures | Covered | vuln-patterns, ai-ml-attacks |
| A09 | Security Logging and Monitoring Failures | Covered | logging-failures, sensitive-data-leakage |
| A10 | Server-Side Request Forgery | Covered | vuln-patterns, framework-patterns, cloud-native |

9/10 categories covered. A06 is intentionally out of scope -- VulnScout focuses on source review and exploitability, not dependency inventory.

Findings Artifact and CI Workflow

/scan, /verify, and /full-audit share one contract: .claude/findings.json.

  • schema_version identifies the artifact version.
  • kind separates reportable finding entries from audit hotspot pivots.
  • stable_key gives each entry a suppression-safe identifier.
  • source_tool and evidence are required on every entry.
  • cvss_vector and cvss_score provide CVSS 3.1 scoring.
  • Severity summaries count only unsuppressed entries where kind == "finding".
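
As a rough illustration of the counting rule above, the sketch below summarizes severities over a list of entries and checks the two required fields. The field names `kind`, `source_tool`, and `evidence` come from the contract described here; the `severity` and `suppressed` field names, and the idea that entries arrive as a plain list of dictionaries, are assumptions made for the example and may not match the real schema.

```python
from collections import Counter

REQUIRED_FIELDS = ("source_tool", "evidence")  # required on every entry


def missing_required(entry: dict) -> list:
    """Return the required fields that are absent or empty on an entry."""
    return [name for name in REQUIRED_FIELDS if not entry.get(name)]


def severity_summary(entries: list) -> dict:
    """Count severities for unsuppressed entries where kind == "finding".

    "severity" and "suppressed" are hypothetical field names.
    """
    counts = Counter()
    for entry in entries:
        if entry.get("kind") != "finding":
            continue  # audit hotspot pivots are not reportable findings
        if entry.get("suppressed", False):
            continue  # suppressed entries are excluded from summaries
        counts[entry.get("severity", "unknown")] += 1
    return dict(counts)
```

Filtering on `kind` before counting keeps hotspot pivots from inflating severity totals, which matters when a CI gate is driven by the summary.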

Prompt-first orchestration adds two persisted companion artifacts:

  • .claude/audit-plan.md captures module priority, attack surfaces, and verification strategy before deep-dive auditing.
  • .claude/review-ledger.json records adversarial review rounds for audit plans, threat models, and finding verification.

CI-focused flags:

--since-commit <sha>     # Scope to recent changes
--suppressions <path>    # Apply .vuln-scout-ignore suppressions
--fail-on <severity>     # Exit code 2 when blocking findings remain
--format sarif|json|md   # Machine-readable or human-readable output
--secrets                # Enable gitleaks/trufflehog secret scanning
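
A CI job can wrap these flags in a small gate script. The sketch below is one possible shape, assuming the flags apply to `scripts/scan_orchestrator.py` as in the standalone pipeline examples. Exit code 2 for blocking findings is taken from the flag description above, while treating 0 as a pass and every other code as a tool error is an assumption. The helper names are hypothetical.

```python
import subprocess
import sys

BLOCKING_EXIT_CODE = 2  # documented: blocking findings remain


def build_scan_command(target, fail_on="high", since_commit=None,
                       output="findings.sarif"):
    """Assemble the scan command line from the documented flags."""
    cmd = [
        sys.executable, "scripts/scan_orchestrator.py", target,
        "--tools", "semgrep",
        "--fail-on", fail_on,
        "--format", "sarif",
        "--output", output,
    ]
    if since_commit:
        cmd += ["--since-commit", since_commit]
    return cmd


def interpret_exit(code: int) -> str:
    """Map an exit code to a gate outcome (0 and 'other' are assumptions)."""
    if code == 0:
        return "pass"
    if code == BLOCKING_EXIT_CODE:
        return "blocked"
    return "error"


def run_gate(target, **options) -> str:
    """Run the scan and return 'pass', 'blocked', or 'error'."""
    result = subprocess.run(build_scan_command(target, **options))
    return interpret_exit(result.returncode)
```

Separating "blocked" from "error" lets a pipeline fail the build on real findings while surfacing scanner crashes as a different kind of failure.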

How It Works

/full-audit automatically:

1. Measures codebase    -->  Too big? Compresses with language-aware strategy
2. Detects frameworks   -->  Next.js, Flask, Spring, Rails, Django, Solidity...
3. Threat models        -->  STRIDE analysis, DFDs, trust boundaries
4. Plans the audit      -->  Writes `.claude/audit-plan.md` before deep dives
5. Adversarial review   -->  Writes `.claude/review-ledger.json` for threat/finding review loops
6. Scans (Semgrep)      -->  Pattern matching + taint analysis
7. Verifies (Joern)     -->  CPG data flow proof per finding
8. Chains findings      -->  Connects SSRF + SSTI + RCE across services
9. Reports              -->  Markdown, JSON, SARIF, or HTML with CVSS scores

Polyglot Monorepos

Got a Go gateway, Python ML service, and TypeScript frontend? VulnScout handles it:

/whitebox-pentest:full-audit ~/code/platform

Polyglot detected: Go (450 files) + Python (380) + TypeScript (420)

Findings by Service:
  auth-service (Go):        2 CRITICAL, 1 HIGH
  api-gateway (Go):         1 HIGH, 2 MEDIUM
  ml-pipeline (Python):     1 CRITICAL, 2 HIGH
  web-frontend (TypeScript): 3 MEDIUM

Cross-Service Findings:
  Auth token not validated in ml-pipeline (CRITICAL)
  Error messages leak from Python to Gateway (MEDIUM)

Vulnerability Coverage

  • Injection: SQL, Command, LDAP, Template (SSTI), XXE
  • Authentication: Bypass, Session attacks, JWT flaws
  • Access Control: IDOR, Privilege escalation, BOLA (API)
  • Business Logic: Workflow bypass, state manipulation, trust boundary violations
  • Cryptography: Weak algorithms, hardcoded secrets, insecure randomness
  • Deserialization: Java, PHP, Python, .NET, ML pipeline (joblib/torch.load)
  • API Security: GraphQL depth attacks, mass assignment, gRPC reflection
  • Cloud Native: IMDS endpoints, S3 misconfiguration, K8s RBAC, serverless env leakage
  • Race Conditions: TOCTOU, double-spend attacks
  • Data Leakage: Credentials in logs, error exposure, secret scanning (git history)
  • Smart Contracts: Reentrancy, integer overflow, access control, delegatecall, flash loans
  • Compliance: PCI-DSS, HIPAA, SOC 2, NIST CSF mapping

Prerequisites

Required:

npm install -g repomix    # Codebase compression for large repos

Recommended (enhances scanning):

pip install semgrep       # Pattern matching + taint analysis

Optional (deepens analysis):

# Joern CPG analysis (data flow verification)
curl -L "https://github.com/joernio/joern/releases/latest/download/joern-install.sh" | bash

# Secret scanning (git history)
brew install gitleaks     # or: pip install trufflehog

# Solidity analysis
pip install slither-analyzer

Methodology

VulnScout implements methodologies from:

  • HTB Academy -- Whitebox Pentesting Process (4-phase)
  • OffSec AWAE -- Advanced Web Attacks and Exploitation (WEB-300)
  • NahamSec -- Deep application understanding and business logic focus

"Understanding the application deeply will always beat automation."

The plugin supports two complementary approaches:

  1. Sink-First -- Find dangerous functions, trace data flow backward
  2. Understanding-First -- Map the application, then hunt with context

Both work together. Understanding reveals business logic bugs that sink scanning misses.
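
The sink-first approach can be illustrated with a tiny Python-only scanner that walks a syntax tree and reports calls to a short list of dangerous functions. This is a deliberately simplified sketch and not how VulnScout's sinks command is implemented; the `DANGEROUS_CALLS` list and helper names are chosen purely for the example.

```python
import ast

# A small, illustrative sink list; real sink catalogs are far larger.
DANGEROUS_CALLS = {"eval", "exec", "os.system", "pickle.loads", "subprocess.call"}


def dotted_name(node):
    """Rebuild a dotted name such as 'os.system' from an AST node, if possible."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def find_sinks(source: str):
    """Return sorted (line, call_name) pairs for dangerous calls in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in DANGEROUS_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)
```

Each hit is only a starting point: the next step is to trace backward from the sink to see whether user-controlled input can reach it, which is where the understanding-first context pays off.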

Diff-Aware Scanning

Scope audits to recent changes or PR diffs for fast CI feedback:

# Scan only files changed since a known base
/whitebox-pentest:full-audit . --since-commit origin/main

# Prioritize modules with recent changes
/whitebox-pentest:full-audit . --recent 7

# Headless PR gate: diff scan, JSON output, no prompts
/whitebox-pentest:full-audit . --since-commit origin/main --quick --json --no-interactive

# Incremental Semgrep scan of changed files
/whitebox-pentest:scan . --since-commit origin/main --format sarif --fail-on high

--diff-base remains available as a backward-compatible alias for --since-commit, so older automation keeps working.
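
One plausible way to resolve a base ref into a list of files to audit is shown below. It assumes a three-dot `git diff` against `HEAD` and a hypothetical set of source suffixes. How VulnScout actually computes the changed set is not documented here, so treat this only as a sketch of the idea.

```python
import subprocess
from pathlib import PurePosixPath

# Hypothetical suffix list covering the supported languages.
SOURCE_SUFFIXES = {".go", ".py", ".ts", ".js", ".java", ".php", ".rb", ".rs", ".cs", ".sol"}


def changed_files(base_ref: str, repo: str = "."):
    """List files added, copied, modified, or renamed since base_ref."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", "--diff-filter=ACMR",
         f"{base_ref}...HEAD"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def auditable(paths):
    """Keep only paths whose suffix looks like supported source code."""
    return [p for p in paths if PurePosixPath(p).suffix in SOURCE_SUFFIXES]
```

Calling `auditable(changed_files("origin/main"))` inside a checked-out repository would yield the source files a diff-scoped scan needs to consider, leaving out deleted files and documentation.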

Dynamic Verification

Optionally execute generated PoC scripts to confirm exploitability:

# Audit with dynamic PoC verification (requires explicit approval per PoC)
/whitebox-pentest:full-audit . --verify-dynamic

Safety-first: PoCs run in --dry-run mode by default, require user confirmation, have a 30s timeout, and must include cleanup functions.
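
The safety properties listed above (dry-run by default, explicit confirmation, a 30-second timeout, mandatory cleanup) could be enforced by a runner along these lines. The `run_poc` function and its callback parameters are hypothetical and only illustrate the control flow; they are not VulnScout's implementation.

```python
import subprocess
import sys

POC_TIMEOUT_SECONDS = 30  # matches the documented 30s limit


def run_poc(argv, confirm, cleanup, dry_run=True):
    """Run a PoC command under the documented safety constraints.

    confirm(argv) must return True before anything executes;
    cleanup() always runs after an execution attempt.
    """
    if dry_run:
        return {"status": "dry-run", "command": argv}
    if not confirm(argv):
        return {"status": "declined", "command": argv}
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=POC_TIMEOUT_SECONDS
        )
        return {"status": "completed", "returncode": proc.returncode,
                "stdout": proc.stdout}
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "command": argv}
    finally:
        cleanup()  # runs on success, failure, and timeout alike
```

Putting the cleanup call in a `finally` block means that a PoC which hangs or crashes still has its side effects reverted.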

Project Structure

whitebox-pentest/
  .claude-plugin/plugin.json   # Plugin manifest
  agents/                       # 8 autonomous security analysts
  commands/                     # 13 slash commands
  hooks/                        # 4 background automation hooks
  skills/                       # 27 auto-activated knowledge modules
  evals/                        # Prompt/skill trigger and workflow eval definitions
  scripts/
    scan_orchestrator.py        # Main scan pipeline
    run_semgrep.py              # Semgrep wrapper + normalizer
    run_secrets.py              # Secret scanner (gitleaks/trufflehog)
    create_cpg.py               # Joern CPG creation + caching
    batch_verify.py             # Batch CPG verification
    bundle_joern.py             # Script bundler for Joern compatibility
    markdown_report.py          # Report generator
    artifact_utils.py           # Findings schema, SARIF, CVSS, dedup
    prompt_artifacts.py         # Audit plan, review ledger, and state validators
    validate_evals.py           # Prompt eval definition validator
    run_prompt_evals.py         # Prompt/skill benchmark runner
    tool_runners/               # Modular tool runner package
    joern/                      # 15 CPG verification scripts

License

MIT

About

AI-powered whitebox penetration testing plugin for Claude Code. 9 languages, 27 skills, 8 autonomous agents. STRIDE threat modeling, OWASP 2025 coverage, polyglot monorepo support.
