This repository is the walkthrough demo for the execution-evidence path.
It is the guided walkthrough surface across the stack, not the canonical architecture hub and not the canonical evidence-profile spec.
- Architecture -> digital-biosphere-architecture
- Evidence -> agent-evidence
- Audit -> aro-audit
This repo proves the path, agent-evidence is the evidence substrate, and
aro-audit is the audit control plane.
Fastest local run:
python3 -m demo.agent
Fastest enterprise sandbox artifact chain:
python3 examples/enterprise_sandbox_demo/run.py
The sandbox run writes artifacts/enterprise_sandbox_demo/ with:
- intent.json
- policy.json
- trace.jsonl
- sep.bundle.json
- replay_verdict.json
- audit_receipt.json
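A quick way to confirm that the sandbox run produced the full chain is to check for the six files named above. The following is a minimal sketch, not part of this repository: the helper name `missing_artifacts` is hypothetical, and the script assumes it is run from the repository root after the sandbox demo has completed.

```python
from pathlib import Path

# File names as read from the artifact list above.
EXPECTED_ARTIFACTS = [
    "intent.json",
    "policy.json",
    "trace.jsonl",
    "sep.bundle.json",
    "replay_verdict.json",
    "audit_receipt.json",
]


def missing_artifacts(out_dir):
    """Return the expected artifact names that are absent from out_dir."""
    root = Path(out_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed output location, taken from the sentence above.
    missing = missing_artifacts("artifacts/enterprise_sandbox_demo")
    print("missing:", missing if missing else "none")
```

The check only tests for file presence. It says nothing about whether the contents are valid.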
- digital-biosphere-architecture
- persona-object-protocol
- agent-intent-protocol
- token-governor
- fdo-kernel-mvk
- aro-audit
- agent-evidence
- active walkthrough demo
- research annexes remain secondary to the demo path
- not a canonical implementation repo
Shared doctrine:
Sandbox controls execution; portable evidence verifies execution.
- Governance decides what should be allowed.
- Execution integrity proves what actually happened.
- Audit evidence exports artifacts for independent review.
flowchart LR
Persona["Persona (POP)"] --> Intent["Intent Object (AIP)"]
Intent --> Governance["Governance Check"]
Governance --> Trace["Execution Trace"]
Trace --> Audit["Audit Evidence (ARO)"]
- a portable persona-oriented entry point can be projected into runtime
- explicit intent and action objects can be emitted before execution
- result objects can be emitted after execution
- execution steps can be recorded as inspectable evidence
- audit-facing artifacts can be exported as bounded outputs
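The ordering in the list above (intent before execution, result after, every step recorded, one bounded export at the end) can be illustrated with a small sketch. This is not the repository's implementation: `EvidenceRun`, `run_task`, `export_audit`, and all field names are hypothetical and do not reflect the POP, AIP, or ARO schemas.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class EvidenceRun:
    """Hypothetical in-memory record of one walkthrough run."""
    persona: str
    intent: dict = field(default_factory=dict)
    result: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)

    def record(self, step, detail):
        # Each execution step becomes one inspectable trace entry.
        self.trace.append({"seq": len(self.trace) + 1, "step": step, "detail": detail})


def run_task(persona, goal, action):
    """Run one action while emitting intent, trace, and result objects."""
    run = EvidenceRun(persona=persona)
    # The intent object is emitted before anything executes.
    run.intent = {"persona": persona, "goal": goal}
    run.record("intent_emitted", goal)
    output = action(goal)
    run.record("action_executed", output)
    # The result object is emitted after execution.
    run.result = {"goal": goal, "output": output, "status": "ok"}
    run.record("result_emitted", "ok")
    return run


def export_audit(run):
    """Serialize the run as one bounded, audit-facing JSON document."""
    return json.dumps(asdict(run), sort_keys=True, indent=2)


if __name__ == "__main__":
    demo = run_task("analyst", "summarize notes", lambda goal: goal.upper())
    print(export_audit(demo))
```

Keeping the trace as an append-only list with sequence numbers makes the order of events checkable by a reviewer who only sees the exported document.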
- Persona Layer -> POP-aligned persona context carried into the run
- Interaction Layer -> intent, action, and result objects emitted under interaction/
- Governance Layer -> referenced as the control checkpoint for runtime policy and budget constraints
- Execution Integrity Layer -> runtime execution trace and verifiable execution context
- Audit Evidence Layer -> ARO-style exported evidence artifacts
This repository does not claim a full Token Governor integration. It demonstrates a minimal aligned path across the broader stack, with explicit governance checkpoint references in the emitted interaction and result objects.
It now also includes one fixed enterprise sandbox artifact chain for the
scenario organize client visit notes -> generate weekly report -> request approval,
while still not claiming a general full-stack Token Governor integration.
This demo is a guided path across layers. It is not the normative specification for each layer, and it points outward to the canonical repositories for those layers: digital-biosphere-architecture, persona-object-protocol, agent-intent-protocol, token-governor, and aro-audit.
See docs/execution-evidence-demo-note.md.
Repo-tracked sample bundle:
- interaction/intent.json
- interaction/action.json
- interaction/result.json
- evidence/example_audit.json
- evidence/result.json
- evidence/sample-manifest.json
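Because the sample bundle is tracked and deterministic, one simple way to compare two checkouts is to compute a digest per file. The sketch below is an illustration under stated assumptions: `digest_bundle` and `sha256_of` are hypothetical helpers, and the code does not read or validate evidence/sample-manifest.json, whose schema is not described here.

```python
import hashlib
from pathlib import Path

# Paths as read from the tracked sample bundle list above.
SAMPLE_FILES = [
    "interaction/intent.json",
    "interaction/action.json",
    "interaction/result.json",
    "evidence/example_audit.json",
    "evidence/result.json",
    "evidence/sample-manifest.json",
]


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_bundle(root, files=SAMPLE_FILES):
    """Map each existing bundle file under root to its SHA-256 digest."""
    base = Path(root)
    return {name: sha256_of(base / name) for name in files if (base / name).is_file()}


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for name, value in digest_bundle(".").items():
        print(value, name)
```

Files that are missing are skipped, not reported as errors, so an empty result means none of the listed paths were found under the given root.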
Additional tracked example:
evidence/crew_demo_audit.json
Current concrete examples in this repository include:
- docs/quick-walkthrough.md
- docs/interaction-flow.md
- docs/shortest-validation-loop.md
bash scripts/run_demo.sh
This local wrapper writes fresh output under artifacts/demo_output/.
bash scripts/run_demo.sh
make killer-demo
python3 -m http.server --directory docs 8000
The receipt for the enterprise sandbox chain is checked through the canonical ARO surface aro_audit.receipt_validation with the minimal profile.
bash scripts/setup_framework_venv.sh
.venv/bin/python crew/crew_demo.py
Environment notes:
- Python 3 is sufficient for the minimal local path.
- Refresh the tracked deterministic sample bundle with python3 scripts/refresh_demo_samples.py.
- The optional CrewAI and LangChain paths should run from a git-ignored local .venv/ created by scripts/setup_framework_venv.sh.
- The pinned framework helper environment currently uses crewai 1.10.1, langchain 1.2.12, and langchain-core 1.2.18.
- CrewAI currently requires Python <3.14.
- Both demo paths use deterministic local mock data and do not require external API calls.
- The Mermaid render workflow opens PRs to main only through a dedicated GitHub App.
- Configure repository variable PROTOCOL_BOT_APP_ID and repository secret PROTOCOL_BOT_PRIVATE_KEY under Settings -> Secrets and variables -> Actions.
- The default repository GITHUB_TOKEN remains read-only and is not used for auto-PR promotion.
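The Python version bound in the environment notes can be checked before setting up the optional framework environment. This is a minimal sketch: the function name `crewai_supported` is hypothetical, and the only fact it encodes is the `<3.14` bound stated above.

```python
import sys

# Exclusive upper bound taken from the environment note above.
CREWAI_MAX_EXCLUSIVE = (3, 14)


def crewai_supported(version_info=sys.version_info):
    """Return True when the interpreter's major.minor is below the bound."""
    return tuple(version_info[:2]) < CREWAI_MAX_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if crewai_supported():
        print(f"Python {current}: within the CrewAI bound")
    else:
        print(f"Python {current}: CrewAI path needs Python <3.14")
```

The minimal local path does not need this check, since the notes state that any Python 3 is sufficient there.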
This repository now includes a paper-ready evaluation harness for
Execution Evidence Architecture for Agentic Software Systems: From Intent Objects to Verifiable Audit Receipts.
Primary entry points:
- make eval-baseline
- make eval-evidence
- make eval-external-baseline
- make eval-framework-pair
- make eval-langchain-pair
- make eval-ablation
- make falsification-checks
- make human-review-kit
- make review-sample
- make compare
- make paper-eval
- make top-journal-pack
Supporting material:
- Task Suite
- Export Format
- Review Workflow
- Comparison Workflow
- External Baseline
- Same-Framework Comparison
- LangChain Comparison
- Ablation Study
- Human Review Study
- Falsification Workflow
Generated outputs:
- artifacts/runs/<task_id>/<mode>/
- docs/paper_support/comparison-summary.md
- docs/paper_support/comparison-summary.csv
- artifacts/metrics/comparison-summary.json
- docs/paper_support/external-baseline-summary.md
- docs/paper_support/framework-pair-summary.md
- docs/paper_support/langchain-pair-summary.md
- docs/paper_support/ablation-summary.md
- docs/paper_support/falsification-summary.md
- artifacts/human_review/synthetic-review-summary.json
The repository also includes a manuscript draft grounded in the current implemented harness and checked-in metrics:
- paper/latex/README.md
- paper/latex/main.tex
- paper/latex/main.pdf after local compilation
- digital-biosphere-architecture - system overview and canonical architecture hub
- persona-object-protocol - portable persona object layer
- agent-intent-protocol - semantic interaction layer
- token-governor - runtime governance and budget-policy control layer
- aro-audit - audit evidence and conformance-oriented verification layer
- interaction/ for explicit interaction objects
- evidence/ for audit and result artifacts
- demo/ and crew/ for runnable entry points
- integration/ for persona and intent adapters
- docs/spec/ for schema notes and example payloads