How it works¶
readme2demo is a pipeline of small, resumable stages orchestrated over a crash-safe manifest.json. Each stage reads and writes the run directory, so a failed run resumes exactly where it stopped.
repo URL / guide → ingest & plan → agent run (in Docker) → normalize transcript
→ distill minimal path → VERIFY replay in a fresh container
→ tutorial + step_by_step.md → render VHS demo video
The stages¶
- ingest — clone the repo (if any), collect the README and docs, and run a single planner pass that emits a machine-checkable plan with a feasibility verdict. Infeasible quickstarts (needs GPUs, credentials, a GUI) fail here for pennies, before any agent time is spent.
- agent — the chosen agent engine runs inside a hardened sandbox until the quickstart works, blocks, or runs out of turns. When a step-by-step guide is present, the agent must execute every step.
- normalize — pure, deterministic Python turns the messy transcript into a structured command log. No LLM calls.
- distill — one LLM pass reduces the run to the minimal reproduction path and writes
commands.sh. The grounding validator runs here. - verify — the moat.
commands.shis replayed in a brand-new container that the agent never touched. This is the only source of the "verified" verdict. - tutorial — finalizes
step_by_step.mdwith the verified outputs. - render — builds the VHS tape from the finalized
step_by_step.md, so the video provably follows the published guide, and records it asdemo.mp4/demo.gif.
The grounding invariant¶
The LLM never publishes anything a fresh container did not independently execute.
This is the single most important rule in the system, and it is enforced in code, not prompts. Every command the distiller emits must fuzzy-match a command that actually succeeded in the agent's log; anything else is a violation. The fresh-container replay in the verify stage is what earns the verification badge that every tutorial carries:
✅ Verified on <date> · image <digest> · commit <sha>
If the replay does not pass, the output ships loudly labeled ⚠ UNVERIFIED — it is never silently published.
Security model¶
READMEs are untrusted code, so the agent runs inside a container hardened with cap-drop ALL, no-new-privileges, non-root execution, and memory/CPU/PID limits. That container is the permission boundary. See the Security model page for the threat model and known tradeoffs.