# EvanFlow – A TDD-Driven Feedback Loop for Claude Code
EvanFlow A TDD-driven iterative feedback loop for software development with Claude Code. 16 cohesive skills + 2 custom subagents walk an idea from brainstorm through implementation, with checkpoints throughout where you stay in control. One entry point: say "let's evanflow this" and the orchestrator runs the loop. brainstorm → plan → execute (sequential or parallel) → tdd → iterate → STOP The loop is conductor, not autopilot: real checkpoints at design approval, plan approval, and after iteration. The agent stops short of every git operation and waits for your direction. No auto-commits. No forced ceremony. No "must invoke a skill" tax. Quick Install The recommended path — Claude Code's plugin marketplace: /plugin marketplace add evanklem/evanflow /plugin install evanflow@evanflow Restart, then try: "Let's evanflow this — I want to add a small feature that does X." evanflow-go fires and walks the loop. The git-guardrails hook auto-activates with the plugin (no settings.json edit needed). Skills appear under the evanflow: namespace (e.g., /evanflow:evanflow-go). See Installation below for two alternative paths. What Makes It a Feedback Loop The loop is built around discipline that compounds across iterations, not single-shot generation. Every step has a checkpoint that gates the next: Brainstorm clarifies intent, proposes 2–3 approaches with embedded grill (stress-test) → you approve the design Plan maps file structure first (deep modules, deletion test) → you approve the plan Execute runs task-by-task with inline verification → blockers stop the loop and surface to you TDD is vertical-slice only: one failing test → minimal impl → repeat. Tests verify behavior through public interfaces, so they survive refactors Iterate re-reads the diff with fresh eyes, runs quality checks, screenshots UI changes, and runs against a Five Failure Modes checklist (hallucinated actions, scope creep, cascading errors, context loss, tool misuse). Hard cap of 5 iterations STOP. Report. 
Await your direction. The agent never auto-commits, never auto-stages, never proposes a PR.

For plans with 3+ truly independent units, the loop forks into a parallel coder/overseer orchestration: one coder per unit (using vertical-slice TDD with a RED checkpoint), one overseer per coder (a read-only review subagent that can't modify code), plus an integration overseer that runs named integration tests at every touchpoint. The integration tests are the executable contract — interfaces can't drift if both sides have to satisfy the same passing test.

## Hard Rules Baked Into the Loop

Several rules come from 2025–2026 industry research on agentic coding failure modes and are baked into every skill:

- **Never invent values** — file paths, env vars, IDs, function names, library APIs. If unsure, the agent stops and asks. (Action-hallucination is the most dangerous agent failure.)
- **Assertion-correctness warning** — research shows 62% of LLM-generated test assertions are wrong. Both `evanflow-tdd` and the overseer review explicitly check whether a one-character bug in the implementation would still let the assertion pass.
- **Watch for context drift** — `evanflow-compact` triggers when symptoms appear (re-asking established questions, contradicting earlier decisions). Industry data: ~65% of enterprise AI coding failures trace to context drift, not raw token exhaustion.
- **Five Failure Modes pass in iterate + overseer review** — explicit check against hallucinated actions, scope creep, cascading errors, context…
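The assertion-correctness check above is concrete enough to sketch. A minimal toy example (the `is_adult` functions are hypothetical illustrations, not EvanFlow code): a weak assertion that a one-character bug would still satisfy, versus a boundary assertion that catches the same mutation.

```python
# Toy illustration of the assertion-correctness check: would a
# one-character bug in the implementation still let the assertion pass?

def is_adult(age: int) -> bool:
    """Correct implementation."""
    return age >= 18

def is_adult_mutant(age: int) -> bool:
    """One-character mutation: '>=' became '>'."""
    return age > 18

# Weak assertion: tests far from the boundary, so the mutant passes
# too. This is the kind of assertion the overseer review flags.
assert is_adult(30)
assert is_adult_mutant(30)  # mutant survives the weak test

# Strong assertion: pins the boundary value, so the mutant is caught.
assert is_adult(18)
assert not is_adult_mutant(18)  # the one-character bug is detected
```

This is the core idea of mutation testing applied as a review heuristic: an assertion only earns its keep if a plausible small bug would make it fail.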
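The vertical-slice TDD discipline described above (one failing test, then the minimal implementation, with tests touching only the public interface) can be sketched as follows. This is a generic toy example, not EvanFlow code; `slugify` and its tests are hypothetical names.

```python
# Vertical-slice TDD, one slice at a time (hypothetical example).

# Slice 1 -- RED: this test fails until slugify exists.
def test_lowercases():
    assert slugify("Hello") == "hello"

# Slice 1 -- GREEN: the minimal implementation, nothing speculative.
def slugify(text: str) -> str:
    return text.lower()

# Slice 2 -- RED: one new failing test drives the next behavior.
def test_spaces_become_hyphens():
    assert slugify("Hello World") == "hello-world"

# Slice 2 -- GREEN: extend just enough to pass both tests.
def slugify(text: str) -> str:  # redefined per slice for illustration
    return text.lower().replace(" ", "-")

test_lowercases()
test_spaces_become_hyphens()
```

Because both tests go through the public `slugify` interface rather than its internals, a later refactor of the implementation leaves them green, which is the "tests survive refactors" property the loop relies on.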