Experiment · Foundation Complete

Recursive Agent Loop

Patch-and-Retry Sandbox

A recursive agent experiment that writes code, reads the traceback, patches itself, and retries inside a sandbox.

Project Brief

  • Problems run: 7 of 7 solved
  • Avg attempts to green test: 2.0
  • Patcher: Codex CLI via subprocess
  • Exit gate: test pass or retry budget exhausted
01 - Project Brief

Problem, Hypothesis, Outcome.

Summary

A contained patch-and-retry loop where an agent observes its own failures, proposes changes, and keeps iterating until the test passes or the budget runs out.

Problem

One-shot code generation produces brittle results, especially when the real failure only appears after execution.

Hypothesis

If an agent can fail safely, read the failure clearly, and patch iteratively, its reliability improves more than it would through one-shot generation alone.

Outcome

Built and ran the loop against 7 real problems. All 7 solved. Every problem failed on attempt 1 and passed on attempt 2 — including FizzBuzz with two interacting bugs and a merge sort with a subtle index error.

02 - Goals & Stack

What the build was trying to do.

Goals

  • Give the agent a safe place to fail repeatedly.
  • Use tracebacks as a structured feedback signal.
  • Exit only when a real test passes or the retry budget is exhausted.
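The three goals above compose into a single control loop. A minimal sketch of that loop, with hypothetical names (`generate`, `run_test`, `patch` stand in for the real Codex CLI calls; this is not the project's actual code):

```python
# Illustrative patch-and-retry loop: generate once, then iterate on
# failure feedback until the test passes or the budget runs out.

def patch_and_retry(problem, run_test, generate, patch, budget=5):
    """Iterate until the test passes or the retry budget is exhausted.

    run_test(code) -> (passed, feedback)  # feedback = traceback or diff
    Returns (solution, attempts_used) or (None, budget) on exhaustion.
    """
    code = generate(problem)
    for attempt in range(1, budget + 1):
        passed, feedback = run_test(code)
        if passed:
            return code, attempt        # exit gate: a real green test
        code = patch(code, feedback)    # failure output fuels the next try
    return None, budget                 # budget exhausted, no solution
```

The exit gate and the retry budget live in the same place, so the loop can never run unbounded and can never exit on anything weaker than a passing test.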

Technologies Used

Sandboxed execution · Traceback parsing · Patch-loop orchestration · Test gating
03 - Breakdown & Notes

Implementation notes.

Breakdown

The point of this project is not that the agent writes perfect code. The point is that the system gives the agent a contained place to fail, a readable signal about why it failed, and permission to try again. That changes the reliability story completely.

By wrapping generation, execution, traceback parsing, and patching into one controlled loop, the project turns runtime errors into fuel for the next attempt. The success condition is not “looks plausible.” It is “the test actually passed.”
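One way to turn a runtime error into that fuel is to run the candidate in a separate process and hand the patcher the tail of stderr, where Python puts the `ErrorType: message` line. A minimal sketch of this idea (an assumed approach; the project's actual runner and parser may differ):

```python
# Run candidate code in a child process and reduce its failure to a
# compact signal the patcher can act on.
import os
import subprocess
import sys
import tempfile

def run_and_capture(code, timeout=10):
    """Execute candidate code out-of-process; return (ok, signal)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
    finally:
        os.unlink(path)
    if proc.returncode == 0:
        return True, proc.stdout
    # The last non-empty stderr line is typically "ErrorType: message",
    # which is exactly the structured signal the next patch attempt needs.
    lines = [ln for ln in proc.stderr.splitlines() if ln.strip()]
    return False, lines[-1] if lines else "non-zero exit with empty stderr"
```

Running the candidate in a child process also doubles as a cheap sandbox boundary: a crash, hang, or bad write stays in the child.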

Build notes

  • Code runs inside a sandbox so the loop can fail without creating collateral damage.
  • Tracebacks become a structured signal for the next patch attempt.
  • Stop conditions matter as much as retry logic.
  • Test results are used as the final gate rather than vibes.
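The last note covers the subtle case: code that runs cleanly but prints the wrong thing raises no exception, so the gate itself has to manufacture the signal. A small illustration of a test gate that emits a structured expected-vs-got message (illustrative only; the real harness may format its output differently):

```python
# A test gate that produces a patcher-readable signal even when the
# candidate runs without raising: wrong output becomes "expected X, got Y".

def gate(candidate_fn, cases):
    """Run candidate against (args, expected) pairs; return (passed, signal)."""
    for args, expected in cases:
        got = candidate_fn(*args)
        if got != expected:
            return False, f"FAIL {args!r}: expected {expected!r}, got {got!r}"
    return True, "PASS"
```

This is the shape of signal that let the wrong-output cases participate in the loop: no traceback required, just a concrete claim about what the test wanted and what it saw.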

Lessons Learned

The main lesson was that reliability does not come from asking the agent to be smarter in a single shot. It comes from building a system around the agent that makes failure cheap, observable, and correctable.

04 - Analysis

Findings.

01

0 of 7 problems passed on first attempt. Every problem required the loop to fire — confirming that one-shot generation alone would have failed the full batch.

02

All 7 solved on attempt 2. The traceback-as-feedback pattern worked across every error class tested, including wrong-output cases where there was no exception to parse.

03

FizzBuzz had two interacting bugs — a wrong range and a wrong check order — and the patcher resolved both in one patch cycle after reading the structured assertion failure.

Analysis

Attempt Timeline — Failure to Pass Per Problem


7 problems across 5 error classes: SyntaxError, NameError, IndexError, TypeError, and wrong output. Every problem failed on attempt 1 and passed on attempt 2. The loop was driven by real Codex CLI calls — each failed test output was passed directly to the patcher as a structured signal. FizzBuzz carried two interacting bugs (wrong range and wrong check order); the patcher identified and fixed both in a single pass.


Worth a conversation?

If you are exploring agent loops, sandboxes, or self-correcting automation, I am especially interested in those conversations.


You are reaching

John Meyer

Security Engineer → AI

  • Open to roles
  • Contract + consulting
  • Architecture advisory