Cost-Aware Prompt Router
A routing middleware experiment that sends prompts to the cheapest model that can still clear the quality bar.
Project Brief
Summary
A FastAPI middleware layer that inspects task shape and routes requests to the lowest-cost model that can still do the job well.
Problem
Too many AI stacks default to the most expensive model every time, which turns architecture indecision into a cost problem.
Hypothesis
If prompt complexity is classified well enough, model spend can drop significantly without making quality unacceptable.
Outcome
Built and measured a cost-aware routing layer. Smart routing cut spend by 46.4% across a 20-prompt batch. The fallback path fired correctly on the one prompt that fell below the quality floor.
This project treats model choice as an architectural decision instead of a default. Each request is inspected for task type and likely complexity, then routed to the least expensive model that is still likely to succeed. The middleware also records its reasoning so the path is reviewable later.
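The routing decision described above can be sketched as a simple lookup that also records its reasoning. This is an illustrative sketch, not the project's actual code: the task-type names come from this brief, but the table structure, function names, and the premium default are assumptions.

```python
# Hypothetical sketch of the routing decision described in this brief.
# The cheapest model per task type that is still expected to clear the
# quality bar; unknown task types fall back to the premium model.
ROUTING_TABLE = {
    "classify":  "gpt-4.1-nano",
    "translate": "gpt-4.1-nano",
    "extract":   "gpt-4.1-mini",
    "summarize": "gpt-4.1-mini",
    "judge":     "gpt-4o",
    "reason":    "gpt-4o",
}

def route(task_type: str) -> dict:
    """Pick the least expensive viable model and log why."""
    model = ROUTING_TABLE.get(task_type, "gpt-4o")  # assumed premium default
    return {
        "model": model,
        "reason": f"task_type={task_type!r} mapped to {model} by routing table",
    }
```

Recording the reason alongside the choice is what makes the path reviewable later: every routed request carries an explanation of how the call was made.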
That matters because cost control in AI is usually less about procurement and more about disciplined routing. If everything gets the premium model, there is no strategy, only spend.
I started by thinking about token savings. The stronger takeaway was that routing quality is a product design issue as much as an infrastructure issue. People trust the cheaper path only when the system can explain how it made the call.
01
Smart routing cut cost by 46.4% across the full batch. Classify and translate tasks routed to gpt-4.1-nano at 92–96% below brute-force. Extract and summarize routed to gpt-4.1-mini at 84% below.
02
Judge and reason tasks routed correctly to gpt-4o — the router does not over-optimize. It only claims savings where the cheaper model can genuinely pass the quality bar.
03
The fallback path fired once, on a nuanced disambiguation prompt that scored 0.75 against a 0.80 floor for nano. It escalated to mini automatically with a logged reason.
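The escalation behavior can be sketched as a loop over an escalation chain with per-model quality floors. The 0.80 floor comes from this brief; the chain, function names, and the scorer interface are assumptions for illustration.

```python
# Minimal sketch of the quality-floor fallback, assuming a scorer that
# returns a 0-1 quality estimate. Floors and the escalation chain are
# illustrative, not the project's actual configuration.
QUALITY_FLOOR = {"gpt-4.1-nano": 0.80, "gpt-4.1-mini": 0.80}
ESCALATE_TO = {"gpt-4.1-nano": "gpt-4.1-mini", "gpt-4.1-mini": "gpt-4o"}

def answer_with_fallback(prompt, model, run, score):
    """Run on the routed model; escalate with a logged reason whenever
    the response scores below that model's quality floor."""
    log = []
    while True:
        response = run(model, prompt)
        s = score(prompt, response)
        floor = QUALITY_FLOOR.get(model, 0.0)  # premium model has no floor
        if s >= floor:
            log.append(f"{model} passed ({s:.2f} >= {floor:.2f})")
            return response, log
        next_model = ESCALATE_TO[model]
        log.append(f"{model} scored {s:.2f} < floor {floor:.2f}; "
                   f"escalating to {next_model}")
        model = next_model
```

Under this sketch, the disambiguation prompt above would score 0.75 on nano, miss the 0.80 floor, and rerun on mini with the reason logged.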
Analysis
Smart Routing vs. Brute Force — Cost Per Task Type
20 prompts across six task types. Classify and translate route to gpt-4.1-nano (92–96% cheaper than gpt-4o). Extract and summarize route to gpt-4.1-mini (84% cheaper). Judge and reason tasks correctly use gpt-4o — no routing savings claimed where the premium model is genuinely needed. One fallback fired when a nuanced classification prompt scored below the nano quality floor and escalated to mini.
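The headline savings figure is straightforward arithmetic over the batch: total routed spend against the cost of sending every prompt to the premium model. A minimal sketch, with placeholder costs rather than the project's measured figures:

```python
def batch_savings_pct(brute_force_costs, routed_costs):
    """Percent saved by smart routing relative to sending every
    prompt to the premium model (illustrative helper)."""
    brute = sum(brute_force_costs)
    return 100.0 * (brute - sum(routed_costs)) / brute

# Placeholder per-prompt costs: four prompts, each half price when routed.
print(batch_savings_pct([1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]))  # 50.0
```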
[ Connect ]
If you are fighting model cost inflation, routing strategy, or quality-versus-cost tradeoffs, this is exactly the kind of problem I like talking through.
You are reaching John Meyer, Security Engineer → AI.