All Projects
tooling Foundation Complete

Model Routing Middleware

Cost-Aware Prompt Router

A routing middleware experiment that sends prompts to the cheapest model that can still clear the quality bar.

Project Brief

46.4% across 20 prompts
Cost reduction
FastAPI middleware
Routing layer
1 of 20 (working correctly)
Fallback triggers
$9.66 per 10K requests
Projected savings
01 - Project Brief

Problem, Hypothesis, Outcome.

Summary

A FastAPI middleware layer that inspects task shape and routes requests to the lowest-cost model that can still do the job well.

Problem

Too many AI stacks default to the most expensive model every time, which turns architecture indecision into a cost problem.

Hypothesis

If prompt complexity is classified well enough, model spend can drop significantly without making quality unacceptable.

Outcome

Built and measured a cost-aware routing layer. Smart routing cut spend by 46.4% across a 20-prompt batch. The fallback path fired correctly on the one prompt that fell below the quality floor.

02 - Goals & Stack

What the build was trying to do.

Goals

  • Reduce routine model spend without breaking quality.
  • Route requests based on task complexity instead of habit.
  • Preserve a fallback path when a cheaper model misses the mark.

Technologies Used

FastAPI Model routing Cost telemetry Threshold-based fallback
03 - Breakdown & Notes

Implementation notes.

Breakdown

This project treats model choice as an architectural decision instead of a default. Each request is inspected for task type and likely complexity, then routed to the least expensive model that is still likely to succeed. The middleware also records its reasoning so the path is reviewable later.

That matters because cost control in AI is usually less about procurement and more about disciplined routing. If everything gets the premium model, there is no strategy, only spend.

Build notes

  • Prompt shape and task intent drive the route decision.
  • The router keeps a quality threshold rather than assuming cheaper is always better.
  • A fallback path protects the system when the low-cost route fails.
  • Logging is part of the product because trust in the router depends on traceability.

Lessons Learned

I started by thinking about token savings. The stronger takeaway was that routing quality is a product design issue as much as an infrastructure issue. People trust the cheaper path only when the system can explain how it made the call.

04 - Analysis

Findings.

01

Smart routing cut cost by 46.4% across the full batch. Classify and translate tasks routed to gpt-4.1-nano at 92–96% below brute-force. Extract and summarize routed to gpt-4.1-mini at 84% below.

02

Judge and reason tasks routed correctly to gpt-4o — the router does not over-optimize. It only claims savings where the cheaper model can genuinely pass the quality bar.

03

The fallback path fired once, on a nuanced disambiguation prompt that scored 0.75 against a 0.80 floor for nano. It escalated to mini automatically with a logged reason.

Analysis

Smart Routing vs. Brute Force — Cost Per Task Type

Loading chart...

20 prompts across six task types. Classify and translate route to gpt-4.1-nano (92–96% cheaper than gpt-4o). Extract and summarize route to gpt-4.1-mini (84% cheaper). Judge and reason tasks correctly use gpt-4o — no routing savings claimed where the premium model is genuinely needed. One fallback fired when a nuanced classification prompt scored below the nano quality floor and escalated to mini.

[ Connect ]

Worth a conversation?

If you are fighting model cost inflation, routing strategy, or quality-versus-cost tradeoffs, this is exactly the kind of problem I like talking through.

All Projects →

You are reaching

John Meyer

Security Engineer → AI

  • Open to roles
  • Contract + consulting
  • Architecture advisory