The Architecture Behind Cost-Effective AI Agents

Aruna Veerappan is Senior Director of Engineering at Upwork, leading Developer Enablement to reduce friction and boost team productivity.

Engineering leaders are discovering that the hardest part of AI agents isn’t the AI—it’s the architecture underneath.

I learned this firsthand when a quarterly budget disappeared in weeks. Nothing was broken: the models worked and the engineers were strong. But the system hadn’t been designed for cost, and the bill arrived before a single workflow reached production.

The root cause: we were pointing expensive models at every task. Verifying file existence. Checking ownership against APIs. Routing logic that could have been a single if-statement. Each call seemed reasonable. The cumulative cost was not.

I’ve come to call this the Agent Cost Spiral—and engineering teams across the industry are running into it right now.

“An Agent Cost Spiral isn’t an AI problem. It’s an architecture problem. And once you see it, you can’t unsee it.”

This pattern has a precedent. A decade ago, teams migrated to the cloud chasing savings, then watched their bills explode past on-premise costs. The architecture was the problem, not the technology. AI inference costs follow the same arc. The fix: stop treating inference like a utility and start treating it like an engineering problem.

Tiered Architecture Every Agentic System Needs

A well-built AI agent isn’t a single model receiving a single prompt. It’s a choreographed system where each task is matched to the minimum level of intelligence required to complete it well.

Tier 1: The Deterministic Skeleton—Just Use Code

If your process follows a fixed rule—“if a customer’s order exceeds $5,000, route to a Senior Rep”—you don’t need AI. You need a conditional statement. Enterprise teams routinely spend real money asking frontier models to handle basic routing logic, and the cost problem is the smaller concern. AI is probabilistic, which means even a capable model can get a simple rule wrong some percentage of the time. For business logic that must be consistent 100% of the time, probabilistic is another word for broken. Build your guardrails in code. Let AI operate within them.
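As a minimal sketch, the routing rule quoted above fits in a few lines of Python. The `Order` type, the queue names and the constant name are hypothetical, chosen only for illustration; the $5,000 threshold comes from the example rule.

```python
from dataclasses import dataclass

# Threshold taken from the example rule; the name is an assumption.
SENIOR_REP_THRESHOLD_USD = 5_000


@dataclass(frozen=True)
class Order:
    """Hypothetical order record used only for this illustration."""
    order_id: str
    total_usd: float


def route_order(order: Order) -> str:
    """Return a queue name using a fixed, deterministic rule (no model call)."""
    # "Exceeds" is read as strictly greater than the threshold.
    if order.total_usd > SENIOR_REP_THRESHOLD_USD:
        return "senior_rep"
    return "standard_rep"
```

The same input always produces the same route, which is the property a probabilistic model cannot guarantee.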

Tier 2: The Workhorse Models—Cheap, Fast and Good Enough

Summarizing documents. Extracting fields from structured data. Reformatting outputs. These are real, valuable tasks—but they don’t require a frontier model. Smaller “flash” models handle these workloads at roughly 1% of the cost of a premium model. If you’re using a frontier model for this work, you’re not just overpaying—you’re slowing down your pipeline.

Tier 3: The Frontier Model—Reserve It For What It’s Good At

Top-tier models are extraordinary at synthesis: taking conflicting information from multiple sources and producing nuanced, well-reasoned output. That’s where the cost is justified. The mistake is giving them everything else too. When you feed a frontier model thousands of lines of raw, unfiltered context, two bad things happen—costs spike and quality drops. The right move is to let Tier 2 do the reading and summarizing, then hand a clean, pre-processed brief to your Tier 3 model. You’re paying for reasoning, not retrieval.
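One way to make the three tiers explicit is a lookup that assigns each task type the minimum tier it needs. The sketch below is an assumption about how that could look, not a prescribed design; the task labels are hypothetical and a real system would define its own.

```python
from enum import Enum


class Tier(Enum):
    CODE = 1       # deterministic rules, no model call
    WORKHORSE = 2  # small, cheap model
    FRONTIER = 3   # premium model, reserved for synthesis


# Hypothetical task labels mapped to the minimum tier that handles them well.
TASK_TIERS = {
    "check_file_exists": Tier.CODE,
    "route_by_threshold": Tier.CODE,
    "summarize_document": Tier.WORKHORSE,
    "extract_fields": Tier.WORKHORSE,
    "reformat_output": Tier.WORKHORSE,
    "synthesize_conflicting_sources": Tier.FRONTIER,
}


def select_tier(task_kind: str) -> Tier:
    """Look up the tier for a task; unknown task kinds fail loudly."""
    try:
        return TASK_TIERS[task_kind]
    except KeyError:
        raise ValueError(f"No tier assigned for task kind: {task_kind!r}") from None
```

Raising on an unknown task, rather than defaulting to the frontier tier, is a deliberate choice in this sketch: new work has to be assigned a tier on purpose instead of silently landing on the most expensive model.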

What This Looks Like In Practice

One of the most common enterprise headaches is keeping technical documentation current. Most teams either let it go stale or throw expensive engineering hours at it.

The Lazy Approach

Send the entire codebase to a premium model and ask for documentation. Cost: ~$15 per service. The model is overwhelmed by irrelevant code, hallucinates configuration details, misses security settings and gets version numbers wrong.

The Architected Approach

This approach can be divided into three tiers:

Tier 1. Code: automatically identify and extract the relevant configuration files. No AI is needed, just pattern matching.

Tier 2. Workhorse Model: summarize those files into a structured brief. Fast, cheap and accurate.

Tier 3. Frontier Model: take the brief and write the final, polished documentation.

“Cost: $0.50 per service. Accuracy: measurably higher. That’s a 30× cost reduction with better output—not a trade-off.”

The quality improvement isn’t incidental—it’s structural. The frontier model performs better because it’s receiving cleaner input. You’ve set it up to succeed.
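The three-tier documentation flow can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation: the file patterns are hypothetical, and the two model calls are passed in as plain functions (`summarize`, `write_docs`) so that no particular vendor API is implied.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Callable, Dict, Iterable, List

# Hypothetical patterns for "relevant configuration files".
CONFIG_PATTERNS = ("*.yaml", "*.yml", "*.toml", "Dockerfile")


def select_config_files(paths: Iterable[str]) -> List[str]:
    """Tier 1: plain pattern matching on file names, no model call."""
    return [
        p for p in paths
        if any(fnmatchcase(PurePosixPath(p).name, pat) for pat in CONFIG_PATTERNS)
    ]


def document_service(
    files: Dict[str, str],
    summarize: Callable[[str], str],
    write_docs: Callable[[str], str],
) -> str:
    """Run the three tiers in order and return the final documentation."""
    relevant = select_config_files(files)
    # Tier 2: one cheap call per relevant file builds a structured brief.
    brief = "\n".join(f"{path}: {summarize(files[path])}" for path in relevant)
    # Tier 3: the premium model sees only the brief, never the raw codebase.
    return write_docs(brief)
```

In use, `files` maps each path to its contents, and `summarize` and `write_docs` wrap whichever workhorse and frontier models a team has chosen. The frontier call receives a short brief rather than the whole repository, which is where both the savings and the cleaner input come from.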

The Staircase Scaling Rule

There’s a second failure mode that hits teams who’ve already built something good. The agent tests well, confidence is high and someone makes the call to run it on everything at once.

High-cost failures almost always trace back to under-validated systems running at scale. The fix is Staircase Scaling—earning the right to scale by proving the system at each step before moving to the next.

Step 1. The Quintet (n=5): run five samples and manually review every output. If the agent fails here, your debugging cost is $2, not $2,000.

Step 2. The Squad (n=15): run a more diverse batch of fifteen. This is where edge cases surface.

Step 3. Full Rollout: only when your Squad pass rate is consistently above 90% should you scale to the full dataset.

This sounds slow. It isn’t. Teams that skip this process lose weeks to remediation. Teams that follow it reach production confidently within days.
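The staircase can be expressed as a simple gate. The sketch below makes several assumptions that the rule itself leaves open: the Quintet must pass in full, the Squad is simply the next fifteen items, and a single Squad run stands in for "consistently." The `run_and_check` callback is hypothetical and represents running the agent on one item and recording whether a reviewer accepted the output.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

QUINTET_SIZE = 5
SQUAD_SIZE = 15
SQUAD_PASS_THRESHOLD = 0.90  # "above 90%" from the rule above


def pass_rate(items: Sequence[T], run_and_check: Callable[[T], bool]) -> float:
    """Fraction of items whose output passed review."""
    if not items:
        raise ValueError("cannot compute a pass rate on an empty batch")
    return sum(1 for item in items if run_and_check(item)) / len(items)


def staircase(dataset: Sequence[T], run_and_check: Callable[[T], bool]) -> str:
    """Return the decision after running the Quintet and, if earned, the Squad."""
    if len(dataset) < QUINTET_SIZE + SQUAD_SIZE:
        raise ValueError("dataset too small for a Quintet and a Squad")
    # Step 1: five samples; this sketch assumes all five must pass.
    quintet = dataset[:QUINTET_SIZE]
    if pass_rate(quintet, run_and_check) < 1.0:
        return "stop_at_quintet"
    # Step 2: the next fifteen items; the pass rate must exceed 90%.
    squad = dataset[QUINTET_SIZE:QUINTET_SIZE + SQUAD_SIZE]
    if pass_rate(squad, run_and_check) <= SQUAD_PASS_THRESHOLD:
        return "stop_at_squad"
    # Step 3: the system has earned the full rollout.
    return "full_rollout"
```

A production version would sample the Squad for diversity rather than taking the next slice, and would repeat the Squad run before treating the pass rate as consistent.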

The Only Metric That Matters

Here’s what separates teams genuinely automating from teams just shifting work around: Cost per Successful Output (CSO)—not cost per API call or tokens consumed, but cost per output that clears your quality bar without human correction.

If a senior engineer spends three hours cleaning up AI-generated documentation that cost $500 to produce, nothing was automated. The work simply moved—with frustration on top. The real test is whether your CSO is lower than the cost of a human doing the same task well. Everything else is theater.
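As a rough sketch, CSO is total spend divided by the number of outputs accepted without human correction. The function and parameter names below are hypothetical, and the figures in the usage note are illustrative rather than measured.

```python
import math


def cost_per_successful_output(total_spend_usd: float, accepted_without_correction: int) -> float:
    """Total spend divided by outputs that cleared the quality bar unaided."""
    if total_spend_usd < 0 or accepted_without_correction < 0:
        raise ValueError("spend and accepted count must be non-negative")
    if accepted_without_correction == 0:
        # Money was spent and nothing usable came out.
        return math.inf
    return total_spend_usd / accepted_without_correction


def is_automating(cso_usd: float, human_cost_per_output_usd: float) -> bool:
    """The test from the text: CSO must be lower than a human doing the task well."""
    return cso_usd < human_cost_per_output_usd
```

For example, if 100 outputs cost $50 in total and 80 of them are accepted without correction, CSO is $0.625 per output, not the $0.50 per call the invoice suggests. In the documentation scenario above, $500 of spend with zero outputs accepted as-is yields an infinite CSO, which fails the test against any human baseline.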

Engineering leaders who get this right have made the same shift: they stopped asking “Which model is smartest?” and started asking “What does each task need?” Once the question changes, costs start making sense. Failure modes become predictable. The architecture becomes something you can defend to a CFO.

You don’t need the smartest model. You need the right model for each job—and the discipline to know the difference.

The Agent Cost Spiral is real. It isn’t a reason to pull back—it’s a reason to build deliberately. Get the architecture right first. The ROI will follow.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
