Token Cost Optimization
tags#llm-costs#agents
date2026-06-24
LLM token spend grows quickly once prompts get long, agents loop, or output volume rises. The five levers below cover where most of the cost actually lives.
Core strategies
Prompt caching.Cache the stable system + context prefix so repeat calls only pay for the new turn.
Prompt compression.Summarise or condense long history before sending it back to the model.
Model routing.Send simple turns to a cheap model and only escalate hard ones to the flagship.
Output caps.Pin max_tokens to a realistic ceiling so a runaway generation cannot eat the budget.
Batch API.Move offline workloads to the batch endpoint to get a 50% discount for the same model.
