AI agents burn cash. Recent reports indicate AI task costs can exceed human wages. Microsoft data highlights this inefficiency. Agents loop until completion. Without limits, loops run forever. Your bill spikes. You need hard guardrails.
This tutorial adds a budget breaker to your agent loop. We will track tokens in real-time. We will stop execution when costs hit a threshold.
Step 1: Instrument Token Counting
You cannot manage what you do not measure. Most SDKs hide token usage. You must expose it. Wrap your LLM call. Capture prompt_tokens and completion_tokens.
``python
def track_usage(response):
usage = response.usage
total = usage.prompt_tokens + usage.completion_tokens
return total
`Store this count in a session variable. Do not rely on external dashboards. They update too slowly. You need immediate feedback within the code execution path. Log every transaction to a local database or Redis cache. This ensures you have an audit trail if billing disputes arise later.
Step 2: Define Hard Budget Limits
Set a dollar cap per task. $5.00 is a safe starting point for internal tools. Consumer-facing features need lower limits. Calculate cost per token based on your model. GPT-4o costs roughly $0.03 per 1k input tokens.
`python
MAX_BUDGET_CENTS = 500 # $5.00
COST_PER_TOKEN = 0.00003 # Approximate rate
`Convert tokens to currency immediately. Do not wait for the monthly bill. Compare the running total against your limit before every new API call. Multiply your accumulated token count by the model's specific rate. Different models have different pricing tiers. Hardcode these rates in a config file so you can update them without redeploying code when providers change pricing.
Step 3: Interrupt the Loop on Breach
Agents run in
while loops. Insert a check at the start of every iteration. If the cost exceeds the limit, raise an exception. Return a specific error message to the user. Do not let the agent attempt to "fix" the budget error. It will spend more money trying to explain why it spent too much money.`python
if current_cost > MAX_BUDGET:
raise BudgetExceededError("Task cost limit reached")
``This forces a hard stop. It protects your infrastructure from runaway recursion. Ensure your frontend handles this exception gracefully. Show a clear message stating the task was too complex for the allocated budget.
Common pitfalls
Developers often ignore context window bloat. Agents pass full history to every turn. Token counts grow exponentially. Clear old messages after three turns. Another error is forgetting embedding costs. Vector searches charge per query. Include these in your total calculation. Finally, do not set limits too low. If the limit is $0.10, the agent will fail before completing useful work. Test with realistic tasks.
Next step
Once costs are controlled, reduce them. Learn to compress system prompts to cut input tokens by 40%: [Prompt Compression Techniques](https://example.com/prompt-compression)