
AI Tokens: More Volatile and Costly Than You Think

#AI · #LLM · #Cost Optimization · #FinOps · #Software Engineering · #Tech Leadership

When the promise of AI-driven efficiency first hit the mainstream, many of us envisioned a future where tedious tasks evaporated and costs plummeted. We pictured a world where powerful large language models (LLMs) would handle grunt work for pennies, with token usage barely registering on the balance sheet. This comforting narrative, however, is increasingly at odds with reality.

What many companies are now discovering is that token spend isn't a benign rounding error; it's becoming a significant, volatile, and often unpredictable line item in the budget. Ignoring this shift means overlooking a critical financial and architectural challenge that can quickly outpace the cost of even junior human labor, demanding a more mature and disciplined approach to AI adoption.

What AI tokens actually are

At its core, an AI token is the fundamental unit of text (or code, or data) that large language models process. Think of tokens as the building blocks of language that an LLM understands. When you send text to a model or receive a response, that text is first broken down into these tokens by a tokenizer. This process isn't always intuitive; a single word might be one token, or it might be split into multiple sub-word tokens, especially for complex words or punctuation.
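Exact token counts depend on each model's tokenizer, but a widely used rule of thumb for English text is roughly four characters per token. A minimal sketch of that heuristic (the `chars_per_token` ratio is an approximation, not a real tokenizer):

```python
# Rough token estimation using the common "~4 characters per token"
# rule of thumb for English text. This is a heuristic only: real
# BPE-based tokenizers split text differently for each model.

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate the token count of a piece of text."""
    return max(1, round(len(text) / chars_per_token))

prompt = "Generate a Python function to validate email addresses."
print(estimate_tokens(prompt))  # ~14 estimated tokens for this prompt
```

Useful for back-of-the-envelope budgeting; for billing-accurate counts you would use the provider's own tokenizer.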

The core mechanism involves two main types of tokens: input tokens and output tokens. Input tokens are what you send to the model in your prompts and context. Output tokens are what the model generates as its response. Each API call consumes a certain number of input tokens and generates a certain number of output tokens, and most commercial LLM providers charge different rates for each, with output tokens often being significantly more expensive. This differential pricing is a key factor in cost accumulation.
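The cost asymmetry is easy to model. Here is a sketch of per-call accounting with differential pricing; the rates are hypothetical placeholders, not any provider's actual prices:

```python
# Per-call cost with differential input/output pricing.
# Rates below are hypothetical, chosen only to show the asymmetry:
# output tokens are often several times the input rate.

INPUT_RATE_PER_1K = 0.003   # hypothetical $/1K input tokens
OUTPUT_RATE_PER_1K = 0.015  # hypothetical $/1K output tokens (5x input here)

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single API call."""
    return (input_tokens / 1000 * INPUT_RATE_PER_1K
            + output_tokens / 1000 * OUTPUT_RATE_PER_1K)

# One call: 2,000 tokens of prompt/context, 800 tokens generated.
print(round(call_cost(2000, 800), 4))  # 0.018
```

Note that the 800 output tokens cost twice as much as the 2,000 input tokens, which is why verbose model responses dominate many bills.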

Key components

Here's a concrete flow example for a coding assistant agent:

  1. Engineer's request: An engineer provides a natural language prompt like "Generate a Python function to validate email addresses." This becomes input tokens.
  2. Agent's internal processing: The agent might use tools, search internal documentation, or perform multi-step reasoning. Each internal step (the prompt sent to a tool, the tool's result fed back to the agent) adds more input tokens.
  3. Code generation: The LLM generates the Python function and potentially test cases. These are output tokens.
  4. Feedback loop: The agent might then feed its generated code to a linter or test runner (more input tokens) and analyze the results.
  5. Final delivery: If successful, the agent presents the code to the engineer. The total cost is derived from all these input and output tokens consumed throughout the entire interaction.
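The flow above can be sketched as a simple ledger that accumulates tokens across steps. The step names and counts here are illustrative, not measured data:

```python
# Toy accounting of how a single agent interaction accumulates tokens
# across its internal steps. Counts are illustrative placeholders.

from dataclasses import dataclass, field

@dataclass
class TokenLedger:
    input_tokens: int = 0
    output_tokens: int = 0
    steps: list = field(default_factory=list)

    def record(self, step: str, inp: int, out: int) -> None:
        """Add one step's token usage to the running totals."""
        self.input_tokens += inp
        self.output_tokens += out
        self.steps.append((step, inp, out))

ledger = TokenLedger()
ledger.record("engineer prompt", inp=60, out=0)
ledger.record("doc search tool round-trip", inp=900, out=150)
ledger.record("code generation", inp=1100, out=600)
ledger.record("linter feedback loop", inp=700, out=200)
print(ledger.input_tokens, ledger.output_tokens)  # 2760 950
```

Notice that the engineer's 60-token request ballooned to thousands of billable tokens; this multiplier is what makes agentic workflows so much more expensive than single prompts.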

Why engineers choose it

Engineers don't choose AI tokens directly; they choose the capabilities that AI offers, which happen to be metered in tokens. The motivation is clear: leverage. A single well-crafted prompt can produce in seconds what would otherwise take hours of manual work, and that leverage is what makes the spend feel justified even as it grows.

The trade-offs you need to know

The power of AI tokens, like any powerful tool, comes with its own set of complexities. It doesn't remove challenges; it often shifts them, introducing new considerations for architecture and budget.

When to use it (and when not to)

Navigating the landscape of AI token usage requires strategic thinking, not just opportunistic adoption. Knowing when to lean in and when to hold back is key to both cost efficiency and engineering integrity.

Use it when:

Avoid it when:

Best practices that make the difference

Effectively managing AI token costs and maximizing the value of LLM integration isn't about avoiding AI; it's about applying sound engineering discipline to a new class of computing.

Model Tiering and Selection

The "best" model isn't always the right model for every task. Implement a strategy where you use the least expensive model that can reliably achieve the desired outcome. For simple classifications or summarizations, a smaller, cheaper model might suffice. Reserve the more powerful, expensive frontier models for complex tasks requiring advanced reasoning or creativity, such as multi-step agentic workflows or sophisticated content generation. Continuously evaluate and switch models as their capabilities and pricing evolve.
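A tiering strategy can start as something as simple as a lookup from task type to model. A minimal sketch, where the model names and tier assignments are hypothetical placeholders:

```python
# Minimal model-tiering router: pick the cheapest model whose
# capability tier covers the task. Model names are hypothetical.

TIERS = {
    "classification": "small-model",       # cheap, fast
    "summarization": "small-model",
    "code_generation": "mid-model",
    "agentic_workflow": "frontier-model",  # reserved for complex reasoning
}

def select_model(task_type: str) -> str:
    # Default unknown task types to the mid tier rather than
    # paying frontier prices by accident.
    return TIERS.get(task_type, "mid-model")

print(select_model("classification"))    # small-model
print(select_model("agentic_workflow"))  # frontier-model
```

Keeping the routing table in config rather than code makes it easy to re-point tasks as model capabilities and prices evolve.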

Implement Cost Observability and Tagging

Just as with cloud resources, visibility into token consumption is non-negotiable. Integrate robust logging and monitoring for all API calls to LLMs, capturing input/output token counts, model used, and associated metadata. Utilize tags or labels to attribute token spend to specific teams, projects, or features. This granular data allows you to identify cost centers, understand usage patterns, and forecast future expenses more accurately, turning an opaque expense into an auditable one.
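In practice this can be as lightweight as emitting one structured log line per LLM call. A sketch of such a usage logger (the field names and tag scheme are assumptions, not a standard):

```python
# Token-spend observability sketch: emit a structured log record per
# LLM call with counts, model, and attribution tags, so spend can be
# sliced by team, project, or feature downstream.

import json
import time

def log_usage(model: str, input_tokens: int, output_tokens: int,
              tags: dict) -> str:
    """Build and emit one structured usage record; returns the JSON line."""
    record = {
        "ts": time.time(),
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        **{f"tag.{k}": v for k, v in tags.items()},
    }
    line = json.dumps(record, sort_keys=True)
    print(line)  # in production, ship to your log pipeline / metrics store
    return line

log_usage("mid-model", 1200, 450,
          tags={"team": "platform", "feature": "pr-summaries"})
```

Once every call is tagged this way, identifying cost centers becomes a query over your logs instead of a forensic exercise.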

Optimize Prompt Engineering for Efficiency

Tokens aren't free, so every character in a prompt counts. Practice concise and precise prompt engineering. Focus on clearly articulating the task without unnecessary verbosity. Experiment with different prompt structures, few-shot examples, and fine-tuning where appropriate to achieve better results with fewer tokens. This includes techniques like summarization of past turns in conversational agents or using structured data formats (like JSON) that are often more token-efficient than verbose natural language.
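One of the techniques above, trimming conversation history to a token budget, can be sketched as follows, using the crude ~4 chars/token heuristic as a stand-in for a real tokenizer:

```python
# Sketch of history trimming for conversational agents: keep only the
# most recent turns that fit a token budget. Uses a crude 4-chars-per-
# token estimate; a real implementation would use the model's tokenizer.

def trim_history(turns: list[str], budget_tokens: int) -> list[str]:
    """Return the newest turns that fit within budget_tokens."""
    kept, used = [], 0
    for turn in reversed(turns):       # walk from newest to oldest
        cost = max(1, len(turn) // 4)  # crude token estimate
        if used + cost > budget_tokens:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))        # restore chronological order

history = ["old context " * 50, "recent question?", "latest answer."]
print(trim_history(history, budget_tokens=20))
# ['recent question?', 'latest answer.']
```

Summarizing the dropped turns into a short synopsis, rather than discarding them outright, is a common refinement when older context still matters.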

Maintain Human-in-the-Loop and Verification Layers

AI is a powerful amplifier, not a flawless autonomous system. For any critical workflow, integrate a human-in-the-loop (HITL). This means human review, editing, and approval of AI-generated content or actions before they impact production. Build automated verification layers using traditional code analysis, tests, and static checks to validate AI output. This combination ensures quality, prevents costly errors, and protects against the unpredictable nature of LLMs, making token spend a calculated investment rather than a gamble.
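An automated verification layer for AI-generated Python might parse the code and run smoke tests before it ever reaches a human reviewer. A minimal sketch; the `is_valid_email` function name and checks are hypothetical examples, and production gates would add linting, sandboxing, and full test suites:

```python
# Verification-layer sketch for AI-generated Python: reject output
# that fails to parse or fails a smoke test, before human review.

import ast

def verify_generated_code(source: str) -> bool:
    """Gate generated code: must parse and pass minimal smoke tests."""
    try:
        ast.parse(source)  # reject syntactically broken output cheaply
    except SyntaxError:
        return False
    namespace: dict = {}
    exec(source, namespace)  # NOTE: only safe for sandboxed/trusted code
    fn = namespace.get("is_valid_email")
    # Smoke tests before the human-in-the-loop sign-off.
    return callable(fn) and bool(fn("a@b.com")) and not fn("not-an-email")

generated = """
def is_valid_email(s):
    return "@" in s and "." in s.split("@")[-1]
"""
print(verify_generated_code(generated))  # True
```

Gates like this catch the cheap failures automatically, so human reviewers spend their attention on correctness and design rather than syntax errors.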

Wrapping up

The initial allure of AI tokens as a magically cheap resource is giving way to a more nuanced, realistic understanding. They are not merely an infrastructure utility; they represent a new, highly variable, and often opaque form of computational labor. For professional software engineers and tech leaders, this means token spend requires the same rigorous scrutiny, architectural consideration, and financial discipline as any other significant budget item.

Treating tokens as a strategic cost, not a rounding error, is the only sustainable path forward. Implement robust observability, optimize your prompts, right-size your models, and critically, always maintain human oversight. The goal isn't to shy away from AI, but to wield its immense power with intelligence and accountability. By doing so, we can truly harness AI's potential to amplify our engineering capabilities, rather than letting its hidden costs erode our budgets and trust.



AI Tokens: More Volatile and Costly Than You Think | Antonio Ferreira