Guide to Artificial Intelligence Terminology and Costs

AI as a Measurable Resource: Beyond the "Magic"
In today's technology landscape, interacting with advanced models like Claude can feel like a magical process. However, for a professional or business decision-maker, it's essential to shift perspective: Artificial Intelligence is not an abstraction, but a quantifiable computational resource, similar to electricity or bandwidth consumption.
Understanding the parameters that govern this consumption is not merely a technical exercise, but a fundamental FinOps (Financial Operations) skill. Being able to decipher the terms that follow will allow you to predict costs, optimize performance, and ensure that your AI projects remain sustainable and scalable.
The Token: The Atom of Information
The fundamental unit of measure in AI is not the word or the character, but the token. A token represents a segment of text (a word, part of one, or punctuation). For planning purposes, keep this ratio in mind: 1 million tokens (MTok) equals approximately 750,000 words.
The cost structure distinguishes between what we send and what we receive, whether the model is accessed through the API or through tools like Claude Code:
| Concept | Description | Cost Logic |
|---|---|---|
| Input Token | Data, instructions, and documents provided by the user. | Lower cost: the model is "reading" existing information. |
| Output Token | Text or code generated by the AI. | Higher cost (up to 5x): requires active computational effort for creating new content. |
FinOps Strategy: Adopting concise and precise writing is not just good communication practice, but a direct cost-saving strategy. Fewer unnecessary input tokens translate into immediate budget savings.
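As a rough planning aid, the word-to-token ratio above can be turned into a small helper. This is a heuristic for budgeting only, not Anthropic's actual tokenizer, and real token counts vary by language and content:

```python
# Planning heuristic based on the ~750,000 words per 1M tokens ratio above.
# Real tokenization differs; use this only for order-of-magnitude estimates.
WORDS_PER_TOKEN = 0.75

def estimate_tokens(word_count: int) -> int:
    """Approximate token count for a given number of words."""
    return round(word_count / WORDS_PER_TOKEN)

def estimate_words(token_count: int) -> int:
    """Approximate word count for a given number of tokens."""
    return round(token_count * WORDS_PER_TOKEN)
```

For example, a 30,000-word technical manual is roughly 40,000 input tokens, which you can then price against the tables below.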
Model Families: Opus, Sonnet, and Haiku
Anthropic organizes the Claude series (versions 4.5 and 4.6) into three main tiers.
Prices are expressed per Million Tokens (MTok).
| Model | Usage Profile | Input (per MTok) | Output (per MTok) |
|---|---|---|---|
| Opus 4.6 | The most powerful "brain" for complex reasoning. | $5.00 | $25.00 |
| Sonnet 4.6 | The best trade-off between quality and speed. | $3.00 | $15.00 |
| Haiku 4.5 | Extreme speed and minimal costs for massive volumes. | $1.00 | $5.00 |
Note on Opus 4.6 "Fast Mode": For critical operations requiring the lowest possible latency, Opus offers an accelerated mode. However, speed comes at a premium: costs are multiplied by 6 times ($30 input / $150 output).
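The pricing table translates directly into a simple cost estimator. The model keys below are illustrative labels, and the function deliberately ignores caching, batching, long-context premiums, and Fast Mode, all covered later:

```python
# Standard per-MTok rates (USD) from the table above.
PRICES = {
    "opus-4.6":   {"input": 5.00, "output": 25.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "haiku-4.5":  {"input": 1.00, "output": 5.00},
}

MTOK = 1_000_000

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost at standard rates (no caching, batching, or long context)."""
    rates = PRICES[model]
    return (input_tokens / MTOK) * rates["input"] + (output_tokens / MTOK) * rates["output"]
```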
Context Window: Working Memory and the Long Context "Premium"
The Context Window is the amount of information the AI can keep in mind simultaneously during a session. While the standard window is 200,000 (200K) tokens, Claude 4.6 models now support a Long Context Window of up to 1 million tokens.
⚠️ Critical Cost Warning: For the Sonnet 4.6 model, exceeding the 200K token threshold in a single request triggers premium pricing. It's crucial to note that the entire input is billed at the higher rate (from $3 to $6 for input and from $15 to $22.50 for output) as soon as the limit is exceeded, not just the excess tokens.
Using a large context is justified in specific scenarios:
- Analysis of entire codebases (hundreds of code files).
- Synthesis of massive technical manuals or legal transcripts.
- Searching for correlations in large document archives.
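The retroactive billing rule in the warning above can be sketched as a function. Rates and the 200K threshold are taken from that warning; this illustrates the billing logic, not an official calculator:

```python
# Sonnet 4.6 long-context rule: once input exceeds 200K tokens,
# the ENTIRE request is billed at premium rates, not just the excess.
LONG_CONTEXT_THRESHOLD = 200_000
MTOK = 1_000_000

STANDARD = {"input": 3.00, "output": 15.00}   # USD per MTok
PREMIUM  = {"input": 6.00, "output": 22.50}   # USD per MTok

def sonnet_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single Sonnet 4.6 request, applying premium rates retroactively."""
    rates = PREMIUM if input_tokens > LONG_CONTEXT_THRESHOLD else STANDARD
    return (input_tokens / MTOK) * rates["input"] + (output_tokens / MTOK) * rates["output"]
```

Note the cliff this creates: a request at exactly 200K input tokens costs $0.60 in input, while one at 200,001 tokens jumps to roughly $1.20, so it pays to trim inputs that hover near the threshold.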
Strategic Caching and Batch Processing
To optimize spending on recurring or high-volume projects, two financial efficiency tools are available:
- Prompt Caching: Lets you store frequently reused prompt segments (e.g., company documentation) for later requests.
  - Cache Write: You pay a surcharge to store the data (1.25× the base input rate for a 5-minute cache, 2× for 1 hour).
  - Cache Read: Each time the AI reuses the cached data, you pay only 0.1× the base input rate (a 90% discount).
- Batch API: For non-urgent tasks (asynchronous processing within 24 hours), Anthropic offers a flat 50% discount on all tokens.
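To see why caching pays off, compare one cache write plus many reads against resending the same prefix at full price. The Sonnet 4.6 input rate is used here as an example; the multipliers are the ones listed above:

```python
MTOK = 1_000_000
BASE_INPUT = 3.00        # Sonnet 4.6 input rate per MTok, as an example
CACHE_WRITE_5M = 1.25    # multiplier on the input rate (5-minute cache)
CACHE_READ = 0.10        # multiplier on the input rate (90% discount)

def cached_cost(prefix_tokens: int, reuses: int) -> float:
    """One cache write plus `reuses` cache reads of the same prefix."""
    per_mtok = prefix_tokens / MTOK
    write = per_mtok * BASE_INPUT * CACHE_WRITE_5M
    reads = reuses * per_mtok * BASE_INPUT * CACHE_READ
    return write + reads

def uncached_cost(prefix_tokens: int, reuses: int) -> float:
    """Sending the same prefix at full price on every request."""
    return (1 + reuses) * (prefix_tokens / MTOK) * BASE_INPUT
```

For a 100K-token prefix reused 100 times, the cached path costs about $3.38 versus $30.30 uncached, so the write surcharge is recovered after just a couple of reads.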
Practical Calculation Examples
Let's see how these variables affect the actual budget:
Scenario 1: Customer Support (Sonnet 4.6)
A startup handles a monthly volume of 5M input tokens and 2M output tokens.
Calculation:
- Input: 5 × $3.00 = $15.00
- Output: 2 × $15.00 = $30.00
- Total: $45.00/month
Scenario 2: Massive Document Analysis (Sonnet 4.6)
An analysis requiring 250K input tokens (exceeding the 200K threshold) and generating 1M output tokens.
Calculation (Long Context rates applied retroactively):
- Input: 0.25 × $6.00 = $1.50 (Instead of $0.75)
- Output: 1 × $22.50 = $22.50 (Instead of $15.00)
- Total: $24.00 per single operation
Scenario 3: SEO Automation (Haiku 4.5)
Processing a monthly volume of 20M input tokens and 10M output tokens.
Calculation:
- Input: 20 × $1.00 = $20.00
- Output: 10 × $5.00 = $50.00
- Total: $70.00/month
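The three calculations above can be double-checked with straightforward arithmetic, using the per-MTok rates from the pricing and long-context sections:

```python
MTOK = 1_000_000

# Scenario 1: Sonnet 4.6 at standard rates ($3 in / $15 out per MTok)
s1 = (5_000_000 / MTOK) * 3.00 + (2_000_000 / MTOK) * 15.00

# Scenario 2: Sonnet 4.6 with the long-context premium applied
# retroactively to the whole request ($6 in / $22.50 out per MTok)
s2 = (250_000 / MTOK) * 6.00 + (1_000_000 / MTOK) * 22.50

# Scenario 3: Haiku 4.5 at standard rates ($1 in / $5 out per MTok)
s3 = (20_000_000 / MTOK) * 1.00 + (10_000_000 / MTOK) * 5.00
```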
Agent Skills: Procedural Knowledge
To understand the architecture of a modern AI, we can use a computing analogy: Models are the processors (CPU), the Agent Harness (like Claude Code) is the Operating System, and Skills are the Applications.
Skills are structured packages of instructions and resources that guide the AI in executing specific tasks. Data from the SkillsBench benchmark shows unequivocal results:
- Human-Curated Skills: Increase the success rate by up to +16.2 percentage points. Humans provide that "procedural knowledge" that models don't natively possess.
- AI-Generated Skills: Often ineffective or harmful. Models tend to generate incomplete or imprecise procedures, failing to recognize when specialized expertise is needed.
Principles for Effective Skills:
- Focus: Better to have 2-3 targeted modules than encyclopedic documentation (which consumes budget without benefits).
- Human Instructions: The experience of a domain expert is the only way to overcome the limits of the AI's latent training.
- Verifiability: Every Skill must have clear criteria to allow the AI to self-evaluate.
Best Practices for Conscious Management
Mastery of AI is measured by the ability to balance power and costs. Here are three immediate actions:
- Strategic Model Mixing: Don't use Opus for tasks that Haiku can handle. Reserve "premium" models only for the most critical reasoning steps.
- Consumption Monitoring: Use the Anthropic console to track consumption in real time and set spending limits.
- Context Optimization: Regularly clean up conversation history and use Caching for static data.
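A minimal sketch of Strategic Model Mixing, assuming the caller can classify task complexity up front; the tiers and routing criteria below are illustrative, not official guidance:

```python
# Hypothetical router: cheap, fast models for bulk work; premium models
# reserved for the steps that genuinely need deep reasoning.
def pick_model(complexity: str) -> str:
    routes = {
        "simple": "haiku-4.5",     # classification, extraction, bulk rewriting
        "standard": "sonnet-4.6",  # drafting, coding, everyday analysis
        "complex": "opus-4.6",     # multi-step reasoning, critical decisions
    }
    return routes[complexity]
```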
Efficiency in AI usage is not just about saving money: it is the hallmark of sound instructional and technical design. Data-driven experimentation is the key to turning AI into a real competitive advantage.