AI API Cost Calculator: How to Optimize Your AI Spending
April 17, 2026 · 9 min read
AI API billing is based on tokens, not requests. Understanding how tokens work, how to count them, and how to minimize them is the difference between a $100/month bill and a $10,000/month bill for the same functionality.
How AI API Pricing Works
Every AI API charges separately for input tokens (your prompt) and output tokens (the model's response). The formula is simple:
Cost = (input_tokens / 1,000,000) * input_price + (output_tokens / 1,000,000) * output_price
# Example: GPT-5 via AIPower
# 2,000 input tokens + 500 output tokens
cost = (2000 / 1_000_000) * 3.75 + (500 / 1_000_000) * 22.50
# cost = $0.0075 + $0.01125 = $0.01875 (~$0.019) per request
Token Counting: Rules of Thumb
- 1 token is roughly 4 characters or 0.75 words in English
- 1,000 tokens is roughly 750 words
- A typical chat message: 50-200 tokens
- A system prompt: 200-2,000 tokens
- A full page of text: ~500 tokens
- CJK languages use more tokens per character (Chinese: ~1.5 tokens/character)
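The rules of thumb above can be turned into a quick offline estimator for when a real tokenizer isn't available. This is a sketch using the ~4-characters-per-token heuristic; the function name and rounding are illustrative, and the result is an approximation, not an exact count:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters/token rule for English."""
    return max(1, round(len(text) / 4))

# ~750 English words ("word " * 750 is 3,750 characters)
sample = "word " * 750
print(estimate_tokens(sample))  # close to the 1,000-token rule of thumb
```

For anything billing-critical, prefer a real tokenizer like tiktoken (next section); the heuristic can be off by 20% or more, especially for code and non-English text.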
Count Tokens Programmatically
import tiktoken
def count_tokens(text, model="gpt-4o"):
    """Count tokens for a given text. Works for most OpenAI models."""
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        # Fall back to a recent encoding for models tiktoken doesn't know
        enc = tiktoken.get_encoding("o200k_base")
    return len(enc.encode(text))

# Count before sending
prompt = "Explain the theory of relativity in simple terms."
token_count = count_tokens(prompt)
print(f"Tokens: {token_count}")  # ~11 tokens

# Estimate cost before calling the API
input_price = 0.34  # DeepSeek V3, per M input tokens
estimated_cost = (token_count / 1_000_000) * input_price
print(f"Estimated input cost: ${estimated_cost:.6f}")
Cost Comparison Table (per 1M tokens)
| Model | Input | Output | 1K Requests Cost* |
|---|---|---|---|
| GLM-4 Flash | $0.01 | $0.01 | $0.03 |
| Doubao Pro | $0.06 | $0.11 | $0.17 |
| Qwen Turbo | $0.08 | $0.30 | $0.31 |
| DeepSeek V3 | $0.32 | $0.48 | $0.88 |
| Gemini 2.5 Flash | $0.35 | $2.88 | $2.14 |
| GPT-5 | $2.88 | $17.25 | $14.39 |
| Claude Sonnet 4 | $3.45 | $17.25 | $15.53 |
| Claude Opus 4.6 | $5.75 | $28.75 | $25.88 |
*Estimated for 2K input + 500 output tokens per request.
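The last column follows directly from the formula at the top of the article. A small helper (the function name is ours, prices come from the table rows) reproduces it:

```python
def cost_per_1k_requests(input_price: float, output_price: float,
                         input_tokens: int = 2_000,
                         output_tokens: int = 500) -> float:
    """Cost of 1,000 requests at the given per-million-token prices."""
    per_request = (input_tokens / 1_000_000) * input_price \
                + (output_tokens / 1_000_000) * output_price
    return per_request * 1_000

# DeepSeek V3 row: $0.32 in / $0.48 out
print(f"${cost_per_1k_requests(0.32, 0.48):.2f}")  # $0.88
```

Plugging in your own average input/output token counts gives a much better estimate than the 2K/500 default, since output tokens usually dominate the bill.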
Optimization Strategy 1: Model Tiering
Route requests to the cheapest model that can handle the task:
from openai import OpenAI
client = OpenAI(base_url="https://api.aipower.me/v1", api_key="YOUR_KEY")
def cost_optimized_call(prompt, task_type="general"):
    model_map = {
        "classification": "zhipu/glm-4-flash",      # $0.01/M
        "extraction": "doubao/doubao-pro-256k",     # $0.06/M
        "general": "deepseek/deepseek-chat",        # $0.34/M
        "coding": "anthropic/claude-sonnet",        # $4.50/M
        "reasoning": "deepseek/deepseek-reasoner",  # $0.34/M
    }
    model = model_map.get(task_type, "auto")
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
Optimization Strategy 2: Response Caching
import hashlib, json
cache = {}
def cached_call(messages, model="deepseek/deepseek-chat"):
    # sort_keys makes the hash stable regardless of dict key order
    key = hashlib.md5(json.dumps(messages, sort_keys=True).encode()).hexdigest()
    if key in cache:
        return cache[key]  # Free: no API call
    response = client.chat.completions.create(model=model, messages=messages)
    result = response.choices[0].message.content
    cache[key] = result
    return result

# 30-60% cache hit rate is typical for production apps
Optimization Strategy 3: Prompt Compression
- Trim system prompts: Remove verbose instructions. A short "Be concise" often works as well as a 500-word style guide.
- Limit history: Send last 5-10 messages, not the entire conversation.
- Summarize context: Compress long documents before including them as context.
- Use max_tokens: Cap output length to avoid runaway responses.
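The history-trimming and output-capping ideas above can be sketched in a few lines. This is a minimal illustration; the window size of 10 and the `max_tokens` value of 512 are example choices, not recommendations:

```python
def trim_history(messages, keep_last=10):
    """Keep the system prompt (if any) plus the last `keep_last` messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

# Usage with the client configured earlier:
# client.chat.completions.create(
#     model="deepseek/deepseek-chat",
#     messages=trim_history(conversation),
#     max_tokens=512,  # cap output to avoid runaway responses
# )
```

Keeping the system prompt out of the sliding window matters: dropping it changes the model's behavior mid-conversation, while dropping the oldest user turns usually doesn't.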
Monthly Cost Estimator
| Daily Requests | GLM-4 Flash | DeepSeek V3 | GPT-5 |
|---|---|---|---|
| 100 | $0.09/mo | $2.79/mo | $56.25/mo |
| 1,000 | $0.90/mo | $27.90/mo | $562.50/mo |
| 10,000 | $9.00/mo | $279.00/mo | $5,625/mo |
| 100,000 | $90.00/mo | $2,790/mo | $56,250/mo |
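Figures like these can be reproduced with a one-line projection. A sketch (the function name is ours; it assumes 2K input + 500 output tokens per request and a 30-day month, with the GPT-5 example using the $3.75/$22.50 per-M prices from the worked example earlier):

```python
def monthly_cost(daily_requests: int, input_price: float, output_price: float,
                 input_tokens: int = 2_000, output_tokens: int = 500,
                 days: int = 30) -> float:
    """Projected monthly spend at per-million-token prices."""
    per_request = (input_tokens / 1_000_000) * input_price \
                + (output_tokens / 1_000_000) * output_price
    return per_request * daily_requests * days

# GPT-5 at $3.75 in / $22.50 out, 100 requests/day
print(f"${monthly_cost(100, 3.75, 22.50):.2f}/mo")  # $56.25/mo
```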
Monitor your spending in real time on the AIPower dashboard. Start with 10 free API calls at aipower.me to benchmark costs for your use case.
GET STARTED WITH AIPOWER
16 AI models. One API. OpenAI SDK compatible.
Who should use AIPower?
- Developers needing both Chinese and Western AI models
- Chinese teams that can't access OpenAI / Anthropic directly
- Startups wanting multi-model redundancy through one API
- Anyone tired of paying grey-market intermediary premiums
3 steps to first API call
- Sign up — email only, 10 free trial calls, no card
- Copy your API key from the dashboard
- Change `base_url` in your OpenAI SDK → done
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aipower.me/v1",  # ← only change
    api_key="sk-your-aipower-key",
)
response = client.chat.completions.create(
    model="auto-cheap",  # or anthropic/claude-opus, deepseek/deepseek-chat, openai/gpt-5, etc.
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

+100 bonus calls on first $5 top-up · WeChat Pay + Alipay + card accepted · docs · security