Smart Routing

One call, the right model.

Stop hardcoding gpt-5 for every request.

Use model="auto-cheap" / "auto-code" / "auto-best" and let AIPower pick. Save 60-95% without quality loss.

Try routing free — 2 calls

How it works in 1 line of code:

response = client.chat.completions.create(
    model="auto-code",              # ← router picks Claude Sonnet 4
    messages=[{"role":"user", "content":"Refactor this function..."}],
)

No change to request / response shape. The model field in the response shows which model actually ran, so you can track cost & quality per route.
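As a sketch of what that tracking could look like: the model field is part of the standard chat-completion response, but the route_models counter and record helper below are illustrative, not part of any SDK.

```python
from collections import Counter

# Tally which upstream model served each route, keyed on the `model`
# field that comes back on every chat completion response.
route_models = Counter()

def record(route: str, response_model: str) -> None:
    """Attribute one completed call to (requested route, actual model)."""
    route_models[(route, response_model)] += 1

# In production you would call, e.g.:
#   response = client.chat.completions.create(model="auto-code", messages=[...])
#   record("auto-code", response.model)
# Stubbed values here so the sketch runs standalone:
record("auto-code", "claude-sonnet-4")
record("auto-code", "claude-sonnet-4")
record("auto-cheap", "doubao-pro-256k")

print(route_models[("auto-code", "claude-sonnet-4")])  # 2
```

Aggregating these counts per route gives you the cost and quality breakdown per mode without any extra API surface.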

Six routing modes

Each mode targets a different optimization goal.

General chat / most tasks

model="auto"

🎯

Routes to: DeepSeek V3

Goal: Balance of cost & quality

Cost: ~$0.35/M

Default choice when you don't know which to pick

Batch processing, classification

model="auto-cheap"

💰

Routes to: Doubao Pro 256K

Goal: Lowest possible price

Cost: ~$0.08/M

Classify 1M user messages — cost ~$2 vs $75 on Claude

Realtime chat, suggestions, autocomplete

model="auto-fast"

Routes to: Qwen Turbo

Goal: Fastest time-to-first-token

Cost: ~$0.15/M

Live chatbot where latency > quality
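For latency-sensitive routes like this you would typically pass stream=True and render tokens as they arrive. The chunk shape (choices[0].delta.content) follows the OpenAI streaming format; collect_stream and the stub chunks below are illustrative so the sketch runs without a network call.

```python
from types import SimpleNamespace

def collect_stream(stream) -> str:
    """Concatenate the text deltas from an OpenAI-style chat stream.

    Each chunk carries choices[0].delta.content, which may be None
    (e.g. in the final chunk) and is skipped.
    """
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)

# In production:
#   stream = client.chat.completions.create(
#       model="auto-fast", messages=[...], stream=True)
# Stub chunks standing in for a live stream:
def _chunk(text):
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

demo = [_chunk("Hel"), _chunk("lo!"), _chunk(None)]
print(collect_stream(demo))  # Hello!
```

Rendering each delta as it arrives is what makes the fast route feel fast: the user sees the first token in hundreds of milliseconds instead of waiting for the full reply.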

Writing / refactoring code

model="auto-code"

💻

Routes to: Claude Sonnet 4

Goal: Best coding accuracy (78% SWE-bench)

Cost: ~$10/M

AI-coded features, bug fixing, agentic workflows

High-stakes reasoning

model="auto-best"

🧠

Routes to: Claude Opus 4.6

Goal: Highest quality regardless of price

Cost: ~$18/M

Legal analysis, research synthesis, complex decisions

Demos, experiments, dev scripts

model="auto-free"

🆓

Routes to: GLM-4 Flash

Goal: Near-zero cost

Cost: ~$0.01/M

Prompt engineering experiments, demos

Real cost math — 1M requests/day app

Assume 500 input tokens + 200 output per request (typical chatbot). Here's what you'd pay.

Strategy                                            Cost / day   Cost / month   Savings
Only Claude Opus 4.6                                $5,750       $172,500       baseline
Only GPT-5                                          $3,300       $99,000        -43%
Only DeepSeek V3                                    $310         $9,300         -95%
AIPower smart routing                               $530         $15,900        -91%
  (80% auto-cheap · 15% auto-code · 5% auto-best)

Savings come from sending simple tasks to cheap models and reserving expensive models for where they matter. Same user experience, 10-20× less spend.
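The blended-rate arithmetic can be sketched as a small helper. Everything here is illustrative: daily_cost is a hypothetical function, and the rates are placeholders, since real pricing bills input and output tokens at different rates, so plug in current per-route numbers rather than expecting this simple model to reproduce the table's exact figures.

```python
def daily_cost(requests: int, tokens_per_request: int,
               mix: dict[str, float], rate_per_m: dict[str, float]) -> float:
    """Blended daily spend in dollars.

    mix:        fraction of traffic per route (should sum to 1.0)
    rate_per_m: blended $ per million tokens for each route
    """
    total_tokens_m = requests * tokens_per_request / 1_000_000
    blended_rate = sum(mix[r] * rate_per_m[r] for r in mix)
    return total_tokens_m * blended_rate

# 1M requests/day, 500 input + 200 output tokens each (as above).
# Rates are placeholder per-million figures, not actual pricing.
cost = daily_cost(
    requests=1_000_000,
    tokens_per_request=700,
    mix={"auto-cheap": 0.80, "auto-code": 0.15, "auto-best": 0.05},
    rate_per_m={"auto-cheap": 0.08, "auto-code": 10.0, "auto-best": 18.0},
)
print(round(cost))
```

The structure of the formula is the point: because the expensive routes carry small traffic fractions, the blended rate stays close to the cheap tier's rate.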

Automatic failover

If your primary model returns 5xx, we fall back to a different provider — same request, different upstream. Transparent to your app.

Real providers go down. Last 90 days in 2026:

  • OpenAI — 3 major outages (2h+ each)
  • Anthropic — 2 major outages
  • Google AI — 1 capacity event
  • DeepSeek — 1 peak-hour throttle

Without failover: your app was dead during at least one. With AIPower: degraded latency but stayed up.

# Failover chains by provider family (model names abbreviated)
FALLBACK = {
    "openai":    ["claude-sonnet", "deepseek-chat"],
    "anthropic": ["gpt-5", "deepseek-chat"],
    "deepseek":  ["qwen-plus", "gpt-4o-mini"],
    "qwen":      ["deepseek-chat", "gpt-4o-mini"],
    "google":    ["claude-sonnet", "gpt-5"],
    # ...
}

# Your code stays 1 line:
client.chat.completions.create(
    model="openai/gpt-5",
    messages=[...])
# If OpenAI 5xx, routes to Claude.
# Transparent, logged.

"Why not just build this myself?"

You could. But you'd have to:

  • Maintain accounts with 10 providers (OpenAI, Anthropic, Google, DeepSeek, Qwen, Zhipu, Moonshot, MiniMax, ByteDance, Alibaba)
  • Handle 10 different API formats (OpenAI-compat is not universal)
  • Keep track of when each provider's pricing / rate limits change
  • Monitor uptime per provider and update fallback logic
  • Track which model is best per task-type (benchmarks shift quarterly)
  • Build dashboards for per-model cost/latency observability

Or use AIPower:

  • One account, one API key
  • OpenAI SDK compatible (change 1 line)
  • Pricing updated weekly in the platform
  • Failover built-in
  • Smart routing by scenario
  • /dashboard/analytics shows you everything

Routing, failover, pricing updates, and analytics are included in the managed gateway instead of becoming another internal DevOps project.

Full code example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aipower.me/v1",
    api_key="sk-your-aipower-key",
)

def smart_ask(task_type: str, messages: list):
    """Route based on task type."""
    route_map = {
        "chat":      "auto",            # DeepSeek V3 — balanced
        "batch":     "auto-cheap",      # Doubao Pro — cheapest
        "realtime":  "auto-fast",       # Qwen Turbo — fastest
        "code":      "auto-code",       # Claude Sonnet 4 — best at code
        "hard":      "auto-best",       # Claude Opus — highest quality
        "demo":      "auto-free",       # GLM-4 Flash — near-zero cost
    }
    return client.chat.completions.create(
        model=route_map.get(task_type, "auto"),
        messages=messages,
    )

# Usage
cheap_classification = smart_ask("batch", [...])   # ~$0.001 per call
code_refactor        = smart_ask("code",  [...])   # ~$0.03 per call
important_decision   = smart_ask("hard",  [...])   # ~$0.05 per call
live_chat_response   = smart_ask("realtime", [...]) # ~$0.001 per call, < 500ms

Stop overpaying. Start routing.

One API key. 16 models. 6 routing modes. Save 60-95% vs hardcoding a premium model.

Also: docs / analytics / in-depth blog