AI compute + token cost trends 2024-2026 — unit economics for AI ITSM
Source report: /tmp/ai-compute-token-cost-trends-2026-05-12.md.
TL;DR — token cost is no longer the binding constraint
A realistic 30K-token IT ticket costs:
| Tier | Cost-per-ticket (with caching) |
|---|---|
| Frontier (Opus 4.7) | $0.35 raw |
| Default (Sonnet 4.6) | $0.10 |
| Lightweight (Haiku 4.5 / GPT-5.4-mini) | $0.04 |
At a $2/resolution price point and realistic 2026 routing (default Sonnet tier), the implied gross margin is 75-90%.
Token cost trajectory (May 2024 → May 2026)
- GPT-4-equivalent quality is ~50× cheaper than in May 2024 (Epoch AI: $0.40/M tokens at the Pareto frontier).
- Frontier reasoning tier has been notably sticky — Opus / GPT-5.5 prices roughly flat in nominal terms.
- As the per-token curve continues down, gross margin expands ~5-10 points/year at a fixed price.
Prompt caching is the decisive architecture lever
- 90% off cached input on Claude.
- Similar on Gemini.
- 50% on OpenAI.
For AI ITSM (KB + system prompt recurs across tickets):
- Drops effective COGS by 50-80%.
- Relevant to oss-agent-infra-2026 tool-transport design.
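The discount arithmetic above can be sketched as follows. This is a minimal illustration, not a billing model: the 90% and 50% discounts come from the list above, while the 80% cached share of input tokens is a hypothetical value chosen for the example. Output tokens and any cache-write surcharge are ignored.

```python
def effective_input_cost(base_cost: float, cached_fraction: float,
                         cache_discount: float) -> float:
    """Input-token cost after discounting the cached share of the prompt.

    base_cost       -- input cost with no caching (any currency unit)
    cached_fraction -- share of input tokens served from cache, 0..1 (assumed)
    cache_discount  -- discount on cached input, e.g. 0.90 for 90% off
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    if not 0.0 <= cache_discount <= 1.0:
        raise ValueError("cache_discount must be between 0 and 1")
    cached_part = base_cost * cached_fraction * (1.0 - cache_discount)
    uncached_part = base_cost * (1.0 - cached_fraction)
    return cached_part + uncached_part


def input_cost_savings(cached_fraction: float, cache_discount: float) -> float:
    """Fractional reduction in input cost; algebraically fraction * discount."""
    return cached_fraction * cache_discount


if __name__ == "__main__":
    CACHED_SHARE = 0.80  # hypothetical: KB + system prompt share of input tokens
    for provider, discount in [("Claude", 0.90), ("OpenAI", 0.50)]:
        saved = input_cost_savings(CACHED_SHARE, discount)
        print(f"{provider}: input cost falls by about {saved:.0%}")
```

Under these assumptions the saving is simply the cached share multiplied by the discount, so the result depends as much on how much of each ticket's prompt recurs as on the provider's headline discount.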
Pricing reference anchors
Competitor pricing points referenced against a $2/resolution model:
| Reference | Fact |
|---|---|
| Salesforce Agentforce | implied $35-50/resolution |
| Atomicwork | floor $90/employee/year (seat-priced) |
| Aisera / ServiceNow | "outcome pricing as rhetoric"; see pricing-benchmarks-ai-itsm-2026 |
At $500/mo = $6k/yr.
Token-envelope clause (cost-tail context)
A token envelope (e.g. $2 up to 50K tokens; surcharge above) bounds margin exposure to p95 long-tail tickets (rare deep multi-turn debugging chains) under per-resolution pricing.
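One way such a clause could be priced is sketched below. The $2 base and 50K-token envelope come from the example above; the surcharge of $0.50 per additional 10K-token block is purely hypothetical, as are the function and parameter names.

```python
import math


def resolution_price(tokens_used: int,
                     base_price: float = 2.00,
                     envelope_tokens: int = 50_000,
                     surcharge_per_block: float = 0.50,  # hypothetical rate
                     block_tokens: int = 10_000) -> float:  # hypothetical block size
    """Price of one resolution under a token-envelope clause.

    Tickets at or under the envelope pay the flat base price; each started
    block of tokens above the envelope adds a fixed surcharge.
    """
    if tokens_used < 0:
        raise ValueError("tokens_used must be non-negative")
    overage = max(0, tokens_used - envelope_tokens)
    blocks = math.ceil(overage / block_tokens)
    return base_price + blocks * surcharge_per_block


if __name__ == "__main__":
    print(resolution_price(30_000))   # typical ticket, inside the envelope: 2.0
    print(resolution_price(120_000))  # long-tail ticket, 7 overage blocks: 5.5
```

Billing overage in whole blocks keeps invoices simple; a per-token surcharge would be an equally valid design and only changes the last two lines of the function.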
Self-host break-even
- 50-100M tokens/day = ~$50M ARR equivalent.
- NOT a near-term lever for early-stage Init Intelligence.
- On-prem is a data-residency moat, not a cost play.
- See data-residency-sovereignty-2026 for sovereign cloud framing.
Hardware trends (context, not action)
- NVIDIA H100 → H200 → B100 / B200 generation transition active.
- Trainium 2 + TPU v6 alternatives mature.
- AMD MI300X / MI325 viable.
- Groq LPU + Cerebras gaining inference-only market share.
- Edge AI (Apple Silicon) viable for on-device — not relevant to enterprise ITSM at scale.
Notes
- Default routing tier at realistic 2026 economics is Sonnet 4.6; frontier Opus is needed for ~5-10% of complex tickets.
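The routing mix above implies a blended token cost per ticket, sketched here under stated assumptions. Per-ticket costs are taken from the TL;DR table (the Opus figure is labeled "raw" there), every non-Opus ticket is assumed to go to Sonnet, and only token cost is counted. Non-token COGS are not modeled, so the margin shown is an upper bound and not a reproduction of the 75-90% gross margin figure.

```python
# Per-ticket token cost in USD, from the TL;DR table (30K-token ticket).
OPUS_COST = 0.35    # frontier tier, labeled "raw" in the table
SONNET_COST = 0.10  # default tier, with caching
PRICE_PER_RESOLUTION = 2.00


def blended_token_cost(opus_share: float) -> float:
    """Average token cost per ticket when opus_share of tickets go to Opus."""
    if not 0.0 <= opus_share <= 1.0:
        raise ValueError("opus_share must be between 0 and 1")
    return opus_share * OPUS_COST + (1.0 - opus_share) * SONNET_COST


def token_only_margin(cost: float, price: float = PRICE_PER_RESOLUTION) -> float:
    """Margin counting token cost alone (no hosting, support, or tooling)."""
    return 1.0 - cost / price


if __name__ == "__main__":
    for share in (0.05, 0.10):
        cost = blended_token_cost(share)
        print(f"Opus share {share:.0%}: ${cost:.4f}/ticket, "
              f"token-only margin {token_only_margin(cost):.1%}")
```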
Related
- pricing-benchmarks-ai-itsm-2026 — published-per-resolution SKU wedge
- data-residency-sovereignty-2026 — self-host as residency moat
- oss-agent-infra-2026 — engineering stack for cache-first design
- pre-series-a-operating-metrics-2026 — GM trajectory for investor reporting
- build-vs-buy-ai-agents-2026 — TCO context
- anthropic · openai · google-cloud
- Init Intelligence