AI compute + token cost trends 2024-2026 — unit economics for AI ITSM

Source report: /tmp/ai-compute-token-cost-trends-2026-05-12.md.

TL;DR — token cost is no longer the binding constraint

A realistic 30K-token IT ticket costs:

Tier                                      Cost per ticket (with caching)
Frontier (Opus 4.7)                       $0.35 raw
Default (Sonnet 4.6)                      $0.10
Lightweight (Haiku 4.5 / GPT-5.4-mini)    $0.04

At a $2/resolution price point and realistic 2026 routing (default Sonnet tier), the implied gross margin is 75-90%.
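As a rough check, margin on token spend alone is 1 − cost/price. The minimal sketch below applies that to the per-ticket figures in the table. It assumes one ticket equals one resolution and counts no cost other than tokens, so it only shows the token-cost ceiling on margin; it does not reproduce how the report arrives at its 75-90% range.

```python
# Per-ticket token cost for each tier, taken from the table above (USD).
# The Frontier figure is the raw one, as listed in the table.
TIER_COST_USD = {
    "frontier": 0.35,
    "default": 0.10,
    "lightweight": 0.04,
}

PRICE_PER_RESOLUTION_USD = 2.00


def token_only_margin(cost_per_ticket: float, price: float) -> float:
    """Gross margin when token spend is treated as the only cost of a resolution."""
    return 1.0 - cost_per_ticket / price


if __name__ == "__main__":
    for tier, cost in TIER_COST_USD.items():
        margin = token_only_margin(cost, PRICE_PER_RESOLUTION_USD)
        print(f"{tier:>12}: {margin:.1%}")
```

Even the raw Frontier cost leaves most of the $2 as margin, which is the sense in which token cost is no longer the binding constraint.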

Token cost trajectory (May 2024 → May 2026)

  • GPT-4-equivalent quality is now ~50× cheaper than in May 2024 (Epoch AI: $0.40/M tokens at the Pareto frontier).
  • Frontier reasoning-tier prices have been notably sticky: Opus and GPT-5.5 prices are roughly flat in nominal terms.
  • As per-token prices keep falling, gross margin at a fixed resolution price expands by ~5-10 points per year.
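The margin-expansion claim can be made concrete with a fixed price and a per-resolution cost that falls by a constant fraction each year. In this sketch the 25% starting cost share (the low end of the 75-90% margin band) and the 30%/year decline are hypothetical inputs chosen for illustration, not figures from the report.

```python
def margin_path(cogs_share: float, annual_decline: float, years: int) -> list[float]:
    """Gross margin for years 0..years when the resolution price stays fixed.

    cogs_share: cost of a resolution as a fraction of its price in year 0.
    annual_decline: fractional drop in that cost each year (0.30 = 30% cheaper).
    """
    margins = []
    for year in range(years + 1):
        cost_share = cogs_share * (1.0 - annual_decline) ** year
        margins.append(1.0 - cost_share)
    return margins


if __name__ == "__main__":
    # Hypothetical inputs: 25% cost share in year 0, costs fall 30% per year.
    path = margin_path(cogs_share=0.25, annual_decline=0.30, years=3)
    for year, margin in enumerate(path):
        print(f"year {year}: {margin:.1%}")
```

Under these assumptions the first year adds 7.5 points and each later year adds fewer, because a constant percentage decline removes fewer absolute points as the cost share shrinks.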

Prompt caching is the decisive architecture lever

  • 90% off cached input on Claude.
  • Similar on Gemini.
  • 50% on OpenAI.

For AI ITSM, the KB and system prompt recur across tickets, so that shared context can be served from cache at the discounted input rate.
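A minimal sketch of the effect on per-ticket input cost follows. It treats the whole 30K-token ticket as input, and the 80% cached share and the $3/M base input price are hypothetical placeholders, not figures from the report or a quoted list price; the discounts are the ones listed above. It ignores output tokens and any cache-write premium (Anthropic, for example, bills cache writes above the base input rate).

```python
def ticket_input_cost(
    input_tokens: int,
    cached_fraction: float,
    price_per_m: float,
    cache_discount: float,
) -> float:
    """USD input cost for one ticket when part of the prompt is a cache hit.

    cached_fraction: share of input tokens read from cache (KB + system prompt).
    cache_discount: fraction taken off the base input price for cached tokens.
    Ignores cache-write premiums and all output-token cost.
    """
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    per_token = price_per_m / 1_000_000
    return uncached * per_token + cached * per_token * (1.0 - cache_discount)


if __name__ == "__main__":
    # Hypothetical: 30K input tokens, 80% of them cache hits, $3/M base input price.
    for label, discount in [("no cache", 0.0), ("50% off", 0.50), ("90% off", 0.90)]:
        cost = ticket_input_cost(30_000, 0.80, price_per_m=3.00, cache_discount=discount)
        print(f"{label:>9}: ${cost:.4f}")
```

With these placeholders, a 90% discount cuts input cost to roughly 28% of the uncached figure, while a 50% discount only reaches 60%, which is why the provider's cache discount matters as much as the base price.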

Pricing reference anchors

Competitor pricing points referenced against a $2/resolution model:

Reference              Fact
Salesforce Agentforce  implied $35-50/resolution
Atomicwork             floor of $90/employee/year (seat-priced)
Aisera / ServiceNow    "outcome pricing as rhetoric" (see pricing-benchmarks-ai-itsm-2026)

At $500/mo = $6k/yr.
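The seat-priced anchor and the $2/resolution model can be compared by dividing one price by the other: at the Atomicwork floor, per-resolution pricing costs the buyer less until an employee averages 45 resolved tickets a year. The sketch below assumes the floor price exactly; the headcount and ticket volume in the usage lines are hypothetical.

```python
PRICE_PER_RESOLUTION_USD = 2.00   # the $2/resolution model
SEAT_FLOOR_USD_PER_YEAR = 90.00   # Atomicwork floor, per employee per year


def annual_cost_per_resolution(resolutions_per_year: int) -> float:
    """Annual spend under per-resolution pricing."""
    return resolutions_per_year * PRICE_PER_RESOLUTION_USD


def annual_cost_seat(employees: int) -> float:
    """Annual spend under seat pricing at the floor rate."""
    return employees * SEAT_FLOOR_USD_PER_YEAR


def breakeven_resolutions_per_employee() -> float:
    """Resolutions per employee per year at which both models cost the same."""
    return SEAT_FLOOR_USD_PER_YEAR / PRICE_PER_RESOLUTION_USD


if __name__ == "__main__":
    # Hypothetical org: 1,000 employees averaging 10 resolutions each per year.
    print("per-resolution:", annual_cost_per_resolution(1_000 * 10))
    print("seat floor:    ", annual_cost_seat(1_000))
    print("break-even:    ", breakeven_resolutions_per_employee())
```

Because $90 is a floor, the real break-even for a given seat deal is 45 resolutions per employee per year or higher.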

Token-envelope clause (cost-tail context)

Under per-resolution pricing, a token envelope (e.g. $2 covers up to 50K tokens, with a surcharge above that) bounds margin exposure to the p95 long tail: rare tickets with deep multi-turn debugging chains.
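One way such a clause might be priced is sketched below. The $2 base and 50K-token envelope come from the example above; the surcharge rate and the 10K-token billing block are hypothetical, since the text only says "surcharge above".

```python
import math

BASE_PRICE_USD = 2.00
ENVELOPE_TOKENS = 50_000
# Hypothetical surcharge terms: the source does not specify a rate or unit.
SURCHARGE_BLOCK_TOKENS = 10_000
SURCHARGE_USD_PER_BLOCK = 0.50


def resolution_price(tokens_used: int) -> float:
    """Price of one resolution under a token-envelope clause."""
    if tokens_used <= ENVELOPE_TOKENS:
        return BASE_PRICE_USD
    overage = tokens_used - ENVELOPE_TOKENS
    # Bill the overage in started blocks, so any token past the envelope costs one block.
    blocks = math.ceil(overage / SURCHARGE_BLOCK_TOKENS)
    return BASE_PRICE_USD + blocks * SURCHARGE_USD_PER_BLOCK


if __name__ == "__main__":
    for tokens in (30_000, 50_000, 50_001, 120_000):
        print(f"{tokens:>7} tokens: ${resolution_price(tokens):.2f}")
```

Billing in whole blocks rather than per token is a design choice for readable invoices; a per-token rate above the envelope bounds exposure just as well.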

Self-host break-even

  • 50-100M tokens/day = ~$50M ARR equivalent.
  • NOT a near-term lever for early-stage Init Intelligence.
  • On-prem is a data-residency moat, not a cost play.
  • See data-residency-sovereignty-2026 for sovereign cloud framing.
  • The NVIDIA H100 → H200 → B100 / B200 generation transition is under way.
  • Trainium 2 and TPU v6 are maturing alternatives.
  • AMD MI300X / MI325X are viable.
  • Groq LPU and Cerebras are gaining inference-only market share.
  • Edge AI (Apple Silicon) is viable for on-device inference but not relevant to enterprise ITSM at scale.

Notes

  • At realistic 2026 economics, the default routing tier is Sonnet 4.6; frontier Opus is needed only for the ~5-10% of tickets that are complex.
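With the 5-10% escalation rate and the per-ticket costs from the TL;DR table, the expected token cost per ticket is a weighted average of the two tiers. This sketch assumes every ticket is handled by exactly one tier and uses the raw Opus figure, so it somewhat overstates the blend if escalated tickets also benefit from caching.

```python
# Per-ticket token costs from the TL;DR table (USD); the Opus figure is the raw one.
SONNET_COST_USD = 0.10
OPUS_COST_USD = 0.35
PRICE_PER_RESOLUTION_USD = 2.00


def blended_cost(opus_share: float) -> float:
    """Expected per-ticket token cost when a share of tickets escalates to Opus."""
    return (1.0 - opus_share) * SONNET_COST_USD + opus_share * OPUS_COST_USD


if __name__ == "__main__":
    for share in (0.05, 0.10):
        cost = blended_cost(share)
        margin = 1.0 - cost / PRICE_PER_RESOLUTION_USD
        print(f"Opus share {share:.0%}: ${cost:.4f}/ticket, token-only margin {margin:.1%}")
```

Across the 5-10% range the blend moves only from about $0.11 to $0.13 per ticket, so the escalation rate shifts margin by a point or less at $2/resolution.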