AI compute + token cost trends 2024-2026 — unit economics for AI ITSM

Source report: /tmp/ai-compute-token-cost-trends-2026-05-12.md.

TL;DR — token cost is no longer the binding constraint

A realistic 30K-token IT ticket costs:

Tier	Cost-per-ticket (with caching)
Frontier (Opus 4.7)	$0.20-$0.35 raw
Default (Sonnet 4.6)	$0.06-$0.10
Lightweight (Haiku 4.5 / GPT-5.4-mini)	$0.02-$0.04

At a $2/resolution price point and realistic 2026 routing (default Sonnet tier), the implied gross margin is 75-90%.
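The token-only share of that margin can be checked from the table above. The following is a minimal sketch, assuming the per-ticket ranges in the table and the $2 price; the name `token_margin` is illustrative. Non-token COGS are not modelled here, so the result is an upper bound rather than the 75-90% figure itself.

```python
# Per-ticket token cost ranges (USD, low/high) taken from the table above.
TIER_COST = {
    "frontier": (0.20, 0.35),
    "default": (0.06, 0.10),
    "lightweight": (0.02, 0.04),
}

PRICE_PER_RESOLUTION = 2.00  # USD, the price point used in this note


def token_margin(cost: float, price: float = PRICE_PER_RESOLUTION) -> float:
    """Gross margin counting token cost only (no other COGS)."""
    return (price - cost) / price


for tier, (low, high) in TIER_COST.items():
    # The higher cost gives the lower margin bound, so it is printed first.
    print(f"{tier}: {token_margin(high):.1%} to {token_margin(low):.1%}")
```

On the default tier, token cost alone leaves about 95-97% of the $2 price; whatever separates that from 75-90% has to come from costs outside this calculation.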

Token cost trajectory (May 2024 → May 2026)

GPT-4-equivalent quality is ~50× cheaper than May 2024 (Epoch AI: $20/M → $0.40/M at Pareto frontier).
Frontier reasoning tier has been notably sticky — Opus / GPT-5.5 prices roughly flat in nominal terms.
As the per-token curve continues down, gross margin expands ~5-10 points/year at a fixed price.
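To put the two-year decline on a yearly basis, the sketch below converts the Epoch AI endpoints into a constant annual factor. It assumes a smooth geometric decline between May 2024 and May 2026, which is a simplification, and it applies only to GPT-4-equivalent quality, not to the frontier tier. The function name is illustrative.

```python
import math

START_PRICE = 20.00  # USD per million tokens, May 2024 (Epoch AI figure above)
END_PRICE = 0.40     # USD per million tokens, May 2026
YEARS = 2.0


def annual_decline_factor(start: float, end: float, years: float) -> float:
    """Constant yearly factor f such that start / f**years == end."""
    return (start / end) ** (1.0 / years)


factor = annual_decline_factor(START_PRICE, END_PRICE, YEARS)
print(f"total decline: {START_PRICE / END_PRICE:.0f}x")
print(f"implied annual decline: {factor:.2f}x per year")
```

A 50× drop over two years works out to roughly 7× per year under this assumption.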

Prompt caching is the decisive architecture lever

90% off cached input on Claude.
Similar on Gemini.
50% on OpenAI.

For AI ITSM (KB + system prompt recurs across tickets):

Drops effective COGS by 50-80%.
Relevant to oss-agent-infra-2026 tool-transport design.
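How much of the input has to be cached to reach that range can be sketched as follows. This is a simplified model under stated assumptions: it covers input tokens only, ignores cache-write surcharges and cache expiry, and treats Gemini's discount as equal to Claude's because the note only says "similar". Function names are illustrative.

```python
# Cached-input discounts quoted above. Gemini is listed only as "similar"
# to Claude, so 0.90 is an assumption here rather than a sourced figure.
CACHE_DISCOUNT = {"claude": 0.90, "gemini": 0.90, "openai": 0.50}


def effective_input_cost(base_cost: float, cached_share: float, discount: float) -> float:
    """Input cost after caching.

    cached_share: fraction of input tokens served from cache (0..1).
    discount: price reduction applied to cached tokens (0..1).
    """
    if not 0.0 <= cached_share <= 1.0:
        raise ValueError("cached_share must be between 0 and 1")
    return base_cost * (1.0 - cached_share * discount)


def required_cached_share(target_reduction: float, discount: float) -> float:
    """Cached share needed to cut input cost by target_reduction."""
    return target_reduction / discount


for provider, discount in CACHE_DISCOUNT.items():
    cost = effective_input_cost(1.00, cached_share=0.80, discount=discount)
    print(f"{provider}: $1.00 of input becomes ${cost:.2f} at 80% cached")
```

With a 90% discount, a 50% input-cost reduction needs about 56% of input tokens cached and an 80% reduction needs about 89%. With a 50% discount the reduction can never exceed 50%, which is why the size of the discount matters for a cache-first design.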

Pricing reference anchors

Competitor pricing points referenced against a $2/resolution model:

Reference	Fact
Salesforce Agentforce	implied $35-50/resolution
Atomicwork	floor $90/employee/year (seat-priced)
Aisera / ServiceNow	"outcome pricing as rhetoric"; see pricing-benchmarks-ai-itsm-2026

At $2/resolution × 250/mo minimum = $500/mo = $6k/yr.

Token-envelope clause (cost-tail context)

A token envelope (e.g. $2 up to 50K tokens; surcharge above) bounds margin exposure to p95 long-tail tickets (rare deep multi-turn debugging chains) under per-resolution pricing.
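A billing rule of this shape could look like the sketch below. The $2 base price and 50K-token envelope come from the example above; the surcharge terms (`SURCHARGE_PER_BLOCK`, `BLOCK_TOKENS`) are hypothetical placeholders, since the note does not specify them.

```python
import math

BASE_PRICE = 2.00         # USD per resolution (from the example above)
ENVELOPE_TOKENS = 50_000  # tokens included in the base price (from the example above)

# Hypothetical surcharge terms: the note does not specify them.
SURCHARGE_PER_BLOCK = 0.50  # USD per additional block
BLOCK_TOKENS = 10_000       # size of one surcharge block


def resolution_price(tokens_used: int) -> float:
    """Price of one resolution under a token-envelope clause."""
    if tokens_used < 0:
        raise ValueError("tokens_used must be non-negative")
    overage = max(0, tokens_used - ENVELOPE_TOKENS)
    # Any started block is billed in full.
    blocks = math.ceil(overage / BLOCK_TOKENS)
    return BASE_PRICE + blocks * SURCHARGE_PER_BLOCK


for tokens in (30_000, 50_000, 50_001, 120_000):
    print(f"{tokens:>7} tokens -> ${resolution_price(tokens):.2f}")
```

Tickets inside the envelope stay at the flat price, while revenue on long-tail tickets scales with their token use, so cost growth on those tickets is matched by price growth.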

Self-host break-even

50-100M tokens/day = ~$50M ARR equivalent.
NOT a near-term lever for early-stage Init Intelligence.
On-prem is a data-residency moat, not a cost play.
See data-residency-sovereignty-2026 for sovereign cloud framing.

Hardware trends (context, not action)

NVIDIA H100 → H200 → B100 / B200 generation transition active.
Trainium 2 + TPU v6 alternatives mature.
AMD MI300X / MI325 viable.
Groq LPU + Cerebras gaining inference-only market share.
Edge AI (Apple Silicon) viable for on-device — not relevant to enterprise ITSM at scale.

Notes

Default routing tier at realistic 2026 economics is Sonnet 4.6; frontier Opus is needed for ~5-10% of complex tickets.
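The blended per-ticket cost implied by that routing split can be sketched as below. It assumes the per-ticket ranges from the TL;DR table and a two-tier split only (no lightweight-tier routing); the function name is illustrative.

```python
# Per-ticket cost ranges (USD, low/high) from the TL;DR table.
SONNET = (0.06, 0.10)
OPUS = (0.20, 0.35)


def blended_cost(frontier_share: float) -> tuple[float, float]:
    """Expected per-ticket cost range when frontier_share of tickets go to Opus."""
    if not 0.0 <= frontier_share <= 1.0:
        raise ValueError("frontier_share must be between 0 and 1")
    default_share = 1.0 - frontier_share
    low = default_share * SONNET[0] + frontier_share * OPUS[0]
    high = default_share * SONNET[1] + frontier_share * OPUS[1]
    return low, high


for share in (0.05, 0.10):
    low, high = blended_cost(share)
    print(f"{share:.0%} frontier: ${low:.3f} to ${high:.3f} per ticket")
```

Under these assumptions, routing 5-10% of tickets to Opus puts the blended token cost at roughly $0.07 to $0.13 per ticket, only slightly above the Sonnet-only range.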

Related

pricing-benchmarks-ai-itsm-2026 — published-per-resolution SKU wedge
data-residency-sovereignty-2026 — self-host as residency moat
oss-agent-infra-2026 — engineering stack for cache-first design
pre-series-a-operating-metrics-2026 — GM trajectory for investor reporting
build-vs-buy-ai-agents-2026 — TCO context
anthropic · openai · google-cloud
Init Intelligence
