OSS AI agent infrastructure — stack recommendation (May 2026)

This page summarizes the May 11, 2026 OSS agent infrastructure research pass. Source report: /tmp/oss-agent-infra-2026-05-11.md (312 lines, 34KB).

Top findings (shifts since prior wiki snapshot)

1. AutoGen is in maintenance mode

Microsoft’s own AutoGen README now points users to microsoft/agent-framework (10.3k stars, 675 commits/90d). Original AutoGen has 3 commits in 90 days. Community fork is ag2ai/ag2.

2. MCP 2025-11-25 spec — major upgrade

  • Async Tasks primitive added (long-running agent operations).
  • OAuth Client ID Metadata Documents.
  • OpenID Connect Discovery.

Directly relevant to Init Intelligence’s tool-governance model — see agent-tool-governance.
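The Async Tasks item above amounts to a "start now, fetch the result later" pattern. The sketch below illustrates that pattern in-process with asyncio only. It does not use an MCP SDK or the MCP wire format, and every name in it (TaskStore, TaskRecord, start, get, the status strings, compliance_audit) is a hypothetical stand-in rather than anything defined by the spec.

```python
import asyncio
import uuid
from dataclasses import dataclass
from typing import Any, Awaitable, Dict, List, Optional, Tuple


@dataclass
class TaskRecord:
    # Hypothetical status values for this sketch only.
    status: str = "working"
    result: Optional[Any] = None
    error: Optional[str] = None


class TaskStore:
    """Hypothetical in-memory registry of long-running operations."""

    def __init__(self) -> None:
        self._records: Dict[str, TaskRecord] = {}
        self._jobs: Dict[str, asyncio.Task] = {}  # keep references alive

    def start(self, work: Awaitable[Any]) -> str:
        """Schedule the work and return a task id without waiting for it."""
        task_id = uuid.uuid4().hex
        self._records[task_id] = TaskRecord()
        self._jobs[task_id] = asyncio.create_task(self._run(task_id, work))
        return task_id

    async def _run(self, task_id: str, work: Awaitable[Any]) -> None:
        record = self._records[task_id]
        try:
            record.result = await work
            record.status = "completed"
        except Exception as exc:  # record the failure instead of raising
            record.error = str(exc)
            record.status = "failed"

    def get(self, task_id: str) -> TaskRecord:
        """Return the current state of a task; callers poll this."""
        return self._records[task_id]


async def compliance_audit() -> dict:
    """Stand-in for a slow operation such as a multi-step IT runbook."""
    await asyncio.sleep(0.05)
    return {"controls_checked": 3, "violations": 0}


async def demo() -> Tuple[List[str], Any]:
    store = TaskStore()
    task_id = store.start(compliance_audit())
    seen = [store.get(task_id).status]      # observed right after start
    while store.get(task_id).status == "working":
        await asyncio.sleep(0.01)           # poll instead of blocking
    seen.append(store.get(task_id).status)
    return seen, store.get(task_id).result
```

Calling asyncio.run(demo()) returns the statuses observed before and after completion together with the audit result. The caller gets an id back immediately and is free to do other work between polls, which is what makes long-running operations practical.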

3. ServiceNow Action Fabric MCP GA’d at Knowledge 2026 (May 2026)

Routed through AI Control Tower for governance. This validates the wiki’s agent-tool-governance thesis with an existing-vendor implementation: Init Intelligence isn’t ahead of the curve; the incumbent is shipping the same primitive.

4. MCP-server vendor landscape (May 2026)

Vendor            MCP status
Atlassian Rovo    GA Feb 2026
Slack             GA (collaboration with Anthropic)
Okta              Official
Workday           Community-only or absent
Jamf              Community-only or absent
Intune            Community-only or absent
Rippling          Community-only or absent
Teams             Community-only or absent
Gusto             Community-only or absent

Vendors with no official MCP server (community-only or absent) as of May 2026: Workday, Jamf, Intune, Rippling, Teams, Gusto. Workday is the largest gap given enterprise HR coverage.

5. Inspect-AI is the highest-velocity eval tool

  • UK AISI project.
  • 2.0k stars, 1,299 commits/90d — highest commits-per-star ratio of any tool surveyed.
  • Phoenix is the strongest true-OSS observability tool (9.6k stars, 1,245 commits/90d).
  • OpenAI evals is effectively dormant (2 commits/90d).
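The commits-per-star comparison can be reproduced from the figures quoted on this page. The sketch below covers only the three tools for which the page gives both a star count and a 90-day commit count, so it illustrates the metric rather than verifying the "any tool surveyed" claim. Star counts are rounded ("2.0k", "9.6k", "10.3k"), so the ratios are approximate.

```python
# (stars, commits in last 90 days), taken from the figures quoted on this page.
# Star counts are rounded to the nearest hundred in the source.
TOOLS = {
    "Inspect-AI": (2_000, 1_299),
    "Phoenix": (9_600, 1_245),
    "microsoft/agent-framework": (10_300, 675),
}


def commits_per_star(stars: int, commits_90d: int) -> float:
    """Velocity relative to popularity: 90-day commits divided by stars."""
    return commits_90d / stars


# Rank tools from highest to lowest ratio.
RANKED = sorted(
    TOOLS,
    key=lambda name: commits_per_star(*TOOLS[name]),
    reverse=True,
)

if __name__ == "__main__":
    for name in RANKED:
        ratio = commits_per_star(*TOOLS[name])
        print(f"{name}: {ratio:.2f} commits per star")
```

On these numbers Inspect-AI comes out at roughly 0.65 commits per star, against about 0.13 for Phoenix and about 0.07 for microsoft/agent-framework.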

6. No public IT-ops agent benchmark exists

SWE-bench, AgentBench, WebArena, GAIA all miss IT verbs:

  • Provision access
  • Run scoped IdP actions
  • Gather evidence
  • Resolve policy conflicts

The source report proposes ITSM-bench, a 6-task / 6-axis design. As of May 2026, no public benchmark covers IT-tier agent work.

7. Production adopters verified (WebFetch of vendor customer pages)

  • LangChain customers: ServiceNow, Workday, Rippling, Cisco, LinkedIn, Coinbase, Elastic, Cloudflare.
  • Temporal customers: OpenAI itself, Snap, Cloudflare, GitLab, Replit, Lovable AI.
  • LlamaIndex customers: a mix of enterprises and AI startups (Carlyle, KPMG, Cemex, NTT DATA, 11x.ai).

OSS agent-infra stack by layer (7 picks)

Layer                 Picks
Planning              LangGraph or Pydantic AI
Durable execution     Trigger.dev or Inngest or Hatchet
Tool transport        MCP 2025-11-25 (everything ships as MCP server + A2A endpoint)
Observability         OpenTelemetry + Phoenix
Evals                 Inspect-AI + Promptfoo + proprietary ITSM-bench
Memory (optional)     Letta or Mem0
Capability tracking   METR Time Horizons + SWE-bench Verified
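One way to make the layer table machine-checkable is to encode it as data. The sketch below is an assumption about how that might look, not part of the source report: the Layer type, the "one_of" / "all_of" modes (read off the table's "or" and "+"), and the validate helper are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Layer:
    name: str
    picks: Tuple[str, ...]
    mode: str               # "one_of" for "or" rows, "all_of" for "+" rows
    optional: bool = False


# Mirrors the layer table above; the mode values are an interpretation.
STACK: Tuple[Layer, ...] = (
    Layer("Planning", ("LangGraph", "Pydantic AI"), "one_of"),
    Layer("Durable execution", ("Trigger.dev", "Inngest", "Hatchet"), "one_of"),
    Layer("Tool transport", ("MCP 2025-11-25",), "all_of"),
    Layer("Observability", ("OpenTelemetry", "Phoenix"), "all_of"),
    Layer("Evals", ("Inspect-AI", "Promptfoo", "ITSM-bench"), "all_of"),
    Layer("Memory", ("Letta", "Mem0"), "one_of", optional=True),
    Layer("Capability tracking",
          ("METR Time Horizons", "SWE-bench Verified"), "all_of"),
)


def validate(selection: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems; an empty list means the selection fits."""
    problems: List[str] = []
    for layer in STACK:
        chosen = selection.get(layer.name, [])
        if not chosen:
            if not layer.optional:
                problems.append(f"{layer.name}: no pick selected")
            continue
        unknown = [c for c in chosen if c not in layer.picks]
        if unknown:
            problems.append(f"{layer.name}: unknown picks {unknown}")
        elif layer.mode == "one_of" and len(chosen) != 1:
            problems.append(f"{layer.name}: choose exactly one")
        elif layer.mode == "all_of" and set(chosen) != set(layer.picks):
            problems.append(f"{layer.name}: all picks are required")
    return problems
```

A concrete deployment plan can then be passed to validate as a mapping from layer name to chosen tools. Omitting Memory is accepted because that layer is marked optional in the table.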

Build vs Buy split (per wiki engineering-stack research)

Components with no off-the-shelf OSS equivalent (custom-built layer):

  1. Request object schema
  2. Context graph
  3. Tool governance gateway (see agent-tool-governance)
  4. Customer-visible trace
  5. ITSM benchmark task corpus

Components with mature OSS coverage: planning frameworks, durable execution, MCP transport, observability, base eval frameworks, memory.

Notes

  • AutoGen is in maintenance mode; the active alternatives are microsoft/agent-framework (the successor Microsoft’s README points to) and the community fork ag2ai/ag2 (see finding #1).
  • MCP 2025-11-25 async Tasks supports long-running operations such as compliance audits and multi-step IT runbooks; legacy MCP was sync-only.
  • All three hyperscalers natively support MCP server + A2A endpoint — see hyperscaler-agent-platforms-2026.
  • Workday is the highest-priority MCP vendor gap (enterprise HR coverage).
  • Inspect-AI (UK AISI) is the highest-velocity eval harness in the survey.