Buy, Build, or Borrow: Choosing Your Agent Stack

Decision framework across SaaS copilots, open agent frameworks, and internal platforms—covering TCO, risk, and lock-in.

The three paths at a glance

  • Buy (SaaS copilots): fastest time-to-value, opinionated guardrails, limited custom tools. Great for horizontal use (docs, code, CRM), weaker on deep domain workflows.
  • Borrow (open frameworks): assemble with open runtimes (e.g., LangGraph-style graphs, planner/executor patterns) plus your tools. Good balance of speed + control; ops burden still yours.
  • Build (internal platform): your own agent runtime on top of existing platform/IDP (auth, CI/CD, observability). Highest control, highest fixed cost; only worth it if agents are core product or you need strict data/latency guarantees.

Decision drivers

  • Time-to-value: pilot in weeks vs quarters.
  • Risk & compliance: data residency, PII handling, auditability, model provenance.
  • Lock-in: portability of prompts, tools, memories, eval suites, and telemetry.
  • Unit economics: $/interaction, tokens/sec, latency p95; caching and retrieval costs.
  • Talent & ops: who will run prompts, tools, evals, incident response?

Quick matrix

| Driver | Buy (SaaS) | Borrow (Open) | Build (Internal) |
| --- | --- | --- | --- |
| Time-to-value | Best | Good | Slowest |
| Customization (tools, policies) | Limited | High | Highest |
| Compliance/audit | Good (vendor-led) | Varies (you own) | Best (you design) |
| TCO (12–24 mo) | Predictable OpEx | Mixed (Dev + Ops) | High CapEx → lower OpEx at scale |
| Lock-in risk | Highest | Moderate | Lowest (if standards-based) |
| Performance control | Vendor limits | Moderate | Full |

TCO model (plug your numbers)

TCO_annual = Build_FTE + Platform_Fees + Inference_Cost + Storage/Retrieval + Observability + Evals + Security/Compliance + Oncall
// Inference_Cost = interactions * tokens_per_interaction * cost_per_token (after cache/quantization)
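The formula translates directly into a small calculator you can plug your own numbers into. The figures below are illustrative placeholders, not benchmarks:

```python
# Annual TCO calculator for the formula above.
# All dollar figures are placeholders; substitute your own estimates.

def inference_cost(interactions: int, tokens_per_interaction: int,
                   cost_per_token: float, cache_hit_rate: float = 0.0) -> float:
    """Inference_Cost = interactions * tokens * cost_per_token, after caching."""
    effective_tokens = interactions * tokens_per_interaction * (1 - cache_hit_rate)
    return effective_tokens * cost_per_token

def tco_annual(build_fte: float, platform_fees: float, inference: float,
               storage_retrieval: float, observability: float, evals: float,
               security_compliance: float, oncall: float) -> float:
    return (build_fte + platform_fees + inference + storage_retrieval
            + observability + evals + security_compliance + oncall)

# Example: 1M interactions/yr, 2k tokens each, 30% cache hit rate.
inf = inference_cost(interactions=1_000_000, tokens_per_interaction=2_000,
                     cost_per_token=2e-6, cache_hit_rate=0.3)
total = tco_annual(build_fte=300_000, platform_fees=50_000, inference=inf,
                   storage_retrieval=20_000, observability=15_000,
                   evals=10_000, security_compliance=25_000, oncall=30_000)
print(f"Inference: ${inf:,.0f}  TCO: ${total:,.0f}")
```

Run it for each path (Buy/Borrow/Build) with that path's fee and FTE mix; the inference term usually dominates only at high volume.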

Risk & lock-in hedges

  • Portability: keep prompts/templates, tools, and memory in your repos; avoid vendor-proprietary formats.
  • Abstraction: wrap model and tool calls behind your interfaces; support ≥2 model providers.
  • Data contracts: versioned schemas for RAG and events; compatibility tests in CI.
  • Provenance & audit: capture prompt → tool calls → outputs with trace IDs.
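The abstraction hedge above can be sketched in a few lines: all model calls go through one interface, so swapping providers is a configuration change rather than a rewrite. Provider names and the `complete` signature here are illustrative, not any vendor's real SDK:

```python
# Hedge against lock-in: application code depends on this Protocol,
# never on a vendor SDK directly. VendorA/VendorB are stand-ins.
from typing import Protocol

class ModelProvider(Protocol):
    def complete(self, prompt: str) -> str: ...

class VendorA:
    def complete(self, prompt: str) -> str:
        return f"[vendor-a] {prompt}"  # real implementation would call the vendor SDK

class VendorB:
    def complete(self, prompt: str) -> str:
        return f"[vendor-b] {prompt}"

PROVIDERS: dict[str, ModelProvider] = {"a": VendorA(), "b": VendorB()}

def run(prompt: str, provider: str = "a") -> str:
    # Swapping providers is a one-line config change, not a refactor.
    return PROVIDERS[provider].complete(prompt)
```

Keeping at least two providers wired in from day one also gives you a live fallback during vendor incidents.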

Reference architectures

Buy

  • Vendor copilot ⇢ native integrations ⇢ admin policies (PII redaction, spend caps) ⇢ export telemetry via webhooks.
  • Use for: docs/search, support macros, sales email drafting.

Borrow

  • Open agent runtime ⇢ your tool adapters (CRM, ERP, ticketing) ⇢ retrieval layer ⇢ eval harness ⇢ observability (traces/metrics/logs).
  • Use for: domain workflows needing custom tools & approvals.
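A minimal sketch of the "tool adapters with approvals" idea: a registry dispatches agent tool calls, and sensitive tools require an approval callback before executing. The tool names and approval rule are invented for illustration:

```python
# Illustrative tool registry with an approval gate for sensitive tools.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCall:
    tool: str
    args: dict

def make_registry(require_approval: set[str],
                  approver: Callable[[ToolCall], bool]):
    # Stub adapters; real ones would wrap CRM/ERP/ticketing APIs.
    tools = {
        "crm_lookup": lambda args: {"account": args.get("id"), "tier": "gold"},
        "ticket_create": lambda args: {"ticket_id": "T-1"},
    }
    def dispatch(call: ToolCall):
        if call.tool in require_approval and not approver(call):
            return {"error": "approval denied"}
        return tools[call.tool](call.args)
    return dispatch

# Writes need approval; reads pass through.
dispatch = make_registry({"ticket_create"}, approver=lambda call: False)
```

The approval callback is where human-in-the-loop review or policy checks plug in without touching the agent runtime.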

Build

  • Internal platform (auth, CI/CD, feature flags) ⇢ agent runtime + memory ⇢ policy-as-code ⇢ cost guardrails ⇢ enterprise observability ⇢ incident playbooks.
  • Use for: regulated, latency-sensitive, or product-differentiating agents.

RFP questions (copy/paste)

  1. Portability: how do we export prompts, memories, and run logs? In what formats?
  2. Controls: budget caps, PII policies, allow-lists, sandboxing of tools—how enforced?
  3. Observability: do you emit OpenTelemetry traces with token/cost/latency?
  4. Evals: built-in evaluation suite? CI/canary gates? Custom tests?
  5. Roadmap & exit: what’s the deprecation policy? Data retention and account deletion SLA?

Eval rubric for pilots

  • Quality: task success rate, groundedness, hallucination budget.
  • Speed: latency p95 and tokens/sec per backend.
  • Cost: $/interaction with caching/RAG; cost predictability under load.
  • Operations: runbook maturity, alert quality, rollback success rate.
  • Security/Compliance: data residency, audit logs, least privilege, vendor posture.
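The quality, speed, and cost rows of the rubric reduce to a handful of metrics you can compute identically across pilots. A sketch, with example thresholds that are placeholders rather than recommendations:

```python
# Score a pilot on the rubric's quantitative rows; gate values are examples.

def p95(latencies_ms: list[float]) -> float:
    # Nearest-rank p95; fine for pilot-sized samples.
    ordered = sorted(latencies_ms)
    idx = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[idx]

def score_pilot(successes: int, total: int,
                latencies_ms: list[float], cost_usd: float) -> dict:
    return {
        "task_success_rate": successes / total,
        "latency_p95_ms": p95(latencies_ms),
        "cost_per_interaction": cost_usd / total,
    }

def passes_gates(metrics: dict, min_success: float = 0.85,
                 max_p95_ms: float = 3000, max_cost: float = 0.05) -> bool:
    return (metrics["task_success_rate"] >= min_success
            and metrics["latency_p95_ms"] <= max_p95_ms
            and metrics["cost_per_interaction"] <= max_cost)
```

Running both pilots through the same `score_pilot` keeps the Buy-vs-Borrow comparison honest; the operational and compliance rows still need qualitative review.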

30 / 60 / 90 plan

  1. 30 days: pick 2 high-value workflows; baseline manual KPI (cycle time, cost, quality); run a Buy and a Borrow pilot in parallel with identical evals.
  2. 60 days: expand best pilot to 10–20% traffic; add policy-as-code and budget caps; draft internal platform design if the case for Build emerges.
  3. 90 days: decide the primary path; negotiate pricing/SLAs or fund the platform; publish a portability plan (multi-model, artifact exports, schema contracts).

Definition of Done (for stack selection)

  • Signed decision brief with TCO, risk, and exit plan.
  • Evals wired into CI/canary; gates block regressions.
  • Observability standardized (traces/metrics/logs + cost/latency/quality).
  • Policy-as-code enforcing budgets, data classes, approvals.
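"Policy-as-code" here can be as simple as a declarative policy evaluated before each agent run, returning violations instead of silently proceeding. Field names below are assumptions, not a standard:

```python
# Illustrative pre-run policy check; keys and limits are invented examples.
POLICY = {
    "daily_budget_usd": 50.0,
    "allowed_data_classes": {"public", "internal"},
    "tools_requiring_approval": {"send_email"},
}

def check(run_request: dict, spent_today_usd: float) -> list[str]:
    """Return a list of policy violations; empty means the run may proceed."""
    violations = []
    if spent_today_usd + run_request.get("est_cost_usd", 0.0) > POLICY["daily_budget_usd"]:
        violations.append("budget cap exceeded")
    if not set(run_request.get("data_classes", [])) <= POLICY["allowed_data_classes"]:
        violations.append("disallowed data class")
    for tool in run_request.get("tools", []):
        if tool in POLICY["tools_requiring_approval"] and not run_request.get("approved"):
            violations.append(f"tool {tool} needs approval")
    return violations
```

Versioning the policy file in the same repo as the agent config makes budget and data-class changes reviewable like any other diff.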

Anti-patterns

  • Feature bingo: choosing the shiniest checklist over TCO/ops reality.
  • Single-vendor bet without data export or abstraction layer.
  • Prompt theater: no evals, no telemetry, no rollback path.
  • DIY platform too early: building before proof that agents are core to your product or risk posture.

Bottom line: Start with outcomes and constraints, not ideology. Buy when speed wins, borrow when you need custom tools with sane ops, and build when agents are the product—or when risk and scale demand it. Hedge lock-in from day one, and make evals/observability the contract, not an afterthought.