Buy, Build, or Borrow: Choosing Your Agent Stack

Decision framework across SaaS copilots, open agent frameworks, and internal platforms—covering TCO, risk, and lock-in.

The three paths at a glance

  • Buy (SaaS copilots): fastest time-to-value, opinionated guardrails, limited custom tools. Great for horizontal use (docs, code, CRM), weaker on deep domain workflows.
  • Borrow (open frameworks): assemble with open runtimes (e.g., LangGraph-style graphs, planner/executor patterns) plus your tools. Good balance of speed + control; ops burden still yours.
  • Build (internal platform): your own agent runtime on top of existing platform/IDP (auth, CI/CD, observability). Highest control, highest fixed cost; only worth it if agents are core product or you need strict data/latency guarantees.

Decision drivers

  • Time-to-value: pilot in weeks vs quarters.
  • Risk & compliance: data residency, PII handling, auditability, model provenance.
  • Lock-in: portability of prompts, tools, memories, eval suites, and telemetry.
  • Unit economics: $/interaction, tokens/sec, latency p95; caching and retrieval costs.
  • Talent & ops: who will run prompts, tools, evals, incident response?

Quick matrix

| Driver | Buy (SaaS) | Borrow (Open) | Build (Internal) |
| --- | --- | --- | --- |
| Time-to-value | Best | Good | Slowest |
| Customization (tools, policies) | Limited | High | Highest |
| Compliance/audit | Good (vendor-led) | Varies (you own) | Best (you design) |
| TCO (12–24 mo) | Predictable OpEx | Mixed (Dev + Ops) | High CapEx → lower OpEx at scale |
| Lock-in risk | Highest | Moderate | Lowest (if standards-based) |
| Performance control | Vendor limits | Moderate | Full |

TCO model (plug your numbers)

TCO_annual = Build_FTE + Platform_Fees + Inference_Cost + Storage/Retrieval + Observability + Evals + Security/Compliance + Oncall
// Inference_Cost = interactions * tokens_per_interaction * cost_per_token (after cache/quantization)
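The formula translates directly into a small calculator you can plug your own numbers into. The figures below are illustrative placeholders, not benchmarks:

```python
# Annual TCO calculator for the formula above.
# All dollar figures are placeholders; substitute your own estimates.

def inference_cost(interactions: int, tokens_per_interaction: int,
                   cost_per_token: float, cache_hit_rate: float = 0.0) -> float:
    """Inference_Cost = interactions * tokens * cost_per_token, after caching."""
    effective_tokens = interactions * tokens_per_interaction * (1 - cache_hit_rate)
    return effective_tokens * cost_per_token

def tco_annual(build_fte: float, platform_fees: float, inference: float,
               storage_retrieval: float, observability: float, evals: float,
               security_compliance: float, oncall: float) -> float:
    return (build_fte + platform_fees + inference + storage_retrieval
            + observability + evals + security_compliance + oncall)

# Example: 1M interactions/yr, 2k tokens each, 30% cache hit rate.
inf = inference_cost(interactions=1_000_000, tokens_per_interaction=2_000,
                     cost_per_token=2e-6, cache_hit_rate=0.3)
total = tco_annual(build_fte=300_000, platform_fees=50_000, inference=inf,
                   storage_retrieval=20_000, observability=15_000,
                   evals=10_000, security_compliance=25_000, oncall=30_000)
print(f"Inference: ${inf:,.0f}  TCO: ${total:,.0f}")
```

Run it for each path (Buy/Borrow/Build) with that path's fee and FTE mix; the inference term usually dominates only at high volume.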

Risk & lock-in hedges

  • Portability: keep prompts/templates, tools, and memory in your repos; avoid vendor-proprietary formats.
  • Abstraction: wrap model and tool calls behind your interfaces; support ≥2 model providers.
  • Data contracts: versioned schemas for RAG and events; compatibility tests in CI.
  • Provenance & audit: capture prompt → tool calls → outputs with trace IDs.
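The abstraction hedge above can be sketched in a few lines: all model calls go through one interface, so swapping providers is a configuration change rather than a rewrite. Provider names and the `complete` signature here are illustrative, not any vendor's real SDK:

```python
# Hedge against lock-in: application code depends on this Protocol,
# never on a vendor SDK directly. VendorA/VendorB are stand-ins.
from typing import Protocol

class ModelProvider(Protocol):
    def complete(self, prompt: str) -> str: ...

class VendorA:
    def complete(self, prompt: str) -> str:
        return f"[vendor-a] {prompt}"  # real implementation would call the vendor SDK

class VendorB:
    def complete(self, prompt: str) -> str:
        return f"[vendor-b] {prompt}"

PROVIDERS: dict[str, ModelProvider] = {"a": VendorA(), "b": VendorB()}

def run(prompt: str, provider: str = "a") -> str:
    # Swapping providers is a one-line config change, not a refactor.
    return PROVIDERS[provider].complete(prompt)
```

Keeping at least two providers wired in from day one also gives you a live fallback during vendor incidents.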

Reference architectures

Buy

  • Vendor copilot ⇢ native integrations ⇢ admin policies (PII redaction, spend caps) ⇢ export telemetry via webhooks.
  • Use for: docs/search, support macros, sales email drafting.

Borrow

  • Open agent runtime ⇢ your tool adapters (CRM, ERP, ticketing) ⇢ retrieval layer ⇢ eval harness ⇢ observability (traces/metrics/logs).
  • Use for: domain workflows needing custom tools & approvals.
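A minimal sketch of the "tool adapters with approvals" idea: a registry dispatches agent tool calls, and sensitive tools require an approval callback before executing. The tool names and approval rule are invented for illustration:

```python
# Illustrative tool registry with an approval gate for sensitive tools.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCall:
    tool: str
    args: dict

def make_registry(require_approval: set[str],
                  approver: Callable[[ToolCall], bool]):
    # Stub adapters; real ones would wrap CRM/ERP/ticketing APIs.
    tools = {
        "crm_lookup": lambda args: {"account": args.get("id"), "tier": "gold"},
        "ticket_create": lambda args: {"ticket_id": "T-1"},
    }
    def dispatch(call: ToolCall):
        if call.tool in require_approval and not approver(call):
            return {"error": "approval denied"}
        return tools[call.tool](call.args)
    return dispatch

# Writes need approval; reads pass through.
dispatch = make_registry({"ticket_create"}, approver=lambda call: False)
```

The approval callback is where human-in-the-loop review or policy checks plug in without touching the agent runtime.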

Build

  • Internal platform (auth, CI/CD, feature flags) ⇢ agent runtime + memory ⇢ policy-as-code ⇢ cost guardrails ⇢ enterprise observability ⇢ incident playbooks.
  • Use for: regulated, latency-sensitive, or product-differentiating agents.

RFP questions (copy/paste)

  1. Portability: how do we export prompts, memories, and run logs? In what formats?
  2. Controls: budget caps, PII policies, allow-lists, sandboxing of tools—how enforced?
  3. Observability: do you emit OpenTelemetry traces with token/cost/latency?
  4. Evals: built-in evaluation suite? CI/canary gates? Custom tests?
  5. Roadmap & exit: what’s the deprecation policy? Data retention and account deletion SLA?

Eval rubric for pilots

  • Quality: task success rate, groundedness, hallucination budget.
  • Speed: latency p95 and tokens/sec per backend.
  • Cost: $/interaction with caching/RAG; cost predictability under load.
  • Operations: runbook maturity, alert quality, rollback success rate.
  • Security/Compliance: data residency, audit logs, least privilege, vendor posture.
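The quality, speed, and cost rows of the rubric reduce to a handful of metrics you can compute identically across pilots. A sketch, with example thresholds that are placeholders rather than recommendations:

```python
# Score a pilot on the rubric's quantitative rows; gate values are examples.

def p95(latencies_ms: list[float]) -> float:
    # Nearest-rank p95; fine for pilot-sized samples.
    ordered = sorted(latencies_ms)
    idx = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[idx]

def score_pilot(successes: int, total: int,
                latencies_ms: list[float], cost_usd: float) -> dict:
    return {
        "task_success_rate": successes / total,
        "latency_p95_ms": p95(latencies_ms),
        "cost_per_interaction": cost_usd / total,
    }

def passes_gates(metrics: dict, min_success: float = 0.85,
                 max_p95_ms: float = 3000, max_cost: float = 0.05) -> bool:
    return (metrics["task_success_rate"] >= min_success
            and metrics["latency_p95_ms"] <= max_p95_ms
            and metrics["cost_per_interaction"] <= max_cost)
```

Running both pilots through the same `score_pilot` keeps the Buy-vs-Borrow comparison honest; the operational and compliance rows still need qualitative review.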

30 / 60 / 90 plan

  1. 30 days: pick 2 high-value workflows; baseline manual KPI (cycle time, cost, quality); run a Buy and a Borrow pilot in parallel with identical evals.
  2. 60 days: expand best pilot to 10–20% traffic; add policy-as-code and budget caps; draft internal platform design if the case for Build emerges.
  3. 90 days: decide the primary path; negotiate pricing/SLAs or fund the platform; publish a portability plan (multi-model, artifact exports, schema contracts).

Definition of Done (for stack selection)

  • Signed decision brief with TCO, risk, and exit plan.
  • Evals wired into CI/canary; gates block regressions.
  • Observability standardized (traces/metrics/logs + cost/latency/quality).
  • Policy-as-code enforcing budgets, data classes, approvals.
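"Policy-as-code" here can be as simple as a declarative policy evaluated before each agent run, returning violations instead of silently proceeding. Field names below are assumptions, not a standard:

```python
# Illustrative pre-run policy check; keys and limits are invented examples.
POLICY = {
    "daily_budget_usd": 50.0,
    "allowed_data_classes": {"public", "internal"},
    "tools_requiring_approval": {"send_email"},
}

def check(run_request: dict, spent_today_usd: float) -> list[str]:
    """Return a list of policy violations; empty means the run may proceed."""
    violations = []
    if spent_today_usd + run_request.get("est_cost_usd", 0.0) > POLICY["daily_budget_usd"]:
        violations.append("budget cap exceeded")
    if not set(run_request.get("data_classes", [])) <= POLICY["allowed_data_classes"]:
        violations.append("disallowed data class")
    for tool in run_request.get("tools", []):
        if tool in POLICY["tools_requiring_approval"] and not run_request.get("approved"):
            violations.append(f"tool {tool} needs approval")
    return violations
```

Versioning the policy file in the same repo as the agent config makes budget and data-class changes reviewable like any other diff.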

Anti-patterns

  • Feature bingo: choosing the shiniest checklist over TCO/ops reality.
  • Single-vendor bet without data export or abstraction layer.
  • Prompt theater: no evals, no telemetry, no rollback path.
  • DIY platform too early: building before proof that agents are core to your product or risk posture.

Bottom line: Start with outcomes and constraints, not ideology. Buy when speed wins, borrow when you need custom tools with sane ops, and build when agents are the product—or when risk and scale demand it. Hedge lock-in from day one, and make evals/observability the contract, not an afterthought.