Hire for Systems Thinking, Not Syntax

Redesign roles and interviews to test decomposition, interfaces, and trade-offs over raw coding speed.

Most interviews still reward fast fingers and trivia. Meanwhile, the work that moves companies is upstream: clarifying goals, choosing boundaries, defining contracts, and navigating trade-offs under constraints. Speed matters—but only after we’re pointed in the right direction. Hiring should reflect that.

What great systems thinkers actually do

  • Decompose problems: turn a fuzzy brief into coherent components with clear responsibilities.
  • Design interfaces: APIs/events with versioning, ownership, and compatibility plans.
  • Reason about failure: timeouts, retries with jitter, circuit breakers, and graceful degradation (see the retry sketch after this list).
  • Balance constraints: cost, latency, reliability, privacy, and team capacity.
  • Make work observable: dashboards, SLOs, alerts, and runbooks before turning traffic on.
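To make the failure-handling bullet concrete, here is a minimal retry-with-full-jitter sketch in Python. The attempt count and delay values are illustrative assumptions, not recommendations; real numbers should come from the dependency’s SLO and your error budget.

```python
import random
import time

def call_with_retries(fn, max_attempts=4, base_delay=0.2, max_delay=2.0):
    """Retry a flaky call with exponential backoff plus full jitter (values are assumed, not prescribed)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # retry budget exhausted; let the caller degrade gracefully
            # Full jitter: sleep a random amount up to the exponential cap.
            delay = min(max_delay, base_delay * (2 ** (attempt - 1)))
            time.sleep(random.uniform(0, delay))
```

A strong candidate can explain why the jitter is there (to avoid synchronized retry storms) and what the cap protects (tail latency and the dependency itself).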

Rewrite the job

  • Outcomes over tools: state the business capabilities the hire must unlock (e.g., multi-tenant billing, cross-region failover), not a laundry list of frameworks.
  • Artifacts required: expect ADRs, interface specs, migration plans, and postmortems in the portfolio.
  • Dual-track growth: IC architecture path (Staff→Principal→Distinguished) alongside management.

Interview format (75–90 minutes)

  1. Context brief (5 min): Present a realistic problem, e.g., “Design a rate-limited notifications service for 5M users across 3 regions.”
  2. Decomposition (15 min): Candidate outlines domains, data flows, and risks. Look for boundaries, not boxes.
  3. Interfaces (20 min): Define 1–2 key contracts (API/Event + schema), versioning, and compatibility tests.
  4. Reliability & ops (15 min): SLOs, error budgets, rollout strategy (flags/canaries), rollback, and runbooks.
  5. Trade-offs (10 min): Compare two approaches (e.g., queue vs streaming; active-active vs active-passive) with cost/latency/failure analysis.
  6. Targeted coding (10–15 min): Small, testable slice (idempotent worker or rate-limit check) with clear contracts; quality over volume (see the sample slice after this list).
  7. Retro (5 min): What would they validate next? Which risks remain?
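For the targeted-coding step, the kind of small, testable slice we have in mind looks roughly like this per-tenant token-bucket check. The class name and parameters are illustrative, and a distributed version would need shared state rather than in-process counters.

```python
import time

class TokenBucket:
    """Per-tenant rate limiter: allows bursts up to `capacity`, refills at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Twenty lines like this plus a few unit tests (burst allowed, refill over time, rejection when empty) is a far stronger signal than pages of unrunnable pseudocode.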

Rubric (weighting)

  • Decomposition & boundaries (25%) — coherent modules, ownership, and seams.
  • Interface design (20%) — contracts, versioning, compatibility strategy.
  • Reliability & operability (20%) — SLOs, rollout/rollback, observability.
  • Trade-off clarity (20%) — cost/latency/failure analysis; evidence-based choices.
  • Implementation quality (15%) — small, correct, testable code over speed.

Copy-ready exercises

API Contract Kata (30 min)

Design a /v1/messages API supporting idempotency, retries, and tenant rate limits. Deliver: OpenAPI snippet, idempotency key policy, and a backward-compat plan for /v2.
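One acceptable shape for the idempotency-key portion of the answer, sketched in Python rather than OpenAPI for brevity. The in-memory store, status codes, and field names are assumptions a real design would pin down (durable store, TTL, per-tenant scoping).

```python
import hashlib
import json

# Hypothetical in-memory store; production would use a durable store with a TTL.
_idempotency_cache: dict[str, dict] = {}

def create_message(tenant_id: str, idempotency_key: str, payload: dict) -> dict:
    """POST /v1/messages sketch: same key + same body => replay the original response."""
    body_hash = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    cache_key = f"{tenant_id}:{idempotency_key}"
    cached = _idempotency_cache.get(cache_key)
    if cached:
        if cached["body_hash"] != body_hash:
            # Reusing a key with a different body is a client error, not a replay.
            return {"status": 409, "error": "idempotency key reused with a different payload"}
        return cached["response"]  # safe replay
    response = {"status": 201, "message_id": f"msg-{len(_idempotency_cache) + 1}"}
    _idempotency_cache[cache_key] = {"body_hash": body_hash, "response": response}
    return response
```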

Resilience Drill (20 min)

Given a dependency with 15% timeout risk, propose timeouts, backoff+jitter, and a circuit-breaker policy. Explain impact on SLOs and user experience.
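A candidate’s circuit-breaker policy can be summarized in a few lines like the sketch below. The thresholds are placeholders, and a production breaker would also track half-open probe outcomes and emit metrics to feed the SLO discussion.

```python
import time

class CircuitBreaker:
    """Open after `failure_threshold` consecutive timeouts; probe again after `reset_timeout` seconds."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast to protect callers")
            self.opened_at = None  # half-open: let one probe through
        try:
            result = fn()
        except TimeoutError:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```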

Migration Plan (20 min)

Move billing from single-region DB to sharded, multi-region setup. Deliver: data movement strategy (CDC + reconciliation), cutover plan, rollback, and drift monitoring.
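For the reconciliation piece, we would expect something like a periodic checksum comparison between the source of truth and each CDC-fed shard. The row fields below are invented for illustration; real reconciliation would also bound replication lag and sample mismatched ranges for repair.

```python
import hashlib

def shard_checksum(rows) -> str:
    """Order-independent checksum over (id, amount_cents, updated_at) billing rows."""
    digest = 0
    for row in rows:
        h = hashlib.sha256(f"{row['id']}|{row['amount_cents']}|{row['updated_at']}".encode())
        digest ^= int.from_bytes(h.digest()[:8], "big")
    return f"{digest:016x}"

def reconcile(source_rows, target_rows) -> bool:
    """Compare legacy single-region rows against the replicated shard; False means drift."""
    return shard_checksum(source_rows) == shard_checksum(target_rows)
```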

Signals of excellence

  • Writes an ADR with options A/B/C and rationale tied to goals/SLOs.
  • Uses precise language on consistency (at-least-once + idempotency; avoids “exactly-once” myths).
  • Thinks in guardrails: budgets, cohorts, blast radius, and auditability.
  • Makes observability first-class: golden signals and dashboards before launch.

30 / 60 / 90 roll-out for hiring teams

  1. 30 days: Replace trivia rounds with a single design exercise; publish the rubric; train interviewers.
  2. 60 days: Introduce contract tests in the exercise; require SLOs and rollout/rollback details; calibrate scoring across panels.
  3. 90 days: Add portfolio review (ADRs, postmortems); measure signal quality (on-the-job performance vs interview scores); iterate.

Definition of Done (for a systems-first process)

  • Every candidate produces at least one ADR + interface spec during the loop.
  • Exercises include reliability, cost, and security considerations—not just code.
  • Rubrics are published, calibrated, and audited quarterly.
  • Offer decisions cite evidence mapped to rubric categories.

Anti-patterns to avoid

  • Speed worship: rewarding fastest code over correct, observable systems.
  • Architecture theater: hand-wavy boxes without contracts or rollout plans.
  • Tool tribalism: rejecting strong designs for lack of a specific framework keyword.
  • Opaque decisions: no rubric, no calibration, no feedback loop.

Hire the people who can shape problems, not just type solutions. Systems thinking scales; syntax can be learned and assisted. Design your process accordingly.