Hire for Systems Thinking, Not Syntax
Redesign roles and interviews to test decomposition, interfaces, and trade-offs over raw coding speed.
Most interviews still reward fast fingers and trivia. Meanwhile, the work that moves companies is upstream: clarifying goals, choosing boundaries, defining contracts, and navigating trade-offs under constraints. Speed matters—but only after we’re pointed in the right direction. Hiring should reflect that.
What great systems thinkers actually do
- Decompose problems: turn a fuzzy brief into coherent components with clear responsibilities.
- Design interfaces: APIs/events with versioning, ownership, and compatibility plans.
- Reason about failure: timeouts, retries with jitter, circuit breakers, and graceful degradation (a retry sketch follows this list).
- Balance constraints: cost, latency, reliability, privacy, and team capacity.
- Make work observable: dashboards, SLOs, alerts, and runbooks before turning traffic on.
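To make the failure-handling bullet concrete, here is a minimal sketch of retries with exponential backoff and full jitter in Python. `call_with_retries` and its defaults are illustrative assumptions, not a prescribed implementation; production code would also distinguish retryable errors (timeouts, 5xx) from permanent ones.

```python
import random
import time

def call_with_retries(fn, max_attempts=4, base_delay=0.1, max_delay=2.0):
    """Retry fn with exponential backoff and full jitter.

    Illustrative sketch: any exception is treated as transient here;
    real code would classify errors before retrying.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # retry budget exhausted; let the caller degrade gracefully
            # Full jitter: sleep a random duration up to the capped backoff.
            backoff = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, backoff))
```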
Rewrite the job
- Outcomes over tools: state the business capabilities the hire must unlock (e.g., multi-tenant billing, cross-region failover), not a laundry list of frameworks.
- Artifacts required: expect ADRs, interface specs, migration plans, and postmortems in the portfolio.
- Dual-track growth: an IC architecture path (Staff→Principal→Distinguished) alongside management.
Interview format (75–90 minutes)
- Context brief (5 min): A realistic problem: “Design a rate-limited notifications service for 5M users across 3 regions.”
- Decomposition (15 min): Candidate outlines domains, data flows, and risks. Look for boundaries, not boxes.
- Interfaces (20 min): Define 1–2 key contracts (API/Event + schema), versioning, and compatibility tests.
- Reliability & ops (15 min): SLOs, error budgets, rollout strategy (flags/canaries), rollback, and runbooks.
- Trade-offs (10 min): Compare two approaches (e.g., queue vs streaming; active-active vs active-passive) with cost/latency/failure analysis.
- Targeted coding (10–15 min): Small, testable slice (idempotent worker or rate-limit check) with clear contracts; quality over volume. A sample rate-limit check follows this list.
- Retro (5 min): What would they validate next? Which risks remain?
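What a passing answer to the targeted-coding slot might look like: a small token-bucket rate-limit check. The class name and parameters below are assumptions for illustration; any correct, testable variant earns the points.

```python
import time

class TokenBucket:
    """Token-bucket limiter: sustain `rate` requests/sec with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Calling `allow()` once per request returns False when the burst budget is spent; tokens refill continuously at `rate` per second.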
Rubric (weighting)
- Decomposition & boundaries (25%) — coherent modules, ownership, and seams.
- Interface design (20%) — contracts, versioning, compatibility strategy.
- Reliability & operability (20%) — SLOs, rollout/rollback, observability.
- Trade-off clarity (20%) — cost/latency/failure analysis; evidence-based choices.
- Implementation quality (15%) — small, correct, testable code over speed.
Copy-ready exercises
API Contract Kata (30 min)
Design a /v1/messages API supporting idempotency, retries, and tenant rate limits. Deliver: OpenAPI snippet, idempotency-key policy, and a backward-compatibility plan for /v2.
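A sketch of the idempotency-key policy a strong submission might include, assuming a client-supplied Idempotency-Key header and an in-memory dict standing in for a shared store with a TTL (all names here are hypothetical):

```python
# Hypothetical sketch of idempotency-key handling for POST /v1/messages.
_responses: dict[str, dict] = {}  # stand-in for a shared store with a TTL

def create_message(idempotency_key: str, payload: dict) -> dict:
    # On retry, replay the stored response instead of creating a duplicate.
    if idempotency_key in _responses:
        return _responses[idempotency_key]
    result = {"id": f"msg_{len(_responses) + 1}", "body": payload.get("body")}
    _responses[idempotency_key] = result
    return result
```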
Resilience Drill (20 min)
Given a dependency with a 15% timeout rate, propose timeouts, backoff with jitter, and a circuit-breaker policy. Explain the impact on SLOs and user experience.
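One defensible shape for the circuit-breaker part of the answer, sketched in Python. The threshold and reset window are placeholder numbers; candidates should derive their own from the 15% timeout rate and the SLOs they propose.

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; probe again after `reset_after` seconds."""

    def __init__(self, threshold: int = 5, reset_after: float = 30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")  # shed load, protect the dependency
            self.opened_at = None  # half-open: allow one probe through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip (or re-trip after a failed probe)
            raise
        self.failures = 0  # success closes the breaker
        return result
```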
Migration Plan (20 min)
Move billing from a single-region database to a sharded, multi-region setup. Deliver: data movement strategy (CDC + reconciliation), cutover plan, rollback, and drift monitoring.
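For the reconciliation and drift-monitoring pieces, a minimal check might compare order-insensitive checksums between source and target shards. Everything below, including the (id, updated_at) row shape, is an assumed example:

```python
import hashlib

def table_checksum(rows) -> str:
    """Order-insensitive checksum over (id, updated_at) pairs.

    XOR-ing per-row digests makes the result independent of scan order,
    so source and shard can be scanned concurrently.
    """
    acc = 0
    for row_id, updated_at in rows:
        digest = hashlib.sha256(f"{row_id}:{updated_at}".encode()).digest()
        acc ^= int.from_bytes(digest[:8], "big")
    return f"{acc:016x}"

def has_drift(source_rows, target_rows) -> bool:
    # Alert when source and migrated shard disagree; feed this into dashboards.
    return table_checksum(source_rows) != table_checksum(target_rows)
```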
Signals of excellence
- Writes an ADR with options A/B/C and rationale tied to goals/SLOs.
- Uses precise language on consistency (at-least-once + idempotency; avoids “exactly-once” myths).
- Thinks in guardrails: budgets, cohorts, blast radius, and auditability.
- Makes observability first-class: golden signals and dashboards before launch.
30/60/90 rollout for hiring teams
- 30 days: Replace trivia rounds with a single design exercise; publish the rubric; train interviewers.
- 60 days: Introduce contract tests in the exercise (a minimal example follows this list); require SLOs and rollout/rollback details; calibrate scoring across panels.
- 90 days: Add portfolio review (ADRs, postmortems); measure signal quality (on-the-job performance vs interview scores); iterate.
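"Contract tests" in the 60-day step can stay candidate-sized. A minimal consumer-side example, with a stub standing in for the candidate's handler (both names are hypothetical):

```python
def handler_stub(payload: dict) -> dict:
    """Stand-in for the candidate's /v1/messages handler."""
    return {"id": "msg_1", "body": payload["body"]}

def test_messages_contract():
    # Consumer-side contract: the fields clients of /v1/messages depend on.
    resp = handler_stub({"body": "hello"})
    assert set(resp) >= {"id", "body"}
    assert resp["id"].startswith("msg_")
```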
Definition of Done (for a systems-first process)
- Every candidate produces at least one ADR + interface spec during the loop.
- Exercises include reliability, cost, and security considerations—not just code.
- Rubrics are published, calibrated, and audited quarterly.
- Offer decisions cite evidence mapped to rubric categories.
Anti-patterns to avoid
- Speed worship: rewarding fastest code over correct, observable systems.
- Architecture theater: hand-wavy boxes without contracts or rollout plans.
- Tool tribalism: rejecting strong designs for lack of a specific framework keyword.
- Opaque decisions: no rubric, no calibration, no feedback loop.
Hire the people who can shape problems, not just type solutions. Systems thinking scales; syntax can be learned and assisted. Design your process accordingly.