Backend Interview · Q&A Deep Dive

Backend & System Design — Questions, Storytold

Every backend interview question that shows up in senior rounds — architecture, microservices, sagas, idempotency, observability, databases, Node.js, resilience — answered the way you'd actually say it in the room.

How to read this page. Each section is a cluster of related interview questions. Every question is answered in Q · / answer pairs. Architecture-style questions (design an ordering system, draw your project) carry a Mermaid diagram and a story-arc walkthrough. The shorter conceptual questions get tight, story-flavored answers — the kind you can deliver in 60–90 seconds when the panel asks.
01 · Architecture & system design

Walk me through your project's architecture

This is almost always the first question after introductions. The panel doesn't actually want a tour of every microservice you own — they want to see whether you can tell a story about your system: what comes in, where it goes, what each box exists to solve, and what would break first under load. Drawing a clean diagram and narrating it well is roughly 30% of the round.

Walk me through the architecture of a project you have worked on and draw the diagram.

Open with the user-facing story, not the boxes. "A customer opens our checkout page on their phone. By the time the button turns from Pay to Order Confirmed, the request has touched seven services, two databases, a cache, and a message bus. Let me draw what that path looks like." Then draw — and only then label what each box is.

A clean five-tier diagram you can reproduce on any whiteboard:

flowchart LR
  U([① User app<br/>web + mobile])
  CDN[② CDN<br/>Cloudflare]
  GW[③ API Gateway<br/>+ Auth + Rate-limit]
  S1[④ Order service]
  S2[④ Inventory service]
  S3[④ Payment service]
  CACHE[⑤ Redis cluster]
  MQ[(⑥ Kafka)]
  DB[(⑦ Postgres<br/>+ read replica)]
  WK[⑧ Workers / consumers]
  U --> CDN --> GW
  GW --> S1 & S2 & S3
  S1 --> CACHE
  S1 --> DB
  S1 --> MQ
  MQ --> WK
  WK --> DB
  S2 --> DB
  S3 --> DB
  style U fill:#171d27,stroke:#4dfeee,color:#d4dae5
  style CDN fill:#e8743b,stroke:#e8743b,color:#fff
  style GW fill:#4a90d9,stroke:#4a90d9,color:#fff
  style S1 fill:#38b265,stroke:#38b265,color:#fff
  style S2 fill:#38b265,stroke:#38b265,color:#fff
  style S3 fill:#38b265,stroke:#38b265,color:#fff
  style CACHE fill:#9b72cf,stroke:#9b72cf,color:#fff
  style MQ fill:#d4a838,stroke:#d4a838,color:#fff
  style DB fill:#3cbfbf,stroke:#3cbfbf,color:#fff
  style WK fill:#ec5d8a,stroke:#ec5d8a,color:#fff

Now narrate left-to-right, answering "what is it, and what would break without it?" for each box:

  • ① User app — issues HTTPS requests with a JWT in the header. Without it there's no traffic; with it we need to assume everything a client sends is untrusted.
  • ② CDN — caches static bundles and short-TTL public reads at the edge. Without it, every page load adds 150 ms of cross-continent latency.
  • ③ API Gateway — terminates TLS, validates JWT, applies per-user rate limits, and routes by path. Without it, every microservice would re-implement auth and we'd have no central rate-limit choke point.
  • ④ Microservices — Order, Inventory, Payment. Each owns its own data and its own deploy cadence. Without this split, a slow payment partner would block order placement entirely.
  • ⑤ Redis — session state, hot SKU lookups, idempotency keys. Without it, the DB sees 10× the read load and p99 latency triples.
  • ⑥ Kafka — the async backbone. OrderPlaced, PaymentCaptured, InventoryReserved events flow through it. Without it, services would be tightly coupled via synchronous calls and any one slow downstream would topple the chain.
  • ⑦ Postgres — system of record. One write primary per service, plus a read replica for analytics-style queries. Without replicas, heavy reports would lock up checkout.
  • ⑧ Workers — Kafka consumers that do the slow stuff: email, invoice PDF, shipping label, analytics. Without them, every order would have to wait synchronously for those side-effects.

Close with what breaks first. "If traffic 10×'d tomorrow, the first thing to fold would be the Postgres write primary — single-writer, single point. We've already shard-keyed the order table by customer_id, so the migration path is ready when we need it." This sentence — naming the next bottleneck — is the single most senior-sounding thing you can say in this question.

Explain the backend architecture of your recent project.

Same shape as above, but lean harder into three story beats: the request path (gateway → service → DB), the event path (service → Kafka → workers), and the data plane (which service owns which table, which cache). If you remember nothing else, remember that an interviewer is satisfied when you've named the boundaries — who owns what, who calls whom synchronously, and what crosses the bus.

Mention concrete numbers if you have them: "we serve ~120 RPS at p99 of 180 ms; payment is the slowest call at 600 ms p99, which is why it's async." Numbers move you from "candidate who has seen architecture diagrams" to "candidate who has run one in production."

Design a scalable backend for high-traffic systems.

There's no magic answer — there's a checklist you walk in the right order. Use this five-step ladder:

  • Step 1 — Stateless app tier. Push session state to Redis or JWT so any pod can handle any request. This unlocks horizontal scaling behind a load balancer.
  • Step 2 — Cache the hot reads. Cache-aside in Redis for the top-N keys (Zipf distribution means <5% of keys take 80% of traffic). Drops DB read load by 10–30×.
  • Step 3 — Offload slow work to a queue. Email, PDF, analytics, anything that doesn't need to be in the synchronous response path goes onto Kafka / SQS. Frees up app threads for real user requests.
  • Step 4 — Split reads from writes. Read replicas absorb dashboards and reports. Then, if a single primary is still the bottleneck, shard by tenant / user-id.
  • Step 5 — Push to the edge. CDN for static + cacheable API responses. Regional read replicas for cross-region latency. Multi-region active-active only when you really need it (it's expensive in consistency complexity).

The trick in the interview is to stop and ask: "what's the read/write ratio?" and "what's the consistency tolerance?" These two answers tell you which of the five steps actually matters for this system.
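Step 2 of the ladder is the one most worth being able to write from memory. Below is a minimal cache-aside sketch; the `Map` stands in for Redis, and `CacheAside`, `loadFromDb`, and the injectable `now` clock are hypothetical names chosen for illustration, not part of any real client library.

```typescript
type CacheEntry<V> = { value: V; expiresAt: number };

// Cache-aside: read the cache first, fall back to the source of truth on a
// miss, then populate the cache with a TTL.
class CacheAside<V> {
  private readonly entries = new Map<string, CacheEntry<V>>(); // stand-in for Redis
  private readonly loadFromDb: (key: string) => V;
  private readonly ttlMs: number;
  private readonly now: () => number;
  hits = 0;
  misses = 0;

  constructor(loadFromDb: (key: string) => V, ttlMs: number, now: () => number = Date.now) {
    this.loadFromDb = loadFromDb;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V {
    const cached = this.entries.get(key);
    if (cached !== undefined && cached.expiresAt > this.now()) {
      this.hits += 1;
      return cached.value; // served from cache, the DB is not touched
    }
    this.misses += 1;
    const value = this.loadFromDb(key); // miss or expired entry: go to the DB
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Call after a write so the next read reloads fresh data.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

With a real Redis the shape is the same, only asynchronous: read the key, on a miss query the database, then write the key back with an expiry.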

How would you scale a backend service handling millions of requests?

Same five steps as above, plus three operational realities:

  • Autoscaling on CPU + custom queue-depth metric, not just CPU — CPU lags by ~2 minutes; a backing-up queue lets you scale 60 seconds earlier.
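  • Connection pooling at the DB. A naive 1000-pod fleet with 50 connections each = 50K connections, far more than a single Postgres instance can handle (every connection is a separate backend process, and the default max_connections is 100). Pool via PgBouncer.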
  • Backpressure at every async boundary. If consumers can't keep up, the producer must slow down or shed load — otherwise queue lag grows unboundedly.

And the unglamorous truth: most "millions of RPS" systems are actually "millions of requests/day with a hot 50K RPS window." Ask for the daily / peak distribution before you over-design.
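The backpressure bullet can be sketched with a bounded buffer: when it is full, the producer gets an explicit "no" instead of the queue growing without limit. `BoundedQueue` and its method names are hypothetical; a real system would get the same signal from broker lag or a queue-depth metric.

```typescript
// Bounded buffer between a producer and a slower consumer.
class BoundedQueue<T> {
  private readonly items: T[] = [];
  private readonly capacity: number;
  rejected = 0;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Returns false when full so the producer can slow down, retry later, or shed load.
  offer(item: T): boolean {
    if (this.items.length >= this.capacity) {
      this.rejected += 1;
      return false;
    }
    this.items.push(item);
    return true;
  }

  // Consumer side: take the oldest item, or undefined when the queue is empty.
  poll(): T | undefined {
    return this.items.shift();
  }

  // Queue depth doubles as the custom autoscaling metric.
  get depth(): number {
    return this.items.length;
  }
}
```

The `depth` getter is the number you would export for autoscaling: it rises as soon as consumers fall behind, before CPU does.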

02 · Microservices vs monolith

The split — when, why, and how to draw the lines

Half of all "system design" interviews come down to whether you can justify your split. The wrong answer is "microservices because they're scalable." The right answer always starts from a constraint — team size, deploy independence, blast radius, scaling shape — and works backward to the architecture.

Why did you choose microservices over monolith (or vice versa)?

Frame this as a tradeoff, not a victory. "We picked microservices when our team grew past ~30 engineers and we kept stepping on each other in a single repo — every deploy needed sign-off from four people. The split bought us independent deploys at the cost of a Kafka cluster, a service mesh, and three new failure modes."

If you went the other way: "We're a five-engineer team. A monolith means one repo, one deploy pipeline, in-process function calls instead of network hops, and ACID transactions for free. The day we hit ten engineers or a real bi-modal scaling need, we'll start carving off services." This is a strong answer — it shows you understand microservices are a solution to an organizational problem first, a technical one second.

Monolith vs microservices — pros, cons, and tradeoffs.
Dimension | Monolith | Microservices
Deploy | One artifact, one pipeline | Independent per service
Local dev | Run the whole thing | Mocks, contracts, docker-compose hell
Transactions | ACID in-process | Saga / 2PC / eventual consistency
Latency | Function call (ns) | Network hop (ms)
Failure modes | One process crashes | Cascade, retry storms, partial failure
Team scaling | Conway's law bites past ~20 engineers | Each team owns a service end-to-end
Cost | Cheap to run | Service mesh + ops overhead
Best for | Small teams, unclear domain | Large orgs, divergent scaling needs

The senior take: start monolith, modularize internally, extract services only when the pain forces you. Premature microservices is the most common architectural sin of the last decade.

How do you decide what functionality goes into different microservices?

Three lenses, in order:

  • Bounded contexts (Domain-Driven Design). Where does a word change meaning? "Order" in checkout (a cart being paid for) is a different thing from "Order" in fulfilment (a packing slip). When the language diverges, you've found a boundary.
  • Rate of change. Things that change together belong together. If your shipping logic changes weekly but your auth logic changes once a year, they're different services.
  • Scaling shape. If module A handles 10K RPS and module B handles 50 RPS, you don't want them in the same pod competing for memory and CPU. Different scaling profile → different service.

What people get wrong: splitting by technical layer (UserController, UserRepository, UserModel as three services). That's not a service boundary, that's a coupling disaster. Split by business capability.

What factors do you consider while grouping or splitting services?

Beyond the three above, watch for:

  • Data ownership. Every table should have exactly one service that writes to it. If two services need to write the same table, you've drawn the line wrong.
  • Team ownership. One team per service is the ideal. Two teams sharing a service means slow decisions and merge conflicts.
  • Chatty calls. If two "services" make 5 synchronous calls to each other to serve one user request, they should be one service. Network is too expensive.
  • Consistency requirements. If two pieces of data must update atomically and you can't afford a saga, keep them in one service / one DB.

How would you split a large backend into microservices?

The proven path is strangler fig: don't rewrite, peel. Steps:

  • 1. Map your domain with event-storming. Get the team in a room with sticky notes; list every domain event ("OrderPlaced", "PaymentRefunded"). Cluster them — those clusters are candidate services.
  • 2. Pick the lowest-risk peripheral first. Notifications service. Reports service. Something with few inbound dependencies. Get the team comfortable with the new patterns (deploy, observability, error handling) before you touch the core.
  • 3. Introduce an API gateway / facade in front of the monolith. New service calls go through it; the old in-process calls still work. This is the seam.
  • 4. Extract one service at a time. Carve out the code, move the relevant tables (or expose them via an API), point the facade at the new service. Run both paths until you trust the new one. Then remove the old.
  • 5. Repeat for 12–24 months. The truth no one tells you: a monolith-to-microservices migration is measured in years, not quarters. If your CTO thinks it's a 6-month project, you have a different problem to solve first.

03 · Event-driven & service communication

How services actually talk

Once you've split into services, the next question is how they talk. Synchronous REST is the obvious default — and the wrong default for half the calls in a typical system. Event-driven flow is what stops your checkout from failing because the recommendations service is slow.

What is Event-Driven Architecture?

An architecture where services don't call each other directly — instead they publish events ("OrderPlaced", "PaymentCaptured") to a broker (Kafka, RabbitMQ, SNS), and any service that cares subscribes. The publisher has no idea who's listening and doesn't wait for them.

The mental shift: in REST, the Order service commands the Email service ("send this email, please"). In events, the Order service announces ("an order happened, here's the data"); Email decides for itself whether to act. That inversion is what gives you loose coupling — you can add a fraud-detection consumer next month and the Order service doesn't even know.

Explain producer–consumer flow in an event-driven system.
sequenceDiagram
  autonumber
  participant P as Order Service<br/>(producer)
  participant K as Kafka topic<br/>"orders"
  participant C1 as Email consumer
  participant C2 as Analytics consumer
  participant C3 as Shipping consumer
  P->>K: publish OrderPlaced { id:42, ... }
  K-->>P: ack (committed to log)
  Note over K: Event durably stored,<br/>offset assigned
  K->>C1: deliver event (offset 7)
  K->>C2: deliver event (offset 7)
  K->>C3: deliver event (offset 7)
  C1->>K: commit offset 7
  C2->>K: commit offset 7
  C3->>K: commit offset 7

Key properties: (1) the producer is done once Kafka acks — it doesn't wait for consumers. (2) Each consumer has its own offset, so Email being slow doesn't slow down Analytics. (3) Events are durable — even if all consumers are down, the event survives in the log; they catch up when they come back.
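Those three properties fall out of one data structure: an append-only log plus one committed offset per consumer. The sketch below is an in-memory stand-in, not the Kafka client API; `TopicLog`, `publish`, `poll`, and `commit` are hypothetical names.

```typescript
// Append-only log with an independent committed offset per consumer group.
class TopicLog<E> {
  private readonly events: E[] = [];
  private readonly committed = new Map<string, number>(); // group -> next offset to read

  // Producer side: append and return the assigned offset (the "ack").
  publish(event: E): number {
    this.events.push(event);
    return this.events.length - 1;
  }

  // Consumer side: everything from this group's committed position onward.
  poll(group: string): { offset: number; event: E }[] {
    const from = this.committed.get(group) ?? 0;
    return this.events.slice(from).map((event, i) => ({ offset: from + i, event }));
  }

  // Mark everything up to and including `offset` as processed for this group.
  commit(group: string, offset: number): void {
    this.committed.set(group, offset + 1);
  }
}
```

Because the log never deletes on read, a slow or offline group simply has an older offset; it does not hold anyone else back and loses nothing.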

How do services communicate in microservices architecture?

Three patterns, and a real system uses all three:

  • Synchronous REST/gRPC — request/response, blocking. Use when the caller needs an answer right now (auth check, price lookup, inventory availability).
  • Asynchronous events — pub/sub over Kafka or similar. Use for "this happened, others might care" (OrderPlaced, UserSignedUp).
  • Async commands / queues — one-to-one task queues over SQS, RabbitMQ. Use for "do this work for me but I won't wait" (send email, generate invoice).

The trap is using REST for everything. Three nested REST calls = three failure modes stacked + 3× the latency. The senior instinct is to push as much as possible onto async paths and keep the synchronous path short.

Synchronous vs asynchronous communication in distributed systems.
Aspect | Synchronous | Asynchronous
Caller knows result | Yes, immediately | No; eventual or via callback
Coupling | Tight (caller needs callee up) | Loose (broker buffers)
Latency | Sum of all hops | Producer returns instantly
Failure | Cascades up the call chain | Isolated, retried independently
Best for | Reads, validations, "must answer now" | Side effects, fan-out, slow work
Tooling | HTTP, gRPC | Kafka, RabbitMQ, SQS, SNS

Rule of thumb: if the user is waiting for it, sync. If a system is waiting for it, async.

REST vs event-driven communication.

REST is a verb — "do this thing, return the result." Events are a noun — "this happened, here's the data." REST is one-to-one and coupled (the caller must know the callee's URL). Events are one-to-many and decoupled (publisher doesn't care who listens).

In practice you mix them. Public APIs and synchronous queries → REST/gRPC. Internal state changes that other services need to react to → events. A typical request flow: user hits REST endpoint → service writes DB row + publishes event → 4 downstream consumers each do their thing without blocking the user.

04 · Distributed transactions & saga

How do you stay consistent across services?

The single hardest question in microservices interviews. You can't use a database transaction across service boundaries (each owns its own DB). So how do you place an order, reserve inventory, charge payment, and arrange shipping — and end up consistent even if one of them fails halfway through?

How would you design an online ordering / inventory system?

Four services, one event bus, one saga. Let me draw it and walk through what each piece earns its keep doing.

flowchart LR
  U([Customer])
  GW[API Gateway]
  OS[① Order Service<br/>+ orders DB]
  IS[② Inventory Service<br/>+ stock DB]
  PS[③ Payment Service<br/>+ payments DB]
  SS[④ Shipping Service<br/>+ shipments DB]
  K[(Kafka)]
  U --> GW --> OS
  OS --> K
  K --> IS
  K --> PS
  K --> SS
  IS --> K
  PS --> K
  SS --> K
  style OS fill:#e8743b,stroke:#e8743b,color:#fff
  style IS fill:#4a90d9,stroke:#4a90d9,color:#fff
  style PS fill:#38b265,stroke:#38b265,color:#fff
  style SS fill:#9b72cf,stroke:#9b72cf,color:#fff
  style K fill:#d4a838,stroke:#d4a838,color:#fff
  • ① Order Service — receives POST /orders, creates an order in PENDING state, publishes OrderCreated. Owns the order lifecycle state machine. Without it, no single service owns "what state is this order in?"
  • ② Inventory Service — consumes OrderCreated, reserves stock atomically (decrement with row lock), publishes InventoryReserved or InventoryRejected. Owns the source of truth for "how many of SKU X are available right now."
  • ③ Payment Service — consumes InventoryReserved, calls the payment gateway, publishes PaymentCaptured or PaymentFailed. The only place that knows how to talk to Stripe/Razorpay.
  • ④ Shipping Service — consumes PaymentCaptured, generates a label, schedules pickup, publishes OrderShipped. Order Service consumes that and flips the order to SHIPPED.

The reverse path matters more than the happy path. If payment fails: Order Service consumes PaymentFailed, publishes OrderCancelled, Inventory Service consumes it and releases the reservation. This compensation chain is the saga.
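The compensation chain is easier to reason about once it runs. This is a simplified, in-process sketch of the choreographed flow above: a synchronous `Bus` stands in for Kafka, shipping is left out, and `runCheckout`, `cardIsValid`, and the `PAID` status are illustrative assumptions rather than part of the described design.

```typescript
type SagaEvent =
  | { type: "OrderCreated"; orderId: string; sku: string }
  | { type: "InventoryReserved"; orderId: string }
  | { type: "InventoryRejected"; orderId: string }
  | { type: "PaymentCaptured"; orderId: string }
  | { type: "PaymentFailed"; orderId: string }
  | { type: "OrderCancelled"; orderId: string };

// Synchronous in-process bus standing in for Kafka.
class Bus {
  private readonly handlers: ((e: SagaEvent) => void)[] = [];
  readonly history: string[] = [];
  subscribe(handler: (e: SagaEvent) => void): void {
    this.handlers.push(handler);
  }
  publish(e: SagaEvent): void {
    this.history.push(e.type);
    for (const handler of this.handlers) handler(e);
  }
}

function runCheckout(cardIsValid: boolean): { status: string; stock: number; history: string[] } {
  const bus = new Bus();
  const orders = new Map<string, string>(); // orderId -> status
  const stock = new Map<string, number>([["sku-1", 5]]);
  const reservations = new Map<string, string>(); // orderId -> sku

  // Inventory Service: reserve on OrderCreated, release on OrderCancelled.
  bus.subscribe((e) => {
    if (e.type === "OrderCreated") {
      const left = stock.get(e.sku) ?? 0;
      if (left < 1) {
        bus.publish({ type: "InventoryRejected", orderId: e.orderId });
        return;
      }
      stock.set(e.sku, left - 1);
      reservations.set(e.orderId, e.sku);
      bus.publish({ type: "InventoryReserved", orderId: e.orderId });
    } else if (e.type === "OrderCancelled") {
      const sku = reservations.get(e.orderId);
      if (sku !== undefined) {
        stock.set(sku, (stock.get(sku) ?? 0) + 1); // compensation: give the unit back
        reservations.delete(e.orderId);
      }
    }
  });

  // Payment Service: charge once stock is reserved.
  bus.subscribe((e) => {
    if (e.type !== "InventoryReserved") return;
    if (cardIsValid) bus.publish({ type: "PaymentCaptured", orderId: e.orderId });
    else bus.publish({ type: "PaymentFailed", orderId: e.orderId });
  });

  // Order Service: owns the order state machine and starts compensation.
  bus.subscribe((e) => {
    if (e.type === "PaymentCaptured") orders.set(e.orderId, "PAID");
    if (e.type === "PaymentFailed" || e.type === "InventoryRejected") {
      orders.set(e.orderId, "CANCELLED");
      bus.publish({ type: "OrderCancelled", orderId: e.orderId });
    }
  });

  orders.set("o-42", "PENDING");
  bus.publish({ type: "OrderCreated", orderId: "o-42", sku: "sku-1" });
  return { status: orders.get("o-42") ?? "UNKNOWN", stock: stock.get("sku-1") ?? 0, history: bus.history };
}
```

Calling `runCheckout(false)` ends with the order cancelled and the stock count back where it started; `runCheckout(true)` leaves one unit reserved and the order paid.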

How would you ensure atomic transactions across multiple microservices?

The blunt truth: you can't, not in the ACID sense. There's no cross-service "BEGIN TRANSACTION ... COMMIT". You have three options, in roughly increasing order of practicality:

  • Two-Phase Commit (2PC). Coordinator asks every service "can you commit?", waits for all yeses, then says "commit". Theoretically atomic, but in practice it requires every participant to hold locks until the coordinator decides, and one slow participant blocks everyone. Modern microservice stacks essentially don't use it.
  • Saga pattern. Replace one big transaction with a sequence of local transactions plus compensating actions. Each local step commits in its own DB; if a later step fails, you run the compensations in reverse. Eventually consistent — but practical.
  • Outbox pattern + idempotent consumers. The implementation detail that makes sagas actually work. Write the DB row and the "to be published" event in the same local transaction (to an outbox table), then a separate process drains the outbox to Kafka. Guarantees you can't update the DB and forget to publish the event.

In 95% of real systems the answer is "saga + outbox + idempotent consumers." Say that and be ready to draw it.
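If asked to draw the outbox, a sketch along these lines is usually enough. It is an in-memory model under stated assumptions: `OrderDb.createOrder` stands in for one local database transaction, `drainOutbox` for the separate relay process, and `send` for the broker publish call; all three names are hypothetical.

```typescript
type OrderRow = { id: string; status: string };
type OutboxRow = { eventId: number; type: string; payload: string; published: boolean };

class OrderDb {
  readonly orders = new Map<string, OrderRow>();
  readonly outbox: OutboxRow[] = [];
  private nextEventId = 1;

  // Stand-in for ONE local transaction: the order row and the outbox row
  // are written together, or not at all.
  createOrder(id: string): void {
    if (this.orders.has(id)) throw new Error("duplicate order " + id);
    const order: OrderRow = { id, status: "PENDING" };
    this.orders.set(id, order);
    this.outbox.push({
      eventId: this.nextEventId++,
      type: "OrderCreated",
      payload: JSON.stringify(order),
      published: false,
    });
  }
}

// Relay: a row is marked published only after the broker accepted it, so a
// crash in between causes a re-send (at-least-once), never a lost event.
function drainOutbox(db: OrderDb, send: (row: OutboxRow) => void): number {
  let sent = 0;
  for (const row of db.outbox) {
    if (row.published) continue;
    send(row); // may throw; the row stays unpublished and is retried next drain
    row.published = true;
    sent += 1;
  }
  return sent;
}
```

The re-send case is why the third ingredient matters: consumers deduplicate on `eventId`, so a duplicate publish is harmless.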

Saga pattern vs distributed transactions.

2PC says: "I will hold everyone's locks until I'm certain everyone can commit, then we all commit atomically." Strong consistency, terrible availability — one slow participant freezes the cluster.

Saga says: "Each step commits locally and immediately. If a later step fails, I'll run compensations to undo the earlier ones." Eventually consistent, far better availability, but you accept that for a brief window the system is in an "in-flight" state. For business workflows (place order, ship goods, refund) that's almost always the right trade. For "transfer ₹1000 from A to B" inside a single bank you might still want 2PC or a stored procedure.

Choreography vs orchestration in saga.

Two ways to wire the saga together:

  • Choreography — each service listens for the previous event and publishes the next. Order publishes OrderCreated, Inventory listens and publishes InventoryReserved, Payment listens and publishes PaymentCaptured... No central brain. Pros: loose coupling, easy to add a new step. Cons: the workflow is implicit — to understand what happens after an order is placed, you have to grep across 5 services.
  • Orchestration — a dedicated orchestrator service holds the workflow state. It tells Inventory to reserve, waits for the reply, tells Payment to charge, waits, etc. Pros: workflow is explicit and observable in one place (Temporal, AWS Step Functions, Camunda exist for exactly this). Cons: the orchestrator becomes a coupling point and a single point of failure.

Rule of thumb: 3 or fewer steps → choreography is fine. 5+ steps or strong audit/observability needs → orchestration. The fact that you'd choose differently for a 3-step vs 8-step saga is itself a senior signal.
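For contrast with choreography, here is a minimal orchestrator sketch: the workflow lives in one function, and failure triggers the compensations in reverse order. `Step` and `runSaga` are hypothetical names; engines such as Temporal or Step Functions add durable state, retries, and timeouts that this sketch leaves out.

```typescript
type Step = { name: string; action: () => void; compensate: () => void };

// Runs steps in order; on failure, undoes the completed ones in reverse order.
function runSaga(steps: Step[]): { ok: boolean; log: string[] } {
  const log: string[] = [];
  const done: Step[] = [];
  for (const step of steps) {
    try {
      step.action();
      log.push("done:" + step.name);
      done.push(step);
    } catch {
      log.push("failed:" + step.name);
      for (const prev of done.reverse()) {
        prev.compensate();
        log.push("undone:" + prev.name);
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}
```

The returned `log` is the observability argument in miniature: the whole history of one workflow is in one place instead of spread across services.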

05 · Idempotency, ordering, failure handling

The three things that bite every distributed system

Networks lose packets. Consumers crash mid-process. Producers retry. Out of these three innocent facts come the three hardest correctness problems in distributed systems: duplicate processing, out-of-order delivery, and partial failure. Senior engineers are the ones who name these by reflex and have a pattern ready.

How do you handle failures in distributed systems?

Three layers, applied together:

  • Retry with exponential backoff + jitter. Network glitches and transient 503s usually resolve in seconds. Retry with 100ms → 200ms → 400ms → ... and add ±20% random jitter so a thundering herd of clients doesn't all retry at the same millisecond.
  • Circuit breaker. If a downstream is failing 50%+ of the time, stop calling it for 30 seconds. Better to fast-fail than queue up doomed requests.
  • Bulkhead + timeout + DLQ. Isolate failure domains (separate thread pools per downstream); always set a timeout (never wait forever); push permanently-failing messages to a dead-letter queue for human inspection.

And the meta-rule: design for partial failure from day one. The question isn't "what if this service goes down?" — it's "what does my system do while this service is down?"
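The first two layers fit in a few lines each. The sketch below uses the numbers from the list (100 ms doubling, ±20% jitter, 50% failure ratio, 30 s cool-down); `backoffDelayMs` and `CircuitBreaker` are hypothetical names, and the breaker is deliberately simplified: it has no half-open probing state.

```typescript
// Delay before retry number `attempt` (0-based): 100, 200, 400, ... ms, with +/-20% jitter.
function backoffDelayMs(attempt: number, random: () => number = Math.random): number {
  const base = 100 * 2 ** attempt;
  const jitter = 1 + (random() * 2 - 1) * 0.2; // multiplier in [0.8, 1.2)
  return Math.round(base * jitter);
}

// Opens when at least half of the last `windowSize` calls failed, then
// fails fast until the cool-down has passed.
class CircuitBreaker {
  private results: boolean[] = []; // sliding window, true = success
  private openUntil = 0;
  private readonly windowSize: number;
  private readonly coolDownMs: number;
  private readonly now: () => number;

  constructor(windowSize = 10, coolDownMs = 30000, now: () => number = Date.now) {
    this.windowSize = windowSize;
    this.coolDownMs = coolDownMs;
    this.now = now;
  }

  call<T>(fn: () => T): T {
    if (this.now() < this.openUntil) throw new Error("circuit open: failing fast");
    try {
      const value = fn();
      this.record(true);
      return value;
    } catch (err) {
      this.record(false);
      throw err;
    }
  }

  private record(ok: boolean): void {
    this.results.push(ok);
    if (this.results.length > this.windowSize) this.results.shift();
    const failed = this.results.filter((r) => !r).length;
    if (this.results.length === this.windowSize && failed / this.windowSize >= 0.5) {
      this.openUntil = this.now() + this.coolDownMs;
      this.results = []; // start with a fresh window after the cool-down
    }
  }
}
```

While the circuit is open the downstream function is never invoked, which is the point: doomed requests cost nothing and the struggling service gets room to recover.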

How do you ensure idempotency in distributed systems?

Idempotency = doing the same operation N times has the same effect as doing it once. The standard recipe:

  • Idempotency key — client generates a UUID (e.g. Idempotency-Key: 7f3a-…) and sends it with every request. Server stores (key → result) in a table or Redis with a TTL (24h is typical).
  • Server-side check — on every request, look up the key. If seen, return the stored result without re-executing. If new, execute, store, return.
  • Make the underlying operation naturally idempotent where possible. UPDATE ... SET balance = 100 is idempotent; UPDATE ... SET balance = balance + 50 is not. Prefer set-style operations to delta-style when you can.

Stripe's API is the canonical reference here — every POST /charges takes an idempotency key.
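The key-plus-lookup recipe can be sketched as a small wrapper. Everything here is illustrative: `IdempotentHandler` is a hypothetical name, the `Map` stands in for the table or Redis, and the sketch ignores the race where two requests with the same key arrive at the same moment (a real store would need an atomic insert-if-absent for that).

```typescript
type StoredResult<R> = { result: R; expiresAt: number };

class IdempotentHandler<R> {
  private readonly seen = new Map<string, StoredResult<R>>(); // key -> stored result
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs = 24 * 60 * 60 * 1000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Runs `execute` at most once per key within the TTL; replays return the stored result.
  handle(key: string, execute: () => R): R {
    const hit = this.seen.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.result; // seen before: do not re-execute
    }
    const result = execute();
    this.seen.set(key, { result, expiresAt: this.now() + this.ttlMs });
    return result;
  }
}
```

A client retry after a timeout then reuses the same key and gets the original response back, instead of charging the card a second time.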

How do you prevent duplicate event processing?

Two strategies, used together:

  • Consumer-side dedup. Store every processed event ID (or business key like order_id) in a "seen" table. Before processing, check if you've seen it. After processing, insert the ID — in the same transaction as your business write, so either both happen or neither.
  • Idempotent writes. Even if you do process twice, design the write so it doesn't double up: INSERT ... ON CONFLICT DO NOTHING, or UPDATE ... WHERE status = 'PENDING' (no-op if already updated).

Kafka gives you "at-least-once" by default — your consumer code has to make it "effectively-once" via dedup or idempotent writes. Kafka 0.11+ also supports exactly-once-semantics for Kafka-to-Kafka pipelines, but for Kafka-to-external-DB you still need consumer-side dedup.

How do you guarantee ordering of events / messages?

You can't get global ordering at scale — but you don't need it. You need per-key ordering. In Kafka: pick a partition key (e.g. user_id or order_id) and Kafka guarantees that all events with the same key land in the same partition in the order produced. Within a partition, ordering is total. Across partitions, no ordering.

So all events for order #42 are ordered with respect to each other. Events for order #43 may interleave — and that's fine, they're independent.

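The gotcha: if you ever change your partition key (or add partitions, which changes the key-to-partition mapping), events already written stay in their old partition while new ones land in a different one. You can briefly see two events for the same logical entity in different partitions, with no ordering guarantee between them. Plan for that during migrations.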
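Per-key ordering is just "hash the key, pick a partition, append". The sketch below uses a simple FNV-1a style hash as a stand-in; Kafka's own default partitioner uses murmur2, so the actual partition numbers will differ. `partitionFor` and `routeToPartitions` are hypothetical names.

```typescript
// Deterministic 32-bit string hash (FNV-1a style over UTF-16 code units).
function fnv1a(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Same key and same partition count always give the same partition.
function partitionFor(key: string, partitionCount: number): number {
  return fnv1a(key) % partitionCount;
}

type KeyedEvent = { key: string; name: string };

// Appending in produce order preserves per-key order inside each partition.
function routeToPartitions(events: KeyedEvent[], partitionCount: number): KeyedEvent[][] {
  const partitions: KeyedEvent[][] = Array.from({ length: partitionCount }, () => [] as KeyedEvent[]);
  for (const event of events) {
    const partition = partitions[partitionFor(event.key, partitionCount)];
    if (partition !== undefined) partition.push(event);
  }
  return partitions;
}
```

Note that `partitionFor` depends on `partitionCount`: changing the count can move a key to a different partition, which is the migration hazard described above.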

What happens if a consumer crashes after processing but before committing offset?

The classic interview gotcha. The consumer's next instance (after restart, or rebalance to a different pod) re-reads from the last committed offset — which is before the message you just processed. So it processes that message again. This is at-least-once delivery in action.

The fix isn't to commit offsets earlier (that would risk losing messages on crash before processing — that's at-most-once, worse for most use cases). The fix is to make your processing idempotent: track processed event IDs, use ON CONFLICT DO NOTHING, design so re-processing is a no-op. Then "at-least-once + idempotent" gives you "effectively-once" without the cost of full exactly-once semantics.
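The crash-then-redeliver scenario can be simulated in a few lines. This is a sketch under assumptions: `LedgerConsumer.processedIds` stands in for a durable "seen" table that survives the crash, and `consume` with its `crashAfterProcessing` parameter is a hypothetical test harness, not a Kafka API.

```typescript
type PaymentEvent = { eventId: string; orderId: string; amount: number };

class LedgerConsumer {
  readonly processedIds = new Set<string>(); // stand-in for a durable "seen" table
  totalCaptured = 0;

  // In a real DB, the dedup insert and the business write share one transaction.
  handle(event: PaymentEvent): boolean {
    if (this.processedIds.has(event.eventId)) return false; // duplicate: no-op
    this.totalCaptured += event.amount;
    this.processedIds.add(event.eventId);
    return true;
  }
}

// Delivers log[committedOffset..] and returns the new committed offset.
// If `crashAfterProcessing` matches an offset, the consumer "dies" after
// handling that message but before committing it.
function consume(
  log: PaymentEvent[],
  consumer: LedgerConsumer,
  committedOffset: number,
  crashAfterProcessing: number | null,
): number {
  let committed = committedOffset;
  for (let offset = committedOffset; offset < log.length; offset++) {
    const event = log[offset];
    if (event === undefined) break;
    consumer.handle(event);
    if (offset === crashAfterProcessing) return committed; // processed, never committed
    committed = offset + 1;
  }
  return committed;
}
```

On restart the uncommitted message is delivered again, `handle` recognises its `eventId`, and the total is unchanged: at-least-once delivery plus an idempotent handler.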

06 · Observability & production debugging

Knowing your system is working — and finding out fast when it's not

"Observability" is interview-shorthand for "can you debug a system you didn't write at 2 AM?" The answer always comes back to three pillars — logs, metrics, traces — plus a good story about how you actually used them.

How do you know your services are working correctly in production?

Four signals, monitored as SLIs (service-level indicators):

  • Latency — p50, p95, p99 of every endpoint. The average is a lie; p99 is where users feel pain.
  • Error rate — 4xx/5xx as a percentage. Alert when sustained 5xx > 1% for 5 minutes.
  • Traffic — RPS. A sudden 50% drop is often a bug somewhere (load balancer, DNS, a deployed regression).
  • Saturation — CPU, memory, queue depth. Tells you how close to a cliff you are.

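These four are the "four golden signals" from Google's SRE book. (The USE method, utilization, saturation, and errors, is a separate, resource-focused checklist from Brendan Gregg.) If you have alerts on those, your pager gets quiet.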
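"The average is a lie" is easy to show with numbers. The sketch below uses the nearest-rank definition of a percentile, one of several common definitions; monitoring systems often estimate percentiles from histogram buckets instead. The function names and sample data are illustrative.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[Math.min(rank, sorted.length) - 1] as number;
}

function mean(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  return samples.reduce((sum, s) => sum + s, 0) / samples.length;
}

// 98 fast requests at 50 ms and 2 slow ones at 3000 ms.
const latenciesMs: number[] = [...Array<number>(98).fill(50), 3000, 3000];
```

For `latenciesMs` the mean is 109 ms and the p50 is 50 ms, both of which look healthy, while the p99 is 3000 ms: two users in a hundred waited three seconds and only the tail percentile shows it.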

Which observability tools do you use? (Grafana, Prometheus, ELK, Datadog, New Relic, Jaeger, Zipkin)
Pillar | What it captures | Common tools
Metrics | Time-series numbers (RPS, latency, CPU) | Prometheus + Grafana, Datadog, New Relic
Logs | Discrete events with context | ELK (Elasticsearch + Logstash + Kibana), Loki, Datadog Logs, Splunk
Traces | Per-request flow across services | Jaeger, Zipkin, Tempo, Datadog APM, New Relic
Alerting | Notify when SLOs breach | Alertmanager, PagerDuty, Opsgenie

In an interview, name the stack you actually used and what you'd do differently. "We were on ELK; the cost grew faster than the value at our scale and we migrated to Loki — 70% cheaper, same dashboards." That single sentence shows you've actually paid the bill.

Explain logs, metrics, and distributed tracing.
  • Logs are discrete events with context — "User 42 logged in at 03:11, IP 10.0.4.7." Great for forensics ("what exactly happened to this one request?"). Terrible for aggregation ("how often does this happen?") because grep is slow.
  • Metrics are time-series numbers, pre-aggregated: for example, the counter http_requests_total{status="500"} rising by 47 in the last minute. Cheap to store, fast to query, perfect for dashboards and alerts. But they can't tell you which request failed.
  • Traces tie a single request across all services it touched, with timing for each hop. A single trace ID flows through Gateway → Order → Inventory → Payment → DB. When you see "checkout is slow", you open the trace and see Payment is taking 800 ms — you've pinpointed the bottleneck in 30 seconds.

You need all three. Metrics tell you something is wrong. Traces tell you where. Logs tell you why.

How do you debug a production issue?

The standard funnel, in order:

  • 1. Confirm it's real — check your error-rate / latency dashboards. Is it a spike or a slow drift? Region-specific?
  • 2. Narrow to a service — which service's metrics turned red first? Often the symptom (slow checkout) is downstream of the cause (slow payment).
  • 3. Pull a sample trace — pick one slow request, open its distributed trace, see which span is fat.
  • 4. Read the logs for that trace ID — your logs should be queryable by trace ID. Look at the error message, the stack trace, the DB query that timed out.
  • 5. Form a hypothesis, verify, fix — never deploy a fix on a guess; reproduce it locally or in staging if at all possible. If you can't reproduce, add more logging and wait for the next occurrence.

The mistake juniors make: jumping to step 5 directly. Senior engineers always finish steps 1–4 first.

A Node.js endpoint takes 3–5 seconds in production but works fast locally. How would you debug it?

This is one of the most common Node.js interview scenarios. The answer is to methodically eliminate suspects. The latency budget has to come from somewhere — find which slice owns it.

  • Step 1 — APM / tracing. Open a slow trace in Datadog/New Relic/Jaeger. Look at the flame graph. One span is going to dominate — that's your bottleneck. 90% of debugging stops here.
  • Step 2 — Network vs CPU vs I/O. If the fat span is an HTTP call to an external API → network/external. If it's a DB query → DB. If it's CPU between spans (no outbound calls but time passing) → CPU or event loop block.
  • Step 3 — DB-specific. Run EXPLAIN ANALYZE on the query in prod. Local DB has 100 rows; prod has 100 million. Missing index, table scan, N+1 query.
  • Step 4 — Event loop lag. Add perf_hooks.monitorEventLoopDelay() and check if the loop is blocked. A CPU-heavy synchronous loop (JSON parse of 10 MB payload, big regex) freezes everything else in the same Node process.
  • Step 5 — Thread pool / connection pool exhaustion. Default libuv pool is 4 threads. If 100 concurrent requests all need DNS or crypto, requests queue behind each other. Same for DB connection pool — if pool is 10 and you have 100 in-flight requests, 90 are waiting.
  • Step 6 — Memory pressure / GC. Long GC pauses look like generic slowness. Check heap usage trend.

The reason it's fast locally: small data, no concurrency, no external network. Production multiplies all three. Always name the multiplier when you answer this.
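For Step 4, a minimal sketch of wiring up `monitorEventLoopDelay` from Node's `perf_hooks` module. The histogram reports nanoseconds, so the sketch converts to milliseconds; `startLoopMonitor` and the shape of its return value are hypothetical, and the 20 ms sampling resolution is an arbitrary choice.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Starts sampling event loop delay and returns helpers to read and stop it.
function startLoopMonitor(resolutionMs = 20) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  return {
    // Histogram values are in nanoseconds; convert to milliseconds.
    snapshot(): { p50Ms: number; p99Ms: number; maxMs: number } {
      return {
        p50Ms: histogram.percentile(50) / 1e6,
        p99Ms: histogram.percentile(99) / 1e6,
        maxMs: histogram.max / 1e6,
      };
    },
    // Clear accumulated samples, e.g. after each reporting interval.
    reset(): void {
      histogram.reset();
    },
    stop(): void {
      histogram.disable();
    },
  };
}
```

A typical use would be to call `snapshot()` on a reporting interval, export the values as metrics, and then `reset()`; a p99 well above the sampling resolution suggests synchronous work is blocking the loop.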

How would you identify whether the bottleneck is database, external API, network, CPU, memory, thread pool, or event loop?
Suspect | Signal that points at it
Database | Trace shows DB span dominates. pg_stat_activity shows query running long. EXPLAIN reveals seq scan.
External API | Outbound HTTP span is the fat one. Check the provider's status page. Compare your timeout vs their p99.
Network | High latency + high packet loss between AZs. traceroute, ping, cloud provider's network insights.
CPU | Pod CPU pegged at 100%. Flame graph in perf / 0x shows one hot function.
Memory | Heap growth + frequent / long GC pauses. Take heap snapshots via --inspect to find leak suspects.
Thread pool | libuv thread pool saturated (DNS / crypto / fs queueing). Symptoms: slow even for cheap requests.
Event loop | monitorEventLoopDelay() > 100 ms. Synchronous CPU work in the main thread.

How do you trace requests across multiple microservices?

OpenTelemetry (the open standard) or a vendor agent (Datadog, New Relic, Elastic APM). At the API Gateway, the tracing agent — typically the OpenTelemetry SDK with auto-instrumentation, a vendor agent like dd-trace, or a service-mesh sidecar (Envoy/Istio) — injects a trace ID (e.g. traceparent: 00-4bf92f...) into the request headers. The agent monkey-patches your HTTP framework (Express, Spring, etc.), DB drivers, and message clients so every downstream service automatically propagates that header on its outbound calls. Each service emits "spans" — start time, end time, tags — tagged with the trace ID. The collector stitches them into one waterfall view.

Two practical rules: (1) always propagate the trace header on async paths too — when you publish to Kafka, include the trace ID in the message metadata; when the consumer picks it up, continue the trace. Otherwise your trace ends at the producer and you lose visibility into half the system. (2) Always include the trace ID in your log lines — that's how you bridge from "this trace is slow" to "here's the log message that explains why."
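In practice an OpenTelemetry SDK or vendor agent does the propagation for you, but it helps to know what is inside the header. The sketch below hand-rolls the W3C `traceparent` format (version, 32-hex trace ID, 16-hex span ID, 2-hex flags); `parseTraceparent`, `formatTraceparent`, and `nextHop` are hypothetical helpers, and `Math.random` is used only to keep the example dependency-free.

```typescript
type TraceContext = { traceId: string; spanId: string; sampled: boolean };

function randomHex(length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) out += Math.floor(Math.random() * 16).toString(16);
  return out;
}

// Accepts only version 00 headers: 00-<32 hex>-<16 hex>-<2 hex>.
function parseTraceparent(header: string): TraceContext | null {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (match === null) return null;
  const [, traceId, spanId, flags] = match;
  if (traceId === undefined || spanId === undefined || flags === undefined) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

// Continue an incoming trace (or start a new one) and build the header for
// the next hop, whether that hop is an HTTP call or a message on the bus.
function nextHop(incoming: string | undefined): { ctx: TraceContext; headers: { traceparent: string } } {
  const parent = incoming === undefined ? null : parseTraceparent(incoming);
  const ctx: TraceContext = {
    traceId: parent?.traceId ?? randomHex(32), // same trace ID across every hop
    spanId: randomHex(16), // each hop gets its own span ID
    sampled: parent?.sampled ?? true,
  };
  return { ctx, headers: { traceparent: formatTraceparent(ctx) } };
}
```

The same `headers` object can be attached to an outbound HTTP request or copied into message headers when publishing, and `ctx.traceId` is the value to stamp on every log line.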

How would you monitor Node.js applications in production?

Beyond the four golden signals, the Node-specific ones:

  • Event loop lag: monitorEventLoopDelay(). Alert if p99 > 100ms; that means user-facing requests are stalling.
  • Heap usage — RSS, heap_used, external. Climbing trend = leak.
  • Active handles / requests: process._getActiveHandles().length. A leak shows up as ever-growing handle count.
  • Active timers — unclosed setIntervals.
  • GC pause time — exposed via perf_hooks or APM.

Wire all of these to Prometheus via prom-client or use Datadog's dd-trace agent which exposes them out of the box.
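As a rough sketch of what those exporters collect, the snippet below samples event loop delay and heap usage with Node's built-in perf_hooks. The helper name startLoopLagMonitor and the snapshot field names are made up for illustration; a real setup would feed these numbers into prom-client gauges or an APM agent.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Sample event loop delay; the histogram reports nanoseconds.
function startLoopLagMonitor(resolutionMs = 20) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  const toMs = (ns) => ns / 1e6;
  return {
    // Snapshot the current window, then reset so each scrape sees fresh data.
    read() {
      const snapshot = {
        p50Ms: toMs(histogram.percentile(50)),
        p99Ms: toMs(histogram.percentile(99)),
        maxMs: toMs(histogram.max),
        heapUsedMb: process.memoryUsage().heapUsed / 1024 / 1024,
      };
      histogram.reset();
      return snapshot;
    },
    stop() {
      histogram.disable();
    },
  };
}

// Demo: sample for 100 ms, print one snapshot, then stop.
const monitor = startLoopLagMonitor();
setTimeout(() => {
  console.log(monitor.read());
  monitor.stop();
}, 100);
```

Readings may include the sampling interval itself depending on the Node version, so it is safer to calibrate the alert threshold against an idle baseline than to trust an absolute number.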

How do you handle alerts and incident management?

Four things, in order:

  • Alert on symptoms, not causes. Page on latency, error rate, and traffic drops — things the user can feel. Don't page on CPU 90% or disk 80%; those are dashboards. Alerting on causes burns out on-call within a month.
  • Severity tiers — P0/P1 page the on-call; P2/P3 go to Slack. Not every alert is a 3 AM wake-up.
  • Runbook per alert — linked in PagerDuty. Answers: what does this mean, what do I check, how do I mitigate (rollback, flip a flag, scale up). If an alert has no runbook, it's not ready.
  • Incident process — declare an incident channel (#inc-...), assign an Incident Commander, mitigate before root-causing (stop the bleeding first), update the status page every 15 min, then run a blameless postmortem within a week.

Tooling: Prometheus / Grafana for alerts, PagerDuty or Opsgenie for paging, Statuspage for customer comms, Notion or Confluence for runbooks and postmortems.

The one-line takeaway: mitigate first, root-cause later, and never blame the human — fix the system that let the human make the mistake.

07 · Databases & scaling

SQL, NoSQL, sharding, replication, indexing, caching

Database questions are where interviews separate "I've used Postgres" from "I've operated Postgres." The fundamentals — SQL vs NoSQL, CAP, indexing, sharding, replication — are table stakes.

SQL vs NoSQL — when would you choose each?

Reach for SQL (Postgres, MySQL) when: schema is stable, you need ACID transactions, you need joins, your data is highly relational (orders, line items, payments). 90% of business apps still belong here. Modern Postgres also gives you JSONB, partial indexes, listen/notify — it's no longer "the boring choice."

Reach for NoSQL when:

  • Document store (MongoDB, DynamoDB) — schema-flexible nested docs, single-entity reads are the dominant pattern.
  • Key-value (Redis, DynamoDB) — sub-millisecond reads, simple access by key.
  • Wide-column (Cassandra, ScyllaDB) — massive write throughput, time-series, append-heavy.
  • Search (Elasticsearch, OpenSearch) — full-text, faceted search, log search.
  • Graph (Neo4j) — relationships are the data (social networks, fraud rings).

The senior answer: "I'd start with Postgres and only add a NoSQL store when a specific access pattern shows it's the right tool — usually search via Elasticsearch or hot key-value via Redis."

CAP theorem explanation.

In a network Partition (some nodes can't talk to others), a distributed system has to choose: serve potentially-stale data (Available) or refuse to serve at all (Consistent). You can have CP or AP, never both in the moment of a partition. The "C" here is linearizability — every read sees the latest write.

The framing people miss: outside a partition, you can have both consistency and availability. CAP only kicks in during the failure. So the real question is "how often do you partition, and what do you want to do when it happens?" Bank transfers want CP — refuse rather than serve stale balances. Social feeds want AP — show a slightly stale timeline rather than an empty page.

Modern extension: PACELC. "If Partition, choose A or C; Else choose Latency or Consistency." Even without a partition, sync replication trades latency for stronger consistency.

What is partitioning / sharding?

Partitioning splits one logical table into multiple physical pieces on the same machine (e.g. Postgres declarative partitioning by date — January goes in one file, February in another). Helps with query planning and archival.

Sharding splits the data across multiple machines. Choose a shard key (user_id is common); a hash of the key tells you which machine owns the row. Now your one 10 TB database is ten 1 TB databases, each handling 1/10th the writes.
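A minimal sketch of hash-based shard routing, assuming four shards and using SHA-256 purely as a convenient stable hash. The shard names are placeholders, and real systems often prefer consistent hashing so that changing the shard count does not remap most keys.

```javascript
const { createHash } = require('crypto');

// Map a shard key to one of `shardCount` shards via a stable hash.
function shardFor(key, shardCount) {
  const digest = createHash('sha256').update(String(key)).digest();
  // Interpret the first 4 bytes as an unsigned 32-bit integer, then take the modulo.
  return digest.readUInt32BE(0) % shardCount;
}

// Hypothetical shard names, one per machine.
const SHARDS = ['pg-shard-0', 'pg-shard-1', 'pg-shard-2', 'pg-shard-3'];
const ownerOf = (userId) => SHARDS[shardFor(userId, SHARDS.length)];

console.log(ownerOf(42), ownerOf(43));
```

Because the hash is deterministic, every service instance routes the same user_id to the same shard without any coordination.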

The pain of sharding: cross-shard joins become application-level fan-outs; cross-shard transactions need a saga; re-sharding is one of the hardest migrations in software. Consistent hashing + replication mitigates re-shard pain. Don't shard until you've exhausted vertical scaling and read replicas.

Horizontal scaling vs vertical scaling.

Vertical = bigger box. Move from 4 vCPU / 16 GB to 32 vCPU / 256 GB. Simple, no code changes, but capped at whatever the cloud offers — and 2× more CPU never gives you 2× the performance.

Why? Four hidden ceilings:

  • Amdahl's Law — any serial part of your code (auth check, transaction commit, response serialization) caps the speedup. 20% serial = max 5× no matter how many CPUs.
  • Memory bandwidth is shared — all cores read/write through the same RAM bus. Past 2–4 cores reading hot, they queue behind each other.
  • Lock contention — more threads fighting for the same lock (connection pool, app cache, GC) means more time waiting than working.
  • Single-threaded bottlenecks — Node's event loop is one thread; adding cores does nothing for it unless you cluster.

Realistic factor: doubling vCPU buys ~1.3–1.7×, not 2×. That's why vertical hits diminishing returns around the 16-vCPU mark.
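The Amdahl's Law ceiling is easy to check numerically. With serial fraction s and n cores, the ideal speedup is 1 / (s + (1 - s) / n); the sketch below evaluates it for the 20% serial case mentioned above. It models only the serial-fraction limit, not memory bandwidth or lock contention.

```javascript
// Amdahl's Law: speedup(n) = 1 / (s + (1 - s) / n), where s is the serial fraction.
function amdahlSpeedup(serialFraction, cores) {
  return 1 / (serialFraction + (1 - serialFraction) / cores);
}

for (const cores of [2, 4, 8, 16, 32]) {
  console.log(`${cores} cores: ${amdahlSpeedup(0.2, cores).toFixed(2)}x`);
}
```

With s = 0.2, going from 16 to 32 cores moves the ideal speedup only from 4.00x to about 4.44x, and no core count gets past 5x.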

Horizontal = more boxes. Add another pod, another replica, another shard. Linearly scalable in theory; in practice limited by coordination, shared resources (DB), and Conway's law.

Rule of thumb: scale vertically first (it's cheap effort), then add read replicas, then introduce caches, only then shard or split services.

What are compound indexes?

An index on more than one column, in order: CREATE INDEX idx ON orders (user_id, created_at). The index is sorted first by user_id, then by created_at within each user.

It can serve queries that match a prefix: WHERE user_id = 42 (uses index), WHERE user_id = 42 AND created_at > '2024-01-01' (uses index). It can't efficiently serve WHERE created_at > '2024-01-01' alone — the leading column isn't in the predicate.

Senior tip: order columns from most-selective to least-selective, and put equality predicates before range predicates.

Explain indexing in databases.

An index is a separate B-tree (mostly) that maps column-value → row-location. Without an index, the DB scans every row (O(N)). With an index, it walks the tree in O(log N) — finding 1 row in a billion takes ~30 comparisons.

Index types you'll be asked about:

  • B-tree — default, supports equality and range.
  • Hash — only equality, no range; rarely used.
  • GIN / GiST (Postgres) — full-text, JSONB, geographic.
  • Covering / include — store extra columns inside the index so the DB never touches the heap.
  • Partial — index only matching rows (WHERE status = 'ACTIVE'). Smaller, faster.
How do indexes improve query performance, and what are the drawbacks?

Benefit: O(log N) instead of O(N) lookups. A query that took 10 seconds on 100 M rows can drop to 5 ms.

Drawbacks:

  • Writes get slower — every INSERT/UPDATE/DELETE must also update every index. 5 indexes on a table = 5× the write amplification.
  • Disk space — indexes are often 30–50% the size of the table itself.
  • Maintenance — bloated / fragmented indexes need rebuilds.
  • Index-not-used surprises — wrong column order, function on indexed column (WHERE LOWER(email) = ...), low selectivity. Always check with EXPLAIN ANALYZE.
Explain database normalization vs denormalization.

Normalization — split data into many tables, each fact stored once. Goes up to 3NF/BCNF. Pros: no update anomalies, smaller storage. Cons: every read needs joins, joins are expensive at scale.

Denormalization — duplicate data on purpose for read speed. The orders table stores customer_name even though it's also in customers. Pros: fewer joins, blazing reads. Cons: when a customer renames themselves, you have to update both — and you can drift.

OLTP (transactional) tends normalized; OLAP / analytics / wide reads tend denormalized (star schema, materialized views). Realistic systems sit somewhere in between — normalize first, denormalize where reads demand it.

How would you optimize slow database queries?

The checklist, in order:

  • 1. Run EXPLAIN ANALYZE. Identify seq scans on large tables, expensive joins, sort spills.
  • 2. Add or fix the index. Most slow queries are missing an index on the WHERE/JOIN/ORDER BY column.
  • 3. Reduce scanned rows. Push predicates earlier (filter before join, not after); add LIMIT.
  • 4. Fix N+1. If you're doing 100 SELECTs in a loop, batch them into one with IN (...) or a join.
  • 5. Cache. If the result rarely changes, cache it in Redis.
  • 6. Materialize / pre-aggregate. For repeated heavy aggregations, store the pre-computed result.
  • 7. Read replica. If the query is OK-slow but you don't want it killing the primary.
  • 8. Last resort — restructure schema or shard.
Explain the N+1 query problem.

You fetch a list of 100 orders. Then for each order you fetch its customer — 100 separate queries. That's 1 (the list) + N (one per row) = N+1 queries, when it could have been 1 (a join) or 2 (the list + one batched IN (...)).

ORMs love to do this silently — Hibernate, Sequelize, ActiveRecord. The fix is either eager loading (JOIN FETCH in JPA, include in Sequelize) or a batched WHERE customer_id IN (...) after the first query. Always look at the actual SQL the ORM emits — many "slow API" tickets are a hidden N+1.
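The difference shows up clearly if you count round-trips. The sketch below uses an in-memory stand-in for the database (synchronous for brevity; a real driver would be async), and all names in it are hypothetical.

```javascript
// In-memory stand-in for a database that counts round-trips.
function makeDb() {
  const customers = new Map();
  const orders = [];
  for (let i = 1; i <= 100; i++) {
    customers.set(i, { id: i, name: `customer-${i}` });
    orders.push({ id: 1000 + i, customerId: i });
  }
  let queries = 0;
  return {
    get queries() { return queries; },
    listOrders() { queries++; return orders.slice(); },
    customerById(id) { queries++; return customers.get(id); },
    // Equivalent of WHERE id IN (...): one round-trip for many rows.
    customersByIds(ids) { queries++; return ids.map((id) => customers.get(id)); },
  };
}

// N+1: one query for the list, then one query per row.
function loadNaive(db) {
  return db.listOrders().map((o) => ({ ...o, customer: db.customerById(o.customerId) }));
}

// Batched: one query for the list, one IN (...) query for all customers.
function loadBatched(db) {
  const orders = db.listOrders();
  const ids = [...new Set(orders.map((o) => o.customerId))];
  const byId = new Map(db.customersByIds(ids).map((c) => [c.id, c]));
  return orders.map((o) => ({ ...o, customer: byId.get(o.customerId) }));
}

const naiveDb = makeDb();
loadNaive(naiveDb);
const batchedDb = makeDb();
loadBatched(batchedDb);
console.log(`naive: ${naiveDb.queries} queries, batched: ${batchedDb.queries} queries`);
```

Both functions return the same data; only the number of round-trips differs (101 versus 2 for 100 orders).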

Database replication and failover.

Replication = keep one or more standby copies of the primary, kept up to date by streaming the WAL (write-ahead log) or change events. Modes:

  • Synchronous replication — primary waits for replica's ack before saying "committed". Strong consistency, higher write latency, the replica being down can stall writes.
  • Asynchronous replication — primary commits locally and forwards changes in the background. Low latency, but on failover you might lose the last few committed transactions.

Failover = when the primary dies, promote a replica to be the new primary. Tools: Patroni for Postgres, Sentinel for Redis, managed services do it for you (RDS, Aurora). The interview gotcha: even automated failover is never zero-downtime. Expect a 10–60s window where the new primary is being chosen and clients are reconnecting.

Read replicas vs primary database.

Primary handles all writes (and reads that need strong consistency). Replicas handle reads that tolerate seconds-stale data — dashboards, reports, analytics, list views. A typical web app has 80%+ reads, so adding 2–3 replicas can lift read capacity 3–4× without changing the schema.

The trap: replica lag. A user creates an order on the primary, then refreshes the page — the read goes to a replica that hasn't seen the write yet. The user sees "no orders found." Either route read-after-write to the primary, or use a session-pinned reader, or design the UI to tolerate "your order is being processed."
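One way to sketch the "route read-after-write to the primary" option: remember when each session last wrote and pin its reads to the primary for a short window. The 5-second window is an assumed upper bound on replica lag, and the router API is hypothetical.

```javascript
// Assumed upper bound on replica lag; tune to what your replicas actually show.
const PIN_WINDOW_MS = 5000;

function createReadRouter(now = Date.now) {
  const lastWriteAt = new Map(); // sessionId -> timestamp of last write
  return {
    recordWrite(sessionId) {
      lastWriteAt.set(sessionId, now());
    },
    // Reads go to the primary only while the session's last write is recent.
    targetForRead(sessionId) {
      const t = lastWriteAt.get(sessionId);
      return t !== undefined && now() - t < PIN_WINDOW_MS ? 'primary' : 'replica';
    },
  };
}

const router = createReadRouter();
router.recordWrite('session-1');
console.log(router.targetForRead('session-1'), router.targetForRead('session-2'));
```

In a multi-pod deployment the last-write timestamp would have to live somewhere shared, such as a cookie or Redis, rather than in process memory.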

Caching strategies in distributed systems.

Six patterns, covered in depth on the Caching Strategies deep-dive. The short version:

  • Cache-aside (lazy load) — most common. App checks cache; on miss, queries DB and populates cache. Simple, but first request is slow.
  • Read-through — cache library handles the DB load on miss. Same outcome as cache-aside, cleaner code.
  • Write-through — write to cache and DB synchronously. Stronger consistency, slower writes.
  • Write-behind / write-back — write to cache, flush to DB asynchronously. Fast writes; risk of data loss on cache failure.
  • Refresh-ahead — proactively refresh hot keys before they expire. Good for predictable hot data.
  • Cache-aside + TTL + jittered TTL — practical default; jitter prevents thundering herd on simultaneous expiry.
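The practical default from the last bullet can be sketched as follows, with an in-memory Map standing in for Redis. The helper names, the 10% jitter spread, and the 5-minute base TTL are illustrative assumptions.

```javascript
// TTL with +/- spread jitter so keys written together do not expire together.
function jitteredTtl(baseMs, spread = 0.1, rand = Math.random) {
  const factor = 1 + (rand() * 2 - 1) * spread; // in [1 - spread, 1 + spread)
  return Math.round(baseMs * factor);
}

// In-memory stand-in for the cache.
function createCache(now = Date.now) {
  const store = new Map(); // key -> { value, expiresAt }
  return {
    get(key) {
      const hit = store.get(key);
      if (!hit || hit.expiresAt <= now()) {
        store.delete(key);
        return undefined;
      }
      return hit.value;
    },
    set(key, value, ttlMs) {
      store.set(key, { value, expiresAt: now() + ttlMs });
    },
  };
}

// Cache-aside: check the cache, fall back to the loader on a miss, then populate.
async function getOrLoad(cache, key, loader, baseTtlMs = 300000) {
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const value = await loader(key);
  cache.set(key, value, jitteredTtl(baseTtlMs));
  return value;
}
```

This sketch does not deduplicate concurrent misses for the same key; a production version would usually add request coalescing or a short lock around the loader.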
Cache invalidation strategies.

Phil Karlton: "There are only two hard things in computer science — cache invalidation and naming things." The four patterns:

  • TTL (time-to-live) — set a 5-min expiry; accept stale data within that window. Simplest, works for most cases.
  • Write-through — invalidate or update cache on every write. Strong consistency at the cost of write latency.
  • Event-driven invalidation — service that writes the DB also publishes a UserUpdated event; cache subscribes and evicts. Good for multi-cache fleets.
  • Versioned keys — instead of invalidating, bump the version. user:42:v3 instead of user:42. Old data ages out naturally.
Redis use cases in system design.

Way more than just "cache." The classic eight:

  • Cache — Cache-aside in front of Postgres.
  • Session store — replace sticky sessions; any pod can serve any user.
  • Rate limiter: INCR + EXPIRE, or token-bucket via Lua script.
  • Distributed lock — Redlock pattern (with caveats about clock skew).
  • Leaderboard / counters — sorted sets (ZADD, ZRANGE).
  • Pub/sub — lightweight messaging (Redis Streams for durability).
  • Real-time presence — "who's online" via short-TTL keys.
  • Idempotency keys — store request fingerprints with TTL.
08 · REST APIs & security

The contract between client and server

Explain HTTP methods and their meanings — GET, POST, PUT, PATCH, DELETE.
Method | Intent | Idempotent? | Safe?
GET | Retrieve a resource | Yes | Yes (no side effects)
POST | Create / submit / process | No | No
PUT | Full replacement of a resource | Yes | No
PATCH | Partial update | Not guaranteed (depends on the patch semantics) | No
DELETE | Remove a resource | Yes | No

Idempotent = N identical requests = same effect as 1 request. Safe = no server state change. The reason these matter: idempotent methods are safe to retry, browsers/proxies can cache safe methods, and POST is the one that needs an idempotency key if you want safe retries.

RESTful API design best practices.
  • Use nouns for resources, not verbs: POST /orders, not POST /createOrder.
  • Plural collections: /users, /orders.
  • Nested resources up to two levels: /users/42/orders is fine; /users/42/orders/7/items/3/comments is not — flatten.
  • Use HTTP status codes correctly: 200 OK, 201 Created (with Location header), 204 No Content, 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 409 Conflict, 422 Unprocessable, 429 Too Many Requests, 5xx for server faults.
  • Return JSON, ISO-8601 timestamps, snake_case or camelCase (pick one and be consistent).
  • Paginate everything with ?limit=&cursor=. Don't ever return unbounded lists.
  • Filtering & sorting via query params: ?status=active&sort=-created_at.
  • Version — see below.
REST vs GraphQL.

REST exposes resources; GraphQL exposes a single endpoint with a query language. With REST, a complex screen often needs 4-5 round-trips and over-fetches each time. With GraphQL, the client asks for exactly the fields it needs in one request.

Pick REST when: you have many lightweight clients, you want HTTP caching to "just work", you have public APIs (REST is more discoverable for outside developers). Pick GraphQL when: you have one mobile + one web client with rich screens, the over-fetch tax is real, and you can invest in a schema + dataloader infrastructure to avoid N+1s under the hood. Don't pick GraphQL because it's trendy — operationally it's heavier (caching, rate-limiting, query-cost analysis are all new problems).

Idempotent APIs — what are they?

APIs where calling them N times has the same effect as calling them once. GET, PUT, DELETE are idempotent by HTTP spec. POST is not — but you can make a specific POST idempotent by accepting an Idempotency-Key header: the server stores (key → result), so a retry with the same key returns the original result without re-executing.

This is essential for any operation with side effects under unreliable networks — payments, charges, signups. Stripe, PayPal, Razorpay all implement this.
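A minimal sketch of the (key → result) idea, with an in-memory Map standing in for the store. A production version would keep the results in Redis or a database with a TTL and would also guard against two concurrent requests carrying the same key; the handler and field names here are hypothetical.

```javascript
// Wrap a side-effecting operation so a retry with the same key replays the stored result.
function createIdempotentHandler(execute) {
  const results = new Map(); // idempotency key -> stored result
  return function handle(idempotencyKey, payload) {
    if (!idempotencyKey) throw new Error('Idempotency-Key header is required');
    if (results.has(idempotencyKey)) {
      // Retry: return the original result without re-executing.
      return { ...results.get(idempotencyKey), replayed: true };
    }
    const result = execute(payload);
    results.set(idempotencyKey, result);
    return { ...result, replayed: false };
  };
}

// Demo: the second call with the same key does not create a second charge.
let chargeCount = 0;
const charge = createIdempotentHandler((p) => ({ chargeId: ++chargeCount, amount: p.amount }));
console.log(charge('key-abc', { amount: 500 }));
console.log(charge('key-abc', { amount: 500 }));
```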

How do you version APIs?

Three options:

  • URL versioning: /v1/orders, /v2/orders. Most explicit, easiest for clients to grep for in logs. The standard choice.
  • Header versioning: Accept: application/vnd.acme.v2+json. Keeps URLs clean but harder to debug and test from a browser.
  • Query-param versioning: /orders?version=2. Easy to forget; not recommended.

Versioning policy matters more than the mechanism: support at least N-1, deprecate with a deprecation header and a 6-month sunset date, monitor usage of old versions so you can actually retire them.

How do you secure backend APIs?

Defense in depth — every one of these matters:

  • TLS everywhere — HSTS, modern cipher suites, no plain HTTP.
  • AuthN — JWT or session cookies; never roll your own.
  • AuthZ — check permissions on every endpoint; don't trust the client's claims.
  • Input validation — strict schema (zod, JSON schema, Joi). Reject anything that doesn't match.
  • SQL injection — parameterized queries always; never string-concat user input into SQL.
  • Rate limiting — per-IP and per-user.
  • CORS — explicit allowlist, not *, when credentials are involved.
  • Secrets — Vault / AWS Secrets Manager / GCP Secret Manager. Never in code, never in env files in the repo.
  • Audit logging — log every authN/Z decision, every admin action.
  • OWASP Top 10 — know it. The interviewer might quiz you on injection, broken auth, sensitive data exposure, SSRF, deserialization.
Rate limiting implementation.

Four common algorithms:

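  • Fixed window — count requests per minute per user; reset on minute boundary. Simple but bursty at boundaries: a client can spend the full limit at the end of one window and again at the start of the next, so up to 2× the limit gets through in a short span straddling the boundary.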
  • Sliding window log — store timestamps of recent requests; count those in last 60s. Accurate but memory-heavy.
  • Token bucket — bucket holds N tokens, refills at rate R/sec; each request consumes one. Allows bursts up to bucket size, smooth over time. The most common in production (AWS, Stripe).
  • Leaky bucket — fixed-rate queue; overflow rejected. Smooths out traffic with no burst. Used for outgoing rate limiting / shaping.

Implement in Redis with INCR + EXPIRE for fixed window, or a Lua script for token bucket. Apply at the gateway, not inside each service.
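For intuition, here is a single-process token bucket with an injectable clock. It is only a sketch of the algorithm: a gateway-level limiter would keep the bucket state in Redis and update it atomically, for example in a Lua script.

```javascript
class TokenBucket {
  constructor(capacity, refillPerSec, now = Date.now) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity; // start full, so an initial burst is allowed
    this.lastRefill = now();
  }

  // Add tokens for the elapsed time, capped at capacity.
  refill() {
    const t = this.now();
    const elapsedSec = (t - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = t;
  }

  // Returns true if the request may proceed, false if it should get a 429.
  tryConsume(cost = 1) {
    this.refill();
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

const bucket = new TokenBucket(5, 1); // burst of 5, then 1 request per second
console.log([1, 2, 3, 4, 5, 6].map(() => bucket.tryConsume()));
```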

Authentication vs authorization.

Authentication (AuthN) — "who are you?" Verifying identity via password, OTP, OAuth, SSO. Result: a verified user identity.

Authorization (AuthZ) — "what can you do?" Verifying permissions for an action. Result: allow or deny.

You always do AuthN once (login), then AuthZ on every request (RBAC, ABAC, or ACL checks). Mixing them up — "this user is logged in so they can do anything" — is a privilege escalation bug waiting to happen.

JWT vs session-based authentication.
Aspect | Session cookie | JWT
Where state lives | Server (session store) | In the token, signed
Server lookup per request | Yes (Redis hit) | No (just verify signature)
Revocation | Easy — delete session | Hard — until token expires
Cross-domain | Cookies are limited | Easy via Authorization header
Size | Small (cookie = ID) | Larger (claims + signature)
Best for | Web app, single domain | Mobile + multiple services, stateless backends

The senior nuance: JWT's "no DB lookup" benefit is real but its revocation problem is also real. Production systems usually combine: short-lived JWT (15 min) + long-lived refresh token in DB. Revoking the refresh token stops new access tokens from being issued, so a revoked session dies within one access-token lifetime: stateless speed on the hot path, revocation within minutes.

API gateway role in microservices.

The gateway is the single front door — every external request goes through it before reaching a service. It does:

  • Routing — path-based or host-based dispatch to the right service.
  • TLS termination — one cert to manage, not N.
  • AuthN — verify JWT once; downstream services trust the header.
  • Rate limiting + throttling — protect downstreams from spikes.
  • Request/response transformation — version translation, header injection.
  • Observability — first place to capture metrics, logs, trace IDs.
  • WAF — bot mitigation, OWASP-style attack signatures.

Common implementations: Kong, AWS API Gateway, Envoy / Istio ingress, Nginx Plus, Apigee. Without one, every microservice has to re-implement all of the above.

09 · Node.js deep dive

Single-threaded? Then how does it scale?

Node.js questions test whether you understand the event loop well enough to know why things go slow. "It's single-threaded" is the headline; the substance is everything that flows from it.

Explain Node.js architecture.

Three pieces wired together:

  • V8 — Google's JavaScript engine. Executes JS, manages the heap, runs the garbage collector.
  • libuv — C library that gives Node its event loop, its thread pool (default 4 threads for fs / DNS / crypto), and OS-level async I/O (epoll on Linux, kqueue on BSD).
  • Node bindings + standard library — fs, http, net, etc. — bridges between JS and libuv's C API.

The model: one thread runs your JS, and that thread never blocks on I/O. When you call fs.readFile, Node hands the actual disk read to libuv's thread pool (or kernel async I/O), registers a callback, and goes back to processing other events. When the read finishes, libuv pushes a "callback ready" event onto the loop; the next loop tick runs your callback.

How does the Node.js event loop work?

The loop has six phases, executed in order, forever:

  • Timers: setTimeout / setInterval callbacks whose time has elapsed.
  • Pending callbacks — OS-deferred callbacks from the previous loop (e.g. TCP errors).
  • Idle, prepare — internal.
  • Poll — the main phase; reads new I/O events and runs their callbacks. Blocks here if there's nothing else to do.
  • Check: setImmediate callbacks.
  • Close callbacks: socket.on('close', ...) etc.

After each callback, and between phases, Node drains two queues: first the process.nextTick queue, then the promise microtask queue (.then callbacks). nextTick callbacks therefore run before promise microtasks. That's why a process.nextTick callback that keeps rescheduling itself can starve the event loop entirely.

The single most useful sentence: "The event loop processes one callback at a time on one thread. If your callback takes 500 ms of CPU, every other request is paused for 500 ms."

How does Node.js handle concurrency?

By not blocking. One thread can handle thousands of in-flight connections because none of them are holding the thread — they're all parked waiting for I/O at the OS level. When data arrives for any of them, the OS notifies libuv, libuv puts the callback on the loop, and the single JS thread runs it.

For CPU work, you need to step outside the main thread: worker_threads (multiple V8 isolates in one process, shared memory via SharedArrayBuffer) or cluster (multiple Node processes sharing a port).

What are common performance bottlenecks in Node.js?
  • CPU-heavy work on the main thread — big JSON.parse, sync regex, image processing, password hashing with bcrypt.hashSync.
  • Blocking the event loop with fs.readFileSync, sync crypto, or a long synchronous loop.
  • libuv thread pool exhaustion — default 4 threads, used by DNS, fs, some crypto, zlib. Saturated under high concurrency.
  • Memory leaks — uncleared setInterval, growing global caches, listeners that aren't removed.
  • Database connection pool too small — requests queue waiting for a free connection.
  • Slow downstream APIs with no timeout — every in-flight call holds memory and a connection.
How do you debug memory leaks in Node.js?
  • Trend the heap. Watch process.memoryUsage().heapUsed over time. If it climbs and never drops after GC, you have a leak.
  • Take heap snapshots via --inspect in Chrome DevTools or v8.writeHeapSnapshot(). Take three snapshots over time; compare "Objects allocated between snapshots" — the type that keeps growing is the leak.
  • Common culprits: forgotten event listeners (Node prints a MaxListenersExceededWarning once an emitter passes its listener limit, 10 by default), closures pinning big objects, Map/global cache that never evicts, unclosed DB connections, accidental retention via Promises that never settle.
  • Tools — clinic.js (clinic doctor, heapprofiler), 0x (flame graphs), --prof, Datadog continuous profiler.
How do you scale Node.js applications?
  • Cluster mode — spawn N worker processes (one per core) that share one listening port. Near-linear scaling for CPU-bound work, since each worker is a separate process with its own V8 instance and event loop.
  • Horizontal pods behind a load balancer — same idea but at the orchestrator level (Kubernetes).
  • Offload CPU to worker_threads — image resize, PDF generation, anything CPU-heavy that you want on the same machine as the main process.
  • Push slow work to a queue — emails, reports, anything not in the user's wait path.
  • Increase libuv thread pool: UV_THREADPOOL_SIZE=16 if you're doing lots of DNS / crypto / fs.
Cluster mode in Node.js.

Built into Node via the cluster module (or, more commonly today, PM2 / Kubernetes replicas). The primary process (called "master" in older docs) forks N workers, each a full Node process with its own V8 instance. They share a port: by default on every platform except Windows, the primary accepts connections and hands them to workers round-robin; the alternative is for the primary to pass the listening socket to the workers and let the OS decide which one accepts each connection.

Why bother: a single Node process runs JavaScript on one core. On an 8-core box, cluster mode (or 8 pods) gives you up to roughly 8× the throughput for CPU-bound work. For pure I/O-bound work the speedup is smaller but still real.

Worker threads vs child processes.
Aspect | worker_threads | child_process
Memory | Separate heaps; sharing possible via SharedArrayBuffer | Separate
Startup cost | ~10 ms (thread) | ~30–100 ms (process)
IPC | postMessage, structured clone | stdio / IPC channel, JSON
Isolation | Same process — crash takes both | Separate — isolated failure
Best for | CPU-heavy JS tasks | Running other binaries (ffmpeg, python)
How would you optimize a slow Node.js API?

The funnel:

  • Profile first. APM trace or clinic.js. Don't guess.
  • Cache reads. Redis cache-aside for hot lookups.
  • Batch DB calls. Find and kill N+1s.
  • Indexes. EXPLAIN ANALYZE every query in the hot path.
  • Async / parallelize. If you have 3 independent DB calls, Promise.all them — don't await sequentially.
  • Push side effects to a queue — don't wait for email/PDF.
  • Increase thread pool if libuv is the bottleneck.
  • Move CPU work to a worker thread.
How do you handle CPU-intensive tasks in Node.js?

Three escape hatches, in order of severity:

  • Break the work into chunks and yield via setImmediate between chunks, so the event loop can process other requests in between.
  • worker_threads — for in-process CPU work like PDF rendering, image processing, JSON parsing of huge payloads.
  • Offload to a different service — push the job to a queue, let a worker fleet (in Node or in a more CPU-friendly language like Go or Rust) handle it.

The cardinal sin: doing CPU work on the main thread of a Node server. One slow request poisons the well for everyone else on that pod.
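The first escape hatch, chunking with setImmediate, can be sketched like this. Summing an array stands in for real CPU work, and the chunk size of 10,000 is an arbitrary assumption you would tune against measured event loop lag.

```javascript
// Sum a large array in slices, yielding to the event loop between slices.
function sumInChunks(numbers, chunkSize = 10000) {
  return new Promise((resolve) => {
    let total = 0;
    let index = 0;
    function step() {
      const end = Math.min(index + chunkSize, numbers.length);
      for (; index < end; index++) total += numbers[index];
      if (index < numbers.length) {
        setImmediate(step); // let pending I/O and other callbacks run first
      } else {
        resolve(total);
      }
    }
    step();
  });
}

const data = Array.from({ length: 100000 }, (_, i) => i + 1);
sumInChunks(data).then((total) => console.log(total));
```

Each slice still blocks the loop while it runs, so this only bounds the stall per slice; work that cannot be sliced belongs in a worker thread or a separate service.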

Express.js middleware lifecycle.

Middleware is a function with signature (req, res, next) => { ... }. They run in the order registered (app.use). Each one either:

  • Mutates req/res and calls next() to pass to the next middleware.
  • Calls next(err) to skip to the error-handling middleware (any function with 4 args: (err, req, res, next)).
  • Sends a response — terminates the chain.

Typical stack: helmet → cors → body-parser → request logger → auth → rate-limit → route handlers → error handler (last). Errors thrown inside async handlers need express-async-handler or Express 5 to be caught — Express 4 doesn't catch async throws automatically.

10 · Resilience patterns

Consistency tradeoffs, retries, queues, deployments

Explain consistency vs availability tradeoff.

See CAP above. The practical version: when a part of your system is misbehaving (slow, unreachable, lagging), you can either fail the request (preserve consistency — the user sees an error) or serve possibly-stale data (preserve availability — the user sees something, even if outdated). Bank transfers favor consistency; social timelines, search, recommendations favor availability.

Strong consistency vs eventual consistency.

Strong / linearizable consistency — every read sees the latest write. Required for things like balances and inventory ("don't sell the last item twice").

Eventual consistency — reads might see stale data for a brief window, but if writes stop, all replicas converge. Required for global scale. Most NoSQL stores, multi-region setups, and async pipelines are eventually consistent.

The hidden third tier: read-your-writes consistency — your own writes are immediately visible to you, even if not to others. Often achieved by routing post-write reads to the primary for the next few seconds.

What is distributed locking?

A way for multiple processes / pods to coordinate on a shared resource. "Only one worker may process job 42 at a time." Typical implementations:

  • Redis SET NX EX: SET lock:job42 worker-7 NX EX 30 succeeds if no one holds the lock; the EX gives an automatic timeout in case the holder crashes.
  • Redlock — multi-instance Redis variant; controversial (clock skew issues).
  • ZooKeeper / etcd — strongly-consistent coordination service. The "right" tool for serious distributed locking; heavier ops.
  • DB row lock: SELECT ... FOR UPDATE. Simple if you already have a DB.

The gotcha: a lock without an expiry is a deadlock generator. Always set a TTL. And know that any TTL-based lock can fail catastrophically if your worker hangs past the TTL — then a second worker also holds the lock and you have two writers. Truly correct distributed locking needs fencing tokens.
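A fencing token is just a number that increases with every lock grant, checked by the resource being protected. The sketch below uses in-memory stand-ins for both the lock service and the storage; all names are hypothetical.

```javascript
// Lock service stand-in: every grant gets a strictly increasing token.
function createLockService() {
  let counter = 0;
  return { acquire: () => ++counter };
}

// The protected resource remembers the highest token seen and rejects older ones.
function createStorage() {
  let highestToken = 0;
  const data = new Map();
  return {
    write(key, value, token) {
      if (token < highestToken) return false; // stale holder whose lease has expired
      highestToken = token;
      data.set(key, value);
      return true;
    },
    read: (key) => data.get(key),
  };
}

const locks = createLockService();
const storage = createStorage();
const tokenA = locks.acquire();   // worker A gets the lock, then hangs past the TTL
const tokenB = locks.acquire();   // lock expires; worker B gets it with a higher token
console.log(storage.write('job42', 'written by B', tokenB)); // accepted
console.log(storage.write('job42', 'written by A', tokenA)); // rejected as stale
```

The important design point is that the check happens at the storage layer, so a paused worker cannot overwrite newer data even though it still believes it holds the lock.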

Circuit breaker pattern.

Wrap calls to a flaky downstream. The breaker has three states:

  • Closed — calls flow normally; failures are counted.
  • Open — failure rate over threshold; calls fail-fast without even trying the downstream, for a cool-off period (~30s).
  • Half-open — after cool-off, let a few probe calls through. If they succeed, close the breaker; if they fail, back to open.

Why: a downstream stuck at 30-second timeouts means every call ties up a thread / connection for 30s. With 100 RPS and a 30s timeout, you'd have 3000 in-flight calls within 30s — your service falls over. The breaker stops that cascade.

Libraries: resilience4j (Java), Hystrix (deprecated), Polly (.NET), opossum (Node).
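To show the three states in code, here is a stripped-down breaker with an injectable clock. It is a simplification of what the libraries above do: it trips on a count of consecutive failures rather than a failure rate, and in half-open it does not cap the number of probe calls. The class and option names are illustrative.

```javascript
class CircuitBreaker {
  constructor({ failureThreshold = 5, coolOffMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.coolOffMs = coolOffMs;
    this.now = now;
    this.state = 'closed';
    this.failures = 0;
    this.openedAt = 0;
  }

  // Ask before each call; false means fail fast without touching the downstream.
  allowRequest() {
    if (this.state === 'open' && this.now() - this.openedAt >= this.coolOffMs) {
      this.state = 'half-open'; // cool-off elapsed: let a probe through
    }
    return this.state !== 'open';
  }

  recordSuccess() {
    this.failures = 0;
    this.state = 'closed';
  }

  recordFailure() {
    this.failures += 1;
    // A failed probe, or too many consecutive failures, (re)opens the breaker.
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }
}
```

A caller would check allowRequest() before each downstream call, return a fallback or an error immediately when it is false, and report the outcome with recordSuccess() or recordFailure() otherwise.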

Retry mechanisms and exponential backoff.

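Retry only transient failures — 5xx, timeouts, network errors. Don't retry 4xx (the request is wrong; retrying won't fix it), with the usual exceptions of 408 and 429, which signal a timeout or throttling rather than a bad request. Always retry with: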

  • Exponential backoff — 100ms, 200ms, 400ms, 800ms... so you don't hammer the downstream.
  • Jitter — add randomness (±20% of the delay). Otherwise 1000 clients all retry at exactly 200ms and create a synchronized thundering herd.
  • A maximum retry count (3–5) and a maximum total time budget. Don't retry forever.
  • Idempotency keys — if you retry a POST /charge, the server must not double-charge.
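The first three bullets can be combined in a small helper. This is a sketch under stated assumptions: errors are assumed to carry an HTTP status in err.status (errors without one are treated as timeouts or network failures), only 5xx statuses are retried, and the total time budget is left out for brevity.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped, with +/- jitter.
function backoffDelay(attempt, { baseMs = 100, capMs = 10000, jitter = 0.2, rand = Math.random } = {}) {
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  const factor = 1 + (rand() * 2 - 1) * jitter;
  return Math.round(exponential * factor);
}

// Assumed error shape: err.status holds the HTTP status, if there was one.
function isRetryable(err) {
  if (err.status === undefined) return true; // no status: timeout or network error
  return err.status >= 500;
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call fn until it succeeds, the error is not retryable, or attempts run out.
async function retry(fn, { maxAttempts = 4, sleep = defaultSleep, ...backoff } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      if (attempt + 1 >= maxAttempts || !isRetryable(err)) throw err;
      await sleep(backoffDelay(attempt, backoff));
    }
  }
}
```

With the defaults, the un-jittered delays are 100 ms, 200 ms, and 400 ms before the fourth and final attempt.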
Dead Letter Queue (DLQ).

A separate queue where messages that have failed processing too many times get parked. After 5 retries, instead of looping forever, the consumer puts the message on the DLQ along with the failure reason. A human (or a script) inspects the DLQ, fixes the root cause (bug, schema mismatch, malformed payload), and replays the messages.

Without a DLQ, a single poison message can block its partition forever — every retry fails, the offset never advances, and the queue lag grows unboundedly.
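The consumer-side logic is small. In this sketch an array stands in for the dead letter queue and the handler is synchronous; a real consumer would publish to a separate topic or queue and would usually back off between attempts. The names and the limit of 5 attempts are illustrative.

```javascript
const MAX_ATTEMPTS = 5;

// Process a batch; park messages that keep failing instead of blocking the rest.
function consume(messages, handler, deadLetters) {
  for (const message of messages) {
    let lastError;
    let done = false;
    for (let attempt = 1; attempt <= MAX_ATTEMPTS && !done; attempt++) {
      try {
        handler(message);
        done = true;
      } catch (err) {
        lastError = err;
      }
    }
    if (!done) {
      // Keep the failure reason so whoever inspects the DLQ knows what to fix.
      deadLetters.push({ message, reason: lastError.message, attempts: MAX_ATTEMPTS });
    }
  }
}

const processed = [];
const dlq = [];
consume(['a', 'poison', 'b'], (m) => {
  if (m === 'poison') throw new Error('malformed payload');
  processed.push(m);
}, dlq);
console.log(processed, dlq);
```

The poison message ends up in the DLQ with its failure reason, and the messages behind it are still processed.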

Bulkhead pattern.

Borrowed from shipbuilding — a ship's hull is divided into watertight compartments so one breach doesn't sink the ship. In software: isolate calls to different downstreams in separate thread pools / connection pools so one slow downstream can't exhaust the pool and starve calls to the others.

Example: in a service that calls Payment and Notifications, give them separate connection pools of 20 each. If Payment hangs and saturates its 20 connections, Notifications still has 20 free. Without bulkheads, a shared pool of 40 gets exhausted by Payment and Notifications calls fail too — the failure cascades.
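The Payment / Notifications example above maps onto a simple slot counter per downstream. This sketch rejects immediately when a compartment is full rather than queueing, which is one possible policy; the Bulkhead class is hypothetical and not a library API.

```javascript
// A fixed number of slots per downstream; callers fail fast when the compartment is full.
class Bulkhead {
  constructor(name, limit) {
    this.name = name;
    this.limit = limit;
    this.inFlight = 0;
  }

  tryAcquire() {
    if (this.inFlight >= this.limit) return false;
    this.inFlight += 1;
    return true;
  }

  // Call in a finally block once the downstream call settles.
  release() {
    if (this.inFlight > 0) this.inFlight -= 1;
  }
}

const payment = new Bulkhead('payment', 20);
const notifications = new Bulkhead('notifications', 20);

// Simulate Payment hanging: 20 calls take a slot and never release it.
for (let i = 0; i < 20; i++) payment.tryAcquire();
console.log(payment.tryAcquire(), notifications.tryAcquire());
```

Payment calls are now rejected, while Notifications still has all of its slots available.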

Rate limiting algorithms — token bucket, leaky bucket.

Both already covered above. The one-liner difference: token bucket allows bursts (you accumulate tokens up to a cap); leaky bucket doesn't (output is at a strictly fixed rate). Token bucket is what most public APIs use; leaky bucket is more common in network gear / traffic shaping.
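The burst difference is easier to see side by side. A simplified sketch: both classes are hypothetical, the clock is injectable so the behavior can be checked without sleeping, and the leaky bucket here rejects early requests instead of queueing them the way a real traffic shaper would.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens/sec up to `capacity`; each request spends one."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class LeakyBucketShaper:
    """Lets through at most one request every 1/rate seconds, with no bursts."""

    def __init__(self, rate, clock=time.monotonic):
        self.interval = 1.0 / rate
        self.clock = clock
        self.next_free = clock()

    def allow(self):
        now = self.clock()
        if now >= self.next_free:
            self.next_free = now + self.interval
            return True
        return False
```

With `rate=1` and `capacity=5`, an idle token bucket accepts five requests back to back; the leaky bucket at the same rate accepts one and makes the rest wait for their slot.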

API timeout handling.

The rule: every outbound call has a timeout. Default HTTP clients in many languages have no timeout. A single hanging call can hold a connection forever and pin a thread.

Set timeouts at multiple layers: connect timeout (1–3s), read timeout (depends on the endpoint, but 5–30s is typical), and an overall request budget. When a timeout fires, decide whether to retry (idempotent + transient) or fail-fast. Surface the timeout as a 504 to the upstream caller, with a clear log line that you can trace later.
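The "overall request budget" is the layer people tend to skip, so here is a minimal sketch of it. `Deadline` is a hypothetical helper; the 2 s connect and 10 s read defaults are assumptions picked from the ranges above.

```python
import time


class Deadline:
    """Overall request budget shared by every outbound call in one request."""

    def __init__(self, budget_s, clock=time.monotonic):
        self._clock = clock
        self._expires = clock() + budget_s

    def remaining(self):
        return max(0.0, self._expires - self._clock())

    def timeouts(self, connect_s=2.0, read_s=10.0):
        """(connect, read) timeouts, each capped by what is left of the budget."""
        left = self.remaining()
        if left <= 0:
            raise TimeoutError("request budget exhausted")
        return (min(connect_s, left), min(read_s, left))
```

Create one `Deadline` when the request arrives and ask it for timeouts before each outbound call. The `(connect, read)` tuple is the shape that clients such as Python's `requests` accept for their `timeout` argument, so a slow first hop automatically leaves less time for the second.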

Message queues vs streaming systems — Kafka, RabbitMQ, SQS.
| | RabbitMQ / SQS | Kafka |
|---|---|---|
| Model | Queue: message consumed once and gone | Log: messages persist, consumers track offsets |
| Replay | No (once consumed, lost) | Yes: rewind to any offset |
| Multiple consumers of same message | Via exchanges / fan-out | Built-in via consumer groups |
| Throughput | Tens of thousands / sec | Millions / sec |
| Ordering | Per queue (limited) | Per partition (strong) |
| Best for | Task queues, work distribution | Event streams, audit logs, replayable pipelines |

Quick picker: Kafka when you want event sourcing, multiple downstream consumers, or replay. RabbitMQ / SQS when you want simple work distribution to a worker fleet.

Kafka partitioning and consumer groups.

A Kafka topic is split into partitions — each is an append-only log on disk. Messages with the same key always land in the same partition (so per-key ordering is preserved).

A consumer group is a set of consumers cooperating to read a topic. Kafka assigns each partition to exactly one consumer in the group. If you have 6 partitions and 3 consumers, each consumer gets 2 partitions. Add a 4th consumer → Kafka rebalances. Hit 7 consumers → one sits idle because partitions cap the parallelism.

Three takeaways: (1) Partition count is your concurrency ceiling per consumer group. Plan it generously up-front: you can add partitions later, but existing messages stay where they are while new messages for the same key may hash to a different partition, so per-key ordering breaks across the change. (2) Different consumer groups can read the same topic independently (each tracks its own offset). (3) During a rebalance, processing pauses, so keep partition reassignment infrequent.
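A small simulation of both ideas, key-to-partition mapping and partition-to-consumer assignment. This is a sketch under assumptions: `zlib.crc32` stands in for the murmur2 hash that Kafka's Java client uses for keyed messages, and `assign` is a plain round-robin, not one of Kafka's actual assignor implementations.

```python
import zlib


def partition_for(key, num_partitions):
    """Stable hash of the key modulo partition count (crc32 as a stand-in hash)."""
    return zlib.crc32(key.encode()) % num_partitions


def assign(partitions, consumers):
    """Round-robin partitions over consumers; extras beyond `partitions` get none."""
    plan = {c: [] for c in consumers}
    for p in range(partitions):
        plan[consumers[p % len(consumers)]].append(p)
    return plan


three = assign(6, ["c1", "c2", "c3"])            # two partitions each
seven = assign(6, [f"c{i}" for i in range(7)])   # one consumer ends up idle
idle = [c for c, parts in seven.items() if not parts]

# Growing the topic from 6 to 12 partitions remaps some keys.
keys = [f"customer-{i}" for i in range(1000)]
moved = sum(partition_for(k, 6) != partition_for(k, 12) for k in keys)
print(f"idle consumers: {idle}, keys remapped after resize: {moved}")
```

The `moved` count is the reason to size partitions generously up-front: every remapped key now has history in one partition and new messages in another.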

How do you design fault-tolerant systems?

Compose the patterns: redundancy at every layer (multi-AZ, multi-replica), health checks + automatic failover, timeouts + retries with backoff, circuit breakers, bulkheads, idempotent operations, DLQs, graceful degradation (fall back to a cached or simpler response when a downstream is down). And the meta-rule: chaos-test it. If you've never simulated the failure, you haven't tested fault tolerance.

High availability vs fault tolerance.

HA = the system stays available most of the time (99.9%+), usually via redundancy and quick failover. There may be a few seconds of disruption during failover.

Fault tolerance = the system continues operating with no perceptible disruption when something fails. Stronger guarantee, more expensive. Active-active replication, no-downtime failover.

For most business systems, HA is enough; fault tolerance is reserved for the truly mission-critical (trading, life-safety, telecom switches).
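It helps to have the "nines" arithmetic ready when this comes up. A quick sketch; both function names are hypothetical, and the redundancy formula assumes replicas fail independently, which real deployments only approximate.

```python
def downtime_per_year_hours(availability_pct):
    """Allowed downtime per 365-day year for a given availability percentage."""
    return (1 - availability_pct / 100) * 365 * 24


def combined_availability(single_pct, replicas):
    """Availability of N independent replicas where any one can serve traffic."""
    return (1 - (1 - single_pct / 100) ** replicas) * 100


for target in (99.9, 99.99):
    print(f"{target}% allows {downtime_per_year_hours(target):.2f} h/year of downtime")
print(f"two 99% replicas together: {combined_availability(99, 2):.2f}%")
```

Three nines is a little under nine hours a year; each extra nine cuts that by 10×, which is roughly where the cost of true fault tolerance starts to show.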

Blue-green vs canary deployment.

Blue-green — two identical production environments, "blue" (live) and "green" (idle). Deploy to green, smoke-test, then flip the load balancer to point at green. Rollback is one click — flip back. Cost: 2× the infrastructure during deploys.

Canary — deploy the new version to a small slice of traffic (say 5%), watch metrics, gradually increase to 100%. Catches regressions before they hit everyone. Needs a feature-flag / weighted-routing system.
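The weighted-routing piece can be as simple as hashing the user into a stable bucket. A minimal sketch; `bucket` and `route` are hypothetical helpers, and in practice this logic usually lives in the load balancer, service mesh, or feature-flag service rather than in application code.

```python
import hashlib


def bucket(user_id):
    """Map a user to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def route(user_id, canary_percent):
    """Send users whose bucket falls below the canary weight to the new version."""
    return "canary" if bucket(user_id) < canary_percent else "stable"


users = [f"user-{i}" for i in range(10_000)]
canary_share = sum(route(u, 5) == "canary" for u in users) / len(users)
print(f"share of users on the canary at 5%: {canary_share:.1%}")
```

Because the bucket depends only on the user ID, a user who lands on the canary at 5% stays on it at 25% and 100%, so nobody flips back and forth between versions as the rollout widens.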

Most modern teams do canary (cheaper, finer-grained). Blue-green still has its place for stateful systems where you can't mix versions.

How do you perform zero-downtime deployment?
  • Rolling deploy — replace pods one (or a few) at a time, behind a load balancer with health checks. The LB only sends traffic to healthy pods.
  • Graceful shutdown — when a pod gets SIGTERM, stop accepting new connections, finish in-flight ones, then exit. Otherwise mid-request connections are killed.
  • Backwards-compatible API + schema changes — never deploy a breaking change in one shot. Use expand-and-contract: add the new column/field, deploy the new version that uses it, remove the old field later. Same for API contracts.
  • Database migrations decoupled from app deploys — additive only, in advance.
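The graceful-shutdown bullet is the one worth being able to sketch on a whiteboard. A minimal illustration, assuming a hypothetical `GracefulServer` wrapper around request handling; real frameworks and Kubernetes add their own pieces (readiness probes, termination grace periods) on top.

```python
import signal
import threading


class GracefulServer:
    """Tracks in-flight requests so shutdown can wait for them to finish."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0
        self._idle = threading.Event()
        self._idle.set()
        self.accepting = True

    def handle(self, work):
        with self._lock:
            if not self.accepting:
                return 503  # draining: let the load balancer route elsewhere
            self._in_flight += 1
            self._idle.clear()
        try:
            return work()
        finally:
            with self._lock:
                self._in_flight -= 1
                if self._in_flight == 0:
                    self._idle.set()

    def begin_shutdown(self, *_signal_args):
        # Runs as a signal handler, so it only flips a flag and takes no lock.
        self.accepting = False

    def wait_for_drain(self, timeout_s=30):
        """Block until in-flight requests finish; False if the timeout hit first."""
        return self._idle.wait(timeout_s)


server = GracefulServer()

if __name__ == "__main__":
    signal.signal(signal.SIGTERM, server.begin_shutdown)
```

On SIGTERM the process stops accepting, calls `wait_for_drain()`, and only then exits. The drain timeout should be shorter than whatever grace period the orchestrator allows before it sends SIGKILL.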
Load balancer strategies.
  • Round-robin — simplest, distribute evenly. Bad if backends have different capacities.
  • Least-connections — send to the backend with fewest active connections. Good default for HTTP.
  • Weighted — give bigger boxes more weight.
  • IP hash / consistent hash — same client always lands on same backend (needed for some caching / stateful flows).
  • Latency-based — DNS/L7 routes to the closest or fastest region.
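Three of these strategies fit in a few lines each. A simplified sketch with hypothetical class names; note that `ip_hash` below is plain modulo hashing, so it reshuffles most clients when the backend list changes, which is exactly the problem consistent hashing exists to reduce.

```python
import itertools
import zlib


class RoundRobin:
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self):
        backend = min(self.active, key=self.active.get)  # fewest open connections
        self.active[backend] += 1
        return backend

    def release(self, backend):
        self.active[backend] -= 1  # call when the connection closes


def ip_hash(client_ip, backends):
    """Same client IP always maps to the same backend while the list is unchanged."""
    return backends[zlib.crc32(client_ip.encode()) % len(backends)]
```

Least-connections is the only one of the three that reacts to what backends are actually doing: a pod stuck on slow requests accumulates connections and naturally stops being picked.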
Sticky sessions vs stateless services.

Sticky sessions — load balancer pins a user to the same backend pod, so the pod's in-memory session works. Easy, but breaks when that pod dies (user logged out) and prevents true horizontal scaling (one popular user can hot-spot a pod).

Stateless services — session state lives in Redis or a JWT, so any pod can serve any request. Pods are interchangeable, easy to autoscale, zero-downtime deploys are simple. This is the modern default — sticky sessions are a tactic for legacy or websocket-heavy systems.
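To make "session state lives in a JWT" concrete, here is the core idea stripped down: the server signs the session payload, and any pod holding the secret can verify it without a lookup. This is deliberately not a real JWT (no header, no expiry claim, hypothetical function names, hard-coded demo secret); in practice you would use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret-change-me"  # assumption: loaded from a secret store in practice


def issue_token(session):
    """Serialize the session and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(session, sort_keys=True).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig


def read_token(token):
    """Return the session dict if the signature checks out, else None."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


token = issue_token({"user_id": "u-123", "role": "customer"})
session = read_token(token)
```

No pod had to remember anything between `issue_token` and `read_token`, which is what makes the pods interchangeable.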

CDN usage in backend architecture.

A CDN is a globally-distributed cache that lives between users and your origin. It serves three jobs:

  • Static assets — JS, CSS, images, fonts cached at edges. Saves you bandwidth and lets users in Mumbai load assets from Mumbai instead of Virginia.
  • Cacheable API responses — public data with short TTLs (catalog pages, public profiles). Set Cache-Control: public, max-age=60 and the CDN does the rest.
  • DDoS protection + WAF — most CDNs (Cloudflare, Akamai, CloudFront) absorb volumetric attacks before they reach your origin.
Reverse proxy role.

An nginx / Envoy / HAProxy sitting between the public internet and your app servers. Functions:

  • TLS termination — handle HTTPS once, pass HTTP internally.
  • Load balancing across upstream pods.
  • Compression, caching, static file serving.
  • Request routing — path-based, host-based.
  • Connection pooling / keep-alive — fewer expensive TCP handshakes.
  • Buffer slow clients so they don't tie up your app threads.

API Gateway = reverse proxy + auth + rate-limit + metrics + (often) API-specific features like request transformation.

11 · Project & deep-dive follow-ups

The "tell me about your project" round

After all the conceptual questions, the panel usually circles back to your real work. These last questions are the ones that distinguish a candidate who has only studied from one who has actually shipped and felt the pain.

Draw the complete request flow of your project.

Same diagram as Section 1, but narrated for one specific user action. "Sarah taps Place Order on the mobile app at 7:42 PM Tuesday. (1) Mobile app sends POST /orders with JWT and an Idempotency-Key. (2) Cloudflare edge routes to the closest region. (3) API Gateway validates JWT, applies per-user rate limit, attaches a trace ID. (4) Order Service receives the request, opens a Postgres transaction, writes the order in PENDING + writes OrderCreated to its outbox table, commits. (5) Outbox poller picks up the event, publishes to Kafka. (6) Inventory Service consumes OrderCreated, reserves stock, publishes InventoryReserved. (7) Payment Service consumes that, calls Stripe with the idempotency key, publishes PaymentCaptured. (8) Shipping Service consumes, books pickup, publishes OrderShipped. (9) Order Service consumes the terminal event and flips the order to CONFIRMED. (10) Mobile app polls / receives push, updates the screen. End-to-end p99 ~1.8 seconds; the payment hop dominates."

That walkthrough — naming concrete payloads, concrete latencies, concrete failure-handling — is what makes you sound like an engineer who has actually run this in production.
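Steps (4) and (5) are where panels usually dig in, so it is worth being able to sketch the outbox. A minimal version using an in-memory SQLite database in place of Postgres; the table layout, `place_order`, and `poll_outbox` are illustrative assumptions, and `publish` stands in for the Kafka producer call.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def place_order(order_id):
    """Step 4: order row and event row commit together, or neither does."""
    with db:  # one transaction; rolls back on any exception
        db.execute("INSERT INTO orders VALUES (?, 'PENDING')", (order_id,))
        db.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id})),
        )


def poll_outbox(publish):
    """Step 5: publish unpublished events in order, then mark them done."""
    rows = db.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # may repeat after a crash
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the poller crashes between `publish` and the `UPDATE`, the event goes out again on the next poll. That is at-least-once delivery, and it is why the downstream consumers in the walkthrough need to be idempotent.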

Explain one production issue you solved.

Pick one with a clean three-act structure: symptom → investigation → fix. Example: "Last quarter our checkout p99 jumped from 400 ms to 4 seconds, only on weekends. Took 4 hours to root-cause. The symptom was DB CPU saturating during the spike. Tracing showed an N+1 on the cart-summary endpoint — for every cart with N items we did N+1 SELECTs to fetch product details. Locally this was 5 items per cart; in production during weekend sale traffic, carts had 40-80 items. Fix: replaced the per-item lookup with a single WHERE id IN (...) + added an index. P99 dropped to 280 ms. Lesson learned: load testing must use production-shaped data, not 10-row fixtures."

Always end with the lesson — that's what makes it senior-level storytelling.
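If the panel asks what the N+1 fix actually looked like, a before/after like this is enough. It is a sketch against an in-memory SQLite table; the `products` schema and both function names are assumptions for illustration, and the 60-item cart mirrors the weekend-sale case in the story.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO products VALUES (?, ?)",
               [(i, f"product-{i}") for i in range(1, 101)])

cart_item_ids = list(range(1, 61))  # a 60-item cart


def fetch_n_plus_one(ids):
    """One query per item: round trips grow with cart size."""
    return [db.execute("SELECT id, name FROM products WHERE id = ?", (i,)).fetchone()
            for i in ids]


def fetch_batched(ids):
    """One query for the whole cart using WHERE id IN (...)."""
    placeholders = ",".join("?" * len(ids))
    sql = f"SELECT id, name FROM products WHERE id IN ({placeholders}) ORDER BY id"
    return db.execute(sql, ids).fetchall()


slow = fetch_n_plus_one(cart_item_ids)   # 60 queries
fast = fetch_batched(cart_item_ids)      # 1 query, same rows
```

Both return the same rows; the difference is 60 database round trips versus one, which is invisible with a 5-item test cart and dominant with an 80-item production one.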

Explain one scalability challenge you handled.

Same structure. The lesson should usually be about a tradeoff: "we sharded by customer_id which solved write throughput but made our admin queries (which span all customers) slow — so we added a read replica feeding a denormalized analytics table to handle those." Tradeoffs are the senior-engineer signal.

Explain one architecture decision you took and why.

Pick a decision where the obvious answer was wrong. "For our refund pipeline, the obvious move was synchronous: user clicks Refund, API calls the payment gateway, returns success. We made it async via Kafka instead because (1) Stripe occasionally takes 30s, which would have blown our SLA, and (2) we wanted at-least-once retry semantics for free if Stripe was down. The cost was a more complex UI: the user sees 'Refund in progress' for a few seconds. We judged that worth it. Six months in, when Stripe had a 2-hour outage, the queue absorbed the load and refunds completed automatically once Stripe recovered, which is exactly the win the design was for."

What would you improve in your current architecture?

Two ingredients: be specific, and frame it as a tradeoff you've consciously deferred — not as a regret. "Our Postgres write primary is a single point. We've designed the shard key (customer_id) but haven't pulled the trigger because (a) we're at 30% capacity headroom, (b) sharding triples our ops complexity, and (c) we have 18 months of runway before we need it. If I had to pick what I'd start today, it would be the outbox pattern across all services — three of our seven services still do synchronous double-writes, which means we've had two production incidents where the DB committed but the event publish failed."

That answer signals: you know where the seams are, you've thought about the cost, and you can make a judgment call instead of cargo-culting a "fix everything" answer.

What tradeoffs did you make in your design?

List them as named tradeoffs:

  • "Chose eventual consistency over strong consistency for the order-fulfilment chain — we accept a 2-3 second window where the order shows PROCESSING in the UI. Bought us full async + retry semantics."
  • "Chose choreography over orchestration — 3 services in the chain, didn't want the orchestrator overhead. Will revisit if it grows to 6+."
  • "Cached aggressively, accepted 60s staleness on product pages — drops DB read load 20×, the catalog team is fine with the lag."
  • "Picked Postgres + Redis over DynamoDB — simpler ops for our team's skillset, paid for it with a manual sharding plan when we get there."
If traffic suddenly becomes 100×, what breaks first?

The senior answer always names a specific bottleneck and the metric you'd watch. "Our write primary on Postgres. It runs at ~3K TPS today with headroom to ~8K. At 100× traffic, even with caching absorbing reads, writes still grow about 30× because the conversion-funnel ratio is roughly fixed. That is ~90K TPS against an ~8K ceiling, so Postgres pegs first, around hour two of the spike. The signal would be IOPS saturating on the primary's EBS volume before CPU does. Mitigation already in the runbook: temporarily drop the secondary indexes on the audit table (the least critical one), which buys us another 2× while we throw replicas at the read load and slow the spike with rate limiting at the gateway."

If you can't name a specific component and a specific metric, the panel knows you've never been on call for a real spike. The fact that you can name what breaks first is the answer — it shows you've thought about your system as a chain of capacities, not as a marketing diagram.

How to use this page in interview prep. Don't memorize. Pick five questions from any section, close the page, and answer them out loud in 60–90 seconds each — like you would in the room. Then come back and compare. The patterns repeat across questions (idempotency, story-arc walkthroughs, "what breaks first") — once you can deliver those reflexively, the surface-level question variations stop mattering.
Companion deep dives. Pair this Q&A with the focused walkthroughs: Caching Strategies · Kafka Interview · SQL vs NoSQL · Backend Fundamentals · Distributed Cache HLD.