Every backend interview question that shows up in senior rounds — architecture, microservices, sagas, idempotency, observability, databases, Node.js, resilience — answered the way you'd actually say it in the room.
Question-and-answer pairs. Architecture-style questions (design an ordering system, draw your project) carry a Mermaid diagram and a story-arc walkthrough. The shorter conceptual questions get tight, story-flavored answers: the kind you can deliver in 60–90 seconds when the panel asks.
This is almost always the first question after introductions. The panel doesn't actually want a tour of every microservice you own — they want to see whether you can tell a story about your system: what comes in, where it goes, what each box exists to solve, and what would break first under load. Drawing a clean diagram and narrating it well is roughly 30% of the round.
Open with the user-facing story, not the boxes. "A customer opens our checkout page on their phone. By the time the button turns from Pay to Order Confirmed, the request has touched seven services, two databases, a cache, and a message bus. Let me draw what that path looks like." Then draw — and only then label what each box is.
A clean five-tier diagram you can reproduce on any whiteboard:
Now narrate left-to-right, answering "what is it, and what would break without it?" for each box:
Message bus (Kafka): OrderPlaced, PaymentCaptured, InventoryReserved events flow through it. Without it, services would be tightly coupled via synchronous calls, and any one slow downstream would topple the chain.

Close with what breaks first. "If traffic 10×'d tomorrow, the first thing to fold would be the Postgres write primary: a single writer and a single point of failure. We've already shard-keyed the order table by customer_id, so the migration path is ready when we need it." This sentence, naming the next bottleneck, is the single most senior-sounding thing you can say in this question.
Same shape as above, but lean harder into three story beats: the request path (gateway → service → DB), the event path (service → Kafka → workers), and the data plane (which service owns which table, which cache). If you remember nothing else, remember that an interviewer is satisfied when you've named the boundaries — who owns what, who calls whom synchronously, and what crosses the bus.
Mention concrete numbers if you have them: "we serve ~120 RPS at p99 of 180 ms; payment is the slowest call at 600 ms p99, which is why it's async." Numbers move you from "candidate who has seen architecture diagrams" to "candidate who has run one in production."
There's no magic answer — there's a checklist you walk in the right order. Use this five-step ladder:
The trick in the interview is to stop and ask: "what's the read/write ratio?" and "what's the consistency tolerance?" These two answers tell you which of the five steps actually matters for this system.
Same five steps as above, plus three operational realities:
And the unglamorous truth: most "millions of RPS" systems are actually "millions of requests/day with a hot 50K RPS window." Ask for the daily / peak distribution before you over-design.
Half of all "system design" interviews come down to whether you can justify your split. The wrong answer is "microservices because they're scalable." The right answer always starts from a constraint — team size, deploy independence, blast radius, scaling shape — and works backward to the architecture.
Frame this as a tradeoff, not a victory. "We picked microservices when our team grew past ~30 engineers and we kept stepping on each other in a single repo — every deploy needed sign-off from four people. The split bought us independent deploys at the cost of a Kafka cluster, a service mesh, and three new failure modes."
If you went the other way: "We're a five-engineer team. A monolith means one repo, one deploy pipeline, in-process function calls instead of network hops, and ACID transactions for free. The day we hit ten engineers or a real bimodal scaling need, we'll start carving off services." This is a strong answer: it shows you understand microservices are a solution to an organizational problem first, a technical one second.
| Dimension | Monolith | Microservices |
|---|---|---|
| Deploy | One artifact, one pipeline | Independent per service |
| Local dev | Run the whole thing | Mocks, contracts, docker-compose hell |
| Transactions | ACID in-process | Saga / 2PC / eventual consistency |
| Latency | Function call (ns) | Network hop (ms) |
| Failure modes | One process crashes | Cascade, retry storms, partial failure |
| Team scaling | Conway's law bites past ~20 engineers | Each team owns a service end-to-end |
| Cost | Cheap to run | Service mesh + ops overhead |
| Best for | Small teams, unclear domain | Large orgs, divergent scaling needs |
The senior take: start monolith, modularize internally, extract services only when the pain forces you. Premature microservices is the most common architectural sin of the last decade.
Three lenses, in order:
What people get wrong: splitting by technical layer (UserController, UserRepository, UserModel as three services). That's not a service boundary, that's a coupling disaster. Split by business capability.
Beyond the three above, watch for:
The proven path is strangler fig: don't rewrite, peel. Steps:
Once you've split into services, the next question is how they talk. Synchronous REST is the obvious default — and the wrong default for half the calls in a typical system. Event-driven flow is what stops your checkout from failing because the recommendations service is slow.
An architecture where services don't call each other directly — instead they publish events ("OrderPlaced", "PaymentCaptured") to a broker (Kafka, RabbitMQ, SNS), and any service that cares subscribes. The publisher has no idea who's listening and doesn't wait for them.
The mental shift: in REST, the Order service commands the Email service ("send this email, please"). In events, the Order service announces ("an order happened, here's the data"); Email decides for itself whether to act. That inversion is what gives you loose coupling — you can add a fraud-detection consumer next month and the Order service doesn't even know.
Key properties: (1) the producer is done once Kafka acks — it doesn't wait for consumers. (2) Each consumer has its own offset, so Email being slow doesn't slow down Analytics. (3) Events are durable — even if all consumers are down, the event survives in the log; they catch up when they come back.
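To make those three properties concrete, here is a toy append-only log in TypeScript. It is a sketch, not a Kafka client: EventLog, publish, poll and commit are hypothetical names, there is a single partition, and "durability" is just an in-memory array. What it does show is that publishing never waits on anyone, and that each consumer group advances its own offset independently.

```typescript
// A toy append-only event log, loosely modelled on a single Kafka partition.
// Not a real broker client: EventLog, publish(), poll() and commit() are hypothetical.
interface DomainEvent {
  type: string;
  payload: Record<string, unknown>;
}

class EventLog {
  private readonly entries: DomainEvent[] = [];
  // Each consumer group tracks its own position in the log.
  private readonly offsets = new Map<string, number>();

  // The producer is "done" as soon as the event is appended; it never waits on consumers.
  publish(event: DomainEvent): number {
    this.entries.push(event);
    return this.entries.length - 1; // offset of the new event
  }

  // Return up to `max` events starting at the group's committed offset.
  poll(group: string, max: number): DomainEvent[] {
    const from = this.offsets.get(group) ?? 0;
    return this.entries.slice(from, from + max);
  }

  // Advance the group's committed offset by `count` processed events.
  commit(group: string, count: number): void {
    this.offsets.set(group, (this.offsets.get(group) ?? 0) + count);
  }
}

const log = new EventLog();
log.publish({ type: "OrderPlaced", payload: { orderId: 42 } });
log.publish({ type: "PaymentCaptured", payload: { orderId: 42 } });

// Analytics keeps up and commits; Email is slow and has not committed anything yet.
const analyticsBatch = log.poll("analytics", 10);
log.commit("analytics", analyticsBatch.length);
const emailBatch = log.poll("email", 1);
```

Because Email never committed, both events are still waiting for it, while Analytics is fully caught up: one slow consumer does not hold back the other.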
Three patterns, and a real system uses all three:
The trap is using REST for everything. Three nested REST calls = three failure modes stacked + 3× the latency. The senior instinct is to push as much as possible onto async paths and keep the synchronous path short.
| | Synchronous | Asynchronous |
|---|---|---|
| Caller knows result | Yes, immediately | No — eventual or via callback |
| Coupling | Tight (caller needs callee up) | Loose (broker buffers) |
| Latency | Sum of all hops | Producer returns instantly |
| Failure | Cascades up the call chain | Isolated, retried independently |
| Best for | Reads, validations, "must answer now" | Side effects, fan-out, slow work |
| Tooling | HTTP, gRPC | Kafka, RabbitMQ, SQS, SNS |
Rule of thumb: if the user is waiting for it, sync. If a system is waiting for it, async.
REST is a verb — "do this thing, return the result." Events are a noun — "this happened, here's the data." REST is one-to-one and coupled (the caller must know the callee's URL). Events are one-to-many and decoupled (publisher doesn't care who listens).
In practice you mix them. Public APIs and synchronous queries → REST/gRPC. Internal state changes that other services need to react to → events. A typical request flow: user hits REST endpoint → service writes DB row + publishes event → 4 downstream consumers each do their thing without blocking the user.
The single hardest question in microservices interviews. You can't use a database transaction across service boundaries (each owns its own DB). So how do you place an order, reserve inventory, charge payment, and arrange shipping — and end up consistent even if one of them fails halfway through?
Four services, one event bus, one saga. Let me draw it and walk through what each piece earns its keep doing.
- Order Service: accepts POST /orders, creates an order in PENDING state, publishes OrderCreated. Owns the order lifecycle state machine. Without it, no single service owns "what state is this order in?"
- Inventory Service: consumes OrderCreated, reserves stock atomically (decrement with row lock), publishes InventoryReserved or InventoryRejected. Owns the source of truth for "how many of SKU X are available right now."
- Payment Service: consumes InventoryReserved, calls the payment gateway, publishes PaymentCaptured or PaymentFailed. The only place that knows how to talk to Stripe/Razorpay.
- Shipping Service: consumes PaymentCaptured, generates a label, schedules pickup, publishes OrderShipped. Order Service consumes that and flips the order to SHIPPED.

The reverse path matters more than the happy path. If payment fails, Order Service consumes PaymentFailed and publishes OrderCancelled; Inventory Service consumes that and releases the reservation. This compensation chain is the saga.
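The choreography above can be sketched in TypeScript with an in-process bus standing in for Kafka. Everything here is hypothetical (the Bus class, the message shape, a fake gateway that declines amounts over 1000), and the shipping step and the InventoryRejected path are left out, so a successful order stops at a PAID status. The point is the compensation: when payment fails, no service calls another; the cancellation event alone makes Inventory give the stock back.

```typescript
type OrderEvent = "OrderCreated" | "InventoryReserved" | "PaymentCaptured" | "PaymentFailed" | "OrderCancelled";

interface OrderMsg {
  orderId: string;
  sku: string;
  qty: number;
  amount: number;
}

// Minimal synchronous pub/sub; a stand-in for Kafka topics and consumer groups.
class Bus {
  private readonly handlers = new Map<OrderEvent, Array<(m: OrderMsg) => void>>();
  on(event: OrderEvent, handler: (m: OrderMsg) => void): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }
  publish(event: OrderEvent, m: OrderMsg): void {
    for (const handler of this.handlers.get(event) ?? []) handler(m);
  }
}

const bus = new Bus();
const orderStatus = new Map<string, string>();
const stock = new Map<string, number>([["SKU-1", 5]]);

// Order Service: owns the order state machine.
function placeOrder(m: OrderMsg): void {
  orderStatus.set(m.orderId, "PENDING");
  bus.publish("OrderCreated", m);
}
bus.on("PaymentCaptured", (m) => orderStatus.set(m.orderId, "PAID")); // shipping omitted
bus.on("PaymentFailed", (m) => {
  orderStatus.set(m.orderId, "CANCELLED");
  bus.publish("OrderCancelled", m);
});

// Inventory Service: reserves on OrderCreated, releases on OrderCancelled (the compensation).
bus.on("OrderCreated", (m) => {
  const available = stock.get(m.sku) ?? 0;
  if (available < m.qty) return; // InventoryRejected path omitted for brevity
  stock.set(m.sku, available - m.qty);
  bus.publish("InventoryReserved", m);
});
bus.on("OrderCancelled", (m) => stock.set(m.sku, (stock.get(m.sku) ?? 0) + m.qty));

// Payment Service: a fake gateway that declines anything over 1000.
bus.on("InventoryReserved", (m) =>
  bus.publish(m.amount > 1000 ? "PaymentFailed" : "PaymentCaptured", m),
);

placeOrder({ orderId: "o-1", sku: "SKU-1", qty: 2, amount: 500 }); // succeeds
placeOrder({ orderId: "o-2", sku: "SKU-1", qty: 1, amount: 5000 }); // payment fails
```

In a real deployment each handler lives in its own service, consumes asynchronously, and must be idempotent, because the bus can redeliver the same event.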
The blunt truth: you can't, not in the ACID sense. There's no cross-service "BEGIN TRANSACTION ... COMMIT". You have three options, in roughly increasing order of practicality:
In 95% of real systems the answer is "saga + outbox + idempotent consumers." Say that and be ready to draw it.
2PC says: "I will hold everyone's locks until I'm certain everyone can commit, then we all commit atomically." Strong consistency, poor availability: one slow participant, or a coordinator that crashes mid-protocol, leaves every participant sitting on its locks.
Saga says: "Each step commits locally and immediately. If a later step fails, I'll run compensations to undo the earlier ones." Eventually consistent, far better availability, but you accept that for a brief window the system is in an "in-flight" state. For business workflows (place order, ship goods, refund) that's almost always the right trade. For "transfer ₹1000 from A to B" inside a single bank you might still want 2PC or a stored procedure.
Two ways to wire the saga together:
- Choreography: each service reacts to the previous service's event. Order publishes OrderCreated, Inventory listens and publishes InventoryReserved, Payment listens and publishes PaymentCaptured... No central brain. Pros: loose coupling, easy to add a new step. Cons: the workflow is implicit; to understand what happens after an order is placed, you have to grep across 5 services.

Rule of thumb: 3 or fewer steps → choreography is fine. 5+ steps or strong audit/observability needs → orchestration. The fact that you'd choose differently for a 3-step vs 8-step saga is itself a senior signal.
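Orchestration puts the whole workflow in one coordinator that calls each step in turn and, on failure, runs the compensations for the completed steps in reverse order. A minimal synchronous TypeScript sketch, assuming hypothetical step names and in-memory actions; a production orchestrator (a workflow engine, or a saga table in your DB) would also persist progress so it can resume after a crash, and would retry compensations until they succeed.

```typescript
// A saga step: an action plus the compensation that undoes it. Hypothetical shape.
interface SagaStep {
  name: string;
  action: () => void; // throws on failure
  compensate: () => void;
}

// Runs steps in order; on failure, compensates the completed steps in reverse order.
function runSaga(steps: SagaStep[]): { ok: boolean; compensated: string[] } {
  const done: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      done.push(step);
    } catch {
      const compensated: string[] = [];
      for (const finished of done.reverse()) {
        finished.compensate();
        compensated.push(finished.name);
      }
      return { ok: false, compensated };
    }
  }
  return { ok: true, compensated: [] };
}

const trail: string[] = [];
const result = runSaga([
  { name: "reserveInventory", action: () => trail.push("reserve"), compensate: () => trail.push("release") },
  {
    name: "chargePayment",
    action: () => {
      throw new Error("card declined");
    },
    compensate: () => trail.push("refund"),
  },
  { name: "createShipment", action: () => trail.push("ship"), compensate: () => trail.push("cancelShipment") },
]);
```

The payment step fails, so only the inventory reservation is undone and shipping never runs. The whole workflow reads top to bottom in one place, which is exactly what choreography gives up.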
Networks lose packets. Consumers crash mid-process. Producers retry. Out of these three innocent facts come the three hardest correctness problems in distributed systems: duplicate processing, out-of-order delivery, and partial failure. Senior engineers are the ones who name these by reflex and have a pattern ready.
Three layers, applied together:
And the meta-rule: design for partial failure from day one. The question isn't "what if this service goes down?" — it's "what does my system do while this service is down?"
Idempotency = doing the same operation N times has the same effect as doing it once. The standard recipe:
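- The client generates a unique key per operation (sent as a header, e.g. Idempotency-Key: 7f3a-…) and sends it with every request, retries included. The server stores (key → result) in a table or Redis with a TTL (24h is typical); a repeat key returns the stored result instead of re-executing.
- Prefer naturally idempotent writes: SET balance = 100 is idempotent; SET balance = balance + 50 is not. Prefer set-style operations to delta-style when you can.

Stripe's API is the canonical reference here: every POST request, including creating a charge, accepts an Idempotency-Key header.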
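A minimal sketch of the server side of that recipe in TypeScript, assuming an in-memory Map in place of the Postgres/Redis table and a hypothetical charge function. A repeat key within the TTL gets the stored result back instead of triggering a second charge.

```typescript
// In-memory stand-in for the (key -> result) table. A real system would use
// Postgres or Redis with a TTL so every server instance sees the same entries.
interface StoredResult<T> {
  result: T;
  expiresAt: number;
}

class IdempotencyStore<T> {
  private readonly entries = new Map<string, StoredResult<T>>();

  constructor(private readonly ttlMs: number) {}

  // Run `operation` once per key; repeats within the TTL get the stored result.
  execute(key: string, operation: () => T, now: number = Date.now()): T {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > now) return hit.result;
    const result = operation();
    this.entries.set(key, { result, expiresAt: now + this.ttlMs });
    return result;
  }
}

// Hypothetical charge handler: each real execution creates a new charge ID.
let chargesMade = 0;
const charge = () => ({ chargeId: `ch_${++chargesMade}` });
const store = new IdempotencyStore<{ chargeId: string }>(24 * 60 * 60 * 1000);

const first = store.execute("order-42-key", charge);
const retry = store.execute("order-42-key", charge); // network retry, same key
const other = store.execute("order-43-key", charge); // different operation
```

One thing the sketch skips: two concurrent requests with the same key can both miss the lookup. Real implementations claim the key atomically first (an INSERT against a unique constraint, or Redis SET with NX) and make the loser wait or return a conflict.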
Two strategies, used together:
- Dedup table: store each event's unique ID (or a business key such as order_id) in a "seen" table. Before processing, check whether you've seen it. After processing, insert the ID in the same transaction as your business write, so either both happen or neither.
- Idempotent writes: INSERT ... ON CONFLICT DO NOTHING, or UPDATE ... WHERE status = 'PENDING' (a no-op if already updated).

Kafka gives you "at-least-once" by default; your consumer code has to make it "effectively-once" via dedup or idempotent writes. Kafka 0.11+ also supports exactly-once semantics for Kafka-to-Kafka pipelines, but for Kafka-to-external-DB you still need consumer-side dedup.
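The dedup-table strategy can be sketched in TypeScript with one class standing in for one database, so the "seen" set and the business table change together, as they would inside a single SQL transaction. The event shape and names are hypothetical. Note that the business write here is delta-style (revenue += amount), exactly the kind of write that double-counts without dedup.

```typescript
interface PaymentEvent {
  eventId: string;
  orderId: string;
  amount: number;
}

// Stand-in for one database: the "seen" table and the business table are updated
// together, the way one SQL transaction would commit both or neither.
class LedgerDb {
  readonly seen = new Set<string>();
  readonly revenueByOrder = new Map<string, number>();

  // Returns true if the event was applied, false if it was a duplicate.
  applyOnce(e: PaymentEvent): boolean {
    if (this.seen.has(e.eventId)) return false; // duplicate delivery: no-op
    this.seen.add(e.eventId);
    this.revenueByOrder.set(e.orderId, (this.revenueByOrder.get(e.orderId) ?? 0) + e.amount);
    return true;
  }
}

const db = new LedgerDb();
const delivered: PaymentEvent[] = [
  { eventId: "evt-1", orderId: "42", amount: 500 },
  // The consumer crashed before committing its offset, so evt-1 arrives again.
  { eventId: "evt-1", orderId: "42", amount: 500 },
  { eventId: "evt-2", orderId: "43", amount: 250 },
];
const applied = delivered.map((e) => db.applyOnce(e));
```

The redelivered evt-1 is skipped, so order 42's revenue stays at 500 instead of 1000: at-least-once delivery plus dedup behaves as effectively-once.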
You can't get global ordering at scale — but you don't need it. You need per-key ordering. In Kafka: pick a partition key (e.g. user_id or order_id) and Kafka guarantees that all events with the same key land in the same partition in the order produced. Within a partition, ordering is total. Across partitions, no ordering.
So all events for order #42 are ordered with respect to each other. Events for order #43 may interleave — and that's fine, they're independent.
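The gotcha: if you ever change your partition key, or the number of partitions (the key's hash is taken modulo the partition count), events already written stay where they are while new events for the same entity can land in a different partition. For a while, two partitions hold events for the same logical entity, and consumers can process them out of order. Plan for that during migrations.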
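A sketch of per-key partitioning in TypeScript. It assumes six partitions and uses a 32-bit FNV-1a hash for brevity; Kafka's default partitioner actually hashes the key bytes with murmur2, so real partition numbers will differ. The property that matters is the same: one key, one partition, produce order preserved.

```typescript
// 32-bit FNV-1a. Kafka's default partitioner uses murmur2; FNV-1a is just short.
function fnv1a(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned
}

function partitionFor(key: string, numPartitions: number): number {
  return fnv1a(key) % numPartitions;
}

// Append events to per-partition logs, keyed by order ID.
const partitions: string[][] = Array.from({ length: 6 }, () => []);
const events: Array<[string, string]> = [
  ["order-42", "OrderCreated"],
  ["order-43", "OrderCreated"],
  ["order-42", "PaymentCaptured"],
  ["order-43", "PaymentFailed"],
  ["order-42", "OrderShipped"],
];
for (const [key, eventType] of events) {
  partitions[partitionFor(key, partitions.length)].push(`${key}:${eventType}`);
}

// Every event for order-42 sits in one partition, in the order it was produced.
const p42 = partitions[partitionFor("order-42", partitions.length)];
const order42 = p42.filter((e) => e.startsWith("order-42:"));
```

Order 43's events may or may not share that partition; either way they interleave harmlessly. And because the partition is the hash modulo the partition count, changing either the key or the count changes where new events go.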
The classic interview gotcha: a consumer processes a message, then crashes before committing its offset. The consumer's next instance (after a restart, or a rebalance to a different pod) re-reads from the last committed offset, which is before the message you just processed. So it processes that message again. This is at-least-once delivery in action.
The fix isn't to commit offsets earlier (that would risk losing messages on crash before processing — that's at-most-once, worse for most use cases). The fix is to make your processing idempotent: track processed event IDs, use ON CONFLICT DO NOTHING, design so re-processing is a no-op. Then "at-least-once + idempotent" gives you "effectively-once" without the cost of full exactly-once semantics.
"Observability" is interview-shorthand for "can you debug a system you didn't write at 2 AM?" The answer always comes back to three pillars — logs, metrics, traces — plus a good story about how you actually used them.
Four signals, monitored as SLIs (service-level indicators):
These four are Google SRE's "four golden signals" (not to be confused with Brendan Gregg's USE method: utilization, saturation, errors). If you have alerts on those, your pager gets quiet.
| Pillar | What it captures | Common tools |
|---|---|---|
| Metrics | Time-series numbers (RPS, latency, CPU) | Prometheus + Grafana, Datadog, New Relic |
| Logs | Discrete events with context | ELK (Elasticsearch + Logstash + Kibana), Loki, Datadog Logs, Splunk |
| Traces | Per-request flow across services | Jaeger, Zipkin, Tempo, Datadog APM, New Relic |
| Alerting | Notify when SLOs breach | Alertmanager, PagerDuty, Opsgenie |
In an interview, name the stack you actually used and what you'd do differently. "We were on ELK; the cost grew faster than the value at our scale and we migrated to Loki — 70% cheaper, same dashboards." That single sentence shows you've actually paid the bill.
You need all three. Metrics tell you something is wrong. Traces tell you where. Logs tell you why.
The standard funnel, in order:
The mistake juniors make: jumping to step 5 directly. Senior engineers always finish steps 1–4 first.
This is one of the most common Node.js interview scenarios. The answer is to methodically eliminate suspects. The latency budget has to come from somewhere — find which slice owns it.
- Database: run EXPLAIN ANALYZE on the query in prod. Local DB has 100 rows; prod has 100 million. Look for a missing index, a table scan, or an N+1 query.
- Event loop: use perf_hooks.monitorEventLoopDelay() and check whether the loop is blocked. A CPU-heavy synchronous step (JSON parse of a 10 MB payload, a big regex) freezes everything else in the same Node process.
- Pools: the default libuv thread pool is 4 threads. If 100 concurrent requests all need DNS or crypto, requests queue behind each other. Same for the DB connection pool: if the pool is 10 and you have 100 in-flight requests, 90 are waiting.

The reason it's fast locally: small data, no concurrency, no external network. Production multiplies all three. Always name the multiplier when you answer this.
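The pool arithmetic is worth doing out loud in the interview. A back-of-envelope helper in TypeScript, assuming every request holds its connection or thread for the same time and waiters are served in FIFO order; real pools have variable hold times, so treat the result as a rough estimate.

```typescript
// Rough worst-case wait for a pooled resource (DB connection, libuv thread).
// Assumes equal hold times and FIFO service: requests are served in "waves" of poolSize.
function worstCaseWaitMs(inFlight: number, poolSize: number, holdMs: number): number {
  const waves = Math.ceil(inFlight / poolSize);
  return (waves - 1) * holdMs; // the last wave waits for every earlier wave
}

// 100 in-flight requests, DB pool of 10, 20 ms per query:
const dbWait = worstCaseWaitMs(100, 10, 20); // 180
// 100 concurrent crypto calls on the default 4-thread libuv pool, 50 ms each:
const cryptoWait = worstCaseWaitMs(100, 4, 50); // 1200
```

So with a pool of 10, a 20 ms query becomes a ~200 ms request for the unluckiest caller, without anything about the query itself changing.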
| Suspect | Signal that points at it |
|---|---|
| Database | Trace shows DB span dominates. pg_stat_activity shows query running long. EXPLAIN reveals seq scan. |
| External API | Outbound HTTP span is the fat one. Check the provider's status page. Compare your timeout vs their p99. |
| Network | High latency + high packet loss between AZs. traceroute, ping, cloud provider's network insights. |
| CPU | Pod CPU pegged at 100%. Flame graph in perf / 0x shows one hot function. |
| Memory | Heap growth + frequent / long GC pauses. --inspect heap snapshots, leak suspect. |
| Thread pool | libuv thread pool saturated (DNS / crypto / fs queueing). Symptoms: slow even for cheap requests. |
| Event loop | monitorEventLoopDelay() > 100 ms. Synchronous CPU work in the main thread. |
OpenTelemetry (the open standard) or a vendor agent (Datadog, New Relic, Elastic APM). At the API Gateway, the tracing agent (typically the OpenTelemetry SDK with auto-instrumentation, a vendor agent like dd-trace, or a service-mesh sidecar such as Envoy/Istio) injects a trace ID (e.g. traceparent: 00-4bf92f...) into the request headers. The agent instruments your HTTP framework (monkey-patching Express in Node, bytecode instrumentation for Spring on the JVM), DB drivers, and message clients so every downstream service automatically propagates that header on its outbound calls. Each service emits "spans" (start time, end time, tags) tagged with the trace ID. The collector stitches them into one waterfall view.
Two practical rules: (1) always propagate the trace header on async paths too — when you publish to Kafka, include the trace ID in the message metadata; when the consumer picks it up, continue the trace. Otherwise your trace ends at the producer and you lose visibility into half the system. (2) Always include the trace ID in your log lines — that's how you bridge from "this trace is slow" to "here's the log message that explains why."
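Rule (1) in code: a sketch of continuing a W3C Trace Context traceparent header (version-traceId-spanId-flags) when publishing a message. The message shape and helper names are hypothetical, and Math.random stands in for a proper ID generator. In practice, the OpenTelemetry API's propagation.inject and propagation.extract do this for you; the sketch only shows what travels in the headers.

```typescript
// W3C Trace Context header: 00-<32 hex trace ID>-<16 hex span ID>-<2 hex flags>.
const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function randomHex(length: number): string {
  // Math.random keeps the sketch dependency-free; real tracers use a proper RNG.
  let out = "";
  for (let i = 0; i < length; i++) out += Math.floor(Math.random() * 16).toString(16);
  return out;
}

// Continue an incoming trace: keep the trace ID, mint a new span ID for this hop.
function childTraceparent(incoming: string | undefined): string {
  const match = incoming ? TRACEPARENT.exec(incoming) : null;
  const traceId = match ? match[1] : randomHex(32); // no valid parent: start a new trace
  const flags = match ? match[3] : "01";
  return `00-${traceId}-${randomHex(16)}-${flags}`;
}

// Hypothetical Kafka-style message: trace context rides in headers, not the payload.
interface OutboundMessage {
  headers: Record<string, string>;
  value: string;
}

function withTrace(incoming: string | undefined, value: string): OutboundMessage {
  return { headers: { traceparent: childTraceparent(incoming) }, value };
}

// A sample header as received on the HTTP request that triggered the publish.
const incoming = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01";
const msg = withTrace(incoming, JSON.stringify({ orderId: 42 }));
```

The consumer reads traceparent from the message headers and continues from there, so the trace no longer ends at the producer.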
Beyond the four golden signals, the Node-specific ones:
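- Event loop lag: perf_hooks.monitorEventLoopDelay(). Alert if p99 > 100 ms; that means user-facing requests are stalling.
- Active handles: process._getActiveHandles().length (undocumented and internal; newer Node versions expose process.getActiveResourcesInfo() instead). A leak, such as forgotten setIntervals, shows up as an ever-growing handle count.
- GC pauses: via perf_hooks or your APM.

Wire all of these to Prometheus via prom-client, or use Datadog's dd-trace agent, which exposes them out of the box.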
Four things, in order:
- Run the incident: open a channel (#inc-...), assign an Incident Commander, mitigate before root-causing (stop the bleeding first), update the status page every 15 min, then run a blameless postmortem within a week.

Tooling: Prometheus / Grafana for alerts, PagerDuty or Opsgenie for paging, Statuspage for customer comms, Notion or Confluence for runbooks and postmortems.
The one-line takeaway: mitigate first, root-cause later, and never blame the human — fix the system that let the human make the mistake.
Database questions are where interviews separate "I've used Postgres" from "I've operated Postgres." The fundamentals — SQL vs NoSQL, CAP, indexing, sharding, replication — are table stakes.
Reach for SQL (Postgres, MySQL) when: schema is stable, you need ACID transactions, you need joins, your data is highly relational (orders, line items, payments). 90% of business apps still belong here. Modern Postgres also gives you JSONB, partial indexes, listen/notify — it's no longer "the boring choice."
Reach for NoSQL when:
The senior answer: "I'd start with Postgres and only add a NoSQL store when a specific access pattern shows it's the right tool — usually search via Elasticsearch or hot key-value via Redis."
In a network Partition (some nodes can't talk to others), a distributed system has to choose: serve potentially-stale data (Available) or refuse to serve at all (Consistent). You can have CP or AP, never both in the moment of a partition. The "C" here is linearizability — every read sees the latest write.
The framing people miss: outside a partition, you can have both consistency and availability. CAP only kicks in during the failure. So the real question is "how often do you partition, and what do you want to do when it happens?" Bank transfers want CP — refuse rather than serve stale balances. Social feeds want AP — show a slightly stale timeline rather than an empty page.
Modern extension: PACELC. "If Partition, choose A or C; Else choose Latency or Consistency." Even without a partition, sync replication trades latency for stronger consistency.
Partitioning splits one logical table into multiple physical pieces on the same machine (e.g. Postgres declarative partitioning by date — January goes in one file, February in another). Helps with query planning and archival.
Sharding splits the data across multiple machines. Choose a shard key (user_id is common); a hash of the key tells you which machine owns the row. Now your one 10 TB database is ten 1 TB databases, each handling 1/10th the writes.
The pain of sharding: cross-shard joins become application-level fan-outs; cross-shard transactions need a saga; re-sharding is one of the hardest migrations in software. Consistent hashing + replication mitigates re-shard pain. Don't shard until you've exhausted vertical scaling and read replicas.
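A sketch of consistent hashing in TypeScript, assuming hypothetical shard names, 100 virtual nodes per shard, and a small non-cryptographic hash (FNV-1a followed by murmur3's fmix32 finalizer). It adds a fourth shard to the ring and compares how many keys move against re-sharding with plain hash % N.

```typescript
// 32-bit FNV-1a plus murmur3's fmix32 finalizer, so similar strings spread out.
function hash32(key: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

class HashRing {
  private readonly ring: Array<{ point: number; shard: string }> = [];

  constructor(shards: string[], private readonly vnodes = 100) {
    for (const s of shards) this.addShard(s);
  }

  // Each shard gets `vnodes` points on the ring to even out the load.
  addShard(shard: string): void {
    for (let v = 0; v < this.vnodes; v++) {
      this.ring.push({ point: hash32(`${shard}#${v}`), shard });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  // Owner = first ring point at or after the key's hash, wrapping past the end.
  shardFor(key: string): string {
    const h = hash32(key);
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].point < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo % this.ring.length].shard;
  }
}

const keys = Array.from({ length: 10_000 }, (_, i) => `user:${i}`);
const ring = new HashRing(["db-a", "db-b", "db-c"]);
const before = keys.map((k) => ring.shardFor(k));
ring.addShard("db-d");
const after = keys.map((k) => ring.shardFor(k));
const movedTo = after.filter((shard, i) => shard !== before[i]);

// Naive hash % N for comparison: going from 3 to 4 shards.
const moduloMoved = keys.filter((k) => hash32(k) % 3 !== hash32(k) % 4).length;
```

Expect roughly a quarter of keys to move on the ring, all of them to the new shard, versus roughly three quarters with modulo. That difference is the re-shard pain consistent hashing saves you.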
Vertical = bigger box. Move from 4 vCPU / 16 GB to 32 vCPU / 256 GB. Simple, no code changes, but capped at whatever the cloud offers — and 2× more CPU never gives you 2× the performance.
Why? Four hidden ceilings:
Realistic factor: doubling vCPU buys ~1.3–1.7×, not 2×. That's why vertical hits diminishing returns around the 16-vCPU mark.
Horizontal = more boxes. Add another pod, another replica, another shard. Linearly scalable in theory; in practice limited by coordination, shared resources (DB), and Conway's law.
Rule of thumb: scale vertically first (it's cheap effort), then add read replicas, then introduce caches, only then shard or split services.
An index on more than one column, in order: CREATE INDEX idx ON orders (user_id, created_at). The index is sorted first by user_id, then by created_at within each user.
It can serve queries that match a prefix: WHERE user_id = 42 (uses index), WHERE user_id = 42 AND created_at > '2024-01-01' (uses index). It can't efficiently serve WHERE created_at > '2024-01-01' alone — the leading column isn't in the predicate.
Senior tip: order columns from most-selective to least-selective, and put equality predicates before range predicates.
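The prefix rule can be written down as a tiny function. This TypeScript sketch is a simplified model, not any planner's real logic: it walks the index columns left to right, lets equality predicates extend the usable prefix, lets the first range predicate be used but stop the walk, and ignores engine-specific tricks such as skip scans.

```typescript
type Predicate = "eq" | "range";

// Which leading index columns can narrow the B-tree scan for a given WHERE clause?
function usableIndexColumns(indexCols: string[], where: Partial<Record<string, Predicate>>): string[] {
  const used: string[] = [];
  for (const col of indexCols) {
    const p = where[col];
    if (p === undefined) break; // gap in the prefix: later columns can only filter
    used.push(col);
    if (p === "range") break; // past a range, the next column is no longer sorted
  }
  return used;
}

const idx = ["user_id", "created_at"];
const a = usableIndexColumns(idx, { user_id: "eq" }); // ["user_id"]
const b = usableIndexColumns(idx, { user_id: "eq", created_at: "range" }); // ["user_id", "created_at"]
const c = usableIndexColumns(idx, { created_at: "range" }); // [] (leading column missing)

// Equality before range: WHERE status = 'PAID' AND created_at > X
const rangeFirst = usableIndexColumns(["created_at", "status"], { status: "eq", created_at: "range" }); // ["created_at"]
const eqFirst = usableIndexColumns(["status", "created_at"], { status: "eq", created_at: "range" }); // ["status", "created_at"]
```

The last two lines are the equality-before-range tip in miniature: the same predicates use one column or two depending only on column order.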
An index is a separate B-tree (mostly) that maps column-value → row-location. Without an index, the DB scans every row (O(N)). With an index, it walks the tree in O(log N) — finding 1 row in a billion takes ~30 comparisons.
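The ~30 comparisons figure is just log2 of a billion. A TypeScript sketch that counts comparisons in a plain binary search over a sorted array; a real B-tree packs hundreds of keys per page, so it reads only a handful of pages, but the logarithmic shape is the same.

```typescript
// Binary search over a sorted array, counting one comparison per probe.
function findWithCount(sorted: number[], target: number): { index: number; comparisons: number } {
  let lo = 0;
  let hi = sorted.length - 1;
  let comparisons = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    comparisons++;
    if (sorted[mid] === target) return { index: mid, comparisons };
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { index: -1, comparisons };
}

const evens = Array.from({ length: 1_000_000 }, (_, i) => i * 2);
const hit = findWithCount(evens, 777_776);
const miss = findWithCount(evens, 777_777); // odd numbers are absent
// Worst case for N rows is ceil(log2(N + 1)) probes; for a billion rows:
const billionBound = Math.ceil(Math.log2(1e9 + 1)); // 30
```

A million rows needs at most 20 probes; a full scan would need up to a million.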
Index types you'll be asked about:
- Partial: indexes only the rows that match a predicate (e.g. WHERE status = 'ACTIVE'). Smaller, faster.

Benefit: O(log N) instead of O(N) lookups. A query that took 10 seconds on 100 M rows can drop to 5 ms.
Drawbacks:
- The planner can silently skip an index: a function on the indexed column (WHERE LOWER(email) = ... won't use a plain index on email), or low selectivity. Always check with EXPLAIN ANALYZE.

Normalization: split data into many tables, each fact stored once. Goes up to 3NF/BCNF. Pros: no update anomalies, smaller storage. Cons: every read needs joins, and joins are expensive at scale.
Denormalization — duplicate data on purpose for read speed. The orders table stores customer_name even though it's also in customers. Pros: fewer joins, blazing reads. Cons: when a customer renames themselves, you have to update both — and you can drift.
OLTP (transactional) tends normalized; OLAP / analytics / wide reads tend denormalized (star schema, materialized views). Realistic systems sit somewhere in between — normalize first, denormalize where reads demand it.
The checklist, in order:
- Run EXPLAIN ANALYZE. Identify seq scans on large tables, expensive joins, sort spills.
- Collapse per-row lookups into IN (...) or a join.

You fetch a list of 100 orders. Then, for each order, you fetch its customer: 100 separate queries. That's 1 (the list) + N (one per row) = N+1 queries, when it could have been 1 (a join) or 2 (the list + one batched IN (...)).
ORMs love to do this silently — Hibernate, Sequelize, ActiveRecord. The fix is either eager loading (JOIN FETCH in JPA, include in Sequelize) or a batched WHERE customer_id IN (...) after the first query. Always look at the actual SQL the ORM emits — many "slow API" tickets are a hidden N+1.
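A sketch that makes the query count visible, in TypeScript, with a hypothetical FakeDb that counts round-trips instead of a real ORM. Both versions return the same data; only the number of trips to the database differs.

```typescript
interface Order {
  id: number;
  customerId: number;
}
interface Customer {
  id: number;
  name: string;
}

// Hypothetical data-access layer that counts round-trips to the database.
class FakeDb {
  queries = 0;
  constructor(private readonly orders: Order[], private readonly customers: Customer[]) {}

  listOrders(): Order[] {
    this.queries++;
    return this.orders;
  }

  customerById(id: number): Customer | undefined {
    this.queries++;
    return this.customers.find((c) => c.id === id);
  }

  // Equivalent of SELECT ... WHERE id IN (...): one round-trip for many IDs.
  customersByIds(ids: number[]): Customer[] {
    this.queries++;
    const wanted = new Set(ids);
    return this.customers.filter((c) => wanted.has(c.id));
  }
}

const customers = Array.from({ length: 10 }, (_, i) => ({ id: i, name: `customer-${i}` }));
const orders = Array.from({ length: 100 }, (_, i) => ({ id: i, customerId: i % 10 }));

// N+1: one query for the list, then one per order.
const naiveDb = new FakeDb(orders, customers);
const naive = naiveDb.listOrders().map((o) => naiveDb.customerById(o.customerId)?.name);

// Batched: one query for the list, one IN (...) query for the distinct customers.
const batchedDb = new FakeDb(orders, customers);
const list = batchedDb.listOrders();
const distinctIds = Array.from(new Set(list.map((o) => o.customerId)));
const byId = new Map(batchedDb.customersByIds(distinctIds).map((c) => [c.id, c] as const));
const batched = list.map((o) => byId.get(o.customerId)?.name);
```

101 round-trips versus 2 for identical output; with real network latency per query, that gap is the whole "slow API" ticket.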
Replication = keep one or more standby copies of the primary, kept up to date by streaming the WAL (write-ahead log) or change events. Modes:
Failover = when the primary dies, promote a replica to be the new primary. Tools: Patroni for Postgres, Sentinel for Redis; managed services (RDS, Aurora) do it for you. The interview gotcha: failover is rarely seamless and never zero-downtime; there's a 10–60 s window where the new primary is being chosen and clients are reconnecting.
Primary handles all writes (and reads that need strong consistency). Replicas handle reads that tolerate seconds-stale data — dashboards, reports, analytics, list views. A typical web app has 80%+ reads, so adding 2–3 replicas can lift read capacity 3–4× without changing the schema.
The trap: replica lag. A user creates an order on the primary, then refreshes the page — the read goes to a replica that hasn't seen the write yet. The user sees "no orders found." Either route read-after-write to the primary, or use a session-pinned reader, or design the UI to tolerate "your order is being processed."
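One way to route read-after-write to the primary, sketched in TypeScript: remember when each user last wrote, and send that user's reads to the primary for a window comfortably longer than your typical replica lag. The class, the 5-second window, and the replica names are assumptions; across multiple app instances this timestamp would live in the session or a cookie rather than in process memory.

```typescript
// Hypothetical read router giving each user read-your-writes consistency.
class ReadRouter {
  private readonly lastWriteAt = new Map<string, number>();

  constructor(
    private readonly pinMs: number,
    private readonly replicas: string[],
  ) {}

  recordWrite(userId: string, now: number): void {
    this.lastWriteAt.set(userId, now);
  }

  targetForRead(userId: string, now: number): string {
    const wroteAt = this.lastWriteAt.get(userId);
    if (wroteAt !== undefined && now - wroteAt < this.pinMs) return "primary";
    // Everyone else: pick a replica deterministically from the user ID.
    let sum = 0;
    for (let i = 0; i < userId.length; i++) sum += userId.charCodeAt(i);
    return this.replicas[sum % this.replicas.length];
  }
}

const router = new ReadRouter(5_000, ["replica-1", "replica-2"]);
router.recordWrite("u42", 1_000);
const justAfterWrite = router.targetForRead("u42", 2_000); // "primary"
const later = router.targetForRead("u42", 7_000); // a replica: the window has passed
const otherUser = router.targetForRead("u7", 2_000); // a replica: u7 hasn't written
```

The user who just created an order sees it immediately; everyone else keeps hitting replicas, so the primary only absorbs a small slice of reads.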
Six patterns, covered in depth on the Caching Strategies deep-dive. The two-line summary:
Phil Karlton: "There are only two hard things in computer science — cache invalidation and naming things." The four patterns:
- Event-driven invalidation: on write, publish a UserUpdated event; caches subscribe and evict. Good for multi-cache fleets.
- Versioned keys: user:42:v3 instead of user:42. Old data ages out naturally.

Way more than just "cache." The classic eight:

- Rate limiting: INCR + EXPIRE, or a token bucket via a Lua script.
- Leaderboards via sorted sets (ZADD, ZRANGE).

| Method | Intent | Idempotent? | Safe? |
|---|---|---|---|
| GET | Retrieve a resource | Yes | Yes (no side effects) |
| POST | Create / submit / process | No | No |
| PUT | Full replacement of a resource | Yes | No |
| PATCH | Partial update | Not guaranteed (depends on the patch semantics) | No |
| DELETE | Remove a resource | Yes | No |
Idempotent = N identical requests = same effect as 1 request. Safe = no server state change. The reason these matter: idempotent methods are safe to retry, browsers/proxies can cache safe methods, and POST is the one that needs an idempotency key if you want safe retries.
- Nouns, not verbs: POST /orders, not POST /createOrder.
- Plural collections: /users, /orders.
- Shallow nesting: /users/42/orders is fine; /users/42/orders/7/items/3/comments is not. Flatten.
- Paginate: ?limit=&cursor=. Never return unbounded lists.
- Filter and sort with query params: ?status=active&sort=-created_at.

REST exposes resources; GraphQL exposes a single endpoint with a query language. With REST, a complex screen often needs 4-5 round-trips and over-fetches each time. With GraphQL, the client asks for exactly the fields it needs in one request.
Pick REST when: you have many lightweight clients, you want HTTP caching to "just work", you have public APIs (REST is more discoverable for outside developers). Pick GraphQL when: you have one mobile + one web client with rich screens, the over-fetch tax is real, and you can invest in a schema + dataloader infrastructure to avoid N+1s under the hood. Don't pick GraphQL because it's trendy — operationally it's heavier (caching, rate-limiting, query-cost analysis are all new problems).
APIs where calling them N times has the same effect as calling them once. GET, PUT, DELETE are idempotent by HTTP spec. POST is not — but you can make a specific POST idempotent by accepting an Idempotency-Key header: the server stores (key → result), so a retry with the same key returns the original result without re-executing.
This is essential for any operation with side effects under unreliable networks — payments, charges, signups. Stripe, PayPal, Razorpay all implement this.
Three options:
- URL path: /v1/orders, /v2/orders. Most explicit, easiest for clients to grep for in logs. The standard choice.
- Header: Accept: application/vnd.acme.v2+json. Keeps URLs clean but harder to debug and test from a browser.
- Query parameter: /orders?version=2. Easy to forget; not recommended.

Versioning policy matters more than the mechanism: support at least N-1, deprecate with a deprecation header and a 6-month sunset date, and monitor usage of old versions so you can actually retire them.
Defense in depth — every one of these matters:
- CORS: never allow the wildcard origin, *, when credentials are involved.

Four common algorithms:
Implement in Redis with INCR + EXPIRE for fixed window, or a Lua script for token bucket. Apply at the gateway, not inside each service.
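The token bucket is small enough to sketch. This TypeScript version keeps state in memory for a single client and takes the clock as a parameter so it is easy to test; the capacity and refill rate are example values. A Redis version would keep the same two fields (tokens, last refill time) per client and update them atomically in a Lua script so every gateway node shares them.

```typescript
// In-memory token bucket for one client: bursts up to `capacity`,
// sustained rate of `refillPerSec` requests per second.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(private readonly capacity: number, private readonly refillPerSec: number, now: number) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Refill based on elapsed time, then try to spend one token.
  tryRemove(now: number): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Burst of 5, sustained 2 requests/second.
const bucket = new TokenBucket(5, 2, 0);
const burst = Array.from({ length: 6 }, () => bucket.tryRemove(0)); // 5 allowed, 6th rejected
const afterHalfSecond = bucket.tryRemove(500); // 0.5 s × 2/s = 1 token refilled
```

Unlike a fixed window, the bucket allows a short burst and then settles at the refill rate, with no double-rate spike at window boundaries.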
Authentication (AuthN) — "who are you?" Verifying identity via password, OTP, OAuth, SSO. Result: a verified user identity.
Authorization (AuthZ) — "what can you do?" Verifying permissions for an action. Result: allow or deny.
You always do AuthN once (login), then AuthZ on every request (RBAC, ABAC, or ACL checks). Mixing them up — "this user is logged in so they can do anything" — is a privilege escalation bug waiting to happen.
| | Session cookie | JWT |
|---|---|---|
| Where state lives | Server (session store) | In the token, signed |
| Server lookup per request | Yes (Redis hit) | No (just verify signature) |
| Revocation | Easy — delete session | Hard — until token expires |
| Cross-domain | Cookies are limited | Easy via Authorization header |
| Size | Small (cookie = ID) | Larger (claims + signature) |
| Best for | Web app, single domain | Mobile + multiple services, stateless backends |
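The senior nuance: JWT's "no DB lookup" benefit is real, but so is its revocation problem. Production systems usually combine a short-lived JWT (15 min) with a long-lived refresh token stored in the DB. To revoke, you delete the refresh token: the user keeps access for at most the JWT's remaining lifetime and then cannot get a new one. You get stateless verification on every request and revocation within minutes.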
The gateway is the single front door — every external request goes through it before reaching a service. It does:
Common implementations: Kong, AWS API Gateway, Envoy / Istio ingress, Nginx Plus, Apigee. Without one, every microservice has to re-implement all of the above.
Node.js questions test whether you understand the event loop well enough to know why things go slow. "It's single-threaded" is the headline; the substance is everything that flows from it.
Three pieces wired together:
The model: one thread runs your JS, and that thread never blocks on I/O. When you call fs.readFile, Node hands the actual disk read to libuv's thread pool (or kernel async I/O), registers a callback, and goes back to processing other events. When the read finishes, libuv pushes a "callback ready" event onto the loop; the next loop tick runs your callback.
The loop has six phases, executed in order, forever:
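- timers: setTimeout / setInterval callbacks whose time has elapsed.
- pending callbacks: I/O callbacks deferred to the next loop iteration (e.g. some TCP errors).
- idle, prepare: internal use only.
- poll: fetch new I/O events and run their callbacks; the loop may wait here for I/O.
- check: setImmediate callbacks.
- close callbacks: socket.on('close', ...) etc.

After each callback (since Node 11; only between phases in older versions), Node drains the microtask queues: process.nextTick callbacks first, then resolved promises (.then). That's why recursively scheduling process.nextTick can starve the event loop entirely.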
The single most useful sentence: "The event loop processes one callback at a time on one thread. If your callback takes 500 ms of CPU, every other request is paused for 500 ms."
By not blocking. One thread can handle thousands of in-flight connections because none of them are holding the thread — they're all parked waiting for I/O at the OS level. When data arrives for any of them, the OS notifies libuv, libuv puts the callback on the loop, and the single JS thread runs it.
For CPU work, you need to step outside the main thread: worker_threads (multiple V8 isolates in one process, shared memory via SharedArrayBuffer) or cluster (multiple Node processes sharing a port).
- CPU-heavy synchronous calls, e.g. bcrypt.hashSync.
- Synchronous APIs: fs.readFileSync, sync crypto, or a long synchronous loop.
- Memory leaks that drive long GC pauses: forgotten setIntervals, growing global caches, listeners that aren't removed.

- Watch process.memoryUsage().heapUsed over time. If it climbs and never drops after GC, you have a leak.
- Heap snapshots: --inspect in Chrome DevTools, or v8.writeHeapSnapshot(). Take three snapshots over time and compare "Objects allocated between snapshots"; the type that keeps growing is the leak.
- Usual suspects: event listeners that are never removed (watch for MaxListenersExceededWarning), closures pinning big objects, a Map or global cache that never evicts, unclosed DB connections, accidental retention via Promises that never settle.

- Profile with --prof or a continuous profiler (e.g. Datadog's).
- Raise UV_THREADPOOL_SIZE (e.g. to 16) if you're doing lots of DNS / crypto / fs.

Cluster mode is built into Node via the cluster module (or, more commonly today, PM2 / Kubernetes replicas). The primary process (called "master" in older Node versions) forks N workers, each a full Node process with its own V8 isolate. They share a port: by default (on every platform except Windows) the primary accepts connections and hands them to workers round-robin; the alternative mode lets workers accept directly on the shared socket and leaves the distribution to the OS.
Why bother: a single Node process runs your JavaScript on one core. On an 8-core box, cluster mode (or 8 pods) gives you up to roughly 8× the throughput for CPU-bound work. For pure I/O-bound work the speedup is smaller but still real.
| | worker_threads | child_process |
|---|---|---|
| Memory | Per-thread heaps; shareable via SharedArrayBuffer | Separate |
| Startup cost | ~10 ms (thread) | ~30–100 ms (process) |
| IPC | postMessage, structured clone | stdio / IPC channel, JSON |
| Isolation | Same process: a crash can take both down | Separate: isolated failure |
| Best for | CPU-heavy JS tasks | Running other binaries (ffmpeg, python) |
The funnel:
- Run EXPLAIN ANALYZE on every query in the hot path.
- When a handler makes independent calls, Promise.all them instead of awaiting them sequentially.

Three escape hatches, in order of severity:
- Split the work into chunks and yield with setImmediate between chunks, so the event loop can process other requests in between.

The cardinal sin: doing CPU work on the main thread of a Node server. One slow request poisons the well for everyone else on that pod.
Middleware is a function with signature (req, res, next) => { ... }. They run in the order registered (app.use). Each one either:
- Reads or modifies req/res and calls next() to pass control to the next middleware.
- Sends a response (res.send(), res.json()) and ends the chain.
- Calls next(err) to skip to the error-handling middleware (any function declared with 4 parameters: (err, req, res, next)).

Typical stack: helmet → cors → body-parser → request logger → auth → rate-limit → route handlers → error handler (last). Errors thrown inside async handlers need express-async-handler or Express 5 to be caught. Express 4 doesn't catch async throws (rejected promises) automatically.
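To make the dispatch rules concrete, here is a toy model of the chain. It is not Express's real implementation. A hypothetical `runChain` walks the layers in order, advances on `next()`, and on `next(err)` skips ahead to the first function declaring four parameters. That arity check (`fn.length === 4`) is the same signal Express uses to recognize error handlers.

```typescript
// A toy model of Express-style dispatch (not Express's actual implementation).
type Req = { user?: string; log: string[] };
type Res = { status: number; body?: string };
type Next = (err?: Error) => void;
type Middleware = (req: Req, res: Res, next: Next) => void;
type ErrorHandler = (err: Error, req: Req, res: Res, next: Next) => void;

function runChain(layers: Array<Middleware | ErrorHandler>, req: Req, res: Res): void {
  let i = 0;
  const next: Next = (err) => {
    while (i < layers.length) {
      const layer = layers[i++];
      // Any function declaring 4 parameters is treated as an error handler.
      const isErrorHandler = layer.length === 4;
      try {
        if (err && isErrorHandler) {
          (layer as ErrorHandler)(err, req, res, next);
          return;
        }
        if (!err && !isErrorHandler) {
          (layer as Middleware)(req, res, next);
          return;
        }
        // Otherwise skip: normal middleware while an error is pending,
        // or error handlers on the happy path.
      } catch (thrown) {
        // A synchronous throw is routed to the error handlers.
        next(thrown instanceof Error ? thrown : new Error(String(thrown)));
        return;
      }
    }
  };
  next();
}

const logger: Middleware = (req, _res, next) => {
  req.log.push("logger");
  next();
};
const auth: Middleware = (req, _res, next) => {
  req.log.push("auth");
  if (!req.user) {
    next(new Error("unauthenticated")); // skip straight to the error handler
    return;
  }
  next();
};
const handler: Middleware = (req, res) => {
  req.log.push("handler");
  res.body = "ok";
};
// The unused _next matters: declaring 4 parameters is what marks an error handler.
const errorHandler: ErrorHandler = (err, req, res, _next) => {
  req.log.push("errorHandler");
  res.status = 401;
  res.body = err.message;
};

const stack = [logger, auth, handler, errorHandler];
const anon: Req = { log: [] };
const anonRes: Res = { status: 200 };
runChain(stack, anon, anonRes);
console.log(anon.log.join(" -> "), anonRes.status); // logger -> auth -> errorHandler 401
```

Note that dropping `_next` from `errorHandler` would silently turn it into ordinary middleware. This is the same trap that catches people in real Express apps.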
See CAP above. The practical version: when a part of your system is misbehaving (slow, unreachable, lagging), you can either fail the request (preserve consistency — the user sees an error) or serve possibly-stale data (preserve availability — the user sees something, even if outdated). Bank transfers favor consistency; social timelines, search, recommendations favor availability.
Strong / linearizable consistency — every read sees the latest write. Required for things like balances and inventory ("don't sell the last item twice").
Eventual consistency — reads might see stale data for a brief window, but if writes stop, all replicas converge. Required for global scale. Most NoSQL stores, multi-region setups, and async pipelines are eventually consistent.
The hidden third tier: read-your-writes consistency — your own writes are immediately visible to you, even if not to others. Often achieved by routing post-write reads to the primary for the next few seconds.
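Here is a minimal sketch of the "route to the primary for a few seconds after a write" tactic. The `ReadYourWritesRouter` name, the 5-second window, and the in-memory map are illustrative assumptions. In a real fleet the last-write timestamp would usually travel in the session or a cookie so that every pod sees it. Time is passed in explicitly to keep the example deterministic.

```typescript
type Target = "primary" | "replica";

// Pins a user's reads to the primary for a short window after they write,
// so they see their own change even while replicas are lagging.
class ReadYourWritesRouter {
  private readonly lastWriteAt = new Map<string, number>();
  private readonly pinMs: number;

  constructor(pinMs: number) {
    this.pinMs = pinMs;
  }

  recordWrite(userId: string, nowMs: number): void {
    this.lastWriteAt.set(userId, nowMs);
  }

  routeRead(userId: string, nowMs: number): Target {
    const wroteAt = this.lastWriteAt.get(userId);
    if (wroteAt === undefined) return "replica";
    if (nowMs - wroteAt < this.pinMs) return "primary";
    this.lastWriteAt.delete(userId); // window over: forget the entry
    return "replica";
  }
}

const router = new ReadYourWritesRouter(5_000);
router.recordWrite("sarah", 1_000);
console.log(router.routeRead("sarah", 2_000)); // primary (her own fresh write)
console.log(router.routeRead("bob", 2_000)); // replica (bob hasn't written)
console.log(router.routeRead("sarah", 7_000)); // replica (the window has passed)
```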
A way for multiple processes / pods to coordinate on a shared resource. "Only one worker may process job 42 at a time." Typical implementations:
- Redis: SET lock:job42 worker-7 NX EX 30 succeeds only if no one holds the lock. EX 30 gives an automatic 30-second expiry in case the holder crashes.
- A database row lock: SELECT ... FOR UPDATE. Simple if you already have a DB.

The gotcha: a lock without an expiry is a deadlock generator. Always set a TTL. But any TTL-based lock can fail catastrophically if your worker hangs (a long GC pause, say) past the TTL. A second worker then acquires the lock and you have two writers. Truly correct distributed locking needs fencing tokens. A fencing token is a number that increases with every lock grant. The storage layer checks it and rejects writes from a holder whose token is stale.
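The sketch below simulates both halves of that argument in memory. The lock follows SET NX EX-style semantics: acquire only if the key is free or expired, with a TTL. It also hands out a monotonically increasing fencing token. The storage layer rejects writes carrying a token older than the newest one it has accepted. `LockService` and `FencedStore` are hypothetical names. With Redis, one option for the counter is a separate INCR key. Either way, the check has to be enforced by the protected resource itself, for example with a conditional UPDATE.

```typescript
type Grant = { owner: string; token: number };

// In-memory stand-in for a TTL lock (the semantics of SET key owner NX EX ttl),
// extended with a fencing token that increases on every successful grant.
class LockService {
  private holder: { owner: string; expiresAt: number } | null = null;
  private nextToken = 1;

  acquire(owner: string, ttlMs: number, nowMs: number): Grant | null {
    if (this.holder && this.holder.expiresAt > nowMs) return null; // still held
    this.holder = { owner, expiresAt: nowMs + ttlMs };
    return { owner, token: this.nextToken++ };
  }
}

// The protected resource remembers the highest token it has accepted and
// refuses anything older: that is what stops a paused, expired holder.
class FencedStore {
  private highestToken = 0;
  readonly writes: string[] = [];

  write(grant: Grant, value: string): boolean {
    if (grant.token < this.highestToken) return false; // stale holder
    this.highestToken = grant.token;
    this.writes.push(`${grant.owner}:${value}`);
    return true;
  }
}

const locks = new LockService();
const store = new FencedStore();

const first = locks.acquire("worker-7", 30_000, 0)!; // token 1
// worker-7 stalls past its 30 s TTL, so the lock expires underneath it...
const second = locks.acquire("worker-9", 30_000, 31_000)!; // token 2
console.log(store.write(second, "job42 done")); // true
console.log(store.write(first, "job42 done")); // false: token 1 is stale
```

Lock release, which should only delete the key if the caller still owns it, is left out to keep the focus on fencing.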
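Wrap calls to a flaky downstream. The breaker has three states:
- Closed: calls flow through normally and failures are counted.
- Open: once failures cross a threshold, calls fail fast without touching the downstream for a cool-down period.
- Half-open: after the cool-down, trial calls go through. A success closes the breaker; a failure re-opens it.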
Why: a downstream stuck at 30-second timeouts means every call ties up a thread / connection for 30s. With 100 RPS and a 30s timeout, you'd have 3000 in-flight calls within 30s — your service falls over. The breaker stops that cascade.
Libraries: resilience4j (Java), Hystrix (deprecated), Polly (.NET), opossum (Node).
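Here is a minimal hand-rolled version of the three-state machine. It illustrates the transitions and does not reproduce any library's API; opossum and resilience4j offer richer configuration. Assumptions: the breaker trips after `failureThreshold` consecutive failures and stays open for `openMs`. Calls are synchronous and time is passed in explicitly, purely to keep the example deterministic.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly openMs: number;

  constructor(failureThreshold: number, openMs: number) {
    this.failureThreshold = failureThreshold;
    this.openMs = openMs;
  }

  // Runs fn through the breaker; while open it throws immediately (fails fast).
  call<T>(fn: () => T, nowMs: number): T {
    if (this.state === "open") {
      if (nowMs - this.openedAt < this.openMs) throw new Error("circuit open: failing fast");
      this.state = "half-open"; // cool-down over: let a trial call through
    }
    try {
      const result = fn();
      this.state = "closed"; // any success, including the trial, closes it
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures++;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = nowMs;
      }
      throw err;
    }
  }
}

const breaker = new CircuitBreaker(3, 10_000);
const flaky = (): string => {
  throw new Error("downstream timeout");
};
const healthy = (): string => "ok";

for (let t = 0; t < 3; t++) {
  try {
    breaker.call(flaky, t);
  } catch {
    // counted as a failure by the breaker
  }
}
console.log(breaker.state); // open
```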
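Retry only transient failures: 5xx, timeouts, network errors. Don't retry 4xx (the request is wrong; retrying won't fix it). The common exceptions are 429 Too Many Requests and 408 Request Timeout, which are worth retrying after a delay. Always retry with:
- Exponential backoff with jitter, so a fleet of clients doesn't retry in lockstep.
- A cap on attempts.
- Idempotency: if a retried request hits POST /charge, the server must not double-charge.

A dead-letter queue (DLQ) is a separate queue where messages that have failed processing too many times get parked. After 5 retries, instead of looping forever, the consumer puts the message on the DLQ along with the failure reason. A human (or a script) inspects the DLQ, fixes the root cause (bug, schema mismatch, malformed payload), and replays the messages.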
Without a DLQ, a single poison message can block its partition forever — every retry fails, the offset never advances, and the queue lag grows unboundedly.
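The sketch below combines the retry rules and the DLQ into one consumer-side loop. Assumptions to flag:
- `processWithRetry`, the `isTransient` predicate, and the in-memory `dlq` array are hypothetical stand-ins.
- Delays use "full jitter" (a random wait between 0 and the exponential ceiling), one common choice among several.
- The planned delays are returned rather than actually slept, to keep the example synchronous and testable.

```typescript
type Message = { id: string; payload: string };
type DeadLetter = { message: Message; reason: string; attempts: number };

// Exponential backoff with "full jitter": a random wait in [0, min(cap, base * 2^n)).
function backoffMs(n: number, baseMs: number, capMs: number, random: () => number): number {
  return Math.floor(random() * Math.min(capMs, baseMs * 2 ** n));
}

// Tries a message up to maxAttempts times. Non-transient errors and exhausted
// retries both park the message on the DLQ together with the failure reason.
// Returns the delays a real consumer would sleep between attempts.
function processWithRetry(
  message: Message,
  handle: (m: Message) => void,
  isTransient: (err: Error) => boolean,
  dlq: DeadLetter[],
  maxAttempts = 5,
  random: () => number = Math.random,
): number[] {
  const waits: number[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handle(message);
      return waits; // success
    } catch (e) {
      const err = e instanceof Error ? e : new Error(String(e));
      if (!isTransient(err) || attempt === maxAttempts) {
        dlq.push({ message, reason: err.message, attempts: attempt });
        return waits;
      }
      waits.push(backoffMs(attempt - 1, 100, 10_000, random));
    }
  }
  return waits;
}

const dlq: DeadLetter[] = [];
const isTimeout = (err: Error): boolean => err.message.includes("timeout");
const waits = processWithRetry(
  { id: "m-1", payload: "charge order-42" },
  () => {
    throw new Error("downstream timeout"); // a transient-looking error that never clears
  },
  isTimeout,
  dlq,
  5,
  () => 0.5, // fixed "randomness" so the demo is reproducible
);
console.log(waits); // [ 50, 100, 200, 400 ]
console.log(dlq[0].attempts, dlq[0].reason); // 5 downstream timeout
```

A non-transient failure (a 400-style error) skips the retries entirely and goes to the DLQ on the first attempt. This is the "never retry 4xx" rule applied at the consumer.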
Borrowed from shipbuilding — a ship's hull is divided into watertight compartments so one breach doesn't sink the ship. In software: isolate calls to different downstreams in separate thread pools / connection pools so one slow downstream can't exhaust the pool and starve calls to the others.
Example: in a service that calls Payment and Notifications, give them separate connection pools of 20 each. If Payment hangs and saturates its 20 connections, Notifications still has 20 free. Without bulkheads, a shared pool of 40 gets exhausted by Payment, Notifications calls fail too, and the failure cascades.
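Here is a minimal sketch of those compartments as counting semaphores, one per downstream. `Bulkhead` is a hypothetical helper that only tracks permits. A real one would wrap async calls and release in a `finally`. The pool size of 20 comes from the example above.

```typescript
// One fixed-size compartment of concurrent calls to a single downstream.
class Bulkhead {
  private inUse = 0;
  readonly name: string;
  readonly size: number;

  constructor(name: string, size: number) {
    this.name = name;
    this.size = size;
  }

  // Take a slot, or refuse immediately so callers fail fast instead of queueing.
  tryAcquire(): boolean {
    if (this.inUse >= this.size) return false;
    this.inUse++;
    return true;
  }

  release(): void {
    if (this.inUse > 0) this.inUse--;
  }

  get available(): number {
    return this.size - this.inUse;
  }
}

const pools = {
  payment: new Bulkhead("payment", 20),
  notifications: new Bulkhead("notifications", 20),
};

// Payment hangs: 25 calls arrive and none of them ever finish or release.
let paymentRejected = 0;
for (let i = 0; i < 25; i++) {
  if (!pools.payment.tryAcquire()) paymentRejected++;
}

console.log(pools.payment.available, paymentRejected); // 0 5
console.log(pools.notifications.tryAcquire()); // true: its compartment is untouched
```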
Both already covered above. The one-liner difference: token bucket allows bursts (you accumulate tokens up to a cap); leaky bucket doesn't (output is at a strictly fixed rate). Token bucket is what most public APIs use; leaky bucket is more common in network gear / traffic shaping.
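For reference, here is a minimal token bucket with time passed in explicitly. The class and its parameters are illustrative and do not reflect any particular gateway's API. The burst allowance is exactly `capacity`: an idle client accumulates up to that many tokens and can spend them all at once. A leaky bucket, draining at a fixed rate, does not allow that.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full
    this.lastRefillMs = nowMs;
  }

  // Returns true if the request may proceed, consuming one token.
  tryTake(nowMs: number): boolean {
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    // Refill lazily, but never beyond the cap: the cap is the burst size.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// 10-request burst, 5 requests/second sustained.
const bucket = new TokenBucket(10, 5, 0);
let allowed = 0;
for (let i = 0; i < 15; i++) if (bucket.tryTake(0)) allowed++;
console.log(allowed); // 10: the burst goes through, the rest are rejected
console.log(bucket.tryTake(1_000)); // true: one second later, 5 tokens have refilled
```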
The rule: every outbound call has a timeout. Default HTTP clients in many languages have no timeout. A single hanging call can hold a connection forever and pin a thread.
Set timeouts at multiple layers: connect timeout (1–3s), read timeout (depends on the endpoint, but 5–30s is typical), and an overall request budget. When a timeout fires, decide whether to retry (idempotent + transient) or fail-fast. Surface the timeout as a 504 to the upstream caller, with a clear log line that you can trace later.
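One generic way to enforce "every outbound call has a timeout" is to race the call against a timer. `TimeoutError` and `withTimeout` are hypothetical helpers. Racing only stops the caller from waiting; it does not cancel the underlying work. For HTTP you would also pass an abort signal, which fetch accepts and which `AbortSignal.timeout(ms)` can build for you.

```typescript
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

// Rejects with TimeoutError if `work` hasn't settled within `ms`.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

const slowDownstream = new Promise<string>((resolve) => setTimeout(() => resolve("late"), 200));
withTimeout(slowDownstream, 50).then(
  (value) => console.log("got", value),
  (err: Error) => console.log(err.name, err.message), // TimeoutError timed out after 50 ms
);
```

Whether to retry after a `TimeoutError` or fail fast with a 504 is the caller's decision, as described above.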
| | RabbitMQ / SQS | Kafka |
|---|---|---|
| Model | Queue: message consumed once and gone | Log: messages persist, consumers track offsets |
| Replay | No (once consumed, lost) | Yes: rewind to any offset |
| Multiple consumers of the same message | Via exchanges / fan-out | Built-in via consumer groups |
| Throughput | Tens of thousands / sec | Millions / sec |
| Ordering | Per queue (limited) | Per partition (strong) |
| Best for | Task queues, work distribution | Event streams, audit logs, replayable pipelines |
Quick picker: Kafka when you want event sourcing, multiple downstream consumers, or replay. RabbitMQ / SQS when you want simple work distribution to a worker fleet.
A Kafka topic is split into partitions — each is an append-only log on disk. Messages with the same key always land in the same partition (so per-key ordering is preserved).
A consumer group is a set of consumers cooperating to read a topic. Kafka assigns each partition to exactly one consumer in the group. If you have 6 partitions and 3 consumers, each consumer gets 2 partitions. Add a 4th consumer → Kafka rebalances. Hit 7 consumers → one sits idle because partitions cap the parallelism.
Three takeaways:
- Partition count is your concurrency ceiling per consumer group. Plan it generously up front. You can add partitions later, but that changes the key-to-partition mapping. New messages for an existing key can then land in a different partition from its older ones, so per-key ordering breaks across the change.
- Different consumer groups can read the same topic independently (each tracks its own offsets).
- During a rebalance, processing pauses, so keep partition reassignment infrequent.
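Here is a toy model of the two mechanisms above: a key hashed to a partition, and partitions dealt out to the consumers of one group. Assumptions: the string hash is a stand-in, since Kafka's default partitioner uses murmur2 on the serialized key. The round-robin dealing mimics the spirit of Kafka's assignors rather than any specific one.

```typescript
// Stand-in key hash (32-bit FNV-1a). Kafka's default partitioner uses murmur2;
// only the "same key -> same partition" property matters here.
function partitionFor(key: string, partitions: number): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h % partitions;
}

// Deal partitions round-robin to the consumers of one group. Each partition has
// exactly one owner; consumers beyond the partition count get nothing.
function assign(partitions: number, consumers: string[]): Map<string, number[]> {
  const plan = new Map<string, number[]>(consumers.map((c): [string, number[]] => [c, []]));
  for (let p = 0; p < partitions; p++) {
    plan.get(consumers[p % consumers.length])!.push(p);
  }
  return plan;
}

console.log(partitionFor("order-42", 6) === partitionFor("order-42", 6)); // true

for (const [consumer, owned] of assign(6, ["c1", "c2", "c3"])) {
  console.log(consumer, owned); // c1 [ 0, 3 ], then c2 [ 1, 4 ], then c3 [ 2, 5 ]
}

const sevenConsumers = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"];
const idle = [...assign(6, sevenConsumers)].filter(([, owned]) => owned.length === 0).map(([c]) => c);
console.log(idle); // [ 'c7' ]
```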
Compose the patterns: redundancy at every layer (multi-AZ, multi-replica), health checks + automatic failover, timeouts + retries with backoff, circuit breakers, bulkheads, idempotent operations, DLQs, graceful degradation (fall back to a cached or simpler response when a downstream is down). And the meta-rule: chaos-test it. If you've never simulated the failure, you haven't tested fault tolerance.
HA = the system stays available most of the time (99.9%+), usually via redundancy and quick failover. There may be a few seconds of disruption during failover.
Fault tolerance = the system continues operating with no perceptible disruption when something fails. Stronger guarantee, more expensive. Active-active replication, no-downtime failover.
For most business systems, HA is enough; fault tolerance is reserved for the truly mission-critical (trading, life-safety, telecom switches).
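It helps to make "99.9%+" concrete by converting an availability target into a downtime budget. Here is a small helper, assuming a 365-day year and 12 equal months.

```typescript
// Allowed downtime for an availability target, over a 365-day year.
function downtimeBudget(availability: number): { hoursPerYear: number; minutesPerMonth: number } {
  const unavailable = 1 - availability;
  return {
    hoursPerYear: unavailable * 365 * 24,
    minutesPerMonth: (unavailable * 365 * 24 * 60) / 12,
  };
}

for (const target of [0.99, 0.999, 0.9999]) {
  const { hoursPerYear, minutesPerMonth } = downtimeBudget(target);
  console.log(`${(target * 100).toFixed(2)}%: ${hoursPerYear.toFixed(2)} h/year, ${minutesPerMonth.toFixed(1)} min/month`);
}
// 99.00%: 87.60 h/year, 438.0 min/month
// 99.90%: 8.76 h/year, 43.8 min/month
// 99.99%: 0.88 h/year, 4.4 min/month
```

Each extra nine cuts the budget tenfold. At 99.99%, a single slow failover can consume most of a month's allowance.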
Blue-green — two identical production environments, "blue" (live) and "green" (idle). Deploy to green, smoke-test, then flip the load balancer to point at green. Rollback is one click — flip back. Cost: 2× the infrastructure during deploys.
Canary — deploy the new version to a small slice of traffic (say 5%), watch metrics, gradually increase to 100%. Catches regressions before they hit everyone. Needs a feature-flag / weighted-routing system.
Most modern teams do canary (cheaper, finer-grained). Blue-green still has its place for stateful systems where you can't mix versions.
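Weighted routing for a canary is often done by hashing a stable identifier into 100 buckets. The same user then lands on the same version consistently while the percentage ramps up. Here is a minimal sketch. The hash and the `routeVersion` name are illustrative. In practice this logic usually lives in the load balancer, service mesh, or feature-flag system rather than in application code.

```typescript
// Stable bucket in [0, 100) for an identifier (simple 32-bit string hash).
function bucketOf(id: string): number {
  let h = 0;
  for (let i = 0; i < id.length; i++) h = (Math.imul(h, 31) + id.charCodeAt(i)) >>> 0;
  return h % 100;
}

// Users whose bucket falls below the rollout percentage get the canary.
function routeVersion(userId: string, canaryPercent: number): "canary" | "stable" {
  return bucketOf(userId) < canaryPercent ? "canary" : "stable";
}

const users = Array.from({ length: 10_000 }, (_, i) => `user-${i}`);
for (const pct of [5, 25, 100]) {
  const onCanary = users.filter((u) => routeVersion(u, pct) === "canary").length;
  console.log(`${pct}% rollout -> ${onCanary} of ${users.length} users on canary`);
}
```

Because the bucket is fixed per user, anyone on the canary at 5% stays on it at 25%. Ramping up only ever adds users and never flips them back and forth between versions.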
Sticky sessions — load balancer pins a user to the same backend pod, so the pod's in-memory session works. Easy, but breaks when that pod dies (user logged out) and prevents true horizontal scaling (one popular user can hot-spot a pod).
Stateless services — session state lives in Redis or a JWT, so any pod can serve any request. Pods are interchangeable, easy to autoscale, zero-downtime deploys are simple. This is the modern default — sticky sessions are a tactic for legacy or websocket-heavy systems.
A CDN is a globally-distributed cache that lives between users and your origin. It serves three jobs:
- For cacheable responses, set Cache-Control: public, max-age=60 and the CDN does the rest.

A reverse proxy is an nginx / Envoy / HAProxy instance sitting between the public internet and your app servers. Functions:
API Gateway = reverse proxy + auth + rate-limit + metrics + (often) API-specific features like request transformation.
After all the conceptual questions, the panel usually circles back to your real work. These last questions are the ones that distinguish a candidate who has only studied from one who has actually shipped and felt the pain.
Same diagram as Section 1, but narrated for one specific user action. "Sarah taps Place Order on the mobile app at 7:42 PM Tuesday. (1) Mobile app sends POST /orders with JWT and an Idempotency-Key. (2) Cloudflare edge routes to the closest region. (3) API Gateway validates JWT, applies per-user rate limit, attaches a trace ID. (4) Order Service receives the request, opens a Postgres transaction, writes the order in PENDING + writes OrderCreated to its outbox table, commits. (5) Outbox poller picks up the event, publishes to Kafka. (6) Inventory Service consumes OrderCreated, reserves stock, publishes InventoryReserved. (7) Payment Service consumes that, calls Stripe with the idempotency key, publishes PaymentCaptured. (8) Shipping Service consumes, books pickup, publishes OrderShipped. (9) Order Service consumes the terminal event and flips the order to CONFIRMED. (10) Mobile app polls / receives push, updates the screen. End-to-end p99 ~1.8 seconds; the payment hop dominates."
That walkthrough — naming concrete payloads, concrete latencies, concrete failure-handling — is what makes you sound like an engineer who has actually run this in production.
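Steps (4) and (5) of that walkthrough are the transactional outbox pattern, and they're worth being able to sketch. The in-memory version below is hypothetical throughout: `Db`, `placeOrder`, `pollOutbox`, and a `publishToKafka` callback standing in for a real producer. The order row and the event row commit together or not at all. A separate poller publishes the events and marks them as sent, so a broker outage delays events instead of losing them.

```typescript
type Order = { id: string; status: "PENDING" | "CONFIRMED" };
type OutboxEvent = { id: number; type: string; payload: string; publishedAt?: number };
type Tx = { orders: Order[]; outbox: Omit<OutboxEvent, "id">[] };

// Stand-in for a relational DB: a transaction stages its writes and applies
// them only if the callback finishes, so it's all-or-nothing.
class Db {
  readonly orders: Order[] = [];
  readonly outbox: OutboxEvent[] = [];
  private nextEventId = 1;

  transaction(work: (tx: Tx) => void): void {
    const tx: Tx = { orders: [], outbox: [] };
    work(tx); // a throw here skips the "commit" below entirely
    this.orders.push(...tx.orders);
    for (const e of tx.outbox) this.outbox.push({ ...e, id: this.nextEventId++ });
  }
}

// Step (4): the order row and its event row are written in one transaction.
function placeOrder(db: Db, orderId: string): void {
  db.transaction((tx) => {
    tx.orders.push({ id: orderId, status: "PENDING" });
    tx.outbox.push({ type: "OrderCreated", payload: JSON.stringify({ orderId }) });
  });
}

// Step (5): publish unsent events in order, marking each only after the publish
// succeeds. A crash between publish and mark causes a re-send, so delivery is
// at-least-once and consumers must be idempotent.
function pollOutbox(db: Db, publish: (e: OutboxEvent) => void, nowMs: number): number {
  let published = 0;
  for (const event of db.outbox) {
    if (event.publishedAt !== undefined) continue;
    try {
      publish(event);
    } catch {
      break; // broker unavailable: stop here and retry on the next poll
    }
    event.publishedAt = nowMs;
    published++;
  }
  return published;
}

const db = new Db();
placeOrder(db, "order-1");
placeOrder(db, "order-2");

let brokerUp = false;
const sent: string[] = [];
const publishToKafka = (e: OutboxEvent): void => {
  if (!brokerUp) throw new Error("kafka unavailable");
  sent.push(e.payload);
};

console.log(pollOutbox(db, publishToKafka, 1_000)); // 0: events wait in the outbox, none lost
brokerUp = true;
console.log(pollOutbox(db, publishToKafka, 2_000)); // 2
```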
Pick one with a clean three-act structure: symptom → investigation → fix. Example: "Last quarter our checkout p99 jumped from 400 ms to 4 seconds, only on weekends. Took 4 hours to root-cause. The symptom was DB CPU saturating during the spike. Tracing showed an N+1 on the cart-summary endpoint — for every cart with N items we did N+1 SELECTs to fetch product details. Locally this was 5 items per cart; in production during weekend sale traffic, carts had 40-80 items. Fix: replaced the per-item lookup with a single WHERE id IN (...) + added an index. P99 dropped to 280 ms. Lesson learned: load testing must use production-shaped data, not 10-row fixtures."
Always end with the lesson — that's what makes it senior-level storytelling.
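For the technical half of that story, here is the N+1 shape next to the batched fix. Both run against a fake data-access layer that counts round-trips. `ProductRepo`, `findById`, and `findByIds` are hypothetical. `findByIds` stands in for the single `WHERE id IN (...)` query.

```typescript
type Product = { id: number; name: string; priceCents: number };
type Cart = { id: number; itemIds: number[] };

// Fake repository that counts queries instead of talking to a real database.
class ProductRepo {
  queries = 0;
  private readonly rows: Map<number, Product>;

  constructor(products: Product[]) {
    this.rows = new Map(products.map((p): [number, Product] => [p.id, p]));
  }

  findById(id: number): Product | undefined {
    this.queries++; // SELECT ... WHERE id = $1
    return this.rows.get(id);
  }

  findByIds(ids: number[]): Product[] {
    this.queries++; // SELECT ... WHERE id IN (...): one round-trip
    return ids.map((id) => this.rows.get(id)).filter((p): p is Product => p !== undefined);
  }
}

// N+1: on top of the query that loaded the cart, one query per item.
function cartTotalNPlusOne(repo: ProductRepo, cart: Cart): number {
  return cart.itemIds.reduce((sum, id) => sum + (repo.findById(id)?.priceCents ?? 0), 0);
}

// Batched: one query for all items, whatever the cart size.
function cartTotalBatched(repo: ProductRepo, cart: Cart): number {
  return repo.findByIds(cart.itemIds).reduce((sum, p) => sum + p.priceCents, 0);
}

const catalog: Product[] = Array.from({ length: 80 }, (_, i) => ({ id: i + 1, name: `p${i + 1}`, priceCents: 100 }));
const bigCart: Cart = { id: 7, itemIds: catalog.map((p) => p.id) }; // an 80-item weekend-sale cart

const slowRepo = new ProductRepo(catalog);
const fastRepo = new ProductRepo(catalog);
console.log(cartTotalNPlusOne(slowRepo, bigCart), slowRepo.queries); // 8000 80
console.log(cartTotalBatched(fastRepo, bigCart), fastRepo.queries); // 8000 1
```

With 5-item test fixtures the two versions look identical, which is exactly why the story's lesson is about production-shaped data.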
Same structure. The lesson should usually be about a tradeoff: "we sharded by customer_id which solved write throughput but made our admin queries (which span all customers) slow — so we added a read replica feeding a denormalized analytics table to handle those." Tradeoffs are the senior-engineer signal.
Pick a decision where the obvious answer was wrong. "For our refund pipeline, the obvious move was synchronous — user clicks Refund, API charges the gateway, returns success. We made it async via Kafka instead because (1) Stripe occasionally takes 30s, which would have blown our SLA, and (2) we wanted at-least-once retry semantics for free if Stripe was down. The cost was a more complex UI — the user sees 'Refund in progress' for a few seconds. We judged that worth it. Six months in, when Stripe had a 2-hour outage, the queue absorbed the load and refunds completed automatically once Stripe recovered — exactly the win the design was for."
Two ingredients: be specific, and frame it as a tradeoff you've consciously deferred — not as a regret. "Our Postgres write primary is a single point. We've designed the shard key (customer_id) but haven't pulled the trigger because (a) we're at 30% capacity headroom, (b) sharding triples our ops complexity, and (c) we have 18 months of runway before we need it. If I had to pick what I'd start today, it would be the outbox pattern across all services — three of our seven services still do synchronous double-writes, which means we've had two production incidents where the DB committed but the event publish failed."
That answer signals: you know where the seams are, you've thought about the cost, and you can make a judgment call instead of cargo-culting a "fix everything" answer.
List them as named tradeoffs:
The senior answer always names a specific bottleneck and the metric you'd watch: "Our write primary on Postgres. It runs at ~3K TPS today with headroom to ~8K. At 100× traffic, even with caching absorbing reads, writes still grow ~30×, because the conversion-funnel ratio is roughly fixed. So Postgres pegs first, around hour two of the spike. The first signal would be IOPS saturating on the primary's EBS volume, before CPU does. The mitigation is already in the runbook: temporarily disable the secondary indexes on the audit table (the least critical ones). That buys us another 2× while we throw replicas at the read load and slow the spike with rate-limiting at the gateway."
If you can't name a specific component and a specific metric, the panel knows you've never been on call for a real spike. The fact that you can name what breaks first is the answer — it shows you've thought about your system as a chain of capacities, not as a marketing diagram.