A curated set of deep dives on system design, low-level design, and backend engineering — written for interview prep, case studies, and battle-tested production patterns.
Architecture-level system design — the 4-step interview framework, a reference catalog of patterns & technologies, and 24 worked case studies covering every famous "design X" prompt with capacity math, trade-offs, and production-grade detail.
A 45-minute playbook for any HLD interview — Requirements → Core Entities → APIs → High-Level Design → Deep Dives. The four evaluation pillars interviewers actually score, common mistakes to avoid, and the specificity test that separates senior from junior.
HLD · Reference: Real-time updates, long-running tasks, contention, scaling reads, scaling writes, large blobs, multi-step sagas, proximity-based services — each with when-to-use, decision tree, trade-offs, and 2–3 cross-links to live HLDs on this site.
HLD · Reference: Move from "I'll use Redis" to "Redis HASH per user_id, LRU at 64GB/node, sharded across 6 nodes via consistent hashing" — the specificity that lands offers. Decision trees for SQL vs NoSQL, L4 vs L7, Kafka vs SQS, Redis vs Memcached.
Production-ready OTP architecture: Redis + Postgres split, bcrypt hashing, 5-layer brute-force defense, per-use-case TTLs, replay protection, and every mistake to avoid.
HLD · Case Study: End-to-end encrypted messaging at 2B-user scale — Signal Protocol's X3DH + Double Ratchet, multi-device fan-out (server can't read your messages), group sender keys, WebRTC for voice/video with TURN fallback, and the connection-server cluster holding 50K WebSockets each.
HLD · Case Study: Trending Now from 100K events/sec — Count-Min Sketch + Min-Heap for approximate top-K in fixed memory regardless of cardinality, sliding windows via per-minute sketch buckets, Flink-driven aggregation, and the trade-off vs exact counting.
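As a rough illustration of the approximate top-K idea, here is a minimal Python sketch. The class names, the default width and depth, and the use of salted BLAKE2b as the row hash are all assumptions made for this example, not details taken from the case study. To keep it short, a small dict scanned with `min()` stands in for the min-heap; memory is still fixed at `width * depth` counters plus `k` candidates.

```python
import hashlib


class CountMinSketch:
    """Fixed-size counter grid; estimates never undercount, but may overcount."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        data = item.encode("utf-8")
        for row in range(self.depth):
            # One independent hash per row, derived by salting BLAKE2b with the row index.
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        """Increment the item's cell in every row and return the new estimate."""
        estimate = None
        for row, col in self._cells(item):
            self.rows[row][col] += count
            value = self.rows[row][col]
            estimate = value if estimate is None else min(estimate, value)
        return estimate


class TopK:
    """Tracks the k items with the largest sketch estimates seen so far."""

    def __init__(self, k, sketch=None):
        self.k = k
        self.sketch = sketch if sketch is not None else CountMinSketch()
        self.best = {}  # item -> latest estimate, at most k entries

    def add(self, item):
        estimate = self.sketch.add(item)
        if item in self.best or len(self.best) < self.k:
            self.best[item] = estimate
            return
        # Evict the weakest candidate only if the newcomer beats it.
        weakest = min(self.best, key=self.best.get)
        if estimate > self.best[weakest]:
            del self.best[weakest]
            self.best[item] = estimate

    def top(self):
        return sorted(self.best.items(), key=lambda kv: kv[1], reverse=True)
```

A sliding window could be approximated by keeping one such sketch per minute and summing the most recent buckets at query time.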
HLD · Case Study: 1M clicks/sec where every click is billed — exactly-once aggregation via Kafka + Flink with idempotency keys, hot-ad sub-sharding, real-time Redis dashboards alongside daily Spark reconciliation against payment networks, fraud detection, and PCI-style audit.
HLD · Case Study: Real-time collaborative editing with sub-200ms keystroke latency — Operational Transformation for conflict resolution, Doc Session Server holding the canonical op log per document, offline merge on reconnect, op-log + snapshot persistence, and revision history reconstruction.
HLD · Case Study: 10M concurrent users at toss, 500K team-saves/sec, 12M-entry Mega Contest leaderboards refreshed every 5s — the three-plane split (transaction / stream / real-time), and the reasoning behind every tech pick: Postgres vs MySQL vs Oracle, Aerospike vs Redis for wallet, Cassandra vs DynamoDB, Redis ZSET vs SQL ORDER BY, ALB vs NLB vs HAProxy, Flink vs Spark Streaming.
HLD · Case Study: Move money without losing or duplicating a paisa — idempotency-key + DB UNIQUE for safe retries, double-entry ledger with sum-to-zero invariants, Temporal saga with compensating actions for multi-step rollback, fraud scoring, RBI-compliant tokenization to keep raw card numbers out of your servers, and a Razorpay-style India context (UPI, 3DS, NEFT/IMPS settlement).
HLD + LLD · Payments & Security: Build a PCI-DSS-grade vault that swaps 16-digit PANs for opaque tokens 50,000 times a second — envelope encryption (DEK per record, KEK in HSM), random vs HMAC vs FPE tokens, vault/metadata DB split to shrink PCI scope, idempotent tokenize with Redis SET NX, audit firehose via Kafka + ClickHouse, GDPR crypto-shredding, and the full Java class diagram + Builder/Strategy/Facade code for the interview whiteboard.
HLD · Case Study: Build Redis Cluster from first principles — consistent-hashing ring with virtual nodes, sync vs async replication, LRU/LFU/TTL eviction trade-offs, write-through vs write-around vs write-back, hot-key splitting, cache-stampede protection via request coalescing, and Sentinel-driven failover.
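A consistent-hashing ring with virtual nodes can be sketched in a few lines of Python. The `HashRing` class, the `node#i` virtual-node naming, the default of 100 virtual nodes, and the choice of SHA-256 as the ring hash are assumptions for illustration only, not the case study's actual code.

```python
import bisect
import hashlib


class HashRing:
    """Maps keys to nodes; each node owns many points ("virtual nodes") on the ring."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        # First 8 bytes of SHA-256 as a 64-bit ring position.
        return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

    def add(self, node):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's position, wrapping around at the end.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The property worth stating in an interview is that removing a node only remaps the keys that node owned; every other key keeps its owner because its clockwise successor on the ring is unchanged.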
HLD · Case Study: "Send reminder in 3 days" at billions-of-jobs scale — time-bucketed storage so finding due jobs is O(1), two-tier cold-DynamoDB + hot-Redis-ZSET split, leased execution for crash safety, ZooKeeper-driven partition assignment, and exactly-once-effective via idempotency keys.
HLD · Case Study: The metadata-vs-blob plane split that makes a paste service scale — KGS-backed unique keys, S3 for content, MySQL/Cassandra for metadata, multi-tier cache, and the 5:1 read:write ratio that drives every choice.
HLD · Case Study: Three planes — Upload, Serve, Feed — with photo sharding by photo_id, fan-out vs fan-in for news feed, and the celebrity-user hybrid that makes 100M followers tractable. 1425TB of blobs over 10 years, plus the exact ER diagram + capacity math.
HLD · Case Study: From polling to WebSockets — chat servers each holding 50K open connections, HBase for time-sorted message storage, Kafka for cross-server fan-out, presence service that scales to 500M users without broadcast storms.
HLD · Case Study: 325K reads/sec timeline assembly via the push/pull hybrid — fan-out-on-write for normal users, pull for celebrities like Elon, merge on read. 64-bit time-sortable tweet IDs, Cassandra sharded by tweet_id, plus trending topics & who-to-follow.
HLD · Case Study: Why CDN is non-negotiable for video — three-plane architecture (Upload, Transcode, Serve) with adaptive bitrate streaming, perceptual-hash dedup, and the 1TB/sec egress problem solved by edge POPs. 25GB/sec ingest math + transcode pipeline.
HLD · Case Study: In-memory trie with offline frequency updates — why SQL LIKE can't do 60K QPS in under 200ms, how partition-by-hash + aggregator merge handles "give me top 10 for prefix sy", and the EMA-based ranking that makes trends bubble up.
HLD · Case Study: Five algorithms compared (fixed window, sliding window, sliding-with-counters, token bucket, leaky bucket), the check-then-increment race condition solved atomically with Redis Lua, and the IP-vs-user-vs-API-key hybrid that protects /login without locking real users out.
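Of the five algorithms, the token bucket is the quickest to sketch. The version below is a single-process Python illustration with an injectable clock; the `TokenBucket` name and its parameters are assumptions for this example, and it does not cover the Redis Lua variant.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, then a steady `refill_per_sec` rate."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        self._tokens = float(capacity)  # start full so the first burst passes
        self._last = clock()

    def allow(self, cost=1):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill lazily on each call instead of running a background timer.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_sec)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

In a shared deployment the refill and the decrement have to happen as one atomic step, which is the role a server-side script plays when the counters live in Redis.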
HLD · Case Study: Inverted index over 730 billion tweets in under 200ms — shard-by-tweet_id with aggregator fan-in, per-shard local indexes, the reverse-index trick that makes crash recovery fast, and the ranking pipeline that scores recency + popularity + engagement.
HLD · Case Study: 15B pages in 4 weeks at 6200 pages/sec — sharded URL frontier with per-host politeness queues, robots.txt cache, document & URL dedupe via SHA checksums (no bloom filters!), checkpointing for week-long crawl resilience, and crawler-trap defenses.
HLD · Case Study: Pre-computed personalized feeds for 300M DAU — the push/pull/hybrid trade-off, ML ranking by relevance + recency + engagement, multi-tier cache (in-process → Redis → DB), and how new posts hit followers' feeds within 5 seconds.
HLD · Case Study: QuadTree spatial index for "find me ramen within 1 mile" across 500M places — why fixed grids fail in Manhattan, how dynamic 4-way splits keep leaf density uniform, doubly-linked leaves for fast neighbor traversal, and the QuadTree-Index reverse map for crash recovery.
HLD · Case Study: 167K driver location updates/sec without melting the QuadTree — the DriverLocationHT in-memory hash table that absorbs the firehose, lazy 15-second QuadTree refresh, and the Notification Service pub/sub that pushes live driver positions to subscribed riders.
HLD · Case Study: 50K fans hitting "buy" on the same 200 seats at 09:00:00.001 — SERIALIZABLE isolation + SELECT FOR UPDATE to prevent double-bookings, ActiveReservationsService with 5-min holds, and the WaitingUsersService FIFO queue that wakes the next buyer when a hold expires.
HLD · Case Study: Three-pass story arc from "one MySQL box" to a sharded NoSQL store fronted by a CDN, Memcached, and a Key Generation Service that pre-generates 6-char keys offline so the write path never collides — write/read/key-gen split, 20K redirects/sec, full capacity math.
HLD · Case Study: File sync at planet scale: 4 MB chunking, in-line dedup, presigned-URL uploads, sharded metadata, long-poll notification fabric, conflict resolution, and the data-vs-control-plane split that makes it all work.
HLD · Case Study: Running strangers' code without setting the host on fire: gVisor/Firecracker sandboxes, async submission queue, WebSocket verdict push, Redis ZSET leaderboards, and the web-tier vs. judge-tier split that makes 5K hostile submissions/sec routine.
Storage choices decide every other box in the diagram. Family-by-family comparisons, deep dives into specific engines, and the honest "stay on Postgres" decision tree — everything you need to defend a DB pick in design review or an interview.
The ten database families every system-design interview expects you to wield — RDBMS, Document, Wide-Column, Key-Value, Search Engine, Time-Series, Graph, Vector, Object Store, NewSQL. Storage engines, consistency story, decision tree, and polyglot stacks from Twitter, Uber, Netflix, and RAG systems.
DB · Deep Dive: What problem distributed SQL actually solves, and how the three open-source players do it inside — Raft per range, hybrid logical clocks, MVCC across nodes, and Spanner-style 2PC. Side-by-side comparison, migration punch list from Postgres/MySQL, and the honest decision tree of when not to bother.
20 deep dives covering the four NoSQL families (Document, KV, Wide-Column, Graph), ACID vs BASE, joins vs denormalization, sharding & replication, CAP in practice, and the same e-commerce design done both ways — with concrete code in Postgres, MongoDB, Redis & Cassandra.
Databases · Relational Deep Dive: 20 sections comparing every relational DBMS side-by-side — storage engines, MVCC, indexes, isolation levels, replication, JSON support, stored-procedure dialects, full-text & vector search, sharding, HA, and NewSQL (Cockroach, Spanner, TiDB). With real SQL examples in MySQL, Postgres, Oracle and SQL Server dialects.
Object-modeling and class-design problems — first the framework that fits any 45-minute LLD round, then eight worked case studies with full Java code, state machines, and the gotchas Grokking glosses over.
Structured path to master low-level design interviews — SOLID, design patterns, concurrency building blocks, and the 20 problems you actually see in rounds.
LLD · In a Hurry · Part 1: Story-driven intro to low-level design — LLD vs HLD, the four scoring pillars, SOLID as a spine, and the six question shapes you'll meet in the wild. Part 1 of a 3-part in-a-hurry series.
LLD · In a Hurry · Part 2: Minute-by-minute playbook for any LLD prompt — Requirements, Entities, Class Diagram, Patterns + Sequence, Java Code — closed by a full Parking Lot walkthrough script with timing annotations.
LLD · In a Hurry · Part 3: The eight GoF patterns that appear in 90% of LLD interviews — Strategy, Factory, Singleton, Observer, State, Decorator, Builder, Command. When each fits, the smell that summons it, and a one-page Java sketch each.
End-to-end object model, state machine, and class design for the classic LLD question — from requirements to code, with trade-off discussions at each step.
LLD · Case Study: A multi-floor smart parking lot built from a paper-ticket booth — Singleton + Strategy + State + Observer, full Java code, lost-ticket flow, peak-hour pricing decorator, and every gap in Grokking's classic answer plugged.
LLD · Case Study: A real ATM — 8-state machine for the device, Strategy for transactions, Chain of Responsibility for cash dispensing in mixed denominations, 2-phase debit so a jam never costs the customer, and full Java code with every Grokking gap plugged.
LLD · Case Study: A real lending library — Repository search (not HashMap), FIFO ReservationQueue per book, BookItem state machine, Observer-based notifications, FineStrategy per member-type, and Grokking's classic single-slot reservation bug fixed.
LLD · Case Study: A real BookMyShow-style booking — atomic seat-locking with TTL, Saga payment flow with compensations, decorator pricing for peak-hour + premium-seat surcharges, and the concurrency story Grokking glosses over (three users clicking F-12 at the same instant).
LLD · Case Study: Multi-channel sender (Email/SMS/Push/WhatsApp) with the API-vs-worker split that keeps Order Service unblocked when SendGrid is slow — Strategy + Factory + Decorator + retries with backoff, idempotency at two layers, and rate limits per user & provider.
LLD · Case Study: Full interview-grade design: requirements, entities, 10 design patterns mapped to real variability, booking state machine, concurrency, and 15 follow-up cross-questions with model answers.
LLD · Case Study: End-to-end hotel system design: actors & use cases, class/ER diagrams, full MySQL DDL with composite indexes, booking concurrency with pessimistic locks, sequence flows, and interview Q&A.
Story-driven interview guides for the languages you'll be writing on the whiteboard — Java fundamentals plus modern Java 25 features, and a JavaScript & Node.js deep dive that covers the runtime as much as the syntax.
Compact source files, flexible constructor bodies, module import declarations, primitive pattern matching, scoped values, and more — everything new in Java 25 for interviews.
Java · Storytelling: 20 deep dives across OOP, Strings, Collections, HashMap internals, threads, locks, JVM memory, GC, ClassLoaders, Streams, and the tricky gotchas — every concept walked through with a real-world scene and analogies you'll actually remember.
Java · JVM Interview: What GC actually does, the heap layout (Eden, S0, S1, Old), Minor vs Major vs Full GC, and how G1, ZGC and Shenandoah work — explained the way you'd say it in an interview, plus the 11 most-asked Q&A with 60-second answers.
The day-to-day toolkit of a backend engineer — frameworks, message brokers, databases, containers, browser storage. Each guide is interview-grade with the production gotchas baked in.
25 deep dives across topics & partitions, producer internals, acks & ISR, consumer groups & offsets, rebalancing, idempotent + transactional EOS, log compaction, hot partitions, KRaft, and the production gotchas every senior Kafka dev hits — each told the way you'd explain it to a friend over chai.
Kafka · Payments: Deep dive on exactly-once semantics, idempotent producers, transactional outbox, and safely handling payment workflows under retries and partial failures.
How DNS resolves & caches, the 9 HTTP methods, Dockerfile anatomy, VM vs container deployment, API Gateway vs Load Balancer, and circuit breaker interview Q&A.
Distributed Systems · Deep Dive: Every cache pattern in production — Cache-Aside, Read/Write-Through, Write-Behind, Refresh-Ahead — plus eviction policies (LRU, LFU, W-TinyLFU), consistent hashing, the five famous failure modes (stampede, avalanche, penetration, hot key, big key), and a real 10-component cache architecture walked through end-to-end.
Java vs Go · Interview Perspective: A short, interview-friendly take on why Java is still the better backend choice in 2026. Virtual threads (Java 21+) match goroutines, the ecosystem is unmatched, the JVM beats AOT at steady state — plus a clean template answer for "Java or Go?" in interviews.
High Availability · Deep Dive: A load balancer exists to kill single points of failure — so what happens when it becomes one? Built up failure-by-failure: redundant active-passive pairs with VRRP floating IPs, active-active via ECMP & BGP anycast, health-checked GSLB above, the split-brain trap, a failover traced second-by-second, and how AWS/GCP managed LBs hide (most of) it.
Backend Interview · Q&A: Story-driven answers to the questions that show up in senior backend rounds — architecture walkthroughs, monolith vs microservices, event-driven flow, sagas & idempotency, observability, SQL/NoSQL, Node.js event loop, resilience patterns (circuit breaker, DLQ, bulkhead), and the "draw your project" follow-ups.
intervue.io Question Bank · Q&A: The full intervue.io question bank, as put to a candidate interviewing for a Java application role, deduplicated and grouped by skill: write-through vs write-behind & cache stampede, Spring transactions & AOP self-invocation, circuit breakers/saga/CQRS, the transactional outbox & Kafka ordering, LRU/idempotency/back-pressure coding, query tuning & sharding-vs-replicas, plus stakeholder communication. Each answered the way you'd say it in the room.
Tech Lead Interview · Solutions: The full tech-lead loop answered from the lead's chair — system design under judgment (clarify, trade-offs, what breaks first), architecture calls (monolith vs micro, tech debt, build/buy, definition of done), and the people half (mentoring, conflict & disagree-and-commit, delivery, incidents, culture) — every answer STAR-structured with a "what it probes" note.
Problems in the shape Google likes — bounded constraints, brutal scale, and an obvious-once-you-see-it trick that replaces sorting or coordination with a smaller, cheaper primitive.
How big systems generate unique IDs across many servers — explained simply. UUIDs (v4 & v7), Snowflake's 64-bit layout, the boot-only coordinator pattern, common mistakes, and a decision guide for when to use which.
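For a feel of the Snowflake layout, here is a minimal Python sketch using the widely documented split of 41 timestamp bits, 10 worker bits, and 12 sequence bits. The `SnowflakeGenerator` class, the `EPOCH_MS` value, and the injectable clock are assumptions for illustration; a production generator would also need a policy for clock rollback beyond raising an error.

```python
import threading
import time

WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
EPOCH_MS = 1_700_000_000_000  # hypothetical custom epoch, chosen for this example


class SnowflakeGenerator:
    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )


def decode(snowflake_id):
    """Split an ID back into (timestamp_ms, worker_id, sequence)."""
    timestamp_ms = (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
    worker_id = (snowflake_id >> SEQUENCE_BITS) & MAX_WORKER
    return timestamp_ms, worker_id, snowflake_id & MAX_SEQUENCE
```

Because the timestamp occupies the high bits, IDs from one worker sort by creation time, and IDs across workers sort by time to within clock skew.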
Google Interview · Algorithm Deep Dive: 50 files, 40 GB of data, 16 GB of RAM. Why sorting is the wrong instinct, and how two passes of bucket counting find the exact median in ~1 MB of memory. Worked example, edge cases, and when this idea breaks.
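The two-pass idea can be sketched in Python as follows. This assumes the values are unsigned 32-bit integers (the blurb does not say what the data looks like) and that the input can be streamed twice through a hypothetical `stream_factory` callable. Pass one histograms the high 16 bits to find the bucket holding the median; pass two histograms the low 16 bits inside that bucket. Each histogram is 65,536 eight-byte counters, about 512 KiB, so the two together come to roughly 1 MiB.

```python
from array import array

BUCKETS = 1 << 16  # 65,536 counters per pass


def exact_median(stream_factory):
    """Lower median of unsigned 32-bit ints, reading the stream twice."""
    # Pass 1: count values by their high 16 bits.
    high_counts = array("Q", [0]) * BUCKETS
    total = 0
    for value in stream_factory():
        high_counts[value >> 16] += 1
        total += 1
    if total == 0:
        raise ValueError("no values")

    target = (total - 1) // 2  # 0-based rank of the lower median
    below = 0                  # values in buckets before the median's bucket
    for high, count in enumerate(high_counts):
        if below + count > target:
            break
        below += count

    # Pass 2: count low 16 bits, but only for values inside the chosen bucket.
    low_counts = array("Q", [0]) * BUCKETS
    for value in stream_factory():
        if value >> 16 == high:
            low_counts[value & 0xFFFF] += 1

    rank = target - below
    for low, count in enumerate(low_counts):
        if rank < count:
            return (high << 16) | low
        rank -= count
    raise AssertionError("unreachable: counts changed between passes")
```

In use, `stream_factory` would be a function that reopens the 50 files and yields one integer at a time, for example `exact_median(lambda: iter([5, 70000, 3, 200000, 70001]))` for a toy input. For an even count this returns the lower of the two middle values; averaging them would need the next rank as well.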