The Core Mental Model β Tables vs Documents vs Keys
Almost every SQL-vs-NoSQL interview opens with "what's the difference?" and almost every candidate answers "SQL has schemas, NoSQL doesn't" β which is half-true and completely misses the point. The real difference is how the data is shaped on disk, and everything else (scaling, joins, transactions) flows from that.
SQL stores normalized rows in tables with a fixed schema, queried with a declarative language (SQL) that joins those tables on demand. NoSQL stores denormalized records (documents, key-value pairs, wide rows, or graphs) optimized for the access pattern, queried with APIs specific to that shape.
Notice what that sentence doesn't say: "NoSQL has no schema." NoSQL absolutely has a schema β it just lives in your application code, not in the database's CREATE TABLE statement. The schema didn't disappear; the responsibility for enforcing it moved.
| Property | SQL (Postgres, MySQL) | NoSQL (MongoDB, Cassandra, Redis) |
|---|---|---|
| Data shape | Tables of rows & columns | Documents, KV pairs, wide rows, graph nodes |
| Schema | Enforced by DB at write time | Often enforced by app at read time |
| Query language | SQL β declarative, joins, set ops | Per-DB API β usually no joins |
| Transactions | Multi-row, multi-table ACID | Usually single-document atomic, eventual consistency across docs |
| Scaling | Vertical first; horizontal is hard | Horizontal-first by design |
| Best at | Relationships, ad-hoc queries, integrity | Massive scale, flexible shape, single-record reads |
Schema β Rigid Contract vs Schema-on-Read
"Schemaless" is one of the most misleading marketing words in databases. Every NoSQL store has a schema; the question is who validates it and when.
In SQL, the database refuses bad data at write time. INSERT INTO users (age) VALUES ('twenty') errors out because age is declared INT. That's schema-on-write.
In MongoDB, the same insert succeeds β the database stores whatever JSON you give it. The error surfaces six weeks later when an analytics job runs avg(age) and chokes on the string. That's schema-on-read: the schema exists, but it's enforced by the consumer of the data, not the producer.
SQL β schema enforced by DB
- DDL:
CREATE TABLE,ALTER TABLE - Bad writes are rejected immediately
- Migrations are explicit, atomic, and reviewed
- Schema doubles as documentation
- Painful when the shape changes weekly
NoSQL β schema enforced by app
- Each document carries its own structure
- Adding a new field = just write it next time
- Mixed-shape data lives side-by-side during a migration
- Application code must defensively handle missing/old fields
- Bugs surface late, often in production
-- SQL (Postgres) β rigid, declared once CREATE TABLE users ( id BIGSERIAL PRIMARY KEY, email VARCHAR(255) UNIQUE NOT NULL, age INT CHECK (age >= 0), created_at TIMESTAMPTZ DEFAULT NOW() ); // MongoDB β flexible, per-document db.users.insertOne({ email: "riya@example.com", age: 28, preferences: { theme: "dark", lang: "en" } // new field, no migration }); db.users.insertOne({ email: "aman@example.com", age: "twenty-eight" // stored happily β bug ships to prod });
The 4 Flavors of NoSQL β Document, Key-Value, Wide-Column, Graph
"NoSQL" is a marketing umbrella that hides four genuinely different storage shapes. Treating MongoDB, Redis, Cassandra, and Neo4j as the same kind of thing is like treating a notebook, a sticky-note pad, a spreadsheet, and a mind-map as "all paper".
| Family | Shape of one record | Best at | Examples |
|---|---|---|---|
| Document | JSON-like nested object keyed by ID | Flexible product/content data, single-doc reads | MongoDB, CouchDB, Firestore |
| Key-Value | Opaque value blob behind a single key | Caching, sessions, leaderboards, counters | Redis, DynamoDB, Memcached, Riak |
| Wide-Column | Row keyed by ID + sparse columns grouped into families | Time-series, event logs, write-heavy at petabyte scale | Cassandra, HBase, ScyllaDB, Bigtable |
| Graph | Nodes + edges with properties | Social networks, fraud rings, recommendations | Neo4j, Amazon Neptune, ArangoDB |
Document Store β MongoDB With a Real Example
A document store treats one record as a self-contained JSON tree. The unit of read, write, and atomicity is the document β and that's exactly the design choice that makes it fast for some things and miserable for others.
products, variants, images, reviews β and a "load product page" query joins all four. In MongoDB he writes one document per product. One read, no joins. The product page renders in 8ms instead of 80.-- Postgres: 4 tables, joined at query time SELECT p.*, v.size, v.color, v.stock, i.url, r.rating, r.text FROM products p JOIN variants v ON v.product_id = p.id JOIN images i ON i.product_id = p.id LEFT JOIN reviews r ON r.product_id = p.id WHERE p.id = 42; // MongoDB: 1 document, 1 read { _id: "prod-42", name: "Mechanical Keyboard", price: 8999, variants: [ { sku: "k-red-tkl", color: "red", size: "TKL", stock: 12 }, { sku: "k-blk-100", color: "black", size: "100%", stock: 7 } ], images: ["cdn/k1.jpg", "cdn/k2.jpg"], reviews: [ { user: "riya", rating: 5, text: "Crispy switches" }, { user: "aman", rating: 4, text: "Loud but lovely" } ] } db.products.findOne({ _id: "prod-42" }) // done.
- The 16 MB doc limit. A product with 50,000 reviews? You'll hit MongoDB's per-document size cap. Reviews now need their own collection, and you're back to two reads.
- Highly shared sub-data. If a "tag" appears on 10,000 products, embedding it 10,000 times means one tag rename = 10,000 writes. Reference it instead.
- Frequent partial updates on a hot field. If every product has a
viewscounter that increments 100/sec, the whole document gets rewritten on every bump. Move that counter to its own collection or to Redis. - Cross-document transactions. Mongo added multi-doc transactions in 4.0, but they're slower and more limited than Postgres's native ACID. If your invariant spans documents, that's a sign you wanted a relational model.
Key-Value β Redis & DynamoDB With a Real Example
Key-value is the simplest possible database: get(key), put(key, value), delete(key). No queries, no joins, no scans (or at least, none you should run). What it gives up in expressiveness it returns in microsecond latency and effortless horizontal scale.
SET session:abc123 user:42 EX 3600. Lookup is now 0.3ms, the database is bored, and she goes home for Diwali.# 1. Session cache (with TTL) SET session:abc123 '{"userId":42,"role":"admin"}' EX 3600 # 2. Rate limiter β atomic increment with expiry INCR ratelimit:user:42:minute EXPIRE ratelimit:user:42:minute 60 # 3. Leaderboard β sorted set, O(log N) writes, O(log N + M) range read ZADD leaderboard:global 9800 "riya" ZREVRANGE leaderboard:global 0 9 WITHSCORES # top 10 # 4. Distributed lock β SET with NX (only if absent) + TTL SET lock:order:42 "worker-7" NX EX 30 # 5. Pub/Sub for real-time fan-out PUBLISH chat:room:5 '{"from":"aman","text":"hi"}'
- Redis is in-memory, sub-millisecond, and great for ephemeral data (cache, session, rate-limit, queue). Persistence exists (RDB/AOF) but it's not the primary use case.
- DynamoDB is durable, distributed, and managed by AWS. Single-digit millisecond reads at any scale, with secondary indexes and conditional writes. It's the "production database" version of KV.
- If the data must survive a reboot and your scale is large, DynamoDB. If it's hot data needing extreme speed, Redis. Many real systems use both β DynamoDB as the system of record, Redis as the read-through cache.
KEYS * in production Redis. It blocks the entire server while it scans every key. Use SCAN with a cursor for any range operation. (This question is a Redis-interview classic.)Wide-Column β Cassandra With a Real Example
Wide-column stores look like SQL tables on the surface but behave nothing like them underneath. Each row is keyed by a partition key, the partition decides which physical machine owns the data, and inside the partition you can have millions of "columns" sorted by a clustering key. They're optimized for one thing: writing massive amounts of time-ordered data and reading slices of it back, fast.
device_id and clusters by timestamp DESC β now "give me the last 24 hours for device X" is a single sequential disk read on a single machine. 2M writes/sec, no sweat.CREATE TABLE sensor_readings ( device_id UUID, ts TIMESTAMP, temp DOUBLE, humidity DOUBLE, PRIMARY KEY ((device_id), ts) -- partition key, clustering key ) WITH CLUSTERING ORDER BY (ts DESC); -- This is BLAZING fast β single partition, sequential scan, sorted SELECT * FROM sensor_readings WHERE device_id = ? AND ts > now() - 1d; -- This will FAIL or scan every node β no partition key in WHERE SELECT * FROM sensor_readings WHERE temp > 40;
Because in Cassandra the partition key tells the cluster which node to ask. device_id = X hashes to one of the 256 virtual nodes, the coordinator routes the read to that one machine, and the rows come back in clustering-key order from a single sequential file (SSTable).
Without a partition key, Cassandra has no idea which node has your data, so it has to ask all of them β a "full cluster scan", which is the worst thing you can do. SQL databases hide this from you because they're (usually) one box; Cassandra forces you to confront it.
sensors_by_location and dual-write to both. Denormalization isn't a smell here β it's the design.Graph β Neo4j With a Real Example
Graph databases are built for the one thing SQL is genuinely bad at: following relationships, deeply, fast. "Friends-of-friends-of-friends-who-bought-this" is one query in Cypher and a screaming fan-out of self-joins in SQL.
MATCH (a)-[:LINKED*1..3]-(b) RETURN b β runs in 80ms.// Find all friends-of-friends of Riya, who like the same band MATCH (riya:Person {name:"Riya"})-[:FRIEND*2]-(fof:Person), (fof)-[:LIKES]->(b:Band)<-[:LIKES]-(riya) RETURN DISTINCT fof.name, b.name; // Shortest path between two people MATCH p = shortestPath( (a:Person {name:"Riya"})-[:KNOWS*]-(b:Person {name:"Karthik"}) ) RETURN p;
A SQL join is an O(NΓM) set operation by default; even with indexes, every additional hop multiplies the work. A 5-hop friend lookup is 5 nested joins on the same table β the query planner does its best, but each level reads more rows.
A graph database stores each node with direct pointers to its neighbors. Traversing one edge is O(1) β follow the pointer. Five hops = 5 pointer dereferences, not 5 set scans. This is called index-free adjacency and it's the architectural reason graph DBs win at this workload.
ACID vs BASE β The Consistency Trade-off
ACID and BASE aren't opposites; they're two ends of a sliding bar. SQL traditionally lives at the ACID end (correctness first, scale later); NoSQL traditionally lives at the BASE end (scale first, eventual correctness). Both are valid. Picking wrong is a career-defining bug.
ACID β Strong Guarantees
- Atomic: all-or-nothing per transaction
- Consistent: invariants hold after each commit
- Isolated: concurrent txns don't see each other's middle state
- Durable: committed data survives a crash
- Postgres, MySQL InnoDB, SQL Server, Oracle
BASE β Practical at Scale
- Basically Available: every request gets some response
- Soft state: state can change without input over time
- Eventually consistent: replicas converge, given time
- Cassandra, DynamoDB (default), Riak, classic Mongo replica set reads
The line has blurred a lot since 2018. MongoDB 4.0+ supports multi-document ACID. DynamoDB supports transactions on up to 100 items. CockroachDB and Spanner are distributed-but-strongly-consistent SQL. FoundationDB is ACID at scale.
The honest framing today is: pick your consistency level per query, not per database. Both Postgres (with read replicas) and Mongo/Cassandra let you tune READ COMMITTED vs SERIALIZABLE, or quorum vs single-node reads. The real question is: which invariants does my business actually require, and am I willing to pay the latency cost to enforce them?
Joins vs Denormalization β Where the Work Lives
"NoSQL doesn't do joins" is correct β and that's a feature, not a flaw, once you understand where the work moved.
In SQL you write data once, normalized into many tables, and the join cost is paid every time you read. In NoSQL you pay the cost upfront β you denormalize: the same data is duplicated across multiple records so reads are single-record fetches.
Trade-off: SQL has cheap writes and expensive reads at scale; NoSQL has cheap reads but every write must update many places.
-- SQL: write once, join on every read SELECT a.name, p.title, p.body FROM authors a JOIN posts p ON p.author_id = a.id WHERE a.id = 7; // MongoDB pattern A: embed (author owns posts, posts < 100) { _id: 7, name: "Riya", posts: [{title:"..."}, {title:"..."}] } // MongoDB pattern B: reference (posts unbounded) { _id: 7, name: "Riya" } { _id: "p1", authorId: 7, authorName: "Riya", title: "..." } // duplicated // Now: rename Riya β Riyaa // SQL: 1 UPDATE on authors, done. // Mongo: 1 UPDATE on author, plus updateMany on posts to fix authorName.
Indexes β Same Idea, Different Cost
Both SQL and NoSQL use indexes (usually B-trees, sometimes LSM-trees in NoSQL). The mental model is the same β maintain a sorted lookup structure for fast finds β but the cost profile is very different.
- SQL (B-tree, mutable): reads are fast (O(log N)), writes update the tree in place. Random writes hurt as the table grows.
- NoSQL (LSM-tree, append-only): writes are blazing fast (O(1) into a memtable), but reads may have to merge several SSTables on disk. Background "compaction" rewrites files to keep reads fast.
- Cassandra, RocksDB, Bigtable, LevelDB, ScyllaDB β all LSM. That's why NoSQL ingests are usually faster, and why a bursty write workload doesn't tip them over.
Vertical vs Horizontal Scaling β The Real Reason NoSQL Won
If you only remember one thing about why NoSQL exists: SQL was designed for one machine; NoSQL was designed for a thousand. Everything else is consequence.
- Vertical scaling = bigger box. More CPU, more RAM, more disk. Postgres on a 96-core machine with 1 TB RAM goes very far β most companies never outgrow it.
- Horizontal scaling = more boxes. Cassandra or Mongo on 100 commodity nodes routes data by partition key. No single machine sees all the data.
- SQL can scale horizontally (Citus, Vitess, Postgres-XL), but joins across shards are painful and transactions become two-phase commits β slow and brittle.
- NoSQL is shard-aware from day one. The schema demands it.
Sharding & Replication β How Data Spreads
Two orthogonal techniques, often confused. Sharding splits data across machines so each holds a slice. Replication copies data across machines so each replica holds the same slice. Real systems do both.
| Technique | Goal | What you gain | What you lose |
|---|---|---|---|
| Replication | Many copies of all data | Read scale, fault tolerance | Write throughput stays the same; replication lag |
| Sharding (partitioning) | Each machine holds 1/N of data | Write scale, storage scale | Cross-shard queries hurt; rebalancing is painful |
| Both (real systems) | N shards Γ 3 replicas each | Read+write scale + durability | Operational complexity |
# 1. Hash sharding β uniform, but no range scans shard = murmur3(user_id) % numShards # 2. Range sharding β fast range scans, but hot spots if keys cluster shard = (user_id < 1M) ? S0 : (user_id < 2M) ? S1 : S2 # 3. Geographic / tenant sharding β data locality shard = (region == "IN") ? mumbai : (region == "US") ? virginia : ...
CAP Theorem in Practice β Pick Two, But Honestly Pick One
The CAP theorem says: in a distributed system, when a network partition (P) happens, you must choose between Consistency (C) and Availability (A). Since partitions will happen at some point, real systems are either CP or AP.
| Mode | Behavior on partition | Examples |
|---|---|---|
| CP (consistent) | Refuse the request rather than serve stale data | HBase, MongoDB (with majority writes), Spanner, Zookeeper |
| AP (available) | Serve a possibly-stale answer; reconcile later | Cassandra, DynamoDB (default), Riak, CouchDB |
| CA (no partition) | Only true on a single node β Postgres, MySQL standalone | Single-node SQL |
Same Problem, Two Designs β E-Commerce Order System
Theory's nice; let's design the same feature both ways. A user places an order with multiple items, payment, shipping, and order history. Compare on the same page.
SQL Design (Postgres)
CREATE TABLE orders ( id BIGSERIAL PK, user_id BIGINT, status TEXT, total NUMERIC, created_at TIMESTAMPTZ ); CREATE TABLE order_items ( id BIGSERIAL PK, order_id BIGINT FK, product_id BIGINT, qty INT, price NUMERIC ); CREATE TABLE payments (...); CREATE TABLE shipments (...); -- one ACID transaction BEGIN; INSERT INTO orders ...; INSERT INTO order_items ...; INSERT INTO payments ...; UPDATE inventory SET qty=qty-?; COMMIT;
Wins at: integrity (no orphan items, payment + order atomic), ad-hoc reporting ("revenue by category last month" is one query).
NoSQL Design (MongoDB)
{
_id: "ord_42",
userId: 7,
status: "PAID",
items: [
{productId:1, name:"Keyboard",
qty:1, price:8999},
{productId:2, name:"Mouse",
qty:1, price:1499}
],
payment: {method:"UPI", txn:"abc"},
shipment: {addr:"...", eta:"..."},
total: 10498,
createdAt: "2026-05-08"
}
db.orders.insertOne(...) // 1 op, 1 doc
Wins at: single-doc reads (order detail page in 1 round-trip), easy horizontal scale by userId, schema-evolution (add discount field tomorrow without migration).
Which is right? Depends on the next question your business asks.
- Need "monthly revenue by category, by region, by promo code" reports? SQL β denormalized JSON makes this miserable.
- Serving 10M order detail pages a second on Black Friday? Mongo β one read, no joins.
- Want both? Use SQL as the system of record and pipe it to Elasticsearch / a warehouse for analytics. (See section 17.)
When SQL Wins β A Checklist
Reach for SQL (Postgres is the safe default in 2026) when any of these are true:
- The data has rich relationships that you'll query in many ways. Foreign keys are friends.
- Strong consistency is non-negotiable β money, inventory, bookings, anything regulated.
- Ad-hoc analytical queries are a thing β finance, BI, internal dashboards.
- The schema is stable β domain is well understood, fields rarely change.
- Your scale fits one machine (this is most apps β modern Postgres comfortably does 50K+ TPS).
- You need referential integrity at the DB level (no orphan rows, cascades on delete).
- The team is small and you want fewer moving parts. One Postgres beats five "right-tool" databases.
When NoSQL Wins β A Checklist
Reach for NoSQL when any of these are true β and ideally more than one:
- Massive write throughput β IoT, telemetry, clickstream (>100K writes/sec sustained).
- Single-key access pattern dominates β sessions, user profiles, product detail by ID.
- Schema is genuinely flexible β different records have different fields (CMS, multi-tenant SaaS).
- Data scale exceeds a single machine β petabytes, hundreds of millions of users.
- Geographic distribution β multi-region writes, low-latency edge serving.
- The data shape is the workload β graph for relationships, KV for cache, time-series for metrics.
- You're already paying the operational cost β a team that knows how to run Cassandra is a serious investment.
ALTER TABLE.Polyglot Persistence β Use Both, Without Apology
Real production systems rarely pick "one database". They pick the right database per workload and stitch them together. This is called polyglot persistence, and it's the modern norm β not an anti-pattern.
ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ
β Postgres β β MongoDB β β Redis β
β (system of β β (catalog + β β (sessions, β
β record: β β reviews) β β cart, β
β orders, β β β β rate- β
β payments, β β β β limits) β
β users) β β β β β
ββββββββ¬ββββββββ ββββββββ¬ββββββββ ββββββββββββββββ
β CDC β search-sync
βΌ βΌ
ββββββββββββββββ ββββββββββββββββ
β Elasticsearchβ β ClickHouse β
β (full-text β β (analytics β
β search) β β + BI) β
ββββββββββββββββ ββββββββββββββββ
- Postgres β system of record. ACID. Anything financial.
- MongoDB β flexible product catalog with reviews and variants embedded.
- Redis β sessions, cart contents, rate-limiters, leaderboards.
- Elasticsearch β full-text product search. Fed by Postgres via CDC.
- ClickHouse / BigQuery β analytics. Fed nightly or via streaming.
- Kafka β the glue. Every "interesting" event ends up on a topic; everyone subscribes who needs it.
Real-World Stacks β How the Big Ones Mix & Match
Learning from how scaled companies actually combine SQL and NoSQL is more useful than any flowchart.
| Company | Primary stores | Why |
|---|---|---|
| Uber | MySQL (Schemaless layer), Cassandra, Redis, MyRocks | Schemaless hides MySQL behind a KV API, scaled horizontally with sharding. Cassandra for trip events & geo data. |
| Netflix | Cassandra (most), DynamoDB, EVCache (Memcached), Elasticsearch | Global multi-region writes for viewing history; AP over CP because "the screen showing one episode behind" is fine. |
| Discord | Cassandra β ScyllaDB for messages, Postgres for users/guilds | Trillions of messages β wide-column. Relational metadata stays in Postgres. |
| Postgres (sharded with Pgbouncer), Cassandra, Redis, RocksDB | Postgres is still the system of record, sharded by user_id. Cassandra for the activity feed. | |
| Stripe | MongoDB (primary!), Redis, Kafka | Yes β financial data on Mongo. With careful invariants, multi-doc transactions, and audit trails. Proves "SQL for money" isn't absolute, just default. |
| Airbnb | MySQL (Vitess sharded), HBase, Druid, Redis | MySQL for bookings, HBase for the "search the world" inventory, Druid for analytics. |
Migration Pitfalls β Going SQL β NoSQL Hurts
Every team eventually considers "moving from SQL to NoSQL" (or back). The migration is rarely the database's fault β it's the access patterns and the application code.
- Pitfall 1 β Lift-and-shift fails. Copying a normalized SQL schema into MongoDB collections gives you the worst of both worlds: no joins, no schema enforcement, same write amplification. Redesign for the access pattern.
- Pitfall 2 β Ignoring the shard key. The shard/partition key is the most important schema decision. Pick wrong and the second migration follows a year later.
- Pitfall 3 β Lost ACID invariants. Foreign keys, CHECK constraints, and triggers in Postgres encode business rules. If you move off SQL, those rules now live in app code β every service, every consumer.
- Pitfall 4 β Reporting goes dark. "Run a quick SQL query for the CFO" becomes a Spark job. Always plan an analytics path (CDC β warehouse) before you migrate.
- Pitfall 5 β Operational maturity. Running Postgres is well-understood by every DBA on the planet. Running a 30-node Cassandra ring at 99.99% is a specialized skill. Hire or train before you migrate, not after.
Rapid-Fire Q&A β The Quick Hits
Short, sharp answers to the questions that come right before "do you have any questions for us?".
Functionally close, yes. JSONB columns + GIN indexes + pgvector + full-text + partitioning let it do most NoSQL tricks. It's still relational at heart, but "Postgres for everything" is a defensible architecture in 2026.
Multi-document transactions are slower; cross-collection joins ($lookup) are awkward; ad-hoc reporting is harder; and "schemaless" becomes "schema-everywhere-in-app-code". For relational data with strong invariants, Postgres is simpler.
MongoDB is a document store with rich querying (find, aggregation pipeline) and you run the cluster. DynamoDB is a managed KV / wide-column hybrid with rigid access by partition+sort key β less flexible queries, but zero operational overhead and predictable latency at any scale.
Massive, sustained write throughput; multi-datacenter writes with tunable consistency; time-series or event-log workloads; queries always known in advance. Cassandra punishes you for ad-hoc queries; rewards you for write-heavy known-pattern workloads.
Both. With AOF persistence enabled it's a durable store. In practice, most teams use it as a cache + ephemeral data store and back the data with Postgres or DynamoDB for the system of record.
Distributed databases that offer SQL + ACID at horizontal scale. Spanner, CockroachDB, YugabyteDB, TiDB. They use Paxos/Raft + SQL planners to give you the relational model without the single-machine ceiling β at the cost of higher write latency than a single-node Postgres.
Read-after-write surprises. User updates their profile, refreshes the page, sees the old value because the read hit a stale replica. Fix by routing post-write reads to the primary, or by using a "read-your-writes" session token.
Trick question. Partitions happen, so you always pay P. The real choice is C-or-A during a partition. CP refuses requests; AP serves stale ones.
Write down your top 5 queries before you write any DDL. The database that makes those queries cheap is the right one. Everything else β schema, indexes, scaling β flows from that.