← Back to Design & Development
High-Level Design

TinyURL / Short URL Service

From "redirect 20K times a second" to a sharded, cached, KGS-backed system — the architecture, the trade-offs, and the production shape that earns every box

Read this with the framework in mind

This deep-dive applies the 4-step HLD interview framework. As you read, map each section to Requirements → Entities → APIs → High-Level Design → Deep Dives, and notice which of the 8 common patterns and key technologies are at play.

Framework → 8 Patterns → Tech Cheat Sheet →
Step 1

Why URL Shortening?

Imagine Sarah is tweeting a link to a long blog article. The original URL is https://www.educative.io/collection/page/5668639101419520/5649050225344512/5668600916475904/ — 100+ characters that eat up half of her tweet, look ugly, and are easy to mistype. She runs it through TinyURL and gets back http://tinyurl.com/jlg8zpc — 25 characters, clean, clickable. Behind that one tap of the "shorten" button is a system that has to: generate a unique key nobody else has ever used, store it forever, and redirect billions of people to the right place in milliseconds. That's the system we're designing.

Beyond just being shorter, URL shorteners are useful for: tracking clicks (campaign analytics), hiding affiliated original URLs, device-optimized links, and QR codes that fit on a poster.

The two questions that drive every design decision below: (1) How do we generate a short, unique key for each URL without collisions? (2) How do we serve 20,000 redirects per second with under-100ms latency, when most of those redirects are for the same handful of viral links?
Step 2

Requirements & Goals

Before drawing a single box, pin down exactly what the system must do. In an interview, asking these questions shows you're not just regurgitating a memorized solution.

✅ Functional Requirements

  • Given a long URL, generate a shorter unique alias (the "short link")
  • When users hit a short link, redirect them to the original URL
  • Users can optionally pick a custom alias for their URL
  • Links expire after a default timespan; users can override the expiry

⚙️ Non-Functional Requirements

  • Highly available — if redirection is down, every link on the internet pointing at us is broken
  • Real-time redirection — under 100ms latency p99
  • Unguessable short keys — should not be predictable (for privacy & abuse)

➕ Extended

  • Analytics — click counts, geo, referrer, browser
  • REST APIs for other services to integrate
The non-functional requirements are the harder ones. Generating a unique key is a 30-line problem. Serving 20K reads/sec with global low latency, and never losing a single mapping over 5 years, is the part that actually requires architecture.
Step 3

Capacity Estimation & Constraints

Numbers are not optional in HLD. They drive every architectural choice — sharding, caching, load balancing — so do them out loud, even if rough. Our system is read-heavy: assume a 100:1 read-to-write ratio (every URL gets created once but redirected hundreds of times).

Traffic estimates

Assume 500 million new URLs created per month. With a 100:1 read ratio, that's 50 billion redirects/month.

Writes

~200 URLs/sec

500M / (30 × 86400)

Reads

~20K redirects/sec

100 × 200/sec

Ingress

~100 KB/s

200 × 500 bytes

Egress

~10 MB/s

20K × 500 bytes

Storage estimate (5 years)

500M new URLs × 12 months × 5 years = 30 billion records. At ~500 bytes per record (URL + metadata): ~15 TB total. With 70% capacity headroom: ~22 TB provisioned.

Cache estimate (the 80/20 rule)

20% of URLs generate 80% of traffic. Daily read volume is 20K × 86400 = ~1.7B requests/day. To cache the hot 20% in memory: 0.2 × 1.7B × 500 bytes ≈ 170 GB cache. A modern server has 256GB RAM — this fits on one box, but we'll replicate for redundancy.

MetricValueWhy it matters
New URLs/sec200/sDrives KGS throughput & write DB sizing
Redirects/sec20K/sDrives cache size, LB tier, and read replicas
5-yr storage15 TBForces sharding — too big for one box
Hot cache170 GBJustifies a Memcached/Redis tier in front of DB
Egress10 MB/sStandard — single LB tier handles it
Step 4

System APIs

Two endpoints carry 99% of the traffic: create a short URL, and resolve a short URL (via HTTP redirect). A third for delete. Defining APIs early locks down the contract before architecture.

REST API surface
// Create — write path, low QPS
POST /api/v1/urls
{
  "api_dev_key":  "abc123...",
  "original_url": "https://www.educative.io/...long-url...",
  "custom_alias": "my-blog-post",   // optional
  "user_name":    "sarah",          // optional
  "expire_date":  "2027-12-31"      // optional
}
→ 201 Created  { "short_url": "https://tinyurl.com/jlg8zpc" }

// Resolve — read path, high QPS, served via HTTP redirect not JSON
GET /:url_key
→ 302 Found  Location: https://www.educative.io/...long-url...
   // 302 (not 301) so we keep getting hits for analytics

// Delete
DELETE /api/v1/urls/:url_key
Headers: { "X-API-Dev-Key": "abc123..." }
→ 204 No Content
Why 302 Found and not 301 Moved Permanently? 301 lets the browser cache the redirect target and skip our server on the next click — great for latency, terrible for click-counting analytics. 302 makes the browser re-ask us each time, so every click flows through our telemetry. Trade-off: ~10ms per redirect we wouldn't otherwise spend, in exchange for accurate metrics.
Abuse prevention via api_dev_key: a malicious user could exhaust our 56.8 billion 6-character keys by spamming creates. We tie every create to a developer key and rate-limit per key (e.g., 1000 URLs/hour). Anonymous users go through a captcha-gated UI flow.
Step 5

Database Design

What does our data actually look like? Four observations: (1) we'll store billions of records, (2) each record is tiny — under 1KB, (3) there are no relationships between records other than "user X created URL Y", and (4) the workload is read-heavy with extreme key-based access. This points loud and clear at a NoSQL key-value store — DynamoDB, Cassandra, or Riak. Joins are nonexistent; horizontal scaling is mandatory.

erDiagram URL { string hash PK string original_url timestamp creation_date timestamp expiration_date bigint user_id FK } USER { bigint id PK string name string email timestamp creation_date timestamp last_login } USER ||--o{ URL : "creates"

The URL table's primary key is the short hash itself (the 6-8 character alias). This is the most-queried field by orders of magnitude, so making it the PK gives us O(1) lookups. The original_url is just a payload — we never search by it.

Why NoSQL beats SQL here: our access pattern is "give me the URL for hash X" — the simplest possible key-value query, repeated 20K times/sec. SQL's joins, transactions, and schema rigidity buy us nothing; their cost (vertical scaling, harder sharding) hurts. A wide-column NoSQL store like Cassandra also gives us free multi-region replication, which we'll need for global low-latency redirects.
Step 6 · CORE

High-Level Architecture — From Naive to Production

This is the section that wins or loses the interview. We'll build the architecture in three passes: the simplest thing that could plausibly work, why it falls apart, and the production shape where every box justifies itself. Numbers from §3 drive every decision.

Pass 1 — The naive design (and why it breaks)

Sketch the simplest possible system: one app server talks to one MySQL database. To create a URL, the app generates a hash and inserts it. To redirect, the app looks up the hash and returns a 302.

flowchart LR C[Client] --> APP[App Server] APP --> DB[("MySQL")]

Three concrete failures emerge the moment traffic shows up:

💥 Hash collisions at write time

Two users submit the same long URL — they get the same short hash. Or worse, two different URLs hash to the same 6-character prefix. Each create now costs a DB read (to check collision) before the write. At 200 writes/sec each adding a round trip, write latency creeps up — and collision-retry storms can spike it 10×.

💥 20K reads/sec crushes one DB

A single MySQL instance tops out around 5K-10K simple SELECTs/sec on commodity hardware. We need 20K, growing 20%/year. The DB CPU pegs at 100% and p99 latency balloons from 5ms to 500ms — every redirect on the internet now feels broken.

💥 15TB doesn't fit on one box

Five years of growth gives us 15TB of data. A single MySQL server with one disk can't hold it. Even if it could, backup/restore takes hours and a single disk failure loses every link we ever made — which is unacceptable for a service whose entire promise is "this link works forever".

Pass 2 — The mental model: Write Path vs. Read Path vs. Key Generation

The single most important insight in this design is that three workloads have wildly different shapes and must scale independently:

✍️ Write Path

200 req/sec. Low volume, but every request must be durable — losing a write means a user got back a short link that points to nothing. Latency budget: 50-100ms. Needs strong consistency on the unique-key constraint.

📖 Read Path

20,000 req/sec. 100× the write volume. Latency budget: 50ms p99 — a slow redirect is a broken-feeling link. Stale reads are fine (URLs don't change). Cacheable up to the moon.

🔑 Key Generation

Constant background process. Manufacture unique 6-char keys offline so the write path never has to retry on collisions. Decoupled from request traffic entirely — pre-fills a pool that workers drain.

This three-way split lets us scale the read path with cheap stateless replicas, keep the write path simple, and bury the collision problem inside a dedicated Key Generation Service (KGS) that runs on its own clock.

Pass 3 — The production shape

Now the full picture. Every node is numbered — find its matching card below to see what it does and crucially what would break without it.

flowchart TB CL["① Client — Browser, App, SDK"] subgraph EDGE["Edge Tier"] CDN["② CDN — CloudFront / Akamai"] LB["③ Load Balancer"] end subgraph WRITE["Write Path"] WAPP["④ Write App Server"] KGS["⑤ Key Generation Service"] KDB[("⑥ Key DB — unused + used pools")] end subgraph READ["Read Path"] RAPP["⑦ Read App Server"] CACHE["⑧ Cache — Memcached / Redis"] end subgraph DATA["Data Tier"] DB[("⑨ URL DB — Cassandra / DynamoDB")] end subgraph OPS["Async / Ops"] CLEAN["⑩ Cleanup Service"] TEL["⑪ Telemetry Pipeline"] end CL --> CDN CDN --> LB LB -->|"POST create"| WAPP LB -->|"GET resolve"| RAPP WAPP --> KGS KGS --> KDB WAPP --> DB RAPP --> CACHE CACHE -.miss.-> DB RAPP -.events.-> TEL CLEAN -.scans.-> DB CLEAN -.recycles keys.-> KDB style CL fill:#e8743b,stroke:#e8743b,color:#fff style CDN fill:#171d27,stroke:#3cbfbf,color:#d4dae5 style LB fill:#171d27,stroke:#9b72cf,color:#d4dae5 style WAPP fill:#171d27,stroke:#e8743b,color:#d4dae5 style KGS fill:#171d27,stroke:#9b72cf,color:#d4dae5 style KDB fill:#171d27,stroke:#9b72cf,color:#d4dae5 style RAPP fill:#171d27,stroke:#4a90d9,color:#d4dae5 style CACHE fill:#171d27,stroke:#3cbfbf,color:#d4dae5 style DB fill:#171d27,stroke:#38b265,color:#d4dae5 style CLEAN fill:#171d27,stroke:#d4a838,color:#d4dae5 style TEL fill:#171d27,stroke:#d4a838,color:#d4dae5

Component-by-component — what each numbered box does

Use the numbers in the diagram above to find the matching card. Each one answers what is this, why is it here, and what would break without it.

Client

Anything that hits a TinyURL — a browser following a tweet link, a mobile app deep-linking, an integration calling our API. The browser is the most common; it sends a plain HTTP GET on the short URL and expects a 302 redirect. From the client's view, the entire system is one URL.

Solves: nothing on its own — but every design choice flows backward from "what does the client experience?" Latency, availability, and the redirect protocol are all client-facing concerns.

CDN (CloudFront / Akamai)

A globally-distributed edge network that sits in front of the load balancer. For very hot URLs (think the top 1000 viral links), the CDN caches the 302 response at edge POPs around the world. A user in Tokyo following a hot link never reaches our origin servers — they get the redirect from a CDN node 20ms away.

Solves: global latency. Without a CDN, every redirect from Tokyo travels to our US-East data center and back — 200ms minimum on physics alone. A CDN turns that into 20ms for the head of the long-tail distribution, which is most of the traffic.

Load Balancer

The traffic cop. Sits in front of both write and read app servers, distributes incoming HTTP, and yanks unhealthy backends out of rotation via health checks every few seconds. Round-robin is fine to start; switch to least-connections once we have enough nodes that load skew matters.

Solves: single-server bottleneck and single-server failure. Without an LB, one app crash takes down the whole service. With it, we lose 1/N of capacity for a few seconds until health checks fail the bad node out.

Write App Server

Stateless service that handles POST /api/v1/urls. The flow per request: validate input → ask KGS for an unused key → write (hash, original_url, ...) to the URL DB → return the short URL. Total budget: under 100ms. Scaling is easy because it's stateless — add pods behind the LB.

Solves: isolating the heavy lifting (key generation, DB write) behind a clean API surface. Without a dedicated write tier, write traffic would compete with the much larger read traffic for app server CPU — causing reads to slow down whenever a write spike hits.

Key Generation Service (KGS)

A standalone service whose only job is to pre-generate random 6-character base62 keys and stockpile them in the Key DB. It runs on its own cadence — say batches of 10K keys every minute — and serves keys to write app servers from an in-memory queue. When a key is handed out, it's atomically moved from the "unused" pool to the "used" pool. With 56.8 billion possible keys and 200 writes/sec, the pool will outlast 5 years easily.

Solves: the core write-path problem. Without KGS, every write must do "generate hash → check DB for collision → retry if dup → insert", which is two round trips minimum and gets worse as the namespace fills up. With KGS, the write path is one round trip with zero collision risk by construction.

What if KGS dies? Single point of failure — solved by running an active-standby pair. Each write app server also caches a small batch of keys locally (~100 keys), so a 30-second KGS outage doesn't stop creates.

Key DB (KGS storage)

Two tables: unused_keys (the pool waiting to be claimed) and used_keys (already assigned). At 6 bytes per key × 56.8B keys this is ~341GB if we ever fully populate it — but in practice we only keep maybe 1M unused keys ahead of demand, so it's tiny. A simple key-value store works fine.

Solves: persisting the pool so KGS can restart without losing keys. If KGS held keys only in memory and crashed, we'd lose 10K unallocated keys per restart — acceptable, but persisting them lets us also support fancier schemes like "reserve a key for 5 minutes for a logged-in user".

Read App Server

Stateless service that handles GET /:url_key. The flow per request: parse hash from URL → check cache → on hit, return 302; on miss, query URL DB, populate cache, return 302 → emit a click event to telemetry. Latency budget: under 50ms. Scaled aggressively because read traffic is 100× write traffic.

Solves: serving the redirect workload at scale without touching the write path. Splitting reads from writes lets each tier scale independently — 50 read pods and 5 write pods, not 55 of each.

Cache (Memcached / Redis)

An in-memory key-value store holding the hottest 20% of URL mappings — about 170GB. Read flow: app server sends GET hash → cache returns the long URL in microseconds. Eviction policy: LRU — the least-recently-viewed mapping gets evicted first. Replicated for fault tolerance and to spread the load (no single cache node serves all 20K req/sec).

Solves: the 20K req/sec problem. Without a cache, every redirect hits the DB. With it, 80% of requests never touch the DB — letting the DB tier stay sized for the cold-tail 4K req/sec instead of the full 20K.

URL DB (Cassandra / DynamoDB)

The source of truth — a partitioned, replicated NoSQL key-value store holding all 30 billion (hash → original_url) mappings. Sharded by hash via consistent hashing, with each shard replicated 3× across availability zones. Cassandra's quorum reads (R=2, W=2, N=3) give us strong-enough consistency without sacrificing availability.

Solves: durable storage that scales horizontally. A single MySQL box maxes out around 1TB of healthy operation; we need 15TB. Cassandra spreads it across dozens of nodes and handles node failures automatically.

Cleanup Service

A background worker that scans the URL DB for expired entries (every link has an expiration_date) and either soft-deletes them or fully removes them, returning the freed key to the KGS unused pool for recycling. Runs in low-traffic windows. Also lazily expires URLs on read — if a read app server fetches an expired URL, it deletes it and returns 404.

Solves: preventing the DB from growing unbounded with abandoned/expired links. Without cleanup, our 5-year storage estimate would be 5-year accumulation forever — eventually we'd be paying to store 100TB of dead links nobody clicks.

Telemetry Pipeline

Every redirect emits a click event ({hash, timestamp, geo, referrer, user_agent}) to a Kafka topic. Downstream consumers fan out: real-time aggregator updates click counts in a Redis sorted set; batch ETL feeds a data warehouse for "show me the click history of tinyurl.com/abc". Crucially this is async — the read app server doesn't wait for telemetry before returning the 302.

Solves: click analytics without slowing down the redirect. If we tried to UPDATE click_count = click_count + 1 in the URL DB on every read, a viral link with 10K clicks/sec would melt that DB row. The async pipeline absorbs the spike and rolls up counts safely in the background.

Concrete walkthrough — Sarah shortens, Raj clicks

Two real flows, mapped to the numbered components above:

✍️ Write flow — Sarah shortens her blog link at 14:02:06

  1. Sarah's browser ① POSTs to tinyurl.com/api/v1/urls with her long URL.
  2. The Load Balancer ③ routes it to a Write App Server ④. (CDN ② doesn't cache write requests.)
  3. The write app calls KGS ⑤: "give me a key" → KGS pops jlg8zpc from its in-memory pool and atomically marks it used in Key DB ⑥.
  4. Write app inserts {hash: "jlg8zpc", original_url: "https://...", user_id: 42, expiration: 2030-12-31} into URL DB ⑨ — sharded by hash, replicated 3×.
  5. Write app returns {short_url: "https://tinyurl.com/jlg8zpc"}. Total elapsed: 60ms.

📖 Read flow — Raj in Tokyo clicks the link 8 hours later

  1. Raj's browser ① GETs tinyurl.com/jlg8zpc. DNS routes him to the nearest CDN ② edge in Tokyo.
  2. CDN cache hit? If yes → 302 returned in 15ms. Done.
  3. If no → CDN forwards to our origin Load Balancer ③ → Read App Server ⑦.
  4. Read app checks Cache ⑧ for jlg8zpc → hit (the link is hot today) → returns 302 with the long URL. ~30ms.
  5. If cache miss → read app queries URL DB ⑨, populates the cache, returns 302. ~80ms.
  6. Read app fires off a click event to Telemetry ⑪ asynchronously. CDN may also cache this 302 for next time.
So what: the architecture is built around three insights — (1) writes and reads have different shapes so they get different tiers; (2) collisions are a write-path problem solved by pre-generation in KGS, not by retries; (3) the read path is mostly cacheable, so the DB is sized for the cold tail not the hot peak. Every box in the diagram earns its place by removing one of those failure modes from Pass 1.
Step 7

Encoding the URL vs. Key Generation Service

How do we actually produce a 6-character key? Two approaches; the PDF reference walks through both. We'll explain why production systems pick KGS.

Approach A — Hash & encode the URL

Compute a hash (MD5 or SHA-256) of the long URL, base62-encode it, take the first 6 characters. Why 6? Base62 has 62 characters (A-Z, a-z, 0-9), so 62⁶ = 56.8 billion possible keys — comfortably more than the 30 billion we need over 5 years. (Base62 deliberately drops base64's - and _, which hurt readability and double-click-to-select in URLs.)

⚠️ Three problems with this approach

  • Same URL → same key. Two users shortening the same URL get the same short link. Acceptable? Sometimes, but you lose per-user analytics and can't expire one without expiring the other.
  • URL-encoding collisions. foo.com/page?id=design and foo.com/page%3Fid%3Ddesign are the same URL but hash differently — duplicate entries.
  • Truncation collisions. Two different URLs may hash to different full SHA-256s but identical first-6-character prefixes. Now you must do "compute → check DB for collision → if dup, take the next 6 chars or rehash with salt → check again". Each write is now 1-3 DB round-trips, and the collision rate gets worse as the namespace fills.

Mitigations: append a sequence number or user_id to the URL before hashing (eliminates "same URL → same key"), but this still has the truncation-collision problem at write time.

Approach B — Key Generation Service (KGS) ← production winner

Manufacture random 6-character keys offline, store them in a "unused" pool. Writes claim a key from the pool — guaranteed unique by construction, zero collision checks at request time.

sequenceDiagram participant W as Write App participant K as KGS participant KDB as Key DB participant U as URL DB Note over K,KDB: KGS pre-generates batches of keys in background K->>KDB: Generate 10K random keys, INSERT into unused KDB-->>K: OK W->>K: Give me a key K->>KDB: Atomically move 1 key from unused to used KDB-->>K: jlg8zpc K-->>W: jlg8zpc W->>U: INSERT (jlg8zpc, original_url, ...) U-->>W: OK

✅ Why KGS wins

  • Zero collision risk at write time — uniqueness is guaranteed by the unused pool
  • One round-trip per write (vs. 1-3 with hashing)
  • Same long URL submitted twice → two different short keys → independent analytics & expiry
  • Can pre-warm the pool, app servers cache 100 keys locally for burst tolerance
  • Recycle expired keys back into the pool — namespace never permanently fills

⚠️ KGS's own challenges

  • Single point of failure — solved with an active-standby replica that takes over via failover
  • Concurrency — two write apps must not get the same key. Solved by an atomic POP from a Redis set, or a row-level SELECT FOR UPDATE on the unused table
  • Lost keys on crash — if KGS hands a key to a write app that then crashes before INSERT, the key is gone. Acceptable: 56.8B keys, losing a few thousand to crashes is rounding error
The interview move: volunteer Approach A first ("simplest thing that could work"), then walk through its three failure modes, then introduce KGS as the production answer. This shows you understand why KGS exists, not just that it exists.
Deep Dive

KGS Deep Dive — How Key Generation Actually Works in Production

Section 7 told you why we pick a Key Generation Service over inline hashing. This section opens the box and shows you how it runs in production — and it starts by killing the most common misconception. Picture the write app server in the middle of a 200-requests-per-second storm, and on each request it asks: "give me a key nobody has ever used." The naive answer — "generate a random string right now and check the database" — is exactly what a good KGS is built to never do. Let's see why, and then build up the real machinery one idea at a time.

The one sentence to remember: a production KGS does not generate a key per request. It manufactures unique keys ahead of time, proves they're unique once, and then just hands them out — so the write path becomes a pop off a pile that's already guaranteed collision-free.

Pass 1 — The naive KGS (generate-on-request) and why it breaks

The simplest "key service" generates a random 6-character string the moment a write arrives, then asks the database "is this taken?" before using it. Here's that flow:

flowchart LR W[Write App] -->|"need a key"| G[Generate random 6 chars] G -->|"SELECT where hash=?"| DB[("URL DB")] DB -->|"taken!"| G DB -->|"free"| INS[INSERT mapping] style W fill:#171d27,stroke:#e8743b,color:#d4dae5 style G fill:#171d27,stroke:#e05252,color:#d4dae5 style DB fill:#171d27,stroke:#38b265,color:#d4dae5 style INS fill:#171d27,stroke:#38b265,color:#d4dae5

Two things go wrong the instant real traffic and real scale show up:

💥 A read-before-write on every single create

Every shorten request now pays for at least one extra DB round trip just to ask "is this key free?" As the keyspace fills (30B of 56.8B keys used), collisions get more frequent, so some requests loop "generate → check → collision → regenerate → check" several times. The 60ms write budget blows up under collision-retry storms.

💥 A race condition the check can't catch

Two write servers generate the same key at the same millisecond. Both run "SELECT where hash=X" → both see "free" → both INSERT. One silently overwrites the other, and a user who just shortened a link finds it now points somewhere else. The existence check is not atomic with the insert, so it can never fully prevent this.

Both failures share one root cause: we're doing uniqueness work on the hot path, under concurrency. The fix is to move that work off the hot path entirely.

Pass 2 — The mental model: pre-generate, then hand out disjoint blocks

The KGS idea is to flip an expensive per-request problem into a cheap one-time background job. There are two production-grade ways to manufacture the keys, and then one crucial pattern for handing them out safely.

🎲 Strategy A — Pre-generated random pool

A background job generates random 6-char base62 keys using a cryptographically-secure RNG (SecureRandom), and inserts each into an unused_keys table with a UNIQUE constraint. Duplicate generations simply fail the insert and are skipped — and because this runs offline, that collision check costs nobody any request latency.

Pro: keys are genuinely unpredictable (unguessable). Con: you must store the pool of unused keys.

🔢 Strategy B — Counter + base62 encode

Keep a single monotonic counter and base62-encode the integer to get the key: 1 → "b", 125000000 → "8M0kX". Collisions are mathematically impossible — each integer maps to exactly one string — so no existence check ever runs.

Pro: zero storage (just an integer), zero collisions ever. Con: raw counters are sequential and guessable, so you run the counter through a reversible scramble (Feistel / multiply-by-large-coprime mod 62⁶) so output looks random while staying unique.

Why "random" is misleading. Even Strategy A doesn't generate randomly per request — it generates randomly in advance. The randomness is a CSPRNG run by a background batch job, or (Strategy B) a deterministic scramble applied to a counter. Either way, uniqueness is proven before the key ever reaches a request.

Now the part that makes KGS actually scale across many app servers: block (range) allocation. If every write server asked KGS for one key at a time, KGS would be hit 200 times a second and become the bottleneck — and you'd still need locking to stop two servers getting the same key. Instead, each app server claims a whole block of keys at once and serves them from local memory.

flowchart LR A1["Write App 1
holds block [1 – 1,000,000]"] -->|"block exhausted,
give me the next block"| KGS["KGS
(owns the counter)"] A2["Write App 2
holds block [1,000,001 – 2,000,000]"] -->|"give me the next block"| KGS KGS -->|"atomic INCRBY 1,000,000"| CNT[("Counter / Key DB")] style A1 fill:#171d27,stroke:#e8743b,color:#d4dae5 style A2 fill:#171d27,stroke:#e8743b,color:#d4dae5 style KGS fill:#171d27,stroke:#9b72cf,color:#d4dae5 style CNT fill:#171d27,stroke:#9b72cf,color:#d4dae5

Think of it like a deli counter. KGS is the ticket dispenser. Instead of every customer (request) walking up to the dispenser one at a time, each cashier (app server) grabs a whole roll of 1,000,000 tickets up front and hands them out from their own roll. The dispenser is touched once per roll — roughly once per million URLs instead of once per URL. And because each cashier holds a different roll, two customers can never get the same ticket number. That disjoint-by-construction property is what eliminates both the collision check and the race condition from Pass 1.

Pass 3 — The production shape

Here's the full KGS internals. Every node is numbered — match it to the card below to see what it does and what would break without it.

flowchart TB WAPP["① Write App Server
(holds in-memory block)"] subgraph KGSBOX["Key Generation Service"] ALLOC["② Block Allocator
(atomic counter bump)"] GEN["③ Background Generator
(random pool / scramble)"] PRIMARY["④ Active KGS"] STANDBY["⑤ Standby KGS"] end CNT[("⑥ Counter Row / Key DB
unused + used pools")] RECYCLE["⑦ Recycled-Key Feed
(from Cleanup Service)"] WAPP -->|"need a block"| ALLOC ALLOC -->|"atomic INCRBY / SELECT FOR UPDATE"| CNT GEN -->|"prefill unused pool"| CNT PRIMARY -.failover.-> STANDBY RECYCLE -.expired keys back to unused.-> CNT ALLOC -->|"block [N … N+blk]"| WAPP style WAPP fill:#171d27,stroke:#e8743b,color:#d4dae5 style ALLOC fill:#171d27,stroke:#9b72cf,color:#d4dae5 style GEN fill:#171d27,stroke:#9b72cf,color:#d4dae5 style PRIMARY fill:#171d27,stroke:#9b72cf,color:#d4dae5 style STANDBY fill:#171d27,stroke:#7b8599,color:#d4dae5 style CNT fill:#171d27,stroke:#38b265,color:#d4dae5 style RECYCLE fill:#171d27,stroke:#d4a838,color:#d4dae5

Component-by-component — what each numbered box does

Write App Server (key consumer)

The stateless service handling POST /api/v1/urls. It does not call KGS per request — it holds a block of keys (say 1,000) in local memory and pops one per create. When its local block runs low (e.g. 10% remaining), it asks the allocator for the next block in the background, so it never blocks a user while topping up.

Solves: turning a network call into a memory read. Without local blocks, every create is a synchronous round-trip to KGS — adding latency and making KGS a 200 req/sec choke point.

Block Allocator

The only contended operation in the whole service: "give out the next block." It performs one atomic step — INCRBY counter 1000 in Redis, or SELECT … FOR UPDATE on a counter row in SQL — and returns the range [N … N+1000). Sub-millisecond, and it runs only once per ~1,000 creates.

Solves: the race condition from Pass 1. Because the bump is atomic and ranges are disjoint, two app servers can never receive overlapping keys — uniqueness is guaranteed by construction, with zero existence checks.

Background Generator

For the random-pool strategy (A), this batch job runs on its own clock — generate 10K random base62 keys, bulk-insert into unused_keys with a UNIQUE constraint, skip the rare duplicate. For the counter strategy (B), there's no pool to fill; this box is just the scramble function applied at hand-out time.

Solves: moving collision-checking off the hot path. All the "is this key taken?" work happens here, offline, where latency doesn't matter and a retry costs nobody anything.

④⑤ Active + Standby KGS

KGS is a single point of failure — if it's the only thing that can mint keys, its death stops every create on the platform. So we run an active-standby pair: the standby watches the active via heartbeat and takes over the counter (which is persisted in ⑥) within seconds if the active dies.

Solves: availability of the write path. Combined with the app-server local block cache (①), even a 30-second failover window doesn't drop a single create.

Counter Row / Key DB

The durable source of truth. For strategy B it's literally one integer (the high-water mark of allocated blocks); for strategy A it's two tables — unused_keys and used_keys. Persisting it means a KGS restart resumes from the last allocated block, never from the last used key.

Solves: not re-issuing keys after a crash. If the counter lived only in memory, a restart could replay block [1…1M] twice and hand the same keys to two different URLs — silent overwrites again.

Recycled-Key Feed

When the Cleanup Service (§11) expires a URL, its key is freed and pushed back into the unused_keys pool (strategy A only — counter-based keys aren't recycled, the space is large enough not to bother).

Solves: the namespace never permanently fills. Without recycling, every expired link would burn a key forever; with it, the 56.8B keyspace effectively never runs dry.

The end-to-end hand-out sequence

This is what actually happens when a write server's local block is exhausted and it needs more keys, then serves a create:

sequenceDiagram actor Sarah participant W as Write App (local block) participant K as KGS Block Allocator participant C as Counter / Key DB participant U as URL DB Note over W: Local block nearly empty (10 keys left) W->>K: Allocate next block (background, non-blocking) K->>C: Atomic INCRBY counter 1000 → returns 5,000,000 C-->>K: block [5,000,000 … 5,001,000) K-->>W: here is your block Note over W: Now holds 1,000 fresh keys in memory Sarah->>W: POST /urls (shorten long URL) W->>W: pop next key from local block → encode → "8M0kX" W->>U: INSERT (8M0kX, original_url, ...) U-->>W: OK W-->>Sarah: https://tinyurl.com/8M0kX (≈60ms, zero KGS call on this request)

The lifecycle of a single key

Each key moves through a small state machine. The interesting edges are the failure ones — what happens when a server dies mid-flight.

stateDiagram-v2 [*] --> UNUSED : generated / counter not yet reached UNUSED --> RESERVED : block handed to an app server RESERVED --> USED : INSERTed into URL DB RESERVED --> LOST : app server crashed before INSERT USED --> EXPIRED : expiration_date passed EXPIRED --> UNUSED : recycled by Cleanup Service LOST --> [*] : abandoned (acceptable waste) USED --> [*] : permanent link
On the LOST state — do NOT try to reclaim it. If an app server grabs block [5,000,000 … 5,001,000) and crashes after using only 300 keys, the remaining 700 are simply gone. That's fine: the keyspace is 56.8 billion, so leaking a few thousand keys per crash is a rounding error. Trying to detect and reclaim partially-used blocks reintroduces exactly the coordination and race problems KGS exists to avoid. Waste is cheaper than coordination.

A tiny reference implementation

Strategy B (counter + base62), the simplest production-grade approach — no pool to store, collisions impossible:

KGS — counter + base62 (Java-ish pseudocode)
class KeyGenerationService {
  // The ONE contended resource — bumped atomically, once per block.
  private final RedisClient redis;   // or a SQL counter row
  static final long BLOCK = 1_000;
  static final String ALPHABET =
     "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; // 62

  // Called by an app server ONLY when its local block runs low.
  long[] allocateBlock() {
    long end = redis.incrBy("url:counter", BLOCK); // atomic, returns new value
    return new long[]{ end - BLOCK, end };          // [start, end) — disjoint per caller
  }

  // Pure function — no DB, no collision check, runs in the app server's memory.
  String encode(long id) {
    id = scramble(id);            // reversible mix so keys look random, stay unique
    StringBuilder sb = new StringBuilder();
    do { sb.append(ALPHABET.charAt((int)(id % 62))); id /= 62; }
    while (id > 0);
    return sb.reverse().toString();   // e.g. 5_000_000 → "JpL2"
  }
}

Notice what's absent: there is no SELECT … WHERE hash = ? anywhere on the create path. The app server pops the next id from its in-memory block, calls encode(id), and inserts. The only network call to KGS is the once-per-1,000 allocateBlock(), done in the background before the block empties.

Failure modes at a glance

What failsWhat happensHow we handle it
App server crashes mid-blockUnused keys in its block are lostDo nothing — keyspace is huge, leak is negligible
KGS active node diesNo new blocks can be allocatedStandby fails over in seconds; app servers run on cached local block meanwhile
Counter not persisted before crashRisk of re-issuing a blockPersist the counter durably; resume from last allocated value, never last used
Two app servers request a block at onceCould overlapImpossible — the INCRBY / SELECT FOR UPDATE is atomic, ranges are disjoint
Random pool (strategy A) generates a dupInsert failsUNIQUE constraint rejects it offline; generator skips and continues
So what: in production, KGS doesn't "randomly generate a key" when you click shorten. It pre-computes uniqueness offline — either a CSPRNG-filled pool or an atomic counter you base62-encode — and hands app servers disjoint blocks they serve from memory. That single design move deletes the collision check, the race condition, and the per-request network hop all at once. The write path becomes: pop an integer, encode it, insert — collision-free by construction, no database question ever asked.
Step 8

Data Partitioning & Replication

15TB doesn't fit on one box, and even if it did we couldn't survive its failure. We must shard (split) the URL DB across many boxes and replicate each shard 3× for fault tolerance.

❌ Range-based sharding by first letter

"All URLs whose hash starts with A go to shard 1, B to shard 2…" Easy to implement, predictable. But: hashes aren't uniformly distributed across the alphabet, so some shards get hot. And custom aliases like my-blog-post all start with M — that shard melts.

✅ Hash-based sharding (with consistent hashing)

Compute shard = consistent_hash(url_hash) % N. Distribution is uniform by construction. Use consistent hashing (not modulo) so adding/removing a shard only relocates 1/N of keys instead of all of them — no global rebalancing during scale-out.

Replication strategy

Each shard is replicated to 3 nodes across 3 availability zones. Cassandra's R=2, W=2, N=3 gives us: writes acknowledged by 2 of 3 replicas (durable even if 1 zone dies), reads from 2 replicas with conflict resolution (consistent within seconds even if a node lags). Multi-region: extra replicas in EU and APAC for global low-latency reads, eventually consistent.

Why consistent hashing matters specifically here: our shard count will grow over time (5 shards today, 50 in 5 years). With plain hash % N, adding a shard means N→N+1, every key's home changes, and we have a multi-day data migration with the cluster degraded throughout. Consistent hashing makes adding a shard a few-hour reshuffle of 1/(N+1) of keys.
Step 9

Cache

The single biggest performance lever in the system. Without a cache, every redirect hits the DB. With a cache holding the hot 20%, the DB sees only 20% of read traffic.

📦 Sizing

170GB to hold the hot 20% (calculated in §3). Modern servers ship with 256GB+ RAM, but we use 4-6 cache nodes for redundancy and to spread the 20K QPS load. Memcached or Redis both work; Redis wins if we want to also cache click counts in sorted sets.

♻️ Eviction

LRU (least recently used). Hot URLs stay hot — a viral tweet's link lives in cache for days. Cold URLs naturally roll out as fresh content takes their slot. Implemented via a linked hash map / Redis LRU policy.

🔄 Invalidation

URLs are immutable (you can't edit where a short link points), so we only invalidate on delete/expire. The cleanup service explicitly invalidates the cache key when it expires a URL. No write-through complexity needed.

Cache replication & warm-up

Each cache node is replicated 2× — if one dies, we lose 1/N of the hot set briefly until it gets re-fetched from DB on next read (a brief latency bump, not an outage). On startup, cache nodes don't pre-warm; they fill organically as requests come in. The first few minutes after a deploy see slightly higher DB load, then steady state.

Step 10

Load Balancer

LBs sit at three layers in our system, and each plays a slightly different role.

① Client → App tier

Public-facing LB (AWS ALB / nginx). Distributes incoming HTTPS across read and write app pods. Health-checks every 5s, evicts unhealthy pods. Terminates TLS so backend pods don't pay the crypto cost.

② App → Cache

Client-side or sidecar LB. Uses consistent hashing on the URL hash so the same hash always goes to the same cache node — maximizing hit rate.

③ App → DB

Cassandra/DynamoDB drivers do this themselves — the client knows the cluster topology and routes to the right shard's coordinator node.

Algorithm choice

Start with Round Robin — simple, no overhead, no state. Upgrade to Least Connections once you see one pod's CPU consistently higher than others (usually means it caught a few long-running requests and never recovered). Smart LBs also do per-pod load reporting and shift traffic away from struggling nodes proactively.

The LB is itself a single point of failure — solved with an active-passive pair using virtual IP failover, or natively in cloud LBs (ALB is multi-AZ by default).
Step 11

Purging & DB Cleanup

URLs have expiration_date — what happens when it passes?

🐢 Lazy cleanup (preferred)

Don't actively scan for expired entries — wait until a user actually requests one. On read, the app server checks the expiration; if expired, returns 404 to the user and asynchronously deletes the row. Trade-off: expired-but-unread entries linger in the DB forever, but storage is cheap and we save a giant background scan.

🧹 Active cleanup (supplement)

A lightweight Cleanup Service ⑩ runs nightly during low-traffic windows, scanning for entries where expiration_date < now() and bulk-deleting them. Returns the freed keys to the KGS unused pool for recycling.

Default expiry policy

Two-year default expiry for free-tier URLs, configurable per request, with paid-tier URLs being permanent. Storage is cheap enough that we can also choose "never expire" as the default — most production shorteners do, because a dead link breaks user trust permanently.

Step 12

Telemetry & Security

Telemetry — answering "how many clicks did this link get?"

Every redirect emits an async event. Naïvely doing UPDATE click_count = click_count + 1 on the URL row would melt that row for any viral link (10K writes/sec on one row = lock contention hell). Instead:

flowchart LR RAPP[Read App Server] -.click event.-> KAFKA[Kafka topic] KAFKA --> AGG[Real-time aggregator] KAFKA --> ETL[Batch ETL] AGG --> RDS[("Redis sorted set
per-URL counts")] ETL --> WH[("Data warehouse
geo, browser, history")] style KAFKA fill:#171d27,stroke:#d4a838,color:#d4dae5 style RDS fill:#171d27,stroke:#e05252,color:#d4dae5 style WH fill:#171d27,stroke:#9b72cf,color:#d4dae5

Counts that we care about (this URL got 1.2M clicks) live in Redis sorted sets — atomic ZINCRBY handles 100K writes/sec on one node. Detailed per-event data (geo, referrer, user_agent) flows to a data warehouse for ad-hoc analytics queries.

Security & Permissions

🔐 Private URLs

Add a visibility column: PUBLIC (anyone can resolve) or PRIVATE (only the creator + whitelisted users). On read, the app server checks the auth token against an ACL table; on mismatch, return 401 instead of 302.

🛡️ Abuse / phishing

Submitted URLs go through a real-time Google Safe Browsing check before being shortened. Rejected URLs return 400. Reactive: a "report abuse" link on every redirect page; reported URLs go into a moderation queue.

🚫 Rate limiting

Per-API-key quota — 1000 URLs/hour for free tier, higher for paid. Implemented at the LB or in a dedicated rate-limiter service (token bucket in Redis). Anonymous creates require captcha.

🔒 Unguessable keys

KGS-generated keys are random 6-character base62, not sequential. An attacker can't enumerate /aaaaaa, /aaaaab, ... to discover private URLs — the keyspace is 56.8B, sparsely populated, so brute force is infeasible.

Step 13

Interview Q&A

Why 302 redirects instead of 301?
Analytics. 301 lets the browser cache the redirect target permanently — every subsequent click bypasses our server, killing click counts. 302 forces the browser to ask us each time. Trade-off: ~10ms extra latency per click for accurate metrics. Bitly, TinyURL, and most shorteners pick 302.
Why NoSQL over SQL for the URL store?
Access pattern is pure key-value at huge scale. Our only query is "give me the URL for hash X" — no joins, no transactions, no complex filters. SQL's strengths (joins, ACID) buy us nothing; its weaknesses (vertical scaling ceiling, hard sharding) hurt. NoSQL like Cassandra/DynamoDB gives us automatic sharding, multi-region replication, and predictable single-digit-ms reads at any scale.
What if KGS goes down?
Three layers of defense. (1) Active-standby KGS pair with failover in seconds. (2) Each write app server caches ~100 unallocated keys locally — a 30-second KGS outage means writes still succeed from local cache. (3) If both fail, write apps degrade to inline hashing as fallback (slower, occasional collisions, but doesn't drop writes).
How do you handle a viral link getting 100K req/sec?
The cache + CDN absorb it entirely. A viral link is the definition of "hot" — it's in every CDN edge POP and every Memcached node. Origin DB sees zero traffic from it. The only choke point is bandwidth out of CDN, which is what CDNs are explicitly built to handle. Click telemetry is async via Kafka, which buffers the spike without dropping events.
How do you prevent collisions in the KGS unused pool?
KGS generates random keys (not sequential), then INSERTs into the unused table with a UNIQUE constraint. Duplicates fail and are skipped. With 56.8B keyspace and only 30B keys ever issued over 5 years, the birthday-paradox collision rate during generation is negligible (under 1 collision per 100K generated). Pool prefill makes this completely invisible to the write path.
How do you scale globally for low-latency reads?
Two layers. (1) CDN caches the 302 response at edge POPs in 200+ cities — a Tokyo user hits a Tokyo edge node, not US-East. (2) Multi-region DB replicas for cache misses — Cassandra/DynamoDB Global Tables replicate writes to EU and APAC within seconds. Read app servers in each region hit their local replica. Trade-off: writes are still single-region with global replication lag (~1-5s), so a URL just created in US-East might 404 in Tokyo for a few seconds. Acceptable for our use case.
What's the storage cost trade-off of "never expire" vs. "2-year default"?
15TB at 5 years vs. effectively unbounded growth. At AWS S3-class pricing (~$0.023/GB/mo for cold tier), 15TB costs ~$345/month — trivial. The reputational cost of expiring a link someone bookmarked 3 years ago is much higher. Most production shorteners default to permanent and only expire on user request or for spam.
How do you migrate from MySQL to Cassandra without downtime?
Dual-write pattern. Phase 1: read app servers read from MySQL only, write apps write to BOTH MySQL and Cassandra. Phase 2: backfill historical MySQL data into Cassandra via an ETL job. Phase 3: shadow-read from Cassandra and compare with MySQL for a week to verify correctness. Phase 4: flip read traffic to Cassandra, keep MySQL as fallback for a month. Phase 5: decommission MySQL. Total: ~6 weeks of planned migration with zero user-visible downtime.
The one-line summary the interviewer remembers: "It's a NoSQL key-value store partitioned with consistent hashing, fronted by a Memcached cache and a CDN, with a separate Key Generation Service that pre-generates unique 6-char keys offline so the write path never collides — three independent tiers each scaled to its own workload."