From "redirect 20K times a second" to a sharded, cached, KGS-backed system — the architecture, the trade-offs, and the production shape that earns every box
This deep-dive applies the 4-step HLD interview framework. As you read, map each section to Requirements → Entities → APIs → High-Level Design → Deep Dives, and notice which of the 8 common patterns and key technologies are at play.
Imagine Sarah is tweeting a link to a long blog article. The original URL is https://www.educative.io/collection/page/5668639101419520/5649050225344512/5668600916475904/ — 100+ characters that eat up half of her tweet, look ugly, and are easy to mistype. She runs it through TinyURL and gets back http://tinyurl.com/jlg8zpc — 25 characters, clean, clickable. Behind that one tap of the "shorten" button is a system that has to: generate a unique key nobody else has ever used, store it forever, and redirect billions of people to the right place in milliseconds. That's the system we're designing.
Beyond just being shorter, URL shorteners are useful for: tracking clicks (campaign analytics), hiding affiliated original URLs, device-optimized links, and QR codes that fit on a poster.
Before drawing a single box, pin down exactly what the system must do. In an interview, asking these questions shows you're not just regurgitating a memorized solution.
Numbers are not optional in HLD. They drive every architectural choice — sharding, caching, load balancing — so do them out loud, even if rough. Our system is read-heavy: assume a 100:1 read-to-write ratio (every URL gets created once but redirected hundreds of times).
Assume 500 million new URLs created per month. With a 100:1 read ratio, that's 50 billion redirects/month.
~200 URLs/sec
500M / (30 × 86400)
~20K redirects/sec
100 × 200/sec
~100 KB/s
200 × 500 bytes
~10 MB/s
20K × 500 bytes
500M new URLs × 12 months × 5 years = 30 billion records. At ~500 bytes per record (URL + metadata): ~15 TB total. With 70% capacity headroom: ~22 TB provisioned.
20% of URLs generate 80% of traffic. Daily read volume is 20K × 86400 = ~1.7B requests/day. To cache the hot 20% in memory: 0.2 × 1.7B × 500 bytes ≈ 170 GB cache. A modern server has 256GB RAM — this fits on one box, but we'll replicate for redundancy.
| Metric | Value | Why it matters |
|---|---|---|
| New URLs/sec | 200/s | Drives KGS throughput & write DB sizing |
| Redirects/sec | 20K/s | Drives cache size, LB tier, and read replicas |
| 5-yr storage | 15 TB | Forces sharding — too big for one box |
| Hot cache | 170 GB | Justifies a Memcached/Redis tier in front of DB |
| Egress | 10 MB/s | Standard — single LB tier handles it |
Two endpoints carry 99% of the traffic: create a short URL, and resolve a short URL (via HTTP redirect). A third for delete. Defining APIs early locks down the contract before architecture.
REST API surface// Create — write path, low QPS POST /api/v1/urls { "api_dev_key": "abc123...", "original_url": "https://www.educative.io/...long-url...", "custom_alias": "my-blog-post", // optional "user_name": "sarah", // optional "expire_date": "2027-12-31" // optional } → 201 Created { "short_url": "https://tinyurl.com/jlg8zpc" } // Resolve — read path, high QPS, served via HTTP redirect not JSON GET /:url_key → 302 Found Location: https://www.educative.io/...long-url... // 302 (not 301) so we keep getting hits for analytics // Delete DELETE /api/v1/urls/:url_key Headers: { "X-API-Dev-Key": "abc123..." } → 204 No Content
302 Found and not 301 Moved Permanently? 301 lets the browser cache the redirect target and skip our server on the next click — great for latency, terrible for click-counting analytics. 302 makes the browser re-ask us each time, so every click flows through our telemetry. Trade-off: ~10ms per redirect we wouldn't otherwise spend, in exchange for accurate metrics.api_dev_key: a malicious user could exhaust our 56.8 billion 6-character keys by spamming creates. We tie every create to a developer key and rate-limit per key (e.g., 1000 URLs/hour). Anonymous users go through a captcha-gated UI flow.What does our data actually look like? Four observations: (1) we'll store billions of records, (2) each record is tiny — under 1KB, (3) there are no relationships between records other than "user X created URL Y", and (4) the workload is read-heavy with extreme key-based access. This points loud and clear at a NoSQL key-value store — DynamoDB, Cassandra, or Riak. Joins are nonexistent; horizontal scaling is mandatory.
The URL table's primary key is the short hash itself (the 6-8 character alias). This is the most-queried field by orders of magnitude, so making it the PK gives us O(1) lookups. The original_url is just a payload — we never search by it.
This is the section that wins or loses the interview. We'll build the architecture in three passes: the simplest thing that could plausibly work, why it falls apart, and the production shape where every box justifies itself. Numbers from §3 drive every decision.
Sketch the simplest possible system: one app server talks to one MySQL database. To create a URL, the app generates a hash and inserts it. To redirect, the app looks up the hash and returns a 302.
Three concrete failures emerge the moment traffic shows up:
Two users submit the same long URL — they get the same short hash. Or worse, two different URLs hash to the same 6-character prefix. Each create now costs a DB read (to check collision) before the write. At 200 writes/sec each adding a round trip, write latency creeps up — and collision-retry storms can spike it 10×.
A single MySQL instance tops out around 5K-10K simple SELECTs/sec on commodity hardware. We need 20K, growing 20%/year. The DB CPU pegs at 100% and p99 latency balloons from 5ms to 500ms — every redirect on the internet now feels broken.
Five years of growth gives us 15TB of data. A single MySQL server with one disk can't hold it. Even if it could, backup/restore takes hours and a single disk failure loses every link we ever made — which is unacceptable for a service whose entire promise is "this link works forever".
The single most important insight in this design is that three workloads have wildly different shapes and must scale independently:
200 req/sec. Low volume, but every request must be durable — losing a write means a user got back a short link that points to nothing. Latency budget: 50-100ms. Needs strong consistency on the unique-key constraint.
20,000 req/sec. 100× the write volume. Latency budget: 50ms p99 — a slow redirect is a broken-feeling link. Stale reads are fine (URLs don't change). Cacheable up to the moon.
Constant background process. Manufacture unique 6-char keys offline so the write path never has to retry on collisions. Decoupled from request traffic entirely — pre-fills a pool that workers drain.
This three-way split lets us scale the read path with cheap stateless replicas, keep the write path simple, and bury the collision problem inside a dedicated Key Generation Service (KGS) that runs on its own clock.
Now the full picture. Every node is numbered — find its matching card below to see what it does and crucially what would break without it.
Use the numbers in the diagram above to find the matching card. Each one answers what is this, why is it here, and what would break without it.
Anything that hits a TinyURL — a browser following a tweet link, a mobile app deep-linking, an integration calling our API. The browser is the most common; it sends a plain HTTP GET on the short URL and expects a 302 redirect. From the client's view, the entire system is one URL.
Solves: nothing on its own — but every design choice flows backward from "what does the client experience?" Latency, availability, and the redirect protocol are all client-facing concerns.
A globally-distributed edge network that sits in front of the load balancer. For very hot URLs (think the top 1000 viral links), the CDN caches the 302 response at edge POPs around the world. A user in Tokyo following a hot link never reaches our origin servers — they get the redirect from a CDN node 20ms away.
Solves: global latency. Without a CDN, every redirect from Tokyo travels to our US-East data center and back — 200ms minimum on physics alone. A CDN turns that into 20ms for the head of the long-tail distribution, which is most of the traffic.
The traffic cop. Sits in front of both write and read app servers, distributes incoming HTTP, and yanks unhealthy backends out of rotation via health checks every few seconds. Round-robin is fine to start; switch to least-connections once we have enough nodes that load skew matters.
Solves: single-server bottleneck and single-server failure. Without an LB, one app crash takes down the whole service. With it, we lose 1/N of capacity for a few seconds until health checks fail the bad node out.
Stateless service that handles POST /api/v1/urls. The flow per request: validate input → ask KGS for an unused key → write (hash, original_url, ...) to the URL DB → return the short URL. Total budget: under 100ms. Scaling is easy because it's stateless — add pods behind the LB.
Solves: isolating the heavy lifting (key generation, DB write) behind a clean API surface. Without a dedicated write tier, write traffic would compete with the much larger read traffic for app server CPU — causing reads to slow down whenever a write spike hits.
A standalone service whose only job is to pre-generate random 6-character base62 keys and stockpile them in the Key DB. It runs on its own cadence — say batches of 10K keys every minute — and serves keys to write app servers from an in-memory queue. When a key is handed out, it's atomically moved from the "unused" pool to the "used" pool. With 56.8 billion possible keys and 200 writes/sec, the pool will outlast 5 years easily.
Solves: the core write-path problem. Without KGS, every write must do "generate hash → check DB for collision → retry if dup → insert", which is two round trips minimum and gets worse as the namespace fills up. With KGS, the write path is one round trip with zero collision risk by construction.
What if KGS dies? Single point of failure — solved by running an active-standby pair. Each write app server also caches a small batch of keys locally (~100 keys), so a 30-second KGS outage doesn't stop creates.
Two tables: unused_keys (the pool waiting to be claimed) and used_keys (already assigned). At 6 bytes per key × 56.8B keys this is ~341GB if we ever fully populate it — but in practice we only keep maybe 1M unused keys ahead of demand, so it's tiny. A simple key-value store works fine.
Solves: persisting the pool so KGS can restart without losing keys. If KGS held keys only in memory and crashed, we'd lose 10K unallocated keys per restart — acceptable, but persisting them lets us also support fancier schemes like "reserve a key for 5 minutes for a logged-in user".
Stateless service that handles GET /:url_key. The flow per request: parse hash from URL → check cache → on hit, return 302; on miss, query URL DB, populate cache, return 302 → emit a click event to telemetry. Latency budget: under 50ms. Scaled aggressively because read traffic is 100× write traffic.
Solves: serving the redirect workload at scale without touching the write path. Splitting reads from writes lets each tier scale independently — 50 read pods and 5 write pods, not 55 of each.
An in-memory key-value store holding the hottest 20% of URL mappings — about 170GB. Read flow: app server sends GET hash → cache returns the long URL in microseconds. Eviction policy: LRU — the least-recently-viewed mapping gets evicted first. Replicated for fault tolerance and to spread the load (no single cache node serves all 20K req/sec).
Solves: the 20K req/sec problem. Without a cache, every redirect hits the DB. With it, 80% of requests never touch the DB — letting the DB tier stay sized for the cold-tail 4K req/sec instead of the full 20K.
The source of truth — a partitioned, replicated NoSQL key-value store holding all 30 billion (hash → original_url) mappings. Sharded by hash via consistent hashing, with each shard replicated 3× across availability zones. Cassandra's quorum reads (R=2, W=2, N=3) give us strong-enough consistency without sacrificing availability.
Solves: durable storage that scales horizontally. A single MySQL box maxes out around 1TB of healthy operation; we need 15TB. Cassandra spreads it across dozens of nodes and handles node failures automatically.
A background worker that scans the URL DB for expired entries (every link has an expiration_date) and either soft-deletes them or fully removes them, returning the freed key to the KGS unused pool for recycling. Runs in low-traffic windows. Also lazily expires URLs on read — if a read app server fetches an expired URL, it deletes it and returns 404.
Solves: preventing the DB from growing unbounded with abandoned/expired links. Without cleanup, our 5-year storage estimate would be 5-year accumulation forever — eventually we'd be paying to store 100TB of dead links nobody clicks.
Every redirect emits a click event ({hash, timestamp, geo, referrer, user_agent}) to a Kafka topic. Downstream consumers fan out: real-time aggregator updates click counts in a Redis sorted set; batch ETL feeds a data warehouse for "show me the click history of tinyurl.com/abc". Crucially this is async — the read app server doesn't wait for telemetry before returning the 302.
Solves: click analytics without slowing down the redirect. If we tried to UPDATE click_count = click_count + 1 in the URL DB on every read, a viral link with 10K clicks/sec would melt that DB row. The async pipeline absorbs the spike and rolls up counts safely in the background.
Two real flows, mapped to the numbered components above:
tinyurl.com/api/v1/urls with her long URL.jlg8zpc from its in-memory pool and atomically marks it used in Key DB ⑥.{hash: "jlg8zpc", original_url: "https://...", user_id: 42, expiration: 2030-12-31} into URL DB ⑨ — sharded by hash, replicated 3×.{short_url: "https://tinyurl.com/jlg8zpc"}. Total elapsed: 60ms.tinyurl.com/jlg8zpc. DNS routes him to the nearest CDN ② edge in Tokyo.jlg8zpc → hit (the link is hot today) → returns 302 with the long URL. ~30ms.How do we actually produce a 6-character key? Two approaches; the PDF reference walks through both. We'll explain why production systems pick KGS.
Compute a hash (MD5 or SHA-256) of the long URL, base62-encode it, take the first 6 characters. Why 6? Base62 has 62 characters (A-Z, a-z, 0-9), so 62⁶ = 56.8 billion possible keys — comfortably more than the 30 billion we need over 5 years. (Base62 deliberately drops base64's - and _, which hurt readability and double-click-to-select in URLs.)
foo.com/page?id=design and foo.com/page%3Fid%3Ddesign are the same URL but hash differently — duplicate entries.Mitigations: append a sequence number or user_id to the URL before hashing (eliminates "same URL → same key"), but this still has the truncation-collision problem at write time.
Manufacture random 6-character keys offline, store them in a "unused" pool. Writes claim a key from the pool — guaranteed unique by construction, zero collision checks at request time.
POP from a Redis set, or a row-level SELECT FOR UPDATE on the unused tableSection 7 told you why we pick a Key Generation Service over inline hashing. This section opens the box and shows you how it runs in production — and it starts by killing the most common misconception. Picture the write app server in the middle of a 200-requests-per-second storm, and on each request it asks: "give me a key nobody has ever used." The naive answer — "generate a random string right now and check the database" — is exactly what a good KGS is built to never do. Let's see why, and then build up the real machinery one idea at a time.
The simplest "key service" generates a random 6-character string the moment a write arrives, then asks the database "is this taken?" before using it. Here's that flow:
Two things go wrong the instant real traffic and real scale show up:
Every shorten request now pays for at least one extra DB round trip just to ask "is this key free?" As the keyspace fills (30B of 56.8B keys used), collisions get more frequent, so some requests loop "generate → check → collision → regenerate → check" several times. The 60ms write budget blows up under collision-retry storms.
Two write servers generate the same key at the same millisecond. Both run "SELECT where hash=X" → both see "free" → both INSERT. One silently overwrites the other, and a user who just shortened a link finds it now points somewhere else. The existence check is not atomic with the insert, so it can never fully prevent this.
Both failures share one root cause: we're doing uniqueness work on the hot path, under concurrency. The fix is to move that work off the hot path entirely.
The KGS idea is to flip an expensive per-request problem into a cheap one-time background job. There are two production-grade ways to manufacture the keys, and then one crucial pattern for handing them out safely.
A background job generates random 6-char base62 keys using a cryptographically-secure RNG (SecureRandom), and inserts each into an unused_keys table with a UNIQUE constraint. Duplicate generations simply fail the insert and are skipped — and because this runs offline, that collision check costs nobody any request latency.
Pro: keys are genuinely unpredictable (unguessable). Con: you must store the pool of unused keys.
Keep a single monotonic counter and base62-encode the integer to get the key: 1 → "b", 125000000 → "8M0kX". Collisions are mathematically impossible — each integer maps to exactly one string — so no existence check ever runs.
Pro: zero storage (just an integer), zero collisions ever. Con: raw counters are sequential and guessable, so you run the counter through a reversible scramble (Feistel / multiply-by-large-coprime mod 62⁶) so output looks random while staying unique.
Now the part that makes KGS actually scale across many app servers: block (range) allocation. If every write server asked KGS for one key at a time, KGS would be hit 200 times a second and become the bottleneck — and you'd still need locking to stop two servers getting the same key. Instead, each app server claims a whole block of keys at once and serves them from local memory.
Think of it like a deli counter. KGS is the ticket dispenser. Instead of every customer (request) walking up to the dispenser one at a time, each cashier (app server) grabs a whole roll of 1,000,000 tickets up front and hands them out from their own roll. The dispenser is touched once per roll — roughly once per million URLs instead of once per URL. And because each cashier holds a different roll, two customers can never get the same ticket number. That disjoint-by-construction property is what eliminates both the collision check and the race condition from Pass 1.
Here's the full KGS internals. Every node is numbered — match it to the card below to see what it does and what would break without it.
The stateless service handling POST /api/v1/urls. It does not call KGS per request — it holds a block of keys (say 1,000) in local memory and pops one per create. When its local block runs low (e.g. 10% remaining), it asks the allocator for the next block in the background, so it never blocks a user while topping up.
Solves: turning a network call into a memory read. Without local blocks, every create is a synchronous round-trip to KGS — adding latency and making KGS a 200 req/sec choke point.
The only contended operation in the whole service: "give out the next block." It performs one atomic step — INCRBY counter 1000 in Redis, or SELECT … FOR UPDATE on a counter row in SQL — and returns the range [N … N+1000). Sub-millisecond, and it runs only once per ~1,000 creates.
Solves: the race condition from Pass 1. Because the bump is atomic and ranges are disjoint, two app servers can never receive overlapping keys — uniqueness is guaranteed by construction, with zero existence checks.
For the random-pool strategy (A), this batch job runs on its own clock — generate 10K random base62 keys, bulk-insert into unused_keys with a UNIQUE constraint, skip the rare duplicate. For the counter strategy (B), there's no pool to fill; this box is just the scramble function applied at hand-out time.
Solves: moving collision-checking off the hot path. All the "is this key taken?" work happens here, offline, where latency doesn't matter and a retry costs nobody anything.
KGS is a single point of failure — if it's the only thing that can mint keys, its death stops every create on the platform. So we run an active-standby pair: the standby watches the active via heartbeat and takes over the counter (which is persisted in ⑥) within seconds if the active dies.
Solves: availability of the write path. Combined with the app-server local block cache (①), even a 30-second failover window doesn't drop a single create.
The durable source of truth. For strategy B it's literally one integer (the high-water mark of allocated blocks); for strategy A it's two tables — unused_keys and used_keys. Persisting it means a KGS restart resumes from the last allocated block, never from the last used key.
Solves: not re-issuing keys after a crash. If the counter lived only in memory, a restart could replay block [1…1M] twice and hand the same keys to two different URLs — silent overwrites again.
When the Cleanup Service (§11) expires a URL, its key is freed and pushed back into the unused_keys pool (strategy A only — counter-based keys aren't recycled, the space is large enough not to bother).
Solves: the namespace never permanently fills. Without recycling, every expired link would burn a key forever; with it, the 56.8B keyspace effectively never runs dry.
This is what actually happens when a write server's local block is exhausted and it needs more keys, then serves a create:
Each key moves through a small state machine. The interesting edges are the failure ones — what happens when a server dies mid-flight.
LOST state — do NOT try to reclaim it. If an app server grabs block [5,000,000 … 5,001,000) and crashes after using only 300 keys, the remaining 700 are simply gone. That's fine: the keyspace is 56.8 billion, so leaking a few thousand keys per crash is a rounding error. Trying to detect and reclaim partially-used blocks reintroduces exactly the coordination and race problems KGS exists to avoid. Waste is cheaper than coordination.Strategy B (counter + base62), the simplest production-grade approach — no pool to store, collisions impossible:
KGS — counter + base62 (Java-ish pseudocode)class KeyGenerationService { // The ONE contended resource — bumped atomically, once per block. private final RedisClient redis; // or a SQL counter row static final long BLOCK = 1_000; static final String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; // 62 // Called by an app server ONLY when its local block runs low. long[] allocateBlock() { long end = redis.incrBy("url:counter", BLOCK); // atomic, returns new value return new long[]{ end - BLOCK, end }; // [start, end) — disjoint per caller } // Pure function — no DB, no collision check, runs in the app server's memory. String encode(long id) { id = scramble(id); // reversible mix so keys look random, stay unique StringBuilder sb = new StringBuilder(); do { sb.append(ALPHABET.charAt((int)(id % 62))); id /= 62; } while (id > 0); return sb.reverse().toString(); // e.g. 5_000_000 → "JpL2" } }
Notice what's absent: there is no SELECT … WHERE hash = ? anywhere on the create path. The app server pops the next id from its in-memory block, calls encode(id), and inserts. The only network call to KGS is the once-per-1,000 allocateBlock(), done in the background before the block empties.
| What fails | What happens | How we handle it |
|---|---|---|
| App server crashes mid-block | Unused keys in its block are lost | Do nothing — keyspace is huge, leak is negligible |
| KGS active node dies | No new blocks can be allocated | Standby fails over in seconds; app servers run on cached local block meanwhile |
| Counter not persisted before crash | Risk of re-issuing a block | Persist the counter durably; resume from last allocated value, never last used |
| Two app servers request a block at once | Could overlap | Impossible — the INCRBY / SELECT FOR UPDATE is atomic, ranges are disjoint |
| Random pool (strategy A) generates a dup | Insert fails | UNIQUE constraint rejects it offline; generator skips and continues |
15TB doesn't fit on one box, and even if it did we couldn't survive its failure. We must shard (split) the URL DB across many boxes and replicate each shard 3× for fault tolerance.
"All URLs whose hash starts with A go to shard 1, B to shard 2…" Easy to implement, predictable. But: hashes aren't uniformly distributed across the alphabet, so some shards get hot. And custom aliases like my-blog-post all start with M — that shard melts.
Compute shard = consistent_hash(url_hash) % N. Distribution is uniform by construction. Use consistent hashing (not modulo) so adding/removing a shard only relocates 1/N of keys instead of all of them — no global rebalancing during scale-out.
Each shard is replicated to 3 nodes across 3 availability zones. Cassandra's R=2, W=2, N=3 gives us: writes acknowledged by 2 of 3 replicas (durable even if 1 zone dies), reads from 2 replicas with conflict resolution (consistent within seconds even if a node lags). Multi-region: extra replicas in EU and APAC for global low-latency reads, eventually consistent.
hash % N, adding a shard means N→N+1, every key's home changes, and we have a multi-day data migration with the cluster degraded throughout. Consistent hashing makes adding a shard a few-hour reshuffle of 1/(N+1) of keys.The single biggest performance lever in the system. Without a cache, every redirect hits the DB. With a cache holding the hot 20%, the DB sees only 20% of read traffic.
170GB to hold the hot 20% (calculated in §3). Modern servers ship with 256GB+ RAM, but we use 4-6 cache nodes for redundancy and to spread the 20K QPS load. Memcached or Redis both work; Redis wins if we want to also cache click counts in sorted sets.
LRU (least recently used). Hot URLs stay hot — a viral tweet's link lives in cache for days. Cold URLs naturally roll out as fresh content takes their slot. Implemented via a linked hash map / Redis LRU policy.
URLs are immutable (you can't edit where a short link points), so we only invalidate on delete/expire. The cleanup service explicitly invalidates the cache key when it expires a URL. No write-through complexity needed.
Each cache node is replicated 2× — if one dies, we lose 1/N of the hot set briefly until it gets re-fetched from DB on next read (a brief latency bump, not an outage). On startup, cache nodes don't pre-warm; they fill organically as requests come in. The first few minutes after a deploy see slightly higher DB load, then steady state.
LBs sit at three layers in our system, and each plays a slightly different role.
Public-facing LB (AWS ALB / nginx). Distributes incoming HTTPS across read and write app pods. Health-checks every 5s, evicts unhealthy pods. Terminates TLS so backend pods don't pay the crypto cost.
Client-side or sidecar LB. Uses consistent hashing on the URL hash so the same hash always goes to the same cache node — maximizing hit rate.
Cassandra/DynamoDB drivers do this themselves — the client knows the cluster topology and routes to the right shard's coordinator node.
Start with Round Robin — simple, no overhead, no state. Upgrade to Least Connections once you see one pod's CPU consistently higher than others (usually means it caught a few long-running requests and never recovered). Smart LBs also do per-pod load reporting and shift traffic away from struggling nodes proactively.
URLs have expiration_date — what happens when it passes?
Don't actively scan for expired entries — wait until a user actually requests one. On read, the app server checks the expiration; if expired, returns 404 to the user and asynchronously deletes the row. Trade-off: expired-but-unread entries linger in the DB forever, but storage is cheap and we save a giant background scan.
A lightweight Cleanup Service ⑩ runs nightly during low-traffic windows, scanning for entries where expiration_date < now() and bulk-deleting them. Returns the freed keys to the KGS unused pool for recycling.
Two-year default expiry for free-tier URLs, configurable per request, with paid-tier URLs being permanent. Storage is cheap enough that we can also choose "never expire" as the default — most production shorteners do, because a dead link breaks user trust permanently.
Every redirect emits an async event. Naïvely doing UPDATE click_count = click_count + 1 on the URL row would melt that row for any viral link (10K writes/sec on one row = lock contention hell). Instead:
Counts that we care about (this URL got 1.2M clicks) live in Redis sorted sets — atomic ZINCRBY handles 100K writes/sec on one node. Detailed per-event data (geo, referrer, user_agent) flows to a data warehouse for ad-hoc analytics queries.
Add a visibility column: PUBLIC (anyone can resolve) or PRIVATE (only the creator + whitelisted users). On read, the app server checks the auth token against an ACL table; on mismatch, return 401 instead of 302.
Submitted URLs go through a real-time Google Safe Browsing check before being shortened. Rejected URLs return 400. Reactive: a "report abuse" link on every redirect page; reported URLs go into a moderation queue.
Per-API-key quota — 1000 URLs/hour for free tier, higher for paid. Implemented at the LB or in a dedicated rate-limiter service (token bucket in Redis). Anonymous creates require captcha.
KGS-generated keys are random 6-character base62, not sequential. An attacker can't enumerate /aaaaaa, /aaaaab, ... to discover private URLs — the keyspace is 56.8B, sparsely populated, so brute force is infeasible.