URL Shortener — System Design from First Principles

Step I·Clarify Scope

Start by shrinking the problem.

Interviewers leave scope vague on purpose. Three concrete answers — boundary, ratio, consistency — pre-select which walls will dominate. Say no early. Say no often.

Q · Functional Boundary

What are we actually building?

Two actions. Create a short code from a long URL. Redirect a short code back to its long URL. No analytics, no auth, no expiry, no custom aliases for v1. Anything else is a feature, not the system.

Q · Read / Write Ratio

Which direction does the traffic flow?

Brutally read-heavy. A hundred reads for every write is conservative — viral links push it higher. Treat this as a read-optimization problem wearing a write's clothing: the create path can be slow; the redirect path cannot.

Q · Consistency Need

Must a user see their own write immediately?

Eventual is fine. If a just-created link takes fifty milliseconds to propagate to every region, no human notices. We will spend that latitude — generously — on the read path.

Why this matters

A system built without first collapsing scope optimizes for every imaginable workload — which means optimizing for none. Every "what if" you entertain in step 1 is a constraint you carry through steps 2–5.

The senior move

After answering these three, restate the problem in one sentence. "This is a read-heavy, eventually-consistent system whose main job is to redirect short codes to long URLs." If you can't say that sentence, you haven't scoped it yet.

Step II·Establish Scale

Numbers, not adjectives.

A 100 RPS system and a 1M RPS system are not the same system made bigger. They are different designs. Concrete numbers tell you which axis hits its wall first.

User base

500 M URLs created / month · 6 B redirects / day

Write rate

200 creates / sec sustained · 1 K peak

Read rate

70 K redirects / sec sustained · 300 K peak

Storage

500 M × 500 B × 12 mo ≈ 3 TB / year

Latency target

< 50 ms p99 redirect (TTFB)

// the arithmetic that shapes the design

Reads per write
→ ~ 100 : 1

Working set
→ 3 TB

Per-request budget @ 50 ms p99
→ app + lookup + 301 ≤ 50 ms

If lookup hit SSD instead of RAM
→ +100 µs / req, fine in isolation, fragile under load.

The number to remember

Writes are a rounding error. The whole system is a redirect cache with a small write-attached database hanging off the side.

III

Step III·Identify the Walls

Which physical wall are we hitting first?

A URL Shortener is, at its core, a single read path: look up a shortcode, return a URL. At seventy thousand lookups per second, the question is not how fast you can compute — it is where the answer lives.

Storage
Hierarchy

Compute
& I/O

Time &
Ordering

Coordination
Cost

Storage Hierarchy dominates. Three terabytes of index must be reachable inside a fifty-millisecond budget. RAM at a hundred nanoseconds is air. SSD at a hundred microseconds is breathable, but only if you shard wide and keep the request path short. The primary wall is not chosen — it is revealed.

Compute & I/O sits second, not first: 70K QPS is sustained but not exotic. Stateless edge nodes scale linearly. The system's CPU does almost nothing except hand the request a pointer.

Coordination Cost registers nearly zero — no node need consult another per redirect. Time & Ordering matters only on writes, where short codes must be unique without round-tripping a central registry.

Step IV·Map Patterns

Each axis unlocks its own patterns.

Three patterns carry the system. Each is a workaround for a specific wall. Together, they make seventy thousand redirects per second feel like nothing.

Pattern 01 Wall I · Storage Hierarchy

KV Store
in RAM

Imagine a room where every address in the city is printed on a card — and the cards are kept not in a filing cabinet but held in your hands, fanned open, instantly readable. No drawer to open. No index to consult. The answer is already there.

The full mapping — shortcode to long URL, roughly three terabytes at five hundred million pairs — lives in a sharded Redis cluster. Not on SSD. Not on a spinning disk. In RAM. The working set is the whole set; there are no cache misses, because there is no slower tier to miss into. On a create, the mapping is written to a durable database and simultaneously warmed into the cache — so the very next redirect finds it without hesitation.

Wall I sets the price list. L1 cache: one nanosecond. RAM: one hundred nanoseconds. SSD: one hundred microseconds. At seventy thousand redirects per second across a sharded fleet, SSD is workable but brittle — every contended block, every garbage-collection pause, every tail-latency event arrives at the user. RAM erases all three. The storage hierarchy is not a scale. It is a cliff. The entire design is a decision about which ledge to stand on.

Pattern 02 Wall II · Compute & I/O

HTTP 301 & the edge cache

The best way to answer a question is to teach the asker — once — so they never ask again. A great reference librarian does not fetch the same book for the same patron twice. She shows him where it lives. After that, he goes himself.

The redirect service returns HTTP 301 — Permanent Redirect — with a long Cache-Control: max-age. The browser caches the destination. The user's ISP caches it. Corporate proxies cache it. Popular shortcodes are pinned at a CDN edge. Every subsequent visit from the same client, to the same code, never reaches the origin. The server has delegated its work to the edge of the network — permanently, at zero ongoing cost.

Wall II asks: how many operations per second can the fleet sustain? The 301 is the most elegant answer to that constraint — push the work to the client, to the ISP, to infrastructure you do not operate. At a hundred reads per write, making reads disappear is the only move that scales without a ceiling.

This is the production tradeoff worth naming: 301 is cacheable for the lifetime of the cache. If the URL is later disabled — abuse, takedown, or the user simply edits the destination — the browser's cached 301 will still resolve to the old target. Many real-world shorteners therefore use 302 Found instead, accepting more origin traffic for the right to revoke. The choice is a thermostat, not a switch: short max-age + 301, or 302 with origin-side hot cache.

Pattern 03 Wall IV · Time & Ordering

Counter, then base 62

A ticket machine with a number that only goes forward. No two tickets share a number. No one must ask what the last ticket was — the machine knows, because the machine is the guarantee. The counter and the identity are the same thing.

Hand each new link the next whole number from a counter and uniqueness comes for free: a number that only climbs can never repeat. The trouble is length — the billionth link would carry a ten-digit number, and we promised a short code. So we spell the same number in a denser alphabet. Everyday counting uses ten symbols, the digits zero through nine; here we use sixty-two — those ten digits plus both cases of the alphabet, the characters that ride through a URL untouched. More symbols in each slot pack more numbers into fewer slots, and that is all base 62 is: the same count, written compactly. Seven slots hold ~3.5 trillion codes; six hold ~57 billion. At 500 M new links a month, seven characters are forever.

But who owns the counter? Put it on one machine and every new link must stop to ask it for the next number — that machine becomes the single part whose failure halts everything (a single point of failure), and a queue everyone waits in. So we ask it less often. Each web server takes out a lease — numbers 3,000 through 3,999 are mine — and hands those out from its own memory with a local + 1, touching the shared counter not once per link but once per thousand. Consulted that rarely, the counter can afford to be careful: it is kept on a handful of machines that must agree on the last block handed out before granting the next, so one machine dying never forgets a number or hands the same block to two servers. That agreement procedure has a name — Raft — but the idea beneath it is plain: several copies that are never permitted to disagree.

Wall IV is the wall of ordering and uniqueness, and its law here is blunt: to promise that no two codes ever collide, something must be agreed upon. The lease agrees on a number. A second road agrees on time instead — each server builds an ID by gluing together the current millisecond, its own machine number, and a small counter for IDs minted inside that same millisecond. No two servers share a machine number, so the IDs cannot collide, and no server ever asks anyone's permission. Twitter named this Snowflake. Its weakness is the clock it leans on: servers keep their clocks aligned over the network through a service called NTP, and should two clocks drift apart — or one tick backward — a server may re-stamp a millisecond it already used and mint a duplicate that nothing catches. A number you must ask for, or a clock you must trust: Wall IV lets you escape one, never both.

Step V·State the Tradeoffs

Every choice has a cost.

The patterns above are not free. Each one trades one property for another. Saying the trade out loud is the difference between a design and a decision.

RAM cost vs. Read latency

Keeping the full index in memory costs roughly $300 / TB / month in a cloud Redis tier; SSD is closer to $20. At three terabytes that's about $10K / year extra. Trivial against the latency win — and trivial against the engineering cost of building a tiered cache that gets the same p99 under load.

Cache freshness vs. Redirect speed

Edge and browser caching means a deleted or repointed URL may still resolve for up to its TTL. For abuse-takedown and analytics this matters. The lever is the max-age header — short TTLs (minutes) keep takedowns fast at the cost of more origin traffic; long TTLs (days) maximize cache hit rate at the cost of revocation latency. A common compromise: a short edge TTL + an origin override path for abuse cases.

A counter you ask vs. A clock you trust

Wall IV won't hand over uniqueness for free — something must be coordinated. Choosing the counter means accepting a sliver of coordination: one lease covers a thousand links, so the price is a single quick request per thousand, plus a few numbers discarded when a server restarts mid-block — harmless, since codes must be unique, not consecutive. Choosing the clock — the Snowflake build of timestamp + machine number + counter — means accepting that correctness now rides on every clock staying in step; let two drift into the same millisecond and you mint a duplicate no one sees. Most teams take the counter: skipped numbers are cheap, silent collisions are not. The clock wins only when even one request per thousand is one too many.

System № 01 · Thesis Landing

And so URL Shortener is,
at the bottom, a particular treaty
negotiated against
Storage Hierarchy.

Given our primary constraint is Storage Hierarchy — a 3 TB index serving 70K QPS reads under a 50 ms budget — we keep the full mapping in a sharded Redis cluster fronted by a CDN. The cost is RAM spend; the win is hundred-nanosecond lookups at infinite horizontal scale. Writes use a leased-block counter to avoid coordination on every create, and a 301 redirect with a tuned max-age pushes the read path to the edge. At a hundred-to-one read/write ratio, consistency is eventual — and nobody, in the time it takes to resolve a redirect, ever notices.

❦ · ❦ · ❦