In the era of social media and character limits, URL shorteners like Bitly and TinyURL have become essential infrastructure. While the core functionality—mapping a long URL to a short one—seems simple, building a system that can handle billions of requests with millisecond latency and 99.99% availability is a classic system design challenge.

In this post, we will walk through the architectural blueprint of a scalable, resilient URL shortener.

1. Requirements and Goals

Before diving into the architecture, we must define our constraints.

Functional Requirements

  • Shortening: Given a long URL, the system should return a shorter, unique alias.
  • Redirection: When a user clicks a short link, the system should redirect them to the original URL with minimal latency.
  • Custom Aliases: Users should optionally be able to provide a custom short link.
  • Analytics: Track metrics like click counts and geographical data.

Non-Functional Requirements

  • High Availability: The system must be up 24/7; a broken redirect is a broken user experience.
  • Low Latency: Redirection should happen in under 100ms.
  • Scalability: The system should handle millions of new mappings and billions of redirections per day.

2. Capacity Estimation

Let’s assume we handle 100 million new URLs per month.

  • Total writes: 100 million / 30 days ≈ 3.3 million per day ≈ 40 per second.
  • Read/Write Ratio: Usually 100:1. If we have 100M writes, we expect 10B reads per month.
  • Read Queries Per Second (QPS): 10B / (30 * 24 * 3600) ≈ 4,000 QPS.

Storage Needs: If we store each mapping for 10 years, and each record is ~500 bytes: 100M * 12 months * 10 years * 500 bytes ≈ 6 Terabytes.

3. High-Level Design

The system consists of two primary workflows:

  1. The Write Path: User provides a Long URL -> System generates a Short Key -> Stores in DB.
  2. The Read Path: User hits short.ly/xyz -> System looks up xyz -> Returns 301 Redirect.

The API Layer

We need two main REST endpoints:

  • POST /v1/shorten: Takes a JSON body { "longUrl": "..." } and returns the short URL.
  • GET /{shortKey}: Redirects the user.

4. The Shortening Logic: Key Generation

The heart of the system is the Key Generation Service (KGS). To ensure uniqueness and scalability, we avoid generating hashes (like MD5) at runtime, which can lead to collisions.

Using Base62 Encoding

We can use an incremental counter (ID) and convert it to Base62 [a-z, A-Z, 0-9].

  • A 7-character string in Base62 provides $62^7 \approx 3.5$ trillion unique combinations, which is more than enough for our 10-year estimate.

Distributed Counter (Snowflake or Zookeeper)

In a distributed environment, multiple servers cannot simply increment a local variable. We use a Key Generation Service that pre-allocates ranges of IDs.

  1. A central coordinator (like Apache Zookeeper) maintains a range of IDs (e.g., 1 to 1,000,000).
  2. Individual App Servers request a “buffer” of IDs from the KGS.
  3. When a server exhausts its buffer, it requests a new one. This prevents collisions without requiring a DB hit for every single write.

5. Database and Caching

Data Model

Since our data is highly relational (a simple mapping) but needs to scale horizontally, a NoSQL Key-Value store like Amazon DynamoDB or Cassandra is ideal.

ColumnType
short_key (Partition Key)String
original_urlString
created_atTimestamp
expirationTimestamp

Caching Strategy

Redirection is read-heavy. We should use a Redis or Memcached layer to store the most frequently accessed URLs.

  • Eviction Policy: Least Recently Used (LRU) is most effective here.
  • Flow: When a request comes in, check Redis. If it’s a “cache miss,” query the DB and update Redis.

6. Resilience and Scalability

To ensure the system is resilient, we implement the following:

  • Load Balancing: Use a Layer 7 Load Balancer (like Nginx or AWS ALB) to distribute traffic across multiple application nodes.
  • Database Sharding: If using a SQL database, shard by the short_key to distribute the load.
  • Rate Limiting: Protect the shorten endpoint from abuse (DDoS or scraping) using a Token Bucket algorithm.
  • Geographical Distribution: Use a Content Delivery Network (CDN) or deploy the Read Path in multiple regions to reduce latency for global users.

7. Handling Redirection: 301 vs 302

  • 301 (Permanent Redirect): Browsers cache this. It reduces load on our servers but makes it harder to collect accurate analytics for every single click.
  • 302 (Temporary Redirect): Browsers do not cache this. Every click hits our server, allowing for precise analytics, though it increases server load.

For a commercial URL shortener, 302 is often preferred to ensure analytics data is captured for every interaction.

Conclusion

Building a production-ready URL shortener is a masterclass in balancing simplicity with distributed systems principles. By decoupling key generation from the write path, utilizing a high-performance NoSQL database, and layering in Redis for caching, we can create a system that handles billions of requests with ease.

The most critical takeaway is the Key Generation Service—by solving the ID generation problem upfront, you eliminate the most common bottleneck in distributed systems: resource contention.

Resources

  • Designing Data-Intensive Applications by Martin Kleppmann
  • System Design Interview by Alex Xu
  • High Scalability Blog (architectural case studies)