Design a Rate Limiter Algorithms & Implementation

Zaheer Ahmad Apr 15, 2026 5 min read min read

Python

Design a Rate Limiter Algorithms & Implementation

Introduction

Designing a rate limiter system is a fundamental concept in modern backend engineering. A rate limiter controls how many requests a user or system can make within a given time period. Without it, your API or application can easily be overwhelmed by too many requests—whether accidental or malicious.

For Pakistani students learning system design, understanding rate limiting algorithms like the token bucket algorithm is especially valuable. Many real-world platforms in Pakistan—such as e-commerce sites in Lahore, ride-hailing apps in Karachi, or fintech apps in Islamabad—depend heavily on rate limiting to ensure fairness, performance, and security.

Imagine Ahmad building an API for a startup in Islamabad. Without rate limiting, a single user (or bot) could flood the server, causing downtime. With a proper rate limiter, Ahmad can ensure every user gets a fair share of resources.

Prerequisites

Before diving into this tutorial, you should be comfortable with:

Basic programming (preferably Python or JavaScript)
Understanding of APIs and HTTP requests
Basic data structures (queues, counters)
Familiarity with caching tools like Redis (helpful but optional)
Basic system design concepts

Core Concepts & Explanation

Rate Limiting Basics

A rate limiter restricts how many requests a user can make in a defined time window.

Example:

Limit: 100 requests per minute
If Fatima sends 120 requests → last 20 are blocked

Why it's needed:

Prevent abuse (DDoS attacks)
Ensure fair usage
Protect backend services
Control infrastructure costs (important for startups using cloud in PKR budgets)

Fixed Window Algorithm

This is the simplest approach.

How it works:

Count requests in a fixed time window (e.g., 1 minute)
Reset counter after window ends

Example:

Ali makes 100 requests in 59 seconds → allowed
Makes 100 more at second 61 → allowed again (problem!)

Issue:

Burst traffic at boundaries

Sliding Window Algorithm

Improves fixed window limitations.

How it works:

Tracks requests over a rolling time window
More accurate than fixed window

Example:

If limit is 100/min, it checks last 60 seconds continuously

Token Bucket Algorithm

This is one of the most popular rate limiting algorithms.

How it works:

Tokens are added to a bucket at a fixed rate
Each request consumes one token
If no tokens → request denied

Example:

Bucket size: 10 tokens
Refill rate: 1 token/sec
Ahmad sends 10 requests instantly → allowed
Sends 11th → blocked until refill

Advantages:

Allows bursts
Smooth traffic control

Leaky Bucket Algorithm

Similar to token bucket but processes requests at a fixed rate.

Difference:

Token bucket allows bursts
Leaky bucket smooths output strictly

Practical Code Examples

Example 1: Token Bucket Implementation (Python)

import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate
        self.last_refill = time.time()

    def allow_request(self):
        current_time = time.time()
        elapsed = current_time - self.last_refill

        # Add tokens based on elapsed time
        refill_tokens = elapsed * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + refill_tokens)

        self.last_refill = current_time

        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

Line-by-line explanation:

import time → Used to track time intervals
class TokenBucket: → Defines rate limiter class
__init__ → Initializes bucket size and refill rate
self.tokens = capacity → Start with full bucket
last_refill → Tracks last refill timestamp
allow_request() → Main function to check request
elapsed = current_time - last_refill → Calculate time passed
refill_tokens = elapsed * refill_rate → Add tokens
min(capacity, ...) → Prevent overflow
if tokens >= 1 → Allow request
tokens -= 1 → Consume token

Example 2: Real-World API Rate Limiter (Node.js + Express)

const express = require('express');
const app = express();

let requests = {};

const RATE_LIMIT = 5;
const WINDOW_SIZE = 60 * 1000; // 1 minute

app.use((req, res, next) => {
    const ip = req.ip;
    const currentTime = Date.now();

    if (!requests[ip]) {
        requests[ip] = [];
    }

    // Filter old requests
    requests[ip] = requests[ip].filter(timestamp => {
        return currentTime - timestamp < WINDOW_SIZE;
    });

    if (requests[ip].length >= RATE_LIMIT) {
        return res.status(429).send("Too Many Requests");
    }

    requests[ip].push(currentTime);
    next();
});

app.get('/', (req, res) => {
    res.send("Welcome to Pakistani API!");
});

app.listen(3000, () => {
    console.log("Server running on port 3000");
});

Line-by-line explanation:

express() → Creates web server
requests = {} → Stores IP-based request logs
RATE_LIMIT = 5 → Max 5 requests per minute
WINDOW_SIZE → Time window in milliseconds
Middleware function intercepts all requests
req.ip → Identify user
Initialize request array if new user
filter() → Remove old timestamps
if length >= limit → Block request
res.status(429) → Send HTTP error
push(currentTime) → Record new request

Common Mistakes & How to Avoid Them

Mistake 1: Ignoring Distributed Systems

If your app runs on multiple servers (common in cloud setups), local rate limiting won't work.

Problem:

Each server tracks separately → limits bypassed

Solution:

Use Redis or centralized storage

# Pseudo Redis example
INCR user:123
EXPIRE user:123 60

Mistake 2: Not Handling Burst Traffic

Using only fixed window can cause spikes.

Problem:

Users send many requests at window boundary

Solution:

Use token bucket or sliding window

Practice Exercises

Exercise 1: Build a Fixed Window Limiter

Problem:
Create a limiter allowing 10 requests per minute.

Solution:

import time

requests = {}

def allow_request(user):
    now = int(time.time() / 60)

    if user not in requests:
        requests[user] = {}

    if now not in requests[user]:
        requests[user] = {now: 0}

    if requests[user][now] < 10:
        requests[user][now] += 1
        return True
    return False

Explanation:

Groups requests by minute
Tracks count per user
Resets automatically every minute

Exercise 2: Token Bucket Simulation

Problem:
Allow burst traffic of 5 requests with refill rate of 1/sec.

Solution:

bucket = TokenBucket(5, 1)

for i in range(10):
    if bucket.allow_request():
        print("Request allowed")
    else:
        print("Blocked")
    time.sleep(0.5)

Explanation:

Creates bucket with capacity 5
Sends 10 requests
Sleeps 0.5 sec between requests
Demonstrates refill behavior

Frequently Asked Questions

What is rate limiting in system design?

Rate limiting is a technique used to control how many requests a user or client can send to a server within a specific time period. It helps prevent abuse, ensures fairness, and protects system resources.

How do I choose the best rate limiting algorithm?

It depends on your use case. For simple APIs, fixed window works. For more accurate control, sliding window is better. For real-world systems with burst traffic, the token bucket algorithm is preferred.

Can I implement rate limiting without Redis?

Yes, but only for small-scale systems. In production systems (like apps in Karachi or Lahore), Redis is recommended for distributed consistency.

What is HTTP 429 error?

HTTP 429 means "Too Many Requests". It is returned when a user exceeds the allowed request limit.

How do I handle rate-limited users?

You can send a response with a Retry-After header, informing the user when they can retry. This improves user experience and prevents frustration.

Summary & Key Takeaways

Rate limiting is essential for scalable and secure systems
Fixed window is simple but has edge-case issues
Sliding window provides more accuracy
Token bucket algorithm is ideal for real-world applications
Distributed systems require centralized storage like Redis
Proper rate limiting improves performance and reduces costs

To deepen your system design skills, explore these tutorials on theiqra.edu.pk:

Learn how to prepare for a System Design Interview with real-world scenarios
Master caching with a complete Redis Tutorial
Build scalable APIs with a backend architecture guide
Understand load balancing and microservices design

These topics will help you become a strong backend engineer ready for opportunities in Pakistan’s growing tech industry 🚀

Practice the code examples from this tutorial

Open Compiler

Python

Test Your Python Knowledge!

Finished reading? Take a quick quiz to see how much you've learned from this tutorial.

Start Python Quiz

Previous Next

Introduction

Prerequisites

Core Concepts & Explanation

Rate Limiting Basics

Fixed Window Algorithm

Sliding Window Algorithm

Token Bucket Algorithm

Leaky Bucket Algorithm

Practical Code Examples

Example 1: Token Bucket Implementation (Python)

Example 2: Real-World API Rate Limiter (Node.js + Express)

Common Mistakes & How to Avoid Them

Mistake 1: Ignoring Distributed Systems

Mistake 2: Not Handling Burst Traffic

Practice Exercises

Exercise 1: Build a Fixed Window Limiter

Exercise 2: Token Bucket Simulation

Frequently Asked Questions

What is rate limiting in system design?

How do I choose the best rate limiting algorithm?

Can I implement rate limiting without Redis?

What is HTTP 429 error?

How do I handle rate-limited users?

Summary & Key Takeaways

Next Steps & Related Tutorials

Test Your Python Knowledge!

About Zaheer Ahmad