Scalability
📚 Overview
Scalability is the ability of a system to handle increased load by adding resources.
📈 Vertical vs Horizontal Scaling
Vertical Scaling (Scale Up)
Add resources to a single server:
| Aspect | Details |
|---|---|
| Pros | Simple, no code changes |
| Cons | Limited by hardware, single point of failure |
| Use Case | Small apps, MVP |
Horizontal Scaling (Scale Out)
Add more servers:
| Aspect | Details |
|---|---|
| Pros | Unlimited (theoretically), fault tolerance |
| Cons | Complex, needs load balancer |
| Use Case | Large apps, high availability |
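A rough way to reason about scaling out is to estimate how many servers a given peak load needs. The sketch below is only illustrative: the function name, the 70% target utilization, and the single spare server are assumptions, not figures from this guide.

```javascript
// Estimate the number of servers needed for a peak load.
// Assumptions: each server sustains `rpsPerServer` at 100% load, we only
// want to run them at `targetUtilization`, and we keep `spare` extra
// servers so that losing one does not overload the rest.
function serversNeeded(peakRps, rpsPerServer, targetUtilization = 0.7, spare = 1) {
  const usableRpsPerServer = rpsPerServer * targetUtilization;
  const base = Math.ceil(peakRps / usableRpsPerServer);
  return base + spare;
}

console.log(serversNeeded(5000, 1000)); // 8 servers for load + 1 spare
```

With vertical scaling the same question becomes "how big must one machine be", which is where the hardware limit in the table above comes from.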
🎯 Load Balancer
Distribute traffic across multiple servers.
Algorithms
| Algorithm | Description | Use Case |
|---|---|---|
| Round Robin | Each request goes to the next server in turn | Similar server capacity |
| Least Connections | Pick the server with the fewest active connections | Varying request duration |
| IP Hash | Same client IP → same server | Session persistence |
| Weighted | Traffic share proportional to each server's capacity | Different server specs |
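The four algorithms in the table can be sketched in a few lines each. This is a minimal in-memory illustration, not how any particular load balancer implements them; the server names, weights, and the simple string hash are assumptions made for the example.

```javascript
// Hypothetical server pool; names, weights, and connection counts are made up.
const servers = [
  { name: 'app-1', weight: 1, connections: 0 },
  { name: 'app-2', weight: 2, connections: 0 },
  { name: 'app-3', weight: 1, connections: 0 },
];

// Round robin: hand out servers in order, wrapping around at the end.
function createRoundRobin(pool) {
  let next = 0;
  return () => {
    const server = pool[next];
    next = (next + 1) % pool.length;
    return server;
  };
}

// Least connections: pick the server with the fewest active connections.
function leastConnections(pool) {
  return pool.reduce((best, s) => (s.connections < best.connections ? s : best));
}

// IP hash: the same client IP always maps to the same server
// (as long as the pool itself does not change).
function ipHash(pool, ip) {
  let hash = 0;
  for (const ch of ip) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep an unsigned 32-bit value
  }
  return pool[hash % pool.length];
}

// Weighted round robin: a server with weight 2 appears twice per cycle.
function createWeightedRoundRobin(pool) {
  const expanded = pool.flatMap((s) => Array(s.weight).fill(s));
  return createRoundRobin(expanded);
}
```

Note that IP hash remaps most clients when a server is added or removed, which is one reason session state is often moved to a shared store instead.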
Health Checks
// Liveness probe: is the process running?
GET /health/live
→ 200 OK if the server is running
// Readiness probe: can the server take traffic?
GET /health/ready
→ 200 OK if the server can handle requests
// Example response body
{
  "status": "healthy",
  "checks": {
    "database": "ok",
    "redis": "ok",
    "external_api": "degraded"
  }
}
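The two probes could be backed by handlers like the following. This is a sketch under assumptions: the `down` state, the `unhealthy` status, the 503 response, and the function names are not from the example above; they are one plausible convention in which a degraded dependency still accepts traffic.

```javascript
// Liveness: the process is up and able to answer at all.
function liveness() {
  return { statusCode: 200, body: { status: 'alive' } };
}

// Readiness: report per-dependency state and fail only when one is down.
// Assumption: states are 'ok', 'degraded', or 'down'.
function readiness(checks) {
  const ready = Object.values(checks).every((state) => state !== 'down');
  return {
    statusCode: ready ? 200 : 503,
    body: { status: ready ? 'healthy' : 'unhealthy', checks },
  };
}

// Tiny router so the probes can be plugged into any HTTP framework.
function handleProbe(path, checks) {
  if (path === '/health/live') return liveness();
  if (path === '/health/ready') return readiness(checks);
  return { statusCode: 404, body: { error: 'not found' } };
}
```

A load balancer would typically stop routing to an instance whose readiness probe fails, while a failing liveness probe usually triggers a restart.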
💾 Database Scaling
Read Replicas
// Writes go to the master (primary)
await masterDb.query('INSERT INTO users ...');
// Reads go to replicas; replication is usually asynchronous, so results may be slightly stale
await replicaDb.query('SELECT * FROM users ...');
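The read/write split can be hidden behind a small router. The sketch below assumes each connection object exposes a `query(sql, params)` method; `createRouter` and the naive `SELECT` detection are hypothetical and only meant to show the idea.

```javascript
// Route writes to the primary and spread reads across replicas (round robin).
// `primary` and `replicas` are any objects exposing query(sql, params).
function createRouter(primary, replicas) {
  let next = 0;
  return {
    query(sql, params = []) {
      // Naive check: anything that is not a plain SELECT goes to the primary.
      const isRead = /^\s*select\b/i.test(sql);
      if (!isRead || replicas.length === 0) {
        return primary.query(sql, params);
      }
      const replica = replicas[next];
      next = (next + 1) % replicas.length;
      return replica.query(sql, params);
    },
  };
}
```

Reads that must see a write made moments ago (read-your-own-writes) should still go to the primary, since replicas can lag behind.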
Sharding
Sharding Strategies:
| Strategy | Pros | Cons |
|---|---|---|
| Hash-based | Even distribution | Rebalancing hard |
| Range-based | Range queries easy | Hot spots |
| Directory-based | Flexible | Extra lookup |
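Each strategy boils down to a different function from key to shard. The following is an illustrative sketch: the FNV-1a hash, the range boundaries, and the tenant directory are assumptions chosen for the example, not a recommendation.

```javascript
// FNV-1a over UTF-16 code units: a simple, stable 32-bit hash (not cryptographic).
function fnv1a(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hash-based: even spread, but changing shardCount remaps most keys.
function hashShard(key, shardCount) {
  return fnv1a(String(key)) % shardCount;
}

// Range-based: each shard owns first letters up to and including `upTo`.
const ranges = [
  { upTo: 'h', shard: 0 },
  { upTo: 'p', shard: 1 },
  { upTo: '\uffff', shard: 2 },
];
function rangeShard(key) {
  const first = key[0].toLowerCase();
  return ranges.find((r) => first <= r.upTo).shard;
}

// Directory-based: an explicit lookup table, at the cost of one extra lookup.
const directory = new Map([
  ['tenant-a', 0],
  ['tenant-b', 2],
]);
function directoryShard(tenant) {
  if (!directory.has(tenant)) throw new Error(`no shard assigned for ${tenant}`);
  return directory.get(tenant);
}
```

The "rebalancing hard" entry in the table shows up directly in `hashShard`: going from 4 to 5 shards changes `hash % shardCount` for most keys, which is the problem consistent hashing is designed to reduce.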
🗄️ Caching
Caching Strategies
Cache Aside (Lazy Loading):
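async function getUser(id) {
  // 1. Try the cache first
  let user = await cache.get(`user:${id}`);
  if (!user) {
    // 2. Cache miss: load from the database
    user = await db.getUser(id);
    // 3. Populate the cache for later reads (TTL of 3600, presumably seconds)
    await cache.set(`user:${id}`, user, 3600);
  }
  return user;
}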
Write Through:
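async function updateUser(id, data) {
  // Write to the database first, then update the cache in the same request.
  // Assumes `data` is the full user object, matching what getUser caches.
  await db.updateUser(id, data);
  await cache.set(`user:${id}`, data);
}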
Write Back:
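async function updateUser(id, data) {
  // The write returns as soon as the cache is updated
  await cache.set(`user:${id}`, data);
  // Async write to DB: a background worker drains the queue later,
  // so queued updates are lost if the process fails before they are persisted
  queue.push({ type: 'update', id, data });
}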
Cache Invalidation
| Strategy | Description |
|---|---|
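| TTL | Entries expire after a fixed time |
| LRU | Evict the least recently used entry when the cache is full |
| Write-through | Update the cache on every write so it stays in sync with the DB |
| Write-back | Write to the cache first and flush to the DB asynchronously |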
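TTL expiry and LRU eviction are often combined in one cache. The class below is a minimal in-memory sketch that relies on `Map` preserving insertion order; the class name and the injectable clock are assumptions for illustration, and a production system would more likely use an existing cache such as Redis.

```javascript
class LruTtlCache {
  constructor(maxEntries, now = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map(); // insertion order: first key = least recently used
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      // TTL: an expired entry counts as a miss
      this.entries.delete(key);
      return undefined;
    }
    // Re-insert to mark the key as most recently used
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value, ttlMs) {
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      // LRU: evict the key that was used longest ago
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}
```

TTL bounds how stale an entry can get, while LRU bounds how much memory the cache uses; neither one replaces explicitly updating or deleting an entry when the underlying data changes.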
📦 CDNs (Content Delivery Networks)
Cache and serve content from edge servers close to users, reducing latency and load on the origin.
Use cases:
- Static assets (images, CSS, JS)
- Video streaming
- API responses (cacheable)
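What a CDN caches, and for how long, is usually controlled by the origin's `Cache-Control` response header. The sketch below picks a header per path; the path conventions (content-hashed file names, a `/api/public/` prefix) and the chosen lifetimes are assumptions for illustration.

```javascript
// Choose a Cache-Control header for a response based on its path.
function cacheControlFor(path) {
  // Fingerprinted assets (e.g. app.3f9a1c2b.js) never change: cache for a year.
  if (/\.[0-9a-f]{8,}\.(js|css|png|jpg|svg|woff2)$/.test(path)) {
    return 'public, max-age=31536000, immutable';
  }
  // Other static assets: cache briefly so updates still propagate.
  if (/\.(js|css|png|jpg|svg|woff2)$/.test(path)) {
    return 'public, max-age=3600';
  }
  // Cacheable API responses: short shared-cache lifetime at the CDN only.
  if (path.startsWith('/api/public/')) {
    return 'public, s-maxage=60, stale-while-revalidate=30';
  }
  // Everything else (e.g. per-user data) must not be cached.
  return 'no-store';
}
```

Defaulting to `no-store` is the conservative choice here, since accidentally caching per-user responses at the edge can leak data between users.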
🔄 Asynchronous Processing
Message Queues
Use cases:
- Email sending
- Image processing
- Data export
- Notifications
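The use cases above share one pattern: the request handler enqueues a job and returns, and a worker processes it later. Below is a minimal in-memory sketch with retries and a dead-letter list; `JobQueue` and its methods are hypothetical, and a real system would use a durable broker instead of an array.

```javascript
class JobQueue {
  constructor(maxAttempts = 3) {
    this.jobs = [];
    this.deadLetters = []; // jobs that failed on every attempt
    this.maxAttempts = maxAttempts;
  }

  // Producer side: cheap, returns immediately.
  enqueue(type, payload) {
    this.jobs.push({ type, payload, attempts: 0 });
  }

  // Worker side: process everything queued, retrying failed jobs
  // until they have been attempted maxAttempts times.
  drain(handlers) {
    while (this.jobs.length > 0) {
      const job = this.jobs.shift();
      try {
        handlers[job.type](job.payload);
      } catch (err) {
        job.attempts += 1;
        if (job.attempts < this.maxAttempts) {
          this.jobs.push(job); // retry later, behind the other jobs
        } else {
          this.deadLetters.push(job);
        }
      }
    }
  }
}
```

Because a job may be retried after a partial failure, handlers should be idempotent, for example by checking whether an email was already sent before sending it again.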
Pub/Sub
Use cases:
- Real-time updates
- Event-driven architecture
- Microservices communication
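Unlike a queue, where each job is handled by one worker, pub/sub delivers every message to every subscriber of a topic. A minimal in-process sketch follows; the `PubSub` class and the topic name are assumptions, and real deployments would use a broker rather than in-memory maps.

```javascript
class PubSub {
  constructor() {
    this.topics = new Map(); // topic name -> Set of handler functions
  }

  // Register a handler and return a function that unsubscribes it.
  subscribe(topic, handler) {
    if (!this.topics.has(topic)) this.topics.set(topic, new Set());
    this.topics.get(topic).add(handler);
    return () => this.topics.get(topic).delete(handler);
  }

  // Deliver the message to every current subscriber of the topic.
  // Returns the number of subscribers that received it.
  publish(topic, message) {
    const handlers = this.topics.get(topic) || new Set();
    for (const handler of handlers) handler(message);
    return handlers.size;
  }
}

// Usage: two independent consumers react to the same event.
const bus = new PubSub();
bus.subscribe('order.created', (order) => console.log('send receipt for', order.id));
bus.subscribe('order.created', (order) => console.log('update stock for', order.id));
bus.publish('order.created', { id: 42 });
```

The publisher does not know who is listening, which is what makes this pattern a good fit for event-driven and microservice architectures.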
📊 Monitoring & Metrics
Key Metrics
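// RED Method (for request-driven services)
Rate: Requests per second
Errors: Failed requests per second
Duration: Response time distribution (e.g. P95)
// USE Method (for resources such as CPU, memory, disk)
Utilization: % of capacity used
Saturation: Queued work the resource cannot serve yet
Errors: Count of error events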
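The RED metrics can be computed from a window of request records. This sketch assumes each record has a `status` and a `durationMs` field, treats 5xx responses as failures, and uses the nearest-rank percentile; all of these are illustrative choices rather than a standard.

```javascript
// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted, p) {
  if (sorted.length === 0) return 0;
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Compute Rate, Errors, and Duration for one time window.
function redMetrics(requests, windowSeconds) {
  const durations = requests.map((r) => r.durationMs).sort((a, b) => a - b);
  const failed = requests.filter((r) => r.status >= 500).length;
  return {
    rate: requests.length / windowSeconds, // requests per second
    errorRatio: requests.length === 0 ? 0 : failed / requests.length,
    p95Ms: percentile(durations, 95), // 95% of requests were at least this fast
  };
}
```

Percentiles are preferred over averages for Duration because a small number of very slow requests can hide behind a healthy-looking mean.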
Alerting
// Alert rules
- Error rate > 1% for 5 min
- P95 latency > 500ms for 5 min
- CPU > 80% for 10 min
- Memory > 90% for 5 min
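The "for N min" part of each rule means the condition must hold continuously, which avoids alerting on short spikes. A minimal evaluator might look like this; the rule object shape and the one-sample-per-minute assumption are hypothetical.

```javascript
// The rules above, expressed as data (ratios instead of percentages).
const rules = {
  errorRate: { threshold: 0.01, forMinutes: 5 },
  p95LatencyMs: { threshold: 500, forMinutes: 5 },
  cpu: { threshold: 0.8, forMinutes: 10 },
  memory: { threshold: 0.9, forMinutes: 5 },
};

// `samples` holds one value per minute, oldest first.
// The rule fires only if every sample in the last `forMinutes`
// minutes is above the threshold.
function isFiring(rule, samples) {
  if (samples.length < rule.forMinutes) return false; // not enough history yet
  return samples.slice(-rule.forMinutes).every((value) => value > rule.threshold);
}

console.log(isFiring(rules.errorRate, [0.02, 0.03, 0.02, 0.02, 0.04])); // true
console.log(isFiring(rules.errorRate, [0.02, 0.03, 0.005, 0.02, 0.04])); // false
```

In the second call a single healthy minute resets the condition, so the alert does not fire.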
🚀 Microservices vs Monolith
| Aspect | Monolith | Microservices |
|---|---|---|
| Development | Simple (one codebase) | Complex (distributed system) |
| Deployment | All or nothing | Independent |
| Scaling | Scale entire app | Scale specific service |
| Communication | In-process | Network |
| Data | Single database | Per-service DB |
When to split:
- Different scalability needs
- Different teams
- Different deployment cycles
- Isolated failure domains
❓ Interview Questions
Easy
- Design a URL shortener
- Design a rate limiter
- Design a unique ID generator
Medium
- Design a chat system
- Design a news feed
- Design a file storage system
Hard
- Design YouTube
- Design Google Search
- Design a distributed database