# Sharding vs Replication vs Partitioning
A comprehensive guide to database scaling strategies with practical examples and decision frameworks.
## Overview
These three strategies solve different problems in distributed systems:
- Replication: Copying the same data across multiple servers
- Partitioning: Splitting data logically within a database
- Sharding: Distributing data across multiple physical databases
## Replication
### What is it?
Replication creates multiple copies of the same dataset across different servers. Each replica contains the full dataset (or a subset in partial replication).
### Architecture Patterns
#### Master-Slave Replication

A single master (primary) accepts all writes; one or more read-only slaves (replicas) apply its change stream and serve reads.

#### Master-Master Replication

Two or more nodes accept writes and replicate to each other. This improves write availability but requires conflict detection and resolution.
### Key Characteristics
| Aspect | Description |
|---|---|
| Purpose | High availability & read scalability |
| Data Distribution | Full copy on each server |
| Consistency | Eventual or strong (depending on sync method) |
| Failure Impact | Redundancy protects against server failure |
### Replication Strategies
**Synchronous Replication**
- Write confirmed only after all replicas acknowledge
- Strong consistency
- Higher latency
- Lower availability (all replicas must be online)
**Asynchronous Replication**
- Write confirmed immediately
- Eventual consistency
- Lower latency
- Higher availability
- Risk of data loss on master failure
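The trade-off is easy to see in code. Below is a minimal sketch (not any particular database's protocol) in which `primary` and the `replicas` are hypothetical objects exposing an `apply()` method:

```python
import queue
import threading

def write_sync(record, primary, replicas):
    """Synchronous: the write is acknowledged only after every replica applies it."""
    primary.apply(record)
    for replica in replicas:
        replica.apply(record)      # blocks; one slow or down replica stalls the write
    return "committed"             # strong consistency, higher latency

def write_async(record, primary, replication_queue):
    """Asynchronous: acknowledge immediately; replicas catch up in the background."""
    primary.apply(record)
    replication_queue.put(record)  # queued changes may be lost if the master dies
    return "committed"             # low latency, eventual consistency

def replication_worker(replicas, replication_queue):
    """Background thread that drains the queue to the replicas."""
    while True:
        record = replication_queue.get()
        for replica in replicas:
            replica.apply(record)  # replicas lag behind the primary

def start_replication(replicas):
    """Spawn the worker; returns the queue that write_async should feed."""
    q = queue.Queue()
    threading.Thread(target=replication_worker,
                     args=(replicas, q), daemon=True).start()
    return q
```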
### Use Cases
✅ **When to use Replication:**

- High read-to-write ratio (e.g., 95% reads)
- Need for high availability and fault tolerance
- Geographic distribution for low latency
- Analytics workloads on read replicas
- Disaster recovery requirements
❌ **When NOT to use:**

- Write-heavy workloads (every write must still be applied on every replica, so replication doesn't scale writes)
- Storage constraints (each replica stores a full copy)
- Workloads that would require complex conflict resolution (multi-master writes)
### Example Scenario
**E-commerce Product Catalog**

```text
Master DB:           handles product updates (rare)
Replica 1 (US East): serves customer browsing
Replica 2 (US West): serves customer browsing
Replica 3 (Europe):  serves customer browsing
```
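A read/write-splitting router for a setup like this might look as follows. This is an illustrative sketch: `primary` and the replica connections are assumed to be simple objects exposing `execute()`, not any specific driver's API.

```python
import random

class ReadWriteRouter:
    """Route writes to the primary and reads to a nearby replica."""

    def __init__(self, primary, replicas_by_region):
        self.primary = primary
        self.replicas_by_region = replicas_by_region

    def execute_write(self, sql, params=()):
        # All writes go to the single master.
        return self.primary.execute(sql, params)

    def execute_read(self, sql, params=(), region="us-east"):
        # Prefer a replica in the caller's region for low latency;
        # fall back to the primary if the region has no replicas.
        replicas = self.replicas_by_region.get(region) or [self.primary]
        return random.choice(replicas).execute(sql, params)
```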
## Partitioning
### What is it?
Partitioning divides a large table into smaller pieces within the same database instance. Each partition contains a subset of rows but shares the same schema.
### Partitioning Types
#### Horizontal Partitioning (Row-based)

Rows are split across partitions; every partition has the full set of columns but only a slice of the rows.

#### Vertical Partitioning (Column-based)

Columns are split across tables; frequently accessed columns stay together, while large or rarely used columns move to a separate table joined on the primary key.
### Partitioning Strategies
**Range Partitioning**

```text
-- By date
Partition P1: created_at BETWEEN '2023-01-01' AND '2023-12-31'
Partition P2: created_at BETWEEN '2024-01-01' AND '2024-12-31'

-- By ID
Partition P1: user_id BETWEEN 1 AND 1000000
Partition P2: user_id BETWEEN 1000001 AND 2000000
```
**Hash Partitioning**

```text
-- Distribute evenly using a hash function
Partition P1: HASH(user_id) % 4 = 0
Partition P2: HASH(user_id) % 4 = 1
Partition P3: HASH(user_id) % 4 = 2
Partition P4: HASH(user_id) % 4 = 3
```
**List Partitioning**

```text
-- By discrete values
Partition P1: region IN ('US', 'Canada')
Partition P2: region IN ('UK', 'Germany', 'France')
Partition P3: region IN ('India', 'Singapore')
```
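Each strategy is just a different function from a row's key to a partition name. A minimal sketch (table and partition names are made up for illustration):

```python
import zlib
from datetime import date

def range_partition(created_at: date) -> str:
    # Range: consecutive key values land in the same partition.
    return f"orders_{created_at.year}"

def hash_partition(user_id: int, num_partitions: int = 4) -> str:
    # Hash: a stable hash spreads keys evenly; Python's built-in
    # hash() is randomized per process, so crc32 is used instead.
    return f"users_p{zlib.crc32(str(user_id).encode()) % num_partitions}"

def list_partition(region: str) -> str:
    # List: an explicit mapping from discrete values to partitions.
    region_map = {
        "US": "p1", "Canada": "p1",
        "UK": "p2", "Germany": "p2", "France": "p2",
        "India": "p3", "Singapore": "p3",
    }
    return region_map[region]
```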
### Key Characteristics
| Aspect | Description |
|---|---|
| Purpose | Query performance & manageability |
| Scope | Single database instance |
| Transparency | Often transparent to application |
| Scalability | Limited by single server resources |
### Use Cases
✅ **When to use Partitioning:**
- Large tables slowing down queries
- Time-series data with predictable access patterns
- Easy data archival/deletion (drop old partitions)
- Maintenance operations on specific data ranges
- Query patterns that filter on partition key
❌ **When NOT to use:**

- Small tables (< 1 GB)
- Queries that don't filter on the partition key
- You need to scale beyond a single server
- Uniform access across all data (pruning gives no benefit)
### Example Scenario
**Analytics Platform with Time-Series Data**

```text
logs_2024_01: January data  (can archive after 90 days)
logs_2024_02: February data
logs_2024_03: March data    (hot data, frequently queried)
logs_2024_04: April data    (hot data, frequently queried)
```
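A sketch of the pruning-and-archival logic this layout enables, assuming the monthly naming scheme above:

```python
from datetime import date

def partitions_for_range(start: date, end: date) -> list:
    """Monthly partitions a query over [start, end] must touch.

    Mirrors what the planner's partition pruning does when the
    query filters on the partition key.
    """
    names, year, month = [], start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"logs_{year:04d}_{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names

# Only the hot partitions are scanned; the rest are pruned:
partitions_for_range(date(2024, 3, 5), date(2024, 4, 20))
# -> ['logs_2024_03', 'logs_2024_04']

# Archival is a cheap metadata operation instead of a slow DELETE:
# DROP TABLE logs_2024_01;
```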
## Sharding
### What is it?
Sharding distributes data across multiple independent database servers (shards). Each shard contains a unique subset of the data and operates autonomously.
### Sharding Architecture

The application (or a routing proxy) maps each request's shard key to the shard that owns it; shards do not talk to each other.

### Sharding with Replication

Each shard is itself a small replicated cluster (one primary plus replicas), combining write and storage scaling across shards with high availability within each shard.
### Sharding Strategies
**Range-Based Sharding**

```text
Shard 1: user_id 1 to 1,000,000
Shard 2: user_id 1,000,001 to 2,000,000
Shard 3: user_id 2,000,001 to 3,000,000
```
**Hash-Based Sharding**

```text
Shard = HASH(user_id) % number_of_shards

user_id 12345 → HASH → 2 → Shard 2
user_id 67890 → HASH → 1 → Shard 1
```
**Geographic Sharding**

```text
Shard US:   users where region = 'US'
Shard EU:   users where region = 'EU'
Shard ASIA: users where region = 'ASIA'
```
**Entity/Directory-Based Sharding**

```text
Shard mapping stored in a lookup table:

tenant_id: 1001 → Shard 1
tenant_id: 1002 → Shard 3
tenant_id: 1003 → Shard 1
```
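A minimal sketch of such a shard directory; in production the mapping would live in a small, replicated metadata store rather than an in-memory dict:

```python
class ShardDirectory:
    """Directory-based sharding: an explicit lookup table owns the mapping."""

    def __init__(self, mapping):
        self.mapping = mapping          # e.g. {1001: "shard_1", 1002: "shard_3"}

    def shard_for(self, tenant_id):
        return self.mapping[tenant_id]

    def move_tenant(self, tenant_id, new_shard):
        # Moving one tenant is a metadata update (plus a data copy),
        # not the cluster-wide reshuffle that naive hashing would need.
        self.mapping[tenant_id] = new_shard

directory = ShardDirectory({1001: "shard_1", 1002: "shard_3", 1003: "shard_1"})
directory.shard_for(1002)   # -> 'shard_3'
```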
### Key Characteristics
| Aspect | Description |
|---|---|
| Purpose | Horizontal scalability for writes & storage |
| Scope | Multiple independent database servers |
| Complexity | High (routing, joins, transactions) |
| Scalability | Nearly unlimited (add more shards) |
### Challenges
**Cross-Shard Queries**

```sql
-- This query spans multiple shards:
SELECT COUNT(*) FROM users WHERE created_at > '2024-01-01';
-- every shard must be queried and the partial results aggregated.
```
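The usual workaround is scatter-gather: run the filter on every shard in parallel and combine the partial results. A sketch, assuming `shards` is a list of sqlite3-style connections exposing `execute()`:

```python
from concurrent.futures import ThreadPoolExecutor

def count_across_shards(shards, where_clause, params=()):
    """Scatter-gather COUNT: query every shard, sum the partial counts."""
    def count_one(conn):
        cur = conn.execute(
            f"SELECT COUNT(*) FROM users WHERE {where_clause}", params)
        return cur.fetchone()[0]

    # Fan out in parallel so total latency is the slowest shard, not the sum.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        return sum(pool.map(count_one, shards))

# total = count_across_shards(all_shards, "created_at > ?", ("2024-01-01",))
```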
**Cross-Shard Joins**

```sql
-- Users on Shard 1, orders on Shard 2:
SELECT users.name, orders.total
FROM users
JOIN orders ON users.id = orders.user_id;
-- very expensive, or impossible without moving data between shards.
```
**Distributed Transactions**

- Maintaining ACID across shards is complex
- Often requires two-phase commit (a sketch follows below)
- Many systems settle for eventual consistency instead
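A minimal two-phase commit sketch (no timeouts and no coordinator recovery log, both of which a real implementation needs); `participants` are assumed to expose `prepare()`, `commit()`, and `rollback()`:

```python
def two_phase_commit(participants, txn_id):
    """Phase 1 collects votes; phase 2 commits only on a unanimous yes."""
    prepared = []
    for p in participants:                  # Phase 1: prepare (vote)
        if p.prepare(txn_id):
            prepared.append(p)
        else:
            for q in prepared:              # any "no" vote aborts everywhere
                q.rollback(txn_id)
            return False

    for p in participants:                  # Phase 2: commit everywhere
        p.commit(txn_id)
    return True
```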
**Rebalancing**

- Adding/removing shards requires data migration
- Can cause downtime or performance degradation
- Naive modulo hash sharding makes this harder; consistent hashing (sketched below) limits how many keys move
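A minimal consistent-hash ring, for contrast with modulo hashing: with `hash(key) % num_shards`, changing the shard count remaps almost every key, while here only the keys adjacent to the added or removed shard's ring positions move.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent hashing with virtual nodes."""

    def __init__(self, shards, vnodes=100):
        self.ring = []                       # sorted list of (position, shard)
        for shard in shards:
            self.add_shard(shard, vnodes)

    @staticmethod
    def _pos(key: str) -> int:
        # A stable hash mapped onto the ring's key space.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_shard(self, shard, vnodes=100):
        # Virtual nodes smooth out the balance between shards.
        for i in range(vnodes):
            bisect.insort(self.ring, (self._pos(f"{shard}#{i}"), shard))

    def remove_shard(self, shard):
        # Only keys that hashed near this shard's positions get remapped.
        self.ring = [(p, s) for p, s in self.ring if s != shard]

    def shard_for(self, key) -> str:
        # First ring position clockwise from the key's position.
        i = bisect.bisect(self.ring, (self._pos(str(key)), ""))
        return self.ring[i % len(self.ring)][1]

ring = ConsistentHashRing(["shard_1", "shard_2", "shard_3"])
ring.shard_for(12345)   # stable until the owning shard itself changes
```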
### Use Cases
✅ **When to use Sharding:**
- Massive data volume (TBs/PBs)
- High write throughput requirements
- Single database can't handle load
- Multi-tenant applications (shard by tenant)
- Data sovereignty requirements (geographic sharding)
❌ **When NOT to use:**

- You can still scale with replication or partitioning
- Frequent cross-shard queries are needed
- Complex transactions span entities on different shards
- Small teams without the DevOps capacity to operate it
### Example Scenario
**Social Media Platform**

```text
Shard 1: users with last_name A-F (10M users)
Shard 2: users with last_name G-M (10M users)
Shard 3: users with last_name N-S (10M users)
Shard 4: users with last_name T-Z (10M users)

Each shard has its own master + 2 replicas.
```
## Comparison Matrix
| Feature | Replication | Partitioning | Sharding |
|---|---|---|---|
| Data Distribution | Full copy per node | Subset per partition | Subset per shard |
| Scalability | Read scalability | Limited to a single server | Near-unlimited horizontal |
| Write Performance | No improvement | Improved for partition-aligned queries | Near-linear improvement |
| Read Performance | Near-linear improvement | Improved for partition-aware queries | Improved |
| Complexity | Low-Medium | Low | High |
| Storage Cost | High (full copies) | Same as original | Distributed |
| Query Complexity | Simple | Simple | Complex (cross-shard) |
| Failure Impact | Other replicas take over | Instance remains a single point of failure | Only the affected shard |
| Setup Difficulty | Easy | Easy | Difficult |
| Maintenance | Medium | Easy | Difficult |
## Combined Strategies
Real-world systems often combine these approaches:
### Example: E-commerce Platform
Strategy:
- Sharding by geographic region (data locality)
- Replication within each shard (high availability + read scaling)
- Partitioning by year (query performance + easy archival)
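As a sketch, the resulting topology could be described as configuration (all host and table names are illustrative):

```python
# Illustrative topology for the combined strategy.
topology = {
    "shard_us": {                                      # geographic shard
        "primary":  "us-db-1",
        "replicas": ["us-db-2", "us-db-3"],            # replication inside the shard
        "partitions": ["orders_2023", "orders_2024"],  # yearly partitions per node
    },
    "shard_eu": {
        "primary":  "eu-db-1",
        "replicas": ["eu-db-2", "eu-db-3"],
        "partitions": ["orders_2023", "orders_2024"],
    },
}
```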
## Quick Decision Guide
### Start with Replication if:

- Your reads outnumber writes 10:1 or more
- You need high availability
- You want geographic distribution
- Your data fits comfortably on one server
- You're starting out and need simple scaling
### Add Partitioning if:

- Single tables are becoming very large (>100 GB)
- Queries have predictable access patterns (time-based, range-based)
- You need to archive/delete old data regularly
- Query performance is degrading despite indexes
- You're still on a single database server
### Move to Sharding when:

- You've exhausted vertical scaling (bigger servers)
- Replication doesn't help (write bottleneck)
- Data exceeds single-server capacity (multi-TB)
- You need linear write scalability
- You have a multi-tenant architecture
- Data sovereignty requires geographic separation
## Best Practices
### Replication
- Monitor replication lag
- Use read replicas for analytics
- Implement retry logic for failover (see the sketch after this list)
- Consider async for performance, sync for consistency
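A sketch of failover-aware retry with exponential backoff; the exception type to catch depends on your driver, so the generic `ConnectionError` stands in here:

```python
import time

def with_failover(operation, max_retries=3, base_delay=0.5):
    """Retry a database call across a primary failover.

    `operation` is any zero-argument callable that hits the database.
    """
    for attempt in range(max_retries):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_retries - 1:
                raise                                # failover didn't complete in time
            time.sleep(base_delay * 2 ** attempt)    # exponential backoff

# Usage: with_failover(lambda: db.query("SELECT 1"))
```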
### Partitioning
- Choose partition key carefully (used in WHERE clauses)
- Keep partitions relatively equal in size
- Plan for growth (add partitions in advance)
- Use partition pruning in queries
### Sharding
- Choose a stable shard key (won't change)
- Distribute data evenly (avoid hot shards)
- Plan for rebalancing from day one
- Minimize cross-shard operations
- Use consistent hashing for dynamic sharding
- Keep shard mapping external (not hardcoded)
## Implementation Examples
### Replication Setup (PostgreSQL)

```sql
-- On the master (requires wal_level = logical)
CREATE PUBLICATION my_publication FOR ALL TABLES;

-- On the replica
CREATE SUBSCRIPTION my_subscription
    CONNECTION 'host=master_host dbname=mydb user=repl_user'
    PUBLICATION my_publication;
```
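Replication lag can then be watched from the primary via `pg_stat_replication` (PostgreSQL 10+ exposes `replay_lag` directly). A sketch, assuming psycopg2 as the driver:

```python
import psycopg2  # assumed driver; any PostgreSQL client works

def replication_lag(primary_dsn):
    """Per-replica replay lag as seen from the primary."""
    conn = psycopg2.connect(primary_dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT application_name, state, replay_lag "
                "FROM pg_stat_replication"
            )
            return cur.fetchall()
    finally:
        conn.close()
```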
### Partitioning Setup (PostgreSQL)

```sql
-- Create the partitioned parent table
CREATE TABLE orders (
    order_id    BIGSERIAL,
    created_at  TIMESTAMP,
    customer_id INTEGER,
    total       DECIMAL
) PARTITION BY RANGE (created_at);

-- Create the partitions (the upper bound is exclusive)
CREATE TABLE orders_2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
```
### Sharding Logic (Application Level)

```python
import zlib

def get_shard(user_id, num_shards=4):
    """Hash-based sharding with a stable hash.

    Python's built-in hash() is randomized per process
    (PYTHONHASHSEED), so it must never be used to route keys.
    """
    shard_id = zlib.crc32(str(user_id).encode()) % num_shards
    return f"shard_{shard_id}"

def get_connection(user_id):
    """Get the database connection for the shard that owns this user."""
    shard = get_shard(user_id)
    return connection_pool[shard]

# Usage
user_id = 12345
db = get_connection(user_id)
user = db.query("SELECT * FROM users WHERE id = ?", user_id)
```
## Summary
**Replication** = same data, multiple places → high availability & read scaling

**Partitioning** = split data logically, same server → query performance & manageability

**Sharding** = split data physically, multiple servers → near-unlimited horizontal scaling
Start simple (replication + partitioning), move to sharding only when necessary. Most applications never need sharding.
## Further Reading
- PostgreSQL Replication Documentation
- MySQL Sharding Strategies
- MongoDB Sharding Architecture
- Vitess (Sharding layer for MySQL)
- Citus (Distributed PostgreSQL)
- Consistent Hashing Algorithms
- CAP Theorem and Distributed Systems