Apache Kafka - System Design Interview Guide
Table of Contents
- What is Apache Kafka
- Why Use Kafka
- Core Components
- Kafka Architecture
- Role in Distributed Systems
- System Design Patterns
- Performance Characteristics
- Use Cases
- Trade-offs and Limitations
- Interview Questions
What is Apache Kafka
Apache Kafka is a distributed event streaming platform designed for high-throughput, low-latency data streaming. Originally developed at LinkedIn and open-sourced in 2011, Kafka acts as a distributed commit log that can handle millions of events per second.
Core Concepts
- Event Streaming: Continuous flow of data records (events) between systems
- Distributed: Runs as a cluster across multiple servers for fault tolerance
- Persistent: Events are stored on disk and replicated across brokers
- Pub-Sub Model: Publishers send messages to topics, consumers subscribe to topics
- Immutable Log: Events are append-only and immutable once written
Key Characteristics
- High Throughput: Millions of messages per second
- Low Latency: End-to-end latencies as low as a few milliseconds
- Horizontal Scalability: Add more brokers to handle increased load
- Durability: Messages persisted to disk with configurable retention
- Fault Tolerance: Data replicated across multiple brokers
- Ordering Guarantees: Messages ordered within partitions
Why Use Kafka
Traditional Messaging Problems
Point-to-Point Systems:
- Tight coupling between producers and consumers
- Difficult to add new consumers
- Single point of failure
- Limited scalability
Traditional Message Queues:
- Messages deleted after consumption
- Limited replay capability
- Difficulty handling high throughput
- Complex routing logic
Kafka Solutions
- Decoupling: Producers and consumers don't need to know about each other
- Scalability: Horizontal scaling through partitioning
- Durability: Messages stored for configurable time periods
- Replay: Consumers can replay messages from any point (see the sketch after this list)
- Multi-Consumer: Multiple consumer groups can read same data
- Ordering: Maintains message order within partitions
- Fault Tolerance: No single point of failure
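To make the replay point above concrete, here is a minimal sketch in Java using the official Kafka client. The topic name user-events and the broker address localhost:9092 are illustrative; the consumer pins itself to one partition and rewinds to the earliest retained offset:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Pin to one partition explicitly, then rewind to offset 0
            TopicPartition partition = new TopicPartition("user-events", 0);
            consumer.assign(List.of(partition));
            consumer.seekToBeginning(List.of(partition));

            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```

Using assign() instead of subscribe() sidesteps group coordination entirely, which keeps a one-off replay deterministic.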
Business Benefits
- Real-time Processing: Enable real-time analytics and responses
- System Integration: Connect disparate systems seamlessly
- Event Sourcing: Build event-driven architectures
- Data Pipeline: Create reliable data pipelines
- Microservices Communication: Async communication between services
Core Components
1. Topics
Definition: Named streams of records where events are published
Topic: "user-events"
├── Partition 0: [event1, event2, event3, ...]
├── Partition 1: [event4, event5, event6, ...]
└── Partition 2: [event7, event8, event9, ...]
Characteristics:
- Logical grouping of related events
- Split into multiple partitions for parallelism
- Immutable append-only logs
- Configurable retention period
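As a sketch of how such a topic might be created programmatically, the snippet below uses the Kafka AdminClient; the topic name, partition count, replication factor, and broker address are all illustrative choices, and the kafka-topics.sh CLI can do the same:

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 2 -- both illustrative choices
            NewTopic topic = new NewTopic("user-events", 3, (short) 2);
            admin.createTopics(List.of(topic)).all().get(); // blocks until created
        }
    }
}
```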
2. Partitions
Purpose: Enable parallelism and scalability
Key Features:
- Ordering: Messages ordered within each partition
- Parallelism: Different partitions can be processed in parallel
- Distribution: Partitions distributed across brokers
- Key-based Routing: Messages with same key go to same partition
Partition Assignment:
Message Key → Hash Function → Partition Number
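The snippet below is a simplified illustration of this routing rule, not Kafka's actual implementation: the real default partitioner murmur2-hashes the serialized key bytes and uses sticky partitioning for keyless messages, but the core idea, that equal keys always map to the same partition, is the same:

```java
public class PartitionRouting {
    // Simplified sketch of key-based routing. Kafka's real default partitioner
    // murmur2-hashes the serialized key bytes; hashCode() stands in here only
    // to illustrate that equal keys deterministically map to one partition.
    static int partitionFor(String key, int numPartitions) {
        int positiveHash = key.hashCode() & 0x7fffffff; // drop the sign bit
        return positiveHash % numPartitions;
    }

    public static void main(String[] args) {
        // Every event keyed "user-42" lands on the same partition, preserving order
        System.out.println(partitionFor("user-42", 3));
        System.out.println(partitionFor("user-42", 3)); // same result every time
    }
}
```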
3. Brokers
Definition: Kafka servers that store and serve data
Responsibilities:
- Store partition data on disk
- Handle producer and consumer requests
- Replicate data to other brokers
- Manage partition leadership
Cluster Configuration:
Kafka Cluster
├── Broker 1 (Leader for Partition 0)
├── Broker 2 (Leader for Partition 1)
└── Broker 3 (Leader for Partition 2)
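As a quick way to inspect this layout on a running cluster, the AdminClient can list the brokers; a minimal sketch (broker address illustrative):

```java
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.Node;

public class DescribeClusterExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address

        try (AdminClient admin = AdminClient.create(props)) {
            // Print every broker in the cluster with its id, host, and port
            for (Node node : admin.describeCluster().nodes().get()) {
                System.out.printf("broker %d at %s:%d%n",
                        node.id(), node.host(), node.port());
            }
        }
    }
}
```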
4. Producers
Role: Applications that publish events to Kafka topics
Key Features:
- Batching: Group multiple messages for efficiency
- Partitioning Strategy: Determine which partition each message is sent to
- Acknowledgment Modes: Configure delivery guarantees
- Compression: Reduce network overhead
Producer Configurations:
acks=0: Fire and forget (fastest, least reliable)
acks=1: Wait for the leader's acknowledgment
acks=all: Wait for all in-sync replicas (slowest, most reliable)
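The sketch below wires these settings into a producer using the official Java client; the broker address, topic, key, and value are illustrative:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        props.put("acks", "all");             // wait for all in-sync replicas
        props.put("compression.type", "lz4"); // compress batches on the wire
        props.put("linger.ms", "5");          // wait up to 5 ms to fill a batch
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key -> same partition, so events for one user stay ordered
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("user-events", "user-42", "signed_up");
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("partition=%d offset=%d%n",
                            metadata.partition(), metadata.offset());
                }
            });
        } // close() flushes any buffered records
    }
}
```

Note that acks=all waits only for the current in-sync replica set, which is why it is usually paired with the min.insync.replicas topic setting (covered under Replication Strategy below).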
5. Consumers
Role: Applications that read events from Kafka topics
Consumer Groups:
- Multiple consumers working together
- Each partition is assigned to exactly one consumer within a group
- Automatic rebalancing when consumers join/leave
Offset Management:
- Track position in each partition
- Enable replay from specific points
- Stored in the internal Kafka topic __consumer_offsets
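The sketch below ties these ideas together in a minimal consumer-group loop using the official Java client. The group id and topic are illustrative, and process() is a hypothetical stand-in for application logic; committing only after processing gives at-least-once semantics, with offsets landing in __consumer_offsets:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerGroupExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        props.put("group.id", "user-events-processors");  // illustrative group id
        props.put("enable.auto.commit", "false");         // commit offsets manually
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("user-events"));
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // hypothetical application logic
                }
                consumer.commitSync(); // offsets land in __consumer_offsets
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        System.out.printf("p%d@%d: %s%n",
                record.partition(), record.offset(), record.value());
    }
}
```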
6. ZooKeeper (Legacy) / KRaft (New)
ZooKeeper (Legacy):
- Manages cluster metadata
- Handles leader election
- Stores configuration information
- Deprecated since Kafka 3.5 and removed entirely in Kafka 4.0
KRaft (Kafka Raft):
- New consensus protocol
- Eliminates ZooKeeper dependency
- Simplifies deployment and operations
- Better scalability and performance
Kafka Architecture
Cluster Architecture
┌─────────────────────────────────────────────────────────┐
│                      Kafka Cluster                      │
│   ┌──────────┐      ┌──────────┐      ┌──────────┐      │
│   │ Broker 1 │      │ Broker 2 │      │ Broker 3 │      │
│   │          │      │          │      │          │      │
│   │ Topic A  │      │ Topic A  │      │ Topic A  │      │
│   │  P0(L)   │      │  P1(L)   │      │  P0(F)   │      │
│   │  P1(F)   │      │  P0(F)   │      │  P1(F)   │      │
│   └──────────┘      └──────────┘      └──────────┘      │
└─────────────────────────────────────────────────────────┘
         ↑                                 ↓
   ┌──────────┐                      ┌──────────┐
   │Producer 1│                      │Consumer 1│
   │Producer 2│                      │Consumer 2│
   └──────────┘                      └──────────┘
Legend: P0(L) = Partition 0 Leader, P0(F) = Partition 0 Follower
Replication Strategy
Leader-Follower Model:
- Each partition has one leader and multiple followers
- Producers/consumers interact only with leaders
- Followers replicate data from leaders
- Automatic failover if leader fails
In-Sync Replicas (ISR):
- Replicas that are fully caught up with the leader's log
- Used for leader election
- Ensures data consistency
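A common way to harden this in practice is the min.insync.replicas topic setting: combined with a producer's acks=all, a write succeeds only once that many in-sync replicas have it. Below is a sketch of setting it per topic with the AdminClient (topic name and broker address are illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class MinIsrExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address

        try (AdminClient admin = AdminClient.create(props)) {
            // With acks=all on the producer, min.insync.replicas=2 means a write
            // succeeds only once at least 2 replicas (leader included) have it
            ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "user-events");
            AlterConfigOp setMinIsr = new AlterConfigOp(
                    new ConfigEntry("min.insync.replicas", "2"),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(setMinIsr)))
                    .all().get();
        }
    }
}
```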
Message Flow
1. A producer sends a message to a topic partition
2. The partition's leader broker appends the message to its log
3. Follower brokers replicate the message from the leader
4. An acknowledgment is sent back to the producer (per its acks setting)
5. Consumers read messages from the partitions
6. Consumer offsets are committed after successful processing