Apache Kafka - System Design Interview Guide
Table of Contents
- What is Apache Kafka
- Why Use Kafka
- Core Components
- Kafka Architecture
- Role in Distributed Systems
- System Design Patterns
- Performance Characteristics
- Use Cases
- Trade-offs and Limitations
- Interview Questions
What is Apache Kafka
Apache Kafka is a distributed event streaming platform designed for high-throughput, low-latency data streaming. Originally developed by LinkedIn and later open-sourced, Kafka acts as a distributed commit log that can handle millions of events per second.
Core Concepts
- Event Streaming: Continuous flow of data records (events) between systems
- Distributed: Runs as a cluster across multiple servers for fault tolerance
- Persistent: Events are stored on disk and replicated across brokers
- Pub-Sub Model: Publishers send messages to topics, consumers subscribe to topics
- Immutable Log: Events are append-only and immutable once written
Key Characteristics
- High Throughput: Millions of messages per second
- Low Latency: End-to-end delivery latencies as low as a few milliseconds
- Horizontal Scalability: Add more brokers to handle increased load
- Durability: Messages persisted to disk with configurable retention
- Fault Tolerance: Data replicated across multiple brokers
- Ordering Guarantees: Messages ordered within partitions
Why Use Kafka
Traditional Messaging Problems
Point-to-Point Systems:
- Tight coupling between producers and consumers
- Difficult to add new consumers
- Single point of failure
- Limited scalability
Traditional Message Queues:
- Messages deleted after consumption
- Limited replay capability
- Difficulty handling high throughput
- Complex routing logic
Kafka Solutions
- Decoupling: Producers and consumers don't need to know about each other
- Scalability: Horizontal scaling through partitioning
- Durability: Messages stored for configurable time periods
- Replay: Consumers can replay messages from any point
- Multi-Consumer: Multiple consumer groups can independently read the same data
- Ordering: Maintains message order within partitions
- Fault Tolerance: No single point of failure
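The decoupling, replay, and multi-consumer properties above can be sketched with a toy in-memory log in which each consumer group tracks its own read offset (names like `ToyLog` are illustrative, not Kafka APIs):

```python
class ToyLog:
    """Append-only log; consumers read by offset, so the same data can be
    consumed independently by many groups and replayed at will."""

    def __init__(self):
        self.records = []   # immutable once appended
        self.offsets = {}   # group id -> next offset to read

    def append(self, record):
        self.records.append(record)

    def poll(self, group):
        """Return unread records for this group and advance its offset."""
        start = self.offsets.get(group, 0)
        batch = self.records[start:]
        self.offsets[group] = len(self.records)
        return batch

    def seek(self, group, offset):
        """Replay: reset a group's offset to any earlier position."""
        self.offsets[group] = offset


log = ToyLog()
for event in ["signup", "click", "purchase"]:
    log.append(event)

# Two groups read the same data independently; neither deletes it.
assert log.poll("analytics") == ["signup", "click", "purchase"]
assert log.poll("billing") == ["signup", "click", "purchase"]

# Replay from offset 1 for one group without affecting the other.
log.seek("analytics", 1)
assert log.poll("analytics") == ["click", "purchase"]
```

Because the producer only ever appends and each group only tracks an offset, adding a new consumer group requires no change to the producer, which is the decoupling the list above describes.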
Business Benefits
- Real-time Processing: Enable real-time analytics and responses
- System Integration: Connect disparate systems seamlessly
- Event Sourcing: Build event-driven architectures
- Data Pipeline: Create reliable data pipelines
- Microservices Communication: Async communication between services
Core Components
1. Topics
Definition: Named streams of records where events are published
Topic: "user-events"
├── Partition 0: [event1, event2, event3, ...]
├── Partition 1: [event4, event5, event6, ...]
└── Partition 2: [event7, event8, event9, ...]
Characteristics:
- Logical grouping of related events
- Split into multiple partitions for parallelism
- Immutable append-only logs
- Configurable retention period
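The append-only log with time-based retention can be sketched as follows (a simplification: real Kafka deletes whole log segments rather than individual records, so stored offsets remain stable):

```python
import time

class Partition:
    """One partition: an append-only list of (timestamp, value) records
    with a simplified time-based retention policy."""

    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self.records = []  # (timestamp, value) pairs

    def append(self, value, now=None):
        ts = time.time() if now is None else now
        self.records.append((ts, value))

    def enforce_retention(self, now=None):
        """Drop records older than the retention window.
        (Real Kafka removes whole segments, not single records.)"""
        cutoff = (time.time() if now is None else now) - self.retention
        self.records = [(ts, v) for ts, v in self.records if ts >= cutoff]


p = Partition(retention_seconds=60)
p.append("old-event", now=0)
p.append("new-event", now=100)
p.enforce_retention(now=110)   # cutoff = 50: "old-event" is dropped
assert [v for _, v in p.records] == ["new-event"]
```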
2. Partitions
Purpose: Enable parallelism and scalability
Key Features:
- Ordering: Messages ordered within each partition
- Parallelism: Different partitions can be processed in parallel
- Distribution: Partitions distributed across brokers
- Key-based Routing: Messages with the same key always go to the same partition
Partition Assignment:
Message Key → Hash Function → Partition Number
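That routing step can be sketched in a few lines (Kafka's default partitioner actually uses murmur2 hashing; `simple_partition` below uses CRC32 purely for illustration):

```python
import zlib

def simple_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition: hash(key) mod partition count.
    (Kafka's default partitioner uses murmur2, not CRC32; messages
    without a key are instead spread across partitions in batches.)"""
    return zlib.crc32(key) % num_partitions

# The same key always lands on the same partition, which is what
# preserves per-key ordering.
assert simple_partition(b"user-42", 3) == simple_partition(b"user-42", 3)

# Note: changing the partition count re-maps keys, which is why adding
# partitions later can break key-based ordering guarantees.
```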
3. Brokers
Definition: Kafka servers that store and serve data
Responsibilities:
- Store partition data on disk
- Handle producer and consumer requests
- Replicate data to other brokers
- Manage partition leadership
Cluster Configuration:
Kafka Cluster
├── Broker 1 (Leader for Partition 0)
├── Broker 2 (Leader for Partition 1)
└── Broker 3 (Leader for Partition 2)
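Partition leadership and failover can be sketched as a mapping from partition to a replica list whose head is the leader (a simplification: real Kafka's controller elects leaders from the in-sync replica set using cluster metadata):

```python
# partition -> replica list; the first entry is the current leader.
assignment = {
    0: ["broker1", "broker2", "broker3"],
    1: ["broker2", "broker3", "broker1"],
    2: ["broker3", "broker1", "broker2"],
}

def leader(partition):
    return assignment[partition][0]

def fail_broker(broker):
    """On broker failure, promote the next surviving replica to leader.
    (Real Kafka's controller does this using the in-sync replica set.)"""
    for replicas in assignment.values():
        if broker in replicas:
            replicas.remove(broker)
        # the remaining first replica becomes (or stays) the leader

assert leader(0) == "broker1"
fail_broker("broker1")
assert leader(0) == "broker2"   # follower promoted to leader
assert leader(1) == "broker2"   # unaffected leader keeps its role
```

Spreading leadership across brokers, as in the diagram above, balances load: each broker serves reads and writes for the partitions it leads while acting as a follower for the others.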
4. Producers
Role: Applications that publish events to Kafka topics
Key Features:
- Batching: Group multiple messages for efficiency
- Partitioning Strategy: Determine which partition each message is sent to
- Acknowledgment Modes: Configure delivery guarantees
- Compression: Reduce network overhead
Producer Configurations:
acks=0: Fire and forget (fastest, least reliable)
acks=1: Wait for leader acknowledgment
acks=all: Wait for all replicas (slowest, most reliable)
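The durability/latency trade-off between these modes can be sketched as the number of replica acknowledgments the leader waits for before confirming a write (a sketch of the semantics, not Kafka client code):

```python
def required_acks(acks: str, replication_factor: int) -> int:
    """How many broker acknowledgments are required before a write is
    confirmed to the producer, for each `acks` setting (simplified)."""
    if acks == "0":
        return 0                    # fire and forget: no confirmation
    if acks == "1":
        return 1                    # leader's own write only
    if acks == "all":
        return replication_factor   # every replica (simplified: all)
    raise ValueError(f"unknown acks setting: {acks}")

# With replication factor 3: durability (and latency) rise together.
assert required_acks("0", 3) == 0
assert required_acks("1", 3) == 1
assert required_acks("all", 3) == 3
```

In real Kafka, acks=all waits for the current in-sync replica set rather than a fixed replication factor, with the broker setting min.insync.replicas as the floor below which writes are rejected.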