Spring Boot Microservices Interview Questions & Answers
1. Service Communication & Discovery
Q: Your Order Service needs to call Inventory Service. How would you implement this communication?
A: I'd use:
- Synchronous: RestTemplate/WebClient with Service Discovery (Eureka)
- Asynchronous: Message Queue (RabbitMQ/Kafka) for eventual consistency
- Register both services with Eureka, use service name instead of hardcoded URLs
- Add circuit breaker (Resilience4j) for fault tolerance
Real Example: Amazon - Order service checks inventory availability before confirming order.
Q: Multiple instances of Payment Service are running. How does Order Service know which instance to call?
A:
- Use Spring Cloud LoadBalancer (or Ribbon in older versions)
- Services register with Eureka with multiple instances
- LoadBalancer automatically does client-side load balancing (Round Robin, Random, etc.)
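- Example (the RestTemplate bean must be annotated with @LoadBalanced so the service name is resolved to an instance):
restTemplate.getForObject("http://PAYMENT-SERVICE/api/pay", PaymentResponse.class)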
Real Example: Netflix - Runs thousands of service instances; Eureka tracks which ones are available and client-side load balancers spread requests across them.
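The round-robin strategy mentioned above can be sketched in plain Java. This is an illustration only: RoundRobinChooser is a hypothetical class, not the Spring Cloud LoadBalancer implementation, and the instance URLs are made up.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of Spring Cloud): picks instances in round-robin order.
final class RoundRobinChooser {
    private final AtomicInteger position = new AtomicInteger();

    // Returns the next instance URL, wrapping around at the end of the list.
    String choose(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalStateException("no instances registered");
        }
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(position.getAndIncrement(), instances.size());
        return instances.get(index);
    }
}
```

Each caller keeps its own counter, which is why this is called client-side load balancing: no central proxy decides where the request goes.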
2. Distributed Transactions
Q: User places an order: Order Service → Payment Service → Inventory Service. If payment succeeds but inventory update fails, how do you handle it?
A:
- Saga Pattern (choreography or orchestration)
- Choreography: Each service publishes events, others listen and react
- Order Created → Payment Processes → Inventory Updates
- If fails: Publish compensation events (refund payment)
- Orchestration: Central orchestrator manages the flow
- Use eventual consistency, avoid distributed 2PC
- Implement compensating transactions for rollback
Real Example: Uber Eats - Order placed → Restaurant confirms → Delivery assigned. If restaurant cancels, refund payment automatically.
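The orchestration variant with compensating transactions can be sketched without any framework. SagaStep and SagaOrchestrator are hypothetical names; a real orchestrator would also persist saga state and handle failures inside the compensations themselves.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// One saga step: a forward action plus the compensation that undoes it.
final class SagaStep {
    final String name;
    final Runnable action;
    final Runnable compensation;

    SagaStep(String name, Runnable action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

// Hypothetical orchestrator: runs steps in order; on failure it compensates
// the steps that already completed, most recent first.
final class SagaOrchestrator {
    boolean execute(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

If the inventory step throws, the payment is refunded before the order is cancelled, mirroring the reverse order of the forward flow.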
Q: How would you implement Saga pattern with Spring Boot?
A:
// Choreography with Kafka
@KafkaListener(topics = "order-created")
public void processPayment(OrderEvent event) {
try {
paymentService.processPayment(event);
kafkaTemplate.send("payment-success", event);
} catch (Exception e) {
kafkaTemplate.send("payment-failed", event);
}
}
// Compensation
@KafkaListener(topics = "payment-failed")
public void cancelOrder(OrderEvent event) {
orderService.cancelOrder(event.getOrderId());
}
Real Example: Airbnb - Booking → Payment → Host notification → Calendar block. Any failure triggers compensating transactions.
3. Circuit Breaker & Fault Tolerance
Q: Your Order Service calls Payment Service, but Payment Service is down. How do you handle this?
A:
- Implement Circuit Breaker using Resilience4j
- Three states: Closed → Open → Half-Open
- After threshold failures, circuit opens (stops calling service)
- Provide fallback response
- Periodically retry (half-open state)
@CircuitBreaker(name = "paymentService", fallbackMethod = "paymentFallback")
public PaymentResponse processPayment(PaymentRequest request) {
return restTemplate.postForObject(url, request, PaymentResponse.class);
}
public PaymentResponse paymentFallback(PaymentRequest request, Exception e) {
return new PaymentResponse("Payment service unavailable, order queued");
}
Real Example: Netflix - If recommendation service fails, show default content instead of error page.
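To make the three states concrete, here is a minimal state machine in plain Java. It is a teaching sketch, not how Resilience4j is implemented (the library uses sliding windows and failure rates); SimpleCircuitBreaker and its consecutive-failure threshold are assumptions, and the clock is injected only so the behaviour can be tested.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Minimal circuit breaker: counts consecutive failures, opens at a threshold,
// and allows a trial call once the open period has elapsed.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // injected so tests can control time
    private State state = State.CLOSED;
    private int failures;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    // Reports the current state, moving OPEN to HALF_OPEN once the wait has elapsed.
    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    // For simplicity the lock is held during the call; a real breaker would not do this.
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // short-circuit: do not touch the failing service
        }
        try {
            T result = action.get();
            failures = 0;
            state = State.CLOSED; // a success in HALF_OPEN closes the circuit again
            return result;
        } catch (RuntimeException e) {
            failures++;
            if (state == State.HALF_OPEN || failures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }
}
```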
Q: Circuit breaker keeps opening during peak hours. How do you debug?
A:
- Check Actuator metrics: /actuator/health, /actuator/circuitbreakers
- Review circuit breaker config (failure threshold, wait duration)
- Check downstream service logs and health
- Monitor using Micrometer + Prometheus/Grafana
- Verify timeout settings aren't too aggressive
- Scale downstream service if consistently overloaded
Real Example: Flipkart during Big Billion Days - Circuit breakers prevent cascade failures when services get overloaded.
4. API Gateway
Q: You have 15 microservices. Frontend needs to call multiple services. How do you manage this?
A:
- Implement API Gateway (Spring Cloud Gateway, or Netflix Zuul in older stacks)
- Single entry point for all clients
- Routes requests to appropriate microservices
- Handles cross-cutting concerns:
- Authentication/Authorization
- Rate limiting
- Request/Response transformation
- Load balancing
spring:
  cloud:
    gateway:
      routes:
        - id: order-service
          uri: lb://ORDER-SERVICE
          predicates:
            - Path=/api/orders/**
          filters:
            - name: CircuitBreaker
              args:
                name: orderService
                fallbackUri: forward:/fallback/orders
Real Example: Amazon API Gateway - Managed gateway that puts backend services (Lambda functions, HTTP endpoints) behind a single entry point.
Q: How do you secure APIs in API Gateway?
A:
- Integrate with OAuth2/JWT authentication
- Validate tokens at gateway level
- Use Spring Security with resource server
- Pass user context to downstream services via headers
- Implement rate limiting per user/API key
Real Example: Stripe API - All requests go through gateway, authenticated via API keys.
5. Configuration Management
Q: You need to change database URL across 10 microservices without redeployment. How?
A:
- Use Spring Cloud Config Server
- Centralized configuration in Git repository
- Services fetch config on startup
- Use @RefreshScope for runtime refresh
- Trigger refresh via /actuator/refresh endpoint or Spring Cloud Bus
@RestController
@RefreshScope
public class OrderController {
@Value("${database.url}")
private String dbUrl;
}
Real Example: Spotify - Configuration changes pushed to thousands of microservices without restart.
Q: How do you handle sensitive data like passwords in Config Server?
A:
- Encrypt properties using Spring Cloud Config encryption
- Use Vault for secrets management
- Environment variables for cloud deployments
- Never commit plain text secrets to Git
- Example: {cipher}AQA7h8fj3h4k5... in properties file
Real Example: PayPal - All secrets stored in HashiCorp Vault, never in code.
6. Service Discovery Issues
Q: Service registered with Eureka but other services can't discover it. How do you troubleshoot?
A:
- Check Eureka dashboard: http://eureka-server:8761
- Verify service registration config:
eureka:
  client:
    service-url:
      defaultZone: http://localhost:8761/eureka/
    register-with-eureka: true
    fetch-registry: true
- Check network connectivity between services
- Verify application name is correct
- Check if instance is showing as UP in Eureka
- Review heartbeat intervals and renewal thresholds
Real Example: Netflix - Eureka was created to handle their massive service discovery needs.
7. Database per Service
Q: Order Service needs customer email from User Service for sending confirmation. How do you handle this?
A:
- Option 1: API call to User Service (synchronous)
- Option 2: Event-driven - User Service publishes user events, Order Service maintains a local copy of the user fields it needs
- Option 3: API Composition in API Gateway
- Option 4: CQRS pattern with shared read database
For critical data: Synchronous call with caching
For non-critical: Event-driven eventual consistency
Real Example: Uber - Ride service maintains denormalized user data to avoid constant calls to user service.
Q: How do you handle database joins across microservices?
A:
- Avoid joins across services
- Use API composition in application layer
- Implement CQRS with materialized views
- Denormalize data where necessary
- Use event sourcing to maintain consistency
Real Example: Twitter - Tweet service maintains denormalized user info to display tweets without calling user service every time.
8. Monitoring & Observability
Q: Production issue: API response time suddenly increased from 200ms to 5 seconds. How do you debug?
A:
- Check APM tools: Zipkin/Sleuth for distributed tracing
- Review logs: Aggregate logs (ELK stack)
- Metrics: Prometheus/Grafana for CPU, memory, DB connections
- Trace ID: Follow request across services
- Check:
- Database query performance
- External API latency
- Network issues
- Circuit breaker state
- Resource exhaustion (threads, connections)
Real Example: LinkedIn - Uses distributed tracing to identify bottlenecks in their feed generation pipeline.
Q: How do you implement distributed tracing?
A:
<!-- Add dependencies (Sleuth targets Spring Boot 2.x; Spring Boot 3 replaces it with Micrometer Tracing) -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-sleuth-zipkin</artifactId>
</dependency>
# Configuration (application.yml)
spring:
  zipkin:
    base-url: http://localhost:9411
  sleuth:
    sampler:
      probability: 1.0 # 100% sampling for dev
- Each request gets unique Trace ID
- Span ID for each service hop
- Visualize in Zipkin UI
Real Example: Google Dapper - Pioneered distributed tracing for their microservices.
9. Rate Limiting & Throttling
Q: External API allows only 100 requests/minute. Multiple instances of your service exist. How do you implement rate limiting?
A:
- Use distributed rate limiter:
- Redis-based rate limiting (Spring Cloud Gateway + Redis)
- Bucket4j with distributed backend
- Store counter in Redis with TTL
- Implement token bucket or sliding window algorithm
- Return 429 Too Many Requests when limit exceeded
@Bean
public RouteLocator routes(RouteLocatorBuilder builder) {
return builder.routes()
.route("limited-route", r -> r.path("/api/**")
.filters(f -> f.requestRateLimiter(c -> c
.setRateLimiter(redisRateLimiter())
.setKeyResolver(userKeyResolver())))
.uri("lb://BACKEND-SERVICE"))
.build();
}
Real Example: Twitter API - Rate limits per user/app to prevent abuse.
Q: How do you implement per-user rate limiting across multiple gateway instances?
A:
- Use Redis with user ID as key
- Implement sliding window counter
- Store request timestamps in Redis sorted set
- Clean up old entries beyond time window
- Atomic operations to prevent race conditions
Real Example: GitHub API - Different rate limits for authenticated vs unauthenticated users.
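The sliding-window steps above can be sketched with an in-memory stand-in for the Redis sorted set. SlidingWindowLimiter is a hypothetical class and only works within one JVM; across gateway instances the same logic would run against Redis, with a Lua script or transaction providing the atomicity that synchronized provides here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the Redis sorted set: one timestamp log per user.
final class SlidingWindowLimiter {
    private final int maxRequests;
    private final long windowMillis;
    private final Map<String, Deque<Long>> requestLog = new HashMap<>();

    SlidingWindowLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    // synchronized makes the check-and-record step atomic.
    synchronized boolean tryAcquire(String userId, long nowMillis) {
        Deque<Long> timestamps = requestLog.computeIfAbsent(userId, id -> new ArrayDeque<>());
        // Drop entries that have fallen out of the window.
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= maxRequests) {
            return false; // the caller would answer 429 Too Many Requests
        }
        timestamps.addLast(nowMillis);
        return true;
    }
}
```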
10. Data Consistency
Q: User updates profile in User Service. Order Service shows old data. How do you ensure consistency?
A:
- Event-Driven Architecture:
- User Service publishes "UserUpdated" event to Kafka
- Order Service subscribes and updates its cache/read replica
- Cache invalidation: Invalidate cache on update
- TTL on cache: Set expiration time
- CQRS: Separate read/write models
- Eventual consistency: Accept slight delay (usually acceptable)
Real Example: Facebook - Profile updates eventually propagate to all services through event streams.
11. Security Scenarios
Q: How do you secure inter-service communication?
A:
- mTLS (mutual TLS) for service-to-service
- JWT tokens passed via headers
- Service mesh (Istio) for automatic encryption
- API Gateway validates external requests
- Internal services validate JWT and check roles
- Use Spring Security OAuth2 Resource Server
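@Configuration
@EnableWebSecurity
public class SecurityConfig {
    // WebSecurityConfigurerAdapter was removed in Spring Security 6;
    // a SecurityFilterChain bean (available since 5.7) replaces it
    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        http
            .authorizeHttpRequests(auth -> auth.anyRequest().authenticated())
            .oauth2ResourceServer(oauth2 -> oauth2
                .jwt(jwt -> jwt.jwtAuthenticationConverter(jwtConverter())));
        return http.build();
    }
}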
Real Example: Google Cloud - All internal service communication encrypted with mTLS.
Q: How do you implement SSO across microservices?
A:
- Use OAuth2/OpenID Connect with Keycloak/Okta
- API Gateway handles authentication
- Issues JWT token after login
- Token contains user info and roles
- All services validate same token
- Centralized user session management
Real Example: Microsoft 365 - Single sign-on across all Microsoft services (Teams, Outlook, OneDrive).
Q: JWT token is compromised. How do you revoke it before expiration?
A:
- Maintain token blacklist in Redis with expiry
- Check blacklist on each request
- Use short-lived access tokens (5-15 min)
- Long-lived refresh tokens stored securely
- Implement token versioning (increment version on password change)
- Force re-authentication if needed
Real Example: AWS - Uses temporary security tokens that expire after short duration.
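Token versioning can be sketched as a per-user counter that is embedded in each token as a claim and compared on every request. TokenVersionRegistry is a hypothetical class; in practice the counter would live in Redis or the user table rather than in memory, and the version would be read from the validated JWT.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory registry of the current token version per user.
final class TokenVersionRegistry {
    private final Map<String, Integer> currentVersion = new ConcurrentHashMap<>();

    // Version to embed as a claim when issuing a token.
    int versionFor(String userId) {
        return currentVersion.getOrDefault(userId, 0);
    }

    // Called on password change or suspected compromise; invalidates all older tokens.
    void revokeAll(String userId) {
        currentVersion.merge(userId, 1, Integer::sum);
    }

    // A token is accepted only if its version claim matches the current version.
    boolean isValid(String userId, int tokenVersion) {
        return tokenVersion == versionFor(userId);
    }
}
```

Compared with a per-token blacklist, this revokes every outstanding token for the user with a single write.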
12. Deployment & Scaling
Q: Order Service receives 10x traffic during sale. How do you auto-scale?
A:
- Kubernetes HPA (Horizontal Pod Autoscaler):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
- Monitor CPU/Memory metrics
- Scale based on custom metrics (queue depth, request rate)
- Use caching (Redis) to reduce database load
Real Example: Amazon Prime Day - Auto-scales services to handle massive traffic spikes.
Q: Database becomes bottleneck during scaling. How do you handle?
A:
- Read replicas for read-heavy operations
- Connection pooling optimization
- Caching layer (Redis/Memcached)
- Database sharding for write scalability
- CQRS with separate read/write databases
- Queue-based writes for non-critical operations
Real Example: Instagram - Uses read replicas and aggressive caching to handle billions of requests.
13. Caching Strategy
Q: Product catalog changes rarely but is queried frequently. How do you optimize?
A:
- Multi-level caching:
- L1: In-memory cache (Caffeine) in each service instance
- L2: Distributed cache (Redis) shared across instances
- Cache-aside pattern: Check cache → if miss, query DB → update cache
- Set appropriate TTL based on data freshness requirement
- Cache invalidation on product updates via events
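@Cacheable(value = "products", key = "#productId")
public Product getProduct(Long productId) {
    // findById returns Optional<Product>, so unwrap it (throws NoSuchElementException if absent)
    return productRepository.findById(productId).orElseThrow();
}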
@CacheEvict(value = "products", key = "#product.id")
public void updateProduct(Product product) {
productRepository.save(product);
}
Real Example: Netflix - Caches movie metadata to reduce database load.
Q: Cache stampede occurs when popular cache expires. How do you prevent?
A:
- Mutex/Lock: First thread refreshes, others wait
- Probabilistic early expiration: Refresh before actual expiry
- Background refresh: Async refresh before expiry
- Stale-while-revalidate: Serve stale data while refreshing
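public Product getProduct(Long id) {
    Product cached = getFromCache(id); // assumed to return null on a miss
    if (cached != null) {
        return cached; // cache hit: no lock needed
    }
    RLock lock = redisson.getLock("product:" + id);
    lock.lock(); // only one caller rebuilds the entry; the others block here
    try {
        // Re-check: another caller may have refreshed the entry while we waited
        cached = getFromCache(id);
        return cached != null ? cached : refreshCache(id);
    } finally {
        lock.unlock();
    }
}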
Real Example: Reddit - Handles cache stampede during major events using distributed locks.
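Probabilistic early expiration can be sketched as a single decision function: each reader refreshes slightly before the real expiry with a probability that grows as expiry approaches, so refreshes spread out instead of piling up. This follows the commonly described "XFetch" formulation; EarlyExpiration and its parameter names are assumptions, and the random source is injected so the behaviour is reproducible.

```java
import java.util.function.DoubleSupplier;

// Sketch of probabilistic early expiration.
final class EarlyExpiration {
    // deltaMillis: how long a recompute takes; beta > 1 refreshes earlier, beta < 1 later.
    // random must supply values in (0, 1], e.g. () -> 1.0 - Math.random().
    static boolean shouldRefresh(long nowMillis, long expiryMillis, long deltaMillis,
                                 double beta, DoubleSupplier random) {
        // log of a value in (0, 1] is <= 0, so the head start is >= 0.
        double headStart = -deltaMillis * beta * Math.log(random.getAsDouble());
        return nowMillis + headStart >= expiryMillis;
    }
}
```

A caller that gets true recomputes the value and resets the expiry; everyone else keeps serving the cached entry.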
14. Message Queue Failures
Q: Message sent to Kafka but consumer fails to process. How do you handle?
A:
- Retry mechanism: Retry with exponential backoff
- Dead Letter Queue (DLQ): Move failed messages after max retries
- Idempotency: Ensure consumers can handle duplicate messages
- Manual intervention: Monitor DLQ and fix issues
@KafkaListener(topics = "orders", groupId = "order-processor")
public void processOrder(Order order) {
    try {
        orderService.process(order);
    } catch (Exception e) {
        log.error("Failed to process order: {}", order.getId(), e);
        // Rethrow so the container's error handler can retry. The message reaches a DLQ only
        // if one is configured (e.g. DefaultErrorHandler with a DeadLetterPublishingRecoverer)
        throw e;
    }
}
Real Example: Uber - Uses Kafka with DLQ for ride matching failures.
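Because retries and redelivery mean the same message can arrive more than once, the idempotency bullet can be sketched as a wrapper that remembers processed message IDs. IdempotentHandler is a hypothetical class; a production version would keep the IDs in a database or Redis (ideally in the same transaction as the side effect) rather than in memory.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical wrapper: skips messages whose ID has already been processed.
final class IdempotentHandler<T> {
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final Consumer<T> delegate;

    IdempotentHandler(Consumer<T> delegate) {
        this.delegate = delegate;
    }

    // Returns true if the payload was processed, false if it was a duplicate.
    boolean handle(String messageId, T payload) {
        if (!processedIds.add(messageId)) {
            return false; // duplicate delivery: already handled
        }
        try {
            delegate.accept(payload);
            return true;
        } catch (RuntimeException e) {
            processedIds.remove(messageId); // forget the ID so a retry can run
            throw e;
        }
    }
}
```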
Q: Kafka consumer lags behind producer significantly. How do you handle?
A:
- Increase consumer instances (scale out)
- Increase partition count for parallelism
- Optimize consumer processing (batch processing, async ops)
- Separate slow vs fast processing paths
- Monitor consumer lag with Prometheus
- Backpressure mechanism to slow down producer if needed
Real Example: LinkedIn - Monitors consumer lag closely for their feed generation pipeline.
15. API Versioning
Q: Breaking changes needed in User API. Existing clients can't update immediately. How do you handle?
A:
- URI versioning: /api/v1/users vs /api/v2/users
- Header versioning: Accept: application/vnd.api.v2+json
- Run both versions simultaneously
- Gradual migration with deprecation notices
- Use API Gateway to route based on version
@RestController
@RequestMapping("/api/v1/users")
public class UserControllerV1 {
// Old implementation
}
@RestController
@RequestMapping("/api/v2/users")
public class UserControllerV2 {
// New implementation
}
Real Example: Stripe - Maintains multiple API versions with clear deprecation timeline.
16. Testing Microservices
Q: How do you test integration between Order Service and Payment Service without actual Payment Service?
A:
- Contract Testing: Use Pact or Spring Cloud Contract
- WireMock: Mock HTTP responses for testing
- TestContainers: Run actual service in Docker for integration tests
- Component Tests: Test with in-memory implementations
@SpringBootTest
@AutoConfigureWireMock(port = 8081)
class OrderServiceTest {
@Test
void testPaymentIntegration() {
stubFor(post(urlEqualTo("/api/payment"))
.willReturn(aResponse()
.withStatus(200)
.withHeader("Content-Type", "application/json")
.withBody("{\"status\":\"SUCCESS\"}")));
PaymentResponse response = orderService.processPayment(request);
assertEquals("SUCCESS", response.getStatus());
}
}
Real Example: Spotify - Uses contract testing to ensure service compatibility.
17. Handling Duplicate Requests
Q: Network issue causes client to retry payment request. How do you prevent duplicate charges?
A:
- Idempotency Key: Client sends unique ID with request
- Store processed request IDs in Redis/DB with TTL
- Check if ID already processed before executing
- Return cached response for duplicate requests
@PostMapping("/api/payment")
public ResponseEntity<PaymentResponse> processPayment(
        @RequestHeader("Idempotency-Key") String idempotencyKey,
        @RequestBody PaymentRequest request) {
    String key = "payment:" + idempotencyKey;
    PaymentResponse cached = redisTemplate.opsForValue().get(key);
    if (cached != null) {
        return ResponseEntity.ok(cached); // duplicate: replay the stored response
    }
    // Note: get-then-set is not atomic, so two concurrent retries can both reach this point.
    // Claiming the key first (e.g. opsForValue().setIfAbsent on a marker key) closes that gap.
    PaymentResponse response = paymentService.process(request);
    redisTemplate.opsForValue().set(key, response, 24, TimeUnit.HOURS);
    return ResponseEntity.ok(response);
}
Real Example: Stripe - All API requests support idempotency keys to prevent duplicate operations.
18. Database Connection Pool Exhaustion
Q: Service crashes with "Too many connections" error during peak load. How do you fix?
A:
- Tune connection pool:
spring:
  datasource:
    hikari:
      maximum-pool-size: 20
      minimum-idle: 5
      connection-timeout: 30000
      idle-timeout: 600000
      max-lifetime: 1800000
- Monitor active connections: Use Actuator metrics
- Fix connection leaks: Ensure proper try-with-resources
- Read replicas: Separate read/write connections
- Caching: Reduce database queries
- Async processing: Move long operations to message queue
Real Example: Stack Overflow - Optimizes connection pools to handle traffic spikes efficiently.
19. Service Mesh
Q: Managing service-to-service communication is complex. How does service mesh help?
A:
- Istio/Linkerd handles:
- Traffic management (routing, load balancing)
- Security (mTLS, authentication)
- Observability (tracing, metrics)
- Resilience (retries, timeouts, circuit breakers)
- Sidecar proxy injected into each pod
- Configuration via YAML, no code changes
- Centralized policy enforcement
Real Example: Lyft - Created Envoy proxy, foundation for Istio service mesh.
20. Graceful Shutdown
Q: Kubernetes terminates pod while processing requests. How do you handle gracefully?
A:
# Deployment configuration
spec:
  template:
    spec:
      containers:
        - name: order-service
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 15"]
      terminationGracePeriodSeconds: 30
@Component
public class GracefulShutdown {
@PreDestroy
public void onShutdown() {
log.info("Shutting down gracefully...");
// Stop accepting new requests
// Wait for existing requests to complete
// Close database connections
// Flush caches
}
}
Real Example: Google - Drains traffic before pod termination to ensure zero downtime.
21. Async Communication Patterns
Q: Order placed needs to trigger email, SMS, push notification. How do you design this?
A:
- Publish-Subscribe pattern with Kafka
- Order Service publishes "OrderPlaced" event
- Multiple consumers: Email Service, SMS Service, Notification Service
- Each consumes independently, no blocking
- Failures don't affect order placement
@Service
public class OrderService {
public void placeOrder(Order order) {
orderRepository.save(order);
kafkaTemplate.send("order-placed", new OrderEvent(order));
}
}
Real Example: Amazon - Order confirmation triggers multiple async notifications.
22. Backward Compatibility
Q: You added a new mandatory field to User API. Old clients break. How to fix?
A:
- Never make fields mandatory in breaking way
- Use optional fields with default values
- Implement backward-compatible changes only
- If mandatory: create new version endpoint
- Use @JsonIgnoreProperties(ignoreUnknown = true) to ignore unknown fields
Real Example: Google APIs - Maintain backward compatibility for years.
23. Health Checks & Readiness
Q: Service starts but isn't ready to serve traffic. How do you handle in Kubernetes?
A:
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
@Component
public class DatabaseHealthIndicator implements HealthIndicator {
@Override
public Health health() {
try {
// Check DB connection
return Health.up().build();
} catch (Exception e) {
return Health.down(e).build();
}
}
}
Real Example: Netflix - Uses sophisticated health checks to route traffic only to healthy instances.
24. Service Dependency Management
Q: Service A depends on B, C, D. If D is down, should A start?
A:
- Fail-fast approach: Don't start if critical dependencies down
- Resilient approach: Start with circuit breakers, fallbacks for non-critical deps
- Use health checks to verify dependencies
- Implement retry logic with exponential backoff
- Distinguish critical vs non-critical dependencies
Real Example: Airbnb - Services start even if non-critical dependencies are down.
25. Data Migration in Microservices
Q: Need to migrate 10 million users from monolith to User microservice. How?
A:
- Strangler pattern: Gradually route traffic to new service
- Dual-write pattern: Write to both old and new systems
- Background sync: Async migration of existing data
- Feature flags: Toggle between old/new system
- Verify data consistency before full cutover
- Rollback plan if issues arise
Real Example: Netflix - Migrated from monolith to microservices over several years using strangler pattern.
26. Bulkhead Pattern
Q: One slow API endpoint is consuming all threads, affecting other endpoints. How to isolate?
A:
- Bulkhead pattern: Separate thread pools per operation
- Configure thread pools in Resilience4j
- Prevent one operation from exhausting resources
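@Bulkhead(name = "slowOperation", type = Bulkhead.Type.THREADPOOL)
public CompletableFuture<Report> generateReport() {
    // The aspect already runs this method on the bulkhead's own pool; wrapping the work in
    // supplyAsync would move it to the common pool and defeat the isolation
    return CompletableFuture.completedFuture(reportService.generate());
}
# Configuration (thread-pool bulkheads use their own section; maxConcurrentCalls
# applies only to semaphore bulkheads under resilience4j.bulkhead; values are illustrative)
resilience4j.thread-pool-bulkhead:
  configs:
    default:
      maxThreadPoolSize: 10
      coreThreadPoolSize: 5
  instances:
    slowOperation:
      maxThreadPoolSize: 5
      coreThreadPoolSize: 2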
Real Example: Amazon - Isolates resources for different operations to prevent cascading failures.
27. API Composition vs Aggregation
Q: Frontend needs data from 5 microservices for dashboard. How do you optimize?
A:
- API Gateway aggregation: Gateway calls all services, combines response
- GraphQL: Let client specify exactly what data needed
- Backend for Frontend (BFF): Dedicated backend for each frontend type
- Parallel calls with CompletableFuture
- Caching for frequently accessed data
public DashboardResponse getDashboard(String userId) {
    // Start all three calls in parallel
    CompletableFuture<User> userFuture =
        CompletableFuture.supplyAsync(() -> userService.getUser(userId));
    CompletableFuture<List<Order>> ordersFuture =
        CompletableFuture.supplyAsync(() -> orderService.getOrders(userId));
    CompletableFuture<Wallet> walletFuture =
        CompletableFuture.supplyAsync(() -> walletService.getWallet(userId));
    CompletableFuture.allOf(userFuture, ordersFuture, walletFuture).join();
    // join() instead of get(): get() throws checked exceptions this method does not declare
    return new DashboardResponse(
        userFuture.join(), ordersFuture.join(), walletFuture.join()
    );
}
Real Example: Netflix - Uses GraphQL for efficient data fetching across services.
28. Correlation ID for Debugging
Q: Customer complains order failed, but you have logs from 50 microservices. How to debug?
A:
- Generate correlation/trace ID at API Gateway
- Pass via HTTP header to all downstream services
- Log correlation ID in every log statement
- Use ELK/Splunk to search by correlation ID
- Trace entire request flow across services
@Component
public class CorrelationIdFilter extends OncePerRequestFilter {
    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain filterChain)
            throws ServletException, IOException {
        String correlationId = request.getHeader("X-Correlation-ID");
        if (correlationId == null) {
            correlationId = UUID.randomUUID().toString();
        }
        MDC.put("correlationId", correlationId);
        response.setHeader("X-Correlation-ID", correlationId);
        try {
            filterChain.doFilter(request, response);
        } finally {
            // Pooled threads are reused, so remove the ID once the request is done
            MDC.remove("correlationId");
        }
    }
}
Real Example: Uber - Traces every ride request across hundreds of microservices using correlation IDs.
29. Handling File Uploads
Q: User uploads product images. Where do you store and how do you handle large files in microservices?
A:
- Never store in database (use blob storage)
- Upload to S3/Azure Blob/GCS directly from client
- Generate pre-signed URLs for secure upload
- Store only file metadata in database
- Use CDN for serving images
- Implement chunked uploads for large files
@PostMapping("/upload")
public ResponseEntity<String> generateUploadUrl(@RequestParam String fileName) {
    String key = UUID.randomUUID() + "/" + fileName;
    // Sign the URL for PUT (AWS SDK v1 HttpMethod); the three-argument overload signs a GET URL
    URL presignedUrl = s3Client.generatePresignedUrl(bucketName, key, expiration, HttpMethod.PUT);
    // userId is assumed to come from the authenticated principal
    fileMetadataRepo.save(new FileMetadata(key, fileName, userId));
    return ResponseEntity.ok(presignedUrl.toString());
}
Real Example: Instagram - Uploads photos directly to S3, serves via CloudFront CDN.
30. Service-to-Service Authentication
Q: How do you ensure only Order Service can call Inventory Service, not any unauthorized service?
A:
- Service accounts with unique credentials
- mTLS with certificate verification
- API keys per service
- JWT tokens with service identity in claims
- Service mesh automatic authentication (Istio)
- Network policies in Kubernetes
@Configuration
public class ServiceAuthConfig {
@Bean
public RestTemplate restTemplate() {
RestTemplate template = new RestTemplate();
template.getInterceptors().add((request, body, execution) -> {
request.getHeaders().add("X-Service-Key", serviceKey);
return execution.execute(request, body);
});
return template;
}
}
Real Example: Google Cloud - Uses service accounts for inter-service authentication.
31. Timeout Management
Q: Payment gateway takes 30 seconds sometimes. Your Order Service times out at 5 seconds. How to handle?
A:
- Async processing: Queue payment requests
- Webhook callback: Payment gateway calls back when done
- Polling: Check payment status periodically
- Circuit breaker: Stop calling if consistently slow
- Different timeouts for different operations
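// Hystrix is in maintenance mode and no longer ships with current Spring Cloud releases;
// on newer stacks the equivalent timeout is configured with Resilience4j's TimeLimiter
@HystrixCommand(
    commandProperties = {
        @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds",
                         value = "30000")
    },
    fallbackMethod = "paymentFallback"
)
public PaymentResponse processPayment(PaymentRequest request) {
    return paymentGateway.charge(request);
}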
Real Example: PayPal - Uses webhooks for payment confirmation instead of synchronous responses.
32. Multi-Tenancy
Q: Same microservices serve multiple clients (tenants). How do you isolate data?
A:
- Database per tenant: Complete isolation (expensive)
- Schema per tenant: Shared DB, separate schemas
- Shared schema with tenant_id: Row-level security
- Use tenant context in request headers
- Implement tenant resolver interceptor
@Component
public class TenantInterceptor implements HandlerInterceptor {
    @Override
    public boolean preHandle(HttpServletRequest request,
                             HttpServletResponse response,
                             Object handler) {
        String tenantId = request.getHeader("X-Tenant-ID");
        TenantContext.setCurrentTenant(tenantId);
        return true;
    }
    // Also override afterCompletion to reset the tenant: if TenantContext is ThreadLocal-backed,
    // pooled request threads would otherwise carry it into the next request
}
@Aspect
@Component // an aspect is only applied if it is registered as a Spring bean
public class TenantAspect {
    @Before("@annotation(MultiTenant)")
    public void setTenantFilter() {
        String tenantId = TenantContext.getCurrentTenant();
        // Set hibernate filter or query parameter
    }
}
Real Example: Salesforce - Multi-tenant architecture serving thousands of organizations on shared infrastructure.
33. Retry Logic with Exponential Backoff
Q: External API occasionally fails with 503. How do you implement smart retry?
A:
- Use Resilience4j Retry with exponential backoff
- Retry only on retriable errors (5xx, timeout)
- Don't retry on 4xx errors (client errors)
- Implement jitter to avoid thundering herd
- Set max retry attempts
@Retry(name = "externalApi", fallbackMethod = "apiFallback")
public ApiResponse callExternalApi() {
return restTemplate.getForObject(externalApiUrl, ApiResponse.class);
}
# Configuration
resilience4j.retry:
  instances:
    externalApi:
      maxAttempts: 3
      waitDuration: 1000ms
      enableExponentialBackoff: true # without this flag the multiplier is ignored
      exponentialBackoffMultiplier: 2
      retryExceptions:
        - org.springframework.web.client.HttpServerErrorException
      ignoreExceptions:
        - org.springframework.web.client.HttpClientErrorException
Real Example: AWS SDK - Implements exponential backoff for API retries.
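The delay calculation behind exponential backoff with jitter can be sketched as follows. Backoff is a hypothetical helper, and the "full jitter" variant shown (a uniform pick between zero and the exponential ceiling) is one common choice, not necessarily the one Resilience4j applies.

```java
import java.util.Random;

// Hypothetical helper that computes retry delays.
final class Backoff {
    // Delay before retry number `attempt` (1-based): base * multiplier^(attempt - 1), capped at maxMillis.
    static long exponentialDelay(int attempt, long baseMillis, double multiplier, long maxMillis) {
        double delay = baseMillis * Math.pow(multiplier, attempt - 1);
        return (long) Math.min(delay, maxMillis);
    }

    // Full jitter: a uniform value in [0, exponentialDelay], so that clients which failed
    // together do not all retry at the same instant.
    static long withFullJitter(int attempt, long baseMillis, double multiplier,
                               long maxMillis, Random random) {
        long ceiling = exponentialDelay(attempt, baseMillis, multiplier, maxMillis);
        return (long) (random.nextDouble() * (ceiling + 1));
    }
}
```

With a 1000 ms base and a multiplier of 2, the un-jittered delays are 1000, 2000 and 4000 ms for attempts 1 to 3.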
34. Canary Deployment
Q: New version of Payment Service deployed. How do you test with real traffic before full rollout?
A:
- Canary deployment: Route small % of traffic to new version
- Monitor metrics (error rate, latency, success rate)
- Gradually increase traffic if stable
- Automatic rollback if metrics degrade
- Use feature flags for functionality toggle
# Istio virtual service for canary
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment-service
spec:
  hosts:
    - payment-service
  http:
    - match:
        - headers:
            canary:
              exact: "true"
      route:
        - destination:
            host: payment-service
            subset: v2
    - route:
        - destination:
            host: payment-service
            subset: v1
          weight: 90
        - destination:
            host: payment-service
            subset: v2
          weight: 10
Real Example: Facebook - Uses canary deployments to test changes on small user percentage first.
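When the traffic split is done in application code rather than in the mesh, a sticky variant can be sketched by hashing the user ID into a bucket. CanaryRouter is a hypothetical class; note that this differs from the Istio weights above, which split per request rather than per user.

```java
// Hypothetical router: sends a fixed percentage of users to the canary version.
final class CanaryRouter {
    private final int canaryPercent;

    CanaryRouter(int canaryPercent) {
        if (canaryPercent < 0 || canaryPercent > 100) {
            throw new IllegalArgumentException("canaryPercent must be between 0 and 100");
        }
        this.canaryPercent = canaryPercent;
    }

    // Hashing the user ID keeps each user pinned to one version across requests.
    String route(String userId) {
        int bucket = Math.floorMod(userId.hashCode(), 100);
        return bucket < canaryPercent ? "v2" : "v1";
    }
}
```

Raising canaryPercent step by step moves more buckets to v2 while users already on v2 stay there.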
35. Database Migration in Production
Q: Need to add new column to Users table with 100 million rows. Zero downtime required. How?
A:
- Backward compatible changes first:
- Add column as nullable
- Deploy code that writes to new column
- Backfill existing data (batch processing)
- Deploy code that reads from new column
- Make column non-null if needed
- Use database migration tools (Flyway/Liquibase)
- Blue-green deployment for safety
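// Flyway migration
// Extending BaseJavaMigration lets Flyway derive the version and description from the class name;
// implementing JavaMigration directly would require supplying them by hand
@Component
public class V2__Add_Email_Column extends BaseJavaMigration {
    @Override
    public void migrate(Context context) throws Exception {
        try (Statement statement = context.getConnection().createStatement()) {
            statement.execute("ALTER TABLE users ADD COLUMN email VARCHAR(255)");
        }
    }
}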
Real Example: GitHub - Performs zero-downtime migrations on massive databases.
36. Event Sourcing
Q: Need audit trail of all order changes. How do you implement?
A:
- Event Sourcing: Store all state changes as events
- Don't update records, append events
- Rebuild current state by replaying events
- Provides complete audit trail
- Enables time-travel debugging
@Service
public class OrderEventStore {
public void saveEvent(OrderEvent event) {
eventRepository.save(event);
kafkaTemplate.send("order-events", event);
}
public Order rebuildOrder(String orderId) {
List<OrderEvent> events = eventRepository.findByOrderId(orderId);
Order order = new Order();
events.forEach(event -> order.apply(event));
return order;
}
}
// Events
public class OrderCreatedEvent { }
public class OrderPaidEvent { }
public class OrderShippedEvent { }
public class OrderCancelledEvent { }
Real Example: Banking systems - Maintain complete audit trail of all transactions using event sourcing.
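Since the event classes above are empty, here is a self-contained sketch of what apply and replay could look like. The OrderStatus enum, the resultingStatus field and OrderAggregate are assumptions made for illustration, not part of the original design.

```java
import java.util.List;

// Hypothetical status model for the sketch.
enum OrderStatus { NEW, CREATED, PAID, SHIPPED, CANCELLED }

// Hypothetical event: records which order changed and the status it moved to.
final class OrderEvent {
    final String orderId;
    final OrderStatus resultingStatus;

    OrderEvent(String orderId, OrderStatus resultingStatus) {
        this.orderId = orderId;
        this.resultingStatus = resultingStatus;
    }
}

// Current state is never stored; it is derived by applying events in order.
final class OrderAggregate {
    private OrderStatus status = OrderStatus.NEW;
    private int version; // number of events applied so far

    void apply(OrderEvent event) {
        status = event.resultingStatus;
        version++;
    }

    static OrderAggregate replay(List<OrderEvent> events) {
        OrderAggregate order = new OrderAggregate();
        events.forEach(order::apply);
        return order;
    }

    OrderStatus status() { return status; }
    int version() { return version; }
}
```

Replaying only a prefix of the event list reconstructs the order as it was at that point, which is what makes time-travel debugging possible.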
37. Handling Third-Party Service Outages
Q: Payment gateway is down for 2 hours. How do you handle orders?
A:
- Queue orders for later processing
- Show user "Payment pending" status
- Background job retries payment
- Send notification when payment succeeds
- Circuit breaker prevents constant failures
- Implement fallback payment gateways
@Service
public class ResilientPaymentService {
@CircuitBreaker(name = "primaryGateway", fallbackMethod = "useSecondaryGateway")
public PaymentResponse processPrimary(PaymentRequest request) {
return primaryGateway.process(request);
}
public PaymentResponse useSecondaryGateway(PaymentRequest request, Exception e) {
log.warn("Primary gateway failed, using secondary");
return secondaryGateway.process(request);
}
}
Real Example: Amazon - Uses multiple payment processors with automatic failover.
38. API Response Time SLA
Q: Your API must respond within 500ms for 99.9% of requests. How do you ensure this?
A:
- Performance monitoring: Track P50, P95, P99, and P99.9 latencies (a 99.9% SLA is a P99.9 target)
- Database optimization: Proper indexes, query optimization
- Caching: Redis for frequently accessed data
- Connection pooling: Optimize DB connections
- Async processing: Move heavy operations to background
- CDN: Static content from edge locations
- Rate limiting: Prevent abuse
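@Timed(value = "api.latency", percentiles = {0.5, 0.95, 0.99, 0.999}, histogram = true)
@GetMapping("/products/{id}")
public Product getProduct(@PathVariable Long id) {
    return productService.findById(id);
}
# Prometheus alert rule (Micrometer exports the timer as api_latency_seconds;
# histogram = true is needed so that the _bucket series exist)
- alert: HighApiLatency
  expr: histogram_quantile(0.999, sum(rate(api_latency_seconds_bucket[5m])) by (le)) > 0.5
  annotations:
    summary: "P99.9 latency exceeded 500ms"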
Real Example: Stripe - Maintains strict SLAs with comprehensive monitoring.
39. Implementing CQRS
Q: Order read queries are slow affecting write performance. How to separate?
A:
- CQRS: Separate Command (write) and Query (read) models
- Write model: Normalized, optimized for consistency
- Read model: Denormalized, optimized for queries
- Sync via events (Kafka)
- Different databases for read/write
// Command side
@Service
public class OrderCommandService {
public void createOrder(CreateOrderCommand cmd) {
Order order = new Order(cmd);
orderWriteRepo.save(order);
eventPublisher.publish(new OrderCreatedEvent(order));
}
}
// Query side
@Service
public class OrderQueryService {
@EventListener
public void onOrderCreated(OrderCreatedEvent event) {
OrderReadModel readModel = new OrderReadModel(event);
orderReadRepo.save(readModel); // Optimized for queries
}
public List<OrderDTO> getOrders(String userId) {
return orderReadRepo.findByUserId(userId);
}
}
Real Example: LinkedIn - Uses CQRS for feed generation separating read/write workloads.
40. Dealing with Clock Skew
Q: Distributed services on different servers have time differences. How do you handle?
A:
- Use NTP (Network Time Protocol) to sync clocks
- Don't rely on local timestamps for ordering
- Use vector clocks or logical clocks
- Implement Lamport timestamps
- Use a time source with bounded uncertainty where available (e.g., Google TrueTime)
- Database timestamps instead of application timestamps
@Entity
public class Order {
@CreationTimestamp(source = SourceType.DB) // Hibernate 6.2+: database clock; the default (SourceType.VM) uses the JVM clock
private Instant createdAt;
private Long lamportClock; // Logical clock for ordering
}
Real Example: Google Spanner - Uses TrueTime API for globally consistent timestamps.
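The `lamportClock` field above only stores a value; the ordering guarantee comes from how that value is updated. A minimal sketch of the two update rules follows, assuming one clock instance per service process. The class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

class LamportClock {
    private final AtomicLong counter = new AtomicLong(0);

    // Local event or message send: advance the clock by one and stamp the event with it.
    long tick() {
        return counter.incrementAndGet();
    }

    // Message receipt: move past both the local clock and the sender's timestamp.
    long onReceive(long remoteTimestamp) {
        return counter.updateAndGet(local -> Math.max(local, remoteTimestamp) + 1);
    }

    long current() {
        return counter.get();
    }
}
```

If event A causally precedes event B, A's timestamp is smaller than B's regardless of how far the wall clocks of the two servers have drifted. The converse does not hold, which is why vector clocks are needed to detect concurrent updates.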
41. Implementing Feature Flags
Q: New recommendation algorithm ready but want to test on 10% users first. How?
A:
- Feature flags/toggles with LaunchDarkly/Unleash
- Control features without deployment
- A/B testing capabilities
- Gradual rollout
- Quick rollback if issues
@Service
public class RecommendationService {
@Autowired
private FeatureFlagService featureFlagService;
public List<Product> getRecommendations(String userId) {
if (featureFlagService.isEnabled("new-algorithm", userId)) {
return newRecommendationEngine.recommend(userId);
} else {
return oldRecommendationEngine.recommend(userId);
}
}
}
Real Example: Netflix - Uses feature flags extensively to test and deploy features incrementally.
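The `isEnabled("new-algorithm", userId)` call above hides how the 10% of users is chosen. One common approach, sketched here under the assumption that no flag service is involved, is to hash the flag name and user ID into a stable bucket from 0 to 99. `PercentageRollout` and its methods are hypothetical names, and real products such as LaunchDarkly or Unleash may bucket users differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class PercentageRollout {
    // Maps (flag, user) to a stable bucket in [0, 100); the same input always lands in the same bucket.
    static int bucket(String flagName, String userId) {
        CRC32 crc = new CRC32();
        crc.update((flagName + ":" + userId).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    // A user is in the rollout when their bucket falls below the configured percentage.
    static boolean isEnabled(String flagName, String userId, int rolloutPercent) {
        return bucket(flagName, userId) < rolloutPercent;
    }
}
```

Because the bucket is derived from the user ID rather than drawn at random per request, a user keeps seeing the same variant. Raising the percentage from 10 to 50 also keeps the original 10% enabled. Including the flag name in the hash avoids the same users always being first for every flag.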
42. Handling Large Payload
Q: User uploads 100MB file to your API. How do you handle without memory issues?
A:
- Streaming upload: Don't load entire file in memory
- Chunked transfer encoding
- Direct upload to S3 with presigned URLs
- Async processing with status callback
- Set max request size limits
@PostMapping(value = "/upload", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
public ResponseEntity<String> uploadFile(@RequestParam("file") MultipartFile file) throws IOException {
String uploadId = UUID.randomUUID().toString();
// Stream directly to S3
s3Client.putObject(PutObjectRequest.builder()
.bucket(bucketName)
.key(uploadId)
.build(),
RequestBody.fromInputStream(file.getInputStream(), file.getSize()));
// Process async
kafkaTemplate.send("file-uploaded", new FileEvent(uploadId));
return ResponseEntity.accepted().body(uploadId);
}
Real Example: Dropbox - Uploads large files in chunks with resume capability.
43. Cross-Cutting Concerns
Q: Need to log request/response, track metrics, validate auth for all endpoints. How to avoid duplication?
A:
- Use Spring AOP (Aspect-Oriented Programming)
- Interceptors for cross-cutting concerns
- Filters for request/response modification
- API Gateway for centralized concerns
@Aspect
@Component
public class LoggingAspect {
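// Matches every method of @RestController classes; @annotation(RequestMapping) would miss @GetMapping/@PostMapping methods
@Around("within(@org.springframework.web.bind.annotation.RestController *)")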
public Object logAround(ProceedingJoinPoint joinPoint) throws Throwable {
long start = System.currentTimeMillis();
log.info("Method: {} started", joinPoint.getSignature());
Object result = joinPoint.proceed();
long duration = System.currentTimeMillis() - start;
log.info("Method: {} completed in {}ms", joinPoint.getSignature(), duration);
return result;
}
}
Real Example: Netflix - Uses Zuul filters for cross-cutting concerns across all services.
44. Handling Time Zones
Q: Users in different time zones book appointments. How do you handle datetime consistently?
A:
- Always store in UTC in database
- Convert to user timezone only in presentation layer
- Use ISO 8601 format for APIs
- Store user timezone preference
- Use Instant or ZonedDateTime in Java
@Entity
public class Appointment {
private Instant appointmentTime; // Always UTC
public ZonedDateTime getLocalTime(String timezone) {
return appointmentTime.atZone(ZoneId.of(timezone));
}
}
// API response
public AppointmentDTO toDTO(Appointment apt, String userTimezone) {
return AppointmentDTO.builder()
.time(apt.getLocalTime(userTimezone))
.timezone(userTimezone)
.build();
}
Real Example: Booking.com - Handles hotel bookings across all time zones.
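To make the UTC-in, local-out rule concrete, the sketch below parses an ISO 8601 timestamp into an `Instant` and renders it for two zones. It uses only `java.time`; the `AppointmentTimes` helper is a hypothetical name.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class AppointmentTimes {
    // Parses an ISO 8601 UTC timestamp as it would arrive in an API request.
    static Instant parseUtc(String iso8601) {
        return Instant.parse(iso8601);
    }

    // Converts to the user's zone only at the edge, keeping the offset in the output.
    static String render(Instant instant, String zoneId) {
        ZonedDateTime local = instant.atZone(ZoneId.of(zoneId));
        return DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(local);
    }
}
```

For the stored instant `2024-01-15T14:30:00Z`, `render` gives `2024-01-15T20:00:00+05:30` for `Asia/Kolkata` and `2024-01-15T09:30:00-05:00` for `America/New_York`. Storing the zone ID rather than a fixed offset matters: the same New York user sees `-04:00` for a July appointment because daylight saving time applies.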
45. Implementing Search Functionality
Q: Need to search products by name, category, price range across millions of records. How?
A:
- Use Elasticsearch for full-text search
- Sync data from database to Elasticsearch
- Use change data capture (Debezium) for real-time sync
- Implement search analytics
@Service
public class ProductSearchService {
@Autowired
private ElasticsearchRestTemplate elasticsearchTemplate;
public List<Product> search(String query, PriceRange range, String category) {
NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
.withQuery(multiMatchQuery(query, "name", "description"))
.withFilter(boolQuery()
.must(rangeQuery("price").gte(range.getMin()).lte(range.getMax()))
.must(termQuery("category", category)))
.build();
return elasticsearchTemplate.search(searchQuery, Product.class)
.stream()
.map(SearchHit::getContent)
.collect(Collectors.toList());
}
}
Real Example: Amazon - Uses Elasticsearch for product search across millions of items.
46. Handling Partial Failures
Q: Dashboard needs data from 5 services. 2 services are down. What do you show?
A:
- Fail gracefully: Show available data
- Use circuit breaker with fallbacks
- Timeout quickly for failing services
- Show partial UI with error indicators
- Cache stale data as fallback
public DashboardResponse getDashboard(String userId) {
DashboardResponse response = new DashboardResponse();
try {
response.setUser(userService.getUser(userId));
} catch (Exception e) {
log.error("User service failed", e);
response.setUser(getCachedUser(userId));
response.addError("user-service-unavailable");
}
try {
response.setOrders(orderService.getOrders(userId));
} catch (Exception e) {
log.error("Order service failed", e);
response.setOrders(Collections.emptyList());
response.addError("order-service-unavailable");
}
return response;
}
Real Example: Facebook - Shows partial feed if some services fail.
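The try/catch blocks above cover failures but not the "timeout quickly" bullet: a hung service would still block the whole dashboard. A possible helper is sketched below, assuming each service call is already wrapped in a `CompletableFuture`. `PartialCalls.resolve` is a hypothetical name, and the error strings follow the pattern used above.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class PartialCalls {
    // Waits at most timeoutMs; on failure or timeout records an error marker and returns the fallback.
    static <T> T resolve(String serviceName, CompletableFuture<T> call, T fallback,
                         long timeoutMs, List<String> errors) {
        try {
            return call.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        } catch (ExecutionException | TimeoutException e) {
            call.cancel(true); // stop waiting on a call that failed or is too slow
        }
        errors.add(serviceName + "-unavailable");
        return fallback;
    }
}
```

Starting all futures first and resolving them afterwards keeps the worst-case latency close to one timeout rather than the sum of all of them. The fallback argument is where cached or empty data would be passed in.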
47. Implementing Saga Orchestration
Q: Complex workflow: Book hotel → Book flight → Book car. If flight fails, rollback hotel. How?
A:
- Saga Orchestration: Central coordinator manages workflow
- Defines compensating transactions
- State machine for workflow
- Persists saga state for recovery
@Service
public class TravelBookingSaga {
public void bookTravel(TravelRequest request) {
String sagaId = UUID.randomUUID().toString();
SagaState state = new SagaState(sagaId);
try {
// Step 1: Book hotel
HotelBooking hotel = hotelService.book(request);
state.setHotelBookingId(hotel.getId());
sagaStateRepo.save(state);
// Step 2: Book flight
FlightBooking flight = flightService.book(request);
state.setFlightBookingId(flight.getId());
sagaStateRepo.save(state);
// Step 3: Book car
CarBooking car = carService.book(request);
state.setCarBookingId(car.getId());
state.setStatus(SagaStatus.COMPLETED);
sagaStateRepo.save(state);
} catch (Exception e) {
// Compensate
compensate(state);
}
}
private void compensate(SagaState state) {
if (state.getCarBookingId() != null) {
carService.cancel(state.getCarBookingId());
}
if (state.getFlightBookingId() != null) {
flightService.cancel(state.getFlightBookingId());
}
if (state.getHotelBookingId() != null) {
hotelService.cancel(state.getHotelBookingId());
}
state.setStatus(SagaStatus.FAILED);
sagaStateRepo.save(state);
}
}
Real Example: Uber Eats - Orchestrates restaurant, delivery, and payment in single workflow.
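The hand-written `compensate` method above has to be kept in sync with the steps. A more generic variant, sketched here in plain Java, pairs each step with its compensation and undoes completed steps in reverse order. `SagaStep` and `SagaOrchestrator` are hypothetical names, and persisting saga state is left out for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// One saga step: a forward action and the action that undoes it.
class SagaStep {
    final String name;
    final Runnable action;
    final Runnable compensation;

    SagaStep(String name, Runnable action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

class SagaOrchestrator {
    // Returns true when every step succeeded, false when the saga was rolled back.
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action.run();
                completed.push(step); // most recent step ends up on top
            } catch (RuntimeException e) {
                // Undo only what actually completed, newest first
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

With hotel, flight, and car steps where the flight booking throws, only the hotel compensation runs: the flight never completed and the car was never attempted. In a real system the compensations themselves can fail, so they would need retries and the saga state would need to be persisted as in the example above.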
48. Implementing API Gateway Aggregation
Q: Mobile app has limited bandwidth. How do you reduce API calls?
A:
- Backend for Frontend (BFF): API specifically for mobile
- GraphQL: Let client request exact data needed
- Gateway aggregation: Combine multiple calls
- Data compression: GZIP responses
@RestController
@RequestMapping("/api/mobile")
public class MobileBFFController {
@GetMapping("/home")
public MobileHomeResponse getHome(@AuthenticationPrincipal User user) {
// Aggregate data from multiple services
CompletableFuture<UserProfile> profileFuture =
CompletableFuture.supplyAsync(() -> userService.getProfile(user.getId()));
CompletableFuture<List<Recommendation>> recsFuture =
CompletableFuture.supplyAsync(() -> recommendationService.get(user.getId()));
CompletableFuture<List<Notification>> notifsFuture =
CompletableFuture.supplyAsync(() -> notificationService.getUnread(user.getId()));
CompletableFuture.allOf(profileFuture, recsFuture, notifsFuture).join();
return MobileHomeResponse.builder()
.profile(profileFuture.join()) // join() avoids the checked exceptions that get() declares
.recommendations(recsFuture.join())
.notifications(notifsFuture.join())
.build();
}
}
Real Example: Twitter - BFF pattern for mobile apps to reduce API calls.
49. Handling Webhook Retries
Q: You send webhooks to customer systems. Their server is down. How do you retry?
A:
- Exponential backoff for retries
- Maximum retry attempts (e.g., 10)
- Dead letter queue for failed webhooks
- Store webhook delivery history
- Provide manual retry option in dashboard
@Service
public class WebhookService {
@Async
@Retryable(
value = {RestClientException.class},
maxAttempts = 10,
backoff = @Backoff(delay = 1000, multiplier = 2, maxDelay = 3600000)
)
public void sendWebhook(WebhookEvent event) {
try {
HttpHeaders headers = new HttpHeaders();
headers.set("X-Webhook-Signature", generateSignature(event));
HttpEntity<WebhookEvent> request = new HttpEntity<>(event, headers);
ResponseEntity<String> response = restTemplate.postForEntity(
event.getCallbackUrl(), request, String.class);
webhookLogRepo.save(new WebhookLog(event.getId(), "SUCCESS", response.getStatusCode()));
} catch (Exception e) {
webhookLogRepo.save(new WebhookLog(event.getId(), "FAILED", e.getMessage()));
throw e;
}
}
@Recover
public void recover(RestClientException e, WebhookEvent event) {
log.error("Webhook delivery failed after all retries: {}", event.getId());
deadLetterQueueService.add(event);
}
}
Real Example: Stripe - Sophisticated webhook retry system with exponential backoff.
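It helps to work out what the `@Backoff(delay = 1000, multiplier = 2, maxDelay = 3600000)` settings above actually produce. The sketch below computes the wait before each retry, assuming the first call counts toward `maxAttempts` (so 10 attempts means 9 waits). `BackoffSchedule` is a hypothetical helper, not part of Spring Retry.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

class BackoffSchedule {
    // Wait before retry number `retry` (1-based): initial * multiplier^(retry - 1), capped at max.
    static Duration delayBeforeRetry(int retry, Duration initial, double multiplier, Duration max) {
        double millis = initial.toMillis() * Math.pow(multiplier, retry - 1);
        return Duration.ofMillis((long) Math.min(millis, (double) max.toMillis()));
    }

    // All waits for a call that is attempted maxAttempts times in total.
    static List<Duration> waits(int maxAttempts, Duration initial, double multiplier, Duration max) {
        List<Duration> result = new ArrayList<>();
        for (int retry = 1; retry < maxAttempts; retry++) {
            result.add(delayBeforeRetry(retry, initial, multiplier, max));
        }
        return result;
    }
}
```

Under that assumption the waits are 1s, 2s, 4s, up to 256s, which adds up to 511 seconds. All ten attempts are therefore spent in under nine minutes and the one-hour cap is never reached. For a customer server that is down for hours, a larger initial delay, more attempts, or scheduled redelivery from the dead letter queue would be needed.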
50. Blue-Green Deployment
Q: Zero-downtime deployment needed. How do you switch between versions?
A:
- Two identical environments: Blue (current) and Green (new)
- Deploy to Green environment
- Test thoroughly
- Switch traffic from Blue to Green
- Keep Blue for quick rollback
- Use load balancer to switch traffic
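# Kubernetes service switching
apiVersion: v1
kind: Service
metadata:
  name: payment-service
spec:
  selector:
    app: payment-service
    version: blue # Change to 'green' to switch
  ports:
    - port: 8080
---
# Blue deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
      version: blue
  template:
    metadata:
      labels: # Must match both the Deployment and the Service selector
        app: payment-service
        version: blue
    spec:
      containers:
        - name: payment-service
          image: payment-service:1.0.0 # Hypothetical image tag
---
# Green deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
      version: green
  template:
    metadata:
      labels: # Must match both the Deployment and the Service selector
        app: payment-service
        version: green
    spec:
      containers:
        - name: payment-service
          image: payment-service:1.1.0 # Hypothetical image tag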
Real Example: Amazon - Uses blue-green deployments for zero-downtime updates.
51. Implementing Request Deduplication
Q: User accidentally clicks "Submit Order" twice. How do you prevent duplicate orders?
A:
- Client-side: Disable button after first click
- Server-side: Idempotency key or unique constraint
- Time window: Check for duplicate within 5 minutes
- Redis: Store request hash with TTL
@Service
public class OrderDeduplicationService {
@Autowired
private RedisTemplate<String, String> redisTemplate;
public boolean isDuplicate(OrderRequest request, String userId) {
String key = "order:" + userId + ":" + generateHash(request);
Boolean isNew = redisTemplate.opsForValue()
.setIfAbsent(key, "processing", Duration.ofMinutes(5));
return !Boolean.TRUE.equals(isNew);
}
private String generateHash(OrderRequest request) {
return DigestUtils.sha256Hex(
request.getProductIds().toString() +
request.getTotalAmount());
}
}
@PostMapping("/orders")
public ResponseEntity<?> createOrder(@RequestBody OrderRequest request,
@AuthenticationPrincipal User user) {
if (orderDeduplicationService.isDuplicate(request, user.getId())) {
return ResponseEntity.status(HttpStatus.CONFLICT)
.body("Duplicate order detected");
}
Order order = orderService.create(request);
return ResponseEntity.ok(order);
}
Real Example: PayPal - Prevents duplicate payments with sophisticated deduplication.
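The Redis check above rejects the second click with a 409. The idempotency-key bullet describes a different behaviour: the client sends a unique key per logical request, and a repeat with the same key gets the original result back. A minimal in-memory sketch of that idea follows. `IdempotentExecutor` is a hypothetical name, and a real service would keep the map in Redis or a database table with a unique constraint.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class IdempotentExecutor<R> {
    private final Map<String, R> resultsByKey = new ConcurrentHashMap<>();

    // Runs the operation at most once per key; repeated calls return the stored result.
    R execute(String idempotencyKey, Supplier<R> operation) {
        return resultsByKey.computeIfAbsent(idempotencyKey, key -> operation.get());
    }
}
```

`ConcurrentHashMap.computeIfAbsent` applies the function atomically, so two concurrent submissions with the same key still create a single order. If the operation throws, nothing is stored and the client can retry with the same key, which the five-minute Redis marker above would block.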
52. Handling Schema Evolution
Q: Need to change event schema in Kafka. Old consumers still running. How?
A:
- Schema Registry (Confluent/Apicurio)
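- Backward compatibility: consumers on the new schema can still read old events (new fields are optional or have defaults)
- Forward compatibility: consumers on the old schema can still read new events (unknown fields are ignored), which is what still-running old consumers need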
- Version field in events
- Avro/Protobuf for schema evolution
// Version 1
public class OrderEventV1 {
private String orderId;
private BigDecimal amount;
}
// Version 2 - backward compatible
public class OrderEventV2 {
private String orderId;
private BigDecimal amount;
private String currency = "USD"; // Default value
private List<String> tags = new ArrayList<>(); // Optional field
}
@KafkaListener(topics = "orders")
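public void handleOrderEvent(String message) throws JsonProcessingException {
JsonNode event = objectMapper.readTree(message);
int version = event.path("version").asInt(1); // path() tolerates a missing field; events without a version are treated as V1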
if (version == 1) {
OrderEventV1 orderV1 = objectMapper.readValue(message, OrderEventV1.class);
// Handle V1
} else if (version == 2) {
OrderEventV2 orderV2 = objectMapper.readValue(message, OrderEventV2.class);
// Handle V2
}
}
Real Example: LinkedIn - Uses Avro with Schema Registry for event schema evolution.
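Branching on the version inside every listener grows with each schema change. An alternative, sketched here as an assumption rather than as what the listener above does, is to upcast old events to the latest shape once and let the rest of the code handle a single version. The event is modelled as a plain `Map` to keep the example free of JSON libraries, and `OrderEventUpcaster` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

class OrderEventUpcaster {
    // Returns a copy of the event in the V2 shape; V2 events pass through unchanged.
    static Map<String, Object> upcast(Map<String, Object> raw) {
        Map<String, Object> event = new HashMap<>(raw); // never mutate the consumed message
        Object declared = event.get("version");
        int version = (declared instanceof Number) ? ((Number) declared).intValue() : 1;
        if (version < 2) {
            // Same defaults as the V2 class above
            event.putIfAbsent("currency", "USD");
            event.putIfAbsent("tags", new ArrayList<String>());
            event.put("version", 2);
        }
        return event;
    }
}
```

Each later schema version then adds one more `if (version < N)` step, and handlers only ever see the newest shape.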
53. Implementing Circuit Breaker Dashboard
Q: Multiple services using circuit breakers. How do you monitor them centrally?
A:
- Hystrix Dashboard (deprecated), or Resilience4j's Actuator endpoints and Micrometer metrics
- Spring Boot Admin with Actuator
- Export metrics to Prometheus/Grafana
- Set up alerts for circuit breaker state changes
# Actuator endpoints (the prometheus endpoint needs micrometer-registry-prometheus on the classpath)
management:
  endpoints:
    web:
      exposure:
        include: health,prometheus,circuitbreakers,circuitbreakerevents
  health:
    circuitbreakers:
      enabled: true
# Circuit breaker configuration
resilience4j.circuitbreaker:
  instances:
    paymentService:
      registerHealthIndicator: true
      slidingWindowSize: 100 # replaces the deprecated ringBufferSizeInClosedState
      permittedNumberOfCallsInHalfOpenState: 10 # replaces ringBufferSizeInHalfOpenState
      waitDurationInOpenState: 10s
      failureRateThreshold: 50
      eventConsumerBufferSize: 10
Real Example: Netflix - Hystrix Dashboard (now deprecated) showed real-time circuit breaker status.
54. Implementing Distributed Locking
Q: Two instances try to process same order simultaneously. How do you prevent?
A:
- Redis distributed lock (Redisson)
- Database pessimistic locking
- Optimistic locking with version field
- ZooKeeper for coordination
@Service
public class OrderProcessingService {
@Autowired
private RedissonClient redissonClient;
public void processOrder(String orderId) {
RLock lock = redissonClient.getLock("order-lock:" + orderId);
try {
// Wait for lock, auto-release after 10 seconds
boolean acquired = lock.tryLock(100, 10000, TimeUnit.MILLISECONDS);
if (acquired) {
// Check if already processed
if (orderRepository.findById(orderId).getStatus() == PROCESSED) {
return;
}
// Process order
processOrderInternal(orderId);
} else {
log.warn("Could not acquire lock for order: {}", orderId);
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
} finally {
if (lock.isHeldByCurrentThread()) {
lock.unlock();
}
}
}
}
Real Example: Airbnb - Uses distributed locking for concurrent booking prevention.
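The "optimistic locking with version field" bullet can be illustrated without a database. The sketch below mimics the SQL pattern `UPDATE ... SET version = version + 1 WHERE id = ? AND version = ?` with an in-memory map. `VersionedOrder` and `VersionedOrderStore` are hypothetical names; with JPA the same effect would normally come from a `@Version` field.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Immutable snapshot of a row, including the version it was read at.
final class VersionedOrder {
    final String id;
    final String status;
    final long version;

    VersionedOrder(String id, String status, long version) {
        this.id = id;
        this.status = status;
        this.version = version;
    }
}

class VersionedOrderStore {
    private final Map<String, VersionedOrder> rows = new ConcurrentHashMap<>();

    void insert(String id, String status) {
        rows.put(id, new VersionedOrder(id, status, 0));
    }

    VersionedOrder find(String id) {
        return rows.get(id);
    }

    // Succeeds only if nobody else has updated the row since expectedVersion was read.
    boolean updateStatus(String id, long expectedVersion, String newStatus) {
        VersionedOrder current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return false;
        }
        VersionedOrder next = new VersionedOrder(id, newStatus, expectedVersion + 1);
        return rows.replace(id, current, next); // atomic compare-and-swap on the stored row
    }
}
```

When two instances read version 0 and both try to mark the order as processed, the first update wins and the second returns `false`, so it can reload and notice the order is already done. Unlike the Redisson lock, nothing has to be released or can expire mid-processing, but the loser has done work that is thrown away.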
55. Implementing Rate Limiter
Q: API should allow max 100 requests per minute per user. How do you implement across multiple instances?
A:
- Redis-based rate limiter
- Token bucket or sliding window algorithm
- Store counters in Redis with TTL
@Component
public class RedisRateLimiter {
@Autowired
private RedisTemplate<String, String> redisTemplate;
public boolean isAllowed(String userId, int maxRequests, Duration window) {
String key = "rate_limit:" + userId;
long currentTime = System.currentTimeMillis();
long windowStart = currentTime - window.toMillis();
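// Sliding-window log: remove entries older than the window.
// Note: remove, count, and add are separate Redis calls, so concurrent requests can slip past the limit; a Lua script or MULTI/EXEC would make the check atomic.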
redisTemplate.opsForZSet().removeRangeByScore(key, 0, windowStart);
// Count requests in current window
Long count = redisTemplate.opsForZSet().count(key, windowStart, currentTime);
if (count != null && count < maxRequests) {
// Add current request
redisTemplate.opsForZSet().add(key, UUID.randomUUID().toString(), currentTime);
redisTemplate.expire(key, window);
return true;
}
return false;
}
}
@RestController
public class ApiController {
@GetMapping("/api/resource")
public ResponseEntity<?> getResource(@AuthenticationPrincipal User user) {
if (!rateLimiter.isAllowed(user.getId(), 100, Duration.ofMinutes(1))) {
return ResponseEntity.status(HttpStatus.TOO_MANY_REQUESTS)
.body("Rate limit exceeded");
}
return ResponseEntity.ok(resourceService.get());
}
}
Real Example: GitHub API - Implements per-user rate limiting across global infrastructure.
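The Redis code above is a sliding-window log. The token bucket mentioned in the bullets works differently: tokens refill at a steady rate and each request spends one. A single-process sketch follows, with the clock injected so the behaviour can be tested. `TokenBucket` is a hypothetical class, and sharing it across instances would still require moving the token count and timestamp into Redis.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

class TokenBucket {
    private final long capacity;
    private final double refillPerNano;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastRefillNanos;

    // Refills `capacity` tokens per `refillPeriod`; starts full.
    TokenBucket(long capacity, Duration refillPeriod, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerNano = (double) capacity / refillPeriod.toNanos();
        this.nanoClock = nanoClock;
        this.tokens = capacity;
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    // Adds the tokens earned since the last call, then spends one if available.
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * refillPerNano);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In production the clock would be `System::nanoTime`. Compared with the sliding-window log, a token bucket stores two numbers per user instead of one entry per request, and it allows a burst of up to `capacity` requests followed by a steady trickle.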
Quick Fire Concepts
Event Sourcing
Q: What is it? A: Store all changes as events instead of current state. Rebuild state by replaying events. Provides audit trail, time travel debugging. Example: Banking - Every transaction stored as event, account balance derived.
CQRS
Q: What is it? A: Separate read and write models. Write to normalized DB, project to optimized read models. Improves scalability and performance. Example: E-commerce - Write to transactional DB, read from Elasticsearch.
Strangler Pattern
Q: What is it? A: Gradually replace legacy system by "strangling" it. Route new features to microservices, old features to monolith. Example: Migrating from monolith to microservices incrementally.
Backend for Frontend (BFF)
Q: What is it? A: Separate backend for each frontend type (web, mobile, IoT). Optimized APIs for each client. Example: Netflix - Different APIs for TV, mobile, web.
Bulkhead Pattern
Q: What is it? A: Isolate resources (thread pools, connections) per service/operation. One failing service doesn't exhaust all resources. Example: Ship compartments - one leak doesn't sink entire ship.
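As a rough illustration of the isolation idea, the sketch below caps concurrent calls to one dependency with a semaphore and rejects the rest immediately instead of queueing them. The `Bulkhead` class here is a hypothetical stand-in and is not the Resilience4j API.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    // Runs the operation if a permit is free; otherwise returns the rejection result right away.
    <T> T call(Supplier<T> operation, Supplier<T> onRejected) {
        if (!permits.tryAcquire()) {
            return onRejected.get();
        }
        try {
            return operation.get();
        } finally {
            permits.release(); // always hand the permit back, even if the call throws
        }
    }
}
```

Giving each downstream service its own `Bulkhead` instance means a slow inventory service can tie up at most its own permits, leaving threads available for payment or search calls.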
Sidecar Pattern
Q: What is it? A: Deploy helper container alongside main container. Handles logging, monitoring, proxying. Example: Istio Envoy sidecar for service mesh.
Ambassador Pattern
Q: What is it? A: Proxy that handles external service communication. Retry, circuit breaker, monitoring. Example: Database connection pooling sidecar.
Anti-Corruption Layer
Q: What is it? A: Translation layer between new microservices and legacy system. Prevents legacy complexity from leaking. Example: Adapter for legacy SOAP services in REST world.