What is lambda architecture?

Lambda architecture is a big data systems design pattern that combines both batch and real-time stream processing to meet the latency and throughput needs of large-scale data processing pipelines.

It serves requests using a read-optimized view generated by batch processing while using fast stream processing on recent data to fill the batch latency gap. The batch and stream results are stitched together at query time.

Lambda architecture emerged as a popular pattern but adds complexity of managing separate pipelines. The Kappa architecture and unified processing approaches simplify this by collapsing the batch and streaming into a single pipeline.

What does it do? How does it work?

In a lambda architecture, incoming data is dispatched to both a batch processing system that stores the master dataset, and a lower-latency stream processing system that handles recent data.

Queries are resolved by merging results from both - taking the pre-computed batch results for historical data, and fast real-time stream results for recent data.

Why is it important? Where is it used?

Lambda architecture gained popularity as a pattern for handling large data volumes with high throughput batch processing and low latency real-time analytics.

It can be useful for workloads that require a combination of real-time data queries and complex analytics on the full historical dataset. However, complexity and operational costs have led to alternative unified approaches.

FAQ

What are the components of a lambda architecture?

The core components of a lambda architecture are:

Batch processing layer for master dataset

Stream processing layer for recent real-time data

Serving layer to respond to queries

Coordination glue code to orchestrate and reconcile batch and stream processing

What are the pros and cons of lambda architecture?

Pros:

Tuned for latency and throughput

Robust and fault-tolerant

Cons:

Complexity of operating disparate systems

Reconciling batch and stream results

Repeated reprocessing of data

How is it different from Kappa architecture?

The Kappa architecture aims to simplify lambda architecture by eliminating the batch processing component and relying solely on stream processing for both real-time and historical processing.