Lambda Architecture

Data Processing
Updated on:
August 20, 2024

What is lambda architecture?

Lambda architecture is a big data systems design pattern that combines both batch and real-time stream processing to meet the latency and throughput needs of large-scale data processing pipelines.

It serves requests using a read-optimized view generated by batch processing while using fast stream processing on recent data to fill the batch latency gap. The batch and stream results are stitched together at query time.

Lambda architecture emerged as a popular pattern but adds complexity of managing separate pipelines. The Kappa architecture and unified processing approaches simplify this by collapsing the batch and streaming into a single pipeline.

What does it do? How does it work?

In a lambda architecture, incoming data is dispatched to both a batch processing system that stores the master dataset, and a lower-latency stream processing system that handles recent data.

Queries are resolved by merging results from both - taking the pre-computed batch results for historical data, and fast real-time stream results for recent data.

Why is it important? Where is it used?

Lambda architecture gained popularity as a pattern for handling large data volumes with high throughput batch processing and low latency real-time analytics.

It can be useful for workloads that require a combination of real-time data queries and complex analytics on the full historical dataset. However, complexity and operational costs have led to alternative unified approaches.

FAQ

What are the components of a lambda architecture?

The core components of a lambda architecture are:

  • Batch processing layer for master dataset
  • Stream processing layer for recent real-time data
  • Serving layer to respond to queries
  • Coordination glue code to orchestrate and reconcile batch and stream processing

What are the pros and cons of lambda architecture?

Pros:

  • Tuned for latency and throughput
  • Robust and fault-tolerant

Cons:

  • Complexity of operating disparate systems
  • Reconciling batch and stream results
  • Repeated reprocessing of data

How is it different from Kappa architecture?

The Kappa architecture aims to simplify lambda architecture by eliminating the batch processing component and relying solely on stream processing for both real-time and historical processing.

When is lambda architecture a good choice?

Lambda architecture excels for workloads requiring:

  • Very low latency reads on recent data
  • High throughput storage and queries on the full master dataset
  • Complex analytical batch processing jobs

References:


Related Entries

Kappa Architecture

Kappa architecture is a big data processing pattern that uses stream processing for both real-time and historical analytics, avoiding the complexity of hybrid stream and batch processing.

Read more ->
Unified Processing

Unified processing refers to data pipeline architectures that handle batch and real-time processing using a single processing engine, avoiding the complexities of hybrid systems.

Read more ->
Batch Processing

Batch processing is the execution of a series of programs or jobs on a set of data in batches without user interaction for efficiently processing high volumes of data.

Read more ->

Get early access to AI-native data infrastructure