What is partitioning?

Database partitioning is the architectural practice of subdividing large tables and indexes into smaller pieces, called partitions. It brings performance benefits, chiefly through parallel processing and by letting queries skip irrelevant partitions, as well as easier data management.

Partitioning breaks up data into smaller, more manageable chunks that can be spread across storage devices or nodes. Each partition is an independent subset of the overall data that can be queried and manipulated on its own, and therefore in parallel with other partitions. During query execution, the database transparently combines results from the relevant partitions, so a query still returns a complete result over the entire dataset.

Partitioning enables parallel execution within a single server and distributed execution across a cluster. It works closely with the query execution engine, which improves performance through divide-and-conquer parallelism. Partitioning also simplifies maintenance operations such as backups, and it can improve insert scalability by spreading writes across partitions.
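The divide-and-conquer pattern can be sketched in a few lines of Python. This is a minimal illustration, not how any particular database implements it: the three partitions and their rows are hypothetical, a thread pool stands in for the engine's parallel workers, and each worker computes a partial sum that is then merged into the final answer.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table already split into three partitions;
# each partition is a list of (order_id, amount) rows.
partitions = [
    [(1, 10.0), (4, 25.0)],
    [(2, 5.0), (5, 40.0), (7, 2.5)],
    [(3, 12.0), (6, 8.0)],
]


def partial_sum(rows):
    """Divide step: aggregate one partition independently of the others."""
    return sum(amount for _, amount in rows)


def parallel_total(parts):
    """Run one task per partition, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(partial_sum, parts))
    # Conquer step: combine per-partition results into a single answer.
    return sum(partials)


print(parallel_total(partitions))  # 102.5
```

In CPython, threads mostly illustrate the structure here rather than speed up CPU-bound work; a real engine would run the per-partition tasks on separate cores or nodes.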

How does it work?

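Common partitioning strategies include:

Range partitioning maps rows to partitions by ranges of a column value, such as date ranges (for example, one partition per month).

List partitioning maps rows based on discrete values in a column, such as a region or country code.

Hash partitioning applies a hash function to a partition key and uses the result to choose a partition, which spreads rows evenly.

Composite partitioning combines two strategies, for example range partitioning by date with hash subpartitioning within each range.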
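The four strategies above can be sketched as routing functions that take a row's key and return a partition. This is a hedged illustration under stated assumptions: the bounds, country codes, partition names, and the partition count of 4 are all hypothetical, and SHA-256 is used only as a stable stand-in for whatever hash function a real database uses.

```python
import hashlib
from bisect import bisect_right
from datetime import date

# RANGE: exclusive upper bounds of three hypothetical monthly partitions.
# Partition 0 has no lower bound; dates on or after the last bound are rejected.
RANGE_BOUNDS = [date(2024, 2, 1), date(2024, 3, 1), date(2024, 4, 1)]


def range_partition(d: date) -> int:
    """Return the index of the first partition whose upper bound is greater than d."""
    idx = bisect_right(RANGE_BOUNDS, d)
    if idx == len(RANGE_BOUNDS):
        raise ValueError(f"no range partition covers {d}")
    return idx


# LIST: each discrete value is explicitly assigned to a named partition.
LIST_MAP = {"US": "p_americas", "CA": "p_americas", "DE": "p_europe", "FR": "p_europe"}


def list_partition(country: str) -> str:
    # Raises KeyError for values that no partition lists.
    return LIST_MAP[country]


# HASH: a stable hash of the key, reduced modulo the partition count.
def hash_partition(key: str, n: int = 4) -> int:
    # SHA-256 is used because Python's built-in hash() of str varies between runs.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n


# COMPOSITE: range on the date first, then hash within that range.
def composite_partition(d: date, customer_id: str) -> tuple[int, int]:
    return range_partition(d), hash_partition(customer_id)


print(range_partition(date(2024, 2, 1)))  # 1 (upper bounds are exclusive)
print(list_partition("FR"))  # p_europe
print(composite_partition(date(2024, 3, 5), "customer-42"))
```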

Why does it matter?

Partitioning provides benefits such as:

Spreading I/O across disks by placing partitions on different storage devices.

Dividing memory and CPU load across servers.

Partition pruning: skipping partitions that cannot contain rows matching a query's filter.

Fast removal of old historical data by dropping whole range partitions instead of deleting rows one by one.

Parallelizing operations by working on partitions concurrently.
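Partition pruning can be sketched as an interval-overlap test. The sketch below assumes hypothetical monthly range partitions with inclusive lower and exclusive upper bounds (the names such as `orders_2024_01` are illustrative); the planner compares the query's date filter against each partition's bounds and scans only the partitions that overlap.

```python
from datetime import date

# Hypothetical range-partitioned table: name -> (inclusive lower, exclusive upper).
partitions = {
    "orders_2024_01": (date(2024, 1, 1), date(2024, 2, 1)),
    "orders_2024_02": (date(2024, 2, 1), date(2024, 3, 1)),
    "orders_2024_03": (date(2024, 3, 1), date(2024, 4, 1)),
}


def prune(parts, lo, hi):
    """Return partitions that may hold rows with lo <= order_date < hi.

    Half-open intervals [a, b) and [lo, hi) overlap exactly when a < hi and lo < b.
    """
    return [name for name, (a, b) in parts.items() if a < hi and lo < b]


# A query for 10-20 February only needs to scan one of the three partitions.
print(prune(partitions, date(2024, 2, 10), date(2024, 2, 20)))  # ['orders_2024_02']
```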

FAQ

When is partitioning useful?

Partitioning helps with:

Managing very large tables that would exceed the storage or performance limits of a single device or node.

Parallelizing big data analytics by dividing the work.

Scaling out across hardware by spreading load over multiple disks or servers.

Managing data retention by aging out old partitions.
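Aging out old partitions can be sketched as dropping every partition whose upper bound falls on or before a retention cutoff. The partition names, bounds, and rows below are hypothetical, and a dictionary stands in for the database catalog.

```python
from datetime import date

# Hypothetical monthly partitions: name -> (exclusive upper bound, rows).
partitions = {
    "events_2023_10": (date(2023, 11, 1), ["a", "b"]),
    "events_2023_11": (date(2023, 12, 1), ["c"]),
    "events_2023_12": (date(2024, 1, 1), ["d", "e"]),
    "events_2024_01": (date(2024, 2, 1), ["f"]),
}


def age_out(parts, cutoff):
    """Drop partitions whose rows are all older than cutoff; return their names."""
    expired = [name for name, (upper, _rows) in parts.items() if upper <= cutoff]
    for name in expired:
        # One removal per partition; no per-row deletes are needed.
        del parts[name]
    return expired


print(age_out(partitions, date(2023, 12, 15)))  # ['events_2023_10', 'events_2023_11']
print(sorted(partitions))  # ['events_2023_12', 'events_2024_01']
```

A partition that straddles the cutoff (here `events_2023_12`) is kept, since it may still hold rows newer than the cutoff; it is dropped once its whole range has expired.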

What are some key partitioning challenges?

Some downsides to partitioning:

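Extra design and tuning work, starting with choosing a good partition key.

Joins across partitioned tables can be complex and expensive, especially when the tables are not partitioned on the join key.

Every row needs a partition key value to map it to a partition, and queries that do not filter on that key may have to scan every partition.

Rebalancing data when partitions grow unevenly or when partitions are added or removed.

Indexing trade-offs between local indexes (one per partition) and global indexes (spanning all partitions).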
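The cost of rebalancing is easiest to see with naive hash partitioning, where a row's partition is its hash modulo N. The sketch below uses hypothetical keys and SHA-256 as a stable hash, and measures how many keys change partition when N grows from 4 to 5.

```python
import hashlib


def hash_partition(key: str, n: int) -> int:
    # Stable hash so results do not change between runs.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n


def moved_fraction(keys, old_n, new_n):
    """Fraction of keys whose partition changes when going from old_n to new_n partitions."""
    moved = sum(hash_partition(k, old_n) != hash_partition(k, new_n) for k in keys)
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]
print(f"{moved_fraction(keys, 4, 5):.0%} of keys move")
```

With a uniform hash, a key stays put only when its hash modulo 4 equals its hash modulo 5, which holds for 4 of every 20 residues, so roughly 80% of keys move. Schemes such as consistent hashing, used by Cassandra, are designed to move far fewer keys when the partition count changes.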

What are some examples of partitioned databases?

Some databases with partitioning capabilities:

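Oracle - Range, hash, list, and composite partitioning.

SQL Server - Partitioned tables and indexes, using partition functions and partition schemes that map partitions to filegroups.

MySQL - Range, list, hash, and key partitioning.

PostgreSQL - Declarative table partitioning by range, list, or hash.

Cassandra - Rows distributed across nodes by hashing the partition key.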

