What is query execution?

Query execution refers to the process of actually running and performing the operations specified in a query execution plan to produce the results of a database query. It takes the chosen query plan and compiles it into low-level procedures for the database engine to follow step-by-step to insert, retrieve, join, aggregate data as needed for the query.

Query compilation and optimization happens before execution to pick the optimal plan. But query execution does the real work of orchestrating the complex steps like table scans, index lookups, network transfers, sort operations, and final result set assembly as designed by the query plan.

There are different strategies to execute a query plan efficiently such as parallel execution across multiple cores/threads, distributed execution across clustered nodes, and partitioning data into independent chunks. Careful query execution significantly impacts overall query latency and scalability.

How does it work?

The database engine processes a query plan recursively with steps like:

Scanning tables or indexes for row matching conditions.

Requesting rows from remote shards in distributed systems.

Sending subsets to be joined, sorted, aggregated in stages.

Reading intermediate temporary tables and spools.

Tracking query progress and statistics.

Assembling final result set rows/columns.

Modern engines try to overlap IO, CPU, and network transfers for parallelism during execution.

Why does it matter?

Good query execution ensures operations in the optimal query plan translate to efficient utilization of database resources like IO, memory, network bandwidth.

Poorly executing a query negates benefits of picking a good query plan. Attention to execution details like dispatching work optimally, minimizing intermediate trips to disk or network, using parallelism, not thrashing memory/caches is key to getting good query performance.

FAQ

How can query execution be tuned?

Some ways to optimize query execution itself:

Benchmarking to identify performance bottlenecks.

Tweaking degree of parallelism for operations.

Using execution hints to force plan choices.

Tuning work area sizes for optimal memory use.

Minimizing trips to disk with intermediate caching.

Optimizing concurrency and locking strategies.

What execution methods are used for parallelism?

Parallel execution methods to speed up query processing on modern hardware:

Inter-query parallelism processes multiple independent queries concurrently.

Intra-query parallelism splits work like scans and aggregations within one query.

Vectorized execution uses SIMD instructions for in-memory columnwise processing.

How does distributed query execution work?

In distributed databases, queries execute by:

Parsing query across nodes with data needed.

Scanning local shards and sending subsets to aggregators.

Combining data shuffled across network.

Coordinating parallelism across cluster.

References:

[Book] PostgreSQL Query Optimization: The Ultimate Guide to Building Efficient Queries, by Apress

[Documentation] Apache Arrow Datafusion - Configuration Settings

[Article] An overview of query optimization in relational systems

[Post] Explore your SQL Query Execution

[Post] Understanding SQL Query Execution Plans: Unveiling the Path to Database Performance

Query Execution

What is query execution?

How does it work?

Why does it matter?

FAQ

How can query execution be tuned?

What execution methods are used for parallelism?

How does distributed query execution work?

References:

Related Topics

Partitioning

Distributed Execution

Parallel Execution

SQL Compatibility

ETL Data Processing