Query execution refers to the process of actually running and performing the operations specified in a query execution plan to produce the results of a database query. It takes the chosen query plan and compiles it into low-level procedures for the database engine to follow step-by-step to insert, retrieve, join, aggregate data as needed for the query.
Query compilation and optimization happens before execution to pick the optimal plan. But query execution does the real work of orchestrating the complex steps like table scans, index lookups, network transfers, sort operations, and final result set assembly as designed by the query plan.
There are different strategies to execute a query plan efficiently such as parallel execution across multiple cores/threads, distributed execution across clustered nodes, and partitioning data into independent chunks. Careful query execution significantly impacts overall query latency and scalability.
The database engine processes a query plan recursively with steps like:
Modern engines try to overlap IO, CPU, and network transfers for parallelism during execution.
Good query execution ensures operations in the optimal query plan translate to efficient utilization of database resources like IO, memory, network bandwidth.
Poorly executing a query negates benefits of picking a good query plan. Attention to execution details like dispatching work optimally, minimizing intermediate trips to disk or network, using parallelism, not thrashing memory/caches is key to getting good query performance.
Some ways to optimize query execution itself:
Parallel execution methods to speed up query processing on modern hardware:
In distributed databases, queries execute by:
Database partitioning refers to splitting large tables into smaller, independent pieces called partitions stored across different filegroups, drives or nodes.Read more ->
Distributed execution refers to techniques to execute database queries efficiently across clustered servers or nodes, dividing work to utilize parallel resources.Read more ->
Parallel execution refers to techniques for speeding up database query processing by leveraging multiple CPUs, servers, or resources concurrently.Read more ->
Our CEO Ozan recently joined an episode of the Streaming Caffeine podcast — Streaming Caffeine E10: Ozan from Synnada, about Arrow Datafusion, Rust, Databases, SQL, AI — to discuss our perspective on DataFusion and the future of data infrastructure.