Apache Arrow, Arrow/DataFusion, AI-native Data Infra — An Interview with Our CEO Ozan

At Synnada, our goal is to enable data practitioners to quickly build and deploy sophisticated data applications powered by AI. We believe that Arrow/DataFusion, an Apache project we contribute heavily to, will be a game-changer in realizing this vision.

Our CEO Ozan recently joined an episode of the Streaming Caffeine podcast, "Streaming Caffeine E10: Ozan from Synnada, about Arrow Datafusion, Rust, Databases, SQL, AI", to discuss our perspective on DataFusion and the future of data infrastructure. He shared his view that established technologies such as Spark and Flink, while powerful, have limitations because they were designed before the rise of cloud-native development and AI. As a result, integrating ML and implementing operational best practices with them often feels bolted on.

DataFusion takes a different approach. It provides modular components for each stage of a query engine: parsing, optimization, execution, and so on. This means you can pick the parts you need, then customize or extend DataFusion while retaining high performance.
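To make the idea of swappable stages concrete, here is a deliberately tiny Python analogy. It is not DataFusion's API: DataFusion is written in Rust and exposes its extension points through traits such as `TableProvider` and `OptimizerRule`. Every name below (`parse`, `push_filters_first`, `execute`, `Engine`) and the toy query syntax are hypothetical; the sketch only shows a parser, a list of optimizer rules, and an executor that can each be replaced independently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical toy types: a row is a dict, a plan is an ordered list of operators.
Row = Dict[str, int]
Plan = List[Tuple]


def parse(query: str) -> Plan:
    """Toy parser for clauses like 'project a,b; filter a>1' (hypothetical syntax)."""
    plan: Plan = []
    for clause in query.split(";"):
        kind, arg = clause.strip().split(" ", 1)
        if kind == "filter":
            col, value = arg.split(">")
            plan.append(("filter", col.strip(), int(value)))
        elif kind == "project":
            plan.append(("project", [c.strip() for c in arg.split(",")]))
        else:
            raise ValueError(f"unknown clause: {kind}")
    return plan


def push_filters_first(plan: Plan) -> Plan:
    """Example optimizer rule: evaluate filters before projections.

    Safe here because a projection only drops columns, so any column a later
    filter reads must already exist in the input.
    """
    filters = [op for op in plan if op[0] == "filter"]
    others = [op for op in plan if op[0] != "filter"]
    return filters + others


def _filter(rows: Iterable[Row], col: str, value: int) -> Iterator[Row]:
    # Keep rows whose column is strictly greater than the threshold.
    return (r for r in rows if r[col] > value)


def _project(rows: Iterable[Row], cols: List[str]) -> Iterator[Row]:
    # Keep only the requested columns.
    return ({c: r[c] for c in cols} for r in rows)


def execute(plan: Plan, rows: Iterable[Row]) -> Iterator[Row]:
    """Toy executor: chains one lazy generator per operator."""
    stream: Iterable[Row] = rows
    for op in plan:
        if op[0] == "filter":
            stream = _filter(stream, op[1], op[2])
        elif op[0] == "project":
            stream = _project(stream, op[1])
    return iter(stream)


@dataclass
class Engine:
    """Each stage is a plain callable, so any one of them can be swapped out."""
    parser: Callable[[str], Plan] = parse
    rules: List[Callable[[Plan], Plan]] = field(default_factory=lambda: [push_filters_first])
    executor: Callable[[Plan, Iterable[Row]], Iterator[Row]] = execute

    def run(self, query: str, rows: Iterable[Row]) -> List[Row]:
        plan = self.parser(query)
        for rule in self.rules:
            plan = rule(plan)
        return list(self.executor(plan, rows))


rows = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
print(Engine().run("project a,b; filter a>1", rows))
# [{'a': 2, 'b': 20}, {'a': 3, 'b': 30}]
```

Passing `Engine(rules=[])` skips the optimizer entirely, and a custom parser or executor can be injected the same way: you keep the stages you need and replace the rest.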

We see DataFusion bringing to data systems the same flexibility that LLVM brought to programming languages. No longer do you need to build everything from scratch or settle for a monolithic architecture.

A key strength of DataFusion is its ability to handle both batch and streaming workloads natively, using the same SQL syntax. It can optimize query plans to run seamlessly on infinite, ordered streams without expensive shuffles or repartitioning. This paves the way for unifying stream processing and analytics.
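One technique that makes ordered, unbounded input tractable is sort-based (streaming) aggregation: if events arrive ordered by the grouping key, a group is complete as soon as the key changes, so each result can be emitted immediately with constant state instead of hashing and buffering the whole input. The Python sketch below illustrates that idea only; it is not DataFusion's implementation, and `sorted_stream` and `streaming_sum` are hypothetical names. The same function runs unchanged on a finite batch and on an infinite generator.

```python
from itertools import count, groupby, islice
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, int]  # (minute, value)


def sorted_stream() -> Iterator[Event]:
    """Hypothetical unbounded source: three events per minute, ordered by minute."""
    for i in count():
        yield (i // 3, i)


def streaming_sum(events: Iterable[Event]) -> Iterator[Tuple[int, int]]:
    """Roughly SELECT minute, SUM(value) ... GROUP BY minute, assuming input sorted by minute.

    groupby only looks one event ahead, so each (minute, total) pair is
    emitted as soon as the next minute starts; earlier groups are not kept.
    """
    for minute, group in groupby(events, key=lambda e: e[0]):
        yield minute, sum(value for _, value in group)


# Batch: a finite, sorted input.
print(list(streaming_sum([(0, 1), (0, 2), (1, 5)])))  # [(0, 3), (1, 5)]

# Streaming: an infinite input; take the first three closed groups.
print(list(islice(streaming_sum(sorted_stream()), 3)))  # [(0, 3), (1, 12), (2, 21)]
```

If the input were not ordered by minute, this shortcut would be wrong (a minute could reappear later), which is why knowing a stream's ordering matters to the planner.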

At Synnada, we are using DataFusion to build what we call "AI-native" data infrastructure. By leveraging DataFusion as a core component, we can focus on high-level innovations while relying on its battle-tested SQL engine.

The DataFusion community is coming together to drive adoption. There are plans for a DataFusion event in early 2024 to connect users and contributors. As one of the top contributors to DataFusion, we are excited to participate and share our experiences.

Ozan sees DataFusion and the broader Arrow ecosystem playing a key role in making data infrastructure radically easier to build, deploy and run. This will empower many more people to take advantage of data and AI. We at Synnada are fully committed to helping make this vision a reality.

Please check out the episode to hear directly from Ozan on Synnada, DataFusion and the future of data systems. We welcome you to join us on this journey and look forward to collaborating with the open source community.

Synnada