Overview

What is Flowman?

Flowman is a declarative data build tool based on Apache Spark.

Everything as Code

Simple YAML files support proven workflows with source code management, code reviews and CI/CD pipelines.

Declarative Spark

With its declarative approach, Flowman removes the complexity of writing robust Spark applications and lets your developers focus on the business logic instead.

Development Workflow

By using simple YAML files, Flowman easily supports collaboration between developers. An optional integration with Apache Maven simplifies CI/CD processes.

Users

Learn how Flowman reduces the cognitive load of data engineers.

Operations Teams

Learn how Flowman supports your operations.

100% Open Source

Transforming Big Data

Apache Spark. Extended. Declarative.

Flowman is a declarative ETL framework and data build tool powered by Apache Spark. It reads, processes and writes data from and to a wide variety of storage systems, such as relational databases, files, and object stores. It can easily join data sets from different source systems to create an integrated data model. This makes Flowman a powerful tool for building complex data transformation pipelines for the modern data stack.

For defining all data sources, sinks and the transformations between them, Flowman follows a purely declarative approach using plain YAML files. Developers can focus on the business logic, while Flowman takes care of executing the data flow and managing the data models.
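
As a rough illustration of this declarative style, the sketch below outlines a small project with one source, one transformation, one sink and a job that builds it. The top-level sections (`relations`, `mappings`, `targets`, `jobs`) and the `kind` values follow the project structure described in the Flowman documentation as recalled here. All entity names, paths and columns are hypothetical, and the exact field names should be checked against the documentation for the Flowman version in use.

```yaml
# Hypothetical example: all names, paths and columns are made up for illustration.
relations:
  # Source: raw CSV files in a landing directory
  orders_raw:
    kind: file
    format: csv
    location: "/data/landing/orders/"
    options:
      header: true
    schema:
      kind: inline
      fields:
        - name: order_id
          type: string
        - name: customer_id
          type: string
        - name: amount
          type: double

  # Sink: cleaned data stored as Parquet files
  orders_clean:
    kind: file
    format: parquet
    location: "/data/warehouse/orders_clean/"

mappings:
  # Read the source relation into the data flow
  orders_raw:
    kind: relation
    relation: orders_raw

  # Transformation: each output column is a Spark SQL expression
  orders_clean:
    kind: select
    input: orders_raw
    columns:
      order_id: "TRIM(order_id)"
      customer_id: "TRIM(customer_id)"
      amount: "CAST(amount AS DECIMAL(12,2))"

targets:
  # Write the result of the mapping into the sink relation
  orders_clean:
    kind: relation
    mapping: orders_clean
    relation: orders_clean

jobs:
  # A job bundles the targets that should be built together
  main:
    targets:
      - orders_clean
```

Nothing in such a file says how Spark should read, transform or write the data. It only states what exists and how the pieces depend on each other, which is what keeps the specification easy to review.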

Because it is built on top of Apache Spark, Flowman can both process small amounts of data on a local machine and scale out to large multi-machine clusters (Hadoop, Kubernetes, AWS EMR and Azure Synapse) to process terabytes of data.

Transform data with Flowman and use it for BI, ML or Analytics

How you will benefit from Flowman

Flowman blog and change log

Flowman 1.1.0 released

We are happy to announce the release of Flowman 1.1.0. This release contains many small improvements and bugfixes. Flowman now finally supports Spark 3.4.1. Major

October 17, 2023

Flowman — A Declarative ETL Framework for Apache Spark

Don’t reinvent the wheel by writing more boilerplate code. Focus on critical business logic and delegate the tricky details to a clever tool. Introduction Apache

May 31, 2023

Flowman at Smartclip

smartclip is a successful and growing company specializing in online video advertising. More importantly, smartclip was one of the first companies implementing Flowman for their

May 26, 2023

Flowman 1.0.0 released

Flowman version 1.0.0 has finally arrived. For several years, multiple companies have been using Flowman in production as a robust and reliable solution for efficiently building

May 3, 2023

Flowman 1.0 has landed

We are excited and proud to announce the official release of Flowman 1.0. Flowman is a tool for performing complex data transformations in a structured

May 3, 2023

Projects delivered with Flowman

Online Advertising

Online advertising produces huge amounts of data on a daily basis. To provide meaningful insights, all this data needs to be integrated and aggregated along meaningful dimensions. Flowman has been implemented successfully to create multiple pre-aggregated data marts. Because the pipelines are specified declaratively, business experts can easily be involved in reviewing them.

Complex ETL

Flowman has been successfully implemented in a microservice project in the financial services industry. The project uses Kafka for inter-service communication. Flowman processes the relevant messages in a Data Lake built from Kafka, without the need to connect to each service individually.

Customer-facing reporting

The art of making sense of millions of detailed records from multiple source systems by providing a high-level, holistic view is at the core of customer-facing reporting in B2B scenarios. Flowman is the right tool for integrating different data sources, applying complex business logic and storing aggregated tables in your reporting backend.

Copyright © The Flowman Authors | Kaya Kupferschmidt | Freiherr-vom-Stein Straße 3, 60323 Frankfurt, Germany | +49 69 71588909 | info@flowman.io

Webdesign by Katharina Vennewald