Product

Overview

What is Flowman?
Flowman is a declarative data build tool based on Apache Spark.

Everything as Code
Simple YAML files support proven workflows with source code management, code reviews and CI/CD pipelines.

Declarative Spark
With its declarative approach, Flowman removes the complexity of writing robust Spark applications and lets your developers focus on the business logic instead.

Development Workflow
By using simple YAML files, Flowman easily supports collaboration between developers. An optional integration with Apache Maven simplifies CI/CD processes.
Users

Data Engineers
Learn how Flowman reduces the cognitive load of data engineers.

Operations Teams
Learn how Flowman supports your operations.
Community
Get Started

Overview
Install and try out Flowman, or simply request a demo session.

Download Flowman
The latest Flowman release for local installation

Run in Docker
The simplest way to get started with Flowman

Install Locally
How to set up Apache Spark and Flowman on your local machine, step by step.
Learn

Reference Documentation
Flowman provides rich and extensive documentation covering concepts, tutorials and reference material.

Blog
Read background stories about Flowman and find release information.

FAQ
Find answers to commonly asked questions

100% Open Source

Transforming Big Data

Apache Spark. Extended. Declarative.

Flowman is a declarative ETL framework and data build tool powered by Apache Spark. It reads, processes and writes data from and to a wide variety of storage systems, such as relational databases, files, and object stores. It can join data sets from different source systems to create an integrated data model. This makes Flowman a powerful tool for building complex data transformation pipelines for the modern data stack.

For defining all data sources, sinks and the transformations between them, Flowman follows a purely declarative approach using plain YAML files. Developers can focus on the business logic, while Flowman takes care of executing the data flow and managing the data models.
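The declarative idea can be illustrated with a minimal sketch in plain Python. It is not Flowman's YAML format or its Spark-based engine: the `SPEC` layout, the in-memory `STORAGE` dictionary and the `run`, `join` and `aggregate` helpers are all hypothetical names chosen for this illustration. The point is only that the specification describes sources, transformations and a sink, while a separate generic engine decides how to execute them.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical in-memory "storage" standing in for databases, files or object stores.
STORAGE: Dict[str, List[Row]] = {
    "orders": [
        {"order_id": 1, "customer_id": 10, "amount": 25.0},
        {"order_id": 2, "customer_id": 11, "amount": 30.0},
        {"order_id": 3, "customer_id": 10, "amount": 15.0},
    ],
    "customers": [
        {"customer_id": 10, "name": "Alice"},
        {"customer_id": 11, "name": "Bob"},
    ],
}

# Declarative specification (illustrative layout, not Flowman syntax):
# it names sources, transformations and a sink, but holds no execution logic.
SPEC: Dict[str, Any] = {
    "sources": {"orders": "orders", "customers": "customers"},
    "transformations": [
        {"name": "orders_named", "kind": "join",
         "left": "orders", "right": "customers", "on": "customer_id"},
        {"name": "revenue", "kind": "aggregate",
         "input": "orders_named", "group_by": "name", "sum": "amount"},
    ],
    "sink": {"input": "revenue", "target": "revenue_per_customer"},
}


def join(left: List[Row], right: List[Row], on: str) -> List[Row]:
    """Inner join of two row lists on a single key column."""
    index = {row[on]: row for row in right}
    return [{**row, **index[row[on]]} for row in left if row[on] in index]


def aggregate(rows: List[Row], group_by: str, column: str) -> List[Row]:
    """Sum one column per group, keeping first-seen group order."""
    totals: Dict[Any, float] = {}
    for row in rows:
        totals[row[group_by]] = totals.get(row[group_by], 0) + row[column]
    return [{group_by: key, column: total} for key, total in totals.items()]


def run(spec: Dict[str, Any], storage: Dict[str, List[Row]]) -> List[Row]:
    """Generic engine: read the sources, apply each step, write the sink."""
    datasets = {name: storage[location] for name, location in spec["sources"].items()}
    for step in spec["transformations"]:
        if step["kind"] == "join":
            result = join(datasets[step["left"]], datasets[step["right"]], step["on"])
        elif step["kind"] == "aggregate":
            result = aggregate(datasets[step["input"]], step["group_by"], step["sum"])
        else:
            raise ValueError(f"unknown transformation kind: {step['kind']}")
        datasets[step["name"]] = result
    sink = spec["sink"]
    storage[sink["target"]] = datasets[sink["input"]]
    return storage[sink["target"]]


if __name__ == "__main__":
    print(run(SPEC, dict(STORAGE)))
```

Changing the pipeline in this sketch means editing `SPEC` only; the engine stays untouched, which is the separation of business logic and execution that the declarative approach aims for.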

Being built on top of Apache Spark, Flowman can process both small amounts of data on a local machine and scale out to large clusters of multiple machines (Hadoop, Kubernetes, AWS EMR and Azure Synapse) for processing terabytes of data.

Transform data with Flowman and use it for BI, ML or Analytics

How you will benefit from Flowman

Flowman blog and change log

Flowman 1.1.0 released

We are happy to announce the release of Flowman 1.1.0. This release contains many small improvements and bugfixes. Flowman now finally supports Spark 3.4.1. Major

October 17, 2023

Flowman — A Declarative ETL Framework for Apache Spark

Don’t reinvent the wheel by writing more boilerplate code. Focus on critical business logic and delegate the tricky details to a clever tool. Introduction Apache

May 31, 2023

Flowman at Smartclip

smartclip is a successful and growing company specializing in online video advertising. More importantly, smartclip was one of the first companies implementing Flowman for their

May 26, 2023

Flowman 1.0.0 released

Flowman version 1.0.0 has finally arrived. For several years, multiple companies have been using Flowman in production as a robust and reliable solution for efficiently building

May 3, 2023

Flowman 1.0 has landed

We are excited and proud to announce the official release of Flowman 1.0. Flowman is a tool for performing complex data transformations in a structured

May 3, 2023

Projects delivered with Flowman

Online Advertising

Online advertising produces huge amounts of data on a daily basis. To provide meaningful insights, all this data needs to be integrated and aggregated along meaningful dimensions. Flowman has been implemented successfully to create multiple pre-aggregated data marts. Because the pipelines are specified declaratively, business experts can easily take part in reviews.

Complex ETL

Flowman has been successfully implemented in a microservice project in the financial services industry. The project uses Kafka for communication between services, and Flowman processes the relevant messages in a data lake built from Kafka, without the need to connect to each service individually.

Customer-facing reporting

Making sense of millions of detailed records from multiple source systems by providing a high-level, holistic view is at the core of customer-facing reporting in B2B scenarios. Flowman is the right tool for integrating different data sources, applying complex business logic and storing aggregated tables in your reporting backend.

Copyright © The Flowman Authors | Kaya Kupferschmidt | Freiherr-vom-Stein Straße 3, 60323 Frankfurt, Germany | +49 69 71588909 | info@flowman.io

Webdesign by Katharina Vennewald