100% Open Source

Flow through data pipelines with Flowman

Flowman is a powerful, open-source data build tool powered by Apache Spark. It follows a declarative approach to simplify writing complex ETL, ELT and data transformation applications. A strong focus on transformation and schema management reduces the development effort for creating robust data pipelines.

Built on top of Apache Spark, Flowman can run as a standalone application, but it can also scale out on compute clusters (Hadoop & Kubernetes) to process any amount of data.

Focus on business logic instead of Spark boilerplate code!

Flowman Declarative Data Flows with Spark

Transform data with Flowman and use it for BI, ML or Analytics

Flowman explained

How you will benefit from Flowman

Flowman Declarative Code

01. Simple to learn

Lightweight specification of data models, transformations and build targets using declarative syntax instead of complex application code.
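To make the declarative approach concrete, a project specification might look roughly like the following YAML sketch. The entity names (raw_events, events_out) and exact keys are illustrative assumptions and may differ from Flowman's actual specification schema:

```yaml
# Hypothetical sketch of a declarative Flowman-style specification.
relations:
  raw_events:              # physical data source: CSV files on disk
    kind: file
    format: csv
    location: "/data/raw/events"

mappings:
  events:                  # read the relation into a logical mapping
    kind: relation
    relation: raw_events

targets:
  events_table:            # build target: write the mapping into a relation
    kind: relation
    mapping: events
    relation: events_out
```

Instead of wiring up Spark sessions and DataFrame code, the developer only declares sources, transformations and targets; the tool derives the execution plan.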

Flowman Interactive Shell

02. Modern

Modern development methodology following the "everything is code" approach, supporting collaboration via any version control system. Support for self-contained unit tests, automatic documentation and data quality checks.
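To illustrate the idea of self-contained unit tests, a test specification might look roughly like this sketch. The key names (overrideMappings, assertions and friends) are assumptions for illustration and may not match Flowman's real test schema:

```yaml
# Hypothetical sketch of a self-contained unit test for a mapping.
tests:
  test_events_mapping:
    overrideMappings:
      raw_events:          # replace the real source with inline mock records
        kind: values
        records:
          - ["2022-01-01", "click"]
    assertions:
      row_count:           # assert on the transformed output via SQL
        kind: sql
        query: "SELECT COUNT(*) FROM events"
        expected:
          - [1]
```

Because the mock data lives inside the test itself, such tests can run in CI without access to the real source systems.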

Flowman Execution Phases

03. Batteries included

Full lifecycle management of your data models, including creating target tables, automatic schema migration and eventual removal. Automatic documentation of data flows including lineage and quality checks. A job history server. Business-defined execution metrics.

Flowman blog and change log

09-09-2022

The new version 0.27.0 of Flowman has been released. Among many improvements and new features, this release takes the support for working with JDBC data sources and sinks even further. You can now execute arbitrary SQL commands as part of the build process, providing a way to use database-specific features.
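For example, running a database-specific SQL command as a build step could be declared roughly like the sketch below. The target kind and keys are assumptions based on the release description, and the connection name is hypothetical:

```yaml
# Hypothetical sketch: run a raw SQL command against a JDBC connection
# as part of the build, e.g. to enable a database-specific feature.
targets:
  enable_cdc:
    kind: jdbcCommand
    connection: my_sql_server
    build:
      sql: "EXEC sys.sp_cdc_enable_db"
```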

27-07-2022

This latest release of Flowman contains a lot of work with a strong focus on improving working with JDBC targets like MariaDB/MySQL, Postgres, MS SQL Server, Azure SQL and Oracle. For example, column collations and comments are now correctly propagated into relational databases, changing the primary key is now supported and much more.

Spark 3.3 is now officially supported, albeit not yet extensively tested. Moreover, many small bug fixes and enhancements make Flowman more robust and versatile.

31-06-2022

This latest release contains a couple of smaller changes and fixes, plus a new relation for creating and managing views in SQL databases accessed via JDBC. This addition strengthens Flowman's position in environments where data resides in classical relational databases instead of HDFS or object storage.

04-05-2022

The latest Flowman release now provides a YAML/JSON schema to enable syntax highlighting and auto-completion in code editors like IntelliJ and Visual Studio Code.

03-29-2022

The latest Flowman release addresses some issues and contains important improvements, especially when working with SQL databases as targets.

03-18-2022

The main features of the newest version of Flowman focus on significant improvements to the automatic documentation. Moreover, staging tables are now supported for all SQL databases to allow transactional data updates.

Projects delivered with Flowman

Online Advertising

Online advertising produces huge amounts of data on a daily basis. In order to provide meaningful insights, all this data needs to be integrated and aggregated along meaningful dimensions. Flowman has been successfully used to create multiple pre-aggregated data marts. Thanks to the declarative specification, business experts can easily be involved in reviews.

Financial Services

Flowman has been successfully implemented in a microservice project in the financial services industry. The project uses Kafka for intra-service communication, and Flowman collects and processes relevant messages directly from Kafka without the need to connect to each service individually.

Customer-facing reporting

The art of making sense of millions of detailed records from multiple source systems by providing a high-level, holistic view is at the core of customer-facing reporting in B2B scenarios. Flowman is the right tool for integrating different data sources, applying complex business logic and storing aggregated tables in your reporting backend.