
Katharina Vennewald

January 6, 2022
11:00 am

Flowman 0.20.1 Released!

Version 0.20.1 of Flowman was released a couple of days ago. Among the new features and bug fixes, you will find support for merge operations on Delta Lake and JDBC targets. With this new output target you can finally implement incremental processing, as required for example by CDC (Change Data Capture). In addition, the logic for detecting dirty targets has been improved to handle more cases correctly.
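
To illustrate what a merge operation means for CDC-style incremental processing, here is a minimal conceptual sketch in plain Python. It is not Flowman's implementation or configuration syntax; the function name `merge_changes`, the operation labels (`INSERT`, `UPDATE`, `DELETE`) and the `id` key column are assumptions made purely for illustration.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]


def merge_changes(
    target: Dict[object, Row],
    changes: Iterable[Tuple[str, Row]],
    key: str = "id",
) -> Dict[object, Row]:
    """Apply change records to a keyed snapshot and return a new snapshot.

    Hypothetical helper: each change is an (operation, row) pair, and the
    row is matched against the target by the value of its key column.
    """
    merged = dict(target)  # shallow copy, the input snapshot is not modified
    for op, row in changes:
        k = row[key]
        if op == "DELETE":
            # Remove the matching record if it exists
            merged.pop(k, None)
        elif op in ("INSERT", "UPDATE"):
            # Upsert: overlay the changed columns on the existing record
            merged[k] = {**merged.get(k, {}), **row}
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged


target = {
    1: {"id": 1, "name": "alice"},
    2: {"id": 2, "name": "bob"},
}
changes = [
    ("UPDATE", {"id": 1, "name": "alicia"}),
    ("DELETE", {"id": 2}),
    ("INSERT", {"id": 3, "name": "carol"}),
]
result = merge_changes(target, changes)
```

Only the records named in the change set are touched, which is what makes this kind of processing incremental: unchanged records in the target are carried over without being rewritten.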

Detailed Changes

Fix detection of Derby metastore to truncate comment lengths.
Add new config variable flowman.default.relation.input.columnMismatchPolicy (default is IGNORE)
Add new config variable flowman.default.relation.input.typeMismatchPolicy (default is IGNORE)
Add new config variable flowman.default.relation.output.columnMismatchPolicy (default is ADD_REMOVE_COLUMNS)
Add new config variable flowman.default.relation.output.typeMismatchPolicy (default is CAST_ALWAYS)
Improve handling of _SUCCESS files for detecting (non-)dirty directories
Implement new merge target
Implement merge operation for Delta relations
Implement merge operation for JDBC relations (only for some databases, i.e. MS SQL)
Add new config variable flowman.execution.target.useHistory (default is false)
Change the semantics of config variable flowman.execution.target.forceDirty (default is false)
Add new -d / --dirty option for explicitly marking individual targets as dirty
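
The `_SUCCESS` files mentioned above are the empty marker files that Hadoop and Spark jobs conventionally write into an output directory after a successful commit. The following sketch shows one way such a marker could feed into a dirty check. The rule used here (a directory counts as dirty if it is missing or has no marker) and the function names are assumptions for illustration, not Flowman's actual detection logic.

```python
import tempfile
from pathlib import Path


def has_success_marker(directory: Path) -> bool:
    """Return True if the directory contains a _SUCCESS marker file."""
    return (directory / "_SUCCESS").is_file()


def is_dirty(directory: Path) -> bool:
    """Hypothetical rule: an output directory needs a rebuild if it is
    missing or was never marked as successfully written."""
    return not directory.is_dir() or not has_success_marker(directory)


with tempfile.TemporaryDirectory() as tmp:
    output = Path(tmp) / "output"
    dirty_when_missing = is_dirty(output)    # directory does not exist yet

    output.mkdir()
    dirty_without_marker = is_dirty(output)  # exists, but no marker

    (output / "_SUCCESS").touch()
    dirty_with_marker = is_dirty(output)     # marker present
```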

About Flowman

Flowman is an open source data build tool on top of Apache Spark that uses a declarative approach for specifying the full data flow, including all sources, targets and transformations. As usual, you can find the latest version of Flowman prebuilt for different Spark / Hadoop versions at https://flowman.io
