Flowman blog and change log

Flowman 0.20.1 Released!

The new version 0.20.1 of Flowman was released a couple of days ago. Among the new features and bug fixes, you will find support for merge operations for Delta Lake and JDBC targets. With this new output target, you can finally implement incremental processing, as required for example by CDC (Change Data Capture). In addition, the logic for detecting dirty targets has been improved to handle more cases correctly.
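
To make the idea of a merge operation more concrete, here is a minimal sketch in plain Python of what applying CDC-style change records to a keyed target means. This is not Flowman's API or its implementation. The function name merge_changes, the operation labels "insert", "update" and "delete", and the "id" key column are all assumptions chosen for illustration only.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]


def merge_changes(
    target: Dict[object, Row],
    changes: Iterable[Tuple[str, Row]],
    key: str = "id",  # hypothetical key column, not taken from Flowman
) -> Dict[object, Row]:
    """Apply change records to a keyed target and return a merged copy."""
    merged = dict(target)  # shallow copy, so the input mapping is not modified
    for op, row in changes:
        k = row[key]
        if op == "delete":
            # Remove the record if present; deleting a missing key is a no-op
            merged.pop(k, None)
        elif op in ("insert", "update"):
            # Upsert: overlay the changed columns on the existing record, if any
            merged[k] = {**merged.get(k, {}), **row}
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged


target = {
    1: {"id": 1, "name": "alice"},
    2: {"id": 2, "name": "bob"},
}
changes = [
    ("update", {"id": 2, "name": "robert"}),
    ("insert", {"id": 3, "name": "carol"}),
    ("delete", {"id": 1}),
]
result = merge_changes(target, changes)
```

In this sketch only the records named in the change set are touched, which is what makes incremental processing cheaper than rewriting the whole target on every run.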

Detailed Changes

  • Fix detection of Derby metastore to truncate comment lengths.
  • Add new config variable flowman.default.relation.input.columnMismatchPolicy (default is IGNORE)
  • Add new config variable flowman.default.relation.input.typeMismatchPolicy (default is IGNORE)
  • Add new config variable flowman.default.relation.output.columnMismatchPolicy (default is ADD_REMOVE_COLUMNS)
  • Add new config variable flowman.default.relation.output.typeMismatchPolicy (default is CAST_ALWAYS)
  • Improve handling of _SUCCESS files for detecting (non-)dirty directories
  • Implement new merge target
  • Implement merge operation for Delta relations
  • Implement merge operation for JDBC relations (only for some databases, i.e. MS SQL)
  • Add new config variable flowman.execution.target.useHistory (default is false)
  • Change the semantics of config variable flowman.execution.target.forceDirty (default is false)
  • Add new -d / --dirty option for explicitly marking individual targets as dirty

About Flowman

Flowman is an open source data build tool on top of Apache Spark that uses a declarative approach for specifying the full data flow, including all sources, targets and transformations. As usual, you can find the latest version of Flowman prebuilt for different Spark / Hadoop versions at https://flowman.io