Version 0.20.1 of Flowman was released a couple of days ago. Among the new features and bug fixes, you will find support for merge operations for Delta Lake and JDBC targets. With this new output target, you can finally implement incremental processing as required, for example, by CDC (Change Data Capture). In addition, the logic for detecting dirty targets has been improved to handle more cases correctly.
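To illustrate what a merge operation means for CDC-style incremental processing, here is a minimal conceptual sketch in plain Python. It is not Flowman's implementation or specification syntax: the function name merge_changes, the change tuple layout and the operation codes "I", "U" and "D" are all assumptions made up for this example.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical change record: (operation, key, row). The operation codes
# "I" / "U" / "D" are an assumption for this sketch, not a Flowman format.
Change = Tuple[str, int, dict]


def merge_changes(target: Dict[int, dict], changes: Iterable[Change]) -> Dict[int, dict]:
    """Apply a batch of CDC changes to a keyed target, in order."""
    merged = dict(target)  # shallow copy, so the input mapping is left untouched
    for op, key, row in changes:
        if op == "D":
            merged.pop(key, None)   # matched record is deleted
        elif op in ("I", "U"):
            merged[key] = row       # insert if the key is new, overwrite if it matches
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return merged


target = {1: {"name": "alice"}, 2: {"name": "bob"}}
changes = [
    ("U", 1, {"name": "alice smith"}),
    ("D", 2, {}),
    ("I", 3, {"name": "carol"}),
]
result = merge_changes(target, changes)
print(result)  # {1: {'name': 'alice smith'}, 3: {'name': 'carol'}}
```

The point of a merge is that only the changed records are touched, instead of rewriting the whole target on every run.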
Detailed Changes
- Fix detection of Derby metastore to truncate comment lengths.
- Add new config variable flowman.default.relation.input.columnMismatchPolicy (default is IGNORE)
- Add new config variable flowman.default.relation.input.typeMismatchPolicy (default is IGNORE)
- Add new config variable flowman.default.relation.output.columnMismatchPolicy (default is ADD_REMOVE_COLUMNS)
- Add new config variable flowman.default.relation.output.typeMismatchPolicy (default is CAST_ALWAYS)
- Improve handling of _SUCCESS files for detecting (non-)dirty directories
- Implement new merge target
- Implement merge operation for Delta relations
- Implement merge operation for JDBC relations (only for some databases, i.e. MS SQL)
- Add new config variable flowman.execution.target.useHistory (default is false)
- Change the semantics of config variable flowman.execution.target.forceDirty (default is false)
- Add new -d / --dirty option for explicitly marking individual targets as dirty
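
As background for the _SUCCESS item above: Hadoop and Spark jobs conventionally write an empty _SUCCESS marker file into an output directory once a write has completed. The following sketch shows the general idea of using that marker to decide whether a directory still needs to be built. It is only an illustration under that assumption, not Flowman's actual detection logic, and the helper name looks_complete is hypothetical.

```python
import tempfile
from pathlib import Path


def looks_complete(directory: Path) -> bool:
    """Return True if the directory contains a _SUCCESS marker file."""
    return (directory / "_SUCCESS").is_file()


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "partition=2022-01-01"
    out.mkdir()
    before = looks_complete(out)   # no marker yet, so the output counts as dirty
    (out / "_SUCCESS").touch()     # simulate a successfully finished write
    after = looks_complete(out)    # marker present, so the output counts as clean

print(before, after)
```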
About Flowman
Flowman is an open source data build tool on top of Apache Spark which uses a declarative approach for specifying the full data flow, including all sources, targets and transformations. As usual, you can find the latest version of Flowman prebuilt for different Spark / Hadoop versions at https://flowman.io