Flowman version 1.0.0 has finally arrived. For several years, multiple companies have been using Flowman in production as a robust and reliable solution for efficiently building data transformation pipelines. It therefore made sense to leave the zero-versions behind and raise the major version number to “1” to underline the robustness of Flowman.
Please also don’t miss our story behind the 1.0 release!
Major Features
Aside from the version number itself, there are some exciting new major features available in Flowman:
- New client/server applications allow a developer to connect to a remote Flowman server while still performing interactive development with the Flowman remote shell. This feature is still experimental and will receive enhancements in upcoming releases.
- You can now create a .flowman-env.yml file to locally override settings.
- Azure Synapse is officially supported as a deployment target.
- Lots of documentation has been updated, added and extended.
- More integration tests have been added as an additional quality gate.
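As a rough illustration of the local override file mentioned above, a .flowman-env.yml could look like the sketch below. It assumes the file uses the same "environment" and "config" sections, with key=value entries, as other Flowman specification files. The variable name "basedir", the path and the Spark setting are hypothetical placeholders, so please check the Flowman documentation for the exact format.

```yaml
# .flowman-env.yml (illustrative sketch, all entries are assumptions)
environment:
  # Hypothetical variable: point the project at a local directory
  - basedir=file:///tmp/my-project

config:
  # Hypothetical override: use fewer shuffle partitions for local development
  - spark.sql.shuffle.partitions=4
```

Since such a file holds settings specific to one developer's machine, it is typically something you would keep out of version control.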
Detailed Changes
- github-314: Move avro related functionality into separate plugin
- github-307: Describe Flowman’s security policy in SECURITY.md
- github-315: Create build profile for CDP 7.1 with Spark 3.3
- github-317: Automatically retry on failing JDBC commands
- github-318: Support mappings from different projects and with non-standard outputs in SQL
- github-140: Strictly check imports
- github-316: Beautify README.md
- github-310: Explain versioning policy in CHANGELOG.md
- github-313: Improve example for “observe” mapping
- github-319: Support Oracle for History Server
- github-320: Do not fall back to “inline” schema when no kind is specified
- github-321: [BUG] Properly support lower case / upper case table names in Oracle
- github-309: Automate integration tests
- github-322: Remove flowman-client
- github-324: Log environment variables for imported projects
- github-329: Create Kernel API
- github-330: Implement Kernel Server
- github-331: Implement Kernel Client
- github-332: Build Flowman Shell on top of kernel Client/Server
- github-334: Create standalone Flowman Kernel application
- github-338: Update Spark to 3.3.2
- github-333: Forward Logs from Kernel to Client
- github-339: Set Copyright to “The Flowman Authors”
- github-345: [BUG] Loading an embedded schema inside a jar file should not throw an exception
- github-346: Create build profile for Databricks
- github-343: Log all client requests in kernel
- github-342: Automatically close session when the client disconnects from kernel
- github-351: [BUG] Failing execution listener instantiation should not fail a build
- github-347: Exclude AWS SDK for Databricks and EMR build profiles
- github-352: [BUG] Spark sessions should not contain duplicate jars from different plugins
- github-353: Successful runs should not use System.exit(0)
- github-354: Optionally load custom log4j config from jar
- github-358: Provide different log4j config for Flowman server and kernel
- github-359: Update jline dependency
- github-357: Spark session should not be shut down in Databricks environment
- github-360: Logging should exclude more Databricks specific stuff
- github-361: Work around low-level API differences in Databricks
- github-363: HiveDatabaseTarget should accept an optional location
- github-311: Create integration test for EMR
- github-362: Upgrade EMR to 6.10
- github-369: [BUG] Prevent endless loop in Kernel client, when getContext fails
- github-370: The Kernel client should use temporary workspaces with automatic cleanup
- github-337: Add documentation for flowman-rshell
- github-336: Add documentation for flowman-kernel
- github-366: Feature parity between Flowman shell and Flowman remote shell
- github-365: Implement saving mappings in Flowman Kernel/client
- github-367: Create integration test for “quickstart” archetype
- github-375: [BUG] “project reload” does not work correctly in remote shell with nested directories
- github-376: Document options to parallelize work
- github-378: Remove travis-ci integration
- github-308: Revise branching model
- github-381: Remove json-smart dependency
- github-382: [BUG] Parallel execution of multiple dq checks runs too many checks on Java 17
- github-384: Improve documentation for using docker-compose
- github-377: Load override config/env from .flowman-env.yml
- github-344: Support .flowman-ignore file for Flowman Kernel client
- github-385: Update Flowman tutorial
- github-386: Create Integration Test for Azure Synapse
- github-387: Remove scala-arm dependency
- github-390: Rename “master” branch to “main”
- github-392: [BUG] ‘relation’ mapping should support numeric partition values
- github-393: Move Maven archetype to flowman-maven project
- github-394: [BUG] The Spark job group and description are not set for sql assertions
- github-395: Support optional file locations for project imports
- github-397: Automate build using GitHub actions
- github-403: Upgrade Spark 3.2 to 3.2.4
- github-404: [BUG] Partition columns do not support Timestamp data type
- github-409: [BUG] Fix build for AWS EMR 6.10 and Azure Synapse 3.3
- github-407: Update Delta to 2.3.0 for Spark 3.3
- github-406: Improve integration tests to automatically pick up the current Flowman version
- github-408: Make use of DeltaLake in Synapse integration test
- github-405: Document deployment to EMR and Azure Synapse
Breaking Changes
This version introduces some (minor) breaking changes:
- All Avro related functionality has been moved into the new “flowman-avro” plugin. If you rely on such functionality, you need to explicitly include the plugin in the default-namespace.yml file.
- Imports are now strictly checked. This means that when you cross-reference an entity that is provided by a different Flowman project, you now need to explicitly import that project in the project.yml.
- The “kind” for schema definitions is now a mandatory attribute. Flowman will no longer fall back to an “inline” schema.
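To make the migration more concrete, the following sketch shows what the three adjustments might look like. It consists of three separate YAML fragments, one per file, separated by "---" only for presentation. The project names, the relation name, the location and the field names are hypothetical, and the surrounding keys are assumptions based on the usual layout of Flowman specification files, so treat this as a starting point rather than a definitive reference.

```yaml
# Fragment 1: default-namespace.yml
# Explicitly load the Avro plugin if your project relies on Avro functionality.
plugins:
  - flowman-avro
---
# Fragment 2: project.yml
# "my_project" and "other_project" are hypothetical project names.
name: my_project
version: "1.0"
imports:
  # Required whenever entities of "other_project" are cross-referenced
  - project: other_project
---
# Fragment 3: a relation definition inside the project
# "measurements" and its fields are hypothetical.
relations:
  measurements:
    kind: file
    format: parquet
    location: "$basedir/measurements/"
    schema:
      # The kind must now be stated explicitly; it is no longer assumed
      kind: inline
      fields:
        - name: station_id
          type: string
        - name: temperature
          type: double
```

Projects that already declare their plugins, imports and schema kinds explicitly should not need any of these changes.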