Install and run Flowman on your local machine

You can also run Flowman directly on your local machine, which works best on Linux. Windows users might consider installing Flowman inside WSL for the best experience.

Download & install Apache Spark

Although Flowman builds directly on Apache Spark, it does not ship a working Hadoop or Spark environment, and there is a good reason for that: in many environments (specifically in companies using Hadoop distributions), a Hadoop/Spark environment is already provided by a platform team. Flowman tries its best not to interfere with that setup and instead requires an existing, working Spark installation.

The following step installs Apache Spark on your local machine. If you already have a working Spark installation in a version supported by Flowman, you may want to skip this section. Otherwise, we download and install Spark 3.4.1 for Hadoop 3.3, which works nicely with the latest Flowman release 1.1.0.

# Create a fresh playground directory, both for Spark and for Flowman
mkdir playground
cd playground

# Download and unpack Spark & Hadoop
curl -L https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz | tar xvzf -

# Create a nice link
ln -snf spark-3.4.1-bin-hadoop3 spark
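
Before moving on, it can help to confirm that the download and unpack step produced something usable. The following is a minimal sketch, not part of Spark or Flowman: `check_spark_dir` is a hypothetical helper name, and it only assumes that a Spark distribution ships an executable `bin/spark-submit`.

```shell
# Hypothetical helper (not part of Spark or Flowman): returns 0 if the given
# directory (or symlink to a directory) looks like an unpacked Spark distribution.
check_spark_dir() {
    local dir="${1:-spark}"
    if [ ! -d "$dir" ]; then
        echo "Not a directory: $dir" >&2
        return 1
    fi
    if [ ! -x "$dir/bin/spark-submit" ]; then
        echo "Missing executable: $dir/bin/spark-submit" >&2
        return 1
    fi
    return 0
}
```

From the playground directory, `check_spark_dir spark && spark/bin/spark-submit --version` should then print the Spark version, provided a suitable Java runtime is installed.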
				
			

Download & install Flowman

For this quickstart, we chose `flowman-dist-1.1.0-oss-spark3.4-hadoop3.3-bin.tar.gz`, which matches the Spark package we just downloaded. If you use your existing Spark and Hadoop installation, please use the appropriate download above instead.

# Download and unpack Flowman
curl -L https://github.com/dimajix/flowman/releases/download/1.1.0/flowman-dist-1.1.0-oss-spark3.4-hadoop3.3-bin.tar.gz | tar xvzf -

# Create a nice link
ln -snf flowman-1.1.0-oss-spark3.4-hadoop3.3 flowman
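
If you work with an existing Spark installation, the file name has to match its Spark and Hadoop versions. The sketch below assembles the download URL from those versions. The naming pattern is an assumption inferred from the single file name used in this quickstart, `flowman_dist_url` is a hypothetical helper, and not every version combination necessarily exists as a release.

```shell
# Hypothetical helper: build a Flowman download URL from version numbers.
# The naming pattern is inferred from the quickstart file name and may not
# hold for every release or version combination.
flowman_dist_url() {
    local flowman_version="$1" spark_version="$2" hadoop_version="$3"
    local name="flowman-dist-${flowman_version}-oss-spark${spark_version}-hadoop${hadoop_version}-bin.tar.gz"
    echo "https://github.com/dimajix/flowman/releases/download/${flowman_version}/${name}"
}
```

Calling `flowman_dist_url 1.1.0 3.4 3.3` reproduces the URL used in the download command of this quickstart.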
				
			

Flowman configuration

Before you can use Flowman, you need to tell it where to find the Spark home directory that we created in the previous step. You can do this either by providing a valid configuration file in flowman/conf/flowman-env.sh (a template can be found at flowman/conf/flowman-env.sh.template) or by simply setting an environment variable. For the sake of simplicity, we follow the second approach:

# This assumes that we are still in the directory "playground"
export SPARK_HOME=$(pwd)/spark
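
Because `$(pwd)/spark` depends on the current working directory, the variable silently points to a wrong location when the command is run from elsewhere. A minimal sketch of a more defensive variant follows; `set_spark_home` is a hypothetical helper name, and it only assumes that the `spark` link lives directly inside the playground directory.

```shell
# Hypothetical helper: export SPARK_HOME for a given playground directory,
# failing loudly if the expected "spark" directory or link is missing.
set_spark_home() {
    local playground="${1:-.}"
    local abs
    # Turn the (possibly relative) playground path into an absolute one.
    abs="$(cd "$playground" && pwd)" || return 1
    if [ ! -d "$abs/spark" ]; then
        echo "No spark directory found in $abs" >&2
        return 1
    fi
    export SPARK_HOME="$abs/spark"
}
```

The function has to be defined and called in the current shell (for example `set_spark_home .` from within the playground directory), since an exported variable does not propagate from a child process back to its parent.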
				
			

To use some of the provided Flowman plugins, we also need to provide a default namespace, which contains some basic configuration. We simply copy the provided templates as follows:

# Copy the default namespace and the environment template
cp flowman/conf/default-namespace.yml.template flowman/conf/default-namespace.yml
cp flowman/conf/flowman-env.sh.template flowman/conf/flowman-env.sh
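
Running the plain `cp` commands a second time overwrites any changes you have made to the copied files in the meantime. The following sketch copies each template only if its target does not exist yet. `install_conf_templates` is a hypothetical helper, and the two file names are simply the ones used in this quickstart.

```shell
# Hypothetical helper: copy the Flowman configuration templates without
# overwriting configuration files that already exist.
install_conf_templates() {
    local conf_dir="${1:-flowman/conf}"
    local name
    for name in default-namespace.yml flowman-env.sh; do
        if [ -e "$conf_dir/$name" ]; then
            echo "Keeping existing $conf_dir/$name"
        else
            cp "$conf_dir/$name.template" "$conf_dir/$name" || return 1
        fi
    done
}
```

From the playground directory, `install_conf_templates flowman/conf` has the same effect as the two `cp` commands on a fresh installation and leaves already edited files untouched afterwards.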
				
			

Congratulations!

That’s it. You now have a working Flowman installation. Continue with the next section to learn how to use it.

Get started with Flowman

If you are new to Flowman, the following guides, tutorials and resources will get you off to a good start.

Core Concepts

Learn the fundamental ideas and concepts of Flowman.

Quickstart Guide

A small quickstart guide will lead you through a simple example.

Online Tutorial

Step-by-step introduction for learning how to succeed with Flowman.

Development Workflow

Streamline your development workflow by making the most of all Flowman tools.