Apache Airflow with Docker on Windows

OK, so you really have to use Windows to run Airflow? In this post I describe how to set up your environment to run Airflow in Docker on Windows, with the DAG and plugin folders mounted from your host.

We will be using VirtualBox, since I was unable to get mounted folders working correctly with Hyper-V. I will use PowerShell or CMD to run the Docker commands.

Prerequisites

You should be familiar with basic Docker concepts and usage, as well as with Airflow, in order to follow this tutorial.

You also need to:

  • Disable Hyper-V (from a PowerShell window started as administrator):
   Disable-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V-All
  • Install VirtualBox
  • Fork or clone Puckel's docker-airflow Git repo to C:/Users/YourUsername/Documents. It is important that you put it under Documents, since by default the boot2docker VirtualBox setup mounts this folder into the VM.
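
A quick way to sanity-check the clone location before going further is to test whether the path sits under the folder that gets mounted into the VM. The sketch below uses Python's pathlib. The helper name is_under and the example paths are illustrative assumptions, not part of Puckel's repo.

```python
from pathlib import PureWindowsPath


def is_under(path: str, root: str) -> bool:
    """Return True if `path` equals `root` or lies somewhere below it.

    PureWindowsPath compares case-insensitively and accepts both slash
    styles, which matches how Windows treats paths.
    """
    candidate = PureWindowsPath(path)
    base = PureWindowsPath(root)
    return candidate == base or base in candidate.parents


# Hypothetical locations, following the folder named in this post
documents = "C:/Users/YourUsername/Documents"
print(is_under(r"C:\Users\YourUsername\Documents\docker-airflow", documents))
print(is_under("D:/projects/docker-airflow", documents))
```

The first call reports that the clone is inside the mounted folder, the second that it is not. Because PureWindowsPath never touches the disk, the check also runs on a Linux machine.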

Run the Docker machine with VirtualBox

Create the virtual machine (named “default”):

	docker-machine create --driver virtualbox default

See additional docker-machine create parameters here. To verify that the machine was successfully created, run docker-machine ls.

If everything worked out so far, you should see a new VM appear when you open VirtualBox. If the machine is not already running, start it by entering docker-machine start default in the PowerShell or CMD window.

Add the docker-host to the current Powershell/CMD environment

In PowerShell/CMD, run docker-machine env:

$ docker-machine env
$Env:DOCKER_TLS_VERIFY = "1"
$Env:DOCKER_HOST = "tcp://192.168.99.101:2376"
$Env:DOCKER_CERT_PATH = "C:\Users\yourUsername\.docker\machine\machines\default"
$Env:DOCKER_MACHINE_NAME = "default"

Then run docker-machine.exe env | Invoke-Expression. This sets the environment variables for your currently open PowerShell session (Invoke-Expression is a PowerShell cmdlet, so this form does not work in CMD). Your docker commands now know which DOCKER_HOST to use and where to find the certificates for the secured communication between the Docker client and the Docker host (running inside the boot2docker image).
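
To make explicit what Invoke-Expression does with that output, here is a small sketch that reads the same lines into a dictionary. It assumes the $Env:NAME = "value" format printed by docker-machine env in PowerShell. parse_powershell_env is a made-up helper name, and the sample lines are copied from the output in this post.

```python
import re

# Matches PowerShell assignments such as: $Env:DOCKER_HOST = "tcp://..."
_ENV_LINE = re.compile(r'^\$Env:(\w+)\s*=\s*"(.*)"\s*$')


def parse_powershell_env(output: str) -> dict:
    """Collect NAME -> value pairs from `docker-machine env` PowerShell output."""
    variables = {}
    for line in output.splitlines():
        match = _ENV_LINE.match(line.strip())
        if match:  # blank lines and anything else are skipped
            variables[match.group(1)] = match.group(2)
    return variables


# Sample taken from the output shown in this post
sample = '''
$Env:DOCKER_TLS_VERIFY = "1"
$Env:DOCKER_HOST = "tcp://192.168.99.101:2376"
$Env:DOCKER_MACHINE_NAME = "default"
'''
env = parse_powershell_env(sample)
print(env["DOCKER_HOST"])
```

Each pair corresponds to one variable that ends up in the session, which is all the docker client needs to find the VM.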

Adapt the Dockerfile for correct file permissions

To get the right permissions on the dags and logs folders, we create both folders inside the Dockerfile and set their owner and mode there. (I assume that you want local log files with a Celery setup, so that the scheduler container can read the worker logs as if they were local.) The following example shows the relevant part of the adapted Dockerfile:

# [...]
USER root
# [...]
COPY script/entrypoint.sh /entrypoint.sh
COPY config/airflow.cfg ${AIRFLOW_HOME}/airflow.cfg

# Create the folders up front so they get the right owner and mode
RUN mkdir -p ${AIRFLOW_HOME}/logs ${AIRFLOW_HOME}/dags \
    && chmod -R 777 ${AIRFLOW_HOME}/logs \
    && chown -R airflow: ${AIRFLOW_HOME} \
    && chmod -R 774 ${AIRFLOW_HOME}/dags

EXPOSE 8080 5555 8793
# [...]
USER airflow
# [...]
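
If the octal modes are not familiar, the following sketch prints them the way ls -l would show them for a directory. It uses only Python's standard stat module. describe_mode is a hypothetical helper name chosen for this illustration.

```python
import stat


def describe_mode(octal_mode: int) -> str:
    """Render a directory's permission bits the way `ls -l` prints them."""
    return stat.filemode(stat.S_IFDIR | octal_mode)


print(describe_mode(0o777))  # logs folder  -> drwxrwxrwx
print(describe_mode(0o774))  # dags folder  -> drwxrwxr--
```

So the logs folder is fully open to every user in the container, while the dags folder is fully open to the owner (airflow) and its group and read-only for everyone else.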

Adapt the docker-compose file to mount volumes for dags and logs

The trickiest part of setting up Airflow with Docker on Windows was getting the mounted folders to work. We want the dags and plugins folders to be mounted into the container.

Make sure to check out Puckel's docker-airflow repo underneath C:/Users/YourUsername/Documents.

To mount the DAG folder you need to map it to your host's filesystem. To stay compatible with both Windows and Linux (where the production Airflow setup will probably run), you can use the PWD environment variable in the docker-compose file. Linux shells export PWD, while PowerShell and CMD normally do not, so on Windows the fallback . (the folder containing the compose file) is used:

# [...]
    webserver:
        image: puckel/docker-airflow:1.9.0-2
        # [...]
        volumes:
            - ${PWD-.}/dags/:/usr/local/airflow/dags
            - ${PWD-.}/plugins:/usr/local/airflow/plugins
            - ${PWD-.}/helper:/usr/local/airflow/helper
            - airflowlogs:/usr/local/airflow/logs
# [...]
volumes:
    airflowlogs: {}
# [...]

Add these volume entries to your Airflow webserver, scheduler and worker services!

${VARIABLE:-default} evaluates to default if VARIABLE is unset or empty in the environment.

${VARIABLE-default} evaluates to default only if VARIABLE is unset in the environment.
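
The difference between the two forms is easy to miss, so here is a minimal Python sketch that emulates them. It is not how docker-compose implements substitution; the function substitute and the three example environments are assumptions made up for illustration.

```python
def substitute(name: str, default: str, env: dict, colon: bool) -> str:
    """Emulate ${name-default} (colon=False) or ${name:-default} (colon=True)."""
    if name not in env:
        return default  # unset: both forms fall back to the default
    if colon and env[name] == "":
        return default  # set but empty: only the ':-' form falls back
    return env[name]


linux_env = {"PWD": "/home/user/docker-airflow"}  # hypothetical Linux shell
windows_env = {}                                  # assumes PWD is not exported
empty_env = {"PWD": ""}                           # set, but empty

print(substitute("PWD", ".", linux_env, colon=False))        # /home/user/docker-airflow
print(substitute("PWD", ".", windows_env, colon=False))      # .
print(repr(substitute("PWD", ".", empty_env, colon=False)))  # ''
print(substitute("PWD", ".", empty_env, colon=True))         # .
```

With ${PWD-.} an empty PWD would produce a path starting with /dags, so the ${PWD:-.} form is the more defensive choice if you cannot rule out that case.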

I created a named volume for the logs called “airflowlogs”.

Run Airflow with docker-compose

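Running docker-compose -f docker-compose-CeleryExecutor.yml up -d will now spin up the containers, and the Airflow web UI should be reachable at localhost:8080. If localhost does not respond, try the VM's address instead, which docker-machine ip default prints (for example 192.168.99.101:8080, going by the DOCKER_HOST shown earlier).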
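
The webserver usually needs a moment after the containers start. A small polling sketch like the following can tell you when the UI answers. It uses only the Python standard library; http_answers and wait_until are hypothetical helper names, and the default of 30 attempts two seconds apart is an arbitrary assumption.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_answers(url: str) -> bool:
    """Return True if anything answers HTTP at `url`, whatever the status."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except urllib.error.HTTPError:
        return True   # the server responded, just not with a success status
    except OSError:
        return False  # refused, unreachable, or timed out


def wait_until(probe: Callable[[], bool], attempts: int = 30, delay: float = 2.0) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling wait_until(lambda: http_answers("http://localhost:8080")) returns True as soon as the web UI responds and False if it never does. Replace localhost with the address of your Docker host if it differs.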

Some useful commands when working with Docker and Airflow

# see all running containers
docker ps

# see logs of a container
docker logs --tail 50 --follow --timestamps dockerairflow_webserver_1

# open a bash shell in a running container
docker exec -it dockerairflow_webserver_1 bash

# stop all containers of this project (grep, awk and xargs need a Unix-like shell such as Git Bash)
docker ps -a | grep dockerairflow_ | awk '{print $1}' | xargs -r docker stop

# force remove a container that does not want to go away
docker rm -f dockerairflow_webserver_1

# delete all unused Docker volumes (volumes still used by a container are kept)
docker volume prune

Are you missing anything? Still cannot get Airflow to work on Windows? Drop a comment, I will try to help!
