A simple Apache Airflow [Alpine Linux] Docker image with as little magic as possible.
This project is licensed under the Apache License 2.0. See LICENCE for more information.
The image was built mainly for learning purposes and is based on the Python Alpine Linux image.
The versioning scheme is <Native Airflow Version>[-<Optional Numeric Suffix>], e.g. "1.10.9" or "2.0.0-2".
Some notes:
- Timezones are supported.
- UTF-8 is supported out of the box.
- Building will be slow because wheel packages (like pandas and numpy) aren't supported on Alpine Linux, so they have to be compiled from source. Check out this discussion for more info: https://stackoverflow.com/questions/49037742/why-does-it-take-ages-to-install-pandas-on-alpine-linux.
- Alpine Linux and musl have incomplete locale support (gliderlabs/docker-alpine#144 (comment)), so if your code requires juggling LC_ALL and similar settings, the image setup will be non-trivial.
To build it without docker-compose:
docker build -t jjjax/airflow-docker-light -f Dockerfile .
It's easier to use the sample compose file:
cp docker-compose{.sample,}.yml
docker-compose build
# Will run:
# 1. PostgreSQL.
# 2. Airflow Scheduler, which upgrades the db backend.
# 3. Airflow Webserver, which will be available at http://localhost:8080.
docker-compose up
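Once the stack is up, you can sanity-check the webserver from another shell (assuming the default port mapping from the sample compose file):

```sh
# Expect an HTTP response from the Airflow webserver.
curl -I http://localhost:8080
```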
Build Arguments Table:
| Argument | Default | Comment |
|---|---|---|
| AIRFLOW_VERSION | 1.10.9 | This version will be installed at build time. |
| AIRFLOW_HOME | /opt/airflow | If modified, don't forget to sync your docker-compose.yml and related configs. |
| AIRFLOW__CORE__FERNET_KEY | "" | If provided, entrypoint.sh will use the value as is; otherwise, the value from ${AIRFLOW_HOME}/fernet.key will be used. |
| AIRFLOW_DEPS | "" | A comma-separated list of Airflow extras, e.g. "mysql,gcp,hdfs". |
| PYTHON_DEPS | src/requirements.sample.txt | The default file is empty; you may put a custom file into src/ and it will be installed with pip. |
| LINUX_DEPS | bash | A space-separated list of Alpine packages, e.g. "bash gcc make". |
| TIMEZONE | UTC | For example, "Europe/Moscow". |
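For example, a one-off build that overrides a few of these arguments might look like this (the argument values here are purely illustrative):

```sh
docker build -t jjjax/airflow-docker-light \
    --build-arg AIRFLOW_VERSION=1.10.9 \
    --build-arg AIRFLOW_DEPS="postgres,crypto" \
    --build-arg TIMEZONE="Europe/Moscow" \
    -f Dockerfile .
```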
There is almost zero magic in entrypoint.sh, so you may want to keep it. It performs only two actions (see the sketch after this list):
- If the AIRFLOW__CORE__FERNET_KEY env variable is set, it is used as is; otherwise, the value from AIRFLOW_HOME/fernet.key is used. That's why you should share AIRFLOW_HOME via a volume when running multiple Airflow containers.
- If the UPGRADE_DB env variable is set, the "airflow upgradedb" command is executed.
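A minimal sketch of that logic, assuming a POSIX shell entrypoint (the real entrypoint.sh may differ in details such as key generation):

```sh
#!/bin/sh
# Fall back to the shared key file when no key is passed explicitly.
if [ -z "${AIRFLOW__CORE__FERNET_KEY}" ]; then
    export AIRFLOW__CORE__FERNET_KEY="$(cat "${AIRFLOW_HOME}/fernet.key")"
fi

# Optionally migrate the metadata database before starting.
if [ -n "${UPGRADE_DB}" ]; then
    airflow upgradedb
fi

# Hand control over to the requested Airflow command (e.g. scheduler, webserver).
exec airflow "$@"
```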
The build argument table is provided above. Check out the Dockerfile to understand how each argument behaves.
The docker-compose.sample.yml provides a practical example. Note that the scheduler and webserver must either share their AIRFLOW_HOME (to leverage the auto-generated AIRFLOW_HOME/fernet.key), or be given the key explicitly via AIRFLOW__CORE__FERNET_KEY.
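If you go the explicit route, you can generate a key once with the cryptography package (assuming it is installed on your host) and pass the same value to every container. The docker run invocation below assumes the entrypoint forwards its arguments to airflow, as sketched above:

```sh
# Generate a Fernet key and hand the same value to each Airflow service.
FERNET_KEY="$(python -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())')"
docker run -d --rm --name airflow_webserver \
    -e AIRFLOW__CORE__FERNET_KEY="$FERNET_KEY" \
    -p 8080:8080 \
    jjjax/airflow-docker-light webserver
```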
Build the test image, which differs only in PYTHON_DEPS (the requirements will be installed from requirements.test.txt):
docker-compose -f docker-compose.test.yml build
Run tests "service" (target) and stop its dependencies after completion:
docker-compose -f docker-compose.test.yml run --rm airflow_tests
docker-compose -f docker-compose.test.yml down
Build the test image, as in Auto Testing:
docker-compose -f docker-compose.test.yml build
Run the scheduler service (docker-compose will start its dependencies automatically):
# While the containers are running, we can call our tests from the IDE.
docker-compose -f docker-compose.test.yml run --rm --name airflow_scheduler airflow_scheduler
After you've finished experimenting, stop the dependencies:
docker-compose -f docker-compose.test.yml down
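If an IDE isn't handy, you can also run the tests inside the named scheduler container; this assumes your runner is pytest, or whatever requirements.test.txt actually provides:

```sh
# The container name comes from --name airflow_scheduler above.
docker exec airflow_scheduler pytest
```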
I (max.preobrazhensky@gmail.com) made this just for fun: to learn and explore Airflow hands-on. Thanks to Matthieu "Puckel_" Roisil (https://github.com/puckel/docker-airflow/tree/master) for the starting point.
Planned improvements:
- No hardcoded AIRFLOW_HOME (env files and build arguments).
- Decouple from PostgreSQL; it should get by with SQLite and the SequentialExecutor by default.
- Provide an easier way to turn off encryption (AIRFLOW__CORE__FERNET_KEY).