Research project to build a data analytics pipeline with Apache Airflow.
- Introduction of data analytics pipeline
- Case study of data analytics pipeline
- Introduction to Apache Airflow
- Develop/Test/Deploy workflow
- Operation management and test for Airflow
- From Airflow to managed service
Set AIRFLOW_HOME to the current folder (if you need to). This setting is required whenever you execute airflow commands (we recommend keeping it in a .env file).
mkdir airflow
export AIRFLOW_HOME="$(pwd)/airflow"
Then, install Airflow (without GPL libraries). Note: to work around a known problem on Python 3.7, tenacity is pinned to 4.12.0.
SLUGIFY_USES_TEXT_UNIDECODE=yes pip install apache-airflow tenacity==4.12.0 python-dotenv --no-binary=python-slugify
When using pipenv:
PIP_NO_BINARY=python-slugify SLUGIFY_USES_TEXT_UNIDECODE=yes pipenv install apache-airflow tenacity==4.12.0 python-dotenv
Initialize the database.
airflow initdb
Now we change the default DAG folder, so edit the dags_folder setting in airflow/airflow.cfg:
dags_folder = /your_folder/airflow-ml-exercises/airflow_ml
Run the web server.
airflow webserver --port 8080
If you want to refresh the list of DAGs, execute the following command.
python -c "from airflow.models import DagBag; d = DagBag();"