icoxfog417/airflow-ml-pipeline

Airflow for data analytics pipelines

A research project on building a data analytics pipeline with Airflow.

Research

  1. Introduction to data analytics pipelines
  2. Case study of a data analytics pipeline
  3. Introduction to Apache Airflow
  4. Develop/Test/Deploy workflow
  5. Operations management and testing for Airflow
  6. From Airflow to a managed service

Setup

If needed, set AIRFLOW_HOME to an airflow folder inside the current directory (when it is unset, Airflow defaults to ~/airflow).
This variable must be set whenever you run airflow commands, so we recommend keeping it in a .env file.

mkdir airflow
export AIRFLOW_HOME="$(pwd)/airflow"
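As far as we know, the airflow CLI does not read a .env file by itself; pipenv loads it automatically for `pipenv run` and `pipenv shell`, and python-dotenv (installed in the next step) can load it from Python. The sketch below is a simplified, standard-library stand-in for that loading step, not python-dotenv's actual parser. `parse_dotenv` and `load_dotenv_file` are hypothetical helpers that handle only plain `KEY=VALUE` lines, comments, an optional `export` prefix and surrounding quotes.

```python
import os
from pathlib import Path


def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines (hypothetical, simplified stand-in for python-dotenv)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Accept an optional shell-style "export " prefix.
        if line.startswith("export "):
            line = line[len("export "):].strip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line; ignore it
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_dotenv_file(path: str = ".env") -> None:
    """Copy variables from a .env file into os.environ, keeping values already set."""
    env_file = Path(path)
    if not env_file.is_file():
        return
    for key, value in parse_dotenv(env_file.read_text()).items():
        os.environ.setdefault(key, value)
```

A .env file is not run through a shell, so `$(pwd)` is not expanded there; write an absolute path instead, for example `AIRFLOW_HOME=/your_folder/airflow-ml-exercises/airflow` (placeholder path). Calling `load_dotenv_file(".env")` then makes `os.environ["AIRFLOW_HOME"]` available to the current Python process only.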

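Then, install Airflow without GPL-licensed libraries: setting SLUGIFY_USES_TEXT_UNIDECODE=yes makes python-slugify use text-unidecode instead of the GPL-licensed unidecode.
On Python 3.7, also pin Tenacity to 4.12.0, because older Tenacity releases fail on Python 3.7.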

SLUGIFY_USES_TEXT_UNIDECODE=yes pip install apache-airflow tenacity==4.12.0 python-dotenv --no-binary=python-slugify

If you use pipenv:

PIP_NO_BINARY=python-slugify SLUGIFY_USES_TEXT_UNIDECODE=yes pipenv install apache-airflow tenacity==4.12.0 python-dotenv
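A likely explanation for the Python 3.7 problem (our assumption, not stated above): Python 3.7 made `async` a reserved keyword, and older Tenacity releases had a module named `async`, so importing them is a syntax error. The check below uses only `keyword.iskeyword` and `compile`, and never imports Tenacity; `OLD_STYLE_IMPORT` is an illustrative import line, not copied from Tenacity's source.

```python
import keyword
import sys

# Illustrative import of a module literally named "async" (assumption: older
# Tenacity releases had such a module; this string is not copied from Tenacity).
OLD_STYLE_IMPORT = "from tenacity.async import AsyncRetrying"


def async_is_reserved() -> bool:
    """True when `async` is a reserved keyword, i.e. on Python 3.7 and later."""
    return keyword.iskeyword("async")


def compiles(source: str) -> bool:
    """Return True if source is valid syntax; compile() parses it without running it."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print("Python", sys.version_info[:2])
    print("async is reserved:", async_is_reserved())
    print("old-style import parses:", compiles(OLD_STYLE_IMPORT))
```

On Python 3.7 or later the last line reports that the old-style import does not parse, which is consistent with pinning a newer Tenacity rather than changing Python versions.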

Initialize the database.

airflow initdb
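With the default `sql_alchemy_conn`, Airflow keeps its metadata in a SQLite file at `$AIRFLOW_HOME/airflow.db`. A quick way to confirm the initialization worked is to list that file's tables with Python's built-in `sqlite3` module. `airflow_db_path` and `list_tables` are hypothetical helpers, and the path logic assumes `sql_alchemy_conn` has not been changed.

```python
import os
import sqlite3
from contextlib import closing
from pathlib import Path


def airflow_db_path() -> Path:
    """Default metadata DB location: $AIRFLOW_HOME/airflow.db.

    Assumes sql_alchemy_conn was left at its SQLite default; AIRFLOW_HOME
    falls back to ~/airflow when unset.
    """
    home = os.environ.get("AIRFLOW_HOME", str(Path.home() / "airflow"))
    return Path(home).expanduser() / "airflow.db"


def list_tables(db_path) -> list:
    """Return the sorted table names in a SQLite file, opened read-only."""
    # mode=ro makes a wrong path raise sqlite3.OperationalError instead of
    # silently creating a new, empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Running `print(list_tables(airflow_db_path()))` after the initdb command should show Airflow's tables (for example `dag` and `task_instance`); an OperationalError usually means AIRFLOW_HOME pointed somewhere else when the database was created.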

Next, change the default DAG folder by editing the dags_folder setting in airflow/airflow.cfg so that it points to this repository's airflow_ml folder.

dags_folder = /your_folder/airflow-ml-exercises/airflow_ml
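Editing the file by hand works fine. If you want to script the change (for example in a setup script), a line-based edit keeps the many comments in airflow.cfg intact, whereas writing the file back with `configparser` would drop them. `set_dags_folder` and `update_cfg_file` are hypothetical helpers; they assume `dags_folder` sits in the `[core]` section, as it does in Airflow's default configuration.

```python
import re
from pathlib import Path

SECTION_RE = re.compile(r"\s*\[([^\]]+)\]")
DAGS_FOLDER_RE = re.compile(r"\s*dags_folder\s*=")


def set_dags_folder(cfg_text: str, dags_folder: str) -> str:
    """Return cfg_text with [core] dags_folder set to dags_folder.

    Works line by line so comments and all other settings are preserved.
    """
    out = []
    section = None
    replaced = False
    for line in cfg_text.splitlines(keepends=True):
        header = SECTION_RE.match(line)
        if header:
            section = header.group(1).strip()
        elif section == "core" and DAGS_FOLDER_RE.match(line):
            # Keep the original line ending ("\n", "\r\n" or none).
            ending = line[len(line.rstrip("\r\n")):]
            line = f"dags_folder = {dags_folder}{ending}"
            replaced = True
        out.append(line)
    if not replaced:
        raise ValueError("no dags_folder setting found in the [core] section")
    return "".join(out)


def update_cfg_file(cfg_path: str, dags_folder: str) -> None:
    """Rewrite airflow.cfg in place with the new dags_folder."""
    path = Path(cfg_path)
    path.write_text(set_dags_folder(path.read_text(), dags_folder))
```

Example: `update_cfg_file("airflow/airflow.cfg", "/your_folder/airflow-ml-exercises/airflow_ml")` (placeholder path). As an alternative to editing the file, Airflow also reads settings from `AIRFLOW__{SECTION}__{KEY}` environment variables, so `AIRFLOW__CORE__DAGS_FOLDER` could live in the same .env file as AIRFLOW_HOME.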

Start the web server.

airflow webserver --port 8080
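The webserver can take a few seconds to come up. The hypothetical helper below polls with `socket.create_connection` until something accepts TCP connections on the port; it only shows that the port is open, not that the Airflow UI is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection; success means the port is listening.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)
```

For example, `wait_for_port("localhost", 8080, timeout=60)` returns True once the webserver started above is listening, after which the UI should be reachable at http://localhost:8080.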

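To refresh the list of DAGs, run the following command. It loads every DAG file in dags_folder, so it is also a quick way to check that they import without errors.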

python -c "from airflow.models import DagBag; d = DagBag();"
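If a DAG still does not appear, one thing to check is whether DagBag considers the file at all. As far as we know, Airflow 1.10's DagBag (in its default safe mode) only imports `.py` files whose raw contents include both the bytes `DAG` and `airflow`, and silently skips the rest. The sketch below reimplements that heuristic with the standard library; `looks_like_dag_file` and `candidate_dag_files` are hypothetical helpers, and they ignore `.airflowignore` and zipped DAGs.

```python
from pathlib import Path


def looks_like_dag_file(content: bytes) -> bool:
    """Assumed safe-mode check: the raw bytes must contain both b"DAG" and b"airflow"."""
    return b"DAG" in content and b"airflow" in content


def candidate_dag_files(dags_folder: str) -> list:
    """Return .py files under dags_folder that pass the assumed safe-mode check.

    Simplification: .airflowignore patterns and zipped DAGs are not handled.
    """
    return [
        path
        for path in sorted(Path(dags_folder).rglob("*.py"))
        if looks_like_dag_file(path.read_bytes())
    ]
```

For example, `for path in candidate_dag_files("/your_folder/airflow-ml-exercises/airflow_ml"): print(path)` lists the files DagBag would try to import; a DAG file missing from that list will never show up in the UI.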

About

A repository for learning machine learning with Airflow.
