├── LICENSE
├── Makefile <- Makefile with commands like `make assemble` or `make classifier`
├── README.md <- The top-level README for developers using this project.
├── bash <- Bash provisioning scripts for the Vagrant virtual machine.
│ └── vagrant_spark_env.sh
│
├── data
│ ├── samples <- Sample data provided by Deloitte.
│ ├── assmbled <- Data assembled from the 3 sample files.
│ └── featurized <- Encoded and scaled data.
│
├── models <- Trained models.
│
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
│
├── setup.py <- Makes project pip installable (pip install -e .) so src can be imported
├── src <- Source code for use in this project.
│ ├── __init__.py <- Makes src a Python module
│ │
│ ├── data <- Scripts to assemble provided data files.
│ │ ├── assemble_data.py
│ │ └── Samples.py <- Sample class.
│ │
│ ├── features <- Scripts to turn assembled data into features for modeling
│ │ ├── featurize_data.py
│ │ └── Features.py <- Features class.
│ │
│ └── models <- Scripts to train and save models
│ └── model_fitting.py
│
└── vagrantfile <- Vagrantfile that spins up a CentOS 7 virtual machine and provisions the Spark and Python environment.
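
The tree above implies a staged pipeline: sample data is assembled, the assembled data is featurized, and trained models are saved under `models`. As a minimal sketch of how a script might resolve those directories relative to the project root, assuming nothing about how the repository's own scripts locate their files (the `project_paths` helper is hypothetical and not part of the project):

```python
from pathlib import Path


def project_paths(root="."):
    """Return the data and model directories named in the project tree.

    Hypothetical helper for illustration only; it is not part of the
    repository and does not create or check any directories.
    """
    root = Path(root)
    data = root / "data"
    return {
        "samples": data / "samples",        # sample data as provided
        "assembled": data / "assmbled",     # directory name spelled as in the tree
        "featurized": data / "featurized",  # encoded and scaled data
        "models": root / "models",          # trained models
    }


if __name__ == "__main__":
    # Print each stage with its directory, in pipeline order.
    for stage, path in project_paths().items():
        print(f"{stage:<11} {path}")
```

Keeping the paths in one place like this means a renamed directory only needs to change once; whether the actual scripts do so is not stated here.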
make install
make ssh
cd spark-model
make assemble
make classifier
ctrl+D
make clean
Project based on the cookiecutter data science project template. #cookiecutterdatascience