JingjingGuo-spark-interview

Project Organization

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── bash               <- Bash provisioning scripts for the Vagrant virtual machine
│   └── vagrant_spark_env.sh
│
├── data
│   ├── samples        <- Sample data provided by Deloitte.
│   ├── assmbled       <- Data assembled from the 3 sample files.
│   └── featurized     <- Encoded and scaled data.
│
├── models             <- Trained models.
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- Makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to assemble provided data files.
│   │   ├── assemble_data.py
│   │   └── Samples.py <- Sample class.
│   │
│   ├── features       <- Scripts to turn assembled data into features for modeling
│   │   ├── featurize_data.py
│   │   └── Features.py <- Features class.
│   │
│   └── models         <- Scripts to train and save models
│       └── model_fitting.py
│
└── Vagrantfile        <- Vagrantfile to spin up a CentOS 7 virtual machine and provision the Spark and Python environment.
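
The `data/featurized` entry above describes "encoded and scaled" data but does not say which encoding or scaling is used. As a rough illustration only, the sketch below assumes one-hot encoding for categorical values and zero-mean, unit-variance scaling for numeric values, and it uses plain Python rather than Spark so it runs anywhere. The function names `one_hot` and `standardize` are hypothetical and are not taken from `featurize_data.py` or `Features.py`.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def one_hot(values: Sequence[str]) -> List[List[int]]:
    """Encode categorical values as 0/1 indicator vectors.

    Assumption: categories are ordered alphabetically; the project may differ.
    """
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [
        [1 if index[value] == i else 0 for i in range(len(categories))]
        for value in values
    ]


def standardize(values: Sequence[float]) -> List[float]:
    """Scale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


# Hypothetical toy columns, not the project's sample data.
encoded = one_hot(["a", "b", "a"])
scaled = standardize([1.0, 2.0, 3.0])
```

In a Spark pipeline the same two steps would normally be expressed with the library's own feature transformers instead of hand-written functions; the sketch only shows what the transformation does to each column.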

Execution Instructions

Step 1 - Environment setup
make install
Step 2 - Log in to CentOS 7
make ssh
Step 3 - Navigate to sync folder
cd spark-model
Step 4 - Assemble Data
make assemble
Step 5 - Model Fitting
make classifier
Step 6 - Exit and Cleanup
Ctrl+D
make clean
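
Steps 4 and 5 above could also be driven from a small script once you are logged in to the VM. The following is a minimal sketch, assuming the `make assemble` and `make classifier` targets behave as described and that the sync folder is `spark-model` relative to the current directory. The helper `run_guest_steps` and its `runner` parameter are hypothetical and not part of the repository.

```python
import subprocess
from typing import Callable, List

# Steps 4 and 5 from the instructions, in the order they must run.
GUEST_STEPS: List[List[str]] = [
    ["make", "assemble"],    # Step 4 - Assemble Data
    ["make", "classifier"],  # Step 5 - Model Fitting
]


def run_guest_steps(
    sync_dir: str = "spark-model",
    runner: Callable[..., object] = subprocess.run,
) -> List[List[str]]:
    """Run each make target from the sync folder, stopping at the first failure.

    `runner` defaults to subprocess.run; passing a stand-in allows a dry run.
    """
    executed: List[List[str]] = []
    for cmd in GUEST_STEPS:
        # With subprocess.run, check=True raises CalledProcessError
        # when make exits with a non-zero status.
        runner(cmd, cwd=sync_dir, check=True)
        executed.append(cmd)
    return executed
```

Calling `run_guest_steps()` inside the VM runs both targets in order. Model fitting is skipped if assembly fails, because the exception from the first command stops the loop.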

Project based on the cookiecutter data science project template. #cookiecutterdatascience
