EECE5645 Parallel Processing Data Analytics Final Project
Professor Ioannidis
Log into the Discovery cluster and run the following commands with a cluster node checked out.
conda create --name final python=3.7
conda activate final
conda install tensorflow-gpu
pip3 install tensorflow scikit-learn --user
pip3 install -e elephas/ --user
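After the install steps, it can be worth confirming that the interpreter in the `final` environment can locate the packages. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the import names `tensorflow`, `sklearn`, and `elephas` are assumed to match what the commands above install.

```python
import importlib.util

# Import names assumed from the install commands above.
REQUIRED = ["tensorflow", "sklearn", "elephas"]


def missing_packages(names):
    """Return the names that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required packages were found.")
```

This only checks that the packages can be found, not that TensorFlow can see a GPU.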
Download the Bitcoin dataset, extract the zip, place the CSV in the data/raw folder, and rename the file to bitstampUSD.csv. Then run:
python src/make_dataset.py
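If make_dataset.py fails, a likely first thing to check is whether the CSV is where the instructions expect it. The sketch below assumes it is run from the project root; the helper name `dataset_ready` is hypothetical and not part of the project code.

```python
from pathlib import Path

# Expected location, taken from the instructions above.
EXPECTED_CSV = Path("data") / "raw" / "bitstampUSD.csv"


def dataset_ready(project_root="."):
    """Return True if the renamed CSV exists under project_root and is non-empty."""
    csv_path = Path(project_root) / EXPECTED_CSV
    return csv_path.is_file() and csv_path.stat().st_size > 0


if dataset_ready():
    print(f"Found {EXPECTED_CSV}")
else:
    print(f"Missing or empty: {EXPECTED_CSV}")
```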
PySpark
Discovery Keras
Discovery GPU
Run src/train_random_forest_ml_lib.py with the following command:
spark-submit --master local[40] --executor-memory 100G --driver-memory 100G train_random_forest_ml_lib.py
The same command also works for train_gradient_boost_mllib.py; substitute the script name.
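Since both training scripts share the same spark-submit options, the command can be generated rather than retyped. This is a minimal sketch that only prints the commands. The helper name `spark_submit_command` is hypothetical, and the option values are copied from the command above.

```python
import shlex

# Values copied from the spark-submit command above.
MASTER = "local[40]"
EXECUTOR_MEMORY = "100G"
DRIVER_MEMORY = "100G"

SCRIPTS = ["train_random_forest_ml_lib.py", "train_gradient_boost_mllib.py"]


def spark_submit_command(script):
    """Build the spark-submit argument list for one training script."""
    return [
        "spark-submit",
        "--master", MASTER,
        "--executor-memory", EXECUTOR_MEMORY,
        "--driver-memory", DRIVER_MEMORY,
        script,
    ]


# Dry run: print each command, shell-quoted, without launching Spark.
for script in SCRIPTS:
    print(" ".join(shlex.quote(arg) for arg in spark_submit_command(script)))
```

To launch a job instead of printing it, the argument list could be passed to `subprocess.run(cmd, check=True)` from the directory that contains the scripts.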