Guide to Modifying Code for Various Modelling Methods

Regardless of which attributes are used for modelling, the overall flow should be the same: read in the data, process it into the modelling format, split it into training and testing sets, convert each record to a LabeledPoint object, and pass the result to the SVM.
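The flow above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: it assumes Spark's RDD-based MLlib API (`SVMWithSGD`, `LabeledPoint`), header-free CSV rows whose fields are all numeric, and a binary label encoded as 0 or 1. The names `parse_row` and `train_svm`, the 70/30 split, and the default label column are hypothetical choices made for the example.

```python
def parse_row(line, label_index=0):
    """Turn one CSV line into (label, features); assumes all fields are numeric."""
    values = [float(v) for v in line.split(",")]
    label = values[label_index]
    features = values[:label_index] + values[label_index + 1:]
    return label, features


def train_svm(sc, csv_dir, iterations=100, seed=42):
    """Read CSVs from csv_dir, split 70/30, train an SVM, return (model, test_error)."""
    # Imported here so parse_row can be used without Spark installed.
    from pyspark.mllib.classification import SVMWithSGD
    from pyspark.mllib.regression import LabeledPoint

    # Read in the data and process it into the modelling format.
    rows = sc.textFile(csv_dir).map(parse_row)
    # Convert each (label, features) pair to a LabeledPoint.
    points = rows.map(lambda lf: LabeledPoint(lf[0], lf[1]))
    # Split into training and testing sets (the 70/30 ratio is an assumption).
    train, test = points.randomSplit([0.7, 0.3], seed=seed)
    # Pass the training set to the SVM.
    model = SVMWithSGD.train(train, iterations=iterations)
    # Fraction of test points whose prediction differs from the label.
    pairs = test.map(lambda p: (p.label, model.predict(p.features)))
    wrong = pairs.filter(lambda lp: lp[0] != lp[1]).count()
    return model, wrong / float(max(test.count(), 1))
```

Only `parse_row` should need to change when a different set of attributes is used; the remaining steps stay the same.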

Dependencies

Apache Spark
NumPy (required for Spark's SVM to run)
YARN (optional)

Other Notes

Make sure all the CSV files are in the same directory on HDFS.
Change the code that reads the CSV files so that it points to the correct directory on HDFS.
Each code chunk works at a different level of data granularity. Edit the code to use the granularity and attribute set you consider appropriate.
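One way to keep the HDFS location easy to change is to define it once and build the read path from it. The snippet below is only a sketch: `CSV_DIR` is a hypothetical placeholder, and `csv_glob` is an illustrative helper, not something defined in this repository.

```python
import posixpath

# Hypothetical location; replace with the HDFS directory that holds your CSV files.
CSV_DIR = "hdfs:///user/example/csv"


def csv_glob(directory, pattern="*.csv"):
    """Build a glob that matches every CSV file directly inside the directory."""
    return posixpath.join(directory.rstrip("/"), pattern)
```

`SparkContext.textFile` accepts glob patterns, so `sc.textFile(csv_glob(CSV_DIR))` would read all the CSV files in that directory into a single RDD.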

ZiXian92/cs4225project
