Project to use the Backblaze data set to predict hard drive failures

ZiXian92/cs4225project

Guide to Modifying Code for Various Modelling Methods

Regardless of which attributes are used for modeling, the overall flow should be the same: read in the data, process it into the modeling format, split it into training and testing sets, convert each record to a LabeledPoint object, and pass the training set to the SVM.
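The flow described above could look roughly like the following PySpark sketch. It is not the repository's actual code: the column positions (failure flag in column 4, SMART attributes from column 5 onward), the 70/30 split, and the names `parse_line` and `train_svm` are assumptions made for illustration. The Spark imports are kept inside `train_svm` so the parsing step can be tried without a cluster.

```python
# Assumed Backblaze-style layout: date, serial_number, model, capacity_bytes, failure, smart_*...
LABEL_IDX = 4  # assumption: the "failure" column (0 or 1)


def parse_line(line, feature_idx):
    """Turn one CSV line into (label, features); return None for header or malformed rows."""
    cols = line.split(",")
    try:
        label = float(cols[LABEL_IDX])
        # Treat empty SMART fields as 0.0 (a simplification, not a recommendation)
        features = [float(cols[i]) if cols[i] else 0.0 for i in feature_idx]
    except (ValueError, IndexError):
        return None
    return label, features


def train_svm(sc, path, feature_idx, iterations=100):
    """Read CSVs, parse, split, convert to LabeledPoint, and train an SVM."""
    from pyspark.mllib.classification import SVMWithSGD
    from pyspark.mllib.regression import LabeledPoint

    rows = (sc.textFile(path)
              .map(lambda line: parse_line(line, feature_idx))
              .filter(lambda r: r is not None))
    # Split into training and testing sets (70/30 is an arbitrary choice here)
    train, test = rows.randomSplit([0.7, 0.3], seed=42)
    points = train.map(lambda r: LabeledPoint(r[0], r[1]))
    model = SVMWithSGD.train(points, iterations=iterations)
    return model, test
```

Changing the modelling method then mostly means changing `feature_idx` (which attributes are used) or the parsing step, while the rest of the flow stays the same.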

Dependencies

  • Apache Spark
  • NumPy (required for Spark's SVM to run)
  • YARN (optional)

Other Notes

  • Make sure all the CSV files are in the same directory on HDFS
  • Change the code that reads the CSV files so that it points to the correct directory on HDFS
  • Each code chunk works at a different level of data granularity. Edit the code to use whichever granularity or data subset you consider appropriate
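As a rough illustration of the notes above, the HDFS directory can be kept in one place and the granularity expressed as a grouping key. The path `hdfs:///user/example/backblaze`, the helper names, and the three granularity levels are hypothetical and not taken from the repository; the column order (date, serial number, model) is again an assumption about the CSV layout.

```python
# Hypothetical location; replace with the directory that actually holds the CSV files
HDFS_DATA_DIR = "hdfs:///user/example/backblaze"


def csv_glob(data_dir):
    """Build a glob that matches every CSV file in a single directory."""
    return data_dir.rstrip("/") + "/*.csv"


def granularity_key(cols, level):
    """Pick a grouping key for one parsed row (assumed columns: date, serial_number, model, ...)."""
    date, serial, model = cols[0], cols[1], cols[2]
    if level == "drive_day":   # one record per drive per day
        return (serial, date)
    if level == "drive":       # all days of one drive grouped together
        return serial
    if level == "model":       # all drives of one model grouped together
        return model
    raise ValueError("unknown granularity level: " + level)


# With a SparkContext `sc`, the files could then be read with:
#   lines = sc.textFile(csv_glob(HDFS_DATA_DIR))
```

Keeping the path in a single constant means only one line needs to change when the data moves to a different HDFS directory.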
