Datasets and example code for Lesson 4 in Oracle Academy's Data Science Bootcamp. This lesson includes the following scripts and datasets:
- ncdc_parse.hql and ncdc_parser.py provide a HiveQL script and a Python script for parsing the NCDC data in the data folder
- tree_building.R provides a script for building a decision tree in R
- weather_ooze provides a set of Hive and Pig+Weka scripts for deploying an Oozie workflow for model evaluation
- olh provides a loading script for Oracle Loader for Hadoop
- pmml provides the complete source for deploying a model saved as PMML via Cascading (requires Gradle and Cascading to build)
- data provides 3 years of weather station data for California
- weather_sample and weather_sample2 provide samples for tree_building.R