- Pandas is used for general data wrangling.
- Data can be read directly from a text file into a pandas DataFrame, or via an SQL database.
- Regression and classification machine learning tasks are performed using scikit-learn.
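
The two input routes mentioned above (text file straight into pandas, or via an SQL database) might look roughly like the sketch below. This is not the code in `libClean/`; the function names, the table name `samples`, the sample data, and the choice of an in-memory SQLite database are all assumptions made for illustration.

```python
import io
import sqlite3
from contextlib import closing

import pandas as pd

# Hypothetical sample data standing in for a real text file.
CSV_TEXT = """x,y,label
1.0,2.1,0
2.0,3.9,0
3.0,6.2,1
4.0,8.1,1
"""


def read_text(source):
    """Read delimited text directly into a pandas DataFrame."""
    return pd.read_csv(source)


def read_via_sql(source, table="samples"):
    """Load the text into an SQLite table, then query it back as a DataFrame.

    An in-memory SQLite database is assumed here purely for the example.
    """
    frame = pd.read_csv(source)
    with closing(sqlite3.connect(":memory:")) as conn:
        frame.to_sql(table, conn, index=False)
        return pd.read_sql_query(f'SELECT * FROM "{table}"', conn)


df_direct = read_text(io.StringIO(CSV_TEXT))
df_sql = read_via_sql(io.StringIO(CSV_TEXT))
```

Both routes end in a DataFrame with the same columns, so the downstream cleaning and modelling steps need not care which one was used.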
- process_data.in - example input deck file containing the input parameters
- process_data.py - main program
- README.md - this README file
- run - script to run the main program
- libClean/ - library containing the subroutines that read and clean the data
  - clean.py - source to read data directly into a pandas DataFrame
  - clean_sql.py - source to read data into an SQL database
- libInputDeck/ - library containing the source to process the input deck
  - input_deck.py - source to process the input deck
- libMachineLearning/ - library containing the supervised machine learning tools
  - classification.py - classification-specific class
  - machine_learning.py - general base class
  - regression.py - regression-specific class
- libVisualisation/ - library containing the subroutines for visualising the output
  - visualisation.py - source to produce standard matplotlib line plots
  - visualisation_sb.py - source to produce seaborn correlation and pair plots
- libTheano/ - library of neural networks implemented using Theano; the source code needs to be revised to integrate with the libraries above
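
The layout of `libMachineLearning/` (a general base class plus regression-specific and classification-specific classes) suggests a structure along the lines of the sketch below. The class names, constructor arguments, and the choice of `LinearRegression` and `LogisticRegression` are assumptions for illustration, not the repository's actual API; the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split


class MachineLearning:
    """Hypothetical base class: holds a train/test split and an estimator."""

    def __init__(self, estimator, X, y, test_size=0.25, seed=0):
        self.estimator = estimator
        (self.X_train, self.X_test,
         self.y_train, self.y_test) = train_test_split(
            X, y, test_size=test_size, random_state=seed)

    def fit(self):
        self.estimator.fit(self.X_train, self.y_train)
        return self

    def score(self):
        # scikit-learn's default score: R^2 for regressors,
        # mean accuracy for classifiers.
        return self.estimator.score(self.X_test, self.y_test)


class Regression(MachineLearning):
    """Hypothetical regression-specific class."""

    def __init__(self, X, y, **kwargs):
        super().__init__(LinearRegression(), X, y, **kwargs)


class Classification(MachineLearning):
    """Hypothetical classification-specific class."""

    def __init__(self, X, y, **kwargs):
        super().__init__(LogisticRegression(), X, y, **kwargs)


# Synthetic data: a noise-free linear target and a linearly separable label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y_reg = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0
y_clf = (X[:, 0] + X[:, 1] > 0).astype(int)

reg_score = Regression(X, y_reg).fit().score()
clf_score = Classification(X, y_clf).fit().score()
```

Keeping the split and scoring logic in the base class means each subclass only has to choose its estimator.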
- Add pandas and SQL to a master database class, with additional options for MongoDB, Hadoop, and Spark.
- Integrate the Theano neural network source into the machine learning library.
- Upload data, and add examples and associated results.
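
One possible shape for the master database class in the first to-do item is a common interface with one backend per data source. The sketch below covers only the pandas and SQL backends; every name in it (`Database`, `PandasDatabase`, `SQLDatabase`, `make_database`, the `BACKENDS` registry) is hypothetical, and an in-memory SQLite database is assumed for the SQL case. MongoDB, Hadoop, or Spark backends would be further subclasses and are not sketched here.

```python
import io
import sqlite3
from abc import ABC, abstractmethod

import pandas as pd


class Database(ABC):
    """Hypothetical master interface: every backend returns a DataFrame."""

    @abstractmethod
    def load(self, source):
        """Read `source` and return its contents as a pandas DataFrame."""


class PandasDatabase(Database):
    """Reads delimited text straight into pandas."""

    def load(self, source):
        return pd.read_csv(source)


class SQLDatabase(Database):
    """Stages the data in an SQLite table, then queries it back."""

    def __init__(self, table="data"):
        self.table = table
        self.conn = sqlite3.connect(":memory:")  # assumption: in-memory DB

    def load(self, source):
        pd.read_csv(source).to_sql(
            self.table, self.conn, index=False, if_exists="replace")
        return pd.read_sql_query(f'SELECT * FROM "{self.table}"', self.conn)


# Registry of available backends; new ones are added here.
BACKENDS = {"pandas": PandasDatabase, "sql": SQLDatabase}


def make_database(kind, **kwargs):
    """Return a backend instance for the given kind."""
    if kind not in BACKENDS:
        raise ValueError(f"unknown backend: {kind!r}")
    return BACKENDS[kind](**kwargs)


text = "a,b\n1,2\n3,4\n"
frames = {kind: make_database(kind).load(io.StringIO(text))
          for kind in BACKENDS}
```

With this arrangement the calling code selects a backend by name, for example from an input deck parameter, and the rest of the pipeline works on the returned DataFrame.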