Plankton images were classified by extracting global features (Haralick, Zernike, binary pattern, image size and ratio) and local features (SURF). The code follows a pipelined process of preprocessing, feature extraction, feature selection and model evaluation.
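
As a rough illustration of the simplest global features named above, the sketch below computes an image size and a ratio and prepends them to a feature vector. It is not taken from the repository: the function names are hypothetical, "size" is assumed to mean pixel count, and "ratio" is assumed to mean width divided by height. The image is a plain list of rows so the example needs no third-party libraries.

```python
def size_and_ratio(image):
    """Return (pixel count, width / height) for a 2D image given as a list of rows.

    Assumption: "image_size" means the pixel count and "ratio" means the
    aspect ratio; the repository may define them differently.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if height == 0 or width == 0:
        raise ValueError("image must have at least one row and one column")
    return width * height, width / height


def build_feature_vector(image, other_global_features):
    """Concatenate size and ratio with other, already computed global features."""
    size, ratio = size_and_ratio(image)
    return [float(size), ratio] + list(other_global_features)


# A tiny 2 x 4 "image"; the two trailing values stand in for texture features.
example = [[0, 255, 0, 0],
           [0, 255, 255, 0]]
vector = build_feature_vector(example, [0.5, 0.25])
```

Plankton images vary widely in size and elongation across species, which is presumably why such cheap shape descriptors are useful next to the texture and SURF features.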
Note that the train and test images are not included in this repo due to storage limitations.
Visit https://www.kaggle.com/c/1stdsbowl-in-class/data to download the data.
The code for classifying plankton species consists of the following files:
Script that preprocesses the images to make them suitable for feature extraction.
Script that will extract SURF features for each image.
Script that will extract global features for each image.
Script that will train a model based on the train images and features.
Script that evaluates the model on the test data and can create a submission for Kaggle.
Script that visualizes some evaluation metrics of the models.
The files are structured as a pipeline. Run them in the following order; each step reads the output of the previous one:
- Run pre.py (set test=False) -> input: image paths, output: preprocess.pkl
- Run surf.py -> input: preprocess.pkl, output: surf.pkl
- Run features.py -> input: surf.pkl, output: features.pkl
- Run training.py -> input: features.pkl, output: model.pkl
- Run pre.py (set test=True) -> input: image paths, output: preprocess_test.pkl
- Run surf.py -> input: preprocess_test.pkl, output: surf_test.pkl
- Run features.py -> input: surf_test.pkl, output: features_test.pkl
- Run test.py -> input: features.pkl, features_test.pkl, output: submission.csv
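
The hand-off between steps can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes the `.pkl` files are ordinary Python pickles, and the helper names (`next_stage`, `save_stage_output`, `load_stage_output`) are hypothetical. Only the script and file names come from the list above.

```python
import pickle
import tempfile
from pathlib import Path

# (script, input file, output file) for the training half of the pipeline.
# pre.py reads image paths rather than a pickle, so its input is None here.
TRAIN_STAGES = [
    ("pre.py", None, "preprocess.pkl"),
    ("surf.py", "preprocess.pkl", "surf.pkl"),
    ("features.py", "surf.pkl", "features.pkl"),
    ("training.py", "features.pkl", "model.pkl"),
]


def next_stage(stages, workdir):
    """Return the first script whose output file is missing, or None if all exist."""
    for script, _input_name, output_name in stages:
        if not (Path(workdir) / output_name).exists():
            return script
    return None


def save_stage_output(obj, workdir, name):
    """Pickle a stage result so the following script can load it."""
    with open(Path(workdir) / name, "wb") as fh:
        pickle.dump(obj, fh)


def load_stage_output(workdir, name):
    """Load the pickled result of an earlier stage."""
    with open(Path(workdir) / name, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    first = next_stage(TRAIN_STAGES, tmp)            # nothing has run yet
    save_stage_output({"images": []}, tmp, "preprocess.pkl")
    second = next_stage(TRAIN_STAGES, tmp)           # pre.py output now exists
    roundtrip = load_stage_output(tmp, "preprocess.pkl")
```

Checking for existing output files like this makes it possible to resume after a failed step without repeating the earlier, slower ones. The test half of the pipeline could be described with a second stage list in the same way.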