Fine tune Computer Vision models on custom dataset then serve the trained model with a REST API.

Cyril9227/computer-vision-REST-API

Object Recognition App

The aim of this project is to highlight the different phases of a deep learning project, from data preparation to serving the final model through an app.

Specifically, the steps covered are:

  • Downloading and preparing the dataset
  • Training an object recognition model on Google Colab, using the Detectron2 framework
  • Extending Detectron2 with custom neural networks
  • Serving the model with Streamlit (https://www.streamlit.io/)
  • (Additional code to serve the model with a very basic REST API is also provided)

Preamble

This project is organized in two folders:

  • The MaskRCNN_finetune folder contains all the deep learning code. Specifically, it contains code to download and extract the well-known balloon dataset, as well as code to extend Detectron2 with new backbone networks (MobileNetV2 and VoVNet-19) and to fine-tune them on our dataset.

  • The REST_API_flask folder contains the code to serve the trained model through an API built with Flask.
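Detectron2 consumes datasets as lists of dicts in its standard dataset format. The sketch below converts one annotation entry in the VIA JSON layout used by the balloon dataset (the same layout handled in the official Detectron2 Colab tutorial) into such a record. It illustrates that assumption and is not the repository's own code: the function name `via_entry_to_record` is hypothetical, and image height and width are passed in rather than read from disk.

```python
import os
from typing import Any, Dict

try:
    # Detectron2's BoxMode is an IntEnum; XYXY_ABS means absolute [x0, y0, x1, y1] pixels.
    from detectron2.structures import BoxMode
    XYXY_ABS = BoxMode.XYXY_ABS
except ImportError:
    XYXY_ABS = 0  # integer value of BoxMode.XYXY_ABS, so the sketch also runs without Detectron2


def via_entry_to_record(entry: Dict[str, Any], image_id: int, height: int,
                        width: int, img_dir: str = ".") -> Dict[str, Any]:
    """Hypothetical helper: one VIA polygon entry -> one Detectron2 dataset dict."""
    regions = entry["regions"]
    if isinstance(regions, dict):  # the balloon JSON stores regions as {"0": {...}, ...}
        regions = list(regions.values())
    annotations = []
    for region in regions:
        shape = region["shape_attributes"]
        px, py = shape["all_points_x"], shape["all_points_y"]
        # Flatten the polygon to [x0, y0, x1, y1, ...], shifting points to pixel centers.
        poly = [c for x, y in zip(px, py) for c in (x + 0.5, y + 0.5)]
        annotations.append({
            "bbox": [min(px), min(py), max(px), max(py)],
            "bbox_mode": XYXY_ABS,
            "segmentation": [poly],
            "category_id": 0,  # single foreground class (balloon)
        })
    return {
        "file_name": os.path.join(img_dir, entry["filename"]),
        "image_id": image_id,
        "height": height,
        "width": width,
        "annotations": annotations,
    }
```

A loader built on this function could then be registered with `DatasetCatalog.register("balloon_train", ...)` and given its class names with `MetadataCatalog.get("balloon_train").set(thing_classes=["balloon"])`; the dataset name is an illustrative assumption.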

This project is designed to run on Google Colab but should be reproducible without (too much) hassle on any Linux machine with a CUDA-enabled GPU.

Please use notebooks/object_recognition.ipynb to run the deep learning code and notebooks/object_recognition_REST_API.ipynb to use the REST API.

The Computer Vision part: Mask R-CNN model

Introduction

In image classification problems, there is usually a single object of interest. For that specific object, we build models to predict whether that object belongs to a specific class. For example, given a picture of an animal, the model should tell you whether it is a cat or a dog.

However, the real world is much more complex. What if there are both cats and dogs in the image? What if we need to know exactly where the dogs are and where the cats are? What if they overlap, for example a dog walking in front of a cat?

Image segmentation techniques such as Mask R-CNN allow us to detect multiple objects, classify them separately and localize them within a single image.

To start with, a good introductory read is: A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN

Retrain the model

Most Mask R-CNN models use ResNet-101 as a backbone, which is a very large network. According to the original paper, inference with this backbone takes roughly 200 ms per image, which is too slow for many applications. We provide code to try out different backbones; a comparison table is available at the end of this document.

Please upload notebooks/object_recognition.ipynb to Google Colab and run the cells to reproduce the results.
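Fine-tuning with Detectron2 mostly comes down to configuration. The following is a minimal sketch, assuming a dataset already registered under the hypothetical name `balloon_train`: it starts from the COCO-pretrained ResNet-50-FPN Mask R-CNN in the Detectron2 model zoo and overrides a few keys through `CfgNode.merge_from_list`. The hyperparameter values are similar to those in the official Detectron2 tutorial and are not necessarily the ones used in this project's notebook; `finetune_opts` and `build_cfg` are hypothetical names.

```python
from typing import Any, List

# Detectron2 model-zoo entry: COCO-pretrained Mask R-CNN, ResNet-50-FPN, 3x schedule.
BASE_CONFIG = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"


def finetune_opts(train_set: str, num_classes: int = 1, max_iter: int = 300,
                  base_lr: float = 0.00025, ims_per_batch: int = 2) -> List[Any]:
    """Alternating KEY, VALUE overrides in the format CfgNode.merge_from_list expects."""
    return [
        "DATASETS.TRAIN", (train_set,),
        "DATASETS.TEST", (),  # no evaluation during training
        "DATALOADER.NUM_WORKERS", 2,
        "SOLVER.IMS_PER_BATCH", ims_per_batch,
        "SOLVER.BASE_LR", base_lr,
        "SOLVER.MAX_ITER", max_iter,
        "MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE", 128,  # sampled RoIs per image
        "MODEL.ROI_HEADS.NUM_CLASSES", num_classes,  # background is not counted
    ]


def build_cfg(train_set: str):
    """Build a fine-tuning config; needs Detectron2 installed (imported lazily)."""
    from detectron2 import model_zoo
    from detectron2.config import get_cfg

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(BASE_CONFIG))
    # Initialise from the COCO-pretrained checkpoint, then fine-tune on the custom data.
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(BASE_CONFIG)
    cfg.merge_from_list(finetune_opts(train_set))
    return cfg
```

Training would then follow the usual `trainer = DefaultTrainer(cfg)`, `trainer.resume_or_load(resume=False)`, `trainer.train()` pattern. Switching to the ResNet-101-FPN baseline means pointing `BASE_CONFIG` at `COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml`; custom backbones such as MobileNetV2 or VoVNet-19 would first have to be registered with Detectron2's `BACKBONE_REGISTRY` and then selected through `MODEL.BACKBONE.NAME`.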

The AI API: Build a simple REST API using Flask

The REST API uses ngrok and Flask and is fairly straightforward in its current state. To use it, please upload notebooks/object_recognition_REST_API.ipynb to Google Colab and follow the cells.
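For orientation, here is a minimal Flask sketch of an upload-and-predict endpoint. The route `/predict`, the form field `image` and the `run_inference` stub are hypothetical placeholders, not the repository's actual API; exposing the server through ngrok, as the notebook does, is left out.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def run_inference(image_bytes: bytes) -> dict:
    """Hypothetical stub: a real version would decode the image and run the loaded model."""
    return {"num_bytes": len(image_bytes), "instances": []}


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a multipart/form-data upload under the (assumed) field name "image".
    upload = request.files.get("image")
    if upload is None or upload.filename == "":
        return jsonify(error="no image uploaded"), 400
    return jsonify(run_inference(upload.read()))
```

Locally, `app.run(port=5000)` starts the development server. In a real deployment the model would be instantiated once at startup, as described below, so that each request only pays for inference.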

When the API is launched, the app first downloads and instantiates the model. You can then:

  1. Upload a local image

(Figure: Input)

  2. Run inference
    1. Predicted masks are displayed in the browser
    2. Predicted masks can be downloaded by the user

(Figure: Output)
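Displaying predicted masks typically means blending them over the input image. A minimal NumPy sketch, assuming the masks arrive as a boolean array of shape (N, H, W); with Detectron2 this is what `outputs["instances"].pred_masks.cpu().numpy()` yields. The function name, color and opacity are arbitrary choices for illustration, and Detectron2's own `Visualizer` is an alternative that also draws boxes and labels.

```python
import numpy as np


def overlay_masks(image: np.ndarray, masks: np.ndarray,
                  color=(255, 0, 0), alpha: float = 0.5) -> np.ndarray:
    """Blend boolean instance masks (N, H, W) onto an RGB uint8 image (H, W, 3)."""
    out = image.astype(np.float32)  # astype copies, so the caller's image is untouched
    covered = masks.any(axis=0)  # (H, W): pixels belonging to at least one instance
    out[covered] = (1.0 - alpha) * out[covered] + alpha * np.asarray(color, dtype=np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)
```

All instances share one color here for brevity; per-instance colors would only require looping over `masks` instead of taking their union.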

Trained Models and Results

Model             Inference time   AP50 (val)   AP50 (test)
ResNet-50-FPN     134 ms           90           84
ResNet-101-FPN    179 ms           95           88
MobileNetV2-FPN   98.3 ms          95           62
VoVNet-19-FPN     95.6 ms          90           85
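The hardware and measurement protocol behind these inference times are not stated. One common protocol, sketched minimally below, is a few warm-up calls followed by the median over repeated runs. It assumes `predict` is any callable (for example a Detectron2 `DefaultPredictor`) and `sync` an optional barrier such as `torch.cuda.synchronize`; the function name is hypothetical.

```python
import statistics
import time
from typing import Any, Callable, Optional


def median_latency_ms(predict: Callable[[Any], Any], sample: Any,
                      warmup: int = 3, runs: int = 20,
                      sync: Optional[Callable[[], None]] = None) -> float:
    """Median wall-clock latency of predict(sample), in milliseconds."""
    for _ in range(warmup):  # absorb one-off costs such as CUDA initialisation
        predict(sample)
        if sync is not None:
            sync()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        if sync is not None:
            sync()  # wait for queued GPU work so it is included in the timing
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

The median rather than the mean keeps occasional stalls (garbage collection, background processes) from dominating the reported figure.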
