
HannahhoHe/Fear-ReFactor-Mask-R-CNN-Transfer-Learning


What is Fear ReFactor?

Fear ReFactor is a web app that masks phobia objects in YouTube videos. Currently four objects are supported: 'clown', 'dog', 'bird', and 'teddy bear'. Note that Fear ReFactor requires a GPU to run properly. If you prefer to run this Streamlit app on your local computer, use this code and don't forget to enter your email login credentials (lines 526 and 535). Otherwise, the Streamlit web app is available here, deployed on Amazon EC2 (p2 instance). Please email me at Dr.HeHannah@gmail.com if the port is not open.

This repo focuses on building a model to detect and mask ALL 'clown' objects throughout a YouTube video. To achieve this, I performed transfer learning with matterport/Mask_RCNN. Below is a clown video from YouTube after masking.

Fear ReFactor Workflow

Fear ReFactor takes every frame of the video, runs it through the Mask R-CNN models built in this repo, and reconstructs the processed frames into a new video stored in an Amazon S3 bucket. In every processed video the clown objects should be completely masked. This code includes the process of parsing video and audio frames, and of constructing and playing a video.
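The per-frame masking step can be sketched as below. This is an illustrative snippet, not the app's actual code: `apply_mask` and `mask_frames` are hypothetical names, and in the real pipeline the frames would come from a video decoder (e.g. OpenCV's `VideoCapture`) and be re-encoded before upload to S3.

```python
import numpy as np

def apply_mask(frame, mask, color=(0, 0, 0)):
    """Recolor every pixel covered by a boolean instance mask.

    frame: HxWx3 uint8 image; mask: HxW boolean array, as produced
    per instance by a Mask R-CNN model.
    """
    out = frame.copy()
    out[mask] = color  # paint over the detected object
    return out

def mask_frames(frames, masks, color=(0, 0, 0)):
    """Apply one mask per frame; the masked frames would then be
    reassembled into a video (e.g. via cv2.VideoWriter)."""
    return [apply_mask(f, m, color) for f, m in zip(frames, masks)]
```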

Building up Mask R-CNN models

To prepare 1000+ training images, I scraped both Google static images and YouTube video frames. This code shows you how to use Selenium and headless Chrome to scrape large amounts of image data on AWS EC2 [the original code was shared by Fabian Bosler]. Image annotation was performed with LabelImg, which generated XML files in PASCAL VOC format.
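The PASCAL VOC annotations that LabelImg writes can be read back with the standard library alone; a minimal sketch (the function name `parse_voc` is illustrative, but the `object`/`name`/`bndbox` element layout is the standard VOC format):

```python
import xml.etree.ElementTree as ET

def parse_voc(xml_string):
    """Extract (label, xmin, ymin, xmax, ymax) tuples from a PASCAL VOC
    annotation string, e.g. one produced by LabelImg."""
    root = ET.fromstring(xml_string)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((
            label,
            int(bb.findtext("xmin")), int(bb.findtext("ymin")),
            int(bb.findtext("xmax")), int(bb.findtext("ymax")),
        ))
    return boxes
```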

Since a clown is a complicated object that shares features both with a person and with non-human but colorful objects, pixel-level instance segmentation of clowns can be expected to require a huge amount of training data. To minimize training while still getting a good clown-centered mask, I implemented two Mask R-CNN models - one trained by transfer learning to distinguish 'clown face' from non-clown objects, and one using the pre-trained COCO model - to maximize the use of the MS COCO masks. The transfer-learning code is available here. After training, this code can test models from different epochs and generate batch images to visualize the bounding boxes. Statistical analysis and more plotting show that a ~100% recall rate is achievable by lowering the detection minimum confidence. A non-Streamlit standalone script is also available here.
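The two-model combination and the recall/confidence trade-off can be sketched like this. The result dictionaries mimic the shape of matterport Mask_RCNN's `model.detect` output (`'masks'` as an HxWxN boolean array, `'scores'` as length N), but `union_masks` and the threshold value are assumptions for illustration, not the repo's actual code:

```python
import numpy as np

def union_masks(result_a, result_b, min_confidence=0.5):
    """Union the instance masks from two Mask R-CNN results, keeping
    only detections scoring at or above min_confidence.

    Lowering min_confidence keeps more detections, trading precision
    for recall - which is why a low threshold can push recall to ~100%.
    """
    h, w = result_a["masks"].shape[:2]
    combined = np.zeros((h, w), dtype=bool)
    for r in (result_a, result_b):
        keep = r["scores"] >= min_confidence
        if keep.any():
            # Collapse the kept instances into one binary mask.
            combined |= r["masks"][:, :, keep].any(axis=2)
    return combined
```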

About

Insight Data Science Project
