
Chronos&Quartz:
A reverse image search engine for watches

Motivation and Overview

Reverse image search is a growing field that is widely expected to improve the way we search online in the coming years. Image-based features can serve either as an alternative to or as additional signals alongside text-based ones to improve the efficacy of search engines. Chronos&Quartz is my attempt at creating a reverse image search engine for watches. The goal of my project was to use a picture to find similar men's wrist watches on Amazon.

The results can be viewed on my app: 54.174.217.236:8080

The Process

  • Web Scraping:
I started out by scraping over 5,000 images and their associated metadata from Amazon using BeautifulSoup and urllib. The images were stored locally, while the metadata was stored in MongoDB.

  • Image Featurization and PCA:
I used OpenCV 3.0 in Python for my featurization. After experimenting with several techniques, I decided to use edge detection, thresholding, and color histograms to create my feature space, which contained around 121,000 features by the end of the process. The color features (around 1,500) were dwarfed by the features from the other two techniques, which focus more on the shape and texture of the watch. Although around 3,600 PCA components captured 90% of the variance in the feature space, I got better results with a much lower number, since retaining 90% of the variance in this dataset is akin to overfitting. After multiple experiments I finally settled on 1,500 features.

  • Modeling and Evaluation:
I treated this as a similarity problem and tried multiple similarity metrics. I got my best results with cosine-similarity-based k-nearest neighbors. Since this is an image-based unsupervised learning problem, no machine can match humans at evaluating the final results, which is why I leave that judgment in the users' hands. The results seem to do a good job of detecting similar watches.

  • Web App:
    A web app was created that takes an image URL and recommends similar watches.
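The featurize-then-rank core of the process above can be sketched in a few lines of numpy. This is an illustrative stand-in, not the project's actual code: the real pipeline uses OpenCV (cv2.Canny, cv2.threshold, cv2.calcHist) for the features and scikit-learn for PCA and nearest neighbors, and the PCA step is omitted here for brevity.

```python
import numpy as np

def featurize(img):
    """Illustrative stand-in for the OpenCV featurization.

    `img` is an H x W x 3 uint8 array; edge map, binary threshold, and
    per-channel color histograms are concatenated into one flat vector.
    """
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)
    edges = (np.hypot(gx, gy) > 30).astype(np.float32).ravel()   # crude edge map
    thresh = (gray > 127).astype(np.float32).ravel()             # binary threshold
    hist = np.concatenate(                                       # color histograms
        [np.histogram(img[..., c], bins=32, range=(0, 256))[0] for c in range(3)]
    ).astype(np.float32)
    return np.concatenate([edges, thresh, hist])

def cosine_knn(query_vec, feature_matrix, k=5):
    """Indices of the k rows of `feature_matrix` most cosine-similar to `query_vec`."""
    q = query_vec / np.linalg.norm(query_vec)
    f = feature_matrix / np.linalg.norm(feature_matrix, axis=1, keepdims=True)
    return np.argsort(-(f @ q))[:k]  # highest similarity first

# Toy run: three tiny uniform "images", queried with a copy of image 0.
imgs = [np.full((32, 32, 3), v, dtype=np.uint8) for v in (10, 120, 240)]
feats = np.stack([featurize(im) for im in imgs])
print(cosine_knn(featurize(imgs[0]), feats, k=1))  # [0]
```

The same ranking can be obtained with scikit-learn's NearestNeighbors using the cosine metric; the numpy version just makes the metric explicit.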

Code and Reproducing Results

You will need to start by installing OpenCV on your system. The following link should help: http://stackoverflow.com/a/27650299

The following directory structure is needed to reproduce the results:

Working Directory
├── Data
│   ├── 0
│   │   └── 0.jpg
│   ├── 1
│   :   └── 1.jpg
│   :
│   └── Image_Scraping.ipynb
├── Image_processing.py
├── Model.py
├── Features.py
└── Images.json

  • Scraping: The scraping code is in Image_Scraping.ipynb. Running the notebook once will download around 5,000 images into the current working directory; the process may take one to two hours depending on various factors. It should be run from inside the Data folder and requires a MongoDB connection via PyMongo, to which the metadata is dumped. Images.json, a MongoDB dump created by executing Model.py, is used to derive the results in the absence of MongoDB.
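The Images.json fallback amounts to a plain JSON dump of the metadata records. As a minimal sketch (the field names below are assumptions, not the project's actual schema), writing and reading it with the standard library looks like this:

```python
import json

# Illustrative metadata records in the shape the scraper stores in MongoDB.
records = [
    {"image_id": 0, "title": "Example chronograph", "price": "$199.99"},
    {"image_id": 1, "title": "Example diver", "price": "$149.00"},
]

# Dump to Images.json so the model can run without a live MongoDB connection.
with open("Images.json", "w") as f:
    json.dump(records, f)

# Reload the dump the way Model.py would in the absence of MongoDB.
with open("Images.json") as f:
    loaded = json.load(f)

print(loaded[0]["title"])  # Example chronograph
```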

  • Image model: The code for the image model consists of three files:
    * Features.py: takes a single image and derives its features
    * Image_processing.py: creates a NumPy array from all the images by calling Features.py on each image
    * Model.py: takes the array generated by Image_processing.py, uses it to fit the model, and then presents the results

Just run Model.py from your working directory. This will take care of creating your dataset and training your model. It will also save pickled versions of a number of important objects in your working directory. The process may take some time (~20-30 minutes) depending on your machine's capabilities.
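Persisting the fitted objects is plain pickling. A minimal sketch (the object and filename here are illustrative, not the actual objects Model.py saves):

```python
import pickle

# Any picklable object — a fitted PCA, a feature matrix, a lookup table —
# can be saved and restored the same way.
model = {"n_features": 1500, "metric": "cosine"}

with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later runs (and the web app) reload the object instead of refitting.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored["metric"])  # cosine
```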

  • Web app: The web app is run by starting app.py. Please ensure that it is in the same folder as the modeling files above, along with Query_image.py and the templates folder, since it uses multiple functions from Features.py and Model.py. The pickled models generated earlier should also be in that directory.
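A minimal sketch of what the query endpoint in app.py might look like (the route name, form field, and response shape are assumptions, not the project's actual code):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/query", methods=["POST"])
def query():
    # The client posts an image URL; the real app would download the image,
    # featurize it via Features.py, project it with the pickled PCA, and
    # rank neighbours via Model.py before rendering results.
    image_url = request.form.get("image_url", "")
    return jsonify({"query": image_url, "matches": []})

# Serve with e.g. app.run(host="0.0.0.0", port=8080) as in the hosted demo.
```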

Tools and Packages used

  • BeautifulSoup and urllib for Webscraping
  • MongoDB, PyMongo, os, and json for storing scraped data
  • OpenCV, NumPy, Pandas, SciPy, and matplotlib for EDA and image featurization
  • scikit-learn for Modeling and PCA
  • Flask and AWS (S3 and EC2) for creating and hosting the web app

Future Work:

Currently my model works under the assumption that the queried image is aligned like the dataset images, i.e., with a white background and the watch placed upright and centered. Here are some other concepts and ideas I would like to explore going forward to add robustness to my model:
1) SIFT (Scale-Invariant Feature Transform) and SURF (Speeded-Up Robust Features)
2) Supplementing images with a text-based model (watch metadata)
3) Structural Similarity (SSIM) index
4) Neural networks

References

  1. pyimagesearch.com
  2. https://github.com/JapneetSingh/dimensionality-reduction
  3. https://github.com/nateberman/Python-WebImageScraper
  4. http://goo.gl/EoAAFU
  5. http://www.kevinjing.com/jing_pami.pdf

About

Zipfian final project
