forked from ramprs/grad-cam

Gradient based Localization and Visualization

deshraj/grad-cam

 
 

Grad-CAM: Gradient-weighted Class Activation Mapping

Overview

Usage

Download the Caffe models and prototxt files for VGG-16, VGG-19, and AlexNet by running sh models/download_models.sh.
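Before running the demos, it can help to confirm that the download actually produced the files the scripts expect. The following is a minimal sketch, assuming the download script places the files under models/ with the default names listed in the options below; check_models is a hypothetical helper, not part of the repository.

```shell
#!/bin/sh
# Sketch: confirm the default model files exist before running the demos.
# Assumption: download_models.sh stores files under models/ with these names.

# Print each missing path; return non-zero if any file is absent.
check_models() {
  missing=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

if check_models \
    models/VGG_ILSVRC_16_layers_deploy.prototxt \
    models/VGG_ILSVRC_16_layers.caffemodel; then
  echo "models present"
else
  echo "run: sh models/download_models.sh"
fi
```

The same helper can be pointed at the VGG-19 or AlexNet files if those are the models being used.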

Classification

th classification.lua -input_image_path images/cat_dog.jpg -label 243 -gpuid 0
th classification.lua -input_image_path images/cat_dog.jpg -label 283 -gpuid 0

Options

  • proto_file: Path to the deploy.prototxt file for the CNN Caffe model. Default is models/VGG_ILSVRC_16_layers_deploy.prototxt.
  • model_file: Path to the .caffemodel file for the CNN Caffe model. Default is models/VGG_ILSVRC_16_layers.caffemodel.
  • input_image_path: Path to the input image. Default is images/cat_dog.jpg.
  • input_sz: Input image size. Default is 224 (change to 227 if using AlexNet).
  • layer_name: Layer to use for Grad-CAM. Default is relu5_3 (use relu5_4 for VGG-19 and relu5 for AlexNet).
  • label: Class label to generate Grad-CAM for (-1 = use predicted class, 283 = tiger cat, 243 = boxer). Default is -1. These are 1-indexed class indices into the ILSVRC synset list.
  • out_path: Path to save images in. Default is output/.
  • gpuid: 0-indexed id of GPU to use. Default is -1 = CPU.
  • backend: Backend to use with loadcaffe. Default is nn.
  • save_as_heatmap: Whether to save heatmap or raw Grad-CAM. 1 = save heatmap, 0 = save raw Grad-CAM. Default is 1.
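The options above can be combined to switch networks. As a sketch, the script below runs the classification demo with VGG-19 instead of the default VGG-16, using the VGG-19 file names and layer name that appear in this README. The guard around the final call is an addition for safety, so the command is only printed when Torch's th launcher or the script is not available.

```shell
#!/bin/sh
# Sketch: run classification.lua with VGG-19 instead of the default VGG-16.
# Flag names and file paths are taken from the options lists in this README.
PROTO=models/VGG_ILSVRC_19_layers_deploy.prototxt
MODEL=models/VGG_ILSVRC_19_layers.caffemodel
LAYER=relu5_4   # layer named for VGG-19 in the layer_name option

# None of the values contain spaces, so a plain string is enough here.
ARGS="-proto_file $PROTO -model_file $MODEL -layer_name $LAYER -input_image_path images/cat_dog.jpg -label 243 -gpuid 0"
echo "th classification.lua $ARGS"

# Execute only when the th launcher and the script are both available.
if command -v th >/dev/null 2>&1 && [ -f classification.lua ]; then
  # ARGS is left unquoted on purpose so it splits into separate flags.
  th classification.lua $ARGS
fi
```

For AlexNet, the options list suggests also passing -input_sz 227 and -layer_name relu5, along with the matching prototxt and caffemodel paths.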

Examples

'boxer' (243)

'tiger cat' (283)

Visual Question Answering

Clone the VQA (http://arxiv.org/abs/1505.00468) sub-repository (git submodule init && git submodule update), and download and unzip the provided extracted features and pretrained model.

th visual_question_answering.lua -input_image_path images/cat_dog.jpg -question 'What animal?' -answer 'dog' -gpuid 0
th visual_question_answering.lua -input_image_path images/cat_dog.jpg -question 'What animal?' -answer 'cat' -gpuid 0

Options

  • proto_file: Path to the deploy.prototxt file for the CNN Caffe model. Default is models/VGG_ILSVRC_19_layers_deploy.prototxt.
  • model_file: Path to the .caffemodel file for the CNN Caffe model. Default is models/VGG_ILSVRC_19_layers.caffemodel.
  • input_image_path: Path to the input image. Default is images/cat_dog.jpg.
  • input_sz: Input image size. Default is 224 (change to 227 if using AlexNet).
  • layer_name: Layer to use for Grad-CAM. Default is relu5_4 (use relu5_3 for VGG-16 and relu5 for AlexNet).
  • question: Input question. Default is 'What animal?'.
  • answer: Optional answer (e.g. "cat") to generate Grad-CAM for ('' = use predicted answer). Default is ''.
  • out_path: Path to save images in. Default is output/.
  • model_path: Path to VQA model checkpoint. Default is VQA_LSTM_CNN/lstm.t7.
  • gpuid: 0-indexed id of GPU to use. Default is -1 = CPU.
  • backend: Backend to use with loadcaffe. Default is cudnn.
  • save_as_heatmap: Whether to save heatmap or raw Grad-CAM. 1 = save heatmap, 0 = save raw Grad-CAM. Default is 1.
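To compare the maps for several candidate answers side by side, the answer and out_path options can be varied in a loop. This is a sketch only: the per-answer folder names under output/ are hypothetical, and the mkdir call reflects an assumption that the script may not create the output folder itself.

```shell
#!/bin/sh
# Sketch: produce one Grad-CAM map per candidate answer, each in its own folder.
IMAGE=images/cat_dog.jpg
QUESTION='What animal?'

for ANSWER in dog cat; do
  OUT="output/vqa_${ANSWER}/"      # hypothetical per-answer output folder
  echo "th visual_question_answering.lua -answer $ANSWER -out_path $OUT"
  # Execute only when the th launcher and the script are both available.
  if command -v th >/dev/null 2>&1 && [ -f visual_question_answering.lua ]; then
    mkdir -p "$OUT"                # assumption: the script may not create it
    th visual_question_answering.lua -input_image_path "$IMAGE" \
      -question "$QUESTION" -answer "$ANSWER" -out_path "$OUT" -gpuid 0
  fi
done
```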

Examples

What animal? Dog

What animal? Cat

What color is the hydrant? Yellow

What color is the hydrant? Green

Image Captioning

Clone the neuraltalk2 sub-repository. Running sh models/download_models.sh will download the pretrained model and place it in the neuraltalk2 folder.

Change lines 2-4 of neuraltalk2/misc/LanguageModel.lua to the following:

local utils = require 'neuraltalk2.misc.utils'
local net_utils = require 'neuraltalk2.misc.net_utils'
local LSTM = require 'neuraltalk2.misc.LSTM'

th captioning.lua -input_image_path images/cat_dog.jpg -caption 'a dog and cat posing for a picture' -gpuid 0
th captioning.lua -input_image_path images/cat_dog.jpg -caption '' -gpuid 0
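The change to lines 2-4 of neuraltalk2/misc/LanguageModel.lua described above can also be applied from the command line rather than by hand. The sketch below uses sed to overwrite exactly those three lines and keeps a .bak backup. It assumes lines 2-4 currently hold the three corresponding require statements; patch_language_model is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: rewrite lines 2-4 of a file with the three require lines.
# Assumption: lines 2-4 currently hold the corresponding require statements.
patch_language_model() {
  # -i.bak edits in place and keeps a backup copy next to the file.
  sed -i.bak \
    -e "2s|.*|local utils = require 'neuraltalk2.misc.utils'|" \
    -e "3s|.*|local net_utils = require 'neuraltalk2.misc.net_utils'|" \
    -e "4s|.*|local LSTM = require 'neuraltalk2.misc.LSTM'|" \
    "$1"
}

TARGET=neuraltalk2/misc/LanguageModel.lua
if [ -f "$TARGET" ]; then
  patch_language_model "$TARGET"
else
  echo "not found: $TARGET (clone the neuraltalk2 sub-repository first)"
fi
```

Since the replacement is by line number, it is worth inspecting the file (or the .bak copy) afterwards to confirm that only the intended lines changed.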

Options

  • input_image_path: Path to the input image. Default is images/cat_dog.jpg.
  • input_sz: Input image size. Default is 224 (change to 227 if using AlexNet).
  • layer: Layer to use for Grad-CAM. Default is 30 (relu5_3 for VGG-16).
  • caption: Optional input caption. If left empty (''), the generated caption is used.
  • out_path: Path to save images in. Default is output/.
  • model_path: Path to captioning model checkpoint. Default is neuraltalk2/model_id1-501-1448236541.t7.
  • gpuid: 0-indexed id of GPU to use. Default is -1 = CPU.
  • backend: Backend to use with loadcaffe. Default is cudnn.
  • save_as_heatmap: Whether to save heatmap or raw Grad-CAM. 1 = save heatmap, 0 = save raw Grad-CAM. Default is 1.

Examples

a dog and cat posing for a picture

a bathroom with a toilet and a sink

License

BSD

3rd-party
