Download the Caffe model(s) and prototxt for VGG-16/VGG-19/AlexNet using `sh models/download_models.sh`.
```
th classification.lua -input_image_path images/cat_dog.jpg -label 243 -gpuid 0
th classification.lua -input_image_path images/cat_dog.jpg -label 283 -gpuid 0
```
* `proto_file`: Path to the `deploy.prototxt` file for the CNN Caffe model. Default is `models/VGG_ILSVRC_16_layers_deploy.prototxt`.
* `model_file`: Path to the `.caffemodel` file for the CNN Caffe model. Default is `models/VGG_ILSVRC_16_layers.caffemodel`.
* `input_image_path`: Path to the input image. Default is `images/cat_dog.jpg`.
* `input_sz`: Input image size. Default is 224 (change to 227 if using AlexNet).
* `layer_name`: Layer to use for Grad-CAM. Default is `relu5_3` (use `relu5_4` for VGG-19 and `relu5` for AlexNet).
* `label`: Class label to generate Grad-CAM for (-1 = use predicted class, 283 = tiger cat, 243 = boxer). These correspond to ILSVRC synset IDs. Default is -1.
* `out_path`: Path to save images in. Default is `output/`.
* `gpuid`: 0-indexed ID of the GPU to use. Default is -1 (use CPU).
* `backend`: Backend to use with loadcaffe. Default is `nn`.
* `save_as_heatmap`: Whether to save the heatmap or the raw Grad-CAM. 1 = save heatmap, 0 = save raw Grad-CAM. Default is 1.
Example results: 'boxer' (243) and 'tiger cat' (283).
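The two invocations above differ only in the `-label` flag, so generating Grad-CAM maps for several classes can be scripted. A minimal sketch using only the flags documented above; the commands are echoed for inspection, and piping the output to `sh` (with Torch installed) would actually run them:

```shell
# Print one classification.lua invocation per ILSVRC label
# (243 = boxer, 283 = tiger cat). Pipe the output to `sh` to execute.
for label in 243 283; do
  echo "th classification.lua -input_image_path images/cat_dog.jpg -label $label -gpuid -1"
done
```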
Clone the VQA (http://arxiv.org/abs/1505.00468) sub-repository (`git submodule init && git submodule update`), and download and unzip the provided extracted features and pretrained model.
```
th visual_question_answering.lua -input_image_path images/cat_dog.jpg -question 'What animal?' -answer 'dog' -gpuid 0
th visual_question_answering.lua -input_image_path images/cat_dog.jpg -question 'What animal?' -answer 'cat' -gpuid 0
```
* `proto_file`: Path to the `deploy.prototxt` file for the CNN Caffe model. Default is `models/VGG_ILSVRC_19_layers_deploy.prototxt`.
* `model_file`: Path to the `.caffemodel` file for the CNN Caffe model. Default is `models/VGG_ILSVRC_19_layers.caffemodel`.
* `input_image_path`: Path to the input image. Default is `images/cat_dog.jpg`.
* `input_sz`: Input image size. Default is 224 (change to 227 if using AlexNet).
* `layer_name`: Layer to use for Grad-CAM. Default is `relu5_4` (use `relu5_3` for VGG-16 and `relu5` for AlexNet).
* `question`: Input question. Default is "What animal?".
* `answer`: Optional answer (e.g. "cat") to generate Grad-CAM for ('' = use predicted answer). Default is ''.
* `out_path`: Path to save images in. Default is `output/`.
* `model_path`: Path to the VQA model checkpoint. Default is `VQA_LSTM_CNN/lstm.t7`.
* `gpuid`: 0-indexed ID of the GPU to use. Default is -1 (use CPU).
* `backend`: Backend to use with loadcaffe. Default is `cudnn`.
* `save_as_heatmap`: Whether to save the heatmap or the raw Grad-CAM. 1 = save heatmap, 0 = save raw Grad-CAM. Default is 1.
Example question/answer pairs:
* "What animal?" — dog
* "What animal?" — cat
* "What color is the hydrant?" — yellow
* "What color is the hydrant?" — green
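The same batching pattern applies when comparing Grad-CAM maps across candidate answers. A sketch under the same assumptions (flags as documented above); the commands are echoed rather than executed, so they can be reviewed or piped to `sh` once Torch and the VQA submodule are set up:

```shell
# Print one visual_question_answering.lua invocation per candidate answer.
# Pipe the output to `sh` to run the comparisons for real.
for ans in dog cat; do
  echo "th visual_question_answering.lua -input_image_path images/cat_dog.jpg -question 'What animal?' -answer $ans -gpuid -1"
done
```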
Clone the neuraltalk2 sub-repository. Running `sh models/download_models.sh` will download the pretrained model and place it in the neuraltalk2 folder.
Change lines 2-4 of `neuraltalk2/misc/LanguageModel.lua` to the following:

```lua
local utils = require 'neuraltalk2.misc.utils'
local net_utils = require 'neuraltalk2.misc.net_utils'
local LSTM = require 'neuraltalk2.misc.LSTM'
```
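If you would rather script that edit, a `sed` one-liner can apply the same change. This is a sketch, not part of the repo's tooling: it assumes GNU `sed` (on macOS/BSD use `sed -i ''`), and the guard makes it a no-op if the file is not present:

```shell
# Rewrite the three require paths in LanguageModel.lua so they resolve
# from the parent repo (misc.* -> neuraltalk2.misc.*). GNU sed syntax.
if [ -f neuraltalk2/misc/LanguageModel.lua ]; then
  sed -i "s/require 'misc\./require 'neuraltalk2.misc./" neuraltalk2/misc/LanguageModel.lua
fi
```

It is worth re-checking the result with `grep "require 'neuraltalk2"` afterwards.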
```
th captioning.lua -input_image_path images/cat_dog.jpg -caption 'a dog and cat posing for a picture' -gpuid 0
th captioning.lua -input_image_path images/cat_dog.jpg -caption '' -gpuid 0
```
* `input_image_path`: Path to the input image. Default is `images/cat_dog.jpg`.
* `input_sz`: Input image size. Default is 224 (change to 227 if using AlexNet).
* `layer`: Layer to use for Grad-CAM. Default is 30 (`relu5_3` for VGG-16).
* `caption`: Optional input caption. If not provided, the generated caption is used.
* `out_path`: Path to save images in. Default is `output/`.
* `model_path`: Path to the captioning model checkpoint. Default is `neuraltalk2/model_id1-501-1448236541.t7`.
* `gpuid`: 0-indexed ID of the GPU to use. Default is -1 (use CPU).
* `backend`: Backend to use with loadcaffe. Default is `cudnn`.
* `save_as_heatmap`: Whether to save the heatmap or the raw Grad-CAM. 1 = save heatmap, 0 = save raw Grad-CAM. Default is 1.
Example captions:
* "a dog and cat posing for a picture"
* "a bathroom with a toilet and a sink"
BSD
- VQA_LSTM_CNN: BSD
- neuraltalk2: BSD