PierreHao/BoVW-LSTM

Bag of Visual Words Model

This is a Python implementation of the bag-of-visual-words model for feature extraction from videos.

The current repository is just one layer of a framework for video classification, composed of:

  • Bag-of-Visual-Words (feature extraction for each frame)
  • Long Short-Term Memory (modeling temporal dependencies between the features)
  • Softmax classifier (classifies the video given the outputs of the LSTM)
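The LSTM and softmax layers live outside this repository, but as a rough sketch of the final step, classifying a video from a feature vector (e.g. the LSTM's last hidden state) with a softmax might look like the following. The vector values here are made up for illustration:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# hypothetical final LSTM hidden state for a 3-class problem
h = np.array([2.0, 0.5, -1.0])
probs = softmax(h)                 # class probabilities, sums to 1
predicted_class = int(probs.argmax())
```

The class with the highest probability is taken as the video's predicted label.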

The first part consists of splitting an input video into a sequence of frames and saving these images in a folder named after the video's class. After this, we extract features for each image in the video's folder, generating a histogram of visual words for each image. The first step is done by process_video.py and the second by feature_extraction.py.

The script feature_extraction.py will generate a visual vocabulary using the images provided by process_video.py.

The feature extraction consists of:

  1. Extracting local features from all dataset images
  2. Generating a codebook of visual words by clustering the features
  3. Aggregating a histogram of visual words for each of the training images
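Step 2 above, building the codebook, can be sketched with a minimal k-means loop. The toy descriptors stand in for the stacked SIFT descriptors of all training frames; the real script would cluster actual SIFT output, and k is a tunable vocabulary size:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy stand-in for stacked SIFT descriptors from every training frame
descriptors = rng.normal(size=(200, 8))

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means; a production script would use a library version."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(iters):
        # assign each descriptor to its nearest center
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned descriptors
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

codebook = kmeans(descriptors, k=10)   # 10 visual words, one row each
```

Each row of the resulting codebook is one "visual word" of the vocabulary.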

This code relies on:

  • SIFT features for local features (external implementation)
  • k-means for generation of the words via clustering
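Given a codebook, step 3 reduces each frame to a histogram by assigning every local descriptor to its nearest visual word and counting. A minimal sketch, with toy values in place of real SIFT descriptors:

```python
import numpy as np

rng = np.random.default_rng(1)
codebook = rng.normal(size=(10, 8))      # 10 visual words (toy values)
descriptors = rng.normal(size=(50, 8))   # local features of one frame

def bovw_histogram(descriptors, codebook):
    # assign each descriptor to its nearest visual word, then count
    dists = np.linalg.norm(descriptors[:, None] - codebook[None], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()             # normalize so frame size doesn't matter

hist = bovw_histogram(descriptors, codebook)
```

This per-frame histogram is the feature vector that the LSTM layer later consumes as a sequence.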

Example: Processing Video

To extract the frame sequence from a given input video (e.g. video.mp4) belonging to a given class (e.g. walking), use this command:

python process_video.py walking video.mp4

This will create a new folder with the same name as the given class, containing each extracted frame.
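The resulting layout pairs each frame with an indexed filename inside the class folder. The naming scheme below is hypothetical (the actual pattern used by process_video.py may differ), but it illustrates the kind of per-class output the script produces:

```python
import os

def frame_path(class_name, index, ext="png"):
    # hypothetical naming scheme: <class>/frame_0001.png, frame_0002.png, ...
    return os.path.join(class_name, f"frame_{index:04d}.{ext}")

path = frame_path("walking", 1)
```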

Example: Extracting Features

You can extract the features of a video by passing the path to the folder containing the video frames (e.g. walking) previously produced by the process_video.py script, together with the dataset folder (e.g. dataset_folder) used to generate (or simply load) the codebook. Use this command:

python feature_extraction.py dataset_folder/ walking/

The dataset should have the following structure, where all the video frames belonging to one class are in the same folder:

.
|-- path_to_folders_with_video_frames
|    |-- class1
|    |-- class2
|    |-- class3
|    |-- ...
|    └-- classN
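A script consuming this layout only needs to walk one level of sub-folders, taking each folder name as the class label. A minimal sketch of that traversal (the helper name is illustrative, not part of the repository):

```python
from pathlib import Path

def collect_frames(dataset_dir):
    """Yield (class_name, frame_path) pairs from the layout above."""
    root = Path(dataset_dir)
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for frame in sorted(class_dir.iterdir()):
            yield class_dir.name, frame
```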

Prerequisites (for Linux):

To install the necessary libraries, run the following commands from the working directory:

# installing sift
wget http://www.cs.ubc.ca/~lowe/keypoints/siftDemoV4.zip
unzip siftDemoV4.zip
cp sift*/sift sift

Prerequisites (for Mac OS):

# installing sift
Download and unpack the latest VLFeat binary package from the download page (currently the latest version is 0.9.20). Copy the sift binary and libvl.dylib to the bag-of-visual-words repository path. The binaries are in the bin/ directory; just pick the sub-directory for your platform.

Notes

If you're using Linux and get an IOError: SIFT executable not found error, try sudo apt-get install libc6-i386.

References:

SIFT:

David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.

sift.py:

Taken from http://www.janeriksolem.net/2009/02/sift-python-implementation.html (Linux) or http://www.maths.lth.se/matematiklth/personal/solem/downloads/vlfeat.py (Mac)
