Online lecture videos are a valuable resource for students across the world, and the ability to find videos based on their content could make them even more useful. Methods for automatically extracting this content reduce the manual effort required to index and retrieve such videos. We adapt a deep-learning-based method for scene text detection to the detection of handwritten text, math expressions, and sketches in lecture videos.
Acknowledgements: The code/data is based upon work supported by the U.S. National Science Foundation under grant #OAC-1640867.
This code release contains the scripts required for lecture video summarization as described in our paper. The code is distributed under the GNU General Public License.
This work is a progression of the AccessMath Project carried out at DPRL.
For any issues, please use the GitHub issues page or contact me at buralako at buffalo dot edu.
The main libraries required by our scripts include:
- Numpy
- OpenCV (with ffmpeg installed for video handling)
- Scipy
- PyGame (to use the ground truth annotator)
- Caffe
- PyTorch (to reproduce entire paper including training)
- Download the AccessMath dataset and copy it into the project root.
- Download our Handwritten Content Detector model and structure file and place them in models/text_detection. In the structure file (deploy.prototxt), make sure to customize the paths in the save_output_param block as required.
save_output_param {
  output_directory: "results/text/longer_conv_300x300/Main"
  output_name_prefix: "comp4_det_test_"
  output_format: "VOC"
  label_map_file: "data/AccessMath/labelmap_accessmath.prototxt"
  num_test_image: 2497
}
- Set up AccessMath-TextBoxes. If needed, generate the training LMDBs.
- Run the following scripts:
-- Export the testing videos into still frames for the text detector by running
python pre_ST3D_v2.0_00_export_frames.py test_data/databases/db_AccessMath2015.xml -d testing
-- Run Text Detection on exported still testing video frames
python pre_ST3D_v2.0_01_text_detection.py test_data/databases/db_AccessMath2015.xml -d testing
The GPU ID can be set via GPU_TextDetection in AccessMath/preprocessing/config/parameters.py (0 by default).
-- Run coarse-grained temporal analysis and reconstruction (recovering occluded content; Part 2 of Table 2, recommended)
python pre_ST3D_v2.0_02_td_stability.py test_data/databases/db_AccessMath2015.xml -d testing
python pre_ST3D_v2.0_03_td_bbox_grouping.py test_data/databases/db_AccessMath2015.xml -d testing
python pre_ST3D_v2.0_04_td_ref_binarize.py test_data/databases/db_AccessMath2015.xml -d testing
OR, without reconstruction (Part 1 of Table 2):
python pre_ST3D_v2.0_04_td_raw_binarize.py test_data/databases/db_AccessMath2015.xml -d testing
-- Run fine-grained temporal refinement
python pre_ST3D_v2.0_05_cc_analysis.py test_data/databases/db_AccessMath2015.xml -d testing
python pre_ST3D_v2.0_06_cc_grouping.py test_data/databases/db_AccessMath2015.xml -d testing
-- Run conflict minimization
python pre_ST3D_v2.0_07_vid_segmentation.py test_data/databases/db_AccessMath2015.xml -d testing
-- Generate final keyframe summaries and evaluation results
python pre_ST3D_v2.0_08_generate_summary.py test_data/databases/db_AccessMath2015.xml -d testing
-- The final summary keyframes can be found in output/summaries
- Download the SSD model for VOC object class detection and place it in models/person_detection.
- Clone the SSD PyTorch repository and set it up. Add its directory to $PYTHONPATH:
export PYTHONPATH=/path/to/ssd.pytorch/:$PYTHONPATH
- Run the following scripts:
-- Export the training and testing videos into still frames
python pre_ST3D_v2.0_00_export_frames.py test_data/databases/db_AccessMath2015.xml -d "training, testing"
-- Generate person detection bounding boxes on training set and add to annotations
python gt_PD_01_detect_speaker.py test_data/databases/db_AccessMath2015.xml -d training
python gt_PD_02_add_speaker_to_annotations.py test_data/databases/db_AccessMath2015.xml -d training
-- Generate ground truth annotations by removing text region annotations that are occluded by the speaker
python pre_ST3D_v2.0_00_export_frames_annotations.py test_data/databases/db_AccessMath2015.xml -d training
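The occlusion-removal step above drops text region annotations that the detected speaker bounding box covers. A minimal sketch of that idea, assuming axis-aligned (x1, y1, x2, y2) boxes and a hypothetical coverage threshold:

```python
def overlap_fraction(box, other):
    """Fraction of `box`'s area covered by `other` (boxes are x1, y1, x2, y2)."""
    ix1, iy1 = max(box[0], other[0]), max(box[1], other[1])
    ix2, iy2 = min(box[2], other[2]), min(box[3], other[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return inter / area if area > 0 else 0.0

def remove_occluded(text_boxes, speaker_box, threshold=0.5):
    """Keep only text regions not mostly covered by the speaker box
    (the 0.5 threshold is an assumption, not the scripts' actual value)."""
    return [b for b in text_boxes
            if overlap_fraction(b, speaker_box) < threshold]
```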
-- Alternatively, you can download the prepared training data for the Handwritten Content Detector from here. Download the 3-part zip archive, extract it into a folder called AccessMathVOC, and place it in the project root.
-- Generate a trained model using the procedure described in AccessMath-TextBoxes
-- Follow the procedure above to reproduce Table 2, starting with 01_text_detection.py
The annotation tool is run from gt_annotator.py. Use it to mark the ideal video segments, select keyframes per segment, and label elements on each keyframe. Note that precision tools and interpolation capabilities are provided to make labeling moving objects easier.
Usage: python gt_annotator.py database -l lecture
Where
database = Database metadata file
lecture = Lecture video to process