Online lecture videos are a valuable resource for students across the world, and the ability to find videos based on their content could make them even more useful. Methods for automatically extracting this content reduce the manual effort required to index and retrieve such videos. We adapt a deep-learning-based method for scene text detection to the detection of handwritten text, math expressions, and sketches in lecture videos.
Acknowledgements: The code/data is based upon work supported by the U.S. National Science Foundation under grant #OAC-1640867.
This code release contains the scripts required for lecture video summarization as described in our paper. The code is distributed under the GNU General Public License.
This work is a progression of the AccessMath Project carried out at DPRL.
For any issues, please use the GitHub issues page or contact me at buralako at buffalo dot edu.
The main libraries required by our scripts include:
- Numpy
- OpenCV (with ffmpeg installed for video handling)
- Scipy
- PyGame (to use the ground truth annotator)
- Caffe
- PyTorch (to reproduce entire paper including training)
- Download the AccessMath dataset and copy it into the project root.
- Download our Handwritten Content Detector model and structure file and place them in models/text_detection. In the structure file (deploy.prototxt), make sure to customize the paths in the save_output_param block as required.
save_output_param {
  output_directory: "results/text/longer_conv_300x300/Main"
  output_name_prefix: "comp4_det_test_"
  output_format: "VOC"
  label_map_file: "data/AccessMath/labelmap_accessmath.prototxt"
  num_test_image: 2497
}
- Set up AccessMath-TextBoxes. If needed, generate the training LMDBs.
- Run the following scripts:
-- Export the testing videos into still frames for the text detector by running
python pre_ST3D_v2.0_00_export_frames.py test_data/databases/db_AccessMath2015.xml -d testing
-- Run Text Detection on exported still testing video frames
python pre_ST3D_v2.0_01_text_detection.py test_data/databases/db_AccessMath2015.xml -d testing
The GPU ID can be set via GPU_TextDetection in AccessMath/preprocessing/config/parameters.py (0 by default).
-- Run coarse-grained temporal analysis and reconstruction (recovering occluded content; Part 2 of Table 2, recommended)
python pre_ST3D_v2.0_02_td_stability.py test_data/databases/db_AccessMath2015.xml -d testing
python pre_ST3D_v2.0_03_td_bbox_grouping.py test_data/databases/db_AccessMath2015.xml -d testing
python pre_ST3D_v2.0_04_td_ref_binarize.py test_data/databases/db_AccessMath2015.xml -d testing
OR, without reconstruction (Part 1 of Table 2):
python pre_ST3D_v2.0_04_td_raw_binarize.py test_data/databases/db_AccessMath2015.xml -d testing
-- Run fine-grained temporal refinement
python pre_ST3D_v2.0_05_cc_analysis.py test_data/databases/db_AccessMath2015.xml -d testing
python pre_ST3D_v2.0_06_cc_grouping.py test_data/databases/db_AccessMath2015.xml -d testing
-- Run conflict minimization
python pre_ST3D_v2.0_07_vid_segmentation.py test_data/databases/db_AccessMath2015.xml -d testing
-- Generate final keyframe summaries and evaluation results
python pre_ST3D_v2.0_08_generate_summary.py test_data/databases/db_AccessMath2015.xml -d testing
-- The final summary keyframes can be found in output/summaries
- Download the SSD model for VOC object class detection and place it in models/person_detection.
- Clone the SSD PyTorch repository and set it up. Add its directory to $PYTHONPATH:
export PYTHONPATH=/path/to/ssd.pytorch/:$PYTHONPATH
- Run the following scripts:
-- Export the training and testing videos into still frames
python pre_ST3D_v2.0_00_export_frames.py test_data/databases/db_AccessMath2015.xml -d "training, testing"
-- Generate person detection bounding boxes on training set and add to annotations
python gt_PD_01_detect_speaker.py test_data/databases/db_AccessMath2015.xml -d training
python gt_PD_02_add_speaker_to_annotations.py test_data/databases/db_AccessMath2015.xml -d training
-- Generate ground truth annotations by removing text region annotations that are occluded by the speaker
python pre_ST3D_v2.0_00_export_frames_annotations.py test_data/databases/db_AccessMath2015.xml -d training
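The occlusion-removal step above drops text region annotations that the detected speaker bounding box covers. A minimal sketch of that idea, assuming axis-aligned (x1, y1, x2, y2) boxes and a hypothetical coverage threshold:

```python
def overlap_fraction(box, other):
    """Fraction of `box`'s area covered by `other` (boxes are x1, y1, x2, y2)."""
    ix1, iy1 = max(box[0], other[0]), max(box[1], other[1])
    ix2, iy2 = min(box[2], other[2]), min(box[3], other[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return inter / area if area > 0 else 0.0

def remove_occluded(text_boxes, speaker_box, threshold=0.5):
    """Keep only text regions not mostly covered by the speaker box
    (the 0.5 threshold is an assumption, not the scripts' actual value)."""
    return [b for b in text_boxes
            if overlap_fraction(b, speaker_box) < threshold]
```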
-- Alternatively, you can download the prepared training data for the Handwritten Content Detector from here. Download the 3-part zip archive, extract it into a folder called AccessMathVOC, and place it in the project root.
-- Generate a trained model using the procedure described in AccessMath-TextBoxes
-- Follow the procedure above to reproduce Table 2, starting with 01_text_detection.py
The annotation tool is run from gt_annotator.py. Use it to mark the ideal video segments, select keyframes per segment, and label elements on each keyframe. Note that precision tools and interpolation capabilities are provided to make labeling moving objects easier.
Usage: python gt_annotator.py database -l lecture
Where
database = Database metadata file
lecture = Lecture video to process