Multi-person pose estimation is a challenging vision task that can be seriously affected by keypoint scale variation. Existing heatmap-based approaches try to reduce this effect by optimizing backbone architectures or loss functions, but the problem of an inaccurate heatmap representation across different keypoint scales remains. We present a scale-sensitive heatmap algorithm that generates reasonable spatial and contextual features for the network to predict more precise coordinates, by systematically considering the standard deviation, truncated radius, and shape of the Gaussian kernels. Specifically, the scale-sensitive heatmap algorithm contains three parts: an inter-person heatmap, a limited-area heatmap, and a shape-aware heatmap. The inter-person heatmap allocates a different standard deviation to each human instance, calculated proportionally with a keypoint-based method; the limited-area heatmap defines a truncated radius that limits the influence area of the Gaussian kernels; and the shape-aware heatmap modifies the Gaussian kernels of ellipse-shaped joints. Our scale-sensitive heatmap algorithm improves results by a considerable margin on the COCO and CrowdPose benchmark datasets.
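As a rough illustration of these three ideas, the following minimal NumPy sketch generates a single keypoint heatmap with a per-instance standard deviation, a truncated influence radius, and an optional elliptical shape. The function name and the parameters person_scale, truncate_factor, and aspect_ratio are illustrative assumptions and do not correspond to the repository's actual implementation.

```python
import numpy as np

def scale_sensitive_heatmap(hm_size, center, person_scale,
                            truncate_factor=3.0, aspect_ratio=1.0):
    """Minimal sketch of a scale-sensitive Gaussian kernel (illustrative only).

    person_scale    -- per-instance factor (inter-person heatmap): larger
                       people get a larger standard deviation.
    truncate_factor -- limits the influence area to a radius of
                       truncate_factor * sigma (limited-area heatmap).
    aspect_ratio    -- stretches the kernel along x to mimic an
                       ellipse-shaped joint (shape-aware heatmap).
    """
    h, w = hm_size
    cx, cy = center
    sigma = 2.0 * person_scale                      # per-instance std-dev
    sigma_x, sigma_y = sigma * aspect_ratio, sigma  # elliptical kernel
    y, x = np.mgrid[0:h, 0:w]
    g = np.exp(-((x - cx) ** 2 / (2 * sigma_x ** 2)
                 + (y - cy) ** 2 / (2 * sigma_y ** 2)))
    # Truncated radius: zero out responses outside the limited area.
    radius = truncate_factor * max(sigma_x, sigma_y)
    g[(x - cx) ** 2 + (y - cy) ** 2 > radius ** 2] = 0.0
    return g

# e.g. a larger person with an elongated (ellipse-shaped) joint at (64, 80)
hm = scale_sensitive_heatmap((128, 128), (64, 80),
                             person_scale=1.5, aspect_ratio=1.4)
```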
Backbone | Input size | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_single | 512x512 | 68.3 | 87.2 | 74.4 | 62.9 | 76.5 | 73.1 | 90.6 | 78.2 | 66.7 | 82.2 |
pose_higher_hrnet_w32_single | 512x512 | 69.3 | 87.3 | 75.5 | 64.0 | 77.2 | 74.3 | 91.2 | 79.9 | 68.0 | 83.4 |
pose_hrnet_w32_multi | 512x512 | 70.6 | 88.0 | 76.8 | 66.1 | 77.6 | 75.9 | 92.2 | 81.4 | 70.2 | 84.1 |
pose_higher_hrnet_w32_multi | 512x512 | 71.2 | 87.7 | 77.4 | 66.9 | 77.9 | 76.5 | 92.2 | 81.9 | 71.1 | 84.3 |
Backbone | Input size | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_single | 512x512 | 67.5 | 88.2 | 73.7 | 62.1 | 75.2 | 72.4 | 91.5 | 78.0 | 65.9 | 81.4 |
pose_higher_hrnet_w32_single | 512x512 | 68.4 | 88.7 | 74.5 | 62.2 | 75.5 | 73.0 | 92.1 | 79.0 | 66.1 | 82.2 |
pose_hrnet_w32_multi | 512x512 | 69.6 | 89.0 | 76.2 | 65.1 | 76.0 | 75.0 | 93.1 | 80.9 | 69.3 | 82.9 |
pose_higher_hrnet_w32_multi | 512x512 | 70.0 | 89.2 | 77.1 | 65.6 | 76.4 | 75.2 | 93.0 | 81.3 | 69.4 | 83.0 |
Backbone | AP | AP .5 | AP .75 | AP (E) | AP (M) | AP (H) |
---|---|---|---|---|---|---|
pose_hrnet_w32_single | 66.2 | 84.9 | 71.4 | 73.6 | 67.0 | 57.6 |
pose_hrnet_w32_multi | 68.2 | 86.2 | 73.6 | 75.8 | 69.1 | 59.1 |
The code is developed using Python 3.6 on Ubuntu 16.04. NVIDIA GPUs are needed. The code is developed and tested using 4 NVIDIA V100 GPU cards for HRNet-W32 and HrHRNet-W32. Other platforms are not fully tested.
- Clone this repo, and we'll call the directory that you cloned ${POSE_ROOT}.
- Init output (training model output directory), log (tensorboard log directory), model, data, and vis directories:
mkdir output
mkdir log
mkdir model
mkdir data
mkdir vis
[News] We have updated the pretrained model COCO_HrHRNetW32_Scale-sensitive.pth on Google Drive.
Download pretrained models and our well-trained models from the model zoo (Google Drive) and make the model directory look like this:
${POSE_ROOT}
|-- model
|   |-- imagenet
|   |   `-- hrnet_w32-36af842e.pth
|   `-- rescore
|       |-- final_rescore_coco_kpt.pth
|       `-- final_rescore_crowd_pose_kpt.pth
`-- output
    |-- coco
    |   |-- COCO_HRNetW32_Scale-sensitive.pth
    |   |-- COCO_HRNetW32_Shape.pth
    |   |-- COCO_HRNetW32_Inter.pth
    |   `-- COCO_HrHRNetW32_Scale-sensitive.pth
    `-- crowdpose
        `-- CrowdPose_HRNetW32_Scale-sensitive.pth
For COCO data, please download from the COCO download page; 2017 Train/Val is needed for COCO keypoints training and validation. Download and extract them under ${POSE_ROOT}/data, and make them look like this:
${POSE_ROOT}
|-- data
`-- coco
    |-- annotations
    |   |-- person_keypoints_train2017.json
    |   |-- person_keypoints_val2017.json
    |   `-- image_info_test-dev2017.json
    `-- images
        |-- train2017.zip
        |-- val2017.zip
        `-- test2017.zip
For CrowdPose data, please download from the CrowdPose download page; Train/Val is needed for CrowdPose keypoints training. Download and extract them under ${POSE_ROOT}/data, and make them look like this:
${POSE_ROOT}
|-- data
`-- crowdpose
    |-- json
    |   |-- crowdpose_train.json
    |   |-- crowdpose_val.json
    |   |-- crowdpose_trainval.json (generated by tools/crowdpose_concat_train_val.py)
    |   `-- crowdpose_test.json
    `-- images.zip
After downloading the data, run python tools/crowdpose_concat_train_val.py under ${POSE_ROOT} to create the trainval set.
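For reference, here is a minimal sketch of what such a concatenation step could do: the two COCO-format annotation files are merged into a single trainval file. The paths and key names follow the standard CrowdPose JSON layout; any difference from the actual tools/crowdpose_concat_train_val.py is an assumption.

```python
import json

# Assumed paths relative to ${POSE_ROOT}; the real script may differ.
with open('data/crowdpose/json/crowdpose_train.json') as f:
    train = json.load(f)
with open('data/crowdpose/json/crowdpose_val.json') as f:
    val = json.load(f)

# Merge the COCO-format fields of the two splits into one trainval file.
merged = {
    'images': train['images'] + val['images'],
    'annotations': train['annotations'] + val['annotations'],
    'categories': train['categories'],
}

with open('data/crowdpose/json/crowdpose_trainval.json', 'w') as f:
    json.dump(merged, f)
```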
If you are using SLURM (Simple Linux Utility for Resource Management), then execute:
sbatch ready.sh
If you like, you can prepare the environment step by step.
python tools/valid.py --cfg experiments/coco/w32/coco_hrnetw32_jnt_scale-sensitive.yaml TEST.MODEL_FILE output/coco/coco_hrnetw32_scale-sensitive.pth
python tools/valid.py --cfg experiments/coco/w32/coco_hrnetw32_jnt_scale-sensitive.yaml TEST.MODEL_FILE output/coco/coco_hrnetw32_scale-sensitive.pth TEST.SCALE_FACTOR 0.5,1,2
python tools/valid.py --cfg experiments/crowdpose/w32/crowdpose_hrnetw32_scale-sensitive.yaml TEST.MODEL_FILE output/crowdpose/crowdpose_hrnetw32_scale-sensitive.pth
python tools/valid.py --cfg experiments/crowdpose/w32/crowdpose_hrnetw32_scale-sensitive.yaml TEST.MODEL_FILE output/crowdpose/crowdpose_hrnetw32_scale-sensitive.pth TEST.SCALE_FACTOR 0.5,1,2
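TEST.SCALE_FACTOR 0.5,1,2 turns on multi-scale testing. Conceptually, the input is resized at each scale, heatmaps are predicted, mapped back to a common resolution, and averaged; the sketch below only illustrates that idea with a hypothetical model call and is not the repository's aggregation code.

```python
import torch
import torch.nn.functional as F

def multi_scale_heatmaps(model, image, scale_factors=(0.5, 1.0, 2.0)):
    """Average heatmaps predicted at several input scales (sketch only)."""
    # image: (1, 3, H, W) tensor; model(x) is assumed to return
    # (1, K, h, w) keypoint heatmaps.
    base_size = image.shape[-2:]
    accumulated = None
    for s in scale_factors:
        resized = F.interpolate(image, scale_factor=s, mode='bilinear',
                                align_corners=False)
        with torch.no_grad():
            heatmaps = model(resized)
        # Map every prediction back to a common resolution before averaging.
        heatmaps = F.interpolate(heatmaps, size=base_size, mode='bilinear',
                                 align_corners=False)
        accumulated = heatmaps if accumulated is None else accumulated + heatmaps
    return accumulated / len(scale_factors)
```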
python tools/train.py --cfg experiments/coco/w32/coco_hrnetw32_scale-sensitive.yaml OUTPUT_DIR 'output/minecoco'
python tools/train.py --cfg experiments/crowdpose/w32/crowdpose_hrnetw32_scale-sensitive.yaml OUTPUT_DIR 'output/minecrowdpose'
python tools/inference_video.py --cfg experiments/inference_demo_coco.yaml --videoFile vis/multi_people.mp4 --outputDir vis --visthre 0.3 TEST.MODEL_FILE output/coco/coco_hrnetw32_scale-sensitive.pth
python tools/inference_video.py --cfg experiments/inference_demo_crowdpose.yaml --videoFile vis/multi_people.mp4 --outputDir vis --visthre 0.3 TEST.MODEL_FILE output/crowdpose/crowdpose_hrnetw32_scale-sensitive.pth
The above command will create a video under the vis directory and a number of pose images under the vis/pose directory.
python tools/inference_image.py --cfg experiments/inference_demo_coco.yaml --imageFile vis/multi_people.jpg --outputDir vis --visthre 0.3 TEST.MODEL_FILE output/coco/coco_hrnetw32_scale-sensitive.pth
python tools/inference_image.py --cfg experiments/inference_demo_crowdpose.yaml --imageFile vis/multi_people.jpg --outputDir vis --visthre 0.3 TEST.MODEL_FILE output/crowdpose/crowdpose_hrnetw32_scale-sensitive.pth
The above command will create two images under the vis directory: one with the predicted keypoint coordinates and one with the predicted heatmaps.
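As a usage note, predicted (x, y, score) keypoints can also be overlaid on an image with a few lines of OpenCV. This is a generic sketch rather than the repository's drawing code; the keypoints structure and the visthre threshold mirror the options above only by assumption.

```python
import cv2

def draw_keypoints(image_path, keypoints, out_path, visthre=0.3):
    """Draw every (x, y, score) keypoint whose score exceeds visthre."""
    img = cv2.imread(image_path)
    for person in keypoints:            # one list of (x, y, score) per person
        for x, y, score in person:
            if score > visthre:
                cv2.circle(img, (int(x), int(y)), 3, (0, 255, 0), -1)
    cv2.imwrite(out_path, img)

# e.g. draw_keypoints('vis/multi_people.jpg', predictions, 'vis/pose_vis.jpg')
```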
Thanks to the open-source HigherHRNet and MMPose, which is part of the OpenMMLab project.
If you use our code or models in your research, please cite:
@article{du2022scale,
title={A scale-sensitive heatmap representation for multi-person pose estimation},
author={Du, Congju and Yu, Han and Yu, Li},
journal={IET Image Processing},
year={2022},
publisher={Wiley Online Library}
}