2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning

This repository is cloned from https://github.com/dluvizon/deephar, with some added features.

It adds a dataloader and training code for the MERL and COCO datasets.

COCO dataset

Used for pose estimation. Images and pose labels are loaded with pycocotools. Each image is cropped and resized to (256, 256).
Each pose has 16 keypoints.
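
Cropping and resizing an image also moves its keypoints. The sketch below shows one way to map keypoints from original image coordinates into the resized crop. It assumes the crop box is given as [x, y, width, height]; the function name keypoints_to_crop and this box layout are illustrative and are not taken from the repository.

```python
def keypoints_to_crop(keypoints, bbox, out_size=256):
    """Map (x, y) keypoints from image coordinates into a resized crop.

    keypoints: list of (x, y) pairs in original image pixels.
    bbox: [x, y, width, height] of the crop (assumed layout).
    out_size: side length of the square output image.
    """
    x0, y0, w, h = bbox
    if w <= 0 or h <= 0:
        raise ValueError("bbox width and height must be positive")
    sx = out_size / w  # horizontal scale factor
    sy = out_size / h  # vertical scale factor
    # Shift each point to the crop origin, then scale to the output size.
    return [((x - x0) * sx, (y - y0) * sy) for x, y in keypoints]


if __name__ == "__main__":
    # One keypoint inside a 128 x 512 box whose top-left corner is (100, 50).
    print(keypoints_to_crop([(164.0, 306.0)], [100, 50, 128, 512]))
```

Note that scaling width and height independently does not preserve the aspect ratio of the crop; if the actual pipeline pads the box to a square first, the same scale factor would apply to both axes.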

MERL dataset

1. MERL for pose estimation

Data is loaded from a pkl file in which each element has the form:

{   'img_paths': '17_2_crop_1077_1108_RetractFromShelf-0004.jpg', 
    'img_width': 920,
    'img_height': 680, 
    'image_id': 1,
    'bbox': [602.7314290727887, 324.3657157897949, 218.69714235578266, 119.07430627005441],
    'num_keypoints': 24,
    'keypoints': [[0.0, 0.0, 2], [747.5, 366.0285714285714, 1], [0.0, 0.0, 2], [741.4201450892857, 385.51311383928567, 1],\
               [0.0,0.0, 2], [704.9828591482981, 418.7314265659877, 1], [782.9857142857143, 393.3,0],\
                [632.4342917306083, 353.47713884626114, 1], [0.0, 0.0, 2], [690.2628631591797, 350.8485674176897, 1],\
                [764.5200060163226, 340.00571027483255, 1], [0.0, 0.0,2], [0.0, 0.0, 2], [0.0, 0.0, 2], [0.0, 0.0, 2],\
               [615.348577444894, 386.66285313197545, 1], [0.0, 0.0, 2]]}

Here keypoints is a list of points, and each point is an array of the form [x, y, confidence_score].
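
A record in this form can be split into coordinates and scores with a few lines of Python. The following is a minimal sketch, assuming the pkl file holds a list of such records; load_records and split_keypoints are hypothetical helper names, not functions from the repository. Because the sample above lists 17 points while num_keypoints is 24, the sketch relies on the keypoints list itself rather than on that field.

```python
import pickle


def load_records(pkl_path):
    """Load annotation records from a pickle file (assumed to hold a list)."""
    with open(pkl_path, "rb") as f:
        return pickle.load(f)


def split_keypoints(record):
    """Split a record's keypoints into coordinates and scores.

    Returns (coords, scores), where coords is a list of (x, y) tuples
    and scores is a list holding the third value of each point.
    """
    coords, scores = [], []
    for x, y, score in record["keypoints"]:
        coords.append((float(x), float(y)))
        scores.append(score)
    return coords, scores


if __name__ == "__main__":
    sample = {"keypoints": [[0.0, 0.0, 2], [747.5, 366.0, 1]]}
    print(split_keypoints(sample))
```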

2. MERL for action recognition

Pose and visual features are combined for action recognition. Data is loaded from a JSON file that contains the bounding box of the person in each frame.

Each element has the form:

{
"action": "LookatShelf", 
"keypoints": [], 
"id": "LookatShelf_33_1_crop_3243_3374_InspectProduct", 
"image": 
    {
    "url": "/mnt/hdd10tb/Users/andang/actions/video/LookatShelf/33_1_crop_3243_3374_InspectProduct/2.jpg", 
    "file_name": "2.jpg", 
    "width": 920, "height": 680
    }, 
"person_bbox": [418, 336, 559, 499]
}
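
Since each element describes a single frame, the frames have to be grouped into clips before a fixed number of frames can be fed to the action model. The sketch below shows one possible way to do this. It assumes that the JSON file holds a list of such elements, that "id" identifies the clip, and that the file name (for example "2.jpg") is the frame number; all function names are hypothetical.

```python
import json
from collections import defaultdict


def load_annotations(json_path):
    """Read the annotation file; assumed to hold a list of records."""
    with open(json_path, "r") as f:
        return json.load(f)


def frame_index(record):
    """Frame number parsed from a file name such as '2.jpg'."""
    return int(record["image"]["file_name"].split(".")[0])


def group_by_clip(records):
    """Group records by their 'id' field and sort each group by frame number."""
    clips = defaultdict(list)
    for rec in records:
        clips[rec["id"]].append(rec)
    return {cid: sorted(frames, key=frame_index) for cid, frames in clips.items()}


def first_window(frames, num_frames):
    """Return the first num_frames frames, or None if the clip is too short."""
    if len(frames) < num_frames:
        return None
    return frames[:num_frames]
```

Sorting by the parsed integer rather than by the raw file name keeps "10.jpg" after "2.jpg".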

Run code

Train COCO pose estimation

CUDA_VISIBLE_DEVICES=1 python exp/coco/train_coco_singleperson.py --batch-size 16 --epochs 10

Train MERL pose estimation

CUDA_VISIBLE_DEVICES=1 python exp/merl/train_merl_singleperson.py --batch-size 16 --epochs 10

Train MERL action recognition

CUDA_VISIBLE_DEVICES=2 python exp/merl/train_merl_video.py \
    --num-frames 4 \
    --anno-path /mnt/hdd10tb/Users/andang/actions/train_2.json \
    --val-anno-path /mnt/hdd10tb/Users/andang/actions/test_2.json

Model

action

File reception.py

Pose estimation model.

Input: a list of images, each with shape (height, width, channels).
Output: the predicted pose.

File action.py

Input: a list of image sequences, each with shape (num_frames, height, width, channels).
Output: the predicted action, one-hot encoded.
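
To turn such an output vector into a readable label, the index of the largest entry can be looked up in a list of class names. This is a minimal sketch; decode_action is a hypothetical helper, and the class list in the usage example only reuses action names that appear in the sample data above, so its order and completeness are assumptions.

```python
def decode_action(scores, class_names):
    """Return the class name whose score is highest."""
    if len(scores) != len(class_names):
        raise ValueError("scores and class_names must have the same length")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return class_names[best]


if __name__ == "__main__":
    # Illustrative class list; the real label order may differ.
    names = ["LookatShelf", "RetractFromShelf", "InspectProduct"]
    print(decode_action([0.0, 1.0, 0.0], names))
```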

File action_2D.py is like action.py with some code removed; it runs for 2D pose only.

File action_pose.py: the model predicts the action from the output of the pose model.

Input:
- y: pose coordinates, time distributed, shape = (1, num_frames, num_joints, 2), where 2 is the number of coordinates (x, y)
- p: visibility probability of each point, shape = (1, num_frames, num_joints, 1)
- hs: heat maps, shape = (1, num_frames, 32, 32, num_joints)
- xb1: visual features output by the Stem model, shape = (1, num_frames, 32, 32, 576)

Output: the predicted action.
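
The four input shapes depend only on num_frames and num_joints, so they can be written down once and checked before calling the model. The sketch below does this in plain Python; both function names are hypothetical, and the values 32 and 576 are copied from the shapes listed above.

```python
def action_pose_input_shapes(num_frames, num_joints, fmap=32, channels=576):
    """Expected shapes of the four inputs, keyed by the names used above."""
    return {
        "y": (1, num_frames, num_joints, 2),
        "p": (1, num_frames, num_joints, 1),
        "hs": (1, num_frames, fmap, fmap, num_joints),
        "xb1": (1, num_frames, fmap, fmap, channels),
    }


def mismatched_inputs(actual_shapes, expected_shapes):
    """Return the names whose actual shape differs from the expected one."""
    return [
        name
        for name, shape in expected_shapes.items()
        if tuple(actual_shapes.get(name, ())) != shape
    ]


if __name__ == "__main__":
    expected = action_pose_input_shapes(num_frames=4, num_joints=16)
    for name, shape in expected.items():
        print(name, shape)
```

With array inputs, actual_shapes could be built from each array's shape attribute, for example {"y": y.shape, "p": p.shape, "hs": hs.shape, "xb1": xb1.shape}.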

Validation

CUDA_VISIBLE_DEVICES=1 python exp/merl/val_merl_video.py

Citing

Please cite our paper if this software (or any part of it) or weights are useful for you.

@InProceedings{Luvizon_2018_CVPR,
  author = {Luvizon, Diogo C. and Picard, David and Tabia, Hedi},
  title = {2D/3D Pose Estimation and Action Recognition Using Multitask Deep Learning},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2018}
}

License

MIT License

