implement "2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning"

pminhtam/2D-3D_Multitask_Deep_Learning

2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning

Cloned from https://github.com/dluvizon/deephar, with some added features.

Adds a dataloader and training code for the MERL and COCO datasets.

COCO dataset

Used for pose estimation. Images and pose labels are read with pycocotools. Each image is cropped and resized to (256, 256).
Each pose has 16 keypoints.
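
When an image is cropped and resized, the keypoint labels need the same transform so they still line up with the pixels. The following is a minimal sketch of that mapping, not the repository's code: the function name `keypoints_to_crop` is hypothetical, and the box is assumed to use the COCO `[x, y, width, height]` convention.

```python
def keypoints_to_crop(keypoints, bbox, out_size=(256, 256)):
    """Map [x, y, flag] keypoints from image coordinates into a resized crop.

    Assumption: bbox is [x, y, width, height] (COCO convention).
    """
    x0, y0, w, h = bbox
    sx = out_size[0] / w  # horizontal scale factor
    sy = out_size[1] / h  # vertical scale factor
    # Shift by the box origin, then scale; the third value is passed through.
    return [[(x - x0) * sx, (y - y0) * sy, flag] for x, y, flag in keypoints]


# Illustrative values: a 128x64 box at (100, 50) resized to 256x256.
print(keypoints_to_crop([[164.0, 82.0, 1]], [100, 50, 128, 64]))
# prints [[128.0, 128.0, 1]]
```

Note that scaling width and height independently changes the aspect ratio; whether the repository pads the box to a square first is not stated here.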

MERL dataset

1. MERL for pose estimation

The data is loaded from a pkl file. Each element has the form:

{   'img_paths': '17_2_crop_1077_1108_RetractFromShelf-0004.jpg', 
    'img_width': 920,
    'img_height': 680, 
    'image_id': 1,
    'bbox': [602.7314290727887, 324.3657157897949, 218.69714235578266, 119.07430627005441],
    'num_keypoints': 24,
    'keypoints': [[0.0, 0.0, 2], [747.5, 366.0285714285714, 1], [0.0, 0.0, 2], [741.4201450892857, 385.51311383928567, 1],\
               [0.0,0.0, 2], [704.9828591482981, 418.7314265659877, 1], [782.9857142857143, 393.3,0],\
                [632.4342917306083, 353.47713884626114, 1], [0.0, 0.0, 2], [690.2628631591797, 350.8485674176897, 1],\
                [764.5200060163226, 340.00571027483255, 1], [0.0, 0.0,2], [0.0, 0.0, 2], [0.0, 0.0, 2], [0.0, 0.0, 2],\
               [615.348577444894, 386.66285313197545, 1], [0.0, 0.0, 2]]}

Here keypoints is a list of points; each point is an array of the form [x, y, confidence_score].
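
As a rough illustration of working with this record layout, the sketch below separates the coordinates from the third value of each point. The helper name `split_keypoints` is hypothetical, and the record is a shortened copy of the example above; a real record would typically come from `pickle.load` on the pkl file.

```python
def split_keypoints(record):
    """Separate a record's keypoints into (x, y) pairs and third-value flags."""
    coords = [(x, y) for x, y, _ in record["keypoints"]]
    flags = [flag for _, _, flag in record["keypoints"]]
    return coords, flags


# Shortened copy of the example record (only three of its keypoints).
record = {
    "img_paths": "17_2_crop_1077_1108_RetractFromShelf-0004.jpg",
    "keypoints": [
        [0.0, 0.0, 2],
        [747.5, 366.0285714285714, 1],
        [782.9857142857143, 393.3, 0],
    ],
}
coords, flags = split_keypoints(record)
print(coords[1], flags)
```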

2. MERL for action recognition

Pose and visual features are combined for action recognition. The data is loaded from a JSON file that holds the bounding box of each frame.

Each element has the form:

{
"action": "LookatShelf", 
"keypoints": [], 
"id": "LookatShelf_33_1_crop_3243_3374_InspectProduct", 
"image": 
    {
    "url": "/mnt/hdd10tb/Users/andang/actions/video/LookatShelf/33_1_crop_3243_3374_InspectProduct/2.jpg", 
    "file_name": "2.jpg", 
    "width": 920, "height": 680
    }, 
"person_bbox": [418, 336, 559, 499]
}
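
Since each element describes a single frame, a video loader has to gather frames into clips before sampling `--num-frames` of them. The sketch below shows one way to do that, under three assumptions that are not confirmed by the repository: the JSON file holds a list of such elements, frames of the same clip share the same `id`, and `file_name` is a frame number followed by an extension. The helper names are hypothetical.

```python
import json
from collections import defaultdict


def frame_index(entry):
    """Frame number taken from a file name such as '2.jpg' (assumed numeric)."""
    return int(entry["image"]["file_name"].split(".")[0])


def group_clips(entries):
    """Group per-frame entries by clip id and sort each clip by frame number."""
    clips = defaultdict(list)
    for entry in entries:
        clips[entry["id"]].append(entry)
    return {cid: sorted(frames, key=frame_index) for cid, frames in clips.items()}


# Two illustrative entries reusing the id and bbox from the example above.
CID = "LookatShelf_33_1_crop_3243_3374_InspectProduct"
entries = json.loads("""[
  {"action": "LookatShelf", "id": "%s",
   "image": {"file_name": "2.jpg"}, "person_bbox": [418, 336, 559, 499]},
  {"action": "LookatShelf", "id": "%s",
   "image": {"file_name": "1.jpg"}, "person_bbox": [418, 336, 559, 499]}
]""" % (CID, CID))

clips = group_clips(entries)
print([f["image"]["file_name"] for f in clips[CID]])
# prints ['1.jpg', '2.jpg']
```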

Run code

Train COCO pose estimation

CUDA_VISIBLE_DEVICES=1 python exp/coco/train_coco_singleperson.py \
    --batch-size 16 --epochs 10

Train MERL pose estimation

CUDA_VISIBLE_DEVICES=1 python exp/merl/train_merl_singleperson.py \
    --batch-size 16 --epochs 10

Train MERL action recognition

CUDA_VISIBLE_DEVICES=2 python exp/merl/train_merl_video.py \
    --num-frames 4 --anno-path /mnt/hdd10tb/Users/andang/actions/train_2.json \
    --val-anno-path /mnt/hdd10tb/Users/andang/actions/test_2.json

Model

action

File reception.py: pose estimation model

  • input: a list of images, each with shape = (height, width, channel)
  • output: the predicted pose

File action.py has:

  • input: a list of images with shape = (num_frames, height, width, channel)
  • output: the predicted action, one-hot encoded
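
To turn a one-hot (or per-class score) output back into an action name, take the index of the largest value. This is a minimal sketch: `decode_action` is a hypothetical helper, and the label order below is an assumption built from action names that appear in this README, not the order used by the training code.

```python
def decode_action(scores, labels):
    """Return the label whose score is highest (argmax over the class axis)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]


# Hypothetical label order; the real order depends on the training code.
LABELS = ["LookatShelf", "RetractFromShelf", "InspectProduct"]
print(decode_action([0.1, 0.7, 0.2], LABELS))
# prints RetractFromShelf
```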

File action_2D.py is like action.py with some code removed; it runs for 2D pose only.

File action_pose.py: model that predicts the action from the output of the pose model

  • input
    • y: pose coordinates, time-distributed, shape = (1, num_frames, num_joints, 2), where 2 is the number of coordinates (x, y)
    • p: visibility probability of each point, shape = (1, num_frames, num_joints, 1)
    • hs: heat maps, shape = (1, num_frames, 32, 32, num_joints)
    • xb1: visual features output by the Stem model, shape = (1, num_frames, 32, 32, 576)
  • output: the predicted action
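
The four input shapes depend only on `num_frames` and `num_joints`, so they can be written down once and checked before calling the model. The sketch below is a hypothetical helper, not part of the repository. The example values are assumptions: 4 frames as in the training command above, and 16 joints as in the COCO section.

```python
def action_pose_input_shapes(num_frames, num_joints):
    """Expected shapes of the action_pose.py inputs, as listed above."""
    return {
        "y": (1, num_frames, num_joints, 2),        # (x, y) per joint
        "p": (1, num_frames, num_joints, 1),        # visibility probability
        "hs": (1, num_frames, 32, 32, num_joints),  # one heat map per joint
        "xb1": (1, num_frames, 32, 32, 576),        # Stem visual features
    }


shapes = action_pose_input_shapes(num_frames=4, num_joints=16)
for name, shape in shapes.items():
    print(name, shape)
```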

Validation

CUDA_VISIBLE_DEVICES=1 python exp/merl/val_merl_video.py

Citing

Please cite our paper if this software (or any part of it) or weights are useful for you.

@InProceedings{Luvizon_2018_CVPR,
  author = {Luvizon, Diogo C. and Picard, David and Tabia, Hedi},
  title = {2D/3D Pose Estimation and Action Recognition Using Multitask Deep Learning},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2018}
}

License

MIT License
