Welcome to the AttParseNet code repository! This work is the result of a research project completed by Nathan Thom and
Emily Hand at the University of Nevada, Reno - Machine Perception Laboratory.
If this work benefits your own efforts, please cite our paper:
Example of AttParseNet Architecture
AttParseNet is a simple convolutional neural network for facial attribute recognition. It is unique and novel because it
combines the tasks of facial attribute recognition (predicting which attributes are present in a given facial image)
and facial semantic segmentation (labeling each pixel in an image where an attribute occurs). The beauty of this
approach is that attribute prediction accuracy increases when the network is asked both which attributes occur and
where they occur. The segmentation task is only used during training; at inference time, no segment labels are needed.
Here's how it works:
- Collect or download a dataset with facial attributes labeled (We use Celeba: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)
- Run automatic facial landmark detection software to collect 68 landmark points from each image in your dataset. We use:
- The dlib landmark detector: http://dlib.net/
- OpenFace: https://github.com/TadasBaltrusaitis/OpenFace
- A hand-annotation tool: https://github.com/NateThom/Face-Annotation-Tool
- Create semantic segmentation labels for each input image/attribute pair in your dataset. If you're using CelebA,
this will result in 8,104,000 labels (202,600 input images, each with 40 attribute labels -> 202,600 * 40 = 8,104,000)
- Semantic segmentation labels are single channel, black and white images
- Black (pixel value of 0) denotes any pixels where the attribute does not occur
- White (pixel value of 255) denotes regions where the attribute does occur
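As a minimal sketch of what such a label looks like, the snippet below builds a single-channel 0/255 mask with NumPy. The helper name and the rectangular region are illustrative assumptions; the actual repository derives attribute regions from facial landmarks rather than a bounding box.

```python
import numpy as np

def make_segment_label(height, width, region):
    """Hypothetical helper: build a single-channel segment label.

    region = (top, left, bottom, right) bounding the pixels where the
    attribute occurs. The real labels are polygonal regions derived
    from facial landmarks; a rectangle is used here for illustration.
    """
    mask = np.zeros((height, width), dtype=np.uint8)  # black (0) = attribute absent
    top, left, bottom, right = region
    mask[top:bottom, left:right] = 255                # white (255) = attribute present
    return mask

label = make_segment_label(6, 6, (1, 1, 4, 4))
```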
- Train a CNN with a joint learning architecture (i.e., two loss functions)
- Our two loss functions are Binary Cross Entropy with Logits (attribute prediction loss) and Mean Squared Error (segmentation loss)
- Simply calculate both loss values and sum them
- You can use a CNN of whatever complexity you desire. We use a fairly vanilla architecture with 6 convolution layers, 1 pooling layer, and 1 fully connected layer
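The joint loss described above can be sketched in a few lines. The following is a pure-NumPy stand-in for PyTorch's `BCEWithLogitsLoss` and `MSELoss` (the function names here are illustrative, not the repository's actual API); in the PyTorch implementation you would instantiate those two loss modules and sum their outputs the same way.

```python
import numpy as np

def bce_with_logits(logits, targets):
    # Numerically stable binary cross entropy computed on raw logits,
    # matching what torch.nn.BCEWithLogitsLoss does.
    return np.mean(np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))

def mse(pred, target):
    # Mean squared error over the predicted segment maps.
    return np.mean((pred - target) ** 2)

def joint_loss(attr_logits, attr_labels, seg_pred, seg_labels):
    # Total training loss: attribute prediction loss + segmentation loss.
    return bce_with_logits(attr_logits, attr_labels) + mse(seg_pred, seg_labels)
```

Because the two terms are simply summed, gradients from both tasks flow through the shared convolutional trunk during backpropagation.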
Example of segment label
What is in this repository:
- attparsenet.py
- PyTorch implementation
- Reads in data, trains a new or pretrained model, tests the model
- attparsenet_landmark_labeler.py
- Generates and stores the facial landmark labels to CSV
- Uses OpenCV and OpenFace to automatically detect 68 facial landmarks
- attparsenet_regions.py
- Helper functions for forming foundational regions of the face from facial landmarks
- These regions are used in "attparsenet_segments.py" to form the regions where each attribute occurs
- attparsenet_segment_labeler.py
- Generates and stores the attribute segment label images
- Uses the "attparsenet_segments.py" file to form the regions where each attribute occurs on a given face
- attparsenet_segments.py
- Helper functions for forming the segments of a face where each attribute occurs
- Uses the "attparsenet_regions.py" file to form the segments where each attribute occurs on a given face
- These segments are used in "attparsenet_segment_labeler.py" to generate the segment label images
- attparsenet_utils.py
- Python argparse file
- Stores helpful configuration items (file paths, number of training epochs, etc.) in one easy-to-access place
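A configuration file of this kind typically looks like the sketch below. The argument names and defaults here are assumptions for illustration, not the actual flags defined in attparsenet_utils.py; consult that file for the real options.

```python
import argparse

def get_args(argv=None):
    # Hypothetical sketch of an argparse-based configuration module;
    # the actual argument names in attparsenet_utils.py may differ.
    parser = argparse.ArgumentParser(description="AttParseNet configuration")
    parser.add_argument("--image_dir", default="data/celeba/images",
                        help="path to the input face images")
    parser.add_argument("--segment_dir", default="data/celeba/segments",
                        help="path to the attribute segment labels")
    parser.add_argument("--epochs", type=int, default=22,
                        help="number of training epochs (assumed default)")
    parser.add_argument("--batch_size", type=int, default=32)
    return parser.parse_args(argv)
```

Passing `argv=None` makes the function read `sys.argv` as usual, while an explicit list (e.g. `get_args(["--epochs", "5"])`) is convenient for testing.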