AttParseNet is a simple convolutional neural network for facial attribute recognition. It is unique and novel because it combines the tasks of facial attribute recognition (predicting which attributes are present in a given facial image) and facial semantic segmentation (labeling each pixel in an image where an attribute occurs).

NateThom/AttParseNet

AttParseNet

Welcome to the AttParseNet code repository! This work is the result of a research project completed by Nathan Thom and Emily Hand at the University of Nevada, Reno - Machine Perception Laboratory.

If this work is a benefit to your own efforts, please cite our paper here:

Figure: Example of the AttParseNet architecture

AttParseNet is a simple convolutional neural network for facial attribute recognition. Its novelty is that it combines two tasks: facial attribute recognition (predicting which attributes are present in a given facial image) and facial semantic segmentation (labeling each pixel in an image where an attribute occurs). Asking the network to predict both which attributes occur and where they occur increases attribute prediction accuracy. The segmentation task is used only during training; at run time no segment labels are needed.

Here's how it works:

  • Collect or download a dataset with facial attributes labeled (we use CelebA: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)
  • Run automatic facial landmark detection software to collect 68 landmark points from each image in your dataset. We use:
  • Create semantic segmentation labels for each input image/attribute pair in your dataset. If you're using CelebA, this will result in 8,103,960 labels (202,599 input images, each with 40 attribute labels -> 202,599 * 40 = 8,103,960)
    • Semantic segmentation labels are single-channel, black-and-white images
    • Black (pixel value of 0) denotes any pixels where the attribute does not occur
    • White (pixel value of 255) denotes regions where the attribute does occur
  • Train a CNN with a joint learning architecture (i.e. two loss functions)
    • Our two loss functions are Binary Cross Entropy with Logits (attribute prediction loss) and Mean Squared Error (segmentation loss)
    • Simply calculate both loss values and sum them
    • You can use a CNN of whatever complexity you desire. We use a fairly vanilla architecture with 6 convolution layers, 1 pooling layer, and 1 fully connected layer
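The joint loss described above can be sketched without any framework. This is a minimal illustration, not the code in attparsenet.py: the function names are hypothetical, the two losses are summed with equal weight as the list states, and the segment maps are assumed to be scaled to [0, 1] (the README does not say whether the 0/255 labels are rescaled). In PyTorch the two terms correspond to `torch.nn.BCEWithLogitsLoss` and `torch.nn.MSELoss`.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross entropy computed directly from raw logits."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Numerically stable form: max(x, 0) - x*y + log(1 + exp(-|x|))
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def mse(predicted, target):
    """Mean squared error over flattened pixel values."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def joint_loss(attr_logits, attr_targets, seg_pred, seg_target):
    """Unweighted sum of the attribute loss and the segmentation loss."""
    return bce_with_logits(attr_logits, attr_targets) + mse(seg_pred, seg_target)


# Toy example: 2 attributes and a flattened 4-pixel segment map
# (values assumed to be scaled to [0, 1]).
loss = joint_loss(
    attr_logits=[0.0, 2.0],
    attr_targets=[1.0, 1.0],
    seg_pred=[0.0, 1.0, 0.5, 1.0],
    seg_target=[0.0, 1.0, 1.0, 1.0],
)
```

Because the segmentation term only contributes to the training loss, a trained model can be evaluated on the attribute logits alone.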

Figure: Example of a segment label
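As a rough illustration of what such a label contains, the sketch below builds a single-channel mask with 255 where an attribute occurs and 0 elsewhere. It is a simplification under stated assumptions: the region is taken to be the bounding box of a few landmark points, whereas the regions formed in attparsenet_segments.py may have other shapes, and `rectangle_mask` and the sample points are hypothetical.

```python
def rectangle_mask(height, width, points):
    """Return a single-channel mask: 255 inside the bounding box of
    `points` (clipped to the image), 0 everywhere else."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, x1 = max(min(xs), 0), min(max(xs), width - 1)
    y0, y1 = max(min(ys), 0), min(max(ys), height - 1)

    mask = [[0] * width for _ in range(height)]  # start fully black
    for row in range(y0, y1 + 1):
        for col in range(x0, x1 + 1):
            mask[row][col] = 255  # white: attribute occurs here
    return mask


# Hypothetical (x, y) landmark points outlining a mouth on a 6x8 image.
mouth_points = [(2, 3), (5, 3), (3, 4), (4, 4)]
mask = rectangle_mask(6, 8, mouth_points)
```

One such mask would be produced per image/attribute pair.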

What is in this repository:

  • attparsenet.py
    • PyTorch implementation
    • Reads in data, trains a new or pretrained model, tests the model
  • attparsenet_landmark_labeler.py
    • Generates the facial landmark labels and stores them in a CSV file
    • Uses OpenCV and OpenFace to automatically detect 68 facial landmarks
  • attparsenet_regions.py
    • Helper functions for forming foundational regions of the face from facial landmarks
    • These regions are used in "attparsenet_segments.py" to form the regions where each attribute occurs
  • attparsenet_segment_labeler.py
    • Generates and stores the attribute segment label images
    • Uses the "attparsenet_segments.py" file to form the regions where each attribute occurs on a given face
  • attparsenet_segments.py
    • Helper functions for forming the segments of a face where each attribute occurs
    • Uses the "attparsenet_regions.py" file to form the segments where each attribute occurs on a given face
    • These segments are used in "attparsenet_segment_labeler.py" to generate the segment label images
  • attparsenet_utils.py
    • Python argparse file
    • Stores helpful configuration items (file paths, number of training epochs, etc.) in one easy to access place
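A configuration module of the kind described for attparsenet_utils.py might look like the sketch below. Every flag name and default here is hypothetical and chosen only for illustration; the actual options are defined in the repository file.

```python
import argparse


def build_parser():
    """Collect paths and training settings in one place (hypothetical flags)."""
    parser = argparse.ArgumentParser(
        description="Hypothetical AttParseNet-style configuration"
    )
    parser.add_argument("--image-dir", default="./images",
                        help="directory containing the input face images")
    parser.add_argument("--landmark-csv", default="./landmarks.csv",
                        help="CSV file holding the facial landmark labels")
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of training epochs")
    parser.add_argument("--batch-size", type=int, default=32,
                        help="number of images per training batch")
    parser.add_argument("--pretrained", action="store_true",
                        help="start from a pretrained model instead of a new one")
    return parser


# Passing an explicit list keeps the example independent of sys.argv.
args = build_parser().parse_args(["--epochs", "5"])
```

Other scripts can then import the parser and read values such as `args.epochs` instead of hard-coding them.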

Figure: Layout of the 68 facial landmarks
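To make the landmark storage step concrete, here is a minimal sketch of writing 68 (x, y) points to a CSV row and reading them back. The row layout (file name followed by x0, y0, x1, y1, ...) and the helper names are assumptions for illustration; the layout actually written by attparsenet_landmark_labeler.py may differ.

```python
import csv
import io

NUM_LANDMARKS = 68


def landmarks_to_row(filename, points):
    """Flatten 68 (x, y) points into one row: [filename, x0, y0, x1, y1, ...]."""
    if len(points) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} points, got {len(points)}")
    row = [filename]
    for x, y in points:
        row.extend([x, y])
    return row


def row_to_landmarks(row):
    """Inverse of landmarks_to_row: rebuild the list of (x, y) integer pairs."""
    filename, values = row[0], [int(v) for v in row[1:]]
    return filename, list(zip(values[0::2], values[1::2]))


# Round trip through an in-memory CSV with made-up coordinates.
points = [(i, 2 * i) for i in range(NUM_LANDMARKS)]
buffer = io.StringIO()
csv.writer(buffer).writerow(landmarks_to_row("000001.jpg", points))
buffer.seek(0)
name, restored = row_to_landmarks(next(csv.reader(buffer)))
```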
