All in one paper implementation
All-in-One convolutional network for face analysis presents a multipurpose algorithm that performs face detection, face alignment, pose estimation, gender recognition, smile detection, age estimation and face recognition simultaneously, using a single convolutional neural network (CNN).
To run the project, create a conda virtual environment with python=3.6 and install the dependencies with pip. Run the following commands inside the project directory.
conda create -n <environment_name> python=3.6
conda activate <environment_name>
After creating and activating the environment, install the dependencies with the following commands:
pip install keras
pip install tensorflow
pip install scikit-learn
pip install pandas
pip install opencv-python
pip install dlib
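To confirm that the environment is set up, a small script can check that each dependency is importable without actually importing it. This is a sketch and not part of the repository; the script and its missing_packages helper are hypothetical. Note that some pip package names differ from their import names: scikit-learn is imported as sklearn and opencv-python as cv2.

```python
import importlib.util

# pip package name -> module name used in `import` statements.
# The two differ for scikit-learn (sklearn) and opencv-python (cv2).
REQUIRED_MODULES = {
    "keras": "keras",
    "tensorflow": "tensorflow",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "opencv-python": "cv2",
    "dlib": "dlib",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip package names whose top-level modules cannot be found."""
    return [
        package
        for package, module in required.items()
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + " ".join(missing))
    else:
        print("All dependencies are installed.")
```

Run it with python from inside the activated environment; any package it reports can be installed again with pip.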
These datasets are used for training the network.
- AFLW dataset provides a large-scale collection of annotated face images gathered from the web, exhibiting a large variety in appearance (e.g., pose, expression, ethnicity, age, gender) as well as general imaging and environmental conditions.
- IMDB-WIKI dataset is the largest publicly available dataset of face images with gender and age labels for training.
- CelebA dataset is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations.
- Adience dataset attempts to capture all the variations in appearance, noise, pose, lighting and more that can be expected of images taken without careful preparation or posing.
- Extended Cohn-Kanade dataset (CK+) is a public benchmark dataset for action unit and emotion recognition. It comprises a total of 593 sequences across 123 subjects, each ranging from a neutral to a peak expression.
- The Yale Face Database (size 6.4MB) contains 165 grayscale images in GIF format of 15 individuals. There are 11 images per subject, one per different facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink.
The network architecture can be found in this paper. The model is built from deep convolutional layers in Keras and is defined in nets/model.py.
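The core idea of an all-in-one network is hard parameter sharing: every task reuses the same shared layers and adds its own small output head. The sketch below illustrates only that idea in plain Python. It is not the code in nets/model.py; dense layers stand in for the convolutional layers, and the MultiTaskNet class, the task names and the output sizes are hypothetical choices for illustration.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: returns W @ x + b for list-based vectors."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MultiTaskNet:
    """Hypothetical shared trunk with one output head per task.

    Dense layers stand in for the convolutional layers of the real model.
    """

    def __init__(self, in_dim, hidden_dim, tasks, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        # Shared parameters, reused by every task.
        self.shared_w = init(hidden_dim, in_dim)
        self.shared_b = [0.0] * hidden_dim
        # Task-specific heads: task name -> (weights, bias, output kind).
        self.heads = {
            name: (init(out_dim, hidden_dim), [0.0] * out_dim, kind)
            for name, (out_dim, kind) in tasks.items()
        }

    def shared_features(self, x):
        return relu(dense(x, self.shared_w, self.shared_b))

    def predict(self, x, task):
        weights, bias, kind = self.heads[task]
        z = dense(self.shared_features(x), weights, bias)
        # Binary tasks (e.g. gender, smile) get a sigmoid; regression tasks stay linear.
        return [sigmoid(v) for v in z] if kind == "binary" else z


# Hypothetical task set and output sizes, chosen only for illustration.
TASKS = {
    "gender": (1, "binary"),
    "smile": (1, "binary"),
    "age": (1, "regression"),
    "pose": (3, "regression"),  # e.g. roll, pitch, yaw
}

if __name__ == "__main__":
    net = MultiTaskNet(in_dim=8, hidden_dim=16, tasks=TASKS)
    features = [0.1 * i for i in range(8)]  # stand-in for a preprocessed face crop
    for task in TASKS:
        print(task, net.predict(features, task))
```

Because the shared layers are computed once and reused by every head, adding a task costs only one extra head; this is also why freezing the shared layers lets a new head be trained without disturbing the tasks learned earlier.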
The model can be trained on the age, gender, detection, visibility, pose, landmarks, identity, smile and eye_glasses labels by running the train module inside the project directory.
For example, the following bash command trains the network on the AFLW dataset for face detection:
python -m train --dataset aflw --images_path /path-to-dataset-images/ \
--label detection --batch_size 100 --steps 500 --ol output-of-large-model --os output-of-small-model --epochs 10;
If you have nvidia-docker2 installed, you can use Dockerfile.gpu to build a GPU-enabled image:
docker build --file Dockerfile.gpu . -t allinone:latest
You can also build an image with only CPU dependencies to evaluate the provided models:
docker build --file Dockerfile . -t allinone:latest
The container configuration exposes port 50051 for the gRPC endpoint. To run the container with the gRPC endpoint available, call:
# This starts the grpc endpoint alone.
docker run -it --rm -p 50051:50051 allinone:latest
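Once the container is running, a quick way to confirm that port 50051 is reachable from the host is to open a plain TCP connection to it. This minimal sketch only checks that something is listening on the port; it does not speak gRPC and cannot tell whether the model service is healthy. The port_is_open helper is hypothetical.

```python
import socket


def port_is_open(host="localhost", port=50051, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only checks that something is listening; it does not speak gRPC.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, unreachable host, ...
        return False


if __name__ == "__main__":
    print("gRPC port reachable:", port_is_open())
```

A real client would instead use the gRPC stubs generated from the service definition, which this README does not describe.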
The train module accepts the following options:
- --images_path - Path to the dataset images.
- --dataset - Dataset to train the model on: imdb, wiki, celeba, yale, ck+ or aflw. This choice also determines which layers are trained.
- --label - Classification/prediction task to train the model for: age, gender, detection, visibility, pose, landmarks, identity, smile or eye_glasses.
- --epochs - Number of training epochs.
- --batch_size - Number of images per training batch.
- --resume - Resume training from the previous checkpoint, if one is available.
- --steps - Steps per epoch.
- --ol - Output filename for the large model (the model with all layers).
- --os - Output filename for the small model (the model with only the layers trained in the current run).
- --load_model -
- --freeze - If true, freezes the shared layers of the model.
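When launching many training runs from a script, it helps to validate --dataset and --label against the documented choices before starting a long job. The sketch below builds the argument list for python -m train; the build_train_command helper and the default output names are hypothetical, and the numeric defaults simply mirror the example training command above rather than the script's own defaults. It leaves out --resume, --load_model and --freeze because this README does not document their argument format.

```python
import shlex

# Choices as documented for --dataset and --label.
DATASETS = {"imdb", "wiki", "celeba", "yale", "ck+", "aflw"}
LABELS = {"age", "gender", "detection", "visibility", "pose",
          "landmarks", "identity", "smile", "eye_glasses"}


def build_train_command(dataset, images_path, label, batch_size=100,
                        steps=500, epochs=10, large_model="large_model",
                        small_model="small_model"):
    """Return the argv list for `python -m train`, validating the choices."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}")
    if label not in LABELS:
        raise ValueError(f"unknown label {label!r}")
    return [
        "python", "-m", "train",
        "--dataset", dataset,
        "--images_path", images_path,
        "--label", label,
        "--batch_size", str(batch_size),
        "--steps", str(steps),
        "--epochs", str(epochs),
        "--ol", large_model,
        "--os", small_model,
    ]


if __name__ == "__main__":
    cmd = build_train_command("aflw", "/path-to-dataset-images/", "detection")
    # Print a shell-safe version of the command.
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To launch training from inside the project directory:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the list directly to subprocess.run avoids shell quoting problems with paths that contain spaces.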
Results so far:
- Gender estimation (~89% accuracy)
- Face detection (~90% accuracy)
- Smile detection (~91% accuracy)
- Age prediction (~4% accuracy)
Future work:
- Use the CASIA and MORPH datasets to further train the model on the age, detection and gender labels.
- Implement pose estimation, landmark detection and face recognition.