Image Generation

This repository contains PyTorch implementations of various generative models that generate images on multiple datasets. I wanted to use datasets other than MNIST to make it a bit more interesting, so I tried some models on a dataset of paintings by a painter I like a lot (Odilon Redon), and on a dataset of all pokemon sprites from every game. The datasets are covered in more detail in the following sections.

Table of contents

  • Getting started
  • Prerequisites
  • Usage

Each model resides in a folder of its own. To run a model, first clone the repository:

git clone https://github.com/dvidbruhm/PokeGeneration.git

Then, while in the folder of the model you wish to run:

python3 <model name>.py

For example, if you wish to run the DCGAN model, cd to the DCGAN folder, then:

python3 DCGAN.py

The results for every epoch are saved (by default) in a folder named results_<dataset name>/. For every epoch there is an example of generated images and a plot of the loss. At the end of training, all models are saved in this folder as well.

Customization

Every model folder has a file named hyperparameters.py. If you want to use different parameters for a model, modify any parameters in this file to your liking and rerun the script. The names of the parameters should be mostly self-explanatory. Note that I use a file instead of command line options to modify parameters because I find it more convenient.
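As an illustration, a hyperparameters.py file might look like the sketch below. The parameter names here are hypothetical; the actual names and values differ from model to model.

# Hypothetical example of a hyperparameters.py file; the real files in this
# repository use their own names and values.
DATASET = "MNIST"                # which dataset to train on
IMAGE_SIZE = 32                  # images are resized to IMAGE_SIZE x IMAGE_SIZE
BATCH_SIZE = 64                  # number of images per training batch
LATENT_SIZE = 100                # length of the generator's random input vector
LEARNING_RATE = 0.0002           # learning rate of the optimizer
NB_EPOCHS = 100                  # number of passes over the whole dataset
RESULTS_PATH = "results_MNIST/"  # where images, plots and models are saved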

Datasets

The paintings and pokemon datasets used can be found on this Google Drive. MNIST and FASHIONMNIST will be downloaded automatically if you don't have them.

Images from all datasets are resized to either 32x32 or 64x64 pixels before being used.
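As an illustration (the exact preprocessing code in this repository may differ), a resize like this can be done with torchvision transforms:

from torchvision import transforms

# Resize every image to 64x64, convert it to a tensor and normalize the
# pixel values to [-1, 1] (three channels, for the color datasets).
preprocess = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])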

MNIST

MNIST is a dataset containing 60000 grayscale images of handwritten digits. There are 10 classes (the digits 0 to 9) and all images have been normalized and centered to fit into a 28x28 pixel bounding box. It is a standard dataset for quickly checking that a model works.

Fashion MNIST

FASHIONMNIST is an MNIST-like dataset of fashion products. It also has 60000 grayscale 28x28 pixel images and 10 classes (T-shirt, trouser, pullover, ...). It was created because MNIST can be too easy a test for models, and because its creators consider MNIST overused.

Paintings

This is a custom-made dataset containing all 590 paintings by Odilon Redon. All images have been resized (and cropped if the original image wasn't square) to 64x64 pixels. The images in this dataset are in color, so they have three channels instead of only one like MNIST and FASHIONMNIST.

Pokemon

This is also a custom-made dataset, containing all 4744 pokemon sprites from every game. The background of the sprites has been changed to white for uniformity (some of them were black or pink). The images are in color, so they have three channels.

Models

Below are the implementation details of each model and its results on every dataset.

Generative adversarial networks (GAN)

[Paper arxiv link]

In a GAN, two networks try to outperform each other. The first one, the generator, generates new data and tries to fool the second network, the discriminator, into thinking the data is real (from the dataset) rather than newly generated.

The generator is a network that takes a vector of random numbers as input and outputs an image. The space this random vector is drawn from is called the latent space, and the vector associated with an image is called the latent representation of that image.
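As a rough sketch (not necessarily the exact architecture used in this repository), a fully connected generator in PyTorch could look like this, mapping a 100-dimensional latent vector to a flattened 28x28 image:

import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a latent vector z to a flattened 28x28 image with pixels in [-1, 1]."""
    def __init__(self, latent_size=100, image_size=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_size, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, image_size),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

# Sample a batch of 64 latent vectors from a Gaussian and generate fake images.
z = torch.randn(64, 100)
fake_images = Generator()(z)  # shape: (64, 784)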

The discriminator is a network that takes an image as input and returns a single number between 0 and 1. This number represents the probability that the image is real. If the discriminator outputs a 1, it is confident that the image is real; if it outputs a 0, it is confident that the image is fake (generated by the generator). Anything in between means it is not sure whether the image is real or fake.
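Again purely as an illustration, a matching fully connected discriminator could be:

import torch.nn as nn

class Discriminator(nn.Module):
    """Maps a flattened 28x28 image to the probability that it is real."""
    def __init__(self, image_size=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(image_size, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),  # output a probability between 0 and 1
        )

    def forward(self, x):
        return self.net(x)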

Let's take the MNIST dataset as an example. The goal of the generator is to produce new handwritten digits so close to real ones that the discriminator can't distinguish the generated digits from the original handwritten ones, and the goal of the discriminator is not to let itself be fooled by the generator. At every iteration, both networks are trained one after the other. The steps for one iteration are listed below, followed by a small training-loop sketch:

Training of the discriminator:

  • The generator takes in a batch of vectors of random numbers sampled from a Gaussian distribution and generates a batch of images
  • The discriminator takes in the batch of generated images and returns its predictions
  • The discriminator takes in a batch of real data from the dataset and returns its predictions
  • Update the discriminator according to this loss (the standard binary cross-entropy GAN loss): L_D = -[log(D(x)) + log(1 - D(G(z)))], where x is a real image and G(z) a generated one

Training of the generator:

  • The generator takes in a batch of vectors of random numbers sampled from a Gaussian distribution and generates a batch of images
  • The discriminator takes in the batch of generated images and returns its predictions
  • Update the generator according to this loss (the commonly used non-saturating form): L_G = -log(D(G(z)))
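Below is a minimal PyTorch sketch of one such training iteration, reusing the illustrative Generator and Discriminator classes from above; it is a simplification and the actual training code in each model folder may differ.

import torch
import torch.nn as nn

criterion = nn.BCELoss()
generator, discriminator = Generator(), Discriminator()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_one_iteration(real_images, latent_size=100):
    batch_size = real_images.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # Train the discriminator on real and generated images.
    z = torch.randn(batch_size, latent_size)
    fake_images = generator(z).detach()  # don't backprop into the generator here
    d_loss = (criterion(discriminator(real_images), real_labels)
              + criterion(discriminator(fake_images), fake_labels))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Train the generator to make the discriminator predict "real" on fakes.
    z = torch.randn(batch_size, latent_size)
    g_loss = criterion(discriminator(generator(z)), real_labels)  # -log(D(G(z)))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    return d_loss.item(), g_loss.item()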

In general, GANs can suffer from mode collapse or vanishing gradients. Some techniques I used to help the training process include:

DCGAN

[Paper arxiv link]

DCGAN is the same as a standard GAN, but the generator and discriminator are composed of convolutional layers instead of fully connected layers. Convolutional layers are better suited to images and are faster to train because they have fewer weights.
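To illustrate the idea (the layer sizes here are not necessarily those used in this repository), a DCGAN-style generator replaces the fully connected layers with transposed convolutions that progressively upsample the latent vector into an image:

import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    """Upsamples a latent vector to a 64x64 color image with transposed convolutions."""
    def __init__(self, latent_size=100, channels=3, features=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_size, features * 8, 4, 1, 0),   # 1x1 -> 4x4
            nn.BatchNorm2d(features * 8),
            nn.ReLU(),
            nn.ConvTranspose2d(features * 8, features * 4, 4, 2, 1),  # 4x4 -> 8x8
            nn.BatchNorm2d(features * 4),
            nn.ReLU(),
            nn.ConvTranspose2d(features * 4, features * 2, 4, 2, 1),  # 8x8 -> 16x16
            nn.BatchNorm2d(features * 2),
            nn.ReLU(),
            nn.ConvTranspose2d(features * 2, features, 4, 2, 1),      # 16x16 -> 32x32
            nn.BatchNorm2d(features),
            nn.ReLU(),
            nn.ConvTranspose2d(features, channels, 4, 2, 1),          # 32x32 -> 64x64
            nn.Tanh(),
        )

    def forward(self, z):
        # Treat the latent vector as a 1x1 "image" with latent_size channels.
        return self.net(z.view(z.size(0), -1, 1, 1))

fake = DCGANGenerator()(torch.randn(16, 100))  # shape: (16, 3, 64, 64)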

Results

MNIST

FASHIONMNIST

Paintings

Pokemon

LSGAN

[Paper arxiv link]

LSGAN is the same as DCGAN, but the loss functions are changed to least-squares losses. For the discriminator:

L_D = 1/2 * E[(D(x) - 1)^2] + 1/2 * E[D(G(z))^2]

and for the generator:

L_G = 1/2 * E[(D(G(z)) - 1)^2]
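In PyTorch this amounts to swapping the binary cross-entropy criterion for a mean squared error criterion, roughly as sketched below (a simplification of what the actual code does):

import torch
import torch.nn as nn

mse = nn.MSELoss()

def lsgan_discriminator_loss(d_real, d_fake):
    # Push predictions on real images toward 1 and on generated images toward 0.
    return (0.5 * mse(d_real, torch.ones_like(d_real))
            + 0.5 * mse(d_fake, torch.zeros_like(d_fake)))

def lsgan_generator_loss(d_fake):
    # Push the discriminator's predictions on generated images toward 1.
    return 0.5 * mse(d_fake, torch.ones_like(d_fake))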

Results

Conditional GAN (CGAN)

[Paper arxiv link]

TODO: add an explanation of CGAN

Results

FASHIONMNIST

Autoencoders

TODO: add an explanation of autoencoders

Autoencoder

Results

Pokemon

encoded

Variational autoencoder (VAE)

[Paper arxiv link]

TODO: add an explanation of VAEs

Results

Pokemon
