This repository has been archived by the owner on Jun 14, 2021. It is now read-only.
manGANime

2020 Syslab project to generate anime from manga


TJHSST Computer Systems Lab 2020--2021 project to generate anime from manga. For more information, see the summary poster, view our presentation, or read the full paper.

Getting Started

To get started, clone this repository and its submodules:

git clone --recursive 

Install the dependencies with conda:

conda env create -f environment.yml
conda activate venv

or pip:

pip install -r requirements.txt

Data Processing

See the scikit-image guide on video processing. The library pims is used for most image sequence operations because of its many useful features (e.g. automatic loading of many different file types, lazy loading).

Useful ffmpeg Operations

Video to folder of images (fps=1 extracts one frame per second):

ffmpeg -i input.mp4 -ss 00:02:38 -t 00:23:42 -vf fps=1 image0-%05d.png

Images to video:

ffmpeg -framerate 1 -i image-%03d.png output.webm

Resizing a video:

ffmpeg -i input.mp4 -vf scale=256:256,setsar=1:1 output.mp4
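When these commands are run over many episodes, it can help to build them programmatically. The following is a minimal sketch, not part of the repository: the helper names `hms_to_seconds` and `extract_frames_command` are hypothetical, and the frame count is only a rough estimate (duration times fps).

```python
import shlex


def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS timestamp into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def extract_frames_command(video, start, duration, fps, pattern):
    """Build the argument list for the video-to-images command shown above."""
    return ["ffmpeg", "-i", video, "-ss", start, "-t", duration,
            "-vf", f"fps={fps}", pattern]


cmd = extract_frames_command("input.mp4", "00:02:38", "00:23:42", 1, "image0-%05d.png")
print(shlex.join(cmd))  # the command line as it would be typed in a shell

# Rough estimate of how many images are written: duration (s) * fps.
estimated_images = hms_to_seconds("00:23:42") * 1
print(estimated_images)
```

The argument list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.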

Preprocessing Script

The script provides many useful data manipulation functions.

Converting one video file format to another (in this case, from mkv to mp4):

python reformat --format mp4 --path *.mkv     

Renaming each file in a folder to a number given by its position in sorted order:

python index --path path/to/folder 
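To illustrate what such an indexing step does, here is a minimal sketch. It is not the repository's implementation: `index_folder` is a hypothetical name, and plain lexicographic sorting of file names is an assumption.

```python
import tempfile
from pathlib import Path


def index_folder(folder: Path) -> dict:
    """Rename every file in `folder` to <position><suffix>, by sorted file name."""
    files = sorted(p for p in folder.iterdir() if p.is_file())
    # First pass: move everything to temporary names so that a new name
    # never overwrites a file that is still waiting to be renamed.
    staged = []
    for i, path in enumerate(files):
        tmp = path.with_name(f"__tmp_{i}{path.suffix}")
        path.rename(tmp)
        staged.append((path.name, tmp))
    # Second pass: assign the final 0-indexed names.
    mapping = {}
    for i, (old_name, tmp) in enumerate(staged):
        new = tmp.with_name(f"{i}{tmp.suffix}")
        tmp.rename(new)
        mapping[old_name] = new.name
    return mapping


# Demo on a throwaway folder with three empty files.
with tempfile.TemporaryDirectory() as tmpdir:
    folder = Path(tmpdir)
    for name in ["page_b.png", "page_a.png", "page_c.png"]:
        (folder / name).touch()
    mapping = index_folder(folder)
    final_names = sorted(p.name for p in folder.iterdir())

print(mapping)
```

Note that lexicographic order places "10.png" before "2.png", so zero-padded source names are safer if the real script sorts the same way.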

Preprocessing a manga folder (folder of *.png images):

python manga --path path/to/manga_folder 

Preprocessing first removes extraneous images from the folder, then converts each remaining image to greyscale, crops it, and resizes it. To control which images are kept and the size of the crop, create a config.json file in the manga folder. Assuming each file's name is a 0-indexed number produced by the index function above, start is the number of images to skip from the beginning, end is the number of images to skip from the end, exclude is a list of indexes to exclude, and crop is the number of pixels to remove from each side of the image, ordered top, bottom, left, right. An example config.json is below:

{
    "start": 3,
    "end": 4,
    "exclude": [20, 21, 39, 40, 56, 57, 74, 75, 91, 92],
    "crop": [62, 75, 64, 64]
}
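The effect of these four settings can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: `kept_pages` and `crop_box` are hypothetical helpers, and the folder size (100 pages) and page dimensions (800x1200) are made up for the example.

```python
def kept_pages(num_pages, start, end, exclude):
    """Indexes that survive: skip `start` pages at the front, `end` at the back,
    and every index listed in `exclude`."""
    excluded = set(exclude)
    return [i for i in range(start, num_pages - end) if i not in excluded]


def crop_box(width, height, crop):
    """Turn [top, bottom, left, right] margins (in pixels) into a
    (left, upper, right, lower) box."""
    top, bottom, left, right = crop
    return (left, top, width - right, height - bottom)


config = {
    "start": 3,
    "end": 4,
    "exclude": [20, 21, 39, 40, 56, 57, 74, 75, 91, 92],
    "crop": [62, 75, 64, 64],
}

# Hypothetical folder of 100 pages named 0.png ... 99.png.
pages = kept_pages(100, config["start"], config["end"], config["exclude"])
# Hypothetical 800x1200 page.
box = crop_box(800, 1200, config["crop"])
print(len(pages), box)
```

The (left, upper, right, lower) ordering matches the box convention of Pillow's `Image.crop`, so the tuple could be passed to it directly if Pillow is used for cropping.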

Preprocessing an anime folder (folder of *.mp4 videos):

python anime --path path/to/preprocessed_anime_folder

Configuration is not necessary for an anime folder, since the only preprocessing step is resizing each video to 256x256 (the same size as each manga page).

Excluding intros/outros from a video file:

python search --path path/to/anime_folder --type intro  

In order for the system to determine where the intros and outros are, create a config.json file in the preprocessed anime folder. Select a representative image for the intro and for the outro, and save them in the same folder as intro.png and outro.png, respectively. The system then searches each video file for the frame closest to intro.png, as measured by a generalized cosine similarity, to determine where the intro occurs. Finally, the left and right parameters control how many frames to remove to the left and right of the representative image. For example, if you select an image that occurs 10 seconds into the intro and the intro is 90 seconds long, you must remove 10 seconds to the left and 80 seconds to the right of the selected image. At 24 frames per second, this is 10*24 = 240 frames left and 80*24 = 1920 frames right. An example config.json is below:

{
    "intro": {
        "image": "intro.png",
        "left": 1657,
        "right": 500
    },
    "outro": {
        "image": "outro.png",
        "left": 1866,
        "right": 10000
    }
}

Note that this script does not actually touch the video file and instead stores an "exclude" parameter in the same config.json. pims can then be used to slice the video file in order to remove the ranges.
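The search and the left/right arithmetic can be sketched on toy data. Several things here are assumptions: plain cosine similarity stands in for the "generalized" variant, frames are short lists of floats instead of images, the excluded range is taken to be [match - left, match + right] clamped to the video, and the names `cosine_similarity`, `find_exclude_range`, and `seconds_to_frames` are hypothetical.

```python
import math

FPS = 24  # assumption, taken from the 10*24 arithmetic in the text


def seconds_to_frames(seconds, fps=FPS):
    """Convert a duration in seconds to a number of frames."""
    return seconds * fps


def cosine_similarity(a, b):
    """Plain cosine similarity between two flattened frames."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_exclude_range(frames, reference, left, right):
    """Find the frame most similar to `reference` and widen it by left/right frames."""
    best = max(range(len(frames)),
               key=lambda i: cosine_similarity(frames[i], reference))
    lo = max(0, best - left)
    hi = min(len(frames) - 1, best + right)
    return best, [lo, hi]


# Toy "video": 12 frames of 3 values each; frame 5 resembles the reference image.
frames = [[1.0, 0.0, 0.0]] * 5 + [[0.0, 1.0, 0.2]] + [[0.0, 0.0, 1.0]] * 6
reference = [0.0, 1.0, 0.0]
best, exclude = find_exclude_range(frames, reference, left=2, right=3)
print(best, exclude)
print(seconds_to_frames(10), seconds_to_frames(80))
```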

Labeling Data

Run the provided script to begin tagging anime frames with their corresponding manga pages:

python -c

The script expects a path to a video file for the anime and a path to a folder of PNG images for the manga.

You can create a config.json file in the root directory to automatically load these paths:

{
    "anime_path": "preprocess/anime/glt/0.mp4",
    "manga_path": "preprocess/manga/glt/vol1"
}

If there is no config file, you can press the "anime_path" and "manga_path" buttons in the upper left corner to pick the right path.

The script also expects a config.json file in the same folder as the video file, generated by the search function above, which records the frame ranges to exclude. Each video file should have an entry with three parameters: exclude, a list of pairs giving frame ranges to exclude; manga, a path to the corresponding manga volume (assuming both anime and manga are under the root folder preprocess/); and pages, a range of manga pages to consider. A sample config.json is below:

{
    "0": {
        "exclude": [[0, 3850], [8702, 9548], [32170, 34094]],
        "manga": "glt/vol1",
        "pages": [0, 33]
    },
    "1": {
        ...
    },
    "intro": {
        ...
    }
}
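As an illustration of how the exclude pairs could be turned into the frame ranges that remain, here is a minimal sketch. It assumes each pair is inclusive on both ends and that the video has 34095 frames; neither is stated in this README, and `kept_ranges` is a hypothetical helper.

```python
def kept_ranges(num_frames, exclude):
    """Return half-open [start, stop) ranges of frames outside every excluded pair."""
    kept, cursor = [], 0
    for lo, hi in sorted(exclude):
        if lo > cursor:
            kept.append((cursor, lo))
        cursor = max(cursor, hi + 1)  # assumption: pairs are inclusive
    if cursor < num_frames:
        kept.append((cursor, num_frames))
    return kept


exclude = [[0, 3850], [8702, 9548], [32170, 34094]]
ranges = kept_ranges(34095, exclude)  # 34095 is a made-up frame count
total_kept = sum(stop - start for start, stop in ranges)
print(ranges, total_kept)
```

Each (start, stop) pair has the shape of a Python slice, so it could be applied as `video[start:stop]` to any sequence that supports slicing.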

Navigate with the directional arrows (< and >) or by typing a number into the boxes. The step size of the arrows can be set by typing a number into the rightmost box ("1" by default). The next chunk and prev chunk buttons jump between scene changes (detected as a large dissimilarity between adjacent frames). Finally, pressing the tag button pairs the current anime frame with the current manga page and saves the result. Note that, by design, tag applies to all untagged frames up to the current frame. For example, if the user is just starting and makes their first tag at frame 100, the system implicitly tags frames 0--100. If the user then tags frame 250, the system tags frames 100--250. This behavior follows from the monotonicity assumption discussed at length in the paper.
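A scene-change jump of this kind can be sketched as below. This is only an illustration: the dissimilarity measure (mean absolute difference), the threshold, and the names `frame_distance` and `next_chunk` are assumptions, and the labeling tool may measure dissimilarity differently.

```python
def frame_distance(a, b):
    """Mean absolute difference between two equally sized flattened frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def next_chunk(frames, current, threshold):
    """Index of the first frame after `current` that differs strongly from
    the frame before it; falls back to the last frame if there is none."""
    for i in range(current + 1, len(frames)):
        if frame_distance(frames[i - 1], frames[i]) > threshold:
            return i
    return len(frames) - 1


# Toy video with three "scenes": frames 0-3, 4-6, and 7-8.
frames = [[0.1, 0.1]] * 4 + [[0.9, 0.8]] * 3 + [[0.2, 0.9]] * 2
first_jump = next_chunk(frames, 0, threshold=0.3)
second_jump = next_chunk(frames, first_jump, threshold=0.3)
print(first_jump, second_jump)
```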

The script saves its annotations as a list in a file named filename.json, e.g. 0.json looks like:

[200, 483, 1203, ..., 24478, 25350, 27471]
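One way to read such a list is sketched below, under loudly stated assumptions: each entry is taken to be the frame at which a tag was made, a frame on a boundary is taken to belong to the earlier tag, and the helpers `tag` and `tag_index` are hypothetical. How tag positions map to manga page numbers is not specified here, so the sketch returns only the position in the list.

```python
from bisect import bisect_left


def tag(boundaries, frame):
    """Record a tag at `frame`; it covers every frame not yet tagged up to `frame`."""
    if boundaries and frame <= boundaries[-1]:
        # Hypothetical guard reflecting the monotonicity assumption.
        raise ValueError("tags must move forward in time")
    boundaries.append(frame)


def tag_index(boundaries, frame):
    """Position of the tag covering `frame`, or None if the frame is untagged."""
    i = bisect_left(boundaries, frame)
    return i if i < len(boundaries) else None


boundaries = []
tag(boundaries, 100)  # implicitly covers frames 0..100
tag(boundaries, 250)  # implicitly covers the frames after 100 up to 250
print(boundaries)
print(tag_index(boundaries, 0), tag_index(boundaries, 101), tag_index(boundaries, 251))
```

Because the list is sorted, the lookup is a binary search and stays fast even for a full episode.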

StyleGAN Integration

To generate a dataset in the form StyleGAN expects, first generate a folder of images from each video, like Then run the following command (which is done automatically by

python stylegan2_ada_pytorch/ --source path/to/image_folder --dest

Then, to train StyleGAN, run the following command:

# lower batch size if out of memory
CUDA_VISIBLE_DEVICES=1 python stylegan2_ada_pytorch/ --outdir=train --gpus=1 --resume=ffhq256 --cfg=auto

Generating images:

python stylegan2_ada_pytorch/ --outdir=out --network=network.pkl --trunc=1 --seeds=0-10

Projecting an image to the latent space:

python stylegan2_ada_pytorch/ --outdir=out --network=network.pkl --target=input.png

Our Functions

Projecting a video:

PYTHONPATH=/path/to/stylegan2_ada_pytorch/ python project --network network.pkl --path input.mp4 

Re-training StyleGAN for future video prediction:

PYTHONPATH=... python train --network network.pkl 

Testing the average mean-squared error of the resulting trained model:

PYTHONPATH=... python test --network train/

Generating video with the resulting trained model (manga.png is an optional conditioning manga page):

PYTHONPATH=... python generate --path input.mp4 --image manga.png --network --frames 8 

Backpropagation for Image Generation

We experimented with backpropagation for image generation. The idea is the following: once StyleGAN is trained, we have a generator and a discriminator. If the discriminator is an accurate parameterization of the image manifold, then picking a random point and finding the closest on-manifold point should, in theory, randomly sample a point from the discriminator's manifold, which is exactly the goal of the generator (to sample an image from the image distribution). We expected these images to be of higher quality, since the algorithm can iteratively "improve" the image, similar to how human artists create works, whereas the standard generator is a feed-forward network and hence a "one-shot" generator. Unfortunately, we instead re-discovered adversarial attacks.

Specifically, our proposed algorithm is the following:

  1. Initialize a random image. Many distributions are possible (normal, uniform on [0, 255], etc.); we sample each pixel uniformly from [-1, 1], the range expected by StyleGAN's discriminator.
  2. Run the image through the discriminator to get a probability D(X).
  3. Optimize the image by backpropagation on the objective -log(D(X)).
  4. Repeat steps 2--3 to iteratively move the image toward a high-probability image.
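The four steps can be sketched on a toy problem. Important assumptions: the "discriminator" below is a hand-written linear score followed by a sigmoid, standing in for StyleGAN's discriminator; the gradient is written out by hand instead of using autograd; and the weights, learning rate, and the name `optimize_image` are all made up for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def optimize_image(weights, bias, steps=500, lr=0.05, seed=0):
    """Gradient descent on -log(D(X)) for a toy D(X) = sigmoid(w . X + b)."""
    rng = random.Random(seed)
    # Step 1: sample each "pixel" uniformly from [-1, 1].
    image = [rng.uniform(-1.0, 1.0) for _ in weights]
    history = []
    for _ in range(steps):
        # Step 2: run the image through the toy discriminator.
        prob = sigmoid(sum(w * x for w, x in zip(weights, image)) + bias)
        history.append(prob)
        # Step 3: the gradient of -log(D(X)) w.r.t. pixel x_i is -(1 - prob) * w_i,
        # so a descent step adds lr * (1 - prob) * w_i; pixels stay in [-1, 1].
        image = [min(1.0, max(-1.0, x + lr * (1.0 - prob) * w))
                 for x, w in zip(image, weights)]
    # Step 4 is the loop itself; record the final probability as well.
    history.append(sigmoid(sum(w * x for w, x in zip(weights, image)) + bias))
    return image, history


image, history = optimize_image(weights=[0.5, -1.0, 2.0, -0.25], bias=-3.0)
print(history[0], history[-1])
```

Even in this toy setting the optimizer only cares about raising the score, not about producing anything image-like, which is the same failure mode described in this section.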

Since StyleGAN's discriminator lacks a sigmoid layer, we apply sigmoid and then log, so the objective is -log(sigmoid(D(X))). Since sigmoid(z) = 1 / (1 + e^(-z)), simple algebra shows this is equivalent to log(1 + e^(-D(X))), or softplus(-D(X)).
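The identity can be checked numerically with the standard library. This is a small sketch with hypothetical helper names; the `softplus` here is a numerically stable rewrite, log(1 + e^z) = max(z, 0) + log(1 + e^(-|z|)), chosen so that large scores do not overflow.

```python
import math


def neg_log_sigmoid(z):
    """Direct evaluation of -log(sigmoid(z)); overflows for very negative z."""
    return -math.log(1.0 / (1.0 + math.exp(-z)))


def softplus(z):
    """Numerically stable log(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


scores = [-5.0, -1.0, 0.0, 2.5, 10.0]
# Largest disagreement between the two forms over the sample scores.
max_gap = max(abs(neg_log_sigmoid(s) - softplus(-s)) for s in scores)
print(max_gap)

# The stable form still works where the direct form would overflow.
large_loss = softplus(1000.0)  # objective for a discriminator output of -1000
print(large_loss)
```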

Our initial image on the left has a probability of 0.04 as measured by the discriminator, as expected for a random image not belonging to the image distribution. After 1000 iterations of backpropagation, the image on the right evolves some "structure", but is still clearly mostly noise to a human. However, the discriminator now thinks the image has a 0.959 probability of being real. Hence we have performed an adversarial attack on the discriminator, by finding a small change in the starting image that tricks the discriminator into changing its output.

Figure: starting image (left) and image after 1000 iterations (right).

These images were generated with the following command:

PYTHONPATH=... python --network 

