
Maua Style

Python scripts for neural artistic style transfer.

The approach can generally be summarized as: (1) encoding a content image and a style image into a deep neural network's feature representation, and (2) optimizing a new image so that its deep features are similar to those of both the content and the style.

This repository applies a number of tricks to optimize this process for faster generation and better quality.

The most important improvement is multi-resolution generation, a typical trick in image processing where an algorithm is applied to the image at a range of sizes. In this case, the generated image is initialized at a small size (e.g. 256x256) and then progressively optimized and upscaled to larger sizes. This reduces generation time, as the majority of the processing happens on fewer pixels. It also improves quality: at small scales the networks' receptive field covers a larger fraction of the image, which leads to larger-scale spatial coherence in the output image.

Another benefit of this approach is that the deep feature network can be swapped out over the course of generation. This means heavier (high quality) networks can be used at small scales and lighter (lower quality) networks can be used to fill in the details later. This allows for much larger high-quality images to be generated with lower VRAM requirements.
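
For example, the multi-resolution schedule is controlled by the comma-separated --image_sizes and --num_iters flags (see the Usage section below); a sketch of such a schedule, with illustrative rather than tuned values and placeholder file names, might look like:

# start small with many iterations, then progressively upscale while
# spending fewer iterations on the finer details
python style.py --content content.jpg --style style.jpg \
                --image_sizes 256,512,1024,2048 --num_iters 1000,500,250,100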

Installation

CUDA

First you need to have a working CUDA toolkit installation.

An easy option to get a working CUDA environment on Ubuntu is Lambda Stack.

Alternatively, I personally use Anaconda:

conda install -c conda-forge cudatoolkit-dev cudatoolkit cudnn

(this lets you use different CUDA toolkits for different environments)
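
For example, to keep the toolkit isolated in its own environment (the environment name here is arbitrary):

conda create -n maua-style
conda activate maua-style
conda install -c conda-forge cudatoolkit-dev cudatoolkit cudnn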

Maua Style

Whatever way you've set up your CUDA environment, once it's working, run the following to install this repository and its requirements:

git clone --recursive https://github.com/JCBrouwer/maua-style.git
cd maua-style
git submodule update
pip install -r requirements.txt
pip install cupy-cudaXXX

Note that the XXX in the final line must match your CUDA toolkit version (e.g. CUDA 10.2 ==> cupy-cuda102, CUDA 11.3 ==> cupy-cuda113).
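
If you're not sure which toolkit version your environment provides, nvcc can tell you (assuming it's on your PATH), for example:

nvcc --version        # look for the "release X.Y" line, e.g. release 11.3
pip install cupy-cuda113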

Usage

This repo has a lot of functionality for tuning results, which makes configuring the scripts quite difficult. I've tried a couple of different approaches, but I'm still not happy with the current user interface. It is at least very flexible, at the cost of some usability.

The full list of options can be found in the argument parser in config.py. This allows any of the options to be set via the command line using --option_name <value>. However, because there are so many options, you can also create a .json file in config/ to load in presets. There are a couple of examples in that folder to start you off. You can use --save_args to save your current settings from the command line to a new preset file.
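
As a rough sketch of saving a preset from the command line (this assumes --save_args takes the path of the preset file to write; check config.py for the exact semantics):

# assumption: --save_args accepts a destination path for the new preset
python style.py --content content.jpg --style style.jpg \
                --num_iters 500,250 --save_args config/args-mine.json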

There are four types of configuration files: args, scaling, max-sizes, and ffmpeg.

  • args: simply contains a dictionary of all the argument parser arguments listed in config.py
  • scaling: contains a mapping from image size to a set of arguments to use up to that size
  • max-sizes: contains a mapping from optimization configuration to the largest image size that your GPU can support
  • ffmpeg: contains command line arguments that are sent to ffmpeg when generating videos (see ffmpeg --help)

scaling arguments let you define different settings for different image sizes, which allows swapping models or optimization priorities over the course of multi-resolution rendering.

max-sizes files can be generated by running python max-sizes.py. This will try many different configurations at progressively larger sizes and record the results. You can use max-sizes files to specify the image sizes in scaling config files.

All of the default configuration files are designed for my system, which has two 11 GB cards. However, if you have fewer cards available than a config expects, the scripts will skip any configurations your hardware cannot support and fall back to the next best (usually slightly lower quality) setting in the config.

If your cards have less than 11 GB of memory, though, you'll probably run into out-of-memory errors; in that case you'll need to reduce some of the image sizes to fit. If you do run max-sizes.py, feel free to open a pull request adding the resulting .json file to the repo so that people don't need to run it themselves in the future!
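
For example, a rough way to keep memory use down on a smaller card is to end the multi-resolution schedule at a lower size (an illustration with placeholder file names, not a tuned preset):

python style.py --content content.jpg --style style.jpg --image_sizes 256,512,1024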

Alright, with all that said, let's look at some examples.

Image style transfer

The simplest case:

python style.py --content </path/to/img.png | https://some.image/url.jpeg> \
                --style </path/to/img.png | https://some.image/url.jpeg>

For a little more control:

cp config/args-img.json config/args-mine.json
cp config/scaling-img.json config/scaling-mine.json
# edit the .json files
python style.py --content /path/to/img.png --style https://some.image/url.jpeg \
                --load_args config/args-mine.json --scaling_args config/scaling-mine.json

or, for example:

python style.py --style https://some.image/url.jpeg,/a/local/image/too.png --image_sizes 256,512,1024,2048 \
                --num_iters 1000,500,250,100 --content_weight 0 --use_covariance --init random

Video style transfer

Default settings:

python style.py --content /my/video/folder/cool.mp4 --style /my/image/folder/epic.jpg \
                --load_args config/args-vid.json --scaling_args config/scaling-vid.json \
                --ffmpeg_args config/ffmpeg-libx264.json

Changing some options on the command line:

python style.py --content /my/video/folder/cool.mp4 --style /my/image/folder/epic.jpg \
                --load_args config/args-vid.json --scaling_args config/scaling-vid.json \
                --ffmpeg_args config/ffmpeg-libx264.json --image_sizes 256,512,724,1024 \
                --num_iters 512,384,256,128 --temporal_weight 1 --passes_per_scale 16 \
                --temporal_blend 0.25 --no_check_occlusion --init content

Video style transfer caches images to disk, so if it crashes or you need to change settings you can just quit and run the command again. The program will skip any frames that have already been rendered.

CLIP + VQGAN style transfer

This allows for text-only or multi-modal transfer:

python clip_vqgan.py --content some.png --content_text "description of content image" \
                     --style_text "description of desired style" --style optional.png

or synthesis:

python clip_vqgan.py --content random --style_text "description of desired style" \
                     --style optional.png

or video transfer (this can take a loooong time). There aren't great default settings for this yet, although the following is decent:

python clip_video_style.py --content some.mp4 --content_text "description" \
                           --style optional.jpg --load_args config/args-vid.json \
                           --style_text "description" --init content --num_iters 800 \
                           --scaling_args config/scaling-vid.json --num_passes 4  \
                           --ffmpeg_args config/ffmpeg-libx264.json --text_weight 5 \
                           --content_weight 1 --style_weight 2.5 --image_sizes 400

Videos as style

I'm not sure if this is actually working at the moment, but you can try:

python style.py --content image.jpg --style video.mp4 --transfer_type img_vid

Neural Cellular Automata

This one's also rough around the edges. Take a look in the files for a bunch more hardcoded options.

python NCA_train.py style/image.png output/directory/
python NCA_gen.py style/image.png output/directory/

Credits

The main style transfer implementation is based on the excellent neural-style-pt implementation by ProGamerGov.

The video style transfer approach is based on the work of Manuel Ruder et al. (Artistic Style Transfer for Videos).

For optical flow support, this repo includes a collection of excellent optical flow model reproductions by Simon Niklaus. Each of these repositories has its own license and terms; make sure to read and adhere to them!

The CLIP + VQGAN code is based on Katherine Crowson's Multi-modal Analogies Notebook.

The Neural Cellular Automata code is by Alexander Mordvintsev, from the Kunstformen Notebook.

Citations

If you use this code in your research, you should probably cite the work that this repository relies on:
@misc{maua-style,
    author = {Hans Brouwer},
    title = {maua-style},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/JCBrouwer/maua-style}},
}
@misc{ProGamerGov2018,
    author = {ProGamerGov},
    title = {neural-style-pt},
    year = {2018},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/ProGamerGov/neural-style-pt}},
}
@misc{Johnson2015,
    author = {Johnson, Justin},
    title = {neural-style},
    year = {2015},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/jcjohnson/neural-style}},
}
@inproceedings{RuderDB2016,
    author = {Manuel Ruder and Alexey Dosovitskiy and Thomas Brox},
    title = {Artistic Style Transfer for Videos},
    booktitle = {German Conference on Pattern Recognition},
    pages     = {26--36},
    year      = {2016},
}
@misc{DynamicTextures,
    author = {Christina M. Funke and Leon A. Gatys and Alexander S. Ecker and Matthias Bethge},
    title = {Synthesising Dynamic Textures using Convolutional Neural Networks},
    year = {2017},
    eprint = {arXiv:1702.07006},
}
@misc{2103.00020,
    author = {Alec Radford and Jong Wook Kim and Chris Hallacy and Aditya Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever},
    title = {Learning Transferable Visual Models From Natural Language Supervision},
    year = {2021},
    eprint = {arXiv:2103.00020},
}
@misc{esser2020taming,
    title={Taming Transformers for High-Resolution Image Synthesis}, 
    author={Patrick Esser and Robin Rombach and Björn Ommer},
    year={2020},
    eprint={2012.09841},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
@article{mordvintsev2020growing,
    author = {Mordvintsev, Alexander and Randazzo, Ettore and Niklasson, Eyvind and Levin, Michael},
    title = {Growing Neural Cellular Automata},
    journal = {Distill},
    year = {2020},
    note = {https://distill.pub/2020/growing-ca},
    doi = {10.23915/distill.00023}
}
@inproceedings{Hui_CVPR_2018,
    author = {Tak-Wai Hui and Xiaoou Tang and Chen Change Loy},
    title = {{LiteFlowNet}: A Lightweight Convolutional Neural Network for Optical Flow Estimation},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition},
    year = {2018}
}
@misc{pytorch-liteflownet,
    author = {Simon Niklaus},
    title = {A Reimplementation of {LiteFlowNet} Using {PyTorch}},
    year = {2019},
    howpublished = {\url{https://github.com/sniklaus/pytorch-liteflownet}}
}
@inproceedings{Sun_CVPR_2018,
    author = {Deqing Sun and Xiaodong Yang and Ming-Yu Liu and Jan Kautz},
    title = {{PWC-Net}: {CNNs} for Optical Flow Using Pyramid, Warping, and Cost Volume},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition},
    year = {2018}
}
@misc{pytorch-pwc,
    author = {Simon Niklaus},
    title = {A Reimplementation of {PWC-Net} Using {PyTorch}},
    year = {2018},
    howpublished = {\url{https://github.com/sniklaus/pytorch-pwc}}
}
@inproceedings{Ranjan_CVPR_2017,
    author = {Ranjan, Anurag and Black, Michael J.},
    title = {Optical Flow Estimation Using a Spatial Pyramid Network},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition},
    year = {2017}
}
@misc{pytorch-spynet,
    author = {Simon Niklaus},
    title = {A Reimplementation of {SPyNet} Using {PyTorch}},
    year = {2018},
    howpublished = {\url{https://github.com/sniklaus/pytorch-spynet}}
}
@inproceedings{Meister_AAAI_2018,
    author = {Simon Meister and Junhwa Hur and Stefan Roth},
    title = {{UnFlow}: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss},
    booktitle = {AAAI},
    year = {2018}
}
@misc{pytorch-unflow,
    author = {Simon Niklaus},
    title = {A Reimplementation of {UnFlow} Using {PyTorch}},
    year = {2018},
    howpublished = {\url{https://github.com/sniklaus/pytorch-unflow}}
}
