Automatically Identifying Van Gogh's Paintings

This is the source code used in the paper "From Impressionism to Expressionism: Automatically Identifying Van Gogh's Paintings", published at the 23rd IEEE International Conference on Image Processing (ICIP 2016).

The paper is available at IEEE Xplore: https://dx.doi.org/10.1109/icip.2016.7532335

The dataset is available at figshare: https://dx.doi.org/10.6084/m9.figshare.3370627

Corresponding author: Anderson Rocha (anderson.rocha@ic.unicamp.br)

If you find this work useful in your research, please cite the paper! :-)

@InProceedings{folego2016vangogh,
    author = {Guilherme Folego and Otavio Gomes and Anderson Rocha},
    booktitle = {2016 IEEE International Conference on Image Processing (ICIP)},
    title = {From Impressionism to Expressionism: Automatically Identifying Van Gogh's Paintings},
    year = {2016},
    month = {Sept},
    pages = {141--145},
    keywords = {Art;Feature extraction;Painting;Support vector machines;Testing;Training;Visualization;CNN-based authorship attribution;Painter attribution;Data-driven painting characterization},
    doi = {10.1109/icip.2016.7532335}
}

Quick Guide

This guide has four sections:

Creating the dataset - Create your own dataset, or (even better!) expand on VGDB-2016.
Using our method - Use our method, given a dataset.
Predicting debated paintings - Predict debated paintings with our method.
Calculating scores - Transform distances into probabilities.

General note: every script presented here accepts a --help argument, which describes the script and its parameters.

Creating the dataset

Requirements (for this section)

ImageMagick
Python, and the following packages:
- hurry.filesize
- numpy
- progressbar2
- wikitools
R, and the following packages:
- argparse
- data.table
- dplyr

Create a directory for resources.

mkdir -pv res/{db,img/{orig,resz}}

Define the URL to be crawled. This is just an example. In our work, we crawled more than 200 different URLs.

url='Category:Still_life_paintings_of_flowers_by_Vincent_van_Gogh,_Auvers_1890'

Crawl URL and collect metadata.

python src/crawler/crawl2csv.py --url "$url" --csv res/db/"$url"

Parse and clean up collected metadata. We set different values here just as a working example. Also, at this point, it is possible to provide multiple files at once, even with duplicated entries (as shown).

Rscript src/crawler/tidy_dataset.R --density 95 --ratio 0.15 --output res/db/db.csv res/db/"$url" res/db/"$url"

The dataset is now ready, and its CSV file is at res/db/db.csv. You may continue with your newly created dataset or with the original vgdb_2016.csv.

Download images.

python src/crawler/download_images_from_csv.py --csv res/db/db.csv --directory res/img/orig/

Note: images with a JPEG quality below 75% were manually removed (both the images and their entries in the CSV file). You can check the quality of each image with ImageMagick.

identify -format "%f:%Q\n" res/img/orig/* | grep -v ^$ | sort -k2nr -k1 -t:
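The manual filtering step above can be partly automated. The following is a minimal sketch, not part of this repository, that parses the `name:quality` lines printed by the identify command and lists the files below the threshold. The function name `low_quality_files` and the sample filenames are hypothetical.

```python
def low_quality_files(identify_output, threshold=75):
    """Return (filename, quality) pairs whose JPEG quality is below threshold.

    identify_output is the text printed by:
        identify -format "%f:%Q\n" res/img/orig/*
    """
    flagged = []
    for line in identify_output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, like `grep -v ^$`
        # Split on the last colon, so filenames containing ':' still parse.
        name, _, quality = line.rpartition(":")
        if not name or not quality.isdigit():
            continue  # ignore lines that do not look like "name:quality"
        if int(quality) < threshold:
            flagged.append((name, int(quality)))
    # Lowest quality first, then by filename.
    return sorted(flagged, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical sample output, for illustration only.
    sample = "a.jpg:92\nb.jpg:60\n\nc.jpg:74\n"
    for name, quality in low_quality_files(sample):
        print(name, quality)
```

Remember that any image removed this way also needs its entry removed from the CSV file.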

Resize images to the standard density.

python src/crawler/resize_images.py --csv res/db/db.csv --original res/img/orig/ --resized res/img/resz/
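To make the idea of a standard density concrete, here is a rough sketch of the arithmetic involved. It assumes that density means image pixels per unit of the painting's physical length, and that resizing scales both sides by the ratio of target to current density. The actual behavior of resize_images.py may differ, and the function names below are hypothetical.

```python
def image_density(width_px, physical_width):
    """Pixels per unit of physical length (assumed definition of density)."""
    if physical_width <= 0:
        raise ValueError("physical_width must be positive")
    return width_px / physical_width


def resized_shape(width_px, height_px, density, target_density):
    """Return the (width, height) in pixels that would give target_density."""
    if density <= 0 or target_density <= 0:
        raise ValueError("densities must be positive")
    scale = target_density / density
    # Keep at least one pixel per side after rounding.
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


if __name__ == "__main__":
    # Illustrative numbers only: a 4000x3000 px photo of a painting 20 units wide.
    density = image_density(4000, 20.0)
    print(density, resized_shape(4000, 3000, density, 100.0))
```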

Using our method

Requirements (for all the following sections)

Caffe
Parallel
Python, and the following packages:
- scikit-image
- scikit-learn
Unzip

From now on, we will assume that the vgdb_2016.zip dataset file has already been downloaded.

Unzip the dataset.

unzip vgdb_2016.zip

Create a directory for resources.

mkdir -pv vgdb_2016/{train,test}/{patch,feats}

Extract patches from each image.

find vgdb_2016/train/{,n}vg -type f | parallel python src/analysis/patch_extraction.py --image {} --dir vgdb_2016/train/patch/
find vgdb_2016/test/{,n}vg -type f | parallel python src/analysis/patch_extraction.py --image {} --dir vgdb_2016/test/patch/
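As an illustration of what patch extraction involves, the sketch below computes the pixel boxes of a grid of non-overlapping square patches. The patch size of 224 (the input size of the VGG networks) and the non-overlapping layout are assumptions here. Run patch_extraction.py with --help for the script's real options. The function name `patch_boxes` is hypothetical.

```python
def patch_boxes(width, height, size=224):
    """Return (left, top, right, bottom) boxes of non-overlapping patches.

    Patches are laid out on a grid starting at the top-left corner.
    Leftover pixels at the right and bottom edges are discarded.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    boxes = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            boxes.append((left, top, left + size, top + size))
    return boxes


if __name__ == "__main__":
    # A 500x450 image yields a 2x2 grid of 224x224 patches.
    for box in patch_boxes(500, 450):
        print(box)
```

Each box could then be passed to an image library's crop function to produce the patch file.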

Extract features from each patch. In our work, we used the VGG 19-layer model, which is available at http://www.robots.ox.ac.uk/~vgg/research/very_deep/.

ls vgdb_2016/train/patch/ > vgdb_2016/train/patch_list.txt
ls vgdb_2016/test/patch/ > vgdb_2016/test/patch_list.txt

python src/analysis/caffe_extract_features.py --proto path/to/VGG_ILSVRC_19_layers_deploy.prototxt --model path/to/VGG_ILSVRC_19_layers.caffemodel --list vgdb_2016/train/patch_list.txt --input vgdb_2016/train/patch/ --output vgdb_2016/train/feats/
python src/analysis/caffe_extract_features.py --proto path/to/VGG_ILSVRC_19_layers_deploy.prototxt --model path/to/VGG_ILSVRC_19_layers.caffemodel --list vgdb_2016/test/patch_list.txt --input vgdb_2016/test/patch/ --output vgdb_2016/test/feats/

Create a directory for the classification model.

mkdir -pv vgdb_2016/clf

Generate classification model.

python src/analysis/generate_model.py --dir vgdb_2016/train/feats/ --model vgdb_2016/clf/model.pkl

Classify paintings in the test set using the Far method.

python src/analysis/classify.py --dir vgdb_2016/test/feats/ --model vgdb_2016/clf/model.pkl --aggregation far --gtruth
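Since each painting is split into many patches, the per-patch classifier responses must be aggregated into one decision per painting. The sketch below shows one plausible reading of the Far aggregation: the patch whose signed distance to the decision boundary has the largest magnitude decides the class. This is an assumption about what --aggregation far does, not a copy of classify.py, and the function name `aggregate_far` is hypothetical.

```python
def aggregate_far(distances):
    """Return (label, distance) decided by the patch farthest from the boundary.

    distances holds one signed classifier distance per patch.
    Label 1 stands for van Gogh and label 0 for non-van Gogh.
    """
    if not distances:
        raise ValueError("at least one patch distance is required")
    # The patch with the largest absolute distance is the most confident one.
    farthest = max(distances, key=abs)
    label = 1 if farthest > 0 else 0
    return label, farthest


if __name__ == "__main__":
    # Illustrative distances for the patches of a single painting.
    print(aggregate_far([0.2, -1.5, 0.9]))
    print(aggregate_far([0.2, 1.7, -0.9]))
```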

Done!

Predicting debated paintings

Create a directory for resources.

mkdir -pv vgdb_2016/check/{patch,feats}

Extract patches from each image.

find vgdb_2016/check/[0-9]*.png -type f | parallel python src/analysis/patch_extraction.py --image {} --dir vgdb_2016/check/patch/

Extract features from each patch.

ls vgdb_2016/check/patch/ > vgdb_2016/check/patch_list.txt
python src/analysis/caffe_extract_features.py --proto path/to/VGG_ILSVRC_19_layers_deploy.prototxt --model path/to/VGG_ILSVRC_19_layers.caffemodel --list vgdb_2016/check/patch_list.txt --input vgdb_2016/check/patch/ --output vgdb_2016/check/feats/

Classify paintings using the Far method.

python src/analysis/classify.py --dir vgdb_2016/check/feats/ --model vgdb_2016/clf/model.pkl --aggregation far

In the output, class 1 means van Gogh, and class 0 means non-van Gogh.

Calculating scores

Generate scores model.

python src/analysis/generate_score_model.py --dir vgdb_2016/train/feats/ --model vgdb_2016/clf/model.pkl --score vgdb_2016/clf/score.pkl

Calculate score probabilities. Targets are the filenames without extension, separated by commas.

targets='9414428,9420113'
echo "$targets" | tr ',' '\n' > vgdb_2016/check/target_list.txt
python src/analysis/get_scores.py --dir vgdb_2016/check/feats/ --model vgdb_2016/clf/model.pkl --score vgdb_2016/clf/score.pkl --targets vgdb_2016/check/target_list.txt

In the output, the first column represents non-van Gogh, and the second column represents van Gogh.
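For intuition on how a distance can become a probability, the sketch below applies a sigmoid to a signed distance, in the style of Platt scaling. This is a common technique and an assumption here: it is not necessarily what generate_score_model.py and get_scores.py do. The parameters `a` and `b` are placeholders that would normally be fitted on the training distances.

```python
import math


def score_probabilities(distance, a=-1.0, b=0.0):
    """Map a signed distance to (p_non_van_gogh, p_van_gogh) with a sigmoid.

    a and b are hypothetical sigmoid parameters. With a < 0, larger positive
    distances give a higher van Gogh probability.
    """
    p_van_gogh = 1.0 / (1.0 + math.exp(a * distance + b))
    return 1.0 - p_van_gogh, p_van_gogh


if __name__ == "__main__":
    # Illustrative distances only, printed in the same column order as above.
    for d in (-2.0, 0.0, 2.0):
        p_non, p_vg = score_probabilities(d)
        print(f"{d:+.1f}\t{p_non:.3f}\t{p_vg:.3f}")
```

A distance of zero maps to equal probabilities for both classes when `b` is zero. Very large magnitudes of `a * distance + b` would overflow `math.exp`, so a production version would need to clamp the exponent.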
