Data files and scripts for working with the solr-whosonfirst repository.
Sample data files are available from the following sources:
- Walker Arts Center
Please note that the data in these files is not standardized. Each dataset has its own source-specific import tool in the bin directory.
In some cases the data here is a subset of what the source itself publishes. For example, the Open Library dataset contains only authors and IDs, since there are so many of them (approximately 7M).
Additional datasets will be added as time and circumstances (and pull requests) permit. We're looking at you, Wikipedia.
Because of the size of the data, the scripts below assume that the input files are bzip2-compressed. The files are decompressed (and processed) on the fly:
$> ./bin/import-cooperhewitt.py -p ./data/people-cooperhewitt.csv.bz2
$> ./bin/import-imamuseum.py -p ./data/people-imamuseum.csv.bz2
$> ./bin/import-openlibrary.py -p ./data/people-openlibrary.csv.bz2
$> ./bin/import-walkerarts.py -p ./data/people-walkerarts.csv.bz2
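For readers who want to inspect one of these files without running an importer, the following is a minimal sketch of reading a bzip2-compressed CSV file on the fly. It is not taken from the import scripts themselves. It assumes Python 3, UTF-8 text, and a header row in the CSV file, and the helper name iter_rows is hypothetical.

```python
import bz2
import csv


def iter_rows(path):
    """Yield each CSV row as a dict, decompressing the bzip2 file lazily.

    Assumptions (not confirmed by the repository): the file is UTF-8
    encoded and its first line is a header row naming the columns.
    """
    # bz2.open in text mode decompresses as the file is read, so the
    # whole file is never held in memory or written to disk uncompressed.
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as fh:
        for row in csv.DictReader(fh):
            yield row
```

Calling iter_rows("./data/people-walkerarts.csv.bz2") would then yield one dict per record. Since the data is not standardized, the keys differ from one source to the next.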