ludoj

Scrapers that collect data about board games from the web.

Scraped websites

BoardGameGeek (bgg)
luding.org (luding)
spielen.de (spielen)

Run scrapers

Requires Python 3. Make sure your (virtual) environment is up-to-date:

pip install -Ur requirements.txt

Run a spider like so:

scrapy crawl <spider> -o 'feeds/%(name)s/%(time)s/%(class)s.csv'

where <spider> is one of the IDs listed in parentheses under "Scraped websites" (bgg, luding, or spielen).
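The -o argument above is a path template with printf-style placeholders. As a rough illustration of how such a template expands, the sketch below fills it in with plain Python string formatting. The helper name expand_feed_path, the item class name GameItem, and the timestamp format are assumptions made for this example; how the project actually supplies the class value is not shown in this README.

```python
from datetime import datetime, timezone

# Template copied from the crawl command above.
FEED_TEMPLATE = "feeds/%(name)s/%(time)s/%(class)s.csv"


def expand_feed_path(template, name, item_class, now=None):
    """Fill in a feed path template (hypothetical helper, for illustration only)."""
    if now is None:
        # Naive UTC timestamp, truncated to whole seconds below.
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    params = {
        "name": name,  # spider ID, e.g. "bgg"
        # Assumed format: ISO timestamp with ":" replaced so it is a safe directory name.
        "time": now.replace(microsecond=0).isoformat().replace(":", "-"),
        "class": item_class,  # assumed to be the name of the scraped item type
    }
    return template % params


example = expand_feed_path(
    FEED_TEMPLATE, "bgg", "GameItem", now=datetime(2020, 1, 2, 3, 4, 5)
)
print(example)
```

Because the timestamp is part of the path, each run writes to its own directory and does not overwrite earlier feeds.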

You can run scrapy check to perform contract tests for all spiders, or scrapy check <spider> to test one particular spider. If tests fail, the website has most likely changed and the spider needs updating.
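To see which individual spider is affected, the per-spider check can be scripted. The following is a minimal sketch, not part of the project: the function names are made up for this example, and it assumes that scrapy is on the PATH, that it is run from the project root, and that a nonzero exit status indicates a failed check.

```python
import subprocess

# Spider IDs as listed under "Scraped websites".
SPIDERS = ["bgg", "luding", "spielen"]


def check_commands(spiders):
    """Build one `scrapy check <spider>` command line per spider ID."""
    return [["scrapy", "check", spider] for spider in spiders]


def run_checks(spiders):
    """Run the contract tests spider by spider and return the IDs that failed.

    Assumption: a nonzero exit status means the check did not pass.
    """
    failed = []
    for spider, command in zip(spiders, check_commands(spiders)):
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(spider)
    return failed
```

Calling run_checks(SPIDERS) from the project root would then return the IDs of the spiders that likely need updating.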
