dbnl-scripts

Scripts to scrape DBNL (the Digitale Bibliotheek voor de Nederlandse Letteren, the digital library for Dutch literature), to work with its texts, and to generate a book of accidental haikus.

Project

I used the code in this repository to generate a book of single-sentence haikus, automatically collected from DBNL.

The first edition of the book was generated automatically and typeset in LaTeX. The LaTeX source for the book is in the ./book/ folder. There may or may not be a second, manually curated edition. (At 5,325 pages, curating it would be a lot of work!)
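
To make "single-sentence haiku" concrete, here is a minimal sketch of how a sentence could be tested for a 5-7-5 syllable split that falls on word boundaries. It is not taken from accidental_haiku.py; the names split_haiku and naive_syllables are hypothetical, and the vowel-group counter is only a crude stand-in for a proper hyphenation-based syllable count.

```python
import re
from typing import Callable, List, Optional, Sequence

# Assumption: a run of vowels approximates one syllable. This is a rough
# heuristic for illustration, not the method used in this repository.
VOWEL_GROUP = re.compile(r"[aeiouyáéíóúàèëïöü]+", re.IGNORECASE)

# Words are runs of letters, optionally joined by an apostrophe or hyphen.
WORD = re.compile(r"[^\W\d_]+(?:['’\-][^\W\d_]+)*")


def naive_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (at least one per word)."""
    return max(1, len(VOWEL_GROUP.findall(word)))


def split_haiku(
    sentence: str,
    count: Callable[[str], int] = naive_syllables,
    pattern: Sequence[int] = (5, 7, 5),
) -> Optional[List[str]]:
    """Return the haiku lines if the sentence fits the pattern, else None.

    Punctuation is dropped; each line must end exactly on a word boundary.
    """
    words = WORD.findall(sentence)
    lines: List[str] = []
    current: List[str] = []
    total = 0
    line_index = 0
    for word in words:
        if line_index == len(pattern):
            return None  # all lines are full but words remain
        current.append(word)
        total += count(word)
        if total == pattern[line_index]:
            lines.append(" ".join(current))
            current, total = [], 0
            line_index += 1
        elif total > pattern[line_index]:
            return None  # a word straddles a line break
    if line_index == len(pattern) and not current:
        return lines
    return None
```

Because the syllable counter is passed in as a parameter, a better one can be swapped in. With Pyphen installed, for example, a counter could be built on `pyphen.Pyphen(lang="nl_NL").inserted(word)` by counting the inserted hyphens and adding one; whether the repository counts syllables exactly this way is an assumption.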

Requirements

All code is written in Python 3. Libraries that need to be installed:

spaCy (including the Dutch model, nl_core_news_sm)
ebooklib
BeautifulSoup
Pyphen
pylatexenc

Some of the Python syntax requires Python 3.6 or later.
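
As a quick sanity check before running the scripts, the requirements above can be verified from Python itself. This is a small sketch, not part of the repository; the helper names are hypothetical, and the module names are the import names of the listed libraries (BeautifulSoup is imported as bs4, and the spaCy Dutch model is assumed to be installed as an importable package, which is how spaCy models are normally distributed).

```python
import importlib.util
import sys
from typing import Iterable, List, Sequence

# Import names for the libraries listed under Requirements.
REQUIRED_MODULES = [
    "spacy",
    "nl_core_news_sm",  # the Dutch spaCy model
    "ebooklib",
    "bs4",  # BeautifulSoup
    "pyphen",
    "pylatexenc",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(
    version: Sequence[int] = sys.version_info,
    minimum: Sequence[int] = (3, 6),
) -> bool:
    """Check that the (major, minor) version is at least the minimum."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.6 or later is needed.")
    for name in missing_modules(REQUIRED_MODULES):
        print("Missing module:", name)
```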

Using this code for other projects

The first two Python scripts (index_dbnl.py and download_example.py) are probably useful for other projects as well. Building an index of DBNL lets you search the database locally, which is much faster than scraping the website. The download script shows how to download epub books, and can easily be adapted to your needs.

The other two scripts (accidental_haiku.py and generate_chapters.py) are more specific to this project, but they show how the functions in utils.py can be used, for example to read text from .epub files.
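
For readers who only want the general idea of reading text from an .epub file, the sketch below uses nothing but the standard library. It does not reproduce utils.py, which presumably relies on ebooklib and BeautifulSoup (both listed under Requirements); the function name epub_text is hypothetical. An epub is a zip archive of (X)HTML files, so the sketch opens the archive and strips the markup from each document.

```python
import zipfile
from html.parser import HTMLParser
from typing import List


class _TextCollector(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.chunks.append(" ")  # keep adjacent blocks from running together

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def epub_text(path_or_file) -> List[str]:
    """Return one whitespace-normalised string per HTML document in the epub.

    Simplification: documents are visited in file-name order, not in the
    reading order given by the epub's package file, and text in <head>
    (such as a <title>) is included.
    """
    texts = []
    with zipfile.ZipFile(path_or_file) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextCollector()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                parser.close()
                texts.append(" ".join("".join(parser.chunks).split()))
    return texts
```

The function accepts either a path or a file-like object, so `epub_text("some_book.epub")` (a placeholder file name) returns a list of plain-text strings that can then be passed to a sentence splitter such as spaCy.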
