To start off, we need to parse some Gutenberg data! Since I couldn't find anything that fit the bill, I'm going to parse the HTM files myself (since HTML is much easier to work with than raw text).
Next? THE WORLD!
Downloaded from:
- http://mirrors.pglaf.org/gutenberg-iso/pgsfcd-032007.zip
- http://mirrors.pglaf.org/gutenberg-iso/pgdvd072006.iso
and extracted out of their lonely ISO containers.
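
As a rough idea of what the parsing step could look like, here is a minimal sketch using only the Python standard library. The names `TextExtractor`, `html_to_text`, and `find_htm_files` are hypothetical helpers made up for illustration, and matching both `.htm` and `.html` (in any letter case) is an assumption about how the extracted files are named, not something checked against the actual discs.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def find_htm_files(root):
    """Recursively list .htm/.html files under root (assumed extensions)."""
    wanted = {".htm", ".html"}
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )
```

To use it, point `find_htm_files` at wherever the ISO contents were extracted, read each file, and pass the contents to `html_to_text`. The books are not guaranteed to share one encoding, so reading with something like `path.read_text(encoding="utf-8", errors="replace")` is a lossy stopgap rather than a real fix.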