
mpevner/GITenberg

 
 


Project Gutenberg Stats

  • Estimated 1.6 million files
  • Reported 650 GB total
  • ~40,000+ books

Links to: Home Page - Book Repositories - Issues

How are we getting the files?

rsync -rvhz --progress --partial ftp...

Each repo should...

  • metadata.yml
    • author
    • title
    • publishing info
    • provenance
  • book_name.{rst|txt}
    • book text in a master source format
  • license.txt
    • PG license information
    • transcriber, converter credits
  • README.rst
    • generic GITenberg info
    • generic PG info
    • book specific info
    • desc and links to toolchains
    • desc and links to generated versions for ebook readers
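The layout above can be checked mechanically. The following is only a minimal sketch: `missing_items` is a hypothetical helper, not part of this repository, and it assumes the book text carries a `.rst` or `.txt` extension.

```python
from pathlib import Path

# Hypothetical names: the required files follow the list above; the
# accepted book-text extensions (.rst, .txt) are an assumption.
REQUIRED_FILES = ("metadata.yml", "license.txt", "README.rst")
BOOK_EXTENSIONS = (".rst", ".txt")


def missing_items(repo_dir):
    """Return the expected items that are absent from repo_dir."""
    repo = Path(repo_dir)
    missing = [name for name in REQUIRED_FILES if not (repo / name).is_file()]
    # Any other .rst/.txt file at the top level counts as the book text.
    has_book = any(
        p.is_file()
        and p.suffix.lower() in BOOK_EXTENSIONS
        and p.name not in REQUIRED_FILES
        for p in repo.iterdir()
    )
    if not has_book:
        missing.append("book text (.rst or .txt)")
    return missing
```

An empty return value means the directory has every item in the list; it says nothing about whether the contents of those files are complete.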

Smart comments:

Convert all files to UTF-8 https://groups.google.com/forum/?fromgroups#!topic/prj-alexandria/VhKbMyH9kcA
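One way the conversion might look, as a rough sketch only: `to_utf8` is a hypothetical helper, and the Latin-1 fallback is an assumption, since the source encodings of the files are not stated here. A file that is neither UTF-8 nor Latin-1 would be converted incorrectly, so real use would need per-file encoding information.

```python
from pathlib import Path


def to_utf8(path, fallback_encoding="latin-1"):
    """Rewrite path as UTF-8; return True if the file was changed."""
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8 (plain ASCII included)
    except UnicodeDecodeError:
        # Assumption: anything that is not UTF-8 is in fallback_encoding.
        text = raw.decode(fallback_encoding)
    Path(path).write_bytes(text.encode("utf-8"))
    return True
```

Checking for valid UTF-8 first keeps the function safe to run more than once on the same file.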

File formats:

A list of file formats and their frequency is in the docs folder, generated via:

find . -type f | rev | cut -d. -f1 | grep -v "/" | rev | sort -f | uniq -c | sort -nr
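For readers who prefer Python to the shell pipeline, roughly the same tally can be sketched as follows. `extension_counts` is a hypothetical name, not something in this repository, and symlink handling may differ slightly from `find -type f`.

```python
import os
from collections import Counter


def extension_counts(root="."):
    """Return (extension, count) pairs, most frequent first."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Text after the last dot; names without a dot are skipped,
            # which mirrors the grep -v "/" step in the pipeline.
            _base, dot, ext = name.rpartition(".")
            if dot:
                counts[ext] += 1
    return counts.most_common()
```

As in the shell version, extensions that differ only in case (`txt` and `TXT`) are counted separately.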

.tei

a master format

  • http://www.tei-c.org/Tools/Stylesheets/
  • http://code.google.com/p/hrit/source/browse/rst2xml-tei.py?repo=tei-rest

.rst

a master format

Research toolchain for rst >> whatever

dp rst manual http://pgrst.pglaf.org/publish/181/181-h.html

Future

About

A project to migrate Project Gutenberg to a version control system
