Code and documentation associated with "Understanding Genre in a Collection of a Million Volumes"

tedunderwood/GenreProject

GenreProject

This repository stores work related to the project "Understanding Genre in a Collection of a Million Volumes."

The folder /Java contains Java code, mostly related to a browser that we used for page tagging. For the actual classification code in Java, see my top-level pages repo, which contains much more of it.

The folder /python contains Python scripts for wrangling data stored on the University of Illinois Taub cluster, especially for extracting feature counts once pages are tagged by genre. For the Python code that I used to generate features for classification by genre, see instead the pagefeatures subfolder under my top-level DataMunging repo.

The files in /Plans describe plans for the project, or for some aspect of it. The main ProjectPlans are numbered in sequential order. Other files in this folder describe a particular data object or process; for an overview of the project structure, see the image in Flowchart.pdf.

/Proposals stores the initial proposals for the project, which is presently funded by the American Council of Learned Societies and the National Endowment for the Humanities.

/SampleData contains, notably, metadata.txt, which is an updated copy of the metadata for the collection.
