cfournie/docstruct

DocStruct - A Document Structure Parser

A tool to create Document Structure[1] (DS) trees from XHTML websites. This was created as a term project for
CSI 5386 at the University of Ottawa in Fall 2009. More detailed information on the project can be
found in the paper located at http://cloud.github.com/downloads/cfournie/docstruct/paper.pdf
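This README does not describe how the parser maps XHTML markup onto DS levels. The following is a hypothetical minimal sketch, not the code in \module\. It assumes well-formed XHTML with no DTD entities such as &nbsp; (so the standard library's ElementTree can parse it), treats h1 through h6 as nested section boundaries and <p> elements as paragraphs, and only inspects the direct children of <body>. The names Node and build_ds_tree and the level labels "document", "section" and "paragraph" are illustrative assumptions; the real DS XML Schema in \spec\ may use different element names and levels.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Heading tags and their nesting depth (h1 = 1, ..., h6 = 6).
HEADINGS = {f"h{i}": i for i in range(1, 7)}


@dataclass
class Node:
    # Hypothetical DS node: a level label plus the text it covers.
    level: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


def local_name(tag: str) -> str:
    # Drop the "{namespace}" prefix ElementTree puts on XHTML tags.
    return tag.rsplit("}", 1)[-1]


def build_ds_tree(xhtml: str) -> Node:
    """Build a simplified DS-like tree from a well-formed XHTML string."""
    root = ET.fromstring(xhtml)
    body = next(el for el in root.iter() if local_name(el.tag) == "body")
    doc = Node("document")
    # Stack of (heading depth, open node); depth 0 is the document root.
    stack: list[tuple[int, Node]] = [(0, doc)]
    for el in body:
        name = local_name(el.tag)
        text = "".join(el.itertext()).strip()
        if name in HEADINGS:
            depth = HEADINGS[name]
            # A heading closes every open section at the same or deeper level.
            while stack[-1][0] >= depth:
                stack.pop()
            section = Node("section", text)
            stack[-1][1].children.append(section)
            stack.append((depth, section))
        elif name == "p":
            # Paragraphs attach to the innermost open section.
            stack[-1][1].children.append(Node("paragraph", text))
    return doc


def dump(node: Node, indent: int = 0) -> None:
    # Print the tree with two spaces of indentation per level.
    label = f"{node.level}: {node.text}" if node.text else node.level
    print("  " * indent + label)
    for child in node.children:
        dump(child, indent + 1)


if __name__ == "__main__":
    page = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
    <h1>Intro</h1><p>First.</p>
    <h2>Background</h2><p>Second.</p>
    <h1>Method</h1><p>Third.</p>
    </body></html>"""
    dump(build_ds_tree(page))
```

Here the h2 "Background" section nests under "Intro", and the second h1 closes both open sections before "Method" is attached to the document root. A stack keyed on heading depth is one simple way to recover nesting from the flat sibling sequence that XHTML pages typically use.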


Directories
  \module\  - Contains the Python parser tool
  \spec\    - Contains example DS trees and the DS XML Schema
  

References

[1] R. Power, D. Scott, and N. Bouayad-Agha, "Document structure," Comput. Linguist., vol. 29, no. 2,
pp. 211-260, 2003. Accessible at http://www.mitpressjournals.org/doi/abs/10.1162/089120103322145315
