forked from janlaan/zoekmachines
MichaelF89/zoekmachines
######
#
# Subset of New York Times Corpus
# for Project Information Retrieval (PIR)
#
#
# contact: Manos Tsagkias <e.tsagkias@uva.nl>
#          Christof Monz <c.monz@uva.nl>
# last revision: 26 January 2010
########

The data/ directory is a subset of the New York Times Corpus released by LDC.
The subset includes 7,167 articles from April 2007.

The directory structure is as follows:

    year-month -> day -> article.xml

Each article comes in one XML file in the corresponding directory.

Sample data is provided in the directory:

    sample/    Contains 208 articles from May 01, 2007.

In the docs/ directory you can find useful guidelines on how to access the XML data.

Extraction tools written in Java can be found in the tools/ directory.
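
The year-month -> day -> article.xml layout described above can be walked with a few lines of code. The following is a minimal Python sketch, not one of the Java tools shipped in tools/. The function names iter_articles and count_by_day are hypothetical, and the sketch assumes only that every article is an .xml file exactly two directory levels below the root. It does not parse the XML itself; see docs/ for that.

```python
from pathlib import Path
from typing import Dict, Iterator, Tuple


def iter_articles(root: str) -> Iterator[Tuple[str, str, Path]]:
    """Yield (year_month, day, path) for every XML file under root.

    Assumed layout (taken from the README): <root>/<year-month>/<day>/<article>.xml
    """
    for path in sorted(Path(root).glob("*/*/*.xml")):
        day_dir = path.parent
        yield day_dir.parent.name, day_dir.name, path


def count_by_day(root: str) -> Dict[Tuple[str, str], int]:
    """Count articles per (year_month, day) directory pair."""
    counts: Dict[Tuple[str, str], int] = {}
    for year_month, day, _ in iter_articles(root):
        key = (year_month, day)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Calling count_by_day("data") from the repository root would give a per-day article count, which is a quick way to check that the checkout is complete before running any indexing.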