LunasAbacus/DataMiner
DataMiner
=========

Authors
=======
Nathan Jacobs
Joshua Adams

Mine all the data, for homework and science!

Files
=====
TagExtractor.py - data structure used to extract tags from the .sgm files; the extracted tags are used to create the feature vectors
stopwords.txt - list of stop words that are eliminated in a preliminary step when creating a feature vector
FeatureVector2.py - constructs a feature vector for each Reuters article in the file by pulling out all of the nouns in the body
FeatureVector3.py - constructs a feature vector by keeping a count of the words that are not listed in stopwords.txt
FeatureVector4.py - constructs a feature vector for each Reuters article by computing the frequency distribution of the words in the body
output-FeatureVector2.txt - sample output for FeatureVector2
output-FeatureVector3.txt - sample output for FeatureVector3
output-FeatureVector4.txt - sample output for FeatureVector4

Installation
============
1. Download Python 2.7.5 from http://www.python.org/download/
2. Install nltk, following the instructions at http://nltk.org/install.html
3. In Python IDLE, type the following:
       import nltk
       nltk.download()
4. Click the download button in the window that pops up

Running the Program
===================
To run the program, type 'python FeatureVector[number].py' in a terminal, where [number] is the number of the feature vector to run.

Each program writes its output to a file named after the feature vector that is executed. For example, FeatureVector2.py creates the file 'output-FeatureVector2.txt'.
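
As a rough illustration of the stop-word counting idea described for FeatureVector3.py, the sketch below counts the words in an article body that are not stop words. It is not the repository's code: the function names `load_stopwords` and `count_vector`, the one-word-per-line layout assumed for stopwords.txt, and the simple regular-expression tokenizer are all assumptions made for this example.

```python
import re
from collections import Counter


def load_stopwords(path):
    """Read stop words from a file, assuming one word per line (hypothetical layout)."""
    with open(path) as handle:
        return set(line.strip().lower() for line in handle if line.strip())


def count_vector(body, stopwords):
    """Return a Counter of the words in body that are not stop words."""
    # Lowercase the text and split it into runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", body.lower())
    return Counter(word for word in words if word not in stopwords)


# A small inline stop-word set stands in for load_stopwords("stopwords.txt").
stopwords = set(["the", "of", "in", "a"])
vector = count_vector("The price of wheat rose in the wheat market", stopwords)
print(sorted(vector.items()))
```

Dividing each count by the total number of counted words would turn this into relative frequencies, which is one way to read the frequency distribution mentioned for FeatureVector4.py.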