How the system works from end to end

The system is built in Python and Bash. Optimal systems are identified with R. You may need to install some of these dependencies:

nltk (Python library)
csplit (command-line utility from GNU coreutils)
BeautifulSoup (Python XML parser)
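The helper below is a quick way to check which of these are already installed. It is a minimal sketch and is not part of the repository. The README does not say which BeautifulSoup version it expects, so the sketch accepts either version 4 (module `bs4`) or the older version 3 (module `BeautifulSoup`).

```python
import importlib.util
import shutil

# Candidate module names for each Python dependency. BeautifulSoup 4 installs
# as "bs4"; the older BeautifulSoup 3 installed as "BeautifulSoup". Which one
# this project expects is an assumption, so either counts as present.
PYTHON_DEPS = {
    "nltk": ["nltk"],
    "beautifulsoup": ["bs4", "BeautifulSoup"],
}


def missing_dependencies():
    """Return the names of dependencies that cannot be found."""
    missing = []
    for name, modules in PYTHON_DEPS.items():
        # find_spec returns None when a top-level module is not installed.
        if not any(importlib.util.find_spec(m) is not None for m in modules):
            missing.append(name)
    # csplit is a command-line tool, so look for it on PATH instead.
    if shutil.which("csplit") is None:
        missing.append("csplit")
    return missing


if __name__ == "__main__":
    print(missing_dependencies() or "all dependencies found")
```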

Converting to pseudo-Senseval-2 format

Read from .train to .pos files

 cd EnglishLS.test/
 ./import.sh EnglishLS.test

This creates two directories:

EnglishLS.test.split
senseval2_format

The former contains one file per instance in the corpus. The latter contains one file per target word type, and each of these files holds all of the instances for that word.
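The Python sketch below approximates the two groupings that import.sh produces. It is hypothetical and is not the actual script, which uses csplit. It assumes the corpus uses Senseval-style `<lexelt item="...">` and `<instance id="...">` markup. It also keeps both groupings in memory instead of writing them to EnglishLS.test.split and senseval2_format. The names `split_corpus` and `SAMPLE` are illustrative.

```python
import re
from collections import defaultdict

# Hypothetical sketch of the import step; the real pipeline uses import.sh/csplit.
# Assumes Senseval-style markup: <lexelt item="..."> blocks that contain
# <instance id="..."> ... </instance> blocks.
LEXELT_RE = re.compile(r'<lexelt item="([^"]+)"[^>]*>(.*?)</lexelt>', re.S)
INSTANCE_RE = re.compile(r'<instance id="([^"]+)"[^>]*>.*?</instance>', re.S)

# Tiny made-up corpus used only to demonstrate the grouping.
SAMPLE = """<corpus lang="english">
<lexelt item="activate.v">
<instance id="activate.v.1" docsrc="BNC">
<context>They <head>activate</head> it.</context>
</instance>
<instance id="activate.v.2" docsrc="BNC">
<context>To <head>activate</head> the switch.</context>
</instance>
</lexelt>
<lexelt item="bank.n">
<instance id="bank.n.1" docsrc="BNC">
<context>The <head>bank</head> closed.</context>
</instance>
</lexelt>
</corpus>"""


def split_corpus(text):
    """Group a corpus two ways, mirroring the two output directories.

    Returns (per_instance, per_word): per_instance maps each instance id to its
    raw <instance> block (one "file" per instance); per_word maps each word
    type to the list of its instance blocks (one "file" per word type).
    """
    per_instance = {}
    per_word = defaultdict(list)
    for lexelt in LEXELT_RE.finditer(text):
        word, body = lexelt.group(1), lexelt.group(2)
        for inst in INSTANCE_RE.finditer(body):
            per_instance[inst.group(1)] = inst.group(0)
            per_word[word].append(inst.group(0))
    return per_instance, dict(per_word)


if __name__ == "__main__":
    instances, words = split_corpus(SAMPLE)
    # prints: 3 {'activate.v': 2, 'bank.n': 1}
    print(len(instances), {w: len(b) for w, b in words.items()})
```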

Because all of the tests were automated, we wrote each run's output to a file as we tested. The results are in experiment/allruns. The file calls.csv shows the features used in each test and the output of each test. The output is formatted so that the scorer accepts it.

Running

First, MaltParser needs some unusual system state to be set before it will work. Run ./parser.py to set this state.

Then run ./classify.py. It reads in the training and test files. Be sure to read its --help (or -h) text for the available flags, because running it without flags will fail.

To use the dependency parser, you will have to install and configure MaltParser yourself, possibly editing some files. It was too large to fit within the file size limit. Our setup had a MaltParse directory at the root containing engmalt.linear.mco, the training file, and a malt-1.2 directory holding the Java binaries. If you replicate that setup, it should work without modification.
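The helper below is a small check for the MaltParser layout described above. It assumes that "root" means the repository root and that the directory is literally named MaltParse. The helper name `check_malt_layout` is hypothetical. It does not check the training file, because the README does not give that file's name.

```python
from pathlib import Path


def check_malt_layout(repo_root="."):
    """Return the expected MaltParser paths that are missing under repo_root.

    Assumption: the MaltParse directory sits at the repository root. Only
    names given in the README are checked; the training file is skipped
    because its name is not stated.
    """
    malt_dir = Path(repo_root) / "MaltParse"
    expected = [
        malt_dir,                         # top-level MaltParse directory
        malt_dir / "engmalt.linear.mco",  # pretrained English model
        malt_dir / "malt-1.2",            # directory holding the Java binaries
    ]
    return [str(p) for p in expected if not p.exists()]
```

An empty return value means every checked path is present. Any path in the list still needs to be set up.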

The example outputs are in experiment/allruns. The file calls.csv describes the parameters used to generate each output file, and the tmp# files are our actual output.
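A lookup like the one below can find the parameters behind a particular tmp# file. The helper is hypothetical. The README does not document the column layout of calls.csv, so the helper returns any row that contains a cell equal to the requested file name.

```python
import csv


def rows_for_output(lines, output_name):
    """Return every calls.csv row that has a cell equal to output_name.

    `lines` is any iterable of CSV text lines, such as an open file handle.
    The column layout of calls.csv is not documented, so this matches on any
    cell instead of assuming a particular column name.
    """
    return [row for row in csv.reader(lines)
            if any(cell.strip() == output_name for cell in row)]
```

For example, `with open("experiment/allruns/calls.csv", newline="") as f: print(rows_for_output(f, "tmp3"))` would list the rows that mention tmp3. Here tmp3 is only an illustrative file name.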

aavarghese/cs4740_3
