GitHub - orestes1986/voz2b

.idea
css
data
js
paste
pattern
sam-clisp-irb
sam-clisp
scripts
scripts_no_voz
stories
tool_corpus_functions_summary
webapp2_extras
webob
.gitignore
README
all_coreferenced-entities.csv
all_coreferenced-entities.tsv.csv
autoreloader.py
classificationhelper.py
coreferencehelper.py
dependencyhelper.py
entitymanager.py
featuremanager.py
features_basic.py
formatter.py
functionsearchhelper.py
grammarhelper.py
graphhelper.py
markovhelper.py
mctshelper.py
narrativehelper.py
networkcachemanager.py
nltkhelper.py
oldannotationhelper.py
overall_confusion.txt
parse_tree_mention_helper.py
qsahelper.py
quotedbase.py
quotedspeechhelper.py
quotedspeechpredictor.py
requirements.txt
riuhelper.py
sequence.py
settings.py
standalone-frontend.py
stanfordhelper.py
styhelper.py
synthetichelper.py
test_voz.py
tool_aiide_to_riu.py
tool_coreference_report_ijcai.py
tool_corpus_functions_summary.py
tool_entity_classification_loop_aaai.py
tool_filter_quoted_speech.py
tool_generate_sentences_for_annotation.py
tool_get_all_verbs_from_finlayson.py
tool_irb_consent.html
tool_irb_form_1.html
tool_irb_form_2.html
tool_irb_form_3.html
tool_irb_form_4.html
tool_irb_sorry.html
tool_irb_synthetic_forms.py
tool_irb_synthetic_forms_data_collection.py
tool_irb_synthetic_forms_httpserver.py
tool_irb_thanks.html
tool_print_stats_on_corpus.py
tool_run_tests.py
tool_sentence_classification_aaai.py
tool_sigdial_functionsearchhelper.py
tool_sigdial_test_propp_automatic.py
tool_sigdial_test_propp_manual.py
tool_sigdial_test_qsa_automatic.py
tool_sigdial_test_qsa_manual.py
util.py
utterancegenerator.py
verbhelper.py
verbmanager.py
voz.py
vozbase.py
web.py
webapp2.py
weights.json

INSTALLATION:
After checking out the repository, make sure the following system libraries are installed:
* libxml2 (e.g., sudo yum install libxml2)
Then, install the Python dependencies:
> pip install -r requirements.txt
Finally, download the required NLTK data packages:
> python -m nltk.downloader wordnet
> python -m nltk.downloader names
> python -m nltk.downloader punkt
> python -m nltk.downloader sentiwordnet
> python -m nltk.downloader stopwords
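To check which of these packages are already present, the following sketch scans the usual NLTK data locations. It is not part of voz and it simplifies NLTK's own lookup: it assumes only the directories listed in the NLTK_DATA environment variable and ~/nltk_data need to be searched (NLTK also checks several system-wide locations, and nltk.data.find() remains the authoritative check). A package counts as installed if it exists either unpacked or as the downloaded .zip archive. The function names are hypothetical.

```python
"""Report which NLTK data packages listed in the README are missing.

Sketch only: approximates NLTK's lookup by scanning the NLTK_DATA
directories and ~/nltk_data. nltk.data.find() is the authoritative check.
"""
import os

# Package id -> nltk_data subdirectory the NLTK downloader installs it into.
REQUIRED_PACKAGES = [
    ("wordnet", "corpora"),
    ("names", "corpora"),
    ("punkt", "tokenizers"),
    ("sentiwordnet", "corpora"),
    ("stopwords", "corpora"),
]


def default_search_dirs():
    """Directories from NLTK_DATA (os.pathsep-separated), then ~/nltk_data."""
    dirs = [d for d in os.environ.get("NLTK_DATA", "").split(os.pathsep) if d]
    dirs.append(os.path.expanduser(os.path.join("~", "nltk_data")))
    return dirs


def is_installed(package, category, base):
    """True if base/category/package exists unpacked or as a .zip archive."""
    path = os.path.join(base, category, package)
    return os.path.isdir(path) or os.path.isfile(path + ".zip")


def missing_packages(search_dirs=None):
    """Return, in README order, the package ids not found in any directory."""
    if search_dirs is None:
        search_dirs = default_search_dirs()
    return [package for package, category in REQUIRED_PACKAGES
            if not any(is_installed(package, category, base)
                       for base in search_dirs)]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing NLTK data: " + ", ".join(missing))
        print("Install with: python -m nltk.downloader " + " ".join(missing))
    else:
        print("All required NLTK data packages were found.")
```

The script only uses the standard library, so it can be run before NLTK itself is installed; the suggested command passes several package ids to a single nltk.downloader call.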

RUNNING:
Several stand-alone scripts are included. To start the web interface, run ./web.py

KNOWN CAVEATS AND LIMITATIONS:
Split antecedents are not currently supported: although the code can handle lists of mentions, this functionality is disabled.
The system has very limited world knowledge and, because of biases in the training data, proper nouns such as toponyms (place names) are likely to be misclassified as characters.
Because the training data comes from Afanasev's folktales, the system may underperform on out-of-domain text and may overfit to the stories in the training data.
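A split antecedent is a plural mention, such as "they", that refers jointly to two or more entities mentioned separately, e.g. "the old man" and "the old woman". The sketch below illustrates the first caveat with a hypothetical data model (Mention, link and ALLOW_SPLIT_ANTECEDENTS are illustrative names, not voz's classes): an antecedent may be given as a single mention or a list of mentions, and with the feature disabled any link with more than one antecedent is rejected.

```python
class Mention(object):
    """A text span that refers to some entity (hypothetical minimal model)."""

    def __init__(self, text, start, end):
        self.text = text
        self.start = start  # token offset, inclusive
        self.end = end      # token offset, exclusive

    def __repr__(self):
        return "Mention(%r)" % self.text


# Hypothetical switch mirroring the caveat: list-valued antecedents are
# representable but turned off.
ALLOW_SPLIT_ANTECEDENTS = False


def link(anaphor, antecedents, allow_split=ALLOW_SPLIT_ANTECEDENTS):
    """Return a coreference link as (anaphor, tuple_of_antecedents).

    A single Mention is wrapped in a one-element tuple. A list with more
    than one Mention is a split antecedent and is only accepted when
    allow_split is True.
    """
    if isinstance(antecedents, Mention):
        antecedents = [antecedents]
    antecedents = tuple(antecedents)
    if not antecedents:
        raise ValueError("an anaphor needs at least one antecedent")
    if len(antecedents) > 1 and not allow_split:
        raise NotImplementedError("split antecedents are disabled")
    return anaphor, antecedents
```

With the default setting, link(they, man) succeeds, while link(they, [man, woman]) raises NotImplementedError; passing allow_split=True shows what enabling the existing list handling would mean.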

TODO:
Port the latest code from the IJCAI work, which iterates to refine coreference, verb arguments, and predictions of mention types and roles.
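The IJCAI code is not in this repository, so the following is only a generic sketch of the kind of loop the TODO describes: stages that each read the shared state and return an updated copy are applied repeatedly until a full pass changes nothing (a fixed point) or max_iterations is reached. The state layout and the two toy stages (a pronoun-linking stand-in for coreference and a type-propagation stand-in for mention classification) are hypothetical and far simpler than voz's models; a verb-argument stage would plug in the same way.

```python
def refine(state, stages, max_iterations=10):
    """Apply every stage in order, repeatedly, until a fixed point.

    Each stage is a function state -> new state and must not mutate its
    input. Returns (final_state, passes_run).
    """
    passes = 0
    for passes in range(1, max_iterations + 1):
        previous = state
        for stage in stages:
            state = stage(state)
        if state == previous:  # a full pass changed nothing
            break
    return state, passes


# --- Toy stages (hypothetical; not voz's models) ---------------------------

def link_pronouns(state):
    """Coreference stand-in: an unlinked pronoun joins the entity of the
    closest preceding mention currently typed as a character."""
    coref = dict(state["coref"])
    for i, mention in enumerate(state["mentions"]):
        if mention in state["pronouns"] and coref[mention] is None:
            for earlier in reversed(state["mentions"][:i]):
                if state["types"][earlier] == "character":
                    coref[mention] = coref[earlier]
                    break
    return dict(state, coref=coref)


def type_from_coref(state):
    """Mention-type stand-in: anything coreferent with a character is one."""
    types = dict(state["types"])
    character_entities = {state["coref"][m] for m, t in types.items()
                          if t == "character" and state["coref"][m] is not None}
    for mention, entity in state["coref"].items():
        if entity in character_entities:
            types[mention] = "character"
    return dict(state, types=types)


EXAMPLE_STATE = {
    "mentions": ("Ivan", "the tsar's son", "he"),
    "pronouns": frozenset(["he"]),
    "types": {"Ivan": "character", "the tsar's son": None, "he": None},
    "coref": {"Ivan": "e1", "the tsar's son": "e1", "he": None},
}

if __name__ == "__main__":
    final, passes = refine(EXAMPLE_STATE, [type_from_coref, link_pronouns])
    print("converged after %d passes: %r" % (passes, final["types"]))
```

In this toy run, "he" is only linked to e1 after the first typing stage has marked "the tsar's son" as a character, and it is only typed as a character on the second pass; the third pass changes nothing, so the loop stops. That interdependence is why a single pipeline pass can leave predictions that a further iteration would correct.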