drwiner/DIRTAR: "Discovery of Inference Rules from Text for Action Recognition", a modified paraphrase extraction algorithm (DIRT) in which slots are dependency positions and can be assigned semantic classes to discriminate candidates

David Winer IE task: DIRTAR: "Discovery of Inference Rules from Text for Action Recognition"

Summary: DIRT algorithm (Lin and Pantel, 2001) with modifications (lemmas, constituency parse, slot-sim, slot-types, hypernyms, semantic-discrimination)
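
For orientation, the similarity measure at the core of the original DIRT algorithm can be sketched as below. This is a minimal sketch of the Lin and Pantel (2001) formulation only, not the code in dirtar.py, and it includes none of the modifications listed above. The names `TripleDB`, `slot_sim`, and `path_sim` and the toy triples are hypothetical, and clamping negative mutual information to zero is a simplification made for this sketch.

```python
import math
from collections import defaultdict


class TripleDB:
    """Toy triple database holding counts of (path, slot, word) observations."""

    def __init__(self):
        self.psw = defaultdict(int)      # |p, s, w|
        self.ps = defaultdict(int)       # |p, s, *|
        self.sw = defaultdict(int)       # |*, s, w|
        self.s = defaultdict(int)        # |*, s, *|
        self.fillers = defaultdict(set)  # (path, slot) -> words seen in that slot

    def add(self, path, x_word, y_word):
        """Record one occurrence of `path` with its X and Y slot fillers."""
        for slot, word in (("X", x_word), ("Y", y_word)):
            self.psw[(path, slot, word)] += 1
            self.ps[(path, slot)] += 1
            self.sw[(slot, word)] += 1
            self.s[slot] += 1
            self.fillers[(path, slot)].add(word)

    def mi(self, path, slot, word):
        """Pointwise mutual information between a path slot and a filler word."""
        joint = self.psw.get((path, slot, word), 0)
        if joint == 0:
            return 0.0
        ratio = joint * self.s[slot] / (self.ps[(path, slot)] * self.sw[(slot, word)])
        # Assumption of this sketch: negative values are clamped to zero.
        return max(math.log(ratio), 0.0)


def slot_sim(db, p1, p2, slot):
    """Similarity of the same slot in two paths, based on shared fillers."""
    w1 = db.fillers.get((p1, slot), set())
    w2 = db.fillers.get((p2, slot), set())
    shared = sum(db.mi(p1, slot, w) + db.mi(p2, slot, w) for w in w1 & w2)
    total = sum(db.mi(p1, slot, w) for w in w1) + sum(db.mi(p2, slot, w) for w in w2)
    return shared / total if total > 0 else 0.0


def path_sim(db, p1, p2):
    """Geometric mean of the X-slot and Y-slot similarities."""
    return math.sqrt(slot_sim(db, p1, p2, "X") * slot_sim(db, p1, p2, "Y"))


# Hypothetical toy data, for illustration only.
db = TripleDB()
db.add("X shoots Y", "cowboy", "bandit")
db.add("X shoots Y", "sheriff", "outlaw")
db.add("X fires at Y", "cowboy", "bandit")
db.add("X fires at Y", "sheriff", "outlaw")
db.add("X rides Y", "cowboy", "horse")

print(path_sim(db, "X shoots Y", "X fires at Y"))
print(path_sim(db, "X shoots Y", "X rides Y"))
```

Two paths score highly only when both their X fillers and their Y fillers overlap, which is why the measure takes the geometric mean of the two slot similarities rather than their sum.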

CODE FILES:

moviescript_crawler.py - collects the movie corpus from a local path and appends each document to a single file called movie_combo.txt.

sentence_parser.py - reads movie scripts from movie_combo.txt, splits them into sentences, and splits each sentence into clauses (using a constituency parse). The output is "movie_clauses.txt".

semantic_parser.py - used for one of the experimental conditions; contains hand-written FrameNet-style rules for discriminating candidate nouns from slots.

dirtar.py - runs the DIRT algorithm under all experimental conditions, and specifies which X and Y slot dependencies are used for some of those conditions. Running this file dumps the databases as pickle files, which are loaded for analysis by "assign_labels_moviedirt.py".

assign_labels_moviedirt.py - reads the triple databases, then reads the test sentences (duel corpus sentences) in "IE_sent_key.txt" and parses them with the Stanford parser. Outputs one text file per experimental condition, where each line is a verb in the duel corpus, the guess, etc.

score_labels_dirtar.py - reads the experimental labels, calculates F-score, etc., and writes the results to files in the "scored_labels" folder.
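
The scoring step can be illustrated with a minimal sketch of precision, recall, and F-score for a single action class. This is not the code in score_labels_dirtar.py: the function name `precision_recall_f1`, the representation of labels as sets, and the sample data are all assumptions made for illustration.

```python
def precision_recall_f1(gold, guessed, action):
    """Score one action class over parallel lists of label sets.

    gold[i] and guessed[i] are the sets of action classes for test item i.
    """
    tp = fp = fn = 0
    for gold_labels, guess_labels in zip(gold, guessed):
        in_gold = action in gold_labels
        in_guess = action in guess_labels
        if in_gold and in_guess:
            tp += 1  # correctly guessed
        elif in_guess:
            fp += 1  # guessed but not in the key
        elif in_gold:
            fn += 1  # in the key but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels, for illustration only.
gold = [{"shoot"}, {"shoot"}, {"draw"}, {"shoot"}]
guessed = [{"shoot"}, {"draw"}, {"shoot"}, {"shoot"}]
print(precision_recall_f1(gold, guessed, "shoot"))
```

Calling the function once per action of interest, and once more over the pooled counts, would give the kind of per-action and "total" breakdown that the scored_labels folder is described as holding.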

FOLDERS:

extract_duel_sentences.zip - includes the original duel corpus and the data structures used to construct "IE_sent_key.txt" from an Excel file.

experimental_labels - folder containing the text files written by assign_labels_moviedirt.py

scored_labels - folder containing the evaluation of each experimental condition; includes a "total" and one per action of interest

Other Files:

IE_sent_key.txt - each line has a sentence from the duel corpus, followed by "-#-", followed by the list of action classes for that sentence.

movie_combo.txt - not attached (too large); contains the combined text of the movies from Walker's cleaned IMSDb database (https://nlds.soe.ucsc.edu/fc2)

movie_clauses.txt - movie_combo.txt separated into clause triples, where each slot in the triple has some extra annotations from the parse
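
The IE_sent_key.txt line format described above can be read with a few lines of Python. This is a minimal sketch: the function name `parse_key_line` and the sample line are hypothetical, and because the exact layout of the action-class list is not specified here, the sketch assumes the classes are separated by commas or whitespace and tolerates surrounding brackets and quotes.

```python
import re


def parse_key_line(line):
    """Split one key line into (sentence, list of action classes)."""
    sentence, _, labels = line.rstrip("\n").partition("-#-")
    # Assumed label layout: tokens separated by commas/whitespace,
    # possibly wrapped in brackets or quotes.
    actions = [tok for tok in re.split(r"[,\s\[\]'\"]+", labels) if tok]
    return sentence.strip(), actions


# Hypothetical example line, for illustration only.
sentence, actions = parse_key_line("He draws his gun and fires. -#- draw, shoot\n")
print(sentence)
print(actions)
```

Using `str.partition` rather than `str.split` keeps the parse well defined even if a line is missing the "-#-" separator, in which case the action list is simply empty.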

