Skip to content

DIRTAR: "Discovery of Inference Rules from Text for Action Recognition": modified paraphrase extraction algorithm (DIRT) where slots are dependency positions and can be assigned semantic classes to discriminate candidates

drwiner/DIRTAR

Repository files navigation

David Winer IE task: DIRTAR: "Discovery of Inference Rules from Text for Action Recognition"

Summary: DIRT algorithm (Lin and Pantel, 2001) with modifications (lemmas, constituency parse, slot-sim, slot-types, hypernyms, semantic-discrimination)

CODE FILES:

moviescript_crawler.py - collects movie corpus (from local path) and each document is inserted into a single document called movie_combo.txt.

sentence_parser.py - reads moviescripts from movie_comb.txt, splits into sentences, and each sentence into clauses (using constituency parse), output is "movie_cluases.txt"

semantic_parser.py - used for one of the experimental conditions - hand written frame-net style rules for discriminating candidate nouns from slots

dirtar.py - runs the dirt algorithm with all experimental conditions, and includes which X and Y slot dependencies are included for some of the experimental conditions. Running this file dumps the databases as pickle files, which are loaded for analysis by "assign_labels_moviedirt.py"

assign_labels_moviedirt.py - reads triple databases and reads (and parses with stanford parser) the test sentences (duel corpus sentences), in "IE_sent_key.txt". Outputs text files for each experimental condition where each line is a verb in the duel corpus, the guess, etc.

score_labels_dirtar.py - reads the experimental labels and calculates fscore, etc, and spits out files which are in "scored_labels" folder

FOLDERS:

extract_duel_sentences.zip - includes original duel corpus and data structures used to construct "IE_sent_key.txt" from excel file.

experimental_labels - folder containing the text files spit by assign_labels_moviedirt.py

scored_labels - folder containing the evaluation of each experimental condition, includes "total" and 1 per action of interest

Other Files:

IE_sent_key.txt - each line has a sentence from duel corpus, followed by "-#-", followed by list of action classes for that sentence.

movie_combo.txt - not attached (too large) contains combined text file for movies from Walker's cleaned imsdb database (https://nlds.soe.ucsc.edu/fc2)

movie_clauses.txt - movie_comb.txt separated into clause triples, where each slot in the triple has some extra annotations from the parse

About

DIRTAR: "Discovery of Inference Rules from Text for Action Recognition": modified paraphrase extraction algorithm (DIRT) where slots are dependency positions and can be assigned semantic classes to discriminate candidates

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages