- Maintainer
- Topic
- Runthrough
- Leon Schmidt, Computational Linguistics B.A.
- re
- os
- nltk
- random
- framenet
- spaCy (English language model)
This repository corresponds to my Bachelor Thesis in which I perform a merge between FrameNet information and Connotation Frames (Rashkin et. al, 2016). The semantic roles of each ressource are being connected for every verb that is represented in FrameNet exactly once. The evaluation consists also of an interactive program which can be found in this repository.
To get everything going follow the instructions here. First some requirements need to be met:
- The Connotation Frame Lexicon can be found in
/data/full_frame_info.txt
- In order to run most of the code you need to download FrameNet from the nltk library. You can easily install FrameNet by importing nltk and running the command:
>>>nltk.download('framenet_v17')
The preprocessing of FrameNet and Connotation Frame data for the purpose of this work is implemented in the folder /preprocessing/
.
It consists of the following files:
/connotation_frames_preprocessing.py
: Preprocessing all connotation frames to a format designed for further processes/framenet_preprocessing.py
: Preprocessing FrameNet data and a few methods for an easier access to FrameNet data/serialization.py
: Methods for loading and saving pickle objects
The processed Connotation Frame Verbs can be found as a dictionary in /preprocessing/obj/extracted_cf_verbs.pkl
.
The main algorithm can be found in /framenet_connotationframes_mapping.py
.
The implemented methods are:
find_common_verbs(filename)
: Finding all verbs that are in the Connotation Frame lexicon and FrameNet; returns a list of all verbscf_verbs_frame_count(filename)
: Computes for each Connotation Frame verb the amount of Lexical Units in FrameNet; returns a dictionary with verbs as key and the amount of LUs as valuefind_unambiguous_common_verbs(verb_frame_amount_dict)
: Retrieves verbs that occur only once in FrameNet; returns a listmap_cfs_lus(verbs, cfs)
: Merges the FrameNet information (lemma & Lexical Unit ID) with the respective Connotation Frame for each verbdetect_subject(nlp, sentence, lu)
: Detects the syntactic subject of a given head (verb) and returns it's string, position, head and a boolean whether it's a passive case or notdetect_subject_short_phrase(nlp, sentence, lu)
: Same as above, but the returned subject can be a phrase. All syntactic children of the subject are being added to the phrasedetect_subject_long_phrase(nlp, sentence, lu
): Same as above, but the returned subject can be a phrase. All syntactic descendants (not only children) of the subject are being added to the phrasedetect_object(nlp, sentence, lu)
: Detects the syntactic object of a given head (verb) and returns it's string, position, head and a boolean whether it's a passive case or notdetect_object_short_phrase(nlp, sentence, lu)
: Same as above, but the returned object can be a phrase. All syntactic children of the object are being added to the phrasedetect_object_long_phrase(nlp, sentence, lu
): Same as above, but the returned object can be a phrase. All syntactic descendants (not only children) of the object are being added to the phrase
map_cf_roles_and_fes_long_phrase_all_sents(nlp, mapping_verb_lu_cfs)
: Mapping of semantic roles with the so called long phrase approach. For each verb, all sentences in FrameNet are being parsed. If the logical subject/object matches the position of a frame element, this frame element will be added to the set of subject- or object corresponding roles (see Thesis for more detail)map_cf_roles_and_fes_short_phrase_all_sents(nlp, mapping_verb_lu_cfs)
: Mapping of semantic roles with the so called short phrase approachmap_cf_roles_and_fes_naive_all_sents(nlp, mapping_verb_lu_cfs)
: Mapping of semantic roles with the so called naive approach
All of the methods return a dictionary which contains the Lexical Unit IDs as keys and further information, e.g. thr mapped roles, as values.
The interactive program for the evaluation can be found in /evaluation.py
. The implemented methods are:
show_mapping_for_one_verb_naive(nlp, lu_id, lu_text)
: Prints each example sentence and the calculated role mapping (naive approach) for a verb. The method is implemented for checking purposesshow_mapping_for_one_verb_short(nlp, lu_id, lu_text)
: Same as above; for short phrase approachshow_mapping_for_one_verb_long(nlp, lu_id, lu_text)
: Same as above; for long phrase approachmap_evaluation(role_mapping, approach, eval_list)
: The interactive program for the evaluation of the role mapping. It saves the evaluation of a user in a .pkl file in the folder/eval/
. The user has to evaluate 25 LUs with 2 sentences each. After each LU, the process is being saved so the user can interrupt the evaluation. A Readme of the evaluation can be found in the/eval
folder. It provides guidance and examples for the evaluationpick_lus_for_evaluation(nlp, role_mapping)
: Picks pseudo random LUs for the eval and returns statistics so one can be sure there are enough special cases, e.g. passive casescf_evaluation(role_mapping, eval_list)
: The interactive program for the evaluation of the Connotation Frames. It saves the evaluation of a user in a .pkl file in the folder/eval/
. The user has to evaluate 25 LUs with 2 sentences each. Firstly, the user has to rate the Connotation Frames without context. Only then the two sentences will be displayed and the user shall evaluate the Connotation Frame again. After each LU, the process is being saved so the user can interrupt the evaluation. A Readme of the evaluation can be found in the/eval
folder. It provides guidance and examples for the evaluation
Evaluation results - stored in .pkl
files - can be found in the /eval
folder.
Plots can be found as .png
files in /plots
.
The file /statistics.py
creates the plots, computes cohen's kappa and reads the evaluation and wraps up the results. The implemented methods are:
frames_per_verb(verb_dictionary)
: Counts the amount of Lexical Units which evoke a specific number of framesplot_verb_frame_amount(verb_frame_amount_dict)
: Plots the statistics for evoked frames per Verb/Lexical Unit and saves the plot as a .png fileshow_dependency_parse(sentence)
: Shows a spaCy dependency parse for the input sentencecohens_kappa(eval_r1, eval_r2, role)
: Computes cohen's kappa between both annotators for the role mapping evaluation. Either for the agent role or for the theme rolecf_kappa(eval_r1, eval_r2)
: Computes cohen's kappa between both annotators for the Connotation Frame evaluationread_cf_eval(eval_r1, eval_r2)
: Reads and wraps up the results of the Connotation Frame evaluation and returns the results in a listcf_kappa_with_original(eval_r1, eval_r2, type)
: Calculates the Cohens Kappa between the original CF values and the evaluated CF values of this work