This repository contains five mini-projects developed for the Natural Language Technologies course of the MSc in Computer Science at the University of Turin.
The goal of this project is to evaluate the similarity between pairs of words given in input. I implemented three similarity measures based on WordNet: Wu & Palmer, Leacock & Chodorow, and Shortest Path. For each implemented measure, Spearman's and Pearson's correlation coefficients are computed.
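
As a rough illustration of the three measures, here is a minimal sketch on a tiny hand-made hypernym tree instead of WordNet. The taxonomy, the function names, and the exact formula variants (in particular `2 * MAX_DEPTH - len` for Shortest Path and the `+ 1` smoothing in Leacock & Chodorow) are assumptions made for this example and may differ from the project code.

```python
import math

# Hypothetical toy hypernym tree (child -> parent); not real WordNet data.
PARENT = {
    "entity": None,
    "animal": "entity",
    "object": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "object",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root (inclusive)."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


# Maximum depth of the toy taxonomy (assumption: computed from the tree itself).
MAX_DEPTH = max(depth(n) for n in PARENT)


def lcs(a, b):
    """Lowest common subsumer: the deepest ancestor shared by a and b."""
    ancestors_b = set(path_to_root(b))
    for node in path_to_root(a):
        if node in ancestors_b:
            return node
    return None


def path_length(a, b):
    """Number of edges on the path between a and b through their LCS."""
    return depth(a) + depth(b) - 2 * depth(lcs(a, b))


def wu_palmer(a, b):
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


def shortest_path(a, b):
    # Assumed variant: larger values mean more similar concepts.
    return 2 * MAX_DEPTH - path_length(a, b)


def leacock_chodorow(a, b):
    # The +1 terms avoid log(0) when a == b (assumed smoothing).
    return -math.log((path_length(a, b) + 1) / (2 * MAX_DEPTH + 1))


print(round(wu_palmer("dog", "cat"), 3))   # 0.667
print(shortest_path("dog", "car"))          # 2
```

In the real setting a word has several synsets, so a common choice is to take the maximum score over all synset pairs; that step is omitted here.
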
The goal of this project is to produce an extractive summary of a given input document. To do this, I computed the relevance of each paragraph in the document with respect to the topic and the context. The topic is the set of relevant vectors extracted from NASARI using the title of the document. The context is the set of relevant vectors extracted from NASARI using the body of the document.
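
The paragraph-ranking idea can be sketched as follows. This is only a simplified illustration: the vectors are hypothetical toy dictionaries (term to weight) rather than real NASARI entries, and the scoring function (sum of the weights of matching terms) is an assumption, not necessarily the relevance measure used in the project.

```python
def score(paragraph, vectors):
    """Sum the weights of all vector terms that occur in the paragraph."""
    words = set(paragraph.lower().split())
    return sum(
        weight
        for vector in vectors
        for term, weight in vector.items()
        if term in words
    )


def summarize(paragraphs, topic, context, keep=2):
    """Keep the `keep` most relevant paragraphs, preserving document order."""
    scored = [
        (score(p, topic) + score(p, context), i)
        for i, p in enumerate(paragraphs)
    ]
    # Highest score first; earlier paragraphs win ties.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:keep]
    kept_indices = sorted(i for _, i in best)
    return [paragraphs[i] for i in kept_indices]


# Hypothetical toy data, standing in for NASARI vectors.
topic = [{"cat": 1.0, "cats": 0.8}]
context = [{"animals": 0.5, "mice": 0.4}]
paragraphs = [
    "Cats are small domestic animals .",
    "The weather was nice yesterday .",
    "A cat hunts mice and other animals .",
]

for paragraph in summarize(paragraphs, topic, context, keep=2):
    print(paragraph)
```

With this toy data the second paragraph is dropped because it shares no term with either the topic or the context.
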
The goal of this project is to evaluate the semantics of pairs of words given in input and to assign a similarity score to each pair.
The goal of this project is to disambiguate a polysemous word in a given sentence. I implemented the Lesk algorithm and used it to disambiguate 63 polysemous words (one per sentence). 50 of the 63 sentences are taken from the SemCor corpus.
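
For readers unfamiliar with Lesk, here is a minimal sketch of the simplified variant: pick the sense whose gloss shares the most words with the sentence. The sense inventory, the glosses, and the stop-word list are hypothetical toy data invented for this example; the project itself works on WordNet senses and SemCor sentences.

```python
# Hypothetical stop-word list and sense inventory (not taken from WordNet).
STOPWORDS = {"a", "the", "of", "in", "to", "and", "that", "on", "is", "i", "we"}

SENSES = {
    "bank": {
        "bank.financial": "a financial institution that accepts deposits and lends money",
        "bank.river": "sloping land beside a body of water such as a river",
    }
}


def bag_of_words(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(word, sentence):
    """Return the sense of `word` whose gloss overlaps most with the sentence."""
    context = bag_of_words(sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & bag_of_words(gloss))
        # Strict comparison: on ties the first listed sense is kept.
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "I went to the bank to deposit money"))
print(simplified_lesk("bank", "We sat on the bank of the river near the water"))
```

Keeping the first sense on ties mirrors the usual fallback to the most frequent sense when the senses are listed in frequency order.
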
The goal of this project is to build a translator from Italian to Italian-Yodish, that is, to map an input sequence of the form SVX (Subject, Verb, Other) to an output sequence of the form XSV (Other, Subject, Verb). I implemented the Cocke–Younger–Kasami (CKY) algorithm and wrote a simple context-free grammar (CFG) in Chomsky Normal Form (CNF) to parse the input sequence into a tree representing its syntactic structure. The translation is obtained by simply swapping the S subtree with the X one. The report is available in Italian only. Here are some examples of translation.
Italian | Italian-Yodish |
---|---|
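
The parse-then-reorder idea can be sketched as below. The grammar, the lexicon, and the sentences are hypothetical toy examples written for this illustration, not the grammar used in the project; the chart keeps a single tree per nonterminal and span, which is enough for an unambiguous toy grammar.

```python
# Hypothetical toy grammar in CNF: lexical rules and binary rules.
LEXICON = {
    "tu": {"NP"}, "amici": {"NP"},
    "hai": {"V"}, "mangia": {"V"},
    "il": {"DET"}, "una": {"DET"},
    "ragazzo": {"N"}, "mela": {"N"},
}
BINARY = {("NP", "VP"): "S", ("V", "NP"): "VP", ("DET", "N"): "NP"}


def cky(words):
    """Return one parse tree rooted in S for `words`, or None if there is none."""
    n = len(words)
    # chart[i][j] maps a nonterminal to one tree covering words[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, ()):
            chart[i][i + 1][tag] = (tag, word)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for b, left in chart[i][k].items():
                    for c, right in chart[k][j].items():
                        a = BINARY.get((b, c))
                        if a is not None and a not in chart[i][j]:
                            chart[i][j][a] = (a, left, right)
    return chart[0][n].get("S")


def leaves(tree):
    """Collect the words of a tree from left to right."""
    if len(tree) == 2:  # lexical node: (tag, word)
        return [tree[1]]
    return leaves(tree[1]) + leaves(tree[2])


def to_yodish(sentence):
    """Reorder an SVX sentence into XSV using its parse tree."""
    tree = cky(sentence.lower().split())
    if tree is None:
        return None
    _, subject, verb_phrase = tree   # S  -> NP VP
    _, verb, other = verb_phrase     # VP -> V NP
    return " ".join(leaves(other) + leaves(subject) + leaves(verb))


print(to_yodish("tu hai amici"))                 # amici tu hai
print(to_yodish("il ragazzo mangia una mela"))   # una mela il ragazzo mangia
```

Sentences that the toy grammar cannot parse return `None` rather than a partial translation.
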