CS 124 PA 6: Direct Machine Translation
Due February 28th @ 5:00PM
To Do + Deadlines:
By Saturday (1-4 complete):
1. Choose a language
2. Build a corpus for that language: 15 sentences from outside sources (save the sources). Pick 10 sentences for the dev set and 5 for the test set.
3. Create a dictionary covering every word in the corpus, using www.wordreference.com or Google Translate.
   - If a word has more than one definition, include all of them in the dictionary; we'll come up with good heuristics for choosing the correct one.
4. Look up annotation toolkits (e.g., part-of-speech taggers) for the chosen language and figure out how to use them
5. Actual coding:
   i) Use the dictionary to translate the dev corpus word by word into English, and run the annotation tools on the sentences to produce tags that will help with post-processing
   ii) Come up with 6-10 post-processing strategies to improve the baseline translations, and code them
6. Run the system on the test set
7. Error analysis
8. Compare our results with Google Translate's
9. Write the follow-up report
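For the dictionary step, one way to keep every definition is a map from each foreign word to a list of English senses. This is a minimal sketch that assumes a hypothetical tab-separated file format (`word<TAB>sense1|sense2`). The Spanish entries are placeholders, since the language isn't chosen yet.

```python
from collections import defaultdict


def load_dictionary(lines):
    """Parse 'foreign<TAB>sense1|sense2|...' lines into {word: [senses]}.

    The file format is a hypothetical choice for these notes. Blank lines and
    lines starting with '#' are skipped; a non-comment line without a tab
    raises ValueError.
    """
    dictionary = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        word, senses = line.split("\t", 1)
        key = word.strip().lower()  # case-fold so sentence-initial words still match
        for sense in senses.split("|"):
            sense = sense.strip()
            # Keep every distinct sense, in the order listed, for later disambiguation.
            if sense and sense not in dictionary[key]:
                dictionary[key].append(sense)
    return dict(dictionary)


if __name__ == "__main__":
    sample = [
        "# placeholder Spanish entries",
        "banco\tbank|bench",
        "El\tthe",
    ]
    print(load_dictionary(sample))  # {'banco': ['bank', 'bench'], 'el': ['the']}
```

Keeping the senses in their listed order lets the baseline simply take the first one. The disambiguation heuristics can reorder or filter the list later.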
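For the coding step, here is a minimal sketch of the baseline plus a pluggable list of post-processing strategies. It assumes the dictionary is a `{word: [senses]}` map, picks the first sense for each word, and copies unknown tokens through unchanged. `capitalize_first` is a hypothetical placeholder strategy. The real strategies, which would likely use the annotation tools' tags, still need to be designed.

```python
import re


def tokenize(sentence):
    """Naive tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def baseline_translate(sentence, dictionary):
    """Word-by-word baseline: first listed sense, unknown tokens copied through."""
    output = []
    for token in tokenize(sentence):
        senses = dictionary.get(token.lower())
        output.append(senses[0] if senses else token)
    return output


def capitalize_first(tokens):
    """Placeholder post-processing strategy: capitalize the first token."""
    if not tokens:
        return tokens
    first = tokens[0]
    return [first[:1].upper() + first[1:]] + tokens[1:]


def detokenize(tokens):
    """Join with spaces, then drop the space before each punctuation mark."""
    return re.sub(r"\s+([^\w\s])", r"\1", " ".join(tokens))


def translate(sentence, dictionary, strategies=()):
    """Run the baseline, then apply each post-processing strategy in order."""
    tokens = baseline_translate(sentence, dictionary)
    for strategy in strategies:
        tokens = strategy(tokens)
    return detokenize(tokens)


if __name__ == "__main__":
    # Placeholder Spanish entries; replace with the real dictionary.
    dictionary = {"el": ["the"], "perro": ["dog"], "come": ["eats", "eat"]}
    print(translate("El perro come.", dictionary, [capitalize_first]))  # The dog eats.
```

Each strategy takes and returns a token list. That lets us toggle strategies one at a time on the dev set and check which ones actually help.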
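For the Google comparison, one rough way to put a number on it is token-overlap F1 between two translations. This metric is our own illustrative choice, not something the assignment specifies. If we also write our own reference translations for the test set (the plan doesn't call for this yet), we can score both our output and Google Translate's against them. Without references, the score only measures how closely we agree with Google. BLEU (e.g., NLTK's `sentence_bleu`) is the more standard metric if we want one.

```python
import re
from collections import Counter


def words(text):
    """Lowercased word tokens; punctuation is ignored."""
    return re.findall(r"\w+", text.lower())


def overlap_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate and a reference translation."""
    cand, ref = Counter(words(candidate)), Counter(words(reference))
    common = sum((cand & ref).values())  # clipped counts of shared words
    if common == 0:
        return 0.0
    precision = common / sum(cand.values())
    recall = common / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Placeholder sentences, not real system output.
    ours = "the dog eats"
    google = "The dog is eating."
    print(round(overlap_f1(ours, google), 3))  # 0.571
```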