This is a group project for CS140B, Natural Language Annotation for Machine Learning, taught by Prof. James Pustejovsky in the spring 2016 semester.
- Jessica Huynh
- Ryan Nicoll
- Yuzhe Chen
- To use the TOEFL11 corpus for native language identification: determining a non-native English speaker's native language from among the 11 native languages represented in the corpus (Arabic, Chinese, French, German, Hindi, Italian, Japanese, Korean, Spanish, Telugu, and Turkish)
- To annotate non-native speakers' language features (syntactic, lexical)
- To determine which features are representative of particular native languages
- To develop an annotation specification by determining the language features most salient for these purposes, features that work better than structural features