L1ML

This is a group project for CS140B, Natural Language Annotation for Machine Learning, taught by Prof. James Pustejovsky in the spring 2016 semester.

Team members

  • Jessica Huynh
  • Ryan Nicoll
  • Yuzhe Chen

Goals

  • To use the TOEFL11 corpus for native language identification: given a text by a non-native speaker of English, identify the speaker's native language from among the 11 native languages represented in the corpus (Arabic, Chinese, French, German, Hindi, Italian, Japanese, Korean, Spanish, Telugu, and Turkish)
  • To annotate non-native speakers' language features (syntactic, lexical)
  • To determine which features are representative of particular native languages
  • To develop an annotation specification built on the most salient of these language features, with the aim of performing better than structural features alone
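
As a rough illustration of the third goal, the sketch below ranks annotated features by how much more often they occur in texts from one native language than in texts from all the others. It is a minimal example under stated assumptions, not the project's actual method: the function name `feature_salience`, the feature labels, and the toy documents are all hypothetical and do not come from TOEFL11 or from the project's annotation specification.

```python
from collections import Counter, defaultdict


def feature_salience(docs, smoothing=1.0):
    """Rank features per native language by a smoothed frequency ratio.

    docs: iterable of (native_language, [feature_label, ...]) pairs.
    Returns {language: [(feature, ratio), ...]} sorted from most to least
    characteristic. A ratio above 1 means the feature is relatively more
    frequent in that language's texts than in the rest of the data.
    """
    # Count how often each feature label was annotated per native language.
    per_lang = defaultdict(Counter)
    for lang, feats in docs:
        per_lang[lang].update(feats)

    # Pool the counts across all languages.
    total = Counter()
    for counts in per_lang.values():
        total.update(counts)

    vocab = sorted(total)
    grand = sum(total.values())

    scores = {}
    for lang, counts in per_lang.items():
        n_lang = sum(counts.values())
        n_rest = grand - n_lang
        lang_scores = {}
        for feat in vocab:
            # Add-k smoothing keeps the ratio finite for unseen features.
            p_lang = (counts[feat] + smoothing) / (n_lang + smoothing * len(vocab))
            p_rest = (total[feat] - counts[feat] + smoothing) / (
                n_rest + smoothing * len(vocab)
            )
            lang_scores[feat] = p_lang / p_rest
        scores[lang] = sorted(
            lang_scores.items(), key=lambda kv: kv[1], reverse=True
        )
    return scores


# Hypothetical toy annotations, invented for illustration only.
toy_docs = [
    ("Japanese", ["missing_article", "missing_article", "tense_error"]),
    ("Spanish", ["preposition_error", "tense_error"]),
    ("Japanese", ["missing_article"]),
]

ranked = feature_salience(toy_docs)
for lang, feats in ranked.items():
    print(lang, feats[0][0])
```

A frequency ratio is only one possible measure of salience; with real annotations, a measure such as information gain or the weights of a trained classifier might be a better fit.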