L1ML

This is a group project for CS140B, Natural Language Annotation for Machine Learning, taught by Prof. James Pustejovsky in the Spring 2016 semester.

Team members

Goals

To use the TOEFL11 corpus to identify the native language of non-native speakers of English, choosing among the 11 native languages represented in the corpus (Arabic, Chinese, French, German, Hindi, Italian, Japanese, Korean, Spanish, Telugu, and Turkish)
To annotate syntactic and lexical features of the non-native speakers' language
To determine which features are representative of particular native languages
To develop an annotation specification built on the language features most salient for this task, features that identify native language better than structural features do
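The goals above can be illustrated with a minimal sketch of feature-based native language identification. It assumes simple lexical features (lowercase word unigrams) and a multinomial naive Bayes classifier with add-alpha smoothing. It is not the repository's naivebayes.py: the feature extractor, the `NaiveBayesNLI` class and its methods, and the toy essays with placeholder labels `LANG_A`/`LANG_B` are all hypothetical. `most_indicative` ranks features by how much more likely they are under one native language than, on average, under the others, which is one simple way to look for features representative of a particular native language.

```python
import math
from collections import Counter, defaultdict


def extract_features(text):
    """Hypothetical lexical features: counts of lowercase word unigrams."""
    return Counter(text.lower().split())


class NaiveBayesNLI:
    """Multinomial naive Bayes over feature counts, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()               # number of essays per native language
        self.feature_counts = defaultdict(Counter)  # native language -> feature counts
        self.totals = {}                            # native language -> total feature count
        self.vocab = set()

    def fit(self, essays, labels):
        for text, lang in zip(essays, labels):
            feats = extract_features(text)
            self.class_counts[lang] += 1
            self.feature_counts[lang].update(feats)
            self.vocab.update(feats)
        self.totals = {lang: sum(c.values()) for lang, c in self.feature_counts.items()}
        return self

    def _log_prob(self, feature, lang):
        # Smoothed log P(feature | lang); alpha keeps unseen features from getting zero probability.
        num = self.feature_counts[lang][feature] + self.alpha
        den = self.totals[lang] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        feats = extract_features(text)
        best_lang, best_score = None, -math.inf
        for lang in self.class_counts:
            # log prior + sum of log likelihoods of the essay's features
            score = math.log(self.class_counts[lang] / n_docs)
            for feat, count in feats.items():
                if feat in self.vocab:  # features never seen in training are ignored
                    score += count * self._log_prob(feat, lang)
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang

    def most_indicative(self, lang, k=5):
        """Top-k features whose log probability under `lang` most exceeds the others' average."""
        others = [o for o in self.class_counts if o != lang]
        if not others:
            return []

        def score(feat):
            rest = sum(self._log_prob(feat, o) for o in others) / len(others)
            return self._log_prob(feat, lang) - rest

        return sorted(self.vocab, key=score, reverse=True)[:k]


# Toy usage with invented essays and placeholder labels (not TOEFL11 data).
essays = [
    "I am agree with this idea",
    "I am agree that students should travel",
    "In my opinion the students must to study",
    "Students must to learn many things",
]
labels = ["LANG_A", "LANG_A", "LANG_B", "LANG_B"]
model = NaiveBayesNLI().fit(essays, labels)
print(model.predict("They must to go"))
print(model.most_indicative("LANG_A", k=3))
```

Smoothing matters here because any feature absent from one language's training essays would otherwise zero out that language's score. Syntactic features (for example, part-of-speech n-grams) could be added by extending `extract_features`; that extension is an assumption, not something the project description specifies.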

Repository contents

ML
Preprocess
annotate
contract
final-paper
gold-standard
presentations
toefl11_part
.gitignore
Annotation Specification L1ML v1.3.1.pdf
L1MLFinalPresentation.pdf
L1ML_v1.1.1.dtd
TaskDescription.pdf
Week2Update.txt
Week3Update
WorkflowExampleL1ML.pdf
naivebayes.py
naivebayes_result_smallcorpus.txt
readme.md
split_corpus.py
split_for_annotation.py