Hebrew syllabification

Description: Final project for the "Intro. to NLP" course, by Albert Shalumov and Kobi Bodek

Schedule

Due date	Task	Status	Completed on
22/7	Finish script to convert Hebrew to transliteration characters	✔️	19/7
26/7	Annotate some data for HMM work	✔️	24/7
2/8	Unigram HMM code finished	✔️	24/7
2/8	Bigram, Trigram HMM code finished	✔️	25/7
2/8	Metrics for HMM finished	✔️	25/7
2/8	Division into syllables finished	✔️	7/8
9/8	Conversion to English letters finished	✔️	7/8
15/8	CRF code finished	✔️	28/7
8/9	Final data version - no more annotation from this point	✔️
9/8	Implement edit distance metric	✔️	7/8
16/8	Finish NN	✔️	15/8
19/9	Finished project	✔️
21/9	Verified project

Files description

Models

Each model must be run with exactly one argument: search | seeds.
search - Train every possible configuration and calculate accuracy measures
seeds - Train the selected configuration over different random seeds and calculate accuracy
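The single-argument interface described above can be sketched with Python's standard argparse module. This is a minimal illustration, not the project's actual entry point. The names `run_search`, `run_seeds`, `parse_mode` and `SEEDS` are hypothetical, and the training steps are placeholders.

```python
import argparse
import random

SEEDS = (0, 1, 2, 3, 4)  # hypothetical seed list


def run_search() -> None:
    # Placeholder: train and score every candidate configuration.
    print("search: training every configuration")


def run_seeds(seeds=SEEDS) -> None:
    for seed in seeds:
        random.seed(seed)  # make each repetition reproducible
        # Placeholder: train the selected configuration, then compute accuracy.
        print(f"seeds: run with seed {seed}")


def parse_mode(argv=None) -> str:
    parser = argparse.ArgumentParser(description="Train and evaluate one model.")
    # Exactly one positional argument, limited to the two documented modes.
    parser.add_argument("mode", choices=["search", "seeds"])
    return parser.parse_args(argv).mode


def main(argv=None) -> None:
    handlers = {"search": run_search, "seeds": run_seeds}
    handlers[parse_mode(argv)]()
```

From a shell, this would look like `python hmm.py search`, assuming the script calls `main()` under the usual `if __name__ == "__main__":` guard. Because `choices` restricts `mode`, argparse rejects any other value with a usage error. `random.seed` only affects Python's own generator. Libraries such as NumPy keep separate generators and would need to be seeded on their own.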

crf_sentence.py - CRF model for word and sentence features
crf_word.py - CRF model for word only features
embedding_mds.py - Create embedding matrix using MDS
embedding_nn.py - Create embedding matrix using NN
hmm.py - HMM model
memm.py - MEMM model
rnn.py - RNN model
rnn_model.bin - Trained RNN model
emb_model_mds.npy - MDS embedding matrix
emb_model_nn.npy - NN embedding matrix
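hmm.py and the schedule mention unigram, bigram and trigram HMMs. As a hedged illustration of how a bigram (first-order) HMM tags a word, here is a generic Viterbi decoder. The two-tag scheme (`B` = letter begins a syllable, `I` = letter continues one) and every probability in `EXAMPLE_MODEL` are invented for this example. They are not the project's labels or trained parameters.

```python
import math

# Hypothetical two-tag scheme and invented probabilities, for illustration only.
STATES = ("B", "I")
START_P = {"B": 0.9, "I": 0.1}
TRANS_P = {"B": {"B": 0.3, "I": 0.7}, "I": {"B": 0.4, "I": 0.6}}
EMIT_P = {
    "B": {"ש": 0.4, "ל": 0.4, "ו": 0.1, "ם": 0.1},
    "I": {"ש": 0.1, "ל": 0.1, "ו": 0.4, "ם": 0.4},
}
EXAMPLE_MODEL = (STATES, START_P, TRANS_P, EMIT_P)
UNSEEN = 1e-12  # fallback probability for letters a tag never emitted


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely tag sequence for `obs` under a bigram HMM."""
    if not obs:
        return []

    def emit(state, symbol):
        return math.log(emit_p[state].get(symbol, UNSEEN))

    # best[t][s]: log-probability of the best path ending in tag s at position t
    best = [{s: math.log(start_p[s]) + emit(s, obs[0]) for s in states}]
    back = [{}]  # back[t][s]: previous tag on that best path
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + math.log(trans_p[p][s]))
            best[t][s] = best[t - 1][prev] + math.log(trans_p[prev][s]) + emit(s, obs[t])
            back[t][s] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

With these made-up numbers, `viterbi("שלום", *EXAMPLE_MODEL)` returns `['B', 'B', 'I', 'I']`. A trigram HMM conditions each transition on the two previous tags. It is commonly decoded with the same algorithm by treating tag pairs as states.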

Post-Processing

post_proc\syllabification.py - Syllabification
post_proc\post_processing.py - Romanization
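post_proc\syllabification.py turns model output into syllables. The model's exact output format is not documented here. The sketch below therefore assumes hypothetical per-letter tags: `B` (the letter opens a syllable) and `I` (the letter continues the current one). It simply groups letters accordingly. `split_syllables` is an illustrative name, not the project's function.

```python
def split_syllables(letters, tags):
    """Group letters into syllables; every 'B' tag opens a new syllable."""
    if len(letters) != len(tags):
        raise ValueError("letters and tags must have the same length")
    syllables = []
    for letter, tag in zip(letters, tags):
        if tag == "B" or not syllables:
            # Start a new syllable (a leading 'I' is treated like 'B').
            syllables.append(letter)
        else:
            # Append the letter to the syllable currently being built.
            syllables[-1] += letter
    return syllables
```

For example, `split_syllables("שלום", ["B", "B", "I", "I"])` returns `['ש', 'לום']`. Joining the result with `"-".join(...)` gives a hyphenated form that a romanization step could then process.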

Utilities

metrics.py - Accuracy measures
test.py - Runs all models with their best configurations
input_proc\utils.py - Convert the MILA dataset and prepare it for annotation
input_proc\verifier.py - Validate annotations
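The schedule lists an edit distance metric alongside the accuracy measures in metrics.py. Below is a minimal sketch of the standard Levenshtein distance with unit costs. The normalized `edit_similarity` score is a hypothetical addition. The project's own metric may weight or normalize edits differently.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b (unit-cost edits)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


def edit_similarity(pred, gold):
    """Hypothetical normalization: 1.0 for an exact match, lower with more edits."""
    longest = max(len(pred), len(gold))
    return 1.0 if longest == 0 else 1.0 - edit_distance(pred, gold) / longest
```

Unlike exact-match accuracy, this gives partial credit to a predicted syllabification that differs from the gold one by only a few characters. For example, `edit_distance("kitten", "sitting")` is 3.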

Results

crf_sentence_res.csv - Results of CRF sentence search
crf_word_res.csv - Results of CRF word search
hmm_res.csv - Results of HMM search
memm_res.csv - Results of MEMM search
rnn_res.csv - Results of RNN search
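Each *_res.csv file holds the scores from a search run. Suppose each row describes one configuration and includes a numeric score column. The column name `accuracy` used below is a guess, not taken from the actual files. Under that assumption, the best configuration can be picked with Python's standard csv module. `best_configuration` is a hypothetical helper.

```python
import csv


def best_configuration(lines, score_column="accuracy"):
    """Return the CSV row (as a dict) with the highest value in `score_column`.

    `lines` can be an open file or any iterable of CSV text lines.
    The default column name is an assumption about the results files.
    """
    rows = list(csv.DictReader(lines))
    if not rows:
        raise ValueError("no result rows found")
    return max(rows, key=lambda row: float(row[score_column]))
```

With a file opened via `open("hmm_res.csv", newline="")`, the function returns the top-scoring row, provided the file really has a column with that name. Otherwise, pass the correct name through `score_column`.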

Requirements

cots\used_packages.txt - Used packages

Repository: albert-shalumov/nlp_proj