Fei00Wu/kaldi_data_manager

Kaldi Data Manager

Manage Kaldi-style data directories as an ORM for easier data management (train/dev splits, etc.).
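
To make the idea concrete, here is a minimal sketch of loading a Kaldi-style data directory into a database. It is not this repository's code: it uses the standard-library `sqlite3` module instead of an ORM, and the `utterance` table, its column names, and the helpers `read_kaldi_table` and `load_data_dir` are all assumptions made for illustration. It relies only on the usual Kaldi layout, where `utt2spk` and `text` hold one `<utt-id> <value>` pair per line.

```python
import sqlite3
import tempfile
from pathlib import Path


def read_kaldi_table(path):
    """Parse a Kaldi table file with one '<key> <value...>' entry per line."""
    table = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        parts = line.split(maxsplit=1)
        if parts:
            table[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return table


def load_data_dir(data_dir, conn):
    """Store utt2spk and text in one table (hypothetical schema)."""
    data_dir = Path(data_dir)
    utt2spk = read_kaldi_table(data_dir / "utt2spk")
    text = read_kaldi_table(data_dir / "text")
    conn.execute(
        "CREATE TABLE utterance ("
        "utt_id TEXT PRIMARY KEY, spk_id TEXT, transcript TEXT)"
    )
    conn.executemany(
        "INSERT INTO utterance VALUES (?, ?, ?)",
        [(utt, spk, text.get(utt)) for utt, spk in utt2spk.items()],
    )
    conn.commit()


# Tiny made-up data directory, used only to exercise the sketch.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "utt2spk").write_text(
        "spk1_utt1 spk1\nspk1_utt2 spk1\nspk2_utt1 spk2\n", encoding="utf-8"
    )
    Path(tmp, "text").write_text(
        "spk1_utt1 HELLO WORLD\nspk1_utt2 GOOD MORNING\nspk2_utt1 HELLO WORLD\n",
        encoding="utf-8",
    )
    conn = sqlite3.connect(":memory:")
    load_data_dir(tmp, conn)
    rows = conn.execute(
        "SELECT spk_id, COUNT(*) FROM utterance GROUP BY spk_id ORDER BY spk_id"
    ).fetchall()
```

Once the data is in a table, questions such as "how many utterances per speaker" or "which utterances share a transcript" become single queries, which is what makes splitting easier than working on the flat files directly.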

Example usage:

  • Convert Kaldi-style data directory to database
python3 data2db.py --data_dir "data/cmu_kids" --db_file "data/cmu.db" 
  • Convert database to Kaldi-style directory
python3 db2data.py --db_file "data/cmu_train.db" --data_dir "data/cmu_kids_train"
  • Split database by transcribed sentence (reserves a certain number of sentences). Splitting by "spk" (reserves a certain number of speakers) and by "utt" (reserves a certain number of utterances) is also supported.
python3 split_db.py --db_file "data/cmu.db" \
  --split_by "sent" --data_dir "data/cmu_sent" \
  --split_ratio "{'train':0.7, 'dev': 0.15, 'test': 0.15}" 
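
The grouped split described above can be sketched as follows. This is an illustration under stated assumptions, not the logic of `split_db.py`: the function `split_by_group`, its `seed` parameter, and the rounding rule are hypothetical. The idea is that whole groups (speakers here, but sentences work the same way) are assigned to a split, so no group ends up in two splits. The `--split_ratio` value is a Python dict literal, so `ast.literal_eval` is one safe way to parse it.

```python
import ast
import random


def split_by_group(utt2group, split_ratio, seed=0):
    """Assign each utterance to a split so that no group spans two splits."""
    groups = sorted(set(utt2group.values()))
    random.Random(seed).shuffle(groups)  # seeded for reproducibility

    names = list(split_ratio)
    group2split = {}
    start = 0
    cumulative = 0.0
    for i, name in enumerate(names):
        cumulative += split_ratio[name]
        # The last split takes the remainder so rounding never drops a group.
        if i == len(names) - 1:
            end = len(groups)
        else:
            end = round(cumulative * len(groups))
        for group in groups[start:end]:
            group2split[group] = name
        start = end

    return {utt: group2split[group] for utt, group in utt2group.items()}


# Hypothetical utt2spk-style mapping: 20 speakers with 3 utterances each.
utt2spk = {
    f"spk{s:02d}_utt{u}": f"spk{s:02d}" for s in range(20) for u in range(3)
}
ratio = ast.literal_eval("{'train':0.7, 'dev': 0.15, 'test': 0.15}")
assignment = split_by_group(utt2spk, ratio)

# Count how many speakers landed in each split.
speakers_per_split = {name: set() for name in ratio}
for utt, name in assignment.items():
    speakers_per_split[name].add(utt2spk[utt])
```

Passing an identity mapping (each utterance as its own group) gives the behaviour of a plain per-utterance split.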
