install

- https://github.com/seatgeek/fuzzywuzzy
- pip install fuzzywuzzy

1. smi_cleaning.py
- input : .smi file
- output : smi_cleaning.txt
- remove unnecessary lines and tags from the smi file.
- separate two conversations that share one caption
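
The cleaning step above can be sketched roughly as follows. This is not the code from smi_cleaning.py; it assumes that caption bodies contain HTML-like tags such as `<P Class=ENCC>` and `<br>`, and that two speakers sharing one caption are marked with leading dashes. The function name `clean_caption` is hypothetical.

```python
import re

# Assumption: SMI caption bodies use HTML-like tags and <br> line breaks.
TAG_RE = re.compile(r"<[^>]+>")
BREAK_RE = re.compile(r"<br\s*/?>", re.IGNORECASE)


def clean_caption(raw):
    """Strip tags from one caption body and split it into utterances.

    Hypothetical helper: a caption whose lines start with "-" is treated
    as two (or more) speakers and split; otherwise the lines are joined.
    """
    parts = [
        TAG_RE.sub("", piece).replace("&nbsp;", " ").strip()
        for piece in BREAK_RE.split(raw)
    ]
    parts = [p for p in parts if p]  # drop lines that were only markup

    if any(p.startswith("-") for p in parts):
        utterances = []
        for p in parts:
            if p.startswith("-") or not utterances:
                # A dash opens a new speaker turn.
                utterances.append(p.lstrip("-").strip())
            else:
                # A line without a dash continues the previous turn.
                utterances[-1] += " " + p
        return utterances
    return [" ".join(parts)] if parts else []
```

Captions that contain only markup (for example a blank `&nbsp;` caption) come back as an empty list, so they can be skipped when writing the output file.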

2. json_cleaning.py
- read the ['transcript'] element from the json file.
- remove unnecessary lines, e.g. text matching the ( ) or [ ] patterns
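
A minimal sketch of this step, under the assumption that the `transcript` element is a single newline-separated string (the real layout of friends.json may differ). The function name `clean_transcript` is hypothetical and not taken from json_cleaning.py.

```python
import json
import re

# Matches "(...)" and "[...]" spans such as stage directions or scene notes.
BRACKETED_RE = re.compile(r"\([^)]*\)|\[[^\]]*\]")


def clean_transcript(json_text):
    """Return the transcript lines with bracketed spans removed.

    Assumption: data["transcript"] is one string with one line per row.
    """
    data = json.loads(json_text)
    cleaned = []
    for line in data["transcript"].splitlines():
        line = BRACKETED_RE.sub("", line)
        line = " ".join(line.split())  # collapse leftover whitespace
        if line:  # skip lines that held nothing but a bracketed note
            cleaned.append(line)
    return cleaned
```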

3. smi_superset.py
- input : smi_cleaning.txt
- output : smi_superset1.txt (data structure: [("caption", start_t, end_t), ("caption", start_t, end_t), ...])
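
One way the `("caption", start_t, end_t)` list could be built is shown below. This is a sketch under stated assumptions, not the logic of smi_superset.py: it assumes each cleaned entry carries only a start time in milliseconds, and that a caption ends when the next entry starts. The name `build_superset` and the `(start_ms, text)` input layout are hypothetical.

```python
def build_superset(entries):
    """Turn [(start_ms, text), ...] into [(text, start_t, end_t), ...].

    Assumption: entries are sorted by start time, and an empty text
    marks a gap where no caption is shown.
    """
    result = []
    # Pair every entry with its successor to obtain an end time.
    for (start, text), (next_start, _) in zip(entries, entries[1:]):
        if text:
            result.append((text, start, next_start))
    return result


entries = [(1000, "Hi"), (2500, ""), (3000, "Hey"), (4200, "")]
superset = build_superset(entries)
```

The final entry has no successor, so it gets no end time and is left out; in this sketch that is harmless as long as the file ends with an empty caption.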

4. trans_smi_matching.py
- input : json_cleaning.txt, smi_cleaning.txt
- fuzzy string matching: https://github.com/seatgeek/fuzzywuzzy
- output : sample_result_ver_1.txt
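
The matching idea can be sketched as below: score one transcript line against every caption and keep the best one. This is an illustration, not the contents of trans_smi_matching.py. It uses `fuzz.ratio` from fuzzywuzzy when the package is installed and falls back to the standard library's `difflib.SequenceMatcher` otherwise. The function `best_caption` and the threshold of 60 are assumptions made for the example.

```python
from difflib import SequenceMatcher

try:
    from fuzzywuzzy import fuzz

    def similarity(a, b):
        # fuzz.ratio returns an integer score from 0 to 100.
        return fuzz.ratio(a, b)
except ImportError:
    def similarity(a, b):
        # Fallback (assumption): a comparable 0-100 score from difflib.
        return round(100 * SequenceMatcher(None, a, b).ratio())


def best_caption(transcript_line, captions, threshold=60):
    """Return the (caption, start_t, end_t) tuple closest to the line.

    Hypothetical helper: returns None when there are no captions or
    when the best score is below the threshold.
    """
    query = transcript_line.lower()
    best, best_score = None, -1
    for entry in captions:
        score = similarity(query, entry[0].lower())
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= threshold else None


captions = [
    ("There's nothing to tell!", 1000, 2500),
    ("He's just some guy I work with.", 2500, 4200),
]
match = best_caption("Theres nothing to tell", captions)
```

A matched tuple carries the caption's start and end times, which is what lets a transcript line be placed on the subtitle timeline.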

5. trans_smi_matching2.py
- output : sample_result_ver_2.txt

files
- .idea
- README.md
- friends.json
- friends.smi
- json_cleaning.py
- json_cleaning.txt
- s_sample_265.txt
- sample_result_ver_1.txt
- sample_result_ver_2.txt
- smi_cleaning.py
- smi_cleaning.txt
- smi_superset.py
- smi_superset.txt
- smi_superset1.txt
- t_sample_80.txt
- tensorflow.py
- test.py
- trans_smi_matching.py
- trans_smi_matching2.py
- word_match_count.py

JueunKim/transcript_smi_matching
