
brannerchinese/ChineseUtilities


ChineseUtilities

Utilities to help me with Chinese-language work and other NLP tasks

  1. json_texts: Contains research files in progress, in JSON format. The format specification is in json_format_for_prosody.txt. The program handle_files.py allows encrypted versions of the data files to be pushed to the repo while keeping their contents private.

  2. character_count.py: Count the Chinese characters (only) in a file and return their overall percentages. File to be opened must be in directory DATA.

  3. separate_pinyin/: Takes a string of Pīnyīn as input and returns a list of its component syllables. A second program, count_syllables.py, counts the number of syllables found.

  4. convert_pinyin/: Convert files in Pages (v. 3, "Pages '08") format so that their non-standard tonal diacritics are normalized to Unicode. Does not work with later versions of Pages. A sample font ("shyrbaw" 時報, based on Times) is included in the directory.

  5. statistics/: Little programs to calculate statistical tests.

  6. poetry_flask/: The beginnings of a web application to assist the study of medieval Chinese prosody.

  7. hanamin_fonts/: Copy of the HANAMIN fonts for use with this project.
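
As a rough illustration of the kind of counting described for character_count.py, here is a minimal sketch. It is not the code in this repo. It assumes that "Chinese characters" means code points in the CJK Unified Ideographs block and Extension A, and that "percentage" means each character's share of all Chinese characters counted. The names is_chinese and character_percentages are hypothetical.

```python
from collections import Counter


def is_chinese(ch):
    """Return True for code points in the ranges this sketch treats as Chinese.

    Assumption: only CJK Unified Ideographs (U+4E00-U+9FFF) and
    Extension A (U+3400-U+4DBF) are counted; other extensions are ignored.
    """
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def character_percentages(text):
    """Map each Chinese character in text to its percentage of all
    Chinese characters found; non-Chinese characters are skipped."""
    counts = Counter(ch for ch in text if is_chinese(ch))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: 100.0 * n / total for ch, n in counts.items()}
```

To apply it to a file, read the file as UTF-8 text first (for example with open(path, encoding="utf-8")) and pass the contents to character_percentages.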

[end]
