Xifax/siteki

詩的なパーサー ("Poetic parser")
A small utility for parsing Japanese texts and lyrics into word lists.


Launch:

pythonw siteki.pyw

Notes:

  • Runs under Python (preferably 2.6.6)
  • Requires PyQt 4.8.1
  • Requires MeCab binaries
  • Makes use of cjktools and cjktools-data packages
  • The MeCab Python module is bundled
  • Run ./src/install.py to quickly download all required packages, including PyQt and MeCab
  • Pretty Japanese fonts are included in ./res/fonts
  • MeCab's morphological analysis quite often segments text incorrectly, so check the results (igo apparently has the same problem)
  • Stores config data in /home/user/.siteki.ini
  • Because of uromkan glitches, there is no romaji/kana conversion yet (use an IME instead)
  • Corpus frequencies are normalised by rank (their sequential position in the frequency list), not by their actual frequency values
  • Lyrics can also be searched on viewlyrics.com from within the application
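The note on rank-based normalisation can be illustrated with a minimal sketch. It is not taken from the siteki sources: the function name `rank_normalise`, the sample counts, and the choice to scale ranks into the range 0 to 1 are all assumptions made for illustration, and the snippet is written for a current Python 3 rather than the Python 2.6 the tool targets.

```python
def rank_normalise(freqs):
    """Map each word to a rank-based score in [0, 1]; 0.0 is the most frequent word.

    Hypothetical helper: only the ordering of the counts matters,
    the raw frequency values themselves are discarded.
    """
    # Sort by descending count; ties are broken alphabetically so the result is deterministic.
    ordered = sorted(freqs, key=lambda word: (-freqs[word], word))
    if len(ordered) < 2:
        return {word: 0.0 for word in ordered}
    last = float(len(ordered) - 1)
    return {word: index / last for index, word in enumerate(ordered)}


# Made-up counts: the gap between 5000 and 120 is huge, the gap in rank is one step.
counts = {"の": 5000, "歌": 120, "詩的": 3}
scores = rank_normalise(counts)
```

With this kind of scheme a word that is forty times more frequent than its neighbour is still only one rank step away from it, which is the trade-off the note describes.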

How to use:

  1. Launch the application.
  2. Paste some (coherent) Japanese text from a txt or html file (e.g. /res/data/sample).
  3. Adjust the font family and size as you see fit.
  4. Specify excluded items manually or by frequency range.
  5. Click parse/pdf (the first call may take some time while the edict dictionary loads).
  6. Print the resulting document or save it as a pdf file.
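The exclusion step above (manual items or a frequency range) could look roughly like the sketch below. This is an assumption about the behaviour, not the application's actual code: `filter_words`, its parameters, and the rank numbers are hypothetical, and the sketch assumes that words whose frequency rank falls inside the given range are the ones dropped.

```python
def filter_words(words, ranks, excluded=(), rank_range=None):
    """Return the words that survive both exclusion rules, in their original order.

    words      -- parsed words, in text order
    ranks      -- hypothetical mapping of word -> frequency rank (1 = most frequent)
    excluded   -- words the user chose to drop manually
    rank_range -- optional (low, high) pair; words ranked inside it are dropped
    """
    excluded = set(excluded)
    kept = []
    for word in words:
        if word in excluded:
            continue
        rank = ranks.get(word)
        # Words without a known rank are kept: there is nothing to compare against.
        if rank_range is not None and rank is not None:
            low, high = rank_range
            if low <= rank <= high:
                continue
        kept.append(word)
    return kept


# Made-up ranks for a short sentence: particles rank very high in any corpus.
words = ["私", "は", "歌", "を", "歌う"]
ranks = {"私": 40, "は": 2, "歌": 900, "を": 3, "歌う": 1500}
result = filter_words(words, ranks, excluded=["私"], rank_range=(1, 10))
```

Here the manually excluded word and the two particles ranked between 1 and 10 are removed, leaving only the less common vocabulary for the printed word list.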
