PabloRomanH/cihaidata-unihan

 
 

This project is inactive

cihaidata-unihan - tool to build Unihan into a simple data format (SDF) CSV file. Part of the cihai project.

Unihan's data is dispersed across multiple files in the format of:

U+3400  kCantonese  jau1
U+3400  kDefinition (same as U+4E18 丘) hillock or mound
U+3400  kMandarin   qiū
U+3401  kCantonese  tim2
U+3401  kDefinition to lick; to taste, a mat, bamboo bark
U+3401  kHanyuPinyin    10019.020:tiàn
U+3401  kMandarin   tiàn
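
As an illustration of the format above, here is a minimal sketch (Python 3) that groups such lines by code point. It is not the code used by process.py; the function name parse_unihan_lines and the SAMPLE string are hypothetical, and the sketch assumes each line holds a code point, a field name and a value separated by whitespace.

```python
from collections import defaultdict

# Sample lines in the Unihan layout: code point, field name, value.
SAMPLE = (
    "U+3400\tkCantonese\tjau1\n"
    "U+3400\tkDefinition\t(same as U+4E18 丘) hillock or mound\n"
    "U+3400\tkMandarin\tqiū\n"
    "U+3401\tkCantonese\ttim2\n"
    "U+3401\tkDefinition\tto lick; to taste, a mat, bamboo bark\n"
    "U+3401\tkHanyuPinyin\t10019.020:tiàn\n"
    "U+3401\tkMandarin\ttiàn\n"
)


def parse_unihan_lines(lines):
    """Group field values by code point: {ucn: {field: value}}."""
    records = defaultdict(dict)
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the first two whitespace runs only, so that
        # values containing spaces stay in one piece.
        ucn, field, value = line.split(None, 2)
        records[ucn][field] = value
    return dict(records)


records = parse_unihan_lines(SAMPLE.splitlines())
print(sorted(records))
print(records["U+3401"]["kCantonese"])
```

Grouping by code point first makes the later step of emitting one row per character straightforward, since every field for a character ends up in the same dictionary.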

scripts/process.py downloads Unihan.zip and builds all files into a single tabular CSV (default output: ./data/unihan.csv):

char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn
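
The tabular layout can be sketched with Python 3's standard csv module: one row per code point, with empty cells for fields a character lacks. This is an illustrative sketch rather than the process.py implementation; ucn_to_char and write_table are hypothetical names, and only the char and ucn column names are taken from the header shown above.

```python
import csv
import io


def ucn_to_char(ucn):
    """Convert a code point label such as 'U+3400' to its character."""
    return chr(int(ucn[2:], 16))


def write_table(records, fields, out):
    """Write one CSV row per code point, ordered by code point value."""
    writer = csv.DictWriter(
        out, fieldnames=["char", "ucn"] + list(fields), lineterminator="\n"
    )
    writer.writeheader()
    for ucn in sorted(records, key=lambda u: int(u[2:], 16)):
        row = {"char": ucn_to_char(ucn), "ucn": ucn}
        # Fields missing for this character become empty cells.
        row.update({f: records[ucn].get(f, "") for f in fields})
        writer.writerow(row)


records = {
    "U+3401": {"kCantonese": "tim2", "kHanyuPinyin": "10019.020:tiàn"},
    "U+3400": {"kCantonese": "jau1"},
}
buf = io.StringIO()
write_table(records, ["kCantonese", "kHanyuPinyin"], buf)
csv_text = buf.getvalue()
```

The csv module quotes any value that contains a comma, which is why a definition such as "to lick; to taste, a mat, bamboo bark" would appear in double quotes in the output.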

process.py supports command line arguments. See scripts/process.py CLI arguments for information on how to specify custom columns, files, download URLs and output destinations.

The builder is developed against unit tests. See the Travis Builds and Revision History.

Usage

To download and build your own unihan.csv:

$ ./scripts/process.py

Creates data/unihan.csv.
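
Once the file exists, it can be read with Python 3's standard csv module. The sketch below is only an illustration: load_unihan_csv is a hypothetical helper, not part of this package, and it reads a small in-memory sample so that it runs without a build. The char and ucn column names follow the CSV header shown earlier.

```python
import csv
import io


def load_unihan_csv(fileobj):
    """Index the CSV rows by their 'ucn' column."""
    return {row["ucn"]: row for row in csv.DictReader(fileobj)}


# In-memory stand-in for data/unihan.csv.
sample = io.StringIO(
    "char,ucn,kCantonese,kMandarin\n"
    "\u3400,U+3400,jau1,qi\u016b\n"
    "\u3401,U+3401,tim2,ti\u00e0n\n"
)
table = load_unihan_csv(sample)
print(table["U+3400"]["kCantonese"])  # jau1

# With a real build, open the generated file instead:
# with open("data/unihan.csv", encoding="utf-8", newline="") as f:
#     table = load_unihan_csv(f)
```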

See scripts/process.py CLI arguments for advanced usage examples.

Structure

# dataset metadata, schema information.
datapackage.json

# (future) when this package is stable, unihan.csv will be provided
data/unihan.csv

# stores downloaded Unihan.zip and its txt file contents (.gitignore'd)
data/build_files/

# script to download + build an SDF CSV of Unihan.
scripts/process.py

# unit tests to verify behavior / consistency of builder
testsuite/*

# python 2/3 compatibility modules
scripts/_compat.py
scripts/unicodecsv.py

# python module, public-facing python API.
__init__.py
scripts/__init__.py

# utility / helper functions
scripts/util.py

Cihai is not required for:

  • data/unihan.csv - simple data format compatible csv file.
  • scripts/process.py - create a data/unihan.csv.

When this module is stable, data/unihan.csv will be provided in prepared releases, so that using scripts/process.py is not required. process.py will not require external libraries.

Examples

Related links:

Python support: Python 2.7, >= 3.3
Source: https://github.com/cihai/cihaidata-unihan
Docs: http://cihaidata-unihan.rtfd.org
Changelog: http://cihaidata-unihan.readthedocs.org/en/latest/history.html
API: http://cihaidata-unihan.readthedocs.org/en/latest/api.html
Issues: https://github.com/cihai/cihaidata-unihan/issues
Travis: http://travis-ci.org/cihai/cihaidata-unihan
Test coverage: https://coveralls.io/r/cihai/cihaidata-unihan
pypi: https://pypi.python.org/pypi/cihaidata-unihan
Ohloh: https://www.ohloh.net/p/cihaidata-unihan
License: MIT

git repo

$ git clone https://github.com/cihai/cihaidata-unihan.git

install dev

$ git clone https://github.com/cihai/cihaidata-unihan.git cihai
$ cd ./cihai
$ virtualenv .env
$ source .env/bin/activate
$ pip install -e .

tests

$ python setup.py test
