pycldf

A Python package to read and write CLDF datasets.

Writing CLDF

from pycldf.dataset import Dataset
from pycldf.sources import Source
dataset = Dataset('mydb')
dataset.fields = ('ID', 'Language_ID', 'Parameter_ID', 'Value', 'Source', 'Comment')
dataset.sources.add(Source('book', 'Meier2005', author='Hans Meier', year='2005', title='The Book'))
dataset.add_row([
    '1', 
    'http://glottolog.org/resource/languoid/id/stan1295', 
    'http://concepticon.clld.org/parameters/1277', 
    'hand', 
    'Meier2005[3-7]', 
    ''])
dataset.write('.')

results in

mydb.csv

ID,Language_ID,Parameter_ID,Value,Source,Comment
1,http://glottolog.org/resource/languoid/id/stan1295,http://concepticon.clld.org/parameters/1277,hand,Meier2005[3-7],

mydb.bib

@book{Meier2005,
    author = {Meier, Hans},
    title = {The Book},
    year = {2005}
}

mydb.csv-metadata.json

{
    "@context": [
        "http://www.w3.org/ns/csvw",
        {
            "@language": "en"
        }
    ],
    "dc:format": "cldf-1.0",
    "dialect": {
        "header": true,
        "delimiter": ",",
        "encoding": "utf-8"
    },
    "tables": [
        {
            "url": "",
            "dc:type": "cldf-values",
            "tableSchema": {
                "primaryKey": "ID",
                "columns": [
                    {
                        "datatype": "string",
                        "name": "ID"
                    },
                    {
                        "datatype": "string",
                        "name": "Language_ID"
                    },
                    {
                        "datatype": "string",
                        "name": "Parameter_ID"
                    },
                    {
                        "datatype": "string",
                        "name": "Value"
                    },
                    {
                        "datatype": "string",
                        "name": "Source"
                    },
                    {
                        "datatype": "string",
                        "name": "Comment"
                    }
                ]
            }
        }
    ]
}

Reading CLDF

>>> from pycldf.dataset import Dataset
>>> dataset = Dataset.from_file('mydb.csv')
>>> dataset
<Dataset mydb>
>>> len(dataset)
1
>>> row = dataset.rows[0]
>>> row
Row([('ID', u'1'), 
     ('Language_ID', 'http://glottolog.org/resource/languoid/id/stan1295'), 
     ('Parameter_ID', 'http://concepticon.clld.org/parameters/1277'), 
     ('Value', 'hand'), 
     ('Source', 'Meier2005[3-7]'), 
     ('Comment', '')])
>>> row['Value']
'hand'
>>> row.refs
[<Reference Meier2005[3-7]>]
>>> row.refs[0].source
<Source Meier2005>
>>> print(row.refs[0].source)
Meier, Hans. 2005. The Book.
>>> print(row.refs[0].source.bibtex())
@book{Meier2005,
  year   = {2005},
  author = {Meier, Hans},
  title  = {The Book}
}
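
The Source value in the session above, Meier2005[3-7], pairs a citation key with a bracketed context (typically pages). As a rough illustration of that syntax, the following standard-library sketch splits such a value into its two parts. The name parse_reference and the regular expression are assumptions made for this example; this is not how pycldf builds its Reference objects.

```python
import re

# Hypothetical pattern: a key without brackets, optionally followed by
# a single "[context]" suffix.
REFERENCE = re.compile(r'^(?P<key>[^\[\]]+)(?:\[(?P<context>[^\]]*)\])?$')


def parse_reference(value):
    """Split 'Key[context]' into (key, context); context is None if absent."""
    match = REFERENCE.match(value.strip())
    if match is None:
        raise ValueError('not a valid reference: %r' % value)
    return match.group('key'), match.group('context')


key, context = parse_reference('Meier2005[3-7]')
print(key, context)
```

A value without brackets, such as Meier2005, yields None as its context.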

Validating a data file

By default, data files are read in strict mode, i.e. an invalid row raises an exception. To validate a data file instead, read it in validating mode by passing skip_on_error=True: invalid rows are then skipped and reported as warnings.

For example, the following output is generated

>>> from pycldf.dataset import Dataset
>>> dataset = Dataset.from_file('mydb.csv', skip_on_error=True)
WARNING:pycldf.dataset:skipping row in line 3: wrong number of columns in row
WARNING:pycldf.dataset:skipping row in line 4: duplicate ID: 1
WARNING:pycldf.dataset:skipping row in line 5: missing citekey: Mei2005

when reading the file

ID,Language_ID,Parameter_ID,Value,Source,Comment
1,stan1295,1277,hand,Meier2005[3-7],
1,stan1295,1277,hand,Meier2005[3-7]
1,stan1295,1277,hand,Meier2005[3-7],
2,stan1295,1277,hand,Mei2005[3-7],
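
The three checks behind these warnings can be approximated with the standard library alone. The sketch below is only an illustration and not pycldf's implementation: validate_rows and its known_citekeys parameter are hypothetical, it assumes one reference per Source cell, and the messages are merely modeled on the warnings shown above.

```python
import csv
import io


def validate_rows(text, known_citekeys):
    """Return (valid_rows, problems) for CSV text that starts with a header.

    Hypothetical helper, not part of pycldf.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    seen_ids = set()
    valid, problems = [], []
    # Line numbers are 1-based and count the header as line 1; this
    # assumes no cell contains an embedded newline.
    for lineno, cells in enumerate(reader, start=2):
        if len(cells) != len(header):
            problems.append((lineno, 'wrong number of columns in row'))
            continue
        row = dict(zip(header, cells))
        if row['ID'] in seen_ids:
            problems.append((lineno, 'duplicate ID: %s' % row['ID']))
            continue
        # The citation key is whatever precedes an optional "[...]" part.
        citekey = row['Source'].split('[', 1)[0]
        if citekey and citekey not in known_citekeys:
            problems.append((lineno, 'missing citekey: %s' % citekey))
            continue
        seen_ids.add(row['ID'])
        valid.append(row)
    return valid, problems


DATA = """ID,Language_ID,Parameter_ID,Value,Source,Comment
1,stan1295,1277,hand,Meier2005[3-7],
1,stan1295,1277,hand,Meier2005[3-7]
1,stan1295,1277,hand,Meier2005[3-7],
2,stan1295,1277,hand,Mei2005[3-7],
"""

valid, problems = validate_rows(DATA, known_citekeys={'Meier2005'})
for lineno, message in problems:
    print('skipping row in line %d: %s' % (lineno, message))
```

Collecting the problems in a list instead of raising on the first one mirrors the difference between validating mode and strict mode.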

Support for augmented metadata

pycldf provides some support for metadata properties as described in the W3C's Metadata Vocabulary for Tabular Data, in particular:

On column description level:
- datatype is interpreted so that appropriate Python objects are used internally,
- a URI template provided as valueUrl can be expanded by calling Row.valueUrl(<colname>).

On schema description level:
- a URI template provided as aboutUrl is used to compute the URL available as Row.url.
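
To illustrate what expanding such a URI template involves, here is a minimal Python 3 sketch that substitutes {name} variables with percent-encoded row values. It handles only simple variables of the kind used in this README, not the full URI Template syntax, and expand_template is a hypothetical helper rather than pycldf's implementation.

```python
import re
from urllib.parse import quote


def expand_template(template, row):
    """Replace each {name} in template with the URL-quoted value of row[name]."""
    def substitute(match):
        # Variables missing from the row expand to the empty string.
        return quote(str(row.get(match.group(1), '')), safe='')
    return re.sub(r'\{(\w+)\}', substitute, template)


row = {'ID': 1, 'Language_ID': 'stan1295', 'Parameter_ID': '1277'}
language_url = expand_template(
    'http://glottolog.org/resource/languoid/id/{Language_ID}', row)
print(language_url)
```

Keeping the short identifier in the data and the URL pattern in the metadata is what allows rows to store stan1295 rather than the full Glottolog URL.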

So the example above could be rewritten more succinctly:

from pycldf.dataset import Dataset
from pycldf.sources import Source
dataset = Dataset('mydb')
dataset.fields = ('ID', 'Language_ID', 'Parameter_ID', 'Value', 'Source', 'Comment')
dataset.table.schema.columns['ID'].datatype = int
dataset.table.schema.columns['Language_ID'].valueUrl = 'http://glottolog.org/resource/languoid/id/{Language_ID}'
dataset.table.schema.columns['Parameter_ID'].valueUrl = 'http://concepticon.clld.org/parameters/{Parameter_ID}'
dataset.sources.add(Source('book', 'Meier2005', author='Hans Meier', year='2005', title='The Book'))
dataset.add_row(['1', 'stan1295', '1277', 'hand', 'Meier2005[3-7]', ''])
dataset.write('.')

And then accessed as follows:

>>> from pycldf.dataset import Dataset
>>> dataset = Dataset.from_file('mydb.csv')
>>> row = dataset.rows[0]
>>> type(row['ID'])
<type 'int'>
>>> row.valueUrl('Language_ID')
'http://glottolog.org/resource/languoid/id/stan1295'
>>> row['Language_ID']
'stan1295'
