Skip to content

vitorbaptista/tabulator-py

 
 

Repository files navigation

tabulator-py

Travis Coveralls

A utility library that provides a consistent interface for reading tabular data.

Usage

This section is intended to be used by end-users of the library.

Getting Started

Installation

To get started (under development):

$ pip install tabulator

Simple interface

Fast access to the table with topen (stands for table open) function:

from tabulator import topen, processors

with topen('path.csv', with_headers=True) as table:
    for row in table:
        print(row)
        print(row.get('header'))

For the most use cases topen function is enough. It takes the source argument:

<scheme>://path/to/file.<format>

and uses corresponding Loader and Parser to open and start to iterate over the table. Also user can pass scheme and format explicitly as function arguments. The last topen argument is encoding - user can force Tabulator to use encoding of choice to open the table.

Read more about topen - documentation.

Function topen returns Table instance. We use context manager to call table.open() on enter and table.close() when we exit:

  • table can be iterated like file-like object returning row by row
  • table can be read row by bow using readrow method (it returns row tuple)
  • table can be read into memory using read function (return list or row tuples) with limit of output rows as parameter.
  • headers can be accessed via headers property
  • table pointer can be set to start via reset method.

Read more about Table - documentation.

In the example above we use processors.Headers to extract headers from the table (via with_headers=True shortcut). Processors is a powerfull Tabulator concept. Parsed data goes thru pipeline of processors to be updated before returning as table row.

Read more about Processor - documentation.

Read a processors tutorial - tutorial.

Advanced interface

To get full control over the process you can use more parameters. Below all parts of Tabulator are presented:

from tabulator import topen, processors, loaders, parsers

table = topen('path.csv',
        loader_options={'encondig': 'utf-8'},
        parser_options={'delimeter': ',', quotechar: '|'},
        loader_class=loaders.File,
        parser_class=parsers.CSV,
        iterator_class=CustomIterator,
        table_class=CustomTable)
table.add_processor(processors.Headers(skip=1))
headers = table.headers
contents = table.read(limit=10)
print(headers, contents)
table.close()

Also Table class can be instantiated by user (see documentation). But there is no difference between it and topen call with extended list of parameters except topen also calls the table.open() method.

Design Overview

Tabulator uses modular architecture to be fully extensible and flexible. It uses loosely coupled modules like Loader, Parser and Processor to provide clear data flow.

diagram

Documentation

API documentation is presented as docstrings:

Development

This section is intended to be used by tech users collaborating on this project.

Getting Started

To activate virtual environment, install dependencies, add pre-commit hook to review and test code and get run command as unified developer interface:

$ source activate.sh

Reviewing

The project follow the next style guides:

To check the project against Python style guide:

$ run review

Testing

To run tests with coverage check:

$ run test

Coverage data will be in the .coverage file.

About

[WIP] A utility library that provides a consistent interface for reading tabular data. Work towards GoodTables and MessyTables V2.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.5%
  • Shell 0.5%