Python class representing a list of data rows, each row sharing a set of common keys (much like a database table). The class provides many methods and operations for filtering, aggregating, or otherwise manipulating the data.

airdrik/PyDataTable

PyDataTable is intended to provide a representation similar to a database table: a list of rows in which each row is a set of key-value pairs and all rows share the same keys (internally represented as a list of dicts).  It includes several capabilities common to database tables, as well as many other nice-to-have capabilities.
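
As a rough illustration of the list-of-dicts model, the sketch below uses plain Python only; it does not use PyDataTable's own API, and the `headers` helper is a hypothetical name introduced here. It builds a small table and recovers the keys shared by every row.

```python
# Plain-Python sketch of the list-of-dicts model; not the PyDataTable API.
rows = [
    {"name": "alice", "dept": "eng", "salary": 120},
    {"name": "bob", "dept": "ops", "salary": 95},
    {"name": "carol", "dept": "eng", "salary": 130},
]


def headers(table):
    """Hypothetical helper: return the keys shared by every row, in first-row order."""
    if not table:
        return []
    common = set(table[0])
    for row in table[1:]:
        common &= set(row)
    return [key for key in table[0] if key in common]


print(headers(rows))  # ['name', 'dept', 'salary']
print(rows[1]["salary"])  # 95: a row is a dict, so a cell is a key lookup
```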

Some of the capabilities include:
	Selecting columns, including filtering the selected columns by column content and adding columns
		populated by the result of some expression
	Filtering rows by some criteria (by table row or by column value)
	Joining multiple tables (both database-style table joins and concatenation of rows)
	Table aggregation (similar to a database GROUP BY)
	Bucketing, and aggregation of bucketed results
	De-duplication (e.g. SELECT DISTINCT)
	Sorting
	Diffing of two tables
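
The sketch below approximates several of these capabilities on a plain list of dicts. The helper names (`select`, `where`, `join`, `aggregate`, `distinct`) and their signatures are hypothetical and are not PyDataTable's methods; they only show what each operation does to the rows.

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical helpers operating on a plain list of dicts; they illustrate the
# capabilities listed above and are not PyDataTable's methods.


def select(table, *columns, **computed):
    """Keep the named columns and add computed ones (new name -> function of row)."""
    return [
        {**{c: row[c] for c in columns}, **{name: fn(row) for name, fn in computed.items()}}
        for row in table
    ]


def where(table, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in table if predicate(row)]


def join(left, right, key):
    """Inner join on a shared key column, similar to SQL JOIN ... USING (key)."""
    index = defaultdict(list)
    for right_row in right:
        index[right_row[key]].append(right_row)
    return [{**left_row, **right_row} for left_row in left for right_row in index[left_row[key]]]


def aggregate(table, group_by, **aggregates):
    """Group rows by the given columns, then apply each aggregate to a group's rows."""
    groups = defaultdict(list)
    for row in table:
        groups[tuple(row[c] for c in group_by)].append(row)
    return [
        {**dict(zip(group_by, group_key)), **{name: fn(rows) for name, fn in aggregates.items()}}
        for group_key, rows in groups.items()
    ]


def distinct(table):
    """Drop duplicate rows, keeping the first occurrence (values must be hashable)."""
    seen, result = set(), []
    for row in table:
        marker = frozenset(row.items())
        if marker not in seen:
            seen.add(marker)
            result.append(row)
    return result


employees = [
    {"name": "alice", "dept": "eng", "salary": 120},
    {"name": "bob", "dept": "ops", "salary": 95},
    {"name": "carol", "dept": "eng", "salary": 130},
    {"name": "bob", "dept": "ops", "salary": 95},  # duplicate row
]
depts = [{"dept": "eng", "floor": 3}, {"dept": "ops", "floor": 1}]

unique = distinct(employees)
high = where(unique, lambda r: r["salary"] > 100)
print(select(high, "name", bonus=lambda r: r["salary"] // 10))
# [{'name': 'alice', 'bonus': 12}, {'name': 'carol', 'bonus': 13}]
print(aggregate(unique, ["dept"], total=lambda rows: sum(r["salary"] for r in rows)))
# [{'dept': 'eng', 'total': 250}, {'dept': 'ops', 'total': 95}]
print(join(unique, depts, "dept")[0])
# {'name': 'alice', 'dept': 'eng', 'salary': 120, 'floor': 3}
print([r["name"] for r in sorted(unique, key=itemgetter("salary"), reverse=True)])
# ['carol', 'alice', 'bob']
```

Each helper returns a new list instead of mutating its input, which keeps the operations easy to chain.
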
Table headers may be strings or any other object that can be used as a dict key
	(i.e. one that implements __hash__ and __eq__ for storage and lookup, and __str__ and __cmp__ for display).
Values may be of any type, preferably one with a reasonably concise __str__ representation.
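
A hypothetical custom header type might look like the following. Note that `__cmp__` is only honored by Python 2; under Python 3, ordering comes from the rich comparison methods, so this sketch defines `__lt__` and lets `functools.total_ordering` fill in the rest. The `ColumnHeader` class and its `unit` attribute are illustrative assumptions, not part of PyDataTable.

```python
from functools import total_ordering


@total_ordering
class ColumnHeader:
    """Hypothetical header type: usable as a dict key and sortable for display."""

    def __init__(self, name, unit=None):
        self.name = name
        self.unit = unit

    def _key(self):
        return (self.name, self.unit or "")

    def __eq__(self, other):
        if not isinstance(other, ColumnHeader):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Equal headers must hash equally so that dict lookups find them.
        return hash(self._key())

    def __lt__(self, other):
        # Python 3 stand-in for __cmp__: defines the ordering used for display.
        if not isinstance(other, ColumnHeader):
            return NotImplemented
        return self._key() < other._key()

    def __str__(self):
        return f"{self.name} ({self.unit})" if self.unit else self.name


row = {ColumnHeader("distance", "km"): 42.0, ColumnHeader("city"): "Oslo"}
print(row[ColumnHeader("distance", "km")])  # 42.0 (an equal, new instance finds the value)
print([str(h) for h in sorted(row)])  # ['city', 'distance (km)']
```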

Includes the following modules:
* datatable - the main DataTable module, as described above
* datatable_alt - an alternative representation of the DataTable intended to reduce memory usage
	It replaces the list-of-dicts model with a dict-of-lists model.  This involves some performance trade-offs (column-based operations may be faster, while row-based operations may be slower), but the capabilities are otherwise the same.
* datatable_util - a collection of utilities for use with DataTables, including table-to-text formatters (csv, fixedwidth), some basic "column filters", and tools for changing data within a column
* datatable_aggregate - a collection of methods for aggregating results, to be used by the DataTable.aggregate function
* datatable_parsers - a collection of utilities for parsing DataTables from various sources (such as the text produced by the corresponding formatters in datatable_util, and DB-API 2.0 compliant database cursors)
* datatable_diff - a module for examining the differences between two DataTable objects
* datatable_stream - adds a stream method to DataTable that enables streaming, pipeline-style processing of DataTable data (inspired by Java 8 streams and reactive streams)
	The DataTableStream class provides the same interface as DataTable, but defers processing until a terminal operation is performed.
* hierarchies - an alternative, hierarchical representation of data in which each level of the hierarchy corresponds to
	a specific key, and the nodes at that level contain the values for that key.  See the documentation for that module for details.
* hierarchy_aggregate - a collection of methods for aggregating results, to be used by the Hierarchy.aggregate method
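
To make the datatable_alt and datatable_stream descriptions more concrete, here is a plain-Python sketch of converting between the list-of-dicts and dict-of-lists layouts, and of a pipeline that records its steps and only runs them on a terminal call. `to_columns`, `to_rows`, `LazyPipeline` and its `filter`/`map`/`collect` methods are hypothetical names, not the interfaces of those modules.

```python
# Hypothetical sketch: these helpers illustrate the ideas behind datatable_alt
# and datatable_stream; they are not those modules' APIs.


def to_columns(rows):
    """List of dicts -> dict of lists (assumes every row has the same keys)."""
    keys = list(rows[0]) if rows else []
    return {key: [row[key] for row in rows] for key in keys}


def to_rows(columns):
    """Dict of lists -> list of dicts (assumes all columns have the same length)."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


class LazyPipeline:
    """Records steps and runs them only when a terminal operation is called."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def _then(self, step):
        # Each intermediate operation returns a new pipeline; nothing runs yet.
        return LazyPipeline(self._source, self._steps + (step,))

    def filter(self, predicate):
        return self._then(lambda rows: (row for row in rows if predicate(row)))

    def map(self, fn):
        return self._then(lambda rows: (fn(row) for row in rows))

    def collect(self):
        # Terminal operation: chain the generators and pull every row through.
        rows = iter(self._source)
        for step in self._steps:
            rows = step(rows)
        return list(rows)


rows = [
    {"name": "alice", "salary": 120},
    {"name": "bob", "salary": 95},
    {"name": "carol", "salary": 130},
]
columns = to_columns(rows)
print(columns)  # {'name': ['alice', 'bob', 'carol'], 'salary': [120, 95, 130]}
print(sum(columns["salary"]))  # 345
assert to_rows(columns) == rows

seen = []


def double_salary(row):
    seen.append(row["name"])  # records when the step actually runs
    return {**row, "salary": row["salary"] * 2}


pipeline = LazyPipeline(rows).map(double_salary).filter(lambda r: r["salary"] > 200)
print(seen)  # [] (nothing has been processed yet)
print(pipeline.collect())
# [{'name': 'alice', 'salary': 240}, {'name': 'carol', 'salary': 260}]
```

Summing one column only touches a single list in the dict-of-lists layout, while rebuilding a row requires zipping every column, which mirrors the trade-off described for datatable_alt.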

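Similarly, the following sketch shows the ideas behind datatable_parsers and datatable_diff using only the standard library: building rows from a DB-API 2.0 cursor (here `sqlite3`) via `cursor.description`, and comparing two tables keyed on one column. `rows_from_cursor` and `diff` are hypothetical helpers; the real modules may parse and report differences quite differently.

```python
import sqlite3

# Hypothetical sketch of parsing rows from a DB-API 2.0 cursor and diffing two
# tables; not the datatable_parsers or datatable_diff API.


def rows_from_cursor(cursor):
    """Build a list of dicts from any DB-API 2.0 cursor after execute()."""
    names = [column[0] for column in cursor.description]  # item 0 is the column name
    return [dict(zip(names, record)) for record in cursor.fetchall()]


def diff(old, new, key):
    """Compare two tables on a key column; return (added, removed, changed) keys."""
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}
    added = sorted(new_by_key.keys() - old_by_key.keys())
    removed = sorted(old_by_key.keys() - new_by_key.keys())
    changed = sorted(
        k for k in old_by_key.keys() & new_by_key.keys() if old_by_key[k] != new_by_key[k]
    )
    return added, removed, changed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [("alice", 120), ("bob", 95)])
before = rows_from_cursor(conn.execute("SELECT name, salary FROM emp ORDER BY name"))
conn.close()

print(before)  # [{'name': 'alice', 'salary': 120}, {'name': 'bob', 'salary': 95}]
after = [{"name": "alice", "salary": 125}, {"name": "carol", "salary": 130}]
print(diff(before, after, "name"))  # (['carol'], ['bob'], ['alice'])
```
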
If you have any suggestions, please contact me at airdrik@gmail.com
