Skip to content

ojones/html_table_parser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Synopsis

Transform html tables into usable data structures. Using beautiful soup table object, return 2D array structure, dictionary, or array of column data. Imputes cell values from row and col spans :)

Code Example

Given an html table (with or without row and col spans). You can make a 2D array

from bs4 import BeautifulSoup as bs
from html_table_parser import parser_functions as parse

soup = bs(YOUR_HTML_TABLE, "html.parser")
test_table = soup.find('table')
twod_array = parse.make2d(test_table)

# print 2D array
print(twod_array)

Use that 2D array to return column data by heading name

# print column data by col heading name (case insensitive)
print(parse.twod_col_data(twod_array, 'YOUR_COL_HEADING'))

Or transform the soup table object (test_table) into a dictionary

# row data begins on first row after col headings
# so rowstart is 1
print(parse.make_dict(test_table, 1))

Motivation

I looked everywhere for the code to transform html tables with row and col spans into a 2D array and couldn't find it. So I thought how hard could it be to quickly write my own. It was actally kinda hard, so here it is for the next guy.

Installation

Works on Python 2.7 and Python 3.4.

$ pip install html_table_parser

Tests

$ py.test

Contributors

Don't be shy.

License

MIT

About

Transforms html tables into usable data.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages