jensfinnas/ktweb-scraper

A scraper for KT Web document interfaces.

Tested on the City of Tampere's KT Web site.

Install

sudo apt-get install python-pip python git abiword tesseract-ocr tesseract-ocr-fin wv ghostscript python-imaging python-dev libxml2-dev libxslt1-dev zlib1g-dev libjpeg62 libjpeg62-dev
git clone https://github.com/jensfinnas/ktweb-scraper
cd ktweb-scraper
pip install -r requirements.txt

You will also need to store your Amazon AWS credentials in ~/.aws, as described in https://aws.amazon.com/developers/getting-started/python/
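Before a long scraping run it can help to confirm the credentials are in place. The Python 3 sketch below uses only the standard library to check that the standard AWS shared credentials file (~/.aws/credentials) contains an access key pair for a profile. The function name has_aws_credentials is hypothetical, and the assumption that the scraper reads the "default" profile is not confirmed by this README.

```python
import configparser
import os


def has_aws_credentials(path=None, profile="default"):
    """Return True if `profile` in the AWS credentials file has a key pair.

    Hypothetical helper; assumes the scraper uses the "default" profile.
    """
    if path is None:
        path = os.path.expanduser("~/.aws/credentials")
    parser = configparser.ConfigParser()
    # read() returns the list of files it parsed; empty means missing/unreadable
    if not parser.read(path):
        return False
    # has_option() returns False when the section itself does not exist
    return (parser.has_option(profile, "aws_access_key_id")
            and parser.has_option(profile, "aws_secret_access_key"))
```

Running has_aws_credentials() before python run.py gives an early, readable failure instead of an error partway through a scrape.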

Command line usage

To start scraping:

python run.py

To get help:

python run.py --help

Using the scraper as a Python module

Basic initialization:

from modules.site import Site

site = Site("http://ktweb.tampere.fi/ktwebbin/dbisa.dll/ktwebscr/")

Get a list of all available decision-making bodies.

print(site.bodies())

Get a list of the upcoming meetings, the past meetings, or all meetings of a given body.

print(site.upcoming_meetings("Kaupunginhallitus"))
print(site.past_meetings("Kaupunginhallitus"))
print(site.meetings("Kaupunginhallitus"))
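Combining site.bodies() with site.meetings() gives a quick overview of how active each body is. The Python 3 sketch below assumes that bodies() returns names that meetings() accepts (as "Kaupunginhallitus" is above) and that meetings() returns an iterable; meeting_counts is a hypothetical helper, not part of the scraper. Note that it fetches the meeting list of every body, so it may make many requests.

```python
def meeting_counts(site):
    """Map each decision-making body name to its number of meetings.

    Hypothetical helper; `site` is a Site instance or anything with
    the same bodies()/meetings() interface.
    """
    counts = {}
    for body in site.bodies():
        # Assumption: the names from bodies() are valid meetings() arguments
        counts[body] = sum(1 for _ in site.meetings(body))
    return counts
```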

You can also choose to only get meetings after a specific date.

print(site.meetings("Kaupunginhallitus", after_date="2016-06-01"))
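For recurring runs it is often handier to ask for "meetings from the last N days" than to hard-code a date. Assuming after_date accepts YYYY-MM-DD strings, as in the example above, a small Python 3 helper can compute that string; after_date_days_ago is a hypothetical name, not part of the scraper.

```python
from datetime import date, timedelta


def after_date_days_ago(days, today=None):
    """Return the YYYY-MM-DD string for `days` days before `today`.

    `today` defaults to the current date; pass it explicitly for
    reproducible runs.
    """
    if today is None:
        today = date.today()
    return (today - timedelta(days=days)).isoformat()
```

For example, site.meetings("Kaupunginhallitus", after_date=after_date_days_ago(30)) would request the meetings of roughly the past month.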

Meetings have two kinds of documents: agendas ("esityslista") and minutes ("pöytäkirja"). You can get them with meeting.agenda() and meeting.minutes(), or both with meeting.documents():

for meeting in site.meetings("Kaupunginhallitus"):
    for doc in meeting.documents():
        print(doc)

Documents can also be downloaded.

doc.download()

By default, documents are downloaded to a temporary folder with an auto-generated file name. Override these defaults with:

doc.download(file_name="my_file.pdf", folder="myfolder")
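The calls above can be combined into a bulk download of every agenda and set of minutes from one body. The Python 3 sketch below uses only the methods shown in this README; download_all is a hypothetical helper, and passing folder without file_name (to keep the auto-generated names) is an assumption about download() rather than documented behavior.

```python
def download_all(site, body, folder, after_date=None):
    """Download every document of every meeting held by `body`.

    Hypothetical helper; `site` is a Site instance or anything with the
    same interface. Returns the number of documents downloaded.
    """
    if after_date is None:
        meetings = site.meetings(body)
    else:
        # after_date is a YYYY-MM-DD string, e.g. "2016-06-01"
        meetings = site.meetings(body, after_date=after_date)

    count = 0
    for meeting in meetings:
        # documents() returns both the agenda and the minutes
        for doc in meeting.documents():
            # Assumption: folder may be given without file_name
            doc.download(folder=folder)
            count += 1
    return count
```

With the scraper installed, this could be called as download_all(Site("http://ktweb.tampere.fi/ktwebbin/dbisa.dll/ktwebscr/"), "Kaupunginhallitus", "kaupunginhallitus", after_date="2016-06-01") after from modules.site import Site.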