
a-hel/assignment_cgi


Assignment

Introduction

This is a submission to CGI's recruitment assignment. Its main focus is on simplicity. For example, I chose to write the data to a file instead of a database. Before deploying it to production, it is advisable to replace the file writer with a database writer, for example one built on the asyncpg or aiomysql drivers for even more asyncio fun.
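To make the suggested swap from a file writer to a database writer easier, the output step can sit behind a small writer object. The sketch below is a hypothetical illustration, not the code in this repository. The class name `CsvResultWriter`, its `write(url, regexp, matched)` method and the row layout are all assumptions. A database-backed writer (for example one using asyncpg) would expose the same method.

```python
import csv


class CsvResultWriter:
    """Hypothetical file-based writer: appends one CSV row per result."""

    def __init__(self, path: str = "output.csv") -> None:
        self.path = path

    def write(self, url: str, regexp: str, matched: bool) -> None:
        # Open in append mode so repeated calls accumulate rows in the same file.
        with open(self.path, "a", newline="", encoding="utf-8") as f:
            csv.writer(f).writerow([url, regexp, matched])
```

Because callers only depend on `write`, replacing the file with a database table would not require changes elsewhere.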

The test coverage is not very high and the tests may contain some bugs, but they should still show my underlying decisions and choices.

2019 Andreas Helfenstein

Installation

Requires at least Python 3.7

pip install -r requirements.txt

To test the installation, run

python main.py

To run the test suite, you also need pytest-asyncio and bottle.

Then, run

pytest

Limitations

The script is designed to retrieve HTML content; it is not suitable for retrieving other resources such as JSON, gzip etc.
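One way to respect this limitation is to check a response's Content-Type header before running the regular expression on its body. The script is not documented as doing this; the helper below, `is_html`, is a hypothetical guard, and the set of accepted media types is an assumption.

```python
from typing import Optional


def is_html(content_type: Optional[str]) -> bool:
    """Return True if a Content-Type header value denotes an HTML document."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")
```

For example, `is_html("text/html; charset=UTF-8")` is true, while `is_html("application/json")` is false.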

Usage

Function main(urllist, max_coros=10)

Match the content of each URL in a list against its corresponding regular expression and write the results to a file (output.csv).

Arguments:

  • urllist (iterable of iterables): Pairs of (url, regexp)
  • max_coros (int, default=10): Maximum number of coroutines to run simultaneously.
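A common way to enforce a limit like max_coros in asyncio is an `asyncio.Semaphore`. The sketch below shows that technique for a single pass over the (url, regexp) pairs; it is not the repository's implementation. The names `check_pair`, `check_all` and the injected `fetch` coroutine are hypothetical, chosen so the matching logic can run without network access, and whether the real `main` uses `re.search` is an assumption.

```python
import asyncio
import re
from typing import Awaitable, Callable, Iterable, List, Tuple

# Hypothetical fetcher type: takes a URL, returns the page's HTML as text.
Fetch = Callable[[str], Awaitable[str]]


async def check_pair(url: str, regexp: str, fetch: Fetch,
                     sem: asyncio.Semaphore) -> Tuple[str, str, bool]:
    # The semaphore caps how many fetches are in flight at the same time.
    async with sem:
        html = await fetch(url)
    return url, regexp, re.search(regexp, html) is not None


async def check_all(urllist: Iterable[Iterable[str]], fetch: Fetch,
                    max_coros: int = 10) -> List[Tuple[str, str, bool]]:
    # Create the semaphore inside the running loop (matters on Python 3.7-3.9).
    sem = asyncio.Semaphore(max_coros)
    tasks = [check_pair(url, regexp, fetch, sem) for url, regexp in urllist]
    # gather preserves input order in its result list.
    return await asyncio.gather(*tasks)
```

Unlike `main`, this sketch returns after one pass instead of looping forever, which keeps it easy to test.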

Returns:

None

Example:

main([['https://www.hs.fi', 'Helsinki'], ['https://www.is.fi', '^Foo.*bar$']])

Remarks:

This function runs an infinite loop. To interrupt it, press Ctrl+C.

The urllist parameter does not need to be a list; it can be any iterable, e.g. a generator expression that periodically retrieves data from a database or another external source and pipes it into the function.
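As an illustration, a generator can produce the (url, regexp) pairs lazily from any line-based source. The helper `pairs_from_lines` and the "url, regexp" line format below are hypothetical; the format assumes URLs contain no commas, while regular expressions may. How often `main` consumes the iterable is not specified here.

```python
from typing import Iterable, Iterator, Tuple


def pairs_from_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Lazily yield (url, regexp) pairs from lines of the form 'url, regexp'."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split only on the first comma so regexps may themselves contain commas.
        url, regexp = line.split(",", 1)
        yield url.strip(), regexp.strip()
```

Usage might look like `main(pairs_from_lines(open('urls.txt')))`, where `urls.txt` is a hypothetical input file.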

