
Hacker News scraper

This is a scraper for the Hacker News site.

Getting Started

Clone the project into any directory you like:

git clone

or download the ZIP archive and extract the files.

Change into the downloaded directory.


To run the project without installing it, you need the following dependencies:

  1. libc-dev
  2. gcc
  3. libxslt-dev
  4. libxml2-dev
  5. python-dev
  6. pip
  7. lxml
  8. pyquery
  9. rfc3986


Install all dependencies with root privileges by running:


Do this only if you are running the application without a container.


Once you have installed everything needed to run the project without a container, you can run:

./hackernews --posts n

where n is the number of latest posts to print to STDOUT in JSON format.
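The README does not show how the script is implemented. Purely as an illustration, here is a minimal sketch of how a `--posts n` option could be parsed with `argparse` and the result printed as JSON. The `fetch_posts` helper and the post fields (`title`, `uri`, `rank`) are hypothetical placeholders, not the project's actual code.

```python
import argparse
import json


def positive_int(value):
    """argparse type check: accept only integers greater than zero."""
    number = int(value)  # a ValueError here is reported by argparse as an invalid value
    if number <= 0:
        raise argparse.ArgumentTypeError("--posts must be a positive integer")
    return number


def fetch_posts(n):
    """Hypothetical placeholder for the real scraping step.

    A real implementation would download and parse Hacker News pages;
    this stub just returns n dummy posts so the sketch is runnable.
    """
    return [
        {"title": "Post %d" % rank, "uri": "https://example.com/%d" % rank, "rank": rank}
        for rank in range(1, n + 1)
    ]


def render(n):
    """Serialize the latest n posts as a JSON array string."""
    return json.dumps(fetch_posts(n), indent=2)


def main(argv=None):
    parser = argparse.ArgumentParser(prog="hackernews")
    parser.add_argument("--posts", type=positive_int, required=True,
                        help="number of latest posts to print")
    args = parser.parse_args(argv)  # argv=None means "use sys.argv[1:]"
    print(render(args.posts))  # the JSON array goes to STDOUT
```

With this layout, an entry-point script only needs to call `main()`; `hackernews --posts 3` would then print a JSON array of three objects, and `--posts 0` would be rejected with a usage error.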

To make the project available to every user, change into the previously downloaded directory.


Root privileges are required.

You will then be able to run:

hackernews --posts n 

where n is the number of latest posts to print to STDOUT in JSON format.

To install the project inside a container, Docker and root privileges are required.

Change into the installation directory:

cd dockerbuild

Build the image:

docker build -t hackernews .

Run a container from the image:

docker run --name hackernews -ti hackernews

Inside the container you can now run:

hackernews --posts n 

where n is the number of latest posts to print to STDOUT in JSON format.


To test the project before running it, run the following command inside the project folder:

python -m unittest discover -s hackernews_tests/ -p "*" -v
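With `-p "*"`, discovery considers every Python module in `hackernews_tests/` rather than only the default `test*.py` files, and runs the `TestCase` classes it finds. A hypothetical test module in that style might look like the following; `is_valid_post` and the post fields are illustrative assumptions, not code from the repository.

```python
import unittest


def is_valid_post(post):
    """Hypothetical validator: a post needs a non-blank title and a positive rank."""
    return (
        isinstance(post.get("title"), str)
        and post["title"].strip() != ""
        and isinstance(post.get("rank"), int)
        and post["rank"] > 0
    )


class PostValidationTest(unittest.TestCase):
    def test_accepts_well_formed_post(self):
        self.assertTrue(is_valid_post({"title": "Show HN", "rank": 1}))

    def test_rejects_blank_title(self):
        self.assertFalse(is_valid_post({"title": "  ", "rank": 1}))

    def test_rejects_non_positive_rank(self):
        self.assertFalse(is_valid_post({"title": "Ask HN", "rank": 0}))
```

Saved as a module inside `hackernews_tests/`, a file like this would be picked up and run by the command above.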


To uninstall the program from your machine, run the following with root privileges:


If you used the container installation instead, run the following as root:

docker rm hackernews; docker rmi hackernews; docker rmi alpine

Libraries used

In this project I've used the PyQuery library to access the DOM easily, in jQuery fashion.
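PyQuery lets you pick elements with CSS selectors, as jQuery does; for example, `PyQuery(html)("a.storylink")` selects every link with class `storylink`. The sketch below extracts the same kind of data with only the standard library's `html.parser`, so it runs without PyQuery installed. It is a stand-in for illustration, not how the project does it, and the `storylink` class and the sample markup are assumptions about the page structure.

```python
from html.parser import HTMLParser


class ClassLinkExtractor(HTMLParser):
    """Collect (text, href) for every <a> whose class list contains css_class.

    Roughly what the jQuery-style selector "a.<css_class>" returns.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []
        self._current = None  # [text_parts, href] while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if self.css_class in classes:
            self._current = [[], attributes.get("href")]

    def handle_data(self, data):
        if self._current is not None:
            self._current[0].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            text_parts, href = self._current
            self.links.append(("".join(text_parts).strip(), href))
            self._current = None


def select_links(html, css_class):
    """Return (text, href) pairs for all <a class="css_class"> elements."""
    parser = ClassLinkExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup resembling a listing page.
SAMPLE = """
<table>
  <tr><td><a class="storylink" href="https://example.com/a">First story</a></td></tr>
  <tr><td><a class="hnuser" href="user?id=bob">bob</a></td></tr>
  <tr><td><a class="storylink" href="https://example.com/b">Second story</a></td></tr>
</table>
"""
```

The amount of bookkeeping needed here is the point of the comparison: with PyQuery the selection itself is a single call.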

I've also used the rfc3986 library to make sure that the URLs passed conform to the standard required by the assignment.
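For comparison, here is a rough, stdlib-only approximation of that kind of check using `urllib.parse.urlsplit`. It is far looser than a real RFC 3986 validator such as the rfc3986 package, and the function name and the rules it applies (absolute http/https URL with a host, no raw whitespace) are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def is_valid_http_url(url):
    """Rough stand-in for RFC 3986 validation using only the stdlib.

    Accepts absolute http/https URLs with a host; much looser than a
    real RFC 3986 validator and intended only as an illustration.
    """
    if not isinstance(url, str) or any(ch.isspace() for ch in url):
        return False  # raw whitespace is never allowed in a URI
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. malformed IPv6 brackets such as "http://[::1"
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)
```

A dedicated RFC 3986 library also checks the characters allowed in each URI component, which this sketch does not attempt.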

OS used

I designed and tested the system on Lubuntu 16.04 LTS; the OS chosen for the container is Alpine Linux because it is lightweight.

