This is a scraper for the Hacker News site.
Clone the project into any directory you wish:
git clone https://github.com/LucaPaterlini/TL_LucaPaterlini.git
or download the zip and extract the files.
Open the downloaded directory.
To run the project without installing it, the following dependencies are required:
- libc-dev
- gcc
- libxslt-dev
- libxml2-dev
- python-dev
- pip
- lxml
- pyquery
- rfc3986
Install all dependencies by running the following script with root privileges:
./setup_dependancies.sh
Do this only if you are running the application without a container.
Once all the dependencies above are installed, you can run the project without a container:
./hackernews --posts n
Where n is the number of latest posts to print to STDOUT in JSON format.
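The command-line behaviour described above (a --posts option, JSON written to STDOUT) can be sketched as follows. This is only a minimal illustration, assuming Python 3 and the standard library; the function names and the "title" and "rank" fields are hypothetical placeholders, not the project's actual code or output format.

```python
import argparse
import json
import sys


def parse_args(argv):
    """Parse the --posts option; n must be a positive integer."""
    parser = argparse.ArgumentParser(prog="hackernews")
    parser.add_argument("--posts", type=int, required=True,
                        help="number of latest posts to print")
    args = parser.parse_args(argv)
    if args.posts < 1:
        parser.error("--posts must be a positive integer")
    return args


def fetch_posts(n):
    """Hypothetical placeholder: a real scraper would download and parse pages here."""
    return [{"title": "placeholder post %d" % i, "rank": i} for i in range(1, n + 1)]


def main(argv):
    """Print the requested number of posts to STDOUT as a JSON list."""
    args = parse_args(argv)
    json.dump(fetch_posts(args.posts), sys.stdout, indent=4)
    sys.stdout.write("\n")
    return 0


# Equivalent to running: ./hackernews --posts 2
main(["--posts", "2"])
```

Because the output goes to STDOUT, it can be redirected to a file or piped into another tool.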
To make the project available to every user, open the previously downloaded directory and run:
./setup.sh
Root privileges are required.
Then you will be able to run:
hackernews --posts n
Where n is the number of latest posts to print to STDOUT in JSON format.
To install into a container, Docker and root privileges are required.
Go into the build directory:
cd dockerbuild
Build the image:
docker build -t hackernews .
Run a container from the image:
docker run --name hackernews -ti hackernews
Now, inside the container, you can run:
hackernews --posts n
Where n is the number of latest posts to print to STDOUT in JSON format.
To test the project before running it, run the following command inside the project folder:
python -m unittest discover -s hackernews_tests/ -p "*test.py" -v
Root privileges are required.
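With -p "*test.py", unittest discovery only loads files whose names end in test.py. A test module that the command above would pick up could look roughly like this sketch. The helper clamp_post_count, its limit of 100 and the test class are hypothetical and exist only for illustration; they are not taken from the project's test suite.

```python
import unittest


def clamp_post_count(n, maximum=100):
    """Hypothetical helper: keep the requested post count within 1..maximum."""
    return max(1, min(int(n), maximum))


class ClampPostCountTest(unittest.TestCase):
    def test_within_range(self):
        self.assertEqual(clamp_post_count(5), 5)

    def test_too_large(self):
        self.assertEqual(clamp_post_count(500), 100)

    def test_too_small(self):
        self.assertEqual(clamp_post_count(0), 1)


# Run the test case directly; the discover command does the same for every matching file.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampPostCountTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Saved under a name such as hackernews_tests/example_test.py (a hypothetical file name), the class would be found by the discover command because the file name matches the pattern.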
To uninstall the program from your machine, run:
./purge.sh
If you installed the program inside a container instead, run the following with root privileges:
docker rm hackernews ; docker rmi hackernews; docker rmi alpine
In this project I used the PyQuery library to access the DOM easily, in jQuery fashion.
I also used rfc3986 to make sure that each URL conforms to the standard required by the assignment.
I designed and tested the system on Lubuntu 16.04 LTS, and the OS chosen for the container is Alpine Linux because it is lightweight.
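The kind of URL check mentioned above can be illustrated with a short sketch. Note that this is a stand-in: it uses only urllib.parse from the Python 3 standard library, not the rfc3986 package, and it is much looser than a full RFC 3986 validation. The function name and the restriction to http and https are assumptions made for the example.

```python
from urllib.parse import urlsplit


def looks_like_valid_url(url):
    """Return True if url is a string with an http(s) scheme and a host.

    A loose stand-in check, not a full RFC 3986 validation.
    """
    if not isinstance(url, str) or not url:
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. unbalanced IPv6 brackets.
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


for candidate in ("https://news.ycombinator.com/item?id=1", "not a url", "ftp://example.com/file"):
    print(candidate, looks_like_valid_url(candidate))
```

A scraper could use a check like this to skip or blank out links that are not well-formed absolute URLs before writing them to the JSON output.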