This project has been moved to https://github.com/webg-dev/WebG.backend
This repo houses the WebG backend webservice. Its primary purpose is to scrape and preprocess both the HTML DOM and a screenshot of a requested web page, and to return this data in a format suitable for the frontend application.
- Requires Python 3.6+
- Create a virtual environment, e.g.:
python -m venv venv
- Activate the virtual environment (on Linux):
source venv/bin/activate
- Install the requirements:
pip install -r requirements.txt
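After activating the environment, you can run a quick sanity check. This is a minimal sketch: the helper names `python_is_supported` and `in_virtualenv` are hypothetical and not part of this repo. The venv check assumes the standard-library `venv` module used in the steps above, which makes `sys.prefix` differ from `sys.base_prefix`.

```python
import sys

MIN_PYTHON = (3, 6)  # from the "Requires Python 3.6+" note above


def python_is_supported(minimum=MIN_PYTHON) -> bool:
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= tuple(minimum)


def in_virtualenv() -> bool:
    """Return True when running inside a `python -m venv` environment."""
    # venv sets sys.prefix to the environment directory, while
    # sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("Virtual environment active:", in_virtualenv())
```

Save it under any name (for example a hypothetical `check_env.py`) and run it with the activated interpreter. Both lines should print `True` before you install the requirements.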
If you only want to run and use the webservice, the simplest option is to build and run the Docker image:
- Ensure you have Docker installed:
docker --version
- From the root of the repository, build the image:
docker build -t webg-backend .
- Start the container:
docker run -d --name mycontainer -p 80:80 webg-backend
- Visit the following URL in the browser to view the interactive docs:
http://127.0.0.1/docs
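Because `docker run -d` starts the container in the background, the docs page may take a few seconds to respond. Below is a minimal polling sketch using only the standard library. `wait_for_service` is a hypothetical helper, not part of this repo. It treats an HTTP 200 response as "ready".

```python
import time
import urllib.request


def wait_for_service(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError and connection refused/reset errors are all
            # OSError subclasses; they are expected while the container starts.
            pass
        time.sleep(interval)
    return False
```

For example, `wait_for_service("http://127.0.0.1/docs")` for the Docker setup, or `wait_for_service("http://localhost:8000/docs")` when running locally with uvicorn.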
If you're going to make changes to the code, you'll probably want to run the webservice locally:
- Ensure you have Firefox installed (on Linux):
sudo apt install firefox
- If you want to run the webdriver in a virtual frame buffer, ensure you have Xvfb installed:
sudo apt-get install xvfb
- From the root of the repository, run:
uvicorn webservice.main:app --reload
- Visit the following URL in the browser to view the interactive docs:
http://localhost:8000/docs
The webPage resource represents a single scraped web page.
Endpoint: GET /webPage?url=[URL TO SCRAPE]
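The page to scrape is passed as a query parameter. It can contain its own `?`, `&` and `=` characters, so it should be percent-encoded. The following is a minimal standard-library client sketch, assuming the service runs at `http://localhost:8000` (local) or `http://127.0.0.1` (Docker). All function names are hypothetical, not part of this repo. `decode_screenshot` assumes the `"image/png;base64,<data>"` format shown in the example response. `build_adjacency` assumes each edge points from a parent node (`from`) to a child node (`to`), which is inferred from the example rather than documented.

```python
import base64
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen


def build_webpage_url(base_url: str, target_url: str) -> str:
    """Build the GET /webPage request URL, percent-encoding the page to scrape."""
    # urlencode escapes ':', '/', '?', '&' and '=' so the target URL's own
    # query string is not mistaken for parameters of the webservice.
    return base_url.rstrip("/") + "/webPage?" + urlencode({"url": target_url})


def decode_screenshot(screenshot: str) -> bytes:
    """Return the raw PNG bytes from an 'image/png;base64,<data>' string."""
    _, sep, payload = screenshot.partition(",")
    if not sep:
        raise ValueError("screenshot has no ',' separating header and data")
    return base64.b64decode(payload)


def build_adjacency(graph: dict) -> Dict[int, List[int]]:
    """Map each node id to the ids it points to, using the 'edges' list."""
    children = {node["id"]: [] for node in graph["nodes"]}
    for edge in graph["edges"]:
        children.setdefault(edge["from"], []).append(edge["to"])
    return children


def fetch_web_page(base_url: str, target_url: str, timeout: float = 60.0) -> dict:
    """Call the running webservice and return the parsed JSON response."""
    # Scraping renders the page in a browser, so allow a generous timeout.
    with urlopen(build_webpage_url(base_url, target_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_web_page("http://localhost:8000", "https://nike.com/shoes/1")` returns the response as a dict. Its `"graph"` value can be passed to `build_adjacency`, and its `"screenshot"` value can be passed to `decode_screenshot`.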
Example Response:
{
"url":"https://nike.com/shoes/1",
"width":1500,
"height":3000,
"html":"<html>...</html>",
"screenshot":"image/png;base64,wOebnINCNLz3elG1I3g==",
"graph":{
"nodes":[
{
"id":0,
"label":"html",
"attributes":{
"class":"..."
},
"coordinates":{
"left":0,
"right":1500,
"top":0,
"bottom":3000,
"width":1500,
"height":3000
},
"isVisible":true
}
],
"edges":[
{
"from":0,
"to":1
}