Deuxieme remise pour le projet de INF8007.
Ce script va rechercher et verifier le status de tous les liens presents sur la page du lien fournis. Ensuite, ce dernier parcourera tous les lien valides present sur le meme domaine afin d'extraire de nouveaux liens et verifier leurs status.
Afin de rouler les scripts, référez-vous a la section "Usage". Le script bash clone le repo git spécifier, lance le serveur au port local spécifié et roule main.py sur ce port local. Une fois roulé, le rapport final (en json) se trouvera dans le fichier 'link_status_report.json' a la racine du dossier d'execution.
Le format du rapport est comme suit:
{
"https://www.baseurl.com": [["https://www.baseurl.com/somepage.html", 200],["https://www.external.com/somepage.html", 404]],
"https://www.baseurl.com/somepage.html": [["https://www.baseurl.com", 200],["https://www.external2.com", 404]]
}
Vous devez utiliser python 3 et vous devez installer la librairie requests
pip install requests
Usage examples:
python3 main.py url https://spacejam.com
python3 main.py url https://spacejam.com crawling false
python3 main.py file /path/to/file.html
python3 main.py stdin html|filelist|urllist:
echo [\"https://spacejam.com\"] | python main.py stdin urllist
echo [\"./tests/spacejam.html\"] | python main.py stdin filelist
python main.py stdin html < ./tests/spacejam.html
----------------------------------------------
Parameter details:
help -> shows this message
url -> string url
file -> string path to a file
stdin -> accepted values are:
html -> will tell the program to expect html in the stdin
filelist -> will tell the program expect a JSON array of file paths in the stdin
urllist -> will tell the program expect a JSON array of urls in the stdin
!!!! NOTE: url, file, and stdin are mutually exclusive. You must only use one of them.
crawling -> true (default) or false. (can be capitalized) To be used whenever the program reads URLs
For mac and linux:
bash.sh [-u] git_url [-p] port. You should execute this script within the project directory.
For windows (car windows utilise different "line-ending"):
bash-win.sh [-u] git_url [-p] port. You should execute this script within the project directory.
Warning: bash script will call python using the 'python3' command. If your system uses an other command to call python 3 you need to change it in the script.
If you still have issues with your line endings (you get this error:)
Usage: bash-win.sh [-u] git_url [-p] port. You should execute this script within the project directory.
bash-win.sh: line 2: $'\r': command not found
bash-win.sh: line 4: syntax error near unexpected token `$'in\r''
bash-win.sh: line 4: ` case ${opt} in
Then run:
sed -i 's/\r//' bash-win.sh
Nous utilisons le linter qui vient avec pycharm
Olivier Brochu Dufour Oceane Destras