PERCEPTION

This tool combines several open source tools to give insight into accessibility and performance metrics for a list of URLs. The main parts are:

  • This application requires a CSV with a column header labeled "Address" and one URL per line (other comma-delimited data is ignored); a minimal loader sketch follows below.
  • A crawl can also be executed (currently using a licenced version of the ScreamingFrogSEO CLI tools: https://www.screamingfrog.co.uk/seo-spider/)
  • Runs Deque AXE for all URLs and produces both a detailed and a summary report (including updating the associated Google Sheet). See: https://pypi.org/project/axe-selenium-python/
  • Runs the Lighthouse CLI for all URLs and produces both a detailed and a summary report (including updating the associated Google Sheet). See: https://github.com/GoogleChrome/lighthouse
  • Runs a PDF audit for all PDF URLs and produces both a detailed and a summary report (including updating the associated Google Sheet) - more on this later...
NOTE: At the moment, no database is used because the initial requirement was CSV data only. At this point, a database, together with an "Export to CSV" function, would make more sense.
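As a rough sketch of the expected input, this is how the "Address" column might be read (the function name and file name are illustrative, not the application's actual code):

    import csv

    def load_addresses(csv_path):
        """Return URLs from the "Address" column; all other columns are ignored."""
        with open(csv_path, newline="") as fh:
            reader = csv.DictReader(fh)
            return [row["Address"].strip()
                    for row in reader
                    if row.get("Address", "").strip()]

    urls = load_addresses("crawl.csv")  # placeholder file name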

Workflow

As mentioned, simply provide a CSV with a list of URLs (column header = "Address") and select the tests to run through the web form.

Once installed, run python app.py from the activated virtual environment.

Installation

To get all tests running, the following is required:

Clone and install

sudo apt-get update

sudo apt-get install git python3-pip python3-venv software-properties-common

sudo add-apt-repository ppa:deadsnakes/ppa

sudo apt-get install python3.6

git clone https://github.com/soliagha-oc/perception.git

cd perception

python3 -m venv venv

source venv/bin/activate

pip install -r requirements.txt

python3 app.py

CLI-TOOLS

Install the following CLI tools for your operating system:

chromedriver

Download and install the matching/required chromedriver

https://chromedriver.chromium.org/downloads

Download the required version from the official website and unzip it (here, for instance, version 2.29 to ~/Downloads)

wget -P ~/Downloads https://chromedriver.storage.googleapis.com/2.29/chromedriver_linux64.zip

unzip ~/Downloads/chromedriver_linux64.zip -d ~/Downloads

Move the chromedriver binary to /usr/local/share (or any folder) and make it executable

sudo mv -f ~/Downloads/chromedriver /usr/local/share/

sudo chmod +x /usr/local/share/chromedriver

Create symbolic links

sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver

sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver

OR add the folder containing chromedriver to your PATH for the current session:

export PATH=$PATH:/path-to-extracted-file/

OR add that export line to your .bashrc to make it permanent.
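To confirm the driver is installed correctly, a minimal Selenium smoke test (assuming a recent Selenium; the URL is a placeholder):

    from selenium import webdriver

    # If chromedriver is on PATH, this starts a headless Chrome session.
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    driver.get("https://example.com")  # placeholder URL
    print(driver.title)
    driver.quit()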

geckodriver

  1. Go to the geckodriver releases page: https://github.com/mozilla/geckodriver/releases. Find the latest version of the driver for your platform and download it. For example:

    wget https://github.com/mozilla/geckodriver/releases/download/v0.24.0/geckodriver-v0.24.0-linux64.tar.gz

  2. Extract the file with:

    tar -xvzf geckodriver*

  3. Make it executable:

    chmod +x geckodriver

  4. Add the driver to your PATH so other tools can find it:

    export PATH=$PATH:/path-to-extracted-file/

    OR add that export line to your .bashrc to make it permanent.
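As with chromedriver, a minimal Selenium smoke test can confirm the install (assuming a recent Selenium; the URL is a placeholder):

    from selenium import webdriver

    # If geckodriver is on PATH, this starts a headless Firefox session.
    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    driver.get("https://example.com")  # placeholder URL
    print(driver.title)
    driver.quit()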

lighthouse

Install node

curl -sL https://deb.nodesource.com/setup_14.x | sudo -E bash -

sudo apt-get install -y nodejs

Update npm (prefix with sudo if your Node installation requires it)

npm install -g npm@latest

Install lighthouse (again with sudo if required)

npm install -g lighthouse
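A hedged sketch of how a per-URL Lighthouse report could be produced from Python (the URL and output path are placeholders; --output, --output-path, --chrome-flags, and --quiet are documented Lighthouse CLI flags):

    import subprocess

    def run_lighthouse(url, out_path):
        """Run the Lighthouse CLI against one URL and save a JSON report."""
        subprocess.run([
            "lighthouse", url,
            "--output=json",
            "--output-path", out_path,
            "--chrome-flags=--headless",
            "--quiet",
        ], check=True)

    run_lighthouse("https://example.com", "lighthouse_report.json")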

pdfimages

https://www.xpdfreader.com/download.html

To install this binary package:

  1. Copy the executables (pdfimages, xpdf, pdftotext, etc.) to /usr/local/bin.

  2. Copy the man pages (*.1 and *.5) to /usr/local/man/man1 and /usr/local/man/man5.

  3. Copy the sample-xpdfrc file to /usr/local/etc/xpdfrc. You'll probably want to edit its contents (as distributed, everything is commented out) -- see xpdfrc(5) for details.
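For illustration, a rough sketch of how the Xpdf tools might feed the PDF audit (the function and checks are illustrative, not the repository's actual audit logic):

    import pathlib
    import subprocess
    import tempfile

    def probe_pdf(pdf_path):
        """Extract images and text with the Xpdf tools for a rough accessibility probe."""
        out_dir = pathlib.Path(tempfile.mkdtemp())
        # pdfimages writes each embedded image to files prefixed with "img".
        subprocess.run(["pdfimages", pdf_path, str(out_dir / "img")], check=True)
        # pdftotext with "-" prints the extracted text to stdout.
        text = subprocess.run(["pdftotext", pdf_path, "-"],
                              capture_output=True, text=True, check=True).stdout
        return {
            "image_count": len(list(out_dir.glob("img*"))),
            "has_text_layer": bool(text.strip()),  # no text suggests a scanned PDF
        }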

nginx (optional)

See: https://www.nginx.com/

ScreamingFrog SEO

See: https://www.screamingfrog.co.uk/seo-spider/user-guide/general/#commandlineoptions

ScreamingFrog SEO CLI tools provide the following data sets:

  • crawl_overview.csv (used to create report DASHBOARD)
  • external_all.csv
  • external_html.csv (used to audit external URLs)
  • external_pdf.csv (used to audit external PDFs)
  • h1_all.csv
  • images_missing_alt_text.csv
  • internal_all.csv
  • internal_flash.csv
  • internal_html.csv (used to audit internal URLs)
  • internal_other.csv
  • internal_pdf.csv (used to audit internal PDFs)
  • internal_unknown.csv
  • page_titles_all.csv
  • page_titles_duplicate.csv
  • page_titles_missing.csv

Note: There are spider config files located in the /conf folder. You will require a licence to alter the configurations.

Note: If a licence is not available, simply provide a CSV where at least one column has the header "Address". See RCMP example.
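Where a licence is available, a hedged sketch of driving a headless crawl from Python (flag names follow the command line options in the user guide linked above; the URL and output folder are placeholders):

    import subprocess

    # Headless crawl that exports tab CSVs like internal_all.csv into ./crawl.
    subprocess.run([
        "screamingfrogseospider",
        "--crawl", "https://example.com",
        "--headless",
        "--output-folder", "./crawl",
        "--export-tabs", "Internal:All,External:All",
    ], check=True)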

Deque AXE

Installed via pip install -r requirements.txt (see Installation above).

See: https://pypi.org/project/axe-selenium-python/ and https://github.com/dequelabs/axe-core
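Per the axe-selenium-python documentation, usage looks roughly like this (the URL and output file name are placeholders):

    from selenium import webdriver
    from axe_selenium_python import Axe

    driver = webdriver.Firefox()
    driver.get("https://example.com")  # placeholder URL
    axe = Axe(driver)
    axe.inject()         # inject the axe-core script into the page
    results = axe.run()  # run the accessibility audit
    axe.write_results(results, "axe_report.json")
    print(len(results["violations"]), "violations found")
    driver.close()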

Google Lighthouse

See: https://github.com/GoogleChrome/lighthouse

Google APIs

Authentication

While there is a /reports/ dashboard, the system can also write to a Google Sheet. To do this, set up credentials for Google API authentication at https://console.developers.google.com/apis/credentials to obtain a valid "credentials.json" file.
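The README does not name the Sheets client library, so as an assumption only, here is a minimal gspread sketch using a service-account credentials.json (the spreadsheet name and row values are hypothetical):

    import gspread

    # Assumption: credentials.json is a service-account key with Sheets access,
    # and the target spreadsheet has been shared with that service account.
    gc = gspread.service_account(filename="credentials.json")
    worksheet = gc.open("Perception Report").sheet1  # hypothetical sheet name
    worksheet.append_row(["Address", "AXE violations", "Lighthouse score"])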

Template

To facilitate branding and other report metrics, a "non-coder/sheet formula template" is used. Here is a sample template:

Cautions

Spider, scanning, and viruses

It is possible to encounter various security risks when crawling and scanning sites. Please be sure to have a virus scanner enabled to protect against JavaScript and other attacks, or disable JavaScript in the configuration.
