PERCEPTION

This tool combines several open source tools to give insight into accessibility and performance metrics for a list of URLs. The main parts are:

  • This application requires a CSV with a column header labeled "Address" and one URL per line (other comma-delimited data is ignored); a minimal loader sketch follows below.
  • A crawl can also be executed (currently using a licenced version of the ScreamingFrogSEO CLI tools: https://www.screamingfrog.co.uk/seo-spider/)
  • Runs Deque AXE for all URLs and produces both a detailed and a summary report (including updating the associated Google Sheet). See: https://pypi.org/project/axe-selenium-python/
  • Runs the Lighthouse CLI for all URLs and produces both a detailed and a summary report (including updating the associated Google Sheet). See: https://github.com/GoogleChrome/lighthouse
  • Runs a PDF audit for all PDF URLs and produces both a detailed and a summary report (including updating the associated Google Sheet) - more on this later...
NOTE: At the moment, no database is used because the initial requirement was CSV data only. At this point, a database, together with an "Export to CSV" function, would make more sense.
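As a rough sketch of the expected input, this is how the "Address" column might be read (the function name and file name are illustrative, not the application's actual code):

    import csv

    def load_addresses(csv_path):
        """Return URLs from the "Address" column; all other columns are ignored."""
        with open(csv_path, newline="") as fh:
            reader = csv.DictReader(fh)
            return [row["Address"].strip()
                    for row in reader
                    if row.get("Address", "").strip()]

    urls = load_addresses("crawl.csv")  # placeholder file name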

Workflow

As mentioned, simply provide a CSV with a list of URLs (column header = "Address") and select the tests to run through the web form.

Once installed, run python app.py from the activated virtual environment.

Installation

To get all tests running, the following is required:

Clone and install

sudo apt-get update

sudo apt-get install git python3-pip python3-venv software-properties-common

sudo add-apt-repository ppa:deadsnakes/ppa

sudo apt-get install python3.6

git clone https://github.com/soliagha-oc/perception.git

cd perception

python3 -m venv venv

source venv/bin/activate

pip install -r requirements.txt

python3 app.py

CLI-TOOLS

Install the following CLI tools for your operating system:

chromedriver

Download and install the matching/required chromedriver

https://chromedriver.chromium.org/downloads

Download the required version from the official website and unzip it (here, for instance, version 2.29 to ~/Downloads)

wget -P ~/Downloads https://chromedriver.storage.googleapis.com/2.29/chromedriver_linux64.zip

unzip ~/Downloads/chromedriver_linux64.zip -d ~/Downloads

Move the chromedriver binary to /usr/local/share (or any folder) and make it executable

sudo mv -f ~/Downloads/chromedriver /usr/local/share/

sudo chmod +x /usr/local/share/chromedriver

Create symbolic links

sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver

sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver

OR add the folder containing chromedriver to your PATH for the current session:

export PATH=$PATH:/path-to-extracted-file/

OR add that export line to your .bashrc to make it permanent.
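To confirm the driver is installed correctly, a minimal Selenium smoke test (assuming a recent Selenium; the URL is a placeholder):

    from selenium import webdriver

    # If chromedriver is on PATH, this starts a headless Chrome session.
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    driver.get("https://example.com")  # placeholder URL
    print(driver.title)
    driver.quit()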

geckodriver

  1. Go to the geckodriver releases page: https://github.com/mozilla/geckodriver/releases. Find the latest version of the driver for your platform and download it. For example:

    wget https://github.com/mozilla/geckodriver/releases/download/v0.24.0/geckodriver-v0.24.0-linux64.tar.gz

  2. Extract the file with:

    tar -xvzf geckodriver*

  3. Make it executable:

    chmod +x geckodriver

  4. Add the driver to your PATH so other tools can find it:

    export PATH=$PATH:/path-to-extracted-file/

    OR add that export line to your .bashrc to make it permanent.
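As with chromedriver, a minimal Selenium smoke test can confirm the install (assuming a recent Selenium; the URL is a placeholder):

    from selenium import webdriver

    # If geckodriver is on PATH, this starts a headless Firefox session.
    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    driver.get("https://example.com")  # placeholder URL
    print(driver.title)
    driver.quit()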

lighthouse

Install node

curl -sL https://deb.nodesource.com/setup_14.x | sudo -E bash -

sudo apt-get install -y nodejs

Update npm (prefix with sudo if your Node installation requires it)

npm install -g npm@latest

Install lighthouse (again with sudo if required)

npm install -g lighthouse
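A hedged sketch of how a per-URL Lighthouse report could be produced from Python (the URL and output path are placeholders; --output, --output-path, --chrome-flags, and --quiet are documented Lighthouse CLI flags):

    import subprocess

    def run_lighthouse(url, out_path):
        """Run the Lighthouse CLI against one URL and save a JSON report."""
        subprocess.run([
            "lighthouse", url,
            "--output=json",
            "--output-path", out_path,
            "--chrome-flags=--headless",
            "--quiet",
        ], check=True)

    run_lighthouse("https://example.com", "lighthouse_report.json")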

pdfimages

https://www.xpdfreader.com/download.html

To install this binary package:

  1. Copy the executables (pdfimages, xpdf, pdftotext, etc.) to /usr/local/bin.

  2. Copy the man pages (*.1 and *.5) to /usr/local/man/man1 and /usr/local/man/man5.

  3. Copy the sample-xpdfrc file to /usr/local/etc/xpdfrc. You'll probably want to edit its contents (as distributed, everything is commented out) -- see xpdfrc(5) for details.
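For illustration, a rough sketch of how the Xpdf tools might feed the PDF audit (the function and checks are illustrative, not the repository's actual audit logic):

    import pathlib
    import subprocess
    import tempfile

    def probe_pdf(pdf_path):
        """Extract images and text with the Xpdf tools for a rough accessibility probe."""
        out_dir = pathlib.Path(tempfile.mkdtemp())
        # pdfimages writes each embedded image to files prefixed with "img".
        subprocess.run(["pdfimages", pdf_path, str(out_dir / "img")], check=True)
        # pdftotext with "-" prints the extracted text to stdout.
        text = subprocess.run(["pdftotext", pdf_path, "-"],
                              capture_output=True, text=True, check=True).stdout
        return {
            "image_count": len(list(out_dir.glob("img*"))),
            "has_text_layer": bool(text.strip()),  # no text suggests a scanned PDF
        }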

nginx (optional)

See: https://www.nginx.com/

ScreamingFrog SEO

See: https://www.screamingfrog.co.uk/seo-spider/user-guide/general/#commandlineoptions

ScreamingFrog SEO CLI tools provide the following data sets:

  • crawl_overview.csv (used to create report DASHBOARD)
  • external_all.csv
  • external_html.csv (used to audit external URLs)
  • external_pdf.csv (used to audit external PDFs)
  • h1_all.csv
  • images_missing_alt_text.csv
  • internal_all.csv
  • internal_flash.csv
  • internal_html.csv (used to audit internal URLs)
  • internal_other.csv
  • internal_pdf.csv (used to audit internal PDFs)
  • internal_unknown.csv
  • page_titles_all.csv
  • page_titles_duplicate.csv
  • page_titles_missing.csv

Note: There are spider config files located in the /conf folder. You will require a licence to alter the configurations.

Note: If a licence is not available, simply provide a CSV where at least one column has the header "Address". See RCMP example.
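Where a licence is available, a hedged sketch of driving a headless crawl from Python (flag names follow the command line options in the user guide linked above; the URL and output folder are placeholders):

    import subprocess

    # Headless crawl that exports tab CSVs like internal_all.csv into ./crawl.
    subprocess.run([
        "screamingfrogseospider",
        "--crawl", "https://example.com",
        "--headless",
        "--output-folder", "./crawl",
        "--export-tabs", "Internal:All,External:All",
    ], check=True)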

Deque AXE

Installed via pip install -r requirements.txt (see Installation above).

See: https://pypi.org/project/axe-selenium-python/ and https://github.com/dequelabs/axe-core
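Per the axe-selenium-python documentation, usage looks roughly like this (the URL and output file name are placeholders):

    from selenium import webdriver
    from axe_selenium_python import Axe

    driver = webdriver.Firefox()
    driver.get("https://example.com")  # placeholder URL
    axe = Axe(driver)
    axe.inject()         # inject the axe-core script into the page
    results = axe.run()  # run the accessibility audit
    axe.write_results(results, "axe_report.json")
    print(len(results["violations"]), "violations found")
    driver.close()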

Google Lighthouse

See: https://github.com/GoogleChrome/lighthouse

Google APIs

Authentication

While there is a /reports/ dashboard, the system can also write to a Google Sheet. To do this, set up credentials for Google API authentication at https://console.developers.google.com/apis/credentials to obtain a valid "credentials.json" file.
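The README does not name the Sheets client library, so as an assumption only, here is a minimal gspread sketch using a service-account credentials.json (the spreadsheet name and row values are hypothetical):

    import gspread

    # Assumption: credentials.json is a service-account key with Sheets access,
    # and the target spreadsheet has been shared with that service account.
    gc = gspread.service_account(filename="credentials.json")
    worksheet = gc.open("Perception Report").sheet1  # hypothetical sheet name
    worksheet.append_row(["Address", "AXE violations", "Lighthouse score"])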

Template

To facilitate branding and other report metrics, a "non-coder/sheet formula template" is used. Here is a sample template:

Cautions

Spider, scanning, and viruses

It is possible to encounter various security risks when crawling and scanning sites. Please be sure to have a virus scanner enabled to protect against JavaScript and other attacks, or disable JavaScript in the configuration.
