
# Arachnid

A new type of web crawler

### Problem

There are few good web crawlers out there. Those that do exist cost a lot of money, and we think we can do better. There should be a service where people can crawl a website with an unlimited number of pages and get results in a network-style format, letting them analyze the graph and draw conclusions from the data to make their site more effective or get status information, including page errors, broken links, stale redirects, and much more.

### Requirements

  • Able to scan a domain and compile a networkx graph of it
  • Able to somewhat intelligently repair links within the domain to get a more complete picture (i.e., handle errors gracefully)
  • Able to output an image of all pages on the website with relevant data
      • Colors portray relevant data such as errors, redirects, links to external resources, etc.
      • Maybe one day in a scannable Google Maps-style interface?
  • Data can be stored in a simple database
  • Final data can be viewed in a simple web interface
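As a rough sketch of the first requirement, a breadth-first crawl of one domain can be built with the standard library alone. The `fetch` callback, the adjacency-dict graph, and all names here are illustrative assumptions, not the project's actual API; in the real project the dict would be swapped for a `networkx.DiGraph` so edges can carry status data.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl restricted to start_url's domain.

    Returns an adjacency dict mapping each page to the set of
    same-domain pages it links to. `fetch(url) -> html` is injected
    so the sketch can be tested without network access; real code
    would use urllib.request or requests here.
    """
    domain = urlparse(start_url).netloc
    graph = {}
    queue = [start_url]
    while queue and len(graph) < max_pages:
        url = queue.pop(0)
        if url in graph:          # already visited
            continue
        graph[url] = set()
        parser = LinkExtractor()
        parser.feed(fetch(url))
        for href in parser.links:
            target = urljoin(url, href)   # resolve relative links
            if urlparse(target).netloc == domain:  # stay in-domain
                graph[url].add(target)
                queue.append(target)
    return graph
```

For example, crawling two stub pages where the home page links to `/a` and to an external site yields a graph containing only the in-domain edge.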

### Specifications

  • Coming soon

Written by Gregg Lamb and Daniel Miller

About

UW PCE Python Project
