Find duplicated content on the web
Phase 1
- apply for a Heroku account and build a new project
- add node_modules: http-server and ngrok
- install heroku-cli (https://devcenter.heroku.com/articles/getting-started-with-nodejs#set-up)
- set the Heroku app config vars
- combine the Redux toolkit with the Heroku app
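Heroku config vars are exposed to the running app as environment variables, so the parser can read them directly. A minimal sketch; the `TARGET_URL` variable name and the localhost fallback are assumptions, not the project's actual config:

```python
import os

# Heroku config vars (set with `heroku config:set NAME=value`) appear as
# environment variables at runtime. TARGET_URL is a hypothetical var name;
# the localhost default is only a local-development fallback.
target_url = os.environ.get("TARGET_URL", "http://localhost:8080")
print(target_url)
```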
Phase 2
- choose a parser tool
- use Node.js to get all the links
- change to Python
- use the scrapy or BeautifulSoup lib
- use BeautifulSoup to find duplicated links
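The "get all links, then find duplicates" step above can be sketched with the standard library alone. The project uses BeautifulSoup; this sketch substitutes the stdlib `html.parser` plus `collections.Counter` so it runs without extra dependencies, and the sample page is made up:

```python
from collections import Counter
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect every href found in <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def find_duplicated_links(html):
    """Return {href: count} for every link that appears more than once."""
    collector = LinkCollector()
    collector.feed(html)
    counts = Counter(collector.links)
    return {link: n for link, n in counts.items() if n > 1}

# Illustrative page: "/a" appears twice, "/b" once.
page = '<a href="/a">1</a><a href="/b">2</a><a href="/a">3</a>'
print(find_duplicated_links(page))  # → {'/a': 2}
```

With BeautifulSoup the collector class collapses to `[a.get("href") for a in soup.find_all("a")]`; the duplicate-counting step is the same.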
Phase 3
- build an auto shell script
- use cron or pm2 to run the parser hourly
- build an auto crawler that visits different websites
- let the user change the URL link
- use Scrapy to scrape second- or third-depth links of the homepage
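The second/third-depth crawl above amounts to a breadth-first walk with a depth cap. A minimal sketch, not the project's actual Scrapy spider: the network layer is injected as a `fetch_links(url)` callable so it can be backed by urllib, Scrapy, or (as here) a fake in-memory site:

```python
from collections import deque
from urllib.parse import urljoin

def crawl(start_url, fetch_links, max_depth=2):
    """Breadth-first crawl from the homepage, following links up to
    max_depth hops away. fetch_links(url) must return the hrefs found
    on that page; it is injected so the fetching backend is swappable."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # stop expanding past the depth cap
        for href in fetch_links(url):
            nxt = urljoin(url, href)  # resolve relative links
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited

# Usage with a hypothetical in-memory site instead of real HTTP:
site = {
    "http://ex.com/": ["/a", "/b"],
    "http://ex.com/a": ["/c"],
    "http://ex.com/b": [],
    "http://ex.com/c": [],
}
print(crawl("http://ex.com/", lambda u: site.get(u, []), max_depth=2))
# → ['http://ex.com/', 'http://ex.com/a', 'http://ex.com/b', 'http://ex.com/c']
```

For the hourly run, a crontab entry pointing at the auto script (e.g. `0 * * * * sh autoPublish.sh`) or `pm2` with a cron restart schedule covers the scheduling bullet.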
// get duplicated links and draw them on the homepage
$ python viki.py
// auto-run viki, then git commit and push the result
$ sh autoPublish.sh
We get a viki homepage with the duplicated links colored. To make duplicated contents easy to spot, the same color and number are applied to each matching component.
The result is automatically produced at /src/resource/viki_20161220_00000.html.
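The same-color-same-number marking described above can be sketched as follows. This is an assumption about how the report might be generated, not the project's actual rendering code; the function names and the color palette are made up:

```python
from itertools import cycle

def colorize_duplicates(duplicated_links, palette=("#e74c3c", "#3498db", "#2ecc71")):
    """Assign each duplicated link a (color, number) pair so matching
    components share the same marker. Palette values are illustrative."""
    colors = cycle(palette)
    styled = {}
    for number, link in enumerate(sorted(duplicated_links), start=1):
        styled[link] = (next(colors), number)
    return styled

def render_report(styled):
    """Render a minimal HTML page with each duplicated link colored
    and numbered, in the spirit of the generated viki homepage."""
    rows = [
        f'<a href="{link}" style="color:{color}">[{num}] {link}</a>'
        for link, (color, num) in styled.items()
    ]
    return "<html><body>" + "<br>".join(rows) + "</body></html>"

styled = colorize_duplicates({"/a", "/b"})
print(render_report(styled))
```

Every occurrence of a duplicated link looks up the same `(color, number)` pair, so repeats are visually grouped on the page.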
https://github.com/YanlongLai/python_web_lint