ziy212/HeadlessBrowser
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Steps to run the service: 1. start mongodb, create db: webcontents 2. prepare.sh #install prerequsites, not a complete list 3. nodemon db_server.js #mongo db service (port 4040) 4. python phantom_manager.py log_dir phanton_worker.js_path # this manager starts a server (port 8082) to receive web contents crawl/execution tasks Steps to train a website: 1. modify spider.py to give a domain name 2. scrapy runspider spider.py > evaluation/tmp/doman.txt 3. processURL.py evaluation/tmp/doman.txt evaluation/urls/ 5. python send_get_req.py evaluation/urls/domain_train.txt python send_get_req.py evaluation/urls/domain_test.txt ============== 6. python processScript.py evaluation/urls/domain_train.txt # get script python processScript.py evaluation/urls/domain_test.txt # get script 8. train: mkdir tmp/domain; python template.py evaluation/urls/domain_train.txt tmp/domain 9. test: python handler.py domain evaluation/urls/domain_test.txt =======CHECK SCRIPT================ 1. modify spider.py to give a domain name 2. scrapy runspider spider.py > evaluation/tmp/`doman`.txt #crawl URL 3. python processURL.py evaluation/tmp/`doman`.txt evaluation/urls/ #process URL list 4. python send_get_req.py evaluation/urls/`domain`_train.txt # render those URLs in headless browser # request will be sent to phantom_manager to execute, wait it to finish # screen -r manager # check if there is still task running "no working process ..." # ctrl + A; D; to leave screen session 5. python processScript.py evaluation/urls/`domain`_train.txt #extract script from pages 6. mkdir ./tmp/`domain`; python template.py evaluation/urls/`domain`_train.txt ./tmp/`domain` 7. read the contents of ./tmp/`domain`/debug An example is at ./tmp/pawprintpets/debug and its analysis is at ./tmp/pawprintpets/analysis format: [number: key --EXAMPLE-- exmpale] Some commands: notes: ps aux | grep "python" | grep -v "grep"| awk '{print $2}' | xargs kill -9 scrapy runspider spider.py db.trees.createIndex({domain:1, key:1},{unique:true})
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published