This repository has been archived by the owner on Jun 11, 2022. It is now read-only.

geekmoss/Selenium-crawl-and-download-demo


Demo

Terminal 1:

python3 -m http.server 8000 --bind 127.0.0.1 -d ./demo_server_files

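This starts the test server, which serves ./demo_server_files at http://127.0.0.1:8000/. The -d option requires Python 3.7 or newer.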
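If starting the server from a script is more convenient (for example in a test), the command above can be approximated in Python. This is a minimal sketch using only the standard library; the helper name `make_server` is an assumption for illustration and is not part of the demo repository.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory, host="127.0.0.1", port=8000):
    """Build an HTTP server that serves files from `directory`.

    Hypothetical helper: roughly mirrors
    `python3 -m http.server 8000 --bind 127.0.0.1 -d <directory>`.
    """
    # SimpleHTTPRequestHandler accepts a `directory` keyword since Python 3.7.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(directory))
    return ThreadingHTTPServer((host, port), handler)
```

Calling `make_server("./demo_server_files").serve_forever()` blocks until interrupted, much like the command in Terminal 1.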

Terminal 2:

python demo.py "http://localhost:8000/" | tee downloaded.list | xargs wget -q -P ./downloaded

# OR

echo "http://localhost:8000/" | python demo.py --urls - | tee downloaded.list | xargs wget -q -P ./downloaded

# OR

cat crawl.list
# Output:
# http://localhost:8000

python demo.py --urls crawl.list | tee downloaded.list | xargs wget -q -P ./downloaded
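The three invocations above differ only in where the start URLs come from: a positional argument, standard input (`--urls -`), or a file (`--urls crawl.list`). The sketch below shows one way such an interface could be parsed with `argparse`. It is an assumption about how the input handling might look, not the actual code of `demo.py`, and the function name `read_start_urls` is hypothetical.

```python
import argparse
import sys
from typing import List, TextIO


def read_start_urls(argv: List[str], stdin: TextIO = sys.stdin) -> List[str]:
    """Collect start URLs from positional arguments and/or --urls.

    Hypothetical sketch; the real demo.py may parse its input differently.
    """
    parser = argparse.ArgumentParser(description="Collect start URLs for a crawl.")
    parser.add_argument("url", nargs="*", help="start URL(s) given directly")
    parser.add_argument("--urls", metavar="FILE",
                        help="file with one URL per line; '-' reads standard input")
    args = parser.parse_args(argv)

    urls = list(args.url)
    if args.urls == "-":
        lines = stdin.read().splitlines()
    elif args.urls:
        with open(args.urls, encoding="utf-8") as handle:
            lines = handle.read().splitlines()
    else:
        lines = []
    # Ignore blank lines and surrounding whitespace.
    urls.extend(line.strip() for line in lines if line.strip())
    return urls
```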

Explanation:

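  • tee saves the discovered URLs to downloaded.list for later use, for example to resume an interrupted run or to skip files that were already downloaded.
  • xargs reads the URLs from the pipe and passes them to wget as arguments. It may batch several URLs into one wget call; add -n 1 to run wget once per URL.
  • wget -q -P ./downloaded downloads each URL. -q suppresses output, and -P saves the files into the ./downloaded directory.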
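The list written by tee can be used to skip URLs on a later run. The following is a minimal sketch of such a filter, assuming downloaded.list holds one URL per line. The function names `load_seen` and `filter_new_urls` are hypothetical and do not come from the repository.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable, Iterator


def load_seen(list_path: Path) -> set[str]:
    """Return the URLs recorded by a previous run (empty set if the file is missing)."""
    if not list_path.exists():
        return set()
    text = list_path.read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def filter_new_urls(candidates: Iterable[str], seen: set[str]) -> Iterator[str]:
    """Yield each non-empty URL that is not in `seen`, once, in input order."""
    known = set(seen)  # copy so the caller's set is left untouched
    for raw in candidates:
        url = raw.strip()
        if url and url not in known:
            known.add(url)
            yield url
```

A small script that feeds `sys.stdin` through `filter_new_urls(sys.stdin, load_seen(Path("downloaded.list")))` and prints the result could sit between demo.py and tee in the pipeline. In that case tee would need the -a option so that downloaded.list is appended to rather than truncated when the pipeline starts.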

About

Selenium demo for crawling pages and downloading content.
