- docker run -p 8050:8050 --memory=4.5G --restart=always scrapinghub/splash --disable-private-mode --max-timeout 3600 --maxrss 4000
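A minimal Python sketch of requesting a JavaScript-rendered page from the Splash container above through Splash's render.html HTTP endpoint. It assumes Splash is reachable at localhost:8050; SPLASH_BASE and render_html_url are hypothetical names. The function only builds the URL. It also checks the per-request timeout against the --max-timeout 3600 from the docker run command, because Splash rejects larger values.

```python
from urllib.parse import urlencode

# Hypothetical host: adjust to the machine running the Splash container.
SPLASH_BASE = "http://localhost:8050"
# Mirrors --max-timeout 3600 from the docker run command.
MAX_TIMEOUT = 3600


def render_html_url(target_url: str, wait: float = 2.0, timeout: float = 90.0) -> str:
    """Build a Splash /render.html URL that returns the page HTML after JavaScript runs."""
    if timeout > MAX_TIMEOUT:
        # Splash refuses a per-request timeout above the server's --max-timeout.
        raise ValueError(f"timeout {timeout} exceeds --max-timeout {MAX_TIMEOUT}")
    params = {"url": target_url, "wait": wait, "timeout": timeout}
    return f"{SPLASH_BASE}/render.html?{urlencode(params)}"


if __name__ == "__main__":
    print(render_html_url("https://baike.baidu.com/", wait=3))
```

For example, urllib.request.urlopen(render_html_url("https://baike.baidu.com/", wait=3)).read() would return the rendered HTML.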
Proxy setup: https://juejin.cn/post/6844904160727400455
https://ip.jiangxianli.com/?page=1
https://www.scrapehero.com/how-to-rotate-proxies-and-ip-addresses-using-python-3/
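The links above cover proxy rotation. Here is a minimal Python sketch of the round-robin idea. It assumes a plain list of proxy URLs, for example copied from the free proxy list. ProxyRotator is a hypothetical helper. It removes a proxy after max_failures failures with no success in between.

```python
from collections import deque
from typing import Dict, Iterable


class ProxyRotator:
    """Hypothetical helper: hand out proxies round-robin and drop ones that keep failing."""

    def __init__(self, proxies: Iterable[str], max_failures: int = 3) -> None:
        # dict.fromkeys removes duplicate proxies while keeping their original order.
        self._pool = deque(dict.fromkeys(proxies))
        self._failures: Dict[str, int] = {p: 0 for p in self._pool}
        self._max_failures = max_failures

    def next_proxy(self) -> str:
        if not self._pool:
            raise RuntimeError("proxy pool is empty; refill it from the proxy list")
        proxy = self._pool[0]
        self._pool.rotate(-1)  # move the proxy just handed out to the back of the queue
        return proxy

    def mark_failed(self, proxy: str) -> None:
        if proxy not in self._failures:
            return  # already removed from the pool
        self._failures[proxy] += 1
        if self._failures[proxy] >= self._max_failures:
            self._pool.remove(proxy)
            del self._failures[proxy]

    def mark_ok(self, proxy: str) -> None:
        # A success resets the failure count, so only repeated failures remove a proxy.
        if proxy in self._failures:
            self._failures[proxy] = 0
```

In a Scrapy downloader middleware, the value from next_proxy() would typically be assigned to request.meta["proxy"], which Scrapy's built-in HttpProxyMiddleware reads. For pages rendered through Splash, the proxy would instead go into Splash's proxy argument.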
- Baidu Baike: the like (赞) counts cannot be scraped
- Splash with dynamic pages: https://stackoverflow.com/questions/51483008/scrapy-splash-not-rendering-dynamic-content-from-a-certain-react-driven-site
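Sometimes Splash returns HTML before client-side JavaScript has filled in the page. One common workaround, not necessarily the fix in the linked answer, is to call the /execute endpoint with a Lua script that polls for an element before returning the HTML. This is a sketch. The CSS selector, the polling limits, and the names WAIT_FOR_SELECTOR_LUA and execute_payload are assumptions.

```python
import json

# Lua script for Splash's /execute endpoint: load the page, poll until the CSS
# selector in args.css matches (or max_polls is reached), then return the HTML.
WAIT_FOR_SELECTOR_LUA = """
function main(splash, args)
  assert(splash:go(args.url))
  for _ = 1, args.max_polls do
    if splash:select(args.css) then
      break
    end
    splash:wait(args.poll_interval)
  end
  return splash:html()
end
"""


def execute_payload(target_url: str, css: str, max_polls: int = 20,
                    poll_interval: float = 0.5, timeout: float = 90) -> str:
    """Build a JSON body for POST http://<splash-host>:8050/execute (hypothetical helper)."""
    if max_polls * poll_interval >= timeout:
        # Keep the worst-case polling time inside Splash's render timeout.
        raise ValueError("polling could outlast the Splash timeout")
    return json.dumps({
        "lua_source": WAIT_FOR_SELECTOR_LUA,
        "url": target_url,
        "css": css,
        "max_polls": max_polls,
        "poll_interval": poll_interval,
        "timeout": timeout,
    })
```

With scrapy-splash, the same script could be sent as SplashRequest(url, self.parse, endpoint="execute", args={"lua_source": WAIT_FOR_SELECTOR_LUA, "css": "...", "max_polls": 20, "poll_interval": 0.5}). scrapy-splash should fill in args.url from the request URL.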
Start a single spider with a search term: sh start_spider.sh sougou_spider 试管婴儿 (试管婴儿 = "IVF")
Zhihu does not need --disable-private-mode; Baidu Baike does.
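Because the flag depends on the target site, a small helper could build the docker run arguments from the command at the top of these notes. This is a sketch. NEEDS_PUBLIC_MODE and splash_docker_cmd are hypothetical names. Only baidu_baike is listed because it is the only site these notes say requires the flag.

```python
from typing import List

# Hypothetical set: sites that need Splash started with --disable-private-mode.
# Zhihu is deliberately absent because it works with private mode on.
NEEDS_PUBLIC_MODE = {"baidu_baike"}


def splash_docker_cmd(site: str) -> List[str]:
    """Return the docker run argv from these notes, adding --disable-private-mode only when needed."""
    cmd = [
        "docker", "run", "-p", "8050:8050", "--memory=4.5G", "--restart=always",
        "scrapinghub/splash",
        "--max-timeout", "3600", "--maxrss", "4000",
    ]
    if site in NEEDS_PUBLIC_MODE:
        # Splash options can come in any order after the image name.
        cmd.append("--disable-private-mode")
    return cmd
```

For example, subprocess.run(splash_docker_cmd("baidu_baike"), check=True) would start the container in the mode Baidu Baike needs.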
Run runner_spider.py: python3 runner_spider.py
[x] sougou_spider [x] lamaquan_spider [x] ask120_spider [x] babytree_spider [x] baidu_baike [x] baidu_zhidao [x] bozhong_spider [x] chaonei [x] fh21 [x] haodaifu [x] icheruby [x] jianshu [x] shiguanzhijia [x] tm51 [x] so39 [x] so99 [x] sougou [x] yunivf [ ] zhihu
With search: ask120 babytree bozhong fh21 haodaifu jianshu shiguanzhijia so39 so99 sougou tm51 yunivf zhihu
baidu_baike baidu_zhidao
Without search: chaonei icheruby lamaquan
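The two groups above could be turned into crawl commands that pass a search keyword only to the spiders that search. This sketch assumes the listed names are the Scrapy spider names. It also assumes the search spiders read a spider argument called keyword, which is a hypothetical name; scrapy crawl -a NAME=VALUE is the standard way to pass one. baidu_baike and baidu_zhidao are left out because the notes list them separately.

```python
from typing import List

# Spider groups copied from the notes; names assumed to match each spider's `name`.
SEARCH_SPIDERS = ["ask120", "babytree", "bozhong", "fh21", "haodaifu", "jianshu",
                  "shiguanzhijia", "so39", "so99", "sougou", "tm51", "yunivf", "zhihu"]
NO_SEARCH_SPIDERS = ["chaonei", "icheruby", "lamaquan"]


def crawl_commands(keyword: str) -> List[List[str]]:
    """Build one `scrapy crawl` argv per spider; only search spiders receive the keyword.

    The spider argument name "keyword" is hypothetical; use whatever the spiders read.
    """
    cmds = []
    for name in SEARCH_SPIDERS:
        cmds.append(["scrapy", "crawl", name, "-a", f"keyword={keyword}"])
    for name in NO_SEARCH_SPIDERS:
        cmds.append(["scrapy", "crawl", name])
    return cmds
```

To run them one after another: for cmd in crawl_commands("试管婴儿"): subprocess.run(cmd, check=True).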
- Run sh run_baidu_task.sh with Splash started in --disable-private-mode
- On the other two machines: sh run_task1.sh and sh run_task2.sh