This project collects news from the Mongolian news website https://gogo.mn.

flandy2010/spider-for-mongolian-news

Project Introduction

This project crawls the daily news updates from gogo.mn, a Mongolian news website.

Dependencies

Arguments

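  • --more_news_times: number of times to click the "More news" button
  • --threading_num: number of threads crawling concurrently
  • -o --output_dir: path to the output directory
  • -i --ip_list_file: path to the file containing usable IP addresses
  • -v --visited_url: path to the file containing URLs that have already been visited
  • -u --unvisited_url: path to the file containing URLs that have not yet been visited
  • -r --root_url: path to the file containing the root URLs from which news URLs are extracted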
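
The options above could be declared with Python's standard `argparse` module roughly as follows. This is only a sketch and not necessarily how `main.py` defines them: the flag names come from the list above, while the `build_parser` function name, the types, the default values, and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the options listed above (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Crawl news from gogo.mn")
    # Integer options; the default values here are illustrative assumptions.
    parser.add_argument("--more_news_times", type=int, default=1,
                        help='number of times to click the "More news" button')
    parser.add_argument("--threading_num", type=int, default=4,
                        help="number of threads crawling concurrently")
    # Path options, each with the short and long form from the list above.
    parser.add_argument("-o", "--output_dir", help="path to the output directory")
    parser.add_argument("-i", "--ip_list_file",
                        help="path to the file containing usable IP addresses")
    parser.add_argument("-v", "--visited_url",
                        help="path to the file of already visited URLs")
    parser.add_argument("-u", "--unvisited_url",
                        help="path to the file of not yet visited URLs")
    parser.add_argument("-r", "--root_url",
                        help="path to the file of root URLs")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(
    ["-o", "output_dir", "-r", "root_url_file", "-v", "visited_url_file"]
)
print(args.output_dir, args.root_url, args.visited_url)
```

Options that are not passed on the command line, such as `-i` and `-u` in this call, are left as `None` in this sketch.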

Usage

python main.py -o output_dir -r root_url_file -v visited_url_file
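
The visited-URL file passed with `-v` suggests that the crawler skips pages it has already fetched. The sketch below shows one way such bookkeeping could work. It assumes the file holds one URL per line, which this README does not document, and the helper names `load_urls`, `filter_unvisited`, and `append_visited` are hypothetical rather than taken from the project.

```python
from pathlib import Path


def load_urls(path):
    """Read a URL file (assumed: one URL per line) into a set."""
    p = Path(path)
    if not p.exists():
        return set()
    lines = p.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def filter_unvisited(candidates, visited):
    """Return candidate URLs not yet visited, in order, without duplicates."""
    seen = set(visited)
    result = []
    for url in candidates:
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


def append_visited(path, url):
    """Record a URL as visited by appending it to the file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(url + "\n")
```

Appending to the file after each page, rather than rewriting it at the end, means an interrupted run can resume without fetching the same pages again.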
