SaulTapia/NewsScraper

NewsScraper

Scraper for Spanish-language news sites.

Running pipeline.py creates a .db file for each website and saves it in the Databases folder.
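
The sketch below illustrates the "one .db file per website" idea. It is not the code in pipeline.py: the function name `save_articles`, the `articles` table, and its columns are hypothetical, and it uses the standard-library `sqlite3` module instead of the pandas/sqlalchemy stack listed below so that it runs without extra dependencies. Hashing the URL to get a stable row id is likewise an assumption about how hashlib might be used.

```python
import hashlib
import sqlite3
from pathlib import Path


def save_articles(site_name, articles, db_dir="Databases"):
    """Write one site's articles to <db_dir>/<site_name>.db and return the path.

    Hypothetical helper: the table layout is an assumption, not the project's schema.
    """
    db_path = Path(db_dir) / f"{site_name}.db"
    db_path.parent.mkdir(parents=True, exist_ok=True)

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS articles ("
                "id TEXT PRIMARY KEY, url TEXT, title TEXT, body TEXT)"
            )
            for article in articles:
                # Assumed use of hashlib: a stable id derived from the URL,
                # so re-running the scraper does not duplicate rows.
                article_id = hashlib.sha256(
                    article["url"].encode("utf-8")
                ).hexdigest()
                conn.execute(
                    "INSERT OR IGNORE INTO articles VALUES (?, ?, ?, ?)",
                    (article_id, article["url"], article["title"], article["body"]),
                )
    finally:
        conn.close()
    return db_path
```

Calling `save_articles("elpais", rows)` with a list of dicts that have `url`, `title`, and `body` keys would create `Databases/elpais.db` under these assumptions.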

Currently supports:

  • El Universal
  • El País
  • CNN en Español
  • Página/12
  • Milenio

Libraries needed:

  • pandas
  • hashlib (part of the Python standard library)
  • nltk
  • sqlalchemy
  • requests
  • lxml
  • pyyaml
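
A quick way to check that these libraries are available before running the pipeline is sketched below. The helper `missing_modules` is hypothetical and not part of the repository. Note that the pyyaml package is imported as `yaml`, which is why that name appears in the list.

```python
from importlib.util import find_spec

# Import names for the libraries listed above (pyyaml is imported as "yaml").
REQUIRED_MODULES = [
    "pandas", "hashlib", "nltk", "sqlalchemy", "requests", "lxml", "yaml",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```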
