rajibmitra/AlmabaseNewsScrapper

 
 

# News Scraper

This is a news scraper built with Scrapy and Django.

To get started, do the following:

  1. Install the dependencies: pip install -r requirements.txt

  2. Open Almabase/Almabase/settings.py and set your MySQL username and password

  3. Run build.sh
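
For step 2, the database block in settings.py would typically look something like the fragment below. This is a sketch based on Django's standard DATABASES setting, not a copy of the project's file: the database name, host, and port are placeholder assumptions, and only the username and password are values the steps above ask you to set.

```python
# Almabase/Almabase/settings.py (fragment, illustrative only)
# Assumption: NAME, HOST and PORT are placeholders; check the real file
# for the values this project expects.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",  # Django's built-in MySQL backend
        "NAME": "almabase_news",               # hypothetical database name
        "USER": "your_mysql_username",         # set this to your MySQL user
        "PASSWORD": "your_mysql_password",     # set this to your MySQL password
        "HOST": "localhost",                   # assumed local MySQL server
        "PORT": "3306",                        # MySQL's default port
    }
}
```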

build.sh runs the article scraping and modifying steps and terminates once they are done. The script can be scheduled as a cron job.

To view the articles, start the Django server from the Almabase directory with python manage.py runserver, then navigate to BASEURL/index/showcollege

## Architecture

FeedParser (scrape RSS) -> Scrapy (scrape web pages) -> Newspaper (parse HTML) -> NaiveBayesClassifier (classify the article)
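
To illustrate the last stage of the pipeline, here is a minimal pure-Python sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is a stand-in written for explanation only: the class name TinyNaiveBayes, the labels "college" and "other", and the training sentences are all made up, and the project itself may use a library classifier with different features.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would do more."""
    return text.lower().split()


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.label_counts = Counter()            # number of documents per label
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.vocabulary = set()

    def train(self, labeled_texts):
        for text, label in labeled_texts:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def classify(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            label_total = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data, for illustration only.
training = [
    ("university announces new scholarship for alumni", "college"),
    ("campus hosts annual alumni reunion", "college"),
    ("stock markets fall as oil prices rise", "other"),
    ("team wins championship after penalty shootout", "other"),
]
classifier = TinyNaiveBayes()
classifier.train(training)
label = classifier.classify("alumni reunion on campus this weekend")
```

In the pipeline above, the text passed to classify would be the article body extracted by the HTML parsing stage rather than a hand-written sentence.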
