NewsBlaster

NewsBlaster is a system that helps users find the news that interests them most. It automatically collects, clusters, categorizes, and summarizes news from several websites (CNN, Reuters, Fox News, etc.) on a daily basis.

This repository is the group workspace for improving and further developing Columbia's NewsBlaster system.


NewsBlaster at Columbia University Project

Installing

  1. Clone the repository

    git clone https://github.com/kedz/newsblaster.git

  2. Go to the NewsBlaster directory

    cd newsblaster/

  3. Execute the install script

    ./install.sh

By default, this installs all dependencies and required files into your home directory. To install elsewhere, set the NB_HOME environment variable first, for example:

    export NB_HOME=/tmp/newsblaster
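If you script the installation, it can help to know in advance where the files will land. The helper below is a minimal sketch, assuming only the behavior described above: use NB_HOME when it is set, otherwise fall back to the home directory. The function name `nb_install_dir` is hypothetical; it is not part of NewsBlaster and is not taken from `install.sh`.

```shell
# Hypothetical helper, not part of NewsBlaster: report the directory the
# installer is expected to use, based on the README's description of NB_HOME.
nb_install_dir() {
  # ${NB_HOME:-$HOME} expands to $HOME when NB_HOME is unset or empty.
  printf '%s\n' "${NB_HOME:-$HOME}"
}
```

For example, after `export NB_HOME=/tmp/newsblaster`, running `nb_install_dir` prints `/tmp/newsblaster`; with NB_HOME unset, it prints the value of `$HOME`.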

Running

  1. Start NewsBlaster

    ./newsblaster.sh start

  2. Check for news articles. See Current Usage for details.

Some spiders are currently configured to crawl on schedules ranging from every 30 minutes to every 3 hours. As a result, no articles will be available until at least 30 minutes after startup. You can change this by editing the Celery schedule.

  3. Stop NewsBlaster

    ./newsblaster.sh stop

Current Usage

This documentation will be updated as we continue to improve and build out NewsBlaster.

You can currently query and retrieve articles based on a variety of metadata. Please see our IPython Notebook.

All articles have the following attributes associated with them.

JSON structure of each article

Papers
