forked from propublica/Capitol-Words
notthatbreezy/Capitol-Words
useful info goes here

Requirements

 * json or simplejson
 * beautifulsoup version 3.0 series (it MUST be the 3.0 series, not 3.1)
   http://www.crummy.com/software/BeautifulSoup/download/3.x/BeautifulSoup-3.0.8.1.tar.gz
 * solr
 * sunlightlabs API key

Setup

 * cp settings.example.py settings.py
 * create symlinks to settings.py from each of solr/, scraper/ and parser/
 * tell solr where to find the schema file. e.g., if running the dev
   environment in apache-solr-1.4.1/example/, it uses the schema.xml in the
   directory /apache-solr-1.4.1/example/solr/conf. the same is true for the
   stopwords file. so set up symlinks to the real things, optionally backing up
   the originals as .example.

     cd apache-solr-1.4.1/example/solr/conf
     mv schema.{,example.}xml
     mv stopwords.{,example.}txt
     ln -s /home/cwod/capitolwords/src/solr/schema.xml schema.xml
     ln -s /home/cwod/capitolwords/src/solr/stopwords.txt stopwords.txt

Startup

 * start up solr. in a dev environment this looks like:

     cd $SOLR_DIR/example
     java -jar start.jar    # uses jetty
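The settings step under Setup (copy settings.example.py, then symlink it into solr/, scraper/ and parser/) could be sketched roughly as follows. The function name link_settings is hypothetical and not part of the repository, and the sketch assumes that settings.example.py sits in the directory that also contains the three subdirectories.

```shell
#!/bin/sh
# Hypothetical helper; link_settings does not exist in the repository.
# Assumes $1 is the directory holding settings.example.py together with
# the solr/, scraper/ and parser/ subdirectories.
link_settings() {
    root=$1
    # Create settings.py from the example only if it does not exist yet,
    # so an already edited settings.py is never overwritten.
    [ -e "$root/settings.py" ] || cp "$root/settings.example.py" "$root/settings.py" || return 1
    for dir in solr scraper parser; do
        # -s makes a symbolic link, -f replaces an existing link of that name.
        # The relative target ../settings.py is resolved from inside each
        # subdirectory, so it points back at $root/settings.py.
        ln -sf ../settings.py "$root/$dir/settings.py" || return 1
    done
}
```

Calling `link_settings .` from that directory should leave one settings.py there plus a symlink to it in each subdirectory. Relative link targets are used so the checkout can be moved without breaking the links.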
About
Scraping, parsing and indexing the daily Congressional Record to support phrase search over time, and by legislator and date