GitHub - netconstructor/Capitol-Words: Scraping, parsing and indexing the daily Congressional Record to support phrase search over time, and by legislator and date

Folders and files (808 commits)

api
cwod_site
grammars
parser
scraper
solr
tests
.gitignore
__init__.py
capitolwords.py
daily_then_weekly_update.sh
daily_update.sh
monthly_update.sh
parse_and_ingest.py
readme
settings.example.py


useful info goes here

Requirements
* json or simplejson
* beautifulsoup version 3.0 series (it MUST be 3.0 series, not 3.1)
  http://www.crummy.com/software/BeautifulSoup/download/3.x/BeautifulSoup-3.0.8.1.tar.gz
* solr
* sunlightlabs API key

Setup:
* cp settings.example.py settings.py
* create symlinks to settings.py from each of solr/, scraper/ and parser/
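
The two steps above can be sketched as a small shell function. This is only an illustration, not part of the repository: the function name link_settings and the use of relative symlinks are assumptions, and it expects to be pointed at the root of a checkout that contains settings.example.py and the solr/, scraper/ and parser/ directories.

```shell
# link_settings ROOT (hypothetical helper, not shipped with the project):
# copy the example settings and symlink the copy into each component directory.
link_settings() {
    root="$1"
    cp "$root/settings.example.py" "$root/settings.py"
    for d in solr scraper parser; do
        # A relative link keeps working if the checkout is moved as a whole.
        ln -sf ../settings.py "$root/$d/settings.py"
    done
}
```

Usage, from the top of the checkout: link_settings .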

* tell solr where to find the schema file. e.g., if running the dev
  environment in apache-solr-1.4.1/example/, it will use schema.xml in the
  directory /apache-solr-1.4.1/example/solr/conf. the same is true for the
  stopwords file, so set up symlinks to the real things, optionally backing up
  the originals as .example:

cd apache-solr-1.4.1/example/solr/conf
mv schema.{,example.}xml
mv stopwords.{,example.}txt
ln -s /home/cwod/capitolwords/src/solr/schema.xml schema.xml
ln -s /home/cwod/capitolwords/src/solr/stopwords.txt stopwords.txt

Startup
* start up solr. in a dev environment this looks like:
  cd $SOLR_DIR/example
  java -jar start.jar   # uses jetty
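
Once solr is running, it can help to confirm that it answers before starting an ingest. The sketch below is an assumption-laden example rather than something the project provides: the function name solr_ping is made up, and the default URL assumes the stock Solr example configuration (Jetty on port 8983 with the admin ping handler enabled). Adjust the URL if your setup differs.

```shell
# solr_ping [URL] (hypothetical helper): succeed only if the URL answers
# with a successful HTTP status within five seconds.
# The default URL is an assumption based on the stock Solr example setup.
solr_ping() {
    url="${1:-http://localhost:8983/solr/admin/ping}"
    curl -sf --max-time 5 "$url" > /dev/null
}
```

Usage: solr_ping && echo "solr is up"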
