- Based on MASASHI Shibata's project
- Web Crawler
- Uses pyltp for Chinese word segmentation
- MongoDB as storage
- Flask as web framework
- Python 2.7
- pip
- Clone repository
$ git clone git@github.com:scorpio147wbh/information-retrieval-experiment.git
- Download the LTP Chinese word segmentation model from here
- Install Python packages
$ cd information-retrieval-experiment
$ pip install -r requirements.txt
- MongoDB settings
Set MONGO_URL in config.py to the connection URL of your MongoDB instance.
- LTP settings
Set CWS_MODEL_PATH in config.py to the path of the downloaded LTP word segmentation model.
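As a rough illustration of the two settings, config.py might look something like the sketch below. Only the names MONGO_URL and CWS_MODEL_PATH come from this README. The values shown are hypothetical placeholders, and the `config_problems` helper is an assumption added here for a quick sanity check, not part of the project.

```python
import os

# Hypothetical example values; replace both with your own.
MONGO_URL = "mongodb://localhost:27017/ir_experiment"
CWS_MODEL_PATH = "/path/to/ltp_data/cws.model"


def config_problems(mongo_url=MONGO_URL, cws_model_path=CWS_MODEL_PATH):
    """Return a list of human-readable problems with the two settings.

    This helper is illustrative only; the project itself may not
    validate its configuration this way.
    """
    problems = []
    # MongoDB connection strings use one of these two URI schemes.
    if not mongo_url.startswith(("mongodb://", "mongodb+srv://")):
        problems.append("MONGO_URL should start with mongodb:// or mongodb+srv://")
    # The segmentation model must exist on disk before it can be loaded.
    if not os.path.isfile(cws_model_path):
        problems.append("CWS_MODEL_PATH does not point to an existing file")
    return problems
```

Calling `config_problems()` before starting the crawler gives an empty list when both settings look plausible.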
- Run
$ python run-crawler.py http://nlp.stanford.edu/courses/NAACL2013/ # build an index
$ python run-webapp.py # then open http://127.0.0.1:5000
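To give a feel for what "building an index" and then searching it involves, here is a minimal in-memory sketch of an inverted index. It is not the project's actual code: whitespace splitting stands in for pyltp segmentation, a plain dict stands in for MongoDB, and the function names and example URLs are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    # Stand-in for pyltp segmentation: lowercase and split on whitespace.
    return text.lower().split()


def build_index(pages):
    """Map each token to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        # Intersect so that only pages matching all tokens remain.
        results &= index.get(token, set())
    return results


# Hypothetical crawled pages: URL -> extracted text.
pages = {
    "http://example.com/a": "deep learning for NLP",
    "http://example.com/b": "information retrieval with deep models",
}
index = build_index(pages)
print(sorted(search(index, "deep learning")))
```

In this sketch the crawler's role corresponds to `build_index` and the web app's role to `search`. A real deployment would persist the index in MongoDB rather than keep it in memory.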