- Indri
- NLTK
- NumPy
- run make in the cpp directory to build Search
- run query.py to get the search results
- assemble a task and run it through nugget_finder.py
- query.py commands:
  - python hg/query.py --search-from-parsed-query index-path search-exe-file query-str passage-length passage-step result-num
  - python hg/query.py --search index-path search-exe-file query-str passage-length passage-step result-num (requires NLTK and a CCLParser server running on localhost, port 8852)
- query.py output:
  - index info
  - query info
  - docno score rank
  - passage content
  - term/phrase character positions
- nugget_finder.py command:
  - python hg/nugget_finder.py query ... (requires NLTK and a CCLParser server running on localhost, port 8852)
- nugget_finder.py output:
  - nugget
  - rank
  - score
  - evidence (one entry per line: the document id, then a space and the score, plus the URL if given)
  - empty line
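The nugget_finder.py output listed above can be read back with a small parser. This is a minimal sketch, not part of the tool, and it rests on assumptions the notes do not confirm: that nugget, rank and score each sit on their own line, that the rank is an integer, that evidence lines are whitespace-separated as "docid score [url]", and that the empty line ends each nugget record. The names parse_nugget_output, Nugget and Evidence are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Evidence:
    docid: str
    score: Optional[float] = None  # assumption: tolerate a missing score
    url: Optional[str] = None      # URL is only present "if given"


@dataclass
class Nugget:
    text: str
    rank: int
    score: float
    evidence: List[Evidence] = field(default_factory=list)


def parse_nugget_output(lines: Iterable[str]) -> List[Nugget]:
    """Hypothetical reader for nugget_finder.py output (layout assumed, see above)."""
    nuggets: List[Nugget] = []
    block: List[str] = []
    # A trailing "" acts as a sentinel so the last record is flushed even
    # when the output does not end with an empty line.
    for raw in list(lines) + [""]:
        line = raw.rstrip("\r\n")
        if line.strip():
            block.append(line)
            continue
        if not block:
            continue  # consecutive empty lines: nothing to flush
        if len(block) < 3:
            raise ValueError(f"incomplete nugget record: {block!r}")
        text, rank, score = block[0], int(block[1]), float(block[2])
        evidence = []
        for ev_line in block[3:]:
            # Split into at most 3 fields: docid, score, and the rest as URL.
            parts = ev_line.split(None, 2)
            ev_score = float(parts[1]) if len(parts) > 1 else None
            url = parts[2] if len(parts) > 2 else None
            evidence.append(Evidence(parts[0], ev_score, url))
        nuggets.append(Nugget(text, rank, score, evidence))
        block = []
    return nuggets
```

Splitting each evidence line into at most three fields keeps a URL intact even if it were to contain spaces.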
HTMLs file format: one HTML file path per line. An optional URL can follow the path, separated by a tab.
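A reader for the HTMLs file format might look like the sketch below. It is illustrative only: parse_htmls_lines and read_htmls_file are hypothetical names, and skipping blank lines and trimming surrounding whitespace are assumptions not stated in the format description.

```python
from typing import Iterable, List, Optional, Tuple


def parse_htmls_lines(lines: Iterable[str]) -> List[Tuple[str, Optional[str]]]:
    """Return (html_path, url_or_None) pairs; the URL is whatever follows the first tab."""
    entries: List[Tuple[str, Optional[str]]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines carry no entry
        html_path, sep, url = line.partition("\t")
        url = url.strip()
        entries.append((html_path.strip(), url if sep and url else None))
    return entries


def read_htmls_file(path: str) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical convenience wrapper that reads the list from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_htmls_lines(f)
```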
INI file format: key-value pairs, one per line, with key and value separated by '='. Surrounding whitespace is trimmed from each pair. Lines starting with '#' are ignored.
Keys:
- tmp_folder: folder for temporary files; required. A full path is advisable.
- index_config_template: see the provided indexing.template; defaults to that filename in the current folder.
- index_command: path to IndriBuildIndex; defaults to IndriBuildIndex (found on the PATH).
- search_command: path to Search; defaults to ./cpp/Search.
- main_search_passage_count: number of passages to fetch for the main search; defaults to 3.
- evidence_search_passage_count: number of passages to fetch for the evidence search; defaults to 10.
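The INI rules and key defaults above can be expressed as a small loader. This is a sketch under assumptions, not the project's own config code: parse_ini_lines, DEFAULTS and REQUIRED are hypothetical names; the '#' check is applied after trimming; blank lines are skipped; a line without '=' raises an error; and the two passage counts are converted to integers. Python's configparser is not used because this format has no [section] headers.

```python
from typing import Dict, Iterable

# Defaults taken from the key list above; tmp_folder has none and is required.
DEFAULTS: Dict[str, str] = {
    "index_config_template": "indexing.template",  # relative, i.e. current folder
    "index_command": "IndriBuildIndex",            # resolved via the PATH
    "search_command": "./cpp/Search",
    "main_search_passage_count": "3",
    "evidence_search_passage_count": "10",
}
REQUIRED = ("tmp_folder",)
INT_KEYS = ("main_search_passage_count", "evidence_search_passage_count")


def parse_ini_lines(lines: Iterable[str]) -> Dict[str, object]:
    """Hypothetical loader for the key=value config format described above."""
    config: Dict[str, object] = dict(DEFAULTS)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines (assumed) and comment lines
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"malformed line (no '='): {raw!r}")
        config[key.strip()] = value.strip()
    missing = [k for k in REQUIRED if k not in config]
    if missing:
        raise ValueError(f"missing required key(s): {', '.join(missing)}")
    for key in INT_KEYS:
        config[key] = int(config[key])  # assumption: counts are integers
    return config
```

Splitting on the first '=' only means a value may itself contain '=' characters.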