Dependencies

Indri
NLTK
NumPy

Usage

1. Build Search in the cpp directory with make.
2. Run query.py to get the search results.
3. Assemble a task and run it through nugget_finder.py.

query.py

command
- python hg/query.py --search-from-parsed-query index-path search-exe-file query-str passage-length passage-step result-num
- python hg/query.py --search index-path search-exe-file query-str passage-length passage-step result-num (requires NLTK and a CCLParser server running on localhost, port 8852)
output
1. index info
2. query info
3. docno score rank
4. passage-content
5. term/phrase character positions
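
The positional arguments above are easy to misorder, so a small wrapper can help when calling query.py from another script. The following is a minimal sketch, not part of the repository: `build_query_command` is a hypothetical helper, and the index path, query, and numeric values are placeholders.

```python
def build_query_command(index_path, search_exe, query_str,
                        passage_length, passage_step, result_num,
                        parsed=True):
    """Return the argument list for hg/query.py in the documented order.

    Hypothetical helper; only the flag names and argument order come
    from the command lines listed above.
    """
    mode = "--search-from-parsed-query" if parsed else "--search"
    return [
        "python", "hg/query.py", mode,
        index_path, search_exe, query_str,
        str(passage_length), str(passage_step), str(result_num),
    ]


# Placeholder values; substitute your own index, binary, and query.
cmd = build_query_command("/path/to/index", "./cpp/Search",
                          "example query", 100, 50, 10)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing a list rather than a shell string avoids quoting problems when the query contains spaces.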

nugget_finder.py

python hg/nugget_finder.py query ... (requires NLTK and a CCLParser server running on localhost, port 8852)
output
1. nugget
2. rank
3. score
4. evidence (one document id per line, followed by a space, the score, and the URL if given)
5. empty line

HTMLs file format: one HTML file path per line. An optional URL can follow the path, separated by a tab.
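
As an illustration of this format, the sketch below reads such a list into (path, URL) pairs. `parse_html_list` is a hypothetical name, not a function from this repository, and skipping blank lines is an assumption.

```python
def parse_html_list(lines):
    """Parse 'path<TAB>url' lines into (path, url) tuples.

    url is None when no tab-separated URL is present.
    Hypothetical helper; blank lines are skipped by assumption.
    """
    entries = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: ignore blank lines
        path, sep, url = line.partition("\t")
        url = url.strip()
        entries.append((path.strip(), url if sep and url else None))
    return entries


sample = ["pages/a.html\n", "pages/b.html\thttp://example.com/b\n"]
for path, url in parse_html_list(sample):
    print(path, url)
```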

INI file format: key-value pairs separated by '='. Each key and value is trimmed. Lines starting with '#' are ignored.
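
A minimal sketch of a reader for this format follows. It is not the project's own parser: `parse_ini` is a hypothetical name, and splitting on the first '=' only, as well as skipping lines without '=', are assumptions.

```python
def parse_ini(lines):
    """Parse 'key = value' lines into a dict of trimmed strings.

    Hypothetical helper. Assumptions: only the first '=' separates key
    from value, and lines without '=' are skipped.
    """
    config = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if not sep:
            continue  # assumption: ignore lines without '='
        config[key.strip()] = value.strip()
    return config


sample = ["# settings\n", " tmp_folder = /tmp/hg \n", "search_command=./cpp/Search\n"]
print(parse_ini(sample))
```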

Keys:
- tmp_folder: folder to store temporary files. Required. Using a full path is advisable.
- index_config_template: see the provided indexing.template. Defaults to that filename in the current folder.
- index_command: path to IndriBuildIndex. Defaults to IndriBuildIndex (in the PATH).
- search_command: path to Search. Defaults to ./cpp/Search.
- main_search_passage_count: number of passages to fetch for the main search. Defaults to 3.
- evidence_search_passage_count: number of passages to fetch for the evidence search. Defaults to 10.
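
To make the defaults concrete, the sketch below fills in missing keys and enforces the required one. `DEFAULTS` and `resolve_config` are hypothetical names, and converting the passage counts to integers is an assumption about how the values are used.

```python
# Default values as listed above; keys and values mirror the README.
DEFAULTS = {
    "index_config_template": "indexing.template",
    "index_command": "IndriBuildIndex",
    "search_command": "./cpp/Search",
    "main_search_passage_count": "3",
    "evidence_search_passage_count": "10",
}


def resolve_config(user_config):
    """Merge user settings over the defaults and check required keys.

    Hypothetical helper; casting the counts to int is an assumption.
    """
    if not user_config.get("tmp_folder"):
        raise ValueError("tmp_folder is required")
    resolved = dict(DEFAULTS)
    resolved.update(user_config)
    for key in ("main_search_passage_count", "evidence_search_passage_count"):
        resolved[key] = int(resolved[key])
    return resolved


settings = resolve_config({"tmp_folder": "/tmp/hg"})
print(settings["search_command"], settings["main_search_passage_count"])
```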
