Search_Engine

To use the program, run main.py. Set the index_corpus and query_engine variables to True or False to select the functionality you want:

-index_corpus - build the engine from scratch over the entire corpus
-query_engine - let the user manually enter queries to run against the built engine

When running locally, make sure to comment out the correct line in search_engine.main() (the file is in the main project folder), so that only the line matching your environment stays active:

glove_dict = GloveStrategy("glove.twitter.27B.25d.txt").embeddings_dict - local run

or

glove_dict = GloveStrategy("../../../../glove.twitter.27B.25d.txt").embeddings_dict - submission system run
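For reference, GloVe text files store one token per line followed by its vector components, separated by spaces. The sketch below shows one way such a file could be read into a dictionary, and how the two paths above could be tried in order instead of commenting lines in and out. `load_glove` and `first_existing` are hypothetical helpers for illustration; they are not part of this project's `GloveStrategy` class, whose internals are not shown here.

```python
import os


def load_glove(lines):
    """Parse GloVe text lines ("token v1 v2 ...") into {token: [floats]}."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


def first_existing(paths):
    """Return the first path that exists on disk, or None if none do."""
    for path in paths:
        if os.path.exists(path):
            return path
    return None


# Hypothetical usage: prefer the local path, fall back to the submission path.
candidates = [
    "glove.twitter.27B.25d.txt",
    "../../../../glove.twitter.27B.25d.txt",
]
glove_path = first_existing(candidates)
if glove_path is not None:
    with open(glove_path, encoding="utf-8") as glove_file:
        glove_dict = load_glove(glove_file)
```

If neither file is found, `glove_path` is `None` and nothing is loaded, so a real run would need to handle that case explicitly.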

Configuration info:

-ConfigClass(corpus_path, number_of_term_buckets=term_buckets, number_of_entities_buckets=entities_buckets) sets the number of term buckets and entity buckets

-for stemming, call main(stemming=True); the default is stemming=False
-main(corpus_path=corpus_path) sets the corpus path on your PC
-main(output_path=output_path) sets the output path on your PC

postings_handler.py:
-to adjust the maximum size of all buckets before a flush, change MAX_SIZE in postings_handler.py
-to adjust the threshold that determines which buckets are flushed, change THRESHOLD in postings_handler.py
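The interplay of the two constants can be pictured with the sketch below. It is only an illustration under stated assumptions: that MAX_SIZE bounds the total number of entries held in memory across all buckets, and that THRESHOLD is the minimum bucket size that qualifies for flushing. The values, the function `add_and_maybe_flush`, and the in-memory `flushed` dictionary (a stand-in for writing to disk) are hypothetical and may not match the behavior of postings_handler.py.

```python
MAX_SIZE = 6   # assumed: total entries allowed across all buckets before a flush
THRESHOLD = 2  # assumed: a bucket is flushed only if it holds at least this many


def add_and_maybe_flush(buckets, bucket_id, term, posting, flushed):
    """Add a posting, then flush the large buckets once the total hits MAX_SIZE."""
    buckets.setdefault(bucket_id, []).append((term, posting))
    total = sum(len(entries) for entries in buckets.values())
    if total < MAX_SIZE:
        return
    for key, entries in buckets.items():
        if len(entries) >= THRESHOLD:
            # Stand-in for writing the bucket to disk.
            flushed.setdefault(key, []).extend(entries)
            buckets[key] = []  # reassigning an existing key is safe while iterating


buckets, flushed = {}, {}
for i in range(5):
    add_and_maybe_flush(buckets, "a", f"term{i}", i, flushed)
add_and_maybe_flush(buckets, "b", "other", 0, flushed)
```

In this run the sixth entry triggers a flush: bucket "a" (five entries) is written out and emptied, while bucket "b" (one entry, below THRESHOLD) stays in memory.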

ranker.py:
-to adjust the weight of the GloVe measure, change GLOVE_WEIGHT in ranker.py
-to adjust the weight of the BM25 measure, change BM25_WEIGHT in ranker.py
-to adjust the weight of the referral-rank measure, change REFERRAL_WEIGHT in ranker.py
-to adjust the weight of the time-rank measure, change RELEVANCE_WEIGHT in ranker.py
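One common way to combine several ranking measures is a weighted sum, sketched below. Whether ranker.py combines its measures linearly is an assumption, as are the weight values and the function names `combined_score` and `rank`; only the four constant names come from this project.

```python
# Assumed example values; the real constants are defined in ranker.py.
GLOVE_WEIGHT = 0.4
BM25_WEIGHT = 0.4
REFERRAL_WEIGHT = 0.1
RELEVANCE_WEIGHT = 0.1


def combined_score(glove, bm25, referral, relevance):
    """Weighted sum of the four per-document measures."""
    return (
        GLOVE_WEIGHT * glove
        + BM25_WEIGHT * bm25
        + REFERRAL_WEIGHT * referral
        + RELEVANCE_WEIGHT * relevance
    )


def rank(candidates):
    """Sort doc ids by combined score, best first.

    candidates maps doc_id -> (glove, bm25, referral, relevance).
    """
    return sorted(
        candidates,
        key=lambda doc_id: combined_score(*candidates[doc_id]),
        reverse=True,
    )


ranked = rank({
    "tweet_a": (1.0, 0.0, 0.0, 0.0),
    "tweet_b": (1.0, 1.0, 0.0, 0.0),
    "tweet_c": (0.0, 0.0, 1.0, 0.0),
})
```

A weighted sum like this is only meaningful if the individual measures are on comparable scales, so each score would typically be normalized before weighting.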

Important info:

-in the run_engine method, ConfigClass is initialized with 26 term buckets and 2 entity buckets
-the referrals are inserted into the inverted index only when finish_indexing is called
