This repository has been archived by the owner on Mar 23, 2021. It is now read-only.

🔎 A full-fledged search engine that crawls, indexes, analyzes, and searches the papers listed on the ResearchGate website. It comes with a Flask web server and a lightweight UI. It was a course project.


MJafarMashhadi/MiniGoogle

Google(mini)

Third project of the Modern Information Retrieval course.

Crawls ResearchGate and indexes the papers on the site. Clusters papers and authors, and calculates a rank for each paper based on its citations and references.

The crawler is written from scratch. Indexing and retrieval are done with Elasticsearch 2.1, the web interface is powered by Flask and Bootstrap, and NumPy handles the ranking and clustering calculations.
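To illustrate how a citation-based rank can be computed with NumPy, here is a minimal PageRank sketch using power iteration. It is not taken from this repository: the function name `pagerank`, the input format (a dict mapping each paper id to the ids it references), and the damping factor of 0.85 are all assumptions made for the example.

```python
import numpy as np


def pagerank(citations, damping=0.85, tol=1e-9, max_iter=100):
    """Rank papers from a citation graph (hypothetical helper, not the project's code).

    citations maps a paper id to the list of paper ids it references.
    Returns a dict mapping each paper id to its rank; the ranks sum to 1.
    """
    # Collect every paper that cites or is cited.
    papers = sorted(set(citations) | {p for refs in citations.values() for p in refs})
    n = len(papers)
    if n == 0:
        return {}
    index = {paper: i for i, paper in enumerate(papers)}

    # Column-stochastic matrix: M[j, i] is the probability of moving from i to j.
    M = np.zeros((n, n))
    for src, refs in citations.items():
        targets = set(refs)
        for dst in targets:
            M[index[dst], index[src]] = 1.0 / len(targets)

    # Papers with no references spread their weight uniformly over all papers.
    dangling = M.sum(axis=0) == 0
    M[:, dangling] = 1.0 / n

    # Power iteration until the ranks stop changing.
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1.0 - damping) / n + damping * M.dot(rank)
        converged = np.abs(new_rank - rank).sum() < tol
        rank = new_rank
        if converged:
            break
    return dict(zip(papers, rank.tolist()))


if __name__ == "__main__":
    # Toy graph: A references B and C, B references C, C references nothing.
    for paper, score in sorted(pagerank({"A": ["B", "C"], "B": ["C"], "C": []}).items()):
        print(paper, round(score, 4))
```

In this toy graph, paper C is referenced by both other papers, so it ends up with the highest rank.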

How to use

Install the requirements from requirements.txt. Creating a Python virtual environment first is strongly recommended.

pip install -r requirements.txt
python ui/ui.py  # requires python3.4 or higher

Then open http://127.0.0.1:5000/admin/ in your browser. Crawl, calculate page ranks, perform clustering, and finally add the documents to the index. Your mini version of Google is now ready to use: point your browser to http://127.0.0.1:5000/search.

Note: Set up Elasticsearch before adding documents to the index. For more information, read here.
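Before adding documents to the index, it can help to confirm that Elasticsearch is reachable. The sketch below is an illustration only, not part of this project: the helper name `elasticsearch_info` is hypothetical, and it assumes Elasticsearch is listening on its default HTTP port, http://localhost:9200. Adjust the URL if your setup differs.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def elasticsearch_info(base_url="http://localhost:9200", timeout=2.0):
    """Return the JSON document served at the Elasticsearch root URL, or None.

    Hypothetical helper: base_url is an assumption (the default HTTP port),
    not something this project's configuration is known to use.
    """
    try:
        with urlopen(base_url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON.
        return None


if __name__ == "__main__":
    info = elasticsearch_info()
    if info is None:
        print("Elasticsearch is not reachable; start it before indexing.")
    else:
        # The root endpoint reports the server version under "version".
        print("Elasticsearch version:", info.get("version", {}).get("number"))
```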
