CMS Search Engine

A search engine that lets users query the Word and PowerPoint files uploaded to the Moodle CMS of BITS Pilani Hyderabad.

Features

Searches through the contents of the documents on the CMS
Returns the documents that most closely match the query
For each document, shows the 5 most similar sentences containing your query words
The document index is updated regularly and dynamically, so there is no need to rebuild it every time
MongoDB backend for persisting the index
Fully documented code, viewable in the docs folder
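
The first three features can be illustrated with a small sketch. This is not the code from queryprocess.py: it assumes TF-IDF weighting with cosine similarity for ranking and plain word overlap for picking sentences, and the function names (tokenize, rank_documents, top_sentences) are hypothetical. The project itself may rank differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def rank_documents(query, docs, top_k=3):
    """Rank documents by TF-IDF cosine similarity to the query (assumed scheme).

    docs maps a document name to its extracted text.
    Returns up to top_k (name, score) pairs, best match first.
    """
    doc_counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in doc_counts.values():
        doc_freq.update(counts.keys())
    # Smoothed inverse document frequency for every term seen in the corpus.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in doc_freq.items()}

    def vector(counts):
        # Terms missing from the corpus get weight 0.
        return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vec = vector(Counter(tokenize(query)))
    scored = [(cosine(query_vec, vector(c)), name) for name, c in doc_counts.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in ranked if score > 0][:top_k]


def top_sentences(query, text, limit=5):
    """Return up to `limit` sentences sharing the most words with the query."""
    query_words = set(tokenize(query))
    scored = []
    for position, sentence in enumerate(split_sentences(text)):
        overlap = len(query_words & set(tokenize(sentence)))
        if overlap:
            # Negative overlap sorts best first; position keeps document order on ties.
            scored.append((-overlap, position, sentence))
    return [sentence for _, _, sentence in sorted(scored)[:limit]]
```

Calling rank_documents(query, docs) and then top_sentences(query, docs[name]) for each returned name gives the "closest documents, then best sentences per document" behaviour described above.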

How to run

Clone this repo, or click "Download as Zip" and extract the files.
Rename sample_config.toml to config.toml and set the required values.
Ensure Python 3.7 is installed and on your system PATH.
Install pipenv using pip install -U pipenv.
In the project folder, run pipenv install to install all Python dependencies.
Download the NLTK datasets:
1. Run pipenv run python.
2. >>> import nltk
3. >>> nltk.download("stopwords")
4. >>> nltk.download("wordnet")
5. >>> nltk.download("genesis")
[For .doc support] To enable extraction from .doc files, install catdoc using apt install catdoc (Ubuntu). On Windows, you can skip processing .doc files by removing that extension from ALLOWED_EXTS in the config file.
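
The NLTK download step above can also be scripted instead of typed into the interpreter. The following is a minimal sketch, not a file from this repo: download_datasets and NLTK_DATASETS are hypothetical names, and the optional download argument exists only so the helper can be exercised without network access.

```python
# Dataset names taken from the setup steps above.
NLTK_DATASETS = ("stopwords", "wordnet", "genesis")


def download_datasets(download=None):
    """Download each required NLTK dataset and report success per dataset.

    download -- callable taking a dataset name; defaults to nltk.download,
                which returns True on success and False on failure.
    """
    if download is None:
        # Imported lazily so the module can be loaded without nltk installed.
        import nltk
        download = nltk.download
    return {name: bool(download(name)) for name in NLTK_DATASETS}


# Usage, from inside the pipenv environment:
#     results = download_datasets()
#     failed = [name for name, ok in results.items() if not ok]
```

Saved as a script in the project folder (with a call to download_datasets() at the bottom), it could be run with pipenv run python followed by the script name.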

To generate the index, run pipenv run python indexer.py. It goes through all the courses you are enrolled in on your CMS account and, whenever it encounters a new file, processes it and adds it to the index.
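
The "only new files are processed" behaviour can be sketched as follows. This is an illustration under assumptions, not the logic of indexer.py: a plain dict stands in for the MongoDB collection, files are identified by course, name and a SHA-256 content hash, and update_index, file_key and the allowed_exts parameter (modelled loosely on ALLOWED_EXTS) are hypothetical.

```python
import hashlib
import os


def file_key(course, filename, content):
    """Build a stable identifier from the course, file name and file bytes."""
    digest = hashlib.sha256(content).hexdigest()
    return "{}/{}:{}".format(course, filename, digest)


def update_index(index, files, allowed_exts, process):
    """Add unseen files to `index` and return the keys that were added.

    index        -- dict standing in for the persistent store (assumption)
    files        -- iterable of (course, filename, content_bytes) tuples
    allowed_exts -- extensions to process, without the dot, e.g. {"docx", "pptx"}
    process      -- callable turning raw bytes into the record to store
    """
    added = []
    for course, filename, content in files:
        ext = os.path.splitext(filename)[1].lstrip(".").lower()
        if ext not in allowed_exts:
            continue  # extension disabled in the config, skip the file
        key = file_key(course, filename, content)
        if key in index:
            continue  # already indexed on an earlier run
        index[key] = process(content)
        added.append(key)
    return added
```

Running update_index twice over the same files adds nothing the second time, which is the property that makes a full rebuild unnecessary.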

To query the index, run pipenv run python main.py.

Repository: iamkroot/cms-search

Folders and files

Latest commit

History

Repository files navigation

CMS Search Engine

Features

How to run

About

Resources

Stars

Watchers

Forks

Languages