GitHub - henrysdev/QueryEngine: Local search engine that utilizes k-means clustering to determine relevance.

Henry Warren 2018

[OVERVIEW]
 This program runs in three steps:
    1. Reads and parses CSV data from the provided data file
    (term-frequency.csv is included in the project files).

    2. Creates leader/follower clusters and prints each pair and its distance to the console.

    3. Starts the Query Engine input loop, which prompts you to enter a search query.
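
 The clustering and query steps above can be sketched roughly as follows. This is an
 illustration only, not the code in src/: the function names (cosine_distance,
 build_clusters, search), the choice of about sqrt(N) random leaders, the use of cosine
 distance, and the toy matrix are all assumptions made for the example.

```python
import numpy as np

def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    if norm == 0:
        return 1.0
    return 1.0 - float(np.dot(a, b) / norm)

def build_clusters(vectors, seed=0):
    """Pick about sqrt(N) random leaders; attach every other row to its nearest leader.

    Returns (leaders, pairs), where pairs holds (follower, leader, distance) tuples.
    """
    rng = np.random.default_rng(seed)
    n = len(vectors)
    k = max(1, int(round(np.sqrt(n))))  # assumed leader count, not taken from the project
    leaders = sorted(rng.choice(n, size=k, replace=False).tolist())
    pairs = []
    for i in range(n):
        if i in leaders:
            continue
        dists = [cosine_distance(vectors[i], vectors[l]) for l in leaders]
        best = int(np.argmin(dists))
        pairs.append((i, leaders[best], dists[best]))
    return leaders, pairs

def search(query_vec, vectors, leaders, pairs, top=3):
    """Rank only the documents in the cluster whose leader is closest to the query."""
    nearest = min(leaders, key=lambda l: cosine_distance(query_vec, vectors[l]))
    members = [nearest] + [f for f, l, _ in pairs if l == nearest]
    members.sort(key=lambda d: cosine_distance(query_vec, vectors[d]))
    return members[:top]

# Toy term-frequency matrix: 4 documents x 3 terms (made-up numbers).
tf = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 1, 0]], dtype=float)
leaders, pairs = build_clusters(tf)
for follower, leader, dist in pairs:
    print(f"doc {follower} -> leader {leader} (distance {dist:.3f})")
print(search(np.array([1.0, 0.0, 0.0]), tf, leaders, pairs))
```

 Comparing a query against the leaders first, and only then against one cluster's
 followers, avoids scoring every document on each search.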


[DEPENDENCIES]
 - Python 3
 - numpy


[SETUP]
 This project is written in Python 3. The only external library it uses is numpy.
 If you do not already have numpy installed, you can install it with a package
 manager such as pip:

    $ pip3 install numpy


[RUN_INSTRUCTIONS]
 - Navigate into the project directory.

     $ cd QueryEngine/src/

 - Start the program. The argument after the script name is the name of the data CSV
   that was generated by the web crawler (term-frequency.csv is included in the src/ folder).

     $ python3 query_engine.py term-frequency.csv
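
 For orientation, here is a minimal sketch of how a script might pick up that argument
 and load the data. It is not the project's actual code: load_term_frequencies and main
 are hypothetical names, and the CSV layout (a header row of terms, then one row per
 document with its name followed by counts) is an assumption, so check
 term-frequency.csv for the real layout.

```python
import csv
import os
import tempfile
import numpy as np

def load_term_frequencies(path):
    """Read a CSV assumed to look like 'doc,term1,term2,...' plus one row per document."""
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]
    terms = rows[0][1:]
    doc_names = [row[0] for row in rows[1:]]
    matrix = np.array([[float(x) for x in row[1:]] for row in rows[1:]])
    return terms, doc_names, matrix

def main(argv):
    """argv mirrors sys.argv: script name first, then the data file name."""
    if len(argv) != 2:
        print("usage: python3 query_engine.py <data.csv>")
        return 1
    terms, doc_names, matrix = load_term_frequencies(argv[1])
    print(f"loaded {len(doc_names)} documents and {len(terms)} terms")
    return 0

# Demo on a throwaway file so the sketch runs without the real data.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("doc,apple,banana,cherry\npage1,3,0,1\npage2,0,2,0\n")
exit_code = main(["query_engine.py", tmp.name])  # prints: loaded 2 documents and 3 terms
os.remove(tmp.name)
```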


[PROJECT_STRUCTURE]
 QueryEngine/
 |--> README
 |-- src/
     |--> database.py # holds document and term data from the csv
     |--> document.py # represents a document object
     |--> query_engine.py # main driving class for program
     |--> similarity.py # math and matrix calculations
     |--> term-frequency.csv # web crawler output file