$ python main.py --query {query}
$ python main.py --query="Trump Biden Taiwan China"
After the query runs, there will be five outputs:
- Term Frequency (TF) Weighting + Cosine Similarity
- Term Frequency (TF) Weighting + Euclidean Distance
- TF-IDF Weighting + Cosine Similarity
- TF-IDF Weighting + Euclidean Distance
- Relevance Feedback + TF-IDF Weighting + Cosine Similarity
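
The first four combinations above can be illustrated with a minimal sketch. It is not the code in util.py: the function names, the toy documents, and the plain `log(N / df)` form of IDF are all assumptions made for illustration, and the project may weight or normalise differently.

```python
import math
from collections import Counter


def tf_vector(tokens, vocabulary):
    """Raw term counts for each vocabulary term (hypothetical helper)."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def idf_weights(documents, vocabulary):
    """Assumed IDF form: log(N / df); terms that appear in no document get 0."""
    n_docs = len(documents)
    weights = []
    for term in vocabulary:
        df = sum(1 for doc in documents if term in doc)
        weights.append(math.log(n_docs / df) if df else 0.0)
    return weights


def tfidf_vector(tokens, vocabulary, idf):
    """Element-wise product of term counts and IDF weights."""
    return [tf * w for tf, w in zip(tf_vector(tokens, vocabulary), idf)]


def cosine_similarity(a, b):
    """Higher means more similar; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def euclidean_distance(a, b):
    """Lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    # Toy, already-tokenised documents (made up for this example).
    docs = [
        ["taiwan", "china", "trade"],
        ["trump", "biden", "debate"],
        ["china", "biden", "summit"],
    ]
    vocabulary = sorted({term for doc in docs for term in doc})
    idf = idf_weights(docs, vocabulary)
    query = tfidf_vector(["biden", "china"], vocabulary, idf)
    for i, doc in enumerate(docs):
        vec = tfidf_vector(doc, vocabulary, idf)
        print(i, cosine_similarity(query, vec), euclidean_distance(query, vec))
```

Note that the two measures rank in opposite directions: documents are sorted by descending cosine similarity but by ascending Euclidean distance.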
- EnglishNews: collection of English news
- English.stop: collection of English stop words
- main.py: main execution file
- Parser.py: cleans the documents, removes stop words, and tokenises them
- PorterStemmer.py: the Porter stemming algorithm, ported to Python from the ANSI C version written by the algorithm's author
- util.py: utilities such as TF and IDF weighting, cosine similarity, and distance functions
- VectorSpace.py: the vector space model class
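
The preprocessing attributed to Parser.py (clean, remove stop words, tokenise) might look roughly like the sketch below. The function names, the regular expression, and the tiny inline stop-word set are assumptions for illustration only; the real parser reads its stop words from English.stop and also applies Porter stemming, which is left out here.

```python
import re

# Placeholder stop words; the project loads its list from English.stop.
STOP_WORDS = {"the", "a", "of", "and", "in", "to"}


def clean(text):
    """Lowercase the text and replace anything that is not a letter, digit, or space."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def tokenise(text, stop_words=STOP_WORDS):
    """Split cleaned text on whitespace and drop stop words."""
    return [word for word in clean(text).split() if word not in stop_words]


if __name__ == "__main__":
    print(tokenise("The President of Taiwan, in Taipei!"))
```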