EilidhHendry/retrieval-algorithms

overlap.py: Implements a simple word-overlap retrieval algorithm. For each query Q and each document D, it computes the overlap score between Q and D.
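
As a rough illustration of word-overlap scoring, the sketch below counts the distinct terms shared by a query and a document. It is not taken from overlap.py: the function names `overlap_score` and `rank_by_overlap`, the lowercase whitespace tokenisation, and the choice to count distinct terms rather than repeated ones are all assumptions made for this example.

```python
def overlap_score(query, document):
    """Count the distinct terms that occur in both the query and the document."""
    # Assumption: lowercase + whitespace split stands in for real tokenisation.
    query_terms = set(query.lower().split())
    document_terms = set(document.lower().split())
    return len(query_terms & document_terms)


def rank_by_overlap(query, documents):
    """Score every document against the query; highest overlap first.

    `documents` maps a document id to its text (a hypothetical layout).
    Ties are broken by document id so the ordering is deterministic.
    """
    scored = [(doc_id, overlap_score(query, text)) for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Calling `rank_by_overlap` once per query covers the "for each query Q and each document D" loop described above.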

tfidf.py: Implements a tf.idf retrieval algorithm based on the weighted-sum formula with tf.idf weighting. Also includes pseudo-relevance feedback, which takes the top n matching documents and adds a selection of their most frequent words to the query.
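
The two ideas above can be sketched as follows. This is a minimal example under stated assumptions, not the code in tfidf.py: it assumes the plain weighted sum score(Q, D) = sum over query terms t of tf(t, Q) * tf(t, D) * log(N / df(t)), with no length normalisation, and every name here (`build_idf`, `tfidf_score`, `rank`, `expand_query`, `top_n`, `extra_terms`) is hypothetical.

```python
import math
from collections import Counter


def build_idf(docs):
    """Compute idf(t) = log(N / df(t)) from a dict of doc_id -> token list."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / count) for term, count in doc_freq.items()}


def tfidf_score(query_tokens, doc_tokens, idf):
    """Weighted sum over query terms: tf in query * tf in document * idf."""
    query_tf = Counter(query_tokens)
    doc_tf = Counter(doc_tokens)
    return sum(query_tf[t] * doc_tf[t] * idf.get(t, 0.0) for t in query_tf)


def rank(query_tokens, docs, idf):
    """Return (doc_id, score) pairs, best first; ties broken by doc_id."""
    scored = [(doc_id, tfidf_score(query_tokens, tokens, idf))
              for doc_id, tokens in docs.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def expand_query(query_tokens, docs, idf, top_n=2, extra_terms=2):
    """Pseudo-relevance feedback: append frequent words from the top_n results."""
    top_ids = [doc_id for doc_id, _ in rank(query_tokens, docs, idf)[:top_n]]
    counts = Counter()
    for doc_id in top_ids:
        counts.update(docs[doc_id])
    # Keep the most frequent words that are not already in the query.
    new_terms = [t for t, _ in counts.most_common() if t not in query_tokens]
    return list(query_tokens) + new_terms[:extra_terms]
```

The expanded query returned by `expand_query` would then be passed to `rank` again for a second retrieval pass. In practice stop words would probably need filtering before the most frequent words are selected, which this sketch leaves out.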

rank_news.py: Ranks news articles by pairing each article with the most similar previous article. The current algorithm is a brute-force version that compares every new article with all past articles. A version that uses term-at-a-time execution is in development.
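
The brute-force pairing could look roughly like the sketch below. The similarity measure used by rank_news.py is not stated here, so Jaccard similarity over token sets is swapped in purely as a stand-in, and the names `jaccard` and `pair_with_most_similar_previous` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets (a stand-in similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_with_most_similar_previous(articles):
    """Pair each article with its most similar earlier article.

    `articles` is a list of (article_id, text) tuples in arrival order.
    Returns (article_id, best_previous_id, similarity) for every article
    except the first, which has no predecessor.
    """
    seen = []   # (article_id, token set) for every article processed so far
    pairs = []
    for article_id, text in articles:
        tokens = set(text.lower().split())
        best_id, best_sim = None, -1.0
        # Brute force: compare the new article against every past article.
        for prev_id, prev_tokens in seen:
            sim = jaccard(tokens, prev_tokens)
            if sim > best_sim:
                best_id, best_sim = prev_id, sim
        if seen:
            pairs.append((article_id, best_id, best_sim))
        seen.append((article_id, tokens))
    return pairs
```

With n articles this performs n(n-1)/2 comparisons in total, which is the cost a term-at-a-time version would aim to reduce.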

cosine_tfidf.py: Calculates a cosine tf.idf similarity score for each document. The scorer object takes a file of stored idf values as input when it is initialised.
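
A minimal sketch of such a scorer is shown below. The class name `CosineTfidf`, its `score` method, and the idf file layout (one `term<TAB>idf` pair per line) are assumptions for illustration; the actual format read by cosine_tfidf.py may differ.

```python
import math
from collections import Counter


class CosineTfidf:
    """Cosine similarity between tf.idf vectors, with idf values loaded from a file."""

    def __init__(self, idf_path):
        # Assumed file format: one "term<TAB>idf" pair per line.
        self.idf = {}
        with open(idf_path, encoding="utf-8") as handle:
            for line in handle:
                if not line.strip():
                    continue
                term, value = line.rstrip("\n").split("\t")
                self.idf[term] = float(value)

    def _vector(self, tokens):
        """Build a sparse tf.idf vector; terms without a stored idf are skipped."""
        term_freq = Counter(tokens)
        return {t: count * self.idf[t] for t, count in term_freq.items() if t in self.idf}

    def score(self, query_tokens, doc_tokens):
        """Return the cosine of the angle between the query and document vectors."""
        query_vec = self._vector(query_tokens)
        doc_vec = self._vector(doc_tokens)
        dot = sum(weight * doc_vec.get(t, 0.0) for t, weight in query_vec.items())
        query_norm = math.sqrt(sum(w * w for w in query_vec.values()))
        doc_norm = math.sqrt(sum(w * w for w in doc_vec.values()))
        norm = query_norm * doc_norm
        return dot / norm if norm else 0.0
```

Loading the idf values once at construction means the same object can score many documents without recomputing collection statistics.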

About

implementation of overlap and tfidf algorithms
