Information Retrieval System using SPIMI, dictionary-as-a-string index compression, and BM25 ranking for Boolean retrieval.
This project develops the indexing component of an Information Retrieval system using the SPIMI (Single-Pass In-Memory Indexing) algorithm. Unlike BSBI (Blocked Sort-Based Indexing), which collects term-document pairs and sorts them, SPIMI adds each posting directly to its term's postings list, so it scales better to large datasets. Boolean search is implemented for AND, OR, and NOT, as well as for queries that combine these operators. Index compression using the dictionary-as-a-string approach was applied, which reduced the file size by 64 KB. BM25 ranking of the documents retrieved by a Boolean query is also implemented.
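
The pieces described above can be illustrated with a minimal sketch. It is not the project's actual code: every function name, the toy documents, the whitespace tokenizer, and the BM25 parameters `k1=1.5` and `b=0.75` are assumptions made for illustration. It builds a single in-memory block only (a full SPIMI indexer would write each block to disk when memory fills up and merge the blocks afterwards), it reads the compression step as the standard dictionary-as-a-string technique, and it uses the non-negative `log(1 + ...)` variant of the BM25 IDF.

```python
import math

def tokenize(docs):
    """Yield (term, doc_id) pairs; whitespace tokenization is an assumption."""
    for doc_id, text in docs.items():
        for term in text.lower().split():
            yield term, doc_id

def spimi_invert(token_stream):
    """Build one in-memory block: term -> {doc_id: term frequency}."""
    index = {}
    for term, doc_id in token_stream:
        # SPIMI-style: the posting goes straight into the term's postings,
        # with no intermediate list of (term, doc_id) pairs to sort.
        postings = index.setdefault(term, {})
        postings[doc_id] = postings.get(doc_id, 0) + 1
    # Terms are sorted once per block, which is what makes merging possible.
    return dict(sorted(index.items()))

def boolean_and(index, a, b):
    return set(index.get(a, {})) & set(index.get(b, {}))

def boolean_or(index, a, b):
    return set(index.get(a, {})) | set(index.get(b, {}))

def boolean_not(index, a, all_docs):
    return set(all_docs) - set(index.get(a, {}))

def dictionary_as_string(terms):
    """Concatenate all terms into one string and keep only start offsets."""
    offsets, pos = [], 0
    for term in terms:
        offsets.append(pos)
        pos += len(term)
    return "".join(terms), offsets

def term_at(term_string, offsets, i):
    """Recover the i-th term from the concatenated string."""
    end = offsets[i + 1] if i + 1 < len(offsets) else len(term_string)
    return term_string[offsets[i]:end]

def bm25_rank(index, doc_len, query_terms, candidates, k1=1.5, b=0.75):
    """Rank candidate documents (e.g. a Boolean result set) by BM25."""
    n_docs = len(doc_len)
    avgdl = sum(doc_len.values()) / n_docs
    scores = {}
    for doc_id in candidates:
        score = 0.0
        for term in query_terms:
            postings = index.get(term, {})
            tf = postings.get(doc_id, 0)
            if tf == 0:
                continue
            df = len(postings)
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            norm = tf + k1 * (1 - b + b * doc_len[doc_id] / avgdl)
            score += idf * tf * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy collection (hypothetical data, for illustration only).
docs = {1: "new home sales", 2: "home sales rise", 3: "rise in new sales"}
index = spimi_invert(tokenize(docs))
doc_len = {doc_id: len(text.split()) for doc_id, text in docs.items()}
term_string, offsets = dictionary_as_string(list(index))
ranked = bm25_rank(index, doc_len, ["new", "home"],
                   boolean_or(index, "new", "home"))
```

Here `ranked` holds `(doc_id, score)` pairs for the documents matching `new OR home`, best match first. Storing the vocabulary as one string plus integer offsets avoids the per-term overhead of fixed-width entries, which is where the dictionary-as-a-string technique saves space.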