An indexing component for an Information Retrieval system, built with the SPIMI (Single-Pass In-Memory Indexing) algorithm. SPIMI scales better to large datasets than BSBI (Blocked Sort-Based Indexing) because it adds each posting directly to its term's postings list instead of first collecting and sorting term-document pairs.

Roopika-G/SPIMI-Indexing

Information Retrieval System using SPIMI, document-as-a-string index compression, and BM25 ranking for Boolean retrieval.

This project develops an indexing component for an Information Retrieval system using the SPIMI (Single-Pass In-Memory Indexing) algorithm. SPIMI scales better to large datasets than BSBI (Blocked Sort-Based Indexing) because it adds each posting directly to its term's postings list instead of first collecting and sorting term-document pairs. Boolean search is implemented for AND, OR, and NOT, as well as for queries that combine them. Index compression using the document-as-a-string approach reduced the index file size by 64 KB. The BM25 ranking algorithm is also implemented to rank the results of Boolean queries.
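The pipeline described above can be illustrated with a minimal sketch. It is not the code in this repository: all function names (`spimi_invert`, `merge_blocks`, `boolean_and`, `boolean_or`, `boolean_not`, `bm25_score`) and the sample documents are hypothetical, blocks are kept in memory rather than written to disk as a real SPIMI implementation would do, and the BM25 parameters `k1=1.2` and `b=0.75` are commonly used defaults assumed here, not values taken from the project.

```python
import math
from collections import defaultdict


def spimi_invert(token_stream, max_postings=1000):
    """Yield term-sorted blocks built in one pass over (term, doc_id) pairs.

    Postings are appended directly to each term's list; nothing is sorted
    until a block is full. A real implementation would write each block
    to disk instead of yielding it (assumption for this sketch).
    """
    block, count = defaultdict(list), 0
    for term, doc_id in token_stream:
        postings = block[term]
        if not postings or postings[-1] != doc_id:  # skip repeats within a doc
            postings.append(doc_id)
            count += 1
        if count >= max_postings:  # block is "full": emit it and start a new one
            yield dict(sorted(block.items()))
            block, count = defaultdict(list), 0
    if block:
        yield dict(sorted(block.items()))


def merge_blocks(blocks):
    """Merge all blocks into one index: term -> sorted list of doc IDs."""
    merged = defaultdict(set)
    for block in blocks:
        for term, postings in block.items():
            merged[term].update(postings)
    return {term: sorted(ids) for term, ids in sorted(merged.items())}


def boolean_and(index, a, b):
    return sorted(set(index.get(a, [])) & set(index.get(b, [])))


def boolean_or(index, a, b):
    return sorted(set(index.get(a, [])) | set(index.get(b, [])))


def boolean_not(index, a, all_doc_ids):
    return sorted(set(all_doc_ids) - set(index.get(a, [])))


def bm25_score(query_terms, doc_id, docs, index, k1=1.2, b=0.75):
    """Score one document against the query terms with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    tokens = docs[doc_id]
    score = 0.0
    for term in query_terms:
        df = len(index.get(term, []))  # document frequency
        tf = tokens.count(term)        # term frequency in this document
        if df == 0 or tf == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
        score += idf * tf * (k1 + 1) / (tf + length_norm)
    return score


# Invented sample collection: doc ID -> list of tokens.
docs = {
    1: ["new", "home", "sales"],
    2: ["home", "sales", "rise"],
    3: ["new", "home", "prices", "rise", "rise"],
}
stream = ((term, doc_id) for doc_id, tokens in docs.items() for term in tokens)
blocks = list(spimi_invert(stream, max_postings=4))
index = merge_blocks(blocks)

# Boolean retrieval first, then BM25 to order the matching documents.
matches = boolean_or(index, "rise", "prices")
ranked = sorted(matches, key=lambda d: bm25_score(["rise", "prices"], d, docs, index), reverse=True)
```

In this sketch the Boolean operators select the candidate documents and BM25 only orders them, which is one plausible reading of "BM25 ranking for Boolean retrieval". Using Python sets for the Boolean operators keeps the example short; a merge over the sorted postings lists would avoid building the sets.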
