The following tasks have been completed for the document similarity application (building the inverted index and computing the similarity matrix):
Implemented an uncompressed Avro version.
Implemented an uncompressed Parquet version.
Implemented a Snappy-compressed version (Parquet or Avro).
Compared file sizes and execution times for all file formats to understand the impact of storage size on execution time.
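
The two core steps of the application, building the inverted index and computing the similarity matrix, can be sketched in a few lines. This is only a minimal in-memory illustration, not the project's actual implementation: the function names, the whitespace tokenizer, the sample documents, and the choice of cosine similarity over raw term counts are all assumptions, since the similarity measure is not specified above. It also does not read or write Avro or Parquet files.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency} (hypothetical helper)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: lowercase whitespace tokenization.
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index


def cosine_similarity_matrix(index):
    """Pairwise cosine similarity computed from the inverted index.

    Assumption: cosine similarity over raw term counts; the original
    application may use a different measure.
    """
    dot = defaultdict(float)
    norm_sq = defaultdict(float)
    for postings in index.values():
        for doc_id, count in postings.items():
            norm_sq[doc_id] += count * count
        # Only documents that share a term contribute to a pair's dot product.
        for (a, ca), (b, cb) in combinations(sorted(postings.items()), 2):
            dot[(a, b)] += ca * cb
    return {
        pair: value / math.sqrt(norm_sq[pair[0]] * norm_sq[pair[1]])
        for pair, value in dot.items()
    }


# Hypothetical sample documents, for illustration only.
docs = {
    "d1": "avro stores rows",
    "d2": "parquet stores columns",
    "d3": "snappy compresses blocks",
}
index = build_inverted_index(docs)
similarity = cosine_similarity_matrix(index)
```

Walking the inverted index term by term means only document pairs that share at least one term are ever scored; pairs with no common term simply do not appear in the result, which keeps the matrix sparse.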