Vivian-Ellis/Podcasts-Ad-Hoc-Retrieval

Podcasts

Ad-Hoc Retrieval for Podcasts Segments

Podcasts are a rapidly growing medium, but Spotify’s current search functionality is limited: it does not search within the actual contents of an episode. The implemented recommendation system enhances search within Spotify and lets users find a jump-in point in relevant podcast episodes. A query is enriched by extracting its subject, object, and named entities, which are then expanded using knowledge graphs. Latent Dirichlet Allocation (LDA) is used to tag all transcripts and queries with a finite number of topics. The overall coherence score of the LDA model is 0.537. Next, a Vector Space Model (VSM) is implemented to rank and retrieve relevant transcripts. Normalized discounted cumulative gain (nDCG) is the metric used to evaluate the relevance of the retrieved segments. Of the 8 test queries, 5 had an nDCG over 0.73 and 3 had an nDCG below 0.73. On average, the recommendation engine takes 48.623 seconds to search 25,000 podcasts and return the top 10 relevant search results.
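
To make the evaluation metric concrete, here is a minimal sketch of how nDCG can be computed for one query's ranked results. It is illustrative only and is not taken from this repository: the function names `dcg` and `ndcg`, the linear gain formulation, and the example relevance grades are all assumptions. The ideal ranking is built from the retrieved list alone, whereas a full evaluation would normally use every judged segment for the query.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain, assumed)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering.

    `relevances` holds graded relevance judgments in ranked order.
    Returns 0.0 when no result is relevant, to avoid dividing by zero.
    """
    ideal_dcg = dcg(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg(relevances, k) / ideal_dcg


# Hypothetical relevance grades for the top-5 segments returned for one query.
example_grades = [3, 2, 0, 1, 2]
print(f"nDCG@5 = {ndcg(example_grades, k=5):.3f}")
```

A ranking that already places the most relevant segments first scores 1.0, and the score drops as relevant segments appear lower in the list.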
