This is an analysis of consumer profile data for streaming services, movie and television show ratings, and principal component analysis (PCA) of sentiment in scripts/transcripts. It covers data cleaning, summary statistics, deeper trend analysis, and data visualization. The project is written in Python, using pandas, Matplotlib, NLTK, PyPDF2, IMDbPY, and labMTsimple. The data was originally sourced from MRI-Simmons, Wikipedia, and IMDb.
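As a rough illustration of what PCA of sentiment in scripts can look like, here is a minimal sketch, not the project's actual code. It assumes each script is split into equal windows, each window is scored by averaging word-level happiness scores, and PCA is run on the resulting (scripts x windows) matrix via an SVD of the centered data. TOY_SCORES is a hypothetical stand-in for the labMT word list that labMTsimple provides, and sentiment_arc and pca are illustrative names.

```python
import numpy as np

# Hypothetical toy lexicon standing in for labMT happiness scores (1-9 scale).
# The real project uses labMTsimple; its word list is NOT reproduced here.
TOY_SCORES = {"love": 8.4, "happy": 8.3, "good": 7.2, "sad": 2.4, "hate": 2.3, "war": 1.8}
NEUTRAL = 5.0  # assumed score for windows with no scored words


def sentiment_arc(text, n_windows=10):
    """Average lexicon score of the scored words in each of n_windows equal slices."""
    words = text.lower().split()
    arc = []
    for idx in np.array_split(np.arange(len(words)), n_windows):
        scores = [TOY_SCORES[words[i]] for i in idx if words[i] in TOY_SCORES]
        arc.append(np.mean(scores) if scores else NEUTRAL)
    return np.array(arc)


def pca(arcs, n_components=2):
    """PCA of a (scripts x windows) matrix via SVD of the column-centered data."""
    X = np.asarray(arcs, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each window across scripts
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # principal "arc shapes"
    scores = Xc @ components.T                   # each script's coordinates
    explained = S**2 / np.sum(S**2)              # fraction of variance per component
    return components, scores, explained[:n_components]


# Usage with three synthetic "scripts": falling, rising, and flat sentiment.
scripts = [
    "love happy good " * 10 + "sad hate war " * 10,
    "sad hate war " * 10 + "love happy good " * 10,
    "good " * 60,
]
arcs = [sentiment_arc(s) for s in scripts]
components, scores, explained = pca(arcs)
print(explained)
```

The SVD route avoids forming a covariance matrix explicitly; the rows of Vt are the principal directions, so each component can be read as a typical shape of a sentiment arc.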
To run this sample analysis, clone this repository, install the dependencies listed in requirements.txt (for example with pip install -r requirements.txt), and run main.py.
The program can be extended to download and process movie transcripts/scripts that are not stored locally. This is limited to titles that are available on IMDb and listed in the spreadsheet/CSV passed in at the start of the process.
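One way to handle the "listed in the spreadsheet/CSV but not stored locally" step is sketched below. It is only an illustration under assumptions: the CSV is assumed to have a title column, local transcripts are assumed to be saved as <title>.txt in one directory, and titles_to_fetch is a hypothetical helper. Checking availability on IMDb would be a separate step and is not shown.

```python
import csv
import tempfile
from pathlib import Path


def titles_to_fetch(csv_path, transcript_dir, title_column="title"):
    """Return titles listed in the CSV that have no local transcript yet.

    Assumes (hypothetically) one transcript per title, saved as '<title>.txt'.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        listed = [
            row[title_column].strip()
            for row in csv.DictReader(f)
            if (row.get(title_column) or "").strip()
        ]
    local = {p.stem for p in Path(transcript_dir).glob("*.txt")}
    return [t for t in listed if t not in local]


# Usage with throwaway files: "Alien" is already local, so only "Heat" is pending.
with tempfile.TemporaryDirectory() as d:
    Path(d, "Alien.txt").write_text("placeholder transcript")
    Path(d, "titles.csv").write_text("title,year\nAlien,1979\nHeat,1995\n")
    pending = titles_to_fetch(Path(d, "titles.csv"), d)
print(pending)  # ['Heat']
```

Keeping the CSV as the single source of truth means anything not listed there is never fetched, which matches the constraint described above.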