Film Runtimes

This project investigates whether films are shorter when the scriptwriter is also the director. I conjecture that writers who are thinking visually make shorter movies. To answer this question, I collated a list of nearly 200,000 films and then used machine learning (specifically a Markov Chain Monte Carlo analysis) to look for differences in runtimes.

A complete write-up of this project is contained in film.pdf. It covers the motivation behind the project, the details of the analysis and a discussion of the results.

Below is a brief description of each of the files contained in this repository. My CV is also included; it contains my email address should you wish to contact me about this project.

Files

film.pdf is a report covering the hypothesis, the steps taken to get the data and the model used in the analysis, followed by a discussion of the results.

filmObtainItem.py defines a class that looks up an entry on imdb.com and scrapes data about a movie, such as the title and runtime.

filmObtainDataset.py uses the class in filmObtainItem.py to get data on a large number of films and writes it to the file film_data.txt. If there is a problem accessing the imdb.com page (for example, a server timeout), the failure is logged in the file film_fail.txt.
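
The success/failure split described above can be sketched as follows. This is only an illustration, not the code in filmObtainDataset.py: the function name `scrape_all`, the `fetch` callable, the record keys and the tab-separated layout are all assumptions made for the example.

```python
from typing import Callable, Iterable, Tuple


def scrape_all(film_ids: Iterable[str],
               fetch: Callable[[str], dict],
               data_path: str,
               fail_path: str) -> Tuple[int, int]:
    """Fetch each film; write successes to data_path and failures to fail_path.

    `fetch` is a hypothetical callable standing in for the scraper class.
    It should return a dict with "title" and "runtime" or raise on failure.
    """
    n_ok = n_fail = 0
    with open(data_path, "w", encoding="utf-8") as data, \
         open(fail_path, "w", encoding="utf-8") as fail:
        for film_id in film_ids:
            try:
                record = fetch(film_id)
            except Exception as err:  # e.g. a server timeout
                # Log the failure and carry on with the next film.
                fail.write(f"{film_id}\t{type(err).__name__}: {err}\n")
                n_fail += 1
                continue
            data.write(f"{film_id}\t{record['title']}\t{record['runtime']}\n")
            n_ok += 1
    return n_ok, n_fail
```

Logging failures to a separate file rather than aborting means a long scraping run survives occasional timeouts, and the failed entries can be retried later.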

filmWrangle.py cleans and prepares the data for analysis, producing the file film_wrangled.csv.

filmModel.py defines a model for the data, consisting of a global average and deviations from that average for each country, language and genre.
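
As a rough illustration of a "global average plus deviations" model, the sketch below adds one deviation per category to a global average. It assumes a purely additive form and a single country, language and genre per film; the actual model in filmModel.py may differ (see film.pdf). The function name and data layout are hypothetical.

```python
def predicted_runtime(global_average, deviations, film):
    """Global average plus the deviation for the film's country, language and genre.

    `deviations` maps each category name to a dict of {value: deviation}.
    Values with no entry contribute no deviation (an assumption of this sketch).
    """
    total = global_average
    for category in ("country", "language", "genre"):
        total += deviations[category].get(film[category], 0.0)
    return total


# Toy numbers, invented purely for illustration.
deviations = {
    "country": {"France": -5.0},
    "language": {"French": 2.0},
    "genre": {"Drama": 10.0},
}
film = {"country": "France", "language": "French", "genre": "Drama"}
runtime = predicted_runtime(100.0, deviations, film)
```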

filmMCMC.py performs a Markov Chain Monte Carlo analysis using the model defined in filmModel.py to find the values of the model's parameters. The results are written to results.csv.
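
For readers unfamiliar with MCMC, here is a minimal random-walk Metropolis sampler for a single parameter. It is a generic textbook sketch, not the sampler used in filmMCMC.py: the one-parameter Gaussian likelihood, the flat prior, the step size and the toy data are all assumptions chosen to keep the example short.

```python
import math
import random


def log_posterior(mu, data, sigma):
    """Log posterior for the mean of Gaussian data, flat prior, known sigma."""
    return -0.5 * sum((x - mu) ** 2 for x in data) / sigma ** 2


def metropolis(data, sigma, start, step, n_steps, seed=0):
    """Random-walk Metropolis: propose a nearby value, accept or keep the old one."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, data, sigma)
    samples = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, data, sigma)
        # Accept with probability min(1, posterior ratio).
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Toy runtimes in minutes, invented for illustration.
toy_runtimes = [95.0, 100.0, 105.0, 110.0, 90.0]
chain = metropolis(toy_runtimes, sigma=10.0, start=90.0, step=5.0, n_steps=20000)
kept = chain[2000:]  # discard the early samples as burn-in
posterior_mean = sum(kept) / len(kept)
```

The retained samples approximate the posterior distribution of the parameter, so their spread gives an uncertainty estimate as well as a central value.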

filmPlot.py plots the results of the MCMC and performs a Gaussian process regression to find the "Slow trend".

film_data.txt contains the data on all the movies that were successfully scraped by filmObtainDataset.py.

film_fail.txt is a log of all the times that the web scraper failed, perhaps due to a server timeout.

film_wrangled.csv is the cleaned data produced by wrangling (stored in the repository as film_wrangled.csv.zip).

categories.txt is a list of all the countries, languages and genres present in the data.

results.csv contains the results of the MCMC: the best fit and confidence intervals for the model parameters.
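
One common way to turn MCMC samples into a best fit and an interval is to take the median and a pair of percentiles. The sketch below does that with the standard library only. Whether results.csv was produced this way is not stated here, so treat the choice of median and a 95% interval, and the names `percentile` and `summarise`, as assumptions.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of pre-sorted values."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def summarise(samples, level=95.0):
    """Median and central interval of the given level for one parameter."""
    ordered = sorted(samples)
    tail = (100.0 - level) / 2.0
    return {
        "best_fit": percentile(ordered, 50.0),
        "lower": percentile(ordered, tail),
        "upper": percentile(ordered, 100.0 - tail),
    }


# Stand-in samples (the integers 0 to 100) in place of a real chain.
summary = summarise(list(range(101)))
```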

plots/ contains the plots of the film runtimes. The file Global.pdf shows the overall trend in film runtimes for all movies. Each country file then shows how the films from that country have deviated from that trend. The plots compare films in which the writer was also the director (labelled "Same") with those where that wasn't the case (labelled "Different").
