predictimdb

Python tools to scrape data from IDMB and predict movie revenue!

In the data folder, you have raw, unparsed data retrieved by the scraper.

The data is stored as a JSON encoded list of dictionaries. Each movie is a dictionary with the following fields:

      'name' : string, movie name
      'mpaa' : string, MPAA rating
      'link' : string, link to IMDB movie page
      ‘year' : string, release year
      ‘runtime' : string, length of movie in minutes
      ‘revenue' : string, US box office revenue of movie
      'stars' : list of name-URL dictionaries
      'directors' : list of name-URL dictionaries
      'writers' : list of name-URL dictionaries
      'budget' : string, filming budget given by IMDB
      'genres' : list of strings
      'castlist' : list of strings, cast names
      'release' : string, release date
      'countries' : list of strings
      'tagline' : string, brief line about the movie
      'production' : list of name-URL dictionaries
      'userscore' : dictionary with fields ‘score' and ‘count'
      'metascore' : string, metascore as given by IMDB

A name-URL dictionary refers to a dictionary with the fields ‘name' and ‘url’. The ‘name' is simply the name of the entity and the ‘url' is the url to the IMDB page for that entity . The purpose of this is that if we want to get more metadata on any of the entities, we can. NOTE: any of these fields may be empty - check before referencing

To scrape the data yourself, refer to code in the scraping directory: scraper.py contains helper functions that parse HTML, and imdb.py does the actual scraping (iterates through URLs, stores data, etc)

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
R_code		R_code
analysis		analysis
data		data
parsing		parsing
scraping		scraping
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

R_code

R_code

analysis

analysis

data

data

parsing

parsing

scraping

scraping

README.md

README.md

Repository files navigation

predictimdb

About

Releases

Packages

Languages

rohan5a/predictimdb

Folders and files

Latest commit

History

Repository files navigation

predictimdb

About

Resources

Stars

Watchers

Forks

Languages