
AlexMndzF/project-pipelines

 
 


Project Pipelines

Choose a movie genre, get a top recommendation


Overview

The goal of this project is for me to practice what I have learned in the Intermediate Python and Data Engineering chapter of the Ironhack program. For this project, I start with The Movies Dataset and with data web-scraped from Rotten Tomatoes' Top Movies list. I import both sources and use my newly acquired skills to build a data pipeline that processes the data and produces a result: a movie recommendation. In the pipeline I try to demonstrate my proficiency with the tools we covered (functions, list comprehensions, string operations and web scraping).
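The web-scraping step can be sketched with the standard library alone. This is a minimal sketch, not the code in main.py: the HTML snippet, the `articleLink` class and the "Title (Year)" text format are assumptions standing in for the real Rotten Tomatoes markup, which may differ and changes over time. In practice the page would first be downloaded (for example with `urllib.request.urlopen`) and its decoded text passed to `parse_top_movies`, a hypothetical helper.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: the real Rotten Tomatoes page layout may differ.
SAMPLE_HTML = """
<table class="table">
  <tr><td>1.</td><td><a class="articleLink" href="/m/movie_a">Movie A (1999)</a></td></tr>
  <tr><td>2.</td><td><a class="articleLink" href="/m/movie_b">Movie B (2004)</a></td></tr>
</table>
"""

# Split link text such as "Movie A (1999)" into a title and a four-digit year.
TITLE_YEAR = re.compile(r"^(?P<title>.+?)\s*\((?P<year>\d{4})\)$")


class TopMoviesParser(HTMLParser):
    """Collect (title, year) pairs from <a class="articleLink"> elements."""

    def __init__(self):
        super().__init__()
        self._in_link = False
        self.movies = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "articleLink" in classes:
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        # Only text inside a matching link is considered; other text is ignored.
        if not self._in_link:
            return
        match = TITLE_YEAR.match(data.strip())
        if match:
            self.movies.append((match["title"], int(match["year"])))


def parse_top_movies(html):
    """Return a list of (title, year) tuples found in the given HTML text."""
    parser = TopMoviesParser()
    parser.feed(html)
    return parser.movies


print(parse_top_movies(SAMPLE_HTML))  # [('Movie A', 1999), ('Movie B', 2004)]
```

Using `re` to split "Title (Year)" is one way to exercise the string operations mentioned above; a third-party library such as BeautifulSoup would be a common alternative for more complex pages.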


Project Structure

The project folder is structured in the following way:

  • main.py: contains the code for my data pipeline.

  • INPUT: Folder where the dataset should be placed, in CSV format.

  • OUTPUT: Folder that contains the cleaned datasets and the output of my data pipeline.

  • SRC: Images and resources.

  • FUNCTIONS: Folder that contains the file functions.py with all the auxiliary functions used in this project.
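As a rough illustration of how the pipeline might read from INPUT and write cleaned data to OUTPUT, here is a sketch using `pathlib` and `csv`. The helpers `project_paths` and `save_clean_dataset` and the file name `movies_clean.csv` are hypothetical; the actual contents of functions.py may differ.

```python
import csv
import tempfile
from pathlib import Path


def project_paths(root):
    """Return the INPUT and OUTPUT folders under the project root, creating OUTPUT if needed."""
    root = Path(root)
    input_dir = root / "INPUT"
    output_dir = root / "OUTPUT"
    output_dir.mkdir(parents=True, exist_ok=True)
    return input_dir, output_dir


def save_clean_dataset(rows, output_dir, filename):
    """Write a non-empty list of dicts to OUTPUT/<filename> as CSV and return the path."""
    if not rows:
        raise ValueError("no rows to save")
    path = Path(output_dir) / filename
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


# Demo in a throwaway directory instead of the real project folder.
with tempfile.TemporaryDirectory() as tmp:
    input_dir, output_dir = project_paths(tmp)
    saved = save_clean_dataset([{"title": "Movie A", "year": 1999}], output_dir, "movies_clean.csv")
    content = saved.read_text(encoding="utf-8")

print(content)
```

Keeping path handling in one helper means main.py does not need to hard-code folder names in several places.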

1 - Cleaning and Analysis

  1. I acquire the data from the dataset CSV and through web scraping.
  2. I clean the data and generate 2 new datasets to work with.
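The cleaning step might look roughly like this. It is a sketch under assumptions: the column names (`title`, `genres`, `release_date`, `vote_average`) and the stringified list-of-dicts format of `genres` are modelled on `movies_metadata.csv` from The Movies Dataset and should be checked against the file in INPUT; `parse_genres` and `clean_movies` are hypothetical names.

```python
import ast
import csv
import io

# Hypothetical excerpt; check the column names against the actual file in INPUT.
SAMPLE_CSV = '''title,genres,release_date,vote_average
Movie A,"[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]",1995-10-30,7.7
Movie B,"[{'id': 18, 'name': 'Drama'}]",,6.1
Movie C,[],2001-05-04,5.0
'''


def parse_genres(raw):
    """Turn a stringified list of {'id', 'name'} dicts into a list of genre names."""
    try:
        items = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return []
    if not isinstance(items, list):
        return []
    return [item["name"] for item in items if isinstance(item, dict) and "name" in item]


def clean_movies(reader):
    """Keep rows with a release year and at least one genre, and normalise types."""
    cleaned = []
    for row in reader:
        genres = parse_genres(row["genres"])
        year = row["release_date"][:4]
        if not genres or not year.isdigit():
            continue  # drop rows that cannot be filtered by genre or year
        cleaned.append({
            "title": row["title"].strip(),
            "genres": genres,
            "year": int(year),
            "rating": float(row["vote_average"] or 0),
        })
    return cleaned


movies = clean_movies(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(movies)
```

Rows without a usable genre or year are dropped here because the query later filters on both.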

2 - Data Processing

  1. Create the functions that explore the datasets using the given parameters.
  2. Return movie recommendations according to those parameters and the films' metadata.
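A recommendation function in this spirit could filter the cleaned records with a list comprehension and rank them by rating. The record shape, the `rating` field used for ranking and the `recommend` function are assumptions for illustration, not the project's actual functions.

```python
# Hypothetical cleaned records: {"title", "genres", "year", "rating"}.
MOVIES = [
    {"title": "Movie A", "genres": ["Animation", "Comedy"], "year": 1995, "rating": 7.7},
    {"title": "Movie B", "genres": ["Comedy"], "year": 1995, "rating": 8.1},
    {"title": "Movie C", "genres": ["Drama"], "year": 1995, "rating": 9.0},
    {"title": "Movie D", "genres": ["Comedy"], "year": 2001, "rating": 6.5},
]


def recommend(movies, genre, year, top_n=3):
    """Return the titles of the highest-rated movies matching a genre and release year."""
    wanted = genre.strip().lower()
    # Keep only movies from the requested year that list the requested genre.
    matches = [
        movie for movie in movies
        if movie["year"] == year and wanted in (g.lower() for g in movie["genres"])
    ]
    # Highest rating first, then keep at most top_n titles.
    matches.sort(key=lambda movie: movie["rating"], reverse=True)
    return [movie["title"] for movie in matches[:top_n]]


print(recommend(MOVIES, "comedy", 1995))  # ['Movie B', 'Movie A']
```

Matching genres case-insensitively keeps the query forgiving of user input such as `comedy` versus `Comedy`.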

3 - Start the Query

  1. Run main.py with the 2 parameters, 'Year' and 'Genre'.
  2. Show the movie recommendations.

To run the program, the user needs to pass two arguments:

  • A genre: -- or -s
  • Category of fast food company as: --fastfoodtype or -f
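Step 3 describes two query parameters, 'Year' and 'Genre'. A minimal `argparse` sketch of such an interface follows; only `-s` comes from the argument list above, while `--genre`, `-y` and `--year` are hypothetical names that may not match main.py.

```python
import argparse


def build_parser():
    """Build a CLI with the two query parameters, genre and year."""
    parser = argparse.ArgumentParser(
        description="Choose a movie genre, get a top recommendation."
    )
    # -s comes from the README; the long form --genre is a hypothetical name.
    parser.add_argument("-s", "--genre", required=True, help="movie genre, e.g. Comedy")
    # Hypothetical flags for the 'Year' parameter; main.py may use different names.
    parser.add_argument("-y", "--year", type=int, required=True, help="release year, e.g. 1995")
    return parser


# Parse an explicit list here; in main.py, parse_args() would read sys.argv instead.
args = build_parser().parse_args(["-s", "Comedy", "-y", "1995"])
print(args.genre, args.year)  # Comedy 1995
```

Under these assumed flag names, the shell equivalent would be something like `python main.py -s Comedy -y 1995`.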

Languages

  • Jupyter Notebook 79.0%
  • Python 21.0%