GitHub - Erick2206/scikit-test: Contains code and results for RedCarpetUp Assignment

The assignment was divided into two parts:

First, I calculated the Levenshtein distance between each movie title in the primary dataset and every movie title in the secondary dataset. The data was scraped using BeautifulSoup, and the resulting distances are stored in the Results directory. This step was computationally expensive because every primary title has to be compared with every secondary title. For that reason, I computed distances only for the first 10 movies in the primary dataset against all secondary titles. Given more compute time, the same code can process every movie in the primary dataset.
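A minimal sketch of this step, assuming the titles are plain Python strings. It uses the standard dynamic-programming Levenshtein recurrence, keeping only two rows in memory. A helper then compares the first `limit` primary titles (10 by default, matching the cut-off above) against every secondary title. The names `levenshtein` and `distance_rows`, the case-insensitive comparison, and the row format are illustrative assumptions, not the repository's actual code.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions and substitutions each cost 1.

    Standard dynamic-programming recurrence, but only two rows are kept,
    so memory is O(min(len(a), len(b))) while time stays O(len(a) * len(b)).
    """
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP rows short
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def distance_rows(primary: List[str], secondary: List[str],
                  limit: int = 10) -> List[Tuple[str, str, int]]:
    """(primary_title, secondary_title, distance) for the first `limit` primary titles."""
    return [
        # Assumption: titles are compared case-insensitively.
        (p, s, levenshtein(p.lower(), s.lower()))
        for p in primary[:limit]
        for s in secondary
    ]


if __name__ == "__main__":
    primary = ["Avatar", "Titanic"]             # hypothetical sample titles
    secondary = ["Avatar", "Titanik", "Alien"]
    for row in distance_rows(primary, secondary):
        print(row)
```

Each pair still costs O(len(a) × len(b)) time, and the number of pairs is the product of the two dataset sizes. This is why computing the full table is slow and why limiting the primary side to 10 titles cuts the work proportionally.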
Second, I used machine learning algorithms to find meaningful insights in the data. I tried to predict a movie's revenue from relevant features in the primary dataset. Because revenue is a continuous value, this is a regression problem rather than a classification problem, so I used the regression variants of SVM as well as Lasso. The models have not been trained yet. I am still fixing a small bug in the dataset preprocessing phase and hope to complete it before the assignment is evaluated.
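A hedged sketch of the regression setup described above, using scikit-learn's `SVR` and `Lasso` estimators. The synthetic features, the log transform of revenue, and the hyperparameters (`C=10.0`, `alpha=0.1`) are assumptions for illustration. The helper names `fit_revenue_models` and `make_synthetic_movies` are hypothetical, and none of this is taken from `predict.py`.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def fit_revenue_models(X, revenue, seed=0):
    """Fit SVR and Lasso on log revenue; return the test-set MAE per model.

    Assumption: revenue is non-negative and heavily skewed, so the models
    regress on log1p(revenue) instead of raw amounts.
    """
    y = np.log1p(revenue)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        # Both estimators are sensitive to feature scale, so standardize first.
        "svr": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
        "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    return scores


def make_synthetic_movies(n=200, seed=0):
    """Hypothetical stand-in for numeric features from the primary dataset."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))  # e.g. scaled budget, runtime, vote count
    log_rev = 10 + X @ np.array([1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=n)
    return X, np.expm1(log_rev)


if __name__ == "__main__":
    X, revenue = make_synthetic_movies()
    for name, mae in fit_revenue_models(X, revenue).items():
        print(f"{name}: MAE (log revenue) = {mae:.3f}")
```

Putting the scaler inside each pipeline means it is fit only on the training split, so test-set statistics do not leak into preprocessing.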

Note: I really enjoyed working on this assignment. It seemed simple enough given my prior experience, but working with such unclean data made it a challenging task. It was a great learning experience, and I would like to work on similar projects if I get the opportunity to join this internship.

Repository contents:

Dataset
Results
LICENSE
README.md
compile_dataset.py
predict.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dataset

Dataset

Results

Results

LICENSE

LICENSE

README.md

README.md

compile_dataset.py

compile_dataset.py

predict.py

predict.py

Repository files navigation

About

Releases

Packages

Languages

License

Erick2206/scikit-test

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Stars

Watchers

Forks

Languages