Predict if a song will hit the Billboard Year-End Hot 100 singles

ribbas/Heat-Replay

Heat Replay

Slides for the project are available on Google Slides or as a PDF.

Heat Replay is a data science project that attempts to determine whether the lyrical content of a song can predict whether it will reach the Billboard Year-End Hot 100 singles chart. The project intersects several datasets to build a final dataframe of songs that charted and songs that did not, with each group making up roughly half of the set. Each row holds the bag-of-words version of the song's lyrics, the analyses run on them (sentiment analysis, frequency of obscene words, frequency of words pertaining to certain themes, total number of unique words, and so on), and the year the song charted. The last column of the dataframe, 'charted', is a binary variable that records the song's chart status.
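As a rough illustration of the dataframe described above, the sketch below builds bag-of-words rows with a trailing 'charted' label using only the standard library. The function names, the `(title, year, lyrics)` tuple layout, and the downsampling to an exact 50/50 split are assumptions made for this example; the repository's own scripts may build the set differently.

```python
import random
from collections import Counter


def bag_of_words(lyrics):
    """Count each lowercase, whitespace-separated token in the lyrics."""
    return Counter(lyrics.lower().split())


def build_balanced_rows(charted_songs, uncharted_songs, seed=0):
    """Build labeled rows with equal numbers of charted and uncharted songs.

    Each input is a list of (title, year, lyrics) tuples (hypothetical layout).
    The larger group is randomly downsampled to the size of the smaller one,
    which is an assumption about how the near 50/50 split could be reached.
    """
    n = min(len(charted_songs), len(uncharted_songs))
    rng = random.Random(seed)
    rows = []
    for label, songs in ((1, charted_songs), (0, uncharted_songs)):
        for title, year, lyrics in rng.sample(songs, n):
            rows.append({
                "title": title,
                "year": year,
                "bag_of_words": bag_of_words(lyrics),
                "charted": label,  # binary chart status, kept as the last column
            })
    return rows
```

A list of dictionaries like this can be passed directly to `pandas.DataFrame(rows)` if a dataframe is wanted.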

Structure of features

  1. Track information
     - Year (int)
     - Decade (int)

  2. Lyrical content
     - Unique Words, w/o stopwords (int)
     - Density, w/o stopwords (int)
     - Unique Words, w/ stopwords (int)
     - Density, w/ stopwords (int)
     - Nouns (int)
     - Verbs (int)
     - Adjectives (int)
     - Syllables (int)
     - Most used term (string)
     - Most used frequency (int)
     - Curses (binary)
     - Total curses (int)
     - Reading score (float)
     - Sentiment (float)

  3. Chart
     - Charted (binary)
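A few of the lyrical features listed above can be sketched with the standard library alone. This is a minimal sketch, not the project's implementation: the `STOPWORDS` and `CURSES` sets are tiny illustrative placeholders, the tokenizer is a simple regular expression, and computing the most used term after removing stopwords is an assumption. Density, part-of-speech counts, syllables, reading score, and sentiment are left out because the README does not define how they are computed.

```python
import re
from collections import Counter

# Illustrative placeholders only; a real run would use much fuller word lists.
STOPWORDS = {"the", "a", "an", "and", "i", "you", "to", "of", "in", "it"}
CURSES = {"damn", "hell"}


def tokenize(lyrics):
    """Split lyrics into lowercase word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", lyrics.lower())


def lyric_features(lyrics):
    """Compute a subset of the lyrical features for one song."""
    tokens = tokenize(lyrics)
    content = [t for t in tokens if t not in STOPWORDS]
    counts = Counter(content)
    # Assumption: the most used term is taken from the stopword-free tokens.
    term, freq = counts.most_common(1)[0] if counts else ("", 0)
    total_curses = sum(1 for t in tokens if t in CURSES)
    return {
        "unique_words_no_stop": len(counts),
        "unique_words_with_stop": len(set(tokens)),
        "most_used_term": term,
        "most_used_freq": freq,
        "curses": int(total_curses > 0),  # binary flag
        "total_curses": total_curses,
    }
```

Calling `lyric_features` on each song's lyrics yields one dictionary per song, which can be merged into that song's row alongside the year and the 'charted' label.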

Structure of repository

src
├── data; the datasets for the project
├── code; scripts to build the datasets
└── assets; static files and docs

23 directories, 60 files
