This repository contains analyses of heavy metal artists and their lyrical content. The core data set combines artist information (including genre labels) and album reviews from The Metal-Archives (MA) with song lyrics from DarkLyrics (DL).
The analyses below provide insights into the history of heavy metal albums and the linguistic properties of metal lyrics.
For just the discussion, see the corresponding blog posts I wrote up on each topic.
Exploration of artists and album reviews
A data-driven discussion of the history and global demographics of the heavy metal music industry and its many genres. This notebook also provides statistical insights on the sentiments of MA users as expressed through online album reviews.
Neural network album review score prediction
Prediction of album review scores using a convolutional neural network and GloVe word embeddings.
Brief overview of the lyrics data set.
Comparison of lexical diversity measures and what they tell us about artists and genres.
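As a rough illustration of what a lexical diversity measure computes, the sketch below implements the type-token ratio (TTR), one of the simplest such measures. Whether the notebook uses this exact measure is an assumption; the function name, the tokenizer, and the sample text are all hypothetical.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return the number of unique words divided by the total number of words."""
    # Hypothetical tokenizer: lowercase, then keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    return len(set(tokens)) / len(tokens)


# Made-up sample text: 4 unique words out of 6 total.
sample = "fire and steel fire and thunder"
print(type_token_ratio(sample))
```

TTR tends to fall as a text gets longer, because common words repeat, which is one reason it is worth comparing several measures rather than relying on a single one.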
Concise visualizations of song lyrics from different genres.
Processing data for generating network graphs with Gephi.
Machine learning genre classification
This notebook presents the multi-label problem of genre classification based on lyrics. Different approaches and preprocessing steps are discussed, and various machine learning models are compared via cross-validation to demonstrate possible solutions.
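To make the multi-label framing concrete: an artist can carry several genre tags at once, so the target is a binary indicator row per artist rather than a single class. The sketch below shows one possible way to build such a target using only the standard library. The function name and the example genre lists are hypothetical, and the notebook may encode its labels differently.

```python
def binarize_genres(genre_lists):
    """Turn per-artist genre lists into a sorted label list and a 0/1 matrix."""
    # Collect every genre seen across all artists, in a stable order.
    labels = sorted({genre for genres in genre_lists for genre in genres})
    column = {genre: i for i, genre in enumerate(labels)}
    matrix = []
    for genres in genre_lists:
        row = [0] * len(labels)
        for genre in genres:
            row[column[genre]] = 1  # mark each genre the artist is tagged with
        matrix.append(row)
    return labels, matrix


# Made-up example: three artists, two of which have more than one genre.
artists = [["black", "death"], ["power"], ["death", "thrash"]]
labels, y = binarize_genres(artists)
print(labels)  # ['black', 'death', 'power', 'thrash']
print(y)       # [[1, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1]]
```

With this encoding, each column can be treated as its own binary classification problem, which is one common way to approach multi-label tasks.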
Word embedding genre classification
An attempt at using GloVe word embeddings with a convolutional neural network, as well as an LSTM, for genre classification.
For the genre classifier tool (see link at the bottom of page), a number of machine learning models were tuned and trained to assign genre tags to text inputs of arbitrary length. As discussed in the machine learning notebook above, these models are incorporated into pipelines that also vectorize (and oversample, when training) the data. The relevant scripts are located in lyrics/scripts/ and are configured by the corresponding .yaml files in lyrics/. The genre_classification_tuning.py script tunes the models using cross-validation to determine optimal hyperparameters. The genre_classification_train.py script is used to train the model, given those optimal hyperparameters, and genre_classification_test.py can be used to test the pipeline for functionality before deploying it to the genre classifier tool.
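The oversampling step mentioned above applies only at training time, so that rare genres are not drowned out by common ones while evaluation data stays untouched. The sketch below shows simple random oversampling for a single binary genre label. This is an assumed technique chosen for illustration (the scripts may use a different method or library), and the function name and sample data are hypothetical.

```python
import random


def random_oversample(texts, labels, seed=0):
    """Duplicate random minority-class examples until both classes are the same size."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    positives = [t for t, y in zip(texts, labels) if y == 1]
    negatives = [t for t, y in zip(texts, labels) if y == 0]
    if len(positives) < len(negatives):
        minority, majority, minority_label = positives, negatives, 1
    else:
        minority, majority, minority_label = negatives, positives, 0
    if not minority:
        # Nothing to duplicate if one class is entirely absent.
        return list(texts), list(labels)
    n_extra = len(majority) - len(minority)
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(texts) + extra, list(labels) + [minority_label] * n_extra


# Made-up training split: one positive example against three negatives.
train_texts = ["lyric a", "lyric b", "lyric c", "lyric d"]
train_labels = [1, 0, 0, 0]
balanced_texts, balanced_labels = random_oversample(train_texts, train_labels)
print(balanced_labels.count(1), balanced_labels.count(0))  # 3 3
```

Only the training split should pass through a function like this. Oversampling validation or test data would duplicate examples and inflate the measured scores.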
Source code for these webpages can be found in the pdqnguyen/metallyrics-web repository.
Explore the lyrics and album reviews data sets through interactive scatter plots and swarm plots.
Network graph of heavy metal bands
See how genre associations and lyrical similarity connect the disparate world of heavy metal artists.
Global and U.S. maps of heavy metal bands
Explore the world of heavy metal through choropleth maps.
Interactive genre classifier tool
Enter any text you want and see what heavy metal genres it fits in best.