This is an analysis of Drake’s Certified Lover Boy album using natural language processing with Speak Ai.
I am Tyler Bryden, a lover of music, hip-hop and data science, with a palpable excitement about the opportunity to combine them. I’ve put some ways to connect with me at the bottom of this post.
This is a work in progress. I’ll be adding new charts and commentary over the next few days as we generate and share the results of our analysis.
To do this analysis, I used one of my favourite sites, Genius.com, to source the official lyrics (thank you so much). I sourced the lyrics for the entire tracklist, which is:
- Champagne Poetry
- Papi’s Home
- Girls Want Girls (featuring Lil Baby)
- In the Bible (featuring Lil Durk and Giveon)
- Love All (featuring Jay-Z)
- Fair Trade (featuring Travis Scott)
- Way 2 Sexy (featuring Future and Young Thug)
- N 2 Deep (featuring Future)
- Pipe Down
- Yebba’s Heartbreak (with Yebba)
- No Friends in the Industry
- Knife Talk (featuring 21 Savage and Project Pat)
- 7AM on Bridle Path
- Race My Mind
- Fountains (featuring Tems)
- Get Along Better (featuring Ty Dolla Sign)
- You Only Live Twice (featuring Lil Wayne and Rick Ross)
- IMY2 (featuring Kid Cudi)
- Fucking Fans
- The Remorse
To start, I used Google Sheets to help me clean the data.
Next, I stripped out the section labels (“Verse”, “Chorus”, “Intro”, “Bridge”, and “Outro”), along with the name of the person rapping or singing each section.
I then normalized the capitalization throughout the lines so that capitalized words would not be given extra weight as entities and skew the analysis. Named-entity recognition (NER) is finicky sometimes, and I wanted to give Speak the best shot at accurate analysis and interesting results.
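For anyone who wants to script the cleanup instead of doing it in Google Sheets, here is a minimal sketch. It assumes the lyrics are pasted as plain text with Genius-style section labels on their own lines in square brackets (for example “[Verse 1: Drake]”), and the function name `clean_lyrics` is my own. It drops those labels (which also removes the performer name) and lowercases the first word of each line, leaving the pronoun “I” alone.

```python
import re

# Assumption: section labels sit alone on a line in square brackets,
# e.g. "[Verse 1: Drake]" or "[Chorus]".
SECTION_LABEL = re.compile(r"^\[[^\]]*\]$")


def clean_lyrics(raw: str) -> list[str]:
    """Drop blank lines and bracketed section labels (which also carry the
    performer's name), then lowercase the first word of each line.

    Lyric lines usually start with a capital letter, so without this step a
    word like "Champagne" at the start of a line can look like a name.
    The pronoun "I" and contractions such as "I'm" are left alone.
    """
    cleaned = []
    for line in raw.splitlines():
        line = line.strip()
        if not line or SECTION_LABEL.match(line):
            continue  # skip blank lines and "[Verse 1: Drake]"-style labels
        first, sep, rest = line.partition(" ")
        if first != "I" and not first.startswith(("I'", "I’")):
            first = first[0].lower() + first[1:]
        cleaned.append(first + sep + rest)
    return cleaned
```

This is deliberately crude: a real name at the start of a line (say, “Drake”) also gets lowercased, so a quick manual pass is still worth it.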
With that taken care of, I brought the text into Speak for analysis. You can see a screenshot of all the tracks here:
I added “Drake”, “CLB”, the featured artists (you can see Rick Ross, Future and more) and the track number as tags on the site to help me organize and filter the analysis.
Clicking on one of the tracks opens its individual text analysis panel, which you can see below:
Speak automatically begins analyzing the text. Between the machine support, my research experience, and my love of hip-hop and music, some themes and patterns begin to emerge.
Some things emerged for me pretty quickly. Speak’s default categories, which automatically extract and categorize language, helped a lot. People, locations, dates, events, and brands really stuck out and gave me some ideas.
For example, I should be able to show an image with the heads of all the people who were mentioned throughout the album.
Some more specific questions came up too: based on the cars that were mentioned, which brand or model was the most popular?
What cities and countries were mentioned most?
Other categories that immediately popped out were profanity, references to other works of art, the number of “woahs” and “ayys”, weapons, fashion brands, and sexual innuendo.
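Counting the “woahs” and “ayys” is simple enough to sketch in a few lines. This assumes the input is a list of cleaned lyric lines, and that spelling variants (“woah”/“whoa”, “ay”/“ayy”/“ayyy”) should be counted together; both the variant list and the function name are my own choices, not something Speak does.

```python
import re
from collections import Counter

# Assumption: spelling variants are counted together,
# e.g. "woah"/"whoa" and "ay"/"ayy"/"ayyy".
AD_LIBS = {
    "woah": re.compile(r"\b(?:woah+|whoa+)\b", re.IGNORECASE),
    "ayy": re.compile(r"\bay+\b", re.IGNORECASE),
}


def count_ad_libs(lines: list[str]) -> Counter:
    """Count ad-lib occurrences across cleaned lyric lines."""
    counts = Counter({label: 0 for label in AD_LIBS})
    for line in lines:
        for label, pattern in AD_LIBS.items():
            counts[label] += len(pattern.findall(line))
    return counts
```

The word boundaries (`\b`) keep words like “away” or “say” from being counted as an “ay”.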
This is the first visualization of geopolitical regions (“locations”) that Speak pulled out.
There is still some data cleaning to do to produce the final visuals, which will be an accurate word cloud and a better-designed bar chart.
Sentiment analysis is a common and popular practice when analyzing text and language. As the amount of text grows and the information becomes more abstract, sentiment is a great way to organize the information and quickly navigate to the most positive and negative moments. These are often points of interest that can then be analyzed further.
For this visualization, I plotted the sentiment score of each track in album order. That painted an interesting picture of the most positive and negative songs on the album.
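Once the per-track scores are exported, putting them in album order and picking out the extremes is straightforward. Here is a rough sketch, assuming one sentiment score per track where higher means more positive; the `TrackSentiment` structure and its field names are hypothetical, not Speak’s export format. The rolling mean is an extra I’d consider for showing the overall arc of the album rather than the track-to-track jumps.

```python
from dataclasses import dataclass


@dataclass
class TrackSentiment:
    number: int   # position on the album, starting at 1
    title: str
    score: float  # assumption: one score per track, higher = more positive


def album_arc(tracks: list[TrackSentiment]):
    """Return tracks in album order plus the most positive and most negative track."""
    ordered = sorted(tracks, key=lambda t: t.number)
    most_positive = max(ordered, key=lambda t: t.score)
    most_negative = min(ordered, key=lambda t: t.score)
    return ordered, most_positive, most_negative


def rolling_mean(scores: list[float], window: int = 3) -> list[float]:
    """Smooth track-by-track scores to show the overall arc of the album."""
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```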
Track Word Count
I’m still figuring out exactly what value this one has. It’d be interesting to combine it with track duration to get words per minute. There would also be some valuable insights in tying this data to the number of plays on Spotify or reads on Genius.
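Word counts depend on what you call a “word”, so it helps to pin that down. A small sketch, assuming a word is a run of letters or digits optionally joined by apostrophes (so “don’t” and “I’ma” count once); this definition is my assumption and may not match how Speak counts.

```python
import re

# Assumption: a "word" is a run of letters or digits, optionally joined by
# straight or curly apostrophes. [^\W_] matches any Unicode letter or digit,
# so accented words stay whole.
WORD = re.compile(r"[^\W_]+(?:['’][^\W_]+)*")


def word_count(lines: list[str]) -> int:
    """Count words across the cleaned lyric lines of one track."""
    return sum(len(WORD.findall(line)) for line in lines)
```

Note that a hyphenated word like “hip-hop” counts as two under this rule.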
While I wait to export some of this data from Speak in raw format for further analysis and visualization, here are some ideas for analysis:
- Words per minute
- Fashion brands
- Sentiment over the entire album
To do this, I need to add a few columns to the CSV I will use to analyze the data, including track number and song length.
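With track number, song length, and a word count in the CSV, words per minute is just the word count divided by the length in minutes. A minimal sketch, assuming hypothetical column names `track_number`, `title`, `length` (as “m:ss”) and `word_count`:

```python
import csv
import io


def length_to_seconds(length: str) -> int:
    """Convert an "m:ss" track length such as "3:45" into seconds."""
    minutes, seconds = length.split(":")
    return int(minutes) * 60 + int(seconds)


def words_per_minute(csv_text: str) -> list[tuple[int, str, float]]:
    """Compute words per minute for each track, returned in album order.

    Assumed (hypothetical) CSV columns: track_number, title, length, word_count.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        minutes = length_to_seconds(row["length"]) / 60
        wpm = int(row["word_count"]) / minutes
        rows.append((int(row["track_number"]), row["title"], round(wpm, 1)))
    return sorted(rows)  # tuples sort by track number first
```

One caveat: dividing by the full track length includes intros, outros and instrumental breaks, so this understates how fast the verses themselves are delivered.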
This visualization shows the people mentioned on the album. Unsurprisingly, “Drake” and “Drizzy” show up most. However, there are some interesting and hilarious references. Shawn is Jay-Z. I love the Tiger Woods line. Benedict is, I believe, Benedict Cumberbatch. Richard Pryor also made an appearance.
Load up the affiliate links. I wish I could link each brand to its site and make money if you made a purchase. For now, I’ll just show you a bar chart. “YM” is Young Money. “Roc” is Roc-A-Fella. I like that Sono Bella, a laser liposuction and body contouring company, got a shoutout 😂
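Keeping “Drake” and “Drizzy” as separate bars is a fair choice, but if you wanted one total per person, nicknames need to be folded together before counting. A sketch, assuming the extracted people come out as a plain list of strings; the alias map is hypothetical and hand-built, with the two pairs taken from the notes in this post.

```python
from collections import Counter

# Hypothetical, hand-built alias map. Keys are lowercase surface forms,
# values are the canonical name to chart.
ALIASES = {"drizzy": "Drake", "shawn": "Jay-Z"}


def tally_mentions(mentions: list[str]) -> Counter:
    """Count person mentions, folding nicknames into one canonical name."""
    counts = Counter()
    for mention in mentions:
        name = mention.strip()
        counts[ALIASES.get(name.lower(), name)] += 1
    return counts
```

`Counter.most_common()` then gives the bars in order for the chart.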
What I Wish I Could Do
Right now with Speak, we are doing entity extraction on text, but we are not yet doing topic modelling to automatically reveal themes and topics. I’ve grouped some together manually, but the system will eventually aid in the process.
If I had more time and resources, I would have liked to upload the actual tracks to Speak, which has a powerful speech-to-text engine. You could then edit the transcript so that it is accurate and has the exact timing, duration, and speaker (rapper) labels. With that, you could create a searchable database to find any moment across the album, click it, and play it back instantly. You could also easily filter lyrics and analysis by artist for a breakdown of the artists featured on Certified Lover Boy.
I’d love to do more audio analysis, not just language analysis. With some Python and time, you could do some incredible analysis of the sound quality of the music and the vocals to unlock even more insights and patterns.
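Until topic modelling is in place, a manual grouping can at least be made repeatable with a keyword lookup. To be clear, this is simple keyword matching, not topic modelling, and the theme names and keyword lists below are hypothetical placeholders for whatever grouping you build by hand.

```python
import re
from collections import Counter

# Hypothetical keyword lists standing in for a hand-built grouping.
THEMES = {
    "cars": {"benz", "maybach", "lambo"},
    "drinks": {"casamigos", "champagne", "tequila"},
}


def theme_counts(lines: list[str]) -> Counter:
    """Count how many lyric lines touch each theme (a line counts once per theme)."""
    counts = Counter()
    for line in lines:
        words = set(re.findall(r"[^\W_]+", line.lower()))
        for theme, keywords in THEMES.items():
            if words & keywords:  # any keyword from this theme on the line
                counts[theme] += 1
    return counts
```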
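Here is a rough sketch of what searching an edited, timestamped transcript could look like. The `Segment` structure is hypothetical (it is not Speak’s data model); the idea is that each line carries its track, speaker and start/end times, so a search hit tells a player exactly where to seek, and the optional `speaker` argument gives the per-artist filter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One line of an edited, timestamped transcript (hypothetical structure)."""
    track: str
    speaker: str
    start: float  # seconds from the start of the track
    end: float
    text: str


def search(segments: list[Segment], query: str,
           speaker: Optional[str] = None) -> list[Segment]:
    """Find segments containing `query` (case-insensitive), optionally for one artist."""
    needle = query.lower()
    return [
        seg for seg in segments
        if needle in seg.text.lower()
        and (speaker is None or seg.speaker == speaker)
    ]
```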
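As a very small first step into audio, here is a sketch that measures overall loudness (RMS level in dBFS) using only Python’s standard library. It assumes the track has already been decoded to a 16-bit PCM WAV file; dedicated audio libraries go much further (tempo, spectral features, vocal separation), so treat this as a starting point only.

```python
import array
import math
import sys
import wave


def rms_dbfs(path: str) -> float:
    """Overall RMS loudness of a 16-bit PCM WAV file, in dBFS (0 dBFS = full scale).

    Channels are interleaved in the file, so for stereo this is the RMS over
    both channels combined.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV samples are stored little-endian
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(math.sqrt(mean_square) / 32768)
```

Comparing this number across tracks, or computing it over short windows, is one way to start looking at how loud or dynamic each song is.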
How do you handle choruses? They are repeated several times in a song (often three times) with the same lyrics. If, for example, locations are mentioned in the chorus and repeated several times, does that skew the data?
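One way to test for that skew is to run every count twice: once on the full lyrics and once with repeated stanzas removed. A sketch of the deduplication step, assuming stanzas are separated by blank lines (so it runs on the raw text, before blank lines are stripped); the function name is my own.

```python
import re


def unique_stanzas(raw: str) -> list[str]:
    """Split lyrics into blank-line-separated stanzas and keep only the first
    copy of any stanza that repeats word for word (usually a chorus)."""
    seen = set()
    kept = []
    for stanza in re.split(r"\n\s*\n", raw):
        stanza = stanza.strip()
        key = stanza.lower()
        if stanza and key not in seen:
            seen.add(key)
            kept.append(stanza)
    return kept
```

This only catches exact repeats. A chorus that comes back with a different ad-lib or a changed word will survive as a separate stanza, so large gaps between the two counts are worth checking by hand.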
Other Thoughts & Notes
Context is so important. Music, and especially hip-hop, contains many references to other hip-hop pieces and cultures. The same words can often have different meanings depending on the context and use.
One that shone through in the Certified Lover Boy analysis was a Richard Pryor line on “No Friends in the Industry”. Drake used “prior” as wordplay, with the line being: “I had a Richard prior to these niggas, that’s the joke.”
Do I include that as a person who is mentioned on the album? I did.
Data is contextual. Without comparing Certified Lover Boy to other Drake albums, or even to other artists’ albums, a lot of this information stands alone.
Data gets interesting once you start to compare over time. For instance, what locations has Drake talked about most over the last ten years of albums? I have a feeling Houstatlantavegas may skew some data 😂
How has sentiment changed? What topic or thematic changes have emerged?
When I first got interested in doing this, I shared a story on social media asking people what they would like to know about Certified Lover Boy.
I got a few responses quite quickly from some hip-hop-loving friends who, interestingly, have marketing and data experience.
One asked how many times 42 Casamigos, an alcohol that was mentioned several times on the album, comes up. He said he counted 4 times without AI. I set out to verify that for him.
He was right. 42 Casamigos was uniquely mentioned 4 times throughout the album: once on “Love All” with Jay-Z (in a line repeated several times), twice on “Girls Want Girls” with Lil Baby, and once on “Papi’s Home”.
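The difference between total mentions and unique mentions is exactly the chorus question from earlier. Here is a sketch of how a count like this could be reproduced in code, assuming the lyrics are stored as a dict mapping each track title to its cleaned lines (a hypothetical structure, not how Speak stores them). It reports both numbers per track, so repeated lines add to the total but not to the unique count.

```python
import re


def phrase_counts(tracks: dict[str, list[str]], phrase: str) -> dict[str, tuple[int, int]]:
    """For each track that mentions `phrase`, return
    (total mentions, number of distinct lines with a mention)."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    result = {}
    for title, lines in tracks.items():
        total = sum(len(pattern.findall(line)) for line in lines)
        distinct = {line.strip().lower() for line in lines if pattern.search(line)}
        if total:
            result[title] = (total, len(distinct))
    return result
```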
Additionally, he talked about comparing this to Donda by Kanye West (which, yes, I would like to get much deeper into). He noted that although the two albums are similarly long, he expected the words per minute to be much higher on Certified Lover Boy.
What would you like to know?
Because this is ongoing work, I would love to know what you would be interested in learning. I hope to do this same work with other popular albums.
Thanks so much for checking this out. I’ll be back to improve this content for you.