[icons size='fa-4x' custom_size='' icon='fa-bolt' type='circle' position='left' border='yes' border_color='' icon_color='' background_color='' margin='' icon_animation='' icon_animation_delay='' link='' target='_self']

Does a pop star’s lexicon wax or wane with fame?

 

What happens when you juxtapose the lyrics of Taylor’s self-titled 2006 debut album with those of “1989,” her chart-topping, million-copies-in-a-week latest release?

 

This is an extremely (valiant attempt at an) academic exploration of Taylor Swift’s first and latest albums.

 

A Quick Overview: The lyrics were pulled from AZLyrics. The raw text files were cleaned using TextWrangler, a free text editor for Mac; all punctuation, extra spacing, and special characters were removed. As a basic entry point to NLP, I employed Voyant-Tools.org, a web-based reading and analysis environment for digital texts, to attach some numeric values to pieces of the text.

 

This project is the result of being assigned the task of “playing with a dataset” for the Digital Humanities Praxis 2014 course at the CUNY Graduate Center in New York City.

 

[icons size='fa-4x' custom_size='' icon='fa-rocket' type='circle' position='left' border='yes' border_color='' icon_color='' background_color='' margin='' icon_animation='' icon_animation_delay='' link='' target='_self']

The idea.

 

Like all things, this project began by approaching the ever-omniscient Google search bar with the amalgamation of words rambling around in my head: datavis, data visualization, wtf is data visualization, how do I do cool data visualizations, what makes cool data visualizations, humanities data, humanities data tools, humanities datasets, what is a dataset, how to use nltk to explore data, nltk tutorial, nltk how to, nltk documentation, nltk corpora, nltk building your own corpora, building your own corpora, corpora, voyant, voyant tools.

 

Overwhelmed with options, I decided to explore NLP. With that, it was a matter of deciding what kind of text to dive into. “Everything is a dataset” echoed in my mind.

 

Well, if everything is a dataset, then surely Taylor Swift lyrics could be a dataset. It was decided: I would compare the lyrics and metadata-rich debut and latest albums of a pop star who seems to change with every release. Taylor Swift of 2006 and Taylor Swift of 2014 are different artists. I would use data to show how fame influences the art.

 

Questions abounded: What immediate changes can be noticed in the size of this artist’s lexicon? Do different collaborators improve or degrade Swift’s music? Can we quantify “good” music based on words? What assumptions must be made to understand the data? What tools can we use to observe the music from an angle other than simply listening to the audio? Does a pop star’s lexicon wax or wane with fame?

 

 

[icons size='fa-4x' custom_size='' icon='fa-code' type='circle' position='left' border='yes' border_color='' icon_color='' background_color='' margin='' icon_animation='' icon_animation_delay='' link='' target='_self']

Mining Tay Sway lyrics.

 

I first experimented with copying and pasting lyrics from www.azlyrics.com into a text file. This was easy enough, but it seemed tedious and unnecessary. I then read about the musixmatch API, thinking I could pull more comprehensive data faster. After a bit of tinkering with spotty documentation, I dug myself deep into a hole of confusion. Ultimately, I decided I was spending way too much time trying to work through their documentation and needed to simplify the process in order to preserve some sanity. Gathering data to visualize, and answers to the questions above, took priority over the actual data mining. I went back to copying and pasting.

 

I then built a simple Excel spreadsheet with the following variables:

  • Song Title
  • Album
  • Writer
  • Length in Minutes:Seconds
  • Length in Seconds
  • Word Count
  • Unique Word Count

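The last two columns can be computed rather than tallied by hand. As a rough sketch (my own illustration, not part of the original workflow), a few lines of Python will produce the Word Count and Unique Word Count for any song's cleaned lyrics:

```python
import re

def lyric_stats(lyrics: str) -> dict:
    """Compute the Word Count and Unique Word Count columns for one song."""
    # Lowercase and strip everything but letters and whitespace, mirroring
    # the TextWrangler cleanup (so "you're" collapses to "youre").
    words = re.sub(r"[^a-z\s]", "", lyrics.lower()).split()
    return {"word_count": len(words), "unique_word_count": len(set(words))}

stats = lyric_stats("Cause the players gonna play, play, play, play, play")
print(stats)  # {'word_count': 9, 'unique_word_count': 5}
```

The gap between the two numbers is itself interesting: pop choruses repeat heavily, so unique word count often tells a different story than raw word count.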
 

The lyrics for the standard and deluxe edition songs on both albums were pulled. I left out voice memos, as these would skew the total number of words as well as the unique word count.

 

[icons size='fa-4x' custom_size='' icon='fa-flask' type='circle' position='left' border='yes' border_color='' icon_color='' background_color='' margin='' icon_animation='' icon_animation_delay='' link='' target='_self']

Data + Tools

 

Next, the raw text files were cleaned using TextWrangler, the free text editor for Mac. All punctuation, extra spacing, and special characters were removed; this was as easy as using the “Replace All” command to find and delete each unwanted character. A side effect was that the word you’re became youre, which would prove to be an interesting accident when pushing the text through the software. As a basic entry point to NLP, I employed Voyant-Tools.org, the web-based reading and analysis environment for digital texts. This gave some numeric values to the text.
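The same series of Replace All passes can be reproduced programmatically. This is a minimal sketch of the cleanup described above (assuming the goal is letters and single spaces only), including the contraction-merging accident:

```python
import re

def clean_lyrics(raw: str) -> str:
    """Strip punctuation, special characters, and extra spacing,
    mimicking the TextWrangler "Replace All" passes."""
    # Dropping apostrophes merges contractions: "you're" -> "youre",
    # the same accident described above.
    no_punct = re.sub(r"[^A-Za-z\s]", "", raw)
    # Collapse the runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", no_punct).strip()

print(clean_lyrics("I knew you were trouble when you walked in!  (Trouble, trouble...)"))
# I knew you were trouble when you walked in Trouble trouble
```

Scripting the cleanup also makes it repeatable, which matters if more albums are added to the corpus later.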

 

The data was extracted and placed in the Excel spreadsheet mentioned above. Once that was complete, I began to develop new questions about the Tay Sway corpus that I had not previously considered.
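Assembling the spreadsheet can also be scripted. As a hypothetical sketch (the filename and the sample row values here are placeholders, not figures from my actual spreadsheet), Python's csv module can write rows matching the columns listed earlier:

```python
import csv

# Placeholder rows illustrating the spreadsheet's columns; the numeric
# values are invented for demonstration, not measured from the corpus.
rows = [
    {"Song Title": "Tim McGraw", "Album": "Taylor Swift",
     "Writer": "Taylor Swift, Liz Rose", "Length in Seconds": 232,
     "Word Count": 250, "Unique Word Count": 120},
]

with open("tay_sway_corpus.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
```

A plain CSV keeps the corpus portable: it can be re-opened in Excel, fed to Voyant, or loaded into NLTK without conversion.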

 

To take this project further, I am considering working through all six albums released since 2006 to see how the data evolves over time.

 

 
